Research Project

When Should We Trust a Medical Image Classifier?

Explore one patient at a time. For each chest X-ray, see the model's prediction, confidence, and explanation, with full transparency about certainty and limitations.

This is a research project for methodological validation. It demonstrates rigorous approaches to model reliability, calibration, and trustworthiness. It is not intended for clinical deployment without proper validation and regulatory approval.

How to Use This Demo

One Patient, Step-by-Step

Trust is learned through transparency and progressive explanation.

Pick a Patient Type

Select a category of patients to explore (e.g., confirmed pneumonia, normal findings, or uncertain cases).

See the Prediction

For the selected patient, read the model's assessment in plain English rather than jargon, for example "92% chance of pneumonia".

Assess Confidence

Learn how confident we should be in that prediction based on model uncertainty and explanation stability.

Interpret the Explanation

See where the model focused (red heat map) and compare it with the heat map produced by an untrained model.

Browse Similar Cases

Use arrow buttons to explore more patients of the same type and see patterns.

Learn Deeper (Optional)

If interested, expand the "System Insights" section to see calibration curves and aggregate metrics.
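The "Assess Confidence" step above can be made concrete with a small sketch. It assumes the model outputs a single pneumonia probability and uses binary entropy as the uncertainty measure; the demo does not state which uncertainty measure it uses, so the function names and the choice of entropy are illustrative only.

```python
import math

def raw_confidence(p_pneumonia: float) -> float:
    """Probability assigned to the predicted class (assumed definition)."""
    return max(p_pneumonia, 1.0 - p_pneumonia)

def binary_entropy(p_pneumonia: float) -> float:
    """Entropy in bits of a two-class prediction: 0 = certain, 1 = coin flip."""
    p = p_pneumonia
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain prediction carries no entropy
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

Under this sketch, a prediction of 92% pneumonia has higher raw confidence and lower entropy than a prediction of 60% pneumonia, which is the intuition the step describes.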

Interactive Demo

Analyze Patient Cases

Select a patient type, then browse one patient at a time.

This viewer shows randomly sampled test cases with Grad-CAM overlays, chosen to be representative of expected field behavior.
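For readers unfamiliar with Grad-CAM, the core computation behind such an overlay can be sketched in a few lines. This is a minimal illustration on nested lists, assuming the activations and gradients of one convolutional layer have already been extracted; the function name and input layout are hypothetical, and upsampling the map to the X-ray resolution is omitted.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM map for one convolutional layer.

    activations[k][i][j]: feature map k at spatial position (i, j).
    gradients[k][i][j]:   d(class score) / d(activations[k][i][j]).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pool the gradients of each feature map.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
         for j in range(width)]
        for i in range(height)
    ]
    # Scale to [0, 1] for display (a common convention, assumed here).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam
```

Regions with values near 1 are those that most increased the class score, which is what the red areas of a heat map typically indicate.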

In this filtered set: 38 correct and 2 incorrect.

Patient Category:

Trust Status:

Selected patient type

Showing all patient categories.

Selected trust status

Showing all trust outcomes.

X-ray Image

Where the model looks

Comparison: untrained model

Practitioner View

Predicted diagnosis

NORMAL

Pneumonia probability: 10%

Raw confidence: high

Uncertainty: low

Trust status: Uncertain (manual review)

Recommendation

The model should defer to manual review by a professional for this case.
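The card above shows that a case can have high raw confidence and low uncertainty and still be routed to manual review. One way this can happen is a rule that also requires a stable explanation. The sketch below is purely hypothetical: the demo's actual rule is not stated, and the function name, the "Trusted" label, the stability score, and all thresholds are assumptions for illustration.

```python
def trust_status(
    confidence: float,
    uncertainty: float,
    stability: float,
    min_confidence: float = 0.85,   # hypothetical threshold
    max_uncertainty: float = 0.50,  # hypothetical threshold
    min_stability: float = 0.70,    # hypothetical threshold
) -> str:
    """Return a trust label; every check must pass to avoid manual review."""
    checks_pass = (
        confidence >= min_confidence
        and uncertainty <= max_uncertainty
        and stability >= min_stability
    )
    return "Trusted" if checks_pass else "Uncertain: Manual review"
```

With these assumed thresholds, a NORMAL prediction at 10% pneumonia probability (confidence 0.90) would still be deferred if its explanation stability were low, for example 0.30.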

Optional Details

System-Level Insights

Expand below to see system-wide metrics. These are useful for researchers, less so for individual patient decisions.
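One of the system-wide metrics mentioned earlier is the calibration curve. As a rough sketch of how such a curve can be computed for a binary classifier, the code below bins predicted pneumonia probabilities and compares each bin's mean prediction with the observed pneumonia rate. The function names, the equal-width binning, and the default of 10 bins are assumptions, not details taken from this project.

```python
def calibration_bins(probs, labels, n_bins=10):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            pos_rate = sum(y for _, y in members) / len(members)
            curve.append((mean_p, pos_rate, len(members)))
    return curve

def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted average gap between predicted and observed rates."""
    total = len(probs)
    return sum(
        n / total * abs(mean_p - rate)
        for mean_p, rate, n in calibration_bins(probs, labels, n_bins)
    )
```

A perfectly calibrated model would place every bin on the diagonal, where the mean predicted probability equals the observed rate, giving an error of zero.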

Conclusion

Final Takeaways on Trust

What we learned about when (and when not) to trust this model.

Trust Distribution

Key Lessons

Total audited samples: 624
Incorrect predictions: 63 (10.10%)
Overconfident errors (high confidence + wrong): 3 (0.48%)
Stably wrong cases: 21 (3.37%)
Material calibration shifts: 0 (0.00%)
Decision flips after calibration: 0 (0.00%)
Calibration primarily changes probability interpretation; it does not guarantee better decision quality.
Deterministic uncertainty-error correlation: 0.3625
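The percentages above follow directly from the raw counts over 624 audited samples. The sketch below shows how such audit figures could be computed from per-sample results. The helper names, the 0.9 confidence threshold, and the use of Pearson correlation are assumptions; the source does not state how "overconfident" is defined or how the 0.3625 correlation was computed.

```python
import math

def pct(count: int, total: int) -> str:
    """Format a count as a percentage of the total with two decimals."""
    return f"{100 * count / total:.2f}%"

def count_overconfident_errors(confidences, correct, threshold=0.9):
    """Count wrong predictions made with confidence at or above the threshold."""
    return sum(1 for c, ok in zip(confidences, correct) if c >= threshold and not ok)

def pearson(xs, ys):
    """Pearson correlation, e.g. between uncertainty and a 0/1 error indicator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation is undefined for constant inputs; 0.0 by convention here
    return cov / (spread_x * spread_y)
```

With these helpers, pct(63, 624) gives 10.10%, pct(3, 624) gives 0.48%, and pct(21, 624) gives 3.37%, matching the figures listed. A positive correlation between uncertainty and the error indicator means that errors tend to occur on cases the model is less sure about.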

This model is not a replacement for expert radiologists. It is a decision-support tool, and its predictions should always be reviewed in clinical context.