Towards Trustworthy Predictions: Theory and Applications of Calibration for Modern AI

5 May 2026

Tangier, Morocco

About the workshop

This workshop focuses on calibration, the alignment between predicted probabilities and observed frequencies, which is fundamental to reliable decision-making and trust in modern AI systems. Bringing together researchers from machine learning, statistics, theoretical computer science, and applied domains such as medicine and forecasting, the workshop aims to unify perspectives on calibration theory, evaluation, and practice. Through a tutorial, invited talks, contributed posters, and interactive discussions, we seek to foster a shared understanding of calibration and to build a lasting cross-disciplinary community around trustworthy probabilistic prediction.

Call for papers

The primary aim of this workshop is to bring together researchers and practitioners working on calibration across machine learning, statistics, theoretical computer science, and applied domains. We seek to clarify foundational questions, align evaluation practices, and explore the practical implications of calibration for reliable and trustworthy AI systems.

Topics

Potential topics include, but are not limited to:

  • Foundations of calibration and probabilistic forecasting
  • Calibration metrics and evaluation methodologies
  • Proper scoring rules and decision-theoretic perspectives
  • Calibration in high-dimensional and multiclass settings
  • Post-hoc and end-to-end calibration methods
  • Calibration under distribution shift
  • Calibration for generative models and large language models
  • Calibration in high-stakes applications (e.g., medicine, forecasting, finance)
  • Connections between calibration, uncertainty, and trust in AI

Submissions

🚨 Submit to our workshop and win a free registration for AISTATS 2026 🚨
We will offer a free conference registration to the best workshop submission led by a student. Don't miss the opportunity to showcase your work and attend the conference for free!

We invite submissions of short papers presenting recent work on calibration. Submissions are accepted through OpenReview.

If your paper on calibration (or a closely related topic) has already been accepted at the main AISTATS 2026 conference (congrats 🎉), you can register to present it at our poster session by filling out the following form: main conference paper track.

Important dates

  • Call for contributions: January 12, 2026
  • Submission deadline: February 20, 2026 📣 Extended deadline: February 27, 2026 (Anywhere on Earth) 📣
  • Notification of acceptance: March 9, 2026
  • Workshop date: May 5, 2026

Format

Submissions should be formatted using the AISTATS LaTeX style. Papers are limited to 4 pages (excluding references and appendices). The review process will be double-blind. Accepted contributions will be presented as posters during the workshop. If you include an appendix, keep in mind that reviewers might not read it carefully: your main idea or contribution should be understandable from the main text.

Policies

Submissions under review at other venues are allowed. All accepted papers are non-archival and will be made publicly available on OpenReview.

Speakers

Peter Flach

Tutorial — Foundations of Calibration

Ewout W. Steyerberg

Keynote — Trustworthy Patient-level Predictions

Johanna Ziegel

Keynote — Calibration of Probabilistic Predictions

Florian Buettner

Invited Talk — Calibrated Uncertainty for Biomedical Applications

Nika Haghtalab

Invited Talk — Multi-objective Learning

Schedule

Tutorial Peter Flach Slides

Foundations of Calibration, Metrics, and Open Questions

Coffee Break

Keynote Ewout W. Steyerberg Slides

Towards Trustworthy Patient-level Predictions: A Multiverse of Uncertainty and Heterogeneity

Invited Talk Nika Haghtalab Slides

Multi-objective Learning: An Algorithmic Toolbox for Optimal Predictions on any Downstream Task and Loss

Lunch Break

Invited Talk Florian Buettner Slides

Leveraging Calibrated Uncertainty Estimates for Biomedical Applications

Keynote Johanna Ziegel Slides

Calibration of Probabilistic Predictions

Break

Awarded Papers

Presentations by the Recipients of the Student Paper Award and Non-Student Paper Award

Poster Session

Contributed Posters Showcasing Recent Work on Calibration

Open Problems Session

Moderated Discussions on Open Challenges in Calibration

Accepted papers

Awards

Accepted papers

Main conference track accepted papers

Report

The AISTATS 2026 workshop "Towards Trustworthy Predictions: Theory and Applications of Calibration for Modern AI" was held on May 5th, 2026 at the conference venue in Tangier, Morocco. The workshop featured a tutorial, invited talks, contributed posters, and a great deal of interactive discussion. The slides and papers from the workshop are available on this website. In this report, we aim to briefly go over some of the highlights.

The motivation for organizing the workshop was that calibration is increasingly important, yet it is studied in different communities (computer science including AI and ML, statistics, economics, forecasting, etc.) that have relatively limited interaction with one another. This can make it challenging to keep up with the latest developments, and has led to instances of researchers from different communities reinventing the same concept. Calibration is therefore an area where building a strong international community could have significant benefits. The organizers' impression was that both interest and turnout were good. The picture below shows a well-attended room of perhaps 80 people (all three workshops had a capacity of 150 seats).

Workshop attendees

Tutorial

With this in mind, the first event of the workshop was a 90-minute tutorial on calibration by Peter Flach (University of Bristol), who has previously co-authored a well-known survey on calibration. Peter's talk started from the basic definition of calibration: a prediction for the probability of an event (such as a patient having heart disease) is calibrated if, among the people who receive any specific prediction (say 20%), the fraction who actually have heart disease equals the predicted probability (i.e., 20%). Peter presented a variety of motivations for calibration, including ones based on utility theory and on efficiency in adapting classifiers to updated class priors. The tutorial concluded with a variety of more advanced topics, including various forms of re-calibration. For more details, please see the slides.
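The basic definition can be checked directly on data. The following is a minimal Python sketch, not taken from the tutorial: the function name and the toy patient data are hypothetical, and grouping by exact predicted value only makes sense when predictions take a small number of distinct values.

```python
from collections import defaultdict


def observed_frequency_by_prediction(predictions, outcomes):
    """Group binary outcomes by predicted probability and return the
    observed event frequency within each group."""
    groups = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        groups[p].append(y)
    return {p: sum(ys) / len(ys) for p, ys in groups.items()}


# Hypothetical toy data: ten patients predicted at 20%, ten at 50%.
predictions = [0.2] * 10 + [0.5] * 10
outcomes = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5

freqs = observed_frequency_by_prediction(predictions, outcomes)
for p, freq in sorted(freqs.items()):
    # A calibrated predictor has freq equal to p in every group.
    print(f"predicted {p:.2f}  observed {freq:.2f}")
```

In this toy data, 2 of the 10 patients predicted at 20% and 5 of the 10 predicted at 50% have the disease, so both groups match their predictions. With continuous-valued predictions, exact grouping is usually replaced by binning or smoothing.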

Florian Buettner's talk — distribution shift when recalibrating

Invited talks

The workshop then continued with several exciting talks covering various state-of-the-art research directions in calibration.

The first talk after the tutorial was by Ewout W. Steyerberg (University Medical Center Utrecht), on trustworthy patient-level predictions. Ewout focused on a central question in clinical prediction: when can we trust a number such as a patient's predicted risk? He emphasized that calibration is a crucial part of assessing trustworthiness, but also that calibration is usually evaluated at the population level, while interpreting a prediction for one individual patient is much more delicate. A recurring theme was that uncertainty in medical prediction has many sources, including finite-sample uncertainty, modeling choices, and lack of applicability to new settings. Therefore, great caution is needed when interpreting individual risk estimates, even though they can still be valuable for risk communication and shared decision-making.

Nika Haghtalab (University of California at Berkeley) then gave a talk on multi-objective learning as an algorithmic toolbox for obtaining predictions that are useful across many downstream tasks and losses. The talk connected calibration to a broader framework in which one wants a single predictor to perform well simultaneously for many groups, distributions, and loss functions. Nika explained how ideas from multi-calibration, federated learning, and distributionally robust optimization can be viewed through a common lens. She also described algorithmic ideas such as on-demand sampling and min-max optimization, and highlighted recent results showing that, in some regimes, optimizing for many tasks and losses can be nearly as statistically efficient as optimizing for just one.

After lunch, Florian Buettner (Goethe-University Frankfurt/German Cancer Research Center (DKFZ)) talked about calibrated uncertainty estimates for biomedical AI. He emphasized that in medical settings, confident but wrong predictions can cause real harm, especially under distribution shift, where deep networks may make high-confidence errors. The talk highlighted applications in dermatology, pathology, retinal imaging, and blood cancer classification, and presented an "audit, improve, monitor" framework in which ModelAuditor identifies clinically relevant failure modes and suggests targeted fixes. Florian also connected these ideas to generative models and LLMs, discussing kernel- and spectral-entropy methods for estimating uncertainty from generated outputs.

The final talk was by Johanna Ziegel (ETH Zurich), on calibration of probabilistic predictions. Johanna started from the setting where a prediction is not a single best guess, but a full predictive distribution intended to quantify uncertainty about a future outcome. She surveyed several notions of calibration for probabilistic forecasts, including probabilistic calibration, auto-calibration, isotonic calibration, threshold calibration, quantile calibration, and marginal calibration, clarifying that these notions can differ substantially outside the binary setting. The talk then developed connections with conformal prediction and isotonic distributional regression, and discussed how conformal guarantees can be used to obtain strong out-of-sample calibration guarantees. The talk concluded with applications such as temperature forecasting and length-of-stay prediction in intensive care, illustrating how these ideas can be used in realistic forecasting problems.
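Of the notions listed above, probabilistic calibration is commonly defined through the probability integral transform (PIT): evaluating each predictive CDF at the realized outcome should give values that are uniformly distributed on [0, 1]. The sketch below is not from the talk. It assumes Gaussian predictive distributions, and the function names and the three toy forecasts are hypothetical.

```python
from statistics import NormalDist


def pit_values(forecasts, observations):
    """Evaluate each Gaussian predictive CDF at its realized outcome.

    forecasts is a list of (mean, standard deviation) pairs.
    """
    return [NormalDist(mu, sigma).cdf(y)
            for (mu, sigma), y in zip(forecasts, observations)]


def pit_histogram(pits, n_bins=5):
    """Count PIT values in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for u in pits:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return counts


# Hypothetical forecasts whose outcomes all land exactly on the predicted mean.
forecasts = [(0.0, 1.0), (2.0, 0.5), (-1.0, 2.0)]
observations = [0.0, 2.0, -1.0]

pits = pit_values(forecasts, observations)
histogram = pit_histogram(pits)
```

Three points are far too few to judge uniformity, so this only shows the mechanics. On a realistic sample, a roughly flat histogram is consistent with probabilistic calibration, while a peak in the middle (as in this toy case) suggests predictive distributions that are too wide.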

Paper awards

The technical program was rounded out by presentations from the recipients of the paper awards:

Poster session

Another component of the workshop was the poster session. The workshop featured approximately 45 posters, with topics ranging from theory and methods to algorithms and applications of calibration. The conference venue had a unique outdoor poster session with a great deal of natural sunlight, which stimulated lively discussion among the presenters and attendees.

Outdoor poster session

Roundtable discussion

The workshop ended with a 45-minute roundtable discussion where speakers and attendees shared their key concerns and perspectives. Topics included:

  • The importance of choosing sound evaluation metrics for calibration, and how different scenarios require different evaluation approaches beyond the popular Expected Calibration Error (ECE) — Johanna Ziegel, Sebastian Gruber
  • We are better at correcting miscalibration than estimating or evaluating it, leaving room for improvement at a fundamental level — Eugene Berta
  • Calibration from the perspective of algorithmic decision making captures specific forms of failure due to miscalibration, but these are still missing outside of the binary case — Nika Haghtalab
  • The ECE remains popular partly because there has not been a focused demonstration of its failures — Edgar Dobriban
  • Miscalibration happening despite strong predictive performance of modern ML methods surprises many; the notion of calibeating offers one answer, showing one can gain calibration without losing expertise — Souhaib Ben Taieb, Nika Haghtalab
  • New and emerging research areas in AI such as automated research raise questions about how we should use probabilistic predictions going forward — Nika Haghtalab
  • The community should think about fundamental limits in the area of calibration — Souhaib Ben Taieb
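For readers unfamiliar with the metric discussed in several of the points above, here is a minimal sketch of the common equal-width binned estimator of the Expected Calibration Error. It is an illustration under stated assumptions, not a recommendation from the roundtable: the function name, the default of 10 bins, and the toy inputs are all hypothetical choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average over bins of |accuracy - mean confidence|.

    confidences are predicted probabilities in [0, 1] for the predicted class;
    correct holds 1 where the prediction was right and 0 otherwise.
    """
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))

    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy inputs.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
overconfident = expected_calibration_error([0.8] * 4, [1, 1, 0, 0])
```

The estimate depends on the number and placement of bins, which is one reason the choice of evaluation metric came up repeatedly in the discussion.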

Looking forward

Several attendees mentioned interest in participating in or leading a community on topics related to the workshop. The organizers believe the following activities could be valuable:

  • Further workshops and meetings focusing on calibration, associated with major ML conferences or independent venues such as BIRS, Oberwolfach, etc.
  • Tutorials on calibration delivered at conferences and/or recorded online for improved accessibility.
  • Community resources including a web repository of papers, videos, computational tools/packages, and links to events.
View from the conference venue

The organizers thank the speakers, the attendees, and the AISTATS conference organizers (workshop chairs Quentin Berthet and Claire Vernade) for their great help and support.

Organizers

Sebastian Gruber

KU Leuven

Teodora Popordanoska

KU Leuven

Yifan Wu

Microsoft Research

Eugène Berta

INRIA

Francis Bach

INRIA

Edgar Dobriban

University of Pennsylvania

Tips for attendees

Conference venue

Hilton Tanger Al Houara
Km 17.5, Route de l'Aéroport, Al Houara, Tangier, 90000, Morocco

Poster printing

Participants are responsible for printing their own posters.

Poster board specifications

  • Dimensions: 2.4 m x 1.2 m
  • Material: 100% wood
  • Fixing method: pins or command strips

Local printing options