If you have spent time submitting clinical prediction model papers in the past year or two, you have probably encountered the TRIPOD+AI checklist, either because a journal required it at submission or because a reviewer flagged missing items in a critique. The checklist is now the baseline reporting standard for any study that develops or validates a prediction model, whether the underlying method is logistic regression, a random forest, or a deep neural network trained on electronic health records. It replaced the TRIPOD 2015 checklist, which was not designed with modern machine learning in mind.

Yet adherence is poor. Studies published in late 2025 found that only about 28 percent of machine-learning-based prediction model papers met TRIPOD+AI requirements, compared to roughly 38 percent of conventional regression-based studies assessed against the earlier framework. In one systematic review of orthopaedic surgery AI papers, no study fully satisfied the abstract reporting requirements. This is not a sign that researchers are careless. It is a sign that a 27-item checklist with a separate 13-item abstract version, covering domains from fairness evaluation to open science practices, is genuinely difficult to satisfy without deliberate planning from the study design stage onward.

This guide explains what TRIPOD+AI requires, where papers most often fall short, and how to integrate the checklist into your workflow before the manuscript exists rather than after it is already drafted.

Working Principle

Treat TRIPOD+AI the way you treat your ethics approval: a decision that shapes the study design, not an administrative box to tick after data collection is complete. The checklist items that authors most often omit, including study registration, sample size justification, and fairness analysis, cannot be retrofitted after the analysis is done.

From TRIPOD 2015 to TRIPOD+AI: Why the Update Was Needed

TRIPOD, which stands for Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis, has been the standard reporting framework for prediction model studies since its original publication in 2015. The original 22-item checklist covered the essential elements of study design, outcome definition, predictor selection, and model performance reporting. For the logistic regression and Cox proportional hazards models that made up most clinical prediction research at the time, it was a workable standard.

The problem was that by the early 2020s, machine learning methods had entered clinical research at scale. Gradient-boosted trees, support vector machines, neural networks trained on imaging data, and natural language processing models applied to clinical notes all pose reporting challenges that a regression-focused checklist cannot handle. How should a paper describe feature importance in a method with hundreds of predictors? How do you document preprocessing steps for a model trained on chest radiographs? What should readers know about calibration when the model outputs probabilities from a deep learning classifier? TRIPOD 2015 offered no guidance on any of these questions.

The TRIPOD+AI statement, published in the BMJ on 16 April 2024 by Gary Collins, Karel Moons, Paula Dhiman, Richard Riley, Andrew Beam, Ben Van Calster, and collaborators, addressed this gap directly. It is explicitly designed for both regression and machine learning prediction model studies. It applies to development, validation, and combined development-and-validation designs, and it supersedes the 2015 checklist entirely. If you are writing a prediction model paper today, TRIPOD 2015 is no longer the appropriate reference. The EQUATOR Network, which maintains the most comprehensive directory of clinical reporting standards, now lists TRIPOD+AI as the current guideline and has retired the earlier version from active recommendation.

What the 27-Item Checklist Actually Covers

The TRIPOD+AI checklist runs through six broad sections of a prediction model paper: the title and abstract, background and objectives, methods, results, discussion, and what the statement calls other information, covering open science and transparency disclosures. Each section carries specific content requirements. Having the section in place is not enough to satisfy them.

The title deserves attention because it is one of the most frequently missed items. TRIPOD+AI asks that the type of study (development, external validation, or both) be identifiable from the title. A title like "A machine learning model for predicting 30-day hospital readmission" does not pass on its own, because it does not say whether the paper develops a new model, validates an existing one, or does both. "Development and external validation of a machine learning model for predicting 30-day hospital readmission" does pass. Many titles still omit this. Reviewers reading the title alone cannot tell whether they are looking at a new model or a validation of someone else's prior work, and that distinction shapes how they evaluate the paper.

The methods section carries the heaviest load, and it is where most omissions cluster. TRIPOD+AI requires researchers to specify the study setting and time frame in enough detail to permit replication. It asks for explicit statements about how missing data were handled, because multiple imputation, single imputation, and complete case analysis (listwise deletion) can each produce substantially different estimates of model performance. It requires authors to describe all preprocessing steps for predictors. This matters most for imaging or text-based inputs, where normalization, augmentation, and feature extraction choices are often undisclosed. It also asks how predictors were selected: a priori on clinical grounds, or empirically during model training. Finally, it asks how any variable selection or dimensionality reduction step was performed and evaluated.
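Before writing the missing-data paragraph, a useful first step is to tabulate two things. The first is how much is missing for each candidate predictor. The second is how many patients a complete case analysis would retain. The Python sketch below uses only the standard library. It assumes records are dictionaries with missing values stored as `None`. The function names and the toy records are hypothetical.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Optional[Any]]


def missingness_table(rows: List[Record], predictors: List[str]) -> Dict[str, Dict[str, Any]]:
    """Count and percentage of missing values (None or absent key) per predictor."""
    n = len(rows)
    summary: Dict[str, Dict[str, Any]] = {}
    for name in predictors:
        n_missing = sum(1 for row in rows if row.get(name) is None)
        summary[name] = {
            "n_missing": n_missing,
            "pct_missing": round(100.0 * n_missing / n, 1) if n else 0.0,
        }
    return summary


def complete_cases(rows: List[Record], predictors: List[str]) -> List[Record]:
    """Rows that a complete case (listwise deletion) analysis would retain."""
    return [row for row in rows if all(row.get(name) is not None for name in predictors)]


# Toy records invented for illustration; not study data.
patients = [
    {"age": 71, "creatinine": 1.2, "lactate": None},
    {"age": 58, "creatinine": None, "lactate": 2.1},
    {"age": 64, "creatinine": 0.9, "lactate": None},
    {"age": 80, "creatinine": 1.5, "lactate": 3.4},
]
predictors = ["age", "creatinine", "lactate"]

for name, stats in missingness_table(patients, predictors).items():
    print(name, stats)
print("complete cases:", len(complete_cases(patients, predictors)), "of", len(patients))
```

In this toy example no single predictor is missing for more than half the patients. Even so, a complete case analysis would keep only one of four records. That is why the handling strategy has to be stated rather than left for readers to assume.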

Calibration reporting is another persistent gap. Discrimination metrics such as the AUROC (equivalent to the C-statistic for a binary outcome) describe how well a model ranks patients by risk. Calibration describes whether the predicted probabilities are accurate in absolute terms. Among patients assigned a 20 percent predicted risk, a well-calibrated model should see roughly 20 percent experience the outcome. Many prediction model papers report discrimination only. TRIPOD+AI requires calibration to be assessed and reported alongside discrimination, with calibration plots or calibration measures. Papers that report only AUROC do not satisfy the checklist's performance-reporting items, and reviewers familiar with the statement will notice.
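The Python sketch below makes the distinction concrete using only the standard library. It computes three summaries:

1. The Brier score, an overall accuracy measure that mixes discrimination and calibration.
2. The observed-to-expected ratio, a simple calibration-in-the-large check.
3. The grouped table that sits behind a calibration plot.

The function names are hypothetical. The sketch deliberately omits the calibration slope, which requires a logistic regression of the outcome on the model's linear predictor. In practice, an established statistics package would handle that and smoothed calibration curves.

```python
from typing import List, Tuple


def brier_score(y: List[int], p: List[float]) -> float:
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def observed_expected_ratio(y: List[int], p: List[float]) -> float:
    """Observed events divided by the sum of predicted risks (1.0 = calibrated in the large)."""
    return sum(y) / sum(p)


def calibration_table(y: List[int], p: List[float], n_groups: int = 10) -> List[Tuple[float, float, int]]:
    """Sort patients by predicted risk, split them into roughly equal-sized groups
    (deciles by default), and return (mean predicted, observed rate, n) per group.
    These pairs are the points of a grouped calibration plot."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    table = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # happens when there are fewer patients than groups
        mean_pred = sum(p[i] for i in idx) / len(idx)
        obs_rate = sum(y[i] for i in idx) / len(idx)
        table.append((mean_pred, obs_rate, len(idx)))
    return table


# Toy outcomes and predicted risks, invented for illustration.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]
print(round(brier_score(y, p), 3))
print(round(observed_expected_ratio(y, p), 3))
for mean_pred, obs_rate, n in calibration_table(y, p, n_groups=3):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={n}")
```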

Checklist items most commonly omitted in published studies

1. Study type not specified in the title (development vs. validation)
2. Calibration statistics and calibration plot absent from results
3. No confidence intervals reported alongside performance metrics
4. Study not registered in a public registry before data collection
5. Sample size justification absent or post hoc
6. Missing data handling not specified
7. Code or model not deposited anywhere accessible
8. No subgroup performance analysis for fairness evaluation

The New Domains That Set TRIPOD+AI Apart

Beyond updating the existing TRIPOD items, the 2024 statement introduced domains that reflect where clinical prediction research is heading and what funders, journals, and ethics boards are starting to require.

Fairness is a new domain. TRIPOD+AI asks whether model performance was assessed across subgroups of the study population, including subgroups defined by age, sex, race, or other clinically relevant characteristics. A model that performs well overall may perform substantially worse in a subgroup that was underrepresented in the training data or whose data came from a different care setting. Including a fairness analysis is not about political signaling. It is about whether the model actually works for the patients it will be applied to. A sepsis prediction model trained predominantly on one tertiary hospital's data may perform poorly at community hospitals, and that gap should be documented.
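A subgroup analysis of this kind can be as simple as recomputing discrimination within each pre-specified stratum. The Python sketch below is a minimal illustration that assumes a binary outcome and a single grouping variable. `auroc` and `subgroup_auroc` are hypothetical names, and the data are invented. The AUROC here is computed by direct pairwise comparison, which is fine for small examples but slow for large cohorts.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def auroc(y: List[int], p: List[float]) -> Optional[float]:
    """Probability that a random event case is ranked above a random non-event case
    (ties count 1/2). Returns None when only one outcome class is present."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def subgroup_auroc(y: List[int], p: List[float], groups: List[str]) -> Dict[str, dict]:
    """AUROC, sample size, and event count within each pre-specified subgroup."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    return {
        g: {
            "n": len(idx),
            "events": sum(y[i] for i in idx),
            "auroc": auroc([y[i] for i in idx], [p[i] for i in idx]),
        }
        for g, idx in members.items()
    }


# Toy data, invented for illustration.
y = [1, 0, 1, 0, 1, 0, 1, 0]
p = [0.9, 0.2, 0.8, 0.3, 0.4, 0.6, 0.7, 0.5]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]

print("overall", auroc(y, p))
for group, stats in subgroup_auroc(y, p, sex).items():
    print(group, stats)
```

In this toy data the overall AUROC of 0.875 hides a subgroup in which the model ranks no better than chance (0.5). Real subgroups are much larger, and each subgroup estimate should carry its own confidence interval. Small subgroups give unstable estimates, which is one more reason to pre-specify them and report their size.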

Reproducibility and open science requirements have also been strengthened. TRIPOD+AI asks whether the code used in the analysis is available, whether the model weights have been deposited somewhere, and whether training and test data sets are accessible or whether the paper includes a statement explaining why they are not. This mirrors requirements from major funders, including NIH data-sharing mandates and the UK Research Councils' open science expectations. Journals like PLOS Medicine, BMJ Open, and Nature Medicine now routinely ask about code and model availability at submission. Papers that offer no pathway for independent validation face increasing scrutiny during review, and in some cases outright rejection.

Patient and public involvement is another new item. For prediction models that will inform clinical decisions affecting patients directly, TRIPOD+AI asks whether patients or members of the public were involved in identifying the study question, designing evaluation criteria, or interpreting findings. Most academic research teams have not been doing this routinely, and the item is aspirational for many groups at present. Its inclusion in the checklist signals where funder expectations are moving, and grant applications for clinical AI work are increasingly assessed on whether patient involvement is planned.

A note on class imbalance

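TRIPOD+AI added explicit guidance on class imbalance, which affects many clinical prediction models where the outcome is rare. If you used oversampling, undersampling, or synthetic data generation (such as SMOTE) to address imbalance during training, that step must be described in the methods. You also need to state whether resampling was applied inside each cross-validation training fold or to the full data set before splitting. The two approaches produce different estimates of performance. Resampling before splitting lets copies or synthetic variants of test cases leak into training, which inflates apparent performance. Resampling also changes the outcome prevalence the model sees during training, which distorts calibration, so any recalibration step should be reported as well.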
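The difference between resampling inside folds and resampling before splitting can be sketched in a few lines. This minimal Python example uses random oversampling (duplicating minority-class rows) rather than SMOTE, so that it stays within the standard library. The same principle applies to SMOTE or undersampling: apply the resampling to each training fold only and leave the test fold untouched. All function names are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def oversample_minority(train_idx: List[int], y: List[int], seed: int = 0) -> List[int]:
    """Duplicate randomly chosen minority-class training rows until the classes are balanced."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if y[i] == 1]
    neg = [i for i in train_idx if y[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(train_idx)  # nothing to duplicate in this fold
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(train_idx) + extra


def cv_splits_with_resampling(y: List[int], k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (resampled training indices, untouched test indices) for each fold.
    Resampling happens after the split, so no copy of a test row enters training."""
    folds = kfold_indices(len(y), k, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield oversample_minority(train_idx, y, seed + f), test_idx


# Toy outcome vector with a 10% event rate, invented for illustration.
y = [1] * 5 + [0] * 45
for train_idx, test_idx in cv_splits_with_resampling(y, k=5):
    events = sum(y[i] for i in train_idx)
    print(f"train rows {len(train_idx)}, train events {events}, test rows {len(test_idx)}")
```

Suppose instead the oversampling were run on all 50 rows before calling `kfold_indices`. Duplicated event rows would then land in both training and test folds, and the model would be scored partly on patients it had already seen.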

The Abstracts Checklist Is Not Optional

One of the most consistently overlooked components of TRIPOD+AI is the separate 13-item checklist for the abstract. Adherence studies have found that very few papers satisfy it fully, even when the full manuscript is reasonably complete; in one orthopaedic surgery review, none did. This matters because the abstract is often the only part of a paper that most readers see. It is also the section reviewers read first to decide whether the full manuscript is worth their time.

The abstract checklist requires several elements:

1. The type of study.
2. The source and size of the development and validation data.
3. At least one measure of model discrimination and one measure of calibration.
4. Confidence intervals accompanying any performance metrics.
5. A statement on whether the model, code, or data are publicly available.

These requirements sit alongside whatever word limit the target journal imposes, which creates real tension: a 250-word structured abstract leaves little room for all of this.

The practical answer is to prioritize metrics over prose. An abstract that opens with a single background sentence, spends three sentences on the study population and outcome, reports discrimination and calibration statistics with confidence intervals, and closes with a one-sentence data availability statement will satisfy reviewers better than one with eloquent framing but missing numbers. Background is the part of an abstract that readers understand from context. Performance with confidence intervals is the part they cannot infer. If space is tight, trim the background.

What the Adherence Studies Tell Us About the Gaps

The evidence on TRIPOD+AI adherence published through late 2025 paints a consistent picture across multiple clinical fields. A study examining AI prediction model papers in orthopaedic surgery published in the 18 months after TRIPOD+AI appeared found that the guideline had not measurably improved reporting quality. Appropriate title formatting was present in a minority of cases. Confidence intervals around performance metrics were rarely reported. Study registration was almost never documented. These findings were published in late 2025 and echo earlier analyses of prediction models in oncology, cardiology, and COVID-19 prognosis that found widespread gaps under both TRIPOD 2015 and TRIPOD+AI.

A separate systematic review of machine-learning-based COVID-19 prognostic models found TRIPOD+AI adherence at roughly 28 percent overall, lower than the adherence rate observed for conventional regression models assessed against the original TRIPOD checklist. The pattern across these studies is consistent. Authors report discrimination metrics and then stop. Calibration does not appear. Sample size is not justified. Missing data handling gets one sentence. The model is not deposited anywhere a reader could retrieve and test it.

These findings prompted the publication, in December 2025 in the Journal of Clinical Epidemiology, of an updated TRIPOD+AI adherence assessment tool. The tool incorporates the new checklist domains, including fairness, class imbalance handling, and open science disclosures. It is designed for peer reviewers and for research teams auditing their own manuscripts before submission. It gives a structured way to record whether each item is present, not present, or genuinely not applicable. It also requires a brief justification whenever an item is marked not applicable, rather than leaving the field blank.

The implication for authors is straightforward. Several of the most commonly missing items require decisions before data collection begins. These include study registration, sample size justification (a pre-specified sample size calculation or training set size rationale), fairness analysis planning, and data or code deposit. You cannot register a study after the analysis is done and pass that off as prospective registration. You cannot run a post hoc power analysis and call it a sample size justification. These items have to be in the protocol.
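Sample size for a prediction model is usually justified with criteria designed for prediction models. For binary outcomes, Riley and colleagues (BMJ, 2020) proposed a set of minimum sample size criteria. Two of them have closed forms simple enough to sketch here:

- Criterion (i) targets an expected shrinkage factor of at least 0.9.
- Criterion (iii) estimates the overall outcome risk to within ±0.05.

The Python below omits their remaining criterion, which concerns optimism in apparent model fit. The planning values (prevalence, number of candidate parameters, anticipated Cox-Snell R-squared) are hypothetical assumptions that you would need to justify and pre-specify. The pmsampsize package for R and Stata implements the full approach.

```python
import math


def n_for_shrinkage(n_params: int, r2_cs: float, shrinkage: float = 0.9) -> int:
    """Criterion (i): smallest n with expected uniform shrinkage >= `shrinkage`,
    given the anticipated Cox-Snell R-squared: n = p / ((S - 1) * ln(1 - R2cs / S))."""
    return math.ceil(n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))


def n_for_overall_risk(prevalence: float, margin: float = 0.05) -> int:
    """Criterion (iii): estimate the overall outcome risk to within +/- margin
    (95% confidence): n = (1.96 / margin)^2 * phi * (1 - phi)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1 - prevalence))


def max_r2_cs(prevalence: float) -> float:
    """Largest achievable Cox-Snell R-squared for a binary outcome with this prevalence."""
    ln_lik_null_per_patient = prevalence * math.log(prevalence) + (1 - prevalence) * math.log(1 - prevalence)
    return 1 - math.exp(2 * ln_lik_null_per_patient)


# Hypothetical planning assumptions: 10% outcome prevalence, 12 candidate
# predictor parameters, anticipated R2cs set at 15% of its maximum.
prevalence, n_params = 0.10, 12
r2_cs = 0.15 * max_r2_cs(prevalence)
n_required = max(n_for_shrinkage(n_params, r2_cs), n_for_overall_risk(prevalence))
print(f"anticipated R2cs = {r2_cs:.3f}; minimum n = {n_required}; "
      f"events = {math.ceil(n_required * prevalence)}")
```

In this sketch the overall minimum is simply the larger of the two criteria. The printed event count is that n multiplied by the prevalence and rounded up.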

Writing the Methods Section Under TRIPOD+AI

The most practical advice is to read the checklist before you finalize your study protocol, not when you sit down to write the manuscript. Once you have the checklist in hand, go through the methods section item by item and write the corresponding paragraph. The result will be a longer methods section than many researchers are used to, but it will be the kind of methods section that allows a reader to understand exactly what was done.

The data sources subsection should cover not just where the data came from but when it was collected, who it covered, what exclusion criteria were applied, and whether the development and validation sets came from the same or different sources. If you are using retrospective electronic health record data from a single hospital, that is important context. If you are using a multi-site registry, name the sites or at least their characteristics. Reviewers for prediction model papers routinely look for generalizability, and a vague "data were collected from a tertiary care center" raises questions that a sentence or two of additional context can answer.

The predictors subsection should name every variable that entered the candidate predictor set, describe how each was measured or extracted, and note any transformations applied before model training. For imaging-based models, this includes the preprocessing pipeline. For models using clinical notes, it includes the text processing approach. For structured EHR data, it includes how continuous variables were handled (raw value, binned, log-transformed) and how categorical variables were encoded. These details are not incidental. Two models trained on the same clinical question with different preprocessing choices will perform differently, and a reader who cannot reproduce your preprocessing pipeline cannot validate your model.
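One way to make a preprocessing pipeline reproducible is to estimate every transformation parameter on the development data only. Those parameters can then be saved as a plain file alongside the model. The minimal Python sketch below assumes a structured-EHR setting with two kinds of predictor:

- continuous predictors, log-transformed where specified and then standardized;
- categorical predictors, reference-coded.

The function names, variables, and JSON layout are hypothetical, not a TRIPOD+AI requirement.

```python
import json
import math
from statistics import mean, stdev
from typing import Any, Dict, List


def fit_preprocessing(train_rows: List[Dict[str, Any]], continuous: List[str],
                      log_transformed: List[str], categorical: List[str]) -> Dict[str, Any]:
    """Estimate transformation parameters from the development (training) data only."""
    spec: Dict[str, Any] = {"continuous": {}, "categorical": {}}
    for name in continuous:
        use_log = name in log_transformed
        values = [math.log(row[name]) if use_log else row[name] for row in train_rows]
        spec["continuous"][name] = {"log": use_log, "mean": mean(values), "sd": stdev(values)}
    for name in categorical:
        spec["categorical"][name] = sorted({row[name] for row in train_rows})
    return spec


def apply_preprocessing(row: Dict[str, Any], spec: Dict[str, Any]) -> Dict[str, float]:
    """Apply a saved spec to a new row (e.g. from an external validation site)."""
    out: Dict[str, float] = {}
    for name, s in spec["continuous"].items():
        value = math.log(row[name]) if s["log"] else row[name]
        out[name] = (value - s["mean"]) / s["sd"]
    for name, levels in spec["categorical"].items():
        for level in levels[1:]:  # first (alphabetical) level is the reference category
            out[f"{name}={level}"] = 1.0 if row[name] == level else 0.0
    return out


# Toy development data, invented for illustration.
train = [
    {"age": 60, "crp": 10.0, "sex": "F"},
    {"age": 70, "crp": 100.0, "sex": "M"},
    {"age": 65, "crp": 30.0, "sex": "F"},
]
spec = fit_preprocessing(train, continuous=["age", "crp"], log_transformed=["crp"], categorical=["sex"])
print(json.dumps(spec, indent=2))  # a file like this can be deposited with the code
print(apply_preprocessing({"age": 75, "crp": 50.0, "sex": "M"}, spec))
```

In this sketch, a category that never appeared in the development data is encoded as all zeros. That silently treats it as the reference level, which is exactly the kind of choice worth stating in the methods.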

Template: Model performance reporting under TRIPOD+AI

Model discrimination was assessed using the area under the receiver operating characteristic curve (AUROC), with 95% confidence intervals calculated by bootstrap resampling (1000 iterations). Calibration was assessed using the calibration slope and calibration-in-the-large. It was also inspected visually using a calibration plot comparing mean predicted probabilities against observed event rates across deciles of predicted risk. Overall predictive accuracy was summarized with the Brier score. Subgroup performance was assessed in pre-specified strata by age (<65 vs. ≥65 years), sex, and primary diagnosis category.
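The bootstrap interval described in this template can be sketched with the standard library alone. This minimal Python version works in three steps:

1. Resample patients with replacement 1000 times.
2. Recompute the AUROC for each resample, using the rank (Mann-Whitney) formulation.
3. Take the 2.5th and 97.5th percentiles of the resampled estimates (the simple percentile method).

The function names and seed are hypothetical. Other bootstrap interval methods exist, and whichever one is used should be named in the methods.

```python
import random
from typing import List, Optional, Tuple


def auroc(y: List[int], p: List[float]) -> Optional[float]:
    """AUROC via average ranks (Mann-Whitney U); None if only one outcome class."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied scores share their average 1-based rank
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_pos = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(y: List[int], p: List[float], n_boot: int = 1000,
                       alpha: float = 0.05, seed: int = 2024) -> Tuple[float, float, float]:
    """Point estimate plus a percentile bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        est = auroc([y[i] for i in idx], [p[i] for i in idx])
        if est is not None:  # skip resamples that drew only one outcome class
            estimates.append(est)
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return auroc(y, p), lower, upper


# Toy data, invented for illustration; real intervals need far more patients.
y = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
p = [0.9, 0.2, 0.8, 0.3, 0.4, 0.6, 0.7, 0.5, 0.1, 0.35]
point, lower, upper = bootstrap_auroc_ci(y, p)
print(f"AUROC {point:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```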

Notice what this template does. It names the discrimination metric and how the confidence interval was calculated. It names the calibration metric and describes how calibration was visualized. It specifies that subgroup analyses were pre-specified, not exploratory, and names the strata. Each of these details answers a TRIPOD+AI checklist item. Writing your methods with the checklist open is slower than writing from memory, but it produces manuscripts that clear review faster.

Registering a Prediction Model Study and Depositing Your Code

Two TRIPOD+AI items cause the most difficulty for research teams that are encountering the checklist for the first time: study registration and code or model deposit. Both feel like bureaucratic overhead, and both require decisions that have to happen early.

Prediction model studies, unlike randomized controlled trials, do not have a mandatory registry. ClinicalTrials.gov accepts observational and prediction studies but is not routinely used for them. OSF (the Open Science Framework) is widely used to pre-register prediction model studies and other observational studies. Zenodo is another option. A pre-registration does not constrain your analysis permanently. Deviations from the pre-registered plan are acceptable but should be reported as deviations, so that readers and reviewers can distinguish confirmatory from exploratory findings. What the registration establishes is that the primary research question, outcome, and analysis approach were decided before you saw the data.

For code deposit, the practical options for most academic research teams are GitHub, OSF, and Zenodo. Depositing code does not require making patient data public. The model weights, preprocessing scripts, and evaluation code can be separated from the clinical records and shared independently. A GitHub repository with a README describing the data format required to run the model, even if the original data cannot be shared, satisfies TRIPOD+AI and gives other researchers a starting point for validation on their own data. Reviewers at PLOS Medicine, BMJ, and other journals that have adopted open science policies will look for a code availability statement in the manuscript and for the corresponding repository.

If your institutional review board prohibits data sharing and you cannot share code because doing so would expose identifiable information, TRIPOD+AI accepts a clear statement explaining this. The statement should name the specific restriction (IRB prohibition, data sharing agreement, commercial sensitivity) rather than simply saying "data are available on reasonable request," which most journals now treat as equivalent to non-disclosure. If there is a genuine reason, write it plainly.

Getting the Journal Submission Right

Most journals that have adopted TRIPOD+AI require authors to upload the completed checklist alongside the manuscript. The checklist form asks authors to indicate the page number or section in the manuscript where each item is addressed. An item marked as not applicable should include a brief explanation. An N/A entry without context will be flagged as a gap during peer review, and a reviewer familiar with the checklist will likely ask about it.
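If you keep the checklist as structured data while drafting, the two problems described above can be caught automatically before upload. Those problems are items with no location and N/A entries with no explanation. This is a hypothetical Python sketch. The item labels are illustrative placeholders, not the official TRIPOD+AI item numbering or wording.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChecklistEntry:
    item: str                         # illustrative label, not official wording
    location: Optional[str] = None    # page or section where the item is addressed
    not_applicable: bool = False
    justification: str = ""           # expected whenever not_applicable is True


def audit(entries: List[ChecklistEntry]) -> List[str]:
    """Return the problems a reviewer would likely query."""
    problems = []
    for entry in entries:
        if entry.not_applicable:
            if not entry.justification.strip():
                problems.append(f"{entry.item}: marked N/A without an explanation")
        elif not entry.location:
            problems.append(f"{entry.item}: no page or section reference")
    return problems


entries = [
    ChecklistEntry("Title identifies study type", location="p. 1"),
    ChecklistEntry("Fairness / subgroup performance"),
    ChecklistEntry("Model updating", not_applicable=True),
    ChecklistEntry("Class imbalance", not_applicable=True,
                   justification="Outcome not rare; no resampling or reweighting used"),
]
for problem in audit(entries):
    print(problem)
```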

If you are submitting to a journal whose author instructions do not explicitly mention TRIPOD+AI but do require reporting guideline checklists for clinical research, include TRIPOD+AI regardless. The EQUATOR Network recommends it for all prediction model studies irrespective of whether the journal has updated its guidance. A checklist submitted voluntarily is not a reason for rejection. A missing checklist when the journal expects one sometimes is, and a reviewer who knows the standard and finds the manuscript incomplete will say so.

For journals that have not yet explicitly adopted TRIPOD+AI, it is worth checking whether the journal follows EQUATOR recommendations or publishes guidance specifically for clinical prediction model papers. Journals in the BMJ family, PLOS Medicine, Annals of Internal Medicine, and many specialty journals in cardiology, oncology, and radiology have either adopted TRIPOD+AI directly or signal alignment with EQUATOR guidance. Journals that are newer to prediction model research may still list only TRIPOD 2015 in their instructions. In those cases, using TRIPOD+AI and noting in your cover letter that you have applied the updated 2024 standard is an appropriate approach and is unlikely to generate editorial pushback.

Before you submit: a quick self-check

Run through each of the following before uploading your manuscript:

Study type (development / validation / both) is stated in the title.
Abstract reports at least one discrimination metric and one calibration metric, both with confidence intervals.
Abstract includes a data/code availability sentence.
Methods describe missing data handling explicitly.
All preprocessing steps for predictors are named.
Calibration plot or calibration statistics appear in the results.
Performance is reported by pre-specified subgroups (fairness evaluation).
Study registration number appears in the methods (or earlier, in the abstract).
Code or model deposit is documented or a justified exception is stated.
The completed TRIPOD+AI checklist with page references is attached to the submission.

What Poor Adherence Actually Costs You

A common response to reporting checklists is that they slow down writing without improving science. The evidence from adherence studies suggests otherwise. The reporting gaps that TRIPOD+AI is designed to close (missing calibration, undescribed preprocessing, absent confidence intervals) are not cosmetic problems. They are the gaps that make it impossible for other teams to validate a model, apply it to a different population, or determine whether it would work in their clinical setting.

The clinical AI literature already has a well-documented problem with models that perform impressively in development papers and then fail when used at other institutions. External validation studies routinely find much lower performance than development papers reported. Some of this is genuine generalizability failure. Some of it is preventable reporting omission: if the development paper had described preprocessing clearly, the validation team would have applied the same steps and compared models on a fairer basis.

There is also a practical publication benefit. Journals with rigorous clinical AI policies, PLOS Medicine and BMJ among them, now include methodologists and statisticians among their peer reviewers, and these reviewers use TRIPOD+AI to assess manuscripts. A paper with a complete checklist and methods that answer every item moves through review faster than one where the reviewer has to list missing details in the first round. Treating TRIPOD+AI as a writing framework rather than a post hoc audit is less work overall.

TRIPOD+AI: A Practical Guide to Reporting Clinical Prediction Models
