Writing Guide

Data Availability Statements in 2026: What Medical Journals Actually Require

Adding “data available on reasonable request” to your manuscript is no longer sufficient at many major publishers. Here is what journals now expect, which repositories work for which data type, and how to write statements that protect sensitive patient data without leaving editors dissatisfied.

Dr. Meng Zhao|Physician-Scientist · Founder, LabCat AI
Published: May 2026 · 16 min read

The shift from optional to mandatory data availability statements has crept up on many researchers. Two or three years ago, appending a line about data being available on reasonable request was treated as adequate disclosure at most biomedical journals. In 2026, that phrasing is either explicitly prohibited or actively challenged at a growing number of major publishers, and the journals that still accept it are asking authors to justify it rather than simply accepting the phrase as a formality. Meanwhile, two developments this month have sharpened the issue: NIH is implementing its revised Data Management and Sharing Plan format on May 25, 2026, and the CONSORT 2025 update added new open science checklist items covering data availability that many journals are now enforcing at submission. Researchers preparing manuscripts right now are encountering requirements that were not there a year ago, and the practical guidance on how to write statements that actually satisfy editors is harder to find than it should be.

The root cause is not bureaucratic inconvenience. It is the accumulated evidence that data sharing as practiced in medical publishing had a poor track record. Studies from the mid-2010s through the early 2020s repeatedly found that when authors claimed data was available on request, a substantial fraction either did not respond, no longer had the dataset in usable form, or declined to share despite their stated policy. That pattern, combined with the broader reproducibility debate in clinical and basic science research, convinced major publishers and funders that the old approach was not working. What followed was a gradual but now quite visible tightening, and authors who have not revisited their data availability practices since 2022 or 2023 may be surprised by what their current target journal expects.

Working Principle

A data availability statement is not a formality. Treat it the same way you would treat a conflict of interest disclosure: specific, accurate, and written in language that will still make sense to a reader five years from now who is trying to verify your results.

The FAIR Framework and Why Publishers Adopted It

The practical language underpinning most current data availability policies comes from the FAIR principles, which were formalized in 2016 in a widely cited paper in Scientific Data. FAIR stands for Findable, Accessible, Interoperable, and Reusable. The core idea is that research data should be deposited in a way that assigns it a persistent unique identifier, makes access conditions explicit and machine-readable, uses standard formats that allow data to be combined with other datasets, and includes enough metadata that someone without prior knowledge of the project can understand and use what they find. General principles do not by themselves change submission practices, but the FAIR framework gave publishers a concrete checklist for what good data availability looks like, which made enforcement possible in a way that vague encouragement never had been.

The Enabling FAIR Data initiative, which Nature backs, and parallel pressure from major funders including NIH, Wellcome, and the European Research Council, gave publishers both cover and incentive to tighten their policies. By the time PLOS journals moved to requiring publicly available data at the point of publication rather than just on request, the trend was already set. The question for medical authors today is not whether their target journal has a data availability policy, but exactly what that policy requires and how stringently the editorial office will check compliance.

One thing worth noting: compliance checking is inconsistent. Some journals treat the data availability statement as a form field where almost any text will pass submission. Others, particularly at Nature Portfolio journals and PLOS titles, have moved toward checking whether the stated repository actually contains accessible data before the paper goes to peer review. The risk of a mismatch between what your statement says and what is actually in the repository has become a real reason for desk rejection at journals that were not checking closely two years ago.

The Four Types of Data Availability Statement

Before you can write the right statement, you need to know which of four basic configurations applies to your study. Most publishers organize their policies around these options, and picking the wrong one is a surprisingly common source of revision requests.

The first type is fully open data: your dataset is deposited in a public repository with a citable identifier and no access restrictions. This is the policy preference at PLOS Medicine, PLOS One, and the open-access Nature Portfolio journals. If you can provide this, your statement is short and verifiable, and most editors will not probe further.

The second type is repository-based restricted access: the data exists in a recognized repository, but access requires a data use agreement, an institutional review process, or approval from a data access committee. This is common for clinical trial participant data, genomic data with re-identification risk, and electronic health record-derived datasets. Repositories such as the UK Biobank, the NCBI database of Genotypes and Phenotypes (dbGaP), Vivli, and the European Genome-phenome Archive (EGA) support this model explicitly. Your statement names the repository and explains the access procedure. This option is widely accepted as a legitimate reason for not providing fully open access, provided the statement is specific.

The third type involves data deposited in a supplementary or publisher-hosted repository. Elsevier's Mendeley Data platform allows authors to deposit datasets that become linked directly to the published article. This satisfies most Elsevier journal requirements and is less logistically demanding than registering with an external domain-specific repository for a one-off study.

The fourth type, and the one under increasing scrutiny, is the “available on reasonable request” statement with no supporting deposit. Some journals still accept this with adequate justification. Most major medical publishers now require authors to explain why repository deposit is not possible rather than accepting the phrase as self-evident. A few Nature Portfolio titles have moved to treating this as an option that requires editorial approval rather than a standard choice.

Which Repository for Which Data Type

Repository selection matters because not all repositories are treated equally by journal editors or by researchers who later try to access your data. The general principle is to use a domain-specific repository where one exists for your data type, and to fall back to a general-purpose repository when it does not. General-purpose repositories are fine for aggregated or derived data; domain-specific ones carry the expectation of structured metadata that makes downstream reuse more practical.

Domain-specific repositories by data type

  • Gene expression data: Gene Expression Omnibus (GEO) at NCBI. Requires structured metadata. Journals such as Nature Methods treat GEO accession numbers as mandatory for RNA-seq and microarray studies.
  • Raw sequencing data: Sequence Read Archive (SRA) at NCBI, or the European Nucleotide Archive (ENA). Both are cross-indexed and widely accepted.
  • Protein structures: Protein Data Bank (PDB). Effectively mandatory for crystallography and cryo-EM studies at most journals in the field.
  • Clinical trial participant-level data: Vivli, the YODA Project at Yale, or dbGaP for genomically linked data. All use standardized data use agreement frameworks.
  • Neuroimaging data: OpenNeuro (accepts BIDS format) or Zenodo with BIDS-compliant packaging.
  • General datasets (spreadsheets, statistical outputs, aggregated results): Zenodo (hosted by CERN), Figshare, or Dryad. All three are FAIR-compliant, assign DOIs, and are accepted by Elsevier, Springer Nature, BMJ, and most other major publishers.

One consistent mistake is using institutional repositories that do not issue persistent identifiers or that have irregular uptime. An institutional repository URL that breaks in two years is worse than no repository at all from a reproducibility standpoint, and some journals now flag non-standard or non-DOI URLs during the submission check. If your institution does not run a FAIR-compliant repository with DOI minting, Zenodo is the practical default. It is free, well-maintained, backed by CERN infrastructure, and universally accepted.

For study code and computational analysis scripts, the combination of a GitHub repository with a mirrored Zenodo release is now an established standard. GitHub alone is not sufficient because repositories can be deleted or privatized after publication. Minting a Zenodo DOI from a specific GitHub release creates a stable, citable snapshot that Nature Portfolio and PLOS now expect for computational and machine learning studies. This step takes about ten minutes once you have connected your GitHub account to Zenodo, and it protects against the common situation where a corresponding author changes institutions and loses access to their old lab's repository.

What Major Publishers Currently Require

Publisher policies differ in specific ways that matter at the submission stage. Reading the exact current language from your target journal's author instructions is still the only reliable approach, but the following describes where the major players stood heading into mid-2026. These policies have been updated at least once in the past 18 months, so treat any specific detail you remember from a previous submission as potentially outdated.

Nature Portfolio journals require a data availability statement in all original research articles and define the minimum dataset as the data necessary to interpret, verify, and extend the findings. They back the Enabling FAIR Data initiative and prefer repository deposits with accession numbers or DOIs over supplementary data files, though supplementary files are still accepted in cases where a domain-specific repository does not exist. Nature journals also require a code availability statement separately from the data availability statement, and both are published as part of the article record. The submission system uses a structured template dropdown for these statements, which means selecting the wrong template creates an immediate flag for the editorial office.

PLOS Medicine and PLOS One require that all data necessary to replicate the study's findings be publicly available at the time of publication. PLOS interprets this strictly: patient-level clinical data shared under a data use agreement from a recognized controlled-access repository satisfies the policy, but a private institutional repository with no external access process does not. PLOS also uploads supporting information files to Figshare automatically after acceptance, which means any statement referencing supplementary files will end up pointing to Figshare even if you think of it as a journal-hosted resource. This matters if your institution has questions about third-party hosting of research data.

BMJ requires a data sharing statement in all original research and review articles. For clinical trials, BMJ aligns with the ICMJE position that individual participant data should be available to others, and increasingly encourages controlled-access repository deposit where deidentification is feasible. The BMJ statement template is relatively short and structured, but BMJ editors in some specialty areas review them carefully.

Elsevier applies a data availability policy across its journals but gives individual titles discretion to be more specific. The core expectation is a statement that links to data and includes a persistent identifier. Elsevier's Mendeley Data platform provides an integrated option for authors who prefer not to use an external repository. Many Elsevier journals still accept “available on request” statements but now ask for a named contact email, which is a small but meaningful shift toward accountability.

The Problem with “Available on Reasonable Request”

It is worth being direct about why this phrasing has become a problem, because many authors use it in good faith and are then surprised when editors push back. The issue is not primarily that researchers are being dishonest when they write it. The issue is the accumulated evidence from multiple systematic reviews and audit studies showing that the phrase does not reliably produce data access. When researchers followed up on papers containing this statement, response rates varied widely, datasets were frequently lost, reformatted, or sitting on hardware no longer accessible, and some authors who did reply declined to share for reasons their own stated policy did not contemplate.

Some journals now require authors who use this phrasing to provide a specific justification, a named institutional email address that will remain active, and a confirmation that the data will be provided within a defined timeframe, often 30 to 60 days of receipt of a reasonable request. Others have moved to treating this as an option that requires editorial board approval before being accepted as a data availability statement. A few flagship journals have begun piloting programs where they verify at least one data request per published paper to spot-check compliance after publication.

If you genuinely cannot make your data publicly available and cannot use a controlled-access repository, the most defensible approach is to explain why in concrete terms. Regulatory restrictions in your original ethics approval, data ownership agreements with clinical partners, or contractual obligations to a funding body are all reasons that editors understand and that can be verified. Vague references to patient privacy are less convincing when repositories such as dbGaP and Vivli exist specifically to handle sensitive clinical data under a data use agreement framework.

Sensitive and Identifiable Data: What You Can Actually Say

The tension between data transparency and the protection of research participants is real, and most journal policies acknowledge it explicitly. The goal is not to force deidentification of datasets where re-identification risk is genuinely high, or to override ethics agreements that restrict secondary data use. The goal is to make clear what the data is, where it sits, who may access it and under what conditions, and what the process looks like for a legitimate researcher to make a request. A statement that does those four things is almost always sufficient.

For clinical trial data, Vivli and the YODA Project at Yale both provide infrastructure for sharing individual participant data under standardized data use agreements that include independent scientific review of access requests. The ICMJE has endorsed this general model. If your trial is registered and complete, depositing participant-level data in one of these platforms is defensible to most institutional review boards and meets the requirements of journals that have adopted the ICMJE's 2026 recommendations on data access.

For electronic health record data and administrative claims data, controlled access through dbGaP (for genomically linked data) or through your institution's formal data access process (for EHR data without a genomic component) is the standard path. Your data availability statement names the repository or the institutional mechanism, provides a contact point, and clarifies the approximate timeframe for processing a request. One detail many authors miss is specifying whether the data can be physically transferred or must be accessed through a secure analysis environment, because these have different implications for international collaborators who may face their own regulatory constraints.

Template: restricted-access clinical data

The dataset supporting the findings of this study consists of deidentified individual participant data from [trial name or cohort description]. Data are available to researchers whose proposed use has been approved by an independent scientific review committee. Requests should be submitted to [institutional contact email or repository URL]. The dataset is not publicly available due to the terms of participant consent and applicable data protection regulations. A summary of the variables available and the access request procedure can be found at [DOI or stable URL].

Notice what this template does. It specifies what the restriction is (consent terms and regulations, not vague privacy concerns), who reviews access requests (an independent committee), where to submit requests, and directs readers to a stable location for procedural details. That level of specificity is what distinguishes a statement that satisfies editors from one that generates a request for revision.

Writing the Statement: Templates by Scenario

The specific wording of your statement matters less than its completeness. An editor reviewing a data availability statement is checking for four things: whether the data location is clearly identified, whether it has a persistent identifier (DOI, accession number, or registry record), whether access conditions are explicit, and whether the statement matches the actual state of the repository at the time of submission. Getting all four right is straightforward once you have made the underlying decisions about where the data lives and who can access it.
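Of those four checks, the persistent-identifier one is mechanical enough to run against a draft before you submit. A minimal sketch in Python, with the caveat that the patterns below are illustrative assumptions covering only two common identifier styles (DOIs and NCBI GEO accessions), not a complete registry of repository formats:

```python
import re

# Illustrative patterns only: a DOI and an NCBI GEO series accession.
# Real repositories use many other formats (SRA runs, PDB codes, EGA IDs, ...).
IDENTIFIER_PATTERNS = [
    re.compile(r"\b10\.\d{4,9}/\S+"),  # DOI, e.g. 10.5281/zenodo.1234567
    re.compile(r"\bGSE\d+\b"),         # GEO accession, e.g. GSE12345
]

def has_persistent_identifier(statement: str) -> bool:
    """Return True if a draft statement contains a recognizable identifier."""
    return any(p.search(statement) for p in IDENTIFIER_PATTERNS)

print(has_persistent_identifier(
    "Data are deposited at Zenodo (DOI: 10.5281/zenodo.1234567)."))  # True
print(has_persistent_identifier(
    "Data are available from the corresponding author on request."))  # False
```

A check like this will not catch a DOI that resolves to an empty or embargoed repository entry, which is why verifying the deposit itself remains a manual step.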

Template: fully open dataset with DOI

All data generated or analyzed during this study are included in this published article and its supplementary files, and are deposited in the [repository name] repository at [DOI or accession number]. No restrictions apply to the availability of these data.

Template: code and analysis scripts

All analysis code used in this study is available at [GitHub repository URL] and is archived at Zenodo (DOI: [DOI]). The archived version corresponds to the exact release used to produce the results reported here.

Template: mixed open and restricted data

Summary statistics and derived variables used in the primary analysis are deposited at [repository name / DOI]. Individual participant-level data cannot be made publicly available due to participant consent restrictions but may be requested from the corresponding author for use under a data use agreement. Requests will be reviewed by the study data access committee within 30 days of receipt.

One consistent error authors make is writing the data availability statement before they have deposited the data. The deposit should come first, because you need the accession number or DOI before you can write a complete, honest statement. Running the deposit in parallel with peer review is the standard workflow, but make sure the repository is set to “publicly available upon publication” or the equivalent. A reviewer or editor who checks the DOI and finds an empty or embargoed entry will raise questions during review.

It is also worth being specific about versions. If your analysis used a publicly available dataset that has since been updated, name the exact version or release date you used. Reproducibility requires that a future researcher be able to obtain the same inputs, not just the same platform.
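One low-effort way to pin the exact inputs is to record a checksum of each source file alongside its release label at analysis time. A minimal sketch, assuming a local copy of the source file; the manifest fields are illustrative, not a standard format:

```python
import hashlib
import pathlib

def record_input(path: str, release: str) -> dict:
    """Record a dataset file's release label and SHA-256 checksum,
    so a future reader can confirm they obtained the same inputs."""
    digest = hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()
    return {"file": path, "release": release, "sha256": digest}

# Hypothetical source file from a versioned public dataset.
pathlib.Path("cohort_extract.csv").write_bytes(b"id,age\n1,62\n")
manifest = record_input("cohort_extract.csv", release="2025-11 data freeze")
print(manifest["release"], manifest["sha256"][:12])
```

Depositing a small manifest like this next to your analysis code lets a future researcher confirm they are working from the same release you used, even after the upstream dataset has been updated.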

How This Connects to NIH DMSP and Funder Requirements

The NIH Data Management and Sharing Plan that takes effect in its new format on May 25, 2026 is specifically a funder-side document describing how you intend to manage and share data during and after the grant. The journal-side data availability statement describes what you actually did. These are not the same document, but they should be consistent: if your DMSP committed to depositing data in a specific repository by a specific date, your published data availability statement should reflect that commitment.

Journals are not required to verify whether your data availability statement matches your NIH DMSP, and most do not do so at the time of submission. But discrepancies can surface later, particularly if a data request leads to a dispute and someone begins tracing back through the paper's supporting documentation. NIH's program officers are also beginning to treat the data availability statement in published papers as a signal about whether researchers followed through on their DMSP commitments, though this is not yet systematic enforcement.

If you are funded by multiple sources with different data sharing requirements, write your data availability statement to the most stringent requirement. Wellcome Trust, the European Research Council, and UKRI all have specific expectations that may go further than NIH in some scenarios, and the journal you are submitting to may impose requirements on top of what your funders require. When these layers conflict in ways that are not straightforward, contact the journal's editorial office and your institution's research office before submission rather than after. A short pre-submission inquiry is almost always welcomed; a post-acceptance correction request is not.

The CONSORT 2025 update is worth mentioning here as well. The new open science checklist items in CONSORT 2025 include specific reporting items for data availability in clinical trial publications. If you are reporting a trial and your target journal requires CONSORT compliance, which most major medical journals now do, your data availability statement needs to satisfy both the journal's standalone policy and the CONSORT open science items simultaneously. In practice these overlap substantially, but checking both before submission avoids the situation of satisfying one and being asked to revise for the other.

A Pre-Submission Check for Your Data Availability Statement

Before you finalize your submission package, run through this check specifically for the data availability statement. Data availability problems are often caught late in the process and can delay acceptance while the editorial office waits for corrections that could have been made before submission.

Questions to check before you submit

  • Have you deposited the data before writing the statement, not just planned to deposit it later?
  • Does your statement include a persistent identifier (DOI, accession number, or registry record)?
  • Have you confirmed that the repository entry is accessible, or that the access procedure is correctly described?
  • Does the statement match your journal's current policy, not the policy from a previous submission?
  • If you included code or analysis scripts, is there a separate code availability statement?
  • If you used a controlled-access repository, does the statement name the review body and the timeframe for access decisions?
  • Does the statement align with your NIH DMSP or equivalent funder data management plan?
  • If your study uses a publicly available source dataset, have you specified the exact version or release date you used?
  • Is the repository you cited stable for long-term access, not a personal cloud folder or a temporary institutional link?

The last point deserves particular emphasis. Data availability is not a problem that ends at publication. It is a commitment that runs for as long as the paper is part of the scientific record, and that could be decades. The journals that have started checking compliance with data availability statements after publication are encountering a predictable problem: corresponding authors have moved institutions, email addresses have changed, and datasets that were accessible in 2023 are now sitting on a hard drive in a storage unit or are simply gone. Depositing in a stable, indexed, long-term repository at the time of submission is the only practical way to meet that long-term obligation rather than just satisfying the submission checklist.
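Several of the checklist items above are mechanical enough to script into a pre-submission pass. A minimal sketch, assuming a simple dictionary representation of the draft statement; the field names are hypothetical:

```python
from urllib.parse import quote

def check_statement(statement: dict) -> list[str]:
    """Return a list of problems found in a draft availability statement."""
    problems = []
    if not statement.get("identifier"):
        problems.append("no persistent identifier (DOI or accession number)")
    if not statement.get("access_conditions"):
        problems.append("access conditions not stated")
    if statement.get("restricted") and not statement.get("review_body"):
        problems.append("restricted access but no review body named")
    return problems

def resolver_url(doi: str) -> str:
    """Build the doi.org URL to confirm the deposit resolves
    (check it manually or with an HTTP HEAD request before submitting)."""
    return "https://doi.org/" + quote(doi)

draft = {
    "identifier": "10.5281/zenodo.1234567",
    "access_conditions": "no restrictions",
    "restricted": False,
}
print(check_statement(draft))             # []
print(resolver_url(draft["identifier"]))  # https://doi.org/10.5281/zenodo.1234567
```

A script like this catches omissions, not inaccuracies: it cannot tell you whether the repository entry actually contains the data your statement describes, so following the resolver URL yourself is still part of the check.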

For researchers who find this landscape genuinely confusing, the ICMJE and EQUATOR Network websites both maintain current guidance on data sharing expectations. Your institution's research librarian or data management office is also often an underused resource for repository selection and statement drafting. Most institutions with active research programs have dedicated staff for exactly this, and a thirty-minute conversation before submission is far more efficient than revising after the editorial office sends comments.

Written by Dr. Meng Zhao

Physician-Scientist · Founder, LabCat AI

MD · Former Neurosurgeon · Medical AI Researcher

Dr. Meng Zhao is a former neurosurgeon turned medical-AI researcher. After years in the operating room, he moved into applied AI for clinical workflows and now leads LabCat AI, a medical-AI company working on decision support and research tooling for clinicians. He built Journal Metrics as a free resource for researchers who need reliable journal metrics without paid database subscriptions.