Peer review is supposed to be the mechanism by which the academic community filters bad science from the published record. It is not a perfect system. Reviewers have always brought varying levels of care and expertise. But a functioning peer review process at least guarantees a human being with subject-matter knowledge read your paper and formed a substantive opinion. That guarantee has become unreliable. In early 2026, researchers and conference organizers discovered what many had long suspected: a substantial fraction of the peer reviews being submitted were written primarily by large language models, with little or no genuine expert analysis behind them.
The problem is well documented now in computer science conferences, which have the incentive and the technical sophistication to measure it. Whether the same rates apply to medical journals is genuinely uncertain. But the underlying conditions are identical. Reviewers at busy medical journals face the same time pressures, the same access to AI writing tools, and the same absence of reliable detection that drove rates so high in machine learning conferences. If anything, the clinical research community's slower adoption of formal AI oversight policies makes the situation harder to monitor, not easier.
What This Post Covers
This is not about journal policies that govern whether authors can use AI. It is about what happens on the reviewer side: how widespread AI-generated reviews have become, how detection now works, what a poor AI review looks like, and what medical authors can do when they suspect the review of their manuscript was not genuinely written by a human expert.
The Numbers From the Conferences
The clearest picture of the problem comes from the 2026 machine learning conference season. At ICLR 2026, Pangram Labs analyzed all 75,800 peer reviews submitted across 19,490 papers. Their finding: roughly 21 percent of reviews were fully AI-generated, and more than half showed at least some signs of AI assistance. These are not trivial numbers. At a conference with roughly 19,000 submissions, a 21 percent rate of fully AI-written reviews translates to tens of thousands of assessment documents in which no genuine expert engaged with the science.
ICML 2026 responded more aggressively in March. The conference organizers embedded machine-readable instructions inside the PDF copies of manuscripts sent to reviewers, a technique sometimes called a honeypot or a prompt-injection watermark. If a reviewer fed the watermarked PDF into a large language model, the hidden instructions told the AI to include two specific phrases, chosen from a dictionary of roughly 170,000 options, in whatever text it generated. Detection success exceeded 80 percent across most models tested, with a family-wise error rate of approximately 0.0001. The result: 506 reviewers were identified, and 497 papers were desk-rejected because their assigned reviewers had violated the conference's AI-use policy.
That rejection figure represents nearly two percent of all ICML 2026 submissions. For a conference of that size, with authors who typically spend months preparing manuscripts, the consequences for those 497 teams were significant. They lost a submission cycle. In some cases, they lost prioritization for a competitive venue that shapes careers. The reviewers caught were not anonymous graduate students. They were part of the pool of authors who had submitted to the same conference, which meant many had legitimate credentials in the field.
Key Numbers from 2026
- 21%of ICLR 2026 peer reviews fully AI-generated, per Pangram Labs analysis of 75,800 reviews
- 50%+of ICLR 2026 reviews showed at least some signs of AI assistance
- 506ICML 2026 reviewers caught violating AI-use policy via embedded watermarks
- 497papers desk-rejected at ICML 2026, nearly 2% of all submissions
- 78%of top 100 medical journals have explicit guidance on AI in peer review (late 2024 JAMA-network study)
Why the Same Conditions Exist in Medical Journals
The machine learning conference community noticed this problem first, partly because that community has both the technical tools to analyze language patterns and a strong incentive to investigate its own publishing infrastructure. But the pressures that drove reviewers toward AI shortcuts are not unique to AI research. They are structural features of academic publishing everywhere.
Medical journal review volumes have grown significantly over the past decade. Journals like JAMA, the New England Journal of Medicine, and BMJ each receive tens of thousands of submissions annually. Specialty journals in cardiology, oncology, and infectious disease operate at smaller scale but draw from narrower pools of qualified reviewers. The standard expectation at most journals is that a reviewer will provide a substantive written assessment, typically 400 to 1,000 words, within two to four weeks, for no compensation. Most reviewers carry this workload on top of clinical and research responsibilities. The temptation to delegate to a language model is obvious, even for someone who takes the obligation seriously.
A study published in a JAMA-network journal in late 2024 examined AI policy across the top 100 medical journals and found that 78 had issued explicit guidance on AI use in peer review. Of those 78, 46 prohibit AI use in review outright, and 32 allow it under specific conditions, typically requiring the reviewer to confirm confidentiality was maintained and that the AI output was their own responsibility. A cross-disciplinary analysis published in Learned Publishing in 2026 found that between March and August 2025 alone, 24.5 percent of high-impact-factor journals had revised their AI peer review policies. Policies are being written. Whether they are being enforced is a different question, and the honest answer is that almost no medical journal currently has the detection infrastructure that ICML deployed in March 2026.
What an AI-Generated Review Actually Looks Like
Experienced authors and editors can often recognize an AI-generated review on a first read, even without detection tools. The most reliable signal is not any specific phrase. It is the absence of the kind of friction that comes from a genuine expert engaging with specific material.
A human expert reviewing a clinical trial will typically anchor their feedback in particulars: the specific design decision that introduces confounding, the subgroup analysis that was not pre-specified, the comparison that was missing from Table 3. An AI-generated review tends to produce a structurally coherent document that reads as professionally written, praises the paper for its contributions to the field, lists generic concerns about statistical power or sample size, and recommends several broad categories of additional citation without naming any specific papers that should be cited. The length is often longer than a human would bother to write for a paper with clear problems. The tone is uniformly courteous. There is rarely any sign that the reviewer found anything genuinely surprising or counter-intuitive, which is unusual for a thoughtful reading of almost any paper.
Hallucinated citations are a separate marker that some editors now check. A reviewer who used a language model without careful supervision may include a reference list at the end of their report. If those references are fabricated, it is detectable. A growing number of journals ask editors handling manuscripts to do a spot check on any references included in reviewer comments, particularly after the Lancet research letter published in May 2026 found fabricated citations in one in 277 PubMed-indexed papers from early in the year. The same underlying behavior that produces fabricated citations in manuscripts can produce them in the review documents themselves.
Signals that a peer review may have been AI-generated
- Generic praise without reference to the specific study design, population, or results
- Concern about statistical power stated abstractly, without engaging with the actual sample size calculation or power analysis provided
- Requests for additional citations without naming specific authors, papers, or journals
- Unusually long reviews with many bullet points but no clearly novel observation
- References cited in the review body that cannot be verified in any database
- Absent commentary on figures or tables, or commentary that does not match what the figures actually show
- Courteous, hedged language that avoids ever taking a direct position on whether the study is scientifically sound
How Detection Is Evolving
The ICML watermark approach is clever but not yet portable to most medical journals. It requires the conference or journal to modify the manuscript PDFs before distributing them to reviewers, which adds a processing step and creates a chain of custody that most journal submission systems are not currently set up to manage. Springer Nature, Elsevier, Wiley, and the other major medical publishers have not announced plans to implement watermark-based detection, though it would be surprising if their editorial systems teams have not discussed it internally.
What is more likely to scale in the near term is statistical detection at the review corpus level. A journal that accumulates thousands of reviews can compare the linguistic patterns of reviews flagged as suspicious against the baseline patterns in its review archive. Unusual uniformity of tone, sentence length distributions that resemble known AI outputs, vocabulary that clusters too tightly around certain register patterns, these are detectable at scale even without embedding watermarks. Journals with enough volume to build such a baseline include the large general medical journals and the flagship specialty journals at major publishers. Smaller journals, which represent the majority of the medical literature, are harder to monitor this way.
Confidentiality constraints complicate everything. Medical manuscript peer review has traditionally been confidential, not just reviewer identity but the review content itself. Moving review content through an AI detection pipeline requires either the journal's internal infrastructure or a trusted third-party vendor with appropriate data agreements. Neither is trivial to arrange, and editors at journals without dedicated technology budgets are unlikely to have this capability in the near term.
There is also an important practical ceiling on what detection can accomplish. The ICML honeypot technique worked because reviewers fed the original watermarked PDF directly to a language model. Any reviewer who copied the text manually, paraphrased, or edited AI output substantially enough would not have triggered the watermark. The tool catches the careless violator, not the careful one. That limitation does not make detection worthless. The careless violator who submits a largely unedited AI output produces the worst reviews: the ones that fail to engage with the science at all. Catching those is valuable for the journals and the authors who receive them.
What Authors Can Do When They Receive a Suspect Review
Most medical authors have limited visibility into the review process and no direct way to verify how a review was written. But there are practical steps available at different points in the cycle.
Before you respond to a review, read it carefully for the signals described above. If a review is strikingly generic, does not engage with the substance of your methods, contains unverifiable references, or praises work that was rejected without explaining why, you are not obligated to treat it as authoritative. You can still respond to the points raised, which is almost always the right default. But you should respond to the specific methodological claims as if a thoughtful reader needed convincing, not as if the review itself identified genuine scientific problems with the paper.
If the review is so generic or poorly targeted that you genuinely cannot understand what the reviewer wanted, it is appropriate to note this diplomatically in your response letter to the editor. Something like: "We found Reviewer 2's concerns difficult to map to specific aspects of our study design, and we would welcome clarification on whether additional methodological information would address the concern." This signals to the handling editor that the review may not have provided useful guidance, without making an accusation you cannot support.
If you believe a review is genuinely problematic, most journals have an appeal or editorial inquiry process. At journals like BMJ, The Lancet, and JAMA, the editorial structure includes senior editors who handle escalated concerns. Contacting the editor-in-chief or editorial office to ask whether a second expert opinion might be warranted is within the range of accepted author behavior, particularly when a rejection appears to rest on a single poorly-reasoned review. Framing matters. You are not accusing the reviewer. You are requesting reassurance that the decision reflected an adequately informed assessment.
A practical note on raising concerns with an editor
If you contact an editorial office about review quality, be specific rather than general. "This review appears to have been generated by AI" is an accusation that will put editors on the defensive and is difficult for them to act on without evidence. "Reviewer 2's comments do not appear to engage with the statistical analysis described in our methods, and the references cited in comment 4 do not appear in PubMed under those titles" is an observation they can verify and act on.
Journals take review quality complaints more seriously when they are specific. Point to the exact comment, describe what a substantive response to it would require, and explain why the review as written does not allow that response.
The Reviewer Side: What This Means If You Are Invited to Review
For researchers who receive peer review invitations, the picture is straightforward but worth stating. Every major medical publisher and every reputable conference prohibits uploading manuscripts or their contents to public AI tools, because that breaks confidentiality regardless of how the output is used. Submitting an AI-generated review without disclosing it violates the basic ethical obligation of the reviewer role. You are representing to the journal that you personally assessed the science. An AI output, however polished, is not that representation.
The ICML 2026 consequences were career-damaging for some of those caught. Conference rejections at that level can delay publication of a paper by six months to a year. If similar detection infrastructure arrives at medical journal level, and the structural incentive to build it clearly exists, the consequences would carry into clinical research careers in ways that matter: grant applications, promotion review, institutional standing.
The right response to an excessive review burden is to decline invitations you cannot handle genuinely, not to accept them and outsource the work. Editors will generally prefer a declined invitation over a fake review. Most submission systems make it easy to decline with a brief explanation, and most editors understand that expert reviewers have limited time.
The Interaction with Transparent Peer Review
One underappreciated consequence of transparent peer review initiatives is that they create a deterrent against AI-generated reviews. Nature's decision in June 2025 to publish peer review correspondence by default means that reviewer reports for accepted Nature papers are now in the public record. A reviewer who submitted a generic AI output that was published alongside a high-profile paper could face public scrutiny from anyone who reads the published review file. The reputational risk is real, even for an anonymous reviewer, because the review's quality is visible to the community.
BMJ and several other journals in the eLife-adjacent orbit have moved toward more open review models over the past few years. PLOS Medicine publishes reviewer reports alongside accepted papers. As more journals make review correspondence part of the public record, the incentive structure shifts. A careless AI-generated review that becomes a permanent attachment to a published paper is a worse outcome for the reviewer than a declined invitation.
This suggests that the push toward transparency in peer review, often motivated by concerns about bias and accountability, may have a secondary benefit: reducing the attractiveness of AI-generated reviewing for anyone whose name might eventually be attached to it. The two reform movements, transparency and AI integrity, are not obviously related, but they push in the same direction.
Where Medical Journals Are Heading
The honest position is that medical journal editorial offices are behind where they need to be on this. Most do not have detection tools, many have only recently written policies, and the review infrastructure at the average specialty journal has not meaningfully changed in the past decade. The ICML watermark experiment is likely to be studied, refined, and eventually offered by submission platform vendors such as ScholarOne and Editorial Manager as an optional feature. How quickly publishers adopt it will depend on cost, confidentiality architecture, and whether enough high-profile integrity failures force the issue.
In the meantime, COPE (Committee on Publication Ethics) has been issuing guidance on AI in the publication process since 2023, and its position statements on peer review integrity are being revised. WAME (World Association of Medical Editors) has published its own guidance, and ICMJE's January 2026 recommendations updated the expectations around reviewer responsibility, though they stopped short of specifying technical detection requirements. These bodies move deliberately. The gap between the policy statements and the technical reality of detection will persist for at least another year or two.
For authors, the practical implication is that you cannot assume peer review of your medical manuscript was conducted at the standard you were taught to expect. This does not mean ignoring reviewer feedback. Reviewers who do their job properly still exist and still provide feedback that improves papers. It means reading reviewer reports with the same critical attention you would give to any other document in the publication process: identifying what the comments actually say, determining whether they are grounded in the specifics of your study, and responding to the substance rather than deferring to authority.
If the ICML story is a signal of what academic publishing is in the middle of, the adjustment period ahead is uncomfortable. But it is also an argument for the things that make peer review valuable in the first place: careful reviewer selection, adequate time, genuine subject-matter expertise, and the kind of editorial investment that can tell the difference between a review and a document. Authors who understand the current landscape are better positioned to navigate it, and to insist, professionally and specifically, when the review they received did not meet that standard.
Further Reading
AI in Peer Review: What Journal Policies Mean for Authors
The specific journal-level rules on what reviewers are and are not permitted to do with AI tools in 2026.
Transparent Peer Review at Nature in 2026
How Nature's default-public peer review changes the incentive structure for reviewers and what it means for authors.
How to Disclose AI Use in Medical Manuscripts
The disclosure standards that apply to authors, and what journals now expect when AI was used in manuscript preparation.
How to Respond to Peer Reviewer Comments
A strategic guide to writing point-by-point responses that work, even when the review was not entirely helpful.
Written by Dr. Meng Zhao
Physician-Scientist · Founder, LabCat AI
MD · Former Neurosurgeon · Medical AI Researcher
Dr. Meng Zhao is a former neurosurgeon turned medical-AI researcher. After years in the operating room, he moved into applied AI for clinical workflows and now leads LabCat AI, a medical-AI company working on decision support and research tooling for clinicians. He built Journal Metrics as a free resource for researchers who need reliable journal metrics without paid database subscriptions.
Related Articles
Fabricated Citations in Medical Research: What the Lancet Audit Means for Authors
A Lancet research letter published May 7, 2026 found fabricated citations in one in 277 PubMed-indexed papers early this year, a 12-fold rise since 2023. Here is what the Columbia University audit reveals, why AI writing tools are implicated, and what every author must now do before submission.
17 min readPublishing EthicsPaper Mills in 2026: What the BuyTheBy Dataset Reveals About Research Fraud
A dataset of 18,710 paper mill advertisements published in April 2026 shows authorship slots selling for $57 to over $5,600. A BMJ study flagged nearly 10 percent of cancer research papers as potentially fraudulent. Here is what working researchers need to know.
15 min readPublishing EthicsGuest-Edited Special Issues: The Peer Review Risks No One Warns Authors About
When a BMJ Group journal retracted seven of eight papers from a single guest-edited special issue in April 2026, it put a systemic problem back in the spotlight. Here is what honest authors need to understand before responding to that invitation email.
16 min read