- Scope: 22 public‑service media organizations in 18 countries tested 3,000+ AI answers to news questions in 14 languages. Reuters
- Headline finding: 45% of AI responses contained a significant issue, and 81% had at least one problem. Reuters
- Biggest weakness: Sourcing—about a third of answers had serious citation or attribution faults. Reuters
- Who struggled most: Google Gemini showed the highest rate of problems, especially with sourcing; several reports put “significant issues” in ~76% of its answers. TechRadar
- Examples of errors: wrong laws, outdated facts (e.g., misidentifying the current Pope), and misattributed quotes. Reuters
- Why it matters: AI assistants are quickly becoming a gateway to news; 7% of people use them weekly (rising to 15% among under‑25s). odg.it
- What’s new from broadcasters: The EBU and BBC released a “News Integrity in AI Assistants” toolkit with practical fixes for vendors and newsrooms. ebu.ch
What the new international study actually found
A BBC‑led, EBU‑coordinated test asked leading AI assistants (including ChatGPT, Copilot, Gemini and Perplexity) to answer news questions across languages and countries. Professional journalists then judged each reply on accuracy, sourcing, and how well it separated fact from opinion. The topline result: 45% of answers had significant issues, and 81% had at least one issue. The single biggest failure mode was sourcing: missing citations, misleading links, or incorrect attributions in about a third of responses. Reuters
Researchers logged 3,000+ evaluations across 18 countries and 14 languages, making this the largest multi‑market study yet of how consumer AI assistants handle news. Reuters
Who struggled most—and why the headline matters
Several outlets summarizing the data report that Gemini performed worst on news, with “significant issues” in roughly three‑quarters of its answers—mainly due to poor or missing sourcing. Others fared better but still made serious mistakes. Importantly, the study measures errors, not intent; calling these “lies” is a catchy headline, but the findings are about reliability and sourcing, not deliberate deception. TechRadar
Real‑world failure cases
The researchers documented outdated facts (e.g., assistants asserting the wrong current Pope) and policy misstatements (e.g., incorrect claims about changes to vaping laws). Another example: confident but wrong assertions about astronauts “never” being stranded—despite well‑documented missions that undercut that claim. Reuters
How the test was run (methodology in brief)
- When & how: Participating newsrooms generated and rated answers between late May and mid‑June 2025, using a shared set of 30 core questions (plus local add‑ons). Search Engine Journal
- What was tested: Consumer/free versions of assistants—to reflect how ordinary people encounter AI for news. (Many publisher blocks were temporarily lifted so assistants could access their content.) Search Engine Journal
The EBU and BBC also published a practical toolkit detailing what “good” looks like (clear citations, time‑stamping, context, link‑outs, and stronger “I don’t know” behaviors). ebu.ch
Why chatbots go wrong with news
There’s a growing consensus—even from AI labs—that training and evaluation often reward guessing over honest uncertainty, encouraging models to produce fluent but unfounded claims. OpenAI’s September research bluntly argues that current practices “reward guessing rather than acknowledging uncertainty,” which helps explain persistent hallucinations. OpenAI
In a news context, these tendencies collide with fast‑moving facts, paywalled or geofenced sources, and multilingual nuance—conditions under which retrieval, citation, and time‑stamping must be rock‑solid.
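A toy calculation makes that incentive concrete. The sketch below is my own illustration (not code from OpenAI’s paper or the EBU–BBC study): it scores a single question under an accuracy‑only rubric and then under one that penalizes confident errors, showing why the first rubric teaches a model to guess rather than abstain.

```python
# Toy illustration (not from the study or OpenAI's paper): under accuracy-only
# scoring, abstaining earns nothing, so guessing always has equal or higher
# expected score; penalizing confident errors flips that incentive.

def expected_score(p_correct: float, abstain: bool,
                   right: float = 1.0, wrong: float = 0.0, idk: float = 0.0) -> float:
    """Expected score for one question under a simple grading scheme."""
    if abstain:
        return idk
    return p_correct * right + (1 - p_correct) * wrong

p = 0.25  # the model is only 25% confident in its best guess
print(expected_score(p, abstain=False))              # 0.25 -> guessing is rewarded
print(expected_score(p, abstain=True))               # 0.0  -> honest "I don't know" scores worse
print(expected_score(p, abstain=False, wrong=-1.0))  # -0.5 -> with a wrong-answer penalty, abstaining wins
```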
What industry leaders are saying
- EBU’s Jean Philip De Tender warns of the civic stakes: “When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.” Reuters
- BBC News CEO Deborah Turness cautions that “AI‑distorted” headlines could cause real‑world harm if left unchecked. The Verge
Adoption is rising—especially among the young
The Reuters Institute’s Digital News Report 2025 finds 7% of people now use AI chatbots for news each week, climbing to 15% among under‑25s. That makes assistant reliability a near‑term, not hypothetical, problem. odg.it
What the companies say
Following publication, outlets noted that Google says it “welcomes feedback” and is working to improve Gemini; OpenAI and Microsoft acknowledge hallucinations and say they’re working on mitigation; Perplexity points to a “Deep Research” mode it claims is 93.9% factual. None of these statements disputes the basic thrust of the study: quality is inconsistent and sourcing is weak. Reuters
How this compares to earlier tests
In February 2025, BBC researchers asked the same assistants to summarize 100 BBC articles; 51% of outputs had significant issues, with quotes altered or invented in 13% of cases. That earlier probe already flagged Gemini as the most concerning. The new EBU–BBC study scales that inquiry across languages and countries—and still finds systemic problems. The Verge
Practical guidance (for readers, publishers, and AI builders)
For readers
- Treat AI as a sketch, not a source. Click through to the original reporting before you share or act. Reuters
- Scan the citations. Missing links, generic homepages, or circular references are red flags. TechRadar
- Watch the clocks. News changes hourly; check timestamps and whether an answer says when a claim was true. Reuters
For publishers
- Demand better attribution and linking. The EBU toolkit outlines minimum standards for citations, quotes, and provenance. ebu.ch
- Harden your content supply chain. Use structured metadata and canonical links to reduce misattribution; see the example after this list. ebu.ch
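To make the structured‑metadata point concrete, here is a minimal sketch, assuming a schema.org NewsArticle payload embedded as JSON‑LD; the URLs, names, and timestamps are placeholders, and the EBU/BBC toolkit may prescribe different fields.

```python
import json

# Minimal example of article metadata that helps assistants attribute and date a story:
# a schema.org NewsArticle expressed as JSON-LD, with a canonical URL and explicit timestamps.
# All values below are placeholders for illustration.
article_metadata = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline",
    "url": "https://example.org/news/2025/10/example-story",             # canonical link
    "mainEntityOfPage": "https://example.org/news/2025/10/example-story",
    "datePublished": "2025-10-22T08:00:00Z",
    "dateModified": "2025-10-22T14:30:00Z",                               # signals updates
    "author": {"@type": "Organization", "name": "Example Newsroom"},
    "publisher": {"@type": "Organization", "name": "Example Newsroom"},
}

# Embed in the page head so crawlers and assistants can resolve provenance.
print(f'<script type="application/ld+json">{json.dumps(article_metadata)}</script>')
```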
For AI companies
- Prioritize sourcing and “I don’t know.” Favor precise citations and refusal/deferral when evidence is thin—align reward functions accordingly. OpenAI
- Time‑box news answers. Show last‑updated times and default to linking out on dynamic stories; a simplified sketch follows this list. ebu.ch
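A deliberately simplified sketch of those last two recommendations follows; the function name, threshold, and URLs are hypothetical, and a real system would ground the confidence score in actual retrieval evidence.

```python
from datetime import datetime, timezone
from typing import Optional

def answer_news_query(claim: Optional[str], source_url: Optional[str],
                      evidence_score: float, threshold: float = 0.8) -> dict:
    """Return a timestamped, sourced answer, or an explicit deferral with a link-out."""
    as_of = datetime.now(timezone.utc).isoformat(timespec="seconds")
    if claim is None or source_url is None or evidence_score < threshold:
        # Evidence is thin or missing: say so, and send the reader to the source.
        return {
            "answer": "I don't know; this story may have changed recently.",
            "as_of": as_of,
            "link_out": source_url or "https://example.org/news",
        }
    # Evidence is strong enough: answer, but still attach the source and an "as of" time.
    return {"answer": claim, "source": source_url, "as_of": as_of}

print(answer_news_query("Example verified claim.", "https://example.org/article", 0.93))
print(answer_news_query(None, None, 0.40))
```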
Bottom line
Is Gemini “lying the most”? The evidence shows Gemini performed worst in this study—especially on sourcing—but the broader takeaway is bigger: every assistant tested struggled with news reliability at meaningful rates. With millions already turning to AI for headlines, the EBU–BBC message is clear: fix the plumbing (sources, timestamps, uncertainty) before AI becomes the news front page by default. Reuters
Sources & further reading (selected)
- Reuters: global summary of the study and civic stakes. Reuters
- TechRadar: cross‑language findings and Gemini’s sourcing problems. TechRadar
- The Register: performance breakdowns, examples, and EBU quotes. The Register
- Reuters Institute Digital News Report 2025 (adoption trends). odg.it
- EBU/BBC News Integrity in AI Assistants toolkit. ebu.ch
- OpenAI: why language models hallucinate (training rewards guessing). OpenAI
- The Verge: BBC’s February 2025 audit of AI news summaries. The Verge