- Scope: 22 public‑service media organizations in 18 countries tested 3,000+ AI answers to news questions in 14 languages. Reuters
- Headline finding: 45% of AI responses contained a significant issue, and 81% had at least one problem. Reuters
- Biggest weakness: Sourcing—about a third of answers had serious citation or attribution faults. Reuters
- Who struggled most: Google Gemini showed the highest rate of problems, especially with sourcing; several reports put “significant issues” in ~76% of its answers. TechRadar
- Examples of errors: wrong laws, outdated facts (e.g., misidentifying the current Pope), and misattributed quotes. Reuters
- Why it matters: AI assistants are quickly becoming a gateway to news; 7% of people use them weekly (rising to 15% among under‑25s). odg.it
- What’s new from broadcasters: The EBU and BBC released a “News Integrity in AI Assistants” toolkit with practical fixes for vendors and newsrooms. ebu.ch
What the new international study actually found
A BBC‑led, EBU‑coordinated test asked leading AI assistants (including ChatGPT, Copilot, Gemini and Perplexity) to answer news questions across languages and countries. Professional journalists then judged each reply on accuracy, sourcing, and how well it separated fact from opinion. The topline result: 45% of answers had significant issues, and 81% had at least one issue. The single biggest failure mode was sourcing: missing citations, misleading links, or incorrect attributions in about a third of responses. Reuters
Researchers logged 3,000+ evaluations across 18 countries and 14 languages, making this the largest multi‑market study yet of how consumer AI assistants handle news. Reuters
Who struggled most—and why the headline matters
Several outlets summarizing the data report that Gemini performed worst on news, with “significant issues” in roughly three‑quarters of its answers—mainly due to poor or missing sourcing. Others fared better but still made serious mistakes. Importantly, the study measures errors, not intent; calling these “lies” is a catchy headline, but the findings are about reliability and sourcing, not deliberate deception. TechRadar
Real‑world failure cases
The researchers documented outdated facts (e.g., assistants asserting the wrong current Pope) and policy misstatements (e.g., incorrect claims about changes to vaping laws). Another example: confident but wrong assertions about astronauts “never” being stranded—despite well‑documented missions that undercut that claim. Reuters
How the test was run (methodology in brief)
- When & how: Participating newsrooms generated and rated answers between late May and mid‑June 2025, using a shared set of 30 core questions (plus local add‑ons). Search Engine Journal
- What was tested: Consumer/free versions of assistants—to reflect how ordinary people encounter AI for news. (Many publisher blocks were temporarily lifted so assistants could access their content.) Search Engine Journal
The EBU and BBC also published a practical toolkit detailing what “good” looks like (clear citations, time‑stamping, context, link‑outs, and stronger “I don’t know” behaviors). ebu.ch
Why chatbots go wrong with news
There’s a growing consensus—even from AI labs—that training and evaluation often reward guessing over honest uncertainty, encouraging models to produce fluent but unfounded claims. OpenAI’s September research bluntly argues that current practices “reward guessing rather than acknowledging uncertainty,” which helps explain persistent hallucinations. OpenAI
In a news context, these tendencies collide with fast‑moving facts, paywalled or geofenced sources, and multilingual nuance—conditions under which retrieval, citation, and time‑stamping must be rock‑solid.
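A toy calculation makes that incentive concrete. The sketch below is my own illustration (not code from OpenAI’s paper or the EBU–BBC study): it scores a single question under an accuracy‑only rubric and then under one that penalizes confident errors, showing why the first rubric teaches a model to guess rather than abstain.

```python
# Toy illustration (not from the study or OpenAI's paper): under accuracy-only
# scoring, abstaining earns nothing, so guessing always has equal or higher
# expected score; penalizing confident errors flips that incentive.

def expected_score(p_correct: float, abstain: bool,
                   right: float = 1.0, wrong: float = 0.0, idk: float = 0.0) -> float:
    """Expected score for one question under a simple grading scheme."""
    if abstain:
        return idk
    return p_correct * right + (1 - p_correct) * wrong

p = 0.25  # the model is only 25% confident in its best guess
print(expected_score(p, abstain=False))              # 0.25 -> guessing is rewarded
print(expected_score(p, abstain=True))               # 0.0  -> honest "I don't know" scores worse
print(expected_score(p, abstain=False, wrong=-1.0))  # -0.5 -> with a wrong-answer penalty, abstaining wins
```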
What industry leaders are saying
- EBU’s Jean Philip De Tender warns of the civic stakes: “When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.” Reuters
- BBC News CEO Deborah Turness cautions that “AI‑distorted” headlines could cause real‑world harm if left unchecked. The Verge
Adoption is rising—especially among the young
The Reuters Institute’s Digital News Report 2025 finds 7% of people now use AI chatbots for news each week, climbing to 15% among under‑25s. That makes assistant reliability a near‑term, not hypothetical, problem. odg.it
What the companies say
Following publication, outlets noted that Google says it “welcomes feedback” and is working to improve Gemini; OpenAI and Microsoft acknowledge hallucinations and say they’re working on mitigation; Perplexity points to a “Deep Research” mode it claims is 93.9% factual. None of these statements disputes the basic thrust of the study: quality is inconsistent and sourcing is weak. Reuters
How this compares to earlier tests
In February 2025, BBC researchers asked the same assistants to summarize 100 BBC articles; 51% of outputs had significant issues, with quotes altered or invented in 13% of cases. That earlier probe already flagged Gemini as the most concerning. The new EBU–BBC study scales that inquiry across languages and countries—and still finds systemic problems. The Verge
Practical guidance (for readers, publishers, and AI builders)
For readers
- Treat AI as a sketch, not a source. Click through to the original reporting before you share or act. Reuters
- Scan the citations. Missing links, generic homepages, or circular references are red flags. TechRadar
- Watch the clocks. News changes hourly; check timestamps and whether an answer says when a claim was true. Reuters
For publishers
- Demand better attribution and linking. The EBU toolkit outlines minimum standards for citations, quotes, and provenance. ebu.ch
- Harden your content supply chain. Use structured metadata and canonical links to reduce misattribution; see the example after this list. ebu.ch
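To make the structured‑metadata point concrete, here is a minimal sketch, assuming a schema.org NewsArticle payload embedded as JSON‑LD; the URLs, names, and timestamps are placeholders, and the EBU/BBC toolkit may prescribe different fields.

```python
import json

# Minimal example of article metadata that helps assistants attribute and date a story:
# a schema.org NewsArticle expressed as JSON-LD, with a canonical URL and explicit timestamps.
# All values below are placeholders for illustration.
article_metadata = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline",
    "url": "https://example.org/news/2025/10/example-story",             # canonical link
    "mainEntityOfPage": "https://example.org/news/2025/10/example-story",
    "datePublished": "2025-10-22T08:00:00Z",
    "dateModified": "2025-10-22T14:30:00Z",                               # signals updates
    "author": {"@type": "Organization", "name": "Example Newsroom"},
    "publisher": {"@type": "Organization", "name": "Example Newsroom"},
}

# Embed in the page head so crawlers and assistants can resolve provenance.
print(f'<script type="application/ld+json">{json.dumps(article_metadata)}</script>')
```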
For AI companies
- Prioritize sourcing and “I don’t know.” Favor precise citations and refusal/deferral when evidence is thin—align reward functions accordingly. OpenAI
- Time‑box news answers. Show last‑updated times and default to linking out on dynamic stories; a simplified sketch follows this list. ebu.ch
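A deliberately simplified sketch of those last two recommendations follows; the function name, threshold, and URLs are hypothetical, and a real system would ground the confidence score in actual retrieval evidence.

```python
from datetime import datetime, timezone
from typing import Optional

def answer_news_query(claim: Optional[str], source_url: Optional[str],
                      evidence_score: float, threshold: float = 0.8) -> dict:
    """Return a timestamped, sourced answer, or an explicit deferral with a link-out."""
    as_of = datetime.now(timezone.utc).isoformat(timespec="seconds")
    if claim is None or source_url is None or evidence_score < threshold:
        # Evidence is thin or missing: say so, and send the reader to the source.
        return {
            "answer": "I don't know; this story may have changed recently.",
            "as_of": as_of,
            "link_out": source_url or "https://example.org/news",
        }
    # Evidence is strong enough: answer, but still attach the source and an "as of" time.
    return {"answer": claim, "source": source_url, "as_of": as_of}

print(answer_news_query("Example verified claim.", "https://example.org/article", 0.93))
print(answer_news_query(None, None, 0.40))
```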
Bottom line
Is Gemini “lying the most”? The evidence shows Gemini performed worst in this study—especially on sourcing—but the broader takeaway is bigger: every assistant tested struggled with news reliability at meaningful rates. With millions already turning to AI for headlines, the EBU–BBC message is clear: fix the plumbing (sources, timestamps, uncertainty) before AI becomes the news front page by default. Reuters
Sources & further reading (selected)
- Reuters: global summary of the study and civic stakes. Reuters
- TechRadar: cross‑language findings and Gemini’s sourcing problems. TechRadar
- The Register: performance breakdowns, examples, and EBU quotes. The Register
- Reuters Institute Digital News Report 2025 (adoption trends). odg.it
- EBU/BBC News Integrity in AI Assistants toolkit. ebu.ch
- OpenAI: why language models hallucinate (training rewards guessing). OpenAI
- The Verge: BBC’s February 2025 audit of AI news summaries. The Verge