
Welcome to the SHROOM Shared-Task Series on Hallucinations and Related Observable Overgeneration Mistakes

The SHROOM shared task series brings together researchers and practitioners interested in detecting hallucinations — that is, fluent yet semantically incorrect or unsupported outputs — in natural language generation (NLG) systems. Since 2024, we’ve been pushing the boundaries of automatic hallucination detection, with each edition introducing new challenges and innovations.

This website serves as a central hub to explore the current and past editions of the shared task, including SHROOM (2024), Mu-SHROOM (2025), and the upcoming ν-SHROOM (2026).


Explore the tasks

SHROOM-visions 2026


Shroom-visions is the fourth iteration of the SHROOM series, hosted at the UncertainLP Workshop (co-located with EMNLP 2026). This edition tackles hallucinations in vision-language models.

With it, we provide a dataset for hallucination detection in image-conditioned text generation (VQA, image captioning, etc.) and a four-class taxonomy of hallucination occurrences in a multilingual setup (Chinese, English, French, Italian).

🔗 Explore Shroom-visions 2026


SHROOM-CAP Shared Task 2025


SHROOM-CAP is the third edition of the shared task, held at CHOMPS-2025. This cross-lingual extension of SHROOM expands the scope to high- and low-resource languages, with a special focus on Indic languages. This time, the task targets hallucinations in the scientific domain, asking participants to predict whether scientific hallucinations are present and how likely they are.

🔗 Explore SHROOM-CAP Shared Task


Mu-SHROOM 2025


Mu-SHROOM is the second edition of the shared task, held at SemEval-2025. This multilingual extension of SHROOM expands the scope to 14 languages and shifts the focus to instruction-tuned large language models (LLMs). Moreover, in Mu-SHROOM we target hallucination spans at the character level!

🔗 Explore Mu-SHROOM 2025


SHROOM 2024


SHROOM is the OG Shared-task on Hallucinations and Related Observable Overgeneration Mistakes. We kicked off the initiative at SemEval-2024. Participants were asked to identify hallucinated content in NLG outputs across several generation tasks (e.g., machine translation, paraphrasing, definition modeling), both with and without access to the model that generated the outputs.

🔗 Go to the SHROOM 2024 website


👥🙌🌐 Join the SHROOM Community

Whether you’re interested in joining the next round, learning from past editions, or just staying informed about hallucination detection in NLG, we’d love to have you in the community.


🧪 Want to dive straight in?

Visit one of the task pages above and start exploring data, baselines, and results.

Reach out if you have further questions, collaboration ideas, or simply want to say hi: