Welcome to the SHROOM Shared-Task Series on Hallucinations and Related Observable Overgeneration Mistakes
The SHROOM shared task series brings together researchers and practitioners interested in detecting hallucinations — that is, fluent yet semantically incorrect or unsupported outputs — in natural language generation (NLG) systems. Since 2024, we’ve been pushing the boundaries of automatic hallucination detection, with each edition introducing new challenges and innovations.
This website serves as a central hub to explore the current and past editions of the shared task, including SHROOM (2024), Mu-SHROOM (2025), SHROOM-CAP (2025), and the upcoming SHROOM-visions (2026).
Explore the tasks
SHROOM-visions 2026
SHROOM-visions is the fourth iteration of the SHROOM series, hosted at the UncertaiNLP Workshop (co-located with EMNLP 2026). This edition tackles hallucinations in vision-language models.
With it we provide a dataset for hallucination detection in image-conditioned text generation (VQA, image captioning, etc.) and a four-class taxonomy of hallucination occurrences in a multilingual setup (Chinese, English, French, Italian).
SHROOM-CAP Shared Task 2025

SHROOM-CAP is the third edition of the shared task, held at CHOMPS-2025. This cross-lingual extension of SHROOM expands the scope to high- and low-resource languages, with a special focus on Indic languages. This time, the task targets hallucinations in the scientific domain, asking participants to predict whether outputs contain scientific hallucinations and how likely they are.
🔗 Explore SHROOM-CAP Shared Task
Mu-SHROOM 2025

Mu-SHROOM is the second edition of the shared task, held at SemEval-2025. This multilingual extension of SHROOM expands the scope to 14 languages and shifts the focus to instruction-tuned large language models (LLMs). Moreover, in Mu-SHROOM we target hallucination spans at the character level!
SHROOM 2024

SHROOM is the OG shared task on Hallucinations and Related Observable Overgeneration Mistakes. We kicked off the initiative at SemEval-2024. Participants were asked to identify hallucinated content in NLG outputs across several generation tasks (e.g., machine translation, paraphrasing, definition modeling), both with and without access to the model that generated the outputs.
🔗 Go to the SHROOM 2024 website
👥🙌🌐 Join the SHROOM Community
Whether you’re interested in joining the next round, learning from past editions, or just staying informed about hallucination detection in NLG, we’d love to have you in the community.
- Join the conversation on Slack
- Check out the past editions' Google Groups
🧪 Want to dive straight in?
Visit one of the task pages above and start exploring data, baselines, and results.
Reach out if you have further questions, collaboration ideas or simply want to say hi:
- Timothee Mickus, University of Helsinki, Finland
- Raúl Vázquez, University of Helsinki, Finland