Welcome to SHROOM-visions: A Shared Task on Hallucination Detection in Large Vision-Language Models
Welcome to the official shared task website for SHROOM-visions, a model-agnostic hallucination detection task focusing on large vision-language models (LVLMs).
SHROOM-visions stands for “Shared task on Hallucinations and Related Observable Overgeneration Mistakes in vision-language models”. This task invites participants to detect and classify fine-grained hallucination spans in image-conditioned text outputs, using a novel dataset designed for enduring evaluation across model generations.
This shared task builds upon the *SHROOM series with key innovations:
- Vision-language focus: Hallucination detection in image-conditioned text generation (VQA, image captioning, etc.)
- Fine-grained span classification: Five classes of hallucination
- Multilingual scope: Chinese, English, French, Italian
The information on this website is subject to change. We announce all major updates via the Google Group mailing list and the shared task Slack.
What is SHROOM-visions?
The task consists of detecting and classifying spans of text corresponding to hallucinations in image-conditioned outputs. Participants are asked to:
- Identify which character spans in a given text constitute hallucinations
- Classify each hallucinated span into one of five categories:
- A. Invention: entities, objects, properties, or events not present in the image
- B. Mischaracterization: incorrect description of content that is visible
- C. OCR Problem: misreading of text visible in the image
- D. Miscounting: incorrect reporting of quantities of visible items
- E. Other: the hallucination does not fit in classes A-D
In practice, each datapoint consists of:
- An image
- A textual prompt
- A text output
- The detected hallucination spans
- A hallucination class for each detected span
Participants must compute:
- a) for every character in the output string, the probability that it is part of a hallucinated span
- b) for every detected span, its hallucination category
Participants are free to use any approach they deem appropriate, including external resources, and may work on any subset of the four languages.
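As an illustration, per-character hallucination probabilities can be derived from span-level annotations by counting how many annotators marked each character. The sketch below is a minimal example of that idea; the field layout and span encoding (character offsets, end-exclusive) are assumptions for illustration, not the official data schema.

```python
# Sketch: turn span-level hallucination annotations into per-character
# probabilities. The data layout (one span list per annotator, spans as
# end-exclusive character offsets) is an illustrative assumption, not the
# official SHROOM-visions format.

def char_probabilities(text, annotated_spans, n_annotators):
    """For each character of `text`, return the fraction of annotators
    who marked it as part of a hallucinated span."""
    counts = [0] * len(text)
    for spans in annotated_spans:          # one span list per annotator
        for start, end in spans:           # character offsets, end-exclusive
            for i in range(start, min(end, len(text))):
                counts[i] += 1
    return [c / n_annotators for c in counts]

# Example: two annotators disagree on the extent of the hallucinated span
# in the (made-up) model output "a red cat".
probs = char_probabilities(
    "a red cat",
    annotated_spans=[[(2, 5)], [(2, 9)]],  # "red" vs. "red cat"
    n_annotators=2,
)
# probs[2] == 1.0 (both annotators marked "r"), probs[6] == 0.5 (only one)
```

A system's predictions can then be scored against these empirical probabilities character by character.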
How will participants be evaluated?
Participants will be ranked along two primary (character-level) metrics:
- Span Identification: Intersection-over-Union (IoU) of characters marked as hallucinated in the gold reference vs. predicted
- Confidence Calibration: Correlation between the probability assigned by a participant’s system that a character is hallucinated and the empirical probability observed in our multi-annotator gold data
Rankings and submissions will be handled separately per language.
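For intuition, the two metrics above can be sketched in a few lines: IoU over the sets of character indices marked as hallucinated, and a correlation between predicted and empirical per-character probabilities. This is a minimal illustration of the metric definitions, not the official scorer; Pearson correlation is used here for simplicity, and the official metric may differ (e.g. a rank correlation).

```python
import math

def span_iou(gold_chars, pred_chars):
    """Intersection-over-Union of the character index sets marked as
    hallucinated in the gold reference vs. the prediction."""
    gold, pred = set(gold_chars), set(pred_chars)
    if not gold and not pred:
        return 1.0                      # both empty: perfect agreement
    return len(gold & pred) / len(gold | pred)

def pearson(x, y):
    """Pearson correlation between predicted probabilities `x` and
    empirical (multi-annotator) probabilities `y`."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Gold marks characters 2-4; the system also marks character 5.
iou = span_iou({2, 3, 4}, {2, 3, 4, 5})   # 3 shared / 4 total = 0.75
```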
Dataset Overview
We provide a curated dataset of 20,000 samples, each annotated by multiple annotators with a fine-grained, span-level labeling scheme.
| Dataset Split | Size | Composition | Access |
|---|---|---|---|
| Training set | ~15,200 samples | Outputs from 5 diverse LVLMs, ~3,800 samples per language | |
| Test set | 4,800 samples | 1,200 samples per language | Closed test set |
Download the annotated training set and the unlabelled test set: Download data
Download input images: Download images
Important Dates
All deadlines are “anywhere on Earth” (23:59 UTC-12).
- Train sets available: 10 May 2026
- Evaluation phase ends: 12 July 2026
- System description papers due: 27 July 2026 (TBC)
- Notification of acceptance: 21 August 2026 (TBC)
- Camera-ready due: 30 August 2026 (TBC)
- UncertainLP workshop: October 2026 (co-located with EMNLP)
This information is subject to change; also refer to the UncertainLP workshop website for supplementary information.
How to Participate
- Register: Please register your team on our submission platform before making a submission
- Submit results: Use our platform to submit your predictions before 12 July 2026
- Submit your system description: Papers should be submitted by 27 July 2026 (TBC; further details will be announced later)
Organizers of the shared task
- Raúl Vázquez, University of Helsinki, Finland
- Timothee Mickus, University of Helsinki, Finland
- Claudio Savelli, Politecnico di Torino, Italy
- Eduardo Calò, Universiteit Utrecht, Netherlands
- Emilio Raimond, Université Bretagne Sud, France
- Stella Frank, University of Copenhagen, Denmark
- Hengyu Luo, University of Helsinki, Finland
- Aman Sinha, Université de Lorraine, France
- Vincent Segonne, Université Bretagne Sud, France
- Flavio Giobergia, Politecnico di Torino, Italy
- Jörg Tiedemann, University of Helsinki, Finland
Still have questions?
Don’t hesitate to reach out via our communication channels! We’re working on building up an FAQ page as well.
Looking for something else?
The websites for all iterations of the shared task series are available here:
- Mu-SHROOM @ SemEval-2025 Task 3
- SHROOM-CAP @ CHOMPS workshop AACL-IJCNLP 2025 Task 6
- SHROOM @ SemEval-2024 Task 6
The logo is available in several colors: blue, green, brown, and purple. We encourage participants to use it where relevant (especially in posters and presentations)!
Previous Participants & Teams
This section will be populated after the evaluation phase.
Join the mailing group: shroom-visions-2026@googlegroups.com