
Welcome to Shroom-visions: A Shared Task on Hallucination Detection in Large Vision-Language Models


Welcome to the official shared task website for SHROOM-visions, a model-agnostic hallucination detection task focusing on large vision-language models (LVLMs).

Shroom-visions stands for “Shared-task on Hallucinations and Related Observable Overgeneration Mistakes in vision-language models”. This task invites participants to detect and classify fine-grained hallucination spans in image-conditioned text outputs, using a novel dataset designed for enduring evaluation across model generations.

This shared task builds upon the SHROOM series of shared tasks with several key innovations.

The information on this website is subject to change. Any major updates will be announced on the Google group mailing list and the shared task Slack.

What is Shroom-visions?

The task consists of detecting and classifying spans of text corresponding to hallucinations in image-conditioned outputs. Participants are asked to:

  1. Identify which character spans in a given text constitute hallucinations
  2. Classify each hallucinated span into one of five categories:
    • A. Invention: entities, objects, properties, or events not present in the image
    • B. Mischaracterization: incorrect description of content that is visible
    • C. OCR Problem: misreading of text visible in the image
    • D. Miscounting: incorrect reporting of quantities of visible items
    • E. Other: the hallucination does not fit in classes A-D

In practice, each datapoint consists of:

Participants must compute, for every character in the output string:

  a) The probability that it is part of a hallucinated span
  b) The hallucination category of the span it belongs to

Participants are free to use any approach they deem appropriate, including external resources, and may work on any subset of the four languages.
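As a concrete illustration, the per-character outputs described above can be derived from span-level predictions. The sketch below is hypothetical: the span tuple format, the category codes, and the max-probability aggregation rule are our own assumptions for illustration, not the official submission format.

```python
# Hypothetical sketch: expanding span-level predictions into the
# per-character probabilities and category labels described above.
# The (start, end, category, probability) format is an assumption.

def spans_to_char_predictions(text, spans):
    """Expand predicted spans into per-character hallucination
    probabilities and category labels (None = not hallucinated)."""
    probs = [0.0] * len(text)
    labels = [None] * len(text)
    for start, end, category, prob in spans:
        for i in range(start, min(end, len(text))):
            # If spans overlap, keep the strongest evidence per character.
            probs[i] = max(probs[i], prob)
            labels[i] = category
    return probs, labels

text = "Two cats sit on a red sofa."
# Suppose a system flags "red" (characters 18-21) as a
# Mischaracterization (category B) with probability 0.9.
spans = [(18, 21, "B", 0.9)]
probs, labels = spans_to_char_predictions(text, spans)
```

Characters outside every predicted span keep probability 0.0 and no category, matching the intuition that most of the output is faithful to the image.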

How will participants be evaluated?

Participants will be ranked along two primary (character-level) metrics:

  1. Span Identification: Intersection-over-Union (IoU) of characters marked as hallucinated in the gold reference vs. predicted
  2. Confidence Calibration: Correlation between the probability assigned by a participant’s system that a character is hallucinated and the empirical probability observed in our multi-annotator gold data

Rankings and submissions will be handled separately per language.
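To make the two metrics concrete, here is a minimal, dependency-free sketch of character-level IoU and a Spearman rank correlation, assuming binary gold/predicted character masks and per-character probabilities. This is an illustration only: the official scorer may differ (for instance, this Spearman sketch ignores tie handling for brevity).

```python
# Hedged sketch of the two ranking metrics, under the assumption of
# binary character masks (1 = hallucinated) and per-character scores.

def char_iou(gold, pred):
    """Intersection-over-Union of characters marked as hallucinated."""
    inter = sum(g and p for g, p in zip(gold, pred))
    union = sum(g or p for g, p in zip(gold, pred))
    return inter / union if union else 1.0  # both empty: perfect match

def spearman_rho(xs, ys):
    """Spearman rank correlation between two score sequences.
    Simplified: assumes no tied values."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0.0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

gold = [0, 0, 1, 1, 1, 0]   # gold hallucinated characters: 2, 3, 4
pred = [0, 1, 1, 1, 0, 0]   # predicted: 1, 2, 3
iou = char_iou(gold, pred)  # intersection 2, union 4 -> 0.5

# System probabilities vs. empirical multi-annotator probabilities:
rho = spearman_rho([0.1, 0.9, 0.8, 0.7], [0.0, 1.0, 0.6, 0.3])
# identical rank orders, so rho is (numerically) close to 1.0
```

In practice a library routine such as `scipy.stats.spearmanr` (which does handle ties) would be the more robust choice for the calibration metric.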

Dataset Overview

We provide a curated dataset of 20,000 samples, each with multiple annotations under a fine-grained, span-level labeling scheme.

| Dataset Split | Size | Composition | Access |
|---|---|---|---|
| Training set | ~15,200 samples | Outputs from 5 diverse LVLMs, ~3,800 samples per language | |
| Test set | 4,800 samples | 1,200 samples per language | Closed test set |

Download the annotated training set and the unlabelled test set: Download data
Download input images: Download images

Important Dates

All deadlines are “anywhere on Earth” (23:59 UTC-12).

This information is subject to change; also refer to the UncertaiNLP workshop website for supplementary information.

How to Participate

  1. Register: Please register your team on our submission platform before making a submission
  2. Submit results: Use our platform to submit your predictions before 12 July 2026
  3. Submit your system description: Papers should be submitted by 19 July 2026 (TBC, further details will be announced later)

Organizers of the shared task

Still have questions?

Don’t hesitate to reach out via our communication channels! We’re working on building up an FAQ page as well.

Looking for something else?

The websites for all iterations of the shared task series are available here:

The logo is available in several colors: blue, green, brown, and purple. We encourage participants to use it where relevant (especially in posters and presentations)!

Previous Participants & Teams

This section will be populated after the evaluation phase.


Join the mailing group: shroom-visions-2026@googlegroups.com

Submission platform