
Welcome to Shroom-visions: A Shared Task on Hallucination Detection in Large Vision-Language Models


Welcome to the official shared task website for SHROOM-visions, a model-agnostic hallucination detection task focusing on large vision-language models (LVLMs).

Shroom-visions stands for “Shared-task on Hallucinations and Related Observable Overgeneration Mistakes in vision-language models”. This task invites participants to detect and classify fine-grained hallucination spans in image-conditioned text outputs, using a novel dataset designed for enduring evaluation across model generations.

This shared task builds upon the SHROOM series of shared tasks with several key innovations.

The information on this website is subject to change. Any major updates will be announced on the Google group mailing list and the shared task Slack.

What is Shroom-visions?

The task consists of detecting and classifying spans of text corresponding to hallucinations in image-conditioned outputs. Participants are asked to:

  1. Identify which character spans in a given text constitute hallucinations
  2. Classify each hallucinated span into one of five categories:
    • A. Invention: entities, objects, properties, or events not present in the image
    • B. Mischaracterization: incorrect description of content that is visible
    • C. OCR Problem: misreading of text visible in the image
    • D. Miscounting: incorrect reporting of quantities of visible items
    • E. Other: the hallucination does not fit in classes A-D

In practice, each datapoint consists of:

Participants must compute, for every character in the output string:

  a) The probability that it is part of a hallucinated span
  b) The hallucination category of the span it belongs to

Participants are free to use any approach they deem appropriate, including external resources, and may work on any subset of the four languages.
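As a concrete illustration, the per-character outputs described above can be derived from span-level predictions. The sketch below is hypothetical: the span tuple format, the category codes, and the max-probability aggregation rule are our own assumptions for illustration, not the official submission format.

```python
# Hypothetical sketch: expanding span-level predictions into the
# per-character probabilities and category labels described above.
# The (start, end, category, probability) format is an assumption.

def spans_to_char_predictions(text, spans):
    """Expand predicted spans into per-character hallucination
    probabilities and category labels (None = not hallucinated)."""
    probs = [0.0] * len(text)
    labels = [None] * len(text)
    for start, end, category, prob in spans:
        for i in range(start, min(end, len(text))):
            # If spans overlap, keep the strongest evidence per character.
            probs[i] = max(probs[i], prob)
            labels[i] = category
    return probs, labels

text = "Two cats sit on a red sofa."
# Suppose a system flags "red" (characters 18-21) as a
# Mischaracterization (category B) with probability 0.9.
spans = [(18, 21, "B", 0.9)]
probs, labels = spans_to_char_predictions(text, spans)
```

Characters outside every predicted span keep probability 0.0 and no category, matching the intuition that most of the output is faithful to the image.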

How will participants be evaluated?

Participants will be ranked along two primary (character-level) metrics:

  1. Span Identification: Intersection-over-Union (IoU) of characters marked as hallucinated in the gold reference vs. predicted
  2. Confidence Calibration: Correlation between the probability assigned by a participant’s system that a character is hallucinated and the empirical probability observed in our multi-annotator gold data

Rankings and submissions will be handled separately per language.
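To make the two metrics concrete, here is a minimal, dependency-free sketch of character-level IoU and a Spearman rank correlation, assuming binary gold/predicted character masks and per-character probabilities. This is an illustration only: the official scorer may differ (for instance, this Spearman sketch ignores tie handling for brevity).

```python
# Hedged sketch of the two ranking metrics, under the assumption of
# binary character masks (1 = hallucinated) and per-character scores.

def char_iou(gold, pred):
    """Intersection-over-Union of characters marked as hallucinated."""
    inter = sum(g and p for g, p in zip(gold, pred))
    union = sum(g or p for g, p in zip(gold, pred))
    return inter / union if union else 1.0  # both empty: perfect match

def spearman_rho(xs, ys):
    """Spearman rank correlation between two score sequences.
    Simplified: assumes no tied values."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=vals.__getitem__)
        r = [0.0] * len(vals)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

gold = [0, 0, 1, 1, 1, 0]   # gold hallucinated characters: 2, 3, 4
pred = [0, 1, 1, 1, 0, 0]   # predicted: 1, 2, 3
iou = char_iou(gold, pred)  # intersection 2, union 4 -> 0.5

# System probabilities vs. empirical multi-annotator probabilities:
rho = spearman_rho([0.1, 0.9, 0.8, 0.7], [0.0, 1.0, 0.6, 0.3])
# identical rank orders, so rho is (numerically) close to 1.0
```

In practice a library routine such as `scipy.stats.spearmanr` (which does handle ties) would be the more robust choice for the calibration metric.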

Dataset Overview

We provide a curated dataset of 20,000 samples, each with multiple annotations under a fine-grained, span-level labeling scheme.

| Dataset Split | Size | Composition | Access |
|---|---|---|---|
| Training set | ~15,200 samples | Outputs from 5 diverse LVLMs, ~3,800 samples per language | |
| Test set | 4,800 samples | 1,200 samples per language | Closed test set |

Download the annotated training set and the unlabelled test set: Download data
Download input images: Download images

Important Dates

All deadlines are “anywhere on Earth” (23:59 UTC-12).

This information is subject to change; also refer to the UncertaiNLP workshop website for supplementary information.

How to Participate

  1. Register: Please register your team on our submission platform before making a submission
  2. Submit results: Use our platform to submit your predictions before 12 July 2026
  3. Submit your system description: Papers should be submitted by 19 July 2026 (TBC, further details will be announced later)

Organizers of the shared task

Still have questions?

Don’t hesitate to reach out via our communication channels! We’re working on building up an FAQ page as well.

Looking for something else?

The websites for all iterations of the shared task series are available here:

The logo is available in several colors: blue, green, brown, and purple. We encourage participants to use it where relevant (especially in posters and presentations)!

Previous Participants & Teams

This section will be populated after the evaluation phase.


Join the mailing group: shroom-visions-2026@googlegroups.com

Submission platform