Mu-SHROOM

Welcome to the official shared task website for SHROOM-CAP, a CHOMPS 2025 shared task!

SHROOM-CAP stands for “Shared-task on Hallucinations and Related Observable Overgeneration Mistakes in Crosslingual Analyses of Publications”. SHROOM-CAP invites participants to detect hallucinations in the outputs of LLMs in a scientific context. This shared task extends our previous iteration, SHROOM, with a few key changes.

The information on this website is subject to change. We will send announcements for any major update on the Google group mailing list.

What is SHROOM-CAP?

The task consists of detecting the presence of scientific hallucinations. Participants are asked to determine whether a given scientific text produced by an LLM constitutes a hallucination. The task is held in a cross-lingual setting, i.e., we provide data in multiple mixed languages produced by a variety of public-weights LLMs.

In practice, we provide an LLM output (as a string of characters, a list of tokens, and a list of logits), and participants have to predict if the LLM output string contains a hallucination (binary classification).
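As a rough illustration of this setup, the sketch below models one data point and a deliberately naive baseline classifier. The field names (`text`, `tokens`, `logits`), the assumption of one logit per token, and the threshold rule are all hypothetical and chosen only for illustration; the released files may use different keys and structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LLMOutput:
    # Hypothetical field names; the released data may use different keys.
    text: str            # the LLM output as a string of characters
    tokens: List[str]    # the same output as a list of tokens
    logits: List[float]  # assumed here: one logit per generated token


def predict_hallucination(item: LLMOutput, threshold: float = 0.0) -> int:
    """Toy baseline: return 1 (hallucination) if the mean logit is below threshold.

    This is only a placeholder decision rule, not a recommended method.
    """
    if not item.logits:
        return 0  # no evidence either way; default to "no hallucination"
    mean_logit = sum(item.logits) / len(item.logits)
    return 1 if mean_logit < threshold else 0
```

Any real system would replace `predict_hallucination` with its own model; the point is only that each LLM output maps to a single binary label.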

Participants are free to use any approach they deem appropriate, including using external resources, and work on any subset of languages they are interested in.

How will participants be evaluated?

Participants will be evaluated on binary classification to identify cases of scientific hallucinations. This will be done using the macro-F1 score for two criteria: (i) factual mistakes and (ii) fluency mistakes.

Rankings and submissions will be done separately per language.
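For reference, macro-F1 on binary labels is the unweighted mean of the F1 scores of the two classes. The sketch below computes it per language for a single criterion. It is not the official scorer: the record keys (`lang`, `gold`, `pred`) are hypothetical, and edge cases such as a class with no true positives may be handled differently in the official evaluation.

```python
from collections import defaultdict


def f1_for_class(gold, pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0  # assumption: F1 is 0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


def macro_f1_per_language(records):
    """Group records by language and score each language separately.

    `records` is a list of dicts with hypothetical keys
    'lang', 'gold', and 'pred' (binary labels for one criterion).
    """
    by_lang = defaultdict(lambda: ([], []))
    for r in records:
        gold, pred = by_lang[r["lang"]]
        gold.append(r["gold"])
        pred.append(r["pred"])
    return {lang: macro_f1(g, p) for lang, (g, p) in by_lang.items()}
```

The same function would be run once with the factual-mistake labels and once with the fluency-mistake labels. scikit-learn's `f1_score(..., average="macro")` computes the same quantity in the common case.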

Participant info

To participate, register via https://forms.gle/hWR9jwTBjZQmFKAE7. This form will enable us to add participants to the Google group for further communication.

Data

Below are links to access the data already released, as well as provisional expected release dates for future splits. Do note that release dates are subject to change.

Dataset split | Access | Description
Dev set | download (dev1) | Contains languages: en, hi, es, fr
Sample testing data | download (test1) | Contains a sample of the test set format

Important dates

This information is subject to change.

Organizers of the shared task

Looking for something else?

The websites for all the iterations of the shared task are available here: