GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts

Jenna Kang, Maria Beatriz Silva, Patsorn Sangkloy, Kenneth Chen, Niall L. Williams, Qi Sun
WACV 2026

Abstract

Videos generated by current state-of-the-art generative models contain undesirable artifacts. We introduce GeneVA, the first large-scale dataset of human-annotated artifact bounding boxes in AI-generated videos. The dataset consists of 16,356 AI-generated videos, each labeled by a human annotator with per-frame artifact bounding boxes, artifact labels and descriptions, and video quality ratings. We developed a custom data collection pipeline on Prolific and defined a novel taxonomy of spatio-temporal artifacts in AI-generated videos. The videos come from the VidProM dataset, whose text prompts were then used to generate an additional subset of videos with Sora. We trained an artifact detector and caption generator using a pre-trained image-based model and a custom temporal fusion module. The dataset can be found at https://www.immersivecomputinglab.org/publication/geneva/. We hope that datasets like GeneVA will encourage improvements in artifact detection in AI-generated video, toward applications such as deepfake detection.
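The abstract does not detail the temporal fusion module, so the following is a minimal, hypothetical sketch of one common design: per-frame features from a pre-trained image-based backbone, fused across time with self-attention. The class name, dimensions, and the choice of attention-based fusion are all illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class TemporalFusion(nn.Module):
    """Illustrative temporal fusion (assumption, not the paper's module):
    mixes per-frame features across time with multi-head self-attention
    plus a residual connection and layer norm."""

    def __init__(self, feat_dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, time, feat_dim), one feature vector per frame
        # from an image-based backbone.
        fused, _ = self.attn(frame_feats, frame_feats, frame_feats)
        return self.norm(frame_feats + fused)

# Example: fuse 16 frames of 256-d per-frame features.
feats = torch.randn(2, 16, 256)
fused = TemporalFusion()(feats)
print(fused.shape)  # torch.Size([2, 16, 256])

An attention layer is one natural choice here because it lets distant frames interact directly, which matters for artifacts that only become apparent over time (e.g., objects flickering in and out of existence); a temporal convolution would be a plausible lighter-weight alternative.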

Videos: https://ai-generated-videos-icl.s3.us-east-1.amazonaws.com

Annotations: https://geneva-annotations.s3.us-east-1.amazonaws.com/AnnotatedVideos.json
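The annotation schema is not documented on this page, so the sketch below only fetches AnnotatedVideos.json from the URL above and inspects its top-level structure; consult the downloaded file for the actual field names.

import json
import urllib.request

# Annotation URL taken from this page.
URL = "https://geneva-annotations.s3.us-east-1.amazonaws.com/AnnotatedVideos.json"

with urllib.request.urlopen(URL) as resp:
    annotations = json.load(resp)

# Inspect the top-level structure before assuming any schema.
print(type(annotations))
if isinstance(annotations, list) and annotations:
    first = annotations[0]
    print(first.keys() if isinstance(first, dict) else first)
elif isinstance(annotations, dict):
    print(list(annotations)[:5])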