The VIRAT Video Dataset

The VIRAT Video Dataset is designed to be realistic, natural and challenging for video surveillance domains in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories than existing action recognition datasets. It has become a benchmark dataset for the computer vision community.

January 2020: The VIRAT Video dataset annotations that were created as part of the IARPA DIVA program are now available in a public repository here. The repository also contains the activity definitions used for the annotations.

About the VIRAT Video dataset

The VIRAT video dataset distinguishes itself on the following characteristics.

Realism and natural scenes: Data was collected in natural scenes showing people performing normal actions in standard contexts, with uncontrolled, cluttered backgrounds. There are frequent incidental movers and background activities. Actions performed by directed actors were minimized; most were actions performed by the general population.

Diversity: Data was collected at multiple sites distributed throughout the USA. A variety of camera viewpoints and resolutions were included, and actions are performed by many different people.

Quantity: Diverse types of human actions and human-vehicle interactions are included, with a large number of examples (>30) per action class.

Wide range of resolution and frame rates: Many applications such as video surveillance operate across a wide range of spatial and temporal resolutions. The dataset is designed to capture these ranges, with 2–30Hz frame rates and 10–200 pixels in person-height. The dataset provides both the original videos with HD quality and downsampled versions both spatially and temporally.

Ground and Aerial Videos: Both ground camera videos and aerial videos are collected released as part of VIRAT Video Dataset.

The VIRAT Video Dataset contains two broad categories of activities (single-object and two-objects) which involve both human and vehicles. Details of included activities, and annotation formats may differ per release. Relevant information can be found from each release information.

Additional Data and Annotations

The large-scale Multiview Extended Video with Activities (MEVA) dataset features more than 250 hours of ground camera video, with additional resources such as UAV video, camera models, and a subset of 12.5 hours of annotated data. The dataset is designed for activity detection in multi-camera environments. It was created on the Intelligence Advanced Research Projects Activity (IARPA) Deep Intermodal Video Analytics (DIVA) program to support DIVA performers and the broader research community. The dataset and annotations are available at the site.

Release News

December 29, 2016: Aerial annotations status: Annotating the aerial video proved extremely challenging, and we were unable to complete the annotations on the original contract. We are actively pursuing promising funding opportunities and hope to have an update soon.

2012 Jan 11th: Version 2.0 of the VIRAT Public Dataset is updated with Aerial video subsets. Currently, only videos are available.

2011 Oct 4th: Version 2.0 of the VIRAT Public Dataset is released with Ground video subsets. All videos, totalling ~8.5 hours of HD data, are Stationary Ground Videos. The annotations are for 12 event types, annotated in videos from 11 different outdoor scenes. The release also includes suggested evaluation metrics and methodologies (data folds for cross-validation etc.)

Release 2.0 is described in a PDF available here.


Release 2.0 of the VIRAT Video Dataset is described in a PDF available here.

The VIRAT Aerial data is Distribution A, Release Unlimited, and is available via the VIRAT-Aerial collection.

The VIRAT Ground Camera data is available pursuant to the VIRAT Video Dataset Protection Agreement. You can browse the current Release 2.0, as well as prior releases.

Citation Information

If you make use of the VIRAT Video Dataset, please use the following citation (with release version information):
"A Large-scale Benchmark Dataset for Event Recognition in Surveillance Video" by Sangmin Oh, Anthony Hoogs, Amitha Perera, Naresh Cuntoor, Chia-Chih Chen, Jong Taek Lee, Saurajit Mukherjee, J.K. Aggarwal, Hyungtae Lee, Larry Davis, Eran Swears, Xiaoyang Wang, Qiang Ji, Kishore Reddy, Mubarak Shah, Carl Vondrick, Hamed Pirsiavash, Deva Ramanan, Jenny Yuen, Antonio Torralba, Bi Song, Anesco Fong, Amit Roy-Chowdhury, and Mita Desai, in Proceedings of IEEE Comptuer Vision and Pattern Recognition (CVPR), 2011.

Annotation Information

The annotations currently available in the public repository were created on the IARPA DIVA program. They feature full tracks on all movers in the video data, with activity annotations for 46 activity types and 7 object types. The annotations are partitioned into train and validate sets. The Annotation Guidelines define and detail the object and activity types for these annotations.

Contact and Acknowledgements

A dedicated e-mail list to share information and report issues about the dataset can be found here. Please subscribe the list for announcements and questions and answers.

The VIRAT Video Dataset collection work is supported by Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-08-C-0135. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA.

The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.