Disability-First Design and Creation of A Dataset Showing Private Visual Information Collected With People Who Are Blind

Tanusree Sharma, Abigale Stangl, Lotus Zhang, Yu-Yun Tseng, Inan Xu, Leah Findlater, Danna Gurari, and Yang Wang

Abstract

We present the design and creation of a disability-first dataset, “BIV-Priv,” which contains 728 images and 728 videos of 14 private categories captured by 26 blind participants to support downstream development of artificial intelligence (AI) models. While best practices in dataset creation typically attempt to eliminate private content, some applications require such content for model development. We describe our approach to creating this dataset with private content in an ethical way, including using props rather than participants’ own private objects and balancing multi-disciplinary perspectives (e.g., accessibility, privacy, computer vision) to meet tangible metrics (e.g., diversity, categories, amount of content) that support AI innovations. We observed challenges that our participants encountered during data collection, including accessibility issues (e.g., understanding foreground vs. background object placement) and issues due to the sensitive nature of the content (e.g., discomfort in capturing some props, such as condoms, around family members).

Publication

Tanusree Sharma, Abigale Stangl, Lotus Zhang, Yu-Yun Tseng, Inan Xu, Leah Findlater, Danna Gurari, and Yang Wang. Disability-First Design and Creation of A Dataset Showing Private Visual Information Collected With People Who Are Blind. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23–28, 2023, Hamburg, Germany.

Accepted Paper, Conference Presentation Recording, Preview, Supplemental Material: ACM Digital Library

Accepted PDF Version: Download

BIV-Priv Dataset

The BIV-Priv Image dataset is available here.

The BIV-Priv Video dataset is available here.
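Once downloaded, the dataset can be inspected with a few lines of Python. The sketch below is a minimal example, assuming a hypothetical layout in which images live under a `biv_priv_images/` directory and a `labels.csv` file maps each filename to one of the 14 private categories; these names are illustrative assumptions, not the dataset's actual file structure.

```python
import csv
from pathlib import Path

# Hypothetical layout (an assumption for illustration, not the released
# structure): images under biv_priv_images/, plus a labels.csv with
# "filename" and "category" columns covering the 14 private categories.
DATA_ROOT = Path("biv_priv_images")
LABELS_CSV = DATA_ROOT / "labels.csv"

def load_labels(csv_path: Path) -> dict[str, str]:
    """Read a filename -> category mapping from the (assumed) labels file."""
    with csv_path.open(newline="") as f:
        return {row["filename"]: row["category"] for row in csv.DictReader(f)}

if __name__ == "__main__":
    labels = load_labels(LABELS_CSV)
    # Tally images per private category; the paper reports 728 images
    # across 14 categories, captured by 26 blind participants.
    counts: dict[str, int] = {}
    for category in labels.values():
        counts[category] = counts.get(category, 0) + 1
    for category, n in sorted(counts.items()):
        print(f"{category}: {n} images")
```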

Acknowledgements

The authors thank the participants for their contributions and for sharing their insights. We also thank Yaman Yu and Smirity Kaushik for their help with the user study. This research was supported in part by the National Science Foundation (NSF) grants #2126314, #2028387, #2125925, and #2148080, and by the CRA CIFellows program.

Contact

For questions and/or comments, feel free to contact:

Tanusree Sharma
tsharma6@illinois.edu