We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish images that lead multiple annotators to segment different foreground objects (ambiguous) from those that yield only minor inter-annotator differences on the same object. Taking images from eight widely used datasets, we crowdsource labels indicating whether each image is “ambiguous” or “not ambiguous” to segment, and use these labels to construct a new dataset we call STATIC. Using STATIC, we develop a system that automatically predicts which images are ambiguous. Experiments demonstrate the advantage of our prediction system over existing saliency-based methods on images from vision benchmarks and on images taken by blind people trying to recognize objects in their environment. Finally, we introduce a crowdsourcing system that reduces the cost of collecting the full diversity of valid “ground truth” foreground object segmentations by soliciting extra segmentations only when ambiguity is expected. Experiments show our system eliminates up to 47% of human effort compared to existing crowdsourcing methods with no loss in capturing the diversity of ground truths.
D. Gurari, K. He, B. Xiong, J. Zhang, M. Sameki, S. D. Jain, S. Sclaroff, M. Betke, and K. Grauman. “Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s).” International Journal of Computer Vision (IJCV), accepted January 2018.
Caffe model: top-performing CNN-FT model and prototxt file
The authors gratefully acknowledge funding from the Office of Naval Research (ONR YIP N00014-12-1-0754) and National Science Foundation (IIS-1421943) and thank the anonymous crowd workers for participating in our experiments.
For questions or comments, please contact:
Danna Gurari
danna.gurari@ischool.utexas.edu