In this paper we investigate learning visual models for the steps of ordinary
tasks using weak supervision via instructional narrations and an ordered list
of steps instead of strong supervision via temporal annotations. At the heart
of our approach is the observation that weakly supervised learning may be
easier if a model shares components while learning different steps: "pour egg"
should be trained jointly with other tasks involving "pour" and "egg". We
formalize this in a component model for recognizing steps and a weakly
supervised learning framework that can learn this model under temporal
constraints from narration and the list of steps. Existing datasets do not permit
a systematic study of sharing, so we also gather a new dataset, CrossTask,
aimed at assessing cross-task sharing. Our experiments demonstrate that sharing
across tasks improves performance, especially when done at the component level,
and that our component model can parse previously unseen tasks by virtue of its
compositionality.
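
The idea of sharing components across steps can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the averaging rule, the two-dimensional features, and the hand-set weights are all assumptions chosen for illustration. Each step name is split into component words, each component has one linear scorer, and a step's score is the mean of its components' scores, so "pour egg" and "pour milk" reuse the same "pour" scorer.

```python
from typing import Dict, List


def step_components(step: str) -> List[str]:
    """Split a step name such as 'pour egg' into component words (assumed scheme)."""
    return step.lower().split()


def step_score(step: str,
               component_weights: Dict[str, List[float]],
               features: List[float]) -> float:
    """Score one video segment for a step by averaging its components' linear scores."""
    comps = step_components(step)
    total = 0.0
    for comp in comps:
        weights = component_weights[comp]  # shared by every step that contains comp
        total += sum(w * x for w, x in zip(weights, features))
    return total / len(comps)


# Hypothetical hand-set weights; in practice these would be learned.
weights = {
    "pour": [1.0, 0.0],
    "egg": [0.0, 1.0],
    "milk": [0.0, -1.0],
}
segment = [2.0, 4.0]  # hypothetical feature vector for one video segment

egg_score = step_score("pour egg", weights, segment)
# "pour milk" needs no classifier of its own: it is composed from existing components.
milk_score = step_score("pour milk", weights, segment)
print(egg_score, milk_score)
```

Because a step's scorer is assembled from its components, any step whose components all appear in `weights` can be scored even if that exact step was never seen, which is the sense of compositionality the abstract refers to.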
Cross-task weakly supervised learning from instructional videos
A component model for recognizing task steps using weak supervision via instructional narrations and step lists outperforms non-shared models, especially when sharing occurs at the component level.
- Year
- 2019
- Authors
- 6
Attribution
- Abstract & full text
- arxiv.org/abs/1903.08225v2
- TL;DR
- Semantic Scholar