Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have generally been limited to in-situ models learned from a single vehicle or a simulation environment. We advocate learning a generic vehicle motion model from large-scale crowd-sourced video data, and develop an end-to-end trainable architecture for learning to predict a distribution over future vehicle egomotion from instantaneous monocular camera observations and previous vehicle state. Our model incorporates a novel FCN-LSTM architecture, which can be learned from large-scale crowd-sourced vehicle action data, and leverages available scene segmentation side tasks to improve performance under a privileged learning paradigm.
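The abstract does not spell out the training objective, so the following is only a rough sketch of the privileged-learning idea under stated assumptions. It assumes future egomotion is discretized into bins (the bin edges are hypothetical), that the driving head outputs logits over those bins, and that a pixel-wise segmentation cross-entropy, scaled by a hypothetical `seg_weight`, is added only when segmentation labels are present, i.e. at training time. The function names `discretize`, `privileged_loss`, and the weighting scheme are illustrative and not necessarily the paper's exact formulation.

```python
import bisect
import math
from typing import List, Optional, Sequence


def discretize(value: float, edges: Sequence[float]) -> int:
    """Map a continuous egomotion value (e.g. a yaw rate) to a bin index.

    `edges` are hypothetical, sorted bin boundaries; len(edges) + 1 bins result.
    """
    return bisect.bisect_right(edges, value)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    return -log_softmax(logits)[target]


def privileged_loss(
    motion_logits: Sequence[float],
    motion_target: int,
    seg_logits: Sequence[Sequence[float]],
    seg_targets: Optional[Sequence[int]],
    seg_weight: float = 0.1,  # hypothetical weighting of the side task
) -> float:
    """Driving loss plus an optional, averaged per-pixel segmentation loss.

    The segmentation term is used only when labels are given, mirroring the
    assumption that privileged information is available during training only.
    """
    motion = cross_entropy(motion_logits, motion_target)
    if not seg_targets:
        return motion
    seg = sum(
        cross_entropy(px_logits, px_target)
        for px_logits, px_target in zip(seg_logits, seg_targets)
    )
    return motion + seg_weight * seg / len(seg_targets)


if __name__ == "__main__":
    target = discretize(0.02, [-0.05, 0.05])  # falls in the middle bin, index 1
    print(privileged_loss([0.2, 1.5, -0.3], target, [[2.0, 0.1], [0.3, 1.1]], [0, 1]))
```

At test time the same function is called with `seg_targets=None`, so only the motion term remains; the segmentation branch shapes the shared features during training without being required at inference.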
End-to-end Learning of Driving Models from Large-scale Video Datasets
A generic vehicle motion model is learned from large-scale crowd-sourced video data using an FCN-LSTM architecture, and scene segmentation side tasks are used to improve performance.
- Year: 2016
- Authors: 4
- Hosting: Abstract only (arXiv)
Attribution
- Abstract & full text: arxiv.org/abs/1612.01079v2
- TL;DR: Semantic Scholar