[CVPR2017] End-to-End Learning of Driving Models from Large-Scale Video Datasets
The goal of this paper is to learn a driving model, or policy, from large-scale, uncalibrated video sources. The proposed model is generic in that it learns to predict a future motion path given the car's current state.
For policy learning, the authors propose a Fully Convolutional Network (FCN) - Long Short-Term Memory (LSTM) architecture. The inputs to the network are the front-view driving image at the current time and the car's previous egomotion. The output of the network is the car's next motion, which can be either a continuous steering-wheel angle or one of four discrete actions {straight, stop, left-turn, right-turn}. The paper includes a figure illustrating this architecture.
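To make the architecture concrete, below is a minimal PyTorch sketch of such an FCN-LSTM policy with the four-way discrete action head. The layer sizes, the two-dimensional egomotion input, and all module names are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class FCNLSTMPolicy(nn.Module):
    """Sketch of an FCN-LSTM driving policy (dimensions are illustrative)."""

    def __init__(self, num_actions=4, ego_dim=2, hidden_dim=64):
        super().__init__()
        # Fully convolutional visual encoder (stand-in for the paper's FCN).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims to a feature vector
        )
        # The LSTM fuses the visual feature with the previous egomotion.
        self.lstm = nn.LSTM(input_size=64 + ego_dim,
                            hidden_size=hidden_dim, batch_first=True)
        # Head over {straight, stop, left-turn, right-turn}.
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, frames, prev_ego):
        # frames:   (B, T, 3, H, W) front-view images
        # prev_ego: (B, T, ego_dim) previous egomotion (e.g. speed, yaw rate)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1)  # (B*T, 64)
        feats = feats.view(B, T, -1)
        out, _ = self.lstm(torch.cat([feats, prev_ego], dim=-1))
        return self.action_head(out)  # per-step action logits

# Usage on dummy data:
model = FCNLSTMPolicy()
logits = model(torch.randn(2, 5, 3, 90, 160), torch.randn(2, 5, 2))
print(logits.shape)  # torch.Size([2, 5, 4])
```

For the continuous variant, the classification head would simply be replaced by a regression head predicting the steering-wheel angle.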
What is noteworthy here is that the proposed network is trained to produce not only the next motion but also a semantic segmentation of the input image. The authors call this learning method 'privileged learning' and report that this extra supervision helps train a better model than would be possible using only the view available at test time.
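In training terms, this amounts to a multi-task loss: the motion-prediction term is combined with a segmentation side task that is dropped at test time. The sketch below illustrates the idea; the weight `lambda_seg` and the tensor shapes are assumptions, not the paper's exact values.

```python
import torch.nn.functional as F

def privileged_loss(action_logits, action_targets,
                    seg_logits, seg_targets, lambda_seg=0.1):
    # Primary objective: predict the next discrete motion.
    # action_logits: (B, T, num_actions), action_targets: (B, T)
    motion_loss = F.cross_entropy(
        action_logits.flatten(0, 1), action_targets.flatten())
    # Side objective: per-pixel semantic segmentation of the input frame,
    # supervision available only at training time ("privileged" information).
    # seg_logits: (N, num_classes, H, W), seg_targets: (N, H, W)
    seg_loss = F.cross_entropy(seg_logits, seg_targets)
    return motion_loss + lambda_seg * seg_loss
```

At inference, only the motion head is evaluated, so the segmentation branch adds no runtime cost once training is done.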