Abstract: Current models of human drivers in autonomous driving are insufficient. Often, autonomous driving algorithms model human drivers simply as opaque moving obstacles without reward functions of their own. The goal of this project is to determine the prediction accuracy of a continuous inverse reinforcement learning (IRL) algorithm in recovering a human driver’s reward function. While standard reinforcement learning seeks to produce actions that maximize a reward function, IRL seeks to produce a reward function that maximizes the probability of the demonstrated actions. IRL works by optimizing the weights of a feature vector, where features might include the velocity of the car, the centering of the car in its lane, or the distance between the car and other obstacles. To verify the prediction accuracy of IRL, we must first verify that the set of features intrinsic to the model is good. That is, we must ask: can we reproduce all (or most) useful driving behaviors from the current set of features? My work determined the limitations of a proposed set of features via a 2D driving simulator and user studies.
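To make the weighted-feature idea concrete, here is a minimal sketch of a linear reward model of the kind described above. The feature definitions and the `state` keys (`speed`, `target_speed`, `lane_offset`, `obstacle_dist`) are hypothetical illustrations, not the project's actual feature set:

```python
import numpy as np

def features(state):
    """Hypothetical feature vector for a driving state, loosely following
    the examples named above: speed, lane centering, and obstacle distance.
    Each feature is shaped so that larger values mean "better" driving."""
    return np.array([
        -(state["speed"] - state["target_speed"]) ** 2,  # stay near a target speed
        -state["lane_offset"] ** 2,                      # stay centered in the lane
        -np.exp(-state["obstacle_dist"]),                # keep away from obstacles
    ])

def reward(weights, state):
    """Linear reward: a weighted sum of features. IRL recovers `weights`
    from human demonstrations."""
    return weights @ features(state)
```

Under this model, everything the driver "cares about" must be expressible through the features, which is exactly why validating the feature set is a prerequisite for validating IRL itself.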

Story of Contribution: To evaluate the baseline driving model and feature set, I formulated a list of standard driving maneuvers based on the California Driver Handbook. I then programmed interactive 2D driving simulations in Python, had participants perform the maneuvers in the simulator, and ran IRL with varying feature vectors on these demonstrations to see which models could reproduce similar behaviors. Iterating this process led to the addition of new features to the model.
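As a rough illustration of the "run IRL on demonstrations" step, the sketch below fits reward weights in the maximum-entropy IRL style: it nudges the weights until the feature counts of trajectories that score well under the reward match the feature counts of the human demonstrations. It reuses the hypothetical `features` helper from the earlier sketch and approximates the partition function with a fixed set of candidate trajectories, a common simplification; this is not the project's exact continuous-IRL implementation:

```python
import numpy as np

def traj_features(traj):
    """Sum of per-state features along a trajectory (a list of states);
    assumes the hypothetical `features(state)` helper sketched earlier."""
    return sum(features(s) for s in traj)

def maxent_irl(demos, candidates, lr=0.01, iters=200):
    """Fit reward weights so demonstrated trajectories become likely
    under a softmax over candidate trajectories (MaxEnt IRL sketch)."""
    w = np.zeros(3)  # one weight per feature in the sketch above
    demo_f = np.mean([traj_features(t) for t in demos], axis=0)
    cand_f = np.array([traj_features(t) for t in candidates])
    for _ in range(iters):
        # Softmax over candidate trajectories under the current reward.
        logits = cand_f @ w
        p = np.exp(logits - logits.max())
        p /= p.sum()
        # Gradient of the log-likelihood: demonstrated feature counts
        # minus expected feature counts under the current reward.
        w += lr * (demo_f - p @ cand_f)
    return w
```

Comparing the behavior induced by the fitted weights against the human demonstrations is what revealed which maneuvers the feature set could, and could not, express.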

Lessons Learned: Through this research experience, I learned how to read and manage a large codebase written by others (my Python codebase was originally written by a graduate student and contained several thousand lines of code) and how to improve it with relevant features and helpful documentation. I learned to design full-factorial user studies and to cope with the “noisy” human data these studies produced. Additionally, I became familiar with the current state of IRL algorithms and gained new exposure to topics such as control theory.

Team: Anca Dragan, Dorsa Sadigh

Presenting at Berkeley Deep Drive.

Presenting at the SUPERB Poster Session.