Search | VHL Regional Portal

Outracing champion Gran Turismo drivers with deep reinforcement learning.

Wurman, Peter R; Barrett, Samuel; Kawamoto, Kenta; MacGlashan, James; Subramanian, Kaushik; Walsh, Thomas J; Capobianco, Roberto; Devlic, Alisa; Eckert, Franziska; Fuchs, Florian; Gilpin, Leilani; Khandelwal, Piyush; Kompella, Varun; Lin, HaoChih; MacAlpine, Patrick; Oller, Declan; Seno, Takuma; Sherstan, Craig; Thomure, Michael D; Aghabozorgi, Houmehr; Barrett, Leon; Douglas, Rory; Whitehead, Dion; Dürr, Peter; Stone, Peter; Spranger, Michael; Kitano, Hiroaki.

Nature ; 602(7896): 223-228, 2022 02.

Article in English | MEDLINE | ID: mdl-35140384

ABSTRACT

Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world's best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing's important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world's best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.

Subject(s)

Automobile Driving , Deep Learning , Reinforcement, Psychology , Sports , Video Games , Automobile Driving/standards , Competitive Behavior , Humans , Reward , Sports/standards

Evolution of flexibility and rigidity in retaliatory punishment.

Morris, Adam; MacGlashan, James; Littman, Michael L; Cushman, Fiery.

Proc Natl Acad Sci U S A ; 114(39): 10396-10401, 2017 09 26.

Article in English | MEDLINE | ID: mdl-28893996

ABSTRACT

Natural selection designs some social behaviors to depend on flexible learning processes, whereas others are relatively rigid or reflexive. What determines the balance between these two approaches? We offer a detailed case study in the context of a two-player game with antisocial behavior and retaliatory punishment. We show that each player in this game (a "thief" and a "victim") must balance two competing strategic interests. Flexibility is valuable because it allows adaptive differentiation in the face of diverse opponents. However, it is also risky because, in competitive games, it can produce systematically suboptimal behaviors. Using a combination of evolutionary analysis, reinforcement learning simulations, and behavioral experimentation, we show that the resolution to this tension, and the adaptation of social behavior in this game, hinges on the game's learning dynamics. Our findings clarify punishment's adaptive basis, offer a case study of the evolution of social preferences, and highlight an important connection between natural selection and learning in the resolution of social conflicts.

Subject(s)

Punishment/psychology , Social Behavior , Social Control, Formal , Aggression/psychology , Cooperative Behavior , Humans , Learning/physiology , Reward

Initial Progress Toward Development of a Voice-Based Computer-Delivered Motivational Intervention for Heavy Drinking College Students: An Experimental Study.

Kahler, Christopher W; Lechner, William J; MacGlashan, James; Wray, Tyler B; Littman, Michael L.

JMIR Ment Health ; 4(2): e25, 2017 Jun 28.

Article in English | MEDLINE | ID: mdl-28659259

ABSTRACT

BACKGROUND: Computer-delivered interventions have been shown to be effective in reducing alcohol consumption in heavy drinking college students. However, these computer-delivered interventions rely on mouse, keyboard, or touchscreen responses for interactions between the users and the computer-delivered intervention. The principles of motivational interviewing suggest that in-person interventions may be effective, in part, because they encourage individuals to think through and speak aloud their motivations for changing a health behavior, which current computer-delivered interventions do not allow.

OBJECTIVE: The objective of this study was to take the initial steps toward development of a voice-based computer-delivered intervention that can ask open-ended questions and respond appropriately to users' verbal responses, more closely mirroring a human-delivered motivational intervention.

METHODS: We developed (1) a voice-based computer-delivered intervention that was run by a human controller and that allowed participants to speak their responses to scripted prompts delivered by speech generation software and (2) a text-based computer-delivered intervention that relied on the mouse, keyboard, and computer screen for all interactions. We randomized 60 heavy drinking college students to interact with the voice-based computer-delivered intervention and 30 to interact with the text-based computer-delivered intervention and compared their ratings of the systems as well as their motivation to change drinking and their drinking behavior at 1-month follow-up.

RESULTS: Participants reported that the voice-based computer-delivered intervention engaged positively with them in the session and delivered content in a manner consistent with motivational interviewing principles. At 1-month follow-up, participants in the voice-based computer-delivered intervention condition reported significant decreases in quantity, frequency, and problems associated with drinking, and increased perceived importance of changing drinking behaviors. In comparison to the text-based computer-delivered intervention condition, those assigned to voice-based computer-delivered intervention reported significantly fewer alcohol-related problems at the 1-month follow-up (incident rate ratio 0.60, 95% CI 0.44-0.83, P=.002). The conditions did not differ significantly on perceived importance of changing drinking or on measures of drinking quantity and frequency of heavy drinking.

CONCLUSIONS: Results indicate that it is feasible to construct a series of open-ended questions and a bank of responses and follow-up prompts that can be used in a future fully automated voice-based computer-delivered intervention that may mirror more closely human-delivered motivational interventions to reduce drinking. Such efforts will require using advanced speech recognition capabilities and machine-learning approaches to train a program to mirror the decisions made by human controllers in the voice-based computer-delivered intervention used in this study. In addition, future studies should examine enhancements that can increase the perceived warmth and empathy of voice-based computer-delivered intervention, possibly through greater personalization, improvements in the speech generation software, and embodying the computer-delivered intervention in a physical form.

Social is special: A normative framework for teaching with and learning from evaluative feedback.

Ho, Mark K; MacGlashan, James; Littman, Michael L; Cushman, Fiery.

Cognition ; 167: 91-106, 2017 10.

Article in English | MEDLINE | ID: mdl-28341268

ABSTRACT

Humans often attempt to influence one another's behavior using rewards and punishments. How does this work? Psychologists have often assumed that "evaluative feedback" influences behavior via standard learning mechanisms that learn from environmental contingencies. On this view, teaching with evaluative feedback involves leveraging learning systems designed to maximize an organism's positive outcomes. Yet, despite its parsimony, programs of research predicated on this assumption, such as ones in developmental psychology, animal behavior, and human-robot interaction, have had limited success. We offer an explanation by analyzing the logic of evaluative feedback and show that specialized learning mechanisms are uniquely favored in the case of evaluative feedback from a social partner. Specifically, evaluative feedback works best when it is treated as communicating information about the value of an action rather than as a form of reward to be maximized. This account suggests that human learning from evaluative feedback depends on inferences about communicative intent, goals and other mental states, much like learning from other sources, such as demonstration, observation and instruction. Because these abilities are especially developed in humans, the present account also explains why evaluative feedback is far more widespread in humans than non-human animals.

Subject(s)

Feedback, Psychological , Punishment , Reward , Social Behavior , Communication , Humans , Models, Psychological , Reinforcement, Psychology
