Decision-making and control of an autonomous agent in interaction with partially-known agents

Date
2022
Publisher
University of Delaware
Abstract
This dissertation focuses on the higher-level decision-making and learning, as well as the lower-level motion planning and control, of an autonomous agent in interaction with other dynamic and goal-driven, but partially known, agents. In particular, this dissertation considers a reinforcement learning (RL)-based approach for the higher-level decision-making, and, for the lower-level control, it studies motion planning for signal temporal logic (STL) tasks.

Regarding the higher-level decision-making, this dissertation first considers a single-agent model of the interaction, for which a new probably approximately correct (PAC) RL algorithm for Markov decision processes (MDPs) is offered that intelligently maintains favorable features of both model-based and model-free methodologies. The designed algorithm, referred to as Dyna-Delayed Q-learning (DDQ), combines the model-free Delayed Q-learning and model-based R-max algorithms, and its worst-case performance is guaranteed to be comparable to the better worst-case performance of its model-based and model-free components. A PAC analysis of the DDQ algorithm, including the derivation of its sample complexity bound, is also presented. Numerical results evaluate the new algorithm's sample efficiency against its parent algorithms and against the best known PAC model-free and model-based algorithms. In addition, a real-world experimental implementation of DDQ in the context of pediatric motor rehabilitation facilitated by infant-robot interaction highlights the potential benefits of the reported method.

The dissertation further investigates a multi-agent model of the interaction, for which a theoretical framework for PAC multi-agent reinforcement learning (MARL) is developed. In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC. Using this theoretical framework together with the idea of delayed Q-learning, the well-known Nash Q-learning algorithm is modified to build a new PAC MARL algorithm, referred to as the Delayed Nash Q-learning algorithm, for general-sum Markov games. Comparative numerical results demonstrate the algorithm's performance and robustness.

Because unobservability of the reward functions renders the known MARL methods inapplicable, the dissertation addresses this problem through a novel combination of MARL and inverse reinforcement learning (IRL) techniques. The developed method can concurrently estimate the agents' reward functions and learn an equilibrium policy for the autonomous agent. Numerical evidence in support of the proposed methodology is offered in the context of grid-world examples.

Regarding the lower-level motion planning and control, this dissertation reports a new, computationally efficient approach to STL control synthesis. This method uses navigation functions as the basis for constructing control barrier functions, and composes those navigation-function-based barrier functions through nonsmooth mappings to realize Boolean operations between the predicates they encode. Because of these two key features, the reported approach (i) covers a larger fragment of STL than existing approaches, (ii) alleviates some of the computational cost associated with evaluating the control law, and (iii) relaxes some of the conservativeness stemming from smooth implementations of Boolean operators. The efficacy of this new approach is demonstrated with various simulation case studies.
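To illustrate the delayed-update idea that both DDQ and Delayed Nash Q-learning build on, the following is a minimal sketch of a Delayed Q-learning-style agent. It is not the dissertation's DDQ algorithm (no model-based R-max component, and the full bookkeeping of attempted updates is simplified); the state/action encoding and the parameters `m`, `eps1`, and `gamma` are illustrative choices. Q-values start optimistic and are updated only after `m` samples have been batched, and only when the batch average lowers the value by a meaningful margin, which is what enables PAC-style sample complexity bounds.

```python
from collections import defaultdict

class DelayedQLearner:
    """Simplified sketch of Delayed Q-learning: optimistic Q-values
    updated in batches of m samples, with a minimum-improvement test."""

    def __init__(self, actions, gamma=0.95, m=5, eps1=0.1, q0=1.0):
        self.actions = actions
        self.gamma = gamma
        self.m = m            # samples accumulated before an attempted update
        self.eps1 = eps1      # minimum decrease required to accept an update
        self.Q = defaultdict(lambda: q0)   # optimistic initialization
        self.U = defaultdict(float)        # accumulated update targets
        self.count = defaultdict(int)      # samples since last attempt

    def greedy_action(self, s):
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def observe(self, s, a, r, s_next):
        """Accumulate a Bellman target; attempt an update every m samples."""
        target = r + self.gamma * max(self.Q[(s_next, b)] for b in self.actions)
        self.U[(s, a)] += target
        self.count[(s, a)] += 1
        if self.count[(s, a)] == self.m:
            avg = self.U[(s, a)] / self.m
            # Accept only updates that lower Q by at least eps1; this keeps
            # Q optimistic and bounds the total number of successful updates.
            if self.Q[(s, a)] - avg >= self.eps1:
                self.Q[(s, a)] = avg + self.eps1
            self.U[(s, a)] = 0.0
            self.count[(s, a)] = 0
```

For example, repeatedly observing a reward of -1 for one action eventually drives its (initially optimistic) Q-value down, after which the greedy policy switches to the untried action.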
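The nonsmooth composition of barrier functions mentioned above can be sketched as follows. Here each barrier b(x) >= 0 encodes satisfaction of one predicate, conjunction is realized by a pointwise min, and disjunction by a pointwise max; the circular regions and the combined specification are illustrative examples, not taken from the dissertation, and the sketch omits the navigation-function construction of the barriers themselves.

```python
import numpy as np

def circle_barrier(center, radius):
    """b(x) >= 0 iff x lies inside a disk: a simple reach-type predicate."""
    c = np.asarray(center, dtype=float)
    return lambda x: radius**2 - np.sum((np.asarray(x, dtype=float) - c)**2)

def b_and(*bs):
    """Conjunction: all predicates hold  <=>  min of barriers >= 0."""
    return lambda x: min(b(x) for b in bs)

def b_or(*bs):
    """Disjunction: some predicate holds  <=>  max of barriers >= 0."""
    return lambda x: max(b(x) for b in bs)

# Example spec: stay inside region A while avoiding obstacle O
# (negation of a predicate flips the sign of its barrier).
in_A = circle_barrier((0.0, 0.0), 2.0)
outside_O = lambda x: -circle_barrier((1.0, 0.0), 0.5)(x)
spec = b_and(in_A, outside_O)
```

The min/max mappings are nonsmooth but exact, which is the source of the reduced conservativeness relative to smooth approximations of Boolean operators.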
Keywords
Decision-making, Autonomous agents, Reinforcement learning, Signal temporal logic