Attributes of programming by demonstration of a task expressed as a Locally k-Testable language
Date
2016
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
University of Delaware
Abstract
In the process of pbd (Programming by Demonstration), an expert programs a task in a machine by providing training examples to it. This thesis proposes a pbd approach that adds a combination of some attributes to the process, applicable for a task representable as a particular class of regular languages called Locally k-Testable or LTk . The addition of these attributes is challenging within state- of-the-art methods like Inverse Reinforcement Learning (IRL), Behavior Networks, and Precedence Graphs. The attributes include a generalization of task within a manageable size of training without applying a linear form to a task (task for IRL has to be linear), an implementation of new tasks by algebraically combining known tasks, and a quantification of the training required for complete learning (if known randomness underlie the training). We refer to this pbd approach by name pbd-LTk. ☐ pbd-LTk can train a learner without losing potential solutions. The learner can compute and implement a ‘Boolean composition’ of learned tasks without learn- ing this composition from scratch using a separate training. A designer of a procedure implementing pbd-LTk can assess the feasibility of learning by using the quantified expected size of training. The system arising from an interaction between a learner and its environment is modeled here as a mdp. The state space of the mdp is partitioned by using a predicate based abstraction method that enables an expression of the physical evolution of the mdp in a symbolic form which uses alphabet of LTk language modeling task. The thesis adopts two learning methods (extended string-extension learning and stochastic finite learning) applicable to LTk languages. Each of the methods learns a LTk task specification as a graph which can constrain the evolution of an mdp via a unique product operation. ☐ From the set of mdp trajectories, the process extracts that subset which meets the learned LTk specification. The attributes of pbd-LTk are validated numerically and experimentally through two case studies. In the first application, an agent navigating a Gridworld achieves a goal while avoiding prohibited transitions. In the second one, a mobile 5-DOF manipulator organizes objects of two colors on the shelves of a cabinet while keeping object-n-shelf color match and maintaining a check on the number of placed objects of each kind. In pbd-LTk , the learner converged to all possible options of terminating the navigation task whereas an alternative method (IRL) found it challenging to achieve without additional demonstrations. Prevalent methods have not considered either the compositionality of their task model or the pre-assessment of training required for a complete learning. ☐ The method outlined here provides a new tool to perform pbd. However, it has some limitations: a time complexity exponential in the number of alphabet sym- bols, a learner knows the nature of task beforehand, and stochastic finite learning assumes a known randomness in training. The limitations of pbd-LTk are defend- able by its benefits and by a similar level of restriction in the existing literature. The approach can be extended to other classes of languages which are already proved to model a variety of real life tasks.
Description
Keywords
Generalization, Grammatical inference, Learning from demonstration, Motion planning, Programming from demonstration, Sample complexity