WP4. Lifelong improvement of planning and decision making
The goal of this WP is to provide the decision-making techniques used by the robot, including the creation of high-level plans, reasoning about goals, and learning and adaptation to user preferences. This goal can be divided into the following sub-goals and associated tasks:
T4.1 Lifelong Learning for Planning
The high-level behaviour of the robot will be mainly driven by Automated Planning. The aim of this task is to research how to produce better plans, where "better" can be measured along multiple dimensions: time to generate plans, perceived quality, robustness, predictability, reusability, and naturalness. Research will be conducted mainly in two fields: learning to improve the planning process, and learning to reuse and repair already created plans. In the first field, research will address the automatic definition of portfolios of planners, selecting the most adequate one for the problem being solved, and the learning of control rules, macro-operators or domain-specific heuristics that guide the planner toward the solution. In plan repair and reuse, research will focus on reducing the number of changes to the original plan and on transferring knowledge from one plan to the next from a lifelong perspective, making the robot's behaviour more predictable; this work is thus tightly coupled with task T4.3. Learning will be performed on-line and lifelong.
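The portfolio idea above can be illustrated with a minimal sketch: a selector that accumulates, over the robot's lifetime, the average solving time of each planner per problem class and picks the historically best one. All names (planner labels, problem classes, the class itself) are illustrative assumptions, not part of any existing planning system.

```python
from collections import defaultdict

class PlannerPortfolio:
    """Toy lifelong portfolio: learns which planner tends to solve each
    class of problems fastest and selects accordingly (illustrative only)."""

    def __init__(self, planners):
        self.planners = list(planners)  # candidate planner names
        # running average solve time per (problem_class, planner)
        self.stats = defaultdict(dict)

    def record(self, problem_class, planner, solve_time):
        """Update the running average after every solved problem."""
        avg, n = self.stats[problem_class].get(planner, (0.0, 0))
        self.stats[problem_class][planner] = ((avg * n + solve_time) / (n + 1), n + 1)

    def select(self, problem_class):
        """Pick the planner with the best average time for this class;
        fall back to the first planner when nothing has been observed yet."""
        observed = self.stats.get(problem_class)
        if not observed:
            return self.planners[0]
        return min(observed, key=lambda p: observed[p][0])
```

A real system would use richer problem features than a discrete class label, but the lifelong loop (observe, record, reselect) is the same.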
T4.2 Lifelong Reinforcement Learning
Probabilistic Policy Reuse (PPR) is a well-known approach for transferring learned knowledge in lifelong Reinforcement Learning. In this task, we will advance PPR algorithms from three different perspectives: the definition of similarity metrics among tasks and problems, which will make it possible to decide when transferring knowledge between learning tasks will pay off in the long term; the discovery of structure in the domains, and the construction of abstractions, models and hierarchies, which will permit the transfer of generalized knowledge across the robot's lifetime; and the integration of past knowledge into the current decision-making processes of the CORTEX cognitive architecture, with the goal of improving current and future learning processes and thus enabling lifelong decision making.
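The core exploration mechanism of PPR can be sketched as follows: when choosing an action, the agent follows a previously learned policy with some probability, and otherwise acts epsilon-greedily on the Q-values it is learning for the new task. This is a minimal sketch of that action-selection step; the function name and arguments are our own, not an existing API.

```python
import random

def pi_reuse_action(state, q, past_policy, psi, epsilon, actions):
    """One action choice under a PPR-style exploration strategy:
    with probability psi exploit the past (reused) policy, otherwise
    act epsilon-greedily on the Q-values of the task being learned."""
    if random.random() < psi:
        return past_policy[state]          # bias exploration toward the old task
    if random.random() < epsilon:
        return random.choice(actions)      # undirected exploration
    # greedy exploitation of the new task's current estimates
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

In full PPR, psi typically decays within an episode, so early steps are guided by past knowledge while later steps rely increasingly on what has been learned for the current task.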
T4.3 Lifelong Reasoning and Learning about Goals and User Preferences
The objective of this task is to adapt the generated plans to the preferences of the user, so that the robot behaves “as it should” from the user’s point of view. Preferences have usually been considered immutable and manually defined. Within the project, preferences will be learned automatically and will change over time, following the user’s desires. Moreover, most current work on preferences in planning is devoted to reasoning about preferences over goals, either when not all of them can be achieved (oversubscription problems) or when there is a trade-off between the benefit of achieving a goal and its cost (net-benefit problems). PDDL, the standard language used by the planning community, also allows the definition of trajectory constraints that guide the plan toward including certain actions, but little work has addressed the user’s preferences over the way goals are achieved, i.e. over the different plans that can achieve a goal; in most cases, plan cost is the only metric used. Research will be conducted on both goal and plan preferences to endow the robot with the skills to adapt its behaviour to changing user expectations.
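One simple way to learn plan preferences that change over time is to score candidate plans by a weighted sum of plan features (duration, number of actions, use of particular actions) and update the weights from pairwise user feedback, perceptron-style. The sketch below is an illustrative assumption about how such learning could work, not a method prescribed by the project.

```python
def score(weights, features):
    """Utility of a candidate plan as a weighted sum of its features,
    e.g. {"duration": 8.0, "num_actions": 4.0} (feature names illustrative)."""
    return sum(weights.get(k, 0.0) * v for k, v in features.items())

def update_from_feedback(weights, preferred, rejected, eta=0.1):
    """Perceptron-style update: when the user preferred one plan over
    another but the current scores disagree, shift the weights toward
    the features of the preferred plan. Repeated over the robot's
    lifetime, this tracks drifting user preferences."""
    if score(weights, preferred) <= score(weights, rejected):
        for k in set(preferred) | set(rejected):
            delta = preferred.get(k, 0.0) - rejected.get(k, 0.0)
            weights[k] = weights.get(k, 0.0) + eta * delta
    return weights
```

After a few rounds of feedback, the learned weights can rank the alternative plans a planner produces for the same goal, complementing cost as the sole metric.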
T4.4 Use-Case Modelling and Integration in a High-level Planning-Learning-Execution architecture
The objective of this task is the formalization of the use cases designed in WP8 in a standard planning formalism, such as PDDL, for their integration into the high-level reasoning processes of CORTEX. The PELEA architecture, also used in the previous THERAPIST project, will provide the planning, learning, execution and monitoring mechanisms for high-level decision making, building on the results of the THERAPIST project (see https://www.youtube.com/NAOTherapist).
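The control cycle of a planning-learning-execution-monitoring architecture such as PELEA can be sketched abstractly: plan from the current state, execute one action at a time, and replan whenever the sensed state diverges from the state the plan expected. The function names and interfaces below are illustrative assumptions, not PELEA's actual API.

```python
def control_cycle(plan_fn, execute_fn, sense_fn, goal_test, state, max_steps=100):
    """Minimal plan-execute-monitor loop in the spirit of PELEA.
    plan_fn(state) returns a list of (action, expected_state) pairs,
    e.g. produced by a planner over a PDDL model of the use case."""
    for _ in range(max_steps):
        if goal_test(state):
            return state
        plan = plan_fn(state)                 # high-level deliberation
        for action, expected in plan:
            execute_fn(action)                # low-level execution
            state = sense_fn()                # monitoring reads the world
            if state != expected:             # divergence detected:
                break                         # abandon the plan and replan
    return state
```

The monitoring step is what ties this task to the lifelong-learning tasks above: each divergence is both a trigger for replanning and a learning opportunity.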