Modeling Language
The competition language for the learning part of IPC-2011 is the typed STRIPS subset of PDDL. Example domains that follow the competition format are available to participants.
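For concreteness, here is a minimal sketch of the format: a simplified version of the classic Gripper domain, written in typed STRIPS PDDL. It is illustrative only, not an official competition domain.

  (define (domain gripper-example)
    (:requirements :strips :typing)
    (:types room ball gripper)
    (:predicates (at-robby ?r - room)
                 (at ?b - ball ?r - room)
                 (free ?g - gripper)
                 (carry ?b - ball ?g - gripper))
    ;; Move the robot between rooms.
    (:action move
      :parameters (?from ?to - room)
      :precondition (at-robby ?from)
      :effect (and (at-robby ?to) (not (at-robby ?from))))
    ;; Pick up a ball with a free gripper in the robot's room.
    (:action pick
      :parameters (?b - ball ?r - room ?g - gripper)
      :precondition (and (at ?b ?r) (at-robby ?r) (free ?g))
      :effect (and (carry ?b ?g) (not (at ?b ?r)) (not (free ?g))))
    ;; Put a carried ball down in the robot's current room.
    (:action drop
      :parameters (?b - ball ?r - room ?g - gripper)
      :precondition (and (carry ?b ?g) (at-robby ?r))
      :effect (and (at ?b ?r) (free ?g) (not (carry ?b ?g)))))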
Evaluation Scheme
The organizers are not placing any constraints on the style of learning approach that may be used. For example, a system might use statistical/inductive learning or purely deductive learning techniques. In addition, the learning part provides a good venue for approaches that might not traditionally be viewed as learning, such as pure domain analysis. Ultimately, we hope to see a wide variety of approaches that will help answer the following question: how can a planner best use a learning period to improve future performance?
The competition will be run in two stages:
The learning stage. Participants will be given two weeks, after which they must send the resulting Domain-specific Control Knowledge (DCK) folders, together with the training files used, to the organizers.
- The learning stage will begin after the participants deliver the final version of their code to the organizers. At this point the participants must freeze their code.
- After the code freeze, the organizers will distribute for each competition domain:
The domain definition file.
A problem generator to produce the training set.
The problem test set: a set of problems drawn from the target distribution. The ultimate goal of the competition is to learn DCK that allows a planner to perform well on problems drawn from this distribution. Naturally, the target problems used in the actual evaluation will not be made available to the learners during the learning stage. (An illustrative problem instance is sketched below.)
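To continue the illustration (hypothetical objects, not an official competition problem), a generated problem instance for the sketch domain above might look like:

  (define (problem gripper-example-01)
    (:domain gripper-example)
    (:objects rooma roomb - room
              ball1 ball2 - ball
              left right - gripper)
    (:init (at-robby rooma)
           (at ball1 rooma) (at ball2 rooma)
           (free left) (free right))
    ;; Goal: both balls end up in roomb.
    (:goal (and (at ball1 roomb) (at ball2 roomb))))

In this hypothetical domain, the target distribution would control parameters such as the numbers of balls and rooms.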
Once the domains are distributed, each participant will generate their training set and run their learner program to produce a DCK folder for each domain. At this step, participants must run the same learner program that was submitted at the code freeze. To verify that the frozen learner produces the same DCK from the same training set as the one submitted, the organizers will randomly select domains in which to rerun the learner programs locally.
The testing stage. The planner programs will be evaluated on the same problem set in each competition domain, both with and without the learned DCK. The no-knowledge evaluation will provide insight into the impact that learning had for each participant.
The organizers will conduct the evaluation stage on their local machines. The amount of time and memory allocated for each planner run is not yet finalized; those numbers partly depend on how many planners enter the competition and on the computing resources available at competition time. At the moment, we have in mind 15 minutes and 4 GB of RAM per run.
Score
Three winners will be crowned according to the results obtained in the testing stage: one based on a planning-time metric, one based on a plan-quality metric and one based on a learning-impact metric.
Planning Time Metric
For a given problem, let T* be the minimum time required by any planner to solve it. (If no planner solves a problem, that problem is ignored in the evaluation.)
A planner that solves the problem in time T will get a score of 1/(1 + log10(T/T*)) for the problem (a worked example follows the list below). Those that do not solve the problem get a score of 0.
- Runtimes below 1 second are treated as 1 second, so all sub-second runs receive the same score.
- The planning time metric for a planner is simply the sum of scores received over all evaluation problems.
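As a worked illustration with hypothetical numbers: suppose the fastest planner solves a problem in T* = 2 seconds. A planner solving it in T = 200 seconds scores 1/(1 + log10(200/2)) = 1/(1 + 2) = 1/3, the fastest planner scores 1/(1 + log10(1)) = 1, and a planner that exceeds the time limit scores 0.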
Plan Quality Metric
For a given problem, let N* be the minimum number of actions in any solution returned by a participant. (If no planner solves a problem, that problem is ignored in the evaluation.)
A planner that returns a solution with N actions will get a score of N*/N for the problem (a worked example is given below). Those that do not solve the problem get a score of 0.
- The plan quality metric for a planner is simply the sum of scores received over all evaluation problems.
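As a worked illustration with hypothetical numbers: if the shortest solution returned by any participant for a problem has N* = 20 actions, a planner returning a 25-action plan scores 20/25 = 0.8, a planner returning a 20-action plan scores 1, and a planner that does not solve the problem scores 0.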
Learning Impact Metric
This is a jury award: the organizers will reward the planner that demonstrates the best performance improvement through learning. Note that the spirit of the competition is to build the best planners possible; accordingly, participants should not deliberately weaken their baseline planners. If the organizers suspect that a participant has done so, that participant will not be eligible for the award. To support their decision, the organizers will rank planners using a Pareto ranking over two objectives (see the illustration after the list):
- The score obtained by the baseline planner (without DCK)
- The delta in score obtained by exploiting the learned DCK
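To illustrate the Pareto ranking with hypothetical numbers: a planner with baseline score 30 and delta 10 and a planner with baseline score 20 and delta 15 do not dominate each other, since each is better on one objective, so the jury must choose between them; a planner with baseline score 20 and delta 5, however, is dominated by both and would rank below them.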
In addition, the organizers plan to give a special award to any planner that (1) beats the winner of the sequential part (on learning track domains) when using the learned DCK and (2) does not beat the same winner on the same domains without it.
Publication of source code
The organizers require all competitors to make public, through the IPC-2011 website, the source code of the version entered in the competition. This will encourage information sharing and allow independent evaluation and double-checking of the competition results by the planning community.