.. versionadded:: 1.3
   :mod:`test.py` has been created and thus, a new requirement on
   third-party software is introduced: NumPy and SciPy.

.. versionadded:: 1.2
   Command-line arguments parsed with `argparse `_

.. versionadded:: 1.1
   :mod:`report.py` accepts a new collection of variables

.. versionadded:: 1.1
   :mod:`score.py` now accepts a new command-line argument :option:`--time`

.. versionadded:: 1.1
   Welcome :mod:`tscore.py`!

.. _ipcreport-label:

****************
IPCReport
****************

.. index::
   module: IPCReport
   pair: directive; --help
   pair: directive; --version

While the design and implementation of the package :mod:`IPCData` (see
:ref:`ipcdata-label`) followed many of the ideas that were already developed
at the Sixth International Planning Competition in 2008, this package is
brand new. It is mainly devoted to providing some simple (yet hopefully
useful) mechanisms to access the data generated during the IPC or,
alternatively, during a number of experiments.

The :mod:`IPCReport` package consists mainly of three different Python
modules: :mod:`report.py`, :mod:`score.py` and :mod:`tscore.py`. While the
first is intended to inspect the data generated by the
:mod:`invokeplanner.py` module (see :ref:`invokeplanner-label`), the second
and third have been developed to provide a consistent way to compute score
tables and serve to compare the performance of a selected subset of planners
in a selected subset of domains.

All these modules have been developed with Python 2.x.

.. _ipcreport-dependencies:

================
Dependencies
================

.. index::
   module: pyExcelerator
   module: PrettyTable

The modules described in this chapter have a number of dependencies on
third-party software that has to be installed prior to the installation and
usage of :mod:`IPCReport`:

:pyExcelerator: This package provides an easy-to-use and clean interface to
   the generation of Excel worksheets with a number of nice features
   including colors, splitters, etc. Instructions for downloading and
   installing the package are given `here `_

:NumPy and SciPy: NumPy is, according to its authors, the fundamental
   package for scientific computing with Python. The SciPy library depends
   on NumPy, which provides convenient and fast N-dimensional array
   manipulation for different purposes. Instructions to install these
   packages are given `here `_

Finally, `PrettyTable `_ is also used. However, since it consists of a
single module (i.e., no :file:`__init__` file is given), it is provided
within the package :mod:`IPCReport` by default.

In what follows, it is assumed that the reader has already checked out to
her local computer the scripts located at::

   svn://svn@pleiades.plg.inf.uc3m.es/ipc2011/data/scripts/pycentral/IPCReport

.. _commandline-arguments-ipcreport:

=======================
Command-line arguments
=======================

.. index::
   pair: directive; --help
   pair: directive; --quiet
   pair: directive; --version
   pair: directive; --domain
   pair: directive; --planner
   pair: directive; --problem
   pair: directive; --directory
   pair: directive; --summary
   pair: directive; --name
   pair: directive; --style
   pair: directive; --variables
   pair: directive; --ascending
   pair: directive; --descending
   single: table
   single: octave
   single: html
   single: excel
   single: wiki
   single: latex
   single: placeholder

All the modules in this package adhere to a consistent naming of the flags
they acknowledge. This section describes this convention.

As a general rule, all programs honour at least the following three flags:

-h, --help      provides a brief description of the main purpose of the
                script and presents all the available flags
-q, --quiet     only prints the requested data
-V, --version   shows the current version of the script along with the head
                svn release that affects it and the building date

All the modules in package :mod:`IPCReport` can process data either from a
results tree directory (see :ref:`results-label`) or a summary ---also known
as a snapshot, see :ref:`snapshots-label`. While they cannot be specified
simultaneously, one of them has to be provided with one of the following
command-line arguments:

-d, --directory   specifies the directory to explore. Its contents have to
                  be consistent with the structure of the results directory,
                  see :ref:`results-label`
-s, --summary     instructs the script to retrieve the data contained in the
                  binary file specified. For more information see
                  :ref:`snapshots-label`

On the other hand, all modules provide simple means to filter data by
planner, domain and problem. In all cases, the given command-line argument
receives a regular expression and only one:

-P, --planner   only planners meeting the specified regexp are considered.
                All by default
-D, --domain    only domains meeting the specified regexp are considered.
                All by default
-I, --problem   only problem ids meeting the specified regexp are examined.
                All by default

The output of every module always consists of a table (with different sorts
of data, according to the purpose of the module) that can be generated in
different formats and can be given arbitrary names:

-n, --name    name of the output table. In some cases, the name can
              acknowledge placeholders
-y, --style   sets the table type. At least :const:`table`, :const:`octave`,
              :const:`html`, :const:`excel` and :const:`wiki` are honoured
              by all modules. Exceptionally, :mod:`score.py` also welcomes
              :const:`latex`

Most of these parameters accept a single value. However, some directives can
be specified an arbitrary number of times. For example, one might want to
examine the contents of the variables :const:`solved` and :const:`oksolved`
with :mod:`report.py`. Instead of writing :option:`--variable solved
--variable oksolved`, it is possible to abbreviate it as :option:`--variable
solved oksolved`. Other directives that accept an arbitrary number of
arguments are :option:`--ascending` and :option:`--descending`.

.. _report-label:

================
report.py
================

.. index::
   module: report.py

:mod:`invokeplanner.py` generates a particular tree structure that starts at
the directory :file:`results/` in the same directory specified with the
command-line option :option:`--directory`. All modules of the package
:mod:`IPCReport` are able to process the data in a results directory.
However, this might result in long waiting times. To speed up the process,
summaries (also known as snapshots) are provided.

While there is no need to be aware of the particular arrangement of the
results directory structure, it is described here succinctly for the sake of
completeness. Also, a gentle introduction to snapshots is provided
immediately after. Most readers can safely skip the first two subsections
and go directly to the subsection that explains how to inspect data,
:ref:`inspecting-data-label`.

.. _results-label:

----------------------
The results directory
----------------------

.. index::
   pair: directive; --logfile
   single: nohup
   single: INFO

The contents of the :file:`results/` directory are sketched below:

.. blockdiag::

   {
      track-subtrack-1[label="track/subtrack 1"]
      planner-11[label="planner 1"]
      domain-111[label="domain 1"]
      problem-1111[label="000"]
      problem-1112[label="001"]
      results -> track-subtrack-1 -> planner-11 -> domain-111 -> problem-1111;
      results -> track-subtrack-1 -> planner-11 -> domain-111 -> problem-1112;
      results -> track-subtrack-1 -> planner-11 -> domain-111 -> problem-111i;
      problem-111i[shape=dots];
      domain-112[label="domain 2"]
      results -> track-subtrack-1 -> planner-11 -> domain-112 -> problem-112i;
      problem-112i[shape=dots];
      results -> track-subtrack-1 -> planner-11 -> domain-11i;
      domain-11i[shape=dots];
      results -> track-subtrack-1 -> planner-1i;
      planner-1i[shape=dots];

      track-subtrack-2[label="track/subtrack 2"]
      planner-21[label="planner 1"]
      domain-211[label="domain 1"]
      problem-2111[label="000"]
      problem-2112[label="001"]
      results -> track-subtrack-2 -> planner-21 -> domain-211 -> problem-2111;
      results -> track-subtrack-2 -> planner-21 -> domain-211 -> problem-2112;
      results -> track-subtrack-2 -> planner-21 -> domain-211 -> problem-211i;
      problem-211i[shape=dots];
      domain-212[label="domain 2"]
      results -> track-subtrack-2 -> planner-21 -> domain-212 -> problem-212i;
      problem-212i[shape=dots];
      results -> track-subtrack-2 -> planner-21 -> domain-21i;
      domain-21i[shape=dots];
      results -> track-subtrack-2 -> planner-2i;
      planner-2i[shape=dots];

      results -> track-subtrack-i;
      track-subtrack-i[shape=dots];
   }

Therefore, the particular results of executing planner `P` in domain `D` in
a particular track/subtrack are all stored in a number of directories
:file:`000/`, :file:`001/`, ... To examine the results of one execution it
is enough to examine the contents of that particular directory. Recall that
these directories contain the problems and domains stored in the svn
repository that result after sorting the names of domains (if more than one)
and problems in lexicographical order ---according to the cases
:const:`single` and :const:`multi`, see :ref:`builddomain-label`.

In particular, the contents of the results directory of the Seventh
International Planning Competition can be accessed in::

   svn://svn@pleiades.plg.inf.uc3m.es/ipc2011/results

This repository contains three subdirectories:

**logs/**
   This directory contains the execution log files that were generated by
   the script :mod:`invokeplanner.py` and also the standard output generated
   by the script itself. The first one refers to the logfile generated with
   the directive :option:`--logfile`, and the second one is just the
   standard :file:`nohup.out` output that results when running a process in
   the background explicitly detached from the current shell process with
   the :command:`/usr/bin/nohup` command, which automatically records the
   standard output to the file :file:`nohup.out`.

   For the sake of clarity, a chunk of the log file specified with
   :option:`--logfile` is shown here; it shows the result of running all
   planners in the :const:`seq-opt` competition with the :const:`sokoban`
   domain::

      [2011-03-30 12:55:49,375] [ plg@tau] [invoke_planner::show_switches] INFO
      -----------------------------------------------------------------------------
       * Track     : 'seq'
       * Subtrack  : 'opt'
       * Planner   : ['*']
       * Domain    : ['sokoban']
       * Directory : /home/plg/seq-opt-sokoban
       * Bookmark  : svn+ssh://svn@korf.plg.inf.uc3m.es/ipc2011
       * Timeout   : 1800 seconds
       * Memory    : 6442450944 bytes
      -----------------------------------------------------------------------------
      [2011-03-30 12:55:49,376] [ plg@tau] [invoke_planner::setup] INFO Building planner ...
      [2011-03-30 12:55:50,209] [ plg@tau] [co_and_build] INFO Checking out bjolp in '/home/plg/seq-opt-sokoban/bjolp'
      [2011-03-30 12:55:51,403] [ plg@tau] [buildplanner:co_and_build] INFO Building bjolp
      [2011-03-30 12:59:15,735] [ plg@tau] [co_and_build] INFO Checking out cpt4 in '/home/plg/seq-opt-sokoban/cpt4'
      [2011-03-30 12:59:21,068] [ plg@tau] [buildplanner:co_and_build] INFO Building cpt4
      ...
      [2011-03-30 13:36:52,066] [ plg@tau] [invoke_planner::setup] INFO Building domain ...
      [2011-03-30 13:36:52,735] [ plg@tau] [build_domain] INFO Building domain sokoban in '/home/plg/seq-opt-sokoban/sokoban'
      [2011-03-30 13:37:01,694] [ plg@tau] [invoke_planner::setup] INFO Building workingdir /home/plg/seq-opt-sokoban/_bjolp.sokoban.000 ...
      [2011-03-30 13:37:07,015] [ plg@tau] [invoke_planner::collect] INFO Collecting results in /home/plg/seq-opt-sokoban/_bjolp.sokoban.000 ...
      [2011-03-30 13:37:07,064] [ plg@tau] [invoke_planner::setup] INFO Building workingdir /home/plg/seq-opt-sokoban/_bjolp.sokoban.001 ...
      ...
      [2011-03-31 01:47:46,860] [ plg@tau] [invoke_planner::show_stats] INFO * Overall running time (seconds):
      +------------------+---------------+---------------+
      | *                | sokoban       | total         |
      +------------------+---------------+---------------+
      | bjolp            | 1078.88212848 | 1078.88212848 |
      | cpt4             | 12922.8750858 | 12922.8750858 |
      | fd-autotune      | 887.525495529 | 887.525495529 |
      | fdss-1           | 616.94525528  | 616.94525528  |
      | fdss-2           | 611.941836357 | 611.941836357 |
      | forkinit         | 3423.95640659 | 3423.95640659 |
      | gamer            | 17898.9191561 | 17898.9191561 |
      | iforkinit        | 216.029652119 | 216.029652119 |
      | lmcut            | 878.142556906 | 878.142556906 |
      | lmfork           | 4155.72897196 | 4155.72897196 |
      | merge-and-shrink | 611.936996937 | 611.936996937 |
      | selmax           | 456.77576685  | 456.77576685  |
      | total            | 43759.6593089 |               |
      +------------------+---------------+---------------+
      [2011-03-31 01:47:46,861] [ plg@tau] [invoke_planner::show_stats] INFO * Overall memory (Mbytes):
      +------------------+---------------+---------------+
      | *                | sokoban       | total         |
      +------------------+---------------+---------------+
      | bjolp            | 11903.9882812 | 11903.9882812 |
      | cpt4             | 740.85546875  | 740.85546875  |
      | fd-autotune      | 714.7890625   | 714.7890625   |
      | fdss-1           | 10267.1875    | 10267.1875    |
      | fdss-2           | 11388.7304688 | 11388.7304688 |
      | forkinit         | 430.98046875  | 430.98046875  |
      | gamer            | 63276.8867188 | 63276.8867188 |
      | iforkinit        | 366.71875     | 366.71875     |
      | lmcut            | 703.5390625   | 703.5390625   |
      | lmfork           | 450.95703125  | 450.95703125  |
      | merge-and-shrink | 11436.7890625 | 11436.7890625 |
      | selmax           | 1637.82421875 | 1637.82421875 |
      | total            | 113319.246094 |               |
      +------------------+---------------+---------------+
      [2011-03-31 01:47:46,862] [ plg@tau] [invoke_planner::show_stats] INFO * Number of solved instances:
      +------------------+---------+-------+
      | *                | sokoban | total |
      +------------------+---------+-------+
      | bjolp            | 20      | 20    |
      | cpt4             | 1       | 1     |
      | fd-autotune      | 20      | 20    |
      | fdss-1           | 20      | 20    |
      | fdss-2           | 20      | 20    |
      | forkinit         | 19      | 19    |
      | gamer            | 19      | 19    |
      | iforkinit        | 20      | 20    |
      | lmcut            | 20      | 20    |
      | lmfork           | 19      | 19    |
      | merge-and-shrink | 20      | 20    |
      | selmax           | 20      | 20    |
      | total            | 218     |       |
      +------------------+---------+-------+
      [2011-03-31 01:47:46,863] [ plg@tau] [invoke_planner::show_stats] INFO * Number of overall solutions generated:
      +------------------+---------+-------+
      | *                | sokoban | total |
      +------------------+---------+-------+
      | bjolp            | 20      | 20    |
      | cpt4             | 1       | 1     |
      | fd-autotune      | 20      | 20    |
      | fdss-1           | 20      | 20    |
      | fdss-2           | 20      | 20    |
      | forkinit         | 19      | 19    |
      | gamer            | 19      | 19    |
      | iforkinit        | 20      | 20    |
      | lmcut            | 20      | 20    |
      | lmfork           | 19      | 19    |
      | merge-and-shrink | 20      | 20    |
      | selmax           | 20      | 20    |
      | total            | 218     |       |
      +------------------+---------+-------+

   The dump shown above is divided into four sections: the first one shows
   some administrative information with the current version of the script
   along with a description of all the parameters given to it. Next, various
   :const:`INFO` messages are issued to show what planners have been checked
   out from the svn repository and the exact times when they were compiled;
   the third part shows the testsets built; finally, a human-readable output
   is shown with some statistics about the overall performance of all
   planners.

   On the other hand, the output of the :program:`nohup` command is shown
   below::

      Revision: 115
      Date: 2011-03-27 16:08:59 +0200 (Sun, 27 Mar 2011)

      ./invokeplanner.py 1.0

   There is no particular arrangement for the contents of the :file:`logs`
   directory, though the most usual is to have first a number of
   subdirectories sorted by track and subtrack. Beneath these directories
   different folders exist, each referring to a particular set of
   experiments. For example, a directory named :file:`acoplan` means that it
   contains the output of running the planner :program:`acoplan` with all
   domains. A subdirectory named :file:`barman` shall be expected to contain
   the results of all planners when facing that particular domain, and so
   on.

**raw/**
   This directory contains the tree structure that results after merging the
   directory ``results/`` of all the experiments performed so far in all
   tracks. However, none of these directories contain solutions validated by
   the Automatic Validation Tool `VAL `_

**val/**
   This directory follows the same structure as the directory :file:`raw/`
   but it contains only the minimum number of files that are necessary for
   validating each solution ---if any was generated. The script
   :mod:`validate.py` was later run on this directory, leaving a validation
   log file at each terminal directory with the result of the validation
   process. For more details, the interested reader is referred to
   :ref:`validate-label`

Therefore, from the previous descriptions it follows that all the results of
the Seventh International Planning Competition are available in two
different formats: either raw or validated. Unfortunately, processing these
directories usually takes a long time. To speed it up, snapshots are
provided.

.. _snapshots-label:

----------------------
Snapshots
----------------------

.. index::
   single: snapshot
   single: summary
   pair: directive; --summary

A snapshot (or alternatively, a summary) is just a binary file that contains
the same relevant data stored in a results tree directory. Besides, it
follows the same structure depicted there. Snapshots provide a number of
advantages:

:Speed: handling a binary file is far faster than traversing a tree
   structure, visiting files and parsing their contents

:Size: besides, snapshots are usually smaller than a compressed file with
   the contents of a results directory, so they ease exchanging data among
   developers or the participants/organizers of an International Planning
   Competition

Snapshots are created by instructing :mod:`report.py` to write the result of
a query to a file specified with the directive :option:`--summarize`, i.e.,
snapshots contain the results that were processed from a particular
directory or another snapshot. In the first case, the tree that contains the
data to inspect is specified with the directive :option:`--directory`,
whereas a snapshot can be specified with the flag :option:`--summary` ---see
:ref:`commandline-arguments-ipcreport`.

While snapshots contain all the necessary information to understand and
analyze the performance of each planner in every single domain, they are not
easy to process manually. Instead, a dedicated module is devoted to this
goal: :mod:`report.py`.

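The binary format of snapshots is internal to :mod:`report.py` and is not
documented here. Purely as an illustration of why a single binary file beats
traversing a results tree, the following sketch (assuming Python 2's
:mod:`cPickle`; the actual implementation may well differ) serializes and
restores a nested, tree-like structure in one shot:

.. code-block:: python

   import cPickle

   # hypothetical nested structure mirroring the results tree:
   # track/subtrack -> planner -> domain -> problem -> data
   data = {'seq-sat': {'lama-2011': {'barman': {'000': {'solved': True}}}}}

   # writing the whole structure to a single binary file ...
   with open('example.snapshot', 'wb') as stream:
       cPickle.dump(data, stream, cPickle.HIGHEST_PROTOCOL)

   # ... and reading it back is a single operation, far faster than
   # visiting thousands of files and parsing their contents
   with open('example.snapshot', 'rb') as stream:
       data = cPickle.load(stream)
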
.. _inspecting-data-label:

-------------------
Inspecting data
-------------------

.. index::
   pair: directive; --summarize
   pair: directive; --directory
   pair: directive; --summary
   pair: directive; --level
   pair: directive; --planner
   pair: directive; --domain
   pair: directive; --problem
   pair: directive; --variable
   pair: directive; --variables
   pair: directive; --quiet
   pair: directive; --ascending
   pair: directive; --descending
   pair: directive; --style
   pair: directive; --unroll
   pair: directive; --name
   single: summary
   single: snapshot
   single: query
   single: level
   single: table
   single: excel
   single: octave
   single: html
   single: wiki
   single: origin

This section describes all the command-line arguments accepted by
:mod:`report.py`. For a thorough discussion of the command-line arguments
see :ref:`commandline-arguments-ipcreport`. Besides, the data retrieved with
the script described here can be directly given to :mod:`test.py` (see
:ref:`test-label`) to perform various sorts of statistical tests.

The :mod:`report.py` script accepts trees like the one described in
subsection :ref:`results-label` with the option :option:`--directory`.
Besides, it can also accept a binary file with the same contents, termed
here :const:`summaries` or :const:`snapshots` (see section
:ref:`snapshots-label`), with the directive :option:`--summary`. In the
following we will refer to both snapshots and results tree directories as
the `origin`.

The specified origin automatically sets the `level` of the query. If it
refers to a directory that looks like :file:`track-n-subtrack-m` (such as
:file:`seq-sat`), the query refers to the whole track/subtrack; if the
origin is relative to a directory such as :file:`planner-p` (such as
:file:`cbp`), the queries refer only to that particular planner in the
track/subtrack that contains it; if the origin points deeper to
:file:`domain-d` (e.g., :file:`tidybot`), then all queries are relative to
the combination of planner and domain that are defined within that
particular track/subtrack. Finally, specifying an origin with a particular
problem restricts all queries to that particular problem.

However, the level set by default by a particular origin can be altered with
:option:`--level`. Both snapshots and results tree directories are arranged
as explained in :ref:`results-label`. Therefore, the only legal levels are
:const:`planner`, :const:`domain` and :const:`problem`, in exactly that
order. Obviously, the level cannot be pushed up (e.g., involving other
planners when specifying a domain) but it can be refined further by
specifying any of the legal levels if and only if the origin is at or above
the specified level.

Furthermore, queries can be refined by a number of arguments, as explained
in :ref:`commandline-arguments-ipcreport`, by providing one (and only one)
regular expression to any of the following directives: :option:`--planner`,
:option:`--domain` and/or :option:`--problem`. They can be provided in any
combination. For example, :option:`--planner lama --domain p --problem
"0[01][02468]"` retrieves information for the problems with an even
identifier in those domains that start with `p` that were given to planners
whose name starts with `lama`.

:mod:`report.py` acknowledges a number of variables whose values are
returned after inspecting the corresponding origin. The available variables
are listed if :option:`--variables` is specified in the command line ---for
a thorough introduction to the variables acknowledged by :mod:`report.py`
the reader is referred to :ref:`reporting-variables-label`.

Variables are specified with the directive :option:`--variable`. There is no
need to use the directive more than once unless the specification of
different variables happens in different locations of the command line. For
example, to access variables `var1`, `var2`, `var3`, ... the following
suffices: :option:`--variable var1 var2 var3 ...`. The report will show the
variables in the same order they have been specified, so the same results
can be achieved with :option:`--variable var3 var2 var1 ...`, though in a
different order.

For example, the number of problems in the sequential satisficing track of
the Seventh International Planning Competition can be retrieved with the
following command::

   $ ./report.py --directory /Volumes/Owl/Downloads/ipc2011/results/val/seq-sat --variable numprobs --summarize seq-sat.snapshot
   Revision: 282
   Date: 2011-07-04 10:48:49 +0200 (Mon, 04 Jul 2011)

   ./report.py 1.0
   -----------------------------------------------------------------------------
    * directory : /Volumes/Owl/Downloads/ipc2011/results/val/seq-sat
    * snapshot  : /Users/clinares/lab/ipc2011-data/scripts/pycentral/IPCReport/seq-sat.snapshot
    * name      : report
    * level     : None
    * planner   : .*
    * domain    : .*
    * problem   : .*
    * variables : ['numprobs']
    * unroll    : False
    * sorting   : []
    * style     : table
   -----------------------------------------------------------------------------
    name: report
   +----------+
   | numprobs |
   +----------+
   | 7560     |
   +----------+
    legend:
     numprobs: total number of problems [elaborated data]

    created by IPCrun 1.0 (Revision: 283), Thu Jul 21 13:42:08 2011

Note that the preceding command also creates a summary with all the data
that results from processing the whole tree structure rooted at the
particular location given,
:file:`/Volumes/Owl/Downloads/ipc2011/results/val/seq-sat`.

The same query can be refined further, requesting the number of problems by
planner just by altering the level as follows::

   $ ./report.py --summary seq-sat.snapshot --variable numprobs --level planner
   Revision: 282
   Date: 2011-07-04 10:48:49 +0200 (Mon, 04 Jul 2011)

   ./report.py 1.0
   -----------------------------------------------------------------------------
    * summary   : /Users/clinares/lab/ipc2011-data/scripts/pycentral/IPCReport/seq-sat.snapshot
    * snapshot  :
    * name      : report
    * level     : planner
    * planner   : .*
    * domain    : .*
    * problem   : .*
    * variables : ['numprobs']
    * unroll    : False
    * sorting   : []
    * style     : table
   -----------------------------------------------------------------------------
    name: report
   +---------------+----------+
   | planner       | numprobs |
   +---------------+----------+
   | acoplan       | 280      |
   | acoplan2      | 280      |
   | arvand        | 280      |
   | brt           | 280      |
   | cbp           | 280      |
   | cbp2          | 280      |
   | cpt4          | 280      |
   | dae_yahsp     | 280      |
   | fd-autotune-1 | 280      |
   | fd-autotune-2 | 280      |
   | fdss-1        | 280      |
   | fdss-2        | 280      |
   | forkuniform   | 280      |
   | lama-2008     | 280      |
   | lama-2011     | 280      |
   | lamar         | 280      |
   | lprpgp        | 280      |
   | madagascar    | 280      |
   | madagascar-p  | 280      |
   | popf2         | 280      |
   | probe         | 280      |
   | randward      | 280      |
   | roamer        | 280      |
   | satplanlm-c   | 280      |
   | sharaabi      | 280      |
   | yahsp2        | 280      |
   | yahsp2-mt     | 280      |
   +---------------+----------+
    legend:
     planner [key]
     numprobs: total number of problems [elaborated data]

    created by IPCrun 1.0 (Revision: 283), Thu Jul 21 13:43:36 2011

Note that because a snapshot was created in the first query, it is now
feasible to use it instead of directly accessing the tree structure. This
procedure saves a lot of time. Of course, variables can be combined.

For example, the following command returns the number of problems, the
number of solved tasks (but not validated) and the number of problems where
the winners of the Sixth International Planning Competition (LAMA 2008) and
the Seventh International Planning Competition (LAMA 2011) failed::

   $ ./report.py --summary seq-sat.snapshot --planner 'lama.*20.*' --level planner --variable numprobs numsolved numfails --quiet
    name: report
   +-----------+----------+-----------+----------+
   | planner   | numprobs | numsolved | numfails |
   +-----------+----------+-----------+----------+
   | lama-2008 | 280      | 188       | 92       |
   | lama-2011 | 280      | 250       | 30       |
   +-----------+----------+-----------+----------+
    legend:
     planner [key]
     numprobs: total number of problems [elaborated data]
     numsolved: number of solved problems (independently of the solution files generated) [elaborated data]
     numfails: total number of fails [elaborated data]

    created by IPCrun 1.0 (Revision: 283), Thu Jul 21 13:51:06 2011

In this case, because the directive :option:`--quiet` was given, all the
headers were removed from the output.

Moreover, the results can be sorted either in ascending or descending order
of any combination of variables thanks to the flags :option:`--ascending`
and :option:`--descending`. These flags shall be given along with one of the
variables specified in the query and/or any of the legal levels:
:const:`planner`, :const:`domain` and/or :const:`problem`. For example, the
following command shows the number of problems successfully solved and the
number of plan solution files generated by all planners in the sequential
satisficing track. It then sorts the output giving preference to the
planners that solved more tasks and, in case of a tie (note the case of
planners :command:`probe` and :command:`fdss-2`, both with 233 tasks
solved), it ranks first those that generated more solution files::

   $ ./report.py --summary seq-sat.snapshot --variable oknumsolved oksumnumsols --level planner --quiet --descending oknumsolved --descending oksumnumsols
    name: report
   +---------------+-------------+--------------+
   | planner       | oknumsolved | oksumnumsols |
   +---------------+-------------+--------------+
   | lama-2011     | 250         | 874          |
   | fdss-2        | 233         | 645          |
   | probe         | 233         | 460          |
   | fdss-1        | 232         | 828          |
   | fd-autotune-1 | 223         | 557          |
   | roamer        | 213         | 779          |
   | forkuniform   | 207         | 589          |
   | lamar         | 195         | 764          |
   | fd-autotune-2 | 193         | 516          |
   | arvand        | 190         | 1813         |
   | lama-2008     | 188         | 743          |
   | randward      | 184         | 689          |
   | brt           | 157         | 499          |
   | yahsp2        | 138         | 246          |
   | yahsp2-mt     | 137         | 423          |
   | cbp2          | 135         | 834          |
   | cbp           | 123         | 788          |
   | dae_yahsp     | 120         | 963          |
   | lprpgp        | 118         | 236          |
   | madagascar-p  | 88          | 88           |
   | popf2         | 81          | 100          |
   | madagascar    | 67          | 67           |
   | cpt4          | 52          | 52           |
   | sharaabi      | 33          | 33           |
   | satplanlm-c   | 32          | 32           |
   | acoplan       | 20          | 80           |
   | acoplan2      | 20          | 70           |
   +---------------+-------------+--------------+
    legend:
     planner [key]
     oknumsolved: number of *successfully* solved problems (independently of the solution files generated) [elaborated data]
     oksumnumsols: sum of the total number of *successful* solution files generated [elaborated data]

    created by IPCrun 1.0 (Revision: 283), Thu Jul 21 14:09:09 2011

Another very interesting flag is :option:`--unroll`. This flag `correlates`
the values of an arbitrary number of variables ---usually two. If the values
of all variables are lists, then :option:`--unroll` creates as many rows in
the resulting table as elements in the shortest list (see the sketch below;
a complete command-line example follows it).

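The row-building behaviour just described matches that of Python's built-in
``zip``, which also truncates to the shortest sequence. The following
minimal sketch is only an illustration of that behaviour (with the first few
values taken from the example below), not the actual implementation of
:option:`--unroll`:

.. code-block:: python

   # two list-valued variables for the same (planner, domain, problem) key
   timesols = [19, 27, 63, 173]     # elapsed seconds per solution
   values = [126.0, 125.0, 123.0]   # plan quality per *valid* solution

   # like --unroll, zip produces as many rows as the shortest list
   for timesol, value in zip(timesols, values):
       print timesol, value         # Python 2.x, as used by IPCReport
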
For example, to show how the quality of the solutions generated by
:command:`arvand` in problem `011` of the domain `openstacks` improved over
time::

   $ ./report.py --summary seq-sat.snapshot --quiet --planner 'arvand' --level problem --domain 'openstacks' --problem '011' --variable timesols values --unroll
    name: report
   +---------+------------+---------+----------+--------+
   | planner | domain     | problem | timesols | values |
   +---------+------------+---------+----------+--------+
   | arvand  | openstacks | 011     | 19       | 126.0  |
   | arvand  | openstacks | 011     | 27       | 125.0  |
   | arvand  | openstacks | 011     | 63       | 123.0  |
   | arvand  | openstacks | 011     | 173      | 122.0  |
   | arvand  | openstacks | 011     | 181      | 121.0  |
   | arvand  | openstacks | 011     | 257      | 120.0  |
   | arvand  | openstacks | 011     | 266      | 118.0  |
   | arvand  | openstacks | 011     | 375      | 116.0  |
   | arvand  | openstacks | 011     | 425      | 115.0  |
   | arvand  | openstacks | 011     | 478      | 113.0  |
   | arvand  | openstacks | 011     | 589      | 112.0  |
   | arvand  | openstacks | 011     | 755      | 111.0  |
   | arvand  | openstacks | 011     | 876      | 110.0  |
   | arvand  | openstacks | 011     | 931      | 109.0  |
   +---------+------------+---------+----------+--------+
    legend:
     planner [key]
     domain [key]
     problem [key]
     timesols: elapsed time when each solution was generated (in seconds) [raw data]
     values: final values returned by VAL, one per each *valid* solution file [raw data]

    created by IPCrun 1.0 (Revision: 283), Thu Jul 21 14:22:04 2011

Should :option:`--unroll` not have been given, the report would have just
issued a single line with one list per variable, which is not the desired
effect.

Because :mod:`report.py` acknowledges a number of output formats with the
flag :option:`--style`, :option:`--unroll` is very useful for creating
figures. The available styles are :const:`table`, :const:`octave`,
:const:`html`, :const:`excel` and :const:`wiki`. The first is used by
default. :const:`octave` shows the same information but in the format of
`GNU Octave `_, which can also be read by `gnuplot `_. :const:`html` and
:const:`wiki` are markup languages to show the same data either in html
pages or in the wiki format recognized by `MoinMoin `_. Finally,
:const:`excel` creates a file named :file:`report.xls` with the result of
the query.

The last directive that affects the output is :option:`--name`. It can be
used to give the resulting table an arbitrary name.

.. _score-label:

================
score.py
================

.. index::
   module: score.py
   pair: directive; --directory
   pair: directive; --summary
   pair: directive; --planner
   pair: directive; --domain
   pair: directive; --problem
   pair: directive; --quiet
   pair: directive; --style
   pair: directive; --name
   pair: directive; --time
   pair: placeholder; $track
   pair: placeholder; $subtrack
   pair: placeholder; $domain
   pair: placeholder; $date
   pair: placeholder; $time
   single: snapshot
   single: summary
   single: origin
   single: table
   single: excel
   single: octave
   single: gnuplot
   single: html
   single: wiki
   single: LaTeX
   single: pdf
   single: quality
   single: solutions
   single: time0
   single: time1
   single: time2
   single: qt
   single: pareto dominance
   single: ranking
   single: matrix.tex
   single: matrix.xls
   single: ps-tricks
   single: pdflatex
   single: makefile
   single: score

This section describes all the command-line arguments accepted by
:mod:`score.py`. For a thorough discussion of the command-line arguments see
:ref:`commandline-arguments-ipcreport`. This script automatically generates
score tables for a selected subset of domains, planners and problems.

As in the case of :mod:`report.py` (see :ref:`report-label`), this script
receives either a results tree (as the one depicted in :ref:`results-label`)
or a snapshot ---as described in :ref:`snapshots-label`. Let `origin` denote
both a results directory and a snapshot or summary ---note that
:mod:`score.py` does not generate any snapshots and that only
:mod:`report.py` can do it; for more information refer to the directive
:option:`--summarize` in :ref:`inspecting-data-label`.

The origin shall always refer to a whole track/subtrack. That is, it is not
valid to specify either a directory or a snapshot that points to a planner,
domain or problem. The collection of planners, domains and problems to
consider can be refined further with regular expressions with
:option:`--planner`, :option:`--domain` and :option:`--problem`.

If the directive :option:`--time` is given, then all measurements are
relative to the time interval [0, `time`] (where `time` is the value given
to :option:`--time` in seconds). If none is specified, then all results are
used. This allows drawing conclusions for different time horizons, other
than the one used in the experimentation ---see the usage of the directive
:option:`--timeout` in :ref:`invokeplanner-label`.

:mod:`score.py` acknowledges up to six different metrics. All of them are
described if the directive :option:`--metrics` is given:

:quality: This is the official metric of both the Sixth and Seventh
   International Planning Competitions. It computes for each task a score
   which equals :math:`\frac{Q^*}{Q}` where :math:`Q^*` is the quality of
   the best plan found for this particular task and :math:`Q` stands for
   the quality of the plan produced by this planner.

:solutions: It gives one point to every planner that solves the current
   task and zero otherwise.

:time0: Computes the score of a planner for a given task as the quotient
   :math:`\frac{T^*}{T}` where :math:`T^*` is the minimum time required by
   any planner to solve the same task and :math:`T` is the time it took
   this particular planner to solve the same task. All times below 1 second
   are considered to be exactly equal to 1 second. In other words,
   differences below one second are considered to be negligible.

:time1: Computes the score of a planner for a given task as the quotient
   :math:`\frac{1}{1+\log \left(\frac{T}{T^*}\right)}` where :math:`T^*` is
   the minimum time required by any planner to solve the same task and
   :math:`T` is the time it took this particular planner to solve the same
   task. All times below 1 second are considered to be exactly equal to 1
   second. In other words, differences below one second are considered to
   be negligible.

:time2: Computes the score of a planner for a given task as the quotient
   :math:`\frac{\log\left(1+T^*\right)}{\log\left(1+T\right)}` where
   :math:`T^*` is the minimum time required by any planner to solve the
   same task and :math:`T` is the time it took this particular planner to
   solve the same task.

:qt: It computes for each planner and task a tuple :math:`(Q, T)` where
   :math:`Q` stands for the quality of the best solution found by the same
   planner and :math:`T` is the time (in seconds) it took for the planner
   to find it. Next, it gives to each planner a score that equals the
   number of tuples it pareto-dominates for the same task. :math:`(Q, T)`
   is said to pareto-dominate :math:`(Q', T')` if and only if
   :math:`Q\leq Q'` and :math:`T\leq T'`

All the scores are shown in the form of tables, one per domain that meets
the regular expression given to :option:`--domain` ---a sketch of the
preceding formulas is given below.

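As a quick reference, the definitions above can be restated in Python as
follows. This is only a transcription of the formulas, not the code of
:mod:`score.py`; the function names are made up, and the natural logarithm
and the exclusion of a planner's own tuple in `qt` are assumptions:

.. code-block:: python

   from math import log

   def quality(q_best, q):
       # Q*/Q: best known plan quality over this planner's plan quality
       return float(q_best) / q

   def time0(t_best, t):
       # T*/T, where differences below one second are negligible
       t_best, t = max(t_best, 1.0), max(t, 1.0)
       return t_best / t

   def time1(t_best, t):
       # 1 / (1 + log(T/T*)), again clamping all times to at least 1 second
       t_best, t = max(t_best, 1.0), max(t, 1.0)
       return 1.0 / (1.0 + log(t / t_best))

   def time2(t_best, t):
       # log(1 + T*) / log(1 + T)
       return log(1.0 + t_best) / log(1.0 + t)

   def qt(tuples, i):
       # score of the i-th planner: number of (Q', T') tuples that its
       # (Q, T) tuple pareto-dominates, i.e., Q <= Q' and T <= T'
       # (the planner's own tuple is excluded here, an assumption)
       q, t = tuples[i]
       return sum(1 for j, (q2, t2) in enumerate(tuples)
                  if j != i and q <= q2 and t <= t2)
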
Besides, if more than one domain is given, the script computes a final table
called `ranking` with the sum of the scores of all the previous tables.

Each table can be given a name with :option:`--name`. This directive accepts
placeholders, which are symbolized with the dollar sign ``$``. In
particular, there are five recognized variables: *$track*, *$subtrack*,
*$domain*, *$date* and *$time*, which are substituted with the particular
track, subtrack, domain, current date and time. The default value of this
directive is *$track-$subtrack: $domain ($date)* [#]_.

For example, the following command::

   $ ./score.py --summary seq-opt.results.snapshot --planner 'f' --domain 'sokoban|parcprinter' --time 10

will output the score tables for those planners of the sequential
optimization track that start with the letter `f` in the domains `sokoban`
and `parcprinter`, taking into account only the results that were produced
in the first 10 seconds ---though the specified snapshot contains the
results of running the planners up to 1800 seconds. The metric used is the
default one, :const:`quality`, and the output (discussed below) is just
shown as ASCII tables::

   seq-opt: parcprinter (Mon Dec 19 23:35:01 2011)
   +-------+-------------+--------+--------+----------+------------+
   | no.   | fd-autotune | fdss-1 | fdss-2 | forkinit | best       |
   +-------+-------------+--------+--------+----------+------------+
   | 000   | 1.00        | 1.00   | 1.00   | 1.00     | 375821.00  |
   | 001   | 1.00        | 1.00   | 1.00   | 1.00     | 438047.00  |
   | 002   | 1.00        | 1.00   | 1.00   | 1.00     | 510256.00  |
   | 003   | 1.00        | 1.00   | 1.00   | ---      | 876094.00  |
   | 004   | 1.00        | 1.00   | 1.00   | 1.00     | 519232.00  |
   | 005   | ---         | ---    | ---    | ---      | ---        |
   | 006   | 1.00        | ---    | ---    | ---      | 1145132.00 |
   | 007   | 1.00        | 1.00   | 1.00   | 1.00     | 751642.00  |
   | 008   | 1.00        | 1.00   | 1.00   | 1.00     | 693064.00  |
   | 009   | ---         | ---    | ---    | ---      | ---        |
   | 010   | 1.00        | ---    | ---    | ---      | 1216462.00 |
   | 011   | ---         | ---    | ---    | ---      | ---        |
   | 012   | ---         | ---    | ---    | ---      | ---        |
   | 013   | ---         | ---    | ---    | ---      | ---        |
   | 014   | ---         | ---    | ---    | ---      | ---        |
   | 015   | ---         | ---    | ---    | ---      | ---        |
   | 016   | ---         | ---    | ---    | ---      | ---        |
   | 017   | ---         | ---    | ---    | ---      | ---        |
   | 018   | ---         | ---    | ---    | ---      | ---        |
   | 019   | 1.00        | ---    | ---    | ---      | 1270874.00 |
   | total | 10.00       | 7.00   | 7.00   | 6.00     |            |
   +-------+-------------+--------+--------+----------+------------+
    ---: unsolved
    X  : invalid

    created by IPCrun 1.2 (Revision: 295), Mon Dec 19 23:35:01 2011

   seq-opt: sokoban (Mon Dec 19 23:35:01 2011)
   +-------+-------------+--------+--------+----------+-------+
   | no.   | fd-autotune | fdss-1 | fdss-2 | forkinit | best  |
   +-------+-------------+--------+--------+----------+-------+
   | 000   | 1.00        | 1.00   | 1.00   | 1.00     | 9.00  |
   | 001   | 1.00        | 1.00   | 1.00   | 1.00     | 37.00 |
   | 002   | 1.00        | 1.00   | 1.00   | 1.00     | 29.00 |
   | 003   | 1.00        | 1.00   | 1.00   | 1.00     | 29.00 |
   | 004   | ---         | 1.00   | 1.00   | ---      | 50.00 |
   | 005   | ---         | 1.00   | 1.00   | ---      | 35.00 |
   | 006   | 1.00        | 1.00   | 1.00   | 1.00     | 30.00 |
   | 007   | 1.00        | 1.00   | 1.00   | 1.00     | 19.00 |
   | 008   | 1.00        | 1.00   | 1.00   | 1.00     | 15.00 |
   | 009   | 1.00        | 1.00   | 1.00   | 1.00     | 8.00  |
   | 010   | 1.00        | 1.00   | 1.00   | 1.00     | 20.00 |
   | 011   | 1.00        | 1.00   | 1.00   | 1.00     | 2.00  |
   | 012   | ---         | ---    | ---    | ---      | ---   |
   | 013   | 1.00        | ---    | ---    | 1.00     | 32.00 |
   | 014   | ---         | ---    | ---    | ---      | ---   |
   | 015   | ---         | ---    | ---    | ---      | ---   |
   | 016   | ---         | ---    | ---    | ---      | ---   |
   | 017   | 1.00        | 1.00   | 1.00   | ---      | 10.00 |
   | 018   | ---         | ---    | ---    | ---      | ---   |
   | 019   | ---         | ---    | ---    | ---      | ---   |
   | total | 12.00       | 13.00  | 13.00  | 11.00    |       |
   +-------+-------------+--------+--------+----------+-------+
    ---: unsolved
    X  : invalid

    created by IPCrun 1.2 (Revision: 295), Mon Dec 19 23:35:01 2011

   seq-opt: ranking (Mon Dec 19 23:35:01 2011)
   +-------------+---------+-------------+-------+
   | planner     | sokoban | parcprinter | total |
   +-------------+---------+-------------+-------+
   | fd-autotune | 12.00   | 10.00       | 22.00 |
   | fdss-1      | 13.00   | 7.00        | 20.00 |
   | fdss-2      | 13.00   | 7.00        | 20.00 |
   | forkinit    | 11.00   | 6.00        | 17.00 |
   | total       | 49.00   | 30.00       |       |
   +-------------+---------+-------------+-------+
    ---: unsolved
    X  : invalid

    created by IPCrun 1.2 (Revision: 295), Mon Dec 19 23:35:01 2011

Finally, :mod:`score.py` can produce the output in a variety of formats. It
recognizes at least the same ones described in :ref:`inspecting-data-label`
and, additionally, :const:`latex`. If used, it creates a LaTeX file called
:file:`matrix.tex`. Each page is divided into two halves: the upper half
contains the table whereas the lower half shows a matrix of color codes with
the following meanings:

:Red boxes: invalid entry. The planner generated a solution but it was
   considered invalid by the Automatic Validation Tool `VAL `_

:Yellow boxes: empty solution. The planner never found a solution for this
   task.

:Gray boxes: solved tasks. It uses gray levels to mean scores. The darker,
   the better.

Since the resulting LaTeX file uses `ps-tricks `_, it cannot be processed
directly with :command:`pdflatex`. Instead, a makefile is given in the same
directory where this package resides. To produce the corresponding pdf file
just type::

   $ make filename.pdf

where :file:`filename` stands for the name of the LaTeX file
---:file:`matrix` in this case.

.. _tscore-label:

================
tscore.py
================

.. index::
   module: tscore.py
   pair: directive; --planner
   pair: directive; --labels
   pair: directive; --domain
   single: score

This script behaves much the same as :mod:`score.py` but with a key
difference. While it acknowledges the same directives as the previous script
(though no LaTeX output is supported), it just computes how the score of all
planners evolves over time on all the selected domains. These figures are
computed taking for each planner the time when they generate a solution.
That is, at each time instant where at least one planner among those
selected by the regular expression given in :option:`--planner` found a
solution to at least one problem in a particular domain, the score of all
planners is computed.

This process produces a curve that shows how the score of each planner
evolved over time at precise time instants. Since this computation can be
costly (a matter of minutes in the larger tracks if the results directories
are accessed directly instead of snapshots), the script also acknowledges a
new flag, :option:`--labels`. This directive allows the user to specify a
number of time points which are drawn from the original list of time
instants at regular intervals ---this implies that the final number of
points drawn might not be exactly equal to the number requested by the user,
though it will always be as close as possible.

For example, the following command computes how the score evolves over time
for the six top ranked planners in the Seventh International Planning
Competition (if more than one variant of the same planner ranked among
these, the best is picked) for the domain `barman`::

   $ ./tscore.py --summary seq-sat.snapshot --metric quality --planner "lama-2011|fdss-1|fd-autotune-1|roamer|forkuniform|probe" --domain "barman" --style octave --quiet > salida.m

Now, if all matrices in the output file :file:`salida.m` are removed but
`scores_barman`, the following commands in gnuplot::

   gnuplot> set xlabel "Time (seconds)"
   gnuplot> set ylabel "Score"
   gnuplot> set terminal png
   gnuplot> set output "barman.png"
   gnuplot> plot "salida.m" using 1:2 with linesp title "fd-autotune-1", "salida.m" using 1:3 with linesp title "fdss-1", "salida.m" using 1:4 with linesp title "forkuniform", "salida.m" using 1:5 with linesp title "lama-2011", "salida.m" using 1:6 with linesp title "probe", "salida.m" using 1:7 with linesp title "roamer"

produce the following output:

.. image:: barman.png
   :align: center

which can be used to draw a number of interesting conclusions.

Even if only one domain meets the regular expression specified with
:option:`--domain`, this script shows an overall ranking table at the end.
This table takes the time instants where at least one planner solved one
task in any of the domains specified. Besides, while the tables generated
per domain list planners in alphabetical order, the overall ranking table
shows them in decreasing order of total score.

.. _test-label:

================
test.py
================

.. index::
   module: test.py
   pair: directive; --directory
   pair: directive; --summary
   pair: directive; --planner
   pair: directive; --domain
   pair: directive; --problem
   pair: directive; --variable
   pair: directive; --filter
   pair: directive; --matcher
   pair: directive; --noentry
   pair: directive; --name
   pair: directive; --unroll
   pair: directive; --ascending
   pair: directive; --descending
   pair: directive; --style
   pair: directive; --variables
   pair: directive; --tests
   single: test
   single: statistical test
   single: Wilcoxon signed-rank test
   single: Mann-Whitney U test
   single: t-Test
   single: Binomial test
   single: Double hits

In most cases, looking at the number of problems solved, their plan quality
or other characteristics is not enough to judge whether one planner performs
better than another. This problem is rather typical in many fields of
Science (including Artificial Intelligence), and the usual approach consists
of performing statistical tests. Since the module :mod:`IPCReport` already
provides a facility to access data (see :ref:`report-label`), it is almost
straightforward to provide another script to perform statistical tests over
the same data. This is the target of :mod:`test.py`.

This script uses :mod:`report.py` transparently to the user to retrieve data
from a snapshot or summary (see :ref:`snapshots-label`) or a results tree
directory (see :ref:`results-label`) and to perform the indicated
statistical tests over the resulting series.

The script :mod:`test.py` implements four different statistical tests. Since
parametric statistical tests make questionable assumptions about the
distribution of data and, besides, most series are likely to be relatively
short (e.g., in the Seventh International Planning Competition there were 20
planning tasks per domain, so that most series have *n=20* samples, which is
regarded in some texts as borderline between a small and a large set), three
of them are nonparametric. However, because of its popularity, a fourth one
which is parametric is included as well:

:Mann-Whitney U-test: It compares two samples that are independent, or not
   related. It assesses whether one of two samples of independent
   observations tends to have larger values than the other. The test
   automatically corrects for ties and by default uses a continuity
   correction. The reported *p*-value is for a one-tailed hypothesis, i.e.,
   when information about whether one sample has larger values than the
   other is provided. To get the two-tailed *p*-value (i.e., when the null
   hypothesis is rejected if the test statistic is either too small or too
   large) the returned *p*-value has to be multiplied by two.

:Wilcoxon signed rank test: In contrast to the previous test, the Wilcoxon
   signed rank test is a two-tailed nonparametric statistical procedure for
   comparing two samples that are paired, or related. It tests the null
   hypothesis that both samples come from the same distribution. This test
   has been extensively used in the analysis of previous International
   Planning Competitions, mostly in the third and fifth.

:Binomial test: It is an exact test used with dichotomous data ---that is,
   when each individual in the sample is classified in one of two categories
   such as success/failure. It provides statistical significance of
   deviations from a binomial distribution with *p=0.5*. The use of this
   test in the context of Automated Planning was originally proposed by
   Hoffmann and Nebel to provide statistical significance of the differences
   in performance of their planner :program:`FF` when using different
   combinations of enhancements

:t-Test: It is the parametric equivalent of the Wilcoxon signed rank test.
   This is a two-tailed test for the null hypothesis that two independent
   samples have identical average (expected) values

One restriction of all of these tests, however, is that they just compare
two series of data. Other tests, such as the Kolmogorov-Smirnov one-sample
test to determine if a data sample meets acceptable levels of normality, or
the Friedman and Kruskal-Wallis *H*-tests to compare three or more samples
(either related or unrelated, respectively), are not currently implemented.
Instead, all these statistical tests perform pairwise comparisons of an
arbitrary number of series and provide the *p*-value of each pair according
to the selected statistical procedure. If the resulting *p*-value is less
than or equal to the critical value that corresponds to a particular level
of risk α, the null hypothesis is rejected and the alternate or research
hypothesis is accepted instead. Typical values of the level of risk are
*α=0.05, 0.01* and *0.001*, which stand for a probability of 95%, 99% and
99.9%, respectively, that any observed statistical difference will be real
and not due to chance.

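Since :mod:`test.py` relies on SciPy (see :ref:`ipcreport-dependencies`),
the four procedures above correspond to tests readily available in
:mod:`scipy.stats`. The snippet below only sketches how two series could be
compared with them; it is not the code of :mod:`test.py`, and the series are
made up:

.. code-block:: python

   from scipy import stats

   # two hypothetical paired series, e.g., plan qualities per problem
   serie_a = [126.0, 125.0, 123.0, 122.0, 121.0]
   serie_b = [130.0, 126.0, 125.0, 124.0, 121.0]

   # Mann-Whitney U test (one-tailed p-value; multiply by two for the
   # two-tailed hypothesis, as discussed above)
   u, p_mw = stats.mannwhitneyu(serie_a, serie_b)

   # Wilcoxon signed rank test (two-tailed, for paired samples)
   t, p_wx = stats.wilcoxon(serie_a, serie_b)

   # Binomial (sign) test: number of times serie_a exceeds serie_b out of
   # the entries where both differ, against a binomial with p=0.5
   wins = sum(1 for a, b in zip(serie_a, serie_b) if a > b)
   ties = sum(1 for a, b in zip(serie_a, serie_b) if a == b)
   p_bt = stats.binom_test(wins, len(serie_a) - ties, 0.5)

   # t-Test for two independent samples (two-tailed)
   t, p_tt = stats.ttest_ind(serie_a, serie_b)
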
This script retrieves data from :mod:`report.py` (see :ref:`report-label`)
transparently to the user, so it acknowledges the same directives, which can
be used with exactly the same purpose, with just one restriction: only one
variable (with :option:`--variable`) can be provided, so that only
single-valued series are allowed. Once :mod:`report.py` has been silently
invoked, it retrieves a unique table of data which :mod:`test.py` splits
into as many series as primary keys are present in the table.

For example, the following query returns the number of problems *apparently*
solved (i.e., that the planner claims to have solved) and those that are
successfully solved (i.e., validated with `VAL `_) in the woodworking domain
by planners :program:`fdss-2`, :program:`lmcut` and :program:`gamer`::

   $ ./report.py --summary ./seq-opt.results.snapshot --variable solved oksolved --domain woodworking --planner 'fdss-2|lmcut|gamer'
   +---------+-------------+---------+--------+----------+
   | planner | domain      | problem | solved | oksolved |
   +---------+-------------+---------+--------+----------+
   | fdss-2  | woodworking | 000     | True   | True     |
   | fdss-2  | woodworking | 001     | True   | True     |
   | fdss-2  | woodworking | 002     | True   | True     |
   | fdss-2  | woodworking | 003     | True   | True     |
   | fdss-2  | woodworking | 004     | True   | True     |
   | fdss-2  | woodworking | 005     | True   | True     |
   | fdss-2  | woodworking | 006     | True   | True     |
   | fdss-2  | woodworking | 007     | True   | True     |
   | fdss-2  | woodworking | 008     | True   | True     |
   | fdss-2  | woodworking | 009     | True   | True     |
   | fdss-2  | woodworking | 010     | False  | False    |
   | fdss-2  | woodworking | 011     | False  | False    |
   | fdss-2  | woodworking | 012     | False  | False    |
   | fdss-2  | woodworking | 013     | False  | False    |
   | fdss-2  | woodworking | 014     | True   | True     |
   | fdss-2  | woodworking | 015     | False  | False    |
   | fdss-2  | woodworking | 016     | False  | False    |
   | fdss-2  | woodworking | 017     | False  | False    |
   | fdss-2  | woodworking | 018     | False  | False    |
   | fdss-2  | woodworking | 019     | False  | False    |
   | gamer   | woodworking | 000     | True   | True     |
   | gamer   | woodworking | 001     | True   | True     |
   | gamer   | woodworking | 002     | True   | True     |
   | gamer   | woodworking | 003     | True   | True     |
   | gamer   | woodworking | 004     | True   | True     |
   | gamer   | woodworking | 005     | True   | True     |
   | gamer   | woodworking | 006     | True   | True     |
   | gamer   | woodworking | 007     | True   | True     |
   | gamer   | woodworking | 008     | True   | True     |
   | gamer   | woodworking | 009     | True   | True     |
   | gamer   | woodworking | 010     | True   | True     |
   | gamer   | woodworking | 011     | True   | True     |
   | gamer   | woodworking | 012     | True   | True     |
   | gamer   | woodworking | 013     | True   | True     |
   | gamer   | woodworking | 014     | True   | True     |
   | gamer   | woodworking | 015     | True   | True     |
   | gamer   | woodworking | 016     | False  | False    |
   | gamer   | woodworking | 017     | False  | False    |
   | gamer   | woodworking | 018     | True   | True     |
   | gamer   | woodworking | 019     | False  | False    |
   | lmcut   | woodworking | 000     | True   | True     |
   | lmcut   | woodworking | 001     | True   | True     |
   | lmcut   | woodworking | 002     | True   | True     |
   | lmcut   | woodworking | 003     | True   | True     |
   | lmcut   | woodworking | 004     | True   | True     |
   | lmcut   | woodworking | 005     | True   | True     |
   | lmcut   | woodworking | 006     | True   | True     |
   | lmcut   | woodworking | 007     | True   | True     |
   | lmcut   | woodworking | 008     | True   | True     |
   | lmcut   | woodworking | 009     | True   | True     |
   | lmcut   | woodworking | 010     | False  | False    |
   | lmcut   | woodworking | 011     | False  | False    |
   | lmcut   | woodworking | 012     | False  | False    |
   | lmcut   | woodworking | 013     | False  | False    |
   | lmcut   | woodworking | 014     | True   | True     |
   | lmcut   | woodworking | 015     | True   | True     |
   | lmcut   | woodworking | 016     | False  | False    |
   | lmcut   | woodworking | 017     | False  | False    |
   | lmcut   | woodworking | 018     | False  | False    |
   | lmcut   | woodworking | 019     | False  | False    |
   +---------+-------------+---------+--------+----------+

The primary key in this case is **planner**, which is instantiated to
**fdss-2**, **gamer** and **lmcut**. Therefore, :mod:`test.py` automatically
creates three series with the values of a single variable for these keys
---in the previous example **solved** was also shown to exemplify below how
two variables can be used simultaneously with the directive
:option:`--filter`.

Observing the number of solved problems in this particular domain, it turns
out that :program:`gamer` seems to be the best (solving 17 problems),
followed by :program:`lmcut` (which solves 12) and :program:`fdss-2`, which
solves 11. However, performing a statistical test over these series will
provide a more reliable impression of the relative performance of these
planners. Since the data in these series are dichotomous (they only take the
values ``True`` and ``False``), a `Binomial test` is performed to know
whether one planner performs better than another.

Initially, one can directly ask :mod:`test.py` to perform the statistical
test just passing the same parameters but with one single variable of
interest, **oksolved**, along with the particular selection of the
statistical test to perform with the option :option:`--test`::

   $ ./test.py --summary ./seq-opt.results.snapshot --variable oksolved --domain woodworking --planner 'fdss-2|lmcut|gamer' --test bt
   Revision
   Date

   ./test.py 1.3
   -----------------------------------------------------------------------------
    * snapshot : ./seq-opt.results.snapshot
    * tests    : ['Binomial test']
    * name     : report
    * level    : None
    * planner  : fdss-2|lmcut|gamer
    * domain   : woodworking
    * problem  : .*
    * variable : ['oksolved']
    * filter   : None
    * matcher  : all
    * noentry  : -1
    * unroll   : False
    * sorting  : []
    * style    : table
   -----------------------------------------------------------------------------
    name: report
   +--------+----------+-------+---------+
   |        | fdss-2   | gamer | lmcut   |
   +--------+----------+-------+---------+
   | fdss-2 | ---      | 1.0   | 1.0     |
   | gamer  | 0.015625 | ---   | 0.03125 |
   | lmcut  | 0.5      | 1.0   | ---     |
   +--------+----------+-------+---------+

    Binomial test : Perform a binomial two-sided sign test. It computes the
    number n of times that the serie shown in the row behaves differently
    than the serie shown in the column. It returns the probability according
    to a binomial distribution with p=0.5 that the number of times that the
    serie shown in the row takes values larger than the serie shown in the
    column equals at least the number of times that this difference was
    observed. If this probability is less or equal than a given threshold,
    e.g., 0.01, 0.05 or 0.1, then reject the null hypothesis and assume that
    the serie shown in the column is significantly smaller

    created by IPCtest 1.3 (Revision: 312), Sun Jul 15 17:21:38 2012

It is possible to invoke :mod:`test.py` with the option :option:`--tests` to
get a full list of all the implemented statistical tests along with a
description of their use and purpose.

It is also feasible to request an arbitrary number of statistical tests,
passing them altogether after :option:`--test` ---as in :option:`--test wx
mw`, which requests simultaneously the Wilcoxon signed-rank test and the
Mann-Whitney U test.

From the preceding table, it seems that :program:`fdss-2` and
:program:`lmcut` perform better than :program:`gamer` with a confidence
level *α=0.05* (i.e., with a probability equal to 95%). This result goes
against the original intuition. The reason is that the Binomial test checks
whether the planner in the column has values smaller than the planner shown
in the row. Since ``False`` is considered to be smaller than ``True`` in
most programming languages (including Python), the results are clearly
misleading.

To correct the results it is necessary to provide a larger value to those
problems that were not solved. Hence, the first step consists of *filtering*
data. In this case, the variable of interest is **solved** (whether the
planner provided at least one solution to a single planning task), which is
filtered by **oksolved** ---whether the plan found is valid or not. A filter
(**oksolved** in the following example) sets the value of a particular
sample to the constant ``NOENTRY`` if it is ``False`` and passes the value
of the selected variable (**solved**) in case it is ``True``. In other
words, it filters the input data according to a secondary variable. The
primary variable is selected with :option:`--variable` (as in
:mod:`report.py`), whereas the secondary variable is selected with
:option:`--filter`.

However, filtering data poses a new question: `when comparing two series,
what to do with those entries in one series whose value equals NOENTRY?`
Sometimes it is desirable to compare only those entries from two series
where both elements have been filtered ---these are known as **double
hits**. In other cases, it might be better to preserve those entries where
only one series has the value ``NOENTRY``. The third alternative consists of
comparing both series even if the same entry contains ``NOENTRY`` for both
series. This selection can be performed with :option:`--matcher`, which
accepts the values ``and``, ``or`` and ``all`` to match two series as
indicated before, respectively:

:and: It only accepts those entries where both series have values different
   from ``NOENTRY``

:or: It rejects only those entries where both series have values equal to
   ``NOENTRY``

:all: It accepts all entries, processing both series in their current format

Finally, it is safe to set the value of all entries equal to ``NOENTRY`` to
a particular value, which depends upon the *Null Hypothesis* used. In the
running example, it is that the distribution of problems solved by one
planner is the same as the distribution of problems solved by a different
planner. To force the statistical test to consider those problems unsolved
(or whose solutions are not valid) as being worse than those that have been
solved, it is a must to set ``NOENTRY`` (which would correspond to either
unsolved problems or solved problems which are not valid) to a large value.
This is done with the option :option:`--noentry` ---a sketch of these
filtering and matching rules is given below.

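To make the interplay of :option:`--filter`, :option:`--matcher` and
:option:`--noentry` concrete, the following sketch restates the semantics
just described in plain Python; the function names are hypothetical and this
is not the code of :mod:`test.py`:

.. code-block:: python

   NOENTRY = object()   # unique marker for filtered-out samples

   def apply_filter(variable, filter_var):
       # keep the primary variable where the filter is True; otherwise
       # replace the sample with the NOENTRY marker
       return [v if f else NOENTRY
               for v, f in zip(variable, filter_var)]

   def match(serie_a, serie_b, matcher, noentry):
       pairs = zip(serie_a, serie_b)
       if matcher == 'and':      # double hits only
           pairs = [(a, b) for a, b in pairs
                    if a is not NOENTRY and b is not NOENTRY]
       elif matcher == 'or':     # reject only if *both* are NOENTRY
           pairs = [(a, b) for a, b in pairs
                    if a is not NOENTRY or b is not NOENTRY]
       # 'all' keeps every entry; finally, remaining NOENTRY marks are
       # replaced with the value given to --noentry
       subst = lambda x: noentry if x is NOENTRY else x
       return [(subst(a), subst(b)) for a, b in pairs]

   # e.g., --variable solved --filter oksolved --matcher or --noentry 100
   serie = apply_filter([True, True, False], [True, False, False])
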
In the following example, the same test shown above is performed again but
this time: first, the values of the variable **solved** are filtered with
the variable **oksolved** to make sure that they are valid plans, with the
option :option:`--filter oksolved`; second, all entries where both series
have the value ``NOENTRY`` are discarded with :option:`--matcher or`; third,
all the resulting entries with the value ``NOENTRY`` are set to 100 to force
the statistical test to consider them worse (as they are larger values than
``True``, which just equals the integer 1) than those entries that
correspond to valid solutions::

   $ ./test.py --summary ./seq-opt.results.snapshot --variable solved --filter oksolved --matcher or --noentry 100 --domain woodworking --planner 'fdss-2|lmcut|gamer' --test bt
   Revision
   Date

   ./test.py 1.3
   -----------------------------------------------------------------------------
    * snapshot : ./seq-opt.results.snapshot
    * tests    : ['Binomial test']
    * name     : report
    * level    : None
    * planner  : fdss-2|lmcut|gamer
    * domain   : woodworking
    * problem  : .*
    * variable : ['solved']
    * filter   : ['oksolved']
    * matcher  : or
    * noentry  : 100
    * unroll   : False
    * sorting  : []
    * style    : table
   -----------------------------------------------------------------------------
    name: report
   +--------+--------+----------+-------+
   |        | fdss-2 | gamer    | lmcut |
   +--------+--------+----------+-------+
   | fdss-2 | ---    | 0.015625 | 0.5   |
   | gamer  | 1.0    | ---      | 1.0   |
   | lmcut  | 1.0    | 0.03125  | ---   |
   +--------+--------+----------+-------+

    Binomial test : Perform a binomial two-sided sign test. It computes the
    number n of times that the serie shown in the row behaves differently
    than the serie shown in the column. It returns the probability according
    to a binomial distribution with p=0.5 that the number of times that the
    serie shown in the row takes values larger than the serie shown in the
    column equals at least the number of times that this difference was
    observed. If this probability is less or equal than a given threshold,
    e.g., 0.01, 0.05 or 0.1, then reject the null hypothesis and assume that
    the serie shown in the column is significantly smaller

    created by IPCtest 1.3 (Revision: 312), Sun Jul 15 21:00:19 2012

As can be seen, the results now indicate that :program:`gamer` performs
better (i.e., it has smaller values) than :program:`lmcut` with a confidence
level larger than 96% and than :program:`fdss-2` with a confidence level
larger than 98%. It seems that the *Research Hypothesis* that
:program:`gamer` outperforms the other two planners can be accepted only at
the least demanding of the typical confidence levels, 95% ---since the
*p*-values retrieved are only smaller than 0.05 but not smaller than the
other typical values, 0.01 and 0.001.

Finally, this script acknowledges all the different styles provided by
:mod:`report.py` with the directive :option:`--style`, so that the same
tables can be shown in the markup languages html and wiki, in octave files
and also in excel worksheets.

.. rubric:: Footnotes

.. [#] When using the dollar sign ``$`` in the command line, the shell will
   always try to expand it to the values of environment variables. This is
   known as *interpolation*. To avoid it, strings containing dollar signs
   shall be enclosed in single quotes, as in
   ``$track-$subtrack.$planner-build``