.. _automated-testing-page:

*****************
Automated testing
*****************

Introduction
============

LISA comes with tools to run synthetic tests and analyse the results. A simple
workflow will only use the ``exekall`` test runner (or its ``lisa-test`` thin
wrapper). More advanced automation needs will be covered by using
``bisector``.

exekall
=======

``exekall`` is the test runner of LISA; the ``lisa-test`` command is a thin
wrapper on top of it. ``exekall`` efficiently shares as many stages as
possible between tests to speed up the process, and records the results in a
database for later inspection.

.. seealso:: :ref:`exekall main documentation`

Running tests
+++++++++++++

The ``exekall run`` subcommand starts a test session. It needs to be pointed
at some Python sources (or module names) containing the definition of the
stages of each test, and at some initial spark like ``--conf`` or
``--load-db``. ``--conf`` will usually be used with a YAML configuration file
in the format specified by :class:`~lisa.target.TargetConf`.

.. code-block:: sh

    exekall run lisa lisa_tests --conf target_conf.yml

When pointed at folders (or packages), ``exekall`` will recursively look for
Python files.

.. note:: The ``lisa_tests`` package is now distributed separately from the
    ``lisa`` package.

A subset of the tests can be selected using ``-s PATTERN``. The pattern is a
globbing-style pattern, where ``*`` acts as a wildcard. If the pattern starts
with an ``!``, tests matching that pattern will be excluded. Use ``--list``
to list the available tests.

.. note:: To list the available tests with ``--list``, both the Python
    sources and a spark need to be specified, since exekall will infer what
    can be run by consuming it.

.. code-block:: sh

    # Select and run all tests starting with PELTTask but not containing "load"
    exekall run lisa lisa_tests --conf target_conf.yml -s 'PELTTask*' -s '!*load*'

``--artifact-dir`` can be used to set the location at which ``exekall`` will
store its artifacts. By default, they are stored in a subdirectory of the
location pointed at by the ``$EXEKALL_ARTIFACT_ROOT`` environment variable.

Multiple iterations
-------------------

Multiple iterations of the same set of tests can be executed using ``-n``.

.. tip:: In order to speed up the test session when executing multiple
    iterations, ``--share '*Target'`` can be used. That will share the target
    connection stage between all iterations, so some autodetection mechanisms
    will only run once.
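For instance, a session of ten iterations sharing the target connection could
be launched as follows (a sketch reusing the configuration file and
``PELTTask`` pattern from the examples above; the iteration count is
arbitrary):

.. code-block:: sh

    # Run 10 iterations of the PELTTask tests, connecting to the target only once
    exekall run lisa lisa_tests --conf target_conf.yml -s 'PELTTask*' -n 10 --share '*Target'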
Analyzing results
+++++++++++++++++

``exekall run`` produces a folder that is the home of all produced artifacts.
Some levels in that hierarchy are UUIDs that can be cross-linked with the
output. At the deepest levels, one can find artifacts useful for failure
analysis, such as ``trace.dat`` files obtained using the ``trace-cmd`` tool,
and some graphs generated by the tests.

A major artifact is the ``VALUE_DB.pickle.xz`` file (see
:class:`exekall.engine.ValueDB`). It contains the objects returned by every
stage of the tests, serialized in Python's Pickle format.

The ``exekall compare`` subcommand can compare two such files, and give a
list of changes in failure rate. The compared files need to contain multiple
iterations of the same test for the comparison to be useful. Non-significant
regressions/improvements are not displayed by default. The threshold above
which a change is considered non-significant can be modified using
``--alpha``, which sets the alpha risk of the Fisher's exact test carried out
on a contingency table like this one:

.. list-table:: Contingency table for one testcase
    :widths: auto
    :align: center
    :header-rows: 1
    :stub-columns: 1

    * - count
      - old
      - new
    * - passed
      - 15
      - 80
    * - failed
      - 5
      - 20

That would represent an ``old`` test session with 20 iterations of the test,
15 of which passed and 5 of which failed. The ``new`` session would have had
100 iterations, out of which 80 passed and 20 failed.

.. note:: This kind of experiment only fixes some marginal totals and
    therefore does not totally satisfy the conditions of Fisher's exact test.
    The total number of results in one column (column marginal total) is
    fixed, since a test either has to pass or fail. However, the row marginal
    totals are not fixed, since the experiment does not constrain the total
    number of successes and the total number of failures. This kind of
    experiment would be best analysed using Barnard's test.

    That said, Fisher's exact test is just less powerful than Barnard's test,
    which means its only issue is being too conservative, i.e. it will
    sometimes fail to spot a failure rate change although there actually was
    one. Another way to express that is that Fisher's exact test will require
    more iterations before detecting a failure rate change than strictly
    necessary. Barnard's test is unfortunately not widely implemented, so
    Fisher it is!

.. seealso:: :class:`lisa.regression.RegressionResult`

The output of ``exekall compare`` looks like this:

.. Comparison of 20190222 and 20190412 integration

::

    testcase                              old%    new%    delta%    pvalue      fix_iter#
    -------------------------------------------------------------------------------------
    PELTTask:test_load_avg_behaviour      2.9%    0.0%    -2.9%     4.58e-04
    PELTTask:test_load_avg_range          0.0%    7.1%     7.1%     1.08e-10    54
    PELTTask:test_util_avg_behaviour      2.4%    0.0%    -2.4%     1.70e-03
    PELTTask:test_util_avg_range          0.0%    7.1%     7.1%     1.08e-10    54
    TwoBigTasks:test_slack                4.7%    1.6%    -3.1%     1.25e-02

The columns have the following meaning:

* ``old%``: failure rate of the test in the old database (i.e. the first one
  on the command line)
* ``new%``: failure rate of the test in the new database (i.e. the second one
  on the command line)
* ``delta%``: the difference between the old and new failure rates
* ``pvalue``: the p-value resulting from the Fisher's exact test, used to
  filter significant regressions or improvements
* ``fix_iter#``: the number of iterations required to observe the effects of
  a fix of a regression. This gives an indication of how many iterations are
  needed to have ``exekall compare`` answer the question "is my fix fixing
  this regression?", assuming that you actually fixed it. Running fewer
  iterations than that to validate a fix will likely result in
  ``exekall compare`` not being able to conclude that there was a failure
  rate change (i.e. an improvement), even if the fix is actually correct.

.. tip:: When comparing results collected from different boards, the test IDs
    will probably not match, since they are tagged with the user-defined
    board name. In order to overcome that, use ``--remove-tag board``, so IDs
    can be matched as expected.
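Putting the pieces together, comparing two sessions recorded on
differently-named boards could look like this (a sketch; the artifact folder
names are illustrative):

.. code-block:: sh

    # Compare the failure rates of two sessions, dropping the board tag so
    # that test IDs from both boards can be matched
    exekall compare old_artifacts/VALUE_DB.pickle.xz new_artifacts/VALUE_DB.pickle.xz --remove-tag board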
Advanced use
++++++++++++

Parametric sweep
----------------

``--sweep`` allows running the same stage multiple times, sweeping over a
range of values for some of its parameters:

.. code-block:: sh

    # The energy_est_threshold_pct parameter of functions with a name
    # matching '*test_task_placement' will take all values from 0 to 15 in
    # increments of 5.
    exekall run lisa lisa_tests --conf target_conf.yml --sweep '*test_task_placement' energy_est_threshold_pct 0 15 5

When something goes wrong
-------------------------

``--replay`` provides a simple way of re-executing the last few stages of a
test that had an error. It can be used to reproduce a bug in the test code
that makes it raise an exception, while working on a fix. ``--replay`` takes
the UUID of the value of a stage that could not be computed due to an
exception. It will then reload the values of all stages that executed
correctly, and start again from there. For trace analysis related issues, it
allows re-executing the test code without having to re-execute the workload
on a board (and thus without needing a board at all):

.. code-block:: sh

    exekall run lisa lisa_tests --load-db artifacts/VALUE_DB.pickle.xz --replay ba017f269bee4687b2a902329ba22bd9

.. warning:: ``--replay`` currently will not restore values that were set
    using ``--sweep``.

Partial execution
-----------------

By default, ``exekall run`` on LISA will try to build instances of
:class:`~lisa.tests.base.ResultBundle`, which is the last stage of a test's
"pipeline", containing the final pass/fail result. That behaviour can be
altered using ``--goal``, so ``exekall run`` only runs the first steps of the
pipeline in order to gather data without processing it immediately.
Data-collection stages are subclasses of
:class:`~lisa.tests.base.TestBundle`. No later stage in the pipeline will
interact with the target, so it's a good place to stop:

.. code-block:: sh

    exekall run lisa lisa_tests --conf target_conf.yml --goal '*TestBundle' --artifact-dir artifacts

Later on, the processing methods can be run on the collected data:

.. code-block:: sh

    exekall run lisa lisa_tests --load-db artifacts/VALUE_DB.pickle.xz --load-type '*TestBundle'

.. tip:: ``--load-db`` can also be used to re-process data from a regular
    invocation of ``exekall run``. That can be useful to observe the effect
    of a change made to the processing code over the set of data acquired
    during an earlier session. A typical use case would be to look at the
    impact of changing a margin of a test, like the
    ``energy_est_threshold_pct`` parameter of
    :meth:`~lisa_tests.kernel.scheduler.eas_behaviour.EASBehaviour.test_task_placement`.

Aggregating results
-------------------

One way to get multiple iterations for a test is to run with ``-n``. Another
one is to merge the artifact folders created by multiple calls to
``exekall run``:

.. code-block:: sh

    exekall merge artifacts1 artifacts2 -o merged_artifacts

The ``merged_artifacts`` folder will contain all the artifacts of all the
original folders. File name conflicts are avoided by the use of UUIDs in the
artifact folder hierarchy. ``merged_artifacts/VALUE_DB.pickle.xz`` contains
all the results of all the original databases, and is a suitable input for
``exekall compare``:

.. code-block:: sh

    # Aggregate the results of all runs of the tests under the "old" conditions
    exekall merge old1 old2 old3 ... -o old_merged

    # Aggregate the results of all runs of the tests under the "new" conditions
    exekall merge new1 new2 new3 ... -o new_merged

    # Look for regressions in the common tests
    exekall compare old_merged/VALUE_DB.pickle.xz new_merged/VALUE_DB.pickle.xz

bisector
========

``bisector`` allows setting up the steps of a test iteration, and repeating
them an infinite number of times (by default), similarly to ``git bisect``
[#]_.

.. seealso:: :ref:`bisector main documentation`

.. [#] https://git-scm.com/docs/git-bisect

Running
+++++++

``bisector run`` is in charge of executing the steps and producing a report.
The most important option is ``--steps``, which needs to be pointed at a YAML
file with this kind of content:

.. code-block:: YAML

    steps:
        - class: build
          cmd: make defconfig Image dtbs

        # If a flash step fails, the whole session is aborted, otherwise the
        # exit status is not impacted
        - class: flash
          cmd: # insert the command to flash the board
          timeout: 180 # timeout in seconds
          trials: 5 # If the command fails, try again and only consider the last trial

        # If a reboot step fails, the whole session is aborted. If it
        # succeeds, it will participate as "good", like a test step. This
        # allows using bisector for boot testing.
        - class: reboot
          cmd: # insert a command to reboot your board
          timeout: 300
          trials: 5

        # A simple shell step will not participate in the overall return
        # code, even if it fails.
        - class: shell
          name: ssh-copy-id
          timeout: 300
          trials: 1
          # make sure we have SSH key authentication enabled on the target,
          # to simplify the settings of other scripts
          cmd: sshpass -p password ssh-copy-id -i $HOME/.ssh/id_rsa "$USER@$HOSTNAME"

        # A test step will make the result good if the command exits with 0,
        # or bad otherwise.
        - class: LISA-test
          name: eas-behaviour
          timeout: 3600
          # Block-style strings allow multiple lines. For more block style
          # examples: https://learnxinyminutes.com/docs/yaml/
          cmd: >
            cd "$LISA_HOME" &&
            exekall run lisa lisa_tests --conf target_conf.yml -s 'OneSmallTask*'

        # Another test example, that is not integrated with exekall
        - class: test
          name: my-other-test
          cmd: echo hello world

.. note:: Since all steps are executed in a loop, flashing and rebooting will
    occur over and over. If that is considered an overhead, it should be done
    beforehand and not included as a step. Alternatively, one can use
    ``--skip boot`` to skip steps that have a name or category matching
    *boot*.

.. code-block:: sh

    # As a convenience, myreport.yml.gz.log will also be created, with a
    # behaviour similar to:
    #   bisector run ... 2>&1 | tee myreport.yml.gz.log
    bisector run --steps steps.yml --report myreport.yml.gz
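Building on the ``--skip`` option mentioned in the note above, a session that
assumes the board was already flashed and booted could be started like this
(a sketch reusing the file names of the previous example):

.. code-block:: sh

    # Skip the steps whose name or category matches "boot", assuming flashing
    # and rebooting were taken care of beforehand
    bisector run --steps steps.yml --report myreport.yml.gz --skip boot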
All available step classes, along with their available ``run`` options, can
be looked up using ``bisector step-help``. Options are documented in their
CLI form, but equally apply to the steps configuration file.

.. tip:: Bisector supports executing commands in a transient systemd scope
    using the ``systemd-run`` binary, via the ``-ouse-systemd-run`` option.
    This ensures that all processes started indirectly by the command will be
    terminated/killed when the step finishes, just like for a systemd
    service. It is a good idea to enable it for long-running sessions.

More on steps options
---------------------

Steps are configured using options, which can be set either in the
``--steps`` YAML configuration file, or directly on the command line using
``-o`` (as with ``-ouse-systemd-run`` above).
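As a sketch of that mechanism, the ``use-systemd-run`` option from the tip
above could be enabled for a whole session without touching the steps file
(check ``bisector step-help`` for the exact spelling of each option):

.. code-block:: sh

    # Enable transient systemd scopes from the CLI, leaving steps.yml untouched
    bisector run --steps steps.yml --report myreport.yml.gz -ouse-systemd-run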