Automated testing

Introduction

LISA comes with tools to run synthetic tests and analyse the results. A simple workflow will only use the exekall test runner (or its lisa-test thin wrapper), while more advanced automation needs are covered by bisector.

exekall

exekall is the test runner of LISA; the lisa-test command is a thin wrapper on top of it. exekall shares as many stages as possible between tests to speed up the process, and records the results in a database for later inspection.

Running tests

The exekall run subcommand starts a test session. It needs to be pointed at some Python sources (or module names) containing the definition of the stages of each test, and at some initial spark like --conf or --load-db.

--conf will usually be used with a YAML configuration file in the format specified by TargetConf.

exekall run lisa lisa_tests --conf target_conf.yml

When pointed at folders (or packages), exekall will recursively look for Python files.

Note

The lisa_tests package is now distributed separately from the lisa package.

A subset of the tests can be selected using -s PATTERN. The pattern is a globbing-style pattern, where * acts as a wildcard. If the pattern starts with a !, no test matching that pattern will be selected. Use --list to list the available tests.

Note

To list the available tests with --list, both the Python sources and a spark need to be specified, since exekall infers what can be run by consuming the spark.
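For example, to list everything that can be run with the target configuration above:

exekall run lisa lisa_tests --conf target_conf.yml --list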

# Select and run all tests starting with PELTTask but not containing "load"
exekall run lisa lisa_tests --conf target_conf.yml -s 'PELTTask*' -s '!*load*'

--artifact-dir can be used to set the location at which exekall will store its artifacts. By default, they are stored in a subdirectory of the path given by the $EXEKALL_ARTIFACT_ROOT environment variable.
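For instance (the path is purely illustrative):

# Artifacts of each session will end up in a subdirectory of ~/exekall_artifacts
export EXEKALL_ARTIFACT_ROOT=~/exekall_artifacts
exekall run lisa lisa_tests --conf target_conf.yml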

Multiple iterations

Multiple iterations of the same set of tests can be executed using -n.

Tip

To speed up the test session when executing multiple iterations, --share '*Target' can be used. That will share the target connection stage between all iterations, so some autodetection mechanisms only run once.
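Putting both together (the iteration count is arbitrary):

# Run 10 iterations of the whole test set, sharing the target connection
exekall run lisa lisa_tests --conf target_conf.yml -n 10 --share '*Target'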

Analyzing results

exekall run produces a folder holding all the artifacts of the session. Some levels in that hierarchy are UUIDs that can be cross-linked with the output. At the deepest levels, one can find artifacts useful for failure analysis, such as trace.dat files obtained using the trace-cmd tool, and graphs generated by the tests.

A major artifact is the VALUE_DB.pickle.xz file (see exekall.engine.ValueDB). It contains the objects returned by every stage of the tests, serialized in Python’s pickle format. The exekall compare subcommand can compare two such files and give a list of changes in failure rate. The compared files need to contain multiple iterations of the same tests for the comparison to be useful. Non-significant regressions/improvements are not displayed by default. The p-value threshold above which a change is considered non-significant can be modified using --alpha, which sets the alpha risk of the Fisher’s exact test carried out on a contingency table like this one:

Contingency table for one testcase

count      old    new
---------------------
passed      15     80
failed       5     20

That would represent an old test session with 20 iterations of the test, 15 of which passed and 5 failed. The new session would have had 100 iterations, out of which 80 passed and 20 failed.

Note

This kind of experiment only fixes some marginal totals and therefore does not fully satisfy the conditions of Fisher’s exact test. The column marginal totals are fixed, since a test either has to pass or fail, so each column sums to the number of iterations in that session. However, the row marginal totals are not fixed, since the experiment does not constrain the total number of successes and the total number of failures. This kind of experiment would be best analysed using Barnard’s test.

That said, Fisher’s exact test is just less powerful than Barnard’s test, which means its only issue is being too conservative, i.e. it will sometimes fail to spot a failure rate change although there actually was one.

Another way to express that is that Fisher’s exact test will require more iterations before detecting a failure rate change than strictly necessary. Barnard’s test is unfortunately not widely implemented, so Fisher it is!

The output of exekall compare looks like this:

testcase                                                             old%   new%  delta%       pvalue fix_iter#
----------------------------------------------------------------------------------------------------------------
PELTTask:test_load_avg_behaviour                                     2.9%   0.0%   -2.9%     4.58e-04
PELTTask:test_load_avg_range                                         0.0%   7.1%    7.1%     1.08e-10        54
PELTTask:test_util_avg_behaviour                                     2.4%   0.0%   -2.4%     1.70e-03
PELTTask:test_util_avg_range                                         0.0%   7.1%    7.1%     1.08e-10        54
TwoBigTasks:test_slack                                               4.7%   1.6%   -3.1%     1.25e-02

The columns have the following meaning:

  • old%: failure rate of the test in the old database (i.e. the first on the command line)

  • new%: failure rate of the test in the new database (i.e. the second on the command line)

  • delta%: the difference between the old and new failure rates

  • pvalue: The p-value resulting from the Fisher’s exact test used to filter significant regressions or improvements

  • fix_iter#: The number of iterations required to observe the effects of a fix for a regression. This gives an indication of how many iterations are needed for exekall compare to answer the question “is my fix fixing this regression?”, assuming that you actually fixed it. Running fewer iterations than that to validate a fix will likely result in exekall compare not being able to conclude that there was a failure rate change (i.e. an improvement), even if the fix is actually correct.

Tip

When comparing results collected from different boards, the test IDs will probably not match, since they are tagged with the user-defined board name. To overcome that, use --remove-tag board, so IDs can be matched as expected.
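For example (the database paths are illustrative):

# Compare runs made on two different boards, ignoring the board tag
exekall compare juno/VALUE_DB.pickle.xz hikey960/VALUE_DB.pickle.xz --remove-tag board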

Advanced use

Parametric sweep

--sweep allows running the same stage multiple times, sweeping over a range of values for some of its parameters:

# The energy_est_threshold_pct parameter of functions with a name matching
# '*test_task_placement' will take all values from 0 to 15 by increments
# of 5.
exekall run lisa lisa_tests --conf target_conf.yml --sweep '*test_task_placement' energy_est_threshold_pct 0 15 5

When something went wrong

--replay provides a simple way of re-executing the last few stages of a test that had an error. That can be used to reproduce a bug in the test code that makes it raise an exception, while working on a fix. --replay takes the UUID of the value of a stage that could not be computed due to an exception. It will then reload the values of all stages that executed correctly, and start again from there. For trace analysis related issues, it allows re-executing the test code without having to re-execute the workload on a board (and thus without needing a board at all):

exekall run lisa lisa_tests --load-db artifacts/VALUE_DB.pickle.xz --replay ba017f269bee4687b2a902329ba22bd9

Warning

--replay currently will not restore values that were set using --sweep.

Partial execution

By default, exekall run on LISA will try to build instances of ResultBundle, which is the last stage of a test’s “pipeline” and contains the final pass/fail result. That behaviour can be altered using --goal, so exekall run only runs the first steps of the pipeline, gathering data without processing it immediately. Data-collection stages are subclasses of TestBundle. No later stage in the pipeline interacts with the target, so it’s a good place to stop:

exekall run lisa lisa_tests --conf target_conf.yml --goal '*TestBundle' --artifact-dir artifacts

Later on, the processing methods can be run from the data collected:

exekall run lisa lisa_tests --load-db artifacts/VALUE_DB.pickle.xz --load-type '*TestBundle'

Tip

--load-db can also be used to re-process data from a regular invocation of exekall run. That can be useful to observe the effect of a change made to the processing code on the data acquired during an earlier session. A typical use case would be to look at the impact of changing a margin of a test, like the energy_est_threshold_pct parameter of test_task_placement().
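A minimal sketch of that use case, after editing the processing code (assuming -s composes with --load-db as it does for a regular run):

# Re-run only the processing of the selected test on previously collected data
exekall run lisa lisa_tests --load-db artifacts/VALUE_DB.pickle.xz --load-type '*TestBundle' -s '*test_task_placement'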

Aggregating results

One way to get multiple iterations for a test is to run with -n. Another one is to merge the artifact folders created by many calls to exekall run.

exekall merge artifacts1 artifacts2 -o merged_artifacts

The merged_artifacts folder will contain all the artifacts of the original folders. File name conflicts are avoided by the use of UUIDs in the artifact folder hierarchy. merged_artifacts/VALUE_DB.pickle.xz contains all the results of each original database, and is a suitable input for exekall compare:

# Aggregate the results of all runs of the tests under the "old" conditions
exekall merge old1 old2 old3 ... -o old_merged
# Aggregate the results of all runs of the tests under the "new" conditions
exekall merge new1 new2 new3 ... -o new_merged
# Look for regressions in the common tests
exekall compare old_merged/VALUE_DB.pickle.xz new_merged/VALUE_DB.pickle.xz

bisector

bisector allows setting up the steps of a test iteration and repeating them, an infinite number of times by default, similarly to [1].

Running

bisector run is in charge of executing the steps and producing a report. The most important option is --steps which needs to be pointed at a YAML file with this kind of content:

steps:
  - class: build
    cmd: make defconfig Image dtbs

  # If a flash step fails, the whole session is aborted, otherwise the exit
  # status is not impacted
  - class: flash
    cmd: # insert the command to flash the board
    timeout: 180 # timeout in seconds
    trials: 5 # If the command fails, try again and only consider the last trial

  # If a reboot step fails, the whole session is aborted. If it succeeds, it
  # will participate as "good", like a test step. This allows using bisector
  # for boot testing.
  - class: reboot
    cmd: # insert a command to reboot your board
    timeout: 300
    trials: 5

  # A simple shell step does not participate in the overall return code, even if it fails.
  - class: shell
    name: ssh-copy-id
    timeout: 300
    trials: 1
    # make sure we have ssh key authentication enabled on the target, to
    # simplify settings of other scripts
    cmd: sshpass -p password ssh-copy-id -i $HOME/.ssh/id_rsa "$USER@$HOSTNAME"


  # A test step makes the result good if the command exits with 0, and bad otherwise.
  - class: LISA-test
    name: eas-behaviour
    timeout: 3600
    # Block-style strings allow multiple lines. For more block style examples:
    # https://learnxinyminutes.com/docs/yaml/
    cmd: >
      cd "$LISA_HOME" &&
      exekall run lisa lisa_tests --conf target_conf.yml -s 'OneSmallTask*'

  # Another test example, that is not integrated with exekall
  - class: test
    name: my-other-test
    cmd: echo hello world

Note

Since all steps are executed in a loop, flashing and rebooting will occur over and over. If that overhead is unwanted, flash and reboot beforehand instead of including them as steps. Alternatively, one can use --skip boot to skip steps that have a name or category matching boot.
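For example, to keep the steps file untouched but avoid rebooting a board that is already set up:

bisector run --steps steps.yml --report myreport.yml.gz --skip boot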

# As a convenience, myreport.yml.gz.log will also be created, with a
# behaviour similar to: bisector run ... 2>&1 | tee myreport.yml.gz.log
bisector run --steps steps.yml --report myreport.yml.gz

All available step classes, along with their run options, can be looked up using bisector step-help. Options are documented in their CLI form, but equally apply to the steps configuration file.

Tip

Bisector supports executing commands in a transient systemd scope with the systemd-run binary, using the -ouse-systemd-run option. This ensures that all processes started indirectly by the command will be terminated/killed when the step finishes, just like for a systemd service. Enabling it is a good idea for long-running sessions.
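For instance:

bisector run --steps steps.yml --report myreport.yml.gz -ouse-systemd-run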

More on steps options

Steps are configured using options, which can be set either from the --steps YAML config file or directly on the command line.

The CLI accepts the format -o <name or category>.<option>[=<value>]. If <name or category> is omitted, the option applies to all steps. Otherwise, it is a globbing-style pattern matched against both the name and the category of steps. If the same option is specified multiple times for a given step, the precedence rules are:

  1. command line wins over steps config file

  2. on the command line, rightmost -o wins

Note

There is no notion of one pattern being more specific than another: all that matters is the position on the command line.

A step’s name can be set using the name: foo key in the YAML config, and its category using cat: bar. All step classes come with a default name and category, so you usually don’t need to change the category.

When setting an option in the YAML config file, strings are parsed as if they were specified on the command line; other types are validated but otherwise taken as-is.
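A minimal sketch, reusing the step names of the configuration above (the values are arbitrary):

# Raise the timeout of the eas-behaviour step only, and the trials count of all steps
bisector run --steps steps.yml --report myreport.yml.gz -oeas-behaviour.timeout=7200 -otrials=3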

I don’t want a config file

In some cases, it’s easier to declare steps directly on the command line rather than having a configuration file. This can be used to build thin wrappers around bisector. Once a step is declared with a name and a class, its options can be set as usual:

bisector run --inline reboot reboot -oreboot.cmd='reboot_my_board.sh' --inline LISA-test mytest -omytest.cmd='lisa-test' --report myreport.yml.gz

Analyzing results

Reports generated using bisector run can be inspected using bisector report. With the example configuration, the output could look like this if everything went well:

flash/flash (flash) [GOOD]
    command: <your flash command>
    #1 : OK
    #2 : OK
    #3 : OK
    #4 : OK
    #5 : OK

boot/reboot (reboot) [GOOD]
    command: <your reboot command>
    #1 : OK
    #2 : OK
    #3 : OK
    #4 : OK
    #5 : OK

shell/shell (shell) [GOOD]
    command: sshpass -p password ssh-copy-id -i $HOME/.ssh/id_rsa "$USER@$HOSTNAME"
    #1 : OK
    #2 : OK
    #3 : OK
    #4 : OK
    #5 : OK

test/behaviour (LISA-test) [GOOD]
    OneSmallTask[board=juno-r0]:test_slack:                      passed 163/163 (100.0%)
    OneSmallTask[board=juno-r0]:test_task_placement:             passed 163/163 (100.0%)
    Error: 0/2, Failed: 0/2, Undecided: 0/2, Skipped: 0/2, Passed: 2/2

my-other-test/test (test) [GOOD]
    command: echo hello world
    #1 : OK
    #2 : OK
    #3 : OK
    #4 : OK
    #5 : OK

Overall bisect result: good commit

There is one section per step, reflecting the steps configuration. Each step aggregates the results of all its iterations. The header is formatted as <step name>/<step category> (<step class name>) [<step result>]. The overall bisect result is the combination of the results of each step.

LISA-test has special support for inspecting the exekall databases collected during each iteration of bisector, and can display a summary table. By default, a passed label will only appear if all iterations passed successfully. Otherwise, an appropriate combination of FAILED, ERROR, SKIPPED and UNDECIDED lines will be displayed with the corresponding counts.

Various options can affect what is displayed and taken into account. For example, --skip my-other-test will remove the contribution of that step to the final result. Step-specific report options are documented in bisector step-help. Some of the options allow exporting collected artifacts from the report, like -oexport-logs. In the case of the LISA-test step, that option also makes a symlink to the artifact folder available alongside the stdout/stderr log.
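For example:

# Ignore the my-other-test step and export the collected logs and artifacts
bisector report myreport.yml.gz --skip my-other-test -oexport-logs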

Tip

Generally speaking, -overbose will show all available information apart from the stdout/stderr output of commands. That may be a lot of information, you have been warned :-). -oshow-details may be all you need after all.

Looking for regressions

Using the LISA-test step, bisector collects a pruned version of the VALUE_DB.pickle.xz artifact for each iteration. These databases are stored directly inside the report. Using the -oexport-db=VALUE_DB.pickle.xz option, it is possible to export a database that is the result of merging all the collected ones. This can then be compared with another one to look for regressions:

bisector report old_report.yml.gz -oexport-db=old_db.pickle.xz
bisector report new_report.yml.gz -oexport-db=new_db.pickle.xz
exekall compare old_db.pickle.xz new_db.pickle.xz

Note

If the file already exists, it will be opened as a database and its content merged with the new content, then written back to the file.

Note

It is also possible to use -oexport-logs to get all artifact folders/archives, and merge them manually using exekall merge. The advantage of using -oexport-db is that the report is self-contained, without relying on other files/folders being available (locally or over HTTP).

Fixing regressions

The check-test-fix.py tool can be used to check that a fix to a test resolved errors or a regression, provided that the test can be re-executed on already-collected lisa.tests.base.TestBundle instances. It will call exekall run in parallel on all the exekall.engine.ValueDB databases collected by bisector run, and will produce a regression table using exekall compare, with old being the results from the report and new being the new results.

# The test to check is selected using --select in the same way as for `exekall run`.
# hikey960.report.yml.gz is a bisector report generated using `bisector run`
# All options coming after the report are passed to `bisector report` to
# control what artifacts are downloaded and which TestBundles are used.
check-test-fix.py --select 'OneSmallTask:test_task_placement' hikey960.report.yml.gz -oiterations=1-20

When something goes wrong

It’s not my fault!

Sometimes things go wrong: for example, your board may need to be manually power cycled because it does not reboot anymore. bisector run may have aborted if you used a step that can trigger that, leaving you with too few iterations.

You can take care of your board manually, and then resume execution using:

bisector run --resume --report report.yml.gz

Typo in the configuration

One step has been misconfigured, but some other expensive steps have run fine. We don’t want to throw away the whole report and lose our precious precious data. Hope is not lost: you can interrupt bisector run, and then pass -o options to bisector run --resume to update the value of some options:

bisector run --steps steps.yml --report myreport.yml.gz
# oops, wrong test command
# <ctrl-c>
# let's fix that and start again the execution
bisector run --resume --report myreport.yml.gz -omy-other-test.cmd='exit $RANDOM'

Note

It is also possible to update -n in the same way. bisector run --resume will top up with the necessary number of iterations to meet -n’s value.
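A minimal sketch (the iteration count is arbitrary):

# Top up the session so that it totals 100 iterations
bisector run --resume --report myreport.yml.gz -n 100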

No time for script baby-sitting

bisector run comes with a dbus interface that can publish desktop notifications on various events to keep you updated when something goes wrong.

# That process will relay desktop notifications using the FreeDesktop dbus
# API. Most of the time, only state changes (abort, stop, etc.) are of
# interest, and we don't want to be bothered by every new iteration
bisector monitor-server --notif enable state &
bisector run --steps steps.yml --report myreport.yml.gz
# Notification settings can be later updated using:
bisector monitor all --notif enable all

A monitoring command is also available:

# used with an explicit PID, no monitor-server is needed
bisector monitor BISECTOR_RUN_PID --log
# used with "all", the monitor-server is needed as all run instances register
# to it
bisector monitor all --status

Note

As long as the necessary packages have been installed, and unless --no-dbus has been used, it is possible to start bisector monitor-server after bisector run. The latter will detect the appearance of the server and connect to it.

Integration in a CI loop

bisector run can upload reports on the fly to either Artifactorial or Artifactory.

The LISA-test step can upload compressed exekall artifact archives using the -oupload-artifact run option. It will record the new HTTP location of the artifacts in the report. In a way, the report becomes an index that contains enough information to decide which artifact archive to download for further analysis (usually to look at trace-cmd traces).

Tip

bisector report accepts both local files and HTTP URLs

If the worker is unstable, the latest report can still be used and will contain all the step information collected so far. When using the LISA-test step, -oexport-logs will by default download artifact archives accessible over HTTP. That can be changed using -odownload=false.
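For example (the URL is illustrative):

# Inspect a report straight from the CI storage, without downloading the
# artifact archives
bisector report http://ci.example.com/myreport.yml.gz -oexport-logs -odownload=false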

Artifactorial

Artifactorial [2] is convenient since it allows pushing large quantities of data to a server, where they are automatically cleaned up after a period of time.

export ARTIFACTORIAL_TOKEN='ONE_TOKEN_TO_RULE_THEM_ALL'
export ARTIFACTORIAL_FOLDER='http://instance.of.artifactorial/artifacts/myfolder'
bisector run --steps steps.yml --report myreport.yml.gz -oupload-artifact --upload-report

Artifactory

Artifactory [3] has more complex features: it also allows pushing large quantities of data to a server, while giving you control over the policy used for data cleanup. The pushed data can also be described through properties, which can be used to drive the cleanup policy and to select the data fetched from the server at a later point in time.

export ARTIFACTORY_TOKEN='API_KEY'
export ARTIFACTORY_FOLDER='http://instance.of.artifactory/mynamespace.myrepo;prop=val'
bisector run --steps steps.yml --report myreport.yml.gz -oupload-artifact --upload-report