.. _automated-testing-page:

*****************
Automated testing
*****************

Introduction
============

LISA comes with tools to run synthetic tests and analyse the results. A simple
workflow will only use the ``exekall`` test runner (or its ``lisa-test`` thin
wrapper). More advanced automation needs will be covered by using
``bisector``.

exekall
=======

``exekall`` is the test runner of LISA; the ``lisa-test`` command is a thin
wrapper on top of it. ``exekall`` efficiently shares as many stages as
possible between tests to speed up the process, and records the results in a
database for later inspection.

.. seealso:: :ref:`exekall main documentation`

Running tests
+++++++++++++

The ``exekall run`` subcommand starts a test session. It needs to be pointed
at some Python sources (or module names) containing the definition of the
stages of each test, and at some initial spark like ``--conf`` or
``--load-db``. ``--conf`` will usually be used with a YAML configuration file
in the format specified by :class:`~lisa.target.TargetConf`.

.. code-block:: sh

    exekall run lisa lisa_tests --conf target_conf.yml

When pointed at folders (or packages), ``exekall`` will recursively look for
Python files.

.. note:: The ``lisa_tests`` package is now distributed separately from the
    ``lisa`` package.

A subset of the tests can be selected using ``-s PATTERN``. The pattern is a
globbing-style pattern, where ``*`` acts as a wildcard. If the pattern starts
with an ``!``, tests matching that pattern will be excluded. Use ``--list``
to list the available tests.

.. note:: To list the available tests with ``--list``, both the Python
    sources and a spark need to be specified, since exekall will infer what
    can be run by consuming it.

.. code-block:: sh

    # Select and run all tests starting with PELTTask but not containing "load"
    exekall run lisa lisa_tests --conf target_conf.yml -s 'PELTTask*' -s '!*load*'

``--artifact-dir`` can be used to set the location at which ``exekall`` will
store its artifacts. By default, they are stored in a subdirectory of the
location pointed at by the ``$EXEKALL_ARTIFACT_ROOT`` environment variable.

Multiple iterations
-------------------

Multiple iterations of the same set of tests can be executed using ``-n``.

.. tip:: In order to speed up the test session when executing multiple
    iterations, ``--share '*Target'`` can be used. That will share the target
    connection stage between all iterations, so some autodetection mechanisms
    will only run once.
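For instance, a session of ten iterations sharing the target connection could
be launched as follows (a sketch reusing the configuration file and
``PELTTask`` pattern from the examples above; the iteration count is
arbitrary):

.. code-block:: sh

    # Run 10 iterations of the PELTTask tests, connecting to the target only once
    exekall run lisa lisa_tests --conf target_conf.yml -s 'PELTTask*' -n 10 --share '*Target'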
Analyzing results
+++++++++++++++++

``exekall run`` produces a folder that is the home of all produced artifacts.
Some levels in that hierarchy are UUIDs that can be cross-linked with the
output. At the deepest levels, one can find artifacts useful for failure
analysis, such as ``trace.dat`` files obtained using the ``trace-cmd`` tool,
and some graphs generated by the tests.

A major artifact is the ``VALUE_DB.pickle.xz`` file (see
:class:`exekall.engine.ValueDB`). It contains the objects returned by every
stage of the tests, serialized in Python's Pickle format.

The ``exekall compare`` subcommand can compare two such files, and give a
list of changes in failure rate. The compared files need to contain multiple
iterations of the same test for the comparison to be useful. Non-significant
regressions/improvements are not displayed by default. The threshold above
which a change is considered non-significant can be modified using
``--alpha``, which sets the alpha risk of the Fisher's exact test carried out
on a contingency table like this one:

.. list-table:: Contingency table for one testcase
    :widths: auto
    :align: center
    :header-rows: 1
    :stub-columns: 1

    * - count
      - old
      - new
    * - passed
      - 15
      - 80
    * - failed
      - 5
      - 20

That would represent an ``old`` test session with 20 iterations of the test,
15 of which passed and 5 of which failed. The ``new`` session would have had
100 iterations, out of which 80 passed and 20 failed.

.. note:: This kind of experiment only fixes some marginal totals and
    therefore does not totally satisfy the conditions of Fisher's exact test.
    The total number of results in one column (column marginal total) is
    fixed, since a test either has to pass or fail. However, the row marginal
    totals are not fixed, since the experiment does not constrain the total
    number of successes and the total number of failures. This kind of
    experiment would be best analysed using Barnard's test.

    That said, Fisher's exact test is just less powerful than Barnard's test,
    which means its only issue is being too conservative, i.e. it will
    sometimes fail to spot a failure rate change although there actually was
    one. Another way to express that is that Fisher's exact test will require
    more iterations before detecting a failure rate change than strictly
    necessary. Barnard's test is unfortunately not widely implemented, so
    Fisher it is!

.. seealso:: :class:`lisa.regression.RegressionResult`

The output of ``exekall compare`` looks like this:

.. Comparison of 20190222 and 20190412 integration

::

    testcase                              old%    new%    delta%    pvalue      fix_iter#
    -------------------------------------------------------------------------------------
    PELTTask:test_load_avg_behaviour      2.9%    0.0%    -2.9%     4.58e-04
    PELTTask:test_load_avg_range          0.0%    7.1%     7.1%     1.08e-10    54
    PELTTask:test_util_avg_behaviour      2.4%    0.0%    -2.4%     1.70e-03
    PELTTask:test_util_avg_range          0.0%    7.1%     7.1%     1.08e-10    54
    TwoBigTasks:test_slack                4.7%    1.6%    -3.1%     1.25e-02

The columns have the following meaning:

* ``old%``: failure rate of the test in the old database (i.e. the first one
  on the command line)
* ``new%``: failure rate of the test in the new database (i.e. the second one
  on the command line)
* ``delta%``: the difference between the old and new failure rates
* ``pvalue``: the p-value resulting from the Fisher's exact test, used to
  filter significant regressions or improvements
* ``fix_iter#``: the number of iterations required to observe the effects of
  a fix of a regression. This gives an indication of how many iterations are
  needed to have ``exekall compare`` answer the question "is my fix fixing
  this regression?", assuming that you actually fixed it. Running fewer
  iterations than that to validate a fix will likely result in
  ``exekall compare`` not being able to conclude that there was a failure
  rate change (i.e. an improvement), even if the fix is actually correct.

.. tip:: When comparing results collected from different boards, the test IDs
    will probably not match, since they are tagged with the user-defined
    board name. In order to overcome that, use ``--remove-tag board``, so IDs
    can be matched as expected.
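Putting the pieces together, comparing two sessions recorded on
differently-named boards could look like this (a sketch; the artifact folder
names are illustrative):

.. code-block:: sh

    # Compare the failure rates of two sessions, dropping the board tag so
    # that test IDs from both boards can be matched
    exekall compare old_artifacts/VALUE_DB.pickle.xz new_artifacts/VALUE_DB.pickle.xz --remove-tag board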
Advanced use
++++++++++++

Parametric sweep
----------------

``--sweep`` allows running the same stage multiple times, sweeping over a
range of values for some of its parameters:

.. code-block:: sh

    # The energy_est_threshold_pct parameter of functions with a name
    # matching '*test_task_placement' will take all values from 0 to 15 in
    # increments of 5.
    exekall run lisa lisa_tests --conf target_conf.yml --sweep '*test_task_placement' energy_est_threshold_pct 0 15 5

When something goes wrong
-------------------------

``--replay`` provides a simple way of re-executing the last few stages of a
test that had an error. It can be used to reproduce a bug in the test code
that makes it raise an exception, while working on a fix. ``--replay`` takes
the UUID of the value of a stage that could not be computed due to an
exception. It will then reload the values of all stages that executed
correctly, and start again from there. For trace analysis related issues, it
allows re-executing the test code without having to re-execute the workload
on a board (and thus without needing a board at all):

.. code-block:: sh

    exekall run lisa lisa_tests --load-db artifacts/VALUE_DB.pickle.xz --replay ba017f269bee4687b2a902329ba22bd9

.. warning:: ``--replay`` currently will not restore values that were set
    using ``--sweep``.

Partial execution
-----------------

By default, ``exekall run`` on LISA will try to build instances of
:class:`~lisa.tests.base.ResultBundle`, which is the last stage of a test's
"pipeline", containing the final pass/fail result. That behaviour can be
altered using ``--goal``, so ``exekall run`` only runs the first steps of the
pipeline in order to gather data without processing it immediately.
Data-collection stages are subclasses of
:class:`~lisa.tests.base.TestBundle`. No later stage in the pipeline will
interact with the target, so it's a good place to stop:

.. code-block:: sh

    exekall run lisa lisa_tests --conf target_conf.yml --goal '*TestBundle' --artifact-dir artifacts

Later on, the processing methods can be run on the collected data:

.. code-block:: sh

    exekall run lisa lisa_tests --load-db artifacts/VALUE_DB.pickle.xz --load-type '*TestBundle'

.. tip:: ``--load-db`` can also be used to re-process data from a regular
    invocation of ``exekall run``. That can be useful to observe the effect
    of a change made to the processing code over the set of data acquired
    during an earlier session. A typical use case would be to look at the
    impact of changing a margin of a test, like the
    ``energy_est_threshold_pct`` parameter of
    :meth:`~lisa_tests.kernel.scheduler.eas_behaviour.EASBehaviour.test_task_placement`.

Aggregating results
-------------------

One way to get multiple iterations for a test is to run with ``-n``. Another
one is to merge the artifact folders created by multiple calls to
``exekall run``:

.. code-block:: sh

    exekall merge artifacts1 artifacts2 -o merged_artifacts

The ``merged_artifacts`` folder will contain all the artifacts of all the
original folders. File name conflicts are avoided by the use of UUIDs in the
artifact folder hierarchy. ``merged_artifacts/VALUE_DB.pickle.xz`` contains
all the results of all the original databases, and is a suitable input for
``exekall compare``:

.. code-block:: sh

    # Aggregate the results of all runs of the tests under the "old" conditions
    exekall merge old1 old2 old3 ... -o old_merged

    # Aggregate the results of all runs of the tests under the "new" conditions
    exekall merge new1 new2 new3 ... -o new_merged

    # Look for regressions in the common tests
    exekall compare old_merged/VALUE_DB.pickle.xz new_merged/VALUE_DB.pickle.xz

bisector
========

``bisector`` allows setting up the steps of a test iteration, and repeating
them an infinite number of times (by default), similarly to ``git bisect``
[#]_.

.. seealso:: :ref:`bisector main documentation`

.. [#] https://git-scm.com/docs/git-bisect

Running
+++++++

``bisector run`` is in charge of executing the steps and producing a report.
The most important option is ``--steps``, which needs to be pointed at a YAML
file with this kind of content:

.. code-block:: YAML

    steps:
        - class: build
          cmd: make defconfig Image dtbs

        # If a flash step fails, the whole session is aborted, otherwise the
        # exit status is not impacted
        - class: flash
          cmd: # insert the command to flash the board
          timeout: 180 # timeout in seconds
          trials: 5 # If the command fails, try again and only consider the last trial

        # If a reboot step fails, the whole session is aborted. If it
        # succeeds, it will participate as "good", like a test step. This
        # allows using bisector for boot testing.
        - class: reboot
          cmd: # insert a command to reboot your board
          timeout: 300
          trials: 5

        # A simple shell step will not participate in the overall return
        # code, even if it fails.
        - class: shell
          name: ssh-copy-id
          timeout: 300
          trials: 1
          # make sure we have SSH key authentication enabled on the target,
          # to simplify the settings of other scripts
          cmd: sshpass -p password ssh-copy-id -i $HOME/.ssh/id_rsa "$USER@$HOSTNAME"

        # A test step will make the result good if the command exits with 0,
        # or bad otherwise.
        - class: LISA-test
          name: eas-behaviour
          timeout: 3600
          # Block-style strings allow multiple lines. For more block style
          # examples: https://learnxinyminutes.com/docs/yaml/
          cmd: >
            cd "$LISA_HOME" &&
            exekall run lisa lisa_tests --conf target_conf.yml -s 'OneSmallTask*'

        # Another test example, that is not integrated with exekall
        - class: test
          name: my-other-test
          cmd: echo hello world

.. note:: Since all steps are executed in a loop, flashing and rebooting will
    occur over and over. If that is considered an overhead, it should be done
    beforehand and not included as a step. Alternatively, one can use
    ``--skip boot`` to skip steps that have a name or category matching
    *boot*.

.. code-block:: sh

    # As a convenience, myreport.yml.gz.log will also be created, with a
    # behaviour similar to:
    #   bisector run ... 2>&1 | tee myreport.yml.gz.log
    bisector run --steps steps.yml --report myreport.yml.gz
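Building on the ``--skip`` option mentioned in the note above, a session that
assumes the board was already flashed and booted could be started like this
(a sketch reusing the file names of the previous example):

.. code-block:: sh

    # Skip the steps whose name or category matches "boot", assuming flashing
    # and rebooting were taken care of beforehand
    bisector run --steps steps.yml --report myreport.yml.gz --skip boot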
All available step classes, along with their available ``run`` options, can
be looked up using ``bisector step-help``. Options are documented in their
CLI form, but equally apply to the steps configuration file.

.. tip:: Bisector supports executing commands in a transient systemd scope
    using the ``systemd-run`` binary, via the ``-ouse-systemd-run`` option.
    This ensures that all processes started indirectly by the command will be
    terminated/killed when the step finishes, just like for a systemd
    service. It is a good idea to enable it for long-running sessions.

More on steps options
---------------------

Steps are configured using options, which can be set either in the
``--steps`` YAML configuration file, or directly on the command line using
``-o`` (as with ``-ouse-systemd-run`` above).
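As a sketch of that mechanism, the ``use-systemd-run`` option from the tip
above could be enabled for a whole session without touching the steps file
(check ``bisector step-help`` for the exact spelling of each option):

.. code-block:: sh

    # Enable transient systemd scopes from the CLI, leaving steps.yml untouched
    bisector run --steps steps.yml --report myreport.yml.gz -ouse-systemd-run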