Man page
Description
exekall is a Python-based test runner. The expressions it executes are
discovered from Python PEP 484 parameter and return value annotations.
Options
exekall
usage: exekall [-h] [--debug] {run,merge,compare,show} ...
Test runner
PATTERNS
All patterns are fnmatch pattern, following basic shell globbing syntax.
A pattern starting with "!" is used as a negative pattern.
options:
-h, --help show this help message and exit
--debug Show complete Python backtrace when exekall crashes.
subcommands:
{run,merge,compare,show}
exekall run
usage: exekall run [-h] [--dependency DEPENDENCY] [-s ID_PATTERN] [--list]
[-n N] [--load-db LOAD_DB] [--load-type TYPE_PATTERN]
[--replay REPLAY | --load-uuid LOAD_UUID]
[--artifact-dir ARTIFACT_DIR | --artifact-root ARTIFACT_ROOT]
[--no-save-value-db] [--verbose] [--pdb]
[--log-level {debug,info,warn,error,critical}]
[--param CALLABLE_PATTERN PARAM VALUE]
[--sweep CALLABLE_PATTERN PARAM START STOP STEP]
[--share TYPE_PATTERN] [--random-order]
[--symlink-artifact-dir-to SYMLINK_ARTIFACT_DIR_TO]
[--restrict CALLABLE_PATTERN] [--forbid TYPE_PATTERN]
[--allow CALLABLE_PATTERN]
[--goal TYPE_PATTERN | --callable-goal CALLABLE_PATTERN]
[--template-scripts SCRIPT_FOLDER] [--adaptor ADAPTOR]
[--conf CONF] [--inject SERIALIZED_OBJECT_PATH]
PYTHON_MODULES [PYTHON_MODULES ...]
Run expressions
Note that the adaptor in the customization module is able to add more
parameters to ``exekall run``. In order to get the complete set of options,
please run ``exekall run YOUR_SOURCES_OR_MODULES --help``.
positional arguments:
PYTHON_MODULES Python modules files or module names. If passed a folder, all
contained files recursively are selected. By default, the current
directory is selected.
options:
-h, --help show this help message and exit
--dependency DEPENDENCY
Same as specifying a module in PYTHON_MODULES but will only be used to
build an expression if it would have been selected without that module
listed. Operators defined in modules listed here will not be used as
the root operator in any expression.
-s ID_PATTERN, --select ID_PATTERN
Only run the expressions with an ID matching any of the supplied
pattern. A pattern starting with "!" can be used to exclude IDs
matching it.
--list List the expressions that will be run without running them.
-n N Run the tests for a number of iterations.
--load-db LOAD_DB Reload a database to use some of its objects. The DB and its artifact
directory will be merged in the produced DB at the end of the
execution, to form a self-contained artifact directory.
--load-type TYPE_PATTERN
Load the (indirect) instances of the given class from the database
instead of the root objects.
--replay REPLAY Replay the execution of the given UUID, loading as much prerequisite
from the DB as possible. This implies --pdb for convenience.
--load-uuid LOAD_UUID
Load the given UUID from the database.
--artifact-dir ARTIFACT_DIR
Folder in which the artifacts will be stored. Defaults to
EXEKALL_ARTIFACT_DIR env var.
--artifact-root ARTIFACT_ROOT
Root folder under which the artifact folders will be created. Defaults
to EXEKALL_ARTIFACT_ROOT env var.
--conf CONF LISA configuration file. If multiple configurations of a given type
are found, they are merged (last one can override keys in previous
ones). Only load trusted files as it can lead to arbitrary code
execution.
--inject SERIALIZED_OBJECT_PATH
Serialized object to inject when building expressions
advanced arguments:
Options not needed for every-day use
--no-save-value-db Do not create a VALUE_DB.pickle.xz file in the artifact folder. This
avoids a costly serialization of the results, but prevents partial re-
execution of expressions.
--verbose, -v More verbose output. Can be repeated for even more verbosity. This
only impacts exekall output, --log-level for more global settings.
--pdb If an exception occurs in the code ran by ``exekall``, drops into a
debugger shell.
--log-level {debug,info,warn,error,critical}
Change the default log level of the standard logging module.
--param CALLABLE_PATTERN PARAM VALUE
Set a function parameter. It needs three fields:
* pattern matching qualified name of the callable
* name of the parameter
* value
--sweep CALLABLE_PATTERN PARAM START STOP STEP
Parametric sweep on a function parameter. It needs five fields:
* pattern matching qualified name of the callable
* name of the parameter
* start value
* stop value
* step size.
--share TYPE_PATTERN Class name pattern to share between multiple iterations.
--random-order Run the expressions in a random order, instead of sorting by name.
--symlink-artifact-dir-to SYMLINK_ARTIFACT_DIR_TO
Create a symlink pointing at the artifact dir.
--restrict CALLABLE_PATTERN
Callable names patterns. Types produced by these callables will only
be produced by these (other callables will be excluded).
--forbid TYPE_PATTERN
Fully qualified type names patterns. Callable returning these types or
any subclass will not be called.
--allow CALLABLE_PATTERN
Allow using callable with a fully qualified name matching these
patterns, even if they have been not selected for various reasons.
--goal TYPE_PATTERN Compute expressions leading to an instance of a class with name
matching this pattern (or a subclass of it).
--callable-goal CALLABLE_PATTERN
Compute expressions ending with a callable which name is matching this
pattern.
--template-scripts SCRIPT_FOLDER
Only create the template scripts of the expressions without running
them.
--adaptor ADAPTOR Adaptor to use from the customization module, if there is more than
one to choose from.
exekall compare
usage: exekall compare [-h] db db
Compare two DBs produced by exekall run.
Note that the adaptor in the customization module recorded in the database
is able to add more parameters to ``exekall compare``. In order to get the
complete set of options, please run ``exekall compare DB1 DB2 --help``.
Options part of a custom group will need to be passed after positional
arguments.
positional arguments:
db DBs created using exekall run to compare.
options:
-h, --help show this help message and exit
exekall show
usage: exekall show [-h] db
Show the content of a ValueDB created by exekall ``run``
Note that the adaptor in the customization module recorded in the database
is able to add more parameters to ``exekall show``. In order to get the
complete set of options, please run ``exekall show DB --help``.
Options part of a custom group will need to be passed after positional
arguments.
positional arguments:
db DB created using exekall run to show.
options:
-h, --help show this help message and exit
exekall merge
usage: exekall merge [-h] -o OUTPUT [--copy] artifact_dirs [artifact_dirs ...]
Merge artifact directories of "exekall run" executions.
By default, it will use hardlinks instead of copies to improve speed and
avoid eating up large amount of space, but that means that artifact
directories should be treated as read-only.
positional arguments:
artifact_dirs Artifact directories created using "exekall run", or value databases
to merge.
options:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Output merged artifacts directory or value database. If the
output already exists, the merged DB will only contain the same roots
as this one. This allows patching-up a pruned DB with other DBs that
contains subexpression's values.
--copy Force copying files, instead of using hardlinks.
Executing expressions
Expressions are built by scanning the Python source code passed to exekall run.
Selecting which expressions to execute using exekall run can be achieved in
several ways:
* --select/-s with a pattern matching an expression ID. A pattern prefixed with ! can be used to exclude some expressions.
* Pointing exekall run at a subset of Python source files, or at module names. Only files (directly or indirectly) imported from these Python modules will be scanned for callables.
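The pattern-based selection can be sketched with the standard fnmatch module. This is only an illustration, not exekall's own code: the helper name select_ids is hypothetical, and the combination rule (an ID is kept if it matches any positive pattern, or if no positive pattern is given, and matches no "!" pattern) is an assumption.

```python
import fnmatch

def select_ids(ids, patterns):
    """Filter expression IDs with shell-style patterns; "!" marks an exclusion."""
    positive = [p for p in patterns if not p.startswith('!')]
    negative = [p[1:] for p in patterns if p.startswith('!')]

    def keep(id_):
        # Assumption: with no positive pattern, everything is a candidate.
        wanted = (
            any(fnmatch.fnmatchcase(id_, p) for p in positive)
            if positive else True
        )
        excluded = any(fnmatch.fnmatchcase(id_, p) for p in negative)
        return wanted and not excluded

    return [id_ for id_ in ids if keep(id_)]

ids = [
    'Conf:Stage1:process1',
    'Conf:Stage1:process2(conf=Conf)',
    'Conf:Stage1:process_method',
]
selected = select_ids(ids, ['*Stage1*', '!*process2*'])
```

Here selected keeps the first and last IDs and drops the one matching the negative pattern.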
Once the expressions are selected, multiple iterations of them can be executed
using -n. --share TYPE_PATTERN can be used to share part of the expression
graph between all iterations, to avoid re-executing some parts of the
expression. Be aware that all parameters of what is shared will also be shared
implicitly to keep expressions consistent.
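The effect of sharing a value between iterations can be pictured with a plain Python loop. This is a conceptual sketch only: make_conf, make_stage and run are hypothetical names, and the real engine works on expression graphs rather than on a hard-coded pair of functions.

```python
calls = []

def make_conf():
    calls.append('conf')
    return {'x': 42}

def make_stage(conf):
    calls.append('stage')
    return conf['x'] == 42

def run(iterations, share_conf):
    """Run the two-step pipeline several times, optionally sharing the conf."""
    shared = make_conf() if share_conf else None
    results = []
    for _ in range(iterations):
        # A shared value is computed once and reused by every iteration.
        conf = shared if share_conf else make_conf()
        results.append(make_stage(conf))
    return results

results = run(3, share_conf=True)
```

With sharing enabled, make_conf is called once for the three iterations; without it, it would be called three times.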
The adaptor found in the customization module of the Python sources you are
using can add extra options to exekall run, which are shown in --help
only when these sources are specified as well.
Expression engine
At the core of exekall is the expression engine. It is in charge of
building sensible sequences of calls out of Python-level annotations (see PEP
484), and then executing them. An expression is a graph where each node has
named parameters that point to other nodes.
Expression ID
Each expression has an associated ID that is derived from its structure. The rules are:
* The ID of the first parameter of a given node is prepended to the ID of the node, separated with :. The code f(g()) has the ID g:f.
* The ID of the node is composed of the name of the operator of that node (name of a Python callable), followed by a parenthesis-enclosed list of parameter IDs, excluding the first parameter. The code f(p1=g(), p2=h(k())) has the ID g:f(p2=k:h).
* Expression values can have named tags attached to them. When displaying the ID of such a value, the tag is inserted right after the operator name, inside brackets. The value returned by g tagged with a tag named mytag with value 42 would give: g[mytag=42]:f(p2=k:h). Note that tags are only relevant when using expression values, since the tags are attached to values, not operators.
The first rule allows seamless composition of simple pipeline stages and is
especially suited to object-oriented programming, since the first parameter of
methods will be self.
Tags can be used to attach some important metadata to the return value of an operator, so it can be easily distinguished when taken out of context.
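The ID rules can be reproduced with a short recursive function. This is a minimal sketch, not the engine's implementation: the Node class and expr_id function are hypothetical, and the comma used to separate several tags or several parameters is an assumption, since the rules only show a single one of each.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                   # operator (callable) name
    params: dict = field(default_factory=dict)  # parameter name -> Node, in order
    tags: dict = field(default_factory=dict)    # tags attached to the value

def expr_id(node):
    tags = ','.join(f'{key}={val}' for key, val in node.tags.items())
    own = node.name + (f'[{tags}]' if tags else '')
    params = list(node.params.items())
    if not params:
        return own
    (_, first), rest = params[0], params[1:]
    if rest:
        # Parameters other than the first are listed by name in parentheses.
        own += '(' + ','.join(f'{name}={expr_id(p)}' for name, p in rest) + ')'
    # The ID of the first parameter is prepended, separated with ":".
    return f'{expr_id(first)}:{own}'

simple = Node('f', {'p1': Node('g')})
nested = Node('f', {'p1': Node('g'), 'p2': Node('h', {'p1': Node('k')})})
tagged = Node('f', {
    'p1': Node('g', tags={'mytag': 42}),
    'p2': Node('h', {'p1': Node('k')}),
})
```

With these definitions, expr_id gives g:f, g:f(p2=k:h) and g[mytag=42]:f(p2=k:h) respectively, matching the three examples in the rules.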
Execution
Executing an expression means evaluating each node if it has not already been evaluated. If an operator is not reusable, it will always be called when a value is requested from it, even if some existing values computed with the same parameters exist. By default, all operators are reusable, but some types can be flagged as non-reusable by the customization module (see Customizing exekall).
Operators are allowed to be generator functions as well. In that case, the engine will iterate over the generator, and will execute the downstream expressions for each value it provides. Multiple generator functions can be chained, leading to a cascade of values for the same expression.
Once an expression has been executed, each of its values gets a UUID that can be used to uniquely refer to it and to track where it was used in the logs.
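The cascade produced by chained generator operators can be illustrated with plain Python. This sketch only mimics the behavior described above: values_of, confs, variants and check are hypothetical names, and the real engine additionally handles value reuse and UUID assignment.

```python
import inspect

def values_of(operator, *args):
    """Return every value an operator provides for the given arguments."""
    result = operator(*args)
    # A generator function provides several values, a plain function only one.
    if inspect.isgenerator(result):
        return list(result)
    return [result]

def confs():
    yield 1
    yield 2

def variants(conf):
    yield conf * 10
    yield conf * 10 + 1

def check(value):
    return value % 2 == 0

# The downstream operator runs once per value of each upstream generator.
results = [
    (value, check(value))
    for conf in values_of(confs)
    for value in values_of(variants, conf)
]
```

Two chained generators yielding two values each lead to four executions of the downstream check operator.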
Exploiting artifacts
exekall run produces an artifact folder. The location can be set using
--artifact-dir and other options.
Folder hierarchy
The artifact folder contains the following files:
* INFO.log and DEBUG.log contain logs for info and debug levels of the logging standard module. Note that standard output is not included in this log, as it does not go through the logging module.
* VALUE_DB.pickle.xz contains a serialized object graph for each expression that was executed. The value of each subexpression is included if the object was serializable.
* BY_UUID contains symlinks named after UUIDs, and pointing to a relevant subfolder in the artifacts. That allows quick lookup of the artifacts of a given expression if one has its UUID.
* A folder for each expression.
* Optionally, an ORIGIN folder if the artifact folder is the result of exekall merge, or exekall run --load-db. It contains the hierarchy of each original artifact folder by using folders and symlinks pointing inside the artifact folder.
Inside each expression’s folder, there is a folder with the UUID of the expression itself. Having that level allows merging artifact folders together and avoids conflict in case two different expressions share the same ID.
Inside that folder, the following files can be found:
* STRUCTURE, which contains the structure of the expression. Each operator is described by its callable name, its return type, and its parameters. Parameters are recursively defined the same way. An .svg or .dot (graphviz) variant may exist as well.
* EXPRESSION.py and TEMPLATE_EXPRESSION.py files are executable Python scripts that are equivalent to what was executed by exekall run. The template one is created before execution and contains some placeholders for the sparks. The other one is updated after execution to add commented code that reloads any given value from the database. That gives the user the option to not re-execute some part of the code, but load a serialized value instead.
* Artifact folders allocated by some operators.
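Looking up the artifacts of a value from its UUID only requires following the matching BY_UUID symlink. The helper below is a small sketch based on the folder layout described above; find_by_uuid is a hypothetical name, not part of exekall.

```python
from pathlib import Path

def find_by_uuid(artifact_dir, uuid):
    """Return the real folder that the BY_UUID/<uuid> symlink points to."""
    link = Path(artifact_dir) / 'BY_UUID' / uuid
    if not link.is_symlink():
        raise FileNotFoundError(f'No BY_UUID entry for {uuid}')
    # resolve() follows the symlink into the expression's folder.
    return link.resolve()
```

It would be called as find_by_uuid(artifact_dir, uuid), with the artifact folder of an exekall run execution and a UUID taken from the logs.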
exekall compare
VALUE_DB.pickle.xz files can be compared using exekall compare. This will call
the comparison method of the adaptor that was used when exekall run was
executed. That function is expected to compare the expression values found in
the databases, by matching values that have the same ID in both databases.
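The matching step can be pictured with plain dictionaries mapping expression IDs to values. This is a conceptual sketch only: match_by_id and changed_ids are hypothetical helpers, and the actual comparison is implemented by the adaptor on real value databases.

```python
def match_by_id(old_values, new_values):
    """Pair up values that have the same expression ID in both mappings."""
    common = sorted(old_values.keys() & new_values.keys())
    return [(id_, old_values[id_], new_values[id_]) for id_ in common]

def changed_ids(old_values, new_values):
    """List the IDs whose value differs between the two mappings."""
    return [
        id_
        for id_, old, new in match_by_id(old_values, new_values)
        if old != new
    ]

old = {'Conf:Stage1:process1': True, 'Conf:Stage1:process2(conf=Conf)': False}
new = {'Conf:Stage1:process1': False, 'Conf:Stage1:process_method': True}
```

IDs that are present in only one of the two mappings are ignored by this sketch; an adaptor may choose to report them instead.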
Adding new expressions
Since exekall run will discover expressions based on type annotations of
callable parameters and return value, all that is needed to extend an existing
package is to write new callables with such annotations. It is possible to use
a base class in an annotation, in which case the engine will be free to pick
all the subclasses it can, and produce an expression with each. A dummy example
would be:
import abc

class BaseConf(abc.ABC):
    @abc.abstractmethod
    def get_conf(self):
        pass

class Conf(BaseConf):
    # By default, callables with an empty parameter list are ignored. They
    # can be explicitly used with "exekall run --allow '*Conf'"
    def __init__(self):
        self.x = 42

    def get_conf(self):
        return self.x

class Stage1:
    # exekall recognizes classes as a special case: the parameter annotations
    # are taken from __init__ and the return type is the class
    def __init__(self, conf:BaseConf):
        print("building stage1")
        self.conf = conf

    # The first parameter of methods is automatically annotated with the
    # right class.
    # Forward references are possible by using a string to annotate.
    def process_method(self) -> 'Stage2':
        return Stage2(self.conf.x == 42)

class Stage2:
    def __init__(self, passed):
        self.passed = passed

def process1(x:Stage1) -> Stage2:
    return Stage2(x.conf.x == 42)

def process2(x:Stage1, conf:BaseConf, has_default_val=33) -> Stage2:
    return Stage2(x.conf.x == 0)
From that, exekall run --allow '*Conf' --goal '*Stage2' would infer the
expressions Conf:Stage1:process_method, Conf:Stage1:process1 and
Conf:Stage1:process2(conf=Conf). The common subexpression Conf:Stage1 would be
shared between these expressions by default.
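How annotations can drive this kind of discovery may be sketched with the standard inspect and typing modules. This is an illustration under simplifying assumptions, not exekall's algorithm: describe and producers_of are hypothetical helpers, and the classes are cut-down versions of the ones in the example.

```python
import inspect
import typing

class BaseConf:
    pass

class Conf(BaseConf):
    def __init__(self):
        self.x = 42

class Stage1:
    def __init__(self, conf: BaseConf):
        self.conf = conf

class Stage2:
    def __init__(self, passed: bool):
        self.passed = passed

def process1(x: Stage1) -> Stage2:
    return Stage2(x.conf.x == 42)

def describe(op):
    """Return (parameter annotations, produced type) for a callable."""
    if inspect.isclass(op):
        # Classes: parameters come from __init__, the result is the class.
        hints = typing.get_type_hints(op.__init__)
        hints.pop('return', None)
        return hints, op
    hints = typing.get_type_hints(op)
    return_type = hints.pop('return', None)
    return hints, return_type

def producers_of(wanted, ops):
    """List the callables whose produced type is `wanted` or a subclass."""
    produced = [(op, describe(op)[1]) for op in ops]
    return [op for op, typ in produced if typ is not None and issubclass(typ, wanted)]

ops = [Conf, Stage1, process1]
```

Here producers_of(BaseConf, ops) finds Conf through the subclass relationship, which is how a base class annotation can be satisfied by any of its subclasses.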
Callables are assumed not to be polymorphic in their return value, as the methods that will be called on the resulting value are decided ahead of time. A limited form of polymorphism akin to Rust’s Generic Associated Types (GATs) or Haskell’s associated type families is supported:
import typing

class Base:
    ASSOCIATED_CLS = typing.TypeVar('ASSOCIATED_CLS')

    # Methods are allowed to use this polymorphic type as a return type, as
    # long as all subclasses override the ASSOCIATED_CLS class attribute.
    def foo(self) -> 'Base.ASSOCIATED_CLS':
        return self.X

class Derived1(Base):
    X = 1
    ASSOCIATED_CLS = type(X)

class Derived2(Base):
    X = 'hello'
    ASSOCIATED_CLS = type(X)
If a parameter has a default value, its annotation can be omitted. If a
parameter has both a default value and an annotation, exekall will try to
provide a value for it, or use the default value if no subexpression has the
right type.
When an expression is not detected correctly, --verbose/-v can be used and
repeated twice to get more information on which callables are being ignored
and why. The most common issues are:
* Partial annotations: all parameters and return values need to be either annotated or have a default value.
* Abstract Base Classes (see abc.ABC) with missing implementation of some attributes.
* Cycles in the expression graphs. Considering types as pipeline stages helps avoid cycles in expression graphs when designing a module. Not all classes need to be considered as such, only the ones that will be used in annotations.
* Missing “spark”, i.e. an operator that can provide values without any parameter. The adaptor in the customization module usually takes care of doing that based on domain-specific command line options, but some ignored callables may be forcefully selected using --allow if needed.
* Missing import chain from the sources given to exekall run to the module that defines the callable that is expected to be used. That can be solved by adding more import statements, or simply giving that source file directly to exekall run.
* Wrong goal selected using --goal.
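The first issue in the list, partial annotations, can be checked ahead of time with the standard inspect module. This is a rough sketch for plain functions only: missing_annotations is a hypothetical helper, and it does not account for the automatic annotation of the first parameter of methods.

```python
import inspect

def missing_annotations(func):
    """List parameters with neither annotation nor default, plus 'return'."""
    sig = inspect.signature(func)
    missing = [
        name
        for name, param in sig.parameters.items()
        if param.annotation is inspect.Parameter.empty
        and param.default is inspect.Parameter.empty
    ]
    if sig.return_annotation is inspect.Signature.empty:
        missing.append('return')
    return missing

def fully_annotated(x: int, y=3) -> str:
    return str(x + y)

def partially_annotated(x, y: int):
    return x + y
```

An empty list means the function is fully described, while partially_annotated is reported as lacking information for x and for its return value.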
Customizing exekall
The behavior of exekall can be customized by subclassing
exekall.customization.AdaptorBase in a module that must be called
exekall_customization.py and located in one of the parent packages of the
modules that are explicitly passed to exekall run. This allows adding
extra options to exekall run and compare, tagging values in IDs, changing the
set of callables that will be hidden from the ID and defining what type is
considered to provide reusable values by the engine, among other things.