Statistical comparisons

lisa.stats.series_mean_stats(series, kind, confidence_level=0.95)[source]

Compute the mean along with a confidence interval based on the T-score.

Returns:

A tuple with:

  1. The mean

  2. The standard deviation, or its equivalent

  3. The standard error of the mean, or its equivalent (Harmonic Standard Error, Geometric Standard Error).

  4. The interval, as a 2-tuple of +/- values

Parameters:
  • kind (str) – Kind of mean to use: 'arithmetic', 'harmonic' or 'geometric'.

  • confidence_level (float) – Confidence level of the confidence interval.
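The arithmetic case can be sketched as follows. This is a minimal illustration of the returned tuple, not the actual implementation; a large-sample normal quantile stands in for the T-score here:

```python
import statistics

def mean_with_ci(values, confidence_level=0.95):
    # Sketch of the tuple returned for kind='arithmetic'. A normal
    # quantile approximates the T-score used by the real function.
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    sem = std / len(values) ** 0.5
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return (mean, std, sem, (z * sem, z * sem))

mean, std, sem, ci = mean_with_ci([42, 43, 41, 44, 42])
```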

lisa.stats.guess_mean_kind(unit, control_var)[source]

Guess which kind of mean should be used to summarize results in the given unit.

Returns:

'arithmetic' if an arithmetic mean should be used, 'harmonic' otherwise. Use of the geometric mean cannot be inferred by this function.

Parameters:
  • unit (str) – Unit of the values, e.g. 'km/h'.

  • control_var (str) – Control variable, i.e. variable that is fixed during the experiment. For example, in a car speed experiment, the control variable could be the distance (fixed distance), or the time. In that case, we would have unit='km/h' and control_var='h' if the time was fixed, or control_var='km' if the distance was fixed.
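The car speed example can be made concrete with the statistics stdlib module (illustrative numbers):

```python
import statistics

# Two laps of a fixed 10 km course, driven at 50 km/h and 100 km/h.
# The distance is fixed (unit='km/h', control_var='km'), so the
# harmonic mean gives the true average speed: 20 km in 0.3 h.
speeds = [50, 100]
harmonic = statistics.harmonic_mean(speeds)  # ~66.67 km/h
arithmetic = statistics.mean(speeds)         # 75 km/h, overestimates
```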

class lisa.stats.Stats(df, value_col='value', ref_group=None, filter_rows=None, compare=True, agg_cols=None, mean_ci_confidence=None, stats=None, stat_col='stat', unit_col='unit', ci_cols=('ci_minus', 'ci_plus'), control_var_col='fixed', mean_kind_col='mean_kind', non_normalizable_units={'pval'})[source]

Bases: Loggable

Compute the statistics on an input pandas.DataFrame in “database” format.

Parameters:
  • df (pandas.DataFrame) –

    Dataframe in database format, i.e. meaningless index, and values in a given column with the other columns used as tags.

    Note

    Redundant tag columns (aka that are equal) will be removed from the dataframe.

  • value_col (str) – Name of the column containing the values.

  • ref_group (dict(str, object)) –

    Reference group used to compare the other groups against. Its format is dict(tag_column_name, tag_value). The comparison will be made on subgroups built out of all the other tag columns, with the reference subgroups being the ones matching that dictionary. If the tag value is None, the key will only be used for grouping in graphs. Comparison will add the following statistics:

    • A 2-sample Kolmogorov-Smirnov test 'ks2samp_test' column. This test is non-parametric and checks for a difference in distributions. The only assumption is that the distribution is continuous, which should suit almost all use cases.

    • Most statistics will be normalized against the reference group as a difference percentage, except for a few non-normalizable values.

    Note

    The group referenced must exist, otherwise unexpected behaviours might occur.

  • filter_rows (dict(object, object) or None) – Filter the given pandas.DataFrame with a dict of {column: value} that rows have to match to be selected.

  • compare (bool) – If True, normalize most statistics as a percentage of change compared to ref_group.

  • agg_cols (list(str)) –

    Columns to aggregate on. In a sense, the given columns will be treated like a compound iteration number. Defaults to:

    • iteration column if available, otherwise

    • All the tag columns that are neither the value nor part of the ref_group.

  • mean_ci_confidence (float) – Confidence level used to establish the mean confidence interval, between 0 and 1.

  • stats (dict(str, str or collections.abc.Callable)) –

    Dictionary of statistical functions to summarize each value group formed by tag columns along the aggregation columns. If None is given as a value, the name will be passed to pandas.core.groupby.SeriesGroupBy.agg(). Otherwise, the provided function will be run.

    Note

    One set of keys is special: 'mean', 'std' and 'sem'. When the value None is used, a custom function is used instead of the one from pandas, which will compute other related statistics and provide a confidence interval. An attempt will be made to guess the most appropriate kind of mean, based on the mean_kind_col, unit_col and control_var_col:

    • The mean itself, as:

      • 'mean' (arithmetic)

      • 'hmean' (harmonic)

      • 'gmean' (geometric)

    • The Standard Error of the Mean (SEM):

      • 'sem' (arithmetic)

      • 'hse' (harmonic)

      • 'gse' (geometric)

    • The standard deviation:

      • 'std' (arithmetic)

      • 'hsd' (harmonic)

      • 'gsd' (geometric)

  • stat_col (str) – Name of the column used to hold the name of the statistics that are computed.

  • unit_col (str) – Name of the column holding the unit of each value (as a string).

  • ci_cols (tuple(str, str)) – Name of the two columns holding the confidence interval for each computed statistic.

  • control_var_col (str) – Name of the column holding the control variable name in the experiment leading to the given value.

    See also

    guess_mean_kind()

  • mean_kind_col (str) –

    Type of mean to be used to summarize this value.

    Note

    Unless the geometric mean is used, unit_col and control_var_col should be used instead, to make things more obvious and reduce the risk of confusion.

  • non_normalizable_units (list(str)) – List of units that cannot be normalized against the reference group.

Examples:

import pandas as pd

# The index is meaningless, all that matters is to uniquely identify
# each row using a set of tag columns, such as 'board', 'kernel',
# 'iteration', ...
df = pd.DataFrame.from_records(
    [
        ('juno', 'kernel1', 'bench1', 'score1', 1, 42, 'frame/s', 's'),
        ('juno', 'kernel1', 'bench1', 'score1', 2, 43, 'frame/s', 's'),
        ('juno', 'kernel1', 'bench1', 'score2', 1, 420, 'frame/s', 's'),
        ('juno', 'kernel1', 'bench1', 'score2', 2, 421, 'frame/s', 's'),
        ('juno', 'kernel1', 'bench2', 'score',  1, 54, 'foobar', ''),
        ('juno', 'kernel2', 'bench1', 'score1', 1, 420, 'frame/s', 's'),
        ('juno', 'kernel2', 'bench1', 'score1', 2, 421, 'frame/s', 's'),
        ('juno', 'kernel2', 'bench1', 'score2', 1, 4200, 'frame/s', 's'),
        ('juno', 'kernel2', 'bench1', 'score2', 2, 4201, 'frame/s', 's'),
        ('juno', 'kernel2', 'bench2', 'score',  1, 540, 'foobar', ''),

        ('hikey','kernel1', 'bench1', 'score1', 1, 42, 'frame/s', 's'),
        ('hikey','kernel1', 'bench1', 'score2', 1, 420, 'frame/s', 's'),
        ('hikey','kernel1', 'bench2', 'score',  1, 54, 'foobar', ''),
        ('hikey','kernel2', 'bench1', 'score1', 1, 420, 'frame/s', 's'),
        ('hikey','kernel2', 'bench1', 'score2', 1, 4200, 'frame/s', 's'),
        ('hikey','kernel2', 'bench2', 'score',  1, 540, 'foobar', ''),
    ],
    columns=['board', 'kernel', 'benchmark', 'metric', 'iteration', 'value', 'unit', 'fixed'],
)


# Get a DataFrame with all the default statistics.
Stats(df).df

# Using a ref_group will also compare the other groups against it
Stats(df, ref_group={'board': 'juno', 'kernel': 'kernel1'}).df
property df

pandas.DataFrame containing the statistics.

See also

get_df() for more controls.

get_df(remove_ref=None, compare=None)[source]

Returns a pandas.DataFrame containing the statistics.

Parameters:
  • compare (bool or None) – See Stats compare parameter. If None, it will default to the value provided to Stats.

  • remove_ref (bool or None) – If True, the rows of the reference group described by ref_group for this object will be removed from the returned dataframe. If None, it will default to compare.

plot_stats(filename=None, remove_ref=None, backend=None, groups_as_row=False, kind=None, **kwargs)[source]

Returns a matplotlib.figure.Figure containing the statistics for the class input pandas.DataFrame.

Parameters:
  • filename (str or None) – Path to the image file to write to.

  • remove_ref (bool or None) – If True, do not plot the reference group. See get_df().

  • backend (str or None) – Holoviews backend to use: bokeh or matplotlib. If None, the current holoviews backend selected with hv.extension() will be used.

  • groups_as_row (bool) – By default, subgroups are used as rows in the subplot matrix, so that the values shown on a given graph can be expected to be in the same order of magnitude. However, when there are many subgroups, this can lead to a very large and hard-to-navigate plot matrix. In that case, using the groups for the rows might help a great deal.

  • kind (str or None) –

    Type of plot. Can be any of:

    • horizontal_bar

    • vertical_bar

    • None

Variable keyword arguments:

Forwarded to get_df().

plot_histogram(cumulative=False, bins=50, nbins=None, density=False, **kwargs)[source]

Returns a matplotlib.figure.Figure with histogram of the values in the input pandas.DataFrame.

Parameters:
  • cumulative (bool) – Cumulative plot (CDF).

  • bins (int or None) – Number of bins for the distribution.

  • filename (str or None) – Path to the image file to write to.

plot_values(**kwargs)[source]

Returns a holoviews element with the values in the input pandas.DataFrame.

Parameters:

filename (str or None) – Path to the image file to write to.

Workload Automation

exception lisa.wa.WAOutputNotFoundError(collectors)[source]

Bases: Exception

classmethod from_collector(collector, excep)[source]
classmethod from_excep_list(exceps)[source]
class lisa.wa.StatsProp[source]

Bases: object

Provides a stats property.

get_stats(ensure_default_groups=True, ref_group=None, agg_cols=None, **kwargs)[source]

Returns a lisa.stats.Stats loaded with the result pandas.DataFrame.

Parameters:
  • ensure_default_groups (bool) – If True, ensure ref_group will contain appropriate keys for usual Workload Automation result display.

  • ref_group – Forwarded to lisa.stats.Stats

Variable keyword arguments:

Forwarded to lisa.stats.Stats

property stats

Short-hand property equivalent to self.get_stats()

See also

get_stats()

class lisa.wa.WAOutput(path, kernel_path=None)[source]

Bases: StatsProp, Mapping, Loggable

Recursively parse a Workload Automation output, using registered collectors (leaf subclasses of WACollectorBase). The data collected are accessible through a pandas.DataFrame in “database” format:

  • meaningless index

  • all values are tagged using tag columns

Parameters:
  • path (str) – Path containing a Workload Automation output.

  • kernel_path (str) – Kernel source path. Used to resolve the name of the kernel which ran the workload.

Example:

wa_output = WAOutput('wa/output/path')
# Pick a specific collector. See also WAOutput.get_collector()
stats = wa_output['results'].stats
stats.plot_stats(filename='stats.html')
__hash__()[source]

Each instance is different, like regular objects, and unlike dictionaries.

property df

DataFrame containing the data collected by all the registered WAOutput collectors.

get_collector(name, **kwargs)[source]

Returns a new collector with custom parameters passed to it.

Parameters:

name (str) – Name of the collector.

Variable keyword arguments:

Forwarded to the collector’s constructor.

Example:

WAOutput('wa/output/path').get_collector('energy', postprocess=func)
property jobs

List containing all the jobs present in the output of ‘wa run’.

property outputs

Dict containing a mapping of ‘wa run’ names to RunOutput objects.

class lisa.wa.WACollectorBase(wa_output, df_postprocess=None)[source]

Bases: StatsProp, Loggable, ABC

Base class for all Workload Automation dataframe collectors.

Parameters:

See also

Instances of these classes are typically built using WAOutput.get_collector() rather than directly.

property df

pandas.DataFrame containing the data collected.

class lisa.wa.WAResultsCollector(wa_output, df_postprocess=None)[source]

Bases: WACollectorBase

Collector for the Workload Automation test results.

NAME = 'results'
class lisa.wa.WAArtifactCollectorBase(wa_output, df_postprocess=None)[source]

Bases: WACollectorBase

Workload Automation artifact collector base class.

class lisa.wa.WAEnergyCollector(wa_output, df_postprocess=None)[source]

Bases: WAArtifactCollectorBase

WA collector for the energy_measurement augmentation.

Example:

def postprocess(df):
    df = df.pivot_table(values='value', columns='metric', index=['sample', 'iteration', 'workload'])

    df = pd.DataFrame({
        'CPU_power': (
            df['A55_power'] +
            df['A76_1_power'] +
            df['A76_2_power']
        ),
    })
    df['unit'] = 'Watt'
    df = df.reset_index()
    df = df.melt(id_vars=['sample', 'iteration', 'workload', 'unit'], var_name='metric')
    return df

WAOutput('wa/output/path').get_collector(
    'energy',
    df_postprocess=postprocess,
).df
NAME = 'energy'
get_stats(**kwargs)[source]

Returns a lisa.stats.Stats loaded with the result pandas.DataFrame.

Parameters:
  • ensure_default_groups (bool) – If True, ensure ref_group will contain appropriate keys for usual Workload Automation result display.

  • ref_group – Forwarded to lisa.stats.Stats

Variable keyword arguments:

Forwarded to lisa.stats.Stats

class lisa.wa.WATraceCollector(wa_output, trace_to_df=<function _stub_trace_to_df>, **kwargs)[source]

Bases: WAArtifactCollectorBase

WA collector for the trace augmentation.

Parameters:

trace_to_df (collections.abc.Callable) – Function used by the collector to convert the lisa.trace.Trace to a pandas.DataFrame.

Variable keyword arguments:

Forwarded to lisa.trace.Trace.

Example:

def trace_idle_analysis(trace):
    cpu = 0
    df = trace.ana.idle.df_cluster_idle_state_residency([cpu])
    df = df.reset_index()
    df['cpu'] = cpu

    # Melt the column 'time' into lines, so that the dataframe is in
    # "database" format: each value is uniquely identified by "tag"
    # columns
    return df.melt(
        var_name='metric',
        value_vars=['time'],
        id_vars=['idle_state'],
     )

WAOutput('wa/output/path').get_collector(
    'trace',
    trace_to_df=trace_idle_analysis,
).df
NAME = 'trace'
property traces

lisa.utils.LazyMapping that maps job names & iteration numbers to their corresponding lisa.trace.Trace.

class lisa.wa.WAJankbenchCollector(wa_output, df_postprocess=None)[source]

Bases: WAArtifactCollectorBase

WA collector for the jankbench frame timings.

The collector framework will return a single pandas.DataFrame with the results from every Jankbench job in lisa.stats.Stats format (i.e. the returned dataframe is arranged such that each reported metric appears as a separate row). The metrics reported are:

  • total_duration: Time in milliseconds to complete the frame.

  • jank_frame: Boolean indicator of a missed frame deadline. 1 is a jank frame, 0 is not.

  • name: Subtest name, provided by the Jankbench app.

  • frame_id: Monotonically increasing frame number, starting from 1 for each subtest iteration.

An example plotter matching the old-style output can be found in the jupyter notebook working directory at ipynb/wltests/WAOutput-JankbenchDemo.ipynb

If you have existing code expecting a more direct translation of the original sqlite database format, you can massage the collected dataframe back into a closer resemblance to the original source database with this sequence of pandas operations:

wa_output = WAOutput('wa/output/path')
df = wa_output['jankbench'].df
db_df = df.pivot(index=['iteration', 'id', 'kernel', 'frame_id'], columns=['variable'])
db_df = db_df['value'].reset_index()
db_df.columns.name = None
# db_df now looks more like the original format
NAME = 'jankbench'
get_stats(**kwargs)[source]

Returns a lisa.stats.Stats loaded with the result pandas.DataFrame.

Parameters:
  • ensure_default_groups (bool) – If True, ensure ref_group will contain appropriate keys for usual Workload Automation result display.

  • ref_group – Forwarded to lisa.stats.Stats

Variable keyword arguments:

Forwarded to lisa.stats.Stats

class lisa.wa.WASysfsExtractorCollector(wa_output, path, type='diff', **kwargs)[source]

Bases: WAArtifactCollectorBase

WA collector for the sysfs-extractor augmentation.

Example:

def pixel6_energy_meter(df):
    # Keep only CPU's meters
    df = df[df.value.str.contains('S4M_VDD_CPUCL0|S3M_VDD_CPUCL1|S2M_VDD_CPUCL2')]
    df[['variable', 'value']] = df.value.str.split(', ', expand=True)

    def _clean_variable(variable):
        if 'S4M_VDD_CPUCL0' in variable:
            return 'little-energy'
        if 'S3M_VDD_CPUCL1' in variable:
            return 'mid-energy'
        if 'S2M_VDD_CPUCL2' in variable:
            return 'big-energy'
        return ''

    df['variable'] = df['variable'].apply(_clean_variable)
    df['value'] = df['value'].astype(int)
    df['unit'] = "bogo-ujoules"

    # Add a total energy variable
    df = pd.concat([
        df,
        pd.DataFrame(data={
            'variable': 'total-energy',
            'value': [df['value'].sum()]
        })
    ])
    df.ffill(inplace=True)

    return df

df = WAOutput('.').get_collector(
        'sysfs-extractor',
        path='/sys/bus/iio/devices/iio:device0/energy_value',
        df_postprocess=pixel6_energy_meter
).df
NAME = 'sysfs-extractor'
class lisa.wa_results_collector.WaResultsCollector(base_dir=None, wa_dirs='.*', plat_info=None, kernel_repo_path=None, parse_traces=True, use_cached_trace_metrics=True, display_charts=True)[source]

Bases: Loggable

Collects, analyses and visualises results from multiple WA3 directories

Takes a list of output directories from Workload Automation 3 and parses them. Finds metrics reported by WA itself, and extends those metrics with extra detail extracted from ftrace files, energy instrumentation output, and workload-specific artifacts that are found in the output.

Results can be grouped according to the following terms:

  • ‘metric’ is a specific measurable quantity such as a single frame’s rendering time or the average energy consumed during a workload run.

  • ‘workload’ is the general name of a workload such as ‘jankbench’ or ‘youtube’.

  • ‘test’ is a more specific identification for a workload - for example this might identify one of Jankbench’s sub-benchmarks, or specifically playing a certain video on YouTube for 30s.

    WaResultsCollector ultimately derives ‘test’ names from the ‘classifiers’::’test’ field of the WA3 agenda file’s ‘workloads’ entries.

  • ‘tag’ is an identifier for a set of run-time target configurations that the target was run under. For example there might exist one ‘tag’ identifying running under the schedutil governor and another for the performance governor.

    WaResultsCollector ultimately derives ‘tag’ names from the ‘classifiers’ field of the WA3 agenda file’s ‘sections’ entries.

  • ‘kernel’ identifies the kernel that was running when the metric was collected. This may be a SHA1 or a symbolic ref (branch/tag) derived from a provided Git repository. To try to keep identifiers readable, common prefixes of refs are removed: if the raw refs are ‘test/foo/bar’ and ‘test/foo/baz’, they will be referred to just as ‘bar’ and ‘baz’.
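The ref shortening described above can be sketched with os.path.commonprefix (an illustration of the behaviour, not the actual implementation):

```python
import os.path

def shorten_refs(refs):
    # Drop the common leading path components so only the
    # distinguishing part of each ref remains.
    prefix = os.path.commonprefix(refs)
    cut = prefix.rfind('/') + 1  # cut at whole components only
    return [ref[cut:] for ref in refs]

shorten_refs(['test/foo/bar', 'test/foo/baz'])  # ['bar', 'baz']
```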

Aside from the provided helper attributes, all metrics are exposed in a DataFrame as the results_df attribute.

Parameters:
  • wa_dirs (str) – List of paths to WA3 output directories, or a regexp matching the names of WA3 output directories to consider, starting from the specified base_dir

  • base_dir (str) – The path of a directory containing a collection of WA3 output directories

  • plat_info (lisa.platforms.platinfo.PlatformInfo) – Optional LISA platform description. If provided, used to enrich extra metrics gleaned from trace analysis.

  • kernel_repo_path – Optional path to a kernel repository. WA3 reports the SHA1 of the kernel that workloads were run against. If this param is provided, the repository is searched for symbolic references to replace SHA1s in the data representation. This is purely to make the output more manageable for humans.

  • parse_traces – This class uses LISA to parse and analyse ftrace files for extra metrics. With multiple/large traces this can take some time. Set this param to False to disable trace parsing.

  • use_cached_trace_metrics – This class uses LISA to parse and analyse ftrace files for extra metrics. With multiple/large traces this can take some time, so the extracted metrics are cached in the provided output directories. Set this param to False to disable this caching.

  • display_charts – This class uses the IPython.display module to render some charts of workloads’ results. Set this param to False if you only want tables of results without displaying any charts.

Attention

Deprecated since version 2.0.

WaResultsCollector is deprecated and will be removed in version 4.0, use lisa.wa.WAOutput instead

RE_WLTEST_DIR = re.compile('wa\\.(?P<sha1>\\w+)_(?P<name>.+)')
property workloads
property tags
tests(workload=None)[source]
workload_available_metrics(workload)[source]
class SortBy(key, params, column)

Bases: tuple

Create new instance of SortBy(key, params, column)

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__match_args__ = ('key', 'params', 'column')
static __new__(_cls, key, params, column)

Create new instance of SortBy(key, params, column)

column

Alias for field number 2

key

Alias for field number 0

params

Alias for field number 1

boxplot(workload, metric, tag='.*', kernel='.*', test='.*', by=['test', 'tag', 'kernel'], sort_on='mean', ascending=False, xlim=None)[source]

Display boxplots of a certain metric

Creates horizontal boxplots of metrics in the results. Check workloads and workload_available_metrics to find the available workloads and metrics. Check tags, tests and kernels to find the names that results can be filtered against.

By default, the box with the lowest mean value is plotted at the top of the graph; this can be customized with sort_on and ascending.

Parameters:
  • workload – Name of workload to display metrics for

  • metric – Name of metric to display

  • tag – regular expression to filter tags that should be plotted

  • kernel – regular expression to filter kernels that should be plotted

  • test – regular expression to filter tests that should be plotted

  • by – List of identifiers to group output as in DataFrame.groupby.

  • sort_on – Name of the statistic to order data for. Supported values are: count, mean, std, min, max. You may alternatively specify a percentile to sort on; this should be an integer in the range [1..100], formatted as a percentage, e.g. 95% is the 95th percentile.

  • ascending – When True, boxplots are plotted by increasing values (lowest-valued boxplot at the top of the graph) of the specified sort_on statistic.
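Interpreting sort_on values can be sketched like this (a hypothetical helper, not part of the API):

```python
def parse_sort_on(sort_on):
    # Accept a named statistic, or a percentile such as '95%'
    # (an integer in [1..100] formatted as a percentage).
    if sort_on in {'count', 'mean', 'std', 'min', 'max'}:
        return sort_on
    if sort_on.endswith('%'):
        pct = int(sort_on[:-1])
        if 1 <= pct <= 100:
            return pct / 100
    raise ValueError(f'Unsupported sort_on value: {sort_on}')
```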

describe(workload, metric, tag='.*', kernel='.*', test='.*', by=['test', 'tag', 'kernel'], sort_on='mean', ascending=False)[source]

Return a DataFrame of statistics for a certain metric

Compute mean, std, min, max and [50, 75, 95, 99] percentiles for the values collected on each iteration of the specified metric.

Check workloads and workload_available_metrics to find the available workloads and metrics. Check tags, tests and kernels to find the names that results can be filtered against.

Parameters:
  • workload – Name of workload to display metrics for

  • metric – Name of metric to display

  • tag – regular expression to filter tags that should be plotted

  • kernel – regular expression to filter kernels that should be plotted

  • test – regular expression to filter tests that should be plotted

  • by – List of identifiers to group output as in DataFrame.groupby.

  • sort_on – Name of the statistic to order data for. Supported values are: count, mean, std, min, max. You may alternatively specify a percentile to sort on; this should be an integer in the range [1..100], formatted as a percentage, e.g. 95% is the 95th percentile.

  • ascending – When True, the statistics are reported by increasing values of the specified sort_on column

report(workload, metric, tag='.*', kernel='.*', test='.*', by=['test', 'tag', 'kernel'], sort_on='mean', ascending=False, xlim=None)[source]

Report a boxplot and a set of statistics for a certain metric

This is a convenience method to call both boxplot and describe at the same time to get a consistent graphical and numerical representation of the values for the specified metric.

Check workloads and workload_available_metrics to find the available workloads and metrics. Check tags, tests and kernels to find the names that results can be filtered against.

Parameters:
  • workload – Name of workload to display metrics for

  • metric – Name of metric to display

  • tag – regular expression to filter tags that should be plotted

  • kernel – regular expression to filter kernels that should be plotted

  • test – regular expression to filter tests that should be plotted

  • by – List of identifiers to group output as in DataFrame.groupby.

class CDF(df, threshold, above, below)

Bases: tuple

Create new instance of CDF(df, threshold, above, below)

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__match_args__ = ('df', 'threshold', 'above', 'below')
static __new__(_cls, df, threshold, above, below)

Create new instance of CDF(df, threshold, above, below)

above

Alias for field number 2

below

Alias for field number 3

df

Alias for field number 0

threshold

Alias for field number 1

plot_cdf(workload='jankbench', metric='frame_total_duration', threshold=16, top_most=None, ncol=1, tag='.*', kernel='.*', test='.*')[source]

Display cumulative distribution functions of a certain metric

Draws CDFs of metrics in the results. Check workloads and workload_available_metrics to find the available workloads and metrics. Check tags, tests and kernels to find the names that results can be filtered against.

The most likely use-case for this is plotting frame rendering times under Jankbench, so default parameters are provided to make this easy.

Parameters:
  • workload (str) – Name of workload to display metrics for

  • metric (str) – Name of metric to display

  • threshold (int) – Value to highlight in the plot - the likely use for this is highlighting the maximum acceptable frame-rendering time in order to see at a glance the rough proportion of frames that were rendered in time.

  • top_most (int) – Maximum number of CDFs to plot, all available plots if not specified

  • ncol (int) – Number of columns in the legend, default: 1. If more than one column is requested the legend will be force placed below the plot to avoid covering the data.

  • tag (str) – regular expression to filter tags that should be plotted

  • kernel (str) – regular expression to filter kernels that should be plotted

  • test (str) – regular expression to filter tests that should be plotted
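The proportion highlighted by the threshold is the CDF value at that point. Conceptually (illustrative numbers, not the plotting code):

```python
def cdf_at(values, threshold):
    # Fraction of samples at or below the threshold, i.e. the
    # rough proportion of frames rendered in time.
    return sum(v <= threshold for v in values) / len(values)

frame_ms = [10, 12, 15, 20, 30]
cdf_at(frame_ms, 16)  # 0.6 -> 60% of frames met the 16 ms deadline
```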

find_comparisons(base_id=None, by='kernel')[source]

Find metrics that changed between a baseline and variants

The notion of ‘variant’ and ‘baseline’ is defined by the by param. If by=’kernel’, then base_id should be a kernel SHA (or whatever key the ‘kernel’ column in the results_df uses). If by=’tag’ then base_id should be a WA ‘tag id’ (as named in the WA agenda).

plot_comparisons(base_id=None, by='kernel')[source]

Visualise metrics that changed between a baseline and variants

The notion of ‘variant’ and ‘baseline’ is defined by the by param. If by=’kernel’, then base_id should be a kernel SHA (or whatever key the ‘kernel’ column in the results_df uses). If by=’tag’ then base_id should be a WA ‘tag id’ (as named in the WA agenda).

get_artifacts(workload='.*', tag='.*', kernel='.*', test='.*', iteration=1)[source]

Get a dict mapping artifact names to file paths for a specific job.

artifact_name specifies the name of an artifact, e.g. ‘trace_bin’ to find the ftrace file from the specific job run. The other parameters should be used to uniquely identify a run of a job.

get_artifact(artifact_name, workload='.*', tag='.*', kernel='.*', test='.*', iteration=1)[source]

Get the path of an artifact attached to a job output.

artifact_name specifies the name of an artifact, e.g. ‘trace_bin’ to find the ftrace file from the specific job run. The other parameters should be used to uniquely identify a run of a job.