metricx package

Top-level package for MetricX.

class metricx.Metric(name: str, description: str = '', is_higher_better: bool = True)[source]

Bases: object

This class represents an objective to be optimized.

name

The name of the metric.

description

A human-readable description of the metric.

is_higher_better

Whether larger values are better.
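
For example, one higher-is-better and one lower-is-better metric might be constructed as follows (the metric names and descriptions are illustrative):

from metricx import Metric

# Higher accuracy is better (the default behavior).
accuracy = Metric(name="accuracy", description="Top-1 accuracy on the test set")

# Lower latency is better, so flip the flag.
latency = Metric(
    name="latency",
    description="Inference latency in milliseconds",
    is_higher_better=False,
)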

class metricx.Selector(task: metricx.task.Task, policies: List[Tuple[Callable, float]] = [], min_samples: int = 3)[source]

Bases: object

This class implements methods for selecting models to run.

Given a target metric, the Selector class provides methods for selecting models to run. The default policy is to:

  1. Obtain min_samples results for each model (3 by default).

  2. Sample from policies:

    • (p=0.25) Sample a random model.

    • (p=0.25) Select the model with the largest standard error.

    • (p=0.25) Select the model with the largest number of samples needed to achieve power.

    • (p=0.25) Select the model with the largest upper confidence bound.

task

The target task.

policies

A list of tuples containing a policy and the probability of selecting that policy.

min_samples

The minimum number of samples required for each model before the randomized policies are applied.

DEFAULT_POLICY = [(<function random_policy>, 0.25), (<function stderr_policy>, 0.25), (<function power_policy>, 0.25), (<function ucb_policy>, 0.25)]
propose(metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select a model to execute.

Parameters

metric – The target metric to optimize.

Returns

The model to execute.
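
A sketch of the intended propose/run/report loop; the model names, seed results, and evaluate function are hypothetical stand-ins, and we assume the task learns which models exist from report calls:

from metricx import Metric, Task, Selector

task = Task(name="demo", metrics=[Metric("accuracy")])

# Hypothetical seed results so the task knows about each model.
task.report("model-a", {"accuracy": 0.81})
task.report("model-b", {"accuracy": 0.78})

selector = Selector(task)
for _ in range(20):
    model = selector.propose()               # next model to evaluate
    score = evaluate(model)                  # hypothetical evaluation function
    task.report(model, {"accuracy": score})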

class metricx.Task(name: str, metrics: List[metricx.metric.Metric])[source]

Bases: object

This class represents a task used to evaluate models.

name

The name of the task.

metrics

A list of Metrics produced by models executed on the task. The first metric in this list is considered the default metric.

best(metric: Union[str, metricx.metric.Metric]) → str[source]

Get the best model.

Parameters

metric – The target metric to sort by.

Returns

The best model on this task.

get_metric(metric: Union[str, metricx.metric.Metric, None]) → metricx.metric.Metric[source]

Resolve a metric name (or None, for the task's default metric) to a Metric object.

likelihood(rank: List[str], metric: Union[str, metricx.metric.Metric, None] = None, N: int = 1000)[source]

Estimate the likelihood of the given ranking.

Parameters
  • rank – A proposed list of models sorted from best to worst.

  • metric – The target metric.

  • N – The number of samples to use for the estimation.

Returns

The estimated probability that the given ranking is the true ranking.

model_to_mu_var_n(metric: Union[str, metricx.metric.Metric, None])[source]

Compute the mean, variance, and sample count of the given metric for each model.

rank(metric: Union[str, metricx.metric.Metric, None] = None) → List[str][source]

Rank the models.

Parameters

metric – The target metric to sort by.

Returns

A list of models, sorted from best to worst.
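
For instance, the current ranking and its estimated reliability can be inspected together (a sketch; assumes results for a metric named "accuracy" have already been reported):

ranking = task.rank("accuracy")                    # models sorted best to worst
p = task.likelihood(ranking, metric="accuracy")    # estimated P(ranking is true)
print(task.best("accuracy"), p)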

report(model: str, result: Dict[Union[str, metricx.metric.Metric], float])[source]

Report a result for the specified model.

Parameters
  • model – A string identifying the model.

  • result – A dictionary mapping each metric to its value.

Raises

KeyError – If the result dictionary is missing any of the task's metrics.
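
Because report raises KeyError when a metric is missing, a result must cover every metric on the task. A minimal sketch (the model name and values are made up):

from metricx import Metric, Task

task = Task("demo", metrics=[Metric("accuracy"), Metric("latency", is_higher_better=False)])
task.report("model-a", {"accuracy": 0.83, "latency": 41.0})  # both metrics supplied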

samples_to_achieve_power(modelA: str, modelB: str, metric: Union[str, metricx.metric.Metric, None] = None)[source]

Number of samples needed to achieve power.

This method estimates the number of samples needed, for each model, to achieve 50% statistical power: a 50% probability of detecting a statistically significant effect (at significance level 0.1) if one exists.
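
A minimal call sketch; the model names are illustrative, and since the return shape is not documented here we only show the invocation:

# Estimated samples needed to reach 50% power at significance level 0.1.
needed = task.samples_to_achieve_power("model-a", "model-b", metric="accuracy")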

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

Returns

A DataFrame where each row corresponds to a single run.

to_figure() → matplotlib.figure.Figure[source]

Export to Figure.

Returns

A Figure where each subplot shows a metric.
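
The export helpers compose naturally; a sketch (file paths are illustrative):

df = task.to_df()                 # one row per run
task.to_csv("results.csv")        # same data as CSV
fig = task.to_figure()            # matplotlib Figure, one subplot per metric
fig.savefig("results.png")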

class metricx.TaskGrid(tasks: List[metricx.task.Task])[source]

Bases: object

This class represents a set of tasks.

tasks

A list of benchmark tasks.

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

to_html(path_to_html)[source]

Export to an HTML file.

Parameters

path_to_html – The path to write the HTML file.
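
A sketch of building a grid from previously constructed tasks and exporting a report (task_a and task_b are assumed to be existing Task objects; the paths are illustrative):

from metricx import TaskGrid

grid = TaskGrid(tasks=[task_a, task_b])
grid.to_csv("grid.csv")
grid.to_html("report.html")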

Submodules

metricx.grid module

class metricx.grid.TaskGrid(tasks: List[metricx.task.Task])[source]

Bases: object

This class represents a set of tasks.

tasks

A list of benchmark tasks.

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

to_html(path_to_html)[source]

Export to an HTML file.

Parameters

path_to_html – The path to write the HTML file.

metricx.grid.make_colors(N)[source]

metricx.metric module

class metricx.metric.Metric(name: str, description: str = '', is_higher_better: bool = True)[source]

Bases: object

This class represents an objective to be optimized.

name

The name of the metric.

description

A human-readable description of the metric.

is_higher_better

Whether larger values are better.

metricx.selector module

class metricx.selector.Selector(task: metricx.task.Task, policies: List[Tuple[Callable, float]] = [], min_samples: int = 3)[source]

Bases: object

This class implements methods for selecting models to run.

Given a target metric, the Selector class provides methods for selecting models to run. The default policy is to:

  1. Obtain min_samples results for each model (3 by default).

  2. Sample from policies:

    • (p=0.25) Sample a random model.

    • (p=0.25) Select the model with the largest standard error.

    • (p=0.25) Select the model with the largest number of samples needed to achieve power.

    • (p=0.25) Select the model with the largest upper confidence bound.

task

The target task.

policies

A list of tuples containing a policy and the probability of selecting that policy.

min_samples

The minimum number of samples required for each model before the randomized policies are applied.

DEFAULT_POLICY = [(<function random_policy>, 0.25), (<function stderr_policy>, 0.25), (<function power_policy>, 0.25), (<function ucb_policy>, 0.25)]
propose(metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select a model to execute.

Parameters

metric – The target metric to optimize.

Returns

The model to execute.

metricx.selector.power_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select the model with the most samples needed to achieve power.

metricx.selector.random_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select a random model.

metricx.selector.stderr_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select the model with the largest standard error.

metricx.selector.ucb_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select the model with the largest upper confidence bound.
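
Each policy takes a task and an optional metric and returns a model name, so a custom mixture can be passed to Selector. A sketch, assuming task is an existing Task; the 50/50 weighting is illustrative:

from metricx.selector import Selector, random_policy, ucb_policy

# Split selection probability between random exploration and UCB.
selector = Selector(task, policies=[(random_policy, 0.5), (ucb_policy, 0.5)])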

metricx.task module

class metricx.task.Task(name: str, metrics: List[metricx.metric.Metric])[source]

Bases: object

This class represents a task used to evaluate models.

name

The name of the task.

metrics

A list of Metrics produced by models executed on the task. The first metric in this list is considered the default metric.

best(metric: Union[str, metricx.metric.Metric]) → str[source]

Get the best model.

Parameters

metric – The target metric to sort by.

Returns

The best model on this task.

get_metric(metric: Union[str, metricx.metric.Metric, None]) → metricx.metric.Metric[source]

Resolve a metric name (or None, for the task's default metric) to a Metric object.

likelihood(rank: List[str], metric: Union[str, metricx.metric.Metric, None] = None, N: int = 1000)[source]

Estimate the likelihood of the given ranking.

Parameters
  • rank – A proposed list of models sorted from best to worst.

  • metric – The target metric.

  • N – The number of samples to use for the estimation.

Returns

The estimated probability that the given ranking is the true ranking.

model_to_mu_var_n(metric: Union[str, metricx.metric.Metric, None])[source]

Compute the mean, variance, and sample count of the given metric for each model.

rank(metric: Union[str, metricx.metric.Metric, None] = None) → List[str][source]

Rank the models.

Parameters

metric – The target metric to sort by.

Returns

A list of models, sorted from best to worst.

report(model: str, result: Dict[Union[str, metricx.metric.Metric], float])[source]

Report a result for the specified model.

Parameters
  • model – A string identifying the model.

  • result – A dictionary mapping each metric to its value.

Raises

KeyError – If the result dictionary is missing any of the task's metrics.

samples_to_achieve_power(modelA: str, modelB: str, metric: Union[str, metricx.metric.Metric, None] = None)[source]

Number of samples needed to achieve power.

This method estimates the number of samples needed, for each model, to achieve 50% statistical power: a 50% probability of detecting a statistically significant effect (at significance level 0.1) if one exists.

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

Returns

A DataFrame where each row corresponds to a single run.

to_figure() → matplotlib.figure.Figure[source]

Export to Figure.

Returns

A Figure where each subplot shows a metric.