metricx package

Top-level package for MetricX.

class metricx.Metric(name: str, description: str = '', is_higher_better: bool = True)[source]

Bases: object

This class represents an objective to be optimized.

name

The name of the metric.

description

A human-readable description of the metric.

is_higher_better

Whether larger values are better.
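
For example, one higher-is-better and one lower-is-better metric might be constructed as follows (the metric names and descriptions are illustrative):

from metricx import Metric

# Higher accuracy is better (the default behavior).
accuracy = Metric(name="accuracy", description="Top-1 accuracy on the test set")

# Lower latency is better, so flip the flag.
latency = Metric(
    name="latency",
    description="Inference latency in milliseconds",
    is_higher_better=False,
)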

class metricx.Selector(task: metricx.task.Task, policies: List[Tuple[Callable, float]] = [], min_samples: int = 3)[source]

Bases: object

This class implements methods for selecting models to run.

Given a target metric, the Selector class provides methods for selecting models to run. The default policy is to:

  1. Obtain min_samples results for each model (3 by default).

  2. Sample from policies:

    • (p=0.25) Sample a random model.

    • (p=0.25) Select the model with the largest standard error.

    • (p=0.25) Select the model with the largest number of samples needed to achieve power.

    • (p=0.25) Select the model with the largest upper confidence bound.

task

The target task.

policies

A list of tuples containing a policy and the probability of selecting that policy.

min_samples

The minimum number of samples required for each model before the randomized policies are applied.

DEFAULT_POLICY = [(<function random_policy>, 0.25), (<function stderr_policy>, 0.25), (<function power_policy>, 0.25), (<function ucb_policy>, 0.25)]
propose(metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select a model to execute.

Parameters

metric – The target metric to optimize.

Returns

The model to execute.
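
A sketch of the intended propose/run/report loop; the model names, seed results, and evaluate function are hypothetical stand-ins, and we assume the task learns which models exist from report calls:

from metricx import Metric, Task, Selector

task = Task(name="demo", metrics=[Metric("accuracy")])

# Hypothetical seed results so the task knows about each model.
task.report("model-a", {"accuracy": 0.81})
task.report("model-b", {"accuracy": 0.78})

selector = Selector(task)
for _ in range(20):
    model = selector.propose()               # next model to evaluate
    score = evaluate(model)                  # hypothetical evaluation function
    task.report(model, {"accuracy": score})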

class metricx.Task(name: str, metrics: List[metricx.metric.Metric])[source]

Bases: object

This class represents a task used to evaluate models.

name

The name of the task.

metrics

A list of Metrics produced by models executed on the task. The first metric in this list is considered the default metric.

best(metric: Union[str, metricx.metric.Metric]) → str[source]

Get the best model.

Parameters

metric – The target metric to sort by.

Returns

The best model on this task.

get_metric(metric: Union[str, metricx.metric.Metric, None]) → metricx.metric.Metric[source]

Resolve a metric name (or None, for the task's default metric) to a Metric object.

likelihood(rank: List[str], metric: Union[str, metricx.metric.Metric, None] = None, N: int = 1000)[source]

Estimate the likelihood of the given ranking.

Parameters
  • rank – A proposed list of models sorted from best to worst.

  • metric – The target metric.

  • N – The number of samples to use for the estimation.

Returns

The estimated probability that the given ranking is the true ranking.

model_to_mu_var_n(metric: Union[str, metricx.metric.Metric, None])[source]

Compute the mean, variance, and sample count of the given metric for each model.

rank(metric: Union[str, metricx.metric.Metric, None] = None) → List[str][source]

Rank the models.

Parameters

metric – The target metric to sort by.

Returns

A list of models, sorted from best to worst.
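
For instance, the current ranking and its estimated reliability can be inspected together (a sketch; assumes results for a metric named "accuracy" have already been reported):

ranking = task.rank("accuracy")                    # models sorted best to worst
p = task.likelihood(ranking, metric="accuracy")    # estimated P(ranking is true)
print(task.best("accuracy"), p)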

report(model: str, result: Dict[Union[str, metricx.metric.Metric], float])[source]

Report a result for the specified model.

Parameters
  • model – A string identifying the model.

  • result – A dictionary mapping each metric to its value.

Raises

KeyError – If the result dictionary is missing any of the task's metrics.
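
Because report raises KeyError when a metric is missing, a result must cover every metric on the task. A minimal sketch (the model name and values are made up):

from metricx import Metric, Task

task = Task("demo", metrics=[Metric("accuracy"), Metric("latency", is_higher_better=False)])
task.report("model-a", {"accuracy": 0.83, "latency": 41.0})  # both metrics supplied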

samples_to_achieve_power(modelA: str, modelB: str, metric: Union[str, metricx.metric.Metric, None] = None)[source]

Number of samples needed to achieve power.

This method estimates the number of samples needed, for each model, to achieve 50% statistical power: a 50% probability of detecting a statistically significant effect (at significance level 0.1) if one exists.
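
A minimal call sketch; the model names are illustrative, and since the return shape is not documented here we only show the invocation:

# Estimated samples needed to reach 50% power at significance level 0.1.
needed = task.samples_to_achieve_power("model-a", "model-b", metric="accuracy")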

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

Returns

A DataFrame where each row corresponds to a single run.

to_figure() → matplotlib.figure.Figure[source]

Export to Figure.

Returns

A Figure where each subplot shows a metric.
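
The export helpers compose naturally; a sketch (file paths are illustrative):

df = task.to_df()                 # one row per run
task.to_csv("results.csv")        # same data as CSV
fig = task.to_figure()            # matplotlib Figure, one subplot per metric
fig.savefig("results.png")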

class metricx.TaskGrid(tasks: List[metricx.task.Task])[source]

Bases: object

This class represents a set of tasks.

tasks

A list of benchmark tasks.

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

to_html(path_to_html)[source]

Export to an HTML file.

Parameters

path_to_html – The path to write the HTML file.
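
A sketch of building a grid from previously constructed tasks and exporting a report (task_a and task_b are assumed to be existing Task objects; the paths are illustrative):

from metricx import TaskGrid

grid = TaskGrid(tasks=[task_a, task_b])
grid.to_csv("grid.csv")
grid.to_html("report.html")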

Submodules

metricx.grid module

class metricx.grid.TaskGrid(tasks: List[metricx.task.Task])[source]

Bases: object

This class represents a set of tasks.

tasks

A list of benchmark tasks.

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

to_html(path_to_html)[source]

Export to an HTML file.

Parameters

path_to_html – The path to write the HTML file.

metricx.grid.make_colors(N)[source]

metricx.metric module

class metricx.metric.Metric(name: str, description: str = '', is_higher_better: bool = True)[source]

Bases: object

This class represents an objective to be optimized.

name

The name of the metric.

description

A human-readable description of the metric.

is_higher_better

Whether larger values are better.

metricx.selector module

class metricx.selector.Selector(task: metricx.task.Task, policies: List[Tuple[Callable, float]] = [], min_samples: int = 3)[source]

Bases: object

This class implements methods for selecting models to run.

Given a target metric, the Selector class provides methods for selecting models to run. The default policy is to:

  1. Obtain min_samples results for each model (3 by default).

  2. Sample from policies:

    • (p=0.25) Sample a random model.

    • (p=0.25) Select the model with the largest standard error.

    • (p=0.25) Select the model with the largest number of samples needed to achieve power.

    • (p=0.25) Select the model with the largest upper confidence bound.

task

The target task.

policies

A list of tuples containing a policy and the probability of selecting that policy.

min_samples

The minimum number of samples required for each model before the randomized policies are applied.

DEFAULT_POLICY = [(<function random_policy>, 0.25), (<function stderr_policy>, 0.25), (<function power_policy>, 0.25), (<function ucb_policy>, 0.25)]
propose(metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select a model to execute.

Parameters

metric – The target metric to optimize.

Returns

The model to execute.

metricx.selector.power_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select the model with the most samples needed to achieve power.

metricx.selector.random_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select a random model.

metricx.selector.stderr_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select the model with the largest standard error.

metricx.selector.ucb_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

Select the model with the largest upper confidence bound.
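
Each policy takes a task and an optional metric and returns a model name, so a custom mixture can be passed to Selector. A sketch, assuming task is an existing Task; the 50/50 weighting is illustrative:

from metricx.selector import Selector, random_policy, ucb_policy

# Split selection probability between random exploration and UCB.
selector = Selector(task, policies=[(random_policy, 0.5), (ucb_policy, 0.5)])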

metricx.task module

class metricx.task.Task(name: str, metrics: List[metricx.metric.Metric])[source]

Bases: object

This class represents a task used to evaluate models.

name

The name of the task.

metrics

A list of Metrics produced by models executed on the task. The first metric in this list is considered the default metric.

best(metric: Union[str, metricx.metric.Metric]) → str[source]

Get the best model.

Parameters

metric – The target metric to sort by.

Returns

The best model on this task.

get_metric(metric: Union[str, metricx.metric.Metric, None]) → metricx.metric.Metric[source]

Resolve a metric name (or None, for the task's default metric) to a Metric object.

likelihood(rank: List[str], metric: Union[str, metricx.metric.Metric, None] = None, N: int = 1000)[source]

Estimate the likelihood of the given ranking.

Parameters
  • rank – A proposed list of models sorted from best to worst.

  • metric – The target metric.

  • N – The number of samples to use for the estimation.

Returns

The estimated probability that the given ranking is the true ranking.

model_to_mu_var_n(metric: Union[str, metricx.metric.Metric, None])[source]

Compute the mean, variance, and sample count of the given metric for each model.

rank(metric: Union[str, metricx.metric.Metric, None] = None) → List[str][source]

Rank the models.

Parameters

metric – The target metric to sort by.

Returns

A list of models, sorted from best to worst.

report(model: str, result: Dict[Union[str, metricx.metric.Metric], float])[source]

Report a result for the specified model.

Parameters
  • model – A string identifying the model.

  • result – A dictionary mapping each metric to its value.

Raises

KeyError – If the result dictionary is missing any of the task's metrics.

samples_to_achieve_power(modelA: str, modelB: str, metric: Union[str, metricx.metric.Metric, None] = None)[source]

Number of samples needed to achieve power.

This method estimates the number of samples needed, for each model, to achieve 50% statistical power: a 50% probability of detecting a statistically significant effect (at significance level 0.1) if one exists.

to_bokeh()[source]

Export to a Bokeh figure.

to_csv(path_to_csv)[source]

Export to CSV.

Parameters

path_to_csv – The path to write the CSV file.

to_df() → pandas.core.frame.DataFrame[source]

Export to DataFrame.

Returns

A DataFrame where each row corresponds to a single run.

to_figure() → matplotlib.figure.Figure[source]

Export to Figure.

Returns

A Figure where each subplot shows a metric.