metricx package
Top-level package for MetricX.
class metricx.Metric(name: str, description: str = '', is_higher_better: bool = True)[source]

    Bases: object

    This class represents an objective to be optimized.

    name
        The name of the metric.

    description
        A human-readable description of the metric.

    is_higher_better
        Whether larger values are better.
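    A minimal usage sketch (the metric names and descriptions below are illustrative, not part of the library):

        from metricx import Metric

        # Larger values are better by default.
        accuracy = Metric(name="accuracy", description="Top-1 classification accuracy.")

        # For loss-style metrics, smaller values are better.
        loss = Metric(
            name="cross_entropy",
            description="Average cross-entropy loss.",
            is_higher_better=False,
        )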
class metricx.Selector(task: metricx.task.Task, policies: List[Tuple[Callable, float]] = [], min_samples: int = 3)[source]

    Bases: object

    This class implements methods for selecting models to run.

    Given a target metric, the Selector class provides methods for selecting models to run. The default policy is to:

    1. Obtain min_samples=3 samples for each model.
    2. Then sample one of the following policies:
       - (p=0.25) Select a random model.
       - (p=0.25) Select the model with the largest standard error.
       - (p=0.25) Select the model that needs the largest number of samples to achieve power.
       - (p=0.25) Select the model with the largest upper confidence bound.

    task
        The target task.

    policies
        A list of (policy, probability) tuples, where the probability is the chance of applying that policy.

    min_samples
        The minimum number of samples required for each model before the randomized policies are applied.

    DEFAULT_POLICY = [(<function random_policy>, 0.25), (<function stderr_policy>, 0.25), (<function power_policy>, 0.25), (<function ucb_policy>, 0.25)]
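    A minimal construction sketch. The task, metric names, and policy weights are illustrative; random_policy and power_policy are the module-level functions documented under metricx.selector below.

        from metricx import Metric, Selector, Task
        from metricx.selector import power_policy, random_policy

        task = Task(name="sentiment", metrics=[Metric(name="f1")])

        # Default behaviour: min_samples results per model, then the four
        # built-in policies, each chosen with probability 0.25.
        selector = Selector(task)

        # Custom mix: 70% power-based selection, 30% random exploration,
        # and 5 results per model before the randomized policies apply.
        explorer = Selector(
            task,
            policies=[(power_policy, 0.7), (random_policy, 0.3)],
            min_samples=5,
        )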
class metricx.Task(name: str, metrics: List[metricx.metric.Metric])[source]

    Bases: object

    This class represents a task used to evaluate models.

    name
        The name of the task.

    metrics
        A list of Metrics produced by models executed on the task. The first metric in this list is the default metric.
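    A minimal construction sketch (the task and metric names are illustrative):

        from metricx import Metric, Task

        task = Task(
            name="sentiment-classification",
            metrics=[
                # The first metric ("f1") is the default used by rank(), best(), etc.
                Metric(name="f1", description="Macro-averaged F1 score."),
                Metric(name="latency", description="Seconds per example.", is_higher_better=False),
            ],
        )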
    best(metric: Union[str, metricx.metric.Metric]) → str[source]
        Get the best model.

        Parameters:
            metric – The target metric to sort by.

        Returns:
            The best model on this task.
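        For example, assuming results have already been reported via report() (model and metric names are illustrative):

            winner = task.best("f1")  # name of the top model on the "f1" metric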
    likelihood(rank: List[str], metric: Union[str, metricx.metric.Metric, None] = None, N: int = 1000)[source]
        Estimate the likelihood of the given ranking.

        Parameters:
            rank – A proposed list of models, sorted from best to worst.
            metric – The target metric.
            N – The number of samples to use for the estimation.

        Returns:
            The probability that the given ranking is the true ranking.
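        For example, assuming results for both models have been reported (names are illustrative):

            # Probability that this best-to-worst ordering is the true ranking
            # on "f1", estimated from 5000 samples.
            p = task.likelihood(["bert-large", "bert-base"], metric="f1", N=5000)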
    model_to_mu_var_n(metric: Union[str, metricx.metric.Metric, None])[source]
        Compute the mean, variance, and sample count for each model.
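        The return format is not documented here; assuming it summarizes each model's results on the given metric, a call looks like:

            stats = task.model_to_mu_var_n("f1")  # mean, variance, and sample count per model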
    rank(metric: Union[str, metricx.metric.Metric, None] = None) → List[str][source]
        Rank the models.

        Parameters:
            metric – The target metric to sort by.

        Returns:
            A list of models, sorted from best to worst.
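        For example (the metric and model names are illustrative):

            ordering = task.rank(metric="f1")  # e.g. ["bert-large", "bert-base", "lstm-baseline"]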
    report(model: str, result: Dict[Union[str, metricx.metric.Metric], float])[source]
        Report a result for the specified model.

        Parameters:
            model – A string identifying the model.
            result – A dictionary mapping each metric to its value.

        Raises:
            KeyError – If the result dictionary is missing metrics.
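        For example, for a task whose metrics are "f1" and "latency" (model name and values are illustrative):

            # Every metric on the task must appear in the dictionary,
            # otherwise report() raises KeyError.
            task.report("bert-base", {"f1": 0.91, "latency": 0.034})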
    samples_to_achieve_power(modelA: str, modelB: str, metric: Union[str, metricx.metric.Metric, None] = None)[source]
        Number of samples needed to achieve power.

        This method estimates the number of samples needed, for each model, to achieve 50% statistical power: the probability of detecting a statistically significant effect (at p = 0.1) if one exists.
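        For example (model names are illustrative; the exact return format is not specified in this docstring):

            needed = task.samples_to_achieve_power("bert-base", "bert-large", metric="f1")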
class metricx.TaskGrid(tasks: List[metricx.task.Task])[source]

    Bases: object

    This class represents a set of tasks.

    tasks
        A list of benchmark tasks.
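    A minimal construction sketch (task and metric names are illustrative):

        from metricx import Metric, Task, TaskGrid

        grid = TaskGrid(tasks=[
            Task(name="sentiment", metrics=[Metric(name="f1")]),
            Task(name="question-answering", metrics=[Metric(name="exact_match")]),
        ])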
Submodules

metricx.grid module

metricx.metric module
class metricx.metric.Metric(name: str, description: str = '', is_higher_better: bool = True)[source]

    Bases: object

    This class represents an objective to be optimized.

    name
        The name of the metric.

    description
        A human-readable description of the metric.

    is_higher_better
        Whether larger values are better.
metricx.selector module
class metricx.selector.Selector(task: metricx.task.Task, policies: List[Tuple[Callable, float]] = [], min_samples: int = 3)[source]

    Bases: object

    This class implements methods for selecting models to run.

    Given a target metric, the Selector class provides methods for selecting models to run. The default policy is to:

    1. Obtain min_samples=3 samples for each model.
    2. Then sample one of the following policies:
       - (p=0.25) Select a random model.
       - (p=0.25) Select the model with the largest standard error.
       - (p=0.25) Select the model that needs the largest number of samples to achieve power.
       - (p=0.25) Select the model with the largest upper confidence bound.

    task
        The target task.

    policies
        A list of (policy, probability) tuples, where the probability is the chance of applying that policy.

    min_samples
        The minimum number of samples required for each model before the randomized policies are applied.

    DEFAULT_POLICY = [(<function random_policy>, 0.25), (<function stderr_policy>, 0.25), (<function power_policy>, 0.25), (<function ucb_policy>, 0.25)]
metricx.selector.power_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

    Select the model with the most samples needed to achieve power.

metricx.selector.random_policy(task: metricx.task.Task, metric: Union[str, metricx.metric.Metric, None] = None) → str[source]

    Select a random model.
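    Both policy functions take a task (and optionally a metric) and return a model name, so they can be called directly or passed to Selector. A sketch, assuming a task with reported results and an "f1" metric (illustrative):

        from metricx.selector import power_policy, random_policy

        next_model = random_policy(task)        # a randomly selected model name
        focus_model = power_policy(task, "f1")  # model needing the most samples to reach power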
metricx.task module
class metricx.task.Task(name: str, metrics: List[metricx.metric.Metric])[source]

    Bases: object

    This class represents a task used to evaluate models.

    name
        The name of the task.

    metrics
        A list of Metrics produced by models executed on the task. The first metric in this list is the default metric.
    best(metric: Union[str, metricx.metric.Metric]) → str[source]
        Get the best model.

        Parameters:
            metric – The target metric to sort by.

        Returns:
            The best model on this task.
    likelihood(rank: List[str], metric: Union[str, metricx.metric.Metric, None] = None, N: int = 1000)[source]
        Estimate the likelihood of the given ranking.

        Parameters:
            rank – A proposed list of models, sorted from best to worst.
            metric – The target metric.
            N – The number of samples to use for the estimation.

        Returns:
            The probability that the given ranking is the true ranking.
    model_to_mu_var_n(metric: Union[str, metricx.metric.Metric, None])[source]
        Compute the mean, variance, and sample count for each model.
    rank(metric: Union[str, metricx.metric.Metric, None] = None) → List[str][source]
        Rank the models.

        Parameters:
            metric – The target metric to sort by.

        Returns:
            A list of models, sorted from best to worst.
    report(model: str, result: Dict[Union[str, metricx.metric.Metric], float])[source]
        Report a result for the specified model.

        Parameters:
            model – A string identifying the model.
            result – A dictionary mapping each metric to its value.

        Raises:
            KeyError – If the result dictionary is missing metrics.
    samples_to_achieve_power(modelA: str, modelB: str, metric: Union[str, metricx.metric.Metric, None] = None)[source]
        Number of samples needed to achieve power.

        This method estimates the number of samples needed, for each model, to achieve 50% statistical power: the probability of detecting a statistically significant effect (at p = 0.1) if one exists.