
Evaluator

loss_delta(_, __, model, parameters, df=None)

This evaluation function calculates the loss delta between the two classes of the binary target variable on the evaluation set. This delta describes how well the model can distinguish between the categories of the target variable.

Parameters:

| Name | Type | Description | Default |
| ---- | ---- | ----------- | ------- |
| `_` | `None` | Not in use. Needed by PyTorch-Ignite. | *required* |
| `__` | `None` | Not in use. Needed by PyTorch-Ignite. | *required* |
| `model` | `Autoembedder` | Instance of the model used for prediction. | *required* |
| `parameters` | `Dict[str, Any]` | Dictionary with the parameters used for training and prediction. All possible parameters are listed in the [documentation](https://chrislemke.github.io/autoembedder/#parameters). | *required* |
| `df` | `Optional[Union[dd.DataFrame, pd.DataFrame]]` | Dask or Pandas DataFrame. If `eval_input_path` is set, the data is loaded from that path and this argument is ignored. | `None` |

Returns:

| Type | Description |
| ---- | ----------- |
| `Tuple[float, float]` | `loss_mean_diff` and `loss_median_diff`: the absolute differences of the mean and the median loss between the two classes. |
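
A minimal usage sketch, assuming a trained `Autoembedder` instance and an evaluation DataFrame with a binary target column (the file path, the `label` column name, and the model variable are illustrative, not part of the package's documented examples):

```python
import pandas as pd

from autoembedder.evaluator import loss_delta

# Hypothetical evaluation data with a binary target column named "label".
df = pd.read_parquet("data/eval.parquet")

parameters = {
    "target": "label",      # binary target column in `df`
    "trim_eval_errors": 1,  # drop the largest and smallest loss per class
}

model = ...  # a trained Autoembedder instance (assumed)

# The first two arguments are unused placeholders filled by PyTorch-Ignite.
loss_mean_diff, loss_median_diff = loss_delta(None, None, model, parameters, df=df)
print(f"mean delta: {loss_mean_diff:.4f}, median delta: {loss_median_diff:.4f}")
```

Larger deltas indicate that the model reconstructs the two classes with clearly different losses, i.e. separates them better.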

Source code in src/autoembedder/evaluator.py
def loss_delta(
    _,
    __,
    model: Autoembedder,
    parameters: Dict[str, Any],
    df: Optional[Union[dd.DataFrame, pd.DataFrame]] = None,
) -> Tuple[float, float]:  # type: ignore
    """This evaluation function calculates the loss delta between the training
    and test set. This delta describes how well the model can distinguish
    between the categories of the target variable.

    Args:
        _ (None): Not in use. Needed by PyTorch-Ignite.
        __ (None): Not in use. Needed by PyTorch-Ignite.
        model (Autoembedder): Instance of the model used for prediction.
        parameters (Dict[str, Any]): Dictionary with the parameters used for training and prediction.
            In the [documentation](https://chrislemke.github.io/autoembedder/#parameters) all possible parameters are listed.
        df (Optional[Union[dd.DataFrame, pd.DataFrame]], optional): Dask or Pandas DataFrame. If `eval_input_path`
            is set in `parameters`, the data is loaded from that path and this argument is ignored.

    Returns:
        Tuple[float, float]: `loss_mean_diff` and `loss_median_diff`: the absolute
            differences of the mean and the median loss between the two classes.
    """
    # Loading from `eval_input_path` takes precedence over a passed `df`.
    if parameters.get("eval_input_path", None) is not None:
        try:
            # Read with Dask and shuffle the rows.
            df = (
                dd.read_parquet(parameters["eval_input_path"], infer_divisions=True)
                .compute()
                .sample(frac=1)
            )
        except ValueError:
            # Fall back to Pandas if Dask cannot read the Parquet file.
            df = pd.read_parquet(parameters["eval_input_path"]).sample(frac=1)
    elif df is not None:
        # Materialize Dask DataFrames, then shuffle the rows.
        if isinstance(df, dd.DataFrame):
            df = df.compute()
        df = df.sample(frac=1)
    else:
        raise ValueError(
            "No DataFrame given! Please provide a DataFrame or a path to a parquet file."
        )

    target = parameters["target"]

    # Balance the classes: downsample class 0 to the number of class 1 rows.
    df_1 = df.query(f"{target} == 1").drop([target], axis=1)
    df_0 = df.query(f"{target} == 0").drop([target], axis=1).sample(n=df_1.shape[0])

    losses_0: List[float] = []
    losses_1: List[float] = []

    # Collect the per-row reconstruction loss (MSE) for each class separately.
    for losses_df, losses in [(df_0, losses_0), (df_1, losses_1)]:
        loss = MSELoss()
        for batch in losses_df.itertuples(index=False):
            losses.append(_predict(model, batch, loss, parameters))

    if parameters.get("trim_eval_errors", 0) == 1:
        losses_0.remove(max(losses_0))
        losses_0.remove(min(losses_0))
        losses_1.remove(max(losses_1))
        losses_1.remove(min(losses_1))

    # Absolute differences of the mean and of the median loss between the classes.
    return np.absolute(np.mean(losses_1) - np.mean(losses_0)), np.absolute(
        np.median(losses_1) - np.median(losses_0)
    )
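
The two leading placeholder arguments exist because PyTorch-Ignite calls an engine's process function as `process_function(engine, batch)`. A minimal sketch of that wiring, assuming `model`, `parameters`, and `df` as in the usage example above (the `Engine` setup here is illustrative, not the package's documented API):

```python
from functools import partial

from ignite.engine import Engine

# ignite supplies (engine, batch) as the first two positional arguments,
# which loss_delta accepts as the unused `_` and `__`.
evaluator = Engine(partial(loss_delta, model=model, parameters=parameters, df=df))

# A single dummy batch suffices: loss_delta operates on the whole DataFrame.
state = evaluator.run([None], max_epochs=1)
loss_mean_diff, loss_median_diff = state.output
```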