
Learner

__training_step(engine, batch, model, optimizer, criterion, parameters)

Here the actual training step is performed; it is called by the training engine. Without PyTorch Ignite, this code would be wrapped in a training loop over epochs and batches; with Ignite, the engine handles this.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| engine | ignite.engine.Engine | The engine that is calling this method. | required |
| batch | NamedTuple | The batch that is passed to the engine for training. | required |
| model | Autoembedder | The model to be trained. | required |
| optimizer | torch.optim | The optimizer to be used for training. | required |
| criterion | torch.nn.MSELoss | The loss function to be used for training. | required |
| parameters | Dict[str, Any] | The parameters of the training process. | required |

Returns:

| Type | Description |
|------|-------------|
| Union[np.float32, np.float64] | The loss of the current batch. |

Source code in src/autoembedder/learner.py
def __training_step(
    engine: Engine,
    batch: NamedTuple,
    model: Autoembedder,
    optimizer: Adam,
    criterion: MSELoss,
    parameters: Dict,
) -> Union[np.float32, np.float64]:
    """Here the actual training step is performed. It is called by the training
    engine. Not using [PyTorch ignite](https://github.com/pytorch/ignite) this
    code would be wrapped in some kind of training loop over a range of epochs
    and batches. But using ignite this is handled by the engine.

    Args:
        engine (ignite.engine.Engine): The engine that is calling this method.
        batch (NamedTuple): The batch that is passed to the engine for training.
        model (Autoembedder): The model to be trained.
        optimizer (torch.optim): The optimizer to be used for training.
        criterion (torch.nn.MSELoss): The loss function to be used for training.
        parameters (Dict[str, Any]): The parameters of the training process.

    Returns:
        Union[np.float32, np.float64]: The loss of the current batch.
    """

    model.train()
    optimizer.zero_grad()
    cat, cont = model_input(batch, parameters)
    outputs = model(cat, cont)
    train_loss = criterion(outputs, model.last_target)

    if parameters.get("l1_lambda", 0) > 0:
        l1_lambda = parameters["l1_lambda"]
        l1_norm = sum(p.abs().sum() for p in model.parameters())
        train_loss = train_loss + l1_lambda * l1_norm

    train_loss.backward()
    optimizer.step()
    return train_loss.item()
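
The snippet above is only the step function; the loop around it is supplied by Ignite. A minimal sketch of how such a step function is bound to an engine follows. It mirrors what fit does below via functools.partial; model, optimizer, criterion, parameters, and train_dataloader are placeholder names for objects you create yourself.

from functools import partial

from ignite.engine import Engine

# Without Ignite, the step would sit inside a hand-written loop (sketch):
#
#     for epoch in range(parameters["epochs"]):
#         for batch in train_dataloader:
#             loss = __training_step(None, batch, model, optimizer, criterion, parameters)
#
# With Ignite, the step function is bound once and the Engine drives the loop:
trainer = Engine(
    partial(
        __training_step,
        model=model,
        optimizer=optimizer,
        criterion=criterion,
        parameters=parameters,
    )
)
trainer.run(train_dataloader, max_epochs=parameters["epochs"])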

__validation_step(engine, batch, model, criterion, parameters)

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| engine | ignite.engine.Engine | The engine that is calling this method. | required |
| batch | NamedTuple | The batch that is passed to the engine for validation. | required |
| model | Autoembedder | The model used for validation. | required |
| criterion | torch.nn.MSELoss | The loss function to be used for validation. | required |
| parameters | Dict[str, Any] | The parameters of the validation process. | required |

Returns:

| Type | Description |
|------|-------------|
| Union[np.float32, np.float64] | The loss of the current batch. |

Source code in src/autoembedder/learner.py
def __validation_step(
    engine: Engine,
    batch: NamedTuple,
    model: Autoembedder,
    criterion: MSELoss,
    parameters: Dict,
) -> Union[np.float32, np.float64]:
    """
    Args:
        engine (ignite.engine.Engine): The engine that is calling this method.
        batch (NamedTuple): The batch that is passed to the engine for validation.
        model (Autoembedder): The model used for validation.
        criterion (torch.nn.MSELoss): The loss function to be used for validation.
        parameters (Dict[str, Any]): The parameters of the validation process.

    Returns:
        Union[np.float32, np.float64]: The loss of the current batch.
    """

    model.eval()
    with torch.no_grad():
        cat, cont = model_input(batch, parameters)
        val_outputs = model(cat, cont)
        val_loss = criterion(val_outputs, model.last_target)
    return val_loss.item()
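
Inside fit, this validation step is hooked onto the trainer by __attach_validation, whose source is not reproduced on this page. A plausible minimal sketch of such a hook, assuming the standard Ignite event API and placeholder names trainer, validator, and test_dataloader:

from ignite.engine import Events

# Sketch only: run the validation engine over the test data at the end of every
# training epoch. The real wiring lives in __attach_validation in
# src/autoembedder/learner.py and may differ in detail.
@trainer.on(Events.EPOCH_COMPLETED)
def run_validation(engine):
    validator.run(test_dataloader)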

fit(parameters, model, train_dataloader, test_dataloader, eval_df=None)

This method is the general wrapper around the fitting process. It prepares the optimizer, the loss function, the trainer, the validator, and the evaluator, then attaches everything to the corresponding engines and runs the training.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| parameters | Dict[str, Any] | The parameters of the training process. In the documentation all possible parameters are listed. | required |
| model | Autoembedder | The model to be trained. | required |
| train_dataloader | torch.utils.data.DataLoader | The dataloader for the training data. | required |
| test_dataloader | torch.utils.data.DataLoader | The dataloader for the test data. | required |
| eval_df | Optional[Union[dd.DataFrame, pd.DataFrame]] | Dask or Pandas DataFrame for the evaluation step. If the path to the evaluation data is given in the parameters (eval_input_path), this argument is not needed. If neither eval_input_path nor eval_df is given, no evaluation step is performed. | None |

Returns:

| Type | Description |
|------|-------------|
| Autoembedder | Trained Autoembedder model. |

Source code in src/autoembedder/learner.py
def fit(
    parameters: Dict,
    model: Autoembedder,
    train_dataloader: DataLoader,
    test_dataloader: DataLoader,
    eval_df: Optional[Union[dd.DataFrame, pd.DataFrame]] = None,
) -> Autoembedder:
    """This method is the general wrapper around the fitting process. It is
    preparing the optimizer, the loss function, the trainer, the validator and
    the evaluator. Then it attaches everything to the corresponding engines and
    runs the training.

    Args:
        parameters (Dict[str, Any]): The parameters of the training process.
            In the [documentation](https://chrislemke.github.io/autoembedder/#parameters) all possible parameters are listed.
        model (Autoembedder): The model to be trained.
        train_dataloader (torch.utils.data.DataLoader): The dataloader for the training data.
        test_dataloader (torch.utils.data.DataLoader): The dataloader for the test data.
        eval_df (Optional[Union[dd.DataFrame, pd.DataFrame]], optional): Dask or Pandas DataFrame for the evaluation step.
            If the path to the evaluation data is given in the parameters (`eval_input_path`) this argument is not needed.
            If neither `eval_input_path` nor `eval_df` is given, no evaluation step is performed.

    Returns:
        Autoembedder: Trained Autoembedder model.
    """

    model = model.to(
        torch.device(
            "cuda"
            if torch.cuda.is_available()
            else "mps"
            if torch.backends.mps.is_available() and parameters.get("use_mps", False)
            else "cpu"
        )
    )
    if (
        torch.backends.mps.is_available() is False
        or parameters.get("use_mps", False) is False
    ):
        model = model.double()

    optimizer = Adam(
        model.parameters(),
        lr=parameters.get("lr", 1e-3),
        weight_decay=parameters.get("weight_decay", 0),
        amsgrad=parameters.get("amsgrad", False),
    )
    criterion = MSELoss()

    if parameters.get("xavier_init", False):
        model.init_xavier_weights()

    tb_logger = None
    if parameters.get("tensorboard_log_path", None):
        tb_logger = TensorboardLogger(
            log_dir=f"{parameters['tensorboard_log_path']}/{date.strftime('%Y.%m.%d-%H_%M')}"
        )

    trainer = Engine(
        partial(
            __training_step,
            model=model,
            optimizer=optimizer,
            criterion=criterion,
            parameters=parameters,
        )
    )
    validator = Engine(
        partial(
            __validation_step, model=model, criterion=criterion, parameters=parameters
        )
    )
    evaluator = Engine(
        partial(loss_delta, model=model, parameters=parameters, df=eval_df)
    )

    if parameters.get("verbose", 0) >= 1:
        __print_summary(model, train_dataloader, parameters)
    __attach_progress_bar(trainer, parameters.get("verbose", False))
    __attach_tb_logger_if_needed(
        trainer, validator, evaluator, tb_logger, model, optimizer, parameters
    )
    __attach_terminate_on_nan(trainer)
    __attach_validation(
        trainer, validator, test_dataloader, parameters.get("verbose", 0) >= 1
    )

    if (
        parameters.get("eval_input_path", None) or eval_df is not None
    ) and parameters.get("target", None):
        __attach_evaluation(
            trainer, evaluator, test_dataloader, parameters.get("verbose", 0) >= 1
        )
    __attach_checkpoint_saving_if_needed(
        trainer, validator, model, optimizer, parameters
    )

    __attach_tb_teardown_if_needed(tb_logger, trainer, validator, evaluator, parameters)

    if parameters.get("load_checkpoint_path", None):
        checkpoint = torch.load(
            parameters["load_checkpoint_path"],
            map_location=torch.device(
                "cuda"
                if torch.cuda.is_available()
                else "mps"
                if torch.backends.mps.is_available()
                and parameters.get("use_mps", False)
                else "cpu"
            ),
        )
        Checkpoint.load_objects(
            to_load={"model": model, "optimizer": optimizer, "trainer": trainer},
            checkpoint=checkpoint,
        )
        print(
            f"""
            Checkpoint loaded!
            Epoch_length: {checkpoint['trainer']['epoch_length']}
            Iterations: {checkpoint['trainer']['iteration']}
            """
        )

    trainer.run(
        train_dataloader,
        max_epochs=parameters["epochs"],
        epoch_length=(
            len(train_dataloader.dataset.ddf.index) // train_dataloader.batch_size
        ),
    )
    return model
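
Putting it together, a call to fit looks roughly like the following. This is a minimal usage sketch, not taken from the package's documentation: the parameters keys shown are the ones referenced in the source above, the model and dataloader construction is only indicated, and fit expects train_dataloader.dataset to expose a Dask DataFrame as .ddf when it computes the epoch length.

# Minimal usage sketch; model and dataloader construction are placeholders and
# depend on how you build your Autoembedder and datasets.
parameters = {
    "epochs": 10,          # required: used as max_epochs for the trainer
    "lr": 1e-3,            # Adam learning rate (default 1e-3)
    "weight_decay": 0,     # Adam weight decay (default 0)
    "amsgrad": False,      # Adam AMSGrad variant (default False)
    "l1_lambda": 0,        # > 0 adds an L1 penalty to the training loss
    "xavier_init": False,  # initialize weights with Xavier initialization
    "verbose": 1,          # >= 1 prints a summary and attaches a progress bar
    # "tensorboard_log_path": "runs",     # enable TensorBoard logging
    # "load_checkpoint_path": "ckpt.pt",  # resume training from a checkpoint
    # "eval_input_path": "eval.parquet",  # with "target", enables the evaluation step
    # "target": "label",
}

model = Autoembedder(...)   # placeholder: your model configuration
train_dataloader = ...      # placeholder: DataLoader whose dataset exposes `.ddf`
test_dataloader = ...       # placeholder: DataLoader for the test data

trained_model = fit(parameters, model, train_dataloader, test_dataloader)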