Data

dataloader(source, parameters=None)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source | Union[str, dask.DataFrame, pandas.DataFrame] | Path to a Dask/Pandas DataFrame stored as Parquet, or a Dask/Pandas DataFrame object. | required |
| parameters | Optional[Dict[str, Any]] | Parameters for the DataLoader. All possible parameters are listed in the documentation. | None |
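
The source listing on this page reads five keys from `parameters` via `parameters.get(...)`, each with its own fallback. The sketch below restates that lookup in plain Python so the effective values are easy to see. `LOADER_DEFAULTS` and `resolve_loader_parameters` are hypothetical names introduced only for illustration and are not part of autoembedder; the default values are copied from the listing.

```python
from typing import Any, Dict, Optional

# Defaults copied from the `parameters.get(...)` calls in the source listing.
# This table is an illustration only; it is not an autoembedder object.
LOADER_DEFAULTS: Dict[str, Any] = {
    "drop_cat_columns": False,
    "batch_size": 32,
    "pin_memory": True,
    "num_workers": 0,
    "drop_last": True,
}


def resolve_loader_parameters(
    parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return the value each key would take in the listing."""
    if parameters is None:
        parameters = {}
    # Keys missing from `parameters` fall back to the default; other keys are ignored here.
    return {key: parameters.get(key, default) for key, default in LOADER_DEFAULTS.items()}


effective = resolve_loader_parameters({"batch_size": 64, "drop_last": False})
print(effective)
```

Passing `None` or an empty dictionary yields the defaults unchanged, which matches the `if parameters is None` guard in the listing.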

Returns:

| Type | Description |
| --- | --- |
| DataLoader | torch.utils.data.DataLoader: A DataLoader object |

Source code in src/autoembedder/data.py
def dataloader(
    source: Union[str, dd.DataFrame, pd.DataFrame],
    parameters: Optional[Dict[str, Any]] = None,
) -> DataLoader:
    """
    Args:
        source (Union[str, dask.DataFrame, pandas.DataFrame]): Path to Dask/Pandas DataFrame stored as Parquet or a Dask/Pandas DataFrame.
        parameters (Optional[Dict[str, Any]]): Parameters for the DataLoader.
            In the [documentation](https://chrislemke.github.io/autoembedder/#parameters) all possible parameters are listed.

    Returns:
        torch.utils.data.DataLoader: A DataLoader object
    """
    if parameters is None:
        parameters = {}

    return DataLoader(
        dataset=_Dataset(source, parameters.get("drop_cat_columns", False)),
        batch_size=parameters.get("batch_size", 32),
        pin_memory=parameters.get("pin_memory", True),
        num_workers=parameters.get("num_workers", 0),
        drop_last=parameters.get("drop_last", True),
    )
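
Because `drop_last` defaults to `True` and `batch_size` to `32` in the listing above, a trailing partial batch is discarded unless `parameters` says otherwise. The following sketch computes how many batches one pass over the data would yield, assuming the internal `_Dataset` behaves like a map-style dataset with a known length (this page does not confirm that). `batches_per_epoch` is a hypothetical helper, not part of autoembedder or PyTorch.

```python
import math


def batches_per_epoch(n_rows: int, batch_size: int = 32, drop_last: bool = True) -> int:
    """Hypothetical helper: number of batches for a dataset with `n_rows` rows."""
    if drop_last:
        # The incomplete final batch is dropped.
        return n_rows // batch_size
    # The incomplete final batch is kept as a smaller batch.
    return math.ceil(n_rows / batch_size)


print(batches_per_epoch(100))                   # defaults from the listing
print(batches_per_epoch(100, drop_last=False))  # keep the partial batch
print(batches_per_epoch(20))                    # fewer rows than one batch
```

Under these assumptions, a source with fewer rows than `batch_size` produces no batches at all with the defaults, so for small datasets it may be worth passing `{"drop_last": False}` or a smaller `batch_size`.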