Utils
check_data(X, y, check_nans=True)
Checks that the data has the correct types and shapes and does not contain any missing values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pandas.DataFrame | The features. | required |
y | pandas.Series | The target variable. | required |
check_nans | bool | Whether to check for missing values. Defaults to True. | True |
Raises:
Type | Description |
---|---|
TypeError | If the features are not a |
ValueError | If the features or target variable contain missing values. |
Returns:
Type | Description |
---|---|
None | None |
Source code in src/sk_transformers/utils.py
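The checks described above can be sketched in plain pandas. This is a minimal illustration, not the library's source: `check_data_sketch` is a hypothetical name, and raising `TypeError` for a non-Series `y` and `ValueError` for mismatched row counts are assumptions, since the reference does not state which exception those cases raise.

```python
import numpy as np
import pandas as pd


def check_data_sketch(X, y, check_nans=True):
    """Hypothetical stand-in for the documented checks (not the library code)."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas.DataFrame")
    # Assumption: a wrong target type is also reported as a TypeError.
    if not isinstance(y, pd.Series):
        raise TypeError("y must be a pandas.Series")
    # Assumption: "correct shapes" means X and y have the same number of rows.
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of rows")
    if check_nans and (X.isnull().to_numpy().any() or y.isnull().any()):
        raise ValueError("X or y contains missing values")


X_demo = pd.DataFrame({"a": [1.0, 2.0, np.nan]})
y_demo = pd.Series([0, 1, 0])

# With check_nans=False the missing value in X_demo is tolerated.
check_data_sketch(X_demo, y_demo, check_nans=False)

# With the default check_nans=True the same data is rejected.
try:
    check_data_sketch(X_demo, y_demo)
    nan_error = None
except ValueError as exc:
    nan_error = str(exc)
```

Like the documented function, the sketch returns `None` on success and signals problems only through exceptions.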
check_ready_to_transform(transformer, X, features, force_all_finite=True, dtype=None, return_polars=False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transformer | Any | The transformer that calls this function. It must be a subclass of | required |
X | pandas.DataFrame | | required |
features | Optional[Union[str, List[str]]] | The features to check if they are in the dataframe. | required |
force_all_finite | Union[bool, str] | Whether to raise an error on np.inf and np.nan in X. The possibilities are: True: Force all values of array to be finite; False: accepts np.inf, np.nan, pd.NA in array; "allow-nan": accepts only np.nan and pd.NA values in array. Values cannot be infinite. | True |
dtype | Optional[Union[str, List[str]]] | Data type of result. If None, the | None |
Raises:
Type | Description |
---|---|
TypeError | If the input |
ValueError | If the input |
ValueError | If the input is an empty Pandas dataframe. |
ValueError | If the input |
ValueError | If the input |
Returns:
Type | Description |
---|---|
Union[pd.DataFrame, pl.DataFrame] | pandas.DataFrame: A checked copy of the original dataframe. |
Source code in src/sk_transformers/utils.py
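The three `force_all_finite` modes listed in the parameter table can be illustrated with a small NumPy check. This is only a sketch under stated assumptions: `passes_finite_check` is a hypothetical helper, it handles float columns only, and it returns a boolean instead of raising, so it shows the acceptance rule rather than the function's actual behaviour.

```python
import numpy as np
import pandas as pd


def passes_finite_check(X, force_all_finite=True):
    """Hypothetical helper: True if X would be accepted under the given mode."""
    values = X.to_numpy(dtype=float)
    if force_all_finite is True:
        # Neither NaN nor infinity is allowed.
        return bool(np.isfinite(values).all())
    if force_all_finite == "allow-nan":
        # NaN is tolerated, infinity is not.
        return not bool(np.isinf(values).any())
    if force_all_finite is False:
        # Everything is accepted.
        return True
    raise ValueError("force_all_finite must be True, False or 'allow-nan'")


with_nan = pd.DataFrame({"a": [1.0, np.nan]})
with_inf = pd.DataFrame({"a": [1.0, np.inf]})

# For each mode: (accepts the NaN frame, accepts the inf frame)
results = {
    mode: (passes_finite_check(with_nan, mode), passes_finite_check(with_inf, mode))
    for mode in (True, False, "allow-nan")
}
# results[True] == (False, False)
# results[False] == (True, True)
# results["allow-nan"] == (True, False)
```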
prepare_categorical_data(X, categories)
Checks the validity of the categorical features inside the dataframe and prepares the data for further processing by changing the dtypes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | pandas.DataFrame | The dataframe containing the categorical features. | required |
categories | List[Tuple[str, int]] | The list of categorical features and their thresholds. If the number of unique values is greater than the threshold, the feature is not considered categorical. | required |
Raises:
Type | Description |
---|---|
TypeError | If the features are not a |
ValueError | If the categorical features are not in the dataframe. |
Returns:
Type | Description |
---|---|
pd.DataFrame | pandas.DataFrame: The original dataframe with the categorical features converted to |
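The threshold rule for `categories` can be sketched as follows. This is an illustration, not the library's implementation: `to_categorical_sketch` is a hypothetical name, the target dtype `"category"` is an assumption (the reference does not show which dtype is used), and working on a copy instead of the input frame is a choice made for the example.

```python
import pandas as pd


def to_categorical_sketch(X, categories):
    """Hypothetical sketch of the documented threshold rule."""
    missing = [name for name, _ in categories if name not in X.columns]
    if missing:
        raise ValueError(f"Features not in the dataframe: {missing}")
    X = X.copy()  # assumption: leave the caller's frame untouched
    for name, threshold in categories:
        # More unique values than the threshold: not treated as categorical.
        if X[name].nunique() <= threshold:
            X[name] = X[name].astype("category")  # assumed target dtype
    return X


df = pd.DataFrame(
    {
        "color": ["red", "blue", "red", "blue"],  # 2 unique values
        "city": ["A", "B", "C", "D"],  # 4 unique values
    }
)
out = to_categorical_sketch(df, [("color", 3), ("city", 3)])
```

Here `color` has two unique values, which is within its threshold of 3, so it is converted. `city` has four, which exceeds the threshold, so it keeps its original dtype.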