Utils
check_data(X, y, check_nans=True)
  Checks that the data has the correct types and shapes and does not contain any missing values.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| X | pandas.DataFrame | The features. | required | 
| y | pandas.Series | The target variable. | required | 
| check_nans | bool | Whether to check for missing values. Defaults to True. | True | 
Raises:
| Type | Description | 
|---|---|
| TypeError | If the features are not a  | 
| ValueError | If the features or target variable contain missing values. | 
Returns:
| Type | Description | 
|---|---|
| None | None | 
Source code in src/sk_transformers/utils.py
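
The contract described above can be sketched as follows. This is a hypothetical stand-in written from the tables on this page, not the library's source: the name `check_data_sketch`, the exact type checks, the row-count comparison used for "shapes", and the error messages are all assumptions.

```python
import pandas as pd


def check_data_sketch(X, y, check_nans=True):
    """Hypothetical stand-in for check_data, based only on the documented contract."""
    # Assumption: "correct types" means a DataFrame for X and a Series for y.
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas.DataFrame")
    if not isinstance(y, pd.Series):
        raise TypeError("y must be a pandas.Series")
    # Assumption: "correct shapes" means one target value per feature row.
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    # The missing-value check is optional, as in the documented signature.
    if check_nans and (X.isna().any().any() or y.isna().any()):
        raise ValueError("X or y contains missing values")


X = pd.DataFrame({"a": [1.0, 2.0, None]})
y = pd.Series([0, 1, 0])

# With check_nans=False the missing value in X is ignored.
check_data_sketch(X, y, check_nans=False)

# With the default check_nans=True the same data is rejected.
try:
    check_data_sketch(X, y)
    nan_error = None
except ValueError as exc:
    nan_error = str(exc)
```

Like the documented function, the sketch returns None on success, so it is meant to be called for its side effect of raising on invalid input.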
        
check_ready_to_transform(transformer, X, features, force_all_finite=True, dtype=None, return_polars=False)
  Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| transformer | Any | The transformer that calls this function. It must be a subclass of  | required | 
| X | pandas.DataFrame | | required | 
| features | Optional[Union[str, List[str]]] | The features to check if they are in the dataframe. | required | 
| force_all_finite | Union[bool, str] | Whether to raise an error on np.inf and np.nan in X. The possibilities are: - True: Force all values of array to be finite. - False: accepts np.inf, np.nan, pd.NA in array. - "allow-nan": accepts only np.nan and pd.NA values in array. Values cannot be infinite. | True | 
| dtype | Optional[Union[str, List[str]]] | Data type of result. If None, the  | None | 
Raises:
| Type | Description | 
|---|---|
| TypeError | If the input  | 
| ValueError | If the input  | 
| ValueError | If the input is an empty Pandas dataframe. | 
| ValueError | If the input  | 
| ValueError | If the input  | 
Returns:
| Type | Description | 
|---|---|
| Union[pd.DataFrame, pl.DataFrame] | pandas.DataFrame: A checked copy of the original dataframe. | 
Source code in src/sk_transformers/utils.py
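
The three `force_all_finite` modes listed in the parameter table can be illustrated with a small sketch. It is an assumption-laden illustration rather than the library's implementation: `check_finite_sketch` is a hypothetical name, and raising `ValueError` with these messages is assumed, since the descriptions in the Raises table are truncated.

```python
import numpy as np
import pandas as pd


def check_finite_sketch(X, force_all_finite=True):
    """Hypothetical helper mirroring the three documented force_all_finite modes."""
    if force_all_finite is False:
        # False: np.inf, np.nan and pd.NA are all accepted.
        return
    # Only numeric columns can hold infinite values.
    numeric = X.select_dtypes(include="number")
    has_inf = bool(numeric.isin([np.inf, -np.inf]).any().any())
    # isna() flags np.nan and pd.NA, but not infinities.
    has_nan = bool(X.isna().any().any())
    if has_inf:
        # Infinite values are rejected by both True and "allow-nan".
        raise ValueError("X contains infinite values")
    if has_nan and force_all_finite != "allow-nan":
        # Missing values are rejected only in the strict mode (True).
        raise ValueError("X contains missing values")


X_nan = pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, 3.0]})
X_inf = pd.DataFrame({"a": [1.0, np.inf], "b": [2.0, 3.0]})

check_finite_sketch(X_nan, force_all_finite="allow-nan")  # accepted
check_finite_sketch(X_inf, force_all_finite=False)        # accepted


def is_rejected(X, mode):
    """Return True if the sketch raises ValueError for this input and mode."""
    try:
        check_finite_sketch(X, force_all_finite=mode)
    except ValueError:
        return True
    return False


results = {
    "nan_strict": is_rejected(X_nan, True),
    "inf_allow_nan": is_rejected(X_inf, "allow-nan"),
}
```

In this sketch both entries of `results` are True: missing values fail the strict mode, and infinite values fail `"allow-nan"`.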
        
prepare_categorical_data(X, categories)
  Checks the validity of the categorical features inside the dataframe and
prepares the data for further processing by changing the dtypes.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| X | pandas.DataFrame | The dataframe containing the categorical features. | required | 
| categories | List[Tuple[str, int]] | The list of categorical features and their thresholds. If the number of unique values is greater than the threshold, the feature is not considered categorical. | required | 
Raises:
| Type | Description | 
|---|---|
| TypeError | If the features are not a  | 
| ValueError | If the categorical features are not in the dataframe. | 
Returns:
| Type | Description | 
|---|---|
| pd.DataFrame | pandas.DataFrame: The original dataframe with the categorical features converted to  |
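
The threshold rule for `categories` can be sketched as below. This is a hypothetical illustration, not the library's code: `prepare_categorical_sketch` is an invented name, and because the target dtype is cut off in the Returns table, the pandas `category` dtype used here is an assumption. The sketch also works on a copy, which may differ from the real function.

```python
import pandas as pd


def prepare_categorical_sketch(X, categories):
    """Hypothetical stand-in for the documented threshold rule."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas.DataFrame")
    # Every listed feature has to exist as a column.
    missing = [name for name, _ in categories if name not in X.columns]
    if missing:
        raise ValueError(f"Features not in the dataframe: {missing}")
    X = X.copy()
    for name, threshold in categories:
        # A feature with more unique values than its threshold is not
        # treated as categorical and keeps its dtype.
        if X[name].nunique() <= threshold:
            # Assumption: "category" is the target dtype.
            X[name] = X[name].astype("category")
    return X


X = pd.DataFrame(
    {
        "color": ["red", "blue", "red", "green"],  # 3 unique values
        "user_id": [101, 102, 103, 104],           # 4 unique values
    }
)
prepared = prepare_categorical_sketch(X, [("color", 3), ("user_id", 3)])
```

Here `color` has three unique values and meets its threshold of 3, so it is converted, while `user_id` has four unique values and is left unchanged.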