Generic transformer
AggregateTransformer
Bases: BaseTransformer
This transformer uses the Pandas groupby method and aggregate to apply a function on a column grouped by another column. Read more about the Pandas aggregate method to understand how to use functions for aggregation. Unlike the Pandas method, this transformer only supports functions and string names.
Internally this transformer uses Polars. You may encounter issues with your implementation. Please check the Polars documentation for more information: https://pola-rs.github.io/polars/py-polars/html/reference/
Example:
```python
import pandas as pd
from sk_transformers import AggregateTransformer

X = pd.DataFrame(
    {
        "foo": ["mr", "mr", "ms", "ms", "ms", "mr", "mr", "mr", "mr", "ms"],
        "bar": [46, 32, 78, 48, 93, 68, 53, 38, 76, 56],
    }
)
transformer = AggregateTransformer([("foo", ("bar", "mean", "MEAN(foo_bar)"))])
transformer.fit_transform(X)
```

```
   foo  bar  MEAN(foo_bar)
0   mr   46      52.166668
1   mr   32      52.166668
2   ms   78      68.750000
3   ms   48      68.750000
4   ms   93      68.750000
5   mr   68      52.166668
6   mr   53      52.166668
7   mr   38      52.166668
8   mr   76      52.166668
9   ms   56      68.750000
```
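To make the semantics concrete, the result of the example above can be approximated in plain pandas with groupby(...).transform(...), which computes the aggregate per group and broadcasts it back onto every row. This is only a sketch for intuition, not the transformer's implementation (which runs on Polars internally); the helper add_group_aggregate is hypothetical.

```python
import pandas as pd


def add_group_aggregate(
    X: pd.DataFrame, group_col: str, agg_col: str, func: str, new_col: str
) -> pd.DataFrame:
    """Hypothetical helper: attach a per-group aggregate to every row."""
    X = X.copy()
    # transform() aggregates per group and aligns the result with the original index
    X[new_col] = X.groupby(group_col)[agg_col].transform(func)
    return X


X = pd.DataFrame(
    {
        "foo": ["mr", "mr", "ms", "ms", "ms", "mr", "mr", "mr", "mr", "ms"],
        "bar": [46, 32, 78, 48, 93, 68, 53, 38, 76, 56],
    }
)
result = add_group_aggregate(X, "foo", "bar", "mean", "MEAN(foo_bar)")
print(result)
```

Pandas computes the mean in float64, so the printed values may differ in the last digits from the float32-looking output shown above.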
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, Tuple[str, Union[str, Callable], str]]] | List of tuples where the first element is the name or the list of names of columns to be grouped by, and the second element is a tuple or list of tuples containing the aggregation information. In this tuple, the first element is the name of the column to be aggregated, the second element is the aggregation function as a string or function object, and the third element is the name of the new aggregated column. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Creates new columns by using the Pandas groupby method and aggregate to apply the function to the column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame | Input dataframe. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Transformed dataframe. It contains the original columns and the new columns created by this transformer. |
Source code in src/sk_transformers/generic_transformer.py
AllowedValuesTransformer
Bases: BaseTransformer
Replaces all values that are not in a list of allowed values with a replacement value. This performs a complementary transformation to that of the ValueReplacerTransformer. It is useful for lumping several minor categories together by listing only the major categories to keep.
Example:
```python
import pandas as pd
from sk_transformers.generic_transformer import AllowedValuesTransformer

X = pd.DataFrame({"foo": ["a", "b", "c", "d", "e"]})
transformer = AllowedValuesTransformer([("foo", ["a", "b"], "other")])
transformer.fit_transform(X)
```
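The replacement rule described above (keep allowed values, replace everything else) maps directly onto Series.where combined with Series.isin. The sketch below is a plain-pandas illustration, not the library's code; the helper replace_disallowed is hypothetical.

```python
from typing import Any, List, Tuple

import pandas as pd


def replace_disallowed(
    X: pd.DataFrame, features: List[Tuple[str, List[Any], Any]]
) -> pd.DataFrame:
    """Hypothetical helper mirroring the (column, allowed_values, replacement) tuples."""
    X = X.copy()
    for column, allowed_values, replacement in features:
        # where() keeps values whose mask is True and replaces all others
        X[column] = X[column].where(X[column].isin(allowed_values), replacement)
    return X


X = pd.DataFrame({"foo": ["a", "b", "c", "d", "e"]})
result = replace_disallowed(X, [("foo", ["a", "b"], "other")])
print(result["foo"].tolist())  # ['a', 'b', 'other', 'other', 'other']
```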
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, List[Any], Any]] | List of tuples where the first element is the column name, the second element is the list of allowed values in the column, and the third element is the value that replaces disallowed values in the column. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Replaces values not in a list with another value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame | Dataframe containing the columns with values to be replaced. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Dataframe with replaced values. |
Source code in src/sk_transformers/generic_transformer.py
ColumnDropperTransformer
Bases: BaseTransformer
Drops columns from a dataframe using the Pandas drop method.
Example:
```python
import pandas as pd
from sk_transformers import ColumnDropperTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = ColumnDropperTransformer(["foo"])
transformer.fit_transform(X)
```
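Since the columns parameter accepts either a single name or a list, a plain-pandas sketch of the drop step might normalize the input first. This is an illustration only; the helper drop_columns is hypothetical and not part of the library.

```python
from typing import List, Union

import pandas as pd


def drop_columns(X: pd.DataFrame, columns: Union[str, List[str]]) -> pd.DataFrame:
    """Hypothetical helper: accept one name or a list, like the `columns` parameter."""
    if isinstance(columns, str):
        columns = [columns]
    # DataFrame.drop returns a new frame and leaves the input unchanged
    return X.drop(columns=columns)


X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
print(drop_columns(X, "foo").columns.tolist())  # ['bar']
```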
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| columns | Union[str, List[str]] | Columns to drop. Either a single column name or a list of column names. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Returns the dataframe with the columns dropped.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame | Dataframe to drop columns from. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Dataframe with columns dropped. |
Source code in src/sk_transformers/generic_transformer.py
ColumnEvalTransformer
Bases: BaseTransformer
Provides the possibility to use Pandas methods on columns. Internally this transformer uses Polars. You may encounter issues with your implementation. Please check the Polars documentation for more information: https://pola-rs.github.io/polars/py-polars/html/reference/
Example:
```python
import pandas as pd
from sk_transformers import ColumnEvalTransformer

# In this example we use the Polars equivalent of Pandas `str.upper()`: `str.to_uppercase()`.
X = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3]})
transformer = ColumnEvalTransformer(
    [("foo", "str.to_uppercase()"), ("bar", "apply(lambda x: x + 1)")]
)
transformer.fit_transform(X)
```
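The idea of passing a method as a string can be illustrated in plain pandas by evaluating the string against the column. The sketch below uses Python's built-in eval with Pandas method names (str.upper() rather than the Polars str.to_uppercase() used above), only handles two-element tuples, and is not the library's implementation; the helper eval_on_columns is hypothetical. Because eval executes arbitrary code, only pass trusted strings.

```python
from typing import List, Tuple

import pandas as pd


def eval_on_columns(X: pd.DataFrame, features: List[Tuple[str, str]]) -> pd.DataFrame:
    """Hypothetical helper: evaluate `series.<method>` for each (column, method) pair."""
    X = X.copy()
    for column, method in features:
        # Only the column is exposed to eval under the name `series`
        X[column] = eval(f"series.{method}", {"series": X[column]})
    return X


X = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3]})
result = eval_on_columns(X, [("foo", "str.upper()"), ("bar", "apply(lambda x: x + 1)")])
print(result["foo"].tolist(), result["bar"].tolist())  # ['A', 'B', 'C'] [2, 3, 4]
```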
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Union[Tuple[str, str], Tuple[str, str, str]]] | List of tuples containing the column name and the method ( | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the |
| ValueError | If the |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Transforms the dataframe by applying the provided eval functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pandas.DataFrame | Dataframe to transform. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Transformed dataframe. |
Source code in src/sk_transformers/generic_transformer.py
DtypeTransformer
Bases: BaseTransformer
Transformer that converts a column to a different dtype.
Example:
```python
import numpy as np
import pandas as pd
from sk_transformers import DtypeTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": ["a", "a", "b"]})
transformer = DtypeTransformer([("foo", np.float32), ("bar", "category")])
transformer.fit_transform(X).dtypes
```
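A plain-pandas sketch of the same conversion, assuming the transformer behaves like DataFrame.astype with a column-to-dtype mapping (an assumption; the description above only states that columns are converted). The (column, dtype) pairs convert directly into such a mapping.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"foo": [1, 2, 3], "bar": ["a", "a", "b"]})
features = [("foo", np.float32), ("bar", "category")]

# DataFrame.astype accepts a {column: dtype} mapping; dict() builds it from the pairs
converted = X.astype(dict(features))
print(converted.dtypes)
```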
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, Union[str, type]]] | List of tuples containing the column name and the dtype ( | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Transform the dataframe by converting the columns to the specified dtypes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pandas.DataFrame | Dataframe to transform. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Transformed dataframe. |
Source code in src/sk_transformers/generic_transformer.py
FunctionsTransformer
Bases: BaseTransformer
This transformer is a plain wrapper around sklearn.preprocessing.FunctionTransformer. Its main purpose is to apply multiple functions to different columns. Unlike the scikit-learn transformer, this transformer does not support the inverse_func, accept_sparse, feature_names_out, and inv_kw_args parameters.
Example:
```python
import numpy as np
import pandas as pd
from sk_transformers import FunctionsTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = FunctionsTransformer([("foo", np.log1p, None), ("bar", np.sqrt, None)])
transformer.fit_transform(X)
```
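Conceptually, each (column, function, kw_args) tuple calls the function on one column and passes the optional dictionary as keyword arguments, much as sklearn.preprocessing.FunctionTransformer does with its func and kw_args parameters. The loop below is a hypothetical plain-pandas sketch of that idea, not the library's code; apply_functions and scale are illustrative names.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

import numpy as np
import pandas as pd


def apply_functions(
    X: pd.DataFrame,
    features: List[Tuple[str, Callable, Optional[Dict[str, Any]]]],
) -> pd.DataFrame:
    """Hypothetical helper: call each function on its column with optional kwargs."""
    X = X.copy()
    for column, func, kw_args in features:
        # None means "no extra keyword arguments"
        X[column] = func(X[column], **(kw_args or {}))
    return X


def scale(values: pd.Series, factor: float = 1.0) -> pd.Series:
    """Hypothetical example function that takes a keyword argument."""
    return values * factor


X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
result = apply_functions(X, [("foo", np.log1p, None), ("bar", scale, {"factor": 10})])
print(result)
```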
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, Callable, Optional[Dict[str, Any]]]] | List of tuples containing the name of the column to apply the function on and the function itself, as well as a dictionary passed to the function as | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Applies the functions to the columns, and returns the dataframe with the modified columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pandas.DataFrame | DataFrame containing the columns to apply the functions on. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | The original dataframe with the modified columns. |
Source code in src/sk_transformers/generic_transformer.py
LeftJoinTransformer
Bases: BaseTransformer
Performs a database-style left join using pd.merge. This transformer is suitable for replacing values in a column of a dataframe by looking them up in another pd.DataFrame or pd.Series. Note that the join is based on the index of the right dataframe.
Example:
```python
import pandas as pd
from sk_transformers import LeftJoinTransformer

X = pd.DataFrame({"foo": ["A", "B", "C", "A", "C"]})
lookup_df = pd.Series([1, 2, 3], index=["A", "B", "C"], name="values")
transformer = LeftJoinTransformer([("foo", lookup_df)])
transformer.fit_transform(X)
```
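The core of such a look-up join in plain pandas is a single merge call that matches the left column against the right index. This sketch only illustrates those join semantics; how the transformer names or places the joined columns is not specified here, so the column layout below should not be assumed to match its output. Keys without a match in the look-up index would receive NaN.

```python
import pandas as pd

X = pd.DataFrame({"foo": ["A", "B", "C", "A", "C"]})
lookup = pd.Series([1, 2, 3], index=["A", "B", "C"], name="values")

# Match the values of X["foo"] against the *index* of the named look-up series;
# how="left" keeps every row of X in its original order
joined = X.merge(lookup, how="left", left_on="foo", right_index=True)
print(joined)
```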
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, Union[pd.Series, pd.DataFrame]]] | A list of tuples where the first element is the name of the column and the second element is the look-up dataframe or series. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Performs a left join on the given columns of a dataframe with the corresponding look-up dataframe or series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame | Dataframe containing the columns to be joined on. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Dataframe joined on the given columns. |
Source code in src/sk_transformers/generic_transformer.py
MapTransformer
Bases: BaseTransformer
This transformer iterates over all columns in the features list and applies the given callback to each column.
Internally this transformer uses Polars. You may encounter issues with your implementation. Please check the Polars documentation for more information: https://pola-rs.github.io/polars/py-polars/html/reference/
Example:
```python
import pandas as pd
from sk_transformers import MapTransformer

X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
transformer = MapTransformer([("foo", lambda x: x + 1)])
transformer.fit_transform(X)
```
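A plain-pandas sketch of the loop described above, under two assumptions not confirmed by the description: the callback is applied element-wise, and the result overwrites the original column. The helper map_columns is hypothetical.

```python
from typing import Any, Callable, List, Tuple

import pandas as pd


def map_columns(
    X: pd.DataFrame, features: List[Tuple[str, Callable[[Any], Any]]]
) -> pd.DataFrame:
    """Hypothetical helper: apply each callback element-wise to its column."""
    X = X.copy()
    for column, callback in features:
        # Series.map calls the callback once per value (assumed element-wise semantics)
        X[column] = X[column].map(callback)
    return X


X = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
result = map_columns(X, [("foo", lambda x: x + 1)])
print(result["foo"].tolist(), result["bar"].tolist())  # [2, 3, 4] [4, 5, 6]
```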
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, Callable]] | List of tuples containing the name of the column to apply the callback on and the callback itself. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Applies the callback to the column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pandas.DataFrame | Dataframe containing the columns to apply the callback on. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | The dataframe containing the new column together with the non-transformed original columns. |
Source code in src/sk_transformers/generic_transformer.py
NaNTransformer
Bases: BaseTransformer
Replaces NaN values with a specified value. Internally, the Pandas fillna method is used.
Example:
```python
import numpy as np
import pandas as pd
from sk_transformers import NaNTransformer

# np.nan (lowercase) works on all NumPy versions; the np.NaN alias was removed in NumPy 2.0
X = pd.DataFrame({"foo": [1, np.nan, 3], "bar": ["a", np.nan, "c"]})
transformer = NaNTransformer([("foo", -999), ("bar", "-999")])
transformer.fit_transform(X)
```
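The fill step and the TypeError described in the Raises table can be sketched together in plain pandas: check that the replacement value and the column are both numeric or both non-numeric, then call fillna. The exact check the library performs is not shown here, so treat the numeric test below as an assumption; fill_nans is a hypothetical helper.

```python
from typing import Any, List, Tuple

import numpy as np
import pandas as pd


def fill_nans(X: pd.DataFrame, features: List[Tuple[str, Any]]) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs per column after a numeric/non-numeric check."""
    X = X.copy()
    for column, value in features:
        column_is_numeric = pd.api.types.is_numeric_dtype(X[column])
        # bool is a subclass of int, so it is excluded explicitly
        value_is_numeric = isinstance(value, (int, float, np.number)) and not isinstance(value, bool)
        if column_is_numeric != value_is_numeric:
            raise TypeError(f"Cannot fill column {column!r} ({X[column].dtype}) with {value!r}")
        X[column] = X[column].fillna(value)
    return X


X = pd.DataFrame({"foo": [1, np.nan, 3], "bar": ["a", np.nan, "c"]})
result = fill_nans(X, [("foo", -999), ("bar", "-999")])
print(result)
```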
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, Any]] | List of tuples where the first element is the column name, and the second is the value to replace NaN with. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | If the replacement value is not a number but the column is numeric, or vice versa. |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Replace NaN values with a specified value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pandas.DataFrame | Dataframe to transform. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Transformed dataframe. |
Source code in src/sk_transformers/generic_transformer.py
QueryTransformer
Bases: BaseTransformer
Applies a list of queries to a dataframe. If it operates on a dataset used for supervised learning, this transformer should be applied to the dataframe containing both X and y, so that rows removed by the queries also remove the corresponding y values.
Example:
```python
import pandas as pd
from sk_transformers import QueryTransformer

X = pd.DataFrame({"foo": [1, 8, 3, 6, 5, 4, 7, 2]})
transformer = QueryTransformer(["foo > 4"])
transformer.fit_transform(X)
```
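The row filtering, and why X and y should travel together, can be sketched with DataFrame.query. Keeping the target in the same frame (here a hypothetical column named y) means every row a query drops takes its label with it; the frame is split into features and target only afterwards. The helper apply_queries is hypothetical, and applying the queries one after another is an assumption about how the list is evaluated.

```python
from typing import List

import pandas as pd


def apply_queries(Xy: pd.DataFrame, queries: List[str]) -> pd.DataFrame:
    """Hypothetical helper: apply each query in turn, keeping only matching rows."""
    for query in queries:
        Xy = Xy.query(query)
    return Xy


# Features and target share one frame, so filtering keeps them aligned
Xy = pd.DataFrame({"foo": [1, 8, 3, 6, 5, 4, 7, 2], "y": [0, 1, 0, 0, 1, 0, 1, 0]})
filtered = apply_queries(Xy, ["foo > 4"])
X, y = filtered.drop(columns="y"), filtered["y"]
print(X["foo"].tolist(), y.tolist())  # [8, 6, 5, 7] [1, 0, 1, 1]
```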
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| queries | List[str] | List of query strings to evaluate on the dataframe. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(Xy)
Applies the list of queries to the dataframe.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| Xy | pd.DataFrame | Dataframe to apply the queries to. For also operating on the target column | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Dataframe with the queries applied. |
Source code in src/sk_transformers/generic_transformer.py
ValueIndicatorTransformer
Bases: BaseTransformer
Adds a column to a dataframe indicating whether a value is equal to a specified value. The idea behind this method is that it is often useful to know whether a NaN value was present in the original data and has been changed by some imputation step. Sometimes the presence of a NaN value is itself important information. However, this method works with any kind of data. Note that NaN, None, and np.nan themselves are not caught by this implementation; check for the imputed value instead.
Example:
```python
import pandas as pd
from sk_transformers import ValueIndicatorTransformer

X = pd.DataFrame({"foo": [1, -999, 3], "bar": ["a", "-999", "c"]})
transformer = ValueIndicatorTransformer([("foo", -999), ("bar", "-999")])
transformer.fit_transform(X)
```

```
    foo   bar  foo_found_indicator  bar_found_indicator
0     1     a                False                False
1  -999  -999                 True                 True
2     3     c                False                False
```
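The indicator logic amounts to an equality comparison per column, with new columns named as in the output above. This plain-pandas sketch (add_value_indicators is a hypothetical helper, not the library's code) also shows why NaN cannot be searched for this way: NaN never compares equal to itself.

```python
from typing import Any, List, Tuple

import pandas as pd


def add_value_indicators(X: pd.DataFrame, features: List[Tuple[str, Any]]) -> pd.DataFrame:
    """Hypothetical helper: add a boolean `<column>_found_indicator` per feature."""
    X = X.copy()
    for column, value in features:
        # Plain equality: NaN == NaN is False, so NaN values are never flagged
        X[f"{column}_found_indicator"] = X[column] == value
    return X


X = pd.DataFrame({"foo": [1, -999, 3], "bar": ["a", "-999", "c"]})
result = add_value_indicators(X, [("foo", -999), ("bar", "-999")])
print(result)

# Searching for NaN itself finds nothing
with_nan = pd.DataFrame({"x": [float("nan"), 1.0]})
print(add_value_indicators(with_nan, [("x", float("nan"))])["x_found_indicator"].tolist())  # [False, False]
```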
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[str, Any]] | A list of tuples where the first value represents the column name and the second value represents the value to check for. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Add a column to a dataframe indicating if a value is equal to a specified value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pandas.DataFrame | Dataframe to transform. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Transformed dataframe containing columns indicating if a certain value was found. Format of the new columns: |
Source code in src/sk_transformers/generic_transformer.py
ValueReplacerTransformer
Bases: BaseTransformer
Uses the Pandas replace method to replace values in a column. This transformer loops over the features and applies replace to the corresponding columns. If the column is not of type string but a valid regular expression is provided, the column is temporarily converted to a string column and, after the manipulation by replace, converted back to its original type. This conversion back may fail if the modified column is no longer compatible with its original type.
Example:
```python
import pandas as pd
from sk_transformers import ValueReplacerTransformer

X = pd.DataFrame(
    {"foo": ["0000-01-01", "2022/01/08", "bar", "1982-12-7", "28-09-2022"]}
)
# The regex matches every value that is NOT a 19xx/20xx year-month-day date
transformer = ValueReplacerTransformer(
    [
        (
            ["foo"],
            r"^(?!(19|20)\d\d[-\/.](0[1-9]|1[012]|[1-9])[-\/.](0[1-9]|[12][0-9]|3[01]|[1-9])$).*",
            "1900-01-01",
        )
    ]
)
transformer.fit_transform(X)
```
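The temporary string conversion described above can be sketched for a single column in plain pandas. This assumes the round trip uses astype(str) followed by a cast back to the original dtype; the helper replace_with_regex is hypothetical. The second call shows the failure mode mentioned above: a replacement that no longer fits the original integer dtype makes the cast back raise.

```python
import pandas as pd


def replace_with_regex(series: pd.Series, pattern: str, replacement: str) -> pd.Series:
    """Hypothetical helper: regex-replace on a string copy, then restore the dtype."""
    original_dtype = series.dtype
    # Regex replacement only applies to string values, so cast temporarily
    as_strings = series.astype(str)
    replaced = as_strings.replace(to_replace=pattern, value=replacement, regex=True)
    # Casting back can fail if a replacement no longer fits the original dtype
    return replaced.astype(original_dtype)


numbers = pd.Series([1, 22, 333])
print(replace_with_regex(numbers, r"^\d{3}$", "0").tolist())  # [1, 22, 0]

try:
    replace_with_regex(numbers, r"^\d{3}$", "n/a")
except ValueError as error:
    print(f"Cast back failed: {error}")
```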
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| features | List[Tuple[List[str], str, Any]] | List of tuples containing the column names as a list, the value to replace (can be a regex), and the replacement value. | required |
Source code in src/sk_transformers/generic_transformer.py
transform(X)
Replaces a value or regular expression with another value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | pd.DataFrame | Dataframe containing the columns where values should be replaced. | required |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Dataframe with replaced values. |