How to contribute

First of all, thank you 🙏 for your interest in contributing to this project. This project is open source and welcomes contributions from the community. This document describes how you can contribute.

Check out this dummy example of how to create a custom transformer ready for use in a pipeline:

Example of a custom transformer

Your custom transformer could look something like this:

import pandas as pd
from sk_transformers.base_transformer import BaseTransformer
from sk_transformers.utils import check_ready_to_transform


class DummyTransformer(BaseTransformer):
    """
    Replaces a given string in a given column with `DUMMY!`.

    Args:
        string_to_replace (str): The string which should be replaced by `DUMMY!`.
        column (str): The column in which the string should be replaced.

    Example:
    ```python
    import pandas as pd
    from sklearn.pipeline import Pipeline

    df = pd.DataFrame({
        "cocktail": ["French Connection", "Incredible Hulk", "Tom and Jerry"],
        "bar": ["foo", "Schikaneder", "Futuregarden"]
    })
    transformer = DummyTransformer("foo", "bar")
    transformer.fit_transform(df)
    ```
    ```
                cocktail           bar
    0  French Connection        DUMMY!
    1    Incredible Hulk   Schikaneder
    2      Tom and Jerry  Futuregarden
    ```
    """

    def __init__(self, string_to_replace: str, column: str) -> None:
        super().__init__()
        self.string_to_replace = string_to_replace
        self.column = column

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """
        Replaces all occurrences of `string_to_replace` in a certain column of X with `DUMMY!`.

        Args:
            X (pd.DataFrame): Dataframe containing the column where the replacement should happen.

        Returns:
            pandas.DataFrame: Dataframe with replaced strings.
        """
        X = check_ready_to_transform(self, X, self.column)
        X[self.column] = X[self.column].replace(self.string_to_replace, "DUMMY!")
        return X

More documentation than code. You know how it is 🤷‍♂️. This transformer does not need its own fit method because it does not learn anything from the data; it just replaces one string with another.

Now you can use it:

import pandas as pd
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "cocktail": ["French Connection", "Incredible Hulk", "Tom and Jerry"],
    "bar": ["foo", "Schikaneder", "Futuregarden"]
})

pipeline = Pipeline([
    ("dummy_transformer", DummyTransformer("foo", "bar")),
])

pipeline.fit_transform(df).head()

            cocktail           bar
0  French Connection        DUMMY!
1    Incredible Hulk   Schikaneder
2      Tom and Jerry  Futuregarden
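To see why no fit method is needed, here is a dependency-free sketch of the same idea. It assumes nothing about the sk-transformers internals: StatelessTransformer and MiniDummyTransformer are hypothetical names, and a list of row dicts stands in for the DataFrame. The base class supplies a fit that only returns self, and fit_transform chains fit and transform the way scikit-learn's TransformerMixin does, so the subclass only has to implement transform. As with pandas' Series.replace called with a scalar, only cells equal to the search string are replaced, not substrings.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for a DataFrame: a list of rows, each a dict.
Rows = List[Dict[str, Any]]


class StatelessTransformer:
    """Hypothetical base class for transformers that learn nothing in fit()."""

    def fit(self, X: Rows, y: Optional[Any] = None) -> "StatelessTransformer":
        # Nothing to learn; returning self allows fit(X).transform(X) chaining.
        return self

    def transform(self, X: Rows) -> Rows:
        raise NotImplementedError

    def fit_transform(self, X: Rows, y: Optional[Any] = None) -> Rows:
        # Mirrors scikit-learn's TransformerMixin: fit first, then transform.
        return self.fit(X, y).transform(X)


class MiniDummyTransformer(StatelessTransformer):
    """Replaces cells in `column` that equal `string_to_replace` with "DUMMY!"."""

    def __init__(self, string_to_replace: str, column: str) -> None:
        self.string_to_replace = string_to_replace
        self.column = column

    def transform(self, X: Rows) -> Rows:
        result = []
        for row in X:
            new_row = dict(row)  # shallow copy, so the caller's rows stay unchanged
            # Whole-cell comparison, not substring replacement.
            if new_row[self.column] == self.string_to_replace:
                new_row[self.column] = "DUMMY!"
            result.append(new_row)
        return result


rows = [
    {"cocktail": "French Connection", "bar": "foo"},
    {"cocktail": "Incredible Hulk", "bar": "Schikaneder"},
    {"cocktail": "Tom and Jerry", "bar": "Futuregarden"},
]
print([row["bar"] for row in MiniDummyTransformer("foo", "bar").fit_transform(rows)])
# ['DUMMY!', 'Schikaneder', 'Futuregarden']
```

A real transformer for this project should still subclass BaseTransformer as shown above; the sketch only illustrates the contract a pipeline relies on.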

For non-dummy examples, check out the MathExpressionTransformer, or the ValueIndicatorTransformer for a simpler one.

Poetry

We use Poetry to manage dependencies, the virtual environment, and deployment. If you have not used it before, please check out its documentation to get started.

If you want to start working on the project, the first thing you have to do is run:

poetry install --with test

This installs all needed dependencies for development and testing.

Pre-commit hooks

We use pre-commit to ensure a consistent code style and to avoid common mistakes. Please install pre-commit and set up the hooks with:

pre-commit install
pre-commit install --hook-type commit-msg

Homebrew

We use Homebrew to manage the dependencies of the development environment. Please install Homebrew and run

brew bundle

to install them. If you can't or don't want to use Homebrew, you can also install the dependencies listed in the Brewfile manually.
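If you prefer a single entry point, the setup commands from the Poetry, pre-commit, and Homebrew sections can be collected in a small script. This is a hypothetical helper, not part of the project: it keeps the commands in the order the sections present them, assumes each tool is on your PATH (if pre-commit only lives inside the Poetry environment, you may need to prefix it with poetry run), and only prints the commands unless dry_run is turned off. If your Brewfile provides Poetry or pre-commit, you would run brew bundle first.

```python
import shutil
import subprocess
from typing import List

# The setup commands from this guide, in the order the sections present them.
SETUP_COMMANDS: List[List[str]] = [
    ["poetry", "install", "--with", "test"],
    ["pre-commit", "install"],
    ["pre-commit", "install", "--hook-type", "commit-msg"],
    ["brew", "bundle"],
]


def bootstrap(dry_run: bool = True) -> List[List[str]]:
    """Print each setup command and, unless dry_run is set, execute it.

    Commands whose executable is not on PATH (e.g. `brew` on a machine
    without Homebrew) are skipped. Returns the commands that were planned.
    """
    planned: List[List[str]] = []
    for cmd in SETUP_COMMANDS:
        if shutil.which(cmd[0]) is None:
            print(f"skipping, {cmd[0]} not found: {' '.join(cmd)}")
            continue
        planned.append(cmd)
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(cmd, check=True)
    return planned


if __name__ == "__main__":
    bootstrap(dry_run=True)  # flip to False to actually run the commands
```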

Conventional Commits

We use Conventional Commits to ensure a consistent commit message style. Please use the following commit message format:

<type>[optional scope]: <description>

E.g.:

feat: add a new fantastic transformer 🤖
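The header format can be checked mechanically. The sketch below validates only the first line of a message with a regular expression; it is not the hook installed by pre-commit install --hook-type commit-msg, whose exact rules are not described here. feat and fix come from the Conventional Commits specification; the other types are the Angular-style set the specification mentions as examples, so treat the TYPES list as an assumption. The optional ! marks a breaking change, as the specification allows. This sketch only accepts lowercase types.

```python
import re

# feat/fix are defined by the spec; the rest are an assumed, commonly used set.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")

HEADER_RE = re.compile(
    rf"^(?P<type>{'|'.join(TYPES)})"  # <type>
    r"(?:\((?P<scope>[^()\s]+)\))?"   # [optional scope]
    r"(?P<breaking>!)?"               # optional "!" marking a breaking change
    r": (?P<description>\S.*)$"       # ": <description>"
)


def is_conventional(message: str) -> bool:
    """Check only the header (first line) of a commit message."""
    header = message.splitlines()[0] if message else ""
    return HEADER_RE.match(header) is not None


print(is_conventional("feat: add a new fantastic transformer 🤖"))  # True
print(is_conventional("added a new transformer"))  # False
```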

How to contribute

The following steps give a short guide to contributing to this project:

1. Create a personal fork of the project on GitHub.
2. Clone the fork to your local machine. Your remote repository on GitHub is called origin.
3. Add the original repository as a remote called upstream.
4. If you created your fork a while ago, be sure to pull upstream changes into your local repository.
5. Create a new branch to work on! Start from develop if it exists, otherwise from main.
6. Implement your feature or fix, comment your code, and add some examples.
7. Follow the code style of the project, including indentation. Black, isort, Pylint, and mypy can help you with this.
8. Run all tests.
9. Write or adapt tests as needed.
10. Add or change the documentation as needed. Please follow the "Google Python Style Guide".
11. Squash your commits into a single commit with git's interactive rebase. Create a new branch if necessary.
12. Push your branch to your fork on GitHub, the remote origin.
13. From your fork, open a pull request against the correct branch: target the project's develop branch!
14. Once the pull request is approved and merged, you can pull the changes from upstream to your local repository and delete your extra branch(es).
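Most of the git-related steps map to a handful of commands. The sketch below only prints them by default (dry run); it assumes you run it inside your clone, that the base branch is develop, and that git 2.23 or newer is available for git switch. The upstream URL and branch name in the usage line are hypothetical placeholders. Forking, coding, testing, documenting, and opening the pull request happen outside these commands.

```python
import subprocess
from typing import List


def contribution_commands(upstream_url: str, branch: str, base: str = "develop") -> List[List[str]]:
    """Return the git commands behind the fork-and-branch workflow, in order.

    Pass base="main" if the project has no develop branch.
    """
    return [
        # Register the original repository as "upstream" (once per clone).
        ["git", "remote", "add", "upstream", upstream_url],
        # Fetch the latest upstream changes.
        ["git", "fetch", "upstream"],
        # Start a new branch from the up-to-date base branch.
        ["git", "switch", "-c", branch, f"upstream/{base}"],
        # ...implement, test, document, and commit here...
        # Interactively squash your commits into one.
        ["git", "rebase", "-i", f"upstream/{base}"],
        # Push the branch to your fork (origin) and track it.
        ["git", "push", "-u", "origin", branch],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the real upstream URL and your branch name.
    run(contribution_commands("<upstream-repo-url>", "feat/my-transformer"))
```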