How to contribute

First of all, thank you 🙏 for your interest in contributing to this project. This project is open source and is therefore open to contributions from the community. The following document describes how you can contribute to it.

The dummy example below shows how to create a custom transformer that is ready for use in a pipeline.

Example of a custom transformer

Your custom transformer could look something like this:

import pandas as pd

from sk_transformers.base_transformer import BaseTransformer
from sk_transformers.utils import check_ready_to_transform

class DummyTransformer(BaseTransformer):
    """
    Replaces every value equal to a given string in a column with `DUMMY!`.

    Args:
        string_to_replace (str): The string which should be replaced by `DUMMY!`.
        column (str): The column in which the strings are replaced.

    Example:
    ```python
    import pandas as pd
    from sklearn.pipeline import Pipeline

    df = pd.DataFrame({
        "cocktail": ["French Connection", "Incredible Hulk", "Tom and Jerry"],
        "bar": ["foo", "Schikaneder", "Futuregarden"]
    })

    transformer = DummyTransformer("foo", "bar")
    transformer.fit_transform(df)
    ```
    ```
                cocktail           bar
    0  French Connection        DUMMY!
    1    Incredible Hulk   Schikaneder
    2      Tom and Jerry  Futuregarden
    ```
    """
    def __init__(self, string_to_replace: str, column: str) -> None:
        super().__init__()
        self.string_to_replace = string_to_replace
        self.column = column

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """
        Replaces all occurrences of `string_to_replace`
        in a certain column of X with `DUMMY!`.

        Args:
            X (pd.DataFrame): Dataframe containing the
                column where the replacement should happen.

        Returns:
            pandas.DataFrame: Dataframe with replaced strings.
        """
        X = check_ready_to_transform(self, X, self.column)

        X[self.column] = X[self.column].replace(self.string_to_replace, "DUMMY!")
        return X
More documentation than code. You know how it is 🤷‍♂️. This transformer does not need a fit method because it does not learn anything from the data. It just replaces one string with another.

Now you can use it:

import pandas as pd
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "cocktail": ["French Connection", "Incredible Hulk", "Tom and Jerry"],
    "bar": ["foo", "Schikaneder", "Futuregarden"]
})

pipeline = Pipeline([
    ("dummy_transformer", DummyTransformer("foo", "bar")),
])

pipeline.fit_transform(df).head()
            cocktail           bar
0  French Connection        DUMMY!
1    Incredible Hulk   Schikaneder
2      Tom and Jerry  Futuregarden
For non-dummy examples, check out the MathExpressionTransformer, or the ValueIndicatorTransformer for a simpler one.
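The DummyTransformer above needs no fit method because it is stateless. For contrast, here is a minimal, dependency-free sketch of a transformer that does learn something from the data. The class name `MeanFillSketch` and its list-based interface are hypothetical and chosen only for illustration; a real transformer in this project would presumably subclass BaseTransformer and work on DataFrames as shown above.

```python
from statistics import mean
from typing import List, Optional


class MeanFillSketch:
    """Fills missing values (None) with the mean learned during fit.

    Hypothetical illustration only; not part of sk_transformers.
    """

    def __init__(self) -> None:
        self.mean_: Optional[float] = None  # set by fit()

    def fit(self, values: List[Optional[float]]) -> "MeanFillSketch":
        # Learn the mean from the non-missing training values.
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError("Cannot fit on data without observed values.")
        self.mean_ = mean(observed)
        return self

    def transform(self, values: List[Optional[float]]) -> List[float]:
        # Apply the learned mean; fail loudly if fit() was never called.
        if self.mean_ is None:
            raise RuntimeError("Call fit() before transform().")
        return [self.mean_ if v is None else v for v in values]


filler = MeanFillSketch().fit([1.0, None, 3.0])
filled_new = filler.transform([None, 10.0])
print(filled_new)
```

The point of the split is that the value used to fill gaps comes from the data passed to fit, not from the data passed to transform, which is exactly the state a stateless transformer does not have.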

Poetry

We are using Poetry to manage the dependencies, the virtual environment, and deployment. If you have not used it before, please check out its documentation to get started.

To start working on the project, first run:

poetry install --with test
This installs all needed dependencies for development and testing.

Pre-commit hooks

We are using pre-commit to ensure a consistent code style and to avoid common mistakes. Please install pre-commit and set up the hooks with:

pre-commit install
pre-commit install --hook-type commit-msg

Homebrew

We are using Homebrew to manage the dependencies for the development environment. Please install Homebrew and run

brew bundle
to install the dependencies. If you don't want to or can't use Homebrew, you can also install the dependencies manually.

Conventional Commits

We are using Conventional Commits to ensure a consistent commit message style. Please use the following commit message format:

<type>[optional scope]: <description>
E.g.:
feat: add a new fantastic transformer 🤖
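
To make the format concrete, the following is a rough sketch of how a commit header could be checked against `<type>[optional scope]: <description>`. It is not the project's actual commit-msg hook. The function name `parse_commit_header` is hypothetical, and restricting the type to lowercase letters is an assumption made for this example.

```python
import re

# Header pattern: <type>[optional scope][!]: <description>
# Assumption for this sketch: the type consists of lowercase letters only.
COMMIT_HEADER = re.compile(
    r"^(?P<type>[a-z]+)"             # type, e.g. feat or fix
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"              # optional breaking-change marker
    r": (?P<description>\S.*)$"      # colon, space, non-empty description
)


def parse_commit_header(header: str):
    """Return the parts of a commit header as a dict, or None if it does not match."""
    match = COMMIT_HEADER.match(header)
    return match.groupdict() if match else None


print(parse_commit_header("feat: add a new fantastic transformer 🤖"))
print(parse_commit_header("fix(utils): handle empty dataframes"))
print(parse_commit_header("added some stuff"))
```

The first two headers match, with and without a scope. The last one returns None because it lacks the `<type>: ` prefix.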

How to contribute

The following steps will give a short guide on how to contribute to this project:

  • Create a personal fork of the project on GitHub.
  • Clone the fork on your local machine. Your remote repo on GitHub is called origin.
  • Add the original repository as a remote called upstream.
  • If you created your fork a while ago, be sure to pull upstream changes into your local repository.
  • Create a new branch to work on! Start from develop if it exists, else from main.
  • Implement/fix your feature, comment your code, and add some examples.
  • Follow the code style of the project, including indentation. Black, isort, Pylint, and mypy can help you with it.
  • Run all tests.
  • Write or adapt tests as needed.
  • Add or change the documentation as needed. Please follow the "Google Python Style Guide".
  • Squash your commits into a single commit with git's interactive rebase. Create a new branch if necessary.
  • Push your branch to your fork on GitHub, the remote origin.
  • From your fork, open a pull request against the correct branch. Target the project's develop branch!
  • Once the pull request is approved and merged, you can pull the changes from upstream to your local repo and delete your extra branch(es).
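
The git side of the steps above can be sketched as a sequence of commands. The helper below only builds and prints them; it does not run anything. The function name `contribution_commands`, the URLs, and the branch name are placeholders invented for this example, and the base branch is assumed to be develop.

```python
import shlex
from typing import List


def contribution_commands(
    fork_url: str, upstream_url: str, branch: str, base: str = "develop"
) -> List[List[str]]:
    """Build the git commands for the fork-based workflow (nothing is executed)."""
    return [
        # Clone your fork; its remote is called origin by default.
        ["git", "clone", fork_url],
        # Run the remaining commands from inside the cloned directory.
        ["git", "remote", "add", "upstream", upstream_url],
        ["git", "fetch", "upstream"],
        # Start the work branch from the up-to-date upstream base branch.
        ["git", "checkout", "-b", branch, f"upstream/{base}"],
        # After implementing, testing, and committing: squash via interactive rebase.
        ["git", "rebase", "-i", f"upstream/{base}"],
        # Push the branch to your fork, then open the pull request on GitHub.
        ["git", "push", "-u", "origin", branch],
    ]


# Placeholder URLs and branch name, for illustration only.
commands = contribution_commands(
    "https://example.com/your-user/project.git",
    "https://example.com/original/project.git",
    "my-new-transformer",
)
for command in commands:
    print(shlex.join(command))
```

Pass `base="main"` if the project has no develop branch.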