How to contribute
First of all, thank you 🙏 for your interest in contributing to this project. This project is open source and is therefore open to contributions from the community. The following document describes how you can contribute to it.
Check out this dummy example of how to create a custom transformer ready for use in a pipeline:
Example of a custom transformer
Your custom transformer could look something like this:
import pandas as pd

from sk_transformers.base_transformer import BaseTransformer
from sk_transformers.utils import check_ready_to_transform


class DummyTransformer(BaseTransformer):
    """
    Replaces every value in a given column that equals
    `string_to_replace` with `DUMMY!`.

    Args:
        string_to_replace (str): The value which should be replaced by `DUMMY!`.
        column (str): The column in which the values are replaced.

    Example:
    ```python
    import pandas as pd

    df = pd.DataFrame({
        "cocktail": ["French Connection", "Incredible Hulk", "Tom and Jerry"],
        "bar": ["foo", "Schikaneder", "Futuregarden"]
    })
    transformer = DummyTransformer("foo", "bar")
    transformer.fit_transform(df)
    ```
    ```
                cocktail           bar
    0  French Connection        DUMMY!
    1    Incredible Hulk   Schikaneder
    2      Tom and Jerry  Futuregarden
    ```
    """

    def __init__(self, string_to_replace: str, column: str) -> None:
        super().__init__()
        self.string_to_replace = string_to_replace
        self.column = column

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """
        Replaces all values equal to `string_to_replace`
        in a certain column of X with `DUMMY!`.

        Args:
            X (pd.DataFrame): Dataframe containing the
                column where the replacement should happen.

        Returns:
            pandas.DataFrame: Dataframe with replaced strings.
        """
        X = check_ready_to_transform(self, X, self.column)
        X[self.column] = X[self.column].replace(self.string_to_replace, "DUMMY!")
        return X
`DummyTransformer` does not implement its own `fit` method because it does not learn anything from the data; it just replaces one string with another.
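Leaving out `fit` works because scikit-learn's `TransformerMixin.fit_transform` calls `fit` and then `transform`, so a base class whose `fit` simply returns `self` is enough. The following self-contained sketch illustrates this with scikit-learn directly. `StatelessTransformer` and `ReplaceTransformer` are hypothetical names, and the sketch assumes (rather than shows) that `BaseTransformer` provides an equivalent no-op `fit`.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class StatelessTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for a base class that supplies a no-op `fit`."""

    def fit(self, X: pd.DataFrame, y=None) -> "StatelessTransformer":
        # Nothing is learned from the data; returning self keeps the
        # fit(...).transform(...) chain used by fit_transform working.
        return self


class ReplaceTransformer(StatelessTransformer):
    """Same logic as DummyTransformer, without a `fit` of its own."""

    def __init__(self, string_to_replace: str, column: str) -> None:
        self.string_to_replace = string_to_replace
        self.column = column

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Work on a copy so the caller's DataFrame stays unchanged.
        X = X.copy()
        X[self.column] = X[self.column].replace(self.string_to_replace, "DUMMY!")
        return X


df = pd.DataFrame({
    "cocktail": ["French Connection", "Incredible Hulk", "Tom and Jerry"],
    "bar": ["foo", "Schikaneder", "Futuregarden"],
})

# fit_transform is inherited from TransformerMixin and uses the no-op fit.
direct = ReplaceTransformer("foo", "bar").fit_transform(df)

# A Pipeline relies on the same inherited methods.
piped = Pipeline([("dummy", ReplaceTransformer("foo", "bar"))]).fit_transform(df)
```

Because the transformer is stateless, both calls produce the same DataFrame and the input `df` is left untouched.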
Now you can use it:
import pandas as pd
from sklearn.pipeline import Pipeline

df = pd.DataFrame({
    "cocktail": ["French Connection", "Incredible Hulk", "Tom and Jerry"],
    "bar": ["foo", "Schikaneder", "Futuregarden"]
})
pipeline = Pipeline([
    ("dummy_transformer", DummyTransformer("foo", "bar")),
])
pipeline.fit_transform(df).head()
Check out the `MathExpressionTransformer` or the `ValueIndicatorTransformer` for a simpler example.
Poetry
We use Poetry to manage dependencies and the virtual environment, and for deployment. If you have not used it before, please check out the documentation to get started.
If you want to start working on the project, the first thing you have to do is:
This installs all dependencies needed for development and testing.
Pre-commit hooks
We use pre-commit to ensure a consistent code style and to catch common mistakes. Please install pre-commit and set up the hook with:
Homebrew
We use Homebrew to manage the dependencies of the development environment. Please install Homebrew and run
to install the dependencies. If you can't or don't want to use Homebrew, you can also install the dependencies manually.
Conventional Commits
We use Conventional Commits to ensure a consistent commit message style. Please use the following commit message format:
E.g.:
How to contribute
The following steps will give a short guide on how to contribute to this project:
- Create a personal fork of the project on GitHub.
- Clone the fork on your local machine. Your remote repo on GitHub is called `origin`.
- Add the original repository as a remote called `upstream`.
- If you created your fork a while ago, be sure to pull upstream changes into your local repository.
- Create a new branch to work on! Start from `develop` if it exists, else from `main`.
- Implement/fix your feature, comment your code, and add some examples.
- Follow the code style of the project, including indentation. Black, isort, Pylint, and mypy can help you with it.
- Run all tests.
- Write or adapt tests as needed.
- Add or change the documentation as needed. Please follow the "Google Python Style Guide".
- Squash your commits into a single commit with git's interactive rebase. Create a new branch if necessary.
- Push your branch to your fork on GitHub, the remote `origin`.
- From your fork, open a pull request in the correct branch. Target the project's `develop` branch!
- Once the pull request is approved and merged, you can pull the changes from `upstream` to your local repo and delete your extra branch(es).
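Because each pull request ends up as a single squashed commit, it can help to check that commit's header against the Conventional Commits format before pushing. The sketch below is a minimal, hypothetical helper (`is_conventional_header` is not part of this project or its tooling) that checks only the header line, `<type>[(scope)][!]: <description>`. The set of allowed types is an assumption: the Conventional Commits specification itself only defines `feat` and `fix`, and the remaining types follow the common Angular/commitlint convention.

```python
import re

# Assumption: the project accepts the Angular/commitlint type set.
# The Conventional Commits spec itself only defines feat and fix.
ALLOWED_TYPES = (
    "build", "chore", "ci", "docs", "feat", "fix",
    "perf", "refactor", "revert", "style", "test",
)

# <type>[optional (scope)][optional !]: <description>
HEADER_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional_header(message: str) -> bool:
    """Return True if the first line of `message` is a Conventional Commits header."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    return HEADER_PATTERN.match(header) is not None


print(is_conventional_header("feat(transformer): add DummyTransformer"))
print(is_conventional_header("Added new transformer"))
```

Only the header is validated; the optional body and footers are ignored, which keeps the check usable for multi-line messages.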