Build a model through repeated encoding#

In this guide, we will build an ensemble estimator using bayesian target encoding. First, initialize the bayte class for your modelling problem your base estimator as well as an initialized encoder and the number of samples you want to draw.

In this classification example, we will fit a logistic regression.

from sklearn.linear_model import LogisticRegression

import bayte as bt

estimator = bt.BayesianTargetClassifier(
    base_estimator=LogisticRegression(),
    encoder=bt.BayesianTargetEncoder(dist="bernoulli"),
    n_estimators=10,
)

Next, call fit. The fit call accepts a categorical_feature parameter for specifying which features in the training dataset should be encoded. If you are using a numpy.ndarray, specify the indices of the categorical columns. If you are using a pandas.DataFrame, specify the column names. If not set, any pandas columns with a categorical data type will be encoded; for numpy users, no supplied list indicates that all features are categorical.

estimator.fit(X, y, categorical_feature=[1, 2, 5])

For regression problems, predict will produce an average of each estimator prediction. For classification, predict depends on an estimator initialization parameter called voting[1]:

voting value

Description

hard (default)

The predicted class label is a majority vote of the
predicted labels from each estimator.

soft

The predicted class label is based on the sums of
predicted probabilities for each class.

For well-calibrated classifiers, soft voting is preferred.