Encode your categorical variables#

To encode your variables, you have to first choose a likelihood for your target.

Model type

Description

Likelihood

Classification

Binary

bernoulli

Multi-class

multinomial

Regression

normal

exponential

gamma

invgamma

Important

The normal likelihood assumes a known variance that is estimated from the training data. Similarly, the gamma and inverse gamma likelihoods assume a known shape parameter. Both of these assumptions were made to help make implementing the algorithm easier.

Basic usage#

Once you’ve chosen your likelihood, import and fit the encoder on your data. Suppose you have X and y, with three categorical columns: 1, 2, and 5.

import bayte as bt

encoder = bt.BayesianTargetEncoder(dist=...)
encoder.fit(X[:, [1, 2, 5]], y)

By default, when you transform the data

X_encoded = encoder.transform(X[:, [1, 2, 5]])

the encoding level will be the mean of the posterior distribution for the level. To sample, set sample=True on encoder initialization.

Important

The encoder has support for joblib. Since the encoding procedure involves generating posterior parameters for every categorical level in every supplied variable, it can be computationally inefficient if executed serially.

Changing hyperparameter initialization#

If you want to change how the hyperparameters are initialized for a given likelihood, supply a callable for the initializer argument. This callable must take the dist and the target values y and return a tuple of the parameters.

Important

Although you can change the initializer, your code will break if you try to implement a new likelihood.