sparklightautoml.transformers

Basic feature generation steps and helper utils.

Base Classes

SparkBaseEstimator

Base class for estimators from sparklightautoml.transformers.

SparkBaseTransformer

Base class for transformers from sparklightautoml.transformers.

SparkChangeRolesTransformer

Transformer that changes roles for input columns.

SparkSequentialTransformer

Entity that represents a sequence of transformers in a preprocessing pipeline.

SparkUnionTransformer

Entity that represents parallel layers (transformers) in a preprocessing pipeline.

SparkColumnsAndRoles

Helper and base class for SparkBaseTransformer and SparkBaseEstimator.

HasInputRoles

Mixin for param inputRoles: input column roles.

HasOutputRoles

Mixin for param outputRoles: output column roles.

DropColumnsTransformer

Transformer that drops columns from input dataframe.

PredictionColsTransformer

Converts prediction column values from ONNX model format to LGBMCBooster format.

ProbabilityColsTransformer

Converts probability column values from ONNX model format to LGBMCBooster format.
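
The difference between a sequential and a union (parallel) composition can be sketched in plain Python. This is not the sparklightautoml API: the helpers `sequential` and `union`, and the use of dicts as rows, are hypothetical stand-ins chosen only to show how outputs flow between steps.

```python
# Illustration only: plain-Python stand-ins, not sparklightautoml classes.
# A "row" is a dict of column name -> value; a "step" maps a row to a new row.

def sequential(*steps):
    """Chain steps: each step receives the output of the previous one."""
    def run(row):
        for step in steps:
            row = step(row)
        return row
    return run

def union(*branches):
    """Apply every branch to the same input and merge their output columns."""
    def run(row):
        out = {}
        for branch in branches:
            out.update(branch(row))
        return out
    return run

# Three toy steps (hypothetical names).
double = lambda r: {"x2": r["x"] * 2}
inc = lambda r: {"x2_inc": r["x2"] + 1}
neg = lambda r: {"neg_x": -r["x"]}

pipeline = union(sequential(double, inc), neg)
result = pipeline({"x": 3})
```

Here the first branch runs `double` then `inc` in order, while `neg` sees the original input; the union merges the columns produced by both branches.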

Numeric

SparkFillnaMedianEstimator

Fills NaN values with the median.

SparkNaNFlagsEstimator

Estimator that calculates the NaN rate for input columns and builds SparkNaNFlagsTransformer.

SparkQuantileBinningEstimator

Discretization of numeric features by quantiles.

SparkStandardScalerEstimator

Classic StandardScaler.

SparkFillInfTransformer

Transformer that replaces inf values with np.nan in input columns.

SparkFillnaMedianTransformer

Fills NaN values with the median.

SparkLogOddsTransformer

Converts probabilities to log-odds.

SparkNaNFlagsTransformer

Adds columns with NaN flags (0 or 1) for input columns.

SparkQuantileBinningTransformer

Adds columns with the quantile bin number for each input column.

SparkStandardScalerTransformer

Classic StandardScaler.
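
As a rough illustration of the estimator/transformer split used by the numeric steps, the following plain-Python sketch fits a median on one column, fills NaN values with it, emits 0/1 NaN flags, and converts a probability to log-odds. It does not use Spark or the sparklightautoml classes; the function names and the clipping constant `eps` are assumptions made for the example.

```python
# Illustration only: single-column, in-memory versions of the ideas above.
import math
from statistics import median

def fit_median(values):
    """'Fit' step: median of the non-NaN values in a column."""
    return median(v for v in values if not math.isnan(v))

def fill_with_flags(values, fill_value):
    """'Transform' step: return (filled values, 0/1 NaN flags)."""
    flags = [1 if math.isnan(v) else 0 for v in values]
    filled = [fill_value if math.isnan(v) else v for v in values]
    return filled, flags

def log_odds(p, eps=1e-7):
    """log(p / (1 - p)); p is clipped to [eps, 1 - eps] (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

column = [1.0, float("nan"), 3.0, 10.0]
med = fit_median(column)                     # learned on the training column
filled, flags = fill_with_flags(column, med)
```

The value learned in the fit step is the only state the transform step needs, which is why each estimator in this section has a matching transformer.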

Categorical

SparkLabelEncoderEstimator

Spark label encoder estimator.

SparkOrdinalEncoderEstimator

Spark ordinal encoder estimator.

SparkFreqEncoderEstimator

Calculates frequency in train data and produces SparkFreqEncoderTransformer instance.

SparkCatIntersectionsEstimator

Combines categorical features and fits SparkLabelEncoderEstimator.

SparkTargetEncoderEstimator

Spark target encoder estimator.

SparkMulticlassTargetEncoderEstimator

Spark multiclass target encoder estimator.

SparkOHEEncoderEstimator

Simple OneHotEncoder over label-encoded categories.

SparkLabelEncoderTransformer

Simple Spark version of LabelEncoder.

SparkOrdinalEncoderTransformer

Spark version of OrdinalEncoder.

SparkFreqEncoderTransformer

Labels are encoded with frequency in train data.

SparkCatIntersectionsTransformer

Combines category columns and encodes them with a label encoder.

SparkMultiTargetEncoderTransformer

Spark multiclass target encoder transformer.

SparkCatIntersectionsHelper

Helper class for SparkCatIntersectionsEstimator and SparkCatIntersectionsTransformer.
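
Label encoding and frequency encoding can be compared with a small plain-Python sketch. It is not the library's implementation: the ordering of codes by descending frequency, the code 0 reserved for unseen categories, and all function names are assumptions made for illustration.

```python
# Illustration only: in-memory label and frequency encoders.
from collections import Counter

def fit_label_encoder(values):
    """Map each category to an integer code, most frequent first.
    Codes start at 1; 0 is kept for unseen categories (an assumption)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i + 1 for i, v in enumerate(ordered)}

def fit_freq_encoder(values):
    """Map each category to its number of occurrences in the training data."""
    return dict(Counter(values))

def encode(values, mapping, default=0):
    """Apply a fitted mapping; unseen categories get the default value."""
    return [mapping.get(v, default) for v in values]

train = ["a", "b", "a", "c", "a", "b"]
labels = fit_label_encoder(train)
freqs = fit_freq_encoder(train)

test = ["a", "c", "z"]                 # "z" never appeared in train
encoded_labels = encode(test, labels)
encoded_freqs = encode(test, freqs)
```

Both encoders learn a mapping from training data only, so a category that first appears at inference time needs an explicit fallback such as the default used here.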

Categorical (Scala)

laml_string_indexer.LAMLStringIndexer

Custom implementation of the PySpark StringIndexer wrapper.

laml_string_indexer.LAMLStringIndexerModel

Model fitted by StringIndexer.

Datetime

SparkDateSeasonsEstimator

SparkTimeToNumTransformer

Transforms datetime column values to numeric values.

SparkBaseDiffTransformer

Basic conversion strategy, used in selection one-to-one transformers.

SparkDateSeasonsTransformer

Extracts unit of time from Datetime values and marks holiday dates.

SparkDatetimeHelper

Helper class for SparkTimeToNumTransformer, SparkBaseDiffTransformer and SparkDateSeasonsTransformer.
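
The two datetime ideas above, converting a timestamp to a number and extracting calendar units with a holiday mark, can be sketched with the standard library. This is not the sparklightautoml implementation: the base date, the choice of days as the unit, the selected calendar fields, and the holiday set are all assumptions for the example.

```python
# Illustration only: standard-library versions of the datetime features.
from datetime import date, datetime

BASE_DATE = datetime(2020, 1, 1)   # assumed reference point
HOLIDAYS = {date(2021, 1, 1)}      # hypothetical holiday calendar

def time_to_num(ts, base=BASE_DATE):
    """Datetime -> float number of days elapsed since the base date."""
    return (ts - base).total_seconds() / 86400.0

def date_seasons(ts, holidays=HOLIDAYS):
    """Extract calendar units and a 0/1 holiday flag from a datetime."""
    return {
        "year": ts.year,
        "month": ts.month,
        "weekday": ts.weekday(),   # Monday is 0
        "is_holiday": int(ts.date() in holidays),
    }

days = time_to_num(datetime(2020, 1, 2, 12, 0))
seasons = date_seasons(datetime(2021, 1, 1, 9, 30))
```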