API Reference

Groupyr contains estimator classes that are fully compliant with the scikit-learn ecosystem. Consequently, their initialization, fit, predict, transform, and score methods will be familiar to sklearn users.
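For example, a regressor such as SGL (documented below) is constructed, fit, and scored exactly like any other sklearn estimator. The following is a minimal sketch with illustrative data and hyperparameter values:

>>> import numpy as np
>>> from groupyr import SGL
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(50, 9))
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=50)
>>> model = SGL(groups=groups, l1_ratio=0.5, alpha=0.1).fit(X, y)
>>> y_pred = model.predict(X)
>>> train_r2 = model.score(X, y)  # score returns R^2 for regressors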

Sparse Groups Lasso Estimators

These are groupyr’s canonical estimators. SGL is intended for regression problems while LogisticSGL is intended for classification problems.

class groupyr.SGL(l1_ratio=1.0, alpha=0.0, groups=None, scale_l2_by='group_length', fit_intercept=True, max_iter=1000, tol=1e-07, warm_start=False, verbose=0, suppress_solver_warnings=True, include_solver_trace=False)[source]

An sklearn compatible sparse group lasso regressor.

This solves the sparse group lasso [1] problem for a feature matrix partitioned into groups using the proximal gradient descent (PGD) algorithm.

Parameters:
l1_ratio : float, default=1.0

Hyperparameter: combination between group lasso and lasso. l1_ratio=0 gives the group lasso and l1_ratio=1 gives the lasso.

alpha : float, default=0.0

Hyperparameter: overall regularization strength.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : [“group_length”, None], default=”group_length”

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

fit_intercept : bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).

max_iter : int, default=1000

Maximum number of iterations for the PGD solver.

tol : float, default=1e-7

Stopping criterion. Convergence tolerance for the copt proximal gradient solver.

warm_start : bool, default=False

If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.

verbose : int, default=0

Verbosity flag for the PGD solver. Any positive integer will produce verbose output.

suppress_solver_warnings : bool, default=True

If True, suppress convergence warnings from the PGD solver. This is useful for hyperparameter tuning when some combinations of hyperparameters may not converge.

References

[1]

Noah Simon, Jerome Friedman, Trevor Hastie & Robert Tibshirani, “A Sparse-Group Lasso,” Journal of Computational and Graphical Statistics, vol. 22:2, pp. 231-245, 2012 DOI: 10.1080/10618600.2012.681250

Attributes:
coef_ : array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_).

intercept_ : float

Intercept (a.k.a. bias) added to the linear predictor.

n_iter_ : int

Actual number of iterations used in the solver.
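For example, the following sketch fits SGL to synthetic grouped data generated with make_group_regression (documented below) and inspects the fitted attributes; the hyperparameter values are illustrative, not recommendations:

>>> from groupyr import SGL
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups = make_group_regression(
...     n_samples=100, n_groups=5, n_features_per_group=4,
...     n_informative_groups=2, n_informative_per_group=3, random_state=42
... )
>>> model = SGL(groups=groups, l1_ratio=0.5, alpha=1.0).fit(X, y)
>>> model.coef_.shape  # one coefficient per feature
(20,)
>>> n_iter = model.n_iter_  # iterations used by the PGD solver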

class groupyr.LogisticSGL(l1_ratio=1.0, alpha=0.0, groups=None, scale_l2_by='group_length', fit_intercept=True, max_iter=1000, tol=1e-07, warm_start=False, verbose=0, suppress_solver_warnings=True, include_solver_trace=False)[source]

An sklearn compatible sparse group lasso classifier.

This solves the sparse group lasso [1] problem for a feature matrix partitioned into groups using the proximal gradient descent (PGD) algorithm.

Parameters:
l1_ratio : float, default=1.0

Hyperparameter: combination between group lasso and lasso. l1_ratio=0 gives the group lasso and l1_ratio=1 gives the lasso.

alpha : float, default=0.0

Hyperparameter: overall regularization strength.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : [“group_length”, None], default=”group_length”

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

fit_intercept : bool, default=True

Specifies if a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).

max_iter : int, default=1000

Maximum number of iterations for the PGD solver.

tol : float, default=1e-7

Stopping criterion. Convergence tolerance for the copt proximal gradient solver.

warm_start : bool, default=False

If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.

verbose : int, default=0

Verbosity flag for the PGD solver. Any positive integer will produce verbose output.

suppress_solver_warnings : bool, default=True

If True, suppress convergence warnings from the PGD solver. This is useful for hyperparameter tuning when some combinations of hyperparameters may not converge.

References

[1]

Noah Simon, Jerome Friedman, Trevor Hastie & Robert Tibshirani, “A Sparse-Group Lasso,” Journal of Computational and Graphical Statistics, vol. 22:2, pp. 231-245, 2012 DOI: 10.1080/10618600.2012.681250

Attributes:
classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

coef_ : array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_).

intercept_ : float

Intercept (a.k.a. bias) added to the linear predictor.

n_iter_ : int

Actual number of iterations used in the solver.
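The classifier follows the same pattern. A brief sketch, assuming the standard sklearn classifier methods (predict, predict_proba) and illustrative hyperparameters:

>>> from groupyr import LogisticSGL
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(
...     n_samples=100, n_groups=4, n_features_per_group=10,
...     n_informative_groups=2, n_informative_per_group=3, random_state=42
... )
>>> clf = LogisticSGL(groups=groups, l1_ratio=0.5, alpha=0.1).fit(X, y)
>>> labels = clf.predict(X)
>>> probabilities = clf.predict_proba(X)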

Cross-validation Estimators

These estimators have built-in cross-validation capabilities to find the best values of the hyperparameters alpha and l1_ratio. They are more efficient than using the canonical estimators with grid search because they make use of warm-starting. Alternatively, you can specify tuning_strategy="bayes" to use Bayesian optimization over the hyperparameters instead of a grid search.

class groupyr.SGLCV(l1_ratio=1.0, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, max_iter=1000, tol=1e-07, copy_X=True, scoring=None, cv=None, verbose=False, n_jobs=None, tuning_strategy='grid', n_bayes_iter=50, n_bayes_points=1, random_state=None, suppress_solver_warnings=True)[source]

Iterative SGL model fitting along a regularization path.

See the scikit-learn glossary entry for cross-validation estimator

Parameters:
l1_ratio : float or list of float, default=1.0

float between 0 and 1 passed to SGL (scaling between group lasso and lasso penalties). For l1_ratio = 0 the penalty is the group lasso penalty. For l1_ratio = 1 it is the lasso penalty. For 0 < l1_ratio < 1, the penalty is a combination of group lasso and lasso. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values will depend on the problem. For problems where we expect strong overall sparsity and would like to encourage grouping, put more values close to 1 (i.e. lasso). In contrast, if we expect strong group-wise sparsity, but only mild sparsity within groups, put more values close to 0 (i.e. group lasso).

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : [“group_length”, None], default=”group_length”

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

eps : float, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphas : int, default=100

Number of alphas along the regularization path, used for each l1_ratio.

alphas : ndarray, default=None

List of alphas where to compute the models. If None, alphas are set automatically.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize : bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

max_iter : int, default=1000

The maximum number of iterations.

tol : float, default=1e-7

Stopping criterion. Convergence tolerance for the copt proximal gradient solver.

scoring : str or callable, default=None

A string (see sklearn model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). For a list of scoring functions that can be used, look at sklearn.metrics. The default scoring option is “r2”.

cv : int, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds,

  • an sklearn CV splitter,

  • an iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, sklearn.model_selection.KFold is used.

Refer to the scikit-learn User Guide for the various cross-validation strategies that can be used here.

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

verbose : bool or int, default=False

Amount of verbosity.

n_jobs : int, default=None

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

tuning_strategy : [“grid”, “bayes”], default=”grid”

Hyperparameter tuning strategy to use. If tuning_strategy == "grid", then evaluate all parameter points on the l1_ratio and alphas grid, using warm start to evaluate different alpha values along the regularization path. If tuning_strategy == "bayes", then a fixed number of parameter settings is sampled using skopt.BayesSearchCV. The fixed number of settings is set by n_bayes_iter. The l1_ratio setting is sampled uniformly between the minimum and maximum of the input l1_ratio parameter. The alpha setting is sampled log-uniformly, either between the maximum and minimum of the input alphas parameter, if provided, or from eps * max_alpha to max_alpha, where max_alpha is a conservative estimate of the maximum alpha for which the solution coefficients are non-trivial.

n_bayes_iter : int, default=50

Number of parameter settings that are sampled if using Bayes search for hyperparameter optimization. n_bayes_iter trades off runtime versus quality of the solution. Consider increasing n_bayes_points if you want to try more parameter settings in parallel.

n_bayes_points : int, default=1

Number of parameter settings to sample in parallel if using Bayes search for hyperparameter optimization. If this does not align with n_bayes_iter, the last iteration will sample fewer points.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

suppress_solver_warnings : bool, default=True

If True, suppress warnings from BayesSearchCV when the objective is evaluated at the same point multiple times. Setting this to False may be useful for debugging.

See also

sgl_path
SGL
Attributes:
alpha_ : float

The amount of penalization chosen by cross-validation.

l1_ratio_ : float

The compromise between l1 and l2 penalization chosen by cross-validation.

coef_ : ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

intercept_ : float or ndarray of shape (n_targets,)

Independent term in the decision function.

scoring_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)

Test-set score on each fold, varying l1_ratio and alpha.

alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)

The grid of alphas used for fitting, for each l1_ratio.

n_iter_ : int

Number of iterations run by the proximal gradient descent solver to reach the specified tolerance for the optimal alpha.

bayes_optimizer_ : skopt.BayesSearchCV instance or None

The BayesSearchCV instance used for hyperparameter optimization if tuning_strategy == "bayes". If tuning_strategy == "grid", then this attribute is None.
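A sketch of cross-validated fitting; the l1_ratio grid and fold count are illustrative:

>>> from groupyr import SGLCV
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups = make_group_regression(random_state=42)
>>> cv_model = SGLCV(
...     groups=groups, l1_ratio=[0.1, 0.5, 0.9], n_alphas=20, cv=3
... ).fit(X, y)
>>> chosen = (cv_model.alpha_, cv_model.l1_ratio_)  # selected by CV

Passing tuning_strategy="bayes" (optionally with n_bayes_iter and random_state) would replace the grid search with Bayesian optimization, after which bayes_optimizer_ holds the fitted skopt.BayesSearchCV instance.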

class groupyr.LogisticSGLCV(l1_ratio=1.0, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, max_iter=1000, tol=1e-07, scoring=None, cv=None, copy_X=True, verbose=False, n_jobs=None, tuning_strategy='grid', n_bayes_iter=50, n_bayes_points=1, random_state=None, suppress_solver_warnings=True)[source]

Iterative Logistic SGL model fitting along a regularization path.

See the scikit-learn glossary entry for cross-validation estimator

Parameters:
l1_ratio : float or list of float, default=1.0

float between 0 and 1 passed to SGL (scaling between group lasso and lasso penalties). For l1_ratio = 0 the penalty is the group lasso penalty. For l1_ratio = 1 it is the lasso penalty. For 0 < l1_ratio < 1, the penalty is a combination of group lasso and lasso. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values will depend on the problem. For problems where we expect strong overall sparsity and would like to encourage grouping, put more values close to 1 (i.e. lasso). In contrast, if we expect strong group-wise sparsity, but only mild sparsity within groups, put more values close to 0 (i.e. group lasso).

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.

scale_l2_by : [“group_length”, None], default=”group_length”

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

eps : float, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphas : int, default=100

Number of alphas along the regularization path, used for each l1_ratio.

alphas : ndarray, default=None

List of alphas where to compute the models. If None, alphas are set automatically.

fit_intercept : bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize : bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

max_iter : int, default=1000

The maximum number of iterations.

tol : float, default=1e-7

Stopping criterion. Convergence tolerance for the copt proximal gradient solver.

scoring : str or callable, default=None

A string (see sklearn model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). For a list of scoring functions that can be used, look at sklearn.metrics. The default scoring option used is accuracy_score.

cv : int, cross-validation generator or iterable, default=None

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 5-fold cross-validation,

  • int, to specify the number of folds,

  • an sklearn CV splitter,

  • an iterable yielding (train, test) splits as arrays of indices.

For int/None inputs, sklearn.model_selection.StratifiedKFold is used.

Refer to the scikit-learn User Guide for the various cross-validation strategies that can be used here.

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

verbose : bool or int, default=False

Amount of verbosity.

n_jobs : int, default=None

Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

tuning_strategy : [“grid”, “bayes”], default=”grid”

Hyperparameter tuning strategy to use. If tuning_strategy == "grid", then evaluate all parameter points on the l1_ratio and alphas grid, using warm start to evaluate different alpha values along the regularization path. If tuning_strategy == "bayes", then a fixed number of parameter settings is sampled using skopt.BayesSearchCV. The fixed number of settings is set by n_bayes_iter. The l1_ratio setting is sampled uniformly between the minimum and maximum of the input l1_ratio parameter. The alpha setting is sampled log-uniformly, either between the maximum and minimum of the input alphas parameter, if provided, or from eps * max_alpha to max_alpha, where max_alpha is a conservative estimate of the maximum alpha for which the solution coefficients are non-trivial.

n_bayes_iter : int, default=50

Number of parameter settings that are sampled if using Bayes search for hyperparameter optimization. n_bayes_iter trades off runtime versus quality of the solution. Consider increasing n_bayes_points if you want to try more parameter settings in parallel.

n_bayes_points : int, default=1

Number of parameter settings to sample in parallel if using Bayes search for hyperparameter optimization. If this does not align with n_bayes_iter, the last iteration will sample fewer points.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

suppress_solver_warnings : bool, default=True

If True, suppress warnings from BayesSearchCV when the objective is evaluated at the same point multiple times. Setting this to False may be useful for debugging.

See also

logistic_sgl_path
LogisticSGL
Attributes:
alpha_ : float

The amount of penalization chosen by cross-validation.

l1_ratio_ : float

The compromise between l1 and l2 penalization chosen by cross-validation.

classes_ : ndarray of shape (n_classes,)

A list of class labels known to the classifier.

coef_ : array of shape (n_features,)

Estimated coefficients for the linear predictor (X @ coef_ + intercept_).

intercept_ : float

Intercept (a.k.a. bias) added to the linear predictor.

scoring_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)

Classification score for the test set on each fold, varying l1_ratio and alpha.

alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)

The grid of alphas used for fitting, for each l1_ratio.

n_iter_ : int

Number of iterations run by the proximal gradient descent solver to reach the specified tolerance for the optimal alpha.

bayes_optimizer_ : skopt.BayesSearchCV instance or None

The BayesSearchCV instance used for hyperparameter optimization if tuning_strategy == "bayes". If tuning_strategy == "grid", then this attribute is None.
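Usage mirrors SGLCV; a brief illustrative sketch:

>>> from groupyr import LogisticSGLCV
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(random_state=42)
>>> clf = LogisticSGLCV(
...     groups=groups, l1_ratio=[0.5, 1.0], n_alphas=10, cv=3
... ).fit(X, y)
>>> chosen = (clf.alpha_, clf.l1_ratio_)  # hyperparameters selected by CV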

Dataset Generation

Use these functions to generate synthetic sparse grouped data.

groupyr.datasets.make_group_classification(n_samples=100, n_groups=20, n_informative_groups=2, n_features_per_group=20, n_informative_per_group=2, n_redundant_per_group=2, n_repeated_per_group=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, useful_indices=False, random_state=None)[source]

Generate a random n-class sparse group classification problem.

This function is a generalization of sklearn.datasets.make_classification to feature matrices with grouped covariates. Prior to shuffling, X stacks a number of primary “informative” features, “redundant” linear combinations of these, “repeated” duplicates of sampled features, and arbitrary noise for any remaining features. This method uses sklearn.datasets.make_classification to construct a giant unshuffled classification problem of size n_groups * n_features_per_group and then distributes the returned features to each group. It then optionally shuffles each group.

Parameters:
n_samples : int, optional (default=100)

The number of samples.

n_groups : int, optional (default=20)

The number of feature groups.

n_informative_groups : int, optional (default=2)

The total number of informative groups. All other groups will be just noise.

n_features_per_group : int, optional (default=20)

The total number of features per group. These comprise n_informative informative features, n_redundant redundant features, n_repeated duplicated features and n_features - n_informative - n_redundant - n_repeated useless features drawn at random.

n_informative_per_group : int, optional (default=2)

The number of informative features per group. Each class is composed of a number of gaussian clusters each located around the vertices of a hypercube in a subspace of dimension n_informative_per_group. For each cluster, informative features are drawn independently from N(0, 1) and then randomly linearly combined within each cluster in order to add covariance. The clusters are then placed on the vertices of the hypercube.

n_redundant_per_group : int, optional (default=2)

The number of redundant features per group. These features are generated as random linear combinations of the informative features.

n_repeated_per_group : int, optional (default=0)

The number of duplicated features per group, drawn randomly from the informative and the redundant features.

n_classes : int, optional (default=2)

The number of classes (or labels) of the classification problem.

n_clusters_per_class : int, optional (default=2)

The number of clusters per class.

weights : list of floats or None (default=None)

The proportions of samples assigned to each class. If None, then classes are balanced. Note that if len(weights) == n_classes - 1, then the last class weight is automatically inferred. More than n_samples samples may be returned if the sum of weights exceeds 1.

flip_y : float, optional (default=0.01)

The fraction of samples whose class is randomly exchanged. Larger values introduce noise in the labels and make the classification task harder.

class_sep : float, optional (default=1.0)

The factor multiplying the hypercube size. Larger values spread out the clusters/classes and make the classification task easier.

hypercube : boolean, optional (default=True)

If True, the clusters are put on the vertices of a hypercube. If False, the clusters are put on the vertices of a random polytope.

shift : float, array of shape [n_features] or None, optional (default=0.0)

Shift features by the specified value. If None, then features are shifted by a random value drawn in [-class_sep, class_sep].

scale : float, array of shape [n_features] or None, optional (default=1.0)

Multiply features by the specified value. If None, then features are scaled by a random value drawn in [1, 100]. Note that scaling happens after shifting.

shuffle : boolean, optional (default=True)

Shuffle the samples and the features.

useful_indices : boolean, optional (default=False)

If True, a boolean array indicating useful features is returned.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns:
X : array of shape [n_samples, n_features]

The generated samples.

y : array of shape [n_samples]

The integer labels for class membership of each sample.

groups : list of arrays

Each element is an array of feature indices that belong to that group.

indices : array of shape [n_features]

A boolean array indicating which features are useful. Returned only if useful_indices is True.

See also

sklearn.datasets.make_classification

non-group-sparse version

sklearn.datasets.make_blobs

simplified variant

sklearn.datasets.make_multilabel_classification

unrelated generator for multilabel tasks

Notes

The algorithm is adapted from Guyon [1] and was designed to generate the “Madelon” dataset.

References

[1]

I. Guyon, “Design of experiments for the NIPS 2003 variable selection benchmark”, 2003.
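A sketch of generating a grouped classification problem; the sizes are illustrative:

>>> from groupyr.datasets import make_group_classification
>>> X, y, groups, idx = make_group_classification(
...     n_samples=60, n_groups=5, n_features_per_group=8,
...     n_informative_groups=2, n_informative_per_group=3,
...     useful_indices=True, random_state=0
... )
>>> X.shape
(60, 40)
>>> len(groups)
5
>>> idx.shape  # boolean mask over all 40 features
(40,)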

groupyr.datasets.make_group_regression(n_samples=100, n_groups=20, n_informative_groups=5, n_features_per_group=20, n_informative_per_group=5, effective_rank=None, noise=0.0, shift=0.0, scale=1.0, shuffle=False, coef=False, random_state=None)[source]

Generate a sparse group regression problem.

This function is a generalization of sklearn.datasets.make_regression to feature matrices with grouped covariates. Prior to shuffling, X stacks a number of primary “informative” features and arbitrary noise for any remaining features. This method uses sklearn.datasets.make_regression to construct a giant unshuffled regression problem of size n_groups * n_features_per_group and then distributes the returned features to each group. It then optionally shuffles each group.

Parameters:
n_samples : int, optional (default=100)

The number of samples.

n_groups : int, optional (default=20)

The number of feature groups.

n_informative_groups : int, optional (default=5)

The total number of informative groups. All other groups will be just noise.

n_features_per_group : int, optional (default=20)

The total number of features per group. These comprise n_informative informative features and n_features - n_informative useless features drawn at random.

n_informative_per_group : int, optional (default=5)

The number of informative features per group that have a non-zero regression coefficient.

effective_rank : int or None, optional (default=None)

If not None, provides the number of singular vectors to explain the input data.

noise : float, optional (default=0.0)

The standard deviation of the gaussian noise applied to the output.

shuffle : boolean, optional (default=False)

Shuffle the samples and the features.

coef : boolean, optional (default=False)

If True, returns coefficient values used to generate samples via sklearn.datasets.make_regression.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns:
X : array of shape [n_samples, n_features]

The generated samples.

y : array of shape [n_samples]

The output values (regression targets) for each sample.

groups : list of arrays

Each element is an array of feature indices that belong to that group.

coef : array of shape [n_features]

A numpy array containing true regression coefficient values. Returned only if coef is True.

See also

sklearn.datasets.make_regression

non-group-sparse version
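A sketch of generating a grouped regression problem with the true coefficients returned; sizes are illustrative:

>>> from groupyr.datasets import make_group_regression
>>> X, y, groups, coef = make_group_regression(
...     n_samples=50, n_groups=4, n_features_per_group=3,
...     n_informative_groups=2, n_informative_per_group=2,
...     coef=True, random_state=0
... )
>>> X.shape
(50, 12)
>>> coef.shape  # true coefficients (nonzero only for informative features)
(12,)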

Regularization Paths

Use these functions to compute regression coefficients along a regularization path.

groupyr.sgl_path(X, y, l1_ratio=0.5, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, Xy=None, normalize=False, copy_X=True, verbose=False, return_n_iter=False, check_input=True, **params)[source]

Compute sparse group lasso path.

We use the previous solution as the initial guess for subsequent alpha values.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training data. Pass directly as Fortran-contiguous data to avoid unnecessary memory duplication.

y : {array-like, sparse matrix} of shape (n_samples,)

Target values.

l1_ratio : float, default=0.5

Number between 0 and 1 passed to the SGL estimator (scaling between the group lasso and lasso penalties). l1_ratio=1 corresponds to the Lasso.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.

scale_l2_by : [“group_length”, None], default=”group_length”

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

eps : float, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphas : int, default=100

Number of alphas along the regularization path.

alphas : ndarray, default=None

List of alphas where to compute the models. If None, alphas are set automatically.

Xy : array-like of shape (n_features,), default=None

Xy = np.dot(X.T, y) that can be precomputed. If supplying Xy, prevent train/test leakage by ensuring that Xy is precomputed using only training data.

normalize : bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

verbose : bool or int, default=False

Amount of verbosity.

check_input : bool, default=True

If False, skip input validation checks, assuming they are handled by the caller.

**params : kwargs

Keyword arguments passed to the SGL estimator.

Returns:
coefs : ndarray of shape (n_features, n_alphas) or (n_features + 1, n_alphas)

List of coefficients for the SGL model. If fit_intercept is set to True, then the first dimension will be n_features + 1, where the last item represents the intercept.

alphas : ndarray of shape (n_alphas,)

The alphas along the path where models are computed.

n_iters : array of shape (n_alphas,)

Actual number of iterations for each alpha.

See also

SGL
SGLCV
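A sketch of computing a short path directly; return_n_iter=True requests the per-alpha iteration counts listed in the Returns section above:

>>> from groupyr import sgl_path
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups = make_group_regression(random_state=42)
>>> coefs, alphas, n_iters = sgl_path(
...     X, y, l1_ratio=0.5, groups=groups, n_alphas=10, return_n_iter=True
... )
>>> coefs.shape[1] == alphas.shape[0] == 10  # one coefficient vector per alpha
True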
groupyr.logistic.logistic_sgl_path(X, y, l1_ratio=0.5, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, Xy=None, normalize=False, copy_X=True, verbose=False, check_input=True, **params)[source]

Compute a Logistic SGL model for a list of regularization parameters.

This is an implementation that uses the result of the previous model to speed up computations along the regularization path, making it faster than calling LogisticSGL for the different parameters without warm start.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

Training data. Pass directly as Fortran-contiguous data to avoid unnecessary memory duplication.

y : {array-like, sparse matrix} of shape (n_samples,)

Target values.

l1_ratio : float, default=0.5

Number between 0 and 1 passed to the SGL estimator (scaling between the group lasso and lasso penalties). l1_ratio=1 corresponds to the Lasso.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.

scale_l2_by : [“group_length”, None], default=”group_length”

Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.

eps : float, default=1e-3

Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

n_alphas : int, default=100

Number of alphas along the regularization path.

alphas : ndarray, default=None

List of alphas where to compute the models. If None, alphas are set automatically.

Xy : array-like of shape (n_features,), default=None

Xy = np.dot(X.T, y) that can be precomputed. If supplying Xy, prevent train/test leakage by ensuring that Xy is precomputed using only training data.

normalize : bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.

copy_X : bool, default=True

If True, X will be copied; else, it may be overwritten.

verbose : bool or int, default=False

Amount of verbosity.

check_input : bool, default=True

If False, skip input validation checks, assuming they are handled by the caller.

**params : kwargs

Keyword arguments passed to the LogisticSGL estimator.

Returns:
coefs : ndarray of shape (n_features, n_alphas) or (n_features + 1, n_alphas)

List of coefficients for the Logistic Regression model. If fit_intercept is set to True, then the first dimension will be n_features + 1, where the last item represents the intercept.

alphas : ndarray

Grid of alphas used for cross-validation.

n_iters : array of shape (n_alphas,)

Actual number of iterations for each alpha.
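The logistic path is computed the same way; a brief sketch, assuming the three documented return values:

>>> from groupyr.logistic import logistic_sgl_path
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(random_state=42)
>>> coefs, alphas, n_iters = logistic_sgl_path(
...     X, y, l1_ratio=0.5, groups=groups, n_alphas=5
... )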

Group Transformers

These classes perform group-wise transformations on their inputs.

class groupyr.transform.GroupExtractor(select=None, groups=None, group_names=None, copy_X=False, select_intersection=False)[source]

An sklearn-compatible group extractor.

Given a sequence of all group indices and a subsequence of desired group indices, this transformer returns the columns of the feature matrix, X, that are in the desired subgroups.

Parameters:
select : numpy.ndarray, int, or str, optional

subsequence of desired groups to extract from the feature matrix. If int or sequence of ints, these will be treated as group indices. If str or sequence of str, these will be treated as labels for any level of the (potentially multi-indexed) group names, which must be specified in group_names.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.

group_names : sequence of str or sequences, optional

The names of the groups of X. If this is a sequence of strings, then this transformer will extract groups whose names match select. If this is a sequence of sequences, then this transformer will extract groups that have labels that match select at any level of their multi-index.

copy_X : bool, default=False

If True, X will be copied; else, transform may return a view.

select_intersection : bool, default=False

If True, and select is a sequence, then transform will return the group intersection of labels in select. Otherwise, transform will return the group union.
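A sketch of extracting two groups from a small feature matrix, assuming the standard sklearn fit_transform method:

>>> import numpy as np
>>> from groupyr.transform import GroupExtractor
>>> X = np.arange(18).reshape(2, 9)
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> X_sub = GroupExtractor(select=[0, 2], groups=groups).fit_transform(X)
>>> X_sub.shape  # columns of groups 0 and 2 only
(2, 6)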

class groupyr.transform.GroupRemover(select=None, groups=None, group_names=None, copy_X=False, select_intersection=False)[source]

An sklearn-compatible group remover.

Given a sequence of all group indices and a subsequence of unwanted group indices, this transformer returns the columns of the feature matrix, X, that DO NOT include the unwanted subgroups.

Parameters:
select : numpy.ndarray, int, or str, optional

subsequence of desired groups to remove from the feature matrix. If int or sequence of ints, these will be treated as group indices. If str or sequence of str, these will be treated as labels for any level of the (potentially multi-indexed) group names, which must be specified in group_names.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.

group_names : sequence of str or sequences, optional

The names of the groups of X. If this is a sequence of strings, then this transformer will remove groups whose names match select. If this is a sequence of sequences, then this transformer will remove groups that have labels that match select at any level of their multi-index.

copy_X : bool, default=False

If True, X will be copied; else, transform may return a view.

select_intersection : bool, default=False

If True, and select is a sequence, then transform will return the group intersection of labels in select. Otherwise, transform will return the group union.

class groupyr.transform.GroupShuffler(select=None, groups=None, group_names=None, random_state=None, select_intersection=False)[source]

Shuffle some groups of a feature matrix, leaving others as is.

Given a sequence of all group indices and a subsequence of group indices, this transformer returns the feature matrix, X, with the subset of groups shuffled.

Parameters:
select : numpy.ndarray, int, or str, optional

subsequence of desired groups to shuffle in the feature matrix. If int or sequence of ints, these will be treated as group indices. If str or sequence of str, these will be treated as labels for any level of the (potentially multi-indexed) group names, which must be specified in group_names.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.

group_names : sequence of str or sequences, optional

The names of the groups of X. If this is a sequence of strings, then this transformer will shuffle groups whose names match select. If this is a sequence of sequences, then this transformer will shuffle groups that have labels that match select at any level of their multi-index.

random_state : int, RandomState instance or None, optional (default=None)

If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

select_intersection : bool, default=False

If True, and select is a sequence, then transform will return the group intersection of labels in select. Otherwise, transform will return the group union.
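GroupRemover and GroupShuffler follow the same pattern as GroupExtractor; a combined sketch, again assuming the standard sklearn fit_transform method:

>>> import numpy as np
>>> from groupyr.transform import GroupRemover, GroupShuffler
>>> X = np.arange(18).reshape(2, 9)
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> X_rm = GroupRemover(select=1, groups=groups).fit_transform(X)
>>> X_rm.shape  # group 1's three columns are dropped
(2, 6)
>>> X_sh = GroupShuffler(select=0, groups=groups, random_state=0).fit_transform(X)
>>> X_sh.shape  # shape is unchanged; only group 0 is shuffled across samples
(2, 9)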

class groupyr.transform.GroupAggregator(func=None, groups=None, group_names=None, kw_args=None)[source]

Aggregate each group of a feature matrix using one or more functions.

Parameters:
func : function, str, list or dict

Function to use for aggregating the data. If a function, it must accept an axis=1 parameter. If a string, it must be part of the numpy namespace. Acceptable input types are

  • function,

  • string function name,

  • list of functions and/or function names, e.g. [np.sum, 'mean'].

If no function is specified, np.mean is used.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.

group_names : sequence of str or sequences, optional

The names of the groups of X. This parameter has no effect on the output of the transform() method. However, this transformer will keep track of the transformed feature names using group_names if provided.

kw_args : dict, optional

Additional keyword arguments to pass to func. These will be applied to all elements of func if func is a sequence. If “axis” is one of these keywords, it will be ignored and set to axis=1.

Attributes:
n_features_in_ : int

The number of features in the feature matrix input to fit().

n_features_out_ : int

The number of features in the feature matrix output by transform().

groups_ : list of np.ndarray

The validated group indices used by the transformer.

feature_names_out_ : list of str

A list of the feature names corresponding to columns of the transformed output.
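A sketch of aggregating each group with two functions; the expected output width (one column per group per function) is an assumption based on the description above:

>>> import numpy as np
>>> from groupyr.transform import GroupAggregator
>>> X = np.arange(18, dtype=float).reshape(2, 9)
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> agg = GroupAggregator(func=["mean", "max"], groups=groups)
>>> X_agg = agg.fit_transform(X)
>>> n_out = agg.n_features_out_  # expected: 3 groups x 2 functions = 6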

class groupyr.transform.GroupResampler(resample_to=1.0, groups=None, group_names=None, kind='linear')[source]

Upsample or downsample each group.

Parameters:
resample_to : int or float, default=1.0

If an int, the number of desired resampled features per group. If a float, the resampling ratio.

groups : list of numpy.ndarray

list of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.

group_names : sequence of str or sequences, optional

The names of the groups of X. This parameter has no effect on the output of the transform() method. However, this transformer will keep track of the transformed feature names using group_names if provided.

kind : str or int, optional

Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of ‘linear’, ‘nearest’, ‘nearest-up’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’, or ‘next’. ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order; ‘previous’ and ‘next’ simply return the previous or next value of the point; ‘nearest-up’ and ‘nearest’ differ when interpolating half-integers (e.g. 0.5, 1.5) in that ‘nearest-up’ rounds up and ‘nearest’ rounds down. Default is ‘linear’.

Attributes:
n_features_in_ : int

The number of features in the feature matrix input to fit().

n_features_out_ : int

The number of features in the feature matrix output by transform().

groups_ : list of np.ndarray

The validated group indices used by the transformer.

feature_names_out_ : list of str

A list of the feature names corresponding to columns of the transformed output.
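A sketch of upsampling each group by a factor of two; the expected output width is an assumption based on the resampling ratio described above:

>>> import numpy as np
>>> from groupyr.transform import GroupResampler
>>> X = np.arange(18, dtype=float).reshape(2, 9)
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> resampler = GroupResampler(resample_to=2.0, groups=groups, kind="linear")
>>> X_up = resampler.fit_transform(X)
>>> n_out = resampler.n_features_out_  # expected: 3 groups x 6 features = 18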