API Reference
Groupyr contains estimator classes that are fully compliant with the scikit-learn ecosystem. Consequently, their initialization, fit, predict, transform, and score methods will be familiar to sklearn users.
Sparse Group Lasso Estimators
These are groupyr's canonical estimators. SGL is intended for regression problems, while LogisticSGL is intended for classification problems.
- class groupyr.SGL(l1_ratio=1.0, alpha=0.0, groups=None, scale_l2_by='group_length', fit_intercept=True, max_iter=1000, tol=1e-07, warm_start=False, verbose=0, suppress_solver_warnings=True, include_solver_trace=False)[source]
An sklearn-compatible sparse group lasso regressor.
This solves the sparse group lasso [1] problem for a feature matrix partitioned into groups using the proximal gradient descent (PGD) algorithm.
- Parameters:
- l1_ratio : float, default=1.0
Hyperparameter: combination between group lasso and lasso. l1_ratio=0 gives the group lasso and l1_ratio=1 gives the lasso.
- alpha : float, default=0.0
Hyperparameter: overall regularization strength.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.
- scale_l2_by : ["group_length", None], default="group_length"
Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.
- fit_intercept : bool, default=True
Specifies whether a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).
- max_iter : int, default=1000
Maximum number of iterations for the PGD solver.
- tol : float, default=1e-7
Stopping criterion. Convergence tolerance for the copt proximal gradient solver.
- warm_start : bool, default=False
If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.
- verbose : int, default=0
Verbosity flag for the PGD solver. Any positive integer will produce verbose output.
- suppress_solver_warnings : bool, default=True
If True, suppress convergence warnings from the PGD solver. This is useful for hyperparameter tuning when some combinations of hyperparameters may not converge.
References
[1] Noah Simon, Jerome Friedman, Trevor Hastie & Robert Tibshirani, "A Sparse-Group Lasso," Journal of Computational and Graphical Statistics, vol. 22, no. 2, pp. 231-245, 2012. DOI: 10.1080/10618600.2012.681250
- Attributes:
- coef_ : array of shape (n_features,)
Estimated coefficients for the linear predictor (X @ coef_ + intercept_).
- intercept_ : float
Intercept (a.k.a. bias) added to the linear predictor.
- n_iter_ : int
Actual number of iterations used in the solver.
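Examples
A minimal usage sketch, not drawn from the package's own docs; the data, group layout, and hyperparameter values below are illustrative assumptions.
>>> import numpy as np
>>> from groupyr import SGL
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((50, 9))
>>> y = X[:, :3] @ np.array([1.0, -2.0, 3.0])  # only the first group carries signal
>>> model = SGL(groups=groups, l1_ratio=0.5, alpha=0.1).fit(X, y)
>>> model.coef_.shape  # group-sparse coefficient estimates
(9,)
>>> y_pred = model.predict(X)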
- class groupyr.LogisticSGL(l1_ratio=1.0, alpha=0.0, groups=None, scale_l2_by='group_length', fit_intercept=True, max_iter=1000, tol=1e-07, warm_start=False, verbose=0, suppress_solver_warnings=True, include_solver_trace=False)[source]
An sklearn-compatible sparse group lasso classifier.
This solves the sparse group lasso [1] problem for a feature matrix partitioned into groups using the proximal gradient descent (PGD) algorithm.
- Parameters:
- l1_ratio : float, default=1.0
Hyperparameter: combination between group lasso and lasso. l1_ratio=0 gives the group lasso and l1_ratio=1 gives the lasso.
- alpha : float, default=0.0
Hyperparameter: overall regularization strength.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.
- scale_l2_by : ["group_length", None], default="group_length"
Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.
- fit_intercept : bool, default=True
Specifies whether a constant (a.k.a. bias or intercept) should be added to the linear predictor (X @ coef + intercept).
- max_iter : int, default=1000
Maximum number of iterations for the PGD solver.
- tol : float, default=1e-7
Stopping criterion. Convergence tolerance for the copt proximal gradient solver.
- warm_start : bool, default=False
If set to True, reuse the solution of the previous call to fit as initialization for coef_ and intercept_.
- verbose : int, default=0
Verbosity flag for the PGD solver. Any positive integer will produce verbose output.
- suppress_solver_warnings : bool, default=True
If True, suppress convergence warnings from the PGD solver. This is useful for hyperparameter tuning when some combinations of hyperparameters may not converge.
References
[1] Noah Simon, Jerome Friedman, Trevor Hastie & Robert Tibshirani, "A Sparse-Group Lasso," Journal of Computational and Graphical Statistics, vol. 22, no. 2, pp. 231-245, 2012. DOI: 10.1080/10618600.2012.681250
- Attributes:
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- coef_ : array of shape (n_features,)
Estimated coefficients for the linear predictor (X @ coef_ + intercept_).
- intercept_ : float
Intercept (a.k.a. bias) added to the linear predictor.
- n_iter_ : int
Actual number of iterations used in the solver.
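Examples
A minimal classification sketch; the dataset sizes and hyperparameter values are illustrative assumptions, with data generated by groupyr's own make_group_classification (documented below).
>>> from groupyr import LogisticSGL
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(
...     n_samples=100, n_groups=5, n_informative_groups=2,
...     n_features_per_group=4, n_informative_per_group=2,
...     n_redundant_per_group=1, random_state=42)
>>> clf = LogisticSGL(groups=groups, l1_ratio=0.5, alpha=0.05).fit(X, y)
>>> labels = clf.predict(X)     # predicted class labels
>>> accuracy = clf.score(X, y)  # mean training accuracy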
Cross-validation Estimators
These estimators have built-in cross-validation capabilities to find the best values of the hyperparameters alpha and l1_ratio. These are more efficient than using the canonical estimators with grid search because they make use of warm-starting. Alternatively, you can specify tuning_strategy = "bayes" to use Bayesian optimization over the hyperparameters instead of a grid search.
- class groupyr.SGLCV(l1_ratio=1.0, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, max_iter=1000, tol=1e-07, copy_X=True, scoring=None, cv=None, verbose=False, n_jobs=None, tuning_strategy='grid', n_bayes_iter=50, n_bayes_points=1, random_state=None, suppress_solver_warnings=True)[source]
Iterative SGL model fitting along a regularization path.
See the scikit-learn glossary entry for cross-validation estimator.
- Parameters:
- l1_ratio : float or list of float, default=1.0
Float between 0 and 1 passed to SGL (scaling between group lasso and lasso penalties). For l1_ratio = 0 the penalty is the group lasso penalty. For l1_ratio = 1 it is the lasso penalty. For 0 < l1_ratio < 1, the penalty is a combination of group lasso and lasso. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values will depend on the problem. For problems where we expect strong overall sparsity and would like to encourage grouping, put more values close to 1 (i.e. lasso). In contrast, if we expect strong group-wise sparsity, but only mild sparsity within groups, put more values close to 0 (i.e. group lasso).
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.
- scale_l2_by : ["group_length", None], default="group_length"
Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.
- eps : float, default=1e-3
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
- n_alphas : int, default=100
Number of alphas along the regularization path, used for each l1_ratio.
- alphas : ndarray, default=None
List of alphas at which to compute the models. If None, alphas are set automatically.
- fit_intercept : bool, default=True
Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
- normalize : bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
- max_iter : int, default=1000
The maximum number of iterations.
- tol : float, default=1e-7
Stopping criterion. Convergence tolerance for the copt proximal gradient solver.
- scoring : str or callable, default=None
A string (see the sklearn model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). For a list of scoring functions that can be used, look at sklearn.metrics. The default scoring option is "r2".
- cv : int, cross-validation generator or iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation;
int, to specify the number of folds;
an sklearn CV splitter;
an iterable yielding (train, test) splits as arrays of indices.
For int/None inputs, sklearn.model_selection.KFold is used. Refer to the scikit-learn User Guide for the various cross-validation strategies that can be used here.
- copy_X : bool, default=True
If True, X will be copied; else, it may be overwritten.
- verbose : bool or int, default=False
Amount of verbosity.
- n_jobs : int, default=None
Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
- tuning_strategy : ["grid", "bayes"], default="grid"
Hyperparameter tuning strategy to use. If tuning_strategy == "grid", then evaluate all parameter points on the l1_ratio and alphas grid, using warm start to evaluate different alpha values along the regularization path. If tuning_strategy == "bayes", then a fixed number of parameter settings is sampled using skopt.BayesSearchCV. The fixed number of settings is set by n_bayes_iter. The l1_ratio setting is sampled uniformly from the minimum and maximum of the input l1_ratio parameter. The alpha setting is sampled log-uniformly either from the maximum and minimum of the input alphas parameter, if provided, or from eps * max_alpha to max_alpha, where max_alpha is a conservative estimate of the maximum alpha for which the solution coefficients are non-trivial.
- n_bayes_iter : int, default=50
Number of parameter settings that are sampled if using Bayes search for hyperparameter optimization. n_bayes_iter trades off runtime vs quality of the solution. Consider increasing n_bayes_points if you want to try more parameter settings in parallel.
- n_bayes_points : int, default=1
Number of parameter settings to sample in parallel if using Bayes search for hyperparameter optimization. If this does not align with n_bayes_iter, the last iteration will sample fewer points.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- suppress_solver_warnings : bool, default=True
If True, suppress warnings from BayesSearchCV when the objective is evaluated at the same point multiple times. Setting this to False may be useful for debugging.
- Attributes:
- alpha_ : float
The amount of penalization chosen by cross validation.
- l1_ratio_ : float
The compromise between l1 and l2 penalization chosen by cross validation.
- coef_ : ndarray of shape (n_features,) or (n_targets, n_features)
Parameter vector (w in the cost function formula).
- intercept_ : float or ndarray of shape (n_targets, n_features)
Independent term in the decision function.
- scoring_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)
Mean square error for the test set on each fold, varying l1_ratio and alpha.
- alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)
The grid of alphas used for fitting, for each l1_ratio.
- n_iter_ : int
Number of iterations run by the proximal gradient descent solver to reach the specified tolerance for the optimal alpha.
- bayes_optimizer_ : skopt.BayesSearchCV instance or None
The BayesSearchCV instance used for hyperparameter optimization if tuning_strategy == "bayes". If tuning_strategy == "grid", then this attribute is None.
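Examples
A minimal cross-validation sketch; sizes and candidate hyperparameter values are illustrative assumptions. Swapping in tuning_strategy="bayes" would sample the same search space with skopt.BayesSearchCV instead of evaluating the full grid.
>>> from groupyr import SGLCV
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups = make_group_regression(
...     n_samples=100, n_groups=5, n_informative_groups=2,
...     n_features_per_group=4, n_informative_per_group=2,
...     noise=1.0, random_state=42)
>>> cv_model = SGLCV(groups=groups, l1_ratio=[0.1, 0.5, 0.9],
...                  n_alphas=20, cv=3).fit(X, y)
>>> best_alpha = cv_model.alpha_      # penalty strength chosen by CV
>>> best_ratio = cv_model.l1_ratio_   # l1/l2 compromise chosen by CV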
- class groupyr.LogisticSGLCV(l1_ratio=1.0, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, fit_intercept=True, normalize=False, max_iter=1000, tol=1e-07, scoring=None, cv=None, copy_X=True, verbose=False, n_jobs=None, tuning_strategy='grid', n_bayes_iter=50, n_bayes_points=1, random_state=None, suppress_solver_warnings=True)[source]
Iterative Logistic SGL model fitting along a regularization path.
See the scikit-learn glossary entry for cross-validation estimator.
- Parameters:
- l1_ratio : float or list of float, default=1.0
Float between 0 and 1 passed to SGL (scaling between group lasso and lasso penalties). For l1_ratio = 0 the penalty is the group lasso penalty. For l1_ratio = 1 it is the lasso penalty. For 0 < l1_ratio < 1, the penalty is a combination of group lasso and lasso. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values will depend on the problem. For problems where we expect strong overall sparsity and would like to encourage grouping, put more values close to 1 (i.e. lasso). In contrast, if we expect strong group-wise sparsity, but only mild sparsity within groups, put more values close to 0 (i.e. group lasso).
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group. We set groups in __init__ so that it can be reused in model selection and CV routines.
- scale_l2_by : ["group_length", None], default="group_length"
Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.
- eps : float, default=1e-3
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
- n_alphas : int, default=100
Number of alphas along the regularization path, used for each l1_ratio.
- alphas : ndarray, default=None
List of alphas at which to compute the models. If None, alphas are set automatically.
- fit_intercept : bool, default=True
Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
- normalize : bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
- max_iter : int, default=1000
The maximum number of iterations.
- tol : float, default=1e-7
Stopping criterion. Convergence tolerance for the copt proximal gradient solver.
- scoring : str or callable, default=None
A string (see the sklearn model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). For a list of scoring functions that can be used, look at sklearn.metrics. The default scoring option used is accuracy_score.
- cv : int, cross-validation generator or iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross-validation;
int, to specify the number of folds;
an sklearn CV splitter;
an iterable yielding (train, test) splits as arrays of indices.
For int/None inputs, sklearn.model_selection.StratifiedKFold is used. Refer to the scikit-learn User Guide for the various cross-validation strategies that can be used here.
- copy_X : bool, default=True
If True, X will be copied; else, it may be overwritten.
- verbose : bool or int, default=False
Amount of verbosity.
- n_jobs : int, default=None
Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
- tuning_strategy : ["grid", "bayes"], default="grid"
Hyperparameter tuning strategy to use. If tuning_strategy == "grid", then evaluate all parameter points on the l1_ratio and alphas grid, using warm start to evaluate different alpha values along the regularization path. If tuning_strategy == "bayes", then a fixed number of parameter settings is sampled using skopt.BayesSearchCV. The fixed number of settings is set by n_bayes_iter. The l1_ratio setting is sampled uniformly from the minimum and maximum of the input l1_ratio parameter. The alpha setting is sampled log-uniformly either from the maximum and minimum of the input alphas parameter, if provided, or from eps * max_alpha to max_alpha, where max_alpha is a conservative estimate of the maximum alpha for which the solution coefficients are non-trivial.
- n_bayes_iter : int, default=50
Number of parameter settings that are sampled if using Bayes search for hyperparameter optimization. n_bayes_iter trades off runtime vs quality of the solution. Consider increasing n_bayes_points if you want to try more parameter settings in parallel.
- n_bayes_points : int, default=1
Number of parameter settings to sample in parallel if using Bayes search for hyperparameter optimization. If this does not align with n_bayes_iter, the last iteration will sample fewer points.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- suppress_solver_warnings : bool, default=True
If True, suppress warnings from BayesSearchCV when the objective is evaluated at the same point multiple times. Setting this to False may be useful for debugging.
See also
logistic_sgl_path
LogisticSGL
- Attributes:
- alpha_ : float
The amount of penalization chosen by cross validation.
- l1_ratio_ : float
The compromise between l1 and l2 penalization chosen by cross validation.
- classes_ : ndarray of shape (n_classes,)
A list of class labels known to the classifier.
- coef_ : array of shape (n_features,)
Estimated coefficients for the linear predictor (X @ coef_ + intercept_).
- intercept_ : float
Intercept (a.k.a. bias) added to the linear predictor.
- scoring_path_ : ndarray of shape (n_l1_ratio, n_alpha, n_folds)
Classification score for the test set on each fold, varying l1_ratio and alpha.
- alphas_ : ndarray of shape (n_alphas,) or (n_l1_ratio, n_alphas)
The grid of alphas used for fitting, for each l1_ratio.
- n_iter_ : int
Number of iterations run by the proximal gradient descent solver to reach the specified tolerance for the optimal alpha.
- bayes_optimizer_ : skopt.BayesSearchCV instance or None
The BayesSearchCV instance used for hyperparameter optimization if tuning_strategy == "bayes". If tuning_strategy == "grid", then this attribute is None.
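Examples
A minimal sketch along the same lines as SGLCV above; all sizes and candidate values are illustrative assumptions.
>>> from groupyr import LogisticSGLCV
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(
...     n_samples=100, n_groups=5, n_informative_groups=2,
...     n_features_per_group=4, n_informative_per_group=2,
...     n_redundant_per_group=1, random_state=42)
>>> clf = LogisticSGLCV(groups=groups, l1_ratio=[0.5, 1.0],
...                     n_alphas=10, cv=3).fit(X, y)
>>> best_alpha, best_ratio = clf.alpha_, clf.l1_ratio_
>>> accuracy = clf.score(X, y)  # mean accuracy with the CV-chosen hyperparameters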
Dataset Generation
Use these functions to generate synthetic sparse grouped data.
- groupyr.datasets.make_group_classification(n_samples=100, n_groups=20, n_informative_groups=2, n_features_per_group=20, n_informative_per_group=2, n_redundant_per_group=2, n_repeated_per_group=0, n_classes=2, n_clusters_per_class=2, weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0, scale=1.0, shuffle=True, useful_indices=False, random_state=None)[source]
Generate a random n-class sparse group classification problem.
This function is a generalization of sklearn.datasets.make_classification to feature matrices with grouped covariates. Prior to shuffling, X stacks a number of primary "informative" features, "redundant" linear combinations of these, "repeated" duplicates of sampled features, and arbitrary noise for any remaining features. This method uses sklearn.datasets.make_classification to construct a giant unshuffled classification problem of size n_groups * n_features_per_group and then distributes the returned features to each group. It then optionally shuffles each group.
- Parameters:
- n_samples : int, optional (default=100)
The number of samples.
- n_groups : int, optional (default=20)
The number of feature groups.
- n_informative_groups : int, optional (default=2)
The total number of informative groups. All other groups will be just noise.
- n_features_per_group : int, optional (default=20)
The total number of features per group. These comprise n_informative_per_group informative features, n_redundant_per_group redundant features, n_repeated_per_group duplicated features, and the remaining useless features drawn at random.
- n_informative_per_group : int, optional (default=2)
The number of informative features per group. Each class is composed of a number of gaussian clusters each located around the vertices of a hypercube in a subspace of dimension n_informative_per_group. For each cluster, informative features are drawn independently from N(0, 1) and then randomly linearly combined within each cluster in order to add covariance. The clusters are then placed on the vertices of the hypercube.
- n_redundant_per_group : int, optional (default=2)
The number of redundant features per group. These features are generated as random linear combinations of the informative features.
- n_repeated_per_group : int, optional (default=0)
The number of duplicated features per group, drawn randomly from the informative and the redundant features.
- n_classes : int, optional (default=2)
The number of classes (or labels) of the classification problem.
- n_clusters_per_class : int, optional (default=2)
The number of clusters per class.
- weights : list of floats or None (default=None)
The proportions of samples assigned to each class. If None, then classes are balanced. Note that if len(weights) == n_classes - 1, then the last class weight is automatically inferred. More than n_samples samples may be returned if the sum of weights exceeds 1.
- flip_y : float, optional (default=0.01)
The fraction of samples whose class labels are randomly exchanged. Larger values introduce noise in the labels and make the classification task harder.
- class_sep : float, optional (default=1.0)
The factor multiplying the hypercube size. Larger values spread out the clusters/classes and make the classification task easier.
- hypercube : boolean, optional (default=True)
If True, the clusters are put on the vertices of a hypercube. If False, the clusters are put on the vertices of a random polytope.
- shift : float, array of shape [n_features] or None, optional (default=0.0)
Shift features by the specified value. If None, then features are shifted by a random value drawn in [-class_sep, class_sep].
- scale : float, array of shape [n_features] or None, optional (default=1.0)
Multiply features by the specified value. If None, then features are scaled by a random value drawn in [1, 100]. Note that scaling happens after shifting.
- shuffle : boolean, optional (default=True)
Shuffle the samples and the features.
- useful_indices : boolean, optional (default=False)
If True, a boolean array indicating useful features is returned.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- Returns:
- X : array of shape [n_samples, n_features]
The generated samples.
- y : array of shape [n_samples]
The integer labels for class membership of each sample.
- groups : list of arrays
Each element is an array of feature indices that belong to that group.
- indices : array of shape [n_features]
A boolean array indicating which features are useful. Returned only if useful_indices is True.
See also
sklearn.datasets.make_classification
non-group-sparse version
sklearn.datasets.make_blobs
simplified variant
sklearn.datasets.make_multilabel_classification
unrelated generator for multilabel tasks
Notes
The algorithm is adapted from Guyon [1] and was designed to generate the “Madelon” dataset.
References
[1] I. Guyon, "Design of experiments for the NIPS 2003 variable selection benchmark", 2003.
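Examples
A minimal sketch; the sizes below are illustrative. With useful_indices=True, a boolean indicator of useful features is returned as a fourth value.
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups, idx = make_group_classification(
...     n_samples=200, n_groups=10, n_informative_groups=3,
...     n_features_per_group=10, n_informative_per_group=3,
...     useful_indices=True, random_state=0)
>>> X.shape  # n_groups * n_features_per_group columns
(200, 100)
>>> len(groups)
10
>>> n_useful = idx.sum()  # count of useful features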
- groupyr.datasets.make_group_regression(n_samples=100, n_groups=20, n_informative_groups=5, n_features_per_group=20, n_informative_per_group=5, effective_rank=None, noise=0.0, shift=0.0, scale=1.0, shuffle=False, coef=False, random_state=None)[source]
Generate a sparse group regression problem.
This function is a generalization of sklearn.datasets.make_regression to feature matrices with grouped covariates. Prior to shuffling, X stacks a number of primary "informative" features and arbitrary noise for any remaining features. This method uses sklearn.datasets.make_regression to construct a giant unshuffled regression problem of size n_groups * n_features_per_group and then distributes the returned features to each group. It then optionally shuffles each group.
- Parameters:
- n_samples : int, optional (default=100)
The number of samples.
- n_groups : int, optional (default=20)
The number of feature groups.
- n_informative_groups : int, optional (default=5)
The total number of informative groups. All other groups will be just noise.
- n_features_per_group : int, optional (default=20)
The total number of features per group. These comprise n_informative_per_group informative features, with the remaining useless features drawn at random.
- n_informative_per_group : int, optional (default=5)
The number of informative features per group that have a non-zero regression coefficient.
- effective_rank : int or None, optional (default=None)
If not None, provides the number of singular vectors to explain the input data.
- noise : float, optional (default=0.0)
The standard deviation of the gaussian noise applied to the output.
- shuffle : boolean, optional (default=False)
Shuffle the samples and the features.
- coef : boolean, optional (default=False)
If True, returns coefficient values used to generate samples via sklearn.datasets.make_regression.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- Returns:
- X : array of shape [n_samples, n_features]
The generated samples.
- y : array of shape [n_samples]
The regression target for each sample.
- groups : list of arrays
Each element is an array of feature indices that belong to that group.
- coef : array of shape [n_features]
A numpy array containing true regression coefficient values. Returned only if coef is True.
See also
sklearn.datasets.make_regression
non-group-sparse version
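Examples
A minimal sketch; the sizes below are illustrative. With coef=True, the true coefficients are returned as a fourth value.
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups, coef = make_group_regression(
...     n_samples=200, n_groups=10, n_informative_groups=3,
...     n_features_per_group=10, n_informative_per_group=3,
...     noise=5.0, coef=True, random_state=0)
>>> X.shape
(200, 100)
>>> n_nonzero = (coef != 0).sum()  # informative groups x informative per group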
Regularization Paths
Use these functions to compute regression coefficients along a regularization path.
- groupyr.sgl_path(X, y, l1_ratio=0.5, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, Xy=None, normalize=False, copy_X=True, verbose=False, return_n_iter=False, check_input=True, **params)[source]
Compute sparse group lasso path.
We use the previous solution as the initial guess for subsequent alpha values.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data. Pass directly as Fortran-contiguous data to avoid unnecessary memory duplication.
- y : {array-like, sparse matrix} of shape (n_samples,)
Target values.
- l1_ratio : float, default=0.5
Number between 0 and 1 passed to the SGL estimator (scaling between the group lasso and lasso penalties). l1_ratio=1 corresponds to the lasso.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.
- scale_l2_by : ["group_length", None], default="group_length"
Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.
- eps : float, default=1e-3
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
- n_alphas : int, default=100
Number of alphas along the regularization path.
- alphas : ndarray, default=None
List of alphas at which to compute the models. If None, alphas are set automatically.
- Xy : array-like of shape (n_features,), default=None
Xy = np.dot(X.T, y), which can be precomputed. If supplying Xy, prevent train/test leakage by ensuring that Xy is precomputed using only training data.
- normalize : bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
- copy_X : bool, default=True
If True, X will be copied; else, it may be overwritten.
- verbose : bool or int, default=False
Amount of verbosity.
- return_n_iter : bool, default=False
Whether to return the number of iterations for each alpha.
- check_input : bool, default=True
If False, skip input validation checks, assuming they are handled by the caller.
- **params : kwargs
Keyword arguments passed to the SGL estimator.
- Returns:
- coefs : ndarray of shape (n_features, n_alphas) or (n_features + 1, n_alphas)
List of coefficients for the SGL model. If fit_intercept is set to True, then the first dimension will be n_features + 1, where the last item represents the intercept.
- alphas : ndarray of shape (n_alphas,)
The alphas along the path where models are computed.
- n_iters : array of shape (n_alphas,)
Actual number of iterations for each alpha. Returned only if return_n_iter is True.
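Examples
A minimal sketch; sizes and l1_ratio are illustrative assumptions. By default the function returns (coefs, alphas); passing return_n_iter=True adds the per-alpha iteration counts.
>>> from groupyr import sgl_path
>>> from groupyr.datasets import make_group_regression
>>> X, y, groups = make_group_regression(
...     n_samples=100, n_groups=5, n_informative_groups=2,
...     n_features_per_group=4, n_informative_per_group=2, random_state=0)
>>> coefs, alphas = sgl_path(X, y, l1_ratio=0.5, groups=groups, n_alphas=10)
>>> coefs.shape[1] == alphas.shape[0] == 10  # one coefficient vector per alpha
True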
- groupyr.logistic.logistic_sgl_path(X, y, l1_ratio=0.5, groups=None, scale_l2_by='group_length', eps=0.001, n_alphas=100, alphas=None, Xy=None, normalize=False, copy_X=True, verbose=False, check_input=True, **params)[source]
Compute a Logistic SGL model for a list of regularization parameters.
This is an implementation that uses the result of the previous model to speed up computations along the regularization path, making it faster than calling LogisticSGL for the different parameters without warm start.
- Parameters:
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data. Pass directly as Fortran-contiguous data to avoid unnecessary memory duplication.
- y : {array-like, sparse matrix} of shape (n_samples,)
Target values.
- l1_ratio : float, default=0.5
Number between 0 and 1 passed to the SGL estimator (scaling between the group lasso and lasso penalties). l1_ratio=1 corresponds to the lasso.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.
- scale_l2_by : ["group_length", None], default="group_length"
Scaling technique for the group-wise L2 penalty. By default, scale_l2_by="group_length" and the L2 penalty is scaled by the square root of the group length so that each variable has the same effect on the penalty. This may not be appropriate for one-hot encoded features, and scale_l2_by=None would be more appropriate for that case. scale_l2_by=None will also reproduce ElasticNet results when all features belong to one group.
- eps : float, default=1e-3
Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
- n_alphas : int, default=100
Number of alphas along the regularization path.
- alphas : ndarray, default=None
List of alphas at which to compute the models. If None, alphas are set automatically.
- Xy : array-like of shape (n_features,), default=None
Xy = np.dot(X.T, y), which can be precomputed. If supplying Xy, prevent train/test leakage by ensuring that Xy is precomputed using only training data.
- normalize : bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
- copy_X : bool, default=True
If True, X will be copied; else, it may be overwritten.
- verbose : bool or int, default=False
Amount of verbosity.
- check_input : bool, default=True
If False, skip input validation checks, assuming they are handled by the caller.
- **params : kwargs
Keyword arguments passed to the LogisticSGL estimator.
- Returns:
- coefs : ndarray of shape (n_features, n_alphas) or (n_features + 1, n_alphas)
List of coefficients for the Logistic Regression model. If fit_intercept is set to True, then the first dimension will be n_features + 1, where the last item represents the intercept.
- alphas : ndarray
Grid of alphas used for cross-validation.
- n_iters : array of shape (n_alphas,)
Actual number of iterations for each alpha.
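Examples
A minimal sketch; sizes and l1_ratio are illustrative assumptions.
>>> from groupyr.logistic import logistic_sgl_path
>>> from groupyr.datasets import make_group_classification
>>> X, y, groups = make_group_classification(
...     n_samples=100, n_groups=5, n_informative_groups=2,
...     n_features_per_group=4, n_informative_per_group=2,
...     n_redundant_per_group=1, random_state=0)
>>> coefs, alphas, n_iters = logistic_sgl_path(
...     X, y, l1_ratio=0.5, groups=groups, n_alphas=10)
>>> coefs.shape[1]  # one coefficient vector per alpha
10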
Group Transformers
These classes perform group-wise transformations on their inputs.
- class groupyr.transform.GroupExtractor(select=None, groups=None, group_names=None, copy_X=False, select_intersection=False)[source]
An sklearn-compatible group extractor.
Given a sequence of all group indices and a subsequence of desired group indices, this transformer returns the columns of the feature matrix, X, that are in the desired subgroups.
- Parameters:
- select : numpy.ndarray, int, or str, optional
Subsequence of desired groups to extract from the feature matrix. If int or sequence of ints, these will be treated as group indices. If str or sequence of str, these will be treated as labels for any level of the (potentially multi-indexed) group names, which must be specified in group_names.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.
- group_names : sequence of str or sequences, optional
The names of the groups of X. If this is a sequence of strings, then this transformer will extract groups whose names match select. If this is a sequence of sequences, then this transformer will extract groups that have labels that match select at any level of their multi-index.
- copy_X : bool, default=False
If True, X will be copied; else, transform may return a view.
- select_intersection : bool, default=False
If True, and select is a sequence, then transform will return the group intersection of labels in select. Otherwise, transform will return the group union.
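Examples
A minimal sketch with illustrative data: keep only the features belonging to groups 0 and 2.
>>> import numpy as np
>>> from groupyr.transform import GroupExtractor
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> X = np.arange(18).reshape(2, 9)
>>> GroupExtractor(select=[0, 2], groups=groups).fit_transform(X).shape
(2, 6)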
- class groupyr.transform.GroupRemover(select=None, groups=None, group_names=None, copy_X=False, select_intersection=False)[source]
An sklearn-compatible group remover.
Given a sequence of all group indices and a subsequence of unwanted group indices, this transformer returns the columns of the feature matrix, X, that DO NOT include the unwanted subgroups.
- Parameters:
- select : numpy.ndarray, int, or str, optional
Subsequence of desired groups to remove from the feature matrix. If int or sequence of ints, these will be treated as group indices. If str or sequence of str, these will be treated as labels for any level of the (potentially multi-indexed) group names, which must be specified in group_names.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.
- group_names : sequence of str or sequences, optional
The names of the groups of X. If this is a sequence of strings, then this transformer will remove groups whose names match select. If this is a sequence of sequences, then this transformer will remove groups that have labels that match select at any level of their multi-index.
- copy_X : bool, default=False
If True, X will be copied; else, transform may return a view.
- select_intersection : bool, default=False
If True, and select is a sequence, then transform will return the group intersection of labels in select. Otherwise, transform will return the group union.
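Examples
A minimal sketch with illustrative data and hypothetical group names: drop one group by name.
>>> import numpy as np
>>> from groupyr.transform import GroupRemover
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> remover = GroupRemover(select="g1", groups=groups,
...                        group_names=["g0", "g1", "g2"])
>>> X = np.arange(18).reshape(2, 9)
>>> remover.fit_transform(X).shape  # six columns remain after removing g1
(2, 6)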
- class groupyr.transform.GroupShuffler(select=None, groups=None, group_names=None, random_state=None, select_intersection=False)[source]
Shuffle some groups of a feature matrix, leaving others as is.
Given a sequence of all group indices and a subsequence of group indices, this transformer returns the feature matrix, X, with the subset of groups shuffled.
- Parameters:
- select : numpy.ndarray, int, or str, optional
Subsequence of desired groups to shuffle in the feature matrix. If int or sequence of ints, these will be treated as group indices. If str or sequence of str, these will be treated as labels for any level of the (potentially multi-indexed) group names, which must be specified in group_names.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.
- group_names : sequence of str or sequences, optional
The names of the groups of X. If this is a sequence of strings, then this transformer will shuffle groups whose names match select. If this is a sequence of sequences, then this transformer will shuffle groups that have labels that match select at any level of their multi-index.
- random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
- select_intersection : bool, default=False
If True, and select is a sequence, then transform will return the group intersection of labels in select. Otherwise, transform will return the group union.
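Examples
A minimal sketch with illustrative data: shuffle group 1 across samples while leaving the other groups untouched, e.g. for permutation-style importance tests.
>>> import numpy as np
>>> from groupyr.transform import GroupShuffler
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> X = np.random.default_rng(0).standard_normal((5, 9))
>>> shuffler = GroupShuffler(select=1, groups=groups, random_state=0)
>>> X_shuffled = shuffler.fit_transform(X)
>>> np.allclose(X[:, [0, 1, 2]], X_shuffled[:, [0, 1, 2]])  # group 0 unchanged
True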
- class groupyr.transform.GroupAggregator(func=None, groups=None, group_names=None, kw_args=None)[source]
Aggregate each group of a feature matrix using one or more functions.
- Parameters:
- func : function, str, list or dict
Function to use for aggregating the data. If a function, it must accept an axis=1 parameter. If a string, it must be part of the numpy namespace. Acceptable input types are a function, a string function name, or a list of functions and/or function names, e.g. [np.sum, 'mean']. If no function is specified, np.mean is used.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.
- group_names : sequence of str or sequences, optional
The names of the groups of X. This parameter has no effect on the output of the transform() method. However, this transformer will keep track of the transformed feature names using group_names if provided.
- kw_args : dict, optional
Additional keyword arguments to pass to func. These will be applied to all elements of func if func is a sequence. If "axis" is one of these keywords, it will be ignored and set to axis=1.
- Attributes:
- n_features_in_ : int
The number of features in the feature matrix input to fit().
- n_features_out_ : int
The number of features in the feature matrix output by transform().
- groups_ : list of np.ndarray
The validated group indices used by the transformer.
- feature_names_out_ : list of str
A list of the feature names corresponding to columns of the transformed output.
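Examples
A minimal sketch with illustrative data: reduce each three-feature group to its mean and max, yielding two output features per group.
>>> import numpy as np
>>> from groupyr.transform import GroupAggregator
>>> groups = [np.array([0, 1, 2]), np.array([3, 4, 5]), np.array([6, 7, 8])]
>>> X = np.arange(18, dtype=float).reshape(2, 9)
>>> agg = GroupAggregator(func=["mean", "max"], groups=groups)
>>> agg.fit_transform(X).shape  # 3 groups x 2 aggregating functions
(2, 6)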
- class groupyr.transform.GroupResampler(resample_to=1.0, groups=None, group_names=None, kind='linear')[source]
Upsample or downsample each group.
- Parameters:
- resample_to : int or float, default=1.0
If an int, the number of desired resampled features per group. If a float, the resampling ratio.
- groups : list of numpy.ndarray
List of arrays of non-overlapping indices for each group. For example, if nine features are grouped into equal contiguous groups of three, then groups would be [array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]. If the feature matrix contains a bias or intercept feature, do not include it as a group. If None, all features will belong to one group.
- group_names : sequence of str or sequences, optional
The names of the groups of X. This parameter has no effect on the output of the transform() method. However, this transformer will keep track of the transformed feature names using group_names if provided.
- kind : str or int, optional (default='linear')
Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of 'linear', 'nearest', 'nearest-up', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', or 'next'. 'zero', 'slinear', 'quadratic' and 'cubic' refer to a spline interpolation of zeroth, first, second or third order; 'previous' and 'next' simply return the previous or next value of the point; 'nearest-up' and 'nearest' differ when interpolating half-integers (e.g. 0.5, 1.5) in that 'nearest-up' rounds up and 'nearest' rounds down.
- Attributes:
- n_features_in_ : int
The number of features in the feature matrix input to fit().
- n_features_out_ : int
The number of features in the feature matrix output by transform().
- groups_ : list of np.ndarray
The validated group indices used by the transformer.
- feature_names_out_ : list of str
A list of the feature names corresponding to columns of the transformed output.
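Examples
A minimal sketch with illustrative data: upsample each four-feature group by a factor of two using linear interpolation.
>>> import numpy as np
>>> from groupyr.transform import GroupResampler
>>> groups = [np.array([0, 1, 2, 3]), np.array([4, 5, 6, 7])]
>>> X = np.random.default_rng(0).standard_normal((3, 8))
>>> resampler = GroupResampler(resample_to=2.0, groups=groups, kind="linear")
>>> resampler.fit_transform(X).shape  # each group grows from 4 to 8 features
(3, 16)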