Title: | Sparse Multi-Type Regularized Feature Modeling |
---|---|
Description: | Implementation of the SMuRF algorithm of Devriendt et al. (2021) <doi:10.1016/j.insmatheco.2020.11.010> to fit generalized linear models (GLMs) with multiple types of predictors via regularized maximum likelihood. |
Authors: | Tom Reynkens [aut, cre], Sander Devriendt [aut], Katrien Antonio [aut] |
Maintainer: | Tom Reynkens <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.5 |
Built: | 2024-11-10 04:00:16 UTC |
Source: | https://gitlab.com/treynkens/smurf |
Function to extract the coefficients of the re-estimated model. coefficients_reest is an alias for it.
coef_reest(object, ...)

## S3 method for class 'glmsmurf'
coef_reest(object, ...)

coefficients_reest(object, ...)

## S3 method for class 'glmsmurf'
coefficients_reest(object, ...)
object |
An object for which the extraction of model coefficients is meaningful. E.g. an object of class 'glmsmurf'. |
... |
Additional arguments which are currently ignored. |
A vector containing the coefficients of the re-estimated model in object, when they are available, or, otherwise, the coefficients of the estimated model in object with a warning.
coef.glmsmurf, coef, summary.glmsmurf, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
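The fallback described for coef_reest can be pictured with a small base-R sketch. It is not the package's source code: extract_reest is a hypothetical helper, and the toy lists only mimic the coefficients and coefficients.reest components documented in glmsmurf-class.

```r
# Hypothetical helper mimicking the documented fallback of coef_reest():
# return the re-estimated coefficients when present, otherwise warn and
# return the coefficients of the estimated model.
extract_reest <- function(object) {
  if (!is.null(object[["coefficients.reest"]])) {
    return(object[["coefficients.reest"]])
  }
  warning("Re-estimated coefficients are not available; ",
          "returning the estimated coefficients.")
  object[["coefficients"]]
}

# Toy objects (plain lists, not real 'glmsmurf' fits)
with.reest <- list(coefficients = c(a = 0.9, b = 0),
                   coefficients.reest = c(a = 1.1, b = 0))
without.reest <- list(coefficients = c(a = 0.9, b = 0))

# Re-estimated coefficients are available: they are returned silently
extract_reest(with.reest)

# Not available: the estimated coefficients are returned with a warning,
# which is caught and reported here
res <- withCallingHandlers(
  extract_reest(without.reest),
  warning = function(w) {
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
res
```

With a real fit, calling coef_reest(fit) directly is enough; the sketch only makes the two branches visible.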
Function to extract the coefficients of the estimated model. coefficients is an alias for it.
## S3 method for class 'glmsmurf'
coef(object, ...)

## S3 method for class 'glmsmurf'
coefficients(object, ...)
object |
An object of class 'glmsmurf'. |
... |
Additional arguments which are currently ignored. |
A vector containing the coefficients of the estimated model in object.
coef_reest, coef, summary.glmsmurf, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
Function to extract the deviance of the re-estimated model.
deviance_reest(object, ...)

## S3 method for class 'glmsmurf'
deviance_reest(object, ...)
object |
An object for which the extraction of the deviance is meaningful. E.g. an object of class 'glmsmurf'. |
... |
Additional arguments which are currently ignored. |
The deviance of the re-estimated model in object, when it is available or, otherwise, the deviance of the estimated model in object with a warning.
deviance.glmsmurf, deviance, summary.glmsmurf, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
Function to extract the deviance of the estimated model.
## S3 method for class 'glmsmurf'
deviance(object, ...)
object |
An object of class 'glmsmurf'. |
... |
Additional arguments which are currently ignored. |
The deviance of the estimated model in object.
deviance_reest, deviance, summary.glmsmurf, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
Function to extract the fitted values of the re-estimated model.
fitted_reest(object, ...)

## S3 method for class 'glmsmurf'
fitted_reest(object, ...)
object |
An object for which the extraction of fitted values is meaningful. E.g. an object of class 'glmsmurf'. |
... |
Additional arguments which are currently ignored. |
A vector containing the fitted values of the re-estimated model in object, when they are available or, otherwise, the fitted values of the estimated model in object with a warning.
fitted.glmsmurf, fitted, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
Function to extract the fitted values of the estimated model.
## S3 method for class 'glmsmurf'
fitted(object, ...)
object |
An object of class 'glmsmurf'. |
... |
Additional arguments which are currently ignored. |
A vector containing the fitted values of the estimated model in object.
fitted_reest, fitted, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
SMuRF algorithm to fit a generalized linear model (GLM) with multiple types of predictors via regularized maximum likelihood. glmsmurf.fit contains the fitting function for a given design matrix.
glmsmurf(
  formula, family, data, weights, start, offset,
  lambda, lambda1 = 0, lambda2 = 0,
  pen.weights, adj.matrix,
  standardize = TRUE, control = list(),
  x.return = FALSE, y.return = TRUE, pen.weights.return = FALSE
)

glmsmurf.fit(
  X, y, weights, start, offset, family,
  pen.cov, n.par.cov, group.cov, refcat.cov,
  lambda, lambda1 = 0, lambda2 = 0,
  pen.weights, adj.matrix,
  standardize = TRUE, control = list(),
  formula = NULL, data = NULL,
  x.return = FALSE, y.return = FALSE, pen.weights.return = FALSE
)
formula |
A |
family |
A |
data |
A data frame containing the model response and predictors for |
weights |
An optional vector of prior weights to use in the likelihood. It should be a numeric vector of length |
start |
A vector containing the starting values for the coefficients. It should either be a numeric vector
of length |
offset |
A vector containing the offset for the model. It should be a vector of size |
lambda |
Either the penalty parameter, a positive number; or a string describing the method and measure used to select the penalty parameter:
E.g. |
lambda1 |
The penalty parameter for the |
lambda2 |
The penalty parameter for the |
pen.weights |
Either a string describing the method to compute the penalty weights:
or a list with the penalty weight vector per predictor. This list should have length equal to the number of predictors and predictor names as element names. |
adj.matrix |
A named list containing the adjacency matrices (a.k.a. neighbor matrices) for each of the predictors with a Graph-Guided Fused Lasso penalty. The list elements should have the names of the corresponding predictors. If only one predictor has a Graph-Guided Fused Lasso penalty, it is also possible to only give the adjacency matrix itself (not in a list). |
standardize |
Logical indicating if predictors with a Lasso or Group Lasso penalty are standardized, default is TRUE. |
control |
A list of parameters used in the fitting process. This is passed to glmsmurf.control. |
x.return |
Logical indicating if the used model matrix should be returned in the output object, default is FALSE. |
y.return |
Logical indicating if the used response vector should be returned in the output object, default is TRUE in glmsmurf and FALSE in glmsmurf.fit. |
pen.weights.return |
Logical indicating if the list of the used penalty weight vector per predictor should be returned in the output object, default is FALSE. |
X |
Only for |
y |
Only for |
pen.cov |
Only for |
n.par.cov |
Only for |
group.cov |
Only for |
refcat.cov |
Only for |
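The adj.matrix argument described above expects a named list of adjacency matrices. A minimal base-R sketch of building such a list follows. The predictor name district, its level names and the neighbour pairs are hypothetical, and labelling the rows and columns with the factor levels is an assumption rather than a documented requirement.

```r
# Toy factor with four districts (hypothetical level names)
district <- factor(c("north", "east", "south", "west", "east", "north"))
lev <- levels(district)

# Square 0/1 matrix: a 1 means that two levels are neighbours
adj <- matrix(0, nrow = length(lev), ncol = length(lev),
              dimnames = list(lev, lev))

# Hypothetical neighbour pairs, one pair per row
neighbours <- rbind(c("north", "east"),
                    c("east", "south"),
                    c("south", "west"),
                    c("west", "north"))
adj[neighbours] <- 1
adj[neighbours[, 2:1]] <- 1   # mirror the pairs to keep the matrix symmetric

# Named list with one element per predictor that has a
# Graph-Guided Fused Lasso penalty
adj.list <- list(district = adj)
adj.list
```

As stated in the argument description, when only one predictor has a Graph-Guided Fused Lasso penalty the matrix adj could also be passed on its own instead of the list.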
See the package vignette for more details and a complete description of a use case.
As a user, it is important to take the following into account:
The estimated coefficients are rounded to 7 digits.
The cross-validation folds are not deterministic. The validation sample for selecting lambda out-of-sample is determined at random when no indices are provided in 'validation.index' in the control object argument. In these cases, the selected value of lambda is hence not deterministic. When selecting lambda in-sample, or out-of-sample when indices are provided in 'validation.index' in the control object argument, the selected value of lambda is deterministic.
The glmsmurf function can handle many use cases and is preferred for general use.
The glmsmurf.fit function requires a more thorough understanding of the package internals and should hence be used with care!
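The note on non-deterministic selection of lambda suggests fixing the validation sample up front. The sketch below does this on simulated data. The toy data frame, the seed and the variable names are hypothetical. The guarded call additionally assumes that the smurf package is installed and that "oos.dev" is an accepted string for lambda (out-of-sample selection with the deviance as measure), so treat it as an illustration rather than a reference call.

```r
set.seed(2021)   # hypothetical seed, only to make the toy data reproducible

# Toy data (hypothetical): Gaussian response and one factor with ordered levels
n <- 200
toy <- data.frame(
  y = rnorm(n),
  g = factor(sample(1:5, n, replace = TRUE))
)

# Fix the validation sample once, so that out-of-sample selection of lambda
# no longer depends on a random draw made during fitting
val.idx <- sort(sample.int(n, size = floor(0.2 * n)))
ctrl <- list(validation.index = val.idx)

# Assumption: 'smurf' is installed and lambda = "oos.dev" is valid
if (requireNamespace("smurf", quietly = TRUE)) {
  library(smurf)
  fit <- glmsmurf(y ~ p(g, pen = "flasso"), family = gaussian(), data = toy,
                  pen.weights = "glm.stand", lambda = "oos.dev",
                  control = ctrl)
  fit$lambda
}
```

Storing val.idx alongside the analysis code makes it possible to reproduce the selected value of lambda later.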
An object of class 'glmsmurf' is returned. See glmsmurf-class for more details about this class and its generic functions.
Devriendt, S., Antonio, K., Reynkens, T. and Verbelen, R. (2021). "Sparse Regression with Multi-type Regularized Feature Modeling", Insurance: Mathematics and Economics, 96, 248–261. <doi:10.1016/j.insmatheco.2020.11.010>.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press.
glmsmurf-class, glmsmurf.control, p, glm
# Munich rent data from catdata package
data("rent", package = "catdata")

# The considered predictors are the same as in
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro

# Urban district in Munich
rent$area <- as.factor(rent$area)

# Decade of construction
rent$year <- as.factor(floor(rent$year / 10) * 10)

# Number of rooms
rent$rooms <- as.factor(rent$rooms)

# Quality of the house with levels "fair", "good" and "excellent"
rent$quality <- as.factor(rent$good + 2 * rent$best)
levels(rent$quality) <- c("fair", "good", "excellent")

# Floor space divided in categories (0, 30), [30, 40), ..., [130, 140)
sizeClasses <- c(0, seq(30, 140, 10))
rent$size <- as.factor(sizeClasses[findInterval(rent$size, sizeClasses)])

# Is warm water present?
rent$warm <- factor(rent$warm, labels = c("yes", "no"))

# Is central heating present?
rent$central <- factor(rent$central, labels = c("yes", "no"))

# Does the bathroom have tiles?
rent$tiles <- factor(rent$tiles, labels = c("yes", "no"))

# Is there special furniture in the bathroom?
rent$bathextra <- factor(rent$bathextra, labels = c("no", "yes"))

# Is the kitchen well-equipped?
rent$kitchen <- factor(rent$kitchen, labels = c("no", "yes"))

# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
# 'year', 'rooms', 'quality' and 'size' with Fused Lasso penalties,
# and the other predictors with Lasso penalties.
formu <- rentm ~ p(area, pen = "gflasso") +
  p(year, pen = "flasso") + p(rooms, pen = "flasso") +
  p(quality, pen = "flasso") + p(size, pen = "flasso") +
  p(warm, pen = "lasso") + p(central, pen = "lasso") +
  p(tiles, pen = "lasso") + p(bathextra, pen = "lasso") +
  p(kitchen, pen = "lasso")

# Fit a multi-type regularized GLM using the SMuRF algorithm.
# We use standardization adaptive penalty weights based on an initial GLM fit.
# The value for lambda is selected using cross-validation
# (with the deviance as loss measure and the one standard error rule),
# see example(plot_lambda)
munich.fit <- glmsmurf(formula = formu, family = gaussian(), data = rent,
                       pen.weights = "glm.stand", lambda = 0.02)

####
# S3 methods for glmsmurf objects

# Model summary
summary(munich.fit)

# Get coefficients of estimated model
coef(munich.fit)
# Get coefficients of re-estimated model
coef_reest(munich.fit)

# Plot coefficients of estimated model
plot(munich.fit)
# Plot coefficients of re-estimated model
plot_reest(munich.fit)

# Get deviance of estimated model
deviance(munich.fit)
# Get deviance of re-estimated model
deviance_reest(munich.fit)

# Get fitted values of estimated model
fitted(munich.fit)
# Get fitted values of re-estimated model
fitted_reest(munich.fit)

# Get predicted values of estimated model on scale of linear predictors
predict(munich.fit, type = "link")
# Get predicted values of re-estimated model on scale of linear predictors
predict_reest(munich.fit, type = "link")

# Get deviance residuals of estimated model
residuals(munich.fit, type = "deviance")
# Get deviance residuals of re-estimated model
residuals_reest(munich.fit, type = "deviance")
The functions glmsmurf and glmsmurf.fit return objects of the S3 class 'glmsmurf' which partially inherits from the 'glm' and 'lm' classes.
An object of class 'glmsmurf' is a list with at least the following components:
coefficients |
Coefficients of the estimated model. |
residuals |
Working residuals of the estimated model, see |
fitted.values |
Fitted mean values of the estimated model |
rank |
Numeric rank of the estimated model, i.e. the number of unique non-zero coefficients. |
family |
The used |
linear.predictors |
Linear fit of the estimated model on the link scale |
deviance |
Deviance of the estimated model: minus twice the log-likelihood, up to a constant. |
aic |
Akaike Information Criterion of the estimated model: |
bic |
Bayesian Information Criterion of the estimated model: |
gcv |
Generalized Cross-Validation score of the estimated model: |
null.deviance |
Deviance of the null model, i.e. the model with only an intercept and offset. |
df.residual |
Residual degrees of freedom of the estimated model, i.e. the number of observations (excluding those with weight 0) minus the rank of the estimated model. |
df.null |
Residual degrees of freedom for the null model, i.e. the number of observations (excluding those with weight 0) minus the rank of the null model. |
obj.fun |
Value of the objective function of the estimated model: minus the regularized scaled log-likelihood of the estimated model. |
weights |
The prior weights that were initially supplied.
Note that they are called |
offset |
The used offset vector. |
lambda |
The used penalty parameter: initially supplied by the user, or selected in-sample, out-of-sample or using cross-validation. |
lambda1 |
The used penalty parameter for the |
lambda2 |
The used penalty parameter for the |
iter |
The number of iterations that are performed to fit the model. |
converged |
An integer code indicating whether the algorithm converged successfully:
|
final.stepsize |
Final step size used in the algorithm. |
n.par.cov |
List with number of parameters to estimate per predictor (covariate). |
pen.cov |
List with penalty type per predictor (covariate). |
group.cov |
List with group of each predictor (covariate) for Group Lasso where 0 means no group. |
refcat.cov |
List with number of the reference category in the original order of the levels of each predictor (covariate) where 0 indicates no reference category. |
control |
The used control list, see |
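Several of the components above (deviance, df.residual, df.null, weights) follow the conventions of ordinary glm objects. The base-R sketch below illustrates those conventions on a plain stats::glm fit with simulated, hypothetical data. It does not use smurf, and note that the rank of a 'glmsmurf' object is defined differently (number of unique non-zero coefficients) than the matrix rank reported by glm.

```r
set.seed(1)   # hypothetical seed for the toy data

# Toy Gaussian data; five observations receive prior weight 0
toy <- data.frame(x = rnorm(50))
toy$y <- 1 + 2 * toy$x + rnorm(50)
w <- c(rep(1, 45), rep(0, 5))

fit <- glm(y ~ x, family = gaussian(), data = toy, weights = w)

# Observations with weight 0 are excluded from the degrees of freedom
n.used <- sum(w > 0)
fit$rank
fit$df.residual   # n.used minus the rank of the model
fit$df.null       # n.used minus 1 (intercept-only null model)

# For the Gaussian family the deviance is the weighted residual sum of squares
rss <- sum(w * (toy$y - fit$fitted.values)^2)
all.equal(fit$deviance, rss)
```

The same arithmetic, number of observations with non-zero weight minus the rank, is what the descriptions of df.residual and df.null above state for 'glmsmurf' objects.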
Optionally, the following elements are also included:
X |
The model matrix, only returned when the argument x.return is TRUE. |
y |
The response vector, only returned when the argument y.return is TRUE. |
pen.weights |
List with the vector of penalty weights per predictor (covariate), only returned when the argument pen.weights.return is TRUE. |
When the model is re-estimated, i.e. reest = TRUE in glmsmurf.control, the following components are also present:
glm.reest |
Output from the call to |
coefficients.reest |
Coefficients of the re-estimated model. |
residuals.reest |
Working residuals of the re-estimated model. |
fitted.values.reest |
Fitted mean values of the re-estimated model. |
rank.reest |
Numeric rank of the re-estimated model, i.e. the number of unique non-zero re-estimated coefficients. |
linear.predictors.reest |
Linear fit of the re-estimated model on the link scale. |
deviance.reest |
Deviance of the re-estimated model. |
aic.reest |
AIC of the re-estimated model. |
bic.reest |
BIC of the re-estimated model. |
gcv.reest |
GCV score of the re-estimated model. |
df.residual.reest |
Residual degrees of freedom of the re-estimated model. |
obj.fun.reest |
Value of the objective function of the re-estimated model: minus the regularized scaled log-likelihood of the re-estimated model. |
X.reest |
The model matrix used in the re-estimation, only returned when the argument x.return is TRUE. |
When lambda is not given as input but selected in-sample, out-of-sample or using cross-validation, i.e. the lambda argument in glmsmurf or glmsmurf.fit is a string describing the selection method, the following components are also present:
lambda.method |
Method (in-sample, out-of-sample or cross-validation (possibly with the one standard error rule)) and measure (AIC, BIC, GCV score, deviance, MSE or DSS) used to select |
lambda.vector |
Vector of |
lambda.measures |
List with for each of the relevant measures a matrix containing for each considered value of |
lambda.coefficients |
Matrix containing for each considered value of |
When the object is output from glmsmurf, the following elements are also included:
call |
The matched call. |
formula |
The supplied formula. |
terms |
The |
contrasts |
The contrasts used (when relevant). |
xlevels |
The levels of the factors used in fitting (when relevant). |
The following S3 generic functions are available for an object of class "glmsmurf":
coef
Extract coefficients of the estimated model.
coef_reest
Extract coefficients of the re-estimated model, when available.
deviance
Extract deviance of the estimated model.
deviance_reest
Extract deviance of the re-estimated model, when available.
family
Extract family object.
fitted
Extract fitted values of the estimated model.
fitted_reest
Extract fitted values of the re-estimated model, when available.
plot
Plot coefficients of the estimated model.
plot_reest
Plot coefficients of the re-estimated model, when available.
plot_lambda
Plot goodness-of-fit statistics or information criteria as a function of lambda, when lambda is selected in-sample, out-of-sample or using cross-validation.
predict
Obtain predictions using the estimated model.
predict_reest
Obtain predictions using the re-estimated model, when available.
residuals
Extract residuals of the estimated model.
residuals_reest
Extract residuals of the re-estimated model, when available.
summary
Print a summary of the estimated model, and of the re-estimated model (when available).
## See example(glmsmurf) for examples
Control function to handle parameters for fitting a multi-type regularized generalized linear model (GLM) using the SMuRF algorithm. The function sets defaults and performs input checks on the provided parameters.
glmsmurf.control(
  epsilon = 1e-08, maxiter = 10000, step = NULL, tau = 0.5,
  reest = TRUE,
  lambda.vector = NULL, lambda.min = NULL, lambda.max = NULL,
  lambda.length = 50L, lambda.reest = FALSE,
  k = 5L, oos.prop = 0.2, validation.index = NULL,
  ncores = NULL, po.ncores = NULL,
  print = FALSE
)
epsilon |
Numeric tolerance value for stopping criterion. A numeric strictly larger than 0, default is 1e-08. |
maxiter |
Maximum number of iterations of the SMuRF algorithm. A numeric larger than or equal to 1, default is 10000. |
step |
Initial step size, a numeric strictly larger than 0 or |
tau |
Parameter for backtracking the step size. A numeric strictly between 0 and 1, default is 0.5. |
reest |
A logical indicating if the obtained (reduced) model is re-estimated using |
lambda.vector |
Values of lambda to consider when selecting the optimal value of lambda. A vector of strictly positive numerics (which is preferably a decreasing sequence as we make use of warm starts) or |
lambda.min |
Minimum value of lambda to consider when selecting the optimal value of lambda. A strictly positive numeric or |
lambda.max |
Maximum value of lambda to consider when selecting the optimal value of lambda. A strictly positive numeric larger than |
lambda.length |
Number of lambda values to consider when selecting the optimal value of lambda. A strictly positive integer, default is 50. This argument is ignored when |
lambda.reest |
Logical indicating if the re-estimated coefficients are used when selecting lambda, default is FALSE. |
k |
Number of folds when selecting lambda using cross-validation. A strictly positive integer, default is 5 (i.e. five-fold cross-validation). This number cannot be larger than the number of observations. Note that cross-validation with one fold ( |
oos.prop |
Proportion of the data that is used as the validation sample when selecting |
validation.index |
Vector containing the row indices of the data matrix corresponding to the observations that are used as the validation sample.
This argument is only used when |
ncores |
Number of cores used when performing cross-validation. A strictly positive integer or |
po.ncores |
Number of cores used when computing the proximal operators. A strictly positive integer or |
print |
A logical indicating if intermediate results need to be printed, default is FALSE. |
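To give a feeling for how epsilon, maxiter, step and tau interact, here is a generic proximal gradient loop with backtracking for a plain Lasso least-squares problem. It is a textbook sketch written for this page, not the SMuRF implementation: the function names prox_grad_lasso and soft_threshold, the objective, the stopping rule and the default step = 1 are all assumptions made for illustration.

```r
# Soft-thresholding: the proximal operator of the Lasso penalty
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical helper: minimizes sum((y - X b)^2) / (2 n) + lambda * sum(|b|)
prox_grad_lasso <- function(X, y, lambda, step = 1, tau = 0.5,
                            epsilon = 1e-8, maxiter = 10000) {
  n <- nrow(X)
  beta <- rep(0, ncol(X))
  f <- function(b) sum((y - X %*% b)^2) / (2 * n)   # smooth part
  obj <- function(b) f(b) + lambda * sum(abs(b))    # full objective
  for (iter in seq_len(maxiter)) {
    grad <- drop(-crossprod(X, y - X %*% beta)) / n
    repeat {
      beta.new <- soft_threshold(beta - step * grad, step * lambda)
      d <- beta.new - beta
      # Sufficient-decrease check; shrink the step by the factor tau on failure
      if (f(beta.new) <= f(beta) + sum(grad * d) + sum(d^2) / (2 * step)) break
      step <- tau * step
    }
    # Stop when the relative change in the objective drops below epsilon
    converged <- abs(obj(beta) - obj(beta.new)) <= epsilon * max(1, abs(obj(beta)))
    beta <- beta.new
    if (converged) break
  }
  list(coefficients = beta, iter = iter, final.stepsize = step)
}

# Toy usage on simulated (hypothetical) data
set.seed(42)
X <- matrix(rnorm(100 * 3), 100, 3)
y <- drop(X %*% c(2, 0, -1)) + rnorm(100)
fit <- prox_grad_lasso(X, y, lambda = 0.1)
fit$coefficients
fit$final.stepsize
```

A smaller tau shrinks the step more aggressively after a failed check, which costs fewer checks per iteration but can leave the final step size smaller than necessary.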
More details on the selection of lambda can be found in the package vignette.
A list with elements named as the arguments.
Fitting procedures: glmsmurf and glmsmurf.fit (given design matrix). glm.control
## See example(plot_lambda) for examples
Function used to define regularization terms in a glmsmurf model formula.
p(pred1, pred2 = NULL, pen = "lasso", refcat = NULL, group = NULL)
pred1 |
Name of the predictor used in the regularization term. |
pred2 |
Either |
pen |
Type of penalty for this predictor, one of
Default is "lasso". |
refcat |
Reference level when |
group |
Group to which the predictor belongs, only used for a Group Lasso penalty.
Default is NULL. |
Predictors with no penalty, a Lasso penalty or a Group Lasso penalty should be numeric or a factor which can be non-numeric. Predictors with a Fused Lasso, Generalized Fused Lasso, Graph-Guided Fused Lasso or 2D Fused Lasso penalty should be given as a factor which can also be non-numeric. When a predictor is given as a factor, there cannot be any unused levels.
For a predictor with a Fused Lasso penalty, the levels should be ordered from smallest to largest.
The first level will be the reference level, but this can be changed using the refcat argument.
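The two requirements above (no unused levels, and levels ordered from smallest to largest for a Fused Lasso penalty) can be checked and enforced with base R before building the formula. The vector decade below is hypothetical toy data.

```r
# Toy construction decades stored as numbers (hypothetical data)
decade <- c(1950, 1990, 1970, 1990, 1950, 2000)

# Order the levels from smallest to largest for a Fused Lasso penalty
year <- factor(decade, levels = sort(unique(decade)))
levels(year)         # "1950" "1970" "1990" "2000"

# Subsetting can leave unused levels behind
year.sub <- year[year != "2000"]
nlevels(year.sub)    # still 4: level "2000" is now unused

# droplevels() removes levels that no longer occur in the data
year.sub <- droplevels(year.sub)
nlevels(year.sub)    # 3
```

Calling droplevels on the whole data frame after any row filtering is a simple way to avoid unused levels in every factor at once.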
When lambda * lambda1 > 0 or lambda * lambda2 > 0 in glmsmurf, no reference level is used for the Fused Lasso, Generalized Fused Lasso and Graph-Guided Fused Lasso penalties, and refcat will hence be ignored.
If pred2 is different from NULL, pen should be set to "2dflasso", and vice versa. Note that there cannot be any unused levels in the interaction between pred1 and pred2.
When adding an interaction between pred1 and pred2 with a 2D Fused Lasso penalty, the 1D effects should also be present in the model and the reference categories for the 1D predictors need to be the respective first levels.
The reference level for the 2D predictor will then be the 2D level where at least one of the 1D components is equal to the 1D reference levels.
It is also allowed to use binned factors of predictors that are included in the model in the interaction. They should have the original predictor name + '.binned' as predictor names.
For example: the original predictors 'age' and 'power' are included in the model and the interaction of 'age.binned' and 'power.binned' can also be present in the model formula.
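The rules for 2D Fused Lasso terms can be made concrete by writing the formulas down. In the sketch below the response y and the penalties chosen for the 1D effects are hypothetical illustrations; only the predictor names 'age', 'power', 'age.binned' and 'power.binned' come from the text. Building and inspecting a formula does not evaluate p, so the snippet runs in base R without smurf attached.

```r
# 2D Fused Lasso interaction with both 1D effects present in the model
formu.2d <- y ~ p(age, pen = "flasso") + p(power, pen = "flasso") +
  p(age, power, pen = "2dflasso")

# Variant with binned factors in the interaction: the 1D effects use the
# original predictors, the interaction uses the '.binned' versions
# (default penalty for the 1D effects, chosen here only for illustration)
formu.binned <- y ~ p(age) + p(power) +
  p(age.binned, power.binned, pen = "2dflasso")

# Inspect the terms without fitting anything
attr(terms(formu.2d), "term.labels")
all.vars(formu.2d)
all.vars(formu.binned)
```

Inspecting the term labels this way is a cheap check that every 2D term is accompanied by its 1D effects before a potentially long fit is started.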
An overview of the different penalty types and their usage can be found in the package vignette.
# Munich rent data from catdata package
data("rent", package = "catdata")

# The considered predictors are the same as in
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro

# Urban district in Munich
rent$area <- as.factor(rent$area)

# Decade of construction
rent$year <- as.factor(floor(rent$year / 10) * 10)

# Number of rooms
rent$rooms <- as.factor(rent$rooms)

# Quality of the house with levels "fair", "good" and "excellent"
rent$quality <- as.factor(rent$good + 2 * rent$best)
levels(rent$quality) <- c("fair", "good", "excellent")

# Floor space divided in categories (0, 30), [30, 40), ..., [130, 140)
sizeClasses <- c(0, seq(30, 140, 10))
rent$size <- as.factor(sizeClasses[findInterval(rent$size, sizeClasses)])

# Is warm water present?
rent$warm <- factor(rent$warm, labels = c("yes", "no"))

# Is central heating present?
rent$central <- factor(rent$central, labels = c("yes", "no"))

# Does the bathroom have tiles?
rent$tiles <- factor(rent$tiles, labels = c("yes", "no"))

# Is there special furniture in the bathroom?
rent$bathextra <- factor(rent$bathextra, labels = c("no", "yes"))

# Is the kitchen well-equipped?
rent$kitchen <- factor(rent$kitchen, labels = c("no", "yes"))

# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
# 'year', 'rooms', 'quality' and 'size' with Fused Lasso penalties
# where the reference category for 'year' is changed to 2000,
# 'warm' and 'central' are in one group for the Group Lasso penalty,
# 'tiles' and 'bathextra' are not regularized and
# 'kitchen' has a Lasso penalty
formu <- rentm ~ p(area, pen = "gflasso") +
  p(year, pen = "flasso", refcat = 2000) + p(rooms, pen = "flasso") +
  p(quality, pen = "flasso") + p(size, pen = "flasso") +
  p(warm, pen = "grouplasso", group = 1) +
  p(central, pen = "grouplasso", group = 1) +
  p(tiles, pen = "none") + bathextra +
  p(kitchen, pen = "lasso")

# Fit a multi-type regularized GLM using the SMuRF algorithm.
# We use standardization adaptive penalty weights based on an initial GLM fit.
munich.fit <- glmsmurf(formula = formu, family = gaussian(), data = rent,
                       pen.weights = "glm.stand", lambda = 0.1)

# Model summary
summary(munich.fit)
Function to plot the goodness-of-fit statistics or information criteria as a function of lambda when lambda is selected in-sample, out-of-sample or using cross-validation.
plot_lambda(x, ...)

## S3 method for class 'glmsmurf'
plot_lambda(
  x,
  xlab = NULL, ylab = NULL,
  lambda.opt = TRUE, cv1se = TRUE, log.lambda = TRUE,
  ...
)
x |
An object for which the extraction of goodness-of-fit statistics or information criteria is meaningful.
E.g. an object of class 'glmsmurf'. |
... |
Additional arguments for the |
xlab |
Label for the x-axis. The default value is NULL. |
ylab |
Label for the y-axis. The default value is NULL. |
lambda.opt |
Logical indicating if the optimal value of lambda should be indicated on the plot by a vertical dashed line. Default is TRUE. |
cv1se |
Logical indicating if the standard errors should be indicated on the plot when cross-validation with the one standard error rule is performed (e.g. "cv1se.dev"). Default is TRUE. |
log.lambda |
Logical indicating if the logarithm of lambda is plotted on the x-axis, default is TRUE. |
This plot can only be made when lambda is selected in-sample, out-of-sample or using cross-validation (possibly with the one standard error rule), see the lambda argument of glmsmurf.
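The one standard error rule mentioned above can be illustrated without the package. The sketch below uses made-up cross-validation numbers (the vectors lambda, cv_mean and cv_se are hypothetical and are not output of glmsmurf) and applies the usual formulation of the rule: take the largest lambda whose average error is within one standard error of the minimum. It is an illustration under these assumptions, not the package's internal code.

```r
# Hypothetical cross-validation summary (made-up numbers, not smurf output)
lambda  <- c(0.001, 0.005, 0.01, 0.05, 0.1)
cv_mean <- c(1610, 1600, 1605, 1640, 1720)  # average deviance over the folds
cv_se   <- c(12, 11, 12, 14, 18)            # standard error over the folds

# Value of lambda with the smallest average deviance
i_min <- which.min(cv_mean)
lambda_min <- lambda[i_min]

# One standard error rule: largest lambda whose average deviance
# does not exceed the minimum plus one standard error
threshold <- cv_mean[i_min] + cv_se[i_min]
lambda_1se <- max(lambda[cv_mean <= threshold])

print(c(lambda_min = lambda_min, lambda_1se = lambda_1se))
```

The rule trades a small loss in fit for a larger penalty, and hence typically a sparser model.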
# Munich rent data from catdata package
data("rent", package = "catdata")

# The considered predictors are the same as in
# Gertheiss and Tutz (Ann. Appl. Stat., 2010).
# Response is monthly rent per square meter in Euro

# Urban district in Munich
rent$area <- as.factor(rent$area)

# Decade of construction
rent$year <- as.factor(floor(rent$year / 10) * 10)

# Number of rooms
rent$rooms <- as.factor(rent$rooms)

# Quality of the house with levels "fair", "good" and "excellent"
rent$quality <- as.factor(rent$good + 2 * rent$best)
levels(rent$quality) <- c("fair", "good", "excellent")

# Floor space divided in categories (0, 30), [30, 40), ..., [130, 140)
sizeClasses <- c(0, seq(30, 140, 10))
rent$size <- as.factor(sizeClasses[findInterval(rent$size, sizeClasses)])

# Is warm water present?
rent$warm <- factor(rent$warm, labels = c("yes", "no"))

# Is central heating present?
rent$central <- factor(rent$central, labels = c("yes", "no"))

# Does the bathroom have tiles?
rent$tiles <- factor(rent$tiles, labels = c("yes", "no"))

# Is there special furniture in the bathroom?
rent$bathextra <- factor(rent$bathextra, labels = c("no", "yes"))

# Is the kitchen well-equipped?
rent$kitchen <- factor(rent$kitchen, labels = c("no", "yes"))

# Create formula with 'rentm' as response variable,
# 'area' with a Generalized Fused Lasso penalty,
# 'year', 'rooms', 'quality' and 'size' with Fused Lasso penalties,
# and the other predictors with Lasso penalties.
formu <- rentm ~ p(area, pen = "gflasso") +
  p(year, pen = "flasso") +
  p(rooms, pen = "flasso") +
  p(quality, pen = "flasso") +
  p(size, pen = "flasso") +
  p(warm, pen = "lasso") +
  p(central, pen = "lasso") +
  p(tiles, pen = "lasso") +
  p(bathextra, pen = "lasso") +
  p(kitchen, pen = "lasso")

# Fit a multi-type regularized GLM using the SMuRF algorithm and select the optimal value of lambda
# using cross-validation (with the deviance as loss measure and the one standard error rule).
# We use standardization adaptive penalty weights based on an initial GLM fit.
# The number of values of lambda to consider in cross-validation is
# set to 10 using the control argument (default is 50).
munich.fit.cv <- glmsmurf(formula = formu, family = gaussian(), data = rent,
                          pen.weights = "glm.stand", lambda = "cv1se.dev",
                          control = list(lambda.length = 10L, ncores = 1L))

# Plot average deviance over cross-validation folds as a function of the logarithm of lambda
plot_lambda(munich.fit.cv)

# Zoomed plot
plot_lambda(munich.fit.cv, xlim = c(-7, -3.5), ylim = c(1575, 1750))
Function to plot the coefficients of the re-estimated model.
plot_reest(x, ...)

## S3 method for class 'glmsmurf'
plot_reest(
  x,
  xlab = "Index",
  ylab = "Re-estimated coefficients",
  basic = FALSE,
  ...
)
x |
An object for which the extraction of model coefficients is meaningful.
E.g. an object of class ' |
... |
Additional arguments for the |
xlab |
Label for the x-axis, default is |
ylab |
Label for the y-axis, default is |
basic |
Logical indicating if the basic lay-out is used for the plot, default is |
When the re-estimated model is not included in x, the coefficients of the estimated model in x are plotted with a warning.
See plot.glmsmurf for more details.
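The fallback behaviour described above (use the re-estimated model when present, otherwise the estimated model with a warning) can be sketched in a few lines of base R. Everything here is illustrative: the helper coef_or_fallback, the toy list objects and the element name coefficients.reest are assumptions made for this sketch, not the package's implementation.

```r
# Hypothetical helper: prefer re-estimated coefficients, otherwise fall back
# to the estimated coefficients and issue a warning
coef_or_fallback <- function(obj) {
  if (!is.null(obj[["coefficients.reest"]])) {
    return(obj[["coefficients.reest"]])
  }
  warning("Re-estimated model is not available, using the estimated model.")
  obj[["coefficients"]]
}

# Toy objects standing in for fitted models (made-up values)
with_reest    <- list(coefficients = c(a = 1, b = 0),
                      coefficients.reest = c(a = 1.1, b = 0))
without_reest <- list(coefficients = c(a = 1, b = 0))

# Re-estimated coefficients are returned when they exist
est <- coef_or_fallback(with_reest)

# Without a re-estimated model a warning is raised; capture its message here
msg <- tryCatch(coef_or_fallback(without_reest),
                warning = function(w) conditionMessage(w))
print(est)
print(msg)
```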
plot.glmsmurf, coef_reest, summary.glmsmurf, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
Function to plot the coefficients of the estimated model.
## S3 method for class 'glmsmurf'
plot(x, xlab = "Index", ylab = "Estimated coefficients", basic = FALSE, ...)
x |
An object of class ' |
xlab |
Label for the x-axis, default is |
ylab |
Label for the y-axis, default is |
basic |
Logical indicating if the basic lay-out is used for the plot, default is |
... |
Additional arguments for the |
When basic=FALSE, the improved lay-out for the plot is used. Per predictor, groups of equal coefficients are indicated in the same color (up to 8 colors), and zero coefficients are indicated by grey squares.
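The colouring scheme described above can be mimicked for a single predictor in base R. This is a sketch under stated assumptions, not the package's plotting code: the coefficient vector beta is made up, equality is tested exactly (fitted coefficients might need a tolerance), and the default palette() is assumed to supply the 8 colors.

```r
# Hypothetical coefficients of one predictor (made-up values)
beta <- c(0, 0.5, 0.5, 0, 1.2, 1.2, 1.2, -0.3)

is_zero <- beta == 0

# Group index per distinct non-zero value (NA for zero coefficients)
grp <- match(beta, unique(beta[!is_zero]))

# Cycle through at most 8 colors; zero coefficients become grey
cols <- ifelse(is_zero, "grey", palette()[(grp - 1) %% 8 + 1])

# Filled squares (pch 15) for zeros, filled circles (pch 16) otherwise
pch <- ifelse(is_zero, 15, 16)

print(data.frame(beta = beta, group = grp, col = cols, pch = pch))
```

The vectors cols and pch could then be passed as the col and pch arguments of plot() to draw the coefficients against their index.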
plot_reest, coef.glmsmurf, summary.glmsmurf, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
Function to obtain predictions using the re-estimated model.
predict_reest(object, ...)

## S3 method for class 'glmsmurf'
predict_reest(
  object,
  newdata = NULL,
  newoffset = NULL,
  type = c("link", "response", "terms"),
  ...
)
object |
An object for which predictions are meaningful.
E.g. an object of class ' |
... |
Additional arguments which are currently ignored. |
newdata |
Optionally, a data frame containing the predictors used in the prediction.
This can only be used when |
newoffset |
Optionally, a vector containing a new offset to be used in the prediction.
When |
type |
Type of prediction. The default is on the scale of the linear predictors ( |
A vector containing the predicted values using the re-estimated model in object, when this is available, or, otherwise, the predicted values using the estimated model in object with a warning.
predict.glmsmurf, predict.glm, predict, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples
Function to obtain predictions using the estimated model.
## S3 method for class 'glmsmurf'
predict(
  object,
  newdata = NULL,
  newoffset = NULL,
  type = c("link", "response", "terms"),
  ...
)
object |
An object of class ' |
newdata |
Optionally, a data frame containing the predictors used in the prediction.
This can only be used when |
newoffset |
Optionally, a vector containing a new offset to be used in the prediction.
When |
type |
Type of prediction. The default is on the scale of the linear predictors ( |
... |
Additional arguments which are currently ignored. |
A vector containing the predicted values using the estimated model in object.
predict_reest, predict.glm, predict, glmsmurf, glmsmurf-class
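The difference between the "link" and "response" prediction types can be shown with an ordinary GLM. The sketch below uses stats::glm as a stand-in (it does not call glmsmurf), with simulated data and variable names that are made up for illustration. Under a log link, predictions on the response scale are the exponential of those on the linear predictor scale.

```r
# Simulated Poisson data (illustrative only)
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rpois(50, lambda = exp(0.3 + 0.5 * d$x))

# Stand-in model fitted with stats::glm, not with glmsmurf
fit <- glm(y ~ x, family = poisson(), data = d)

# New observations for which predictions are wanted
new_d <- data.frame(x = c(-1, 0, 1))

# Scale of the linear predictor
eta <- predict(fit, newdata = new_d, type = "link")

# Scale of the response: the inverse link applied to the linear predictor
mu <- predict(fit, newdata = new_d, type = "response")

print(all.equal(unname(mu), unname(exp(eta))))
```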
## See example(glmsmurf) for examples
Function to extract the residuals of the re-estimated model. resid_reest is an alias for it.
residuals_reest(object, ...)

## S3 method for class 'glmsmurf'
residuals_reest(
  object,
  type = c("deviance", "pearson", "working", "response", "partial"),
  ...
)

resid_reest(object, ...)

## S3 method for class 'glmsmurf'
resid_reest(
  object,
  type = c("deviance", "pearson", "working", "response", "partial"),
  ...
)
object |
An object for which the extraction of model residuals is meaningful.
E.g. an object of class ' |
... |
Additional arguments which are currently ignored. |
type |
Type of residuals that should be returned. One of |
See glm.summaries for an overview of the different types of residuals.
A vector containing the residuals of the re-estimated model in object when they are available, or, otherwise, the residuals of the estimated model in object with a warning.
residuals.glmsmurf, residuals, glm.summaries, glmsmurf-class
## See example(glmsmurf) for examples
Function to extract the residuals of the estimated model. resid is an alias for it.
## S3 method for class 'glmsmurf'
residuals(
  object,
  type = c("deviance", "pearson", "working", "response", "partial"),
  ...
)

## S3 method for class 'glmsmurf'
resid(
  object,
  type = c("deviance", "pearson", "working", "response", "partial"),
  ...
)
object |
An object of class ' |
type |
Type of residuals that should be returned. One of |
... |
Additional arguments which are currently ignored. |
See glm.summaries for an overview of the different types of residuals.
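As a quick reminder of what some of these residual types mean, the sketch below computes them for an ordinary Poisson GLM fitted with stats::glm (a stand-in, not a glmsmurf fit) on simulated data. It checks the standard definitions: response residuals are y minus the fitted mean, Pearson residuals divide this by the square root of the variance function (equal to the mean for a Poisson model), and the squared deviance residuals sum to the deviance.

```r
# Simulated Poisson data (illustrative only)
set.seed(2)
d <- data.frame(x = rnorm(40))
d$y <- rpois(40, lambda = exp(0.2 + 0.4 * d$x))

# Stand-in model fitted with stats::glm, not with glmsmurf
fit <- glm(y ~ x, family = poisson(), data = d)
mu <- unname(fitted(fit))

r_resp <- unname(residuals(fit, type = "response"))
r_pear <- unname(residuals(fit, type = "pearson"))
r_dev  <- unname(residuals(fit, type = "deviance"))

# Response residuals: observed minus fitted mean
ok_resp <- isTRUE(all.equal(r_resp, d$y - mu))

# Pearson residuals: scaled by sqrt(V(mu)), with V(mu) = mu for Poisson
ok_pear <- isTRUE(all.equal(r_pear, (d$y - mu) / sqrt(mu)))

# Squared deviance residuals add up to the model deviance
ok_dev <- isTRUE(all.equal(sum(r_dev^2), deviance(fit)))

print(c(response = ok_resp, pearson = ok_pear, deviance = ok_dev))
```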
A vector containing the residuals of the estimated model in object.
residuals_reest, residuals, glm.summaries, glmsmurf-class
## See example(glmsmurf) for examples
Function to print a summary of a glmsmurf-object.
## S3 method for class 'glmsmurf'
summary(object, digits = 3L, ...)
object |
An object of class ' |
digits |
The number of significant digits used when printing, default is 3. |
... |
Additional arguments which are currently ignored. |
summary.glm, glmsmurf, glmsmurf-class
## See example(glmsmurf) for examples