Package 'fscaret' reference manual

Title:	Automated Feature Selection from 'caret'
Description:	Automated feature selection using variety of models provided by 'caret' package. This work was funded by Poland-Singapore bilateral cooperation project no 2/3/POL-SIN/2012.
Authors:	Jakub Szlek [aut, cre], Aleksander Mendyk [ctb]
Maintainer:	Jakub Szlek <[email protected]>
License:	GPL-2 | GPL-3
Version:	0.9.4.4
Built:	2025-03-15 06:01:05 UTC
Source:	https://github.com/jszlek/fscaret

Automated feature selection caret (fscaret)

Description

This package provides fast, automated feature selection based on the modeling methods of the caret package. The main advantage of this extension is that it requires minimal user involvement. The user also benefits from the variety of methods used, combined with scaling of the results according to the RMSE or MSE obtained from the models. The idea rests on the assumption that a variety of models will balance the roughness of the calculations (default model settings are applied). On Windows the time-limiting function is off; multicore functionality is enabled via the parLapply() function of package 'parallel'.

Acknowledgments: This work was funded by Poland-Singapore bilateral cooperation project no 2/3/POL-SIN/2012.

Details

Package:	fscaret
Type:	Package
Version:	0.9.4.2
Date:	2017-12-07
License:	GPL-2 | GPL-3

Author(s)

Jakub Szlek <[email protected]>. Contributions from Aleksander Mendyk, as well as the Stack Overflow and [email protected] mailing list communities.
Maintainer: Jakub Szlek <[email protected]>.

References

Kuhn M. (2008) Building Predictive Models in R Using the caret Package. Journal of Statistical Software 28(5). http://www.jstatsoft.org/.
Szlek J, Paclawski A, Lau R, Jachowicz R, Mendyk A. (2013) Heuristic modeling of macromolecule release from PLGA microspheres. International Journal of Nanomedicine 8(1), 4601-4611. http://www.dovepress.com/international-journal-of-nanomedicine-journal.

classVarImp

Description

This function takes advantage of the caret package to fit numerous classification models.

Usage

classVarImp(model, xTrain, yTrain, xTest,
	  fitControl, myTimeLimit, no.cores,
	  lk_col, supress.output)

Arguments

`model`	Chosen models as passed from fscaret() through the Used.funcClassPred argument.
`xTrain`	Training data set, data frame of input vector
`yTrain`	Training data set, vector of observed outputs, must be in binary form 0/1.
`xTest`	Testing data set, data frame of input vector
`fitControl`	Fitting controls passed to caret function
`myTimeLimit`	Time limit in seconds for single model fitting
`no.cores`	Number of used cores for calculations
`lk_col`	Number of columns for whole data set (inputs + output)
`supress.output`	If TRUE, the output of the models is suppressed.
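
The classPred argument of fscaret() states that classification importance is scaled according to the F-measure, and `yTrain` must be coded 0/1. The following base-R sketch shows how an F-measure can be computed from binary predictions. It is an illustration only, not the package's internal code, and the helper name `f_measure` is hypothetical.

```r
# Hypothetical helper: F-measure (F1) for binary 0/1 vectors.
f_measure <- function(predicted, observed) {
  tp <- sum(predicted == 1 & observed == 1)  # true positives
  fp <- sum(predicted == 1 & observed == 0)  # false positives
  fn <- sum(predicted == 0 & observed == 1)  # false negatives
  if (tp == 0) return(0)                     # avoids 0/0 when nothing is hit
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

observed  <- c(1, 0, 1, 1, 0, 0, 1, 0)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0)
f_measure(predicted, observed)  # precision 0.75, recall 0.75, F = 0.75
```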

Author(s)

Jakub Szlek and Aleksander Mendyk

References

Kuhn M. (2008) Building Predictive Models in R Using the caret Package. Journal of Statistical Software 28(5). http://www.jstatsoft.org/.

dataPreprocess

Description

The functionality is realized in two main steps:

Check for near-zero-variance predictors and flag a predictor as near zero if:
1. the percentage of unique values is less than 20%,
2. the ratio of the most frequent to the second most frequent value is greater than 20.
Check for susceptibility to multicollinearity:
1. Calculate the correlation matrix.
2. Find variables with a correlation of 0.9 or more and delete them.
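
The two steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code dataPreprocess() runs: it assumes both near-zero-variance conditions must hold at once, and that for each highly correlated pair only the later column is dropped. The helper names `is_near_zero_var` and `high_cor_columns` are hypothetical.

```r
# Hypothetical helper: TRUE for a column that looks near-zero variance.
# Assumes BOTH conditions from the description must hold.
is_near_zero_var <- function(x, unique_cut = 20, freq_cut = 20) {
  counts <- as.vector(sort(table(x), decreasing = TRUE))
  if (length(counts) < 2) return(TRUE)              # constant column
  pct_unique <- 100 * length(counts) / length(x)
  freq_ratio <- counts[1] / counts[2]
  pct_unique < unique_cut && freq_ratio > freq_cut
}

# Hypothetical helper: indices of columns that are correlated with an
# earlier column at |r| >= cutoff (assumption: the later one is dropped).
high_cor_columns <- function(m, cutoff = 0.9) {
  cm <- abs(cor(m))
  cm[lower.tri(cm, diag = TRUE)] <- 0               # keep each pair once
  which(apply(cm, 2, max) >= cutoff)
}

set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- 2 * x1                                        # perfectly correlated with x1
x4 <- c(rep(0, 99), 1)                              # almost constant
m  <- cbind(x1, x2, x3, x4)

nzv    <- which(apply(m, 2, is_near_zero_var))      # flags x4
kept   <- m[, setdiff(seq_len(ncol(m)), nzv), drop = FALSE]
corr   <- high_cor_columns(kept)                    # flags x3
result <- kept[, setdiff(seq_len(ncol(kept)), corr), drop = FALSE]
colnames(result)
```

In this toy data only `x1` and `x2` survive: `x4` fails the variance check and `x3` duplicates the information in `x1`.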

Usage

dataPreprocess(trainMatryca_nr, testMatryca_nr, labelsFrame, lk_col, lk_row, with.labels)

Arguments

`trainMatryca_nr`	Input training data matrix
`testMatryca_nr`	Input testing data matrix
`labelsFrame`	Transposed data frame of column names
`lk_col`	Number of columns
`lk_row`	Number of rows
`with.labels`	If with.labels=TRUE, an additional data frame is returned that maps the preprocessed inputs to the column numbers of the original data set

Author(s)

Jakub Szlek and Aleksander Mendyk

References

Kuhn M. (2008) Building Predictive Models in R Using the caret Package. Journal of Statistical Software 28(5). http://www.jstatsoft.org/.

Examples



library(fscaret)

# Create data sets and labels data frame
trainMatrix <- matrix(rnorm(150*120,mean=10,sd=1), 150, 120)

# Adding some near-zero variance attributes

temp1 <- matrix(runif(150,0.0001,0.0005), 150, 12)

# Adding some highly correlated attributes

sampleColIndex <- sample(ncol(trainMatrix), size=10)

temp2 <- matrix(trainMatrix[,sampleColIndex]*2, 150, 10)

# Output variable

output <- matrix(rnorm(150,mean=10,sd=1), 150, 1)

trainMatrix <- cbind(trainMatrix,temp1,temp2, output)

colnames(trainMatrix) <- paste("X",c(1:ncol(trainMatrix)),sep="")

# Subset test data set

testMatrix <- trainMatrix[sample(round(0.1*nrow(trainMatrix))),]

labelsDF <- data.frame("Labels"=paste("X",c(1:(ncol(trainMatrix)-1)),sep=""))

lk_col <- ncol(trainMatrix)
lk_row <- nrow(trainMatrix)

with.labels = TRUE

testRes <- dataPreprocess(trainMatrix, testMatrix,
			  labelsDF, lk_col, lk_row, with.labels)
			  
summary(testRes)

# Selected attributes after data set preprocessing
testRes$labelsDF

# Training and testing data sets after preprocessing
testRes$trainMatryca
testRes$testMatryca

Example testing data set

Description

The data set after preprocessing, which resulted in 29 inputs. The original data set, obtained through a literature survey, had 298 inputs. Input: chemical descriptors and characteristics of 8 PLGA microparticle formulations. Output: mean particle size of PLGA microparticles. Number of attributes: 29, single output.

Usage

data(dataset.test)

Format

data.frame

Details

A literature survey yielded 68 formulations of PLGA microspheres with protein as the active pharmaceutical ingredient. In vitro release profiles as well as formulation characteristics and composition were derived from the articles. Chemical descriptors were obtained using Marvin ChemAxon software (cxcalc plugin). The final database consisted of 298 inputs and a single output, the mean particle size.

Source

Kang F, Singh J. Effect of additives on the release of a model protein from PLGA microspheres. AAPS PharmSciTech 2001(2)4, 1-7
Zhou XL et al. Pharmacokinetic and pharmacodynamic profiles of recombinant human erythropoietin-loaded poly(lactic-co-glycolic acid) microspheres in rats. ActaPharmSinica 2012(33), 137-144
Dongmei F et al. Mesoporous Silicon-PLGA Composite Microspheres for the Double Controlled Release of Biomolecules for Orthopedic Tissue Engineering. Adv Funct Mater 2012(22), 282-293.
Kim T.H. et al. Pegylated recombinant human epidermal growth factor (rhEGF) for sustained release from biodegradable PLGA microspheres. Biomater 2002,23, 2311-2317.
Blanco D et al. Protein encapsulation and release from poly(lactide-co-glycolide) microspheres: effect of the protein and polymer properties and of the co-encapsulation of surfactants. Eur J Pharm Biopharm. 1998, 45, 285-294.
Morita T et al. Applicability of various amphiphilic polymers to the modification of protein release kinetics from biodegradable reservoir-type microspheres. Eur J Pharm Biopharm. 2001, 51, 45-53.
Mok H et al. Water free microencapsulation of proteins within PLGA microparticles by spray drying using PEG assisted protein solubilization technique in organic solvent. Eur J Pharm Biopharm. 2008, 70, 137-144.
Buske J et al. Influence of PEG in PEG-PLGA microspheres on particle properties and protein release. Eur J Pharm Biopharm. 2012, 81, 57-63.
Corrigan OI et al. Quantifying drug release from PLGA nanoparticulates. Eur J Pharm Sci. 2009, 37, 477-485.
Puras G. et al. Encapsulation of A-beta-(1-15) in PLGA microparticles enhances serum antibody response in mice immunized by subcutaneous and intranasal routes. Eur J Pharm Sci. 2011 44, 200-206
Tran VT et al. Protein loaded PLGA PEG PLGA microspheres A tool for cell therapy. Eur J Pharm Sci. 2012, 45, 128-137.
Kim HK et al. Microencapsulation of dissociable human growth hormone aggregates within poly(D,L-lactic-co-glycolic acid) microparticles for sustained release. Int J Pharm. 2001, 229, 107-116
Han Y et al. Insulin nanoparticle preparation and encapsulation into poly(lactic-co-glycolic acid) microspheres by using an anhydrous system. Int J Pharm. 2009, 378, 159-166
Liu Q et al. In vitro and in vivo study of thymosin alpha1 biodegradable in situ forming poly(lactide-co-glycolide) implants. Int J Pharm. 2010, 397, 122-129.
He J et al. Stabilization and encapsulation of recombinant human erythropoietin into PLGA microspheres using human serum albumin as a stabilizer. Int J Pharm. 2011, 416, 69-76.
Gasper MM et al. Formulation of L-asparaginase-loaded poly(lactide-co-glycolide) nanoparticles: influence of polymer properties on enzyme loading, activity and in vitro release. J Control Release. 1998, 52, 53-62.
Kawashima Y et al. Pulmonary delivery of insulin with nebulized DL-lactide/glycolide copolymer (PLGA) nanospheres to prolong hypoglycemic effect. J Control Release. 1999, 62, 279-287.
Geng Y et al. Formulating erythropoietin-loaded sustained-release PLGA microspheres without protein aggregation. J Control Release. 2008, 130, 259-265.
Ungaro F et al. Insulin-loaded PLGA/cyclodextrin large porous particles with improved aerosolization properties: in vivo deposition and hypoglycaemic activity after delivery to rat lungs. J Control Release. 2009, 135(1), 25-34.
Iwata M et al. In vitro and in vivo release properties of brilliant blue and tumour necrosis factor-alpha (TNF-alpha) from poly(D,L-lactic-co-glycolic acid) multiphase microspheres. J Microencapsul. 1999, 16(6), 777-792.
Jiang HL et al. Improvement of protein loading and modulation of protein release from poly(lactide-co-glycolide) microspheres by complexation of proteins with polyanions. J Microencapsul. 2004, 21(6), 615-624
Pirooznia N et al. Encapsulation of alpha-1 antitrypsin in PLGA nanoparticles: in vitro characterization as an effective aerosol formulation in pulmonary diseases. J Nanobiotechnology. 2012, 10(1), 20-35.
Castellanos IJ et al. Effect of cyclodextrins on alpha-chymotrypsin stability and loading in PLGA microspheres upon S/O/W encapsulation. J Pharm Sci. 2006, 95(4), 849-858.
Brodbeck KJ et al. Sustained release of human growth hormone from PLGA solution depots. Pharm Res. 2009, 16(12), 1825-1829.

Example training data set

Description

Usage

data(dataset.train)

Format

data.frame

Details

Source

Kang F, Singh J. Effect of additives on the release of a model protein from PLGA microspheres. AAPS PharmSciTech 2001(2)4, 1-7
Zhou XL et al. Pharmacokinetic and pharmacodynamic profiles of recombinant human erythropoietin-loaded poly(lactic-co-glycolic acid) microspheres in rats. ActaPharmSinica 2012(33), 137-144
Dongmei F et al. Mesoporous Silicon-PLGA Composite Microspheres for the Double Controlled Release of Biomolecules for Orthopedic Tissue Engineering. Adv Funct Mater 2012(22), 282-293.
Kim T.H. et al. Pegylated recombinant human epidermal growth factor (rhEGF) for sustained release from biodegradable PLGA microspheres. Biomater 2002,23, 2311-2317.
Blanco D et al. Protein encapsulation and release from poly(lactide-co-glycolide) microspheres: effect of the protein and polymer properties and of the co-encapsulation of surfactants. Eur J Pharm Biopharm. 1998, 45, 285-294.
Morita T et al. Applicability of various amphiphilic polymers to the modification of protein release kinetics from biodegradable reservoir-type microspheres. Eur J Pharm Biopharm. 2001, 51, 45-53.
Mok H et al. Water free microencapsulation of proteins within PLGA microparticles by spray drying using PEG assisted protein solubilization technique in organic solvent. Eur J Pharm Biopharm. 2008, 70, 137-144.
Buske J et al. Influence of PEG in PEG-PLGA microspheres on particle properties and protein release. Eur J Pharm Biopharm. 2012, 81, 57-63.
Corrigan OI et al. Quantifying drug release from PLGA nanoparticulates. Eur J Pharm Sci. 2009, 37, 477-485.
Puras G. et al. Encapsulation of A-beta-(1-15) in PLGA microparticles enhances serum antibody response in mice immunized by subcutaneous and intranasal routes. Eur J Pharm Sci. 2011 44, 200-206
Tran VT et al. Protein loaded PLGA PEG PLGA microspheres A tool for cell therapy. Eur J Pharm Sci. 2012, 45, 128-137.
Kim HK et al. Microencapsulation of dissociable human growth hormone aggregates within poly(D,L-lactic-co-glycolic acid) microparticles for sustained release. Int J Pharm. 2001, 229, 107-116
Han Y et al. Insulin nanoparticle preparation and encapsulation into poly(lactic-co-glycolic acid) microspheres by using an anhydrous system. Int J Pharm. 2009, 378, 159-166
Liu Q et al. In vitro and in vivo study of thymosin alpha1 biodegradable in situ forming poly(lactide-co-glycolide) implants. Int J Pharm. 2010, 397, 122-129.
He J et al. Stabilization and encapsulation of recombinant human erythropoietin into PLGA microspheres using human serum albumin as a stabilizer. Int J Pharm. 2011, 416, 69-76.
Gasper MM et al. Formulation of L-asparaginase-loaded poly(lactide-co-glycolide) nanoparticles: influence of polymer properties on enzyme loading, activity and in vitro release. J Control Release. 1998, 52, 53-62.
Kawashima Y et al. Pulmonary delivery of insulin with nebulized DL-lactide/glycolide copolymer (PLGA) nanospheres to prolong hypoglycemic effect. J Control Release. 1999, 62, 279-287.
Geng Y et al. Formulating erythropoietin-loaded sustained-release PLGA microspheres without protein aggregation. J Control Release. 2008, 130, 259-265.
Ungaro F et al. Insulin-loaded PLGA/cyclodextrin large porous particles with improved aerosolization properties: in vivo deposition and hypoglycaemic activity after delivery to rat lungs. J Control Release. 2009, 135(1), 25-34.
Iwata M et al. In vitro and in vivo release properties of brilliant blue and tumour necrosis factor-alpha (TNF-alpha) from poly(D,L-lactic-co-glycolic acid) multiphase microspheres. J Microencapsul. 1999, 16(6), 777-792.
Jiang HL et al. Improvement of protein loading and modulation of protein release from poly(lactide-co-glycolide) microspheres by complexation of proteins with polyanions. J Microencapsul. 2004, 21(6), 615-624
Pirooznia N et al. Encapsulation of alpha-1 antitrypsin in PLGA nanoparticles: in vitro characterization as an effective aerosol formulation in pulmonary diseases. J Nanobiotechnology. 2012, 10(1), 20-35.
Castellanos IJ et al. Effect of cyclodextrins on alpha-chymotrypsin stability and loading in PLGA microspheres upon S/O/W encapsulation. J Pharm Sci. 2006, 95(4), 849-858.
Brodbeck KJ et al. Sustained release of human growth hormone from PLGA solution depots. Pharm Res. 2009, 16(12), 1825-1829.

feature selection caret

Description

Main function for fast feature selection. It uses other functions, such as regVarImp or impCalc, to obtain results as a list of data frames.

Usage

fscaret(trainDF, testDF, installReqPckg = FALSE, preprocessData = FALSE,
	with.labels = TRUE, classPred = FALSE, regPred = TRUE, skel_outfile = NULL,
	impCalcMet = "RMSE&MSE", myTimeLimit = 24 * 60 * 60, Used.funcRegPred = NULL,
	Used.funcClassPred = NULL, no.cores = NULL, method = "boot", returnResamp = "all",
	missData=NULL, supress.output=FALSE, saveModel=FALSE, lvlScale=FALSE, ...)

Arguments

`trainDF`	Data frame of training data set, MISO (multiple input single output) type
`testDF`	Data frame of testing data set, MISO (multiple input single output) type
`installReqPckg`	If TRUE, all required packages are installed prior to calculations; it is advisable to be logged in as the root (admin) user
`preprocessData`	If TRUE, data preprocessing is performed prior to modeling
`with.labels`	If TRUE, the header of the input files is read
`classPred`	If TRUE, classification models are applied. Please be advised that importance is scaled according to the F-measure regardless of the impCalcMet setting.
`regPred`	If TRUE, regression models are applied
`skel_outfile`	Skeleton output file, e.g. skel_outfile=c("_myoutput_")
`impCalcMet`	Variable importance is scaled according to RMSE and MSE; for both, enter impCalcMet="RMSE&MSE"
`myTimeLimit`	Time limit in seconds for single model development
`Used.funcRegPred`	Vector of regression models to be used, for all available models please enter Used.funcRegPred="all"
`Used.funcClassPred`	Vector of classification models to be used, for all available models please enter Used.funcClassPred="all"
`no.cores`	Number of cores to be used for modeling, if NULL all available cores are used, should be numeric type or NULL
`method`	Method passed to fitControl of caret package
`returnResamp`	Returned resampling method passed to fitControl of caret package
`missData`	Handling of missing data values. Possible values: "delRow" - delete observations with missing values, "delCol" - delete attributes with missing values, "meanCol" - replace missing values with column mean.
`supress.output`	If TRUE, the output of the modeling phase by caret functions is suppressed. Only information about which model is currently being calculated and the resulting variable importance is shown.
`saveModel`	Logical value [TRUE/FALSE] indicating whether the trained model should be embedded in the final model.
`lvlScale`	Logical value [TRUE/FALSE] indicating whether additional scaling should be applied. For more information please refer to impCalc().
`...`	Additional arguments, preferably passed to fitControl of caret package
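
The three documented `missData` options can be pictured with the base-R sketch below. It is an illustration of the described behavior, not the code fscaret() runs; the helper name `handle_missing` is hypothetical and the sketch assumes all columns are numeric.

```r
# Hypothetical helper mirroring the three documented missData options.
handle_missing <- function(df, missData = c("delRow", "delCol", "meanCol")) {
  missData <- match.arg(missData)
  switch(missData,
    # delete observations (rows) that contain missing values
    delRow  = df[stats::complete.cases(df), , drop = FALSE],
    # delete attributes (columns) that contain missing values
    delCol  = df[, colSums(is.na(df)) == 0, drop = FALSE],
    # replace missing values with the mean of their column
    meanCol = {
      df[] <- lapply(df, function(col) {
        col[is.na(col)] <- mean(col, na.rm = TRUE)
        col
      })
      df
    }
  )
}

df <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6), y = c(7, 8, 9))
by_row  <- handle_missing(df, "delRow")   # drops the second row
by_col  <- handle_missing(df, "delCol")   # drops column a
by_mean <- handle_missing(df, "meanCol")  # NA in a becomes 2
```

Which option suits a data set depends on how much is missing: deleting rows costs observations, deleting columns costs candidate features, and mean imputation keeps both at the price of shrinking the variance of the imputed column.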

Value

`$ModelPred`	List of outputs from caret model fitting
`$VarImp`	Data frames of variable importance and corresponding trained models
`$PPlabels`	Data frame of resulting preprocessed data set with original input numbers and names
`$PPTrainDF`	Training data set after preprocessing
`$PPTestDF`	Testing data set after preprocessing
`$VarImp$model`	Trained models

Note

Be advised that the fscaret function requires hard-disk operations to save fitted models and data frames. Files are written to the R session's temporary folder; for more details see tempdir(), getwd() and setwd().

Author(s)

Jakub Szlek and Aleksander Mendyk

References

Kuhn M. (2008) Building Predictive Models in R Using the caret Package. Journal of Statistical Software 28(5). http://www.jstatsoft.org/.

Examples


if((Sys.info()['sysname'])!="SunOS"){

library(fscaret)

# Load data sets
data(dataset.train)
data(dataset.test)

requiredPackages <- c("R.utils", "gsubfn", "ipred", "caret", "parallel", "MASS")

if(.Platform$OS.type=="windows"){

myFirstRES <- fscaret(dataset.train, dataset.test, installReqPckg=FALSE,
                  preprocessData=FALSE, with.labels=TRUE, classPred=FALSE,
                  regPred=TRUE, skel_outfile=NULL,
                  impCalcMet="RMSE&MSE", myTimeLimit=4,
                  Used.funcRegPred=c("lm"), Used.funcClassPred=NULL,
                  no.cores=1, method="boot", returnResamp="all",
                  supress.output=TRUE,saveModel=FALSE)

} else {

myCores <- 2

myFirstRES <- fscaret(dataset.train, dataset.test, installReqPckg=FALSE,
                  preprocessData=FALSE, with.labels=TRUE, classPred=FALSE,
                  regPred=TRUE, skel_outfile=NULL,
                  impCalcMet="RMSE&MSE", myTimeLimit=4,
                  Used.funcRegPred=c("lm","ppr"), Used.funcClassPred=NULL,
                  no.cores=myCores, method="boot", returnResamp="all",
                  supress.output=TRUE,saveModel=FALSE)

}



# Results
myFirstRES

}

Classification methods used.

Description

Vector of all classification methods used in solving problems by caret

Usage

data(funcClassPred)

Format

vector

All regression methods used

Description

Vector of all regression methods used in solving problems by caret

Usage

data(funcRegPred)

Format

vector

impCalc

Description

The impCalc function scales variable importance according to MSE and RMSE calculations. It also stores the raw MSE, RMSE, F-measure and, if saveModel=TRUE, the developed models. impCalc is a low-level function; it shouldn't be used on its own unless the user has trained caret models stored in RData files.
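
To make the idea of error-based scaling concrete, the sketch below weights each model's importance by how well that model performed. The exact formula used by impCalc is not given here, so the scheme shown (rescale to 0-100 per model, weight by min(RMSE)/RMSE, sum, rescale again) is an assumption for illustration, and all numbers are made up.

```r
# Raw variable importance per model (rows = features, columns = models).
# All numbers here are made up for illustration.
raw_imp <- cbind(lm  = c(X1 = 10, X2 = 40, X3 = 5),
                 ppr = c(X1 = 30, X2 = 20, X3 = 60))
rmse <- c(lm = 2.0, ppr = 4.0)

# Step 1: rescale each model's importance so its top feature scores 100.
scaled <- sweep(raw_imp, 2, apply(raw_imp, 2, max), "/") * 100

# Step 2 (assumed scheme): weight each model by min(RMSE) / RMSE,
# so the most accurate model gets weight 1 and worse models get less.
weights  <- min(rmse) / rmse
weighted <- sweep(scaled, 2, weights, "*")

# Step 3: sum across models and rescale so the best feature scores 100.
total <- rowSums(weighted)
final <- 100 * total / max(total)
sort(final, decreasing = TRUE)
```

Here `X3` is the top feature for `ppr`, but because `ppr` has twice the RMSE of `lm` its vote counts half, and `X2` ends up ranked first.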

Usage

impCalc(skel_outfile, xTest, yTest, lk_col, 
          labelsFrame,with.labels,regPred,classPred,saveModel,lvlScale)

Arguments

`skel_outfile`	Skeleton name of output file
`xTest`	Input vector of testing data set
`yTest`	Output vector of testing data set
`lk_col`	Number of columns of whole data set
`labelsFrame`	Labels to sort variable importance
`with.labels`	Pass the with.labels argument. It is advised to ALWAYS use labels, as in some cases VarImp returns importance in descending order. If you insist on setting with.labels to FALSE, make sure the database contains pure data and that you read it (read.csv) into a data.frame with the option header=FALSE.
`regPred`	Indicates whether regression predictions are computed. Logical value [TRUE/FALSE]. If regPred is set to TRUE, classPred should be set to FALSE.
`classPred`	Indicates whether classification predictions are computed. Logical value [TRUE/FALSE]. If classPred is set to TRUE, regPred should be set to FALSE. Please be advised that importance is scaled according to the F-measure.
`saveModel`	Logical value [TRUE/FALSE] indicating whether the trained model should be embedded in the final model.
`lvlScale`	Indicates whether additional scaling is used. The option is especially useful when a large number of features get NA's or are not included in the feature ranking. It levels the scores of the features by taking the overall number of features into account. Default value is FALSE. Logical value [TRUE/FALSE].

Details

The impCalc function lists the RData files in the working directory, assuming that they contain only models derived by caret. It then loads the models in a loop and tries to get the variable importance from each.
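
The list-and-load loop described above can be sketched as follows. This is an illustration of the pattern, not impCalc's code: the file names and the scratch directory are hypothetical, plain `lm` fits stand in for saved caret models, and the number of coefficients stands in for the variable importance that would be extracted from a real caret model.

```r
# Work in a scratch directory so no unrelated RData files are picked up.
work_dir <- file.path(tempdir(), "rdata_sketch")
dir.create(work_dir, showWarnings = FALSE)

# Stand-ins for saved caret models: two small lm fits.
fit_a <- lm(mpg ~ wt, data = mtcars)
fit_b <- lm(mpg ~ wt + hp, data = mtcars)
save(fit_a, file = file.path(work_dir, "model_a.RData"))
save(fit_b, file = file.path(work_dir, "model_b.RData"))

# List the RData files, then load each into its own environment so that
# objects from different files cannot overwrite one another.
files <- list.files(work_dir, pattern = "\\.RData$", full.names = TRUE)
n_coef <- vapply(files, function(f) {
  env <- new.env()
  obj_names <- load(f, envir = env)        # load() returns the object names
  length(coef(get(obj_names[1], envir = env)))
}, integer(1))
names(n_coef) <- basename(files)
n_coef
```

Because the loop trusts every RData file it finds, keeping the working directory free of unrelated files matters, which is why the sketch uses a dedicated folder.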

Author(s)

Jakub Szlek and Aleksander Mendyk

Examples


## Not run: 
# 
# Hashed to comply with new CRAN check
# 
library(fscaret)

# Load dataset
data(dataset.train)
data(dataset.test)

# Make objects
trainDF <- dataset.train
testDF <- dataset.test
model <- c("lm","Cubist")
fitControl <- trainControl(method = "boot", returnResamp = "all") 
myTimeLimit <- 5
no.cores <- 2
supress.output <- TRUE
skel_outfile <- paste("_default_",sep="")
mySystem <- .Platform$OS.type
with.labels <- TRUE
redPred <- TRUE
classPred <- FALSE
saveModel <- FALSE
lvlScale <- FALSE

if(mySystem=="windows"){
no.cores <- 1
}

# Scan dimensions of trainDF [lk_row x lk_col]
lk_col = ncol(trainDF)
lk_row = nrow(trainDF)

# Read labels of trainDF
labelsFrame <- as.data.frame(colnames(trainDF))
labelsFrame <-cbind(c(1:ncol(trainDF)),labelsFrame)
# Create a train data set matrix
trainMatryca_nr <- matrix(data=NA,nrow=lk_row,ncol=lk_col)

row=0
col=0

for(col in 1:(lk_col)) {
   for(row in 1:(lk_row)) {
     trainMatryca_nr[row,col] <- (as.numeric(trainDF[row,col]))
    }
}

# Pointing standard data set train
xTrain <- data.frame(trainMatryca_nr[,-lk_col])
yTrain <- as.vector(trainMatryca_nr[,lk_col])


#--------Scan dimensions of trainDataFrame1 [lk_row x lk_col]
lk_col_test = ncol(testDF)
lk_row_test = nrow(testDF)

testMatryca_nr <- matrix(data=NA,nrow=lk_row_test,ncol=lk_col_test)

row=0
col=0

for(col in 1:(lk_col_test)) {
   for(row in 1:(lk_row_test)) {
     testMatryca_nr[row,col] <- (as.numeric(testDF[row,col]))
    }
}

# Pointing standard data set test
xTest <- data.frame(testMatryca_nr[,-lk_col])
yTest <- as.vector(testMatryca_nr[,lk_col])


# Calling low-level function to create models to calculate on
myVarImp <- regVarImp(model, xTrain, yTrain, xTest,
	    fitControl, myTimeLimit, no.cores, lk_col,
	    supress.output, mySystem)


myImpCalc <- impCalc(skel_outfile, xTest, yTest,
              lk_col,labelsFrame,with.labels,redPred,classPred,saveModel,lvlScale)


## End(Not run)

imputeMean

Description

Secondary function that replaces NA values in a column with the column mean.

Usage

impute.mean(x)

Arguments

`x`	a vector to calculate mean
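
For readers who want to see the idea without loading the package, mean imputation of a single vector fits in one line of base R. The function below is a hypothetical stand-in named `impute_mean_sketch`; the package's own impute.mean() may differ in detail.

```r
# Hypothetical stand-in: replace each NA with the mean of the non-NA values.
impute_mean_sketch <- function(x) {
  replace(x, is.na(x), mean(x, na.rm = TRUE))
}

filled <- impute_mean_sketch(c(1, NA, 3, NA, 8))
filled  # the mean of 1, 3 and 8 is 4, so both NAs become 4
```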

Author(s)

Jakub Szlek and Aleksander Mendyk

Examples


library(fscaret)

# Make sample matrix
testData <- matrix(data=rep(1:5),ncol=10,nrow=15)

# Replace random values with NA's
n <- 15
replace <- TRUE
set.seed(1)

rand.sample <- sample(length(testData), n, replace=replace)
testData[rand.sample] <- NA 

# Print out input matrix
testData

# Record cols with missing values
missing.colsTestMatrix <- which(colSums(is.na(testData))>0)

for(i in 1:length(missing.colsTestMatrix)){

rowToReplace <- missing.colsTestMatrix[i]
testData[,rowToReplace] <- impute.mean(testData[,rowToReplace])

}

# Print out matrix with replaced NA's by column mean 
testData

installPckg

Description

Installs the packages listed in data(requiredPackages). The function is called within the fscaret function when the argument installReqPckg = TRUE.

Usage

installPckg(requiredPackages)

Arguments

`requiredPackages`	Vector of packages to be installed

Details

Be advised that setting "installReqPckg = TRUE" installs packages in your home directory (.R). To install packages for all users, please log in as root (admin).

Author(s)

Jakub Szlek and Aleksander Mendyk

MSE

Description

Function calculates mean squared error as predicted vs. observed

Usage

MSE(vect1, vect2, rows_no)

Arguments

`vect1`	Numeric vector of predicted values
`vect2`	Numeric vector of observed values
`rows_no`	Number of observations
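
As a point of reference, mean squared error is the sum of squared differences between predicted and observed values divided by the number of observations. The sketch below uses a hypothetical function `mse_sketch` whose arguments mirror the three documented ones; it assumes the package computes the quantity in the usual way.

```r
# Hypothetical stand-in mirroring the documented arguments:
# predicted values, observed values, number of observations.
mse_sketch <- function(predicted, observed, n = length(observed)) {
  sum((predicted - observed)^2) / n
}

predicted <- c(2.5, 0.0, 2.0, 8.0)
observed  <- c(3.0, -0.5, 2.0, 7.0)
mse_sketch(predicted, observed)  # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
```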

Author(s)

Jakub Szlek and Aleksander Mendyk

regVarImp

Description

This function takes advantage of the caret package to fit numerous regression models.

Usage

regVarImp(model, xTrain, yTrain, xTest,
	  fitControl, myTimeLimit, no.cores,
	  lk_col, supress.output)

Arguments

`model`	Chosen models as passed from fscaret() through the Used.funcRegPred argument.
`xTrain`	Training data set, data frame of input vector
`yTrain`	Training data set, vector of observed outputs
`xTest`	Testing data set, data frame of input vector
`fitControl`	Fitting controls passed to caret function
`myTimeLimit`	Time limit in seconds for single model fitting
`no.cores`	Number of used cores for calculations
`lk_col`	Number of columns for whole data set (inputs + output)
`supress.output`	If TRUE, the output of the models is suppressed.

Author(s)

Jakub Szlek and Aleksander Mendyk

References

Kuhn M. (2008) Building Predictive Models in R Using the caret Package. Journal of Statistical Software 28(5). http://www.jstatsoft.org/.

requiredPackages

Description

Character vector of names of required packages to fully take advantage of fscaret

Usage

data(requiredPackages)

Format

vector

Examples

data(requiredPackages)

RMSE

Description

Function calculates root mean squared error.

Usage

RMSE(vect1, vect2, rows_no)

Arguments

`vect1`	Numeric vector of predicted values
`vect2`	Numeric vector of observed values
`rows_no`	Number of observations
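
Root mean squared error is the square root of the mean squared difference between predicted and observed values, which puts the error back in the units of the output variable. The sketch below uses a hypothetical function `rmse_sketch` with the same three arguments as documented above; it assumes the conventional definition.

```r
# Hypothetical stand-in: square root of the mean squared difference.
rmse_sketch <- function(predicted, observed, n = length(observed)) {
  sqrt(sum((predicted - observed)^2) / n)
}

# One prediction is off by 4, the rest are exact: sqrt(16 / 4) = 2.
rmse_sketch(c(1, 2, 3, 4), c(1, 2, 3, 8))
```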

Author(s)

Aleksander Mendyk

timeout

Description

This function limits the elapsed time spent on developing a single model. It uses low-level functions of the parallel package to run the model in a forked process with a time limit. If the result is not returned within the set time, the fork is killed. The function shouldn't be called from the R console. It is not used under Windows, because only Unix-like systems have fork functionality.

Usage

timeout(..., seconds)

Arguments

`...`	Expression to be time limited
`seconds`	Number of seconds
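
The fork-with-time-limit pattern described above can be sketched with the `mcparallel()` and `mccollect()` functions of the parallel package. This is a simplified illustration, not the package's timeout() code: the helper name `run_with_limit` is hypothetical, it returns NULL on timeout instead of signalling an error, and it only works on Unix-like systems.

```r
library(parallel)

# Hypothetical helper: evaluate `expr` in a forked child process and give
# up after `seconds`. Returns NULL on timeout. Unix-like systems only.
run_with_limit <- function(expr, seconds) {
  job <- mcparallel(expr)                     # expr is evaluated in the child
  res <- mccollect(job, wait = FALSE, timeout = seconds)
  if (is.null(res)) {
    tools::pskill(job$pid, tools::SIGKILL)    # stop the runaway child
    mccollect(job, wait = FALSE)              # clean up after the killed child
    return(NULL)
  }
  res[[1]]
}

if (.Platform$OS.type != "windows") {
  fast <- run_with_limit(sum(1:10), seconds = 5)      # finishes in time
  slow <- run_with_limit(Sys.sleep(10), seconds = 1)  # killed after about 1 s
}
```

Running the work in a child process is what makes a hard stop possible: the parent can kill the child without corrupting its own session, at the cost of not being available on Windows.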

Author(s)

Original code by Jeroen Ooms <jeroen.ooms at stat.ucla.edu> of OpenCPU package. Modifications by Jakub Szlek and Aleksander Mendyk.
