데이터 분석에서 파생변수를 생성하는 과정은 매우 중요하고 노력과 시간이 많이 필요한 작업입니다.
We spent most of our efforts in feature engineering.
– Xavier Cohort, after winning one of many Kaggle competitions
… some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.
– Pedro Domingos, A Few Useful Things to Know about Machine Learning
rAutoFE 소개
rAutoFE(auto feature engineering in R)는 다양한 변수를 자동으로 생성해주는 R패키지 입니다.
rAutoFE 파생변수 목록
- numeric feature transformation
- numeric feature binning
- categorical feature encoding (frequency-based)
- categorical feature encoding (target-based)
- weight of evidence features
- interaction features
- reduce the number of levels for catogorical features
rAutoFE 패키지 설치 방법
library(devtools)
install_github("2econsulting/rAutoFE")
library(rAutoFE)
rAutoFE 패키지 사용 방법
library(data.table)
library(rAutoFE)
library(rAutoFS)
library(h2o)
savePath <- "c:/tmp"
dir.create(savePath)
data(churn)
churn <- as.data.table(churn)
churn[, Area.Code:=as.factor(Area.Code)]
splits <- splitFrame(dt=churn, ratio = c(0.5, 0.2), seed=1234)
train <- splits[[1]]
valid <- splits[[2]]
test <- splits[[3]]
y = "Churn."
x = colnames(train)[colnames(train)!=y]
h2o.init()
dataset <- autoFE(train=train, valid=valid, test=test, x=x, y=y, savePath=savePath, verbose=TRUE)