자동변수생성(rAutoFE) R패키지 개발

2018/06/13

데이터 분석에서 파생변수를 생성하는 과정은 매우 중요하고 노력과 시간이 많이 필요한 작업입니다.


We spent most of our efforts in feature engineering.

– Xavier Cohort, after winning one of many Kaggle competitions

… some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.

– Pedro Domingos, A Few Useful Things to Know about Machine Learning


rAutoFE 소개

rAutoFE(auto feature engineering in R)는 다양한 변수를 자동으로 생성해주는 R패키지 입니다.


rAutoFE 파생변수 목록

  1. numeric feature transformation
  2. numeric feature binning
  3. categorical feature encoding (frequency-based)
  4. categorical feature encoding (target-based)
  5. weight of evidence features
  6. interaction features
  7. reduce the number of levels for catogorical features


rAutoFE 패키지 설치 방법

library(devtools)
install_github("2econsulting/rAutoFE")
library(rAutoFE)


rAutoFE 패키지 사용 방법

library(data.table)
library(rAutoFE)
library(rAutoFS)
library(h2o)
savePath <- "c:/tmp"
dir.create(savePath)
data(churn)
churn <- as.data.table(churn)
churn[, Area.Code:=as.factor(Area.Code)]
splits <- splitFrame(dt=churn, ratio = c(0.5, 0.2), seed=1234)
train <- splits[[1]]
valid <- splits[[2]]
test  <- splits[[3]]
y = "Churn."
x = colnames(train)[colnames(train)!=y]
h2o.init()
dataset <- autoFE(train=train, valid=valid, test=test, x=x, y=y, savePath=savePath, verbose=TRUE)


-->