Feature Engineering
- Normalization options: log1p, z-score, min-max (min-max for the MLP); see the preprocessing sketch at the end of this section
- Missing-value handling options: 0, -9999, NA
- WoE features if there are too many splits
- Target mean encoding (sketch at the end of this section)
- 2 or 3 interaction features
- Feature 6: clustering features of the original dataset
- Feature 7: number of non-zero elements in each row

Post-processing

Feature Selection
- https://www.kaggle.com/ogrellier/feature-scoring-vs-zeros
- For each LGB model, take the top 50 features and randomly pick another 50 from the remaining features (sketch at the end of this section)

Feature Extraction
- ..

Modeling
- LightGBM, CatBoost, XGBoost, Extra Trees
- Tuning: Cartesian + random grid search
- Ensemble: 2-layer or 3-layer
- Cut off (clip) predictions at the target range of the train dataset
- LightGBM tuning guideline: https://github.com/Microsoft/LightGBM/issues/695
- Tuning options guideline: https://sites.google.com/view/lauraepp/parameters
- 3 LGBM models with the same parameters but different seeds inside LGBM (UPD: and different seeds in the KFold splitting, which was even more important); sketch at the end of this section
- binary_logloss
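
A minimal sketch of the normalization and missing-value options plus the row-level non-zero count listed above. The DataFrame `df`, the column list `num_cols`, and the function name `preprocess` are placeholders introduced here for illustration, not names from the original write-up.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, num_cols, scale="log1p", fill=0):
    out = df.copy()

    # Feature 7: number of non-zero elements in each row (NaNs counted as zero here).
    out["n_nonzero"] = (df[num_cols].fillna(0) != 0).sum(axis=1)

    # Missing-value handling options: 0, -9999, or None to leave NA for models that accept it.
    if fill is not None:
        out[num_cols] = out[num_cols].fillna(fill)

    # Normalization options: log1p, z-score, or min-max (min-max mainly for the MLP).
    if scale == "log1p":
        out[num_cols] = np.log1p(out[num_cols])
    elif scale == "zscore":
        out[num_cols] = (out[num_cols] - out[num_cols].mean()) / out[num_cols].std()
    elif scale == "minmax":
        mn, mx = out[num_cols].min(), out[num_cols].max()
        out[num_cols] = (out[num_cols] - mn) / (mx - mn)

    return out
```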
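For the target mean encoding item, a hedged sketch using out-of-fold means to limit leakage; the KFold scheme is a common practice assumed here, not something the notes specify, and `cat_col` / `target` are illustrative column names.

```python
import pandas as pd
from sklearn.model_selection import KFold


def target_mean_encode(train: pd.DataFrame, cat_col, target, n_splits=5, seed=42):
    encoded = pd.Series(index=train.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    global_mean = train[target].mean()

    for fit_idx, enc_idx in kf.split(train):
        # Compute category means on the fitting folds only, then map them onto the held-out fold.
        means = train.iloc[fit_idx].groupby(cat_col)[target].mean()
        encoded.iloc[enc_idx] = train.iloc[enc_idx][cat_col].map(means).values

    # Categories unseen in the fitting folds fall back to the global mean.
    return encoded.fillna(global_mean)
```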
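A sketch of the per-model feature subsetting described under Feature Selection: the top 50 features by importance plus 50 chosen at random from the remaining ones. The input `importance_df` (columns `feature`, `importance`, e.g. produced with the feature-scoring approach from the linked kernel) and the function name are assumptions.

```python
import numpy as np


def pick_features(importance_df, n_top=50, n_random=50, seed=0):
    ranked = importance_df.sort_values("importance", ascending=False)["feature"].tolist()
    top = ranked[:n_top]                      # top 50 by importance
    rest = ranked[n_top:]                     # pool for the random picks

    rng = np.random.RandomState(seed)
    extra = rng.choice(rest, size=min(n_random, len(rest)), replace=False).tolist()
    return top + extra
```

A different `seed` per LGB model gives each model its own random half of the feature set, which is what makes the per-model subsets diverse.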
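Finally, a hedged sketch of the seed-diversity ensemble and the prediction cut-off: three LGBM runs with identical parameters but different seeds inside LightGBM and different KFold seeds, averaged and then clipped to the target range seen in train. `X`, `y`, `X_test` are assumed to be numpy arrays and the parameter values are placeholders, not the author's settings.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold


def seed_ensemble(X, y, X_test, params, seeds=(1, 2, 3), n_splits=5):
    preds = np.zeros(len(X_test))

    for seed in seeds:
        fold_params = dict(params, seed=seed)  # different seed inside LGBM
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)  # and in the KFold split

        for trn_idx, val_idx in kf.split(X):
            model = lgb.train(
                fold_params,
                lgb.Dataset(X[trn_idx], y[trn_idx]),
                valid_sets=[lgb.Dataset(X[val_idx], y[val_idx])],
            )
            preds += model.predict(X_test) / (len(seeds) * n_splits)

    # Post-processing: cut off predictions at the target range of the train dataset.
    return np.clip(preds, y.min(), y.max())
```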