learning from arnowaczynski
https://www.kaggle.com/c/home-credit-default-risk/discussion/64609
[[ Model blending strategy ]]
-
many CV runs with different random seeds, each using a different random 10-fold split
-
simple average of top 30 models
-
ridge regression on top 60 models
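A minimal sketch of the two blending steps above, assuming each model's out-of-fold predictions and CV score have already been collected. The function names, the synthetic data, and `alpha=1.0` are illustrative assumptions, not taken from the original post. The ridge fit uses a plain closed-form solve in NumPy; `sklearn.linear_model.Ridge` would be the more usual choice.

```python
import numpy as np

def top_k_models(cv_scores, k):
    """Indices of the k models with the highest CV score (higher is better)."""
    return np.argsort(cv_scores)[::-1][:k]

def average_blend(preds):
    """Simple average; preds has shape (n_models, n_samples)."""
    return np.mean(preds, axis=0)

def ridge_blend_weights(oof_preds, y, alpha=1.0):
    """Closed-form ridge fit of y on out-of-fold predictions.

    oof_preds has shape (n_samples, n_models). The intercept is not penalized.
    Returns (weights, intercept).
    """
    X = np.asarray(oof_preds, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Synthetic stand-in for three models' out-of-fold predictions (assumption).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500).astype(float)
oof = np.column_stack([
    np.clip(0.6 * y + rng.normal(0.2, noise, size=500), 0, 1)
    for noise in (0.2, 0.3, 0.4)
])

best_two = top_k_models([0.79, 0.80, 0.78], k=2)   # pick models by CV score
avg_pred = average_blend(oof.T)                     # simple average blend
w, b = ridge_blend_weights(oof, y, alpha=1.0)       # ridge blend
ridge_pred = oof @ w + b
```

For test-set predictions, the same weights `w` and intercept `b` would be applied to the models' test predictions stacked in the same column order.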
[[ Hyperparameter search strategy ]]
-
num_boost_round = 10000 with early_stopping_rounds=200
-
bagging_freq = 1
-
tune 6 hyperparameters: learning_rate, num_leaves, max_depth, min_data_in_leaf, feature_fraction, bagging_freq
-
random grid search
-
set a discrete search space for each parameter
-
while searching, print every CV score together with the selected hyperparameters
-
at any time, stop the search and adjust the search space (make sure the same parameters are not evaluated more than once)
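The search loop described above could be sketched roughly as follows. The search-space values, the function names, and `toy_cv_score` are hypothetical placeholders; in practice the scoring function would wrap a LightGBM cross-validation run with the fixed settings listed above. A shared `seen` set is what lets the search be stopped, the space adjusted, and the search resumed without re-evaluating a combination.

```python
import itertools
import random

# Hypothetical discrete search space; values are illustrative only.
SPACE_V1 = {
    "learning_rate": [0.01, 0.02, 0.05],
    "num_leaves": [15, 31, 63],
    "max_depth": [4, 6, 8],
    "min_data_in_leaf": [20, 50, 100],
    "feature_fraction": [0.3, 0.5, 0.8],
}

def params_key(params):
    """Hashable, order-independent identity of one parameter combination."""
    return tuple(sorted(params.items()))

def random_grid_search(space, score_fn, n_trials, seen, seed=0):
    """Evaluate up to n_trials unseen grid points in random order."""
    keys = sorted(space)
    candidates = [dict(zip(keys, values))
                  for values in itertools.product(*(space[k] for k in keys))]
    # Skip anything already evaluated, including points from earlier spaces.
    candidates = [p for p in candidates if params_key(p) not in seen]
    random.Random(seed).shuffle(candidates)
    results = []
    for params in candidates[:n_trials]:
        seen.add(params_key(params))
        score = score_fn(params)
        print(f"cv_score={score:.5f} params={params}")  # log every trial
        results.append((score, params))
    return results

def toy_cv_score(params):
    """Placeholder for a real CV run (assumption, not a real model)."""
    return 0.79 - abs(params["num_leaves"] - 31) / 1000 - params["learning_rate"] / 10

seen = set()
first = random_grid_search(SPACE_V1, toy_cv_score, n_trials=5, seen=seen, seed=1)

# Stop, inspect the printed scores, then narrow or shift the space and resume.
SPACE_V2 = dict(SPACE_V1, num_leaves=[23, 31, 47], learning_rate=[0.01, 0.02])
second = random_grid_search(SPACE_V2, toy_cv_score, n_trials=5, seen=seen, seed=2)
```

Enumerating the full grid before shuffling is only practical for small spaces; for larger ones, sampling one value per parameter and rejecting combinations already in `seen` would be the alternative.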