Reinitializing Keras model weights after each training pass


I noticed a few questions similar to this one on Stack Overflow, but none of them has an answer.

I have a simple Keras model:

def create_model(x_train, y_train, x_val, y_val):
    # building the model
    # compile
    # fit
    # return the score using model.predict

I'm applying stratified k-fold cross-validation as follows:

    skf = StratifiedKFold(y, n_folds=5, shuffle=True, random_state=0)
    scores = []
    for train_index, val_index in skf:
        X_train, X_val = df[train_index], df[val_index]
        y_train, y_val = y[train_index], y[val_index]

        scores.append(create_model(X_train, y_train, X_val, y_val))
        # point A

Do I have to reinitialize the model weights after each training pass (point A), or does the Keras library manage this?

If not, are there any suggestions for improving the processing time (perhaps by flushing the memory, if that is possible)?

I'm asking because I'm combining this process with the Hyperopt library for hyperparameter optimization, and I noticed that after many trials each eval starts to take more time than it did at the beginning.

Edit: as an example, here are the processing times for the Hyperopt evals, where each eval runs the full 5-fold procedure:

Hyperopt evals:   3%|▎         | 5/150 [16:09<7:54:20, 196.28s/it]

Hyperopt evals:   4%|▍         | 6/150 [22:33<10:06:20, 252.64s/it]

Hyperopt evals:   5%|▍         | 7/150 [26:20<9:43:55, 245.01s/it] 

Hyperopt evals:   5%|▌         | 8/150 [33:33<11:53:16, 301.38s/it]

Hyperopt evals:   6%|▌         | 9/150 [41:56<14:10:16, 361.82s/it]

Hyperopt evals:   7%|▋         | 10/150 [45:56<12:38:50, 325.22s/it]

Hyperopt evals:   7%|▋         | 11/150 [48:19<10:26:55, 270.61s/it]

Hyperopt evals:   8%|▊         | 12/150 [54:11<11:18:28, 294.99s/it]

Hyperopt evals:   9%|▊         | 13/150 [58:45<10:58:57, 288.59s/it]

Hyperopt evals:   9%|▉         | 14/150 [1:05:57<12:31:47, 331.68s/it]

Hyperopt evals:  10%|█         | 15/150 [1:13:38<13:53:30, 370.45s/it]

Hyperopt evals:  11%|█         | 16/150 [1:17:36<12:18:28, 330.66s/it]

Hyperopt evals:  11%|█▏        | 17/150 [1:25:56<14:06:13, 381.75s/it]

Hyperopt evals:  12%|█▏        | 18/150 [1:31:54<13:43:38, 374.39s/it]

Hyperopt evals:  13%|█▎        | 19/150 [1:36:11<12:20:55, 339.35s/it]

Hyperopt evals:  13%|█▎        | 20/150 [1:45:06<14:22:20, 398.01s/it]

Hyperopt evals:  14%|█▍        | 21/150 [1:49:14<12:38:51, 352.95s/it]

Hyperopt evals:  15%|█▍        | 22/150 [1:54:45<12:18:47, 346.31s/it]

Hyperopt evals:  15%|█▌        | 23/150 [1:59:04<11:17:24, 320.04s/it]

Hyperopt evals:  16%|█▌        | 24/150 [2:04:05<11:00:29, 314.52s/it]

Hyperopt evals:  17%|█▋        | 25/150 [2:07:47<9:57:11, 286.65s/it] 

Hyperopt evals:  17%|█▋        | 26/150 [2:12:47<10:00:37, 290.62s/it]

Hyperopt evals:  18%|█▊        | 27/150 [2:17:08<9:37:55, 281.91s/it] 

Hyperopt evals:  19%|█▊        | 28/150 [2:22:46<10:07:15, 298.65s/it]

Hyperopt evals:  19%|█▉        | 29/150 [2:28:56<10:45:29, 320.08s/it]

Hyperopt evals:  20%|██        | 30/150 [2:34:55<11:03:44, 331.87s/it]

Hyperopt evals:  21%|██        | 31/150 [2:40:20<10:53:43, 329.61s/it]

Hyperopt evals:  21%|██▏       | 32/150 [2:46:19<11:05:42, 338.50s/it]

Hyperopt evals:  22%|██▏       | 33/150 [2:51:47<10:53:54, 335.34s/it]

Hyperopt evals:  23%|██▎       | 34/150 [2:58:14<11:18:06, 350.75s/it]

Hyperopt evals:  23%|██▎       | 35/150 [3:04:10<11:15:41, 352.53s/it]

Hyperopt evals:  24%|██▍       | 36/150 [3:13:59<13:24:26, 423.39s/it]

Hyperopt evals:  25%|██▍       | 37/150 [3:20:13<12:49:38, 408.66s/it]

Hyperopt evals:  25%|██▌       | 38/150 [3:25:55<12:05:23, 388.61s/it]

Hyperopt evals:  26%|██▌       | 39/150 [3:35:53<13:54:59, 451.35s/it]

Hyperopt evals:  27%|██▋       | 40/150 [3:44:26<14:21:12, 469.75s/it]

Hyperopt evals:  27%|██▋       | 41/150 [3:50:42<13:22:33, 441.77s/it]

Hyperopt evals:  28%|██▊       | 42/150 [3:58:03<13:14:29, 441.39s/it]

Hyperopt evals:  29%|██▊       | 43/150 [4:11:11<16:12:35, 545.38s/it]

Hyperopt evals:  29%|██▉       | 44/150 [4:19:18<15:32:40, 527.93s/it]

Hyperopt evals:  30%|███       | 45/150 [4:26:03<14:19:21, 491.06s/it]

Hyperopt evals:  31%|███       | 46/150 [4:34:32<14:20:31, 496.46s/it]

Hyperopt evals:  31%|███▏      | 47/150 [4:45:01<15:20:25, 536.17s/it]

Hyperopt evals:  32%|███▏      | 48/150 [4:54:11<15:18:45, 540.45s/it]

Hyperopt evals:  33%|███▎      | 49/150 [4:58:42<12:53:19, 459.40s/it]

Hyperopt evals:  33%|███▎      | 50/150 [5:04:07<11:38:30, 419.11s/it]

Hyperopt evals:  34%|███▍      | 51/150 [5:12:48<12:22:14, 449.85s/it]

Hyperopt evals:  35%|███▍      | 52/150 [5:20:37<12:23:57, 455.49s/it]

Hyperopt evals:  35%|███▌      | 53/150 [5:28:18<12:19:19, 457.31s/it]

Hyperopt evals:  36%|███▌      | 54/150 [5:37:02<12:43:26, 477.15s/it]

Hyperopt evals:  37%|███▋      | 55/150 [5:45:21<12:46:00, 483.80s/it]

Hyperopt evals:  37%|███▋      | 56/150 [5:51:07<11:33:16, 442.51s/it]

Hyperopt evals:  38%|███▊      | 57/150 [5:59:38<11:57:39, 463.00s/it]

Hyperopt evals:  39%|███▊      | 58/150 [6:11:19<13:39:13, 534.27s/it]

Hyperopt evals:  39%|███▉      | 59/150 [6:28:06<17:05:39, 676.26s/it]

Hyperopt evals:  40%|████      | 60/150 [6:37:29<16:03:23, 642.27s/it]

Hyperopt evals:  41%|████      | 61/150 [6:43:38<13:51:06, 560.30s/it]

Hyperopt evals:  41%|████▏     | 62/150 [6:52:41<13:33:52, 554.92s/it]

Hyperopt evals:  42%|████▏     | 63/150 [7:00:05<12:36:40, 521.84s/it]

Hyperopt evals:  43%|████▎     | 64/150 [7:12:13<13:56:21, 583.50s/it]

Hyperopt evals:  43%|████▎     | 65/150 [7:20:03<12:58:38, 549.62s/it]

Hyperopt evals:  44%|████▍     | 66/150 [7:31:56<13:58:08, 598.68s/it]

Hyperopt evals:  45%|████▍     | 67/150 [7:44:48<15:00:05, 650.67s/it]

Hyperopt evals:  45%|████▌     | 68/150 [7:57:32<15:35:45, 684.70s/it]

There is 1 answer

Minions On

Do I have to reinitialize the model weights after each training pass (point A), or does the Keras library manage this?

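After checking the documentation and running some manual experiments, it seems to me that Keras takes care of this. The model is built from scratch inside create_model on every call, so each fold starts from freshly initialized weights and no manual re-initialization is needed.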
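The behaviour described above can be sketched with a toy stand-in rather than Keras itself. `ToyModel`, `build_model`, and `cross_validate` are hypothetical names used only for illustration. The point is that a model constructed inside the per-fold function gets new randomly initialized weights on each call, so nothing carries over between folds. Reusing a single model object across folds would instead continue training from the previous fold's weights.

```python
import random


class ToyModel:
    """Hypothetical stand-in for a Keras model: weights are drawn at construction."""

    def __init__(self, n_weights=3):
        # New random weights every time an instance is created.
        self.weights = [random.gauss(0.0, 1.0) for _ in range(n_weights)]
        self.fit_calls = 0

    def fit(self, data):
        # Pretend training: move every weight halfway toward the mean of the data.
        target = sum(data) / len(data)
        self.weights = [w + 0.5 * (target - w) for w in self.weights]
        self.fit_calls += 1


def build_model():
    # Plays the role of the "building the model" step inside create_model.
    return ToyModel()


def cross_validate(folds):
    """Build a fresh model per fold; record its initial weights and fit count."""
    records = []
    for train_data in folds:
        model = build_model()          # fresh object, fresh weights
        initial = list(model.weights)  # snapshot taken before training
        model.fit(train_data)
        records.append((initial, model.fit_calls))
    return records


random.seed(0)
records = cross_validate([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
for initial, fit_calls in records:
    print(initial, fit_calls)
```

Every fold reports a different set of initial weights and exactly one `fit` call, which is the toy equivalent of each fold starting untrained. On the memory side of the question, Keras does provide `keras.backend.clear_session()` for releasing the global state left behind by models created in a loop. Whether it helps in this particular setup is an assumption that would need to be measured.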

If not, are there any suggestions for improving the processing time (perhaps by flushing the memory, if that is possible)?

In my case the processing time was increasing for two reasons:

1- Hyperopt uses a Bayesian optimization technique, so each time it picks the next parameter set it tries to choose something better, based on what it has learned from the previous evals.
2- I'm using early stopping.

As a result, with each successive eval Hyperopt tends to choose a better parameter set, and the model converges better than before. Early stopping therefore triggers later or not at all, and each eval takes longer because more of the epochs are actually run.
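A minimal sketch of this effect, using a simplified patience rule rather than the actual Keras callback. The function `epochs_run` and both loss curves are hypothetical and made up for illustration. A configuration whose validation loss plateaus early is stopped after a few epochs, while one that keeps improving runs for the full budget and so takes longer.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before patience-based early stopping triggers.

    val_losses holds one validation loss per epoch. Training stops once
    `patience` consecutive epochs fail to improve on the best loss so far.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


max_epochs = 50

# Hypothetical poor configuration: improves for 5 epochs, then plateaus.
poor = [1.0 - 0.1 * min(e, 5) for e in range(max_epochs)]

# Hypothetical good configuration: keeps improving slightly on every epoch.
good = [1.0 / (1 + e) for e in range(max_epochs)]

print(epochs_run(poor, patience=3))  # 9
print(epochs_run(good, patience=3))  # 50
```

With these made-up curves the poor configuration costs 9 epochs and the good one costs all 50. That matches the pattern in the log: later evals, which tend to use better parameters, take longer.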