Author: Haifeng Jin
Date created: 2023/02/28
Last modified: 2023/02/28
Description: The basics of fault tolerance configurations in KerasTuner.
A KerasTuner program may take a long time to run, since each model may take a long time to train. We do not want the program to fail just because some trials failed randomly.

In this guide, we will show how to handle failed trials in KerasTuner, including:

- How to tolerate failed trials during the search
- How to mark a trial as failed while building and evaluating the model
- How to terminate the search by raising a FatalError

!pip install keras-tuner -q
import keras
from keras import layers
import keras_tuner
import numpy as np
We will use the max_retries_per_trial and max_consecutive_failed_trials arguments when initializing the tuner.

max_retries_per_trial controls the maximum number of retries to run if a trial keeps failing. For example, if it is set to 3, the trial may run 4 times (1 failed run + 3 failed retries) before being finally marked as failed. The default value of max_retries_per_trial is 0.

max_consecutive_failed_trials controls how many consecutive failed trials (a failed trial here is one whose retries all failed) are tolerated before the search is terminated. For example, if it is set to 3 and Trial 2, Trial 3, and Trial 4 all fail, the search is terminated. However, if it is set to 3 and only Trial 2, Trial 3, Trial 5, and Trial 6 fail, the search is not terminated, because the failed trials are not consecutive. The default value of max_consecutive_failed_trials is 3.
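Retries are aimed at transient failures, where the same hyperparameters can succeed on a second attempt. Here is a minimal sketch of that situation, reusing the imports above; the flaky_build_model function and its random failure are hypothetical illustrations (think of a sporadic out-of-memory error), not part of the KerasTuner API.

import random


def flaky_build_model(hp):
    # Hypothetical transient failure: it happens at random, independent
    # of the hyperparameter values, so a retry can succeed.
    if random.random() < 0.5:
        raise RuntimeError("Simulated transient failure.")
    model = keras.Sequential(
        [
            layers.Dense(units=hp.Int("units", 10, 30, step=10)),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")
    return model


# Each trial may run up to 4 times (1 run + 3 retries) before it is
# finally marked as failed.
flaky_tuner = keras_tuner.GridSearch(
    hypermodel=flaky_build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)
flaky_tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(np.random.rand(100, 20), np.random.rand(100, 1)),
    epochs=1,
)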
The following code shows how these two arguments work in action.

- We define a search space with 2 hyperparameters for the units in the dense layers.
- When the model has more than 1200 parameters, we raise a ValueError.

def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        raise ValueError(f"Model too large! It contains {num_params} params.")
    return model
We set up the tuner as follows.

- We set max_retries_per_trial=3.
- We set max_consecutive_failed_trials=8.
- We use GridSearch to enumerate all combinations of hyperparameter values.

tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)
# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)
# Print the results.
tuner.results_summary()
Trial 12 Complete [00h 00m 00s]
Best val_loss So Far: 0.12375041842460632
Total elapsed time: 00h 00m 08s
Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")
Trial 0003 summary
Hyperparameters:
units_1: 20
units_2: 10
Score: 0.12375041842460632
Trial 0001 summary
Hyperparameters:
units_1: 10
units_2: 20
Score: 0.12741881608963013
Trial 0002 summary
Hyperparameters:
units_1: 10
units_2: 30
Score: 0.13982832431793213
Trial 0000 summary
Hyperparameters:
units_1: 10
units_2: 10
Score: 0.1433391124010086
Trial 0005 summary
Hyperparameters:
units_1: 20
units_2: 30
Score: 0.14747518301010132
Trial 0006 summary
Hyperparameters:
units_1: 30
units_2: 10
Score: 0.15092280507087708
Trial 0004 summary
Hyperparameters:
units_1: 20
units_2: 20
Score: 0.21962997317314148
Trial 0007 summary
Hyperparameters:
units_1: 30
units_2: 20
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1271 params.
Trial 0008 summary
Hyperparameters:
units_1: 30
units_2: 30
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1591 params.
Trial 0009 summary
Hyperparameters:
units_1: 40
units_2: 10
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/966577796.py", line 19, in build_model
    raise ValueError(f"Model too large! It contains {num_params} params.")
ValueError: Model too large! It contains 1261 params.
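Each trial's outcome is also recorded on the trial objects themselves. As a quick sketch, assuming the oracle's trials mapping and the per-trial status field (internal details that may change between KerasTuner versions), you can list the outcomes programmatically:

# Print each trial's final status, e.g. COMPLETED or FAILED.
for trial_id, trial in sorted(tuner.oracle.trials.items()):
    print(trial_id, trial.status)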
When the model is too large, we do not need to retry it. No matter how many times we try with the same hyperparameters, it is always too large.

We can set max_retries_per_trial=0 to achieve this. However, that skips the retries no matter what error occurs, while we may still want to retry on other, unexpected errors. Is there a better way to handle this situation?

We can raise a FailedTrialError to skip the retries. Whenever this error is raised, the trial is not retried. The retries still run when other errors occur. An example follows.
def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, it skips the retries.
        raise keras_tuner.errors.FailedTrialError(
            f"Model too large! It contains {num_params} params."
        )
    return model
tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)
# Use random data to train the model.
tuner.search(
    x=np.random.rand(100, 20),
    y=np.random.rand(100, 1),
    validation_data=(
        np.random.rand(100, 20),
        np.random.rand(100, 1),
    ),
    epochs=10,
)
# Print the results.
tuner.results_summary()
Trial 12 Complete [00h 00m 00s]
Best val_loss So Far: 0.08265472948551178
Total elapsed time: 00h 00m 05s
Results summary
Results in ./untitled_project
Showing 10 best trials
Objective(name="val_loss", direction="min")
Trial 0002 summary
Hyperparameters:
units_1: 10
units_2: 30
Score: 0.08265472948551178
Trial 0005 summary
Hyperparameters:
units_1: 20
units_2: 30
Score: 0.11731438338756561
Trial 0006 summary
Hyperparameters:
units_1: 30
units_2: 10
Score: 0.13600358366966248
Trial 0004 summary
Hyperparameters:
units_1: 20
units_2: 20
Score: 0.1465979516506195
Trial 0000 summary
Hyperparameters:
units_1: 10
units_2: 10
Score: 0.15967626869678497
Trial 0001 summary
Hyperparameters:
units_1: 10
units_2: 20
Score: 0.1646396517753601
Trial 0003 summary
Hyperparameters:
units_1: 20
units_2: 10
Score: 0.1696309596300125
Trial 0007 summary
Hyperparameters:
units_1: 30
units_2: 20
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1271 params.
Trial 0008 summary
Hyperparameters:
units_1: 30
units_2: 30
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1591 params.
Trial 0009 summary
Hyperparameters:
units_1: 40
units_2: 10
Traceback (most recent call last):
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 273, in _try_run_and_update_trial
    self._run_and_update_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/base_tuner.py", line 238, in _run_and_update_trial
    results = self.run_trial(trial, *fit_args, **fit_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 314, in run_trial
    obj_value = self._build_and_fit_model(trial, *args, **copied_kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 232, in _build_and_fit_model
    model = self._try_build(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 164, in _try_build
    model = self._build_hypermodel(hp)
  File "/home/codespace/.local/lib/python3.10/site-packages/keras_tuner/src/engine/tuner.py", line 155, in _build_hypermodel
    model = self.hypermodel.build(hp)
  File "/tmp/ipykernel_21713/2463037569.py", line 20, in build_model
    raise keras_tuner.errors.FailedTrialError(
keras_tuner.src.errors.FailedTrialError: Model too large! It contains 1261 params.
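Failed trials are simply excluded from the ranking, so the usual result-retrieval calls work unchanged after a search with failures. For example:

# The best hyperparameters come from the completed trials only.
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)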
If there is a bug in the code, we should terminate the search immediately and fix the bug. You can terminate the search programmatically whenever a condition you define is met. Raising a FatalError (or one of its subclasses FatalValueError, FatalTypeError, or FatalRuntimeError) terminates the search regardless of the max_consecutive_failed_trials argument.

The following is an example of terminating the search when the model is too large.
def build_model(hp):
    # Define the 2 hyperparameters for the units in dense layers
    units_1 = hp.Int("units_1", 10, 40, step=10)
    units_2 = hp.Int("units_2", 10, 30, step=10)

    # Define the model
    model = keras.Sequential(
        [
            layers.Dense(units=units_1, input_shape=(20,)),
            layers.Dense(units=units_2),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    # Raise an error when the model is too large
    num_params = model.count_params()
    if num_params > 1200:
        # When this error is raised, the search is terminated.
        raise keras_tuner.errors.FatalError(
            f"Model too large! It contains {num_params} params."
        )
    return model
tuner = keras_tuner.GridSearch(
    hypermodel=build_model,
    objective="val_loss",
    overwrite=True,
    max_retries_per_trial=3,
    max_consecutive_failed_trials=8,
)
try:
    # Use random data to train the model.
    tuner.search(
        x=np.random.rand(100, 20),
        y=np.random.rand(100, 1),
        validation_data=(
            np.random.rand(100, 20),
            np.random.rand(100, 1),
        ),
        epochs=10,
    )
except keras_tuner.errors.FatalError:
    print("The search is terminated.")
Trial 7 Complete [00h 00m 01s]
val_loss: 0.14219732582569122
Best val_loss So Far: 0.09755773097276688
Total elapsed time: 00h 00m 04s
Search: Running Trial #8
Value |Best Value So Far |Hyperparameter
30 |10 |units_1
20 |20 |units_2
The search is terminated.
In this guide, you learned how to handle failed trials in KerasTuner:

- Use max_retries_per_trial to specify the number of retries for a failed trial.
- Use max_consecutive_failed_trials to specify the maximum number of consecutive failed trials to tolerate.
- Raise FailedTrialError to directly mark a trial as failed and skip the retries.
- Raise FatalError, FatalValueError, FatalTypeError, or FatalRuntimeError to terminate the search immediately.
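Putting these together, a common pattern is to classify errors inside the build function: deterministic failures skip the retries, suspected bugs stop the whole search, and everything else is left to the retry mechanism. Below is a minimal sketch reusing the imports above; the size threshold and the sanity check are illustrative assumptions, not KerasTuner requirements.

def build_model(hp):
    model = keras.Sequential(
        [
            layers.Dense(units=hp.Int("units_1", 10, 40, step=10), input_shape=(20,)),
            layers.Dense(units=hp.Int("units_2", 10, 30, step=10)),
            layers.Dense(units=1),
        ]
    )
    model.compile(loss="mse")

    num_params = model.count_params()
    if num_params > 1200:
        # Deterministic failure: the same hyperparameters always produce
        # a model this large, so retrying cannot help. Skip the retries.
        raise keras_tuner.errors.FailedTrialError(
            f"Model too large! It contains {num_params} params."
        )
    if num_params <= 0:
        # An impossible state points to a bug: stop the whole search.
        raise keras_tuner.errors.FatalError("Bug: non-positive param count.")
    # Any other exception raised while building or fitting is retried up
    # to max_retries_per_trial times before the trial is marked failed.
    return model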