作者: fchollet
创建日期 2023/07/10
最后修改日期 2023/07/10
描述: 初次接触 Keras 3。
Keras 3 是一个深度学习框架,可以与 TensorFlow、JAX 和 PyTorch 互换使用。本笔记本将引导您了解 Keras 3 的关键工作流程。
我们将在这里使用 JAX 后端 – 但您可以将下面的字符串编辑为 "tensorflow"
或 "torch"
并点击 “重启运行时”,整个笔记本将以同样的方式运行!本指南与后端无关。
import numpy as np
import os
os.environ["KERAS_BACKEND"] = "jax"
# Note that Keras should only be imported after the backend
# has been configured. The backend cannot be changed once the
# package is imported.
import keras
让我们从机器学习的 “Hello World” 开始:训练一个卷积神经网络来分类 MNIST 数字。
这是数据
# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
x_train shape: (60000, 28, 28, 1)
y_train shape: (60000,)
60000 train samples
10000 test samples
这是我们的模型。
Keras 提供的不同模型构建选项包括:
# Model parameters
num_classes = 10
input_shape = (28, 28, 1)
model = keras.Sequential(
[
keras.layers.Input(shape=input_shape),
keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
keras.layers.MaxPooling2D(pool_size=(2, 2)),
keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
keras.layers.GlobalAveragePooling2D(),
keras.layers.Dropout(0.5),
keras.layers.Dense(num_classes, activation="softmax"),
]
)
这是我们的模型摘要
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩ │ conv2d (Conv2D) │ (None, 26, 26, 64) │ 640 │ ├─────────────────────────────────┼───────────────────────────┼────────────┤ │ conv2d_1 (Conv2D) │ (None, 24, 24, 64) │ 36,928 │ ├─────────────────────────────────┼───────────────────────────┼────────────┤ │ max_pooling2d (MaxPooling2D) │ (None, 12, 12, 64) │ 0 │ ├─────────────────────────────────┼───────────────────────────┼────────────┤ │ conv2d_2 (Conv2D) │ (None, 10, 10, 128) │ 73,856 │ ├─────────────────────────────────┼───────────────────────────┼────────────┤ │ conv2d_3 (Conv2D) │ (None, 8, 8, 128) │ 147,584 │ ├─────────────────────────────────┼───────────────────────────┼────────────┤ │ global_average_pooling2d │ (None, 128) │ 0 │ │ (GlobalAveragePooling2D) │ │ │ ├─────────────────────────────────┼───────────────────────────┼────────────┤ │ dropout (Dropout) │ (None, 128) │ 0 │ ├─────────────────────────────────┼───────────────────────────┼────────────┤ │ dense (Dense) │ (None, 10) │ 1,290 │ └─────────────────────────────────┴───────────────────────────┴────────────┘
Total params: 260,298 (1016.79 KB)
Trainable params: 260,298 (1016.79 KB)
Non-trainable params: 0 (0.00 B)
我们使用 compile()
方法来指定优化器、损失函数和要监控的指标。请注意,对于 JAX 和 TensorFlow 后端,默认情况下会启用 XLA 编译。
model.compile(
loss=keras.losses.SparseCategoricalCrossentropy(),
optimizer=keras.optimizers.Adam(learning_rate=1e-3),
metrics=[
keras.metrics.SparseCategoricalAccuracy(name="acc"),
],
)
让我们训练和评估模型。在训练期间,我们将预留 15% 的数据作为验证集,以监控在未见过的数据上的泛化性能。
batch_size = 128
epochs = 20
callbacks = [
keras.callbacks.ModelCheckpoint(filepath="model_at_epoch_{epoch}.keras"),
keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
]
model.fit(
x_train,
y_train,
batch_size=batch_size,
epochs=epochs,
validation_split=0.15,
callbacks=callbacks,
)
score = model.evaluate(x_test, y_test, verbose=0)
Epoch 1/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 74s 184ms/step - acc: 0.4980 - loss: 1.3832 - val_acc: 0.9609 - val_loss: 0.1513
Epoch 2/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 74s 186ms/step - acc: 0.9245 - loss: 0.2487 - val_acc: 0.9702 - val_loss: 0.0999
Epoch 3/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 70s 175ms/step - acc: 0.9515 - loss: 0.1647 - val_acc: 0.9816 - val_loss: 0.0608
Epoch 4/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 69s 174ms/step - acc: 0.9622 - loss: 0.1247 - val_acc: 0.9833 - val_loss: 0.0541
Epoch 5/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 68s 171ms/step - acc: 0.9685 - loss: 0.1083 - val_acc: 0.9860 - val_loss: 0.0468
Epoch 6/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 70s 176ms/step - acc: 0.9710 - loss: 0.0955 - val_acc: 0.9897 - val_loss: 0.0400
Epoch 7/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 69s 172ms/step - acc: 0.9742 - loss: 0.0853 - val_acc: 0.9888 - val_loss: 0.0388
Epoch 8/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 68s 169ms/step - acc: 0.9789 - loss: 0.0738 - val_acc: 0.9902 - val_loss: 0.0387
Epoch 9/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 75s 187ms/step - acc: 0.9789 - loss: 0.0691 - val_acc: 0.9907 - val_loss: 0.0341
Epoch 10/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 77s 194ms/step - acc: 0.9806 - loss: 0.0636 - val_acc: 0.9907 - val_loss: 0.0348
Epoch 11/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 74s 186ms/step - acc: 0.9812 - loss: 0.0610 - val_acc: 0.9926 - val_loss: 0.0271
Epoch 12/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 219s 550ms/step - acc: 0.9820 - loss: 0.0590 - val_acc: 0.9912 - val_loss: 0.0294
Epoch 13/20
399/399 ━━━━━━━━━━━━━━━━━━━━ 70s 176ms/step - acc: 0.9843 - loss: 0.0504 - val_acc: 0.9918 - val_loss: 0.0316
在训练期间,我们在每个 epoch 结束时保存模型。您也可以像这样保存模型的最新状态:
model.save("final_model.keras")
并像这样重新加载它:
model = keras.saving.load_model("final_model.keras")
接下来,您可以使用 predict()
查询类概率的预测:
predictions = model.predict(x_test)
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step
这就是基础知识!
Keras 使您能够编写可在 TensorFlow、JAX 和 PyTorch 中使用相同代码库的自定义层、模型、指标、损失和优化器。让我们先来看看自定义层。
keras.ops
命名空间包含:
keras.ops.stack
或 keras.ops.matmul
。keras.ops.conv
或 keras.ops.binary_crossentropy
。让我们创建一个适用于所有后端的自定义 Dense
层:
class MyDense(keras.layers.Layer):
def __init__(self, units, activation=None, name=None):
super().__init__(name=name)
self.units = units
self.activation = keras.activations.get(activation)
def build(self, input_shape):
input_dim = input_shape[-1]
self.w = self.add_weight(
shape=(input_dim, self.units),
initializer=keras.initializers.GlorotNormal(),
name="kernel",
trainable=True,
)
self.b = self.add_weight(
shape=(self.units,),
initializer=keras.initializers.Zeros(),
name="bias",
trainable=True,
)
def call(self, inputs):
# Use Keras ops to create backend-agnostic layers/metrics/etc.
x = keras.ops.matmul(inputs, self.w) + self.b
return self.activation(x)
接下来,让我们创建一个依赖于 keras.random
命名空间的自定义 Dropout
层:
class MyDropout(keras.layers.Layer):
def __init__(self, rate, name=None):
super().__init__(name=name)
self.rate = rate
# Use seed_generator for managing RNG state.
# It is a state element and its seed variable is
# tracked as part of `layer.variables`.
self.seed_generator = keras.random.SeedGenerator(1337)
def call(self, inputs):
# Use `keras.random` for random ops.
return keras.random.dropout(inputs, self.rate, seed=self.seed_generator)
接下来,让我们编写一个使用我们两个自定义层的自定义子类化模型:
class MyModel(keras.Model):
def __init__(self, num_classes):
super().__init__()
self.conv_base = keras.Sequential(
[
keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
keras.layers.MaxPooling2D(pool_size=(2, 2)),
keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
keras.layers.Conv2D(128, kernel_size=(3, 3), activation="relu"),
keras.layers.GlobalAveragePooling2D(),
]
)
self.dp = MyDropout(0.5)
self.dense = MyDense(num_classes, activation="softmax")
def call(self, x):
x = self.conv_base(x)
x = self.dp(x)
return self.dense(x)
让我们编译并拟合它:
model = MyModel(num_classes=10)
model.compile(
loss=keras.losses.SparseCategoricalCrossentropy(),
optimizer=keras.optimizers.Adam(learning_rate=1e-3),
metrics=[
keras.metrics.SparseCategoricalAccuracy(name="acc"),
],
)
model.fit(
x_train,
y_train,
batch_size=batch_size,
epochs=1, # For speed
validation_split=0.15,
)
399/399 ━━━━━━━━━━━━━━━━━━━━ 70s 174ms/step - acc: 0.5104 - loss: 1.3473 - val_acc: 0.9256 - val_loss: 0.2484
<keras.src.callbacks.history.History at 0x105608670>
所有 Keras 模型都可以在各种数据源上进行训练和评估,而与您使用的后端无关。这包括:
tf.data.Dataset
对象DataLoader
对象PyDataset
对象无论您使用 TensorFlow、JAX 还是 PyTorch 作为 Keras 后端,它们都可以正常工作。
让我们用 PyTorch DataLoaders
试一下:
import torch
# Create a TensorDataset
train_torch_dataset = torch.utils.data.TensorDataset(
torch.from_numpy(x_train), torch.from_numpy(y_train)
)
val_torch_dataset = torch.utils.data.TensorDataset(
torch.from_numpy(x_test), torch.from_numpy(y_test)
)
# Create a DataLoader
train_dataloader = torch.utils.data.DataLoader(
train_torch_dataset, batch_size=batch_size, shuffle=True
)
val_dataloader = torch.utils.data.DataLoader(
val_torch_dataset, batch_size=batch_size, shuffle=False
)
model = MyModel(num_classes=10)
model.compile(
loss=keras.losses.SparseCategoricalCrossentropy(),
optimizer=keras.optimizers.Adam(learning_rate=1e-3),
metrics=[
keras.metrics.SparseCategoricalAccuracy(name="acc"),
],
)
model.fit(train_dataloader, epochs=1, validation_data=val_dataloader)
469/469 ━━━━━━━━━━━━━━━━━━━━ 81s 172ms/step - acc: 0.5502 - loss: 1.2550 - val_acc: 0.9419 - val_loss: 0.1972
<keras.src.callbacks.history.History at 0x2b3385480>
现在让我们用 tf.data
试一下:
import tensorflow as tf
train_dataset = (
tf.data.Dataset.from_tensor_slices((x_train, y_train))
.batch(batch_size)
.prefetch(tf.data.AUTOTUNE)
)
test_dataset = (
tf.data.Dataset.from_tensor_slices((x_test, y_test))
.batch(batch_size)
.prefetch(tf.data.AUTOTUNE)
)
model = MyModel(num_classes=10)
model.compile(
loss=keras.losses.SparseCategoricalCrossentropy(),
optimizer=keras.optimizers.Adam(learning_rate=1e-3),
metrics=[
keras.metrics.SparseCategoricalAccuracy(name="acc"),
],
)
model.fit(train_dataset, epochs=1, validation_data=test_dataset)
469/469 ━━━━━━━━━━━━━━━━━━━━ 81s 172ms/step - acc: 0.5771 - loss: 1.1948 - val_acc: 0.9229 - val_loss: 0.2502
<keras.src.callbacks.history.History at 0x2b33e7df0>
我们的 Keras 3 新的多后端功能简介到此结束。接下来,您可以了解:
fit()
中发生的事情想要自己实现非标准的训练算法,但仍然想受益于 fit()
的强大功能和可用性?可以很容易地自定义 fit()
以支持任意用例:
享受这个库吧!🚀