
Neural style transfer with AdaIN

Authors: Aritra Roy Gosthipaty, Ritwik Raha
Date created: 2021/11/08
Last modified: 2021/11/08

ⓘ This example uses Keras 2


Description: Neural style transfer with Adaptive Instance Normalization.

Introduction

Neural style transfer is the process of transferring the style of one image onto the content of another. This was first introduced in the seminal paper "A Neural Algorithm of Artistic Style" by Gatys et al. A major limitation of the technique proposed in that work is its runtime, as the algorithm uses a slow iterative optimization process.

Follow-up papers that introduced Batch Normalization, Instance Normalization and Conditional Instance Normalization allowed style transfer to be performed in new ways, no longer requiring a slow iterative process.

Following these papers, the authors Xun Huang and Serge Belongie proposed Adaptive Instance Normalization (AdaIN), which allows arbitrary style transfer in real time.

In this example we implement Adaptive Instance Normalization for neural style transfer. The figure below shows the output of our AdaIN model trained for only 30 epochs.

Style transfer sample gallery

You can also try out the model with your own images using this Hugging Face demo.

Setup

We begin by importing the necessary packages. The global variables are hyperparameters that we can change as needed.

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds
from tensorflow.keras import layers

# Defining the global variables.
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 64
# Training for a single epoch due to time constraints.
# Please use at least 30 epochs to see good results.
EPOCHS = 1
AUTOTUNE = tf.data.AUTOTUNE

For neural style transfer we need style images and content images. In this example we will use the Best Artworks of All Time dataset as our style dataset and Pascal VOC as our content dataset.

This is a deviation from the original paper implementation by the authors, who use WIKI-Art as the style dataset and MSCOCO as the content dataset. We do this to create a minimal yet reproducible example.


Downloading the dataset from Kaggle

The Best Artworks of All Time dataset is hosted on Kaggle and can easily be downloaded in Colab by following these steps:

  • Follow the instructions here to obtain your Kaggle API keys, in case you don't have them.
  • Use the following command to upload the Kaggle API keys.
from google.colab import files
files.upload()
  • Use the following commands to move the API keys into the proper directory and download the dataset.
$ mkdir ~/.kaggle
$ cp kaggle.json ~/.kaggle/
$ chmod 600 ~/.kaggle/kaggle.json
$ kaggle datasets download ikarus777/best-artworks-of-all-time
$ unzip -qq best-artworks-of-all-time.zip
$ rm -rf images
$ mv resized artwork
$ rm best-artworks-of-all-time.zip artists.csv

tf.data pipeline

In this section, we build the tf.data pipeline for the project. For the style dataset, we decode, convert and resize the images from the folder. For the content images we already have a tf.data dataset, since we use the tfds module.

After we have both the style and content data pipelines ready, we zip the two together to obtain the data pipeline that our model will consume.

def decode_and_resize(image_path):
    """Decodes and resizes an image from the image file path.

    Args:
        image_path: The image file path.

    Returns:
        A resized image.
    """
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, dtype="float32")
    image = tf.image.resize(image, IMAGE_SIZE)
    return image


def extract_image_from_voc(element):
    """Extracts image from the PascalVOC dataset.

    Args:
        element: A dictionary of data.

    Returns:
        A resized image.
    """
    image = element["image"]
    image = tf.image.convert_image_dtype(image, dtype="float32")
    image = tf.image.resize(image, IMAGE_SIZE)
    return image


# Get the image file paths for the style images.
style_images = os.listdir("/content/artwork/resized")
style_images = [os.path.join("/content/artwork/resized", path) for path in style_images]

# Split the style images into train, validation and test sets.
total_style_images = len(style_images)
train_style = style_images[: int(0.8 * total_style_images)]
val_style = style_images[int(0.8 * total_style_images) : int(0.9 * total_style_images)]
test_style = style_images[int(0.9 * total_style_images) :]

# Build the style and content tf.data datasets.
train_style_ds = (
    tf.data.Dataset.from_tensor_slices(train_style)
    .map(decode_and_resize, num_parallel_calls=AUTOTUNE)
    .repeat()
)
train_content_ds = tfds.load("voc", split="train").map(extract_image_from_voc).repeat()

val_style_ds = (
    tf.data.Dataset.from_tensor_slices(val_style)
    .map(decode_and_resize, num_parallel_calls=AUTOTUNE)
    .repeat()
)
val_content_ds = (
    tfds.load("voc", split="validation").map(extract_image_from_voc).repeat()
)

test_style_ds = (
    tf.data.Dataset.from_tensor_slices(test_style)
    .map(decode_and_resize, num_parallel_calls=AUTOTUNE)
    .repeat()
)
test_content_ds = (
    tfds.load("voc", split="test")
    .map(extract_image_from_voc, num_parallel_calls=AUTOTUNE)
    .repeat()
)

# Zipping the style and content datasets.
train_ds = (
    tf.data.Dataset.zip((train_style_ds, train_content_ds))
    .shuffle(BATCH_SIZE * 2)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

val_ds = (
    tf.data.Dataset.zip((val_style_ds, val_content_ds))
    .shuffle(BATCH_SIZE * 2)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

test_ds = (
    tf.data.Dataset.zip((test_style_ds, test_content_ds))
    .shuffle(BATCH_SIZE * 2)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)
Downloading and preparing dataset voc/2007/4.0.0 (download: 868.85 MiB, generated: Unknown size, total: 868.85 MiB) to /root/tensorflow_datasets/voc/2007/4.0.0...

Dataset voc downloaded and prepared to /root/tensorflow_datasets/voc/2007/4.0.0. Subsequent calls will reuse this data.

Visualizing the data

It is always better to visualize the data before training. To ensure the correctness of our preprocessing pipeline, we visualize 10 samples from our dataset.

style, content = next(iter(train_ds))
fig, axes = plt.subplots(nrows=10, ncols=2, figsize=(5, 30))
[ax.axis("off") for ax in np.ravel(axes)]

for (axis, style_image, content_image) in zip(axes, style[0:10], content[0:10]):
    (ax_style, ax_content) = axis
    ax_style.imshow(style_image)
    ax_style.set_title("Style Image")

    ax_content.imshow(content_image)
    ax_content.set_title("Content Image")



Architecture

The style transfer network takes a content image and a style image as inputs and outputs the style-transferred image. The authors of AdaIN propose a simple encoder-decoder structure to achieve this.

AdaIN architecture

The content image (C) and the style image (S) are both fed to the encoder networks. The outputs of these encoder networks (feature maps) are then fed to the AdaIN layer. The AdaIN layer computes a combined feature map. This feature map is then fed into a randomly initialized decoder network that serves as the generator for the neural style transferred image.

AdaIn equation

The style feature map (fs) and the content feature map (fc) are fed to the AdaIN layer. This layer produces the combined feature map t. The function g represents the decoder (generator) network.
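
Concretely, the forward pass described above boils down to a few lines. This is only a sketch: it assumes the get_encoder(), get_decoder() and ada_in() helpers defined later in this example, plus an already loaded batch of style and content images (style_batch and content_batch are hypothetical names used here for illustration):

encoder = get_encoder()
decoder = get_decoder()

# f(S) and f(C): encode the style and content images.
style_features = encoder(style_batch)
content_features = encoder(content_batch)

# t = AdaIN(f(C), f(S)): align the content statistics with the style statistics.
t = ada_in(style=style_features, content=content_features)

# g(t): decode the combined feature map into the stylized image.
stylized_image = decoder(t)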

Encoder

The encoder is part of a VGG19 model pretrained on ImageNet. We slice the model at the block4_conv1 layer. The output layer is as suggested by the authors in their paper.

def get_encoder():
    vgg19 = keras.applications.VGG19(
        include_top=False,
        weights="imagenet",
        input_shape=(*IMAGE_SIZE, 3),
    )
    vgg19.trainable = False
    mini_vgg19 = keras.Model(vgg19.input, vgg19.get_layer("block4_conv1").output)

    inputs = layers.Input([*IMAGE_SIZE, 3])
    mini_vgg19_out = mini_vgg19(inputs)
    return keras.Model(inputs, mini_vgg19_out, name="mini_vgg19")

Adaptive Instance Normalization

The AdaIN layer takes in the features of the content and style image. The layer can be defined via the following equation:

AdaIn formula

where sigma is the standard deviation and mu is the mean of the respective variable. In the above equation, the mean and variance of the content feature map fc are aligned with the mean and variance of the style feature map fs.
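
For readers who cannot view the image above, a LaTeX rendering of the operation, reconstructed from the paper and from the ada_in() function below, is:

\mathrm{AdaIN}(f_c, f_s) = \sigma(f_s)\,\frac{f_c - \mu(f_c)}{\sigma(f_c)} + \mu(f_s)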

It is important to note that the AdaIN layer proposed by the authors uses no parameters other than the mean and variance. The layer also has no trainable parameters. This is why we use a Python function instead of a Keras layer. The function takes the style and content feature maps, computes their mean and standard deviation, and returns the adaptive instance normalized feature map.

def get_mean_std(x, epsilon=1e-5):
    axes = [1, 2]

    # Compute the mean and standard deviation of a tensor.
    mean, variance = tf.nn.moments(x, axes=axes, keepdims=True)
    standard_deviation = tf.sqrt(variance + epsilon)
    return mean, standard_deviation


def ada_in(style, content):
    """Computes the AdaIn feature map.

    Args:
        style: The style feature map.
        content: The content feature map.

    Returns:
        The AdaIN feature map.
    """
    content_mean, content_std = get_mean_std(content)
    style_mean, style_std = get_mean_std(style)
    t = style_std * (content - content_mean) / content_std + style_mean
    return t

Decoder

The authors specify that the decoder network must mirror the encoder network. We have symmetrically inverted the encoder to build our decoder. We use UpSampling2D layers to increase the spatial resolution of the feature maps.

Note that the authors warn against using any normalization layer in the decoder network, and indeed go on to show that including batch normalization or instance normalization hurts the performance of the overall network.

This is the only portion of the entire architecture that is trainable.

def get_decoder():
    config = {"kernel_size": 3, "strides": 1, "padding": "same", "activation": "relu"}
    decoder = keras.Sequential(
        [
            layers.InputLayer((None, None, 512)),
            layers.Conv2D(filters=512, **config),
            layers.UpSampling2D(),
            layers.Conv2D(filters=256, **config),
            layers.Conv2D(filters=256, **config),
            layers.Conv2D(filters=256, **config),
            layers.Conv2D(filters=256, **config),
            layers.UpSampling2D(),
            layers.Conv2D(filters=128, **config),
            layers.Conv2D(filters=128, **config),
            layers.UpSampling2D(),
            layers.Conv2D(filters=64, **config),
            layers.Conv2D(
                filters=3,
                kernel_size=3,
                strides=1,
                padding="same",
                activation="sigmoid",
            ),
        ]
    )
    return decoder

Loss functions

Here we build the loss functions for the neural style transfer model. The authors propose to use a pretrained VGG-19 to compute the loss function of the network. It is important to keep in mind that this is used for training only the decoder network. The total loss (Lt) is a weighted combination of content loss (Lc) and style loss (Ls). The lambda term is used to vary the amount of style transferred.

The total loss
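
Spelled out (a LaTeX rendering of the equation shown in the image above, as given in the paper):

L_t = L_c + \lambda \, L_s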

Content loss

This is the Euclidean distance between the content image features and the features of the neural style transferred image.

The content loss

Here the authors propose to use the output of the AdaIN layer t as the content target, rather than using the features of the original image as the target. This is done to speed up convergence.
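
As an equation (a LaTeX rendering of the image above, following the paper, where g is the decoder and f is the VGG-19 encoder used for the loss):

L_c = \lVert f(g(t)) - t \rVert_2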

Style loss

Rather than using the more commonly used Gram matrix, the authors propose to compute the difference between the statistical features (mean and variance), which makes it conceptually cleaner. This can be easily visualized via the following equation:

The style loss

where theta denotes the layers in VGG-19 used to compute the loss. In this case these are the following layers (the loss is written out in full after the list):

  • block1_conv1
  • block2_conv1
  • block3_conv1
  • block4_conv1
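
Written out (a LaTeX rendering of the image above, following the paper; phi_i ranges over the four VGG-19 layers listed, s is the style image, and g is the decoder):

L_s = \sum_{i=1}^{L} \lVert \mu(\phi_i(g(t))) - \mu(\phi_i(s)) \rVert_2 + \lVert \sigma(\phi_i(g(t))) - \sigma(\phi_i(s)) \rVert_2
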
def get_loss_net():
    vgg19 = keras.applications.VGG19(
        include_top=False, weights="imagenet", input_shape=(*IMAGE_SIZE, 3)
    )
    vgg19.trainable = False
    layer_names = ["block1_conv1", "block2_conv1", "block3_conv1", "block4_conv1"]
    outputs = [vgg19.get_layer(name).output for name in layer_names]
    mini_vgg19 = keras.Model(vgg19.input, outputs)

    inputs = layers.Input([*IMAGE_SIZE, 3])
    mini_vgg19_out = mini_vgg19(inputs)
    return keras.Model(inputs, mini_vgg19_out, name="loss_net")

Neural Style Transfer

This is the trainer module. We wrap the encoder and decoder inside a tf.keras.Model subclass. This allows us to customize what happens in the model.fit() loop.

class NeuralStyleTransfer(tf.keras.Model):
    def __init__(self, encoder, decoder, loss_net, style_weight, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.loss_net = loss_net
        self.style_weight = style_weight

    def compile(self, optimizer, loss_fn):
        super().compile()
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.style_loss_tracker = keras.metrics.Mean(name="style_loss")
        self.content_loss_tracker = keras.metrics.Mean(name="content_loss")
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")

    def train_step(self, inputs):
        style, content = inputs

        # Initialize the content and style loss.
        loss_content = 0.0
        loss_style = 0.0

        with tf.GradientTape() as tape:
            # Encode the style and content image.
            style_encoded = self.encoder(style)
            content_encoded = self.encoder(content)

            # Compute the AdaIN target feature maps.
            t = ada_in(style=style_encoded, content=content_encoded)

            # Generate the neural style transferred image.
            reconstructed_image = self.decoder(t)

            # Compute the losses.
            reconstructed_vgg_features = self.loss_net(reconstructed_image)
            style_vgg_features = self.loss_net(style)
            loss_content = self.loss_fn(t, reconstructed_vgg_features[-1])
            for inp, out in zip(style_vgg_features, reconstructed_vgg_features):
                mean_inp, std_inp = get_mean_std(inp)
                mean_out, std_out = get_mean_std(out)
                loss_style += self.loss_fn(mean_inp, mean_out) + self.loss_fn(
                    std_inp, std_out
                )
            loss_style = self.style_weight * loss_style
            total_loss = loss_content + loss_style

        # Compute gradients and optimize the decoder.
        trainable_vars = self.decoder.trainable_variables
        gradients = tape.gradient(total_loss, trainable_vars)
        self.optimizer.apply_gradients(zip(gradients, trainable_vars))

        # Update the trackers.
        self.style_loss_tracker.update_state(loss_style)
        self.content_loss_tracker.update_state(loss_content)
        self.total_loss_tracker.update_state(total_loss)
        return {
            "style_loss": self.style_loss_tracker.result(),
            "content_loss": self.content_loss_tracker.result(),
            "total_loss": self.total_loss_tracker.result(),
        }

    def test_step(self, inputs):
        style, content = inputs

        # Initialize the content and style loss.
        loss_content = 0.0
        loss_style = 0.0

        # Encode the style and content image.
        style_encoded = self.encoder(style)
        content_encoded = self.encoder(content)

        # Compute the AdaIN target feature maps.
        t = ada_in(style=style_encoded, content=content_encoded)

        # Generate the neural style transferred image.
        reconstructed_image = self.decoder(t)

        # Compute the losses.
        recons_vgg_features = self.loss_net(reconstructed_image)
        style_vgg_features = self.loss_net(style)
        loss_content = self.loss_fn(t, recons_vgg_features[-1])
        for inp, out in zip(style_vgg_features, recons_vgg_features):
            mean_inp, std_inp = get_mean_std(inp)
            mean_out, std_out = get_mean_std(out)
            loss_style += self.loss_fn(mean_inp, mean_out) + self.loss_fn(
                std_inp, std_out
            )
        loss_style = self.style_weight * loss_style
        total_loss = loss_content + loss_style

        # Update the trackers.
        self.style_loss_tracker.update_state(loss_style)
        self.content_loss_tracker.update_state(loss_content)
        self.total_loss_tracker.update_state(total_loss)
        return {
            "style_loss": self.style_loss_tracker.result(),
            "content_loss": self.content_loss_tracker.result(),
            "total_loss": self.total_loss_tracker.result(),
        }

    @property
    def metrics(self):
        return [
            self.style_loss_tracker,
            self.content_loss_tracker,
            self.total_loss_tracker,
        ]

Train Monitor callback

This callback is used to visualize the style transfer output of the model at the end of each epoch. The objective of style transfer cannot be quantified properly and is to be subjectively evaluated by an audience. For this reason, visualization is a key aspect of evaluating the model.

test_style, test_content = next(iter(test_ds))


class TrainMonitor(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Encode the style and content image.
        test_style_encoded = self.model.encoder(test_style)
        test_content_encoded = self.model.encoder(test_content)

        # Compute the AdaIN features.
        test_t = ada_in(style=test_style_encoded, content=test_content_encoded)
        test_reconstructed_image = self.model.decoder(test_t)

        # Plot the Style, Content and the NST image.
        fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 5))
        ax[0].imshow(tf.keras.utils.array_to_img(test_style[0]))
        ax[0].set_title(f"Style: {epoch:03d}")

        ax[1].imshow(tf.keras.utils.array_to_img(test_content[0]))
        ax[1].set_title(f"Content: {epoch:03d}")

        ax[2].imshow(
            tf.keras.utils.array_to_img(test_reconstructed_image[0])
        )
        ax[2].set_title(f"NST: {epoch:03d}")

        plt.show()
        plt.close()

Train the model

In this section, we define the optimizer, the loss function, and the trainer module. We compile the trainer module with the optimizer and the loss function and then train it.

Note: We train the model for a single epoch due to time constraints, but it needs to be trained for at least 30 epochs to see good results.

optimizer = keras.optimizers.Adam(learning_rate=1e-5)
loss_fn = keras.losses.MeanSquaredError()

encoder = get_encoder()
loss_net = get_loss_net()
decoder = get_decoder()

model = NeuralStyleTransfer(
    encoder=encoder, decoder=decoder, loss_net=loss_net, style_weight=4.0
)

model.compile(optimizer=optimizer, loss_fn=loss_fn)

history = model.fit(
    train_ds,
    epochs=EPOCHS,
    steps_per_epoch=50,
    validation_data=val_ds,
    validation_steps=50,
    callbacks=[TrainMonitor()],
)
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80142336/80134624 [==============================] - 1s 0us/step
80150528/80134624 [==============================] - 1s 0us/step
50/50 [==============================] - ETA: 0s - style_loss: 213.1439 - content_loss: 141.1564 - total_loss: 354.3002


50/50 [==============================] - 124s 2s/step - style_loss: 213.1439 - content_loss: 141.1564 - total_loss: 354.3002 - val_style_loss: 167.0819 - val_content_loss: 129.0497 - val_total_loss: 296.1316

Inference

With the model trained, we now need to run inference with it. We will pass arbitrary content and style images from the test dataset and take a look at the output images.

NOTE: To try out the model with your own images, you can use this Hugging Face demo.

for style, content in test_ds.take(1):
    style_encoded = model.encoder(style)
    content_encoded = model.encoder(content)
    t = ada_in(style=style_encoded, content=content_encoded)
    reconstructed_image = model.decoder(t)
    fig, axes = plt.subplots(nrows=10, ncols=3, figsize=(10, 30))
    [ax.axis("off") for ax in np.ravel(axes)]

    for axis, style_image, content_image, reconstructed_image in zip(
        axes, style[0:10], content[0:10], reconstructed_image[0:10]
    ):
        (ax_style, ax_content, ax_reconstructed) = axis
        ax_style.imshow(style_image)
        ax_style.set_title("Style Image")
        ax_content.imshow(content_image)
        ax_content.set_title("Content Image")
        ax_reconstructed.imshow(reconstructed_image)
        ax_reconstructed.set_title("NST Image")



Conclusion

Adaptive Instance Normalization allows arbitrary style transfer in real time. It is also important to note that the novel proposition of the authors is to achieve this solely by aligning the statistical features (the mean and standard deviation) of the style and content images.

Note: AdaIN also serves as the base for Style-GANs.


References


Acknowledgement

We thank Luke Wood for his detailed review.