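Neural Style Transfer with AdaIN

Authors: Aritra Roy Gosthipaty, Ritwik Raha
Date created: 2021/11/08
Last modified: 2021/11/08

ⓘ This example uses Keras 2

Description: Neural Style Transfer with Adaptive Instance Normalization.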

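Introduction

Neural Style Transfer is the process of transferring the style of one image onto the content of another. It was first introduced in the seminal paper "A Neural Algorithm of Artistic Style" by Gatys et al. A major limitation of the technique proposed in that work is its runtime, because the algorithm uses a slow iterative optimization process.

Follow-up papers that introduced Batch Normalization, Instance Normalization and Conditional Instance Normalization allowed style transfer to be performed in new ways that no longer require a slow iterative process.

Following these papers, Xun Huang and Serge Belongie proposed Adaptive Instance Normalization (AdaIN), which allows arbitrary style transfer in real time.

In this example we implement Adaptive Instance Normalization for Neural Style Transfer. The figure below shows the output of our AdaIN model trained for only 30 epochs.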

Style transfer sample gallery

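You can also try out the model with your own images using this Hugging Face demo.

Setup

We begin by importing the necessary packages. We also set the seed for reproducibility. The global variables are hyperparameters that we can change as we like.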

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds
from tensorflow.keras import layers

# Defining the global variables.
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 64
# Training for a single epoch because of time constraints.
# Please use at least 30 epochs to see good results.
EPOCHS = 1
AUTOTUNE = tf.data.AUTOTUNE

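Style transfer sample gallery

For Neural Style Transfer we need style images and content images. In this example we use Best Artworks of All Time as the style dataset and Pascal VOC as the content dataset.

This deviates from the authors' original implementation, which uses WIKI-Art as the style dataset and MSCOCO as the content dataset. We do this to create a minimal yet reproducible example.

Downloading the dataset from Kaggle

The Best Artworks of All Time dataset is hosted on Kaggle and can be downloaded in Colab with the following steps:

Follow the instructions here to obtain your Kaggle API keys in case you don't have them.
Use the following command to upload the Kaggle API keys.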

from google.colab import files
files.upload()

Use the following commands to move the API keys to the proper directory and download the dataset.

$ mkdir ~/.kaggle
$ cp kaggle.json ~/.kaggle/
$ chmod 600 ~/.kaggle/kaggle.json
$ kaggle datasets download ikarus777/best-artworks-of-all-time
$ unzip -qq best-artworks-of-all-time.zip
$ rm -rf images
$ mv resized artwork
$ rm best-artworks-of-all-time.zip artists.csv

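`tf.data` pipeline

In this section, we build the tf.data pipeline for the project. For the style dataset, we decode, convert and resize the images from the folder. For the content images, we already have a tf.data dataset because we load them with the tfds module.

After we have our style and content data pipelines ready, we zip the two together to obtain the data pipeline that our model will consume.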
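As a rough illustration of the splitting and zipping performed by the pipeline code in this section, here is a minimal plain-Python sketch. It is not the tf.data code used in this example: `itertools.cycle` stands in for `.repeat()`, shuffling and prefetching are left out, and the helper names (`split_80_10_10`, `paired_batches`) and the file names are hypothetical.

```python
from itertools import cycle, islice


def split_80_10_10(items):
    """Splits a list into 80% train, 10% validation and 10% test parts."""
    n = len(items)
    train_end, val_end = int(0.8 * n), int(0.9 * n)
    return items[:train_end], items[train_end:val_end], items[val_end:]


def paired_batches(style_items, content_items, batch_size, num_batches):
    """Yields (styles, contents) batches from two endlessly repeated lists."""
    # cycle() plays the role of .repeat(); zip() pairs the two streams.
    pairs = zip(cycle(style_items), cycle(content_items))
    for _ in range(num_batches):
        batch = list(islice(pairs, batch_size))
        styles = [style for style, _ in batch]
        contents = [content for _, content in batch]
        yield styles, contents


# Hypothetical placeholder names standing in for real images.
style_paths = [f"style_{i}.jpg" for i in range(10)]
content_items = [f"content_{i}" for i in range(3)]

train_style, val_style, test_style = split_80_10_10(style_paths)
batches = list(paired_batches(train_style, content_items, batch_size=4, num_batches=2))
```

Because both streams repeat, the shorter one simply wraps around, so every batch is full regardless of how many style and content images there are.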

def decode_and_resize(image_path):
    """Decodes and resizes an image from the image file path.

    Args:
        image_path: The image file path.

    Returns:
        A resized image.
    """
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, dtype="float32")
    image = tf.image.resize(image, IMAGE_SIZE)
    return image


def extract_image_from_voc(element):
    """Extracts image from the PascalVOC dataset.

    Args:
        element: A dictionary of data.

    Returns:
        A resized image.
    """
    image = element["image"]
    image = tf.image.convert_image_dtype(image, dtype="float32")
    image = tf.image.resize(image, IMAGE_SIZE)
    return image


# Get the image file paths for the style images.
style_images = os.listdir("/content/artwork/resized")
style_images = [os.path.join("/content/artwork/resized", path) for path in style_images]

# Split the style images into train, val and test sets.
total_style_images = len(style_images)
train_style = style_images[: int(0.8 * total_style_images)]
val_style = style_images[int(0.8 * total_style_images) : int(0.9 * total_style_images)]
test_style = style_images[int(0.9 * total_style_images) :]

# Build the style and content tf.data datasets.
train_style_ds = (
    tf.data.Dataset.from_tensor_slices(train_style)
    .map(decode_and_resize, num_parallel_calls=AUTOTUNE)
    .repeat()
)
train_content_ds = tfds.load("voc", split="train").map(extract_image_from_voc).repeat()

val_style_ds = (
    tf.data.Dataset.from_tensor_slices(val_style)
    .map(decode_and_resize, num_parallel_calls=AUTOTUNE)
    .repeat()
)
val_content_ds = (
    tfds.load("voc", split="validation").map(extract_image_from_voc).repeat()
)

test_style_ds = (
    tf.data.Dataset.from_tensor_slices(test_style)
    .map(decode_and_resize, num_parallel_calls=AUTOTUNE)
    .repeat()
)
test_content_ds = (
    tfds.load("voc", split="test")
    .map(extract_image_from_voc, num_parallel_calls=AUTOTUNE)
    .repeat()
)

# Zipping the style and content datasets.
train_ds = (
    tf.data.Dataset.zip((train_style_ds, train_content_ds))
    .shuffle(BATCH_SIZE * 2)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

val_ds = (
    tf.data.Dataset.zip((val_style_ds, val_content_ds))
    .shuffle(BATCH_SIZE * 2)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

test_ds = (
    tf.data.Dataset.zip((test_style_ds, test_content_ds))
    .shuffle(BATCH_SIZE * 2)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

Downloading and preparing dataset voc/2007/4.0.0 (download: 868.85 MiB, generated: Unknown size, total: 868.85 MiB) to /root/tensorflow_datasets/voc/2007/4.0.0...

Shuffling and writing examples to /root/tensorflow_datasets/voc/2007/4.0.0.incompleteP16YU5/voc-test.tfrecord

  0%|          | 0/4952 [00:00<?, ? examples/s]

Shuffling and writing examples to /root/tensorflow_datasets/voc/2007/4.0.0.incompleteP16YU5/voc-train.tfrecord

  0%|          | 0/2501 [00:00<?, ? examples/s]

Shuffling and writing examples to /root/tensorflow_datasets/voc/2007/4.0.0.incompleteP16YU5/voc-validation.tfrecord

  0%|          | 0/2510 [00:00<?, ? examples/s]

Dataset voc downloaded and prepared to /root/tensorflow_datasets/voc/2007/4.0.0. Subsequent calls will reuse this data.

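Visualizing the data

It is always better to visualize the data before training. To ensure the correctness of our preprocessing pipeline, we visualize 10 samples from our dataset.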

style, content = next(iter(train_ds))
fig, axes = plt.subplots(nrows=10, ncols=2, figsize=(5, 30))
for ax in np.ravel(axes):
    ax.axis("off")

for (axis, style_image, content_image) in zip(axes, style[0:10], content[0:10]):
    (ax_style, ax_content) = axis
    ax_style.imshow(style_image)
    ax_style.set_title("Style Image")

    ax_content.imshow(content_image)
    ax_content.set_title("Content Image")


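Architecture

The style transfer network takes a content image and a style image as inputs and outputs the style-transferred image. The authors of AdaIN propose a simple encoder-decoder structure for achieving this.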

AdaIN architecture

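The content image (C) and the style image (S) are both fed to the encoder networks. The outputs of these encoder networks (feature maps) are then fed to the AdaIN layer. The AdaIN layer computes a combined feature map. This feature map is then fed into a randomly initialized decoder network that serves as the generator for the neural style transferred image.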

AdaIn equation

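The style feature map (fs) and the content feature map (fc) are fed to the AdaIN layer. This layer produces the combined feature map t. The function g represents the decoder (generator) network.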
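Written out in LaTeX, the relationship described here can be reconstructed as follows. This is a sketch based only on the definitions in the text; the symbol T for the stylized output image is introduced here for illustration.

```latex
t = \mathrm{AdaIN}(f_c, f_s), \qquad T = g(t)
```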

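Encoder

The encoder is a part of the VGG19 model pretrained on ImageNet. We truncate the model at the block4_conv1 layer and use that layer's output, as suggested by the authors in their paper.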

def get_encoder():
    vgg19 = keras.applications.VGG19(
        include_top=False,
        weights="imagenet",
        input_shape=(*IMAGE_SIZE, 3),
    )
    vgg19.trainable = False
    mini_vgg19 = keras.Model(vgg19.input, vgg19.get_layer("block4_conv1").output)

    inputs = layers.Input([*IMAGE_SIZE, 3])
    mini_vgg19_out = mini_vgg19(inputs)
    return keras.Model(inputs, mini_vgg19_out, name="mini_vgg19")

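Adaptive Instance Normalization

The AdaIN layer takes in the features of the content and style image. The layer can be defined via the following equation: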

AdaIn formula

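where sigma is the standard deviation and mu is the mean of the concerned variable. In the above equation, the mean and variance of the content feature map fc are aligned with the mean and variance of the style feature map fs.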
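For reference, the equation can be written in LaTeX as below. This is a reconstruction from the description here and from the `ada_in` function in this section, with the statistics taken per channel over the spatial dimensions.

```latex
\mathrm{AdaIN}(f_c, f_s) = \sigma(f_s)\,\frac{f_c - \mu(f_c)}{\sigma(f_c)} + \mu(f_s)
```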

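It is important to note that the AdaIN layer proposed by the authors uses no parameters other than the mean and variance, and it has no trainable parameters. This is why we use a Python function instead of a Keras layer. The function takes the style and content feature maps, computes the mean and standard deviation of each, and returns the adaptive instance normalized feature map.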

def get_mean_std(x, epsilon=1e-5):
    axes = [1, 2]

    # Compute the mean and standard deviation of a tensor.
    mean, variance = tf.nn.moments(x, axes=axes, keepdims=True)
    standard_deviation = tf.sqrt(variance + epsilon)
    return mean, standard_deviation


def ada_in(style, content):
    """Computes the AdaIn feature map.

    Args:
        style: The style feature map.
        content: The content feature map.

    Returns:
        The AdaIN feature map.
    """
    content_mean, content_std = get_mean_std(content)
    style_mean, style_std = get_mean_std(style)
    t = style_std * (content - content_mean) / content_std + style_mean
    return t
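To see what this alignment does in practice, the following NumPy sketch re-implements the same arithmetic and checks that the output carries the per-channel statistics of the style features. It is an illustration only, not part of the training code; the names `get_mean_std_np` and `ada_in_np` and the random stand-in feature maps are assumptions made for this sketch.

```python
import numpy as np


def get_mean_std_np(x, epsilon=1e-5):
    """Per-sample, per-channel mean and std over the spatial axes (NHWC)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(x.var(axis=(1, 2), keepdims=True) + epsilon)
    return mean, std


def ada_in_np(style, content):
    """NumPy counterpart of ada_in: normalize content, rescale with style stats."""
    content_mean, content_std = get_mean_std_np(content)
    style_mean, style_std = get_mean_std_np(style)
    return style_std * (content - content_mean) / content_std + style_mean


# Random stand-ins for encoder feature maps (batch, height, width, channels).
rng = np.random.default_rng(0)
content = rng.normal(loc=0.0, scale=1.0, size=(2, 8, 8, 4))
style = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 8, 4))

t = ada_in_np(style, content)
t_mean, t_std = get_mean_std_np(t)
style_mean, style_std = get_mean_std_np(style)
```

The spatial layout of `t` still comes from the content features; only the channel-wise mean and standard deviation are replaced by those of the style features (up to the small epsilon term).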

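Decoder

The authors specify that the decoder network must mirror the encoder network. We have symmetrically inverted the encoder to build our decoder, using UpSampling2D layers to increase the spatial resolution of the feature maps.

Note that the authors warn against using any normalization layer in the decoder network, and they go on to show that including batch normalization or instance normalization hurts the performance of the overall network.

This is the only portion of the entire architecture that is trainable.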

def get_decoder():
    config = {"kernel_size": 3, "strides": 1, "padding": "same", "activation": "relu"}
    decoder = keras.Sequential(
        [
            layers.InputLayer((None, None, 512)),
            layers.Conv2D(filters=512, **config),
            layers.UpSampling2D(),
            layers.Conv2D(filters=256, **config),
            layers.Conv2D(filters=256, **config),
            layers.Conv2D(filters=256, **config),
            layers.Conv2D(filters=256, **config),
            layers.UpSampling2D(),
            layers.Conv2D(filters=128, **config),
            layers.Conv2D(filters=128, **config),
            layers.UpSampling2D(),
            layers.Conv2D(filters=64, **config),
            layers.Conv2D(
                filters=3,
                kernel_size=3,
                strides=1,
                padding="same",
                activation="sigmoid",
            ),
        ]
    )
    return decoder
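The mirroring can be checked with simple shape arithmetic. The sketch below assumes the standard VGG19 layout, in which three 2x2 max-pooling layers precede block4_conv1, and the default UpSampling2D factor of 2. The helper names and the `DECODER_LAYERS` list are hypothetical and only restate the layer order of `get_decoder`.

```python
# Each entry is ("conv", filters) or ("upsample", None), in decoder order.
DECODER_LAYERS = [
    ("conv", 512),
    ("upsample", None),
    ("conv", 256),
    ("conv", 256),
    ("conv", 256),
    ("conv", 256),
    ("upsample", None),
    ("conv", 128),
    ("conv", 128),
    ("upsample", None),
    ("conv", 64),
    ("conv", 3),
]


def encoder_output_shape(image_size, num_pools=3, channels=512):
    """Shape of the encoder output, assuming each pooling halves height and width."""
    height, width = image_size
    factor = 2**num_pools
    return (height // factor, width // factor, channels)


def decoder_output_shape(input_shape):
    """Tracks (height, width, channels) through the decoder layers."""
    height, width, channels = input_shape
    for kind, filters in DECODER_LAYERS:
        if kind == "upsample":
            # UpSampling2D with its default size doubles both spatial dimensions.
            height, width = 2 * height, 2 * width
        else:
            # "same" padding with stride 1 keeps the spatial size; only channels change.
            channels = filters
    return (height, width, channels)


encoded = encoder_output_shape((224, 224))
decoded = decoder_output_shape(encoded)
```

Under these assumptions the three upsampling steps undo the three pooling steps, so a 224x224 input comes back out as a 224x224 image with 3 channels.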

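Loss functions

Here we build the loss functions for the neural style transfer model. The authors propose to use a pretrained VGG-19 to compute the loss function of the network. It is important to keep in mind that this is used only to train the decoder network. The total loss (Lt) is a weighted combination of the content loss (Lc) and the style loss (Ls). The lambda term is used to vary the amount of style transferred.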

The total loss
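In LaTeX, the weighted combination of the content loss and the style loss, with lambda as the style weight, can be reconstructed as:

```latex
L_t = L_c + \lambda L_s
```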

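Content Loss

This is the Euclidean distance between the content image features and the features of the neural style transferred image.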

The content loss

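Here the authors propose to use the output t of the AdaIN layer as the content target rather than the features of the original image. This is done to speed up convergence.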
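With t as the content target, the content loss can be written in LaTeX as below. This is a reconstruction from the description; the symbol E for the encoder is introduced here for illustration, and g is the decoder.

```latex
L_c = \lVert E(g(t)) - t \rVert_2
```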

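Style Loss

Rather than using the more commonly used Gram Matrix, the authors propose to compute the difference between the statistical features (mean and variance), which is conceptually cleaner. The following equation expresses this: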

The style loss

where theta denotes the layers in VGG-19 used to compute the loss. In this case these are:

block1_conv1
block2_conv1
block3_conv1
block4_conv1
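A LaTeX reconstruction of the style loss, based on the description in this section, is given below. Here theta_i is the output of the i-th loss layer, N is the number of such layers, s is the style image, g(t) is the decoder output, and mu and sigma are the per-channel mean and standard deviation; the indexing notation is an assumption made for this sketch.

```latex
L_s = \sum_{i=1}^{N} \lVert \mu(\theta_i(g(t))) - \mu(\theta_i(s)) \rVert_2
    + \sum_{i=1}^{N} \lVert \sigma(\theta_i(g(t))) - \sigma(\theta_i(s)) \rVert_2
```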

def get_loss_net():
    vgg19 = keras.applications.VGG19(
        include_top=False, weights="imagenet", input_shape=(*IMAGE_SIZE, 3)
    )
    vgg19.trainable = False
    layer_names = ["block1_conv1", "block2_conv1", "block3_conv1", "block4_conv1"]
    outputs = [vgg19.get_layer(name).output for name in layer_names]
    mini_vgg19 = keras.Model(vgg19.input, outputs)

    inputs = layers.Input([*IMAGE_SIZE, 3])
    mini_vgg19_out = mini_vgg19(inputs)
    return keras.Model(inputs, mini_vgg19_out, name="loss_net")
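To make the loss definitions concrete, here is a minimal NumPy sketch of the content, style and total losses computed on stand-in feature maps. It assumes the Euclidean-distance form described in this section; an actual training loop may use a different distance, such as mean squared error. The function names, the `style_weight` argument (playing the role of lambda) and the random features are hypothetical.

```python
import numpy as np


def mean_std(x, epsilon=1e-5):
    """Per-channel mean and std over the spatial axes of an NHWC array."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(x.var(axis=(1, 2), keepdims=True) + epsilon)
    return mean, std


def euclidean(a, b):
    """Euclidean distance between two arrays of the same shape."""
    return float(np.linalg.norm((a - b).ravel()))


def content_loss(t, output_features):
    """Distance between the AdaIN target t and the encoded output image."""
    return euclidean(t, output_features)


def style_loss(style_features, output_features):
    """Sum over layers of the distances between feature means and stds."""
    loss = 0.0
    for style_map, output_map in zip(style_features, output_features):
        style_mean, style_std = mean_std(style_map)
        output_mean, output_std = mean_std(output_map)
        loss += euclidean(style_mean, output_mean) + euclidean(style_std, output_std)
    return loss


def total_loss(l_content, l_style, style_weight):
    """Weighted combination of the two losses."""
    return l_content + style_weight * l_style


# Stand-ins for the outputs of two loss-network layers.
rng = np.random.default_rng(0)
style_feats = [rng.normal(size=(1, 8, 8, 4)), rng.normal(size=(1, 4, 4, 8))]
shifted_feats = [feat + 1.0 for feat in style_feats]

same_style = style_loss(style_feats, style_feats)
shifted_style = style_loss(style_feats, shifted_feats)
combined = total_loss(1.0, 2.0, style_weight=4.0)
```

Shifting every feature by a constant changes only the channel means, so `shifted_style` reflects the mean term alone while the standard-deviation term stays at zero.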