
Semantic segmentation with SegFormer and Hugging Face Transformers

Author: Sayak Paul
Date created: 2023/01/25
Last modified: 2023/01/29
Description: Fine-tuning a SegFormer model variant for semantic segmentation.

ⓘ This example uses Keras 2


Introduction

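In this example, we show how to fine-tune a SegFormer model variant to perform semantic segmentation on a custom dataset. Semantic segmentation is the task of assigning a category to each pixel of an image. SegFormer was proposed in "SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers". SegFormer uses a hierarchical Transformer architecture (called "Mix Transformer") as its encoder and a lightweight decoder for segmentation. As a result, it yields state-of-the-art performance on semantic segmentation while being more efficient than existing models. For more details, check out the original paper.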

[Figure: SegFormer architecture]

We use Hugging Face Transformers to load a pretrained SegFormer checkpoint and fine-tune it on a custom dataset.

Note: this example reuses code from other sources.

To run this example, we need to install the transformers library:

!pip install transformers -q

Load the data

In this example, we use the Oxford-IIIT Pets dataset. We load it with tensorflow_datasets.

import tensorflow_datasets as tfds

dataset, info = tfds.load("oxford_iiit_pet:3.*.*", with_info=True)
/opt/conda/lib/python3.7/site-packages/tensorflow_io/python/ops/__init__.py:98: UserWarning: unable to load libtensorflow_io_plugins.so: unable to open file: libtensorflow_io_plugins.so, from paths: ['/opt/conda/lib/python3.7/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so']
caused by: ['/opt/conda/lib/python3.7/site-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl5mutexC1Ev']
  warnings.warn(f"unable to load libtensorflow_io_plugins.so: {e}")
/opt/conda/lib/python3.7/site-packages/tensorflow_io/python/ops/__init__.py:104: UserWarning: file system plugins are not loaded: unable to open file: libtensorflow_io.so, from paths: ['/opt/conda/lib/python3.7/site-packages/tensorflow_io/python/ops/libtensorflow_io.so']
caused by: ['/opt/conda/lib/python3.7/site-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZNK10tensorflow4data11DatasetBase8FinalizeEPNS_15OpKernelContextESt8functionIFN3tsl8StatusOrISt10unique_ptrIS1_NS5_4core15RefCountDeleterEEEEvEE']
  warnings.warn(f"file system plugins are not loaded: {e}")

Prepare the datasets

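To prepare the datasets for training and evaluation, we:

  • Normalize the images with the mean and standard deviation used during SegFormer pre-training.
  • Subtract 1 from the segmentation masks so that the pixel values start from 0.
  • Resize the images.
  • Transpose the images so that they are in "channels_first" format. This makes them compatible with the SegFormer model from Hugging Face Transformers.
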
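import tensorflow as tf
from tensorflow.keras import backend

image_size = 512
# Per-channel mean and standard deviation used for normalization.
mean = tf.constant([0.485, 0.456, 0.406])
std = tf.constant([0.229, 0.224, 0.225])


def normalize(input_image, input_mask):
    # Note: convert_image_dtype only rescales integer inputs. The image that
    # arrives here was already cast to float32 by tf.image.resize, so its
    # values keep their original [0, 255] range before the mean/std step.
    input_image = tf.image.convert_image_dtype(input_image, tf.float32)
    input_image = (input_image - mean) / tf.maximum(std, backend.epsilon())
    # Mask labels are 1, 2, 3 in the dataset; shift them to 0, 1, 2.
    input_mask -= 1
    return input_image, input_mask


def load_image(datapoint):
    input_image = tf.image.resize(datapoint["image"], (image_size, image_size))
    # Note: bilinear interpolation can produce fractional label values along
    # class boundaries; method="nearest" is a common choice for masks.
    input_mask = tf.image.resize(
        datapoint["segmentation_mask"],
        (image_size, image_size),
        method="bilinear",
    )

    input_image, input_mask = normalize(input_image, input_mask)
    # (H, W, C) -> (C, H, W), i.e. "channels_first".
    input_image = tf.transpose(input_image, (2, 0, 1))
    return {"pixel_values": input_image, "labels": tf.squeeze(input_mask)}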
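One detail of the resizing step is worth a closer look: segmentation masks hold class IDs, and linear interpolation between two IDs can yield a value that belongs to neither region. The NumPy sketch below is only an illustration of that effect on a single row of labels. It does not call tf.image.resize, and the helper names are hypothetical.

```python
import numpy as np

# A 1-D strip of mask labels: class 1 on the left, class 3 on the right.
labels = np.array([1.0, 1.0, 3.0, 3.0])


def upsample_linear(values, factor):
    """Linearly interpolate `values` onto a grid `factor` times denser."""
    old_x = np.arange(len(values))
    new_x = np.linspace(0, len(values) - 1, len(values) * factor)
    return np.interp(new_x, old_x, values)


def upsample_nearest(values, factor):
    """Repeat each label `factor` times, so no new values are created."""
    return np.repeat(values, factor)


linear = upsample_linear(labels, 2)
nearest = upsample_nearest(labels, 2)

# Linear interpolation invents in-between values at the 1 -> 3 boundary.
print(np.unique(linear))
# Nearest-neighbour upsampling only ever returns the original class IDs.
print(np.unique(nearest))
```

If exact class IDs matter for your dataset, inspecting the unique values of a resized mask in this way is a cheap check.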

We now use the utilities above to prepare tf.data.Dataset objects, including prefetch() for performance. Change batch_size to match the memory available on the GPU you use for training.

auto = tf.data.AUTOTUNE
batch_size = 4

train_ds = (
    dataset["train"]
    .cache()
    .shuffle(batch_size * 10)
    .map(load_image, num_parallel_calls=auto)
    .batch(batch_size)
    .prefetch(auto)
)
test_ds = (
    dataset["test"]
    .map(load_image, num_parallel_calls=auto)
    .batch(batch_size)
    .prefetch(auto)
)
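To see how batch_size interacts with the length of an epoch, here is a small arithmetic sketch. The example count of 3,680 is an assumption inferred from the training log further down (920 steps at batch_size = 4), and steps_per_epoch is a hypothetical helper, not a Keras function.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Batches per pass when the final partial batch is kept (batch() default)."""
    return math.ceil(num_examples / batch_size)


# Assumption: roughly 3,680 training examples (920 steps x batch size 4).
NUM_TRAIN_EXAMPLES = 3680

for candidate in (2, 4, 8, 16):
    print(candidate, steps_per_epoch(NUM_TRAIN_EXAMPLES, candidate))
```

Larger batches shorten the epoch in steps but need more GPU memory per step, which is the trade-off behind the advice above.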

We can check the shapes of the input images and their segmentation maps:

print(train_ds.element_spec)
{'pixel_values': TensorSpec(shape=(None, 3, 512, 512), dtype=tf.float32, name=None), 'labels': TensorSpec(shape=(None, 512, 512), dtype=tf.float32, name=None)}
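The (None, 3, 512, 512) shape reflects the "channels_first" layout. As a minimal illustration of the axis permutation involved, the following NumPy sketch uses a tiny made-up array in place of a real image. np.transpose is used here only as a stand-in for tf.transpose.

```python
import numpy as np

# Hypothetical tiny "image": height 2, width 3, 3 colour channels (channels-last).
image_hwc = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# Same axis permutation as tf.transpose(input_image, (2, 0, 1)) in load_image.
image_chw = np.transpose(image_hwc, (2, 0, 1))

# Inverse permutation, as applied before plotting a sample.
restored = np.transpose(image_chw, (1, 2, 0))

print(image_hwc.shape, image_chw.shape, restored.shape)
```

The pixel at row r, column c, channel k of the original sits at index [k, r, c] after the first transpose.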

Visualize the dataset

import matplotlib.pyplot as plt


def display(display_list):
    plt.figure(figsize=(15, 15))

    title = ["Input Image", "True Mask", "Predicted Mask"]

    for i in range(len(display_list)):
        plt.subplot(1, len(display_list), i + 1)
        plt.title(title[i])
        plt.imshow(tf.keras.utils.array_to_img(display_list[i]))
        plt.axis("off")
    plt.show()


for samples in train_ds.take(2):
    sample_image, sample_mask = samples["pixel_values"][0], samples["labels"][0]
    sample_image = tf.transpose(sample_image, (1, 2, 0))
    sample_mask = tf.expand_dims(sample_mask, -1)
    display([sample_image, sample_mask])


Load a pretrained SegFormer checkpoint

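We now load a pretrained SegFormer model variant from Hugging Face Transformers. The SegFormer model comes in several variants, dubbed **MiT-B0** to **MiT-B5**. You can find these checkpoints here. We load the smallest variant, MiT-B0, which offers a good trade-off between inference efficiency and predictive performance.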

from transformers import TFSegformerForSemanticSegmentation

model_checkpoint = "nvidia/mit-b0"
id2label = {0: "outer", 1: "inner", 2: "border"}
label2id = {label: id for id, label in id2label.items()}
num_labels = len(id2label)
model = TFSegformerForSemanticSegmentation.from_pretrained(
    model_checkpoint,
    num_labels=num_labels,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)
WARNING:tensorflow:5 out of the last 5 calls to <function Conv._jit_compiled_convolution_op at 0x7fa8cc1139e0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://tensorflowcn.cn/guide/function#controlling_retracing and https://tensorflowcn.cn/api_docs/python/tf/function for  more details.

WARNING:tensorflow:6 out of the last 6 calls to <function Conv._jit_compiled_convolution_op at 0x7fa8bde37440> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://tensorflowcn.cn/guide/function#controlling_retracing and https://tensorflowcn.cn/api_docs/python/tf/function for  more details.

Some layers from the model checkpoint at nvidia/mit-b0 were not used when initializing TFSegformerForSemanticSegmentation: ['classifier']
- This IS expected if you are initializing TFSegformerForSemanticSegmentation from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSegformerForSemanticSegmentation from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFSegformerForSemanticSegmentation were not initialized from the model checkpoint at nvidia/mit-b0 and are newly initialized: ['decode_head']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

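The warnings tell us that some weights are being discarded and others are newly initialized. Don't panic! This is perfectly normal. Since we are using a custom dataset whose set of semantic class labels differs from that of the pre-training dataset, TFSegformerForSemanticSegmentation is initializing a new decoder head.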
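The id2label and label2id dictionaries passed to from_pretrained are plain Python mappings. As a quick sanity check, the sketch below inverts id2label and verifies that the class IDs are contiguous and start at 0, which is what the "subtract 1 from the mask" step is meant to provide on the data side. check_label_maps is a hypothetical helper, not part of Transformers.

```python
def check_label_maps(id2label):
    """Invert id2label and confirm IDs run 0..num_labels-1 with unique names."""
    label2id = {label: idx for idx, label in id2label.items()}
    num_labels = len(id2label)
    if sorted(id2label) != list(range(num_labels)):
        raise ValueError("class IDs must be contiguous and start at 0")
    if len(label2id) != num_labels:
        raise ValueError("label names must be unique")
    return label2id, num_labels


label2id, num_labels = check_label_maps({0: "outer", 1: "inner", 2: "border"})
print(label2id, num_labels)
```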

We can now initialize an optimizer and compile the model with it.


Compile the model

lr = 0.00006
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
model.compile(optimizer=optimizer)
No loss specified in compile() - the model's internal loss computation will be used as the loss. Don't panic - this is a common way to train TensorFlow models in Transformers! To disable this behaviour please pass a loss argument, or explicitly pass `loss=None` if you do not want your model to compute a loss.

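Notice that we do not pass any loss function when compiling the model. This is because the model's forward pass itself implements the loss computation when labels are provided alongside the input images. After computing the loss, the model returns a structured dataclass object, which is then used to guide the training process.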
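For intuition about what such an internal loss looks like, here is a NumPy sketch of a per-pixel softmax cross-entropy averaged over all pixels. It is an illustration under stated assumptions, not the library's actual implementation: it assumes channels-first logits that already share the spatial size of the labels, and pixel_cross_entropy is a hypothetical name.

```python
import numpy as np


def pixel_cross_entropy(logits, labels):
    """Mean cross-entropy over pixels.

    logits: float array of shape (num_labels, H, W)
    labels: int array of shape (H, W) with values in [0, num_labels)
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Pick, for every pixel, the log-probability of its true class.
    rows, cols = np.indices(labels.shape)
    picked = log_probs[labels, rows, cols]
    return float(-picked.mean())


# With uniform logits every class has probability 1/3, so the loss is log(3).
uniform_logits = np.zeros((3, 2, 2))
labels = np.array([[0, 1], [2, 1]])
print(pixel_cross_entropy(uniform_logits, labels))
```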

With the compiled model, we can call fit() on it to begin the fine-tuning process!


Prediction callback to monitor training progress

Visualizing some sample predictions while the model is being fine-tuned helps us monitor its progress. This callback is inspired by this tutorial.

from IPython.display import clear_output


def create_mask(pred_mask):
    pred_mask = tf.math.argmax(pred_mask, axis=1)
    pred_mask = tf.expand_dims(pred_mask, -1)
    return pred_mask[0]


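def show_predictions(dataset=None, num=1):
    if dataset is not None:
        for sample in dataset.take(num):
            images, masks = sample["pixel_values"], sample["labels"]
            masks = tf.expand_dims(masks, -1)
            pred_masks = model.predict(images).logits
            # Back to channels-last for plotting.
            images = tf.transpose(images, (0, 2, 3, 1))
            display([images[0], masks[0], create_mask(pred_masks)])
    else:
        # Fall back to the single sample kept from the visualization step.
        # sample_image was transposed to channels-last for plotting, so move
        # the channels back to the front before calling the model.
        image = tf.transpose(sample_image, (2, 0, 1))
        pred_mask = model.predict(tf.expand_dims(image, 0)).logits
        display([sample_image, sample_mask, create_mask(pred_mask)])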


class DisplayCallback(tf.keras.callbacks.Callback):
    def __init__(self, dataset, **kwargs):
        super().__init__(**kwargs)
        self.dataset = dataset

    def on_epoch_end(self, epoch, logs=None):
        clear_output(wait=True)
        show_predictions(self.dataset)
        print("\nSample Prediction after epoch {}\n".format(epoch + 1))
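The create_mask helper above turns per-class logits into a class-ID map by taking the argmax over the class axis. The NumPy sketch below shows the same idea on made-up logits, plus an optional nearest-neighbour upsampling step, since SegFormer's decode head typically emits logits at a lower spatial resolution than the input. The function name, the random logits, and the factor of 4 are assumptions for illustration.

```python
import numpy as np


def logits_to_mask(logits, upsample_factor=1):
    """logits: (num_labels, H, W). Returns an (H*f, W*f) array of class IDs."""
    mask = np.argmax(logits, axis=0)
    # Nearest-neighbour upsampling: repeat each pixel along both spatial axes.
    mask = np.repeat(mask, upsample_factor, axis=0)
    mask = np.repeat(mask, upsample_factor, axis=1)
    return mask


rng = np.random.default_rng(0)
fake_logits = rng.normal(size=(3, 4, 4))  # hypothetical logits, not model output
mask = logits_to_mask(fake_logits, upsample_factor=4)
print(mask.shape)
```

Taking the argmax before upsampling keeps the result a valid label map, because repetition never creates new class IDs.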

Train the model

# Increase the number of epochs if the results are not of expected quality.
epochs = 5

history = model.fit(
    train_ds,
    validation_data=test_ds,
    callbacks=[DisplayCallback(test_ds)],
    epochs=epochs,
)
Sample Prediction after epoch 5
920/920 [==============================] - 89s 97ms/step - loss: 0.1742 - val_loss: 0.1927

Inference

We perform inference on a few samples from the test set.

show_predictions(test_ds, 5)

Conclusion

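In this example, we learned how to fine-tune a SegFormer model variant on a custom dataset for semantic segmentation. The example was kept short in the interest of brevity. However, there are a few things you can try next:

  • Incorporate data augmentation to potentially improve the results.
  • Use a larger SegFormer model checkpoint to see how the results are affected.
  • Push the fine-tuned model to the Hugging Face Hub so you can share it easily with the community. You can do so by calling model.push_to_hub("your-username/your-awesome-model"), and then load the model with TFSegformerForSemanticSegmentation.from_pretrained("your-username/your-awesome-model"). Here is an end-to-end example if you are looking for a reference.
  • If you want to push model checkpoints to the Hub while the model is being fine-tuned, you can use the PushToHubCallback Keras callback. Here is an example. Here is an example of a model repository that was created using this callback.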
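As a starting point for the data augmentation suggestion, the sketch below shows the one constraint that matters most for segmentation: any geometric transform has to be applied to the image and its mask together. It is a NumPy illustration on made-up arrays in channels-last layout, and random_horizontal_flip is a hypothetical helper rather than part of this example's pipeline.

```python
import numpy as np


def random_horizontal_flip(image, mask, rng, prob=0.5):
    """image: (H, W, C), mask: (H, W). Flips both, or neither."""
    if rng.random() < prob:
        image = image[:, ::-1, :]
        mask = mask[:, ::-1]
    return image, mask


rng = np.random.default_rng(42)
image = np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3)
mask = np.array([[0, 0, 1, 2], [0, 1, 1, 2]])

# prob=1.0 forces the flip so the effect is visible.
flipped_image, flipped_mask = random_horizontal_flip(image, mask, rng, prob=1.0)
print(flipped_mask)
```

In a tf.data pipeline the same idea would be applied inside the map function, before the image is transposed to channels-first.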