Serving TensorFlow models with TFServing

Author: Dimitre Oliveira
Date created: 2023/01/02
Last modified: 2023/01/02
Description: How to serve TensorFlow models with TensorFlow Serving.

ⓘ This example uses Keras 3

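Introduction

Once you build a machine learning model, the next step is to serve it. You may want to do that by exposing your model as an endpoint service. There are many frameworks that you can use for this, but the TensorFlow ecosystem has its own solution, called TensorFlow Serving.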

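From the TensorFlow Serving GitHub page:

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table. TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.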

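To note a few features:

It can serve multiple models, or multiple versions of the same model, simultaneously
It exposes both gRPC and HTTP inference endpoints
It allows deployment of new model versions without changing any client code
It supports canarying new versions and A/B testing experimental models
It adds minimal latency to inference time thanks to an efficient, low-overhead implementation
It features a scheduler that groups individual inference requests into batches for joint execution on GPU, with configurable latency controls
It supports many servables: TensorFlow models, embeddings, vocabularies, feature transformations, and even non-TensorFlow machine learning models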

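This guide creates a simple MobileNet model using the Keras applications API, and then serves it with TensorFlow Serving. The focus is on TensorFlow Serving, rather than on modeling and training in TensorFlow.

Note: you can find a Colab notebook with the full working code at this link.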

Dependencies

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import json
import shutil
import requests
import numpy as np
import tensorflow as tf
import keras
import matplotlib.pyplot as plt

Model

Here we load a pre-trained MobileNet from Keras applications; this is the model that we are going to serve.

model = keras.applications.MobileNet()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_1_0_224_tf.h5
 17225924/17225924 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step

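Preprocessing

Most models don't work out of the box on raw data. They usually require some kind of preprocessing step to adjust the data to the model's requirements. In the case of this MobileNet, we can see from its API page that it requires three basic steps for its input images:

Pixel values normalized to the [0, 1] range
Pixel values scaled to the [-1, 1] range
Images with the shape (224, 224, 3), meaning (height, width, channels)

We can do all of that with the following function: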

def preprocess(image, mean=0.5, std=0.5, shape=(224, 224)):
    """Scale, normalize and resize images."""
    image = image / 255.0  # Scale
    image = (image - mean) / std  # Normalize
    image = tf.image.resize(image, shape)  # Resize
    return image
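
To make the arithmetic concrete, here is a minimal pure-Python sketch of the two value transforms that preprocess applies (the resize step is left out). The helper name scale_and_normalize is hypothetical and only mirrors the defaults mean=0.5 and std=0.5 used above.

```python
def scale_and_normalize(pixel, mean=0.5, std=0.5):
    """Map one raw pixel value in [0, 255] to the [-1, 1] range."""
    scaled = pixel / 255.0  # Step 1: [0, 255] -> [0, 1]
    return (scaled - mean) / std  # Step 2: [0, 1] -> [-1, 1]


# The darkest, middle and brightest raw values (hypothetical sample points).
for raw in (0, 127.5, 255):
    print(raw, "->", scale_and_normalize(raw))
```

The darkest pixel lands on -1.0, the midpoint on 0.0 and the brightest on 1.0, which matches the preprocessed pixel range printed later in this guide.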

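A note regarding preprocessing and postprocessing using the "keras.applications" API

All models that are available in the Keras applications API also provide preprocess_input and decode_predictions functions. These functions are respectively responsible for the preprocessing and postprocessing of each model, and already contain all the logic necessary for those steps. That is the recommended way to process inputs and outputs when using Keras applications models. In this guide we do not use them, so that the advantages of custom signatures are easier to see.

Postprocessing

In the same context, most models output values that need extra processing to meet the user's requirements. For instance, the user does not want to know the logits values for each class given an image; the user wants to know which class it belongs to. For our model, this translates to the following transformations on top of the model outputs:

Get the index of the class with the highest prediction
Get the name of the class from that index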

# Download human-readable labels for ImageNet.
imagenet_labels_url = (
    "https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt"
)
response = requests.get(imagenet_labels_url)
# Skipping background class
labels = [x for x in response.text.split("\n") if x != ""][1:]
# Convert the labels to the TensorFlow data format
tf_labels = tf.constant(labels, dtype=tf.string)


def postprocess(prediction, labels=tf_labels):
    """Convert from probs to labels."""
    indices = tf.argmax(prediction, axis=-1)  # Index with highest prediction
    label = tf.gather(params=labels, indices=indices)  # Class name
    return label
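
For readers less familiar with tf.argmax and tf.gather, the same two steps can be sketched in plain Python. This is only an illustration: postprocess_py, toy_labels and toy_scores are hypothetical names and values, not part of the model or of the ImageNet label set.

```python
def postprocess_py(scores, class_names):
    """Return the class name with the highest score for each row of `scores`."""
    results = []
    for row in scores:
        # Index of the largest score in the row (the role tf.argmax plays).
        best_index = max(range(len(row)), key=row.__getitem__)
        # Look the name up by index (the role tf.gather plays).
        results.append(class_names[best_index])
    return results


toy_labels = ["apple", "banana", "cherry"]  # hypothetical 3-class label list
toy_scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]  # one row per image in a batch
print(postprocess_py(toy_scores, toy_labels))  # prints ['banana', 'apple']
```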

Now let's download a banana picture and see how everything comes together.

response = requests.get("https://i.imgur.com/j9xCCzn.jpeg", stream=True)

with open("banana.jpeg", "wb") as f:
    shutil.copyfileobj(response.raw, f)

sample_img = plt.imread("./banana.jpeg")
print(f"Original image shape: {sample_img.shape}")
print(f"Original image pixel range: ({sample_img.min()}, {sample_img.max()})")
plt.imshow(sample_img)
plt.show()

preprocess_img = preprocess(sample_img)
print(f"Preprocessed image shape: {preprocess_img.shape}")
print(
    f"Preprocessed image pixel range: ({preprocess_img.numpy().min()},",
    f"{preprocess_img.numpy().max()})",
)

batched_img = tf.expand_dims(preprocess_img, axis=0)
batched_img = tf.cast(batched_img, tf.float32)
print(f"Batched image shape: {batched_img.shape}")

model_outputs = model(batched_img)
print(f"Model output shape: {model_outputs.shape}")
print(f"Predicted class: {postprocess(model_outputs)}")

Original image shape: (540, 960, 3)
Original image pixel range: (0, 255)

[Image: the downloaded banana photo displayed by plt.imshow]

Preprocessed image shape: (224, 224, 3)
Preprocessed image pixel range: (-1.0, 1.0)
Batched image shape: (1, 224, 224, 3)
Model output shape: (1, 1000)
Predicted class: [b'banana']

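Save the model

To load our trained model into TensorFlow Serving, we first need to save it in the SavedModel format. This creates a protobuf file in a well-defined directory hierarchy and includes a version number. TensorFlow Serving lets us select which version of a model, or "servable", we want to use when we make inference requests. Each version is exported to a different sub-directory under the given path.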

model_dir = "./model"
model_version = 1
model_export_path = f"{model_dir}/{model_version}"

tf.saved_model.save(
    model,
    export_dir=model_export_path,
)

print(f"SavedModel files: {os.listdir(model_export_path)}")

INFO:tensorflow:Assets written to: ./model/1/assets

SavedModel files: ['variables', 'saved_model.pb', 'assets', 'fingerprint.pb']
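
Because each version lives in a numbered sub-directory, it can be handy to list the versions found under a base path. The sketch below is a minimal, standard-library illustration that runs on a throwaway directory; list_versions is a hypothetical helper, and skipping non-numeric directory names is an assumption made for the demo.

```python
import tempfile
from pathlib import Path


def list_versions(base_path):
    """Return the integer version numbers found directly under `base_path`."""
    return sorted(
        int(child.name)
        for child in Path(base_path).iterdir()
        if child.is_dir() and child.name.isdigit()
    )


# Demo on a temporary directory so the sketch runs without a real model.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("1", "2", "10", "notes"):
        (Path(demo_dir) / name).mkdir()
    versions = list_versions(demo_dir)
    print(f"Versions: {versions}, latest: {max(versions)}")
```

Sorting the names as integers rather than as strings matters once there are ten or more versions, since "10" sorts before "2" alphabetically.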

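Examine your saved model

We'll use the command line utility saved_model_cli to look at the MetaGraphDefs (the models) and SignatureDefs (the methods you can call) in our SavedModel. See the discussion of the SavedModel CLI in the TensorFlow Guide.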

!saved_model_cli show --dir {model_export_path} --tag_set serve --signature_def serving_default

The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 224, 224, 3)
      name: serving_default_inputs:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_0'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1000)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

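That tells us a lot about our model! For instance, we can see that its input has a 4D shape (-1, 224, 224, 3), which means (batch_size, height, width, channels). Note also that this model requires a specific image shape (224, 224, 3), so we may need to resize our images before sending them to the model. We can also see that the model's output has the shape (-1, 1000): one score for each of the 1000 classes of the ImageNet dataset (probabilities rather than raw logits here, since keras.applications.MobileNet applies a softmax by default).

This information doesn't tell us everything, such as the fact that the pixel values need to be in the [-1, 1] range, but it's a great start.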
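
The -1 entries in a signature shape stand for "any size". As a rough illustration of how a client could check an input before sending it, here is a small pure-Python sketch; shape_matches is a hypothetical helper, not a TensorFlow function.

```python
def shape_matches(signature_shape, actual_shape):
    """Check `actual_shape` against a signature shape where -1 means any size."""
    if len(signature_shape) != len(actual_shape):
        return False
    return all(
        expected == -1 or expected == actual
        for expected, actual in zip(signature_shape, actual_shape)
    )


signature = (-1, 224, 224, 3)
print(shape_matches(signature, (1, 224, 224, 3)))  # True: preprocessed batch
print(shape_matches(signature, (1, 540, 960, 3)))  # False: raw image size
print(shape_matches(signature, (224, 224, 3)))  # False: no batch dimension
```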

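Serve your model with TensorFlow Serving

Install TFServing

We install TensorFlow Serving from a Debian package, since this Colab runs in a Debian environment. The commands below download the tensorflow-model-server-universal .deb file and install it with dpkg. Note that we're running as root.

Note: this example runs TensorFlow Serving natively, but you can also run it in a Docker container, which is one of the easiest ways to get started with TensorFlow Serving.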

!wget 'http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-universal-2.8.0/t/tensorflow-model-server-universal/tensorflow-model-server-universal_2.8.0_all.deb'
!dpkg -i tensorflow-model-server-universal_2.8.0_all.deb

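Start running TensorFlow Serving

This is where we start running TensorFlow Serving and load our model. After it loads, we can start making inference requests using REST. There are some important parameters:

port: the port that you'll use for gRPC requests.
rest_api_port: the port that you'll use for REST requests.
model_name: the name you'll use in the URL of REST requests. It can be anything.
model_base_path: the path to the directory where you've saved your model.

See the TFServing API reference for all the available parameters.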

# Environment variable with the path to the model
os.environ["MODEL_DIR"] = f"{model_dir}"

%%bash --bg
nohup tensorflow_model_server \
  --port=8500 \
  --rest_api_port=8501 \
  --model_name=model \
  --model_base_path=$MODEL_DIR >server.log 2>&1

# Check the server logs to help with troubleshooting
!cat server.log

Output

[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...

# Now we can check if tensorflow is in the active services
!sudo lsof -i -P -n | grep LISTEN

Output

node         7 root   21u  IPv6  19100      0t0  TCP *:8080 (LISTEN)
kernel_ma   34 root    7u  IPv4  18874      0t0  TCP 172.28.0.12:6000 (LISTEN)
colab-fil   63 root    5u  IPv4  17975      0t0  TCP *:3453 (LISTEN)
colab-fil   63 root    6u  IPv6  17976      0t0  TCP *:3453 (LISTEN)
jupyter-n   81 root    6u  IPv4  18092      0t0  TCP 172.28.0.12:9000 (LISTEN)
python3    101 root   23u  IPv4  18252      0t0  TCP 127.0.0.1:44915 (LISTEN)
python3    132 root    3u  IPv4  20548      0t0  TCP 127.0.0.1:15264 (LISTEN)
python3    132 root    4u  IPv4  20549      0t0  TCP 127.0.0.1:37977 (LISTEN)
python3    132 root    9u  IPv4  20662      0t0  TCP 127.0.0.1:40689 (LISTEN)
tensorflo 1101 root    5u  IPv4  35543      0t0  TCP *:8500 (LISTEN)
tensorflo 1101 root   12u  IPv4  35548      0t0  TCP *:8501 (LISTEN)
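
Besides checking the open ports, TensorFlow Serving's REST API exposes a model status endpoint at /v1/models/<model_name>. The following is a minimal sketch, using only the standard library, of polling that endpoint until a version reports the AVAILABLE state. The helper names, the retry count and the delay are assumptions chosen for illustration, and the demo runs against a canned response so it does not need a live server.

```python
import json
import time
import urllib.request


def available_versions(status_payload):
    """Extract the versions whose state is AVAILABLE from a status response."""
    return [
        entry["version"]
        for entry in status_payload.get("model_version_status", [])
        if entry.get("state") == "AVAILABLE"
    ]


def wait_until_ready(url, fetch=None, retries=10, delay=1.0):
    """Poll `url` until at least one model version is AVAILABLE."""
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=5) as resp:
                return json.loads(resp.read().decode("utf-8"))
    for _ in range(retries):
        try:
            versions = available_versions(fetch(url))
            if versions:
                return versions
        except OSError:
            pass  # The server is not accepting connections yet
        time.sleep(delay)
    return []


# Offline demo with a canned response shaped like the status endpoint's JSON.
sample_status = {
    "model_version_status": [
        {
            "version": "1",
            "state": "AVAILABLE",
            "status": {"error_code": "OK", "error_message": ""},
        }
    ]
}
print(available_versions(sample_status))  # prints ['1']
```

Against the server started above, the call would look like wait_until_ready("http://localhost:8501/v1/models/model").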

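Make a request to your model in TensorFlow Serving

Now let's create the JSON object for an inference request and see how well our model classifies it.

REST API

Newest version of the servable

We'll send the prediction request as a POST to the server's REST endpoint, passing the preprocessed sample image as the input. By not specifying a particular version, we ask the server to use the latest version of the servable.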

data = json.dumps(
    {
        "signature_name": "serving_default",
        "instances": batched_img.numpy().tolist(),
    }
)
# The server runs locally and serves plain HTTP on the REST port
url = "http://localhost:8501/v1/models/model:predict"


def predict_rest(json_data, url):
    json_response = requests.post(url, data=json_data)
    response = json.loads(json_response.text)
    rest_outputs = np.array(response["predictions"])
    return rest_outputs

rest_outputs = predict_rest(data, url)

print(f"REST output shape: {rest_outputs.shape}")
print(f"Predicted class: {postprocess(rest_outputs)}")

Output

REST output shape: (1, 1000)
Predicted class: [b'banana']
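
The request URLs in this guide follow one pattern: /v1/models/<model_name>:predict for the latest version, with an optional /versions/<n> segment to pin a version. A small sketch of helpers that build the URL and the JSON body is shown below; predict_url and predict_payload are hypothetical names, and the default host and port are assumptions that match the local server started earlier.

```python
import json


def predict_url(model_name, version=None, host="localhost", port=8501):
    """Build the REST predict URL, optionally pinned to a model version."""
    base = f"http://{host}:{port}/v1/models/{model_name}"
    if version is not None:
        base += f"/versions/{version}"
    return f"{base}:predict"


def predict_payload(instances, signature_name="serving_default"):
    """Serialize a batch of inputs in the row-oriented "instances" format."""
    return json.dumps({"signature_name": signature_name, "instances": instances})


print(predict_url("model"))
print(predict_url("model", version=2))
print(predict_payload([[0.0, 1.0]]))
```

Keeping the version optional means client code can switch between "latest" and a pinned version by changing one argument.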

gRPC API

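gRPC is based on the Remote Procedure Call (RPC) model and is a technology for implementing RPC APIs that uses HTTP/2 as its underlying transport protocol. gRPC is usually preferred for low-latency, highly scalable, and distributed systems. If you want to know more about the REST vs gRPC tradeoffs, check out this article.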

import grpc

# Create a channel that will be connected to the gRPC port of the container
channel = grpc.insecure_channel("localhost:8500")

!pip install -q tensorflow_serving_api

from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Create a stub made for prediction
# This stub will be used to send the gRPC request to the TF Server
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Get the serving_input key
loaded_model = tf.saved_model.load(model_export_path)
input_name = list(
    loaded_model.signatures["serving_default"].structured_input_signature[1].keys()
)[0]

def predict_grpc(data, input_name, stub):
    # Create a gRPC request made for prediction
    request = predict_pb2.PredictRequest()

    # Set the name of the model, for this use case it is "model"
    request.model_spec.name = "model"

    # Set which signature is used to format the gRPC query
    # here the default one "serving_default"
    request.model_spec.signature_name = "serving_default"

    # Set the input as the data
    # tf.make_tensor_proto turns a TensorFlow tensor into a Protobuf tensor
    request.inputs[input_name].CopyFrom(tf.make_tensor_proto(data.numpy().tolist()))

    # Send the gRPC request to the TF Server
    result = stub.Predict(request)
    return result


grpc_outputs = predict_grpc(batched_img, input_name, stub)
# The output key matches the name shown by saved_model_cli above ("output_0")
grpc_outputs = np.array([grpc_outputs.outputs["output_0"].float_val])

print(f"gRPC output shape: {grpc_outputs.shape}")
print(f"Predicted class: {postprocess(grpc_outputs)}")

Output

gRPC output shape: (1, 1000)
Predicted class: [b'banana']
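
One practical difference between the two APIs is payload size: the REST request carries the image as JSON text, while a protobuf message can carry float values in binary form. The sketch below is only a rough, standard-library comparison of the two encodings for one image worth of values. It does not build real gRPC messages, and the random values are a stand-in for actual pixel data.

```python
import json
import random
from array import array

random.seed(0)  # Fixed seed so the demo is repeatable
num_values = 224 * 224 * 3  # Number of values in one preprocessed image
values = [random.uniform(-1.0, 1.0) for _ in range(num_values)]

json_size = len(json.dumps(values).encode("utf-8"))  # Text encoding, as in REST
binary_size = len(array("f", values).tobytes())  # Packed 4-byte floats

print(f"JSON text: {json_size} bytes")
print(f"Packed float32: {binary_size} bytes")
print(f"JSON is about {json_size / binary_size:.1f}x larger")
```

The exact ratio depends on how many digits each value needs when written as text, so treat the printed numbers as an order-of-magnitude indication.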

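Custom signature

Note that for this model we always need to preprocess and postprocess all samples to get the desired output. This can get quite tricky if you maintain and serve several models developed by a large team, since each of them might require different processing logic.

TensorFlow allows us to customize the model graph to embed all of that processing logic, which makes model serving much easier. There are multiple ways to achieve this, but since we are going to serve the models with TFServing, we can customize the model graph directly in the serving signature.

We can use the following code to export the same model with the preprocessing and postprocessing logic as its default signature. This allows the model to make predictions on raw data.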

def export_model(model, labels):
    @tf.function(input_signature=[tf.TensorSpec([None, None, None, 3], tf.float32)])
    def serving_fn(image):
        processed_img = preprocess(image)
        probs = model(processed_img)
        label = postprocess(probs)
        return {"label": label}

    return serving_fn


model_sig_version = 2
model_sig_export_path = f"{model_dir}/{model_sig_version}"

tf.saved_model.save(
    model,
    export_dir=model_sig_export_path,
    signatures={"serving_default": export_model(model, labels)},
)

!saved_model_cli show --dir {model_sig_export_path} --tag_set serve --signature_def serving_default

INFO:tensorflow:Assets written to: ./model/2/assets

The given SavedModel SignatureDef contains the following input(s):
  inputs['image'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_image:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['label'] tensor_info:
      dtype: DT_STRING
      shape: (-1)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

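Note that this model has a different signature. Its input is still 4D, but now with the shape (-1, -1, -1, 3), which means that it supports images of any height and width. Its output also has a different shape: it no longer returns the 1000 class scores, but a single string label per image.

We can test the model's prediction using a specific signature with the API below: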

batched_raw_img = tf.expand_dims(sample_img, axis=0)
batched_raw_img = tf.cast(batched_raw_img, tf.float32)

loaded_model = tf.saved_model.load(model_sig_export_path)
loaded_model.signatures["serving_default"](**{"image": batched_raw_img})

{'label': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'banana'], dtype=object)>}

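Prediction using a particular version of the servable

Now let's specify a particular version of our servable. Note that when we saved the model with a custom signature we used a different folder: the first model was saved in folder /1 (version 1), and the one with a custom signature in folder /2 (version 2). TFServing watches the model base path and treats each numbered sub-directory of the same parent folder as a version of the same model. With its default version policy it serves the highest-numbered one, so a request that names no version is answered by the latest version.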

REST API

data = json.dumps(
    {
        "signature_name": "serving_default",
        "instances": batched_raw_img.numpy().tolist(),
    }
)
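url_sig = "http://localhost:8501/v1/models/model/versions/2:predict"

# Send the raw image batch to version 2, which embeds pre- and postprocessing
rest_outputs = predict_rest(data, url_sig)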

print(f"REST output shape: {rest_outputs.shape}")
print(f"Predicted class: {rest_outputs}")

Output

REST output shape: (1,)
Predicted class: ['banana']

gRPC API

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

input_name = list(
    loaded_model.signatures["serving_default"].structured_input_signature[1].keys()
)[0]

grpc_outputs = predict_grpc(batched_raw_img, input_name, stub)
grpc_outputs = np.array([grpc_outputs.outputs['label'].string_val])

print(f"gRPC output shape: {grpc_outputs.shape}")
print(f"Predicted class: {grpc_outputs}")

Output

gRPC output shape: (1, 1)
Predicted class: [[b'banana']]
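
Note that the REST response shows the label as a regular string, while the gRPC response holds it as bytes (the string_val field of a tensor proto stores byte strings). A minimal sketch of converting such values for display follows; decode_labels is a hypothetical helper and UTF-8 is assumed as the encoding.

```python
def decode_labels(raw_labels, encoding="utf-8"):
    """Convert byte-string labels (as returned over gRPC) to Python strings."""
    return [label.decode(encoding) for label in raw_labels]


print(decode_labels([b"banana"]))  # prints ['banana']
```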

Additional resources
