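Authors: Ian Stenbit, fchollet, lukewood
Date created: 2022/09/28
Last modified: 2022/09/28
Description: Explore the latent manifold of Stable Diffusion.
Generative image models learn a "latent manifold" of the visual world: a low-dimensional vector space where each point maps to an image. Going from such a point on the manifold back to a displayable image is called "decoding"; in the Stable Diffusion model, this is handled by the "decoder" model.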

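This latent manifold of images is continuous and interpolative, meaning that a small move on the manifold changes the corresponding image only slightly, and that any two points on the manifold can be connected by a path whose intermediate points also map to images.
However, Stable Diffusion isn't just an image model, it's also a natural language model. It has two latent spaces: the image representation space learned by the encoder used during training, and the prompt latent space, which is learned using a combination of pretraining and training-time fine-tuning.
Latent space walking, or latent space exploration, is the process of sampling a point in latent space and incrementally changing the latent representation. Its most common application is generating animations, where each sampled point is fed to the decoder and stored as a frame in the final animation. For high-quality latent representations, this produces coherent-looking animations. These animations can provide insight into the feature map of the latent space, and can ultimately lead to improvements in the training process. One such GIF is displayed below.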

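In this guide, we will show how to take advantage of the Stable Diffusion API in KerasCV to perform prompt interpolation and circular walks through Stable Diffusion's visual latent manifold, as well as through the text encoder's latent manifold.
This guide assumes the reader has a high-level understanding of Stable Diffusion. If you haven't already, you should start by reading the Stable Diffusion tutorial.
To start, we import KerasCV and load a Stable Diffusion model using the optimizations discussed in the tutorial Generate images with Stable Diffusion. Note that if you are running on an M1 Mac GPU, you should not enable mixed precision.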
!pip install keras-cv --upgrade --quiet
import keras_cv
import keras
import matplotlib.pyplot as plt
from keras import ops
import numpy as np
import math
from PIL import Image
# Enable mixed precision
# (only do this if you have a recent NVIDIA GPU)
keras.mixed_precision.set_global_policy("mixed_float16")
# Instantiate the Stable Diffusion model
model = keras_cv.models.StableDiffusion(jit_compile=True)
By using this model checkpoint, you acknowledge that its usage is subject to the terms of the CreativeML Open RAIL-M license at https://raw.githubusercontent.com/CompVis/stable-diffusion/main/LICENSE
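In Stable Diffusion, a text prompt is first encoded into a vector, and that encoding is used to guide the diffusion process. The latent encoding vector has shape 77x768 (that's huge!), and when we give Stable Diffusion a text prompt, we're generating images from just one such point on the latent manifold.
To explore more of this manifold, we can interpolate between two text encodings and generate images at those interpolated points: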
prompt_1 = "A watercolor painting of a Golden Retriever at the beach"
prompt_2 = "A still life DSLR photo of a bowl of fruit"
interpolation_steps = 5
encoding_1 = ops.squeeze(model.encode_text(prompt_1))
encoding_2 = ops.squeeze(model.encode_text(prompt_2))
interpolated_encodings = ops.linspace(encoding_1, encoding_2, interpolation_steps)
# Show the size of the latent manifold
print(f"Encoding shape: {encoding_1.shape}")
Downloading data from https://github.com/openai/CLIP/blob/main/clip/bpe_simple_vocab_16e6.txt.gz?raw=true
1356917/1356917 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://hugging-face.cn/fchollet/stable-diffusion/resolve/main/kcv_encoder.h5
492466864/492466864 ━━━━━━━━━━━━━━━━━━━━ 7s 0us/step
Encoding shape: (77, 768)
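The interpolation step can be checked without loading the model. The following is a minimal sketch that uses NumPy and random stand-in arrays in place of real text encodings (the names `encoding_a` and `encoding_b` are placeholders, not part of the guide). It shows that a linear interpolation between two arrays stacks the intermediate points along a new leading axis, and that each point is a weighted average of the two endpoints.

```python
import numpy as np

# Random stand-ins for two text encodings; (77, 768) matches the shape printed above.
rng = np.random.default_rng(0)
encoding_a = rng.normal(size=(77, 768))
encoding_b = rng.normal(size=(77, 768))

steps = 5
# np.linspace accepts array endpoints and stacks the results on a new leading axis.
path = np.linspace(encoding_a, encoding_b, steps)
print(path.shape)

# Each intermediate point is the combination (1 - t) * a + t * b.
# With 5 steps, index 1 corresponds to t = 1 / 4.
t = 1 / (steps - 1)
manual_point = (1 - t) * encoding_a + t * encoding_b
matches_manual = np.allclose(path[1], manual_point)
print(matches_manual)
```

The first and last entries of `path` are the two endpoints themselves, so the first and last generated images correspond to the unmodified prompts.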
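Once we've interpolated the encodings, we can generate images from each point. Note that in order to maintain some stability between the resulting images, we keep the diffusion noise constant between images.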
seed = 12345
noise = keras.random.normal((512 // 8, 512 // 8, 4), seed=seed)
images = model.generate_image(
    interpolated_encodings,
    batch_size=interpolation_steps,
    diffusion_noise=noise,
)
Downloading data from https://hugging-face.cn/fchollet/stable-diffusion/resolve/main/kcv_diffusion_model.h5
3439090152/3439090152 ━━━━━━━━━━━━━━━━━━━━ 26s 0us/step
50/50 ━━━━━━━━━━━━━━━━━━━━ 173s 311ms/step
Downloading data from https://hugging-face.cn/fchollet/stable-diffusion/resolve/main/kcv_decoder.h5
198180272/198180272 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
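To see why a fixed noise tensor keeps the images comparable, here is a small NumPy sketch, independent of the model. It assumes, as the text above describes, that one noise tensor is reused for every image in the batch; the variable names are illustrative only.

```python
import numpy as np

seed = 12345
latent_shape = (512 // 8, 512 // 8, 4)  # same latent shape as in the guide

# The same seed reproduces the same noise tensor.
noise_a = np.random.default_rng(seed).normal(size=latent_shape)
noise_b = np.random.default_rng(seed).normal(size=latent_shape)
same_seed_matches = np.array_equal(noise_a, noise_b)

batch_size = 5
# Shared noise: every image in the batch starts from an identical tensor,
# so differences between images come only from the text encodings.
shared = np.repeat(noise_a[np.newaxis], batch_size, axis=0)
# Fresh noise: every image starts from a different tensor.
fresh = np.random.default_rng(seed).normal(size=(batch_size, *latent_shape))

shared_is_constant = bool((shared == shared[0]).all())
fresh_is_constant = bool((fresh == fresh[0]).all())
print(same_seed_matches, shared_is_constant, fresh_is_constant)
```

With shared noise, the only thing that changes from one image to the next is the interpolated encoding, which is what makes the transition easy to follow.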
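Now that we've generated some interpolated images, let's take a look at them!
Throughout this tutorial, we're going to export sequences of images as GIFs so that they can be easily viewed with some temporal context. For sequences of images where the first and last images don't match conceptually, we rubber-band the GIF: it plays forward and then in reverse, so the loop has no abrupt jump.
If you're running in Colab, you can view your own GIFs by running: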
from IPython.display import Image as IImage
IImage("doggo-and-fruit-5.gif")
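def export_as_gif(filename, images, frames_per_second=10, rubber_band=False):
    if rubber_band:
        # Play forward, then backward through the interior frames. Build a new
        # list so that the caller's list is not modified in place.
        images = images + images[-2:0:-1]
    images[0].save(
        filename,
        save_all=True,
        append_images=images[1:],
        duration=1000 // frames_per_second,
        loop=0,
    )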
export_as_gif(
    "doggo-and-fruit-5.gif",
    [Image.fromarray(img) for img in images],
    frames_per_second=2,
    rubber_band=True,
)
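The frame ordering behind a rubber-banded animation can be illustrated with plain lists. The helper below, `rubber_band_order`, is hypothetical and only sketches one way to build a forward-and-back sequence; the choice to skip the first and last frames on the way back is an assumption made here so that no frame appears twice in a row when the GIF loops.

```python
def rubber_band_order(frames):
    """Return the frames played forward, then backward through the interior."""
    frames = list(frames)  # copy, so the input is left untouched
    # frames[-2:0:-1] walks from the second-to-last frame down to the second frame.
    return frames + frames[-2:0:-1]


order = rubber_band_order([0, 1, 2, 3, 4])
print(order)  # [0, 1, 2, 3, 4, 3, 2, 1]
```

When the sequence repeats, frame 1 is followed by frame 0 again, so the playback sweeps back and forth without a visible cut.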

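The results may seem surprising. Generally, interpolating between prompts produces coherent-looking images, and often demonstrates a progressive concept shift between the contents of the two prompts. This is indicative of a high-quality representation space that closely mirrors the natural structure of the visual world.
To best visualize this, we should do a much more fine-grained interpolation, using many more steps (150 in the code that follows). In order to keep the batch size small (so that we don't run out of GPU memory), this requires manually batching our interpolated encodings.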
interpolation_steps = 150
batch_size = 3
batches = interpolation_steps // batch_size
interpolated_encodings = ops.linspace(encoding_1, encoding_2, interpolation_steps)
batched_encodings = ops.split(interpolated_encodings, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=noise,
        )
    ]
export_as_gif("doggo-and-fruit-150.gif", images, rubber_band=True)
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 204ms/step

The resulting GIF shows a much clearer and more coherent shift between the two prompts. Try out some prompts of your own and experiment!
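The manual batching used for the fine-grained interpolation can be checked in isolation. This is a minimal NumPy sketch with a zero-filled stand-in for the interpolated encodings; it assumes the number of steps is a multiple of the batch size, which an equal split requires.

```python
import numpy as np

interpolation_steps = 150
batch_size = 3

# An equal split only works when the step count divides evenly.
if interpolation_steps % batch_size != 0:
    raise ValueError("interpolation_steps must be a multiple of batch_size")
batches = interpolation_steps // batch_size

# Zero-filled stand-in for the stack of interpolated encodings.
encodings = np.zeros((interpolation_steps, 77, 768), dtype="float32")

# np.split with an integer cuts the leading axis into that many equal parts.
batched = np.split(encodings, batches)
print(len(batched), batched[0].shape)
```

Each of the 50 batches holds 3 encodings, so the model only ever processes 3 images at a time, regardless of how long the full animation is.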
We can even extend this concept beyond two prompts. For example, we can interpolate between four prompts:
prompt_1 = "A watercolor painting of a Golden Retriever at the beach"
prompt_2 = "A still life DSLR photo of a bowl of fruit"
prompt_3 = "The eiffel tower in the style of starry night"
prompt_4 = "An architectural sketch of a skyscraper"
interpolation_steps = 6
batch_size = 3
batches = (interpolation_steps**2) // batch_size
encoding_1 = ops.squeeze(model.encode_text(prompt_1))
encoding_2 = ops.squeeze(model.encode_text(prompt_2))
encoding_3 = ops.squeeze(model.encode_text(prompt_3))
encoding_4 = ops.squeeze(model.encode_text(prompt_4))
interpolated_encodings = ops.linspace(
    ops.linspace(encoding_1, encoding_2, interpolation_steps),
    ops.linspace(encoding_3, encoding_4, interpolation_steps),
    interpolation_steps,
)
interpolated_encodings = ops.reshape(
    interpolated_encodings, (interpolation_steps**2, 77, 768)
)
batched_encodings = ops.split(interpolated_encodings, batches)
images = []
for batch in range(batches):
    images.append(
        model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            diffusion_noise=noise,
        )
    )
def plot_grid(images, path, grid_size, scale=2):
    # Lay the images out on a grid_size x grid_size grid with no gaps.
    fig, axs = plt.subplots(
        grid_size, grid_size, figsize=(grid_size * scale, grid_size * scale)
    )
    fig.tight_layout()
    plt.subplots_adjust(wspace=0, hspace=0)
    plt.axis("off")
    for ax in axs.flat:
        ax.axis("off")

    images = images.astype(int)
    # Fill the grid in row-major order.
    for i in range(min(grid_size * grid_size, len(images))):
        ax = axs.flat[i]
        ax.imshow(images[i].astype("uint8"))
        ax.axis("off")

    # Remove any axes that were left without an image.
    for i in range(len(images), grid_size * grid_size):
        axs.flat[i].axis("off")
        axs.flat[i].remove()

    plt.savefig(
        fname=path,
        pad_inches=0,
        bbox_inches="tight",
        transparent=False,
        dpi=60,
    )
images = np.concatenate(images)
plot_grid(images, "4-way-interpolation.jpg", interpolation_steps)
50/50 ━━━━━━━━━━━━━━━━━━━━ 11s 210ms/step
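The nested interpolation used for the four-prompt grid can be verified on stand-in arrays. The sketch below uses NumPy and random placeholders (`e1` to `e4`) instead of real encodings; it shows that interpolating between two interpolated rows yields a grid whose four corners are the four original points.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for the four prompt encodings.
e1, e2, e3, e4 = (rng.normal(size=(77, 768)) for _ in range(4))

n = 6
top = np.linspace(e1, e2, n)        # first row: e1 -> e2, shape (6, 77, 768)
bottom = np.linspace(e3, e4, n)     # last row:  e3 -> e4, shape (6, 77, 768)
grid = np.linspace(top, bottom, n)  # rows blend from top to bottom
print(grid.shape)

# Flatten the grid in row-major order, the same order a 6x6 plot is filled in.
flat = grid.reshape(n * n, 77, 768)
corners_match = (
    np.allclose(flat[0], e1)
    and np.allclose(flat[n - 1], e2)
    and np.allclose(flat[n * (n - 1)], e3)
    and np.allclose(flat[-1], e4)
)
print(corners_match)
```

In the plotted grid, this means prompts 1 and 2 sit at the top-left and top-right corners, and prompts 3 and 4 at the bottom-left and bottom-right corners.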

We can also interpolate while allowing the diffusion noise to vary, by dropping the diffusion_noise parameter:
images = []
for batch in range(batches):
    images.append(model.generate_image(batched_encodings[batch], batch_size=batch_size))
images = np.concatenate(images)
plot_grid(images, "4-way-interpolation-varying-noise.jpg", interpolation_steps)
50/50 ━━━━━━━━━━━━━━━━━━━━ 11s 213ms/step

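Next up: let's go for some walks!
Our next experiment will be to go for a walk around the latent manifold, starting from a point produced by a particular prompt.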
walk_steps = 150
batch_size = 3
batches = walk_steps // batch_size
step_size = 0.005
encoding = ops.squeeze(
model.encode_text("The Eiffel Tower in the style of starry night")
)
# Note that (77, 768) is the shape of the text encoding.
delta = ops.ones_like(encoding) * step_size
walked_encodings = []
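for step_index in range(walk_steps):
    walked_encodings.append(encoding)
    # Rebind rather than update in place, so earlier list entries are not
    # overwritten on backends whose tensors support in-place addition.
    encoding = encoding + delta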
walked_encodings = ops.stack(walked_encodings)
batched_encodings = ops.split(walked_encodings, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=noise,
        )
    ]
export_as_gif("eiffel-tower-starry-night.gif", images, rubber_band=True)
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 217ms/step

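Perhaps unsurprisingly, walking too far from the encoder's latent manifold produces images that look incoherent. Try it for yourself by setting your own prompt and adjusting step_size to increase or decrease the magnitude of the walk. Note that when the magnitude of the walk gets large, the walk often leads into areas that produce extremely noisy images.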
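How far such a walk drifts from its starting point can be estimated without the model. The following NumPy sketch uses a random stand-in for the prompt encoding and reproduces the walk above in vectorized form, where every coordinate grows by `step_size` at each step; the variable names are illustrative.

```python
import numpy as np

walk_steps = 150
step_size = 0.005

rng = np.random.default_rng(0)
start = rng.normal(size=(77, 768))  # stand-in for the prompt encoding
delta = np.ones_like(start) * step_size

# Step k adds k * delta to the starting point.
offsets = np.arange(walk_steps).reshape(-1, 1, 1)
walked = start + offsets * delta
print(walked.shape)

# Euclidean distance of each step from the starting point.
distances = np.linalg.norm((walked - start).reshape(walk_steps, -1), axis=1)
expected_final = (walk_steps - 1) * step_size * np.sqrt(77 * 768)
print(distances[-1], expected_final)
```

Because all 77 x 768 coordinates move together, the distance from the start grows linearly with the step index and scales with the square root of the number of coordinates. This is one reason why even a small step_size can carry the walk a long way from the original encoding.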
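Our final experiment is to stick to one prompt and explore the variety of images that the diffusion model can produce from that prompt. We do this by controlling the noise that is used to seed the diffusion process.
We create two noise components, x and y, and do a walk from 0 to 2π, summing the cosine of our x component and the sine of our y component to produce noise. Using this approach, the end of our walk arrives at the same noise inputs where we began our walk, so we get a "loopable" result!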
prompt = "An oil painting of cows in a field next to a windmill in Holland"
encoding = ops.squeeze(model.encode_text(prompt))
walk_steps = 150
batch_size = 3
batches = walk_steps // batch_size
walk_noise_x = keras.random.normal(noise.shape, dtype="float64")
walk_noise_y = keras.random.normal(noise.shape, dtype="float64")
walk_scale_x = ops.cos(ops.linspace(0, 2, walk_steps) * math.pi)
walk_scale_y = ops.sin(ops.linspace(0, 2, walk_steps) * math.pi)
noise_x = ops.tensordot(walk_scale_x, walk_noise_x, axes=0)
noise_y = ops.tensordot(walk_scale_y, walk_noise_y, axes=0)
noise = ops.add(noise_x, noise_y)
batched_noise = ops.split(noise, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            encoding,
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=batched_noise[batch],
        )
    ]
export_as_gif("cows.gif", images)
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 214ms/step
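The circular noise walk can be reproduced with NumPy alone to confirm that it closes on itself. This is a minimal sketch with random stand-in noise components; `noise_x`, `noise_y`, and `walk` are illustrative names.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
latent_shape = (512 // 8, 512 // 8, 4)
walk_steps = 150

# Two independent noise components.
noise_x = rng.normal(size=latent_shape)
noise_y = rng.normal(size=latent_shape)

# Angles from 0 to 2*pi, inclusive of both endpoints.
angles = np.linspace(0, 2, walk_steps) * math.pi

# axes=0 gives an outer product: one scaled copy of the noise per angle.
walk = np.tensordot(np.cos(angles), noise_x, axes=0) + np.tensordot(
    np.sin(angles), noise_y, axes=0
)
print(walk.shape)

# The walk ends where it started, which is what makes the animation loopable.
is_closed = np.allclose(walk[0], walk[-1])
print(is_closed)
```

Since cos² + sin² = 1 and the two components are independent, every point on the circle is a sum of two scaled Gaussians with unit total variance, so each frame starts from noise with the same statistics as an ordinary sample.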

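Experiment with your own prompts and with different values of unconditional_guidance_scale!
Stable Diffusion offers a lot more than just single text-to-image generation. Exploring the latent manifold of the text encoder and the noise space of the diffusion model are two fun ways to experience the power of this model, and KerasCV makes it easy!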