Authors: Ian Stenbit, fchollet, lukewood
Date created: 2022/09/28
Last modified: 2022/09/28
Description: A walk through the latent manifold of Stable Diffusion.
Generative image models learn a "latent manifold" of the visual world: a low-dimensional vector space where each point maps to an image. Going from such a point on the manifold back to a displayable image is called "decoding"; in the Stable Diffusion model, this is handled by the "decoder" model.
This latent manifold of images is continuous and interpolative, meaning that:

1. Moving a little on the manifold only changes the corresponding image a little (continuity).
2. For any two points A and B on the manifold (i.e. any two images), it is possible to move from A to B via a path where each intermediate point is also on the manifold (i.e. is also a valid image). Intermediate points are called "interpolations" between the two starting images.
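The interpolation used here is just ordinary linear interpolation between vectors. A minimal sketch in plain NumPy, with 2-D points standing in for image latents:

```python
import numpy as np

def lerp(a, b, t):
    """Linearly interpolate between two latent points: t=0 gives a, t=1 gives b."""
    return (1 - t) * a + t * b

a = np.array([0.0, 0.0])
b = np.array([1.0, 2.0])
# An intermediate point on the straight path between a and b
print(lerp(a, b, 0.5))  # [0.5 1. ]
```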
Stable Diffusion isn't just an image model, though; it's also a natural language model. It has two latent spaces: the image representation space learned by the encoder used during training, and the prompt latent space, which is learned using a combination of pretraining and training-time fine-tuning.
Latent space walking, or latent space exploration, is the process of sampling a point in latent space and incrementally changing the latent representation. Its most common application is generating animations, where each sampled point is fed to the decoder and stored as a frame in the final animation. For high-quality latent representations, this produces coherent-looking animations. These animations can provide insight into the feature map of the latent space, and can ultimately lead to improvements in the training process. One such GIF is displayed below:
In this guide, we will show how to take advantage of the Stable Diffusion API in KerasCV to perform prompt interpolation and circular walks through Stable Diffusion's visual latent manifold, as well as through the text encoder's latent manifold.
This guide assumes the reader has a high-level understanding of Stable Diffusion. If you haven't already, you should start by reading the Stable Diffusion Tutorial.
To start, we import KerasCV and load up a Stable Diffusion model using the optimizations discussed in the tutorial Generate images with Stable Diffusion. Note that if you are running with an M1 Mac GPU you should not enable mixed precision.
!pip install keras-cv --upgrade --quiet
import keras_cv
import keras
import matplotlib.pyplot as plt
from keras import ops
import numpy as np
import math
from PIL import Image
# Enable mixed precision
# (only do this if you have a recent NVIDIA GPU)
keras.mixed_precision.set_global_policy("mixed_float16")
# Instantiate the Stable Diffusion model
model = keras_cv.models.StableDiffusion(jit_compile=True)
By using this model checkpoint, you acknowledge that its usage is subject to the terms of the CreativeML Open RAIL-M license at https://raw.githubusercontent.com/CompVis/stable-diffusion/main/LICENSE
In Stable Diffusion, a text prompt is first encoded into a vector, and that encoding is used to guide the diffusion process. The latent encoding vector has shape 77x768 (that's huge!), and when we give Stable Diffusion a text prompt, we're generating images from just one such point on the latent manifold.
To explore more of this manifold, we can interpolate between two text encodings and generate images at those interpolated points:
prompt_1 = "A watercolor painting of a Golden Retriever at the beach"
prompt_2 = "A still life DSLR photo of a bowl of fruit"
interpolation_steps = 5
encoding_1 = ops.squeeze(model.encode_text(prompt_1))
encoding_2 = ops.squeeze(model.encode_text(prompt_2))
interpolated_encodings = ops.linspace(encoding_1, encoding_2, interpolation_steps)
# Show the size of the latent manifold
print(f"Encoding shape: {encoding_1.shape}")
Downloading data from https://github.com/openai/CLIP/blob/main/clip/bpe_simple_vocab_16e6.txt.gz?raw=true
1356917/1356917 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://huggingface.co/fchollet/stable-diffusion/resolve/main/kcv_encoder.h5
492466864/492466864 ━━━━━━━━━━━━━━━━━━━━ 7s 0us/step
Encoding shape: (77, 768)
Once we've interpolated the encodings, we can generate images from each point. Note that in order to maintain some stability between the resulting images, we keep the diffusion noise constant between the images.
seed = 12345
noise = keras.random.normal((512 // 8, 512 // 8, 4), seed=seed)
images = model.generate_image(
    interpolated_encodings,
    batch_size=interpolation_steps,
    diffusion_noise=noise,
)
Downloading data from https://huggingface.co/fchollet/stable-diffusion/resolve/main/kcv_diffusion_model.h5
3439090152/3439090152 ━━━━━━━━━━━━━━━━━━━━ 26s 0us/step
50/50 ━━━━━━━━━━━━━━━━━━━━ 173s 311ms/step
Downloading data from https://huggingface.co/fchollet/stable-diffusion/resolve/main/kcv_decoder.h5
198180272/198180272 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
Now that we've generated some interpolated images, let's take a look at them!
Throughout this tutorial, we're going to export sequences of images as gifs so that they can be easily viewed with some temporal context. For sequences of images where the first and last images don't match conceptually, we rubber-band the gif.
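Rubber-banding simply appends the reversed middle of the frame sequence, so the gif plays forward and then backward to its starting frame. A quick sketch of the slicing trick, with integers standing in for image frames:

```python
frames = list(range(5))                       # stand-ins for 5 image frames
# Append the middle frames in reverse (same slice used by export_as_gif below)
rubber_banded = frames + frames[2:-1][::-1]
print(rubber_banded)  # [0, 1, 2, 3, 4, 3, 2]
```

With loop=0 in the gif metadata, playback then jumps back to frame 0, closing the loop.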
If you're running in Colab, you can view your own GIFs by running:
from IPython.display import Image as IImage
IImage("doggo-and-fruit-5.gif")
def export_as_gif(filename, images, frames_per_second=10, rubber_band=False):
    if rubber_band:
        images += images[2:-1][::-1]

    images[0].save(
        filename,
        save_all=True,
        append_images=images[1:],
        duration=1000 // frames_per_second,
        loop=0,
    )
export_as_gif(
    "doggo-and-fruit-5.gif",
    [Image.fromarray(img) for img in images],
    frames_per_second=2,
    rubber_band=True,
)
The results may seem surprising. Generally, interpolating between prompts produces coherent-looking images, and often demonstrates a progressive concept shift between the contents of the two prompts. This is indicative of a high-quality representation space that closely mirrors the natural structure of the visual world.
To best visualize this, we should do a much more fine-grained interpolation, using hundreds of steps. In order to keep the batch size small (so that we don't OOM our GPU), this requires manually batching our interpolated encodings.
interpolation_steps = 150
batch_size = 3
batches = interpolation_steps // batch_size
interpolated_encodings = ops.linspace(encoding_1, encoding_2, interpolation_steps)
batched_encodings = ops.split(interpolated_encodings, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=noise,
        )
    ]
export_as_gif("doggo-and-fruit-150.gif", images, rubber_band=True)
25/25 ━━━━━━━━━━━━━━━━━━━━ 77s 204ms/step
The resulting gif shows a much clearer and more coherent shift between the two prompts. Try out some prompts of your own and experiment!
We can even extend this concept to more than one image. For example, we can interpolate between four prompts:
prompt_1 = "A watercolor painting of a Golden Retriever at the beach"
prompt_2 = "A still life DSLR photo of a bowl of fruit"
prompt_3 = "The eiffel tower in the style of starry night"
prompt_4 = "An architectural sketch of a skyscraper"
interpolation_steps = 6
batch_size = 3
batches = (interpolation_steps**2) // batch_size
encoding_1 = ops.squeeze(model.encode_text(prompt_1))
encoding_2 = ops.squeeze(model.encode_text(prompt_2))
encoding_3 = ops.squeeze(model.encode_text(prompt_3))
encoding_4 = ops.squeeze(model.encode_text(prompt_4))
interpolated_encodings = ops.linspace(
    ops.linspace(encoding_1, encoding_2, interpolation_steps),
    ops.linspace(encoding_3, encoding_4, interpolation_steps),
    interpolation_steps,
)
interpolated_encodings = ops.reshape(
    interpolated_encodings, (interpolation_steps**2, 77, 768)
)
batched_encodings = ops.split(interpolated_encodings, batches)
images = []
for batch in range(batches):
    images.append(
        model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            diffusion_noise=noise,
        )
    )
def plot_grid(images, path, grid_size, scale=2):
    fig, axs = plt.subplots(
        grid_size, grid_size, figsize=(grid_size * scale, grid_size * scale)
    )
    fig.tight_layout()
    plt.subplots_adjust(wspace=0, hspace=0)
    plt.axis("off")
    for ax in axs.flat:
        ax.axis("off")

    images = images.astype(int)
    for i in range(min(grid_size * grid_size, len(images))):
        ax = axs.flat[i]
        ax.imshow(images[i].astype("uint8"))
        ax.axis("off")

    for i in range(len(images), grid_size * grid_size):
        axs.flat[i].axis("off")
        axs.flat[i].remove()

    plt.savefig(
        fname=path,
        pad_inches=0,
        bbox_inches="tight",
        transparent=False,
        dpi=60,
    )
images = np.concatenate(images)
plot_grid(images, "4-way-interpolation.jpg", interpolation_steps)
50/50 ━━━━━━━━━━━━━━━━━━━━ 10s 209ms/step
We can also interpolate while allowing diffusion noise to vary, by dropping the diffusion_noise parameter:
images = []
for batch in range(batches):
    images.append(model.generate_image(batched_encodings[batch], batch_size=batch_size))
images = np.concatenate(images)
plot_grid(images, "4-way-interpolation-varying-noise.jpg", interpolation_steps)
50/50 ━━━━━━━━━━━━━━━━━━━━ 11s 215ms/step
Next up: let's go for some walks!
Our next experiment will be to go for a walk around the latent manifold, starting from a point produced by a particular prompt.
walk_steps = 150
batch_size = 3
batches = walk_steps // batch_size
step_size = 0.005
encoding = ops.squeeze(
    model.encode_text("The Eiffel Tower in the style of starry night")
)
# Note that (77, 768) is the shape of the text encoding.
delta = ops.ones_like(encoding) * step_size
walked_encodings = []
for step_index in range(walk_steps):
    walked_encodings.append(encoding)
    encoding += delta
walked_encodings = ops.stack(walked_encodings)
batched_encodings = ops.split(walked_encodings, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=noise,
        )
    ]
export_as_gif("eiffel-tower-starry-night.gif", images, rubber_band=True)
25/25 ━━━━━━━━━━━━━━━━━━━━ 6s 228ms/step
Perhaps unsurprisingly, walking too far from the encoder's latent manifold produces images that look incoherent. Try it for yourself by setting your own prompt and adjusting step_size to increase or decrease the magnitude of the walk. Note that when the magnitude of the walk gets large, the walk often leads into areas which produce extremely noisy images.
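As a hypothetical variant (not part of this tutorial), instead of adding a constant delta you could step in a fresh random direction at each step. A NumPy-only sketch, where a zero array stands in for a real (77, 768) text encoding:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
step_size = 0.005
walk_steps = 10
encoding = np.zeros((77, 768), dtype="float32")  # stand-in for a real text encoding

walked = []
for _ in range(walk_steps):
    # Normalizing the random direction keeps every step the same size
    direction = rng.standard_normal(encoding.shape).astype("float32")
    direction /= np.linalg.norm(direction)
    encoding = encoding + step_size * direction
    walked.append(encoding)

print(len(walked), walked[-1].shape)  # 10 (77, 768)
```

Each entry of walked could then be stacked, batched, and passed to model.generate_image exactly as above.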
Our final experiment is to stick with one prompt and explore the variety of images that the diffusion model can produce from that prompt. We do this by controlling the noise that is used to seed the diffusion process.
We create two noise components, x and y, and do a walk from 0 to 2π, summing the cosine of our x component and the sin of our y component to produce noise. Using this approach, the end of our walk arrives at the same noise inputs where we began our walk, so we get a "loopable" result!
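To see why the result loops, note that the cosine and sine scaling factors return to their starting values once the angle reaches 2π. A quick sanity check in plain Python:

```python
import math

walk_steps = 150
# Same schedule as ops.linspace(0, 2, walk_steps) * math.pi below
angles = [2 * math.pi * i / (walk_steps - 1) for i in range(walk_steps)]
xs = [math.cos(a) for a in angles]
ys = [math.sin(a) for a in angles]

# First and last scales coincide, so the seeded noise (and the gif) loops seamlessly
print(abs(xs[0] - xs[-1]) < 1e-9, abs(ys[0] - ys[-1]) < 1e-9)  # True True
```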
prompt = "An oil painting of cows in a field next to a windmill in Holland"
encoding = ops.squeeze(model.encode_text(prompt))
walk_steps = 150
batch_size = 3
batches = walk_steps // batch_size
walk_noise_x = keras.random.normal(noise.shape, dtype="float64")
walk_noise_y = keras.random.normal(noise.shape, dtype="float64")
walk_scale_x = ops.cos(ops.linspace(0, 2, walk_steps) * math.pi)
walk_scale_y = ops.sin(ops.linspace(0, 2, walk_steps) * math.pi)
noise_x = ops.tensordot(walk_scale_x, walk_noise_x, axes=0)
noise_y = ops.tensordot(walk_scale_y, walk_noise_y, axes=0)
noise = ops.add(noise_x, noise_y)
batched_noise = ops.split(noise, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            encoding,
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=batched_noise[batch],
        )
    ]
export_as_gif("cows.gif", images)
25/25 ━━━━━━━━━━━━━━━━━━━━ 35s 216ms/step
Experiment with your own prompts and with different values of unconditional_guidance_scale!
Stable Diffusion offers a lot more than just single text-to-image generation. Exploring the latent manifold of the text encoder and the noise space of the diffusion model are two fun ways to experience the power of this model, and KerasCV makes it easy!