Authors: Ian Stenbit, fchollet, lukewood
Date created: 2022/09/28
Last modified: 2022/09/28
Description: Explore the latent manifold of Stable Diffusion.
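Generative image models learn a "latent manifold" of the visual world: a low-dimensional vector space where each point maps to an image. Going from such a point on the manifold back to a displayable image is called "decoding". In the Stable Diffusion model, this is handled by the "decoder" model.
This latent manifold of images is continuous and interpolative. Moving a small distance on the manifold changes the corresponding image only slightly. Any two points on the manifold can also be connected by a path whose intermediate points decode to valid images; these intermediate points are called "interpolations" between the two images.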
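Stable Diffusion isn't just an image model, though; it's also a natural language model. It has two latent spaces: the image representation space learned by the encoder used during training, and the prompt latent space, which is learned using a combination of pretraining and training-time fine-tuning.
Latent space walking, or latent space exploration, is the process of sampling a point in latent space and incrementally changing the latent representation. Its most common application is generating animations, where each sampled point is fed to the decoder and stored as a frame in the final animation. For high-quality latent representations, this produces coherent-looking animations. These animations can provide insight into the feature map of the latent space, and can ultimately lead to improvements in the training process. One such GIF is displayed below: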
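In this guide, we will show how to take advantage of the Stable Diffusion API in KerasCV to perform prompt interpolation and circular walks through Stable Diffusion's visual latent manifold, as well as through the text encoder's latent manifold.
This guide assumes the reader has a high-level understanding of Stable Diffusion. If you haven't already, you should start by reading the Stable Diffusion Tutorial.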
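The latent walking procedure described above can be sketched generically before we turn to the real model. This is a minimal illustration under stated assumptions, not the KerasCV API: `latent_walk` and `toy_decode` are hypothetical names, and `toy_decode` is a stand-in for a real decoder such as Stable Diffusion's. The walk adds a small fixed step to the latent point and decodes each point into one frame.

```python
import numpy as np


def latent_walk(start, step, num_frames, decode):
    """Walk from `start` by repeatedly adding `step`, decoding every point.

    `decode` is a hypothetical callable standing in for a real decoder;
    it maps one latent point to one frame.
    """
    frames = []
    point = np.asarray(start, dtype=float)
    for _ in range(num_frames):
        frames.append(decode(point))  # Each sampled point becomes one frame.
        point = point + step  # Small incremental change to the latent point.
    return frames


def toy_decode(z):
    # Hypothetical toy "decoder": a 4x4 grayscale image whose brightness
    # is the (clipped) sum of the 2-D latent coordinates.
    return np.clip(np.full((4, 4), z.sum()), 0.0, 1.0)


frames = latent_walk(
    np.zeros(2), np.array([0.05, 0.05]), num_frames=10, decode=toy_decode
)
```

Because the toy decoder is continuous, consecutive frames differ only slightly, which is the property that makes real latent walks look like smooth animations.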
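To start, we import KerasCV and load up a Stable Diffusion model using the optimizations discussed in the tutorial Generate images with Stable Diffusion. Note that if you are running with an M1 Mac GPU, you should not enable mixed precision.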
!pip install keras-cv --upgrade --quiet
import keras_cv
import keras
import matplotlib.pyplot as plt
from keras import ops
import numpy as np
import math
from PIL import Image
# Enable mixed precision
# (only do this if you have a recent NVIDIA GPU)
keras.mixed_precision.set_global_policy("mixed_float16")
# Instantiate the Stable Diffusion model
model = keras_cv.models.StableDiffusion(jit_compile=True)
By using this model checkpoint, you acknowledge that its usage is subject to the terms of the CreativeML Open RAIL-M license at https://raw.githubusercontent.com/CompVis/stable-diffusion/main/LICENSE
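In Stable Diffusion, a text prompt is first encoded into a vector, and that encoding is used to guide the diffusion process. The latent encoding vector has shape 77x768 (that's huge!), and when we give Stable Diffusion a text prompt, we're generating images from just one such point on the latent manifold.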
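To put "huge" in perspective, a quick count compares the text encoding with the image latent that the diffusion model denoises. The (64, 64, 4) image-latent shape is taken from the diffusion noise used later in this guide (a 512x512 image downsampled by a factor of 8, with 4 channels).

```python
import math

# One text encoding: 77 token positions x 768 dimensions each.
text_encoding_shape = (77, 768)
# Image latent for a 512x512 image, downsampled 8x, with 4 channels
# (the same shape as the diffusion noise used later in this guide).
image_latent_shape = (512 // 8, 512 // 8, 4)

text_values = math.prod(text_encoding_shape)
image_values = math.prod(image_latent_shape)

print(text_values)  # 59136
print(image_values)  # 16384
```

So a single prompt encoding carries roughly 3.6 times as many values as the image latent it conditions.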
To explore more of this manifold, we can interpolate between two text encodings and generate images at those interpolated points:
prompt_1 = "A watercolor painting of a Golden Retriever at the beach"
prompt_2 = "A still life DSLR photo of a bowl of fruit"
interpolation_steps = 5
encoding_1 = ops.squeeze(model.encode_text(prompt_1))
encoding_2 = ops.squeeze(model.encode_text(prompt_2))
interpolated_encodings = ops.linspace(encoding_1, encoding_2, interpolation_steps)
# Show the size of the latent manifold
print(f"Encoding shape: {encoding_1.shape}")
Downloading data from https://github.com/openai/CLIP/blob/main/clip/bpe_simple_vocab_16e6.txt.gz?raw=true
1356917/1356917 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://hugging-face.cn/fchollet/stable-diffusion/resolve/main/kcv_encoder.h5
492466864/492466864 ━━━━━━━━━━━━━━━━━━━━ 7s 0us/step
Encoding shape: (77, 768)
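`ops.linspace` is used here with whole tensors as endpoints: it interpolates element-wise and stacks the results along a new leading axis. The NumPy sketch below mirrors that behavior with `np.linspace` on small random stand-ins for the two (77, 768) encodings. It checks that frame i equals the blend (1 - t) * a + t * b with t = i / (steps - 1).

```python
import numpy as np

rng = np.random.default_rng(0)
# Small stand-ins for two (77, 768) text encodings.
encoding_a = rng.normal(size=(7, 8))
encoding_b = rng.normal(size=(7, 8))
steps = 5

# Element-wise linear interpolation; the result has shape (steps, 7, 8).
interpolated = np.linspace(encoding_a, encoding_b, steps)

# Equivalent explicit formula: blend the endpoints with weight t in [0, 1].
ts = np.linspace(0.0, 1.0, steps)
manual = np.stack([(1 - t) * encoding_a + t * encoding_b for t in ts])

print(interpolated.shape)  # (5, 7, 8)
print(np.allclose(interpolated, manual))  # True
```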
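After interpolating the encodings, we can generate images from each point. Note that in order to maintain some stability between the resulting images, we keep the diffusion noise constant between images.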
seed = 12345
noise = keras.random.normal((512 // 8, 512 // 8, 4), seed=seed)
images = model.generate_image(
    interpolated_encodings,
    batch_size=interpolation_steps,
    diffusion_noise=noise,
)
Downloading data from https://hugging-face.cn/fchollet/stable-diffusion/resolve/main/kcv_diffusion_model.h5
3439090152/3439090152 ━━━━━━━━━━━━━━━━━━━━ 26s 0us/step
50/50 ━━━━━━━━━━━━━━━━━━━━ 173s 311ms/step
Downloading data from https://hugging-face.cn/fchollet/stable-diffusion/resolve/main/kcv_decoder.h5
198180272/198180272 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step
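Fixing the seed is what keeps the starting noise identical for every interpolated encoding, so only the text conditioning changes from frame to frame. The sketch below uses NumPy's random generator as an analogy for `keras.random.normal(..., seed=seed)`. The actual values differ between the two libraries, but the principle is the same: the same seed gives the same noise. The `512 // 8` arithmetic reflects the 8x spatial downsampling of the image latent.

```python
import numpy as np

seed = 12345
latent_shape = (512 // 8, 512 // 8, 4)  # (64, 64, 4)

# Two generators seeded identically produce identical noise.
noise_a = np.random.default_rng(seed).normal(size=latent_shape)
noise_b = np.random.default_rng(seed).normal(size=latent_shape)
# A different seed produces different noise.
noise_c = np.random.default_rng(seed + 1).normal(size=latent_shape)

print(latent_shape)  # (64, 64, 4)
print(np.array_equal(noise_a, noise_b))  # True
print(np.array_equal(noise_a, noise_c))  # False
```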
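Now that we've generated some interpolated images, let's take a look at them!
Throughout this tutorial, we're going to export sequences of images as GIFs so that they can be easily viewed with some temporal context. For sequences of images where the first and last images don't match conceptually, we rubber-band the GIF: the frames play forward and then in reverse, so the loop has no visible jump.
If you're running in Colab, you can view your own GIFs by running: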
from IPython.display import Image as IImage
IImage("doggo-and-fruit-5.gif")
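def export_as_gif(filename, images, frames_per_second=10, rubber_band=False):
    if rubber_band:
        # Play the frames forward, then back again without repeating the
        # last frame or the first one (the GIF loops back to frame 0).
        # Build a new list so the caller's list is not modified.
        images = images + images[1:-1][::-1]
    images[0].save(
        filename,
        save_all=True,
        append_images=images[1:],
        duration=1000 // frames_per_second,  # Milliseconds per frame.
        loop=0,  # Loop forever.
    )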
export_as_gif(
    "doggo-and-fruit-5.gif",
    [Image.fromarray(img) for img in images],
    frames_per_second=2,
    rubber_band=True,
)
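Rubber-banding appends the frames in reverse, so the animation plays forward and then back and can restart without a jump. The sketch below shows one such ordering on frame indices, using a hypothetical helper `rubber_band`. The reversed half drops the last frame, which was just shown, and the first frame, which is shown again when the GIF loops.

```python
def rubber_band(frames):
    # Forward pass, then the interior frames in reverse. Returns a new
    # list and leaves `frames` unchanged.
    return frames + frames[1:-1][::-1]


print(rubber_band([0, 1, 2, 3, 4]))  # [0, 1, 2, 3, 4, 3, 2, 1]
```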
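The results may seem surprising. Generally, interpolating between prompts produces coherent-looking images, and often demonstrates a progressive concept shift between the contents of the two prompts. This is indicative of a high-quality representation space that closely mirrors the natural structure of the visual world.
To best visualize this, we should do a much more fine-grained interpolation, using hundreds of steps. To keep the batch size small (so that our GPU doesn't run out of memory), we have to batch the interpolated encodings manually.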
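The manual batching relies on `ops.split` dividing the interpolated encodings into equal chunks. That works here because 150 is divisible by 3. The plain-Python sketch below uses a hypothetical helper, `make_batches`, that also tolerates a ragged final batch. If you adapt it, you would pass each chunk's actual length as `batch_size`.

```python
def make_batches(items, batch_size):
    """Split `items` into consecutive batches of at most `batch_size`."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


frame_indices = list(range(150))
step_batches = make_batches(frame_indices, 3)
print(len(step_batches))  # 50
print(step_batches[0], step_batches[-1])  # [0, 1, 2] [147, 148, 149]

# With 10 items and a batch size of 3, the last batch holds a single item.
ragged = make_batches(list(range(10)), 3)
print([len(b) for b in ragged])  # [3, 3, 3, 1]
```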
interpolation_steps = 150
batch_size = 3
batches = interpolation_steps // batch_size
interpolated_encodings = ops.linspace(encoding_1, encoding_2, interpolation_steps)
batched_encodings = ops.split(interpolated_encodings, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=noise,
        )
    ]
export_as_gif("doggo-and-fruit-150.gif", images, rubber_band=True)
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 204ms/step
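The resulting GIF shows a much clearer and more coherent shift between the two prompts. Try out some prompts of your own and experiment!
We can even extend this concept beyond two prompts. For example, we can interpolate between four prompts: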
prompt_1 = "A watercolor painting of a Golden Retriever at the beach"
prompt_2 = "A still life DSLR photo of a bowl of fruit"
prompt_3 = "The eiffel tower in the style of starry night"
prompt_4 = "An architectural sketch of a skyscraper"
interpolation_steps = 6
batch_size = 3
batches = (interpolation_steps**2) // batch_size
encoding_1 = ops.squeeze(model.encode_text(prompt_1))
encoding_2 = ops.squeeze(model.encode_text(prompt_2))
encoding_3 = ops.squeeze(model.encode_text(prompt_3))
encoding_4 = ops.squeeze(model.encode_text(prompt_4))
interpolated_encodings = ops.linspace(
    ops.linspace(encoding_1, encoding_2, interpolation_steps),
    ops.linspace(encoding_3, encoding_4, interpolation_steps),
    interpolation_steps,
)
interpolated_encodings = ops.reshape(
    interpolated_encodings, (interpolation_steps**2, 77, 768)
)
batched_encodings = ops.split(interpolated_encodings, batches)
images = []
for batch in range(batches):
    images.append(
        model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            diffusion_noise=noise,
        )
    )
def plot_grid(images, path, grid_size, scale=2):
    fig, axs = plt.subplots(
        grid_size, grid_size, figsize=(grid_size * scale, grid_size * scale)
    )
    fig.tight_layout()
    plt.subplots_adjust(wspace=0, hspace=0)
    plt.axis("off")
    for ax in axs.flat:
        ax.axis("off")

    images = images.astype(int)
    for i in range(min(grid_size * grid_size, len(images))):
        ax = axs.flat[i]
        ax.imshow(images[i].astype("uint8"))
        ax.axis("off")

    for i in range(len(images), grid_size * grid_size):
        axs.flat[i].axis("off")
        axs.flat[i].remove()

    plt.savefig(
        fname=path,
        pad_inches=0,
        bbox_inches="tight",
        transparent=False,
        dpi=60,
    )
images = np.concatenate(images)
plot_grid(images, "4-way-interpolation.jpg", interpolation_steps)
50/50 ━━━━━━━━━━━━━━━━━━━━ 11s 210ms/step
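The nested `ops.linspace` calls amount to bilinear interpolation. The grid cell at row i and column j blends the four prompt encodings, with weights that depend on u = i / (n - 1) and v = j / (n - 1). The NumPy sketch below uses small random arrays as stand-ins for the four encodings. It checks that the nested construction matches the explicit bilinear formula and that the four corners recover the original prompts: prompt 1 top-left, prompt 2 top-right, prompt 3 bottom-left, and prompt 4 bottom-right.

```python
import numpy as np

rng = np.random.default_rng(0)
# Small stand-ins for the four (77, 768) prompt encodings.
e1, e2, e3, e4 = (rng.normal(size=(3, 4)) for _ in range(4))
n = 6

# Same construction as above: interpolate each pair, then between the pairs.
grid = np.linspace(np.linspace(e1, e2, n), np.linspace(e3, e4, n), n)

# Explicit bilinear weights: u follows the rows, v follows the columns.
w = np.linspace(0.0, 1.0, n)
explicit = np.empty_like(grid)
for i, u in enumerate(w):
    for j, v in enumerate(w):
        explicit[i, j] = (
            (1 - u) * (1 - v) * e1 + (1 - u) * v * e2 + u * (1 - v) * e3 + u * v * e4
        )

print(grid.shape)  # (6, 6, 3, 4)
print(np.allclose(grid, explicit))  # True
print(np.allclose(grid[0, 0], e1), np.allclose(grid[0, -1], e2))  # True True
print(np.allclose(grid[-1, 0], e3), np.allclose(grid[-1, -1], e4))  # True True
```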
We can also interpolate while allowing the diffusion noise to vary, by dropping the diffusion_noise parameter:
images = []
for batch in range(batches):
    images.append(model.generate_image(batched_encodings[batch], batch_size=batch_size))
images = np.concatenate(images)
plot_grid(images, "4-way-interpolation-varying-noise.jpg", interpolation_steps)
50/50 ━━━━━━━━━━━━━━━━━━━━ 11s 213ms/step
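Next up, let's go for some walks!
Our next experiment will be to go for a walk around the latent manifold, starting from a point produced by a particular prompt.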
walk_steps = 150
batch_size = 3
batches = walk_steps // batch_size
step_size = 0.005
encoding = ops.squeeze(
    model.encode_text("The Eiffel Tower in the style of starry night")
)

# Note that (77, 768) is the shape of the text encoding.
delta = ops.ones_like(encoding) * step_size

walked_encodings = []
for step_index in range(walk_steps):
    walked_encodings.append(encoding)
    # Rebind instead of using `+=`: on backends with mutable tensors (such as
    # PyTorch), an in-place update would also change the tensors already
    # stored in the list.
    encoding = encoding + delta
walked_encodings = ops.stack(walked_encodings)
batched_encodings = ops.split(walked_encodings, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            batched_encodings[batch],
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=noise,
        )
    ]
export_as_gif("eiffel-tower-starry-night.gif", images, rubber_band=True)
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 217ms/step
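Perhaps unsurprisingly, walking too far from the encoder's latent manifold produces images that look incoherent. Try it for yourself by setting your own prompt and adjusting step_size to increase or decrease the magnitude of the walk. Note that when the magnitude of the walk gets large, the walk often leads into regions that produce extremely noisy images.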
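The loop adds the same constant `delta` at every step, so the walk is a straight line along the all-ones direction, and frame k is simply `encoding + k * step_size`. The vectorized NumPy sketch below uses a small random array as a stand-in for the real (77, 768) encoding. It shows that equivalence and how far the last frame drifts. The drift grows linearly with both `step_size` and the number of steps, which is why large values push the walk into noisy regions.

```python
import numpy as np

walk_steps = 150
step_size = 0.005
encoding = np.random.default_rng(0).normal(size=(7, 8))  # Stand-in encoding.

# Loop version, mirroring the walk above (rebinding, not updating in place).
walked_loop = []
current = encoding.copy()
for _ in range(walk_steps):
    walked_loop.append(current)
    current = current + step_size
walked_loop = np.stack(walked_loop)

# Vectorized version: frame k is offset by k * step_size in every element.
offsets = np.arange(walk_steps).reshape(-1, 1, 1) * step_size
walked_vec = encoding[None] + offsets

print(np.allclose(walked_loop, walked_vec))  # True
# Per-element drift of the last frame: (walk_steps - 1) * step_size.
print(round(float(walked_vec[-1, 0, 0] - encoding[0, 0]), 3))  # 0.745
```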
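For our final experiment, we'll stick to one prompt and explore the variety of images that the diffusion model can produce from that prompt. We do this by controlling the noise that is used to seed the diffusion process.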
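We create two noise components, x and y, and walk an angle from 0 to 2π. At each step, we weight the x component by the cosine of the angle and the y component by its sine, then add them to produce the noise. With this approach, the walk ends at the same noise input where it began, so we get a "loopable" result!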
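Two properties make this cosine/sine mix a good choice. First, if x and y are independent standard normal noise, then cos(θ)·x + sin(θ)·y is also standard normal for every θ, because cos²θ + sin²θ = 1. Every frame therefore starts from noise with the statistics the model expects. Second, θ = 0 and θ = 2π give the same point, so the walk closes on itself. The NumPy sketch below is an illustration, not the KerasCV code. It checks both properties on (64, 64, 4) noise and also shows that `endpoint=False` avoids repeating the first frame at the end of the loop.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (64, 64, 4)
x = rng.normal(size=shape)
y = rng.normal(size=shape)

walk_steps = 150
# endpoint=True (the default, as in the code below) reaches exactly 2*pi,
# so the first and last frames coincide.
thetas = np.linspace(0.0, 2.0, walk_steps) * np.pi
noise = np.stack([np.cos(t) * x + np.sin(t) * y for t in thetas])

print(noise.shape)  # (150, 64, 64, 4)
print(np.allclose(noise[0], noise[-1]))  # True: the walk loops
# Each frame keeps roughly unit variance, like fresh Gaussian noise.
print(all(abs(frame.var() - 1.0) < 0.1 for frame in noise))  # True

# endpoint=False stops one step short of 2*pi, so the looping GIF does not
# show the same frame twice in a row at the seam.
thetas_open = np.linspace(0.0, 2.0, walk_steps, endpoint=False) * np.pi
print(np.isclose(thetas_open[-1], 2 * np.pi))  # False
```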
prompt = "An oil paintings of cows in a field next to a windmill in Holland"
encoding = ops.squeeze(model.encode_text(prompt))
walk_steps = 150
batch_size = 3
batches = walk_steps // batch_size
walk_noise_x = keras.random.normal(noise.shape, dtype="float64")
walk_noise_y = keras.random.normal(noise.shape, dtype="float64")
walk_scale_x = ops.cos(ops.linspace(0, 2, walk_steps) * math.pi)
walk_scale_y = ops.sin(ops.linspace(0, 2, walk_steps) * math.pi)
noise_x = ops.tensordot(walk_scale_x, walk_noise_x, axes=0)
noise_y = ops.tensordot(walk_scale_y, walk_noise_y, axes=0)
noise = ops.add(noise_x, noise_y)
batched_noise = ops.split(noise, batches)
images = []
for batch in range(batches):
    images += [
        Image.fromarray(img)
        for img in model.generate_image(
            encoding,
            batch_size=batch_size,
            num_steps=25,
            diffusion_noise=batched_noise[batch],
        )
    ]
export_as_gif("cows.gif", images)
25/25 ━━━━━━━━━━━━━━━━━━━━ 5s 214ms/step
Experiment with your own prompts and with different values of unconditional_guidance_scale!
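`unconditional_guidance_scale` controls classifier-free guidance. As we understand KerasCV's implementation, at every diffusion step the model makes one prediction conditioned on the prompt and one on an empty (unconditional) context. It then extrapolates from the unconditional prediction toward the conditional one. The NumPy sketch below shows that combination step; the function name `apply_guidance` is hypothetical.

```python
import numpy as np


def apply_guidance(conditional, unconditional, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward, and for scales above 1 past, the conditional prediction."""
    return unconditional + guidance_scale * (conditional - unconditional)


cond = np.array([1.0, 2.0])
uncond = np.array([0.5, 0.5])

print(apply_guidance(cond, uncond, 0.0).tolist())  # [0.5, 0.5]: prompt ignored
print(apply_guidance(cond, uncond, 1.0).tolist())  # [1.0, 2.0]: plain conditional
print(apply_guidance(cond, uncond, 7.5).tolist())  # [4.25, 11.75]: strong guidance
```

In KerasCV, the scale is a keyword argument, for example `model.generate_image(encoding, batch_size=3, unconditional_guidance_scale=12.0)`. Higher values tend to follow the prompt more literally, often at the cost of variety.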
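Stable Diffusion offers a lot more than just single text-to-image generation. Exploring the latent manifold of the text encoder and the noise space of the diffusion model are two fun ways to experience the power of this model, and KerasCV makes it easy!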