A handy helper for upscaling and multi-prompt images: the MultiDiffusion extension - Stable Diffusion

2023/04/29, reading time about 40 minutes

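These notes were written using the Vlad Diffusion interface (hereafter Vlad), but they also apply to Automatic1111 (hereafter A1111).

*Please note: only Vlad Diffusion can raise the resolution without a limit through Tiled Diffusion. A1111 has a hard cap; for example, at a 2:1 aspect ratio the maximum is 4000x2000.

MultiDiffusion with Tiled VAE (also called Tiled Diffusion with Tiled VAE, or multidiffusion-upscaler-for-automatic1111) is an extension that is built into Vlad and can be downloaded for A1111. Once installed in A1111, its interface is identical to the one shown in Vlad.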

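To install the extension in A1111:

1. Go to the Extensions tab → Available → click "Load from":

2. Find "MultiDiffusion with Tiled VAE" in the list of extensions that appears:

3. Click "Install" at the far right of that row. Installation is complete once the dimmed page returns to normal;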

4. Switch to the Installed sub-tab of the Extensions tab → click "Apply and restart UI":

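5. The new panels may still be missing after the UI restarts. If so, reload the UI page manually, for example by pressing F5, or by typing the UI address into the browser's address bar and pressing Enter to force a reload:

Pressing Enter in the address bar to reload the UI

6. MultiDiffusion with Tiled VAE appears as two collapsible panels, "Tiled Diffusion" and "Tiled VAE", in Text2Image and Image2Image (From Text and From Image in Vlad):

7. With both panels expanded, they look like this: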

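The "Noise Inversion" section in the middle of the screenshot above only exists in Image2Image (From Image).

"Noise Inversion" and the "Region Prompt Control" section below it are the focus of this article.

First, though, here are some other settings you may need, along with problems you may run into and how to solve them. After that we can concentrate on those two features.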

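Settings for optimizing upscaling:

1. CUDA out of memory error
For users with an NVIDIA card and CUDA installed, "CUDA out of memory" is one of the most common errors when upscaling. Adding the following line to webui-user.bat (A1111) or webui.bat (Vlad) helps raise the maximum resolution you can reach. On my RTX 3060 12GB (using Vlad) with Tiled Diffusion and Tiled VAE, From Image (Image2Image) at least can go straight past 8000x8000: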

set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512

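After editing and saving webui-user.bat or webui.bat, you need to restart the program.

Put the line directly below @echo off. The blank line in between is not actually needed; it is just my habit.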
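
To make the format of that value easier to read: it is a comma-separated list of option:value pairs. The sketch below rebuilds the same string in Python. `build_alloc_conf` is a hypothetical helper named only for this example, and setting the variable from Python (instead of the .bat file) is an assumption that only works if it happens before PyTorch initializes CUDA.

```python
import os

def build_alloc_conf(options):
    """Join allocator options into the 'key:value,key:value' form used by PYTORCH_CUDA_ALLOC_CONF."""
    return ",".join(f"{key}:{value}" for key, value in options.items())

# Same two options as the "set" line above.
alloc_conf = build_alloc_conf({
    "garbage_collection_threshold": 0.9,
    "max_split_size_mb": 512,
})

# Hypothetical usage: this must run before PyTorch touches CUDA to have any effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf
print(alloc_conf)
```

For normal use, the single line in the .bat file is all you need; the helper is only meant to show how the pieces fit together.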

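2. NaN VAE values error
The NaN VAE error is rare at low resolutions, but on my system it shows up very often once the resolution reaches 2048 or more. One known trick is to put set COMMANDLINE_ARGS= --disable-nan-check in A1111's webui-user.bat, or to use the Vlad setting shown below:

However, this option only ignores the problem; it does not solve it. You may wait a long time only to get an entirely black image, an image with black regions, or, when using Noise Inversion, an unexpected result like this: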

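Solution:
I. Vlad: in Settings → CUDA, tick the option shown below, click "Apply Settings", then close Vlad's command prompt window and restart the program;

II. A1111: add --no-half-vae to the set COMMANDLINE_ARGS= line in webui-user.bat, as in the screenshot below:

set COMMANDLINE_ARGS= can take multiple options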

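3. Vlad only
In Vlad's Settings → Stable Diffusion, you can tick the options in the red boxes below to improve performance and reduce VRAM usage:

For the second red box from the bottom, tick the left option when using xFormers and the right option when using Scaled-Dot-Product.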

4. Deep Negative

For the negative prompt, I added one more Textual Inversion embedding. Download it from this CIVITAI page.

Put the downloaded file in:
Vlad\automatic\models\embeddings
A1111\webui\embeddings

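To use it, select it from the location shown below, or type ng_deepnegative_v1_75t directly into the negative prompt.

All set! Now on to the main topic.

A quick pass over the other options

MultiDiffusion has many options, so let's first skim the ones that are not the focus of this article:

I. For Tiled Diffusion, the options in the red box below generally do not need to be changed:

The options in the red boxes below are: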
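1. Enable - must be ticked whenever the feature is used (including Noise Inversion)
2. Keep input image size - I usually tick this when using Image2Image (From Image) (*)
3. Method - either one works
4. Free GPU - clicking it prints the available VRAM in the command prompt window
5. Upscaler - the upscaling algorithm to use (**)
6. Scale Factor - the upscaling ratio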

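*Even though width/height in Image2Image are capped at 2048, this option forces the original image's resolution to be used while Tiled Diffusion is active.

**Lately I often use 4x-UltraSharp.pth as the upscaler; download it from this page. I use the "beta" version in that page's "Beta" folder because its modification time is a few minutes newer than the one on the main page, though I suspect it is the same file under a different name.
After downloading, put 4x-UltraSharp.pth in: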
Vlad\automatic\models\ESRGAN
A1111\webui\models\ESRGAN

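II. For Tiled VAE, you can basically tick all the options in the red boxes below:

Leaving them all unticked also works, but processing will be somewhat slower.

For the pink (purple) box, the extension author says that if you hit CUDA out of memory you can halve the value, for example 2048→1024. I usually set it to 1024 in advance.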
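
The halving advice can be written down as a short list of values to try, one after each out-of-memory failure. This is only a sketch of the rule of thumb: `fallback_tile_sizes` is a hypothetical helper, and the lower bound of 256 is my own assumption rather than a limit taken from the extension.

```python
def fallback_tile_sizes(start, minimum=256):
    """Return the tile sizes to try in order, halving after each out-of-memory failure.

    The default minimum of 256 is an assumed stopping point for illustration.
    """
    sizes = []
    size = start
    while size >= minimum:
        sizes.append(size)
        size //= 2
    return sizes

print(fallback_tile_sizes(2048))
```

Starting from 1024, as I usually do, simply skips the first attempt in that sequence.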

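Thanks for your patience! Now for the main event.

I. Region Prompt Control

Like the splitting features in some other extensions, it can divide an image into up to 8 regions, each computed with a different prompt. For example, the images below:

The three images above all use the same settings and prompts, as follows: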

Model: cyriousmix_14
Clip skip: 2
Resolution: 768x512
Sampling steps: 20
Sampling Method: DPM++ 2S a Karras
CFG scale: 7
Seed: 516615266

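The main rule when using Region Prompt Control: the base positive/negative prompts at the top should contain only prompts that apply to the whole image, because they are applied to every Region. Prompts for the individual areas go in the corresponding Region. The prompts for the three example images are:

Main prompts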
+
masterpiece, best quality, ultra-detailed, realistic, extremely clear, dramatic lighting, ruin, dusk, vegetation, vines, 
-
bad-hands-5, ng_deepnegative_v1_75t, (low quality, worst quality:1.4), (monochrome:1.1), (greyscale), (watermark), (text), blurry, jpeg artifacts, cropped, normal quality, signature, username, artist name, cartoon, canvas frame, lowres, (disfigured), (bad art), (deformed), (extra limbs), (b&w), weird colors, (duplicate), (morbid), (mutilated), mutated hands, (poorly drawn hands), (poorly drawn face), (mutation), (ugly),  (bad proportions), cloned face, out of frame, gross proportions, (malformed limbs), (missing arms), (missing legs), (extra arms), (extra legs), fused fingers, (long neck), lowres, (grayscale), (skin spots), acnes, skin blemishes, (age spot),

Region 1 - since this is a character, Type is set to Foreground. Feather is the amount of transparency softening at the edges.

Region 1
+
1girl, <lora:2bNierAutomataLora_v2b:1>,  yorha no. 2 type b, black dress, black gloves, black hairband, black jacket, bob cut, black eye band, juliet sleeves, long sleeves, a single mole under mouth, puffy sleeves, short hair, white hair, looking at viewer, upper body, facing left,
-
yor briar,
Seed: 2262721093
*Each Region has its own Seed!

*Download the 2B LoRA from this CIVITAI page.

Region 2 is also Foreground. x, y, w, and h are positions relative to the whole image.

Region 2
+
1girl, <lora:yorBriarSpyXFamily_v10:1>, Yor Briar, head band, black dress, daggers, long hair, black hair, hairband, bare shoulders, cool posture, looking at viewer, smile, half body, upper body, facing right,
-
yorha no. 2 type b,
Seed: 569603253

*Download the Yor LoRA from this CIVITAI page.

Region 3 - set to Background. Here I set x, y, w, and h so that it fills the whole image.

Region 3
+
stone wall
-
(left empty)
Seed: 3310627624
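
To picture how the main prompts and the Region prompts relate, here is a minimal sketch that assumes each Region prompt is simply appended to the main prompt. How the extension merges them internally is not shown in this article, so treat `combine_prompts` as a hypothetical illustration of the rule "main prompts apply to every Region", not as the extension's code.

```python
def combine_prompts(main_prompt, region_prompt):
    """Append a region prompt to the main prompt, tidying stray commas and spaces."""
    parts = [p.strip().strip(",").strip() for p in (main_prompt, region_prompt)]
    return ", ".join(p for p in parts if p)

main_positive = "masterpiece, best quality, ruin, dusk, vegetation, vines, "  # shortened
region_3_positive = "stone wall"

# What Region 3 would effectively be asked to draw under the append assumption.
print(combine_prompts(main_positive, region_3_positive))
# An empty Region negative prompt leaves only the main one.
print(combine_prompts("(low quality, worst quality:1.4), blurry,", ""))
```

This is also why character-specific tags belong in the Regions: anything placed in the main prompts would reach the stone wall as well.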

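Expanding the Region Prompt Control panel shows regions numbered 1 to 8. The example images use three: the two characters are set to "Foreground" and the stone wall to "Background". (Note: remember to tick Enable for both Tiled Diffusion and Region Control.)

The upload area above Region 1 must contain an image first; otherwise you cannot tick a Region's "Enable Region #". The image only serves to display the Region boxes. It does not need to be the same size as the real image and can even be blank, but an image with the same aspect ratio makes the boxes easier to judge. With the example Regions set up, their extents on the reference image look like this:

You can adjust a Region's position and size with the mouse on the reference image. However, higher-numbered Regions are drawn on top of lower-numbered ones, so where the two overlap you cannot adjust the lower-numbered Region with the mouse.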
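
Since x, y, w, and h are fractions of the whole image, it can help to see them as pixels. The sketch below converts the three Regions (values taken from the settings listed at the end of this article) into pixel boxes on the 768x512 example. `region_to_pixels` is a hypothetical helper, and rounding to the nearest pixel is my assumption; the extension may round differently.

```python
def region_to_pixels(x, y, w, h, image_width, image_height):
    """Convert relative region values (0..1) into a pixel box (left, top, width, height)."""
    return (
        round(x * image_width),
        round(y * image_height),
        round(w * image_width),
        round(h * image_height),
    )

# Relative x, y, w, h of each Region.
regions = {
    "Region 1": (0, 0.2208, 0.5139, 0.775),
    "Region 2": (0.525, 0.2167, 0.475, 0.7833),
    "Region 3": (0, 0, 1, 1),
}

for name, rel in regions.items():
    print(name, region_to_pixels(*rel, 768, 512))
```

Region 3 comes out as the full 768x512 frame, which matches the "fills the whole image" background setting.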

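Region Prompt does have one limitation: even with completely identical settings, it can still produce very different images, which is why the three example images above look so different despite sharing the same settings. Even so, being able to place a character or object in a chosen part of the image is very useful.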

II. Noise Inversion

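When upscaling with Image2Image (From Image), the most common problem is that a low Denoising strength gives too little detail, while a high Denoising strength can drift too far from the original image.

"Noise Inversion" is meant to solve this dilemma. It works by tracing the low-resolution original back to the noise that Stable Diffusion uses as its starting point, so that a good deal of the original content, such as a character's facial features, survives even at a high Denoising strength. In practice it looks like this:

Analysis starts

Analysis continues

Analysis finishes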
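
For readers curious about the idea of "tracing an image back to noise", here is a toy sketch of one common formulation, a deterministic DDIM-style update run in both directions on a single number. Whether the extension uses exactly this formulation is not stated in this article, so this is an assumption for illustration only. The schedule `alphas` and the constant `toy_noise_predictor` are made up; because the predictor is constant, the round trip is exact, whereas a real model's prediction depends on the image and the step, which makes real inversion approximate.

```python
import math

def ddim_move(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM-style update from noise level alpha_from to alpha_to.

    alpha values are cumulative signal fractions: 1.0 is a clean image,
    values near 0 are almost pure noise.
    """
    x0_estimate = (x - math.sqrt(1 - alpha_from) * eps) / math.sqrt(alpha_from)
    return math.sqrt(alpha_to) * x0_estimate + math.sqrt(1 - alpha_to) * eps

def toy_noise_predictor(x, alpha):
    """Stand-in for the real model: always predicts the same noise value."""
    return 0.3

alphas = [1.0, 0.8, 0.5, 0.2]  # hypothetical schedule, clean -> noisy
pixel = 0.75                   # one "pixel" of the original image

# Inversion: walk from the clean image toward noise.
x = pixel
for a_from, a_to in zip(alphas, alphas[1:]):
    x = ddim_move(x, toy_noise_predictor(x, a_from), a_from, a_to)
inverted = x

# Sampling: walk back from the inverted noise toward a clean image.
reversed_alphas = alphas[::-1]
for a_from, a_to in zip(reversed_alphas, reversed_alphas[1:]):
    x = ddim_move(x, toy_noise_predictor(x, a_from), a_from, a_to)
recovered = x

print(inverted, recovered)
```

The point of the toy is only that the starting noise is derived from the original image rather than drawn at random, which is why so much of the original can survive the later denoising.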

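Below I changed a few settings and cropped Yor's face for comparison:
(DS = Denoising strength / IS = Inversion steps)

Comparison of Yor's face

You can see that even at Denoising Strength = 0.5, where changes are usually obvious, much of the original structure is preserved.

The "IS" in the image above stands for Inversion Steps, one of the Noise Inversion parameters: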

Inversion steps

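After playing with each of the Noise Inversion values, my impression is that only Inversion steps has a clearly visible effect: roughly, the higher the value, the more of the original structure is kept. The difference is not obvious on Yor's face above, so below is a small crop of the background from the same four images, where it is clear:

Comparison of the background pillar

The original content retained at DS0.5/IS20 (bottom right) falls roughly between DS0.3/IS10 (top right) and DS0.5/IS10 (bottom left).

After trying more values, my feeling is that Noise Inversion preserves the original structure in a "coarser" way than a low Denoising Strength does. In other words, when the original has details you are not happy with, you can try a high Denoising Strength together with a high Inversion steps value, for example DS0.5 / IS50. The high Inversion steps value keeps the main structure of the original, while the higher Denoising strength changes the unsatisfactory details (this can be combined with Inpaint). Note, however, that the extension author recommends keeping Denoising strength at or below 0.6.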
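
The one hard number in that advice, keeping Denoising strength at or below 0.6, is easy to turn into a small sanity check before starting a long render. `check_noise_inversion` is a hypothetical helper written for this article; it only encodes that single recommendation and nothing else about the extension.

```python
def check_noise_inversion(denoising_strength, inversion_steps, max_strength=0.6):
    """Return a list of warnings for a Denoising strength / Inversion steps pair."""
    warnings = []
    if denoising_strength > max_strength:
        warnings.append(
            f"DS{denoising_strength} / IS{inversion_steps}: Denoising strength is above "
            f"the suggested maximum of {max_strength}"
        )
    return warnings

print(check_noise_inversion(0.5, 50))   # the DS0.5 / IS50 combination suggested above
print(check_noise_inversion(0.75, 10))  # a value past the suggested limit
```

An empty list means the pair stays within the suggested range; it says nothing about whether the result will look good.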

Happy generating!

Shared render 1

Shared render 2

Shared render 3

Settings for shared render 1:

masterpiece, best quality, ultra-detailed, realistic, dramatic lighting, ruin, dusk, vegetation, vines,
Negative prompt: bad-hands-5, ng_deepnegative_v1_75t, (low quality, worst quality:1.4), (monochrome:1.1), (greyscale), (watermark), (text), blurry, jpeg artifacts, cropped, normal quality, signature, username, artist name, cartoon, canvas frame, lowres, (disfigured), (bad art), (deformed), (extra limbs), (b&w), weird colors, (duplicate), (morbid), (mutilated), mutated hands, (poorly drawn hands), (poorly drawn face), (mutation), (ugly),  (bad proportions), cloned face, out of frame, gross proportions, (malformed limbs), (missing arms), (missing legs), (extra arms), (extra legs), fused fingers, (long neck), lowres, (grayscale), (skin spots), acnes, skin blemishes, (age spot),
Steps: 20, Sampler: Euler, CFG scale: 6, Seed: 516615266, Size: 3379x2252, Model hash: 76385703e8, Model: sweetMix_v15, Denoising strength: 0.4, Clip skip: 2, Token merging ratio: 0.3, Token merging ratio hr: 0.3, Tiled Diffusion upscaler: 4x-UltraScale_V0.5 BETA, Tiled Diffusion scale factor: 2.2, Tiled Diffusion: "{'Method': 'MultiDiffusion', 'Latent tile width': 128, 'Latent tile height': 128, 'Overlap': 16, 'Tile batch size': 1, 'Upscaler': '4x-UltraScale_V0.5 BETA', 'Scale factor': 2.2, 'Keep input size': True, 'Noise inverse': True, 'Steps': 30, 'Retouch': 1, 'Renoise strength': 1, 'Kernel size': 64, 'Region control': {'Region 1': {'enable': True, 'x': 0, 'y': 0.2208, 'w': 0.5139, 'h': 0.775, 'prompt': '1girl, <lora:2bNierAutomataLora_v2b:1>,  yorha no. 2 type b, black dress, black gloves, black hairband, black jacket, bob cut, black eye band, juliet sleeves, long sleeves, a single mole under mouth, puffy sleeves, short hair, white hair, looking at viewer, upper body, facing left,', 'neg_prompt': 'yor briar,', 'blend_mode': 'Foreground', 'feather_ratio': 0.2, 'seed': 2262721093}, 'Region 2': {'enable': True, 'x': 0.525, 'y': 0.2167, 'w': 0.475, 'h': 0.7833, 'prompt': '1girl, <lora:yorBriarSpyXFamily_v10:1>, Yor Briar, head band, black dress, daggers, long hair, black hair, hairband, bare shoulders, cool posture, looking at viewer, smile, half body, upper body, facing right,', 'neg_prompt': 'yorha no. 2 type b,', 'blend_mode': 'Foreground', 'feather_ratio': 0.2, 'seed': 569603253}, 'Region 3': {'enable': True, 'x': 0, 'y': 0, 'w': 1, 'h': 1, 'prompt': 'stone wall', 'neg_prompt': '', 'blend_mode': 'Background', 'feather_ratio': 0.2, 'seed': 3310627624}}}"

Settings for shared render 2:

masterpiece, best quality, ultra-detailed, realistic, dramatic lighting, ruin, dusk, vegetation, vines,
Negative prompt: bad-hands-5, ng_deepnegative_v1_75t, (low quality, worst quality:1.4), (monochrome:1.1), (greyscale), (watermark), (text), blurry, jpeg artifacts, cropped, normal quality, signature, username, artist name, cartoon, canvas frame, lowres, (disfigured), (bad art), (deformed), (extra limbs), (b&w), weird colors, (duplicate), (morbid), (mutilated), mutated hands, (poorly drawn hands), (poorly drawn face), (mutation), (ugly),  (bad proportions), cloned face, out of frame, gross proportions, (malformed limbs), (missing arms), (missing legs), (extra arms), (extra legs), fused fingers, (long neck), lowres, (grayscale), (skin spots), acnes, skin blemishes, (age spot),
Steps: 20, Sampler: Euler, CFG scale: 6, Seed: 516615266, Size: 3379x2252, Model hash: 3177a3a2a0, Model: salutemix_v1, Denoising strength: 0.4, Clip skip: 2, Token merging ratio: 0.3, Token merging ratio hr: 0.3, Tiled Diffusion upscaler: 4x-UltraScale_V0.5 BETA, Tiled Diffusion scale factor: 2.2, Tiled Diffusion: "{'Method': 'MultiDiffusion', 'Latent tile width': 128, 'Latent tile height': 128, 'Overlap': 16, 'Tile batch size': 1, 'Upscaler': '4x-UltraScale_V0.5 BETA', 'Scale factor': 2.2, 'Keep input size': True, 'Noise inverse': True, 'Steps': 30, 'Retouch': 1, 'Renoise strength': 1, 'Kernel size': 64, 'Region control': {'Region 1': {'enable': True, 'x': 0, 'y': 0.2208, 'w': 0.5139, 'h': 0.775, 'prompt': '1girl, <lora:2bNierAutomataLora_v2b:1>,  yorha no. 2 type b, black dress, black gloves, black hairband, black jacket, bob cut, black eye band, juliet sleeves, long sleeves, a single mole under mouth, puffy sleeves, short hair, white hair, looking at viewer, upper body, facing left,', 'neg_prompt': 'yor briar,', 'blend_mode': 'Foreground', 'feather_ratio': 0.2, 'seed': 2262721093}, 'Region 2': {'enable': True, 'x': 0.525, 'y': 0.2167, 'w': 0.475, 'h': 0.7833, 'prompt': '1girl, <lora:yorBriarSpyXFamily_v10:1>, Yor Briar, head band, black dress, daggers, long hair, black hair, hairband, bare shoulders, cool posture, looking at viewer, smile, half body, upper body, facing right,', 'neg_prompt': 'yorha no. 2 type b,', 'blend_mode': 'Foreground', 'feather_ratio': 0.2, 'seed': 569603253}, 'Region 3': {'enable': True, 'x': 0, 'y': 0, 'w': 1, 'h': 1, 'prompt': 'stone wall', 'neg_prompt': '', 'blend_mode': 'Background', 'feather_ratio': 0.2, 'seed': 3310627624}}}"

Settings for shared render 3:

masterpiece, best quality, ultra-detailed, realistic, dramatic lighting, ruin, dusk, vegetation, vines,
Negative prompt: bad-hands-5, ng_deepnegative_v1_75t, (low quality, worst quality:1.4), (monochrome:1.1), (greyscale), (watermark), (text), blurry, jpeg artifacts, cropped, normal quality, signature, username, artist name, cartoon, canvas frame, lowres, (disfigured), (bad art), (deformed), (extra limbs), (b&w), weird colors, (duplicate), (morbid), (mutilated), mutated hands, (poorly drawn hands), (poorly drawn face), (mutation), (ugly),  (bad proportions), cloned face, out of frame, gross proportions, (malformed limbs), (missing arms), (missing legs), (extra arms), (extra legs), fused fingers, (long neck), lowres, (grayscale), (skin spots), acnes, skin blemishes, (age spot),
Steps: 20, Sampler: Euler, CFG scale: 6, Seed: 516615266, Size: 3379x2252, Model hash: 3177a3a2a0, Model: salutemix_v1, Denoising strength: 0.5, Clip skip: 2, Token merging ratio: 0.3, Token merging ratio hr: 0.3, Tiled Diffusion upscaler: 4x-UltraScale_V0.5 BETA, Tiled Diffusion scale factor: 2.2, Tiled Diffusion: "{'Method': 'MultiDiffusion', 'Latent tile width': 128, 'Latent tile height': 128, 'Overlap': 16, 'Tile batch size': 1, 'Upscaler': '4x-UltraScale_V0.5 BETA', 'Scale factor': 2.2, 'Keep input size': True, 'Noise inverse': True, 'Steps': 15, 'Retouch': 1, 'Renoise strength': 1, 'Kernel size': 64, 'Region control': {'Region 1': {'enable': True, 'x': 0, 'y': 0.2208, 'w': 0.5139, 'h': 0.775, 'prompt': '1girl, <lora:2bNierAutomataLora_v2b:1>,  yorha no. 2 type b, black dress, black gloves, black hairband, black jacket, bob cut, black eye band, juliet sleeves, long sleeves, a single mole under mouth, puffy sleeves, short hair, white hair, looking at viewer, upper body, facing left,', 'neg_prompt': 'yor briar,', 'blend_mode': 'Foreground', 'feather_ratio': 0.2, 'seed': 2262721093}, 'Region 2': {'enable': True, 'x': 0.525, 'y': 0.2167, 'w': 0.475, 'h': 0.7833, 'prompt': '1girl, <lora:yorBriarSpyXFamily_v10:1>, Yor Briar, head band, black dress, daggers, long hair, black hair, hairband, bare shoulders, cool posture, looking at viewer, smile, half body, upper body, facing right,', 'neg_prompt': 'yorha no. 2 type b,', 'blend_mode': 'Foreground', 'feather_ratio': 0.2, 'seed': 569603253}, 'Region 3': {'enable': True, 'x': 0, 'y': 0, 'w': 1, 'h': 1, 'prompt': 'stone wall', 'neg_prompt': '', 'blend_mode': 'Background', 'feather_ratio': 0.2, 'seed': 3310627624}}}"
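
The "Tiled Diffusion" value in these settings looks like the text form of a Python dictionary (single-quoted keys, True instead of true). Assuming that holds, it can be read back with the standard library's `ast.literal_eval`, which makes it easier to compare Region values between renders. The string below is a shortened copy of the value above, trimmed to a few keys and to Region 3.

```python
import ast

# Shortened copy of the "Tiled Diffusion" value from the settings above.
raw = (
    "{'Method': 'MultiDiffusion', 'Latent tile width': 128, 'Overlap': 16, "
    "'Scale factor': 2.2, 'Noise inverse': True, 'Steps': 30, "
    "'Region control': {'Region 3': {'enable': True, 'x': 0, 'y': 0, 'w': 1, 'h': 1, "
    "'prompt': 'stone wall', 'neg_prompt': '', 'blend_mode': 'Background', "
    "'feather_ratio': 0.2, 'seed': 3310627624}}}"
)

# literal_eval only accepts Python literals, so it does not execute code from the string.
settings = ast.literal_eval(raw)

for name, region in settings["Region control"].items():
    print(name, region["blend_mode"], region["prompt"], region["seed"])
```

Comparing the parsed values this way shows, for instance, that the third render differs from the second in Denoising strength (0.5 vs 0.4) and in the Noise Inversion 'Steps' value (15 vs 30).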

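*Note 2, 2023/4/30:

The MultiDiffusion extension author has fixed the bug. If you applied the temporary workaround from Note 1, you may see an extension update error when starting Vlad.
Fix:

Vlad: go to Vlad\automatic\extensions-builtin\, delete the multidiffusion-upscaler-for-automatic1111 folder, then run webui.bat again.
A1111: go to A1111\webui\extensions\, delete the multidiffusion-upscaler-for-automatic1111 folder, then reinstall the extension manually.

The whole extension is only 300KB, so reinstalling is quick!

*Note 1, 2023/4/30:

After updating the MultiDiffusion extension today, using Tiled VAE produces an error like the following in the command prompt window: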

RuntimeError: The expanded size of the tensor (3) must match the existing size (8) at non-singleton dimension 1. Target sizes: [1, 3, 88, 96]. Tensor sizes: [8, 88, 96]

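Until the developer fixes it, download the vae_optimize.py file from this link and use it to replace the file of the same name in the following locations: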

A1111\webui\extensions\multidiffusion-upscaler-for-automatic1111\scripts
Vlad\automatic\extensions-builtin\multidiffusion-upscaler-for-automatic1111\scripts

If you do not see the problem above, the developer has already released the fix.
