Stable Diffusion is a machine learning model, released in 2022 by Stability AI in collaboration with the CompVis group at LMU Munich and Runway, that generates digital images from natural language descriptions. The model can also be used for image-to-image translation guided by a text prompt and for upscaling images.

Stable Diffusion was trained on a subset of the LAION-Aesthetics V2 dataset consisting of 512x512 images.[3] Critics have raised AI-ethics concerns, noting that the model can be used to create deepfakes,[4] and have questioned the legality of generating images with a model trained on copyrighted content without the consent of the original artists.[5]
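For a quick sense of the txt2img workflow, here is a minimal sketch using Hugging Face's diffusers library (the blog post linked under "reads" below covers it in depth). It assumes a recent diffusers install, a CUDA GPU, and that you have accepted the model license on the Hugging Face Hub; the prompt is just an example.

    # pip install diffusers transformers
    # may require `huggingface-cli login` after accepting the model license
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,   # half precision roughly halves VRAM use
    ).to("cuda")

    image = pipe("a photograph of an astronaut riding a horse").images[0]
    image.save("astronaut.png")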

DreamStudio.ai

Stable Diffusion on DreamStudio.ai is very fast: it takes 5-10 seconds to generate an image with the default settings. DreamStudio is a paid service running on dedicated machine-learning GPUs, with 2 EUR worth of credit for free. By comparison, on an RTX 2060 Super with 8 GB of VRAM and tensor cores, inference with the default settings takes about 15 seconds.

The Stability AI founder is an ex-hedge-fund manager, Emad Mostaque. There’s a recent video interview he did that goes into his vision: https://www.youtube.com/watch?v=YQ2QtKcK2dA

Using private funds he built a massive fleet of A100s at AWS. He considers it to be “for humanity”, yet he alone decides which applications get to use it. His plan, or at least his claim, is to move to a situation where it is more diversely funded (institutions, businesses, even the UN) and where access is decided by committee, with the main criterion being usefulness to humanity. Let’s see if he keeps his promise. AI was on a trajectory to end up solely in the hands of a handful of ultra-rich companies that can afford to train and run it, with us poor mortals at the whims of gatekeepers’ terms. This guy is on a trajectory to put AI in the hands of the people. Not just for art, for everything. If he fully sees this through, he’s destined to be a tech icon.

SD with Google Colab (free), or paid with Colab Pro or Colab Pro+

https://github.com/altryne/sd-webui-colab

SD packages for Windows

Visions of Chaos (mostly) automates the installs for dozens of ML models, including SD: https://softology.pro/tutorials/tensorflow/tensorflow.htm

https://github.com/hlky/stable-diffusion/wiki/Docker-Guide works well using WSL2 with nvidia-docker.

SD packages for macOS

https://olin.monster/pages/stable-diffusion/

optimized forks running on GPUs with < 6 GB of VRAM

https://github.com/basujindal/stable-diffusion (OptimizedSD)

https://github.com/neonsecret/stable-diffusion

Stable Diffusion will run fine with 4 GB of VRAM if you go for 448x448 instead (basically the same quality).
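If you would rather not use the forks above, similar low-VRAM tricks can be approximated in stock diffusers. A sketch, assuming a recent diffusers version; the resolution and prompt are just examples:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,      # halves VRAM use vs float32
    ).to("cuda")
    pipe.enable_attention_slicing()     # computes attention in chunks, saving VRAM

    # 448x448 instead of the default 512x512; SD likes multiples of 64
    image = pipe("a castle on a hill", height=448, width=448).images[0]
    image.save("castle.png")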

guides

https://replicate.com/blog/run-stable-diffusion-on-m1-mac
https://github.com/lstein/stable-diffusion/blob/main/README-Mac-MPS.md
https://rentry.org/GUItard
https://news.ycombinator.com/item?id=32642255
A video guide for AMD GPUs: https://www.youtube.com/watch?app=desktop&v=d_CgaHyA_n4
A guide for AMD: https://rentry.org/sdamd

quantize or bfloat

These techniques are supported on a wide range of hardware: basically all mid-range to high-end Intel CPUs since 2013, AMD compute cards from the MI5 up, ARM NEON, and NVIDIA cards since Pascal [10-series, 2016].

These techniques can speed up calculations and significantly reduce memory requirements, though results may be slightly worse.
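As a rough illustration of what these techniques do to the weights themselves, here is a sketch in plain PyTorch (a toy linear layer standing in for a real model; not specific to any SD fork):

    import torch
    from torch import nn

    layer = nn.Linear(4096, 4096)          # float32: 4 bytes per weight
    print(layer.weight.element_size())     # -> 4

    layer = layer.to(torch.bfloat16)       # bfloat16: 2 bytes, fp32-sized exponent range
    print(layer.weight.element_size())     # -> 2
    y = layer(torch.randn(1, 4096, dtype=torch.bfloat16))

    # int8 dynamic quantization: weights stored as 8-bit integers and
    # dequantized on the fly inside the matmul
    quantized = torch.quantization.quantize_dynamic(
        nn.Linear(4096, 4096), {nn.Linear}, dtype=torch.qint8
    )

For Stable Diffusion specifically, diffusers accepts torch_dtype=torch.float16 (or torch.bfloat16) in from_pretrained, which is the usual way to get the memory savings.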

See also https://github.com/basujindal/stable-diffusion/pull/103

energy consumption

Carbon emitted (power consumption x time x carbon intensity of the power grid at the training location): 11250 kg CO2 eq.

About 6 kWh of electricity per diff
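The training figure above is consistent with the formula under plausible inputs. A back-of-the-envelope check; the GPU-hours, power draw, and grid intensity below are assumptions chosen to reproduce the quoted total, not official numbers:

    # Assumed inputs (not official figures): ~150,000 A100 GPU-hours,
    # ~250 W average draw per GPU, ~0.3 kg CO2 per kWh for the grid.
    gpu_hours = 150_000
    avg_power_kw = 0.25
    grid_kg_co2_per_kwh = 0.3

    energy_kwh = gpu_hours * avg_power_kw            # 37,500 kWh
    carbon_kg = energy_kwh * grid_kg_co2_per_kwh     # 11,250 kg CO2 eq
    print(carbon_kg)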

why can’t we virtualize GPU VRAM like with virtual machines?

  Bandwidth of dual-channel DDR4-3600: 48 GB/s
  Bandwidth of PCIe 4.0 x16: 26 GB/s
  Bandwidth of 3090 GDDR6X memory: 935.8 GB/s

Since neural network evaluation is usually bandwidth-limited, it’s possible that pushing the data over PCIe from CPU to GPU is slower than doing the evaluation on the CPU alone for typical neural networks. That’s without taking into account the latency of accessing main memory through PCIe, which would make matters even worse. There is no point in running it on the GPU if the job ends up slower; just run it on the CPU at that point.
https://www.microway.com/knowledge-center-articles/performance-characteristics-of-common-transports-buses
https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces…
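To put the numbers together: assume, hypothetically, 4 GB of fp16 weights that must cross each link once per forward pass. A rough lower-bound sketch using the bandwidths listed above:

    # Hypothetical 4 GB working set; bandwidths are the figures above.
    weights_gb = 4.0

    links_gb_per_s = {
        "DDR4-3600 dual channel (CPU reads RAM)": 48.0,
        "PCIe 4.0 x16 (RAM -> VRAM every pass)": 26.0,
        "3090 GDDR6X (GPU reads VRAM)": 935.8,
    }

    for name, bw in links_gb_per_s.items():
        print(f"{name}: {weights_gb / bw * 1000:.1f} ms per pass")

Streaming the weights over PCIe every pass (~154 ms) is slower than the CPU simply reading them from RAM (~83 ms), which is the point above; only when the weights stay resident in VRAM (~4 ms) does the GPU's bandwidth advantage show.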

Projects like DeepSpeed (https://www.deepspeed.ai/) enable running a model that is larger than VRAM through various tricks, like moving weights from regular system RAM into VRAM between layers. This comes with a performance hit, though, depending on the model.
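A minimal plain-PyTorch sketch of that weight-streaming idea (a toy model; this mimics the concept, not DeepSpeed's actual implementation):

    import torch
    from torch import nn

    # Toy "model" too big for VRAM: layers live in system RAM and are
    # streamed to the GPU one at a time.
    layers = [nn.Linear(4096, 4096) for _ in range(32)]   # resident on CPU

    def forward_streamed(x: torch.Tensor) -> torch.Tensor:
        x = x.to("cuda")
        for layer in layers:
            layer.to("cuda")    # PCIe transfer: this is the performance hit
            x = layer(x)
            layer.to("cpu")     # evict to make room for the next layer
        return x

    out = forward_streamed(torch.randn(1, 4096))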

reads

https://huggingface.co/blog/stable_diffusion

https://twitter.com/_akhaliq/status/1566085920314589184