Stable Diffusion Tutorial (Deprecated)
Three days ago Stable Diffusion was publicly released and today I am bringing to you an easy way of using the model without the need of having any kind of extra hardware, just your laptop and wifi connection. At the end I will also leave a script in case you do have some extra hardware and want to put that RTX 3080 to good use.
In case any of you didn’t know what Stable Diffusion is, it is similar to DALLE·2. It is a diffusion model able to create images from text. For example, for the prompt “Amazing, complex, intricate and highly detailed treehouse in a snow covered bonsai tree on top of a table, steampunk, vibrant colors, vibrant, beautiful, contrast, neon highlights, Highly detailed, ray tracing, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by Beeple, Mike Winklemann, 8k” you get this amazing result.
If you want to be able to produce this astonishing images with just a few words and 20 seconds of computation continue reading. In the internet you may find many tutorials for using this model on Colab. However, the Colab notebooks sometimes don’t give you access to GPU resources and so you may take several minutes to generate one single image. To avoid that we are goint to run Stable Diffusion in Kaggle, their servers provide you with 30 weekly hours of GPU computation, which roughly translates to 5000 generated images, more than neccessary to satisfy your needs.
The first step you need to do is to create a Kaggle and HuggingFace account. The Kaggle account is to have access to GPUs as I said before, and the HuggingFace account is to have access to the Stable Diffusion model. I’ll go step by step. Let’s create the HuggingFace account. Go to https://huggingface.co/.
At the top right click on Sign Up.
Follow the steps and log in with your account. Then, when you are logged in go to Settings as showed in the next image.
Now, go to the Access Tokens section.
Finally, let’s create our needed token. Click on New token.
Enter any name you like, I will use StableDiffusion for obvious reasons. You can use write or read permissions, but you only need read permissions so I advise you to leave it like that.
And you should end up with something like this. Copy the token and save it for later use in Kaggle.
Before you can use this token, you need to agree to the terms and conditions of the Stable Diffusion model. Go to the page https://huggingface.co/CompVis/stable-diffusion-v1-4, access the repository and accept the terms and conditions. If you cannot see the tick box, you just need to log in.
Same process, to create your account go to https://www.kaggle.com/ and register yourself.
Again, just follow the steps.
Once you are logged in, we can start generating images. I have created a notebook with everything explained. To go to the notebook just go to this link: https://www.kaggle.com/code/josepc/stable-diffusion-nsfw/notebook. You should see something like this. Click on the 3 dots at the top right corner. And then go to Copy & edit notebook.
Ok, if it is your first time at kaggle there are a few things to explain before running the notebook. Once you are inside the notebook editor, to start the notebook you have to click the On button, but first you need to make sure the GPU is enabled and that the internet is enabled too. To do this, click the bottom right arrow.
You should see this,
If you don’t see the GPU there, then click on it and set it to GPU. Leave everything else as it is and start the notebook. Once the notebook started some extra bars appear to show the GPU is running.
To run each cell, click on it and do shift+Enter. Everything is explained there, but I’ll go cell by cell again here explaining what you need to modify to make it work for you.
The first cell is just for installing libraries, run it as it is. The second one is where you actually need to enter your token. Modify the highlighted line copying there your HuggingFace token previously created. Keep in mind that the token should be enclosed by commas.
Once you hit shift+Enter, it will start downloading the model. It takes a few, you can see the progress like this. That 1.72G bar is the biggest one, when that is finished you are almost done.
The third cell is for moving the model to the GPU, just run it normally with shift+Enter.
The fourth cell is the important one, here is where you are actually generating the images. There are two variables that you need to modify. The first one is num_images
, set it to the number of images you want to generate from a given prompt, but I advise you not to use more than 4 images at once for reasons I will mention at the end. The second variable is the prompt
variable, modify it to include your desired prompt. Delete the text in between commas (it is quite big) and write what you want. The result should be something like this: prompt = ['GIVEN TEXT'] * num_images
, don’t change the format or it won’t work. Just write your prompt inside the commas.
Once you hit shift+Enter, a progress bar will appear. When the value reaches 51 you are done. It takes nearly 20 seconds per image.
The last two cells are for visualizing and saving the images. Just hit shift+Enter and there you have it, fabulous new images that noone has ever seen before.
Possible errors
There is the chance that you ran into an error call CUDA out of memory. When that happens, the only solution is to reset the notebook and rerun everything. If you create images in batches of 4, then it is quite difficult for that error to happen. But if for some reason you see a big red error message, just ignore it and restart your notebook. For those interested, the reason why that error happens is because there is a memory leakage into the GPU, the model creates some auxiliary tensors and then forgets of their existence. You cannot delete them because they are internal to the model, and the cache memory manager cannot delete them because they are still active, although never used. To solve it one would have to find the references of the allocated tensors and deallocate them manually, but it is easier to just restart the environment.
The error looks like this, so that it doesn’t take you by surprise. If you go to the end of the large message you will see this.
NSFW version
The tutorial I showed you is for using the standard Stable Diffusion version from Hugging Face which has a safety checker to ensure you don’t generate nasty images. However, since the model is Open Source, it is possible to modify the code to remove that safety checker. In the name of liberty, I created another version of the notebook removing that safety checker. If you go to the original link of the notebook, you will see that there is a box stating Version 2 of 2.
If you click it you can see the first version which contains two extra cells to remove the safety checker. You can run that version or copy those cells into your copy of the other version. These are the cells. The first one is for loading extra libraries, and the second one is the actual code removing the checker.
This last cell is quite large, but the only important change was done at the end. Commenting out those two lines is the only thing needed.
After removing the checker you just change the original call function with the modified one.
But you don’t need to understand this, just copy and use it. I have taken the time to make it work for you.
Script version
Since I have explained everything before I will just leave the script version here for those of you that have access to some server with GPUs or those of you rich enough to have GPUs in your houses. The usage is quite simple, there are only 3 flags that you need to know: the number of images to generate, the prompt and the token. The save_path
can be left with the default value. The whole script is pasted below.
1
2
3
4
5
6
7
8
9
Creates several images from a given prompt.
optional arguments:
-h, --help Usage python3 StableDiffusion.py -n N --promp Text --token HuggingFaceToken [--save_path dir]]]
-n N Number of images to output.
--prompt PROMPT Prompt to generate image.
--token TOKEN HuggingFace token to download the model.
--save_path SAVE_PATH
Path to the folder to save the results.