Latent Consistency LoRA with Stable Diffusion XL 1.0
Author: Tim Dolan
Advanced Image Generation with Stable Diffusion and Gradio
Creating and modifying images using AI models has become increasingly accessible. In this tutorial, we'll dive into using Stable Diffusion, a powerful text-to-image model, alongside Gradio, an easy-to-use library for building interactive web interfaces for machine learning models. This tutorial covers not just basic image generation but also advanced techniques like image inpainting and adapter-based image generation using LoRA weights.
The paper that introduced the technique: LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
This space is available as a live demo here.
The repository is available on Hugging Face or on GitHub.

https://macadeliccc-lcm-papercut-demo.hf.space
Setting Up Your Environment
If you have already done these steps or are familiar with the process, you can skip this section.
The first step is to ensure that you have the appropriate NVIDIA GPU drivers installed on your local machine.
Before we can leverage GPU acceleration for the Stable Diffusion XL model, we need to ensure that we have the right tools installed. Conda is an open-source package and environment management system that runs on Windows, macOS, and Linux. It allows you to install, run, and update packages and their dependencies.
Installing Conda
If you don't already have Conda installed, you'll need to install it before proceeding. You can download and install Conda from the official Conda website. Choose the installer that's right for your system and follow the on-screen instructions.
Once you have Conda installed, open a terminal window and proceed with setting up your environment.
Setting up Conda Environment
Creating a dedicated Conda environment for your project is considered a best practice as it helps manage dependencies and avoid conflicts.
conda create --name torch python=3.10
conda activate torch
Make sure that you have followed the steps located here to install PyTorch with GPU support for your platform. If you have not done this, you will not be able to use the GPU.
I am on Linux, for example, so this would be my install command using Conda:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch-nightly -c nvidia
Build the correct command for your machine on the PyTorch website and make sure it's installed properly by testing whether torch has access to the GPU.
You do not need to install CUDA or cuDNN locally; PyTorch ships with prebuilt binaries for each version. These steps alone are enough for torch to see your GPU.
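A quick sanity check from a Python shell confirms this:
import torch

print(torch.cuda.is_available())       # should print True on a working install
print(torch.cuda.get_device_name(0))   # should print your GPU's name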
First, ensure you have all necessary dependencies installed:
pip install --upgrade gradio diffusers pillow transformers peft
This command installs Gradio for the interface, the diffusers library (which houses the Stable Diffusion pipelines), Pillow for image processing, and transformers and peft, which diffusers needs to run the models and load LoRA weights.
Basic Image Generation
We start by setting up a function to generate images based on a text prompt using Stable Diffusion:
import torch
from diffusers import LCMScheduler, AutoPipelineForText2Image

def generate_image(prompt, num_inference_steps, guidance_scale):
    model_id = "stabilityai/stable-diffusion-xl-base-1.0"
    adapter_id = "latent-consistency/lcm-lora-sdxl"

    pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.to("cuda")

    # Load and fuse the LCM LoRA
    pipe.load_lora_weights(adapter_id)
    pipe.fuse_lora()

    # Generate the image
    image = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=guidance_scale).images[0]
    return image
This function loads Stable Diffusion XL with the LCM-LoRA weights fused in. The LCM-LoRA adapter turns the model into a latent consistency model, so it can produce good images in as few as 4 inference steps with a low guidance scale, rather than the dozens of steps the base model normally needs.
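For example, a typical call uses just a handful of steps and a guidance scale near 1.0, matching the slider defaults in the interface later in this tutorial:
image = generate_image(
    "Self-portrait oil painting, a beautiful cyborg with golden hair, 8k",
    num_inference_steps=4,
    guidance_scale=1.0,
)
image.save("cyborg.png")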
Image Inpainting
Inpainting allows you to modify specific parts of an image, guided by a mask: white regions of the mask are repainted according to the prompt, while black regions are preserved. Here's how you can do it:
import io

import torch
from diffusers import AutoPipelineForInpainting, LCMScheduler
from PIL import Image

def inpaint_image(prompt, init_image, mask_image, num_inference_steps, guidance_scale):
    pipe = AutoPipelineForInpainting.from_pretrained(
        "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
    pipe.fuse_lora()

    if init_image is None or mask_image is None:
        return None  # both an initial image and a mask are required

    # The SDXL inpainting pipeline expects 1024x1024 inputs
    init_image = Image.open(io.BytesIO(init_image)).resize((1024, 1024))
    mask_image = Image.open(io.BytesIO(mask_image)).resize((1024, 1024))

    generator = torch.manual_seed(42)  # fixed seed for reproducible results
    image = pipe(
        prompt=prompt,
        image=init_image,
        mask_image=mask_image,
        generator=generator,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
    ).images[0]
    return image
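Outside of Gradio, you can call the function directly with the raw bytes of an image and its mask; the file names below are placeholders:
# Hypothetical input files: an image and a black-and-white mask of the same size
with open("castle.png", "rb") as f, open("castle_mask.png", "rb") as f_mask:
    result = inpaint_image(
        "a castle on top of a mountain, highly detailed, 8k",
        f.read(),
        f_mask.read(),
        num_inference_steps=4,
        guidance_scale=1.0,
    )
result.save("castle_inpainted.png")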
Advanced Image Generation with Adapters
To further customize the model's output, we can stack a style LoRA (here, TheLastBen's Papercut adapter) on top of the LCM-LoRA:
import torch
from diffusers import DiffusionPipeline, LCMScheduler

def generate_image_with_adapter(prompt, num_inference_steps, guidance_scale):
    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        variant="fp16",
        torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

    # Load both LoRAs and blend them: the LCM adapter at full strength,
    # the Papercut style adapter at 0.8
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
    pipe.load_lora_weights("TheLastBen/Papercut_SDXL", weight_name="papercut.safetensors", adapter_name="papercut")
    pipe.set_adapters(["lcm", "papercut"], adapter_weights=[1.0, 0.8])
    pipe.fuse_lora()

    generator = torch.manual_seed(0)  # fixed seed for reproducible results
    image = pipe(prompt=prompt, num_inference_steps=num_inference_steps, guidance_scale=guidance_scale, generator=generator).images[0]
    pipe.unfuse_lora()  # restore the base weights for subsequent calls
    return image
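The Papercut style is invoked by including "papercut" in the prompt, as the interface's placeholder text below suggests:
image = generate_image_with_adapter(
    "papercut, a cute fox",
    num_inference_steps=4,
    guidance_scale=1.0,
)
image.save("papercut_fox.png")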
Image Modification
We can also modify the brightness and contrast of generated images:
import io

from PIL import Image, ImageEnhance

def modify_image(image, brightness, contrast):
    # Gradio may pass the image as raw bytes or as a numpy array,
    # depending on the component configuration
    if isinstance(image, bytes):
        image = Image.open(io.BytesIO(image))
    else:
        image = Image.fromarray(image)

    # A factor of 1.0 leaves the image unchanged
    enhancer = ImageEnhance.Brightness(image)
    image = enhancer.enhance(brightness)
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(contrast)
    return image
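As a quick standalone sketch (the file name is a placeholder), factors above 1.0 brighten the image and increase contrast, while factors below 1.0 do the opposite:
with open("cyborg.png", "rb") as f:
    image_bytes = f.read()
edited = modify_image(image_bytes, brightness=1.2, contrast=1.1)
edited.save("cyborg_edited.png")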
Building the Gradio Interface
Gradio makes it easy to build interactive web interfaces. Here's how you can set up an interface for your models:
import gradio as gr

with gr.Blocks() as demo:
    with gr.Row():
        image_output = gr.Image(label="Generated Image")
    with gr.Row():
        with gr.Accordion(label="Configuration Options"):
            prompt_input = gr.Textbox(label="Prompt", placeholder="Self-portrait oil painting, a beautiful cyborg with golden hair, 8k")
            steps_input = gr.Slider(minimum=1, maximum=10, label="Inference Steps", value=4)
            guidance_input = gr.Slider(minimum=0, maximum=2, label="Guidance Scale", value=1)
            generate_button = gr.Button("Generate Image")
    with gr.Row():
        with gr.Accordion(label="Papercut Image Generation"):
            adapter_prompt_input = gr.Textbox(label="Prompt", placeholder="papercut, a cute fox")
            adapter_steps_input = gr.Slider(minimum=1, maximum=10, label="Inference Steps", value=4)
            adapter_guidance_input = gr.Slider(minimum=0, maximum=2, label="Guidance Scale", value=1)
            adapter_generate_button = gr.Button("Generate Image with Adapter")
    with gr.Row():
        with gr.Accordion(label="Inpainting"):
            inpaint_prompt_input = gr.Textbox(label="Prompt for Inpainting", placeholder="a castle on top of a mountain, highly detailed, 8k")
            init_image_input = gr.File(label="Initial Image")
            mask_image_input = gr.File(label="Mask Image")
            inpaint_steps_input = gr.Slider(minimum=1, maximum=10, label="Inference Steps", value=4)
            inpaint_guidance_input = gr.Slider(minimum=0, maximum=2, label="Guidance Scale", value=1)
            inpaint_button = gr.Button("Inpaint Image")
    with gr.Row():
        with gr.Accordion(label="Image Modification (Experimental)"):
            brightness_slider = gr.Slider(minimum=0.5, maximum=1.5, step=0.1, value=1, label="Brightness")
            contrast_slider = gr.Slider(minimum=0.5, maximum=1.5, step=0.1, value=1, label="Contrast")
            modify_button = gr.Button("Modify Image")

    generate_button.click(
        generate_image,
        inputs=[prompt_input, steps_input, guidance_input],
        outputs=image_output
    )
    modify_button.click(
        modify_image,
        inputs=[image_output, brightness_slider, contrast_slider],
        outputs=image_output
    )
    inpaint_button.click(
        inpaint_image,
        inputs=[inpaint_prompt_input, init_image_input, mask_image_input, inpaint_steps_input, inpaint_guidance_input],
        outputs=image_output
    )
    adapter_generate_button.click(
        generate_image_with_adapter,
        inputs=[adapter_prompt_input, adapter_steps_input, adapter_guidance_input],
        outputs=image_output
    )

demo.launch()
This Gradio app provides a user-friendly interface for generating and modifying images using the functions defined above.
Conclusion
In this tutorial, we explored advanced image generation techniques using Stable Diffusion and LoRA weights, coupled with Gradio for an interactive experience. This setup demonstrates the potential of AI in creative fields, providing tools for artists, designers, and hobbyists alike.
For the full code and more details, you can visit the GitHub repository linked in the references.
References
- Stable Diffusion: Stability AI
- Gradio: Gradio Docs
- LoRA Weights: Latent Consistency