Vision & Multimedia Processing
18,701 Entities Indexed
Explore image generation, video AI, speech recognition, and music synthesis models.
Top Picks
imagen-3
Google's highest-quality text-to-image model, capable of generating detailed images with rich lighting
imagen-4
Google's Imagen 4 flagship model
imagen-4-fast
Use this fast version of Imagen 4 when speed and cost are more important than quality
imagen-4-ultra
Use this ultra version of Imagen 4 when quality matters more than speed and cost
nano-banana
Google's latest image editing model in Gemini 2.5
nano-banana-pro
Google's state-of-the-art image generation and editing model 🍌🍌
musicgen
Generate music from a prompt or melody
sana-sprint-1.6b
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
clip
Official CLIP models; generates CLIP (clip-vit-large-patch14) text and image embeddings
gpt-image-1
A multimodal image generation model that creates high-quality images. You need to bring your own verified OpenAI key to use this model. Your OpenAI account will be charged for usage.
gpt-image-1.5
OpenAI's latest image generation model with better instruction following and adherence to prompts
sdxl
A text-to-image generative AI model that creates beautiful images
stable-diffusion-3
A text-to-image model with greatly improved image quality, typography, complex prompt understanding, and resource efficiency
stable-diffusion-3.5-large
A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, thanks to Query-Key Normalization.
stable-diffusion-inpainting
Fill in masked parts of images with Stable Diffusion
stable-diffusion-3.5-large-turbo
A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, with a focus on fewer inference steps
epiCRealism
Natural Sin: final and last of epiCRealism. Since SDXL is right around the corner, let's say it is the final version for now, since I put a lot of effort into it and probably cannot do much more. I tried to refine the understanding of prompts, hands, and of course realism. Let's see what you guys can do with it. Thanks to
WAI-REAL_CN
If you want to use more of my checkpoints for online generation, please visit here. For the example images, I used a size of 1024x1360; all are direct outputs, without AD or hires fix.
gemini-2.5-flash-image
Google's latest image generation model in Gemini 2.5
background-remover
Remove backgrounds from images.
kandinsky-2.2
Multilingual text-to-image latent diffusion model
flux-1.1-pro
Faster, better FLUX Pro. Text-to-image model with excellent image quality, prompt adherence, and output diversity.
flux-2-klein-4b
Very fast image generation and editing model. Distilled to 4 steps, with sub-second inference for production and near real-time applications.
flux-2-pro
High-quality image generation and editing with support for eight reference images