🎨

Vision & Multimedia Processing

18,701 Entities Indexed

Explore image generation, video AI, speech recognition, and music synthesis models.

Top Picks

Leading models for image, video, and audio processing
📦 Source
🛡️ FNI 16

imagen-3

by google

Google's highest-quality text-to-image model, capable of generating images with fine detail, rich lighting, and beauty

public image text
❤️ 0
📥 1.9M
📦 Source
🛡️ FNI 16

imagen-4

by google

Google's Imagen 4 flagship model

public image
❤️ 0
📥 7.4M
📦 Source
🛡️ FNI 16

imagen-4-fast

by google

Use this fast version of Imagen 4 when speed and cost are more important than quality

public image
❤️ 0
📥 4.0M
📦 Source
🛡️ FNI 16

imagen-4-ultra

by google

Use this ultra version of Imagen 4 when quality matters more than speed and cost

public image
❤️ 0
📥 1.3M
📦 Source
🛡️ FNI 16

nano-banana

by google

Google's latest image editing model in Gemini 2.5

public image
❤️ 0
📥 77.7M
📦 Source
🛡️ FNI 16

nano-banana-pro

by google

Google's state-of-the-art image generation and editing model 🍌🍌

public image
❤️ 0
📥 11.2M
📦 Source
🛡️ FNI 16

musicgen

by meta

Generate music from a prompt or melody

public
❤️ 0
📥 3.3M
📦 Source
🛡️ FNI 16

sana-sprint-1.6b

by nvidia

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

public diffusion
❤️ 0
📥 1.1M
📦 Source
🛡️ FNI 16

clip

by openai

Official CLIP models that generate CLIP (clip-vit-large-patch14) text and image embeddings

public image text
❤️ 0
📥 3.7M
📦 Source
🛡️ FNI 16

gpt-image-1

by openai

A multimodal image generation model that creates high-quality images. You need to bring your own verified OpenAI key to use this model. Your OpenAI account will be charged for usage.

public image
❤️ 0
📥 1.4M
📦 Source
🛡️ FNI 16

gpt-image-1.5

by openai

OpenAI's latest image generation model, with better instruction following and prompt adherence

public image
❤️ 0
📥 2.0M
📦 Source
🛡️ FNI 16

sdxl

by stability-ai

A text-to-image generative AI model that creates beautiful images

public image text
❤️ 0
📥 83.8M
📦 Source
🛡️ FNI 16

stable-diffusion-3

by stability-ai

A text-to-image model with greatly improved performance in image quality, typography, complex prompt understanding, and resource-efficiency

public image text
❤️ 0
📥 1.8M
📦 Source
🛡️ FNI 16

stable-diffusion-3.5-large

by stability-ai

A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, thanks to Query-Key Normalization.

public image text
❤️ 0
📥 1.9M
📦 Source
🛡️ FNI 16

stable-diffusion-inpainting

by stability-ai

Fill in masked parts of images with Stable Diffusion

public image diffusion
❤️ 0
📥 20.9M
📦 Source
🛡️ FNI 15

stable-diffusion-3.5-large-turbo

by stability-ai

A text-to-image model that generates high-resolution images with fine details. It supports various artistic styles and produces diverse outputs from the same prompt, with a focus on fewer inference steps.

public image text
❤️ 0
📥 894.6K
📦 Source
🛡️ FNI 13

epiCRealism

by epinikion

Natural Sin: the final and last version of epiCRealism. Since SDXL is right around the corner, let's call this the final version for now; I put a lot of effort into it and probably cannot do much more. I tried to refine the understanding of prompts, hands, and of course the realism. Let's see what you can do with it. Thanks to

text-to-image
❤️ 0
📥 826.5K
📦 Source
🛡️ FNI 13

WAI-REAL_CN

by WAI0731

If you want to use more of my checkpoints for online generation, please visit here. I used a size of 1024x1360, and all images are direct outputs without AD (ADetailer) or hires fix. For the example images

text-to-image
❤️ 0
📥 119.4K
📦 Source
🛡️ FNI 13

gemini-2.5-flash-image

by google

Google's latest image generation model in Gemini 2.5

public image
❤️ 0
📥 672.7K
📦 Source
🛡️ FNI 13

background-remover

by 851-labs

Remove backgrounds from images.

public image
❤️ 0
📥 14.9M
📦 Source
🛡️ FNI 13

kandinsky-2.2

by ai-forever

Multilingual text-to-image latent diffusion model

public image text diffusion
❤️ 0
📥 10.1M
📦 Source
🛡️ FNI 13

flux-1.1-pro

by black-forest-labs

Faster, better FLUX Pro. Text-to-image model with excellent image quality, prompt adherence, and output diversity.

public image text
❤️ 0
📥 66.8M
📦 Source
🛡️ FNI 13

flux-2-klein-4b

by black-forest-labs

Very fast image generation and editing model. Distilled to 4 steps, with sub-second inference for production and near-real-time applications.

public image
❤️ 0
📥 1.7M
📦 Source
🛡️ FNI 13

flux-2-pro

by black-forest-labs

High-quality image generation and editing with support for eight reference images

public image
❤️ 0
📥 2.1M