Aitlas — Map the AI Universe

Tag: multimodal (11 sites)

LoLLMS WebUI
Feature-rich local web UI with personalities, multi-user support, and multimodal tools—handy for custom RP personas.
Category: Roleplay Frontends (Local/Self-Hosted UI) · Tags: adult, multi-user, multimodal

Google AI Studio (Gemini API)
Fast start with Gemini models: grab an API key, a 1M-token context window, and ready-made code snippets.
Category: Inference/Hosting & APIs · Tags: AI Studio, API, Gemini

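As a rough sketch of that "grab an API key and go" flow, here is a minimal Python example using the google-generativeai package; the model name, key placeholder, and prompt are illustrative, not prescribed by the listing.

```python
# Minimal sketch of calling Gemini from Python with the
# `google-generativeai` package and a key created in Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_KEY")  # placeholder key

# Model name is illustrative; pick any Gemini model shown in AI Studio.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("In one sentence, what does a 1M-token context window enable?")
print(response.text)
```
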
OpenAI API
Unified API for GPT, audio, vision, and realtime—with tooling for evals, moderation, and assistants.
Category: Inference/Hosting & APIs · Tags: assistants, enterprise, evaluation

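For orientation, a minimal sketch of a chat call with the official openai Python SDK (v1+); the model name and prompt are placeholders.

```python
# Minimal sketch of a chat completion with the `openai` Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Name one use case for a multimodal model."}],
)
print(resp.choices[0].message.content)
```
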
OpenAI Moderation API (omni-moderation-latest)
Multimodal moderation for text and images, with granular safety categories and flags.
Category: Guardrails & Moderation · Tags: API, image, moderation

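A hedged sketch of how a combined text-plus-image moderation request might look with the openai Python SDK; the text and image URL are placeholders.

```python
# Minimal sketch of the moderation endpoint; omni-moderation-latest
# accepts both text and image inputs in one request.
from openai import OpenAI

client = OpenAI()
result = client.moderations.create(
    model="omni-moderation-latest",
    input=[
        {"type": "text", "text": "Example text to screen."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
    ],
)
flags = result.results[0]
print(flags.flagged)     # overall boolean
print(flags.categories)  # per-category booleans (violence, self-harm, ...)
```
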
SuperAnnotate
AI-assisted annotation platform and services with workflow management, QA, and multimodal support.
Category: Datasets & Labeling · Tags: annotation, multimodal, QA

Label Studio (OSS)
Open-source, highly configurable labeling for text, images, audio, video, and chat.
Category: Datasets & Labeling · Tags: active learning, labeling, multimodal

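As a sketch of driving a self-hosted instance programmatically, the snippet below lists projects over Label Studio's REST API using requests; the host, port, and token are assumptions about a default local install.

```python
# Hedged sketch: querying a self-hosted Label Studio instance over HTTP.
import requests

HOST = "http://localhost:8080"  # default local Label Studio address (assumption)
HEADERS = {"Authorization": "Token YOUR_LABEL_STUDIO_API_TOKEN"}  # placeholder token

# List existing labeling projects.
resp = requests.get(f"{HOST}/api/projects", headers=HEADERS)
resp.raise_for_status()
print(resp.json())
```
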
LAION-5B
Massive open, multilingual image–text dataset widely used to train generative models.
Category: Datasets & Labeling · Tags: CLIP, image-text, LAION

Hugging Face Hub
The largest open model repository for browsing, downloading, and deploying LLMs, vision, audio, and multimodal models, with rich metadata and tooling.
Category: Model Hubs · Tags: audio, datasets, inference

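To illustrate the download side, a minimal sketch with the huggingface_hub client; the repo ID is illustrative (it points at one of the models listed below), and gated or private repos additionally need a token.

```python
# Minimal sketch of pulling a model's files from the Hugging Face Hub.
from huggingface_hub import snapshot_download

# Repo ID is illustrative; swap in any model repo from the Hub.
local_dir = snapshot_download(repo_id="openbmb/MiniCPM-V-2_6")
print("Model files cached at:", local_dir)
```
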
MiniCPM-V 2.6
Edge-friendly 8B multimodal model (images/video) with quantized variants for low-VRAM local inference.
Category: Open-source Models · Tags: edge, int4, local

InternVL 2.5
Open multimodal family (1B–78B); the 78B model surpasses 70% on MMMU, with broad image/video understanding.
Category: Open-source Models · Tags: InternVL, MMMU, multimodal

LLaVA-OneVision 1.5
Fully open multimodal models (image/video + text) and training stack, with strong results and reproducible recipes.
Category: Open-source Models · Tags: LLaVA, local, multimodal

      AITLAS is a curated AI tools directory for creators and developers. With a minimal, distraction-free UI and a clear 3-level taxonomy, you can scan categories fast and jump straight to what matters. Each entry includes a concise overview, key tags, and official links, while deep collections keep growing across roleplay & worldbuilding, research writing, automation, translation, and generative media. Broad coverage, frequent updates—so you find the right tool, faster.
aitlas.org
      © 2025 AITLAS.org All rights reserved.  