KoboldCpp: Single-binary llama.cpp fork with a lightweight built-in UI; fast local inference for RP with long-context options. 0370 Roleplay Frontends (Local/Self-Hosted UI) #adult #GGUF #inference
RunPod Serverless Endpoints: Always-on, pre-warmed GPU endpoints for low-latency model inference at scale. 0320 Inference/Hosting & APIs #endpoints #GPU #inference
Modal Inference: Serverless GPU inference with sub-second cold starts and Python-first workflows. 0310 Inference/Hosting & APIs #GPU #inference #Modal
Baseten: Production inference platform with dedicated deployments, autoscaling, and GPU options. 0320 Inference/Hosting & APIs #autoscaling #Baseten #dedicated
SambaNova Cloud: RDU-accelerated inference platform with an OpenAI-compatible API for top open models. 0440 Inference/Hosting & APIs #API #inference #LLM
Cerebras Inference: Wafer-scale engine cloud with OpenAI-style APIs for ultra-fast open-model inference. 0450 Inference/Hosting & APIs #API #Cerebras #inference
Together AI Inference: Fast serverless APIs and dedicated endpoints for 200+ open models. 0460 Inference/Hosting & APIs #API #dedicated #inference
GroqCloud: Ultra-low-latency LPU-powered inference for text, speech, and vision models. 0480 Inference/Hosting & APIs #API #Groq #inference
NVIDIA NIM: Prebuilt, optimized inference microservices for leading models on any NVIDIA-accelerated stack. 0310 Inference/Hosting & APIs #GPU #inference #microservices
Azure AI Foundry Models / OpenAI: Catalog of OpenAI and open models with enterprise governance and Azure AI Inference APIs. 0330 Inference/Hosting & APIs #Azure #enterprise #governance
OpenVINO Open Model Zoo: Optimized Intel OpenVINO reference models and demos for high-performance inference. 0310 Model Hubs #inference #Intel #model zoo
Together AI Model Library: Fast inference and fine-tuning for 200+ open models with a unified developer experience. 0300 Model Hubs #fine-tuning #inference #model library
NVIDIA NGC Models: Optimized model catalog for NVIDIA GPUs (LLMs, vision, speech) with containers and inference recipes. 0300 Model Hubs #containers #GPU #inference
GitHub Models: A model catalog and API integrated with GitHub for evaluating, comparing, and running many vendor models via one interface. 0460 Model Hubs #API #evaluation #GitHub
Hugging Face Hub: The largest open model repo to browse, download, and deploy LLMs, vision, audio, and multimodal models with rich metadata and tooling. 0330 Model Hubs #audio #datasets #inference