inference

Total 16 articles sites

KoboldCpp

Single-binary llama.cpp fork with a lightweight built-in UI; fast local inference for RP with long context options.

Roleplay Frontends (Local/Self-Hosted UI) # adult # GGUF # inference

RunPod Serverless Endpoints

Always-on, pre-warmed GPU endpoints for low-latency model inference at scale.

Inference/Hosting & APIs # endpoints # GPU # inference

Modal Inference

Serverless GPU inference with sub-second cold starts and Python-first workflows.

Inference/Hosting & APIs # GPU # inference # Modal

Baseten

Production inference platform—dedicated deployments, autoscaling, and GPU options.

Inference/Hosting & APIs # autoscaling # Baseten # dedicated

SambaNova Cloud

RDU-accelerated inference platform with an OpenAI-compatible API for top open models.

Inference/Hosting & APIs # API # inference # LLM

Cerebras Inference

Wafer-scale engine cloud with OpenAI-style APIs for ultra-fast open-model inference.

Inference/Hosting & APIs # API # Cerebras # inference

Together AI Inference

Fast serverless APIs and dedicated endpoints for 200+ open models.

Inference/Hosting & APIs # API # dedicated # inference

GroqCloud

Ultra-low latency LPU-powered inference for text, speech, and vision models.

Inference/Hosting & APIs # API # Groq # inference

NVIDIA NIM

Prebuilt, optimized inference microservices for leading models on any NVIDIA-accelerated stack.

Inference/Hosting & APIs # GPU # inference # microservices

Azure AI Foundry Models / OpenAI

Catalog of OpenAI and open models with enterprise governance and Azure AI Inference APIs.

Inference/Hosting & APIs # Azure # enterprise # governance

OpenVINO Open Model Zoo

Optimized Intel OpenVINO reference models and demos for high-performance inference.

Model Hubs # inference # Intel # model zoo

Together AI Model Library

Fast inference and fine-tuning for 200+ open models with a unified developer experience.

Model Hubs # fine-tuning # inference # model library

NVIDIA NGC Models

Optimized model catalog for NVIDIA GPUs—LLMs, vision, speech—with containers and inference recipes.

Model Hubs # containers # GPU # inference

GitHub Models

A model catalog and API integrated with GitHub—evaluate, compare, and run many vendor models via one interface.

Model Hubs # API # evaluation # GitHub

Hugging Face Hub

The largest open model repo to browse, download, and deploy LLMs, vision, audio, and multimodal models with rich metadata and tooling.

Model Hubs # audio # datasets # inference