| Tool | Description | Score | Category | Tags |
|---|---|---|---|---|
| Together AI (Models Catalog) | Model library and API docs to stream tokens, set safety models, and manage endpoints. | 0330 | Inference/Hosting & APIs | #docs #endpoints #models |
| RunPod Serverless Endpoints | Always-on, pre-warmed GPU endpoints for low-latency model inference at scale. | 0320 | Inference/Hosting & APIs | #endpoints #GPU #inference |
| Modal Inference | Serverless GPU inference with sub-second cold starts and Python-first workflows. | 0310 | Inference/Hosting & APIs | #GPU #inference #Modal |
| Baseten | Production inference platform with dedicated deployments, autoscaling, and GPU options. | 0320 | Inference/Hosting & APIs | #autoscaling #Baseten #dedicated |
| Anyscale Endpoints (Ray Serve) | OpenAI-compatible serving on Ray with autoscaling and many-model deployments. | 0370 | Inference/Hosting & APIs | #Anyscale #autoscaling #endpoints |
| SambaNova Cloud | RDU-accelerated inference platform with OpenAI-compatible APIs for top open models. | 0440 | Inference/Hosting & APIs | #API #inference #LLM |
| Cerebras Inference | Wafer-scale engine cloud with OpenAI-style APIs for ultra-fast open-model inference. | 0450 | Inference/Hosting & APIs | #API #Cerebras #inference |
| Replicate | Run and deploy community and custom models with a simple cloud API and playgrounds. | 0420 | Inference/Hosting & APIs | #API #deploy #hosted models |
| OpenRouter | One API that routes to hundreds of models with pricing, latency, and fallback controls. | 0740 | Inference/Hosting & APIs | #API #catalog #fallback |
| Fireworks AI | High-throughput inference and fine-tuning for open models; global, scalable endpoints. | 0300 | Inference/Hosting & APIs | #endpoints #fine-tuning #Fireworks |
| Together AI Inference | Fast serverless APIs and dedicated endpoints for 200+ open models. | 0460 | Inference/Hosting & APIs | #API #dedicated #inference |
| GroqCloud | Ultra-low latency LPU-powered inference for text, speech, and vision models. | 0490 | Inference/Hosting & APIs | #API #Groq #inference |
| Snowflake Cortex | Run LLMs in your governed data platform with AISQL functions and secure inference. | 0350 | Inference/Hosting & APIs | #AISQL #Cortex #governed |
| Databricks Mosaic AI Model Serving | Serve custom and foundation models as REST endpoints with AI Gateway governance. | 0330 | Inference/Hosting & APIs | #Databricks #Gateway #governance |
| IBM watsonx.ai | Business-ready foundation models and Model Gateway with governance and pricing controls. | 0400 | Inference/Hosting & APIs | #API #foundation models #governance |
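Several of the services listed here (OpenRouter, Together AI, GroqCloud, SambaNova, Cerebras, Anyscale) advertise OpenAI-compatible APIs, which in practice means switching providers is mostly a matter of changing the base URL, API key, and model name while the request body stays the same. A minimal sketch of that pattern follows; it assumes the official `openai` Python package, and the base URLs shown are illustrative — verify them against each provider's own documentation.

```python
# Sketch: one chat-completions request shape shared across OpenAI-compatible
# providers; only base_url, api_key, and model name differ per provider.
# Base URLs are illustrative assumptions -- check each provider's docs.
PROVIDERS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "together": "https://api.together.xyz/v1",
    "groq": "https://api.groq.com/openai/v1",
}


def chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build a standard OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


# Usage (requires `pip install openai` and a real provider API key):
# from openai import OpenAI
# client = OpenAI(base_url=PROVIDERS["groq"], api_key="YOUR_KEY")
# resp = client.chat.completions.create(**chat_payload("some-model-id", "Hi"))
```

The point of the sketch is the design choice the catalog rows hint at: because the payload is provider-agnostic, fallback and routing layers like OpenRouter can sit in front of many backends without changing client code.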