LM Studio (Desktop): popular desktop app for running and chatting with local LLMs, with simple model management and fast GPU/CPU offload for RP sessions. Category: Roleplay Frontends (Local/Self-Hosted UI). Tags: adult, desktop, GPU.
RunPod Serverless Endpoints: always-on, pre-warmed GPU endpoints for low-latency model inference at scale. Category: Inference/Hosting & APIs. Tags: endpoints, GPU, inference.
Modal Inference: serverless GPU inference with sub-second cold starts and Python-first workflows. Category: Inference/Hosting & APIs. Tags: GPU, inference, Modal.
Baseten: production inference platform with dedicated deployments, autoscaling, and a range of GPU options. Category: Inference/Hosting & APIs. Tags: autoscaling, Baseten, dedicated.
NVIDIA NIM: prebuilt, optimized inference microservices for leading models, deployable on any NVIDIA-accelerated stack. Category: Inference/Hosting & APIs. Tags: GPU, inference, microservices.
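NIM microservices expose an OpenAI-compatible HTTP API, so a chat request is an ordinary JSON POST. A minimal sketch of building such a request (the endpoint URL and model name below are placeholders for your own deployment, and the payload is only constructed, not sent):

```python
import json

# Hypothetical local NIM endpoint; substitute your deployment's URL.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> bytes:
    """Build an OpenAI-compatible chat-completions payload for a NIM endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload).encode("utf-8")

# Example model name; use whatever model your NIM container serves.
body = build_chat_request("meta/llama-3.1-8b-instruct", "Hello!")
```

POSTing `body` to the endpoint with `urllib.request` or `requests` (header `Content-Type: application/json`) returns the familiar OpenAI-style response, with the completion under `choices[0].message.content`.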
NVIDIA NGC Models: catalog of models optimized for NVIDIA GPUs (LLMs, vision, speech) with containers and inference recipes. Category: Model Hubs. Tags: containers, GPU, inference.
NVIDIA Riva: GPU-accelerated ASR/TTS SDK for low-latency voice AI deployments, on-prem or in the cloud. Category: Speech-to-Text. Tags: ASR, GPU, low latency.