LM Studio (Desktop) Popular desktop app to run and chat with local LLMs; simple model management and fast GPU/CPU offload for RP sessions. 0400 Roleplay Frontends (Local/Self-Hosted UI) #adult #desktop #GPU
NVIDIA Riva GPU-accelerated ASR/TTS SDK for low-latency, on-prem or cloud voice AI deployments. 0360 Speech-to-Text #ASR #GPU #low-latency
RunPod Serverless Endpoints Always-on, pre-warmed GPU endpoints for low-latency model inference at scale. 0330 Inference/Hosting & APIs #endpoints #GPU #inference
Baseten Production inference platform with dedicated deployments, autoscaling, and GPU options. 0330 Inference/Hosting & APIs #autoscaling #Baseten #dedicated
NVIDIA NIM Prebuilt, optimized inference microservices for leading models on any NVIDIA-accelerated stack. 0320 Inference/Hosting & APIs #GPU #inference #microservices
NVIDIA NGC Models Optimized model catalog for NVIDIA GPUs (LLMs, vision, speech) with containers and inference recipes. 0320 Model Hubs #containers #GPU #inference
Modal Inference Serverless GPU inference with sub-second cold starts and Python-first workflows. 0310 Inference/Hosting & APIs #GPU #inference #Modal
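Several of the options above (LM Studio's local server, and many hosted inference endpoints) expose an OpenAI-compatible `/v1/chat/completions` API. As a minimal sketch of calling such an endpoint: the base URL `http://localhost:1234` is LM Studio's documented local-server default, while the model name `local-model` is a placeholder you would replace with your endpoint's actual model identifier.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat-completions endpoint.

    base_url and model are assumptions -- substitute the values for your
    local server or hosted endpoint (hosted APIs also need an
    Authorization header with your API key).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example against LM Studio's default local server; send with
# urllib.request.urlopen(req) once the server is running.
req = build_chat_request("http://localhost:1234", "local-model", "Hello!")
```

The same request shape works against most of the hosted platforms listed here when they advertise OpenAI compatibility; only the base URL, model name, and auth header change.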