Introducing Yarune Inference API v3

AI infrastructure that scales with you

Deploy, monitor, and scale AI models with millisecond latency. The developer platform built for production AI workloads.

Trusted by engineering teams at

Vercel · Stripe · Linear · Notion · Loom · Figma · Retool

Inference Overview
Last 30 days · us-east-1
Requests / day: 2.4M (↑ 18.2%)
P50 Latency: 38ms (↑ 12% faster)
Uptime: 99.99% (↑ SLA met)
Cost / 1k tokens: $0.12 (↑ 4% usage)
Model                                    Requests  Latency  Status
yarune-inference-xl (gpt-class · 175B)   1.2M      34ms     live
yarune-embed-v2 (embedding · 1.5B)       890K      12ms     live
yarune-vision-pro (multimodal · 70B)     310K      88ms     idle

Platform

Everything you need to ship AI

From raw inference to full observability, Yarune is the AI infrastructure layer your team needs to move fast.

Inference API
Deploy any model — open-source or custom — with one API call. Auto-scaling, batching, and caching built in.
Observability
Request-level tracing, latency histograms, token usage analytics, and real-time alerting out of the box.
Fine-tuning
Train custom adapters on your data. Instant deployment of fine-tuned checkpoints with zero cold starts.
Streaming
First token in under 50ms. Native SSE and WebSocket streaming for real-time AI-powered UIs at any scale.
Global Edge
12 regions, 40+ PoPs. Requests automatically route to the nearest node. Sub-100ms for 95% of traffic globally.
Webhooks & Events
Subscribe to model events, usage milestones, and error thresholds. Integrate with Slack, PagerDuty, or any HTTP endpoint.
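
For readers wiring up the SSE streaming mentioned above by hand, the sketch below shows how Server-Sent Events frames can be split into complete events. It follows the standard SSE text format only. This page does not document Yarune's wire format, so the helper name `parseSseEvents` and the result shape are hypothetical and not part of any Yarune SDK.

```typescript
// Minimal Server-Sent Events frame parser following the standard
// text/event-stream format. Hypothetical helper, not a Yarune API.
interface SseParseResult {
  events: string[]; // joined `data:` payload of each complete event
  rest: string; // trailing partial event, to prepend to the next chunk
}

function parseSseEvents(buffer: string): SseParseResult {
  // Normalize CRLF to LF. Lone CR line endings are not handled here.
  const normalized = buffer.replace(/\r\n/g, "\n");
  const blocks = normalized.split("\n\n");
  // The last block stays incomplete until a blank line terminates it.
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      if (line.startsWith("data:")) {
        // Strip the single optional space that may follow the colon.
        const value = line.slice(5);
        dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
    }
    // Blocks with no data lines (for example, comments only) yield no event.
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```

A caller would keep `rest` between network reads and pass `rest + nextChunk` to the next call, so an event split across two reads is still parsed whole.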

Developer First

Inference in three lines

Our TypeScript SDK is designed for humans. Type-safe, fully async, and streaming-native.

inference.ts · TypeScript
import { Yarune } from '@yarune/sdk'

const client = new Yarune({
  apiKey: process.env.YARUNE_KEY
})

// Stream tokens in real-time
const stream = await client.inference.stream({
  model: 'yarune-xl-v3',
  messages: [{ role: 'user',
    content: 'Summarize this document'
  }],
  maxTokens: 2048
})

for await (const chunk of stream) {
  process.stdout.write(chunk.delta)
}
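
When the full response is needed as a single string rather than written token by token, the same loop can accumulate the deltas. The sketch below assumes each chunk has the shape `{ delta: string }`, inferred only from `chunk.delta` in the sample above. `StreamChunk`, `collectText`, and `fakeStream` are hypothetical names used for illustration, and `fakeStream` stands in for the SDK stream so the snippet runs without an API key.

```typescript
// Assumed chunk shape, inferred from `chunk.delta` in the sample above.
interface StreamChunk {
  delta: string;
}

// Concatenate every delta from an async stream into one string.
async function collectText(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.delta;
  }
  return text;
}

// Stand-in stream for local testing; a real stream would come from the SDK call.
async function* fakeStream(parts: string[]): AsyncGenerator<StreamChunk> {
  for (const delta of parts) {
    yield { delta };
  }
}

// Usage: logs the concatenated text once the stream ends.
collectText(fakeStream(["Sum", "mary", "."])).then((text) => console.log(text));
```

Typing the parameter as `AsyncIterable<StreamChunk>` keeps the helper independent of the SDK, so it can be unit-tested with any async generator.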

Pricing

Pay for what you use

Transparent, usage-based pricing with no hidden fees. Scale to millions of requests without negotiating contracts.

Hobby
$0
Free forever · 100k requests/mo
  • All public models
  • 1M tokens / month
  • Community support
  • Shared infrastructure
  • Basic analytics
Most Popular
Pro
$49
per seat / month · unlimited requests
  • All models + fine-tuning
  • 50M tokens / month
  • Priority support (4h SLA)
  • Dedicated inference nodes
  • Full observability suite
Enterprise
Custom
Volume discounts · SLA guaranteed
  • Private model deployment
  • Unlimited tokens
  • Dedicated Slack channel
  • HIPAA / SOC 2 compliance
  • Custom integrations
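
As a rough way to read the tiers above, the sketch below picks the smallest plan whose listed quotas cover a given monthly usage and computes the Pro seat cost. It treats the listed quotas as hard limits, which this page does not state, and it ignores feature needs such as fine-tuning or compliance. The function and type names are hypothetical.

```typescript
type Plan = "Hobby" | "Pro" | "Enterprise";

interface MonthlyUsage {
  requests: number;
  tokens: number;
}

// Limits and price copied from the pricing tiers above.
const HOBBY_MAX_REQUESTS = 100_000;
const HOBBY_MAX_TOKENS = 1_000_000;
const PRO_MAX_TOKENS = 50_000_000; // Pro lists unlimited requests
const PRO_PRICE_PER_SEAT_USD = 49;

// Smallest tier whose quotas cover the usage (assumption: quotas are hard caps).
function smallestPlanFor(usage: MonthlyUsage): Plan {
  if (usage.requests <= HOBBY_MAX_REQUESTS && usage.tokens <= HOBBY_MAX_TOKENS) {
    return "Hobby";
  }
  if (usage.tokens <= PRO_MAX_TOKENS) {
    return "Pro";
  }
  return "Enterprise";
}

// Monthly Pro subscription cost for a team, before any usage-based charges.
function proMonthlyCostUsd(seats: number): number {
  return seats * PRO_PRICE_PER_SEAT_USD;
}
```

For example, a five-person team sending 200k requests and 10M tokens a month would land on Pro at 5 × $49 = $245 per month under these assumptions.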

Start building with Yarune today

Join 15,000+ engineers shipping production AI. Get your API key in 30 seconds.

SOC 2 Type II · GDPR · HIPAA ready