Introducing Yarune Inference API v3

AI infrastructure that scales with you

Deploy, monitor, and scale AI models with millisecond latency. The developer platform built for production AI workloads.

Trusted by engineering teams at

Vercel · Stripe · Linear · Notion · Loom · Figma · Retool

Inference Overview

Last 30 days · us-east-1

Requests / day: 2.4M (↑ 18.2%)
P50 Latency: 38ms (↑ 12% faster)
Uptime: 99.99% (↑ SLA met)
Cost / 1k tokens: $0.12 (↑ 4% usage)

Model                                   Requests  Latency  Status
yarune-inference-xl (gpt-class · 175B)  1.2M      34ms     live
yarune-embed-v2 (embedding · 1.5B)      890K      12ms     live
yarune-vision-pro (multimodal · 70B)    310K      88ms     idle

Platform

Everything you need to ship AI

From raw inference to full observability, Yarune is the AI infrastructure layer your team needs to move fast.

Inference API

Deploy any model, open-source or custom, with one API call. Auto-scaling, batching, and caching built in.

Observability

Request-level tracing, latency histograms, token usage analytics, and real-time alerting out of the box.
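
To make a figure such as P50 latency concrete, here is a minimal sketch of a nearest-rank percentile computed from raw request latencies. The helper name `percentile` and the sample values are illustrative assumptions. They are not part of the Yarune SDK and do not describe how the platform computes its metrics.

```typescript
// Nearest-rank percentile over a list of request latencies (milliseconds).
// Illustrative only: `percentile` is a hypothetical helper, not a Yarune API.
function percentile(latenciesMs: number[], p: number): number {
  if (latenciesMs.length === 0) throw new Error('no samples')
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]')
  const sorted = [...latenciesMs].sort((a, b) => a - b)
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[rank - 1]
}

// Made-up sample latencies for demonstration.
const samples = [12, 34, 38, 41, 88, 29, 36, 40, 35, 120]
console.log('p50:', percentile(samples, 50))
console.log('p95:', percentile(samples, 95))
```

Nearest-rank always returns a latency that was actually observed, which makes it easy to trace back to a specific request. Interpolating methods give smoother values on small samples.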

Fine-tuning

Train custom adapters on your data. Instant deployment of fine-tuned checkpoints with zero cold starts.

Streaming

First token in under 50ms. Native SSE and WebSocket streaming for real-time AI-powered UIs at any scale.
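
For readers new to SSE, the sketch below shows roughly how a client splits a server-sent event stream into payloads. Only the framing rules (events separated by a blank line, `data:` lines carrying the payload) come from the SSE format itself. The JSON payload shape `{ delta: string }` and the helper name `parseSseEvents` are assumptions for illustration, not a documented Yarune wire format.

```typescript
// Minimal SSE frame parser: events are separated by a blank line, and each
// `data:` line contributes one line of the event's payload.
// The JSON payload shape `{ delta: string }` is an assumption for illustration.
function parseSseEvents(raw: string): string[] {
  const events: string[] = []
  for (const block of raw.split(/\r?\n\r?\n/)) {
    const dataLines = block
      .split(/\r?\n/)
      .filter((line) => line.startsWith('data:'))
      // Drop the field name and the single optional space after the colon.
      .map((line) => line.slice(5).replace(/^ /, ''))
    if (dataLines.length > 0) events.push(dataLines.join('\n'))
  }
  return events
}

// Hypothetical wire data carrying two token deltas.
const wire = 'data: {"delta":"Hel"}\n\ndata: {"delta":"lo"}\n\n'
const text = parseSseEvents(wire)
  .map((payload) => (JSON.parse(payload) as { delta: string }).delta)
  .join('')
console.log(text)
```

A production client would also buffer partial frames across network reads and handle the `event:`, `id:`, and `retry:` fields, which this sketch ignores.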

Global Edge

12 regions, 40+ PoPs. Requests automatically route to the nearest node. Sub-100ms for 95% of traffic globally.

Webhooks & Events

Subscribe to model events, usage milestones, and error thresholds. Integrate with Slack, PagerDuty, or any HTTP endpoint.
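
An HTTP endpoint that receives webhooks usually verifies that each delivery is authentic. The sketch below shows one common approach, an HMAC-SHA256 signature over the raw request body. This page does not describe Yarune's signing scheme, so everything here is an assumption for illustration: the secret format, the event fields, and the function names.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'
import { Buffer } from 'node:buffer'

// Hypothetical scheme: the sender signs the raw request body with HMAC-SHA256
// using a shared secret and sends the hex digest alongside the request.
function signPayload(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body).digest('hex')
}

function verifySignature(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), 'hex')
  const received = Buffer.from(signatureHex, 'hex')
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received)
}

// Example values are made up; the event shape is not a documented format.
const secret = 'whsec_example'
const body = JSON.stringify({ type: 'error.threshold', model: 'yarune-embed-v2' })
const signature = signPayload(secret, body)
console.log(verifySignature(secret, body, signature))
console.log(verifySignature(secret, body + ' ', signature))
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.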

Developer First

Inference in three lines

Our TypeScript SDK is designed for humans. Type-safe, fully async, and streaming-native.

inference.ts · TypeScript

import { Yarune } from '@yarune/sdk'

const client = new Yarune({
  apiKey: process.env.YARUNE_KEY
})

// Stream tokens in real-time
const stream = await client.inference.stream({
  model: 'yarune-xl-v3',
  messages: [
    { role: 'user', content: 'Summarize this document' }
  ],
  maxTokens: 2048
})

for await (const chunk of stream) {
  process.stdout.write(chunk.delta)
}
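
If you want the full response as a string rather than printing tokens as they arrive, the deltas can be accumulated instead. The sketch below assumes each chunk exposes a string `delta`, as the snippet above suggests. It uses a stand-in generator (`fakeStream`, hypothetical) in place of the real SDK stream so that it runs without an API key.

```typescript
// Shape of a streamed chunk, assumed from the `chunk.delta` usage above.
interface Chunk {
  delta: string
}

// Stand-in for the SDK stream so the sketch runs without network access.
async function* fakeStream(): AsyncGenerator<Chunk> {
  for (const delta of ['Sum', 'mary', ' ready']) {
    yield { delta }
  }
}

// Concatenate every delta from any async iterable of chunks.
async function collectText(stream: AsyncIterable<Chunk>): Promise<string> {
  let text = ''
  for await (const chunk of stream) {
    text += chunk.delta
  }
  return text
}

collectText(fakeStream()).then((text) => console.log(text))
```

Because `collectText` accepts any `AsyncIterable<Chunk>`, the same helper should work with a real stream object, provided its chunks have this shape.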

Pricing

Pay for what you use

Transparent, usage-based pricing with no hidden fees. Scale to millions of requests without negotiating contracts.

Hobby

$0

Free forever · 100k requests/mo

All public models
1M tokens / month
Community support
Shared infrastructure
Basic analytics

Start building with Yarune today

Join 15,000+ engineers shipping production AI. Get your API key in 30 seconds.

SOC 2 Type II · GDPR · HIPAA ready
