Cerebras

Name: Cerebras
Rating: 92

online

2,000+ tokens/sec on Llama 70B. Free tier available.

LLMest. 2016 · Sunnyvale, CA

/ 100 APIVault Score

Get API Key Read Docs

// At a glance

Free Tier

30 req/min · 1M tokens/day · no card

// Free tier details

Available Models

Llama 3.3 70BLlama 3.1 8BLlama 3.1 70B

Monthly Requests

30 requests/minute

Monthly Tokens

1M tokens/day

No credit card needed

No phone verification

// Quick start

300">"text-purple-400">from openai 300">"text-purple-400">import OpenAI

client = OpenAI(
    api_key=300">"YOUR_CEREBRAS_KEY",
    base_url=300">"https://api.cerebras.ai/v1",
)

response = client.chat.completions.create(
    model=300">"llama-3.3-70b",
    messages=[{300">"role": 300">"user", 300">"content": 300">"Hello, world."}],
)

print(response.choices[0].message.content)

// Overview

Cerebras' wafer-scale chip delivers the fastest LLM inference on the planet — over 2,000 tokens per second on Llama 3.3 70B. Free developer tier with daily rate limits.

// Pros

Fastest available inference (2,000+ t/s)
OpenAI-compatible
No card to start

// Cons

Limited model selection
Daily token caps on free tier

// Score breakdown

Reliability (35%) (from 2m ago health check)100/100

Free Tier Generosity (30%) (computed from quota, no-CC, no-phone fields)85/100

Documentation (20%) (human rating)90/100

Popularity (15%) (GitHub stars (log-normalised), or manual baseline)88/100

Methodology: apivault.directory/methodology

// Best for

Real-time streamingLow-latency agentsInteractive apps

// Recent changes

Mar 15, 2026Llama 3.3 70B on Cerebras Inferenceadded