CB

Cerebras

online

2,000+ tokens/sec on Llama 70B. Free tier available.

LLMest. 2016 · Sunnyvale, CA
92
/ 100 APIVault Score

// At a glance

Free Tier
30 req/min · 1M tokens/day · no card
Category
LLM
Credit Card
Not required
Last Verified
2m ago

// Free tier details

Available Models

Llama 3.3 70BLlama 3.1 8BLlama 3.1 70B

Monthly Requests

30 requests/minute

Monthly Tokens

1M tokens/day

No credit card needed
No phone verification

// Quick start

300">"text-purple-400">from openai 300">"text-purple-400">import OpenAI

client = OpenAI(
    api_key=300">"YOUR_CEREBRAS_KEY",
    base_url=300">"https://api.cerebras.ai/v1",
)

response = client.chat.completions.create(
    model=300">"llama-3.3-70b",
    messages=[{300">"role": 300">"user", 300">"content": 300">"Hello, world."}],
)

print(response.choices[0].message.content)

// Overview

Cerebras' wafer-scale chip delivers the fastest LLM inference on the planet — over 2,000 tokens per second on Llama 3.3 70B. Free developer tier with daily rate limits.

// Pros

  • Fastest available inference (2,000+ t/s)
  • OpenAI-compatible
  • No card to start

// Cons

  • Limited model selection
  • Daily token caps on free tier

// Score breakdown

Reliability (35%) (from 2m ago health check)100/100
Free Tier Generosity (30%) (computed from quota, no-CC, no-phone fields)85/100
Documentation (20%) (human rating)90/100
Popularity (15%) (GitHub stars (log-normalised), or manual baseline)88/100

Methodology: apivault.dev/methodology

// Best for

Real-time streamingLow-latency agentsInteractive apps

// Recent changes

Mar 15, 2026Llama 3.3 70B on Cerebras Inferenceadded