G

Groq

online

Fastest LLM inference in the world. The free tier is real.

LLMest. 2016 · Mountain View, CA
92
/ 100 APIVault Score

// At a glance

Free Tier
14,400 req/day · 30 req/min · no card
Category
LLM
Credit Card
Not required
Last Verified
3m ago

// Free tier details

Available Models

Llama 3.3 70BMixtralGemma

Monthly Requests

14,400 requests/day

Rate Limit

30 requests/minute

No credit card needed
No phone verification

// Quick start

300">"text-purple-400">from groq 300">"text-purple-400">import Groq

client = Groq(api_key=300">"YOUR_GROQ_API_KEY")

response = client.chat.completions.create(
    model=300">"llama-3.3-70b-versatile",
    messages=[{300">"role": 300">"user", 300">"content": 300">"Explain LPU 400300">">in one sentence."}],
)

print(response.choices[0].message.content)

// Overview

Groq's LPU (Language Processing Unit) delivers hundreds of tokens per second, making it the fastest hosted inference for open-source LLMs. The free tier is generous, stable, and production-ready for low-latency applications.

// Pros

  • Insane inference speed (500+ tokens/sec)
  • OpenAI-compatible API
  • Generous free tier with daily resets

// Cons

  • Smaller model selection vs OpenRouter
  • Rate limits can hit during peak hours

// Score breakdown

Reliability (35%) (from 3m ago health check)100/100
Free Tier Generosity (30%) (computed from quota, no-CC, no-phone fields)100/100
Documentation (20%) (human rating)100/100
Popularity (15%) (GitHub stars (log-normalised), or manual baseline)46/100

Methodology: apivault.dev/methodology

// Best for

Real-time chatbotsCode generationStreaming responses

// Recent changes

May 12, 2026Added Llama 3.3 70B Versatileadded
Apr 1, 2026Increased free-tier rate limitsupdated