Groq

Name: Groq
Rating: 92

online

Fastest LLM inference in the world. The free tier is real.

LLMest. 2016 · Mountain View, CA

/ 100 APIVault Score

Get API Key Read Docs

// At a glance

Free Tier

14,400 req/day · 30 req/min · no card

// Free tier details

Available Models

Llama 3.3 70BMixtralGemma

Monthly Requests

14,400 requests/day

Rate Limit

30 requests/minute

No credit card needed

No phone verification

// Quick start

300">"text-purple-400">from groq 300">"text-purple-400">import Groq

client = Groq(api_key=300">"YOUR_GROQ_API_KEY")

response = client.chat.completions.create(
    model=300">"llama-3.3-70b-versatile",
    messages=[{300">"role": 300">"user", 300">"content": 300">"Explain LPU 400300">">in one sentence."}],
)

print(response.choices[0].message.content)

// Overview

Groq's LPU (Language Processing Unit) delivers hundreds of tokens per second, making it the fastest hosted inference for open-source LLMs. The free tier is generous, stable, and production-ready for low-latency applications.

// Pros

Insane inference speed (500+ tokens/sec)
OpenAI-compatible API
Generous free tier with daily resets

// Cons

Smaller model selection vs OpenRouter
Rate limits can hit during peak hours

// Score breakdown

Reliability (35%) (from 2m ago health check)100/100

Free Tier Generosity (30%) (computed from quota, no-CC, no-phone fields)100/100

Documentation (20%) (human rating)100/100

Popularity (15%) (GitHub stars (log-normalised), or manual baseline)46/100

Methodology: apivault.directory/methodology

// Best for

Real-time chatbotsCode generationStreaming responses

// Recent changes

May 12, 2026Added Llama 3.3 70B Versatileadded

Apr 1, 2026Increased free-tier rate limitsupdated