Blazing fast OSS inference. $1 free credits.
// At a glance
Free Tier
$1 credits · no card
// Free tier details
Available Models
Llama 3.3MixtralQwenDeepSeek
Monthly Requests
$1 free credits
// Quick start
300">"text-purple-400">from openai 300">"text-purple-400">import OpenAI
client = OpenAI(
api_key=300">"YOUR_FIREWORKS_KEY",
base_url=300">"https://api.fireworks.ai/inference/v1",
)
response = client.chat.completions.create(
model=300">"accounts/fireworks/models/llama-v3p3-70b-instruct",
messages=[{300">"role": 300">"user", 300">"content": 300">"Hello."}],
)
print(response.choices[0].message.content)
// Overview
Optimized inference for Llama, Mixtral, and OSS models with sub-100ms time-to-first-token. Function calling and fine-tuning support.
// Pros
- Very fast TTFT
- Fine-tuning support
- Function calling
// Cons
- $1 free credit is small
- Smaller model catalog
// Score breakdown
Reliability (35%) (from 2m ago health check)100/100
Free Tier Generosity (30%) (computed from quota, no-CC, no-phone fields)85/100
Documentation (20%) (human rating)87/100
Popularity (15%) (GitHub stars (log-normalised), or manual baseline)86/100
Methodology: apivault.dev/methodology
// Best for
Real-time agentsFunction callingOSS inference