Groq REST API
Ultra-fast AI inference with LPU technology
Groq provides low-latency AI inference on its custom Language Processing Unit (LPU) hardware. Developers use the Groq API to run open language models such as Llama, Mixtral, and Gemma with low latency and high throughput. The API is OpenAI-compatible, so existing LLM applications can usually switch to it by changing the base URL and API key.
Base URL: https://api.groq.com/openai/v1
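Because the API follows OpenAI's conventions, a call is an HTTPS request with a bearer token sent to a path under the base URL above. The sketch below uses only the Python standard library to build such a request. The helper names `build_request` and `send`, and the `GROQ_API_KEY` environment variable, are illustrative assumptions rather than anything defined by Groq.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"


def build_request(path, payload=None, api_key=None):
    """Build (but do not send) an authenticated request for `path`.

    A GET is produced when `payload` is None, otherwise a JSON POST.
    """
    # Assumption: the key lives in an environment variable named GROQ_API_KEY.
    key = api_key or os.environ.get("GROQ_API_KEY", "gsk_YOUR_API_KEY")
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method="GET" if payload is None else "POST",
    )
    req.add_header("Authorization", f"Bearer {key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req):
    """Send the request and decode the JSON body (needs network and a valid key)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send(build_request("/models"))` would request the model list, and passing a dictionary as `payload` turns the call into a JSON POST.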
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /chat/completions | Create a chat completion with streaming or non-streaming responses |
| GET | /models | List all available models and their capabilities |
| GET | /models/{model_id} | Retrieve detailed information about a specific model |
| POST | /completions | Generate text completions from a prompt |
| POST | /embeddings | Create embeddings from input text for semantic search or RAG applications |
| POST | /audio/transcriptions | Transcribe audio files to text using Whisper models |
| POST | /audio/translations | Translate audio files to English text |
| GET | /usage | Retrieve API usage statistics and quota information |
| POST | /moderations | Check text content for policy violations |
| DELETE | /chat/completions/{completion_id} | Cancel an ongoing streaming completion request |
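| (any) | response headers | Rate limit status is reported on API responses through `x-ratelimit-*` headers (limit, remaining, reset for requests and tokens), plus `retry-after` when a limit is hit |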
| POST | /chat/completions | Function (tool) calling: pass a `tools` array in the chat completion request body; there is no separate function-calling path |
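In OpenAI-compatible APIs, function calling is normally expressed by adding a `tools` array to a chat completion request body and reading `tool_calls` from the reply. The sketch below assumes Groq follows that convention. The `get_weather` tool and both helper names are hypothetical and exist only for illustration.

```python
import json


def build_tool_payload(model, user_message):
    """Chat completion body that offers the model one callable tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from an OpenAI-style response dict."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # Arguments arrive as a JSON-encoded string, not as an object.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

The application is expected to run the named function itself and send the result back in a follow-up message; the model only proposes the call.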
Code Examples
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer gsk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
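The curl request above returns the whole answer in one response. For streaming, OpenAI-compatible APIs usually accept `"stream": true` and reply with server-sent events, one `data:` line per chunk, ending with `data: [DONE]`. The sketch below assumes Groq uses that format; `chat_payload` and `iter_stream_text` are illustrative names.

```python
import json


def chat_payload(prompt, stream=False):
    """Same request body as the curl example, with optional streaming."""
    return {
        "model": "llama-3.3-70b-versatile",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
        "max_tokens": 1024,
        "stream": stream,
    }


def iter_stream_text(lines):
    """Yield text fragments from an iterable of server-sent-event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        # Each chunk carries an incremental "delta" instead of a full message.
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

With the standard library, the lines can come from iterating over the response object returned by `urllib.request.urlopen`, which yields raw byte lines that this parser accepts as they are.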
Use Groq from Claude / Cursor / ChatGPT
Get a hosted MCP endpoint for Groq. Paste your Groq API key, copy back one URL, drop it into Claude Desktop, Cursor, or any AI client that supports remote MCP. Your AI calls Groq directly with your credentials — no local install, works on mobile.
| Tool | Description |
|---|---|
| groq_chat_completion | Generate a chat response from a Groq-hosted model such as Llama, Mixtral, or Gemma |
| groq_stream_completion | Stream a chat completion in real time for interactive applications |
| groq_function_call | Run structured function calls for tool use and API integration workflows |
| groq_list_models | Retrieve the available models and their specifications to choose a model for a task |
| groq_transcribe_audio | Transcribe audio files to text using Whisper models |
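MCP clients invoke tools like these with JSON-RPC 2.0 messages whose method is `tools/call`. The sketch below shows the general shape of such a message for `groq_chat_completion`. The argument names (`model`, `messages`) are assumptions, since the hosted endpoint's input schema is not given here; a real client would read it from the server's `tools/list` response.

```python
import json


def mcp_tool_call(tool_name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` message for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


message = mcp_tool_call(
    "groq_chat_completion",
    {
        # Hypothetical arguments: the real schema comes from the server.
        "model": "llama-3.3-70b-versatile",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(json.dumps(message, indent=2))
```

In practice the AI client builds and sends these messages itself, so this is only useful for understanding or debugging what goes over the wire.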
Connect in 60 seconds
Paste your Groq key → get an MCP URL → paste into Claude/Cursor. Hosted by IOX, encrypted at rest.