Category: AI & Machine Learning | Authentication: Bearer Token

Groq REST API

Ultra-fast AI inference with LPU technology

Groq provides lightning-fast AI inference powered by its custom Language Processing Unit (LPU) architecture. Developers use Groq's API to access state-of-the-art language models such as Llama, Mixtral, and Gemma with industry-leading low latency and high throughput. The API is OpenAI-compatible, so existing LLM applications can integrate with Groq or switch to it with minimal changes.

Base URL: https://api.groq.com/openai/v1

API Endpoints

Method  Endpoint                            Description
POST    /chat/completions                   Create a chat completion with streaming or non-streaming responses
GET     /models                             List all available models and their capabilities
GET     /models/{model_id}                  Retrieve detailed information about a specific model
POST    /completions                        Generate text completions from a prompt
POST    /embeddings                         Create embeddings from input text for semantic search or RAG applications
POST    /audio/transcriptions               Transcribe audio files to text using Whisper models
POST    /audio/translations                 Translate audio files to English text
GET     /usage                              Retrieve API usage statistics and quota information
POST    /moderations                        Check text content for policy violations
DELETE  /chat/completions/{completion_id}   Cancel an ongoing streaming completion request
GET     /rate_limits                        Get current rate limit status and remaining quota
POST    /chat/completions/function_call     Create chat completions with function calling capabilities
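
As a quick illustration, the GET /models endpoint above can be called with nothing but the Python standard library. This is a sketch, not official client code: the `list_models` helper and the `GROQ_API_KEY` environment variable are assumptions, not part of any Groq SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"

def list_models(api_key=None):
    """Return the model objects from GET /models."""
    api_key = api_key or os.environ["GROQ_API_KEY"]  # assumed env var name
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"]

if __name__ == "__main__":
    # Print the id of every model the account can access.
    for model in list_models():
        print(model["id"])
```

The response follows the OpenAI list format: a JSON object whose "data" field holds one object per model.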

Code Examples

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer gsk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
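
The same request can be issued programmatically. A minimal Python sketch using only the standard library (the `build_payload` and `chat_completion` helpers are illustrative names, not part of any Groq SDK; the model id matches the curl example above):

```python
import json
import os
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"

def build_payload(messages, model="llama-3.3-70b-versatile", **params):
    """Assemble the JSON body for POST /chat/completions."""
    return {"model": model, "messages": messages, **params}

def chat_completion(messages, api_key=None, **params):
    """Send a chat completion request and return the reply text."""
    api_key = api_key or os.environ["GROQ_API_KEY"]  # assumed env var name
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(messages, **params)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    # The assistant's reply lives in the first choice's message.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_completion(
        [{"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Explain quantum computing in simple terms"}],
        temperature=0.7,
        max_tokens=1024,
    ))
```

Because the API is OpenAI-compatible, any OpenAI client library should also work by pointing its base URL at https://api.groq.com/openai/v1.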

Connect Groq to AI

Deploy a Groq MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Groq through these tools:

groq_chat_completion: Generate AI responses using Groq's ultra-fast inference, with support for multiple models including Llama, Mixtral, and Gemma
groq_stream_completion: Stream chat completions in real time for interactive applications with minimal latency
groq_function_call: Execute structured function calls with AI models for tool use and API integration workflows
groq_list_models: Retrieve available models and their specifications to select optimal models for specific tasks
groq_transcribe_audio: Transcribe audio files to text using Whisper models with high accuracy and speed
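
A tool like groq_transcribe_audio ultimately maps to POST /audio/transcriptions, which expects a multipart/form-data upload. A stdlib-only sketch of that request (the helper names are illustrative, and "whisper-large-v3" is an assumed model id; check Groq's model list for current ids):

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "https://api.groq.com/openai/v1"

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path, model="whisper-large-v3"):
    """Upload an audio file and return the transcribed text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": model}, "file", os.path.basename(path), audio)
    req = urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["text"]
```

In practice a deployed MCP server wraps a request like this behind the groq_transcribe_audio tool, so the AI client never handles the multipart encoding itself.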

Deploy in 60 seconds

Describe what you need, AI generates the code, and IOX deploys it globally.

Deploy Groq MCP Server →

Related APIs