AI & Machine Learning
Bearer Token
Groq REST API
Ultra-fast AI inference with LPU technology
Groq provides lightning-fast AI inference powered by its custom Language Processing Unit (LPU) architecture. Developers use Groq's API to access state-of-the-art open language models such as Llama, Mixtral, and Gemma with industry-leading low latency and high throughput. The API is OpenAI-compatible, making it easy to integrate Groq into existing LLM applications or switch over from other providers.
Base URL
https://api.groq.com/openai/v1
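Because the API is OpenAI-compatible, any client that can send a bearer-authenticated JSON POST to this base URL works. A minimal sketch using only the Python standard library — the helper name `groq_request` and the `gsk_...` key placeholder are illustrative, not part of Groq's documentation:

```python
import json
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"

def groq_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build an authenticated POST request against Groq's OpenAI-compatible API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it requires a real key:
# with urllib.request.urlopen(groq_request("/chat/completions", body, "gsk_...")) as resp:
#     print(json.load(resp))
```

The same request shape works for every POST endpoint in the table below; only the path and payload change.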
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /chat/completions | Create a chat completion with streaming or non-streaming responses |
| GET | /models | List all available models and their capabilities |
| GET | /models/{model_id} | Retrieve detailed information about a specific model |
| POST | /completions | Generate text completions from a prompt |
| POST | /embeddings | Create embeddings from input text for semantic search or RAG applications |
| POST | /audio/transcriptions | Transcribe audio files to text using Whisper models |
| POST | /audio/translations | Translate audio files to English text |
Function calling is performed through the standard /chat/completions endpoint via the `tools` and `tool_choice` parameters; there is no separate function-call route.
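With `"stream": true`, /chat/completions returns Server-Sent Events: each event is a `data:` line carrying an OpenAI-style chunk, and the stream ends with `data: [DONE]`. A sketch of accumulating the streamed text — the sample chunks below are illustrative stand-ins, not captured Groq output:

```python
import json

def collect_stream(lines):
    """Accumulate content deltas from OpenAI-style SSE 'data:' lines."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            text.append(delta)
    return "".join(text)

# Chunks shaped like an OpenAI-compatible stream (illustrative values):
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":", world"}}]}',
    "data: [DONE]",
]
```

In a real client the lines would come from iterating over the HTTP response body rather than a list.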
Code Examples
curl https://api.groq.com/openai/v1/chat/completions \
-H "Authorization: Bearer gsk_YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.3-70b-versatile",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms"}
],
"temperature": 0.7,
"max_tokens": 1024
}'
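The non-streaming response follows the OpenAI chat-completion schema, so the assistant's reply sits at `choices[0].message.content` and token counts under `usage`. A sketch of extracting both — the response body here is abbreviated and its values are illustrative, not real API output:

```python
import json

# Abbreviated response in the OpenAI-compatible shape (illustrative values):
raw = """
{
  "model": "llama-3.3-70b-versatile",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Quantum computing uses qubits..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 24, "completion_tokens": 180, "total_tokens": 204}
}
"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]   # the assistant's answer
tokens = resp["usage"]["total_tokens"]             # billed token count
```

Checking `finish_reason` (`"stop"` vs `"length"`) tells you whether the reply was truncated by `max_tokens`.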
Connect Groq to AI
Deploy a Groq MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Groq through these tools:
groq_chat_completion
Generate AI responses using Groq's ultra-fast inference with support for multiple models including Llama, Mixtral, and Gemma
groq_stream_completion
Stream chat completions in real-time for interactive applications with minimal latency
groq_function_call
Execute structured function calls with AI models for tool use and API integration workflows
groq_list_models
Retrieve available models and their specifications to select optimal models for specific tasks
groq_transcribe_audio
Transcribe audio files to text using Whisper models with high accuracy and speed
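Under the hood, transcription calls POST a `multipart/form-data` body containing the audio file and a `model` field to /audio/transcriptions. A stdlib-only sketch of building that request — the helper name, filename, and the `whisper-large-v3` default are assumptions for illustration:

```python
import io
import uuid
import urllib.request

def build_transcription_request(audio_bytes: bytes, filename: str, api_key: str,
                                model: str = "whisper-large-v3") -> urllib.request.Request:
    """Assemble a multipart/form-data POST for Groq's audio transcription endpoint."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # Plain text field for the model name.
    body.write(f'--{boundary}\r\nContent-Disposition: form-data; '
               f'name="model"\r\n\r\n{model}\r\n'.encode())
    # Binary file field for the audio payload.
    body.write(f'--{boundary}\r\nContent-Disposition: form-data; '
               f'name="file"; filename="{filename}"\r\n'
               f'Content-Type: application/octet-stream\r\n\r\n'.encode())
    body.write(audio_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        "https://api.groq.com/openai/v1/audio/transcriptions",
        data=body.getvalue(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

In practice a library such as `requests` handles the multipart encoding for you; this shows what goes over the wire.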
Deploy in 60 seconds
Describe what you need, AI generates the code, and IOX deploys it globally.
Deploy Groq MCP Server →