Category: AI & Machine Learning | Authentication: Bearer Token

Groq REST API

Ultra-fast AI inference with LPU technology

Groq provides lightning-fast AI inference powered by its custom Language Processing Unit (LPU) architecture. Developers use Groq's API to access state-of-the-art language models such as Llama, Mixtral, and Gemma with industry-leading low latency and high throughput. The API is OpenAI-compatible, so existing LLM applications can integrate with Groq or switch to it with minimal changes.

Base URL: https://api.groq.com/openai/v1

API Endpoints

Method  Endpoint                            Description
POST    /chat/completions                   Create a chat completion with streaming or non-streaming responses
GET     /models                             List all available models and their capabilities
GET     /models/{model_id}                  Retrieve detailed information about a specific model
POST    /completions                        Generate text completions from a prompt
POST    /embeddings                         Create embeddings from input text for semantic search or RAG applications
POST    /audio/transcriptions               Transcribe audio files to text using Whisper models
POST    /audio/translations                 Translate audio files to English text
GET     /usage                              Retrieve API usage statistics and quota information
POST    /moderations                        Check text content for policy violations
DELETE  /chat/completions/{completion_id}   Cancel an ongoing streaming completion request
GET     /rate_limits                        Get current rate limit status and remaining quota
POST    /chat/completions/function_call     Create chat completions with function calling capabilities
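
As a quick illustration, the GET /models endpoint above can be called with nothing but the Python standard library. This is a sketch, not official client code: the `list_models` helper and the `GROQ_API_KEY` environment variable are assumptions, not part of any Groq SDK.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"

def list_models(api_key=None):
    """Return the model objects from GET /models."""
    api_key = api_key or os.environ["GROQ_API_KEY"]  # assumed env var name
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"]

if __name__ == "__main__":
    # Print the id of every model the account can access.
    for model in list_models():
        print(model["id"])
```

The response follows the OpenAI list format: a JSON object whose "data" field holds one object per model.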

Code Examples

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer gsk_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
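
The same request can be issued programmatically. A minimal Python sketch using only the standard library (the `build_payload` and `chat_completion` helpers are illustrative names, not part of any Groq SDK; the model id matches the curl example above):

```python
import json
import os
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1"

def build_payload(messages, model="llama-3.3-70b-versatile", **params):
    """Assemble the JSON body for POST /chat/completions."""
    return {"model": model, "messages": messages, **params}

def chat_completion(messages, api_key=None, **params):
    """Send a chat completion request and return the reply text."""
    api_key = api_key or os.environ["GROQ_API_KEY"]  # assumed env var name
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(messages, **params)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    # The assistant's reply lives in the first choice's message.
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_completion(
        [{"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Explain quantum computing in simple terms"}],
        temperature=0.7,
        max_tokens=1024,
    ))
```

Because the API is OpenAI-compatible, any OpenAI client library should also work by pointing its base URL at https://api.groq.com/openai/v1.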

Connect Groq to AI

Deploy a Groq MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Groq through these tools:

groq_chat_completion: Generate AI responses using Groq's ultra-fast inference, with support for multiple models including Llama, Mixtral, and Gemma
groq_stream_completion: Stream chat completions in real time for interactive applications with minimal latency
groq_function_call: Execute structured function calls with AI models for tool use and API integration workflows
groq_list_models: Retrieve available models and their specifications to select optimal models for specific tasks
groq_transcribe_audio: Transcribe audio files to text using Whisper models with high accuracy and speed
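
A tool like groq_transcribe_audio ultimately maps to POST /audio/transcriptions, which expects a multipart/form-data upload. A stdlib-only sketch of that request (the helper names are illustrative, and "whisper-large-v3" is an assumed model id; check Groq's model list for current ids):

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "https://api.groq.com/openai/v1"

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode form fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path, model="whisper-large-v3"):
    """Upload an audio file and return the transcribed text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": model}, "file", os.path.basename(path), audio)
    req = urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": content_type,
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["text"]
```

In practice a deployed MCP server wraps a request like this behind the groq_transcribe_audio tool, so the AI client never handles the multipart encoding itself.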

Deploy in 60 seconds

Describe what you need, AI generates the code, and IOX deploys it globally.

Deploy Groq MCP Server →

Related APIs