## Available Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/chat | Non-streaming chat completion |
| POST | /api/chat/stream | Streaming chat completion (SSE) |
| GET | /api/chat?q=query | Simple query via GET |
| GET | /api/chat/stream?q=query | Streaming query via GET |
| GET | /health | System health & queue status |
| GET | /api/stats | Model performance statistics |
| GET | /docs | This documentation page |
## Request Format (POST)

```json
{
  "prompt": "Your question here"
}
```

Alternatively, send a `messages` array:

```json
{
  "messages": [
    {"role": "user", "content": "Your message"}
  ]
}
```
## Response Format

```json
{
  "success": true,
  "model": "qwen/qwen-2.5-coder-32b-instruct:free",
  "intent": "coding",
  "retryAttempt": 0,
  "response": "AI response text here...",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}
```
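Only the success shape is documented above, so clients should check `success` before reading `response`. Here is a minimal defensive handler; the `error` field consulted on failure is an assumption, not a documented part of the API:

```javascript
// Minimal response handler. `success`, `response`, `model`, and `usage`
// come from the documented shape above; the `error` field is an assumption.
function handleChatResponse(data) {
  if (!data.success) {
    throw new Error(data.error || 'Chat request failed');
  }
  return {
    text: data.response,
    model: data.model,
    totalTokens: data.usage ? data.usage.total_tokens : 0,
  };
}
```

Usage: `const { text } = handleChatResponse(await response.json());`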
## Automatic Intent Detection

The API automatically analyzes your prompt and routes it to the best model. No manual selection needed!
| Intent | Trigger Keywords | Best For |
|--------|------------------|----------|
| Coding | python, javascript, function, api, debug, code, algorithm | Programming tasks, code generation, debugging |
| Math | calculus, equation, calculate, matrix, solve, derivative | Mathematical problems, calculations, formulas |
| Reasoning | analyze, explain, compare, logic, why, evaluate | Complex analysis, logical reasoning, comparisons |
| Creative | story, poem, creative, narrative, write, compose | Creative writing, storytelling, content generation |
| General | (default fallback) | General questions and conversations |
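The keyword lists above suggest how the routing could work. The real classifier runs server-side and its logic is not published; this is only an illustrative sketch built from the table:

```javascript
// Illustrative keyword-based intent detection mirroring the table above.
// The API's actual server-side classifier may differ; this is an assumption.
const INTENT_KEYWORDS = {
  coding: ['python', 'javascript', 'function', 'api', 'debug', 'code', 'algorithm'],
  math: ['calculus', 'equation', 'calculate', 'matrix', 'solve', 'derivative'],
  reasoning: ['analyze', 'explain', 'compare', 'logic', 'why', 'evaluate'],
  creative: ['story', 'poem', 'creative', 'narrative', 'write', 'compose'],
};

function detectIntent(prompt) {
  const lower = prompt.toLowerCase();
  for (const [intent, keywords] of Object.entries(INTENT_KEYWORDS)) {
    if (keywords.some((kw) => lower.includes(kw))) return intent;
  }
  return 'general'; // default fallback
}
```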
## Available AI Models

All models are FREE! Powered by OpenRouter's free tier.
| Model | Specialty | Max Tokens |
|-------|-----------|------------|
| Qwen 2.5 Coder 32B | Code generation, debugging | 4,096 |
| DeepSeek Chat v3.1 | Advanced coding, technical | 8,192 |
| DeepSeek R1 | Reasoning, analysis, math | 8,192 |
| Llama 3.3 8B | General purpose, balanced | 4,096 |
| MiniMax M2 | Creative writing, storytelling | 8,192 |
## Model Selection Features
- Health Monitoring: Circuit breaker disables failing models
- Auto Retry: Up to 3 attempts with exponential backoff
- Load Balancing: Health-aware round-robin distribution
- Fallback System: Emergency fallback if all models fail
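The retry behavior above can be sketched as exponential backoff. This is a client-side illustration only; the server performs its own retries, and the base delay here is an assumed value:

```javascript
// Retry a task up to 3 times with exponential backoff (100ms, 200ms, ...).
// The attempt count matches the docs above; the base delay is an assumption.
async function withRetry(task, attempts = 3, baseDelayMs = 100) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task(i);
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Double the delay after each failed attempt.
        const delay = baseDelayMs * 2 ** i;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Usage: `const data = await withRetry(() => fetch('/api/chat', opts).then((r) => r.json()));`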
## cURL Examples

### POST Request (Non-Streaming)

```shell
curl -X POST https://mutlimodel-ai-api.onrender.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a hello world in Python"}'
```
### POST Request (Streaming)

```shell
curl -N -X POST https://mutlimodel-ai-api.onrender.com/api/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain async/await"}'
```
### GET Request

```shell
curl "https://mutlimodel-ai-api.onrender.com/api/chat?q=What+is+JavaScript"
```
## JavaScript/Fetch Examples

### Non-Streaming

```javascript
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Explain async/await in JavaScript'
  })
});

const data = await response.json();
console.log(data.response);
```
### Streaming with Fetch

```javascript
const response = await fetch('/api/chat/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Tell me a story'
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } handles multi-byte characters split across chunks.
  const text = decoder.decode(value, { stream: true });
  console.log(text);
}
```
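Since the streaming endpoint uses SSE, each decoded chunk typically carries one or more `data:` lines. A minimal parser sketch follows; the exact event payload format, including the `[DONE]` sentinel, is an assumption, not documented above:

```javascript
// Extract payloads from SSE-formatted text: lines beginning "data: ".
// The server's exact event format (e.g. a "[DONE]" sentinel) is an assumption.
function parseSSEChunk(chunk) {
  return chunk
    .split('\n')
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice('data: '.length))
    .filter((payload) => payload !== '[DONE]');
}
```

In the streaming loop above, replace `console.log(text)` with `parseSSEChunk(text).forEach((payload) => console.log(payload))` to log only the event payloads.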
## Python Example

```python
import requests

response = requests.post(
    'https://mutlimodel-ai-api.onrender.com/api/chat',
    json={'prompt': 'Calculate 25 * 37'}
)
data = response.json()
print(data['response'])
```
## Node.js Example

```javascript
// Node 18+ ships a global fetch; on older versions, install node-fetch v2.
const fetch = require('node-fetch');

async function chat() {
  const response = await fetch('https://mutlimodel-ai-api.onrender.com/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      prompt: 'Explain promises in JavaScript'
    })
  });

  const data = await response.json();
  console.log(data.response);
}

chat();
```