ManyLLM v1.0.0
Local-first AI platform
API Reference
OpenAI Compatible
ManyLLM provides an OpenAI-compatible API running locally on your machine.
Base URL
All API requests are made to your local ManyLLM instance.
text
http://localhost:8080
Default port is 8080. Configure in ManyLLM settings if needed.
Authentication
Requests are authenticated with a local API key sent as a bearer token in the Authorization header.
text
Authorization: Bearer your-local-api-key
The API key is generated locally and can be found in ManyLLM settings.
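Because the API is OpenAI-compatible, an existing OpenAI client can be pointed at the local instance instead of the hosted API. Below is a minimal sketch using the official openai Python package (assumed installed); the port and key are the placeholder values from above.
python
from openai import OpenAI

# Point the client at the local ManyLLM server; note the /v1 suffix on the base URL.
# "your-local-api-key" is the placeholder from the Authentication section.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-local-api-key",
)
Any other OpenAI-compatible client should work the same way, as long as it lets you override the base URL and API key.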
Chat Completions
Generate chat responses using local models.
Endpoint
text
POST /v1/chat/completions
Example Request
bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-local-api-key" \
  -d '{
    "model": "llama3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Explain quantum computing"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": true
  }'
Streaming Response
text
data: {"choices":[{"delta":{"content":"Quantum"}}]}
data: {"choices":[{"delta":{"content":" computing"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
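The stream is delivered as server-sent events in the OpenAI chunk format, so an OpenAI-compatible client can reassemble the deltas as they arrive. A minimal sketch with the openai Python package, reusing the placeholder client settings from the Authentication section:
python
from openai import OpenAI

# Same local client as in the Authentication section (placeholder values).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="your-local-api-key")

stream = client.chat.completions.create(
    model="llama3",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)

# Each chunk carries a small delta; concatenating the deltas yields the full reply.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()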
Available Models
List and manage your local models.
List Models
text
GET /v1/models
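As a sketch, the same client can list the models your local instance exposes (the output depends on which models you have installed):
python
from openai import OpenAI

# Same local client as before (placeholder values).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="your-local-api-key")

# Prints the identifier of every locally available model, e.g. "llama3".
for model in client.models.list().data:
    print(model.id)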
Supported Model Providers
- Ollama: llama3, mistral, qwen, phi3
- llama.cpp: GGUF format models
- MLX: Apple Silicon optimized models
Request Parameters
Supported parameters for chat completions. A usage sketch follows the lists below.
Required
- model: Model identifier
- messages: Array of messages
Optional
- temperature: 0.0 to 2.0
- max_tokens: Max response length
- stream: Enable streaming
- top_p: Nucleus sampling
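For reference, a non-streaming request that exercises these parameters might look like the sketch below; the parameter values are illustrative, not recommendations.
python
from openai import OpenAI

# Same local client as before (placeholder values).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="your-local-api-key")

response = client.chat.completions.create(
    model="llama3",                                              # required: model identifier
    messages=[                                                   # required: conversation so far
        {"role": "user", "content": "Explain quantum computing"},
    ],
    temperature=0.7,                                             # optional: 0.0 to 2.0
    max_tokens=500,                                              # optional: max response length
    top_p=0.9,                                                   # optional: nucleus sampling
)
print(response.choices[0].message.content)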
More Endpoints Coming
Additional endpoints for embeddings, fine-tuning, and model management are in development.