ManyLLM v1.0.0

Local-first AI platform

API Reference

ManyLLM provides an OpenAI-compatible API running locally on your machine.

Base URL
All API requests are made to your local ManyLLM instance.
text
http://localhost:8080

The default port is 8080; configure a different port in ManyLLM settings if needed.
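
Endpoint paths in this reference are appended to the base URL; for example, the chat completions endpoint below resolves to the following full URL.

text
http://localhost:8080/v1/chat/completions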

Authentication
Requests are authenticated with a local API key sent in the Authorization header.
text
Authorization: Bearer your-local-api-key

The API key is generated locally and can be found in ManyLLM settings.
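
For example, attaching the header to a models request (a minimal sketch, assuming the default port and the /v1/models endpoint described later in this reference):

bash
# Replace your-local-api-key with the key shown in ManyLLM settings
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer your-local-api-key"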

Chat Completions
Generate chat responses using local models.

Endpoint

text
POST /v1/chat/completions

Example Request

bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-local-api-key" \
  -d '{
    "model": "llama3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user", 
        "content": "Explain quantum computing"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": true
  }'

Streaming Response

With "stream": true set, as in the example above, the response arrives as a stream of incremental chunks ending with a [DONE] marker.

text
# Streaming response
data: {"choices":[{"delta":{"content":"Quantum"}}]}
data: {"choices":[{"delta":{"content":" computing"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
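
To receive the reply as a single JSON object instead, set "stream": false. A sketch of the expected body, assuming the standard OpenAI-compatible response shape (fields abbreviated):

text
{
  "model": "llama3",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing is..."},
      "finish_reason": "stop"
    }
  ]
}
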
Available Models
List and manage your local models.

List Models

text
GET /v1/models
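
A sketch of the expected response, assuming the standard OpenAI-compatible list format with a llama3 model available (the "owned_by" value is illustrative):

text
{
  "object": "list",
  "data": [
    {"id": "llama3", "object": "model", "owned_by": "ollama"}
  ]
}
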
Supported Model Providers
  • Ollama: llama3, mistral, qwen, phi3
  • llama.cpp: GGUF format models
  • MLX: Apple Silicon optimized models

Request Parameters
Supported parameters for chat completions; a minimal request example follows the lists below.
Required
  • model - Model identifier
  • messages - Array of messages
Optional
  • temperature - Sampling temperature, 0.0 to 2.0
  • max_tokens - Maximum number of tokens to generate
  • stream - Enable streaming responses
  • top_p - Nucleus sampling threshold
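
As noted above, a minimal request needs only the required fields; omitted optional parameters fall back to the model's defaults (a sketch, assuming default sampling settings are acceptable):

bash
# Minimal request: only model and messages are required
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-local-api-key" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}]
  }'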

More Endpoints Coming

Additional endpoints for embeddings, fine-tuning, and model management are in development.