ManyLLM v1.0.0

Local-first AI platform

API Reference

ManyLLM provides an OpenAI-compatible API running locally on your machine.

Base URL
All API requests are made to your local ManyLLM instance.
text
http://localhost:8080

The default port is 8080; configure a different port in ManyLLM settings if needed.
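
Endpoint paths in this reference are appended to the base URL; for example, the chat completions endpoint below resolves to the following full URL.

text
http://localhost:8080/v1/chat/completions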

Authentication
Requests are authenticated with a local API key sent in the Authorization header.
text
Authorization: Bearer your-local-api-key

The API key is generated locally and can be found in ManyLLM settings.
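
For example, attaching the header to a models request (a minimal sketch, assuming the default port and the /v1/models endpoint described later in this reference):

bash
# Replace your-local-api-key with the key shown in ManyLLM settings
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer your-local-api-key"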

Chat Completions
Generate chat responses using local models.

Endpoint

text
POST /v1/chat/completions

Example Request

bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-local-api-key" \
  -d '{
    "model": "llama3",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user", 
        "content": "Explain quantum computing"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500,
    "stream": true
  }'

Streaming Response

With "stream": true set, as in the example above, the response arrives as a stream of incremental chunks ending with a [DONE] marker.

text
# Streaming response
data: {"choices":[{"delta":{"content":"Quantum"}}]}
data: {"choices":[{"delta":{"content":" computing"}}]}
data: {"choices":[{"delta":{"content":" is"}}]}
data: [DONE]
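
To receive the reply as a single JSON object instead, set "stream": false. A sketch of the expected body, assuming the standard OpenAI-compatible response shape (fields abbreviated):

text
{
  "model": "llama3",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing is..."},
      "finish_reason": "stop"
    }
  ]
}
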
Available Models
List and manage your local models.

List Models

text
GET /v1/models
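
A sketch of the expected response, assuming the standard OpenAI-compatible list format with a llama3 model available (the "owned_by" value is illustrative):

text
{
  "object": "list",
  "data": [
    {"id": "llama3", "object": "model", "owned_by": "ollama"}
  ]
}
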
Supported Model Providers
  • Ollama: llama3, mistral, qwen, phi3
  • llama.cpp: GGUF format models
  • MLX: Apple Silicon optimized models

Request Parameters
Supported parameters for chat completions; a minimal request example follows the lists below.
Required
  • model - Model identifier
  • messages - Array of messages
Optional
  • temperature - Sampling temperature, 0.0 to 2.0
  • max_tokens - Maximum number of tokens to generate
  • stream - Enable streaming responses
  • top_p - Nucleus sampling threshold
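
As noted above, a minimal request needs only the required fields; omitted optional parameters fall back to the model's defaults (a sketch, assuming default sampling settings are acceptable):

bash
# Minimal request: only model and messages are required
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-local-api-key" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}]
  }'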

More Endpoints Coming

Additional endpoints for embeddings, fine-tuning, and model management are in development.