---
name: jatevo.ai
description: jatevo.ai hosts two large language model inference endpoints: a GPT-OSS 120B open-source model and DeepSeek V3.1. Both accept chat-formatted message arrays and return completions, with billing handled per-call via USDC on Base.
host: jatevo.ai
---

# jatevo.ai

jatevo.ai is a pay-per-call LLM inference host for agents and developers who need large open-source models without managing their own infrastructure. It offers two models, GPT-OSS 120B (up to 65536-token context) and DeepSeek V3.1, both accessible through a standard chat completion interface. It suits text generation, analysis, summarization, and multi-turn conversation workloads where cost is settled on-chain in USDC on Base.

## When to use this host

Use jatevo.ai when an agent needs on-demand LLM text completion from a hosted open-source model with per-call USDC billing and no infrastructure setup. Choose `invoke-gpt-oss-120b-chat` when large context windows (up to 65536 tokens) are required. Choose `query-deepseek-v3-1-llm` when DeepSeek V3.1 output quality or style is preferred and token usage reporting is needed. Do not use this host for image generation, embeddings, vector search, or real-time data retrieval; those require dedicated endpoints that are not available here. Do not use either skill when a proprietary closed-source model (e.g., GPT-4o, Claude) is explicitly required; route to the OpenAI or Anthropic APIs instead. In latency-sensitive contexts, avoid calling `query-deepseek-v3-1-llm` with `stream=false` and a large `max_tokens`, because a non-streamed call returns nothing until the whole completion has been generated.
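
The selection guidance above can be sketched as a small routing function. This is only an illustration: `choose_skill`, its parameters, and the task labels are hypothetical names, and the tie-breaking rules (token usage reporting wins over large context, DeepSeek V3.1 as the fallback) are assumptions rather than documented behavior.

```python
from typing import Optional

GPT_OSS = "invoke-gpt-oss-120b-chat"
DEEPSEEK = "query-deepseek-v3-1-llm"

# Hypothetical task labels that mirror the exclusions listed above.
UNSUPPORTED_TASKS = {
    "image_generation",
    "embeddings",
    "vector_search",
    "realtime_retrieval",
}


def choose_skill(
    task: str,
    needs_large_context: bool = False,
    needs_token_usage: bool = False,
    requires_closed_model: bool = False,
) -> Optional[str]:
    """Return a skill name, or None when the request belongs on another host."""
    if task in UNSUPPORTED_TASKS or requires_closed_model:
        return None
    # Assumption: if both flags are set, token usage reporting takes priority.
    if needs_token_usage:
        return DEEPSEEK
    if needs_large_context:
        return GPT_OSS
    # Assumption: an arbitrary default when neither requirement applies.
    return DEEPSEEK
```

A caller would treat a `None` result as "route elsewhere", for example to a dedicated embeddings service or a proprietary model API.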

## Capabilities

### Chat Completion Inference

Both skills provide chat-formatted LLM inference, accepting message arrays with configurable temperature, max tokens, and optional streaming. Together they give agents a choice between two hosted open-source models for text generation tasks.

- **`invoke-gpt-oss-120b-chat`** — Sends a chat message array to the hosted GPT-OSS 120B open-source model and returns a full chat completion response with assistant message content.
- **`query-deepseek-v3-1-llm`** — Sends a chat-formatted message array to DeepSeek V3.1 and returns a completion with role, content, finish reason, and token usage statistics.

## Workflows

### Model Comparison Completion

*Use when an agent needs to compare outputs from two different LLMs on the same prompt to evaluate quality, consistency, or stylistic differences before selecting a model for downstream use.*

1. **`invoke-gpt-oss-120b-chat`** — Send the message array to the GPT-OSS 120B model and capture its completion.
2. **`query-deepseek-v3-1-llm`** — Send the identical message array to DeepSeek V3.1 and capture its completion and token usage statistics for side-by-side comparison.
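
The two steps above can be sketched as follows. Because this document does not specify how a skill is invoked or paid for, the transport is passed in as a hypothetical `call_skill` callable, and `fake_call_skill` is a local stand-in that performs no network call. The `choices[0]["message"]["content"]` path assumes a conventional chat.completion layout.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]
# Hypothetical transport: (skill name, request payload) -> parsed JSON response.
SkillCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

SKILLS = ("invoke-gpt-oss-120b-chat", "query-deepseek-v3-1-llm")


def compare_models(
    messages: List[Message],
    call_skill: SkillCaller,
    max_tokens: int = 512,
    temperature: float = 0.7,
) -> Dict[str, str]:
    """Send the same prompt to both skills and collect each completion text."""
    results: Dict[str, str] = {}
    for skill in SKILLS:
        payload = {
            "messages": list(messages),  # identical message array for both models
            "stream": False,
            "max_tokens": max_tokens,
            "temperature": temperature,
        }
        response = call_skill(skill, payload)
        # Assumed response layout: choices[0].message.content
        results[skill] = response["choices"][0]["message"]["content"]
    return results


def fake_call_skill(skill: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real transport; echoes the last message back."""
    last = payload["messages"][-1]["content"]
    return {
        "choices": [
            {"message": {"role": "assistant", "content": f"[{skill}] echo: {last}"}}
        ]
    }


outputs = compare_models(
    [{"role": "user", "content": "List three benefits of using open-source LLMs."}],
    fake_call_skill,
)
```

Holding `max_tokens` and `temperature` constant across both calls keeps the comparison about the models rather than the sampling settings.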

## Skill reference

### `invoke-gpt-oss-120b-chat`

**Jatevo GPT-OSS 120B Chat Completion** — Sends a chat message array to the hosted GPT-OSS 120B open-source model and returns a full chat completion response with assistant message content.

*Use when:* An agent or user needs a large-context (up to 65536 tokens) open-source LLM completion for tasks such as drafting documents, analysis, summarization, or multi-turn conversation, and cost is billed per-call via USDC on Base.

*Not for:* Real-time UI scenarios that need token-by-token output, unless stream=true is set; cases where a proprietary closed-source model is explicitly required.

**Inputs:**

- `messages` (array, required) — Ordered list of chat messages. Each message must have a 'role' (one of: system, user, assistant) and a 'content' string. Minimum 1 item.
- `stream` (boolean) — If true, the response is streamed as server-sent events. Defaults to false for a single JSON response.
- `max_tokens` (integer) — Maximum number of tokens to generate in the response. Defaults to 65536.
- `temperature` (number) — Sampling temperature between 0 and 2. Higher values produce more varied output. Defaults to 1.

**Returns:** A chat.completion JSON object with an id, a created timestamp, the model name, and a choices array containing the assistant's message content.

**Example:** `{"messages":[{"role":"system","content":"You are a concise technical writer."},{"role":"user","content":"List three benefits of using open-source LLMs."}],"stream":false,"max_tokens":512,"temperature":0.7}`
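
As a rough illustration of the input rules above, the sketch below builds a request payload and checks it locally before it is sent. `build_chat_payload` is a hypothetical helper, not part of the service; the defaults mirror the documented ones for this skill, and the checks only encode the constraints stated in the input list.

```python
from typing import Any, Dict, List

VALID_ROLES = {"system", "user", "assistant"}


def build_chat_payload(
    messages: List[Dict[str, str]],
    stream: bool = False,
    max_tokens: int = 65536,   # documented default for this skill
    temperature: float = 1.0,  # documented default for this skill
) -> Dict[str, Any]:
    """Validate the documented constraints and return a request payload."""
    if not messages:
        raise ValueError("messages must contain at least one item")
    for index, message in enumerate(messages):
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"messages[{index}].role must be one of {sorted(VALID_ROLES)}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"messages[{index}].content must be a string")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    return {
        "messages": messages,
        "stream": stream,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


payload = build_chat_payload(
    [
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "List three benefits of using open-source LLMs."},
    ],
    max_tokens=512,
    temperature=0.7,
)
```

Validating locally avoids paying for a call that the endpoint would reject; whether the service applies exactly these checks is an assumption.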

---

### `query-deepseek-v3-1-llm`

**DeepSeek LLM v3.1 Inference** — Sends a chat-formatted message array to DeepSeek V3.1 and returns a completion with role, content, finish reason, and token usage statistics.

*Use when:* An agent or user needs a single-shot or streamed text completion from DeepSeek V3.1, including multi-turn conversations with system, user, and assistant turns.

*Not for:* Image generation, embeddings, or real-time data retrieval, which require dedicated endpoints. Not suitable for latency-critical applications when stream=false and the response is long.

**Inputs:**

- `messages` (array, required) — Ordered list of conversation turns. Each object must have 'role' (system, user, or assistant) and 'content' (string). Minimum 1 item.
- `stream` (boolean) — If true, response is streamed incrementally. Defaults to false for a single complete response.
- `max_tokens` (integer) — Maximum number of tokens to generate in the completion. Defaults to 4096.
- `temperature` (number) — Sampling temperature between 0 and 2. Lower values produce more deterministic output. Defaults to 0.7.

**Returns:** A chat.completion object with the assistant's message content, a finish_reason (e.g., 'stop'), and token usage, including prompt and completion token counts and throughput metrics.

**Example:** `{"stream":false,"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of France?"}],"max_tokens":256,"temperature":0.7}`
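
A minimal sketch of reading the returned fields is shown below. The key names (`choices`, `message`, `finish_reason`, `usage`, `prompt_tokens`, `completion_tokens`) are assumed from the conventional chat.completion layout, not confirmed by this document, and `sample_response` is a hand-written illustration, not output captured from the service. The names of the throughput fields are unknown, so the raw usage dictionary is kept as-is.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Completion:
    role: str
    content: str
    finish_reason: Optional[str]
    prompt_tokens: Optional[int]
    completion_tokens: Optional[int]
    raw_usage: Dict[str, Any] = field(default_factory=dict)


def parse_completion(response: Dict[str, Any]) -> Completion:
    """Extract the documented fields from a non-streamed response."""
    choice = response["choices"][0]
    message = choice["message"]
    usage = response.get("usage") or {}  # tolerate a missing usage block
    return Completion(
        role=message["role"],
        content=message["content"],
        finish_reason=choice.get("finish_reason"),
        prompt_tokens=usage.get("prompt_tokens"),
        completion_tokens=usage.get("completion_tokens"),
        raw_usage=dict(usage),  # throughput metrics, if any, stay available here
    )


# Hand-written sample with assumed key names and made-up token counts.
sample_response = {
    "object": "chat.completion",
    "choices": [
        {
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 20, "completion_tokens": 2},
}

completion = parse_completion(sample_response)
```

Checking `finish_reason` before using `content` lets a caller notice a completion that ended for a reason other than a natural stop, such as reaching `max_tokens`.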

---
