Raw server access

Direct raw-server access with no additional Nexus inference layer required is the main technical claim for LM Nexus runtime control.

What direct raw endpoint means

When users connect directly to a llama.cpp server started by LM Nexus, inference traffic does not pass through a Nexus inference proxy. In that mode, Nexus acts as an orchestrator/manager rather than an inference middleware layer.

curl http://127.0.0.1:PORT/v1/chat/completions

PORT is a placeholder in public docs until a runtime-specific endpoint is shown inside the app.

When to use it

Direct raw-server access is useful when you want to:

inspect the exact endpoint being used;
connect another tool to the runtime;
debug provider compatibility;
avoid routing inference through an additional Nexus inference proxy;
keep LM Nexus acting as a manager/orchestrator around the server.

Nexus API path versus raw server path

The Nexus API path is for integrated workflows where LM Nexus coordinates workspace behavior.

The raw server path is for direct use of a runtime endpoint. In direct raw-server mode, the model request goes to the raw server endpoint rather than through a Nexus inference middleware layer.

Why it matters

Power users often care about where inference traffic goes. Direct raw endpoint visibility makes runtime behavior easier to inspect, reproduce, and integrate with existing tools.