Raw server access
Raw server access
Section titled “Raw server access”Direct raw-server access with no additional Nexus inference layer required is the main technical claim for LM Nexus runtime control.
What direct raw endpoint means
Section titled “What direct raw endpoint means”When users connect directly to a llama.cpp server started by LM Nexus, inference traffic does not pass through a Nexus inference proxy. In that mode, Nexus acts as an orchestrator/manager rather than an inference middleware layer.
curl http://127.0.0.1:PORT/v1/chat/completionsPORT is a placeholder in public docs until a runtime-specific endpoint is shown inside the app.
When to use it
Section titled “When to use it”Direct raw-server access is useful when you want to:
- inspect the exact endpoint being used;
- connect another tool to the runtime;
- debug provider compatibility;
- avoid routing inference through an additional Nexus inference proxy;
- keep LM Nexus acting as a manager/orchestrator around the server.
Nexus API path versus raw server path
Section titled “Nexus API path versus raw server path”The Nexus API path is for integrated workflows where LM Nexus coordinates workspace behavior.
The raw server path is for direct use of a runtime endpoint. In direct raw-server mode, the model request goes to the raw server endpoint rather than through a Nexus inference middleware layer.
Why it matters
Section titled “Why it matters”Power users often care about where inference traffic goes. Direct raw endpoint visibility makes runtime behavior easier to inspect, reproduce, and integrate with existing tools.