Skip to content

Raw server access

Direct raw-server access with no additional Nexus inference layer required is the main technical claim for LM Nexus runtime control.

When users connect directly to a llama.cpp server started by LM Nexus, inference traffic does not pass through a Nexus inference proxy. In that mode, Nexus acts as an orchestrator/manager rather than an inference middleware layer.

Terminal window
curl http://127.0.0.1:PORT/v1/chat/completions

PORT is a placeholder in public docs until a runtime-specific endpoint is shown inside the app.

Direct raw-server access is useful when you want to:

  • inspect the exact endpoint being used;
  • connect another tool to the runtime;
  • debug provider compatibility;
  • avoid routing inference through an additional Nexus inference proxy;
  • keep LM Nexus acting as a manager/orchestrator around the server.

The Nexus API path is for integrated workflows where LM Nexus coordinates workspace behavior.

The raw server path is for direct use of a runtime endpoint. In direct raw-server mode, the model request goes to the raw server endpoint rather than through a Nexus inference middleware layer.

Power users often care about where inference traffic goes. Direct raw endpoint visibility makes runtime behavior easier to inspect, reproduce, and integrate with existing tools.