NexWave AI: A Production LLM Agent That Queries Live ERP Data
We built a production-grade LLM agent embedded inside an ERP, giving non-technical users answers to P&L, receivables, inventory, and sales questions in plain English. No training, no report navigation, no SQL.

Background
New Zealand and Australian businesses running NexWave (an ERP platform for the NZ/AU market) face a familiar problem: the data they need is in the system, but extracting it requires knowing which report to run, which filters to apply, and how to interpret the output.
An operations manager who just wants to answer “what are our overdue receivables?” has to remember that the right report is Accounts Receivable Summary, that it needs a company filter, that ageing buckets default to 30/60/90/120, and that the output needs to be exported to Excel to be shareable. A finance director who wants a year-on-year P&L comparison has to run the report twice with different fiscal year filters and stitch the numbers together manually.
HighFlyer was asked to close that gap. The brief was deceptively simple: make the ERP answer questions in plain English, with charts and tables, accurately, without hallucinating, and without letting users see data they are not authorised to access.
The Challenge
Building an LLM agent that reads from a production ERP is not the same as wiring up a chatbot on top of a documentation corpus. Five constraints made this harder than the marketing demos suggest:
- Truth matters. Finance teams cannot tolerate hallucinated numbers. Every figure the assistant produces must come from a real query against live data, traceable back to the source document.
- Permissions must be honoured. ERP data is role-scoped. A warehouse staff member should not be able to ask the AI “show me all salaries” and get an answer. The agent must run with the user’s own permissions, not a service account.
- Tool-calling is stateful. The OpenAI Chat Completions protocol requires every tool message to be paired with its originating assistant message containing the tool_calls array. Naive history trimming breaks this invariant and the API rejects the request.
- Failures are normal. Rate limits, timeouts, model errors, and context-window overflow all happen in production. The agent cannot simply stop working when any of these occur.
- Output must be presentable. Raw JSON from an ERP report is not an answer. Users need formatted tables, charts, and clickable links to source documents.
The Solution by HighFlyer
We designed and shipped NexWave AI: an agentic assistant embedded directly inside the NexWave web interface. It uses the Chat Completions API with function calling, a curated tool registry, permission-scoped execution, streaming progress updates, and a chat-native rendering layer.

Architecture

The system has four layers:
- Agent loop: A Python loop that calls the LLM, executes any requested tools, appends results to the conversation, and loops until the model returns a final text answer (or hits a safety bound of ten rounds).
- Tool registry: A curated set of six tools with strict JSON Schema definitions: fetch a document, list documents, count documents, inspect a doctype’s fields, run a predefined ERP report, and evaluate a mathematical expression. Every tool is a plain Python callable; the registry maps names to functions.
- Permission layer: Tools execute as the signed-in user. The ERP’s existing permission system (user-level role checks, document-level permissions, field-level access) is reused verbatim. No separate AI permission model. If a user cannot see a document through the UI, the AI cannot see it either.
- Rendering layer: Responses are streamed back to a chat interface that renders Markdown, tables, fenced chart blocks (bar, line, pie, donut), clickable document links, and typing indicators.
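The agent loop at the heart of this architecture is conceptually simple. The sketch below shows the shape of it, with the model call and tool executor injected so the control flow stands alone; names like run_agent and the message dict layout are illustrative, not the shipped code:

```python
import json

MAX_ROUNDS = 10  # safety bound on reasoning rounds

def run_agent(messages, call_model, execute_tool, max_rounds=MAX_ROUNDS):
    """Call the LLM, run any requested tools, append results to the
    conversation, and loop until the model returns plain text or the
    round limit is hit. Message dicts follow the Chat Completions shape."""
    for _ in range(max_rounds):
        msg = call_model(messages)  # e.g. client.chat.completions.create(...)
        if not msg.get("tool_calls"):
            return msg["content"]  # final text answer
        messages.append(msg)  # keep the assistant message with tool_calls
        for call in msg["tool_calls"]:
            result = execute_tool(call["function"]["name"],
                                  json.loads(call["function"]["arguments"]))
            # Pair every tool result with its originating tool_call_id
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    return "Round limit reached without a final answer."
```

The safety bound matters: without it, a model stuck retrying a failing tool would loop forever.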
The Tool Registry
We deliberately chose a small, precise set of tools rather than letting the model write arbitrary database queries:
| Tool | Purpose |
|---|---|
| get_document | Fetch a single ERP document by name or filter |
| list_documents | Query documents with filters, ordering, and pagination |
| get_document_count | Count documents matching filters |
| get_doctype_fields | Introspect a doctype’s schema (including custom fields) |
| run_report | Execute a predefined ERP report (P&L, Balance Sheet, AR, AP, Stock Balance, Sales Analytics, and more) |
| calculate_expression | Evaluate a mathematical expression safely |
This scope is narrow by design. The model cannot invoke arbitrary Python, cannot write SQL, cannot touch the file system, and cannot call out to other services. Every action it takes is an explicit, auditable tool call with a documented schema.
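Each tool is declared to the model with a strict JSON Schema. A plausible definition for one of the simpler tools looks like this; the parameter names and descriptions here are illustrative assumptions, not the shipped schemas:

```python
# Illustrative JSON Schema tool definition in the Chat Completions
# function-calling format. Parameter names are assumptions.
get_document_count_tool = {
    "type": "function",
    "function": {
        "name": "get_document_count",
        "description": "Count ERP documents of a doctype matching filters.",
        "parameters": {
            "type": "object",
            "properties": {
                "doctype": {
                    "type": "string",
                    "description": "Doctype name, e.g. 'Sales Invoice'",
                },
                "filters": {
                    "type": "object",
                    "description": "Field/value pairs, e.g. {'status': 'Overdue'}",
                },
            },
            "required": ["doctype"],
        },
    },
}
```

The stricter the schema, the fewer malformed tool calls the model emits, which is why every parameter carries a description and required fields are spelled out.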
Why calculate_expression Exists
LLMs are notoriously bad at arithmetic. Asking a model to sum a column of thirty numbers accurately is a recipe for subtle errors that finance teams will absolutely notice.
The system prompt instructs the agent never to compute totals, averages, or percentages in its head, and to call calculate_expression for every arithmetic operation. The tool is a safe, restricted expression evaluator: it supports +, -, *, /, %, **, parentheses, and a whitelisted set of math functions. No eval, no variable binding, no attribute access.
This single design decision eliminated an entire class of errors that otherwise would have required every response to be manually reconciled.
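One way to build such an evaluator is to walk Python's own abstract syntax tree, allowing only numeric constants, arithmetic operators, and whitelisted function calls. The sketch below is a minimal version of that idea; the exact whitelist in the shipped tool may differ:

```python
# Restricted arithmetic evaluator: AST-based, no eval(), no names,
# no attribute access. The function whitelist is illustrative.
import ast
import math
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}
_FUNCS = {"sqrt": math.sqrt, "abs": abs, "round": round}

def calculate_expression(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS):
            return _FUNCS[node.func.id](*[ev(a) for a in node.args])
        raise ValueError("Disallowed expression element")
    return ev(ast.parse(expr, mode="eval"))
```

Anything outside the whitelist, including attribute access and arbitrary names, raises an error rather than executing.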
Permission-Aware Execution
The tool implementations call into the ERP’s standard APIs using the authenticated session’s identity. When a sales rep asks “show me our P&L,” the report execution checks whether that user has permission to run the Profit and Loss Statement report. If not, they receive an error instead of numbers.
This is the right default. We considered the alternative, a privileged service account that always has access, with an application-layer permission check, and rejected it. Every additional permission model is a future security bug. Reusing the ERP’s existing permissions means the AI cannot accidentally out-disclose what the rest of the platform already controls.
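The pattern is easy to state in code. In production the check is the ERP's own permission API (on Frappe, frappe.has_permission); in this standalone sketch the check is injected, and the wrapper and error wording are illustrative:

```python
# Permission-scoped tool execution: the tool refuses to run for
# doctypes the signed-in user cannot read. has_permission would be
# the ERP's real check in production; here it is injected.
def permission_scoped(tool_fn, has_permission):
    def wrapper(doctype, *args, **kwargs):
        if not has_permission(doctype, "read"):
            # Structured error the model can surface to the user
            return {"error": f"You are not permitted to read {doctype} records."}
        return tool_fn(doctype, *args, **kwargs)
    return wrapper
```

Because the wrapper returns a structured error rather than raising, the model can explain the refusal to the user in plain language instead of the conversation dying.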
Technical Deep-Dive
Preserving Tool-Call Pairs During History Trimming
Long conversations eventually hit the model’s context window. The natural fix is to keep only the last N messages. The wrong way to do it breaks tool-call integrity: if an assistant message with a tool_calls array gets trimmed but its corresponding tool messages remain, the API rejects the request.
The agent handles this by:
- Always keeping the system prompt.
- Taking the last 40 messages as candidates.
- Finding any tool_call_id values referenced in tool messages within the candidates.
- Pulling back in any earlier assistant messages whose tool_calls array contains those IDs.
- Returning the combined set.
The result is a trimmed conversation that never violates the API’s pairing invariant, regardless of how the cut falls.
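The steps above can be sketched as a single function over Chat Completions-style message dicts. This is a simplified illustration of the algorithm, not the shipped implementation:

```python
# Trim conversation history while preserving tool-call pairing:
# a surviving tool message must always be preceded by the assistant
# message that issued its tool_call_id.
def trim_history(messages, keep_last=40):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    candidates = rest[-keep_last:]
    # IDs referenced by tool results that survived the cut
    needed_ids = {m["tool_call_id"] for m in candidates if m["role"] == "tool"}
    # IDs already satisfied by assistant messages inside the candidates
    present_ids = {c["id"] for m in candidates if m.get("tool_calls")
                   for c in m["tool_calls"]}
    missing = needed_ids - present_ids
    # Pull back earlier assistant messages that own the missing IDs
    rescued = [m for m in rest[:-keep_last]
               if m.get("tool_calls")
               and any(c["id"] in missing for c in m["tool_calls"])]
    return system + rescued + candidates
```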
Graceful Degradation
Five error classes each have their own handler, and all return a helpful message to the user rather than stack-tracing:
- Authentication errors: The API key is invalid. Message suggests contacting an admin.
- Rate limits and timeouts: Temporary service issue. Message suggests retrying.
- Context-too-large (HTTP 400): The conversation has grown beyond what the model can handle. Message suggests starting a new conversation.
- Tool execution failures: Logged in detail on the backend, returned as structured error to the model, which can then retry with different parameters or surface the error to the user.
- Consecutive tool failures: After three in a row, the agent breaks the loop and returns a clear error instead of flailing indefinitely.
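The routing itself is unglamorous. A minimal sketch by HTTP status code of the OpenAI-compatible API follows; the message wording and the exact status mapping are illustrative, not the shipped handler:

```python
# Map failure classes to user-facing messages instead of surfacing
# stack traces. Status codes and wording are illustrative.
def friendly_error(status_code: int, detail: str = "") -> str:
    if status_code == 401:
        # Invalid API key: nothing the user can fix themselves
        return "The AI service key is invalid. Please contact your administrator."
    if status_code in (408, 429, 503):
        # Rate limits and timeouts are transient
        return "The AI service is temporarily unavailable. Please try again shortly."
    if status_code == 400 and "context" in detail.lower():
        # Conversation outgrew the model's context window
        return "This conversation has grown too long. Please start a new one."
    return "Something went wrong answering that. Please try again."
```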
Streaming Progress
LLM calls take seconds. Tool calls take more. Without feedback, users assume the system has hung.
Before every reasoning round, the agent publishes a progress event over the ERP’s real-time channel:
- “Thinking…” while the model is generating
- “Running Accounts Receivable…” while a report executes
- “Fetching Sales Invoice SI-00123…” while a document lookup runs
- “Calculating…” while calculate_expression evaluates
The chat interface subscribes to these and updates the UI in real time. Users see what the assistant is actually doing, which builds trust far more effectively than a spinner.
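Publishing these events is a one-liner per reasoning round. In the sketch below the publish function is injected; on Frappe it would be the framework's real-time publish call, and the event name and payload keys are assumptions for illustration:

```python
# Publish a progress event before each agent step. The event name
# "ai_progress" and payload shape are illustrative assumptions.
def publish_progress(publish, session_id, stage, detail=""):
    labels = {
        "thinking": "Thinking…",
        "report": f"Running {detail}…",
        "document": f"Fetching {detail}…",
        "calculate": "Calculating…",
    }
    publish("ai_progress", {
        "session": session_id,
        "message": labels.get(stage, "Working…"),
    })
```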
Rendering: Charts and Document Links
Raw ERP data is not an answer. The system prompt instructs the agent to:
- Format tabular data as Markdown tables
- Emit a fenced chart code block (with JSON payload) alongside tables whenever a visual would help
- Convert every document reference (e.g., SI-00123) into a Markdown link to the document’s ERP page
The chat frontend parses these conventions: charts render as interactive visualisations, document names become clickable, numbers are preserved exactly as emitted. Nothing is reformatted or inferred. What the model produces is what the user sees.
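As a concrete illustration of the convention, a fenced chart block emitted by the agent might look like the following. The exact payload keys are assumptions here; the real contract is simply a Chart.js-compatible JSON shape that the frontend knows how to render:

```chart
{
  "type": "bar",
  "title": "Top 5 Customers by Revenue",
  "labels": ["Acme Ltd", "Kiwi Traders", "Southern Supplies"],
  "datasets": [{"label": "Revenue (NZD)", "data": [125000, 98000, 76500]}]
}
```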
The Results
The result is a working, production LLM agent embedded in a real ERP used by NZ and AU businesses. It answers questions like:
- “What’s our P&L for this fiscal year, broken down monthly?”
- “Who owes us more than $10,000 and is more than 60 days overdue?”
- “Show me the top 5 customers by revenue this quarter as a chart.”
- “What’s the current stock balance across all warehouses?”
- “Find Sales Invoice SI-00456 and tell me if it’s been paid.”

Non-technical staff get answers in seconds that previously required navigating to the right report, applying filters, and interpreting the output. The AI cites source documents with clickable links, so users can verify anything they are shown.
Technology Stack
- Python for the agent loop and tool implementations
- OpenAI SDK pointing at OpenRouter for model flexibility (swap between GPT-4 class, Claude, and open-weight models without code changes)
- JSON Schema for tool definitions
- Frappe v15 as the underlying application framework
- Real-time publish/subscribe for progress events
- Markdown + custom fenced blocks for chat rendering
- Chart.js-compatible JSON payloads for visualisations
Why This Matters for New Zealand Businesses
Most “AI for business” products are either glorified document search, or general-purpose chatbots that hallucinate numbers the moment you ask something specific. An AI that actually touches the systems where the numbers live, with the permission model intact and arithmetic delegated to a calculator, is a different category of tool.
We built NexWave AI because NZ and AU businesses using NexWave told us their operations staff were being bottlenecked by the time it takes to get an answer out of the ERP. The goal was not to replace analysts. The goal was to give every staff member the ability to ask a question and get a correct, sourced answer in seconds.
That is what NexWave AI delivers.
Conclusion
Building a production LLM agent is not the same as building a demo. Every sharp edge we have described in this case study (tool-call pair integrity, permission delegation, delegated arithmetic, graceful error handling, streaming progress) is a place where a prototype would have been fine and a production system would have failed a customer.
HighFlyer specialises in AI engineering where correctness matters. If you need an AI that actually works inside your operational systems, not a chatbot slapped on top of your website, we would be glad to talk.
Thinking about an AI assistant for your own business systems? Contact HighFlyer to discuss how we can help.
Project Details
Client: NexWave
Industry: Enterprise Software / ERP
Key Metrics:
- 6 tools the agent can call
- 40 max messages per session with tool-call integrity
- 10 max agent reasoning rounds per query
- <10s typical answer latency (including tool calls)
Achievements:
- Production LLM agent embedded in a live ERP, not a prototype
- Permission-aware tool execution: the AI cannot see data the user cannot see
- Conversation history trimming that preserves tool-call/tool-result pairs for model correctness
- Chart and table rendering directly inside the chat interface
- Graceful degradation on rate limits, timeouts, auth failures, and context overflow