Building NESO's AI-First Future: How the AI Workbench is Transforming the Way Teams Work
We built NESO's AI Workbench platform from scratch in 6 months. It includes the Navi and TReSP agents, plus podcast, image, and video generation, and it lays the foundation for NESO's 4,000+ employees to work as an AI-first organisation.

PISR: Problem, Impact, Solution, Result
- Problem: NESO, Britain's National Energy System Operator, manages the electricity grid for 60+ million people. It wanted to become an AI-first organisation but had no AI platform capabilities. Teams worked through thousands of policy documents, regional energy pathways, and EV demand forecasts daily, and traditional search couldn't handle nuanced queries like "What are the EV charging infrastructure requirements for the North West region?"
- Business Impact: Analysts spent hours manually searching PDFs and spreadsheets to answer stakeholder questions. Data was locked away, accessible only to technical specialists. New starters had no intelligent onboarding support. The energy transition depends on rapid, evidence-based decision-making - slow information access directly impacts grid planning and policy development.
- Our Solution: ClearRoute designed and built an agentic AI platform - a multi-agent system where specialised AI agents collaborate to answer complex queries. Navi handles policy questions, news, and content generation. TReSP specialises in structured data analysis and visualisation. The platform generates podcasts from documents, creates images and videos, and renders interactive heat maps on geospatial tiles.
- Tangible Result: We delivered 7 backend services and 2 AI agents in 6 months, integrating 20+ Azure services across 2 regions. The platform is now in alpha. It gives users instant policy answers, AI-generated podcasts, and self-service data visualisation, and it establishes NESO's foundation for AI-first operations.
The Challenge
The AI-First Ambition
NESO recognised that AI would fundamentally change how organisations operate. As Britain's energy system operator - responsible for keeping the lights on - they needed to be at the forefront of this transformation. The ambition: become an AI-first organisation where employees naturally turn to AI assistants to work more effectively.
But they were starting from zero. No AI platform. No ML capabilities. No unified way to access organisational knowledge.
How Work Happened Before
| Challenge | Business Cost |
|---|---|
| Manual document search | Hours per query, inconsistent answers |
| Data locked in spreadsheets | Analysts as bottleneck for insights |
| No self-service capability | Stakeholders waiting for reports |
| Knowledge silos | Critical information hard to surface |
| Accessibility gap | Complex data inaccessible to non-technical users |
Traditional search tools couldn't handle the nuanced queries NESO teams needed: "What are the EV charging infrastructure requirements for the North West region under the leading pathway scenario?" or "Summarise recent Ofgem announcements about grid connection reform."
Solution Overview
Meet Navi: Your Intelligent Work Companion
Navi (the Navigator Agent) is the centrepiece of the AI Workbench - an intelligent assistant that fundamentally changes how NESO employees work.
It Knows Who You Are
When you log into the Workbench, Navi retrieves your role from Microsoft Graph. A new starter sees different prompts from a policy analyst, so your experience is personalised from day one.
It Answers Your Questions Instantly
Ask Navi "What are the policies if I want to take my laptop to France?" and it searches across all indexed policy documents using hybrid RAG (vector embeddings + BM25 keyword matching + semantic reranking), finds the relevant information, and gives you a clear answer with source citations.
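In a hybrid query, the keyword (BM25) ranking and the vector ranking have to be merged before semantic reranking reorders the top results. Azure AI Search documents Reciprocal Rank Fusion (RRF) for this merge. The sketch below is a standalone illustration of RRF, not the service's internal code. The constant `k = 60` is the commonly used default and is assumed here. The document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs using Reciprocal Rank Fusion.

    Each document scores the sum of 1 / (k + rank) over every list it appears in
    (rank starts at 1), so documents found by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Keyword (BM25) and vector retrieval disagree on order for the laptop question.
bm25_hits = ["travel-policy", "expenses-policy", "it-security"]
vector_hits = ["it-security", "travel-policy", "device-policy"]
print([doc for doc, _ in reciprocal_rank_fusion([bm25_hits, vector_hits])])
# ['travel-policy', 'it-security', 'expenses-policy', 'device-policy']
```

RRF uses only ranks, never raw scores. That matters because BM25 scores and vector similarities sit on incomparable scales.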
It Keeps You Informed
Every morning, Navi surfaces the latest energy sector news from Ofgem, Gov.uk, and industry sources, retrieved via Bing Custom Search. You start your day knowing what's happening in your industry.
It Makes Content Accessible
Got a 50-page technical document you need to understand but don't have time to read? Upload it to Navi and it generates an engaging podcast - not just narration, but a proper dual-voice conversation that makes dense content digestible. Architecture diagrams? Navi uses Document Intelligence (OCR) and GPT-4o to understand and explain them.
It Creates Visual Content
Need an image for a presentation? Describe it and Navi generates it via gpt-image-1. Need a video summary? Sora-2 creates 12-second clips from text prompts. Per-user quotas (10 videos/day, 500/month) ensure fair usage.
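The per-user video quota could be enforced with a check-then-increment like the one below. This is a minimal in-memory sketch. The `VideoQuota` class is hypothetical, and a real deployment would persist counts in a shared store rather than process memory. Only the 10-per-day and 500-per-month limits come from the text above.

```python
from collections import defaultdict
from datetime import date

DAILY_VIDEO_LIMIT = 10
MONTHLY_VIDEO_LIMIT = 500


class VideoQuota:
    """In-memory per-user quota tracker (illustrative; production would persist counts)."""

    def __init__(self, daily_limit: int = DAILY_VIDEO_LIMIT, monthly_limit: int = MONTHLY_VIDEO_LIMIT):
        self.daily_limit = daily_limit
        self.monthly_limit = monthly_limit
        self._daily: dict[tuple[str, date], int] = defaultdict(int)
        self._monthly: dict[tuple[str, int, int], int] = defaultdict(int)

    def try_consume(self, user_id: str, today: date) -> bool:
        """Record one video request if both limits allow it; return False if over quota."""
        day_key = (user_id, today)
        month_key = (user_id, today.year, today.month)
        # Check both limits before incrementing so a rejected request uses no quota.
        if self._daily[day_key] >= self.daily_limit or self._monthly[month_key] >= self.monthly_limit:
            return False
        self._daily[day_key] += 1
        self._monthly[month_key] += 1
        return True
```

Keying counters by user and calendar period means nothing has to be reset: a new day or month simply starts a fresh key.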
The TReSP Agent: Self-Service Data Analysis
For specialist grid data needs, the TReSP agent provides self-service analytics:
- Natural Language to SQL: Ask "What are the EV demand statistics for Birmingham?" and TReSP generates SQL for DuckDB, runs it against 37MB+ of EV demand projections, and explains the results
- Heat Map Generation: Regional data visualised on interactive maps using MapTiler TileServer GL with 1.5GB of OpenStreetMap vector tiles for Great Britain
- Geospatial Queries: Grid Supply Point (GSP) region lookup by coordinates for precise grid planning
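LLM-generated SQL should be checked before it reaches the database. The sketch below shows one coarse guard that accepts only a single read-only `SELECT`/`WITH` statement. It is an assumption, not TReSP's actual validation. The keyword list, the `ev_demand` table and the `demand_mw` column are hypothetical, and the LLM call that produces the SQL is omitted. A guard like this would normally sit alongside a DuckDB connection opened with `read_only=True`.

```python
import re

# Statements that modify state or load extensions/files; DuckDB's read_* table
# functions (read_csv, read_parquet, ...) can read arbitrary files, so block them too.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|detach|copy|pragma|install|load|export|import)\b"
    r"|\bread_\w+\s*\(",
    re.IGNORECASE,
)


def validate_generated_sql(sql: str) -> str:
    """Reject LLM-generated SQL unless it is one read-only SELECT (or WITH ... SELECT).

    Deliberately conservative and regex-based: it may reject some harmless queries
    (e.g. a ';' inside a string literal), which is acceptable for a guard.
    """
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?is)^(select|with)\b", statement):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("query contains a forbidden keyword or file-reading function")
    return statement
```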
Multi-Agent Architecture
           User Query
                ↓
      ┌───────────────────┐
      │ Intent Classifier │ ← Determines query type & routes
      └─────────┬─────────┘
                ↓
        ┌───────┴───────┐
        ↓               ↓
┌──────────────┐ ┌──────────────┐
│ NAVI AGENT   │ │ TReSP AGENT  │
│              │ │              │
│ • Policy RAG │ │ • EV Demand  │
│ • Bing News  │ │ • Grid Data  │
│ • Images     │ │ • SQL Gen    │
│ • Video      │ │ • Heatmaps   │
│ • Podcasts   │ │ • Maps       │
└──────────────┘ └──────────────┘
LangGraph orchestrates query classification and routing, handing off seamlessly between agents based on query intent, so users don't need to know which agent handles their request.
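The routing in the diagram can be sketched in plain Python as "classify, then dispatch". In LangGraph, the classifier would be a graph node and the dispatch a conditional edge (`StateGraph.add_conditional_edges`). This sketch deliberately avoids the library so that only the control flow is shown. The keyword cues are a hypothetical stand-in for the LLM-based intent classifier, and the agent functions are placeholders.

```python
from typing import Callable

# Hypothetical keyword cues standing in for an LLM-based intent classifier.
TRESP_CUES = ("ev demand", "grid supply point", "gsp", "heat map", "heatmap", "pathway", "sql")


def classify_intent(query: str) -> str:
    """Return 'tresp' for structured grid-data questions, otherwise 'navi'."""
    q = query.lower()
    return "tresp" if any(cue in q for cue in TRESP_CUES) else "navi"


def navi_agent(query: str) -> str:
    # Placeholder for policy RAG, news, image, video and podcast tools.
    return f"[navi] answering from policy documents, news and media tools: {query}"


def tresp_agent(query: str) -> str:
    # Placeholder for SQL generation, EV demand data and map rendering.
    return f"[tresp] generating SQL and visualisations for: {query}"


AGENTS: dict[str, Callable[[str], str]] = {"navi": navi_agent, "tresp": tresp_agent}


def route(query: str) -> str:
    """Classify the query, then hand it to the matching agent."""
    return AGENTS[classify_intent(query)](query)
```

Keeping the agents in a lookup table means adding a new agent is one classifier label plus one dictionary entry, which mirrors how a new node and edge would be added to the graph.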
Platform Services
| Service | Purpose |
|---|---|
| navi-agent | Policy Q&A via RAG, Bing news, image generation (gpt-image-1), video generation (Sora-2), real-time notifications |
| tresp-api | EV demand queries, regional energy pathways, SQL generation, heatmap visualisation |
| podcast-api | PDF upload, script generation, dual-voice audio synthesis |
| nesoai-blob-gateway | Secure media proxy for authenticated streaming |
| nesoai-tileserver | OpenStreetMap vector tiles for Great Britain (1.5GB MBTiles) |
| nesoai-analytics | Platform telemetry, user events, metrics |
| nesoai-infrastructure | Terraform IaC across dev/qa/prod |
Engagement Approach
Phase 1 (Months 1-2): Foundation
Established Azure infrastructure, implemented vector database indexing, and delivered the initial Navi capability for document chat. Critically, we developed against ClearRoute's Azure environment first, enabling rapid iteration before tackling NESO's private endpoint complexity.
Phase 2 (Months 3-4): Expansion
Added the TReSP agent for grid data, Bing search integration for news feeds, and podcast generation. Each capability broadened what the Workbench could help employees with.
Phase 3 (Months 5-6): Scale
Expanded the team to 10+ engineers, onboarded dedicated QA, implemented the PromptFoo testing framework, and prepared for broader rollout.
First Value: October 6th (Week 14)
Stakeholders could interact with policy documents via natural language and see grid data visualised as heat maps.
Technical Implementation
RAG Pipeline
- Documents chunked and embedded (text-embedding-ada-002, 1536 dimensions)
- Indexed in Azure AI Search with vector + semantic capabilities
- Query-time: hybrid retrieval (vector + BM25 + semantic reranking) with configurable thresholds
- Results passed to GPT-4o with domain-specific system prompts
- Source citations returned with every answer
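The first and last steps of this pipeline, chunking and citation-ready context building, could be sketched as follows. Several things are assumptions: the character-window chunker with 800/100 sizes, the `Hit` record (illustrative, not an Azure SDK type), and the `min_score` threshold. Embedding with ada-002 and the search call are omitted.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str   # document title or URL
    text: str     # chunk content returned by the search index
    score: float  # relevance/reranker score; higher is better


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows, so a sentence cut at one
    boundary still appears whole in the neighbouring chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_context(hits: list[Hit], min_score: float) -> tuple[str, list[str]]:
    """Keep hits above a relevance threshold and number them by source for citation.

    Returns the context block for the GPT-4o prompt and the ordered source list,
    so an answer citing [2] can be mapped back to a document.
    """
    kept = sorted((h for h in hits if h.score >= min_score), key=lambda h: h.score, reverse=True)
    sources: list[str] = []
    lines: list[str] = []
    for hit in kept:
        if hit.source not in sources:
            sources.append(hit.source)
        lines.append(f"[{sources.index(hit.source) + 1}] {hit.text}")
    return "\n".join(lines), sources
```

Numbering by source rather than by chunk keeps citations stable when several chunks come from the same policy document.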
Podcast Generation Pipeline
PDF Upload → Document Intelligence (OCR) → Script Generation (GPT-4o)
→ SSML Markup → Azure Speech (dual voice) → Blob Storage
→ Web PubSub notification → Client playback
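The "SSML Markup → Azure Speech (dual voice)" step amounts to turning a two-speaker script into one SSML document with a `<voice>` element per turn. A minimal sketch follows. The voice pair (`en-GB-SoniaNeural`, `en-GB-RyanNeural`) and the `(speaker, line)` script format are assumptions, not the podcast-api's confirmed choices.

```python
from xml.sax.saxutils import escape

# Assumed voice pair; any two Azure neural voices would work.
VOICES = {"host": "en-GB-SoniaNeural", "guest": "en-GB-RyanNeural"}


def script_to_ssml(turns: list[tuple[str, str]], lang: str = "en-GB") -> str:
    """Convert (speaker, line) turns into one SSML document with a <voice> per turn."""
    # escape() turns &, < and > into entities so script text cannot break the XML.
    body = "".join(
        f'<voice name="{VOICES[speaker]}">{escape(text)}</voice>' for speaker, text in turns
    )
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f"{body}</speak>"
    )
```

The resulting string could then be passed to the Speech SDK's `SpeechSynthesizer.speak_ssml_async`. Escaping matters here because GPT-4o-generated scripts can easily contain characters such as `&`.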
Security & Enterprise Integration
| Aspect | Implementation |
|---|---|
| Authentication | Microsoft Entra ID with OIDC/OAuth2 + PKCE |
| Authorisation | Role-based access via JWT claims, per-user data isolation |
| Network | Private endpoints for ALL Azure services, VNet integration |
| Identity | User-assigned managed identities for service-to-service auth |
| Secrets | Azure Key Vault with soft-delete and purge protection |
| Local Auth | Disabled on all cognitive services (Entra-only) |
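The "role-based access via JWT claims, per-user data isolation" row could translate into request handling like the sketch below. It assumes the access token has already been validated (signature, issuer, audience, expiry) by OIDC middleware. It relies on the Entra `roles` and `oid` claims. The app-role name `Workbench.User`, the `Forbidden` exception and the dictionary store are hypothetical.

```python
class Forbidden(Exception):
    """Raised when the caller's token does not permit the request."""


# Claims are assumed to come from an access token that has ALREADY been validated.
def require_role(claims: dict, role: str) -> None:
    """Raise Forbidden unless the token's app-role claim includes `role`."""
    if role not in claims.get("roles", []):
        raise Forbidden(f"missing role {role!r}")


def user_partition(claims: dict) -> str:
    """Derive a per-user partition key from the immutable Entra object ID (`oid`)."""
    oid = claims.get("oid")
    if not oid:
        raise Forbidden("token has no oid claim")
    return f"user:{oid}"


def list_podcasts(claims: dict, store: dict[str, list[str]]) -> list[str]:
    """Return only the caller's podcasts.

    The partition key comes from the token, never from a request parameter,
    so one user cannot ask for another user's data.
    """
    require_role(claims, "Workbench.User")
    return store.get(user_partition(claims), [])
```

Keying on `oid` rather than the user's email or display name keeps isolation intact even if those attributes change.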
Multi-Region Architecture
| Region | Purpose |
|---|---|
| UK South (Primary) | All web apps, Cosmos DB, AI Search, Storage, monitoring |
| Sweden Central | AI Foundry agents (Bing search), Sora-2 video generation, gpt-image-1 |
VNet peering connects Sweden to UK South for private connectivity - unlocking AI capabilities not yet available in UK regions.
Tech Stack
| Layer | Technologies |
|---|---|
| Frontend | React 19, TypeScript, Vite, Zustand, Tailwind, Radix UI |
| Backend | Python 3.11-3.13, FastAPI, Flask, LangGraph, LangChain, DuckDB |
| AI/ML | Azure OpenAI (GPT-4o, text-embedding-ada-002), AI Foundry, AI Search, Speech Services, Document Intelligence, Sora-2, gpt-image-1 |
| Data | Cosmos DB (MongoDB API), Blob Storage, Table Storage, Queue Storage |
| Geospatial | MapTiler TileServer GL, MBTiles, Leaflet |
| Infrastructure | Terraform (15 modules), Azure Pipelines, Docker, App Service, Functions |
| Security | Entra ID, OIDC/OAuth2, JWT, Managed Identities, Private Endpoints, Key Vault |
The Results
Platform Delivered
| Metric | Achievement |
|---|---|
| Backend Services | 7 APIs + 1 Function App |
| AI Agents | 2 (Navi + TReSP) |
| AI Models Deployed | 4 (GPT-4o, ada-002, gpt-image-1, Sora-2) |
| Azure Services Integrated | 20+ |
| Terraform Modules | 15 reusable modules |
| Environments | 3 (Dev, QA, Prod) |
| Regions | 2 (UK South, Sweden Central) |
| Private Endpoints | 15+ |
| Weekly Active Users | ~20 (alpha) |
| Daily Chat Volume | ~35 chats/day |
How Work Is Changing
Policy Questions: Hours → Seconds
Previously, finding the right policy meant searching through multiple systems. Now employees ask Navi and get answers in seconds, with source citations they can check.
Content Consumption: Reading → Listening
Dense technical documents become engaging podcasts that employees can listen to during their commute. Dual-voice narration makes complex content accessible.
Grid Data: Request & Wait → Self-Service
Business users who needed EV demand data had to raise requests and wait. Now they ask TReSP and get visualisations immediately on interactive maps.
New Starter Experience: Lost → Guided
First day at NESO? Navi knows your role and provides personalised prompts based on your position.
Value by Stakeholder
For NESO Leadership
- Established AI platform capability from zero - foundation for AI-first operations
- Zero-trust security model from day one (private endpoints, managed identity, no local auth)
- Multi-region deployment unlocking cutting-edge AI capabilities (Sora-2, gpt-image-1)
For Engineering
- 15 reusable Terraform modules for rapid environment provisioning
- Modern tech stack (React 19, Python 3.11-3.13, LangGraph) that attracts talent
- Patterns for multi-agent AI development replicable across future use cases
For Every Employee
- Instant answers to policy questions with source citations
- Complex documents made accessible through podcasts
- Self-service data visualisation without technical skills
Lessons Learned
What Worked Well
LangGraph for Agentic Workflows
Flexible state management and clear debugging. The directed graph model made it easy to reason about agent capabilities and add new ones.
Hybrid RAG (Vector + Semantic + BM25)
Significantly better retrieval than pure vector search. Configurable relevance thresholds per domain enabled fine-tuning for different document types.
Develop Local, Deploy Remote
Building against ClearRoute's Azure environment first, then porting to NESO, enabled rapid iteration. NESO's private endpoint requirements would have slowed feature development significantly.
Multi-Region AI Foundry
Unlocked Sora-2 and Bing agents, which are not available in UK South. VNet peering maintains private connectivity across regions.
Private Endpoints Everywhere
Enterprise security from day one. Zero-trust architecture with managed identities eliminated credential management overhead.
Challenges Overcome
Aggressive Timeline
The October 6th demo required trade-offs. The team subsequently invested in hardening - adding dedicated QA, implementing PromptFoo for LLM testing, and building proper integration tests.
Late Responsible AI Involvement
Three months in, the Responsible AI lead surfaced compliance requirements that should have been considered earlier. Lesson: engage risk and compliance stakeholders from day one.
Emergent Requirements
Requirements arrived as "Can you do podcasts? Make it happen." Stakeholders didn't know what they wanted until they saw it. We adopted iterative demonstration rather than extensive upfront specification.
Replicable Patterns
- Multi-agent routing architecture with LangGraph
- Enterprise RAG pipeline on Azure AI Search (hybrid retrieval)
- Real-time notification pattern for async AI tasks (Web PubSub)
- Secure media gateway with Azure AD integration
- Terraform modules for Azure AI services stack
- Multi-region AI deployment pattern with VNet peering
Future State
The AI Workbench is positioned for organisation-wide rollout. Planned capabilities:
- Holiday Booking: Ask Navi to check your balance and book time off, triggering approval workflows automatically
- Admin Portal: Self-service configuration for data sources and user group permissions
- Runtime Quality Monitoring: Real-time evaluation of agent responses beyond user feedback
- Expanded Agent Ecosystem: New agents for other business domains following established patterns
Navi is just the beginning. The platform establishes NESO's foundation for AI-first operations - where AI assistants aren't a novelty but an integral part of how every employee works. The architecture and patterns are ready to scale across 4,000-5,000 employees, transforming NESO into the AI-enabled organisation it set out to become.
Internal Learning
This engagement demonstrates that:
- Agentic AI is production-ready - LangGraph provides the orchestration layer needed for complex multi-agent systems in enterprise environments
- Hybrid RAG outperforms pure vector - combining embeddings, BM25, and semantic reranking delivers significantly better retrieval
- Multi-region unlocks capabilities - AI features ship to different regions at different times; architecture should accommodate this
- Security doesn't slow you down - private endpoints and managed identities from day one actually simplified the architecture
- Small teams ship fast - 5 core engineers delivered a working platform in 14 weeks; scale once patterns are established