At Wakapi we talk a lot about AI in production systems. But sometimes the most powerful experiments start closer to home. This story comes from one of our team members who decided to solve a very real problem in their day-to-day work: creating better support tickets from call reviews, without sending any sensitive data to the cloud.
The result was a fully offline, AI-assisted ticketing helper running on a local machine, powered by Whisper, Ollama, Docker and a GPU. In this post we will walk through how it works, why it matters for engineering teams, and why clients should care about local AI too.
The problem: messy calls, incomplete tickets
If you have ever worked with a support or operations team, this probably sounds familiar:
- Calls are long, noisy and full of details.
- The person handling the call must take notes while talking.
- Later, they open the ticketing tool and try to remember: who called, what happened, what steps were taken, what is still pending.
Important information gets lost, descriptions are inconsistent and engineers who receive the ticket waste time asking for context.
Our teammate wanted something different: a simple way to turn raw call audio into a clean, structured ticket description with better grammar and less redundancy, while keeping everything 100 percent offline.
Why local AI instead of cloud APIs
Cloud-based AI is powerful, but it is not always the right fit for every workflow. Local AI can be a better option when you care about:
- Privacy: audio recordings and conversations never leave your machine or your internal network.
- Compliance: some industries have strict rules about sending user data to third party services.
- Cost control: once the hardware is in place, you are not paying per token or per request.
- Latency and reliability: the system keeps working even if your internet connection is unstable.
Running large language models locally, using tools like Ollama or similar local-first stacks, is becoming a practical option for exactly these reasons.
Architecture at a glance
The project started as a hobby, but it follows the same thinking we apply in client work: keep things modular and easy to extend. At a high level, the assistant looks like this:
- A small offline web app for entering or reviewing ticket information.
- Whisper, running locally, to transcribe call audio into text.
- Ollama, running a local LLM, to:
  - summarize the conversation,
  - identify who spoke,
  - extract key actions and decisions,
  - polish the final ticket description.
- Docker to containerize everything and keep the stack isolated and reproducible.
- A GPU to make all of this usable in seconds instead of minutes.
Everything runs on a Linux environment inside Docker, with the web interface exposed only locally. No external APIs, no external data sharing.
Step by step: from call recording to clean ticket
Let us go through the workflow the way our teammate actually uses it.
1. Capture and transcribe the call
The starting point is a call recording. The user loads the audio file into the local environment, where Whisper is running. Whisper converts the conversation into text, entirely offline.
This solves the first big problem: instead of replaying the call multiple times to catch a single detail, they can scan the transcript, search inside it and feed it directly to the next AI step.
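To make this step concrete, here is a minimal sketch using the open-source openai-whisper Python package. The audio file name and the model size are placeholders, not details from the actual project.

```python
# Minimal transcription sketch with the openai-whisper package.
# "support_call.mp3" and the "medium" model size are placeholders; pick
# whatever file and model size fit your hardware and accuracy needs.
import whisper

model = whisper.load_model("medium")            # downloaded once, then runs fully offline
result = model.transcribe("support_call.mp3")   # returns the full text plus timed segments

with open("support_call_transcript.txt", "w") as f:
    f.write(result["text"])
```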
2. Summarize the conversation with a local LLM
Next, the transcript is sent to a model hosted in Ollama. The prompt is carefully designed and asks the model to:
- Separate the people involved in the call (for example client, support agent, third party).
- Identify what the call was about in plain language.
- List the steps that were taken during the call.
- Highlight open questions or pending actions.
The model returns a structured summary with clear sections, ready to be pasted or adapted into the ticket body. This turns a messy, multi-minute conversation into something that fits on a single screen.
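As a rough illustration of this step, here is what a call to Ollama's local HTTP API can look like from Python. The model name and the prompt wording are assumptions for the sketch; in practice the prompt is the part that gets iterated on the most.

```python
# Summarization sketch against a local Ollama instance.
# The model name and prompt are illustrative, not the exact ones used
# in the project; only the endpoint and request shape come from Ollama's API.
import requests

SUMMARY_PROMPT = """You will receive the transcript of a support call.
Return a Markdown summary with these sections:
- Participants (client, support agent, third party)
- What the call was about
- Steps taken during the call
- Open questions and pending actions

Transcript:
{transcript}
"""

def summarize(transcript: str, model: str = "llama3") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": SUMMARY_PROMPT.format(transcript=transcript),
            "stream": False,
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]
```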
3. Auto-populate ticket fields
On top of that AI summary, the offline web app provides a simple form with the fields the user needs every day, such as:
- Store or location number
- Name of the person who called
- Phone number or contact info
- Ticket title
- Ticket description
The form applies a predefined format so that every ticket follows the same structure. That is valuable not only for engineers who read the tickets, but also for any reporting or analytics built on top later.
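To make the idea of a predefined format concrete, here is a hypothetical sketch of how those fields could be rendered into a consistent Markdown ticket body. The field names simply mirror the list above and are not tied to any specific ticketing tool.

```python
# Hypothetical ticket template: every submission is rendered into the
# same Markdown skeleton so tickets stay consistent and easy to scan.
from dataclasses import dataclass, asdict

TICKET_TEMPLATE = """# {title}

**Store / location:** {store_number}
**Caller:** {caller_name} ({contact})

## Description
{description}
"""

@dataclass
class Ticket:
    store_number: str
    caller_name: str
    contact: str
    title: str
    description: str

    def render(self) -> str:
        return TICKET_TEMPLATE.format(**asdict(self))
```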
4. Spell check and cleanup in Markdown
There is one more clever layer. The same local model that summarized the conversation is also used as a kind of “Markdown proofreader”.
The user sends the final ticket description to the model with a simple prompt along the lines of “Spellcheck Markdown” so that it:
- Fixes spelling and grammar issues.
- Removes obvious redundancies.
- Keeps headings, bullet points and code blocks intact.
The result is a ticket that is clean, readable and easy to scan, without the user spending extra minutes manually editing the text.
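Conceptually, this cleanup pass is the same local model with a different prompt. A minimal sketch, assuming the same Ollama endpoint as in the summarization step and an illustrative prompt rather than the exact one:

```python
# Markdown cleanup sketch: same Ollama /api/generate endpoint as the
# summarization step, only the prompt changes.
import requests

CLEANUP_PROMPT = """Spellcheck the following Markdown. Fix spelling and
grammar, remove obvious redundancies, and keep all headings, bullet
points and code blocks exactly as they are.

{ticket_body}
"""

def proofread(ticket_body: str, model: str = "llama3") -> str:
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": CLEANUP_PROMPT.format(ticket_body=ticket_body),
            "stream": False,
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```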
Tech stack in more detail
Even though this started as a personal project, the stack uses tools we also see in production-ready environments.
Whisper for transcription
Whisper is an automatic speech recognition model that can handle multiple languages, accents and noisy audio, which makes it a strong fit for real-world support calls.
Ollama for local LLMs
Ollama makes it simple to run open models locally, managing downloads, updates and serving. Developers can choose different models depending on accuracy and speed needs, and swap them in and out as experiments evolve.
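For example, the models already pulled into a local Ollama instance can be listed through its /api/tags endpoint, and swapping models is then just a matter of changing the model name in the generate calls shown earlier:

```python
# List the models available in the local Ollama instance.
import requests

tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
for entry in tags.get("models", []):
    print(entry["name"])  # e.g. "llama3:latest", "mistral:latest"
```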
Docker for isolation and portability
All components run inside Docker containers so the setup can be reproduced or moved to a different machine with minimal friction. This is the same pattern we recommend for AI services in client environments: each model or service gets its own container, with clear interfaces between them.
GPU acceleration
Without GPU acceleration, local AI can feel slow. In this case, a dedicated GPU is what makes call transcription and summarization run in seconds instead of minutes, which is critical for daily use.
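As a small sanity check of that setup, Whisper can be pointed at the GPU explicitly and fall back to the CPU when none is available (assuming PyTorch and openai-whisper are installed):

```python
# Load Whisper on the GPU if one is visible to PyTorch, otherwise on the CPU.
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("medium", device=device)
print(f"Whisper loaded on: {device}")
```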
Why this matters for engineering teams
From an engineering point of view, this small project is interesting because it demonstrates several important ideas:
- AI as a workflow assistant, not a black box product
The value is not in “having AI”, but in how it is embedded into a real process: reviewing calls, creating tickets and handing work off to development teams.
- Prompt design as part of the architecture
The prompt for summarization and the prompt for Markdown cleanup are treated like code. They are iterated and refined until they consistently generate useful, predictable output.
- Local-first architecture for sensitive data
Keeping transcription and analysis offline is not only about privacy. It also opens opportunities to run AI close to where the data is generated, for example inside a store network or a branch office.
- Developer led experimentation
Our teammate is very honest about this being a hobby project, but that is exactly the culture we want: engineers who explore new tools, break things safely, then bring the best ideas into client work.
Why clients should care about local AI
If you are a CTO or product owner evaluating AI initiatives, projects like this one send a clear signal about what is possible when you combine the right tools with the right mindset:
- You can boost productivity in support and operations without replacing teams or disrupting existing tools.
- You can respect customer privacy while still benefiting from advanced models.
- You can prototype quickly with local AI, then productionize the ideas that prove their value.
- You can extend the same approach to other use cases such as quality assurance, field service, or monitoring.
At Wakapi we see local AI as a powerful complement to cloud-based models. For many clients, the winning strategy will not be one or the other, but a hybrid approach where sensitive workflows stay local and other workloads take advantage of managed services.
What is next: from tickets to cameras and beyond
This story does not end with support tickets. Once you have a local AI stack running, it is natural to experiment with other ideas. In this case, our teammate has been exploring:
- Image generation for creative side projects.
- Separation of instruments and voices in audio tracks using open source tools.
- A future home setup with IP security cameras and local face detection, powered by a dedicated server.
Not every experiment becomes a production feature. But all of them build intuition about how AI behaves, what it needs in terms of hardware and where it delivers real, measurable value.
Bringing this thinking into your projects
If you are curious about how a similar offline or hybrid AI architecture could work in your environment, here are a few starting questions you can explore with your own teams:
- Which workflows depend on long calls, voice notes or unstructured conversations?
- Where are tickets or handoffs consistently painful for engineers or business users?
- What data would you prefer to keep fully on premises for privacy or compliance reasons?
- Do you already have hardware (for example, GPUs or powerful workstations) that could host local AI services?
From there, it becomes a design challenge: choose the right models, define the prompts and build small tools that make your teams faster without disrupting how they already work. If any of this sounds familiar and you would like to explore local or hybrid AI pilots for your own software stack, the Wakapi team would be happy to help you turn that curiosity into a roadmap. You can Schedule a Meeting here.