Building an AI Meeting Assistant That Actually Follows Through
Every team I have worked with runs into the same quiet failure. A meeting happens, sensible decisions get made, and within a day half of them have evaporated. The recording, if one exists, sits in a folder nobody reopens. I built OneVoice, the AI Meeting Wizard, because I wanted the opposite of that: an assistant that does not simply capture what was said, but makes sure something actually happens once everyone logs off.
Transcription was never the hard part
It is tempting to think the value of a meeting tool lives in the transcript. But speech-to-text is a commodity now; you can get a serviceable transcript from any number of APIs. The real problem is that a wall of text is not a record anyone uses. Nobody scrolls through ninety minutes of dialogue to find the one sentence where they agreed to send a contract. The hard part, and the part worth building, is turning that raw stream into something each person can act on in under a minute.
So I set the product a simple test: if someone misses the meeting entirely, can OneVoice tell them exactly what they need to know and do, without reading a single line of transcript? Almost every design decision came from trying to answer yes.
How the pieces fit together
OneVoice is a fairly conventional full-stack application with an AI pipeline bolted into the middle. I kept the moving parts deliberately boring so the interesting work could go into the prompts and the follow-up logic.
- A JavaScript front end where users upload or stream a discussion and review the generated outputs.
- A FastAPI back end in Python that orchestrates transcription, summarization, and the reminder engine.
- Firebase for authentication and as the data store for meetings, summaries, and action-item state.
- Render for deployment, which kept the infrastructure story simple while I iterated.
FastAPI earned its place here. The AI work is naturally asynchronous and I/O-bound: the server waits on transcription, then on a language model, then on email delivery. FastAPI's async model let me keep those stages responsive without reaching for heavier infrastructure.
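To make the shape of that pipeline concrete, here is a minimal sketch using plain asyncio. The function names and the stubbed stages are hypothetical stand-ins, not OneVoice's actual code; the point is only that the stages run in order while the per-role and per-recipient work can overlap.

```python
import asyncio

# Hypothetical stand-ins for the real transcription, language-model,
# and email calls. Each one only simulates waiting on the network.
async def transcribe(audio_id: str) -> str:
    await asyncio.sleep(0)
    return f"transcript of {audio_id}"

async def summarize(transcript: str, role: str) -> str:
    await asyncio.sleep(0)
    return f"[{role}] summary of: {transcript}"

async def send_email(recipient: str, body: str) -> bool:
    await asyncio.sleep(0)
    return True

async def process_meeting(audio_id: str, recipients: dict[str, str]) -> dict[str, str]:
    """Run the stages in order; fan out per-role and per-recipient work."""
    transcript = await transcribe(audio_id)

    # One summary per distinct role, generated concurrently.
    roles = sorted(set(recipients.values()))
    summaries = await asyncio.gather(*(summarize(transcript, r) for r in roles))
    by_role = dict(zip(roles, summaries))

    # Deliver each recipient the summary written for their role.
    await asyncio.gather(
        *(send_email(addr, by_role[role]) for addr, role in recipients.items())
    )
    return by_role
```

In a FastAPI route, a coroutine like `process_meeting` could be awaited directly from an `async def` handler, so the server stays free to handle other requests while each stage waits.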
Summaries that know who is reading
The feature I am proudest of is role-based summaries. A single neutral summary is a compromise that serves nobody especially well. The engineer wants the technical decisions and the tickets that fell out of them; the project lead wants risks, owners, and dates; the client wants outcomes and next steps in plain language. So instead of generating one summary, OneVoice produces several views of the same meeting, each shaped for a role.
In practice that meant designing prompts that take the transcript plus a role and return a summary written for that perspective: the same facts, with different emphasis and vocabulary. The interesting early failure was the model inventing detail to fill a role it had little material for. I learned to constrain it hard: summarize only what was actually said, and state plainly when the meeting did not cover something rather than papering over the gap.
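A prompt builder along these lines captures the idea. This is an illustrative sketch, not the production prompt: the role keys, the wording, and the `build_role_prompt` name are all assumptions made for the example.

```python
# Hypothetical mapping from role to what that reader cares about,
# taken from the three roles described in this post.
ROLE_FOCUS = {
    "engineer": "technical decisions and the tickets that came out of them",
    "project_lead": "risks, owners, and dates",
    "client": "outcomes and next steps in plain language",
}

def build_role_prompt(transcript: str, role: str) -> str:
    """Build a summary prompt for one role, with the anti-invention constraints."""
    if role not in ROLE_FOCUS:
        raise ValueError(f"unknown role: {role!r}")
    return (
        f"You are summarizing a meeting for a reader in the role: {role}.\n"
        f"Emphasize {ROLE_FOCUS[role]}.\n"
        "Summarize only what was actually said in the transcript.\n"
        "If the meeting did not cover something this reader would expect, "
        "say so plainly instead of guessing.\n\n"
        f"Transcript:\n{transcript}"
    )
```

Keeping the constraints in the shared part of the template means every role gets the same guardrails, and only the emphasis line changes.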
From discussion to action items
A summary tells you what happened. Action items are where a meeting tool either earns its keep or becomes decoration. OneVoice extracts commitments (who agreed to do what) and stores each as a tracked item with an owner and a status, not just a bullet in a document.
Getting this reliable was less about clever modeling and more about structure. I asked the model to return action items in a strict, predictable shape, validated that shape on the back end, and rejected anything that did not fit rather than trusting free-form output. Treating the model like an unreliable junior teammate, useful and fast and occasionally confidently wrong, shaped almost every decision that followed.
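Here is roughly what that validation step can look like using only the standard library. The field names (`owner`, `task`, `status`) and the allowed status values are assumptions for this sketch; the real schema may differ.

```python
import json
from dataclasses import dataclass

# Assumed status vocabulary; adjust to whatever the data store uses.
ALLOWED_STATUSES = {"open", "in_progress", "done"}

@dataclass(frozen=True)
class ActionItem:
    owner: str
    task: str
    status: str

def parse_action_items(raw: str) -> list[ActionItem]:
    """Parse model output; raise ValueError if it does not match the expected shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of action items")

    items = []
    for i, entry in enumerate(data):
        # Reject missing or extra keys instead of silently ignoring them.
        if not isinstance(entry, dict) or set(entry) != {"owner", "task", "status"}:
            raise ValueError(f"item {i}: expected exactly owner, task, status")
        if not all(isinstance(v, str) and v.strip() for v in entry.values()):
            raise ValueError(f"item {i}: every field must be a non-empty string")
        if entry["status"] not in ALLOWED_STATUSES:
            raise ValueError(f"item {i}: unknown status {entry['status']!r}")
        items.append(ActionItem(**entry))
    return items
```

Rejecting the whole response on the first bad item is a deliberate choice in this sketch: the caller can retry the model call rather than store a partially trusted list.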
Closing the loop with email
This is the part most tools skip, and it is the reason I built the thing. Capturing an action item changes nothing if it sits in an app nobody opens. OneVoice sends both automated and manual email reminders, and crucially, it embeds the current status of each item directly in the message so people can see where things stand at a glance, without clicking through to anything.
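As a rough illustration of a reminder that carries its own status, the sketch below builds a plain-text message with Python's standard `email.message.EmailMessage`. The `build_reminder` helper, the item fields, and the subject wording are hypothetical; actual delivery (SMTP or an email API) is left out.

```python
from email.message import EmailMessage

def build_reminder(sender: str, recipient: str, meeting_title: str,
                   items: list[dict]) -> EmailMessage:
    """Build a reminder whose body shows each item's current status inline."""
    lines = [f"Action items from {meeting_title}:", ""]
    for item in items:
        # Status comes first so it is visible at a glance.
        lines.append(f"- [{item['status']}] {item['task']} (owner: {item['owner']})")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Reminder: {len(items)} action item(s) from {meeting_title}"
    msg.set_content("\n".join(lines))
    return msg
```

Because the status is read at build time, a reminder sent later reflects where the item stands at that moment, not where it stood when the meeting ended.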
The constraint I cared about was respect for attention. Reminders that fire too often get filtered out, and people learn to treat them as noise. So the cadence is deliberate, the manual nudge exists for when a human knows the moment is right, and every message is built to be useful on its own rather than to demand a click.
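One simple way to express a cadence rule like this is a single decision function. The two-day minimum gap and the rule that finished items never trigger reminders are assumptions chosen for the example, not the values OneVoice uses.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed minimum gap between automated reminders for the same item.
MIN_GAP = timedelta(days=2)

def should_send(status: str, last_sent: Optional[datetime], now: datetime,
                manual: bool = False) -> bool:
    """Decide whether a reminder for one action item should go out now."""
    if status == "done":
        return False  # nothing left to nudge about
    if manual:
        return True   # a human decided the moment is right
    # Automated reminders respect the minimum gap.
    return last_sent is None or now - last_sent >= MIN_GAP
```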
What I would do differently
If I rebuilt OneVoice today, I would invest earlier in evaluation. For too long I judged summary quality by reading outputs and trusting my gut, which does not scale and quietly hides regressions. A small, honest test set (real transcripts paired with a human-checked notion of what a good summary contains) would have let me change prompts with confidence instead of crossed fingers.
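Even a crude harness would have been a start. The sketch below assumes each test case lists phrases a good summary must mention, which is only one possible way to encode "what a good summary contains"; the names `EvalCase`, `score_summary`, and `run_eval` are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    transcript: str
    role: str
    must_mention: list[str]  # human-checked key phrases

def score_summary(summary: str, case: EvalCase) -> float:
    """Fraction of the required phrases that appear in the summary."""
    if not case.must_mention:
        return 1.0
    text = summary.lower()
    hits = sum(1 for phrase in case.must_mention if phrase.lower() in text)
    return hits / len(case.must_mention)

def run_eval(summarize, cases, threshold=1.0):
    """Return (case, score) pairs for summaries that fall below the threshold."""
    failures = []
    for case in cases:
        score = score_summary(summarize(case.transcript, case.role), case)
        if score < threshold:
            failures.append((case, score))
    return failures
```

Substring matching is a blunt instrument and will miss paraphrases, but run against every prompt change it would at least flag a summary that stops mentioning an owner or a date.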
I would also push more of the trust-building into the interface. People are reasonably skeptical of AI-generated notes, and the cure is transparency: let them trace any claim in a summary back to the moment in the transcript where it was said. Make the model show its work, and adoption stops being a leap of faith.
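The data model behind that kind of traceability can be small. In this hypothetical sketch, each summary claim stores character offsets into the transcript, and the interface would show the quoted span on demand; none of these names come from the current OneVoice code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    text: str    # the sentence shown in the summary
    start: int   # offset of the supporting span in the transcript
    end: int     # end offset (exclusive)

def supporting_quote(transcript: str, claim: Claim) -> str:
    """Return the transcript span a claim points at, or fail loudly."""
    if not (0 <= claim.start < claim.end <= len(transcript)):
        raise ValueError("claim points outside the transcript")
    return transcript[claim.start:claim.end]
```

A claim whose offsets do not resolve to a real span is itself a useful signal: it suggests the summary said something the transcript cannot back up.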
Looking back, the turning point was when I stopped treating the language model as a brain and started treating it as a fast, eager teammate who needs structure, limits, and a way to be checked. Everything got more reliable after that.
OneVoice is still evolving, but the core lesson has held: the value was never in transcribing the meeting. It was in everything that happens after it ends.