Our goal: ship a real-time voice conversation with Mem that feels like calling a human assistant who already knows our whole world — fluent recall in the moment, brain-dump capture without blocking, and heavier work executed during and after the call via the agent's full tool set.
How to read this: top sections are the briefing; lower sections add depth. The async execution model (the pillar we get into next) is the non-obvious crown jewel of this concept — we should make sure we've all internalized it before we start scoping.
TL;DR
- Start a voice conversation with Mem at any time — bidirectional, real-time, natural cadence.
- Mem captures brain dumps without slowing the conversation — what we say flows into Mem, and the agent works on it in the background.
- Mem recalls from our whole world instantly: "What's my KTN number?" "What were my follow-ups with Sarah?" "What should I focus on today?"
- Mem queues heavier work during the call and keeps executing after we hang up — notes get created, emails get drafted, summaries get consolidated — and then it messages us when it's done.
- Mem proactively surfaces related context as topics come up — dovetailing on what we're talking about.
- The huddle agent has the same tools as Mem Agent — internet search, search and retrieval across our Mem, the ability to message us after the call.
Why this matters
Voice Mode today is well-received but constrained — we can only use it inside a specific note, and it can only create or edit content within that note. What users actually want is to use their voice to interact with their entire Mem — not just a single note they're already sitting inside.
Huddle is the up-level: voice as the interaction surface for the whole product, not a dictation helper inside one file. Another way to say it: we already have great value propositions (Mem Agent's search + research + memory, Mem Chat's conversational intelligence, proactive recall via Heads Up); Huddle is the medium that exposes all of that through a new modality. Becoming multimodal is how we meet the user where they are — driving, walking between meetings, running errands, AirPods in.
The ICP skews toward busy operators whose work happens in motion. Voice is the interaction pattern that fits their life. A real-time conversation with an AI that already knows their world is dramatically higher-bandwidth than typed chat, and completely different from any generic voice assistant they've used.
The pillar that makes Huddle different: async execution
This is the single most important concept for us to get right. Most voice assistants feel robotic because the conversation is gated on work completion — you ask, it pauses to "think," then it responds. Huddle inverts that.
The model (a minimal code sketch follows the list):
- During the huddle, the conversation is synchronous and flowing.
- When we ask for something non-trivial ("Consolidate all my Europe trip notes into one plan"), Mem reacts to the idea conversationally and queues the task silently in the background.
- The conversation continues — no awkward pause, no "processing…"
- The huddle engine keeps running after the call ends, executing queued work with proper thinking time.
- Results arrive asynchronously — via Mem Agent messaging us when it's done.
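To make the contract concrete, here's a minimal sketch of the queue semantics. Everything in it is illustrative: `HuddleTask`, the task kinds, and the in-memory queue are assumptions, not existing code.

```ts
// Hypothetical shapes; nothing below exists in the codebase yet.
type HuddleTask = {
  id: string;
  kind: "structure_brain_dump" | "consolidate_notes" | "draft_email" | "schedule_reminder";
  transcriptExcerpt: string; // the conversational context the task came from
  queuedAt: Date;
};

class HuddleTaskQueue {
  private tasks: HuddleTask[] = [];

  // Called mid-conversation. Fire-and-forget: no awaiting the work itself,
  // and nothing spoken aloud about the bookkeeping.
  enqueue(task: HuddleTask): void {
    this.tasks.push(task);
  }

  // Called after the call ends (or opportunistically during lulls).
  // This is the only place latency is allowed.
  async drain(execute: (task: HuddleTask) => Promise<void>): Promise<void> {
    while (this.tasks.length > 0) {
      const task = this.tasks.shift()!;
      await execute(task);
    }
  }
}
```

The split is the whole point: `enqueue` costs nothing from the conversation loop's perspective, while `drain` is where the agent gets proper thinking time.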
What this sounds like in a conversation:
You: Hey, I've got this idea for our onboarding flow — I think Teams admins should be able to invite via Slack DM, and we should stop assuming people know what Collections are in the welcome sequence.
Mem: Oh, that's a good one — the Slack DM thing especially, since that's where admins actually live. Does it connect at all to the pilot Sarah was pushing for?
(In the background: Mem queues up turning this dump into a structured note; it'll create it after the call ends. Nothing is said out loud about that — the conversation just keeps going.)
You: Yeah, good catch, it might. Anyway — also, can you consolidate my Europe trip notes into one plan?
Mem: Sure, I'll pull that together.
(Also queued.)
You: Great, I'll talk to you later.
(After the call, Mem executes both queued tasks. A few minutes later, a Mem Agent message: "Created 'Onboarding flow ideas' with your dump, and cross-referenced Sarah's pilot note." A few minutes after that: "Europe trip plan consolidated — here's the note.")
The conversation feels human because Mem reacts to the ideas, not to its own bookkeeping. The queued work is invisible until it lands. This pattern is what makes Huddle feel like an assistant rather than a slow model. Without this, Huddle is just Mem Chat with a microphone.
Core workflows (the hero paths)
1. Brain-dump capture without blocking
You call up Mem on a walk: "Hey, I've got this idea for a new onboarding flow — we should let Teams admins invite via Slack DM, and I think the welcome sequence needs to stop assuming people know what Collections are…"
Mem reacts to the idea: "Oh interesting — the Slack DM thing especially. Does that play at all into the pilot Sarah was pushing for?"
The conversation keeps going. Meanwhile, Mem has queued the dump as work. Back at your desk minutes later, there's a message from Mem: "Created 'Onboarding flow ideas' — cross-referenced Sarah's pilot note where relevant."
This is the workflow that does the most to define Huddle. The vibe should feel like talking to a sharp person who's listening and reacting, not to a transcription service.
2. Recall — "aware of your whole world"
"What's my KTN number?" → "362496."
"What were my follow-ups from the meeting with Sarah last week?" → "You said you'd send her the SOC 2 report."
"What should I focus on today?" → Grounded answer based on what you've been working on, your calendar, and pending commitments.
Mem already has context. We don't specify who Sarah is, which meeting, or what's recent. The bar for "real recall" is high — if Mem hedges or asks clarifying questions on prompts like these, the magic dies.
3. Queued multi-step work
You: "Can you go through all my notes on the Europe trip and consolidate them into one trip plan I can reference on the trip?"
Mem: "Yep — I'll pull that together. Anything else?"
You: "Also, remind me tomorrow morning to confirm the hotel."
Mem: "Got it."
Two distinct asks, both queued silently. The trip plan arrives as a Mem Agent message after the call; the reminder fires tomorrow.
4. Proactive contextual surfacing (dovetailing)
You: "Hey, remind me to schedule a meeting with Sarah for later this week."
Mem: "Sure, I'll remind you later today." (pause) "By the way — didn't you say you were going to send her the SOC 2 report? Want me to remind you about that one too?"
You: "Oh shoot, I forgot."
Mem: "No worries, I'll add it to the list."
You brought up Sarah; Mem dovetails on that topic and surfaces something related and useful. This is different from the time-based reminders below, which can be totally unrelated to what you're talking about — this one rides the current topic naturally.
5. Time-based reminders delivered when Mem has your attention
(Mid-conversation, natural pause.) "Oh — by the way, your rent check is due today."
When Mem has our attention anyway, it's a good moment to land a time-sensitive reminder. The huddle becomes a natural delivery channel for things that would otherwise fire as an OS notification we'd ignore.
Milestones
M1 — Pick the voice stack
Before we write any app code, we need a conversation engine. Evaluate and pick one of:
- Gemini Live
- OpenAI Realtime
- xAI voice
- Anthropic (if/when realtime is viable)
- AssemblyAI or similar specialized stacks
Pick based on latency, barge-in support, tool-calling fidelity, and which form factor each makes easiest (iOS vs. web vs. desktop). The voice stack choice likely forces our form-factor decision — if one has a clean iOS SDK, iOS first; otherwise web/desktop.
Latency budget: Mem starts responding within ~500ms of the user pausing. Anything slower kills the natural feel.
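To keep the bake-off honest, measure the same thing on every candidate. A rough harness sketch, assuming a hypothetical `VoiceStackAdapter` wrapper per vendor (the interface and event names are ours, not any vendor's API):

```ts
// Hypothetical adapter; each candidate stack gets a thin wrapper that
// emits these two events. The names are ours, not any vendor's API.
interface VoiceStackAdapter {
  onUserSpeechEnd(handler: () => void): void;  // VAD detected a pause
  onFirstAudioByte(handler: () => void): void; // first byte of the reply
  sendUtterance(audio: ArrayBuffer): Promise<void>;
}

// Run a fixed script of recorded clips through a stack and collect
// pause-to-first-audio samples, so comparisons are apples-to-apples.
async function measureLatency(
  adapter: VoiceStackAdapter,
  clips: ArrayBuffer[],
): Promise<number[]> {
  const samplesMs: number[] = [];
  let speechEndedAt = 0;
  let settle: (() => void) | null = null;

  adapter.onUserSpeechEnd(() => {
    speechEndedAt = performance.now();
  });
  adapter.onFirstAudioByte(() => {
    samplesMs.push(performance.now() - speechEndedAt);
    settle?.();
  });

  for (const clip of clips) {
    const replyStarted = new Promise<void>((resolve) => { settle = () => resolve(); });
    await adapter.sendUtterance(clip);
    await replyStarted; // wait for the reply to start before the next turn
  }
  return samplesMs; // judge p50/p95 against the ~500ms budget
}
```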
M2 — Capture + queued execution
- During a huddle, brain-dumped content flows into Mem (via Mem Agent) and is captured as a structured note after the huddle ends.
- Multi-step requests ("consolidate my trip notes") are queued as tasks during the call and executed after.
- The huddle agent continues running after the call ends, can still invoke tool calls, and messages the user when work is complete (via Mem Agent, same plumbing we already use to message users); a sketch of this post-call loop follows the list.
- Demo target: a scripted conversation where the user brain-dumps and asks for multi-step work; after hanging up, the user receives a message from Mem with the results.
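A sketch of that post-call loop, using stand-ins for the Mem Agent messaging plumbing and the agent-session runner (`memAgent.sendMessage`, `runTask`, and the task shape are all assumptions, not real identifiers):

```ts
// Every identifier below is an illustrative stand-in, not a real Mem API.
type HuddleTask = { id: string; kind: string; transcriptExcerpt: string };

const memAgent = {
  // Stand-in for the messaging plumbing Mem Agent already uses.
  async sendMessage(userId: string, text: string): Promise<void> {
    console.log(`[message to ${userId}] ${text}`);
  },
};

// Stand-in for spinning up an agent session with the full tool set.
async function runTask(task: HuddleTask): Promise<{ summary: string }> {
  return { summary: `Done: ${task.kind}` };
}

// Runs once the call ends; the job outlives the call on purpose.
async function onHuddleEnded(userId: string, queued: HuddleTask[]): Promise<void> {
  for (const task of queued) {
    try {
      const result = await runTask(task); // slow is fine, nobody is on the line
      await memAgent.sendMessage(userId, result.summary);
    } catch {
      // Surface failures too; silently dropping queued work breaks trust.
      await memAgent.sendMessage(userId, `I couldn't finish "${task.kind}". Want me to retry?`);
    }
  }
}
```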
M3 — Recall over the user's Mem
- During a huddle, Mem has full access to the user's Mem via the Mem search and retrieval toolkit (the same retrieval tools Mem Agent uses today).
- Can answer factual questions, retrieve past commitments, and give grounded summary answers.
- Demo target: five scripted recall prompts answered correctly and fluently in one session.
- Internet search should also be available to the huddle agent (à la xAI's voice assistant) — so "what's the weather in Paris next week" works alongside "what did I tell Sarah." A sketch of the tool wiring follows the list.
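A sketch of what that tool wiring could look like, in the JSON-schema style of tool declaration that realtime voice APIs commonly accept. The tool names and the `memRetrievalToolkit` / `webSearch` handles are assumptions standing in for existing plumbing:

```ts
// Stand-ins for existing plumbing (assumed identifiers, not real ones):
declare const memRetrievalToolkit: { search(query: string): Promise<string> };
declare function webSearch(query: string): Promise<string>;

// Tool declarations in the JSON-schema shape most realtime APIs accept;
// passed to the voice session's config at connect time.
const huddleTools = [
  {
    name: "search_mem",
    description: "Search and retrieve from the user's Mem: notes, people, commitments.",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "internet_search",
    description: "Search the public web for current information.",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
];

// The voice session calls this when the model emits a tool call. Both
// branches delegate to plumbing that already exists; nothing
// retrieval-shaped gets built fresh for voice.
async function handleToolCall(name: string, args: { query: string }): Promise<string> {
  switch (name) {
    case "search_mem":
      return memRetrievalToolkit.search(args.query);
    case "internet_search":
      return webSearch(args.query);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```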
M4 — Proactive contextual surfacing
- Based on what the user just said, Mem dovetails with a relevant related nudge.
- Demo target: in a scripted conversation, Mem dovetails on a topic the user raised with something useful they didn't ask for.
- Calibrate the cadence — at most one unsolicited surface per ~60 seconds of conversation is a reasonable starting point; a throttle sketch follows.
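The cadence cap itself is easy to enforce; the hard part is deciding what's worth surfacing. A minimal throttle sketch (the window length is a tuning knob, not a spec):

```ts
// At most one unsolicited surface per window, no matter how many
// candidate nudges the model comes up with.
class SurfaceThrottle {
  private lastSurfaceAt = -Infinity;

  constructor(private windowMs: number = 60_000) {}

  // Ask before speaking an unsolicited nudge.
  trySurface(now: number = Date.now()): boolean {
    if (now - this.lastSurfaceAt < this.windowMs) return false;
    this.lastSurfaceAt = now;
    return true;
  }
}

// Usage: gate every candidate dovetail behind the throttle.
// if (throttle.trySurface()) speak(nudge); else hold or drop the nudge.
```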
Bonus milestone — Mem calls you (stretch with high demo payoff)
- Mem Agent decides (based on time-based reminders, pending follow-ups, whatever) that it has enough stacked up to warrant a huddle.
- Sends a push notification: "Huddle starting — Mem is calling you."
- User picks up; Mem opens the conversation with an agenda. "Hey — you've got three things I wanted to catch up on. First, Sarah's follow-up…"
- This is the inversion of the normal flow. It's a powerful demo because it shows the agent as a genuine partner with initiative, not a tool waiting for a trigger. A sketch of the trigger side follows.
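A sketch of the trigger side, with stand-ins for the agenda source and push delivery (`pendingAgenda`, `sendPush`, and the threshold are all assumptions):

```ts
// Illustrative stand-ins; none of these identifiers exist yet.
type AgendaItem = { title: string; dueAt?: Date };

declare function pendingAgenda(userId: string): Promise<AgendaItem[]>;
declare function sendPush(userId: string, title: string, payload: object): Promise<void>;

const HUDDLE_THRESHOLD = 3; // enough stacked up to be worth a call; tune by feel

// Run on a schedule, or after each new reminder/follow-up lands.
async function maybeCallUser(userId: string): Promise<void> {
  const agenda = await pendingAgenda(userId);
  if (agenda.length < HUDDLE_THRESHOLD) return;
  await sendPush(userId, "Huddle starting — Mem is calling you", {
    kind: "huddle_invite",
    agenda, // the agent opens the call by walking this list
  });
}
```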
More example vignettes
Use these to pressure-test whether scope feels right. If a demo doesn't naturally cover several, we probably need to adjust.
- Morning commute. AirPods in, driving. "What's on my plate today?" Mem pulls from calendar, pending commitments, and anything overdue, and gives a contextual brief.
- Walk around the block. Brain-dumping a new idea; Mem reacts to the idea conversationally. Back at the desk, the structured note is waiting.
- Between meetings. "Can you draft a follow-up email to John based on the meeting we just had?" Mem confirms, continues the conversation. A minute later, the draft is in the user's email.
- At the grocery store. "Hey, I'm in the produce section — I'm making that bolognese tonight for the team event. What ingredients do I still need to pick up?" Mem pulls the recipe, cross-references what the user already has (from a recent note about the fridge), answers.
- Contextual dovetail. (See workflow #4 above — Sarah + SOC 2 report.)
- Travel site lookup. "What's my frequent flyer number for United?" → Instant answer.
- End-of-day check. "Anything I still need to get done today?" Mem reviews captured commitments from today and surfaces what's outstanding.
- Mem calls us. 10am. Push notification: "Huddle — Mem calling." User picks up. "Hey — three things. You still owe Sarah the SOC 2 report; the hotel for Europe confirmed, so I logged that; and your rent check is due today. Want me to draft the Sarah email now?"
P2 / stretch ideas (if we land M1–M4)
iOS as a form factor (may actually be first)
iOS is the natural hero surface for the ICP — AirPods-in, on the move. If the voice stack we pick has a clean iOS SDK, we should start there. If it's mostly desktop/web, start there and treat iOS as a clear next-step. Either way, the architecture for the voice session, recall, and work queue should be portable.
Continued agent execution and end-of-huddle artifacts
When the call ends, the huddle agent keeps running and emitting outputs — text messages summarizing what it did, draft emails delivered to the user's inbox, notes appearing in Mem, reminders scheduled for later. This isn't really "stretch" so much as a necessary property of M2, but worth calling out as something to polish: the post-hang-up experience should feel like a competent assistant doing the work, not like the call just ended.
Multi-tasking during huddle
On desktop: the huddle continues in a compact UI while the user uses Mem's main app. Worth exploring if we land M1–M3 with room to spare.
Key decisions to make on Day 1
- Voice stack. Pick one realtime voice API. Candidates include Gemini Live, OpenAI Realtime, xAI voice, Anthropic (if viable), AssemblyAI. Evaluate on latency, barge-in, tool-calling fidelity, and SDK quality per form factor. This decision will likely force #2.
- Form factor. iOS vs. web vs. desktop first, largely downstream of #1. Keep the architecture portable regardless — this is a medium we'll be in everywhere eventually.
- Invocation gesture. A button in the Mem app / Floating Mem bubble / a hotkey / a wake word. Recommendation: start with a button on whichever platform we choose; wake word can wait.
- Recall implementation. Reuse Mem's search-and-retrieval toolkit — the same one Mem Agent already uses. Don't build a new retrieval path for voice.
- Work queue representation. Not user-visible. Work is tracked inside the huddle agent; results are messaged to the user via Mem Agent after the call. Each queued task can spin up its own Mem Chat / Mem Agent session if that helps execution; we have a choice to make about which harness.
- Proactive-surface cadence. Start conservative (≤1 unsolicited surface per ~60s). Tune from there based on feel.
What "done" looks like for the week
Minimum demo (M1 + M2):
- Real-time voice conversation with Mem. User brain-dumps; Mem reacts naturally. User makes a multi-step request. After the call, Mem messages the user with the completed work.
Strong demo (M1 + M2 + M3):
- All of the above, plus fluent recall across the user's Mem during the call — factual questions, past commitments, grounded summaries.
Wow demo (M1 + M2 + M3 + M4):
- The above plus convincing in-conversation proactive dovetailing — Mem surfaces at least one helpful related thing based on a topic the user raised.
Top-tier demo (add Bonus milestone):
- The above plus Mem calling us — a push notification launches a huddle where Mem has its own agenda to work through.
Things Huddle is not (for this hack week)
- Not Mem Chat with a microphone. If the demo doesn't show async execution and proactive recall, it's missing the point.
- Not continuously listening. A huddle starts on explicit invocation and ends on explicit exit.
- Not a replacement for typing. Typed interaction remains the right tool for deep, careful queries. Huddle is for motion and throughput.
- Not a meeting transcription product. That's Voice Mode / meeting capture, a separate existing surface.