CarePal: AI Wellness Companion for Seniors

healthcare · 7 min read

Proactive AI companion for senior wellness checks and medication adherence. Built with the Cohere API and a retrieval-augmented LLM in 20 hours. 2nd place at the Georgian College GenAI Hackathon.

Cohere API · Retrieval-Augmented Generation · LLM · Python · Product Strategy

CarePal was built in twenty hours at the Georgian College Generative AI Hackathon in March 2024 and finished second out of the field. It was the first time I worked on a product where the user was not me and not a peer. The user was a population I had to think carefully about: seniors living alone. Many manage multiple medications. Some have sensory impairments that make modern app interfaces unfriendly.

It was also the first time I had to defend a product story to a panel of judges who had heard six pitches before mine. Twenty hours is just enough time to get something demoable and not enough time to fix the second-order things you notice while demoing. This page is about what we built, what the experience taught us, and what was actually under the hood.

The problem we picked

The hackathon offered four themes: information security, healthcare, smart cities, and sustainability. Healthcare won the team's vote because one teammate told a story about helping an elderly neighbour learn to use a tablet and how that small interaction had revealed a much larger gap. The aging-in-place population in Canada is large and growing. Statistics Canada's 2022 release on living arrangements of older Canadians documented that a substantial share of seniors live alone, with the proportion rising in older age brackets and trending upward across the last several census cycles.

The technology question was narrower than the social one. Wellness checks, medication reminders, and emergency detection already exist as standalone tools. None of them were designed around the conversational accessibility that a senior with mild cognitive decline or fading vision actually needs. The opportunity was to use the new generation of conversational LLMs to build that interaction layer, and to wrap it in the context-awareness that retrieval-augmented generation enables.

What we built

CarePal as we shipped it at the hackathon was a proactive conversational companion that did three things:

Wellness checks. Scheduled, conversational check-ins that ask how the user is feeling, listen for signal words that indicate a problem, and alert care contacts when answers diverge from baseline.

Medication adherence support. Conversational reminders tied to a medication schedule, with the ability to ask follow-up questions ("did you take your evening dose with food?") and to note patterns that should be reported to a clinician.

Anomaly and emergency detection. Behavioral pattern recognition layered over the conversational data. If the daily check-in is missed three days running, the system escalates. If the conversation reveals symptoms that match emergency patterns, it routes to a different escalation path.
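The missed-check-in rule above reduces to a small, testable predicate. A minimal sketch, with the three-day threshold written as a constant (the real threshold was a product decision, not a technical one):

```python
from datetime import date, timedelta

MISSED_DAYS_THRESHOLD = 3  # illustrative; tune per care plan


def should_escalate(last_checkin: date, today: date) -> bool:
    """Escalate when the daily check-in has been missed for
    MISSED_DAYS_THRESHOLD days running."""
    return (today - last_checkin) >= timedelta(days=MISSED_DAYS_THRESHOLD)
```

Keeping the rule this dumb is deliberate: the conversational layer decides *what* to say, but the escalation trigger should be auditable at a glance.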

The accessibility surface was deliberately broad. The product was designed to work over voice, text, and visual interfaces depending on what the user needed. We did not build all three in twenty hours, but the architectural decisions were made with that surface in mind.

Why Cohere, and how RAG fit

We built on Cohere's API for two reasons. First, Cohere is a Canadian company and the social mission of the project paired well with using a Canadian provider for the core inference. Second, Cohere's Command and Embed endpoints at the time had first-class support for the retrieval-augmented generation pattern we wanted, including a documents parameter on the chat endpoint that handled context grounding without us writing our own prompt assembly. There was also a free hackathon tier that let us actually ship.
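To make the documents-parameter point concrete, here is roughly what the request body looked like. This builds the payload without making the network call; the field names (`message`, `documents`, `preamble`) follow Cohere's v1 chat API as we used it at the time, and the system prompt text is a stand-in:

```python
SYSTEM_PROMPT = (
    "You are CarePal, a warm companion who tracks wellness and medications."
)  # stand-in; the real preamble defined tone and escalation rules


def build_chat_request(user_message: str, retrieved_chunks: list[dict]) -> dict:
    """Shape a grounded chat request: each retrieved chunk becomes a
    document the model can cite, so no hand-rolled prompt assembly."""
    return {
        "message": user_message,
        "documents": [
            {"title": c["topic"], "snippet": c["text"]} for c in retrieved_chunks
        ],
        "preamble": SYSTEM_PROMPT,
    }
```

The win was that grounding and citation came from the endpoint itself, which mattered on a twenty-hour clock.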

Retrieval-augmented generation matters here because a senior wellness companion cannot operate from the model's training data alone. The model needs to know this user's medication schedule, this user's care preferences, this user's emergency contacts, and the recent conversation history. Stuffing all of that into a system prompt would not scale beyond a few users, and would burn through context window on data the model only sometimes needs. RAG sidesteps the problem by retrieving the relevant per-user context from a vector store at inference time and grounding the model's response in it.

The pipeline looked roughly like this:

  1. User initiates or receives a conversational prompt.
  2. The retrieval layer embeds the user's input, runs a top-k similarity search against a per-user vector index that holds chunks of their profile, medication schedule, recent conversation summaries, and care-plan documents, and returns the highest-scoring chunks.
  3. The LLM receives the user's input plus the retrieved chunks (passed through Cohere's documents parameter so they show up as cited context), plus a carefully scoped system prompt that defines tone, escalation rules, and what the agent is not allowed to do.
  4. The response is generated, displayed (or spoken), and the interaction is summarised and logged to the user's history so the next retrieval can pick it up.

That loop is the anatomy of any RAG-based assistant. The interesting design choices were not in the architecture; they were in the prompting and the retrieval ranking. How explicit should the model be about being an AI? When should it be sympathetic and when should it be directive? How does it handle a check-in where the user is clearly distressed but says they are fine?

Two retrieval choices are worth naming because our first versions of them did not survive contact with our test prompts. The first cut chunked the user profile by document, so a single retrieved chunk would carry both the medication list and the emergency contacts. That blew the context window faster than expected and made the model treat unrelated facts as connected. Re-chunking by topic (one chunk per medication, one per contact, one per care preference) cut token usage and gave the model cleaner grounding. The second change was a cosine-similarity threshold below which we returned no documents at all, rather than the model's least-bad guess. Letting the agent say "I do not have that on file, can you tell me?" was better than letting it confabulate a medication name.
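The threshold change is a one-liner worth showing, because the empty-list case is the whole point. A sketch, with the floor value illustrative (we tuned ours by hand against test prompts):

```python
SIMILARITY_FLOOR = 0.35  # illustrative; tuned by hand, not derived


def retrieve_or_nothing(scored_chunks, floor=SIMILARITY_FLOOR):
    """scored_chunks: list of (cosine_score, chunk_text) pairs.

    Below the floor we return no documents at all, so the agent says
    "I do not have that on file" instead of grounding on a bad match."""
    return [text for score, text in scored_chunks if score >= floor]
```

The system prompt then handles the empty case explicitly, instructing the model to ask rather than guess.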

Team structure

Hackathon teams compress real product organisation into hours. Ours split into three roles:

The Hackers built the prototype: two engineers on the core conversational pipeline and the wellness-check logic.

Business Development was where I worked. The work covered market sizing, competitive positioning, brand and messaging, and a go-to-market sketch. For a hackathon, business work is judged less on rigor and more on whether the product story fits together. Our story was that the technology had matured enough to make this category viable, the demographic shift was making it urgent, and a Canadian-built product had room in a market dominated by US competitors.

The Hustler handled the pitch. A presenter with the energy and clarity to walk a panel of judges through CarePal in three minutes is a different person from the engineer who wrote the prompt-engineering layer. The team had to decide who was the right fit for which job, and that decision happened within the first hour.

The bug that taught me the most

The most embarrassing failure of the weekend happened during dry-run testing, with maybe four hours left on the clock. I had wired up the medication-reminder flow and was checking that it handled a missed dose. The agent's response was technically correct (it logged the miss, asked a follow-up, and offered to escalate) but the tone was wrong. It read like a customer-service script. "We have noted that you have not yet taken your evening dose. Would you like to add a reminder?"

The user we had spent the day thinking about would not respond well to that voice. She would respond well to a check-in that sounded like the daughter who lives in another province calling to say hi. The fix was prompt-level: a system-prompt rewrite that gave the model a persona (a warm, slightly chatty companion who happens to track medications) and a list of phrases never to use ("we have noted," "your account," anything that sounded like a portal). The functional behaviour did not change. The voice did. That was the version we demoed.

The lesson was that the system prompt is doing far more product work than I had given it credit for. We could have been the same team with the same RAG and the same model and shipped something that felt like a help desk. The prompt is what made it not.

What it taught me

The lasting value of a hackathon is rarely the prototype. The prototypes are throwaway. The lessons that stuck were three.

Empathy is the design constraint people skip. Most teams approached the healthcare theme with a technology-first framing: what AI feature can we ship? The teams that placed approached it with a user-first framing: who is being underserved and what do they actually need? Those are not the same question and they produce different products.

RAG is the right primitive for personalized assistants. The pattern of retrieve-then-generate, with the retrieval scoped to a per-user context, is the spine of every personalized AI product I have built since. The hackathon was the first time I implemented it end-to-end and the lessons (chunking strategy matters, embedding-model choice matters, the system prompt is doing more work than you think) carried forward.

A product story is half the deliverable. The judges saw maybe ten seconds of the actual prototype. They spent three minutes on the pitch, the market story, and the founding team. Engineers who undervalue that mismatch ship great work that nobody invests in.

What I would change

If I were rebuilding CarePal today with what I have learned since, three things would change.

Voice-first by default, not text-first. The senior population we were designing for has a strong preference for voice interaction. Building text first and adding voice later is a common engineering shortcut and it produces the wrong product.

Care-team integration, not standalone product. The most plausible business model is not a direct-to-consumer subscription but an integration into the workflows of home-care providers, geriatric care managers, and provincial home-care programs. Designing for that integration changes what the product looks like.

Privacy and clinical safety boundaries from day one. The data CarePal handles, even in a wellness use case, edges into protected health information. The right architecture is one where the user's care record never leaves the boundary of the entity legally responsible for it. We did not build that in twenty hours. A real production version would have to. Working at metricHEALTH since has made this constraint feel obvious in a way it did not at the time. The hackathon prototype called Cohere's hosted endpoint with patient-shaped data sitting in the request body. A clinical version of CarePal would have to either run inference inside the care-provider's compliance boundary or strip the data to non-identifying tokens before it crossed any network. Neither is a twenty-hour decision.
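The "strip to non-identifying tokens" option above can be sketched, with the caveat that real de-identification is a hard problem and this is a toy. Everything here is hypothetical: the directory mapping and the phone-number scrub are illustrations, not the hackathon code:

```python
import re


def tokenize_phi(text: str, directory: dict) -> str:
    """Hypothetical de-identification pass: swap known identifiers for
    opaque tokens before the text crosses any network boundary.

    `directory` maps real values to stable tokens and would live inside
    the care-provider's compliance boundary, never leaving it."""
    for real, token in directory.items():
        text = text.replace(real, token)
    # Crude phone-number scrub as a backstop; a production system
    # needs a far more thorough PHI detection pass than this.
    return re.sub(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b", "[PHONE]", text)
```

Even this toy shows why it is not a twenty-hour decision: the token directory has to be managed, reversed on the way back, and audited, all inside the compliance boundary.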

The pitch video from the hackathon: https://youtu.be/rrtmGMnLhE4?si=8Jg7PHTcVZiouNZL