πŸ€–

AI & LLM in J4H

How artificial intelligence, large language models, and retrieval-augmented generation power this app

J4H uses AI in three distinct ways: (1) a large language model (Claude) to generate natural-language health summaries, (2) a RAG pipeline (TF-IDF retrieval + cosine similarity) to select the most relevant diary entries before sending them to the LLM, and (3) a focus search that lets you drill into a specific topic. None of this requires a GPU β€” it all runs on standard Python and the Anthropic cloud API.
🧠
What is an LLM?
Large Language Models explained simply

A Large Language Model (LLM) is a neural network trained on vast amounts of text. It learns statistical patterns β€” which words follow which other words β€” until it can generate coherent, contextually appropriate text on almost any topic.

[Diagram: your prompt ("Summarize my knee pain entries") β†’ Claude Haiku (billions of learned parameters: tokenize β†’ attend β†’ predict; temperature = 0.3 for focused output) β†’ summary ("The patient has experienced knee pain since Jan…")]
A prompt goes in β†’ the LLM tokenizes, applies attention, and predicts the best response token by token
Why Claude Haiku? It's Anthropic's fastest and most cost-efficient model β€” ideal for summarizing diary entries where speed and low cost matter more than maximum reasoning power. Temperature is set to 0.3 (low) to keep output focused and factual rather than creative.
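Temperature rescales the model's next-token probabilities before sampling: dividing the raw scores by a small temperature sharpens the distribution toward the most likely token. A toy illustration (invented logits, not Claude's internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.3)   # low T -> peaked, consistent
creative = softmax_with_temperature(logits, 1.0)  # high T -> flatter, varied
print(max(focused) > max(creative))  # True: the top token dominates at T=0.3
```

At temperature 0.3 the top token takes roughly 96% of the probability mass in this toy example, which is why low temperatures yield consistent, factual-sounding summaries.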
πŸ”
What is RAG?
Retrieval-Augmented Generation β€” finding the right entries before generating

LLMs have a context window limit β€” they can only read so many words at once. If a user has 500 diary entries, sending them all to Claude would be slow, expensive, and potentially exceed the limit. RAG solves this by retrieving only the most relevant entries first.

❌ Without RAG

Send all 500 entries to Claude. Most are irrelevant. Slow, expensive, and the signal is buried in noise.

βœ… With RAG

Retrieve the top 25 most relevant entries first. Send only those to Claude. Faster, cheaper, more accurate.
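Back-of-the-envelope token math shows the saving (the per-entry token count is an assumption for illustration, not a measured value):

```python
AVG_TOKENS_PER_ENTRY = 75  # assumed average diary-entry length in tokens

without_rag = 500 * AVG_TOKENS_PER_ENTRY  # 37,500 tokens sent to Claude
with_rag    = 25 * AVG_TOKENS_PER_ENTRY   # 1,875 tokens for the top 25
print(without_rag // with_rag)  # 20 -> 20x fewer input tokens
```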
[Diagram: all entries (could be 500+: headache, back pain, knee, stomach, fatigue, mood…) β†’ TF-IDF retrieval (vectorize all entries + query, rank by cosine similarity) β†’ top entries (e.g. knee entries scored 0.92, 0.87, 0.81 β€” up to 25, sorted by score) β†’ Claude reads the top entries + specialty prompt β†’ summary. Example query: "orthopedist symptoms joint pain stiffness"]
RAG pipeline: all entries β†’ TF-IDF retrieval β†’ top relevant entries β†’ Claude generates summary
Encryption fallback: TF-IDF can't vectorize ciphertext. If entries are AES-encrypted (start with ENC:), the pipeline falls back to sorting by pain_level instead. This is why the frontend sends already-decrypted entries in the POST body β€” the server never sees the plaintext, but the RAG step still works.
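A sketch of that fallback logic, assuming the entry shape used here; the real `_retrieve_relevant_entries()` in summarizer.py may differ, and the word-overlap ranking merely stands in for the actual TF-IDF step:

```python
def rank_by_overlap(entries, query):
    # stand-in for the real TF-IDF + cosine-similarity ranking
    q = set(query.lower().split())
    return sorted(entries,
                  key=lambda e: len(q & set(e["text"].lower().split())),
                  reverse=True)

def retrieve_relevant_entries(entries, query, top_k=25):
    """Return (top entries, used_tfidf flag), with a ciphertext fallback."""
    # TF-IDF cannot vectorize ciphertext, so fall back to pain level
    if any(e["text"].startswith("ENC:") for e in entries):
        ranked = sorted(entries, key=lambda e: e.get("pain_level", 0), reverse=True)
        return ranked[:top_k], False
    return rank_by_overlap(entries, query)[:top_k], True
```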
πŸ“
TF-IDF & Cosine Similarity
How entries are ranked by relevance

Term Frequency–Inverse Document Frequency

TF (Term Frequency) measures how often a word appears in one entry. IDF (Inverse Document Frequency) penalizes words that appear in many entries β€” common words like "pain" are less useful for distinguishing relevance than specific words like "patella." Their product, the TF-IDF weight, surfaces the signal words.
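The section's own numbers ("knee" appearing 5 times in a 200-word entry, and in 20 of 500 entries) can be checked in a couple of lines:

```python
import math

tf = 5 / 200              # "knee" appears 5x in a 200-word entry
idf = math.log(500 / 20)  # "knee" appears in 20 of 500 entries (natural log)
print(round(tf * idf, 2)) # 0.08
```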

Worked example: Term Frequency (TF) β€” how often a word appears in THIS entry: "knee" appears 5Γ— in a 200-word entry β†’ TF = 5/200 = 0.025. Inverse Document Frequency (IDF) β€” how rare the word is across ALL entries: "knee" appears in 20 of 500 entries β†’ IDF = log(500/20) β‰ˆ 3.2. TF-IDF weight = 0.025 Γ— 3.2 β‰ˆ 0.08. A high score means the word is both important here and rare elsewhere.
TF-IDF score = how often a word appears here Γ— how rare it is everywhere else

Cosine Similarity β€” the geometry

Each entry becomes a vector β€” one dimension per word, value = TF-IDF weight. The query ("knee pain orthopedist") becomes its own vector. Cosine similarity measures the angle between vectors. Small angle = similar direction = relevant entry.
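Cosine similarity itself is only a few lines of arithmetic. This generic version (scikit-learn's `cosine_similarity` does the same at scale) is not necessarily the code in summarizer.py:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy TF-IDF weights over the vocabulary ["knee", "pain", "headache"]
query   = [1.0, 1.0, 0.0]  # the "knee pain" query vector
entry_a = [0.9, 0.8, 0.0]  # knee/swelling entry -> small angle
entry_b = [0.0, 0.1, 0.9]  # headache entry -> large angle
print(cosine_similarity(query, entry_a) > cosine_similarity(query, entry_b))  # True
```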

[Diagram: a 2D space with a "knee" axis and a "pain" axis. The query "knee pain" points close to Entry A (knee, swelling): angle β‰ˆ 8Β°, sim β‰ˆ 0.99 β†’ small angle = similar direction = high cosine similarity = RETRIEVED. Entry B (headache, nausea) points away: angle β‰ˆ 75Β°, sim β‰ˆ 0.26 β†’ large angle = different direction = low cosine similarity = filtered out.]
Vectors in 2D (simplified). Real space has thousands of dimensions β€” one per word in the vocabulary.
Invented by: Gerard Salton (Cornell, 1960s–70s) developed TF and the vector space model. Karen SpΓ€rck Jones (1972) introduced IDF. Together their work forms the backbone of classical information retrieval β€” and this project's RAG pipeline.
✍️
AI Summaries β€” End to End
From diary entries to doctor-ready paragraph

When you click Generate Summary, here is the full pipeline:

  1. Decrypt on device. The browser decrypts your entries using the AES-GCM key derived from your passcode. Plaintext never leaves the device before this step.
  2. Send decrypted entries to /api/summary/preview. The frontend POSTs the already-decrypted entry list along with your chosen specialty (e.g. "orthopedist") and optional custom doctor text.
  3. RAG retrieval. _retrieve_relevant_entries() in summarizer.py builds TF-IDF vectors, computes cosine similarity against the specialty query, and selects the top 25 entries.
  4. Prompt construction. generate_summary() builds a prompt using specialty-specific instructions from SPECIALTY_PROMPTS (9 specialties) plus the formatted top entries.
  5. Claude API call. The prompt is sent to Claude Haiku (max_tokens=1200, temperature=0.3). Claude returns a 3-section response: symptom paragraph, patterns paragraph, and 3 suggested questions.
  6. Display & save. The summary is shown in the browser. You can edit it, then save it to the DB via /api/summary/save, download it, or email it.
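The server-side steps (3–5) can be sketched as plain functions. All helper names here are assumptions based on the description above, and the Claude call is stubbed so the sketch runs offline:

```python
SPECIALTY_QUERIES = {"orthopedist": "orthopedist symptoms joint pain stiffness"}

def retrieve(entries, query, top_k=25):
    # stand-in for the TF-IDF + cosine-similarity retrieval step
    q = set(query.lower().split())
    ranked = sorted(entries,
                    key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(specialty, entries):
    header = f"You are summarizing diary entries for a {specialty}.\n"
    return header + "\n".join(f"- {e}" for e in entries)

def call_claude(prompt):
    # stand-in for client.messages.create(model="claude-3-haiku-...", ...)
    return f"[summary of {len(prompt.splitlines())} prompt lines]"

def generate_summary_pipeline(decrypted_entries, specialty="orthopedist"):
    query = SPECIALTY_QUERIES[specialty]      # step 3: retrieval query
    top = retrieve(decrypted_entries, query)  # step 3: top-k entries
    prompt = build_prompt(specialty, top)     # step 4: prompt construction
    return call_claude(prompt)                # step 5: LLM call (stubbed)
```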
[Diagram β€” three layers: DEVICE (browser), HEROKU SERVER (app.py + summarizer.py), ANTHROPIC CLOUD. Flow: (1) user clicks Generate in spa.html β†’ (2) browser decrypts entries with the AES-GCM key derived from the passcode (crypto.js) β†’ (3) POST /api/summary with plaintext entries + specialty choice over HTTPS β†’ (4) Flask route in app.py receives request.json β†’ (5) RAG retrieval: TF-IDF + cosine similarity β†’ top 25 entries β†’ (6) prompt built from specialty text + formatted entries β†’ (7) Claude Haiku API call generates the summary β†’ (8) Flask returns JSON summary + RAG stats β†’ (9) browser displays the summary for edit, save, email, or download.]
9 Specialty prompts: General, Dentist, Podiatrist, Orthopedist, Neurologist, Cardiologist, Gastroenterologist, Rheumatologist, Physical Therapist β€” each has tailored retrieval queries and paragraph instructions so Claude focuses on the right symptoms.
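The document doesn't show SPECIALTY_PROMPTS itself; one plausible shape, purely illustrative:

```python
# Hypothetical structure — the real dict in summarizer.py has 9 tailored entries.
SPECIALTY_PROMPTS = {
    "orthopedist": {
        "query": "orthopedist symptoms joint pain stiffness",  # RAG retrieval query
        "instructions": "Focus on joints, mobility, and musculoskeletal pain.",
    },
    "neurologist": {
        "query": "neurologist headache numbness dizziness balance",
        "instructions": "Focus on headaches, numbness, and neurological symptoms.",
    },
}
```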
# summarizer.py β€” simplified
retrieved, used_tfidf = self._retrieve_relevant_entries(
    entries, query_text, top_k=25
)
formatted = self._format_entries(retrieved)
message = self.client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1200,
    temperature=0.3,
    messages=[{"role": "user", "content": prompt}],
)
πŸ”Ž
Focus Search
Drill into a specific body part or illness

The Focus page (/focus) lets you type any topic β€” "knee", "diabetes", "lower back" β€” and get a targeted AI summary plus a list of all matching entries. It uses the same RAG pipeline as specialty summaries but with top_k=50 (wider net) and a topic-specific prompt.

[Diagram: user types "knee" + decrypted entries list β†’ RAG retrieval (TF-IDF vectorize, cosine similarity, top_k = 50) β†’ Claude (topic-specific prompt + entries β†’ 2 paragraphs) β†’ results: AI summary + 3 questions, plus matched entry cards listed.]
Focus search pipeline β€” same RAG core as specialty summaries, but query = user's topic string
  • Uses generate_focus_summary(entries, topic) in summarizer.py
  • Returns summary text + full list of matched entries with dates, pain scores, and locations
  • RAG info line shows "X most relevant of Y total" when filtering occurred
  • Download button saves a focus-knee.txt file for sharing with your doctor
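Under those assumptions, generate_focus_summary() might look like this (word-overlap ranking stands in for the shared TF-IDF core, and the Claude call is omitted):

```python
def retrieve(entries, query, top_k):
    # stand-in for the shared TF-IDF + cosine-similarity retrieval core
    q = set(query.lower().split())
    ranked = sorted(entries,
                    key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return [e for e in ranked if q & set(e.lower().split())][:top_k]

def generate_focus_summary(entries, topic):
    """Sketch: same RAG core as specialty summaries, but query = topic, top_k = 50."""
    matched = retrieve(entries, topic, top_k=50)
    prompt = (f"Write 2 paragraphs about '{topic}' from these diary entries, "
              "then suggest 3 questions for the doctor:\n" + "\n".join(matched))
    return {"prompt": prompt, "matched": matched}  # real app sends prompt to Claude
```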
πŸ”’
Cost & Privacy
What goes to Anthropic, what stays on device

βœ… Stays on your device

Your passcode. The AES encryption key (derived in browser, never sent to server). Raw ciphertext in the database.

⚠️ Sent to Anthropic API

Decrypted entry text when you generate a summary or use Focus. Anthropic's API privacy policy applies.
Cost: Claude Haiku is Anthropic's cheapest model β€” roughly $0.001–$0.003 per summary depending on entry count. For typical use (a few summaries per month) the cost is negligible, well under $1/month.
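That estimate pencils out from per-token pricing. The rates below are assumptions (check Anthropic's current pricing page), as is the entry length:

```python
INPUT_USD_PER_MTOK  = 0.25  # assumed Haiku input price per 1M tokens
OUTPUT_USD_PER_MTOK = 1.25  # assumed Haiku output price per 1M tokens

input_tokens  = 25 * 100  # 25 retrieved entries at ~100 tokens each (assumption)
output_tokens = 1200      # the summary's max_tokens

cost = (input_tokens / 1e6) * INPUT_USD_PER_MTOK \
     + (output_tokens / 1e6) * OUTPUT_USD_PER_MTOK
print(round(cost, 4))  # ~0.002 USD, inside the $0.001-$0.003 range
```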
  • The RAG step reduces tokens sent to Claude β€” fewer entries = lower cost + faster response
  • Pain levels, dates, and patient names are stored in plaintext in the DB β€” only diary content and location are encrypted
  • Photos, summaries, and family history are stored unencrypted in the database
  • ANTHROPIC_API_KEY lives in Heroku config vars β€” never in the codebase
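Reading the key from the environment is what keeps it out of the repo; a minimal sketch:

```python
import os

def get_api_key():
    """Read the Anthropic key set via `heroku config:set ANTHROPIC_API_KEY=...`."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if key is None:
        raise RuntimeError("ANTHROPIC_API_KEY is not configured")
    return key
```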
πŸ—ΊοΈ
AI in J4H β€” Full Map
Every place AI touches the app
[Diagram: the Claude Haiku model (Anthropic API) at the centre, fed by the RAG pipeline (TF-IDF retrieval + cosine similarity), triggered by Specialty Summaries (9 doctor types) and Focus Search (any topic/body part), producing suggested questions (3 per summary), summaries saved to the DB (/summaries), and summaries emailed to doctors (Gmail SMTP).]
Claude sits at the centre β€” fed by the RAG pipeline, triggered by Specialty Summaries and Focus Search, producing summaries, suggested questions, and emailable reports