πŸ€–

AI & LLM in J4H

How artificial intelligence, large language models, and retrieval-augmented generation power this app

J4H uses AI in three distinct ways: (1) a large language model (Claude) to generate natural-language health summaries, (2) a RAG pipeline (TF-IDF retrieval + cosine similarity) to select the most relevant diary entries before sending them to the LLM, and (3) a focus search that lets you drill into a specific topic. None of this requires a GPU β€” it all runs on standard Python and the Anthropic cloud API.
🧠
What is an LLM?
Large Language Models explained simply

A Large Language Model (LLM) is a neural network trained on vast amounts of text. It learns statistical patterns β€” which words follow which other words β€” until it can generate coherent, contextually appropriate text on almost any topic.

[Diagram: your prompt ("Summarize my knee pain entries") β†’ Claude Haiku (billions of learned parameters: tokenize β†’ attend β†’ predict; temperature = 0.3 for focused output) β†’ summary ("The patient has experienced knee pain since Jan…")]
A prompt goes in β†’ the LLM tokenizes, applies attention, and predicts the best response token by token
Why Claude Haiku? It's Anthropic's fastest and most cost-efficient model β€” ideal for summarizing diary entries where speed and low cost matter more than maximum reasoning power. Temperature is set to 0.3 (low) to keep output focused and factual rather than creative.
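Temperature rescales the model's next-token probabilities before sampling: dividing the raw scores by a small temperature sharpens the distribution toward the most likely token. A toy illustration (invented logits, not Claude's internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.3)   # low T -> peaked, consistent
creative = softmax_with_temperature(logits, 1.0)  # high T -> flatter, varied
print(max(focused) > max(creative))  # True: the top token dominates at T=0.3
```

At temperature 0.3 the top token takes roughly 96% of the probability mass in this toy example, which is why low temperatures yield consistent, factual-sounding summaries.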
πŸ”
What is RAG?
Retrieval-Augmented Generation β€” finding the right entries before generating

LLMs have a context window limit β€” they can only read so many words at once. If a user has 500 diary entries, sending them all to Claude would be slow, expensive, and potentially exceed the limit. RAG solves this by retrieving only the most relevant entries first.

❌ Without RAG

Send all 500 entries to Claude. Most are irrelevant. Slow, expensive, and the signal is buried in noise.

βœ… With RAG

Retrieve the top 25 most relevant entries first. Send only those to Claude. Faster, cheaper, more accurate.
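Back-of-the-envelope token math shows the saving (the per-entry token count is an assumption for illustration, not a measured value):

```python
AVG_TOKENS_PER_ENTRY = 75  # assumed average diary-entry length in tokens

without_rag = 500 * AVG_TOKENS_PER_ENTRY  # 37,500 tokens sent to Claude
with_rag    = 25 * AVG_TOKENS_PER_ENTRY   # 1,875 tokens for the top 25
print(without_rag // with_rag)  # 20 -> 20x fewer input tokens
```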
[Diagram: all entries (could be 500+: headache, back pain, knee, stomach, fatigue, mood…) β†’ TF-IDF retrieval (vectorize all entries + query, rank by cosine similarity) β†’ top entries (e.g. knee entries scored 0.92, 0.87, 0.81 β€” up to 25, sorted by score) β†’ Claude reads the top entries + specialty prompt β†’ summary. Example query: "orthopedist symptoms joint pain stiffness"]
RAG pipeline: all entries β†’ TF-IDF retrieval β†’ top relevant entries β†’ Claude generates summary
Encryption fallback: TF-IDF can't vectorize ciphertext. If entries are AES-encrypted (start with ENC:), the pipeline falls back to sorting by pain_level instead. This is why the frontend sends already-decrypted entries in the POST body β€” the server never sees the plaintext, but the RAG step still works.
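A sketch of that fallback logic, assuming the entry shape used here; the real `_retrieve_relevant_entries()` in summarizer.py may differ, and the word-overlap ranking merely stands in for the actual TF-IDF step:

```python
def rank_by_overlap(entries, query):
    # stand-in for the real TF-IDF + cosine-similarity ranking
    q = set(query.lower().split())
    return sorted(entries,
                  key=lambda e: len(q & set(e["text"].lower().split())),
                  reverse=True)

def retrieve_relevant_entries(entries, query, top_k=25):
    """Return (top entries, used_tfidf flag), with a ciphertext fallback."""
    # TF-IDF cannot vectorize ciphertext, so fall back to pain level
    if any(e["text"].startswith("ENC:") for e in entries):
        ranked = sorted(entries, key=lambda e: e.get("pain_level", 0), reverse=True)
        return ranked[:top_k], False
    return rank_by_overlap(entries, query)[:top_k], True
```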
πŸ“
TF-IDF & Cosine Similarity
How entries are ranked by relevance

Term Frequency–Inverse Document Frequency

TF (Term Frequency) measures how often a word appears in one entry. IDF (Inverse Document Frequency) penalizes words that appear in many entries β€” common words like "pain" are less useful for distinguishing relevance than specific words like "patella." Their product, the TF-IDF weight, surfaces the signal words.
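The section's own numbers ("knee" appearing 5 times in a 200-word entry, and in 20 of 500 entries) can be checked in a couple of lines:

```python
import math

tf = 5 / 200              # "knee" appears 5x in a 200-word entry
idf = math.log(500 / 20)  # "knee" appears in 20 of 500 entries (natural log)
print(round(tf * idf, 2)) # 0.08
```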

Worked example: Term Frequency (TF) β€” how often a word appears in THIS entry: "knee" appears 5Γ— in a 200-word entry β†’ TF = 5/200 = 0.025. Inverse Document Frequency (IDF) β€” how rare the word is across ALL entries: "knee" appears in 20 of 500 entries β†’ IDF = log(500/20) β‰ˆ 3.2. TF-IDF weight = 0.025 Γ— 3.2 β‰ˆ 0.08. A high score means the word is both important here and rare elsewhere.
TF-IDF score = how often a word appears here Γ— how rare it is everywhere else

Cosine Similarity β€” the geometry

Each entry becomes a vector β€” one dimension per word, value = TF-IDF weight. The query ("knee pain orthopedist") becomes its own vector. Cosine similarity measures the angle between vectors. Small angle = similar direction = relevant entry.
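Cosine similarity itself is only a few lines of arithmetic. This generic version (scikit-learn's `cosine_similarity` does the same at scale) is not necessarily the code in summarizer.py:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy TF-IDF weights over the vocabulary ["knee", "pain", "headache"]
query   = [1.0, 1.0, 0.0]  # the "knee pain" query vector
entry_a = [0.9, 0.8, 0.0]  # knee/swelling entry -> small angle
entry_b = [0.0, 0.1, 0.9]  # headache entry -> large angle
print(cosine_similarity(query, entry_a) > cosine_similarity(query, entry_b))  # True
```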

[Diagram: a 2D space with a "knee" axis and a "pain" axis. The query "knee pain" points close to Entry A (knee, swelling): angle β‰ˆ 8Β°, sim β‰ˆ 0.99 β†’ small angle = similar direction = high cosine similarity = RETRIEVED. Entry B (headache, nausea) points away: angle β‰ˆ 75Β°, sim β‰ˆ 0.26 β†’ large angle = different direction = low cosine similarity = filtered out.]
Vectors in 2D (simplified). Real space has thousands of dimensions β€” one per word in the vocabulary.
Invented by: Gerard Salton (Cornell, 1960s–70s) developed TF and the vector space model. Karen SpΓ€rck Jones (1972) introduced IDF. Together their work forms the backbone of classical information retrieval β€” and this project's RAG pipeline.
✍️
AI Summaries β€” End to End
From diary entries to doctor-ready paragraph

When you click Generate Summary, here is the full pipeline:

  1. Decrypt on device. The browser decrypts your entries using the AES-GCM key derived from your passcode. Plaintext never leaves the device before this step.
  2. Send decrypted entries to /api/summary/preview. The frontend POSTs the already-decrypted entry list along with your chosen specialty (e.g. "orthopedist") and optional custom doctor text.
  3. RAG retrieval. _retrieve_relevant_entries() in summarizer.py builds TF-IDF vectors, computes cosine similarity against the specialty query, and selects the top 25 entries.
  4. Prompt construction. generate_summary() builds a prompt using specialty-specific instructions from SPECIALTY_PROMPTS (9 specialties) plus the formatted top entries.
  5. Claude API call. The prompt is sent to Claude Haiku (max_tokens=1200, temperature=0.3). Claude returns a 3-section response: symptom paragraph, patterns paragraph, and 3 suggested questions.
  6. Display & save. The summary is shown in the browser. You can edit it, then save it to the DB via /api/summary/save, download it, or email it.
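The server-side steps (3–5) can be sketched as plain functions. All helper names here are assumptions based on the description above, and the Claude call is stubbed so the sketch runs offline:

```python
SPECIALTY_QUERIES = {"orthopedist": "orthopedist symptoms joint pain stiffness"}

def retrieve(entries, query, top_k=25):
    # stand-in for the TF-IDF + cosine-similarity retrieval step
    q = set(query.lower().split())
    ranked = sorted(entries,
                    key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(specialty, entries):
    header = f"You are summarizing diary entries for a {specialty}.\n"
    return header + "\n".join(f"- {e}" for e in entries)

def call_claude(prompt):
    # stand-in for client.messages.create(model="claude-3-haiku-...", ...)
    return f"[summary of {len(prompt.splitlines())} prompt lines]"

def generate_summary_pipeline(decrypted_entries, specialty="orthopedist"):
    query = SPECIALTY_QUERIES[specialty]      # step 3: retrieval query
    top = retrieve(decrypted_entries, query)  # step 3: top-k entries
    prompt = build_prompt(specialty, top)     # step 4: prompt construction
    return call_claude(prompt)                # step 5: LLM call (stubbed)
```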
[Diagram β€” three layers: DEVICE (browser), HEROKU SERVER (app.py + summarizer.py), ANTHROPIC CLOUD. Flow: (1) user clicks Generate in spa.html β†’ (2) browser decrypts entries with the AES-GCM key derived from the passcode (crypto.js) β†’ (3) POST /api/summary with plaintext entries + specialty choice over HTTPS β†’ (4) Flask route in app.py receives request.json β†’ (5) RAG retrieval: TF-IDF + cosine similarity β†’ top 25 entries β†’ (6) prompt built from specialty text + formatted entries β†’ (7) Claude Haiku API call generates the summary β†’ (8) Flask returns JSON summary + RAG stats β†’ (9) browser displays the summary for edit, save, email, or download.]
9 Specialty prompts: General, Dentist, Podiatrist, Orthopedist, Neurologist, Cardiologist, Gastroenterologist, Rheumatologist, Physical Therapist β€” each has tailored retrieval queries and paragraph instructions so Claude focuses on the right symptoms.
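The document doesn't show SPECIALTY_PROMPTS itself; one plausible shape, purely illustrative:

```python
# Hypothetical structure — the real dict in summarizer.py has 9 tailored entries.
SPECIALTY_PROMPTS = {
    "orthopedist": {
        "query": "orthopedist symptoms joint pain stiffness",  # RAG retrieval query
        "instructions": "Focus on joints, mobility, and musculoskeletal pain.",
    },
    "neurologist": {
        "query": "neurologist headache numbness dizziness balance",
        "instructions": "Focus on headaches, numbness, and neurological symptoms.",
    },
}
```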
# summarizer.py β€” simplified
retrieved, used_tfidf = self._retrieve_relevant_entries(
    entries, query_text, top_k=25
)
formatted = self._format_entries(retrieved)
message = self.client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1200,
    temperature=0.3,
    messages=[{"role": "user", "content": prompt}],
)
πŸ”Ž
Focus Search
Drill into a specific body part or illness

The Focus page (/focus) lets you type any topic β€” "knee", "diabetes", "lower back" β€” and get a targeted AI summary plus a list of all matching entries. It uses the same RAG pipeline as specialty summaries but with top_k=50 (wider net) and a topic-specific prompt.

[Diagram: user types "knee" + decrypted entries list β†’ RAG retrieval (TF-IDF vectorize, cosine similarity, top_k = 50) β†’ Claude (topic-specific prompt + entries β†’ 2 paragraphs) β†’ results: AI summary + 3 questions, plus matched entry cards listed.]
Focus search pipeline β€” same RAG core as specialty summaries, but query = user's topic string
  • Uses generate_focus_summary(entries, topic) in summarizer.py
  • Returns summary text + full list of matched entries with dates, pain scores, and locations
  • RAG info line shows "X most relevant of Y total" when filtering occurred
  • Download button saves a focus-knee.txt file for sharing with your doctor
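Under those assumptions, generate_focus_summary() might look like this (word-overlap ranking stands in for the shared TF-IDF core, and the Claude call is omitted):

```python
def retrieve(entries, query, top_k):
    # stand-in for the shared TF-IDF + cosine-similarity retrieval core
    q = set(query.lower().split())
    ranked = sorted(entries,
                    key=lambda e: len(q & set(e.lower().split())),
                    reverse=True)
    return [e for e in ranked if q & set(e.lower().split())][:top_k]

def generate_focus_summary(entries, topic):
    """Sketch: same RAG core as specialty summaries, but query = topic, top_k = 50."""
    matched = retrieve(entries, topic, top_k=50)
    prompt = (f"Write 2 paragraphs about '{topic}' from these diary entries, "
              "then suggest 3 questions for the doctor:\n" + "\n".join(matched))
    return {"prompt": prompt, "matched": matched}  # real app sends prompt to Claude
```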
πŸ”’
Cost & Privacy
What goes to Anthropic, what stays on device

βœ… Stays on your device

Your passcode. The AES encryption key (derived in browser, never sent to server). Raw ciphertext in the database.

⚠️ Sent to Anthropic API

Decrypted entry text when you generate a summary or use Focus. Anthropic's API privacy policy applies.
Cost: Claude Haiku is Anthropic's cheapest model β€” roughly $0.001–$0.003 per summary depending on entry count. For typical use (a few summaries per month) the cost is negligible, well under $1/month.
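That estimate pencils out from per-token pricing. The rates below are assumptions (check Anthropic's current pricing page), as is the entry length:

```python
INPUT_USD_PER_MTOK  = 0.25  # assumed Haiku input price per 1M tokens
OUTPUT_USD_PER_MTOK = 1.25  # assumed Haiku output price per 1M tokens

input_tokens  = 25 * 100  # 25 retrieved entries at ~100 tokens each (assumption)
output_tokens = 1200      # the summary's max_tokens

cost = (input_tokens / 1e6) * INPUT_USD_PER_MTOK \
     + (output_tokens / 1e6) * OUTPUT_USD_PER_MTOK
print(round(cost, 4))  # ~0.002 USD, inside the $0.001-$0.003 range
```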
  • The RAG step reduces tokens sent to Claude β€” fewer entries = lower cost + faster response
  • Pain levels, dates, and patient names are stored in plaintext in the DB β€” only diary content and location are encrypted
  • Photos, summaries, and family history are stored unencrypted in the database
  • ANTHROPIC_API_KEY lives in Heroku config vars β€” never in the codebase
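Reading the key from the environment is what keeps it out of the repo; a minimal sketch:

```python
import os

def get_api_key():
    """Read the Anthropic key set via `heroku config:set ANTHROPIC_API_KEY=...`."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if key is None:
        raise RuntimeError("ANTHROPIC_API_KEY is not configured")
    return key
```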
πŸ—ΊοΈ
AI in J4H β€” Full Map
Every place AI touches the app
[Diagram: the Claude Haiku model (Anthropic API) at the centre, fed by the RAG pipeline (TF-IDF retrieval + cosine similarity), triggered by Specialty Summaries (9 doctor types) and Focus Search (any topic/body part), producing suggested questions (3 per summary), summaries saved to the DB (/summaries), and summaries emailed to doctors (Gmail SMTP).]
Claude sits at the centre β€” fed by the RAG pipeline, triggered by Specialty Summaries and Focus Search, producing summaries, suggested questions, and emailable reports