Titanic Historical RAG

A RAG search engine over the 1912 Titanic inquiries that surfaces contradictions between witnesses instead of hiding them.

Highlights

Live at titanic.higuera.io with the full US Senate Inquiry corpus indexed.
Killer feature: LLM-driven pairwise contradiction detection with 0–1 confidence and a one-sentence explanation per conflict.
Pinecone serverless for retrieval, Claude Haiku 4.5 with structured JSON output for verdicts, SQLite-backed verdict cache.
Roadmap: ingest the full British Inquiry (~2,200 pages) to surface cross-inquiry contradictions, where the same witness tells two different stories.

The killer feature: contradictions, not consensus

Most RAG systems try to return one answer. The witnesses in the Titanic inquiries gave conflicting accounts on almost everything — speed, lifeboat counts, ice warnings, who gave which order. This tool embraces that. Ask “How many people were in Ismay’s lifeboat?” and you’ll see Ismay’s “about 45” set next to Officer Lowe’s “only twelve”, with a confidence-scored explanation of why the statements conflict.
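As a rough illustration of the pairwise idea, retrieved passages can be paired up so that only statements from different witnesses are compared. This is a sketch under assumptions, not the project's code: the `Chunk` type, its field names, and `candidate_pairs` are hypothetical, and the third chunk is placeholder text. Each pair would then be handed to the LLM for a verdict.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage; the field names are assumptions for this sketch."""
    chunk_id: str
    witness: str
    text: str


def candidate_pairs(chunks):
    """Yield every pair of chunks that come from two different witnesses."""
    for a, b in combinations(chunks, 2):
        if a.witness != b.witness:
            yield a, b


chunks = [
    Chunk("c1", "Joseph Bruce Ismay",
          "Ismay's lifeboat had approximately forty-five people"),
    Chunk("c2", "Harold Godfrey Lowe",
          "Ismay's lifeboat (Boat C) had only twelve people"),
    # Placeholder passage, included only to show same-witness pairs being skipped.
    Chunk("c3", "Joseph Bruce Ismay", "A second passage from the same witness"),
]

pairs = list(candidate_pairs(chunks))
for a, b in pairs:
    print(f"{a.witness} ({a.chunk_id}) vs {b.witness} ({b.chunk_id})")
```

With n retrieved chunks there are at most n(n-1)/2 pairs, so the number of LLM calls grows quadratically with the retrieval size. That growth is one reason a verdict cache pays off.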

Architecture

Layer              Choice
PDF extraction     PyMuPDF (pymupdf)
Embeddings         OpenAI text-embedding-3-large, 1024 dimensions
Vector store       Pinecone (serverless, AWS)
Contradiction LLM  Claude Haiku 4.5 (structured JSON output)
API                FastAPI + uvicorn
Frontend           Single-page HTML/JS, no framework
Verdict cache      SQLite (local); DynamoDB planned for production
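To make the verdict cache row concrete, here is a minimal sketch of how an SQLite-backed cache for LLM verdicts could work. The table layout, the `VerdictCache` class, and the `pair_key` helper are assumptions made for illustration; the project's actual schema is not described here.

```python
import hashlib
import json
import sqlite3


def pair_key(chunk_id_a, chunk_id_b):
    """Order-independent cache key for a pair of chunk IDs."""
    first, second = sorted((chunk_id_a, chunk_id_b))
    return hashlib.sha256(f"{first}|{second}".encode("utf-8")).hexdigest()


class VerdictCache:
    """Stores one JSON verdict per chunk pair (hypothetical schema)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS verdicts ("
            "pair_key TEXT PRIMARY KEY, verdict_json TEXT NOT NULL)"
        )

    def get(self, chunk_id_a, chunk_id_b):
        """Return the cached verdict dict, or None on a cache miss."""
        row = self.conn.execute(
            "SELECT verdict_json FROM verdicts WHERE pair_key = ?",
            (pair_key(chunk_id_a, chunk_id_b),),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, chunk_id_a, chunk_id_b, verdict):
        """Insert or overwrite the verdict for this pair."""
        self.conn.execute(
            "INSERT OR REPLACE INTO verdicts (pair_key, verdict_json) VALUES (?, ?)",
            (pair_key(chunk_id_a, chunk_id_b), json.dumps(verdict)),
        )
        self.conn.commit()


cache = VerdictCache()
cache.put("c1", "c2", {"contradiction": True, "confidence": 0.95})
print(cache.get("c2", "c1"))  # same pair, reversed order
```

Sorting the two IDs before hashing means a pair is looked up the same way regardless of which chunk comes first. The stored verdict keeps whatever a/b ordering it was written with, so a caller that cares about that ordering would need to record it in the verdict itself.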

What gets indexed

US Senate Inquiry (1912) — 1,173 pages, fully ingested. ~40,000 chunks across 68 witnesses.
British Board of Trade Inquiry — format sample only; full ~2,200-page transcript on the roadmap.

Long-term goal: ingest the full British Inquiry to surface cross-inquiry contradictions — where the same witness testified differently in the two proceedings.

API shape

POST /search/contradictions returns a list of pairwise contradictions. Each item carries the witness names, the two conflicting claims, a 0–1 confidence score, a one-sentence explanation, and the source chunks behind each claim. An abridged item, with the source chunks left out:

{
  "witness_a": "Joseph Bruce Ismay",
  "witness_b": "Harold Godfrey Lowe",
  "claim_a": "Ismay's lifeboat had approximately forty-five people",
  "claim_b": "Ismay's lifeboat (Boat C) had only twelve people",
  "confidence": 0.95,
  "explanation": "The two witnesses provide directly conflicting specific counts: forty-five versus twelve, which cannot both be true."
}
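A client consuming this endpoint might want to validate each item before displaying it. The sketch below parses the sample item above into a typed record and checks that the confidence lies in the documented 0–1 range. The `Contradiction` class and `parse_item` helper are hypothetical names for this example, and keys beyond the six shown (such as the source chunks) are simply ignored.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Contradiction:
    witness_a: str
    witness_b: str
    claim_a: str
    claim_b: str
    confidence: float
    explanation: str

    def __post_init__(self):
        # The API documents confidence as a score between 0 and 1.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


FIELD_NAMES = tuple(f.name for f in fields(Contradiction))


def parse_item(item):
    """Build a Contradiction from one decoded JSON object, ignoring extra keys."""
    missing = [name for name in FIELD_NAMES if name not in item]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Contradiction(**{name: item[name] for name in FIELD_NAMES})


raw = """
{
  "witness_a": "Joseph Bruce Ismay",
  "witness_b": "Harold Godfrey Lowe",
  "claim_a": "Ismay's lifeboat had approximately forty-five people",
  "claim_b": "Ismay's lifeboat (Boat C) had only twelve people",
  "confidence": 0.95,
  "explanation": "The two witnesses provide directly conflicting specific counts: forty-five versus twelve, which cannot both be true."
}
"""

item = parse_item(json.loads(raw))
print(f"{item.witness_a} vs {item.witness_b}: confidence {item.confidence}")
```

Failing loudly on a missing field or an out-of-range score keeps a malformed LLM verdict from reaching the side-by-side view unnoticed.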

Roadmap

Done — Killer feature shipped end-to-end and deployed at titanic.higuera.io (search → LLM verdicts → side-by-side UI behind a custom domain).
Next — Hook /witnesses to the canonical witness index; retire legacy ChromaDB code.
Soon — British Inquiry pipeline (speaker-tag parser, separate witness index, ingest full corpus).
Production — DynamoDB-backed verdict cache, rate limiter on the contradictions endpoint.

Try it live at titanic.higuera.io — no signup. Hit the /health endpoint to see the live document count.