Titanic Historical RAG
A RAG search engine over the 1912 Titanic inquiries that surfaces contradictions between witnesses instead of hiding them.
Highlights
- Live at titanic.higuera.io with the full US Senate Inquiry corpus indexed.
- Killer feature: LLM-driven pairwise contradiction detection with 0–1 confidence and a one-sentence explanation per conflict.
- Pinecone serverless for retrieval, Claude Haiku 4.5 with structured JSON output for verdicts, SQLite-backed verdict cache.
- Roadmap: ingest full British Inquiry (~2,200 pages) to surface cross-inquiry contradictions — same witness, two stories.
The killer feature: contradictions, not consensus
Most RAG systems try to return one answer. The witnesses in the Titanic inquiries gave conflicting accounts on almost everything — speed, lifeboat counts, ice warnings, who gave which order. This tool embraces that. Ask “How many people were in Ismay’s lifeboat?” and you’ll see Ismay’s “about 45” set next to Officer Lowe’s “only twelve”, with a confidence-scored explanation of why the statements conflict.
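The pairwise comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `Chunk`, `Verdict`, `find_contradictions`, and the `min_confidence` threshold are hypothetical names, and the LLM call is abstracted behind a `judge` callable so the sketch runs without network access. Skipping pairs from the same witness is also an assumption that fits a single-inquiry corpus.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Optional


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage of testimony (hypothetical shape)."""
    chunk_id: str
    witness: str
    text: str


@dataclass(frozen=True)
class Verdict:
    """One detected conflict between two witnesses (hypothetical shape)."""
    witness_a: str
    witness_b: str
    claim_a: str
    claim_b: str
    confidence: float  # 0.0 to 1.0
    explanation: str


# Stand-in for the LLM call: returns a Verdict, or None when the
# two passages do not conflict.
Judge = Callable[[Chunk, Chunk], Optional[Verdict]]


def find_contradictions(
    chunks: list[Chunk], judge: Judge, min_confidence: float = 0.5
) -> list[Verdict]:
    """Compare every pair of chunks and keep confident conflicts."""
    verdicts = []
    for a, b in combinations(chunks, 2):
        if a.witness == b.witness:
            continue  # assumption: only compare different witnesses
        verdict = judge(a, b)
        if verdict is not None and verdict.confidence >= min_confidence:
            verdicts.append(verdict)
    # Most confident conflicts first.
    return sorted(verdicts, key=lambda v: v.confidence, reverse=True)
```

Because every pair is judged, the number of LLM calls grows quadratically with the number of retrieved chunks, which is one reason caching verdicts pays off.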
Architecture
| Layer | Choice |
|---|---|
| PDF extraction | pymupdf |
| Embeddings | OpenAI text-embedding-3-large @ 1024d |
| Vector store | Pinecone (serverless, AWS) |
| Contradiction LLM | Claude Haiku 4.5 (structured JSON output) |
| API | FastAPI + uvicorn |
| Frontend | Single-page HTML/JS, no framework |
| Verdict cache | SQLite local; DynamoDB planned for prod |
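To make the verdict cache row concrete, here is a minimal sketch of a SQLite-backed cache using only the standard library. The table layout, the `VerdictCache` class, and the idea of keying on an order-independent pair of chunk IDs are all assumptions for illustration; the project's real schema may differ.

```python
import hashlib
import json
import sqlite3
from typing import Optional


def pair_key(chunk_id_a: str, chunk_id_b: str) -> str:
    """Order-independent cache key for a pair of chunk IDs."""
    first, second = sorted((chunk_id_a, chunk_id_b))
    return hashlib.sha256(f"{first}\x00{second}".encode("utf-8")).hexdigest()


class VerdictCache:
    """Stores one JSON verdict per chunk pair (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS verdicts ("
            "key TEXT PRIMARY KEY, verdict TEXT NOT NULL)"
        )

    def get(self, chunk_id_a: str, chunk_id_b: str) -> Optional[dict]:
        """Return the cached verdict for the pair, or None on a miss."""
        row = self.conn.execute(
            "SELECT verdict FROM verdicts WHERE key = ?",
            (pair_key(chunk_id_a, chunk_id_b),),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, chunk_id_a: str, chunk_id_b: str, verdict: dict) -> None:
        """Insert or overwrite the verdict for the pair."""
        self.conn.execute(
            "INSERT OR REPLACE INTO verdicts (key, verdict) VALUES (?, ?)",
            (pair_key(chunk_id_a, chunk_id_b), json.dumps(verdict)),
        )
        self.conn.commit()
```

Sorting the two IDs before hashing means a verdict stored for (A, B) is also found when the same pair is looked up as (B, A), so each pair only needs one LLM call.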
What gets indexed
- US Senate Inquiry (1912) — 1,173 pages, fully ingested. ~40,000 chunks across 68 witnesses.
- British Board of Trade Inquiry — format sample only; full ~2,200-page transcript on the roadmap.
Long-term goal: ingest the full British Inquiry to surface cross-inquiry contradictions — where the same witness testified differently in the two proceedings.
API shape
POST /search/contradictions returns a list of pairwise contradictions with witness names, the two conflicting claims, a 0–1 confidence score, a one-sentence explanation, and the source chunks behind each claim.
```json
{
  "witness_a": "Joseph Bruce Ismay",
  "witness_b": "Harold Godfrey Lowe",
  "claim_a": "Ismay's lifeboat had approximately forty-five people",
  "claim_b": "Ismay's lifeboat (Boat C) had only twelve people",
  "confidence": 0.95,
  "explanation": "The two witnesses provide directly conflicting specific counts: forty-five versus twelve, which cannot both be true."
}
```
Roadmap
- Done: Killer feature shipped end-to-end and deployed at titanic.higuera.io (search → LLM verdicts → side-by-side UI behind a custom domain).
- Next: Hook /witnesses to the canonical witness index; retire legacy ChromaDB code.
- Soon: British Inquiry pipeline (speaker-tag parser, separate witness index, ingest full corpus).
- Production: DynamoDB-backed verdict cache, rate limiter on the contradictions endpoint.
Try it live at titanic.higuera.io — no signup. Hit the /health endpoint to see the live document count.
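If you consume the contradictions response from your own code, a small validating parser along these lines may help. It is a sketch under assumptions: the six field names come from the example in the API shape section, but the function name `parse_contradictions`, the top-level JSON list, and the handling of extra keys are guesses. The request body is not documented here, so the HTTP call itself is left out.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Field names taken from the example response in this README.
REQUIRED = (
    "witness_a", "witness_b", "claim_a", "claim_b", "confidence", "explanation"
)


@dataclass(frozen=True)
class Contradiction:
    witness_a: str
    witness_b: str
    claim_a: str
    claim_b: str
    confidence: float
    explanation: str
    # Any other keys (for example source chunks) are kept untouched,
    # since their exact names are not documented here.
    extra: dict[str, Any] = field(default_factory=dict)


def parse_contradictions(payload: str) -> list[Contradiction]:
    """Parse a JSON response body, assuming a top-level list of objects."""
    items = json.loads(payload)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of contradictions")
    parsed = []
    for item in items:
        missing = [key for key in REQUIRED if key not in item]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        confidence = float(item["confidence"])
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        parsed.append(
            Contradiction(
                witness_a=item["witness_a"],
                witness_b=item["witness_b"],
                claim_a=item["claim_a"],
                claim_b=item["claim_b"],
                confidence=confidence,
                explanation=item["explanation"],
                extra={k: v for k, v in item.items() if k not in REQUIRED},
            )
        )
    return parsed
```

Rejecting out-of-range confidence values early keeps a malformed verdict from being sorted or displayed as if it were a valid score.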