Research
Papers & Publications
Research on AI, semantic retrieval and machine learning — with open-access papers and reproducible code.
Zenodo · May 2026
+5.2% Mean Average Precision
2.5× similarity gap
48× storage compression
Applying Principal Component Analysis (PCA), fitted on a domain-specific corpus, to text embeddings improves semantic retrieval without any fine-tuning of the embedding model. On a medical corpus covering 20 clinical topics, embedded with OpenAI's text-embedding-3-small, projecting to 32 principal components (PCA-32) fitted on the corpus alone raised Mean Average Precision (MAP) from a baseline of 0.8750 to 0.9203, a 5.2% relative improvement. The same projection increased the similarity gap 2.5× and cut storage 48× (from 1,536 to 32 dimensions per vector). Domain-directed axes are essential: random projections do not replicate the gain.
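The abstract describes the method only at a high level, so what follows is a minimal NumPy sketch under stated assumptions, not the paper's code. PCA is computed via SVD on corpus embeddings only. Queries and documents are projected onto the top 32 axes and ranked by cosine similarity, and MAP uses the standard per-query average precision with topic labels as relevance. The helper names (`fit_pca`, `random_axes`, `project`, `mean_average_precision`) are hypothetical, and a random orthonormal projection serves as the control. The clustered Gaussian vectors are synthetic stand-ins for 1,536-dimensional text-embedding-3-small outputs, so the printed numbers say nothing about the reported results. The similarity-gap metric is not modeled.

```python
import numpy as np

# Hypothetical helpers: a sketch of corpus-fitted PCA retrieval, not the paper's code.


def fit_pca(corpus: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Fit PCA on corpus embeddings only. Returns (mean, components), components shape (k, d)."""
    mean = corpus.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(corpus - mean, full_matrices=False)
    return mean, vt[:k]


def random_axes(d: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Control: k random orthonormal axes, shape (k, d), with no link to the corpus."""
    q, _ = np.linalg.qr(rng.normal(size=(d, k)))
    return q.T


def project(x: np.ndarray, mean: np.ndarray, axes: np.ndarray) -> np.ndarray:
    """Center with the corpus mean, project onto the axes, then L2-normalize for cosine similarity."""
    z = (x - mean) @ axes.T
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def average_precision(relevance: np.ndarray) -> float:
    """AP for one query from a 0/1 relevance vector in ranked order."""
    n_relevant = relevance.sum()
    if n_relevant == 0:
        return 0.0
    precision_at_rank = np.cumsum(relevance) / np.arange(1, len(relevance) + 1)
    return float((precision_at_rank * relevance).sum() / n_relevant)


def mean_average_precision(queries, docs, query_topics, doc_topics) -> float:
    """MAP where a document is relevant iff it shares the query's topic label (assumed)."""
    sims = queries @ docs.T  # rows are unit-norm, so this is cosine similarity
    aps = []
    for i in range(len(queries)):
        order = np.argsort(-sims[i])
        relevance = (doc_topics[order] == query_topics[i]).astype(float)
        aps.append(average_precision(relevance))
    return float(np.mean(aps))


# Synthetic stand-in data: 20 topic clusters in 1,536 dimensions (assumed, not the paper's corpus).
rng = np.random.default_rng(0)
d, k, n_topics = 1536, 32, 20
centers = rng.normal(size=(n_topics, d))
doc_topics = np.repeat(np.arange(n_topics), 10)
docs = centers[doc_topics] + 3.0 * rng.normal(size=(len(doc_topics), d))
query_topics = np.arange(n_topics)
queries = centers[query_topics] + 3.0 * rng.normal(size=(n_topics, d))

mean, pca_axes = fit_pca(docs, k)  # fitted on the corpus only; queries are never seen
rand_axes = random_axes(d, k, rng)
map_pca = mean_average_precision(
    project(queries, mean, pca_axes), project(docs, mean, pca_axes), query_topics, doc_topics
)
map_rand = mean_average_precision(
    project(queries, mean, rand_axes), project(docs, mean, rand_axes), query_topics, doc_topics
)
print(f"MAP with PCA-{k}: {map_pca:.4f}, with random {k}-d axes: {map_rand:.4f}")
print(f"storage ratio: {d // k}x")  # 1536 / 32 = 48, assuming the same dtype before and after
```

On real data, `docs` and `queries` would be embedding-model outputs. The key design choice is that `fit_pca` only ever sees corpus vectors, so queries are projected with a mean and axes they did not influence. Swapping `pca_axes` for `rand_axes` keeps dimensionality and storage identical, which isolates whether the direction of the axes matters.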