VESTIGE
Fine-tuned DNABERT-2 on 44,800-year-old woolly mammoth aDNA using a damage-aware masking strategy — 13% lower loss than standard training. Biosecurity CNN (AUC 0.934) flags reconstructed sequences against known pathogenic profiles.
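This page does not spell out how the damage-aware masking works, so the following is only a minimal sketch of one plausible reading. It assumes the masking probability is raised near read ends, where ancient-DNA deamination (C→T at the 5' end, G→A at the 3' end) is known to concentrate. The function names and the `base_rate`, `boost`, and `decay` parameters are hypothetical and are not taken from the project.

```python
import math
import random


def damage_aware_mask_probs(seq, base_rate=0.15, boost=0.30, decay=0.5):
    """Per-position masking probability for one read (hypothetical scheme)."""
    n = len(seq)
    probs = []
    for i, base in enumerate(seq.upper()):
        p = base_rate
        # An observed C or T near the 5' end may reflect C->T deamination,
        # so those positions are masked more often.
        if base in "CT":
            p += boost * math.exp(-decay * i)
        # G->A is the complementary damage signal near the 3' end.
        if base in "GA":
            p += boost * math.exp(-decay * (n - 1 - i))
        probs.append(min(p, 1.0))
    return probs


def sample_mask(seq, rng, **kwargs):
    """Draw a boolean mask (True = mask this position) from the probabilities."""
    return [rng.random() < p for p in damage_aware_mask_probs(seq, **kwargs)]
```

With these defaults, interior positions stay close to the usual 15% masking rate while damage-prone bases at the read ends are masked up to three times as often. The sampled mask would then replace the uniform mask in an ordinary masked-language-model training step.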
Building intelligent systems at the intersection of data, cloud, and machine learning.
I'm a Data Scientist and ML Systems Engineer who builds end-to-end pipelines across scientific computing, generative AI, and MLOps. My work ranges from fine-tuning genomic language models on ancient DNA to benchmarking LLMs on hardware design tasks, always with an eye on reproducibility and real-world deployment.
I care about the full lifecycle: research, model design, containerised deployment, and production monitoring. If a system doesn't hold up outside a notebook, it isn't done.
Evaluation framework for 3 LLMs writing Verilog HDL across 50 tasks with a 5-metric scoring system. Auto-repair pipeline feeds simulation failures back with structured error context — lifting pass rate from 0% → 51.8% across 1,610 runs.
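The auto-repair idea above can be sketched as a bounded generate, simulate, re-prompt loop. This is an illustration under stated assumptions, not the project's code: `generate` and `simulate` are hypothetical callables standing in for an LLM client and a Verilog simulator, and the `SimResult` fields and the prompt layout are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class SimResult:
    passed: bool
    stage: str = ""    # hypothetical: where it failed, e.g. "compile" or "simulate"
    message: str = ""  # raw tool output for the failure


def build_repair_prompt(task, code, result):
    """Pack the failed attempt and its error into a structured follow-up prompt."""
    return (
        f"Task:\n{task}\n\n"
        f"Previous attempt:\n{code}\n\n"
        f"Failure stage: {result.stage}\n"
        f"Error output:\n{result.message}\n\n"
        "Return a corrected Verilog module."
    )


def repair_loop(task, generate, simulate, max_attempts=3):
    """Return (code, attempts_used, passed) after at most max_attempts tries."""
    prompt = task
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        result = simulate(code)
        if result.passed:
            return code, attempt, True
        # Feed the failure back instead of retrying with the same prompt.
        prompt = build_repair_prompt(task, code, result)
    return code, max_attempts, False
```

Passing the model and simulator in as callables keeps the loop testable with stubs and makes it easy to swap models when comparing them on the same task set.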
Dual-model MRI pipeline: an EfficientNetB0 classifier (AUC 0.9975) followed by Faster R-CNN localization (93.3% detection accuracy). Grad-CAM pseudo-labels eliminate manual annotation across 2,783 augmented images.
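One common way to turn a Grad-CAM heatmap into a detection pseudo-label is to threshold it and take the bounding box of the surviving cells. The sketch below assumes that approach; the page does not say how its pseudo-labels are derived, and the `heatmap_to_box` name and `threshold` default are hypothetical. The heatmap is a plain, non-empty list of rows so the example has no dependencies.

```python
def heatmap_to_box(heatmap, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) around cells >= threshold * peak.

    Returns None when the heatmap has no positive activation.
    """
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        return None
    cutoff = threshold * peak
    xs, ys = [], []
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value >= cutoff:
                xs.append(x)
                ys.append(y)
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x5 activation map with one hot region.
cam = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.9, 0.8, 0.0],
    [0.0, 0.1, 1.0, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
box = heatmap_to_box(cam)
```

In a real pipeline the box would be rescaled from heatmap resolution to image resolution before being used as a training target for the detector.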
Autonomous ML red-teaming — 8-phase pipeline that attacks any PyTorch model, clusters failure modes with UMAP + HDBSCAN, explains with Gemini 2.5 Flash + RAG, patches autonomously, and ships a PDF audit report. Zero human decisions.
Team of 6 · 24-hour sprint
3 PRs submitted · gravitational lensing ML
Harvard / edX
NPTEL · IIT Madras
NPTEL · IIT Kharagpur
Open to collaborations, research opportunities, and interesting problems.