Angshuman Chakravertty

Driven by curiosity.
Powered by data.

I'm an ML Systems and Data Science Engineer building end-to-end pipelines at the intersection of scientific computing, generative AI, and MLOps. My work ranges from fine-tuning genomic language models on ancient DNA to benchmarking LLMs on hardware design tasks — always with an eye on reproducibility and real-world deployment.

Most recently I shipped backend AI services in production — multi-language OCR, async batch pipelines, and document Q&A — where correctness under real rate limits and messy Unicode mattered more than benchmark numbers.

I care about the full lifecycle: research, model design, containerised deployment, and production monitoring. If a system doesn't hold up outside a notebook, it isn't done.

Genomic ML · LLM Systems · MLOps · Benchmarking · Scientific Computing

In Production

Jun — Jul 2026 Hyderabad, IN

AI Backend Engineering Intern

Asset Telematics Pvt Ltd

Extended a Gemini-powered OCR microservice to emit structured output in five languages (English, Arabic, Hindi, French, German) across 10 document types, via prompt engineering and a per-request output_lang parameter. Traced a silent Unicode data-loss bug to a Latin-only regex that was stripping Arabic and Devanagari — replaced it with Unicode-category-aware cleaning that preserves combining marks.
Built an async batch-processing pipeline (/extract_batch + /batch_status polling) with bounded parallelism via an asyncio semaphore and per-file partial-failure isolation — engineering concurrency around a blocking vendor SDK using a multi-key round-robin thread pool, validated to degrade gracefully under live API rate limits.
Integrated OpenRouter (Qwen3-VL) as an isolated document Q&A module with upload-once/ask-many sessions, plus a benchmarking harness measuring throughput, latency, and extraction accuracy — ~86% name-extraction accuracy on passports, with structured validation rejecting incomplete documents.
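
As a rough illustration of the Unicode-aware cleaning mentioned in the first item above, here is a minimal sketch using only the standard library. It is not the service's actual code: the `clean_text` name, the set of categories kept, and the choice to preserve zero-width joiners are all assumptions made for the example.

```python
import unicodedata

# Assumed policy (not taken from the service): keep letters, marks, numbers,
# punctuation and symbols; drop control/format characters and other debris.
KEEP_MAJOR_CLASSES = {"L", "M", "N", "P", "S"}
# Zero-width joiner/non-joiner are format characters (Cf) but affect shaping
# in Arabic-script and Indic text, so they are kept explicitly.
KEEP_EXACT = {"\u200c", "\u200d"}


def clean_text(raw: str) -> str:
    """Strip OCR debris while preserving non-Latin letters and combining marks."""
    text = unicodedata.normalize("NFC", raw)
    kept = []
    for ch in text:
        category = unicodedata.category(ch)  # e.g. "Lo", "Mn", "Cf"
        if category[0] in KEEP_MAJOR_CLASSES or ch in KEEP_EXACT:
            kept.append(ch)
        elif ch.isspace():
            kept.append(" ")  # any whitespace becomes a plain space
    # Collapse runs of spaces left behind by removed characters.
    return " ".join("".join(kept).split())


sample = "नाम: हिन्दी\u200b\x00  Name"
print(clean_text(sample))
```

A Latin-only pattern such as `[^A-Za-z0-9 ]` would delete every Devanagari letter and vowel sign in `sample`. Filtering on general categories removes only the zero-width space and the NUL byte.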

Python · FastAPI · asyncio · Gemini · OpenRouter · Qwen3-VL
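
The batch pipeline described above combines three ideas: a semaphore that caps concurrency, a thread pool that wraps the blocking SDK call, and per-file error capture. The sketch below shows one way these could fit together. `blocking_extract`, `API_KEYS`, and `MAX_IN_FLIGHT` are hypothetical stand-ins, and the real endpoints and vendor SDK are not reproduced here.

```python
import asyncio
import itertools
from concurrent.futures import ThreadPoolExecutor

API_KEYS = ["key-a", "key-b", "key-c"]  # hypothetical placeholder keys
MAX_IN_FLIGHT = 2                        # assumed concurrency cap


def blocking_extract(path: str, api_key: str) -> dict:
    """Stand-in for a blocking vendor SDK call (hypothetical)."""
    if path.endswith(".bad"):
        raise ValueError(f"unreadable document: {path}")
    return {"file": path, "key": api_key}


async def extract_batch(paths: list[str]) -> list[dict]:
    loop = asyncio.get_running_loop()
    semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)
    keys = itertools.cycle(API_KEYS)  # round-robin over the key pool

    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:

        async def process(path: str) -> dict:
            async with semaphore:  # at most MAX_IN_FLIGHT files at once
                key = next(keys)
                try:
                    # Run the blocking call in a worker thread so the
                    # event loop stays free to serve status polls.
                    data = await loop.run_in_executor(
                        pool, blocking_extract, path, key
                    )
                    return {"file": path, "status": "ok", "data": data}
                except Exception as exc:
                    # One bad file is recorded, not raised, so the rest
                    # of the batch still completes.
                    return {"file": path, "status": "failed", "error": str(exc)}

        return await asyncio.gather(*(process(p) for p in paths))


results = asyncio.run(extract_batch(["a.pdf", "b.bad", "c.pdf"]))
print([r["status"] for r in results])
```

Returning a status record per file, instead of letting the exception propagate through `gather`, is what keeps a single unreadable document from failing the whole batch.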

Selected Work

01 2026

VESTIGE

Fine-tuned DNABERT-2 on 44,800-year-old woolly mammoth aDNA using a damage-aware masking strategy — 13% lower loss than standard training. Biosecurity CNN (AUC 0.934) flags reconstructed sequences against known pathogenic profiles.
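
The exact masking scheme is not spelled out here, so the following is only one plausible reading of "damage-aware", sketched under stated assumptions. Ancient DNA damage typically appears as C-to-T substitutions near the 5' end of a read and G-to-A near the 3' end, so the sketch raises the masking probability for observed T and A bases close to those ends. The constants, the function names, and the per-base masking (DNABERT-2 actually works on BPE tokens) are all illustrative choices.

```python
import math
import random

BASE_RATE = 0.15   # conventional masked-language-model masking rate
END_BOOST = 0.35   # assumed extra masking probability at a read end
DECAY = 0.3        # assumed exponential decay per base away from the end


def mask_probabilities(read: str) -> list[float]:
    """Per-base masking probability, higher where damage is more likely."""
    n = len(read)
    probs = []
    for i, base in enumerate(read):
        p = BASE_RATE
        if base == "T":  # a T near the 5' end may be a deaminated C
            p += END_BOOST * math.exp(-DECAY * i)
        if base == "A":  # an A near the 3' end may be a damaged G
            p += END_BOOST * math.exp(-DECAY * (n - 1 - i))
        probs.append(min(p, 1.0))
    return probs


def mask_read(read: str, rng: random.Random) -> list[str]:
    """Replace bases with [MASK] according to the damage-weighted rates."""
    return [
        "[MASK]" if rng.random() < p else base
        for base, p in zip(read, mask_probabilities(read))
    ]


read = "T" + "ACGT" * 6 + "A"
print([round(p, 3) for p in mask_probabilities(read)[:4]])
print(mask_read(read, random.Random(0)))
```

Under this weighting the model is asked more often to reconstruct exactly the positions where the observed base is least trustworthy, while bases in the middle of a read are masked at roughly the usual rate.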

Python · DNABERT-2 · PyTorch · HuggingFace · ESMFold

02 2026

GenAI-EDA Benchmark

Evaluation framework for 3 LLMs writing Verilog HDL across 50 tasks with a 5-metric scoring system. Auto-repair pipeline feeds simulation failures back with structured error context — lifting pass rate from 0% → 51.8% across 1,610 runs.
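
The feedback loop behind the auto-repair pipeline can be sketched roughly as follows. This is a generic illustration, not the benchmark's code: `repair_loop`, `SimResult`, the prompt wording, and the two stub functions standing in for the LLM and the simulator are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SimResult:
    passed: bool
    log: str = ""


def repair_loop(
    task: str,
    generate: Callable[[str], str],
    simulate: Callable[[str], SimResult],
    max_rounds: int = 3,
) -> Tuple[str, Optional[int]]:
    """Return (last_code, round_that_passed), or None if no round passed."""
    prompt, code = task, ""
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt)
        result = simulate(code)
        if result.passed:
            return code, round_no
        # Feed the failing code and the simulator log back to the model.
        prompt = (
            f"{task}\n\nPrevious attempt:\n{code}\n\n"
            f"Simulator output:\n{result.log}\n\n"
            "Fix the errors and return the full module."
        )
    return code, None


# Stubs so the sketch runs without an LLM or a Verilog simulator.
def fake_llm(prompt: str) -> str:
    body = "module inv(input a, output y); assign y = ~a;"
    return body + " endmodule" if "Simulator output" in prompt else body


def fake_sim(code: str) -> SimResult:
    if code.rstrip().endswith("endmodule"):
        return SimResult(True)
    return SimResult(False, "syntax error: missing endmodule")


code, passed_round = repair_loop("Write a Verilog inverter named inv.", fake_llm, fake_sim)
print(passed_round)
```

In a real run the `simulate` callable would wrap a tool such as Verilator, and `generate` would call the model under test. Keeping both as plain callables makes the loop easy to exercise with stubs like these.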

Python · Ollama · Verilog · Verilator · Docker

03 2025

Brain Tumor Detection

Dual-model MRI pipeline — EfficientNetB0 classifier (99.75% AUC) followed by Faster RCNN localization (93.3% detection accuracy). Grad-CAM pseudo-labels eliminate manual annotation across 2,783 augmented images.
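
One common way to turn a Grad-CAM heatmap into a bounding-box pseudo-label is to threshold it relative to its peak and take the extent of the surviving pixels. The sketch below shows that idea in plain Python. The `heatmap_to_box` name and the 0.5 relative threshold are assumptions, and the project's actual procedure may differ.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def heatmap_to_box(heatmap: List[List[float]], rel_threshold: float = 0.5) -> Optional[Box]:
    """Bounding box around all pixels at or above rel_threshold * peak."""
    peak = max(max(row) for row in heatmap)
    if peak <= 0:
        return None  # no activation, so no pseudo-label
    cutoff = rel_threshold * peak
    hot = [
        (x, y)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
        if value >= cutoff
    ]
    xs = [x for x, _ in hot]
    ys = [y for _, y in hot]
    return min(xs), min(ys), max(xs), max(ys)


cam = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.9, 0.2, 0.0],
    [0.0, 0.7, 1.0, 0.5, 0.0],
    [0.0, 0.1, 0.3, 0.1, 0.0],
]
print(heatmap_to_box(cam))
```

This version encloses every above-threshold pixel. A more careful variant would keep only the largest connected region and rescale the box from heatmap resolution to image resolution before handing it to the detector.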

TensorFlow · PyTorch · EfficientNetB0 · Faster RCNN · Streamlit

04 2026

ANVIL

Autonomous ML red-teaming — 8-phase pipeline that attacks any PyTorch model, clusters failure modes with UMAP + HDBSCAN, explains with Gemini 2.5 Flash + RAG, patches autonomously, and ships a PDF audit report. Zero human decisions.

PyTorch · LangGraph · FAISS · Gemini · FastAPI

Tech Stack

Languages

Python · C++ · SQL

ML / DL

PyTorch · TensorFlow · Scikit-learn · HuggingFace · Pandas · NumPy

Gen AI & LLMs

LangChain · LangGraph · RAG Pipelines · Ollama

MLOps & Cloud

Docker · Kubernetes · AWS SageMaker · CI/CD · Git · Linux/Bash

Data & Tools

Matplotlib · FastAPI · Flask · Streamlit

Academic Background

B.Tech, Computer Science & Engineering (Data Science)

SVKM's NMIMS — Hyderabad

Jul 2023 – May 2027 · CGPA 7.1 / 10 · Sem 5 GPA 3.14 / 4.0

Achievements & Certifications

2025

1st Place — NMIMS Hackathon NMTF

Team of 6 · 24-hour sprint

2026

Open Source Contributor — ML4SCI / DeepLense

3 PRs submitted · gravitational lensing ML

2024

CS50's Introduction to AI with Python

Harvard / edX

2024

Python for Data Science

NPTEL · IIT Madras

2024

Introduction to Machine Learning

NPTEL · IIT Kharagpur

Let's Build
Something Together

Open to collaborations, research opportunities, and interesting problems.
