Building speech enhancement systems and training models at scale. IIT Roorkee. 25,000+ views on Medium.
First principles, pen & paper, Jupyter notebooks — before frameworks.
I document everything I learn — from paper breakdowns to model internals. See my research.
Research Scientist at Invideo with hands-on experience in audio signal processing, speech enhancement, and large-scale distributed training. Built and pre-trained GenHencer from scratch, achieving strong quantitative and perceptual results on industry benchmarks.
Deeply interested in the latest developments in ASR (Whisper, Wav2Vec2, HuBERT) and TTS systems. Strong foundation in mathematics, ML, and Generative AI with in-depth knowledge of LLMs. Published 4 articles in Towards Data Science with 25,000+ views.
B.Tech · JEE Advanced AIR 6851 · 2021 – 2025
I go deep into the papers I study — custom diagrams, full pipeline breakdowns, and training analysis.
21-page deep dive into the SyncNet paper with custom diagrams, full inference pipeline walkthrough with exact tensor shapes, training analysis with W&B logs, and dataloader bottleneck identification. From architecture to contrastive loss to confidence scores.
Structured repository covering audio fundamentals (sound physics, Fourier transforms, aliasing, reverb), paper breakdowns (HuBERT, Wav2Vec2, Whisper, SyncNet), Descript Audio Codec documentation, spectral & time-domain analysis notebooks, and an audio model visualizer.
videoEra/
├── docs/
│   ├── audio_fundamentals/
│   │   ├── aliasing.md
│   │   ├── fourier_intuition.md
│   │   ├── sound_physics_and_perception.md
│   │   └── sound_reverb.md
│   ├── audio_codecs/
│   │   ├── README.md
│   │   └── descript_audio_codec.md
│   └── papers/
│       ├── hubert.pdf
│       ├── syncnet.pdf
│       ├── wav2vec2.pdf
│       └── whisper.pdf
├── notebooks/
│   ├── spectral_analysis/
│   └── time_domain/
├── projects/
│   ├── audio_model_visualizer/
│   └── syncnet/
└── tutorials/
76 pages of handwritten notes covering the full audio preprocessing pipeline from first principles — sound physics, sampling, Fourier transforms, STFT, filter banks, and mel spectrograms. Written while building deep domain expertise during research work at Invideo.
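The pipeline those notes walk through (framing, windowing, STFT, mel filter bank, log compression) can be sketched in plain NumPy. This is an illustrative sketch, not code from the notes; the parameters (16 kHz sample rate, 400-sample frames, 160-sample hop, 40 mel bands) are common defaults chosen for the example.

```python
import numpy as np

def stft(signal, n_fft=400, hop=160):
    """Frame the signal, apply a Hann window, FFT each frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft//2 + 1)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_filterbank(n_mels=40, n_fft=400, sr=16000):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

# Half a second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(int(0.5 * sr)) / sr
tone = np.sin(2 * np.pi * 440.0 * t)

spec = stft(tone)                                     # linear-frequency magnitudes
mel_spec = np.log(spec @ mel_filterbank().T + 1e-8)   # log-mel spectrogram
print(mel_spec.shape)                                 # (frames, mel bands)
```

The 440 Hz tone shows up as sustained energy in a single mel band, which makes the effect of the triangular filters easy to eyeball.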
GitHub Issues Semantic Search · 2023
Concept-based issue retrieval using NLP embeddings and FAISS. Robust data pipeline via GitHub REST API with selective comment filtering. Contextual chatbot powered by Google Gemini 1.5 with RAG for conversational, issue-focused solutions.
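The retrieval idea above can be sketched in a few lines. This is a toy stand-in, not the project's code: hashed bag-of-words embeddings replace a real sentence-embedding model, brute-force cosine search replaces the FAISS index, and the example issues are invented for illustration.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic embedding: hashed bag-of-words, L2-normalized.
    A real pipeline would use a learned sentence-embedding model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        bucket = int.from_bytes(hashlib.md5(token.encode()).digest()[:4], "big") % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Invented example issues standing in for ones fetched via the GitHub REST API
issues = [
    "App crashes on startup when config file is missing",
    "Dark mode toggle does not persist after restart",
    "Memory leak when processing large audio files",
]
index = np.stack([embed(t) for t in issues])  # (n_issues, dim)

def search(query: str, k: int = 2):
    """Rank issues by cosine similarity to the query embedding."""
    scores = index @ embed(query)  # dot product of unit vectors
    top = np.argsort(-scores)[:k]
    return [(issues[i], float(scores[i])) for i in top]

for text, score in search("audio memory usage grows"):
    print(f"{score:.2f}  {text}")
```

At scale, the `index @ embed(query)` line is exactly what a FAISS inner-product index accelerates; the retrieved issues would then be passed as context to the Gemini chatbot for the RAG step.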
Watch Demo
25,000+ views — 4 publications in Towards Data Science + self-hosted technical blog

An intuitive deep dive — from winding machines and the centre of mass to frequency decomposition of sound.

Why spinning wheels go backward, why cheap recordings sound harsh, and why it all traces back to the Nyquist theorem.

Full technical breakdown of audio-visual synchronisation — how SyncNet learns to match lip movements with speech.

Why we divide by n-1 for sample variance — the mathematical proof and intuition behind this statistical correction.
Four codec architectures explained from first principles, with the first systematic evaluation on five Indian languages.

From winding machines to the centre of mass — building deep intuition for frequency decomposition.

From first principles — why spinning wheels go backward and why it traces back to one elegant rule.
A story about sound waves, compression, and the invisible machinery behind every song you've ever streamed.

Deep dive into Retrieval-Augmented Generation — how retrieval enhances language model generation.

The fundamental concept of MLE — a cornerstone in parameter estimation for machine learning.
Secured a Top 20 finish among 72 teams (15,000+ registrations) at PIWOT 2025, organized by the PAN IIT Alumni Association in Mumbai, representing IIT Roorkee.
Gold Medalist in Men's Basketball at the Inter-Hostel Sports General Championship (10 teams), IIT Roorkee.
Open to research collaborations, full-time opportunities in AI/ML, or just a conversation about what's next in audio ML and LLMs.
Say Hello