Curriculum Vitae

Summary

As a computational biologist and software leader, I empower precision medicine at scale by bridging data science, software engineering, genomics, and artificial intelligence. My focus is on building robust infrastructure and state-of-the-art analytical tools to translate complex biomedical data into actionable insights, particularly in cancer research.

Currently, as Director of Research Informatics at Tempus AI (since March 2024), I lead my team in developing scalable genomic data analysis software, infrastructure (including the “tempusverse” R packages), and AI-driven models (such as RAG-based models and evaluations) that accelerate the delivery of precision medicine. I also provide expert consulting services in bioinformatics, data science, software/platform development, AI, and cloud solutions. Feel free to reach out to me if you are interested.

My expertise spans R and Python development, AI application (Gen AI/ML), cloud architecture (AWS/GCP/Azure), containerization (Docker/K8s), and high-dimensional genomic data analysis at scale. I have a passion for open-source software and an interdisciplinary background encompassing bioinformatics, computational biology, computer science, and statistics. Previously, I played key roles in open-source teams, scaling R/Bioconductor-based genomic data science on the cloud as a core team member of the Bioconductor Project (DFCI/HMS & Roswell Park) and enhancing R integration for the Galaxy Project (Johns Hopkins).

Education

Carnegie Mellon University, USA

  • Master of Science, Computational Biology
  • School of Computer Science
  • 2011 - 2013

SRM University, India

  • Bachelor of Technology, Bioinformatics
  • School of Bioengineering
  • 2007 - 2011

Experience

Tempus AI

Director, Research Informatics, Mar 2024 - Present

Principal Scientist, Translational Research, Aug 2022 - Mar 2024

I joined Tempus AI as a Principal Scientist and earned a promotion to Director of Research Informatics in March 2024. In my current Director role, I built and now lead a Research Informatics team of five, dedicated to scaling translational research capabilities through advanced software and infrastructure. A major focus is leading the R&D and implementation for the “tempusverse,” a comprehensive suite of over 12 R packages used internally and externally for analyzing complex multimodal cancer data. My team also develops infrastructure for efficient package distribution and creates interactive Shiny dashboards for data exploration. Furthermore, I spearhead R&D initiatives for applying Generative AI in translational research, including developing RAG-based models to assist users with code generation and understanding our analytical software.

As Principal Scientist, I established myself as an R expert within Tempus, contributing significantly to internal R package development and advising teams on best practices while contributing to the Tempus Lens platform’s R&D. I designed, implemented, and managed a critical internal analytical platform on Google Cloud Platform, serving over 200 users; this involved creating custom Docker images for R, setting up internal package distribution, maintaining compute resources, and performing technical audits. I actively collaborated cross-functionally with Engineering, DevOps, Legal, and Security Ops to enhance product development cycles and liaised directly with Google Cloud partners to co-develop custom Generative AI models as code assistants in R and Python.

Bioconductor Project (Dana-Farber Cancer Institute & Roswell Park Comprehensive Cancer Center)

Scientist II (DFCI / Harvard Medical School), Dec 2020 - Aug 2022

As Scientist II within the Bioconductor team at DFCI, my work centered on enhancing cloud-native capabilities, particularly for the AnVIL Project. I designed and implemented a Kubernetes-based infrastructure on Google Cloud to reliably build and deploy Bioconductor Docker containers, enabling scalable computation for researchers. These containers, featuring an RStudio front-end optimized for cloud scalability, incorporated a significant innovation I spearheaded: the distribution of pre-compiled binary packages. Binary installation reduced Bioconductor package installation times by an estimated 7-8x. This binary distribution capability is now widely used by R developers globally to accelerate analysis and CI/CD processes for building and checking packages. These Docker images have garnered over 1 million downloads on DockerHub. I also focused on improving cross-language integration within Bioconductor, specifically enhancing standards for R/Python interoperability and enabling the use of deep learning Python modules on GPU-enabled VMs.

Senior Programmer Analyst (Roswell Park), Nov 2016 - Dec 2020

As a core team member of the Bioconductor project, I pioneered early solutions for cloud-based computation with Bioconductor on Google Cloud and Microsoft Azure. I designed and implemented foundational Docker containers incorporating an RStudio front-end, significantly boosting community adoption for HPC and educational purposes. During this period, I authored key R packages to enhance parallel computing access, including the BiocParallel interface (using Batchtools). Crucially, I spearheaded and managed the complex, multi-stage transition of Bioconductor’s core version control system from SVN to Git, establishing the necessary private server infrastructure, access protocols, and documentation to improve open-source community collaboration. I also implemented and maintained other core infrastructure components (such as Git hooks, Docker registry management, and GCP resources) and supported the biannual release cycle.

Galaxy Project, Johns Hopkins University

Software Engineer, Galaxy Core Team, Oct 2014 - Nov 2016

As a Software Engineer and core team member with the Galaxy Project, I focused on enhancing its capabilities for bioinformatics analysis, particularly with R and cloud environments. I improved the integration of R packages within Galaxy by implementing extensions to ‘Planemo’ (command line utilities to assist Galaxy development), enabling Bioconductor tools to be incorporated more systematically and at scale. My work also involved adding tools for large-scale analysis of methylation data to the platform. To facilitate wider adoption and ease of use, I improved the automated deployment of Galaxy instances across multiple cloud services (AWS, GCE, OpenStack) by developing robust Ansible playbooks. Furthermore, I integrated an RStudio interactive environment directly into Galaxy, empowering users with exploratory data analysis capabilities within the platform.

Johns Hopkins School of Medicine

Senior Research Data Analyst, Aug 2013 - Oct 2014

In my role as a Senior Research Data Analyst, I supported large-scale cancer genomics and public health research. I developed and implemented bioinformatics workflows for analyzing datasets from The Cancer Genome Atlas (TCGA), covering methylation, RNA sequencing, and microarray data. This involved managing complex jobs, optimizing bioinformatic pipelines, and efficiently utilizing resources on a local high-performance computing (HPC) cluster using Sun Grid Engine (SGE). I communicated my research findings by creating R Markdown reports and data visualizations for multiple studies using the R language.

Skills

I am constantly learning and adapting to new technologies and methodologies. While my skillset is broad, encompassing experience across multiple languages, tools, and platforms (too many to list exhaustively!), the following highlights some of my key areas of technical expertise:

  • Scripting and Programming Languages: Python, R, Shell, SQL, C/C++
  • AI Tools: TensorFlow, PyTorch, Scikit-learn
  • AI Libraries/Frameworks: LangChain, LlamaIndex
  • Technologies: AWS, Azure, CI/CD, Docker, Git, Google Cloud Platform, Kubernetes