seeking good explanations with machine learning


senior researcher at microsoft research (deep learning group); phd from berkeley (with prof. bin yu)

Research

Some areas I'm currently excited about. If you want to chat about research or are interested in interning at MSR, feel free to reach out over email :)
🔎 Interpretability methods, especially LLM interpretability.

augmented imodels - use LLMs to build a transparent model
attention steering - mechanistically guide LLMs by emphasizing specific input spans
explanation penalization - regularize explanations to align models with prior knowledge (see the sketch after this list)
adaptive wavelet distillation - replace neural nets with transparent wavelet models
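To make the explanation-penalization idea concrete, here is a minimal sketch of the general recipe, using a simplified input-gradient penalty rather than the contextual-decomposition scores from the paper: add a term to the loss that punishes the model for attributing importance to features that prior knowledge marks as irrelevant. The toy model, data, mask, and the 0.1 weight are all hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: 10 input features, the last 3 marked
# irrelevant by prior knowledge (e.g. known confounders).
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
X = torch.randn(32, 10, requires_grad=True)
y = torch.randint(0, 2, (32,))
irrelevant = torch.zeros(10)
irrelevant[-3:] = 1.0  # 1 = this feature should not matter

logits = model(X)
pred_loss = nn.functional.cross_entropy(logits, y)

# Explanation term: input gradients (a simple stand-in for the paper's
# contextual-decomposition attributions) on the irrelevant features.
grads = torch.autograd.grad(logits.sum(), X, create_graph=True)[0]
expl_loss = (grads * irrelevant).pow(2).mean()

loss = pred_loss + 0.1 * expl_loss  # 0.1 trades off accuracy vs. alignment
loss.backward()
```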
🧠 Semantic brain mapping, mostly using fMRI responses to language.

explanation-mediated validation - test fMRI explanations using LLM-generated stimuli
qa embeddings - predict fMRI language responses by asking yes/no questions to LLMs (see the sketch after this list)
summarize & score explanations - generate natural-language explanations of fMRI encoding models
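A rough, self-contained sketch of the qa-embeddings idea above: each stimulus is embedded as the vector of an LLM's yes/no answers, and a linear model then maps that embedding to fMRI responses. Here `ask_yes_no`, the questions, and the random voxel data are placeholders, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical stand-in for prompting an LLM with a yes/no question
# about a stimulus; the real method queries an actual LLM.
def ask_yes_no(question: str, text: str) -> float:
    return float(hash((question, text)) % 2)

questions = [
    "Does the input mention a person?",
    "Does the input describe movement?",
    "Is the input about a place?",
]
stimuli = ["the man ran home", "a quiet blue lake", "she opened the door"]

# QA embedding: one binary answer per question per stimulus.
X = np.array([[ask_yes_no(q, s) for q in questions] for s in stimuli])
y = np.random.randn(len(stimuli), 5)  # fake responses for 5 voxels

encoder = Ridge(alpha=1.0).fit(X, y)  # standard linear encoding model
print(encoder.predict(X).shape)       # (3, 5): predicted voxel responses
```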
💊 Clinical decision rules: can we improve them with data?

greedy tree sums - build accurate, compact tree-based clinical models
clinical self-verification - use LLM self-verification to improve performance and interpretability of clinical information extraction
clinical rule vetting - stress testing the performance of a clinical decision instrument for intra-abdominal injury
clinical rule bias assessment - evaluating bias in the development of popular clinical decision instruments
Note: I put a lot of my code into the imodels and imodelsX packages.
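For example, fitting a greedy tree-sums (FIGS) model with imodels looks roughly like the snippet below — a minimal sketch assuming the sklearn-style API from the imodels docs; the dataset and the max_rules value are just for illustration.

```python
from imodels import FIGSClassifier  # pip install imodels
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = FIGSClassifier(max_rules=5)  # cap size to keep the model inspectable
model.fit(X_train, y_train)

print(model)  # prints the learned sum of small trees
print("test accuracy:", model.score(X_test, y_test))
```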

year title authors tags venue
'25 Systematic Bias in Clinical Decision Instrument Development obra, singh, et al. 🔎💊 medrxiv
'25 Analyzing patient perspectives with LLMs kornblith*, singh*, et al. 💊🌀 nature scientific reports
'25 Simplifying DINO via Coding Rate Regularization wu et al. 🌀 arxiv
'25 Vector-ICL: In-context Learning with Continuous Vector Representations zhuang et al. 🔎🌀 iclr
'24 A generative framework to bridge data-driven models and scientific theories in language neuroscience antonello*, singh*, jain, hsu, gao, yu, & huth 🧠🔎🌀 arxiv
'24 Crafting Interpretable Embeddings by Asking LLMs Questions benara*, singh*, morris, antonello, stoica, huth, & gao 🧠🔎🌀 neurips
'24 Interpretable Language Modeling via Induction-head Ngram Models kim*, mantena*, et al. 🧠🔎🌀 arxiv
'24 Rethinking Interpretability in the Era of Large Language Models singh, inala, galley, caruana, & gao 🔎🌀 arxiv
'24 Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning chen et al. 🔎🌀 coling
'24 Learning a Decision Tree Algorithm with Transformers zhuang et al. 🔎🌀🌳 tmlr
'24 Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering zhang*, yu*, et al. 🔎🌀 arxiv
'24 Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs zhang et al. 🔎🌀 iclr
'24 Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries gero et al. 🔎🌀 ml4h findings
'23 Tree Prompting morris*, singh*, rush, gao, & deng 🔎🌀🌳 emnlp
'23 Augmenting Interpretable Models with LLMs during Training singh, askari, caruana, & gao 🔎🌀🌳 nature communications
'23 Explaining black box text modules in natural language with language models singh*, hsu*, antonello, jain, huth, yu & gao 🔎🌀 neurips workshop
'23 Self-Verification Improves Few-Shot Clinical Information Extraction gero*, singh*, cheng, naumann, galley, gao, & poon 🔎🌀💊 icml workshop
'22 Explaining patterns in data with language models via interpretable autoprompting singh*, morris*, aneja, rush, & gao 🔎🌀 emnlp workshop
'22 Stress testing clinical decision instrument performance for intra-abdominal injury kornblith*, singh*, et al. 🔎🌳💊 plos digital health
'22 Fast interpretable greedy-tree sums (FIGS) tan*, singh*, nasseri, agarwal, & yu 🔎🌳 pnas
'22 Hierarchical shrinkage for trees agarwal*, tan*, ronen, singh, & yu 🔎🌳 icml (spotlight)
'22 VeridicalFlow: a python package for building trustworthy data science pipelines with PCS duncan*, kapoor*, agarwal*, singh*, & yu 💻🔎 joss
'21 imodels: a python package for fitting interpretable models singh*, nasseri*, et al. 💻🔎🌳 joss
'21 Adaptive wavelet distillation from neural networks through interpretations ha, singh, et al. 🔎🌀🌳 neurips
'21 Matched sample selection with GANs for mitigating attribute confounding singh, balakrishnan, & perona 🌀 cvpr workshop
'21 Revisiting complexity and the bias-variance tradeoff dwivedi*, singh*, yu & wainwright 🌀 jmlr
'20 Curating a COVID-19 data repository and forecasting county-level death counts in the United States altieri et al. 🔎🦠 hdsr
'20 Transformation importance with applications to cosmology singh*, ha*, lanusse, boehm, liu & yu 🔎🌀🌌 iclr workshop (spotlight)
'20 Interpretations are useful: penalizing explanations to align neural networks with prior knowledge rieger, singh, murdoch & yu 🔎🌀 icml
'19 Hierarchical interpretations for neural network predictions singh*, murdoch*, & yu 🔎🌀 iclr
'19 Interpretable machine learning: definitions, methods, and applications murdoch*, singh*, et al. 🔎🌳🌀 pnas
'19 Disentangled attribution curves for interpreting random forests and boosted trees devlin, singh, murdoch & yu 🔎🌳 arxiv
'18 Large scale image segmentation with structured loss based deep learning for connectome reconstruction funke*, tschopp*, et al. 🧠🌀 tpami
'18 Linearization of excitatory synaptic integration at no extra cost morel, singh, & levy 🧠 j comp neuro
'17 A consensus layer V pyramidal neuron can sustain interpulse-interval coding singh & levy 🧠 plos one
'17 A constrained, weighted-l1 minimization approach for joint discovery of heterogeneous neural connectivity graphs singh, wang, & qi 🧠 neurips workshop

resources + posts



Notes in machine learning / neuroscience.

Mini personal projects. There's also some dumb stuff here.

experience


I've been lucky to work with many amazing people & to help advise some incredible students.

Advisors / managers
Scientific collaborators