# Publications

## The Vendi Score: A Diversity Evaluation Metric for Machine Learning

Dan Friedman, Adji Bousso Dieng

Diversity is an important criterion for many areas of machine learning (ML), including generative modeling and dataset curation. Yet little work has gone into understanding, formalizing, and measuring diversity in ML. In this paper, we address the diversity evaluation problem by proposing the Vendi Score, which connects and extends ideas from ecology and quantum statistical mechanics to ML. The Vendi Score is defined as the exponential of the Shannon entropy of the eigenvalues of a similarity matrix. This matrix is induced by a user-defined similarity function applied to the sample to be evaluated for diversity. By taking a similarity function as input, the Vendi Score enables its user to specify any desired form of diversity. Importantly, unlike many existing metrics in ML, the Vendi Score does not require a reference dataset or a distribution over samples or labels. It is therefore general and applicable to any generative model, decoding algorithm, and dataset from any domain where similarity can be defined. We showcased the Vendi Score on molecular generative modeling, a domain where diversity plays an important role in enabling the discovery of novel molecules. We found that the Vendi Score addresses shortcomings of the current diversity metric of choice in that domain. We also applied the Vendi Score to generative models of images and decoding algorithms of text and found that it confirms known results about diversity in those domains. Furthermore, we used the Vendi Score to measure mode collapse, a known limitation of generative adversarial networks (GANs). In particular, the Vendi Score revealed that even GANs that capture all the modes of a labeled dataset can be less diverse than the original dataset. Finally, the interpretability of the Vendi Score allowed us to diagnose several benchmark ML datasets for diversity, opening the door for diversity-informed data augmentation.
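
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' reference implementation. It assumes a positive semidefinite similarity matrix with ones on the diagonal, divided by the sample size so that its eigenvalues sum to one (a normalization the abstract does not spell out). The function name `vendi_score` and the two toy matrices are illustrative choices.

```python
import numpy as np

def vendi_score(K):
    """Exponential of the Shannon entropy of the eigenvalues of K / n."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # Assumption: K is symmetric PSD with unit diagonal, so the eigenvalues
    # of K / n are non-negative and sum to one.
    eigvals = np.linalg.eigvalsh(K / n)
    # Drop numerically zero eigenvalues, treating 0 * log(0) as 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

identical = np.ones((4, 4))  # four copies of the same item
distinct = np.eye(4)         # four mutually dissimilar items

print(vendi_score(identical))  # close to 1: effectively one unique item
print(vendi_score(distinct))   # close to 4: four equally distinct items
```

Under these assumptions the score reads as an effective number of distinct items, ranging from 1 to the sample size.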

## Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients

Kyurae Kim, Jisu Oh, Jacob R. Gardner, Adji Bousso Dieng, Hongseok Kim

Minimizing the inclusive Kullback-Leibler divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods--which we collectively refer to as Markov chain score ascent (MCSA) methods--can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.

## Consistency Regularization for Variational Auto-Encoders

Samarth Sinha, Adji Bousso Dieng

Variational auto-encoders (VAEs) are a powerful approach to unsupervised learning. However, the encoder of a VAE has the undesirable property that it maps a given observation and a semantics-preserving transformation of it to different latent representations. This "inconsistency" of the encoder lowers the quality of the learned representations, especially for downstream tasks, and also negatively affects generalization. In this paper, we propose a simple and generic regularization method to enforce consistency in VAEs. The method yields state-of-the-art performance on several image benchmarks.

## Deep Probabilistic Graphical Modeling

Adji Bousso Dieng

My thesis work leveraged deep learning to make probabilistic graphical modeling more powerful and more flexible. The thesis won the Savage Award for Applied Methodology.

## Quantitative Nanoinfrared Spectroscopy of Anisotropic van der Waals Materials

Francesco L. Ruta*, Aaron J. Sternbach, Adji Bousso Dieng, Alexander S. McLeod, and D. N. Basov

Anisotropic dielectric tensors of uniaxial van der Waals (vdW) materials are difficult to investigate at infrared frequencies. The small dimensions of high-quality exfoliated crystals prevent the use of diffraction-limited spectroscopies. Near-field microscopes coupled to broadband lasers can function as Fourier transform infrared spectrometers with nanometric spatial resolution (nano-FTIR). Although dielectric functions of isotropic materials can be readily extracted from nano-FTIR spectra, the in- and out-of-plane permittivities of anisotropic vdW crystals cannot be easily distinguished. For thin vdW crystals residing on a substrate, nano-FTIR spectroscopy probes a combination of sample and substrate responses. We exploit the information in the screening of substrate resonances by vdW crystals to demonstrate that both the in- and out-of-plane dielectric permittivities are identifiable for realistic spectra. This novel method for the quantitative nanoresolved characterization of optical anisotropy was used to determine the dielectric tensor of a bulk 2H-WSe2 microcrystal in the mid-infrared.

## The Dynamic Embedded Topic Model

Adji Bousso Dieng*, Francisco J. R. Ruiz*, David M. Blei

The Dynamic Embedded Topic Model (DETM) extends the Embedded Topic Model to corpora with temporal dependencies. The DETM models each word with a categorical distribution whose parameter is given by the inner product between the word embedding and an embedding representation of its assigned topic at a particular time step. The word embeddings allow the DETM to generalize to rare words. The DETM learns smooth topic trajectories by defining a random walk prior over the embeddings of the topics. The DETM is fit using structured amortized variational inference with LSTMs.
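
As a rough illustration of the random walk prior and the inner-product parameterization, the sketch below draws one topic's embedding trajectory and turns it into a distribution over words at each time step. It is a toy under stated assumptions, not the paper's model or inference procedure: the Gaussian form of the random walk, the step size `step_std`, the word embeddings shared across time, the softmax normalization, and all sizes and names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
num_steps, embed_dim, vocab_size = 5, 3, 6  # hypothetical toy sizes
step_std = 0.1                              # assumed random-walk step size

# Word embeddings, assumed here to be shared across time steps.
rho = rng.normal(size=(vocab_size, embed_dim))

# Gaussian random walk over one topic's embedding:
# alpha[t] = alpha[t - 1] + small Gaussian noise.
steps = rng.normal(scale=step_std, size=(num_steps, embed_dim))
steps[0] = rng.normal(size=embed_dim)  # initial embedding at unit scale
alpha = np.cumsum(steps, axis=0)

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

# Row t is the topic's distribution over the vocabulary at time step t,
# with logits given by inner products of word and topic embeddings.
beta = softmax(alpha @ rho.T)
print(beta.round(3))
```

Because consecutive embeddings differ only by a small step, consecutive rows of `beta` change gradually, which is the smoothness the prior is meant to encourage.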

## Prescribed Generative Adversarial Networks

Adji Bousso Dieng, Francisco J. R. Ruiz, David M. Blei, Michalis K. Titsias

This paper describes a solution to two important problems in the GAN literature: (1) How can we maximize the entropy of the generator of a GAN to prevent mode collapse? (2) How can we evaluate predictive log-likelihood for GANs to assess how they generalize to new data? Key ingredients: noise, entropy regularization, and Hamiltonian Monte Carlo.

## Reweighted Expectation Maximization

Adji Bousso Dieng, John Paisley

Maximum likelihood in deep generative models is hard. The typical workaround is variational inference (VI), which maximizes a lower bound on the log marginal likelihood of the data. VI introduces an undesirable amortization gap and often causes latent variable collapse. We propose to use expectation maximization (EM) instead. Importantly, we separate posterior inference from model fitting. To fit the model, we leverage moment matching to learn rich proposals for estimating the EM objective. Posterior inference is done after the model is fitted. This two-step procedure departs from the current VAE approach of bundling together model fitting and posterior inference. It turns out that EM learns better deep generative models than VI, as measured by predictive log-likelihood.

## Topic Modeling in Embedding Spaces

Adji Bousso Dieng, Francisco J. R. Ruiz, David M. Blei

The Embedded Topic Model (ETM) defines words and topics in the same embedding space. Its generative model of documents defines the likelihood of a word as a categorical distribution whose natural parameter is the dot product between the word embedding and the embedding of its assigned topic. The ETM learns interpretable topics and word embeddings and is robust to large vocabularies that include rare words and stop words.
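
The word likelihood described above can be sketched as a softmax over dot products. This is a minimal illustration with random, untrained embeddings. The names `rho` and `alpha`, the toy sizes, and the helper `topic_word_distribution` are assumptions made for the example, and fitting the model is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_topics = 6, 3, 2  # hypothetical toy sizes

rho = rng.normal(size=(vocab_size, embed_dim))    # word embeddings
alpha = rng.normal(size=(num_topics, embed_dim))  # topic embeddings

def topic_word_distribution(rho, alpha_k):
    """Categorical distribution over words for one topic embedding."""
    logits = rho @ alpha_k           # dot product of each word with the topic
    logits = logits - logits.max()   # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# One row per topic; each row is a distribution over the vocabulary.
beta = np.stack([topic_word_distribution(rho, a) for a in alpha])
print(beta.round(3))
```

Words whose embeddings align with a topic's embedding receive higher probability under that topic, which is what places words and topics in a shared space.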

## Avoiding Latent Variable Collapse With Generative Skip Models

Adji Bousso Dieng, Yoon Kim, Alexander M. Rush, David M. Blei

One of the current staples of unsupervised representation learning is variational autoencoders (VAEs). However, they suffer from a problem known as "latent variable collapse". Our paper proposes a simple solution that relies on skip connections. This solution leads to the Skip-VAE--a deep generative model that avoids latent variable collapse. The decoder of a Skip-VAE is a neural network whose hidden states--at every layer--condition on the latent variables. This results in a stronger dependence between observations and their latents and therefore avoids latent variable collapse.
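
The decoder structure described above can be sketched as a forward pass in which the latent vector is fed into every hidden layer. This is an illustrative toy with random, untrained weights and no encoder or training loop. The `tanh` nonlinearity, the absence of bias terms, the layer sizes, and the name `skip_decoder` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, hidden_dim, obs_dim, num_layers = 2, 4, 5, 3  # hypothetical sizes

# One (hidden-to-hidden, latent-to-hidden) weight pair per layer.
layers = [
    (0.5 * rng.normal(size=(hidden_dim, hidden_dim)),
     0.5 * rng.normal(size=(hidden_dim, latent_dim)))
    for _ in range(num_layers)
]
W_out = 0.5 * rng.normal(size=(obs_dim, hidden_dim))

def skip_decoder(z):
    """Map a latent vector z to output parameters for one observation."""
    h = np.zeros(hidden_dim)
    for W_h, W_z in layers:
        # Skip connection: z enters every hidden layer, not only the first.
        h = np.tanh(W_h @ h + W_z @ z)
    return W_out @ h

print(skip_decoder(np.ones(latent_dim)))
```

In a decoder without skip connections, only the first layer would see `z`. Here each layer receives it directly, which is the mechanism the abstract credits for the stronger dependence between observations and latents.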

## Noisin: Unbiased Regularization for Recurrent Neural Networks

Adji Bousso Dieng, Rajesh Ranganath, Jaan Altosaar, David M. Blei

Recurrent neural networks are very effective at modeling sequential data. However, they tend to have very high capacity and overfit very easily. We propose a new regularization method called Noisin. Noisin relies on the notion of "unbiased" noise injection. Noisin is an explicit regularizer--its objective function can be decomposed as the original objective for the deterministic RNN and a non-negative data-dependent term. Noisin significantly outperforms Dropout on both the Penn TreeBank and the Wikitext-2 datasets on a language modeling task.

## Augment and Reduce: Stochastic Inference for Large Categorical Distributions

Francisco J. R. Ruiz, Michalis K. Titsias, Adji Bousso Dieng, David M. Blei

Categorical distributions are ubiquitous in Statistics and Machine Learning. One widely used parameterization of a categorical distribution is the softmax. However, the softmax does not scale well when there are many categories. We propose a method called A&R that scales learning with categorical distributions. A&R is built on two ideas: latent variable augmentation and stochastic variational expectation maximization.

## Readmission prediction via deep contextual embedding of clinical concepts

Cao Xiao, Tengfei Ma, Adji Bousso Dieng, David M. Blei, Fei Wang

Hospital readmissions are avoidable and can cost a lot of money. Excessive hospital readmissions could also be harmful to patients. Accurate prediction of hospital readmission can effectively help reduce the readmission risk associated with costs and patient well-being. However, the complex relationship between readmission and potential risk factors makes readmission prediction a difficult task. The main goal of this paper is to explore deep learning models to distill such complex relationships and make accurate predictions. We leverage TopicRNN to learn interpretable patient representations that are helpful in predicting hospital readmission.

## Variational Inference via χ-Upper Bound Minimization

Adji Bousso Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei

Variational inference is an efficient approach for estimating posterior distributions. It consists of positing a family of distributions and finding the distribution in this family that best approximates the true posterior. The criterion for learning is a divergence measure. The most used divergence is the Kullback-Leibler (KL) divergence. However, minimizing the KL leads to approximations that underestimate posterior uncertainty. Our paper proposes the χ-divergence for variational inference. This divergence leads to an upper bound on the model evidence (called CUBO) and overdispersed posterior approximations. CUBO can be used alongside the usual ELBO to sandwich-estimate the model evidence.
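
The sandwich estimate can be illustrated on a toy model whose evidence is known in closed form. The sketch below is not the paper's experimental setup: the conjugate Gaussian model, the fixed approximation `q`, the sample size, and the order `n = 2` are assumptions chosen for illustration. It estimates the ELBO as the mean log importance weight and the CUBO as (1/n) times the log of the mean n-th power of the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.0  # a single observation

def log_normal(v, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (v - mean) ** 2 / var)

# Assumed toy model: z ~ N(0, 1), x | z ~ N(z, 1), so p(x) = N(x; 0, 2).
log_evidence = log_normal(x, 0.0, 2.0)

# Fixed approximation q(z) = N(0.4, 0.64), slightly wider than the exact
# posterior N(0.5, 0.5), which keeps the importance weights bounded.
q_mean, q_var = 0.4, 0.64
z = rng.normal(q_mean, np.sqrt(q_var), size=100_000)

# Log importance weights: log p(x, z) - log q(z).
log_w = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0) - log_normal(z, q_mean, q_var)

n = 2  # order of the chi-divergence
elbo_hat = log_w.mean()
# (1/n) * log-mean-exp of n * log_w, computed stably.
shift = (n * log_w).max()
cubo_hat = (shift + np.log(np.mean(np.exp(n * log_w - shift)))) / n

print(elbo_hat, log_evidence, cubo_hat)
```

With this well-behaved `q`, the estimates bracket the true log evidence. In general the Monte Carlo CUBO estimate is biased for finite samples, so the upper bound holds only approximately in practice.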

## TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

Adji Bousso Dieng, Chong Wang, Jianfeng Gao, John Paisley

One challenge in modeling sequential data with RNNs is the inability to capture long-term dependencies. In natural language, these long-term dependencies come in the form of semantic dependencies. TopicRNN is a deep generative model of language that marries RNNs and topic models to capture long-term dependencies. The RNN component of the model captures syntax while the topic model component captures semantics. The topic model and the RNN parameters are learned jointly using amortized variational inference.

## Edward: A library for probabilistic modeling, inference, and criticism

Dustin Tran, Alp Kucukelbir, Adji Bousso Dieng, Maja Rudolph, Dawen Liang, David M. Blei

Edward is a TensorFlow-based library for probabilistic programming.