SHAKIR MOHAMED
.01

ABOUT ME

I am a senior research scientist working in statistical machine learning and artificial intelligence at Google DeepMind, London, where we work towards the goal of developing intelligent and general-purpose learning systems.

I am most interested in research that combines multiple scientific disciplines and views of computational and machine learning problems. Much of my current focus is on the interface between probabilistic reasoning, deep learning and reinforcement learning, and how the computational solutions that emerge in this space can be used for systems and agent-based decision-making. I love exploring and writing about the connections between different computational paradigms and maintain a blog at blog.shakirm.com.

Before moving to London, I held a Junior Research Fellowship from the Canadian Institute for Advanced Research (CIFAR) as part of the programme on Neural Computation and Adaptive Perception. I was based in Vancouver at the University of British Columbia in the Laboratory for Computational Intelligence (LCI) with Nando de Freitas.

I completed my PhD with Zoubin Ghahramani at the University of Cambridge, where I was a Commonwealth Scholar to the United Kingdom and a member of St John's College. I am from South Africa, and completed my previous degrees in Electrical and Information Engineering at the University of the Witwatersrand, Johannesburg.
 
 
.02

PUBLICATIONS

01 Dec 2016

Unsupervised Learning of 3D Structure from Images

NIPS 2016

Conferences Danilo J. Rezende, Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess

A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world. In this paper we learn strong deep generative models of 3D structures, and recover these structures from 3D and 2D images via probabilistic inference. We demonstrate high-quality samples and report log-likelihoods on several datasets, including ShapeNet [2], and establish the first benchmarks in the literature. We also show how these models and their inference networks can be trained end-to-end from 2D images. This demonstrates for the first time the feasibility of learning to infer 3D representations of the world in a purely unsupervised manner.

17 Jun 2016

Early Visual Concept Learning with Unsupervised Deep Learning

Conferences Irina Higgins, Loic Matthey, Xavier Glorot, Arka Pal, Benigno Uria, Charles Blundell, Shakir Mohamed, Alexander Lerchner

Automated discovery of early visual concepts from raw image data is a major open challenge in AI research. Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. By enforcing redundancy reduction, encouraging statistical independence, and exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of “objectness”.
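
For a rough sense of the objective involved, here is a minimal sketch (my own illustration, not the paper's exact recipe) of a VAE loss with an adjustable weight on the KL term, one simple way to express the pressure towards statistically independent latent factors:

```python
import numpy as np

def vae_loss(x, x_recon, mu, log_var, kl_weight=4.0):
    """Per-example VAE objective with an adjustable KL weight.

    A kl_weight greater than one strengthens the pressure towards a
    factorised posterior; the specific weighting used here is illustrative,
    not the exact mechanism described in the paper.
    """
    eps = 1e-7
    # Bernoulli reconstruction log-likelihood for pixel values in [0, 1].
    recon_ll = np.sum(x * np.log(x_recon + eps) + (1 - x) * np.log(1 - x_recon + eps))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    # Negative of the weighted evidence lower bound (to be minimised).
    return -(recon_ll - kl_weight * kl)

# Toy usage with random inputs standing in for an image and its reconstruction.
rng = np.random.default_rng(0)
x = rng.random(784)
x_recon = np.clip(x + 0.05 * rng.standard_normal(784), 0.0, 1.0)
loss = vae_loss(x, x_recon, mu=rng.standard_normal(8), log_var=np.zeros(8))
```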

05 Jun 2016

One-Shot Generalization in Deep Generative Models

ICML 2016

Conferences Danilo Jimenez Rezende, Shakir Mohamed, Ivo Danihelka, Karol Gregor, Daan Wierstra

Humans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then be able to generate compelling alternative variations of the concept. We develop machine learning systems with this important capacity by developing new deep generative models, models that combine the representational power of deep learning with the inferential power of Bayesian reasoning. We develop a class of sequential generative models that are built on the principles of feedback and attention. These two characteristics lead to generative models that are among the state of the art in density estimation and image generation. We demonstrate the one-shot generalization ability of our models using three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases our models are able to generate compelling and diverse samples, having seen new examples just once, providing an important class of general-purpose models for one-shot machine learning.

07 Dec 2015

Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning

NIPS 2015

Conferences Shakir Mohamed, Danilo Jimenez Rezende

The mutual information is a core statistical quantity that has applications in all areas of machine learning, whether this is in training of density models over multiple data modalities, in maximising the efficiency of noisy transmission channels, or when learning behaviour policies for exploration by artificial agents. Most learning algorithms that involve optimisation of the mutual information rely on the Blahut-Arimoto algorithm --- an enumerative algorithm with exponential complexity that is not suitable for modern machine learning applications. This paper provides a new approach for scalable optimisation of the mutual information by merging techniques from variational inference and deep learning. We develop our approach by focusing on the problem of intrinsically-motivated learning, where the mutual information forms the definition of a well-known internal drive known as empowerment. Using a variational lower bound on the mutual information, combined with convolutional networks for handling visual input streams, we develop a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions.
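
The quantity at the centre of this approach is the standard variational lower bound on the mutual information, stated here in its generic form (the paper specialises it to the empowerment setting and optimises it with deep networks):

```latex
% Variational lower bound on mutual information: replacing the intractable
% posterior p(x | y) with a variational distribution q(x | y) gives a bound
% that can be optimised by stochastic gradient methods.
I(X; Y) = H(X) - H(X \mid Y)
        \geq H(X) + \mathbb{E}_{p(x, y)}\!\left[\log q(x \mid y)\right].
```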

04 Jul 2015

A Statistical View of Deep Learning

Technical Note

I’ve taken to writing this series of posts on a statistical view of deep learning with two principal motivations in mind. The first was as a personal exercise to make concrete and to test the limits of the way that I think about and use deep learning in my every day work. The second, was to highlight important statistical connections and implications of deep learning that I have not seen made in the popular courses, reviews and books on deep learning, but which are extremely important to keep in mind.

Technical Note/Essays Shakir Mohamed

05 Jun 2015

Variational Inference with Normalizing Flows

ICML 2015

Conferences Danilo Jimenez Rezende, Shakir Mohamed

The choice of approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provides a clear improvement in performance and applicability of variational inference.
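
As an illustration, a single planar transformation of the kind composed in these flows can be written in a few lines (a sketch with made-up parameters; the condition on u needed for invertibility is noted but not enforced):

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar flow step f(z) = z + u * tanh(w.z + b).

    Returns the transformed sample and the log|det Jacobian| term used to
    track the density of the transformed variable. Invertibility requires a
    condition on u (roughly u.w >= -1), which is omitted in this sketch.
    """
    a = np.dot(w, z) + b
    f_z = z + u * np.tanh(a)
    psi = (1.0 - np.tanh(a) ** 2) * w                 # h'(a) * w
    log_det = np.log(np.abs(1.0 + np.dot(u, psi)))
    return f_z, log_det

# Toy usage: push a standard-normal sample through two flow steps.
rng = np.random.default_rng(0)
z, log_q = rng.standard_normal(2), 0.0
for u, w, b in [(np.array([0.5, -0.3]), np.array([1.0, 0.2]), 0.1),
                (np.array([-0.2, 0.4]), np.array([0.3, -1.0]), 0.0)]:
    z, log_det = planar_flow(z, u, w, b)
    log_q -= log_det          # log q_K(z_K) = log q_0(z_0) - sum of log-dets
```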

01 Feb 2015

A Tutorial on Variational Inference for Machine Learning

Tutorial

Tutorial Shakir Mohamed

Variational inference is one of the tools that now lies at the heart of the modern data analysis lifecycle. Variational inference is the term used to encompass approximation techniques for the solution of intractable integrals and complex distributions and operates by transforming the hard problem of integration into one of optimisation. As a result, using variational inference we are now able to derive algorithms that allow us to apply increasingly complex probabilistic models to ever larger data sets on ever more powerful computing resources.

This tutorial is meant as a broad introduction to modern approaches for approximate, large-scale inference and reasoning in probabilistic models. It is designed to be of interest to both new and experienced researchers in machine learning, statistics and engineering and is intended to leave everyone with an understanding of an invaluable tool for probabilistic inference and its connections to a broad range of fields, such as Bayesian analysis, deep learning, information theory, and statistical mechanics.

The tutorial will begin by motivating probabilistic data analysis and the problem of inference for statistical applications, such as density estimation, missing data imputation and model selection, and for industrial problems in search and recommendation, text mining and community discovery. We will then examine importance sampling as one widely-used Monte Carlo inference mechanism and from this begin our journey towards the variational approach for inference. The principle of variational inference and basic tools from variational calculus will be introduced, as well as the class of latent Gaussian models that will be used throughout the tutorial as a running example. Using this foundation, we shall discuss different approaches for approximating posterior distributions, the smorgasbord of techniques for optimising the variational objective function, a discussion of implementation and large-scale applications, a brief look at the available theory for variational methods, and an overview of other variational problems in machine learning and statistics.
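
For orientation, the bound at the heart of the tutorial can be stated in one line (a standard form of the evidence lower bound):

```latex
% Evidence lower bound (ELBO): integration is replaced by optimisation over a
% family of approximating distributions q(z); the gap is the KL divergence
% between the approximation and the true posterior.
\log p(x) = \log \int p(x, z)\, dz
          \geq \mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right]
          = \mathcal{F}(q),
\qquad
\log p(x) - \mathcal{F}(q) = \mathrm{KL}\!\left[q(z) \,\|\, p(z \mid x)\right].
```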

04 Dec 2014

Semi-supervised Learning with Deep Generative Models

NIPS 2014

Conferences Diederik P. Kingma, Danilo J. Rezende, Shakir Mohamed, Max Welling

The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
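
The structure of the resulting objective is sketched below, schematically and following the general form of such semi-supervised variational bounds rather than the paper's exact notation: one bound for labelled data, and one for unlabelled data in which the missing label is marginalised by the inference network.

```latex
% Labelled data: a variational bound with the label y observed.
-\mathcal{L}(x, y) = \mathbb{E}_{q(z \mid x, y)}\!\left[\log p(x \mid y, z)
    + \log p(y) + \log p(z) - \log q(z \mid x, y)\right]
% Unlabelled data: the label is a latent variable, marginalised under the
% classifier q(y | x); the entropy term rewards uncertain classifications.
-\mathcal{U}(x) = \sum_{y} q(y \mid x)\,\bigl(-\mathcal{L}(x, y)\bigr)
    + \mathcal{H}\!\left[q(y \mid x)\right]
```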

05 Jun 2014

Stochastic Backpropagation and Approximate Inference in Deep Generative Models

ICML 2014

Conferences Selected Danilo Jimenez Rezende, Shakir Mohamed, Daan Wierstra

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent approximate posterior distributions, and that acts as a stochastic encoder of the data. We develop stochastic back-propagation -- rules for back-propagation through stochastic variables -- and use this to develop an algorithm that allows for joint optimisation of the parameters of both the generative and recognition model. We demonstrate on several real-world data sets that the model generates realistic samples, provides accurate imputations of missing data and is a useful tool for high-dimensional data visualisation.
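
A minimal numerical sketch of the underlying trick (a toy example of my own, not the paper's model): writing the latent variable as a deterministic transformation of noise lets gradients pass through the sampling step.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_latent(mu, log_sigma, n_samples=1000):
    """Reparameterise z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, 1)."""
    eps = rng.standard_normal(n_samples)
    return mu + np.exp(log_sigma) * eps

# Monte Carlo gradient of E[z^2] with respect to mu via the reparameterisation:
# d/dmu E[(mu + sigma * eps)^2] = E[2 (mu + sigma * eps)] = 2 * mu.
mu, log_sigma = 0.7, np.log(0.5)
z = sample_latent(mu, log_sigma)
grad_mu_estimate = np.mean(2.0 * z)   # stochastic estimate, close to 2 * mu = 1.4
```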

02 Mar 2014

Bayesian Approaches for Sparse Latent Variable Models: Reconsidering L1 Sparsity.

Practical Applications in Sparse Modelling

Chapter 10 in I. Rish, G. A. Cecchi, A. Lozano and A. Niculescu-Mizil, eds, 'Practical Applications in Sparse Modelling', MIT Press, 2014.

Book Chapters Shakir Mohamed, Katherine Heller, Zoubin Ghahramani

01 Jul 2013

A Simple and General Exponential Family Framework for Partial Membership and Factor Analysis

Handbook of Mixed-Membership Models and their Applications

In E. M. Airoldi, D. Blei, E. A. Erosheva and S. E. Fienberg, eds, 'Handbook of Mixed-Membership Models and their Applications'. CRC Press.

Book Chapters Shakir Mohamed, Katherine Heller, Zoubin Ghahramani


We show how mixture models, partial membership models, factor analysis, and their extensions to more general mixed-membership models, can be unified under a simple framework using the exponential family of distributions and variations in the prior assumptions on the latent variables that are used. We describe two models within this common latent variable framework: a Bayesian partial membership model and a Bayesian exponential family factor analysis model. Accurate inferences can be achieved within this framework that allow for prediction, missing value imputation, and data visualisation, and importantly, allow us to make a broad range of insightful probabilistic queries of our data. We emphasise the adaptability and flexibility of these models for a wide range of tasks, characteristics that will continue to see such models used at the core of modern data analysis paradigms.

18 Jun 2013

Adaptive Hamiltonian and Riemann Monte Carlo Samplers

ICML 2013

Conferences Ziyu Wang, Shakir Mohamed, Nando de Freitas

In this paper we address the widely-experienced difficulty in tuning Hamiltonian-based Monte Carlo samplers. We develop an algorithm that allows for the adaptation of Hamiltonian and Riemann manifold Hamiltonian Monte Carlo samplers using Bayesian optimization that allows for infinite adaptation of the parameters of these samplers. We show that the resulting sampling algorithms are ergodic, and that the use of our adaptive algorithms makes it easy to obtain more efficient samplers, in some cases precluding the need for more complex solutions. Hamiltonian-based Monte Carlo samplers are widely known to be an excellent choice of MCMC method, and we aim with this paper to remove a key obstacle towards the more widespread use of these samplers in practice.

Link to Paper
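
For context, the sampler being tuned looks roughly as follows: a plain Hamiltonian Monte Carlo transition with a fixed step size and trajectory length. This sketch is my own and deliberately omits the paper's contribution, the Bayesian-optimisation adaptation of exactly these two parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

def hmc_step(x, log_prob, grad_log_prob, step_size=0.1, n_leapfrog=20):
    """One HMC transition; step_size and n_leapfrog are the parameters the
    paper proposes to adapt automatically."""
    p = rng.standard_normal(x.shape)
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    for _ in range(n_leapfrog):
        x_new += step_size * p_new
        p_new += step_size * grad_log_prob(x_new)
    p_new -= 0.5 * step_size * grad_log_prob(x_new)   # correct the final half step
    # Metropolis accept/reject to remove discretisation error.
    log_accept = (log_prob(x_new) - 0.5 * np.dot(p_new, p_new)) \
               - (log_prob(x) - 0.5 * np.dot(p, p))
    return x_new if np.log(rng.uniform()) < log_accept else x

# Toy usage: sample from a standard two-dimensional Gaussian.
x, samples = np.zeros(2), []
for _ in range(500):
    x = hmc_step(x, lambda v: -0.5 * np.dot(v, v), lambda v: -v)
    samples.append(x)
```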

04 Dec 2012

Fast Bayesian Inference for Non-conjugate Gaussian Process Regression

NIPS 2012

Conferences Selected Emtiyaz Khan, Shakir Mohamed, Kevin P. Murphy

We present a new variational inference algorithm for Gaussian process regression with non-conjugate likelihood functions, with application to a wide array of problems including binary and multi-class classification, and ordinal regression. Our method constructs a concave lower bound that is optimized using an efficient fixed-point updating algorithm. We show that the new algorithm has highly competitive computational complexity, matching that of alternative approximate inference methods. We also prove that the use of concave variational bounds provides stable and guaranteed convergence – a property not available to other approaches. We show empirically for both binary and multi-class classification that our new algorithm converges much faster than existing variational methods, and without any degradation in performance.

04 Dec 2012

Expectation Propagation in Gaussian Process Dynamical Systems

NIPS 2012

Conferences Marc P. Deisenroth and Shakir Mohamed

Rich and complex time-series data, such as those generated from engineering systems, financial markets, videos or neural recordings, are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems (GPDS) as a rich model class that is appropriate for such analysis. In particular, we present a message passing algorithm for approximate inference in GPDSs based on expectation propagation. By posing inference as a general message passing problem, we iterate forward-backward smoothing. Thus, we obtain more accurate posterior distributions over latent structures, resulting in improved predictive performance compared to state-of-the-art GPDS smoothers, which are special cases of our general message passing algorithm. Hence, we provide a unifying approach within which to contextualize message passing in GPDSs.

10 Aug 2012

Large-scale Approximate Bayesian Inference for Exponential Family Latent Gaussian Models

ISBA 2012

Conference of the International Society for Bayesian Analysis (ISBA), June 2012.

Conferences Emtiyaz Khan, Shakir Mohamed, Kevin P. Murphy

10 Jun 2012

Bayesian and L1 Approaches for Sparse Unsupervised Learning

ICML 2012

Conferences Selected Shakir Mohamed, Katherine Heller, Zoubin Ghahramani

The use of L1 regularisation for sparse learning has generated immense research interest, with many successful applications in diverse areas such as signal acquisition, image coding, genomics and collaborative filtering. While existing work highlights the many advantages of L1 methods, in this paper we find that L1 regularisation often dramatically under-performs in terms of predictive performance when compared to other methods for inferring sparsity. We focus on unsupervised latent variable models, and develop L1 minimising factor models, Bayesian variants of “L1”, and Bayesian models with a stronger L0-like sparsity induced through spike-and-slab distributions. These spike-and-slab Bayesian factor models encourage sparsity while accounting for uncertainty in a principled manner, and avoid unnecessary shrinkage of non-zero values. We demonstrate on a number of data sets that in practice spike-and-slab Bayesian methods outperform L1 minimisation, even on a computational budget. We thus highlight the need to re-assess the wide use of L1 methods in sparsity-reliant applications, particularly when we care about generalising to previously unseen data, and provide an alternative that, over many varying conditions, provides improved generalisation performance.
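
To make the contrast concrete, here is an illustrative comparison of prior samples (a toy sketch, not the paper's factor model): the spike-and-slab prior places probability mass on exact zeros, whereas a Laplace (L1-style) prior only shrinks values towards zero.

```python
import numpy as np

rng = np.random.default_rng(3)

def spike_and_slab_sample(size, pi=0.2, slab_scale=1.0):
    """Weights are exactly zero with probability 1 - pi (the spike) and
    Gaussian with probability pi (the slab)."""
    on = rng.random(size) < pi
    return on * rng.normal(0.0, slab_scale, size)

def laplace_sample(size, scale=1.0):
    """Laplace prior: values are shrunk but almost never exactly zero."""
    return rng.laplace(0.0, scale, size)

w_ss = spike_and_slab_sample(10000)
w_l1 = laplace_sample(10000)
print("exact zeros, spike-and-slab:", np.mean(w_ss == 0.0))   # roughly 0.8
print("exact zeros, Laplace:       ", np.mean(w_l1 == 0.0))   # essentially 0.0
```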

08 Apr 2012

A Stick-Breaking Likelihood for Categorical Data Analysis with Latent Gaussian Models

AISTATS 2012

Conferences Emtiyaz Khan, Shakir Mohamed, Ben M. Marlin and Kevin P. Murphy

The development of accurate models and efficient algorithms for the analysis of multivariate categorical data are important and longstanding problems in machine learning and computational statistics. In this paper, we focus on modeling categorical data using Latent Gaussian Models (LGMs). We propose a novel stick-breaking likelihood function for categorical LGMs that exploits accurate linear and quadratic bounds on the logistic log-partition function, leading to an effective variational inference and learning framework. We thoroughly compare our approach to existing algorithms for multinomial logit/probit likelihoods on several problems, including inference in multinomial Gaussian process classification and learning in latent factor models. Our extensive comparisons demonstrate that our stick-breaking model effectively captures correlation in discrete data and is well suited for the analysis of categorical data.
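
The stick-breaking construction itself can be sketched in a few lines (an illustration of the map from real-valued latent activations to category probabilities; the bounds and variational inference from the paper are not shown):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def stick_breaking_probs(eta):
    """Map K-1 real-valued activations to K category probabilities.

    Category k takes a sigmoid(eta_k) fraction of the remaining stick and the
    final category receives whatever is left, so the result sums to one by
    construction.
    """
    v = sigmoid(np.asarray(eta, dtype=float))
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)])
    return np.append(v, 1.0) * remaining

p = stick_breaking_probs([0.3, -1.2, 0.5])   # four categories from three latents
assert np.isclose(p.sum(), 1.0)
```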

08 Apr 2012

On Sparse, Spectral and Other Parameterizations of Binary Probabilistic Models

AISTATS 2012

Conferences David Buchman, Mark Schmidt, Shakir Mohamed, David Poole, Nando de Freitas

This paper studies issues relating to the parameterization of probability distributions over binary data sets. Several such parameterizations of models for binary data are known, including the Ising, generalized Ising, canonical and full parameterizations. We also discuss a parameterization that we call the “spectral parameterization”, which has received significantly less coverage in existing literature. We provide this parameterization with a spectral interpretation by casting log-linear models in terms of orthogonal Walsh-Hadamard harmonic expansions. Using various standard and group sparse regularizers for structural learning, we provide a comprehensive theoretical and empirical comparison of these parameterizations. We show that the spectral parameterization, along with the canonical, has the best performance and sparsity levels, while the spectral does not depend on any particular reference state. The spectral interpretation also provides a new starting point for analyzing the statistics of binary data sets; we measure the magnitude of higher order interactions in the underlying distributions for several data sets.
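
As a minimal illustration of the spectral view (my own toy construction of the Walsh-Hadamard basis; the regularised structure learning studied in the paper is not shown):

```python
import numpy as np

def hadamard(n_vars):
    """Walsh-Hadamard basis over n binary variables as a 2^n x 2^n matrix."""
    H, H2 = np.array([[1.0]]), np.array([[1.0, 1.0], [1.0, -1.0]])
    for _ in range(n_vars):
        H = np.kron(H, H2)
    return H

# Express an (unnormalised) log-probability table over three binary variables
# in the orthogonal Walsh-Hadamard basis: log_p = H @ theta, with the spectral
# coefficients theta capturing interactions of increasing order.
n = 3
rng = np.random.default_rng(5)
log_p = rng.standard_normal(2 ** n)
H = hadamard(n)
theta = H @ log_p / (2 ** n)
assert np.allclose(H @ theta, log_p)
```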

05 Feb 2011

Generalised Bayesian Matrix Factorisation Models

University of Cambridge, 2011

Theses Selected Shakir Mohamed

Factor analysis and related models for probabilistic matrix factorisation are of central importance to the unsupervised analysis of data, with a colourful history more than a century long. Probabilistic models for matrix factorisation allow us to explore the underlying structure in data, and have relevance in a vast number of application areas including collaborative filtering, source separation, missing data imputation, gene expression analysis, information retrieval, computational finance and computer vision, amongst others. This thesis develops generalisations of matrix factorisation models that advance our understanding and enhance the applicability of this important class of models.

The generalisation of models for matrix factorisation focuses on three concerns: widening the applicability of latent variable models to the diverse types of data that are currently available; considering alternative structural forms in the underlying representations that are inferred; and including higher order data structures into the matrix factorisation framework. These three issues reflect the reality of modern data analysis and we develop new models that allow for a principled exploration and use of data in these settings. We place emphasis on Bayesian approaches to learning and the advantages that come with the Bayesian methodology. Our port of departure is a generalisation of latent variable models to members of the exponential family of distributions. This generalisation allows for the analysis of data that may be real-valued, binary, counts, non-negative or a heterogeneous set of these data types. The model unifies various existing models and constructs for unsupervised settings, the complementary framework to the generalised linear models in regression.

Moving to structural considerations, we develop Bayesian methods for learning sparse latent representations. We define ideas of weakly and strongly sparse vectors and investigate the classes of prior distributions that give rise to these forms of sparsity, namely the scale-mixture of Gaussians and the spike-and-slab distribution. Based on these sparsity favouring priors, we develop and compare methods for sparse matrix factorisation and present the first comparison of these sparse learning approaches. As a second structural consideration, we develop models with the ability to generate correlated binary vectors. Moment-matching is used to allow binary data with specified correlation to be generated, based on dichotomisation of the Gaussian distribution. We then develop a novel and simple method for binary PCA based on Gaussian dichotomisation. The third generalisation considers the extension of matrix factorisation models to multi-dimensional arrays of data that are increasingly prevalent. We develop the first Bayesian model for non-negative tensor factorisation and explore the relationship between this model and the previously described models for matrix factorisation.

10 Dec 2010

Sparse Exponential Family Latent Variable Models

NIPS Workshop on Sparsity, 2010


Conferences Shakir Mohamed, Katherine Heller and Zoubin Ghahramani


11 Dec 2009

Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

NIPS 2009

Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, the high-dimensional averages required for Bayesian methods can be slow, especially with the unbounded representations used by nonparametric models. We address the challenge of scaling Bayesian inference to the increasingly large datasets found in real-world applications. We focus on parallelisation of inference in the Indian Buffet Process (IBP), which allows data points to have an unbounded number of sparse latent features. Our novel MCMC sampler divides a large data set between multiple processors and uses message passing to compute the global likelihoods and posteriors. This algorithm, the first parallel inference scheme for IBP-based models, scales to datasets orders of magnitude larger than have previously been possible.

Conferences Finale Doshi-Velez, David Knowles, Shakir Mohamed and Zoubin Ghahramani.


Paper Link

01 Jun 2009

Probabilistic Non-negative Tensor Factorization using Markov Chain Monte Carlo

EUSIPCO 2009

We present a probabilistic model for learning non-negative tensor factorizations (NTF), in which the tensor factors are latent variables associated with each data dimension. The non-negativity constraint for the latent factors is handled by choosing priors with support on the non-negative numbers. Two Bayesian inference procedures based on Markov chain Monte Carlo sampling are described: Gibbs sampling and Hamiltonian Markov chain Monte Carlo. We evaluate the model on two food science data sets, and show that the probabilistic NTF model leads to better predictions and avoids overfitting compared to existing NTF approaches.

Conferences Mikkel N. Schmidt and Shakir Mohamed
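
The generative side of such a model can be sketched compactly (a toy non-negative CP/PARAFAC construction with exponential priors; the Gibbs and Hamiltonian Monte Carlo inference described in the paper is not shown):

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_nonnegative_tensor(shape, rank=3, noise_scale=0.1):
    """Generate a three-way tensor from non-negative latent factors.

    Each factor matrix is drawn from an exponential prior (support on the
    non-negative reals); the tensor mean is a sum of rank-one outer products,
    observed with Gaussian noise.
    """
    I, J, K = shape
    A = rng.exponential(1.0, (I, rank))
    B = rng.exponential(1.0, (J, rank))
    C = rng.exponential(1.0, (K, rank))
    mean = np.einsum('ir,jr,kr->ijk', A, B, C)
    return mean + noise_scale * rng.standard_normal(shape), (A, B, C)

X, factors = sample_nonnegative_tensor((10, 8, 6))
```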


06 Dec 2008

Bayesian Exponential Family PCA.

NIPS 2008

Principal Components Analysis (PCA) has become established as one of the key tools for dimensionality reduction when dealing with real valued data. Approaches such as exponential family PCA and non-negative matrix factorisation have successfully extended PCA to non-Gaussian data types, but these techniques fail to take advantage of Bayesian inference and can suffer from problems of over-fitting and poor generalisation. This paper presents a fully probabilistic approach to PCA, which is generalised to the exponential family, based on Hybrid Monte Carlo sampling. We describe the model which is based on a factorisation of the observed data matrix, and show performance of the model on both synthetic and real data.

Conferences Selected Shakir Mohamed, Katherine Heller and Zoubin Ghahramani


Paper Link

.03

TALKS AND TUTORIALS

TUTORIAL

Building Machines that Imagine and Reason


 Building Machines that Imagine and Reason: Principles and Applications of Deep Generative Models

Deep generative models provide a solution to the problem of unsupervised learning, in which a machine learning system is required to discover the structure hidden within unlabelled data streams. Because they are generative, such models can form a rich imagery of the world in which they are used: an imagination that can be harnessed to explore variations in data, to reason about the structure and behaviour of the world, and ultimately, for decision-making. This tutorial looks at how we can build machine learning systems with a capacity for imagination using deep generative models, the types of probabilistic reasoning that they make possible, and the ways in which they can be used for decision making and acting.

Deep generative models have widespread applications including those in density estimation, image de-noising and in-painting, data compression, scene understanding, representation learning, 3D scene construction, semi-supervised classification, and hierarchical control, amongst many others. After exploring these applications, we'll sketch a landscape of generative models, drawing out three groups of models: fully-observed models, transformation models, and latent variable models. Different models require different principles for inference and we'll explore the different options available. Different combinations of model and inference give rise to different algorithms, including auto-regressive distribution estimators, variational auto-encoders, and generative adversarial networks. Although we will emphasise deep generative models, and the latent-variable class in particular, the intention of the tutorial will be to explore the general principles, tools and tricks that can be used throughout machine learning. These reusable topics include Bayesian deep learning, variational approximations, memoryless and amortised inference, and stochastic gradient estimation. We'll end by highlighting the topics that were not discussed, and imagine the future of generative models.

TALK

Memory-based Bayesian Reasoning and Deep Learning


Deep learning and Bayesian machine learning are currently two of the most active areas of machine learning research. Deep learning provides a powerful class of models and an easy framework for learning that now provides state-of-the-art methods for applications ranging from image classification to speech recognition. Bayesian reasoning provides a powerful approach for knowledge integration, inference, and decision making that has established it as the key tool for data-efficient learning, uncertainty quantification and robust model composition, widely-used in applications ranging from information retrieval to large-scale ranking. Each of these research areas has shortcomings that can be effectively addressed by the other, pointing towards a needed convergence of these two areas of machine learning and one that enhances our machine learning practice.

One powerful outcome of this convergence is our ability to develop systems for probabilistic inference with memory. A memory-based inference amortises the cost of probabilistic reasoning by cleverly reusing prior computations. To explore this, we shall take a statistical tour of deep learning, re-examine latent variable models and approximate Bayesian inference, and make connections to de-noising auto-encoders and other stochastic encoder-decoder systems. In this way, we will make sense of what memory in inference might mean, and highlight the use of amortised inference in many other parts of machine learning.
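
A caricature of the idea in code (my own illustration, not taken from the talk): a single shared function predicts the posterior parameters for any observation, so the cost of inference is paid once in training that function rather than repeated for every data point.

```python
import numpy as np

rng = np.random.default_rng(6)

# A linear map stands in here for the trained recognition network.
W = 0.1 * rng.standard_normal((2, 5))

def amortised_posterior_params(x):
    """Map a 5-dimensional observation to (mean, log-std) of q(z | x)."""
    mu, log_sigma = W @ x
    return mu, log_sigma

mu, log_sigma = amortised_posterior_params(rng.standard_normal(5))
```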

TUTORIAL

Tutorial on Variational Inference for Machine Learning


Variational inference is one of the tools that now lies at the heart of the modern data analysis lifecycle. Variational inference is the term used to encompass approximation techniques for the solution of intractable integrals and complex distributions and operates by transforming the hard problem of integration into one of optimisation. As a result, using variational inference we are now able to derive algorithms that allow us to apply increasingly complex probabilistic models to ever larger data sets on ever more powerful computing resources.

This tutorial is meant as a broad introduction to modern approaches for approximate, large-scale inference and reasoning in probabilistic models. It is designed to be of interest to both new and experienced researchers in machine learning, statistics and engineering and is intended to leave everyone with an understanding of an invaluable tool for probabilistic inference and its connections to a broad range of fields, such as Bayesian analysis, deep learning, information theory, and statistical mechanics.

The tutorial will begin by motivating probabilistic data analysis and the problem of inference for statistical applications, such as density estimation, missing data imputation and model selection, and for industrial problems in search and recommendation, text mining and community discovery. We will then examine importance sampling as one widely-used Monte Carlo inference mechanism and from this begin our journey towards the variational approach for inference. The principle of variational inference and basic tools from variational calculus will be introduced, as well as the class of latent Gaussian models that will be used throughout the tutorial as a running example. Using this foundation, we shall discuss different approaches for approximating posterior distributions, the smorgasbord of techniques for optimising the variational objective function, a discussion of implementation and large-scale applications, a brief look at the available theory for variational methods, and an overview of other variational problems in machine learning and statistics.

Link to slides

TALK

Bayesian Reasoning and Deep Learning


Deep learning and Bayesian machine learning are currently two of the most active areas of machine learning research. Deep learning provides a powerful class of models and an easy framework for learning that now provides state-of-the-art methods for applications ranging from image classification to speech recognition. Bayesian reasoning provides a powerful approach for information integration, inference and decision making that has established it as the key tool for data-efficient learning, uncertainty quantification and robust model composition that is widely used in applications ranging from information retrieval to large-scale ranking. Each of these research areas has shortcomings that can be effectively addressed by the other, pointing towards a needed convergence of these two areas of machine learning; the complementary aspects of these two research areas are the focus of this talk. Using the tools of auto-encoders and latent variable models, we shall discuss some of the ways in which our machine learning practice is enhanced by combining deep learning with Bayesian reasoning. This is an essential, and ongoing, convergence that will only continue to accelerate and provides some of the most exciting prospects, some of which we shall discuss, for contemporary machine learning research.

Link to slides

.04

BLOG

I love writing and exploring the connections between different research areas.
Read my machine learning blog at:

blog.shakirm.com


.05

News and Activities

  • August 2016: I’ll be speaking at the NIPS 2016 Workshop on Bayesian Deep Learning.
  • August 2016: I’ll be an Area Chair for ICLR 2017.
  • August 2016: I gave a tutorial at the 2016 Deep Learning Summer School in Montreal on ‘Building Machines that Imagine and Reason: Principles and Applications of Deep Generative Models’.
  • June 2016: With David Blei and Rajesh Ranganath, we’ll be giving a tutorial at NIPS 2016 entitled ‘Variational Inference: Foundations and Modern Methods’.
  • March 2016: I am one of the organisers of the ICML 2016 Workshop on Data-efficient Machine Learning.
  • August 2015: I am one of the organisers of the NIPS 2015 Workshop on Advances in Approximate Inference.
  • January 2015: I’ll be an Area Chair for NIPS 2015.
  • September 2014: We’ll be at NIPS to discuss our paper on ‘Semi-supervised learning with Deep Generative Models’.
  • August 2014: I am one of the organisers of the NIPS 2014 Workshop on Advances in Variational Inference.
  • June 2014: I gave a talk at ICML in Beijing on our paper on ‘Stochastic Backpropagation and Approximate Inference in Deep Generative Models’.

.06

CONTACT

Please get in touch.

E-mail: shakir.mohamed -at- gmail.com

Twitter: @shakir_za