Masked Autoencoder for Distribution Estimation (MADE) [3], by Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle (with DeepMind among the affiliations), masks the autoencoder's parameters to respect autoregressive constraints: each input is reconstructed only from previous inputs in a given ordering. In an unmasked autoencoder the implied data distribution isn't normalized; with the masks in place, the outputs can be read as conditional probabilities whose product is a properly normalized joint distribution. This simple modification for autoencoder neural networks yields powerful generative models and is significantly faster and scales better than other autoregressive estimators.

Separately, inspired by the pretraining algorithm of BERT (Devlin et al., 2019), a nascent set of methods is based on a mask-and-reconstruct training mechanism. In that line of work, the authors propose a simple yet effective method to pretrain large vision models (here ViT Huge).

Complete code is stored in the accompanying GitHub repository. This repository is for the original Theano implementation.

An autoregressively masked dense layer.
units: Python int scalar representing the dimensionality of the output space.
params: integer specifying the number of parameters to output per input.

Dependencies: python = 2.7; numpy >= 1.9.1; scipy >= 0.14; theano >= 0.9
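
The masking idea described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code from the Theano repository: it assumes binary inputs, the natural ordering 1..D, randomly drawn hidden-unit degrees, and ReLU hidden units. The function names (build_made_masks, init_params, made_forward, log_prob) are hypothetical and chosen only for this example.

```python
import itertools
import numpy as np

def build_made_masks(n_inputs, hidden_sizes, seed=0):
    """Binary masks for a MADE-style network (natural input ordering assumed)."""
    rng = np.random.default_rng(seed)
    # The degree of input d is its position in the ordering: 1..D.
    degrees = [np.arange(1, n_inputs + 1)]
    for size in hidden_sizes:
        # Hidden degrees are drawn between the smallest previous degree and D-1.
        low = min(int(degrees[-1].min()), n_inputs - 1)
        degrees.append(rng.integers(low, n_inputs, size=size))
    masks = []
    for prev, curr in zip(degrees[:-1], degrees[1:]):
        # A hidden unit may read a unit of the previous layer with degree <= its own.
        masks.append((curr[:, None] >= prev[None, :]).astype(np.float64))
    # Output d may read only hidden units with degree strictly below d.
    masks.append((degrees[0][:, None] > degrees[-1][None, :]).astype(np.float64))
    return masks

def init_params(n_inputs, hidden_sizes, seed=1):
    """Random weights of shape (n_out, n_in) and zero biases (illustrative only)."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, n_inputs]
    weights = [rng.normal(0.0, 0.5, size=(n_out, n_in))
               for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n_out) for n_out in sizes[1:]]
    return weights, biases

def made_forward(x, weights, biases, masks):
    """Return p(x_d = 1 | x_<d) for each d, using elementwise-masked weights."""
    h = x
    for W, b, M in zip(weights[:-1], biases[:-1], masks[:-1]):
        h = np.maximum(0.0, h @ (W * M).T + b)  # ReLU hidden layer
    logits = h @ (weights[-1] * masks[-1]).T + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))

def log_prob(x, weights, biases, masks):
    """Sum of Bernoulli log-conditionals, i.e. log p(x) under the ordering."""
    p = made_forward(x, weights, biases, masks)
    return np.sum(x * np.log(p) + (1.0 - x) * np.log1p(-p), axis=-1)

D = 4
masks = build_made_masks(D, [16, 16])
weights, biases = init_params(D, [16, 16])

# Entry (d, j) counts the paths from input j to output d; it should be zero for j >= d.
connectivity = masks[2] @ masks[1] @ masks[0]

# Summing p(x) over all 2^D binary vectors should give a value close to 1.
all_x = np.array(list(itertools.product([0.0, 1.0], repeat=D)))
total = np.exp(log_prob(all_x, weights, biases, masks)).sum()
print(connectivity)
print(total)
```

The connectivity matrix comes out strictly lower triangular, which is the autoregressive property: output d never sees input d or any later input. The sum over all binary vectors being close to 1 illustrates why the masked network defines a normalized distribution even with untrained weights.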