Papers to Read

The 30 Machine Learning White Papers AI Suggested I Read

The Transformer, a novel neural network architecture, is presented in the 2017 paper "Attention Is All You Need" by Vaswani et al. The model dispenses with the recurrent and convolutional layers commonly used for sequence processing and relies entirely on multi-head self-attention. The paper evaluates the Transformer on machine translation, where it outperforms the existing state of the art on the WMT 2014 English-to-German and English-to-French benchmarks while requiring less training time, and also shows that it generalizes to English constituency parsing. The paper's findings have had a significant impact on natural language processing and have inspired a large body of further research on self-attention mechanisms.
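As a rough illustration of the core mechanism, here is a minimal NumPy sketch of scaled dot-product self-attention for a single head. The random projection matrices stand in for learned weights and the dimensions are arbitrary, so this is an assumption-laden toy rather than the paper's full multi-head, multi-layer model.

```python
# Minimal sketch of scaled dot-product self-attention:
# Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V.
# The projection matrices below are random stand-ins for trained weights.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d_k):
    """x: (seq_len, d_model) token embeddings; returns (seq_len, d_k)."""
    d_model = x.shape[-1]
    rng = np.random.default_rng(0)
    W_q = rng.normal(size=(d_model, d_k))  # query projection (untrained)
    W_k = rng.normal(size=(d_model, d_k))  # key projection
    W_v = rng.normal(size=(d_model, d_k))  # value projection

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)        # similarity of every token pair
    weights = softmax(scores, axis=-1)     # attention distribution per token
    return weights @ V                     # weighted sum of value vectors

tokens = np.random.default_rng(1).normal(size=(6, 32))  # 6 tokens, d_model=32
print(self_attention(tokens, d_k=16).shape)             # (6, 16)
```

Multi-head attention in the paper runs several such projections in parallel and concatenates their outputs before a final linear layer.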


The 2014 paper by Goodfellow et al. introduces Generative Adversarial Networks (GANs), a class of deep learning models that learn to generate new data samples resembling a given dataset. A GAN consists of a generator and a discriminator trained simultaneously in an adversarial game: the generator learns to produce samples that fool the discriminator, while the discriminator learns to distinguish real samples from generated ones. The paper includes a theoretical analysis showing that, at the game's equilibrium, the generator recovers the training data distribution, and demonstrates the approach with image-generation experiments. GANs have become a popular and powerful tool in deep learning and have been used in applications such as image and speech synthesis, data generation, and anomaly detection.
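Below is a minimal sketch of the adversarial training loop, assuming PyTorch and a toy one-dimensional Gaussian as the "real" data. The network sizes, optimizer settings, and target distribution are illustrative choices on my part, not the paper's experimental setup.

```python
# Toy GAN training loop: alternate discriminator and generator updates.
import torch
import torch.nn as nn

latent_dim, data_dim = 8, 1
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0   # samples from the target distribution
    fake = G(torch.randn(64, latent_dim))          # samples from the generator

    # Discriminator update: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: push the discriminator's output on fakes toward 1
    # (the non-saturating variant of the objective discussed in the paper).
    g_loss = bce(D(G(torch.randn(64, latent_dim))), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```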


In 2015, Ioffe and Szegedy introduced Batch Normalization, a technique that improves the training of deep neural networks by reducing internal covariate shift. Internal covariate shift occurs when the distribution of a layer's inputs changes during training as the preceding layers' parameters are updated, making the network harder to optimize. Batch Normalization normalizes each layer's inputs by subtracting the mini-batch mean and dividing by the mini-batch standard deviation, then applies a learned scale and shift so the layer retains its representational capacity. The paper shows that Batch Normalization improves performance on image classification and allows much higher learning rates (and less careful initialization) during training. It has become a standard technique for training deep neural networks and is used in applications such as image recognition, natural language processing, and speech recognition.
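The training-time transform itself is short; the NumPy sketch below normalizes each feature over the mini-batch and then applies the learned scale and shift, here initialized to 1 and 0 purely for illustration. At inference time the paper uses running estimates of the mean and variance instead of batch statistics.

```python
# Batch normalization at training time: per-feature normalization over the batch.
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x: (batch, features). gamma/beta: learned per-feature scale and shift."""
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # restore representational capacity

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(128, 4))
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 per feature
```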


The 2015 paper by He et al. introduces ResNet, a deep neural network architecture that uses residual connections to address the problem of vanishing gradients in very deep networks. Each residual block adds its input directly to the output of a small stack of layers via an identity shortcut, so the stack only has to learn a residual correction and gradients can flow straight through the skip connection. This alleviates the vanishing gradient problem and enables the training of networks with hundreds of layers. The paper shows that ResNet achieves state-of-the-art results on image recognition benchmarks, including the ImageNet classification task. ResNet has had a significant impact on deep learning and has been widely adopted in applications such as object recognition, image segmentation, and speech recognition.
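Here is a minimal PyTorch sketch of a basic residual block, assuming the input and output have the same number of channels so the identity shortcut can be added without a projection; the channel count and input shape are illustrative.

```python
# Basic residual block: two 3x3 convolutions plus an identity skip connection.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # add the identity shortcut, then activate

block = ResidualBlock(channels=64)
x = torch.randn(1, 64, 32, 32)      # (batch, channels, height, width)
print(block(x).shape)               # torch.Size([1, 64, 32, 32])
```

When the number of channels or the spatial resolution changes between blocks, the paper instead uses a strided 1x1 convolution on the shortcut so the shapes still match for the addition.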