The Evolved Transformer – Enhancing Transformer with Neural Architecture Search

Neural architecture search (NAS) is the process of algorithmically searching for new designs of neural networks. While researchers have hand-crafted increasingly sophisticated architectures over the years, manual exploration of the design space is inherently limited, and NAS has recently reached the point where it can outperform human-designed models. A new paper by Google Brain presents […]
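The evolutionary flavor of NAS can be illustrated with a toy sketch: tournament selection with aging over a made-up search space. Everything below is a simplified illustration, not the paper's implementation, and the fitness function is a hypothetical stand-in for actually training and evaluating each candidate model:

```python
import random

random.seed(0)

# Toy search space: an "architecture" is a fixed-depth list of layer choices.
CHOICES = ["conv3", "conv5", "attention", "identity"]
DEPTH = 6

def fitness(arch):
    # Hypothetical stand-in for real evaluation: real NAS would train the
    # candidate model and measure its validation quality.
    score = {"identity": 0, "conv3": 1, "conv5": 2, "attention": 3}
    return sum(score[op] for op in arch)

def mutate(arch):
    # Change one randomly chosen layer to a random alternative.
    child = list(arch)
    child[random.randrange(DEPTH)] = random.choice(CHOICES)
    return child

def evolve(pop_size=20, tournament=5, steps=200):
    population = [[random.choice(CHOICES) for _ in range(DEPTH)]
                  for _ in range(pop_size)]
    for _ in range(steps):
        # Tournament selection: mutate the fittest of a random sample and
        # retire the oldest individual (aging / regularized evolution).
        sample = random.sample(population, tournament)
        parent = max(sample, key=fitness)
        population.pop(0)
        population.append(mutate(parent))
    return max(population, key=fitness)

best = evolve()
```

Under selection pressure, the surviving architectures drift toward high-fitness layer choices even though each mutation is blind.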

XLM – Enhancing BERT for Cross-lingual Language Modeling

Attention models, and BERT in particular, have achieved promising results in Natural Language Processing, in both classification and translation tasks. A new paper by Facebook AI, named XLM, presents an improved version of BERT that achieves state-of-the-art results in both types of tasks. XLM uses a known pre-processing technique, Byte-Pair Encoding (BPE), and a dual-language training mechanism […]
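BPE itself is simple to sketch: it repeatedly merges the most frequent adjacent symbol pair in the corpus, so common substrings become single tokens. A minimal illustrative version (the toy corpus and all helper names are our own, not XLM's code):

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn BPE merge rules from a {word: count} corpus."""
    # Start with each word as a sequence of single characters.
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        # Apply the merge everywhere it occurs.
        new_vocab = {}
        for symbols, count in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = new_vocab.get(tuple(merged), 0) + count
        vocab = new_vocab
    return merges

# Toy corpus: the shared prefix "low" gets merged into a token first.
rules = learn_bpe({"low": 5, "lower": 2, "lowest": 3}, 3)
# rules == [("l", "o"), ("lo", "w"), ("low", "e")]
```

Because merges are learned from frequencies rather than a fixed word list, BPE yields a subword vocabulary that can be shared across languages with overlapping scripts.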

Transformer-XL – Combining Transformers and RNNs Into a State-of-the-art Language Model

Language modeling has become an important NLP technique thanks to its applicability to a variety of tasks, such as machine translation and topic classification. Today, there are two leading architectures for language modeling – Recurrent Neural Networks (RNNs) and Transformers. While the former handles the input tokens – words or characters – one […]

InstaGAN – Instance-aware Image-to-image Translation – Using GANs for Object Transfiguration

Generative Adversarial Networks (GANs) have been used for many image processing tasks, among them, generating images from scratch (Style-based GANs) and applying new styles to images. A new paper, named InstaGAN, presents an innovative use of GANs – transfiguring instances of a given object in an image into another object while preserving the rest of […]

Multilingual Sentence Embeddings for Zero-Shot Transfer – Applying a Single Model on 93 Languages

Language models and transfer learning have become cornerstones of NLP in recent years. Phenomenal results were achieved by first building a model of words or even characters, and then using that model to solve other tasks such as sentiment analysis and question answering. While most of the models were built for […]

Style-based GANs – Generating and Tuning Realistic Artificial Faces

Generative Adversarial Networks (GANs) are a relatively new concept in Machine Learning, first introduced in 2014. Their goal is to synthesize artificial samples, such as images, that are indistinguishable from authentic ones. A common example of a GAN application is generating artificial face images by learning from a dataset of celebrity […]

Kuzushiji-MNIST – Japanese Literature Alternative Dataset for Deep Learning Tasks

MNIST, a dataset with 70,000 labeled images of handwritten digits, has been one of the most popular datasets for image processing and classification for over twenty years. Despite its popularity, contemporary deep learning algorithms handle it easily, often surpassing 99.5% accuracy. A new paper introduces Kuzushiji-MNIST, an alternative dataset which is more […]

Go-Explore – RL Algorithm for Exploration Problems – Solving Montezuma’s Revenge

Uber has released a blog post describing Go-Explore, a new Reinforcement Learning (RL) algorithm to deal with hard exploration problems. These problems are characterized by the lack (or sparsity) of external feedback, which makes it difficult for the algorithm to learn how to operate. One of the popular testbeds for RL algorithms is Atari Games, […]
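The core loop of Go-Explore – keep an archive of visited states ("cells"), deterministically return to a promising one, then explore from it – can be sketched on a toy sparse-reward environment. Everything below (the environment, the cell definition, the uniform selection rule) is a simplified illustration, not Uber's implementation:

```python
import random

class ChainEnv:
    """Toy sparse-reward environment: reward only at the far end of a chain."""
    N = 20

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action is -1 or +1
        self.state = max(0, min(self.N, self.state + action))
        reward = 1.0 if self.state == self.N else 0.0
        return self.state, reward

def go_explore(episodes=200, explore_steps=10, seed=0):
    rng = random.Random(seed)
    env = ChainEnv()
    # Archive: each visited cell (here simply the state) maps to the
    # shortest known action sequence that reaches it.
    archive = {env.reset(): []}
    for _ in range(episodes):
        # 1. Select a cell from the archive (here: uniformly at random).
        cell = rng.choice(sorted(archive))
        # 2. "Go": deterministically replay the stored trajectory.
        env.reset()
        for action in archive[cell]:
            env.step(action)
        trajectory = list(archive[cell])
        # 3. "Explore": act randomly from the restored state.
        for _ in range(explore_steps):
            action = rng.choice([-1, 1])
            state, _reward = env.step(action)
            trajectory.append(action)
            # 4. Archive new cells, or shorter routes to known ones.
            if state not in archive or len(trajectory) < len(archive[state]):
                archive[state] = list(trajectory)
    return archive

archive = go_explore()
```

Because the agent first returns to a frontier cell before exploring, random actions are spent pushing past the frontier instead of rediscovering it, which is what makes the approach effective when rewards are sparse.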

HMTL – Multi-task Learning for Solving NLP Tasks

The field of Natural Language Processing includes dozens of tasks, among them machine translation, named-entity recognition, and entity detection. While the different NLP tasks are often trained and evaluated separately, there is a potential advantage in combining them into one model, since learning one task may help in learning another and improve its […]

Curiosity-Driven Learning – Exploration By Random Network Distillation

In recent years, Reinforcement Learning has proven itself to be a powerful technique for solving closed-ended tasks with well-defined rewards, most commonly games. A major challenge in the field remains training a model when external feedback (reward) for actions is sparse or nonexistent. Recent models have tried to overcome this challenge by creating an intrinsic […]
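In Random Network Distillation, the intrinsic reward is the error of a predictor network trained to imitate a fixed, randomly initialized target network, so the reward shrinks for states the agent has seen often. A minimal sketch with tiny linear stand-ins for the two networks (the dimensions and learning rate are arbitrary, and real RND uses deep networks over observations):

```python
import random

random.seed(0)
DIM, OUT = 6, 3  # arbitrary toy observation / embedding sizes

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) / rows ** 0.5 for _ in range(cols)]
            for _ in range(rows)]

def apply(mat, x):
    return [sum(x[i] * mat[i][j] for i in range(len(x)))
            for j in range(len(mat[0]))]

target = rand_matrix(DIM, OUT)     # fixed random network, never trained
predictor = rand_matrix(DIM, OUT)  # trained to imitate the target

def intrinsic_reward(obs):
    # Prediction error of the predictor against the frozen target.
    p, t = apply(predictor, obs), apply(target, obs)
    return sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / OUT

def train_step(obs, lr=0.05):
    # One SGD step on the distillation (mean squared error) loss.
    p, t = apply(predictor, obs), apply(target, obs)
    for i in range(DIM):
        for j in range(OUT):
            predictor[i][j] -= lr * 2 * (p[j] - t[j]) * obs[i] / OUT

# A state visited many times becomes "familiar": its intrinsic reward drops,
# steering the agent toward states it has not yet learned to predict.
obs = [random.gauss(0, 1) for _ in range(DIM)]
before = intrinsic_reward(obs)
for _ in range(300):
    train_step(obs)
after = intrinsic_reward(obs)
```

The target network never changes, so the prediction error is a stable novelty signal: it cannot be driven down except by actually visiting a state and training on it.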