The Evolved Transformer – Enhancing Transformer with Neural Architecture Search

Neural architecture search (NAS) is the process of algorithmically searching for new designs of neural networks. Though researchers have developed sophisticated architectures over the years, manual design limits our ability to find the most efficient ones, and NAS has recently reached the point where it can outperform human-designed models. A new paper by Google Brain presents […]
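To make the idea concrete, here is a minimal, hypothetical sketch of evolution-based NAS: candidate architectures are sampled from a small search space, scored, and improved via tournament selection. The search space, mutation rule, and fitness function below are illustrative stand-ins, not the paper's actual (and far more costly) search over Transformer components.

```python
import random

# Hypothetical toy search space; a real one would describe model components.
SEARCH_SPACE = {
    "num_layers": [2, 4, 6, 8],
    "hidden_dim": [128, 256, 512],
    "activation": ["relu", "gelu", "swish"],
}

def random_arch():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(arch):
    # Change a single randomly chosen design decision.
    child = dict(arch)
    key = random.choice(list(SEARCH_SPACE))
    child[key] = random.choice(SEARCH_SPACE[key])
    return child

def fitness(arch):
    # Placeholder score; in a real search this would train the candidate
    # model and return its validation performance.
    return -abs(arch["num_layers"] - 4) - abs(arch["hidden_dim"] - 256) / 128

random.seed(0)
population = [random_arch() for _ in range(10)]
for _ in range(50):
    # Tournament step: the fittest of a random sample breeds a mutated
    # child that replaces the weakest of the sample.
    contenders = random.sample(population, 3)
    winner = max(contenders, key=fitness)
    loser = min(contenders, key=fitness)
    population.remove(loser)
    population.append(mutate(winner))

print(max(population, key=fitness))
```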

XLM – Enhancing BERT for Cross-lingual Language Modeling

Attention models, and BERT in particular, have achieved promising results in Natural Language Processing, in both classification and translation tasks. A new paper by Facebook AI, named XLM, presents an improved version of BERT that achieves state-of-the-art results in both types of tasks. XLM uses a known pre-processing technique, Byte Pair Encoding (BPE), and a dual-language training mechanism […]
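BPE itself is easy to sketch: starting from individual characters, it repeatedly merges the most frequent adjacent symbol pair into a new vocabulary entry. Below is a minimal toy version; the corpus and merge count are illustrative, and real implementations add word-boundary handling and larger vocabularies.

```python
from collections import Counter

def learn_bpe(words, num_merges):
    # Each word starts as a sequence of characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the winning pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe(["low", "lower", "lowest", "newest", "widest"], 5))
```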

Transformer-XL – Combining Transformers and RNNs Into a State-of-the-art Language Model

Language modeling has become an important technique thanks to its applicability to a variety of NLP tasks, such as machine translation and topic classification. Today, there are two leading architectures for language modeling – Recurrent Neural Networks (RNNs) and Transformers. While the former handles the input tokens – words or characters – one […]
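The paper's core trick, segment-level recurrence, can be sketched in a few lines: hidden states computed for the previous segment are cached, detached from the gradient graph, and reused as extra attention context for the current segment. This is a minimal, hypothetical single-layer sketch in PyTorch; the dimensions are arbitrary, and the paper's relative positional encoding is omitted.

```python
import torch

d_model, seg_len = 64, 16
attn = torch.nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

def forward_segment(x, memory):
    # Keys/values see both the cached memory and the current segment;
    # queries come from the current segment only.
    context = torch.cat([memory, x], dim=1) if memory is not None else x
    out, _ = attn(x, context, context)
    # Cache the current segment's states, detached so no gradients
    # flow back into previous segments.
    return out, x.detach()

memory = None
stream = torch.randn(1, seg_len * 3, d_model)  # a long sequence in 3 segments
for seg in stream.split(seg_len, dim=1):
    out, memory = forward_segment(seg, memory)
print(out.shape)  # torch.Size([1, 16, 64])
```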

Multilingual Sentence Embeddings for Zero-Shot Transfer – Applying a Single Model to 93 Languages

Language models and transfer learning have become cornerstones of NLP in recent years. Phenomenal results were achieved by first building a model of words or even characters, and then using that model to solve other tasks such as sentiment analysis and question answering. While most of the models were built for […]
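The zero-shot transfer workflow can be sketched as follows: train a classifier on sentence embeddings in one language, then apply it unchanged to another, relying on the shared multilingual embedding space. Here `embed` is a hypothetical stand-in (random vectors) for a real multilingual sentence encoder such as the one the paper describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def embed(sentences):
    # Hypothetical stand-in: a real multilingual encoder maps sentences
    # from any covered language into one shared vector space, so that
    # translations land near each other.
    return rng.normal(size=(len(sentences), 512))

# Train a classifier on English embeddings only...
X_train = embed(["great movie", "terrible plot"])
clf = LogisticRegression().fit(X_train, [1, 0])

# ...then apply it, zero-shot, to a language never seen during training.
print(clf.predict(embed(["película estupenda"])))
```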

Kuzushiji-MNIST – Japanese Literature Alternative Dataset for Deep Learning Tasks

MNIST, a dataset of 70,000 labeled images of handwritten digits, has been one of the most popular datasets for image processing and classification for over twenty years. Despite its popularity, contemporary deep learning algorithms handle it easily, often surpassing 99.5% accuracy. A new paper introduces Kuzushiji-MNIST, an alternative dataset which is more […]
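Because Kuzushiji-MNIST mirrors MNIST's format (70,000 28x28 grayscale images, 10 classes, a 60k/10k train/test split), swapping it into an existing pipeline is trivial. For instance, torchvision ships it as KMNIST:

```python
import torchvision

# Drop-in replacement for torchvision.datasets.MNIST.
train = torchvision.datasets.KMNIST(root="data", train=True, download=True)
test = torchvision.datasets.KMNIST(root="data", train=False, download=True)
print(len(train), len(test))  # 60000 10000
print(train.classes)          # the ten cursive kana classes
```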

HMTL – Multi-task Learning for Solving NLP Tasks

The field of Natural Language Processing includes dozens of tasks, among them machine translation, named-entity recognition, and entity detection. While the different NLP tasks are usually trained and evaluated separately, combining them into one model holds a potential advantage: learning one task might help in learning another task and improve its […]
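The basic multi-task setup is a shared encoder with one head per task, trained jointly so that gradients from every task shape the shared representation. A minimal sketch, assuming a toy two-task setup with arbitrary layer sizes; this illustrates the general pattern, not the paper's hierarchical model.

```python
import torch

class MultiTaskModel(torch.nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        # Shared encoder: both tasks backpropagate into these weights.
        self.shared = torch.nn.Sequential(
            torch.nn.Embedding(vocab_size, d_model),
            torch.nn.LSTM(d_model, d_model, batch_first=True),
        )
        # Task-specific output heads (sizes are hypothetical).
        self.ner_head = torch.nn.Linear(d_model, 5)       # e.g. 5 entity tags
        self.relation_head = torch.nn.Linear(d_model, 3)  # e.g. 3 relations

    def forward(self, tokens, task):
        hidden, _ = self.shared(tokens)
        head = self.ner_head if task == "ner" else self.relation_head
        return head(hidden)

model = MultiTaskModel()
tokens = torch.randint(0, 1000, (2, 10))  # a toy batch of token ids
print(model(tokens, "ner").shape)         # torch.Size([2, 10, 5])
```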

BERT – State of the Art Language Model for NLP

BERT (Bidirectional Encoder Representations from Transformers) is a recent paper published by researchers at Google AI Language. It has caused a stir in the Machine Learning community by presenting state-of-the-art results in a wide variety of NLP tasks, including Question Answering (SQuAD v1.1), Natural Language Inference (MNLI), and others. BERT’s key technical innovation is applying […]
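At the heart of BERT's pre-training is the masked-language-modeling objective: hide a fraction of the input tokens and train the model to recover them from context on both sides. A simplified sketch of the masking step; the 15% rate matches the paper, though the full scheme also sometimes substitutes random or unchanged tokens instead of [MASK].

```python
import random

MASK, RATE = "[MASK]", 0.15  # the paper masks 15% of input tokens

def mask_for_mlm(tokens):
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < RATE:
            inputs.append(MASK)
            targets.append(tok)   # the model must predict the original token
        else:
            inputs.append(tok)
            targets.append(None)  # no loss on unmasked positions
    return inputs, targets

random.seed(2)
print(mask_for_mlm("the cat sat on the mat".split()))
```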