Exploring the Lottery Ticket Hypothesis

Pruning is a well-known machine learning technique in which unnecessary weights are removed from a neural network after training. In some cases, pruning can reduce model size by more than 90% without compromising accuracy, while potentially offering a significant reduction in inference memory usage. In 2018, […]
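One common form of the idea above is magnitude pruning: zero out the fraction of weights with the smallest absolute values. A minimal sketch (the function name and the 90% sparsity level are illustrative, not from any specific paper):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, 0.9)  # keeps only the largest-magnitude weights
```

In practice, libraries apply such masks layer by layer and often iterate pruning with fine-tuning rather than pruning once at the end.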

Attention Augmented Convolutional Networks

Convolutional neural networks have proven to be a powerful tool for image recognition, driving ever-improving results in image classification (ImageNet), object detection (COCO), and other tasks. Despite their success, convolutions are limited by their locality, i.e. their inability to model relations between distant areas of an image. On the other hand, a popular mechanism […]
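The non-local mechanism in question is self-attention: flatten the spatial grid into tokens and let every position attend to every other, so the output at each location can draw on global context. A minimal single-head sketch with NumPy (projection shapes are illustrative; the paper itself concatenates attention output with ordinary convolutional features):

```python
import numpy as np

def spatial_self_attention(x, wq, wk, wv):
    """Single-head self-attention over all spatial positions of a feature map.

    x: (H, W, C) feature map; wq/wk/wv: (C, d) projection matrices.
    Unlike a local convolution kernel, every position attends to every other.
    """
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)              # flatten spatial grid to tokens
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (HW, HW) pairwise affinities
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over all positions
    return (attn @ v).reshape(h, w, -1)       # back to a spatial map

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 8, 16))
wq, wk, wv = (rng.normal(size=(16, 8)) for _ in range(3))
y = spatial_self_attention(x, wq, wk, wv)     # (8, 8, 8) globally-mixed features
```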

TossingBot – Teaching Robots to Throw Objects Accurately

One of the most well-known challenges in robotics is ‘picking’, i.e. using a robotic claw to lift a single object, usually from a cluttered 3-dimensional pile of objects. Picking an object from a pile and moving it to a destination can be a useful capability in many real-world situations, but human experience shows that in […]

The Evolved Transformer – Enhancing Transformer with Neural Architecture Search

Neural architecture search (NAS) is the process of algorithmically searching for new designs of neural networks. Though researchers have hand-designed sophisticated architectures over the years, human ability to find the most efficient designs is limited, and NAS has recently reached the point where it can outperform human-designed models. A new paper by Google Brain presents […]

BagNet – Solving ImageNet with a Simple Bag-of-features Model

Prior to 2012, most machine learning algorithms were statistical models that used hand-crafted features. These models were highly explainable and somewhat effective but failed to reach high accuracy in many language and computer vision tasks. In 2012, AlexNet, a deep neural network model, won the ImageNet competition by a large margin and ignited […]

XLM – Enhancing BERT for Cross-lingual Language Model

Attention models, and BERT in particular, have achieved promising results in Natural Language Processing, in both classification and translation tasks. A new paper by Facebook AI, named XLM, presents an improved version of BERT to achieve state-of-the-art results in both types of tasks. XLM uses a known pre-processing technique (BPE) and a dual-language training mechanism […]
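The pre-processing technique mentioned, BPE (byte-pair encoding), builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair in the corpus; XLM applies a vocabulary of this kind shared across languages. A minimal sketch of the merge-learning loop (function name and toy corpus are illustrative):

```python
from collections import Counter

def bpe_merges(corpus_words, num_merges):
    """Learn BPE merge rules: fuse the most frequent adjacent symbol pair."""
    # Start with each word as a tuple of single characters.
    vocab = Counter(tuple(word) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        merged = best[0] + best[1]
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

merges = bpe_merges(["low", "low", "lower", "lowest"], 2)
# First two merges fuse 'l'+'o' and then 'lo'+'w', since "lo"/"low" dominate.
```

Applying the learned merges in order to new text yields the subword tokens the model actually consumes.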

Identifying and Correcting Label Bias in Machine Learning

As machine learning (ML) becomes more effective and widespread, it is becoming more prevalent in systems with real-life impact, from loan recommendations to job application decisions. With this growing usage comes the risk of bias – biased training data could lead to biased ML algorithms, which in turn could perpetuate discrimination and bias in society. […]

Transformer-XL – Combining Transformers and RNNs Into a State-of-the-art Language Model

Language modeling has become an important NLP technique thanks to its applicability to various tasks, such as machine translation and topic classification. Today, there are two leading architectures for language modeling – Recurrent Neural Networks (RNNs) and Transformers. While the former handles the input tokens – words or characters – one […]

InstaGAN – Instance-aware Image-to-image Translation – Using GANs for Object Transfiguration

Generative Adversarial Networks (GANs) have been used for many image processing tasks, among them generating images from scratch (style-based GANs) and applying new styles to existing images. A new paper, named InstaGAN, presents an innovative use of GANs – transfiguring instances of a given object in an image into another object while preserving the rest of […]

Multilingual Sentence Embeddings for Zero-Shot Transfer – Applying a Single Model on 93 Languages

Language models and transfer learning have become cornerstones of NLP in recent years. Phenomenal results were achieved by first building a model of words or even characters, and then using that model to solve other tasks such as sentiment analysis, question answering, and others. While most of the models were built for […]