1.【Blog】Understanding the new Google Translate
Google launched a new version of Translate in September 2016. Since then, there have been a few interesting developments in the project, and this post attempts to explain them in terms that are as simple as possible.
The earlier version of Translate used Phrase-Based Machine Translation, or PBMT. PBMT breaks an input sentence into a set of words and phrases and translates each one individually. This is not an optimal strategy, since it largely misses the context of the overall sentence. The new Translate uses what Google calls *Google Neural Machine Translation* (*GNMT*), an improvement over traditional NMT. Let's see how GNMT works at a high level:
2.【Blog & Code】Self-Organizing Maps with Google’s TensorFlow
A Self-Organizing Map, or SOM, falls under the rare domain of unsupervised learning in Neural Networks. It is essentially a grid of neurons, each denoting one cluster learned during training. Traditionally speaking, there is no concept of neuron ‘locations’ in ANNs. However, in an SOM, each neuron has a location, and neurons that lie close to each other represent clusters with similar properties. Each neuron has a weight vector, which is equal to the centroid of its particular cluster.
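The grid-of-neurons idea can be sketched in a few lines. The following is a minimal illustration in plain Python rather than TensorFlow, so it is not the code from the linked post. It assumes the standard Kohonen update with a Gaussian neighbourhood and linearly decaying learning rate and radius; the function names (`best_matching_unit`, `som_step`, `train_som`) and all default parameters are hypothetical choices made for this example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid location whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda loc: sum((w - xi) ** 2 for w, xi in zip(weights[loc], x)),
    )


def som_step(weights, x, lr, sigma):
    """Pull the best-matching unit and its grid neighbours toward x, in place."""
    bmu = best_matching_unit(weights, x)
    for loc, w in weights.items():
        # Distance is measured on the grid, not in input space: this is what
        # makes nearby neurons end up representing similar clusters.
        grid_dist_sq = (loc[0] - bmu[0]) ** 2 + (loc[1] - bmu[1]) ** 2
        influence = math.exp(-grid_dist_sq / (2.0 * sigma * sigma))
        for i, xi in enumerate(x):
            w[i] += lr * influence * (xi - w[i])
    return bmu


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid location (row, col), randomly initialised.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            decay = 1.0 - step / total_steps
            som_step(weights, x, lr0 * decay, max(sigma0 * decay, 0.1))
            step += 1
    return weights
```

After training, `best_matching_unit(weights, x)` gives the grid location (and hence the cluster) that a new input `x` maps to.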
3.【Blog】Simple Beginner’s guide to Reinforcement Learning & its implementation
One of the most fundamental questions for scientists across the globe has been – “How to learn a new skill?”. The desire to understand the answer is obvious – if we can understand this, we can enable the human species to do things we might not have thought possible before. Alternatively, we can train machines to do more “human” tasks and create true artificial intelligence.
While we don’t have a complete answer to the above question yet, a few things are clear. Irrespective of the skill, we first learn by interacting with the environment. Whether we are learning to drive a car or an infant is learning to walk, the learning is based on interaction with the environment. Learning from interaction is the foundational concept underlying all theories of learning and intelligence.
4.【Paper】Revisiting Visual Question Answering Baselines
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict an answer. This paper questions the value of these common practices and develops a simple alternative model based on binary classification. Instead of treating answers as competing choices, our model receives the answer as input and predicts whether or not an image-question-answer triplet is correct. We evaluate our model on the Visual7W Telling and the VQA Real Multiple Choice tasks, and find that even simple versions of our model perform competitively. Our best model achieves state-of-the-art performance on the Visual7W Telling task and compares surprisingly well with the most complex systems proposed for the VQA Real Multiple Choice task. We explore variants of the model and study its transferability between both datasets. We also present an error analysis of our model that suggests a key problem of current VQA systems lies in the lack of visual grounding of concepts that occur in the questions and answers. Overall, our results suggest that the performance of current VQA systems is not significantly better than that of systems designed to exploit dataset biases.
5.【Tutorial & Code】Introduction to Natural Language Processing with fastText
Natural Language Processing (NLP) is one of the hottest areas in machine learning. Its global purpose is to understand language the way humans do. NLP subareas include machine translation, text classification, speech recognition, sentiment analysis, question answering, text-to-speech, etc.
As in most areas of Machine Learning, NLP accuracy has improved considerably thanks to deep learning. Just to highlight the most recent and impressive achievement, in October 2016 Microsoft Research reached human parity in speech recognition. For that milestone, they used a combination of Convolutional Neural Networks and LSTM networks.
However, not all machine learning is deep learning, and in this notebook I would like to highlight a great example. In the summer of 2016, two interesting NLP papers were published by Facebook Research, Bojanowski et al., 2016 and Joulin et al., 2016. The first one proposed a new method for word embedding and the second one a method for text classification. The authors also open-sourced fastText, a C++ library implementing these methods, which rapidly attracted a lot of interest.
The reason for this interest is that fastText obtains an accuracy in text classification almost as good as the state of the art in deep learning, but it is several orders of magnitude faster. In their paper, the authors compare fastText's accuracy and computation time against deep nets on several datasets. As an example, on the Amazon Polarity dataset, fastText achieves an accuracy of 94.6% in 10s. On the same dataset, the crepe CNN model of Zhang and LeCun, 2016 achieves 94.5% in 5 days, while the Very Deep CNN model of Conneau et al., 2016 achieves 95.7% in 7h. The comparison is not even fair, because while fastText's time is measured on CPUs, the CNN models are run on Tesla K40 GPUs.
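To give a feel for why this kind of model is so cheap to train, here is a minimal sketch of the idea behind the text classifier in Joulin et al.: hash word n-grams into buckets, average their embeddings, and feed the result to a linear softmax layer. It is a toy written in plain Python under several assumptions and is not the fastText library or its API: the class name `AveragedNgramClassifier`, the helper `ngram_ids`, the CRC32 hash, and every default (embedding size, bucket count, learning rate) are hypothetical, and the hierarchical softmax and other optimizations of the real implementation are left out.

```python
import math
import random
import zlib


def ngram_ids(text, buckets):
    """Map a text's unigrams and bigrams to bucket ids (the hashing trick)."""
    tokens = text.lower().split()
    grams = tokens + [a + " " + b for a, b in zip(tokens, tokens[1:])]
    # crc32 rather than hash(): the ids stay stable across Python processes.
    return [zlib.crc32(g.encode("utf-8")) % buckets for g in grams]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AveragedNgramClassifier:
    """Hypothetical toy model: averaged n-gram embeddings + linear softmax."""

    def __init__(self, n_labels, dim=8, buckets=1 << 14, seed=0):
        rng = random.Random(seed)
        self.dim, self.buckets = dim, buckets
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                    for _ in range(buckets)]
        self.out = [[0.0] * dim for _ in range(n_labels)]

    def _forward(self, text):
        ids = ngram_ids(text, self.buckets)
        n = max(len(ids), 1)  # avoid dividing by zero on empty text
        hidden = [sum(self.emb[i][d] for i in ids) / n for d in range(self.dim)]
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.out]
        return ids, hidden, softmax(scores)

    def predict(self, text):
        probs = self._forward(text)[2]
        return max(range(len(probs)), key=probs.__getitem__)

    def fit(self, texts, labels, epochs=300, lr=0.2):
        """Plain SGD on the cross-entropy loss, one example at a time."""
        for _ in range(epochs):
            for text, label in zip(texts, labels):
                ids, hidden, probs = self._forward(text)
                grad_hidden = [0.0] * self.dim
                for k, row in enumerate(self.out):
                    err = probs[k] - (1.0 if k == label else 0.0)
                    for d in range(self.dim):
                        grad_hidden[d] += err * row[d]  # read before updating
                        row[d] -= lr * err * hidden[d]
                for i in ids:
                    for d in range(self.dim):
                        self.emb[i][d] -= lr * grad_hidden[d] / len(ids)
        return self
```

A usage example on a made-up four-sentence dataset:

```python
texts = ["great movie loved it", "terrible film hated it",
         "loved this great product", "hated this terrible product"]
labels = [1, 0, 1, 0]
model = AveragedNgramClassifier(n_labels=2).fit(texts, labels)
print([model.predict(t) for t in texts])
```

Because both layers are linear and the features are a simple average, each training step touches only the handful of embeddings present in one sentence, which is part of why this family of models trains quickly on CPUs.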