1.【Dataset】Microsoft Translator publicly releases speech translation corpus
As part of an ongoing effort within Microsoft to improve the accuracy of artificial intelligence (AI) systems, Microsoft Translator is publicly releasing a set of data that includes multiple conversations between bilingual speakers who are speaking French, German and English.
This corpus, which was produced by Microsoft using bilingual speakers, aims to create a standard by which people can measure how well their conversational speech translation systems work. It can serve as a standardized data set for testing bilingual conversational speech translation systems such as the Microsoft Translator live feature and Skype Translator.
Christian Federmann, a senior program manager working with the Microsoft Translator team, said there aren’t many standardized data sets for testing bilingual conversational speech translation systems. “You need high-quality data in order to have high-quality testing,” Federmann said.
The Microsoft team hopes the corpus, which is freely available, will benefit the entire field of conversational translation and help to create more standardized benchmarks that researchers can use to measure their work against others.
“This helps propel the field forward,” said Will Lewis, a principal technical program manager with the Microsoft Translator team who also worked on the project.
Download the Microsoft Speech Language Translation corpus here.
Learn more about this release as well as other ways Microsoft is working to make AI smarter and more accurate in the Microsoft Research blog.
2.【Blog】Cognitive Machine Learning (1): Learning to Explain
This is an image of the Zaamenkomst panel: one of the best remaining exemplars of rock art from the San people of Southern Africa. As soon as you see it, you are inevitably herded, like the eland in the scene, through a series of thoughts. Does it have a meaning? Why are the eland running? What do the white lines coming from the mouths of the humans and animals signify? What event is unfolding in this scene? These are questions of interpretation, and of explanation. Explanation is something people actively seek, and can almost effortlessly provide. Having lost their traditional knowledge, the descendants of the San are unable to explain what the scene in the Zaamenkomst panel means. In what ways can machine learning systems be saved from such a fate? For this, we turn to the psychology of explanation, the topic we explore in this post.
3.【Blog】Scientific Data Processing
Machine learning is a technique of growing importance, as the datasets that experimental sciences face are rapidly growing in size. The problems it tackles range from building a prediction function that links different observations, to classifying observations, to learning the structure in an unlabeled dataset.
This tutorial will explore statistical learning, that is, the use of machine learning techniques with the goal of statistical inference: drawing conclusions from the data at hand.
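As a rough illustration of the first two problem types mentioned above, here is a minimal sketch in plain Python. It is not taken from the tutorial itself: the function names `fit_line` and `nearest_centroid` and the toy data are assumptions made up for this example, and a real workflow would normally rely on a dedicated library.

```python
# Toy sketch (hypothetical names and data, not from the tutorial).

def fit_line(xs, ys):
    """Build a prediction function y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def nearest_centroid(point, labeled):
    """Classify a 1-D observation by the label whose mean value is closest."""
    centroids = {label: sum(vals) / len(vals) for label, vals in labeled.items()}
    return min(centroids, key=lambda label: abs(point - centroids[label]))


# Prediction: made-up observations that follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Classification: made-up labeled observations in two groups.
labeled = {"low": [1.0, 2.0, 3.0], "high": [8.0, 9.0, 10.0]}
label_a = nearest_centroid(4.0, labeled)
label_b = nearest_centroid(7.0, labeled)
```

With this toy data, `fit_line` recovers the line used to generate the points, and `nearest_centroid` assigns each new observation to the group whose mean it lies closest to.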
4.【Blog】Automated Machine Learning – AI software that writes itself
On January 18, MIT Technology Review posted a nice article about current developments in Automated Machine Learning.
Automated Machine Learning is a new avenue of research in which developers and researchers aim to produce software that can write software on its own. More specifically, this automated software aims to write machine learning pipelines and libraries. That is, the goal is for Artificial Intelligence to become more and more autonomous and able to train and test itself.
What is remarkable in the MIT post is the number of research papers it points to, each with its own insight into this endeavour, though some are more advanced or promising than others.
5.【Resource】15 Deep Learning Tutorials
This reference is part of a new series of DSC articles, offering selected tutorials on subjects such as deep learning, machine learning, data science, deep data science, artificial intelligence, Internet of Things, algorithms, and related topics. It is designed for the busy reader who does not have a lot of time to dig into long lists of advanced publications.