1.【Blog】Can NLP Reveal Power Imbalances?
Last Friday, the NLP & Text-As-Data Seminar heard from Vinodkumar Prabhakaran, a postdoctoral fellow at Stanford University specializing in computational sociolinguistics. One of his research projects focuses on the workplace — today, 96% of all office communication, Prabhakaran explained, occurs through media like email. But although email may be more convenient, it has also led more people to speak online in ways they would not during face-to-face communication. For example, people may feel comfortable speaking more sharply over email than they would in person, thanks to the detached quasi-anonymity of a digital screen.
2.【paper & talk】Cross-Language Text Classification using Structural Correspondence Learning
We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
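The core idea — bridging languages through a word translation oracle — can be sketched very roughly. The snippet below is a deliberately naive illustration, not the paper's method: it translates target-language words into the source-language feature space via a toy dictionary and scores them against class word counts, whereas the paper queries the oracle only for selected pivot words and learns correspondences via SCL. All words, documents, and labels here are made up.

```python
# Naive sketch of cross-language classification via a word-translation oracle.
# NOT the paper's SCL algorithm: here every word is translated directly,
# and classification is a simple class-count score over the source vocabulary.
from collections import Counter

# Toy translation oracle (German -> English); illustrative only.
oracle = {"gut": "good", "schlecht": "bad", "film": "movie", "langweilig": "boring"}

# Labeled English (source-language) training documents with sentiment labels.
train = [("good movie", 1), ("bad boring movie", 0)]

# Count source-language word frequencies per class.
pos = Counter(w for doc, y in train if y == 1 for w in doc.split())
neg = Counter(w for doc, y in train if y == 0 for w in doc.split())

def classify_german(doc):
    """Translate each word via the oracle, then score by class counts."""
    words = [oracle.get(w, w) for w in doc.split()]
    score = sum(pos[w] - neg[w] for w in words)
    return 1 if score > 0 else 0

print(classify_german("gut film"))            # translated words lean positive -> 1
print(classify_german("schlecht langweilig")) # translated words lean negative -> 0
```

The gap between this sketch and the paper is exactly what SCL addresses: instead of translating every word, it induces correspondences from unlabeled data so that untranslated target-language words still map into a shared representation.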
3.【Blog】Text Mining in R: A Tutorial
This tutorial was built for people who want to learn the essential tasks required to process text for meaningful analysis in R, one of the most popular open-source programming languages for data science. By the end of this tutorial, you'll have developed the skills to read in large text files and derive meaningful insights you can share from that analysis. You'll have learned how to do text mining in R, an essential data mining skill. The tutorial is built to be followed along, with plenty of tangible code examples. The full repository with all of the files and data is here if you wish to follow along.
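The tutorial itself is in R, but the core workflow it teaches — read text, tokenize, drop stop words, count term frequencies — is language-agnostic. As an illustrative equivalent (the text and stop-word list below are made up, not from the tutorial):

```python
# Illustrative sketch of a basic text-mining pipeline (the tutorial uses R).
import re
from collections import Counter

text = "Text mining turns raw text into insight. Text mining starts with tokenization."

# 1. Tokenize: lowercase the text and split on non-letter characters.
tokens = re.findall(r"[a-z]+", text.lower())

# 2. Remove common stop words (tiny illustrative list).
stop_words = {"into", "with", "the", "a"}
tokens = [t for t in tokens if t not in stop_words]

# 3. Count term frequencies and inspect the most common terms.
freq = Counter(tokens)
print(freq.most_common(2))  # -> [('text', 3), ('mining', 2)]
```

In R, the same steps would typically use packages such as `tm` or `tidytext`, which is what tutorials in this space usually build on.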
4.【Blog】Cloud-Scale Text Classification with Convolutional Neural Networks on Microsoft Azure
Natural Language Processing (NLP) is one of the fields in which deep learning has made significant progress. Specifically, the area of text classification, where the objective is to categorize documents, paragraphs, or individual sentences into classes, has attracted the interest of both industry and academia. Examples include determining what topic is discussed in a sentence or assessing whether the sentiment conveyed in a text passage is positive, negative, or neutral. Companies can use this information to define marketing strategies, generate leads, or improve customer service.
This is the fourth blog post showcasing deep learning applications on Microsoft's Data Science Virtual Machine (DSVM) with GPUs using the R API of the deep learning library MXNet. The DSVM is a custom virtual machine image from Microsoft that comes pre-installed with popular data science tools for modeling and development activities.
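To make the classification task concrete without the CNN/MXNet machinery the post uses, here is a minimal bag-of-words perceptron for sentence-level sentiment — purely an illustration of the task, with made-up training sentences:

```python
# Minimal sketch of sentence-level sentiment classification: a bag-of-words
# perceptron. The blog post uses a CNN in MXNet's R API; this is only meant
# to illustrate the task itself. Training data is invented.
from collections import defaultdict

train = [("great product love it", 1), ("terrible waste of money", -1),
         ("love this great service", 1), ("terrible awful experience", -1)]

weights = defaultdict(float)

def predict(doc):
    """Score a sentence as the sum of its word weights."""
    score = sum(weights[w] for w in doc.split())
    return 1 if score >= 0 else -1

# Perceptron updates: shift word weights toward misclassified labels.
for _ in range(5):
    for doc, label in train:
        if predict(doc) != label:
            for w in doc.split():
                weights[w] += label

print(predict("great service"))  # -> 1
print(predict("awful waste"))    # -> -1
```

A CNN replaces this flat bag-of-words scoring with learned filters over word-embedding windows, which is what lets it capture local word order — the advantage the post is demonstrating at scale.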
5.【Blog】Sparse Coding: A Simple Exploration
Sparse coding is the study of algorithms that aim to learn a useful sparse representation of any given data. Each datum will then be encoded as a sparse code:
- The algorithm only needs input data to learn the sparse representation. This is very useful since it can be applied directly to any kind of data; this is known as unsupervised learning.
- It will automatically find the representation without losing any information (as if one could automatically reveal the intrinsic atoms of one's data).
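The encoding step can be illustrated with matching pursuit, a classic greedy algorithm for computing a sparse code over a dictionary of atoms. The dictionary below is hand-built for illustration; actual sparse-coding algorithms would learn it from the data, which is the harder half of the problem.

```python
# Sketch: encoding a signal as a sparse code over a fixed dictionary using
# matching pursuit. The dictionary here is hand-built; sparse coding proper
# also LEARNS the dictionary from data.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Dictionary of unit-norm "atoms".
atoms = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [0.7071, 0.7071, 0.0]]

def matching_pursuit(signal, n_nonzero=2):
    """Greedily pick the atom most correlated with the current residual."""
    residual = list(signal)
    code = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        corrs = [dot(residual, a) for a in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        code[best] += corrs[best]
        residual = [r - corrs[best] * a for r, a in zip(residual, atoms[best])]
    return code

# A signal built from atoms 0 and 2 is recovered as a 2-sparse code.
print(matching_pursuit([2.0, 0.0, 3.0]))  # -> [2.0, 0.0, 3.0, 0.0]
```

The "intrinsic atoms" language in the bullet above maps directly onto this picture: each datum is expressed as a combination of a few dictionary atoms, and a well-learned dictionary makes those combinations both sparse and faithful to the original signal.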