
Artificial Intelligence Resource Library: Issue 54 (20170515)

AllenOR灵感
Published 2017/09/10 01:20

1.【Blog】Handling imbalanced dataset in supervised learning using family of SMOTE algorithm

Summary:

Consider a machine learning classification problem. You get an accuracy of 98% and you are very happy. But that happiness doesn’t last long when you look at the confusion matrix and realize that the majority class makes up 98% of the total data and every example has been classified as the majority class. Welcome to the real world of imbalanced data sets!
Some well-known examples of imbalanced data sets are:
1. Fraud detection: where the number of fraud cases could be much smaller than the number of non-fraudulent transactions.
2. Prediction of disputed / delayed invoices: where the problem is to predict default / disputed invoices.
3. Predictive maintenance data sets, etc.
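
The 98% accuracy trap described above can be reproduced in a few lines. The following is a minimal sketch in plain Python, assuming a made-up data set of 980 majority and 20 minority labels and a degenerate classifier that always predicts the majority class; the label counts and the helper name `confusion_counts` are illustrative and do not come from the original post.

```python
# Hypothetical toy labels: 980 majority-class (0) and 20 minority-class (1) examples.
y_true = [0] * 980 + [1] * 20

# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Accuracy looks excellent, but recall on the minority class is zero.
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy        = {accuracy:.2%}")
print(f"minority recall = {recall:.2%}")
print(f"confusion matrix: TN={tn} FP={fp} FN={fn} TP={tp}")
```

Accuracy alone hides the problem because every minority example ends up as a false negative, which is why the confusion matrix (or a per-class metric such as recall) is the more informative check on imbalanced data.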

Original link: http://www.datasciencecentral.com/profiles/blogs/handling-imbalanced-data-sets-in-supervised-learning-using-family


2.【Resource】Top 15 Python Libraries for Data Science in 2017

Summary:


Core Libraries.

  1. NumPy
  2. SciPy
  3. Pandas

Visualization.

  1. Matplotlib
  2. Seaborn
  3. Bokeh
  4. Plotly

Machine Learning.

  1. SciKit-Learn

Deep Learning - Keras / TensorFlow / Theano

  1. Theano
  2. TensorFlow
  3. Keras

Original link: https://activewizards.com/blog/top-15-libraries-for-data-science-in-python/


3.【Paper】Accurate and reproducible invasive breast cancer detection in whole-slide images: A Deep Learning approach for quantifying tumor extent

Summary:

With the increasing ability to routinely and rapidly digitize whole slide images with slide scanners, there has been interest in developing computerized image analysis algorithms for automated detection of disease extent from digital pathology images. The manual identification of presence and extent of breast cancer by a pathologist is critical for patient management for tumor staging and assessing treatment response. However, this process is tedious and subject to inter- and intra-reader variability. For computerized methods to be useful as decision support tools, they need to be resilient to data acquired from different sources, different staining and cutting protocols and different scanners. The objective of this study was to evaluate the accuracy and robustness of a deep learning-based method to automatically identify the extent of invasive tumor on digitized images. Here, we present a new method that employs a convolutional neural network for detecting presence of invasive tumor on whole slide images. Our approach involves training the classifier on nearly 400 exemplars from multiple different sites, and scanners, and then independently validating on almost 200 cases from The Cancer Genome Atlas. Our approach yielded a Dice coefficient of 75.86%, a positive predictive value of 71.62% and a negative predictive value of 96.77% in terms of pixel-by-pixel evaluation compared to manually annotated regions of invasive ductal carcinoma.
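
The three pixel-by-pixel metrics quoted in the abstract (Dice coefficient, positive predictive value, negative predictive value) can all be computed from the counts of true/false positives and negatives between a predicted mask and a manually annotated mask. The sketch below uses the standard definitions on two tiny made-up 4×4 binary masks; the function name `pixel_metrics` and the mask values are hypothetical and are not the paper's evaluation code.

```python
def pixel_metrics(pred, truth):
    """Compare two same-shaped binary masks (nested lists of 0/1) pixel by pixel."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    # Dice = 2*TP / (2*TP + FP + FN): overlap between predicted and annotated regions.
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    # PPV = TP / (TP + FP): fraction of predicted tumor pixels that are annotated tumor.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # NPV = TN / (TN + FN): fraction of predicted non-tumor pixels that are truly non-tumor.
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    return {"dice": dice, "ppv": ppv, "npv": npv}


# Hypothetical annotated mask (1 = invasive tumor pixel).
truth_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Hypothetical predicted mask: one false positive and one missed tumor pixel.
pred_mask = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

metrics = pixel_metrics(pred_mask, truth_mask)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because background pixels usually dominate a whole-slide image, NPV tends to be high even when the Dice coefficient is moderate, which matches the pattern of the figures reported in the abstract.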

Original link: https://www.nature.com/articles/srep46450


4.【Blog】Neural networks for algorithmic trading 1.2 — Correct time series forecasting + backtesting

Summary:


Hi everyone! Some time ago I published a small tutorial on financial time series forecasting which was interesting, but wrong in places. I have spent some time working with time series of different natures (mostly applying NNs) at HPA, which focuses particularly on financial analytics, and in this post I want to describe a more correct way of working with financial data. Compared to the previous post, I want to show a different way of normalizing data and discuss overfitting in more depth (it definitely appears when working with data of a stochastic nature). We won’t compare different architectures (CNN, LSTM); you can check them in the previous post. But even working only with simple feed-forward neural nets we will see important things. If you want to jump directly to the code, check out the IPython Notebook. For Russian-speaking readers, this is a translation of my post here, and you can check the webinar on backtesting here.

Original link: https://medium.com/@alexrachnog/neural-networks-for-algorithmic-trading-1-2-correct-time-series-forecasting-backtesting-9776bfd9e589


5.【Course】I Dropped Out of School to Create My Own Data Science Master’s — Here’s My Curriculum

Summary:

I dropped out of a top computer science program to teach myself data science using online resources like Udacity, edX, and Coursera. The decision was not difficult. I could learn the content I wanted to faster, more efficiently, and for a fraction of the cost. I already had a university degree and, perhaps more importantly, I already had the university experience. Paying $30K+ to go back to school seemed irresponsible.

Original link: https://medium.com/@davidventuri/i-dropped-out-of-school-to-create-my-own-data-science-master-s-here-s-my-curriculum-1b400dcee412


Reposted from: http://www.jianshu.com/p/7b9b84500929
