
Original translation | The Complete Beginner's Guide to Big Data in 2017

openfea
Published 2017/03/17 13:11

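The concept of Big Data has been around for some time, yet it remains somewhat fuzzy. As the driving force behind waves of digital transformation such as artificial intelligence, data analytics and the Internet of Things, the concept continues to evolve and needs to be re-examined.

With that in mind, I thought it was time to write a beginner's guide explaining what Big Data means today. Like my earlier article on blockchain, this one avoids obscure jargon and aims to explain the core concepts and ideas to anyone, whatever their background.

Since the dawn of the digital age, the amount of data we generate has grown exponentially. This is largely due to the rise of computers, the Internet and technology that can capture information from the physical world we live in and convert it into digital data.

In 2017 we generate data all the time: when we go online, carry a GPS-equipped smartphone, chat with friends in messaging apps, or go shopping. You could say that everything we do leaves a digital footprint, because almost everything involves a digital transaction.

On top of this, machine-generated data is growing rapidly too. Our smart home devices generate and share data whenever they communicate with each other or with their home servers. Factories around the world increasingly use sensor-equipped machinery to collect and transmit data. Soon, self-driving cars will take to the streets and capture real-time, four-dimensional maps of wherever they go.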

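What can Big Data do?

This ever-growing stream of sensor information, photographs, text, voice and video data is the foundation of Big Data, and we can now use it in ways that were not possible even a few years ago. Right now, Big Data is helping people in the following areas:

  • Curing disease and preventing cancer

    Analyzing vast numbers of medical records and images can help spot disease early and develop new medicines.

  • Feeding the hungry

    Agricultural data can be used to maximize crop yields, reduce the pollutants released into the ecosystem and optimize the use of farm machinery.

  • Exploring outer space

    NASA analyzes millions of data points to model every eventuality on the surface of Mars and to plan future missions.

  • Predicting and responding to natural and man-made disasters

    Sensor data can be analyzed to predict where earthquakes are likely to strike, and it gives clues that help rescuers reach survivors. Big Data technology is also used to monitor and safeguard refugees leaving war zones around the world.

  • Preventing crime

    Police forces are increasingly adopting data-driven strategies based on their own intelligence and on public data sets, in order to deploy resources more efficiently and act as a deterrent where one is needed.

  • Making our lives more convenient

    Shopping online, sharing a ride or booking a holiday, choosing the best time to book flights and deciding what movie to watch next are all easier thanks to Big Data.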

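How does Big Data work?

Big Data works on the principle that the more data you collect, the more accurate and reliable the insights you gain, and the better you can predict what will happen in the future. Comparing more data points reveals relationships between them that were previously hidden, and these relationships help us learn and inform our decisions.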
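The principle that more data points give more reliable insights can be illustrated with a small sketch. It is only an illustration under stated assumptions: the hidden relationship (y = 2x), the noise level and the sample sizes are all made up, and the function names `estimate_slope` and `spread_of_estimates` are hypothetical. The code repeats the same experiment many times and measures how much the estimated relationship varies from run to run.

```python
import random
import statistics


def estimate_slope(n, rng, true_slope=2.0, noise=5.0):
    """Least-squares slope from n noisy observations of y = true_slope * x."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, noise) for x in xs]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def spread_of_estimates(n, trials=200, seed=0):
    """Standard deviation of the slope estimate across repeated experiments."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = [estimate_slope(n, rng) for _ in range(trials)]
    return statistics.stdev(estimates)


small = spread_of_estimates(20)    # few data points per experiment
large = spread_of_estimates(2000)  # many data points per experiment
print(f"spread with 20 points:   {small:.3f}")
print(f"spread with 2000 points: {large:.3f}")
```

With the larger sample the estimates cluster much more tightly around the true value, which is the sense in which more data makes an insight more reliable.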

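Most commonly this is done by building a model from the data we collect, then running simulations that tweak the values of data points each time and monitoring how the results change. The process is automated: today's advanced analytics technology can run millions of these simulations, adjusting every possible variable until it finds a pattern, or an insight, that helps solve the problem at hand.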
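The build-a-model-then-simulate loop described above can be sketched in a few lines. This is a toy illustration, not a real analytics pipeline: the pricing scenario, the `history` numbers and the helper names `fit_line` and `simulate_revenue` are all assumptions introduced for the example. A simple linear model is fitted to past data, then one variable (price) is tweaked across many runs while the outcome (predicted revenue) is monitored.

```python
# Made-up history of (price, units sold). Purely illustrative numbers.
history = [(1.0, 90), (2.0, 80), (3.0, 70), (4.0, 60), (5.0, 50)]


def fit_line(points):
    """Ordinary least-squares fit of units = intercept + slope * price."""
    n = len(points)
    mean_x = sum(p for p, _ in points) / n
    mean_y = sum(u for _, u in points) / n
    sxy = sum((p - mean_x) * (u - mean_y) for p, u in points)
    sxx = sum((p - mean_x) ** 2 for p, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_revenue(price, intercept, slope):
    """One simulation run: predicted revenue at a candidate price."""
    units = max(0.0, intercept + slope * price)  # demand cannot go negative
    return price * units


# Step 1: build the model from collected data.
intercept, slope = fit_line(history)

# Step 2: run simulations, tweaking the price each time.
candidates = [i * 0.5 for i in range(2, 19)]  # 1.0, 1.5, ..., 9.0
results = {p: simulate_revenue(p, intercept, slope) for p in candidates}

# Step 3: monitor the results and keep the best-performing setting.
best_price = max(results, key=results.get)
best_revenue = results[best_price]
print(f"best price: {best_price}, predicted revenue: {best_revenue}")
```

Real systems differ mainly in scale: far richer models, many variables tweaked at once, and millions of automated runs instead of seventeen.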

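Much of the data we collect is unstructured, meaning it cannot easily be put into the rows and columns of a structured relational database. Most of it takes the form of pictures and videos, from satellite images to photographs uploaded to Facebook or Twitter, along with emails, chat messages and recorded telephone calls. To make sense of it all, Big Data projects often rely on cutting-edge analytics involving artificial intelligence and machine learning. By teaching computers to identify what this data represents, through image recognition or natural language processing for example, they can learn to spot patterns much more quickly and reliably than humans.

Over the last few years, Big Data tools and technology have increasingly been delivered through "as-a-service" platforms. Businesses rent server space, software and processing power from third-party cloud service providers. All of the work is carried out on the provider's systems, and the customer simply pays for whatever was used. This model gives any organization the chance to explore Big Data applications, because it removes the need to spend heavily on hardware, software, premises and technical staff.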

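Big Data concerns

Today, Big Data gives us unprecedented insights and opportunities, but it also raises some thorny questions:

  • Data privacy

    The Big Data we now generate contains a great deal of information about our private lives, much of it highly personal. We are increasingly asked to strike a balance between the personal information we expose and the convenience of Big Data-powered apps and services. Who do we allow to access this data?

  • Data security

    Even if we are happy to share our data for a particular purpose, can we be sure it will be kept safe? Is the existing legal framework able to regulate the use of data at this scale?

  • Data discrimination

    When everything about a person's behavior is known, will it become acceptable to discriminate against people based on their personal data? We already use credit scoring to decide who can borrow money, and insurance is heavily data-driven. We can expect to be analyzed and assessed in ever greater detail, and care must be taken that this does not make life even harder for those who already have fewer resources and less access to information.

Facing up to these questions is part of the Big Data challenge. They are a major topic of debate in academic circles, but they must also be addressed by those who want to use Big Data in business. Failure to do so can leave businesses vulnerable and lead to financial disaster as well as huge fines.

When people first started talking about Big Data, it was sometimes dismissed as a fad: a trendy term that would be talked about for a while and then fade away when the next new technology came along. So far there is no evidence that Big Data is a passing craze. In fact, even as newer buzzwords have appeared, Big Data remains the driving force behind them. The amount of data we collect will only keep growing, and analytics technology will become more capable. So if Big Data can do all of this today, just imagine what it will be capable of tomorrow.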

 

The English original follows:

The Complete Beginner's Guide To Big Data In 2017

Big Data is a term that has been around for some time now but there is still confusion about what it actually is. The concept is continuing to evolve and to be reconsidered, as it remains the driving force behind many ongoing waves of digital transformation, including artificial intelligence, data science and the Internet of Things (IoT).

With that in mind I thought it was time to write a beginner’s guide to what Big Data means in 2017. In a similar way to my beginner’s guides to Blockchain and FinTech, this will be jargon-free and aims to explain the core concepts and ideas to anyone regardless of background knowledge.

It all starts with the exponential explosion in the amount of data we have generated since the dawn of the digital age. This is largely due to the rise of computers, the Internet and technology capable of capturing information from the real, physical world we live in, and converting it to digital data.

In 2017, we generate data whenever we go online, when we carry our GPS-equipped smartphones, when we communicate with our friends through social media or chat applications, and when we shop. You could say we leave digital footprints with everything we do that involves a digital transaction, which is almost everything.

On top of this, the amount of machine-generated data is rapidly growing too. Data is generated and shared when our “smart” home devices communicate with each other or with their home servers. Industrial machinery in plants and factories around the world is increasingly equipped with sensors that gather and transmit data. Soon, self-driving cars will take to the streets, beaming real-time, four-dimensional maps of their surroundings back home from wherever they go.

What can Big Data do?

This ever-growing stream of sensor information, photographs, text, voice and video data is the foundation of Big Data, which we can now use in ways that were not possible even a few years ago. Right now, Big Data projects are helping to:

· Cure disease and prevent cancer – Data-driven medicine involves analyzing vast numbers of medical records and images for patterns which can help spot disease early and develop new medicines.

· Feed the hungry – Agriculture is being revolutionized by data which can be used to maximize crop yields, minimize the amount of pollutants released into the ecosystem and optimize the use of machines and equipment.

· Explore distant planets – NASA analyzes millions of data points and uses them to model every eventuality to land its Rovers on the surface of Mars and plan future missions.

· Predict and respond to natural and man-made disasters – Sensor data can be analyzed to predict where earthquakes are likely to strike next, and patterns of human behavior give clues which help aid organizations give relief to survivors. Big Data technology is also used to monitor and safeguard the flow of refugees away from war zones around the world.

· Prevent crime – Police forces are increasingly adopting data-driven strategies based on their own intelligence and public data sets in order to deploy resources more efficiently and act as a deterrent where one is needed.

· Make our everyday lives easier and more convenient – Shopping online, crowdsourcing a ride or a place to stay on holiday, choosing the best time to book flights and deciding what movie to watch next are all easier thanks to Big Data.

How does Big Data work?

Big Data works on the principle that the more you know about anything or any situation, the more reliably you can gain new insights and make predictions about what will happen in the future. By comparing more data points, relationships will begin to emerge that were previously hidden, and these relationships will enable us to learn and inform our decisions.

Most commonly this is done through a process which involves building models, based on the data we can collect, and then running simulations, tweaking the value of data points each time and monitoring how it impacts our results. This process is automated – today’s advanced analytics technology will run millions of these simulations, tweaking all the possible variables until it finds a pattern – or an insight – that helps solve the problem it is working on.

Increasingly, data is coming to us in an unstructured form, meaning it cannot be easily put into structured tables with rows and columns. Much of this data is in the form of pictures and videos – from satellite images to photographs uploaded to Facebook or Twitter – as well as email and instant messenger communications and recorded telephone calls. To make sense of all of this, Big Data projects often use cutting edge analytics involving artificial intelligence and machine learning. By teaching computers to identify what this data represents – through image recognition or natural language processing, for example – they can learn to spot patterns much more quickly and reliably than humans.

A strong trend over the last few years has been a move towards the delivery of Big Data tools and technology through an “as-a-service” platform. Businesses and organizations rent server space, software systems and processing power from third-party cloud service providers. All of the work is carried out on the service provider’s systems, and the customer simply pays for whatever was used. This model is making Big Data-driven discovery and transformation accessible to any organization and cuts out the need to spend vast sums on hardware, software, premises and technical staff.

Big Data concerns

Today, Big Data gives us unprecedented insights and opportunities, but it also raises concerns and questions that must be addressed:

· Data privacy – The Big Data we now generate contains a lot of information about our personal lives, much of which we have a right to keep private. Increasingly we are asked to strike a balance between the amount of personal data we divulge, and the convenience that Big Data powered apps and services offer. Who do we allow to have access to this data?

· Data security – Even if we decide we are happy for someone to have our data for a particular purpose, can we trust them to keep it safe? Is the existing legal framework up to the job of regulating data use at this scale?

· Data discrimination – When everything is known, will it become acceptable to discriminate against people based on data we have on their lives? We already use credit scoring to decide who can borrow money, and insurance is heavily data-driven. We can expect to be analyzed and assessed in greater detail, and care must be taken that this isn’t done in a way which contributes to making life more difficult for those who already have fewer resources and access to information.

Facing up to these challenges is part of “Big Data,” too. They are certainly a major part of the debate around the use of Big Data in academic circles. However, they must also be addressed by those who want to take advantage of Big Data in business. Failure to do so can leave businesses vulnerable and lead to financial disaster as well as huge fines.

When people first started talking about Big Data it was sometimes dismissed as a fad – the latest trendy technology term which would be talked about for a while then quietly forgotten about when the next big thing came along. This hasn’t proven to be the case yet – in fact, while newer buzzwords have popped up, Big Data is still the driving force behind just about all of them. The amount of data available to us is only going to increase, and analytics technology will become more capable. So if Big Data is capable of all of this today – just imagine what it will be capable of tomorrow.

© Copyright belongs to the author
