Big data defined

发布于 2015/04/28 10:19
字数 377
阅读 77
收藏 0
点赞 0
评论 0

Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become. Why? More data may lead to more accurate analyses.

More accurate analyses may lead to more confident decision making. And better decisions can mean greater operational efficiencies, cost reductions and reduced risk.

Big data defined

As far back as 2001, industry analyst Doug Laney (currently with Gartner) articulated the now mainstream definition of big data as the three Vs of big data: volume, velocity and variety.

  • Volume. Many factors contribute to the increase in data volume. Transaction-based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.

  • Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.

  • Variety. Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.

At SAS, we consider two additional dimensions when thinking about big data:

  • Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something trending in social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage. Even more so with unstructured data involved.

  • Complexity. Today's data comes from multiple sources. And it is still an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control.


共有 人打赏支持
粉丝 7
博文 18
码字总数 1936
作品 0
Ibis: Scaling the Python Data Experience

Ibis: Scaling the Python Data Experience Ibis 0.5 (September 10, 2015) Ibis 0.5.0 is released. Read all about it Please also sign up for the mailing list. What is Ibis? Ibis is ......

openthings ⋅ 2016/01/10 ⋅ 0


spark-http-stream spark-http-stream transfers Spark structured stream over HTTP protocol. Unlike tcp streams, Kafka streams and HDFS file streams, http streams often flow across......

白乔 ⋅ 2017/10/12 ⋅ 0

【NoSQL vs RDBMS】 Why and why not to use NoSQL over RDBMS

这是墙外一片关于NOSQL的论文,主要探讨NoSQL与RDBMS直接有什么区别?为什么要使用和不使用NoSQL数据库?说一说NoSQL数据库的几个优点? (水平有限,没有进行翻译,原滋原味的文档) NoSQL vs...

冷冷gg ⋅ 2016/08/30 ⋅ 0

The Data Disconnect With DevOps and Digital Transformation

Industry surveys indicate growing momentum of DevOps with widespread adoption and experimentation. But, a disconnect looms for most organizations due to inadequate access to rel......

Paul Stanton ⋅ 2017/12/24 ⋅ 0

Comprehensive Introduction to Apache Spark

Introduction Industry estimates that we are creating more than 2.5 Quintillion bytes of data every year. Think of it for a moment – 1 Qunitillion = 1 Million Billion! Can you i......

grasp_D ⋅ 06/15 ⋅ 0

NoSQL比较:Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j

本文详细介绍这几个 NoSQL 服务器的特点以及适用的场合! CouchDB Written in: Erlang Main point: DB consistency, ease of use License: Apache Protocol: HTTP/REST Bi-directional (!) ......

红薯 ⋅ 2011/08/29 ⋅ 8

JavaScript Primitive Data Types

JavaScript Primitive Data Types Primitive data types Any value that you use is of a certain type. In JavaScript, there are just a few primitive data types: 1. Number: This inc......

秋风醉了 ⋅ 2014/07/16 ⋅ 0


1. Redis简介 Redis是一个开源软件项目(BSD许可),用ANSI C编写,适用于大多数的POSIX系统,是一个可用作数据库、缓存和消息代理的内存数据库。Redis是一个非关系型数据库,Redis可以存储键...

qq58f47049ce44d ⋅ 2017/06/12 ⋅ 0

When Monolithic Is the Way: Play vs. Spring Boot vs. Grails

Nowadays, when we are going to start a new project, the developer starts thinking about the technology, and usually they come up with something like: microservices, RESTFul, Act......

Marini Fabio ⋅ 2017/12/20 ⋅ 0

DDD: Part I (Introduction)

The first time I heard about DDD (Domain Driven Design, not Deadline Driven Design, for sure), I was still working as a Senior Java Developer for Hewlett-Packard at its Developm......

M Yauri Maulana at-Tamimi ⋅ 2017/12/14 ⋅ 0






作为软件人,找工作有时候似乎挺苦逼的。 说真的,让我去掉前面这句中“似乎”二字吧。就是苦逼!很多人都曾抱怨处在招聘的一方很糟糕——我们没有任何可靠的方式来甄别会写代码并且写得好的...

老道士 ⋅ 30分钟前 ⋅ 0


一、概述 1、 Hadoop整合了众多的文件系统,首先提供了一个高层的文件系统抽象org.apache.hadoop.fs.FileSystem。然后有各个文件系统的实现类。 2、Hadoop是JAVA编写的,不同文件系统之间的交...

cjxcloud ⋅ 34分钟前 ⋅ 0


Linux下MySQL表名不区分大小写的设置方法 MySQL表名不区分大小写的设置方法 在用centox安装mysql后,把项目的数据库移植了过去,发现一些表的数据查不到,排查了一下问题,最后发现是表名的大...

随风而浮沉 ⋅ 39分钟前 ⋅ 0


sudo cp simsun.ttc /usr/share/fonts cd /usr/share/fonts sudo chmod 644 simsun.ttc 更新字体缓存: 代码: sudo mkfontscale 代码: sudo mkfontdir 代码: sudo fc-cache -fsv 安装chrome扩......

wangxuwei ⋅ 41分钟前 ⋅ 0

利用 ssh 传输文件

Linux 下一般可以用 scp 命令通过 ssh 传送文件: #把服务器上的 /home/user/a.txt 发送到本机的 /var/www/local_dir 目录下scp username@servername:/home/user/a.txt /var/www/local_dir...

大灰狼时间 ⋅ 51分钟前 ⋅ 0


如何使用web3j为Java应用或Android App增加以太坊区块链支持,本教程内容即涉及以太坊中的核心概念,例如账户管理包括账户的创建、钱包创建、交易转账,交易与状态、智能合约开发与交互、过滤...

智能合约 ⋅ 今天 ⋅ 0


web3j简介 web3j是一个轻量级、高度模块化、响应式、类型安全的Java和Android类库提供丰富API,用于处理以太坊智能合约及与以太坊网络上的客户端(节点)进行集成。 可以通过它进行以太坊区块链...

笔阁 ⋅ 今天 ⋅ 0


异步I/O “异步”这个名词其实很早就诞生了,但它大规模流行却是在Web 2.0浪潮中,它伴随着AJAX的第一个A(Asynchronous)席卷了Web。 为什么要异步I/O 关于异步I/O为何在Node里如此重要,这与...

小草先森 ⋅ 今天 ⋅ 0


1、如果启动什么都不设,会怎样? 先来看一个命令 [root@localhost bin]# java -XX:+PrintCommandLineFlags -version -XX:InitialHeapSize=29899008 -XX:MaxHeapSize=478384128 -XX:+PrintCo......

算法之名 ⋅ 今天 ⋅ 0


宏是一种文本,一般来说其编译是在程序执行之前。 宏变量的创建 %let语句 %let macro_variables = text; %let是常见的宏变量建立方式,其编译就在执行前。如下例中,想要宏变量test等于数据集...

tonorth123 ⋅ 今天 ⋅ 0