文档章节

阅读论文 --- GPS: A Graph Processing System∗

wdfnst
 wdfnst
发布于 2016/05/16 22:12
字数 886
阅读 18
收藏 0

输入图片说明

输入图片说明

  • Highlight, page 1
    GPS is similar to Google’s proprietary Pregel system, with three new features: (1) an extended API to make global computations more easily expressed and more efficient; (2) a dynamic repartitioning scheme that re- assigns vertices to different workers during the computation, based on messaging patterns; and (3) an optimization that distributes adjacency lists of high-degree vertices across all compute nodes to improve performance.

  • Highlight, page 1
    The GPS API has an extension that enables efficient implementation of al- gorithms composed of one or more vertex-centric com- putations, combined with global computations.

  • Highlight, page 1
    GPS has an optimization called large adjacency list par- titioning (LALP), which partitions the adjacency lists of high-degree vertices across compute nodes, again to reduce communication.

  • Highlight, page 2
    Broadly

  • Underline, page 2
    1.4 Partitioning Experiments

  • Highlight, page 3
    A computation thread loops through the vertices in the worker and executes vertex.compute() on each active ver- tex. It maintains an outgoing message buffer for all work- ers in the cluster, including itself. When a buffer is full it is either given to MINA threads for sending over the network, or passed directly to the local message parser thread.

  • Highlight, page 3
    MINA threads send and receive message buffers, as well as simple coordination messages between the master and the worker. When a message buffer is received, it is passed to the message parser thread.

  • Highlight, page 3
    A message parser thread parses incoming message buffers into separate messages and enqueues them into the re- ceiving vertices’ message queues for the next superstep.

  • Highlight, page 3
    GPS is implemented in Java. The compute nodes run HDFS (Hadoop Distributed File System) [18], which is used to store persistent data such as the input graph and the checkpointing files.

  • Highlight, page 3
    each line starts with the ID of a vertex u, followed by the IDs of u’s outgoing neighbors.

  • Highlight, page 3
    GPS assigns the vertices of G to workers using the same simple round- robin scheme used by Pregel: vertex u is assigned to worker W(u mod k).

  • Highlight, page 3
    The master coordinates the computation by instructing work- ers to: (a) start parsing input files; (b) start a new super- step; (c) terminate computation; and (d) checkpoint their states for fault-tolerance.

  • Highlight, page 5
    by partitioning large graphs “intelligently” before computation begins, we can reduce total network I/O by up to 13.6x and run-time by up to 2.5x. The effects of partitioning depend on three factors: (1) the graph algorithm being executed; (2) the graph itself; and (3) the configuration of the worker tasks across compute nodes.

  • Highlight, page 5
    Domain-based: In this partitioning scheme for web graphs only, we locate all web pages from the same domain in the same partition, and partition the domains randomly across the workers.

  • Highlight, page 5
    Random: The default “mod” partitioning method

  • Box, page 6
    (a) PageRank on sk-2005-d (b) Different algorithms on uk-2007-u Figure 6: Network I/O, different partitioning schemes (a) PageRank (50 iter.) on sk-2005-d (b) Different algorithms on uk-2007-u Figure 7: Run-time, different partitioning schemes

  • Highlight, page 8
    There are three questions any dynamic repartitioning scheme must answer: (1) which vertices to reassign; (2) how and when to move the reassigned vertices to their new workers; (3) how to lo- cate the reassigned vertices.

  • Highlight, page 8
    One option is to reassign vertex u at worker Wi to a new worker Wj if u send/receives more message to/from Wj than to/from any other worker, and that number of messages is over some threshold.

  • Highlight, page 9
    An obvious option is for each worker to store an in-memory map consisting of <vertex-id, new-worker-id> pairs.

  • Highlight, page 10
    We have developed a compiler from the Green-Marl [19] domain-specific language for graph processing into GPS.

  • Highlight, page 10
    some al- gorithms become very complex when implemented in a vertex-centric fashion—a classic example is contructing or doing a reverse traversal on a BFS tree. Green-Marl’s high-level constructs can express some of these compu- tations very easily, e.g., lines 7 and 11 in Figure 13b.

  • Box, page 11
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Procedure PageRank(G: Graph, e,d: Double, PR: Node Prop<Double>(G)) { Int i = 0; Double N = G.NumNodes(); G.PR = 1 / N; // Init PageRank Do { // Main iteration diff = 0.0; Foreach (t: G.Nodes) { Double val = (1−d) / N + d∗Sum(w: t.InNbrs){ w.PR / w.OutDegree()}; t.PR <= val @ t; diff +=|val−t.PR|;} i++; }While(i<30);} (a) PageRank Procedure bc approx(G:Graph, BC:Node Prop<Float>) { G.BC = 0; // Initialize BC as 0 per each node Node Prop<Float> sigma, delta; G.sigma = 0; Node s = G.PickRandom(); s. sigma = 1; InBFS(v: G.Nodes From s) { // BFS−order traversal // Summing over BFS parents v.sigma = Sum(w:v.UpNbrs) { w.sigma }; } InReverse { // Reverse−BFS order traversal v.delta = // Summing over BFS children Sum (w:v.DownNbrs) { v.sigma / w.sigma ∗ (1+ w.delta) }; v.BC += v.delta; // accumulate delta into BC } (b) Approximate Betweenness Centrality Figure 13: Green-Marl Programs }

© 著作权归作者所有

共有 人打赏支持
wdfnst
粉丝 2
博文 27
码字总数 22859
作品 1
宁波
半定规划(SDP)for 最大割

半定规划(SDP)for 最大割 最大割问题: Instance: given an undirected graph G(V,E), find a bipartition of V into S and T that maximizing the size of the cut E(S,T)= 贪心算法、随机......

hlveying
01/02
0
0
caffe学习:卷积计算

在caffe中如何计算卷积的? caffe中, 卷积网络的前向传播过程需要计算类似 这样的连接, forward_cpu_gemm()函数用weight矩阵和输入的bottom相乘, 然后对bias进行处理, bias程序会根据情况决定...

Hanging_Gardens
01/11
0
0
100 open source Big Data architecture papers

Big Data technology has been extremely disruptive with open source playing a dominant role in shaping its evolution. While on one hand it has been disruptive, on the other it ha......

naughty
2016/04/05
55
0
每周论文清单:知识图谱,文本匹配,图像翻译,视频对象分割

在碎片化阅读充斥眼球的时代,越来越少的人会去关注每篇论文背后的探索和思考。 在这个栏目里,你会快速 get 每篇精选论文的亮点和痛点,时刻紧跟 AI 前沿成果。 点击本文底部的「阅读原文」...

c9yv2cf9i06k2a9e
2017/12/27
0
0
一种基于ASEF算法的高速特征点定位算法

一种基于ASEF算法的高速特征点定位算法 [TOC] 作者:侯伟 时间:2018.04.14 论文链接1:http://vision.lbl.gov/Conferences/cvpr/Papers/data/papers/1105.pdf 论文链接2:http://citeseerx....

monkey3233
04/13
0
0

没有更多内容

加载失败,请刷新页面

加载更多

下一页

流量劫持是如何产生的?

流量劫持,这种古老的攻击沉寂了一段时间后,最近又开始闹的沸沸扬扬。众多知名品牌的路由器相继爆出存在安全漏洞,引来国内媒体纷纷报道。只要用户没改默认密码,打开一个网页甚至帖子,路由...

谢思华
20分钟前
0
0
Hadoop Client无法使用maven下载源码

最近在学习hadoop,使用maven的时候想看一下源码的注释,结果IDEA一直提示无法下载 搞得我一度以为maven坏掉了。 但是通过搜索,发现在maven仓库里确实没有源码.... 而2.8.1以及之前的版本是...

Iceberg_XTY
22分钟前
0
0
为什么程序员千万不要重写代码?

你所做的事情,也许暂时看不到成果,但不要灰心或焦虑,你不是没有成长,而是在扎根。 图片来自网络 0 前言 程序员都有一颗工程师的心,所以当他们到一片新的场地想做的第一件事就是,将旧的...

Java小铺
23分钟前
0
0
VUE集成AdminLte

1. 安装需要到插件 npm i admin-lte -Snpm i jquery -Snpm i axios -Snpm i vue-router -S 2. 配置webpack.config.js 2.1 module.exports.module.rules修改字体loader: {test: /\.(p......

Pasenger
今天
0
0
Spring Aop原理之切点表达式解析

在前面的文章(Spring AOP切点表达式详解)中,我们总结了Spring Aop切点表达式的用法,而在上文(Spring Aop原理之Advisor过滤)中我们讲到,切点表达式的解析主要是在PatternParser.parse...

爱宝贝丶
今天
0
0

没有更多内容

加载失败,请刷新页面

加载更多

下一页

返回顶部
顶部