文档章节

阅读论文 --- GPS: A Graph Processing System∗

wdfnst
 wdfnst
发布于 2016/05/16 22:12
字数 886
阅读 106
收藏 0

「深度学习福利」大神带你进阶工程师,立即查看>>>

输入图片说明

输入图片说明

  • Highlight, page 1
    GPS is similar to Google’s proprietary Pregel system, with three new features: (1) an extended API to make global computations more easily expressed and more efficient; (2) a dynamic repartitioning scheme that re- assigns vertices to different workers during the computation, based on messaging patterns; and (3) an optimization that distributes adjacency lists of high-degree vertices across all compute nodes to improve performance.

  • Highlight, page 1
    The GPS API has an extension that enables efficient implementation of al- gorithms composed of one or more vertex-centric com- putations, combined with global computations.

  • Highlight, page 1
    GPS has an optimization called large adjacency list par- titioning (LALP), which partitions the adjacency lists of high-degree vertices across compute nodes, again to reduce communication.

  • Highlight, page 2
    Broadly

  • Underline, page 2
    1.4 Partitioning Experiments

  • Highlight, page 3
    A computation thread loops through the vertices in the worker and executes vertex.compute() on each active ver- tex. It maintains an outgoing message buffer for all work- ers in the cluster, including itself. When a buffer is full it is either given to MINA threads for sending over the network, or passed directly to the local message parser thread.

  • Highlight, page 3
    MINA threads send and receive message buffers, as well as simple coordination messages between the master and the worker. When a message buffer is received, it is passed to the message parser thread.

  • Highlight, page 3
    A message parser thread parses incoming message buffers into separate messages and enqueues them into the re- ceiving vertices’ message queues for the next superstep.

  • Highlight, page 3
    GPS is implemented in Java. The compute nodes run HDFS (Hadoop Distributed File System) [18], which is used to store persistent data such as the input graph and the checkpointing files.

  • Highlight, page 3
    each line starts with the ID of a vertex u, followed by the IDs of u’s outgoing neighbors.

  • Highlight, page 3
    GPS assigns the vertices of G to workers using the same simple round- robin scheme used by Pregel: vertex u is assigned to worker W(u mod k).

  • Highlight, page 3
    The master coordinates the computation by instructing work- ers to: (a) start parsing input files; (b) start a new super- step; (c) terminate computation; and (d) checkpoint their states for fault-tolerance.

  • Highlight, page 5
    by partitioning large graphs “intelligently” before computation begins, we can reduce total network I/O by up to 13.6x and run-time by up to 2.5x. The effects of partitioning depend on three factors: (1) the graph algorithm being executed; (2) the graph itself; and (3) the configuration of the worker tasks across compute nodes.

  • Highlight, page 5
    Domain-based: In this partitioning scheme for web graphs only, we locate all web pages from the same domain in the same partition, and partition the domains randomly across the workers.

  • Highlight, page 5
    Random: The default “mod” partitioning method

  • Box, page 6
    (a) PageRank on sk-2005-d (b) Different algorithms on uk-2007-u Figure 6: Network I/O, different partitioning schemes (a) PageRank (50 iter.) on sk-2005-d (b) Different algorithms on uk-2007-u Figure 7: Run-time, different partitioning schemes

  • Highlight, page 8
    There are three questions any dynamic repartitioning scheme must answer: (1) which vertices to reassign; (2) how and when to move the reassigned vertices to their new workers; (3) how to lo- cate the reassigned vertices.

  • Highlight, page 8
    One option is to reassign vertex u at worker Wi to a new worker Wj if u send/receives more message to/from Wj than to/from any other worker, and that number of messages is over some threshold.

  • Highlight, page 9
    An obvious option is for each worker to store an in-memory map consisting of <vertex-id, new-worker-id> pairs.

  • Highlight, page 10
    We have developed a compiler from the Green-Marl [19] domain-specific language for graph processing into GPS.

  • Highlight, page 10
    some al- gorithms become very complex when implemented in a vertex-centric fashion—a classic example is contructing or doing a reverse traversal on a BFS tree. Green-Marl’s high-level constructs can express some of these compu- tations very easily, e.g., lines 7 and 11 in Figure 13b.

  • Box, page 11
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Procedure PageRank(G: Graph, e,d: Double, PR: Node Prop<Double>(G)) { Int i = 0; Double N = G.NumNodes(); G.PR = 1 / N; // Init PageRank Do { // Main iteration diff = 0.0; Foreach (t: G.Nodes) { Double val = (1−d) / N + d∗Sum(w: t.InNbrs){ w.PR / w.OutDegree()}; t.PR <= val @ t; diff +=|val−t.PR|;} i++; }While(i<30);} (a) PageRank Procedure bc approx(G:Graph, BC:Node Prop<Float>) { G.BC = 0; // Initialize BC as 0 per each node Node Prop<Float> sigma, delta; G.sigma = 0; Node s = G.PickRandom(); s. sigma = 1; InBFS(v: G.Nodes From s) { // BFS−order traversal // Summing over BFS parents v.sigma = Sum(w:v.UpNbrs) { w.sigma }; } InReverse { // Reverse−BFS order traversal v.delta = // Summing over BFS children Sum (w:v.DownNbrs) { v.sigma / w.sigma ∗ (1+ w.delta) }; v.BC += v.delta; // accumulate delta into BC } (b) Approximate Betweenness Centrality Figure 13: Green-Marl Programs }

wdfnst

wdfnst

粉丝 3
博文 27
码字总数 22859
作品 1
宁波
私信 提问
加载中
请先登录后再评论。
密码管理程序--pwgrep

为了管理我的密码,我写了一个小的 bash/awk 脚本用来管理一个密码数据库并使用 GnuPG 进行加密。使用 pwgrep 的好处是: 密码加密 密码版本化,不用担心丢失老密码 Since a versioning sys...

匿名
2013/03/11
1.3K
0
Android 设备管理工具--androrat

androrat 是一个对 Android 设备进行远程管理的工具。 功能包括: 获取通讯录信息 获取呼叫记录 获取短信和彩信 通过 GPS 获取定位 实时监控接收到的短信 监控手机的呼叫状态 拍照 获取来自麦...

匿名
2013/03/28
2.7W
11
DKPro Core

DKPro Core 是基于 Apache UIMA 框架之上的自然语言处理(NLP)的软件组件。DKPro Core 提供了这样的第三方工具以及原NLP组件的包装。 DKPro核心建立在很大程度上uimaFIT可以快速方便的开发N...

匿名
2012/10/24
1.8K
0
FasterGPS

通过 FasterGPS 你可以使用一个本地区的NTP服务器 来加快获得GPS修正的时间。

开源好!
2012/12/12
1.8K
0
商铺记账系统软件--bluebee蓝蜜蜂记账系统

此项目已废弃:) Donate捐助 bluebee accounting system is a popular and free accounting system for individual businesses.This system focus on the individual businesses operation......

刘学炜
2013/04/18
1.4W
10

没有更多内容

加载失败,请刷新页面

加载更多

【每周CV论文】初学深度学习图像对比度增强应该要读的文章

欢迎来到《每周CV论文》。在这个专栏里,还是本着有三AI一贯的原则,专注于让大家能够系统性完成学习,所以我们推荐的文章也必定是同一主题的。 图像对比度增强,即增强图像中的有用信息,抑...

言有三
昨天
0
0
运营商大数据-行业大数据获客利器

一、永远不要沉溺在安逸里得过且过,能给你遮风挡雨的,同样能让你不见天日,只有让自己更加强大,才能真正撑起一片天。 二、别把生活当作游戏,谁游戏人生,生活就惩罚谁,这不是劝诫,而是...

osc_1wo6kipk
16分钟前
0
0
【Rust日报】2020-08-10:在 Rust 中存储连续数据

在 Rust 中存储连续数据? 作者都帮你整理好了: 使用 Rust 中的数组 [T; N]. Slice &[T] or &mut [T], 可以方便的 split. Boxed slice Box<[T]>. Vec. 长度和内容都可以变化,这可能是我们最常...

MikeTang
昨天
0
0
Gradient Centralization: 一行代码加速训练并提升泛化能力 | ECCV 2020 Oral

梯度中心化GC对权值梯度进行零均值化,能够使得网络的训练更加稳定,并且能提高网络的泛化能力,算法思路简单,论文的理论分析十分充分,能够很好地解释GC的作用原理   来源:晓飞的算法工程...

zb1486966459725
昨天
0
0
移动大数据-装修行业获客利器

因为海伦凯勒的努力和坚毅不拔的个性,而赢得了大家的肯定,终于得到了诺贝尔文学奖。虽然得了诺贝尔奖,但她对生命依然奋战不懈,她马不停蹄的到各地学校里演讲。有一次,她到一所大学演讲,...

osc_qheq8wav
17分钟前
0
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部