wdfnst

## Optimizing Graph Algorithms on Pregel-like Systems

Semih Salihoglu Stanford University semih@cs.stanford.edu Jennifer Widom Stanford University widom@cs.stanford.edu

• Highlight, page 1
Standard graph algorithms in this setting can incur unnecessary in- efficiencies such as slow convergence or high communication or computation cost

• Highlight, page 1
Standard graph algorithms in this setting can incur unnecessary in- efficiencies such as slow convergence or high communication or computation cost

• Highlight, page 1
We identify a variety of inefficiencies that arise when ex- ecuting these algorithms on Pregel-like systems, and we describe optimization techniques, some of which are applicable across mul- tiple algorithms, to address them.

• Highlight, page 1
large diameters or skew in component sizes.

• Highlight, page 1
performing some serial computation on a tiny fraction of the in- put graph

• Underline, page 1
complementing

• Highlight, page 1
our open-source Pregel implementation

• Highlight, page 1
In GPS [42] and Giraph [17], an op- tional master.compute() function is executed by the Master task between supersteps to perform serial computation, and for coor- dination in algorithms that are composed of multiple vertex-centric stages.

• Highlight, page 1
The optimization techniques we offer in this paper focus on reducing the first two costs: communication and number of su- persteps.

• Highlight, page 2
FCS monitors the size of the “active” graph on which the computation is executing. If the active graph becomes small enough, FCS sends it to the master, which performs the end of the computation serially inside master.compute(), then sends the results back to the workers.

• Highlight, page 2
Storing Edges At Subvertices (SEAS): SEAS is an optimiza- tion we apply primarily to our MSF algorithm (Table 1), in which sets of vertices (called subvertices) are merged to form supervertices.

• Highlight, page 6
FCS monitors the size the active- subgraph. Once the size of the active-subgraph is below a threshold (5M edges by default), it sends the active-subgraph to the mas- ter, which performs the rest of the computation serially, and sends the results back to the workers.

• Highlight, page 8
Our SEAS optimiza- tion instead stores the edges of a supervertex s in a distributed fash- ion among all of its subvertices.

• Highlight, page 8
In our implementation of SEAS, subvertices store a pointer to their latest supervertices.

• Highlight, page 8
twofold

• Highlight, page 8
which is proportional to the number of edges in the graph,

### wdfnst

Spark GraphX Pregel API: An Example

GraphX Pregel API Graph data and graph processing is getting more and more attention lately in various fields. It has become apparent that a large number of real world problems ......

openthings
2016/09/02
98
1
Spark GraphX 编程指南

GraphX编程指南 (根据原文编辑：http://udn.yyuap.com/doc/spark-programming-guide-zh-cn/graphx-programming-guide/index.html) GraphX是一个新的（alpha）Spark API，它用于图和并行图(gr......

openthings
2016/08/29
50
0
GraphX Programming Guide

GraphX Overview GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstractio......

openthings
2016/08/29
10
0
100 open source Big Data architecture papers

Big Data technology has been extremely disruptive with open source playing a dominant role in shaping its evolution. While on one hand it has been disruptive, on the other it ha......

naughty
2016/04/05
55
0
Pregel：基于图分割的图结构数据并行处理

jhonephone
2014/01/17
0
0

# -*- coding: utf-8 -*-"""Created on Thu Apr 18 17:05:47 2019@author: HeyJude"""import timestart_time = time.time()def GetText(pic_path, text_path): import pytesser......

KYO4321
3分钟前
1
0
Java多线程之创建线程的三种方式比较

7分钟前
0
0
inner join 与 left join 之间的区别

dragon_tech
8分钟前
0
0
linux下cpio.gz文件的解压方法

linux下cpio.gz文件的解压方法 linux下cpio.gz文件的解压方法linux解压cpiocpio.gz 今天下载了 10201_database_linux_x86_64.cpio.gz 文件，解压方法如下： 1. gunzip 10201_database_linux...

10分钟前
1
0
Shell分析服务器日志，解锁各种新姿势

Shell分析服务器日志，解锁各种新姿势 DevOps技术栈 5月10日 作者：Panda 原文：https://segmentfault.com/a/1190000009745139 自己的小网站跑在阿里云的ECS上面,偶尔也去分析分析自己网站服...

linzhuangrong
12分钟前
2
0