Blogel: A Block-Centric Framework for Distributed Computation on Real-World Graphs

如何分块?

1. Voronoi Digram 划分 The GVD computation can be easily implemented in the vertex- centric computing model, by performing multi-source BFS. Specif- ically, in superstep 1, each source s sets block(s) = s and broad- casts it to the neighbors; for each non-source vertex v, block(v) is unassigned. Finally, the vertex votes to halt. In superstep i (i > 1), if block(v) is unassigned, v sets block(v) to an arbitrary source received, and broadcasts block(v) to its neighbors before voting to halt. Otherwise, v votes to halt directly. When the process con- verges, we have block(v) = si for each v ∈ V C(si).

2. 2D划分
The first job is vertex-centric and works as follows: (1)each worker samples a subset of its vertices with probability psamp and sends the sample to the master; (2)the master first partitions the sampled vertices into nx slots by the x-coordinates, and then each slot is further partitioned into ny slots by the y-coordinates.

如何管理块? (计算, 消息传递)

Block-centric algorithm. Our block-centric solution operates in VB-mode. Each vertex maintains the same fields as in the vertex- centric algorithm, and blocks do not maintain any information. In each superstep, V-compute() is first executed for all vertices, where a vertex v finds w∗ from the incoming messages as in the vertex- centric algorithm. However, now v votes to halt only if d(w∗) ≥ dist(v). Otherwise, v updates ⟨prev(v), dist(v)⟩ = ⟨w∗, d(w∗)⟩ but stays active. Then, B-compute() is executed, where each block B collects all its active vertices v into a priority queue Q (with dist(v) as the key), and makes these vertices vote to halt. B- compute() then runs Dijkstra’s algorithm on B using Q, which re- moves the vertex v ∈ Q with the smallest value of dist(v) from Q for processing each time. The out-neighbors u ∈ Γ(v) are updated as follows. For each u ∈ V (B), if dist(v)+l(v, u) < dist(u), we update ⟨prev(u), dist(u)⟩ to be ⟨v, dist(v)+l(v, u)⟩, and insert u into Q with key dist(u) if u ∈/ Q, or update dist(u) if u is already in Q. For each u ̸∈ V (B), a message ⟨v, dist(v) + l(v, u)⟩ is sent to u. B votes to halt when Q becomes empty. In the next superstep, if a vertex u receives a message, u is activated along with its block, and the block-centric computation repeats.

• Highlight, page 1
For processing graphs with a large diameter δ, the message (or neighbor) propagation paradigm of the vertex-centric model often leads to algorithms that require O(δ) rounds (also called super- steps) of computation.

• Underline, page 1
For example, a single-source shortest path algorithm in  takes 10,789 supersteps on a USA road network.

• Underline, page 1
Apart from spatial networks, some large web graphs also have large diameters (from a few hundred to thousands). For example, the vertex-centric system in  takes 2,450 rounds for computing strongly connected components on a web graph.

• Highlight, page 2
loading: each worker loads a portion of vertices from HDFS into main-memory; the workers then exchange ver- tices through the network (by hashing over vertex ID) so that each worker wi finally holds all and only those vertices assigned to wi

• Highlight, page 3
BTC

• Highlight, page 3
Friendster

• Highlight, page 3

• Highlight, page 3
On the contrary, the block- centric model works on G and a high-degree vertex involves at most O(n/b) messages each round, where n is the number of vertices in the giant CC.

• Highlight, page 3
The idea is to broadcast the smallest vertex ID seen so far by each vertex v, denoted by min(v).

• Highlight, page 3
Each vertex be- longs to a unique block, and let block(v) be the ID of the block that v belongs to.

• Highlight, page 4
Similar to a vertex in Pregel, a block in Blogel also has a com- pute() function. We use B-compute() and V-compute() to denote the compute() function of a block and a vertex, respectively.

• Highlight, page 4
A block has access to all its vertices, and can send messages to any block B or vertex v as long as worker(B) or worker(v) is available.

• Highlight, page 4
Each B-worker maintains two message buffers, one for exchang- ing vertex-level messages and the other for exchanging block-level messages. A block also has a state indicating whether it is active, and may vote to halt.

• Highlight, page 6
Block-centric algorithm. Our block-centric solution operates in VB-mode. Each vertex maintains the same fields as in the vertex- centric algorithm, and blocks do not maintain any information. In each superstep, V-compute() is first executed for all vertices, where a vertex v finds w∗ from the incoming messages as in the vertex- centric algorithm. However, now v votes to halt only if d(w∗) ≥ dist(v). Otherwise, v updates ⟨prev(v), dist(v)⟩ = ⟨w∗, d(w∗)⟩ but stays active. Then, B-compute() is executed, where each block B collects all its active vertices v into a priority queue Q (with dist(v) as the key), and makes these vertices vote to halt. B- compute() then runs Dijkstra’s algorithm on B using Q, which re- moves the vertex v ∈ Q with the smallest value of dist(v) from Q for processing each time. The out-neighbors u ∈ Γ(v) are updated as follows. For each u ∈ V (B), if dist(v)+l(v, u) < dist(u), we update ⟨prev(u), dist(u)⟩ to be ⟨v, dist(v)+l(v, u)⟩, and insert u into Q with key dist(u) if u ∈/ Q, or update dist(u) if u is already in Q. For each u ̸∈ V (B), a message ⟨v, dist(v) + l(v, u)⟩ is sent to u. B votes to halt when Q becomes empty. In the next superstep, if a vertex u receives a message, u is activated along with its block, and the block-centric computation repeats.

• Highlight, page 6
v finds the in-neighbor w∗ such that d(w∗) is the small- est among all d(w) received.

• Text Note, page 6

• Highlight, page 7
We first review the Graph Voronoi Diagram (GVD)  of an undirected unweighted graph G = (V, E).

• Highlight, page 8
The sampling rate psamp decides the number of blocks, and usually a small value as 0.1% is a good choice.

• Highlight, page 8
As for the stopping parameters, γ is usually set as 90%, and pmax as 10% with f = 2, so that there will not be too many rounds of multi-source BFS.

• Highlight, page 8
The GVD computation can be easily implemented in the vertex- centric computing model, by performing multi-source BFS.

• Highlight, page 8
Specif- ically, in superstep 1, each source s sets block(s) = s and broad- casts it to the neighbors; for each non-source vertex v, block(v) is unassigned. Finally, the vertex votes to halt. In superstep i (i > 1), if block(v) is unassigned, v sets block(v) to an arbitrary source received, and broadcasts block(v) to its neighbors before voting to halt. Otherwise, v votes to halt directly. When the process con- verges, we have block(v) = si for each v ∈ V C(si).

• Highlight, page 8
The multi-source BFS has linear workload since each vertex only broadcasts a message to its neighbors when block(v) is assigned, and thus the total messages exchanged by all vertices is bounded by O(|E|).

• Highlight, page 8
Initially, each vertex v samples itself as a source with probability psamp. Then, multi- source BFS is performed to partition the vertices into blocks.

• Highlight, page 8
The first job is vertex-centric and works as follows: (1)each worker samples a subset of its vertices with probability psamp and sends the sample to the master; (2)the master first partitions the sampled vertices into nx slots by the x-coordinates, and then each slot is further partitioned into ny slots by the y-coordinates.

• Highlight, page 9
Data Type |V| |E| AVG Deg Max Deg Web Graphs WebUK directed 133,633,040 5,507,679,822 41.21 22,429 WebBase directed 118,142,155 1,019,903,190 8.63 3,841 Social Networks Friendster undirected 65,608,366 3,612,134,270 55.06 5,214 LiveJournal directed 10,690,276 224,614,770 21.01 1,053,676 RDF BTC undirected 164,732,473 772,822,094 4.69 1,637,619 Spatial Networks USA Road undirected 23,947,347 58,333,344 2.44 9 Euro Road undirected 18,029,721 44,826,904 2.49 12

• Highlight, page 10
Giraph++  proposed a graph coarsening method to reduce the size of the input graph so that METIS can run on the smaller graph. wdfnst

MnasNet：终端轻量化模型新思路

2018/08/12
0
0
12月20日，哈工大（深圳）邀您参与人工智能国际顶会论文报告会啦！

2017/12/18
0
0
nginx cpu过高或过低--状态简单分析与监控

nginx cpu占用率过高,可能是CPU密集型计算导致堵塞的. 分析工具 https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt https://github.com/agentzh/stapxx#ngx-lj-lua-stacks 2) ......

testwork
2016/05/31
92
0
protobuf-netbeans-plugin

protobuf-netbeans-plugin 是一个 NetBeans 用来开发 Protocol Buffers 的模块。 Protocol Buffers 是Google公司开发的一种数据描述语言，类似于XML能够将结构化数据序列化，可用于数据存储、...

2010/03/12
1K
0

36大数据
02/21
0
0

3分钟看懂Activity启动流程

29分钟前
1
0

##XGBoost for GPU安装https://blog.csdn.net/weixin_30963287/article/details/79145107https://blog.csdn.net/wl2858623940/article/details/80546140https://blog.csdn.net/u01164186......

KYO4321
32分钟前
1
0

Windows 10 May 2019（1903版）正式释出将近一个月，或许已经有用户自主安装更新了，不过微软认为还不够多。微软表示将开始训练机器学习（machine learning）技术，帮助1803版本以前的PC更新...

yisy5566

0
0

0
0

好程序员web前端教程分享三大前端框架相关问题，三大前端框架，有没有哪个框架的组件间交互像js的方法传值一样简单？ 首先框架组件通信是为了方便组件模块之间进行数据交互的，因为框架的...

0
0 