文档章节

论文阅读 -- Optimizing Graph Algorithms on Pregel-like

wdfnst
 wdfnst
发布于 2016/05/17 14:24
字数 406
阅读 10
收藏 0

Optimizing Graph Algorithms on Pregel-like Systems

Semih Salihoglu Stanford University semih@cs.stanford.edu Jennifer Widom Stanford University widom@cs.stanford.edu

  • Highlight, page 1
    Standard graph algorithms in this setting can incur unnecessary in- efficiencies such as slow convergence or high communication or computation cost

  • Highlight, page 1
    Standard graph algorithms in this setting can incur unnecessary in- efficiencies such as slow convergence or high communication or computation cost

  • Highlight, page 1
    We identify a variety of inefficiencies that arise when ex- ecuting these algorithms on Pregel-like systems, and we describe optimization techniques, some of which are applicable across mul- tiple algorithms, to address them.

  • Highlight, page 1
    large diameters or skew in component sizes.

  • Highlight, page 1
    performing some serial computation on a tiny fraction of the in- put graph

  • Underline, page 1
    complementing

  • Highlight, page 1
    our open-source Pregel implementation

  • Highlight, page 1
    In GPS [42] and Giraph [17], an op- tional master.compute() function is executed by the Master task between supersteps to perform serial computation, and for coor- dination in algorithms that are composed of multiple vertex-centric stages.

  • Highlight, page 1
    The optimization techniques we offer in this paper focus on reducing the first two costs: communication and number of su- persteps.

  • Highlight, page 2
    FCS monitors the size of the “active” graph on which the computation is executing. If the active graph becomes small enough, FCS sends it to the master, which performs the end of the computation serially inside master.compute(), then sends the results back to the workers.

  • Highlight, page 2
    Storing Edges At Subvertices (SEAS): SEAS is an optimiza- tion we apply primarily to our MSF algorithm (Table 1), in which sets of vertices (called subvertices) are merged to form supervertices.

  • Highlight, page 6
    FCS monitors the size the active- subgraph. Once the size of the active-subgraph is below a threshold (5M edges by default), it sends the active-subgraph to the mas- ter, which performs the rest of the computation serially, and sends the results back to the workers.

  • Highlight, page 8
    Our SEAS optimiza- tion instead stores the edges of a supervertex s in a distributed fash- ion among all of its subvertices.

  • Highlight, page 8
    In our implementation of SEAS, subvertices store a pointer to their latest supervertices.

  • Highlight, page 8
    twofold

  • Highlight, page 8
    which is proportional to the number of edges in the graph,

© 著作权归作者所有

wdfnst
粉丝 2
博文 27
码字总数 22859
作品 1
宁波
私信 提问
Spark GraphX Pregel API: An Example

GraphX Pregel API Graph data and graph processing is getting more and more attention lately in various fields. It has become apparent that a large number of real world problems ......

openthings
2016/09/02
98
1
Spark GraphX 编程指南

GraphX编程指南 (根据原文编辑:http://udn.yyuap.com/doc/spark-programming-guide-zh-cn/graphx-programming-guide/index.html) GraphX是一个新的(alpha)Spark API,它用于图和并行图(gr......

openthings
2016/08/29
50
0
GraphX Programming Guide

GraphX Overview GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new Graph abstractio......

openthings
2016/08/29
10
0
100 open source Big Data architecture papers

Big Data technology has been extremely disruptive with open source playing a dominant role in shaping its evolution. While on one hand it has been disruptive, on the other it ha......

naughty
2016/04/05
55
0
Pregel:基于图分割的图结构数据并行处理

Pregel设计在google的计算机集群结构之上。一个计算机集群(cluster)就是通用PC按rack(一组PC机)构成,Rack之间具有较高的数据传输速度。集群中通常包含一个域名服务器(namenode),采用...

jhonephone
2014/01/17
0
0

没有更多内容

加载失败,请刷新页面

加载更多

识别图片内容,并将相应内容写到对应文本文件中

# -*- coding: utf-8 -*-"""Created on Thu Apr 18 17:05:47 2019@author: HeyJude"""import timestart_time = time.time()def GetText(pic_path, text_path): import pytesser......

KYO4321
3分钟前
1
0
Java多线程之创建线程的三种方式比较

一:继承Thread类创建线程 1:继承Thread类定义线程子类; 2:重写run()方法,定义线程的操作; 3:通过创建的线程子类对象.start() 启动线程。 package com.thread; public class First...

天王盖地虎626
7分钟前
0
0
inner join 与 left join 之间的区别

关于inner join 与 left join 之间的区别,以前以为自己搞懂了,今天从前端取参数的时候发现不是预想中的结果,才知道问题出在inner join 上了。 需求是从数据库查数据,在前端以柱形图的形式...

dragon_tech
8分钟前
0
0
linux下cpio.gz文件的解压方法

linux下cpio.gz文件的解压方法 linux下cpio.gz文件的解压方法linux解压cpiocpio.gz 今天下载了 10201_database_linux_x86_64.cpio.gz 文件,解压方法如下: 1. gunzip 10201_database_linux...

突突突酱
10分钟前
1
0
Shell分析服务器日志,解锁各种新姿势

Shell分析服务器日志,解锁各种新姿势 DevOps技术栈 5月10日 作者:Panda 原文:https://segmentfault.com/a/1190000009745139 自己的小网站跑在阿里云的ECS上面,偶尔也去分析分析自己网站服...

linzhuangrong
12分钟前
2
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部