Map-Reduce based Multiple Sub-Graph Enumeration Using Dominating-Set Graph Partition

Автор: Fathimabi shaik, RBV Subramanyam, DVLN Somayajulu

Журнал: International Journal of Information Engineering and Electronic Business(IJIEEB) @ijieeb

Статья в выпуске: 2 vol.9, 2017 года.

Бесплатный доступ

The purpose of this paper is to find all the instances of a given set of pattern graphs (sub-graphs) in a large data graph using a single round of Map-Reduce. For the simplest pattern graphs like a triangle and rectangle we propose the solution. This paper enumerates complex pattern graphs using the enumeration of simple pattern graphs. We proposed Dominating set based graph partition, it generates non-overlapped sub-graphs. Each sub-graph is processed by one machine in the cluster. We analyze both the communication cost and the total computational cost. Communication cost is reduced by using Map-Reduce based dominating set graph partition. At the same time Multiple pattern (sub-graphs) graphs can be enumerated with the same communication cost. The proposed method is not always superior to the conventional sub-graph enumeration, but in some cases involving large-scale data where this method wins, including (1) Adjacency list representation of the graph is the input (2) Number of partitions are decided based on the graph size. We experimentally show that our approach decreases significantly the computation cost, communication cost and scales the enumeration process with a large graph database.

Еще

Big Graph Processing, Map-Reduce, Graph Partition, Triangle Listing, Sub-Graph Enumeration

Короткий адрес: https://sciup.org/15013502

IDR: 15013502

Список литературы Map-Reduce based Multiple Sub-Graph Enumeration Using Dominating-Set Graph Partition

  • Alon, Noga, Raphael Yuster, and Uri Zwick. "Finding and counting given length cycles." Algorithmica 17.3 (1997): 209-223. DOI: 10.1007/BF02523189
  • Kowaluk, Mirosław, Andrzej Lingas, and Eva-Marta Lundell. "Counting and detecting small subgraphs via equations and matrix multiplication." Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms. SIAM, 2011.
  • Tsourakakis, Charalampos E., et al. "Doulion: counting triangles in massive graphs with a coin." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009,doi:10.1145/1557019.1557111
  • Schank, Thomas. "Algorithmic aspects of triangle-based network analysis." Phd in computer science, University Karlsruhe 3 (2007).
  • Suri, Siddharth, and Sergei Vassilvitskii. "Counting triangles and the curse of the last reducer." Proceedings of the 20th international conference on World wide web. ACM, 2011.DOI: 10.1145/1963405.1963491
  • Kolda, Tamara G., et al. "Counting triangles in massive graphs with Map-Reduce." SIAM Journal on Scientific Computing 36.5 (2014): S48-S77. DOI: 10.1137/13090729X
  • Pagh, Rasmus, and Charalampos E. Tsourakakis. "Colorful triangle counting and a Map-Reduce implementation." Information Processing Letters 112.7 (2012): 277-281.
  • Afrati, Foto N., Dimitris Fotakis, and Jeffrey D. Ullman. "Enumerating subgraph instances using map-reduce." Data Engineering (ICDE), 2013 IEEE 29th International Conference on. IEEE, 2013. DOI: 10.1109/ICDE.2013.6544814
  • Afrati, Foto N., and Jeffrey D. Ullman. "Optimizing multiway joins in a map-reduce environment." IEEE Transactions on Knowledge and Data Engineering 23.9 (2011): 1282-1298.DOI: 10.1145/1739041.1739056
  • Lai, Longbin, et al. "Scalable subgraph enumeration in Map-Reduce." Proceedings of the VLDB Endowment 8.10 (2015): 974-985. DOI: 10.14778/2794367.2794368
  • Chu, Shumo, and James Cheng. "Triangle listing in massive networks and its applications." Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011. DOI: 10.1145/2020408.2020513
  • Park, Ha-Myung, and Chin-Wan Chung. "An efficient Map-Reduce algorithm for counting triangles in a very large graph." Proceedings of the 22nd ACM international conference on Information & Knowledge Management. ACM, 2013. DOI: 10.1145/2505515.2505563
  • Hu, Xiaocheng, Yufei Tao, and Chin-Wan Chung. "Massive graph triangulation." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.DOI:10.1145/2463676.2463704
  • Sun, Zhao, et al. "Efficient subgraph matching on billion node graphs." Proceedings of the VLDB Endowment 5.9 (2012): 788-799. DOI: 10.14778/2311906.2311907
  • Viger, Fabien, and Matthieu Latapy. "Efficient and simple generation of random simple connected graphs with prescribed degree sequence." International Computing and Combinatorics Conference. Springer Berlin Heidelberg, 2005. DOI: 10.1007/11533719_45
  • Zhang, Xiaofei, Lei Chen, and Min Wang. "Efficient multi-way theta-join processing using Map-Reduce." Proceedings of the VLDB Endowment 5.11 (2012): 1184-1195.DOI: 10.14778/2350229.2350238
  • Zhao, Peixiang, and Jiawei Han. "On graph query optimization in large networks." Proceedings of the VLDB Endowment 3.1-2 (2010): 340-351. DOI: 10.14778/1920841.1920887
  • Zhao, Zhao, et al. "Subgraph enumeration in large social contact networks using parallel color coding and streaming." 2010 39th International Conference on Parallel Processing. IEEE, 2010.DOI: 10.1109/ICPP.2010.67
  • Wang, Chaokun, et al. "MapDupReducer: detecting near duplicates over massive datasets." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010.DOI: 10.1145/1807167.1807296
  • Chiba, Norishige, and Takao Nishizeki. "Arboricity and subgraph listing algorithms." SIAM Journal on Computing 14.1 (1985): 210-223.
  • Lin, Jimmy, and Chris Dyer. "Data-intensive text processing with Map-Reduce." Synthesis Lectures on Human Language Technologies3.1(2010):1-170
  • Hadoop Distributed File System hdfs : http://hadoop.apache.org/hdfs
  • Hadoop. http://hadoop.apache.org
  • Ghemawat, Sanjay, Howard Gobioff, and Shun-Tak Leung. "The Google file system." ACM SIGOPS operating systems review. Vol. 37. No. 5. ACM, 2003.DOI: 10.1145/1165389.945450
  • J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. In OSDI, pages 137–150, 2004. doi: 10.1016/j.procs.2015.07.392
  • J. Cohen. Graph twiddling in a mapreduce world.Computing in Science and Engineering, 11(4):29–41, 2009.
  • Vernica, Rares, Michael J. Carey, and Chen Li. "Efficient parallel set-similarity joins using MapReduce." Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010, 10.1145/1807167.1807222
  • Dilbag Singh,Jaswinder Singh,Amit Chhabra "Failures in Cloud Computing Data Centers in 3-tier Cloud Architecture", IJIEEB, vol.4, no.3, pp.1-8, 2012. DOI: 10.5815/ijieeb.2012.03.01
  • Jitendra Singh,"Study of Response Time in Cloud Computing", IJIEEB, vol.6, no.5, pp.36-43, 2014. DOI: 10.5815/ijieeb.2014.05.06
  • Karim Zarour, Nacereddine Zarour,"Data Center Strategy to Increase Medical Information Sharing in Hospital Information Systems", IJIEEB, vol.5, no.1, pp.33-39, 2013. DOI: 10.5815/ijieeb.2013.01.04
Еще
Статья научная