A Fast Topological Parallel Algorithm for Traversing Large Datasets
Автор: Thiago Nascimento Rodrigues
Журнал: International Journal of Information Technology and Computer Science @ijitcs
Статья в выпуске: 1 Vol. 15, 2023 года.
Бесплатный доступ
This work presents a parallel implementation of a graph-generating algorithm designed to be straightforwardly adapted to traverse large datasets. This new approach has been validated in a correlated scenario known as the word ladder problem. The new parallel algorithm induces the same topological structure proposed by its serial version and also builds the shortest path between any pair of words to be connected by a ladder of words. The implemented parallelism paradigm is the Multiple Instruction Stream - Multiple Data Stream (MIMD) and the test suite embraces 23-word ladder instances whose intermediate words were extracted from a dictionary of 183,719 words (dataset). The word morph quality (the shortest path between two input words) and the word morph performance (CPU time) were evaluated against a serial implementation of the original algorithm. The proposed parallel algorithm generated the optimal solution for each pair of words tested, that is, the minimum word ladder connecting an initial word to a final word was found. Thus, there was no negative impact on the quality of the solutions comparing them with those obtained through the serial ANG algorithm. However, there was an outstanding improvement considering the CPU time required to build the word ladder solutions. In fact, the time improvement was up to 99.85%, and speedups greater than 2.0X were achieved with the parallel algorithm.
Word Ladder, Parallel Algorithm, Topological Structure, Graph Algorithm, Dataset Traversing
Короткий адрес: https://sciup.org/15018918
IDR: 15018918 | DOI: 10.5815/ijitcs.2023.01.01
Список литературы A Fast Topological Parallel Algorithm for Traversing Large Datasets
- H. Xingwei, “Sorting big data by revealed preference with application to college ranking,” Journal of Big Data, vol. 7, no. 30, pp. 1–26, 2020.
- Y. Chi, X. Cheng, C. Song, R. Xia, L. Xu, and Z. Li, “Sorting and utilizing of telecom operators’ data assets based on big data,” in 2019 IEEE International Conferences on Ubiquitous Computing Communications (IUCC) and Data Science and Computational Intelligence (DSCI) and Smart Computing, Networking and Services (SmartCNS), pp. 621–625, 2019.
- Y. Zhan and K. H. Tan, “An analytic infrastructure for harvesting big data to enhance supply chain performance,” European Journal of Operational Research, vol. 281, no. 3, pp. 559–574, 2020.
- J. Klüver, J. Schmidt, and C. Klüver, “Word morph and topological structures: A graph generating algorithm,” Complexity, vol. 21, pp. 426–436, sep/oct 2016.
- S. Iyengar, N. Zweig, A. Natarajan, and V. Madhavan, “A network analysis approach to understand humanwayfinding problem,” in Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 33, pp. 2836–2841, 2011.
- S. Milgram, “The Small-World Problem,” Psychology Today, vol. 1, no. 1, pp. 61–67, 1967.
- J. M. Kleinberg, “Small-world phenomena and the dynamics of information,” in Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, NIPS 2001, December 3-8, 2001, Vancouver, British Columbia, Canada], pp. 431–438, MIT Press, 2001.
- D. Watts, Small Worlds: the dynamics of networks between order and randomness. Princeton Univ Pr, 1999.
- D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.
- J. Leskovec and J. Shawe-Taylor, “Semantic text features from small world graphs,” in Subspace, Latent Structure and Feature Selection techniques: Statistical and Optimization perspectives Workshop, Slovenia, 2005.
- D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-mat: A recursive model for graph mining,” in SIAM International Conference on Data Mining, 2004.
- A. Mahmood, A. Jabbar, E. K. Cetinkaya, and J. P. G. Sterbenz, “Deriving network topologies from real world constraints,” 2010 IEEE Globecom Workshops, pp. 400–404, 2010.
- A. Ranganathan and F. Dellaert, “Online probabilistic topological mapping,” The International Journal of Robotics Research, vol. 30, no. 6, pp. 755–771, 2011.
- J. Leeuwen and J. v. Leeuwen, Algorithms and Complexity. Elsevier Science, 1990.
- A. Prat-Perez, D. Dominguez-Sal, J.-L. Larriba-Pey, and P. Trancoso, “Producer-consumer: The programming model for future many-core processors,” in Architecture of Computing Systems – ARCS 2013 (H. Kubátová, C. Hochberger, M. Danek, and B. Sick, eds.), (Berlin, Heidelberg), pp. 110–121, Springer Berlin Heidelberg, 2013.
- M. J. Flynn, “Very high-speed computing systems,” Proceedings of the IEEE, vol. 54, pp. 1901 – 1909, December 1966.
- P. Pacheco, An Introduction to Parallel Programming. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1st ed., 2011.
- T. N. Rodrigues, “Parallel ANG Algorithm for the Word Ladder Game.” https://github.com/tnas/word-ladder, 2021.
- V. S. Adamchik, “S&Q: Word Ladder.” https://viterbiweb.usc.edu/ adamchik/15-121/labs.html, 2009.