CHex: An Efficient RDF Storage and Indexing Scheme for Column-Oriented Databases
Authors: Xin Wang, Shuyi Wang, Pufeng Du, Zhiyong Feng
Journal: International Journal of Modern Education and Computer Science (IJMECS)
Issue: Vol. 3, No. 3, 2011.
Free access
As increasingly large RDF data sets are being published on the Web, efficient RDF data management has become an essential factor in realizing the Semantic Web vision. However, most existing RDF storage schemes, which are built on top of row-store relational databases, are constrained in terms of efficiency and scalability. Still, the growing popularity of the RDF format in real-world applications arguably calls for an effort to deal with these drawbacks. In this paper, we propose a novel RDF storage and indexing scheme, called CHex, which uses the triple nature of RDF as an asset to implement sextuple indexing for a column-oriented database system. Using binary association tables (BATs) in the column-oriented data model, RDF data is indexed in six possible ways, one for each possible ordering of the three RDF elements. The sextuple indexing scheme in a column-oriented database not only provides efficient single triple pattern lookups, but also allows fast merge-joins for any pair of triple patterns. To evaluate the performance of our approach, we generate large-scale data sets of up to 13 million triples, and devise benchmark queries that cover important RDF join patterns. The experimental results show that our approach outperforms row-oriented database systems by up to an order of magnitude and is even competitive with the best state-of-the-art native RDF store.
RDF, storage scheme, sextuple indexing, column-oriented database, binary association table, URI
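The sextuple indexing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the CHex implementation: the paper builds its indexes on binary association tables in a column-oriented database, whereas this sketch uses plain sorted Python lists, and the sample triples and the names `build_indexes`, `lookup`, and `merge_join` are hypothetical. It only shows why keeping all six orderings of (subject, predicate, object) lets any triple pattern be answered by a range scan whose result is already sorted for a merge-join.

```python
from bisect import bisect_left
from itertools import permutations

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "worksAt", "acme"),
    ("bob", "knows", "carol"),
    ("bob", "worksAt", "acme"),
    ("carol", "worksAt", "initech"),
]

POS = {"s": 0, "p": 1, "o": 2}


def build_indexes(triples):
    """Build one sorted list per ordering: spo, sop, pso, pos, osp, ops."""
    indexes = {}
    for order in permutations("spo"):
        name = "".join(order)
        indexes[name] = sorted(tuple(t[POS[c]] for c in order) for t in triples)
    return indexes


def lookup(indexes, order, prefix):
    """Return the rows of index `order` whose leading components equal `prefix`."""
    rows = indexes[order]
    lo = bisect_left(rows, prefix)  # first row >= prefix
    hi = lo
    while hi < len(rows) and rows[hi][:len(prefix)] == prefix:
        hi += 1
    return rows[lo:hi]


def merge_join(left, right):
    """Join two lists that are both sorted on their first component."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = left[i][0], right[j][0]
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(left) and left[i_end][0] == kl:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == kl:
                j_end += 1
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append((kl, l[1:], r[1:]))
            i, j = i_end, j_end
    return out


indexes = build_indexes(TRIPLES)

# Pattern pair: (?x knows ?y) joined with (?x worksAt ?z) on ?x.
# The pso index returns each predicate's rows already sorted by subject.
knows = [row[1:] for row in lookup(indexes, "pso", ("knows",))]
works = [row[1:] for row in lookup(indexes, "pso", ("worksAt",))]
joined = merge_join(knows, works)
```

Because both inputs come out of the same ordering, no extra sort is needed before the join; a different join variable (for example, joining on the object) would simply use a different one of the six orderings.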
Short URL: https://sciup.org/15010211
IDR: 15010211
References for CHex: An Efficient RDF Storage and Indexing Scheme for Column-Oriented Databases
- F. Manola, E. Miller, and B. McBride, “RDF primer,” W3C Recommendation, 10 February 2004.
- G. Klyne, J. J. Carroll, and B. McBride, “Resource description framework (RDF): concepts and abstract syntax,” W3C Recommendation, 10 February 2004.
- P. Hayes and B. McBride, “RDF semantics,” W3C Recommendation, 10 February 2004.
- T. Berners-Lee, J. Hendler, and O. Lassila, “The Semantic Web,” Scientific American, 284(5), pp. 34–43, 2001.
- E. Prud’hommeaux and A. Seaborne, “SPARQL query language for RDF,” W3C Recommendation, 15 January 2008.
- S. Harris and A. Seaborne, “SPARQL 1.1 query language,” W3C Working Draft, 14 October 2010.
- S. Harris and N. Gibbins, “3store: Efficient bulk RDF storage,” In Proc. PSSS, pp. 1–20, 2003.
- K. Wilkinson, “Jena property table implementation,” In Proc. SSWS, pp. 54–68, 2006.
- D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach, “Scalable semantic web data management using vertical partitioning,” In Proc. VLDB, pp. 411–422, 2007.
- L. Sidirourgos, R. Goncalves, M. Kersten, N. Nes, and S. Manegold, “Column-store support for RDF data management: not all swans are white,” In Proc. VLDB, pp. 1553–1563, 2008.
- T. Neumann and G. Weikum, “RDF-3X: a RISC-style engine for RDF,” In Proc. VLDB, pp. 647–659, 2008.
- C. Weiss, P. Karras, and A. Bernstein, “Hexastore: sextuple indexing for semantic web data management,” In Proc. VLDB, pp. 1008–1019, 2008.
- M. Schmidt, T. Hornung, N. Küchlin, G. Lausen, and C. Pinkel, “An experimental comparison of RDF data management approaches in a SPARQL benchmark scenario,” In Proc. ISWC, pp. 82–97, 2008.
- Y. Guo, Z. Pan, and J. Heflin, “LUBM: A benchmark for OWL knowledge base systems,” Web Semantics 3(2), pp. 158–182, 2005.
- P. Boncz and M. Kersten, “MIL primitives for querying a fragmented world,” VLDB Journal, 8(2), pp. 101–119, 1999.
- X. Wang, S. Wang, P. Du, and Z. Feng, “Storing and indexing RDF data in a column-oriented DBMS,” In Proc. DBTA, pp. 46–49, 2010.