Lizard 
Clustering an RDF Triplestore 
Andy Seaborne 
andy@a.o
Why Cluster? 
➢ Service Resilience 
○ Failures 
○ Server admin and security patches 
➢ Performance / scale 
○ More hardware : CPU, RAM, system bus
Why Cluster? 
➢ Alternatives 
○ Full clustering 
○ Master-slave (scales read load only; update issues) 
○ Application visible partitioning
Who am I? 
➢ Committer on Apache Jena 
○ Deploy/operate Jena/TDB in my paid job 
➢ W3C 
○ Co-editor of the SPARQL 1.0 and 1.1 query language specs 
○ RDF 1.1 (on syntax, including SPARQL alignment) 
○ ASF’s W3C Advisory Committee (AC) representative
Acknowledgements 
➢ Apache 
➢ Partial funding : InnovateUK* 
➢ Users 
○ For the discussion and encouragement 
* Formerly the Technology Strategy Board; sponsored by the UK Department for Business, Innovation & Skills.
Outline 
➢ TDB Design 
➢ SPARQL Execution 
➢ Lizard Design 
➢ Back to SPARQL
TDB 
➢ Custom RDF Database 
○ Quads + Triples 
➢ Custom index code 
○ Threaded B+Trees 
➢ Dictionary terms 
○ NodeId – 8 bytes 
○ Inline terms (numbers, dates, dateTimes)
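The "NodeId – 8 bytes" and inline-terms bullets can be pictured with a small encoder. This is a sketch only: the split into an 8-bit type tag and a 56-bit payload, and the tag values, are assumptions for illustration, not TDB's documented layout. Small integers live inside the NodeId itself; any other term would get a reference into the node table.

```python
from typing import Optional, Tuple

# Hypothetical layout (an assumption, not TDB's documented format):
# the top 8 bits of the 64-bit NodeId are a type tag, the low 56 bits a payload.
PAYLOAD_BITS = 56
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
TAG_PTR = 0x00         # payload = reference into the node table
TAG_INLINE_INT = 0x01  # payload = the integer value itself
MIN_INLINE = -(1 << (PAYLOAD_BITS - 1))
MAX_INLINE = (1 << (PAYLOAD_BITS - 1)) - 1


def encode_ptr(ref: int) -> int:
    """NodeId for a term stored in the node table."""
    if not 0 <= ref <= PAYLOAD_MASK:
        raise ValueError("reference does not fit in 56 bits")
    return (TAG_PTR << PAYLOAD_BITS) | ref


def encode_int(value: int) -> Optional[int]:
    """Inline NodeId for an integer, or None if it must go in the node table."""
    if not MIN_INLINE <= value <= MAX_INLINE:
        return None
    # Store the value as 56-bit two's complement under the tag.
    return (TAG_INLINE_INT << PAYLOAD_BITS) | (value & PAYLOAD_MASK)


def decode(node_id: int) -> Tuple[str, int]:
    tag, payload = node_id >> PAYLOAD_BITS, node_id & PAYLOAD_MASK
    if tag == TAG_PTR:
        return ("ptr", payload)
    if tag == TAG_INLINE_INT:
        if payload >> (PAYLOAD_BITS - 1):      # sign bit set: sign-extend
            payload -= 1 << PAYLOAD_BITS
        return ("int", payload)
    raise ValueError(f"unknown tag {tag:#x}")


def to_bytes(node_id: int) -> bytes:
    return node_id.to_bytes(8, "big")          # always exactly 8 bytes
```

Decoding an inline value needs no I/O at all, which is the point of inlining numbers, dates and dateTimes.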
TDB : Node Table
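The node table slide is a diagram. As a rough sketch of the role it plays, assuming terms are plain strings and NodeIds are allocated sequentially in memory (`NodeTable` and its methods are hypothetical names), the node table is a two-way dictionary: loading maps terms to NodeIds, query compilation looks constants up, and results are turned back into terms only at the end.

```python
from typing import Dict, List, Optional


class NodeTable:
    """Toy bidirectional dictionary between RDF terms and NodeIds.

    Terms are plain strings here (e.g. "<http://example/x1>") and NodeIds
    are allocated sequentially in memory; a persistent store would keep
    both directions on disk."""

    def __init__(self) -> None:
        self._id_of: Dict[str, int] = {}
        self._term_of: List[str] = []

    def get_or_alloc(self, term: str) -> int:
        """Used when loading data: reuse the NodeId or allocate a new one."""
        node_id = self._id_of.get(term)
        if node_id is None:
            node_id = len(self._term_of)
            self._term_of.append(term)
            self._id_of[term] = node_id
        return node_id

    def get_id(self, term: str) -> Optional[int]:
        """Used when compiling a query: an unknown term means no match."""
        return self._id_of.get(term)

    def get_term(self, node_id: int) -> str:
        """Used when producing results: NodeId back to the term."""
        return self._term_of[node_id]
```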
TDB : Indexes 
➢ Indexes are covering 
○ Range scans 
○ All key, no value 
○ No "triple table"
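A covering index stores the whole triple as the key, so a lookup is a prefix range scan over one of several sort orders. The sketch assumes three orders (SPO, POS, OSP, the triple indexes TDB uses) and holds each index as a sorted Python list rather than a B+Tree; `TripleIndexes` is a hypothetical name.

```python
import bisect
from typing import Iterator, Optional, Tuple

Triple = Tuple[int, int, int]          # (S, P, O) as NodeIds
# Each index stores whole triples as keys, permuted into its own order.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


class TripleIndexes:
    def __init__(self, triples):
        # All key, no value: each index is just a sorted list of permuted triples.
        self.indexes = {
            name: sorted(tuple(t[i] for i in order) for t in triples)
            for name, order in ORDERS.items()
        }

    def find(self, s: Optional[int] = None, p: Optional[int] = None,
             o: Optional[int] = None) -> Iterator[Triple]:
        pattern = (s, p, o)
        # Pick the index whose key order puts every bound slot first.
        for name, order in ORDERS.items():
            permuted = [pattern[i] for i in order]
            n_bound = sum(x is not None for x in permuted)
            if all(x is not None for x in permuted[:n_bound]):
                break
        prefix = tuple(permuted[:n_bound])
        keys = self.indexes[name]
        # Range scan: jump to the first key >= prefix, stop when the prefix changes.
        for k in range(bisect.bisect_left(keys, prefix), len(keys)):
            key = keys[k]
            if key[:n_bound] != prefix:
                break
            triple = [0, 0, 0]
            for pos, i in enumerate(order):
                triple[i] = key[pos]       # un-permute back to (S, P, O)
            yield tuple(triple)
```

For a pattern with P and O bound, `find(p=..., o=...)` selects POS, which is the lookup used for `{ ?x :p 123 . }`.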
SPARQL Execution 
{ ?x :p 123 . } 
Convert to NodeIds 
Look in POS for all keys starting (:p, 123) and assign each S to ?x 
123 is an inline constant in TDB: its NodeId holds the value, so no node table lookup is needed. 
{ ?x :p 123 . 
?x :q ?v . } 
A database join, executed as an index join (loop + substitution): 
for each value of ?x, e.g. :x1, do an index lookup on 
:x1 :q ?v
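The loop-plus-substitution join can be sketched in a few lines: each solution for `?x` from the first pattern is substituted into the second, turning `?x :q ?v` into a narrower lookup such as `:x1 :q ?v`. Assumptions: terms are strings, variables start with `?`, and a linear scan of `data` stands in for the index lookup a real store would do.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def substitute(pattern: Triple, binding: Binding) -> Triple:
    """Replace already-bound variables with their values, e.g. ?x -> :x1."""
    return tuple(binding.get(t, t) if is_var(t) else t for t in pattern)


def lookup(pattern: Triple, data: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Stand-in for an index lookup: extend `binding` for each matching triple."""
    for triple in data:
        extended = dict(binding)
        for pt, dt in zip(pattern, triple):
            if is_var(pt):
                if extended.setdefault(pt, dt) != dt:
                    break                   # same variable, two different values
            elif pt != dt:
                break                       # constant does not match
        else:
            yield extended


def index_join(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        # Loop over current solutions; substitute, then look up the narrower pattern.
        bindings = [b2 for b in bindings
                    for b2 in lookup(substitute(pattern, b), data, b)]
    return bindings
```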
Index Implementation 
➢ TDB uses threaded B+Trees for indexes 
○ 8K blocks, 100-way B+Tree 
○ Threaded : scan only touches data blocks 
[Diagram: B+Tree branch blocks of SPO keys with child pointers (Ptr), above a row of leaf blocks holding SPO records]
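Threading links each leaf block to the next, so a range scan descends once to the first relevant leaf and then follows leaf links without revisiting branch blocks. A two-level toy version, assuming in-memory lists instead of 8K disk blocks and a tiny `leaf_size` so that several leaves appear (all names are hypothetical):

```python
import bisect
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

Key = Tuple[int, int, int]


@dataclass
class Leaf:
    records: List[Key]
    next: Optional["Leaf"] = None      # the thread: link to the next leaf block


@dataclass
class Branch:
    separators: List[Key]              # separators[i] = first key of children[i + 1]
    children: List[Leaf]


def build(sorted_keys: List[Key], leaf_size: int) -> Branch:
    leaves = [Leaf(sorted_keys[i:i + leaf_size])
              for i in range(0, len(sorted_keys), leaf_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Branch([leaf.records[0] for leaf in leaves[1:]], leaves)


def range_scan(root: Branch, low: Key, high: Key) -> Iterator[Key]:
    """Yield keys with low <= key < high: one descent, then leaf blocks only."""
    if not root.children:
        return
    leaf: Optional[Leaf] = root.children[bisect.bisect_right(root.separators, low)]
    while leaf is not None:
        for key in leaf.records:
            if key >= high:
                return
            if key >= low:
                yield key
        leaf = leaf.next               # follow the thread; no return to the branch
```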
Choices 
[Diagram: Query and Update on top; Indexes / B+Trees over Blocks; Node table / Objects over a Key → Value Store]
Where to introduce distribution?
This Does Not Work (very well) 
[Diagram: Query and Update; B+Trees over Blocks; Objects over Key→Value]
Distribute the storage (a K→V store), keeping index access in the query processor: 
➢ Impedance mismatch 
○ Too much data moving about 
○ Little parallelism 
○ Bad cold-start
Distribute 
[Diagram: Query and Update; B+Trees over Blocks; Objects over Key→Value]
➢ Distribute the indexes 
○ With modified index access 
➢ Distribute the nodes 
➢ Comms : Apache Thrift
Clustered Node Table 
➢ Node Table 
○ N replicas; read from R, write to W 
e.g. W=N and R=1 => 
a complete copy of the node table on each data server 
○ Replaceable 
Requirement: NodeId for naming
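The R/W notation reads like the usual replication quorum rule: a read from R replicas is guaranteed to see a write made to W of the N replicas exactly when R + W > N. The sketch checks that rule by brute force; whether Lizard enforces it in this form is not stated on the slide. With W=N and R=1, any single server can answer a node lookup, at the cost of every write going to all N.

```python
from itertools import combinations


def every_read_sees_every_write(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-replica read set intersect every W-replica write set?"""
    replicas = range(n)
    return all(set(read) & set(write)
               for write in combinations(replicas, w)
               for read in combinations(replicas, r))


def quorum_ok(n: int, r: int, w: int) -> bool:
    """The usual overlap rule for N replicas, read from R and written to W."""
    return r + w > n
```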
Clustered Indexes 
➢ Indexes 
○ Shard by subject 
○ Replica shards 
○ Compound access operations
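Sharding by subject puts every triple for a given subject in the same shard, so one request to one shard can return several predicates for that subject. A minimal sketch, assuming a SHA-256-based hash of the subject modulo the shard count (the slides do not say how subjects are mapped to shards) and string terms standing in for NodeIds:

```python
import hashlib
from typing import List, Tuple

Triple = Tuple[str, str, str]


def shard_for(subject: str, num_shards: int) -> int:
    # A stable hash (not Python's per-process randomized hash()), so every
    # server computes the same shard for the same subject.
    digest = hashlib.sha256(subject.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(triples: List[Triple], num_shards: int) -> List[List[Triple]]:
    shards: List[List[Triple]] = [[] for _ in range(num_shards)]
    for triple in triples:
        shards[shard_for(triple[0], num_shards)].append(triple)
    return shards
```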
Clustered Indexes 
[Figure: an index divided into Shard 1, Shard 2 and Shard 3, placed on Machine 1 and Machine 2]
Configuration 1
Configuration 2
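The figure places three shards on two machines; the exact layouts of Configuration 1 and Configuration 2 are not recoverable here. As a hypothetical placement scheme, round-robin replica placement keeps the replicas of one shard on different machines, which is what lets a machine be taken down for a failure or a security patch without losing a shard:

```python
from typing import Dict, List


def place_replicas(num_shards: int, machines: List[str],
                   replicas: int) -> Dict[int, List[str]]:
    """Round-robin placement: replica k of shard i goes to machine (i + k) mod M,
    so the replicas of one shard are always on different machines."""
    if not 1 <= replicas <= len(machines):
        raise ValueError("need 1 <= replicas <= number of machines")
    m = len(machines)
    return {shard: [machines[(shard + k) % m] for k in range(replicas)]
            for shard in range(num_shards)}


def survives_loss_of(placement: Dict[int, List[str]], machine: str) -> bool:
    """True if every shard still has a replica when `machine` is down."""
    return all(any(host != machine for host in hosts)
               for hosts in placement.values())
```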
Modified SPARQL Execution 
➢ Different unit of index access 
○ subject + several predicates 
(subj, pred1, pred2, pred3, …) 
➢ Different join algorithms 
○ Merge join 
○ Parallel hash join
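The compound access operation can be sketched for a star pattern such as `{ ?x :name ?n ; :age ?a }`: one call on the subject's shard returns the values of all requested predicates, and the result rows are their combinations. The function names, and the list scan standing in for an SPO range scan, are assumptions:

```python
from itertools import product
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def compound_access(shard: List[Triple], subject: str,
                    predicates: List[str]) -> Dict[str, List[str]]:
    """One request: all objects of `subject` for each requested predicate."""
    wanted = set(predicates)
    values: Dict[str, List[str]] = {p: [] for p in predicates}
    for s, p, o in shard:               # stand-in for an SPO range scan on subject
        if s == subject and p in wanted:
            values[p].append(o)
    return values


def star_rows(subject: str, values: Dict[str, List[str]],
              predicates: List[str]) -> List[Tuple[str, ...]]:
    """Join locally: one row per combination; a missing predicate gives no rows."""
    return [(subject, *combo) for combo in product(*(values[p] for p in predicates))]
```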
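Index scans return keys in sorted order, so two inputs already sorted on the join variable can be combined with a merge join, without building a hash table. A sketch over `(key, value)` pairs that handles runs of duplicate keys (the function name is hypothetical):

```python
from typing import List, Tuple


def merge_join(left: List[Tuple[int, str]],
               right: List[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """left, right: (key, value) lists sorted by key; returns (key, lv, rv) for equal keys."""
    i = j = 0
    out: List[Tuple[int, str, str]] = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out
```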
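A parallel hash join partitions both inputs by a hash of the join key, so matching keys meet in the same partition and each partition can be joined independently. In this sketch the partitions run on a thread pool in one process (CPython threads illustrate the structure, not a real speedup); across servers the partitioning would need a stable hash. Names and the partition count are assumptions:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[int, str]                  # (join key, value)


def hash_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    table: Dict[int, List[str]] = defaultdict(list)
    for key, lv in left:               # build phase
        table[key].append(lv)
    return [(key, lv, rv)              # probe phase
            for key, rv in right for lv in table.get(key, ())]


def parallel_hash_join(left: List[Row], right: List[Row],
                       partitions: int = 4) -> List[Tuple[int, str, str]]:
    lparts: List[List[Row]] = [[] for _ in range(partitions)]
    rparts: List[List[Row]] = [[] for _ in range(partitions)]
    for row in left:
        lparts[hash(row[0]) % partitions].append(row)
    for row in right:
        rparts[hash(row[0]) % partitions].append(row)
    # Equal keys land in the same partition, so partitions join independently.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(hash_join, lparts, rparts))
    return [row for part in results for row in part]
```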
Paul Hirst / CC-BY-SA-2.5

2014-11 ApacheConEU : Lizard - Clustering an RDF TripleStore
