SlideShare a Scribd company logo
1 of 18
Download to read offline
© 2013 EDB All rights reserved. 1
Implementing Parallelism in
PostgreSQL
•
Robert Haas | PGCon 2014
© 2014 EDB All rights reserved. 2
•
Between 1996 and 2004, single-threaded CPU
performance on SPECint and SPECfp benchmarks
increased by >50% per year. Between 2004 and 2012,
it increased by ~21% per year.
− http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/
•
Single-threaded 7-zip performance was only 39%
faster on 2 x Intel Xeon L5640 (March 16, 2010; 2.7
GHz, 12 MB cache) than on 4 x AMD Opteron 880
(September 26, 2005; 2.4 GHz, 2MB cache). That's
only 7.7% per year.
− http://www.anandtech.com/show/6825/inside-anandtech-2013-cpu-performance
Parallelism: Why? (1)
© 2014 EDB All rights reserved. 3
•
Dell Configuration Tool (as of 2014-05-15):
− 2x Intel® Xeon® E7-4890 v2 Processor 2.8GHz, 37.5M
Cache, 8.0 GT/s QPI, Turbo, 15 Core, 155W [add $7,735.68]
− 2x Intel® Xeon® E7-8893 v2 Processor 3.4GHz, 37.5M
Cache, 8.0 GT/s QPI, Turbo, 6 Core, 155W [add $8,410.30]
Parallelism: Why? (2)
© 2014 EDB All rights reserved. 4
Hash Join
Join Cond: foo.x = bar.x
→ Seq Scan on foo
Filter: something_complicated
→ Hash
→ Seq Scan on bar
•
One backend could run the Seq Scan and apply the
filter condition; it could then stream the results to
another backend to perform the Hash Join.
Parallel Query: Inter-Node
© 2014 EDB All rights reserved. 5
Hash Join
Join Cond: foo.x = bar.x
→ Seq Scan on foo
Filter: something_complicated
→ Hash
→ Seq Scan on bar
•
Multiple backends could cooperate to perform Seq
Scan – or Hash Join.
Parallel Query: Intra-Node
© 2014 EDB All rights reserved. 6
•
CREATE INDEX
− Parallel Heap Scan
− Parallel Sort
•
VACUUM
− Parallel Heap Scan
− Worker Per Index (suggestion from Andres and Heikki)
Parallel Maintenance / DDL Commands
© 2014 EDB All rights reserved. 7
•
Processes – Not Threads
− None of our fundamental subsystems are thread-safe (e.g.
palloc/pfree, ereport, syscache, relcache, buffer manager).
− Making them thread-safe would add synchronization overhead
even in the single-threaded case – and also bugs.
•
Started By Postmaster – Not created via fork()
− Can't fork() on Windows, where many of our users are.
− Currently, all backends are direct children of the postmaster;
seems preferable to keep it that way.
Architectural Overview (1)
© 2014 EDB All rights reserved. 8
•
Shared Memory – Not Pipes or Files
− Files would cause more system calls and more I/O.
− Pipes are a good paradigm, but shared memory is more
flexible.
− We can use shared memory to emulate a pipe if we need to –
see shm_mq. (This also dodges platform dependencies.)
•
Dynamic Shared Memory – Not Main Segment
− For an application such as parallel sort, we might need a LOT
of memory, like a terabyte. We can't pre-reserve that!
•
Dynamic Shared Memory Could Be At a Different
Address in Every Process
− No good, general techniques for achieving this.
Architectural Overview (2)
© 2014 EDB All rights reserved. 9
•
Basic Facilities (done in 9.4)
•
Plumbing (some work done/in progress)
•
Parallel Environment (a little unpublished work done)
•
Parallel Execution (some study/thought)
•
Parallel Planning (no idea yet)
What Do We Need To Build?
© 2014 EDB All rights reserved. 10
•
Dynamic Background Workers (done in 9.4)
•
Dynamic Shared Memory (done in 9.4)
Basic Facilities
© 2014 EDB All rights reserved. 11
•
DSM Table of Contents (done in 9.4)
− I just mapped this dynamic shared memory segment; how do I
figure out what it contains?
•
Message Queueing (done in 9.4)
− How does a background worker send tuples, errors, notices,
etc. to a user backend?
•
Error Propagation (working on it)
− Common infrastructure to make using message queueing
easy.
•
Shared Memory Allocator (early draft posted)
•
Shared Hash Table (someday)
Plumbing
© 2014 EDB All rights reserved. 12
•
Make the Background Worker Look Enough Like a
Regular User Backend To Do Useful Work
− Copy Relevant State (e.g. User, Database, Snapshot)
•
Useful Work Doesn't Mean Everything
− Some operations seem fundamentally unsafe in a parallel
context (e.g. calling a user-defined function that sets a GUC).
− Some operations could theoretically be made safe, but we
might not bother (e.g. setseed() + random()).
− Even if we share lots of state, arbitrary user-supplied code can
never be safe; must label unsafe functions.
Parallel Environment
© 2014 EDB All rights reserved. 13
•
User ID and Database
•
GUCs
•
Transaction State
•
Current and Active Snapshot
•
Combo CID Hash
Parallel Environment: What To Copy
© 2014 EDB All rights reserved. 14
•
Sequence Operations
•
Generation of Invalidation Messages
•
Cursor Operations
•
Large Object Manipulation
•
LISTEN/NOTIFY
•
Access to Temporary Buffers
•
Prepared Statements
Parallel Environment: What To Prohibit
© 2014 EDB All rights reserved. 15
•
Background Workers Can't Rely on User Backend To
Hold Necessary Locks
− The user backend might die or be killed before the background
worker terminates.
•
If Background Workers Re-Lock The Same Relations,
Parallel Query Might Self Deadlock
− User backend locks X; another process queues for a
conflicting lock on X; background worker tries to re-lock X.
•
Probably Need a Concept of Locking Groups Inside the
Lock Manager
Parallel Environment: Lock Management
© 2014 EDB All rights reserved. 16
•
This is the “easy” part.
•
Parallel sorting algorithms are described in the
literature and well-understood.
•
For parallel sequential scan, grab blocks or block
ranges in alternation.
•
Amdahl's Law: If α is the fraction of running time a
program spends executing serially, the maximum
speedup from parallelism is 1/α.
Parallel Execution
© 2014 EDB All rights reserved. 17
•
This is probably hard.
•
Right now, we do costing based on estimating the page
access costs (CPU and I/O) and tuple processing
costs.
•
For parallelism, need to consider worker startup costs
and IPC costs.
•
A plan that's a little cheaper for me might be much
more expensive in total.
Parallel Query Planning
© 2014 EDB All rights reserved. 18
•
Any questions?
Thanks.

More Related Content

What's hot

PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and BenchmarksJignesh Shah
 
EDB Postgres DBA Best Practices
EDB Postgres DBA Best PracticesEDB Postgres DBA Best Practices
EDB Postgres DBA Best PracticesEDB
 
EnterpriseDB BackUp and Recovery Tool
EnterpriseDB BackUp and Recovery ToolEnterpriseDB BackUp and Recovery Tool
EnterpriseDB BackUp and Recovery ToolEDB
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
 
Top 10 Tips for an Effective Postgres Deployment
Top 10 Tips for an Effective Postgres DeploymentTop 10 Tips for an Effective Postgres Deployment
Top 10 Tips for an Effective Postgres DeploymentEDB
 
Tips and Tricks for SAP Sybase IQ
Tips and Tricks for SAP  Sybase IQTips and Tricks for SAP  Sybase IQ
Tips and Tricks for SAP Sybase IQDon Brizendine
 
Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009) Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009) PostgreSQL Experts, Inc.
 
EDB Postgres with Containers
EDB Postgres with ContainersEDB Postgres with Containers
EDB Postgres with ContainersEDB
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureCloudera, Inc.
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera, Inc.
 
Postgres Point-in-Time Recovery
Postgres Point-in-Time RecoveryPostgres Point-in-Time Recovery
Postgres Point-in-Time RecoveryEDB
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
5 Tips to Simplify the Management of Your Postgres Database
5 Tips to Simplify the Management of Your Postgres Database5 Tips to Simplify the Management of Your Postgres Database
5 Tips to Simplify the Management of Your Postgres DatabaseEDB
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverEDB
 
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Michael Arnold
 
Enterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional DatabasesEnterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional DatabasesAshnikbiz
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges DataWorks Summit
 

What's hot (20)

PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
 
EDB Postgres DBA Best Practices
EDB Postgres DBA Best PracticesEDB Postgres DBA Best Practices
EDB Postgres DBA Best Practices
 
EnterpriseDB BackUp and Recovery Tool
EnterpriseDB BackUp and Recovery ToolEnterpriseDB BackUp and Recovery Tool
EnterpriseDB BackUp and Recovery Tool
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Top 10 Tips for an Effective Postgres Deployment
Top 10 Tips for an Effective Postgres DeploymentTop 10 Tips for an Effective Postgres Deployment
Top 10 Tips for an Effective Postgres Deployment
 
HBase operations
HBase operationsHBase operations
HBase operations
 
Tips and Tricks for SAP Sybase IQ
Tips and Tricks for SAP  Sybase IQTips and Tricks for SAP  Sybase IQ
Tips and Tricks for SAP Sybase IQ
 
Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009) Performance Whack-a-Mole Tutorial (pgCon 2009)
Performance Whack-a-Mole Tutorial (pgCon 2009)
 
EDB Postgres with Containers
EDB Postgres with ContainersEDB Postgres with Containers
EDB Postgres with Containers
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
 
Postgres Point-in-Time Recovery
Postgres Point-in-Time RecoveryPostgres Point-in-Time Recovery
Postgres Point-in-Time Recovery
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
5 Tips to Simplify the Management of Your Postgres Database
5 Tips to Simplify the Management of Your Postgres Database5 Tips to Simplify the Management of Your Postgres Database
5 Tips to Simplify the Management of Your Postgres Database
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL server
 
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)
 
Enterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional DatabasesEnterprise PostgreSQL - EDB's answer to conventional Databases
Enterprise PostgreSQL - EDB's answer to conventional Databases
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges
 

Similar to Implementing Parallelism in PostgreSQL - PGCon 2014

Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Perforce
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersJulien Anguenot
 
Developing a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsDeveloping a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsCeph Community
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014EDB
 
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...Nexcess.net LLC
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Cloudera, Inc.
 
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...Neo4j
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Etu Solution
 
MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016Ted Wennmark
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions Alfresco Software
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspectivePriit Piipuu
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inRahulBhole12
 
2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.pptnadirpervez2
 
Platform Engineering for the Modern Oracle World
Platform Engineering for the Modern Oracle WorldPlatform Engineering for the Modern Oracle World
Platform Engineering for the Modern Oracle WorldSimon Haslam
 

Similar to Implementing Parallelism in PostgreSQL - PGCon 2014 (20)

Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale Still All on One Server: Perforce at Scale
Still All on One Server: Perforce at Scale
 
Introduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developersIntroduction to Cassandra and CQL for Java developers
Introduction to Cassandra and CQL for Java developers
 
Developing a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsDeveloping a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure Environments
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014Top10 list planningpostgresdeployment.2014
Top10 list planningpostgresdeployment.2014
 
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
EECI 2013 - ExpressionEngine Performance & Optimization - Laying a Solid Foun...
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
026 Neo4j Data Loading (ETL_ELT) Best Practices - NODES2022 AMERICAS Advanced...
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
 
MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016MySQL Enterprise Backup apr 2016
MySQL Enterprise Backup apr 2016
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
Database failover from client perspective
Database failover from client perspectiveDatabase failover from client perspective
Database failover from client perspective
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt
 
Platform Engineering for the Modern Oracle World
Platform Engineering for the Modern Oracle WorldPlatform Engineering for the Modern Oracle World
Platform Engineering for the Modern Oracle World
 

More from EDB

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenEDB
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube EDB
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EDB
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLEDB
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLEDB
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?EDB
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresEDB
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLEDB
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!EDB
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesEDB
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoEDB
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJEDB
 

More from EDB (20)

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQL
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQL
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQL
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQL
 
Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJ
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos données
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - Italiano
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 

Recently uploaded

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

Implementing Parallelism in PostgreSQL - PGCon 2014

  • 1. © 2013 EDB All rights reserved. 1 Implementing Parallelism in PostgreSQL • Robert Haas | PGCon 2014
  • 2. © 2014 EDB All rights reserved. 2 • Between 1996 and 2004, single-threaded CPU performance on SPECint and SPECfp benchmarks increased by >50% per year. Between 2004 and 2012, it increased by ~21% per year. − http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance/ • Single-threaded 7-zip performance was only 39% faster on 2 x Intel Xeon L5640 (March 16, 2010; 2.7 GHz, 12 MB cache) than on 4 x AMD Opteron 880 (September 26, 2005; 2.4 GHz, 2MB cache). That's only 7.7% per year. − http://www.anandtech.com/show/6825/inside-anandtech-2013-cpu-performance Parallelism: Why? (1)
  • 3. © 2014 EDB All rights reserved. 3 • Dell Configuration Tool (as of 2014-05-15): − 2x Intel® Xeon® E7-4890 v2 Processor 2.8GHz, 37.5M Cache, 8.0 GT/s QPI, Turbo, 15 Core, 155W [add $7,735.68] − 2x Intel® Xeon® E7-8893 v2 Processor 3.4GHz, 37.5M Cache, 8.0 GT/s QPI, Turbo, 6 Core, 155W [add $8,410.30] Parallelism: Why? (2)
  • 4. © 2014 EDB All rights reserved. 4 Hash Join Join Cond: foo.x = bar.x → Seq Scan on foo Filter: something_complicated → Hash → Seq Scan on bar • One backend could run the Seq Scan and apply the filter condition; it could then stream the results to another backend to perform the Hash Join. Parallel Query: Inter-Node
  • 5. © 2014 EDB All rights reserved. 5 Hash Join Join Cond: foo.x = bar.x → Seq Scan on foo Filter: something_complicated → Hash → Seq Scan on bar • Multiple backends could cooperate to perform Seq Scan – or Hash Join. Parallel Query: Intra-Node
  • 6. © 2014 EDB All rights reserved. 6 • CREATE INDEX − Parallel Heap Scan − Parallel Sort • VACUUM − Parallel Heap Scan − Worker Per Index (suggestion from Andres and Heikki) Parallel Maintenance / DDL Commands
  • 7. © 2014 EDB All rights reserved. 7 • Processes – Not Threads − None of our fundamental subsystems are thread-safe (e.g. palloc/pfree, ereport, syscache, relcache, buffer manager). − Making them thread-safe would add synchronization overhead even in the single-threaded case – and also bugs. • Started By Postmaster – Not created via fork() − Can't fork() on Windows, where many of our users are. − Currently, all backends are direct children of the postmaster; seems preferable to keep it that way. Architectural Overview (1)
  • 8. © 2014 EDB All rights reserved. 8 • Shared Memory – Not Pipes or Files − Files would cause more system calls and more I/O. − Pipes are a good paradigm, but shared memory is more flexible. − We can use shared memory to emulate a pipe if we need to – see shm_mq. (This also dodges platform dependencies.) • Dynamic Shared Memory – Not Main Segment − For an application such as parallel sort, we might need a LOT of memory, like a terabyte. We can't pre-reserve that! • Dynamic Shared Memory Could Be At a Different Address in Every Process − No good, general techniques for achieving this. Architectural Overview (2)
  • 9. © 2014 EDB All rights reserved. 9 • Basic Facilities (done in 9.4) • Plumbing (some work done/in progress) • Parallel Environment (a little unpublished work done) • Parallel Execution (some study/thought) • Parallel Planning (no idea yet) What Do We Need To Build?
  • 10. © 2014 EDB All rights reserved. 10 • Dynamic Background Workers (done in 9.4) • Dynamic Shared Memory (done in 9.4) Basic Facilities
  • 11. © 2014 EDB All rights reserved. 11 • DSM Table of Contents (done in 9.4) − I just mapped this dynamic shared memory segment; how do I figure out what it contains? • Message Queueing (done in 9.4) − How does a background worker send tuples, errors, notices, etc. to a user backend? • Error Propagation (working on it) − Common infrastructure to make using message queueing easy. • Shared Memory Allocator (early draft posted) • Shared Hash Table (someday) Plumbing
  • 12. © 2014 EDB All rights reserved. 12 • Make the Background Worker Look Enough Like a Regular User Backend To Do Useful Work − Copy Relevant State (e.g. User, Database, Snapshot) • Useful Work Doesn't Mean Everything − Some operations seem fundamentally unsafe in a parallel context (e.g. calling a user-defined function that sets a GUC). − Some operations could theoretically be made safe, but we might not bother (e.g. setseed() + random()). − Even if we share lots of state, arbitrary user-supplied code can never be safe; must label unsafe functions. Parallel Environment
  • 13. © 2014 EDB All rights reserved. 13 • User ID and Database • GUCs • Transaction State • Current and Active Snapshot • Combo CID Hash Parallel Environment: What To Copy
  • 14. © 2014 EDB All rights reserved. 14 • Sequence Operations • Generation of Invalidation Messages • Cursor Operations • Large Object Manipulation • LISTEN/NOTIFY • Access to Temporary Buffers • Prepared Statements Parallel Environment: What To Prohibit
  • 15. © 2014 EDB All rights reserved. 15 • Background Workers Can't Rely on User Backend To Hold Necessary Locks − The user backend might die or be killed before the background worker terminates. • If Background Workers Re-Lock The Same Relations, Parallel Query Might Self Deadlock − User backend locks X; another process queues for a conflicting lock on X; background worker tries to re-lock X. • Probably Need a Concept of Locking Groups Inside the Lock Manager Parallel Environment: Lock Management
  • 16. © 2014 EDB All rights reserved. 16 • This is the “easy” part. • Parallel sorting algorithms are described in the literature and well-understood. • For parallel sequential scan, grab blocks or block ranges in alternation. • Amdahl's Law: If α is the fraction of running time a program spends executing serially, the maximum speedup from parallelism is 1/α. Parallel Execution
  • 17. © 2014 EDB All rights reserved. 17 • This is probably hard. • Right now, we do costing based on estimating the page access costs (CPU and I/O) and tuple processing costs. • For parallelism, need to consider worker startup costs and IPC costs. • A plan that's a little cheaper for me might be much more expensive in total. Parallel Query Planning
  • 18. © 2014 EDB All rights reserved. 18 • Any questions? Thanks.