SlideShare a Scribd company logo
1 of 42
Download to read offline
Distributed Systems

 scalability and high availability




Renato Lucindo - lucindo.github.com - @rlucindo
Renato Lucindo

 Call me Lucindo (or Linus)
 2002 - Bachelor Computer Science
 2007 - M.Sc. Computer Science (Combinatorial
 Optimization)
 7+ year developing Distributed Systems




 My default answer: "I don't know."
Agenda


 Scalability

 High Availability

 Problems

 Tips and Tricks

 Learning More
Distributed Systems

  Multiple computers that interact with each other over a
  network to achieve a common goal
  Purpose
     Scalability
     High availability




                                     source: http://www.cnds.jhu.edu/
Scalability

  System ability to handle gracefully a growing amount of
  work

  Scale up (vertical)
     Add resources to a single node
     Improve existing code to handle more work

  Scale out (horizontal)
     Add more nodes to a system
     Linear (or better) scalability
Scalability - Vertical

  Add: CPU, Memory, Disks (bigger box)
  Handling more simultaneous:
     Connections
     Operations
     Users
  Choose a good I/O and concurrency model
     Non-blocking I/O
     Asynchronous I/O
     Threads (single, pool, per-connection)
     Event handling patterns (Reactor, Proactor, ...)
  Memory model?
     STM
Scalability - Vertical

  Careful with numbers
      Requests per second
      # of Connections
      Simultaneous operations
  Event handling
      Think front-end
      Slow connections/clients
      It's slower than other options
  In doubt, go async
  Back-end
      Thread pool (thread per-connection)
      No events
      Process per-core
Scalability - Horizontal

  Add nodes to handle more work
  Front-end
     Straightforward
     Stateless
  Back-end
     Master/Slave(s)
     Partitioning
         DHT
         Volatile Index
Scalability - Horizontal

  Master/Slave
  Write on single Master
  Read on Slaves (one or more)
  Scales reads
Scalability - Horizontal

  Partitioning (Sharding)
     Distribute dada across nodes
  Generally involves data de-normalization
  Where is some specific data?
     Master Index
     Hash (DTH, Consistent Hashing)
     Volatile Index
  Joins done in application level
  NoSQL friendly
Scalability - Horizontal

  Volatile Index: build and maintain data index as cached
  information (all clients)
High Availability

            "Processes, as well as people, die"


  Handle hardware and software failures
      Eliminate single point of failure
  Redundancy
  Failover
  Replicas
High Availability - Failover/Redundancy
High Availability - Replicas

  Two or more copies of same data
  Replica granularity
     From node replica to "row" replica
  Load balancing
  Write concurrency
  Replica updates
  Key for high availability and root of several problems
Problems
Problems - CAP Theorem
Problems - CAP Theorem

 Consistency: all operations (reads/writes) yield a global
 consistent state

 Availability: all requests (on non-failed servers) must have
 a response

 Partition Tolerance: nodes may not be able
 to communicate with each other.



                     Pick Two
Problems - CAP Theorem

 C + A: network problems might stop the system

 Examples:
    Oracle RAC, IBM DB2 Parallel
    RDBMS (Master/Slave)
    Google File System
    HDFS (Hadoop)
Problems - CAP Theorem

 C + P: clients can't always perform operations

 Examples:
    Distributed lock-systems: Chubby, ZooKeeper
    Paxos protocol (consensus)
    BigTable, Hbase
    Hypertable
    MongoDB
Problems - CAP Theorem

 A + P: clients may read inconsistent (old or undone) data

 Examples:
    Amazon Dynamo
    Cassandra
    Voldemort
    CouchDB
    Riak
    Caches
Problem with CAP Theorem

 In practice, C + A and C + P systems are the same.
     C + A: not tolerant of network partitions
     C + P: not available when a network partition occurs
 Big problem: network partition
     Not so big (how often does it happens?)
 Pick two
     Availability
     Consistency
 The forgotten: Latency
     Or, how long the system waits before considering a
     partitioned network?
Problems - Real World

Every component may fail:
   Network failure
   Hardware failure
   Electricity
   Natural disasters
   Code failure
Tips & Tricks
Tips & Tricks - Pyramid

  Capacity (connections, operations, ...) Pyramid
Tips & Tricks - Reply Fast

  FAIL Fast
  Break complex requests into smaller ones
  Use timeouts
  No transactions
  Be aware that a single slow operation or component can
  generate contention
  Self-denial attack
Tips & Tricks - Cache

  Cache: component location, data, dns lookups, previous
  requests, etc
  Use negative cache for failed requests (low expiration)
  Don't rely on cache
  Your system must work with no cache
Tips & Tricks - Queues

  Easy way to add asynchronous processing an decouple
  your system.
Tips & Tricks - DNS
Tips & Tricks - Logs

  Log everything
  Use several log levels
  On every log message
       User
       Request host
       Component involved
       Version
       Filename and line
  If log level not enabled do not process log message
       Avoid lookup calls (gettimeofday)
Tips & Tricks - Domino Effect

  Make sure your load balancer won't overload components
  User smart algorithms
     Load Balance
     Resource Allocation
Tips & Tricks - (Zero) Configuration

  No configuration files
  Use good defaults
  Auto-discovery (multicast, gossip, ...)
  Make everything configurable
     Administrative command
     No need to stop for changes
  Automatic self adjusts when possible
Tips & Tricks - STOP Test

  With your system under load: kill -STOP <component>
Tips & Tricks - Know your tools

  load average (uptime)
  stats tools
      vmstat
      iostat
      mpstat
      tcpstat, tcprstat, etc
  tcpdump, nc, netstat
  tunning
      /proc/net/*
      ulimit
      sysctl
  oprofile
  debuging tools (gdb, valgrind)
  ...
Tips & Tricks - Count

  Count everything
     Connections
     Operations
     Failures
     Successes
     Request times (granularity)
  Total, average, standard deviation
  Monitor counters
Tips & Tricks - Stability Patterns

  Use Timeouts
  Circuit Breaker
  Bulkheads
  Steady State
  Fail Fast
  Handshaking
  Test Harness
  Decoupling Middleware
Tips & Tricks - Don't Panic!
Learning More - Books

TCP/IP Illustrated, Vol. 1: The Protocols
Learning More - Books

Unix Network Programming, Vol. 1: The Sockets Networking
Learning More - Books

Pattern Oriented Software Architecture, Vol. 2
Learning More - Books

Release It!
Learning More - Papers

 The Google File System
 Bigtable: A Distributed Storage System for Structured Data
 Dynamo: Amazon's Highly Available Key-Value Store
 PNUTS: Yahoo!’s Hosted Data Serving Platform
 MapReduce: Simplified Data Processing on Large Clusters

 Towards robust distributed systems
 Brewer's conjecture and the feasibility of consistent,
 available, partition-tolerant web services
 BASE: An Acid Alternative
 Looking up data in P2P systems
Thanks!!! Questions?

lucindo.github.com - @rlucindo

More Related Content

What's hot

Structure of shared memory space
Structure of shared memory spaceStructure of shared memory space
Structure of shared memory spaceCoder Tech
 
Distributed Computing system
Distributed Computing system Distributed Computing system
Distributed Computing system Sarvesh Meena
 
SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience aniadkar
 
The Extreme Programming (XP) Model
The Extreme Programming (XP) ModelThe Extreme Programming (XP) Model
The Extreme Programming (XP) ModelDamian T. Gordon
 
Software engineering model
Software engineering modelSoftware engineering model
Software engineering modelManish Chaurasia
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system modelHarshad Umredkar
 
Introduction to distributed file systems
Introduction to distributed file systemsIntroduction to distributed file systems
Introduction to distributed file systemsViet-Trung TRAN
 
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...sumithragunasekaran
 
Parallel processing (simd and mimd)
Parallel processing (simd and mimd)Parallel processing (simd and mimd)
Parallel processing (simd and mimd)Bhavik Vashi
 
Multiprocessor Architecture (Advanced computer architecture)
Multiprocessor Architecture  (Advanced computer architecture)Multiprocessor Architecture  (Advanced computer architecture)
Multiprocessor Architecture (Advanced computer architecture)vani261
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance ComputingDivyen Patel
 
Storage Virtualization
Storage VirtualizationStorage Virtualization
Storage VirtualizationMehul Jariwala
 
Distributed Shared Memory
Distributed Shared MemoryDistributed Shared Memory
Distributed Shared MemoryPrakhar Rastogi
 

What's hot (20)

Structure of shared memory space
Structure of shared memory spaceStructure of shared memory space
Structure of shared memory space
 
Distributed Computing system
Distributed Computing system Distributed Computing system
Distributed Computing system
 
SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience
 
The Extreme Programming (XP) Model
The Extreme Programming (XP) ModelThe Extreme Programming (XP) Model
The Extreme Programming (XP) Model
 
Distributed Operating System_1
Distributed Operating System_1Distributed Operating System_1
Distributed Operating System_1
 
Distributed file systems dfs
Distributed file systems   dfsDistributed file systems   dfs
Distributed file systems dfs
 
Software engineering model
Software engineering modelSoftware engineering model
Software engineering model
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system model
 
Basic 50 linus command
Basic 50 linus commandBasic 50 linus command
Basic 50 linus command
 
BIGDATA ANALYTICS LAB MANUAL final.pdf
BIGDATA  ANALYTICS LAB MANUAL final.pdfBIGDATA  ANALYTICS LAB MANUAL final.pdf
BIGDATA ANALYTICS LAB MANUAL final.pdf
 
Introduction to distributed file systems
Introduction to distributed file systemsIntroduction to distributed file systems
Introduction to distributed file systems
 
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
Terminologies Used In Big data Environments,G.Sumithra,II-M.sc(computer scien...
 
Parallel processing (simd and mimd)
Parallel processing (simd and mimd)Parallel processing (simd and mimd)
Parallel processing (simd and mimd)
 
Multiprocessor Architecture (Advanced computer architecture)
Multiprocessor Architecture  (Advanced computer architecture)Multiprocessor Architecture  (Advanced computer architecture)
Multiprocessor Architecture (Advanced computer architecture)
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
On demand provisioning
On demand provisioningOn demand provisioning
On demand provisioning
 
Unit 4
Unit 4Unit 4
Unit 4
 
Storage Virtualization
Storage VirtualizationStorage Virtualization
Storage Virtualization
 
Virtualization in cloud computing
Virtualization in cloud computingVirtualization in cloud computing
Virtualization in cloud computing
 
Distributed Shared Memory
Distributed Shared MemoryDistributed Shared Memory
Distributed Shared Memory
 

Similar to Distributed Systems: scalability and high availability

Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginnerswebhostingguy
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
Designing for the Cloud Tutorial - QCon SF 2009
Designing for the Cloud Tutorial - QCon SF 2009Designing for the Cloud Tutorial - QCon SF 2009
Designing for the Cloud Tutorial - QCon SF 2009Stuart Charlton
 
Distributed Computing & MapReduce
Distributed Computing & MapReduceDistributed Computing & MapReduce
Distributed Computing & MapReducecoolmirza143
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictabilityRichardWarburton
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rulesOleg Tsal-Tsalko
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operationniallmilton
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-applicationNguyễn Duy Nhân
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015Christopher Curtin
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsFirat Atagun
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Coursejimliddle
 
Basics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageBasics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageNilesh Salpe
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCCal Henderson
 

Similar to Distributed Systems: scalability and high availability (20)

Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginners
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
Designing for the Cloud Tutorial - QCon SF 2009
Designing for the Cloud Tutorial - QCon SF 2009Designing for the Cloud Tutorial - QCon SF 2009
Designing for the Cloud Tutorial - QCon SF 2009
 
test
testtest
test
 
HeartBeat
HeartBeatHeartBeat
HeartBeat
 
Distributed Computing & MapReduce
Distributed Computing & MapReduceDistributed Computing & MapReduce
Distributed Computing & MapReduce
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
 
Cassandra in Operation
Cassandra in OperationCassandra in Operation
Cassandra in Operation
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-application
 
UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015UnConference for Georgia Southern Computer Science March 31, 2015
UnConference for Georgia Southern Computer Science March 31, 2015
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
 
Basics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed StorageBasics of Distributed Systems - Distributed Storage
Basics of Distributed Systems - Distributed Storage
 
Database System Architectures
Database System ArchitecturesDatabase System Architectures
Database System Architectures
 
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYCScalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
 

Recently uploaded

UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 

Recently uploaded (20)

UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 

Distributed Systems: scalability and high availability

  • 1. Distributed Systems scalability and high availability Renato Lucindo - lucindo.github.com - @rlucindo
  • 2. Renato Lucindo Call me Lucindo (or Linus) 2002 - Bachelor Computer Science 2007 - M.Sc. Computer Science (Combinatorial Optimization) 7+ year developing Distributed Systems My default answer: "I don't know."
  • 3. Agenda Scalability High Availability Problems Tips and Tricks Learning More
  • 4. Distributed Systems Multiple computers that interact with each other over a network to achieve a common goal Purpose Scalability High availability source: http://www.cnds.jhu.edu/
  • 5. Scalability System ability to handle gracefully a growing amount of work Scale up (vertical) Add resources to a single node Improve existing code to handle more work Scale out (horizontal) Add more nodes to a system Linear (or better) scalability
  • 6. Scalability - Vertical Add: CPU, Memory, Disks (bigger box) Handling more simultaneous: Connections Operations Users Choose a good I/O and concurrency model Non-blocking I/O Asynchronous I/O Threads (single, pool, per-connection) Event handling patterns (Reactor, Proactor, ...) Memory model? STM
  • 7. Scalability - Vertical Careful with numbers Requests per second # of Connections Simultaneous operations Event handling Think front-end Slow connections/clients It's slower than other options In doubt, go async Back-end Thread pool (thread per-connection) No events Process per-core
  • 8. Scalability - Horizontal Add nodes to handle more work Front-end Straightforward Stateless Back-end Master/Slave(s) Partitioning DHT Volatile Index
  • 9. Scalability - Horizontal Master/Slave Write on single Master Read on Slaves (one or more) Scales reads
  • 10. Scalability - Horizontal Partitioning (Sharding) Distribute dada across nodes Generally involves data de-normalization Where is some specific data? Master Index Hash (DTH, Consistent Hashing) Volatile Index Joins done in application level NoSQL friendly
  • 11. Scalability - Horizontal Volatile Index: build and maintain data index as cached information (all clients)
  • 12. High Availability "Processes, as well as people, die" Handle hardware and software failures Eliminate single point of failure Redundancy Failover Replicas
  • 13. High Availability - Failover/Redundancy
  • 14. High Availability - Replicas Two or more copies of same data Replica granularity From node replica to "row" replica Load balancing Write concurrency Replica updates Key for high availability and root of several problems
  • 16. Problems - CAP Theorem
  • 17. Problems - CAP Theorem Consistency: all operations (reads/writes) yield a global consistent state Availability: all requests (on non-failed servers) must have a response Partition Tolerance: nodes may not be able to communicate with each other. Pick Two
  • 18. Problems - CAP Theorem C + A: network problems might stop the system Examples: Oracle RAC, IBM DB2 Parallel RDBMS (Master/Slave) Google File System HDFS (Hadoop)
  • 19. Problems - CAP Theorem C + P: clients can't always perform operations Examples: Distributed lock-systems: Chubby, ZooKeeper Paxos protocol (consensus) BigTable, Hbase Hypertable MongoDB
  • 20. Problems - CAP Theorem A + P: clients may read inconsistent (old or undone) data Examples: Amazon Dynamo Cassandra Voldemort CouchDB Riak Caches
  • 21. Problem with CAP Theorem In practice, C + A and C + P systems are the same. C + A: not tolerant of network partitions C + P: not available when a network partition occurs Big problem: network partition Not so big (how often does it happens?) Pick two Availability Consistency The forgotten: Latency Or, how long the system waits before considering a partitioned network?
  • 22. Problems - Real World Every component may fail: Network failure Hardware failure Electricity Natural disasters Code failure
  • 24. Tips & Tricks - Pyramid Capacity (connections, operations, ...) Pyramid
  • 25. Tips & Tricks - Reply Fast FAIL Fast Break complex requests into smaller ones Use timeouts No transactions Be aware that a single slow operation or component can generate contention Self-denial attack
  • 26. Tips & Tricks - Cache Cache: component location, data, dns lookups, previous requests, etc Use negative cache for failed requests (low expiration) Don't rely on cache Your system must work with no cache
  • 27. Tips & Tricks - Queues Easy way to add asynchronous processing an decouple your system.
  • 28. Tips & Tricks - DNS
  • 29. Tips & Tricks - Logs Log everything Use several log levels On every log message User Request host Component involved Version Filename and line If log level not enabled do not process log message Avoid lookup calls (gettimeofday)
  • 30. Tips & Tricks - Domino Effect Make sure your load balancer won't overload components User smart algorithms Load Balance Resource Allocation
  • 31. Tips & Tricks - (Zero) Configuration No configuration files Use good defaults Auto-discovery (multicast, gossip, ...) Make everything configurable Administrative command No need to stop for changes Automatic self adjusts when possible
  • 32. Tips & Tricks - STOP Test With your system under load: kill -STOP <component>
  • 33. Tips & Tricks - Know your tools load average (uptime) stats tools vmstat iostat mpstat tcpstat, tcprstat, etc tcpdump, nc, netstat tunning /proc/net/* ulimit sysctl oprofile debuging tools (gdb, valgrind) ...
  • 34. Tips & Tricks - Count Count everything Connections Operations Failures Successes Request times (granularity) Total, average, standard deviation Monitor counters
  • 35. Tips & Tricks - Stability Patterns Use Timeouts Circuit Breaker Bulkheads Steady State Fail Fast Handshaking Test Harness Decoupling Middleware
  • 36. Tips & Tricks - Don't Panic!
  • 37. Learning More - Books TCP/IP Illustrated, Vol. 1: The Protocols
  • 38. Learning More - Books Unix Network Programming, Vol. 1: The Sockets Networking
  • 39. Learning More - Books Pattern Oriented Software Architecture, Vol. 2
  • 40. Learning More - Books Release It!
  • 41. Learning More - Papers The Google File System Bigtable: A Distributed Storage System for Structured Data Dynamo: Amazon's Highly Available Key-Value Store PNUTS: Yahoo!’s Hosted Data Serving Platform MapReduce: Simplified Data Processing on Large Clusters Towards robust distributed systems Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services BASE: An Acid Alternative Looking up data in P2P systems