AdamCloud (Part 2): Lessons learned from Docker
Sébastien Bonami, IT Engineering Student
and
David Lauzon, Researcher
École de technologie supérieure (ÉTS)
Presented at Big Data Montreal #32 + DevOps Montreal
January 12th 2015
1
Plan
● AdamCloud Project
● Docker Introduction
● Lessons learned from Docker
o Dockerfiles
o Data Storage
o Networking
o Monitoring
● Conclusion
2
AdamCloud Project
Brief overview
3
AdamCloud Goal
● Main goal: provide a portable infrastructure
for processing genomics data
● Requirements:
o A series of software tools must be chained in a pipeline
o Centralize configuration for multiple environments
o Simple installation procedure for new students
4
Potential solution
● For genomics: Adam project developed at
Berkeley AmpLab
o Snap, Adam, Avocado
o (uses Spark, HDFS)
● For infrastructure:
o Docker ?
5
Adam Genomic Pipeline
[Figure: pipeline diagram in three rows]
● Hardware: Sequencer Machine
● AmpLab Genomics Projects: Snap → Adam → Avocado
● File Formats: Fastq File (up to 250 GB) → Sam File → Parquet File → Parquet File (~10 MB)
6
AdamCloud - Environments
3 different environments
● Development (laptop)
o All services in 1 single host
● Demo
o Mac mini cluster
● Testing
o ÉTS servers (for larger genomes)
7
Docker Introduction
From now on, we set AdamCloud aside and talk about Docker itself.
For simplicity, the examples that follow use MySQL.
8
Docker Introduction - Key Concepts
● Dockerfile
o Text file
o Size = ~ KB
o Installation & config instructions
● Image
o Composed of many read-only layers
o Typical size = ~ hundred(s) MB
o Can have multiple versions (akin to Git tags)
● Container
o Shares the image’s read-only layers
o 1 private writeable layer (copy-on-write)
o Initial size = 0 bytes
o Can be stopped, started, paused, etc.
● Docker Hub Registry (Internet)
o Free public hosting
● Commands: build (Dockerfile → Image), push / pull (Image ↔ Docker Hub Registry), run (Image → Container), commit (Container → Image)
9
Docker Introduction - How does it work?
[Figure: the Docker Daemon, the Docker Storage Backend, and Container 1, Container 2, ... sit on the Host OS Kernel, which sits on the Hardware]
● Docker Daemon: sets up & manages the LXC containers.
● Docker Storage Backend: stores the image and container’s data layers locally.
10
Lesson 0:
Playing with Docker
11
Lesson 0: Playing with Docker
Install Docker
$ sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
$ sudo apt-get update && sudo apt-get install -y --force-yes lxc-docker
$ docker run -ti --rm=true ubuntu bash
root@e0a1dad9f7fa:/# whoami; hostname
root
e0a1dad9f7fa
Creates a new interactive (-i) container with a tty (-t) from the image ubuntu, starts a bash shell, and automatically removes the container when it exits (--rm=true)
You are now “inside” the container with the id e0a1dad9f7fa
12
Dockerfiles
13
Dockerfiles - MySQL Example (1/3)
$ mkdir mysql-docker/
$ vi mysql-docker/Dockerfile
# Contents of file mysql-docker/Dockerfile [1]
# Pull base image (from Docker Hub)
FROM ubuntu:14.04
# Install MySQL
RUN apt-get update
RUN apt-get install -y mysql-server
[1] Source: https://registry.hub.docker.com/u/dockerfile/mysql/dockerfile/
14
Dockerfiles - MySQL Example (2/3)
# Contents of file mysql-docker/Dockerfile (continued)
# Configure MySQL: listening interface, log error, etc.
RUN sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf
RUN sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf
RUN echo "mysqld_safe &" > /tmp/config
RUN echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config
RUN echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config
RUN bash /tmp/config && rm -f /tmp/config
15
Dockerfiles - MySQL Example (3/3)
# Contents of file mysql-docker/Dockerfile (continued)
# Define default command
CMD ["mysqld_safe"]
# Expose guest port. Not required, but facilitates management
# NEVER expose the public port in the Dockerfile
EXPOSE 3306
16
Dockerfiles - Building MySQL image
$ docker build -t mysql-image mysql-docker/
Sending build context to Docker daemon 2.56 kB
[...]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
[...]
17
Lesson 1:
Dialog-less installs
18
Lesson 1: Dialog-less installs
# Contents of file mysql-docker/Dockerfile (showing differences)
[...]
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server
[...]
$ docker build -t mysql-image mysql-docker/
[...]
Successfully built d5cb85b206a4
That’s our image ID
$ docker run -d mysql-image
5f3695d8f5e4dfc836156f645dbf6b647e264e58a25b4e2a9724b7522591b9bc
That’s our container ID
(we can use a prefix as long as it is unique)
19
Lesson 1: Testing the connectivity
Finding the IP address of our container
$ docker inspect 5f3695d8f5e4 |grep IPAddress |cut -d'"' -f4
172.17.0.102
From the host, we can now connect to our MySQL box inside the container
using the Docker network bridge.
$ mysql -uroot -h 172.17.0.102 -e "SHOW DATABASES;"
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
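The IP lookup can be wrapped in a small helper. This is only a sketch: the function names `extract_ip` and `container_ip` are made up for illustration, and the parsing mirrors the grep/cut pipeline from the slide. `docker inspect` also accepts a `--format` template, shown in a comment, which avoids parsing text.

```shell
#!/bin/sh
# extract_ip: read `docker inspect` JSON on stdin and print the first
# IPAddress value (same grep/cut approach as on the slide).
extract_ip() {
  grep '"IPAddress"' | head -n 1 | cut -d'"' -f4
}

# container_ip: hypothetical wrapper; $1 is a container ID or unique prefix.
container_ip() {
  docker inspect "$1" | extract_ip
}

# Alternative without text parsing, using docker inspect's Go template:
#   docker inspect --format '{{ .NetworkSettings.IPAddress }}' 5f3695d8f5e4

# Offline demonstration on one sample line of inspect output.
printf '        "IPAddress": "172.17.0.102",\n' | extract_ip
```

With Docker running, `mysql -uroot -h "$(container_ip 5f3695d8f5e4)" -e "SHOW DATABASES;"` would combine both steps.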
20
Lesson 2:
Layers
21
Lesson 2: Layers - Docker History
$ docker history mysql-image
IMAGE         CREATED            CREATED BY                                      SIZE
d5cb85b206a4  41 minutes ago     /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]       0 B
a3fcf7ad0e46  41 minutes ago     /bin/sh -c #(nop) CMD [mysqld_safe]             0 B
e495928f5148  41 minutes ago     /bin/sh -c bash /tmp/config && rm -f /tmp/con   5.245 MB
e81232406a48  41 minutes ago     /bin/sh -c echo "mysql -e 'GRANT ALL PRIVILEG   131 B
3ed871742259  41 minutes ago     /bin/sh -c echo "mysqladmin --silent --wait=3   59 B
7383675c6559  41 minutes ago     /bin/sh -c echo "mysqld_safe &" > /tmp/config   14 B
dfa40ac0f314  45 minutes ago     /bin/sh -c sed -i 's/^\(log_error\s.*\)/# \1/   3.509 kB
01a7a7904f29  45 minutes ago     /bin/sh -c sed -i 's/^\(bind-address\s.*\)/#    3.507 kB
2709eaa06d42  About an hour ago  /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   130.2 MB
6ca9716f2565  About an hour ago  /bin/sh -c apt-get update                       20.8 MB
86ce37374f40  6 weeks ago        /bin/sh -c #(nop) CMD [/bin/bash]               0 B
dc07507cef42  6 weeks ago        /bin/sh -c apt-get update && apt-get dist-upg   0 B
78e82ee876a2  6 weeks ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
3f45ca85fedc  6 weeks ago        /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
61cb619d86bc  6 weeks ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.8 kB
5bc37dc2dfba  6 weeks ago        /bin/sh -c #(nop) ADD file:d11cc4a4310c270539   192.5 MB
511136ea3c5a  19 months ago                                                      0 B
17 layers! Every Docker instruction creates a layer.
● 200 MB for Ubuntu
● 20 MB for apt-get update
● 130 MB for installing MySQL
22
Time to
cleanup ?
23
Lesson 2: Layers - What are they?
● Think of a layer as a directory of files (or blocks)
● All these “physical” layers are combined into a
“logical” file system for each individual container
o Union file system
o Copy-on-write
o Like a stack: higher layers may override lower layers
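The stacking and copy-on-write behaviour can be imitated with plain directories. The script below is a toy model only, not how any Docker storage backend is implemented; the layer names, the file contents, and the helpers `resolve` and `cow_append` are invented for the illustration.

```shell
#!/bin/sh
# Toy model of a union file system (illustration only, not Docker's code).
# Each layer is a plain directory; the highest layer holding a path wins.
root=$(mktemp -d)
mkdir -p "$root/base/etc" "$root/config/etc" "$root/container"
echo "bind-address = 127.0.0.1"   > "$root/base/etc/my.cnf"
echo "# bind-address = 127.0.0.1" > "$root/config/etc/my.cnf"
echo "Ubuntu 14.04"               > "$root/base/etc/issue"

# Layers listed from top (the container's writable layer) to bottom.
LAYERS="container config base"

# resolve PATH: print the file from the topmost layer that contains it.
resolve() {
  for layer in $LAYERS; do
    if [ -e "$root/$layer/$1" ]; then
      echo "$root/$layer/$1"
      return 0
    fi
  done
  return 1
}

# cow_append PATH TEXT: copy-on-write. The file is first copied up into the
# writable layer, then modified there; lower layers stay untouched.
cow_append() {
  if [ ! -e "$root/container/$1" ]; then
    mkdir -p "$(dirname "$root/container/$1")"
    cp "$(resolve "$1")" "$root/container/$1"
  fi
  echo "$2" >> "$root/container/$1"
}

cat "$(resolve etc/my.cnf)"   # comes from "config", which overrides "base"
cow_append etc/issue "modified in container"
cat "$(resolve etc/issue)"    # now comes from the writable "container" layer
cat "$root/base/etc/issue"    # the read-only base copy is unchanged
```

Creating a new container in this model amounts to adding one more empty directory on top of the stack, which is why instantiation is cheap.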
24
Lesson 2: Layers - Purpose (1/4)
● Blazing fast container instantiation
o To create a new instance from an image, Docker simply creates a
new empty read-write layer
Great, but we could achieve this goal
with 1 single layer per image + 1 layer
per container
Why 17 layers ?
25
Lesson 2: Layers - Purpose (2/4)
● Faster image modification
o Changing/adding a Dockerfile instruction causes only the modified
layer(s) and those following it to be rebuilt
How often do you plan on changing
your Dockerfiles ?
26
Lesson 2: Layers - Purpose (3/4)
● Faster distribution
o when distributing the image (via docker push) and downloading it
(via docker pull, or docker build), only the affected layer(s)
are sent.
27
Lesson 2: Layers - Purpose (4/4)
● Minimize disk space
o All containers on the same Docker host that derive from the same
image hierarchy share its layers.
o Ubuntu Docker image is 200 MB
o 1000 containers based on Ubuntu take only 200 MB in total
(+ the additional packages they require)
Will you have multiple variants (config and/or versions) of MySQL on
the same machine ?
How many MySQL servers will you have on the same machine ?
28
Lesson 2: Layers - Layer Genocide
$ cp -r mysql-docker/ mysql-docker-grouped
$ vi mysql-docker-grouped/Dockerfile
In this example, all our MySQL containers will be the same.
Therefore, we need only a single layer.
29
Lesson 2: Layers - Combine multiple RUN instructions
# Contents of file mysql-docker-grouped/Dockerfile
[...]
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
    sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf && \
    sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf && \
    echo "mysqld_safe &" > /tmp/config && \
    echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config && \
    echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config && \
    bash /tmp/config && rm -f /tmp/config
[...]
30
Lesson 2: Layers - Docker History
$ docker build -t mysql-image-grouped mysql-docker-grouped/
[...]
Successfully built 11ccd4cc6c82
$ docker history mysql-image-grouped
IMAGE         CREATED            CREATED BY                                      SIZE
11ccd4cc6c82  About an hour ago  /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]       0 B
59c9467d3360  About an hour ago  /bin/sh -c #(nop) CMD [mysqld_safe]             0 B
0993d316210d  About an hour ago  /bin/sh -c apt-get update && DEBIAN_FRONT       151 MB
86ce37374f40  6 weeks ago        /bin/sh -c #(nop) CMD [/bin/bash]               0 B
dc07507cef42  6 weeks ago        /bin/sh -c apt-get update && apt-get dist-upg   0 B
78e82ee876a2  6 weeks ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
3f45ca85fedc  6 weeks ago        /bin/sh -c rm -rf /var/lib/apt/lists/*          0 B
61cb619d86bc  6 weeks ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.8 kB
5bc37dc2dfba  6 weeks ago        /bin/sh -c #(nop) ADD file:d11cc4a4310c270539   192.5 MB
511136ea3c5a  19 months ago                                                      0 B
Freed 7 layers!
Our Dockerfile now adds only 3 layers on top of the base image:
RUN, CMD, EXPOSE
31
Lesson 3:
Staying fit
32
Lesson 3: Staying fit - Compacting layers
$ cp -r mysql-docker-grouped/ mysql-docker-cleaned
$ vi mysql-docker-cleaned/Dockerfile
Some commands, like apt-get update, create temporary files
that can be safely discarded after use.
We can save space and create smaller images by deleting
those files.
33
Lesson 3: Staying fit - Removing temporary files
# Contents of file mysql-docker-cleaned/Dockerfile (partial)
[...]
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server && \
    rm -fr /var/lib/apt/lists/* && \
[...]
$ docker build -t mysql-image-cleaned mysql-docker-cleaned/
[...]
Successfully built 032798b8e064
Remember: you’ll need to run
apt-get update again next time
you want to install something
34
Lesson 3: Staying fit - Local Docker images
$ docker images
REPOSITORY           TAG     IMAGE ID      CREATED      VIRTUAL SIZE
mysql-image-cleaned  latest  032798b8e064  2 hours ago  322.8 MB
mysql-image-grouped  latest  11ccd4cc6c82  2 hours ago  343.6 MB
mysql-image          latest  d5cb85b206a4  3 hours ago  348.9 MB
ubuntu               14.04   86ce37374f40  6 weeks ago  192.7 MB
Not counting the shared ubuntu base image, the cleaned image occupies
17% less space than the original mysql-image (the sizes shown are
virtual sizes) [1].
MySQL is small; the impact can be much bigger for other
applications.
[1] ((348-192) - (322-192)) / (348-192) = 17%
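The footnote's arithmetic can be checked with a few lines of shell. This is a sketch; the helper name `saving_pct` is made up, and the sizes are the ones from the `docker images` listing.

```shell
#!/bin/sh
# saving_pct ORIGINAL CLEANED BASE: percentage saved on the layers added on
# top of the base image (all sizes in MB), rounded to a whole percent.
saving_pct() {
  awk -v o="$1" -v c="$2" -v b="$3" \
    'BEGIN { printf "%.0f\n", ((o - b) - (c - b)) / (o - b) * 100 }'
}

# mysql-image, mysql-image-cleaned, ubuntu:14.04
saving_pct 348.9 322.8 192.7
```

The base image size cancels out of the numerator, so the saving is simply the size difference divided by what the original image adds on top of ubuntu.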
35
Lesson 3: Staying fit - Smallest Docker base images
Image:Tag             Size
scratch               0.0 B
busybox:ubuntu-14.04  5.6 MB
debian:7              85.0 MB
ubuntu:14.04          192.7 MB
centos:7              210.0 MB
fedora:21             241.3 MB
36
Lesson 3: Staying fit - docker diff
● Shows the differences between a container and its image
o Useful to see which files have been modified/created when writing
your Dockerfile
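`docker diff` prints one line per path, prefixed with A (added), C (changed) or D (deleted). A small filter makes it easy to spot leftover files; this is a sketch, the helper name `added_files` and the sample lines are invented for the demonstration.

```shell
#!/bin/sh
# added_files: keep only the paths that `docker diff` reports as added ("A").
added_files() {
  grep '^A ' | cut -d' ' -f2-
}

# Typical use against a container (the ID is a placeholder):
#   docker diff 5f3695d8f5e4 | added_files

# Offline demonstration on sample `docker diff` output.
printf 'C /etc\nC /etc/mysql/my.cnf\nA /tmp/config\nD /var/lib/apt/lists/lock\n' | added_files
```

Any temporary file that shows up here is a candidate for an `rm` in the same RUN instruction that created it.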
37
Lesson 4:
Fixed as “worksforme”
38
Lesson 4: Reproducibility - Package version
● Your Dockerfile may build a different image in a few
months than today’s image
RUN apt-get install -y mysql-server
RUN apt-get install -y mysql-server=5.5.40-0ubuntu0.14.04.1
Specifying the package version explicitly is better
39
Lesson 4: Reproducibility - Dependency version
RUN apt-get install -y \
    libaio1=0.3.109-4 \
    mysql-common=5.5.40-0ubuntu0.14.04.1 \
    libmysqlclient18=5.5.40-0ubuntu0.14.04.1 \
    libwrap0=7.6.q-25 \
    libdbi-perl=1.630-1 \
    libdbd-mysql-perl=4.025-1 \
    libterm-readkey-perl=2.31-1 \
    mysql-client-core-5.5=5.5.40-0ubuntu0.14.04.1 \
    mysql-client-5.5=5.5.40-0ubuntu0.14.04.1 \
    mysql-server-core-5.5=5.5.40-0ubuntu0.14.04.1 \
    psmisc=22.20-1ubuntu2 \
    mysql-server-5.5=5.5.40-0ubuntu0.14.04.1 \
    libhtml-template-perl=2.95-1 \
    mysql-server=5.5.40-0ubuntu0.14.04.1 \
    tcpd=7.6.q-25
Previous solution should be enough…
But if you need a higher guarantee of reproducibility:
A. Specify the package version for the dependencies as well
B. And / or use a cache proxy, maven proxy, etc.
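Writing such a pinned list by hand is tedious. One possible way to generate it, assuming the packages are already installed on a reference machine, is to format the output of `dpkg-query`. The helper name `pin_list` is hypothetical and this is a sketch rather than the method used for the slide.

```shell
#!/bin/sh
# pin_list: turn "package<TAB>version" lines into "package=version" arguments
# on one line, suitable for `apt-get install -y`.
pin_list() {
  awk -F'\t' '{ printf "%s%s=%s", (NR > 1 ? " " : ""), $1, $2 } END { print "" }'
}

# On a machine where the packages are installed, the input can come from
# dpkg-query (package names taken from the slide):
#   dpkg-query -W -f='${Package}\t${Version}\n' mysql-server mysql-common | pin_list

# Offline demonstration with two entries from the slide.
printf 'libaio1\t0.3.109-4\ntcpd\t7.6.q-25\n' | pin_list
```

Pinned versions eventually disappear from the distribution mirrors, which is one reason option B (a cache proxy) is worth considering alongside option A.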
40
Lesson 5:
Prototry
A quick and dirty attempt to develop a working
model of software. The original intent is to
rewrite the ProtoTry, using lessons learned, but
schedules never permit. Also known as legacy
code. [1]
[1] Michael Duell, Ailments of Unsuitable Project-Disoriented Software, http://www.fsfla.org/~lxoliva/fun/prog/resign-patterns
41
Lesson 5: Prototry - Docker Hub Registry
● Before writing your own Dockerfile, try a build from
someone else
o https://registry.hub.docker.com/
o Official builds
o Trusted (automated) builds
o Other builds
For advanced setup,
see these images:
● jenkins
● dockerfile/java
42
Lesson 5: Prototry - Using other people’s images
PROs
● Faster to get started
● Better tested
CONs
● You may end up with a mixed stack to support
○ e.g. different versions of Java
○ Ubuntu vs Debian vs CentOS
● Not all sources use all the best practices
described in this presentation
For medium - large organisations / heavy Docker users:
Best to fork and write your own Dockerfiles
43
Lesson 5: Prototry - Potential image hierarchy
myorg-base
FROM ubuntu:14.04
# Organization-wide tools (e.g. vim, etc.)
myorg-java
FROM myorg-base:1.0
# OpenJDK | OracleJDK
myorg-python
FROM myorg-base:1.0
# Install Python 2.7
python-app1
FROM myorg-python:2.7
# ...
java-app3
FROM myorg-java:oracle-jdk7
# ...
python-app2
FROM myorg-python:2.7
# ...
44
Lesson 6:
Volume Design Patterns
45
Lesson 6: Inside Container Pattern
● Nothing to do - that’s the default Docker behavior
o Application data is stored along with the
infrastructure (container) data
● If the container is restarted, data is still there
● If the container is deleted, data is gone
46
Lesson 6: Host Directory Pattern
● A directory on the host
● To share data across containers on the
same host
● For example, put the source code on the
host and mount it inside the container with
the “-v” flag
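A minimal sketch of the host directory pattern follows. The helper name `run_with_source`, the mount point `/src`, and the `DOCKER` variable (which allows a dry run) are assumptions made for this example; only the `-v host_dir:container_dir` syntax comes from Docker itself.

```shell
#!/bin/sh
DOCKER=${DOCKER:-docker}   # set DOCKER=echo to print the command instead

# run_with_source HOST_DIR: hypothetical helper that mounts HOST_DIR (an
# absolute path on the host) at /src inside a throwaway ubuntu container
# and lists its content.
run_with_source() {
  $DOCKER run --rm -v "$1":/src ubuntu ls /src
}
```

For example, `run_with_source "$PWD"` lists the current host directory from inside the container; files edited on the host are visible in the container immediately.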
47
Lesson 6: Data-Only Container Pattern
● Runs on a bare-bones image
● VOLUME instruction in the Dockerfile or “-v”
flag at run time
● Just use the “--volumes-from” flag to
mount all the volumes of another container
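The pattern can be sketched with three commands. The container name `app-data`, the volume path `/data`, the helper names, and the `DOCKER` dry-run variable are all assumptions for the illustration; busybox stands in for the bare-bones image.

```shell
#!/bin/sh
DOCKER=${DOCKER:-docker}   # set DOCKER=echo for a dry run

# create_data_container NAME: a container whose only job is to own the
# volume /data. It exits immediately; the volume remains.
create_data_container() {
  $DOCKER run --name "$1" -v /data busybox true
}

# write_greeting NAME: a second container writes into the shared volume.
write_greeting() {
  $DOCKER run --rm --volumes-from "$1" ubuntu sh -c 'echo hello > /data/greeting'
}

# read_greeting NAME: a third container reads the file back.
read_greeting() {
  $DOCKER run --rm --volumes-from "$1" ubuntu cat /data/greeting
}
```

Running `create_data_container app-data`, then `write_greeting app-data`, then `read_greeting app-data` should print `hello`, since the data lives in the volume and not in the writer or reader containers.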
48
Lesson 7:
Storage backend
49
Lesson 7: Storage backend - Overview
● Options:
o VFS
o AUFS (default, docker < 0.7)
o DeviceMapper
▪ Direct LVM
▪ Loop LVM (default in Red Hat)
o Btrfs (experimental)
o OverlayFS (experimental)
Red Hat[1] says the
fastest backends are:
1. OverlayFS
2. Direct LVM
3. BtrFS
4. Loop LVM
Look up your current Docker backend
$ docker info |grep Driver
[1] http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
50
Lesson 7: Storage backend - VFS & AUFS
● Both are very basic (NOT for PROD)
● Both store each layer as a separate directory with
regular files
● VFS
o No Copy-on-Write (CoW)
● AUFS
o Original Docker backend
o File-level Copy-on-Write (CoW)
VFS & AUFS can be
useful to understand how
Docker works
Do not use in PROD
51
Lesson 7: Storage backend - DeviceMapper (1/2)
● Already used by the Linux kernel for LVM2 (logical volume management)
o Block-level Copy-on-Write (CoW)
o Unused blocks do not use space
● Uses thin pool provisioning to implement CoW snapshots
o Each pool requires 2 block devices: data & metadata
o By default, uses loopback mounts on sparse regular files
# ls -alhs /var/lib/docker/devicemapper/devicemapper
506M -rw-------. 1 root root 100G Sep 10 20:15 data
1.1M -rw-------. 1 root root 2.0G Sep 10 20:15 metadata
Loop LVM
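In the listing above, the first column (506M) is the space actually allocated, while 100G is the apparent size of the sparse file. The effect is easy to reproduce outside Docker; the sketch below uses an arbitrary 100 MB temporary file.

```shell
#!/bin/sh
# Create a 100 MB sparse file: seek past the end without writing any data.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1 count=0 seek=104857600 2>/dev/null

apparent=$(wc -c < "$img" | tr -d ' ')   # size the file claims to have
allocated_kb=$(du -k "$img" | cut -f1)   # space actually allocated on disk
echo "apparent: $apparent bytes, allocated: $allocated_kb KB"
```

On a file system that supports sparse files, the allocated figure stays near zero until blocks are written, which is what lets the loopback data file advertise 100G up front.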
52
Lesson 7: Storage backend - DeviceMapper (2/2)
● In production:
o Use real block devices! (Direct LVM)
o Ideally, data & metadata each on its own spindle
o Additional configuration is required
Docker does not
do that for you
53
Lesson 7: Storage backend - Btrfs & OverlayFS
Btrfs:
● Requires /var/lib/docker to be on a btrfs file system
● Block-level Copy-on-Write (CoW) using Btrfs’s snapshotting
● Each layer stored as a Btrfs subvolume
● No SELinux
OverlayFS:
● Supports page cache sharing
● Lower FS contains the base image (XFS or EXT4)
● Upper FS contains the deltas
● No SELinux
Claims a huge
RAM saving
54
Lesson 8:
Networking
55
Docker
● Ethernet bridge “docker0” created when Docker boots
● Virtual subnet on the host (default: 172.17.42.1/16)
● Each container has a pair of virtual Ethernet interfaces
● You can remove “docker0” and use your own bridge if
you want
56
Weave
Why Weave?
● Docker built-in functionalities don’t provide
a solution for connecting containers on
multiple hosts
● Weave creates a virtual network that enables a
distributed environment (common in the
real world)
57
Weave
How does it work?
● Virtual routers establish TCP connections to
each other with a handshake
● These connections are duplex
● Uses “pcap” to capture packets
● Excludes traffic between local containers
58
Weave
[Figure: Host A and Host B each run a Weave container alongside Container 1, Container 2 and Container 3]
59
Weave - images
Image:Tag                Size
zettio/weave:0.8.0       11 MB
zettio/weavedns:0.8.0    9.4 MB
zettio/weavetools:0.8.0  3.7 MB
60
Weave - getting started
● First host: weave-01
$ sudo weave launch
Starts the weave router in a container
$ sudo weave run 10.0.0.1/24 -ti --name ubuntu-01 ubuntu:14.04
The container address is given in CIDR notation
● Second host: weave-02
$ sudo weave launch weave-01
Starts the weave router in a container and peers it with weave-01
$ sudo weave run 10.0.0.2/24 -ti --name ubuntu-02 ubuntu:14.04
Note: “weave run” invokes “docker run -d” (running as a daemon)
61
Weave - testing the connectivity (1/2)
$ sudo weave status
weave router 0.8.0
Our name is 7a:ab:c1:21:f9:3b
Sniffing traffic on &{15 65535 ethwe 56:40:66:0b:a4:c6 up|broadcast|multicast}
MACs:
56:40:66:0b:a4:c6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:39.23846091 +0000 UTC)
7a:ab:c1:21:f9:3b -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.142183122 +0000 UTC)
a2:60:ab:8b:1f:b6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.716414595 +0000 UTC)
7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.204010927 +0000 UTC)
1e:b4:78:1e:dd:23 -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.42594994 +0000 UTC)
Peers:
Peer 7a:ab:c1:21:f9:3b (v1) (UID 17511927952474106279)
-> 7a:5a:98:6e:92:2e [192.168.1.30:47638]
Peer 7a:5a:98:6e:92:2e (v1) (UID 8527109358448991597)
-> 7a:ab:c1:21:f9:3b [192.168.1.195:6783]
Routes:
unicast:
7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e
7a:ab:c1:21:f9:3b -> 00:00:00:00:00:00
broadcast:
7a:ab:c1:21:f9:3b -> [7a:5a:98:6e:92:2e]
7a:5a:98:6e:92:2e -> []
Reconnects:
● First host: weave-01
Connected peers
Virtual interface used by Weave
Containers and
host points
62
Weave - testing the connectivity (2/2)
● Second host: weave-02
$ sudo docker attach ubuntu-02
$ ping -c 4 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=4.22 ms
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=1.20 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=1.73 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=2.02 ms
--- 10.0.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3008ms
rtt min/avg/max/mdev = 1.206/2.299/4.226/1.150 ms
It pings!
63
Lesson 9:
Monitoring
64
cAdvisor
● New tool from Google
● Specialized for Docker containers
PROs
● Great web interface
● Docker image available (18 MB) to try it in seconds
● Stats can be exported to InfluxDB (data mining to do)
CONs
● Needs more maturity
● Missing metrics
○ No data for Disk I/O
● Only keeps the last 60 metrics locally (not configurable)
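Trying cAdvisor comes down to one `docker run`. The sketch below follows the invocation documented in the project README at the time; the mounts and the port may differ in later versions, and the function name `start_cadvisor` and the `DOCKER` dry-run variable are assumptions for this example.

```shell
#!/bin/sh
DOCKER=${DOCKER:-docker}   # set DOCKER=echo to print the command instead

# start_cadvisor: run cAdvisor in a container and publish its web interface
# on host port 8080. Host directories are mounted so that it can read
# container statistics.
start_cadvisor() {
  $DOCKER run \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --publish=8080:8080 \
    --detach=true \
    --name=cadvisor \
    google/cadvisor:latest
}
```

After `start_cadvisor`, the web interface should be reachable at http://localhost:8080 on the Docker host.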
65
Monitoring with cAdvisor
Web interface →
66
Conclusion
67
AdamCloud - The next steps
● Docker + Weave = success
● Open-source the project and merge it
upstream into the AmpLab genomic
pipeline.
● Support for Amazon EC2 environments
● Improve administration of Docker
containers
o Monitoring, orchestration, provisioning
68
Docker Conclusion
● 1 Docker container = 1 background daemon
● Container isolation is not like a VM
● Use correct versions of images and keep a trace
● Docker is less interesting for multi-tenant use cases (no SSH in the
containers)
● Docker is FAST and VERSATILE
● cAdvisor is an interesting monitoring tool, but limited
● Docker is perfect for short-lived apps (no long-term data persistence)
● Data intensive apps should review the Docker docs carefully. Start
looking at Direct LVM.
69
References
● Jonathan Bergknoff - Building good docker images, http://jonathan.bergknoff.com/journal/building-good-docker-images
● Michael Crosby - Dockerfile Best Practices, http://crosbymichael.com/dockerfile-best-practices.html
● Michael Crosby - Dockerfile Best Practices - take 2, http://crosbymichael.com/dockerfile-best-practices-take-2.html
● Nathan Leclaire - The Dockerfile is not the source of truth for your image,
http://nathanleclaire.com/blog/2014/09/29/the-dockerfile-is-not-the-source-of-truth-for-your-image/
● Docker Documentation - Understanding Docker, https://docs.docker.com/introduction/understanding-docker/
● Docker Documentation - Docker User Guide, https://docs.docker.com/userguide/
● Docker Documentation - Dockerfile Reference, https://docs.docker.com/reference/builder/
● Docker Documentation - Command Line (CLI) User Guide,
https://docs.docker.com/reference/commandline/cli/
● Docker Documentation - Advanced networking, http://docs.docker.com/articles/networking/
● Project Atomic - Supported Filesystems, http://www.projectatomic.io/docs/filesystems/
● Red Hat Developer Blog - Comprehensive Overview of Storage Scalability in Docker,
http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
● Linux Kernel Documentation - DeviceMapper Thin Provisioning,
https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt
● weave - the Docker network, http://zettio.github.io/weave/
● GitHub - google/cadvisor, https://github.com/google/cadvisor
70

More Related Content

What's hot

Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark Summit
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is FailingDataWorks Summit
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDsDean Chen
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and SharkYahooTechConference
 
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰Wayne Chen
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and dockerFabio Fumarola
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Mac Moore
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveSachin Aggarwal
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to SparkLi Ming Tsai
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules RestructuredDoiT International
 
Spark overview
Spark overviewSpark overview
Spark overviewLisa Hua
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkPatrick Wendell
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internalDavid Lauzon
 

What's hot (20)

Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
Spark on Mesos-A Deep Dive-(Dean Wampler and Tim Chen, Typesafe and Mesosphere)
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDs
 
Docker.io
Docker.ioDocker.io
Docker.io
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark RDD 101
Apache Spark RDD 101Apache Spark RDD 101
Apache Spark RDD 101
 
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰HadoopCon 2016  - 用 Jupyter Notebook Hold 住一個上線 Spark  Machine Learning 專案實戰
HadoopCon 2016 - 用 Jupyter Notebook Hold 住一個上線 Spark Machine Learning 專案實戰
 
Linux containers and docker
Linux containers and dockerLinux containers and docker
Linux containers and docker
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Terraform Modules Restructured
Terraform Modules RestructuredTerraform Modules Restructured
Terraform Modules Restructured
 
Apache spark Intro
Apache spark IntroApache spark Intro
Apache spark Intro
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark Introduction | Big Data Hadoop Spark Tutorial | CloudxLab
 
BDM25 - Spark runtime internal
BDM25 - Spark runtime internalBDM25 - Spark runtime internal
BDM25 - Spark runtime internal
 

Viewers also liked

BDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingBDM26: Spark Summit 2014 Debriefing
BDM26: Spark Summit 2014 DebriefingDavid Lauzon
 
BDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaBDM8 - Near-realtime Big Data Analytics using Impala
BDM8 - Near-realtime Big Data Analytics using ImpalaDavid Lauzon
 
BDM24 - Cassandra use case at Netflix 20140429 montrealmeetup
BDM24 - Cassandra use case at Netflix 20140429 montrealmeetupBDM24 - Cassandra use case at Netflix 20140429 montrealmeetup
BDM24 - Cassandra use case at Netflix 20140429 montrealmeetupDavid Lauzon
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseDavid Lauzon
 
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...
Big Data Everywhere Chicago: Apache Spark Plus Many Other Frameworks -- How S...BigDataEverywhere
 
หนังสือภาษาไทย Spark Internal
หนังสือภาษาไทย Spark Internalหนังสือภาษาไทย Spark Internal
หนังสือภาษาไทย Spark InternalBhuridech Sudsee
 
Unified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkUnified Big Data Processing with Apache Spark
Unified Big Data Processing with Apache SparkC4Media
 
QCon2016--Drive Best Spark Performance on AI
BDM32: AdamCloud Project - Part II

  • 1. AdamCloud (Part 2): Lessons learned from Docker. Sébastien Bonami, IT Engineering Student, and David Lauzon, Researcher, École de technologie supérieure (ÉTS). Presented at Big Data Montreal #32 + DevOps Montreal, January 12th 2015.
  • 2. Plan ● AdamCloud Project ● Docker Introduction ● Lessons learned from Docker o Dockerfiles o Data Storage o Networking o Monitoring ● Conclusion
  • 4. AdamCloud Goal ● Main goal: provide a portable infrastructure for processing genomics data ● Requirements: o A series of software tools must be chained in a pipeline o Centralize configuration for multiple environments o Simple installation procedure for new students
  • 5. Potential solution ● For genomics: Adam project developed at Berkeley AmpLab o Snap, Adam, Avocado o (uses Spark, HDFS) ● For infrastructure: o Docker ?
  • 6. Adam Genomic Pipeline [diagram]. Hardware: sequencer machine. AmpLab genomics projects: Snap, Adam, Avocado. File formats: Fastq file (up to 250 GB), Sam file, Parquet file, Parquet file (~10 MB).
  • 7. AdamCloud - Environments: 3 different environments ● Development (laptop) o All services in 1 single host ● Demo o Mac mini cluster ● Testing o ÉTS servers (for larger genomes)
  • 8. Docker Introduction. From now on, we will talk about Docker, leaving AdamCloud aside. For simplicity, we chose to use MySQL to demonstrate some examples about learning Docker.
  • 9. Docker Introduction - Key Concepts [diagram: Dockerfile, Image, Container, Docker Hub Registry; arrows: build, run, commit, push, pull]
    ● Dockerfile: text file; size = ~ KB; installation & config instructions
    ● Image: composed of many read-only layers; typical size = ~ hundred(s) of MB; can have multiple versions (akin to Git tags)
    ● Container: shares the image’s read-only layers; 1 private writable layer (copy-on-write); initial size = 0 bytes; can be stopped, started, paused, etc.
    ● Docker Hub Registry (Internet): free public hosting
  • 10. Docker Introduction - How does it work? [diagram: Docker Daemon, Docker Storage Backend, Container 1, Container 2, ..., on top of the Host OS Kernel and the Hardware]
    ● Docker Daemon: sets up & manages the LXC containers.
    ● Docker Storage Backend: stores the images’ and containers’ data layers locally.
  • 12. Lesson 0: Playing with Docker
    Install Docker:
      $ sudo sh -c "echo deb https://get.docker.com/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
      $ sudo apt-get update && sudo apt-get install -y --force-yes lxc-docker
    Create a new interactive (-i) container with a tty (-t) from the image ubuntu, start a bash shell, and automatically remove the container when it exits (--rm=true):
      $ docker run -ti --rm=true ubuntu bash
      root@e0a1dad9f7fa:/# whoami; hostname
      root
      e0a1dad9f7fa
    You are now “inside” the container with the id e0a1dad9f7fa.
  • 14. Dockerfiles - MySQL Example (1/3)
      $ mkdir mysql-docker/
      $ vi mysql-docker/Dockerfile

      # Contents of file mysql-docker/Dockerfile [1]
      # Pull base image (from Docker Hub)
      FROM ubuntu:14.04
      # Install MySQL
      RUN apt-get update
      RUN apt-get install -y mysql-server
    [1] Source: https://registry.hub.docker.com/u/dockerfile/mysql/dockerfile/
  • 15. Dockerfiles - MySQL Example (2/3)
      # Contents of file mysql-docker/Dockerfile (continued)
      # Configure MySQL: listening interface, log error, etc.
      RUN sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf
      RUN sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf
      RUN echo "mysqld_safe &" > /tmp/config
      RUN echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config
      RUN echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config
      RUN bash /tmp/config && rm -f /tmp/config
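    The two sed edits above can be tried outside Docker. The following is a minimal sketch, assuming GNU sed (for `\s` and `-i` without a suffix); the sample my.cnf contents are made up for illustration and are not the real Ubuntu 14.04 file.

```shell
#!/usr/bin/env bash
# Sketch: apply the Dockerfile's two sed edits to a throwaway config file.
set -eu

cnf="$(mktemp)"
# Hypothetical my.cnf excerpt (assumption, for illustration only).
cat > "$cnf" <<'EOF'
[mysqld]
bind-address = 127.0.0.1
log_error = /var/log/mysql/error.log
port = 3306
EOF

# \( ... \) captures the whole matching line; \1 re-emits it behind "# ",
# which comments the setting out. \s matches one whitespace character (GNU sed).
sed -i 's/^\(bind-address\s.*\)/# \1/' "$cnf"
sed -i 's/^\(log_error\s.*\)/# \1/' "$cnf"

cat "$cnf"
```

    Commenting out bind-address is what lets MySQL accept connections from outside the container rather than only from its loopback interface.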
  • 16. Dockerfiles - MySQL Example (3/3)
      # Contents of file mysql-docker/Dockerfile (continued)
      # Define default command
      CMD ["mysqld_safe"]
      # Expose guest port. Not required, but facilitates management
      # NEVER expose the public port in the Dockerfile
      EXPOSE 3306
  • 17. Dockerfiles - Building MySQL image
      $ docker build -t mysql-image mysql-docker/
      Sending build context to Docker daemon 2.56 kB
      [...]
      debconf: unable to initialize frontend: Dialog
      debconf: (TERM is not set, so the dialog frontend is not usable.)
      debconf: falling back to frontend: Readline
      debconf: unable to initialize frontend: Readline
      debconf: (This frontend requires a controlling tty.)
      debconf: falling back to frontend: Teletype
      [...]
  • 19. Lesson 1: Dialog-less installs
      # Contents of file mysql-docker/Dockerfile (showing differences)
      [...]
      RUN DEBIAN_FRONTEND=noninteractive apt-get install -y mysql-server
      [...]

      $ docker build -t mysql-image mysql-docker/
      [...]
      Successfully built d5cb85b206a4
    That’s our image ID.
      $ docker run -d mysql-image
      5f3695d8f5e4dfc836156f645dbf6b647e264e58a25b4e2a9724b7522591b9bc
    That’s our container ID (we can use a prefix as long as it is unique).
  • 20. Lesson 1: Testing the connectivity
    Finding the IP address of our container:
      $ docker inspect 5f3695d8f5e4 | grep IPAddress | cut -d'"' -f4
      172.17.0.102
    From the host, we can now connect to our MySQL box inside the container using the Docker network bridge:
      $ mysql -uroot -h 172.17.0.102 -e "SHOW DATABASES;"
      +--------------------+
      | Database           |
      +--------------------+
      | information_schema |
      | mysql              |
      | performance_schema |
      +--------------------+
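    To see why the grep/cut pipeline above yields the address, here is a small sketch that runs it over a canned excerpt instead of a live daemon. The JSON excerpt is a hypothetical stand-in for real `docker inspect` output.

```shell
#!/usr/bin/env bash
# Sketch: extract the container IP from (sample) docker inspect output.
set -eu

# Hypothetical excerpt of `docker inspect <container>` (assumption).
sample_inspect() {
cat <<'EOF'
    "NetworkSettings": {
        "Gateway": "172.17.42.1",
        "IPAddress": "172.17.0.102",
        "IPPrefixLen": 16
    }
EOF
}

# Splitting on double quotes, field 2 is the key and field 4 is the value.
container_ip="$(sample_inspect | grep '"IPAddress"' | cut -d'"' -f4)"
echo "$container_ip"

# Against a live daemon, the same value can be read without text parsing:
#   docker inspect --format '{{ .NetworkSettings.IPAddress }}' 5f3695d8f5e4
```

    The `--format` variant is less fragile than grep, since it does not depend on how the JSON happens to be laid out.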
  • 22. Lesson 2: Layers - Docker History
      $ docker history mysql-image
      IMAGE         CREATED            CREATED BY                                     SIZE
      d5cb85b206a4  41 minutes ago     /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]      0 B
      a3fcf7ad0e46  41 minutes ago     /bin/sh -c #(nop) CMD [mysqld_safe]            0 B
      e495928f5148  41 minutes ago     /bin/sh -c bash /tmp/config && rm -f /tmp/con  5.245 MB
      e81232406a48  41 minutes ago     /bin/sh -c echo "mysql -e 'GRANT ALL PRIVILEG  131 B
      3ed871742259  41 minutes ago     /bin/sh -c echo "mysqladmin --silent --wait=3  59 B
      7383675c6559  41 minutes ago     /bin/sh -c echo "mysqld_safe &" > /tmp/config  14 B
      dfa40ac0f314  45 minutes ago     /bin/sh -c sed -i 's/^\(log_error\s.*\)/# \1/  3.509 kB
      01a7a7904f29  45 minutes ago     /bin/sh -c sed -i 's/^\(bind-address\s.*\)/#   3.507 kB
      2709eaa06d42  About an hour ago  /bin/sh -c DEBIAN_FRONTEND=noninteractive apt  130.2 MB
      6ca9716f2565  About an hour ago  /bin/sh -c apt-get update                      20.8 MB
      86ce37374f40  6 weeks ago        /bin/sh -c #(nop) CMD [/bin/bash]              0 B
      dc07507cef42  6 weeks ago        /bin/sh -c apt-get update && apt-get dist-upg  0 B
      78e82ee876a2  6 weeks ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/  1.895 kB
      3f45ca85fedc  6 weeks ago        /bin/sh -c rm -rf /var/lib/apt/lists/*         0 B
      61cb619d86bc  6 weeks ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic  194.8 kB
      5bc37dc2dfba  6 weeks ago        /bin/sh -c #(nop) ADD file:d11cc4a4310c270539  192.5 MB
      511136ea3c5a  19 months ago                                                     0 B
    17 layers! Every Dockerfile instruction creates a layer: 200 MB for Ubuntu, 20 MB for apt-get update, 130 MB for installing MySQL.
  • 24. Lesson 2: Layers - What are they? ● Think of a layer as a directory of files (or blocks) ● All these “physical” layers are combined into a “logical” file system for each individual container o Union file system o Copy-on-write o Like a stack: higher layers may override lower layers
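    The stacking rule can be illustrated with plain directories. This is a toy model only, assuming each layer is a directory and a lookup walks the stack from the top; it is not how Docker's storage backends are implemented, and the layer names and file contents are invented.

```shell
#!/usr/bin/env bash
# Toy model of a union file system: directories stand in for layers.
set -eu

work="$(mktemp -d)"
mkdir -p "$work/layer1/etc" "$work/layer2/etc" "$work/container/etc"
echo "base config"    > "$work/layer1/etc/my.cnf"   # lower read-only layer
echo "base hosts"     > "$work/layer1/etc/hosts"
echo "patched config" > "$work/layer2/etc/my.cnf"   # higher read-only layer

# Layers listed from the top of the stack to the bottom.
layers=("$work/container" "$work/layer2" "$work/layer1")

# resolve PATH: print the file from the highest layer that contains it.
resolve() {
  local layer
  for layer in "${layers[@]}"; do
    if [ -e "$layer/$1" ]; then
      cat "$layer/$1"
      return 0
    fi
  done
  return 1
}

resolve etc/my.cnf   # served by layer2, which overrides layer1
resolve etc/hosts    # only layer1 has it

# Copy-on-write: the modification lands in the container's private layer;
# the read-only layer below keeps its original content.
echo "container edit" > "$work/container/etc/hosts"
resolve etc/hosts
```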
  • 25. Lesson 2: Layers - Purpose (1/4) ● Blazing fast container instantiation o To create a new instance from an image, Docker simply creates a new empty read-write layer. Great, but we could achieve this goal with 1 single layer per image + 1 layer per container. Why 17 layers?
  • 26. Lesson 2: Layers - Purpose (2/4) ● Faster image modification o Changing/adding a Dockerfile instruction causes only the modified layer(s) and those following it to be rebuilt. How often do you plan on changing your Dockerfiles?
  • 27. Lesson 2: Layers - Purpose (3/4) ● Faster distribution o When distributing the image (via docker push) and downloading it (via docker pull, or docker build), only the affected layer(s) are sent.
  • 28. Lesson 2: Layers - Purpose (4/4) ● Minimize disk space o All containers on the same Docker host that derive from the same image hierarchy share layers. o The Ubuntu Docker image is 200 MB o 1000 containers based on Ubuntu only take 200 MB total (+ the additional packages they require). Will you have multiple variants (config and/or versions) of MySQL on the same machine? How many MySQL servers will you have on the same machine?
  • 29. Lesson 2: Layers - Layer Genocide
    In this example, all our MySQL containers will be the same. Therefore, we’ll only need 1 single layer.
      $ cp -r mysql-docker/ mysql-docker-grouped
      $ vi mysql-docker-grouped/Dockerfile
  • 30. Lesson 2: Layers - Combine multiple RUN instructions
      # Contents of file mysql-docker-grouped/Dockerfile
      [...]
      RUN apt-get update && \
          apt-get install -y mysql-server && \
          sed -i 's/^\(bind-address\s.*\)/# \1/' /etc/mysql/my.cnf && \
          sed -i 's/^\(log_error\s.*\)/# \1/' /etc/mysql/my.cnf && \
          echo "mysqld_safe &" > /tmp/config && \
          echo "mysqladmin --silent --wait=30 ping || exit 1" >> /tmp/config && \
          echo "mysql -e 'GRANT ALL PRIVILEGES ON *.* TO \"root\"@\"%\" WITH GRANT OPTION;'" >> /tmp/config && \
          bash /tmp/config && \
          rm -f /tmp/config
      [...]
  • 31. Lesson 2: Layers - Docker History
      $ docker build -t mysql-image-grouped mysql-docker-grouped/
      [...]
      Successfully built d5cb85b206a4
      $ docker history mysql-image-grouped
      IMAGE         CREATED            CREATED BY                                     SIZE
      11ccd4cc6c82  About an hour ago  /bin/sh -c #(nop) EXPOSE map[3306/tcp:{}]      0 B
      59c9467d3360  About an hour ago  /bin/sh -c #(nop) CMD [mysqld_safe]            0 B
      0993d316210d  About an hour ago  /bin/sh -c apt-get update && DEBIAN_FRONT      151 MB
      86ce37374f40  6 weeks ago        /bin/sh -c #(nop) CMD [/bin/bash]              0 B
      dc07507cef42  6 weeks ago        /bin/sh -c apt-get update && apt-get dist-upg  0 B
      78e82ee876a2  6 weeks ago        /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/  1.895 kB
      3f45ca85fedc  6 weeks ago        /bin/sh -c rm -rf /var/lib/apt/lists/*         0 B
      61cb619d86bc  6 weeks ago        /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic  194.8 kB
      5bc37dc2dfba  6 weeks ago        /bin/sh -c #(nop) ADD file:d11cc4a4310c270539  192.5 MB
      511136ea3c5a  19 months ago                                                     0 B
    Freed 7 layers! Our Dockerfile now only adds 3 layers on top of the base image: RUN, CMD, EXPOSE.
  • 33. Lesson 3: Staying fit - Compacting layers
    Some commands, like apt-get update, create temporary files that can be safely discarded after use. We can save space and create smaller images by deleting those files.
      $ cp -r mysql-docker-grouped/ mysql-docker-cleaned
      $ vi mysql-docker-cleaned/Dockerfile
  • 34. Lesson 3: Staying fit - Removing temporary files
      # Contents of file mysql-docker-cleaned/Dockerfile (partial)
      [...]
      RUN apt-get update && \
          apt-get install -y mysql-server && \
          rm -fr /var/lib/apt/lists/* && \
          [...]

      $ docker build -t mysql-image-cleaned mysql-docker-cleaned/
      [...]
      Successfully built d5cb85b206a4
    Remember: you’ll need to run apt-get update again the next time you want to install something.
  • 35. Lesson 3: Staying fit - Local Docker images
      $ docker images
      REPOSITORY           TAG     IMAGE ID      CREATED      VIRTUAL SIZE
      mysql-image-cleaned  latest  032798b8e064  2 hours ago  322.8 MB
      mysql-image-grouped  latest  11ccd4cc6c82  2 hours ago  343.6 MB
      mysql-image          latest  d5cb85b206a4  3 hours ago  348.9 MB
      ubuntu               14.04   86ce37374f40  6 weeks ago  192.7 MB
    The cleaned image occupies 17% less space than the original mysql-image (it’s a virtual size) [1]. MySQL is small; the impact can be much bigger for other applications.
    [1] ((348-192) - (322-192)) / (348-192) = 17%
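    The footnote's arithmetic can be reproduced with the unrounded sizes from the listing. This is a small sketch using awk for the floating-point division; the variable names are ours, and the saving is measured on what each image adds on top of the shared ubuntu base.

```shell
#!/usr/bin/env bash
# Sketch: relative space saved by the cleaned image, excluding the base image.
set -eu

# Virtual sizes in MB, copied from the `docker images` listing.
base=192.7
original=348.9
cleaned=322.8

# (original_added - cleaned_added) / original_added, as a percentage.
saving="$(awk -v b="$base" -v o="$original" -v c="$cleaned" \
  'BEGIN { printf "%.1f", ((o - b) - (c - b)) / (o - b) * 100 }')"

echo "Space saved on top of the base image: ${saving}%"
```

    With the unrounded sizes the result is 16.7%, which the slide rounds to 17%.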
  • 36. Lesson 3: Staying fit - Smallest Docker base images
      Image:Tag             Size
      scratch               0.0 B
      busybox:ubuntu-14.04  5.6 MB
      debian:7              85.0 MB
      ubuntu:14.04          192.7 MB
      centos:7              210.0 MB
      fedora:21             241.3 MB
  • 37. Lesson 3: Staying fit - docker diff ● Shows the differences between a container and its image o Useful to see which files have been modified/created when writing your Dockerfile
  • 38. Lesson 4: Fixed as “worksforme”
  • 39. Lesson 4: Reproducibility - Package version
    ● Your Dockerfile may build a different image in a few months than it does today:
      RUN apt-get install -y mysql-server
    Specifying the package version explicitly is better:
      RUN apt-get install -y mysql-server=5.5.40-0ubuntu0.14.04.1
  • 40. Lesson 4: Reproducibility - Dependency version
    The previous solution should be enough… But if you need a higher guarantee of reproducibility:
      A. Specify the package version for the dependencies as well
      B. And / or use a cache proxy, maven proxy, etc.
      RUN apt-get install -y libaio1=0.3.109-4 \
          mysql-common=5.5.40-0ubuntu0.14.04.1 \
          libmysqlclient18=5.5.40-0ubuntu0.14.04.1 \
          libwrap0=7.6.q-25 \
          libdbi-perl=1.630-1 \
          libdbd-mysql-perl=4.025-1 \
          libterm-readkey-perl=2.31-1 \
          mysql-client-core-5.5=5.5.40-0ubuntu0.14.04.1 \
          mysql-client-5.5=5.5.40-0ubuntu0.14.04.1 \
          mysql-server-core-5.5=5.5.40-0ubuntu0.14.04.1 \
          psmisc=22.20-1ubuntu2 \
          mysql-server-5.5=5.5.40-0ubuntu0.14.04.1 \
          libhtml-template-perl=2.95-1 \
          mysql-server=5.5.40-0ubuntu0.14.04.1 \
          tcpd=7.6.q-25
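    Writing such a pinned list by hand is tedious. One possible way to generate it, sketched here under assumptions, is to turn "package version" pairs into `package=version` arguments. The `installed` function below is a hypothetical stand-in that returns three pairs taken from the slide; on a Debian/Ubuntu system the same shape of data can be obtained with `dpkg-query -W -f='${Package} ${Version}\n'`.

```shell
#!/usr/bin/env bash
# Sketch: build a pinned apt-get install line from "package version" pairs.
set -eu

# Hypothetical stand-in for: dpkg-query -W -f='${Package} ${Version}\n'
installed() {
cat <<'EOF'
libaio1 0.3.109-4
mysql-common 5.5.40-0ubuntu0.14.04.1
mysql-server 5.5.40-0ubuntu0.14.04.1
EOF
}

# Join every pair as name=version, separated by single spaces.
pinned="$(installed | awk '{ printf "%s%s=%s", sep, $1, $2; sep = " " }')"

echo "RUN apt-get install -y $pinned"
```

    The printed line can be pasted into a Dockerfile; the pinned versions must still exist in the package repository (or in a cache proxy) when the image is rebuilt.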
  • 41. Lesson 5: Prototry
    A quick and dirty attempt to develop a working model of software. The original intent is to rewrite the ProtoTry, using lessons learned, but schedules never permit. Also known as legacy code. [1]
    [1] Michael Duell, Ailments of Unsuitable Project-Disoriented Software, http://www.fsfla.org/~lxoliva/fun/prog/resign-patterns
  • 42. Lesson 5: Prototry - Docker Hub Registry
    ● Before writing your own Dockerfile, try a build from someone else
      o https://registry.hub.docker.com/
      o Official builds
      o Trusted (automated) builds
      o Other builds
    For advanced setup, see these images:
    ● jenkins
    ● dockerfile/java
  • 43. Lesson 5: Prototry - Using other people's images
    PROs
    ● Faster to get started
    ● Better tested
    CONs
    ● You may end up with a mixed stack to support
      ○ e.g. different versions of Java
      ○ Ubuntu vs Debian vs CentOS
    ● Not all sources follow all the best practices described in this presentation
    For medium to large organisations and heavy Docker users: best to fork and write your own Dockerfiles
  • 44. Lesson 5: Prototry - Potential image hierarchy
    myorg-base            FROM ubuntu:14.04             # Organization-wide tools (e.g. vim, etc.)
      myorg-java          FROM myorg-base:1.0           # OpenJDK | OracleJDK
        java-app3         FROM myorg-java:oracle-jdk7   # ...
      myorg-python        FROM myorg-base:1.0           # Install Python 2.7
        python-app1       FROM myorg-python:2.7         # ...
        python-app2       FROM myorg-python:2.7         # ...
  • 45. Lesson 6: Volume Design Patterns
  • 46. Lesson 6: Inside Container Pattern
    ● Nothing to do - that’s the default Docker behavior
      o Application data is stored along with the infrastructure (container) data
    ● If the container is restarted, data is still there
    ● If the container is deleted, data is gone
  • 47. Lesson 6: Host Directory Pattern
    ● A directory on the host
    ● To share data across containers on the same host
    ● For example, put the source code on the host and mount it inside the container with the “-v” flag
  • 48. Lesson 6: Data-Only Container Pattern
    ● Runs on a bare-bones image
    ● Volumes are declared with the VOLUME command in the Dockerfile or the “-v” flag at run time
    ● Other containers just use the “--volumes-from” flag to mount all the volumes of the data-only container
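To compare the host directory and data-only container patterns side by side, the sketch below only builds the `docker run` argument lists and does not start anything. The helper `docker_run_args`, the container name `mysql-data` and the paths are hypothetical; `--name`, `-v` and `--volumes-from` are the flags referred to on the slides.

```python
def docker_run_args(image, name=None, host_dir=None, container_dir=None,
                    volumes_from=None):
    """Build (but do not execute) a `docker run` command line."""
    args = ["docker", "run"]
    if name:
        args += ["--name", name]
    if host_dir and container_dir:
        # Host directory pattern: bind-mount a host path into the container.
        args += ["-v", f"{host_dir}:{container_dir}"]
    elif container_dir:
        # Data-only pattern: declare a volume owned by this container.
        args += ["-v", container_dir]
    if volumes_from:
        # Reuse every volume declared by another container.
        args += ["--volumes-from", volumes_from]
    return args + [image]


host_pattern = docker_run_args("mysql-image", host_dir="/srv/src", container_dir="/src")
data_only = docker_run_args("busybox", name="mysql-data", container_dir="/var/lib/mysql")
consumer = docker_run_args("mysql-image", volumes_from="mysql-data")
for cmd in (host_pattern, data_only, consumer):
    print(" ".join(cmd))
```

With the data-only pattern the application container can be deleted and recreated while the volume stays attached to the `mysql-data` container.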
  • 50. Lesson 7: Storage backend - Overview
    ● Options:
      o VFS
      o AUFS (default, docker < 0.7)
      o DeviceMapper
        ▪ Direct LVM
        ▪ Loop LVM (default in Red Hat)
      o Btrfs (experimental)
      o OverlayFS (experimental)
    Red Hat [1] says the fastest backends are:
      1. OverlayFS
      2. Direct LVM
      3. Btrfs
      4. Loop LVM
    Look up your current Docker backend:
    $ docker info | grep Driver
    [1] http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
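The `grep Driver` filter matches every line containing "Driver", so a script that needs only the storage backend has to pick out the right one. A minimal sketch, assuming the `docker info` text has already been captured and contains a `Storage Driver:` line; the sample text below is illustrative, not output from the authors' hosts, and `storage_driver` is a made-up helper name.

```python
def storage_driver(info_output):
    """Return the value of the 'Storage Driver' line, or None if it is absent."""
    for line in info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Storage Driver":
            return value.strip()
    return None


# Illustrative sample of `docker info` output (assumed, not from the slides).
sample_info = """Containers: 2
Images: 14
Storage Driver: devicemapper
 Pool Name: docker-253:1-1234-pool
Execution Driver: native-0.2
"""
driver = storage_driver(sample_info)
print(driver)
```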
  • 51. Lesson 7: Storage backend - VFS & AUFS
    ● Both are very basic (NOT for PROD)
    ● Both store each layer as a separate directory with regular files
    ● VFS
      o No Copy-on-Write (CoW)
    ● AUFS
      o Original Docker backend
      o File-level Copy-on-Write (CoW)
    VFS & AUFS can be useful to understand how Docker works
    Do not use in PROD
  • 52. Lesson 7: Storage backend - DeviceMapper (1/2)
    ● Already used by the Linux kernel for LVM2 (logical volume management)
      o Block-level Copy-on-Write (CoW)
      o Unused blocks do not use space
    ● Uses thin pool provisioning to implement CoW snapshots
      o Each pool requires 2 block devices: data & metadata
      o By default, uses loopback mounts on sparse regular files (Loop LVM)
    # ls -alhs /var/lib/docker/devicemapper/devicemapper
    506M -rw-------. 1 root root 100G Sep 10 20:15 data
    1.1M -rw-------. 1 root root 2.0G Sep 10 20:15 metadata
    The first column is the space actually allocated on disk; 100G and 2.0G are the apparent sizes of the sparse files.
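The gap between apparent and allocated size in the `ls -alhs` listing is what a sparse file looks like. The sketch below recreates the effect on a small scale; it assumes a Unix-like system whose file system supports sparse files, and the 100 MiB size is a scaled-down stand-in for the 100G data file.

```python
import os
import tempfile

APPARENT_SIZE = 100 * 1024 * 1024  # 100 MiB stand-in for the 100G data file

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data")
    with open(path, "wb") as f:
        # Extend the file to its full length without writing any data blocks.
        f.truncate(APPARENT_SIZE)
    st = os.stat(path)
    apparent = st.st_size            # what `ls -l` reports
    allocated = st.st_blocks * 512   # st_blocks counts 512-byte units on Linux

print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a file system with sparse-file support the allocated figure stays close to zero until blocks are actually written, which is why the loopback files start small and grow as images and containers are added.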
  • 53. Lesson 7: Storage backend - DeviceMapper (2/2)
    ● In production:
      o Use real block devices! (Direct LVM)
      o Ideally, data & metadata each on its own spindle
      o Additional configuration is required
    Docker does not do that for you
  • 54. Lesson 7: Storage backend - Btrfs & OverlayFS
    Btrfs:
    ● Requires /var/lib/docker to be on a Btrfs file system
    ● Block-level Copy-on-Write (CoW) using Btrfs’s snapshotting
    ● Each layer is stored as a Btrfs subvolume
    ● No SELinux
    OverlayFS:
    ● Supports page cache sharing (claims a huge RAM saving)
    ● Lower FS contains the base image (XFS or EXT4)
    ● Upper FS contains the deltas
    ● No SELinux
  • 56. Docker
    ● Ethernet bridge “docker0” is created when the Docker daemon starts
    ● Virtual subnet on the host (default: 172.17.42.1/16)
    ● Each container has a pair of virtual Ethernet interfaces
    ● You can remove “docker0” and use your own bridge if you want
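To make the default `172.17.42.1/16` setting concrete, Python's standard `ipaddress` module can split it into the bridge address and the subnet it implies. The container address `172.17.0.5` is a made-up example.

```python
import ipaddress

# Default docker0 setting quoted on the slide.
bridge = ipaddress.ip_interface("172.17.42.1/16")

print(bridge.ip)                     # address held by the docker0 bridge itself
print(bridge.network)                # subnet that container addresses come from
print(bridge.network.num_addresses)  # size of that subnet

# Hypothetical container address, checked against the bridge subnet.
in_subnet = ipaddress.ip_address("172.17.0.5") in bridge.network
print(in_subnet)
```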
  • 57. Weave
    Why Weave?
    ● Docker’s built-in functionality does not provide a solution for connecting containers on multiple hosts
    ● Weave creates a virtual network to permit a distributed environment (common in the real world)
  • 58. Weave
    How does it work?
    ● Virtual routers establish TCP connections to each other with a handshake
    ● These connections are duplex
    ● Uses “pcap” to capture packets
    ● Excludes traffic between local containers
  • 59. Weave
    Host A: Weave Container | Container 1 | Container 2 | Container 3
    Host B: Weave Container | Container 1 | Container 2 | Container 3
  • 60. Weave - images
    Image:Tag                 Size
    zettio/weave:0.8.0        11 MB
    zettio/weavedns:0.8.0     9.4 MB
    zettio/weavetools:0.8.0   3.7 MB
  • 61. Weave - getting started
    ● First host: weave-01
    $ sudo weave launch                                              # Starts the weave router in a container
    $ sudo weave run 10.0.0.1/24 -ti --name ubuntu-01 ubuntu:14.04   # 10.0.0.1/24 is in CIDR notation
    ● Second host: weave-02
    $ sudo weave launch weave-01                                     # Starts the weave router in a container and peers it
    $ sudo weave run 10.0.0.2/24 -ti --name ubuntu-02 ubuntu:14.04
    Note: “weave run” invokes “docker run -d” (running as a daemon)
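The two `weave run` addresses only reach each other directly if they fall in the same subnet. A small sketch of that check, using the standard `ipaddress` module; the helper name `same_subnet` and the third address `10.0.1.1/24` are invented for contrast.

```python
import ipaddress


def same_subnet(cidr_a, cidr_b):
    """True when two addresses in CIDR notation belong to the same network."""
    return (ipaddress.ip_interface(cidr_a).network
            == ipaddress.ip_interface(cidr_b).network)


# Addresses given to ubuntu-01 and ubuntu-02 on the slide.
slide_pair = same_subnet("10.0.0.1/24", "10.0.0.2/24")
# Hypothetical container placed in a different /24.
other_pair = same_subnet("10.0.0.1/24", "10.0.1.1/24")
print(slide_pair, other_pair)
```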
  • 62. Weave - testing the connectivity (1/2)
    ● First host: weave-01
    $ sudo weave status
    weave router 0.8.0
    Our name is 7a:ab:c1:21:f9:3b
    Sniffing traffic on &{15 65535 ethwe 56:40:66:0b:a4:c6 up|broadcast|multicast}    ← Virtual interface used by Weave
    MACs:    ← Containers and host points
    56:40:66:0b:a4:c6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:39.23846091 +0000 UTC)
    7a:ab:c1:21:f9:3b -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.142183122 +0000 UTC)
    a2:60:ab:8b:1f:b6 -> 7a:ab:c1:21:f9:3b (2015-01-11 22:27:40.716414595 +0000 UTC)
    7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.204010927 +0000 UTC)
    1e:b4:78:1e:dd:23 -> 7a:5a:98:6e:92:2e (2015-01-11 22:28:53.42594994 +0000 UTC)
    Peers:    ← Connected peers
    Peer 7a:ab:c1:21:f9:3b (v1) (UID 17511927952474106279)
       -> 7a:5a:98:6e:92:2e [192.168.1.30:47638]
    Peer 7a:5a:98:6e:92:2e (v1) (UID 8527109358448991597)
       -> 7a:ab:c1:21:f9:3b [192.168.1.195:6783]
    Routes:
    unicast:
    7a:5a:98:6e:92:2e -> 7a:5a:98:6e:92:2e
    7a:ab:c1:21:f9:3b -> 00:00:00:00:00:00
    broadcast:
    7a:ab:c1:21:f9:3b -> [7a:5a:98:6e:92:2e]
    7a:5a:98:6e:92:2e -> []
    Reconnects:
  • 63. Weave - testing the connectivity (2/2)
    ● Second host: weave-02
    $ sudo docker attach ubuntu-02
    $ ping -c 4 10.0.0.1
    PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
    64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=4.22 ms
    64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=1.20 ms
    64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=1.73 ms
    64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=2.02 ms
    --- 10.0.0.1 ping statistics ---
    4 packets transmitted, 4 received, 0% packet loss, time 3008ms
    rtt min/avg/max/mdev = 1.206/2.299/4.226/1.150 ms
    It pings!
  • 65. cAdvisor
    ● New tool from Google
    ● Specialized for Docker containers
    PROs
    ● Great web interface
    ● Docker image available (18 MB) to try it in seconds
    ● Stats can be exported to InfluxDB (data mining to do)
    CONs
    ● Needs more maturity
    ● Missing metrics
      ○ No data for Disk I/O
    ● Only keeps the last 60 metrics locally (not configurable)
  • 66. Monitoring with cAdvisor
    [Screenshot: cAdvisor web interface]
  • 68. AdamCloud - The next steps
    ● Docker + Weave = success
    ● Open-source the project and merge it upstream into the AmpLab genomic pipeline
    ● Support for Amazon EC2 environments
    ● Improve administration of Docker containers
      o Monitoring, orchestration, provisioning
  • 69. Docker Conclusion
    ● 1 Docker container = 1 background daemon
    ● Container isolation is not like a VM
    ● Use correct versions of images and keep a trace
    ● Docker is less interesting for multi-tenant use cases (no SSH in the containers)
    ● Docker is FAST and VERSATILE
    ● cAdvisor is an interesting monitoring tool, but limited
    ● Docker is perfect for short-lived apps (no long-term data persistence)
    ● Data-intensive apps should review the Docker docs carefully. Start by looking at Direct LVM.
  • 70. References
    ● Jonathan Bergknoff - Building good docker images, http://jonathan.bergknoff.com/journal/building-good-docker-images
    ● Michael Crosby - Dockerfile Best Practices, http://crosbymichael.com/dockerfile-best-practices.html
    ● Michael Crosby - Dockerfile Best Practices - take 2, http://crosbymichael.com/dockerfile-best-practices-take-2.html
    ● Nathan Leclaire - The Dockerfile is not the source of truth for your image, http://nathanleclaire.com/blog/2014/09/29/the-dockerfile-is-not-the-source-of-truth-for-your-image/
    ● Docker Documentation - Understanding Docker, https://docs.docker.com/introduction/understanding-docker/
    ● Docker Documentation - Docker User Guide, https://docs.docker.com/userguide/
    ● Docker Documentation - Dockerfile Reference, https://docs.docker.com/reference/builder/
    ● Docker Documentation - Command Line (CLI) User Guide, https://docs.docker.com/reference/commandline/cli/
    ● Docker Documentation - Advanced networking, http://docs.docker.com/articles/networking/
    ● Project Atomic - Supported Filesystems, http://www.projectatomic.io/docs/filesystems/
    ● Red Hat Developer Blog - Comprehensive Overview of Storage Scalability in Docker, http://developerblog.redhat.com/2014/09/30/overview-storage-scalability-docker/
    ● Linux Kernel Documentation - DeviceMapper Thin Provisioning, https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt
    ● weave - the Docker network, http://zettio.github.io/weave/
    ● GitHub - google/cadvisor, https://github.com/google/cadvisor