- 1. IBM System z Technical University – Vienna, Austria – May 2-6
zZS28 Much Ado About CPU
Martin Packer
© 2011 IBM Corporation
- 2. Abstract
System z and zEnterprise processors have in recent years
introduced a number of capabilities of real value to
mainframe customers. These capabilities have, however,
required changes in the way we think about CPU
management.
This presentation describes these capabilities and how to
evolve your CPU management to take them into account.
It is based on the author's experience of evolving his
reporting to support these changes.
This presentation is substantially enhanced this year
- 3. Agenda
A brief review of technology
Unfinished Business?
Coupling Facility CPU
zAAP and zIIP
z/OS Release 10 Changes
Soft Capping and Group Capacity Limits
Blocked Workloads
z10 Hiperdispatch
Cool It
I/O Assist Processors (IOPs)
SMF 23 and 113
In Conclusion
- 4. A Brief Review of Technology
- 5. "Characterisable" Engines
– GCPs - Pool 1
– (Obsolete Pool 2)
– ICFs - Pool 5
– IFLs - Pool 3
– zAAPs - Pool 4
– zIIPs - Pool 6
"Non-Characterisable" Engines
– SAPs
– Spares
With zEnterprise zBX other engines
– Not connected in the same way at all
– Not discussed here
– Treating as a "z11"
- 6. Book-Structured
● Connected by a ring in z9
● z10 and zEnterprise connect all books to all books directly
● Data transfers are direct between books via the L2 Cache chip in each book's MCM
● L2 Cache is shared by every PU on the MCM
● zEnterprise has an additional per-chip level of cache – and nomenclature "cleaned up"
● Only 1 book in BC models
- 7. IRD CPU Management
Weight Management for GCP engines
–Alter weights within an LPAR Cluster
–Shifts of 10% of weight
CP Management
–Doesn't work with HiperDispatch
–Vary LOGICAL CPs on and off
–Only for GCP engines
WLM objectives
–Optimise goal attainment
–Optimise PR/SM overhead
–Optimise LPAR throughput
Part of "On Demand" picture
–Ensure you have defined reserved engines
–Make weights sensible to allow shifts to happen
- 8. Unfinished Business?
How do we evolve our performance and capacity reporting?
Should we define an LPAR with dedicated engines?
– Or with shared engines?
• What should the weights be?
– In total and individually
– And what about the total for each pool?
How many engines should each LPAR have?
– And IRD makes all this so much more dynamic
- 9. Increasing Complexity
Installations are increasing the numbers of LPARs on a machine
– Many exceed 10 per footprint
• Expect 20+ soon
• My record: 51 and 52, 56
– 33 and 34 active, respectively
And have more logical and physical engines
And are increasing the diversity of their LPARs
● Greater incidence of IFLs
● Fast uptake of zIIPs and zAAPs
– Sometimes meaning 2 engine speeds
● Fewer stand-alone CF configurations
With mergers etc. the number of machines managed by a team is increasing
And stuff's got more dynamic, too
As an aside...
● Shouldn't systems be self-documenting?
- 10. Coupling Facility CPU
- 11. Internal Coupling Facility (ICF)
Managed out of Pool 5
– Pool numbers given in SMF 70 as an index into a table of labels
– Label is "ICF"
Recommendation: manage in reporting as a separate pool
Follow special CF sizing guidelines
– Especially for takeover situations
Always runs at full speed
– So a good technology match for coupled z/OS images on the same footprint
Another good reason to use ICFs is IC links
Shared ICFs strongly discouraged for Production
– Especially if the CF image has Dynamic Dispatch turned on
- 12. ICF ...
Need to correlate SMF 70-1 with SMF 74-4 CF Utilisation to get a proper CPU picture
– Since z/OS Release 8, 74-4 has the machine serial number
• Allows correlation in most cases
– Partition number added to 74-4 in OA21140
• Enables correlation with 70-1 when the LPAR name is not the Coupling Facility name
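The correlation just described can be sketched as a join on machine serial plus partition number. This is an illustrative sketch, not an SMF parser: the dict field names (`serial`, `partition_no`, `lpar_name`, `cf_name`) are stand-ins for the real SMF 70-1 and 74-4 fields.

```python
# Sketch: match SMF 74-4 CF records to SMF 70-1 partition records
# by machine serial and partition number (the latter available in
# 74-4 with APAR OA21140). Field names here are illustrative.

def correlate(smf70_lpars, smf744_cfs):
    """Pair each CF record with its LPAR record by (serial, partition)."""
    by_key = {(p["serial"], p["partition_no"]): p for p in smf70_lpars}
    matches = []
    for cf in smf744_cfs:
        lpar = by_key.get((cf["serial"], cf["partition_no"]))
        if lpar is not None:
            matches.append((cf["cf_name"], lpar["lpar_name"]))
    return matches

lpars = [{"serial": "02-ABCDE", "partition_no": 3, "lpar_name": "CFPROD"}]
cfs = [{"serial": "02-ABCDE", "partition_no": 3, "cf_name": "CF01"}]
print(correlate(lpars, cfs))  # [('CF01', 'CFPROD')]
```

This handles the case the slide calls out: the join works even when the LPAR name differs from the Coupling Facility name.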
- 13. Structure-Level CPU Consumption
CFLEVEL 15 and z/OS R.9
Always a 100% capture ratio
– Adds up to R744PBSY
Multiple uses:
– Capacity planning for changing request rates
– Examining which structures are large consumers
– Computing the CPU cost of a request
• And comparing it to service time
• The interesting number is the "non-CPU" element of service time – as we shall see
NOTE: Need to collect 74-4 data from all sharing z/OS systems to get the total request rate
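The per-request CPU cost calculation can be sketched as follows. The figures and field names are illustrative: `structure_busy_seconds` stands in for the structure's share of R744PBSY, and the request counts must come from the 74-4 data of every sharing z/OS system, as the note says.

```python
# Sketch: CPU cost per CF request for one structure.

def cpu_per_request_us(structure_busy_seconds, per_system_requests):
    """Microseconds of CF CPU per request. Request counts are summed
    across ALL z/OS systems sharing the CF, per the slide's NOTE."""
    total_requests = sum(per_system_requests)
    if total_requests == 0:
        return 0.0
    return structure_busy_seconds * 1e6 / total_requests

# 0.9s of structure CPU over 150k + 150k requests = 3us per request
print(cpu_per_request_us(0.9, [150_000, 150_000]))  # 3.0
```

Comparing this number against the observed service time exposes the "non-CPU" element the slide mentions.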
- 14. Structure CPU ...
Where not trivial I plot Sync Request %
– Shows if there is deterioration with load
Different request types and technologies behave markedly differently
– For example, modern lock structures locally accessed are typically around 5us CPU and elapsed, or lower
– For example, XCF structures are often in hundreds of us elapsed
• And quite high CPU
• Though obviously all async
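The Sync Request % metric plotted here is a simple ratio; a minimal sketch, with illustrative counts rather than real 74-4 fields:

```python
# Sketch: the Sync Request % metric per interval.

def sync_request_pct(sync_count, async_count):
    """Percentage of requests to a structure issued synchronously."""
    total = sync_count + async_count
    return 100.0 * sync_count / total if total else 0.0

# A falling Sync Request % as load rises indicates deterioration with load
print(sync_request_pct(9_000, 1_000))  # 90.0
```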
- 15. zAAP and zIIP
- 16. zAAP and zIIP
The number of each must not exceed the number of GCPs
Run at full speed, even if GCPs don't
– Instrumentation documents the "speed" difference
Hardcapping but no softcapping
– No Resource Group capping
Not managed by IRD
– Weight is the INITIAL LPAR weight
- 18. zAAP on zIIP
New with z/OS Release 11
– Retrofitted to R.9 and R.10 with OA27495
Not available if you already have zAAPs installed
– Or have reserved zAAP logical engines
Designed to enable further use of perhaps-underused zIIPs
Does not change the configuration rules relative to GCPs
Does not suddenly make zAAP-eligible work look like zIIP-eligible in terms of SRBs etc
No special metrics
– eg zAAP work is now in the zIIP bucket
– eg zAAP-eligible is now in the zIIP-eligible bucket
- 19. zIIP Instrumentation – Subsystems and Address Spaces
Instrumentation on consumption and potential for a number of exploiters:
– The latter is eg "zAAP on GCP"
Type 30 Address Space – Interval and Step/Job-End
– Takes RMF Workload Activity (72-3) to address space level
DB2 Accounting Trace
– Type 101 shows zIIP USED times by usage category
• At plan and package level
• ELIGIBLE is only reported up to Version 9
WebSphere Application Server
– Type 120 Subtype 9 (Request Activity)
• Both zIIP and zAAP usage and potential
- 20. z/OS Release 10 Changes
- 21. z/OS Release 10 Changes
All RMF Records
– Whether at least one zAAP was online
– Whether at least one zIIP was online
In Type 70 and retrofitted to supported releases:
– Permanent and Temporary Capacity Models and 3 capacities
– Hiperdispatch
• To be covered in a few minutes
- 22. Defined- and Group-Capacity Instrumentation
- 23. Soft Capping and Group Capacity
Defined Capacity
– A throttle on the rolling 4-hour average of the LPAR
– When this exceeds the defined capacity PR/SM softcaps the LPAR
– Appears as CPU delay in RMF
– SMF70PMA: Average Adjustment Weight for pricing management
– SMF70NSW: Number of samples when WLM softcaps the partition
Group Capacity
– Similar to Defined Capacity but for groups of LPARs on the same machine
– SMF70GJT: Timestamp when the system joined the Group Capacity group
– SMF70GNM: Group name
– SMF70GMU: Group Capacity MSU limit
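The rolling 4-hour average that drives softcapping can be sketched as below. The interval MSU values and window length are illustrative; real figures come from SMF 70 interval data.

```python
# Sketch: the rolling 4-hour average MSU comparison behind
# Defined Capacity softcapping. Inputs are illustrative.

def rolling_4hr_avg(msu_by_interval, interval_minutes=15):
    """Average the most recent 4 hours' worth of interval MSU readings."""
    n = (4 * 60) // interval_minutes          # intervals in 4 hours
    window = msu_by_interval[-n:]
    return sum(window) / len(window)

defined_capacity = 100
history = [80] * 12 + [160] * 4               # a 1-hour spike after a quiet 3 hours
avg = rolling_4hr_avg(history)
print(avg, avg > defined_capacity)            # 100.0 False - not capped yet
```

The point of the rolling average is visible here: a short spike well above the limit does not trigger capping until the 4-hour average itself exceeds the defined capacity.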
- 24. Exceeding 8 MSUs (MSU_VS_CAP > 100%) in the morning leads to active capping (SOFTCAPPED > 0%). Note: OCPU and O2 are CPU Queuing numbers
- 25. Group Capacity Limits
Each partition (z/OS system) manages itself
Group capacity is based on the defined capacity implementation
– The 4hr rolling average of group MSU consumption is used for managing the group's partitions
Each partition is aware of the consumption of all other partitions on the CPC
– And identifies all other partitions that are members of the same capacity group
– It calculates its defined share of the capacity group, based on the partition weight
• This share is the target for the partition if all partitions of the group want to use as much CPU as possible
If some LPARs do not consume their share, the unused capacity is distributed over those LPARs that need additional capacity
If a defined capacity limit is defined for a partition, that limit is not violated even when the partition receives capacity from others
WLM only manages partitions with shared CPs and WC=NO
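The weight-based share calculation described above can be sketched as follows; the group limit, partition names and weights are illustrative:

```python
# Sketch: each partition's target share of the group capacity limit,
# proportional to its weight - the target when every member of the
# group wants as much CPU as possible.

def group_shares(group_limit_msu, weights):
    """MSU share per partition, proportional to partition weight."""
    total = sum(weights.values())
    return {name: group_limit_msu * w / total for name, w in weights.items()}

shares = group_shares(300, {"SYSA": 50, "SYSB": 30, "SYSC": 20})
print(shares)  # {'SYSA': 150.0, 'SYSB': 90.0, 'SYSC': 60.0}
```

In practice the shares are only a floor for entitlement: as the slide says, capacity unused by one member is redistributed to members that need more, subject to any per-partition defined capacity limit.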
- 26. LPAR Table Fragment for Group Capacity
- 27. Blocked Workloads
- 28. z/OS Release 9 Blocked Workload Support
Rolled back to R.7 and R.8
Blocked workloads:
– Lower priority work may not get dispatched for an elongated time
– It may hold a resource that more important work is waiting for
WLM allows some throughput for blocked workloads
– By dispatching low-importance work from time to time, these "blocked workloads" are no longer blocked
– Helps to resolve resource contention for workloads that have no resource management implemented
Additional information in WSC flash:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10609
Additional instrumentation in 70-1 and 72-3
- 29. IEAOPT BLWLTRPCT and BLWLINTHD (With OA22443)
BLWLTRPCT: Percentage of the CPU capacity of the LPAR to be used for promotion
– Specified in units of 0.1%
– Default is 5 (= 0.5%); maximum is 200 (= 20%)
– Only spent when sufficiently many dispatchable units need promotion
BLWLINTHD: Specifies the threshold time interval for which a blocked address space or enclave must wait before being considered for promotion
– Minimum is 5 seconds; maximum is 65535 seconds
– Default is 60 seconds
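A hypothetical IEAOPTxx fragment showing the two parameters together; the values are illustrative, not recommendations:

```
BLWLTRPCT=10     /* 1.0% of LPAR CPU capacity usable for promotion      */
BLWLINTHD=20     /* consider promotion after 20 seconds blocked          */
```

Remember BLWLTRPCT is in units of 0.1%, so 10 here means 1.0%, not 10%.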
- 30. Type 70 CPU Control Section
Type 72-3 Service/Report Class Period Data Section
- 31. IBM System z10 EC HiperDispatch
- 32. z10 EC HiperDispatch
HiperDispatch – a z10 EC unique function
– Dispatcher Affinity (DA) - New z/OS Dispatcher
– Vertical CPU Management (VCM) - New PR/SM Support
Hardware cache optimization occurs when a given unit of work is consistently dispatched on the same physical CPU
– Up until now software, hardware, and firmware have acted independently of each other
– Non-Uniform Memory Access has forced a paradigm change
• CPUs have different distance-to-memory attributes
• Memory accesses can take a number of cycles depending upon the cache level / local or remote memory accessed
The entire z10 EC hardware / firmware / OS stack now tightly collaborates to manage these effects
- 33. z10 EC HiperDispatch – z/OS Dispatcher Functionality
New z/OS Dispatcher
– Multiple dispatching queues
• Average 4 logical processors per queue
– Tasks distributed amongst queues
– Periodic rebalancing of task assignments
– Generally assign work to minimum # logicals needed to use weight
• Expand to use white space on box
– Real-time on/off switch (Parameter in IEAOPTxx)
– May require "tightening up" of WLM policies for important work
• Priorities are more sensitive with targeted dispatching queues
- 34. z10 EC HiperDispatch – z/OS Dispatcher Functionality…
Initialization:
– The single HIPERDISPATCH=YES z/OS parameter dynamically activates HiperDispatch (full S/W and H/W collaboration) without an IPL
• With HIPERDISPATCH=YES, IRD management of CPU is turned OFF
– Four Vertical High LPs are assigned to each Affinity Node
– A "Home" Affinity Node is assigned to each address space / task
– zIIP, zAAP and standard CP "Home" Affinity Nodes must be maintained for work that transitions across specialty engines
– Benefit increases as LPAR size increases (i.e. crosses books)
- 35. z10 EC HiperDispatch – z/OS Dispatcher Functionality…
Workload Variability Issues:
– Short Term
• Dealing with transient utilization spikes
– Intermediate
• Balancing workload across multiple Affinity Nodes
– Manages “Home” Book assignment
– Long Term
• Mapping z/OS workload requirements to available physical resources
– Via dynamic expansion into Vertical Low Logical Processors
- 36. z10 EC HiperDispatch – PR/SM Functionality
New PR/SM Support
– Topology information exchanged with z/OS
• z/OS uses this to construct its dispatching queues
– Classes of logicals
• High priority allowed to consume weight
– Tight tie of logical processor to physical processor
• Low priority generally run only to consume white space
- 37. z10 EC HiperDispatch – PR/SM Functionality…
Firmware Support (PR/SM, millicode)
– A new z/OS-invoked instruction causes PR/SM to enter "Vertical mode"
• To assign the vertical LP subset and its associated LP-to-physical-CP mapping
– Based upon LPAR weight
– Enables z/OS to concentrate its work on fewer vertical processors
• Key in PR/SM overcommitted environments to reduce the LP competition for physical CP resources
– Vertical LPs are assigned High, Medium, and Low attributes
– Vertical Low LPs shouldn't be used unless there is logical white space within the CEC and demand within the LPAR
- 38. z10 EC HiperDispatch Instrumentation
Hiperdispatch status
– SMF70HHF bits for Supported, Active, Status Changed
Parked Time
– SMF70PAT in CPU Data Section
Polarization Weight
– SMF70POW in Logical Processor Data Section
• Highest weight for LPAR means Vertical High processor
• Zero weight means Vertical Low processor
• In-between means Vertical Medium processor
Example on next foil
– 2 x Vertical High (VH)
– 1 x Vertical Medium (VM)
– 4 x Vertical Low (VL)
– Because of HiperDispatch, all engines online in the interval are online all the time
• But there are other engines reserved, so with Online Time = 0
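The SMF70POW classification rule just described can be sketched as below. The weight values are illustrative, and the rule is taken straight from the slide: highest weight in the LPAR means Vertical High, zero means Vertical Low, anything in between is Vertical Medium.

```python
# Sketch: classify each logical CP from its SMF70POW polarization weight.

def classify_polarization(weights):
    """Return VH/VM/VL per logical CP from its polarization weight."""
    if not weights:
        return []
    high = max(weights)
    classes = []
    for w in weights:
        if w == 0:
            classes.append("VL")        # zero weight: Vertical Low
        elif w == high:
            classes.append("VH")        # highest weight: Vertical High
        else:
            classes.append("VM")        # in between: Vertical Medium
    return classes

# 2 x VH, 1 x VM, 4 x VL - as in the example on the next foil
print(classify_polarization([100, 100, 60, 0, 0, 0, 0]))
```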
- 39. Depiction Of An LPAR – With HiperDispatch Enabled
[Chart: UNPARKED %, PARKED %, POLAR WEIGHT and I/O % by logical processor 0-6]
- 40. HiperDispatch "GA2" Support in RMF - OA21140
SMF70POF – Polarisation Indicators, Bits 0,1
– 00 is "Horizontal" or "Polarisation Not Indicated"
– 01 is "Vertical Low"
– 10 is "Vertical Medium"
– 11 is "Vertical High"
– (Bit 2 is whether it changed in the interval)
SMF70Q00 - SMF70Q12: In & Ready counts based on the number of processors online and unparked
– The refinement is to take into account parking and unparking
Also SMF70RNM
– Normalisation factor for zIIP
• Which happens to be the same for zAAP
Also R744LPN – LPAR Number
– For correlation with SMF 70
(Also zHPF support)
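Decoding the SMF70POF bits can be sketched as below. One assumption to flag: the sketch uses S/390-style bit numbering (bit 0 is the most significant bit of the byte), so bits 0-1 are extracted from the top of the byte; the test value is illustrative.

```python
# Sketch: decode the SMF70POF polarisation indicator byte.
# Assumes S/390 bit numbering: bit 0 = most significant bit.

def decode_smf70pof(byte):
    """Return (polarisation name, changed-in-interval flag)."""
    pol = (byte >> 6) & 0b11          # bits 0,1
    changed = bool((byte >> 5) & 1)   # bit 2
    names = {0b00: "Horizontal or not indicated",
             0b01: "Vertical Low",
             0b10: "Vertical Medium",
             0b11: "Vertical High"}
    return names[pol], changed

print(decode_smf70pof(0b11000000))  # ('Vertical High', False)
```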
- 41. "Cool It" - Cycle Steering
Introduced with z990
– http://www.research.ibm.com/journal/rd/483/goth.html
Refined in later processors
– BOTH frequency- and voltage-reduction in z9
When cooling is degraded the processor is progressively slowed
– Much better than dying
– A rare event
• But one that should not be ignored
WLM Policy refreshed
– Admittedly not that helpful a message:
• IWM063I WLM POLICY WAS REFRESHED DUE TO A PROCESSOR SPEED CHANGE
• Automate it
SMF70CPA not changed
• Used as part of SCRT
• Talk to IBM and consider excluding intervals around such an event
R723MADJ is changed
• Al Sherkow's news item shows an example:
– http://www.sherkow.com/updates/20081014cooling.html
In R.12, Types 89, 70, 72 and 30 have instrumentation for this situation
- 42. IOPs – I/O Assist Processors
Not documented in Type 70
– Despite being regular engines characterised as IOPs
– NOT a pool
Instrumentation in Type 78-3
– Variable-length Control Section
• 1 IOP Initiative Queue / Util Data Section per IOP inside it
– Processor Was Busy / Was Idle counts
• NOT Processor Utilisation as such
• Suggest stacking the two numbers on a by-hour plot
– I/O Retry counts
• Channel Path Busy, CU Busy, Device Busy
Machines can be configured with different numbers of IOPs
– Depending on the I/O intensiveness of workloads
• Generally speaking it's only TPF that is said to need extra IOPs
– Analysis can help get this right
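The Was Busy / Was Idle pair can be turned into a busy fraction for the stacked by-hour plot the slide suggests; a minimal sketch with illustrative counts:

```python
# Sketch: derive an IOP busy fraction from the 78-3 Was Busy /
# Was Idle counts - these are sample counts, not a utilisation
# figure as such, which is why stacking both numbers is suggested.

def iop_busy_pct(was_busy, was_idle):
    """Busy samples as a percentage of all samples for one IOP."""
    total = was_busy + was_idle
    return 100.0 * was_busy / total if total else 0.0

print(iop_busy_pct(300, 700))  # 30.0
```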
- 43. SMF 23 and 113
- 44. SMF 23
SMF 23 – The "SMF" record
New extensions to the SMF 23 record
• Provide information related to Dispatching, Storage and I/O
• Available on z/OS 1.8 and above
Why you'd want to collect them?
– They may provide a way to help characterize your workload to improve your capacity planning
• The LoIO mix in zPCR is simply an estimate of your actual workload pattern
Record Size and Interval
– Small record: 210 bytes (258 bytes with "deltas") per system per interval
- 45. What is in the SMF 23s? - New Fields via APAR OA22414
Storage
– Total number of Getmain requests (NGR)
– Total pages backed during Getmain requests (PBG)
– Total number of Fixed requests for storage below 2 GB (NFR)
– Total number of frames for Fixed requests for storage below 2 GB (PFX)
Faults
– Total number of first-reference faults (1RF)
– Total number of non-first-reference faults (NRF)
I/Os
– Total number of I/Os (NIO)
Dispatches
– Number of unlocked TCB dispatches (TCB)
– Number of SRB dispatches (SRB)
APAR OA27161 – Closed 1/19/2009
– Provides "delta" counters for the above fields
– Otherwise "cumulative" counters
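Without OA27161's delta counters, per-interval values have to be derived by differencing successive cumulative readings; a minimal sketch with illustrative NIO values:

```python
# Sketch: turn cumulative SMF 23 counters into per-interval deltas.
# Note: cumulative counters restart after an IPL, so a negative
# delta in real data marks a reset, not workload behaviour.

def deltas(cumulative):
    """Per-interval deltas from a cumulative counter series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

nio = [1000, 1600, 1900, 3100]   # cumulative I/O counts at interval ends
print(deltas(nio))                # [600, 300, 1200]
```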
- 46. What is the z10 CPU Measurement Facility?
New hardware instrumentation facility: "CPU Measurement Facility" (CPU MF)
– Available on System z10 EC GA2 and z10 BC
– Supported by a new z/OS component (Instrumentation), Hardware Instrumentation Services (HIS)
CPU MF provides support built into the processor hardware
• So the exploiting mechanism allows the observation of performance behavior with nearly no impact to the system being observed
Potential future uses for this new "cool" virtualization technology:
• Future workload characterization
• ISV product improvement
• Application Tuning
- 47. CPU MF ...
Data collection done by System z hardware
– Low overhead
– Little/no skew in sampling
– Access to information which is not available from software
SAMPLING
– SAMPFREQ=800000 is the default (samples per minute) = 13,333/s
• 8M samples in 10 minutes is the default (DURATION=10 is the default, 10 minutes)
– Recommendation: start with a small frequency, e.g. SAMPFREQ=320, and increase after early experiences – e.g. ensure enough disk space for output
• Smaller z10 BCs should increase only up to SAMPFREQ=130000 (for DURATION=60)
New IBM Research article:
– "IBM System z10 performance improvements with software and hardware synergy"
– http://www.research.ibm.com/journal/rd/531/jackson.pdf
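The sampling arithmetic on this foil checks out directly, since SAMPFREQ is specified in samples per minute:

```python
# The slide's default sampling arithmetic: SAMPFREQ is per minute.
sampfreq = 800_000        # default SAMPFREQ
duration_min = 10         # default DURATION=10 (minutes)

per_second = sampfreq / 60
total_samples = sampfreq * duration_min
print(round(per_second), total_samples)  # 13333 8000000
```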
- 48. COUNTERS
Basic Counter Set
– Cycle count
– Instruction count
– Level-1 I-cache directory write count
– Level-1 I-cache penalty cycle count
– Level-1 D-cache directory write count
– Level-1 D-cache penalty cycle count
Problem State Counter Set
– Problem state cycle count
– Problem state instruction count
– Problem state level-1 I-cache directory write count
– Problem state level-1 I-cache penalty cycle count
– Problem state level-1 D-cache directory write count
– Problem state level-1 D-cache penalty cycle count
Extended Counter Set
– Number and meaning of counters are model-dependent
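Two common metrics derived from the Basic Counter Set can be sketched as below. The counter values are illustrative, and treating L1 directory writes as an approximation of L1 misses is an assumption of this sketch, not something the counters state directly.

```python
# Sketch: derived metrics from the Basic Counter Set.

def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions

def l1_miss_per_100_instr(i_writes, d_writes, instructions):
    """L1 I- and D-cache directory writes (approximating L1 misses)
    expressed per 100 instructions."""
    return 100.0 * (i_writes + d_writes) / instructions

print(cpi(4_000_000, 1_000_000))                         # 4.0
print(l1_miss_per_100_instr(20_000, 30_000, 1_000_000))  # 5.0
```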
- 49. Crypto Activity Counter Set (CPACF activity)
– PRNG: function count, cycle count, blocked function count, blocked cycle count
– DES: function count, cycle count, blocked function count, blocked cycle count
– SHA: function count, cycle count, blocked function count, blocked cycle count
– AES: function count, cycle count, blocked function count, blocked cycle count
- 50. Sample Report – Basic / Extended Counters z10 L1 Cache Hierarchy Sourcing
- 51. In Conclusion
- 52. In Conclusion
Be prepared for fractional engines, multiple engine pools, varying weights, etc
Understand the limitations of z/OS image-level CPU Utilisation as a number
Take advantage of Coupling Facility Structure CPU
– For Capacity Planning
– For CF Request Performance Analysis
There's additional instrumentation for Defined- and Group-Capacity limits
z9, z10 and zEnterprise ARE different from z990 – and from each other
The CPU data model is evolving
– To be more complete
– To be more comprehensible
– To meet new challenges
• Such as HiperDispatch's Parked Time state
• For example SMF 23 and 113