IBM System x3100 M4
IBM System x at-a-glance guide
The System x3100 M4 single-socket tower server is designed for small businesses and first-time server
buyers looking for a solution to improve business efficiency. It delivers several innovative IBM® features
in a compact mini 4U form factor at a competitive price. The IBM System x3100 M4 provides
next-generation performance in an innovative and compact design with flexible configuration options,
built-in security, and systems management capabilities. It uses next-generation dual-core and quad-core
Intel Xeon processor technology.
Figure 1. The IBM System x3100 M4
Did you know?
The System x3100 M4 server is a compact, cost-effective, single-processor tower or rack-mountable
server that has been optimized to provide outstanding availability, manageability, and performance
features to small-to-medium-sized businesses, retail stores, or distributed enterprises. It supports the
latest Xeon processors for applications that require performance and stability, and Core i3 processors for
applications that require lower cost.
The system includes features not typically seen in this class of system, such as standard embedded
RAID-0/1, remote control capabilities even when the machine is powered off, and Predictive Failure
Analysis (PFA) on the processor and memory.
Locations of key components
Figures 2 and 3 show the front and rear of the server.
Figure 2. Front view of the System x3100 M4
Figure 3. Rear view of the System x3100 M4
Figure 4 shows the locations of key components inside the server.
Figure 4. Inside view of System x3100 M4
Standard specifications
Table 1. Standard specifications (part 1)
Components Specification
Form factor Tower (can be a 4U rack form factor using the optional Tower-to-Rack conversion kit).
Processor One quad-core Intel Xeon E3-1200 series processor (up to 3.5 GHz/8 MB/1333 MHz), one
dual-core Intel Core i3 2100 series processor (up to 3.3 GHz/3 MB/1333 MHz), one
dual-core Pentium (up to 3.0 GHz/3 MB/1333 MHz), or a low-cost Celeron. Supports
specific quad-core and dual-core processors via Configure-To-Order (CTO).
Memory cache Up to 8 MB L3 for Intel Xeon E3-1200 series processors. Up to 3 MB L3 for Intel Core i3
2100 series processors.
Chipset Intel C202.
Memory DIMM slots 4 DDR3 DIMM slots.
Memory capacity Up to 32 GB with 8 GB DDR3 UDIMMs and four populated DIMM slots. RDIMMs are not
supported.
Memory protection ECC.
Disk drive bays Up to four 3.5" Simple Swap SATA HDDs.
Maximum internal storage Up to 12 TB with 3 TB 3.5" SATA HDDs.
RAID support Software RAID 0, 1, 10 with ServeRAID C100 controller; hardware RAID 0, 1, 1E with
ServeRAID-BR10il v2 or M1015. Optional RAID 5, 50 with ServeRAID M5014. Optional
upgrade to RAID 5 is available for M1015 and C100.
Optical drive bay One 5.25" HH bay with support for DVD-ROM or Multiburner. A Half-High SATA DVD-ROM
is standard in standard models.
Tape drive bays One 5.25" HH bay with support for DDS, RDX, or LTO.
Network interfaces Integrated two-port Gigabit Ethernet (Intel 82574L); one port is shared with the IMM.
PCI Expansion slots Four PCI Express slots:
• Slot 1, PCIe x8 Gen 2, full-height, half-length
• Slot 2, PCIe x16 (x8 wired) Gen 2, full-height, half-length
• Slot 3, PCIe x4 Gen 2, full-height, half-length
• Slot 4, PCIe x1 Gen 2, full-height, half-length
External ports Two USB 2.0 ports on front. Four USB 2.0, one DB-15 video, one DB-9 serial, two RJ-45
Gigabit Ethernet network ports (one is dedicated, one is shared with IMM) on rear. One
internal USB port for internal USB tape drive.
Cooling One speed-controlled non-redundant fan.
Power supply One fixed 350 W AC or 300 W AC 80 PLUS Bronze power supply (model dependent).
Hot-swap components None.
Systems management UEFI, IBM Integrated Management Module II (IMM2), Predictive Failure Analysis,
Automatic Server Restart, IBM Systems Director and IBM Systems Director Active Energy
Manager™, IBM ServerGuide.
Table 1. Standard specifications (part 2)
Components Specification
Security features Power-on password, Administrator's password, Trusted Platform Module (TPM).
Video Matrox G200eV with 16 MB memory integrated into the IMM. Maximum resolution is
1600x1200 at 85 Hz with 64 K colors.
Operating systems supported Microsoft Windows Server 2008 R2 and 2008, Red Hat Enterprise Linux 5,
SUSE Linux Enterprise Server 11.
Limited warranty One-year customer replaceable unit and onsite limited warranty with 9x5/next business
day (NBD) response time.
Service and support Optional service upgrades are available through IBM ServicePacs®: 24x7/NBD or four
hours onsite repair, 1-year or 2-year warranty extension, remote technical support for IBM
hardware and selected IBM and third-party (Microsoft, Linux, VMware) software.
Dimensions Height: 360 mm (14.2 in); width: 180 mm (7.1 in); depth: 480 mm (18.9 in)
Weight Minimum configuration: 10 kg (22.0 lb), Maximum configuration: 13 kg (28.7 lb)
The x3100 M4 servers are shipped with the following items:
• Statement of Limited Warranty
• Important Notices
• Documentation CD that contains the Installation and User's Guide
• IBM Systems Director 6.2 Base for x86 DVD-ROM
• Country-specific models might have one country-specific line cord
Standard models
The following table lists the standard models.
Table 2. Standard models
Model Intel Processor* (one maximum) Memory Disk adapter HDD bays Disks GbE DVD Power supply
2582-32x 1x Pentium G850 2.9GHz 2C 3MB 1333MHz 1x 2 GB C100 4x 3.5" SS Open 2 DVD 1x 350W
2582-42x 1x Core i3-2100 3.1GHz 2C 3MB 1333MHz 1x 2 GB C100 4x 3.5" SS Open 2 DVD 1x 350W
2582-62x 1x Xeon E3-1220 3.1GHz 4C 8MB 1333MHz 1x 2 GB C100 4x 3.5" SS Open 2 DVD 1x 350W
2582-82x 1x Xeon E3-1270 3.4GHz 4C 8MB 1333MHz 1x 4 GB C100 4x 3.5" SS Open 2 DVD 1x 300W
* Processor detail: Processor quantity, processor model, core speed, number of cores, L3 cache, front-side bus speed
Express models
Express models are preconfigured with additional components, such as processors, memory, and disks,
to simplify ordering and installation. The following table lists the Express models that are available in
certain regions.
Table 3. Express models
Model Intel Processor* (one maximum) RAM Disk adapter HDD bays Disks GbE DVD Power supply
Region: NA & LA
2582-EAU Core i3-2100 3.1GHz 2C 3MB 1333MHz 1x 1 GB C100 4x 3.5" SS 1x 250 2 DVD 1x 350W
2582-EBU Core i3-2120 3.3GHz 2C 3MB 1333MHz 1x 2 GB C100 4x 3.5" SS 1x 250 2 DVD 1x 350W
2582-ECU Xeon E3-1220 3.1GHz 4C 8MB 1333MHz 1x 2 GB C100 4x 3.5" SS 1x 250 2 DVD 1x 350W
2582-EDU Xeon E3-1230 3.2GHz 4C 8MB 1333MHz 1x 2 GB C100 4x 3.5" SS 1x 250 2 DVD 1x 350W
Region: NE & SW & CEE & MEA
2582-E1U Core i3-2100 3.1GHz 2C 3MB 1333MHz 1x 2 GB C100 4x 3.5" SS 1x 250 2 DVD 1x 350W
2582-E2U Xeon E3-1220 3.1GHz 4C 8MB 1333MHz 1x 2 GB C100 4x 3.5" SS 1x 250 2 DVD 1x 350W
2582-E3U Xeon E3-1230 3.2GHz 4C 8MB 1333MHz 1x 4 GB C100 4x 3.5" SS 1x 250 2 Multi 1x 350W
* Processor detail: Processor quantity, processor model, core speed, number of cores, L3 cache, front-side bus speed
Processor options
The server supports only one processor, which is already installed in all standard and Express models; no
additional processor options are available. The following table lists all processors available in standard
models of the x3100 M4 or via CTO. If a processor has no corresponding where-used model, it is available
only through the configure-to-order (CTO) process.
Table 4. Processor options
Part number Description Standard models where used
None* Intel Pentium G840 2.8 GHz 2C 3 MB 1333 MHz DDR3 65W -
None* Intel Pentium G850 2.9 GHz 2C 3 MB 1333 MHz DDR3 65W 32x
None* Intel Pentium G860 3.0 GHz 2C 3 MB 1333 MHz DDR3 65W -
None* Intel Pentium G620 2.6 GHz 2C 3 MB 1066 MHz DDR3 65W -
None* Intel Pentium G630 2.7 GHz 2C 3 MB 1066 MHz DDR3 65W -
None* Intel Core i3-2100 3.1 GHz 2C 3 MB 1333 MHz DDR3 65 W 42x
None* Intel Core i3-2120 3.3 GHz 2C 3 MB 1333 MHz DDR3 65 W -
None* Intel Core i3-2130 3.4 GHz 2C 3 MB 1333 MHz DDR3 65 W A2X
None* Intel Xeon E3-1220 3.1 GHz 4C 8 MB 1333 MHz DDR3 80 W 62x
None* Intel Xeon E3-1220L 2.2 GHz 2C 3 MB 1333 MHz DDR3 20 W -
None* Intel Xeon E3-1230 3.2 GHz 4C 8 MB 1333 MHz DDR3 80 W -
None* Intel Xeon E3-1240 3.3 GHz 4C 8 MB 1333 MHz DDR3 80 W -
None* Intel Xeon E3-1260L 2.3 GHz 4C 8 MB 1333 MHz DDR3 45 W -
None* Intel Xeon E3-1270 3.4 GHz 4C 8 MB 1333 MHz DDR3 80 W 82x
None* Intel Xeon E3-1280 3.5 GHz 4C 8 MB 1333 MHz DDR3 95 W -
* No additional processor options are available. The server supports only one processor, which is already included in a
standard or custom configuration.
Memory options
The x3100 M4 has four DIMM slots, and only DDR3 ECC UDIMMs are supported. The CPU has two
memory channels with two DIMMs per channel. If more than one DIMM is to be installed, DIMMs must be
installed in pairs, and both DIMMs in a pair must be identical in type and size. The following table lists the
memory options supported by the server.
Table 5. Memory options
Part number Description Maximum supported Standard models where used
44T1568 1 GB (1x 1 GB, 1Rx8) PC3-10600 CL9 ECC 1333 LP UDIMM 4 -
44T1570 2 GB (1x 2 GB, 1Rx8, 1.5 V) PC3-10600 CL9 ECC 1333 LP UDIMM 4 32x, 42x, 62x
44T1571 4 GB (1x 4 GB, 2Rx8) PC3-10600 CL9 ECC 1333 LP UDIMM 4 82x
90Y3165 8GB (4Gb, 2Rx8, 1.5V) PC3-10600 DDR3-1333 LP UDIMM 4 -
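The population rules above (four slots, any multi-DIMM install in identical pairs, 32 GB maximum) can be sketched as a small validation helper. This is an illustrative Python sketch; the function name and tuple encoding are our own, not an IBM tool:

```python
# Hypothetical checker for a planned x3100 M4 DIMM population, per the
# rules described above: up to 4 slots, more than one DIMM must be
# installed in identical pairs, and capacity tops out at 32 GB.
MAX_SLOTS = 4
MAX_CAPACITY_GB = 32  # 4x 8 GB UDIMMs

def validate_dimm_plan(dimms):
    """dimms: list of (size_gb, part_number) tuples, one per DIMM."""
    if not 1 <= len(dimms) <= MAX_SLOTS:
        return False, "1 to 4 DIMMs must be installed"
    if len(dimms) > 1:
        # Beyond a single DIMM, modules go in pairs and each pair must match.
        if len(dimms) % 2 != 0:
            return False, "more than one DIMM requires full pairs"
        for i in range(0, len(dimms), 2):
            if dimms[i] != dimms[i + 1]:
                return False, f"pair {i // 2 + 1} is not identical"
    total = sum(size for size, _ in dimms)
    if total > MAX_CAPACITY_GB:
        return False, f"{total} GB exceeds the {MAX_CAPACITY_GB} GB maximum"
    return True, f"OK: {total} GB"

# Two matched 4 GB UDIMMs (part 44T1571) pass the check...
print(validate_dimm_plan([(4, "44T1571"), (4, "44T1571")]))
# ...while a mismatched pair is rejected.
print(validate_dimm_plan([(4, "44T1571"), (2, "44T1570")]))
```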
Internal disk storage options
The IBM System x3100 M4 server supports up to four 3.5" simple-swap SATA hard drives. The following
table lists the supported hard drive options.
Table 6. 3.5" Simple-Swap SATA disk drive options
Part number Description Maximum quantity supported
39M4514 500 GB 7200 RPM 3.5" Simple-Swap SATA II 4
43W7750 IBM 250GB 7.2K SATA 3.5" Simple-Swap HDD 4
43W7622 IBM 1TB 7200 SATA 3.5" Simple Swap HDD 4
42D0787 IBM 2 TB 7200 NL SATA 3.5" SS HDD 4
81Y9778 IBM 3TB 7.2K 6Gbps SATA 3.5'' SS HDD 4
* This drive cannot be ordered separately. It is only available via special bid or the CTO process.
The integrated ServeRAID C100 disk controller offers RAID 0, 1, and 10 as standard; RAID 5 is optional.
The following table lists the additional RAID controller options supported by the server.
Table 7. RAID controllers for internal storage
Part number Description Maximum quantity supported Standard models where used
49Y4731 ServeRAID-BR10il SAS/SATA Controller v2 1 -
46M0831 ServeRAID M1015 SAS/SATA Controller 1 -
46M0832 ServeRAID M1000 Series Advanced Feature Key 1 -
46M0916 ServeRAID M5014 SAS/SATA Controller 1 -
Only one RAID controller can be used with the server to support internal HDDs. Features of the supported
RAID controllers are listed below.
The ServeRAID BR10il v2 SAS/SATA Controller has the following specifications:
• One Mini-SAS internal connector
• Supports RAID levels 0, 1, and 1E
• 3 Gbps throughput per port
• Based on the LSI 1064E controller
• PCI Express 2.0 x4 host interface
• Stripe size: 64 KB (fixed)
For more information, see the ServeRAID-BR10il SAS/SATA Controller v2 for IBM System x® at-a-glance
guide: http://www.redbooks.ibm.com/abstracts/tips0741.html?Open
The ServeRAID M1015 SAS/SATA Controller has the following specifications:
• Two Mini-SAS internal connectors
• Supports RAID levels 0, 1, and 10
• Supports RAID levels 5 and 50 with the optional ServeRAID M1000 Series Advanced Feature Key
• 6 Gbps throughput per port
• Based on the LSI SAS2008 6 Gbps RAID on Chip (ROC) controller
• PCI Express 2.0 x8 host interface
• Configurable stripe size up to 64 KB
For more information, see the ServeRAID M1015 SAS/SATA Controller for System x at-a-glance guide:
http://www.redbooks.ibm.com/abstracts/tips0740.html?Open
The ServeRAID M5014 SAS/SATA Controller has the following specifications:
• Two Mini-SAS internal connectors
• Supports RAID levels 0, 1, 5, 10, and 50
• 6 Gbps throughput per port
• PCI Express 2.0 x8 host interface
• Based on the LSI SAS2108 6 Gbps ROC controller
• 256 MB of onboard cache
For more information, see the ServeRAID M5015 and M5014 SAS/SATA Controllers for IBM System x
at-a-glance guide: http://www.redbooks.ibm.com/abstracts/tips0738.html?Open
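As a rough guide to the capacity trade-offs between the RAID levels these controllers support, usable capacity for N identical drives can be estimated with generic RAID arithmetic. This is a hedged sketch of the standard formulas, not an IBM sizing utility, and it ignores metadata overhead:

```python
# Approximate usable capacity for N identical drives at common RAID
# levels supported by the controllers above (ignores array metadata).
def usable_capacity_tb(level, n_drives, drive_tb):
    if level == 0:                      # striping, no redundancy
        return n_drives * drive_tb
    if level == 1:                      # mirroring (2 drives)
        assert n_drives == 2
        return drive_tb
    if level == 10:                     # striped mirrors (even count >= 4)
        assert n_drives >= 4 and n_drives % 2 == 0
        return n_drives * drive_tb / 2
    if level == 5:                      # single parity (>= 3 drives)
        assert n_drives >= 3
        return (n_drives - 1) * drive_tb
    raise ValueError(f"unsupported RAID level {level}")

# Four 3 TB drives, the x3100 M4 maximum:
print(usable_capacity_tb(0, 4, 3))   # striped: all 12 TB usable
print(usable_capacity_tb(10, 4, 3))  # mirrored stripes: half the raw capacity
print(usable_capacity_tb(5, 4, 3))   # parity: one drive's worth reserved
```

The RAID-0 figure matches the 12 TB maximum internal storage quoted in Table 1.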
Internal tape drives
The server supports the internal tape drive options listed in the following table. Internal tape drives are
installed in a 5.25" HH bay, and a maximum of one tape drive is supported. SAS tape drives require a
SAS HBA to be installed in the server. USB tape drives connect to the dedicated USB tape drive
connector on the system board.
Table 8. Internal tape drives
Part number Description Maximum quantity supported
46C5399 IBM DDS Generation 5 USB Tape Drive 1
39M5636 IBM DDS Generation 6 USB Tape Drive 1
43W8478 IBM Half High LTO Gen 3 SAS Tape Drive 1
44E8895 IBM Half High LTO Gen 4 SAS Tape Drive 1
46C5364 IBM RDX Removable Hard Disk System - Internal USB 160 GB Bundle 1
46C5387 IBM RDX Removable Hard Disk System - Internal USB 320 GB Bundle 1
46C5388 IBM RDX Removable Hard Disk System - Internal USB 500 GB Bundle 1
For more information, see the following at-a-glance guides:
• IBM RDX Removable Disk Backup Solution at-a-glance guide
http://www.redbooks.ibm.com/abstracts/tips0726.html?Open
• IBM DDS Generation 5 USB Tape Drive at-a-glance guide
http://www.redbooks.ibm.com/abstracts/tips0755.html?Open
• IBM DDS Generation 6 USB Tape Drive at-a-glance guide
http://www.redbooks.ibm.com/abstracts/tips0725.html?Open
Optical drives
The server supports the optical drive options listed in the following table.
Table 9. Optical drives
Part number Description Maximum quantity supported Standard models where used
None* Half-High SATA DVD-ROM 1 32x, 42x, 62x, 82x
None* Half-High SATA Multi-Burner 1 -
* This option is only available via CTO or is already installed in standard models.
The Half-High SATA DVD-ROM supports the following media and speeds for reading:
• CD-ROM 48X
• CD-DA (DAE) 40X
• CD-R 48X
• CD-RW 40X
• DVD-ROM (single layer) 16X
• DVD-ROM (dual layer) 12X
• DVD-R (4.7 GB) 16X
• DVD-R DL 12X
• DVD+R 16X
• DVD+R DL 12X
• DVD-RW (4.7 GB) 12X
• DVD+RW 12X
• DVD-RAM (4.7/9.4 GB) 6X
The Half-High SATA Multi-Burner supports the same media and read speeds as the HH DVD-ROM. In
addition, this drive supports the following media and speeds for writing:
• CD-R 24X
• CD-RW 4X
• High Speed CD-RW 10X
• Ultra Speed CD-RW 16X
• DVD-R 8X
• DVD-R DL 8X
• DVD+R 8X
• DVD+R DL 8X
• DVD-RW 6X
• DVD+RW 8X
• DVD-RAM 3X
External disk storage expansion
The x3100 M4 server currently does not support external disk storage expansion.
External tape backup
The server supports the external tape attachment options listed in the following table.
Table 10. External tape options
Part number Description
External tape expansion enclosures for internal tape drives
87651UX 1U Tape Drive Enclosure
8767HHX Half High Tape Drive Enclosure
87651NX 1U Tape Drive Enclosure (with NEMA 5-15P Line Cord)
8767HNX Half High Tape Drive Enclosure (with NEMA 5-15P Line Cord)
Tape enclosure adapters (with cables)
44E8869 USB Enclosure Adapter Kit
40K2599 SAS Enclosure Adapter Kit
Internal tape drives supported by external tape enclosures
362516X IBM RDX Removable Hard Disk Storage System - External USB 160 GB Bundle
362532X IBM RDX Removable Hard Disk Storage System - External USB 320 GB Bundle
362550X IBM RDX Removable Hard Disk Storage System - External USB 500 GB Bundle
46C5399 IBM DDS Generation 5 USB Tape Drive
39M5636 IBM DDS Generation 6 USB Tape Drive
43W8478 IBM Half High LTO Gen 3 SAS Tape Drive
44E8895 IBM Half High LTO Gen 4 SAS Tape Drive
49Y9898 IBM Internal Half High LTO Gen 5 SAS Tape Drive
External tape drives
3628L5X IBM External Half High LTO Gen 5 SAS Tape Drive (with US linecord)
3628N5X IBM External Half High LTO Gen 5 SAS Tape Drive (no US linecord)
I/O expansion options
The server offers four PCI Express 2.0 expansion slots. The form factors of the available slots are:
• Slot 1, PCIe x8 Gen 2, full-height, half-length
• Slot 2, PCIe x16 (x8 wired) Gen 2, full-height, half-length
• Slot 3, PCIe x4 Gen 2, full-height, half-length
• Slot 4, PCIe x1 Gen 2, full-height, half-length
Network adapters
The x3100 M4 offers two integrated Gigabit Ethernet ports. One port is shared with the Integrated
Management Module (IMM).
The integrated NICs have the following features:
• Intel 82574L chip
• TCP/UDP, IPv4, and IPv6 checksum offloads
• TCP Segmentation/Transmit Segmentation Offloading (TSO)
• Wake on LAN support
• 802.1Q VLAN tagging support
• Support for jumbo frames up to 9 KB
• NIC teaming (load balancing and failover) with Intel PROSet software
The following table lists additional supported network adapters.
Table 11. Network adapters
Part number Description Maximum quantity supported
Gigabit Ethernet
42C1780 NetXtreme II 1000 Express Dual Port Ethernet Adapter 3
49Y4220 NetXtreme II 1000 Express Quad Port Ethernet Adapter 3
49Y4230 Intel Ethernet Dual Port Server Adapter I340-T2 for IBM System x 3
49Y4240 Intel Ethernet Quad Port Server Adapter I340-T4 for IBM System x 3
Storage host bus adapters
The following table lists the storage host bus adapters (HBAs) supported by the x3100 M4 server.
Table 12. Storage adapters
Part number Description Maximum quantity supported
Fibre Channel
42D0485 Emulex 8 Gb FC Single-port HBA for IBM System x 2
42D0494 Emulex 8 Gb FC Dual-port HBA for IBM System x 2
42D0501 QLogic 8 Gb FC Single-port HBA for IBM System x 2
42D0510 QLogic 8 Gb FC Dual-port HBA for IBM System x 2
46M6049 Brocade 8 Gb FC Single-port HBA for IBM System x 2
46M6050 Brocade 8 Gb FC Dual-port HBA for IBM System x 2
Converged Network Adapters (CNA)*
42C1820 Brocade 10 Gb Dual-port CNA for IBM System x 2
42C1800 QLogic 10 Gb Dual Port CNA for IBM System x 2
SAS
46M0907 IBM 6 Gb SAS HBA Controller 2
* Note: Converged Network Adapters require SFP+ optical transceivers or DAC cables, which must be
purchased separately.
PCIe SSD adapters
The server does not support High IOPS SSD adapters.
Power supplies
Models of the x3100 M4 include one fixed 350 W AC power supply or one fixed 300 W AC 80 PLUS
Bronze power supply, as listed in Tables 2 and 3.
Integrated virtualization
The x3100 M4 currently does not support VMware ESXi embedded virtualization.
Remote management
The server includes the IBM Integrated Management Module (IMM), which provides advanced
service-processor control, monitoring, and alerting functions. If an environmental condition exceeds a
threshold or if a system component fails, the IMM lights LEDs to help you diagnose the problem, records
the error in the event log, and alerts you to the problem. Optionally, the IMM also provides a virtual
presence capability for remote server management.
The IMM provides remote server management through industry-standard interfaces:
• Intelligent Platform Management Interface (IPMI) Version 2.0
• Simple Network Management Protocol (SNMP) Version 3
• Common Information Model (CIM)
• Web browser
Top-of-rack Ethernet switches
The server supports the following top-of-rack Ethernet switches from IBM System Networking.
Table 13. IBM System Networking - Top-of-rack switches
Part number Description
IBM System Networking - 1 Gb top-of-rack switches
0446013 IBM BNT RackSwitch G8000R
7309CFC IBM BNT RackSwitch G8000F
7309CD8 IBM BNT RackSwitch G8000DC
7309G52 IBM BNT RackSwitch G8052R
730952F IBM BNT RackSwitch G8052F
427348E IBM Ethernet Switch J48E
6630010 Juniper Networks EX2200 24 Port
6630011 Juniper Networks EX2200 24 Port with PoE
6630012 Juniper Networks EX2200 48 Port
6630013 Juniper Networks EX2200 48 Port with PoE
IBM System Networking - 10 Gb top-of-rack switches
7309BD5 IBM BNT RackSwitch G8124DC
7309BR6 IBM BNT RackSwitch G8124ER
7309BF7 IBM BNT RackSwitch G8124EF
7309G64 IBM BNT RackSwitch G8264R
730964F IBM BNT RackSwitch G8264F
0719-410 Juniper Networks EX4500 - Front to Back Airflow
0719-420 Juniper Networks EX4500 - Back to Front Airflow
Uninterruptible power supply units
The server supports attachments to the uninterruptible power supply (UPS) units listed in the following
table.
Table 14. Uninterruptible power supply units
Part number Description
Rack-mounted UPS
21304RX IBM UPS 10000XHV
53951AX IBM 1500VA LCD 2U Rack UPS (100V/120V)
53951KX IBM 1500VA LCD 2U Rack UPS (230V)
53952AX IBM 2200VA LCD 2U Rack UPS (100V/120V)
53952KX IBM 2200VA LCD 2U Rack UPS (230V)
53953AX IBM 3000VA LCD 3U Rack UPS (100 V/120 V)
53953JX IBM 3000VA LCD 3U Rack UPS (200 V/208 V)
53956AX IBM 6000VA LCD 4U Rack UPS (200 V/208 V)
53956KX IBM 6000VA LCD 4U Rack UPS (230 V)
For more information, see the following at-a-glance guides:
• IBM 3000VA LCD 3U Rack Uninterruptible Power Supply for IBM System x at-a-glance guide
http://www.redbooks.ibm.com/abstracts/tips0782.html?Open
• IBM 6000VA LCD 4U Rack UPS at-a-glance guide
http://www.redbooks.ibm.com/abstracts/tips0793.html?Open
Power distribution units
The server supports attachments to the power distribution units (PDUs) listed in the following table.
Table 15. Power distribution units (part 1)
Part number Description
Switched and Monitored PDUs
46M4002 IBM 1U 9 C19/3 C13 Active Energy Manager DPI® PDU
46M4003 IBM 1U 9 C19/3 C13 Active Energy Manager 60A 3 Phase PDU
46M4004 IBM 1U 12 C13 Active Energy Manager DPI PDU
46M4005 IBM 1U 12 C13 Active Energy Manager 60A 3 Phase PDU
46M4167 IBM 1U 9 C19/3 C13 Switched and Monitored 30A 3 Phase PDU
46M4116 IBM 0U 24 C13 Switched and Monitored 30A PDU
46M4119 IBM 0U 24 C13 Switched and Monitored 32A PDU
46M4134 IBM 0U 12 C19/12 C13 Switched and Monitored 50A 3 Phase PDU
46M4137 IBM 0U 12 C19/12 C13 Switched and Monitored 32A 3 Phase PDU
Enterprise PDUs
71762MX IBM Ultra Density Enterprise PDU C19 PDU+ (WW)
71762NX IBM Ultra Density Enterprise PDU C19 PDU (WW)
71763MU IBM Ultra Density Enterprise PDU C19 3 phase 60A PDU+ (NA)
71763NU IBM Ultra Density Enterprise PDU C19 3 phase 60A PDU (NA)
39M2816 IBM DPI C13 Enterprise PDU without linecord
39Y8923 DPI 60A Three Phase C19 Enterprise PDU with IEC309 3P+G (208 V) fixed line cord
39Y8941 DPI Single Phase C13 Enterprise PDU without line cord
39Y8948 DPI Single Phase C19 Enterprise PDU without line cord
Front-End PDUs
39Y8934 DPI 32amp/250V Front-end PDU with IEC 309 2P+Gnd connector
39Y8935 DPI 63amp/250V Front-end PDU with IEC 309 2P+Gnd connector
39Y8938 30amp/125V Front-end PDU with NEMA L5-30P connector
39Y8939 30amp/250V Front-end PDU with NEMA L6-30P connector
39Y8940 60amp/250V Front-end PDU with IEC 309 60A 2P+N+Gnd connector
Table 15. Power distribution units (part 2)
Part number Description
Universal PDUs
39Y8951 DPI Universal Rack PDU w/ US LV and HV line cords
39Y8952 DPI Universal Rack PDU w/ CEE7-VII Europe LC
39Y8953 DPI Universal Rack PDU w/ Denmark LC
39Y8954 DPI Universal Rack PDU w/ Israel LC
39Y8955 DPI Universal Rack PDU w/Italy LC
39Y8956 DPI Universal Rack PDU w/South Africa LC
39Y8957 DPI Universal Rack PDU w/UK LC
39Y8958 DPI Universal Rack PDU with AS/NZ LC
39Y8959 DPI Universal Rack PDU w/China LC
39Y8962 DPI Universal Rack PDU (Argentina)
39Y8960 DPI Universal Rack PDU (Brazil)
39Y8961 DPI Universal Rack PDU (India)
0U Basic PDUs
46M4122 IBM 0U 24 C13 16A 3 Phase PDU
46M4125 IBM 0U 24 C13 30A 3 Phase PDU
46M4128 IBM 0U 24 C13 30A PDU
46M4131 IBM 0U 24 C13 32A PDU
46M4140 IBM 0U 12 C19/12 C13 60A 3 Phase PDU
46M4143 IBM 0U 12 C19/12 C13 32A 3 Phase PDU
Rack cabinets
The server supports the rack cabinets listed in the following table. The Tower-to-Rack Conversion Kit
(part number 69Y5182, 4U Tower to Rack Conversion Kit for the x3100 M4) is required to install the
server in a rack.
Table 16. Rack cabinets
Part number Description
69Y5182 Tower to 4U Rack Conversion Kit for IBM System x3100 M4
93072PX IBM 25U Static S2 Standard Rack
93072RX IBM 25U Standard Rack
93074RX IBM 42U Standard Rack
93074XX IBM 42U Standard Rack Extension
93084EX IBM 42U Enterprise Expansion Rack
93084PX IBM 42U Enterprise Rack
93604EX IBM 42U 1200 mm Deep Dynamic Expansion Rack
93604PX IBM 42U 1200 mm Deep Dynamic Rack
93614EX IBM 42U 1200 mm Deep Static Expansion Rack
93614PX IBM 42U 1200 mm Deep Static Rack
93624EX IBM 47U 1200 mm Deep Static Expansion Rack
93624PX IBM 47U 1200 mm Deep Static Rack
Rack options
The server supports the rack console switches and monitor kits listed in the following table.
Table 17. Rack options
Part number Description
Monitor kits and keyboard trays
172317X 17" 1U Flat Panel Console Kit
172319X 19" 1U Flat Panel Console Kit
40K9584 IBM Preferred Pro Keyboard USB - US English 103P
40K5372 IBM Rack Mountable Keyboard & Pointing Device - 3m Cable - Black - USB - US English 103P
Console switches
1754D2X IBM Global 4x2x32 Console Manager (GCM32)
1754D1X IBM Global 2x2x16 Console Manager (GCM16)
1754A2X IBM Local 2x16 Console Manager (LCM16)
1754A1X IBM Local 1x8 Console Manager (LCM8)
Console cables
43V6147 IBM Single Cable USB Conversion Option (UCO)
39M2895 IBM USB Conversion Option (UCO) - 4 Pack
39M2897 IBM Long KVM Conversion Option (KCO) - 4 Pack
46M5383 IBM Virtual Media Conversion Option Gen2 (VCO2)
For more information, see the following IBM Redbooks® publication at-a-glance guides:
• IBM 1754 LCM8 and LCM16 Local Console Managers
http://www.redbooks.ibm.com/abstracts/tips0788.html
• IBM GCM16 and GCM32 Global Console Managers
http://www.redbooks.ibm.com/abstracts/tips0772.html
Warranty options
The IBM System x3100 M4, machine type 2582, has a 1-year onsite warranty with 9x5/NBD terms. IBM
offers warranty service upgrades through IBM ServicePacs. An IBM ServicePac is a series of
prepackaged warranty maintenance upgrades and post-warranty maintenance agreements with a
well-defined scope of services, including service hours, response time, term of service, and service
agreement terms and conditions.
IBM ServicePac offerings are country-specific; each country might have its own service types, service
levels, response times, and terms and conditions, and not all types of ServicePacs might be available in a
particular country. For more information about the IBM ServicePac offerings available in your country, see
the IBM ServicePac Product Selector at:
https://www-304.ibm.com/sales/gss/download/spst/servicepac
In general, the types of IBM ServicePacs are:
• Warranty and maintenance service upgrades
  - One, 2, 3, 4, or 5 years of 9x5 or 24x7 service coverage
  - Onsite repair from next business day to 4 or 2 hours (selected areas)
  - One or two years of warranty extension
• Remote technical support services
  - One or three years with 24x7 coverage (severity 1) or 9x5/NBD for all severities
  - Installation and startup support for System x® servers
  - Remote technical support for System x servers
• Software support - Support Line
  - Microsoft or Linux software
  - VMware
  - IBM Director
The following table explains warranty service definitions in more detail.
Table 18. Warranty service definitions
Term Description
IBM onsite repair (IOR) A service technician will come to the server's location for equipment repair.
24x7x2 hour A service technician is scheduled to arrive at your customer’s location within two hours after remote
problem determination is completed. We provide service around the clock, every day, including IBM
holidays.
24x7x4 hour A service technician is scheduled to arrive at your customer’s location within four hours after remote
problem determination is completed. We provide service around the clock, every day, including IBM
holidays.
9x5x4 hour A service technician is scheduled to arrive at your customer’s location within four business hours
after remote problem determination is completed. We provide service from 8:00 a.m. to 5:00 p.m. in
the customer's local time zone, Monday through Friday, excluding IBM holidays. If after 1:00 p.m. it
is determined that onsite service is required, the customer can expect the service technician to arrive
the morning of the following business day. For noncritical service requests, a service technician will
arrive by the end of the following business day.
9x5 next business day A service technician is scheduled to arrive at your customer's location on the
business day after we receive your call, following remote problem determination. We provide service from
8:00 a.m. to 5:00 p.m. in the customer's local time zone, Monday through Friday, excluding IBM holidays.
Physical and electrical specifications
Dimensions and weight:
• Height: 360 mm (14.2 in)
• Width: 180 mm (7.1 in)
• Depth: 480 mm (18.9 in)
• Weight:
  - Minimum ship configuration: 10 kg (22.0 lb)
  - Maximum ship configuration: 13 kg (28.7 lb)
Supported environment:
• Temperature:
  - Server on:
    10.0° to 35.0° C (50° to 95° F); altitude: 0 to 914.4 m (3,000 ft)
    10.0° to 32.0° C (50° to 89.6° F); altitude: 914.4 m (3,000 ft) to 2,133.6 m (7,000 ft)
  - Server off:
    10.0° to 43.0° C (50° to 109.4° F); maximum altitude: 2,133.6 m (7,000 ft)
  - Shipping:
    -40° to 60° C (-40° to 140° F)
• Relative humidity: 8 to 80%
• Maximum altitude: 2,133.6 m (7,000 ft)
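The server-on temperature limits derate with altitude: the full 35.0°C ceiling applies only up to 914.4 m, dropping to 32.0°C up to the 2,133.6 m maximum. That rule can be expressed as a small lookup; a Python sketch with a function name of our own choosing:

```python
# Maximum supported ambient temperature (server on) by altitude,
# per the environment limits listed above.
def max_ambient_c(altitude_m):
    """Return the max server-on ambient temperature in degrees C, or
    None if the altitude exceeds the supported 2,133.6 m (7,000 ft)."""
    if altitude_m <= 914.4:       # up to 3,000 ft
        return 35.0
    if altitude_m <= 2133.6:      # 3,000 ft to 7,000 ft
        return 32.0
    return None                   # beyond the supported envelope

print(max_ambient_c(500))    # low-altitude site: full 35.0 C ceiling
print(max_ambient_c(1600))   # high-altitude site: derated to 32.0 C
```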
Electrical:
• 100 - 127 (nominal) V AC; 50 - 60 Hz; 7.0 A
• 200 - 240 (nominal) V AC; 50 - 60 Hz; 3.5 A
• Input kilovolt-amperes (kVA), approximately:
  - Minimum configuration: 0.095 kVA
  - Maximum configuration: 0.435 kVA
• Btu output:
  - Ship configuration: 324 Btu/hr (95 watts)
  - Full configuration: 1484 Btu/hr (435 watts)
• Noise level:
  - 4.5 bels (idle)
  - 4.8 bels (operating)
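The Btu figures above follow from the standard conversion 1 W ≈ 3.412 Btu/hr; a quick sanity check in Python:

```python
# Convert electrical power draw in watts to heat output in Btu/hr
# using the standard factor 1 W ~= 3.412 Btu/hr.
BTU_PER_WATT_HR = 3.412

def watts_to_btu_hr(watts):
    return watts * BTU_PER_WATT_HR

# The two configurations quoted above:
print(round(watts_to_btu_hr(95)))   # ship configuration -> 324
print(round(watts_to_btu_hr(435)))  # full configuration -> 1484
```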
Regulatory compliance
The server conforms to the following international standards:
• FCC - Verified to comply with Part 15 of the FCC Rules, Class A
• Canada ICES-003, issue 4, Class A
• UL/IEC 60950-1
• CSA C22.2 No. 60950-1-03
• NOM-019
• Argentina IEC 60950-1
• Japan VCCI, Class A
• Australia/New Zealand AS/NZS CISPR 22:2009, Class A
• IEC 60950-1:2001 (CB Certificate and CB Test Report)
• Taiwan BSMI CNS 13438, Class A; CNS 14336
• China CCC (GB 4943-2001), GB 9254-2008 Class A, GB 17625.1:2003
• Korea KN22, Class A; KN24
• Russia/GOST ME01, IEC 60950-1, GOST R 51318.22-99, GOST R 51318.24-99, GOST R
51317.3.2-2006, GOST R 51317.3.3-99
• IEC 60950-1 (CB Certificate and CB Test Report)
• CE Mark (EN55022 Class A, EN60950-1, EN55024, EN61000-3-2, EN61000-3-3)
• CISPR 22, Class A
• TUV-GS (EN60950-1/IEC60950-1, EK1-ITB2000)
Supported operating systems
The server supports the following operating systems:
• Microsoft Windows Server 2008 R2
• Microsoft Windows Server 2008, Enterprise x64 Edition
• Microsoft Windows Server 2008, Enterprise x86 Edition
• Microsoft Windows Server 2008, Standard x64 Edition
• Microsoft Windows Server 2008, Standard x86 Edition
• Microsoft Windows Server 2008, Web x64 Edition
• Microsoft Windows Server 2008, Web x86 Edition
• Red Hat Enterprise Linux 5 Server Edition
• Red Hat Enterprise Linux 5 Server Edition with Xen
• Red Hat Enterprise Linux 5 Server with Xen x64 Edition
• Red Hat Enterprise Linux 5 Server x64 Edition
• SUSE Linux Enterprise Server 10 with Xen for AMD64/EM64T
• SUSE Linux Enterprise Server 11 for AMD64/EM64T
• SUSE Linux Enterprise Server 11 for x86
• SUSE Linux Enterprise Server 11 with Xen for AMD64/EM64T
See the IBM ServerProven® website for the latest information about the specific versions and service
levels supported and any other prerequisites:
http://www.ibm.com/systems/info/x86servers/serverproven/compat/us/nos/matrix.shtml
Related publications and links
For more information see the following documents:
- IBM System x3100 M4 product page
  http://www.ibm.com/systems/x/hardware/tower/x3100m4/index.html
- IBM System x3100 M4 Installation and User's Guide
  http://ibm.com/support
- IBM System x3100 M4 Problem Determination and Service Guide
  http://ibm.com/support
- ServerProven® hardware compatibility page for the x3100 M4
  http://www.ibm.com/systems/info/x86servers/serverproven/compat/us/xseries/2582.html
- At-a-glance guides for IBM System x options
  http://www.redbooks.ibm.com/portals/systemx?Open&page=ataglance
- IBM System x DDR3 Memory Configurator
  http://www.ibm.com/systems/x/hardware/ddr3config/
- Configuration and Option Guide
  http://www.ibm.com/systems/xbc/cog/
- xRef - IBM System x Reference Sheets
  http://www.redbooks.ibm.com/xref
- IBM System x Support Portal
  http://ibm.com/support/entry/portal/
  http://ibm.com/support/entry/portal/Downloads/Hardware/Systems/System_x/System_x3100_M4
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local
IBM representative for information on the products and services currently available in your area. Any reference to an
IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may
be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property
right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM
product, program, or service. IBM may have patents or pending patent applications covering subject matter described
in this document. The furnishing of this document does not give you any license to these patents. You can send
license inquiries, in writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions
are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain
transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or
typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in
new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s)
described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner
serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this
IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you
supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM
products was obtained from the suppliers of those products, their published announcements or other publicly
available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility
or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be
addressed to the suppliers of those products. This information contains examples of data and reports used in daily
business operations. To illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses
used by an actual business enterprise is entirely coincidental.
Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained
in other operating environments may vary significantly. Some measurements may have been made on
development-level systems and there is no guarantee that these measurements will be the same on generally
available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results
may vary. Users of this document should verify the applicable data for their specific environment.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming techniques
on various operating platforms. You may copy, modify, and distribute these sample programs in any form without
payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to
the application programming interface for the operating platform for which the sample programs are written. These
examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability,
serviceability, or function of these programs.
© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by
GSA ADP Schedule Contract with IBM Corp.
This document was created or updated on December 14, 2011.
Send us your comments in one of the following ways:
- Use the online Contact us review form found at:
  ibm.com/redbooks
- Send your comments in an e-mail to:
  redbook@us.ibm.com
- Mail your comments to:
  IBM Corporation, International Technical Support Organization
  Dept. HYTD Mail Station P099
  2455 South Road
  Poughkeepsie, NY 12601-5400 U.S.A.
This document is available online at http://www.ibm.com/redbooks/abstracts/tips0811.html.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. These and other IBM trademarked
terms are US registered or common law trademarks owned by IBM at the time this information was
published. Such trademarks may also be registered or common law trademarks in other countries. A
current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United
States, other countries, or both:
DPI®
IBM Systems Director Active Energy Manager™
IBM®
Redbooks®
Redpaper™
Redbooks (logo)®
ServerProven®
ServicePac®
System x®
The following terms are trademarks of other companies:
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered
trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
ibm.com/redbooks
Front cover
Power Systems Enterprise Servers
with PowerVM Virtualization and RAS
Dino Quintero
JinHoon Baek
Guillermo Diez
Hassan Elsetohy
Debra Francis
Bing He
Rajesh Jeyapaul
Anil Kalavakolanu
Tejaswini Kaujalgi
David Kgabo
Ricardo Puig
Vani Ramagiri
Unleash the IBM Power Systems
virtualization features
Understand reliability, availability,
and serviceability
Learn about various
deployment case scenarios
International Technical Support Organization
Power Systems Enterprise Servers with PowerVM
Virtualization and RAS
December 2011
SG24-7965-00
© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
First Edition (December 2011)
This edition applies to AIX 7.1 SP 3, IBM SDD PCM for AIX V6.1 Version 2.5.2.0, HMC code level 7.3.5, and
IBM Systems Director Version 6.2.1.2.
Note: Before using this information and the product it supports, read the information in “Notices” on
page ix.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team who wrote this book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Chapter 1. Introducing POWER7 Enterprise Server RAS and virtualization features . 1
1.1 High availability in today’s business environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Introduction to RAS and virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Reliability, availability, and serviceability (RAS) . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Latest available feature enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Chapter 2. Exploring RAS and virtualization features in more detail . . . . . . . . . . . . . 11
2.1 New RAS and virtualization features with POWER7. . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.1 Active Memory Mirroring for the hypervisor on Power 795 . . . . . . . . . . . . . . . . . . 13
2.1.2 Hot GX adapter repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.3 Improved memory RAS features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.4 Active Memory Expansion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Significant features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.1 Active Memory Mirroring for the hypervisor on the Power 795 . . . . . . . . . . . . . . . 22
2.2.2 Persistent hardware deallocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.3 First Failure Data Capture (FFDC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.4 Processor RAS features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.5 Memory RAS features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.6 Dynamic service processor (SP) failover at run time and redundant SP . . . . . . . 25
2.2.7 Hot node add and repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.8 Hot node upgrade (memory) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 TurboCore and MaxCore technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.1 Enabling and disabling TurboCore mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Hypervisor and firmware technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.1 Hypervisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.2 Firmware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.3 Dynamic firmware update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.4 Firmware update and upgrade strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5 Power management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.5.1 Differences in dynamic power saver from POWER6 to POWER7 . . . . . . . . . . . . 37
2.6 Rapid deployment of PowerVM clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6.1 Deployment using the VMControl plug-in . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6.2 File-backed virtual optical devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6.3 Deployment using the System Planning Tool (SPT) . . . . . . . . . . . . . . . . . . . . . . . 40
2.7 I/O considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.7.1 Virtual SCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.7.2 N_Port ID Virtualization (NPIV) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.8 Active Memory Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.8.1 Shared memory pool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.8.2 Paging virtual I/O server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.8.3 Client LPAR requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.8.4 Active Memory Sharing and Active Memory Expansion . . . . . . . . . . . . . . . . . . . . 48
2.8.5 Active Memory Sharing with Live Partition Mobility (LPM) . . . . . . . . . . . . . . . . . . 48
2.9 Integrated Virtual Ethernet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.10 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.10.1 Creating a simple LPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.10.2 Dynamically changing the LPAR configurations (DLAR) . . . . . . . . . . . . . . . . . . 59
Chapter 3. Enhancing virtualization and RAS for higher availability . . . . . . . . . . . . . . 65
3.1 Live Partition Mobility (LPM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1.1 Partition migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.1.2 Migration preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.1.3 Inactive migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.1.4 Active migration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.2 WPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.2.1 Types of WPARs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2.2 Creating a WPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3.2.3 Live Application Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.3 Partition hibernation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.4 IBM SystemMirror PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4.1 Comparing PowerHA with other high-availability solutions . . . . . . . . . . . . . . . . . . 82
3.4.2 PowerHA 7.1, AIX, and PowerVM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.5 IBM Power Flex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.5.1 Power Flex Overview: RPQ 8A1830 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.5.2 Power Flex usage options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.6 Cluster Aware AIX (CAA) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.6.1 Cluster Aware AIX Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.6.2 Cluster Aware AIX event infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.7 Electronic services and electronic service agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.7.1 Benefits of ESA for your IT organization and your Power systems. . . . . . . . . . . . 95
3.7.2 Secure connection methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Chapter 4. Planning for virtualization and RAS in POWER7 high-end servers. . . . . 101
4.1 Physical environment planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.1.1 Site planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.1.2 Power and power distribution units (PDUs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.1.3 Networks and storage area networks (SAN). . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.2 Hardware planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.2.1 Adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.2.2 Additional Power 795-specific considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.2.3 Planning for additional Power server features . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.2.4 System management planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.2.5 HMC planning and multiple networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.2.6 Planning for Power virtualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.2.7 Planning for Live Partition Mobility (LPM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.3 CEC Hot Add Repair Maintenance (CHARM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.1 Hot add or upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.2 Hot repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.3.3 Planning guidelines and prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.4 Software planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.5 HMC server and partition support limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.6 Migrating from POWER6 to POWER7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.6.1 Migrating hardware from POWER6 and POWER6+ to POWER7 . . . . . . . . . . . 134
4.6.2 Migrating the operating system from previous Power servers to POWER7 . . . . 135
4.6.3 Disk-based migrations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.6.4 SAN-based migration with physical adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.6.5 After migration to POWER7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
4.7 Technical and Delivery Assessment (TDA). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.8 System Planning Tool (SPT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.9 General planning guidelines for highly available systems. . . . . . . . . . . . . . . . . . . . . . 156
Chapter 5. POWER7 system management consoles . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.1 SDMC features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.1.1 Installing the SDMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.1.2 SDMC transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.1.3 SDMC key functionalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.1.4 HMC versus SDMC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.1.5 Statement of direction for support HMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.2 Virtualization management: Systems Director VMControl . . . . . . . . . . . . . . . . . . . . . 167
5.2.1 VMControl terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.2.2 VMControl planning and installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.2.3 Managing a virtual server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.2.4 Relocating a virtual server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.2.5 Managing virtual appliances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
5.2.6 Creating a workload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.2.7 Managing server system pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
5.3 IBM Systems Director Active Energy Management (AEM) . . . . . . . . . . . . . . . . . . . . . 185
5.3.1 Active Energy Manager (AEM) overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
5.3.2 AEM planning, installation, and uninstallation. . . . . . . . . . . . . . . . . . . . . . . . . . . 186
5.3.3 AEM and the managed systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
5.3.4 Managing and monitoring the consumed power using AEM. . . . . . . . . . . . . . . . 189
5.4 High availability Systems Director management consoles . . . . . . . . . . . . . . . . . . . . . 195
Chapter 6. Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
6.1 Hot node add and repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.1.1 Hot node add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
6.1.2 Hot node repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.2 Hot GX adapter add and repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
6.2.1 Hot GX adapter add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
6.2.2 Hot GX adapter repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
6.3 Live Partition Mobility (LPM) using the HMC and SDMC . . . . . . . . . . . . . . . . . . . . . . 212
6.3.1 Inactive migration from POWER6 to POWER7 using HMC and SDMC . . . . . . . 212
6.4 Active migration example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.5 Building a configuration from the beginning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
6.5.1 Virtual I/O servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
6.5.2 HEA port configuration for dedicated SEA use . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.5.3 NIB and SEA failover configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.5.4 Active Memory Sharing configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
6.5.5 NPIV planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
6.5.6 Client LPAR creation (virtual servers) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.5.7 Server-side NPIV configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
6.6 LPM and PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.6.1 The LPM operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
6.6.2 The PowerHA operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Chapter 7. POWER7 Enterprise Server performance considerations . . . . . . . . . . . . 249
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7.2 Performance design for POWER7 Enterprise Servers . . . . . . . . . . . . . . . . . . . . . . . . 250
7.2.1 Balanced architecture of POWER7. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7.2.2 Processor eDRAM technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.2.3 Processor compatibility mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.2.4 MaxCore and TurboCore modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
7.2.5 Active Memory Expansion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
7.2.6 Power management’s effect on system performance . . . . . . . . . . . . . . . . . . . . . 254
7.3 POWER7 Servers performance considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
7.3.1 Processor compatibility mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
7.3.2 TurboCore and MaxCore modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
7.3.3 Active Memory Expansion (AME) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
7.3.4 Logical memory block size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
7.3.5 System huge-page memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
7.4 Performance considerations with hardware RAS features . . . . . . . . . . . . . . . . . . . . . 273
7.4.1 Active Memory Mirroring for the hypervisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
7.5 Performance considerations with Power virtualization features . . . . . . . . . . . . . . . . . 274
7.5.1 Dynamic logical partitioning (DLPAR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
7.5.2 Micro-partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
7.5.3 PowerVM Lx86 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
7.5.4 Virtual I/O server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
7.5.5 Active Memory Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282
7.5.6 Live Partition Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
7.6 Performance considerations with AIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
7.6.1 Olson and POSIX time zones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
7.6.2 Large page size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
7.6.3 One TB segment aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
7.6.4 Memory affinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
7.6.5 Hardware memory prefetch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
7.6.6 Simultaneous multithreading (SMT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
7.6.7 New features of XL C/C++ V11.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
7.6.8 How to deal with unbalanced core and memory placement . . . . . . . . . . . . . . . . 300
7.6.9 AIX performance tuning web resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
7.7 IBM i performance considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
7.7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
7.7.2 Optimizing POWER7 performance through tuning system resources . . . . . . . . 305
7.8 Enhanced performance tools of AIX for POWER7 . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
7.8.1 Monitoring POWER7 processor utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
7.8.2 Monitoring power saving modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
7.8.3 Monitoring CPU frequency using lparstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
7.8.4 Monitoring hypervisor statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
7.8.5 Capabilities for 1024 CPU support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
7.8.6 Monitoring block IO statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
7.8.7 Monitoring Active Memory Expansion (AME) statistics. . . . . . . . . . . . . . . . . . . . 319
7.8.8 Monitoring memory affinity statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
7.8.9 Monitoring the available CPU units in a processor pool . . . . . . . . . . . . . . . . . . . 328
7.8.10 Monitoring remote node statistics in a clustered AIX environment . . . . . . . . . . 330
7.9 Performance Management for Power Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
7.9.1 Levels of support available within PM for Power Systems . . . . . . . . . . . . . . . . . 331
7.9.2 Benefits of PM for Power Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
7.9.3 Data collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
7.9.4 Accessing the PM for Power Systems website . . . . . . . . . . . . . . . . . . . . . . . . 334
Chapter 8. PowerCare Services offerings for Power Enterprise Servers. . . . . . . . . . 337
8.1 PowerCare highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
8.2 PowerCare Services offerings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
8.2.1 Availability optimization services. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
8.2.2 Systems Director and VMControl enablement . . . . . . . . . . . . . . . . . . . . . . . . . . 341
8.2.3 Systems Director Active Energy Manager enablement. . . . . . . . . . . . . . . . . . . . 343
8.2.4 IBM Systems Director Management Console . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
8.2.5 Security assessment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
8.2.6 Performance optimization assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348
8.2.7 Power Flex enablement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
8.2.8 Power 795 upgrade implementation services . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
8.2.9 PowerCare technical training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
Appendix A. Administration concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Making a root volume group (rootvg) easier to manage . . . . . . . . . . . . . . . . . . . . . . . . . . 356
Example importing non-root volume group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
A dynamic LPAR operation using the HMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
Setting up Secure Shell keys between two management consoles . . . . . . . . . . . . . . . . . . 362
Simple cluster installation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362
Installing and configuring PowerHA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Appendix B. Performance concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
Performance concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Throughput versus response time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
Performance and computing resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Central processing unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Multiple core systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Memory architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
Server I/O storage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
Performance metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Performance benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
Appendix C. ITSO Power Systems testing environment . . . . . . . . . . . . . . . . . . . . . . . 395
Austin environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
Poughkeepsie benchmark center environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
ITSO Poughkeepsie environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
IBM Redbooks publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
© Copyright IBM Corp. 2011. All rights reserved.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
Active Memory™
AIX 5L™
AIX®
BladeCenter®
DB2®
DS8000®
Electronic Service Agent™
EnergyScale™
eServer™
GPFS™
HACMP™
IBM Systems Director Active Energy Manager™
IBM®
Informix®
iSeries®
Orchestrate®
Power Architecture®
Power Systems™
POWER4™
POWER5™
POWER6+™
POWER6®
POWER7™
POWER7 Systems™
PowerHA™
PowerVM™
POWER®
pSeries®
Redbooks®
Redpaper™
Redbooks (logo) ®
System i®
System p®
System Storage®
System x®
System z®
Systems Director VMControl™
WebSphere®
XIV®
The following terms are trademarks of other companies:
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel
SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM® Redbooks® publication illustrates implementation, testing, and helpful scenarios
with IBM Power® Systems 780 and 795 using the comprehensive set of the Power
virtualization features. We focus on the Power Systems functional improvements, in particular,
highlighting the reliability, availability, and serviceability (RAS) features of the enterprise
servers.
This document highlights IBM Power Systems Enterprise Server features, such as system
scalability, virtualization features, and logical partitioning among others. This book provides a
documented deployment model for Power 780 and Power 795 within a virtualized
environment, which allows clients to plan a foundation for exploiting and using the latest
features of the IBM Power Systems Enterprise Servers.
The target audience for this book includes technical professionals (IT consultants, technical
support staff, IT Architects, and IT Specialists) responsible for providing IBM Power Systems
solutions and support.
The team who wrote this book
This book was produced by a team of specialists from around the world working at the
International Technical Support Organization, Poughkeepsie Center.
Dino Quintero is a Project Leader and IT generalist with the ITSO in Poughkeepsie, NY. His
areas of knowledge include enterprise continuous availability planning and implementation,
enterprise systems management, virtualization, and clustering solutions. He is currently an
Open Group Master Certified IT Specialist. Dino holds a Master of Computing Information
Systems degree and a Bachelor of Science degree in Computer Science from Marist College.
JinHoon Baek is a Certified Product Support Professional and Senior System Service
Representative (SSR) in IBM Korea, working in Maintenance and Technical Support. He is
also an IBM Certified Advanced Technical Expert in IBM System p® and AIX® 5L™ with
seven years of experience in AIX and IBM Power Systems. His areas of expertise include
high-end storage systems, including SAN, Power Systems, and PowerVM™, as well as AIX,
GPFS™ and PowerHA™.
Guillermo Diez is a Certified IT Specialist and IBM Certified Systems Expert in Virtualization
Technical Support for AIX and Linux working at the Service Delivery Center in IBM Uruguay.
He joined IBM in 2003 and works as the Team Leader for the UNIX and Storage
administration teams since 2007. His areas of expertise include AIX, Linux, PowerVM,
performance tuning, TCP/IP, and midrange storage systems. Guillermo also holds a
Computer Engineer degree from the Universidad Catolica del Uruguay (UCUDAL).
Hassan Elsetohy is a dual-certified professional: both a Certified IT Architect and a Certified
IT Specialist. He performs the lead architect role for full life-cycle engagements and, in large
engagements, also undertakes the Method Exponent role. He also sometimes performs the
SME role in his in-depth areas of expertise, such as Storage and AIX/System p. Hassan
joined IBM in 1994 directly from university, after attaining his Bachelor of Engineering in
Electrical Engineering. He also attained his Masters degree in VLSI Design (course work)
in 1996.
Debra Francis is a Senior Managing Consultant with IBM STG Lab Services and Training out
of Rochester, MN, with over 25 years of experience with IBM midrange and Power Systems.
She is part of the Worldwide PowerCare team that works with Power Enterprise Server clients
around the globe. This team tackles the clients’ IT availability demands to meet the business
expectations of today and to provide input and availability consulting as part of a solid IT
resiliency strategy.
Bing He is a Senior I/T Specialist of the IBM Advanced Technical Skills (ATS) team in China.
He has 11 years of experience with IBM Power Systems. He has worked at IBM for over four
years. His areas of expertise include PowerHA, PowerVM, and performance tuning on AIX.
Rajesh Jeyapaul is the technical lead for IBM Systems Director POWER Server
management. His focus is on the PowerHA SystemMirror plug-in and PowerRF interface for
Virtualization Management Control (VMC) plug-in on Systems Director. He is part of the
Technical Advocate team that works closely with clients to tackle their POWER
Server-related issues. Rajesh holds a Master in Software Systems degree from the University
of BITS, India, and a Master of Business Administration (MBA) degree from the University of
MKU, India.
Anil Kalavakolanu is a Senior Engineer and also Technical Lead in the AIX Development
Support Organization. He has 18 years of experience supporting AIX and POWER. He holds
a Masters degree in Electrical Engineering from the University of Alabama in Tuscaloosa, AL.
His areas of expertise include AIX, PowerVM, and SAN.
Tejaswini Kaujalgi is currently working as a Systems Software Engineer in the IBM AIX
UNIX Product Testing team, Bangalore, India. Her expertise lies in the areas of AIX,
PowerHA, Security, and Virtualization. She has also worked on various client configurations
using LDAP, Kerberos, RBAC, PowerHA, and AIX. She is an IBM Certified System p
Administrator. She has published articles in developerWorks Forum, as well.
David Kgabo is a Specialist Systems Programmer working for ABSA in South Africa. He has
15 years of experience in IT, nine of which he worked on Enterprise POWER systems. His
areas of expertise include AIX, Virtualization, disaster recovery, Clustering PowerHA, and
GPFS.
Ricardo Puig has been working as an AIX Support Engineer since 1998 and is a leading
expert in installation and disaster recovery procedures for AIX.
Vani Ramagiri is a Virtual I/O Server Specialist in the Development Support Organization in
Austin, Texas. She has 12 years of experience supporting AIX and has worked as a Lead in
PowerVM since its inception in 2004. She holds a Masters degree in Computer Science from
Texas State University.
Thanks to the following people for their contributions to this project:
David Bennin, Richard Conway, Don Brennan
International Technical Support Organization, Poughkeepsie Center
Bob Maher, Christopher Tulloch, Duane Witherspoon
IBM Poughkeepsie
Basu Vaidyanathan, Jayesh Patel, Daniel Henderson, Liang Jiang, Vasu Vallabhaneni,
Morgan Jeff Rosas
IBM Austin
Cesar Maciel
IBM Atlanta
Gottfried Schimunek
IBM Rochester
Ralf Schmidt-Dannert
IBM US
Priya Kannan, Kiran Grover, Saravanan Devendra, Venkatakrishnan Ganesan, Shubha Joshi,
Jaipaul K Antony
IBM India
Martin Abeleira
IBM Uruguay
Now you can become a published author, too!
Here’s an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and client satisfaction, as you expand your network
of technical contacts and relationships. Residencies run from two to six weeks in length, and
you can participate either in person or as a remote resident working from your home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Stay connected to IBM Redbooks
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on Twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
Chapter 1. Introducing POWER7 Enterprise
Server RAS and virtualization
features
In this chapter, we introduce reliability, availability, and serviceability (RAS), and virtualization
concepts for the IBM Enterprise Power Systems servers.
The following topics are discussed in this chapter:
High availability in today’s business environments
Introduction to RAS and virtualization
1.1 High availability in today’s business environments
Availability is a well-established concept in today's IT environments. Tremendous growth in
system capabilities, along with business demands for around-the-clock operations, has
increased the pressure to provide the highest levels of system availability.
IBM Power Systems™ servers are especially designed to help achieve high availability;
however, planned downtime is still required for periodic maintenance (of both hardware and
software) and cannot be completely eliminated. Even though both types of
outages affect the overall availability of a server, we need to understand the distinctions
between planned and unplanned downtime in today’s business environments:
Planned downtime is scheduled and typically is a result of a maintenance action to the
hardware, operating system, or an application. Scheduled downtime is used to ensure that
the server can operate optimally and reliably in the future. Because this type of event can
be planned for in advance, it can be scheduled at a time that least affects system or
application availability.
Unplanned downtime occurs as a result of a physical event or failure, or of human error,
and cannot be planned in advance.
Understanding the causes of downtime and how the IBM Power Systems Enterprise Servers
can help you address both of them is a key aspect for improving IT operations in every
business.
“Despite the potential consequences of unplanned downtime, less than 10% of all downtime
can be attributed to unplanned events, and only a fraction of that is due to a site disaster. The
other 90+%—the kind that companies face on a regular basis—are those caused by system
maintenance tasks.”
Source: Vision Solutions, Inc. white paper, An Introduction to System i® High Availability, 2010
The following typical system maintenance tasks are included in planned downtime:
Data backups (nightly, weekly, and monthly)
Reorganization of files to reclaim disk space and improve performance
Vendor software upgrades and data conversions
IBM software release upgrades and patches (program temporary fixes (PTFs))
New application software installations
Hardware upgrades
System migrations
The number of unplanned outages continues to shrink quickly as hardware and software
technology becomes more resilient. Although unplanned outages must still be eliminated,
planned downtime has now become a primary focus in the drive to achieve 24x365 availability.
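To put these availability targets in numbers, the following sketch converts an availability percentage into the annual downtime it implies. The availability classes shown are generic industry shorthand, not figures taken from this book.

```python
# Annual downtime implied by an availability target.
# Illustrative arithmetic only; the percentages below are generic
# industry shorthand ("two nines" through "five nines").

HOURS_PER_YEAR = 24 * 365  # 8760 hours

def downtime_hours(availability_pct):
    """Hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100.0)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct:7.3f}% available -> {downtime_hours(pct):7.2f} h/year down")
```

At "four nines" (99.99%), less than an hour of total downtime per year remains for both planned and unplanned events combined, which illustrates why shrinking planned maintenance windows has become the primary focus.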
1.2 Introduction to RAS and virtualization
Servers must be designed to help avoid every possible outage, focusing on application
availability. For almost two decades now, the IBM Power Systems development teams have
worked to integrate industry-leading IBM System z® mainframe reliability features and
capabilities into the IBM Power Systems servers line. These RAS capabilities together with
the IBM Power Systems virtualization features help implement fully virtualized and highly
available environments.
In the following section, we present a brief introduction to RAS and virtualization concepts for
IBM Power Systems servers.
1.2.1 Reliability, availability, and serviceability (RAS)
Hardware RAS is defined this way:¹
Reliability: How infrequently a defect or fault is seen in a server.
Availability: How infrequently the functionality of a system or application is impacted by a
fault or defect.
Serviceability: How well faults and their impacts are communicated to users and services,
and how efficiently and non-disruptively they are repaired.
Defined this way, reliability in hardware is all about how often a hardware fault requires a
system to be serviced (the less frequent the failures, the greater the reliability). Availability is
how infrequently such a failure impacts the operation of the system or application. For high
levels of availability, correct system operation must not be adversely affected by hardware
faults. A highly available system design ensures that most hardware failures do not result in
application outages. Serviceability relates to identifying what fails and ensuring an efficient
repair (of that component, firmware, or software).
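The interplay of these three terms can be illustrated with the classic steady-state availability formula, where reliability determines the mean time between failures (MTBF) and serviceability determines the mean time to repair (MTTR). The figures below are invented for illustration; they are not measured Power Systems data.

```python
# Steady-state availability from the two quantities the RAS definitions
# describe: reliability (MTBF, how infrequently faults occur) and
# serviceability (MTTR, how quickly faults are repaired).

def availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is available: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Same reliability, better serviceability -> higher availability.
print(availability(10_000, 8.0))   # slower repair
print(availability(10_000, 0.5))   # faster repair
```

The comparison shows why serviceability features matter even when a fault cannot be avoided: cutting the repair time raises availability without any change in failure rate.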
IBM POWER7 is designed for RAS: among other technologies, it includes the ability to detect
and isolate component faults the first time that they appear, without the need to re-create the
situation or perform further tests. This capability helps minimize the risk of the same error
repeating itself and causing similar or even larger problems.
Table 1-1 summarizes the RAS features that are available in the IBM POWER6® and IBM
POWER7 Enterprise Servers.
Table 1-1 System support for selected RAS features (✓ = capable, X = incapable)

RAS feature                                               Power 595  Power 780  Power 795
Processor
  Processor fabric bus protection                             ✓          ✓          ✓
  Dynamic Processor Deallocation                              ✓          ✓          ✓
  Dynamic Processor Sparing:
    Using CoD cores                                           ✓          ✓          ✓
    Using capacity from spare pool                            ✓          ✓          ✓
  Core Error Recovery:
    Processor Instruction Retry                               ✓          ✓          ✓
    Alternate Processor Recovery                              ✓          ✓          ✓
    Partition core contained checkstop                        ✓          ✓          ✓
  Persistent processor deallocation                           ✓          ✓          ✓
  Midplane connection for inter-nodal communication           ✓          X          ✓
I/O subsystem
  GX+ bus persistent deallocation                             ✓          ✓          ✓
  Optional ECC I/O hub with freeze behavior                   ✓          ✓          ✓
  PCI bus enhanced error detection                            ✓          ✓          ✓
  PCI bus enhanced error recovery                             ✓          ✓          ✓
  PCI-PCI bridge enhanced error handling                      ✓          ✓          ✓
  Redundant 12x Channel link                                  ✓          ✓          ✓
Clocks and service processor
  Dynamic SP failover at run time / Redundant SP              ✓          ✓          ✓
  Clock failover at run time / Redundant Clock                ✓          ✓          ✓
Memory availability
  ECC in L2 and L3 cache                                      ✓          ✓          ✓
  Error detection/correction:
    Chipkill memory plus additional 1/2 symbol correct        ✓          ✓          ✓
  Memory DRAM sparing                                         X (a)      ✓          ✓
  Memory sparing with CoD at IPL time                         ✓          ✓          ✓
  CRC plus retry on memory data bus (CPU to buffer)           X (b)      ✓          ✓
  Data bus (memory buffer to DRAM) ECC plus retry             ✓          ✓          ✓
  DRAM sparing on x8+1 memory                                 X          ✓          ✓
  Dynamic memory channel repair                               ✓          ✓          ✓
  Processor memory controller memory scrubbing                ✓          ✓          ✓
  Memory page deallocation                                    ✓          ✓          ✓
  L1 parity check plus retry/set delete                       ✓          ✓          ✓
  L2 cache line delete                                        ✓          ✓          ✓
  L3 cache line delete                                        ✓          ✓          ✓
  Special Uncorrectable Error handling                        ✓          ✓          ✓
  Active Memory™ Mirroring for hypervisor                     X          ✓          ✓
Fault detection and isolation
  FFDC for fault detection and isolation                      ✓          ✓          ✓
  Storage Protection Keys                                     ✓          ✓          ✓
  Error log analysis                                          ✓          ✓          ✓
Serviceability
  Boot-time progress indicators                               ✓          ✓          ✓
  Firmware error codes                                        ✓          ✓          ✓
  Operating system error codes                                ✓          ✓          ✓
  Inventory collection                                        ✓          ✓          ✓
  Environmental and power warnings                            ✓          ✓          ✓
  PCI card hot-swap                                           ✓          ✓          ✓
  Hot-swap DASD/media                                         ✓          ✓          ✓
  Dual disk controllers/split backplane                       ✓          ✓          ✓
  Extended error data collection                              ✓          ✓          ✓
  I/O drawer redundant connections                            ✓          ✓          ✓
  I/O drawer hot add and concurrent repair                    ✓          ✓          ✓
  Hot GX adapter add and repair                               X          ✓          ✓
  Concurrent add of powered I/O rack                          ✓          ✓          ✓
  SP mutual surveillance with the Power hypervisor            ✓          ✓          ✓
  Dynamic firmware update with HMC                            ✓          ✓          ✓
  Service Agent Call Home application                         ✓          ✓          ✓
  Service indicators – guiding light or light path LEDs       ✓          ✓          ✓
  Service processor support for BIST for logic/arrays,
    wire tests, and component initialization                  ✓          ✓          ✓
  System dump for memory, Power hypervisor, SP                ✓          ✓          ✓
  Operating system error reporting to HMC SFP application     ✓          ✓          ✓
  RMC secure error transmission subsystem                     ✓          ✓          ✓
  Health check scheduled operations with HMC                  ✓          ✓          ✓
  Operator panel (real or virtual)                            ✓          ✓          ✓

¹ D. Henderson, J. Mitchell, G. Ahrens. "POWER7™ System RAS: Key Aspects of Power Systems
Reliability, Availability, and Serviceability," POW03056.doc, November 2010
1.2.2 Virtualization
First introduced in the 1960s, computer virtualization was created to logically divide
mainframe systems to improve resource utilization. After many years of continuous evolution,
IT organizations all over the world use or implement various levels of virtualization.
Built on the Power Systems RAS hardware platform, IBM virtualization features allow for
great flexibility, hardware optimization, simple management, and secure, low-cost,
hardware-assisted virtualization solutions.
The following section summarizes the available virtualization technologies for the IBM Power
Systems Enterprise Servers.
IBM PowerVM
IBM PowerVM is a combination of hardware and software that enables the virtualization
platform for AIX, Linux, and IBM i environments on IBM Power Systems. By implementing
PowerVM, you can perform these functions:
Easily and quickly deploy new partitions.
Execute isolated workloads for production, development, and test systems.
Reduce costs by consolidating AIX, IBM i, and Linux workloads into one high-end IBM
Power System.
Optimize resource utilization by effectively allocating resources to those workloads that
need them.
Optimize the utilization of I/O adapters.
Reduce the complexity and management of the environment.
Increase your overall availability by making workloads independent of the physical
hardware and by adding the capabilities to move those workloads to another server
without disruption, thus eliminating planned downtime.
Table 1-1 System support for selected RAS features (continued)

RAS feature                                               Power 595  Power 780  Power 795
  Redundant HMCs                                              ✓          ✓          ✓
  Automated server recovery/restart                           ✓          ✓          ✓
  Hot-node add/cold node repair                               ✓          ✓          ✓
  Hot-node repair/hot memory upgrade                          ✓          ✓          ✓
  Hot-node repair/hot memory add for all nodes                ✓          ✓          ✓
  PowerVM Live Partition/Live Application Mobility            ✓          ✓          ✓
Power and cooling
  Redundant, hot swap fans and blowers for CEC                ✓          ✓          ✓
  Redundant, hot swap power supplies for CEC                  ✓          ✓          ✓
  Redundant voltage regulator outputs                         ✓          ✓          ✓
  TPMD/MDC for system power and thermal management            ✓          ✓          ✓
  CEC power/thermal sensors (CPU and memory)                  ✓          ✓          ✓
  Redundant power for I/O drawers                             ✓          ✓          ✓

a. The Power 595 does not have the Memory DRAM sparing feature, but it has redundant bit
steering.
b. In the Power 595, there is ECC on the memory bus with spare lanes.
There are three editions of PowerVM that are suitable for these purposes:
PowerVM Express Edition
PowerVM Express Edition is designed for clients looking for an introduction to
virtualization features at an affordable price. The Express Edition is not available for the
IBM Power Systems Enterprise Servers.
PowerVM Standard Edition
PowerVM Standard Edition provides advanced virtualization functionality for AIX, IBM i,
and Linux operating systems. PowerVM Standard Edition is supported on all POWER
processor-based servers and includes features designed to help businesses increase
system utilization.
PowerVM Enterprise Edition
PowerVM Enterprise Edition includes all the features of PowerVM Standard Edition, plus
two new industry-leading capabilities that are called Active Memory Sharing and Live
Partition Mobility. This option provides complete virtualization for AIX, IBM i, and Linux
operating systems. Active Memory Sharing intelligently flows system memory from one
partition to another as workload demands change. Live Partition Mobility allows for the
movement of a running partition from one server to another with no application downtime,
resulting in better system utilization, improved application availability, and energy savings.
With Live Partition Mobility, planned application downtime due to regular server
maintenance can be a thing of the past.
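As a sketch of how Live Partition Mobility is typically driven from the HMC command line, the snippet below assembles the two usual migrlpar invocations: a validation pass followed by the migration itself. The managed system and partition names are hypothetical, and the exact options available depend on the HMC level.

```python
# Sketch: the two HMC CLI steps commonly used for Live Partition
# Mobility -- validate first, then migrate. The system and partition
# names below are invented examples, not names from this book.

def migrlpar_cmd(op, src, dest, lpar):
    """Build an HMC migrlpar command line: op 'v' = validate, 'm' = migrate."""
    return f"migrlpar -o {op} -m {src} -t {dest} -p {lpar}"

src, dest, lpar = "Power780-A", "Power780-B", "prod_lpar1"  # hypothetical names
print(migrlpar_cmd("v", src, dest, lpar))  # validation pass, makes no changes
print(migrlpar_cmd("m", src, dest, lpar))  # the actual live migration
```

Running the validation step (`-o v`) first lets the HMC report configuration problems, such as missing Virtual I/O Server mappings, before any running workload is moved.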
Table 1-2 lists the feature codes of the PowerVM editions that are available on the Power 780
and 795 Enterprise Servers.
Table 1-2 Availability of PowerVM editions on the Power 780 and 795 Enterprise Servers

PowerVM editions     Express    Standard    Enterprise
IBM Power 780        N/A        7942        7995
IBM Power 795        N/A        7943        8002

Table 1-3 outlines the functional elements of the available PowerVM editions for both the
Power 780 and 795.

Table 1-3 PowerVM capabilities and features on Power 780 and 795
PowerVM editions                  Standard                    Enterprise
PowerVM Hypervisor                Yes                         Yes
Dynamic Logical Partitioning      Yes                         Yes
Maximum partitions                1000 per server             1000 per server
Management                        VMControl, HMC, and SDMC    VMControl, HMC, and SDMC
Virtual I/O Server                Yes (maximum of 10)         Yes (maximum of 10)
PowerVM Lx86                      Yes                         Yes
Suspend/Resume                    Yes                         Yes
N_Port ID Virtualization          Yes                         Yes
Multiple Shared Processor Pools   Yes                         Yes
Shared Storage Pools              Yes                         Yes
Thin Provisioning                 Yes                         Yes
Active Memory Sharing             No                          Yes
Live Partition Mobility           No                          Yes

The PowerVM Standard Edition can be upgraded to the PowerVM Enterprise Edition by
entering a key code in the HMC. This upgrade operation is non-disruptive. If you have an
existing Power 595 (machine type 9119-FHA) with PowerVM Standard Edition (feature code
7943) or PowerVM Enterprise Edition (feature code 8002), you can also migrate the PowerVM
licenses when you migrate from a Power 595 to a Power 795.

Operating system versions supported
The following operating system versions are supported:
AIX 5.3, AIX 6.1, and AIX 7.1
IBM i 6.1 and IBM i 7.1
Red Hat Enterprise Linux 5 and Red Hat Enterprise Linux 6
SUSE Linux Enterprise Server 10 and SUSE Linux Enterprise Server 11

Table 1-4 summarizes the PowerVM features that are supported by the operating systems
that are compatible with the POWER7 processor-based servers.

Table 1-4 PowerVM features supported by AIX, IBM i, and Linux on Power 780 and 795
Hardware management Console: The IBM Power 780 and 795 must be managed with
the Hardware Management Console (HMC) or the Systems Director Management Console
(SDMC). The Integrated Virtualization Manager (IVM) is not supported.
  Feature                              AIX 5.3  AIX 6.1  AIX 7.1  IBM i 6.1.1  IBM i 7.1  RHEL 5.5  SLES10 SP3  SLES11 SP1
  Simultaneous Multi-Threading (SMT)   Yes(a)   Yes(b)   Yes      Yes(c)       Yes        Yes(a)    Yes(a)      Yes
  Dynamic LPAR I/O adapter add/remove  Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Dynamic LPAR processor add/remove    Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Dynamic LPAR memory add              Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Dynamic LPAR memory remove           Yes      Yes      Yes      Yes          Yes        No        No          Yes
  Capacity on Demand                   Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Micro-partitioning                   Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Shared Dedicated Capacity            Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Multiple Shared Processor Pools      Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Virtual I/O server                   Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Virtual SCSI                         Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  Virtual Ethernet                     Yes      Yes      Yes      Yes          Yes        Yes       Yes         Yes
  N_Port ID Virtualization (NPIV)      Yes      Yes      Yes      Yes          Yes        Yes       No          Yes
  Live Partition Mobility              Yes      Yes      Yes      No           No         Yes       Yes         Yes
  Workload Partitions                  No       Yes      Yes      No           No         No        No          No
  Active Memory Sharing                Yes      Yes      Yes      Yes          Yes        No        No          Yes
  Active Memory Expansion              No       Yes(d)   Yes      No           No         No        No          No

  a. Only supports two threads.
  b. AIX 6.1 up to TL4 SP2 only supports two threads, and supports four threads as of TL4 SP3.
  c. IBM i 6.1.1 and up support SMT4.
  d. On AIX 6.1 with TL4 SP2 and later.

You can obtain additional information about the PowerVM Editions at the IBM PowerVM
Editions website:
http://www.ibm.com/systems/power/software/virtualization/editions/index.html
You can obtain detailed information about the use of PowerVM technology in the following
IBM Redbooks publications:
IBM PowerVM Virtualization Introduction and Configuration, SG24-7940-04
IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
Refer to Chapter 2, “Exploring RAS and virtualization features in more detail” on page 11 for
detailed PowerVM information.
Other virtualization features
In addition to the PowerVM features, the IBM POWER7 Systems™ Enterprise Servers
introduce Active Memory Expansion (AME) and the Integrated Virtual Ethernet (IVE)
adapter, which were previously available only on the low-end and midrange Power servers.
We cover both AME and IVE in detail in 2.1, “New RAS and virtualization features with
POWER7” on page 13.
1.2.3 Latest available feature enhancements
The latest available virtualization feature contains the following enhancements:
LPAR maximums increased up to 1000 partitions per server, as shown in Table 1-5.
Table 1-5 High-end Power Systems features

  POWER7 model   Maximum cores   Original maximum LPARs   May 2011 maximum LPARs
  780            64              256                      640
  795            256             256                      1000
Trial PowerVM Live Partition Mobility
This feature enables a client to evaluate Live Partition Mobility at no-charge for 60 days. At
the conclusion of the trial period, clients can place an upgrade order for a permanent
PowerVM Enterprise Edition to maintain continuity. At the end of the trial period (60 days),
the client’s system automatically returns to the PowerVM Standard Edition. Live Partition
Mobility is available only with PowerVM Enterprise Edition. It allows for the movement of a
running partition from one Power System server to another with no application downtime,
resulting in better system utilization, improved application availability, and energy savings.
With Live Partition Mobility, planned application downtime due to regular server
maintenance can be a thing of the past.
Requirement: Using maximum LPARs requires PowerVM Standard or PowerVM
Enterprise and the latest system firmware, which is 730_035 or later.
Requirement: This is a 60-day trial version of PowerVM Enterprise Edition. Using this
trial version requires PowerVM Standard Edition and firmware 730_035 or later.

© Copyright IBM Corp. 2011. All rights reserved.
Chapter 2. Exploring RAS and virtualization
features in more detail
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. The IBM POWER7 processor-based servers have new features to support new
levels of virtualization, help ease administrative burden, and increase system utilization.
POWER7 Enterprise Servers use several innovative technologies that offer industry-leading
processing speed and virtualization capabilities while using less energy and operating at a
lower cost per transaction.
In this chapter, we investigate in more detail the new POWER7 reliability, availability, and
serviceability (RAS) features, along with other significant RAS and virtualization features. You
will become familiar with their benefits and understand how these capabilities strengthen your
overall IBM Power Systems server availability environment. Figure 2-1 shows the additional
features that are available only on the POWER7 Enterprise Server 780 and 795.
Figure 2-1 Additional exclusive features on POWER7 Enterprise Servers
In the following sections, we discuss the key features that IBM Power Systems provide in
detail, providing guidelines for implementing these features to take full advantage of their
capabilities.
We discuss the following topics in this chapter:
New RAS and virtualization features with POWER7
Significant features
TurboCore and MaxCore technology
Hypervisor and firmware technologies
Power management
Rapid deployment of PowerVM clients
I/O considerations
Active Memory Sharing
Integrated Virtual Ethernet
Partitioning
2.1 New RAS and virtualization features with POWER7
A number of RAS and virtualization features are introduced with POWER7 servers. In this
section, we discuss the following features in more detail:
Active Memory Mirroring for the hypervisor
Hot GX adapter add/repair
Improved memory RAS features
Active Memory Expansion (AME)
Hibernation or suspend/resume (refer to 3.3, “Partition hibernation” on page 79)
For more in-depth information about POWER7 RAS features, see the POWER7 System RAS
Key Aspects of Power Systems Reliability, Availability, and Serviceability white paper at this
website:
http://www-03.ibm.com/systems/power/hardware/whitepapers/ras7.html
2.1.1 Active Memory Mirroring for the hypervisor on Power 795
Active Memory Mirroring for the hypervisor is a new RAS function that is provided with
POWER7 and is only available on the Power 795 server. This feature is also sometimes
referred to as system firmware mirroring. Do not confuse it with other memory technologies,
such as Active Memory Sharing and Active Memory Expansion, which are discussed in 2.1.4,
“Active Memory Expansion” on page 19 and 2.8, “Active Memory Sharing” on page 45.
Active Memory Mirroring for the hypervisor is designed to mirror the main memory that is
used by the system firmware to ensure greater memory availability by performing advance
error-checking functions. This level of sophistication in memory reliability on Power systems
translates into outstanding business value. When enabled, an uncorrectable error that results
from a failure of main memory used by the system firmware will not cause a system-wide
outage. The system maintains two identical copies of the system hypervisor in memory at all
times, as shown in Figure 2-2.
Figure 2-2 A simple view of Active Memory Mirroring
(The figure depicts the POWER hypervisor maintaining two identical copies of its memory,
Copy A and Copy B, on either side of the user memory pool. Both copies are updated
simultaneously with any changes, and the size of the mirrored pool can grow or shrink based
on client needs.)
When a failure occurs on the primary copy of memory, the second copy is automatically
invoked and a notification is sent to IBM via the Electronic Service Agent™ (ESA).
Implementing the Active Memory Mirroring function requires additional memory; therefore,
you must consider this requirement when designing your server. Depending on the system
I/O and partition configuration, between 5% and 15% of the total system memory is used by
hypervisor functions on a system on which Active Memory Mirroring is not being used. Use of
Active Memory Mirroring for the hypervisor doubles the amount of memory that is used by the
hypervisor, so appropriate memory planning must be performed. The System Planning Tool
(SPT) can help estimate the amount of memory that is required. See Chapter 4, “Planning for
virtualization and RAS in POWER7 high-end servers” on page 101 for more details.
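As a rough planning aid, the overhead arithmetic above can be sketched in a few lines. This is only an illustration of the figures quoted in the text (a 5% to 15% hypervisor footprint, doubled when mirroring is enabled); the function name and the example configuration are ours, and real sizing must come from the System Planning Tool.

```python
def partition_usable_memory(total_gb, hypervisor_fraction, mirroring=False):
    """Estimate memory left for partitions after hypervisor overhead.

    hypervisor_fraction: fraction of total memory consumed by hypervisor
    functions without mirroring (the text cites 5% to 15%, depending on
    the I/O and partition configuration). Active Memory Mirroring keeps
    two identical copies of hypervisor memory, doubling that overhead.
    """
    overhead = total_gb * hypervisor_fraction
    if mirroring:
        overhead *= 2  # two identical copies of the hypervisor's memory
    return total_gb - overhead

# Hypothetical 1024 GB server with a 10% hypervisor footprint:
print(round(partition_usable_memory(1024, 0.10), 1))                  # 921.6
print(round(partition_usable_memory(1024, 0.10, mirroring=True), 1))  # 819.2
```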
Active Memory Mirroring for the hypervisor is provided as part of the hypervisor, so there is no
feature code that needs to be ordered that provides this function. The feature is enabled by
default on a Power 795 server. An optimization tool for memory defragmentation is also
included as part of the Active Memory Mirroring feature.
The only requirement of a Power 795 system to support Active Memory Mirroring is that in
each node at least one processor module must be fully configured with eight dual inline
memory modules (DIMMs). Figure 2-3 shows the layout of a processor book and its
components.
Figure 2-3 A POWER7 processor book and its components
Disabling Active Memory Mirroring: Active Memory Mirroring can be disabled on a
system if required, but you must remember that disabling this feature leaves your Power
server exposed to possible memory failures that can result in a system-wide outage.
Beginning with Hardware Management Console (HMC) V7 R7.2.0, new commands have
been added for the Active Memory Mirroring support:
optmem -x -o start -t mirror -q
optmem -x -o stop
lsmemopt -m
The lsmemopt -m command lists the status and progress information of the most recent
defragmentation operation.
The lshwres command on the HMC, which lists the hardware resources of the managed
system, has been enhanced to support Active Memory Mirroring on the IBM POWER7
servers only and specifically the Power 795. Also, the chhwres command, which dynamically
changes the hardware resource configuration, supports Active Memory Mirroring. The
following commands are also valid on the IBM Systems Director Management Console
(SDMC) using the command-line interface (CLI). Each command is preceded by smcli:
smcli optmem -x -o start -t mirror -q
smcli optmem -x -o stop
smcli lsmemopt -m
You also have the option of configuring Active Memory Mirroring via the Advanced System
Management Interface (ASMI). To perform this operation, you must have one of the
following authority levels:
Administrator
Authorized service provider
To configure Active Memory Mirroring, perform the following steps:
1. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
2. In the navigation area, expand System Configuration → Selective Memory Mirroring.
3. In the right pane, select the Requested mode (Enabled or Disabled) and click Save
settings, as shown in Figure 2-4.
Figure 2-4 Memory Mirroring enablement via the ASMI interface
2.1.2 Hot GX adapter repair
The IBM GX host channel adapter (HCA) provides server connectivity to InfiniBand fabrics
and I/O drawers. The POWER7 server provides the following GX adapter capabilities:
GX+ adapters run at 5 GB/second
GX++ adapters run at 20 GB/second
Concurrent maintenance has been available on Power Systems since 1997. POWER6
(2007-2009) introduced the ability to have Central Electronic Complex (CEC) concurrent
maintenance functions. The CEC consists of the processor, memory, systems clocks, I/O
hubs, and so on. Hot GX adapter ADD with COLD repair has been a RAS feature since
POWER6, but we did not have the capability to perform Hot GX adapter repair until POWER7.
Hot GX adapter repair enables the repair and replacement of the component with reduced
impact to systems operations:
Cold Repair: The hardware being repaired is electrically isolated from the system.
Hot Repair: The hardware being repaired is electrically connected to the system.
With POWER7, we introduced CEC Hot Add Repair Maintenance (CHARM) for Power 780
and Power 795 servers. CHARM offers new capabilities in reliability, availability, and
serviceability (RAS). Hot GX adapter repair enables the repair and replacement of the
component with reduced impact to systems operations, if all prerequisites have been met.
CHARM operations are complex and, therefore, require the following additional prerequisites:
Off-peak schedule: It is highly recommended that repairs are done during non-peak
operational hours.
Redundant I/O: It is a prerequisite that all I/O resources are configured with redundant
paths. Redundant I/O paths need to be configured through separate nodes and GX
adapters. Redundant I/O adapters must be located in separate I/O expansion units that
are attached to separate GX adapters that are located in separate nodes.
Redundant I/O can be either directly attached I/O or virtual I/O that is provided by dual VIO
servers housed in separate nodes.
ESA must be enabled: Electronic Service Agent (ESA) must be enabled on the POWER7
systems. Systems with ESA enabled show fewer unscheduled repair actions, and ESA
provides invaluable statistics to gauge field performance.
Quiesce/Live Partition Mobility (LPM) critical applications: Critical applications must be
quiesced or moved to another server using LPM, if available.
Hardware concurrent maintenance entails numerous steps that are performed by both you
and your IBM service personnel while the system is powered on. The likelihood of failure
increases with the complexity of the maintenance function. Therefore, IBM recommends that
all hardware concurrent maintenance operations be performed during off-peak hours.
The “Prepare for Hot Repair/Upgrade” (PHRU) utility on the HMC must be run by the system
administrator to determine the processor, memory, and I/O resources that must be freed up
prior to the start of the concurrent repair operation.
The PHRU utility on the HMC or SDMC is a tool for the system
administrator to identify the effects on system resources in preparation for a hot node repair,
hot node upgrade, or hot GX adapter repair operation. The utility provides an overview of
platform conditions, partition I/O, and processor and memory resources that must be freed up
for a node evacuation.
Important: Accomplishing CHARM requires careful advance planning and meeting all the
prerequisites.
Important: All serviceable hardware events must be repaired and closed before starting
an upgrade.
Figure 2-5 displays the HMC Partitions tab showing error messages for the AIX resources.
Figure 2-5 Prepare for hot repair/upgrade utility
Figure 2-6 shows a message about the I/O resource in use by the AIX partition.
Figure 2-6 Details for the I/O resource in use by the AIX partition
You can obtain detailed information about the CHARM process in 4.3, “CEC Hot Add Repair
Maintenance (CHARM)” on page 123.
2.1.3 Improved memory RAS features
The following section provides details about the improved memory RAS features:
Chipkill
Chipkill is an enhancement that enables a system to sustain the failure of an entire
dynamic random access memory (DRAM) chip. An error correction code (ECC) word uses
18 DRAM chips from two DIMM pairs, and a failure on any of the DRAM chips can be fully
recovered by the ECC algorithm. The system can continue indefinitely in this state with no
performance degradation until the failed DIMM can be replaced.
72-byte ECC, plus cyclic redundancy check (CRC) with retry on the memory data bus (CPU to
buffer)
In POWER7, an ECC word consists of 72 bytes of data. Of these, 64 bytes are used to
hold application data. The remaining eight bytes are used to hold check bits and additional
information about the ECC word.
This innovative ECC algorithm from IBM research works on DIMM pairs on a rank basis (a
rank is a group of 10 DRAM chips on the Power 795). With this ECC code, the system can
dynamically recover from an entire DRAM failure (chipkill). It can also correct an error even
if another symbol (a byte, accessed by a 2-bit line pair) experiences a fault. This capability
is an improvement from the Double Error Detection/Single Error Correction ECC
implementation that is on the POWER6 processor-based systems.
DRAM sparing
IBM Power 780 and 795 servers have a spare DRAM chip per rank on each DIMM that
can be used to replace a failed DRAM chip in a rank (chipkill event). Effectively, this protection
means that a DIMM pair can sustain two, and in certain cases, three DRAM chip failures
and correct the errors without any performance degradation.
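The ECC word layout above reduces to simple arithmetic, sketched here as a check. The 72-byte word size and the 64 bytes of application data come from the text; the derived check-byte count and overhead percentage are just computed from them.

```python
ECC_WORD_BYTES = 72  # total ECC word size on POWER7 (from the text)
DATA_BYTES = 64      # application data held in each ECC word
CHECK_BYTES = ECC_WORD_BYTES - DATA_BYTES  # check bits plus ECC metadata

# Fraction of each ECC word devoted to protection rather than data:
overhead_pct = 100 * CHECK_BYTES / ECC_WORD_BYTES
print(CHECK_BYTES)             # 8
print(round(overhead_pct, 1))  # 11.1
```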
2.1.4 Active Memory Expansion
First introduced in POWER7, Active Memory Expansion (AME) is an innovative IBM
technology that enables the memory assigned to a logical partition (LPAR) to expand beyond
its physical limits.
AME relies on the real-time compression of data stored in memory to increase the amount of
available memory. When AME is enabled, the operating system compresses a portion of the
real memory, generating two pools: compressed and uncompressed memory. The size of
each pool varies, according to the application’s requirements. This process is completely
transparent to users and applications.
With AME enabled, the system’s effective memory capacity increases. You are then able to
perform these functions:
Optimize memory utilization: By consolidating larger workloads using less real memory
than needed
Increase an LPAR’s throughput: By allowing a single LPAR to expand its memory capacity
beyond the physical memory assigned
Note that because AME relies on memory compression by the operating
system, additional processing capacity is required to use AME.
Active Memory Expansion License
AME is a special feature that needs to be licensed before it can be used. Check your system
configuration for the AME feature:
1. Log in to the server’s HMC.
2. In the navigation pane, expand Management → Servers, and select the system that you
want to check.
3. Open the Properties page for the selected server.
4. Check the Capabilities tab, as shown in Figure 2-7. The Active Memory Expansion
Capable value must be set to True.
Figure 2-7 AME-capable server
Expansion factor
When using AME, you only need to set one configuration parameter: the memory expansion factor.
This parameter specifies the new amount of memory that is available for a specific LPAR and
thus defines how much memory the system tries to compress.
The new LPAR memory size is calculated this way:
LPAR_expanded_memory_size = LPAR_true_memory_size * LPAR_expansion_factor
AME feature: If the value is False, you need to obtain a license for the AME feature, or you
can request a free 60-day trial at this website:
https://www-912.ibm.com/tcod_reg.nsf/TrialCod?OpenForm
The expansion factor is defined on a per LPAR basis using the HMC, as shown in Figure 2-8.
Figure 2-8 Active Memory Expansion factor setting
AME: When considering AME, use the amepat tool to determine the best expansion factor
for your specific workload.
Figure 2-9 presents a scenario where memory expansion is used in a 20 GB RAM LPAR
using an expansion factor of 1.5.
Figure 2-9 Active Memory Expansion example
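The sizing rule from “Expansion factor”, applied to the Figure 2-9 scenario, can be checked in a couple of lines. The 20 GB size and the 1.5 factor come from the example above; the function name is ours.

```python
def expanded_memory(true_memory_gb, expansion_factor):
    """LPAR_expanded_memory_size = LPAR_true_memory_size * LPAR_expansion_factor."""
    return true_memory_gb * expansion_factor

# Figure 2-9 scenario: a 20 GB LPAR with an expansion factor of 1.5 is
# presented to the operating system as a 30 GB LPAR (50% expansion).
print(expanded_memory(20, 1.5))  # 30.0
```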
2.2 Significant features
In this section, we describe previous features that play a key role in the POWER7 server RAS
and virtualization strategy.
2.2.1 Active Memory Mirroring for the hypervisor on the Power 795
Active Memory Mirroring for the hypervisor is a new RAS feature being introduced on the
Power 795 that is designed to eliminate the potential for a complete system outage as a result
of an uncorrectable error in memory.
2.2.2 Persistent hardware deallocation
For overall system availability purposes, a component that is identified as failing on a POWER
processor-based system is flagged for persistent deallocation. Component removal can occur
dynamically or at boot time, depending on the type of fault and the moment that the fault is
detected.
By deallocating failed components, we prevent faulty hardware from affecting the entire
system operation. The repair action is deferred to a more convenient, less critical time. The
affected components for this function are processors, L2/L3 cache lines, memory, and I/O
adapters.
2.2.3 First Failure Data Capture (FFDC)
On IBM Power Systems servers, First Failure Data Capture ensures that when a fault is detected
in the system, the root cause is isolated at the first appearance of the problem without
needing any additional tests or reproducing the problem. FFDC relies on built-in checkers to
capture and identify error conditions. FFDC is the technique that is used to check all the
components in the system Central Electronic Complex (CEC): processors, memory buffers,
and I/O controllers.
First Failure Data Capture (FFDC) is a serviceability technique where a program that detects
an error preserves all the data that is required for the subsequent analysis and resolution of
the problem. The intent is to eliminate the need to wait for or to force a second occurrence of
the error to allow specially applied traps or traces to gather the data that is required to
diagnose the problem.
AIX V5.3 TL3 introduced the First Failure Data Capture (FFDC) capabilities. The set of FFDC
features is further expanded in AIX V5.3 TL5 and AIX V6.1. The following features are
described in the following sections:
Lightweight Memory Trace (LMT)
Run-Time Error Checking (RTEC)
Component Trace (CT)
Live Dump
These features are enabled by default at levels that provide valuable FFDC information with
minimal performance effects. The advanced FFDC features can be individually manipulated.
Additionally, a SMIT dialog has been provided as a convenient way to persistently (across
reboots) disable or enable the features through a single command. To enable or disable all
four advanced FFDC features, enter the following command:
#smitty ffdc
This SMIT dialog specifies whether the advanced memory tracing, live dump, and error
checking facilities are enabled or disabled. Note that disabling these features reduces system
RAS.
2.2.4 Processor RAS features
POWER-based servers are designed to recover from processor failures in many scenarios.
When a soft or transient failure is detected in a processor core, the processor instruction retry
algorithm retries the failed instruction in the same processor core. If that failure becomes a
solid or persistent failure, the alternate processor recovery algorithm tries to execute the
failed instruction in another processor core.
If the system detects an error-prone processor, it takes the failing processor
out of service before it causes an unrecoverable system error. This process is called dynamic
processor deallocation. This feature relies on the service processor’s ability to use FFDC
algorithms to notify the hypervisor of a failing processor. The Power hypervisor then
deconfigures the failing processor.
While dynamic processor deallocation can potentially reduce the overall system performance,
it can be coupled with dynamic processor sparing to automatically replace the failed
processor. This entire process is transparent to the partitions.
Similar to the alternate processor recovery technique, dynamic processor sparing tries to get
a free processor from the capacity on demand (CoD) pool. If not available, it uses an unused
available processor (from either shared processor pools or dedicated processor partitions). If it
cannot find an available processor, the hypervisor tries to release a processor from one active
partition based on the partition availability priority settings.
The POWER7 processor also provides the single processor checkstop functionality (already
present in POWER6). This feature is invoked when neither of the techniques
described above is able to manage the existing processor error. With single processor checkstop,
the system reduces the probability that one failed processor affects the overall system
availability by containing the checkstop to the partition that is using the failing processor.
Partition availability priority
POWER6 and POWER7 systems allow systems administrators to specify the availability
priority of their partitions. If an alternate processor recovery event requires a spare processor
but there is no way to obtain one, and the system needs to shut down a
partition to maintain the server’s overall availability, the process selects the partitions with the
lowest availability priority. Priorities are assigned from 0 - 255, with 255 being the highest
priority (most critical partition). The partition availability priority default for a normal partition is
127 and for a VIO server is 191.
Partition availability priority settings define critical partitions so that the hypervisor can select
the best reconfiguration option after a processor deallocation without sparing.
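The selection rule described above can be modeled in a few lines. This is a simplified sketch of the documented behavior (priorities run 0 - 255, with defaults of 127 for a normal partition and 191 for a VIO server), not the hypervisor's actual algorithm; the partition names are hypothetical.

```python
def pick_partition_to_release(partitions):
    """Given a mapping of partition name to availability priority,
    return the partition the hypervisor would sacrifice first: the
    one with the lowest availability priority (255 = most critical)."""
    return min(partitions, key=partitions.get)

# Hypothetical configuration: the defaults of 127 (normal LPAR) and
# 191 (VIO server), plus one raised and one lowered priority.
lpars = {"db_prod": 200, "web": 127, "vios1": 191, "sandbox": 10}
print(pick_partition_to_release(lpars))  # sandbox
```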
To check or modify the partition availability priority (refer to Figure 2-10) for a specific server,
use the following steps:
1. Log in to the system HMC.
2. In the navigation pane, select Systems Management → Servers and select your server.
3. Under the Configuration menu, select Partition Availability Priority.
Figure 2-10 Partition availability priority settings
For more information: For a more detailed explanation of this process, see the white
paper POWER7 System RAS Key Aspects of Power Systems Reliability, Availability, and
Serviceability:
http://www-03.ibm.com/systems/power/hardware/whitepapers/ras7.html
2.2.5 Memory RAS features
As defined in the IBM Power 770 and 780 (9117-MMB, 9179-MHB) Technical Overview and
Introduction, REDP-4639-00, and the IBM Power 795 (9119-FHB) Technical Overview and
Introduction, REDP-4640-00, IBM POWER7-based systems include a variety of protection
methods, which were already present in POWER6 servers, that are designed to prevent,
protect, or limit the consequences of memory errors in the system.
Memory error detection schemes
The following methods are the memory error detection schemes:
Hardware scrubbing
Hardware scrubbing is a method that is used to deal with intermittent errors. IBM POWER
processor-based systems periodically address all memory locations; any memory
locations with a correctable error are rewritten with the correct data.
CRC
The bus that transfers data between the processor and the memory uses CRC error
detection with a failed-operation retry mechanism and the ability to dynamically retune bus
parameters when a fault occurs. In addition, the memory bus has spare capacity to
substitute a data bit-line, whenever it is determined to be faulty.
Memory page deallocation
IBM POWER processor systems can contain cell errors in memory chips using memory page
deallocation. When a memory address experiences repeated correctable errors or an
uncorrectable error, the service processor notifies the hypervisor and the memory page is
marked for deallocation. The operating system that uses the page is asked to move data to
another memory page; the page is then deallocated and can no longer be used by any
partition or the hypervisor. This action does not require user intervention. If the page is owned
by the hypervisor, it is deallocated as soon as the hypervisor releases that page.
The hypervisor maintains a list of deallocated pages. The list is used to continuously
deallocate pages upon system or partition reboots.
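The bookkeeping described above — mark a page after repeated correctable errors or a single uncorrectable one, and keep the deallocated list so pages stay offline across reboots — can be sketched as a toy model. The class, the threshold, and the page addresses are illustrative; the real logic lives in the service processor and hypervisor firmware.

```python
class MemoryPageMonitor:
    """Toy model of memory page deallocation bookkeeping.

    A page is marked for deallocation after repeated correctable
    errors (illustrative threshold below) or immediately on an
    uncorrectable error. The deallocated set stands in for the
    hypervisor's persistent list of pages kept offline across reboots.
    """
    CE_THRESHOLD = 3  # illustrative, not the real firmware threshold

    def __init__(self):
        self.correctable_counts = {}
        self.deallocated = set()

    def report_error(self, page, uncorrectable=False):
        if page in self.deallocated:
            return  # page is already offline
        if uncorrectable:
            self.deallocated.add(page)
            return
        n = self.correctable_counts.get(page, 0) + 1
        self.correctable_counts[page] = n
        if n >= self.CE_THRESHOLD:
            self.deallocated.add(page)

mon = MemoryPageMonitor()
for _ in range(3):
    mon.report_error(0x1000)               # repeated correctable errors
mon.report_error(0x2000, uncorrectable=True)
print(sorted(mon.deallocated))             # [4096, 8192]
```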
Memory persistent deallocation
At boot time, during system self-test, defective memory is deallocated and is not used in
subsequent reboots. If the server has available CoD memory, the hypervisor attempts to
replace the faulty memory with CoD unlicensed memory, and if properly configured, the
system triggers a service call. After memory deallocation, a service repair action needs to be
scheduled to replace the failed memory chips.
If, after system reboot, the amount of memory configured does not allow the system to
activate one or more partitions, the hypervisor reduces the memory assigned to one or more
partitions, based on the partition availability priority setting.
2.2.6 Dynamic service processor (SP) failover at run time and redundant SP
The ability to have redundant service processors (SP) to address demanding availability
requirements continues today with the Power 795 and the Power 780 systems with multi-node
capability (2 or more CECs). The redundant service processor capability enables you to
configure a secondary service processor that is activated when the primary service processor
fails. You must have either a Hardware Management Console (HMC) or System Director
Management Console (SDMC) to enable and disable the redundant service processor
capability.
Figure 2-11 depicts the correct configuration for redundant service processors and redundant
HMCs.
Figure 2-11 A correct service processor and HMC configuration (LAN1 and LAN2 are separate
private hardware management networks for the first and second FSP ports; LAN3 is an open
network for HMC access and DLPAR operations)
One HMC must connect to the port labeled as HMC Port 1 on the first two CEC drawers of
each system. A second HMC must be attached to HMC Port 2 on the first two CEC drawers of
each system. This type of solution provides redundancy for the service processors and the
HMCs.
It is important to understand exactly what a service processor is and the functions that it
provides in order to fully appreciate why redundant SPs with dynamic failover capability are
an important RAS feature for POWER7 Enterprise Servers.
The service processor is a dedicated microprocessor that is independent of the other
POWER7 microprocessors and is separately powered. Its main job is to correlate and
process error information that is received from other system components and to engineer
error recovery mechanisms along with the hardware and the Power hypervisor. The service
processor uses error “thresholding” and other techniques to determine when corrective action
needs to be taken. Thresholding, as defined for Power RAS, is the ability to use historical
data and engineering expertise that is built into the feature to count recoverable errors and
accurately predict when corrective actions must be initiated by the system. Power systems
require a service processor to perform system power-on to initialize the system hardware and
to monitor error events during operation.
The service processor (SP) and the Power hypervisor work together to monitor and detect
errors. While the service processor is monitoring the operation of the Power hypervisor
firmware, the Power hypervisor monitors the service processor's activity.
[Figure 2-11 legend: LAN1 is a private hardware management network for the first FSP
ports; LAN2 is a private hardware management network for the second FSP ports, on
separate network hardware from LAN1; LAN3 is an open network for HMC access and
DLPAR operations.]
The service processor can take the proper action, which includes calling IBM for service,
when it detects that the Power hypervisor has lost control.
Similarly, the Power hypervisor automatically performs a reset and reload of the service
processor when it detects an error. A service processor reset/reload is not disruptive and
does not affect system operations. SP resets can also be initiated by the service processor
itself when necessary. When an SP does not respond to a reset request, or when the
reset/reload threshold is reached, the system dynamically fails over from the primary service
processor to the secondary SP at run time.
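The escalation just described can be sketched as a simple decision. The threshold value below is invented for illustration; the real firmware uses its own internal thresholds.

```python
# Sketch of the reset/reload escalation: non-disruptive resets are attempted
# until the primary SP stops responding or exceeds its reset threshold, at
# which point the system fails over to the secondary SP at run time.

RESET_THRESHOLD = 3  # illustrative value, not the actual firmware threshold

def next_action(resets_so_far, sp_responds):
    if not sp_responds or resets_so_far >= RESET_THRESHOLD:
        return "failover-to-secondary-sp"
    return "reset-reload-primary-sp"
```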
Service processor failover is enabled through an HMC or SDMC. The default for new systems
is to enable automatic failover if the primary service processor fails.
To enable/disable service processor redundancy failover on your managed system on the
HMC, in the navigation area of your managed server, select Serviceability → FSP
Failover → Setup, as seen in Figure 2-12.
Figure 2-12 HMC enable/disable service processor failover pane
To verify that the service processor redundancy is enabled on an SDMC, select your server
by clicking Navigate Resources → Actions → System Configuration → Edit host →
Capabilities.
Important: Verify that service processor redundancy is enabled on your server.
One of the prerequisites for the ability to perform CEC Hot Add and Repair maintenance is
that the service processor failover must be enabled (see Figure 2-13). See Chapter 4,
“Planning for virtualization and RAS in POWER7 high-end servers” on page 101 for a detailed
overview.
Figure 2-13 SDMC enable/disable service processor failover pane
2.2.7 Hot node add and repair
Hot node add and hot node repair functions allow you to add or repair a system node without
causing downtime to the system. After the node installation, the firmware integrates the new
hardware and makes it available to existing or new partitions. These features are part of the
CHARM process. See 4.3.1, “Hot add or upgrade” on page 123 for more detailed information.
2.2.8 Hot node upgrade (memory)
Hot node upgrade allows you to increase the memory capacity in a system by adding or
replacing (exchanging) installed memory DIMMs with higher capacity DIMMs. This feature is
part of the CHARM process. See 4.3.1, “Hot add or upgrade” on page 123 for more detailed
information.
2.3 TurboCore and MaxCore technology
An innovative feature that is offered on the POWER7 Enterprise Servers is the ability to
switch between the standard MaxCore mode, which is optimized for throughput, and the
unique TurboCore mode, where performance per core is boosted with access to both
additional cache and additional clock speed. TurboCore mode can run up to four active cores
for database and other transaction-oriented processing. Standard MaxCore mode can run up
to eight active cores for Internet-oriented processing. Figure 2-14 shows the POWER7 chip
design in detail.
Figure 2-14 POWER7 chip design
TurboCore is a special processing mode where only four cores per chip are activated. A
POWER7 chip consists of eight processor cores, each with on-core L1 instruction and data
caches, a rapid-access L2 cache, and a larger, longer-access L3 cache. With only four active
cores, the lighter cooling load allows the active cores to run at a frequency approximately
7.25% faster than the nominal rate. This capability also means that there is more processor
cache available per core. Both the higher frequency and the greater amount of cache per
core can provide better performance.
[Figure 2-14: POWER7 chip diagram - eight cores, each with its own 256 KB L2 cache and a
4 MB segment of the 32 MB L3 cache, with attached memory DIMMs.]
Figure 2-15 shows the differences between the MaxCore and TurboCore frequencies.
Figure 2-15 MaxCore compared to TurboCore frequency
Using MaxCore mode, which is the standard mode, all eight cores run at a frequency of
3.86 GHz (at the time of writing this publication). The 32 MB of L3 cache is shared evenly
across the eight cores, that is, 4 MB of L3 cache per core.
In TurboCore mode, four of the eight cores are switched off, while the other four cores get a
performance boost by running at a higher frequency, 4.1 GHz, approximately 7.25% above
the nominal rate. The four TurboCore cores access the full 32 MB of L3 cache, so each core
gets 8 MB of L3 cache, double the amount in MaxCore mode. The corresponding system
maximums are:
MaxCore up to 256 cores @ 4 GHz (versus 64 cores @ 3.86 GHz on 780)
TurboCore up to 128 cores at 4.25 GHz (versus 32 cores @ 4.1 GHz on 780)
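The arithmetic above can be checked directly. This sketch uses the Power 780 figures quoted in the text (3.86 GHz nominal, 32 MB of L3 per chip); the 1.0725 multiplier is the approximately 7.25% boost mentioned above.

```python
# Cache-per-core and frequency arithmetic for the two modes (Power 780 figures).

L3_TOTAL_MB = 32
MAXCORE_CORES = 8     # all eight cores active
TURBOCORE_CORES = 4   # four cores switched off
NOMINAL_GHZ = 3.86

l3_per_core_maxcore = L3_TOTAL_MB / MAXCORE_CORES      # 4 MB per core
l3_per_core_turbocore = L3_TOTAL_MB / TURBOCORE_CORES  # 8 MB per core
turbocore_ghz = round(NOMINAL_GHZ * 1.0725, 2)         # about 4.14 GHz

print(l3_per_core_maxcore, l3_per_core_turbocore, turbocore_ghz)
```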
In TurboCore mode, up to half of the processor cores on each single-chip module (SCM) are
disabled, and their L3 cache is made available to the active processor cores on the chip. This
design provides a performance boost to the active cores. In general, the number of cores
used in TurboCore mode is equal to the number of processors activated, but only up to a
maximum of half the number of cores physically installed.
Both the Power 780 and Power 795 systems support the MaxCore and TurboCore technology.
The Power 780 can be configured with up to four system enclosures. Each system enclosure
contains one processor card, as shown in Figure 2-16 on page 31. Each processor card
contains two POWER7 sockets, and each socket has eight cores with 2 MB of L2 cache and
32 MB of L3 cache.
Important: To be able to have a system that is capable of running in TurboCore mode, you
need extra processor cores physically installed. Only half of the processor cores are used,
and you do not need to have extra activations.
[Figure 2-15: frequency (GHz) by mode - 3.88 GHz in MaxCore mode versus 4.14 GHz in
TurboCore mode.]
Figure 2-16 contains a top view of the Power 780 system.
Figure 2-16 Power 780 CEC top view
The Power 795 system has a 20U-tall CEC housing that contains the system backplane
cooling fans, system electronic components, and mounting slots for up to eight processor
books. One to eight POWER7 processor books can be configured. Each processor book can
contain either 24 or 32 cores, that is, the 6-core or the 8-core offering. These cores are
packaged on four POWER7 processor chips. Each processor chip contains 2 MB of on-chip
L2 cache and 32 MB of eDRAM L3 cache, and each core supports four hardware threads.
There are two types of processor nodes available on the Power 795 offering with the following
features:
Four 6-core POWER7 single chip glass ceramic modules with 24 MB of L3 cache (24
cores per processor node) at 3.7 GHz (feature code 4702)
Four 8-core POWER7 single chip glass ceramic modules with 32 MB of L3 cache (32
cores per processor node) at 4.0 GHz (feature code 4700)
Both the Power 780's and the Power 795's 8-core processor cards can be configured in
either of two modes, MaxCore or TurboCore.
In the MaxCore mode, the POWER7 cache design has 4 MB of L3 cache per core. Although
it might look as if there is a private L3 cache per core, this cache can be shared between
cores. The cache state from an active core’s L3 cache can be saved into the L3 cache of less
active cores.
In the case of TurboCore, the cache state from the four active cores can be saved into the L3
cache of the TurboCore’s inactive cores. The result is more accessible cache per core.
It is important to note that in TurboCore mode, the processor chip's core count decreases
from eight cores per chip to four. An 8-core partition formerly residing on one processor chip
now must reside on two.
TurboCore mode: Only the 8-core processor card supports the TurboCore mode.
A Power 780 system needing sixteen cores and packaged in a single drawer as in
Figure 2-16 on page 31 requires two drawers when using TurboCore.
Another effect, though, stems from the fact that chip crossings introduce extra time for storage
accesses. For the same number of cores, there are often more chips required with
TurboCore. So, more chips often imply a higher probability of longer latency for storage
accesses. The performance chapter, in 7.2.4, “MaxCore and TurboCore modes” on page 254,
addresses the performance effects of using TurboCore mode versus MaxCore mode.
Example 2-1, Example 2-2, and Example 2-3 explain the relationship between physical
processors, activated processors, and TurboCore mode.
Example 2-1 Relationship between physical processors - 32 physical processors
A server has 32 physical processor cores with 14 activated, running in MaxCore
mode. If you re-IPL the system and switch to TurboCore mode, you now have 14
processor cores running in TurboCore mode.
Example 2-2 Relationship between physical processors - 48 physical processors
A server has 48 physical processor cores with 21 activated, running in MaxCore
mode. If you re-IPL the system and switch to TurboCore mode, you will have 21
processors running in TurboCore mode. There is no requirement to have an even
number of processors running in TurboCore mode.
Example 2-3 Relationship between physical processors - 48 physical processors with 29 active
A server has 48 physical processor cores with 29 activated, running in MaxCore
mode. If you re-IPL the system and switch to TurboCore mode, you will have 24
processors running in TurboCore mode and 5 extra activations, which are not used,
because the maximum number of cores that can be used in TurboCore mode is half the
number of cores physically installed (24 out of 48 in this case).
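The three examples reduce to a single rule, sketched here: the usable TurboCore core count is the number of activated cores, capped at half the physically installed cores.

```python
# One-line restatement of the TurboCore core-count rule from Examples 2-1
# through 2-3.

def turbocore_usable_cores(installed: int, activated: int) -> int:
    """Cores usable after a re-IPL into TurboCore mode."""
    return min(activated, installed // 2)

print(turbocore_usable_cores(32, 14))  # Example 2-1: 14
print(turbocore_usable_cores(48, 21))  # Example 2-2: 21
print(turbocore_usable_cores(48, 29))  # Example 2-3: 24 (5 activations unused)
```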
The rules for a minimum number of processor activations still apply when you configure a
POWER7 Enterprise server 780 or 795 for TurboCore mode:
Model 780: A minimum of four processor cores must be activated.
Model 795: TurboCore mode requires a minimum of three feature code 4700 processor
books, which support 96 cores. You must activate a minimum of 25% of the installed
processors or 24 processors, whichever is greater.
2.3.1 Enabling and disabling TurboCore mode
TurboCore mode is enabled/disabled through the ASMI interface on either your HMC or
SDMC. The POWER7 server must meet the requirements reviewed earlier in this chapter to
support the TurboCore settings. All processors in the system must support TurboCore for the
processors to be enabled. If processors are installed in the system that do not support
TurboCore, a message similar to Example 2-4 is displayed.
Example 2-4 TurboCore mode not supported
Unable to process the request because some processors are not capable of
supporting TurboCore settings.
The location codes of the processors that do not support TurboCore are also displayed. To
perform this operation, you must have one of the following authority levels:
Administrator
Authorized service provider
To set the TurboCore settings, perform the following steps:
1. On the ASMI Welcome pane, specify your user ID and password, and click Log in.
2. In the navigation area, expand Performance Setup.
3. Click Turbo Core Settings.
4. In the right pane, select the settings that you want.
5. Click Save settings.
Figure 2-17 shows the TurboCore enablement through the ASMI from the SDMC.
Figure 2-17 TurboCore settings
Activate settings: To enable or disable the TurboCore settings, perform an initial program
load (IPL): power off, and then power on the managed system.
2.4 Hypervisor and firmware technologies
In this section, we describe the hypervisor and firmware technologies.
2.4.1 Hypervisor
The technology behind the virtualization of IBM POWER7 systems comes from a piece of
firmware known as the Power hypervisor (PHYP), which resides in flash memory. This
firmware performs the initialization and configuration of the POWER7 processors, along with
the required virtualization support to run up to 1,000 partitions concurrently on the IBM
POWER 795 server. The Power hypervisor is an essential element of the IBM virtualization
engine. The Power hypervisor is the key component of the functions that are shown in
Figure 2-18 and performs the following tasks:
Provides an abstraction layer between the physical hardware resources and the LPARs
Enforces partition integrity by providing a security layer between the LPARs
Controls the dispatch of virtual processors to physical processors
Saves and restores all processor state information during logical processor context switch
Controls hardware I/O interrupts to management facilities for LPARs
Figure 2-18 Power hypervisor functions
The Power hypervisor, acting as the abstraction layer between the system hardware and the
LPARs, allows multiple operating systems to run on POWER7 technology with little or no
modifications.
The Power hypervisor is a component of the system firmware that is always installed and
activated, regardless of your system configuration. It operates as a hidden partition, with no
processor resources assigned to it. The hypervisor provides privileged and protected access
to assigned partition hardware resources and enables the use of advanced Power
virtualization features by receiving and responding to requests using specialized hypervisor
calls.
The Power hypervisor requires both system processor and memory resources to perform its
tasks. The performance impact is relatively minor for most workloads, but it can increase with
extensive amounts of page-mapping activities. Refer to Chapter 7, “POWER7 Enterprise
Server performance considerations” on page 249 for more information about performance
considerations.
Micro-partitioning technology, provided by the hypervisor, allows for increased overall use of
system resources by automatically applying only the required amount of processor resources
that each partition needs. Micro-partitioning technology allows for multiple partitions to share
one physical processor. Partitions using micro-partitioning technology are referred to as
shared processor partitions. You can choose between dedicated processor partitions and
shared processor partitions using micro-partitioning technology on POWER7. Therefore, you
are able to have both dedicated and shared processor partitions running on the same system
at the same time.
The hypervisor schedules shared processor partitions from a set of physical processors that
is called the shared processor pool. By definition, these processors are not associated with
dedicated partitions.
Implement the virtual I/O server: While not required, we highly advise that you
implement the virtual I/O server for use with the Power hypervisor technology to take
advantage of virtualization capabilities when sharing physical I/O resources between
LPARs.
The hypervisor continually adjusts the amount of processor capacity that is allocated to each
shared processor partition and any excess capacity that is unallocated based on current
partition profiles within a shared pool. Tuning parameters allow the administrator extensive
control over the amount of processor resources that each partition can use.
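As an illustration only (the entitlements, weights, and pool size are invented, and the hypervisor's real scheduler is far more involved), sharing excess pool capacity among uncapped shared processor partitions in proportion to an administrator-set weight might look like:

```python
# Sketch: distribute leftover shared-pool capacity among shared processor
# partitions in proportion to a weight. Values are invented for illustration.

def distribute_excess(entitlements, weights, pool_capacity):
    """Return each partition's entitlement plus its weighted share of the
    excess capacity in the shared processor pool."""
    excess = pool_capacity - sum(entitlements.values())
    total_weight = sum(weights.values())
    return {name: entitlements[name] + excess * weights[name] / total_weight
            for name in entitlements}

alloc = distribute_excess({"lpar1": 1.0, "lpar2": 0.5},
                          {"lpar1": 128, "lpar2": 64},
                          pool_capacity=3.0)
# lpar1 gets two thirds of the 1.5 excess processors, lpar2 one third.
```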
For IBM i, the Technology Independent Machine Interface (TIMI) and the layers above the
hypervisor are still in place. System Licensed Internal Code, however, was changed back in
POWER5™ to enable interfacing with the hypervisor. The hypervisor code is based on the
IBM i Partition Licensed Internal Code and is now part of the hypervisor.
2.4.2 Firmware
Server firmware is the licensed machine code that resides in system flash memory. Server
firmware includes a number of subcomponents, including the Power hypervisor, power
control, the service processor, and the LPAR firmware that is loaded into your AIX, IBM i,
and Linux LPARs.
There are several types of upgrades:
Concurrent firmware maintenance (CFM) is one way to perform maintenance on the
hypervisor. Concurrent firmware maintenance on an IBM POWER7 server means that the
system can apply the firmware upgrade without having to reboot.
Deferred updates refer to firmware upgrades that can be installed concurrently, but
specific functions in the upgrade are activated only after the next system IPL.
Disruptive upgrades require a full system reboot before the contents of the firmware
upgrade take effect.
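The three upgrade types can be restated compactly (illustration only, not an IBM tool):

```python
# Compact restatement of the three firmware upgrade types described above.

UPGRADE_ACTIVATION = {
    "concurrent": "applied to the running system; no reboot required",
    "deferred": "installed concurrently; some functions wait for the next IPL",
    "disruptive": "requires a full system reboot before taking effect",
}

def needs_reboot_now(kind: str) -> bool:
    """True only when the whole upgrade waits for a reboot."""
    return kind == "disruptive"
```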
System firmware is delivered as a release level or a service pack. Release levels introduce
new functions and features, as well as support for new machine types or models. Upgrading
to a higher release level is disruptive to client operations. IBM intends to introduce no more
than two new release levels per year. These release levels are supported by service packs.
are intended to contain only firmware fixes and not to introduce new functionality. A service
pack is an update to an existing release level. The Power code matrix website also provides
the life cycle for each system firmware release level. Use life-cycle information, along with the
supported code combination tables, to assist you with long-term planning for upgrades to your
HMC and system firmware.
2.4.3 Dynamic firmware update
To increase systems availability, it is important that defects in the service processor firmware,
Power hypervisor, and other system firmware can be fixed without experiencing an outage.
IBM Power servers can operate on a given supported firmware release and use concurrent
firmware fixes to update to the current service pack level within that release.
Normally, IBM provides patches for a firmware release level for up to two years after its
release. Then, clients must plan to upgrade to a new release in order to stay on a supported
firmware release level.
In addition to concurrent and disruptive firmware updates, IBM offers deferred updates,
which include functions that are activated at the next server reboot.
2.4.4 Firmware update and upgrade strategies
This section describes the firmware update and upgrade strategies.
Upgrade strategy (release level)
New functions are released via firmware release levels. The installation of a new release
level is disruptive. At the time of writing this book, there is no plan for nondisruptive
release-level installations.
Unless you require the functions or features that are introduced by the latest release level, it is
generally prudent to wait a few months until the release level stability has been demonstrated
in the field. Release levels are supported with fixes (delivered via service packs) for
approximately two years. Supported releases overlap, so fixes usually are made in multiple
service packs. Typically, clients are not required to upgrade to the latest level to obtain fixes,
except when the client’s release level has reached the end of service. Therefore, clients can
stay on an older (typically more stable) release level and still obtain fixes (via service packs)
for problems. Because the number of changes in a service pack is substantially fewer than a
release level, the risk of destabilizing the firmware by installing a service pack update is much
lower.
Update strategy (service pack within a release)
The strategy to update to the latest service pack needs to be more aggressive than upgrading
to the latest release level. Service packs contain fixes to problems that have been discovered
in testing and reported from the field. The IBM fix strategy is to encourage the installation of a
fix before the problem is encountered in the field. The firmware download page provides
downloads for the N and N-1 service packs, unless a problem was introduced with the N-1
service pack. In such cases, only the latest level is available. Our goal is to prevent the
installation of a broken service pack. When product and development engineering determines
that the latest available (N) service pack has sufficient field penetration and experience (30 -
60 days), the download for the older (N-1) service pack is removed from the firmware
download web page.
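The download-page policy can be sketched as follows. The 30-day cutoff is taken from the 30 - 60 day range mentioned above, and the logic itself is an assumption for illustration, not an IBM tool.

```python
# Sketch of the firmware download-page policy: N and N-1 service packs are
# offered unless N-1 introduced a problem, and once N has sufficient field
# exposure (30 - 60 days; 30 used here), N-1 is withdrawn.

def available_service_packs(n, n_minus_1_broken, n_field_days):
    if n_minus_1_broken or n_field_days >= 30:
        return [n]
    return [n, n - 1]
```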
2.5 Power management
In response to rising energy costs, which can be prohibitive to business growth, and also in
support of green initiatives, IBM developed the EnergyScale™ technology for IBM Power
Systems. This technology allows the system architects to monitor and control energy
consumption for power, cooling, planning, and management purposes.
POWER7 processor-based systems support EnergyScale features and support IBM Systems
Director Active Energy Manager™, which is a comprehensive energy management tool that
monitors and controls IBM Power servers. Support and awareness of EnergyScale extends
Disruptive firmware patches: Certain firmware patches, for example, patches changing
the initialization values for chips, and the activation of new firmware functions, which
require the installation of a new firmware release level, are disruptive processes that
require a scheduled outage and full server reboot.
Website: Refer to the following website to ensure that you have the latest release levels for
the supported IBM Power Systems firmware:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/power7.html
throughout the system software stack, and is included in AIX, IBM i, and Linux operating
systems. Table 2-1 reviews the Power management features and indicates which features
require Active Energy Manager (AEM).
Table 2-1 Power management features

Feature | Requires AEM | Description
Power Trending | Yes | Collects and reports power consumption information for a server.
Thermal Reporting | Yes | Collects and reports inlet and exhaust temperatures (where applicable).
Static Power Saver | No | Provides predictable performance with power savings by reducing CPU frequency by a fixed amount.
Dynamic Power Saver | Yes | Allows a system to implement algorithms that adjust the processor core frequency to favor system performance (saving power where applicable) or to balance power and performance. Core frequency can exceed 100% at times.
Power Capping | Yes | Enforces a user-specified power budget on a system.
Energy-Optimized Fans | No | System fans respond dynamically to the temperatures of system components.
Processor Core Nap | No | Enables low-power modes in POWER7 when cores are unused.
Processor Folding | No | Dynamically reallocates which processor cores execute a task to optimize the energy efficiency of the entire system.
EnergyScale for I/O | No | Powers on I/O slots only when needed.
Server Power Down | Yes | Provides the information necessary to dynamically migrate workloads off a lightly utilized system, allowing the entire system to be powered off.
Partition Power Management | Yes | Provides power savings settings for certain partitions and the system processor pool. Not available on IBM PS70x Blades.

2.5.1 Differences in dynamic power saver from POWER6 to POWER7
Dynamic power saver differs between POWER6 and POWER7 systems.
In POWER6 systems, maximum frequencies varied based on whether Favor Power or Favor
Performance was selected in Active Energy Manager. Favor Power guaranteed power
savings by limiting the maximum frequency of the system under peak utilization. Favor
Performance allowed a higher frequency range. In both cases, the firmware increased the
processor frequency only under high utilization.
In POWER7 systems running system firmware EM710, EnergyScale Dynamic Power Saver
maintains compatibility with POWER6 implementations. In POWER7 systems running EM711
system firmware or later, Dynamic Power Saver has been enhanced so that the full frequency
range is available to a system (including frequencies in excess of 100% where applicable)
regardless of whether power or performance is selected in Active Energy Manager.
Instead of controlling frequency ranges, POWER7 EnergyScale with EM711 firmware or
newer selects from various power and performance control algorithms, depending on the
selected mode. In Dynamic Power Saver, Favor Power mode, system firmware balances
performance and power consumption, only increasing processor core frequency when the
system is heavily utilized. In Dynamic Power Saver, Favor Performance mode, system
firmware defaults to the maximum processor core frequency that is allowed for a given
system’s environment and configuration, and reduces frequency only when a system is lightly
utilized or idle.
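A hedged sketch of the two EM711-era control policies described above; the utilization thresholds are invented, and the actual firmware algorithms are more sophisticated.

```python
# Sketch of the two Dynamic Power Saver policies on EM711 or later firmware:
# Favor Power raises frequency only under heavy utilization; Favor Performance
# runs at maximum unless the system is lightly utilized or idle.
# Thresholds (0.2, 0.9) are invented for illustration.

def target_frequency(mode, utilization, nominal_ghz, max_ghz):
    if mode == "favor-performance":
        return max_ghz if utilization > 0.2 else nominal_ghz
    if mode == "favor-power":
        return max_ghz if utilization > 0.9 else nominal_ghz
    raise ValueError("unknown mode: " + mode)
```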
2.6 Rapid deployment of PowerVM clients
This section presents three methods for parallel deployment of virtual servers (LPARs) on a
high-end POWER7 system to deliver improvements in both efficiency and flexibility.
2.6.1 Deployment using the VMControl plug-in
VMControl allows you to both provision an empty server and deploy an operating system
image to it. For instructions to provision a new virtual server via VMControl, see “Creating a
virtual server” under 4.1.1, “Managing virtual servers” in IBM Systems Director VMControl
Implementation Guide on IBM Power Systems, SG24-7829:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247829.pdf
For instructions to deploy a complete operating system on a POWER7 system, see 4.2.4,
“Deploying a virtual appliance to a host” in IBM Systems Director VMControl Implementation
Guide on IBM Power Systems, SG24-7829:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247829.pdf
Section 5.2.5, “Managing virtual appliances”, in IBM Systems Director VMControl
Implementation Guide on IBM Power Systems, SG24-7829, also describes in detail creating
image repositories for AIX and Linux, and capturing a mksysb image or resource to
create/deploy a virtual appliance. A virtual appliance contains an image of a full operating
system, and it can also contain software applications and metadata describing the virtual
server that the image requires.
2.6.2 File-backed virtual optical devices
The virtual I/O server support for virtual optical devices allows sharing of a physical CD or
DVD drive that is assigned to the virtual I/O server between multiple AIX, IBM i, and Linux
client partitions. This feature has been available since Virtual I/O Server Version 1.5. This
feature supports provisioning of the DVD drive on the virtual I/O server by mapping it to each
of the partitions. This drive can only be utilized by one partition at a time.
A new feature, the file-backed virtual optical device, which was introduced in Virtual I/O
Server Version 1.5, allows simultaneous use of the installation image by all the virtual servers
in parallel. This technology is the preferred method for deployment of operating system
images on high-end systems where a large number of virtual servers can be configured.
Using file-backed virtual optical devices provides the flexibility to use an ISO image as a
virtual device and share it among all the virtual servers on your system as a virtualized optical
drive. The virtual media repository is used to store virtual optical media for use by the
file-backed virtual optical devices. This design is analogous to the file-backed virtual optical
device being a jukebox and the virtual optical media repository being its CD library.
The following procedure illustrates how to install an AIX client partition using file-backed
virtual optical devices. This procedure can also be used to install Linux and IBM i
partitions:
1. The first step is to check for a defined storage pool in which you need to create the virtual
media repository, as shown in Example 2-5.
Example 2-5 Checking for the defined storage pools
$ lssp
Pool Size(mb) Free(mb) Alloc Size(mb) BDs Type
rootvg 69632 41984 512 0 LVPOOL
isopool 16304 16304 16 0 LVPOOL
The lssp command that is shown in Example 2-5 lists the storage pools that are defined in
the virtual I/O server.
2. Create a virtual media repository on the virtual I/O server using the mkrep command, as
shown in Example 2-6.
Example 2-6 Creating a virtual media repository
$ mkrep -sp isopool -size 10G
Virtual Media Repository Created
3. Copy the ISO images to the virtual I/O server and place them in a directory that is created
in /home/padmin.
4. Create a virtual optical media disk in the virtual media repository using the mkvopt
command, as shown in Example 2-7.
Example 2-7 Creating a virtual optical media disk
$ mkvopt -name AIXcd -file /home/padmin/AIXiso/dvd.710_GOLD.v1.iso
The mkvopt command that is shown in Example 2-7 creates a virtual optical media disk
named AIXcd from the dvd.710_GOLD.v1.iso image that is located in the
/home/padmin/AIXiso directory.
5. In order to show that the new virtual optical media disk was added with the mkvopt
command, use the lsrep command as shown in Example 2-8.
Example 2-8 Showing that the virtual optical media was added
$ lsrep
Size(mb) Free(mb) Parent Pool Parent Size Parent Free
10198 7083 isopool 16304 6064
Name File Size Optical Access
AIXcd 3115 None rw
6. Optionally remove the ISO file from /home/padmin to save space, because the image is
already stored in the media repository.
7. Now, map the virtual optical media disk to a virtual server adapter that is mapped to the
AIX client with the mkvdev command, as shown in Example 2-9.
Example 2-9 Mapping the virtual optical media disk
$ mkvdev -fbo -vadapter vhost0
vtopt0 Available
The mkvdev command makes a vtopt0 device available, as shown in Example 2-9 on
page 39.
8. The next step is to load the virtual media in the virtual optical device using the loadopt
command, as shown in Example 2-10.
Example 2-10 Loading the virtual media
$ loadopt -disk AIXcd -vtd vtopt0
9. Verify the mapping with the lsmap command. The output is similar to Example 2-11.
Example 2-11 Output of the lsmap command
$ lsmap -vadapter vhost# (replace # with your adapter number)
VTD vtopt0
Status Available
LUN 0x8400000000000000
Backing device /var/vio/VMLibrary/AIXcd
Physloc
Mirrored N/A
10. Use the lsdev command to show the newly created file-backed optical device. Refer to
Example 2-12.
Example 2-12 Showing the file-backed optical device
name status description
ent2 Available Virtual I/O Ethernet Adapter (l-lan)
ent3 Available Virtual I/O Ethernet Adapter (l-lan)
ent4 Available Virtual I/O Ethernet Adapter (l-lan)
ent5 Available Virtual I/O Ethernet Adapter (l-lan)
vasi0 Available Virtual Asynchronous Services Interface (VASI)
vbsd0 Available Virtual Block Storage Device (VBSD)
vhost0 Available Virtual SCSI Server Adapter
vsa0 Available LPAR Virtual Serial Adapter
name status description
max_tranVTD Available Virtual Target Device - Disk
skessd6s_hd01 Available Virtual Target Device - Disk
vtd01 Available Virtual Target Device - Disk
vtopt0 Available Virtual Target Device - File-backed Optical <- NEW
name status description
ent6 Available Shared Ethernet Adapter
Finally, we can use the vtopt0 device to boot the AIX client partition and install from it.
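If you script this kind of verification, the backing device can be pulled out of captured lsmap output with a one-line awk filter. The sketch below stubs the output with the text of Example 2-11; on a real virtual I/O server you would pipe `lsmap -vadapter vhost0` instead:

```shell
# Stubbed lsmap output (mirrors Example 2-11); replace with a real
# `lsmap -vadapter vhost0` on the virtual I/O server.
lsmap_output='VTD vtopt0
Status Available
LUN 0x8400000000000000
Backing device /var/vio/VMLibrary/AIXcd
Physloc
Mirrored N/A'
# Print only the backing device path.
printf '%s\n' "$lsmap_output" | awk '/^Backing device/ {print $3}'
```

This prints `/var/vio/VMLibrary/AIXcd`, confirming that vtopt0 is backed by the virtual optical media in the repository.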
2.6.3 Deployment using the System Planning Tool (SPT)
The IBM System Planning Tool is the third method to deploy a system that is based on
existing performance data or based on new workloads. The system plans that are generated
by the SPT can be deployed on the system by the Hardware Management Console (HMC).
We describe the SPT in detail in Chapter 4 of this book (4.8, “System Planning Tool (SPT)” on
page 153). For the latest detailed information about SPT, refer to the IBM System Planning
Tool at this website:
http://www-947.ibm.com/systems/support/tools/systemplanningtool/
2.7 I/O considerations
On a Power server, up to 10 partitions can be defined per physical processor, with a
maximum of 1,000 LPARs per server. Typically, each LPAR uses at least one Ethernet
adapter and one adapter to access the back-end storage, so at least 2,000 adapters are
needed to exploit the full capacity of the server. This number doubles if we consider
redundant adapters. You also need to consider the number of cables and both the LAN
and SAN switch ports that are required for this type of configuration.
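As a back-of-the-envelope check, the adapter counts follow directly from the figures in the text; the shell arithmetic below simply restates them (two adapters per LPAR is the minimal case described above):

```shell
# Values from the text: 1,000 LPARs, each with one Ethernet adapter
# and one storage adapter, optionally doubled for redundancy.
lpars=1000
adapters_per_lpar=2
echo "minimum adapters: $((lpars * adapters_per_lpar))"
echo "with redundancy:  $((lpars * adapters_per_lpar * 2))"
```

This prints 2000 and 4000, which is why physical adapters alone cannot satisfy the requirement and I/O virtualization becomes necessary.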
Available systems cannot fulfill this I/O requirement in terms of physical adapters. However,
by using PowerVM, you can implement several storage virtualization technologies that allow
you to share the disk I/O resources.
Although many backing devices are supported, not all of them offer redundancy or
virtualization capabilities. The following devices are available backing devices:
Internal storage (virtualized or physically assigned)
Physically attached external storage
Virtualized SAN storage (either virtual SCSI (vSCSI) or N_Port ID Virtualization (NPIV))
Disk backed by file
Disk backed by logical volume (LV)
Optical CD/DVD
Optical DVD-RAM backed by file
Tape devices
Because we focus on virtualized highly available solutions, we only advise the use of
SAN-attached devices that are virtualized via vSCSI or NPIV technologies.
2.7.1 Virtual SCSI
Virtual SCSI (vSCSI) is a virtual implementation of the SCSI protocol. Available since
POWER5, it provides Virtual SCSI support for AIX 5.3 and later, selected Linux distributions,
and IBM i (POWER6 needed).
Virtual SCSI is a client/server technology. The virtual I/O server owns both the physical
adapter and the physical disk and acts as a server (target device). The client LPAR accesses
the physical disk as a SCSI client. Both client and server virtual SCSI adapters are configured
using HMC, SDMC, or Integrated Virtualization Manager (IVM). Physical connection between
the adapters is emulated by mapping the corresponding adapter slots at the management
console and by configuring the devices in both client and virtual I/O server.
After the adapter’s configuration is ready in the client and the server, the back-end disk must
be mapped to the virtual client. This process is done at the VIO server or servers. Then, the
disk is configured at the client partition as a regular physical disk.
As shown in Figure 2-19 on page 42, for availability purposes, always consider a dual virtual
I/O server environment with at least two paths to the physical storage at each virtual I/O
server.
Supported storage systems and drivers: For an up-to-date list of supported storage
systems and drivers, consult the SSIC website:
http://www-03.ibm.com/system/support/storage/ssic/interoperability.wss
Figure 2-19 Redundant vSCSI configuration
Figure 2-19 shows these components:
Two virtual I/O servers.
Each virtual I/O server has two paths to the SAN storage.
Using the multipath I/O (MPIO) driver, the client partition can access its backing device
using the prioritized path.
The backing disks are located in a SAN environment.
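On the AIX client, the two paths (one per virtual I/O server) can be verified with the lspath command. The sketch below stubs typical lspath output, with illustrative disk and adapter names; on a real client you would run `lspath -l hdiskN` instead:

```shell
# Stubbed lspath output for a dual-VIOS vSCSI disk; names are illustrative.
lspath_output='Enabled hdisk0 vscsi0
Enabled hdisk0 vscsi1'
# Count the enabled paths.
printf '%s\n' "$lspath_output" | awk '$1 == "Enabled" {n++} END {print n " enabled paths"}'
```

Two enabled paths, one through each virtual I/O server, indicate that the redundant configuration of Figure 2-19 is in effect.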
SCSI Remote Direct Memory Access (RDMA)
By implementing the RDMA protocol, the PowerVM SCSI implementation can transfer data
directly between SCSI initiator and target memory address spaces.
Important: Shared storage pool redundancy is not supported at the time of writing.
As shown in Figure 2-20, the SCSI request and responses are sent over the vSCSI adapters
using the Power hypervisor, but the actual data transfer is done directly between the LPAR
and the physical adapter using the RDMA protocol.
Figure 2-20 Logical Remote Direct Memory Access (RDMA)
2.7.2 N_Port ID Virtualization (NPIV)
In a typical SAN environment, when using FC, the logical unit numbers (LUNs) are created
using the physical storage and then mapped to physical host bus adapters (HBAs). Each
physical port on each physical FC adapter is identified by its own unique worldwide port name
(WWPN).
NPIV is an FC adapter virtualization technology. By implementing NPIV, you can configure
your system so that each LPAR has independent access to the storage system that shares a
physical adapter.
To enable NPIV in your managed system, you need to install one or more virtual I/O servers.
By creating server and client virtual FC adapters, the virtual I/O server provides a
pass-through to enable the client virtual server to communicate with the storage subsystem
using a shared HBA.
Using NPIV, you can have FC redundancy either by using MPIO or by mirroring at the client
partition. Because the virtual I/O server is just a pass-through, the redundancy occurs
entirely in the client.
As shown in Figure 2-21, to achieve high levels of availability, NPIV must be configured in a
dual VIO environment with at least two HBAs at each virtual I/O server.
Figure 2-21 NPIV with redundancy
Figure 2-21 shows the following scenario:
There are two virtual I/O servers to provide redundancy at the VIO level.
There are two independent HBAs at each VIO and they are connected to the storage.
The client can access the storage through both virtual I/O servers using any of the
configured HBAs.
Using this configuration, you protect your client partitions from both HBA and virtual I/O
server problems.
NPIV memory prerequisites
When planning for an NPIV configuration, remember that, as with any other high-speed
adapter, virtual FC adapters require memory to operate. Because the virtual I/O server only
copies packets from the physical adapter to the virtual client adapter, it does not require extra
memory. The hypervisor reserves the required memory.
Adding more HBAs: Depending on your environment, consider adding more HBAs for
both load balancing and availability purposes.
Mixed configurations
As shown in Figure 2-22, an NPIV and vSCSI mixed configuration is also supported. A
configuration in which you also use physical adapters in the client partitions is supported, too.
However, consider that if a physical adapter is present, you need to move your disks to a
virtual adapter before performing a Live Partition Mobility operation.
By using NPIV and vSCSI in the same scenario, you can take advantage of the specific
capabilities of both technologies. Refer to IBM PowerVM Virtualization Introduction and
Configuration, SG24-7940-04, for a complete comparison between NPIV and vSCSI.
Figure 2-22 vSCSI, NPIV, and physical adapter in the same client partition
2.8 Active Memory Sharing
Since the earliest versions of PowerVM, formerly known as Advanced Power Virtualization
(APV), IBM Power Systems have had the ability to virtualize processor use. Initially known as
micro-partitioning, this feature allows one processor core to be shared by up to 10 LPARs.
Important: Consider reserving 140 MB for each client virtual FC adapter. However, the
amount of memory that is needed is much less than the amount that is required to manage
physical adapters. For more information, refer to the following website:
https://www.ibm.com/developerworks/mydeveloperworks/blogs/aixpert/entry/virtual_fibre_channel_for_npiv_requires_memory_too59?lang=en
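The reservation scales linearly with the number of client virtual FC adapters. The numbers below, 20 client LPARs with one virtual FC adapter per virtual I/O server in a dual-VIOS setup, are hypothetical, but the 140 MB per-adapter figure is the one cited above:

```shell
# Hypothetical sizing: 20 client LPARs, dual VIOS, one virtual FC
# adapter per VIOS per LPAR, ~140 MB reserved per adapter.
mb_per_adapter=140
adapters_per_lpar=2
lpars=20
echo "hypervisor reservation: $((mb_per_adapter * adapters_per_lpar * lpars)) MB"
```

This yields 5600 MB, a useful sanity check when sizing the memory of a system that hosts many NPIV clients.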
Introduced in 2009, Active Memory Sharing (AMS) is an advanced PowerVM memory
virtualization technology that allows multiple partitions in a Power System to share a common
pool of physical memory.
AMS can be used to improve memory utilization on a Power Systems server by reducing the
total assigned memory or by creating more LPARs with the same amount of physical
memory.
AMS allows the administrator to define shared memory pools and to assign LPARs to these
pools. As shown in Figure 2-23, the same Power System server can host both shared and
dedicated memory LPARs.
Figure 2-23 Power System with dedicated and shared memory partitions
Using AMS, memory flows from one LPAR to another LPAR according to the real needs of
each LPAR, not based on fixed assignments. Using AMS, the sum of logical memory that is
assigned to a pool of shared memory partitions can exceed the amount of memory in the
pool, allowing the creation of more LPARs than if using dedicated memory. This capability is
known as memory logical overcommitment.
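A small sketch of the overcommitment arithmetic, with made-up numbers: five shared memory LPARs of 16 GB logical memory each, backed by a 64 GB shared memory pool:

```shell
# Hypothetical AMS configuration: the sum of logical memory exceeds
# the physical pool, which is the definition of logical overcommitment.
pool_gb=64
lpars=5
logical_gb_each=16
total_logical=$((lpars * logical_gb_each))
echo "overcommit ratio: $((total_logical * 100 / pool_gb))%"
```

An overcommit ratio above 100% (here, 125%) means the hypervisor will page memory to the pool's paging devices when the LPARs' combined working sets exceed the pool.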
2.8.1 Shared memory pool
The shared memory pool is a collection of physical memory that is reserved for shared
memory partitions’ exclusive use. In a system that is enabled to use the AMS feature, the
administrator must define a shared memory pool before creating shared memory partitions.
The shared memory pool size can be increased and reduced dynamically. If no shared
memory partitions remain, the pool can also be deleted.
In a shared memory environment, each shared memory partition requires a dedicated paging
space that is assigned to the shared memory pool. If there is no available paging device in the
pool, the activation fails. Paging devices are automatically assigned based on the maximum
logical memory configuration of the LPAR.
Paging devices can be dynamically added or removed from the shared memory pool if not in
use by any virtual server.
2.8.2 Paging virtual I/O server
In memory overcommitment situations, the hypervisor needs to free memory pages in the
shared memory pool. As in regular paging, the data in those memory pages needs to be
moved to paging devices to be restored when needed. In active memory sharing
environments, this paging activity is performed by the virtual I/O server that is assigned to that
shared memory pool following the hypervisor requests.
For availability, we highly suggest implementing a dual virtual I/O server configuration to
provide redundancy for shared memory pool paging activities.
As described in 2.8.1, “Shared memory pool” on page 46, each shared memory LPAR
requires its own paging device. Although many devices can be used, for performance
and high availability, we advise that you only use paging devices with these characteristics:
Mirrored volumes (distributed over many physical disks)
Located on SAN environments
Accessible for both paging virtual I/O servers
2.8.3 Client LPAR requirements
In order to use AMS, an LPAR must meet the following prerequisites:
Use shared processors rather than dedicated processors
Use virtual I/O resources
AIX Level 6.1 TL 03 or later
Novell SUSE SLES11 kernel 2.6.27.25-0.1-ppc64 or later
IBM i Version V6R1M1 PTF SI32798 or later
Shared memory option: If there is no shared memory pool defined, the shared memory
option does not appear in the Create logical partition wizard. However, you can define your
partition as a dedicated memory partition and later change this setting. A reboot is required
for this operation.
Upper limit: Up to 1,000 partitions are supported in each shared memory pool.
2.8.4 Active Memory Sharing and Active Memory Expansion
Active Memory Sharing is a PowerVM feature. Active Memory Expansion is a virtualization
technology that allows a partition to perform memory compression and expand its memory.
Although these technologies differ, they are both memory virtualization technologies that can
work independently or together.
2.8.5 Active Memory Sharing with Live Partition Mobility (LPM)
Shared memory partitions are eligible for LPM operations if there is a shared memory pool in
the destination server with available, suitable paging devices to be allocated to the migrated
partition.
2.9 Integrated Virtual Ethernet
First introduced in POWER6 servers, Integrated Virtual Ethernet (IVE) was not present in the
high-end servers. With the release of POWER7 servers, this high-speed, virtualizable
network technology is now available on the IBM Power 780 servers.
Also called the Host Ethernet Adapter (HEA), IVE enables the sharing of integrated
high-speed Ethernet ports. IVE includes hardware features to provide logical Ethernet ports
for inter-partition and external network communication without using any other component,
such as the virtual I/O server.
The IVE is a physical Ethernet adapter that is connected to the GX+ bus of the Power
processor instead of being connected to the PCI buses, as illustrated in Figure 2-25 on
page 50. This configuration provides a high throughput adapter. IVE also includes hardware
features to enable the adapter to provide logical adapters. These logical adapters appear as
regular Ethernet adapters to the virtual servers.
As shown in Figure 2-24, IVE logical adapters communicate directly between the LPAR and
external networks, reducing the interaction with the hypervisor. Previously, this
communication was performed using virtual Ethernet and Shared Ethernet Adapter (SEA)
adapters by moving packets from one LPAR to another LPAR through the Power hypervisor.
Figure 2-24 SEA and IVE model comparison
Figure 2-25 shows the IVE and processor data connections.
Figure 2-25 IVE and processor data connections
At the time that this publication was written, three IVE adapters were available on the Power
780:
Feature code 1803: Four 1 Gbps Ethernet ports
Feature code 1804: Two 10 Gbps SFP+ optical (SR only) Ethernet ports and two 1 Gbps
copper Ethernet ports
Feature code 1813: Two 10 Gbps SFP+ copper twinaxial ports and two 1 Gbps Ethernet
ports
For more information about IVE features and configuration options, see the Integrated Virtual
Ethernet Adapter Technical Overview and Introduction, REDP-4340-00.
2.10 Partitioning
One of the earlier concepts that was introduced in Power Systems was the creation of LPARs.
LPARs were introduced in Power Systems beginning with POWER4™ systems. Since this
introduction, IBM has continued to improve the virtualization technologies. The goal of this
publication is to familiarize you with most of these virtual technologies.
IVE supports 64 LPARs maximum: Each IVE feature code can address up to 64 logical
Ethernet ports to support up to 64 LPARs. If you plan to have more than 64 LPARs, use the
Shared Ethernet Adapter.
Prior to the introduction of partitioning, IBM clients experienced the following situations:
Clients installed a number of related applications on a single machine, which led to
possible incompatibilities in system requirements and competition for resources:
– One example was an application requiring certain network parameter settings;
however, another application required a separate setting. These parameters might not
be valid for the third application. The use of settings, such as Asynchronous I/O (aio) in
AIX, can be used by one application, but not another application.
– Another example was an application using more CPU time than other applications,
thus slowing down the whole system. This situation is analogous to a one-tier
architecture. IBM, with AIX Version 4, attempted to remedy the situation by introducing
a feature called Workload Management. The challenge was that, even with Workload
Management, all the applications still executed in the same operating system space.
Application designers needed to separate the machines on which the applications were
installed. For each new application that was deployed, a new physical machine had to be
purchased. Clients ended up with an unmanageable number of physical machines, and
data center floors were full of machines. This situation is analogous to an n-tier system.
To avoid these situations, to make better use of deployed technology, and to gain the full
power of IT investments, server consolidation became a necessity. IBM introduced POWER4
systems that were capable of logical partitioning. Each partition was seen as an individual
machine. POWER4 logical partitioning had limitations, including the inability to perform
Dynamic Reconfigurations (DLPAR). Most DLPAR operations were hardware-based. Clients
were unable to virtualize processors.
IBM POWER improved with the virtualization of system resources. A number of publications
exist that explain more about LPARs. Consider reading IBM PowerVM Virtualization
Introduction and Configuration, SG24-7940, which describes in detail the virtualization features,
operating system support, and hardware support. In general terms, you can virtualize the
processor, memory, I/O, and operating system. A single copy of AIX can be virtualized into
multiple workload partitions (WPARs).
This section describes the process of creating an LPAR. Although we concentrate on creating
the LPAR using the IBM Systems Director Management Console (SDMC), all these steps can
be performed using an HMC.
Assumptions
We create an LPAR with the following assumptions:
We assume that the POWER7 server is installed, powered on, and connected to either an
HMC or SDMC. Chapter 4, “Planning for virtualization and RAS in POWER7 high-end
servers” on page 101 contains a discussion about planning and installation.
SDMC: Two names have changed with the SDMC:
LPARs are called virtual servers on the SDMC.
Managed servers are called hosts on the SDMC.
Naming: Be aware of the name changes between the HMC and the SDMC. We will use
both virtual server (VS) and LPAR interchangeably throughout the book. Do not confuse a
virtual server with a virtual I/O server.
We assume that the sizing of the LPAR has been determined and that the system
administrator has enough resources to meet the minimum requirements to start the
partition. You can choose to use the entire POWER7 server as a single partition system.
With this design, after the partition is started, no other partitions can be started, because
no resources remain available. You can create other partition profiles, but the system
partition must be stopped before any other partitions can be started.
Figure 2-26 shows the creation of a full system partition. Remember that you are
resource-bounded when you create a partition that uses all the resources of the server.
We also assume that you have decided which virtualization technology to use for access
to the network and storage. We discuss these existing options in this book:
– NPIV for Virtual Fibre Channel Adapter or vSCSI
– Internal disks for storage (limits mobility)
– Storage pools if required (limits mobility)
– IVE as a Logical Host Ethernet Adapter (HEA) versus a Shared Ethernet Adapter
(SEA)
We assume that you are connected to either an SDMC or an HMC as a valid user with
proper authorization to create an LPAR.
Figure 2-26 shows the window to create a single partition system.
Figure 2-26 Creating a single partition system
Technologies: You can use any combination of these technologies, depending on your
environment. You need to understand the features, as well as the challenges that exist
for each feature. For example, using HEA and not SEA hinders LPM.
You can also use internal disks along with physical adapters. This publication does not
discuss these options, because our goal is to introduce features that make your
environment virtualized, highly available, flexible, and mobile.
When connected to the SDMC, select the managed server on which to create a virtual server.
You see a window similar to Figure 2-27 when you are connected and have expanded the
Hosts selection. The HMC window differs slightly. The names of the LPARs on Figure 2-27
are IBM ITSO names. You use names that are specific to your environment.
Figure 2-27 SDMC listing of available hosts
An LPAR requires the following elements:
Processors
Memory
Storage
Ethernet access to install the operating system through Network Installation Management
(NIM)
HMC for Resource Monitoring and Control (RMC) connectivity
2.10.1 Creating a simple LPAR
We now guide you through the creation of an LPAR using the SDMC. It is possible to perform
the LPAR creation operations via the HMC, HMC command-line interface, or SDMC
command-line interface. Refer to the Hardware Management Console V7 Handbook,
SG24-7491, for command-line syntax.
Creating an LPAR
From the Power system resource menu, select the host on which to create an LPAR by
selecting Select Actions → Select the managed server (host) → System
configurations → Create Virtual Server (Figure 2-28).
Figure 2-28 Creating a virtual server on P780_02 using the SDMC
Figure 2-28 shows the creation process. This operation opens the LPAR Basic Information
tab.
Enter the following information based on your configuration:
Virtual server name (LPAR name): This is the name with which the HMC or SDMC
identifies your virtual server. This name is not the host name that is used by the
application. It is not the operating system host name. To simplify the environment, you
might make the host name and the LPAR name the same name. Example 2-13 shows an
AIX command to check whether the host name and the virtual server name differ. If you
plan to migrate your virtual server using partition mobility features, you must use unique
names for all partitions that might be migrated, across both the source and destination
hosts.
Example 2-13 Using AIX command to see LPAR name and hostname
# lparstat -i | head
Node Name : testlpar
Partition Name : lpar1
Partition Number : 6
Type : Shared-SMT
Mode : Uncapped
Entitled Capacity : 0.30
Partition Group-ID : 32774
Shared Pool ID : 0
Online Virtual CPUs : 2
Maximum Virtual CPUs : 4
#
# hostname
testlpar
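The comparison in Example 2-13 is easy to script. The sketch below stubs the two values with the ones from the example; on a real LPAR you would capture them from the hostname and lparstat -i commands:

```shell
# Stubbed values from Example 2-13; on AIX, obtain them with:
#   node_name=$(hostname)
#   partition_name=$(lparstat -i | awk -F': *' '/Partition Name/ {print $2}')
node_name=testlpar
partition_name=lpar1
if [ "$node_name" != "$partition_name" ]; then
    echo "host name and partition name differ"
fi
```

For the example's values, the check reports a difference, which is worth resolving if you want the HMC/SDMC names and operating system names to stay aligned.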
Virtual server ID: ID number used by the SDMC to identify the virtual server.
Select environment: AIX, Linux, IBM i, or virtual I/O server.
Suspend capable: A suspend-capable virtual server can be put “on hold”. Refer to 3.1,
“Live Partition Mobility (LPM)” on page 66, which discusses LPM in detail.
Setting up memory requirements
To set up the memory requirements, select the Memory Mode. Figure 2-29 shows the
available modes.
Figure 2-29 Available Memory Modes with AME and AMS
Setting up processor requirements
To set up the processor requirements, select the processor mode. The processor can be
shared or dedicated. There is a slight difference between SDMC and HMC menus. With the
HMC, you can select the initial settings for Minimum, Desired, and Maximum processor and
memory values. You can also specify virtual processors. In the SDMC, you use a single value
to create the partition. You can later modify the values and attributes using Virtual Server
Management. With the SDMC LPAR creation wizard, the memory is configured before the
processors.
Memory mode: For memory mode, you have the following options:
Dedicated memory mode, with or without Active Memory Expansion (AME)
Shared Memory (AMS) with or without AME
Figure 2-30 shows the initial processor settings and compares the SDMC settings with the
HMC settings.
Figure 2-30 Processor settings on SDMC (left) and HMC (right)
Setting up Ethernet adapters
The choices include Shared Ethernet Adapter (SEA) and Integrated Virtual Ethernet (IVE)
adapter. We explain the Shared Ethernet Adapter creation in 6.5.1, “Virtual I/O servers” on
page 220.
You can use the Integrated Virtual Ethernet adapter for the LPAR, but because we are
creating a mobile-capable LPAR, we use SEA and not the Host Ethernet Adapter.
Figure 2-31 on page 57 shows where to select the Virtual Ethernet versus the Host Ethernet
adapter.
Mobile-capable LPAR: For a mobile-capable LPAR, select a VLAN that is part of a Shared
Ethernet Adapter on one or more virtual I/O servers. Do not use the Host Ethernet Adapter.
The virtual I/O server can use PCI or IVE adapters.
Figure 2-31 Virtual Ethernet
Selecting storage adapters
For the purpose of mobility and virtualization, do not use the physical adapters. An LPAR
using AMS cannot have physical adapters allocated to it.
The SDMC provides an option to let the VIO manage your virtual adapter allocations, or you
can manage them explicitly. The options that are shown in Figure 2-32 on page 58 do not
exist on the HMC. For this example, we manage the resources manually.
We explain the options that are shown in Figure 2-32 on page 58:
Virtual Disks This option allows a selected virtual I/O server to create a logical
volume as a virtual disk to be used by the LPAR. This option is not a
recommended method of creating disks for an LPAR. The virtual I/O
server becomes a single point of failure (SPOF) and the LPAR cannot
be migrated.
Physical Volumes If you use Virtual SCSI adapters, you are able to select any physical
disk that is visible to your selected virtual I/O server. The management
of the disk allocation is done via the VIO mkvdev command.
Fibre Channel This option creates an NPIV FC adapter, which is explained in detail in
2.7.2, “N_Port ID Virtualization (NPIV)” on page 43.
Figure 2-32 Virtual adapter management
Now, select the number of virtual adapters that the LPAR can have (refer to Figure 2-33). This
number affects the adapter IDs that are created in the next section. Specify the number and
select Create. A virtual adapter creation box is shown in Figure 2-33.
Figure 2-33 Virtual adapter selection
Although the adapter ID itself does not matter, the connecting adapter ID must match the VIO
connecting adapter. This pairing creates the “path” through which the hypervisor transfers the data.
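The pairing rule can be expressed as a trivial consistency check, sketched below. All slot numbers are hypothetical:

```shell
# Hypothetical slot pairing: the client adapter's remote slot must name
# the VIOS server adapter's slot, and vice versa.
client_slot=10; client_remote_slot=4    # client adapter points at VIOS slot 4
server_slot=4;  server_remote_slot=10   # VIOS vhost adapter points at client slot 10
if [ "$client_remote_slot" -eq "$server_slot" ] &&
   [ "$server_remote_slot" -eq "$client_slot" ]; then
    echo "slot pairing consistent"
else
    echo "slot pairing MISMATCH"
fi
```

A mismatch on either side leaves the client adapter without a path, which typically shows up later as a Defined (rather than Available) device on the client.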
After you complete this step, you see a summary of the setup. Review the setup summary
and select Finish to complete the creation of the LPAR. See Figure 2-34 on page 59.
Figure 2-34 LPARS on the HMC
The created LPAR appears in the list of LPARS in the HMC, as shown in Figure 2-34, and it
also appears in the list of virtual servers in the SDMC, as shown in Figure 2-35.
Figure 2-35 Virtual Servers on the SDMC
2.10.2 Dynamically changing the LPAR configurations (DLPAR)
It is possible to change the resource configuration of a running LPAR, for example, when you
need to add or remove virtual or physical adapters, or to increase the allocated memory or
processors on a running partition. When performing a dynamic LPAR operation, decide
whether the resource must be added permanently to the LPAR or only temporarily. If you
need the resource permanently, you must also add it to the LPAR profile.
Reconfiguring adapters
To add the adapter dynamically, the adapter must be available and not in use by any other
partition. If there is another partition using the adapter, the adapter can be removed using a
dynamic LPAR operation, which is shown in “A dynamic LPAR operation using the HMC” on
page 359. Do not try to allocate resources in a partition profile as required, unless they are
actually critical to the operation. Adapters that are allocated as required cannot be removed
from a running LPAR. Consider this rule when creating an LPAR profile.
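The same dynamic add can also be performed from the HMC command line with the chhwres command. The sketch below only assembles the invocation; the managed system, partition, virtual I/O server name, and slot number are hypothetical, and you should verify the exact attribute syntax in SG24-7491 before running it:

```shell
# Assemble a hypothetical chhwres invocation that dynamically adds a
# client vSCSI adapter; nothing is executed here.
managed=P780_02; lpar=lpar1; vios=vios1; slot=4
cmd="chhwres -r virtualio --rsubtype scsi -m $managed -o a -p $lpar -a adapter_type=client,remote_lpar_name=$vios,remote_slot_num=$slot"
echo "$cmd"
```

As with the SDMC flow, a dynamic change made this way is not persistent; add the adapter to the partition profile if it must survive a reactivation.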
Reconfiguring an adapter using SDMC
Refer to the IBM Systems Director Management Console: Introduction and Overview,
SG24-7860, for more LPAR operations using the SDMC. In this example, we only show the
adapter reconfiguration operations using the SDMC. You can also reconfigure an adapter
using the HMC.
Before adding an adapter, confirm which adapters are present in the LPAR, as shown on
Example 2-14. Then, continue with the steps to add the adapters to the virtual server.
Example 2-14 Listing available adapters
# lsdev -Cc adapter
ent0 Available Virtual I/O Ethernet Adapter (l-lan)
fcs0 Available 20-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 21-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 22-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 23-T1 Virtual Fibre Channel Client Adapter
vsa0 Available LPAR Virtual Serial Adapter
Follow these steps to add the adapters to the virtual server:
1. Log on to the SDMC and select Virtual Servers → Virtual LPAR → Actions, as
summarized and shown in Figure 2-36.
Figure 2-36 Selecting Actions on an LPAR
2. The Actions menu appears. Select System Configuration → Manage Virtual Server, as
shown in Figure 2-37.
Figure 2-37 Managing an LPAR (Virtual Server) configurations
3. Now, we add a virtual adapter. The SDMC differentiates between the physical I/O adapters
and the virtual adapters. The storage adapters refer to PowerVM virtual adapters. If you
are adding physical adapters, you select the physical I/O. In this case, we use virtual
devices. Select Storage Adapters → Add. Figure 2-38 shows adding a virtual adapter.
Figure 2-38 Adding a virtual adapter dynamically
4. Click Apply. If you do not select Apply, the dynamic LPAR operation is rolled back.
5. On the LPAR command line, run the cfgmgr command to configure the dynamically added
adapter, as shown in Example 2-15.
Example 2-15 Configuring an adapter with the cfgmgr command
# lsdev -Cc adapter
ent0 Available Virtual I/O Ethernet Adapter (l-lan)
fcs0 Available 20-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 21-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 22-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 23-T1 Virtual Fibre Channel Client Adapter
vsa0 Available LPAR Virtual Serial Adapter
#
# cfgmgr
# lsdev -Cc adapter
ent0 Available Virtual I/O Ethernet Adapter (l-lan)
fcs0 Available 20-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 21-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 22-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 23-T1 Virtual Fibre Channel Client Adapter
vsa0 Available LPAR Virtual Serial Adapter
vscsi0 Defined Virtual SCSI Client Adapter
vscsi1 Available Virtual SCSI Client Adapter
# cfgmgr
Method error (/usr/lib/methods/cfg_vclient -l vscsi0 ):
0514-040 Error initializing a device into the kernel.
6. If the adapter must be added permanently, you can add it into the profile. One way is to
save the current configurations from the system Virtual Server Management tab. Select
Tasks → Save current configurations, as shown in Figure 2-39 on page 63.
Virtual I/O server connection: Figure 2-38 on page 61 shows the connecting device
information. For a VIO client, you select the virtual I/O server to which this device
connects. If this device was created for a virtual I/O server, the heading of the list box is
“Connecting virtual server”, but the drop-down list contains other LPARs and not virtual
I/O servers.
Note: The cfgmgr command in Example 2-15 shows a method error on vscsi0; vscsi1
has an associated virtual I/O server adapter and therefore shows as Available rather
than Defined.
The resource that was created requires a connecting resource on a virtual I/O server,
and that resource does not exist. You must have a Virtual Server adapter for each Client
adapter. Refer to IBM Systems Director Management Console: Introduction and
Overview, SG24-7860.
Note: The adapter reconfiguration using the HMC is shown in “A dynamic LPAR
operation using the HMC” on page 359. We used the steps that are shown in the
appendix to dynamically remove the adapter that was added in this example.
Figure 2-39 Permanently saving a dynamic LPAR operation
Memory and processor DLPAR operations
When performing operations on your managed server, certain resources are used by the
hypervisor. The system must have enough available resources to allocate to the LPAR. If you
are removing resources from the LPAR, this requirement does not apply. The dynamic LPAR
(DLPAR) operation for both memory and processor reallocation is similar to the storage
adapter example that was shown previously, except that you select Processor or Memory
instead of the storage adapters.
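The same chhwres interface covers memory and processor DLPAR changes from the HMC command line. A hedged sketch follows; the system and LPAR names and the quantities are illustrative assumptions, and the commands are printed rather than executed:

```shell
# Sketch of HMC CLI forms for dynamic memory and processor changes.
# Names and quantities are illustrative assumptions; the commands are
# printed, not executed.
MANAGED_SYS="Server-9119-FHB-SN021234A"
MEM_ADD="chhwres -r mem -m $MANAGED_SYS -o a -p lpar01 -q 1024"     # add 1024 MB
PROC_REM="chhwres -r proc -m $MANAGED_SYS -o r -p lpar02 --procs 1" # remove 1 dedicated processor
echo "$MEM_ADD"
echo "$PROC_REM"
```

Pairing a removal on one LPAR with an addition on another is how resources are moved between partitions when none are free.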
If there are no resources available to allocate, you can reduce resources from other LPARs
using dynamic LPAR, and then move them to the requesting partition.
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 3. Enhancing virtualization and
RAS for higher availability
This chapter covers details of the RAS and virtualization features of the IBM Power Systems
Enterprise Servers to help provide high systems availability to the applications hosted in
these servers.
In this chapter, we describe the following topics:
Live Partition Mobility (LPM)
WPAR
Partition hibernation
IBM PowerHA SystemMirror
IBM Power Flex
Cluster Aware AIX (CAA)
Electronic services and electronic service agent
3.1 Live Partition Mobility (LPM)
Live Partition Mobility is a PowerVM feature that allows you to move an LPAR from one
physical Power system server to another physical Power system server. The following
process shows the manual steps that are required to move a partition if you do not have LPM
functionality. You can skip the manual steps and go directly to 3.1.1, “Partition migration” on
page 67.
We list the steps to move a partition from one server to another server. This process is
cumbersome and requires manual user intervention. We explain the following steps in more
detail in 4.2.7, “Planning for Live Partition Mobility (LPM)” on page 115:
1. Create a new LPAR (client LPAR) on the destination server.
2. Create client virtual adapters for the LPAR on the destination server.
3. Dynamically create server virtual adapters for the virtual I/O server for the client.
4. Update the destination virtual I/O server profile with the same virtual adapters that are
added dynamically.
5. Either create client VTD mappings if using virtual SCSI (vSCSI) for storage or create and
zone N_Port ID Virtualization (NPIV) Fibre Channels (FCs). The disks being mapped must
be the exact disks that are used by the moving partition on the source server.
6. Shut down the moving partition from the source server.
7. Start the partition that you created on the destination.
8. Remove the source LPAR virtual adapter from the virtual I/O server both dynamically and
on the profile.
9. Remove the VTD mappings and NPIV mapping on the source virtual I/O server.
10.Remove the client partition from the source server.
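Step 5, for example, maps a backing disk to the client's vhost adapter on the destination virtual I/O server. The sketch below prints an illustrative form of that VIOS command; the device names are assumptions for this example:

```shell
# Sketch of the virtual I/O server command for step 5: creating a vSCSI
# virtual target device (VTD) mapping. Device names are illustrative
# assumptions; the command is printed, not executed.
VTD_CMD="mkvdev -vdev hdisk4 -vadapter vhost0 -dev vtd_lpar01"
echo "$VTD_CMD"
```

The disk given to -vdev must be the exact disk used by the moving partition on the source server.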
LPM automates these steps, making the process faster because the manual intervention is
minimized.
Using LPM offers many advantages, including the ability to migrate a partition without having
to stop the application. The IBM PowerVM Live Partition Mobility, SG24-7460, publication
explains LPM in detail. This IBM Redbooks publication also provides the reasons to use LPM
and the advantages of LPM. The advantages include server consolidation, saving electricity,
the ability to perform scheduled maintenance of the server, such as firmware upgrades, and
the ability to handle growth where the current system cannot handle the load. For example,
you can move a workload from a POWER6 550 to a POWER 780 by using LPM.
We only briefly introduce the LPM technology in this book, because the IBM PowerVM Live
Partition Mobility, SG24-7460, publication explains LPM in detail. The Redbooks publication
further details all components that might be involved in the mobility process, several of which
might be transparent to the administrator. The following components are involved in the
mobility process:
Systems Director Management Console (SDMC) or Hardware Management Console
(HMC)
Resource Monitoring and Control (RMC)
Moving a non-PowerVM environment: It is possible to move a non-PowerVM
environment, which we explain in 4.6, “Migrating from POWER6 to POWER7” on
page 134. This procedure is still valid for systems prior to POWER6.
Dynamic LPAR resource manager
Virtual Asynchronous Service interface (VASI)
Time reference
Mover service partition
Power hypervisor
Partition profile
Virtual I/O server
Migration road map for LPARs
The following methods are used to migrate a partition from one server to another server. We
list the methods in order of our preference and recommendation. We discuss these methods
in 4.2.7, “Planning for Live Partition Mobility (LPM)” on page 115.
Active migration
Inactive migration
Manual migration using virtual I/O server but not the LPM option
Manual migration by pointing SAN volumes to the new server
Manual migration using alt_disk_copy
Manual migration using mksysb
High-availability clusters versus LPM
LPM complements cluster technologies, but it does not provide cluster functionality. Table 3-1
explains the differences.
Table 3-1 Clusters versus LPM

Cluster: Handles unplanned failures of applications, servers, and components.
LPM: Handles planned migration of LPARs from one server to another server.

Cluster: The operating system is not the same image. Updates and changes must be
applied on each node in the cluster.
LPM: Same operating system image.

Cluster: Basic operating system configurations, such as the IP addresses, might not be
the same.
LPM: Same operating system image.

3.1.1 Partition migration
Partition Mobility uses two methods:
Inactive: The services are stopped and the LPAR is shut down.
Active: Services remain running.
The SDMC/HMC controls whether the migration is active or inactive based on the partition
status. If the partition is shut down, it assumes an inactive migration. If it is running, it
assumes an active migration. The SDMC/HMC also performs a validation. If the requirements
for LPM are not met, the migration is not attempted.
Two methods exist for performing the migration:
Point in time: This option is administrator driven and requires an administrator to be
connected to the HMC and to follow a simple wizard.
Automated: This option can be incorporated and scheduled into system management
scripts.
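A scripted migration of this kind is typically built around the HMC migrlpar command. The sketch below prints an illustrative invocation; the server and LPAR names are assumptions for this example:

```shell
# Sketch of a scriptable HMC CLI migration (the "automated" option).
# Server and LPAR names are illustrative assumptions; the command is
# printed, not executed.
MIG_CMD="migrlpar -o m -m SourceServer -t TargetServer -p lpar01"
echo "$MIG_CMD"
```

Such a command can be placed in a cron job or systems management script once validation has succeeded.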
3.1.2 Migration preparation
In both active and inactive migrations, you must ensure that you meet the prerequisites before
LPM can be executed. Table 3-2 lists the prerequisites.
Prepare the SDMC/HMC for migration. The SDMC/HMC controls the migration process. It
copies LPAR profile information from the source server to the destination server’s hypervisor.
Use Dynamic LPAR operations to remove physical or dedicated devices. Depending on
the LPAR profile, it might not be possible to dynamically remove specific resources,
especially physical I/O adapters that are marked as required. In this case, you have to shut
down the partition.
Run the migration validation. At the validation stage, no resource changes are done.
When the validation is complete, the SDMC/HMC on the destination server creates a shell
partition that is similar to the source server. This shell partition ensures that the required
resources are reserved.
Only partition profiles that have been activated can be migrated. To make sure that a profile is
capable of being migrated, activate that profile. You do not need to start the operating system.
You can activate the profile into System Management Services (SMS) mode and then shut it
down again.
Selecting a capable virtual I/O server
Because a virtual I/O server is a requirement for the migration process, you can confirm
whether a capable virtual I/O server exists by running the migration validation. An example of
validation is shown in Figure 3-1 on page 71. You can also use the lslparmigr command.
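From the HMC command line, the validation and the status listing take roughly the following forms. The sketch prints illustrative invocations; the server and LPAR names are assumptions for this example:

```shell
# Sketch of checking migration capability from the HMC CLI: validate a
# migration (-o v) and list partition migration status. Names are
# illustrative assumptions; the commands are printed, not executed.
VAL_CMD="migrlpar -o v -m SourceServer -t TargetServer -p lpar01"
LIST_CMD="lslparmigr -r lpar -m SourceServer"
echo "$VAL_CMD"
echo "$LIST_CMD"
```

Validation makes no resource changes, so it is safe to run it repeatedly while working through the checklist in Table 3-2.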
Selecting a capable mover service
At least one virtual I/O server in the same VLAN as the moving partition must be selected as
a mover service partition. This mover service partition is a requirement for migrating an active
partition.
Table 3-2 lists items to consider before attempting the migration. Use the table as a checklist
or quick guide. Although these considerations are listed under the LPM discussion, most of
them apply to manual migration, as well.
Table 3-2 Migration readiness checklist

Consideration | Requirements | Valid for
POWER6 and later | Confirm that all devices are supported between source and destination | Active and inactive migration
CPU and memory resources | Make sure that the destination server has available resources | Active migration reserves resources; an inactive migration does not need to reserve resources
Server on battery | The source server can be running on battery power; the destination server cannot be running on battery power | Destination server only
Logical memory block (LMB) size | Same size between source and destination server | Only active migration
Barrier synchronization registers (BSR) | The active migration LPAR must not use BSR | Only active migration
Large pages | An active mobile LPAR cannot use large pages; inactive LPARs can use large pages | Only active migration
Storage pools | Not supported | Active and inactive migration
Logical volume VTDs | Not supported | Active and inactive migration
SAN | Additional settings must be confirmed, including the reserve policy | Active and inactive; must be vSCSI or NPIV
Internal disks | Not supported | Active and inactive migration
IVE | Supported only if it is part of a shared Ethernet adapter (SEA) on a virtual I/O server; all clients must use the SEA | Active and inactive migration
Other physical adapters | Remove all physical adapters using DLPAR | Active migration; these adapters are removed with inactive migration
VLAN | Both virtual I/O servers must access the VLAN used by the mobile LPAR | Active and inactive migration
LPAR name | Must be unique across the source and destination | Active and inactive migration
LPAR state | Depends on migration | Shut down for inactive and running for active
HMC communication with virtual I/O server | RMC communication is required between the HMC and a mover service partition | Only active migration
Inter HMC/SDMC ssh keys | Allows one managed server to communicate with another without prompting for a password | Remote HMC/SDMC
HMC communication with moving partition | RMC communication is required between the HMC and the LPAR to get the memory state | Only active migration
HMC versions | HMC Version 7 Release 7.1.0 SP1 or later | Active and inactive migration
Virtual I/O server as an MSP | At least one virtual I/O server on the source and destination must be used as an MSP partition | Only valid for active migration
Virtual I/O server version | Minimum 2.1.12 or higher required on POWER7; remote migration | Active and inactive migration
3.1.3 Inactive migration
Before performing an inactive migration, check that the partition is shut down. Inactive
migration does not need a mover service partition. Check that you meet the requirements that
are specified in Table 3-2 on page 68. Notice that inactive migration requires fewer
prerequisites than active migration. Refer to 6.3.1, “Inactive migration from POWER6 to
POWER7 using HMC and SDMC” on page 212 where an example of inactive migration is
performed.
3.1.4 Active migration
Active migration is performed with running clients. A Mover Service Partition, which can be
any virtual I/O server on both the source and destination system, is required to keep the
services available. The SDMC/HMC copies the physical memory to the destination server.
LPM keeps the applications running. Regardless of the size of the memory that is used by the
partition, the services are not interrupted, the I/O continues accessing the disk, and network
connections keep transferring data. The IBM PowerVM Live Partition Mobility, SG24-7460,
publication lists the following memory states and the migration sequence. LPAR active run
time states that are copied to the destination server include the following information:
Partition’s memory
Hardware page table (HPT)
Processor state
Virtual adapter state
Non-volatile RAM (NVRAM)
Time of day (ToD)
Partition configuration
State of each resource
The mover service partitions on the source and destination, under the control of the
SDMC/HMC, move these states between the two systems. See the flow indicators in
Figure 3-1 on page 71.
For active partition migration, the transfer of the partition state follows this path:
1. From the mobile partition to the source system’s hypervisor
2. From the source system’s hypervisor to the source mover service partition
3. From the source mover service partition to the destination mover service partition
4. From the destination mover service partition to the destination system’s hypervisor
5. From the destination system’s hypervisor to the partition shell on the destination system
Type of LPAR (Table 3-2, continued) | Only AIX and Linux; Virtual I/O server and IBM i partitions cannot be migrated; ensure that the version supports the hardware | Active and inactive migration
Figure 3-1 Active migration partition state transfer path
We show migration scenarios in 6.3, “Live Partition Mobility (LPM) using the HMC and SDMC”
on page 212 and in 6.4, “Active migration example” on page 216.
3.2 WPAR
The IBM PowerVM offering extends the virtualization further to include software virtualization.
This function is called Workload Partition (WPAR). A logical partition (LPAR) is a
hardware-based partitioning feature that allows you to create multiple independent operating
system environments, which are called LPARs. Each LPAR can run a version of either AIX,
Linux, or IBM i. A WPAR is built within an AIX partition. Therefore, WPARs are
software-created, virtualized operating system environments within a single instance of the
AIX operating system.
Each WPAR is seen as a separate operating system that is independent of any other WPAR
within the same LPAR. We differentiate between a WPAR and an LPAR by referring to the
LPAR in which a WPAR operates as the Global Environment, because the LPAR has a global
view of all resources, and the WPARs are created within the LPAR. Each WPAR hosts
applications that are invisible to other WPARs within the same Global Environment. The
Global Environment is an AIX LPAR. We suggest that you do not use the LPAR to host
applications while it hosts WPARs.
You define the LPAR profile for the Global Environment on the hypervisor, and hardware
devices are attached to the Global Environment. You can see the Global Environment LPAR
in the system partition list on the HMC, or as a Virtual Server on the SDMC. You cannot see a
WPAR on the HMC/SDMC. The Global Environment has full control of the WPARs, but a
WPAR cannot view the Global Environment. WPARs on the same LPAR cannot overlap, and
one WPAR is not aware of any other WPAR in the same AIX Global Environment. Two or
more WPARs can communicate only by using standard TCP/IP, although the actual
communication is a loopback.
We describe the level of granularity by using the following hierarchy:
Managed server: This server is a Power System. In our environment, it is an IBM
POWER7.
An LPAR: Within a managed server, you create one or more LPARs. Each partition is
capable of running its own operating system space. The operating system version might
differ per LPAR, which is called the Global Environment.
WPAR: The concept of a WPAR was introduced in AIX 6.1.
A WPAR: Inside a single running copy of AIX, you can create partitions (WPARs). These
partitions have the same characteristics as an independent LPAR. All WPARs must be on
the same operating system level as the global LPAR. Detached/rootvg WPARS have a
separate /usr and /opt, and can have software installed in them that is not part of the LPAR
(Global Environment).
An LPAR can be created on an IBM Power server starting from POWER4 and later. A WPAR
does not depend on the hardware. It is a software virtualization feature. The minimum
operating system requirement is AIX Version 6.1 or later. Thus, if you can install AIX Version
6.1 on a Power server, you can set up a WPAR.
An application sees a WPAR as an LPAR, and the WPAR still has the following
characteristics of an LPAR:
Private execution environments
Isolation from other processes outside the WPAR
Dedicated network addresses and filesystems
Interprocess communication that is restricted to processes executing only in the same
Workload Partition
Because a system WPAR can be viewed as an independent operating system environment,
consider separating the WPAR system administrator from the Global system administrator.
3.2.1 Types of WPARs
This section describes the kinds of WPARs.
System WPAR
A System WPAR has the following characteristics:
A System WPAR is a flexible, complete copy of an AIX instance.
It has a mixture of shared and dedicated file systems.
Each system WPAR has separate init processes, daemons, users, resources, file systems,
user IDs, process IDs, and network addresses. Applications and interprocess
communication (IPC) are restricted to processes running in the same workload partition.
Can be attached or detached WPARs:
– Attached System WPARs have shared /opt and /usr with the Global Environment.
– Detached System WPARs have dedicated /opt and /usr, separate from the Global Environment.
Application WPAR
An Application WPAR is a process-based workload environment. It starts when a process is
called and stops when the process terminates. The Application WPAR cannot be detached
from the Global Environment. Applications in workload partitions are isolated in terms of
process and signal, and they can be isolated in the file system space.
An Application WPAR has the process isolation that a System WPAR provides, except that it
shares file system namespace with the Global Environment and any other Application WPAR
that is defined in the system. Other than the application itself, a typical Application WPAR
runs an additional lightweight init process in the WPAR.
Figure 3-2 on page 73 shows a diagram that was adapted from the Exploiting IBM AIX
Workload Partitions, SG24-7955, publication. Refer to this publication for more information
about WPARs.
Figure 3-2 Three WPARs within an LPAR
WPARs have improved since AIX 6.1. Most of the improvements, as well as detailed
information, are discussed in Exploiting IBM AIX Workload Partitions, SG24-7955. Table 3-3
summarizes the improvements since WPARs in AIX 6.1.
Table 3-3 WPAR improvements with the AIX 6.1 and later operating systems

AIX 6.1 Base Level (GA): Initial support, including mobility using synchronous
checkpoint/restart; first WPAR Manager release
AIX 6.1 TL1: Network File System (NFS) support for WPARs
AIX 6.1 TL2: Asynchronous mobility; per-WPAR routing; name-mapped network interfaces;
Network Installation Management (NIM) support for WPARs
AIX 6.1 TL3: Storage disk device support
AIX 6.1 TL4: rootvg WPARs; SAN mobility; WPAR Manager integration with IBM Systems
Director; VxFS support
AIX 6.1 TL5: WPAR Error Logging Framework (RAS)
AIX 6.1 TL6: Virtual SCSI (vSCSI) disk support; WPAR migration to AIX 7.1
Advantages of WPARs over LPARs
WPARs have the following advantages over LPARs:
WPARs are much simpler to manage, and unlike LPARs, they can be created from the
AIX command line or through SMIT.
It is a requirement to install patches and technology upgrades to every LPAR. Each LPAR
requires its own archiving strategy and disaster recovery strategy. However, this
requirement is not the same with a WPAR, because it is part of a single LPAR.
As many as 8,000 WPARs can be created in a single LPAR, which means that 8,000
applications can run in an isolated environment.
Rather than a replacement for LPARs, WPARs are a complement to LPARs. WPARs allow you
to further virtualize application workloads through operating system virtualization. WPARs
allow new applications to be deployed much more quickly, which is an important feature.
3.2.2 Creating a WPAR
In this section, we show an example of creating, starting, and stopping a WPAR, as shown in
Example 3-1. Creating this WPAR took one minute and 15 seconds. Example 3-1 only shows
the beginning and ending output lines of the WPAR creation process.
For further WPAR administration and management, refer to Chapter 4 of Exploiting IBM AIX
Workload Partitions, SG24-7955.
Example 3-1 Creating a WPAR on a global called rflpar20
# uname -a
AIX rflpar20 1 6 00EE14614C00
# mkwpar -n wpartest1 -N address=172.16.21.61 netmask=255.255.252.0
mkwpar: Creating file systems...
/
/home
/opt
/proc
/tmp
/usr
/var
Mounting all workload partition file systems.
x ./usr
x ./lib
x ./admin
x ./admin/tmp
x ./audit
x ./dev
x ./etc
x ./etc/check_config.files
x ./etc/consdef
x ./etc/cronlog.conf
x ./etc/csh.cshrc
(Table 3-3, continued) AIX 7.1 Base Level (GA): Everything that is supported in AIX 6.1,
plus Fibre Channel (FC) adapter support, Versioned WPARs running AIX 5.2, and
Trusted Kernel extension support
.
.
. A few lines were skipped
.
.
rsct.core.hostrm 3.1.0.1 ROOT COMMIT SUCCESS
rsct.core.microsensor 3.1.0.1 ROOT COMMIT SUCCESS
syncroot: Processing root part installation status.
syncroot: Installp root packages are currently synchronized.
syncroot: RPM root packages are currently synchronized.
syncroot: Root part is currently synchronized.
syncroot: Returns Status = SUCCESS
Workload partition wpartest1 created successfully.
mkwpar: 0960-390 To start the workload partition, execute the following as root:
startwpar [-v] wpartest1
#
After creating the WPAR, you can see it and start it, as shown in Example 3-2.
Example 3-2 Listing and starting the WPAR
# lswpar
Name State Type Hostname Directory RootVG WPAR
-----------------------------------------------------------------
wpartest1 D S wpartest1 /wpars/wpartest1 no
#
#
#
# startwpar wpartest1
Starting workload partition wpartest1.
Mounting all workload partition file systems.
Loading workload partition.
Exporting workload partition devices.
Starting workload partition subsystem cor_wpartest1.
0513-059 The cor_wpartest1 Subsystem has been started. Subsystem PID is 6553710.
Verifying workload partition startup.
#
#
# lswpar
Name State Type Hostname Directory RootVG WPAR
-----------------------------------------------------------------
wpartest1 A S wpartest1 /wpars/wpartest1 no
Because a WPAR behaves like a normal LPAR, we can access it, as shown in Example 3-3.
Example 3-3 Accessing a WPAR using telnet
# telnet 172.16.21.61
Trying...
Connected to 172.16.21.61.
Escape character is '^]'.
telnet (wpartest1)
AIX Version 6
Copyright IBM Corporation, 1982, 2010.
login:
In Example 3-4, we show a few AIX commands that have executed within the WPAR.
Example 3-4 Executing AIX commands within a WPAR
AIX Version 6
Copyright IBM Corporation, 1982, 2010.
login: root
*******************************************************************************
* *
* *
* Welcome to AIX Version 6.1! *
* *
* *
* Please see the README file in /usr/lpp/bos for information pertinent to *
* this release of the AIX Operating System. *
* *
* *
*******************************************************************************
# hostname
wpartest1
# ifconfig -a
en0:
flags=1e080863,480
inet 172.16.21.61 netmask 0xfffffc00 broadcast 172.16.23.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
Global 196608 138576 30% 1845 11% /
Global 65536 63808 3% 5 1% /home
Global 884736 555544 38% 8394 11% /opt
Global - - - - - /proc
Global 196608 193376 2% 10 1% /tmp
Global 5505024 1535416 73% 41390 18% /usr
Global 262144 110016 59% 2797 19% /var
# exit
Connection closed.
In Example 3-5, we stop and remove the WPAR.
Example 3-5 Stopping and removing a WPAR
# stop wpar wpartest1
ksh: wpar: Specify a process identifier or a %job number.
# stopwpar wpartest1
Stopping workload partition wpartest1.
Stopping workload partition subsystem cor_wpartest1.
0513-044 The cor_wpartest1 Subsystem was requested to stop.
stopwpar: 0960-261 Waiting up to 600 seconds for workload partition to halt.
Shutting down all workload partition processes.
wio0 Defined
Unmounting all workload partition file systems.
# rmwpar wpartest1
rmwpar: Removing file system /wpars/wpartest1/var.
rmlv: Logical volume fslv03 is removed.
rmwpar: Removing file system /wpars/wpartest1/usr.
rmwpar: Removing file system /wpars/wpartest1/tmp.
rmlv: Logical volume fslv02 is removed.
rmwpar: Removing file system /wpars/wpartest1/proc.
rmwpar: Removing file system /wpars/wpartest1/opt.
rmwpar: Removing file system /wpars/wpartest1/home.
rmlv: Logical volume fslv01 is removed.
rmwpar: Removing file system /wpars/wpartest1.
rmlv: Logical volume fslv00 is removed.
3.2.3 Live Application Mobility
WPARs also have the capability to be actively relocated (or migrated) from one AIX LPAR to
another AIX LPAR. This process is called Live Application Mobility. Live Application Mobility
refers to the ability to relocate a WPAR from one Global AIX LPAR to another Global AIX
LPAR. It uses checkpoint/restart capabilities that allow the WPAR to hold the application
state. The Global LPAR can be on the same server or a separate server, which shows the
flexibility and portability of a WPAR. Live Application Mobility is an operating system feature
and independent of the hardware. Because a WPAR resides over an LPAR, migrating the
LPAR to a separate server using LPM also migrates the WPARs within the LPAR. The WPAR
has three ways of migrating:
Explicit migration of a WPAR to a separate LPAR (Global Environment) within the same
server
Explicit migration of a WPAR to a separate LPAR on a separate server
Implicit migration of a WPAR due to the Global Environment migration using LPM. In this
case, the WPAR remains part of the same LPAR.
Any hardware that can run AIX 6.1 is supported for Live Application Mobility. That is, you can
migrate a WPAR from a Global Environment running on a POWER5 server to a Global
Environment running on a POWER7 server and vice versa. LPM is a POWER6 and POWER7
feature.
Mobility can be inactive or active. In an inactive migration, a WPAR has to be shut down. In an
active migration, the WPAR is migrated while the applications are active.
Explicit mobility can be performed in one of two ways
You can perform explicit mobility by using either an NFS-mounted WPAR migration or a
rootvg WPAR:
NFS-mounted WPAR migration
The relocation of a WPAR involves moving its executable code from a source LPAR to a
destination LPAR while keeping application data on a common Network File System (NFS)
that is visible and accessible to both the source and destination LPARs. AIX operating
system binaries can be stored in file systems that are local to the hosting LPAR.
Figure 3-3 shows an NFS-mounted WPAR migration.
Figure 3-3 NFS-mounted WPAR migration
The setup that is shown in Figure 3-3 consists of three nodes: NODE_A, NODE_B, and
NODE_C. NODE_A is the source LPAR, which hosts WPAR1. NODE_B is the destination
LPAR to which WPAR1 will be migrated. NODE_C is an NFS server that is required to
support workload partition mobility. Before creating WPAR1, create its file systems (/,
/var, /tmp, /home, and the application file systems) on the NFS server. These file systems
are exported to Node_A, Node_B, and WPAR1. While creating WPAR1, its file systems
are mounted on Node_A and WPAR1. When WPAR1 migrates from Node_A to Node_B,
its file systems are mounted on Node_B and unmounted from Node_A. In this way, the
WPAR migration has to rely on common NFS file systems that are hosted on a separate
NFS server.
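As a sketch of this setup, the WPAR's file systems can be placed on the NFS server at creation time. The host name nfssrv and the export paths below are assumptions, not values from this scenario; the -c flag marks the WPAR as checkpointable, which mobility requires:

```shell
# Sketch only: create a mobile system WPAR whose file systems live on NFS.
# "nfssrv" and the /exports/wpar1/* paths are assumed for illustration.
mkwpar -n wpar1 -h wpar1 -c \
    -M directory=/     vfs=nfs host=nfssrv dev=/exports/wpar1/root \
    -M directory=/var  vfs=nfs host=nfssrv dev=/exports/wpar1/var  \
    -M directory=/home vfs=nfs host=nfssrv dev=/exports/wpar1/home

startwpar wpar1     # the same NFS exports must also be reachable from Node_B
```

Because the application data stays on Node_C, relocating WPAR1 to Node_B only has to mount the same exports there.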
However, the drawback of this setup is that certain applications might not support data
access over NFS. To eliminate the need for NFS services, the concept of the “rootvg
WPAR” was introduced.
rootvg WPAR
In Figure 3-4, SAN disks are allocated to both Node A and Node B. The disks are shared
across both nodes. The SAN storage disks are assigned to the System WPARs while
creating the WPAR itself. The root file systems (/, /usr, /opt, /home, /tmp and /var file
systems) of the WPAR are created on the storage disks that are assigned to the WPAR.
Figure 3-4 SAN-based WPAR mobility
WPAR1 is assigned the disks from the SAN subsystem, and these disks are also seen by
Node B. Here, the migration can be done from NODE A to NODE B without needing NFS
services. This kind of setup has been supported since AIX 6.1 TL4.
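A minimal sketch of creating such a rootvg WPAR on a shared SAN disk follows. The disk name hdisk2 is an assumption; it must be zoned so that both Node A and Node B can see it:

```shell
# Sketch only: rootvg WPAR on a SAN disk visible to both nodes (AIX 6.1 TL4+).
# hdisk2 is assumed; -c makes the WPAR checkpointable for mobility.
mkwpar -n wpar1 -c -D devname=hdisk2 rootvg=yes

startwpar wpar1
lswpar -M wpar1    # confirm that /, /usr, /opt, /home, /tmp, and /var sit on the SAN disk
```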
3.3 Partition hibernation
POWER7 provides another virtualization feature: hibernation or suspend/resume. With LPM,
the memory state is transferred from one server hypervisor to another server hypervisor.
Hibernation takes the memory state and stores it on a non-volatile storage device, which
provides the ability to suspend a running partition and restart it at a later stage. On resume,
the hypervisor reads the memory structures from a virtual I/O server back into the partition so
that all applications that were running can continue where they left off. Resuming a partition is
not limited to a single server. You can suspend a partition on one server, move it with inactive
partition mobility, and resume it on another server.
WPAR Manager: WPAR migration requires the use of WPAR Manager, which is a plug-in
to the IBM Systems Director and the SDMC.
For example, you suspend a data warehouse partition on a Power 795 to run payroll. The
payroll runs longer than expected and you need the data warehouse server up and running.
You can migrate the suspended data warehouse partition to a Power 780, and resume it
there.
You might suspend a partition for the following reasons:
Long-running applications. Batch runs can be suspended during online periods and
resumed at the end of the day.
Disruptive firmware upgrades. (Firmware updates, by contrast, are not disruptive.)
Disruptive hardware maintenance. CEC Hot Add Repair Maintenance (CHARM) allows
certain hot hardware maintenance.
LPARs that use IVE or physical adapters that cannot be migrated due to LPM
prerequisites, or situations where you do not have a server with spare capacity to which
to migrate. Previously, you had to stop the applications and start over.
Consider these factors before implementing partition hibernation:
HMC Version 7 Release 7.3
POWER7 server
The partition can run in either POWER6 or POWER7 compatibility mode
The partition must be set up as suspend-capable
AIX 7.1 SP1 or AIX 6.1 TL6 SP1 (Linux and IBM i are not supported)
PowerVM Virtual I/O Server 2.2.0
Storage pools (used to save the partition state)
No huge pages
The virtual I/O server cannot be suspended, but it can be restarted while clients are
suspended, because the memory states are on non-volatile storage.
To configure partition hibernation, use the following steps:
1. Confirm that your server is capable of suspend and resume. On the HMC, select Systems
Management → Servers, and select the server that you want. Click Properties →
Capabilities. See Figure 3-5 on page 81.
Active Memory Sharing: If hibernation is used with Active Memory Sharing (AMS), plan for
a paging device that is larger than the AMS memory pool, because suspend/resume uses
the same paging devices that AMS uses.
Figure 3-5 System capabilities for suspend resume
2. Make sure that the LPAR is suspend-capable. Select the LPAR → Properties. See
Figure 3-6.
Figure 3-6 Suspend and resume option for the LPAR
Suspend and resume is a dynamic feature, and it can be enabled or disabled on a running
partition. After you confirm the system capability and LPAR ability to be suspended, you need
to configure the storage pools on one or more virtual I/O servers. IBM PowerVM Virtualization
Managing and Monitoring, SG24-7590, explains this concept in detail with examples.
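The same steps can also be driven from the HMC command line. The sketch below assumes a managed system named SERVER795 and a partition named dwh; the chlparstate and lssyscfg invocations follow the HMC CLI, but verify the options against your HMC release:

```shell
# Sketch only: suspend a partition, check its state, and resume it later.
chlparstate -m SERVER795 -p dwh -o suspend

lssyscfg -m SERVER795 -r lpar --filter lpar_names=dwh -F name,state
# a suspended partition reports the state "Suspended"

chlparstate -m SERVER795 -p dwh -o resume
```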
3.4 IBM PowerHA SystemMirror
PowerHA provides a highly available cluster environment that enhances business continuity.
PowerHA provides an infrastructure that enables mission-critical resources to remain
available in case of failures. It also allows the quick recovery of an application to another
server if one or more servers fail.
Several factors make PowerHA a beneficial utility:
PowerHA is a mature cluster technology; it has been available (originally as High Availability
Cluster Multi-Processing, or HACMP™) since 1992.
It can be deployed on standard hardware. If the hardware supports the required version of
AIX, IBM i, or Linux, PowerHA can be installed.
PowerHA is flexible. It allows you to change many configurations without having to shut
down the cluster, thus eliminating planned downtime. This capability is known as Dynamic
Automatic Reconfiguration (DARE).
PowerHA complements virtualization technologies, including Capacity on Demand (CoD),
WPAR, and LPM. These technologies are introduced and documented throughout this
publication.
PowerHA monitors the resources within its control. On the failure of any resource, PowerHA
takes an appropriate action to either restart the resource or move it to another node in the
cluster, which is typically on a separate server. PowerHA monitors node failures, application
failures, and component failures.
3.4.1 Comparing PowerHA with other high-availability solutions
We provide a technical comparison between PowerHA and other high-availability solutions.
PowerHA compared with fault-tolerant systems
Fault-tolerant systems are costly and use specialized hardware and software. PowerHA uses
standard hardware and software. PowerHA can be deployed on systems that possess strong
RAS features, such as the Power 795 and 780.
Figure 3-7 shows the cost and benefit of the available technologies.
Figure 3-7 Cost/benefit graph of high-availability technologies
PowerHA compared with software-based clusters
It is not our intention to position PowerHA against software clusters. PowerHA provides a
platform that makes both applications with clustering capability and applications without it
highly available. Most cluster software takes advantage of the PowerHA infrastructure.
For example, suppose that you have a critical application that was developed in-house and
has no clustering capabilities. Identify the components of the application, and create a start
script, a stop script, and a script to monitor the state of the application, as you normally do
on a single system. Integrate the application into PowerHA, and you have a high-availability
solution for your in-house application.
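The three scripts can be as simple as the sketch below. The application command, PID file path, and function names are our own illustrations, not PowerHA requirements; PowerHA simply calls whatever start, stop, and monitor methods you register:

```shell
#!/bin/sh
# Illustrative start/stop/monitor methods for an in-house application.
# APP_CMD and PIDFILE are placeholders for your real application.
PIDFILE=/tmp/myapp.pid
APP_CMD="sleep 300"

start_app() {
    $APP_CMD &
    echo $! > "$PIDFILE"
}

stop_app() {
    [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
}

# Exit 0 while the process is alive; PowerHA reacts when this starts failing.
monitor_app() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start_app
monitor_app && echo "application is up"
stop_app
monitor_app || echo "application is down"
```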
Choosing the type of cluster affects how you will set up the resource groups.
Planning for PowerHA
Planning and design form the basis of reliable environments. Chapter 4, “Planning for
virtualization and RAS in POWER7 high-end servers” on page 101 discusses how to
eliminate single points of failures (SPOFs). The chapter covers most of the planning
requirements for PowerHA.
Consider these factors:
Nodes: Names and number of nodes. Refer to the Cluster Aware AIX (CAA)
documentation regarding node names. Note that the naming convention affects the
names that are used when creating the PowerHA cluster.
Networks: Multicast IP address, number of interfaces, supported and unsupported
networks, and IPv6.
SAN: PowerHA recommends multipathing.
CAA: Cluster repository for the CAA. We discuss CAA in 3.6, “Cluster Aware AIX (CAA)”
on page 89. Note that the cluster repository does not support Logical Volume Manager
(LVM) mirroring.
LVM: Mirroring for volume groups.
Resource group: File systems, logical volumes, volume groups, and network and
application servers under the control of PowerHA.
Cluster type: Mutual failover, standby, one-sided takeover, mutual takeover, and
multi-tiered applications (both nodes active).
Resource groups contain the resources that PowerHA keeps highly available.
PowerHA management and configuration
You can use the System Management Interface Tool (SMIT), the command line, or a
plug-in to IBM Systems Director. From our experience, we recommend IBM Systems
Director and SMIT. PowerHA 7.1 has a method that disables certain SMIT commands, such
as chfs, chlv, and chgrp. You can override this feature, but we advise that you do not.
The PowerHA 7.1 SMIT menu differs slightly from PowerHA 6.1 and prior versions. A list of
these changes is included in the IBM PowerHA SystemMirror 7.1 for AIX, SG24-7845,
publication.
3.4.2 PowerHA 7.1, AIX, and PowerVM
PowerHA 7.1 introduces many improvements from PowerHA 6.1 and prior versions. Most of
the improvements come from changes and improvements to the base AIX operating system.
AIX Version 7 is enhanced to maintain a cluster of nodes using CAA capabilities. CAA is
introduced in 3.6, “Cluster Aware AIX (CAA)” on page 89. Most of the PowerHA monitoring
infrastructure is part of the base operating system. So that earlier versions of AIX can take
advantage of PowerHA 7.1, CAA is also included in AIX 6.1 TL6. CAA requires Reliable Scalable
Cluster Technology (RSCT) 3.1.
PowerHA classically has subnet requirements and needs a number of interfaces, because
PowerHA monitors for failures and moves IP addresses to a surviving interface prior to
moving to the next node. AIX provides methods of monitoring the network adapter using
EtherChannel, which is implemented as either link aggregation or network interface backup.
Using EtherChannel eliminates the need to configure multiple interfaces, because this
requirement is taken care of by the EtherChannel itself.
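A network interface backup configuration can be sketched with mkdev. The adapter names and the ping address are assumptions; the attribute names follow the AIX ibm_ech pseudo-device, but check them with lsattr on your system:

```shell
# Sketch only: NIB EtherChannel with ent0 as primary and ent1 as backup.
# The netaddr address is pinged to detect a path failure and trigger failover.
mkdev -c adapter -s pseudo -t ibm_ech \
      -a adapter_names=ent0 \
      -a backup_adapter=ent1 \
      -a netaddr=192.168.1.1

lsattr -El ent2    # the new EtherChannel typically appears as the next entX device
```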
The virtual I/O server provides a method of creating a Shared Ethernet Adapter Failover (SEA
Failover), which allows the virtual I/O server to provide required redundancy. An example of
an SEA configuration is shown in 6.5.3, “NIB and SEA failover configuration” on page 223. An
SEA configuration also removes the need to create multiple interfaces in the PowerHA
configuration.
Refer to the IBM PowerHA SystemMirror 7.1 for AIX, SG24-7845, and PowerHA for AIX
Cookbook, SG24-7739, publications to get started with PowerHA or to migrate to
PowerHA 7.1.
3.5 IBM Power Flex
Power Flex was introduced with POWER7 and is a multi-system Power 795 infrastructure
offering from IBM. It provides a highly available and flexible IT environment to support
large-scale server consolidation and an enterprise’s most demanding business resiliency
objectives. Power Flex is designed to enable you to more easily use your purchased
processor and memory activations across a pool of two to four Power 795 systems. This flexibility
leads to the increased utilization of the resources and to enhanced application availability.
3.5.1 Power Flex Overview: RPQ 8A1830
Power Flex has these highlights:
Supports a multi-system infrastructure for active-active availability
Allocates and rebalances processor and memory resources
Uses LPM for flexible workload movement
Delivers seamless growth with Capacity on Demand (CoD)
Includes On/Off processor days for extra capacity
Power 795 servers in a Power Flex environment are allowed to share large portions of their
virtual processor and memory resources to provide capacity where it is most needed and to
best support application availability during occasional planned system maintenance activity.
Power Flex consists of two to four Power 795 systems, each with four or more 4.0 GHz or 4.25
GHz processor books, and 50% or more permanent processor and memory activations to
support its applications. Capacity higher than 25% on these systems can be used as a Flex
Capacity Upgrade on Demand resource and rebalanced to and from another Power 795
system in the same Power Flex pool of systems, up to twelve times per year.
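The activation rules above can be restated as simple arithmetic. The sketch below applies them to a hypothetical fully configured 128-core Power 795 (four 32-core processor books); the variable names are ours, and the 960-days-per-book figure comes from the On/Off credit described in 3.5.2:

```shell
# Hypothetical Power Flex bookkeeping for one 128-core Power 795.
total_cores=128
books=$(( total_cores / 32 ))

active_min=$(( total_cores * 50 / 100 ))   # 50% minimum permanent activations
floor=$(( total_cores * 25 / 100 ))        # capacity below 25% stays on this system
flex_cores=$(( active_min - floor ))       # rebalanceable Flex CUoD activations
onoff_days=$(( books * 960 ))              # included On/Off processor-day credit

echo "Flex CUoD (rebalanceable): $flex_cores cores"
echo "On/Off processor-day credit: $onoff_days days"
```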
Power Flex enablement has these prerequisites:
All servers within a Power Flex capacity pool must have equivalent IBM hardware
maintenance status.
Each feature code 4700 processor book installed must include enough feature code 4713s
(core activation feature) so that a minimum of 50% of the total number of processor cores
that are configured on the server are activated. For example, you can have 16 x feature
code 4713 (1-core feature) for each feature code 4700 processor book, or 8 x feature code
4713 for each feature code 4700, if TurboCore mode (feature code 9982) is specified.
The total number of RPQ 8A1830 features that are installed on the server must equal the
total number of feature code 4700 processor books installed.
The Power Flex attachment and supplement must be signed and returned to enable the
960 On/Off processor credit days per book and to allow the rebalancing of resources on
the system to which RPQ 8A1830 applies.
A summary of the Power Flex details is shown in Figure 3-8 on page 86.
Figure 3-8 Power Flex details
3.5.2 Power Flex usage options
Power Flex options are designed to allow your multi-system virtualization infrastructure to
provide a highly available and flexible IBM Power Systems IT environment.
Maintaining application availability
Flex Capacity Upgrade on Demand, working in conjunction with PowerVM Live Partition
Mobility, can help clients maintain application availability more affordably during planned
maintenance activities on a Power Flex system. Up to four times per year, a client can request
to temporarily activate inactive resources on one Power Flex system to support the planned
maintenance activities on another Power Flex system. While virtual processor and memory
resources are not being used on a system that is being maintained, they can be used on
another system for productive use without having to first be deactivated on the system being
maintained.
Any resources activated as part of an advanced planning event are to be deactivated within
seven days.
Important: Advanced planning event capacity (key) requests to the Power CoD project
office require a minimum of two business days to ensure the receipt of the activation and
deactivation codes for a system, prior to commencing planned maintenance activities.
Workload rebalancing with Power Flex Capacity Upgrade on Demand
Flex Capacity Upgrade on Demand (CUoD) processor and memory activations on a Power
Flex system can be temporarily rebalanced to be allowed to execute on another installed
Power Flex system within the same enterprise and country. Each Power 795’s Flex CUoD
resources are the processor and memory activations above 25% of its total capacity. These
resources can be rebalanced up to 12 times per year to execute on another Power Flex
system in the same pool to support changing capacity requirements.
Unique to a Power Flex environment, rebalancing capacity can be activated on a target
system prior to the capacity being deactivated on its donor system to better facilitate any
transition of applications from one system to another system. While resources on one system
are activated, corresponding resources are to be deactivated in the donating system within
seven days.
Rebalanced processor activation resources are not permanently moved to another system.
They are temporarily allowed to execute on systems within a Power Flex capacity pool, yet
they are retained on an initial system for inventory and maintenance purposes. Power Flex
merely allows clients to make use of these CUoD resources on more than a single system.
Any rebalanced activations are to be reconciled with inventory records in the event that the
system is upgraded or sold and must be returned to the original system upon any lease
termination. See Figure 3-9 on page 88 for an example of Power Flex in action.
Utility computing via On/Off processor days
Each Power Flex system ships with a quantity of included On/Off Capacity on Demand
processor days (approximately 60 days of the inactive resources on each purchased Power
Flex 64/128-core system, or 960 days per 32-core processor book). These On/Off processor
days are enabled via the normal Capacity on Demand resource enablement process and
contracts. The On/Off days are credited to the client’s account upon completion of initial CoD
and Power Flex contracts. They can be used at a client’s discretion to provide utility
computing for short-term projects, workload spikes, or in the event of an immediate
maintenance activity where an advanced planning event or rebalancing request has not been
requested.
Important: Requests for Flex Capacity Upgrade on Demand activation/deactivation keys
are initiated via e-mail to the Power CoD project office (pcod@us.ibm.com) at least two
business days in advance of rebalancing activity.
Figure 3-9 Power Flex in action
IBM Capacity on Demand offerings
The On/Off Capacity on Demand processor days that are included with Power Flex are
separate from the IBM Capacity on Demand (CoD) offerings. With the IBM CoD offerings, you
can dynamically activate one or more resources on your POWER7 server as your business
activity peaks dictate. You can activate inactive processor cores or memory units that are
already installed on your server on a temporary and permanent basis. Inactive processor
cores and inactive memory units are resources that are installed as part of your server, but
they are not available for use until you activate them.
Table 3-4 provides a brief description of each CoD offering. For additional information, review
the Power Systems Capacity on Demand document:
https://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7ha2/p7ha2.pdf
Or, consult your IBM sales representative or IBM Business Partner.
Table 3-4 CoD offerings from IBM
Capacity Upgrade on Demand: Permanently activate inactive processor cores and memory
units by purchasing an activation feature and entering the provided activation code.
Trial Capacity on Demand: Evaluate the use of inactive processor cores, memory, or both,
at no charge using Trial CoD. After it is started, the trial period is available for 30 power-on
days.
3.6 Cluster Aware AIX (CAA)
Cluster Aware AIX (CAA) services and tools are among the most important new features in
AIX 7.1, which is the first AIX release to provide built-in clustering. The latest editions of
PowerHA SystemMirror and PowerVM are designed to exploit the CAA cluster infrastructure
to facilitate high availability and advanced virtualization capabilities. Administrators can now
create a cluster of AIX systems using features of the AIX 7 kernel. IBM introduced the
“built-in” clustering capabilities in the AIX operating system to simplify the configuration and
management of highly available clusters. Cluster Aware AIX functionality is primarily
intended to provide a reliable, scalable clustering infrastructure for products, such as
PowerHA SystemMirror and PowerVM. The new AIX clustering capabilities are designed to
offer these benefits:
Significantly simplify cluster construction, configuration, and maintenance
Improve availability by reducing the required time to discover failures
Offer capabilities, such as common device naming for shared devices, to help optimize
administration
Provide built-in event management and monitoring
Offer a foundation for future AIX capabilities and the next generation of PowerVM and
PowerHA SystemMirror
AIX 7 runs on the latest generation of POWER7 processor-based systems, as well as
systems based on POWER4, POWER5, and POWER6. Most of the new features of AIX 7 are
available on earlier Power processor-based platforms, but the fullest capability is delivered
on systems built with the POWER6 and POWER7 processors.
CAA is not designed as a high-availability replacement for PowerHA SystemMirror, but it does
change the way in which AIX integrates with cluster solutions, such as PowerHA (HACMP).
IBM’s mature RSCT technology is still an important element of AIX and PowerHA
configurations. IBM PowerHA now uses components of CAA, instead of RSCT, to handle the
cluster topology, including heartbeats, configuration information, and live notification events.
PowerHA still communicates with RSCT Group Services (grpsvcs replaced by cthags), but
PowerHA has replaced the topsvcs (topology services) function with the new CAA function.
CAA reports the status of the topology to cthags by using Autonomic Health Advisory File
System (AHAFS) events. Refer to Figure 3-10.
Table 3-4 CoD offerings from IBM (continued)
On/Off Capacity on Demand: Activate processor cores or memory units for a number of
days by using the HMC to activate resources on a temporary basis.
Utility Capacity on Demand: Used when you have unpredictable, short workload spikes.
Automatically provides additional processor capacity on a temporary basis within the
shared processor pool. Use is measured in processor minute increments and is reported at
the Utility CoD website.
Capacity BackUp: Used to provide an off-site, disaster recovery server using On/Off CoD
capabilities. The offering has a minimum set of active processor cores that can be used for
any workload and a large number of inactive processor cores that can be activated using
On/Off CoD in the event of a disaster. A specified number of no-charge On/Off CoD
processor days are provided with Capacity BackUp.
Figure 3-10 Cluster aware AIX exploiters
IBM Reliable Scalable Cluster Technology (RSCT) is a set of software components that
together provide a comprehensive clustering environment for AIX and Linux. RSCT is the
infrastructure that is used by a variety of IBM products to provide clusters with improved
system availability, scalability, and ease of use. RSCT includes daemons, which are
responsible for monitoring the state of the cluster (for example, a node network adapter,
network interface card (NIC), or network crash) and coordinates the response to these
events. PowerHA is an RSCT-aware client. RSCT is distributed with AIX. The following list
includes the major RSCT components:
Resource Monitoring and Control (RMC) subsystem, which is the scalable, reliable
backbone of RSCT. It runs on a single machine or on each node (operating system image)
of a cluster and provides a common abstraction for the resources of the individual system
or the cluster of nodes. You can use RMC for single system monitoring, or for monitoring
nodes in a cluster. In a cluster, however, RMC provides global access to subsystems and
resources throughout the cluster, thus providing a single monitoring/management
infrastructure for clusters. It is also used for dynamic LPAR, sfp, invscout, and so on.
RSCT core resource managers. A resource manager is a software layer between a
resource (a hardware or software entity that provides services to another component) and
RMC. A resource manager maps programmatic abstractions in RMC into the actual calls
and commands of a resource.
RSCT cluster security services provides the security infrastructure that enables RSCT
components to authenticate the identities of other parties.
The Topology Services subsystem provides node/network failure detection in certain
cluster configurations.
The Group Services subsystem provides cross-node/process coordination in certain
cluster configurations.
RSCT Version 3.1 is the first version that supports Cluster Aware AIX (CAA). RSCT 3.1 can
operate without CAA in a “non-CAA” mode.
[Figure 3-10 content: RSCT consumers, such as IBM Systems Director, DB2, TSA, the HMC,
IBM Storage, HPC, PowerHA SystemMirror, and the VIOS, sit on top of RSCT. In legacy
RSCT, the bundled resource managers, Group Services, cluster messaging, cluster
configuration repository, and cluster monitoring layers are all implemented within RSCT on
AIX. With Cluster Aware AIX, the cluster messaging, repository, monitoring, and event layers
are redesigned to integrate with the CAA capabilities, APIs, and user interfaces in AIX.]
You use the non-CAA mode if you use one of the following products:
PowerHA versions before PowerHA 7.1
A mixed cluster with PowerHA 7.1 and prior PowerHA versions
Existing RSCT Peer Domains (RPD) that were created before RSCT 3.1
A new RPD cluster, when you specify during creation that the system must not use or
create a CAA cluster
Figure 3-11 shows both modes in which RSCT 3.1 can be used (with or without CAA). The
left diagram shows how the non-CAA mode works, which is identical to older RSCT
versions. The right diagram shows the CAA-based mode. The difference between these
modes is that Topology Services has been replaced with CAA.
Figure 3-11 RSCT 3.1 modes
The use of CAA on AIX 6.1 TL 6 is enabled only for PowerHA 7.1 and not for earlier versions.
3.6.1 Cluster Aware AIX Services
Cluster Aware AIX (CAA) is a set of services and tools embedded in AIX to help you manage
a cluster of AIX nodes and run cluster software on AIX. CAA services provide these
functions:
CAA configuration and database
Cluster verification that is performed when the cluster is defined or modified.
Important: RSCT 3.1 is available for both AIX 6.1 and AIX 7.1. To use CAA with RSCT 3.1
on AIX 6.1, you must have TL 6 or later installed.
[Figure 3-11 content: in both modes, a resource manager and Group Services run above
Resource Monitoring and Control within RSCT on AIX. Without CAA, Group Services
(grpsvcs) relies on Topology Services for topology information; with CAA, Group Services
(cthags) uses the CAA layer in place of Topology Services.]
CAA communication
Communication between nodes within the cluster is achieved using multicasting over the
IP-based network and also using storage interface communication through Fibre Channel
(FC) and serial-attached SCSI (SAS) adapters.
CAA monitoring (nodes, networks, and storage):
– All monitors are implemented at low levels of the AIX kernel and are largely insensitive
to system load.
– All communication interfaces are monitored: network and storage.
CAA device-naming services:
– When a cluster is defined or modified, AIX interfaces automatically create a consistent
shared device view.
– When managed by Cluster Aware AIX, device files that are associated with the disks
shared across the nodes in the cluster have a common name across the nodes in the
cluster that have access to the disks.
Global device names, such as cldisk1, refer to the same physical disk from any cluster
node.
CAA cluster-wide event management:
– AIX event infrastructure allows event propagation across the cluster.
– Applications can monitor events from any cluster node.
CAA cluster-wide command distribution:
– Many of the security and storage-related AIX commands are enhanced to support the
operation across the cluster.
– The clcmd command provides a facility to distribute a command to a set of nodes that
are cluster members.
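The CAA services above are exposed through a handful of AIX commands. A minimal sketch, assuming two nodes and a spare disk hdisk9 for the repository:

```shell
# Sketch only: create a two-node CAA cluster and exercise its services.
mkcluster -n myclu -m nodeA,nodeB -r hdisk9   # hdisk9 becomes the cluster repository

lscluster -m    # list the cluster nodes and their state
lscluster -d    # list cluster disks (shared disks receive common cldiskN names)
clcmd date     # distribute a command to every node that is a cluster member
```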
3.6.2 Cluster Aware AIX event infrastructure
AIX event infrastructure for AIX and AIX Clusters, which was introduced in AIX 6.1, provided
an event monitoring framework for monitoring predefined and user-defined events.
Enhancements in AIX 7.1 include support for cluster-wide event notifications for certain
events (for example, network and disk errors) with continuous monitoring and additional
producers. An event is defined as any change of a state or a value that can be detected by the
kernel or a kernel extension at the time that the change occurs. The events that can be
monitored are represented as files in a pseudo file system named the Autonomic Health
Advisor FileSystem (AHAFS). Cluster Aware AIX generates granular storage and network
events that are used by PowerHA to provide for better decision making for high-availability
management.
Four components make up the AIX event infrastructure (refer to Figure 3-12 on page 93):
The kernel extension implementing the pseudo file system
The event consumers that consume the events
The event producers that produce events
The kernel component that serves as an interface between the kernel extension and the
event producers
Figure 3-12 AIX event infrastructure components
The Cluster Aware AIX event infrastructure is designed to provide these functions:
Event framework for monitoring events efficiently and without the need for polling
System events are defined as a change in state or value, which can be detected in the AIX
kernel or kernel extensions as it occurs.
A pseudo-file system named Autonomic Health Advisor FileSystem (AHAFS):
– The AHAFS kernel extension was first introduced in AIX 6.1 TL 04 (October 2009)
fileset bos.ahafs.
– Further extensions were included in AIX 6.1 TL 06 and AIX 7.
– Loadable kernel extension and root mount of the AHAFS file system, for example:
mount ahafs /aha /aha
– In-memory only file system allocated from a pinned heap.
– Monitoring applications can use standard file system interfaces (for example, open,
write, select, read, close, and so on) to perform monitoring instead of having to use a
special set of APIs.
Monitor the health, security, and RAS of AIX:
– Events triggered by an event producer must originate from either the kernel or in a
kernel extension.
– Event producers can dynamically register and unregister themselves with the AHAFS
framework.
– The authorization to monitor specific events is determined by each event producer.
– Detailed information about an event (stack trace, user, and process information) is
provided.
– Control is handed to the AIX event infrastructure at the exact time the event occurs.
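As a sketch of this file-interface style of monitoring, a script can register interest in a predefined event producer by writing to a .mon file and then reading on the same descriptor. The utilFs producer and THRESH_HI attribute follow IBM's AHAFS examples; the tmp.mon file name is our own choice:

```shell
# Sketch only: block until /tmp file system utilization exceeds 90% (AIX 6.1 TL4+).
mkdir /aha 2>/dev/null
mount -v ahafs /aha /aha       # root-mount the pseudo file system once per boot

# Registration and read must happen on the same open file descriptor.
exec 3<> /aha/fs/utilFs.monFactory/tmp.mon
echo "CHANGED=YES;THRESH_HI=90" >&3   # register: notify when usage crosses 90%
cat <&3                               # blocks here until the event fires
exec 3>&-
```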
The IBM AIX Version 6.1 Differences Guide, SG24-7559 contains detailed information about
Cluster Aware AIX functionality.
3.7 Electronic services and electronic service agent
Electronic services is an IBM support approach, which consists of the Electronic Service
Agent™ (ESA) and the electronic services website, as shown in Figure 3-13. The ESA
automatically monitors and collects hardware problem information and sends this information
to IBM support. It can also collect hardware, software, system configuration, and performance
management information, which might help the IBM support team to assist in diagnosing
problems. IBM electronic services reaches across all IBM systems in all countries and
regions where IBM does business. Electronic services can provide the electronic support
relationship for a single machine environment or a multinational complex of many servers.
Figure 3-13 Electronic services overview
Electronic Service Agent (ESA) is a no-charge software tool that resides on your system to
continuously monitor events and periodically send service information to IBM support on a
user-definable timetable. This tool tracks and captures service information, hardware error
logs, and performance information. It automatically reports hardware error information to IBM
support as long as the system is under an IBM maintenance agreement or within the IBM
warranty period. Service information reporting and performance information reporting do not
require an IBM maintenance agreement and do not need to be within the IBM warranty period
to be reported. Information that is collected by the ESA application is available to IBM service
support representatives to help them diagnose problems.
Previous ESA products were unique to the platform or operating system on which they were
designed to run. Because the ESA products were unique, each ESA product offered its own
interface to manage and control the ESA and its functions. Because networks can have
separate platforms with separate operating systems, administrators had to learn a separate
interface for each separate platform and operating system in their network. Multiple interfaces
added to the burden of administering the network and reporting problem service information
to IBM support.
ESA now installs on platforms that are running separate operating systems. It offers a
consistent interface to ESA functions, reducing the burden of administering a network with
various platforms and operating systems. ESA is operating system specific. Each operating
system needs its own compatible version of ESA. To access ESA user guides, go to the
electronic services website and select Electronic Service Agent on the left side of the
navigation page. In the contents pane, select Reference Guides → Select a platform → Select
Operating System or Software.
On your POWER7 platform, you can have one or more operating systems. No matter how
many partitions are configured, or which operating systems are running, the IBM ESA must
be installed and activated on each partition, operating system, and HMC or SDMC.
For system inventory reporting, Resource Monitoring and Control (RMC) must be configured
in the partition. Additional activation of ESA on the partitions sends back OS-specific (AIX or
IBM i) and software inventory data.
You configure ESA on AIX 5.3, 6.1, and 7.1 from the command line by entering smit
esa_main and then selecting Configure Electronic Service Agent.
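The activation steps above can be sketched as the following command sequence. This is a minimal sketch, assuming an AIX 6.1 or later partition; the commands are printed in dry-run form so they can be reviewed before being run on a live system.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that verify RMC and open the ESA
# configuration menu on an AIX partition. Run each printed command manually
# on a real system after review.
esa_activation_steps() {
    echo "lssrc -s ctrmc"   # confirm that the RMC subsystem is active (needed for inventory reporting)
    echo "smit esa_main"    # then select "Configure Electronic Service Agent"
}
esa_activation_steps
```

On a real partition, run the printed commands in order; the `lssrc` check should report the ctrmc subsystem as active before ESA is configured.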
3.7.1 Benefits of ESA for your IT organization and your Power systems
ESA offers the following electronic services benefits for both your IT organization and
systems:
 No additional charge for systems under warranty or maintenance agreements
 Helps achieve higher availability and shorter downtime
 Automatically contacts IBM support
 Immediately uploads error logs
 Faster diagnosis and time to repair
 ESA automatically gathers and reports required system information, reducing data
entry errors and the risk of misreading system information
ESA: For HMC-controlled or SDMC-controlled environments, ESA must be activated on
the HMC or SDMC for hardware error reporting.
Important: It is important to activate ESA on every platform, partition, and Hardware
Management Console (HMC) or Systems Director Management Console (SDMC) in your
network to get the maximum coverage and utilization of the ESA capabilities.
 Less personnel time providing and gathering information and reporting problems
 Routes calls to the correct resource the first time, with the required information to provide
an end-to-end, automated, closed-loop support process
 Access to web-delivered services, such as viewing ESA information and tracking and
managing reported problems
 Standard install for Power 780 and Power 795 systems
3.7.2 Secure connection methods
IBM knows that your security and information privacy are extremely important. No client
business data is ever transmitted to IBM through ESA. We provide several secure
connectivity methods from which you can choose:
 Internet
 VPN
 Dial-up
We provide both proxy and authenticating firewall support when using ESA. ESA uses
security protocols, including HTTPS (SSL and Transport Layer Security (TLS)), and 128-bit
encryption that uses encryption keys, certificates, and tokens. ESA and Call-Home follow the industry
standards for protecting data during network transport by using the TLS protocol. ESA and
Call-Home also protect your Call-Home and IBM support accounts by generating unique
passwords for these accounts. Call-Home uses protected channels, for example, TLS and
VPN, to transfer data from the HMC to IBM support. The channels provide confidentiality and
integrity protection for the data that is sent between the two entities. Figure 3-14 on page 97
shows the various connectivity methods that are available for ESA.
ESA: ESA enablement is a prerequisite for POWER7 systems performing CEC hot node
add, hot node upgrade (memory), hot node repair, or hot GX adapter repair. It is also
required for Power Flex enablement. ESA-enabled systems show improved concurrent
operations results.
Figure 3-14 Call home information paths
ESA has no inbound capability. It cannot accept incoming connection attempts. ESA initiates
a connection with IBM support, and then IBM support replies. IBM support never initiates a
connection to the ESA.
IBM provides secure storage for all data that is transmitted using ESA. Your system
information is stored in a secure database behind two firewalls and is accessible by you with a
protected password. The database is accessible only by authorized IBM support
representatives. All access to the database is tracked and logged, and it is certified by the
IBM security policy.
The IBM Power 780 or 795 system has an attached HMC or SDMC, so there are additional
considerations when using ESA. For HMC/SDMC managed environments, ESA must be
activated on the HMC/SDMC for hardware error reporting.
The HMC and the SDMC include their own versions of ESA. ESA on the HMC and SDMC
monitors the system and AIX, IBM i, and Linux partitions for errors, and ESA reports these
errors to IBM. It also collects and reports hardware service information and performance
management information to IBM support. ESA on a partition does not collect hardware
information; it collects other service information, such as software information.
To access the ESA user guide for HMC, go to the electronic services website and select
Electronic Service Agent from the left navigation page. In the contents pane, select
Reference Guides → a platform → Operating System or Software.
To activate the ESA for problem hardware reporting from your HMC, as shown in Figure 3-15
on page 98, perform the following steps:
1. Log in to your HMC interface.
2. Select Guided Setup Wizard.
3. Select Yes to launch the Call-Home Setup Wizard, as shown in Figure 3-15.
Figure 3-15 Launching the Call-Home Setup Wizard
Figure 3-16 on page 99 shows the welcome page for configuring ESA on your HMC.
Worksheet: A preinstallation configuration worksheet is available to assist you with the
identification of prerequisite information:
https://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ecq/arecqconfigurethehmc.htm
Figure 3-16 Electronic Service Agent welcome pane
You can also review and control whether Call-Home requests can be created for the HMC or a
managed system by choosing Service Management → Connectivity → Enable Electronic
Service Agent.
You can also configure ESA from your SDMC. IBM Electronic Services Support using
Automation and Web Tools, SG24-6323, is an excellent source of information about using
IBM electronic services.
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 4. Planning for virtualization and
RAS in POWER7 high-end
servers
This chapter provides information about the suggested planning to help you enable your
Power System servers (Power 780 and Power 795) to exploit the RAS and virtualization
features.
In this chapter, we describe the following topics:
 Physical environment planning
 Hardware planning
 CEC Hot Add Repair Maintenance (CHARM)
 Software planning
 HMC server and partition support limits
 Migrating from POWER6 to POWER7
 Technical and Delivery Assessment (TDA)
 System Planning Tool (SPT)
 General planning guidelines for highly available systems
4.1 Physical environment planning
In this section, we introduce points to consider before installing the IBM Power Systems
hardware. We introduce important considerations when deploying a reliability, availability, and
serviceability (RAS)-capable server, such as the Power 795 and Power 780. It is not our
intention in this chapter to provide a comprehensive environmental planning guide. We
exclude the site planning and assume that it is already completed. Concepts that are not in
the scope of this book include power and cooling, raised floors, air distribution, shock, and
vibrations.
Insufficient planning can affect the effectiveness of the RAS features of the intended
hardware. You must include the following items in the planning of the environment where the
server will be installed:
 Site planning
 Power distribution units (PDU)
 Networks and switches
 SANs and SAN switches
4.1.1 Site planning
A site inspection must be performed before a server is installed. After the site inspection, a
floor plan must be updated with a clearly marked location for the server and its expansion
drawers. If the expansion is not considered in the beginning, you might need to move the
machine at a later stage. This move can cause downtime to users unless IBM Live Partition
Mobility (LPM) is used. See 3.1, “Live Partition Mobility (LPM)” on page 66. For example, if
you share a rack between a 780 and other systems, you must plan for up to 16Us in case you
need to expand, even if you only purchased two system enclosures.
When considering a site, include the following factors:
 Floor construction.
 Access routes and entrances.
 Floor space, which must also include working space for future maintenance of the server.
 Distances from the network and SAN switching devices.
 If continuous operation is required, you must also have a separate site available in case
you lose this site entirely. This second site must be part of business continuity
and disaster recovery planning.
 Other environmental considerations.
For a comprehensive list of all site planning considerations, consult the hardware pages on
the information center:
https://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
4.1.2 Power and power distribution units (PDUs)
Power and air conditioning form part of system planning. Refer to the handbook for your
system to ensure that your environment meets the energy requirements, as well as the air
conditioning requirements.
Power installation
Consider using an uninterruptible power supply that is designed to provide clean power
wherever possible. Depending on the design of your power management and distribution
panel and on its manufacturer, clean power provides surge protection and protects against
blackouts and excess current. Using an uninterruptible power supply is not required for
AC-powered systems, but it is recommended.
PDU installation
To take advantage of RAS and availability features, including the ability to implement CEC Hot
Add Repair Maintenance (CHARM), as introduced in 4.3, “CEC Hot Add Repair Maintenance
(CHARM)” on page 123, remember to avoid single points of failure (SPOFs). It is advisable to
ensure that you install PDUs from two separate power distribution panels. See Figure 4-1. In
the case of a Power 780, you have to make sure that each power supply is connected to a
separate PDU.
Figure 4-1 Installing from dual power sources
Optional battery backup: The Power 795 has an optional battery backup that can be
used to complete transactions and shut the server down gracefully in the case of a total
power loss.
4.1.3 Networks and storage area networks (SAN)
Network and SAN design falls outside of the scope of this publication. It is beneficial,
however, for the administrator to understand and suggest the network connection between
the managed server and the switches. Consider the following factors when connecting the
managed server:
 For system management connectivity between the managed server and the IBM Systems
Director Management Console (SDMC) or Hardware Management Console (HMC), make
sure that each SDMC/HMC port connects to a separate switch. The port connected to the
first SDMC/HMC switch must be on a separate virtual LAN (VLAN) from the port on the
second switch. One SDMC or one HMC with the required code level must be connected to
each of the two VLANs. A dedicated management server port must be in a VLAN with only
one SDMC/HMC connecting to it for Dynamic Host Configuration Protocol (DHCP)
requests.
 For public Ethernet connection, at least four Ethernet cables must be connected to either
PCI slots or to the Integrated Virtual Ethernet Adapter (IVE) if you use dual virtual I/O
servers, or at least a pair must be connected to each network switching device. For link
aggregation, remember to consult the switch manufacturer’s documentation for its
capabilities. EtherChannel network interface backup (NIB) does not dictate how the
connections are made to the switch or switches. If you are using a dual virtual I/O server,
remember that each virtual I/O server uses redundant adapters connected to redundant
switches to avoid either a switch, a cable, or an adapter single-point-of-failure (SPOF).
 SAN connections follow the same guidelines: two cables per Virtual I/O Server, each on a
separate host bus adapter (HBA), and each HBA connected to a separate SAN switch.
Be careful when using dual-port adapters, because a shared adapter can be a SPOF.
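The dual Virtual I/O Server guidance above can be sketched as follows. This is a minimal sketch in dry-run form; the adapter names ent0 and ent1 are assumptions, and each physical adapter is expected to be cabled to a separate network switch.

```shell
#!/bin/sh
# Dry-run sketch: print the VIOS command that creates a network interface
# backup (NIB) EtherChannel device. Adapter names ent0/ent1 are assumptions;
# cable them to separate switches so that no switch, cable, or adapter is a SPOF.
vios_nib_steps() {
    echo "mkvdev -lnagg ent0 -attr backup_adapter=ent1"  # primary ent0, backup ent1
    # Repeat the same pairing on the second Virtual I/O Server.
}
vios_nib_steps
```

Running the printed command on each Virtual I/O Server gives every server its own primary/backup adapter pair, so the loss of one switch or adapter does not interrupt client traffic.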
Example 4-1 shows a typical installation list.
Example 4-1 Typical list for initial installation of a POWER7 system
Between the SDMC/HMC and the managed server: 2 network ports and 2 cables
connected to the server's 2 SDMC/HMC ports; no IP addresses needed (DHCP)
For LPARs viopok1 and viopok2: 4 ports, 2 on NWSwitch1 and 2 on NWSwitch2, and 4
cables accordingly, 2 per Virtual I/O Server (NIB or link aggregation)
4 SAN ports, cabled to HBAs intended for the VIO servers
The list in Example 4-1 helps to connect the managed server in the format that is shown in
Figure 4-2 on page 105.
The diagram shows the following design:
Two enclosures for CHARM
Two SAN switches recommended
Two network switches recommended
Two management consoles (HMC or SDMC)Chapter 4. Planning for virtualization and RAS in POWER7 high-end servers 105
Notice how carefully the ports and cables are allocated to the virtual I/O server.
Figure 4-2 Setting up the Power7 to provide redundancy
4.2 Hardware planning
Physical hardware planning involves the placement of the hardware components to ensure
that the RAS features, as well as the virtualization features, can be exploited. Certain
reliability features are embedded within the server, and there is no need to cater for them.
These reliability features include processor retries, caches, memory protection such as
Chipkill, and clock cards. The system takes care of these components transparently.
We do not intend to dictate the way that all environments are designed, but we do however
attempt to provide preferred practices. Before any system installation or changes to the
current system, use the System Planning Tool (SPT) to validate the intended configuration.
We describe the SPT in 2.6.3, “Deployment using the System Planning Tool (SPT)” on
page 40. Download the SPT from this website:
http://www.ibm.com/systems/support/tools/systemplanningtool
The placement of the systems, racks, and cables affects the ability to implement CHARM, as
discussed in 4.3, “CEC Hot Add Repair Maintenance (CHARM)” on page 123. When cabling,
remember that the cables must have enough slack. Cable length must be long enough
to allow an IBM service support representative (SSR) to pull out the components of the
system that need to be serviced or replaced. Sufficient cable length is even more important if
more than one Power system shares the same rack. Use the provided cable arms to route the
cables so that they do not end up in disarray. Normally, using the provided cable arms to route
the cables ensures that each cable has enough length.
System planning
To use RAS features on an enterprise Power server, you need to have at least two system
enclosures. These enclosures are interconnected with flexible symmetric multiprocessor
(SMP) cables and service processor cables. These cables are designed to support scalability
(hot add), hot repair, and concurrent maintenance. Figure 4-3 and Figure 4-4 on page 107
have been taken from the IBM Redpaper™ publication, IBM Power 770 and 780 Technical
Overview and Introduction, REDP-4639.
Refer to this Redpaper for more information regarding the cabling of the IBM Power 780. This
IBM Redpaper also covers installation of the GX++ adapters. Figure 4-3 shows the SMP
cable installation.
We used the following steps in installing a four-node Power 780 (Refer to Figure 4-3):
1. The first system enclosure was installed on the topmost 4U of the rack. For example, if the
rack is empty, you install the first system enclosure on U12-16. In this manner, you reserve
the first 12Us for growth and avoid having to relocate in the future. This design is only a
guideline and a preferred practice from our experiences.
2. Install a system enclosure Flex Cable from system enclosure 1 to system enclosure 2.
3. Add a third system enclosure Flex Cable from system enclosure 1 and system enclosure 2
to system enclosure 3.
4. Add a fourth node Flex Cable from system enclosure 1 and system enclosure 2 to system
enclosure 4.
Figure 4-3 SMP cable installation order
Figure 4-4 shows the Flex Cable installation.
Figure 4-4 Flexible Service Processor (FSP) Flex cables
4.2.1 Adapters
To take advantage of CHARM, RAS, or high-availability features, the number of adapters
connected to the server is important. For example, PCIe adapters are hot swappable, but the
devices on the slot are not available when the adapter is maintained. You must have
redundant adapters to avoid losing service. Many publications, including the IBM PowerHA
publications, provide advice about the number of adapters, types, and placement of the
adapters to take advantage of RAS features, as well as high-availability offerings. Although
suggestions exist, an important concept is introduced to help eliminate single points of failure
(SPOF).
SPOF concepts are explained in detail in the PowerHA publications. Both SAN and Ethernet
adapters can have multiple ports. The use of multiple ports on a single adapter for the same
environment might hinder RAS operation. Although the use of multiple ports on a single
adapter might provide better throughput, in the case of link aggregation for Ethernet and
multipathing for the host bus adapter (HBA), the loss or failure of the adapter affects both
ports. Loss or failure of the adapter also affects both ports on the Integrated Virtual Ethernet
(IVE) adapter, which can, depending on the feature code, provide a total of 64 logical
Ethernet ports called Logical Host Ethernet Adapters (LHEA).
Choice of internal storage
When using internal disks, remember to use Logical Volume Manager (LVM) mirroring. A
number of choices are available to consider regarding disk types and placements. Disk types
range from storage area network (SAN), Small Computer System Interface (SCSI),
IP-based SCSI (iSCSI), Serial Storage Architecture (SSA), and serial-attached SCSI (SAS),
to solid-state drives (SSD). Mirroring between SSD and other disk types is not supported. Refer
to 2.7, “I/O considerations” on page 41 regarding storage considerations.
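The LVM mirroring advice above can be sketched as the following AIX command sequence. A minimal sketch in dry-run form; the disk names hdisk0 and hdisk1 are assumptions, and the sequence presumes that rootvg currently resides on hdisk0 only.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that mirror rootvg onto a second internal
# disk. Disk names hdisk0/hdisk1 are assumptions; do not mix SSD and non-SSD
# disks in one mirror, because that combination is not supported.
rootvg_mirror_steps() {
    echo "extendvg rootvg hdisk1"            # add the second disk to rootvg
    echo "mirrorvg rootvg hdisk1"            # create mirror copies of every logical volume
    echo "bosboot -ad /dev/hdisk1"           # make the second disk bootable
    echo "bootlist -m normal hdisk0 hdisk1"  # allow booting from either disk
}
rootvg_mirror_steps
```

The bosboot and bootlist steps matter: without them the partition survives a data-disk failure but cannot boot from the surviving disk.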
External I/O
If you need to implement an external I/O subsystem, use multiple GX++ buses. I/O devices
that are connected to a single I/O drawer are limited to a single system. These devices
cannot be used for Live Partition Mobility (LPM), and they cannot be used in clustering that
depends on operating system LVM mirroring. Wherever possible, use SAN-attached disks.
IBM offers a range of external storage from which you can choose.
We list several types of external storage offered by IBM, as well as links for more details:
 IBM System Storage® N series
http://www.ibm.com/systems/storage/network
 IBM System Storage DS3000 family
http://www.ibm.com/systems/storage/disk/ds3000/index.html
 IBM System Storage DS5020 Express
http://www.ibm.com/systems/storage/disk/ds5020/index.html
 IBM System Storage DS5000
http://www.ibm.com/systems/storage/disk/ds5000
 IBM XIV® Storage System
http://www.ibm.com/systems/storage/disk/xiv/index.html
 IBM System Storage DS8700
http://www.ibm.com/systems/storage/disk/ds8000/index.html
4.2.2 Additional Power 795-specific considerations
The following list provides details about considerations that are specific to Power 795 servers:
Number of cores Be careful when choosing specific Power 795 servers. Processor
books can have either 24 cores in a book or 32 cores in a book. These
processor books cannot be mixed. A 24-core processor book machine
scales up to 192 cores. The 32-core processor book machine scales
up to 256 cores. You can use the 32-core machine in MaxCore or
TurboCore modes. See 2.3, “TurboCore and MaxCore technology” on
page 28 for additional details.
Processors per LPAR
Consider the maximum number of processors per LPAR.
Space requirements Depending on the workload, if you expand beyond a single primary
Power 795 rack, you might need to plan for another rack, either 24U/19
inches (5.79 meters) or 32U/24 inches (7.3 meters). Up to two
expansion racks can be added to the Power 795.
Power source Each expansion rack has redundant bulk power assemblies (BPA).
Power all racks as shown in Figure 4-1 on page 103.
I/O drawers Power 795 I/O drawers are separated into halves, which are identified
by either P1 or P2. You can run the lscfg -vl command to see the
slot number. Slots on the I/O drawers are hot swappable. Refer to
documentation to confirm what is supported by which drawer (PCI,
PCI-X, PCIe, and so on). I/O drawers can be connected to the
processor book in either single-loop or dual-loop mode. Dual-loop
mode is preferable whenever possible, because it provides the
maximum bandwidth between the I/O drawer and the processor book,
as well as independent paths to each of the I/O drawer planars.
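The lscfg check mentioned above can be sketched as follows. A minimal sketch in dry-run form; the device name ent0 is an assumption.

```shell
#!/bin/sh
# Dry-run sketch: print the AIX commands that reveal which I/O drawer half
# (P1 or P2) and slot a device occupies. The device name ent0 is an assumption.
slot_lookup_steps() {
    echo "lscfg -vl ent0"   # shows the device's physical location code
    echo "lsslot -c pci"    # lists hot-plug PCI slots and their occupants
}
slot_lookup_steps
```

Checking location codes this way before a hot-swap operation confirms which drawer half a replacement affects.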
Important: You must discuss all the previous considerations with the sales representative
prior to the configuration. A few types of expansion racks are available (powered and
non-powered). We only pointed out a few considerations. See the technical overview
documents for a list of the available drawers that are specific to the server that you intend
to deploy.
4.2.3 Planning for additional Power server features
This section discusses additional features for you to consider in detail. Certain features are
standard with Power 795 and Power 780 servers. Other features might need to be enabled.
The eConfig and System Planning Tool (SPT) are used to configure and specify the features.
We discuss SPT in 4.8, “System Planning Tool (SPT)” on page 153. Consider the following
points when planning to configure your system:
 Is the feature supported on the intended server? For example, TurboCore is available on
the Power 780, but not on the Power 770.
 Is the feature standard with the server, or must it be specified on eConfig? For example,
Active Memory Sharing (AMS) is standard with PowerVM on enterprise servers, but Active
Memory Expansion (AME) must be configured and requires a specific server license.
 Can the feature be enabled after configuration? Can I upgrade between editions (Express
to Standard to Enterprise)?
 What are the supported server firmware, HMC, and SDMC code levels?
 What is the supported operating system level to take advantage of the features? For
example, AME requires AIX 6.1 TL4 with SP2 or later.
 What is the feature code, and how can I see, after installation, whether the feature is enabled?
 Other considerations:
– How dynamic is the feature enablement?
– Do I need to restart the LPAR after enabling the feature?
– Do I need to shut the LPAR down completely to reread the profile?
– Do I need to shut the server (host) down before the feature is enabled?
Consider the following Power server features.
Active Memory Expansion
AME provides the ability to compress memory pages, expanding the effective memory of a
partition up to twice the size of the actual physical memory. We discuss AME in 2.1.4, “Active
Memory Expansion” on page 19.
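Before enabling AME on a partition, its benefit can be estimated with the AME planning and advisory tool, amepat, which ships with the AIX levels that support AME. A minimal sketch in dry-run form; the 60-minute monitoring interval is an assumption and should cover a representative peak workload.

```shell
#!/bin/sh
# Dry-run sketch: print the commands used to size and then observe AME.
# The 60-minute monitoring interval is an assumption.
ame_planning_steps() {
    echo "amepat 60"     # monitor the workload and report candidate expansion factors
    echo "lparstat -c"   # after AME is enabled, report compression statistics
}
ame_planning_steps
```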
Power management
We discuss power management in 2.5, “Power management” on page 36.
Active Memory Mirroring
Memory mirroring of the hypervisor is designed to mirror the main memory that is used by
the system firmware to ensure greater memory availability by performing advanced
error-checking functions. We discuss Active Memory Mirroring in 2.1.1, “Active Memory
Mirroring for the hypervisor on Power 795” on page 13.
4.2.4 System management planning
The following sections provide information that is recommended for system management
planning.
Management console
For the system management console (HMC or SDMC), we suggest the following components:
 At least one 7310-CR3 or 7310-C05 is recommended for the HMC. Several HMC models
are supported to manage POWER7 systems:
– Desktop Hardware Management Console (HMC): 7310-C05, 7310-C06, 7042-C06,
7042-C07, or 7042-C08
– Rack-mounted Hardware Management Console: 7310-CR3, 7042-CR4, 7042-CR5, or
7042-CR6
 At least a 7042-CR6 with feature code 0963 is suggested for an SDMC.
 The V7R710 code level or later is suggested for the HMC.
 For the IBM Power 795, licensed machine code Version 7 Revision 720 is required. For
the IBM Power 780, licensed machine code Version 7 Revision 710 SP1 is required.
Note that an HMC is required, even if you plan to implement a full system partition server.
 Director Version 6.2.1.2 or higher is suggested for the SDMC.
Check the HMC software level for compatibility with the entire configuration using the IBM Fix
Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home
Planning for the SDMC or HMC
IBM introduces the IBM Systems Director Management Console (SDMC). Due to the amount
of detail that is discussed in Chapter 5, “POWER7 system management consoles” on
page 159, we do not discuss SDMC planning in this chapter. You need to implement either an
HMC or an SDMC to manage the resources of your Power system, particularly the Power 795
and Power 780. It is worthwhile to know that the SDMC provides the functionality of the HMC,
as well as the functionality of the Integrated Virtualization Manager (IVM). If you still use the
HMC, consider using dual HMCs per managed server.
Use Figure 4-5 on page 111 as a guideline for connecting the dual HMCs. Notice how two
VLANS are specified in the diagram. Figure 4-5 on page 111 originated in the publication
Hardware Management Console V7 Handbook, SG24-7491.
HMC code: You can download or order the latest HMC code from the Fix Central
website:
http://www.ibm.com/support/fixcentral
HMC: You must turn on the HMC before the managed server, because the managed
server requests an IP address from the HMC. If more than one HMC is on that virtual LAN
(VLAN), there is no way to know which HMC is managing the FSP. This situation also
occurs if both ports of the managed server are on the same VLAN. You might end up with
one HMC managing both ports. You have to search through all the HMCs on the same
private VLAN to find the HMC that is managing the managed server.
Figure 4-5 A redundant HMC setup
The following figures show a few possible HMC connections.
For a single drawer with dual HMCs, a recommended connection is shown in Figure 4-6 on
page 112. This configuration provides a dual HMC connection. However, it is a single drawer
and does not allow many RAS features to be performed. Depending on the redundancy
configuration of each drawer, certain CHARM operations cannot be performed (refer to
4.3, “CEC Hot Add Repair Maintenance (CHARM)” on page 123). With a single drawer,
there is no FSP redundancy.
Figure 4-6 A single drawer Power 780 with dual HMC
Figure 4-7 on page 113 shows a preferred dual HMC connection where two drawers are
connected. Each drawer has both HMC port 0 and port 1 connected to dual HMCs. This
configuration can display redundancy on both HMC connections, as well as loss of FSP,
depending on how the VIO servers are set up. With redundant LPAR/Virtual I/O setup, this
configuration allows you to perform all CHARM operations without loss of clients. Refer to
Figure 4-7 on page 113, which suggests how to cable dual VIO servers to provide the
required level of redundancy.
Figure 4-7 Dual HMC with redundant FSP
The HMC version required to support Power 770 and Power 780 servers is V7R710 SP1 or later.
HMC functionality has been enhanced to support new features. These features include, but
are not limited to, the following features:
 Support for POWER7 class servers
 Support for Active Memory Expansion
 Concurrent add/repair
 Redundant service processors
 Removal of the limit of 128 Active Memory Sharing partitions
Before updating the HMC to the latest version, consider the following factors:
If the HMC has managed systems connected to it, and it is on the latest version, no
actions need to be taken.
If the HMC has managed systems, and the HMC is on an earlier version, upgrade the
HMC to the appropriate version:
– If the managed server’s firmware is on a supported version for the HMC, you can
simply upgrade the HMC to the required level.
– If the managed server firmware is not on a supported version for the HMC, upgrade the
server firmware to a supported level. Depending on the firmware level, you might have
to either perform a concurrent firmware update where there is no need to shut down
the managed server, or a firmware upgrade where the server needs to be shut down
for the firmware to take effect. Consider 3.1, “Live Partition Mobility (LPM)” on page 66
as an option to minimize the effect on service-level agreements (SLA) with your clients.
If the HMC is new, or the POWER7 server is the first managed system to connect to the
HMC, consider upgrading the HMC to the latest version. Any server that connects to the
same HMC at a later stage must have its firmware upgraded to a supported level.
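The update decision described above can be sketched as simple logic. This is an illustrative sketch only; the function name and return strings are ours, not part of any IBM tool:

```python
def hmc_upgrade_action(hmc_is_latest: bool,
                       server_fw_supported_by_target_hmc: bool) -> str:
    """Illustrative decision flow for updating an HMC with managed systems.

    hmc_is_latest: the HMC is already at the latest version.
    server_fw_supported_by_target_hmc: the managed server's firmware is on a
    version supported by the HMC level you want to move to.
    """
    if hmc_is_latest:
        return "no action"
    if server_fw_supported_by_target_hmc:
        return "upgrade HMC"
    # The server firmware must be brought to a supported level first; depending
    # on the level, that is either a concurrent update (no shutdown) or a
    # firmware upgrade (server shutdown required).
    return "upgrade server firmware first, then upgrade HMC"
```
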
HMC and the network
The following sections describe the HMC with the private and public network characteristics.
HMC and the private network
The HMC is used to manage server resources; several resource allocation operations on a
managed server must be performed through it. The managed server continues to function in
the absence of an HMC (although for high-end systems, an HMC or SDMC is required), but
server management is not possible:
You cannot increase or decrease system resources.
You cannot start up a partition that is not activated.
You cannot perform LPM functions.
In the case of losing a single HMC, reconnect a new HMC as soon as possible. Backing up
the HMC data (HMC backup or the data in the managed server FSP) allows the new HMC to
populate and rebuild the managed server farm.
HMC and the public network
Certain computer center policies restrict administrator access to the data center, which
makes it difficult to manage machines directly at the HMC. The HMC has a port that can be
connected to the public network. Through a web browser (previously WebSM), an
administrator can manage the system from almost any location.
4.2.5 HMC planning and multiple networks
With the processing power and redundancy that are built into the Power 780 and Power 795,
it is common to find a few business units hosted on the same server. This is the idea behind
server virtualization. The challenge with this approach is that the HMC communicates with
the LPARs on a managed server. The HMC must establish a Resource Monitoring and
Control (RMC) connection with the LPARs, as well as the Virtual I/O servers on the machine.
If the LPARs are not on the same network as the HMC, certain functions, such as dynamic
LPAR, are not possible. Consult with the network administrator about the firewall rules that
are required to allow two-way RMC port connections between the networks.
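RMC uses port 657, so a quick reachability probe can help verify the firewall rules. This is an illustrative sketch, not an IBM-supplied tool; a successful connect in one direction does not prove the two-way connectivity that RMC requires, so probe from both sides:

```python
import socket

RMC_PORT = 657  # RMC communicates over port 657 (TCP and UDP)

def rmc_port_reachable(host: str, timeout: float = 3.0) -> bool:
    """Probe whether the RMC TCP port on an LPAR (or the HMC) is reachable.

    Returns True if a TCP connection to port 657 succeeds within the timeout.
    """
    try:
        with socket.create_connection((host, RMC_PORT), timeout=timeout):
            return True
    except OSError:
        return False
```
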
4.2.6 Planning for Power virtualization
IBM virtualization technology provides several offerings from which you can choose.
Knowledge of these offerings helps you make the correct choice of the virtualization
technology to use. The following virtual technologies are provided by the
Power hypervisor:
Logical partitioning
Virtualized processors
Hypervisor IEEE VLAN-compatible virtual switches for Virtual Ethernet
Update: Although it is possible to perform a concurrent update, we advise that you
update in a negotiated maintenance window with a scheduled change.
Virtual SCSI adapters
Virtual Fibre Channel (FC) adapters
Virtual console (TTY)
We discuss these virtual resources in separate chapters throughout this publication.
Most of these virtualization features are managed by the Power hypervisor, which provides a
logical mapping between the physical hardware and the virtual resources. This hypervisor
functionality requires memory and CPU cycles. The hypervisor consumes a portion of the
server’s resources, so check that resources remain available for any virtualization
request, for example, dynamic LPAR operations. When there are not enough resources for a
hypervisor to perform an operation, an error appears, such as the error that is shown in
Figure 4-8.
Certain factors might affect the amount of memory that is used by the hypervisor:
Number of partitions
Number of both physical and virtual devices
Number of Virtual I/O servers created
Number of N_Port ID Virtualization (NPIV) adapters created
Partition profiles, for instance, the maximum memory that is specified on the profile
Figure 4-8 Example of hypervisor error due to resource constraints
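The check behind an error like the one in Figure 4-8 amounts to comparing the request with what remains after the hypervisor's own reservation. A minimal sketch, with illustrative names and units:

```python
def hypervisor_can_satisfy(requested_mb: int, configurable_mb: int,
                           hypervisor_reserved_mb: int) -> bool:
    """A dynamic request (for example, a dynamic LPAR memory add) succeeds
    only if enough memory remains after the hypervisor's own reservation."""
    return requested_mb <= configurable_mb - hypervisor_reserved_mb
```
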
In the next sections, we discuss planning for the other RAS features.
4.2.7 Planning for Live Partition Mobility (LPM)
Live Partition Mobility is designed to enable the migration of a logical partition (LPAR) from
one system to another compatible system. You must ensure that the mobile partition (the
partition that is to be moved) is configured correctly so that you can successfully move it from
the source server to the destination server. Live Partition Mobility has specific requirements in
terms of the operating system level, firmware level, storage layout, and network interfaces. A
successful migration, regardless of whether it is active or inactive, requires careful
deployment planning in advance. Sometimes, you can qualify a partition for mobility by taking
additional steps, such as removing physical adapters (non-virtual adapters) or using a
dynamic logical partitioning (DLPAR) operation.
The following requirements, as shown in Table 4-1 on page 116, are for the Power 780 and
795 servers to be eligible for migration. We describe all the requirements in detail later in this
chapter. There might be separate requirements for other POWER7 models, so remember to
check for the latest requirements.
Table 4-1 Power 780 and 795 server requirements for migration

Hardware levels:
– HMC: 7310-CR3 or later, or the 7310-C05
– SDMC: 7042-CR6 with feature code 0963
– Power Systems: POWER7 or POWER6

Software levels:
– HMC: Version 7.7.1.0 or later for POWER7
– SDMC: Director Version 6.2.1.2 or higher
– System firmware: Ax710_065 or later, where x is an M for Midrange servers, such as the
780 (or MHB), and an H for Enterprise Servers, such as the 795 (or FHB). Source and
destination systems can have separate levels of firmware, but the level on the source
system must be compatible with the destination server’s firmware.
– PowerVM: PowerVM Enterprise Edition must be licensed and activated on both the source
and destination servers.
– Virtual I/O server: Minimum of one Virtual I/O Server on both source and destination
systems, at release level 2.1.2.12 with Fix Pack 22.1 and Service Pack 1 or later for
POWER7 servers.
– AIX: Version 5.3 TL09 and Service Pack 7 or later for POWER7; Version 6.1 TL02 and
Service Pack 8 or later; Version 7.1
– Red Hat Linux: RHEL Version 5 Update 5 or later (with the required kernel security
update)
– SUSE Linux: SUSE Linux Enterprise Server 10 (SLES 10) Service Pack 3 or later (with
the required kernel security update)

AMS (not required for LPM):
– PowerVM Enterprise Edition on both systems
– Firmware level EH340_075 or later
– HMC Version 7.3.4 Service Pack 2 for HMC-managed systems
– Virtual I/O Server Version 2.1.0.1 with Fix Pack 21
– AIX 6.1 TL03 or AIX V7.1
– SUSE SLES 11

NPIV (if used):
– HMC Version 7.3.4 or later
– SDMC Version 6.2.1.2 or later
– Virtual I/O Server Version 2.1 with Fix Pack 20.1
– Virtual I/O Server Version 2.3.1 required for NPIV on FCoCEE
– AIX V5.3 TL9 or later
– AIX V6.1 TL2 SP2 or later
– AIX V7.1
Live Partition Mobility (LPM) is a feature of the PowerVM Enterprise Edition. There are two
types of LPM: Active Partition Mobility and Inactive Partition Mobility. The migration process
can be performed either with a live partition by Active Partition Mobility or with a powered-off
partition by Inactive Partition Mobility.
Active Partition Mobility
Active Partition Mobility has the following characteristics:
This type of migration allows you to migrate a running LPAR, including its operating
system and applications, from a source system to a destination system.
The operating system, the applications, and the services running on the migrated partition
are not stopped during the migration.
This type of migration allows you to balance workloads and resources among servers
without any effect on the users.
Table 4-1 (continued): Power 780 and 795 server requirements for migration

Environmental:
– Storage: Must be shared by and accessible to at least one Virtual I/O Server on both the
source and destination servers. Must not have any required dedicated physical adapters.
Logical unit numbers (LUNs) must be zoned and masked to the Virtual I/O Servers on both
systems. Storage pools are not supported. SCSI reservation must be disabled, and all
shared disks must have the reserve_policy set to “no_reserve”.
– Network: One or more physical IP networks or LANs that provide the necessary network
connectivity for the mobile partition through a Virtual I/O Server partition on both the
source and destination servers. At least one Virtual I/O Server on each system must be
bridged to the same physical network and the same subnet.

Restrictions:
– Source and destination server: Must be managed by the same HMC or SDMC (or a
redundant pair). Memory and processor resources required for current entitlements must
be available on the destination server. The logical memory block size must be the same
on the source and destination systems.
– Destination server: Cannot be running on battery power at the time of the migration.
– Mobile partition: Must have a unique name. Cannot use the Barrier Synchronization
Register with an active migration. Cannot be a Virtual I/O Server. All I/O must be
virtualized through the Virtual I/O Server, and all storage must reside on shared disks, not
logical volumes. Cannot be designated as a redundant error path reporting partition.
Inactive Partition Mobility
Inactive Partition Mobility has the following characteristics:
This type of migration allows you to migrate a powered-off LPAR from a source system to
a destination system.
Inactive Partition Mobility is executed in a controlled way and with minimal administrator
interaction so that it can be safely and reliably performed.
To make full use of LPM, you must meet the following requirements and considerations.
Management console requirements and considerations
Beginning with HMC Version 7 Release 3.4, the destination system can be managed by a
remote HMC, so it is possible to migrate an LPAR between two IBM Power Systems servers,
each of which is managed by a separate HMC. The following considerations apply:
The source HMC and the destination HMC must be connected to the same network so
that they can communicate with each other. This rule applies to the SDMC, as well.
The source and destination systems, which can be under the control of a single HMC, can
also include a redundant HMC.
The source and destination systems, which can be under the control of a single SDMC,
can also include a redundant SDMC.
The source system is managed by an HMC and the destination system is managed by an
SDMC.
The source system is managed by an SDMC and the destination system is managed by
an HMC.
Source and destination system requirements and considerations
The source and destination servers have these considerations:
The source and destination systems must be IBM POWER6-based models or higher.
Migration between systems with separate processor types is possible. You can obtain
detailed information in IBM PowerVM Live Partition Mobility, SG24-7460.
Firmware
The servers have these firmware considerations:
System firmware: The firmware must be Ax710_065 or later, where the x is an M for
Midrange servers, such as 780 (or MHB), and the x is an H for Enterprise Servers, such as
795 (or FHB).
Ensure that the firmware levels on the source and destination servers are compatible
before upgrading and if you plan to use LPM.
Table 4-2 on page 119 shows the firmware support matrix: the values in the left column
represent the firmware level from which you are migrating, and the values in the top row
represent the firmware level to which you are migrating. For each combination, blocked
entries are blocked by code from migrating; not supported entries are not blocked from
migrating, but are not supported by IBM; and mobile entries are eligible for migration.

Capacity: The HMC or SDMC can handle multiple migrations simultaneously. However,
the maximum number of concurrent partition migrations is limited by the processing
capacity of the HMC or SDMC.

Note: Source and destination systems can have separate levels of firmware. The level
of source system firmware must be compatible with the destination firmware.
Table 4-2 Partition Mobility firmware support matrix
Both source and destination systems must have PowerVM Enterprise Edition installed and
activated.
Ensure that the Logical Memory Block (LMB) is the same on both the source and
destination systems (Refer to Figure 4-9).
Figure 4-9 Verify LMB size from the HMC
Destination systems must have enough available CPU and memory resources to host the
mobile partitions.
The destination server cannot be running on battery power at the time of migration.
To determine whether the destination server has enough available physical memory to
support your mobile partition, complete the following steps from the HMC:
1. Identify the amount of physical memory that the mobile partition requires:
a. In the navigation pane, expand Systems Management → Servers.
b. Click the source server on which the mobile partition is located.
c. In the work pane, select the mobile partition.
From/To     320_xxx        330_034+       340_039+       350_xxx+       710_xxx        720_xxx
320_xxx     Not supported  Not supported  Not supported  Not supported  Blocked        Blocked
330_034+    Not supported  Mobile         Mobile         Mobile         Mobile         Blocked
340_039+    Not supported  Mobile         Mobile         Mobile         Mobile         Mobile
350_xxx+    Not supported  Mobile         Mobile         Mobile         Mobile         Mobile
710_xxx     Blocked        Mobile         Mobile         Mobile         Mobile         Mobile
720_xxx     Blocked        Blocked       Mobile         Mobile         Mobile         Mobile
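The support matrix can be encoded as a lookup table so that a planned source/destination pair can be validated programmatically. A sketch (the function and variable names are ours, not an IBM tool):

```python
# Encoding of the Partition Mobility firmware support matrix (Table 4-2).
# Rows are source firmware levels; columns are destination firmware levels.
LEVELS = ["320_xxx", "330_034+", "340_039+", "350_xxx+", "710_xxx", "720_xxx"]

MATRIX = {
    "320_xxx":  ["Not supported"] * 4 + ["Blocked", "Blocked"],
    "330_034+": ["Not supported", "Mobile", "Mobile", "Mobile", "Mobile", "Blocked"],
    "340_039+": ["Not supported", "Mobile", "Mobile", "Mobile", "Mobile", "Mobile"],
    "350_xxx+": ["Not supported", "Mobile", "Mobile", "Mobile", "Mobile", "Mobile"],
    "710_xxx":  ["Blocked", "Mobile", "Mobile", "Mobile", "Mobile", "Mobile"],
    "720_xxx":  ["Blocked", "Blocked", "Mobile", "Mobile", "Mobile", "Mobile"],
}

def migration_support(source: str, destination: str) -> str:
    """Return 'Mobile', 'Blocked', or 'Not supported' for a firmware pair."""
    return MATRIX[source][LEVELS.index(destination)]
```
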
d. From the Tasks menu, click Properties. The Partition Properties window opens.
e. Click the Hardware tab.
f. Click the Memory tab.
g. Record the dedicated minimum, assigned, and maximum memory settings.
h. Click OK.
2. Identify the amount of physical memory that is available on the destination server:
a. In the navigation pane, expand Systems Management → Servers.
b. In the work pane, select the destination server to which you plan to move the mobile
partition.
c. From the Tasks menu, click Properties.
d. Click the Memory tab.
e. Record the Current memory available for partition usage.
f. Click OK.
3. Compare the values from steps 1 and 2.
Similarly, make sure that the destination server has enough processor resources by
completing the equivalent steps from the HMC.
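Step 3's comparison can be expressed as follows. This is an illustrative sketch; the field names simply mirror the values recorded in steps 1 and 2:

```python
def check_destination_memory(assigned_mb: int, maximum_mb: int,
                             destination_available_mb: int) -> dict:
    """Compare the mobile partition's memory settings (step 1) with the
    destination server's available memory (step 2).

    The currently assigned memory must fit for the migration to proceed;
    whether the profile maximum also fits indicates headroom for later
    dynamic LPAR growth on the destination.
    """
    return {
        "can_migrate": destination_available_mb >= assigned_mb,
        "maximum_fits": destination_available_mb >= maximum_mb,
    }
```
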
Source and destination virtual I/O server requirements and considerations
The following considerations are for the virtual I/O server:
Power 795: A dual Virtual I/O Server at V2.2 or higher must be installed on both the source
and destination systems.
Power 780: A Virtual I/O Server at V2.1.2.12 with Fix Pack 22.1 and Service Pack 2 or
higher must be installed on both the source and destination systems.
You can obtain more information about the virtual I/O server and the latest downloads at the
virtual I/O server website:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/download/home.html
Operating system requirements
The operating system that runs in the mobile partition must be AIX or Linux:
Power 795:
– AIX Version 5.3.10.5 or later, 5.3.11.5 or later, or 5.3.12.1 or later.
– AIX Version 6.1.0.6 or later (CSM 1.7.0.1 or later). Install APAR IZ95265 if the AIX level
is 6100-06.
– AIX Version 7.1.
– Red Hat Enterprise Linux Version 5 (RHEL5) Update 5 or later (with the required kernel
security update).
– SUSE Linux Enterprise Server 10 (SLES 10) Service Pack 3 or later (with the required
kernel security update).
– SUSE Linux Enterprise Server 11 (SLES 11) Service Pack 1 or later (with the required
kernel security update).

AME: In order to move an LPAR that uses AME through LPM to another system, the target
system must support AME and must have AME activated through the software key. If the
target system does not have AME activated, the mobility operation fails during the
pre-mobility check phase, and an appropriate error message is displayed.
Power 780:
– AIX Version 5.3.9.7 or later, 5.3.10.4 or later, or 5.3.11.2 or later.
– AIX Version 6.1.2.8 or later (CSM 1.7.0.1 or later), 6.1.3.5 or later, or 6.1.4.3 or later.
Install APAR IZ95265 if AIX level is 6100-06.
– AIX Version 7.1.
– SUSE Linux Enterprise Server 10 (SLES 10) Service Pack 3 or later (with the required
kernel security update).
– SUSE Linux Enterprise Server 11 (SLES 11) or later (with the required kernel security
update).
To download the Linux kernel security updates, refer to the following website:
http://www14.software.ibm.com/webapp/set2/sas/f/pm/component.html
Earlier versions of AIX and Linux can participate in inactive partition migration if the
operating systems support virtual devices on IBM POWER6-based servers and
POWER7-based servers.
Storage requirements
The following storage is required:
For vSCSI:
– Storage must be shared by and accessible by at least one virtual I/O server on both the
source and destination systems.
– SAN LUNs must be zoned and masked to at least one virtual I/O server on both the
source and destination systems.
For NPIV:
– You must zone both worldwide names (WWNs) of each of the virtual FC adapters.
Storage pools and logical volumes are not supported.
SCSI reservation must be disabled.
All shared disks must have reserve_policy set to “no_reserve”.
You must not have any required dedicated physical adapters for active migration.
NPIV (if used):
– HMC Version 7.3.4 or later
– SDMC Version 6.2.1.2 or later
– Virtual I/O Server Version 2.1 with Fix Pack 20.1
– Virtual I/O Server Version 2.3.1 required for NPIV on FCoCEE
– AIX V5.3 TL9 or later
– AIX V6.1 TL2 SP2 or later
– AIX V7.1
For a list of supported disks and optical devices, see the data sheet for the
virtual I/O server:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/datasheet.html
Network requirements
The migrating partition uses the virtual LAN (VLAN) for network access. Consider the
following network requirements. The VLAN must be bridged to a physical network using a
Shared Ethernet Adapter in the virtual I/O server partition. If there are multiple VLANs, the
additional VLANs also must be bridged. Your LAN must be configured so that migrating
partitions can continue to communicate with other necessary clients and servers after a
migration is completed. At least one virtual I/O server on each machine must be bridged to
the same physical network (same subnet). An RMC connection must be set up between the
HMC and the LPARs, and it must be operational at all times.
Mobile partition requirement and considerations
The following requirements exist for the mobile partition:
The mobile partition’s OS must be installed in a SAN environment (external disk).
The mobile partition cannot use the BSR for active migration, as shown in Figure 4-10.
All I/O must be virtualized through the virtual I/O server.
All storage must reside on shared disks (not LVs).
Figure 4-10 Barrier Synchronization Register (BSR)
Ensure that the mobile partition’s name is unique across both frames.
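The mobile partition requirements above amount to a pre-mobility checklist. A hedged sketch of that logic (the function name and messages are illustrative, not the actual validation output):

```python
def mobile_partition_problems(uses_bsr: bool,
                              all_io_virtualized: bool,
                              storage_on_shared_disks: bool,
                              name_unique_on_both_frames: bool,
                              active_migration: bool = True) -> list:
    """Return the list of reasons the mobile partition fails these checks.

    An empty list means the partition passes the checks covered here.
    """
    problems = []
    if active_migration and uses_bsr:
        problems.append("partition uses the Barrier Synchronization Register")
    if not all_io_virtualized:
        problems.append("all I/O must be virtualized through the virtual I/O server")
    if not storage_on_shared_disks:
        problems.append("all storage must reside on shared disks, not LVs")
    if not name_unique_on_both_frames:
        problems.append("partition name must be unique across both frames")
    return problems
```

Note that the BSR restriction applies only to active migration, which is why the check is gated on `active_migration`.
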
Mobile partition application requirement
Certain applications that are tied to the hardware identification information, such as license
compliance managers, must be aware of the migration. You can obtain detailed information
about the use of Live Partition Mobility in IBM PowerVM Live Partition
Mobility, SG24-7460.
You can obtain the latest information about LPM at the following website:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hc3/ip
hc3whatsnew.htm
4.3 CEC Hot Add Repair Maintenance (CHARM)
Concurrent add and repair capabilities for Power Systems servers have been introduced
incrementally since 1997, starting with the power supply, fan, I/O device, PCI adapter, and I/O
enclosure/drawer. In 2008, IBM introduced significant enhancements to Enterprise Power
Systems 595 and 570 that highlighted the ability to perform node add/upgrade/maintenance
concurrently, without powering down the system. CEC hot add and repair maintenance
(CHARM) offers new capabilities in reliability, availability, and serviceability (RAS). With the
introduction of POWER7 in 2010, these capabilities continue, but the terminology has
changed:
CEC Concurrent Maintenance (CCM) for Power 570 and Power 595
CEC Hot Add Repair Maintenance (CHARM) for Power 770, Power 780, and Power 795
Table 4-3 shows the changed POWER7 terminology compared to the POWER6 terminology.
Table 4-3 New POWER7 terminology
Hot GX Adapter repair is supported from POWER7.
4.3.1 Hot add or upgrade
The CHARM functions provide the ability to add/upgrade system capacity and repair the
Central Electronic Complex (CEC), including processors, memory, GX adapters, system
clock, and service processor without powering down the system. The hot node add function
adds a node to a system to increase the processor, memory, and I/O capacity of the system.
The hot node upgrade (memory) function adds memory dual inline memory
modules (DIMMs) to a node, or upgrades (exchanges) existing memory with higher-capacity
DIMMs. The system must have two or more nodes to utilize the hot node upgrade
function. To take full advantage of hot node add or upgrade, partition profiles must reflect
higher maximum processor and memory values than the values that existed before the
upgrade. Then, the new resources can be added dynamically after the add or upgrade.
POWER6 CEC CCM terminology                   New POWER7 terminology
“Concurrent” when referring to CEC hardware  “Hot” when referring to CEC hardware
CCM: CEC Concurrent Maintenance              CHARM: CEC Hot Add and Repair Maintenance
Concurrent Node Add                          Hot Node Add
Concurrent Node Upgrade (memory)             Hot Node Upgrade (memory)
Concurrent Hot Node Repair /
Concurrent Cold Node Repair                  Hot Node Repair
Concurrent GX Adapter Add                    Concurrent GX Adapter Add
Concurrent Cold GX Adapter Repair            Hot GX Adapter Repair
Concurrent System Controller Repair          Concurrent System Controller Repair
You can estimate the increased system memory with SPT. Figure 4-11 shows a modification
of the maximum memory setting for a partition changing from 32768 to 47616, and the
corresponding change to the hypervisor memory.
Figure 4-11 Checking the increased system memory by using IBM System Planning Tool
Important: Higher maximum memory values in the partition profiles increase the system
memory set aside for partition page tables; changing maximum memory values requires
reactivating the partition with a new profile.
The concurrent GX Adapter add function adds a GX adapter to increase the I/O capacity of
the system. You can add one GX adapter concurrently to a Power 770/780 system without
planning for a GX memory reservation. To concurrently add additional GX adapters,
additional planning is required (refer to the following note and to Figure 4-12 for more details).
The system administrator can change the default value for a GX adapter from zero to the total
number of empty slots in the system via the service processor Advanced System
Management Interface (ASMI) menu. The change takes effect on the next system IPL.
Figure 4-12 Altering the GX adapter memory reservation with ASM
4.3.2 Hot repair
Hot node repair repairs defective hardware in a node of a system; the system must have two
or more nodes to utilize the hot node repair function. Hot GX adapter repair repairs a
defective GX adapter in the system, and system controller repair (on the 795) repairs a
defective service processor.
Memory reservations: The system firmware automatically makes GX adapter memory
reservations to support a concurrent GX adapter add. The default Translation Control Entry
(TCE) memory reservations are made in the following manner:
One GX adapter (128 MB) maximum, if an empty slot is available
One adapter slot per node, two slots maximum for 795, and one adapter slot for 780
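The default TCE reservation rules in this note can be expressed as a small calculation. This is our reading of the rules and an illustrative sketch, not firmware behavior verbatim:

```python
def default_gx_reservation_mb(model: str, nodes: int, empty_slots: int) -> int:
    """Estimate the default TCE memory reservation for concurrent GX adapter add.

    The firmware reserves 128 MB per reserved GX slot: one slot per node,
    capped at two slots for a Power 795 and one slot for a Power 780, and
    never more than the number of empty slots actually present.
    """
    slot_cap = 2 if model == "795" else 1
    reserved_slots = min(nodes, slot_cap, max(empty_slots, 0))
    return reserved_slots * 128
```

The administrator can override this default through the ASMI menu, as the text notes; the override takes effect on the next system IPL.
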
Node and I/O evacuation:
For hot node upgrade or repair, processors and memory in use within the target node
are relocated to alternate nodes with available resources.
I/O devices that are attached to the target node or I/O hub must be removed from usage
by the system administrator.
CoD resources: Unlicensed Capacity on Demand (CoD) resources are used by system
firmware automatically without a CoD usage charge to meet node evacuation needs during
the CHARM operation.
4.3.3 Planning guidelines and prerequisites
Implementing CHARM requires careful advance planning and meeting all prerequisites. You
need to request the free pre-sales “I/O Optimization for RAS” services offering. The system
must have spare processor and memory capacity to allow a node to be taken offline for hot
repair or upgrade.
You must configure all critical I/O resources using an operating system multi-path I/O
redundancy configuration (for example, multi-path I/O (MPIO), SDDPCM, PowerPath, or
HDLM).
You must configure redundant I/O paths through separate nodes and GX adapters, because
the I/O expansion units that are attached to the GX adapters in that node are unavailable
during a hot node repair or upgrade procedure. These separate nodes and GX adapters can
be either directly attached I/O or virtual I/O that is provided by dual virtual I/O servers housed
in separate nodes (Figure 4-13).
Figure 4-13 shows the system configuration with redundant virtual I/O servers and redundant
I/O adapters to improve the I/O availability and reduce the effect of a hot node repair or
upgrade operation.
Figure 4-13 System configuration with redundant virtual I/O servers and I/O paths
Capacity: If you do not have spare processor and memory capacity, you can use either the
dynamic LPAR operation to reduce processor and memory to minimum size or LPM to
move a partition or several partitions to another server. Otherwise, shut down the low
priority or unnecessary partitions.
Figure 4-14 shows the connections that are disrupted during the hot repair of node 1. A
partition with paths to its I/O configured through node 1 and at least one other node continues
to have access to the Ethernet and storage networks.
Figure 4-14 Redundant virtual I/O servers and I/O paths during the hot repair of node 1
Consider these points about hot repair:
It is strongly recommended that you perform all scheduled hot adds, upgrades, or repairs
during off-peak hours.
Electronic Service Agent (ESA) or Call-Home must be enabled.
All critical business applications must be moved to another server by using LPM, if it is
available, or quiesced, before a hot node add, hot node repair, hot node upgrade, or hot
GX adapter repair.
All LPARs must have an RMC network connection with the HMC.
You must configure the HMC with a redundant service network for both service processors
in the CEC for hot repair or upgrade of the 780.
Do not select the “Power off the system after all the logical partitions are powered off”
property system setting (Figure 4-15 on page 128).
Figure 4-15 Do not select Power off the system after all the logical partitions are powered off
Table 4-4 summarizes the minimum enablement criteria for individual CHARM functions, as
well as other concurrent maintenance functions.
Table 4-4 CCM/CHARM minimum enablement criteria

Functions                            Off-peak (a)  Redundant I/O (b)  ESA-enabled (c)  LPM or quiesce (d)
Fan/Blower/Control Add, Repair       Recommend     -                  -                -
Power Supply/Bulk Power Add, Repair  Recommend     -                  -                -
Operator Panel                       Recommend     -                  -                -
DASD/Media Drive & Drawer Add        Recommend     -                  -                -
DASD/Media Drive & Drawer Repair     Recommend     Prerequisite       -                -
PCI Adapter Add                      Recommend     -                  -                -
PCI Adapter Repair                   Recommend     Prerequisite       -                -
I/O Drawer Add                       Recommend     -                  -                -
I/O Drawer Repair, Remove            Recommend     Prerequisite       -                -
System Controller Repair             Recommend     -                  -                -
GX Adapter Add                       Recommend     -                  Prerequisite     -
GX Adapter Repair                    Recommend     Prerequisite       Prerequisite     Prerequisite
Node Add                             Recommend     -                  Prerequisite     Prerequisite
Node Upgrade (Memory) (e)            Recommend     Prerequisite       Prerequisite     Prerequisite
Hot Node Repair                      Recommend     Prerequisite       Prerequisite     Prerequisite
Next, we describe the supported firmware levels. Table 4-5 provides the minimum and
recommended system firmware, HMC, and IBM Systems Director Management Console
(SDMC) levels for CEC hot node add and hot node repair maintenance operations on the
Power 780. Table 4-6 on page 130 provides the system firmware, HMC, and SDMC levels
for the Power 795.
Table 4-5 System firmware, HMC levels, and SDMC levels for add/repair on Power 780
For more details and the latest update on the minimum and recommended firmware levels for
CHARM on Power 780, refer to the IBM Power Systems Hardware Information Center:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/p7
ed3cm_matrix_mmb.htm
a. It is highly recommended that scheduled upgrades or repairs are performed during off-peak
operational hours.
b. Prerequisite: critical I/O resources are configured with redundant paths.
c. Electronic Service Agent (ESA) enablement is highly recommended for POWER6 systems and
a prerequisite for POWER7 systems.
d. Prerequisite: business applications are moved to another server using LPM, if available, or
critical applications are quiesced.
e. IBM recommends that you not dynamically change the size of the 16 M large page pool in AIX
partitions with the vmo command while a CCM/CHARM operation is in progress.
Hot node add / Hot node repair:
– Minimum: AM720_064 or later; HMC V7R7.2.0 + MH01235
– Recommended: AM720_084 or later; HMC V7R7.2.0 + MH01246
Hot memory add or upgrade:
– Minimum: AM720_064 or later; HMC V7R7.2.0 + MH01235
– Recommended: AM720_084 or later; HMC V7R7.2.0 + MH01246
Hot GX adapter add:
– Minimum: all system firmware levels; HMC V7R7.1.0
– Recommended: AM720_084 or later; HMC V7R7.2.0 + MH01246
Hot GX adapter repair:
– Minimum: AM720_064 or later; HMC V7R7.2.0 + MH01235
– Recommended: AM720_084 or later; HMC V7R7.2.0 + MH01246
Important: If there are two HMCs or SDMCs attached to the system, both HMCs or
SDMCs must be at the same level. If not, the HMC or SDMC that is not at the required level
must be disconnected from the managed system and powered off.
To view the HMC machine code version and release, follow these steps:
1. In the Navigation area, click Updates.
2. In the Work area, view and record the information that appears under the HMC Code
Level heading, including the HMC version, release, maintenance level, build level, and
base versions.
To view the SDMC appliance code version and release, follow these steps:
1. On the SDMC command line, type lsconfig -V.
2. View and record the information that is displayed under the SDMC Code Level heading,
including the SDMC version, release, service pack, build level, and base versions.
130 Power Systems Enterprise Servers with PowerVM Virtualization and RAS
Table 4-6 System firmware, HMC levels, and SDMC levels for Power 795
For more details and the latest update on the minimum and recommended firmware levels for
CHARM on Power 795, refer to the IBM Power Systems Hardware Information Center:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/p7ed3cm_matrix_mmb_9119.htm
Next, we discuss IBM i planning considerations. To allow for a hot node repair/memory
upgrade to take place with i partitions running, the following PTFs are also required:
V5R4: MF45678
V6R1: MF45581
If the PTFs are not activated, the IBM i partitions have to be powered off before the CHARM
operation can proceed.
To disconnect an HMC: To disconnect an HMC from a managed system, follow these
steps:
1. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
2. In the navigation area, expand System Configuration.
3. Select Hardware Management Consoles.
4. Select the desired HMC.
5. Click Remove connection.
To disconnect an SDMC: To disconnect an SDMC from a managed system, refer to this
website:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/topic/dpsm/dpsm_troubleshooting/dpsm_troubleshooting_managedsystemstate_conn_prob.html
Function              Minimum system firmware,      Recommended system firmware,
                      HMC levels, and SDMC levels   HMC levels, and SDMC levels
Hot node add/         AH730_0xx or later            AH730_0xx or later
Hot node repair       V7R7.3.0 + MHyyyy             V7R7.3.0 + MHyyyy
Hot memory add or     AH730_0xx or later            AH730_0xx or later
upgrade               V7R7.3.0 + MHyyyy             V7R7.3.0 + MHyyyy
Hot GX adapter add    AH730_0xx or later            AH730_0xx or later
                      V7R7.3.0 + MHyyyy             V7R7.3.0 + MHyyyy
24-inch I/O drawer    All levels                    All levels
add/removal           V7R7.2.0                      V7R7.2.0
Chapter 4. Planning for virtualization and RAS in POWER7 high-end servers 131
Table 4-7 provides estimated times in minutes for each activity (by role) for a CHARM
operation on a Power 780 Server. The times are shown in minutes, and they are
approximations (~). The estimated times are for a single operation. For a large MES upgrade
with multiple nodes or GX adapters, careful planning by the system administrator and IBM
system service representative (SSR) must be done to optimize the overall upgrade window.
Table 4-7 Estimated time for CHARM operation on a Power 780
There are rules for CHARM operations:
Only a single hot add or repair operation can be performed at one time from one HMC.
In a dual management console environment, all CHARM operations must be performed
from the primary management console.
The “Prepare for Hot Repair/Upgrade” task must be run by the system administrator to
determine the processor, memory, and I/O resources that must be freed up prior to the
start of the concurrent operation.
Features not supported: The following features and capabilities are not supported in
conjunction with CHARM:
Systems clustered using RIO-SAN technology (this technology is used only by IBM i
users clustering using switchable towers and virtual OptiConnect technologies).
I/O Processors (IOPs) used by IBM i partitions do not support CHARM (any IBM i
partitions that have IOPs assigned must either have the IOPs powered off or the
partition must be powered off).
Systems clustered using InfiniBand technology (this capability is typically used by High
Performance Computing clients using an InfiniBand switch).
Sixteen GB memory pages, which are also known as huge pages, do not support
memory relocation (partitions with 16 GB pages must be powered off to allow CHARM).
                  System administrator time (minutes)   SSR time (minutes)
Operation         Prepare for   Resource       Memory         Firmware       Physically
                  node/GX       allocation/    relocation     deactivate/    remove/
                  evacuation    restore        (32-512 GB)    activate       install
Node Add          N/A           ~30            N/A            ~30 - 45       ~60
Node Upgrade      ~30 - 60      ~30            ~11 - 77       ~25 - 40       ~15
Node Repair       ~30 - 60      ~30            ~11 - 102      ~25 - 40       ~15 - 20
GX Add            N/A           ~15            N/A            ~10            ~5
GX Repair         ~10 - 30      ~15            N/A            ~15 - 20       ~8
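For a multi-node MES upgrade, these per-operation estimates can be totaled to size the overall upgrade window. The following sketch is illustrative arithmetic only, using the upper-bound minutes from Table 4-7 for a node upgrade; actual planning must be done with the IBM SSR:

```shell
# Rough worst-case window estimate for N sequential node upgrades,
# using the upper-bound minutes from Table 4-7 (illustrative only).
nodes=2
prepare=60; alloc=30; mem_reloc=77; firmware=40; physical=15
per_node=$((prepare + alloc + mem_reloc + firmware + physical))
total=$((per_node * nodes))
echo "Estimated worst-case window: ${total} minutes for ${nodes} nodes"
```

Because only a single hot add or repair operation can run at a time, the per-operation times add up serially, which is why careful scheduling matters for large upgrades.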
Non-primary management console: In V7R7.3.x.x of the HMC and SDMC, a new feature
was added: if you start a repair from the non-primary management console, you are asked
whether you want to make the console from which you are running the procedure the
primary management console. The consoles then try to renegotiate the role of the primary
management console. If the non-primary HMC can become the primary management
console, you can continue with the procedure on that console. Refer to Figure 6-6 on
page 201.
Figure 4-16 Prepare for Hot Repair/Upgrade utility
A second hot add or repair operation cannot be started until the first one has completed
successfully. If, at first, the hot operation fails, the same operation must be restarted and
completed before attempting another operation.
Multiple hot add or repair operations must be completed by performing a series of single
hot add or repair operations.
You must enable the service processor redundancy capability, if it has been disabled,
before a CHARM operation, except on a Power 780 with a single node.
An IBM system service representative (SSR) must perform the CHARM procedures and
the physical hardware removal and replacement.
You can find additional information about CHARM on the Power 780 Server at the following
website:
Planning for CEC hot node add and hot node repair maintenance
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/abstract_ared3.htm
Planning for concurrent GX adapter or hot node add
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/ared3addhardware.htm
Planning for hot GX adapter or node repair
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/ared3repairhardware.htm
Planning for adding or upgrading memory
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/ared3kickoff.htm
Note: The Prepare for Hot Repair/Upgrade utility is a tool for the system administrator
to identify the effects to system resources in preparation for a hot node repair, hot node
upgrade, or hot GX adapter repair operation. Refer to Figure 4-16. This utility provides
an overview of platform conditions, partition I/O, and processor and memory resources
that must be freed up for a node evacuation. A node is a drawer in a 9117-MMB,
9179-MHB, or 9119-FHB system. For more details about the Prepare for Hot
Repair/Upgrade utility, refer to the IBM Power Systems Hardware Information Center:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7ed3/ared3nodeevac.htm
For the latest version of the planning checklist, refer to the “Planning for CEC hot-node add
and hot-node repair maintenance” section of the IBM Power Systems Hardware Information
Center:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/ared3kickoff.htm
4.4 Software planning
Next, we describe the operating system support requirements and key prerequisites for the
Power 780 and Power 795.
First, we discuss the following AIX operating system requirements for POWER7 systems:
AIX V7.1
AIX V6.1, with the 6100-06 Technology Level
AIX V5.3, with the 5300-12 Technology Level and Service Pack 1, or later
AIX V5.3, with the 5300-11 Technology Level and Service Pack 5, or later (availability
30 September 2010)
AIX V5.3, with the 5300-10 Technology Level and Service Pack 5, or later (availability
30 September 2010)
Although AIX 7.1 is supported on older Power servers, when running on POWER7 servers,
AIX 7.1 is the first OS version that allows an application to scale beyond 64 cores/128
threads, up to 256 cores and 1,024 threads in a single instance. Also, with AIX 7.1, you can
control which processors (cores) are allowed to be interrupted to handle typical system
interrupt requests. This new capability in AIX and Power provides a much friendlier
environment for business-critical applications that require true real-time processing.
If installing the IBM i operating system on POWER7 systems, the following versions are
supported:
IBM i 7.1, or later
IBM i 6.1, with 6.1.1 machine code, or later
If installing the Linux operating system on POWER7, the following versions are supported:
Red Hat Enterprise Linux AP 5 Update 5 for Power, or later
SUSE Linux Enterprise Server 10 Service Pack 3, or later
SUSE Linux Enterprise Server 11 Service Pack 1, or later
For systems ordered with the Linux operating system, IBM ships the most current version that
is available from the distributor. If you require a version different from the one that is
shipped by IBM, you must download it from the Linux distributor’s website.
Information concerning access to a distributor’s website is located on the product registration
card that is delivered to you as part of your Linux operating system order.
If you are installing virtual I/O server, Virtual I/O Server 2.2 or later is required.
There are unique considerations when running Java 1.4.2 on POWER7. For the best
exploitation of the outstanding performance capabilities and most recent improvements of
POWER7 technology, IBM recommends upgrading Java-based applications to Java 6 or Java
5 whenever possible. For more information, refer to the following website:
http://www.ibm.com/developerworks/java/jdk/aix/service.html
IBM Systems Director Version 6.2.1.2 or later is required for CHARM on POWER7.
4.5 HMC server and partition support limits
HMC Version 7.7 supports managing a maximum of 48 servers (non-Power 590/595 models)
or 32 IBM Power 590/595 servers with a maximum of 1,024 partitions across the managed
servers. The number of servers that each HMC can control varies by server size and
complexity. Each server partition must have a physical connection to the network, and the
HMC must be logically connected to each partition via the network connection. For additional
details about the number of servers and LPARs supported, go to this website:
http://www.software.ibm.com/webapp/set2/sas/f/hmc/
4.6 Migrating from POWER6 to POWER7
Before attempting cross-platform migration, familiarize yourself with the binary compatibility
statement of POWER7 with previous generations. This statement applies to the version of
AIX that you might plan to install or use on your POWER7 server.
4.6.1 Migrating hardware from POWER6 and POWER6+ to POWER7
Hardware migration from POWER5 and earlier to POWER7 is currently not possible.
However, it is possible to migrate your current POWER6 and POWER6+™ systems to a
POWER7 server and vice versa. You can obtain a complete list of adapters that can be reused
from a POWER6 server on a POWER7 server in the IBM Power 795 Technical Overview and Introduction,
REDP-4640. You might need to review features, including CoD and PowerVM, with your sales
representative before this migration.
Hardware upgrade path to an IBM Power 780
IBM will replace the following components:
The CEC
Double data rate (DDR2) memory to DDR3
Trim kits
Enterprise enablement
Depending on your system configuration, you might not replace the following components:
The rack
PCIe adapters
Cables, line cords, keyboards, and displays
I/O drawers
Hardware upgrade path to an IBM Power 795
POWER6 machine type 9119-FHA can be migrated to POWER7 machine type 9119-FHB.
The upgrade includes the replacement of the processor books and memory in the 9119-FHA
CEC frame. You must reorder the CoD enablements.
Components that are not replaced
You do not replace the following components:
The current POWER6 bulk power distribution and bulk regulator assemblies
Bulk power regulators
Bulk power distributors
Some 12X PCI-X and PCIe
The IBM Power 795 Technical Overview and Introduction, REDP-4640, specifies the
complete list and associated feature codes.
4.6.2 Migrating the operating system from previous Power servers to POWER7
The following sections provide information about software migrations from previous Power
Systems servers to POWER7 servers.
POWER6 to POWER7 migration
POWER6 to POWER7 migration offers the following possibilities:
Hardware migration, which is discussed in 4.6.1, “Migrating hardware from POWER6 and
POWER6+ to POWER7” on page 134.
Active and inactive migration using LPM, which is introduced in 3.2.3, “Live Partition
Mobility (LPM)” on page 77.
Offline migration, which is a similar process to Migrating from POWER5 and earlier
systems to POWER7.
Migrating from POWER5 and earlier systems to POWER7
In this section, an application refers to any non-operating system software, including vendor
off-the-shelf, packaged applications, databases, custom-made applications, and scripts.
The AIX installation, whether Network Installation Management (NIM) or media, provides a
few installation options. These options include a new and complete overwrite, preservation
install, and migration. NIM from A to Z in AIX 5L, SG24-7296, explains these options in detail.
The following examples show NIM installations and not media-based installation. The
preferred I/O environment takes advantage of the PowerVM virtual I/O setup.
There is no hardware-based path to move from POWER5; therefore, there is no active
migration. The migration options in this section also apply to POWER6-to-POWER7
migration, for which our preferred option is LPM. The options that follow require that you
create an LPAR. After you create the LPAR, you can perform one of these migrations:
New and complete installation
mksysb restore
SAN-based migration using physical HBAs
SAN-based migration using virtual adapters
Alternate disk installation using SAN
New and complete overwrite installation
This option is an installation of AIX (BOS), which is often called a “new and complete
overwrite”. Ensure that you have enough disk space to perform the installation. The
installation media can be CD or DVD. However, NIM is the preferred and recommended
method.
With the “New and complete overwrite” method, consider the following information:
All data on the selected disk is lost.
All customized system settings are lost. These settings might be required by the
applications. Examples are custom network settings, including static routes, and file
system settings, including VIO users. You must set these options again.
All applications must be reinstalled.
If data resided on the same disk, the data must be restored from backup media.
If data resided on a separate volume group, recover it by importing the volume group as
shown in “Example importing non-root volume group” on page 357.
Follow these steps for a new complete overwrite installation on a new server:
1. Prepare the LPAR.
2. Make disks available to the LPAR using either NPIV or vSCSI.
3. Prepare the NIM environment or boot from installation media.
4. Initiate Base Operating System (BOS) installation.
5. Start up the LPAR and select the new and complete overwrite option. The following steps
are explained in NIM from A to Z in AIX 5L, SG24-7296:
a. Start the LPAR from the HMC into SMS.
b. Select Boot Options.
c. Select Boot install devices.
d. Select install devices.
e. Choose Option 6: Network.
f. Select Bootp.
g. Choose the appropriate Network Adapter.
h. Select Normal Boot mode.
i. Select Yes to start.
j. Type 1 and press Enter to use this terminal as the system console.
k. Select option 2 Change/Show Installation Settings and Install on the Base Operating
System installation window, as shown in Figure 4-17.
Figure 4-17 Base Operating System Installation and Maintenance window
l. You are prompted either to install with the current installation settings or to make
changes, as shown in Figure 4-18.
Figure 4-18 BOS Installation and Settings window
m. Select option 1 New and Complete Overwrite. This option overwrites everything on the
selected disk, as shown in Figure 4-19.
Figure 4-19 Change Method of Installation
n. Selecting option 1 continues with the installation and overwrites the contents of the
selected disk.
The mksysb restore command
The mksysb restore command allows you to restore operating system files and
configurations. This method restores all file systems that were mounted when the backup was
taken. You must not use the mksysb command as a backup strategy for non-operating system
files (data). Follow these steps for using the mksysb restore procedure:
1. Prepare the LPAR.
2. Make disks available to the LPAR using either NPIV or vSCSI. Physical devices are not
recommended for this volume.
3. Prepare the NIM environment:
– If initiating a restore from installation media, you must boot either from a tape that was
created using the mksysb command or from a CD/DVD that was created using the
mkdvd command.
– If a NIM server is used, you need a valid mksysb image that was created with the
mksysb command, a SPOT (Shared Product Object Tree), and an lpp_source.
The lpp_source: The lpp_source is required because it contains device files that
might not be included in the mksysb. These devices are POWER7-specific devices
that might not be available on POWER5. The required device files include virtual
devices that might not have been installed on the source machine.
4. Initiate the BOS installation. When using NIM to initiate booting, select mksysb - Install
from a mksysb and not rte, as shown in Figure 4-20.
Figure 4-20 NIM BOS installation type selection
5. Boot the LPAR into maintenance mode and follow these steps, which help to start an
LPAR using media:
a. Start the LPAR from the HMC into SMS.
b. Select Boot Options.
c. Select Boot install devices.
d. Select install devices.
e. Choose Option 6: Network.
f. Select Bootp.
g. Choose the appropriate Network Adapter.
h. Select Normal Boot mode.
i. Select Yes to start.
j. Type 1 and press Enter to use this terminal as the system console.
k. On the Base Operating System installation window that is shown in Figure 4-21, select
Option 2 Shrink File Systems. Figure 4-21 shows that there is no preservation or
migration option. There is only an option to select a target disk. After the target disk is
selected, the mksysb restore process continues.
Figure 4-21 Selecting a disk on which to restore an mksysb
Consider the following information when using the mksysb restore command:
All data on the selected target disk is lost.
Customized system settings and parameters are restored from the mksysb image.
All application binaries and data residing on the mksysb disk are restored, with the
exception of any directories and subdirectories that are listed in the /etc/exclude.rootvg
file.
If there is any data residing on a separate volume group, recover it by importing the
volume group, as shown in “Example importing non-root volume group” on page 357. This
method is a safer option than a new and complete overwrite. The process is still
cumbersome.
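The /etc/exclude.rootvg mechanism mentioned above takes egrep-style patterns that are matched against ./-relative paths; that pattern format is our assumption from standard AIX mksysb behavior, so verify it against your AIX documentation. The following simulation shows the filtering against a sample file list rather than a real rootvg:

```shell
# Simulate how patterns in /etc/exclude.rootvg filter files out of a
# mksysb backup. Sample exclude file and file list only; pattern
# format (egrep, matched against ./-relative paths) is an assumption.
workdir=$(mktemp -d)
cat > "$workdir/exclude.rootvg" <<'EOF'
^./home/appdata/
^./tmp/
EOF
cat > "$workdir/filelist" <<'EOF'
./etc/passwd
./home/appdata/db.dump
./tmp/scratch.log
./usr/bin/ls
EOF
# Files that match no exclude pattern would be included in the backup
grep -E -v -f "$workdir/exclude.rootvg" "$workdir/filelist"
rm -r "$workdir"
```

In this sample, ./home/appdata/db.dump and ./tmp/scratch.log are dropped, which is how application data can be kept out of an operating-system-only mksysb image.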
4.6.3 Disk-based migrations
The following methods remove the need to install or restore the operating system or the
application data. The operating system and the data are taken as is by pointing physical
volumes to host bus adapters (HBAs) on the POWER7 servers. Your storage team must be
involved in this process.
4.6.4 SAN-based migration with physical adapters
In this method, HBAs can be allocated directly to an LPAR. We are not using the virtual I/O
server. Although we mention this method, we recommend that you use virtual I/O server,
which is part of the PowerVM virtualization offering. Follow these steps:
1. Identify the disks that are allocated to the LPAR on the POWER5 server:
– Many commands exist to identify which disks are connected to the LPAR. Most of
these commands are vendor-based multipath software commands, such as pcmpath
query device. Other commands are AIX commands and VIO commands. Although
Example 4-2 shows two commands that can be used to get the serial numbers of the
disks that must be zoned or mapped to the target system, there are a number of
available commands, depending on the installed devices and drivers.
Example 4-2 Using pcmpath and lsattr -El to identify a LUN serial
# pcmpath query device 1
DEV#: 1 DEVICE NAME: hdisk1 TYPE: 1814 ALGORITHM: Load Balance
SERIAL: 600A0B800026B28200007ADC4DD13BD8
==========================================================================
Path# Adapter/Path Name State Mode Select Errors
0 fscsi0/path0 CLOSE NORMAL 0 0
1 fscsi2/path2 CLOSE NORMAL 0 0
2 fscsi0/path1 CLOSE NORMAL 162 0
3 fscsi2/path3 CLOSE NORMAL 144 0
Using lsattr
# lsattr -El hdisk1 -a unique_id
unique_id 3E213600A0B800026B28200007ADC4DD13BD80F1814 FAStT03IBMfcp PCM False
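The LUN serial that pcmpath reports is embedded in the ODM unique_id string shown above. The following sketch extracts it; the 5-character prefix and 32-character serial offsets are an assumption based on the DS4700/FAStT sample in Example 4-2, and other storage types use different unique_id layouts:

```shell
# Extract the 32-character LUN serial from the unique_id shown in
# Example 4-2. Offset assumption: a 5-character prefix ("3E213")
# precedes the serial on this DS4700/FAStT device; verify for your
# storage type before relying on these offsets.
unique_id='3E213600A0B800026B28200007ADC4DD13BD80F1814 FAStT03IBMfcp PCM False'
serial=$(echo "$unique_id" | awk '{print substr($1, 6, 32)}')
echo "$serial"
```

The extracted value matches the SERIAL field in the pcmpath output above, which is the number the storage team needs for zoning or mapping to the target system.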
– Check that none of the disks are internal disks. If any of the disks are internal, you
must either replace them with SAN-attached disks, or you must migrate them to other
disks on the same volume group using either the migratepv or replacepv command.
PowerVM Migration from Physical to Virtual Storage, SG24-7825, explains other
options of migrations from physical to virtual. Use the lsvg and lsdev commands to
confirm if there are any internal disks that are allocated on a volume group.
Example 4-3 shows a root volume group with internal disks. These disks must be
migrated to SAN storage.
Example 4-3 A rootvg with internal disks
lsvg -p rootvg
root@nimres1 / # lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 546 11 00..00..00..00..11
root@nimres1 / #
Notice that hdisk0 is allocated to rootvg. lsdev -Cc disk shows hdisk0 is physically attached
to the server.
root@nimres1 / # lsdev -Cc disk
hdisk0 Available 04-08-00-3,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 04-08-00-4,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 04-08-00-5,0 16 Bit LVD SCSI Disk Drive
hdisk3 Available 04-08-00-8,0 16 Bit LVD SCSI Disk Drive
– Example 4-4 shows a root volume group using SAN-attached disks. The rootvg on
Example 4-3 cannot be “zoned” to the POWER7, because it is internal to the POWER5
hardware.
Example 4-4 A rootvg with SAN-attached disks
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 79 2 00..00..00..00..02
#
Notice that hdisk0 is allocated to rootvg. lsdev -Cc disk shows hdisk0 is a Multi Path I/O
device (mpio)
# lsdev -Cc disk
hdisk0 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk1 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk2 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk3 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk4 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk5 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk6 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk7 Available 02-00-02 IBM MPIO DS4700 Array Disk
hdisk8 Available 00-00-02 IBM MPIO DS4700 Array Disk
hdisk9 Available 00-00-02 IBM MPIO DS4700 Array Disk
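Given lsdev listings like the two above, a quick filter can separate disks that look internal (parallel SCSI) from SAN-attached (MPIO) ones. This sketch is driven by sample lines copied from Examples 4-3 and 4-4; on a live system you would pipe the real lsdev -Cc disk output instead, and description strings vary by driver and storage vendor:

```shell
# Flag internal (parallel SCSI) versus SAN-attached (MPIO) disks in
# lsdev -Cc disk output. Sample lines copied from Examples 4-3/4-4;
# description strings vary by driver and storage vendor.
tmpfile=$(mktemp)
cat <<'EOF' > "$tmpfile"
hdisk0 Available 04-08-00-3,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 02-00-02 IBM MPIO DS4700 Array Disk
EOF
printf 'Internal (migrate before zoning to POWER7):\n'
awk '/SCSI Disk Drive/ {print "  " $1}' "$tmpfile"
printf 'SAN-attached (can be re-zoned):\n'
awk '/MPIO/ {print "  " $1}' "$tmpfile"
rm -f "$tmpfile"
```

Any disk flagged as internal must first be moved to SAN storage with migratepv or replacepv, as described above, before the volume group can be zoned to the POWER7 server.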
2. Prepare the LPAR, as shown in 2.10.1, “Creating a simple LPAR” on page 53.
3. Make disks available to the LPAR using either NPIV or vSCSI. 2.7.2, “N_Port ID
Virtualization (NPIV)” on page 43 shows the process.
4. Shut down the LPAR on the source server.
Figure 4-22 Stage where IPL stops due to corrupted root file systems and boot device
Note: If you do not shut down the LPAR on the source server before starting it on the
destination server, the operating system does not start. The file systems, including the
rootvg file systems, will be corrupted. This situation creates a code 553 or 557 on the
destination server and file system corruption on the source, and you will have to restore
from a mksysb. This condition does not show immediately; it shows when the server is
rebooted. The IPL stops at the window that is shown in Figure 4-22, and Figure 4-23 on
page 143 shows the error code that the HMC or SDMC displays.
Figure 4-23 Code caused by corrupt file system or boot device
5. Start the LPAR on the POWER7 in system management services (SMS) by using either
NIM rte or media. Go into system maintenance mode by selecting option 3 Start
Maintenance Mode for System Recovery on the Base Operating System Installation and
Maintenance menu, as shown on Figure 4-24. Refer to NIM from A to Z in AIX 5L,
SG24-7296, which shows how to prepare the NIM server to get to the Base Operating
System Installation and Maintenance menu.
Figure 4-24 Selecting the option to start an LPAR in maintenance mode
6. Select option 1 Access a Root Volume Group, as shown in Figure 4-25.
Figure 4-25 Accessing a root volume group for system maintenance
7. Select 0 to continue when prompted.
8. Select the disk that contains the root volume group.
9. Select the Access the volume group and start the shell option. You now have access to the
disk. The controlling image is the SPOT from NIM, not the operating system on your server.
When you enter the df command, the command shows which file systems are mounted.
The NIM file systems are also displayed. See Figure 4-26.
Figure 4-26 RAMFS file systems in system maintenance mode
10.After you are in system maintenance, run the following commands (the output is shown in
Example 4-5 on page 145):
a. Run the cfgmgr command for the Device configuration manager.
b. Run the bosboot command to recreate the boot image.
c. Run bootlist to confirm the bootlist.
Example 4-5 Configuring POWER7 devices that might not be on the boot image on disk
# cfgmgr
# bosboot -ad /dev/hdisk0
bosboot: Boot image is 29083 512 byte blocks.
# bootlist -m normal -o
ent0 bserver=172.16.20.40 client=172.16.21.35 gateway=172.16.20.40
ent1 bserver=172.16.20.40 client=172.16.21.35 gateway=172.16.20.40
hdisk0 blv=hd5
# bootlist -m normal hdisk0
11.After the cfgmgr command completes successfully, and the bosboot command completes
without a failure, restart the LPAR by running shutdown -Fr.
Alternate disk installation (alt_disk_clone)
This method is similar to 4.6.4, “SAN-based migration with physical adapters” on page 140
with the following differences:
With alt_disk_clone, you clone the operating system to an alternate disk before making the
disk available to the destination server.
You can allocate either the original disk or the cloned disk to the target server.
The added safety of alternate disk is that you can return to the original server in its original
state, and the operating system has no added drivers and filesets.
After the alt_disk_copy is completed and cleaned, the alternate disk must be removed from
the source and allocated to the target server. You can follow the process in 4.6.4, “SAN-based
migration with physical adapters” on page 140.
Example 4-6 shows the commands to create a clone.
Example 4-6 Commands showing how to create a clone
# hostname
rflpar20
# lspv
hdisk0 00c1f170c2c44e75 rootvg active
hdisk1 00f69af6dbccc5ed None
hdisk2 00f69af6dbccc57f datavg
# alt_disk_copy -d hdisk1
Calling mkszfile to create new /image.data file.
Checking disk sizes.
Creating cloned rootvg volume group and associated logical volumes.
Creating logical volume alt_hd5
Creating logical volume alt_hd6
Creating logical volume alt_hd8
Creating logical volume alt_hd4
Creating logical volume alt_hd2
Creating logical volume alt_hd9var
Creating logical volume alt_hd3
Creating logical volume alt_hd1
Creating logical volume alt_hd10opt
Creating logical volume alt_hd11admin
Creating logical volume alt_lg_dumplv
Creating logical volume alt_livedump
Creating /alt_inst/ file system.
/alt_inst filesystem not converted.
Small inode extents are already enabled.
Creating /alt_inst/admin file system.
/alt_inst/admin filesystem not converted.
Small inode extents are already enabled.
Creating /alt_inst/home file system.
/alt_inst/home filesystem not converted.
Small inode extents are already enabled.
Creating /alt_inst/opt file system.
/alt_inst/opt filesystem not converted.
Small inode extents are already enabled.
Creating /alt_inst/tmp file system.
/alt_inst/tmp filesystem not converted.
Small inode extents are already enabled.
Creating /alt_inst/usr file system.
/alt_inst/usr filesystem not converted.
Small inode extents are already enabled.
Creating /alt_inst/var file system.
/alt_inst/var filesystem not converted.
Small inode extents are already enabled.
Creating /alt_inst/var/adm/ras/livedump file system.
/alt_inst/var/adm/ras/livedump filesystem not converted.
Small inode extents are already enabled.
Generating a list of files
for backup and restore into the alternate file system...
Backing-up the rootvg files and restoring them to the
alternate file system...
Modifying ODM on cloned disk.
Building boot image on cloned disk.
forced unmount of /alt_inst/var/adm/ras/livedump
forced unmount of /alt_inst/var/adm/ras/livedump
forced unmount of /alt_inst/var
forced unmount of /alt_inst/var
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/usr
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/tmp
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/opt
forced unmount of /alt_inst/home
forced unmount of /alt_inst/home
forced unmount of /alt_inst/admin
forced unmount of /alt_inst/admin
forced unmount of /alt_inst
forced unmount of /alt_inst
Changing logical volume names in volume group descriptor area.
Fixing LV control blocks...
Fixing file system superblocks...
Bootlist is set to the boot disk: hdisk1 blv=hd5
# bootlist -m normal -o
hdisk1 blv=hd5
# bootlist -m normal hdisk0
The size and contents of your disks affect the time that it takes for the alternate disk to
complete.
VIO server-based migration using virtual adapters
This method requires the creation of Virtual SCSI or Virtual Fibre. Refer to “Creating Virtual
FC adapters” on page 231. After allocating the disks, follow the process that is described in
4.6.4, “SAN-based migration with physical adapters” on page 140.
4.6.5 After migration to POWER7
Review the AIX prerequisites for running POWER7. The version of AIX has an effect on the
mode in which your POWER7 server runs. To take full advantage of the POWER7 features,
upgrade AIX to Version 7.1. Consult your application vendors to confirm compatibility. Also,
refer to Exploiting IBM AIX Workload Partitions, SG24-7599, for migrating an AIX Version 5.2
LPAR. This Redbooks publication shows the creation of a Versioned WPAR, which can run
AIX Version 5.2 TL 8 and later.
Reasons to consider running the latest AIX version:
AIX 5.3: With the 5300-09 TL and Service Pack 7, or later, the LPAR runs only in POWER6
or POWER6+ mode. You can run at most two threads per core (SMT2); thus, you cannot
run four threads per core as designed for POWER7.
AIX Version 6.1: Prior to TL 6, the LPAR mode was POWER6 or POWER6+.
AIX Version 6.1 TL 6 and later: The LPAR runs in POWER7 mode, but it is limited to 64
cores.
AIX Version 7.1: It exploits all the capabilities of the POWER7 architecture.
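The AIX-level-to-SMT relationship above can be summarized as a small lookup. This is purely illustrative (the values come from the text, not from a live system; on AIX itself, the smtctl command reports and sets the actual SMT mode):

```shell
# Purely illustrative: the maximum SMT threads per core on POWER7
# for each AIX level discussed above (values from the text, not
# queried from a live system).
max_threads_per_core() {
  case "$1" in
    "5.3")         echo 2 ;;  # POWER6/POWER6+ mode only (SMT2)
    "6.1 pre-TL6") echo 2 ;;  # POWER6/POWER6+ mode
    "6.1 TL6+")    echo 4 ;;  # POWER7 mode, limited to 64 cores
    "7.1")         echo 4 ;;  # full POWER7 exploitation
    *)             echo "unknown" ;;
  esac
}
max_threads_per_core "5.3"
max_threads_per_core "7.1"
```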
Table 4-8 on page 148 from the IBM Power 770 and 780 Technical Overview and Introduction,
REDP-4639, shows the benefits that you can derive from running in POWER7 mode
compared to POWER6.
Supported device drivers: In all the migration procedures that we have discussed, check that you have supported device drivers. If you do not, one way to resolve this issue is to install all device support with the Base Operating System installation. This method requires disk space and makes the installation and upgrades longer. If you do not have all the supported device drivers, LPM might not work.
You can set an LPAR running AIX Version 6.1 TL 6 or AIX Version 7.1 to an earlier processor mode using the LPAR Profile Processors tab. We recommend that you leave the option at the default; the processor mode then changes based on the operating system level. The next sections discuss the LPAR mode and provide examples.

Table 4-8 Benefits of running POWER7 mode

POWER6 (and POWER6+) mode | POWER7 mode | Client value
Two-thread SMT | Four-thread SMT | Throughput performance, processor core utilization
Vector Multimedia Extension (VMX)/AltiVec | Vector Scalar Extension (VSX) | High-performance computing
Affinity OFF by default | Three-tier memory, Micro-partition Affinity | Improved system performance for system images spanning sockets and nodes
Barrier Synchronization; Fixed 128-byte Array; Kernel Extension Access | Enhanced Barrier Synchronization; Variable-Sized Array; User Shared Memory Access | High-performance computing, parallel programming, synchronization facility
64-core and 128-thread scaling | 32-core and 128-thread scaling; 64-core and 256-thread scaling; 256-core and 1,024-thread scaling | Performance and scalability for large scale-up single system image workloads, such as online transaction processing (OLTP), ERP scale-up, and WPAR consolidation
EnergyScale CPU Idle | EnergyScale CPU Idle and Folding with NAP and SLEEP | Improved energy efficiency
Notice on the Processing Settings tab, when creating an LPAR, that there is no option to choose which processor mode to use. See Figure 4-27.
Figure 4-27 Initial creation of an LPAR: No processor mode option
After the LPAR creation, you can change the processor mode on the LPAR when you are on
the POWER7 system. Follow these steps:
1. Log on to the SDMC.
2. Select hosts.
3. Select Virtual Server → Action → System Configuration → Manage Profile.
These choices are shown in Figure 4-28.
In the following figures, we show you how to use an HMC to change the system mode. We
explain how to use an SDMC in detail in 5.1, “SDMC features” on page 160.
4. Log in to the HMC.
5. Select System Management.
6. Select the system. This option lists a few LPARs.
7. Click on the LPAR.
8. Select Tasks → Configuration → Manage Profiles. The window that is shown in Figure 4-28 opens.
Figure 4-28 Editing a virtual server profile
9. Select the profile that you need to edit. Select the Processor tab. On the Processor tab, select the appropriate mode in the Processor compatibility mode list box, as shown in Figure 4-29 on page 151.
Figure 4-29 POWER7 Processor compatibility Mode
Example 4-7 shows the lsconf command, which allows you to see the processor mode of a
running AIX LPAR.
Example 4-7 Showing the processor mode of an AIX LPAR
# lsconf | head
System Model: IBM,9117-MMA
Machine Serial Number: 101F170
Processor Type: PowerPC_POWER6
Processor Implementation Mode: POWER 6
Processor Version: PV_6_Compat
Number Of Processors: 2
Processor Clock Speed: 4208 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 6 lpar2_570
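To check just the compatibility mode, filter the output. Here the Example 4-7 output is embedded as a string so the pipeline can be shown end to end; on a live LPAR you would run `lsconf` directly.

```shell
# Sample lsconf output taken from Example 4-7.
lsconf_sample='System Model: IBM,9117-MMA
Processor Type: PowerPC_POWER6
Processor Implementation Mode: POWER 6
Processor Version: PV_6_Compat'

# On a live system: lsconf | grep 'Processor Implementation Mode'
printf '%s\n' "$lsconf_sample" | grep 'Processor Implementation Mode'
```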
4.7 Technical and Delivery Assessment (TDA)
IBM is continually striving to improve worldwide solution quality. The Technical and Delivery
Assessment (TDA) is an objective, third-party technical expert inspection of your completed
Power System solution design to answer three questions:
Will your solution work?
Is IBM prepared to implement it successfully?
Will your proposed solution meet your requirements and expectations?
The Power 780 and Power 795 servers require a mandatory pre-installation TDA before the order ships. The pre-installation TDA is designed to evaluate your (the client's) readiness to install, implement, and support your new Power System solution. In addition, the TDA is designed to minimize installation problems, minimize IBM and IBM Business Partner support costs, and document the actions that are required for success.
We want to ensure that you are getting the correct solution to meet your business
requirements. This process uses IBM technical support to provide expert skills to identify
activities that are required for a successful solution implementation.
Our solution assurance works because we have designed the process over time using actual
client experiences. The preparation of the TDA document is an exercise that can be extremely
revealing. It pulls the entire solution together from one central view and can reveal if there are
missing components. During the TDA review, experts review the overall solution and help
identify what might have been overlooked in the design. It provides many perspectives with a
single consistent approach.
When IBM conducts your TDA review, all appropriate team members are invited (IBM, IBM
Business Partner, and client) and required to attend. The pre-installation review needs to be
completed one to two weeks before the Power system ships, or earlier if significant porting,
moving, or site preparation tasks are required. During this review, the experts discuss the
following items:
Power requirements
Space requirements
Cabling requirements
Installation plan and responsibilities
Upgrade plan and responsibilities
Services and support
There are three possible outcomes to a TDA. The solution stage assessments are “passed”,
“passed with contingency on action items”, and “not recommended”.
Passed
If the subject matter experts (SMEs) approve the solution as presented, the proposed design and solution proceed. A result of “Passed” does not mean that there are no outstanding action items; there might be many. However, the outstanding action items associated with a review that receives this rating must have a predictable outcome that does not alter the viability of the overall solution. For instance, an action item might be to review the options that you as the client have for maintenance offerings. This item can be performed, and the outcome does not alter the nature of the solution technically.
Passed with contingency on action items
This outcome is a conditional approval that depends on the results of certain specified action items. For example, suppose that a certain version of an application is an absolute prerequisite to support your proposed Power server, but it was not known whether that version was actually available. The reviewers might elect to approve, contingent on verification that the required version can be installed or upgraded to the required release.
A contingency on action item differs from an ordinary action item in that its outcome is
uncertain, yet critical to the viability of the proposed solution. In the case of a “Passed With
Contingency on Action Items” result, your IBM sales team must take steps to execute the
contingent action items and ensure that the outcomes are the ones needed to satisfy the TDA
conditions.
Not recommended
This result means that the reviewers do not agree that the solution is technically viable. A
solution might be “Not recommended” due to the lack of sufficiently detailed information to
evaluate the solution. The “Not recommended” result does not occur often. We list several
reasons that reviewers might conclude that a solution is “Not recommended”:
The solution, as presented, fails to meet the requirements articulated by the client and
cannot be rescued with minor adjustments.
The solution presenter cannot provide sufficient information to allow the reviewers to judge
the technical viability of the solution.
The technical risk that is associated with the solution is unreasonably high.
Your TDA review must document two solution risk assessment ratings:
Before Action Items Are Completed: Risk assessment is for the solution “as is”, at the time
of the TDA.
After Action Items Are Completed: Risk assessment is for the solution with the assumption
that all recommended action items that result from the review are completed on schedule.
There are three risk assessment levels:
High
Medium
Low
The members of your IBM sales and technical sales team need to ensure that all action items
are completed correctly.
4.8 System Planning Tool (SPT)
The System Planning Tool (SPT) is a browser-based application that helps you design your
system configurations and is particularly useful for designing logically partitioned systems. It
is available to assist in the design of an LPAR system and to provide an LPAR validation
report that reflects your system requirements while not exceeding IBM’s LPAR
recommendations. It is also used to provide input for your specified hardware placement
requirements. System plans that are generated by the SPT can be deployed on the system by
the HMC, SDMC, and the IVM. The SPT is intended to be run on the user’s personal
computer, and it is provided as is with no implied or expressed warranty of any kind.
SPT: The SPT is available for download at this website:
http://www-947.ibm.com/systems/support/tools/systemplanningtool/
You can use the SPT to design both a logically partitioned system and a non-partitioned system. You can create an entirely new system configuration from scratch, or you can create a system configuration based on any of the following information:
Performance data from an existing system that the new system will replace
A performance estimate that anticipates future workload requirements
Sample systems that you can customize to fit your needs
After designing a system with the SPT, you can generate the following information:
Reports that detail the system configuration that you have architected
System-plan files that can be moved to the HMC, SDMC, or IVM and used to deploy your system plan
The SPT uses a file format called .sysplan, which is used on your management console to systematically distribute your system plan. The .sysplan file can be renamed to .zip, and an XML file can be extracted for possible manipulation outside of the SPT tool and the HMC or SDMC.
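The rename-and-extract step can be scripted. The sketch below fabricates a tiny stand-in archive (a real .sysplan comes from the HMC or SDMC export) so the copy-to-.zip and extraction steps can be demonstrated end to end; the file names are placeholders, and Python's zipfile command-line interface is used so that no extra tools are required.

```shell
# Fabricate a stand-in .sysplan (really a ZIP archive containing XML);
# a real file would be exported from the HMC or SDMC.
printf '<SystemPlan/>' > plan.xml
python3 -m zipfile -c demo.sysplan plan.xml

# The documented trick: copy the .sysplan to a .zip name, then extract the XML.
cp demo.sysplan demo.zip
python3 -m zipfile -e demo.zip extracted/
ls extracted/
```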
For managed systems with virtual I/O server installed, the HMC code must be at 7.3.3 (or
greater) and the virtual I/O server must be at Fix Pack 10.1 (or greater) to generate SPT files
from your HMC.
You can review the following reports from the SPT viewer:
Partition-wise processor summary report
Partition-wise memory summary report
Virtual SCSI server-client slot mappings
Virtual FC server-client slot mappings
Verify dual virtual I/O server configurations for preferred practices
We highly suggest that you create a system plan using the SPT before and after any
hardware changes are made. Additionally, any major changes or new systems need to be
built in SPT before an order is placed to ensure their validity.
To use the HMC or SDMC to create a system plan successfully, you need to ensure that your
system meets a number of prerequisite conditions.
A system plan that you create by using HMC V7.3.3 or later, or the SDMC V6.2.1.2 or later,
contains hardware information that the management console was able to obtain from your
selected managed system. However, the amount of hardware information that can be
captured for the system plan varies based on the method that was used to gather the
hardware information.
The management console can potentially use two methods: inventory gathering and
hardware discovery. When using hardware discovery, the HMC/SDMC can detect information
about hardware that is unassigned to a partition or that is assigned to an inactive partition.
Additionally, the HMC/SDMC can use one or both of these methods to detect disk information for IBM i LPARs. You collect higher-quality and more complete data for the system plans if you use the hardware discovery process.
The IBM POWER7 information center gives the detailed requirements for both inventory
gathering and hardware discovery at this website:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
To create a system plan by using the HMC, complete the following steps:
1. In the navigation area, select System Plans. The System Plans page opens.
2. In the Tasks area, select Create System Plan. The Create System Plan window opens.
3. Select the managed system that you want to use as the basis for the new system plan.
4. Enter a name and description for the new system plan.
5. Optional: Select whether you want to retrieve inactive and unallocated hardware
resources. This option appears only if the managed system is capable of hardware
discovery, and the option is selected by default.
6. Optional: Select whether you want to view the system plan immediately after the HMC
creates it.
7. Click Create.
Figure 4-30 shows the HMC panels that you use to create your sysplan.
Figure 4-30 Creating the system plan pane
Now that you have created a new system plan, you can export the system plan, import it onto another managed system, and deploy the system plan to that managed system.

Important: If you do not select the “Retrieve inactive and unallocated hardware resources” option, the HMC does not perform a new hardware discovery, but instead uses the data in the inventory cache on the system. The HMC still performs inventory gathering and retrieves hardware information for any active LPARs on the managed server. The resulting new system plan contains hardware information from the inventory-gathering process and hardware information from the hardware inventory cache on the system.

There are several methods to create a system plan. As an alternative to the HMC web user interface, you can use the following methods to create a system plan that is based on the configuration of an existing managed system:
Run the mksysplan command from the HMC command-line interface (CLI).
Run the mksysplan command from the SDMC CLI.
Use the SDMC web user interface.
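A minimal sketch of the CLI route, assuming an HMC session (on the SDMC, prefix the command with smcli). The managed system name and file name below are placeholders; the command is only constructed and echoed here so the syntax can be shown without a live console.

```shell
# Placeholders: substitute your managed system name and a file name of your choice.
managed_system="Server-9119-FHB-SN02ABCDE"
plan_file="pre_change.sysplan"

# HMC CLI form; on the SDMC run: smcli mksysplan ...
cmd="mksysplan -m $managed_system -f $plan_file"
echo "$cmd"
```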
The POWER7 Enterprise Servers support the Customer Specified Placement (CSP) of I/O adapters and I/O devices within the CEC and I/O drawers. Through the CSP feature, IBM Manufacturing can customize your Power server order to match your requested hardware placement, down to the slot and drawer level, before the server arrives at your site. We strongly advise that you use CSP for all Power 780 and 795 orders.
Without CSP, IBM Manufacturing makes an effort to distribute adapters evenly across buses, planars, and drawers. However, this default placement might not be optimal for your specific performance, availability, or LPAR connectivity requirements.
CSP specifications are collected using the SPT and processed through eConfig, or
placement requirements can be specified directly in eConfig using the Placement view. An
advantage of using SPT is that it allows the CSP information to be copied and preserved.
CSP requires your IBM account team to submit the cfreport output of eConfig to IBM Manufacturing in a timely manner (within 24 hours) via the CSP website. It also requires your account team to ensure that the submitted eConfig output reflects the actual order placed.
We strongly advise that you create a system plan using the SPT before and after any changes
are made to existing hardware configuration. Additionally, any major changes or new
systems need to be built in SPT before an order is placed to ensure that the changes or new
systems are valid.
4.9 General planning guidelines for highly available systems
For a highly available operating environment that takes advantage of reduced planned and
unplanned outages, planning is important. Plan toward eliminating single points of failure
(SPOFs) within a single system or cluster of interconnected systems that support an
application or applications. The IBM PowerHA SystemMirror Planning Guide, SC23-6758,
suggests the following considerations when eliminating SPOFs.
Considerations within a single managed system
The following considerations help eliminate SPOFs in a single server system:
Power source: Use multiple circuits or uninterruptible power supplies.
Networks: Use multiple networks to connect nodes. Use redundant network adapters.
TCP/IP subsystems: Use as many TCP/IP subsystems as required to connect to users.
Disk adapters: Use redundant disk adapters.
Controllers: Use redundant disk controllers.
Disks: Use redundant hardware and disk mirroring.
Cluster repository: Use RAID protection.
Virtual I/O server: Use redundant VIO servers.
System management: Use redundant HMCs, SDMCs, or a combination of HMCs and
SDMCs.
Disaster recovery planning: The SPT is also an excellent tool for documentation and needs to be included as input into your disaster recovery plan.
Considerations for enhancing availability
The following considerations help enhance availability:
Nodes: We suggest that you use multiple physical nodes.
Applications: Use clustering, such as PowerHA, Cluster Aware AIX (CAA), or high
availability disaster recovery (HADR). You need to assign additional nodes for the
takeover.
Mobility: Use either Live Application Mobility (LAM) or Live Partition Mobility (LPM).
Sites: Use more than one site; disaster recovery requires multiple sites.
You must complement all planning and implementation with testing. Skipping the planning stage can result in infrequent but high-impact errors. The more scenarios that you can test, the more resilience you build into the solutions that are provided.
Together with planning and testing comes training. Users need to be trained, both on the job and formally, to be able to take advantage of the features that are provided. Clients also need to plan for the components on a single system that might have to be repaired.
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 5. POWER7 system management
consoles
This section explores the POWER Server management console solutions through the
Hardware Management Console (HMC), IBM Systems Director Management Console
(SDMC), and the IBM Systems Director console.
We discuss the following topics in this section:
SDMC features
Virtualization management: Systems Director VMControl
IBM Systems Director Active Energy Management (AEM)
High availability Systems Director management consoles
5.1 SDMC features
The SDMC is designed to be a successor to both the Hardware Management Console (HMC)
and the Integrated Virtualization Manager (IVM) for Power Systems administration. The
Power Systems management is integrated into the Systems Director framework, which allows
for the management of many systems of various types.
5.1.1 Installing the SDMC
The SDMC installation involves the following tasks:
The installation of the hardware appliance that is required for all midrange and high-end
systems.
The installation of the software appliance that replaces IVM.
The use of the setup wizard at the end of the installation process to set up and perform the
initial configuration of the SDMC.
The SDMC virtual machine contains Linux as the base operating system. The virtualization
layer for the hardware appliance is fixed and cannot be changed. The hardware is provided by
IBM and it uses the Red Hat Enterprise Virtualization hypervisor (RHEV-H hypervisor). The
software appliance can be installed on either VMware or a kernel-based virtual machine
(KVM), and the client supplies the hardware.
For the detailed step-by-step installation procedure for the SDMC, see Chapter 2, “Installation”, of the IBM Systems Director Management Console Introduction and Overview, SG24-7860, which is located at the following website:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247860.pdf
5.1.2 SDMC transition
Although the move from the HMC to the SDMC might at first seem daunting, the SDMC has
been designed to allow for as smooth a transition as possible. First, you can run the HMC and
the SDMC in parallel, co-managing the same hardware, during the transition period. To
operate in parallel, both consoles must be at the same level of code.
Section 4.3, “HMC to SDMC transition”, in the IBM Systems Director Management Console
Introduction and Overview, SG24-7860, describes the procedure to launch the transition
wizard to move a system that is managed by an HMC to the SDMC. The publication is located
at the following website:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247860.pdf
To use any of the advanced managers, such as the VMControl or Active Energy Manager (AEM) plug-ins, at the SDMC launch, you must have the configuration that is shown in Figure 5-1 on page 161.
Figure 5-1 Recommended configuration
Figure 5-1 describes the parallel management of an HMC or SDMC on a single POWER6 or
POWER7 frame. It also shows a hierarchical management configuration in which the
Systems Director, with the advanced management plug-ins installed, uses the HMC’s
management interface to the server to facilitate the use of the plug-ins.
As the HMC transitions out of use, the Systems Director will be able to manage the
POWER6/POWER7 either directly or hierarchically through the SDMC.
5.1.3 SDMC key functionalities
SDMC allows a single management point for many systems in your enterprise. The SDMC is
extremely similar to the HMC. The goal in the redesign of the single point of control is to
provide the Systems Director with a combined hardware and software control user
experience. With SDMC, you can perform these functions:
Manage and provision multiple systems of heterogeneous infrastructure
Reconfigure systems by using logical partition (LPAR) and dynamic LPAR (DLPAR)
capabilities
Enable certain hardware enhancements, such as POWER6 compatibility mode, on the
POWER7
Orchestrate Live Partition Mobility (LPM) operations
Coordinate the suspend and resume of virtual servers
Modify the resource assignment of your virtual servers even when they are in a stopped
state
Manage virtual slots automatically, leading to enhanced virtual I/O server management
Create users that use Lightweight Directory Access Protocol (LDAP) or Kerberos for authentication
Back up the whole virtual machine onto removable media or to a remote FTP server
Schedule operations for managed systems and virtual servers
Preserve the HMC’s active-active redundancy model in addition to the active-passive availability model that is provided by the Systems Director

Requirements: The configuration that is shown in Figure 5-1 requires these supported levels:
The level of the HMC is 7.3.5 or higher.
The level of the IBM Systems Director is 6.2.1.2 or higher.
The IBM Systems Director Management Console Introduction and Overview, SG24-7860, describes all these functionalities in detail. This book is located at the following website:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247860.pdf
5.1.4 HMC versus SDMC
The SDMC represents the consolidation of several system management offerings that are
currently offered by IBM. It brings together the features in the Systems Director, IVM, and
HMC. Tasks that you can perform independently on any of these platforms can also
conveniently be performed on the SDMC. The SDMC, which includes all the traditional server
and virtualization management functions that are provided by the latest HMC, also provides
the simplicity of IVM in its functions.
The HMC administers entry-level systems to high-end systems. The SDMC manages many systems of multiple types: POWER processor-based blades, systems that were previously managed by the IVM, and high-end systems. The SDMC is available as a hardware appliance similar to the HMC. Unlike the HMC, however, the SDMC is also available in a virtual appliance form for installation into existing virtual machine environments. The software appliance is intended for the management of low-end and midrange servers. The hardware appliance is targeted for use with midrange to high-end servers.
Due to the integration with the Systems Director, an inherently cross-platform management
server, you might notice changes in terminology. Logical partitions (LPARs) are, for example,
in SDMC referred to as “virtual servers”, and managed systems are referred to as “hosts” or
“servers”. The SDMC also demonstrates tighter integration with virtual I/O server through a
more automatic management of virtual slots than the HMC. For users who prefer to use the
HMC command-line interface (CLI), the CLI transitioned fairly intact, although with a few
minor syntax changes. The commands of the HMC are run with a prefix of “smcli”.
For example, to list the virtual Small Computer System Interface (SCSI) resources of a host,
prefix the HMC lshwres command with smcli, as shown:
sysadmin@dd172:~> smcli lshwres -r virtualio --rsubtype scsi -m Server-8233-E8B-SN100042P --level lpar
The preceding command lists all virtual SCSI adapters on the managed system,
Server-8233-E8B-SN100042P.
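The same prefix applies to other familiar HMC commands. A small sketch follows (the commands are only echoed; running them requires a live SDMC session, and the managed system name is reused from the example above):

```shell
# On the SDMC, familiar HMC commands keep their syntax behind the smcli prefix.
for hmc_cmd in 'lssyscfg -r lpar' 'lshwres -r mem --level sys'; do
  echo "smcli $hmc_cmd -m Server-8233-E8B-SN100042P"
done
```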
Experienced users also appreciate the enhancements to dynamic LPAR (DLPAR) management that make it more intuitive, such as the ability to modify resource allocations regardless of whether the partition is running. You can modify the processor, memory, and adapter assignments for a virtual server even when it is in a stopped state.
Perform the following steps to add a virtual Ethernet adapter using a DLPAR operation on a
virtual server that is in the stopped state:
1. Use the Manage Virtual Server task to change the general properties and perform dynamic logical partitioning.
For dynamic LPAR operations on the virtual server, click Manage Virtual Server to locate the virtual server on which the DLPAR operation will be performed. Select the virtual server, click Add, and click OK, as shown in Figure 5-2.
Figure 5-2 Page showing the Manage Virtual Server task
2. Figure 5-3 shows the next page, with tabs on the left side that you can use to modify the processor, memory, and adapter assignments of the selected virtual server.
Figure 5-3 Properties of the adapter to be created
3. Figure 5-4 shows the attributes that we selected to create the virtual Ethernet adapter: Adapter ID 6, Port Virtual Ethernet 1, the IEEE 802.1q compatible adapter check box, 20 as the additional VLAN ID, and the default VSwitch ETHERNET0.
Figure 5-4 Attributes to add to create the adapter
Figure 5-5 shows that the virtual Ethernet adapter in slot number 6 has been added.
Figure 5-5 The virtual Ethernet adapter has been added
There are a few features in the HMC that are not in the SDMC: the system plans feature, the
management of POWER5 systems, and the capability to disconnect and reconnect to old
sessions.
5.1.5 Statement of direction for HMC support
It is expected that most new users of IBM Power Systems will use the new, enhanced SDMC offering as their systems management platform of choice; therefore, the SDMC has been designed to support only POWER6 and later servers.
The HMC then takes on the role of a legacy management server. New functionality will stop being added to the HMC over the next two years, and the POWER7 server series will be the last systems that can be managed by the HMC.
IBM advises clients to consider adopting the SDMC in their environment in the near future.
POWER4 and POWER5: Users of POWER4 and POWER5 platforms have to use the HMC to manage their servers.
5.2 Virtualization management: Systems Director VMControl
This section provides an overview of IBM Systems Director VMControl™ and its functionality
to manage the virtualization of Power servers. Systems Director VMControl is available in
three editions: Express Edition, Standard Edition, and Enterprise Edition. Figure 5-6
describes the features of each edition. The Express Edition is a free download. The Standard
Edition and the Enterprise Edition require a valid license after a 60-day evaluation period.
Figure 5-6 VMC features supported by the various editions
5.2.1 VMControl terminology
This section explains the VMControl terminology.
Virtual server
A virtual server is associated with a host system. It is also referred to as an LPAR, a partition, or a virtual machine.
Virtual appliance
A virtual appliance contains an image of a full operating system, and it can contain software
applications and middleware. The virtual appliance contains metadata describing the virtual
server.
Workload
A workload represents a deployed virtual appliance that allows you to monitor and manage one or more virtual servers as a single entity. For example, a workload containing both a web server and a database server can be monitored and managed as a single entity. A workload is automatically created when a virtual appliance is deployed.
System pools
A system pool groups similar resources and manages the resources within the system pool as a single unit. Storage system pools and server system pools are examples of system pools. A server system pool consists of multiple hosts and their associated virtual servers, along with the attached shared storage.
Image repositories
The created virtual appliances are stored in storage that is considered to be an image repository. The image repositories can be Network Installation Management (NIM)-based storage systems or virtual I/O server-based storage systems.
VMControl subagents
The following characteristics apply to the VMControl subagents:
For VMControl to see the images in the repositories, agents need to be installed in the
repository system.
If NIM is used to manage the virtual appliance, the subagent needs to be installed in the
NIM master.
If virtual I/O server is used to manage the image repository, the subagent needs to be
installed in the virtual I/O server partition.
In both cases, the subagents are installed on top of the common agent.
Import
The import task enables you to import a virtual appliance package, storing the virtual
appliance that it contains within VMControl. Then, the virtual appliance can be deployed.
Virtual farms
A virtual farm logically groups like hosts and facilitates the relocation task: moving a virtual
server from one host to another host within the virtual farm. A virtual farm can contain multiple
hosts and their associated virtual servers.
Relocation
The following policies are relocation policies:
Manual relocation
This policy relocates one or more virtual servers from an existing host at any time. To
relocate within virtual farms, choose the relocation target. If relocating within server
system pools, the relocation target is automatically identified.
Policy-based relocation
This policy activates a resiliency policy on a workload so that VMControl can detect a
predicted hardware failure problem that relates to processors, memory subsystems, power
source, or storage and relocate the virtual servers to another host in the server system
pool. Policy-based relocation can be done with approval or without approval.
Automatic relocation
VMControl server system pools (refer to Figure 5-7 on page 169) can predict hardware failure problems and relocate the virtual servers to maintain resilience.
For example, you can activate a threshold to monitor high and low values for CPU utilization in workloads. You can then create an automation plan to automatically relocate the virtual servers in the system pool when the thresholds are crossed.
Important: Hosts that are not connected to the same shared storage as the server system
pool cannot be added to the system pool.
Figure 5-7 VMControl environment with Systems Director, HMC/SDMC, virtual I/O server, and storage
5.2.2 VMControl planning and installation
When planning for VMControl, the following components are required:
Systems Director installation
VMControl plug-in installation in the Director server
VMControl subagents installation in the NIM and virtual I/O server partition
Installation steps
Follow these Systems Director and the VMControl plug-in installation steps:
1. Use the detailed installation information at these links to download and install the Systems
Director:
– Director download link:
http://www.ibm.com/systems/software/director/resources.html
– Installation link:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com
.ibm.director.main.helps.doc/fqm0_main.html
Use the information at this link to log in to the Systems Director server for the first time:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/index.jsp?topic=/com.ib
m.director.main.helps.doc/fqm0_main.html
2. Follow these VMControl plug-in installation steps:
Systems Director VMControl is installed on systems running Systems Director server
Version 6.2.1 or higher.
a. Download the VMControl plug-in from the following link:
http://www-03.ibm.com/systems/software/director/downloads/plugins.html
b. Select the download package for the operating system that is running on your Systems
Director server:
• For AIX/Linux: SysDir_VMControl__Linux/AIX.tar.gz
• For AIX: SysDir_VMControl_2_2_AIX.tar.gz
• For Microsoft Windows: SysDir_VMControl_Windows.zip
c. Copy the download package to a directory or folder in the Systems Director server and
extract the contents of the package:
gzip -cd SysDir_VMControl__Linux/AIX.tar.gz | tar -xvf -
d. Change to the extracted folder and install the VMControl plug-in:
• For AIX/Linux: IBMSystems-Director-VMControl-Setup.sh
• For Microsoft Windows: IBMSystems-Director-VMControl-Setup.exe
e. Edit the following lines in the installer.properties file for silent mode installation:
INSTALLER_UI=silent
LICENSE_ACCEPTED=true
START_SERVER=true (this entry starts the director server on reboot)
f. Follow the instructions in the installation wizard to install Systems Director VMControl.
Ensure that you restart the Systems Director server.
g. Check the log file to see if the installation completed successfully:
• For AIX/Linux: /opt/ibm/Director/VMControlManager/installLog.txt
• For Microsoft Windows: \Director\VMControlManager\installLog.txt (path
where Director is installed)
h. Go to this link to obtain the hardware requirement for VMControl installation:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/topic/com.ibm.direct
or.plan.helps.doc/fqm0_r_supported_hardware_and_software_requirements.html
i. Download the VMControl installation from this link:
http://www-03.ibm.com/systems/software/director/downloads/plugins.html
3. Installing VMControl agents and subagents:
– NIM subagent
VMControl uses the NIM Master to manage the virtual appliance. For VMControl to
connect with the NIM Master, a subagent needs to be installed in the NIM Master, as
shown in Figure 5-8 on page 171.
Combined installation: At the time of writing this book, the current VMControl
version was Version 2.3.1. In later releases, IBM intends to make the VMControl
plug-in installation a part of the Systems Director installation. Then, no separate
installation will be required.
Figure 5-8 NIM subagent installation in the NIM Master server
– Common repository subagent:
• VMControl makes use of the virtual I/O server partition to store the raw disk images
that are associated with AIX or Linux virtual appliances. The storage is allocated
from the SAN and provided through the virtual I/O server.
• For VMControl to connect with the image repository, both the Systems Director
Common Agent and VMControl Common repository subagents need to be installed
in the virtual I/O server partition, as shown in Figure 5-9 on page 172.
Figure 5-9 Repository subagent
– Subagent installation can be done through the following methods:
Subagent installation through the Systems Director’s release management task:
i. In the IBM Systems Director navigation pane, expand Release management.
ii. Click Agents.
iii. On the Agents page, click Common Agent Subagent Packages.
iv. From the Common Agent Subagent Packages view, select the subagent that needs
to be installed.
v. Click Actions on the menu bar.
vi. Select Release Management → Install Agent.
vii. Follow the instructions in the installation wizard to install the subagent.
Manual subagent installation steps:
i. Locate the subagent in the following folder:
For AIX/Linux: /opt/ibm/director/packaging/agent
For Microsoft Windows: C:\Program Files\IBM\Director\packaging\agent
ii. Copy the subagent to a temporary folder, for example, /tmp.
For NIM, the agent is CommonAgentSubagent_VMControl_NIM_2.3.1.
Important: Ensure that the Systems Director’s Version 6.2.1 or higher common
agent is installed on the target NIM Master and virtual I/O server partition.
For the common repository, the agent is
CommonAgentSubagent_VMControl_CommonRepository-2.3.1.
iii. Change the directory to the Systems Director system bin directory:
/opt/ibm/director/agent/bin
iv. Use the ./lwiupdatemgr.sh command to install the subagent.
For the NIM subagent:
./lwiupdatemgr.sh -installFeatures -featureId
com.ibm.director.im.rf.nim.subagent -fromSite
jar:file:/tmp/com.ibm.director.im.rf.nim.subagent.zip\!/site.xml -toSite
"file:/var/opt/tivoli/ep/runtime/agent/subagents/eclipse/"
For the common repository subagent:
./lwiupdatemgr.sh -installFeatures -featureId
com.ibm.director.im.cr.agent.installer -fromSite
jar:file:/tmp/com.ibm.director.im.cr.agent.installer.zip\!/site.xml -toSite
"file:/opt/ibm/director/agent/runtime/agent/subagents/eclipse/"
For the Linux environment involving VMware, you must install the following subagents:
• VMware vCenter 4.x subagent: CommonAgentSubagent_VSM_VC4x-6.2.1
• VMware VirtualCenter 2.x subagent: CommonAgentSubagent_VSM_VC2x-6.2.1
• VMware ESX 4.x subagent: CommonAgentSubagent_VSM_ESX4x-6.2.1
• VMware ESX 3.x subagent: CommonAgentSubagent_VSM_ESX3x-6.2.1
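The manual subagent installation steps above can be wrapped in a small helper script. The sketch below is only a dry run: it assembles the `lwiupdatemgr.sh` invocation from the paths and feature IDs given in the steps and prints the resulting command for review instead of executing it against a live agent.

```shell
#!/bin/sh
# Dry-run helper: compose the lwiupdatemgr.sh call for the NIM subagent.
# The paths and the feature ID mirror the manual steps above; the command
# is printed, not executed, so it can be checked before running it.
DIRECTOR_BIN="/opt/ibm/director/agent/bin"          # Systems Director agent bin directory
FEATURE_ID="com.ibm.director.im.rf.nim.subagent"    # NIM subagent feature ID
PACKAGE="/tmp/com.ibm.director.im.rf.nim.subagent.zip"
TO_SITE="file:/var/opt/tivoli/ep/runtime/agent/subagents/eclipse/"

# Note: when typed interactively, the "!" in the jar URL must be escaped
# as \! to avoid shell history expansion.
CMD="${DIRECTOR_BIN}/lwiupdatemgr.sh -installFeatures \
-featureId ${FEATURE_ID} \
-fromSite jar:file:${PACKAGE}!/site.xml \
-toSite ${TO_SITE}"

echo "$CMD"
```

For the common repository subagent, only `FEATURE_ID`, `PACKAGE`, and `TO_SITE` change.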
5.2.3 Managing a virtual server
With Systems Director VMControl, virtual appliances can be deployed to virtual servers.
Virtual servers can be created, edited, and deleted through the Systems Director (without the
need of the VMControl plug-in).
Creating a virtual server
The following steps provide instructions to create a virtual server. The Systems Director
provides wizards to create a virtual server. We outline the steps for creating a virtual server
next. After the virtual server is created, the virtual appliance can be deployed through
VMControl.
Follow these steps to create a virtual server:
1. In the Systems Director, click Navigate Resources to locate the host.
2. Select the host.
3. Click Actions from the menu bar.
4. Click System Configuration.
5. Select Create Virtual Server, as shown in Figure 5-10 on page 174.
6. Follow the Create Virtual Server wizard to set up the virtual server.
Important:
Ensure that the managed system has been discovered through the Systems Director.
The virtual server can also be created through the HMC or the SDMC.
Figure 5-10 Create Virtual Server
Editing a virtual server
This section illustrates how to edit a virtual server. After creating virtual servers through the
Systems Director, the virtual servers can be edited, as well, through the Systems Director.
Follow these steps to edit a virtual server:
1. In the Systems Director, click Navigate Resources to locate the virtual server.
2. Select the virtual server.
3. Click Actions from the menu bar.
4. Click System Configuration.
5. Select Edit Virtual Server.
Deleting a virtual server
You can delete a virtual server permanently from the host, as shown in Figure 5-11 on
page 175.
Note: The Create Virtual Server task gives you an option to run it immediately or
schedule it to run at a later time.
Editing: The Systems Director allows you to edit processor and memory details. For any
other modification, the Systems Director provides options to launch other platform
management utilities, such as the HMC and the SDMC.
Important: Power off the virtual server first to delete it permanently.
Figure 5-11 Permanently Delete Virtual Server
5.2.4 Relocating a virtual server
VMControl can relocate virtual servers in response to predicted hardware failures related to
processors, memory subsystems, a power source, or storage. Also, a virtual server can be
relocated for planned maintenance or downtime or to adjust resources to improve
performance.
You can perform relocation in the following ways:
Static relocation
With static relocation, if the virtual server is powered on, the relocation operation powers
off the virtual server at the beginning of the relocation process and powers the virtual
server on when the relocation is complete.
Live relocation
With live relocation, if the virtual server is powered on, the relocation occurs without
powering the server off. There are three options from which to choose for the relocation:
– Manually relocate virtual servers at any time.
– Activate a resilience policy on a workload to relocate virtual servers automatically to
prevent predicted hardware failures from affecting the availability of the workload.
– Create an automation plan to relocate the virtual servers when certain events occur.
Relocating virtual servers manually
Systems Director VMControl chooses the target host from similarly configured hosts in the
server system pool and displays the proposed relocation actions. You either accept or cancel
the relocation operation based on your requirements.
Live Partition Mobility: The relocation feature makes use of the Live Partition Mobility
(LPM) functionality from the IBM Power Systems servers.
Relocating virtual servers using the resilience policy
The resilience policy enables Systems Director VMControl to relocate virtual servers
automatically to maintain the resilience (high availability) of workloads.
When the resilience policy is activated, Systems Director VMControl can automatically
relocate virtual servers when a predicted hardware failure is detected. VMControl moves
virtual servers away from a failing host system, and it relocates them to a host that the server
system pool determines has adequate resources.
Relocating virtual servers automatically
You create an automation plan to relocate the virtual servers from the host with a critical event
(for example, a hardware problem, high CPU utilization, and so on) to a host that the server
system pool determines has adequate resources.
5.2.5 Managing virtual appliances
As described in 5.2.1, “VMControl terminology” on page 167, a virtual appliance is the bundle of an operating system image and the application that is installed on top of it. The virtual appliance also bundles information about the virtual server configuration.
The first step in managing virtual appliances is creating image repositories. The following
entities must exist before managing virtual appliances:
Image repositories: Virtual appliances are stored in image repositories:
– AIX: It has two options: NIM image repositories and virtual I/O server image
repositories
– Linux: Virtual I/O server image repositories
Agents: They are specific to the environment. NIM-based subagents are installed in the
NIM Master, and storage-based subagents are installed in the virtual I/O server partition.
Discovery: VMControl must have discovered the image repositories and virtual
appliances.
Creating image repositories
We discuss the creation of the image repositories.
Discovering NIM image repositories for AIX
The following actions are required to have NIM image repositories for AIX:
1. Discover and request access to the NIM server.
2. At this stage, image repositories, such as mksysb, are already created in the NIM server.
3. Ensure that the VMControl NIM subagent is installed in the NIM server, as explained in
5.2.2, “VMControl planning and installation” on page 169.
4. Run inventory collection on the NIM server.
5. From the VMControl summary page, go to the Basics tab and click Discover virtual
appliances to discover your repositories (mksysb) and virtual appliances, as shown in
Figure 5-12 on page 177. The virtual appliances that are already present in your repositories and that have been imported or captured using VMControl are detected by VMControl.
Approval required: By default, a prompt to approve any policy-based action, such as relocation, appears before the move is performed.
Figure 5-12 Discovering virtual appliances in the NIM image repository
Creating and discovering VIOS image repositories for AIX and Linux
You must follow these steps for VMControl to create image repositories:
1. Set up SAN storage pools, and set up a virtual I/O server that has access to the storage
pool.
2. Discover and request access to the storage and the operating system of the virtual I/O
server.
3. Ensure that the VMControl common repository subagent software is installed on the
virtual I/O server that hosts the image repository, as explained in 5.2.2, “VMControl
planning and installation” on page 169.
4. Run the full inventory collection on the operating system of the virtual I/O server to gather
information about the image repository subagent.
5. Create an image repository. From the VMControl summary page, go to the Virtual
Appliances tab and click Create image repository, as shown in Figure 5-13.
Figure 5-13 Create image repository: Virtual I/O server image repository
6. Follow the instructions in the Create Image Repository wizard, as shown in Figure 5-14 on
page 178.
Figure 5-14 Create Image Repository wizard
Importing a virtual appliance package
Follow these steps to import a virtual appliance package:
1. Go to the VMControl Basics tab.
2. Under Common tasks, select the Import option (as seen in Figure 5-13 on page 177) to
import the virtual appliance package in Open Virtualization Format (ovf) format into the
image repository.
Figure 5-15 shows the Import virtual appliance wizard.
Figure 5-15 Import virtual appliance: Importing the .ovf file
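For orientation, an OVF package is built around an XML descriptor. The fragment below is a heavily simplified, hypothetical sketch of the general OVF 1.x envelope structure; the file names and IDs are invented, and real virtual appliance packages handled by VMControl carry considerably more metadata.

```xml
<!-- Minimal, hypothetical OVF descriptor sketch; names and IDs are invented. -->
<Envelope xmlns="http://schemas.dmtf.org/ovf/envelope/1"
          xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1">
  <References>
    <!-- Files shipped alongside the descriptor, such as disk images -->
    <File ovf:id="file1" ovf:href="disk1.img"/>
  </References>
  <DiskSection>
    <Info>Virtual disks referenced by the virtual system</Info>
    <Disk ovf:diskId="disk1" ovf:fileRef="file1"/>
  </DiskSection>
  <VirtualSystem ovf:id="example-appliance">
    <Info>A single virtual server and its configuration</Info>
    <VirtualHardwareSection>
      <Info>Processor, memory, and disk settings</Info>
    </VirtualHardwareSection>
  </VirtualSystem>
</Envelope>
```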
Capturing virtual appliances
You can capture a virtual server or workload, a mksysb image or resource, or a NIM lpp_source resource or directory to create a virtual appliance. Refer to the following steps:
1. To capture the virtual appliance image, click System Configuration → VMControl →
Virtual Appliances tab, as shown in Figure 5-16 on page 179.
Figure 5-16 Capture the virtual appliance under the VMControl window
2. The Capture virtual appliance wizard takes you through the steps involved to provide the
source virtual server repository where you want to store the image that is associated with
the new virtual appliance, as shown in Figure 5-17.
Figure 5-17 Capture the virtual appliance: Selecting the repository into which to store the image
Deploying the virtual appliance
Similar to capturing virtual appliances, the Deploy Virtual Appliance wizard helps in deploying
the virtual appliance, as shown in Figure 5-18 on page 180.
Figure 5-18 Deploying virtual appliances
5.2.6 Creating a workload
As discussed in 5.2.1, “VMControl terminology” on page 167, each virtual appliance is considered a workload. You can also group virtual appliances into a single workload for better monitoring and management, using the following step.
To create a workload, click System Configuration → VMControl → Workloads tab →
Create workload, as shown in Figure 5-19 on page 181.
Figure 5-19 Create a workload
5.2.7 Managing server system pools
Server system pools enable grouping similar hosts. It is advantageous to manage and
monitor hosts through a pool. This method provides more resiliency through relocation within
a server pool.
Creating a server system pool
Use the following steps to create a server system pool:
1. Click System Configuration → VMControl → System Pools tab, as shown in
Figure 5-20 on page 182.
Requirements: Ensure that the required VMControl plug-in and the subagent are already
installed. 5.2.2, “VMControl planning and installation” on page 169 offers more details
about the required plug-in, agents, and installation steps.
Figure 5-20 Create server system pool
2. Use the wizard to create a server pool.
3. While creating the server pool, you are presented with options to select the pool resilience
criteria, as shown in Figure 5-21 on page 183.
Figure 5-21 Server pool resilience criteria
4. All hosts in the server system pool must use the same shared storage. Available shared
storage is listed for selection, as shown in Figure 5-22.
Figure 5-22 System pool shared storage
5. The optimization function analyzes the server system pool and periodically optimizes for
performance. The process can relocate the virtual server to the most efficient host. We
provide more details about optimization next. You must select the optimization option while
creating the server system pool, as shown in Figure 5-23 on page 185.
Configuring SAN storage for server system pools
This section illustrates how to configure SAN storage for server system pools.
Add/remove host from server system pool
In addition to server pool creation, VMControl provides options to add/remove the hosts from
the pool.
Server system pool optimization
Optimization enables the analysis and periodic performance improvement of all the virtual
servers within a server system pool based on specified needs, such as the relocation of the
virtual servers within a workload to the most efficient hosts.
When optimization is run, the system pool is examined for performance hot spots, that is,
systems in the pool that are heavily used. If a hot spot is identified, workloads are relocated to
better balance resource usage in the environment.
There are two types of optimization:
Manual optimization: Optimize manually whenever you want any task, such as relocation,
to be started at convenient off-peak times.
Automated optimization: In automatic optimization, system analysis determines whether a
workload must be distributed.
Important: The hosts in the server system pool must support the relocation of their virtual
servers for workload resiliency, as shown in Figure 5-23 on page 185. The panel for
configuring optimization is displayed only if all the hosts in the server system pool support
workload resiliency.
Figure 5-23 Server system pool optimization
We have described a few of the key features of VMControl. VMControl saves time and
reduces the configuration complexity that is involved with virtualization. In the next section, we
explain how another advanced software function from Systems Director called Active Energy
Management (AEM) can be used to enable power saving options for IBM Power servers.
5.3 IBM Systems Director Active Energy Management (AEM)
In this section, we provide an overview of the Active Energy Manager (AEM) plug-in. We
describe briefly the installation and uninstallation of AEM. We concentrate on how effectively
the power can be managed and monitored on the IBM POWER7 Systems. We cover
functionalities, such as power saving and power capping.
5.3.1 Active Energy Manager (AEM) overview
Active Energy Manager (AEM) is a Systems Director plug-in that offers power and thermal
monitoring. AEM also comes with many management capabilities, which can be used to gain
a better understanding of the data center’s power usage. Overall, this feature helps you to
better utilize the available power. You can also use AEM to plan for your future energy needs.
AEM directly monitors IBM Power Systems, System z, System x®, and IBM BladeCenter®
servers. AEM can also indirectly monitor non-IBM equipment by using external metering
devices, such as power distribution units (PDUs) and sensor devices.
5.3.2 AEM planning, installation, and uninstallation
In the following section, we describe how to implement AEM.
Installation steps
You must install AEM on systems running IBM Systems Director server Version 6.2.1 or later.
The link in the Prerequisite box provides the details about Systems Director. Follow these
AEM plug-in installation steps:
1. Download the AEM plug-in from the following link:
http://www-03.ibm.com/systems/software/director/downloads/plugins.html
2. Select the download package for the operating system that is running on the Systems
Director server.
3. Copy the downloaded package to a directory or folder on the Systems Director server and
extract the contents of the package.
4. Set a few attributes in the installer.properties file. Change the value of the three
following attributes for unattended installation:
INSTALLER_UI=silent
LICENSE_ACCEPTED=true
START_SERVER=true
If the START_SERVER option is set to false, you need to restart the Systems Director server manually. Perform these steps:
a. Run #/opt/ibm/director/bin/smstop.
b. To see the status of the Systems Director, issue the smstatus command.
c. To start the Systems Director, issue the /opt/ibm/director/bin/smstart command.
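The property edits and the conditional restart described above can be sketched as a short script. This is a sketch only: the `installer.properties` location depends on where the AEM package was extracted (pass it as the first argument), and the `smstop`/`smstart` paths are the AIX/Linux defaults from the steps above, so the restart is attempted only when those commands are actually present.

```shell
#!/bin/sh
# Sketch: force the three unattended-install attributes in installer.properties,
# then restart the Systems Director server if its control commands exist.
PROPS="${1:-installer.properties}"   # assumed path; adjust to the extracted package

# Set each attribute, replacing an existing line or appending a new one.
for kv in INSTALLER_UI=silent LICENSE_ACCEPTED=true START_SERVER=true; do
    key=${kv%%=*}
    if grep -q "^${key}=" "$PROPS" 2>/dev/null; then
        sed "s|^${key}=.*|${kv}|" "$PROPS" > "$PROPS.tmp" && mv "$PROPS.tmp" "$PROPS"
    else
        echo "$kv" >> "$PROPS"
    fi
done

# Manual restart path from the steps above; skipped when Director is not installed.
if [ -x /opt/ibm/director/bin/smstop ]; then
    /opt/ibm/director/bin/smstop
    /opt/ibm/director/bin/smstart
fi
```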
Removal steps
Before removing AEM from the Systems Director database, ensure that both AEM and the
Systems Director server are running. The AEM group only gets removed if the Systems
Director server is active. Follow these AEM plug-in removal steps:
1. To uninstall AEM on a system running AIX/LINUX, edit the installer.properties to
change the value of these attributes:
INSTALLER_UI=silent
DATABASE_CLEANUP=true
2. Launch the uninstaller.
Prerequisite: Systems Director must be installed and configured before the installation of
the AEM plug-in. You can obtain more information about how to configure Systems
Director in this document:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247694.pdf
Use this link to download the IBM Systems Director software:
http://www-03.ibm.com/systems/software/director/downloads/plugins.html
5.3.3 AEM and the managed systems
After installing AEM, we need to connect the managed systems to the Systems Director.
Figure 5-24 illustrates how a server can be connected to Systems Director.
Figure 5-24 Connection between Systems Director and the managed server
AEM, running on a Systems Director, communicates to the HMC, which communicates to the
server. You can access the Systems Director GUI via a browser. The DB2® workload is
running on the server (refer to Figure 5-24) managed by the HMC. The DB2 server simulates
the workload that is generated on the production servers. AEM manages and monitors the
energy that is consumed by the servers.
We need to connect the managed systems to the Systems Director. Follow these steps:
1. The first step is to discover the HMC, which manages the server. Log in to IBM Systems
Director server as a root user.
2. Select the Inventory tab.
3. Select the System Discovery tab.
4. Enter the IP address of the HMC.
5. Click Discover Now, as shown in Figure 5-25.
Figure 5-25 HMC discovery
6. After the HMC is discovered, you need to give access to the system. Under the column
heading Access, click No Access, as shown in Figure 5-26.
Figure 5-26 Accessing the discovered system
7. Provide the user ID and password for the HMC.
8. Click Request Access, as shown in Figure 5-27.
Figure 5-27 Accessing the discovered system
9. After the access is granted, click Inventory → Views → Platform Managers and
Members, as shown in Figure 5-28. In this pane, you see the HMC, as well as the servers
that are managed by the HMC.
Figure 5-28 Discovered HMCs and the servers that are managed by the HMCs
5.3.4 Managing and monitoring the consumed power using AEM
There are two ways to manage power: the power capping method and the power savings
method.
Power capping
You can use power capping to limit the power that is used by the server. Setting a power
capping value ensures that system power consumption stays at or beneath the value that is
defined by the setting. You can specify this value in terms of an absolute value or in terms of a
percentage of the maximum power cap:
Absolute value: This option is useful for a single object or for a group of similar objects for
which the same power cap value is appropriate.
Percentage value: This option is particularly useful in the case of a group of
heterogeneous or unlike systems, where a specific power cap value is inappropriate, but
percentage capping makes sense.
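As a concrete illustration of percentage capping, the snippet below converts a percentage cap into an absolute value in watts. The maximum-cap figure is invented for the example and is not a real server rating; on a real system, the maximum power cap is reported by AEM for the managed object.

```shell
#!/bin/sh
# Illustrative only: derive an absolute power cap from a percentage.
MAX_CAP_WATTS=1200     # hypothetical maximum power cap for the server
CAP_PERCENT=75         # desired cap as a percentage of the maximum

# Integer wattage: max * pct / 100
CAP_WATTS=$(awk -v max="$MAX_CAP_WATTS" -v pct="$CAP_PERCENT" \
    'BEGIN { printf "%d", max * pct / 100 }')
echo "Power cap: ${CAP_WATTS} W"
```

Applying the same percentage across a heterogeneous group yields a proportionate cap per system, which is why percentage capping suits groups of unlike servers.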
Power savings
Power savings is helpful in saving the amount of power that is used by the servers. There are
two types of power savings:
Static power saving: This mode lowers the processor frequency and voltage on a system
by a fixed amount, therefore reducing the power consumption of the system while still
delivering predictable performance. This percentage is predetermined to be within a safe
operating limit and is not configurable by the user.
Static power saving can be enabled based on regular variations in workloads, such as
predictable dips in utilization overnight, or over weekends. It can be used to reduce peak
energy consumption, which can lower the cost of all power used. Note that when static
power saving is enabled for certain workloads with low CPU utilization, workload
performance is not affected, although CPU utilization might increase due to the reduced
processor frequency.
Dynamic power saving: Dynamic power saving allows for two kinds of modes: the favor
performance mode and the favor power mode:
– Favor performance mode: This mode allows the CPU cycles to go above the nominal
frequency. The system increases the CPU frequency above the nominal frequency if
the workload demands. With a lesser workload, the system runs at a lower frequency
to save energy.
– Favor power mode: In this mode, the CPU frequency is capped at a certain level, which
means that the CPU frequency cannot go higher than that level, even if the workload
demands it. The system increases the CPU frequency up to the nominal frequency if
demanded by the workload. With no workload, the system runs at a slower frequency
to save energy.
Power capping versus power savings
It is important to understand the differences between power capping and power savings.
Power capping is used to allow the user to allocate less power to a system, which in turn
helps to cool the system. This mode can help you save on the data center infrastructure costs
and then potentially allow more servers to be put into an existing infrastructure. However,
power savings is used to put the server into a mode that consumes less energy.
Configuring dynamic power savings
Follow these steps to configure dynamic power savings:
1. Log on to the IBM Systems Director server.
Licensing: The use of the power capping and power savings functions requires an Active
Energy Manager license. You are granted a 60-day evaluation license when you begin
using AEM. When the evaluation license expires, the power savings and power capping
functions, on systems where these functions were activated, are turned off. The policies
that were previously saved in the AEM still exist when the license expires, but they cannot
be applied to any resource.
2. Expand the Energy tab to view the Active Energy Manager pane, as shown in
Figure 5-29.
Figure 5-29 Active Energy Manager
3. From the AEM GUI, select Active Energy Managed Resources, which shows the list of
the systems that the Systems Director can manage, as shown in Figure 5-30.
Figure 5-30 Active Energy Managed Resources
4. Select one of the systems. Then, click the Actions drop-down list box to navigate through
the power management panel. Select Energy → Manage Power → Power Savings, as
shown in Figure 5-31.
Figure 5-31 Managing power
5. Select Dynamic power savings. Choose from two options: Favor Power mode and Favor
Performance mode, as shown in Figure 5-32.
Figure 5-32 Power saving mechanisms
You have completed the setup for the AEM. To view the data and the graphical representation
of the power consumed, as well the CPU frequency, use the Trend Data option. Select
Actions → Energy → Trend Data, as shown in Figure 5-33.
Figure 5-33 Selecting to view the Trend Data of the server
Figure 5-34 shows the variation in the Maximum Input Power when in Favor Performance
mode compared to Favor Power mode.
Figure 5-34 Peak power consumption lowered in Favor Power mode
Figure 5-35 shows the variation in the Processor Frequency in the Favor Power mode and the
Favor Performance mode.
Figure 5-35 Variation of processor frequency
These features of AEM help to save energy.
5.4 High availability Systems Director management consoles
High availability of the server and its applications is maintained through the following
solutions:
PowerHA
Workload partition (WPAR)
The Systems Director provides management consoles to configure and manage both
PowerHA and WPAR. Both solutions are available as additional plug-ins to the Systems
Director.
The following references provide details about how to install and configure these plug-ins to
manage high availability.
For more information: Refer to the IBM Systems Director Active Energy Manager
Installation and User’s Guide, which is available at the following website:
http://publib.boulder.ibm.com/infocenter/director/v6r2x/topic/com.ibm.director.aem.helps.doc/frb0_aem4.3_docs_user.pdf
For more information:
PowerHA Plug-in: See Chapter 12, “Creating and managing a cluster using IBM
Systems Director” in IBM PowerHA SystemMirror 7.1 for AIX:
http://www.redbooks.ibm.com/redbooks/pdfs/sg247845.pdf
WPAR Plug-in:
http://publib.boulder.ibm.com/infocenter/director/v6r1x/index.jsp?topic=/wparlpp_210/wparlpp-overview.html
Requirements:
The PowerHA plug-in requires AIX 7.1 and PowerHA 7.1 or later.
WPAR requires AIX Version 6.1 with the 6100-02 TL or later.
© Copyright IBM Corp. 2011. All rights reserved.
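The TL requirements above can be checked mechanically. The following is a minimal sketch, assuming the usual RRRR-TT-SP-YYWW format that `oslevel -s` prints on AIX; the sample level string is illustrative, not taken from the ITSO systems:

```shell
# Parse an `oslevel -s` style string (RRRR-TT-SP-YYWW) and test it against the
# 6100-02 minimum that the WPAR plug-in requires.
meets_min_tl() {
    level=$1                 # e.g. 6100-02-03-0846 (hypothetical sample)
    rel=${level%%-*}         # release, e.g. 6100
    rest=${level#*-}
    tl=${rest%%-*}           # technology level, e.g. 02
    if [ "$rel" -gt 6100 ]; then
        return 0
    elif [ "$rel" -eq 6100 ] && [ "$tl" -ge 2 ]; then
        return 0
    fi
    return 1
}

meets_min_tl "6100-02-03-0846" && echo "WPAR plug-in supported" || echo "upgrade needed"
```

On a live system, you would feed the function the real output of `oslevel -s` instead of a literal string.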
Chapter 6. Scenarios
This chapter provides sample scenarios that show the various configurations in which IBM
Power Systems high-end servers can participate. The scenarios show the flexibility of Power
Systems; their high availability and reliability, availability, and serviceability (RAS)
capabilities; and the ease of administration with the IBM Systems Director.
In this chapter, we discuss the following topics:
Hot node add and repair
Hot GX adapter add and repair
Live Partition Mobility (LPM) using the HMC and SDMC
Active migration example
Building a configuration from the beginning
LPM and PowerHA
6.1 Hot node add and repair
The CEC Hot Add Repair Maintenance (CHARM) functions provide the ability to add or
upgrade system capacity and to repair the Central Electronic Complex (CEC), including
processors, memory, GX adapters, the system clock, and the service processor, without
powering down the system. You can obtain more detailed information about CHARM in 4.3,
“CEC Hot Add Repair Maintenance (CHARM)” on page 123.
In this section, we show the configuration and repair steps using CHARM to help you
understand the operation of hot node add and repair, and hot GX adapter add and repair
through a simple scenario.
Physical environment of the test system
Table 6-1 shows the hardware specifications and the location code for each drawer in the test
system.
Table 6-1 Hardware specification for the test system on a Power 780

                    Processor number    Memory size           Drawer serial number
First CEC drawer    16                  128 GB                DBJH613
Second CEC drawer   16 (CoD processor)  128 GB (CoD memory)   DBJG781

6.1.1 Hot node add
In this scenario, a second CEC drawer is added to a single CEC drawer using the hot node
add feature, as shown in Figure 6-1. Our hot node add and repair is tested under the control
of a single Hardware Management Console (HMC). We do not provide all of the steps, but we
show specific, useful steps.
Figure 6-1 Hot node add to a single CEC drawer (1-drawer to 2-drawer Power 780)
Figure 6-2 shows the processor number before adding the second CEC drawer. The total
number of processors is 16.
Figure 6-2 Processor number before the hot node add procedure
Figure 6-3 shows the memory size before adding the second CEC drawer. The total memory
size is 128 GB.
Figure 6-3 Memory size before the hot node add procedure
Figure 6-4 shows the physical I/O resources before adding the second CEC drawer.
Figure 6-4 Physical I/O resources before the hot node add procedure
Prerequisites for the hot node add procedure
Prior to adding the second CEC drawer, check that the prerequisites are met. Table 6-2
shows the system firmware and the HMC levels for the hot node add procedure for the
Power 780.
Table 6-2 System firmware and HMC levels for the hot node add procedure for the Power 780

                  Minimum recommended level   Test system level
System firmware   AM720_064 or later          AM720_090
HMC               V7R7.2.0 + MH01235          V7R7.2.0.1

If the prerequisites are met, continue with the preparation steps.
Figure 6-5 shows that the logical partition (LPAR) is running on the node during the CHARM
operation. We added a workload using over 90% of the processor during the hot node repair,
and the operation succeeded without any problem. However, we strongly advise that you
quiesce critical applications prior to starting the hot node repair.
Figure 6-5 Notice that the LPAR is running during the hot node add procedure
Important: We strongly advise that all scheduled hot adds, upgrades, or repairs are
performed during off-peak hours.
You must move all critical business applications to another server using Live Partition
Mobility (LPM), if available, or quiesce critical applications for hot node add, hot node
repair, hot node upgrade, and hot GX adapter repair.
Follow these steps to add the hot node:
1. In the navigation pane, select Systems Management.
2. Select Add Enclosure by selecting the server → Serviceability → Hardware → MES
Tasks → Add Enclosure (refer to Figure 6-6).
Figure 6-6 Add enclosure
3. Select a machine type - model to add. Click Add (refer to Figure 6-7).
Figure 6-7 Step to select an enclosure type to add
4. Click Launch Procedure. Read every step carefully. Click Next.
5. At this step (refer to Figure 6-8), connect the SMP and the FSP cable between the first
CEC drawer and the second CEC drawer. Click Next.
Figure 6-8 Step to connect SMP and FSP cable between two nodes
6. After a few minutes, you see the message that is shown in Figure 6-9.
Figure 6-9 Successful completion of hot node add
Figure 6-10 shows the processor number after the hot node add. You can see the increased
installed number of processors. The configurable number of processors is not changed,
because the processors of the second CEC drawer are Capacity on Demand (CoD)
processors. The total installed processor number is 32.
Figure 6-10 Processor number increase after the hot node add
Figure 6-11 shows the memory size after the hot node add. The configurable size of the
memory is not changed, because the memory of the second CEC drawer is CoD memory.
The total memory size is 256 GB.
Figure 6-11 Memory size increase after the hot node add
Figure 6-12 shows the physical I/O resources after the hot node add. You can see the
additional I/O location codes.
Figure 6-12 I/O resources increase after the hot node add
6.1.2 Hot node repair
In this scenario, we replace a memory dual inline memory module (DIMM) in the first CEC
drawer.
Prerequisites for hot node repair
As explained in 4.3.2, “Hot repair” on page 125, the following prerequisites must be met
before the hot node repair:
The system must have two or more nodes to use the hot node repair function.
Verify that the service processor redundancy capability is enabled.
– In the navigation pane, select Systems Management → Servers.
– In the work pane, select the server → Serviceability → FSP Failover → Setup
(refer to Figure 6-13 on page 205).
You must configure the HMC with a redundant service network with both service
processors. Refer to Figure 6-14 on page 205.
Figure 6-13 FSP Failover
Figure 6-14 Service processor redundancy enabled
Preparing for the hot repair or upgrade (PHRU) utility
Prior to the start of the hot node repair, you must run the Prepare for Hot Repair or
Upgrade (PHRU) utility. All resources that are identified by the utility must be freed by the
system administrator prior to the start of the hot upgrade or repair procedure.
The Prepare for Hot Repair or Upgrade utility is automatically run during every service
procedure requiring the evacuation of a node. This utility ensures that all requirements are
addressed prior to the execution of the repair or upgrade procedure.
The test system is managed by an HMC. Follow this procedure:
1. In the navigation pane, select Systems Management → Servers.
2. In the work pane, select the server name on which the procedure will be performed. Select
Serviceability → Hardware → Prepare for Hot Repair/Upgrade.
3. Select the base location code that contains the field-replaceable unit (FRU) to be serviced,
as shown in Figure 6-15.
Figure 6-15 FRU selection on the Prepare for Hot Repair or Upgrade utility
4. Click Next.
5. Click OK when prompted to continue.
The Prepare for Hot Repair or Upgrade utility (PHRU) displays a window listing the set of
actions that must be performed for the node evacuation to be successful, as shown in
Figure 6-16.
Figure 6-16 Prepare for Hot Repair or Upgrade window
Click the message text to display information about the resources that are being used, as
shown in Figure 6-17.
Figure 6-17 Information about the adapter to be removed prior to starting the hot repair
The I/O resource is removed by using the rmdev command. After all errors are corrected, the
node can be evacuated for the hot node repair.
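The cleanup that the PHRU utility requests can be collected into a small helper. This is a hedged sketch that only prints the rmdev commands it would run, rather than executing them; the device names ent2 and fcs1 are hypothetical examples, not the adapters from Figure 6-17:

```shell
# Print (rather than execute) the rmdev command for each device that PHRU
# reports as in use. On AIX, -d removes the device definition from the ODM,
# -l names the device, and -R unconfigures its children first.
free_node_resources() {
    for dev in "$@"; do
        echo "rmdev -dl $dev -R"
    done
}

free_node_resources ent2 fcs1
```

Reviewing the printed commands before running them on the real LPAR avoids removing a device that is still carrying production traffic.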
Hot repair
During the hot node repair of replacing a memory DIMM in the first CEC drawer, the control
panel must be moved to the alternate CEC drawer, as shown in Figure 6-18.
Figure 6-18 Move the control panel to the alternate CEC drawer
At this step, the control panel is removed from the first CEC drawer and installed on the
second CEC drawer using the Advanced System Management Interface (ASM) function, as
shown in Figure 6-19 and Figure 6-20.
Figure 6-19 Remove control panel from the first CEC drawer using ASM
Figure 6-20 Install the control panel at the second CEC drawer using the ASM
6.2 Hot GX adapter add and repair
In this test scenario, the hot GX adapter add and repair is done under a dual management
console with the HMC and the Systems Director Management Console (SDMC). We do not
show all the steps, but we show specific, useful steps.
6.2.1 Hot GX adapter add
The following prerequisites are necessary for the hot GX adapter add.
Prerequisites for hot GX adapter add
Prior to adding a GX adapter, check that the prerequisites are met. Table 6-3 shows the
system firmware and HMC levels prior to the hot GX adapter add in the Power 780.
Table 6-3 System firmware and HMC levels for the hot GX adapter add for the Power 780

                  Minimum recommended level   Test system level
System firmware   All levels                  AM720_101
HMC               V7R7.1.0                    V7R7.3.0.1

If the prerequisites are met, continue with the next steps.
In a dual management console environment, you must perform all the CHARM operations
from the primary management console. However, with V7R7.3.x.x of the HMC and SDMC, a
new feature was added: if you start an add or repair process from the non-primary
management console, you are asked whether you want to make the console at which you are
performing the procedure the primary management console. The operation then tries to
renegotiate the role of the primary console. If the non-primary console can become the
primary, the process allows you to continue with the procedure on this console.
We omitted most of the hot GX adapter add steps because they are similar to the steps of the
hot node add. During our hot GX adapter test, we see a message, as shown in Figure 6-21,
and we select Yes, force the local management console to become the primary. Then,
the management console becomes the primary, and we continue with the procedure on this
console.
Figure 6-21 Force the local management console to become the primary
After finishing the hot GX adapter add, you must add the I/O drawer in a separate procedure.
6.2.2 Hot GX adapter repair
In this scenario, we replace a GX adapter in the first CEC drawer. The prerequisites for the
hot GX adapter repair are presented in the next section.
Primary console: In a dual management console environment, the primary console is
generally the first console that connected. So, the primary console is not fixed; it can be
the HMC or the SDMC if you use a mixed HMC and SDMC configuration. The concept is
the same with two HMCs or two SDMCs.
Prerequisites for the hot GX adapter repair
Prior to repairing a GX adapter, check that the prerequisites are met. Table 6-4 shows the
system firmware and HMC levels prior to the hot GX adapter repair in the
Power 780.
Table 6-4 System firmware and HMC levels for the hot GX adapter repair for the Power 780

                  Minimum recommended level   Test system level
System firmware   AM720_064 or later          AM720_101
HMC               V7R7.2.0 + MH01235          V7R7.3.0.1

If the prerequisites are met, continue with the following steps.
Follow these steps to perform the hot GX adapter repair:
1. In the navigation pane, select Systems Management → Servers.
2. In the work pane, select the server name on which the procedure will be performed. Select
Serviceability → Hardware → Exchange FRU (see Figure 6-22).
Figure 6-22 Window for exchanging the FRU
3. Select the proper enclosure types and FRU to be replaced, as shown in Figure 6-23. After
this window, read every step carefully and click Next.
Figure 6-23 Select a FRU for a hot repair
Even if you skip the step that identifies which I/O resources must be removed with the
PHRU, Figure 6-24 shows the step that notifies you to remove an adapter during the hot
GX adapter repair.
Figure 6-24 Prepare for Hot Repair or Upgrade main window
Prior to continuing to the next step, all the adapters must be removed. To continue, read
every step carefully and follow each step.
6.3 Live Partition Mobility (LPM) using the HMC and SDMC
We use the following HMC and SDMC for this scenario:
Source: hmctest2 (172.16.20.109)
Destination: sdmc1 (172.16.20.22)
One node is attached to the HMC, and the other node is attached to the SDMC.
6.3.1 Inactive migration from POWER6 to POWER7 using HMC and SDMC
The POWER6 server and the POWER7 server are shown in “ITSO Poughkeepsie
environment” on page 397.
In this example, we perform an inactive migration from a POWER6 server to a POWER7
server. The POWER6 server is managed by an HMC, and the POWER7 Server is managed
by the SDMC. We deliberately ignored the LPM prerequisites to demonstrate the importance
of planning prior to the process. You notice that most of the errors relate to items that are
listed in Table 3-2 on page 68. We attempt to show the resolution to several of the issues in
this example. We demonstrate a migration that has met all the prerequisites in 6.4, “Active
migration example” on page 216.
Follow these steps for the migration:
1. Log on to the HMC command-line interface (CLI) and confirm whether the HMC can
perform remote mobility:
hscroot@hmctest4:~> lslparmigr -r manager
remote_lpar_mobility_capable=1
A value of 1 means that the HMC is capable of performing remote mobility.
2. Log on to the SDMC and confirm that it is also capable of remote mobility:
sysadmin@sdmc1:~> lslparmigr -r manager
remote_lpar_mobility_capable=1
3. Confirm that the managed server is mobile capable. On the HMC, follow these steps:
a. Log on to the HMC.
b. Click Systems Management → Servers, select the server, and open its properties.
c. Select the Capabilities tab.
4. Look for Mobility Capable, as shown in Figure 6-25.
Figure 6-25 Capabilities of the POWER6 570
Assuming that all prerequisites are met, proceed with the rest of the steps.
5. Perform a validation by selecting the managed server, which lists the valid partitions.
Select a partition, click the pop-up arrow icon, and click Operations → Mobility →
Validate, as shown in Figure 6-26.
Figure 6-26 Validating the partition mobility
6. Enter the remote SDMC details, as requested. We are performing a remote LPM, as
shown in Figure 6-27. Click Refresh Destination System to populate the list of systems
that are managed by the SDMC.
Figure 6-27 Destination and validation window
This request fails with an error, because the Secure Shell (SSH) authorization keys are not
set up between the HMC and the SDMC. To resolve this error, refer to “Setting up Secure
Shell keys between two management consoles” on page 362. Figure 6-28 on page 215
shows the error.
Figure 6-28 SSH authentication error
After you resolve the error, retry the validate operation. Again, you see an error code
HSCLA27C, which is generic and depends on your environment. The error can relate to N_Port
ID Virtualization (NPIV) or virtual Small Computer System Interface (vSCSI). Or, the error
might be because the addition of a virtual adapter is not required. Another possibility is that
there are Resource Monitoring and Control (RMC) communication issues between the HMC
and the virtual I/O servers. Review the settings and retry the operation.
Figure 6-29 shows an example of an error. The error is caused by RMC communication
issues between the HMC/SDMC and the virtual I/O server.
Figure 6-29 LPM validation error
Avoid using clones: Using cloned images, such as those created with alt_disk_install, to
clone servers creates RMC challenges. Avoid using cloned images. The previous error
was partly caused because RMC uses a node ID to communicate with the LPARs. The
node ID is stored in /etc/ct_node_id. If two or more LPARs on the same network, including
the VIO servers, have the same node ID, RMC cannot confirm with which LPAR it is
communicating. Either avoid alt_disk_install cloning to install another LPAR, or clean the
node ID immediately after cloning. Other duplicate node ID symptoms include the inability
to perform dynamic LPAR (DLPAR) operations.
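A node ID collision of this kind can be detected before you attempt DLPAR or LPM operations. The following is a hedged sketch; the LPAR names and IDs are invented examples, whereas on a real system each value would come from that LPAR's /etc/ct_node_id:

```shell
# Flag RMC node IDs that appear on more than one LPAR.
# Input format: one "lpar_name node_id" pair per line (sample data below).
node_ids="lpar1 6f0c7a12d9e4b301
vios1 6f0c7a12d9e4b301
lpar2 91ab34cd56ef7802"

printf '%s\n' "$node_ids" | awk '
    { count[$2]++; names[$2] = names[$2] " " $1 }
    END { for (id in count) if (count[id] > 1)
              print "duplicate node_id" names[id] }'
```

If a duplicate is found, the ID must be regenerated on the cloned LPAR; the RSCT recfgct utility is the usual way to do this, but verify the exact procedure in your RSCT documentation.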
When migrating to POWER7, check that the operating system level is on a TL that supports
the destination server processor mode. Failure to do so might result in the migration request
failing, which is shown with the error code HSCL366B in Figure 6-30.
Figure 6-30 LPM processor mode failure
In this example, we tried to migrate an LPAR running AIX 6.1 TL3 to a POWER7 machine.
This attempt failed due to the processor mode. To resolve the issue, we upgraded AIX to 6.1
TL6 and retried the operation. The migration checked for the virtual I/O servers. After a
suitable virtual I/O server was selected, the migration completed successfully, as shown in
Figure 6-31.
Figure 6-31 LPM migration status
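The console capability checks in steps 1 and 2 can be scripted. The following is a minimal sketch; the output strings reproduce the lslparmigr responses shown earlier, whereas in practice you would capture them over ssh from each console:

```shell
# Decide whether a console supports remote LPM from its
# `lslparmigr -r manager` output.
check_remote_lpm() {
    case $1 in
        *remote_lpar_mobility_capable=1*) echo "capable" ;;
        *)                                echo "not capable" ;;
    esac
}

hmc_out="remote_lpar_mobility_capable=1"     # captured from the HMC
sdmc_out="remote_lpar_mobility_capable=1"    # captured from the SDMC
check_remote_lpm "$hmc_out"
check_remote_lpm "$sdmc_out"
```

Running the same check against both consoles before starting a remote migration avoids discovering an incapable console halfway through the validation wizard.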
6.4 Active migration example
In this example, we migrate a POWER7 LPAR to a POWER6 server that is on a separate
management console. The POWER6 server is managed by the HMC. The POWER7 server is
managed by the SDMC. Prior to this migration example, we performed a similar migration of
the same LPAR from POWER6 to POWER7 without meeting the requirements outlined in the
planning section. That migration kept failing due to requirements that were not met (see 6.3.1,
“Inactive migration from POWER6 to POWER7 using HMC and SDMC” on page 212 for that
example). Refer to the scenario to see which errors might be experienced. In this section, we
learned from our mistakes and experiences, and we completed the prerequisites in the
following example. Follow these steps:
1. Log on to the LPAR and run lsconf, as shown in Example 6-1.
Example 6-1 Running lsconf on the server to confirm the system model
# uname -a
AIX rflpar20 1 6 00F69AF64C00
# lsconf | head -15
System Model: IBM,9179-MHB
Machine Serial Number: 109AF6P
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 6
Processor Version: PV_7_Compat
Number Of Processors: 2
Processor Clock Speed: 3864 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 6 lpar2_570
Memory Size: 3072 MB
Good Memory Size: 3072 MB
Platform Firmware level: AM720_090
Firmware Version: IBM,AM720_090
Example 6-1 shows that rflpar20 is on a POWER7 Server running in POWER6 mode.
Before attempting a migration, ensure that the destination server is capable of handling
the running mode. Refer to 7.2.3, “Processor compatibility mode” on page 253.
2. Follow these steps to initiate an LPM operation:
a. Log on to the SDMC.
b. Select Hosts.
c. Select the Virtual Server.
d. Click Action → Operations → Mobility → Validate.
Figure 6-26 on page 214 shows these steps using an HMC instead of the SDMC.
3. At the validation window, enter the destination management system (SDMC/HMC) with
the appropriate user, and select Refresh Destination System. This action populates the
list of servers that are managed by the remote SDMC/HMC. Select Validate, as shown in
Figure 6-32.
Figure 6-32 Validating the LPM between separate management consoles
4. After validation, the capable virtual I/O servers are listed. Select the appropriate virtual I/O
server and click Migrate.
5. After the migration completes, rerun the lsconf command to confirm that you are on a
separate server, as shown in Example 6-2.
Example 6-2 The lsconf command confirming that the LPAR has moved
# uname -a;lsconf | head -15
AIX rflpar20 1 6 00C1F1704C00
System Model: IBM,9117-MMA
Machine Serial Number: 101F170
Processor Type: PowerPC_POWER6
Processor Implementation Mode: POWER 6
Processor Version: PV_6_Compat
Number Of Processors: 2
Processor Clock Speed: 4208 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 6 lpar2_570
Memory Size: 3072 MB
Good Memory Size: 3072 MB
Platform Firmware level: EM350_085
Firmware Version: IBM,EM350_085
Console Login: enable
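The before-and-after checks in Example 6-1 and Example 6-2 can be automated by parsing saved lsconf output. The following is a hedged sketch; the string below reproduces a few lines of Example 6-1 rather than calling lsconf directly:

```shell
# Extract the processor implementation mode from `lsconf` output so a script
# can compare it against the destination server before migrating.
lsconf_out="System Model: IBM,9179-MHB
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 6
Processor Version: PV_7_Compat"

mode=$(printf '%s\n' "$lsconf_out" |
       awk -F': ' '/^Processor Implementation Mode/ { print $2 }')
echo "running in compatibility mode: $mode"
```

On a live LPAR, replace the literal string with `lsconf_out=$(lsconf)` and compare the extracted mode with what the destination server supports.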
6.5 Building a configuration from the beginning
The following scenario shows a complete HA virtual solution built from scratch. For this
scenario, we use two IBM Power 780 servers and implement Active Memory Sharing (AMS),
Active Memory Expansion (AME), LPM, and PowerHA features using NPIV.
Figure 6-33 on page 220 illustrates the configuration that is used for each server in this
scenario. We create the same configuration on both Power 780 servers in our ITSO
environment. We described our environment in “ITSO Poughkeepsie environment” on
page 397.
Notice that this scenario is a high availability solution. However, you can increase the levels of
redundancy by adding, for example, more Ethernet adapters or host bus adapters (HBAs)
and additional paths to the storage area network (SAN).
When implementing your solution, remember to check that each physical Ethernet adapter is
located in a separate CEC (in this scenario, we use Integrated Virtual Ethernet (IVE)
adapters). Also, check that each physical Fibre Channel (FC) adapter is located in a separate
CEC.
In the following sections, we describe these configurations:
Virtual I/O server definition and installation
HEA port configuration for dedicated SEA use
NIB and SEA failover configuration
Active Memory Sharing configuration
Server-side NPIV configuration
Active Memory Expansion (AME) configuration
The LPM operation
The PowerHA operation
RMC connection: The RMC connection between the virtual servers and the HMC or the
SDMC is key to the implementation. You can check whether it works correctly by executing
the command lspartition -dlpar on your HMC or SDMC.
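The lspartition -dlpar check can itself be scripted on the console. The following is a hedged sketch; the sample line imitates the command's output format, which can vary between HMC levels, so treat the parsing as an assumption to verify against your console:

```shell
# Report whether an LPAR entry from `lspartition -dlpar` shows an active
# RMC connection (Active:<1>) or not (Active:<0>).
rmc_state() {
    case $1 in
        *"Active:<1>"*) echo "RMC active" ;;
        *"Active:<0>"*) echo "RMC not active" ;;
        *)              echo "unknown" ;;
    esac
}

sample='<#0> Partition:<2*9179-MHB*109AF6P, rflpar20, 172.16.20.41>
        Active:<1>, OS:<AIX, 6.1>, DCaps:<0x2c5f>'
rmc_state "$sample"
```

An LPAR that reports Active:<0> will fail DLPAR and LPM operations until its RMC connection is repaired.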
Figure 6-33 Example of a minimum configuration for high availability in one server
The same network configuration needs to be done in the second server. The NPIV
configuration is only performed in the first server for the client LPAR installation.
Refer to the following SAN considerations:
Create a group for both VIO servers in each Power 780 server. AMS paging disks are
assigned to the group.
The VIO operating system’s disks must be assigned only to the corresponding VIO server.
AMS paging disks will are duplicated for each Power 780 server.
Because we use NPIV, additional configurations are performed after we create the client
virtual servers.
The following sections guide you to configure this environment. These scenarios are not
intended to show you a step-by-step configuration, but they provide a consistent guide for the
installation. For more details and specific tasks, refer to each specific product installation
guide. We advise you to read the entire example before performing your implementation.
6.5.1 Virtual I/O servers
This section guides you to perform the installation and initial configuration of your Virtual I/O
servers.
Differences: We reference our environment’s adapter numbers in this scenario. Be aware
that adapter numbers can change according to your environment.
Planning: Perform exhaustive planning of your SAN zoning before starting the Power
Systems configuration.
(The Figure 6-33 diagram shows, for each of the two virtual I/O servers: a Shared Ethernet
Adapter ent5 built over link aggregation ent4 of the physical ports ent0 and ent1; virtual
adapters ent2 (VLAN 1, trunk priority 1 on the first VIOS and 2 on the second) and ent3
(VLAN 99 control channel, ha_mode=auto, ctl_chan=ent3); and physical Fibre Channel
adapters fcs0 and fcs2 mapped through vfchost0 and vfchost1 to the client LPAR's virtual
adapters fcs0 through fcs3. Both servers connect to redundant LAN and SAN switches.)
Virtual I/O server definition and installation
We define the first virtual I/O server LPAR (virtual server in the SDMC) and direct you through
the installation process.
Creating the virtual I/O server LPAR profile
Follow these steps to create the virtual I/O server LPAR profile:
1. Log on to the SDMC.
2. In the Welcome window, select the server with which you want to work. Click Action →
System Configuration → Create Virtual Server. The Create Virtual Server window
opens.
opens.
3. Type the partition name and select VIOS in the Environment box. Click Next. Refer to
Figure 6-34.
Figure 6-34 VIO server creation using SDMC 1
4. Configure the memory and processor values for your VIO (we review these values later).
Important: We do not include each step for the LPAR creation, but we provide details
about the key parts.
5. In the Virtual Ethernet profile, as shown in Figure 6-35, configure two virtual Ethernet
adapters.
Figure 6-35 Virtual Ethernet adapter configuration
Both adapters are used for the Shared Ethernet Adapter (SEA) configuration.
In the Host Ethernet Adapter section, select the T1 adapters for the first virtual I/O server,
and select the T3 adapters for the second virtual I/O server. Click Next.
6. In the Virtual storage adapters profile, set the Maximum number of virtual adapters to
1000. Click Next. Do not configure the virtual storage adapters at this point.
7. In the Physical I/O adapters window, select the physical adapters (HBAs) that you want to
configure in the VIO server.
8. In the summary window, review the settings and click Finish.
9. After the virtual server is created, edit the virtual server profile to adjust the following
values:
– Check the processor values for minimum, desired, and maximum processing units and
virtual processors.
– Configure the partition as uncapped with a weight of 255.
– Configure the memory values that you want.
– In the Optional settings window, select Enable connection monitoring.
10.Activate your new virtual I/O server LPAR, and install the latest available virtual I/O server
image.
11.Perform the usual virtual I/O server configuration, for example, with date and time settings.
12.Install the multi-path I/O (MPIO) driver according to your environment. Consult the System
Storage Information Center (SSIC) website for more information about the available
multi-pathing drivers.
Trunk priority: Notice that we configure the trunk priority with a value of 1 for the first
VIO and with a value of 2 in the second VIO for each server. This priority helps you to
configure the SEA failover feature in later steps.
Use separate CECs: As you did with the Host Ethernet Adapter (HEA) ports, the
adapters that you assign to each virtual I/O server need to be located in separate CECs
to help maximize availability.
Configurations: Repeat this configuration for the second virtual I/O server in this server
and for the two virtual I/O servers in your second POWER7 server.
6.5.2 HEA port configuration for dedicated SEA use
After creating your virtual I/O servers, you need to configure the corresponding IVE ports in
promiscuous mode. As explained in 2.9, “Integrated Virtual Ethernet” on page 48, this action
helps us to configure an SEA using the IVE ports.
To perform this configuration, refer to the following steps:
1. Log on to the SDMC.
2. In the Welcome window, select the server with which you want to work.
3. Click Action → Hardware Information → Adapter → Host Ethernet, and the Host
Ethernet Adapter window opens.
4. Select the HEA port that you want to configure and click Configure (in this scenario, we
use port T1 for the first VIO and port T3 for the second VIO on both CECs).
5. In the Promiscuous virtual server field, select the virtual server that will use the HEA port.
In this scenario, we put the first virtual I/O server as the virtual server for each T1 adapter
and the second virtual I/O server as the virtual server for each T3 adapter.
6. Select Enable flow control.
7. Click OK.
8. Repeat this process for the HEA ports that you configure in the VIO servers and plan to
use as part of the SEA adapters.
6.5.3 NIB and SEA failover configuration
In this section, we explain the Network Interface Backup (NIB) and Shared Ethernet Adapter
(SEA) configurations.
In this configuration example, we use NIB as the aggregation technology for network
redundancy. Follow these steps:
1. Check the adapter numbers for your physical Ethernet adapters. You create the NIB
adapter using the two physical adapters, which in this case are ent0 and ent1.
2. Create your NIB adapter by executing the commands that are shown, together with their
output, in Example 6-3.
Example 6-3 NIB adapter configuration
$ mkvdev -lnagg ent0
ent4 Available
en4
et4
$ cfglnagg -add -backup ent4 ent1
3. Using the virtual slot placement, identify the adapter in VLAN1 and the adapter in
VLAN99. The adapter in VLAN99 is the control channel adapter. The adapter in VLAN1 is
the default adapter of the SEA.
4. Define the SEA adapter in the virtual I/O server using the mkvdev command, as shown in
Example 6-4 on page 224.
Important: Review the IVE documentation to determine the best values for the
remaining configuration parameters according to your network infrastructure.
Example 6-4 Shared Ethernet Adapter with failover creation
$ mkvdev -sea ent4 -vadapter ent2 -default ent2 -defaultid 1 -attr ha_mode=auto
ctl_chan=ent3 netaddr=172.16.20.1 largesend=1
ent5 Available
en5
et5
5. Configure the IP address for the SEA adapter using the cfgassist command, as shown in
Figure 6-36. Remember to use a separate IP address for the VIO servers.
Figure 6-36 SEA IP address configuration using the cfgassist menu
6. Dynamically configure the virtual server for each virtual I/O server to be a mover service
partition. Overwrite the virtual I/O server LPAR profile to make the change permanent.
After you perform this configuration on both VIO servers in each server, you have addressed
the needs for the virtual I/O server and Ethernet configurations, and you can continue with the
rest of the configuration steps.
Note: After you configure the SEA adapters in both VIO servers and install a client LPAR,
test the adapter failover before going into production. Refer to section 5.1.2 of IBM
PowerVM Virtualization Introduction and Configuration, SG24-7940-04, for more
information about this test.
6.5.4 Active Memory Sharing configuration
In this section, we describe the process for the AMS configuration:
Creating the paging devices on the Virtual I/O servers
Creating the shared memory pool using the SDMC
Creating the paging devices on the Virtual I/O servers
Because we are deploying a redundant configuration, we need our paging devices located in
the SAN environment and accessible to both VIO servers in the IBM Power server. Also,
because we plan to use LPM, the paging spaces must be duplicated in each server. For
example, if we have two 8 GB and two 4 GB paging devices on server 1 (available to both
virtual I/O servers in the server), we need to have another two 8 GB and two 4 GB paging
devices at server 2, and assign them to both VIO servers. Follow these steps to create the
paging devices:
1. As shown in Figure 6-37, create your paging spaces in the SAN and assign them to both
VIO servers in the first server and to both VIO servers in the second server.
Figure 6-37 Paging space creation
2. Configure the device on each VIO server by executing the cfgdev command.
3. On your SDMC Welcome page, select the server where you will create the shared
memory pool.
Console: For this part of the example, we use the SDMC. Alternatively, you can use the HMC.
4. As shown in Figure 6-38, select Action → System Configuration → Virtual
Resources → Shared Memory Pool Management.
Figure 6-38 Shared memory pool creation with SDMC (page 1 of 4)
5. The Create Shared Memory Pool window opens, as shown in Figure 6-39.
– Specify the Maximum Pool Size and the Pool Size.
– Check that both Virtual I/O servers appear as paging devices.
– Click Add Paging Devices.
Figure 6-39 Shared memory pool creation with SDMC (page 2 of 4)
6. The Add Paging Devices window opens.
– Select Physical as the Device Type.
– Click View Devices.
As shown in Figure 6-40, the SDMC detects the available disks and presents them in a grid.
After the available disks appear, select the devices that you plan to add into the shared
memory pool and click OK.
Figure 6-40 Shared memory pool creation with SDMC (page 3 of 4)
7. The selected paging space devices are added to the shared memory pool, as shown in
Figure 6-41.
Figure 6-41 Shared memory pool creation with SDMC (page 4 of 4)
8. Click OK to create the pool.
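The symmetry rule stated at the start of this section (matching paging devices on both servers, for LPM) can be sanity-checked by comparing the sorted device sizes. The sizes below are the two 8 GB and two 4 GB devices from this example:

```shell
# Sketch: verify that two servers expose the same multiset of AMS paging
# device sizes (in GB), a prerequisite for LPM in this scenario.
server1_sizes="8 8 4 4"
server2_sizes="8 4 8 4"
s1=$(printf '%s\n' $server1_sizes | sort -n | tr '\n' ' ')
s2=$(printf '%s\n' $server2_sizes | sort -n | tr '\n' ' ')
if [ "$s1" = "$s2" ]; then
  echo "paging devices match"
else
  echo "MISMATCH: $s1 vs $s2"
fi
```

The comparison is order-independent, so it does not matter in which sequence the devices were created on each server.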
6.5.5 NPIV planning
Because client partitions must exist before the creation of the virtual FC server adapters, you
need to plan the slot assignments and start by creating the client partition. Then, you can add
the server virtual adapters.
Figure 6-42 on page 229 shows the configuration that we will follow for each LPAR that we
define. The slot numbers vary on the virtual I/O server side.
Table 6-5 on page 229 shows the virtual adapter placement for this configuration. We use this
information to create both the client and server adapters.
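The slot plan in Table 6-5 follows a simple pattern: client slots 20 through 23 per LPAR, and VIOS slots 30/31 for lpar1 and 40/41 for lpar2. As an illustrative sketch (the names and partition IDs are the ones used in this scenario), the table rows can be generated from that pattern:

```shell
# Sketch: print the virtual FC slot plan of Table 6-5 from its pattern.
# Each spec is "<lpar-name> <partition-id> <base-VIOS-slot>".
for spec in "lpar1 3 30" "lpar2 4 40"; do
  set -- $spec
  name=$1; id=$2; base=$3
  cslot=20
  for vios in vios1_p780_1 vios2_p780_1; do
    for off in 0 1; do
      echo "$vios $((base + off)) $name($id) $cslot"
      cslot=$((cslot + 1))
    done
  done
done
```

Generating the plan this way helps keep the client and server adapter definitions consistent when you repeat the configuration for more LPARs.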
Verification: Notice that the Redundancy attribute is set to true. Both VIO servers can
access the device. Also, notice that there are two 4 GB and two 8 GB paging devices.
We can activate two LPARs with up to 4 GB and two LPARs with up to 8 GB memory
with this pool.
Table 6-5 Virtual FC initial slot assignment
Figure 6-42 NPIV redundant configuration
6.5.6 Client LPAR creation (virtual servers)
In this section, we provide the steps to create the client LPARs or virtual servers:
1. In your SDMC Welcome window, select the server, and then click Action → System
Configuration → Create Virtual Server.
2. In the Name window:
– Enter the LPAR name.
– For the environment, select AIX/Linux.
Virtual I/O server    VIOS slot    Client partition    Client partition slot
vios1_p780_1          30           lpar1(3)            20
vios1_p780_1          31           lpar1(3)            21
vios2_p780_1          30           lpar1(3)            22
vios2_p780_1          31           lpar1(3)            23
vios1_p780_1          40           lpar2(4)            20
vios1_p780_1          41           lpar2(4)            21
vios2_p780_1          40           lpar2(4)            22
vios2_p780_1          41           lpar2(4)            23
(Figure 6-42 diagram: each of the two Virtual I/O servers has two physical Fibre Channel adapters and two server virtual Fibre Channel adapters in slots 30 and 31. The client logical partition has four client virtual Fibre Channel adapters in slots 20 through 23, which connect through the hypervisor to the server virtual adapters and, through the physical adapters, to the Storage Area Network.)
3. In the Memory window:
– Select Shared for the memory mode.
– Select the Assigned Memory (in this scenario, we create two 6 GB partitions).
4. In the Processor window, assign one processor to the LPAR.
5. In the Ethernet window, be sure that you only have one virtual Ethernet adapter that is
located in VLAN 1, as shown in Figure 6-43.
Figure 6-43 Ethernet configuration for the virtual server
6. In the Storage Selection window, select No (you want to manage the virtual storage
adapters for this Virtual Server yourself).
This option allows you to manually configure the slot assignment for the virtual FC
adapters.
Important: If you select the suspend capable check box, be aware that 10%
extra space is added to your paging device requirement to activate the LPAR.
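The 10% overhead in the Important note translates into simple sizing arithmetic. For the 6 GB partitions of this scenario (the memory size is carried over from step 3; treating it as the paging baseline is an assumption for illustration):

```shell
# Sketch: paging-device sizing with the suspend-capable option enabled,
# adding the 10% headroom noted above. The 6 GB value is from this scenario.
awk 'BEGIN { mem_gb = 6; printf "paging device: at least %.1f GB\n", mem_gb * 1.10 }'
```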
7. Follow these steps in the Virtual Storage Adapters window:
– Enter 100 for the Maximum number of virtual adapters.
– Configure the virtual FC adapters according to the values in Table 6-5 on page 229.
Figure 6-44 shows the lpar1 virtual adapters configuration.
Figure 6-44 Client virtual FC adapters creation using SDMC
8. In the Physical I/O adapters window, do not configure any adapter.
9. In the Summary window, review the settings and click Finish. You have created the virtual
server at this point.
Because the SDMC takes an Integrated Virtualization Manager (IVM) style approach to virtual
server creation, you need to review the following settings in the profile and configure them:
Paging virtual I/O server: Ensure that you have a primary and secondary paging virtual I/O
server.
Processor: Adjust the values to your desired values.
Memory: Consider maximum memory size and paging devices for AMS.
Now that the LPARs are created, proceed to create the server virtual FC adapters.
6.5.7 Server-side NPIV configuration
In this section, we show you how to create the server virtual FC adapters and to perform the
rest of the configuration to enable your virtual server clients to access the back-end storage
devices.
Creating Virtual FC adapters
After you have defined the client virtual servers, you need to create the server virtual FC
adapters. You can perform this operation dynamically using dynamic LPAR (DLPAR), or you
can shut down your virtual I/O server and modify the partition profile.
Follow these steps to perform the operation dynamically:
1. Select the virtual I/O server.
Slot assignments: You select slot assignments merely as references for a proper
initial assignment. After your LPAR becomes operational, these numbers can change.
2. Click Action → System Configuration → Manage Virtual Server. The Manage Virtual
Server pane appears.
3. Navigate to the storage adapters section.
4. Click Add and configure the virtual FC adapters according to Table 6-5 on page 229.
5. Configure the devices in the virtual I/O server by using the cfgdev command.
Follow these steps to modify the partition profile:
1. Select the virtual I/O server.
2. Click Action → System Configuration → Manage Profiles.
3. Select the profile that you want to modify, and click Action → Edit.
4. Navigate to the Virtual Adapters tab.
5. Click Action → Create Virtual Adapter → Fibre Channel Adapter.
6. Configure the virtual FC adapters according to Table 6-5 on page 229.
7. Shut down the partition and start it again with the modified profile.
SAN zoning for client virtual servers
Because the NPIV technology presents newly independent worldwide port names (WWPNs)
to the SAN devices, you need to perform specific zoning and storage configuration for each
server.
To discover the WWPN information for each virtual FC client adapter, follow these steps:
1. Select the LPAR that you want to configure.
2. Click Action → System Configuration → Manage Profiles.
3. Select the profile with which you want to work.
4. Click Action → Edit.
5. Navigate to the Virtual Adapters tab.
6. Select the Client FC adapter with which you want to work. Click Action → Properties.
The window that is shown in Figure 6-45 opens.
Save the configuration: After you perform the DLPAR operation, you must save the
current configuration to avoid losing the configuration at the next partition shutdown.
Figure 6-45 WWN configuration for a client virtual FC adapter
7. Document the WWPNs and repeat the process for each client virtual FC adapter.
8. Perform the zoning and storage configuration task to enable these WWPNs to access the
storage logical unit numbers (LUNs).
9. Repeat this process for all the adapters in your virtual server client.
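Each client virtual FC adapter carries a pair of WWPNs, and both must be included in the zone so that LPM can activate the second WWPN on the target server. A minimal sketch of emitting zone members from a documented pair; the WWPN values and zone name below are made-up placeholders, not values from this scenario:

```shell
# Sketch: emit zone members for the WWPN pair of one client virtual FC
# adapter. Both WWPNs must be zoned to support LPM. The zone name and
# WWPNs are placeholders for illustration.
zone="lpar1_fcs0_zone"
wwpn_pair="c0507603a1b20010 c0507603a1b20011"
for wwpn in $wwpn_pair; do
  echo "zone $zone: member $wwpn"
done
```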
Mapping virtual FC adapters to physical HBAs
We create the client and VIO partitions, the client virtual FC adapters, the server virtual FC
adapters, and the zoning tasks. We now need to map the server virtual FC adapters to the
physical HBAs.
The first step is to check that the SAN ports to which we are connected are NPIV-capable
ports:
1. In the virtual I/O server, execute the lsnports command, as shown in Figure 6-46.
Important: You need to zone both WWPNs to be able to perform LPM operations. You
must perform the zoning and storage configuration manually, because the virtual WWPNs
do not appear in the switch fabric in the way that the physical adapters do.
Figure 6-46 The lsnports command execution
2. Use the vfcmap and the lsmap commands to perform the physical to virtual adapter
mapping, as shown in Example 6-5.
Example 6-5 vfcmap and lsmap commands example
$ vfcmap -vadapter vfchost0 -fcp fcs0
$ vfcmap -vadapter vfchost1 -fcp fcs2
$ vfcmap -vadapter vfchost2 -fcp fcs0
$ vfcmap -vadapter vfchost3 -fcp fcs2
$ lsmap -all -npiv
Name Physloc ClntID ClntName ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost0 U9179.MHB.109AF6P-V1-C30 3
Status:NOT_LOGGED_IN
FC name:fcs0 FC loc code:U78C0.001.DBJH615-P2-C1-T1
Ports logged in:0
Flags:4
VFC client name: VFC client DRC:
Name Physloc ClntID ClntName ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost1 U9179.MHB.109AF6P-V1-C31 3
Status:NOT_LOGGED_IN
FC name:fcs2 FC loc code:U78C0.001.DBJF678-P2-C1-T1
Ports logged in:0
Flags:4
VFC client name: VFC client DRC:
Important: If the command output shows a value of 1 in the fabric attribute, the switch
port is NPIV capable. If it shows 0, you need to change your SAN port configuration.
Name Physloc ClntID ClntName ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost2 U9179.MHB.109AF6P-V1-C40 4
Status:NOT_LOGGED_IN
FC name:fcs0 FC loc code:U78C0.001.DBJH615-P2-C1-T1
Ports logged in:0
Flags:4
VFC client name: VFC client DRC:
Name Physloc ClntID ClntName ClntOS
------------- ---------------------------------- ------ -------------- -------
vfchost3 U9179.MHB.109AF6P-V1-C41 4
Status:NOT_LOGGED_IN
FC name:fcs2 FC loc code:U78C0.001.DBJF678-P2-C1-T1
Ports logged in:0
Flags:4
VFC client name: VFC client DRC:
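The fabric check described in the Important note can also be scripted against saved lsnports output. In this sketch, the port names echo the scenario, but the column values in the sample data are illustrative, not taken from Figure 6-46:

```shell
# Sketch: flag switch ports that are not NPIV-capable (fabric != 1) from
# saved `lsnports` output. The sample data below is illustrative.
cat > /tmp/lsnports.out <<'EOF'
name  physloc                     fabric tports aports swwpns awwpns
fcs0  U78C0.001.DBJH615-P2-C1-T1  1      64     62     2048   2046
fcs2  U78C0.001.DBJF678-P2-C1-T1  0      64     64     2048   2048
EOF
awk 'NR > 1 && $3 != 1 { print $1 ": switch port is not NPIV capable" }' /tmp/lsnports.out
```

On a real VIO server, you would capture `lsnports` output to a file first and run the same filter against it.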
Client installation
Continue with the client installation:
1. At this point, boot your client virtual server and install AIX (the same way that you install it
for any LPAR).
2. Install the corresponding MPIO driver in your client LPAR.
3. Test the available path to the storage.
4. Perform your usual AIX configurations.
Active Memory Expansion (AME) configuration
The AME configuration depends on multiple factors, including the environment and the
applications. In this section, we present a configuration example that is based on a newly
installed shared memory partition, in which we use a memory stress tool to generate
workloads.
We use an LPAR that is one of the partitions that we installed in this scenario. It has the
following configuration:
Six GB RAM
Status attribute: Check the Status attribute. In Example 6-5, the Status attribute shows
as NOT_LOGGED_IN, because the client virtual server did not yet connect using this
adapter. After you finish your client partition configuration, this attribute changes to
LOGGED_IN.
Note: To get the memory stress tool, download it from the following website:
http://www.ibm.com/developerworks/wikis/display/WikiPtype/nstress
In this scenario, we use the nmem64, dbstart, and webstart scripts.
Four GB paging space
Two CPUs
Four virtual CPUs
Four SMT enabled
AMS is enabled
AME is disabled
As shown in Example 6-6, initially there is no workload in the LPAR.
Example 6-6 The Topas Monitor output in an idle partition
Topas Monitor for host:lpar1 EVENTS/QUEUES FILE/TTY
Thu May 26 09:45:12 2011 Interval:2 Cswitch 200 Readch 1906
Syscall 205 Writech 177
CPU User% Kern% Wait% Idle% Physc Entc% Reads 20 Rawin 0
Total 0.0 0.3 0.0 99.7 0.01 0.56 Writes 0 Ttyout 177
Forks 0 Igets 0
Network BPS I-Pkts O-Pkts B-In B-Out Execs 0 Namei 23
Total 505.0 6.50 0.50 299.0 206.0 Runqueue 1.00 Dirblk 0
Waitqueue 0.0
Disk Busy% BPS TPS B-Read B-Writ MEMORY
Total 0.0 0 0 0 0 PAGING Real,MB 6144
Faults 0K % Comp 17
FileSystem BPS TPS B-Read B-Writ Steals 0K % Noncomp 0
Total 1.86K 20.00 1.86K 0 PgspIn 0 % Client 0
PgspOut 0K
Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE
xmgc 786456 0.0 60.0K root PageOut 0K Size,MB 4096
topas 3473578 0.0 2.40M root Sios 0K % Used 6
getty 8585228 0.0 588K root % Free 94
gil 1900602 0.0 124K root NFS (calls/sec)
clstrmgr 5570762 0.0 1.31M root SerV2 0 WPAR Activ 0
clcomd 4587562 0.0 1.71M root CliV2 0 WPAR Total 0
rpc.lock 5439664 0.0 208K root SerV3 0 Press: "h"-help
netm 1835064 0.0 60.0K root CliV3 0 "q"-quit
To generate a workload, we execute the nstress tool with four memory stress
processes, each consuming 2,000 MB of RAM for a five-minute period.
Refer to Example 6-7.
Example 6-7 nmem64 command execution
# nohup ./nmem64 -m 2000 -s 300 &
[1] 7602190
# Sending output to nohup.out
nohup ./nmem64 -m 2000 -s 300 &
[2] 7405586
# Sending output to nohup.out
nohup ./nmem64 -m 2000 -s 300 &
[3] 8388704
# Sending output to nohup.out
nohup ./nmem64 -m 2000 -s 300 &
[4] 8126552
# Sending output to nohup.out
You can see the Topas Monitor output in Example 6-8. Notice that there is significant paging
activity in the server.
Example 6-8 Topas Monitor output in a memory stressed partition without memory compression
Topas Monitor for host:lpar1 EVENTS/QUEUES FILE/TTY
Thu May 26 09:48:14 2011 Interval:2 Cswitch 2533 Readch 1906
Syscall 348 Writech 196
CPU User% Kern% Wait% Idle% Physc Entc% Reads 20 Rawin 0
Total 1.0 32.7 5.8 60.5 1.08 53.87 Writes 0 Ttyout 196
Forks 0 Igets 0
Network BPS I-Pkts O-Pkts B-In B-Out Execs 0 Namei 23
Total 407.0 4.00 0.50 184.0 223.0 Runqueue 2.50 Dirblk 0
Waitqueue 4.5
Disk Busy% BPS TPS B-Read B-Writ MEMORY
Total 20.0 8.85M 1.16K 4.31M 4.54M PAGING Real,MB 6144
Faults 1128K % Comp 99
FileSystem BPS TPS B-Read B-Writ Steals 1161K % Noncomp 0
Total 1.86K 20.00 1.86K 0 PgspIn 1103 % Client 0
PgspOut 1161K
Name PID CPU% PgSp Owner PageIn 1103 PAGING SPACE
amepat 5832710 25.0 220K root PageOut 1161K Size,MB 4096
lrud 262152 0.0 92.0K root Sios 1925K % Used 80
java 9044006 0.0 77.6M root % Free 20
nmem64 8388704 0.0 1.95G root NFS (calls/sec)
nmem64 8126552 0.0 1.95G root SerV2 0 WPAR Activ 0
nmem64 7405586 0.0 1.95G root CliV2 0 WPAR Total 0
nmem64 7602190 0.0 1.95G root SerV3 0 Press: "h"-help
topas 3473578 0.0 2.88M root CliV3 0 "q"-quit
While the memory stress processes run, we execute the amepat command to analyze the
memory behavior and get the AME recommendation. Example 6-9 shows the command
output.
Example 6-9 The amepat command output during high paging activity
# amepat 3
Command Invoked : amepat 3
Date/Time of invocation : Thu May 26 09:47:37 EDT 2011
Total Monitored time : 4 mins 48 secs
Total Samples Collected : 3
System Configuration:
---------------------
Partition Name : lpar1_p780
Processor Implementation Mode : POWER7
Number Of Logical CPUs : 16
Processor Entitled Capacity : 2.00
Processor Max. Capacity : 4.00
True Memory : 6.00 GB
SMT Threads : 4
Shared Processor Mode : Enabled-Uncapped
Active Memory Sharing : Enabled
Active Memory Expansion : Disabled
System Resource Statistics:          Average        Min            Max
---------------------------          -----------    -----------    -----------
CPU Util (Phys. Processors)          0.32 [  8%]    0.16 [  4%]    0.65 [ 16%]
Virtual Memory Size (MB)             8477 [138%]    6697 [109%]    9368 [152%]
True Memory In-Use (MB)              6136 [100%]    6136 [100%]    6136 [100%]
Pinned Memory (MB)                   1050 [ 17%]    1050 [ 17%]    1050 [ 17%]
File Cache Size (MB)                   36 [  1%]      31 [  1%]      48 [  1%]
Available Memory (MB)                   0 [  0%]       0 [  0%]       0 [  0%]
Active Memory Expansion Modeled Statistics:
-------------------------------------------
Modeled Expanded Memory Size : 6.00 GB
Average Compression Ratio : 2.00
Expansion    Modeled True    Modeled               CPU Usage
Factor       Memory Size     Memory Gain           Estimate
---------    -------------   ------------------    -----------
1.00 6.00 GB 0.00 KB [ 0%] 0.00 [ 0%]
1.12 5.38 GB 640.00 MB [ 12%] 0.00 [ 0%]
1.20 5.00 GB 1.00 GB [ 20%] 0.00 [ 0%]
1.30 4.62 GB 1.38 GB [ 30%] 0.00 [ 0%]
1.42 4.25 GB 1.75 GB [ 41%] 0.00 [ 0%]
1.50 4.00 GB 2.00 GB [ 50%] 0.00 [ 0%]
1.60 3.75 GB 2.25 GB [ 60%] 0.03 [ 1%]
Active Memory Expansion Recommendation:
---------------------------------------
The recommended AME configuration for this workload is to configure the LPAR
with a memory size of 3.75 GB and to configure a memory expansion factor
of 1.60. This will result in a memory gain of 60%. With this
configuration, the estimated CPU usage due to AME is approximately 0.03
physical processors, and the estimated overall peak CPU resource required for
the LPAR is 0.68 physical processors.
NOTE: amepat's recommendations are based on the workload's utilization level
during the monitored period. If there is a change in the workload's utilization
level or a change in workload itself, amepat should be run again.
The modeled Active Memory Expansion CPU usage reported by amepat is just an
estimate. The actual CPU usage used for Active Memory Expansion may be lower
or higher depending on the workload.
In Example 6-9 on page 237, you can observe the amepat command execution output during
the period in which we stress the server memory. In the recommendation section, the amepat
command specifies that we need to configure 3.75 GB of true memory and a 1.60 expansion
factor. In this case, we have two options:
– Follow the recommendation as is: in this case, the amount of physical memory that is
consumed by the virtual server is reduced, but the paging activity remains.
– Configure the 1.60 expansion factor and continue using the 6 GB of logical RAM
(remember that we are also using AMS): in this case, the paging activity disappears.
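The relationship behind the amepat recommendation is simple: true memory = expanded memory / expansion factor. A quick check with the values from Example 6-9 (6 GB expanded memory, factor 1.60):

```shell
# AME sizing arithmetic from Example 6-9: with a 1.60 expansion factor,
# 6 GB of expanded memory requires 3.75 GB of true memory; the remaining
# 2.25 GB is the modeled memory gain held as compressed memory.
awk 'BEGIN {
  expanded = 6.00; factor = 1.60
  true_mem = expanded / factor
  printf "true memory: %.2f GB, memory gain: %.2f GB\n", true_mem, expanded - true_mem
}'
```

These are exactly the 3.75 GB and 2.25 GB figures that reappear in the Topas output later in this section.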
We present both scenarios. We must start by enabling the AME feature. Follow these steps:
1. Shut down the virtual server.
2. Modify the partition profile. In the Memory tab, select AME and enter a 1.6 active memory
expansion factor.
3. Reduce the assigned memory to the desired memory of 3.75 GB, as shown in
Figure 6-47.
Figure 6-47 AME configuration options
4. Start the partition.
5. We now execute the memory stress test with 3.75 GB RAM and a 1.6 AME expansion
factor in the virtual server. Example 6-10 shows the results.
Example 6-10 The Topas Monitor output in a memory-stressed partition
Topas Monitor for host:lpar1 EVENTS/QUEUES FILE/TTY
Thu May 26 10:15:31 2011 Interval:2 Cswitch 2225 Readch 1906
Syscall 206 Writech 151
CPU User% Kern% Wait% Idle% Physc Entc% Reads 20 Rawin 0
Total 2.5 2.5 13.1 82.0 0.17 8.37 Writes 0 Ttyout 151
Forks 0 Igets 0
Network BPS I-Pkts O-Pkts B-In B-Out Execs 0 Namei 23
Total 270.0 2.00 0.50 92.00 178.0 Runqueue 2.50 Dirblk 0
Waitqueue 3.5
Disk Busy% BPS TPS B-Read B-Writ MEMORY
Total 19.9 7.97M 1.06K 3.96M 4.01M PAGING Real,MB 6144
Faults 2687K % Comp 99
FileSystem BPS TPS B-Read B-Writ Steals 2709K % Noncomp 0
Total 1.86K 20.00 1.86K 0 PgspIn 1014 % Client 0
PgspOut 1027K
Name PID CPU% PgSp Owner PageIn 1014 PAGING SPACE
cmemd 655380 0.0 180K root PageOut 1027K Size,MB 4096
nmem64 7733342 0.0 1.95G root Sios 1914K % Used 75
nmem64 7209204 0.0 1.95G root % Free 25
nmem64 6619252 0.0 1.95G root AME
nmem64 8585308 0.0 1.95G root TMEM 3.75GWPAR Activ 0
lrud 262152 0.0 92.0K root CMEM 2.25GWPAR Total 0
topas 7602196 0.0 2.79M root EF[T/A] 1.6/1.6Press: "h"-help
java 5243082 0.0 70.2M root CI:1.66KCO:1.64K "q"-quit
In Example 6-10, observe that there is still significant paging activity. The virtual
server believes that it has 6 GB of RAM; in reality, it has only 3.75 GB of true memory,
with 2.25 GB provided as compressed memory.
6. Dynamically, we add 2.25 GB of memory to the virtual server. The total amount is now
6 GB of RAM memory.
7. We execute the tests again. Example 6-11 presents the results.
Example 6-11 Topas output in a memory-stressed partition with memory compression and 6 GB RAM
Topas Monitor for host:lpar1 EVENTS/QUEUES FILE/TTY
Thu May 26 10:22:25 2011 Interval:2 Cswitch 2145 Readch 1909
Syscall 245 Writech 668
CPU User% Kern% Wait% Idle% Physc Entc% Reads 20 Rawin 0
Total 38.4 48.6 0.8 12.2 3.74 187.07 Writes 6 Ttyout 215
Forks 0 Igets 0
Network BPS I-Pkts O-Pkts B-In B-Out Execs 0 Namei 30
Total 850.0 10.52 1.00 540.0 310.1 Runqueue 9.52 Dirblk 0
Waitqueue 0.0
Disk Busy% BPS TPS B-Read B-Writ MEMORY
Total 0.0 246K 50.59 246K 0 PAGING Real,MB 9728
Faults 80140K % Comp 95
FileSystem BPS TPS B-Read B-Writ Steals 80190K % Noncomp 0
Total 1.88K 20.54 1.87K 17.03 PgspIn 0 % Client 0
PgspOut 0K
Name PID CPU% PgSp Owner PageIn 49 PAGING SPACE
cmemd 655380 40.3 180K root PageOut 0K Size,MB 4096
lrud 262152 13.4 92.0K root Sios 49K % Used 1
nmem64 8585310 13.4 1.95G root % Free 99
nmem64 7209206 13.4 1.95G root AME
nmem64 7733344 13.4 1.95G root TMEM 6.00GWPAR Activ 0
nmem64 6619254 0.0 1.95G root CMEM 3.09GWPAR Total 0
slp_srvr 4718600 0.0 484K root EF[T/A] 1.6/1.6Press: "h"-help
topas 7602196 0.0 2.79M root CI:80.0KCO:77.8K "q"-quit
As you can see in Example 6-11 on page 240, there is no paging activity in the server with the
new configuration.
Live Partition Mobility (LPM) operations
We now move lpar1 from the original server to the secondary server. Before performing the
LPM operation, we generate memory and CPU activity with the nstress tool.
Follow these steps to move the partition:
1. Execute the dbstart.sh script, which creates a fake database.
2. Execute the webstart.sh script, which creates a fake web server.
3. Execute the memory stress test: nohup ./nmem64 -m 2000 -s 3000. In this example, we
execute it four times.
4. Observe the Topas Monitor output, as shown in Example 6-12.
Example 6-12 Topas Monitor output for migration candidate partition
Topas Monitor for host:lpar1 EVENTS/QUEUES FILE/TTY
Fri May 27 13:06:50 2011 Interval:2 Cswitch 2489 Readch 0
Syscall 224 Writech 231
CPU User% Kern% Wait% Idle% Physc Entc% Reads 0 Rawin 0
Total 46.4 35.9 0.7 17.0 3.67 183.75 Writes 0 Ttyout 231
Forks 0 Igets 0
Network BPS I-Pkts O-Pkts B-In B-Out Execs 0 Namei 4
Total 1.08K 18.48 0.50 850.1 258.7 Runqueue 6.49 Dirblk 0
Waitqueue 0.0
Disk Busy% BPS TPS B-Read B-Writ MEMORY
Total 0.1 0 0 0 0 PAGING Real,MB 9728
Faults 64742 % Comp 87
FileSystem BPS TPS B-Read B-Writ Steals 64716 % Noncomp 0
Total 1.86K 19.98 1.86K 0 PgspIn 0 % Client 0
PgspOut 0
Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE
cmemd 655380 40.8 180K root PageOut 0 Size,MB 4096
nmem64 8388742 13.6 1.95G root Sios 0 % Used 1
webserve 14418146 13.6 4.18M root % Free 99
lrud 262152 13.6 92.0K root AME
db 12451978 13.6 64.2M root TMEM 6.00GWPAR Activ 0
nmem64 12386446 0.0 1.64G root CMEM 2.38GWPAR Total 0
nmem64 14221532 0.0 1.95G root EF[T/A] 1.6/1.6Press: "h"-help
nmem64 7340044 0.0 1.35G root CI:40.9KCO:63.2K "q"-quit
webserve 10420390 0.0 108K root
5. Check the machine serial number, as shown in Example 6-13.
Example 6-13 Checking the machine serial number
# hostname
lpar1_p780
# uname -u
IBM,02109AF6P
6. In the SDMC, select the virtual server to move (in this case, lpar1).
7. Click Action → Operations → Mobility → Migrate.
Environment: In this scenario, we generate workloads in the test partitions. In a real
production environment, these workloads already exist.
8. Complete the migration wizard with the default options.
9. On the Summary window, you see a window similar to Figure 6-48.
Figure 6-48 LPM Summary window
10. Click Finish to perform the LPM operation. The virtual server migration status appears, as
shown in Figure 6-49 on page 242.
Figure 6-49 LPM operation in progress
11. After the migration completes, a window similar to Figure 6-50 opens.
Figure 6-50 Successful LPM operation
12. Check the machine serial number, as shown in Example 6-14.
Example 6-14 New serial number for lpar1 LPAR
# hostname
lpar1_p780
# uname -u
IBM,02109AF7P
At this point, the partition has successfully migrated to the second Power server without
disruption in the services.
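The before-and-after uname -u check from Examples 6-13 and 6-14 can be turned into a simple migration confirmation; the machine IDs below are the ones captured in those examples:

```shell
# Sketch: confirm that the LPM operation landed on a different physical
# server by comparing the machine IDs from `uname -u` (Examples 6-13/6-14).
before="IBM,02109AF6P"
after="IBM,02109AF7P"
if [ "$before" != "$after" ]; then
  echo "partition is now on a different server"
else
  echo "partition did not move"
fi
```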
6.6 LPM and PowerHA
In this section, we perform an LPM with a simple script running, followed by a PowerHA
failover test. “Simple cluster installation” on page 362 shows the cluster configuration.
These items are in the setup:
Nodes: rflpar10 and rflpar20
Resource groups: lpar1svcrg and lpar2svcrg
Application controllers: lpar1appserver and lpar2appserver
Service IP labels: rflpar10_svc and rflpar20_svc
Example 6-15 shows the application server scripts for the PowerHA failover. These are simple
DB2 start and stop scripts, and they must not be used in production environments.
Example 6-15 A simple application server is used in this example to show the failover test
cat /hascripts/startlpar1.sh
echo "0 `hostname` 0" > /home/db2inst1/sqllib/db2nodes.cfg
su - db2inst1 -c db2start
#
cat /hascripts/stoplpar1.sh
echo "0 `hostname` 0" > /home/db2inst1/sqllib/db2nodes.cfg
su - db2inst1 -c db2stop force
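Both scripts in Example 6-15 rewrite db2nodes.cfg before starting or stopping DB2 so that partition 0 always points at the node that currently owns the resource group. The file format is `<partition-number> <hostname> <logical-port>`; a standalone sketch of the same rewrite, writing to /tmp rather than the instance directory:

```shell
# Sketch: regenerate a single-partition db2nodes.cfg for the local node,
# as the PowerHA start/stop scripts do. Written to /tmp for illustration.
node=$(hostname)
echo "0 $node 0" > /tmp/db2nodes.cfg
cat /tmp/db2nodes.cfg
```

This is what makes the scripts node-agnostic: the same script works on either cluster node after a failover.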
We also used a small script to confirm the following components:
The physical server
The LPAR that we are currently using
The IP addresses
The state of the database, which is shown after connecting to the database by using db2
connect to test1 user db2inst1 using password
Example 6-16 shows the script.
Example 6-16 Script to track the state of the LPAR
while true
do
echo ------------------- | tee -a /home/db2inst1/status.log
lsconf | head -2 | tee -a /home/db2inst1/status.log
hostname | tee -a /home/db2inst1/status.log
ifconfig en0 | tee -a /home/db2inst1/status.log
echo ----------- | tee -a /home/db2inst1/status.log
db2 select tabname from syscat.tables fetch first 2 rows only |grep -v "\-\-" |\
tee -a /home/db2inst1/status.log
date | tee -a /home/db2inst1/status.log
who -r
sleep 10
echo "=================================================== \n"
done
Example 6-17 on page 244 shows the results of the script that is shown in Example 6-16.
Look at the following components in Example 6-17:
System model
Serial number
lparname
IP addresses
Date and time
In all instances, the db2 select statement result is the same
Example 6-17 Initial results before the LPM test
System Model: IBM,9117-MMA
Machine Serial Number: 101F170
rflpar10
en0:
flags=1e080863,480
inet 172.16.21.36 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.40 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.60 netmask 0xfffffc00 broadcast 172.16.23.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
-----------
TABNAME
ATTRIBUTES
AUDITPOLICIES
2 record(s) selected.
Fri Jun 3 12:45:50 EDT 2011
. run-level 2 Jun 03 12:39 2 0 S
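A small aside on the ifconfig output above: the netmask 0xfffffc00 is a /22 prefix, which is why all three addresses on en0 share the broadcast 172.16.23.255. The arithmetic, as a sketch:

```shell
# Sketch: derive the broadcast third octet for a /22 network. With netmask
# 0xfffffc00, two host bits remain in the third octet.
octet3=21
net3=$(( octet3 / 4 * 4 ))   # clear the two host bits: 21 -> 20
bcast3=$(( net3 + 3 ))       # set the two host bits: 20 -> 23
echo "broadcast: 172.16.$bcast3.255"
```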
Example 6-18 shows the cluster status.
Example 6-18 Cluster status before LPM
/usr/es/sbin/cluster/clstat
clstat - HACMP Cluster Status Monitor
-------------------------------------
Cluster: pough (1111817142)
Fri Jun 3 12:48:29 2011
State: UP Nodes: 2
SubState: STABLE
Node: rflpar10 State: UP
Interface: rflpar10 (0) Address: 172.16.21.36
State: UP
Interface: rflpar10_svc (0) Address: 172.16.21.60
State: UP
Resource Group: lpar1svcrg State: On line
Node: rflpar20 State: UP
Interface: rflpar20 (0) Address: 172.16.21.35
State: UP
Interface: rflpar20_svc (0) Address: 172.16.21.61
State: UP
************************ f/forward, b/back, r/refresh, q/quit ******************
6.6.1 The LPM operation
The LPM operation is performed from an IBM POWER6 570 to an IBM Power 780, as
explained in 6.3, “Live Partition Mobility (LPM) using the HMC and SDMC” on page 212.
The migration is performed based on the information that is shown in Figure 6-51.
Figure 6-51 Migration setup
After the LPM operation, we observed the following results, as shown in Example 6-19:
The cluster status that is shown with the clstat command does not change.
We did not lose the session on which we ran the while statement.
The model and serial number are the only items that changed.
The IP addresses do not change.
The script continued running.
Example 6-19 Results from the LPM operation
System Model: IBM,9117-MMA
Machine Serial Number: 109AF6P
rflpar10
en0:
flags=1e080863,480
inet 172.16.21.36 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.40 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.60 netmask 0xfffffc00 broadcast 172.16.23.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
-----------
TABNAME
ATTRIBUTES
AUDITPOLICIES
2 record(s) selected.
Fri Jun 3 12:55:01 EDT 2011
. run-level 2 Jun 03 12:54 2 0 S
===================================================
System Model: IBM,9179-MHB
Machine Serial Number: 109AF6P
rflpar10
en0:
flags=1e080863,480
inet 172.16.21.36 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.40 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.60 netmask 0xfffffc00 broadcast 172.16.23.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
-----------
TABNAME
ATTRIBUTES
AUDITPOLICIES
2 record(s) selected.
Fri Jun 3 12:55:28 EDT 2011
. run-level 2 Jun 03 12:54 2 0 S
6.6.2 The PowerHA operation
We continue by running the same scripts that we ran in 6.6, “LPM and PowerHA” on
page 243, and then we fail over using PowerHA. To force a failover, we took one LPAR down.
We observed the following conditions, as shown in Example 6-20:
We had to restart the session and rerun the status script.
All IP addresses that were active on the failing node moved to rflpar20.
Example 6-20 Results of a failover
System Model: IBM,9179-MHB
Machine Serial Number: 109AF6P
rflpar20
en0:
flags=1e080863,480
inet 172.16.21.35 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.41 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.61 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.60 netmask 0xfffffc00 broadcast 172.16.23.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
-----------
TABNAME
ATTRIBUTES
AUDITPOLICIES
2 record(s) selected.
Fri Jun 3 13:10:29 EDT 2011
. run-level 2 Jun 03 11:25 2 0 S
The cluster status changed. One node showed a down status, as shown in Example 6-21.
Example 6-21 Cluster status after the failover test
clstat - PowerHA SystemMirror Cluster Status Monitor
-------------------------------------
Cluster: pough (1111817142)
Fri Jun 3 13:05:28 EDT 2011
State: UP Nodes: 2
SubState: STABLE
Node: rflpar10 State: DOWN
Interface: rflpar10 (0) Address: 172.16.21.36
State: DOWN
Node: rflpar20 State: UP
Interface: rflpar20 (0) Address: 172.16.21.35
State: UP
Interface: rflpar10_svc (0) Address: 172.16.21.60
State: UP
Interface: rflpar20_svc (0) Address: 172.16.21.61
State: UP
Resource Group: lpar1svcrg State: On line
Resource Group: lpar2svcrg State: On line
************************ f/forward, b/back, r/refresh, q/quit ******************
Example 6-20 on page 247 and Example 6-21 show the difference between PowerHA and
LPM.
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 7. POWER7 Enterprise Server
performance considerations
This chapter discusses the performance aspects of the POWER7 Enterprise Servers. We
start by introducing the performance design of our POWER7 servers. We also introduce key
considerations with POWER7 Enterprise Servers, such as the reliability, availability, and
serviceability (RAS) features and virtualization features. We also discuss specific AIX and
IBM i operating system considerations. In addition, we explain enhanced monitoring methods
for POWER7 servers. In the last few sections, we discuss IBM performance management
tools.
In this chapter, we discuss the following topics:
Performance design for POWER7 Enterprise Servers
POWER7 Servers performance considerations
Performance considerations with hardware RAS features
Performance considerations with Power virtualization features
Performance considerations with AIX
IBM i performance considerations
Enhanced performance tools of AIX for POWER7
Performance Management for Power Systems
7.1 Introduction
The IBM Power development team in Austin, TX, has aggressively pursued the integration of
industry-leading mainframe reliability technologies into Power Systems servers. With the
introduction of POWER7, there are successive generations of new RAS features included in
the server line. One core principle that guides the IBM RAS architecture engineering design
team is that systems must be configurable to achieve the required levels of availability without
compromising performance, utilization, or virtualization. Hardware and firmware RAS features
are independent of the operating system and, therefore, do not affect operating system or
application performance. However, the hardware RAS features can provide the key
enablement of availability features built into the AIX, IBM i, and Linux operating systems and
benefits that contribute to the overall system availability.
7.2 Performance design for POWER7 Enterprise Servers
This section contains descriptions of the Power 780 and Power 795 performance features:
Balanced architecture
Processor embedded dynamic random access memory (eDRAM) technology
Processor compatibility mode
MaxCore and TurboCore modes
Active Memory Expansion (AME)
Power management’s effect on system performance
7.2.1 Balanced architecture of POWER7
Multi-core processor technologies face major challenges to continue delivering growing
throughput and performance. These challenges include the constraints of physics, power
consumption, and socket pin count limitations.
To overcome these limitations, a balanced architecture is required. Many processor design
elements need to be balanced on a server in order to deliver maximum throughput.
In many cases, IBM has been innovative in order to achieve the required levels of throughput
and bandwidth. Areas of innovation for the POWER7 processor and POWER7
processor-based systems include (but are not limited to) these areas:
On-chip L3 cache implemented in eDRAM
Cache hierarchy and component innovation
Advances in the memory subsystem
Advances in off-chip signaling
Exploitation of the long-term investment in coherence innovation
For example, POWER6, POWER5, and POWER4 systems derive large benefits from high
bandwidth access to large, off-chip cache. However, socket pin count constraints prevent
scaling the off-chip cache interface to support eight cores, which is a feature of the POWER7
processor.
Figure 7-1 on page 251 illustrates the POWER5 and POWER6 large L2 cache technology.
Figure 7-1 The POWER5 and POWER6 large L2 cache technology
IBM was able to overcome this challenge by introducing an innovative solution: high speed
eDRAM on the processor chip. With POWER7, IBM introduces on-processor, high-speed,
custom eDRAM, combining the dense, low power attributes of eDRAM with the speed and
bandwidth of static random access memory (SRAM).
Figure 7-2 on page 252 illustrates the various memory technologies.
Figure 7-2 The various memory technologies
Another challenge is the need to satisfy two caching requirements with one cache: the
low-latency per-core cache and the large shared cache.
IBM introduced an innovative solution called the hybrid L3 “Fluid” cache structure, which has
these characteristics:
Keeps multiple footprints at ~3X lower latency than local memory
Automatically migrates private footprints (up to 4M) to the fast local region (per core) at
~5X lower latency than full L3 cache
Automatically clones shared data to multiple private regions
Figure 7-3 illustrates the Hybrid L3 Fluid Cache Structure.
Figure 7-3 Hybrid L3 Fluid Cache
7.2.2 Processor eDRAM technology
In many cases, IBM has been innovative in order to achieve the required levels of throughput
and bandwidth. Areas of innovation for the POWER7 processor and POWER7
processor-based systems include (but are not limited to) these areas:
On-chip L3 cache implemented in embedded dynamic random access memory (eDRAM)
Cache hierarchy and component innovation
Advances in the memory subsystem
Advances in off-chip signalling
Exploitation of long-term investment in coherence innovation
The innovation of using eDRAM on the POWER7 processor chip is significant for several
reasons:
Latency improvement: A six-to-one latency improvement occurs by moving the L3 cache
on-chip compared to L3 accesses on an external (on-ceramic) ASIC.
Bandwidth improvement: A 2x bandwidth improvement occurs with on-chip interconnect.
Frequency and bus sizes are increased to and from each core.
No off-chip drivers or receivers.
Removing drivers or receivers from the L3 access path lowers interface requirements,
conserves energy, and lowers latency.
Small physical footprint: The eDRAM L3 cache requires far less physical space than an
equivalent L3 cache that is implemented with conventional SRAM. IBM on-chip eDRAM
uses only a third of the components that are used in conventional SRAM, which has a
minimum of six transistors to implement a 1-bit memory cell.
Low energy consumption: The on-chip eDRAM uses only 20% of the standby power of
SRAM.
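The footprint and power claims above reduce to simple arithmetic. The following Python sketch is illustrative only (the cell counts ignore tags, ECC, and peripheral logic) and compares component counts for a 32 MB cache built from 6-transistor SRAM cells against eDRAM at roughly a third of the components, as the text states:

```python
# Illustrative arithmetic only: rough component-count comparison between a
# conventional 6-transistor SRAM cell and on-chip eDRAM, based on the figures
# quoted in this section (eDRAM uses about a third of the components of SRAM
# and about 20% of its standby power). Real cell designs vary.

CACHE_MB = 32                        # POWER7 on-chip L3 cache size
bits = CACHE_MB * 1024 * 1024 * 8    # data bits, ignoring ECC and tag overhead

sram_components = bits * 6           # at least 6 transistors per SRAM bit cell
edram_components = sram_components // 3   # "only a third of the components"

print(f"SRAM components : {sram_components:,}")
print(f"eDRAM components: {edram_components:,}")
print(f"eDRAM standby power vs SRAM: {0.20:.0%}")
```

This scale of savings is what makes a 32 MB L3 cache practical on the processor die.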
7.2.3 Processor compatibility mode
POWER7 supports partition mobility with POWER6 and POWER6+ systems by providing
compatibility modes. Partitions running in POWER6 or POWER6+ compatibility mode can run
in single thread (ST) or simultaneous multi-thread (SMT2). SMT4 and single-instruction,
multiple-data (SIMD) double-precision floating-point (VSX) are not available in compatibility
mode.
Applications that are single process and single threaded might benefit from running in ST
mode. Multithreaded and multi-process applications typically benefit more running in SMT2 or
SMT4 mode. ST mode can be beneficial in the case of a multi-process application where the
number of application processes is smaller than the number of cores assigned to the
partition.
Applications that do not scale with a larger number of CPUs might also benefit from running in
ST or SMT2 mode instead of SMT4, because the lower number of SMT threads means a
lower number of logical CPUs.
You can set ST, SMT2, or SMT4 mode through the smtctl command. The default mode is
SMT4. For more information about performance considerations for processor compatibility
mode, refer to 7.3.1, “Processor compatibility mode” on page 256.
7.2.4 MaxCore and TurboCore modes
TurboCore provides a higher-frequency core and more cache per core, both of which
normally benefit performance. However, to use these attributes, the system’s active core
placement changes from eight cores per chip to four cores per chip. Using TurboCore can
therefore mean a change in the number of chips, and perhaps drawers, that a partition
spans, which in turn increases the probability of longer access latencies to memory and
cache. Partition placement and workload type influence these probabilities. As a result, the
benefits of TurboCore’s higher frequency and larger cache can be offset to varying extents
by these longer-latency storage accesses.
In cases in which the cross-chip accesses are relatively limited, all of TurboCore’s benefits
can remain. For information about MaxCore performance considerations, refer to 7.3.2,
“TurboCore and MaxCore modes” on page 261.
7.2.5 Active Memory Expansion
Active Memory Expansion (AME) is an innovative POWER7 technology that uses
compression and decompression to effectively expand the true physical memory that is
available for client workloads. Often a small amount of processor resource provides a
significant increase in the effective memory maximum.
Actual expansion results depend on how compressible the data that is used by the
application is. For example, an SAP ERP sample workload showed up to 100% expansion.
An estimator tool and a free trial are available.
AME differs from Active Memory Sharing (AMS). AMS moves memory from one partition to
another partition; it is the best fit when one partition is idle while another partition is busy, and
it is supported on AIX, IBM i, and Linux partitions.
A number of commands are available to monitor the AME configuration of a partition. These
commands include amepat, topas, vmstat, and lparstat.
The amepat command provides a summary of the AME configuration and can be used for
monitoring and fine-tuning the configuration. The amepat command shows the current
configuration, as well as the statistics of the system resource utilization over the monitoring
period. For more information about the AME performance monitor, refer to 7.8.7, “Monitoring
Active Memory Expansion (AME) statistics” on page 319.
7.2.6 Power management’s effect on system performance
All power management modes can affect certain aspects of performance, depending on the
system configuration and how performance is measured. Consider these issues before
turning on any power management mode or feature:
Systems running at low utilization (and consequently, low frequency) might maintain
processor throughput. However, response time to a particular task might be affected. Also,
the reaction time to an incoming workload can be affected.
Any system setup that limits the amount of processing allowed, such as running with
capped partitions, can cause the frequency to be reduced. Even though a partition might
be running at 100% of its entitled capacity, the system as a whole might not be heavily
utilized.
Using virtual shared processor pools also can limit the overall system utilization and cause
lower processor frequencies to be set.
Certain external workload managers also have the effect of limiting system processing by
adjusting workloads to a point where frequency is lowered.
Because the processor frequency is variable, performance monitoring tools can be
affected.
As shown in Figure 7-4 and Figure 7-5 on page 256 from a representative POWER7 system,
enabling the various power savings modes can directly affect power consumption as a
workload varies. For example, Dynamic Power Saver mode can deliver higher workload
throughput than either of the other modes at the expense of system power consumption. At
less than peak utilization, Dynamic Power Saver mode delivers power savings, and it might
still deliver adequate workload throughput. It is important to note that trade-offs must be made
between energy consumption, workload response times, and throughput. For additional
details about these issues, refer to a companion white paper, EnergyScale Performance
Characteristics for IBM Power Systems.
Figure 7-4 System energy consumption trends: System load level and average processor frequency
Figure 7-5 Nominal and static power save (SPS) modes
If you want to know about power management features and differences in Dynamic Power
Saver from POWER6 to POWER7, refer to 2.5, “Power management” on page 36.
7.3 POWER7 Servers performance considerations
In this section, we introduce performance considerations with the POWER7 server hardware
features. This section consists of the following topics:
Processor compatibility mode
TurboCore and MaxCore modes
Active Memory Expansion
Logical Memory Block size
System huge-page memory
7.3.1 Processor compatibility mode
Processor compatibility modes enable you to move logical partitions (LPARs) between
servers that have separate processor types without upgrading the operating environments
that are installed in the LPARs. In certain cases, the options with this feature result in varying
performance.
Regarding POWER7 servers, you have four options for choosing processor compatibility
mode:
POWER6
This execution mode is compatible with Version 2.05 of the Power Instruction Set
Architecture (ISA).¹
POWER6+
This mode is similar to POWER6 with eight additional storage protection keys.
POWER7
The POWER7 mode is the native mode for POWER7 processors, implementing Version 2.06
of the Power Instruction Set Architecture.²
Default
The Power hypervisor determines the current mode for the LPAR.
Each LPAR running on a POWER7-based system can run in one of these modes. Also,
LPARs on the same system can run in separate modes. This mode setting is controlled by the
partition definition when the LPAR is created.
For more information about processor compatibility mode, refer to the following link:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hc3/ip
hc3pcmdefs.htm
Comparisons among the processor compatibility modes
Table 7-1 shows each processor compatibility mode and the servers on which the LPARs that
use each processor compatibility mode can successfully operate.

Table 7-1 The definition and supported servers for each processor compatibility mode

POWER6 mode: Allows you to run operating-system versions that use all the standard
features of the POWER6 processor. Supported servers: POWER6, POWER6+, and
POWER7 processor-based servers.

POWER6+ mode: Allows you to run operating-system versions that use all the standard
features of the POWER6+ processor. Supported servers: POWER6+ and POWER7
processor-based servers.

POWER7 mode: Allows you to run operating-system versions that use all the standard
features of the POWER7 processor. Supported servers: POWER7 processor-based servers.

¹ For detailed information about Power Instruction Set V2.05, refer to the following website:
http://www.power.org/resources/reading/PowerISA_V2.05.pdf
² For detailed information about Power Instruction Set V2.06, refer to the following website:
http://www.power.org/resources/downloads/PowerISA_V2.06_PUBLIC.pdf
In addition to the description and supported server differences between POWER6/6+ and
POWER7 modes, the operating system requirement for the various modes also differs. For
detailed information about the minimal operating system requirements, refer to the IBM
Offering information website:
http://www-01.ibm.com/common/ssi/index.wss
In addition, many functional differences exist between POWER6/6+ mode and POWER7
mode. Table 7-2 lists the differences with regard to performance and RAS.
Table 7-2 Functional differences between POWER6/POWER6+ and POWER7 modes

SMT modes. POWER6 and POWER6+ modes offer dual-threaded (SMT2) and
single-threaded (ST) operation; POWER7 mode adds quad-threaded (SMT4). Refer to
“Simultaneous Multithreading Mode”.

Vector instructions. POWER6 and POWER6+ modes provide the Vector Multimedia
Extension (VMX), also called AltiVec; POWER7 mode provides VMX or AltiVec plus the
Vector Scalar Extension (VSX). Refer to “Single Instruction Multiple Data” on page 259.

Scaling. POWER6 and POWER6+ modes provide 64-core/128-thread scaling; POWER7
mode provides 64-core/256-thread scaling and 256-core/1024-thread scaling (AIX 7 only).
Refer to “Large scale-up capability” on page 259.

Storage protection keys. POWER6 and POWER6+ modes provide 8 and 16 storage
protection keys, respectively; POWER7 mode provides 32. Refer to “Storage protection
keys” on page 260.

Table 7-1 (continued). Default mode: The default preferred mode enables the Power
hypervisor to determine the current mode for the LPAR. When the preferred mode is set to
default, the Power hypervisor sets the current mode to the most fully featured mode that is
supported by the operating environment; in most cases, this is the processor type of the
server on which the LPAR is activated. The supported servers depend on the current
processor compatibility mode of the LPAR. For example, if the Power hypervisor determines
that the current mode is POWER7, the LPAR can run on POWER7 processor-based servers.

Simultaneous Multithreading Mode
Simultaneous multithreading (SMT) technology can help increase processor utilization by
mitigating cache misses and instruction dependency delays.³ Enabling SMT2 or SMT4 mode
provides concurrent execution of the instruction stream by multiple threads on the same
core. Table 7-3 on page 259 lists the various SMT modes that are supported by the Power
servers.

³ For a description of cache misses and instruction dependency delay issues, refer to the IBM
white paper: http://www-03.ibm.com/systems/resources/pwrsysperf_SMT4OnP7.pdf
Table 7-3 SMT modes supported with different Power servers
From Table 7-3, you can see that POWER7 servers now support SMT4 mode.⁴ In general,
because four threads run concurrently on one core, SMT4 can provide higher total
performance and throughput than the other SMT modes.
Multithreaded and multiple process applications typically benefit more by running in SMT2 or
SMT4 mode. In AIX, ST, SMT2, or SMT4 mode can be set through the smtctl command
dynamically. For detailed information about SMT tuning on AIX, refer to 7.6.6, “Simultaneous
multithreading (SMT)” on page 296.
Linux for Power also has a similar command (ppc64_cpu) to control SMT modes. For detailed
information about this command, refer to the following website (you need to register first):
http://www.ibm.com/developerworks/wikis/display/LinuxP/Performance%20FAQs
Single Instruction Multiple Data
Single Instruction Multiple Data (SIMD), also called vector, instructions provide a concise and
efficient way to express data-level parallelism (DLP).⁵ With SIMD instructions, fewer
instructions are required to perform the same data computation, resulting in lower fetch,
decode, and dispatch bandwidth and, consequently, higher power efficiency.
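As a conceptual illustration of DLP (plain Python, not actual VSX intrinsics or POWER assembly), the sketch below applies one add "instruction" across four lanes at a time, so an 8-element vector add takes two wide operations instead of eight scalar ones:

```python
# Conceptual sketch of data-level parallelism (DLP): one operation applied to
# several data values at once. This mimics a 4-wide SIMD add in plain Python;
# real VSX code uses compiler auto-vectorization or intrinsics, not this.

def simd_add(a, b, width=4):
    """Add two equal-length vectors 'width' lanes at a time."""
    assert len(a) == len(b) and len(a) % width == 0
    out = []
    for i in range(0, len(a), width):
        # One "instruction" covers 'width' lanes at once.
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(simd_add(a, b))   # one loop iteration per 4 elements, not per element
```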
The POWER7 processor adds another SIMD instruction called Vector Scalar Extension
(VSX), which is based on the Vector Media Extensions (VMX) instruction. This technology
helps to improve POWER7’s performance, especially in High Performance Computing (HPC)
projects.
For more information about SIMD, refer to the IBM research paper, “IBM Power Architecture,”
at the following website:
http://domino.research.ibm.com/library/cyberdig.nsf/papers/8DF8C243E7B01D948525787
300574C77/$File/rc25146.pdf
Large scale-up capability
The IBM Power 795 can provide 256 cores in one LPAR.⁶ It provides high performance and
scalability for a large scale-up single system image, from which many workloads benefit (for
example, online transaction processing (OLTP) and ERP scale-up).
Power server, supported SMT mode, and number of logical CPUs per core:

POWER5: ST mode, 1 logical CPU per core; SMT2 mode, 2 logical CPUs per core.
POWER6/6+: ST mode, 1 logical CPU per core; SMT2 mode, 2 logical CPUs per core.
POWER7: ST mode, 1 logical CPU per core; SMT2 mode, 2 logical CPUs per core; SMT4
mode, 4 logical CPUs per core.

⁴ Regarding the AIX operating system, AIX 5.3 supports neither SMT4 nor POWER7 mode.
⁵ Data-level parallelism (DLP) consists of simultaneously performing the same type of
operation on separate data values, using multiple functional units, with a single instruction.
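The mapping in Table 7-3 from cores and SMT mode to logical CPUs is worth making explicit, because capacity planning often reasons in logical CPUs. This small Python helper (an illustration, not an AIX interface) encodes the table:

```python
# Logical CPU count per Table 7-3: each core presents one logical CPU per SMT
# thread. The mode names and thread counts come from the table; this helper
# itself is only an illustration, not an AIX API.

SMT_THREADS = {"ST": 1, "SMT2": 2, "SMT4": 4}

def logical_cpus(cores: int, mode: str) -> int:
    """Number of logical CPUs the OS sees for a partition."""
    return cores * SMT_THREADS[mode]

# A 16-core POWER7 partition as seen by the operating system in each mode:
for mode in ("ST", "SMT2", "SMT4"):
    print(mode, logical_cpus(16, mode))
```

On AIX, the actual mode switch is performed dynamically with the smtctl command described in this section.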
Storage protection keys
Power storage protection keys provide hardware-enforced access mechanisms for memory
regions. Only programs that use the correct key are allowed to read or write to protected
memory locations. The POWER7 mode provides 16 more storage protection keys than the
POWER6+ mode.⁷
For more information about POWER7 storage protection keys, refer to the IBM white paper,
POWER7 System RAS, at the following website:
ftp://public.dhe.ibm.com/common/ssi/ecm/en/pow03056usen/POW03056USEN.PDF
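Conceptually, a storage protection key gates access to a memory region: only code that holds the matching key may read or write it. The toy Python model below uses hypothetical class names purely to illustrate the access rule; the real mechanism is hardware-enforced and exposed through AIX system services, not objects:

```python
# Toy model of hardware storage protection keys: each memory region is tagged
# with a key, and an access succeeds only when the accessing context holds a
# matching key. Purely conceptual; real protection keys are enforced by the
# processor, not by application-level checks like this.

class ProtectedRegion:
    def __init__(self, key: int, data: bytes):
        self.key = key
        self._data = data

    def read(self, caller_key: int) -> bytes:
        if caller_key != self.key:
            # The hardware equivalent is a storage-key protection exception.
            raise PermissionError("storage key mismatch")
        return self._data

region = ProtectedRegion(key=5, data=b"payroll")
print(region.read(caller_key=5))        # matching key: access allowed
try:
    region.read(caller_key=9)           # wrong key: access blocked
except PermissionError as e:
    print("blocked:", e)
```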
Processor compatibility mode performance considerations
As Table 7-2 on page 258 shows, POWER7 mode supports SMT4 mode and also enables
other features that improve performance. Unless a requirement⁸ forces you to choose
POWER6/6+ mode, we suggest choosing POWER7 mode. POWER7 mode is flexible, and
you can choose an appropriate SMT mode to get the best performance.
Applications that are single process and single threaded might benefit from running in ST
mode. ST mode can be beneficial in the case of a multi-process application where the
number of application processes is smaller than the number of cores assigned to the
partition. Applications that do not scale with a larger number of CPUs might also benefit from
running in SMT2 or ST mode instead of SMT4, because the lower number of SMT threads
means a lower number of logical CPUs. For detailed information about SMT, refer to 7.6.6,
“Simultaneous multithreading (SMT)” on page 296.
Configuration for processor compatibility mode
Processor compatibility mode is one feature of an LPAR that we can configure by editing the
LPAR’s profile. Figure 7-6 on page 261 shows the configuration window from the HMC.
⁶ At the time of writing, if you want to configure more than 128 cores in one LPAR with the
FC4700 processor, you need to purchase software key FC1256 and install it in the server.
The name of this code is “AIX Enablement for 256-cores LPAR”.
⁷ In POWER6+ and POWER7, the hypervisor reserves one storage protection key. So, the
maximum number of storage protection keys available to the OS is 15 (for POWER6+) and
31 (for POWER7).
⁸ If your AIX version is AIX 5.3, you cannot choose POWER7 mode. Also, if the LPAR is in a
Live Partition Mobility (LPM) environment and POWER7 mode does not support the partition
mobility operation, you cannot choose POWER7 mode. For detailed information, refer to the
IBM information center link:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hc3/iphc3pcm.htm
Figure 7-6 Configuring the processor compatibility mode
7.3.2 TurboCore and MaxCore modes
POWER7 high-end servers (780 and 795) offer two processor running modes:
MaxCore (default)
The MaxCore mode allows for all processor cores in the system to be activated.
TurboCore
The TurboCore mode allows for half of the processor cores in the system to be activated,
but the cores run at a higher speed and have access to the entire L3 cache on the chip.
The TurboCore mode allows the processor cores to execute at a higher frequency (about
7.25% higher) and to have more processor cache per core. In general, higher frequency and
more cache often provide better performance. Refer to Figure 7-7 on page 262.
Figure 7-7 TurboCore design
Table 7-4 shows the processor differences between the MaxCore and TurboCore modes.

Table 7-4 Processor differences between MaxCore and TurboCore

MaxCore: Power 780 processor feature 4982 at 3.86 GHz; Power 795 processor feature
4700 at 4.0 GHz.
TurboCore: Power 780 processor feature 4982 at 4.14 GHz; Power 795 processor feature
4700 at 4.25 GHz.

The TurboCore design (Figure 7-7) has these characteristics:
1. TurboCore chips have four available cores.
2. The L3 caches of the unused cores are aggregated.
3. TurboCore chips have 2x the L3 cache per chip available.
4. Four TurboCore chips provide L3 = 32 MB.
5. The chips run at a higher frequency.
6. Through configuration with the ASMI, the system can be reconfigured to eight-core mode.

Considerations with TurboCore mode
TurboCore provides a higher frequency core and more cache per core, which are normally
good for performance. To make use of these positive attributes, the system’s active core
placement needs to change from having eight cores per chip to having four cores per chip.
So, using TurboCore can also mean a change in the number of chips and perhaps in the
number of drawers.

Here is an example that compares the performance between MaxCore and TurboCore. After
switching to TurboCore, the performance increases by about 20%. Refer to Figure 7-8 on
page 263.
Figure 7-8 Comparing the performance difference between MaxCore and TurboCore
Enabling TurboCore in one server, for a given partition, means an increase in the probability
of longer access latencies to memory and cache. But, partition placement and workload type
can influence these probabilities. As a result, the positive benefits of TurboCore’s higher
frequency and increased cache also have the potential of being offset to various extents by
these longer latency storage accesses.
Configuration of TurboCore
For information about enabling and disabling TurboCore mode, refer to 2.3, “TurboCore and
MaxCore technology” on page 28.
Case study
There are case studies that relate to POWER7 TurboCore performance. See the IBM white
paper Performance Implications of POWER7 Model 780’s TurboCore Mode, which is
available at the following site:
http://www-03.ibm.com/systems/resources/systems_i_pwrsysperf_turbocore.pdf
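The quoted frequency uplift can be cross-checked against the Table 7-4 clock rates. A quick Python calculation (illustrative arithmetic only) reproduces the roughly 7.25% figure for the Power 780 and shows about 6.25% for the Power 795:

```python
# Cross-check of the "about 7.25% higher" TurboCore frequency claim using the
# clock rates from Table 7-4 (illustrative arithmetic only).

freqs_ghz = {
    "Power 780": {"MaxCore": 3.86, "TurboCore": 4.14},
    "Power 795": {"MaxCore": 4.00, "TurboCore": 4.25},
}

for server, f in freqs_ghz.items():
    gain = f["TurboCore"] / f["MaxCore"] - 1
    print(f"{server}: {gain:.2%} higher frequency in TurboCore mode")
```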
7.3.3 Active Memory Expansion (AME)
Active Memory Expansion is an innovative POWER7 technology that allows the effective
maximum memory capacity to be up to 100% larger than the true physical memory maximum
for AIX 6.1 and later partitions.
AME relies on the compression of in-memory data to increase the amount of data that can be
placed into memory and thus expand the effective memory capacity of a POWER7 system.
The in-memory data compression is managed by the system, and this compression is
transparent to applications and users. Figure 7-9 on page 264 shows the memory structure
change after applying AME.
Figure 7-9 Memory structure change after applying AME
The AME feature can bring these benefits to clients:
AME increases the system’s effective memory capacity.
AME enables a system to process more work by increasing the system’s effective memory
capacity.
Because AME relies on memory compression, additional CPU utilization is consumed when
AME is in use. The amount of additional CPU utilization needed for AME varies based on the
workload and the level of memory expansion being used.
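The sizing relationship itself is simple: the effective (expanded) memory equals the true physical memory multiplied by the configured expansion factor, with up to 100% expansion corresponding to a factor of 2.0. A minimal Python sketch of this arithmetic (illustrative only; use the amepat tool on AIX for real sizing advice):

```python
# Illustrative AME sizing arithmetic: the expanded (effective) memory that a
# partition sees is its true physical memory times the configured expansion
# factor. Up to 100% expansion corresponds to a factor of 2.0. This is only
# the arithmetic; the amepat command gives real sizing recommendations.

def expanded_memory_gb(true_gb: float, expansion_factor: float) -> float:
    return true_gb * expansion_factor

true_gb = 64
for factor in (1.0, 1.25, 1.5, 2.0):
    print(f"factor {factor}: {expanded_memory_gb(true_gb, factor):.0f} GB effective")
```

Remember that a higher factor also means more compression and decompression activity, so the CPU overhead described above grows with it.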
For more information about AME, see the IBM white paper, Active Memory Expansion:
Overview and Usage Guide, which is available at the following website:
ftp://public.dhe.ibm.com/common/ssi/en/pow03037usen/POW03037USEN.PDF
Active Memory Expansion considerations
Application performance in an AME environment depends on multiple factors, such as the
memory expansion factor, application response time sensitivity, and how compressible the
data is.
Figure 7-10 on page 265 illustrates the general relationship between application response
time, application throughput, CPU utilization, and the percentage of memory expansion. The
CPU utilization increases with a larger percentage memory expansion due to more
compression and decompression activity, which impacts the application response time. An
increase in application response time often results in less application throughput.
Figure 7-10 General performance factors relationship with AME
In AIX (from 6.1.0.4 SP2 or 7.1), the amepat command is useful for sizing if you want to apply
AME technology. This command reports AME information and statistics, as well as provides
an advisory report that assists in planning the use of AME for existing workloads. See 7.8.7,
“Monitoring Active Memory Expansion (AME) statistics” on page 319 to get more information
about the command.
Various kinds of applications behave differently after you enable the AME function. For
example, an SAP ABAP application can save more memory after you turn on AME, but an
SAP Java application might gain less benefit from AME. Figure 7-11 on page 266 shows the
difference between them.
Figure 7-11 Potential real memory saving by AME for SAP
For more information about the details of Figure 7-11, refer to the following website:
https://www-927.ibm.com/servers/eserver/storageplaza/bert.nsf/files/2010CSIIelecti
ve-presentations/$File/E08%20Improving%20SAP%20flexibility%20with%20POWER.pdf
For more performance considerations about AME, refer to the IBM white paper, Active
Memory Expansion Performance, which is available at the following website:
ftp://ftp.software.ibm.com/common/ssi/sa/wh/n/pow03038usen/POW03038USEN.PDF
Configuring Active Memory Expansion
Regarding planning for AME and configuring AME, refer to “Active Memory Expansion (AME)
configuration” on page 235.
Case study
For detailed testing and measuring information about how to apply the AME function with an SAP ABAP application, refer to Chapter 3 of the IBM white paper, Active Memory Expansion Performance. This paper describes performance measurements of an ERP workload, covering both single-partition and whole-server throughput, and is available at the following website:
ftp://ftp.software.ibm.com/common/ssi/sa/wh/n/pow03038usen/POW03038USEN.PDF
7.3.4 Logical memory block size
Processors use memory to temporarily hold information. Memory requirements for LPARs
depend on the LPAR configuration, assigned I/O resources, and applications used.
[Figure 7-11 residue: the chart plots potential memory saving (GB, 0-500) against real memory size (GB, 74-814) for three cases, which bound the range of savings: ABAP applications, which compress well (about 50% saving); real-life SAP mixes of 70% ABAP and 30% Java loads; and Java applications, which compress less (about 10% saving).]
Logical memory block (LMB) size can be assigned in increments of 16 MB, 32 MB, 64 MB,
128 MB, and 256 MB. The default memory block size varies according to the amount of
configurable memory in the system.
Table 7-5 shows the default logical memory block size in various systems.
Table 7-5 Default memory block size used for varying amounts of configurable memory
For more information about the logical memory block size, refer to the following website:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hat/iphatlparmemory.htm
Considerations with logical memory block size
To select a reasonable logical block size for your system, consider both the desired
performance and the physical memory size. Use the following guidelines when selecting
logical block sizes:
On systems with a small amount of memory installed (2 GB or less), a large logical
memory block size results in the firmware consuming an excessive amount of memory.
Firmware must consume at least 1 logical memory block. As a general rule, select the
logical memory block size to be no greater than 1/8th the size of the system’s physical
memory.
On systems with a large amount of installed memory, small logical memory block sizes
result in a large number of logical memory blocks. Because each logical memory block
must be managed during the system boot, a large number of logical memory blocks can
cause boot performance problems.
Ensure that the logical memory block (LMB) size is the same on the source and
destination systems during Live Partition Mobility (LPM).
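The 1/8th guideline above can be expressed as a one-line check (the function name is ours, for illustration):

```shell
# Returns "yes" if the chosen LMB size is no greater than 1/8th of the
# system's physical memory, per the guideline above.
lmb_ok() {
    # $1 = LMB size in MB, $2 = physical memory in MB
    if [ $(( $1 * 8 )) -le "$2" ]; then echo yes; else echo no; fi
}
lmb_ok 256 2048   # 256 MB on a 2 GB system is exactly 1/8th: yes
lmb_ok 256 1024   # 256 MB on a 1 GB system is 1/4th: no
```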
Configuring logical memory block size
The memory block size can be changed by using the Integrated Virtualization Manager (IVM),
the Systems Director Management Console (SDMC) command-line interface, or the Logical
Memory Block Size option in the Advanced System Management Interface (ASMI).
In this section, we introduce how to change the logical memory block size via the ASMI.
To perform this operation, you must have one of the following authority levels:
Administrator
Authorized service provider
Amount of configurable memory     Default logical memory block size
Less than 4 GB                    16 MB
Greater than 4 GB, up to 8 GB     32 MB
Greater than 8 GB, up to 16 GB    64 MB
Greater than 16 GB, up to 32 GB   128 MB
Greater than 32 GB                256 MB
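As a sketch, the Table 7-5 defaults can be written as a simple mapping. Note one assumption: the table covers "less than 4 GB" and "greater than 4 GB", so a system with exactly 4 GB is placed here in the 32 MB row.

```shell
# Default logical memory block size from the amount of configurable
# memory (in GB), mirroring Table 7-5. Treating exactly 4 GB as the
# 32 MB row is our assumption; the table leaves that boundary open.
default_lmb() {
    mem_gb=$1
    if   [ "$mem_gb" -lt 4 ];  then echo "16 MB"
    elif [ "$mem_gb" -le 8 ];  then echo "32 MB"
    elif [ "$mem_gb" -le 16 ]; then echo "64 MB"
    elif [ "$mem_gb" -le 32 ]; then echo "128 MB"
    else                            echo "256 MB"
    fi
}
default_lmb 24   # prints "128 MB"
```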
System restart: The logical memory block size can be changed at run time, but the change does not take effect until the system is restarted.
To configure the logical memory block size, perform the following steps:
1. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
2. In the navigation area, expand Performance Setup.
3. Select Logical Memory Block Size.
4. In the right pane, select the logical memory block size and click Save Settings, as shown
in Figure 7-12.
Figure 7-12 Configuring the Logical Memory Block size
7.3.5 System huge-page memory
IBM POWER6 servers or later can support 4 KB, 64 KB, 16 MB, and 16 GB page sizes. In this
topic, we introduce the 16 GB page size, which is called the huge-page memory size.
Using a larger virtual memory page size, such as 16 GB, for an application’s memory can
significantly improve the application’s performance and throughput due to the hardware
efficiencies that are associated with larger page sizes.
To use huge-page memory, a huge-page memory pool needs to be created when the
managed system is in the powered-off state. After a managed system has been configured
with a 16 GB huge-page pool, a system administrator can assign 16 GB huge pages to
partitions by changing a partition’s profile.
Before specifying the value for huge-page memory, you must determine which applications
might benefit from this feature.
Remember: You must shut down and restart your managed system for the change to take
effect.
For detailed information about how to determine huge-page memory requirements, and
considerations for calculating the huge-page values, refer to the following website:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/ipha1_p5/calculatinghugepgmemory.htm
Huge-page memory considerations
Consider these factors when using the huge-page memory feature:
Huge-page memory is intended to be used only in high-performance environments,
because it can improve performance in specific environments that require a high degree of
parallelism, for example, DB2 databases. You can specify the huge-page memory that can
be used for the shared-memory buffer pools in DB2.
The huge-page memory allocation cannot be changed dynamically.
At the time of writing this book, huge-page memory was not supported when suspending
an LPAR. Also, huge-page memory was not supported with active LPM. Huge-page
memory is supported in inactive partition mobility solutions.
After setting huge-page memory for your server and LPARs, you can monitor it from the
HMC, SDMC, or the operating system.
Configuring the system huge-page memory pool
Follow this example of using the ASMI to configure a system huge-page memory pool.
To set up your system with larger memory pages, perform the following steps:
1. On the ASMI Welcome pane, specify your user ID and password, and click Log In.
2. In the navigation area, expand Performance Setup.
3. Select System Memory Page Setup.
4. In the right pane, select the settings that you want.
5. Click Save Settings and power on the server. Refer to Figure 7-13.
Figure 7-13 Configuration of system memory huge-page setup
Configuring huge-page memory for LPARs
To configure huge-page memory values for an LPAR, you can use the HMC, SDMC, and IVM.
This section introduces the configuration method via the HMC:
1. In the navigation pane, expand Systems Management → Servers.
2. Select the server that hosts the LPAR that you want to configure.
3. In the work pane, select the LPAR for which you want to set huge-page memory values.
4. Select Configuration → Manage Profiles. The Managed Profiles window opens.
5. Select the profile that you want to configure.
6. Select Actions → Edit. The Logical Partition Profile Properties window opens.
7. Click the Memory tab.
8. Assign the huge-page memory for this partition profile, and click OK.
Figure 7-14 shows the configuration window of the huge page size for an LPAR.
Figure 7-14 Configuring the huge page memory for an LPAR
Monitoring for huge-page memory from the HMC
To monitor huge-page memory from the HMC, select Properties for the managed server.
Click the Advanced tab. Look at the current server’s huge-page memory state, as shown in
Figure 7-15.
Figure 7-15 Monitoring a server’s huge-page memory state from the HMC
To monitor the state of an LPAR’s huge-page memory from the HMC, select Properties for the logical partition, click the Hardware tab, and then click the Memory tab. Refer to Figure 7-16.
Figure 7-16 Monitoring an LPAR’s huge-page memory state from the HMC
Monitoring huge-page memory from AIX
We can see the current state of an LPAR’s huge-page memory by executing the svmon command in the AIX environment, as shown in Example 7-1.
Example 7-1 Monitoring the huge-page memory from AIX
p29n01:/ # svmon -G
size inuse free pin virtual mmode
memory 22020096 21468793 551303 21334339 445183 Ded
pg space 1048576 3474
work pers clnt other
pin 259427 0 0 103392
in use 445183 0 52090
PageSize PoolSize inuse pgsp pin virtual
s 4 KB - 308073 3474 212275 255983
m 64 KB - 11825 0 9409 11825
S 16 GB 5 0 0 5 0
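The 16 GB huge-page pool figures in Example 7-1 can be picked out of the svmon output with a small filter. This is a sketch over the sample text above, not a general svmon parser:

```shell
# Extract the 16 GB huge-page line from sample svmon -G output,
# reusing the figures from Example 7-1.
svmon_out='PageSize PoolSize inuse pgsp pin virtual
s 4 KB - 308073 3474 212275 255983
m 64 KB - 11825 0 9409 11825
S 16 GB 5 0 0 5 0'
printf '%s\n' "$svmon_out" | awk '$2 == 16 && $3 == "GB" { print "pool size:", $4, "in use:", $5 }'
# prints: pool size: 5 in use: 0
```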
Case study
You can read about one test case, which showed performance improvement after using
huge-page memory, at the following website:
http://www.ibm.com/developerworks/data/library/techarticle/dm-0606kamath/index.html
7.4 Performance considerations with hardware RAS features
In February 2010, IBM announced the first models in a new generation of Power servers
based on the POWER7 microprocessor. POWER7 servers provide an extensive set of
hardware features related to reliability, availability, and serviceability (RAS). At the same time,
POWER7 servers deliver industry-leading performance through their architectural design,
including modularity, timing closure, and efficiency.
The POWER7 RAS architecture is intended to work with the hardware, independently of any
operating system. By using a dedicated service processor, there is usually no reason to turn
off the features or tune them for performance.
Most recoverable errors are handled during run time by the dedicated service processor. This
processor, independently of any system processor, has dedicated access to detailed error
information from various processor and memory components that can be accessed and
assessed during run time without affecting the performance of the system.
Typically, transient recoverable errors can be handled quickly within the hardware and can
have no effect on performance.
Frequently occurring, but recoverable faults can be eliminated by using the built-in
redundancy capabilities of a system. For example, customized dual inline memory modules
(DIMMs) that are used in high-end systems have a spare memory module for each rank,
which can be substituted for a faulty memory module.
If the handling of a recoverable fault causes a measurable effect on performance, it is the
system design goal to report the fault through the error-reporting structure and, as needed,
request repair. Until the repair is completed, performance might be affected. For example, a
processor core has been determined to be unable to continue processing instructions. The
RAS feature that is known as Alternate Processor Recovery can seamlessly migrate the
workload that is being run on that core to another processor core. If the processor core in
question was unlicensed in the system at the time (held in reserve for a later capacity upgrade), the operation does not affect the current system performance. Otherwise, the overall
performance of the system is affected by the temporary deallocation of one core.
However, the RAS feature, Active Memory Mirroring for the hypervisor, might have an effect
on performance even in a system that runs well.
7.4.1 Active Memory Mirroring for the hypervisor
Active Memory Mirroring for the hypervisor is a new RAS feature being introduced on the Power 795 (see footnote 9) that is designed to eliminate the potential for a complete system outage as a
result of an uncorrectable error in memory. Active Memory Mirroring requires that in each
node of a Power 795 system at least one processor module must be fully configured with
eight DIMMs. When Active Memory Mirroring for the hypervisor is enabled (default), the
Power 795 system maintains two identical copies of the system hypervisor in memory at all
times. Both copies are simultaneously updated with any changes. This design might result in
a minor memory performance effect, and less memory might be available for partitions. If you
want to disable Active Memory Mirroring for the hypervisor, refer to 2.1.1, “Active Memory
Mirroring for the hypervisor on Power 795” on page 13.
Footnote 9: In the IBM POWER7 product line announcement of October 2011, the Active Memory Mirroring feature is introduced in the Power 780 (9179-MHC) as a standard feature and in the Power 770 (9117-MMC) as an optional feature.
7.5 Performance considerations with Power virtualization
features
IBM PowerVM on Power Systems servers can be considered virtualization without limits.
Businesses are turning to PowerVM virtualization to consolidate multiple workloads into fewer
systems, increasing server utilization to ultimately help reduce cost. PowerVM provides a
secure and scalable virtualization environment for AIX, IBM i, and Linux applications that is
built upon the advanced RAS features and leading performance of the Power Systems
platform.
In this section, we introduce performance considerations when implementing PowerVM
features. We discuss the following topics:
Dynamic logical partitioning (DLPAR)
Micro-partitioning
PowerVM Lx86
Virtual I/O server
Active Memory Sharing (AMS)
Live Partition Mobility (LPM)
7.5.1 Dynamic logical partitioning (DLPAR)
Dynamic logical partitioning is available on POWER4-based System p systems with
microcode updates that are dated October 2002 or later. DLPAR increases the flexibility of
logically partitioned systems by allowing you to dynamically add and remove processors,
memory, I/O slots, and I/O drawers from active LPARs.
You can perform the following operations with DLPAR:
Move a resource from one partition to another partition
Remove a resource from a partition
Add a resource to a partition
The resource includes processors, memory, and I/O slots.
For detailed information about how to use dynamic LPAR, refer to the IBM Redbooks
publication, IBM PowerVM Virtualization Managing and Monitoring, SG24-7590, available at
the following website:
http://www.redbooks.ibm.com/abstracts/sg247590.html
DLPAR considerations
You need to be aware of several considerations when using dynamic LPAR:
When removing memory from or adding memory to a partition, the time that it takes to complete the DLPAR operation is proportional to the number of memory chunks being moved.
The affinity logical partitioning configuration allocates CPU and memory resources in fixed
patterns based on multi-chip module (MCM) boundaries. The HMC does not provide
dynamic reconfiguration (DR) of processor or memory support on affinity partitions. Only
the I/O adapter resources can be dynamically reconfigured when you run affinity logical
partitioning.
When you remove memory from a partition, the DR operation succeeds even if there is not
enough free physical memory available to absorb outgoing memory, provided there is
enough paging space available instead of physical memory. Therefore, it is important to
monitor the paging statistics of the partition before and after a DR memory removal. The
virtual memory manager is equipped to handle paging; however, excessive paging can
lead to performance degradations.
In certain cases, the DLPAR operation breaks memory affinity with the processor, which
affects performance.
There are tools that support DR operations. These tools are designed to recognize
configuration changes and adjust their reporting accordingly. The following tools provide
DLPAR support: topas, sar, vmstat, iostat, and rmss.
For detailed information about monitor tools, refer to the IBM Redbooks publication, AIX 5L
Performance Tools Handbook, SG24-6039, which is located at the following website:
http://www.redbooks.ibm.com/abstracts/sg246039.html?Open
For more information about DLPAR performance considerations, refer to the information
center website:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/dyn_log_part.htm
7.5.2 Micro-partitioning
An LPAR that utilizes processor resources from a shared processor pool is known as a
micro-partition LPAR. Micro-partitioning support provides flexible and efficient use of system
hardware resources by allowing physical processors to be shared (time-sliced) between
micro-partitions.
In a client’s production environment, in general, there are multiple system images within one
Power server, and each of these system images runs on a separate hardware system with
sufficient capacity to handle spikes in processor requirements, but each system is
underutilized.
Micro-partitioning support can allow these system images to be consolidated on a single set
of physical processors, allowing more efficient use of hardware resources, and providing a
reduction in the physical footprint required by multiple systems. Micro-partitioning with
uncapped processing provides more granularity for CPU resource balancing and allows idle
CPU cycles to be recovered and used by other partitions.
Micro-partitioning provides the following advantages:
Optimal resource utilization
Rapid deployment of new servers
Application isolation
For detailed information about concepts and how to use micro-partitioning, refer to the IBM
Redbooks publication, IBM PowerVM Virtualization Introduction and Configuration,
SG24-7940, which is located at the following website:
http://www.redbooks.ibm.com/abstracts/sg247940.html
Important: Dynamically partitioning large memory pages (16 MB page size) is not supported. A memory region that contains a large page cannot be removed.
Micro-partitioning considerations
The overall benefit of micro-partitioning is the increased utilization of system resources, achieved by giving each partition only the processor resource that it needs. To
ensure that the hypervisor’s memory pages keep track of all virtual partitions, consider the
capacity requirements carefully when choosing values for the attributes of the virtual
partitions.
CPU-intensive applications, such as high-performance computing applications, might not be
suitable for a micro-partitioning environment. If an application uses most of its entitled
processing capacity during execution, use a dedicated processor partition to handle the
demands of the application.
Tips when you deploy micro-partitioning
Consider these tips when implementing micro-partitioning:
Correctly determine the micro-partition processor allocation. Sizing the partition too small
can significantly increase response times. In addition, the processor efficiency is affected
more with smaller partitions or more partitions.
On POWER7 systems, consider using uncapped processing to better utilize idle processor
cycles in other partitions in the shared pool.
Limit the number of micro-partitions that are active at any one time. Workloads that are
cache sensitive or have response time criteria might suffer with the increased contention
that micro-partitioning places on shared resources.
Balance the memory DIMMs and I/O across the modules. Use the same size memory
DIMMs on the modules, whenever possible, to help reduce latencies caused by remote
references and avoid “hot spots”.
On POWER7 systems, the hypervisor attempts to optimize memory allocations at full
system startup. If after the system has started, you change a partition’s memory allocation
on a multi-module system, you can introduce more remote memory references as memory
is “reallocated” from its initial optimal allocation to another module. If you suspect that this
situation has happened, another full system startup re-optimizes the memory allocations.
Configuring virtual processors in a shared partition environment
Consider this additional information when configuring virtual processors in a shared partition
environment:
When creating a capped partition, for maximum processor efficiency and partition CPU
capacity, the number of desired virtual processors needs to be the minimum that can
consume the desired entitled capacity, for example:
– If the desired entitled capacity is 3.6 processing units, the number of desired virtual
processors must be 4.
– If the desired entitled capacity is 0.75 processing units, the number of desired virtual
processors must be 1.
When creating an uncapped partition, for maximum processor efficiency and partition
CPU capacity, follow these guidelines:
– Do not make the number of virtual processors for a partition greater than the number of
processors in the shared processor pool. The number of processors in the shared pool
is the maximum number of physical processors that a partition can use concurrently.
– Do not set the number of virtual processors for a partition to a number that is greater
than the number of available processing units. Other shared partitions are using their
entitled processing units, so, in many cases, the entire shared pool size is not available
for a single partition to use.
– Where possible, set the partition’s entitled processing units as close to the anticipated
CPU processing requirements as possible. The more CPU that processing partitions
use as uncapped (for example, beyond their entitlement), the greater the processor
efficiency effects that are caused by increased virtual processor switching.
– When attempting to take advantage of unused shared pool resources, set the number
of virtual processors close to the expected capacity that you are trying to achieve for
that partition.
Setting virtual processors to higher values usually results in reduced processor efficiency
and can result in decreased performance from increased contention.
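For a capped partition, the rule above reduces to a ceiling: the minimum number of virtual processors is the entitled capacity rounded up to a whole number. A minimal sketch (the function name is ours):

```shell
# Minimum number of virtual processors for a capped partition: the
# entitled capacity in processing units, rounded up to a whole number.
min_vps() {
    awk -v ec="$1" 'BEGIN { v = int(ec); if (ec > v) v = v + 1; print v }'
}
min_vps 3.6    # 4, matching the first example above
min_vps 0.75   # 1, matching the second example above
```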
For detailed information about micro-partition performance with Power Systems servers, refer to the following IBM white paper:
http://www-03.ibm.com/systems/resources/lparperf.pdf
For more information about IBM PowerVM and micro-partitioning, refer to the following information center website:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.prftungd/doc/prftungd/micro_part.htm
Case studies
The following case studies were performed while implementing IBM PowerVM virtualization:
Case 1
This IBM Redpaper introduces an actual application (life sciences application) benchmark
test with Power servers. The publication includes three applications in life sciences and how
the availability of a pool of virtual processors improves the time to the solution. A partition that
has exhausted its resources can take advantage of a pool of shared virtual processors,
provided that the shared virtual processors are not required by other partitions. For more
information, refer to the IBM Redpaper, Life Sciences Applications on IBM POWER5 and AIX
5L Version 5.3: Virtualization Exploitation through Micro-Partitioning Implementation,
REDP-3966, which is located at the following website:
http://www.redbooks.ibm.com/redpapers/pdfs/redp3966.pdf
Case 2
This IBM white paper describes how micro-partitioning can be deployed on IBM Power
servers for consolidation. It includes an example consolidation scenario and explores the
performance robustness of micro-partitioning in a demanding transactional environment. For
more information about this case, refer to the following website:
ftp://ftp.software.ibm.com/software/uk/itsolutions/datacentreoptimisation/virtualisation-consolidation/server/ibm-system-p5-570-server-consolidation-using-power5-virtualization.pdf
7.5.3 PowerVM Lx86
PowerVM Lx86 supports migrating most 32-bit x86 Linux applications to any Power Systems
or BladeCenter model with POWER7 or POWER6 processors, or with IBM Power architecture
technology-based blade servers. Best of all, no native porting or application upgrade is
required for running most x86 Linux applications. PowerVM Lx86 offers these advantages:
Exceptional performance and scalability, allowing much greater consolidation of workloads
Improved service quality through leadership availability and security features
The ability to dynamically optimize the mix of processor, memory, disk, and network
resources with optional IBM PowerVM virtualization technology
For more information about PowerVM Lx86 and how to set up the Lx86 environment, refer to
the IBM Redpaper publication, Getting Started with PowerVM Lx86, REDP-4298, which is
located at the following website:
http://www.redbooks.ibm.com/abstracts/redp4298.html
Lx86 considerations
PowerVM Lx86 runs most x86 Linux applications, but it cannot run applications that:
Directly access hardware, for example, 3D graphics adapters.
Require nonstandard kernel module access or use kernel modules that are not provided
by the Linux on Power operating system distribution.
Do not use only the Intel IA-32 instruction set architecture as defined by the 1997 Intel
Architecture Software Developer’s Manual consisting of Basic Architecture, 243190,
Instruction Set Reference Manual, 243191, and the System Programming Guide, 243192,
dated 1997.
Do not run correctly on RHEL 4 starting with Version 4.3 or Novell SLES 9 starting with
Version SP3 or Novell SLES 10.
Require RHEL 5, a Linux distribution currently unsupported by PowerVM Lx86, to run.
Are Linux/x86-specific system administration or configuration tools.
Require x86 real mode.
Regarding performance, Figure 7-17 on page 279 shows the PowerVM Lx86 application
translation process. The translation is a three-stage process:
1. Decoding: x86 binary instructions from the x86 binary are decoded as the application
requests them.
2. Optimization: The optimization is iterative, so more optimization is done on frequently
used code.
3. Generation of Power code: Frequently used code is stored in memory, so it does not need
to be translated again the next time that it runs.
The translation process means that when an x86 application executes in the PowerVM Lx86 environment, it requires more CPU and memory resources than in a native x86 environment. Therefore, consider performance carefully when migrating to the PowerVM Lx86 environment.
The performance of certain x86 Linux applications running on PowerVM Lx86 might vary significantly from the performance obtained when these applications run as a native port.
There are various architectural differences between x86 and Power architecture that can
affect the performance of translated applications. For example, translating dynamically
generated code, such as Java byte code, is an ongoing translation process, which can affect
the performance of x86 Java applications using an x86 Java virtual machine.
Floating-point applications running under x86 have a different default precision level than Power architecture, so translating between these levels can incur additional performance
penalties. Also, translating and protecting multithreaded applications can incur an additional
performance overhead as the translator works to manage shared memory accesses. IBM
suggests that clients carefully consider these performance characteristics when selecting the
best method for enabling applications for their environment.
For detailed information about PowerVM Lx86 performance, refer to the following website:
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=ca&infotype=an&appname=iSource&supplier=897&letternum=ENUS208-010
Figure 7-17 PowerVM Lx86 translation process
Case study
This IBM white paper provides performance comparisons between PowerVM Lx86 and
x86-based environments and is available at the following website:
http://www-03.ibm.com/systems/power/software/virtualization/Whitepapers/powervm_x86.html
7.5.4 Virtual I/O server
The virtual I/O server is part of the PowerVM Editions. The virtual I/O server is software that is
located in an LPAR. This software facilitates the sharing of physical I/O resources between
client LPARs within the server.
As a result, you can perform the following functions on client LPARs:
Share SCSI devices, Ethernet adapters, and FC adapters
Expand the amount of memory available to LPARs and suspend and resume LPAR
operations by using paging space devices
Considerations: PowerVM Lx86 is not recommended with applications that are highly
computational in nature, highly performance sensitive, or make heavy use of Java.
For more information about recent virtual I/O server features, refer to IBM PowerVM
Virtualization Introduction and Configuration, SG24-7940, which is located at the following
site:
http://www.redbooks.ibm.com/abstracts/sg247940.html
The virtual I/O server has an extremely important role in the IBM PowerVM solution, because
it is the foundation of many advanced PowerVM features, such as AMS and LPM.
In the next section, we introduce performance considerations about CPU configuration, virtual
SCSI, virtual network, and virtual FC.
Considerations with virtual SCSI
Using virtual SCSI (vSCSI), client LPARs can share disk storage and tape or optical devices
that are assigned to the virtual I/O server LPAR. We list several performance considerations
about vSCSI:
A RAID card can be used for virtual I/O server disks, virtual I/O client (VIOC) disks, or both.
For performance reasons, logical volumes within the virtual I/O servers that are exported
as vSCSI devices must not be striped or mirrored, span multiple physical drives, or have
bad block relocation enabled.
SCSI reserves have to be turned off whenever you share disks across two virtual I/O
servers.
Set vSCSI Queue depth to match the underlying real devices.
Do not configure a large number of vSCSI adapters per client; four vSCSI adapters are
typically sufficient.
If you use the FC Multi-Path I/O (MPIO) on the virtual I/O server, set the following fscsi
device values (requires switch attachment and support by the switch):
a. dyntrk=yes (Dynamic Tracking of FC Devices)
b. fc_err_recov= fast_fail (FC Fabric Event Error Recovery Policy)
If you use the MPIO on the VIOC, set the following hdisk device values:
hcheck_interval=60 (Health Check Interval)
Shared Ethernet Adapter (SEA) considerations
The following performance considerations about virtual networks are important:
You need to know the network workload types: Transmission Control Protocol (TCP)
streaming or TCP request and response.
If the network workload type is TCP streaming, we suggest that you set maximum
transmission unit (MTU) to 9000 (jumbo frames), which saves CPU resources.
If the network workload type is TCP request and response, you need to monitor the predominant network packet sizes. A packet of 64 bytes or less is a small packet; a packet of 1024 bytes or more is a large packet. For mostly large packets, we suggest setting MTU to 9000 (jumbo frames). For mostly small packets, you do not need to change the MTU size; keep the default size (1500). If most packet sizes fall between small and large, perform a test to decide which MTU size to use.
For more information: For detailed information about how to tune queue_depth and qdepth_enable, refer to the IBM white paper, AIX disk queue depth tuning for performance, at this website:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105745
If there is only SEA running in a virtual I/O server environment (without vSCSI in the same
LPAR), we suggest that you disable the SEA adapter’s threading option. If there is vSCSI
or virtual FC, we suggest that you keep the thread attribute to the default value (1). For
example, the following command disables threading for the Shared Ethernet Adapter ent1:
mkvdev -sea ent1 -vadapter ent5 -default ent5 -defaultid 1 -attr thread=0
If you enable the largesend attribute on the SEA, the client partition can transmit large
data, which is segmented by the real adapter to fit its MTU, saving the partition’s CPU
resources. The attribute must be enabled on the physical adapter before the SEA device is
created, and it must also be enabled on the client partition. The following commands show
how to enable it on these devices:
a. Enable on the physical adapter:
chdev -dev ent0 -attr large_send=1
b. Enable on the SEA device after creation:
chdev -dev ent3 -attr largesend=1
c. Enable on the client partition:
ifconfig en0 largesend
d. Disable on the client partition:
ifconfig en0 -largesend
Considerations with Virtual Fibre Channel
With N_Port ID Virtualization (NPIV), you can configure the managed system so that multiple
LPARs can access independent physical storage through the same physical FC adapter. Here
are two performance considerations about Virtual FC:
To increase the performance of the FC adapters, you sometimes need to modify the
max_xfer_size and num_cmd_elems parameters. Each SAN vendor has a recommended
setting.
These are general suggestions for the parameters. For a production environment, you
need to monitor the I/O activity and assess whether the parameters need to change. For
detailed guidance, refer to the following IBM white paper at this website:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD105745
Considerations with CPUs
The following considerations relate to processors (CPUs):
Configure an uncapped micro-partition with enough virtual processors (VPs) to manage
peak loads, especially when it has high network activity through the SEA or heavy I/O
activity. Monitor first to determine the appropriate number of VPs.
Configure it with a higher weight (priority) than its clients if they are also uncapped.
IBM provides a simple formula to size the virtual I/O server CPU resource for SEA. It
includes many factors, such as cycles per byte (CPB), type of streaming, size of
transaction, MTU size, SEA threading option, and so on. For more information about this
sizing formula, refer to the following website:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hb1
/iphb1_vios_planning_sea_procs.htm
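The general shape of such a sizing formula can be illustrated with assumed numbers: required processor capacity is roughly throughput times cycles per byte, divided by the clock rate. The CPB value and clock speed below are placeholders, not values from the IBM sizing tables referenced above.

```shell
# required cores ~= throughput (bytes/s) * cycles per byte / CPU clock (Hz)
# A CPB of 7 and a 3.5 GHz clock are illustrative assumptions only.
awk 'BEGIN {
  tput = 100 * 1000 * 1000        # 100 MB/s of SEA traffic
  cpb  = 7                        # assumed cycles per byte
  hz   = 3.5 * 1000 * 1000 * 1000 # assumed processor clock
  printf "%.2f cores\n", tput * cpb / hz
}'
```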
Case studies
The following case studies provide additional information about network and I/O
considerations.
Case 1
The following IBM technical white paper discusses performance characterization and
preferred configuration practices for IBM i in a virtual I/O high-end external storage
environment, with IBM System Storage DS8000® attached natively through the IBM PowerVM
virtual I/O server, and through the virtual I/O server and the IBM System Storage SAN Volume
Controller (SVC). Refer to the following website:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/5cb5ed706d254a8186256c71006d2e
0a/af88e63ccded8f6e86257563002680e1/$FILE/IBM%20i%20Virtual%20IO%20SAN%20Storage%2
0Performance.pdf
Case 2
The following IBM white paper compares two similar configurations running the same SAS
benchmark: one configuration uses directly attached disks, and the other configuration uses
virtual I/O server-attached disks, to handle the high I/O activity that is needed by the SAS
computational back-end servers. Refer to the following website:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101664
Case 3
The following document explains a typical network configuration scenario using virtual I/O
server in a production environment to provide better network performance and redundancy.
This document was created based on a recent experience in setting up a consolidated
production environment using dual virtual I/O servers with a redundant network configuration.
Refer to the following website:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101766
7.5.5 Active Memory Sharing
Active Memory Sharing (AMS) is a PowerVM feature that is available starting with POWER6.
AMS provides optimized memory utilization, similar to the micro-partition’s CPU optimization. The PowerVM
Hypervisor manages real memory across multiple AMS-enabled partitions, distributing
memory to partitions based on their workload demand. Several new concepts are involved in
AMS, such as shared memory pool, paging device, shared memory partition, I/O-entitled
memory, memory weight, virtualization control point (VCP), shared memory manager, paging
virtual I/O server, collaborative memory manager (CMM), and so on.
For more information about AMS concepts and how to set up, monitor, and manage AMS,
refer to PowerVM Virtualization Active Memory Sharing, REDP-4470, which is located at the
following website:
http://www.redbooks.ibm.com/redpieces/abstracts/redp4470.html?Open
Considerations with AMS
Performance tuning with AMS is complex, because there are many components related to
performance. The major components are the shared memory pool, virtual I/O server, paging
device, and shared memory partition. The key to tuning is to reduce the number of paging
activities on the paging devices and to accelerate the access speed to the paging
device.
Shared memory pool
There are two performance considerations for the shared memory pool:
If the system has unallocated memory, it is better to add the memory to the memory pool
(it can be added dynamically).
If you need to reduce the size of the shared memory pool, we suggest that you reduce the
size when the load on the shared memory partitions is low.
Virtual I/O server
The performance considerations in 7.5.4, “Virtual I/O server” on page 279 are also suitable
for AMS.
Support exists to assign up to two paging virtual I/O server partitions to a shared memory
pool to provide multi-path access to the paging devices. This redundant paging virtual I/O
server configuration improves the availability of the shared memory partitions in the event of a
planned or unplanned virtual I/O server outage. When you configure paging devices that are
accessed redundantly by two paging virtual I/O server devices, the devices must meet the
following requirements:
Physical volumes
Located on SAN
Must be accessible to both paging virtual I/O server partitions
Paging device
Consider these performance factors for paging devices:
Spread the I/O load across as many physical disks as possible.
Use disk caches to improve performance. Due to the random-access nature of paging,
write caches provide benefits; read caches might not have an overall benefit. If you can
use solid-state disks (SSDs), that is better.
Use a striped configuration, if possible with a 4 KB stripe size, which is ideal for a paging
environment. Because the virtual I/O server cannot provide striped disk access, striping
must be provided by a storage subsystem.
Disk redundancy is recommended. When using a storage subsystem, a configuration
using mirroring or RAID5 is appropriate. The virtual I/O server cannot provide redundancy.
Shared memory partition
The following performance considerations are for the shared memory partition:
The AMS partition supports only the 4 KB page size; it does not support the 64 KB,
16 MB, or 16 GB page sizes.
Logical memory must be sized based on the maximum quantity of memory that the LPAR
is expected to use during the peak time.
When logical memory size is reduced dynamically, we suggest that it is done during
non-peak hours.
When logical memory size is increased dynamically, it does not mean that the LPAR has
more physical memory pages.
Change the memory weight carefully to balance the priority among all the shared memory
partitions to receive physical memory.
Keep enough physical memory for I/O-entitled memory.
The number of virtual processors on a shared memory partition must be calculated so that
when a high page fault rate occurs, the number of running virtual processors is able to
sustain the workload.
Keep the ams_loan_policy parameter at the default value (1) for most production
workloads.
For more information about performance tuning with AMS, refer to PowerVM Virtualization
Active Memory Sharing, REDP-4470, which is located at the following website:
http://www.redbooks.ibm.com/redpieces/abstracts/redp4470.html?Open
Or, refer to the IBM white paper, IBM PowerVM Active Memory Sharing Performance, which
is located at the following website:
http://public.dhe.ibm.com/common/ssi/ecm/en/pow03017usen/POW03017USEN.PDF
Case study
This paper describes the performance of IBM WebSphere® MQ when running in an AMS
environment and how WebSphere MQ benefits from AMS technology. Refer to the following
website:
http://www-304.ibm.com/partnerworld/wps/servlet/ContentHandler/Whitepaper/power/ai
x/v6r1_power6/performance
7.5.6 Live Partition Mobility
Live Partition Mobility (LPM) is one of the PowerVM features that provides the ability to move
AIX and Linux LPARs from one system to another system. The mobility process transfers the
system environment, including the processor state, memory, attached virtual devices, and
connected users.
There are two types of LPM: Active Partition Mobility and Inactive Partition Mobility.
For more information about the mechanism, planning, and configuration of LPM, refer to
IBM PowerVM Live Partition Mobility, SG24-7460, which is located at the following website:
http://www.redbooks.ibm.com/abstracts/sg247460.html?Open
Considerations with LPM
LPM, combined with other virtualization technologies, such as micro-partitioning, virtual I/O
server, AMS, and so on, provides a fully virtualized computing platform that helps provide the
infrastructure flexibility that is required by today’s production data centers.
For performance considerations with micro-partitioning, virtual I/O server, and AMS, refer to
7.5.2, “Micro-partitioning” on page 275, 7.5.4, “Virtual I/O server” on page 279, and 7.5.5,
“Active Memory Sharing” on page 282.
There are two considerations about LPM performance:
The network performance between the source and the destination systems, which is used
for transferring the system’s state, is important for the elapsed time of the active
partition mobility process. We suggest using a dedicated network for the state transfer,
with a bandwidth of at least 1 Gbps.
For instructions to set up the network to improve partition mobility performance, refer to
the following website:
https://www-304.ibm.com/support/docview.wss?uid=isg3T7000117
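As a rough illustration of why bandwidth matters, a lower bound on the state-transfer time is the partition memory size divided by the link bandwidth; real migrations take longer because pages modified during the transfer must be re-sent. The 16 GB and 1 Gbps figures below are illustrative assumptions.

```shell
# Lower-bound transfer time: 16 GB of partition memory over a 1 Gbps link.
awk 'BEGIN {
  mem_gbit  = 16 * 8   # 16 GB expressed in gigabits
  link_gbps = 1        # dedicated 1 Gbps migration network
  printf "%d s\n", mem_gbit / link_gbps
}'
```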
Important: Memory weight is merely one of the parameters used by the hypervisor to
decide how many physical pages are assigned to the shared memory partitions.
The CPU performance of the virtual I/O server is important for the application suspend
time during the active partition mobility process. We recommend testing and monitoring the
CPU performance to obtain appropriate performance for the production environment.
To illustrate the importance of the virtual I/O server’s processor resource, Table 7-6 shows
the results of tests that were performed with various configurations. The suspend time depends on the
network bandwidth and the virtual I/O server CPU performance. During the active
LPM, the VIOC sustains a high workload (CPU usage is 100%, with substantial memory activity),
and the MTU size of the SEA is 1500. Additional virtual I/O server information is
provided:
Virtual I/O server configuration
The virtual I/O server partition is a micro-partition under capped mode.
Network throughput
The network throughput during the active LPM on the virtual I/O server partition.
Elapsed time
The time between the start migration and end migration.
Suspend time
The time that the application is suspended during the operation. The transaction’s
connection is not broken, and the client’s operation can continue after the suspended state
finishes.
Table 7-6 Testing results for an active LPM operation

Virtual I/O server   Virtual I/O client   Network      Elapsed   Suspend
configuration        configuration        throughput   time      time
0.5C/512 MB          0.2C/8 G             50 MB/s      6m14s     2s
0.5C/512 MB          0.2C/16 G            50 MB/s      8m30s     34s
1C/512 MB            0.2C/16 G            77 MB/s      6m        2s
1.5C/512 MB          0.2C/16 G            100 MB/s     4m46s     2s

In this testing, the number of processors in the virtual I/O server affects the network
performance and the suspend time. Note that our environment is a testing environment. For a
production environment, we suggest that you perform a similar test to find the best
configuration for the virtual I/O server.
Case studies
The following case studies provide additional information about how to set up the environment
to get the best results when performing Live Partition Mobility operations.
Case 1
This paper documents the findings of a performance test using the PowerVM Live Partition
Mobility feature to migrate an active SAP system with various workload levels. The workload
tests that are documented in this paper show that the LPM operations succeed even at high system
utilization levels. The amount of time that it takes to complete a migration depends on a
number of factors, such as the memory size of the migrating partition, the amount of memory
changes in the partition, and the sustainable network bandwidth between the two virtual I/O
servers performing the migration. The suspend/resume phase during a migration affects the
application throughput and user response times for a few minutes. The overall effect, and the
amount of time that is required to reach normal processing levels again, increases with the
active load on the system. This document provides suggestions for managing SAP within
an LPM environment. See the following website for detailed information:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101917
Case 2
This article provides a detailed analysis of DB2 9.5 running with the PowerVM Enterprise
Edition Live Partition Mobility feature. The DB2 9.5 instance hosts an OLTP workload and
services a multitude of clients; with a throughput of as many as 2000 transactions per second
(TPS), it is migrated to another system. The client network connections remain intact, and the
application observes a blip of only a few seconds. Although the environment is implemented
on an IBM POWER6 system with AIX 5.3 TL7, it can also be deployed in a POWER7
environment. Refer to the publication at the following website:
http://www.ibm.com/developerworks/aix/library/au-db2andpower/index.html
Case 3
This paper explains how to set up a complex PowerVM solution environment that includes dual
virtual I/O servers, AMS, and LPM. Using Active Memory Sharing with advanced PowerVM
configurations, including dual virtual I/O servers and Live Partition Mobility, provides high
availability and flexibility benefits for your virtualized Power Systems environment. Refer
to the following website:
to the following website:
http://www.ibm.com/developerworks/aix/library/au-activemem/index.html?ca=drs-
7.6 Performance considerations with AIX
In this section, we introduce performance considerations and tuning methods with AIX when
deploying or using POWER7 servers. This section includes the following topics:
Olson and POSIX time zones
Large page size
One Terabyte (TB) segments aliasing
Memory affinity
Hardware memory prefetch
Simultaneous multithreading
New features of XL C/C++ V11.1 to support POWER7
How to deal with unbalanced core and memory placement
Also, we introduce web resources about AIX performance tuning on Power servers.
7.6.1 Olson and POSIX time zones
In AIX 5.3 or earlier versions, the default time zone is POSIX. In AIX 6.1, the default time zone
is replaced with the Olson time zone.
AIX performance tuning: AIX performance tuning deals with many areas. In this section,
we mention a few tuning methods about POWER7. For more detailed information about
AIX tuning, refer to 7.6.9, “AIX performance tuning web resources” on page 302.
In AIX 6.1, the Olson time zone is implemented with the AIX International Components for
Unicode (ICU) library and provides additional features:
It is more natural for a user to know a city name than the POSIX format of a time zone.
It is easier to maintain the history of daylight saving time (DST) changes, although
historical DST changes are not an important factor for most users.
In AIX 7.1, the default time zone is still Olson, but the implementation changed to native
code, which avoids using the ICU library.
In AIX 6.1 and AIX 7.1, the POSIX time zone is still supported. Refer to the following IBM
information center website:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/topic/com.ibm.aix.baseadmn/doc/b
aseadmndita/olson_time_zone.htm?resultof=%22%70%6f%73%69%78%22%20%22%74%69%6d%65%2
2%20%22%7a%6f%6e%65%22%20
How to know the time zone that is used in the current AIX environment
Check the output of the command env | grep TZ. If the output is similar to
“TZ=Asia/Shanghai”, it is the Olson time zone format. If the output is similar to
“TZ=BEIST-8”, it is the POSIX time zone format.
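Because Olson names contain a slash (Area/City) while POSIX values do not, the check can be scripted. This is a minimal sketch of that heuristic, not an exhaustive parser of either format.

```shell
# Classify a TZ value: Olson values look like Area/City; POSIX values do not.
tz_format() {
  case "$1" in
    */*) echo Olson ;;
    *)   echo POSIX ;;
  esac
}

tz_format Asia/Shanghai
tz_format BEIST-8
```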
Performance difference between Olson and POSIX time zones
Although the time zone is merely an environment variable that provides a value, certain
local time-related subroutines, for example, localtime() and gettimeofday(), use the time
zone value. Using different time zone formats might cause these functions’ response times
to differ.
In AIX 6.1, when the Olson time zone is enabled, the implementation of these subroutines
relies on the ICU library, and the arithmetic of the ICU library is more complex than the
implementation of the POSIX time zone. Normally, the POSIX time zone value performs better
than the Olson value.
The Olson time zone penalty might not be a concern in cases where the application looks up
the local time only one time or occasionally, but in cases where the local time-related
subroutines are going to be repeatedly called, or if an application is performance sensitive, it
is a much better option to continue using the POSIX time zone format in AIX 6.1.
In AIX 7.1, the implementation of the Olson time zone does not rely on the ICU library; it
uses native code, so its performance improves. The performance of the POSIX and Olson time
zones is similar in AIX 7.1.
Setting the POSIX time zone in AIX 6.1 or later
Refer to the following steps to set the POSIX time zone in AIX 6.1 or later:
1. Log in as the root user and edit /etc/environment:
vi /etc/environment
2. Change the TZ environment variable to the POSIX time zone format and save the file, for
example:
TZ=BEIST-8
3. Reboot the system.
Case study
To provide a timeout mechanism in the DB2 client, DB2 provides an environment
variable (DB2TCP_CLIENT_RCVTIMEOUT). For example, when you set
DB2TCP_CLIENT_RCVTIMEOUT=10, the timeout value is 10 seconds. After this function is
enabled, DB2 invokes additional subroutines for each transaction, including localtime() and
gettimeofday(). If the response time of the transaction is extremely short, for
example, 0.00005 s, and you enable this environment variable, the performance difference
between the Olson and POSIX time zones is noticeable (sometimes more than 10%) in an
AIX 6.1 environment.
For more information about the DB2 registry variable (DB2TCP_CLIENT_RCVTIMEOUT),
refer to the following IBM website and search DB2TCP_CLIENT_RCVTIMEOUT:
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.ud
b.admin.doc/doc/r0005660.htm
7.6.2 Large page size
AIX supports the 16 MB page size, which is also known as large pages. The use of large
pages reduces the number of translation lookaside buffer (TLB) misses and therefore
improves performance. Applications can use large pages for their text, data, heap, and shared
memory regions.
Considerations with large pages
The major benefit of larger page sizes is improved performance for applications that
repeatedly access large amounts of memory. Performance improvements from larger page
sizes result from reducing the overhead of translating an application page address to a page
address that is understood by the computer’s memory subsystem.
Large page support is a special-purpose performance improvement feature and is not
recommended for general use. Note that not all applications benefit from using large pages.
In fact, certain applications, such as applications that perform a large number of fork()
functions, are prone to performance degradation when using large pages.
Testing: For Independent Software Vendor (ISV) applications, we suggest that you
perform proof of concept (POC) testing to verify the feasibility and stability before changing
the time zone in a production environment.
Important: The 16 MB page size is only for high-performance environments, 64 KB pages
are considered general purpose, and most workloads will likely see a benefit from using
64 KB pages rather than 4 KB pages. For more information about the 64 KB page size,
refer to the IBM information center website (search on multiple page size support):
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.
prftungd/doc/prftungd/multiple_page_size_support.htm
Starting from AIX 5.3 TL8 or AIX 6.1 TL1, AIX supports specifying the page size to use for
a process’s shared memory with the SHMPSIZE environment variable. For detailed
information about the SHMPSIZE environment variable, refer to the IBM information center
website:
http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=/com.ibm.aix.
prftungd/doc/prftungd/multiple_page_size_app_support.htm
Rather than using the LDR_CNTRL environment variable, consider marking specific
executable files to use large pages, which limits the large page usage to the specific
application that benefits from large page usage.
If you are considering using large pages, think about the overall performance effect on your
system. Certain applications might benefit from large page usage, but you might see
performance degradation in the overall system performance due to the reduction of 4 KB
page storage available on the system. If your system has sufficient physical memory so that
reducing the number of 4 KB pages does not significantly hinder the performance of the
system, you might consider using large pages.
Enabling large pages on AIX
To enable large pages on AIX, follow these steps:
1. Compute the number of 16 MB pages that are needed from the memory size requirements of
the application, including text, data, heap, and shared memory regions:
vmo -p -o lgpg_regions=<number_of_large_pages> -o lgpg_size=16777216
Input calculation: number_of_large_pages = INT[(Shared Memory Size - 1)/16 MB] + 1
2. Turn on the v_pinshm parameter to allow pinning of shared memory segments:
vmo -p -o v_pinshm=1
Leave maxpin% at the default of 80.
3. Change the attribute profile of the database user (for example, the oracle user) to allow
the use of large pages:
chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE <user_name>
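The page-count calculation in step 1 can be checked with shell arithmetic. The 4 GB shared memory size below is an example value, not one from the text.

```shell
# number_of_large_pages = INT[(shared memory size - 1) / 16 MB] + 1
lgpg_count() {
  echo $(( ($1 - 1) / 16777216 + 1 ))
}

# A 4 GB shared memory region needs 256 pages of 16 MB each.
lgpg_count $(( 4 * 1024 * 1024 * 1024 ))
```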
Enabling large pages for DB2
To enable large pages, use the db2set command to set the DB2_LARGE_PAGE_MEM
registry variable to DB:
db2set DB2_LARGE_PAGE_MEM=DB
For detailed information about how to enable large pages on DB2, refer to this website:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/com.ibm.db2.
luw.admin.dbobj.doc/doc/t0010405.html
Enabling large pages for Informix (from IDS 11.50)
To enable large pages, change the profile of the Informix user to allow the use of large pages:
export IFX_LARGE_PAGES=1
For detailed information about how to enable large pages on Informix®, refer to this website:
http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp?topic=/com.ibm.gsg
.doc/ids_rel_188.htm
Enabling large pages for Oracle
To enable large page usage, refer to the following steps:
1. Modify the XCOFF executable file header of the Oracle binary file:
ldedit -b lpdata <oracle_binary_file>
AMS partitions: An AMS partition does not support the large page size; it supports only the
4 KB page size.
2. Change the Oracle initialization parameter, so that Oracle requests large pages when
allocating shared memory:
LOCK_SGA=true
For detailed information about how to enable large pages on Oracle, refer to this website:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP100883
When these steps are complete, you can start the database, which then uses large page
memory for the database shared memory region.
How to monitor large page utilization
To see how many large pages are in use, use the vmstat command with the flags that are
shown in Table 7-7. Three vmstat options show large-page information.
Table 7-7 Description of vmstat command options for monitoring large page utilization

vmstat option   Description
-l              Displays an additional “large-page” section with the active large pages and
                free large pages columns
-P              Displays only the VMM statistics that are relevant for the specified page size
-p              Appends the VMM statistics for the specified page size to the regular vmstat
                output

Example 7-2 introduces how to use the command. It shows that there are 32 active large
pages (alp) and 16 free large pages (flp), for a total of 48 large pages. There are 32 large
pages that can be used for the client’s application (including databases).
Example 7-2 Monitoring large pages using vmstat
#vmstat -l 2
System configuration: lcpu=4 mem=8192MB
kthr memory page faults cpu large-page
----- ----------- ------------------------ ------------ ----------- -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa alp flp
0 0 1359405 169493 0 0 0 0 0 0 31 396 382 0 0 99 0 32 16
0 0 1359405 169493 0 0 0 0 0 0 22 125 348 0 0 99 1 32 16
0 0 1359405 169493 0 0 0 0 0 0 22 189 359 0 0 99 1 32 16
#vmstat -P ALL
System configuration: mem=8192MB
pgsz memory page
----- -------------------------- ------------------------------------
siz avm fre re pi po fr sr cy
4K 882928 276631 103579 0 0 0 56 110 0
64K 63601 59481 4120 0 0 0 0 0 0
16M 48 32 16 0 0 0 0 0 0
#vmstat -p ALL
System configuration: lcpu=4 mem=8192MB
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
1 2 1359403 169495 0 0 0 56 110 0 90 4244 1612 3 1 96 0
psz avm fre re pi po fr sr cy siz
4K 276636 103574 0 0 0 56 110 0 882928
64K 59481 4120 0 0 0 0 0 0 63601
16M 32 16 0 0 0 0 0 0 48
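Because the alp and flp values are the last two fields of each vmstat -l data row, they can be pulled out with awk. The sample line below is taken from Example 7-2; on a live system you would pipe vmstat -l output instead.

```shell
# Extract active (alp) and free (flp) large pages from a vmstat -l data row.
echo ' 0 0 1359405 169493 0 0 0 0 0 0 31 396 382 0 0 99 0 32 16' |
  awk '{ printf "alp=%s flp=%s total=%d\n", $(NF-1), $NF, $(NF-1) + $NF }'
```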
You can also use the svmon command to display information about the memory state of the
operating system or of a specified process.
See the following documentation for additional information about large pages:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.prf
tungd/doc/prftungd/large_page_ovw.htm
7.6.3 One TB segment aliasing
AIX supports application segments that are one terabyte (TB) in size starting from AIX 6.1
TL6 or AIX 7.1. Prior AIX versions supported only 256 MB segments. With TB segment
support, we nearly eliminate a huge number of Segment Lookaside Buffer (SLB) misses, and
therefore the SLB reload overhead in the kernel.10 Applications that use a large amount of
shared memory segments (for example, if one process needs 270 GB, leading to over
1300 256 MB segments) incur an increased number of SLB hardware faults, because the data
that is referenced is scattered across all of these segments. This problem is alleviated
with TB segment support.
There are two types of 1 TB segment aliasing:
Shared aliases:
– A single shared memory region large enough on its own to trigger aliasing, by default,
at least 3 GB in size.
– TB aliases used by the entire system are “shared” by processes.
– Aliasing triggered at shmat() time (shared memory attach).
– AIX does not place other attachments into the terabyte region, unless address space
pressure is present.
Unshared aliases:
– Multiple small, homogeneous shared memory regions grouped into a single 1 TB
region.
– Collectively large enough to trigger aliasing. By default, they must exceed 64 GB in
size.
– TB aliases are private to the process.
– Aliasing is triggered at the shmat of the region that crosses the threshold.
– Unshared aliasing is expensive, and shmat requires that the unshared alias be
invalidated and removed to avoid access via page table entries (PTEs) to stale regions.
See the following documentation for additional information about 1 TB segment aliasing:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.prf
tungd/doc/prftungd/1TB_segment_aliasing.htm
10 A POWER7 core provides 12 SLB entries for user processes, which yields 3 GB of memory
that is accessible without SLB faults. If the shared memory size exceeds 3 GB, there is a
potential risk of a performance decrease.
Considerations with 1 Terabyte segments
This section provides considerations when using the 1 TB segment.
mmap considerations
For the mmap, there are two considerations:
Aliasing is not supported.
A new address space layout is in use (if 1 TB segment aliasing is active).
Unshared aliases can have performance issues
Here are a few details about performance when using unshared aliases:
Every address space removal can cause an unshared alias removal.
A 1 TB unshared alias is not reusable until a background kernel process clears all its
possible PTEs up to a high-water mark that has been established at run time.
Determine how to configure 1 TB segment aliasing in a production environment after
sufficient testing with the client application.
Implementing 1 TB segment aliasing
There are four key AIX vmo parameters to control the 1 TB segment aliasing:
esid_allocator, VMM_CNTRL=ESID_ALLOCATOR=[0,1]
The default is off (0) in AIX 6.1 TL6, and default is on (1) in AIX 7.1. When on, it indicates
that 1 TB segment aliasing is in use, including aliasing capabilities and address space
layout changes. This parameter can be changed dynamically.
shm_1tb_shared, VMM_CNTRL=SHM_1TB_SHARED=[0,4096]
The default is set to 12 on POWER7 and to 44 on POWER6 and earlier. It controls the
threshold (the “trigger value”) at which a shared memory attachment gets a shared alias:
12 × 256 MB = 3 GB on POWER7, and 44 × 256 MB = 11 GB on POWER6. The unit is 256 MB
segments.
shm_1tb_unshared, VMM_CNTRL=SHM_1TB_UNSHARED=[0,4096]
The default is set to 256 segments (256 × 256 MB = 64 GB). It controls the threshold at
which multiple homogeneous small shared memory regions are promoted to an unshared alias.
It is conservatively set high, because unshared aliases are compute-intensive to establish
initially and to destroy. The unit is 256 MB segments.
shm_1tb_unsh_enable
The default is set to on; it determines whether unshared aliases are used. Unshared
aliases can be “expensive”.
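The segment-count defaults translate to byte thresholds as follows, with each unit being one 256 MB segment; this assumes the defaults of 12 (POWER7 shared), 44 (POWER6 shared), and 256 (unshared) segments described above.

```shell
# Threshold in GB = segments * 256 MB / 1024.
awk 'BEGIN {
  printf "shared, POWER7: %d GB\n", 12 * 256 / 1024   # shm_1tb_shared default
  printf "shared, POWER6: %d GB\n", 44 * 256 / 1024
  printf "unshared:       %d GB\n", 256 * 256 / 1024  # shm_1tb_unshared default
}'
```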
How to verify 1 TB segment aliasing (LSA) usage
To determine if LSA is active for a specific process, you need to utilize the AIX kernel
debugger kdb, which is run as user root. To quit kdb, enter quit.
Important: The four vmo parameters are restricted tunables and must not be changed
unless recommended by IBM support. 32-bit applications are not affected by these
parameters.
Using kdb: Use caution when using kdb and follow the described procedure carefully. You
might terminate (kill) AIX from within kdb if you use the wrong commands.
The following IBM white paper describes how to verify 1 TB segment aliasing (LSA) usage for
the Oracle process, along with other testing scenarios. It also provides experiences with
tuning the Oracle SGA with 1 TB segment aliasing.
The method to verify other applications, for example, DB2, Informix, and so on, is similar to
the method that is described in this document:
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/5cb5ed706d254a8186256c71006d2e0a/a121c7a8a780d41a8625787200654e3a/$FILE/Oracle_DB_and_Large_Segment_Aliasing_v1.0.pdf
7.6.4 Memory affinity
AIX added enhanced affinity (memory affinity) support for POWER7 systems because the
POWER7 architecture is particularly sensitive to application memory affinity: a
POWER7 chip has four times as many cores as a POWER6 chip. The vmo tunable
enhanced_affinity_private can be tuned from 0 to 100 to improve application memory
locality.
The vmo tunable enhanced_affinity_private is set to 40 by default on AIX 6.1 TL6 and
later and on AIX 7.1; it is set to 20 on AIX 6.1 TL5. A value of 40 means that 40% of
application data is allocated locally (or “affinitized”) on the memory behind its home
POWER7 socket. The rest of the memory is striped across the memory behind all the
sockets in the partition.
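As a sketch of what the default value of 40 means in practice (the process memory size here is a hypothetical example, not from the text):

```shell
# With enhanced_affinity_private=40, about 40% of process private memory is
# allocated locally behind the home socket; the remainder is striped.
affinity_pct=40
mem_mb=102400    # hypothetical process private memory, in MB
local_mb=$(( mem_mb * affinity_pct / 100 ))
striped_mb=$(( mem_mb - local_mb ))
echo "local: $local_mb MB, striped: $striped_mb MB"
```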
In order to decide the need to implement this tuning parameter, an analysis of the AIX kernel
performance statistics and an analysis of the AIX kernel trace are required to determine if
tuning changes to the enhanced affinity tunable parameters can help to improve application
performance.
Considerations with memory affinity
Increasing the degree of memory locality can negatively affect application performance on
LPARs configured with low memory or with workloads that consist of multiple applications
with various memory locality requirements.
Determine how to configure enhanced_affinity_private or other memory affinity
parameters in a production environment after you have performed sufficient testing with the
application.
How to monitor memory affinity
Certain AIX commands, lssrad, mpstat, and svmon, are enhanced to retrieve POWER7
memory affinity statistics. For detailed information about how to use these commands, refer to
7.8.8, “Monitoring memory affinity statistics” on page 325.
Description of key memory affinity parameters
This section provides information about the key memory affinity parameters.
enhanced_affinity_vmpool_limit
Example 7-3 on page 294 shows a description of the enhanced_affinity_vmpool_limit with
the vmo command.
Important: The enhanced_affinity_private and enhanced_affinity_vmpool_limit vmo
tunables are restricted tunables and must not be changed unless recommended by IBM
support.
294 Power Systems Enterprise Servers with PowerVM Virtualization and RAS
Example 7-3 Description of enhanced_affinity_vmpool_limit vmo command parameter
# vmo -h enhanced_affinity_vmpool_limit
Help for tunable enhanced_affinity_vmpool_limit:
Purpose:
Specifies percentage difference of affinitized memory allocations in a vmpool
relative to the average across all vmpools.
Values:
Default: 10
Range: -1 - 100
Type: Dynamic
Unit: numeric
Tuning:
Affinitized memory allocations in a vmpool are converted to balanced allocations
if the affinitized percentage difference between the vmpool and the average across
all vmpools is greater than this limit.
enhanced_affinity_private
Example 7-4 shows a description of the enhanced_affinity_private parameter with the vmo
command.
Example 7-4 Description of enhanced_affinity_private vmo command parameter
# vmo -h enhanced_affinity_private
Help for tunable enhanced_affinity_private:
Purpose:
Specifies percentage of process private memory allocations that are affinitized by
default.
Values:
Default: 40
Range: 0 - 100
Type: Dynamic
Unit: numeric
Tuning:
This tunable limits the default amount of affinitized memory allocated for process
private memory. Affinitizing private memory may improve process performance.
However, too much affinitized memory in a vmpool can cause paging and impact
overall system performance.
7.6.5 Hardware memory prefetch
Hardware memory prefetch helps to improve the performance of applications that reference
memory sequentially by prefetching memory.
Hardware memory prefetch considerations
There might be adverse performance effects, depending on the workload characteristics.
Determine whether to disable hardware memory prefetch in a production environment after
sufficient testing with the application.
Important: The default value of the hardware memory prefetch must not be changed
unless recommended by IBM AIX support.
How to monitor and set prefetch settings
The dscrctl command lets the system administrator read the current settings for the
hardware streams mechanism and set a system-wide value for the Data Stream Control
Register (DSCR). Table 7-8 shows the options and descriptions of the command.
Table 7-8 Descriptions of the dscrctl command options

-q            Query: displays the number of hardware streams supported by the
              platform and the values of the firmware and operating system default
              prefetch depth.
-c            Cancel: cancels a permanent setting of the system default prefetch
              depth at boot time by removing the dscrctl command from the
              /etc/inittab file.
-n            Now: when used in conjunction with the -s flag, changes the runtime
              value of the operating system default prefetch depth. The change is
              not persistent across reboots.
-b            Boot: when used in conjunction with the -s flag, makes the change
              persistent across reboots by adding a dscrctl command to the
              /etc/inittab file.
-s dscr_value Set: defines the value for the new operating system default prefetch
              depth. The value is treated as a decimal number unless it starts
              with 0x, in which case it is treated as hexadecimal. The default is
              0x0; the value 0x1 disables the prefetcher.

For detailed information about the dscrctl command, refer to the IBM information
center website:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.ntl/RELNOTES/GI11-9815-00.htm

Example 7-5 shows how to query the characteristics of the hardware streams on the
system.

Example 7-5 Query the characteristics of the hardware streams on the system
# dscrctl -q
Current DSCR settings:
Data Streams Version = V2.06
number_of_streams = 16
platform_default_pd = 0x5 (DPFD_DEEP)
os_default_pd = 0x0 (DPFD_DEFAULT)

Example 7-6 shows how to disable the hardware default prefetch depth, effective
immediately. After a reboot, however, the setting is restored to the default value (0).

Example 7-6 Disable the hardware default prefetch depth
# dscrctl -n -s 1
# dscrctl -q
Current DSCR settings:
Data Streams Version = V2.06
number_of_streams = 16
platform_default_pd = 0x5 (DPFD_DEEP)
os_default_pd = 0x1 (DPFD_NONE)
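A script can check whether the OS default prefetch depth has been overridden by parsing this output; a sketch that uses the sample text from Example 7-6 in place of a live dscrctl -q call:

```shell
# Parse dscrctl -q style output; on a live AIX system, replace the sample
# here-string with the output of: dscrctl -q
sample='Current DSCR settings:
Data Streams Version = V2.06
number_of_streams = 16
platform_default_pd = 0x5 (DPFD_DEEP)
os_default_pd = 0x1 (DPFD_NONE)'

# Third whitespace-separated field on the os_default_pd line is the value.
os_pd=$(printf '%s\n' "$sample" | awk '/os_default_pd/ { print $3 }')
echo "os_default_pd: $os_pd"
[ "$os_pd" = "0x1" ] && echo "hardware memory prefetch is disabled"
```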
Example 7-7 shows how to keep the prefetcher disabled across reboots by adding the -b
flag. This command creates an entry in the /etc/inittab file.
Example 7-7 Disable hardware default prefetch depth with the -b flag
# dscrctl -n -b -s 1
The value of the DSCR OS default will be modified on subsequent boots
# dscrctl -q
Current DSCR settings:
Data Streams Version = V2.06
number_of_streams = 16
platform_default_pd = 0x5 (DPFD_DEEP)
os_default_pd = 0x1 (DPFD_NONE)
# cat /etc/inittab|grep dscrset
dscrset:2:once:/usr/sbin/dscrctl -n -s 1 >/dev/null 2>/dev/console
For most Java applications, we suggest that you turn off the hardware memory prefetch with
the AIX command dscrctl -n -s 1. For detailed information, refer to page 19 of the IBM
white paper, Java performance on POWER7 - Best Practice, at the following website:
http://public.dhe.ibm.com/common/ssi/ecm/en/pow03066usen/POW03066USEN.PDF
7.6.6 Simultaneous multithreading (SMT)
Simultaneous multithreading (SMT) is a hardware multithreading technology, which enables
the execution of multiple instructions from multiple code paths, or hardware threads, in a
single CPU clock cycle. POWER5 was the first IBM Power series processor to implement this
technology with the capability of using either one or two hardware threads per processor core.
With the IBM POWER7 generation and AIX 6.1, an additional two hardware threads are
available.
Although SMT is implemented in physical hardware, its use is enabled at the operating
system layer, requiring operating system software awareness of this feature. AIX 5.3 can
recognize up to two threads per core; AIX Version 6.1 or higher utilizes all four threads on
POWER7. In addition to AIX, four SMT threads can be used with IBM i 6.1.1, SUSE SLES 11
SP1, and Red Hat RHEL 6.
POWER7 offers three types of SMT: 1-way, 2-way, and 4-way. With 4-way SMT, you can
increase the number of instruction streams that your system can run on the same processor
core.
In 2-way SMT (SMT2), the number of the logical processors that the operating system sees is
double the number of the physical processor cores in the system. That number of logical
processors becomes quadrupled with the 4-way SMT (SMT4). Consequently, the overall
system capacity increases as the number of instruction streams increases.
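The logical processor arithmetic above is simple; as a sketch:

```shell
# Logical processors visible to the OS = physical cores x SMT threads per core.
cores=8
for smt in 1 2 4; do
  echo "SMT$smt: $(( cores * smt )) logical processors"
done
```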
In order to detect the need to implement this tuning, an analysis of AIX kernel performance
statistics and an analysis of the AIX kernel trace are required to determine if tuning changes
to SMT can help to improve application performance.
Considerations with SMT
Simultaneous multithreading is primarily beneficial in commercial environments where the
speed of an individual transaction is not as important as the total number of transactions that
are performed. Simultaneous multithreading is expected to increase the throughput of
workloads with large or frequently changing working sets, such as database servers and web
servers.
With SMT (2- or 4-way) on, POWER7 can deliver more total capacity because more tasks
are accomplished in parallel. The higher the CPU utilization of the application, the
higher the relative improvement in transaction rates and response times, even for
CPU-intensive calculation batch jobs.
Figure 7-18 shows a compute-intensive workload in POWER7 SMT2 and SMT4 modes, which
provide better throughput than Single Thread (ST) mode with the same CPU resources.
Figure 7-18 ST, SMT2, and SMT4 efficiency with compute-intensive workloads
Figure 7-19 shows a Java workload in POWER7 SMT2 and SMT4 modes, which also provides
better performance than ST mode with the same CPU resources.
Figure 7-19 ST, SMT2, and SMT4 efficiency with Java workloads
Applications that are single-process or single-threaded might benefit from running in
ST mode. ST mode can also be beneficial for a multi-process application where the
number of application processes is smaller than the number of cores assigned to the
partition. Certain workloads do not benefit much from SMT; mostly these are workloads
in which the majority of individual software threads use a large amount of resources
in the processor or memory. For example, workloads that are floating-point intensive
are likely to gain little from SMT, and they are the ones most likely to lose
performance, because they heavily use either the floating-point units or the memory
bandwidth.
Determine how to configure SMT in a production environment after sufficient testing with the
client application.
(Chart data for Figures 7-18 and 7-19, provided by the IBM STG Performance team: the
charts plot transactions per second against CPU utilization. The compute-intensive
workload reached 1774 transactions/s in ST mode, 2136 in SMT2 (ratio 1.20), and 2411
in SMT4 (ratio 1.36). The Java workload reached 330 transactions/s in ST mode, 457 in
SMT2 (ratio 1.38), and 628 in SMT4 (ratio 1.90).)
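The SMT-to-ST throughput ratios quoted with the figures can be recomputed from the chart values; a small portable sketch:

```shell
# Recompute the throughput ratios from Figures 7-18 and 7-19
# (transactions per second at full CPU utilization).
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

ratio 2136 1774   # compute-intensive: SMT2 vs ST -> 1.20
ratio 2411 1774   # compute-intensive: SMT4 vs ST -> 1.36
ratio 457  330    # Java: SMT2 vs ST -> 1.38
ratio 628  330    # Java: SMT4 vs ST -> 1.90
```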
The smtctl command controls enabling and disabling of the processor SMT mode (off,
two threads, or four threads). The smtctl command uses this syntax:
smtctl [ -m off | on [ -w boot | now ]]
smtctl [-t #SMT [-w boot | now ]]
Table 7-9 shows the description of each option.

Table 7-9 Descriptions of the smtctl command options

-m off    Sets simultaneous multithreading mode to disabled. This option cannot
          be used with the -t flag.
-m on     Sets simultaneous multithreading mode to enabled. With the -m flag,
          the maximum number of threads supported per processor is enabled.
          This option cannot be used with the -t flag.
-t #SMT   Sets the number of simultaneous threads per processor. The value can
          be one to disable simultaneous multithreading, two on systems that
          support 2-way simultaneous multithreading, or four on systems that
          support 4-way simultaneous multithreading. This option cannot be used
          with the -m flag.
-w boot   Makes the simultaneous multithreading mode change effective on the
          next and subsequent reboots if you run the bosboot command before the
          next system reboot.
-w now    Makes the simultaneous multithreading mode change immediately, but
          the change does not persist across reboots. If neither -w boot nor
          -w now is specified, the mode change is made immediately and persists
          across subsequent reboots if you run the bosboot command before the
          next system reboot.

For more information, refer to the smtctl man page (man smtctl).

Case study
This paper illustrates the use of SMT for a client’s Oracle Data Warehouse workload
running on AIX. The parallel degree that is used to execute a particular query can
have a significant effect on the run time for that query. Oracle’s CPU_COUNT
parameter, determined by the number of virtual processors and the SMT value, is the
key determining factor for Oracle’s default parallel degree. For this client’s
workload, changing from SMT2 to SMT4 appeared to increase the total run time of
their jobs by over 10%.
http://w3-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101824

APARs: We suggest that you apply AIX APAR IZ97088, which enhances SMT4 performance,
and AIX APAR IZ96303, which resolves AIX crash issues when the processor number
exceeds 32 and you switch the SMT mode from 2 to 4.
7.6.7 New features of XL C/C++ V11.1
IBM XL C for AIX is a standards-based, high-performance C compiler with advanced
optimizing and debugging features. IBM XL C for AIX 11.1 introduces enhancements for
exploitation of the latest POWER7 architecture:
Vector unit and vector scalar extension (VSX) instruction support
Mathematical Acceleration Subsystem (MASS) libraries support
Power instruction direct control support
New arch and tune compiler options
Vector unit and vector scalar extension (VSX) instruction support
IBM XL C for AIX V11.1 supports the VSX instruction set in the POWER7 processor. New
data types and built-in functions are introduced to support the VSX instructions,
allowing you to efficiently manipulate vector operations in your applications. The
advanced compiler optimizer can also automatically take advantage of these vector
facilities to help parallelize your applications.
Mathematical Acceleration Subsystem (MASS) libraries support
The highly tuned MASS libraries are enhanced to support the POWER7 processor:
The vector functions within the vector MASS library are tuned for the POWER7
architecture. The functions can be used in either 32-bit or 64-bit mode.
New functions, such as exp2, exp2m1, log21p, and log2, are added in both
single-precision and double-precision functional groups. In addition, functions that support
the previous Power processors are enhanced to support POWER7 processors.
A new MASS single-instruction, multiple-data (SIMD) library that is tuned for the POWER7
processor is provided. It contains an accelerated set of frequently used mathematical
functions.
Power instruction direct control support
New built-in functions unlock POWER7 processor instructions to enable you to take direct
control at the application level:
POWER7 prefetch extensions and cache control instructions
POWER7 hardware instructions
New arch and tune compiler options
New arch and tune compiler options are added to specify code generation for the POWER7
processor architecture:
-qarch=pwr7 instructs the compiler to produce code that can fully exploit the POWER7
hardware architecture.
-qtune=pwr7 enables optimizations that are specifically tuned for the POWER7 hardware
platforms.
For more information about IBM XL C/C++ V11.1 for AIX, refer to the IBM website:
http://www-947.ibm.com/support/entry/portal/Documentation/Software/Rational/XL_C~C++_for_AIX
7.6.8 How to deal with unbalanced core and memory placement
Unbalanced core and memory placement happens when you frequently shut down and
activate partitions that use various amounts of resources (cores and memory), or when
you frequently use DLPAR operations to change those resources. This configuration
results in a performance decrease that is greater than you might originally expect.
How to detect unbalanced core and memory placement
To display core and memory placement on an LPAR, use the lssrad command. The REF1
column in the output of Example 7-8 is the first hardware-provided reference point that
identifies sets of resources that are near each other. SRAD is the Scheduler Resource
Allocation Domain. Cores and memory must be allocated from the same REF1 and SRAD.
The lssrad output that is shown in Example 7-8 shows an undesirable placement of an LPAR
with 32 cores and 256 GB of memory. Pinned memory, such as large pages, is not reflected in
the output.
Example 7-8 The output of the lssrad command
# lssrad -va
REF1 SRAD MEM CPU
0
     0 64287.00 0-31
1
     1 8945.56 32-47
     2 0.00 48-79
2
     3 0.00 80-99
     4 1893.00 100-127
3
     5 74339.00
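The lssrad output can be post-processed to flag SRADs that hold memory but no CPUs, which is the signature of the undesirable placement above; a sketch that parses sample text mirroring Example 7-8 (on a live system, pipe lssrad -va instead):

```shell
# Sum the MEM column and flag memory-only SRADs in lssrad -va style output.
sample='REF1 SRAD MEM CPU
0
     0 64287.00 0-31
1
     1 8945.56 32-47
     2 0.00 48-79
2
     3 0.00 80-99
     4 1893.00 100-127
3
     5 74339.00'

out=$(printf '%s\n' "$sample" | awk '
  # SRAD rows have a numeric SRAD in $1 and a decimal MEM value in $2;
  # REF1-only lines and the header are skipped.
  $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+\.[0-9]+$/ {
    total += $2
    if (NF == 2 && $2 > 0) printf "SRAD %s: %.2f MB but no CPUs\n", $1, $2
  }
  END { printf "total MEM: %.2f MB\n", total }')
printf '%s\n' "$out"
```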
How to handle unbalanced core and memory placement
In many cases, unbalanced core and memory placement does not affect performance. It is
unnecessary to fix this situation.
The performance of certain applications, especially memory-sensitive applications, might be
affected by unbalanced core and memory placement, for example, > 10%. If CPU and
memory resources are insufficient in the server, and you have to optimize the core and
memory placement, refer to the following steps to release the resources manually.
Follow these steps:
1. Shut down all the partitions in the server.
2. Log in to the HMC as the hscroot user and execute the following commands for every
LPAR (the managed system name, quantities, and LPAR IDs are placeholders):
chhwres -r mem -m <managed system> -o r -q <memory in MB> --id <lpar ID>
chhwres -r proc -m <managed system> -o r --procs <processors> --id <lpar ID>
Deactivation: Deactivating an existing LPAR does not free its resources, so you cannot
optimize the core and memory placement merely by deactivating and rebooting LPARs.
Commands: The first command frees all the memory, and the second command frees
all the cores.
3. Activate the LPARs in the order of performance priority.
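As a sketch, the per-LPAR release commands from step 2 can be generated in a loop. The managed system name below matches the one used in Example 7-10, the commands are only printed (not run), and the -q and --procs quantities must still be supplied with the amounts to release:

```shell
# Print (do not run) the HMC commands that release memory and processors
# for LPAR IDs 1 through 6 on one managed system.
managed=SVRP7780-04-SN0661F4P
out=$(for id in 1 2 3 4 5 6; do
  echo "chhwres -r mem  -m $managed -o r --id $id"
  echo "chhwres -r proc -m $managed -o r --id $id"
done)
printf '%s\n' "$out"
```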
For example, we have one Power 780 server. There are six LPARs in the server: one LPAR for
the DB partition, two LPARs for the application partitions, and three LPARs for the web
partitions. Figure 7-20 shows the HMC view.
Figure 7-20 LPAR configuration of the Power 780 server
After running dynamic LPAR operations on the DB partition, the core and memory
placement is no longer optimal, as shown in Example 7-9.
Example 7-9 Checking the CPU and memory placement
# lssrad -av
REF1 SRAD MEM CPU
0
0 31714.00 0-7
1
1 0.00 8-19
2
2 0.00 20-27
3 0.00 28-31
We then shut down all the LPARs through the AIX command line or the HMC GUI, log in
to the HMC command line as the hscroot user, and execute the commands shown in
Example 7-10.
Example 7-10 Release LPAR resources with the chhwres command manually
hscroot@HMC50:~> chhwres -r mem -m SVRP7780-04-SN0661F4P -o r --id 1 -q 32768
hscroot@HMC50:~> chhwres -r mem -m SVRP7780-04-SN0661F4P -o r --id 2 -q 24576
hscroot@HMC50:~> chhwres -r mem -m SVRP7780-04-SN0661F4P -o r --id 3 -q 24576
hscroot@HMC50:~> chhwres -r mem -m SVRP7780-04-SN0661F4P -o r --id 4 -q 24576
hscroot@HMC50:~> chhwres -r mem -m SVRP7780-04-SN0661F4P -o r --id 5 -q 24576
hscroot@HMC50:~> chhwres -r mem -m SVRP7780-04-SN0661F4P -o r --id 6 -q 24576
After the commands execute successfully, we can see that the values of the Processing Units
column and Memory column in the HMC view, as shown in Figure 7-21, have changed to 0.
This value means that the resource has been released.
Figure 7-21 The HMC view after releasing the resource manually
Then, we activate the LPARs in order of their performance priority level, as shown in
Example 7-11.
Example 7-11 HMC commands to activate LPARs
hscroot@HMC50:~> chsysstate -r lpar -m SVRP7780-04-SN0661F4P -o on --id 1 -f lpar1
hscroot@HMC50:~> chsysstate -r lpar -m SVRP7780-04-SN0661F4P -o on --id 2 -f lpar2
hscroot@HMC50:~> chsysstate -r lpar -m SVRP7780-04-SN0661F4P -o on --id 3 -f lpar7
hscroot@HMC50:~> chsysstate -r lpar -m SVRP7780-04-SN0661F4P -o on --id 4 -f lpar4
hscroot@HMC50:~> chsysstate -r lpar -m SVRP7780-04-SN0661F4P -o on --id 5 -f lpar5
hscroot@HMC50:~> chsysstate -r lpar -m SVRP7780-04-SN0661F4P -o on --id 6 -f lpar6
After all the LPARs are started, we check the CPU and memory placement again with the
lssrad command, as shown in Example 7-12.
Example 7-12 Checking core and memory placement
# lssrad -av
REF1 SRAD MEM CPU
0
0 31714.00 0-31
Now, the core and memory placement is optimized. If this method does not produce
optimal placement, reboot the CEC to fix it.
Unbalanced core and memory placement considerations
To avoid this situation, at the time of writing, it is better to plan the partitions’
configurations carefully before activating them and performing DLPAR operations. IBM
intends to fix this situation in a future release of the system firmware.
7.6.9 AIX performance tuning web resources
This section provides web resources for AIX performance tuning:
AIX 7.1 Information Center performance management and tuning
This IBM information center topic contains links to information about managing and tuning
the performance of your AIX system: Performance management, Performance Tools
Guide, and Performance Toolbox Version 2 and 3 Guide and Reference. The first link
(Performance management) provides application programmers, service support
representatives (SSRs), system engineers, system administrators, experienced users,
and system programmers with complete information about how to perform tasks, such as
assessing and tuning the performance of processors, file systems, memory, disk I/O,
Network File System (NFS), Java, and communications I/O:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.doc/doc/base/performance.htm
AIX on Power performance FAQ
This IBM white paper is intended to address the most frequently asked questions
concerning AIX performance on Power Systems and to provide guidelines for the most
commonly seen performance issues:
ftp://ftp.software.ibm.com/common/ssi/rep_wh/n/POW03049USEN/POW03049USEN.PDF
Database performance tuning on AIX
This IBM Redbooks publication provides information about the database’s life cycle, in the
planning and sizing stage, during implementation, and when running a productive
database system. It also describes many tuning experiences for databases on AIX. The
databases include DB2, Informix, and Oracle:
http://www.redbooks.ibm.com/abstracts/sg245511.html?Open
7.7 IBM i performance considerations
In this section, we provide performance considerations for IBM i.
7.7.1 Overview
IBM i has excellent scalability features and uses the POWER7 architecture without any extra
and special tuning. Due to the multithreading nature of IBM i, all applications automatically
take advantage of the underlying operating system, microcode, and hardware. For example,
IBM i exploits the four SMT threads per processor, as well as the many processors available
in POWER7.
The new Level 3 cache design (the cache is on the processor chip and shared among all
eight cores on the chip; refer to Figure 7-22 on page 304), together with the
capability to “lateral cast out” instructions and data elements to the caches of the
seven other cores on the chip, helps to free up the Level 2 and Level 3 caches. This
design dramatically improves the performance and throughput of highly interactive
applications, whether they are “green screen” or web-based applications. Sharing
commonly used data in the high-speed Level 3 cache among eight cores drastically
reduces the need to fetch the data from real memory. In addition, because the memory
bus is much wider and faster on POWER7 than before, applications that consume a lot
of memory and process a large amount of database information gain performance when
migrated to POWER7.
Figure 7-22 The new Level 3 cache design of POWER7
Processor chips are tightly interconnected with high-speed fabrics to ensure greater
performance when application workloads span multiple processor chips at the same time.
Figure 7-23 shows a conceptual picture of the processor interconnection in a Power 750.
Figure 7-23 The processor interconnection in POWER7 (750)
(Figure 7-22 depicts eight cores per POWER7 chip, each core with a 256 KB L2 cache
and a 4 MB local L3 cache region, attached to memory DIMMs, with “lateral cast-out”
between the on-chip L3 regions.)
The Power 780 has the following conceptual interconnection fabric design, as shown in
Figure 7-24.
Figure 7-24 The interconnection fabric design of Power 780
The hypervisor in POWER7 ensures that processors and memory resources are as close as
possible to each other to use both processor and memory affinity. However, you can improve
the affinity further during the setup of partitions by considering the eight cores and the
amount of memory when partitioning the system. A good example is to design an eight
processor partition even though there might be a need for a ninth processor eventually. The
hypervisor takes the eight core request and attempts to place these eight cores on a single
processor chip if possible and if they are not in conflict with other previously defined partitions.
7.7.2 Optimizing POWER7 performance through tuning system resources
All known IBM i tuning practices today can be applied to POWER7, as well. For example,
providing enough processors and memory to each of the IBM i partitions is a good base for
excellent performance. A good assumption is to use about 8 GB of memory for each of the
processors being allocated in the partition to fully exploit the processing capabilities. Having
enough disk arms available (not to be confused with enough disk capacity) is essential to
sustain good throughput when running any commercial application.
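The 8 GB-per-processor rule of thumb above can be expressed as a trivial sketch:

```shell
# Rule of thumb from the text: roughly 8 GB of memory per allocated processor.
suggest_mem_gb() { echo $(( $1 * 8 )); }

suggest_mem_gb 4    # 4 processors -> 32 GB
suggest_mem_gb 16   # 16 processors -> 128 GB
```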
There are a few tuning options in IBM i, which become more important and sensitive with
POWER7 than with prior systems. Processors on POWER7 systems can be shared or
dedicated to a partition. In addition, there can be multiple processor pools that are defined,
each with a certain use and characteristics. Shared processors, in general, are
highly effective for multithreaded applications, such as interactive 5250
applications, web-based applications using HTTP, and application and database servers
with hundreds or thousands of clients connected to the system, each with a separate
thread.
Typically, these workloads are tuned for the best possible throughput. A good setup strategy
and design for the best performance is to round up or round down processor fractions to full
processors, if possible. For example, 0.9 or 1.1 processors need to be one processor, 1.9 or
2.1 processors can be better defined as 2 processors, and so on. This approach also
provides a good, even relationship to the number of virtual processors for the assigned
physical processors.
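The rounding guidance above (0.9 or 1.1 becomes one processor, 1.9 or 2.1 becomes two) can be sketched as:

```shell
# Round a fractional processor entitlement to the nearest whole processor.
round_procs() { awk -v e="$1" 'BEGIN { printf "%d\n", e + 0.5 }'; }

round_procs 0.9   # -> 1
round_procs 1.1   # -> 1
round_procs 1.9   # -> 2
round_procs 2.1   # -> 2
```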
Dedicated processors are used best for ensuring that the processors are instantaneously
available to the partition to guarantee good response times. Also, dedicated processors are
best for applications that are sensitive to stolen level 2 and 3 cache information, as well as
processor cycles being used by other partitions. Batch applications typically benefit most from
dedicated processors. Also, other transaction-oriented applications with a high degree of
response time requirements also benefit from dedicated processors.
POWER7 provides simultaneous multithreading (SMT), which consists of SMT4 (four
processor threads concurrently) and automatic switching between single thread (ST), SMT2,
and SMT4 mode. SMT is not always beneficial for all workloads. Single-threaded processes
work more efficiently when executing in single-threaded mode rather than SMT2 or SMT4
mode. Because the system itself can determine what mode is best for the workload currently
running, it relieves system administrators from having to make decisions and trade-offs for the
best performance and highest optimization levels.
The POWER7 system can run in either POWER6 (compatibility mode) or in POWER7 mode.
The mode is determined and set in the HMC (refer to Figure 7-25). Although mostly ignored
with IBM i, there are a few slight consequences in terms of SMT. POWER6 supports only
SMT2, and therefore, a POWER7 system in POWER6 mode only runs in SMT2 mode.
Figure 7-25 Configure processor compatibility mode
IBM i versions 6.1 and 7.1 are supported on POWER7; however, there are differences in
the number of hardware threads that can be active at any single point in time. The
maximum number of threads supported by IBM i 6.1 is 128. IBM i 7.1 supports 256
hardware threads on POWER7 in a single partition. You can run up to 64 processors
with either version, but 6.1 supports only SMT2 when 64 processors are used; IBM i
7.1 supports 64 processors with SMT4, up to a maximum of 256 hardware threads.
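The thread limits above follow directly from processors times SMT level; as a sketch:

```shell
# Maximum hardware threads in one IBM i partition = processors x SMT level.
max_threads() { echo $(( $1 * $2 )); }

max_threads 64 2   # IBM i 6.1: 64 processors with SMT2 -> 128
max_threads 64 4   # IBM i 7.1: 64 processors with SMT4 -> 256
```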
In case you need more than 64 processors in one IBM i partition, you can request special
support from IBM Lab Services by going to the following website:
http://www.ibm.com/systems/services/labservices
7.8 Enhanced performance tools of AIX for POWER7
In this section, we introduce new features with AIX commands on the POWER7 server
platform:
Monitoring POWER7 processor utilization
Monitoring power saving modes
Monitoring CPU frequency using the lparstat command
Monitoring hypervisor statistics
Capabilities for 1024 CPU support
Monitoring block I/O statistics
Monitoring Active Memory Expansion (AME) statistics
Monitoring memory affinity statistics
Monitoring the available CPU units in a processor pool
Monitoring the remote node statistics using the perfstat library in a clustered AIX
environment
7.8.1 Monitoring POWER7 processor utilization
POWER7 introduces improved reporting of the consumed capacity of a processor. This
section explains the difference in processor utilization reporting between POWER5,
POWER6, and POWER7.
Figure 7-26 on page 308 illustrates how processor utilization is reported on POWER5,
POWER6, and POWER7. On POWER5 and POWER6, when one of the two hardware
threads in SMT2 mode is busy (T0) while the other hardware thread is idle (T1), the utilization
of the processor is 100%. On POWER7, the utilization of the processor in SMT2 mode is
around 69%, providing a better view about how much capacity is available.
In SMT4 mode, with one hardware thread busy (T0) and the other three hardware threads idle
(T1, T2, and T3), the utilization of the processor is around 63%. The processor’s utilization in
SMT4 mode is less than in SMT2 mode, because it has more capacity available through the
additional two hardware threads.
308 Power Systems Enterprise Servers with PowerVM Virtualization and RAS
Figure 7-26 Differing processor utilization among POWER5, POWER6, POWER7 SMT2, and SMT4
The following examples demonstrate the CPU utilization for a single-threaded program
running in SMT4, SMT2, and ST modes. All of these measurements were taken on a
POWER7 LPAR with two physical processors.
The first example (Example 7-13) demonstrates the CPU utilization, as reported by the sar
command, when running a single-threaded application in SMT4 mode. It shows that the
single-threaded program consumed an entire logical CPU (cpu4) but not the entire capacity of
the processor.
Example 7-13 CPU utilization when the single-thread process is running with SMT4
# sar -P ALL 1 20
System configuration: lcpu=8 mode=Capped
17:49:05 cpu %usr %sys %wio %idle physc
17:49:07 0 0 2 0 98 0.25
1 0 0 0 100 0.25
2 0 0 0 100 0.25
3 0 0 0 100 0.25
4 100 0 0 0 0.63
5 0 0 0 99 0.12
6 0 0 0 100 0.12
7 0 0 0 100 0.12
- 32 0 0 68 2.00
Example 7-14 shows the output after switching the SMT mode from 4 to 2 (smtctl -t 2). The
same program is running on logical cpu5, and it consumes an entire logical CPU but now it is
consuming 69% of the processor’s capacity.
Example 7-14 CPU utilization when the single-thread process is running with SMT2
# sar -P ALL 2 10
System configuration: lcpu=4 mode=Capped
17:48:18 cpu %usr %sys %wio %idle physc
17:48:20 0 0 2 0 98 0.50
1 0 0 0 100 0.50
4 0 0 0 100 0.31
5 100 0 0 0 0.69
- 35 0 0 65 2.00
Example 7-15 shows the output after switching SMT mode from 2 to single thread
(smtctl -t 1). The same program is running on logical cpu4, and it consumes the entire
capacity of a processor because there are no other hardware threads available to execute
code.
Example 7-15 CPU utilization when single-thread process is running with ST (single thread) mode
# sar -P ALL 2 10
System configuration: lcpu=2 mode=Capped
17:47:29 cpu %usr %sys %wio %idle
17:47:31 0 0 1 0 99
4 100 0 0 0
- 50 1 0 50
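The three measurements above can be condensed into one small calculation. The following is an illustrative sketch (not an AIX tool): the share of a core's capacity consumed by the single busy thread is its physc value divided by the core's total physc across all of its SMT threads, using the per-thread values shown in Example 7-13, Example 7-14, and Example 7-15.

```python
# Illustrative sketch: per-core utilization attributed to the single busy
# hardware thread, from the per-thread physc values that sar reported.
def core_busy_fraction(thread_physc):
    """First entry is the busy thread; the rest are its idle siblings."""
    return thread_physc[0] / sum(thread_physc)

smt4 = core_busy_fraction([0.63, 0.12, 0.12, 0.12])  # Example 7-13: ~63%
smt2 = core_busy_fraction([0.69, 0.31])              # Example 7-14: 69%
st = core_busy_fraction([1.00])                      # Example 7-15: 100%
print(round(smt4, 2), round(smt2, 2), round(st, 2))
```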
If you want to learn more about processor utilization on Power Systems, refer to the IBM
developerWorks article, “Understanding Processor Utilization on Power Systems - AIX”. This
article covers in detail how processor utilization is computed in AIX and what changes it has
undergone in the past decade in sync with the IBM Power processor technology changes:
http://www.ibm.com/developerworks/wikis/display/WikiPtype/Understanding+Processor+
Utilization+on+POWER+Systems+-+AIX
7.8.2 Monitoring power saving modes
In AIX 6.1 TL6 or AIX 7.1, there are enhanced features with the lparstat, topas, and
topas_nmon commands: they can now display the power saving mode.
There are four values for the power saving mode, as shown in Table 7-10.
Table 7-10 Description of power saving modes
Value                 Description
Disabled              Power saver mode is disabled
Static                Static power savings
Dynamic-performance   Dynamic power savings favoring performance
Dynamic-power         Dynamic power savings favoring power
Example 7-16, Example 7-17, and Example 7-18 on page 310 show the power saving mode
features.
Example 7-16 Monitoring the power saving mode using lparstat -i
# lparstat -i|grep Power
Power Saving Mode : Static Power Savings
Example 7-17 Monitoring the power saving mode using topas -L
Interval:2 Logical Partition: p29n01 Tue May 24 20:38:01 2011
Psize: 64.0 Shared SMT 4 Online Memory: 96.00G
Power Saving: Static
Ent: 2.00 Mode: Capped Online Logical CPUs:16
Partition CPU Utilization Online Virtual CPUs:4
%usr %sys %wait %idle physc %entc app vcsw phint %lbusy %hypv hcalls
0.0 0.4 0.0 99.6 0.01 0.69 63.98 264 0 0.2 0.0 0
================================================================================
LCPU MINPF MAJPF INTR CSW ICSW RUNQ LPA SCALLS USER KERN WAIT IDLE PHYSC LCSW
0 69.0 0 275 308 121 0 100 128.00 4.0 87.1 0.0 8.9 0.01 227
2 0 0 17.0 0 0 0 0 0 0.0 1.9 0.0 98.1 0.00 17.0
3 0 0 10.0 0 0 0 0 0 0.0 0.9 0.0 99.1 0.00 10.0
1 0 0 10.0 0 0 0 0 0 0.0 0.8 0.0 99.2 0.00 10.0
Example 7-18 Monitoring the power saving mode using nmon and entering “r”
+-topas_nmon--h=Help-------------Host=p29n01---------Refresh=2 secs---20:38.56
| Resources ------------------------------------------------------------------
|OS has 16 PowerPC_POWER7 (64 bit) CPUs with 16 CPUs active SMT=4
|CPU Speed 3864.0 MHz SerialNumber=105E85P MachineType=IBM,9179-MHB
|Logical partition=Dynamic HMC-LPAR-Number&Name=1,p29n01
|AIX Version=7.1.0.2 TL00 Kernel=64 bit Multi-Processor
|Power Saving=Static
|Hardware-Type(NIM)=CHRP=Common H/W Reference Platform Bus-Type=PCI
|CPU Architecture =PowerPC Implementation=POWER7
|CPU Level 1 Cache is Combined Instruction=32768 bytes & Data=32768 bytes
| Level 2 Cache size=not available Node=p29n01
|Event= 0 --- --- SerialNo Old=--- Current=F65E85 When=---
7.8.3 Monitoring CPU frequency using lparstat
The IBM energy saving features let the user modify the CPU frequency. The frequency can be
set to any selected value (static power saver mode) or can be set to vary dynamically
(dynamic power saver mode). In AIX 5.3 TL11 and AIX 6.1 TL4, the lparstat command
provides new options to monitor the CPU frequency:
-E
It shows both the actual and normalized CPU utilization metrics.
-w
It works with the -E flag to provide a wider report.
Example 7-19 shows the report when executing the command in one LPAR in static power
saving mode. The actual frequency is 3.1 GHz. The nominal frequency is 3.864 GHz.
Example 7-19 Monitoring processor frequency using lparstat -E
#lparstat -E 2
System configuration: type=Shared mode=Capped smt=4 lcpu=16 mem=180224MB ent=2.00
Power=Static
Physical Processor Utilisation:
--------Actual-------- ------Normalised------
user sys wait idle freq user sys wait idle
---- ---- ---- ---- --------- ---- ---- ---- ----
1.926 0.006 0.000 0.068 3.1GHz[ 81%] 1.565 0.005 0.000 0.430
1.926 0.005 0.000 0.069 3.1GHz[ 81%] 1.565 0.004 0.000 0.431
1.891 0.009 0.000 0.099 3.1GHz[ 81%] 1.537 0.008 0.000 0.456
Tip: The Power Saving Mode field shows “-” with the lparstat -i command and shows
“Unknown” with the topas -L and nmon commands when the power modes are not
supported.
When this LPAR is running, there are 16 logical CPUs in 4-way SMT mode, the entitled
processing capacity is 2 CPUs, and the number of virtual processors is 4. The actual
metrics use the PURR counters, and the normalized metrics use the SPURR counters [13].
The values that are shown in each view are the physical processors that are used. Adding
up all of the values (user, sys, idle, and wait) equals the total entitlement of the
partition in both the actual and the normalized views. The PURR-based idle value shows
the current idle capacity; the SPURR-based idle value shows approximately what the idle
capacity would be if the CPU ran at the nominal frequency.
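As a rough illustration of that relationship, the sketch below rescales the actual busy metrics from Example 7-19 by the current-to-nominal frequency ratio and treats normalized idle as the leftover entitlement. This linear scaling is an assumption for illustration only; the real accounting is done by the PURR and SPURR hardware counters.

```python
# Sketch (assumed linear scaling): normalized busy = actual busy * f/f_nom,
# normalized idle = entitlement - normalized busy. Inputs from Example 7-19.
ENTITLEMENT = 2.00   # ent=2.00 from the lparstat header
NOMINAL_GHZ = 3.864  # nominal POWER7 frequency
ACTUAL_GHZ = 3.1     # static power saver mode frequency

def normalize(user, sys_, wait):
    ratio = ACTUAL_GHZ / NOMINAL_GHZ
    n_user, n_sys, n_wait = user * ratio, sys_ * ratio, wait * ratio
    n_idle = ENTITLEMENT - (n_user + n_sys + n_wait)
    return n_user, n_sys, n_wait, n_idle

n_user, n_sys, n_wait, n_idle = normalize(1.926, 0.006, 0.000)
print(f"user={n_user:.3f} idle={n_idle:.3f}")  # close to the reported 1.565 / 0.430
```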
Example 7-20 shows the report of the lparstat -Ew command with its long output.
Example 7-20 Monitoring the processor frequency using lparstat -Ew
#lparstat -Ew 2
System configuration: type=Shared mode=Capped smt=4 lcpu=16 mem=180224MB ent=2.00
Power=Static
Physical Processor Utilisation:
--------------------Actual-------------------- ------------------Normalised------------------
user sys wait idle freq user sys wait idle
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1.8968[95%] 0.0046[0%] 0.0000[0%] 0.0985[5%] 3.1GHz[81%] 1.5412[77%] 0.0038[0%] 0.0000[0%] 0.4551[23%]
1.8650[93%] 0.0100[1%] 0.0031[0%] 0.1218[6%] 3.1GHz[81%] 1.5153[76%] 0.0082[0%] 0.0025[0%] 0.4740[24%]
1.8944[95%] 0.0047[0%] 0.0000[0%] 0.1009[5%] 3.1GHz[81%] 1.5392[77%] 0.0038[0%] 0.0000[0%] 0.4570[23%]
1.8576[93%] 0.0057[0%] 0.0017[0%] 0.1349[7%] 3.1GHz[81%] 1.5093[75%] 0.0047[0%] 0.0014[0%] 0.4846[24%]
7.8.4 Monitoring hypervisor statistics
The POWER Hypervisor (PHYP) is a core component of PowerVM virtualization
technology. It is firmware that resides in flash memory. At times, you need to monitor its
activities.
The AIX commands topas and lparstat show the hypervisor statistics.
[13] For detailed information about PURR and SPURR, refer to the following website:
http://www.ibm.com/developerworks/wikis/display/WikiPtype/CPU+frequency+monitoring+using+lparstat
Clarification: In shared uncapped mode, the result of this command, as shown in
Example 7-20, might differ, because the actual processor consumption can exceed
the entitlement. In that case, adding these values might not equal the entitlement.
In Example 7-20, the Power field is not shown when power modes are not supported.
topas
Two hypervisor statistics are displayed in a report of the topas -L command:
hpi
The aggregate number of hypervisor page faults that have occurred for all of the LPARs in
the pool.
hpit
The aggregate time, in milliseconds, that is spent waiting for hypervisor page-ins by all
of the LPARs in the pool.
Example 7-21 shows the report after running topas -L.
Example 7-21 Monitoring hypervisor statistics using topas -L
Interval:2 Logical Partition: lpar1 Tue May 24 21:10:32 2011
Psize: - Shared SMT 4 Online Memory: 6.00G
Power Saving: Disabled
Ent: 0.10 Mode: Un-Capped Online Logical CPUs: 4
Mmode: Shared IOME: 668.00 Online Virtual CPUs: 1
Partition CPU Utilization
%usr %sys %wait %idle physc %entc app vcsw phint hpi hpit pmem iomu
0.3 6.3 0.0 93.4 0.02 15.44 - 278 0 2.00 0 1.78 13.05
================================================================================
LCPU MINPF MAJPF INTR CSW ICSW RUNQ LPA SCALLS USER KERN WAIT IDLE PHYSC LCSW
0 55.0 0 278 158 74.0 0 100 121.00 3.9 85.9 0.0 10.2 0.01 229
1 0 0 14.0 16.0 8.00 0 100 4.00 0.6 4.1 0.0 95.3 0.00 19.0
3 0 0 18.0 0 0 0 0 0 0.0 1.6 0.0 98.4 0.00 18.0
2 0 0 11.0 0 0 0 0 0 0.0 1.0 0.0 99.0 0.00 11.0
lparstat -h
The following statistics are displayed when the -h flag is specified:
%hypv
This column indicates the percentage of physical processor consumption spent making
hypervisor calls.
hcalls
This column indicates the average number of hypervisor calls that are started.
Example 7-22 shows the report after running the lparstat -h command.
Example 7-22 Monitoring hypervisor statistics using lparstat -h
# lparstat -h 2
System configuration: type=Shared mode=Capped smt=4 lcpu=16 mem=180224MB psize=64
ent=2.00
%user %sys %wait %idle physc %entc lbusy app vcsw phint %hypv hcalls %nsp
----- ----- ------ ------ ----- ----- ------ --- ----- ----- ------ ------ -----
0.0 0.4 0.0 99.6 0.02 0.8 1.3 64.00 874 0 86.3 963 81
0.0 0.3 0.0 99.6 0.01 0.6 0.0 64.00 694 0 99.4 790 81
0.0 0.5 0.0 99.5 0.02 0.8 1.9 63.99 92 0 82.5 108 81
lparstat -H
The following statistics are displayed when the -H flag is specified.
The report of the lparstat -H command provides detailed hypervisor information: it
displays statistics for each of the hypervisor calls. The following columns are displayed
for each hypervisor call:
Number of calls
Number of hypervisor calls made
%Total Time Spent
Percentage of total time spent in this type of call
%Hypervisor Time Spent
Percentage of hypervisor time spent in this type of call
Avg Call Time (ns)
Average call time for this type of call in nanoseconds
Max Call Time (ns)
Maximum call time for this type of call in nanoseconds
Example 7-23 shows the output of the lparstat -H command.
Example 7-23 Monitoring hypervisor statistics using lparstat -H
lparstat -H 2
System configuration: type=Shared mode=Capped smt=4 lcpu=16 mem=180224MB psize=64
ent=2.00
Detailed information on Hypervisor Calls
Hypervisor Number of %Total Time %Hypervisor Avg Call Max Call
Call Calls Spent Time Spent Time(ns) Time(ns)
remove 0 0.0 0.0 0 1250
read 0 0.0 0.0 0 0
nclear_mod 0 0.0 0.0 0 0
page_init 11 0.0 0.0 480 1937
clear_ref 0 0.0 0.0 0 0
protect 0 0.0 0.0 0 0
put_tce 0 0.0 0.0 0 0
xirr 2 0.0 0.0 1140 1593
eoi 2 0.0 0.0 546 1375
ipi 0 0.0 0.0 0 0
cppr 2 0.0 0.0 312 500
asr 0 0.0 0.0 0 0
others 0 0.0 0.0 0 3875
enter 11 0.0 0.0 360 2812
cede 518 75.8 99.8 54441 2595109
migrate_dma 0 0.0 0.0 0 0
put_rtce 0 0.0 0.0 0 0
confer 0 0.0 0.0 0 0
prod 16 0.0 0.0 480 2000
get_ppp 1 0.0 0.0 3343 4031
set_ppp 0 0.0 0.0 0 0
purr 0 0.0 0.0 0 0
pic 1 0.1 0.1 31750 31750
bulk_remove 0 0.0 0.0 0 0
send_crq 0 0.0 0.0 0 0
copy_rdma 0 0.0 0.0 0 0
get_tce 0 0.0 0.0 0 0
send_logical_lan 0 0.0 0.0 0 0
add_logicl_lan_buf 0 0.0 0.0 0 0
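Reports like Example 7-23 can be post-processed to spot the dominant call. The following is a sketch with two sample rows copied from the output above; a real script would read the full command output.

```python
# Find the hypervisor call with the highest %Total Time Spent.
# Columns: name, number of calls, %total, %hypervisor, avg ns, max ns.
SAMPLE_ROWS = [
    "cede 518 75.8 99.8 54441 2595109",
    "page_init 11 0.0 0.0 480 1937",
]

def dominant_call(rows):
    """Return the name and %Total Time Spent of the busiest call."""
    best = max(rows, key=lambda r: float(r.split()[2]))
    fields = best.split()
    return fields[0], float(fields[2])

print(dominant_call(SAMPLE_ROWS))  # the cede call dominates in Example 7-23
```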
7.8.5 Capabilities for 1024 CPU support
On the POWER7 server, the maximum number of logical CPUs [14] is 1024 (256 cores) [15]. IBM announced
that AIX 7.1 supports 1024 logical CPUs in one LPAR. So, many tools in AIX 7.1 were
enhanced to support this feature. The modified tools are mpstat, sar, and topas. The tools
include an additional capability to generate XML reports, which enables applications to
consume the performance data and generate the required reports. The tools that support
XML output are sar, mpstat, vmstat, iostat, and lparstat. In this section, we introduce
several of these tools.
mpstat
The mpstat tool has this syntax:
mpstat -O (option for sorting and filtering)
mpstat [ { -d | -i | -s | -a | -h } ] [ -w ][ -O Options ] [ -@ wparname] [
interval [ count ] ]
There are three values for the -O option of the mpstat command to sort and filter data. The
following options are supported:
sortcolumn = The name of the metrics in the mpstat command output.
sortorder = [asc|desc]. The default value of sortorder is desc.
topcount = The number of CPUs to be displayed in the mpstat command sorted output.
To see the list of the top 10 users of the CPU, enter the following command, which is shown in
Example 7-24 on page 315:
mpstat -w -O sortcolumn=us,sortorder=desc,topcount=10 2
More tools: There are many other useful trace tools to monitor hypervisor activities, for
example, the CPU utilization reporting tool (curt). Refer to the IBM information center
website or the man pages for more information.
[14] A logical processor is the basic unit of processor hardware that allows the operating
system to dispatch a task or execute a thread. Intelligent thread technology dynamically switches the processor
threading mode (SMT) between 1, 2, and 4 threads per processor core to deliver optimal performance to your
applications. Each logical processor can execute only one thread context at a time.
[15] At the time of writing this book, if you want to configure more than 128 cores in one LPAR with the FC4700
processor, you need to purchase software key FC1256 and install it in the server. The name of this code is the “AIX
Enablement for 256-cores LPAR”.
Example 7-24 Example of the sorting and filtering function of the mpstat command
# mpstat -w -O sortcolumn=us,sortorder=desc,topcount=10 2
System configuration: lcpu=16 ent=2.0 mode=Capped
cpu min maj mpc int cs ics rq mig lpa sysc us sy wa id pc %ec lcs
15 0 0 0 100 0 0 0 0 - 0 100.0 0.0 0.0 0.0 0.12 6.2 100
14 0 0 0 100 0 0 0 0 - 0 100.0 0.0 0.0 0.0 0.12 6.2 100
7 0 0 0 100 0 0 0 0 - 0 99.9 0.1 0.0 0.0 0.13 6.6 100
11 0 0 0 100 0 0 0 0 - 0 99.9 0.1 0.0 0.0 0.12 6.2 100
6 0 0 0 100 0 0 0 0 - 0 99.9 0.1 0.0 0.0 0.13 6.7 100
13 0 0 0 100 6 3 0 0 100.0 0 99.9 0.1 0.0 0.0 0.12 6.2 100
3 0 0 0 100 0 0 0 0 - 0 99.9 0.1 0.0 0.0 0.12 6.2 100
5 0 0 0 100 0 0 0 0 100.0 0 99.9 0.1 0.0 0.0 0.15 7.7 100
9 0 0 0 100 70 34 0 0 100.0 1 99.9 0.1 0.0 0.0 0.13 6.3 100
12 0 0 0 100 31 15 0 0 100.0 1 99.9 0.1 0.0 0.0 0.12 6.2 100
ALL 0 0 01760 125820 125655 0 0 0.0 194 95.8 0.2 0.0 4.0 2.00 99.9 1602
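The sort and filter behavior of the -O option can be mimicked on parsed rows. The following sketch uses hypothetical data shaped like the Example 7-24 columns.

```python
# Reimplementation of sortcolumn/sortorder/topcount over parsed rows.
rows = [
    {"cpu": 15, "us": 100.0},
    {"cpu": 0, "us": 3.2},
    {"cpu": 5, "us": 99.9},
]

def o_filter(rows, sortcolumn, sortorder="desc", topcount=None):
    """Sort rows by one column, optionally keeping only the top N."""
    ordered = sorted(rows, key=lambda r: r[sortcolumn],
                     reverse=(sortorder == "desc"))
    return ordered if topcount is None else ordered[:topcount]

top = o_filter(rows, "us", topcount=2)
print([r["cpu"] for r in top])  # → [15, 5]
```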
sar
The sar command has this syntax:
sar -O (option for sorting and filtering)
/usr/sbin/sar [ { -A [ -M ] | [ -a ] [ -b ] [ -c ] [ -d ] [ -k ] [ -m ] [ -q ] [ -r ]
[ -u ] [ -v ] [ -w ] [ -y ] [ -M ] } ] [ -P processoridentifier, ... | ALL | RST
[-O {sortcolumn=col_name[,sortorder={asc|desc}][,topcount=n]}]]] [ [ -@ wparname ]
[ -e [YYYYMMDD]hh[:mm[:ss]] ] [ -f file ] [ -i seconds ] [ -o file ]
[ -s [YYYYMMDD]hh[:mm[:ss]] ] [ -x ] [ Interval [ Number ] ]
There are three values for the -O option of the sar command to sort and filter data.
The following options are supported:
sortcolumn = The name of the metrics in the sar command output.
sortorder = [asc|desc] The default value of sortorder is desc.
topcount = The number of CPUs to be displayed in the sar command sorted output.
To list the top 10 CPUs, which are sorted on the scall/s column, enter the following command,
as shown in Example 7-25:
sar -c -O sortcolumn=scall/s,sortorder=desc,topcount=10 -P ALL 1
Example 7-25 Example of the sorting and filtering function of the sar command
# sar -c -O sortcolumn=scall/s,sortorder=desc,topcount=10 -P ALL 1
System configuration: lcpu=16 ent=2.00 mode=Capped
11:44:38 cpu scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
11:44:39 4 653 176 180 0.00 0.00 12545 8651
0 9 0 0 0.00 0.00 0 0
1 5 0 0 0.00 0.00 0 0
8 2 0 0 0.00 0.00 0 0
9 1 0 0 0.00 0.00 0 0
12 1 0 0 0.00 0.00 0 0
7 0 0 0 0.00 0.00 0 0
2 0 0 0 0.00 0.00 0 0
5 0 0 0 0.00 0.00 0 0
3 0 0 0 0.00 0.00 0 0
- 1016 180 180 0.99 0.99 17685 8646
topas
There are two enhanced features for the topas command:
Panel freezing. The spacebar key acts as a toggle for freezing the topas panel. When
frozen, topas stops data collection and continues to display the data from the previous
iteration. You can move around the panel and sort the data based on the selected column.
In the frozen state, if you move between panels, certain panels might not display data. In
this case, press the spacebar key again to unfreeze the topas panel.
Panel scrolling and sorting. If the amount of data exceeds the topas window size, use the
Page Up and Page Down keys to scroll through the data. The data is sorted based on the
selected column.
Table 7-11 lists the freezing and scrolling properties of the topas command.
Table 7-11 Freezing and scrolling properties of the topas command
Panel                Freezing   Scrolling
Process Panel        Y          Y
Logical Partition    Y          Y
Tape Panel           Y          Y
Disk Panel           Y          Y
SRAD Panel           Y          Y
Volume Group         Y          Y
File System          Y          Y
WLM                  Y          Y
WPAR                 Y          Y
CEC                  N          N
Cluster              N          N
Adapter              N          N
Virtual I/O server   N          N
XML output commands
The following XML output syntax applies to the lparstat, vmstat, iostat, mpstat, and sar
commands:
iostat [-X [-o filename]] [interval [count]]
vmstat [-X [-o filename]] [interval [count]]
lparstat [-X [-o filename]] [interval [count]]
mpstat [-X [-o filename]] [interval [count]]
sar [-X [-o filename]] [interval [count]]
The following features are for the XML output commands:
The default output file name is command_DDMMYYHHMM.xml and is generated in the
current directory.
The user can specify the output file name and the directory using the -o flag:
lparstat -X -o /tmp/lparstat_data.xml
These XML schema files are shipped with the base operating system under
/usr/lib/perf:
– iostat_schema.xsd
– lparstat_schema.xsd
– mpstat_schema.xsd
– sar_schema.xsd
– vmstat_schema.xsd
Currently, the XML output that is generated by these commands is not validated against the
schema. It is up to the application to perform this validation.
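Because validation is left to the application, a consumer typically parses the report and sanity-checks it itself. The following is a minimal sketch; the element and attribute names are hypothetical (consult the schema files under /usr/lib/perf for the real structure).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment standing in for a lparstat -X report.
SAMPLE = """<lparstat>
  <sample time="17:49:07" physc="2.00"/>
  <sample time="17:49:09" physc="1.98"/>
</lparstat>"""

root = ET.fromstring(SAMPLE)
physc = [float(s.get("physc")) for s in root.findall("sample")]
assert all(p >= 0 for p in physc)  # application-side sanity check
print(max(physc))  # peak physical processor consumption in the report
```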
7.8.6 Monitoring block I/O statistics
In AIX 6.1 TL6 or AIX 7.1, there are enhanced features with the iostat command. The new -b
option was added to capture data that helps you identify I/O performance issues and correct
them more quickly:
The -b option provides the block device utilization report, which shows detailed I/O
statistics for block devices.
The block I/O stats collection has been turned off by default.
The root user can turn it on with the raso tunable command raso -o biostat=1.
The -b option can be used by the root user, as well as a non-root user.
The minimum value that can be specified for the interval is 2 seconds.
Syntax:
iostat -b [block Device1 [block Device [...]]] Interval [Sample]
Table 7-12 shows the column names and descriptions of the output report of the iostat -b
command.
Table 7-12 The column names and descriptions of the output report from the iostat -b command
Column name Description
device Name of the device
bread Indicates the number of bytes read over the monitoring interval. The default unit
is bytes; a suffix is appended if required (1024=K, 1024K=M).
bwrite Indicates the number of bytes written over the monitoring interval. The default
unit is bytes; a suffix is appended if required (1024=K, 1024K=M).
rserv Indicates the read service time per read over the monitoring interval. The default
unit is millisecond.
wserv Indicates the write service time per write over the monitoring interval. The
default unit is millisecond.
rerr Indicates the number of read errors over the monitoring interval. The default unit
is numbers; a suffix is appended if required (1000 = K, 1000K = M, and
1000M = G).
Example 7-26 shows the description of the raso command’s tunable parameter (biostat).
Example 7-26 The description of the biostat parameter of raso
# raso -h biostat
Help for tunable biostat:
Purpose:
Specifies whether block IO device statistics collection should be enabled or not.
Values:
Default: 0
Range: 0, 1
Type: Dynamic
Unit: boolean
Tuning:
This tunable is useful in analyzing performance/utilization of various block IO
devices. If this tunable is enabled, we can use iostat -b to show IO statistics
for various block IO devices.
Possible Value:
1 : Enabled
0 : Disabled
Examples
Example 7-27 shows turning on the biostat.
Example 7-27 Enable the analysis of the performance and utilization of various block I/O devices
# raso -o biostat=1
Setting biostat to 1
Example 7-28 shows monitoring block I/O devices using iostat -b. It shows that there are
I/O activities on the hdisk1 device.
Example 7-28 Monitor block I/O devices using iostat -b
# iostat -b 2
System configuration: lcpu=16 drives=3 vdisks=0
Block Devices :6
device reads writes bread bwrite rserv wserv rerr werr
hdisk0 0.00 0.00 0.000 0.000 0.00 0.00 0.00 0.00
hdisk1 319.00 0.00 319.000M 0.000 6.00 0.00 0.00 0.00
hd4 0.00 0.00 0.000 0.000 0.00 0.00 0.00 0.00
hd8 0.00 0.00 0.000 0.000 0.00 0.00 0.00 0.00
hd9var 0.00 0.00 0.000 0.000 0.00 0.00 0.00 0.00
hd2 0.00 0.00 0.000 0.000 0.00 0.00 0.00 0.00
Table 7-12 (continued)
Column name Description
werr Indicates the number of write errors over the monitoring interval. The default
unit is numbers; a suffix is appended if required (1000 = K, 1000K = M, and
1000M = G).
reads Indicates the number of read requests over the monitoring interval. The default
unit is numbers; a suffix is appended if required (1000 = K, 1000K = M, and
1000M = G).
writes Indicates the number of write requests over the monitoring interval. The default
unit is numbers; a suffix is appended if required (1000 = K, 1000K = M, and
1000M = G).
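To post-process these values in a script, the suffixes have to be expanded again. The following helper is a sketch based on the unit rules in Table 7-12 (the exact output formatting of iostat -b beyond those rules is an assumption).

```python
def parse_suffixed(value, base):
    """Expand an iostat -b value such as '319.000M'.
    Use base=1024 for byte counts (bread/bwrite) and base=1000 for
    request and error counts, per Table 7-12."""
    suffixes = {"K": base, "M": base ** 2, "G": base ** 3}
    if value and value[-1] in suffixes:
        return float(value[:-1]) * suffixes[value[-1]]
    return float(value)

print(parse_suffixed("319.000M", 1024))  # bytes read by hdisk1 in Example 7-28
```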
7.8.7 Monitoring Active Memory Expansion (AME) statistics
AIX 6.1 with the 6100-04 TL SP2 release or AIX 7.1 introduced the new amepat command,
which is an Active Memory Expansion (AME) planning and advisory tool.
amepat
The amepat command reports AME information and statistics, as well as provides an advisory
report that assists you in planning the use of AME for existing workloads.
The AME planning and advisory tool amepat serves two key functions:
Workload planning
You can run the amepat command to determine whether a workload will benefit from AME and
to obtain a list of possible AME configurations for the workload.
Monitoring
When AME is enabled, the amepat tool is used to monitor the workload and AME
performance statistics.
You can invoke the amepat command in two modes:
Recording mode
In this mode, amepat records the system configuration and various performance statistics
and places them into a user-specified recording file.
Reporting mode
In this mode, amepat analyzes the system configuration and performance statistics, which
were collected in real time or from the user-specified recording file to generate workload
utilization and planning reports.
You can invoke the amepat command by using the System Management Interface Tool
(SMIT). For example, you can use the smit amepat fast path to run this command.
Section 6.5.4, “Active Memory Sharing configuration” on page 225, contains a test scenario
that shows how to use the amepat command to analyze memory behavior and obtain an AME
recommendation in a running AIX LPAR environment with a workload. It explains how to
configure the AME attributes based on that recommendation, and how to monitor the AME
performance statistics with the topas command to see the benefit after enabling AME.
In fact, amepat can also be run in LPARs in which AME is already enabled. When used in this
mode, amepat provides a report of other possible AME configurations for the workload.
Note: When you invoke amepat without specifying the duration or interval, the utilization
statistics (system and AME) do not display any average, minimum, or maximum values.
The utilization statistics only display the current values. The CPU utilization only displays
the average from the system boot time.
Example 7-29 on page 320 shows one amepat report after AME has been enabled in this
LPAR.
Example 7-29 The amepat report when AME is enabled
# amepat 1
Command Invoked : amepat 1
Date/Time of invocation : Fri May 27 17:07:15 EDT 2011
Total Monitored time : 3 mins 32 secs
Total Samples Collected : 1
System Configuration:
---------------------
Partition Name : lpar2_p780
Processor Implementation Mode : POWER7
Number Of Logical CPUs : 4
Processor Entitled Capacity : 1.00
Processor Max. Capacity : 1.00
True Memory : 4.00 GB
SMT Threads : 4
Shared Processor Mode : Enabled-Uncapped
Active Memory Sharing : Enabled
Active Memory Expansion : Enabled
Target Expanded Memory Size : 6.00 GB
Target Memory Expansion factor : 1.50
System Resource Statistics: Current
--------------------------- ----------------
CPU Util (Phys. Processors) 0.98 [ 98%]
Virtual Memory Size (MB) 5115 [ 83%]
True Memory In-Use (MB) 4092 [100%]
Pinned Memory (MB) 1240 [ 30%]
File Cache Size (MB) 2 [ 0%]
Available Memory (MB) 1066 [ 17%]
AME Statistics: Current
--------------- ----------------
AME CPU Usage (Phy. Proc Units) 0.89 [ 89%]
Compressed Memory (MB) 1648 [ 27%]
Compression Ratio 2.61
Active Memory Expansion Modeled Statistics:
-------------------------------------------
Modeled Expanded Memory Size : 6.00 GB
Average Compression Ratio : 2.61
Expansion Modeled True Modeled CPU Usage
Factor Memory Size Memory Gain Estimate
--------- ------------- ------------------ -----------
1.03 5.88 GB 128.00 MB [ 2%] 0.00 [ 0%]
1.10 5.50 GB 512.00 MB [ 9%] 0.00 [ 0%]
1.15 5.25 GB 768.00 MB [ 14%] 0.00 [ 0%]
1.20 5.00 GB 1.00 GB [ 20%] 0.13 [ 13%]
1.27 4.75 GB 1.25 GB [ 26%] 0.43 [ 43%]
1.34 4.50 GB 1.50 GB [ 33%] 0.73 [ 73%]
1.38 4.38 GB 1.62 GB [ 37%] 0.88 [ 88%]
1.50 4.00 GB 2.00 GB [ 50%] 0.89 [ 89%] << CURRENT CONFIG
Active Memory Expansion Recommendation:
---------------------------------------
The recommended AME configuration for this workload is to configure the LPAR
with a memory size of 5.00 GB and to configure a memory expansion factor
of 1.20. This will result in a memory gain of 20%. With this
configuration, the estimated CPU usage due to AME is approximately 0.13
physical processors, and the estimated overall peak CPU resource required for
the LPAR is 0.22 physical processors.
NOTE: amepat's recommendations are based on the workload's utilization level
during the monitored period. If there is a change in the workload's utilization
level or a change in workload itself, amepat should be run again.
The modeled Active Memory Expansion CPU usage reported by amepat is just an
estimate. The actual CPU usage used for Active Memory Expansion may be lower
or higher depending on the workload.
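The modeled columns in the Example 7-29 table follow from simple arithmetic: modeled true memory is the expanded memory size divided by the expansion factor, and the memory gain is the difference (the percentage is relative to true memory). The sketch below reconstructs that arithmetic from the visible table rows.

```python
def ame_model(expanded_gb, factor):
    """Reconstruct the Modeled True Memory Size and Memory Gain columns."""
    true_gb = expanded_gb / factor
    gain_gb = expanded_gb - true_gb
    gain_pct = round(gain_gb / true_gb * 100)
    return true_gb, gain_gb, gain_pct

print(ame_model(6.00, 1.50))  # the CURRENT CONFIG row: 4 GB true, 2 GB gain, 50%
```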
In an AME-enabled environment, several existing tools, including vmstat -c, lparstat -c,
svmon -O summary=ame, and topas, have been enhanced to monitor AME statistics.
vmstat -c
The following new statistics are displayed when executing the vmstat -c command:
csz
Current compressed pool size in 4 KB page units.
cfr
Free pages available in compressed pool in 4 KB page units.
dxm
Deficit in the expanded memory size in 4 KB page units.
Example 7-30 shows an example of the vmstat -c command.
Example 7-30 Monitor AME statistics using vmstat -c
# vmstat -c 2
System Configuration: lcpu=4 mem=6144MB tmem=4096MB ent=1.00 mmode=shared-E mpsz=24.00GB
kthr memory page faults cpu
----- ----------------------------- ---------------------- ---------------- ---------------
r b avm fre csz cfr dxm ci co pi po in sy cs us sy id wa pc ec
17 17 1192182 389787 47789 2227 0 11981 34223 0 0 61 176 4849 54 43 1 3 1.00 100.0
12 15 1221972 359918 51888 2335 0 23081 39926 0 0 42 348 1650 52 44 1 3 1.00 100.3
12 15 1242037 340443 56074 4204 0 21501 33590 0 0 10 285 2849 56 39 2 3 1.00 99.7
23 0 1262541 320006 58567 3988 0 30338 41675 0 0 83 277 2204 52 45 0 3 1.00 100.0
12 7 1275417 306433 62665 3494 0 27048 35895 0 0 195 229 2802 49 47 0 4 1.00 100.0
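Because csz, cfr, and dxm are reported in 4 KB page units, converting them for readability is a one-liner. The sketch below uses the first csz sample from Example 7-30.

```python
def pages_4k_to_mb(pages):
    """Convert a count of 4 KB pages to megabytes."""
    return pages * 4 / 1024

print(round(pages_4k_to_mb(47789), 1))  # compressed pool size in MB
```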
lparstat -c
The following statistics are displayed only when the -c flag is specified. Refer to
Example 7-31 on page 322.
%xcpu
Indicates the percentage of CPU utilization for the Active Memory Expansion activity.
xphysc
Indicates the number of physical processors used for the Active Memory Expansion
activity.
dxm
Indicates the size of the expanded memory deficit for the LPAR in MB.
Example 7-31 Monitoring AME statistics using lparstat -c
# lparstat -c 2
System configuration: type=Shared mode=Uncapped mmode=Shar-E smt=4 lcpu=4
mem=6144MB tmem=4096MB psize=14 ent=1.00
%user %sys %wait %idle physc %entc lbusy vcsw phint %xcpu xphysc dxm
----- ----- ------ ------ ----- ----- ------ ----- ----- ------ ------ ------
48.3 51.7 0.0 0.0 1.00 100.1 99.2 0 5 89.6 0.8968 0
41.1 55.2 3.8 0.0 1.00 100.1 92.9 67 2 88.3 0.8842 0
40.0 56.2 3.7 0.0 1.00 100.0 92.2 53 0 88.7 0.8863 0
43.6 54.1 2.2 0.0 1.00 100.2 95.5 44 2 72.4 0.7248 0
39.2 54.7 6.0 0.1 1.00 99.7 84.6 154 4 50.7 0.5049 0
svmon -O summary
The svmon command provides two options for svmon -O summary. Refer to Example 7-32.
ame
Displays the Active Memory Expansion information (in an Active Memory
Expansion-enabled system).
longreal
Displays the Active Memory Expansion information (in an Active Memory
Expansion-enabled system) in a long format.
Example 7-32 Monitor AME statistics using svmon -O
# svmon -O summary=ame
Unit: page
---------------------------------------------------------------------------
size inuse free pin virtual available loaned mmode
memory 1572864 1302546 270318 317841 1312425 269358 0 Shar-E
ucomprsd - 865851 -
comprsd - 436695 -
pg space 131072 4973
work pers clnt other
pin 215054 0 0 102787
in use 1301468 0 1078
ucomprsd 864773
comprsd 436695
---------------------------------------------------------------------------
True Memory: 104857
CurSz %Cur TgtSz %Tgt MaxSz %Max CRatio
ucomprsd 866689 82.65 691918 65.99 - - -
comprsd 181887 17.35 356658 34.01 693230 66.11 2.53
txf cxf dxf dxm
AME 1.50 1.50 0.00 0
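The txf value in this output can be cross-checked by hand: the target expansion factor is the expanded memory size divided by the true memory size (the mem and tmem values in the lparstat -c header shown earlier). A minimal sketch:

```shell
# Sketch: compute the target AME expansion factor from the partition sizes
# reported by lparstat -c (mem=6144MB, tmem=4096MB in Example 7-31).
MEM=6144    # expanded memory size, in MB
TMEM=4096   # true memory size, in MB
TXF=$(awk -v m="$MEM" -v t="$TMEM" 'BEGIN { printf "%.2f", m / t }')
echo "target expansion factor: $TXF"
```

This matches the txf value of 1.50 that svmon reports for this partition.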
topas
The topas command displays memory compression statistics on an Active Memory
Expansion-enabled system. Refer to Example 7-33. The following
data is reported:
TMEM,MB
True memory size, in megabytes
CMEM,MB
Compressed pool size, in megabytes
EF[T/A]
Expansion factors: Target and Actual
CI
Compressed pool page-ins
CO
Compressed pool page-outs
Example 7-33 Monitoring AME statistics using topas
Topas Monitor for host:lpar2 EVENTS/QUEUES FILE/TTY
Fri May 27 18:14:33 2011 Interval:FROZEN Cswitch 1388 Readch 4824
Syscall 459 Writech 579
CPU User% Kern% Wait% Idle% Physc Entc% Reads 52 Rawin 0
Total 49.3 48.7 2.0 0.0 1.00 99.88 Writes 1 Ttyout 579
Forks 0 Igets 0
Network BPS I-Pkts O-Pkts B-In B-Out Execs 0 Namei 38
Total 1.76K 25.02 1.00 1.13K 645.5 Runqueue 16.01 Dirblk 0
Waitqueue 0.0
Disk Busy% BPS TPS B-Read B-Writ MEMORY
Total 0.0 0 0 0 0 PAGING Real,MB 6144
Faults 39717K % Comp 84
FileSystem BPS TPS B-Read B-Writ Steals 39567K % Noncomp 0
Total 4.71K 52.04 4.71K 0 PgspIn 0 % Client 0
PgspOut 0
Name PID CPU% PgSp Owner PageIn 0 PAGING SPACE
cmemd 655380 32.0 120K root PageOut 0 Size,MB 512
lrud 262152 16.3 76.0K root Sios 0 % Used 4
nmem64 8519768 13.8 255M root % Free 96
nmem64 5767306 6.5 255M root AME
nmem64 9895952 4.6 255M root TMEM 4.00G WPAR Activ 0
nmem64 5898280 4.6 255M root CMEM 482.81M WPAR Total 0
nmem64 8716434 2.5 255M root EF[T/A] 1.5/1.5 Press: "h"-help
nmem64 5374162 2.4 255M root CI:39.7K CO:38.4K "q"-quit
nmem64 10420454 2.3 255M root
nmem64 9699524 2.2 255M root
nmem64 7995508 2.1 255M root
nmem64 9175286 2.1 255M root
nmem64 10223870 2.1 255M root
nmem64 8257664 1.9 255M root
nmem64 10027162 1.4 255M root
nmem64 7405742 1.0 255M root
topas 10944586 0.5 1.93M root
topas 3473408 0.1 3.70M root
java 8323080 0.0 78.7M root
java 9437240 0.0 48.6M pconsole
nmon (topas_nmon)
The nmon command records the AME statistics in the nmon recording file. The recording file
uses the *.nmon format, for example, lpar2c_p780_110603_1028.nmon.
Start the nmon recording by running nmon -f on the command line. For detailed information
about how to use the nmon command, refer to the manual page (man nmon). The nmon
recording file is created in the current directory or in the directory that you specified on the
command line. After the nmon file is generated, download it and use the nmon analyzer tool
to analyze it and generate an .xls file, for example, lpar2c_p780_110603_1028.nmon.xls.
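As an illustrative sketch of the recording workflow (the interval and snapshot count below are arbitrary choices, not values from the book), a typical run might look like this:

```shell
# Sketch: record 60 snapshots at 30-second intervals (30 minutes total).
# The nmon command itself is only echoed here so that this script also
# runs on systems without nmon installed.
INTERVAL=30
COUNT=60
CMD="nmon -f -s $INTERVAL -c $COUNT"
echo "recording command: $CMD"

# nmon writes <hostname>_<YYMMDD>_<HHMM>.nmon into the current directory,
# for example the file named in the text:
FILE="lpar2c_p780_110603_1028.nmon"

# After the recording completes, download the file and open it with the
# nmon analyzer spreadsheet, which produces:
echo "analyzer output: ${FILE}.xls"
```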
The following tags in the xls file have AME statistics.
MEM tag
The MEM tag includes the following details that relate to AME. The details are recorded if
AME is enabled in the partition:
Size of the compressed pool in MB
Size of true memory in MB
Expanded memory size in MB
Size of the uncompressed pool in MB
Figure 7-27 shows the output of the MEM tag in the lpar2c_p780_110603_1028.nmon.xls file.
Figure 7-27 AME statistics in MEM tag of nmon file
MEMNEW tag
The MEMNEW tag includes the following detail that relates to AME, as shown in Figure 7-28.
Figure 7-28 shows the percentage of total memory used for the compressed pool.
Figure 7-28 AME statistics in MEMNEW tag of nmon file
PAGE tag
The PAGE tag of the nmon file includes the following details that relate to AME:
Compressed pool page-ins
Other tools, such as topas, call this CI.
Compressed pool page-outs
Other tools, such as topas, call this CO.
7.8.8 Monitoring memory affinity statistics
IBM POWER7 processor-based systems contain modules that are capable of supporting
multiple processor chips depending on the particular system. Each module contains multiple
processors, and the system memory is attached to these modules. Although any processor
can access all of the memory in the system, a processor has faster access and higher
bandwidth when addressing memory that is attached to its own module rather than memory
that is attached to the other modules in the system.
Several AIX commands have been enhanced to retrieve POWER7 memory affinity statistics,
including lssrad, mpstat, and svmon.
lssrad
This is a new tool that displays core and memory placement on an LPAR. The REF1 column in
the output is the first hardware-provided reference point that identifies sets of resources that
are near each other. SRAD is the Scheduler Resource Allocation Domain column. For the best
affinity, cores and memory need to be allocated from the same REF1 and SRAD.
We explain the following new terminology:
Resource allocation domain (RAD)
A collection of system resources (CPUs and memory)
Scheduler Resource Allocation Domain (SRAD)
A collection of system resources that are the basis for most of the resource allocation and
scheduling activities that are performed by the kernel
Fix: Unfortunately, at the time of writing this book, the last two tags appear in Excel
columns that the nmon analyzer already uses and are therefore overwritten. A
temporary fix is available at the following website:
https://www.ibm.com/developerworks/mydeveloperworks/blogs/aixpert/entry/quick_temporary_fix_for_nmon_s_analyser_ame_stats37?lang=zh
Examples: The previous examples only illustrate how to use the commands and tools. The
AME factor that is used with the workload in the examples is not the best factor.
Resource affinity structures:
– Various hierarchies:
• Two-tier (local/remote)
• Three-tier (local/near/far)
– AIX Topology Service – System detail level (SDL):
• SRAD SDL
This affinity structure is used to identify local resources.
• REF1 SDL (first hardware provided reference point)
This affinity structure is used to identify near/far memory boundaries.
Figure 7-29 shows the relationship among system topology, RAD topology, and the lssrad
command output.
Figure 7-29 Relationship among system topology, RAD topology, and lssrad command output
Another example of lssrad (refer to Example 7-34) shows the placement of an LPAR with 12
cores (SMT2) and 48 GB memory.
Example 7-34 One example of lssrad command
# lssrad -av
REF1 SRAD MEM CPU
0
0 19900.44 0-1 4-5 8-9 12-13 16-17
1 27519.19 20-21 24-25 28-29 32-33 36-37 40-41 44-45
mpstat
There is an enhanced feature for the mpstat -d command to display per logical CPU SRAD
affinity. It adds three columns to the report. Refer to Table 7-13.
Table 7-13 Description of three new columns of the mpstat -d command report
Column name Description
S3hrd (-a, -d flag) The percentage of local thread dispatches on this logical processor
S4hrd (-a, -d flag) The percentage of near thread dispatches on this logical processor
S5hrd (-a, -d flag) The percentage of far thread dispatches on this logical processor
Example 7-35 shows the output of the mpstat -d command.
Example 7-35 Monitoring memory affinity statistics using mpstat -d
# mpstat -d 2
System configuration: lcpu=16 ent=2.0 mode=Capped
cpu cs ics ...S2rd S3rd S4rd S5rd ilcs vlcs S3hrd S4hrd S5hrd
0 709 286 ... 0.0 0.0 0.0 0.0 1 641 41.0 0.0 59.0
1 14 7 ... 0.0 0.0 0.0 0.0 0 36 0.0 0.0 100.0
2 0 0 ... - - - - 0 49 - - -
3 0 0 ... - - - - 0 29 - - -
14 0 0 ... - - - - 0 4 - - -
ALL 723 293 ... 0.0 0.0 0.0 0.0 1 759 40.4 0.0 59.6
-------------------...------------------------------------------
0 994 404 ... 0.0 0.0 0.0 0.0 0 886 40.1 0.0 59.9
1 16 8 ... 0.0 0.0 0.0 0.0 0 48 0.0 0.0 100.0
2 0 0 ... - - - - 0 68 - - -
3 0 0 ... - - - - 0 40 - - -
ALL 1010 412 ... 0.0 0.0 0.0 0.0 0 1042 39.6 0.0 60.4
svmon
The svmon command provides options to display memory affinity at the process level and
segment level, and the cpu level SRAD allocation for each thread:
The affinity domains are represented based on SRADID and provide this information:
– Memory information of each SRAD (total, used, free, and filecache)
– Logical CPUs in each SRAD
Display home SRAD affinity statistics for the threads of a process
Provide the application’s memory placement policies
Table 7-14 on page 328 and Example 7-36 on page 328 show descriptions and examples.
Table 7-14 svmon options, values, and descriptions
svmon option Value Description
affinity on Displays memory affinity at the process level
detail Displays memory affinity at the segment level
off (default) Does not display the memory affinity
threadaffinity on Displays the cpu level SRAD allocation for each thread
off (default) Does not display the cpu level SRAD allocation for each thread
Example 7-36 Monitoring memory affinity using svmon
# svmon -O affinity=detail,threadaffinity=on -P 4522046
Unit: page
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual
4522046 topasrec 26118 11476 0 25931
Tid HomeSRAD LocalDisp NearDisp FarDisp
15401157 0 272 0 0
Text Stack Data SHMNamed SHMAnon MapFile UnmapFil EarlyLRU
Default Default Default Default Default Default Default N
Domain affinity Npages Percent lcpus
0 15494 59.8 200 0 1 4 5 8 9 12 13 16 17
1 10437 40.2 136 20 21 24 25 28 29 32 33 36 37 40 41 44 45
7.8.9 Monitoring the available CPU units in a processor pool
In a micro-partition environment, you can use the lparstat command to monitor the currently
available physical processors in the shared pool, but you need to turn on “Allow processor
pool utilization authority” through the HMC or SDMC. This LPAR property is on the processor
Configuration tab. Figure 7-30 on page 329 shows the window where you enable processor
pool utilization authority.
Figure 7-30 Turn on “Allow performance information collection” through the HMC
After selecting “Allow processor pool utilization authority” and starting the LPAR, you can
see the currently available physical processors in the shared pool from the AIX command
line, as shown in Example 7-37.
Example 7-37 Monitor processor pool’s available CPU units
# lparstat 2
System configuration: type=Shared mode=Capped smt=4 lcpu=16 mem=180224MB psize=64
ent=2.00
%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ------ ------ ----- ----- ------ --- ----- -----
0.0 0.4 0.0 99.5 0.02 0.8 1.0 64.00 44 0
0.0 0.3 0.0 99.7 0.01 0.5 0.9 64.00 495 0
0.0 0.3 0.0 99.7 0.01 0.5 1.7 62.74 608 0
The app statistics column: In Example 7-37, the app statistics column takes effect
immediately after you turn the “Allow processor pool utilization authority” option on or off
from the HMC or SDMC.
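As a hedged sketch of how the app column might be tracked over time (the column position is taken from the output layout in Example 7-37 and can shift if other optional columns are enabled), awk can report the lowest free pool capacity seen:

```shell
# Sketch: find the minimum free shared-pool capacity (app column) across
# a set of lparstat samples. The rows are copied from Example 7-37; live
# use would pipe the command instead, skipping the header lines.
cat <<'EOF' > /tmp/lparstat_app.txt
0.0 0.4 0.0 99.5 0.02 0.8 1.0 64.00 44 0
0.0 0.3 0.0 99.7 0.01 0.5 0.9 64.00 495 0
0.0 0.3 0.0 99.7 0.01 0.5 1.7 62.74 608 0
EOF

# app is the 8th column in this configuration.
MIN=$(awk 'NR == 1 || $8 + 0 < min { min = $8 } END { print min }' /tmp/lparstat_app.txt)
echo "minimum free pool capacity: $MIN units"
```

For this sample the minimum is 62.74 processor units, the app value of the last interval.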
7.8.10 Monitoring remote node statistics in a clustered AIX environment
In AIX 7.1 and AIX 6.1 TL6, the existing perfstat library is enhanced to support performance
data collection and analysis for a single node or multiple nodes in a cluster. The enhanced
perfstat library provides application programming interfaces (APIs) to obtain performance
metrics that relate to processor, memory, I/O, and others to provide performance statistics
about a node in a cluster.
The perfstat API is a collection of C programming language subroutines that execute in
user space and use the perfstat kernel extension to extract various AIX performance metrics.
The perfstat library provides three kinds of interfaces for clients to use in their programs to
monitor remote node statistics in a clustered AIX environment:
Node interface
Node interfaces report metrics related to a set of components or to the individual
components of a remote node in the cluster. The components include processors or
memory, and individual components include a processor, network interface, or memory
page of the remote node in the cluster.
Cluster interface
The perfstat_cluster_total interface is used to retrieve cluster statistics from the
perfstat_cluster_total_t structure, which is defined in the libperfstat.h file.
Node list interface
The perfstat_node_list interface is used to retrieve the list of nodes in the perfstat_node_t
structure, which is defined in the libperfstat.h file.
For detailed information about the perfstat library and how to program these APIs in the client
application, refer to the IBM information center website:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.doc
/doc/base/technicalreferences.htm
7.9 Performance Management for Power Systems
Performance Management (PM) for Power is a tool to help you manage the growth and
performance of your IBM Power Systems. It provides a comprehensive and secure
performance management and capacity planning capability that can help ensure that your IT
system is ready to meet your business opportunities and challenges.
PM for Power Systems supports the latest release of AIX 7.1 and the entire family of
POWER7 processor-based systems, plus the Power processor blades in a BladeCenter.
PM for Power Systems presents a broad array of benefits to you, whether you are running
IBM i or AIX. As a systems management tool, it helps you ensure the most performance from
your system by continuously measuring growth and performance. These measurements allow
you, your IBM Business Partner, or IBM to more quickly diagnose existing performance or
capacity problems, identify potential resource constraints, and plan for future system growth.
Measuring your system’s performance and utilization trends and their effect on future
resource requirements can help you make better informed and cost-effective decisions when
planning for future system needs. In addition, PM for Power Systems provides the
partition-level statistics that are needed to help you evaluate the benefits of increasing total
system utilization through IBM consolidation and virtualization capabilities.
System and workload management tasks are an important aspect of a system administrator’s
role. The administrator has the responsibility to monitor and maintain the system, gather
performance data, summarize results, and manage growth. PM for Power Systems offerings
are designed to help you manage the performance of the IBM i and AIX systems in your
enterprise.
Whether you have a single server with one LPAR or multiple servers with multiple LPARs, PM
for Power Systems can save you time. These tools allow you to be proactive in monitoring
your system performance, help you identify system problems, and help you plan for future
capacity needs.
PM for Power Systems is an easy to use, automated, and self-managing offering. A collection
agent that is specifically for PM for Power Systems is integrated into the current releases of
IBM i and AIX. This agent automatically gathers non-proprietary performance data from your
system and allows you to choose whether to send it to IBM on a daily or
weekly basis. In return, you receive access to reports, tables, and graphs on the Internet that
show your specific partition’s (or total system if no partitioning is used) utilization, growth, and
performance calculations.
7.9.1 Levels of support available within PM for Power Systems
There are two levels of support that are available within the PM for Power Systems service.
No additional charge summary-level service
If your IBM Power System server is still under warranty or if it is covered under an IBM
hardware maintenance agreement, you receive the benefit of the management summary
graph at no additional charge. This level provides an easy to implement and use process with
interactive reporting that provides summary-level capacity trend information and performance
management parameters for your IBM i-based or AIX-based systems and LPARs. Users are
also allowed access to the Workload Estimator (WLE) for no additional charge to size future
requirements based on the collected data.
After you register the partition with the registration key that IBM provides, you can view the
performance management summary reports via a standard web browser. These reports are
referred to as the management summary graphs (MSG). You can monitor the CPU and disk
attributes of the system, measure capacity trends, and anticipate requirements. You can also
merge the previously collected PM historical data with the IBM Systems Workload Estimator
to size needed upgrades and so on. Flexibility is also provided so that you can arrange the
information about specific partitions or systems in groups in the viewing tool, to make the
information more meaningful to your operation.
Full-detail level (fee service)
Available as either an extension of the IBM Enhanced Technical Support offering or as a
stand-alone offering (depending on your country location), this full detail level is a web-based
fee service that provides access to many more detailed reports and graphs, again through a
secure Internet connection. Additionally, detailed access to many of the graphs is provided
using the interactive function that allows various time frame views of the data, including data
transmitted as recently as the previous day. The detailed reports provide current information
about resource constraints, resources approaching maximum capacity, disk files, processor
utilization, and memory utilization. Like the no additional charge summary graph offering,
users are allowed unlimited access to the WLE for sizing future requirements.
For more information: More information about all facets of PM for Power Systems is
available at this website:
http://www.ibm.com/systems/power/support/perfmgmt
There are several functions that are available to both the “no additional charge” and “fee”
service options:
Monthly .pdf of detailed reports
A .pdf of your entire report package is generated monthly by partition. It is viewable and
downloadable from the website via your personalized secure password.
Customizable graphs
PM for Power Systems provides the user with the ability to customize the reports and
graphs by changing the time period. For example, instead of looking at a 30-day view, you
can drill down to a seven day view, or even a daily view of the same information.
Additionally, the user has access to up to 24 months of history to redraw the same graph from
a historical perspective, provided that the system or partition has been transmitting PM for
Power Systems data consistently.
PM for Power Systems uses performance information and capacity information from your
system. This data includes system utilization information, performance information, and
hardware configuration information. After the data is collected, PM for Power Systems
processes the data and prepares it for transmission to IBM for future analysis and report
generation. Within IBM, the PM data will be used to prepare the reports and graphs that are
delivered as part of the PM for Power Systems offering.
7.9.2 Benefits of PM for Power Systems
PM for Power Systems capabilities are automated, self-maintaining tools for single or multiple
partition systems. IBM stores the data input that is collected by PM for Power Systems for you
and helps you to perform these tasks:
Identify performance bottlenecks before they affect your performance
Identify resource-intensive applications
Maximize the return on your current and future hardware investments
Plan and manage consistent service levels
Forecast data processing growth that is based on trends
The management of your system is simplified with the ability to stay abreast of utilization. This
is true even if you run multiple partitions in separate time zones with a mixture of IBM i and
AIX. If your system is logically partitioned (LPAR), we suggest that you enable PM Agent on
all partitions.
Supported releases are those that have not reached their End of Program Support date.
Information about releases of AIX and IBM i that have not reached their End of Program
Support dates is available at these websites:
IBM i
http://www-947.ibm.com/systems/support/i/planning/upgrade/suptschedule.html
AIX
http://www-01.ibm.com/software/support/systemsp/lifecycle/
The PM for Power team diligently maintains start-up instructions by OS release on the PM for
Power website, on the Getting started page that is shown in Figure 7-31 on page 333.
Figure 7-31 PM for Power Getting started web page
Getting started is extremely easy and there is nothing to order or install on the Power server
or operating system from a PM collection agent perspective.
7.9.3 Data collection
The collection of performance data using the PM agent is automated, self-managing, and
done on a partition boundary. The Electronic Service Agent (ESA) automatically triggers the
collection of non-proprietary performance data and automatically transmits the data to IBM
based on the parameters that you have defined. From a performance standpoint, the
collection programs use less than 1% of your processor unit.
The data is encrypted and sent to a secure IBM site. IBM automatically formats the raw data
into reports, tables, and graphs that are easy to understand and interpret. You can review
your performance data as often as you choose. Your information is updated daily from the
previous collection if you transmit daily. It is updated weekly if you transmit weekly.
The automated collection mechanism relieves the system administrator of the
time-consuming tasks that are associated with starting and stopping performance collections,
and getting raw data into a readable format that can easily be analyzed. After the ESA is
configured and data is transmitted, collection and reporting are self-managing. Data
continues to be collected, transmitted to IBM, and then deleted from your system to minimize
storage requirements. Your reports, graphs, and tables are available for analysis on the
website to view at your convenience.
A self-maintained, automated approach allows you to be proactive in the analysis of your
system performance. It provides a mechanism to avoid potential resource constraints and to
plan for future capacity requirements.
It is important that systems and partitions transmit the PM for Power Systems data to IBM on
a consistent basis. The collection agent and IBM Service Agent are designed to collect the
data continuously and transmit it to IBM periodically; however, the possibility exists
that transmissions might not occur for a variety of reasons.
To help make it easy for you to monitor whether transmissions are getting through to IBM on a
consistent basis, an icon on the server information panel portrays a 90-day calendar of
successful or unsuccessful transmissions, as seen in Figure 7-32.
Figure 7-32 Customer Transmission History
7.9.4 Accessing the PM for Power Systems website
General information about PM for Power Systems, including any setup instructions, is
available at the PM for Power Systems home page:
http://www-03.ibm.com/systems/power/support/perfmgmt/index.html
On the right side of that page is a call-out box for PM for Power Systems reports, with a link to
the actual login page for report access:
https://pmeserver.rochester.ibm.com/PMServerInfo/loginPage.jsp
At this site, clients must indicate that they are clients and enter their IBM Web IDs.
Figure 7-33 on page 335 shows the login page.
Figure 7-33 PM for Power Systems login page
After an LPAR or system is registered, the user is presented with the Server Information
Panel (SIP) when signing on with the IBM Web ID. From this window, you can select the group
of systems that you want to view. All systems and partitions are in one group if you do not
differentiate them into separate groups when you register the system or partition.
The SIP provides information about each partition for both first and second shift. There are
icons at the left of the window to use for the interactive graphing function or for requesting a
.pdf of either the full service detail report set (fee) or the summary level (no additional
charge) report.
Figure 7-34 on page 336 is an example of an SIP showing the icons that are used to view
reports. We also show the definitions of the icons for authorizing an IBM Business Partner to
access your graphs and for checking the status of PM data transmission.
Figure 7-34 Server information panel in PM for Power
Important: For a full description of the PM for Power Systems process and an explanation
of the detail level reports and graphs that are available, view the graph reference document
at this website:
http://www.ibm.com/systems/support/perfmgmt
This website shows you all the latest information and documentation about PM for Power.
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 8. PowerCare Services offerings for
Power Enterprise Servers
IBM Power Enterprise Servers are the most powerful and scalable members of our Power
Systems family. They have been designed to provide clients the most cost-effective IT
infrastructure. Power systems provide exceptional performance, massive scalability, and
energy-efficient processing to meet the highest levels of computing requirements.
This chapter identifies the PowerCare Services offerings available to our Power Enterprise
Server clients that have purchased Power Model 780 or 795 systems. IBM has integrated the
PowerCare Services offering with your Power server. The services are independent of the
operating system that you have chosen to deploy. This suite of service options, which is
provided to you at no additional charge, offers you technical leadership and consulting
resources to ensure that you effectively deploy and utilize your Power Systems environment,
and the many reliability, availability, and serviceability (RAS) and virtualization features
described within this IBM Redbooks publication.
We discuss the following topics in this chapter:
PowerCare highlights
PowerCare Services offerings
For more information: You can find the latest information and offerings on our PowerCare
website at this website:
http://www-03.ibm.com/systems/power/support/powercare/
8.1 PowerCare highlights
We think that in order for our clients to concentrate on their core business issues, it is
essential to provide them with world-class IT services that complement our world-class IT
solutions. Our Power development teams have delivered outstanding RAS and virtualization
technologies that can provide high levels of system availability in today’s on-demand world.
Keeping your business up and running is about managing and minimizing risks along with
optimizing and maximizing the availability of your systems.
IBM Power Systems PowerCare Services bring skills and expertise to help you increase the
business value from your Power Systems investments, independent of whether you use the
IBM i or AIX operating system, or both. Every IBM Power 780 or 795 system is entitled to
receive one of these PowerCare service choices per serial number at no additional charge:
Availability optimization
IBM Systems Director and VMControl enablement
IBM Systems Director Active Energy Manager (AEM) enablement
Security assessment
Performance optimization
Power Flex enablement
Power 795 upgrade implementation service
Technical training
8.2 PowerCare Services offerings
IBM Power Systems PowerCare Services help you use your installation to deliver real
business value. The services are delivered by IBM Systems Lab Services and Training
personnel around the globe to provide expertise in all aspects of managing Power Systems
environments. This service includes your choice of one of the following services, which are
described in the following sections, at no additional charge. These services are designed to
assist you in taking advantage of emerging technologies on your Power Systems platform by
bringing the skills and resources of the development lab to your enterprise via on-site
consulting.
Our Power Systems Technical University offers hundreds of sessions on a wide range of
Power topics. Sessions range from beginner to advanced training levels and include preferred practices
and certification testing. You hear details behind all the latest POWER7 announcements and
have an opportunity to see all the latest Power System products and solutions in our solution
center. Technical Universities are held across the globe in a location that is convenient to you.
Important: Every IBM Power 780 or 795 system is entitled to receive one PowerCare
service choice per serial number at no additional charge.
PowerCare benefit: PowerCare clients, who complete a PowerCare engagement and
return the client feedback request, receive one complimentary admission to attend any
IBM Power Systems conference worldwide. One tuition waiver per company for each
qualifying serial number is eligible for this special offer.
The tuition waiver provides us an opportunity to say thank you and to provide you the
opportunity to explore the latest in Power Systems technology, and learn from IBM product
developers and experts, as well as network with your peers.
If you have questions, contact the IBM PowerCare team at pwrcare@us.ibm.com.
8.2.1 Availability optimization services
With the availability optimization service, you can choose one of two options to fit your
requirements. Choices include either an analysis of your Power Systems availability
technologies or a system health check of both hardware and operating system software. Our
consultants bring the skills and experience of the development lab to you for your choice of
on-site consulting, which assists you in taking advantage of emerging technologies, RAS
features, and virtualization on your Power platform.
Availability optimization assessment
The availability optimization assessment (AOA) is designed to be an analysis of your Power
Systems infrastructure based on your specific availability requirements. This independent
availability assessment provides a high-level review of the overall IBM Power Systems
environment availability readiness. The AOA reviews your current Power Systems
environment and aims to proactively identify system exposures that might affect overall
availability. Additionally, the assessment reviews the current system management processes
that support the overall environment and captures any concerns and issues that you might
have. The reviewer assists you in understanding how the new availability and virtualization
features, such as concurrent maintenance, PowerVM Live Partition Mobility, and PowerHA,
can strengthen the overall system’s availability.
The AOA provides a summary of findings and observations, along with specific
recommendations to meet your specific availability requirements, and supporting information
that acts as education and backup to the recommendations. Recommendations are made
based on preferred practices for availability that have been developed and learned through
many client deployments of our enterprise servers worldwide. IBM recognizes that every client's environment and business requirements are unique, so preferred-practice-based recommendations are tailored to meet your specific availability requirements.
Twelve availability indicators have been identified, and each one is addressed during your AOA. Refer to Figure 8-1.
Figure 8-1 PowerCare availability optimization assessment indicators
Power Systems health check optimization
The Power Systems health check optimization is designed to assess the overall health of your
Power Systems environment. The assessment aims to proactively identify hardware single points of failure (SPOFs) and other problems, quickly flagging areas of exposure in hardware, IBM software, and setup before they can affect critical operations. With this option, you gain access to IBM preferred practices and technology that
can maximize your resource utilization to improve your return on investment. With help from
the IBM PowerCare team, you can gain immediate insights into how to optimize your Power
Systems environment and identify opportunities to strengthen overall system availability.
The end result of the health check is a comprehensive presentation, outlining system health
status and trends, areas for improvement, and recommended actions. These actions can be
performed by your staff or by IBM System Technology Group (STG) Lab Services in a
separate engagement.
The IBM PowerCare team has identified 10 individual health check indicators, and each
indicator is addressed during your PowerCare Health Check. Figure 8-2 on page 341 outlines
the health check indicators.
Figure 8-1 content: Power Systems Availability Indicators
1. High Availability
2. Disaster Recovery
3. Single Points of Failure & Hardware Configuration
4. HMC, FSP, and LPAR Configuration
5. Operating System Software Configuration
6. Patch and fix maintenance
7. Security and authority settings
8. Database
9. Backup and Recovery strategies
10. Communications
11. Data Center
12. Systems Management and Service Delivery processes
Four (R, O, Y, G) assessment metrics:
Green: Low risk of not meeting requirements. Fully exploiting availability functions and features; likely to meet business requirements.
Yellow: Medium risk of not meeting requirements. Partially exploiting availability functions and features, but remaining vulnerable to outages with the potential to not meet business requirements.
Orange: Medium-high risk of not meeting requirements. Focus is needed on availability exposures that have the capability to cause an outage and a high potential to not meet business requirements.
Red: High risk of not meeting requirements. Availability exposures exist that are likely to cause system outage and failure to meet business requirements.
Figure 8-2 Health check indicators
8.2.2 Systems Director and VMControl enablement
IBM introduced IBM Systems Director VMControl in the first half of 2009 and has significantly enhanced its capabilities since its introduction. VMControl is designed to
automate the management of a virtualized infrastructure, to improve workload resiliency, and
to help reduce deployment time for new virtual servers. VMControl is a plug-in for the IBM
Systems Director, which is the IBM enterprise-wide management platform for servers,
storage, networks, and software. After you install VMControl, it seamlessly integrates into
Systems Director’s browser-based interface, and VMControl can be used with systems that
are already under Systems Director management.
IBM Systems Director and VMControl enablement provide simplified virtualization
management, which enables faster problem-solving and helps you more effectively utilize
your virtual environment. VMControl enables you to deploy virtual appliances quickly and
easily and provides centralized management of virtual appliance assets, thus helping to
increase systems administrator productivity.
Figure 8-2 content: PowerCare Healthcheck Assessment Indicators, helping to keep critical systems up and running, enabling end-user access 24x7
1. Single Points of Failure & Hardware configuration
2. HMC, FSP and LPAR Configuration
3. Operating System Software Configuration
4. Patch and fix maintenance
5. High Availability/Disaster Recovery
6. Security and authority settings
7. Journaling & Database
8. Backup and Recovery strategies
9. Communications
10. Systems Management and Service Delivery processes
Four (R, O, Y, G) assessment metrics:
Green: Low risk. Fully exploiting availability functions and features; likely to meet business requirements.
Yellow: CAUTION, medium risk. Partially exploiting availability functions and features, but remaining vulnerable to outages with the potential to not meet business requirements.
Orange: WARNING, medium-high risk. Focus is needed on availability exposures that have the capability to cause an outage and a high potential to not meet business requirements.
Red: URGENT, high risk. Availability exposures exist that are likely to cause system outage and failure to meet business requirements.
For more information: Refer to one of the following websites for more information about
IBM Systems Director and VMControl.
For IBM Systems Director:
http://www.ibm.com/systems/management/director
For IBM Systems Director VMControl:
http://www.ibm.com/systems/management/director/plugins/vmcontrol
IBM Systems Director VMControl Standard Edition utilizes a workload-optimized approach to
decrease infrastructure costs and improve service levels. VMControl Standard Edition
captures workloads from active systems and stores them into a repository as reusable
system images, which are also referred to as virtual appliances. The VMControl Standard
Edition also provides support to manage virtual appliances and automate the deployment of
virtual appliances from a centralized repository.
The VMControl Standard Edition enables the creation and management of images to use
when creating and deploying virtual workloads using AIX (by using Network Installation
Management (NIM)) and Linux on Power. The definition of these system images is based on
the Distributed Management Task Force (DMTF) Open Virtualization Format (OVF)
specifications. The open design and support of industry standards enable IBM Systems
Director and VMControl to provide heterogeneous physical and virtual management of
multiple platforms and operating systems, which helps protect client IT investments.
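Because system images are defined per the DMTF OVF specification, a virtual appliance is described by an XML envelope that lists its virtual systems. As a rough, hypothetical sketch (not VMControl's implementation; the sample envelope and image names below are invented for illustration), the following Python fragment parses a minimal OVF envelope:

```python
# Illustrative only: parse a minimal, hypothetical DMTF OVF envelope and list
# the virtual appliances (VirtualSystem elements) it describes. This is not
# VMControl code; it only shows the general shape of an OVF descriptor.
import xml.etree.ElementTree as ET

OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"

SAMPLE_OVF = """<?xml version="1.0"?>
<Envelope xmlns="http://schemas.dmtf.org/ovf/envelope/1"
          xmlns:ovf="http://schemas.dmtf.org/ovf/envelope/1">
  <References/>
  <VirtualSystem ovf:id="aix71-base">
    <Info>An example AIX virtual appliance</Info>
    <Name>AIX 7.1 base image</Name>
  </VirtualSystem>
</Envelope>"""

def list_virtual_systems(ovf_xml: str) -> list:
    """Return (id, name) pairs for each VirtualSystem in the envelope."""
    root = ET.fromstring(ovf_xml)
    systems = []
    for vs in root.findall(f"{{{OVF_NS}}}VirtualSystem"):
        vs_id = vs.get(f"{{{OVF_NS}}}id", "")
        name_el = vs.find(f"{{{OVF_NS}}}Name")
        systems.append((vs_id, name_el.text if name_el is not None else ""))
    return systems

print(list_virtual_systems(SAMPLE_OVF))  # [('aix71-base', 'AIX 7.1 base image')]
```

A real OVF descriptor also carries disk references, network sections, and virtual hardware sections; the open format is what lets heterogeneous managers exchange the same appliance definitions.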
VMControl offers these benefits:
Eliminates the installation, configuration, and maintenance costs that are associated with
running complex stacks of software
Reduces operational and infrastructure costs due to increased efficiency in using IT
resources
Manages a library of ready-to-deploy or customized system templates that meet specific
hardware and software requirements
Stores, copies, and customizes existing images to reuse them within system templates for
creating virtual servers
The intent of the PowerCare Systems Director and VMControl enablement service is to set up a proof-of-concept (POC) in your Power environment. The service provides a highly skilled Lab Services and Training resource team at your location to help install, configure, and exploit the capabilities of IBM Systems Director and VMControl. It is not
intended to be the foundation of a production-ready solution, but to provide you with the ability
to learn how to best use the features and functions that are important to you and to observe
the configuration behavior in a safe and non-disruptive environment.
The IBM team works with the client’s team to identify Power platform management
requirements, issues, and strategies for your Power Systems environment. The team helps
you to develop the foundation of your IBM Systems Director solution that addresses your
specific objectives.
The strategies and objectives incorporate many features of IBM Systems Director VMControl
and might include these features:
Managing your virtualized environments
Creating and managing virtual servers
Managing a cross-platform environment
Monitoring system resources and alerting with automation plans
Updating management
Discovering inventory and devices
There are several activities that must be completed before your on-site service can be performed. The IBM team needs to understand your current IT environment and your systems management objectives to identify what to include in the implementation. There are also hardware, software, and network requirements for the installation of IBM Systems Director and VMControl that must be met to ensure the overall readiness of the environment.
Terminology difference: VMControl refers to a logical partition (LPAR) as a virtual server.
At the end of the service, you receive a document that contains your identified Power platform
management requirements and issues and an overview of your IBM Systems Director
installation and configuration, along with a summary of the key functions that were
implemented.
8.2.3 Systems Director Active Energy Manager enablement
The IBM Systems Director Active Energy Manager (AEM) measures, monitors, and manages
the energy components that are built into IBM Power Systems, enabling a cross-platform
management solution. AEM extends the scope of energy management to include facility
providers to enable a more complete view of energy consumption within the data center.
AEM is an IBM Director extension that supports the following endpoints: IBM BladeCenter,
Power Systems, System x, and System z servers. IBM storage systems and non-IBM
platforms can be monitored through protocol data unit (PDU+) support. In addition, AEM can
collect information from selected facility providers, including Liebert SiteScan from Emerson
Network Power and SynapSense.
The AEM server can run on the following platforms: Windows on System x, Linux on System
x, Linux on System p, and Linux on System z. AEM uses agent-less technology, and therefore
no agents are required on the endpoints.
The objective of the AEM enablement service is to help you install, configure, and exploit the
capabilities of IBM Systems Director AEM along with the core features of IBM Systems
Director. The IBM team works with your team to understand your data center platform, as well
as your energy and thermal management requirements. The IBM team works with the data
center staff to show them how to use AEM to manage the actual power consumption and the thermal load that your IBM servers are placing on your data center. By providing
hands-on skills transfer throughout the engagement, your staff learns how to best use the
features and functions that are important within your environment.
Your PowerCare AEM enablement summary document contains this information:
Your data center platform, energy, and thermal management requirements and issues
An overview of your Systems Director and AEM installation and configuration
A summary of the key functions that are implemented
Next steps
8.2.4 IBM Systems Director Management Console
The PowerCare SDMC services are designed to set up a POC with the IBM Systems Director Management Console (SDMC) in your data center. The POC is intended to show your staff how to best use the SDMC features and functions that are important to your environment, and to observe the behavior of those features and functions in a non-disruptive setting. The IBM consultants can create a plan to
extend this solution to a production environment if you want.
On-site PowerCare SDMC activities include these tasks:
1. Review your current environment and overall readiness for implementing SDMC.
2. Discuss your systems management objectives and clarify any new objectives.
3. Examine the features and capabilities of the SDMC for a Power Systems environment.
4. Develop a tactical plan to address the systems management objectives identified.
5. Install IBM Systems Director Common Agents on target managed systems for enabling
SDMC management control of the systems.
6. Configure and activate one or more of the SDMC functions that are included in the implementation of the platform management objectives.
Informal hands-on skills transfer is provided throughout the engagement for the Power staff to
have a complete understanding of the SDMC.
At the end of your PowerCare services engagement, we develop the IBM SDMC engagement
summary document that summarizes your systems and platform management requirements
that were identified, along with the identification of managed endpoints that were included in
the implementation.
8.2.5 Security assessment
In our current economic climate, many security and IT experts are warning enterprises that the threat of data theft is growing and can prove devastating for many enterprises. Good security is like an onion: layered security provides the best protection, because it does not rely solely on the integrity of any one element. Multiple layers of security cost a potential outside intruder time and money, because the intruder must deal with and defeat successive barriers. In the real world, multiple layers of security often cause a malicious attacker to get frustrated, or to simply run out of time and options, before an actual breach occurs.
The PowerCare security service offers you a choice of several options to meet your specific
security needs:
PowerCare security assessment for AIX or IBM i
Single sign-on (SSO) for IBM i implementation assistance
Lightweight Directory Access Protocol (LDAP)-Microsoft Active Directory (MSAD) AIX
implementation workshop
Role-based access control (RBAC) AIX implementation workshop
Security assessment
The PowerCare security assessment evaluates the security configuration of your Power
Systems, identifying vulnerabilities before they can be exploited. New laws, regulations, and industry standards in the United States, such as the Payment Card Industry Data Security Standard (PCI DSS), Sarbanes-Oxley (SOX), and the Health Insurance Portability and Accountability Act (HIPAA), are forcing organizations to be compliant with security and privacy requirements. Our security professionals can help you to
adapt to new ways of working, and new ways of thinking about security and the regulatory
requirements:
Sarbanes-Oxley Act of 2002 (SOX), which requires compliance for software security and
application security based around digital data integrity and accountability.
Health Insurance Portability and Accountability Act of 1996 (HIPAA), which protects the
privacy of individually identifiable health information. It sets national standards for the
security of electronic protected health information and the confidentiality provisions of the
patient safety rule, which protect identifiable information from being used to analyze
patient safety events and improve patient safety.
Payment Card Industry Data Security Standard (PCI DSS) standards include twelve
requirements for any business that stores, processes, or transmits payment cardholder
data. These requirements specify the framework for a secure payments environment.
Thorough documentation of the results and specific recommendations for mitigating the
identified vulnerabilities and improving overall security posture are provided at the conclusion
of the service. Do not consider the review a complete and comprehensive report on every aspect of security on your Power System; no assessment can capture each and every exposure that might exist. We think of the findings and recommendations as an add-on that complements
your well-thought-out existing security policy. The review can be used as a status check on
many of the settings and configurations that exist at the time of the assessment. It can further
be used as a measurement of how your system is configured in relationship to your security
policy and other regulatory requirements that exist.
Every client’s environment differs, so final recommendations are created that take into
account the uniqueness of your particular environment. The security assessments focus on
IBM AIX Version 5.3 or later and on IBM i 5.4 or later.
The security assessment proactively looks at several areas within AIX:
System installation and configuration
Auditing and logging
File and directory permissions
Login controls
Security architecture
AIX-specific security feature usage by release
User and group accounts
Passwords
TCP/IP security
Network File System (NFS) security
The security assessment on IBM i addresses the following security areas:
System installation and configuration
Auditing and logging
Resource security
Integrated file system (IFS) security
NetServer security
Network security, including TCP/IP
Users and group controls
At the completion of the assessment, feedback is provided on how you can improve your
current security implementation. It also includes advice on taking your security
implementation to the next level to enhance your overall security strategy and to take
advantage of the features that are provided with IBM Power Systems. Preferred practices can
significantly help reduce security exposures in all environments.
Figure 8-3 shows one example of the type of information that your assessment provides.
Figure 8-3 A quick summary of security-related exposures on IBM i
Single sign-on (SSO) for IBM i
The objective of SSO is to have one single logon per user that allows users access to all
applications and all systems that they require. Its goal is to provide a unified mechanism to
manage the authentication of users and implement business rules that determine user
access to applications and data. Without a unified sign-on strategy, developers re-implement
custom security for each application, which can limit scalability and create on-going
maintenance issues.
Preferred practice: Perform a security assessment yearly to provide a health check of
your security effectiveness.
We implemented SSO technology within IBM i in V5R2 (4 June 2002). The implementation is designed so that network users can use Network Authentication Service (NAS) to automatically authenticate themselves and sign on to IBM i applications without entering a user profile and password.
The Enterprise Identity Mapping (EIM) table maps a network user’s Microsoft Windows
domain identity to specific user profiles for each i partition to which the user is authorized to
sign on. The EIM table is maintained on an i partition and accessed through an LDAP server.
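Conceptually, the EIM table behaves like a two-level lookup: one network identity maps to a user profile on each partition. The following toy Python sketch is purely illustrative; the dictionary, identities, and function name are invented, and this is not the EIM API (real EIM data lives in an LDAP directory):

```python
# Toy illustration of EIM-style identity mapping. Real EIM tables are stored
# in an LDAP server and accessed through the EIM APIs; this dictionary and
# these names are invented purely to show the mapping shape.
from typing import Optional

# One Windows domain identity maps to the user profile on each IBM i
# partition that the user is authorized to sign on to.
eim_table = {
    "ACME.COM\\jsmith": {"PROD1": "JSMITH", "DEV1": "JSMITHDEV"},
}

def lookup_profile(network_identity: str, partition: str) -> Optional[str]:
    """Return the mapped user profile for a partition, or None if unmapped."""
    return eim_table.get(network_identity, {}).get(partition)

print(lookup_profile("ACME.COM\\jsmith", "PROD1"))  # prints JSMITH
```

The point of the indirection is that the user never types the per-partition profile or password; authentication happens once against the network identity, and the mapping supplies the rest.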
You can obtain additional information about SSO in Windows-based Single Sign-on and the
EIM Framework on the IBM eServer iSeries Server, SG24-6975.
There are many benefits that SSO can provide to your organization:
Improved user productivity, because users no longer have to remember multiple user IDs and passwords.
Fewer requests to your help desk to reset forgotten passwords or locked accounts.
Simplified administration, because managing user accounts is much easier, although applications can still require additional user-specific attributes within the application.
You might choose to have IBM assist you with the implementation of SSO for your IBM i and
show you how to avoid several of the pitfalls that slow down your deployment. Your
PowerCare SSO service includes these tasks:
Integrating up to four IBM i systems with your Microsoft Windows Server 2003 domain/Active Directory authentication environment.
Configuring your IBM i operating system correctly to participate in the environment.
Configuring one instance of an EIM domain to be used by all four instances of the IBM i operating system.
Enabling up to five user IDs to use password elimination.
Assisting with the analysis of your critical applications relating to the ability and effort required to allow participation in the password elimination/SSO environment.
LDAP-MSAD AIX implementation workshop
Lightweight Directory Access Protocol (LDAP) is an open standard that provides a central
mechanism for maintaining system configuration and policy information. This standard allows
you to configure and manage multiple systems with a single set of user identity configuration
information, which simplifies system administration tasks.
Today, most of your users begin their computing sessions by logging into a Microsoft
Windows Active Directory (MSAD) domain. The ability to reuse their MSAD credentials is a
significant benefit for users when their end destination is a Power server that requires
additional login authentication. You can find more information about AIX and LDAP in
Integrating AIX into Heterogeneous LDAP Environments, SG24-7165.
Your LDAP-MSAD implementation workshop provides an on-site demonstration using your
AIX LDAP client partition and your MSAD server to show the capability of a user logging onto
an AIX partition using a Windows account and password combination that has been properly
enabled. Your IBM Services Specialist provides on-site training and skills transfer to explain
the fundamentals of how LDAP is used with AIX to provide centralized user management and how to use Microsoft's Identity Management with UNIX to enable Windows user and group accounts to be used by your Power AIX partitions.
IBM Lab Services EIM Populator utility: The PowerCare service includes the use of the IBM Lab Services EIM Populator utility during this engagement to assist with the loading of identity mapping information into EIM.
8.2.6 Performance optimization assessment
Performance is a concern for IT organizations worldwide. Dependencies on hardware, software applications, and distributed computing increase the need for these systems to perform well.
By properly balancing system resources, jobs can run at their optimal levels with minimal
resource conflicts. To optimize your system’s performance, you need to fully understand the
business requirements that your Power System is addressing and be able to translate these
business needs into performance objectives.
The performance optimization assessment is for clients who want to obtain information that
assists with their Power System performance optimization. The performance services offering
is three-fold:
Provide guidance with the usage of virtualization technologies to identify where
consolidation and workload balancing apply. The major focus of server virtualization is to
reduce IT complexity and total cost of ownership, while maximizing resource usage.
Virtualization enables the consolidation of many disparate and potentially underutilized
servers into fewer physical systems, which can result in reduced system management,
software licensing, and hardware costs.
Identify areas where machine consolidation applies.
Provide a system health check with a focus on performance optimization, inspect the
operating system running on specific LPARs for up-to-date fix levels and drivers, and
provide memory, disk, and swap space system parameters.
The IBM Power Systems performance assessment is designed to help you optimize your
Power Systems environment using virtualization, and to increase the efficiency of one or
more of your critical partitions. The number of LPARs that can be analyzed depends on the
complexity of your environment. The IBM Power Systems performance assessment aims to
identify the areas where the performance of specific partitions can be fine-tuned by changing, where applicable, certain settings:
Micro-partitioning settings
Partitioning and resource allocation
Virtual I/O server setup
Virtual Ethernet
Virtual Fibre Channel (FC)
Virtual storage
Subsystem setup
Journal configuration
General system settings
Additional configuration objects
Figure 8-4 shows an example of a performance assessment summary.
Changing performance objectives: Remember that as your business needs evolve and change, your performance objectives must also evolve and change.
IBM i and AIX: Both IBM i and AIX operating systems are included in the performance assessment.
Figure 8-4 Performance assessment summary
Performance data is collected from your identified Power System partition and analyzed using
the appropriate operating system tools. Although performance improvements cannot be
guaranteed, the intended result of the assessment is to provide you with recommendations to
improve your overall system performance. If time permits, modifications to the system might
be tested under the IBM team’s supervision.
8.2.7 Power Flex enablement
Power Flex is a new and exciting capability on our high-end IBM POWER7 Power 795
Enterprise Server that allows more flexible use of purchased processor and memory
activations across a pool of Power 795 systems to help increase the utility of these resources
and to enhance your application availability.
A Power Flex infrastructure on the Power 795 can deliver unprecedented performance,
capacity, and seamless growth for your AIX, IBM i, and Linux applications today and into the
future. Flex Capacity Upgrade on Demand provides new options for resource sharing in
support of large-scale workload consolidation and changing business and application
requirements. Employing the strength and innovation of the IBM Power 795 server, PowerVM
virtualization, and Capacity on Demand technology, Power Flex can enable your organization
to more affordably deploy applications across two to four enterprise systems to enhance
application availability. At the same time, Power Flex allows vital virtual processor and
memory resources to be deployed precisely where you need them, dynamically adjusting
capacity and even reallocating it beyond the boundaries of a single Power 795 system.
The Power Flex enablement service is centered around high availability and is intended for
clients who want to implement Power Flex, and take advantage of all the capabilities that it
provides. In addition, the offering includes a system health check of both hardware and software areas with the goal of improving system availability. The Power Flex offering is limited to
analyzing up to four eligible IBM Power servers as part of a Power Flex pool.
Figure 8-4 content (indicator: observations):
Virtualization: Basic virtualization implemented (LPARs). Intermediate virtualization features, such as shared processors and VIO Servers, are not implemented, but are in plans. Advanced virtualization features, such as Live Partition Mobility, are not possible due to lack of prerequisites.
Consolidation: Grouping of production and development/test workloads achieved. DR servers are also used for development and test workloads; during DR, non-business-critical workloads will be shut down to make way for production. Additional consolidation can happen based on resource usage/VIO Servers. Additional processors will be added to support newer workloads.
CPU performance: CPU utilization is at maximum for the Singapore server. Other workloads are running at optimal utilization.
Memory performance: Overall memory utilization is optimal. A few best-practice changes for memory management are recommended. Memory management tuning recommendations can potentially reduce the possibility of paging during peak workload periods.
IO performance: Some servers are running with visibly very high I/O waits. At times, I/O waits are higher than the actual system utilization at that time. I/Os pending due to non-availability of LVM, JFS2, and device driver buffers increase daily.
Availability Review: FLRT and MDS reports indicate recommended upgrades. Some of the updates needed are HIPER and PE.
IBM PowerCare experts assist you with the enablement of your Power Flex environment,
which consists of two to four Power 795 systems, Flex Capacity on Demand option, On/Off
Capacity on Demand, and optionally PowerVM Live Partition Mobility. The Power Flex
solution enables you to shift licensed capacity from one system to another system to perform
scheduled system maintenance. It can also help you balance workloads and help handle
peaks in demand.
The Power Systems Power Flex enablement and consulting workshop focuses on your Power Systems hardware and software configuration and provides solution options that relate to Power Flex and high availability.
This offering includes these functions:
Power Flex solution offering and design
Hardware configuration options
Power Flex licensing options
Operational processes and expectations
Software licensing
Resource balancing scenarios
Planned maintenance scenarios
Capacity Upgrade on Demand and On/Off Capacity on Demand
Planned outages utilizing Live Partition Mobility and Capacity Upgrade on Demand
The following availability components are addressed in conjunction with Power Flex:
PowerHA solution scenarios and options
Unplanned outages utilizing PowerHA and Capacity Upgrade on Demand
Single system points of failure and hardware configuration
Hardware Management Console (HMC), Flexible Service Processor (FSP), and LPAR configuration
Operating system software levels and support
Technology Levels and fix maintenance prerequisites
Figure 8-5 shows a sample Power Flex enablement summary.
Figure 8-5 Power Flex enablement summary
8.2.8 Power 795 upgrade implementation services
Our consultants bring the experience and expertise of IBM to assist you in the planning and
implementation of an upgrade to a Power 795 environment. Our experience with other Power
client upgrades worldwide can help you to identify and mitigate potential risks that might
affect both the time and cost of your migration project. The PowerCare Power 795 upgrade
implementation service works with you during the planning and deployment of a POWER6
595 to Power 795 upgrade. This service helps you to ensure that the LPARs, virtual I/O
servers, and AIX configurations are prepared and backed up so that they can be quickly
deployed after the hardware upgrade is complete. The PowerCare team has developed
several tools to automate the data gathering and backup requirements and to ensure that the
data is captured and readily available when you need to restore it. Data is captured from both
AIX and IBM i operating system partitions.
Upgrading the POWER6 595 to the Power 795 is a complex process, which requires careful
coordination and planning. The Power 795 requires the current levels of operating systems
and firmware on the HMC, virtual I/O servers, and all the LPARs that will be in place before
the upgrade can be successfully completed. The PowerCare upgrade services deal strictly
with the technical work that is necessary to move workloads from your existing Power system
to your new POWER7 system. Several aspects of the system configuration must be captured
before the upgrade to ensure that the system can be fully returned to an operating state after
the upgrade. Figure 8-6 shows a sample of the detailed hardware planning that is
documented.
Skills transfer: The IBM team works side-by-side with your team to develop and execute
the project steps to deliver a successful Power 795 upgrade.
Figure 8-6 Hardware upgrade planning
Your Power technology services specialist from our STG Lab Services consulting
organization provides support during the pre-upgrade/migration, upgrade/migration, and
post-upgrade/migration phases for ensuring that the LPARs, virtual I/O servers, and AIX
configurations, and software/firmware prerequisites are set up as required. This support
includes the remote assessment of the source and target environments prior to the migration,
on-site support to the IBM migration team, and post-migration support to assist with the
reconfiguration and setup of the target environment LPARs, virtual I/O servers, and AIX, if
required.
After your hardware upgrade has been completed, your PowerCare services specialist
remains on-site while you boot all partitions and validate that the upgrade was successful.
8.2.9 PowerCare technical training
The IBM Systems Lab Services and Consulting team provides our Power 780 and 795
clients with PowerCare service options that deliver technical leadership and consulting at no
charge. We can help you optimize the utilization of your Power Systems solutions. We are
focused on the new
technologies that are emerging from the IBM product development labs and the delivery and
training of new and important technologies to our clients. The IBM Systems Lab Services and
Training Power Services team is fully experienced in technology implementation in Power
Systems environments. This highly trained part of the IBM Power Systems development lab
offers comprehensive knowledge of and experience with the products and solutions that can
help you get more from your Power Systems investment.
Our newest PowerCare service option, PowerCare Technical Training, brings a subject matter
expert (SME) to your site to help you increase the knowledge of your technical team. You can
choose from a selection of Power course topics and solution areas (refer to Table 8-1):
Availability
Systems Director VMControl
Systems Director/AEM
Security
Performance optimization
The PowerCare training courses consist of content that maps closely to the skills that are
required to perform other PowerCare service options. These on-site training services
provide lab exercises to complement the learning objectives. You can ask questions and learn
about processes that are specific to your Power environment from experts in the industry.
Your training also includes consultation with you on the next steps to address your training
requirements through our training road maps.
Table 8-1 PowerCare course catalog
Course code Course title Operating system
AP21 Availability for Power Systems AIX
AP23 Security for Power Systems AIX
AP24 Security for IBM i IBM i
AP25 Performance for Power Systems AIX
AP26 Performance for IBM i IBM i
AP27 Systems Director for Power Systems AIX and IBM i
For more information: You can obtain more information about the IBM PowerCare
education choices at this website:
http://www-03.ibm.com/services/learning/ites.wss/ph/en?pageType=page&contentID=a0000775
Appendix A. Administration concepts
This appendix provides IBM Power Systems administration with AIX concepts that relate to
the testing that we performed and the examples in this publication.
We discuss the following topics:
Making a root volume group (rootvg) easier to manage
Example importing non-root volume group
A dynamic LPAR operation using the HMC
Setting up Secure Shell keys between two management consoles
Simple cluster installation
Installing and configuring PowerHA
Making a root volume group (rootvg) easier to manage
If at all possible, do not put non-system file systems in the rootvg. Only system data should
reside in rootvg.
For instance, an administrator can install WebSphere in /usr/WebSphere or DB2 in /opt/IBM.
Because /usr and /opt are operating system file systems, you end up with a large rootvg.
Consider using a completely separate path instead. If the application is hard-coded to use
only a certain path, consider creating a separate file system on another disk with the
required path.
Non-system data also makes the rootvg difficult to manage. Application configuration files,
data, and log files grow quickly and can make the rootvg large. Unless the
/etc/exclude.rootvg file is created and updated with the files that must not be backed up, all
file systems in the rootvg are backed up by the mksysb command. The mksysb backups, as
well as the alternate disk clones, become very large and time-consuming to create and
restore.
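A minimal sketch of such an exclusion list follows. The patterns are hypothetical examples, not taken from this book; on AIX the patterns live in /etc/exclude.rootvg and are read by mksysb -e, but the sketch writes a sample file in the current directory so that it is safe to run anywhere:

```shell
# Build a sample mksysb exclusion list (patterns are hypothetical examples).
# On AIX the real file is /etc/exclude.rootvg; mksysb -e reads it and skips
# any file whose ./relative path matches one of these grep patterns.
cat > exclude.rootvg.sample <<'EOF'
^./opt/IBM/db2/logs/
^./userdata/
^./tmp/
EOF
echo "wrote sample exclusion list:"
cat exclude.rootvg.sample
```

On an AIX system, you would place the same patterns in /etc/exclude.rootvg and then run, for example, mksysb -e -i /dev/rmt0 so that the matching files are skipped.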
Example A-1 shows the file systems that must be included in the rootvg. Any other file
systems must reside on separate disks (physical volumes), and therefore in another
volume group.
Example A-1 Rootvg without non-system files
# hostname
rflpar20
# lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd5 boot 2 2 1 closed/syncd N/A
hd6 paging 32 32 1 open/syncd N/A
hd8 jfs2log 1 1 1 open/syncd N/A
hd4 jfs2 12 12 1 open/syncd /
hd2 jfs2 114 114 1 open/syncd /usr
hd9var jfs2 15 15 1 open/syncd /var
hd3 jfs2 4 4 1 open/syncd /tmp
hd1 jfs2 1 1 1 open/syncd /home
hd10opt jfs2 22 22 1 open/syncd /opt
hd11admin jfs2 8 8 1 open/syncd /admin
lg_dumplv sysdump 64 64 1 open/syncd N/A
livedump jfs2 16 16 1 open/syncd
/var/adm/ras/livedump
For more information about volume groups and file system creation, refer to the AIX Logical
Volume Manager from A to Z: Introduction and Concepts, SG24-5432.
Important: If administrators put application components in rootvg file systems, they might
put only configurations and binaries (installation files) there. Make sure that log files and
data are excluded from the root volume group file systems. Restoring rootvg then restores
both the binaries and the base operating system (BOS).
Example importing non-root volume group
After a new and complete overwrite installation or a mksysb restore, it might be necessary to
re-import the data volume groups. If the data resided in the root volume group, it might be
lost. If the data and applications resided on a disk other than the rootvg disk, they can be
recovered by importing the volume groups.
Example A-2 shows how to confirm the content of a disk after recovering the volume group.
Example A-2 Displaying how to import a volume group
After a system creation, df might show only the system file systems:
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 393216 70456 83% 11875 57% /
/dev/hd2 3735552 247072 94% 34250 53% /usr
/dev/hd9var 491520 139432 72% 6501 29% /var
/dev/hd3 163840 160536 3% 37 1% /tmp
/dev/hd1 32768 32064 3% 5 1% /home
/dev/hd11admin 262144 261416 1% 5 1% /admin
/proc - - - - - /proc
/dev/hd10opt 720896 242064 67% 8144 23% /opt
/dev/livedump 524288 523552 1% 4 1% /var/adm/ras/livedump
#
This output shows only root volume group volumes.
To check whether there are any other disks, run lspv:
# lspv
hdisk0 00c1f170c2c44e75 rootvg active
This example shows only hdisk0 as being available.
Confirm that there are no other attached or lost disks by running lsdev:
# lsdev -Cc disk
hdisk0 Available Virtual SCSI Disk Drive
The example still shows one disk.
Run cfgmgr to discover any devices that are attached but not in the system
configuration (CuDv). Then, run lspv to see whether the number of disks changes.
# cfgmgr
# lspv
hdisk0 00c1f170c2c44e75 rootvg active
hdisk1 00f69af6dbccc5ed None
hdisk2 00f69af6dbccc57f None
#
Now, the system shows two more disks, but one volume group.
Confirm the contents of the disks before assuming they are not in use.
# lqueryvg -Atp hdisk1
0516-320 lqueryvg: Physical volume hdisk1 is not assigned to
a volume group.
0516-066 lqueryvg: Physical volume is not a volume group member.
Check the physical volume name specified.
#
# lqueryvg -Atp hdisk2
0516-320 lqueryvg: Physical volume hdisk2 is not assigned to
a volume group.
Max LVs: 256
PP Size: 25
Free PPs: 311
LV count: 6
PV count: 1
Total VGDAs: 2
Conc Allowed: 0
MAX PPs per PV 32768
MAX PVs: 1024
Quorum (disk): 1
Quorum (dd): 1
Auto Varyon ?: 0
Conc Autovaryo 0
Varied on Conc 0
Logical: 00c1f17000004c0000000130095dd271.1 db2bin 1
00c1f17000004c0000000130095dd271.2 loglv00 1
00c1f17000004c0000000130095dd271.3 data1lv 1
00c1f17000004c0000000130095dd271.4 data2lv 1
00c1f17000004c0000000130095dd271.5 data1lg 1
00c1f17000004c0000000130095dd271.6 data1tmp 1
Physical: 00f69af6dbccc57f 2 0
Total PPs: 317
LTG size: 128
HOT SPARE: 0
AUTO SYNC: 0
VG PERMISSION: 0
SNAPSHOT VG: 0
IS_PRIMARY VG: 0
PSNFSTPP: 140288
VARYON MODE: ???????
VG Type: 2
Max PPs: 32768
Mirror Pool St n
Hdisk2 shows that it has some data, but no volume group. You can now import the
volume group. If you have documentation of the volume group name and major number,
you can specify them (for example, with importvg -V <major> -y <vgname>). PowerHA has
major number requirements that are not discussed in this topic.
Import the volume group:
# importvg -y datavg hdisk2
datavg
# lspv
hdisk0 00c1f170c2c44e75 rootvg active
hdisk1 00f69af6dbccc5ed None
hdisk2 00f69af6dbccc57f datavg active
Confirm the contents of datavg
# lsvg -l datavg
datavg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
db2bin jfs2 1 1 1 closed/syncd /opt/IBM/db2
loglv00 jfs2log 1 1 1 closed/syncd N/A
data1lv jfs2 1 1 1 closed/syncd /userdata/dat1
data2lv jfs2 1 1 1 closed/syncd /userdata/dat2
data1lg jfs2 1 1 1 closed/syncd /userdata/dblog
data1tmp jfs2 1 1 1 closed/syncd /userdata/dbtmp
Mount the file systems:
# mount /opt/IBM/db2
# mount /userdata/dat1
# mount /userdata/dat2
# mount /userdata/dblog
# mount /userdata/dbtmp
Rerun df and compare the results with those at the beginning of this example.
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 393216 70408 83% 11894 57% /
/dev/hd2 3735552 247072 94% 34250 53% /usr
/dev/hd9var 491520 139432 72% 6501 29% /var
/dev/hd3 163840 160536 3% 37 1% /tmp
/dev/hd1 32768 32064 3% 5 1% /home
/dev/hd11admin 262144 261416 1% 5 1% /admin
/proc - - - - - /proc
/dev/hd10opt 720896 242064 67% 8144 23% /opt
/dev/livedump 524288 523552 1% 4 1% /var/adm/ras/livedump
/dev/db2bin 65536 64864 2% 4 1% /opt/IBM/db2
/dev/data1lv 65536 64864 2% 4 1% /userdata/dat1
/dev/data2lv 65536 64864 2% 4 1% /userdata/dat2
/dev/data1lg 65536 64864 2% 4 1% /userdata/dblog
/dev/data1tmp 65536 64864 2% 4 1% /userdata/dbtmp
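The discovery-and-import sequence in Example A-2 can be condensed into a short script. This is a sketch under the assumptions of this example (disk hdisk2, volume group datavg); it is guarded so that the AIX LVM commands run only where they exist:

```shell
# Sketch: discover newly attached disks and import a data volume group.
# The disk (hdisk2) and volume group name (datavg) follow the example above.
if command -v cfgmgr >/dev/null 2>&1 && command -v importvg >/dev/null 2>&1; then
    cfgmgr                      # discover devices not yet in CuDv
    lspv                        # list physical volumes and their volume groups
    lqueryvg -Atp hdisk2        # confirm that the disk holds volume group data
    importvg -y datavg hdisk2   # import the volume group under the name datavg
    lsvg -l datavg              # verify its logical volumes
    status=attempted
else
    status=illustration-only
    echo "AIX LVM commands not found; the sequence is shown for illustration only"
fi
```

Always confirm the contents of a disk with lqueryvg before importing it, exactly as the transcript above does.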
A dynamic LPAR operation using the HMC
In this example, we remove an adapter dynamically from a running logical partition (LPAR) by
using an HMC. We added this adapter in 2.10.2, “Dynamically changing the LPAR
configurations (DLPAR)” on page 59 using the Systems Director Management Console
(SDMC). Follow these steps:
1. Run the lsdev command to list the devices on the LPAR, as shown in Example A-3.
Example A-3 Executing the lsdev command
# lsdev -Cc adapter
ent0 Available Virtual I/O Ethernet Adapter (l-lan)
fcs0 Available 20-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 21-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 22-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 23-T1 Virtual Fibre Channel Client Adapter
vsa0 Available LPAR Virtual Serial Adapter
vscsi0 Defined Virtual SCSI Client Adapter
2. Remove the adapter configuration from the operating system by issuing the following
command:
# rmdev -dl vscsi0
vscsi0 deleted
Confirm that the adapter has been removed, as shown in Example A-4.
Example A-4 Confirmation of the removal of the adapter
# lsdev -Cc adapter
ent0 Available Virtual I/O Ethernet Adapter (l-lan)
fcs0 Available 20-T1 Virtual Fibre Channel Client Adapter
fcs1 Available 21-T1 Virtual Fibre Channel Client Adapter
fcs2 Available 22-T1 Virtual Fibre Channel Client Adapter
fcs3 Available 23-T1 Virtual Fibre Channel Client Adapter
vsa0 Available LPAR Virtual Serial Adapter
#
3. Log on to the HMC. Select Systems Management → Servers, and then select the
required managed server. See Figure A-1.
Figure A-1 Selecting a managed server
4. Select the LPAR. Click Tasks → Dynamic Logical Partitioning → Virtual Adapters. See
Figure A-2.
Figure A-2 Selecting virtual adapters
5. Highlight the adapter. Click Actions → Delete, and confirm when prompted, as shown in
Figure A-3.
Figure A-3 Dynamic LPAR operation to delete an adapter
6. You can run the cfgmgr and lsdev commands again to confirm that the adapter is no
longer available.
Setting up Secure Shell keys between two management
consoles
In this example, we use the SDMC and the HMC. To be able to send remote commands
between an HMC and an SDMC, perform these steps:
1. Log on to one of the consoles and confirm that you have a connection to the other, as
shown in Example A-5.
Example A-5 Pinging the HMC for a connection
sysadmin@sdmc1:~> ping -c 2 -i 2 172.16.20.109
PING 172.16.20.109 (172.16.20.109) 56(84) bytes of data.
64 bytes from 172.16.20.109: icmp_seq=1 ttl=64 time=0.165 ms
64 bytes from 172.16.20.109: icmp_seq=2 ttl=64 time=0.185 ms
--- 172.16.20.109 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.165/0.175/0.185/0.010 ms
2. Set up Secure Shell (ssh) keys between the HMC and the SDMC. From the source HMC,
run the mkauthkeys command, as shown in Example A-6.
Example A-6 Running the mkauthkeys
hscroot@hmc4:~> mkauthkeys --ip 172.16.20.22 -u sysadmin -t rsa
Enter the password for user sysadmin on the remote host 172.16.20.22:
hscroot@hmc4:~>
In Example A-6:
– --ip is the destination management console (the SDMC in this example).
– -u is the user for this migration, typically sysadmin or hscroot.
This ssh connection is required for remote operations, such as remote Live Partition
Mobility.
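As a sketch of what the key exchange enables, a non-interactive remote command looks like the following. The address and user are the ones from Example A-6, and lssyscfg -r sys -F name is shown only as one example of a remote query (it lists managed system names); outside the lab environment the connection simply fails:

```shell
# Sketch: after mkauthkeys, a command can run on the remote console without a
# password prompt. BatchMode=yes makes ssh fail instead of prompting.
SDMC=172.16.20.22
if ssh -o BatchMode=yes -o ConnectTimeout=5 sysadmin@"$SDMC" \
        lssyscfg -r sys -F name 2>/dev/null; then
    status=reachable
else
    status=unreachable
    echo "remote console not reachable (expected outside the lab environment)"
fi
```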
Simple cluster installation
In this section, we show you the PowerHA installation that we performed for this book. For a
comprehensive installation of PowerHA, refer to IBM PowerHA SystemMirror 7.1 for AIX,
SG24-7845.
System planning is a requirement. Figure A-4 on page 363 shows a planning diagram for our
environment.
Prior to the installation, read the product installation guide to confirm that you have met all the
prerequisites. The prerequisites influence the success or failure of the installation.
Remote command: If the destination manager is an HMC, you have to set up remote
command by selecting HMC Management and then selecting Set up Remote
command. This step is done by default on the SDMC.
Figure A-4 Simple cluster configuration (cluster pough on servers p780_01 and p570_170: resource group lpar1svcrg with primary node rflpar10, service address rflpar10_svc, application server lpar1appserver, and volume group data1vg; resource group lpar2svcrg with primary node rflpar20, service address rflpar20_svc, application server lpar2appserver, and volume group data2vg)
We performed the following steps on each client. At this point, you do not have PowerHA
configured. Follow these steps:
1. Mount the installation media on the two LPARs that are going to form part of the cluster.
The media can be a CD-ROM or a file system. In Example A-7, we used a Network File
System (NFS) mounted file system.
Example A-7 Mounting the file system
# mount nimres1:/bigfs /mnt
2. Run an installp preview to see if all prerequisites have been met. Although you can run
installp -agXY -p -d <device> to confirm the prerequisites, we recommend that you use smitty.
Select smitty install → Install and Update Software → Install Software. Then, enter
the INPUT device/directory for the software. You can change the directory to the
installation directory, but it is not necessary.
Under the input directory, specify the directory where the installation media is located. In
our example, we used /mnt/HA71. This action opens an installation window, as shown in
Figure A-5 on page 364.
Figure A-5 Installation selection menu
3. The following selections are possible:
– SOFTWARE to install
Press F4 to manually select the product that you want to install, as shown in Figure A-6
on page 365.
– PREVIEW only? (install operation will NOT occur)
To perform a preview only, select yes.
– COMMIT software updates?
Select no. Committed installations cannot be rolled back using the smitty reject fast
path. All new installations have an auto-commit.
– ACCEPT new license agreements?
Select yes.
Select Continue when prompted and check for any preview errors. If the result is OK,
change the Preview only field to no and press Enter to continue. Follow the same
procedure for all of the LPARs that participate in the cluster.
Updates: Make sure that you install all of the available updates.
Figure A-6 Manual installation product selection
Installing and configuring PowerHA
We show the initial setup of a cluster. PowerHA 7.1 depends on Cluster Aware AIX (CAA).
You do not have to set up the CAA cluster up front, because this task can be done by
PowerHA when creating a cluster. We use PowerHA 7.1 terminology. Refer to IBM PowerHA
SystemMirror 7.1 for AIX, SG24-7845, for a list of the changes from the previous versions of
PowerHA.
If you have never set up PowerHA, using the typical setup helps make the configuration
simple. You can use the custom setup after the cluster is configured to make custom
changes. Follow this procedure:
1. Plan and prepare the cluster for communication.
In this step, you prepare your LPARs for the cluster. The book IBM PowerHA SystemMirror
7.1 for AIX, SG24-7845 explains the options in this section in detail. You have the following
options:
– Planning the cluster topology, resource groups, and resources
– Preparing networks and storage
– Preparing CAA repository disks
– Preparing the rhost file for communication between the LPARs
2. Set up the initial cluster. Type smitty sysmirror. Click Cluster Nodes and Networks →
Initial Cluster Setup (Typical). This path takes you to the initial step-by-step setup of a
cluster. The PowerHA 7.1 menu was designed for ease of use. If you follow the menu
items, as shown in Example A-8 on page 366 sequentially, you create a simple yet
complete cluster. We follow the sequence logically in this example.
Example A-8 Initial cluster setup example
Initial Cluster Setup (Typical)
Move cursor to desired item and press Enter.
Setup a Cluster, Nodes and Networks
Define Repository Disk and Cluster IP Address
What are the repository disk and cluster IP address ?
F9=Shell F10=Exit Enter=Do
3. Select Setup a Cluster, Nodes and Networks and enter the initial details. Our details
differ from the details in your environment. Refer to Example A-9.
Example A-9 Setting up the cluster and nodes
Setup a Cluster, Nodes and Networks
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Cluster Name pough
Currently Configured Node(s) rflpar10 rflpar20
F9=Shell F10=Exit Enter=Do
4. We then set up CAA. Select Define Repository Disk and Cluster IP Address. Notice that
hdisk2 is selected for the repository disk in Example A-10.
Example A-10 Defining CAA parameters
Define Repository and Cluster IP Address
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Cluster Name pough
* Repository Disk [hdisk2] +
Cluster IP Address []
F9=Shell F10=Exit Enter=Do
5. Example A-11 shows the disk that was created for the repository, and it shows that hdisk2
has changed to caa_private:
caa_private0 00f69af6dbccc5ed caavg_private active
Example A-11 Repository disk setup
# lspv
hdisk0 00c1f170c2c44e75 rootvg active
hdisk1 00f69af6dbccc57f None
caa_private0 00f69af6dbccc5ed caavg_private active
hdisk3 00c1f170c59a0939 None
Example A-12 shows the multicast address.
Example A-12 Multicast address
Define Repository Disk and Cluster IP Address
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
Cluster Name pough
Repository Disk caa_private0
Cluster IP Address 228.16.21.36
F1=Help F2=Refresh F3=Cancel F4=List
Esc+5=Reset F6=Command F7=Edit F8=Image
F9=Shell F10=Exit Enter=Do
6. At this point, you can synchronize the cluster. But first, in this example, we also add
persistent IP addresses before synchronizing the cluster. On the Cluster, Nodes and
Networks menu, select Manage Nodes → Configure Persistent Node IP
Label/Addresses → Add a Persistent Node IP Label/Address. Select a node and
press Enter. Figure A-7 shows a Configure Persistent Node IP Label/Addresses window.
Figure A-7 Node selection dialog
7. Enter the necessary details, as shown in Figure A-8.
Figure A-8 Setting up the persistent IP addresses for nodes
8. At this point, we verify the cluster. After the cluster verification is successful, you can start
the cluster. After this stage, most configurations can use the dynamic cluster
reconfiguration (DARE) facility.
Smitty: Where possible, use the smitty selection feature to avoid typing values. A plus
sign (+) next to a text box indicates that available options can be selected.
9. You can use the smitty clstart fast path to get to the Start Cluster Services window, which is
shown in Example A-13. Most of the details in the window that is shown in the example are
selected with the F4 or ESC+4 selection facility.
Example A-13 Starting the cluster services
Start Cluster Services
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Start now, on system restart or both now +
Start Cluster Services on these nodes [rflpar10,rflpar20] +
* Manage Resource Groups Automatically +
BROADCAST message at startup? true +
Startup Cluster Information Daemon? true +
Ignore verification errors? false +
Automatically correct errors found during Interactively +
cluster start?
F1=Help F2=Refresh F3=Cancel F4=List
Esc+5=Reset F6=Command F7=Edit F8=Image
F9=Shell F10=Exit Enter=Do
10. On a successful startup, you can observe the following characteristics:
– An OK status, as shown in Example A-14 on page 370.
– The Persistent IP address, as shown in Example A-15 on page 370.
– Startup details in the /var/hacmp/log/hacmp.out file.
– The clstat utility shows the status of the nodes (Example A-16 on page 371).
Example A-14 Successful cluster startup
COMMAND STATUS
Command: OK stdout: yes stderr: no
Before command completion, additional instructions may appear below.
[TOP]
WARNING: Multiple communication interfaces are recommended for networks that
use IP aliasing in order to prevent the communication interface from
becoming a single point of failure. There are fewer than the recommended
number of communication interfaces defined on the following node(s) for
the given network(s):
Node: Network:
---------------------------------- ----------------------------------
WARNING: Network option "routerevalidate" is set to 0 on the following nodes:
[MORE...128]
F1=Help F2=Refresh F3=Cancel F6=Command
F8=Image F9=Shell F10=Exit /=Find
n=Find Next
Example A-15 Persistent IP address
# ifconfig -a
en0:
flags=1e080863,480
inet 172.16.21.36 netmask 0xfffffc00 broadcast 172.16.23.255
inet 172.16.21.40 netmask 0xfffffc00 broadcast 172.16.23.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
lo0:
flags=e08084b,c0
inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
inet6 ::1%1/0
tcp_sendspace 131072 tcp_recvspace 131072 rfc1323 1
Example A-16 Clstat facility to show cluster status
clstat - PowerHA SystemMirror Cluster Status Monitor
-------------------------------------
Cluster: pough (1111817142)
Thu Jun 2 10:34:18 EDT 2011
State: UP Nodes: 2
SubState: STABLE
Node: rflpar10 State: UP
Interface: rflpar10 (0) Address: 172.16.21.36
State: UP
Node: rflpar20 State: UP
Interface: rflpar20 (0) Address: 172.16.21.35
State: UP
************************ f/forward, b/back, r/refresh, q/quit
******************
Setting up resources
In this example, we use mutual takeover. The IP address and volume groups are failed over
between two LPARs. We created two resource groups with the following information:
Resource group poures1:
– Volume group data1vg
– Service Address rflpar10_svc
Resource group poures2:
– Volume group data2vg
– Service Address rflpar20_svc
Plan and identify all resource components that are used in the cluster. The steps that follow
are valid for our example. You might choose other options. Refer to IBM PowerHA
SystemMirror 7.1 for AIX, SG24-7845, for possible configurations. Follow these steps:
1. Set up the service IP. Type smitty sysmirror. Select Cluster Applications and
Resources → Resources → Configure Service IP Labels/Addresses → Add a
Service IP Label/Address. Select the network net_ether_01 (172.16.20.0/22), then
select the required service address and press Enter, as shown in Figure A-9.
Figure A-9 Service IP address selection
2. Set up application control scripts. This step was called application servers in previous
versions. Select Cluster Applications and Resources → Resources → Configure User
Applications (Scripts and Monitors) → Application Controller Scripts → Add
Application Controller Scripts. See Example A-17. The scripts must exist on both nodes
with the execute bit set. You must test these scripts before you include them in PowerHA.
Example A-17 Start and stop scripts for application servers
Add Application Controller Scripts
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
[Entry Fields]
* Application Controller Name [lpar1appserver]
* Start Script [/hascripts/startlpar1>
* Stop Script [/hascripts/stoplpar1.>
Application Monitor Name(s) +
F1=Help F2=Refresh F3=Cancel F4=List
Esc+5=Reset F6=Command F7=Edit F8=Image
F9=Shell F10=Exit Enter=Do
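A minimal start script skeleton follows. The log path and the commented-out db2start line are placeholders (this book does not supply the script contents); PowerHA only requires that the script exists on both nodes, is executable, and exits 0 on success:

```shell
#!/bin/sh
# Minimal application controller start script sketch. The log path and the
# commented-out db2start line are placeholders for the real application start.
LOG=/tmp/lpar1appserver.start.log
echo "start requested at $(date)" >> "$LOG"
# su - db2inst1 -c db2start    # the real application start command goes here
echo "application started" >> "$LOG"
exit_code=0
```

The matching stop script reverses the action; as noted above, test both scripts manually on each node before including them in PowerHA.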
3. Create the resource groups and include the application controller and service IP address
in the resource groups. This resource group becomes a unit that is controlled by PowerHA.
To add a resource group, select Cluster Applications and Resources → Resource
Groups → Add a Resource Group. See Figure A-10. The book, IBM PowerHA
SystemMirror 7.1 for AIX, SG24-7845, explains the startup, fallover, and fallback policies
in detail.
Figure A-10 Adding a resource group
4. Populate the resource group with the highly available resources. Select Cluster
Applications and Resources → Resource Groups → Change/Show Resources and
Attributes for a Resource Group. Select the Resource Group Name from the pop-up
menu. Add the Application Controller, associated Volume Groups, and
Service IP Labels/Addresses, as shown in Figure A-11.
Figure A-11 Populating a resource group
5. Verify and synchronize the cluster configuration. Notice that the clstat - PowerHA
SystemMirror Cluster Status Monitor window changes. Compare the clstat in Figure A-12
on page 374 and the previous clstat in Example A-16 on page 371. Notice the availability
of the resource group.
Figure A-12 clstat with resource groups
We run several Logical Volume Manager (LVM) commands on a single node. Example A-18
shows the following details:
The lspv command shows that the volume groups are varied on in concurrent mode.
The lsvg command on the two cluster volume groups shows the logical volumes in each
volume group.
The lsvg -l data2vg output shows closed logical volumes, because they are open on the
primary node.
The df command does not show the data2vg file systems.
Example A-18 LVM commands to see the contents of the cluster volume groups
# lspv
hdisk0 00f69af6dbccc621 rootvg active
hdisk1 00f69af6dbccc57f data1vg concurrent
caa_private0 00f69af6dbccc5ed caavg_private active
hdisk3 00c1f170c59a0939 data2vg concurrent
# lsvg -l data1vg
data1vg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
db2bin jfs2 96 96 1 open/syncd /opt/IBM/db2
loglv00 jfs2log 1 1 1 open/syncd N/A
data1lv jfs2 1 1 1 open/syncd /userdata/dat1
data1lg jfs2 1 1 1 open/syncd /userdata/dblog
data1tmp jfs2 1 1 1 open/syncd /userdata/dbtmp
swlv jfs2 96 96 1 open/syncd /somesoftware
db2rep jfs2 96 96 1 open/syncd /db2repos
# lsvg -l data2vg
data2vg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
data2lv jfs2 1 1 1 closed/syncd N/A
loglv01 jfs2log 1 1 1 closed/syncd N/A
# df
Filesystem 512-blocks Free %Used Iused %Iused Mounted on
/dev/hd4 458752 31936 94% 10793 69% /
/dev/hd2 5668864 11904 100% 48285 88% /usr
/dev/hd9var 786432 87992 89% 6007 36% /var
/dev/hd3 262144 256208 3% 144 1% /tmp
/dev/hd1 32768 32064 3% 5 1% /home
/dev/hd11admin 262144 261384 1% 5 1% /admin
/proc - - - - - /proc
/dev/hd10opt 720896 345592 53% 6986 16% /opt
/dev/livedump 524288 523552 1% 4 1% /var/adm/ras/livedump
/aha - - - 43 1% /aha
/dev/fslv00 524288 509552 3% 17 1% /clrepos_private1
/dev/db2rep 6291456 5850240 8% 59 1% /db2repos
/dev/db2bin 6291456 3273864 48% 8381 3% /opt/IBM/db2
/dev/swlv 6291456 3182352 50% 4519 2% /somesoftware
/dev/data1lv 65536 64864 2% 4 1% /userdata/dat1
/dev/data1lg 65536 64864 2% 4 1% /userdata/dblog
/dev/data1tmp 65536 64864 2% 4 1% /userdata/dbtmp
#
© Copyright IBM Corp. 2011. All rights reserved.
Appendix B. Performance concepts
This appendix provides an overview of systems performance concepts, benchmarks, and
specific POWER7 Enterprise Server performance data.
We discuss the following topics:
Performance concepts
Throughput versus response time
Performance and computing resources
Performance concepts
One of the key tasks of IT Architects and solution designers is to develop
“design-for-performance” as part of the overall system design. This type of design aims to
produce a solution that meets particular performance measurements, for example, to enable
the solution or the system to deliver a particular quality of service (QoS). Therefore, system
performance can be defined as “The ability of a system to meet a particular QoS”.
However, the measure of the QoS varies depending on the type of the workload, and the role
of the person defining or measuring the QoS. Sometimes, QoS can be qualitative as opposed
to being quantitative.
The QoS can mean user response time, throughput, system utilization, or time to return to
operations. In broader terms, QoS can also mean system availability or power efficiency.
Performance measurement is usually defined from one of two perspectives:
System perspective: This perspective is typically based on throughput, which is the
average of items, such as transactions or processes per particular measured unit of time,
and utilization, which is the percentage of time that a particular resource is busy.
User perspective: This perspective is typically based on response time, which is the
average elapsed time from the initiation of the task to the point where the user or the
application receives the first response. The response time often is seen as a critical aspect
of performance because of its potential visibility to users or customers.
An example is a computer system that is running a web server with an online store. The
response time here is the elapsed time between clicking the submit button to place an order,
and the beginning of receiving the order confirmation.
Throughput versus response time
Throughput and response time are related. In many cases, a higher throughput comes at the
cost of poorer response time or slower response time. Better response time comes at the cost
of lower throughput. We often see response time graphs, such as the graph that is shown in
Figure B-1 on page 379, which represents the right half of an inverse parabola.
Figure B-1 Example of a typical response time versus throughput graph
For example, we can compare the performance of a 32-bit but relatively faster CPU (higher
GHz) to a 64-bit but relatively slower CPU. The 32-bit, faster CPU provides better
response time but less throughput than the 64-bit, slower CPU. The response
time, or the user perspective performance measurement, can be more qualitative than
quantitative, and it can vary considerably, because there are many variables involved.
It is critical for the performance modeler to define precisely what is measured, in which
configuration, and under which circumstances. The output is usually expressed as average values.
Benchmarks have been developed to address this issue. We discuss benchmarks in the next
section.
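The shape of the curve in Figure B-1 can be sketched with a simple open queueing model. The following is a minimal illustration only, assuming M/M/1 behavior and a hypothetical 10 ms service time; it is not derived from any measurement in this book:

```python
# Sketch of the throughput/response-time trade-off: as utilization (and
# therefore throughput) rises, average response time grows without bound.
def response_time(service_time_s, utilization):
    """Average response time R = S / (1 - U) for an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_s / (1.0 - utilization)

if __name__ == "__main__":
    S = 0.010  # hypothetical 10 ms service time
    for u in (0.1, 0.5, 0.8, 0.9, 0.95):
        print(f"utilization {u:4.0%}: response time {response_time(S, u) * 1000:6.1f} ms")
```

Doubling utilization from 50% to near saturation multiplies the response time several times over, which is the knee visible in typical response-time versus throughput graphs.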
Performance and computing resources
In this section, we discuss the computing resources that affect the entire server performance:
CPU architecture
Multi-core architecture
Memory architecture
I/O storage architecture
I/O networking architecture
Central processing unit
The CPU has a central role in the performance of a computer system. The following CPU
architecture parameters are significant in defining the server performance:
The CPU clock frequency: A CPU cycle is the interval of time that is needed to perform
one operation. Modern CPUs can perform multiple operations during one CPU cycle. A
CPU cycle is not equal to a CPU instruction, which often requires multiple CPU cycles to
complete.
CPU instruction-set architecture: A CPU instruction is a machine or assembly language
instruction that specifies the operation that is given to a CPU. Examples for CPU
instructions are load, add, and store. IBM Power Systems use the Reduced Instruction Set
Computing (RISC) instruction-set architecture, whose instructions usually have a fixed length,
require a more-or-less fixed number of CPU cycles to execute, and use a large number of CPU
registers. Other processors are based on the Complex Instruction Set Computing (CISC)
technology, where the CPU instructions are more complex and therefore require more
CPU cycles to complete.
The path length: The number of instructions that it takes to complete a certain task.
Multithreading: Cache misses can delay the execution of instructions in a processor for
many cycles during which no other instruction can be executed. Multithreading addresses
this issue by simultaneously holding the state of two or more threads. When one thread
becomes stalled, due to a cache miss for example, the processor switches to the state of
another thread and attempts to execute its instructions. Hardware multithreading originally
was introduced with the models M80 and p680. Newer processors, such as POWER5,
POWER6, and POWER7, support Simultaneous Multithreading (SMT), which allows both
hardware threads to execute instructions at the same time. The POWER5 and POWER6
cores support single thread mode (ST) and simultaneous multithreading with two SMT
threads (SMT2). Each SMT thread is represented as a logical CPU in AIX. When running
in SMT2 mode, a system with a single POWER5 or POWER6 core has two logical CPUs.
The POWER7 core supports single thread mode (ST) and simultaneous multithreading
with two SMT threads (SMT2) and four SMT threads (SMT4). When running in SMT4
mode, a system with a single POWER7 core has four logical CPUs. To fully benefit from
the throughput improvement of SMT, the applications need to use all the SMT threads of
the processors.
Processor virtualization: In a virtualized environment, physical processors are represented
as virtual processors that can be shared across multiple partitions. The hypervisor assigns
physical processors to shared partitions (SPLPARs), which are also known as
micro-partitions, based on the capacity configurations and resource consumption of the
partitions. Virtual processor management, which is also known as processor folding,
dynamically increases and reduces the number of virtual processors of a shared partition
based on the instantaneous load of the partition. Many workloads benefit from virtual
processor management due to its higher degree of processor affinity.
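The SMT modes described above determine how many logical CPUs AIX presents. A minimal sketch of that arithmetic (illustrative only; the helper function is hypothetical):

```python
# Logical CPUs seen by AIX = cores x hardware threads per core.
SMT_THREADS = {"ST": 1, "SMT2": 2, "SMT4": 4}

def logical_cpus(cores, smt_mode):
    """Return the number of logical CPUs for a given core count and SMT mode."""
    return cores * SMT_THREADS[smt_mode]

if __name__ == "__main__":
    print(logical_cpus(1, "SMT2"))  # single POWER5/POWER6 core in SMT2: 2
    print(logical_cpus(1, "SMT4"))  # single POWER7 core in SMT4: 4
    print(logical_cpus(8, "SMT4"))  # 8 POWER7 cores in SMT4: 32
```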
Multiple core systems
Multiple processor systems are more complex than single processor systems, because
access to shared resources, such as memory, needs to be serialized, and the data needs to
be kept synchronized across the caches of the individual CPUs. It is important to note that
server performance is not directly proportional to the number of CPUs: it does not increase
linearly as CPUs are added.
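One classical way to reason about this sublinear scaling is Amdahl's law, offered here as a hedged illustration (it is not cited in this book, and real scaling also depends on cache and bus contention):

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Amdahl's law: overall speedup is capped by the serial fraction of the work."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)

if __name__ == "__main__":
    # Even a 95%-parallel workload gains little beyond a handful of CPUs:
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} CPUs -> speedup {amdahl_speedup(0.95, n):.2f}")
```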
There are various types of multiple CPU systems, as shown in Table B-1 on page 381.
Table B-1 Types of multiple CPU systems

Cluster (Shared Nothing):
Each processor is a stand-alone machine.
Each processor has its own copy of the operating system.
No resources are shared; communication is through the network.

Shared Disk MP:
Each processor has its own memory and cache.
Each processor has its own copy of the operating system.
Processors run in parallel.
Processors share disks.
Communication is through the network.

Shared Memory Cluster (SMC):
All processors are in a shared memory cluster.
Each processor has its own resources.
Each processor has its own copy of the operating system.
Processors are tightly coupled.
Connected through a switch.

Shared Memory MP (SMP):
All processors are tightly coupled, inside the same box, with a high-speed bus or switch.
Processors share memory, disks, and I/O devices.
There is one copy of the operating system.
The operating system is multi-threaded.
Memory architecture
The memory of a computer system is divided into several layers of various speeds and sizes.
Typically, faster memory is more expensive to implement and, therefore, smaller in size.
Figure B-2 is a high-level representation of the memory hierarchy based on the location of the
memory in a computer system. The implementation of the individual memory layer might
differ depending on the implementation of a specific architecture.
Figure B-2 Memory hierarchy
Registers
CPU registers are the fastest memory and are the top layer of the memory hierarchy of a
computer system. A CPU has a limited number of registers, which can be used for integer
and floating-point operations.
Caches
Modern processors have multiple layers of caches. The fastest cache that is closest to the
registers is the Level 1 cache. It is the smallest cache and is often divided into a Level 1
instruction and Level 1 data cache.
The next level of cache is the Level 2 cache, which often holds instructions and data. The
Level 2 cache has higher access latency than the Level 1 cache, but it has the advantage that
it can be several megabytes in size.
Certain processors have a third level of cache, which can either be on the same chip as the
processor or external; in the latter case, the processor has a cache controller.
Cache coherency
Cache coherency becomes an important factor in symmetric multiprocessor (SMP) systems
when each processor has its own cache. A coherency problem can occur when two or more
processors have a copy of the same data in their caches. To keep the data consistent, a
processor uses snooping logic to broadcast a message over the bus each time that its cache
has been modified. When a processor receives a message from another processor and
detects that another processor changed a value for an address that exists in its own cache, it
invalidates its own copy of the data, which is called cross-invalidate.
Cross-invalidate and snooping affect the performance and scalability of SMP systems due to
the increased number of cache misses and increased bus traffic.
Random access memory
The next level in the memory hierarchy is the random access memory (RAM). It is much
slower than the caches but also much cheaper to produce. The size of the RAM in a computer
system can vary from several hundred megabytes on a small workstation to several terabytes
on high-end servers. A processor accesses RAM either through integrated memory
controllers or through bus systems, which connect it to an external memory controller.
Virtual memory is a method that allows the operating system to address more memory than a
computer system actually has in real memory. Virtual memory consists of real memory and
physical disk space that is used for working storage and file pages. On AIX, virtual memory is
managed by the Virtual Memory Manager (VMM).
VMM virtual segments
The AIX virtual memory is partitioned into virtual segments. Each virtual segment is a
contiguous address space of 256 MB (default segment size) or 1 TB (super segment) and is
further divided into pages. Pages can have multiple sizes and are not necessarily contiguous
in physical memory.
The VMM virtual segment type defines the type of pages for which the segment is being
used, for example, for working pages or file pages. Table B-2 lists the most commonly used
VMM virtual segment types.
Table B-2 VMM virtual segment types
Real memory is divided into page frames. The page frame size depends on the version of AIX
and the platform on which it is running. On existing systems that do not support page sizes
larger than 4 KB, the real memory is divided into 4 KB frames. Platforms and AIX versions
that support larger page sizes divide the memory into frames with multiple page sizes.
AIX 5.3 and later dynamically manage pools of 4 KB and 64 KB page sizes. Starting with AIX
6.1 on POWER6, individual segments can have 4 KB and 64 KB page sizes.
Segment type: Computational
Purpose: Processes' private segments; shared segments; paging space

Segment type: Client
Purpose: Enhanced journaled file system 2 (JFS2) files and executables; Network File System (NFS) files and executables; CD-ROM and DVD file systems; compressed JFS files and executables

Segment type: Persistent
Purpose: JFS files and executables
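Because a default segment spans 256 MB and pages come in multiple sizes, the number of pages per segment follows directly. A small illustrative sketch (the sizes are taken from the text above; the helper function is hypothetical):

```python
SEGMENT_SIZE = 256 * 2**20  # default AIX virtual segment: 256 MB

def pages_per_segment(page_size_bytes):
    """How many pages of a given size fit in one default virtual segment."""
    return SEGMENT_SIZE // page_size_bytes

if __name__ == "__main__":
    print(pages_per_segment(4 * 2**10))   # 4 KB pages:  65536
    print(pages_per_segment(64 * 2**10))  # 64 KB pages: 4096
```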
Address translation
Applications that are running on AIX, 32-bit or 64-bit, have their own address space starting
from address 0 to the highest possible address. Shared segments, such as the shared library
segment, are mapped into the address space of the application.
When an application accesses memory, the effective address that is used by the application
is translated into a real memory address. The effective-to-real-address translation is done by
the processor, which maintains an effective-to-real-address (ERAT) translation table. When the
ERAT does not contain the necessary information, the processor first accesses the translation
lookaside buffer (TLB) and, on a TLB miss, walks the page frame table.
Memory affinity
Memory affinity is an approach to allocate memory that is closest to the processor on which a
process caused a page fault. The AIX memory affinity support allows user memory allocation
in a first-touch or round-robin (default) scheduling policy. The scheduling policy can be
specified for individual memory types, such as data, mapped files, shared memory, stack,
text, and unmapped files.
An efficient use of memory affinity requires an appropriate degree of processor affinity to
assure that application threads that are interrupted are re-dispatched to the processors from
which their memory was allocated.
Processor affinity
The goal of processor affinity is to reduce the number of cache misses by re-dispatching an
interrupted thread to the same processor on which it previously was running. The efficiency of
processor affinity mainly depends on the contents of the processor’s cache. In the best case,
the processor’s cache contains sufficient data from the thread, and the thread’s execution can
continue without any waits to resolve cache misses. In the worst case, the processor’s cache
has been depleted, and the thread will experience a series of cache misses.
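On AIX, processor affinity is influenced through mechanisms such as bindprocessor and resource sets. As a hedged, runnable analogue, the Linux-only standard-library calls below inspect and pin a process's CPU affinity; this is illustrative and not AIX-specific:

```python
import os

def pin_to_one_cpu():
    """Pin the current process to a single CPU to maximize cache warmth."""
    allowed = os.sched_getaffinity(0)    # CPUs this process may run on
    target = min(allowed)
    os.sched_setaffinity(0, {target})    # restrict scheduling to one CPU
    return allowed, target

if __name__ == "__main__":
    original, target = pin_to_one_cpu()
    print(f"pinned to CPU {target} (was allowed on {len(original)} CPUs)")
    os.sched_setaffinity(0, original)    # restore the original mask
```

Pinning trades scheduling freedom for cache warmth, which is exactly the trade-off the paragraph above describes.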
Server I/O storage
Figure B-3 on page 385 demonstrates the various I/O paths that applications can use when
accessing data that is located on a local storage device.
Figure B-3 Types of I/O paths
The most commonly used I/O path is the file system I/O where applications read or write the
data that is managed by the file system. Applications can specify through the open flags
whether the data of a file must be cached in VMM (default) or directly accessed bypassing
VMM. Refer to Table B-3.
Table B-3 File access modes
File access mode: Non-synchronous I/O
Description: Regular cached I/O (the default unless specified otherwise); data is flushed out to disk through write-behind or syncd; the file system reads pages into memory ahead of time when a sequential read access pattern is detected.

File access mode: Synchronous I/O
Description: Cached I/O; writes to files do not return until the data has been written to disk; the file system reads pages into memory ahead of time when a sequential read access pattern is detected.

File access mode: Direct I/O
Description: Data is transferred directly between the application buffer and the disk, bypassing the VMM file cache; no read-ahead is performed.

File access mode: Concurrent I/O
Description: Same as direct I/O but without inode lock serialization.

File access mode: Asynchronous I/O
Description: I/O is serviced asynchronously by the AIX kernel subsystem.
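The cached-versus-synchronous distinction above maps onto POSIX open flags. A minimal sketch using Python's standard library (flag availability varies by platform, and the file name is hypothetical; AIX exposes the same distinction through open flags and mount options):

```python
import os
import tempfile

def write_synchronously(path, data):
    """Open with O_SYNC so the write does not return until data reaches disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        os.write(fd, data)
        # No explicit fsync is needed: O_SYNC made the write itself synchronous.
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.dat")
        write_synchronously(demo, b"committed")
        print(os.path.getsize(demo))
```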
Certain applications, typically database applications, bypass the file system and VMM layers
and access the logical volumes directly. Bypassing the file system and VMM layers usually is
done to improve performance by reducing the path length.
Applications can bypass Logical Volume Manager (LVM) altogether by accessing the raw
disks directly. Similar to raw logical volumes, this method typically is done to improve
performance by reducing the path length.
Similar to the I/O path for local storage, the data of a Network File System (NFS) can be
cached by VMM (default) or accessed without caching by using the direct I/O mount option.
The concurrent I/O option can also be used, which results in access similar to direct I/O, but
without the rnode lock serialization. Any operation on an NFS is handled by the NFS client
that communicates with the NFS server using the User Datagram Protocol (UDP) or TCP
network protocol.
The server networking I/O
Figure B-4 demonstrates the traditional network hierarchy for computer systems that are
using the TCP/IP network protocol as a communication vehicle.
Figure B-4 The traditional network hierarchy of a stand-alone server
Most applications that communicate across networks use the sockets API as the
communication channel. A socket is an endpoint of a two-way communication channel. When
an application creates a socket, it specifies the address family, the socket type, and the
protocol. A new socket is “unnamed”, which means that it does not have any association to a
local or remote address. In this state, the socket cannot be used for a two-way
communication.
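The socket life cycle just described can be seen directly from Python's standard library (a hedged illustration; the loopback address and the OS-chosen port are incidental):

```python
import socket

# A freshly created socket is "unnamed": it has no local address association.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # address family, type
print("before bind:", s.getsockname())  # no association yet

# Binding names the socket; port 0 asks the OS to pick any free port.
s.bind(("127.0.0.1", 0))
host, port = s.getsockname()
print(f"after bind: {host}:{port}")
s.close()
```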
In a virtualized environment, physical network adapters are replaced by virtual adapters that
communicate with other systems through the hypervisor instead of a physical media.
Figure B-5 demonstrates the network hierarchy in a virtualized environment.
Figure B-5 The network hierarchy in a virtualized environment
Figure B-6 on page 388 demonstrates the Shared Ethernet Adapter (SEA) bridging the virtual
LAN (VLAN) for LPAR A and LPAR B to a physical Ethernet.
Figure B-6 SEA VLAN bridging
POWER6-based machines provide the Host Ethernet Adapter (HEA) feature that is also
known as the Integrated Virtual Ethernet adapter (IVE), which allows the sharing of physical
Ethernet adapters across multiple logical partitions (LPARs). An HEA connects directly to the
GX+ bus and offers high throughput and low latency.
LPARs connect directly to HEAs and can access external networks through the HEA without
going through an SEA or another LPAR.
Figure B-7 on page 389 demonstrates the network hierarchy for LPARs that communicate to
an external network through an HEA.
Figure B-7 LPARs’ network hierarchy
Performance metrics
We prefer to call the computer performance metrics “utilization metrics”, because they mainly
measure the utilization of a particular resource. We refer to the output that the computer
actually delivers as “performance metrics”. The utilization metrics of computer systems are
mainly measured through the use of analysis tools.
The following high-level overview shows the major performance metrics for various system
components:
CPU:
– %user, %system, %idle, and %wait
– Physical consumed, entitlement
– Number of context switches, interrupts, and system calls
– Length of the run queue
Memory:
– Virtual memory paging statistics
– Paging rate of computational pages
– Paging rate of file pages
– Page replacement page scanning and freeing statistics
– Address translation faults
– Cache miss rates
Disk I/O:
– Amount of data read/written per second (KB, MB, or GB per second)
– Transactions per second
– Elapsed time for I/O to complete
– Queuing time
Network I/O:
– Amount of data transmitted/received per second (KB or MB per second)
– Number of packets transmitted/received per second
– Network CPU utilization
– Network memory utilization
– Network statistics, errors, and retransmissions
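Several of the CPU metrics above (%user, %system) can be derived from process CPU times. A hedged sketch using only Python's standard library (the workload is hypothetical; AIX tools such as vmstat and sar report the system-wide equivalents):

```python
import os

def cpu_split(workload):
    """Run workload and return the (user, system) CPU seconds it consumed."""
    t0 = os.times()
    workload()
    t1 = os.times()
    return t1.user - t0.user, t1.system - t0.system

if __name__ == "__main__":
    # A compute-bound workload should accumulate mostly user CPU time.
    user_s, sys_s = cpu_split(lambda: sum(i * i for i in range(10**6)))
    print(f"user CPU: {user_s:.2f}s, system CPU: {sys_s:.2f}s")
```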
Performance benchmarks
Performance benchmarks are well defined problems or tests that serve as a basis to evaluate
and compare the performance of computer systems. Performance benchmark tests use
representative sets of programs and data that are designed to evaluate the performance of
computer hardware and software in a particular configuration.
There are industry-standard benchmarks, such as the TPC and SPEC benchmarks, and
non-industry benchmarks, such as the IBM rPerf and the IDEAS performance metric that is
called the Relative Performance Estimate v2 (RPE2).
IBM rPerf
Workloads have shifted over the last eight years, and IBM is committed to providing clients
with a relative system performance metric that reflects those changes. IBM publishes the
rPerf relative performance metric for the IBM Power Systems family of UNIX servers. This
metric replaced ROLTP, which was withdrawn.
rPerf is a combination of several measurements of total systems commercial performance
that takes into account the demands on a server in today’s environment. It is derived from an
IBM analytical model, which uses characteristics from IBM internal workloads and
Transaction Processing Council (TPC) and Standard Performance Evaluation Corporation
(SPEC) benchmarks.
The rPerf model is not intended to represent any specific public benchmark results and cannot
reasonably be used in that way. The model simulates certain system operations, such as
CPU, cache, and memory. However, the model does not simulate disk or network I/O
operations.
The IBM eServer™ pSeries® 640 is the baseline reference system and has a value of 1.0.
Although rPerf can be used to compare estimated IBM UNIX commercial processing
performance, actual system performance might vary and depends on many factors, including
system hardware configuration and software design and configuration.
IDEAS’ RPE2
RPE2 is a methodology that uses public domain material in conjunction with IDEAS’ own
research and analysis to calculate a number that represents an estimate of relative
performance for a specific processor type/number of processors combination.
SPEC benchmarks
SPEC provides a standardized set of benchmarks to evaluate the performance of the newest
generation of high-performance computers. The Standard Performance Evaluation
Corporation (SPEC) is a non-profit corporation that was formed to establish, maintain, and
endorse a standardized set of relevant benchmarks that can be applied to the newest
generation of high-performance computers. SPEC develops benchmark suites and also
reviews and publishes submitted results from its member organizations and other
benchmark licensees. The SPEC suite includes the following benchmarks:
SPEC CPU2006 is an industry-standard benchmark that is designed to provide
performance measurements that can be used to compare compute-intensive workloads
on separate computer systems. SPEC CPU2006 contains two benchmark suites:
CINT2006 for measuring and comparing compute-intensive integer performance, and
CFP2006 for measuring and comparing compute-intensive floating point performance.
For more information, see this website:
http://www.spec.org/cpu2006/
SPECpower_ssj2008 is the first industry-standard SPEC benchmark that evaluates the
power and performance characteristics of volume server class computers. SPEC has
designed SPECpower_ssj2008 to be used as both a benchmark to compare power and
performance among various servers and as a toolset to improve server efficiency.
The benchmark workload represents typical server-side Java business applications. The
workload is scalable, multi-threaded, and portable across a wide range of operating
environments, and economical to run. It exercises the CPUs, caches, memory hierarchy,
and scalability of shared memory processors (SMPs), as well as the implementations of
the Java virtual machine (JVM), Just-In-Time (JIT) compiler, garbage collection, threads,
and certain aspects of the operating system.
SPECjbb2005 is an industry-standard benchmark that is designed to measure the
server-side performance of the Java runtime environment (JRE).
SPECjAppServer2004 (Java Application Server) is a multi-tiered benchmark for
measuring the performance of a Java 2 Enterprise Edition (J2EE) technology-based
application server.
The SPECweb2005 benchmark includes workloads to measure banking, e-commerce,
and support web server performance using HTTP (non-secure), HTTPS (secure), and a
mix of secure and non-secure HTTP connections.
The SPECsfs97_R1 benchmark includes workloads to measure both NFS V2 and NFS V3
server performance over UDP and TCP. Due to NFS V2 and UDP becoming less prevalent
in client environments, the primary workload receiving the most focus is
SPECsfs97_R1.v3 over TCP. The metrics for this benchmark include peak throughput (in
NFS ops/sec) and response time (in msec/op).
TPC benchmarks
The TPC is a non-profit corporation that was founded to define transaction processing and
database benchmarks and to disseminate objective, verifiable TPC performance data to the
industry.
In this section, we provide a general description of the TPC benchmarks. The purpose of
these database benchmarks is to provide performance data to the industry.
All the TPC results must comply with standard TPC disclosure policies and be reviewed by a
TPC auditor. A Full Disclosure Report and Executive Summary must be submitted to the TPC
before a result can be announced.
For further details about the TPC benchmarks and announced results, refer to the TPC
website:
http://www.tpc.org
The TPC-C benchmark emulates a moderately complex online transaction processing (OLTP)
environment. It simulates a wholesale supplier with a number of geographically distributed
sales districts and associated warehouses, managing orders where a population of users
executes transactions against a database.
The workload consists of five types of transactions:
New order: Enters a new order from a customer
Payment: Updates a customer’s balance (recording payment)
Order status: Retrieves the status of a customer’s most recent orders
Delivery: Delivers orders (queued for deferred execution)
Stock level: Monitors the stock (inventory) level
The TPC-H benchmark models a decision support system by executing ad hoc queries and
concurrent updates against a standard database under controlled conditions. The purpose of
the benchmark is to “provide relevant, objective performance data to industry users”
according to the specifications and all implementations of the benchmark. In addition to
adhering to the specifications, the benchmark must be relevant to real-world (that is, client)
implementations.
TPC-H represents the information analysis of an industry, which must manage, sell, or
distribute a product worldwide. The 22 queries answer questions in areas, such as pricing
and promotions, supply and demand management, profit and revenue management, client
satisfaction, market share, and shipping management. The refresh functions are not meant to
represent concurrent OLTP; they are meant to reflect the need to periodically update the
database.
The TPC-E benchmark simulates the OLTP workload of a brokerage firm. The focus of the
benchmark is the central database that executes transactions related to the firm’s customer
accounts. Although the underlying business model of TPC-E is a brokerage firm, the
database schema, data population, transactions, and implementation rules have been
designed to be broadly representative of modern OLTP systems.
Benchmark results
When performing a custom benchmark or Proof of Concept (PoC), it is important that you
construct the test to simulate the production environment. This simulation is especially
important as the hardware continues to evolve into the multi-core era and more time is being
invested in the cache/memory hierarchy.
The most common pitfall when running a custom benchmark or a PoC is that the benchmark
test does not simulate the real production environment and the benchmark result does not
represent the performance that the system will achieve in the production environment. The
achieved benchmark result might be much better for the benchmark test than for the real
production workload, which most likely will lead to performance problems later when running
the real workload. It also can happen the other way, potentially causing delays or the failure of
the PoC.
Comparing benchmark results
When comparing performance benchmark results, it is important to compare the results of the
same performance benchmark tests. The result of one benchmark test often does not
represent the performance of a computer system for another workload.
For example, the result of a floating point-intensive benchmark test does not provide any
information about the performance of the same computer running an integer-intensive
benchmark or an OLTP workload and vice versa.
A common pitfall that sets wrong performance expectations is to look at the results of one
performance benchmark and apply them to another workload. An example is comparing the
benchmark results of two computer systems running an OLTP workload, which show that
machine A is 50% faster than machine B, and expecting that machine A will also be 50% faster
for a workload that was not measured.
Appendix C. ITSO Power Systems testing
environment
In this appendix, we explain the ITSO Power Systems testing environment.
Austin environment
Figure C-1 shows the ITSO Power System environment at the IBM Austin labs.
Figure C-1 Austin environment
Poughkeepsie benchmark center environment
Figure C-2 on page 397 shows the Poughkeepsie benchmark center Power System
environment.
Figure C-2 Poughkeepsie benchmark center Power System environment
ITSO Poughkeepsie environment
Figure C-3 on page 398 shows the ITSO Poughkeepsie Power Systems environment.
Figure C-3 ITSO Poughkeepsie Power Systems environment
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks publications
The following IBM Redbooks publications provide additional information about the topics in
this document. Note that several publications referenced in this list might be available in
softcopy only.
IBM System z Personal Development Tool Volume 4 Coupling and Parallel Sysplex,
SG24-7859
Windows-based Single Sign-on and the EIM Framework on the IBM eServer iSeries
Server, SG24-6975
IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
Integrating AIX into Heterogeneous LDAP environments, SG24-7165
IBM PowerVM Virtualization Introduction and Configuration, SG24-7940-04
IBM Power 770 and 780 Technical Overview and Introduction, REDP-4639
IBM Power 795 Technical Overview and Introduction, REDP-4640
IBM PowerVM Virtualization Managing and Monitoring, SG24-7590-02
Exploiting IBM AIX Workload Partitions, SG24-7955
Integrated Virtual Ethernet Adapter Technical Overview and Introduction, REDP-4340
Hardware Management Console V7 Handbook, SG24-7491
IBM Systems Director Management Console: Introduction and Overview, SG24-7860
IBM PowerVM Live Partition Mobility, SG24-7460
IBM PowerHA SystemMirror 7.1 for AIX, SG24-7845
PowerHA for AIX Cookbook, SG24-7739
IBM AIX Version 6.1 Differences Guide, SG24-7559
IBM Electronic Services Support using Automation and Web Tools, SG24-6323
NIM from A to Z in AIX 5L, SG24-7296
PowerVM Migration from Physical to Virtual Storage, SG24-7825
AIX 5L Performance Tools Handbook, SG24-6039
Getting Started with PowerVM Lx86, REDP-4298
AIX Logical Volume Manager from A to Z: Introduction and Concepts, SG24-5432
You can search for, view, download, or order these documents and other IBM Redbooks
publications, Redpapers, Web Docs, drafts, and additional materials at the following website:
ibm.com/redbooks
Other publications
These publications are also relevant as further information sources:
For POWER7 RAS features, see POWER7 System RAS: Key Aspects of Power Systems
Reliability, Availability, and Serviceability
http://www-03.ibm.com/systems/power/hardware/whitepapers/ras7.html
IBM Power Platform Reliability, Availability, and Serviceability (RAS) - Highly Available
IBM Power Systems Servers for Business-Critical Applications, POW03003
IBM Power Architecture
http://domino.research.ibm.com/library/cyberdig.nsf/papers/8DF8C243E7B01D948525787300574C77/$File/rc25146.pdf
PowerVM Virtualization Active Memory Sharing, REDP-4470
Online resources
These websites are also relevant as further information sources:
Power Systems Information Center
https://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
System Planning Tool
http://www.ibm.com/systems/support/tools/systemplanningtool
IBM PowerVM Editions
http://www.ibm.com/systems/power/software/virtualization/editions/index.html
To obtain a license for AME or to request a free 60-day trial
https://www-912.ibm.com/tcod_reg.nsf/TrialCod?OpenForm
IBM Power Systems firmware
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/power7.html
IBM Storage System drivers
http://www-03.ibm.com/system/support/storage/ssic/interoperability.wss
For the latest HMC software
http://www.ibm.com/support/fixcentral
IBM Fix Level Recommendation Tool (FLRT)
http://www14.software.ibm.com/webapp/set2/flrt/home
Information about the virtual I/O server and the latest downloads
http://www14.software.ibm.com/webapp/set2/sas/f/vios/download/home.html
Linux kernel security updates and downloads
http://www14.software.ibm.com/webapp/set2/sas/f/pm/component.html
Virtual I/O server data sheet
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/datasheet.html
IBM Live Partition Mobility (LPM) information
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hc3/iphc3whatsnew.htm
IBM Power Systems Hardware Information Center
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ed3/p7ed3cm_matrix_mmb.htm
Director download link
http://www.ibm.com/systems/software/director/resources.html
VMControl installation download
http://www-03.ibm.com/systems/software/director/downloads/plugins.html
AEM plug-in installation download
http://www-03.ibm.com/systems/software/director/downloads/plugins.html
Processor compatibility mode information
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7hc3/iphc3pcmdefs.htm
IBM PowerCare
http://www-03.ibm.com/systems/power/support/PowerCare/
Help from IBM
IBM Support and downloads
ibm.com/support
IBM Global Services
ibm.com/services
Index
Numerics
2-way SMT (SMT2) 296
4-way SMT (SMT4) 296
A
Active Energy Management (AEM) 185
Active Energy Manager 37, 343
Active Energy Manager (AEM) 338
Active Memory Expansion 8, 13
Active Memory Expansion (AME) 8, 13, 19, 250, 254, 307
Active Memory Mirroring 13, 273
Active Memory Sharing 6–8, 13, 46, 109, 219
Active Memory Sharing (AMS) 46, 254, 282
Active migration 68, 77
Active Partition Mobility 117
Advanced Energy Manager (AEM) 160
Advanced planning event 86
Advanced System Management Interface (ASMI) 267
Alternate Processor Recovery 273
Alternate processor recovery algorithm 23
Application WPAR 72
Autonomic Health Advisor FileSystem (AHAFS) 92
Autonomic Health Advisory File System API (AHAFS) 89
Availability 3
Availability optimization assessment (AOA) 339
Availability optimization service 339
B
Barrier synchronization registers (BSR) 69
Base operating system (BOS) 356
IBM Systems Director VMControl Standard Edition 342
C
Capacity BackUp 89
Capacity on Demand 7
Capacity on Demand (CoD) 88
Capacity on demand (CoD) 24
CEC Concurrent Maintenance (CCM) 123
CEC Hot Add Repair Maintenance (CHARM) 17, 123
CEC hot add repair maintenance (CHARM) 198
Central Electronic Complex (CEC) 16, 123, 198
Check point/restart 77
Chipkill 19
Cluster Aware AIX (CAA) 83, 89
Cluster interface 330
Cluster type 84
Command
./lwiupdatemgr.sh 173
/opt/ibm/director/bin/smstart 186
/opt/ibm/director/bin/smstop 186
amepat 237, 254, 319
bosboot 298
cfgassist 224
cfgdev 225, 232
cfgmgr 62, 145, 361
chdev -dev ent0 -attr large_send=1 281
chdev -dev ent3 -attr largesend=1 281
chfs 84
chgrp 84
chhwres 15
chhwres -r mem -m -o r -q --id 300
chhwres -r proc -m -o r --procs --id 300
chlv 84
chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE 289
clcmd 92
clstat 246
data2vg 374
db2set 289
db2set DB2_LARGE_PAGE_MEM=DB 289
df 144, 374
dscrctl 295
export IFX_LARGE_PAGES=1 289
gzip -cd SysDir_VMControl__Linux/AIX.tar.gz | tar -xvf - 170
ifconfig en0 -largesend 281
ifconfig en0 largesend 281
installp -agXY -d -p 363
installp Preview 363
iostat 275, 314, 317
iostat -b 317
ldedit -b lpdata 289
loadopt 40
lparstat 307, 309, 311, 314, 328
lparstat -c 321
lparstat -Ew 311
lparstat -H 313
lparstat -h 312
lparstat -i 310
lparstat -X -o /tmp/lparstat_data.xml 317
lpartstat 254
lscfg -vl 108
lsconf 151, 217–218
lsconfig -V 129
lsdev 40, 141, 359, 361
lshwres 15, 162
lslparmigr 68
lslparmigr -r manager 213
lsmap 40, 234
lsmemopt -m 15
lsnports 233
lspartition -dlpar 219
lspv 374
lsrep 39
lssp 39
lssrad 293, 300, 325
lsvg 141, 374
migratepv 141
mkauthkeys 362
mkdvd 138
mkrep 39
mksysb 138, 356
mksysplan 156
mkvdev 39, 57, 223
mkvdev -sea ent1 -vadapter ent5 -default ent5 -defaultid 1 -attr thread=0 281
mkvopt 39
mount ahafs /aha /aha 93
mpstat 293, 314, 325
mpstat -d 327
mpstat -w -O sortcolumn=us,sortorder=desc,topcount=10 2 314
nmon 324
nohup ./nmem64 -m 2000 -s 3000 241
optmem -x -o start -t mirror -q 15
optmem -x -o stop 15
pcmpath query device 141
ppc64_cpu 259
raso 317
raso -o biostat=1 317
replacepv 141
rmdev 207
rmdev -dl vscsi0 360
rmss 275
sar 275, 308, 314–315
sar -c -O sortcolumn=scall/s,sortorder=desc,topcount=10 -P ALL 1 315
shutdown -Fr 145
smcli lsmemopt -m 15
smcli optmem -x -o start -t mirror -q 15
smcli optmem -x -o stop 15
smitty ffdc 23
smstatus 186
smtctl 253, 259, 298
smtctl -t 1 309
smtctl -t 2 308
svmon 272, 291, 293, 325
svmon -O summary=ame 321
topas 254, 275, 309, 311, 314, 316, 321
topas -L 310
topas_nmon 309
vfcmap 234
vmo 129, 292–293
vmo -p -o lgpg_regions= -o lgpg_size=16777216 289
vmo -p -o v_pinshm=1 289
vmstat 254, 275, 290, 314
vmstat -c 321
Common agent 172
Complex Instruction Set Computing technology (CISC) 380
Component Trace (CT) 23
Concurrent Firmware maintenance (CFM) 35
Concurrent GX adapter add 125
Customer Specified Placement (CSP) 156
D
Data backups 2
Data conversions 2
Data Stream Control Register (DSCR) 295
Dedicated processor partitions 34
Deferred updates 35
DRAM sparing 19
Dual-threaded (SMT2) 258
Dynamic Automatic Reconfiguration (DARE) 82
Dynamic Logical Partitioning 6
Dynamic logical partitioning (DLPAR) 115
Dynamic Power Saver 37
Dynamic power saver 37
Dynamic power saving 190
Dynamic processor deallocation 23
Dynamic processor sparing 23
E
Electronic Service Agent 127, 333
Electronic service agent 94
Electronic Service Agent (ESA) 14
Electronic service agent (ESA) 94
Electronic services 94
Energy Optimized Fans 37
EnergyScale for I/O 37
Enterprise Identity Mapping (EIM) 347
ESA (Electronic Service Agent) 17
Etherchannel 84
Event infrastructure 92
F
Favor Performance 37
Favor Power 37
Favour energy mode 190
Favour performance mode 190
File
/etc/exclude.rootvg 140
/etc/inittab 296
Fileset
bos.ahafs 93
First Failure Data Capture (FFDC) 23
Flex Capacity Upgrade on Demand 87
G
Global Environment 71
Group Services 90
GX Add 131
GX Repair 131
H
Hardware Management Console (HMC) 7, 25, 95, 127, 160, 205, 208
Hardware management Console (HMC) 153
Hardware page table (HPT) 70
Hardware scrubbing 25
Hardware upgrades 2
HBA (Host Bus Adapter) 107
Health Insurance Portability and Accountability Act of 1996 (HIPAA) 344
High Performance Computing (HPC) 259
Host Based Adapters (HBAs) 140
Host channel adapter (HCA) 16
Host Ethernet Adapter (HEA) 48, 388
Host node repair 207
Hot GX adapter add/repair 13
Hot node add 28, 123
Hot node repair 28
Hot node upgrade 28
Hot node upgrade (memory) 123
Hot upgrade 205
I
IBM PowerVM 5
Image repository 168
Inactive memory units 88
Inactive migration 68, 70
Inactive mobility 77
Inactive Partition Mobility 117
Inactive processor cores 88
Integrated Virtual Ethernet (IVE) 48, 56
Integrated Virtual Ethernet Adapter (IVE) 8, 219
Integrated Virtual Ethernet adapter (IVE) 388
Integrated Virtualization Manager 267
Integrated Virtualization Manager (IVM) 7, 41, 110, 160,
231
L
Lightweight Directory Access Protocol (LDAP) 347
Lightweight Directory Access Protocol server (LDAP) 347
Lightweight Memory Trace (LMT) 23
Link aggregation 104
Link Aggregation (LA) 84
Live Application Mobility 77
Live Dump 23
Live Partition Mobility 8, 66, 115, 161
Live Partition Mobility 6
Live relocation 175
Local Host Ethernet Adapters (LHEA) 107
Logical Memory Block (LMB) 119
Logical memory block (LMB) 267
Logical memory block size (LMB) 69
M
Managed server 71
Mathematical Acceleration Subsystem (MASS) 299
MaxCore 261
MaxCore mode 28
Memory defragmentation 14
Memory page deallocation 25
Mover Service Partition 70
Mover service partition 224
Multiple Shared Processor Pool 6
Multiple Shared Processor Pools 7
N
N_port ID Virtualization 6
N_Port ID Virtualization (NPIV) 281
Network Authentication Service (NAS) 347
Network File System (NFS) 386
Network interface backup 84
Network Interface Backup (NIB) 223
Networks 83
Node Add 131
Node evacuation 206
Node interface 330
Node list interface 330
Node Repair 131
Node Upgrade 131
Nodes 83
Non-volatile RAM (NVRAM) 70
O
Off peak schedule 17
On/Off Capacity on Demand 89
Open Virtualization Format (OVF) 342
P
Paging devices 47
Partition availability priority 24
Partition Power Management 37
Payment Card Industry Data Security Standard (PCI DSS) 344
Power Capping 37
Power capping 189
Absolute value 189
Percentage value 189
Power distribution units (PDU) 102
Power Flex 84
POWER hypervisor (PHYP) 33
Power Instruction Set Architecture (ISA) 256
Power saving 190
Power storage protection keys 260
Power Trending 37
PowerVM Enterprise Edition 6
PowerVM Hypervisor 6
PowerVM Lx86 6
PowerVM Standard Edition 6
Prepare for Hot Repair or Upgrade utility (PHRU) 206
Prepare for Hot Repair/Upgrade (PHRU) 17
Processor Core Nap 37
Processor Folding 37
Processor folding 380
Processor instruction retry algorithms 23
Q
Quad-threaded (SMT4) 258
Quality of Service (QoS) 378
R
Random Access Memory (RAM) 383
Redbooks website 399
Contact us xiii
Reduced Instruction Set Computing (RISC) 380
Redundant I/O 17
Redundant service processor 25
Release Level 35–36
Reliability 3
Reliability, Availability and Serviceability (RAS) 1, 123, 273
Reliable Scalable Cluster Technology (RSCT) 90
Relocation policies 168
Automatic relocation 168
Manual relocation 168
Policy based relocation 168
Resource allocation domain (RAD) 325
Resource group 84
Resource Monitoring and Control (RMC) 66, 90, 95
RSCT Peer Domains (RPD) 91
Run-Time Error Checking (RTEC) 23
S
Sarbanes-Oxley Act of 2002 (SOX) 344
Scheduler Resource Allocation Domain (SRAD) 325
Script
dbstart 235
dbstart.sh 241
nmem64 235
webstart 235
Segment Lookaside Buffer (SLB) 291
Server Power Down 37
Service Pack 35
Service packs 36
Serviceability 3
Shared aliases 291
Shared Dedicated Capacity 7
Shared Ethernet Adapter 84
Shared Ethernet Adapter (SEA) 56, 223, 388
Shared memory pool 47
Shared processor partitions 34
Shared Storage Pools 6
Simultaneous multi-thread (SMT2) 253
Simultaneous Multi-Threading (SMT) 7
Simultaneous Multithreading (SMT) 296, 380
Simultaneous Multithreading Mode (SMT) 258
Single Instruction Multiple Data (SIMD) 259
single points of failures (SPOFs) 103
Single processor checkstop 24
Single thread (ST) 253
Single thread mode (ST) 380
Single-chip module (SCM) 30
Single-threaded (ST) 258
Software upgrades 2
Solid State Disk (SSD) 283
Static power saving 190
Static Power Server 37
Static relocation 175
Symmetrical Multi-Processor (SMP) 383
System Director Management Console (SDMC) 7, 25, 66, 104, 208, 267, 359
System firmware 35
System firmware mirroring 13
System migrations 2
System Planning Tool (SPT) 105, 153
System pool 167
System WPAR 72
Systems Director Management Console (SDMC) 95, 153
T
Technical and Delivery Assessment (TDA) 152
Technology Independent Machine Interface (TIMI) 35
Thermal Reporting 37
Time of day (ToD) 70
Tool
amepat 21
Topology Services 90
Translation Control Entry (TCE) 125
Transport Layer Security (TLS) protocol 96
Trial PowerVM Live Partition Mobility 9
TurboCore 254, 261
TurboCore mode 28
U
Uninteruptable power supply (UPS) 103
Unshared aliases 291
Utility Capacity on Demand 89
V
Vector Media Extensions (VMX) 259
Vector Multimedia Extension (VME) 148
Vector Scalar Extension (VSX) 148, 259
Vector scalar extension (VSX) 299
Virtual appliance 167
Virtual Asynchronous Service interface (VASI) 67
Virtual Ethernet 8
Virtual farm 168
Virtual I/O Server 6, 8
Virtual Memory Manager (VMM) 383
Virtual SCSI 8
Virtual Server 71
Virtual server 167
Virtual Servers 51
Virtual servers 161
W
Workload 167
Workload Partition (WPAR) 71
Workload Partitions 8
World Wide Port Name (WWPN) 43
SG24-7965-00 ISBN 0738436267
INTERNATIONAL
TECHNICAL
SUPPORT
ORGANIZATION
BUILDING TECHNICAL
INFORMATION BASED ON
PRACTICAL EXPERIENCE
IBM Redbooks are developed by the IBM International Technical Support Organization.
Experts from IBM, Customers and Partners from around the world create timely technical
information based on realistic scenarios. Specific recommendations are provided to help you
implement IT solutions more effectively in your environment.
For more information:
ibm.com/redbooks
®
Power Systems Enterprise Servers
with PowerVM Virtualization and RAS
Unleash the IBM Power Systems virtualization features
Understand reliability, availability, and serviceability
Learn about various deployment case scenarios
This IBM Redbooks publication illustrates implementation, testing, and
helpful scenarios with IBM Power Systems 780 and 795 using the
comprehensive set of the Power virtualization features. We focus on
the Power Systems functional improvements, in particular, highlighting
the reliability, availability, and serviceability (RAS) features of the
enterprise servers.
This document highlights IBM Power Systems Enterprise Server
features, such as system scalability, virtualization features, and logical
partitioning among others. This book provides a documented
deployment model for Power 780 and Power 795 within a virtualized
environment, which allows clients to plan a foundation for exploiting
and using the latest features of the IBM Power Systems Enterprise
Servers.
The target audience for this book includes technical professionals (IT
consultants, technical support staff, IT Architects, and IT Specialists)
responsible for providing IBM Power Systems solutions and support.
Back cover
ibm.com/redbooks
Redpaper
Front cover
IBM Power 710 and 730
Technical Overview
and Introduction
Alexandre Bicas Caldeira
Carlo Costantini
Steve Harnett
Volker Haug
Craig Watson
Fabien Willmann
The 8231-E1C and 8231-E2C based on the latest
POWER7 processor technology
PowerVM, enterprise-level RAS all
in an entry server package
2U rack-mount design for midrange performance
International Technical Support Organization
IBM Power 710 and 730 Technical Overview and
Introduction
November 2011
REDP-4796-00
© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
First Edition (November 2011)
This edition applies to the IBM Power 710 (8231-E1C) and Power 730 (8231-E2C) Power Systems servers.
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
The team who wrote this paper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter 1. General description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Systems overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 The Power 710 server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 The Power 730 server. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Operating environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Physical package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 System features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4.1 Power 710 system features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4.2 Power 730 system features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.3 Minimum features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.4 Power supply features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.5 Processor module features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.6 Memory features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Disk and media features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.6 I/O drawers for Power 710 and Power 730 servers . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.1 12X I/O drawer PCIe expansion units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.2 EXP 12S SAS drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.3 EXP24S SFF Gen2-bay drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.6.4 I/O drawers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7 Build to order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.8 IBM Editions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.9 Server and virtualization management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.10 System racks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.10.1 IBM 7014 Model S25 rack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.10.2 IBM 7014 Model T00 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.10.3 IBM 7014 Model T42 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.10.4 Feature code 0555 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.10.5 Feature code 0551 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.10.6 Feature code 0553 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.10.7 The ac power distribution unit and rack content . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.10.8 Rack-mounting rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.10.9 Useful rack additions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.10.10 OEM rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Chapter 2. Architecture and technical overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1 The IBM POWER7 Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.1.1 POWER7 processor overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.2 POWER7 processor core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.1.3 Simultaneous multithreading. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1.4 Memory access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.5 Flexible POWER7 processor packaging and offerings . . . . . . . . . . . . . . . . . . . . . 30
2.1.6 On-chip L3 cache innovation and Intelligent Cache . . . . . . . . . . . . . . . . . . . . . . . 32
2.1.7 POWER7 processor and Intelligent Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1.8 Comparison of the POWER7 and POWER6 processors . . . . . . . . . . . . . . . . . . . 33
2.2 POWER7 processor modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2.1 Modules and cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2.2 Power 710 and Power 730 systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3 Memory subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.1 Registered DIMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.2 Memory placement rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3.3 Memory bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4 Capacity on Demand. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Factory deconfiguration of processor cores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.6 System bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.7 Internal I/O subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.7.1 Slot configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.7.2 System ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.8 PCI adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.8.1 PCIe Gen1 and Gen2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.8.2 PCIe adapter form factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.8.3 LAN adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.8.4 Graphics accelerator adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.8.5 SAS adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.8.6 PCIe RAID and SSD SAS adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.8.7 Fibre Channel adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.8.8 Fibre Channel over Ethernet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.8.9 InfiniBand Host Channel adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.8.10 Asynchronous adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.9 Internal storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.9.1 RAID support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.9.2 External SAS port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.9.3 Media bays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.10 External I/O subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.10.1 12X I/O Drawer PCIe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.10.2 Dividing SFF drive bays in a 12X I/O drawer PCIe . . . . . . . . . . . . . . . . . . . . . . . 55
2.10.3 12X I/O drawer PCIe and PCI-DDR 12X Expansion Drawer 12X cabling . . . . . 58
2.10.4 12X I/O Drawer PCIe and PCI-DDR 12X Expansion Drawer SPCN cabling . . . 59
2.11 External disk subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.11.1 EXP 12S SAS Expansion Drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.11.2 EXP24S SFF Gen2-bay Drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.11.3 IBM System Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.12 Hardware Management Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.12.1 HMC functional overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.12.2 HMC connectivity to the POWER7 processor-based systems . . . . . . . . . . . . . . 67
2.12.3 High availability using the HMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.12.4 HMC code level. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.13 IBM Systems Director Management Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.14 Operating system support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.14.1 Virtual I/O Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.14.2 IBM AIX operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.14.3 IBM i operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.14.4 Linux operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.14.5 Java Supported versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.14.6 Boost performance and productivity with IBM compilers . . . . . . . . . . . . . . . . . . 75
2.15 Energy management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.15.1 IBM EnergyScale technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.15.2 Thermal power management device card. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Chapter 3. Virtualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.1 POWER Hypervisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.2 POWER processor modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.3 Active Memory Expansion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4 PowerVM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.4.1 PowerVM editions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.4.2 Logical partitions (LPARs). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.4.3 Multiple Shared Processor Pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3.4.4 Virtual I/O Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.4.5 PowerVM Live Partition Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.4.6 Active Memory Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.4.7 Active Memory Deduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.4.8 N_Port ID virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.4.9 Operating system support for PowerVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.4.10 POWER7 Linux programming support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
3.5 System Planning Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Chapter 4. Continuous availability and manageability . . . . . . . . . . . . . . . . . . . . . . . . 117
4.1 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.1.1 Designed for reliability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.1.2 Placement of components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.1.3 Redundant components and concurrent repair. . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.2 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.2.1 Partition availability priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.2.2 General detection and deallocation of failing components . . . . . . . . . . . . . . . . . 120
4.2.3 Memory protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.2.4 Cache protection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.2.5 Special Uncorrectable Error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.2.6 PCI extended error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.3 Serviceability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.3.1 Detecting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3.2 Diagnosing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.3.3 Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.3.4 Notifying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.3.5 Locating and servicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.4 Manageability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.4.1 Service user interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.4.2 IBM Power Systems firmware maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.4.3 Electronic Services and Electronic Service Agent . . . . . . . . . . . . . . . . . . . . . . . 147
4.5 Operating system support for RAS features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
© Copyright IBM Corp. 2011. All rights reserved.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service that does
not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in
other operating environments may vary significantly. Some measurements may have been made on development-level
systems and there is no guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this
document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
Active Memory™
AIX®
Electronic Service Agent™
EnergyScale™
Focal Point™
IBM Systems Director Active Energy
Manager™
IBM®
Micro-Partitioning™
POWER Hypervisor™
Power Systems™
POWER4™
POWER5™
POWER5+™
POWER6+™
POWER6®
POWER7™
PowerHA™
PowerVM™
POWER®
pSeries®
Redbooks®
Redpaper™
Redbooks (logo) ®
System Storage®
System x®
System z®
Tivoli®
The following terms are trademarks of other companies:
Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States and other countries.
LTO, Ultrium, the LTO Logo and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S.
and other countries.
Microsoft, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel
SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM® Redpaper™ publication is a comprehensive guide covering the IBM Power 710
(8231-E1C) and Power 730 (8231-E2C) servers supporting IBM AIX®, IBM i, and Linux
operating systems. The goal of this paper is to introduce the innovative Power 710 and
Power 730 offerings and their major functions, including these:
The IBM POWER7™ processor available at frequencies of 3.0 GHz, 3.55 GHz, and
3.7 GHz.
The specialized POWER7 Level 3 cache that provides greater bandwidth, capacity,
and reliability.
The 2-port 10/100/1000 Base-TX Ethernet PCI Express adapter included in the base
configuration and installed in a PCIe Gen2 x4 slot.
The integrated SAS/SATA controller for HDD, SSD, tape, and DVD. This controller
supports built-in hardware RAID 0, 1, and 10.
The latest IBM PowerVM™ virtualization, including PowerVM Live Partition Mobility and
PowerVM IBM Active Memory™ Sharing.
Active Memory Expansion technology that provides more usable memory than is
physically installed in the system.
IBM EnergyScale™ technology that provides features such as power trending,
power-saving, capping of power, and thermal measurement.
Professionals who want to acquire a better understanding of IBM Power Systems™
products can benefit from reading this Redpaper publication. The intended audience includes
these roles:
Clients
Sales and marketing professionals
Technical support professionals
IBM Business Partners
Independent software vendors
This paper complements the available set of IBM Power Systems documentation by providing
a desktop reference that offers a detailed technical description of the Power 710 and
Power 730 systems.
This paper does not replace the latest marketing materials and configuration tools. It is
intended as an additional source of information that, together with existing sources, can be
used to enhance your knowledge of IBM server solutions.
The team who wrote this paper
This paper was produced by a team of specialists from around the world working at the
International Technical Support Organization, Poughkeepsie Center.
Alexandre Bicas Caldeira works on the Power Systems Field Technical Sales Support team
for IBM Brazil. He holds a degree in Computer Science from the Universidade Estadual
Paulista (UNESP). Alexandre has more than 11 years of experience working with IBM and
IBM Business Partners on Power Systems hardware, AIX, and PowerVM virtualization
products. He is also skilled with IBM System Storage®, IBM Tivoli® Storage Manager, IBM
System x® and VMware.
Carlo Costantini is a Certified IT Specialist for IBM and has over 33 years of experience with
IBM and IBM Business Partners. He currently works in Italy as Presales Field Technical
Sales Support on Power Systems platforms for IBM Sales Representatives and IBM Business
Partners. Carlo has broad marketing experience and his current major areas of focus are
competition, sales, and technical sales support. He is a certified specialist for Power Systems
servers. He holds a master’s degree in Electronic Engineering from Rome University.
Steve Harnett is a Senior Accredited Professional, Chartered IT Professional, and member
of the British Computing Society. He currently works as a pre-sales Technical Consultant in
the IBM Server and Technology Group in the UK. Steve has over 16 years of experience
working in post sales supporting Power Systems. He is a product Topgun and a recognized
SME in Electronic Service Agent™, Hardware Management Console, and High end Power
Systems. He also has several years of experience developing and delivering education to
clients, business partners, and IBMers.
Volker Haug is a certified Consulting IT Specialist within IBM Systems and Technology
Group, based in Ehningen, Germany. He holds a bachelor's degree in Business Management
from the University of Applied Studies in Stuttgart. His career has included more than 24
years working in the IBM PLM and Power Systems divisions as a RISC and AIX Systems
Engineer. Volker is an expert in Power Systems hardware, AIX, and PowerVM virtualization.
He is a POWER7 Champion and also a member of the German Technical Expert Council, an
affiliate of the IBM Academy of Technology. He has written several books and white papers
about AIX, workstations, servers, and PowerVM virtualization.
Craig Watson has 15 years of experience working with UNIX-based systems in roles
including field support, systems administration, and technical sales. He has worked in the
IBM Systems and Technology group since 2003 and is currently working as a Systems
Architect, designing complex solutions for customers that include Power Systems, System x,
and System Storage. He holds a master’s degree in Electrical and Electronic Engineering
from the University of Auckland.
Fabien Willmann is an IT Specialist working with Techline Power Europe in France. He has
10 years of experience with Power Systems, AIX, and PowerVM virtualization. After teaching
hardware courses about Power Systems servers, he joined ITS as an AIX consultant, where
he developed his competencies in AIX, HMC management, and PowerVM virtualization.
Building new Power Systems configurations for STG pre-sales is his major area of expertise
today. Recently he also gave a workshop on the econfig configuration tool, focused on
POWER7 processor-based BladeCenters during the symposium for French Business
Partners in Montpellier.
The project that produced this publication was managed by:
Scott Vetter, IBM Certified Project Manager and PMP.
Thanks to the following people for their contributions to this project:
Larry Amy, Gary Anderson, Sue Beck, Terry Brennan, Pat Buckland, Paul D. Carey,
Pete Heyrman, John Hilburn, Dan Hurlimann, Kevin Kehne, James Keniston, Jay Kruemcke,
Robert Lowden, Hilary Melville, Thoi Nguyen, Denis C. Nizinski, Pat O’Rourke, Jan Palmer,
Ed Prosser, Robb Romans, Audrey Romonosky, Todd Rosedahl, Melanie Steckham,
Ken Trusits, Al Yanes
IBM U.S.A.
Stephen Lutz
IBM Germany
Tamikia Barrow
International Technical Support Organization, Poughkeepsie Center
Now you can become a published author, too!
Here’s an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our papers to be as helpful as possible. Send us your comments about this paper or
other IBM Redbooks® publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Stay connected to IBM Redbooks
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on Twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
Chapter 1. General description
The IBM Power 710 (8231-E1C) and IBM Power 730 (8231-E2C) servers use the latest
POWER7 processor technology designed to deliver unprecedented performance, scalability,
reliability, and manageability for demanding commercial workloads.
The high data transfer rates offered by the Peripheral Component Interconnect Express (PCI
Express) Gen2 slots allow higher I/O performance or consolidation of I/O demands
onto fewer adapters running at higher rates. This can result in better system performance at
a lower cost when I/O demands are high.
The Power 710 server is a high-performance, energy efficient, reliable, and secure
infrastructure and application server in a dense form factor. It contains innovative
workload-optimizing technologies that maximize performance based on client computing
needs, and intelligent energy features that help maximize performance and optimize
energy efficiency, resulting in one of the most cost-efficient solutions for AIX, IBM i, and
Linux deployments.
The IBM Power 730 server delivers the outstanding performance of the POWER7 processor
in a dense, rack-optimized form factor and is ideal for running multiple application and
infrastructure workloads in a virtualized environment. You can take advantage of the
Power 730 server’s scalability and capacity by making use of the IBM industrial-strength
PowerVM technology to fully utilize the server’s capability.
1.1 Systems overview
The following sections provide detailed information about the Power 710 and
Power 730 systems.
1.1.1 The Power 710 server
The IBM Power 710 server is a 2U rack-mount server with one processor socket offering
4-core 3.0 GHz, 6-core 3.7 GHz, and 8-core 3.55 GHz configurations. The POWER7
processor chips in this server are 64-bit, 4-core, 6-core, and 8-core modules with 4 MB of L3
cache per core and 256 KB of L2 cache per core.
The Power 710 server supports a maximum of eight DDR3 DIMM slots, with four DIMM slots
included in the base configuration and four DIMM slots available with an optional memory
riser card. This allows for a maximum system memory of 128 GB.
The Power 710 server offers three storage backplane options. The first supports three SFF
SAS HDDs or SSDs, an SATA DVD, and a half-high tape drive. The second supports six
SFF SAS HDDs or SSDs and an SATA DVD. These choices both provide an integrated SAS
controller offering RAID 0, 1, and 10 support. The third option supports six SFF SAS HDDs or
SSDs, an SATA DVD, and adds support for Dual Write Cache RAID 5 or 6, and an external
SAS port. HDDs and SSDs are hot-swap and front accessible with each of the alternatives.
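The backplane RAID options above trade usable capacity for redundancy in different ways. As a rough, hypothetical illustration (not an IBM sizing tool, and ignoring controller metadata overhead), the usable capacity for a six-drive configuration can be sketched as follows:

```python
def usable_capacity_gb(drives, drive_gb, raid_level):
    """Approximate usable capacity for the RAID levels the backplanes support.

    Simplified illustration; real controllers reserve additional space
    for metadata, and hot spares reduce capacity further.
    """
    if raid_level == 0:
        return drives * drive_gb          # striping, no redundancy
    if raid_level in (1, 10):
        return drives * drive_gb // 2     # mirrored pairs
    if raid_level == 5:
        return (drives - 1) * drive_gb    # one drive's worth of parity
    if raid_level == 6:
        return (drives - 2) * drive_gb    # two drives' worth of parity
    raise ValueError("unsupported RAID level")

# Six hypothetical 300 GB SFF drives on the third backplane option:
for level in (0, 10, 5, 6):
    print("RAID", level, "->", usable_capacity_gb(6, 300, level), "GB")
```

The drive size is illustrative only; the pattern holds for any capacity, which is why RAID 5 and 6 on the third backplane option become attractive as drive counts grow.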
The Power 710 comes with five PCI Express (PCIe) Gen2 low profile (LP) slots for installing
adapters in the system. The system also comes with a PCIe x4 Gen2 Low Profile expansion
slot containing a 2-port 10/100/1000 Base-TX Ethernet PCI Express adapter.
Figure 1-1 shows the Power 710 server containing six SFF disk drives and a DVD drive.
Figure 1-1 IBM Power 710 server
1.1.2 The Power 730 server
The IBM Power 730 server is a 2U rack-mount server with two processor sockets offering
8-core 3.0 GHz and 3.7 GHz, 12-core 3.7 GHz, and 16-core 3.55 GHz configurations. The
POWER7 processor chips in this server are 64-bit, 4-core, 6-core, and 8-core modules with
4 MB of L3 cache per core and 256 KB of L2 cache per core. The new Power 730 also
provides expanded I/O capabilities using the high-performance Gen2 PCIe interfaces, and
adds the capability of additional I/O via the 12x PCIe I/O expansion drawers.
Remember: The Integrated Virtual Ethernet (IVE) adapter is not available for the
Power 710.
The Power 730 server supports a maximum of 16 DDR3 DIMM slots, with four DIMM slots
included in the base configuration. A maximum of three additional memory riser cards, each
containing four DIMM slots, can be installed, allowing maximum system memory of 256 GB.
The Power 730 server offers three storage backplane options. The first supports three SFF
SAS hard disk drives (HDDs) or solid-state drives (SSDs), an SATA DVD, and a half-high
tape drive. The second supports six SFF SAS HDDs or SSDs and an SATA DVD. These
choices both provide an integrated SAS controller, offering RAID 0, 1, and 10 support. The
third option supports six SFF SAS HDDs or SSDs, an SATA DVD, and adds support for Dual
Write Cache RAID 5, 6, and an external SAS port. HDDs and SSDs are hot-swap and front
accessible with each of the alternatives.
The Power 730 comes with five PCI Express (PCIe) Gen2 low profile (LP) slots for installing
adapters in the system. The system also comes with a PCIe x4 Gen2 Low Profile expansion
slot containing a 2-port 10/100/1000 Base-TX Ethernet PCI Express adapter.
Figure 1-2 shows the Power 730 server containing three SFF disk drives, a DVD drive, and a
tape drive.
Figure 1-2 IBM Power 730
1.2 Operating environment
Table 1-1 provides the operating environment specifications for the servers.

Remember: The Integrated Virtual Ethernet (IVE) adapter is not available for the
Power 730.

Table 1-1 Operating environment for Power 710 and Power 730

Description               Operating                                   Non-operating
Temperature               5 - 35 degrees C (41 - 95 degrees F);       5 - 45 degrees C (41 - 113 degrees F)
                          recommended: 18 - 27 degrees C
                          (64 - 80 degrees F)
Relative humidity         8 - 80%                                     8 - 80%
Maximum dew point         28 degrees C (84 degrees F)                 N/A
Operating voltage         Power 710: 100 - 127 V ac or 200 - 240      N/A
                          V ac; Power 730: 200 - 240 V ac
Operating frequency       47 - 63 Hz                                  N/A
Power consumption         Power 710: 750 watts maximum;               N/A
                          Power 730: 1,260 watts maximum
Power source loading      Power 710: 0.765 kVA maximum;               N/A
                          Power 730: 1.286 kVA maximum
Thermal output            Power 710: 2,560 BTU/hour maximum;          N/A
                          Power 730: 4,300 BTU/hour maximum
Maximum altitude          3,050 m (10,000 ft)                         N/A
Noise level reference     Power 710: 6.0 bels;                        N/A
point (operating/idle)    Power 730: 6.6 bels

Tip: The maximum measured value is expected from a fully populated server under an
intensive workload. The maximum measured value also accounts for component tolerance
and non-ideal operating conditions. Power consumption and heat load vary greatly by
server configuration and utilization. Use the IBM Systems Energy Estimator to obtain a
heat output estimate based on a specific configuration, available at this website:
http://www-912.ibm.com/see/EnergyEstimator

1.3 Physical package

Table 1-2 shows the physical dimensions of the Power 710 and Power 730 chassis. Both
servers are available only in a rack-mounted form factor. Each takes 2U (2 EIA units) of
rack space.

Table 1-2 Physical dimensions

Dimension                        Power 710 (8231-E1C)   Power 730 (8231-E2C)
Width                            440 mm (19.0 in)       440 mm (19.0 in)
Depth                            706 mm (27.8 in)       706 mm (27.8 in)
Height                           89 mm (3.5 in)         89 mm (3.5 in)
Weight (maximum configuration)   28.2 kg (62 lbs)       28.2 kg (62 lbs)
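The thermal-output figures in Table 1-1 follow directly from the maximum electrical draw, since one watt dissipates roughly 3.412 BTU per hour. A minimal sketch of that conversion (illustrative only; use the IBM Systems Energy Estimator for configuration-specific numbers):

```python
WATTS_TO_BTU_PER_HOUR = 3.412  # 1 W dissipates about 3.412 BTU/hour

def thermal_output_btu(watts):
    """Estimate heat load in BTU/hour from electrical draw in watts."""
    return watts * WATTS_TO_BTU_PER_HOUR

# Maximum draws from Table 1-1 reproduce the published heat loads:
print(round(thermal_output_btu(750)))   # Power 710: about 2,560 BTU/hour
print(round(thermal_output_btu(1260)))  # Power 730: about 4,300 BTU/hour
```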
Figure 1-3 shows the rear view of a Power 730 system.
Figure 1-3 Rear view of a Power 730 system
1.4 System features
The system chassis contains one processor module (Power 710) or two processor modules
(Power 730). Each POWER7 processor module has either 4 cores, 6 cores, or 8 cores. Each
of the POWER7 processors in the server has a 64-bit architecture, up to 2 MB of L2 cache
(256 KB per core), and up to 32 MB of L3 cache (4 MB per core).
1.4.1 Power 710 system features
This summary describes the standard features of the Power 710:
Rack-mount (2U) chassis
Single processor module:
– Four-core configuration using one 4-core 3.0 GHz processor module
– Six-core configuration using one 6-core 3.7 GHz processor module
– Eight-core configuration using one 8-core 3.55 GHz processor module
Up to 128 GB of 1066 MHz DDR3 ECC memory
Choice of three disk/media backplanes:
– Six 2.5-inch SAS HDDs or SSDs and one DVD bay, and an integrated SAS controller,
offering RAID 0, 1, and 10 support
– Six 2.5-inch SAS HDDs or SSDs, an SATA DVD, and adds support for Dual Write
Cache RAID 5, 6, and an external SAS port
– Three 2.5-inch HDD/SSD/Media backplane with one tape drive bay and one DVD bay,
an integrated SAS controller, offering RAID 0, 1, and 10 support
A PCIe x4 Gen2 Low Profile expansion slot containing a 2-port 10/100/1000 Base-TX
Ethernet PCI Express adapter
Five PCIe x8 low profile slots
One GX++ slot
Integrated:
– Service Processor
– EnergyScale technology
– Hot-swap and redundant cooling
– Three USB ports
– Two system ports
– Two HMC ports
Optional redundant, 1725 watt ac hot-swap power supplies
1.4.2 Power 730 system features
This summary describes the standard features of the Power 730:
Rack-mount (2U) chassis
Two processor modules:
– Eight-core configuration using two 4-core 3.0 GHz processor modules
– Eight-core configuration using two 4-core 3.7 GHz processor modules
– Twelve-core configuration using two 6-core 3.7 GHz processor modules
– Sixteen-core configuration using two 8-core 3.55 GHz processor modules
Up to 256 GB of 1066 MHz DDR3 ECC memory
Choice of three disk/media backplanes:
– Six 2.5-inch SAS HDDs or SSDs and one DVD bay, and an integrated SAS controller,
offering RAID 0, 1, and 10 support
– Six 2.5-inch SAS HDDs or SSDs, an SATA DVD, and adds support for Dual Write
Cache RAID 5, 6, and an external SAS port
– Three 2.5-inch HDD/SSD/Media backplane with one tape drive bay and one DVD bay,
and an integrated SAS controller, offering RAID 0, 1, and 10 support
A PCIe x4 Gen2 Low Profile expansion slot with either a 2-port 10/100/1000 Base-TX
Ethernet PCI Express adapter or a GX++ dual-port 12x Channel Attach adapter
Five PCIe x8 low profile slots
Two GX++ slots
Integrated:
– Service Processor
– EnergyScale technology
– Hot-swap and redundant cooling
– Three USB ports
– Two system ports
– Two HMC ports
Two power supplies, 1725 Watt ac, hot-swap
1.4.3 Minimum features
Each system has a minimum feature-set in order to be valid.
The minimum Power 710 initial order must include a processor module, processor
activations, memory, one HDD/SSD, a storage backplane, a power supply and power cord,
an operating system indicator, a chassis indicator, and a Language Group Specify.
The minimum Power 730 initial order must include two processor modules, processor
activations, memory, one HDD/SSD, a storage backplane, two power supplies and power
cords, an operating system indicator, a chassis indicator, and a Language Group Specify.
If IBM i is the Primary Operating System (#2145), the initial order must also include one
additional HDD/SSD, Mirrored System Disk Level Specify Code, and a System Console
Indicator. A DVD is defaulted on every order but can be de-selected. A DVD-ROM or
DVD-RAM must be accessible by the system.
1.4.4 Power supply features
One 1725 watt ac power supply (#5603) is required for the Power 710. A second power
supply is optional. Two 1725 watt ac power supplies (#5603) are required for the
Power 730. The second power supply provides redundant power for enhanced system
availability. To provide full redundancy, the two power supplies must be connected to
separate power sources.
The server will continue to function with one working power supply. A failed power supply can
be hot swapped but must remain in the system until the replacement power supply is
available for exchange.
1.4.5 Processor module features
Each of the processor modules in the system houses a single POWER7 processor chip. The
processor chip has either 4 cores, 6 cores, or 8 cores. The Power 710 supports one
processor module. The Power 730 supports two processor modules. Both processor modules
in the system must be identical.
The number of installed cores in a Power 710 or Power 730 must match the number of
processor activation features ordered.

Remember: If IBM i, AIX, or Linux is the primary operating system, no internal HDD or
SSD is required if feature SAN Load Source Specify (Boot from SAN) (#0837) is selected.
A Fibre Channel or FCoE adapter must be ordered if feature #0837 is selected.

Table 1-3 summarizes the processor features available for the Power 710.

Table 1-3 Processor features for the Power 710

Feature code   Processor module description
#EPC1          4-core 3.0 GHz POWER7 processor module
#EPC2          6-core 3.7 GHz POWER7 processor module
#EPC3          8-core 3.55 GHz POWER7 processor module

The Power 730 requires that two identical processor modules be installed. Table 1-4 lists
the available processor features.

Table 1-4 Processor features for the Power 730

Feature code   Processor module description
#EPC1          4-core 3.0 GHz POWER7 processor module
#EPC4          4-core 3.7 GHz POWER7 processor module
1.4.6 Memory features
In POWER7 processor-based systems, DDR3 memory is used throughout. The POWER7
DDR3 memory uses a new memory architecture to provide greater bandwidth and capacity.
This enables operating at a higher data rate for larger memory configurations.
Memory in the systems is installed into memory riser cards. One memory riser card is
included in the base system. The base memory riser card does not appear as a feature code
in the configurator. One additional memory riser card, feature #5265, can be installed in the
Power 710. Three additional memory riser cards, feature #5265, can be installed in the
Power 730. Each memory riser card provides four DDR3 DIMM slots. DIMMs are available in
capacities of 2 GB, 4 GB, 8 GB, and 16 GB at 1066 MHz and are installed in pairs.
Table 1-5 shows the memory features available on the systems.
Table 1-5 Summary of memory features
It is generally best that memory be installed evenly across all memory riser cards in the
system. Balancing memory across the installed memory riser cards allows memory
access in a consistent manner and typically results in the best possible performance for
your configuration.
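As an illustration of the balancing guidance above, the following sketch spreads DIMM pairs round-robin across the installed riser cards. The function name is illustrative; the constraint (four slots per card, DIMMs installed in pairs, so at most two pairs per card) comes from the text.

```python
# Sketch: spread DIMM pairs evenly across the installed memory riser cards.
# Each riser card has four DIMM slots and DIMMs install in pairs, so each
# card holds at most two pairs.

def place_pairs(num_pairs, num_cards):
    """Return a pairs-per-card list, filling cards round-robin for balance."""
    if num_pairs > 2 * num_cards:
        raise ValueError("not enough DIMM slots")
    placement = [0] * num_cards
    for i in range(num_pairs):
        placement[i % num_cards] += 1
    return placement
```

For example, three DIMM pairs on a two-card Power 710 are placed as two pairs on the first card and one on the second, rather than filling one card completely first.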
1.5 Disk and media features
The Power 710 and 730 systems feature an integrated SAS controller, offering RAID 0, 1,
and 10 support with three storage backplane options:
Feature code #EJ0D supports six SFF disk units, either HDD or SSD, and an SATA DVD.
There is no support for split backplane and for RAID 5 or 6.
Feature code #EJ0E supports three small form-factor (SFF) disk units, either HDD or
SSD, an SATA DVD, and a tape. There is no support for split backplane and for RAID 5
or 6.
Feature code #EJ0F supports six SFF disk units, either HDD or SSD, an SATA DVD, and
an external SAS port. RAID 5 and 6 are supported.
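The three backplane options above differ only in a few capabilities, which can be captured as a small lookup table. The feature codes and capabilities come from the text; the dictionary structure and names are illustrative.

```python
# Sketch: the three storage-backplane options above as a lookup table.
# Feature codes are from the text; structure and names are illustrative.

BACKPLANES = {
    "#EJ0D": {"sff_bays": 6, "tape_bay": False, "ext_sas_port": False,
              "raid_5_6": False, "split_backplane": False},
    "#EJ0E": {"sff_bays": 3, "tape_bay": True, "ext_sas_port": False,
              "raid_5_6": False, "split_backplane": False},
    "#EJ0F": {"sff_bays": 6, "tape_bay": False, "ext_sas_port": True,
              "raid_5_6": True, "split_backplane": False},
}

def backplanes_supporting(**wanted):
    """Feature codes whose capabilities match all requested key=value pairs."""
    return sorted(code for code, caps in BACKPLANES.items()
                  if all(caps.get(k) == v for k, v in wanted.items()))
```

For example, requesting RAID 5 and 6 support narrows the choice to #EJ0F, and no backplane on these models offers the split-backplane function.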
Feature code   Feature capacity   Access rate   DIMMs
#EM04          4 GB               1066 MHz      2 x 2 GB DIMMs
#EM08          8 GB               1066 MHz      2 x 4 GB DIMMs
#EM16          16 GB              1066 MHz      2 x 8 GB DIMMs
#EM32          32 GB              1066 MHz      2 x 16 GB DIMMs
Remember: The memory cards operate at lower voltage to save energy. Therefore, they
cannot be interchanged with the 8 GB and 16 GB memory features that are used with the
8231-E2B model.
Table 1-6 shows the available disk drive feature codes that can be installed in the
Power 710 and Power 730.
Table 1-6 Disk drive feature code description
Table 1-7 shows the available disk drive feature codes to be installed in an I/O enclosure
external to the Power 710 and Power 730.
Table 1-7 Non CEC Disk drive feature code description
Feature code   Description                              OS support
#1890          69 GB SFF SAS SSD                        AIX, Linux
#1883          73.4 GB 15 K RPM SAS SFF Disk Drive      AIX, Linux
#1886          146.8 GB 15 K RPM SFF SAS Disk Drive     AIX, Linux
#1917          146 GB 15 K RPM SAS SFF-2 Disk Drive     AIX, Linux
#1775          177 GB SFF-1 SSD w/ eMLC Disk Drive      AIX, Linux
#1995          177 GB SSD Module with eMLC              AIX, Linux
#1925          300 GB 10 K RPM SAS SFF-2 Disk Drive     AIX, Linux
#1885          300 GB 10 K RPM SFF SAS Disk Drive       AIX, Linux
#1880          300 GB 15 K RPM SAS SFF Disk Drive       AIX, Linux
#1953          300 GB 15 K RPM SAS SFF-2 Disk Drive     AIX, Linux
#1790          600 GB 10 K RPM SAS SFF Disk Drive       AIX, Linux
#1964          600 GB 10 K RPM SAS SFF-2 Disk Drive     AIX, Linux
#1909          69 GB SFF SAS SSD                        IBM i
#1884          69.7 GB 15 K RPM SAS SFF Disk Drive      IBM i
#1888          139.5 GB 15 K RPM SFF SAS Disk Drive     IBM i
#1947          139 GB 15 K RPM SAS SFF-2 Disk Drive     IBM i
#1787          177 GB SFF-1 SSD w/ eMLC Disk Drive      IBM i
#1996          177 GB SSD Module with eMLC              IBM i
#1956          283 GB 10 K RPM SAS SFF-2 Disk Drive     IBM i
#1911          283 GB 10 K RPM SFF SAS Disk Drive       IBM i
#1879          283 GB 15 K RPM SAS SFF Disk Drive       IBM i
#1948          283 GB 15 K RPM SAS SFF-2 Disk Drive     IBM i
#1916          571 GB 10 K RPM SAS SFF Disk Drive       IBM i
#1962          571 GB 10 K RPM SAS SFF-2 Disk Drive     IBM i
Feature code   Description                          OS support
#3586          69 GB 3.5" SAS SSD                   AIX, Linux
#3647          146 GB 15 K RPM SAS Disk Drive       AIX, Linux
#1793          177 GB SFF-2 SSD w/ eMLC             AIX, Linux
If you need more disks than available with the internal disk bays, you can attach additional
external disk subsystems. For more detailed information about the available external disk
subsystems, see 2.11, “External disk subsystems” on page 60.
SCSI disks are not supported in the Power 710 and 730 disk bays. Also, as there is no PCIe
LP SCSI adapter available, you cannot attach existing SCSI disk subsystems.
The Power 710 and Power 730 have a slim media bay that can contain an optional
DVD-RAM (#5762). If feature #EJ0E has been selected for the storage backplane, a tape bay
is available that can contain a tape drive or removable disk drive.
#3648          300 GB 15 K RPM SAS Disk Drive       AIX, Linux
#3649          450 GB 15 K RPM SAS Disk Drive       AIX, Linux
#3587          69 GB 3.5" SAS SSD                   IBM i
#3677          139.5 GB 15 K RPM SAS Disk Drive     IBM i
#1794          177 GB SFF-2 SSD w/ eMLC             IBM i
#3678          283.7 GB 15 K RPM SAS Disk Drive     IBM i
#3658          428 GB 15 K RPM SAS Disk Drive       IBM i
Tip: Be aware of these considerations for SAS-bay-based SSDs (#1775, #1787, #1793,
#1794, #1890, and #1909):
Feature codes #1775, #1787, #1890, and #1909 are supported in the Power 710 and
Power 730 CEC.
Feature codes #1793 and #1794 are not supported in the Power 710 and Power 730
CEC. They are supported in the EXP24S SFF Gen2-bay Drawer (#5887) only.
SSDs and disk drives (HDDs) are not allowed to mirror each other.
When an SSD is placed in the feature code #EJ0F backplane, no EXP 12S Expansion
Drawer (#5886) or EXP24S SFF Gen2-bay Drawer (#5887) is supported to connect to
the external SAS port.
HDD/SSD Data Protection: If IBM i (#2145) is selected, one of the following is required:
– Disk mirroring (default), which requires feature code #0040, #0043, or #0308
– SAN boot (#0837)
– RAID, which requires feature code #5630
– Mixed Data Protection (#0296)
Backplane function: The Power 710 and Power 730 models do not support the split
backplane function.
Internal Docking Station: The Internal Docking Station for Removable Disk Drive (#1123)
is supported for AIX and Linux.
Table 1-8 shows the available media device feature codes for Power 710 and Power 730.
Table 1-8 Media device feature code description for Power 710 and 730
For more detailed information about the internal disk features, see 2.9, “Internal storage” on
page 49.
1.6 I/O drawers for Power 710 and Power 730 servers
The Power 710 and Power 730 servers support the attachment of I/O drawers. The
Power 710 supports disk-only I/O drawers (#5886 and #5887), providing large storage
capacity and multiple partition support. The Power 730 supports disk-only I/O drawers
(#5886, #5887), and also two 12X attached I/O drawers (#5802 and #5877), providing
extensive capability to expand the overall server.
These I/O drawers are supported on the Power 710 and Power 730 servers:
EXP 12S holds 3.5-inch SAS disk or SSD (#5886).
EXP24S holds 2.5-inch SAS disk or SSD (#5887).
12X I/O drawer PCIe, SFF disk (#5802).
12X I/O drawer PCIe, no disk (#5877).
1.6.1 12X I/O drawer PCIe expansion units
The 12X I/O drawer PCIe, SFF disk (#5802) and 12X I/O drawer PCIe, no disk (#5877)
expansion units are 19-inch, rack-mountable, I/O expansion drawers that are designed to be
attached to the system using 12x double date rate (DDR) cables. The expansion units can
accommodate 10 generation 3 blind-swap cassettes. These cassettes can be installed and
removed without removing the drawer from the rack.
The #5802 I/O drawer has these attributes:
Eighteen SAS hot-swap SFF disk bays
Ten PCIe based blind-swap I/O adapter slots
Redundant hot-swappable power and cooling units
The #5877 drawer is the same as the #5802 except that it does not include disk bays.
Feature code   Description
#1106          USB 160 GB Removable Disk Drive
#1107          USB 500 GB Removable Disk Drive
#1123          Internal Docking Station for Removable Disk Drive
#1124          DAT160 80/160 GB SAS Tape Drive
#5762          SATA Slimline DVD-RAM Drive
Tips: Note the following:
A single #5886 or #5887 drawer can be cabled to the CEC external SAS port when a
#EJ0F DASD backplane is installed in the 8231-E1C and in the 8231-E2C.
A 3 Gbps YI cable (#3687) is used to connect the drawer to the CEC external SAS port.
Feature #5886 and #5887 drawers are not available with the 4-core processor
feature #EPC1.
A maximum of two #5802 or #5877 drawers can be placed on the same 12X loop. The #5877
I/O drawer can be on the same loop as the #5802 I/O drawer. A #5877 drawer cannot be
upgraded to a #5802 drawer.
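The 12X loop placement rules above can be summarized as a small check. This sketch paraphrases the rules stated in the text (at most two #5802/#5877 drawers per loop, no mixing with #5796); the function is illustrative and not IBM tooling.

```python
# Sketch of the 12X loop placement rules described above (feature codes as
# strings; the rule set is paraphrased from the text, not from IBM tooling).

def loop_is_valid(drawers):
    """Check a list of drawer feature codes placed on one 12X loop."""
    pcie = [d for d in drawers if d in ("#5802", "#5877")]
    pcix = [d for d in drawers if d == "#5796"]
    if pcie and pcix:
        return False          # #5802/#5877 cannot mix with #5796 on a loop
    if len(pcie) > 2:
        return False          # at most two #5802/#5877 drawers per loop
    return True
```

A #5802 and a #5877 together on one loop pass the check; adding a third PCIe drawer, or mixing in a #5796, fails it.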
1.6.2 EXP 12S SAS drawer
The EXP 12S SAS drawer (#5886) is a 2 EIA drawer and mounts in a 19-inch rack. The
drawer can hold either SAS disk drives or SSD. The EXP 12S SAS drawer has twelve
3.5-inch SAS disk bays with redundant data paths to each bay. The SAS disk drives or SSDs
contained in the EXP 12S are controlled by one or two PCIe or PCI-X SAS adapters
connected to the EXP 12S via SAS cables.
The feature #5886 can also be attached directly to the SAS port on the rear of the Power 710
and Power 730, providing a very low-cost disk storage solution. When used this way, the
embedded SAS controllers in the system unit drive the disk drives in the EXP 12S. A second
unit cannot be cascaded to a feature #5886 attached in this way.
1.6.3 EXP24S SFF Gen2-bay drawer
The EXP24S SFF Gen2-bay drawer is an expansion drawer supporting up to twenty-four
2.5-inch hot-swap SFF SAS HDDs on POWER6® or POWER7 servers in 2U of 19-inch rack
space. The EXP24S bays are controlled by SAS adapters/controllers attached to the I/O
drawer by SAS X or Y cables.
The SFF bays of the EXP24S are different from the SFF bays of the POWER7 system units
or 12X PCIe I/O drawers (#5802 and #5803). The EXP24S uses Gen2 (SFF-2) SAS drives,
which physically do not fit in the Gen1 (SFF-1) bays of the POWER7 system unit or 12X
PCIe I/O drawers, and vice versa.
The EXP24S includes redundant AC power supplies and two power cords.
1.6.4 I/O drawers
Depending on the system configuration, the maximum number of I/O drawers supported can
vary. Table 1-9 summarizes the maximum number of I/O drawers and external disk only I/O
drawers supported.
Table 1-9 Maximum number of I/O drawers supported and total number of PCI slots
Tip: Mixing #5802/#5877 and #5796 on the same loop is not supported. Mixing #5802 and
#5877 on the same loop is supported with a maximum of two drawers total per loop.
Server      Processor cards   Max #5802 and #5877 drawers   Max #5886 drawers   Max #5887 drawers
Power 710   One               Not supported                 8                   4
Power 730   Two               2                             28                  14
Remember: The Power 710 4-core model does not support the 12X I/O drawer.
1.7 Build to order
You can perform a build to order or a la carte configuration using the IBM Configurator for
e-business (e-config), where you specify each configuration feature that you want on the
system. You build on top of the base required features, such as the embedded Integrated
Virtual Ethernet adapter.
Preferably, begin with one of the available starting configurations, such as the IBM Editions.
These solutions are available at initial system order time with a starting configuration that is
ready to run as is.
1.8 IBM Editions
IBM Editions are available only as an initial order. If you order a Power 710 or Power 730
Express server IBM Edition as defined next, you can qualify for half the initial configuration's
processor core activations at no additional charge.
The total memory (based on the number of cores) and the quantity/size of disk, SSD, Fibre
Channel adapters, or Fibre Channel over Ethernet (FCoE) adapters shipped with the server
are the only features that determine whether a customer is entitled to a processor activation
at no additional charge.
When you purchase an IBM Edition, you can purchase an AIX, IBM i, or Linux operating
system license, or you can choose to purchase the system with no operating system. The
operating system is specified by means of a feature code: AIX 5.3, 6.1, or 7.1; IBM i 6.1.1
or IBM i 7.1; or SUSE Linux Enterprise Server or Red Hat Enterprise Linux. If you choose
AIX 5.3, 6.1, or 7.1 as your primary operating system, you can also order IBM i 6.1.1 or
IBM i 7.1 and SUSE Linux Enterprise Server or Red Hat Enterprise Linux. The same applies
if you choose IBM i or Linux as your primary operating system.
These sample configurations can be changed as needed and still qualify for processor
entitlements at no additional charge. However, selection of total memory or HDD/SSD/Fibre
Channel/FCoE adapter quantities smaller than the totals defined as the minimums
disqualifies the order as an IBM Edition, and the no-charge processor activations are
then removed.
Consider these minimum definitions for IBM Editions:
A minimum of 1 GB memory per core on the 4-core Power 710 (#8350), 2 GB memory per
core on the 6-core and 8-core Power 710 (#8349 and #8359), and 4 GB memory per core
on the Power 730 is needed to qualify for the IBM Edition, except on the 6-core
IBM Edition where there is a 16 GB minimum memory requirement for the Power 710.
There can be various valid memory configurations that meet the minimum requirement.
Also, a minimum of two HDDs, or two SSDs, or two Fibre Channel adapters, or two FCoE
adapters is required. You only need to meet one of these disk/SSD/FC/FCoE criteria.
Partial criteria cannot be combined.
1.9 Server and virtualization management
If you want to implement partitions, a Hardware Management Console (HMC) or the
Integrated Virtualization Manager (IVM) is required to manage the Power 710 and Power 730
servers. In general, multiple IBM POWER6 and POWER7 processor-based servers can be
supported by a single HMC.
If an HMC is used to manage the Power 710 and Power 730, the HMC must be a rack-mount
CR3 or later or deskside C05 or later.
The IBM Power 710 and IBM Power 730 servers require the Licensed Machine Code
Version 7 Revision 740.
Existing HMC models 7310 can be upgraded to Licensed Machine Code Version 7 to support
environments that can include IBM POWER5™, IBM POWER5+™, POWER6, and POWER7
processor-based servers. Licensed Machine Code Version 6 (#0961) is not available for
7042 HMCs.
When IBM Systems Director is used to manage an HMC, or if the HMC manages more than
254 partitions, the HMC must have a minimum of 3 GB RAM and must be a rack-mount CR3
model or later or deskside C06 or later.
1.10 System racks
The Power 710 and Power 730 are designed to mount in the 25U 7014-S25 (#0555), 36U
7014-T00 (#0551), or the 42U 7014-T42 (#0553) rack. These racks are built to the 19-inch
EIA standard.
If a system is to be installed in a non-IBM rack or cabinet, ensure that the rack meets the
requirements described in 1.10.10, “OEM rack” on page 21.
Remember: If you do not use an HMC or IVM, the Power 710 and Power 730 run in full
system partition mode. That means that a single partition owns all the server resources,
and only one operating system can be installed.
Tip: You can download or order the latest HMC code from the Fix Central website:
http://www.ibm.com/support/fixcentral
Remember: At the time of writing, the SDMC is not supported for the Power 710
(8231-E1C) and Power 730 (8231-E2C) models.
IBM intends to enhance the IBM Systems Director Management Console (SDMC) to
support the Power 710 (8231-E1C) and Power 730 (8231-E2C). IBM also intends for the
current Hardware Management Console (HMC) 7042-CR6 to be upgradable to an IBM
SDMC that supports the Power 710 (8231-E1C) and Power 730 (8231-E2C).
Remember: A new Power 710 or Power 730 server can be ordered with the appropriate
7014 rack model. The racks are available as features of the Power 710 and Power 730
only when an additional external disk drawer for an existing system (MES order) is
ordered. Use the rack feature code if IBM manufacturing has to integrate the newly
ordered external disk drawer in a 19-inch rack before shipping the MES order.
1.10.1 IBM 7014 Model S25 rack
The 1.3 Meter (49-inch) Model S25 rack has these features:
25 EIA units
Weights
– Base empty rack: 100.2 kg (221 lb.)
– Maximum load limit: 567.5 kg (1250 lb.)
The S25 racks do not have vertical mounting space that will accommodate #7188 PDUs. All
PDUs required for application in these racks must be installed horizontally in the rear of the
rack. Each horizontally mounted PDU occupies 1U of space in the rack, and therefore
reduces the space available for mounting servers and other components.
1.10.2 IBM 7014 Model T00 rack
The 1.8 Meter (71-in.) Model T00 is compatible with past and present IBM Power systems.
The T00 rack has these features:
Thirty-six EIA units (36 U) of usable space.
Optional removable side panels.
Optional highly perforated front door.
Optional side-to-side mounting hardware for joining multiple racks.
Standard business black or optional white color in OEM format.
Increased power distribution and weight capacity.
Support for both ac and dc configurations.
The rack height is increased to 1926 mm (75.8 in.) if a power distribution panel is fixed to
the top of the rack.
Up to four power distribution units (PDUs) can be mounted in the PDU bays (Figure 1-4 on
page 17), but others can fit inside the rack. See 1.10.7, “The ac power distribution unit and
rack content” on page 16.
Weights:
– T00 base empty rack: 244 kg (535 lb.)
– T00 full rack: 816 kg (1795 lb.)
Remember: It is the client’s responsibility to ensure that the installation of the drawer in
the preferred rack or cabinet results in a configuration that is stable, serviceable, safe, and
compatible with the drawer requirements for power, cooling, cable management, weight,
and rail security.
1.10.3 IBM 7014 Model T42 rack
The 2.0 Meter (79.3-inch) Model T42 addresses the client requirement for a tall enclosure to
house the maximum amount of equipment in the smallest possible floor space. The following
features differ in the Model T42 rack from the Model T00:
Forty-two EIA units (42 U) of usable space (6 U of additional space).
The Model T42 supports ac only.
Weights:
– T42 base empty rack: 261 kg (575 lb.)
– T42 full rack: 930 kg (2045 lb.)
1.10.4 Feature code 0555 rack
The 1.3 Meter Rack (#0555) is a 25 EIA unit rack. The rack that is delivered as #0555 is the
same rack delivered when you order the 7014-S25 rack. The included features might vary.
The #0555 is supported, but no longer orderable.
1.10.5 Feature code 0551 rack
The 1.8 Meter Rack (#0551) is a 36 EIA unit rack. The rack that is delivered as #0551 is the
same rack delivered when you order the 7014-T00 rack. The included features might vary.
Certain features that are delivered as part of the 7014-T00 must be ordered separately with
the #0551.
1.10.6 Feature code 0553 rack
The 2.0 Meter Rack (#0553) is a 42 EIA unit rack. The rack that is delivered as #0553 is the
same rack delivered when you order the 7014-T42 or B42 rack. The included features might
vary. Certain features that are delivered as part of the 7014-T42 or B42 must be ordered
separately with the #0553.
1.10.7 The ac power distribution unit and rack content
For rack models T00 and T42, 12-outlet PDUs are available. These include ac power
distribution units #9188 and #7188 and ac Intelligent PDU+ #5889 and #7109.
Four PDUs can be mounted vertically in the back of the T00 and T42 racks. See Figure 1-4
for the placement of the four vertically mounted PDUs. In the rear of the rack, two additional
PDUs can be installed horizontally in the T00 rack and three in the T42 rack. The four vertical
mounting locations will be filled first in the T00 and T42 racks. Mounting PDUs horizontally
consumes 1U per PDU and reduces the space available for other racked components. When
mounting PDUs horizontally, use fillers in the EIA units occupied by these PDUs to facilitate
proper air-flow and ventilation in the rack.
Figure 1-4 PDU placement and PDU view
For detailed power cord requirements and power cord feature codes, see the IBM Power
Systems Hardware Information Center at the following website:
http://publib.boulder.ibm.com/infocenter/systems/scope/hw/index.jsp
A wide range of country requirements and electrical power specifications are supported by:
The Intelligent PDU+, base option, 1 EIA Unit, Universal, UTG0247 Connector (#5889)
The Base/Side Mount Universal PDU (#9188)
The optional, additional, Universal PDU (#7188)
The Intelligent PDU+ option (#7109)
Important: Ensure that the appropriate power cord feature is configured to support the
power being supplied.
The #5889 and #7109 PDUs are identical to the #9188 and #7188 PDUs but are
equipped with one Ethernet port, one console serial port, and one RS232 serial port for
power monitoring.
The PDU receives power through a UTG0247 power line connector. Each PDU requires one
PDU-to-wall power cord. Various power cord features are available for various countries and
applications by varying the PDU-to-wall power cord, which must be ordered separately. Each
power cord provides the unique design characteristics for the specific power requirements.
To match new power requirements and save previous investments, these power cords can be
requested with an initial order of the rack or with a later upgrade of the rack features.
The PDU has 12 client-usable IEC 320-C13 outlets. There are six groups of two outlets fed by
six circuit breakers. Each outlet is rated up to 10 amps, but each group of two outlets is fed
from one 15 amp circuit breaker.
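The outlet, breaker-group, and power-cord limits above combine into a straightforward load check. This sketch is illustrative (function name and argument layout are assumptions); it encodes the 12-outlet layout, the 10 A per-outlet rating, the 15 A shared breaker per outlet pair, and the cord capacity range described above.

```python
# Sketch: check drawer power draw against the PDU limits described above.
# 12 C13 outlets in six breaker groups of two; each outlet is rated 10 A and
# each group shares a 15 A breaker. Cord capacity varies (4.8 to 19.2 kVA).

def pdu_load_ok(outlet_amps, cord_kva, volts=230):
    """outlet_amps: list of 12 per-outlet currents (A), in outlet order."""
    if len(outlet_amps) != 12:
        raise ValueError("a PDU has 12 outlets")
    if any(a > 10 for a in outlet_amps):
        return False                      # per-outlet rating exceeded
    groups = [outlet_amps[i] + outlet_amps[i + 1] for i in range(0, 12, 2)]
    if any(g > 15 for g in groups):
        return False                      # breaker-group limit exceeded
    total_kva = sum(outlet_amps) * volts / 1000.0
    return total_kva <= cord_kva          # power-cord limit
```

For example, two 8 A loads on the same outlet pair trip the 15 A group limit even though each outlet is individually within its 10 A rating, which is why heavy drawers are best spread across breaker groups.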
The Universal PDUs are compatible with previous models.
1.10.8 Rack-mounting rules
Follow these primary rules when mounting the system into a rack:
The system is designed to be placed at any location in the rack. For rack stability, it is
advisable to start filling a rack from the bottom.
Any remaining space in the rack can be used to install other systems or peripherals,
provided that the maximum permissible weight of the rack is not exceeded and the
installation rules for these devices are followed.
Before placing the system into the service position, it is essential that the rack
manufacturer’s safety instructions have been followed regarding rack stability.
1.10.9 Useful rack additions
This section highlights solutions available for IBM Power Systems rack-based systems.
IBM System Storage 7214 Tape and DVD Enclosure
The IBM System Storage 7214 Tape and DVD Enclosure is designed to mount in one EIA
unit of a standard IBM Power Systems 19-inch rack and can be configured with one or two
tape drives, or either one or two Slim DVD-RAM or DVD-ROM drives in the right-side bay.
Remember: Based on the power cord that is used, the PDU can supply from 4.8 kVA to
19.2 kVA. The total power of all the drawers plugged into the PDU must not exceed the
power cord limitation.
Tip: Each system drawer to be mounted in the rack requires two power cords, which are
not included in the base order. For maximum availability, it is best to connect power cords
from the same system to two separate PDUs in the rack, and to connect each PDU to
independent power sources.
The two bays of the IBM System Storage 7214 Tape and DVD Enclosure can accommodate
the following tape or DVD drives for IBM Power servers:
DAT72 36 GB Tape Drive - up to two drives (#1400)
DAT160 80 GB Tape Drive - up to two drives (#1401)
Half-high LTO Ultrium 4 800 GB Tape Drive - up to two drives (#1404)
DVD-RAM Optical Drive - up to two drives (#1422)
DVD-ROM Optical Drive - up to two drives (#1423)
IBM System Storage 7216 Multi-Media Enclosure
The IBM System Storage 7216 Multi-Media Enclosure (Model 1U2) is designed to attach to
the Power 710 and the Power 730 through a USB port on the server, or through a PCIe SAS
adapter. The 7216 has two bays to accommodate external tape, removable disk drive, or
DVD-RAM drive options.
The following optional drive technologies are available for the 7216-1U2:
DAT160 80 GB SAS Tape Drive (#5619)
DAT320 160 GB SAS Tape Drive (#1402)
DAT320 160 GB USB Tape Drive (#5673)
Half-high LTO Ultrium 5 1.5 TB SAS Tape Drive (#8247)
DVD-RAM - 9.4 GB SAS Slim Optical Drive (#1420 and #1422)
RDX Removable Disk Drive Docking Station (#1103)
To attach a 7216 Multi-Media Enclosure to the Power 710 and Power 730, consider the
following cabling procedures:
Attachment by an SAS adapter
A PCIe LP 2-x4-port SAS adapter 3 Gb (#5278) must be installed in the Power 710 and
Power 730 server in order to attach to a 7216 Model 1U2 Multi-Media Storage Enclosure.
Attaching a 7216 to a Power 710 and Power 730 through the integrated SAS adapter is
not supported.
For each SAS tape drive and DVD-RAM drive feature installed in the 7216, the
appropriate external SAS cable will be included.
An optional Quad External SAS cable is available by specifying (#5544) with each 7216
order. The Quad External Cable allows up to four 7216 SAS tape or DVD-RAM features to
attach to a single System SAS adapter.
Up to two 7216 storage enclosure SAS features can be attached per PCIe LP 2-x4-port
SAS adapter 3 Gb (#5278).
Attachment by a USB adapter
The Removable RDX HDD Docking Station features on 7216 only support the USB cable
that is provided as part of the feature code. Additional USB hubs, add-on USB cables, or
USB cable extenders are not supported.
For each RDX Docking Station feature installed in the 7216, the appropriate external USB
cable will be included. The 7216 RDX Docking Station feature can be connected to the
external, integrated USB ports on the Power 710 and Power 730 or to the USB ports on
4-Port USB PCI Express Adapter (#2728).
The 7216 DAT320 USB tape drive or RDX Docking Station features can be connected to
the external, integrated USB ports on the Power 710 and Power 730.
Remember: The DAT320 160 GB SAS Tape Drive (#1402) and the DAT320 160 GB USB
Tape Drive (#5673) are no longer available as of July 15, 2011.
The two drive slots of the 7216 enclosure can hold the following drive combinations:
One tape drive (DAT160 SAS or Half-high LTO Ultrium 5 SAS) with second bay empty
Two tape drives (DAT160 SAS or Half-high LTO Ultrium 5 SAS) in any combination
One tape drive (DAT160 SAS or Half-high LTO Ultrium 5 SAS) and one DVD-RAM SAS
drive sled with one or two DVD-RAM SAS drives
Up to four DVD-RAM drives
One tape drive (DAT160 SAS or Half-high LTO Ultrium 5 SAS) in one bay, and one RDX
Removable HDD Docking Station in the other drive bay
One RDX Removable HDD Docking Station and one DVD-RAM SAS drive sled with one
or two DVD-RAM SAS drives in the right bay
Two RDX Removable HDD Docking Stations
Figure 1-5 shows the 7216 Multi-Media Enclosure.
Figure 1-5 7216 Multi-Media Enclosure
For a current list of host software versions and release levels that support the IBM System
Storage 7216 Multi-Media Enclosure, refer to the following website:
http://www.ibm.com/systems/support/storage/config/ssic/index.jsp
Flat panel display options
The IBM 7316 Model TF3 is a rack-mountable flat panel console kit consisting of a 17-inch
337.9 mm x 270.3 mm flat panel color monitor, rack keyboard tray, IBM Travel Keyboard,
support for the IBM Keyboard/Video/Mouse (KVM) switches, and language support. The IBM
7316-TF3 Flat Panel Console Kit offers the following features:
Slim, sleek, lightweight monitor design that occupies only 1U (1.75 inches) in a 19-inch
standard rack
A 17-inch, flat screen TFT monitor with truly accurate images and virtually no distortion
The ability to mount the IBM Travel Keyboard in the 7316-TF3 rack keyboard tray
Support for the IBM Keyboard/Video/Mouse (KVM) switches that provide control of as
many as 128 servers, and support of both USB and PS/2 server-side keyboard and
mouse connections
1.10.10 OEM rack
The system can be installed in a suitable OEM rack, provided that the rack conforms to the
EIA-310-D standard for 19-inch racks. This standard is published by the Electronic Industries
Alliance. For detailed information, see the IBM Power Systems Hardware Information Center
at the following website:
http://publib.boulder.ibm.com/infocenter/systems/scope/hw/index.jsp
These are the key points mentioned:
The front rack opening must be 451 mm wide ± 0.75 mm (17.75 in. ± 0.03 in.), and the
rail-mounting holes must be 465 mm ± 0.8 mm (18.3 in. ± 0.03 in.) apart on center
(horizontal width between the vertical columns of holes on the two front-mounting flanges
and on the two rear-mounting flanges). Figure 1-6 shows a top view showing the
specification dimensions.
Figure 1-6 Top view of non-IBM rack specification dimensions
The vertical distance between the mounting holes must consist of sets of three holes
spaced (from bottom to top) 15.9 mm (0.625 in.), 15.9 mm (0.625 in.), and 12.7 mm
(0.5 in.) on center, making each three-hole set of vertical hole spacing 44.45 mm (1.75 in.)
apart on center. Rail-mounting holes must be 7.1 mm ± 0.1 mm (0.28 in. ± 0.004 in.) in
diameter. Figure 1-7 shows the top front specification dimensions.
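The hole-spacing arithmetic above can be checked from the exact inch values defined by the EIA pattern (0.625 in., 0.625 in., 0.5 in.; the metric figures in the text are rounded conversions). The helper names below are illustrative.

```python
# The three-hole EIA pattern: exact spacings are 0.625 in., 0.625 in., and
# 0.5 in. from hole to hole; one rack unit (1.75 in. = 44.45 mm) is their sum.

IN_TO_MM = 25.4
SPACINGS_IN = (0.625, 0.625, 0.5)

def rack_unit_mm():
    """Height of one EIA rack unit in millimeters."""
    return sum(SPACINGS_IN) * IN_TO_MM

def hole_offsets_mm(units):
    """Hole positions (mm) from the first hole, for `units` rack units."""
    offsets, pos = [0.0], 0.0
    for _ in range(units):
        for s in SPACINGS_IN:
            pos += s * IN_TO_MM
            offsets.append(round(pos, 2))
    return offsets
```

This reproduces the 44.45 mm (1.75 in.) per-unit spacing stated in the text.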
Figure 1-7 Rack specification dimensions, top front view
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 2. Architecture and technical overview
This chapter discusses the overall system architecture for the IBM Power 710 and
Power 730, represented by Figure 2-1 on page 24 and Figure 2-2 on page 25. The
bandwidths that are provided throughout this section are theoretical maximums used for
reference.
The speeds shown are at an individual component level. Multiple components and application
implementation are key to achieving the best performance.
Always do the performance sizing at the application workload environment level and evaluate
performance using real-world performance measurements and production workloads.
Figure 2-1 shows the logical system diagram of the Power 710.
Figure 2-1 IBM Power 710 logical system diagram
[Figure 2-1 shows these elements: one POWER7 chip (4, 6, or 8 cores) with its memory
controller and two memory riser cards, each with two buffers and four DDR3 DIMM slots;
the P7-IOC I/O hub with a 1.0 GHz PCIe switch; one GX++ slot; five PCIe Gen2 x8 (FH/HL)
slots and one PCIe Gen2 x4 (FH/HL) slot; the TPMD; an integrated SAS controller (RAID 0,
1, and 10, with an optional RAID 5 and 6 expansion card) attached to the DASD and media
backplane (six HDD bays and a DVD); a USB controller with four USB ports; and the service
processor with two system ports, two HMC ports, two SPCN ports, and the VPD chip. The
diagram notes 2.5 GHz links (2 x 8 bytes, 20 GBps) and 68.224 GBps memory bandwidth
per socket.]
Figure 2-2 shows the logical system diagram of the Power 730.
Figure 2-2 IBM Power 730 logical system diagram
[Figure 2-2 labels: two POWER7 chips (4, 6, or 8 cores each) joined by 2.9 GHz chip-to-chip links, each chip with a memory controller driving two memory cards (four cards total, each with two buffers and four DIMMs) at 68.224 GBps per socket; a P7-IOC I/O hub attached over 2.5 GHz GX buses (2 x 8 bytes, 20 GBps each); two GX++ slots; five PCIe Gen2 x8 (FH/HL) slots and one PCIe Gen2 x4 (FH/HL) slot behind a 1.0 GHz PCIe switch; a SAS controller with RAID 0, 1, 10 and an optional RAID 5 and 6 expansion card driving the DASD and media backplane (HDD1 to HDD6, DVD); a USB controller with four USB ports; TPMD; and a service processor with VPD chip, two system ports, two HMC ports, and two SPCN ports.]
2.1 The IBM POWER7 Processor
The IBM POWER7 processor represents a leap forward in technology achievement and
associated computing capability. The multi-core architecture of the POWER7 processor has
been matched with innovation across a wide range of related technologies in order to deliver
leading throughput, efficiency, scalability, and RAS.
Although the processor is an important component in delivering outstanding servers,
many elements and facilities have to be balanced on a server in order to deliver maximum
throughput. As with previous generations of systems based on POWER® processors, the
design philosophy for POWER7 processor-based systems is one of system-wide balance in
which the POWER7 processor plays an important role.
In many cases, IBM has been innovative in order to achieve required levels of throughput and
bandwidth. Areas of innovation for the POWER7 processor and POWER7 processor-based
systems include, but are not limited to, the following features:
- On-chip L3 cache implemented in embedded dynamic random access memory (eDRAM)
- Cache hierarchy and component innovation
- Advances in the memory subsystem
- Advances in off-chip signalling
- Exploitation of long-term investment in coherence innovation
The superscalar POWER7 processor design also provides a variety of other capabilities:
- Binary compatibility with the prior generation of POWER processors
- Support for PowerVM virtualization capabilities, including PowerVM Live Partition Mobility to and from POWER6 and IBM POWER6+™ processor-based systems
Figure 2-3 shows the POWER7 processor die layout with these major areas identified:
- Processor cores
- L2 cache
- L3 cache and chip interconnection
- Symmetric multiprocessing (SMP) links
- Memory controllers
Figure 2-3 POWER7 processor die with key areas indicated
2.1.1 POWER7 processor overview
The POWER7 processor chip is fabricated using the IBM 45 nm Silicon-On-Insulator (SOI)
technology using copper interconnect and implements an on-chip L3 cache using eDRAM.
The POWER7 processor chip has an area of 567 mm² and is built using 1.2 billion
components (transistors). Eight processor cores are on the chip, each with 12 execution
units, 256 KB of L2 cache, and access to up to 32 MB of shared on-chip L3 cache.
For memory access, the POWER7 processor includes two Double Data Rate 3 (DDR3)
memory controllers, each with four memory channels. To be able to scale effectively,
the POWER7 processor uses a combination of local and global SMP links with very
high-coherency bandwidth and takes advantage of the IBM dual-scope broadcast
coherence protocol.
Table 2-1 summarizes the technology characteristics of the POWER7 processor.
Table 2-1 Summary of POWER7 processor technology
2.1.2 POWER7 processor core
Each POWER7 processor core implements aggressive out-of-order (OoO) instruction
execution to drive high efficiency in the use of available execution paths. The POWER7
processor has an Instruction Sequence Unit that is capable of dispatching up to six
instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to
the Instruction Execution units.
The POWER7 processor has a set of 12 execution units:
- Two fixed-point units
- Two load/store units
- Four double-precision floating-point units
- One vector unit
- One branch unit
- One condition register unit
- One decimal floating-point unit
The following caches are tightly coupled to each POWER7 processor core:
- Instruction cache: 32 KB
- Data cache: 32 KB
- L2 cache: 256 KB, implemented in fast SRAM
(Table 2-1)
Technology: POWER7 processor
Die size: 567 mm²
Fabrication technology: 45 nm lithography, copper interconnect, Silicon-on-Insulator, eDRAM
Components: 1.2 billion components/transistors offering the equivalent function of 2.7 billion (for further details, see 2.1.6, "On-chip L3 cache innovation and Intelligent Cache" on page 32)
Processor cores: 4, 6, or 8
Max execution threads per core/chip: 4/32
L2 cache per core/chip: 256 KB / 2 MB
On-chip L3 cache per core/chip: 4 MB / 32 MB
DDR3 memory controllers: 1 or 2
SMP design point: 32 sockets with IBM POWER7 processors
Compatibility: With prior generation of POWER processor
2.1.3 Simultaneous multithreading
An enhancement in the POWER7 processor is the addition of the SMT4 mode to enable four
instruction threads to execute simultaneously in each POWER7 processor core. Thus, the
instruction thread execution modes of the POWER7 processor are as follows:
- SMT1: Single instruction execution thread per core
- SMT2: Two instruction execution threads per core
- SMT4: Four instruction execution threads per core
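As a quick illustration (not from the source), the total hardware thread count per chip is simply the number of active cores multiplied by the SMT mode:

```python
# Hardware threads per POWER7 chip = active cores x SMT mode.
# Illustrative sketch; core counts and SMT modes are those listed above.
def hardware_threads(cores: int, smt_mode: int) -> int:
    assert cores in (4, 6, 8), "POWER7 ships with 4, 6, or 8 active cores"
    assert smt_mode in (1, 2, 4), "SMT1, SMT2, or SMT4"
    return cores * smt_mode

# An 8-core chip in SMT4 mode runs 32 simultaneous threads, matching the
# "4/32 max execution threads per core/chip" figure in Table 2-1.
print(hardware_threads(8, 4))
```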
Maximizing throughput
SMT4 mode enables the POWER7 processor to maximize the throughput of the processor
core by offering an increase in core efficiency. SMT4 mode is the latest step in an evolution of
multithreading technologies introduced by IBM. Figure 2-4 shows the evolution of
simultaneous multithreading.
Figure 2-4 Evolution of simultaneous multi-threading
The various SMT modes offered by the POWER7 processor allow flexibility, enabling users to
select the threading technology that meets an aggregation of objectives such as
performance, throughput, energy use, and workload enablement.
Intelligent Threads
The POWER7 processor features Intelligent Threads that can vary based on the workload
demand. The system either automatically selects (or the system administrator can manually
select) whether a workload benefits from dedicating as much capability as possible to a
single thread of work, or if the workload benefits more from having capability spread across
two or four threads of work. With more threads, the POWER7 processor can deliver more
total capacity as more tasks are accomplished in parallel. With fewer threads, those
workloads that need very fast individual tasks can get the performance that they need for
maximum benefit.
[Figure 2-4 labels: the multi-threading evolution from 1995 single-thread out-of-order, to 1997 hardware multi-thread, to 2004 2-way SMT, to 2010 4-way SMT; each stage shows the execution units FX0, FX1, FP0, FP1, LS0, LS1, BRX, and CRL shaded by which of threads 0 to 3 (or no thread) is executing.]
2.1.4 Memory access
Each POWER7 processor chip has two DDR3 memory controllers, each with four memory
channels (enabling eight memory channels per POWER7 processor chip). Each channel
operates at 6.4 GHz and can address up to 32 GB of memory. Thus, each POWER7
processor chip is capable of addressing up to 256 GB of memory.
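The 256 GB ceiling follows directly from the controller topology described above; a minimal sketch of the arithmetic:

```python
# Maximum addressable memory per POWER7 chip, per the figures above:
# two DDR3 controllers, four channels each, 32 GB addressable per channel.
controllers = 2
channels_per_controller = 4
gb_per_channel = 32

max_memory_gb = controllers * channels_per_controller * gb_per_channel
print(max_memory_gb)  # 256
```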
Figure 2-5 gives a simple overview of the POWER7 processor memory access structure.
Figure 2-5 Overview of POWER7 memory access structure
2.1.5 Flexible POWER7 processor packaging and offerings
The POWER7 processor forms the basis of a flexible compute platform and can be offered in
a number of guises to address differing system requirements.
The POWER7 processor can be offered with a single active memory controller with four
channels for servers where higher degrees of memory parallelism are not required.
Similarly, the POWER7 processor can be offered with a variety of SMP bus capacities that
are appropriate to the scaling-point of particular server models.
[Figure 2-5 labels: a POWER7 processor chip with eight cores and dual integrated DDR3 memory controllers, each controller attached through an advanced buffer ASIC chip to DDR3 DRAMs. Callouts: dual integrated DDR3 memory controllers (high channel and DIMM utilization, advanced energy management, RAS advances); eight high-speed 6.4 GHz channels (new low-power differential signalling); new DDR3 buffer chip architecture (larger capacity support of 32 GB per core, energy management support, RAS enablement).]
Figure 2-6 outlines the physical packaging options that are supported with
POWER7 processors.
Figure 2-6 Outline of the POWER7 processor physical packaging
POWER7 processors have the unique ability to optimize to various workload types. For
example, database workloads typically benefit from very fast processors that handle high
transaction rates at high speeds. Web workloads typically benefit more from processors with
many threads that allow the breaking down of web requests into many parts and handle them
in parallel. POWER7 processors uniquely have the ability to provide leadership performance
in either case.
TurboCore mode
Users can opt to run selected servers in TurboCore mode. This mode uses four cores per
POWER7 processor chip with access to the entire 32 MB of L3 cache (8 MB per core) and at
a faster processor core frequency, which delivers higher performance per core, and might
save on software costs for those applications that are licensed per core.
MaxCore mode
MaxCore mode is for workloads that benefit from a higher number of cores and threads
handling multiple tasks simultaneously, taking advantage of increased parallelism. MaxCore
mode provides up to eight cores and up to 32 threads per POWER7 processor.
POWER7 processor 4-core and 6-core offerings
The base design for the POWER7 processor is an 8-core processor with 32 MB of on-chip L3
cache (4 MB per core). However, the architecture allows for differing numbers of processor
cores to be active, 4 cores or 6 cores, as well as the full 8-core version.
In most cases (MaxCore mode), the L3 cache associated with the implementation is
dependent on the number of active cores. For a 6-core version, this typically means that
6 x 4 MB (24 MB) of L3 cache is available. Similarly, for a 4-core version, the L3 cache
available is 16 MB.
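The per-core L3 figures for both modes reduce to simple division over the active cores; an illustrative sketch (not from the source):

```python
# On-chip L3 cache available in each mode, per the text above.
L3_PER_CORE_MB = 4   # base design: 4 MB of on-chip L3 per core
FULL_DIE_CORES = 8

def maxcore_l3_total(active_cores: int) -> int:
    # MaxCore: available L3 scales with the number of active cores.
    return active_cores * L3_PER_CORE_MB

def turbocore_l3_per_core() -> int:
    # TurboCore: 4 active cores share the full 32 MB of L3.
    return (FULL_DIE_CORES * L3_PER_CORE_MB) // 4

print(maxcore_l3_total(6))      # 24 MB, the 6-core figure quoted above
print(turbocore_l3_per_core())  # 8 MB per core
```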
TurboCore availability: TurboCore is available on the Power 780 and Power 795.
[Figure 2-6 labels: single-chip organic package with one memory controller and local broadcast SMP links active; single-chip glass ceramic package with two memory controllers and both local and global broadcast SMP links active.]
2.1.6 On-chip L3 cache innovation and Intelligent Cache
A breakthrough in material engineering and microprocessor fabrication has enabled IBM to
implement the L3 cache in eDRAM and place it on the POWER7 processor die. L3 cache is
critical to a balanced design, as is the ability to provide good signalling between the L3 cache
and other elements of the hierarchy such as the L2 cache or SMP interconnect.
The on-chip L3 cache is organized into separate areas with differing latency characteristics.
Each processor core is associated with a Fast Local Region of L3 cache (FLR-L3) but also
has access to other L3 cache regions as shared L3 cache. Additionally, each core can
negotiate to use the FLR-L3 cache associated with another core, depending on reference
patterns. Data can also be cloned to be stored in more than one core's FLR-L3 cache, again
depending on reference patterns. This Intelligent Cache management enables the POWER7
processor to optimize the access to L3 cache lines and minimize overall cache latencies.
Figure 2-7 shows the FLR-L3 cache regions for each of the cores on the POWER7
processor die.
Figure 2-7 Fast local regions of L3 cache on the POWER7 processor
Innovation using eDRAM on the POWER7 processor die is significant for these reasons:
Latency improvement
A six-to-one latency improvement occurs by moving the L3 cache on-chip compared to L3
accesses on an external (on-ceramic) ASIC.
Bandwidth improvement
A 2x bandwidth improvement occurs with on-chip interconnect. Frequency and bus sizes
are increased to and from each core.
No off-chip driver or receivers
Removing drivers or receivers from the L3 access path lowers interface requirements,
conserves energy, and lowers latency.
Small physical footprint
The eDRAM L3 cache requires far less physical space than an equivalent L3 cache
implemented with conventional SRAM. IBM on-chip eDRAM uses only a third of the
components used in conventional SRAM, which has a minimum of six transistors to
implement a 1-bit memory cell.
Low energy consumption
The on-chip eDRAM uses only 20% of the standby power of SRAM.
2.1.7 POWER7 processor and Intelligent Energy
Energy consumption is an important area of focus for the design of the POWER7 processor,
which includes Intelligent Energy features that help to dynamically optimize energy usage
and performance so that the best possible balance is maintained. Intelligent Energy features
such as EnergyScale work with IBM Systems Director Active Energy Manager™ to
dynamically optimize processor speed based on thermal conditions and system utilization.
2.1.8 Comparison of the POWER7 and POWER6 processors
Table 2-2 shows comparable characteristics between the generations of POWER7 and
POWER6 processors.
Table 2-2 Comparison of technology for the POWER7 processor and the prior generation
Feature                           POWER7                             POWER6
Technology                        45 nm                              65 nm
Die size                          567 mm²                            341 mm²
Maximum cores                     8                                  2
Maximum SMT threads per core      4 threads                          2 threads
Maximum frequency                 4.25 GHz                           5 GHz
L2 cache                          256 KB per core                    4 MB per core
L3 cache                          4 MB of FLR-L3 cache per core,     32 MB off-chip eDRAM ASIC
                                  with each core having access to
                                  the full 32 MB of on-chip
                                  eDRAM L3 cache
Memory support                    DDR3                               DDR2
I/O bus                           Two GX++                           One GX++
Enhanced Cache Mode (TurboCore)   Yes (a)                            No
Sleep & Nap Mode                  Both (b)                           Nap only

a. Not supported on the Power 770 and the Power 780 4-socket systems.
b. For more information about Sleep and Nap modes, see 2.15.1, "IBM EnergyScale technology" on page 77.
2.2 POWER7 processor modules
The Power 710 and Power 730 server chassis house POWER7 processor modules, each
hosting one POWER7 single-chip module (SCM) socket and eight DDR3 memory DIMM
slots.
The Power 710 server houses one processor module offering 4-core 3.0 GHz, 6-core
3.7 GHz, or 8-core 3.55 GHz configurations.
The Power 730 server houses two processor modules offering 8-core 3.0 and 3.7 GHz,
12-core 3.7 GHz, or 16-core 3.55 GHz configurations.
All of the installed processor cores must be activated.
2.2.1 Modules and cards
Figure 2-8 shows a Power 730 server highlighting the POWER7 processor modules and the
memory riser cards.
Figure 2-8 Power 730 with two POWER7 processor modules and four memory riser cards
Requirement: All POWER7 processors in the system must be the same frequency and
have the same number of processor cores. POWER7 processor types cannot be mixed
within a system.
2.2.2 Power 710 and Power 730 systems
Power 710 and Power 730 systems support POWER7 processor chips with various core
counts. Table 2-3 summarizes POWER7 processor options for the Power 710 system.
Table 2-3 Summary of POWER7 processor options for the Power 710 system
Table 2-4 summarizes the POWER7 processor options for the Power 730 system.
Table 2-4 Summary of POWER7 processor options for the Power 730 system
(Table 2-3)
Feature #EPC1: 4 cores per POWER7 processor at 3.0 GHz. The 4-core 3.0 GHz offering requires four processor activation codes, available as 4 x #EPD1, or 2 x #EPD1 and 2 x #EPE1. Min/max cores per system: 4. Min/max processor modules: 1.
Feature #EPC2: 6 cores per POWER7 processor at 3.7 GHz. The 6-core 3.7 GHz offering requires six processor activation codes, available as 6 x #EPD2, or 3 x #EPD2 and 3 x #EPE2. Min/max cores per system: 6. Min/max processor modules: 1.
Feature #EPC3: 8 cores per POWER7 processor at 3.55 GHz. The 8-core 3.55 GHz offering requires eight processor activation codes, available as 8 x #EPD3, or 4 x #EPD3 and 4 x #EPE3. Min/max cores per system: 8. Min/max processor modules: 1.
(Table 2-4)
Feature #EPC1: 4 cores per POWER7 processor at 3.0 GHz. The 4-core 3.0 GHz offering requires eight processor activation codes, available as 8 x #EPD1, or 4 x #EPD1 and 4 x #EPE1. Min/max cores per system: 8. Min/max processor modules: 2.
Feature #EPC4: 4 cores per POWER7 processor at 3.7 GHz. The 4-core 3.7 GHz offering requires eight processor activation codes, available as 8 x #EPD4, or 4 x #EPD4 and 4 x #EPE4. Min/max cores per system: 8. Min/max processor modules: 2.
Feature #EPC2: 6 cores per POWER7 processor at 3.7 GHz. The 6-core 3.7 GHz offering requires 12 processor activation codes, available as 12 x #EPD2, or 6 x #EPD2 and 6 x #EPE2. Min/max cores per system: 12. Min/max processor modules: 2.
Feature #EPC3: 8 cores per POWER7 processor at 3.55 GHz. Two 8-core 3.55 GHz modules require 16 processor activation codes, available as 16 x #EPD3, or 8 x #EPD3 and 8 x #EPE3. Min/max cores per system: 16. Min/max processor modules: 2.
36 IBM Power 710 and 730 Technical Overview and Introduction
2.3 Memory subsystem
The Power 710 is a one-socket system supporting a single POWER7 processor module. The
server supports a maximum of eight DDR3 DIMM slots, with four DIMM slots included in the
base configuration and four DIMM slots available with an optional memory riser card. Memory
features (two memory DIMMs per feature) supported are 4 GB, 8 GB, 16 GB, and 32 GB
running at speeds of 1066 MHz. A system with the optional memory riser card installed has a
maximum memory of 128 GB.
The Power 730 is a two-socket system supporting up to two POWER7 processor modules.
The server supports a maximum of 16 DDR3 DIMM slots, with four DIMM slots included in the
base configuration and 12 DIMM slots available with three optional memory riser cards.
Memory features (two memory DIMMs per feature) supported are 4 GB, 8 GB, 16 GB, and
32 GB running at speeds of 1066 MHz. A system with three optional memory riser cards
installed has a maximum memory of 256 GB.
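The two maximums follow from the slot counts; a minimal sketch (the 16 GB DIMM size is taken from the largest feature, 32 GB = 2 x 16 GB):

```python
# Maximum memory per server, per the slot counts above (illustrative).
GB_PER_DIMM = 16        # largest DIMM: the 32 GB feature is 2 x 16 GB
SLOTS_PER_RISER = 4

power_710_max = 2 * SLOTS_PER_RISER * GB_PER_DIMM  # base + 1 riser card
power_730_max = 4 * SLOTS_PER_RISER * GB_PER_DIMM  # base + 3 riser cards
print(power_710_max, power_730_max)  # 128 256
```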
These servers support an optional feature called Active Memory Expansion (#4795) that
allows the effective maximum memory capacity to be much larger than the true physical
memory. This feature compresses and decompresses memory content using processor
cycles, providing memory expansion of up to 100% depending on the workload type and its
memory utilization. A server with a maximum of 128 GB can effectively
be expanded up to 256 GB. This can enhance virtualization and server consolidation by
allowing a partition to do significantly more work with the same physical amount of memory or
a server to run more partitions and do more work with the same physical amount of memory.
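The effective capacity with Active Memory Expansion is the physical memory scaled by the expansion factor; a sketch of the 128 GB example above (the function name is illustrative, not an IBM API):

```python
# Effective memory with Active Memory Expansion (illustrative sketch).
# The achievable expansion is workload-dependent; the text quotes "up to 100%".
def effective_memory_gb(physical_gb: float, expansion_pct: float) -> float:
    if not 0 <= expansion_pct <= 100:
        raise ValueError("expansion is quoted as up to 100%")
    return physical_gb * (1 + expansion_pct / 100)

print(effective_memory_gb(128, 100))  # 256.0, the example from the text
```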
2.3.1 Registered DIMM
Industry-standard DDR3 Registered DIMM (RDIMM) technology is used to increase
reliability, speed, and density of memory subsystems.
2.3.2 Memory placement rules
These memory options are orderable:
- 4 GB (2 x 2 GB) Memory DIMMs, 1066 MHz (#EM04)
- 8 GB (2 x 4 GB) Memory DIMMs, 1066 MHz (#EM08)
- 16 GB (2 x 8 GB) Memory DIMMs, 1066 MHz (#EM16)
- 32 GB (2 x 16 GB) Memory DIMMs, 1066 MHz (#EM32)
A minimum of 1 GB memory per core on the 4-core POWER7 modules and 2 GB memory per
core on the 6-core and 8-core POWER7 modules is required. There can be different valid
memory configurations that meet the minimum requirement.
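The minimum-memory rule can be expressed as a small check (illustrative sketch, not from the source):

```python
# Minimum memory rule from the text: 1 GB per core on 4-core modules,
# 2 GB per core on 6-core and 8-core modules (illustrative sketch).
def min_memory_gb(cores_per_module: int, modules: int = 1) -> int:
    assert cores_per_module in (4, 6, 8)
    per_core = 1 if cores_per_module == 4 else 2
    return cores_per_module * per_core * modules

print(min_memory_gb(4))     # 4 GB minimum for a 4-core Power 710
print(min_memory_gb(8, 2))  # 32 GB minimum for a 16-core Power 730
```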
The maximum memory supported is as follows:
- Power 710: 128 GB (four 16 GB DIMMs on each of two memory cards)
- Power 730: 256 GB (four 16 GB DIMMs on each of four memory cards)

Remember: DDR2 memory (used in POWER6 processor-based systems) is not
supported in POWER7 processor-based systems.
Figure 2-9 shows the physical memory DIMM topology.
Figure 2-9 Memory DIMM topology for the Power 730
The memory-placement rules are as follows:
- The base machine contains one memory riser card with four DIMM sockets. Memory features occupy two memory DIMM sockets.
- The Power 710 offers one additional memory riser card feature (1 x #5265) with an additional four DIMM sockets. Maximum system memory is 64 GB without feature #5265 and 128 GB with one feature #5265.
- The Power 730 offers three optional memory riser card features (3 x #5265) with an additional four DIMM sockets per feature. Maximum system memory is 64 GB without feature #5265 and 256 GB with three feature #5265 cards.
- A system can be ordered with a single memory feature #EM04, #EM08, #EM16, or #EM32. The second memory feature ordered on the same memory riser card does not have to match the first memory feature. Memory features can be mixed on either memory riser card.
- A minimum of one memory feature must be plugged into each memory riser card. Empty memory riser cards are not allowed.
- There is a performance benefit when all DIMMs on a memory riser card are of the same capacity.
It is generally best to install memory evenly across all memory riser cards in the system.
Balancing memory across the installed memory riser cards allows memory access in a
consistent manner and typically results in the best possible performance for your
configuration. However, balancing memory fairly evenly across multiple memory riser cards,
compared to balancing memory exactly evenly, typically has a very small
performance difference.

[Figure 2-9 labels: two POWER7 chips, each with a memory controller (MC0) driving four ports (Port 0 to Port 3); memory cards 1 and 2 attach to the first chip and cards 3 and 4 to the second; each card carries two memory buffers (SN-A through SN-D per chip), and each buffer serves two DDR3 RDIMM slots through its Port A and Port B (slots 1 to 4 per card). MC: memory controller; SN: memory buffer.]
Take into account any plans for future memory upgrades when deciding which memory
feature size to use at the time of initial system order.
Table 2-5 shows the installation slots for memory DIMMs for two POWER7 modules and four
memory cards.
Table 2-5 Memory DIMM installation sequence and slots
Power 710 or Power 730, one memory card installed at the P1-C17 slot:
- Install the first two DIMMs at slot 1 (P1-C17-C1) and slot 2 (P1-C17-C2) on memory card 1.
- Install the next two DIMMs at slot 3 (P1-C17-C3) and slot 4 (P1-C17-C4) on memory card 1.

Power 710, two memory cards installed at the P1-C17 and P1-C16 slots:
- Install the first two DIMMs at slot 1 (P1-C17-C1) and slot 2 (P1-C17-C2) on memory card 1.
- Install the next two DIMMs at slot 1 (P1-C16-C1) and slot 2 (P1-C16-C2) on memory card 2.
- Install the next two DIMMs at slot 3 (P1-C17-C3) and slot 4 (P1-C17-C4) on memory card 1.
- Install the next two DIMMs at slot 3 (P1-C16-C3) and slot 4 (P1-C16-C4) on memory card 2.

Power 730, two memory cards installed at the P1-C17 and P1-C15 slots:
- Install the first two DIMMs at slot 1 (P1-C17-C1) and slot 2 (P1-C17-C2) on memory card 1.
- Install the next two DIMMs at slot 1 (P1-C15-C1) and slot 2 (P1-C15-C2) on memory card 3.
- Install the next two DIMMs at slot 3 (P1-C17-C3) and slot 4 (P1-C17-C4) on memory card 1.
- Install the next two DIMMs at slot 3 (P1-C15-C3) and slot 4 (P1-C15-C4) on memory card 3.

Power 730, three memory cards installed at the P1-C17, P1-C16, and P1-C15 slots:
- Install the first two DIMMs at slot 1 (P1-C17-C1) and slot 2 (P1-C17-C2) on memory card 1.
- Install the next two DIMMs at slot 1 (P1-C15-C1) and slot 2 (P1-C15-C2) on memory card 3.
- Install the next two DIMMs at slot 1 (P1-C16-C1) and slot 2 (P1-C16-C2) on memory card 2.
- Install the next two DIMMs at slot 3 (P1-C17-C3) and slot 4 (P1-C17-C4) on memory card 1.
- Install the next two DIMMs at slot 3 (P1-C15-C3) and slot 4 (P1-C15-C4) on memory card 3.
- Install the next two DIMMs at slot 3 (P1-C16-C3) and slot 4 (P1-C16-C4) on memory card 2.
Figure 2-10 shows the DIMM slot positions on the Memory Riser Cards.
Figure 2-10 Memory Riser Card for the Power 710 and Power 730 Systems
2.3.3 Memory bandwidth
The POWER7 processor has exceptional cache, memory, and interconnect
bandwidths. Table 2-6 shows the bandwidth estimates for the Power 710 and Power 730
systems.
Table 2-6 Power 710 and Power 730 processor, memory, and I/O bandwidth estimates
2.4 Capacity on Demand
Capacity on Demand is not supported on the Power 710 and Power 730 systems.
(Table 2-5, continued) Power 730, four memory cards installed at the P1-C17, P1-C16, P1-C15, and P1-C14 slots:
- Install the first two DIMMs at slot 1 (P1-C17-C1) and slot 2 (P1-C17-C2) on memory card 1.
- Install the next two DIMMs at slot 1 (P1-C15-C1) and slot 2 (P1-C15-C2) on memory card 3.
- Install the next two DIMMs at slot 1 (P1-C16-C1) and slot 2 (P1-C16-C2) on memory card 2.
- Install the next two DIMMs at slot 1 (P1-C14-C1) and slot 2 (P1-C14-C2) on memory card 4.
- Install the next two DIMMs at slot 3 (P1-C17-C3) and slot 4 (P1-C17-C4) on memory card 1.
- Install the next two DIMMs at slot 3 (P1-C15-C3) and slot 4 (P1-C15-C4) on memory card 3.
- Install the next two DIMMs at slot 3 (P1-C16-C3) and slot 4 (P1-C16-C4) on memory card 2.
- Install the next two DIMMs at slot 3 (P1-C14-C3) and slot 4 (P1-C14-C4) on memory card 4.
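The fill order in Table 2-5 follows a regular pattern: DIMM pairs go to slots 1 and 2 on each installed riser card before any card receives slots 3 and 4. A sketch that reproduces the sequence, under the assumption (consistent with the table) that cards fill in the order 1, 3, 2, 4:

```python
# Sketch of the DIMM plugging order implied by Table 2-5. Assumption:
# riser cards fill in the order 1, 3, 2, 4; slot pair 1-2 first, then 3-4.
CARD_PRIORITY = [1, 3, 2, 4]
CARD_SLOT = {1: "P1-C17", 2: "P1-C16", 3: "P1-C15", 4: "P1-C14"}

def dimm_pair_order(installed_cards):
    order = []
    for slot_pair in ((1, 2), (3, 4)):
        for card in CARD_PRIORITY:
            if card in installed_cards:
                order.append((CARD_SLOT[card], slot_pair))
    return order

# Three riser cards installed, as in the three-card row of Table 2-5:
for location, pair in dimm_pair_order({1, 2, 3}):
    print(location, pair)
```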
[Figure 2-10 labels: Slot #1 at P1-Cn-C1, Slot #2 at P1-Cn-C2, Slot #3 at P1-Cn-C3, Slot #4 at P1-Cn-C4, with Buffer A and Buffer B on the memory riser card.]
(Table 2-6) Memory bandwidth estimates, 3.55 GHz processor card:
                  Power 710      Power 730
L1 (data) cache   170.4 GBps     170.4 GBps
L2 cache          170.4 GBps     170.4 GBps
L3 cache          113.6 GBps     113.6 GBps
System memory     68.22 GBps     68.22 GBps per socket
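One way to reproduce the system-memory figure (a sketch under the assumption that the estimate is eight channels of 1066 MT/s DDR3 moving 8 bytes per transfer):

```python
# Reproducing the 68.22 GBps system-memory estimate (illustrative).
# Assumption: 8 channels x 1066 MT/s DDR3 x 8 bytes per transfer.
channels = 8
transfers_per_sec = 1066e6
bytes_per_transfer = 8

gbps = channels * transfers_per_sec * bytes_per_transfer / 1e9
print(round(gbps, 3))  # 68.224
```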
2.5 Factory deconfiguration of processor cores
The Power 710 and Power 730 servers have the capability to be shipped with part of the
installed cores deconfigured at the factory. The primary use for this feature is to assist with
optimization of software licensing. A deconfigured core is unavailable for use in the system
and thus does not require software licensing.
Feature #2319 deconfigures one core in a Power 710 or Power 730 system. It is available on
all system configurations.
The maximum number of this feature that can be ordered is one less than the number of
cores in the system. For example, a maximum of five #2319 can be ordered for a 6-core
system. Feature #2319 can only be specified at initial order and cannot be applied to an
installed machine.
2.6 System bus
This section provides additional information related to the internal buses.
The Power 710 and 730 systems have internal I/O connectivity through Peripheral
Component Interconnect Express (PCI Express, or PCIe) slots, as well as external
connectivity through InfiniBand adapters.
The internal I/O subsystem on the Power 710 and 730 is connected to the GX bus on a
POWER7 processor in the system. This bus runs at 2.5 GHz and provides 20 GBps of I/O
connectivity to the PCIe slots, integrated Ethernet adapter ports, SAS internal adapters, and
USB ports.
Additionally, the POWER7 processor chip installed on the Power 710 and each of the
processor chips on the Power 730 provide a GX++ bus, which is used to optionally connect to
the GX adapters. Each bus runs at 2.5 GHz and provides 20 GBps bandwidth.
The GX++ Dual-port 12x Channel Attach Adapter (#EJ0G) can be installed in either
GX++ slot.
Table 2-7 lists the I/O bandwidth of Power 710 and Power 730 processor configurations.
Table 2-7 I/O bandwidth
Remember: The GX++ slots are not hot pluggable.
I/O                                                  I/O bandwidth (maximum theoretical)
GX++ bus from the first POWER7 SCM to the I/O chip   10 GBps simplex / 20 GBps duplex
GX++ bus (slot 1)                                    10 GBps simplex / 20 GBps duplex
GX++ bus (slot 2)                                    10 GBps simplex / 20 GBps duplex
Total I/O bandwidth                                  30 GBps simplex / 60 GBps duplex
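The totals in Table 2-7 are just the sum over the three GX++ buses; a sketch of the arithmetic:

```python
# Totaling the theoretical I/O bandwidth from Table 2-7 (illustrative).
buses_simplex_gbps = {
    "GX++ bus to I/O chip": 10,
    "GX++ bus (slot 1)": 10,
    "GX++ bus (slot 2)": 10,
}

total_simplex = sum(buses_simplex_gbps.values())
total_duplex = 2 * total_simplex  # duplex carries traffic both ways
print(total_simplex, total_duplex)  # 30 60
```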
2.7 Internal I/O subsystem
The internal I/O subsystem resides on the system planar, which supports PCIe slots. PCIe
slots on the Power 710 and Power 730 are not hot pluggable. PCIe and PCI-X slots on the
I/O drawers are hot-pluggable.
All PCIe slots support Enhanced Error Handling (EEH). PCI EEH-enabled adapters respond
to a special data packet generated from the affected PCIe slot hardware by calling system
firmware, which will examine the affected bus, allow the device driver to reset it, and continue
without a system reboot. For Linux, EEH support extends to the majority of frequently used
devices, although certain third-party PCI devices might not provide native EEH support.
2.7.1 Slot configuration
Table 2-8 displays the PCIe Gen2 slot configuration of Power 710 and Power 730.
Table 2-8 Slot configuration of a Power 710 and Power 730.
2.7.2 System ports
The system planar has two serial ports that are called system ports. When an HMC is
connected to the server, the integrated system ports of the server are rendered
non-functional. In this case, you must install an asynchronous adapter, which is described in
Table 2-17 on page 49, for serial port usage:
- Integrated system ports are not supported under AIX or Linux when the HMC ports are connected to an HMC. Either the HMC ports or the integrated system ports can be used, but not both.
- The integrated system ports are supported for modem and asynchronous terminal connections. Any other application using serial ports requires a serial port adapter to be installed in a PCI slot. The integrated system ports do not support IBM PowerHA™ configurations.
- Configuration of the two integrated system ports, including basic port settings (baud rate, and so on), modem selection, and call-home and call-in policy, can be performed with the Advanced Systems Management Interface (ASMI).
(Table 2-8)
Slot    Description    Location code   PCI Host Bridge (PHB)         Max card size
Slot 1  PCIe Gen2 x8   P1-C2           P7IOC PCIe PHB5               Low profile
Slot 2  PCIe Gen2 x8   P1-C3           P7IOC PCIe PHB4               Low profile
Slot 3  PCIe Gen2 x8   P1-C4           P7IOC PCIe PHB3               Low profile
Slot 4  PCIe Gen2 x8   P1-C5           P7IOC PCIe PHB2               Low profile
Slot 5  PCIe Gen2 x8   P1-C6           P7IOC PCIe PHB1               Low profile
Slot 6  PCIe Gen2 x4   P1-C7           P7IOC multiplexer PCIe PHB0   Low profile
Remember: Slot 6 is shared with GX++ adapter slot 2. If a PCIe adapter is plugged into
slot 6, then a GX++ adapter cannot be plugged into GX++ slot 2 and vice versa.
2.8 PCI adapters
This section covers the various types and functionality of the PCI cards supported with the
IBM Power 710 and Power 730 systems.
2.8.1 PCIe Gen1 and Gen2
Peripheral Component Interconnect Express (PCIe) uses a serial interface and allows for
point-to-point interconnections between devices (using a directly wired interface between
these connection points). A single PCIe serial link is a dual-simplex connection that uses two
pairs of wires, one pair for transmit and one pair for receive, and can transmit only one bit per
cycle. These two pairs of wires are called a lane. A PCIe link can consist of multiple lanes. In
such configurations, the connection is labeled as x1, x2, x4, x8, x12, x16, or x32, where the
number is effectively the number of lanes.
Two generations of the PCIe interface are supported in Power 710 and Power 730 models:
Gen1: Capable of transmitting at 2.5 Gbps, which gives a peak bandwidth of
2 GBps simplex on an x8 interface
Gen2: Double the speed of the Gen1 interface, which gives a peak bandwidth of
4 GBps simplex on an x8 interface
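The peak bandwidth figures above follow from the lane rate and the 8b/10b encoding that PCIe Gen1 and Gen2 use (10 signal bits carry 8 data bits). A minimal sketch of the arithmetic, with an illustrative helper name:

```python
# Sketch: deriving the peak simplex bandwidth figures quoted above.
# PCIe Gen1 and Gen2 use 8b/10b encoding, so 10 signal bits carry 8 data bits.
# The function name is illustrative, not part of any IBM tool.

def pcie_simplex_gbps(lane_rate_gbps: float, lanes: int) -> float:
    """Peak simplex bandwidth (GBps) of an 8b/10b-encoded PCIe link."""
    data_rate_gbps = lane_rate_gbps * 8 / 10   # strip encoding overhead
    return data_rate_gbps * lanes / 8          # bits per second -> bytes per second

print(pcie_simplex_gbps(2.5, 8))   # Gen1 x8: 2.0 GBps
print(pcie_simplex_gbps(5.0, 8))   # Gen2 x8: 4.0 GBps
```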
PCIe Gen1 slots support Gen1 adapter cards and also most Gen2 adapters; a Gen2 adapter
used in a Gen1 slot operates at PCIe Gen1 speed. PCIe Gen2 slots support both Gen1 and
Gen2 adapters. A Gen1 card installed in a Gen2 slot operates at PCIe Gen1 speed with a
slight performance enhancement, and a Gen2 adapter installed in a Gen2 slot operates at
the full PCIe Gen2 speed.
The Power 710 and Power 730 system enclosure is equipped with five PCIe x8 Gen2 Low
Profile slots. In addition, there is a sixth PCIe x4 slot dedicated to the PCIe Ethernet card that
is standard with the base unit.
IBM offers only PCIe low profile adapter options for the Power 710 and Power 730 systems.
All adapters support Extended Error Handling (EEH). PCIe adapters use a different type of
slot than PCI and PCI-X adapters. If you attempt to force an adapter into the wrong type of
slot, you might damage the adapter or the slot.
Remember: The integrated console/modem port usage just described is for systems
configured as a single, system-wide partition. When the system is configured with multiple
partitions, the integrated console/modem ports are disabled because the TTY console and
call home functions are performed with the HMC.
Note: IBM i IOP and PCI-X adapters are not supported in the Power 710 and
Power 730 systems.
2.8.2 PCIe adapter form factors
IBM POWER7 processor based servers are able to support two different form factors of
PCIe adapters:
PCIe low profile (LP) cards, which are used with the Power 710 and Power 730 PCIe
slots. Low profile adapters are also used in the PCIe riser card slots of Power 720 and
Power 740 servers.
PCIe full height and full high cards, which are plugged into the slots of the following servers:
– Power 720 and Power 740 (within the base system, five half-length PCIe slots
are supported.)
– Power 750
– Power 755
– Power 770
– Power 780
– Power 795
– PCIe slots of the #5802 and #5877 drawers
Low profile PCIe adapter cards are supported only in low profile PCIe slots, and full height
and full high cards are supported only in full high slots.
Figure 2-11 shows the PCIe adapter form factors.
Figure 2-11 PCIe adapter form factors
Many of the full height card features are available in low profile format. For example, the
#5273 8 Gb dual port Fibre Channel adapter is the low profile equivalent of the #5735
full height adapter. They have equivalent functional characteristics.
Table 2-9 lists the low profile adapter cards and their full height equivalents.
Table 2-9 Equivalent adapter cards
Low profile                                                               Full height
Feature code  CCIN  Adapter description                                   Feature code    CCIN
#2053         57CD  PCIe RAID and SSD SAS Adapter 3 Gb Low Profile        #2054 or #2055  57CD
#5269               POWER GXT145 PCI Express Graphics Accelerator (LP)    #5748           5748
#5270               10 Gb FCoE PCIe Dual Port adapter (LP)                #5708           2BCB
#5271               4-Port 10/100/1000 Base-TX PCI Express adapter        #5717           5717
#5272               10 Gigabit Ethernet-CX4 PCI Express adapter (LP)      #5732           5732
#5273               8 Gigabit PCI Express Dual Port Fibre Channel adapter (LP)  #5735     577D
#5274               2-Port Gigabit Ethernet-SX PCI Express adapter (LP)   #5768           5768
#5275               10 Gb ENet Fibre RNIC PCIe 8x                         #5769           5769
#5276               4 Gigabit PCI Express Dual Port Fibre Channel adapter (LP)  #5774     5774
#5277               4-Port Async EIA-232 PCIe adapter (LP)                #5785
#5278               SAS Controller PCIe 8x                                #5901           57B3
Before adding or rearranging adapters, you can use the System Planning Tool to validate the
new adapter configuration. See the System Planning Tool website:
http://www.ibm.com/systems/support/tools/systemplanningtool/
If you are installing a new feature, ensure that you have the software required to support the
new feature, and determine whether there are any existing update prerequisites to install. To
do this, use the IBM prerequisite website:
https://www-912.ibm.com/e_dir/eServerPreReq.nsf
The following sections discuss the supported adapters and provide tables of orderable
feature numbers. The tables in the following sections indicate operating system support,
AIX (A), IBM i (i), and Linux (L), for each of the adapters.
2.8.3 LAN adapters
To connect the Power 710 and Power 730 to a local area network (LAN), you can use the
LAN adapters supported in the system enclosure PCIe slots. Table 2-10 lists the available
LAN adapters.
Table 2-10 Available LAN adapters
Feature code  CCIN  Adapter description                                  Slot  Size                OS support
#5271               PCIe LP 4-Port 10/100/1000 Base-TX Ethernet adapter  PCIe  Low profile, short  A, L
#5272               PCIe LP 10 GbE CX4 1-port adapter                    PCIe  Low profile, short  A, L
#5274               PCIe LP 2-Port 1 GbE SX adapter                      PCIe  Low profile, short  A, i, L
#5275               PCIe LP 10 GbE SR 1-port adapter                     PCIe  Low profile, short  A, L
#5281               PCIe LP 2-Port 1 GbE TX adapter                      PCIe  Low profile, short  A, i, L
#5284               PCIe2 LP 2-port 10 GbE SR adapter                    PCIe  Low profile         A, L
#5286               PCIe2 LP 2-Port 10 GbE SFP+ Copper adapter           PCIe  Low profile        A, L
#9056 (a)           PCIe LP 2-Port 1 GbE TX adapter                      PCIe  Low profile, short  A, i, L
a. This adapter is required in the Power 710 and Power 730 systems.
2.8.4 Graphics accelerator adapters
Table 2-11 lists the available graphics accelerator adapter. It can be configured to operate in
either 8-bit or 24-bit color modes. The adapter supports both analog and digital monitors.
Table 2-11 Available graphics accelerator adapters
Feature code  CCIN  Adapter description                        Slot  Size                OS support
#5269 (a)     2849  PCIe LP POWER GXT145 Graphics Accelerator  PCIe  Low profile, short  A, L
a. This card is not supported in slot 6, P1-C7.
2.8.5 SAS adapters
Table 2-12 lists the SAS adapters available for the Power 710 and Power 730 systems.
Table 2-12 Available SAS adapters
Feature code  CCIN  Adapter description                                Slot  Size                OS support
#5278 (a)           PCIe LP 2-x4-port SAS adapter 3 Gb                 PCIe  Low profile, short  A, i, L
#5805 (b)     574E  PCIe 380 MB Cache Dual - x4 3 Gb SAS RAID adapter  PCIe  Full height         A, i, L
#5901 (b)     57B3  PCIe Dual-x4 SAS adapter                           PCIe  Full height         A, i, L
#5913 (b)     57B5  PCIe2 1.8 GB Cache RAID SAS adapter Tri-port 6 Gb  PCIe  Full height         A, i, L
a. This card is not supported in slot 6, P1-C7.
b. The full height cards are only supported by the Power 730 model with a #5802 or a #5877 drawer.
2.8.6 PCIe RAID and SSD SAS adapter
A new SSD option for selected POWER7 processor-based servers offers a significant
price/performance improvement for many client SSD configurations. The new SSD option is
packaged differently from those currently available with Power Systems. The new PCIe RAID
and SSD SAS adapter has up to four 177 GB SSD modules plugged directly onto the
adapter, eliminating the need for the SAS bays and cabling associated with the current SSD
offering. Depending on the configuration required, the new PCIe-based SSD offering can
save up to 70% of the list price and reduce the footprint by up to 65%, compared to disk
enclosure based SSD of equivalent capacity.
Figure 2-12 shows the double-wide adapter and SSD modules.
Figure 2-12 The PCIe RAID and SSD SAS adapter and 177 GB SSD modules
Table 2-13 shows the available RAID and SSD SAS adapters for the Power 710 and
Power 730.
Table 2-13 Available PCIe RAID and SSD SAS adapters
Feature code  CCIN  Adapter description                                        Slot  Size                             OS support
#2053         57CD  PCIe LP RAID & SSD SAS adapter 3 Gb                        PCIe  Low profile, double wide, short  A, i, L
#2055 (a)     57CD  PCIe RAID & SSD SAS adapter 3 Gb with Blind Swap Cassette  PCIe  Full height                      A, i, L
a. The full height cards are only supported by the Power 730 model with a #5802 or a #5877 drawer.
The 177 GB SSD Module with Enterprise Multi-level Cell (eMLC) uses a new enterprise-class
MLC flash technology, which provides enhanced durability, capacity, and performance. One,
two, or four modules can be plugged onto a PCIe RAID and SSD SAS adapter, providing up to
708 GB of SSD capacity on one PCIe adapter.
Because the SSD modules are mounted on the adapter, in order to service either the adapter
or one of the modules, the entire adapter must be removed from the system.
Under AIX and Linux, the 177 GB modules can be reformatted as JBOD disks, providing
200 GB of available disk space. This removes the RAID error-correcting information, so it is
best to mirror the data using operating system tools in order to prevent data loss in case
of failure.
2.8.7 Fibre Channel adapters
The systems support direct or SAN connection to devices using Fibre Channel adapters.
Table 2-14 provides a summary of the available Fibre Channel adapters. All of these adapters
have LC connectors. If you are attaching a device or switch with an SC type fiber connector,
then an LC-SC 50 Micron Fiber Converter Cable (#2456) or an LC-SC 62.5 Micron Fiber
Converter Cable (#2459) is required.
Table 2-14 Available Fibre Channel adapters
Feature code  CCIN  Adapter description                        Slot  Size                OS support
#5273 (a)           PCIe LP 8 Gb 2-Port Fibre Channel adapter  PCIe  Low profile, short  A, i, L
#5276 (a)           PCIe LP 4 Gb 2-Port Fibre Channel adapter  PCIe  Low profile, short  A, i, L
a. At the time of writing, the IBM i device driver does not support this card in PCIe slot 6, P1-C7.
Note: The usage of NPIV through the Virtual I/O Server requires an 8 Gb Fibre Channel
adapter (#5273).
2.8.8 Fibre Channel over Ethernet
Fibre Channel over Ethernet (FCoE) allows for the convergence of Fibre Channel and
Ethernet traffic onto a single adapter and converged fabric.
Figure 2-13 shows a comparison between existing FC and network connections and FCoE
connections.
Figure 2-13 Comparison between existing FC and network connection and FCoE connection
Table 2-15 lists the available Fibre Channel over Ethernet adapter. It is a high-performance
Converged Network Adapter (CNA) using SR optics. Each port can provide Network Interface
Card (NIC) traffic and Fibre Channel functions simultaneously.
Table 2-15 Available FCoE adapters
Feature code  CCIN  Adapter description                Slot  Size                OS support
#5270               PCIe LP 10 Gb FCoE 2-port adapter  PCIe  Low profile, short  A, L
For more information about FCoE, read An Introduction to Fibre Channel over Ethernet, and
Fibre Channel over Convergence Enhanced Ethernet, REDP-4493.
2.8.9 InfiniBand Host Channel adapter
The InfiniBand Architecture (IBA) is an industry-standard architecture for server I/O and
inter-server communication. It was developed by the InfiniBand Trade Association (IBTA) to
provide the levels of reliability, availability, performance, and scalability necessary for present
and future server systems, with levels significantly better than can be achieved using
bus-oriented I/O structures.
InfiniBand (IB) is an open set of interconnect standards and specifications. The main IB
specification has been published by the InfiniBand Trade Association and is available at:
http://www.infinibandta.org/
InfiniBand is based on a switched fabric architecture of serial point-to-point links, where these
IB links can be connected to either host channel adapters (HCAs), used primarily in servers,
or target channel adapters (TCAs), used primarily in storage subsystems.
The InfiniBand physical connection consists of multiple byte lanes. Each individual byte lane
is a four wire, 2.5, 5.0, or 10.0 Gbps bi-directional connection. Combinations of link width and
byte lane speed allow for overall link speeds from 2.5 Gbps to 120 Gbps. The architecture
defines a layered hardware protocol as well as a software layer to manage initialization and
the communication between devices. Each link can support multiple transport services for
reliability and multiple prioritized virtual communication channels.
For more information about InfiniBand, see HPC Clusters Using InfiniBand on IBM Power
Systems Servers, SG24-7767.
The GX++ Dual-port 12X Channel Attach adapter (#EJ0G) provides two 12X connections for
12X Channel applications. The adapter is installed in GX++ bus slot 2 and covers the
adjacent PCIe x8 Gen2 slot 5. The 12X Channel is connected in a loop and uses both
connectors on the adapter. Up to four I/O drawers can be attached in a single loop. This
adapter must be used with the 12X cables.
Connection to supported InfiniBand switches is accomplished by using the 12x to 4x Channel
Conversion Cables, feature #1828, #1841 or #1842.
Table 2-16 lists the available InfiniBand adapters.
Table 2-16 Available InfiniBand adapters
Feature code  CCIN  Adapter description                        Slot  Size                OS support
#5283               PCIe2 LP 2-Port 4X IB QDR adapter 40 Gb    PCIe  Low profile, short  A, L
#EJ0G               GX++ Dual-Port 12X Channel Attach adapter  GX++  PCIe                A, L
2.8.10 Asynchronous adapters
Asynchronous PCI adapters provide connection of asynchronous EIA-232 or RS-422
devices. If you have a cluster configuration or high-availability configuration and plan to
connect the IBM Power Systems using a serial connection, you can use the features listed in
Table 2-17.
Table 2-17 Available asynchronous adapters
Feature code  CCIN  Adapter description                   Slot  Size                OS support
#5277 (a)           PCIe LP 4-Port Async EIA-232 adapter  PCIe  Low profile, short  A, L
#5290               PCIe LP 2-Port Async EIA-232 adapter  PCIe  Low profile, short  A, L
a. This card is not supported in slot 6, P1-C7.
2.9 Internal storage
The Power 710 and Power 730 servers use an integrated SAS/SATA controller connected
through a PCIe bus to the P7-IOC chip, supporting RAID 0, 1, and 10 (Figure 2-14 on
page 50). The SAS/SATA controller used in the server’s enclosure has two sets of four
SAS/SATA channels, which give the Power 710 and Power 730 a combined total of eight SAS
buses. Each channel can support either SAS or SATA operation. The SAS controller is
connected to a DASD backplane and supports three or six small form factor (SFF) disk drive
bays, depending on the backplane option.
One of the following options must be selected as the backplane:
Feature #EJ0E supports three SFF disk units, either HDD or SSD, a SATA DVD, and a
tape (#5762 or follow on). Split backplane and RAID 5 and 6 are not supported.
Feature #EJ0D supports six SFF disk units, either HDD or SSD, and a SATA DVD. Split
backplane and RAID 5 and 6 are not supported.
Feature #EJ0F supports six SFF disk units, either HDD or SSD, a SATA DVD, a Dual
Write Cache RAID, and an external SAS port. HDDs/SSDs are hot-swap and front
accessible. Split backplane is not supported. RAID levels 5 and 6 are supported. This
feature is required when IBM i is the primary operating system (#2145).
The supported disk drives in a Power 710 and Power 730 server connect to the DASD
backplane and are hot-swap and front accessible.
Figure 2-14 details the internal topology overview for the #EJ0E backplane:
Figure 2-14 Internal topology overview for #EJ0E DASD backplane
Note: Feature #EJ0E is not supported with IBM i.
Figure 2-15 shows the internal topology overview for the #EJ0D backplane.
Figure 2-15 Internal topology overview for the #EJ0D DASD backplane
Figure 2-16 shows the details of the internal topology overview for the #EJ0F backplane.
Figure 2-16 Internal topology overview for the #EJ0F DASD backplane
2.9.1 RAID support
There are multiple protection options for HDD/SSD drives in the SAS SFF bays in the
Power 710 and 730 system unit or drives in disk drawers or drives in disk-only I/O drawers.
Although protecting drives is always the best idea, AIX/Linux users can choose to leave a few
or all drives unprotected at their own risk, and IBM supports these configurations. IBM i
configuration rules differ in this regard, and IBM supports IBM i partition configurations only
when HDD/SSD drives are protected.
Drive protection
HDD/SSD drive protection can be provided by the AIX, IBM i, and Linux operating systems,
or by the HDD/SSD hardware controllers. Mirroring of drives is provided by the AIX, IBM i,
and Linux operating systems. In addition, AIX and Linux support controllers providing RAID 0,
1, 5, 6, or 10. Because the integrated storage management of IBM i already provides striping,
IBM i also supports controllers providing RAID 5 or 6. To further augment HDD/SSD
protection, hot spare capability can be used for protected drives. Specific hot spare
prerequisites apply.
An integrated SAS HDD/SSD controller is provided in the Power 710 and Power 730 system
unit and provides support for JBOD and RAID 0, 1, and 10 for AIX or Linux.
It is optionally augmented by RAID 5 and RAID 6 capability when storage backplane #EJ0F is
added to the configuration. In addition to these protection options, mirroring of drives by the
operating system is supported. AIX and Linux support all of these options. IBM i does not use
JBOD, and uses embedded functions instead of RAID 10, but does leverage the RAID 5 or 6
function of the integrated controllers. Other disk/SSD controllers are supported as PCIe SAS
adapters. PCI controllers with and without write cache are supported. RAID 5 and RAID 6 on
controllers with write cache are supported.
Table 2-18 lists the RAID support by backplane.
Table 2-18 RAID support configurations
AIX and Linux can use disk drives formatted with 512-byte blocks when being mirrored by the
operating system. These disk drives must be reformatted to 528-byte sectors when used in
RAID arrays. Although a small percentage of the drive's capacity is lost, additional data
protection such as ECC and bad block detection is gained in this reformatting. For example, a
300 GB disk drive, when reformatted, provides around 283 GB. IBM i always uses drives
formatted to 528 bytes. IBM Power SSDs are formatted to 528 bytes.
Power 710 and 730 support a dual write cache RAID feature, which consists of an auxiliary
write cache for the RAID card and the optional RAID enablement.
Storage backplane  JBOD  RAID 0, 1, and 10  RAID 5, 6  Split backplane  External SAS port
#EJ0D              Yes   Yes                No         No               No
#EJ0E              Yes   Yes                No         No               No
#EJ0F              No    Yes                Yes        No               Yes
Supported RAID functions
Base hardware supports RAID 0, 1, and 10. When additional features are configured,
Power 710 and Power 730 support hardware RAID 0, 1, 5, 6, and 10:
RAID-0 provides striping for performance, but does not offer any fault tolerance.
The failure of a single drive results in the loss of all data on the array. Striping increases I/O
bandwidth by simultaneously accessing multiple data paths.
RAID-1 mirrors the contents of the disks, providing a 1:1 real-time backup. The
contents of each disk in the array are identical to those of every other disk in the array.
RAID-5 uses block-level data striping with distributed parity.
RAID-5 stripes both data and parity information across three or more drives. Fault
tolerance is maintained by ensuring that the parity information for any given block of data
is placed on a drive separate from those used to store the data itself.
RAID-6 uses block-level data striping with dual distributed parity.
RAID-6 is the same as RAID-5 except that it uses a second level of independently
calculated and distributed parity information for additional fault tolerance. RAID-6
configuration requires N+2 drives to accommodate the additional parity data, which makes
it less cost effective than RAID-5 for equivalent storage capacity.
RAID-10 is also known as a striped set of mirrored arrays.
It is a combination of RAID-0 and RAID-1. A RAID-0 stripe set of the data is created
across a 2-disk array for performance benefits. A duplicate of the first stripe set is then
mirrored on another 2-disk array for fault tolerance.
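The usable capacity implied by each RAID level described above can be summarized with a short sketch. The function name and drive sizes are illustrative, not part of any IBM tool; 283 GB is the formatted capacity of a reformatted 300 GB drive mentioned earlier.

```python
# Illustrative sketch: usable capacity for the RAID levels described above,
# given n identical drives of a given size.

def usable_capacity_gb(level: str, n_drives: int, drive_gb: float) -> float:
    if level == "RAID-0":                  # striping, no redundancy
        return n_drives * drive_gb
    if level == "RAID-1":                  # mirroring: half the raw capacity
        return n_drives * drive_gb / 2
    if level == "RAID-5":                  # single distributed parity (N+1)
        assert n_drives >= 3
        return (n_drives - 1) * drive_gb
    if level == "RAID-6":                  # dual distributed parity (N+2)
        assert n_drives >= 4
        return (n_drives - 2) * drive_gb
    if level == "RAID-10":                 # striped set of mirrored arrays
        return n_drives * drive_gb / 2
    raise ValueError(f"unknown level: {level}")

print(usable_capacity_gb("RAID-5", 6, 283))   # 1415.0 GB
print(usable_capacity_gb("RAID-6", 6, 283))   # 1132.0 GB
```

The RAID-6 line makes the cost trade-off above concrete: for the same six drives, the second parity set costs one more drive's worth of capacity than RAID-5.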
2.9.2 External SAS port
The Power 710 and Power 730 DASD backplane (#EJ0F) offers the connection to an
external SAS port:
The SAS port connector is located next to the GX++ slot 2 on the rear bulkhead.
The external SAS port is used for expansion to external SAS devices or drawers, such as
the EXP 12S SAS Drawer, the EXP24S SFF Gen2-bay Drawer, and the IBM System
Storage 7214 Tape and DVD Enclosure Express (Model 1U2).
2.9.3 Media bays
The Power 710 and Power 730 each have a slim media bay that contains an optional
DVD-RAM (#5762) and a tape bay (only available with #EJ0E) that can contain a tape drive
or removable disk drive. Direct dock and hot-plug of the DVD media device is supported.
2.10 External I/O subsystems
The Power 710 and Power 730 servers support the attachment of I/O drawers. The
Power 710 supports disk-only I/O drawers (#5886, #5887), providing large storage capacity
and multiple partition support. The Power 730 supports disk-only I/O drawers (#5886, #5887)
and two 12X attached I/O drawers (#5802, #5877), providing extensive capability to expand
the overall server.
Note: Only one SAS drawer is supported from the external SAS port. Additional SAS
drawers can be supported through SAS adapters. SSDs are not supported on the SAS
drawer connected to the external port.
the overall server.
This section describes the external 12X I/O subsystems that can be attached to the
Power 730:
12X I/O Drawer PCIe, SFF disk (#5802)
12X I/O Drawer PCIe, no disk (#5877)
Table 2-19 provides an overview of the capabilities of the supported I/O drawers.
Table 2-19 I/O drawer capabilities
Feature code  DASD                      PCI slots  Requirements for a 720/740
#5802         18 x SFF disk drive bays  10 x PCIe  GX++ Dual-port 12X Channel Attach #EJ0G
#5877         None                      10 x PCIe  GX++ Dual-port 12X Channel Attach #EJ0G
Note: The attachment of external I/O drawers is not supported on the 4-core Power 710.
Each processor card feeds one GX++ adapter slot. On the Power 710, there is one GX++ slot
available, and on the Power 730 there are two GX++ slots available.
2.10.1 12X I/O Drawer PCIe
The 12X I/O Drawer PCIe (#5802) is a 19-inch I/O and storage drawer. It provides a 4U-tall
(EIA units) drawer containing 10 PCIe-based I/O adapter slots and 18 SAS hot-swap Small
Form Factor (SFF) disk bays, which can be used for either disk drives or SSDs. The adapter
slots use blind-swap cassettes and support hot-plugging of adapter cards.
A maximum of two #5802 drawers can be placed on the same 12X loop. Feature #5877 is the
same as #5802 except that it does not support any disk bays. Feature #5877 can be on the
same loop as #5802. Feature #5877 cannot be upgraded to #5802.
The physical dimensions of the drawer measure 444.5 mm (17.5 in.) wide by 177.8 mm
(7.0 in.) high by 711.2 mm (28.0 in.) deep for use in a 19-inch rack.
A minimum configuration of two 12X DDR cables, two ac power cables, and two SPCN
cables is required to ensure proper redundancy. The drawer attaches to the host CEC
enclosure with a 12X adapter in a GX++ slot through 12X DDR cables that are available in
various cable lengths:
0.6 meters (#1861)
1.5 meters (#1862)
3.0 meters (#1865)
8.0 meters (#1864)
The 12X SDR cables are not supported.
Figure 2-17 shows the front view of the 12X I/O Drawer PCIe (#5802).
Figure 2-17 Front view of the 12X I/O Drawer PCIe
Figure 2-18 shows the rear view of the 12X I/O Drawer PCIe (#5802).
Figure 2-18 Rear view of the 12X I/O Drawer PCIe
2.10.2 Dividing SFF drive bays in a 12X I/O drawer PCIe
Disk drive bays in a 12X I/O Drawer PCIe can be configured as one, two, or four sets,
allowing for partitioning of disk bays. The disk bay partitioning configuration can be set with a
physical mode switch on the I/O drawer.
Figure 2-18 on page 55 indicates the Mode Switch in the rear view of the #5802 I/O Drawer.
Each disk bay set can be attached to its own controller or adapter. The #5802 PCIe 12X I/O
Drawer has four SAS connections to drive bays. It connects to PCIe SAS adapters or
controllers on the host system.
Figure 2-19 shows the configuration rule of disk bay partitioning in the #5802 PCIe 12X I/O
Drawer. There is no specific feature code for mode switch setting.
Figure 2-19 Disk Bay Partitioning in #5802 PCIe 12X I/O drawer
The SAS ports associated with the mode selector switch map to the disk bays as shown in
Table 2-20.
Table 2-20 SAS connection mappings
Note: Mode change using the physical mode switch requires power-off/on of the drawer.
Note: The IBM System Planning Tool supports disk bay partitioning. Also, the IBM
configuration tool accepts this configuration from the IBM System Planning Tool and passes it
through IBM manufacturing using the Customer Specified Placement (CSP) option.
Location code  Mappings          Number of bays
P4-T1          P3-D1 to P3-D5    5 bays
P4-T2          P3-D6 to P3-D9    4 bays
P4-T3          P3-D10 to P3-D14  5 bays
P4-T4          P3-D15 to P3-D18  4 bays
Depending on the mode switch setting (1, 2, or 4), the drive bays are grouped as follows: for
AIX/Linux, one set of 18 bays, two sets of 9 + 9 bays, or four sets of 5 + 4 + 5 + 4 bays; for
IBM i, two sets of 9 + 9 bays.
Location codes for #5802 I/O drawer
Figure 2-20 and Figure 2-21 provide the location codes for the front and rear views of the
#5802 I/O drawer.
Figure 2-20 5802 I/O drawer front view location codes
Figure 2-21 5802 I/O drawer rear view location codes
Configuring the #5802 disk drive subsystem
The #5802 SAS disk drive enclosure can hold up to 18 disk drives. The disks in this enclosure
can be organized in various configurations depending on the operating system used, the type
of SAS adapter card, and the position of the mode switch.
Each disk bay set can be attached to its own controller or adapter. Feature #5802 PCIe 12X
I/O Drawer has four SAS connections to drive bays. It connects to PCIe SAS adapters or
controllers on the host systems.
For detailed information about how to configure the #5802 disk drive subsystem, see the IBM
Power Systems Hardware Information Center at:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
2.10.3 12X I/O drawer PCIe and PCI-DDR 12X Expansion Drawer 12X cabling
I/O drawers are connected to the adapters in the CEC enclosure with data transfer cables:
12X DDR cables for the #5802 and #5877 I/O drawers
12X SDR or DDR cables, or both, for the #5796 I/O drawers
The first 12X I/O Drawer that is attached in any I/O drawer loop requires two data transfer
cables. Each additional drawer, up to the maximum allowed in the loop, requires one
additional data transfer cable. Note the following information:
A 12X I/O loop starts at a CEC bus adapter port 0 and attaches to port 0 of an I/O drawer.
The I/O drawer attaches from port 1 of the current unit to port 0 of the next I/O drawer.
Port 1 of the last I/O drawer on the 12X I/O loop connects to port 1 of the same CEC bus
adapter to complete the loop.
Figure 2-22 shows typical 12X I/O loop port connections.
Figure 2-22 Typical 12X I/O loop port connections
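The loop rules above can be sketched as a short helper that enumerates the port-to-port connections for a given number of drawers. The port-naming scheme (`"CEC:0"`, `"drawer1:0"`, and so on) is illustrative only:

```python
# Sketch of the 12X loop cabling rule described above: port 0 of the CEC bus
# adapter feeds port 0 of the first drawer, port 1 of each drawer feeds port 0
# of the next drawer, and port 1 of the last drawer returns to port 1 of the
# same CEC bus adapter to close the loop.

def loop_connections(n_drawers):
    """Return the list of (from_port, to_port) cables for one 12X loop."""
    assert 1 <= n_drawers <= 4            # up to four I/O drawers per loop
    conns = [("CEC:0", "drawer1:0")]
    for i in range(1, n_drawers):
        conns.append((f"drawer{i}:1", f"drawer{i + 1}:0"))
    conns.append((f"drawer{n_drawers}:1", "CEC:1"))
    return conns

for src, dst in loop_connections(2):
    print(src, "->", dst)
```

Note that the helper reproduces the cable counts stated earlier: one drawer needs two cables, and each additional drawer adds exactly one more.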
Table 2-21 lists the various 12X cables that satisfy the various length requirements.
Table 2-21 12X connection cables
Feature code  Description
#1861         0.6 meter 12X DDR cable
#1862         1.5 meter 12X DDR cable
#1865         3.0 meter 12X DDR cable
#1864         8.0 meter 12X DDR cable
General rules for 12X I/O Drawer configuration
If you have two processor cards, spread the I/O drawers across two busses for
better performance.
2.10.4 12X I/O Drawer PCIe and PCI-DDR 12X Expansion Drawer SPCN cabling
The System Power Control Network (SPCN) is used to control and monitor the status of
power and cooling within the I/O drawer.
SPCN cables connect all ac powered expansion units, as shown in the example diagram in
Figure 2-23:
1. Start at SPCN 0 (T1) of the CEC unit and connect to J15 (T1) of the first expansion unit.
2. Cable all units from J16 (T2) of the previous unit to J15 (T1) of the next unit.
3. To complete the cabling loop, connect from J16 (T2) of the final expansion unit to the
CEC, SPCN 1 (T2).
4. Ensure that a complete loop exists from the CEC, through all attached expansions, and
back to the CEC drawer.
Figure 2-23 SPCN cabling examples
Table 2-22 shows SPCN cables to satisfy various length requirements.
Table 2-22 SPCN cables
2.11 External disk subsystems
The Power 710 and Power 730 servers support the attachment of I/O drawers. The
Power 710 supports disk-only I/O drawers (#5886, #5887), providing large storage capacity
and multiple partition support.
The Power 730 supports disk-only I/O drawers (#5886, #5887) and two 12X attached I/O
drawers (#5802, #5877), also providing extensive capability to expand the overall server.
The following external disk subsystems can be attached to the Power 710 and
Power 730 servers:
EXP 12S SAS Expansion Drawer (#5886, supported, but no longer orderable)
EXP24S SFF Gen2-bay Drawer (#5887)
IBM System Storage
The next sections describe the EXP 12S AND EXP24S Expansion Drawers and IBM System
Storage in more detail.
2.11.1 EXP 12S SAS Expansion Drawer
The EXP 12S SAS Expansion Drawer (#5886) is an expansion drawer that supports up to 12
hot-swap SAS Hard Disk Drives (HDD) or up to eight hot-swap Solid State Drives (SSD). The
EXP 12S includes redundant ac power supplies and two power cords. Though the drawer is
one set of 12 drives, which is run by one SAS controller or one pair of SAS controllers, it has
two SAS attachment ports and two Service Managers for redundancy. The EXP 12S takes up
a two-EIA space in a 19-inch rack. The SAS controller can be a SAS PCI-X or PCIe adapter
or pair of adapters.
The drawer can either be attached to the Power 710 and Power 730 using the #EJ0F storage
backplane, providing an external SAS port, or using the PCIe Dual-x4 SAS Adapter 3 Gb
(#5278) adapter.
With proper cabling and configuration, multiple wide ports are used to provide redundant
paths to each dual-port SAS disk. The adapter manages SAS path redundancy and path
switching in case an SAS drive failure occurs. The SAS Y cables attach to an EXP 12S SAS
Expansion Drawer. Use the SAS cable (YI) system to SAS enclosure, single controller/dual
path 1.5 m (#3686, supported but no longer orderable) or a SAS cable (YI) system to SAS
enclosure, single controller/dual path 3 m (#3687) to attach SFF SAS drives in an EXP 12S
SAS Expansion Drawer.
Feature code Description
#6006 SPCN cable drawer-to-drawer, 2 m
#6008 (a) SPCN cable rack-to-rack, 6 m
#6007 SPCN cable rack-to-rack, 15 m
#6029 (a) SPCN cable rack-to-rack, 30 m
a. Supported, but no longer orderable
Figure 2-24 illustrates connecting a system external SAS port to a disk expansion drawer.
Figure 2-24 External SAS cabling
Use the SAS cable (YO) system to SAS enclosure, single controller/dual path 1.5 m (#3450)
or SAS cable (YO) system to SAS enclosure, single controller/dual path 3 m (#3451) to attach
SFF SAS drives in an EXP 12S SAS Expansion Drawer.
In the EXP 12S SAS Expansion Drawer, a high-availability I/O configuration can be
created using a pair of #5278 adapters and SAS X cables to protect against the failure of
an SAS adapter.
A second EXP12S SAS Expansion Drawer can be attached to another drawer using two SAS
EE cables, providing 24 SAS bays instead of 12 bays for the same SAS controller port. This
arrangement is called cascading. In this configuration, all 24 SAS bays are controlled by a
single controller or a single pair of controllers.
The EXP12S SAS Expansion Drawer can also be directly attached to the SAS port on the
rear of the Power 710 or Power 730, providing a low-cost disk storage solution. The rear SAS
port is provided by the #EJ0F storage backplane. A second unit cannot be cascaded to an
EXP12S SAS Expansion Drawer attached in this way.
Note: If the internal disk bay of the Power 710 or Power 730 contains any SSD drives, an
EXP 12S SAS Expansion Drawer cannot be attached to the external SAS port on the
Power 710 or Power 730. This rule applies even if the I/O drawer only contains SAS disk
drives.
The EXP 12S SAS Expansion Drawer (#5886) is not supported on a 4-core Power 710
(#EPC1).
For detailed information about the SAS cabling, see the serial-attached SCSI cable planning
documentation at:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7had/p7
hadsascabling.htm
2.11.2 EXP24S SFF Gen2-bay Drawer
The EXP24S SFF Gen2-bay Drawer (#5887) is an expansion drawer supporting up to 24
hot-swap 2.5-inch SFF SAS Hard Disk drives (HDD) on POWER6 or POWER7 servers in 2U
of 19-inch rack space.
The SFF bays of the EXP24S are different from the SFF bays of the POWER7 system units
or 12X PCIe I/O Drawers (#5802 and #5803). The EXP24S uses Gen2 (SFF-2) SAS drives
that physically do not fit in the Gen1 (SFF-1) bays of the POWER7 system unit or 12X PCIe
I/O Drawers, and vice versa.
The EXP24S SAS ports are attached to SAS controllers, which can be a SAS PCI-X or PCIe
adapter or pair of adapters. The EXP24S can also be attached to an embedded SAS controller
in a server with an embedded SAS port. Attachment between the SAS controller and the
EXP24S SAS ports is through the appropriate SAS Y or X cables.
The drawer can be attached either using the #EJ0F storage backplane, providing an external
SAS port, or using these adapters:
PCIe Dual-x4 SAS Adapter 3 Gb (#5278)
PCIe2 1.8 GB Cache RAID SAS Adapter Tri-port 6 Gb (#5913)
The EXP24S can be ordered in one of three possible manufacturing-configured mode
settings (not customer-configurable): 1, 2, or 4 sets of disk bays.
With IBM AIX, Linux, and Virtual I/O server, the EXP24S can be ordered with four sets of six
bays (mode 4), two sets of 12 bays (mode 2), or one set of 24 bays (mode 1). With IBM i the
EXP24S can be ordered as one set of 24 bays (mode 1).
Note: A single #5887 drawer can be cabled to the CEC external SAS port when a #EJ0F
DASD backplane is part of the system. A 3 Gb/s YI cable (#3686/#3687) is used to
connect a #5887 to the CEC external SAS port.
A single #5887 will not be allowed to attach to the CEC external SAS port when a #EPC1
processor (4-core) is ordered/installed on a single socket Power 710 system.
Note:
The modes for the EXP24S SFF Gen2-bay Drawer are set by IBM Manufacturing.
There is no reset option after the drawer has been shipped.
If you order multiple EXP24S, avoid mixing modes within that order. There is no
externally visible indicator as to the drawer's mode.
Several EXP24S cannot be cascaded on the external SAS connector. Only one #5887
is supported.
The Power 710 and Power 730 support up to four EXP24S SFF Gen2-bay Drawers.
There are six SAS connectors on the rear of the EXP24S to which SAS adapters/controllers
are attached. They are labeled T1, T2, and T3, and there are two T1, two T2, and two T3
(Figure 2-25):
In mode 1, two or four of the six ports are used. Two T2 ports are used for a single SAS
adapter, and two T2 and two T3 ports are used with a paired set of two adapters or a
dual-adapter configuration.
In mode 2 or mode 4, four ports (two T2 and two T3) are used to access all SAS bays.
Figure 2-25 #5887 rear connectors
An EXP24S in mode 4 can be attached to two or four SAS controllers and provide a great
deal of configuration flexibility. An EXP24S in mode 2 has similar flexibility. Up to 24 HDDs
can be supported with any of the supported SAS adapters/controllers.
EXP24S no-charge specify codes should be included with EXP24S orders to indicate to IBM
Manufacturing the mode to which the drawer should be set and the adapter/controller/cable
configuration that will be used. Table 2-23 lists the no-charge specify codes and the
corresponding adapters, controllers, and cables, which have their own chargeable feature
numbers.
Table 2-23 EXP24S Cabling
The following cabling options for the EXP 24S Drawer are available:
X cables for #5278
– 3 m (#3661)
– 6 m (#3662)
– 15 m (#3663)
X cables for #5913 (all 6 Gb except for 15 m cable)
– 3 m (#3454)
– 6 m (#3455)
– 10 m (#3456)
Feature code Mode Adapter/controller Cable to drawer Environment
#9359 1 One #5278 1 YO cable A, L, VIOS
#9360 1 Pair #5278 2 YO cables A, L, VIOS
#9361 2 Two #5278 2 YO cables A, L, VIOS
#9365 4 Four #5278 2 X cables A, L, VIOS
#9366 2 Two pairs of #5278 2 X cables A, L, VIOS
#9367 1 Pair #5805 2 YO cables A, i, L, VIOS
#9368 2 Two #5805 2 X cables A, L, VIOS
#9384 1 CEC SAS port 1 YI cable A, i, L, VIOS
#9385 1 Two #5913 2 YO cables A, i, L, VIOS
#9386 2 Four #5913 4 X cables A, L, VIOS
YO cables for #5278
– 1.5 m (#3691)
– 3 m (#3692)
– 6 m (#3693)
– 15 m (#3694)
YO cables for #5913 (all 6 Gb except for 15 m cable)
– 1.5 m (#3450)
– 3 m (#3451)
– 6 m (#3452)
– 10 m (#3453)
YI cables for system unit SAS port (3 Gb)
– 1.5 m (#3686)
– 3 m (#3687)
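Because the drawer mode is set by IBM Manufacturing and cannot be reset after shipment, an order can usefully be checked against Table 2-23 before it is placed. The following Python sketch is illustrative only (not IBM tooling): the dictionary transcribes the table above, and the helper function and its name are assumptions of this sketch.

```python
# Transcription of Table 2-23: EXP24S no-charge specify codes mapped to
# the drawer mode, adapter/controller configuration, and cabling that IBM
# Manufacturing prepares for that order.
EXP24S_SPECIFY_CODES = {
    "#9359": {"mode": 1, "adapters": "One #5278", "cables": "1 YO cable"},
    "#9360": {"mode": 1, "adapters": "Pair #5278", "cables": "2 YO cables"},
    "#9361": {"mode": 2, "adapters": "Two #5278", "cables": "2 YO cables"},
    "#9365": {"mode": 4, "adapters": "Four #5278", "cables": "2 X cables"},
    "#9366": {"mode": 2, "adapters": "Two pairs of #5278", "cables": "2 X cables"},
    "#9367": {"mode": 1, "adapters": "Pair #5805", "cables": "2 YO cables"},
    "#9368": {"mode": 2, "adapters": "Two #5805", "cables": "2 X cables"},
    "#9384": {"mode": 1, "adapters": "CEC SAS port", "cables": "1 YI cable"},
    "#9385": {"mode": 1, "adapters": "Two #5913", "cables": "2 YO cables"},
    "#9386": {"mode": 2, "adapters": "Four #5913", "cables": "4 X cables"},
}

def codes_for_mode(mode):
    """Return the specify codes that set the drawer to the given mode."""
    return sorted(code for code, info in EXP24S_SPECIFY_CODES.items()
                  if info["mode"] == mode)

# Mode is fixed at manufacturing, so verify the code before ordering:
print(codes_for_mode(1))
```

A mode-4 drawer, for example, can only be selected with specify code #9365 in this table.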
For detailed information about the SAS cabling, see the serial-attached SCSI cable planning
documentation at:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7had/p7
hadsascabling.htm
2.11.3 IBM System Storage
The IBM System Storage Disk Systems products and offerings provide compelling storage
solutions with superior value for all levels of business, from entry-level up to high-end
storage systems.
IBM System Storage N series
The IBM System Storage N series is a Network Attached Storage (NAS) solution and
provides the latest technology to customers to help them improve performance, virtualization
manageability, and system efficiency at a reduced total cost of ownership. For more
information about the IBM System Storage N series hardware and software, see:
http://www.ibm.com/systems/storage/network
IBM System Storage DS3000 family
The IBM System Storage DS3000 is an entry-level storage system designed to meet the
availability and consolidation needs for a wide range of users. New features, including larger
capacity 450 GB SAS drives, increased data protection features such as RAID 6, and more
FlashCopies per volume, provide a reliable virtualization platform. For more information about
the DS3000 family, see:
http://www.ibm.com/systems/storage/disk/ds3000/index.html
Note: IBM plans to offer a 15-meter, 3 Gb bandwidth SAS cable for the #5913 PCIe2
1.8 GB Cache RAID SAS Adapter when attaching the EXP24S Drawer (#5887) for large
configurations where the 10-meter cable is a distance limitation.
The EXP24S Drawer rails are fixed length and designed to fit Power Systems provided
racks of 28 inches (711 mm) deep. The EXP24S uses two EIA units of space in a
19-inch-wide rack. Other racks might have different depths, and these rails do not adjust.
No adjustable-depth rails are orderable at this time.
IBM System Storage DS5000
New DS5000 enhancements help reduce cost by introducing SSD drives. Also, with the new
EXP5060 expansion unit supporting sixty 1 TB SATA drives in a 4U package, customers can
see up to a one-third reduction in floor space over standard enclosures. With the addition of
1 Gbps iSCSI host attach, customers can reduce cost for their less demanding applications
while continuing to provide high performance where necessary, utilizing the 8 Gbps FC host
ports. With the DS5000 family, you get consistent performance from a smarter design that
simplifies your infrastructure, improves your TCO, and reduces your cost. For more
information about the DS5000 family, see:
http://www.ibm.com/systems/storage/disk/ds5000/index.html
IBM Storwize V7000 Midrange Disk System
IBM® Storwize® V7000 is a virtualized storage system to complement virtualized server
environments that provide unmatched performance, availability, advanced functions, and
highly scalable capacity never seen before in midrange disk systems. Storwize V7000 is a
powerful midrange disk system that has been designed to be easy to use and enable rapid
deployment without additional resources. Storwize V7000 is virtual storage that offers greater
efficiency and flexibility through built-in solid state drive (SSD) optimization and thin
provisioning technologies. Storwize V7000 advanced functions also enable non-disruptive
migration of data from existing storage, simplifying implementation and minimizing disruption
to users. Storwize V7000 also enables you to virtualize and reuse existing disk systems,
supporting a greater potential return on investment (ROI). For more information about
Storwize V7000, see:
http://www.ibm.com/systems/storage/disk/storwize_v7000/index.html
IBM XIV Storage System
IBM offers a mid-sized configuration of its self-optimizing, self-healing, resilient disk solution,
the IBM XIV Storage System: storage reinvented for a new era. Now, organizations with
mid-size capacity requirements can take advantage of the latest IBM technology for their
most demanding applications with as little as 27 TB usable capacity and incremental
upgrades. For more information about XIV, see:
http://www.ibm.com/systems/storage/disk/xiv/index.html
IBM System Storage DS8000
The IBM System Storage DS8000 family is designed to offer high availability, multiplatform
support, and simplified management tools. With its high capacity, scalability, broad server
support, and virtualization features, the DS8000 family is well suited for simplifying the
storage environment by consolidating data from multiple storage systems on a single system.
The high-end model DS8800 is the most advanced model in the IBM DS8000 family lineup
and introduces new dual IBM POWER6-based controllers that usher in a new level of
performance for the company’s flagship enterprise disk platform. The DS8800 offers twice the
maximum physical storage capacity than the previous model. For more information about the
DS8000 family, see:
http://www.ibm.com/systems/storage/disk/ds8000/index.html
2.12 Hardware Management Console
The Hardware Management Console (HMC) is a dedicated workstation that provides a
graphical user interface (GUI) for configuring, operating, and performing basic system tasks
for the POWER7 processor-based systems (and the POWER5, POWER5+, POWER6, and
POWER6+ processor-based systems) that function in either non-partitioned or clustered
environments. In addition, the HMC is used to configure and manage partitions. One HMC is
capable of controlling multiple POWER5, POWER5+, POWER6, and POWER6+ and
POWER7 processor-based systems.
Several HMC models are supported to manage POWER7 processor-based systems. Two
models (7042-C08, 7042-CR6) are available for ordering at the time of writing, but you can
also use one of the withdrawn models listed in Table 2-24.
Table 2-24 HMC models supporting POWER7 processor technology based servers
At the time of writing, the HMC must be running V7R7.4.0. It can support up to 48
POWER7 systems. Updates of the machine code, HMC functions, and hardware
prerequisites can be found on Fix Central at this website:
http://www-933.ibm.com/support/fixcentral/
2.12.1 HMC functional overview
The HMC provides three groups of functions:
Server
Virtualization
HMC management
Server management
The first group contains all functions related to the management of the physical servers under
the control of the HMC:
System password
Status Bar
Power On/Off
Capacity on Demand
Error management
– System indicators
– Error and event collection reporting
– Dump collection reporting
– Call Home
– Customer notification
– Hardware replacement (Guided Repair)
– SNMP events
Concurrent Add/Repair/Upgrade
Type-model Availability Description
7310-C05 Withdrawn IBM 7310 Model C05 Desktop Hardware Management Console
7310-C06 Withdrawn IBM 7310 Model C06 Deskside Hardware Management Console
7042-C06 Withdrawn IBM 7042 Model C06 Deskside Hardware Management Console
7042-C07 Withdrawn IBM 7042 Model C07 Deskside Hardware Management Console
7042-C08 Available IBM 7042 Model C08 Deskside Hardware Management Console
7310-CR3 Withdrawn IBM 7310 Model CR3 Rack-mounted Hardware Management Console
7042-CR4 Withdrawn IBM 7042 Model CR4 Rack-mounted Hardware Management Console
7042-CR5 Withdrawn IBM 7042 Model CR5 Rack-mounted Hardware Management Console
7042-CR6 Available IBM 7042 Model CR6 Rack-mounted Hardware Management Console
Redundant Service Processor
Firmware Updates
Virtualization management
The second group contains all of the functions related to virtualization features, such as
partition configuration or the dynamic reconfiguration of resources:
System Plans
System Profiles
Partitions (create, activate, shutdown)
Profiles
Partition Mobility
DLPAR (processors, memory, I/O, and so on)
Custom Groups
HMC Console management
The last group relates to the management of the HMC itself, its maintenance, security, and
configuration, for example:
Guided set-up wizard
Electronic Service Agent set up wizard
User Management
– User IDs
– Authorization levels
– Customizable authorization
Disconnect and reconnect
Network Security
– Remote operation enable and disable
– User definable SSL certificates
Console logging
HMC Redundancy
Scheduled Operations
Back-up and Restore
Updates, Upgrades
Customizable Message of the day
The HMC provides both a graphical interface and a command-line interface (CLI) for all
management tasks. Remote connection to the HMC using a web browser (as of
HMC Version 7; previous versions required a special client program called WebSM) is
possible. The CLI is also available by using the Secure Shell (SSH) connection to the HMC. It
can be used by an external management system or a partition to remotely perform many
HMC operations.
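For example, an external management system can run an HMC CLI command over SSH as described above. The sketch below only composes the command line rather than executing it; lssyscfg is a standard HMC CLI command, while the host name and the hscroot user shown here are placeholder assumptions of this sketch.

```python
import shlex

def remote_hmc_command(user, host, hmc_command):
    """Build the argv for running an HMC CLI command through ssh."""
    # The HMC command is passed as a single remote command string.
    return ["ssh", "{}@{}".format(user, host), hmc_command]

# List managed systems and their states (placeholder host and user):
argv = remote_hmc_command("hscroot", "hmc1.example.com",
                          "lssyscfg -r sys -F name,state")
print(" ".join(shlex.quote(a) for a in argv))
```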
2.12.2 HMC connectivity to the POWER7 processor-based systems
POWER5, POWER5+, POWER6, POWER6+, and POWER7 processor-technology based
servers that are managed by an HMC require Ethernet connectivity between the HMC and
the server’s Service Processor. In addition, if dynamic LPAR, Live Partition Mobility, or
PowerVM Active Memory Sharing operations are required on the managed partitions,
Ethernet connectivity is needed between these partitions and the HMC. A minimum of two
Ethernet ports are needed on the HMC to provide such connectivity. The rack-mounted
7042-CR5 HMC default configuration provides four Ethernet ports. The deskside 7042-C07
HMC standard configuration offers only one Ethernet port. Be sure to order an optional PCI-e
adapter to provide additional Ethernet ports.
For any logical partition in a server it is possible to use a Shared Ethernet Adapter that is
configured via a Virtual I/O Server. Therefore, a partition does not require its own physical
adapter to communicate with an HMC.
For the HMC to communicate properly with the managed server, eth0 of the HMC must be
connected to either the HMC1 or the HMC2 port of the managed server, although other
network configurations are possible. You can attach a second HMC to HMC port 2 of the
server for redundancy (or vice versa). The two ports must be addressed on separate subnets.
Figure 2-26 shows a simple network configuration to enable the connection from HMC to the
server and to enable Dynamic LPAR operations.
Figure 2-26 HMC to service processor and LPAR network connection
For more details about HMC and the possible network connections, see Hardware
Management Console V7 Handbook, SG24-7491.
The default mechanism for allocation of the IP addresses for the service processor HMC
ports is dynamic. The HMC can be configured as a DHCP server, providing the IP address at
the time that the managed server is powered on. In this case, the FSPs are allocated an IP
address from a set of address ranges predefined in the HMC software. These predefined
ranges are identical for Version 710 of the HMC code and for previous versions.
If the service processor of the managed server does not receive a DHCP reply before time
out, predefined IP addresses will be set up on both ports. Static IP address allocation is also
an option. You can configure the IP address of the service processor ports with a static IP
address by using the Advanced System Management Interface (ASMI) menus.
2.12.3 High availability using the HMC
The HMC is an important hardware component. POWER7 processor-based servers and their
hosted partitions can continue to operate when no HMC is available. However, in such
conditions, certain operations cannot be performed, such as a
DLPAR reconfiguration, a partition migration using PowerVM Live Partition Mobility, or the
creation of a new partition. You might therefore decide to install two HMCs in a redundant
configuration so that one HMC is always operational, even when performing maintenance of
the other one, for example.
If redundant HMC function is desired, a server can be attached to two independent HMCs to
address availability requirements. Both HMCs must have the same level of Hardware
Management Console Licensed Machine Code Version 7 and installed fixes to manage
POWER7 processor-based servers or an environment with a mixture of POWER5,
POWER5+, POWER6, POWER6+, and POWER7 processor-based servers. The HMCs
provide a locking mechanism so that only one HMC at a time has write access to the service
processor. It is recommended that both HMCs be available on a public subnet to allow full
synchronization of functionality. Depending on your environment, you have multiple options to
configure the network.
Note: The service processor is used to monitor and manage the system hardware
resources and devices. The service processor offers two Ethernet 10/100 Mbps ports as
connections. Note the following information:
Both Ethernet ports are visible only to the service processor and can be used to attach
the server to an HMC or to access the ASMI options from a client web browser using
the HTTP server integrated into the service processor internal operating system.
When not configured otherwise (DHCP or from a previous ASMI setting), both Ethernet
ports of the first FSP have predefined IP addresses:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.147 with netmask
255.255.255.0.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.147 with netmask
255.255.255.0.
For the second FSP of IBM Power 770 and 780, these default addresses are:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.146 with netmask
255.255.255.0.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.146 with netmask
255.255.255.0.
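The default addresses listed in the note above can be collected into a small lookup. The following Python sketch is a transcription of those defaults for illustration only, not an IBM-provided API; the helper name is an assumption of this sketch.

```python
# Predefined service processor addresses, keyed by FSP number and port.
DEFAULT_FSP_ADDRESSES = {
    # (FSP number, port): (IP address, netmask)
    (1, "HMC1"): ("169.254.2.147", "255.255.255.0"),
    (1, "HMC2"): ("169.254.3.147", "255.255.255.0"),
    # Second-FSP defaults apply to the IBM Power 770 and 780:
    (2, "HMC1"): ("169.254.2.146", "255.255.255.0"),
    (2, "HMC2"): ("169.254.3.146", "255.255.255.0"),
}

def fallback_address(fsp, port):
    """Address a service processor port assumes when no DHCP reply arrives
    before timeout and no static address was set through the ASMI menus."""
    return DEFAULT_FSP_ADDRESSES[(fsp, port)]

ip, netmask = fallback_address(1, "HMC2")
print(ip, netmask)  # 169.254.3.147 255.255.255.0
```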
For more information about the service processor, see “Service processor” on page 128.
Figure 2-27 shows one possible highly available HMC configuration managing two servers.
These servers have only one CEC and therefore only one FSP. Each HMC is connected to
one FSP port of all managed servers.
Figure 2-27 Highly available HMC and network architecture
Note that, for simplicity, only the hardware management networks (LAN1 and LAN2) are
made highly available in Figure 2-27. However, the management network (LAN3) can be made
highly available by using a similar concept and adding more Ethernet adapters to LPARs
and HMCs.
Each HMC must be on a separate VLAN to protect against network contention. Each HMC
can be a DHCP server for its VLAN.
For more details about redundant HMCs, see the Hardware Management Console V7
Handbook, SG24-7491.
2.12.4 HMC code level
The HMC code must be at V7R7.4.0 to support the Power 710 and Power 730 systems.
In a dual HMC configuration, both must be at the same version and release of the HMC.
(Legend for Figure 2-27: LAN1 is the hardware management network for the first FSP ports
(private); LAN2 is the hardware management network for the second FSP ports (private), on
separate network hardware from LAN1; LAN3 is the open network for HMC access and
DLPAR operations.)
If you want to migrate an LPAR from a POWER6 processor-based server to a POWER7
processor-based server using PowerVM Live Partition Mobility, and the source server is
managed by one HMC while the destination server is managed by a different HMC, ensure
that the HMC managing the POWER6 processor-based server is at V7R7.3.5 or later and
that the HMC managing the POWER7 processor-based server is at V7R7.4.0 or later.
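The Live Partition Mobility version rule above can be checked mechanically. The sketch below is illustrative, not an HMC utility; it assumes the "VxRy.z.w" level format used by the levels quoted in this section, and the function names are inventions of this sketch.

```python
import re

def parse_hmc_level(level):
    """Turn a level string such as 'V7R7.4.0' into a comparable tuple."""
    m = re.fullmatch(r"V(\d+)R(\d+)\.(\d+)\.(\d+)", level)
    if not m:
        raise ValueError("unrecognized HMC level: %s" % level)
    return tuple(int(g) for g in m.groups())

def lpm_hmc_levels_ok(source_hmc, target_hmc):
    """Check the migration rule above: POWER6 source HMC at V7R7.3.5 or
    later, POWER7 destination HMC at V7R7.4.0 or later."""
    return (parse_hmc_level(source_hmc) >= (7, 7, 3, 5)
            and parse_hmc_level(target_hmc) >= (7, 7, 4, 0))

print(lpm_hmc_levels_ok("V7R7.3.5", "V7R7.4.0"))  # True
print(lpm_hmc_levels_ok("V7R7.3.5", "V7R7.3.5"))  # False
```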
2.13 IBM Systems Director Management Console
The newly released IBM Systems Director Management Console (SDMC) is intended to be
used in the same manner as the HMC. It provides the same functionality, including hardware,
service, and virtualization management, for Power Systems server and Power Systems
blades. Because SDMC uses IBM Systems Director Express® Edition, it also provides all
Systems Director Express capabilities, such as monitoring of operating systems and creating
event action plans.
No configuration changes are required when a client moves from HMC management to
SDMC management.
Much of the SDMC function is equivalent to that of the HMC, including:
Server (host) management
Virtualization management
Redundancy and high availability
The SDMC offers console redundancy similar to the HMC.
The scalability and performance of the SDMC matches that of a current HMC. This includes
both the number of systems (hosts) and the number of partitions (virtual servers) that can be
managed. Currently, 48 small-tier entry servers or 32 large-tier servers can be managed by
the SDMC with up to 1,024 partitions (virtual servers) configured across those managed
systems (hosts).
Tips:
When upgrading the code of a dual HMC configuration, a good practice is to disconnect
one HMC to avoid having both HMCs connected to the same server but running
different levels of code. If no profiles or partition changes take place during the
upgrade, both HMCs can stay connected. If the HMCs are at different levels and a
profile change is made from the HMC at level V7R7.4.0, for example, the format of the
data stored in the server could be changed, causing the HMC at a previous level (for
example, 3.50) to possibly go into a recovery state, because it does not understand the
new data format.
Compatibility rules exist between the various software that is executing within a
POWER7 processor-based server environment:
– HMC
– VIO
– System firmware
– Partition operating systems
To check which combinations are supported, and to identify required upgrades, you can
use the Fix Level Recommendation Tool web page:
http://www14.software.ibm.com/webapp/set2/flrt/home
The SDMC can be obtained as a hardware appliance in the same manner as an HMC.
Hardware appliances support managing all Power Systems servers. The SDMC can
optionally be obtained in a virtual appliance format capable of running on VMware (ESX/i 4 or
later) and KVM (Red Hat Enterprise Linux (RHEL) 5.5). The virtual appliance is only
supported for managing small-tier Power servers and Power Systems blades.
Table 2-25 details whether the SDMC software appliance, hardware appliance, or both are
supported for each model.
Table 2-25 Type of SDMC appliance support for POWER7-based server
The IBM SDMC Hardware Appliance requires an IBM 7042-CR6 rack-mounted Hardware
Management Console with the IBM SDMC indicator (#0963).
Remember: At the time of writing, the SDMC is not supported for the Power 710
(8231-E1C) and Power 730 (8231-E2C) models.
IBM intends to enhance the IBM Systems Director Management Console (SDMC) to
support the Power 710 (8231-E1C) and Power 730 (8231-E2C). IBM also intends for the
current Hardware Management Console (HMC) 7042-CR6 to be upgradable to an IBM
SDMC that supports the Power 710 (8231-E1C) and Power 730 (8231-E2C).
POWER7 models Type of SDMC appliance supported
7891-73X (IBM BladeCenter PS703) Hardware or software appliance
7891-74X (IBM BladeCenter PS704) Hardware or software appliance
8202-E4B (IBM Power 720 Express) Hardware or software appliance
8205-E6B (IBM Power 740 Express) Hardware or software appliance
8406-70Y (IBM BladeCenter PS700) Hardware or software appliance
8406-71Y (IBM BladeCenter PS701 and PS702) Hardware or software appliance
8231-E2B (IBM Power 710 and IBM Power 730 Express) Hardware or software appliance
8233-E8B (IBM Power 750 Express) Hardware or software appliance
8236-E8C (IBM Power 755) Hardware or software appliance
9117-MMB (IBM Power 770) Hardware appliance only
9179-MHB (IBM Power 780) Hardware appliance only
9119-FHB (IBM Power 795) Hardware appliance only
Remember: When ordering #0963, features #0031 (No Modem), #1946 (additional 4 GB
memory), and #1998 (additional 500 GB SATA HDD) are configured automatically.
Feature #0963 replaces the HMC software with IBM Systems Director Management
Console Hardware Appliance V6.7.3 (5765-MCH).
Neither an external modem (#0032) nor an internal modem (#0033) can be selected with
IBM SDMC indicator (#0963).
To run the HMC LMC (#0962), you cannot order the additional storage (#1998). However,
you can order the additional memory (#1946) if desired.
The IBM SDMC Virtual Appliance requires IBM Systems Director Management Console
V6.7.3 (5765-MCV).
The SDMC on POWER6 processor-based servers and blades requires eFirmware
level 3.5.7. An SDMC on POWER7 processor-based servers and blades requires
eFirmware level 7.3.0.
For more detailed information about the SDMC, see IBM Systems Director Management
Console: Introduction and Overview, SG24-7860.
2.14 Operating system support
The IBM POWER7 processor-based systems support three families of operating systems:
AIX
IBM i
Linux
In addition, the Virtual I/O Server can be installed in special partitions that provide support to
the other operating systems for using features such as virtualized I/O devices, PowerVM Live
Partition Mobility, or PowerVM Active Memory Sharing.
2.14.1 Virtual I/O Server
The minimum required level of Virtual I/O Server for both the Power 710 and Power 730 is
VIOS 2.2.1.0.
IBM regularly updates the Virtual I/O Server code. For information about the latest updates,
visit the Fix Central website:
http://www-933.ibm.com/support/fixcentral/
2.14.2 IBM AIX operating system
The following sections discuss support for the various levels of the AIX operating system.
Remember: If you want to use the software appliance, you have to provide the hardware
and virtualization environment.
At a minimum, the following resources must be available to the virtual machine:
2.53 GHz Intel Xeon E5630, Quad Core processor
500 GB storage
8 GB memory
The following hypervisors are supported:
VMware (ESXi 4.0 or later)
KVM (RHEL 5.5)
Tip: For details about the software available on IBM Power Systems, visit the Power
Systems Software site:
http://www.ibm.com/systems/power/software/index.html
IBM periodically releases maintenance packages (service packs or technology levels) for the
AIX operating system. Information about these packages, downloading, and obtaining the
CD-ROM is on the Fix Central website:
http://www-933.ibm.com/support/fixcentral/
The Fix Central website also provides information about how to obtain the fixes shipping
on CD-ROM.
The Service Update Management Assistant, which can help you to automate the task of
checking for and downloading operating system updates, is part of the base operating
system. For more information about the suma command, go to following website:
http://www14.software.ibm.com/webapp/set2/sas/f/genunix/suma.html
IBM AIX Version 5.3
The minimum level of AIX Version 5.3 to support the Power 710 and Power 730 is AIX 5.3
with the 5300-12 Technology Level and Service Pack 5 or later.
A partition using AIX Version 5.3 will be executing in POWER6 or POWER6+ compatibility
mode. This means that although the POWER7 processor has the ability to run four hardware
threads per core simultaneously, using AIX 5.3 limits the number of hardware threads per
core to two.
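The effect of this limitation on the number of logical processors a partition sees is simple arithmetic; the following sketch is illustrative only, and the example core count is an assumption.

```python
def logical_processors(cores, threads_per_core):
    """Logical processors visible to a partition: cores times the number
    of hardware threads each core presents."""
    return cores * threads_per_core

# An 8-core POWER7 partition:
print(logical_processors(8, 2))  # AIX 5.3 in POWER6/POWER6+ mode (SMT2): 16
print(logical_processors(8, 4))  # POWER7 mode with SMT4: 32
```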
IBM AIX Version 6.1
The minimum level of AIX Version 6.1 to support the Power 710 and Power 730 is:
AIX 6.1 with the 6100-07 Technology Level or later
AIX 6.1 with the 6100-06 Technology Level and Service Pack 6 or later
AIX 6.1 with the 6100-05 Technology Level and Service Pack 7 or later
A partition using AIX 6.1 with TL6 can run in POWER6, POWER6+, or POWER7 mode. It is
best to run the partition in POWER7 mode to allow exploitation of new hardware capabilities
such as SMT4 and Active Memory Expansion (AME).
IBM AIX Version 7.1
These are the minimum levels of AIX Version 7.1 to support the Power 710 and Power 730:
AIX 7.1 with the 7100-01 Technology Level or later
AIX 7.1 with the 7100-00 Technology Level and Service Pack 4, or later
A partition using AIX 7.1 can run in POWER6, POWER6+, or POWER7 mode. It is best to run
the partition in POWER7 mode to allow exploitation of new hardware capabilities such as
SMT4 and Active Memory Expansion (AME).
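The minimum-level rules for AIX 5.3, 6.1, and 7.1 above can be expressed as a small check against a level string in the oslevel -s style ("VVRR-TT-SS-..."). This is an illustrative sketch: the rule tables transcribe this section, and the helper itself is not IBM-supplied.

```python
# Releases where reaching a technology level (or any later TL) is enough:
OPEN_TL = {"6100": 7, "7100": 1}
# (release, technology level) -> minimum service pack within that TL:
TL_SP_MIN = {("5300", 12): 5, ("6100", 6): 6, ("6100", 5): 7, ("7100", 0): 4}

def aix_level_supported(level):
    """True if an 'VVRR-TT-SS' level meets one of the documented minimums
    for the Power 710 and Power 730."""
    release, tl, sp = level.split("-")[:3]
    tl, sp = int(tl), int(sp)
    if release in OPEN_TL and tl >= OPEN_TL[release]:
        return True  # for example, 6100-07 or any later TL
    return sp >= TL_SP_MIN.get((release, tl), float("inf"))

print(aix_level_supported("6100-06-06"))  # True: 6100-06 TL with SP6
print(aix_level_supported("7100-00-03"))  # False: 7100-00 needs SP4
```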
2.14.3 IBM i operating system
The IBM i operating system is supported on the Power 710 and Power 730 with these
minimum required levels:
IBM i Version 6.1 with i 6.1.1 machine code, or later
IBM i Version 7.1 or later
IBM periodically releases maintenance packages (service packs or technology levels) for the
IBM i operating system. Information about these packages, downloading, and obtaining the
CD-ROM is available on the Fix Central website:
http://www-933.ibm.com/support/fixcentral/
2.14.4 Linux operating system
Linux is an open source operating system that runs on numerous platforms from embedded
systems to mainframe computers. It provides a UNIX-like implementation across many
computer architectures.
The supported versions of Linux on POWER7 processor-based servers are:
SUSE Linux Enterprise Server 11 Service Pack 1, or later, with one current maintenance
update available from SUSE to enable all planned functionality
Red Hat Enterprise Linux AP 5 Update 7 for POWER, or later
Red Hat Enterprise Linux 6.1 for POWER, or later
If you want to configure Linux partitions in virtualized Power Systems, be aware of
these conditions:
Not all devices and features that are supported by the AIX operating system are supported
in logical partitions running the Linux operating system.
Linux operating system licenses are ordered separately from the hardware. You can
acquire Linux operating system licenses from IBM, to be included with the POWER7
processor-based servers, or from other Linux distributors.
For information about the features and external devices supported by Linux, go to:
http://www.ibm.com/systems/p/os/linux/index.html
For information about SUSE Linux Enterprise Server 10, refer to:
http://www.novell.com/products/server
For information about Red Hat Enterprise Linux Advanced Server, see:
http://www.redhat.com/rhel/features
2.14.5 Supported Java versions
There are unique considerations when running Java 1.4.2 on POWER7 servers. For best
exploitation of the outstanding performance capabilities and most recent improvements of
POWER7 technology, upgrade Java-based applications to Java 7, Java 6, or Java 5
whenever possible. For more information, see:
http://www.ibm.com/developerworks/java/jdk/aix/service.html
2.14.6 Boost performance and productivity with IBM compilers
IBM® XL C, XL C/C++, and XL Fortran compilers for AIX and for Linux exploit the latest
POWER7™ processor architecture. Release after release, these compilers continue to help
improve application performance and capability, exploiting architectural enhancements made
available through the advancement of the POWER® technology.
IBM compilers are designed to optimize and tune your applications for execution on IBM
POWER platforms, to help you unleash the full power of your IT investment, to create and
maintain critical business and scientific applications, to maximize application performance,
and to improve developer productivity.
The performance gain from years of compiler optimization experience is seen in the
continuous release-to-release compiler improvements that support the POWER4™
processors, through to the POWER4+™, POWER5™, POWER5+™ and POWER6®
processors, and now including the new POWER7 processors. With the support of the latest
POWER7 processor chip, IBM advances a more than 20-year investment in the XL compilers
for POWER series and PowerPC® series architectures.
XL C, XL C/C++, and XL Fortran features introduced to exploit the latest POWER7 processor
include vector unit and vector scalar extension (VSX) instruction set to efficiently manipulate
vector operations in your application, vector functions within the Mathematical Acceleration
Subsystem (MASS) libraries for improved application performance, built-in functions or
intrinsics and directives for direct control of POWER instructions at the application level, and
architecture and tune compiler options to optimize and tune your applications.
COBOL for AIX enables you to selectively target code generation of your programs to
either exploit POWER7® systems architecture or to be balanced among all supported
POWER® systems. The performance of COBOL for AIX applications is improved by
means of an enhanced back-end optimizer. The back-end optimizer, a component common
also to the IBM XL compilers, lets your applications leverage the latest industry-leading
optimization technology.
The performance of PL/I for AIX applications has been improved through both front-end
changes and back-end optimizer enhancements. The back-end optimizer, a component
common also to the IBM XL compilers, lets your applications leverage the latest
industry-leading optimization technology. For PL/I, it produces code that is intended to
perform well across all AIX hardware levels, including POWER7.
IBM Rational® Development Studio for IBM i 7.1 provides programming languages for
creating modern business applications. This includes the ILE RPG, ILE COBOL, C, and C++
compilers, as well as the heritage RPG and COBOL compilers. The latest release includes
performance improvements and XML processing enhancements for ILE RPG and ILE
COBOL, improved COBOL portability with a new COMP-5 data type, and easier Unicode
migration with relaxed UCS-2 rules in ILE RPG. Rational has also released a product called
Rational Open Access: RPG Edition. This product opens up the ILE RPG file I/O processing,
enabling partners, tool providers, and users to write custom I/O handlers that can access
other devices like databases, services, and web user interfaces.
IBM Rational Developer for Power Systems Software™ provides a rich set of integrated
development tools that support the XL C/C++ for AIX compiler, the XL C for AIX compiler, and
the COBOL for AIX compiler. Rational Developer for Power Systems Software offers the
capabilities of file management, searching, editing, analysis, build, and debug, all integrated
into an Eclipse workbench. XL C/C++, XL C, and COBOL for AIX developers can boost
productivity by moving from older, text-based, command-line development tools to a rich set
of integrated development tools.
The IBM Rational Power Appliance solution provides a workload optimized system and
integrated development environment for AIX development on IBM Power Systems. IBM
Rational Power Appliance includes a Power Express server preinstalled with a
comprehensive set of Rational development software along with the AIX operating system.
The Rational development software includes support for Collaborative Application Lifecycle
Management (C/ALM) through Rational Team Concert, a set of software development tools
from Rational Developer for Power Systems Software, and a choice between the XL C/C++
for AIX or COBOL for AIX compilers.
2.15 Energy management
The Power 710 and 730 servers are designed with features to help clients become more
energy efficient. The IBM Systems Director Active Energy Manager exploits EnergyScale
technology, enabling advanced energy management features to dramatically and dynamically
conserve power and further improve energy efficiency. Intelligent Energy optimization
capabilities enable the POWER7 processor to operate at a higher frequency for increased
performance and performance per watt or dramatically reduce frequency to save energy.
2.15.1 IBM EnergyScale technology
IBM EnergyScale technology provides functions to help the user understand and dynamically
optimize the processor performance versus processor energy consumption, and system
workload, to control IBM Power Systems power and cooling usage.
On POWER7 processor-based systems, the thermal power management device (TPMD)
card is responsible for collecting the data from all system components, changing operational
parameters in components, and interacting with the IBM Systems Director Active Energy
Manager (an IBM Systems Director plug-in) for energy management and control.
IBM EnergyScale makes use of power and thermal information collected from the system in
order to implement policies that can lead to better performance or better energy utilization.
IBM EnergyScale features include:
Power trending
EnergyScale provides continuous collection of real-time server energy consumption. This
enables administrators to predict power consumption across their infrastructure and to
react to business and processing needs. For example, administrators can use such
information to predict datacenter energy consumption at various times of the day, week, or
month.
Thermal reporting
IBM Director Active Energy Manager can display measured ambient temperature and
calculated exhaust heat index temperature. This information can help identify data center
hot spots that need attention.
Power Saver Mode
Power Saver Mode lowers the processor frequency and voltage by a fixed amount,
reducing the energy consumption of the system while still delivering predictable
performance. This percentage is predetermined to be within a safe operating limit and is
not user configurable. The server is designed for a fixed frequency drop of up to 30%
down from nominal frequency. (The actual value depends on the server type and
configuration.) Power Saver Mode is not supported during boot or reboot; however, it is a
persistent condition that is sustained after the boot, when the system starts executing
instructions.
Dynamic Power Saver Mode
Dynamic Power Saver Mode varies processor frequency and voltage based on the
utilization of the POWER7 processors. Processor frequency and utilization are inversely
proportional for most workloads, implying that as the frequency of a processor increases,
its utilization decreases, given a constant workload. Dynamic Power Saver Mode takes
advantage of this relationship to detect opportunities to save power, based on measured
real-time system utilization.
When a system is idle, the system firmware will lower the frequency and voltage to Power
Saver Mode values. When fully utilized, the maximum frequency will vary,
depending on whether the user favors power savings or system performance. If an
administrator prefers energy savings and a system is fully utilized, the system is designed
to reduce the maximum frequency to 95% of nominal values. If performance is favored
over energy consumption, the maximum frequency can be increased to up to 109% of
nominal frequency for extra performance.
Dynamic Power Saver Mode is mutually exclusive with Power Saver mode. Only one of
these modes can be enabled at a given time.
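For a sense of scale, applying the percentages above to a hypothetical 3.0 GHz nominal frequency (an assumed figure for illustration, not a Power 710 or 730 rating) gives:

```python
nominal_ghz = 3.0  # assumed example frequency, not a specific server rating

# Static Power Saver Mode: a fixed drop of up to 30% below nominal
static_saver_floor = round(nominal_ghz * (1 - 0.30), 2)

# Dynamic Power Saver Mode at full utilization:
favor_savings_cap = round(nominal_ghz * 0.95, 2)      # energy savings favored
favor_performance_cap = round(nominal_ghz * 1.09, 2)  # performance favored

print(static_saver_floor)      # 2.1  GHz
print(favor_savings_cap)       # 2.85 GHz
print(favor_performance_cap)   # 3.27 GHz
```

The actual frequency bounds depend on the server type and configuration, as noted above.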
Power Capping
Power Capping enforces a user-specified limit on power usage. Power Capping is not a
power-saving mechanism. It enforces power caps by throttling the processors in the
system, degrading performance significantly. The idea of a power cap is to set a limit that
will never be reached, freeing up the margined power that otherwise goes unused in the
data center. Margined power is the extra power allocated to a server during its installation
in a data center; it is based on the server's environmental specifications, which are usually
never approached because those specifications assume maximum configurations and
worst-case scenarios. The user must set and enable an energy cap
from the IBM Director Active Energy Manager user interface.
Soft Power Capping
There are two power ranges into which the power cap can be set:
– Power Capping (as described previously)
– Soft Power Capping
Soft power capping extends the allowed energy capping range further, beyond a region
that can be guaranteed in all configurations and conditions. If the energy management
goal is to meet a particular consumption limit, then Soft Power Capping is the mechanism
to use.
Processor Core Nap mode
The IBM POWER7 processor uses a low-power mode called Nap that stops processor
execution when there is no work to do on that processor core. The latency of exiting Nap
is very small, typically with no impact on running applications. Because of this,
the POWER Hypervisor can use Nap mode as a general-purpose idle state. When the
operating system detects that a processor thread is idle, it yields control of a hardware
thread to the POWER Hypervisor. The POWER Hypervisor immediately puts the thread
into Nap mode. Nap mode allows the hardware to turn the clock off on most of the circuits
inside the processor core. Reducing active energy consumption by turning off the clocks
allows the temperature to fall, which further reduces leakage (static) power of the circuits
causing a cumulative effect. Nap mode saves 10 - 15% of power consumption in the
processor core.
Processor core Sleep mode
To be able to save even more energy, the POWER7 processor has an even lower power
mode called Sleep. Before a core and its associated L2 and L3 caches enter Sleep mode,
caches are flushed, translation lookaside buffers (TLBs) are invalidated, and the hardware
clock is turned off in the core and in the caches. Voltage is reduced to minimize leakage
current. Processor cores inactive in the system (such as CoD processor cores) are kept in
Sleep mode. Sleep mode saves about 35% power consumption in the processor core and
associated L2 and L3 caches.
Fan control and altitude input
System firmware will dynamically adjust fan speed based on energy consumption,
altitude, ambient temperature, and energy savings modes. Power Systems are designed
to operate in worst-case environments, in hot ambient temperatures, at high altitudes, and
with high power components. In a typical case, one or more of these constraints are not
valid. When no power savings setting is enabled, fan speed is based on ambient
temperature and assumes a high-altitude environment. When a power savings setting is
enforced (either Power Saver Mode or Dynamic Power Saver Mode), fan speed
will vary based on power consumption, ambient temperature, and altitude.
System altitude can be set in IBM Director Active Energy Manager. If no altitude is set, the
system assumes a default value of 350 meters above sea level.
Processor Folding
Processor Folding is a consolidation technique that dynamically adjusts, over the short
term, the number of processors available for dispatch to match the number of processors
demanded by the workload. As the workload increases, the number of processors made
available increases. As the workload decreases, the number of processors made
available decreases. Processor Folding increases energy savings during periods of low to
moderate workload because unavailable processors remain in low-power idle states (Nap
or Sleep) longer.
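The folding behavior described above amounts to "keep just enough processors dispatchable for current demand." A deliberately simplified sketch follows (the function, its headroom parameter, and the thresholds are hypothetical; the real implementation uses far richer heuristics):

```python
import math

def unfolded_cores(demand_cores: float, total_cores: int,
                   headroom: float = 0.2) -> int:
    """Illustrative folding decision: unfold enough cores to cover current
    demand plus some headroom; the remaining cores stay in Nap or Sleep."""
    needed = math.ceil(demand_cores * (1 + headroom))
    return max(1, min(total_cores, needed))

print(unfolded_cores(2.5, 8))  # 3 cores dispatchable, 5 left in low-power idle
print(unfolded_cores(7.5, 8))  # 8 (demand plus headroom exceeds the system)
```

As the sketch suggests, low or moderate demand leaves most cores folded, which is exactly where the energy savings accumulate.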
EnergyScale for I/O
IBM POWER7 processor-based systems automatically power off hot pluggable, PCI
adapter slots that are empty or not being used. System firmware automatically scans all
pluggable PCI slots at regular intervals, looking for those that meet the criteria for being
not in use and powering them off. This support is available for all POWER7
processor-based servers and the expansion units that they support.
Server Power Down
If overall data center processor utilization is low, workloads can be consolidated onto
fewer servers so that some servers can be turned off completely. It makes sense to
do this when there will be long periods of low utilization, such as weekends. AEM provides
information, such as the power that will be saved and the time that it will take to bring a
server back online, that can be used to help make the decision to consolidate and power
off. As with many of the features available in IBM Systems Director and Active Energy
Manager, this function is scriptable and can be automated.
Partition Power Management
Available with Active Energy Manager 4.3.1 or later and POWER7 systems with 730
firmware release or later is the capability to set a power savings mode for partitions or the
system processor pool. As in the system-level power savings modes, the per-partition
power savings modes can be used to achieve a balance between the power consumption
and the performance of a partition. Only partitions that have dedicated processing units
can have a unique power savings setting. Partitions that run in shared processing mode
will have a common power savings setting, which is that of the system processor pool.
This is because processing unit fractions cannot be power managed.
As in the case of system-level power savings, two Dynamic Power Saver options
are offered:
– Favor partition performance
– Favor partition power savings
The user must configure this setting from Active Energy Manager. When Dynamic Power
Saver is enabled in either mode, system firmware continuously monitors the performance
and utilization of each of the computer's POWER7 processor cores that belong to the
partition. Based on this utilization and performance data, the firmware will dynamically
adjust the processor frequency and voltage, reacting within milliseconds to adjust
workload performance and also deliver power savings when the partition is underutilized.
In addition to the two Dynamic Power Saver options, the customer can select to have no
power savings on a given partition. This option will leave the processor cores assigned to
the partition running at their nominal frequencies and voltages.
A new power savings mode, called Inherit Host Setting, is available and is only applicable
to partitions. When configured to use this setting, a partition will adopt the power savings
mode of its hosting server. By default, all partitions with dedicated processing units, and
the system processor pool, are set to Inherit Host Setting.
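The precedence just described reduces to a one-line rule: an explicit partition setting wins, and Inherit Host Setting defers to the server. The mode names below are illustrative placeholders, not Active Energy Manager identifiers:

```python
def effective_power_mode(partition_mode: str, host_mode: str) -> str:
    """Resolve a partition's effective power savings mode. A partition set to
    'inherit_host' (the default for dedicated-processor partitions and the
    system processor pool) adopts its hosting server's mode."""
    return host_mode if partition_mode == "inherit_host" else partition_mode

print(effective_power_mode("inherit_host", "favor_power_savings"))
# favor_power_savings
print(effective_power_mode("favor_performance", "favor_power_savings"))
# favor_performance
```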
On POWER7 processor-based systems, several EnergyScale features are embedded in the
hardware and do not require an operating system or external management component.
More advanced functionality requires Active Energy Manager (AEM) and IBM Systems
Director.
Table 2-26 lists all supported features, indicating the cases where AEM is not required.
The table also details the features that can be activated through traditional
user interfaces (for example, ASMI and HMC).
Table 2-26 AEM support
The Power 710 and Power 730 implement all the EnergyScale capabilities listed in 2.15.1,
“IBM EnergyScale technology” on page 77.
2.15.2 Thermal power management device card
The thermal power management device (TPMD) card is a separate micro controller installed
on some POWER6 processor-based systems, and on all POWER7 processor-based
systems. It runs real-time firmware whose sole purpose is to manage system energy.
The TPMD card monitors the processor modules, memory, environmental temperature, and
fan speed. Based on this information, it can act upon the system to maintain optimal power
and energy conditions (for example, increase the fan speed to react to a temperature
change). It also interacts with the IBM Systems Director Active Energy Manager to report
power and thermal information and to receive input from AEM on policies to be set. The
TPMD is part of the EnergyScale infrastructure.
Feature                      Active Energy Manager (AEM) required   ASMI   HMC
Power Trending               Y                                      N      N
Thermal Reporting            Y                                      N      N
Static Power Saver           N                                      Y      Y
Dynamic Power Saver          Y                                      N      N
Power Capping                Y                                      N      N
Energy-optimized Fans        N                                      -      -
Processor Core Nap           N                                      -      -
Processor Core Sleep         N                                      -      -
Processor Folding            N                                      -      -
EnergyScale for I/O          N                                      -      -
Server Power Down            Y                                      -      -
Partition Power Management   Y                                      -      -
For information about IBM EnergyScale technology, go to:
http://www.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&appname=STGE
_PO_PO_USEN&htmlfid=POW03039USEN&attachment=POW03039USEN.PDF
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 3. Virtualization
As you look for ways to maximize the return on your IT infrastructure investments,
consolidating workloads becomes an attractive proposition.
IBM Power Systems combined with PowerVM technology are designed to help you
consolidate and simplify your IT environment, with the following key capabilities:
Improve server utilization and share I/O resources to reduce the total cost of ownership
and make better use of IT assets.
Improve business responsiveness and operational speed by dynamically re-allocating
resources to applications as needed, to better match changing business needs or handle
unexpected changes in demand.
Simplify IT infrastructure management by making workloads independent of hardware
resources, thereby enabling you to make business-driven policies to deliver resources
based on time, cost, and service-level requirements.
This chapter discusses the virtualization technologies and features on IBM Power Systems:
POWER Hypervisor™
POWER Modes
Partitioning
Active Memory Expansion
PowerVM
System Planning Tool
3.1 POWER Hypervisor
Combined with features designed into the POWER7 processors, the POWER Hypervisor
delivers functions that enable other system technologies, including logical partitioning
technology, virtualized processors, IEEE VLAN compatible virtual switch, virtual SCSI
adapters, virtual Fibre Channel adapters, and virtual consoles. The POWER Hypervisor is a
basic component of the system’s firmware and offers the following functions:
Provides an abstraction between the physical hardware resources and the logical
partitions that use them.
Enforces partition integrity by providing a security layer between logical partitions.
Controls the dispatch of virtual processors to physical processors (see “Processing mode”
on page 96).
Saves and restores all processor state information during a logical processor context
switch.
Controls hardware I/O interrupt management facilities for logical partitions.
Provides virtual LAN channels between logical partitions that help to reduce the need for
physical Ethernet adapters for inter-partition communication.
Monitors the Service Processor and will perform a reset or reload if it detects the loss of
the Service Processor, notifying the operating system if the problem is not corrected.
The POWER Hypervisor is always active, regardless of the system configuration, even
when the system is not connected to the managed console. It requires memory to support the resource
assignment to the logical partitions on the server. The amount of memory required by the
POWER Hypervisor firmware varies according to several factors. Factors influencing the
POWER Hypervisor memory requirements include:
Number of logical partitions
Number of physical and virtual I/O devices used by the logical partitions
Maximum memory values specified in the logical partition profiles
The minimum amount of physical memory required to create a partition will be the size of the
system’s Logical Memory Block (LMB). The default LMB size varies according to the amount
of memory configured in the CEC (Table 3-1).
Table 3-1 Configured CEC memory-to-default Logical Memory Block size
In most cases, however, the actual minimum requirements and recommendations of the
supported operating systems are above 256 MB. Physical memory is assigned to partitions in
increments of LMB.
The POWER Hypervisor provides the following types of virtual I/O adapters:
Virtual SCSI
Virtual Ethernet
Virtual Fibre Channel
Virtual (TTY) console
Configurable CEC memory           Default Logical Memory Block
Greater than 8 GB, up to 16 GB    64 MB
Greater than 16 GB, up to 32 GB   128 MB
Greater than 32 GB                256 MB
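The thresholds in Table 3-1 can be read as a simple lookup; combined with the rule above, one LMB is also the minimum physical memory needed to create a partition. A sketch (configurations of 8 GB or less are not listed in the table, so they raise an error here):

```python
def default_lmb_mb(cec_memory_gb: float) -> int:
    """Default Logical Memory Block size (MB) per Table 3-1."""
    if cec_memory_gb > 32:
        return 256
    if cec_memory_gb > 16:
        return 128
    if cec_memory_gb > 8:
        return 64
    raise ValueError("8 GB or less is not covered by Table 3-1")

# One LMB is also the minimum memory with which a partition can be created:
print(default_lmb_mb(24))  # 128
print(default_lmb_mb(64))  # 256
```

Remember that the supported operating systems usually require well above one LMB (256 MB or more), so the LMB figure is a floor, not a recommendation.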
Virtual SCSI
The POWER Hypervisor provides a virtual SCSI mechanism for virtualization of storage
devices. The storage virtualization is accomplished using two paired adapters:
A virtual SCSI server adapter
A virtual SCSI client adapter
A Virtual I/O Server partition or an IBM i partition can define virtual SCSI server adapters.
Other partitions are client partitions. The Virtual I/O Server partition is a special logical
partition, as described in 3.4.4, “Virtual I/O Server” on page 101. The Virtual I/O Server
software is included with all PowerVM Editions. With the PowerVM Standard Edition and
PowerVM Enterprise Edition, dual Virtual I/O Servers can be deployed to provide
maximum availability for client partitions when performing Virtual I/O Server maintenance.
Virtual Ethernet
The POWER Hypervisor provides a virtual Ethernet switch function that allows partitions on
the same server to communicate quickly and securely without any need for a physical
interconnection. Virtual Ethernet supports transmission speeds in the range of 1 - 3 Gbps,
depending on the maximum transmission unit (MTU) size and CPU entitlement. Virtual
Ethernet support began with IBM AIX Version 5.3, or an appropriate level of Linux supporting
virtual Ethernet devices (see 3.4.9, “Operating system support for PowerVM” on page 113).
The virtual Ethernet is part of the base system configuration.
Virtual Ethernet has the following major features:
The virtual Ethernet adapters can be used for both IPv4 and IPv6 communication and can
transmit packets with a size up to 65,408 bytes. Therefore, the maximum MTU for the
corresponding interface can be up to 65,394 (65,390 if VLAN tagging is used).
The POWER Hypervisor presents itself to partitions as a virtual 802.1Q-compliant switch.
The maximum number of VLANs is 4096. Virtual Ethernet adapters can be configured as
either untagged or tagged (following the IEEE 802.1Q VLAN standard).
A partition can support 256 virtual Ethernet adapters. Besides a default port VLAN ID,
the number of additional VLAN ID values that can be assigned per virtual Ethernet
adapter is 20, which implies that each virtual Ethernet adapter can be used to access
21 virtual networks.
Each partition operating system detects the virtual local area network (VLAN) switch
as an Ethernet adapter without the physical link properties and asynchronous data
transmit operations.
Any virtual Ethernet can also have connectivity outside of the server if a layer-2 bridge to
a physical Ethernet adapter is set in one Virtual I/O Server partition (see 3.4.4, “Virtual I/O
Server” on page 101, for more details about shared Ethernet), also known as Shared
Ethernet Adapter.
Remember: Virtual Ethernet is based on the IEEE 802.1Q VLAN standard. No physical
I/O adapter is required when creating a VLAN connection between partitions, and no
access to an outside network is required.
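The MTU figures quoted in the list above follow directly from the 65,408-byte maximum packet size minus standard Ethernet header overhead, as this arithmetic check shows:

```python
MAX_PACKET = 65_408   # largest packet a virtual Ethernet adapter can transmit
ETH_HEADER = 14       # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG = 4          # IEEE 802.1Q tag, when VLAN tagging is used

mtu_untagged = MAX_PACKET - ETH_HEADER
mtu_tagged = mtu_untagged - VLAN_TAG

print(mtu_untagged)  # 65394
print(mtu_tagged)    # 65390

# Port VLAN ID plus up to 20 additional VLAN IDs per adapter:
print(1 + 20)        # 21 virtual networks reachable per adapter
```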
Virtual Fibre Channel
A virtual Fibre Channel adapter is a virtual adapter that provides client logical partitions with a
Fibre Channel connection to a storage area network through the Virtual I/O Server logical
partition. The Virtual I/O Server logical partition provides the connection between the virtual
Fibre Channel adapters on the Virtual I/O Server logical partition and the physical Fibre
Channel adapters on the managed system. Figure 3-1 depicts the connections between the
client partition virtual Fibre Channel adapters and the external storage. For additional
information, see 3.4.8, “N_Port ID virtualization” on page 112.
Figure 3-1 Connectivity between virtual Fibre Channel adapters and external SAN devices
Virtual (TTY) console
Each partition must have access to a system console. Tasks such as operating system
installation, network setup, and various problem analysis activities require a dedicated
system console. The POWER Hypervisor provides the virtual console by using a virtual TTY
or serial adapter and a set of Hypervisor calls to operate on them. Virtual TTY does not
require the purchase of any additional features or software such as the PowerVM Edition
features.
Depending on the system configuration, the operating system console can be provided by the
Hardware Management Console virtual TTY, IVM virtual TTY, or from a terminal emulator
that is connected to a system port.
3.2 POWER processor modes
Although, strictly speaking, not a virtualization feature, the POWER modes are described
here because they affect various virtualization features.
On Power System servers, partitions can be configured to run in several modes, including:
POWER6 compatibility mode
This execution mode is compatible with Version 2.05 of the Power Instruction Set
Architecture (ISA). For more information, visit the following address:
http://www.power.org/resources/reading/PowerISA_V2.05.pdf
POWER6+ compatibility mode
This mode is similar to POWER6, with eight additional Storage Protection Keys.
POWER7 mode
This is the native mode for POWER7 processors, implementing Version 2.06 of the Power
Instruction Set Architecture. For more information, visit the following address:
http://www.power.org/resources/downloads/PowerISA_V2.06_PUBLIC.pdf
The selection of the mode is made on a per partition basis, from the HMC, by editing the
partition profile (Figure 3-2).
Figure 3-2 Configuring partition profile compatibility mode from the HMC
Table 3-2 lists the differences between these modes.
3.3 Active Memory Expansion
Power Active Memory Expansion Enablement is an optional feature of POWER7
processor-based servers that must be specified when creating the configuration in the
e-Config tool, as follows:
IBM Power 710 #4795
IBM Power 730 #4795
This feature enables memory expansion on the system. Using compression/decompression
of memory content can effectively expand the maximum memory capacity, providing
additional server workload capacity and performance.
Active Memory Expansion is an innovative POWER7 technology that allows the effective
maximum memory capacity to be much larger than the true physical memory maximum.
Compression/decompression of memory content can allow memory expansion up to 100%,
which in turn enables a partition to perform significantly more work or support more users with
the same physical amount of memory. Similarly, it can allow a server to run more partitions
and do more work for the same physical amount of memory.
Active Memory Expansion is available for partitions running AIX 6.1, Technology Level 4 with
SP2, or later.
Active Memory Expansion uses the CPU resource of a partition to compress/decompress the
memory contents of this same partition. The trade-off of memory capacity for processor
cycles can be an excellent choice, but the degree of expansion varies based on how
compressible the memory content is, and it also depends on having adequate spare CPU
capacity available for this compression and decompression.

Table 3-2 Differences between POWER6 and POWER7 mode

POWER6 and POWER6+ mode       POWER7 mode                         Customer value
2-thread SMT                  4-thread SMT                        Throughput performance,
                                                                  processor core utilization
VMX (Vector Multimedia        Vector Scalar Extension (VSX)       High-performance computing
Extension/AltiVec)
Affinity OFF by default       3-tier memory, Micropartition       Improved system performance for
                              Affinity                            system images spanning sockets
                                                                  and nodes
Barrier synchronization:      Enhanced barrier synchronization:   High-performance computing
fixed 128-byte array,         variable sized array, user          parallel programming
Kernel Extension Access       shared memory access                synchronization facility
64-core and 128-thread        32-core and 128-thread,             Performance and scalability for
scaling                       64-core and 256-thread, and         large scale-up single system
                              256-core and 1024-thread scaling    image workloads (such as OLTP,
                                                                  ERP scale-up, and WPAR
                                                                  consolidation)
EnergyScale CPU Idle          EnergyScale CPU idle and folding    Improved energy efficiency
                              with NAP and SLEEP

Tests in IBM laboratories, using sample workloads, showed excellent results for many
workloads in terms of memory expansion per additional CPU utilized. Other test workloads
had more modest results.
Clients have a great deal of control over Active Memory Expansion usage. Each individual AIX
partition can turn on or turn off Active Memory Expansion. Control parameters set the amount
of expansion desired in each partition to help control the amount of CPU used by the Active
Memory Expansion function. An initial program load (IPL) is required for the specific partition
that is turning memory expansion on or off. After turned on, monitoring capabilities are
available in standard AIX performance tools, such as lparstat, vmstat, topas, and svmon.
Figure 3-3 represents the percentage of CPU that is used to compress memory for two
partitions with separate profiles. The green curve corresponds to a partition that has spare
processing power capacity. The blue curve corresponds to a partition constrained in
processing power.
Figure 3-3 CPU usage versus memory expansion effectiveness
Both cases show that there is a knee-of-curve relationship for CPU resource required for
memory expansion:
Busy processor cores do not have resources to spare for expansion.
The more memory expansion done, the more CPU resource is required.
The knee varies depending on how compressible the memory contents are. This example
demonstrates the need for a case-by-case study of whether memory expansion can provide a
positive return on investment.
[Figure 3-3 content: % CPU utilization for expansion plotted against the amount of memory
expansion; curve 1 = plenty of spare CPU resource available (very cost effective),
curve 2 = constrained CPU resource, already running at significant utilization.]
To help you perform this study, a planning tool is included with AIX 6.1 Technology Level 4,
allowing you to sample actual workloads and estimate how expandable the partition's
memory is and how much CPU resource is needed. Any model Power System can run the
planning tool. Figure 3-4 shows an example of the output returned by this planning tool. The
tool outputs various real memory and CPU resource combinations to achieve the desired
effective memory. It also recommends one particular combination. In this example, the tool
recommends that you allocate 58% of a processor, to benefit from 45% extra memory
capacity.
Figure 3-4 Output from Active Memory Expansion planning tool
Active Memory Expansion Modeled Statistics:
-----------------------
Modeled Expanded Memory Size : 8.00 GB
Expansion   True Memory    Modeled Memory     CPU Usage
Factor      Modeled Size   Gain               Estimate
---------   ------------   ----------------   ---------
1.21        6.75 GB        1.25 GB [ 19%]     0.00
1.31        6.25 GB        1.75 GB [ 28%]     0.20
1.41        5.75 GB        2.25 GB [ 39%]     0.35
1.51        5.50 GB        2.50 GB [ 45%]     0.58
1.61        5.00 GB        3.00 GB [ 60%]     1.46
Active Memory Expansion Recommendation:
---------------------
The recommended AME configuration for this workload is to configure
the LPAR with a memory size of 5.50 GB and to configure a memory
expansion factor of 1.51. This will result in a memory expansion of
45% from the LPAR's current memory size. With this configuration,
the estimated CPU usage due to Active Memory Expansion is
approximately 0.58 physical processors, and the estimated overall
peak CPU resource required for the LPAR is 3.72 physical processors.
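The relationship behind the modeled statistics above can be sketched with a little
arithmetic: the expanded (effective) memory size equals the true (physical) memory size
multiplied by the expansion factor. The following Python sketch is illustrative only (the
function names are made up, and the planning tool rounds true sizes to memory-granule
boundaries, so its figures differ slightly from exact division):

```python
# Illustrative arithmetic behind the planning-tool output above:
# expanded memory = true (physical) memory x expansion factor.

def true_memory_needed(expanded_gb: float, expansion_factor: float) -> float:
    """Physical memory required to present `expanded_gb` at a given factor."""
    return expanded_gb / expansion_factor

def memory_gain(expanded_gb: float, expansion_factor: float) -> float:
    """Memory 'created' by compression: expanded size minus true size."""
    return expanded_gb - true_memory_needed(expanded_gb, expansion_factor)

# Rough recreation of the modeled statistics for an 8.00 GB target.
for factor in (1.21, 1.31, 1.41, 1.51, 1.61):
    true = true_memory_needed(8.0, factor)
    gain = memory_gain(8.0, factor)
    print(f"factor {factor}: true {true:.2f} GB, gain {gain:.2f} GB "
          f"({gain / true:.0%})")
```

The row-by-row trade-off is visible here: each step up in expansion factor saves physical
memory but, as the table shows, costs progressively more CPU.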
After you select the value of the memory expansion factor that you want to achieve, you can
use this value to configure the partition from the HMC (Figure 3-5).
Figure 3-5 Using the planning tool result to configure the partition
On the HMC menu describing the partition, select the Active Memory Expansion check box
and enter the true and maximum memory, and the memory expansion factor. To turn off
expansion, clear the check box. In both cases, a reboot of the partition is needed to activate
the change.
In addition, a one-time, 60-day trial of Active Memory Expansion is available to provide more
exact memory expansion and CPU measurements. The trial can be requested using the
Capacity on Demand Web page:
http://www.ibm.com/systems/power/hardware/cod/
Active Memory Expansion can be ordered with the initial order of the server or as an MES
order. When the enablement feature is ordered, a software key is provided and applied to
the server. Rebooting is not required to enable the physical server. The key is specific to an
individual server and is permanent; it cannot be moved to a separate server. This feature is
ordered per server, independently of the number of partitions using memory expansion.
From the HMC, you can view whether the Active Memory Expansion feature has been
activated (Figure 3-6).
Figure 3-6 Server capabilities listed from the HMC
For detailed information regarding Active Memory Expansion, you can download the Active
Memory Expansion: Overview and Usage Guide from this location:
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&appname=STGE_PO_PO_USEN&htmlfid=POW03037USEN
3.4 PowerVM
The PowerVM platform is the family of technologies, capabilities, and offerings that deliver
industry-leading virtualization on IBM Power Systems. It is the umbrella branding term for
Power Systems virtualization (logical partitioning, Micro-Partitioning™, POWER
Hypervisor, Virtual I/O Server, Live Partition Mobility, Workload Partitions, and more). As with
Advanced Power Virtualization in the past, PowerVM is a combination of hardware
enablement and value-added software. Section 3.4.1, “PowerVM editions” on page 93,
discusses the licensed features of each of the three separate editions of PowerVM.
Requirement: If you want to move an LPAR using Active Memory Expansion to a different
system using Live Partition Mobility, the target system must support AME (the target
system must have AME activated with the software key). If the target system does not
have AME activated, the mobility operation will fail during the pre-mobility check phase,
and an appropriate error message will be displayed to the user.
3.4.1 PowerVM editions
This section provides information about the virtualization capabilities of PowerVM. The
three editions of PowerVM are suited for various purposes, as follows:
PowerVM Express Edition
PowerVM Express Edition is designed for customers looking for an introduction to more
advanced virtualization features at a highly affordable price, generally in single-server
projects.
PowerVM Standard Edition
This edition provides advanced virtualization functions and is intended for production
deployments and server consolidation.
PowerVM Enterprise Edition
This edition is suitable for large server deployments such as multi-server deployments and
cloud infrastructure. It includes unique features like Active Memory Sharing and Live
Partition Mobility.
Table 3-3 lists the versions of PowerVM that are available on Power 710 and Power 730.
Table 3-3 Availability of PowerVM per POWER7 processor technology based server model
For more information about the features included on each version of PowerVM, see IBM
PowerVM Virtualization Introduction and Configuration, SG24-7940.
3.4.2 Logical partitions (LPARs)
LPARs and virtualization increase utilization of system resources and add a new level of
configuration possibilities. This section provides details and configuration specifications about
this topic.
Dynamic logical partitioning
Logical partitioning was introduced with the POWER4™ processor-based product line
and the AIX Version 5.1 operating system. This technology offered the capability to
divide a pSeries® system into separate logical systems, allowing each LPAR to run an
operating environment on dedicated attached devices, such as processors, memory,
and I/O components.
Later, dynamic logical partitioning increased the flexibility, allowing selected system
resources, such as processors, memory, and I/O components, to be added and deleted from
logical partitions while they are executing. AIX Version 5.2, with all the necessary
enhancements to enable dynamic LPAR, was introduced in 2002. The ability to reconfigure
dynamic LPARs encourages system administrators to dynamically redefine all available
system resources to reach the optimum capacity for each defined dynamic LPAR.
PowerVM editions   Express   Standard   Enterprise
IBM Power 710      #5225     #5227      #5228
IBM Power 730      #5225     #5227      #5228
Requirement: At the time of writing, the IBM Power 710 (8231-E1C) and Power 730
(8231-E2C) must be managed by the Hardware Management Console or by the
Integrated Virtualization Manager.
Micro-Partitioning
Micro-Partitioning technology allows you to allocate fractions of processors to a logical
partition. This technology was introduced with POWER5 processor-based systems. A logical
partition using fractions of processors is also known as a shared processor partition or
micro-partition. Micro-partitions run over a set of processors called Shared Processor Pools.
Virtual processors are used to let the operating system manage the fractions of processing
power assigned to the logical partition. From an operating system perspective, a virtual
processor cannot be distinguished from a physical processor, unless the operating system
has been enhanced to be made aware of the difference. Physical processors are abstracted
into virtual processors that are available to partitions. The meaning of the term physical
processor in this section is a processor core. For example, a 2-core server has two physical
processors.
When defining a shared processor partition, several options have to be defined:
The minimum, desired, and maximum processing units
Processing units are defined as processing power, or the fraction of time that the partition
is dispatched on physical processors. Processing units define the capacity entitlement of
the partition.
The Shared Processor Pool
Pick one from the list of names of the configured Shared Processor Pools. This list
also displays the pool ID of each configured Shared Processor Pool in parentheses. If the
name of the desired Shared Processor Pool is not available here, you must first configure
the desired Shared Processor Pool using the Shared Processor Pool Management
window. Shared processor partitions use the default Shared Processor Pool, called
DefaultPool, by default. See 3.4.3, “Multiple Shared Processor Pools” on page 97, for
details about Multiple Shared Processor Pools.
Whether the partition will be able to access extra processing power to “fill up” its virtual
processors above its capacity entitlement (selecting either to cap or uncap your partition)
If there is spare processing power available in the Shared Processor Pools or other
partitions are not using their entitlement, an uncapped partition can use additional
processing units if its entitlement is not enough to satisfy its application
processing demand.
The weight (preference) in the case of an uncapped partition
The minimum, desired, and maximum number of virtual processors
The POWER Hypervisor calculates a partition’s processing power based on its minimum,
desired, and maximum values, its processing mode, and the requirements of the other active
partitions. The actual entitlement is never smaller than the desired processing units value,
but it can exceed that value in the case of an uncapped partition, up to the number of virtual
processors allocated.
A partition can be defined with a processor capacity as small as 0.10 processing units. This
represents 0.10 of a physical processor. Each physical processor can be shared by up to 10
shared processor partitions, and the partition’s entitlement can be incremented fractionally by
as little as 0.01 of a processor. The shared processor partitions are dispatched and
time-sliced on the physical processors under control of the POWER Hypervisor. The shared
processor partitions are created and managed by the managed console or the Integrated
Virtualization Manager.
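The granularity rules above (0.10 processing units minimum, 0.01 increments, up to 10
micro-partitions per physical core) can be sketched as a small validator. This is a
hypothetical Python illustration, not part of any IBM tool; the function names are made up:

```python
# Hypothetical validator for the shared-processor-partition rules described
# above: entitlement starts at 0.10 processing units, changes in steps of
# 0.01, and each physical core can be shared by at most 10 micro-partitions.

def validate_entitlement(processing_units: float) -> None:
    hundredths = round(processing_units * 100)
    if hundredths < 10:
        raise ValueError("minimum capacity entitlement is 0.10 processing units")
    if abs(processing_units * 100 - hundredths) > 1e-9:
        raise ValueError("entitlement granularity is 0.01 processing units")

def max_micropartitions(active_cores: int) -> int:
    # 10 micro-partitions per physical active core (80 on an 8-core
    # Power 710, 160 on a 16-core Power 730).
    return 10 * active_cores

validate_entitlement(0.35)       # a valid entitlement
print(max_micropartitions(8))    # -> 80
print(max_micropartitions(16))   # -> 160
```

As the text notes, these maximums are hardware-supported limits; the practical limits
depend on the application workload demands.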
The IBM Power 710 supports up to eight cores, and has the following maximums:
Up to eight dedicated partitions
Up to 80 micro-partitions (10 micro-partitions per physical active core)
The Power 730 allows up to 16 cores in a single system, supporting the following maximums:
Up to 16 dedicated partitions
Up to 160 micro-partitions (10 micro-partitions per physical active core)
An important point is that the maximums stated are supported by the hardware, but the
practical limits depend on the application workload demands.
Additional information about virtual processors includes:
A virtual processor can be running (dispatched) either on a physical processor or on
standby, waiting for a physical processor to become available.
Virtual processors do not introduce any additional abstraction level. They really are only a
dispatch entity. When running on a physical processor, virtual processors run at the same
speed as the physical processor.
Each partition’s profile defines CPU entitlement that determines how much processing
power any given partition will receive. The total sum of CPU entitlement of all partitions
cannot exceed the number of available physical processors in a Shared Processor Pool.
The number of virtual processors can be changed dynamically through a dynamic
LPAR operation.
Processing mode
When you create a logical partition you can assign entire processors for dedicated use, or
you can assign partial processing units from a Shared Processor Pool. This setting defines
the processing mode of the logical partition. Figure 3-7 shows a diagram of the concepts
discussed in this section.
Figure 3-7 Logical partitioning concepts
Dedicated mode
In dedicated mode, physical processors are assigned as a whole to partitions. The
simultaneous multithreading feature in the POWER7 processor core allows the core to
execute instructions from two or four independent software threads simultaneously. To
support this feature we use the concept of logical processors. The operating system (AIX,
IBM i, or Linux) sees one physical processor as two or four logical processors if the
simultaneous multithreading feature is on. It can be turned off and on dynamically while the
operating system is executing (for AIX, use the smtctl command). If simultaneous
multithreading is off, each physical processor is presented as one logical processor, and thus
only one thread.
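The dedicated-mode arithmetic above can be shown with a short sketch: with simultaneous
multithreading (toggled on AIX with the smtctl command), each POWER7 core is presented
to the operating system as one, two, or four logical processors. This Python fragment is
purely illustrative:

```python
# Logical processors seen by the OS in dedicated mode: physical cores
# multiplied by the active SMT thread count (1 = SMT off, 2 = SMT2, 4 = SMT4).

def logical_processors(physical_cores: int, smt_threads: int) -> int:
    if smt_threads not in (1, 2, 4):
        raise ValueError("POWER7 supports SMT off (1), SMT2, or SMT4")
    return physical_cores * smt_threads

print(logical_processors(4, 4))  # 4 dedicated cores with SMT4 -> 16 logical CPUs
print(logical_processors(4, 1))  # SMT off -> 4 logical CPUs
```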
Shared dedicated mode
On POWER7 processor technology based servers, you can configure dedicated partitions to
become processor donors for idle processors that they own, allowing for the donation of
spare CPU cycles from dedicated processor partitions to a Shared Processor Pool. The
dedicated partition maintains absolute priority for dedicated CPU cycles. Enabling this feature
might help to increase system utilization, without compromising the computing power for
critical workloads in a dedicated processor.
[Figure 3-7 content: an 8-core SMP system in which the POWER Hypervisor hosts two
Shared-Processor Pools, each with a set of micro-partitions (AIX V5.3, AIX V6.1, and Linux
partitions with 0.5 or 1.5 processing units, backed by virtual and logical processors),
alongside AIX partitions on dedicated processors. KEY: vp = virtual processor,
lp = logical processor, PrU = processing units.]
Shared mode
In shared mode, logical partitions use virtual processors to access fractions of physical
processors. Shared partitions can define any number of virtual processors (the maximum
number is 10 times the number of processing units assigned to the partition). From the
POWER Hypervisor point of view, virtual processors represent dispatching objects. The
POWER Hypervisor dispatches virtual processors to physical processors according to each
partition’s processing-unit entitlement. One processing unit represents one physical
processor’s processing capacity. At the end of the POWER Hypervisor’s dispatch cycle
(10 ms), all partitions receive total CPU time equal to their processing-unit entitlement. The
logical processors are defined on top of virtual processors. So, even with a virtual processor,
the concept of a logical processor exists and the number of logical processors depends on
whether the simultaneous multithreading is turned on or off.
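The 10 ms dispatch cycle described above implies a simple time budget: over each cycle, a
partition receives physical CPU time equal to its processing-unit entitlement. The
following Python sketch models this relationship (illustrative only; constant and function
names are made up):

```python
# Rough model of the POWER Hypervisor dispatch cycle: over each 10 ms cycle,
# a partition receives core time equal to its processing-unit entitlement
# (1.00 processing unit = one physical core's full capacity).

DISPATCH_CYCLE_MS = 10.0

def cpu_time_per_cycle_ms(entitled_processing_units: float) -> float:
    return entitled_processing_units * DISPATCH_CYCLE_MS

print(cpu_time_per_cycle_ms(0.5))   # 0.5 PrU -> 5.0 ms of core time per cycle
print(cpu_time_per_cycle_ms(1.5))   # 1.5 PrU -> 15.0 ms, spread across its VPs
```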
3.4.3 Multiple Shared Processor Pools
Multiple Shared Processor Pools (MSPPs) is a capability supported on POWER7 processor
and POWER6 processor based servers. This capability allows a system administrator to
create a set of micro-partitions with the purpose of controlling the processor capacity that can
be consumed from the physical shared-processor pool.
To implement MSPPs, there is a set of underlying techniques and technologies. Figure 3-8
shows an overview of the architecture of Multiple Shared Processor Pools.
Figure 3-8 Overview of the architecture of Multiple Shared Processor Pools
[Figure 3-8 content: the POWER Hypervisor maps a physical shared-processor pool of eight
processors (p0 through p7) to Shared Processor Pool 0 and Shared Processor Pool 1, each
hosting a set of micro-partitions (AIX V5.3, AIX V6.1, and Linux with entitled capacities
of 0.5, 0.8, and 1.6) backed by virtual processors vp0 through vp10; unused capacity in
SPP0 is redistributed to uncapped micro-partitions within SPP0, and unused capacity in
SPP1 is redistributed to uncapped micro-partitions within SPP1. KEY: EC = entitled
capacity, p = physical processor, vp = virtual processor, SPPn = Shared-Processor Pool n.]
Micro-partitions are created and then identified as members of either the default
Shared-Processor Pool (SPP0) or a user-defined Shared-Processor Pool (SPPn). The virtual
processors that exist within the set of micro-partitions are monitored by the POWER
Hypervisor, and processor capacity is managed according to user-defined attributes.
If the Power Systems server is under heavy load, each micro-partition within a
Shared-Processor Pool is guaranteed its processor entitlement plus any capacity that it might
be allocated from the reserved pool capacity if the micro-partition is uncapped.
If some micro-partitions in a Shared-Processor Pool do not use their capacity entitlement, the
unused capacity is ceded and other uncapped micro-partitions within the same
Shared-Processor Pool are allocated the additional capacity according to their uncapped
weighting. In this way, the entitled pool capacity of a Shared-Processor Pool is distributed to
the set of micro-partitions within that Shared-Processor Pool.
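The redistribution rule just described (ceded capacity shared among the uncapped
micro-partitions of the same pool, in proportion to their uncapped weights) can be sketched
in a few lines of Python. This is a simplified illustration, not the hypervisor algorithm;
the partition names, demands, and weights are made up, and a real allocation is also bounded
by each partition's demand and the pool's maximum capacity:

```python
# Simplified sketch of within-pool capacity resolution: capacity ceded by
# micro-partitions that do not use their entitlement is shared among the
# uncapped micro-partitions of the same pool, proportionally to weight.

def redistribute(partitions):
    """partitions: dicts with entitlement, demand, uncapped flag, weight."""
    ceded = sum(max(0.0, p["entitlement"] - p["demand"]) for p in partitions)
    hungry = [p for p in partitions
              if p["uncapped"] and p["demand"] > p["entitlement"]]
    total_weight = sum(p["weight"] for p in hungry) or 1
    return {p["name"]: ceded * p["weight"] / total_weight for p in hungry}

pool = [
    {"name": "aix1", "entitlement": 1.6, "demand": 0.6,
     "uncapped": False, "weight": 0},
    {"name": "aix2", "entitlement": 0.8, "demand": 2.0,
     "uncapped": True, "weight": 128},
    {"name": "linux", "entitlement": 0.5, "demand": 1.0,
     "uncapped": True, "weight": 64},
]
print(redistribute(pool))  # 1.0 ceded processing unit split 2:1 by weight
```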
All Power Systems servers that support the Multiple Shared Processor Pools capability will
have a minimum of one (the default) Shared-Processor Pool and up to a maximum of 64
Shared-Processor Pools.
Default Shared-Processor Pool (SPP0)
On any Power Systems server supporting Multiple Shared Processor Pools, a default
Shared-Processor Pool is always automatically defined. The default Shared-Processor Pool
has a pool identifier of zero (SPP-ID = 0) and can also be referred to as SPP0. The default
Shared-Processor Pool has the same attributes as a user-defined Shared-Processor Pool,
except that these attributes are not directly under the control of the system administrator.
They have fixed values (Table 3-4).
Table 3-4 Attribute values for the default Shared-Processor Pool (SPP0)
Creating Multiple Shared Processor Pools
The default Shared-Processor Pool (SPP0) is automatically activated by the system and is
always present.
All other Shared-Processor Pools exist, but by default are inactive. By changing the
maximum pool capacity of a Shared-Processor Pool to a value greater than zero, it becomes
active and can accept micro-partitions (either transferred from SPP0 or newly created).
SPP0 attribute             Value
Shared-Processor Pool ID   0
Maximum pool capacity      The value is equal to the capacity in the physical
                           shared-processor pool.
Reserved pool capacity     0
Entitled pool capacity     Sum (total) of the entitled capacities of the
                           micro-partitions in the default Shared-Processor Pool.
Levels of processor capacity resolution
The two levels of processor capacity resolution implemented by the POWER Hypervisor and
Multiple Shared Processor Pools are:
Level0
The first level, Level0, is the resolution of capacity within the same Shared-Processor
Pool. Unused processor cycles from within a Shared-Processor Pool are harvested and
then redistributed to any eligible micro-partition within the same Shared-Processor Pool.
Level1
This is the second level of processor capacity resolution. When all Level0 capacity has
been resolved within the Multiple Shared Processor Pools, the POWER Hypervisor
harvests unused processor cycles and redistributes them to eligible micro-partitions
regardless of the Multiple Shared Processor Pools structure.
Figure 3-9 shows the two levels of unused capacity redistribution implemented by the
POWER Hypervisor.
Figure 3-9 The two levels of unused capacity redistribution
Capacity allocation above the entitled pool capacity (Level1)
The POWER Hypervisor initially manages the entitled pool capacity at the Shared-Processor
Pool level. This is where unused processor capacity within a Shared-Processor Pool is
harvested and then redistributed to uncapped micro-partitions within the same
Shared-Processor Pool. This level of processor capacity management is sometimes referred
to as Level0 capacity resolution.
At a higher level, the POWER Hypervisor harvests unused processor capacity from the
Multiple Shared Processor Pools that do not consume all of their entitled pool capacity. If a
particular Shared-Processor Pool is heavily loaded and several of the uncapped
micro-partitions within it require additional processor capacity (above the entitled pool
capacity), then the POWER Hypervisor redistributes some of the extra capacity to the
uncapped micro-partitions. This level of processor capacity management is sometimes
referred to as Level1 capacity resolution.
To redistribute unused processor capacity to uncapped micro-partitions in Multiple Shared
Processor Pools above the entitled pool capacity, the POWER Hypervisor uses a higher level
of redistribution, Level1.
Where there is unused processor capacity in under-utilized Shared-Processor
Pools, the micro-partitions within the Shared-Processor Pools cede the capacity to the
POWER Hypervisor.
In busy Shared-Processor Pools where the micro-partitions have used all of the entitled pool
capacity, the POWER Hypervisor allocates additional cycles to micro-partitions, in which all
of the following statements are true:
The maximum pool capacity of the Shared-Processor Pool hosting the micro-partition
has not been met.
The micro-partition is uncapped.
The micro-partition has enough virtual processors to take advantage of the
additional capacity.
Under these circumstances, the POWER Hypervisor allocates additional processor capacity
to micro-partitions on the basis of their uncapped weights, independent of the
Shared-Processor Pool hosting the micro-partitions. This can be referred to as Level1
capacity resolution. Consequently, when allocating additional processor capacity in excess of
the entitled pool capacity of the Shared-Processor Pools, the POWER Hypervisor takes the
uncapped weights of all micro-partitions in the system into account, regardless of the Multiple
Shared Processor Pools structure.
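Level1 resolution as described above can be sketched similarly: after each pool resolves
its own unused cycles, leftover capacity is shared across all uncapped micro-partitions by
weight, regardless of pool, but still subject to each hosting pool's maximum pool capacity.
The following Python fragment is a hedged illustration (names and numbers are invented):

```python
# Sketch of cross-pool (Level1) capacity resolution: leftover capacity is
# distributed to uncapped micro-partitions by weight regardless of pool,
# clipped by how much headroom the hosting pool's maximum capacity allows.

def level1_share(leftover_units, candidates):
    """candidates: (name, uncapped_weight, pool_headroom) tuples, where
    pool_headroom is the capacity remaining before the hosting pool's
    maximum pool capacity is met."""
    total = sum(w for _, w, _ in candidates) or 1
    return {name: min(leftover_units * w / total, headroom)
            for name, w, headroom in candidates}

shares = level1_share(1.2, [("spp1/aix", 100, 2.0), ("spp2/linux", 50, 0.2)])
print(shares)  # weight-proportional shares, clipped by pool headroom
```

Here the second partition's share is clipped at 0.2 processing units by its pool's
maximum pool capacity, even though its weight would otherwise entitle it to more.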
Dynamic adjustment of maximum pool capacity
The maximum pool capacity of a Shared-Processor Pool, other than the default
Shared-Processor Pool (SPP0), can be adjusted dynamically from the managed console,
using either the graphical interface or command-line interface (CLI).
Dynamic adjustment of reserved pool capacity
The reserved pool capacity of a Shared-Processor Pool, other than the default
Shared-Processor Pool (SPP0), can be adjusted dynamically from the managed console,
using either the graphical interface or CLI.
Important: Level1 capacity resolution: When allocating additional processor capacity in
excess of the entitled pool capacity of the Shared-Processor Pool, the POWER Hypervisor
takes the uncapped weights of all micro-partitions in the system into account, regardless
of the Multiple Shared-Processor Pool structure.
Dynamic movement between Shared-Processor Pools
A micro-partition can be moved dynamically from one Shared-Processor Pool to another
from the managed console, using either the graphical interface or CLI. Because the entitled
pool capacity is partly made up of the sum of the entitled capacities of the micro-partitions,
removing a micro-partition from a Shared-Processor Pool reduces the entitled pool capacity
for that Shared-Processor Pool. Similarly, the entitled pool capacity of the Shared-Processor
Pool that the micro-partition joins will increase.
Deleting a Shared-Processor Pool
Shared-Processor Pools cannot be deleted from the system. However, they can be
deactivated by setting the maximum pool capacity and the reserved pool capacity to
zero. The Shared-Processor Pool will still exist but will not be active. Use the managed
console interface to deactivate a Shared-Processor Pool. A Shared-Processor Pool
cannot be deactivated unless all micro-partitions hosted by the Shared-Processor Pool
have been removed.
Live Partition Mobility and Multiple Shared Processor Pools
A micro-partition may leave a Shared-Processor Pool because of PowerVM Live Partition
Mobility. Similarly, a micro-partition may join a Shared-Processor Pool in the same way.
When performing PowerVM Live Partition Mobility, you are given the opportunity to
designate a destination Shared-Processor Pool on the target server to receive and host
the migrating micro-partition.
Because several simultaneous micro-partition migrations are supported by PowerVM Live
Partition Mobility, it is conceivable to migrate the entire Shared-Processor Pool from one
server to another.
3.4.4 Virtual I/O Server
The Virtual I/O Server is part of all PowerVM Editions. It is a special purpose partition that
allows the sharing of physical resources between logical partitions to allow more efficient
utilization (for example, consolidation). In this case, the Virtual I/O Server owns the physical
resources (SCSI, Fibre Channel, network adapters, and optical devices) and allows client
partitions to share access to them, thus minimizing the number of physical adapters in the
system. The Virtual I/O Server eliminates the requirement that every partition owns a
dedicated network adapter, disk adapter, and disk drive. The Virtual I/O Server supports
OpenSSH for secure remote logins. It also provides a firewall for limiting access by ports,
network services, and IP addresses.
Figure 3-10 shows an overview of a Virtual I/O Server configuration.
Figure 3-10 Architectural view of the Virtual I/O Server
Because the Virtual I/O Server is an operating system-based appliance server, redundancy
for physical devices attached to the Virtual I/O Server can be provided by using capabilities
such as Multipath I/O and IEEE 802.3ad Link Aggregation.
Installation of the Virtual I/O Server partition is performed from a special system backup DVD
that is provided to clients who order any PowerVM edition. This dedicated software is only for
the Virtual I/O Server (and IVM in case it is used) and is only supported in special Virtual I/O
Server partitions. Three major virtual devices are supported by the Virtual I/O Server:
Shared Ethernet Adapter
Virtual SCSI
Virtual Fibre Channel adapter
The Virtual Fibre Channel adapter is used with the NPIV feature, described in 3.4.8, “N_Port
ID virtualization” on page 112.
Shared Ethernet Adapter
A Shared Ethernet Adapter (SEA) can be used to connect a physical Ethernet network to a
virtual Ethernet network. The Shared Ethernet Adapter provides this access by connecting
the internal Hypervisor VLANs with the VLANs on the external switches. Because the Shared
Ethernet Adapter processes packets at layer 2, the original MAC address and VLAN tags of
the packet are visible to other systems on the physical network. IEEE 802.1 VLAN tagging
is supported.
The Shared Ethernet Adapter also provides the ability for several client partitions to share
one physical adapter. With an SEA, you can connect internal and external VLANs using a
physical adapter. The Shared Ethernet Adapter service can only be hosted in the Virtual I/O
Server, not in a general-purpose AIX or Linux partition, and acts as a layer-2 network bridge
to securely transport network traffic between virtual Ethernet networks (internal) and one or
more (EtherChannel) physical network adapters (external). These virtual Ethernet network
adapters are defined by the POWER Hypervisor on the Virtual I/O Server.
[Figure 3-10 content: the Virtual I/O Server partition hosts a Shared Ethernet Adapter
backed by a physical Ethernet adapter, and a physical disk adapter backed by physical
disks; Virtual I/O Clients 1 and 2 use virtual Ethernet and virtual SCSI adapters,
provided through the Hypervisor, to reach the external network and the physical disks
through the Virtual I/O Server.]
Tip: A Linux partition can also provide a bridging function, by using the brctl command.
Figure 3-11 shows a configuration example of an SEA with one physical and two virtual
Ethernet adapters. An SEA can include up to 16 virtual Ethernet adapters on the Virtual I/O
Server that share the same physical access.
Figure 3-11 Architectural view of a Shared Ethernet Adapter
A single SEA setup can have up to 16 virtual Ethernet trunk adapters, and each virtual
Ethernet trunk adapter can support up to 20 VLAN networks. Therefore, a single physical
Ethernet adapter can be shared among up to 320 internal VLAN networks. The number of
Shared Ethernet Adapters that can be set up in a Virtual I/O Server partition is limited only by
the resource availability, because there are no configuration limits.
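The SEA capacity arithmetic above (16 trunk adapters x 20 VLANs = 320 internal VLAN
networks per physical port) can be captured in a tiny sketch. The Python below is
illustrative only; the constant and function names are invented:

```python
# Arithmetic behind the SEA limits described above: up to 16 virtual
# Ethernet trunk adapters per SEA, each carrying up to 20 VLAN networks.

MAX_TRUNK_ADAPTERS_PER_SEA = 16
MAX_VLANS_PER_TRUNK_ADAPTER = 20

def max_vlans_per_sea(trunk_adapters: int = MAX_TRUNK_ADAPTERS_PER_SEA) -> int:
    trunk_adapters = min(trunk_adapters, MAX_TRUNK_ADAPTERS_PER_SEA)
    return trunk_adapters * MAX_VLANS_PER_TRUNK_ADAPTER

print(max_vlans_per_sea())    # -> 320 internal VLANs over one physical port
print(max_vlans_per_sea(4))   # a smaller setup with 4 trunk adapters -> 80
```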
Unicast, broadcast, and multicast are supported, so protocols that rely on broadcast or
multicast, such as Address Resolution Protocol (ARP), Dynamic Host Configuration Protocol
(DHCP), Boot Protocol (BOOTP), and Neighbor Discovery Protocol (NDP) can work on
an SEA.
For a more detailed discussion about virtual networking, see:
http://www.ibm.com/servers/aix/whitepapers/aix_vn.pdf
Tip: A Shared Ethernet Adapter does not need to have an IP address configured to be
able to perform the Ethernet bridging functionality. Configuring IP on the Virtual I/O Server
is convenient because the Virtual I/O Server can then be reached by TCP/IP, for example,
to perform dynamic LPAR operations or to enable remote login. This task can be done by
configuring an IP address directly on the SEA device or on an additional virtual Ethernet
adapter in the Virtual I/O Server. This leaves the SEA without the IP address, allowing for
maintenance on the SEA without losing IP connectivity in case SEA failover is configured.
Virtual SCSI
Virtual SCSI is used to refer to a virtualized implementation of the SCSI protocol. Virtual SCSI
is based on a client/server relationship. The Virtual I/O Server logical partition owns the
physical resources and acts as server or, in SCSI terms, target device. The client logical
partitions access the virtual SCSI backing storage devices provided by the Virtual I/O Server
as clients.
The virtual I/O adapters (virtual SCSI server adapter and a virtual SCSI client adapter) are
configured using a managed console or through the Integrated Virtualization Manager on
smaller systems. The virtual SCSI server (target) adapter is responsible for executing any
SCSI commands that it receives. It is owned by the Virtual I/O Server partition. The virtual
SCSI client adapter allows a client partition to access physical SCSI and SAN attached
devices and LUNs that are assigned to the client partition. The provisioning of virtual disk
resources is provided by the Virtual I/O Server.
Physical disks presented to the Virtual I/O Server can be exported and assigned to a client
partition in a number of ways:
- The entire disk is presented to the client partition.
- The disk is divided into several logical volumes, which can be presented to a single client
or multiple clients.
- As of Virtual I/O Server 1.5, files can be created on these disks, and file-backed storage
devices can be created.
The logical volumes or files can be assigned to separate partitions. Therefore, virtual SCSI
enables sharing of adapters and disk devices.
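As an illustration only (the device and partition names below are hypothetical, and this is not the Virtual I/O Server command syntax), the export options can be modeled as a mapping from backing devices to client partitions:

```python
# Illustrative model of virtual SCSI provisioning; all names are made up,
# not actual VIOS objects or commands.
backing_devices = {
    # One physical disk carved into two logical volumes.
    "hdisk0": ["lv_client1", "lv_client2"],
}

# Each backing device is exported through a VSCSI server adapter that is
# paired with a VSCSI client adapter in one client partition.
exports = {
    "lv_client1": "client_partition_1",
    "lv_client2": "client_partition_2",
}

def disks_seen_by(partition):
    """Backing devices that a client partition sees as ordinary hdisks."""
    return sorted(dev for dev, owner in exports.items() if owner == partition)

print(disks_seen_by("client_partition_1"))  # ['lv_client1']
```

In the real product, the pairing is between a virtual SCSI server adapter and a client adapter; the dictionary merely stands in for that pairing.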
Figure 3-12 shows an example where one physical disk is divided into two logical volumes by
the Virtual I/O Server. Each of the two client partitions is assigned one logical volume, which
is then accessed through a virtual I/O adapter (VSCSI Client Adapter). Inside the partition, the
disk is seen as a normal hdisk.
Figure 3-12 Architectural view of virtual SCSI
At the time of writing, virtual SCSI supports Fibre Channel, parallel SCSI, iSCSI, SAS, SCSI
RAID devices, and optical devices, including DVD-RAM and DVD-ROM. Other protocols, such
as SSA, and tape devices are not supported.
For more information about the specific storage devices supported for Virtual I/O Server, see:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/datasheet.html
Virtual I/O Server functions
Virtual I/O Server has a number of features, including monitoring solutions:
- Support for Live Partition Mobility starting on POWER6 processor-based systems with the
PowerVM Enterprise Edition. For more information about Live Partition Mobility, see 3.4.5,
“PowerVM Live Partition Mobility” on page 105.
- Support for virtual SCSI devices backed by a file, which are then accessed as standard
SCSI-compliant LUNs.
- Support for virtual Fibre Channel devices that are used with the NPIV feature.
- Virtual I/O Server Expansion Pack with additional security functions, such as Kerberos
(Network Authentication Service for users and client and server applications), Simple
Network Management Protocol (SNMP) v3, and Lightweight Directory Access Protocol
(LDAP) client functionality.
- System Planning Tool (SPT) and Workload Estimator, which are designed to ease the
deployment of a virtualized infrastructure. For more information about the System Planning
Tool, see 3.5, “System Planning Tool” on page 115.
- IBM Systems Director agent and a number of preinstalled Tivoli agents, such as:
– Tivoli Identity Manager (TIM), to allow easy integration into an existing Tivoli Systems
Management infrastructure
– Tivoli Application Dependency Discovery Manager (ADDM), which automatically creates
and maintains application infrastructure maps, including dependencies, change histories,
and deep configuration values
- vSCSI eRAS.
- Additional CLI statistics in svmon, vmstat, fcstat, and topas.
- Monitoring solutions to help manage and monitor the Virtual I/O Server and shared
resources. New commands and views provide additional metrics for memory, paging,
processes, Fibre Channel HBA statistics, and virtualization.
For more information about the Virtual I/O Server and its implementation, see IBM PowerVM
Virtualization Introduction and Configuration, SG24-7940.
3.4.5 PowerVM Live Partition Mobility
PowerVM Live Partition Mobility allows you to move a running logical partition, including its
operating system and running applications, from one system to another without any shutdown
and without disrupting the operation of that logical partition. Inactive partition mobility allows
you to move a powered-off logical partition from one system to another.
Partition mobility provides systems management flexibility and improves system availability,
as follows:
- Avoid planned outages for hardware or firmware maintenance by moving logical partitions
to another server and then performing the maintenance. Live Partition Mobility can help
lead to zero-downtime maintenance because you can use it to work around scheduled
maintenance activities.
- Avoid downtime for a server upgrade by moving logical partitions to another server and
then performing the upgrade. This approach allows your users to continue their work
without disruption.
- Avoid unplanned downtime. With preventive failure management, if a server indicates a
potential failure, you can move its logical partitions to another server before the failure
occurs. Partition mobility can help avoid unplanned downtime.
- Take advantage of server optimization:
– Consolidation: You can consolidate workloads running on several small, under-used
servers onto a single large server.
– Deconsolidation: You can move workloads from server to server to optimize resource
use and workload performance within your computing environment. With active
partition mobility, you can manage workloads with minimal downtime.
Mobile partition’s operating system requirements
The operating system running in the mobile partition must be AIX or Linux. The Virtual I/O
Server partition itself cannot be migrated. All versions of AIX and Linux supported on the IBM
POWER7 processor-based servers also support partition mobility.
Source and destination system requirements
The source partition must be one that has only virtual devices. If there are any physical
devices in its allocation, they must be removed before the validation or migration is initiated.
An N_Port ID virtualization (NPIV) device is considered virtual and is compatible with
partition migration.
The hypervisor must support the Partition Mobility functionality (also called the migration
process) that is available on POWER6 and POWER7 processor-based hypervisors. Firmware
must be at firmware level eFW3.2 or later. All POWER7 processor-based hypervisors support
Live Partition Mobility. Source and destination systems can have different firmware levels, but
they must be compatible with each other.
Partitions can be migrated back and forth between POWER6 and POWER7 processor-based
servers. Partition Mobility leverages the POWER6 Compatibility Modes that are provided by
POWER7 processor-based servers. On the POWER7 processor-based server, the migrated
partition then executes in POWER6 or POWER6+ Compatibility Mode.
If you want to move an active logical partition from a POWER6 processor-based server to a
POWER7 processor-based server so that the logical partition can take advantage of the
additional capabilities available with the POWER7 processor, perform these steps:
1. Set the partition-preferred processor compatibility mode to the default mode. When you
activate the logical partition on the POWER6 processor-based server, it runs in the
POWER6 mode.
2. Move the logical partition to the POWER7 processor-based server. Both the current and
preferred modes remain unchanged for the logical partition until you restart the logical
partition.
3. Restart the logical partition on the POWER7 processor-based server. The hypervisor
evaluates the configuration. Because the preferred mode is set to default and the logical
partition now runs on a POWER7 processor-based server, the highest mode available is
the POWER7 mode. The hypervisor determines that the most fully featured mode that is
supported by the operating environment installed in the logical partition is the POWER7
mode and changes the current mode of the logical partition to the POWER7 mode.
Now the current processor compatibility mode of the logical partition is the POWER7 mode
and the logical partition runs on the POWER7 processor-based server.
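The mode evaluation in step 3 can be sketched as a small function (a simplification limited to the three modes discussed here; the actual hypervisor logic covers more modes and constraints):

```python
# Simplified sketch of processor compatibility mode selection at partition
# restart. "default" means: pick the most capable mode that both the host
# and the partition's installed operating environment support.
MODE_ORDER = ["POWER6", "POWER6+", "POWER7"]  # least to most capable

def current_mode_after_restart(preferred, host_highest, os_highest):
    if preferred != "default":
        return preferred  # an explicit preferred mode is honored as-is
    # Modes the host can offer, up to its highest supported mode.
    candidates = MODE_ORDER[: MODE_ORDER.index(host_highest) + 1]
    # Keep only those the operating environment also supports.
    supported = [m for m in candidates
                 if MODE_ORDER.index(m) <= MODE_ORDER.index(os_highest)]
    return supported[-1]

# Partition migrated from POWER6 to POWER7, preferred mode left at default:
print(current_mode_after_restart("default", "POWER7", "POWER7"))  # POWER7
```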
The Virtual I/O Server on the source system provides the access to the client resources and
must be identified as a mover service partition (MSP). The Virtual Asynchronous Services
Interface (VASI) device allows the mover service partition to communicate with the
hypervisor. It is created and managed automatically by the managed console and will be
configured on both the source and destination Virtual I/O Servers, which are designated as
the mover service partitions for the mobile partition, to participate in active mobility. Other
requirements include a similar time-of-day setting on each server, systems that are not
running on battery power, and shared storage (an external hdisk with
reserve_policy=no_reserve). In addition, all logical partitions must be on the same open
network, with RMC established to the managed console.
The managed console is used to configure, validate, and orchestrate the migration. You use
the managed console to configure the Virtual I/O Server as an MSP and to configure the VASI
device. A managed console wizard validates your configuration and identifies issues that
can cause the migration to fail. During the migration, the managed console controls all
phases of the process.
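The kind of pre-migration checks described above can be sketched as follows (an illustrative checklist, not the managed console's actual validation logic; the field names are made up):

```python
# Hedged sketch of pre-migration validation for active Partition Mobility.
# The real wizard checks far more; fields here are hypothetical.
def validate_mobility(src, dst, partition):
    issues = []
    if not (src["is_msp"] and dst["is_msp"]):
        issues.append("both Virtual I/O Servers must be mover service partitions")
    if src["on_battery"] or dst["on_battery"]:
        issues.append("systems must not be running on battery power")
    if abs(src["time_of_day"] - dst["time_of_day"]) > 300:  # seconds, illustrative
        issues.append("servers should have a similar time-of-day setting")
    if not partition["virtual_devices_only"]:
        issues.append("partition must hold only virtual (or NPIV) devices")
    if not partition["rmc_established"]:
        issues.append("RMC to the managed console is required")
    return issues

src = {"is_msp": True, "on_battery": False, "time_of_day": 1000}
dst = {"is_msp": True, "on_battery": False, "time_of_day": 1010}
part = {"virtual_devices_only": True, "rmc_established": True}
print(validate_mobility(src, dst, part))  # [] : ready to migrate
```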
Improved Live Partition Mobility benefits
The possibility to move partitions between POWER6 and POWER7 processor-based servers
greatly facilitates the deployment of POWER7 processor-based servers, as follows:
- Installation of the new server can be performed while the application is executing on the
POWER6 server. After the POWER7 processor-based server is ready, the application can
be migrated to its new hosting server without application downtime.
- When adding POWER7 processor-based servers to a POWER6 environment, you get the
additional flexibility to perform workload balancing across the entire set of POWER6 and
POWER7 processor-based servers.
- When performing server maintenance, you get the additional flexibility to use POWER6
servers for hosting applications usually hosted on POWER7 processor-based servers,
and vice versa, allowing you to perform this maintenance with no planned application
downtime.
For more information about Live Partition Mobility and how to implement it, see IBM
PowerVM Live Partition Mobility, SG24-7460.
Tip: The “Migration combinations of processor compatibility modes for active Partition
Mobility” web page offers presentations of the supported migrations:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7hc3/iphc3pcmcombosact.htm
3.4.6 Active Memory Sharing
Active Memory Sharing is an IBM PowerVM advanced memory virtualization technology that
provides system memory virtualization capabilities to IBM Power Systems, allowing multiple
partitions to share a common pool of physical memory.
Active Memory Sharing is only available with the Enterprise version of PowerVM.
The physical memory of an IBM Power System can be assigned to multiple partitions either in
a dedicated or in a shared mode. The system administrator has the capability to assign some
physical memory to a partition and some physical memory to a pool that is shared by other
partitions. A single partition can have either dedicated or shared memory:
- With a pure dedicated memory model, the system administrator’s task is to optimize
available memory distribution among partitions. When a partition suffers degradation
because of memory constraints, and other partitions have unused memory, the
administrator can manually issue a dynamic memory reconfiguration.
- With a shared memory model, the system automatically decides the optimal distribution of
the physical memory to partitions and adjusts the memory assignment based on partition
load. The administrator reserves physical memory for the shared memory pool, assigns
partitions to the pool, and provides access limits to the pool.
Active Memory Sharing can be exploited to increase memory utilization on the system, either
by decreasing the global memory requirement or by allowing the creation of additional
partitions on an existing system. Active Memory Sharing can be used in parallel with Active
Memory Expansion on a system running a mixed workload of several operating systems. For
example, AIX partitions can take advantage of Active Memory Expansion while other
operating systems take advantage of Active Memory Sharing.
For additional information regarding Active Memory Sharing, see IBM PowerVM Virtualization
Active Memory Sharing, REDP-4470.
3.4.7 Active Memory Deduplication
In a virtualized environment, systems might have a considerable amount of duplicated
information stored in RAM, because each partition has its own operating system, and some
partitions might even run the same kinds of applications. On heavily loaded systems, this
duplication can lead to a shortage of available memory resources, forcing paging by the AMS
partition operating systems, the AMD pool, or both, which might decrease overall system
performance.
Figure 3-13 shows the standard behavior of a system without Active Memory Deduplication
(AMD) enabled on its AMS shared memory pool. Identical pages within the same or different
LPARs each require their own unique physical memory page, consuming space with
repeated information.
Figure 3-13 AMS shared memory pool without AMD enabled
Active Memory Deduplication allows the Hypervisor to dynamically map identical partition
memory pages to a single physical memory page within a shared memory pool. This enables
a better utilization of the AMS shared memory pool, increasing the system’s overall
performance by avoiding paging. Deduplication can cause the hardware to incur fewer cache
misses, which also leads to improved performance.
Figure 3-14 shows the behavior of a system with Active Memory Deduplication enabled on its
AMS shared memory pool. Duplicated pages from different LPARs are stored just once,
providing the AMS pool with more free memory.
Figure 3-14 Identical memory pages mapped to a single physical memory page with Active Memory
Deduplication enabled
Active Memory Deduplication (AMD) depends on the Active Memory Sharing (AMS) feature
being available, and it consumes CPU cycles donated by the AMS pool's VIOS partitions to
identify duplicated pages. The operating systems running on the AMS partitions can hint to
the PowerVM Hypervisor that certain pages (such as frequently referenced, read-only code
pages) are particularly good candidates for deduplication.
To perform deduplication, the Hypervisor cannot compare every memory page in the AMS
pool with every other page. Instead, it computes a small signature for each page that it visits,
and stores the signatures in an internal table. Each time that a page is inspected, its signature
is looked up against the known signatures in the table. If a match is found, the memory pages
are compared to be sure that the pages are really duplicates. When an actual duplicate is
found, the Hypervisor remaps the partition memory to the existing memory page and returns
the duplicate page to the AMS pool.
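This signature-then-compare scheme can be sketched in a few lines of Python (illustrative only; the hypervisor's actual signature function and table layout are not public):

```python
import hashlib

def signature(page: bytes) -> bytes:
    # Stand-in for the hypervisor's small per-page signature.
    return hashlib.sha1(page).digest()[:4]

def deduplicate(pages):
    """Map page contents to a single stored copy per distinct content.

    Returns (store, mapping) where mapping[i] is the index into store for
    logical page i. A signature match alone is not trusted: pages are
    byte-compared before being treated as duplicates.
    """
    store, mapping, by_sig = [], [], {}
    for page in pages:
        idx = by_sig.get(signature(page))
        if idx is not None and store[idx] == page:  # confirm a real duplicate
            mapping.append(idx)          # remap to the existing physical page
        else:
            store.append(page)           # unique page keeps its own copy
            by_sig[signature(page)] = len(store) - 1
            mapping.append(len(store) - 1)
    return store, mapping

pages = [b"kernel", b"libc", b"kernel", b"app"]
store, mapping = deduplicate(pages)
print(len(store), mapping)  # 3 [0, 1, 0, 2]
```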
Figure 3-15 shows two pages being written to the AMS memory pool and having their
signatures matched in the deduplication table.
Figure 3-15 Memory pages having their signatures matched by Active Memory Deduplication
From the LPAR point of view, the AMD feature is completely transparent. If an LPAR attempts
to modify a deduplicated page, the Hypervisor takes a free page from the AMS pool, copies
the duplicate page content into the new page, and maps the LPAR's reference to the new
page so that the LPAR can modify its own unique page.
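That copy-on-write step can be sketched as a toy model (real page tables, reference tracking, and free-list handling are far more involved):

```python
# Toy copy-on-write model: one physical copy of a page, shared by two LPARs.
store = [b"shared page"]
mapping = {("lpar1", 0): 0, ("lpar2", 0): 0}  # (lpar, logical page) -> index

def write(lpar, page_no, data):
    key = (lpar, page_no)
    idx = mapping[key]
    if sum(1 for v in mapping.values() if v == idx) > 1:
        store.append(data)            # shared: take a free page for the writer
        mapping[key] = len(store) - 1
    else:
        store[idx] = data             # sole owner: modify in place

write("lpar2", 0, b"modified")        # LPAR 2 writes; LPAR 1's view is intact
print(store[mapping[("lpar1", 0)]], store[mapping[("lpar2", 0)]])
```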
System administrators can dynamically configure the size of the deduplication table, ranging
from 1/8192 up to 1/256 of the configured maximum AMS memory pool size. A table that is
too small might lead to missed deduplication opportunities; a table that is too large wastes a
small amount of memory as overhead.
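To get a concrete feel for the 1/8192 to 1/256 bounds, the range for a given pool size works out as follows (a sketch; actual units and rounding are internal to the hypervisor):

```python
# Deduplication table size range: 1/8192 to 1/256 of the configured
# maximum AMS pool size.
def dedup_table_range(max_pool_bytes):
    return max_pool_bytes // 8192, max_pool_bytes // 256

GiB = 1024 ** 3
lo, hi = dedup_table_range(64 * GiB)
print(lo // (1024 ** 2), hi // (1024 ** 2))  # 8 256  (MiB, for a 64 GiB pool)
```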
The management of the Active Memory Deduplication feature is done via the managed
console and SDMC, allowing administrators to:
- Enable and disable Active Memory Deduplication at an AMS pool level.
- Display deduplication metrics.
- Display and modify the deduplication table size.
[Figure: left panel, the signature of Page A is written to the deduplication table; right panel, the signature of Page B matches Sign A in the deduplication table.]
Figure 3-16 shows Active Memory Deduplication being enabled for a shared memory pool.
Figure 3-16 Enabling the Active Memory Deduplication for a shared memory pool
The Active Memory Deduplication feature requires the following minimum components:
- PowerVM Enterprise Edition
- System firmware level 740
- AIX Version 6: AIX 6.1 TL7 or later
- AIX Version 7: AIX 7.1 TL1 SP1 or later
- IBM i: 7.14 or 7.2 or later
- SLES 11 SP2 or later
- RHEL 6.2 or later
3.4.8 N_Port ID virtualization
N_Port ID virtualization (NPIV) is a technology that allows multiple logical partitions to access
independent physical storage through the same physical Fibre Channel adapter. This adapter
is attached to a Virtual I/O Server partition that acts only as a pass-through, managing the
data transfer through the POWER Hypervisor.
Each partition using NPIV is identified by a pair of unique worldwide port names, enabling you
to connect each partition to independent physical storage on a SAN. Unlike virtual SCSI, only
the client partitions see the disk.
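The pair-of-WWPNs idea can be illustrated with a small allocator (the prefix and numbering scheme here are invented for the example and are not IBM's actual WWPN assignment):

```python
import itertools

# Illustrative sketch: each NPIV client adapter receives a pair of unique
# worldwide port names. Prefix and counter scheme are hypothetical.
_counter = itertools.count(0)

def allocate_wwpn_pair(prefix="c0507601"):
    n = next(_counter)
    return (f"{prefix}{2 * n:08x}", f"{prefix}{2 * n + 1:08x}")

p1 = allocate_wwpn_pair()
p2 = allocate_wwpn_pair()
print(len({*p1, *p2}))  # 4: every allocated WWPN is unique
```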
For additional information and requirements for NPIV, see:
- PowerVM Migration from Physical to Virtual Storage, SG24-7825
- IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
NPIV is supported in PowerVM Express, Standard, and Enterprise Editions on the IBM
Power 710 and Power 730 servers.
3.4.9 Operating system support for PowerVM
Table 3-5 summarizes the PowerVM features supported by the operating systems compatible
with the POWER7 processor-based servers.
Table 3-5 PowerVM features supported by AIX, IBM i and Linux
Feature                                    AIX    AIX    AIX    IBM i  IBM i  RHEL   RHEL   SLES 10  SLES 11
                                           V5.3   V6.1   V7.1   6.1.1  7.1    V5.7   V6.1   SP4      SP1
Virtual SCSI                               Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Virtual Ethernet                           Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Shared Ethernet Adapter                    Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Virtual Fibre Channel                      Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Virtual Tape                               Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Logical Partitioning                       Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
DLPAR I/O adapter add/remove               Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
DLPAR processor add/remove                 Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
DLPAR memory add                           Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
DLPAR memory remove                        Yes    Yes    Yes    Yes    Yes    Yes    Yes    No       Yes
Micro-Partitioning                         Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Shared Dedicated Capacity                  Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Multiple Shared Processor Pools            Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Virtual I/O Server                         Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes      Yes
Suspend/Resume                             No     Yes    Yes    No     No     No     No     No       No
Shared Storage Pools                       Yes    Yes    Yes    Yes    Yes(a) No     No     No       No
Thin Provisioning                          Yes    Yes    Yes    Yes(b) Yes(b) No     No     No       No
Active Memory Sharing and
  Active Memory Deduplication              No     Yes    Yes    Yes    Yes    No     Yes    No       Yes
Live Partition Mobility                    Yes    Yes    Yes    No     No     Yes    Yes    Yes      Yes
Table 3-5, continued:

Simultaneous Multi-Threading (SMT)         Yes(c) Yes(d) Yes    Yes(e) Yes    Yes(c) Yes(c) Yes(c)   Yes
Active Memory Expansion                    No     Yes(f) Yes    No     No     No     No     No       No

a. Requires IBM i 7.1 TR1.
b. Will become a fully provisioned device when used by IBM i.
c. Only supports two threads.
d. AIX 6.1 up to TL4 SP2 only supports two threads, and supports four threads as of TL4 SP3.
e. IBM i 6.1.1 and up support SMT4.
f. On AIX 6.1 with TL4 SP2 and later.

3.4.10 POWER7 Linux programming support
IBM Linux Technology Center (LTC) contributes to the development of Linux by providing
support for IBM hardware in Linux distributions. In particular, the LTC makes tools and code
available to the Linux communities to take advantage of the POWER7 technology and
develop POWER7-optimized software.
Table 3-6 lists the support of specific programming features for various versions of Linux.
Table 3-6 Linux support for POWER7 features
Features                         SLES 10 SP4  SLES 11  RHEL 5.7  RHEL 6.1  Comments
POWER6 compatibility mode        Yes          Yes      Yes       Yes       -
POWER7 mode                      No           Yes      No        Yes       -
Strong Access Ordering           No           Yes      No        Yes       Can improve Lx86 performance
Scale to 256 cores/1024 threads  No           Yes      No        Yes       Base OS support available
4-way SMT                        No           Yes      No        Yes       -
VSX Support                      No           Yes      No        Yes       Full exploitation requires Advance Toolchain
Distro toolchain mcpu/mtune=p7   No           Yes      No        Yes       SLES11/GA toolchain has minimal P7 enablement
                                                                           necessary to support kernel build
Advance Toolchain Support        Yes(g)       Yes      Yes(g)    Yes       Alternative IBM GNU Toolchain
64k base page size               No           Yes      Yes       Yes      -

g. Execution restricted to POWER6 instructions.
For information regarding Advance Toolchain, see this website:
http://www.ibm.com/developerworks/wikis/display/hpccentral/How+to+use+Advance+Toolchain+for+Linux+on+POWER
You can also visit the University of Illinois Linux on Power Open Source Repository:
http://ppclinux.ncsa.illinois.edu
ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at05/suse/SLES_11/release_notes.at05-2.1-0.html
ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at05/redhat/RHEL5/release_notes.at05-2.1-0.html
3.5 System Planning Tool
The IBM System Planning Tool (SPT) helps you design a system or systems to be partitioned
with logical partitions. You can also plan for and design non-partitioned systems by using the
SPT. The resulting output of your design is called a system plan, which is stored in a .sysplan
file. This file can contain plans for a single system or multiple systems. The .sysplan file can
be used for the following reasons:
- To create reports
- As input to the IBM configuration tool (e-Config)
- To create and deploy partitions on your systems automatically
System plans that are generated by the SPT can be deployed on the system by the Hardware
Management Console (HMC), Systems Director Management Console (SDMC), or
Integrated Virtualization Manager (IVM).
You can create an entirely new system configuration, or you can create a system
configuration based on any of the following items:
- Performance data from an existing system that the new system is to replace
- Performance estimates that anticipate future workloads that you must support
- Sample systems that you can customize to fit your needs
Integration between the SPT and both the Workload Estimator (WLE) and IBM Performance
Management (PM) allows you to create a system that is based on performance and capacity
data from an existing system or that is based on new workloads that you specify.
Table 3-6, continued. Tickless idle: No (SLES 10 SP4), Yes (SLES 11), No (RHEL 5.7),
Yes (RHEL 6.1); improved energy utilization and virtualization of partially to fully idle
partitions.
Tip: Ask your IBM representative or Business Partner to use the Customer Specified
Placement manufacturing option if you want to automatically deploy your partitioning
environment on a new machine. SPT looks for the resource allocation to be the same as
that specified in your .sysplan file.
You can use the SPT before you order a system to determine what you must order to support
your workload. You can also use the SPT to determine how you can partition a system that
you already have.
The System Planning Tool is an effective way of documenting and backing up key system
settings and partition definitions. It allows the user to create records of systems and export
them to their personal workstation or backup system of choice. These same backups can
then be imported back onto the same managed console when needed. This capability can be
useful when cloning systems, enabling the user to import the system plan to any managed
console multiple times.
The SPT and its supporting documentation can be found on the IBM System Planning
Tool site:
http://www.ibm.com/systems/support/tools/systemplanningtool/
Chapter 4. Continuous availability and
manageability
This chapter provides information about IBM reliability, availability, and serviceability (RAS)
design and features. This set of technologies, implemented on IBM Power Systems servers,
can improve your architecture’s total cost of ownership (TCO) by reducing unplanned
downtime.
The elements of RAS can be described as follows:
- Reliability: Indicates how infrequently a defect or fault in a server manifests itself
- Availability: Indicates how infrequently the functionality of a system or application is
impacted by a fault or defect
- Serviceability: Indicates how well faults and their effects are communicated to users and
services, and how efficiently and nondisruptively the faults are repaired
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. POWER7 processor-based servers have new features to support new levels of
virtualization, help ease administrative burden, and increase system utilization.
Reliability starts with components, devices, and subsystems designed to be fault-tolerant.
POWER7 uses lower voltage technology, improving reliability with stacked latches to reduce
soft error susceptibility. During the design and development process, subsystems go through
rigorous verification and integration testing processes. During system manufacturing,
systems go through a thorough testing process to help ensure high product quality levels.
The processor and memory subsystem contain a number of features designed to avoid or
correct environmentally induced, single-bit, intermittent failures as well as handle solid faults
in components, including selective redundancy to tolerate certain faults without requiring an
outage or parts replacement.
4.1 Reliability
Highly reliable systems are built with highly reliable components. On IBM POWER
processor-based systems, this basic principle is expanded upon with a clear design for
reliability architecture and methodology. A concentrated, systematic, architecture-based
approach is designed to improve overall system reliability with each successive generation of
system offerings.
4.1.1 Designed for reliability
Systems designed with fewer components and interconnects have fewer opportunities to fail.
Simple design choices such as integrating processor cores on a single POWER chip can
dramatically reduce the opportunity for system failures. In this case, an 8-core server can
include one-fourth as many processor chips (and chip socket interfaces) as with a double
CPU-per-processor design. Not only does this case reduce the total number of system
components, it reduces the total amount of heat generated in the design, resulting in an
additional reduction in required power and cooling components. POWER7 processor-based
servers also integrate L3 cache into the processor chip for a higher integration of parts.
Parts selection also plays a critical role in overall system reliability. IBM uses three grades of
components with grade 3 being defined as industry standard (off-the-shelf). As shown in
Figure 4-1, using stringent design criteria and an extensive testing program, the IBM
manufacturing team can produce grade 1 components that are expected to be 10 times more
reliable than industry standard. Engineers select grade 1 parts for the most critical system
components. Newly introduced organic packaging technologies, rated grade 5, achieve the
same reliability as grade 1 parts.
Figure 4-1 Component failure rates
4.1.2 Placement of components
Packaging is designed to deliver both high performance and high reliability. For example,
the reliability of electronic components is directly related to their thermal environment. That
is, large decreases in component reliability are directly correlated with relatively small
increases in temperature. POWER processor-based systems are carefully packaged to
ensure adequate cooling. Critical system components such as the POWER7 processor chips
are positioned on printed circuit cards so that they receive fresh air during operation. In
addition, POWER processor-based systems are built with redundant, variable-speed fans that
can automatically increase output to compensate for increased heat in the central electronic
complex.
4.1.3 Redundant components and concurrent repair
High-opportunity components—those that most affect system availability—are protected with
redundancy and the ability to be repaired concurrently.
The use of these redundant components allows the system to remain operational:
- POWER7 cores, which include redundant bits in L1 instruction and data caches, L2
caches, and L2 and L3 directories
- Power 710 and Power 730 main memory DIMMs, which use an innovative ECC algorithm
from IBM research that improves bit error correction and the handling of memory failures
- Redundant and hot-swap cooling
- Redundant and hot-swap power supplies
For maximum availability, be sure to connect power cords from the same system to two
separate power distribution units (PDUs) in the rack, and to connect each PDU to
independent power sources. Tower form factor power cords must be plugged into two
independent power sources in order to achieve maximum availability.
4.2 Availability
IBM hardware and microcode capability to continuously monitor execution of hardware
functions is generally described as the process of First Failure Data Capture (FFDC). This
process includes the strategy of predictive failure analysis, which refers to the ability to track
intermittent correctable errors and to vary components off-line before they reach the point of
hard failure causing a system outage, and without the need to recreate the problem.
The POWER7 family of systems continues to offer and introduce significant enhancements
designed to increase system availability to drive towards a high-availability objective with
hardware components that can perform the following automatic functions:
- Self-diagnose and self-correct during run time.
- Automatically reconfigure to mitigate potential problems from suspect hardware.
- Self-heal or automatically substitute good components for failing components.
Tip: Check your configuration for optional redundant components before ordering
your system.
Remember: POWER7 processor-based servers are independent of the operating system
for error detection and fault isolation within the central electronics complex.
Throughout this chapter we describe IBM POWER7 processor-based systems technologies
focused on keeping a system up and running. For a specific set of functions focused on
detecting errors before they become serious enough to stop computing work, see 4.3.1,
“Detecting” on page 128.
4.2.1 Partition availability priority
Also available is the ability to assign availability priorities to partitions. If an alternate
processor recovery event requires spare processor resources and there are no other
means of obtaining the spare resources, the system determines which partition has the
lowest priority and attempts to claim the needed resource. On a properly configured
POWER processor-based server, this approach allows that capacity to first be obtained
from a low-priority partition instead of a high-priority partition.
This capability is relevant to the total system availability because it gives the system an
additional stage before an unplanned outage. In the event that insufficient resources exist to
maintain full system availability, these servers attempt to maintain partition availability by
user-defined priority.
Partition availability priority is assigned to partitions using a weight value or integer rating.
The lowest priority partition is rated at 0 (zero) and the highest priority partition is valued at
255. The default value is set at 127 for standard partitions and 192 for Virtual I/O Server
(VIOS) partitions. You can vary the priority of individual partitions.
Partition availability priorities can be set for both dedicated and shared processor partitions.
The POWER Hypervisor uses the relative partition weight value among active partitions to
favor higher priority partitions for processor sharing, adding and removing processor capacity,
and favoring higher priority partitions for normal operation.
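As a sketch of this selection policy, the following Python models a hypervisor choosing the lowest-priority donor partition when a spare core is needed. The partition structure, field names, and selection rule are illustrative assumptions, not the actual POWER Hypervisor algorithm; only the default weights (127 for standard partitions, 192 for VIOS) come from the text.

```python
# Defaults taken from the text; all other names are hypothetical.
DEFAULT_PRIORITY = 127
VIOS_PRIORITY = 192

def pick_donor(partitions):
    """Return the active partition with the lowest availability priority
    that still has a core to give up (above its configured minimum)."""
    candidates = [p for p in partitions
                  if p["active"] and p["cores"] > p["min_cores"]]
    if not candidates:
        return None  # no partition can donate spare capacity
    return min(candidates, key=lambda p: p["priority"])

partitions = [
    {"name": "vios1", "priority": VIOS_PRIORITY, "active": True, "cores": 2, "min_cores": 1},
    {"name": "prod",  "priority": 200, "active": True, "cores": 8, "min_cores": 4},
    {"name": "test",  "priority": DEFAULT_PRIORITY, "active": True, "cores": 4, "min_cores": 1},
]

print(pick_donor(partitions)["name"])  # test: the low-priority partition donates first
```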
Note that the partition specifications for minimum, desired, and maximum capacity are also
taken into account for capacity-on-demand options, and in the case where total system-wide
processor capacity has been reduced because failed processor cores were deconfigured. For
example, if total system-wide processor capacity is sufficient to run all partitions at least at
their minimum capacity, the partitions are allowed to start or continue running. If processor
capacity is insufficient to run a partition at its minimum value, then starting that partition
results in an error condition that must be resolved.
4.2.2 General detection and deallocation of failing components
Runtime correctable or recoverable errors are monitored to determine whether there is a
pattern of errors. If a component reaches its predefined error limit, the service processor
initiates an action to deconfigure the faulty hardware, helping to avoid a potential system
outage and to enhance system availability.
Persistent deallocation
To enhance system availability, a component that is identified for deallocation or
deconfiguration on a POWER processor-based system is flagged for persistent deallocation.
Component removal can occur either dynamically (while the system is running) or at boot
time (IPL), depending both on the type of fault and when the fault is detected.
In addition, runtime unrecoverable hardware faults can be deconfigured from the system after
the first occurrence. The system can be rebooted immediately after failure and resume
operation on the remaining stable hardware. This prevents the same faulty hardware
from affecting system operation again; the repair action is deferred to a more convenient, less
critical time.
The following persistent deallocation functions are included:
- Processor
- L2/L3 cache lines (cache lines are dynamically deleted)
- Memory
- Failing I/O adapters (deconfigured or bypassed)
Processor instruction retry
As in POWER6, the POWER7 processor has the ability to do processor instruction retry and
alternate processor recovery for a number of core-related faults. Doing this significantly
reduces exposure to both permanent and intermittent errors in the processor core.
Intermittent errors, often due to cosmic rays or other sources of radiation, are generally
not repeatable.
With the instruction retry function, when an error is encountered in the core, in caches and
certain logic functions, the POWER7 processor first automatically retries the instruction. If the
source of the error was truly transient, the instruction succeeds and the system can continue
as before.
Note that on IBM systems prior to POWER6, such an error typically caused a checkstop.
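The retry behavior described above can be sketched in Python. The fault model and function names are hypothetical; real instruction retry happens entirely in hardware, per instruction, with no software loop.

```python
class TransientFault(Exception):
    """Raised when an error checker flags a (possibly transient) core error."""

def make_flaky_instruction(result, transient_failures=1):
    """Build a toy instruction that fails `transient_failures` times, then
    succeeds, modeling a cosmic-ray upset that does not repeat."""
    state = {"failures_left": transient_failures}
    def instruction():
        if state["failures_left"] > 0:
            state["failures_left"] -= 1
            raise TransientFault()
        return result
    return instruction

def execute_with_retry(instruction, max_retries=1):
    """Retry once, as the core does. If the fault persists, re-raise,
    standing in for escalation to alternate processor recovery."""
    for attempt in range(max_retries + 1):
        try:
            return instruction()
        except TransientFault:
            if attempt == max_retries:
                raise

print(execute_with_retry(make_flaky_instruction(42)))  # 42: transient fault retried away
```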
Alternate processor retry
Hard failures are more difficult, being permanent errors that will be replicated each time that
the instruction is repeated. Retrying the instruction does not help in this situation because the
instruction will continue to fail.
As in POWER6, POWER7 processors have the ability to extract the failing instruction from
the faulty core and retry it elsewhere in the system for a number of faults, after which the
failing core is dynamically deconfigured and scheduled for replacement.
Dynamic processor deallocation
Dynamic processor deallocation enables automatic deconfiguration of processor cores when
patterns of recoverable core-related faults are detected. Dynamic processor deallocation
prevents a recoverable error from escalating to an unrecoverable system error, which might
otherwise result in an unscheduled server outage. Dynamic processor deallocation relies on
the service processor’s ability to use FFDC-generated recoverable error information to notify
the POWER Hypervisor when a processor core reaches its predefined error limit. The
POWER Hypervisor then dynamically deconfigures the failing core, which is called out for
replacement. The entire process is transparent to the partition owning the failing instruction.
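The threshold-driven flow above can be modeled as a small sketch. The class, method names, and the error limit of 3 are illustrative assumptions; the real limits are set in firmware.

```python
class ServiceProcessor:
    """Toy model of the service processor's role in dynamic processor
    deallocation: count recoverable errors per core (as FFDC would record
    them) and notify the hypervisor once a core crosses its error limit."""

    def __init__(self, error_limit=3):
        self.error_limit = error_limit
        self.error_counts = {}
        self.deconfigured = set()

    def report_recoverable_error(self, core_id):
        if core_id in self.deconfigured:
            return  # core already removed from the configuration
        self.error_counts[core_id] = self.error_counts.get(core_id, 0) + 1
        if self.error_counts[core_id] >= self.error_limit:
            self._notify_hypervisor(core_id)

    def _notify_hypervisor(self, core_id):
        # The hypervisor deconfigures the core and calls it out for
        # replacement; partitions keep running on the remaining cores.
        self.deconfigured.add(core_id)

sp = ServiceProcessor(error_limit=3)
for _ in range(3):
    sp.report_recoverable_error(core_id=5)
print(sp.deconfigured)  # {5}: core removed before errors become unrecoverable
```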
Single processor checkstop
As in the POWER6 processor, the POWER7 processor provides single core check stopping
for certain processor logic, command, or control errors that cannot be handled by the
availability enhancements in the preceding section.
This significantly reduces the probability of any one processor affecting total system
availability by containing most processor checkstops to the partition that was using the
processor at the time the checkstop goes into effect.
Even with all these availability enhancements in play to prevent processor errors from
affecting system-wide availability, some errors can still result in a system-wide outage.
4.2.3 Memory protection
A memory protection architecture that provides good error resilience for a relatively small L1
cache might be inadequate for protecting the much larger system main store. Therefore, a
variety of protection methods are used in POWER processor-based systems to avoid
uncorrectable errors in memory.
Memory protection plans must take into account many factors, including these:
- Size
- Desired performance
- Memory array manufacturing characteristics
POWER7 processor-based systems have a number of protection schemes designed to
prevent, protect, or limit the effect of errors in main memory:
Chipkill
Chipkill is an enhancement that enables a system to sustain the failure of an entire
DRAM chip. An ECC word uses 18 DRAM chips from two DIMM pairs, and a failure on
any of the DRAM chips can be fully recovered by the ECC algorithm. The system can
continue indefinitely in this state with no performance degradation until the failed DIMM
can be replaced.
72-byte ECC
In POWER7, an ECC word consists of 72 bytes of data. Of these, 64 bytes are used to
hold application data. The remaining eight bytes are used to hold check bits and additional
information about the ECC word.
This innovative ECC algorithm from IBM research works on DIMM pairs on a rank basis.
(A rank is a group of nine DRAM chips on the Power 720 and 740.) With this ECC code,
the system can dynamically recover from an entire DRAM failure (by Chipkill) but can also
correct an error even if another symbol (a byte, accessed by a 2-bit line pair) experiences
a fault (an improvement from the Double Error Detection/Single Error Correction ECC
implementation found on the POWER6 processor-based systems).
Hardware scrubbing
Hardware scrubbing is a method used to deal with intermittent errors. IBM POWER
processor-based systems periodically address all memory locations. Any memory
locations with a correctable error are rewritten with the correct data.
CRC
The bus that is transferring data between the processor and the memory uses CRC error
detection with a failed operation-retry mechanism and the ability to dynamically retune the
bus parameters when a fault occurs. In addition, the memory bus has spare capacity to
substitute a data bit-line whenever it is determined to be faulty.
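To illustrate how ECC and scrubbing work together, the following sketch uses a textbook single-error-correcting Hamming code over 8-bit words. POWER7's actual code is a far stronger symbol-based ECC over 72-byte words, so this is a simplified stand-in for the idea, not the real algorithm.

```python
def encode(data, m=8):
    """Encode an m-bit word with single-error-correcting Hamming parity.
    Returns a 1-indexed list of bits (index 0 unused)."""
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    code = [0] * (m + r + 1)
    j = 0
    for i in range(1, m + r + 1):
        if i & (i - 1):              # non-power-of-two positions hold data
            code[i] = (data >> j) & 1
            j += 1
    for p in range(r):               # parity bit at each power-of-two position
        pos = 1 << p
        parity = 0
        for i in range(1, m + r + 1):
            if i & pos:
                parity ^= code[i]
        code[pos] = parity
    return code

def correct(code):
    """Recompute parity groups; a nonzero syndrome is the 1-indexed
    position of a single flipped bit, which is repaired in place."""
    n = len(code) - 1
    syndrome, pos = 0, 1
    while pos <= n:
        parity = 0
        for i in range(1, n + 1):
            if i & pos:
                parity ^= code[i]
        if parity:
            syndrome |= pos
        pos <<= 1
    if syndrome:
        code[syndrome] ^= 1
    return syndrome

def decode(code):
    """Extract the data bits back out of a (corrected) codeword."""
    data, j = 0, 0
    for i in range(1, len(code)):
        if i & (i - 1):
            data |= code[i] << j
            j += 1
    return data

def scrub(memory):
    """Hardware scrubbing: periodically sweep every location and rewrite
    any word that has a correctable single-bit error."""
    return sum(1 for cw in memory if correct(cw))

mem = [encode(b) for b in (0x12, 0x34, 0x56)]
mem[1][5] ^= 1              # intermittent single-bit upset in one word
print(scrub(mem))           # 1: one word corrected in place
print(hex(decode(mem[1])))  # 0x34: original data restored
```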
POWER7 memory subsystem
The POWER7 processor chip contains two memory controllers with four channels per
memory controller. Each channel connects to a single DIMM, but as the channels work in
pairs, a processor chip can address four DIMM pairs, two pairs per memory controller.
The bus transferring data between the processor and the memory uses CRC error detection
with a failed operation retry mechanism and the ability to dynamically retune bus parameters
when a fault occurs. In addition, the memory bus has spare capacity to substitute a spare
data bit-line for one that is determined to be faulty.
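The CRC-with-retry scheme can be sketched as follows. The framing, retry count, and the `channel` transport are illustrative assumptions; Python's `zlib.crc32` stands in for the bus's CRC hardware.

```python
import zlib

class BusFault(Exception):
    pass

def transfer(payload, channel, max_retries=2):
    """Send payload with an appended CRC-32; the receiver recomputes the
    CRC and requests a retry on mismatch. `channel` is a hypothetical,
    possibly faulty transport function."""
    frame = payload + zlib.crc32(payload).to_bytes(4, "big")
    for _ in range(max_retries + 1):
        received = channel(frame)
        data, crc = received[:-4], received[-4:]
        if zlib.crc32(data).to_bytes(4, "big") == crc:
            return data  # CRC verifies: accept the transfer
    raise BusFault("persistent CRC errors: retune bus or spare a bit-line")

def flaky_once():
    """Build a channel that corrupts a single bit on the first attempt."""
    state = {"first": True}
    def channel(frame):
        if state["first"]:
            state["first"] = False
            corrupted = bytearray(frame)
            corrupted[0] ^= 0x01  # single-bit error on the wire
            return bytes(corrupted)
        return frame
    return channel

print(transfer(b"cache line", flaky_once()))  # b'cache line' after one retry
```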
Figure 4-2 shows a POWER7 processor chip, with its memory interface, consisting of two
controllers and four DIMMs per controller. Advanced memory buffer chips are exclusive to
IBM and help to increase performance, acting as read/write buffers. Power 710 and
Power 730 use one memory controller. Advanced memory buffer chips are on the system
planar and support two DIMMs each.
Figure 4-2 POWER7 memory subsystem
Memory page deallocation
While coincident cell errors in separate memory chips are statistically rare, IBM POWER7
processor-based systems can contain these errors using a memory page deallocation
scheme for partitions running IBM AIX and IBM i operating systems, as well as for memory
pages owned by the POWER Hypervisor. If a memory address experiences an uncorrectable
or repeated correctable single cell error, the Service Processor sends the memory page
address to the POWER Hypervisor to be marked for deallocation.
Pages used by the POWER Hypervisor are deallocated as soon as the page is released.
In other cases, the POWER Hypervisor notifies the owning partition that the page must be
deallocated. Where possible, the operating system moves any data currently contained in
that memory area to another memory area and removes the pages associated with this error
from its memory map, no longer addressing these pages. The operating system performs
memory page deallocation without any user intervention and is transparent to users and
applications.
The POWER Hypervisor maintains a list of pages marked for deallocation during the current
platform Initial Program Load (IPL). During a partition IPL, the partition receives a list of all the
bad pages in its address space. In addition, if memory is dynamically added to a partition
(through a dynamic LPAR operation), the POWER Hypervisor warns the operating system
when memory pages are included that need to be deallocated.
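The bad-page bookkeeping above can be sketched as follows. The class and method names are hypothetical; they model only the flow of page addresses from the Service Processor through the POWER Hypervisor to the owning partition.

```python
class Hypervisor:
    """Sketch of the bad-page list described above (illustrative model,
    not actual firmware): pages marked for deallocation are handed to a
    partition at its IPL, and flagged again on dynamic memory add."""

    def __init__(self):
        self.bad_pages = set()      # marked during the current platform IPL
        self.partition_pages = {}   # partition name -> set of page addresses

    def mark_for_deallocation(self, page):
        # The Service Processor reported an uncorrectable (or repeated
        # correctable) single-cell error at this page address.
        self.bad_pages.add(page)

    def partition_ipl(self, name):
        # At partition IPL, the partition receives the bad pages it owns.
        return self.partition_pages.get(name, set()) & self.bad_pages

    def add_memory(self, name, pages):
        # Dynamic LPAR add: warn the OS about bad pages in the new range.
        self.partition_pages.setdefault(name, set()).update(pages)
        return set(pages) & self.bad_pages

hv = Hypervisor()
hv.partition_pages["lpar1"] = {0x1000, 0x2000, 0x3000}
hv.mark_for_deallocation(0x2000)
print(hv.partition_ipl("lpar1"))  # {8192}: the OS removes this page from its map
```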
Finally, if an uncorrectable error in memory is discovered, the logical memory block
associated with the address with the uncorrectable error is marked for deallocation by the
POWER Hypervisor. This deallocation will take effect on a partition reboot if the logical
memory block is assigned to an active partition at the time of the fault.
In addition, the system will deallocate the entire memory group associated with the error on
all subsequent system reboots until the memory is repaired. This precaution is intended to
guard against future uncorrectable errors while waiting for parts replacement.
Memory persistent deallocation
Defective memory discovered at boot time is automatically switched off. If the Service
Processor detects a memory fault at boot time, it marks the affected memory as bad so that it
is not used on subsequent reboots.
If the Service Processor identifies faulty memory in a server that includes CoD memory, the
POWER Hypervisor attempts to replace the faulty memory with available CoD memory.
Faulty resources are marked as deallocated, and working resources are included in the active
memory space. Because these activities reduce the amount of CoD memory available for
future use, repair of the faulty memory must be scheduled as soon as convenient.
Upon reboot, if not enough memory is available to meet minimum partition requirements, the
POWER Hypervisor will reduce the capacity of one or more partitions.
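The substitution arithmetic can be sketched as follows; the sizes and function names are illustrative.

```python
def usable_memory_after_fault(installed_gb, cod_free_gb, faulty_gb):
    """Available CoD memory substitutes for faulty memory, up to what is
    free; any shortfall beyond that is lost until the DIMM is repaired."""
    substituted = min(faulty_gb, cod_free_gb)
    return installed_gb - faulty_gb + substituted

def partitions_fit(usable_gb, minimums):
    """True when every partition's minimum memory can still be met; when
    False, the hypervisor must reduce the capacity of one or more
    partitions at reboot."""
    return usable_gb >= sum(minimums.values())

usable = usable_memory_after_fault(installed_gb=64, cod_free_gb=8, faulty_gb=16)
print(usable)                                           # 56: CoD covered 8 of 16 GB
print(partitions_fit(usable, {"prod": 32, "dev": 16}))  # True: 48 GB minimum fits
```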
Depending on the configuration of the system, the HMC Service Focal Point™, OS Service
Focal Point, or Service Processor will receive a notification of the failed component, and will
trigger a service call.
4.2.4 Cache protection
POWER7 processor-based systems are designed with cache protection mechanisms,
including cache line delete in both L2 and L3 arrays, Processor Instruction Retry and
Alternate Processor Recovery protection on L1-I and L1-D, and redundant “Repair” bits in
L1-I, L1-D, and L2 caches, as well as L2 and L3 directories.
L1 instruction and data array protection
The POWER7 processor instruction and data caches are protected against intermittent errors
using Processor Instruction Retry and against permanent errors by Alternate Processor
Recovery, both mentioned previously. The L1 cache is divided into sets, and the POWER7
processor can deallocate all but one set before doing a Processor Instruction Retry.
In addition, faults in the Segment Lookaside Buffer (SLB) array are recoverable by the
POWER Hypervisor. The SLB is used in the core to perform address translation calculations.
L2 and L3 array protection
The L2 and L3 caches in the POWER7 processor are protected with double-bit detect
single-bit correct error detection code (ECC). Single-bit errors are corrected before
forwarding to the processor and are subsequently written back to the L2 and L3 cache.
In addition, the caches maintain a cache line delete capability. A threshold of correctable
errors detected on a cache line can result in the data in the cache line being purged and the
cache line removed from further operation without requiring a reboot. An ECC uncorrectable
error detected in the cache can also trigger a purge and delete of the cache line. This results
in no loss of operation because an unmodified copy of the data is held in system memory
and can be used to reload the cache line. Modified data is handled through
Special Uncorrectable Error handling.
L2 and L3 deleted cache lines are marked for persistent deconfiguration on subsequent
system reboots until they can be replaced.
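A minimal model of cache line delete follows, assuming a hypothetical threshold of two correctable errors; the real threshold is a firmware detail.

```python
class CacheLineDelete:
    """Toy model of L2/L3 cache line delete: correctable errors on a line
    are counted, and a line that crosses the threshold (or takes one
    uncorrectable error on unmodified data) is purged and retired without
    a reboot; the data is simply reloaded from main memory."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self.counts = {}
        self.deleted = set()

    def correctable_error(self, line):
        self.counts[line] = self.counts.get(line, 0) + 1
        if self.counts[line] >= self.threshold:
            self._purge(line)

    def uncorrectable_error(self, line):
        # An unmodified copy exists in memory, so purging costs nothing.
        self._purge(line)

    def _purge(self, line):
        self.deleted.add(line)  # persistently deconfigured until repair

cache = CacheLineDelete(threshold=2)
cache.correctable_error(0x40)
cache.correctable_error(0x40)
cache.uncorrectable_error(0x80)
print(sorted(cache.deleted))  # [64, 128]: both lines retired, no reboot
```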
4.2.5 Special Uncorrectable Error handling
While it is rare, an uncorrectable data error can occur in memory or a cache. IBM POWER
processor-based systems attempt to limit the impact of an uncorrectable error to the least
possible disruption, using a well-defined strategy that first considers the data source.
Sometimes, an uncorrectable error is temporary in nature and occurs in data that can be
recovered from another repository, for example:
Data in the instruction L1 cache is never modified within the cache itself. Therefore, an
uncorrectable error discovered in the cache is treated like an ordinary cache miss, and
correct data is loaded from the L2 cache.
The L2 and L3 cache of the POWER7 processor-based systems can hold an unmodified
copy of data in a portion of main memory. In this case, an uncorrectable error simply
triggers a reload of a cache line from main memory.
In cases where the data cannot be recovered from another source, a technique called Special
Uncorrectable Error (SUE) handling is used to prevent an uncorrectable error in memory or
cache from immediately causing the system to terminate. Rather, the system tags the data
and determines whether it will ever be used again:
- If the error is irrelevant, it will not force a checkstop.
- If the data is used, termination can be limited to the program/kernel or hypervisor that owns the data, or to a freeze of the I/O adapters controlled by an I/O hub controller if the data was to be transferred to an I/O device.
When an uncorrectable error is detected, the system modifies the associated ECC word,
thereby signaling to the rest of the system that the “standard” ECC is no longer valid. The
Service Processor is then notified and takes appropriate actions. When running AIX 5.2 or
later, or Linux, if a process attempts to use the data, the operating system is informed of the
error and might terminate, or might terminate only the specific process associated with the
corrupt data, depending on the operating system and firmware levels and whether the data
was associated with a kernel or non-kernel process.
It is only in the case where the corrupt data is used by the POWER Hypervisor that the entire
system must be rebooted, thereby preserving overall system integrity.
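The mark-then-act-on-use behavior can be sketched as follows. The consumer labels and the exception are illustrative stand-ins for the machine check and termination paths described above.

```python
class MachineCheck(Exception):
    pass

class SUEMemory:
    """Sketch of Special Uncorrectable Error handling: on an uncorrectable
    error the word's ECC is rewritten to a special mark instead of
    terminating the system; only a later *use* of the word triggers
    action, scoped to whichever consumer touches it."""

    def __init__(self):
        self.words = {}
        self.sue_marked = set()

    def flag_uncorrectable(self, addr):
        self.sue_marked.add(addr)  # "standard" ECC is no longer valid here

    def load(self, addr, consumer="process"):
        if addr in self.sue_marked:
            if consumer == "hypervisor":
                raise MachineCheck("corrupt data used by hypervisor: reboot")
            raise MachineCheck(f"terminate {consumer} only")
        return self.words.get(addr, 0)

mem = SUEMemory()
mem.words[0x100] = 7
mem.flag_uncorrectable(0x200)
print(mem.load(0x100))  # 7: untouched data is unaffected
# mem.load(0x200) would raise, but only for the consumer that uses it
```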
Depending on system configuration and the source of the data, errors encountered during I/O
operations might not result in a machine check. Instead, the incorrect data is handled by the
processor host bridge (PHB) chip. When the PHB chip detects a problem, it rejects the data,
preventing data being written to the I/O device.
The PHB then enters a freeze mode, halting normal operations. Depending on the model and
type of I/O being used, the freeze might include the entire PHB chip, or simply a single bridge,
resulting in the loss of all I/O operations that use the frozen hardware until a power-on reset of
the PHB is done. The impact to partitions depends on how the I/O is configured for
redundancy. In a server configured for failover availability, redundant adapters spanning
multiple PHB chips can enable the system to recover transparently, without partition loss.
4.2.6 PCI extended error handling
IBM estimates that PCI adapters can account for a significant portion of the hardware-based
errors on a large server. Whereas servers that rely on boot-time diagnostics can identify
failing components to be replaced by hot-swap and reconfiguration, runtime errors pose a
more significant problem.
PCI adapters are generally complex designs involving extensive on-board instruction
processing, often on embedded microcontrollers. They tend to use industry standard grade
components with an emphasis on product cost relative to high reliability. In certain cases,
they might be more likely to encounter internal microcode errors or many of the hardware
errors described for the rest of the server.
The traditional means of handling these problems is through adapter internal error reporting
and recovery techniques in combination with operating system device driver management
and diagnostics. In certain cases, an error in the adapter might cause transmission of bad
data on the PCI bus itself, resulting in a hardware-detected parity error and causing a global
machine check interrupt, eventually requiring a system reboot to continue.
PCI extended error handling (EEH) enabled adapters respond to a special data packet
generated from the affected PCI slot hardware by calling system firmware, which will examine
the affected bus, allow the device driver to reset it, and continue without a system reboot. For
Linux, EEH support extends to the majority of frequently used devices, although various
third-party PCI devices might not provide native EEH support.
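A toy model of the freeze-and-recover cycle follows. The all-ones read pattern is how drivers conventionally detect an isolated slot, but the state names and methods here are illustrative, not a real EEH API.

```python
class EEHSlot:
    """Toy model of PCI extended error handling: a bus error freezes the
    slot instead of raising a global machine check; the device driver
    asks firmware to reset the slot and resumes I/O with no reboot."""

    def __init__(self):
        self.state = "normal"

    def bus_error(self):
        self.state = "frozen"     # MMIO reads now return all-ones, writes drop

    def mmio_read(self):
        if self.state == "frozen":
            return 0xFFFFFFFF     # driver detects the freeze from this pattern
        return 0x0                # stand-in for real device data

    def driver_recover(self):
        if self.state == "frozen":
            self.state = "normal" # firmware resets the slot on request
            return True
        return False

slot = EEHSlot()
slot.bus_error()
if slot.mmio_read() == 0xFFFFFFFF:  # driver notices the frozen slot
    slot.driver_recover()
print(slot.state)                   # normal: recovered without a system reboot
```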
To detect and correct PCIe bus errors, POWER7 processor-based systems use CRC
detection and instruction retry correction, while for PCI-X they use ECC.
Figure 4-3 shows the location and various mechanisms used throughout the I/O subsystem
for PCI extended error handling.
Figure 4-3 PCI extended error handling
4.3 Serviceability
IBM Power Systems design considers both IBM and client needs. The IBM Serviceability
Team has enhanced the base service capabilities and continues to implement a strategy that
incorporates best-of-breed service characteristics from diverse IBM systems offerings.
Serviceability includes system installation, system upgrades and downgrades (MES), and
system maintenance and repair.
The goal of the IBM Serviceability Team is to design and provide the most efficient system
service environment that includes:
- Easy access to service components; design for Customer Set Up (CSU), Customer Installed Features (CIF), and Customer Replaceable Units (CRU)
- On demand service education
- Error detection and fault isolation (ED/FI)
- First-failure data capture (FFDC)
- An automated guided repair strategy that uses common service interfaces for a converged service approach across multiple IBM server platforms
By delivering on these goals, IBM Power Systems servers enable faster and more accurate
repair and reduce the possibility of human error.
Client control of the service environment extends to firmware maintenance on all of the
POWER processor-based systems. This strategy contributes to higher systems availability
with reduced maintenance costs.
This section provides an overview of the progressive steps of error detection, analysis,
reporting, notifying, and repairing that are found in all POWER processor-based systems.
4.3.1 Detecting
The first and most crucial component of a solid serviceability strategy is the ability to
accurately and effectively detect errors when they occur. Although not all errors are a
guaranteed threat to system availability, those that go undetected can cause problems
because the system does not have the opportunity to evaluate and act if necessary. POWER
processor-based systems employ System z® server-inspired error detection mechanisms
that extend from processor cores and memory to power supplies and hard drives.
Service processor
The service processor is a microprocessor that is powered separately from the main
instruction processing complex. The service processor provides the capabilities for:
- POWER Hypervisor (system firmware) and Hardware Management Console connection surveillance
- Several remote power control options
- Reset and boot features
- Environmental monitoring
The service processor monitors the server’s built-in temperature sensors, sending
instructions to the system fans to increase rotational speed when the ambient temperature
is above the normal operating range. Using an architected operating system interface, the
service processor notifies the operating system of potential environmentally related
problems so that the system administrator can take appropriate corrective actions before
a critical failure threshold is reached.
The service processor can also post a warning and initiate an orderly system shutdown
when:
– The operating temperature exceeds the critical level (for example, failure of air
conditioning or air circulation around the system).
– The system fan speed is out of operational specification (for example, because of
multiple fan failures).
– The server input voltages are out of operational specification.
The service processor can immediately shut down a system when:
– Temperature exceeds the critical level or remains above the warning level for too long.
– Internal component temperatures reach critical levels.
– Non-redundant fan failures occur.
Placing calls
On systems without a Hardware Management Console, the service processor can place
calls to report surveillance failures with the POWER Hypervisor, critical environmental
faults, and critical processing faults even when the main processing unit is inoperable.
Mutual surveillance
The service processor monitors the operation of the POWER Hypervisor firmware during
the boot process and watches for loss of control during system operation. It also allows
the POWER Hypervisor to monitor service processor activity. The service processor can
take appropriate action, including calling for service, when it detects that the POWER
Hypervisor firmware has lost control. Likewise, the POWER Hypervisor can request a
service processor repair action if necessary.
Availability
The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced (AC power) failure.
Figure 4-4 ASMI Auto Power Restart setting window interface
Fault monitoring
Built-in self-test (BIST) checks processor, cache, memory, and associated hardware that
is required for proper booting of the operating system, when the system is powered on at
the initial installation or after a hardware configuration change (for example, an upgrade).
If a non-critical error is detected or if the error occurs in a resource that can be removed
from the system configuration, the booting process is designed to proceed to completion.
Requirement: The auto-restart (reboot) option must be enabled from the Advanced
System Management Interface (ASMI) or from the Control (Operator) Panel. Figure 4-4
shows this option in the ASMI.
The errors are logged in the system nonvolatile random access memory (NVRAM). When
the operating system completes booting, the information is passed from the NVRAM to the
system error log where it is analyzed by error log analysis (ELA) routines. Appropriate
actions are taken to report the boot-time error for subsequent service, if required.
Concurrent access to the service processor menus of the Advanced System Management
Interface (ASMI)
This access provides nondisruptive abilities to change system default parameters,
interrogate service processor progress and error logs, set and reset server indicators
(Guiding Light for midrange and high-end servers, Light Path for low-end servers), and
access all service processor functions without having to power down the system to the
standby state. It allows the administrator or service representative to dynamically
access the menus from any web browser-enabled console that is attached to the Ethernet
service network, concurrently with normal system operation.
Managing the interfaces for connecting uninterruptible power source systems to the
POWER processor-based systems, performing Timed Power-On (TPO) sequences, and
interfacing with the power and cooling subsystem
Error checkers
IBM POWER processor-based systems contain specialized hardware detection circuitry that
is used to detect erroneous hardware operations. Error checking hardware ranges from parity
error detection coupled with processor instruction retry and bus retry, to ECC correction on
caches and system buses. All IBM hardware error checkers have distinct attributes:
- Continuous monitoring of system operations to detect potential calculation errors
- Attempts to isolate physical faults based on runtime detection of each unique failure
- Ability to initiate a wide variety of recovery mechanisms designed to correct the problem; the POWER processor-based systems include extensive hardware and firmware recovery logic
Fault isolation registers
Error checker signals are captured and stored in hardware fault isolation registers (FIRs). The
associated logic circuitry is used to limit the domain of an error to the first checker that
encounters the error. In this way, runtime error diagnostics can be deterministic so that for
every check station, the unique error domain for that checker is defined and documented.
Ultimately, the error domain becomes the field-replaceable unit (FRU) call, and manual
interpretation of the data is not normally required.
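The first-checker-wins capture can be sketched as follows; the checker names, bit assignments, and FRU map are hypothetical.

```python
class FaultIsolationRegister:
    """Sketch of FIR behavior: each error checker owns one bit, and the
    register remembers the *first* checker to see the error, so the error
    domain maps deterministically to a field-replaceable unit (FRU)."""

    CHECKERS = {"l2_ecc": 0, "mem_bus_crc": 1, "core_parity": 2}

    def __init__(self):
        self.bits = 0
        self.first_error = None

    def capture(self, checker):
        bit = 1 << self.CHECKERS[checker]
        if self.first_error is None:
            self.first_error = checker  # domain limited to the first checker
        self.bits |= bit

    def fru_callout(self, fru_map):
        # The first checker's domain becomes the FRU call, with no manual
        # interpretation of the raw error data required.
        return fru_map.get(self.first_error)

fir = FaultIsolationRegister()
fir.capture("mem_bus_crc")
fir.capture("l2_ecc")  # downstream symptom of the same fault
print(fir.fru_callout({"mem_bus_crc": "DIMM slot 3", "l2_ecc": "CPU card"}))
# DIMM slot 3: the first checker that saw the error wins
```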
First-failure data capture
FFDC is an error isolation technique that ensures that when a fault is detected in a system
through error checkers or other types of detection methods, the root cause of the fault will
be captured without the need to recreate the problem or run an extended tracing or
diagnostics program.
For the vast majority of faults, a good FFDC design means that the root cause is detected
automatically without intervention by a service representative. Pertinent error data related to
the fault is captured and saved for analysis. In hardware, FFDC data is collected from the
fault isolation registers and from the associated logic. In firmware, this data consists of return
codes, function calls, and so forth.
FFDC check stations are carefully positioned within the server logic and data paths to ensure
that potential errors can be quickly identified and accurately tracked to a field-replaceable unit
(FRU).
This proactive diagnostic strategy is a significant improvement over the classic, less accurate
reboot and diagnose service approaches.
Figure 4-5 shows a schematic of a fault isolation register implementation.
Figure 4-5 Schematic of FIR implementation
Fault isolation
The service processor interprets error data that is captured by the FFDC checkers (saved in
the FIRs or other firmware-related data capture methods) to determine the root cause of the
error event.
Root cause analysis might indicate that the event is recoverable, meaning that a service
action point or need for repair has not been reached. Alternatively, it could indicate that a
service action point has been reached, where the event exceeded a pre-determined
threshold or was unrecoverable. Based on the isolation analysis, recoverable error threshold
counts might be incremented. No specific service action is necessary when the event is
recoverable.
When the event requires a service action, additional required information is collected to
service the fault. For unrecoverable errors or for recoverable events that meet or exceed their
service threshold, meaning that a service action point has been reached, a request for
service is initiated through an error logging component.
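The threshold-based decision just described can be sketched as a small model. This is an illustrative sketch only; the class and error names are invented and do not represent IBM service-processor code:

```python
# Hypothetical sketch (not IBM firmware): how recoverable-error thresholding
# might drive the service-action decision described above.
class ErrorThresholder:
    def __init__(self, service_threshold):
        self.service_threshold = service_threshold  # predetermined per error type
        self.counts = {}  # recoverable-error counts, keyed by error id

    def record(self, error_id, recoverable):
        """Return True when a service action point has been reached."""
        if not recoverable:
            return True  # unrecoverable: request service immediately
        self.counts[error_id] = self.counts.get(error_id, 0) + 1
        # Recoverable: no service action until the count meets the threshold.
        return self.counts[error_id] >= self.service_threshold

t = ErrorThresholder(service_threshold=3)
assert t.record("L3_CE", recoverable=True) is False   # below threshold: no action
assert t.record("L3_CE", recoverable=True) is False
assert t.record("L3_CE", recoverable=True) is True    # threshold reached
assert t.record("BUS_UE", recoverable=False) is True  # unrecoverable
```

The key point the sketch captures is that recoverable events only accumulate state, and a request for service is initiated solely when the predetermined threshold is met or the event is unrecoverable.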
(Figure 4-5 depicts error checkers in the CPU, L1 and L2/L3 caches, memory, disk, and non-volatile RAM feeding fault isolation registers (FIRs), which hold a unique fingerprint of each captured error; the service processor reads the FIRs and logs the error.)
4.3.2 Diagnosing
Using the extensive network of advanced and complementary error detection logic that is built
directly into hardware, firmware, and operating systems, the IBM Power Systems servers can
perform considerable self-diagnosis.
Boot time
When an IBM Power Systems server powers up, the service processor initializes the system
hardware. Boot-time diagnostic testing uses a multi-tier approach for system validation,
starting with managed low-level diagnostics that are supplemented with system firmware
initialization and configuration of I/O hardware, followed by OS-initiated software test routines.
Boot-time diagnostic routines include:
Built-in self-tests (BISTs) for both logic components and arrays ensure the internal
integrity of components. Because the service processor assists in performing these tests,
the system is enabled to perform fault determination and isolation whether or not the
system processors are operational. Boot-time BISTs might also find faults undetectable by
processor-based power-on self-test (POST) or diagnostics.
Wire-tests discover and precisely identify connection faults between components such as
processors, memory, or I/O hub chips.
Initialization of components such as ECC memory, typically by writing patterns of data and
allowing the server to store valid ECC data for each location, can help isolate errors.
To minimize boot time, the system determines which diagnostics must be started to
ensure correct operation, based on how the system was powered off or on the boot-time
selection menu.
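The pattern-write and read-back idea behind the memory initialization test above can be illustrated with a toy simulation. The memory model and the stuck-at fault below are assumptions made for the example, not actual diagnostic firmware:

```python
# Illustrative sketch: write known patterns to every location, read them back,
# and report the addresses that fail, isolating the faulty cells.
def pattern_test(memory, patterns=(0x55, 0xAA, 0x00, 0xFF)):
    """Return the set of addresses that fail any write/read-back pattern."""
    bad = set()
    for pattern in patterns:
        for addr in range(len(memory)):
            memory.write(addr, pattern)
        for addr in range(len(memory)):
            if memory.read(addr) != pattern:
                bad.add(addr)
    return bad

class FaultyMemory:
    """Simulated RAM whose cell 2 has a stuck-at-zero bit 0 (invented fault)."""
    def __init__(self, size):
        self.cells = [0] * size
    def __len__(self):
        return len(self.cells)
    def write(self, addr, value):
        self.cells[addr] = value & (~0x01 if addr == 2 else 0xFF)
    def read(self, addr):
        return self.cells[addr]

assert pattern_test(FaultyMemory(8)) == {2}   # the bad cell is isolated
assert pattern_test(FaultyMemory(2)) == set() # a clean region passes
```

Alternating patterns such as 0x55/0xAA flip every bit position, so a bit stuck at either value fails at least one pattern, which is why such patterns are a common choice for initialization tests.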
Run time
All Power Systems servers can monitor critical system components during run time, and they
can take corrective actions when recoverable faults occur. IBM hardware error-check
architecture provides the ability to report non-critical errors in an out-of-band communications
path to the service processor without affecting system performance.
A significant part of IBM runtime diagnostic capabilities originates with the service processor.
Extensive diagnostic and fault analysis routines have been developed and improved over
many generations of POWER processor-based servers, and enable quick and accurate
predefined responses to both actual and potential system problems.
The service processor correlates and processes runtime error information using logic derived
from IBM engineering expertise to count recoverable errors (called thresholding) and predict
when corrective actions must be automatically initiated by the system. These actions can
include:
Requests for a part to be replaced
Dynamic invocation of built-in redundancy for automatic replacement of a failing part
Dynamic deallocation of failing components so that system availability is maintained
Device drivers
In certain cases diagnostics are best performed by operating system-specific drivers, most
notably I/O devices that are owned directly by a logical partition. In these cases, the operating
system device driver often works in conjunction with I/O device microcode to isolate and
recover from problems. Potential problems are reported to an operating system device driver,
which logs the error. I/O devices can also include specific exercisers that can be invoked by
the diagnostic facilities for problem recreation if required by service procedures.
4.3.3 Reporting
In the unlikely event that a system hardware or environmentally induced failure is diagnosed,
IBM Power Systems servers report the error through a number of mechanisms. The analysis
result is stored in system NVRAM. Error log analysis (ELA) can be used to display the failure
cause and the physical location of the failing hardware.
With the integrated service processor, the system has the ability to automatically send out an
alert through a phone line to a pager, or call for service in the event of a critical system failure.
A hardware fault also illuminates the amber system fault LED located on the system unit to
alert the user of an internal hardware problem.
On POWER7 processor-based servers, hardware and software failures are recorded in the
system log. When an HMC is attached, an ELA routine analyzes the error, forwards the event
to the Service Focal Point (SFP) application running on the HMC or SDMC, and has the
capability to notify the system administrator that it has isolated a likely cause of the system
problem. The service processor event log also records unrecoverable checkstop conditions,
forwards them to the Service Focal Point (SFP) application, and notifies the system
administrator. After the information is logged in the SFP application, if the system is properly
configured, a call-home service request is initiated and the pertinent failure data with service
parts information and part locations is sent to the IBM service organization. This information
will also contain the client contact information as defined in the Electronic Service Agent
(ESA) guided set-up wizard.
Error logging and analysis
When the root cause of an error has been identified by a fault isolation component, an error
log entry is created with basic data such as:
An error code uniquely describing the error event
The location of the failing component
The part number of the component to be replaced, including pertinent data such as
engineering and manufacturing levels
Return codes
Resource identifiers
FFDC data
Data containing information about the effect that the repair will have on the system is also
included. Error log routines in the operating system and FSP can then use this information
and decide whether the fault is a call home candidate. If the fault requires support
intervention, then a call will be placed with service and support and a notification will be sent
to the contact defined in the ESA-guided set-up wizard.
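As a rough model of the record described above, an error log entry and the call-home decision might look like the following. Every field name and sample value here is illustrative, not IBM's actual log format:

```python
# Hypothetical sketch of an error-log record and the call-home decision;
# field names and values are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class ErrorLogEntry:
    error_code: str          # uniquely describes the error event
    location_code: str       # physical location of the failing component
    part_number: str         # part to replace, incl. engineering level
    return_codes: list = field(default_factory=list)
    resource_ids: list = field(default_factory=list)
    ffdc_data: bytes = b""
    requires_support: bool = False  # result of effect-of-repair analysis

def is_call_home_candidate(entry: ErrorLogEntry) -> bool:
    # A fault that needs support intervention triggers a call home.
    return entry.requires_support

entry = ErrorLogEntry("EC-0001", "U-LOC-P1-C9", "PN-1234",
                      requires_support=True)
assert is_call_home_candidate(entry)
```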
Remote support
The Resource Monitoring and Control (RMC) subsystem is delivered as part of the base
operating system, including the operating system running on the Hardware Management
Console. RMC provides a secure transport mechanism across the LAN interface between the
operating system and the Hardware Management Console and is used by the operating
system diagnostic application for transmitting error information. It performs a number of other
functions also, but these are not used for the service infrastructure.
Service Focal Point
A critical requirement in a logically partitioned environment is to ensure that errors are not lost
before being reported for service, and that an error is only reported once, regardless of how
many logical partitions experience the potential effect of the error. The Manage Serviceable
Events task on the HMC or SDMC is responsible for aggregating duplicate error reports, and
ensures that all errors are recorded for review and management.
When a local or globally reported service request is made to the operating system, the
operating system diagnostic subsystem uses the Resource Monitoring and Control
Subsystem (RMC) to relay error information to the Hardware Management Console. For
global events (platform unrecoverable errors, for example) the service processor will also
forward error notification of these events to the Hardware Management Console, providing a
redundant error-reporting path in case of errors in the RMC network.
The first occurrence of each failure type is recorded in the Manage Serviceable Events task
on the HMC/SDMC. This task then filters and maintains a history of duplicate reports from
other logical partitions on the service processor. It then looks at all active service event
requests, analyzes the failure to ascertain the root cause and, if enabled, initiates a call home
for service. This methodology ensures that all platform errors will be reported through at least
one functional path, ultimately resulting in a single notification for a single problem.
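The aggregation behavior can be sketched conceptually as follows. This is not HMC code; the class, event, and partition names are invented for illustration:

```python
# Conceptual sketch of duplicate-report aggregation: record the first
# occurrence of each failure type and keep only a history of duplicates
# reported by other logical partitions.
class ServiceableEvents:
    def __init__(self):
        self.open_events = {}  # failure type -> list of reporting partitions

    def report(self, failure_type, partition):
        """Return True only for the first report of this failure type."""
        if failure_type in self.open_events:
            self.open_events[failure_type].append(partition)  # duplicate
            return False
        self.open_events[failure_type] = [partition]
        return True  # first occurrence: the single call-home candidate

sfp = ServiceableEvents()
assert sfp.report("PCI_BUS_UE", "lpar1") is True
assert sfp.report("PCI_BUS_UE", "lpar2") is False  # duplicate, history only
assert sfp.open_events["PCI_BUS_UE"] == ["lpar1", "lpar2"]
```

The design goal the sketch mirrors is a single notification for a single problem, no matter how many partitions observe it.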
Extended error data
Extended error data (EED) is additional data that is collected either automatically at the time
of a failure or manually at a later time. The data collected is dependent on the invocation
method but includes information like firmware levels, operating system levels, additional fault
isolation register values, recoverable error threshold register values, system status, and any
other pertinent data.
The data is formatted and prepared for transmission back to IBM to assist the service support
organization with preparing a service action plan for the service representative or for
additional analysis.
System dump handling
In certain circumstances, an error might require a dump to be automatically or manually
created. In this event, it is off-loaded to the HMC. Specific HMC information is included as
part of the information that can optionally be sent to IBM support for analysis. If additional
information relating to the dump is required, or if it becomes necessary to view the dump
remotely, the HMC dump record notifies the IBM support center regarding on which HMC the
dump is located.
4.3.4 Notifying
After a Power Systems server has detected, diagnosed, and reported an error to an
appropriate aggregation point, it then takes steps to notify the client, and if necessary the IBM
support organization. Depending on the assessed severity of the error and support
agreement, this could range from a simple notification to having field service personnel
automatically dispatched to the client site with the correct replacement part.
Client Notify
When an event is important enough to report, but does not indicate the need for a repair
action or the need to call home to IBM service and support, it is classified as Client Notify.
Clients are notified because these events might be of interest to an administrator. The event
might be a symptom of an expected systemic change, such as a network reconfiguration or
failover testing of redundant power or cooling systems. Examples of these events include:
Network events such as the loss of contact over a local area network (LAN)
Environmental events such as ambient temperature warnings
Events that need further examination by the client (although these events do not
necessarily require a part replacement or repair action)
Client Notify events are serviceable events, by definition, because they indicate that
something has happened that requires client awareness in the event that the client wants
to take further action. These events can always be reported back to IBM at the
client’s discretion.
Call home
A correctly configured POWER processor-based system can initiate an automatic or manual
call from a client location to the IBM service and support organization with error data, server
status, or other service-related information. The call-home feature invokes the service
organization in order for the appropriate service action to begin, automatically opening a
problem report, and in certain cases, dispatching field support. This automated reporting
provides faster and potentially more accurate transmittal of error information. Although
configuring call-home is optional, clients are strongly encouraged to configure this feature to
obtain the full value of IBM service enhancements.
Vital product data and inventory management
Power Systems store vital product data (VPD) internally, which keeps a record of how much
memory is installed, how many processors are installed, manufacturing level of the parts, and
so on. These records provide valuable information that can be used by remote support and
service representatives, enabling them to provide assistance in keeping the firmware and
software on the server up-to-date.
IBM problem management database
At the IBM support center, historical problem data is entered into the IBM Service and
Support Problem Management database. All of the information that is related to the error,
along with any service actions taken by the service representative, is recorded for problem
management by the support and development organizations. The problem is then tracked
and monitored until the system fault is repaired.
4.3.5 Locating and servicing
The final component of a comprehensive design for serviceability is the ability to effectively
locate and replace parts requiring service. POWER processor-based systems use a
combination of visual cues and guided maintenance procedures to ensure that the identified
part is replaced correctly, every time.
Packaging for service
The following service enhancements are included in the physical packaging of the systems to
facilitate service:
Color coding (touch points):
– Terracotta-colored touch points indicate that a component (FRU or CRU) can be
concurrently maintained.
– Blue-colored touch points delineate components that are not concurrently maintained
(those that require the system to be turned off for removal or repair).
Tool-less design: Selected IBM systems support tool-less or simple tool designs. These
designs require either no tools or simple tools such as flathead screwdrivers to service the
hardware components.
Positive retention: Positive retention mechanisms help to ensure proper connections
between hardware components, such as from cables to connectors, and between two
cards that attach to each other. Without positive retention, hardware components run the
risk of becoming loose during shipping or installation, preventing a good electrical
connection. Positive retention mechanisms such as latches, levers, thumb-screws, pop
Nylatches (U-clips), and cables are included to help prevent loose connections and aid in
installing (seating) parts correctly. These positive retention items do not require tools.
Light Path
The Light Path LED feature is for low-end systems, including Power Systems up to models
750 and 755, that may be repaired by clients. In the Light Path LED implementation, when a
fault condition is detected on the POWER7 processor-based system, an amber FRU fault
LED is illuminated, which is then rolled up to the system fault LED. The Light Path system
pinpoints the exact part by turning on the amber FRU fault LED that is associated with the
part to be replaced.
The system can clearly identify components for replacement by using specific
component-level LEDs, and can also guide the servicer directly to the component by
signaling (staying on solid) the system fault LED, enclosure fault LED, and the component
FRU fault LED.
After the repair, the LEDs shut off automatically if the problem is fixed.
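The roll-up from component LED to enclosure and system LEDs can be sketched as follows. The topology, FRU, and LED names are hypothetical, chosen only to illustrate the idea:

```python
# Illustrative sketch of the Light Path roll-up: a fault on a FRU lights the
# FRU fault LED, its enclosure fault LED, and the system fault LED, so the
# servicer can follow the path down to the failing part.
def light_path(fru, topology):
    """Return the set of LEDs lit solid for a fault on `fru`."""
    enclosure = topology[fru]  # enclosure containing the FRU
    return {
        "system_fault",                      # rolled up to the system level
        f"enclosure_fault:{enclosure}",      # intermediate enclosure LED
        f"fru_fault:{fru}",                  # pinpoints the exact part
    }

topology = {"DIMM3": "enclosure0", "fan1": "enclosure1"}
assert light_path("DIMM3", topology) == {
    "system_fault", "enclosure_fault:enclosure0", "fru_fault:DIMM3"}
```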
Guiding Light
Midrange and high-end systems, including models 770 and 780 and later, are usually
repaired by IBM Support personnel.
The enclosure and system identify LEDs turn on solid and can be used to follow the path
from the system to the enclosure and down to the specific FRU.
Guiding Light uses a series of flashing LEDs, allowing a service provider to quickly and
easily identify the location of system components. Guiding Light can also handle
multiple error conditions simultaneously, which might be necessary in certain complex
high-end configurations.
In these situations, Guiding Light waits for the servicer’s indication of what failure to attend
first and then illuminates the LEDs to the failing component.
Data centers can be complex places, and Guiding Light is designed to do more than identify
visible components. When a component might be hidden from view, Guiding Light can flash a
sequence of LEDs that extend to the frame exterior, clearly guiding the service
representative to the correct rack, system, enclosure, drawer, and component.
Service labels
Service providers use these labels to assist them in performing maintenance actions. Service
labels are found in various formats and positions and are intended to transmit readily
available information to the servicer during the repair process.
Several of these service labels and the purpose of each are described in this list:
Location diagrams are strategically located on the system hardware, relating information
regarding the placement of hardware components. Location diagrams can include location
codes, drawings of physical locations, concurrent maintenance status, or other data that is
pertinent to a repair. Location diagrams are especially useful when multiple components
are installed such as DIMMs, CPUs, processor books, fans, adapter cards, LEDs, and
power supplies.
Remove or replace procedure labels contain procedures often found on a cover of the
system or in other spots that are accessible to the servicer. These labels provide
systematic procedures, including diagrams, detailing how to remove and replace certain
serviceable hardware components.
Numbered arrows are used to indicate the order of operation and serviceability
direction of components. Various serviceable parts such as latches, levers, and touch
points must be pulled or pushed in a certain direction and certain order so that the
mechanical mechanisms can engage or disengage. Arrows generally improve the ease
of serviceability.
The operator panel
The operator panel on a POWER processor-based system is a four-row by 16-element LCD
display that is used to present boot progress codes, indicating advancement through the
system power-on and initialization processes. The operator panel is also used to display error
and location codes when an error occurs that prevents the system from booting. It includes
several buttons, enabling a service support representative (SSR) or client to change various
boot-time options and for other limited service functions.
Concurrent maintenance
The IBM POWER7 processor-based systems are designed with the understanding that
certain components have higher intrinsic failure rates than others. The movement of fans,
power supplies, and physical storage devices naturally makes them more susceptible to
wearing down or burning out. Other devices such as I/O adapters can begin to wear from
repeated plugging and unplugging. For these reasons, these devices have been specifically
designed to be concurrently maintainable when properly configured.
In other cases, a client might be in the process of moving or redesigning a data center, or
planning a major upgrade. At times like these, flexibility is crucial. The IBM POWER7
processor-based systems are designed for redundant or concurrently maintainable power,
fans, physical storage, and I/O towers.
The most recent members of the IBM Power Systems family, based on the POWER7
processor, continue to support concurrent maintenance of power, cooling, PCI adapters,
media devices, I/O drawers, GX adapter, and the operator panel. In addition, they support
concurrent firmware fixpack updates when possible. The determination of whether a firmware
fixpack release can be updated concurrently is identified in the readme file that is released
with the firmware.
Firmware updates
System Firmware is delivered as a Release Level or a Service Pack. Release Levels support
the general availability (GA) of new functions and features, and new machine types and
models. Upgrading to a higher Release Level is disruptive to customer operations. IBM
intends to introduce no more than two new Release Levels per year. These Release Levels
will be supported by Service Packs. Service Packs are intended to contain only firmware fixes
and not to introduce new function. A Service Pack is an update to an existing Release Level.
If a system is HMC managed, you use the HMC for firmware updates. Using the HMC
allows you to take advantage of the Concurrent Firmware Maintenance (CFM) option when
concurrent service packs are available. CFM is the IBM term used to describe the IBM Power
Systems firmware updates that can be partially or wholly concurrent or non-disruptive. With
the introduction of CFM, IBM is significantly increasing a client’s opportunity to stay on a
given release level for longer periods of time. Clients wanting maximum stability can defer
until there is a compelling reason to upgrade, such as:
A Release Level is approaching its end-of-service date (that is, it has been available for
about a year and hence will go out of service support soon).
Moving a system to a more standardized Release Level when there are multiple
systems in an environment with similar hardware.
A new release has new functionality that is needed in the environment.
A scheduled maintenance action will cause a platform reboot. This provides an
opportunity to also upgrade to a new firmware release.
The update and upgrade of system firmware is dependent on several factors, such as
whether the system is stand-alone or managed by an HMC, the current firmware installed, and what
operating systems are running on the system. These scenarios and the associated
installation instructions are comprehensively outlined in the firmware section of Fix Central,
which can be found here:
http://www.ibm.com/support/fixcentral/
You may also wish to review the best practice white papers, which can be found here:
http://www14.software.ibm.com/webapp/set2/sas/f/best/home.html
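The concurrent-versus-disruptive distinction described above can be summarized in a small decision sketch. The release-level names are hypothetical, and this is not IBM tooling:

```python
# Illustrative decision sketch: a service pack within the current release
# level can be applied concurrently via CFM when its readme marks it as
# concurrent; moving to a new release level is always disruptive.
def update_is_disruptive(installed_release, target_release, readme_concurrent):
    if target_release != installed_release:
        return True                # release-level upgrade: disruptive
    return not readme_concurrent   # service pack: concurrent only if marked so

assert update_is_disruptive("REL_A", "REL_B", True) is True   # new release
assert update_is_disruptive("REL_A", "REL_A", True) is False  # concurrent SP
assert update_is_disruptive("REL_A", "REL_A", False) is True  # disruptive SP
```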
Repairing and verifying
Repair and verify (R&V) is a system used to guide a service provider step-by-step through the
process of repairing a system and verifying that the problem has been repaired. The steps
are customized in the appropriate sequence for the particular repair for the specific system
being repaired. Repair scenarios covered by repair and verify include:
Replacing a defective field-replaceable unit (FRU)
Reattaching a loose or disconnected component
Correcting a configuration error
Removing or replacing an incompatible FRU
Updating firmware, device drivers, operating systems, middleware components, and IBM
applications after replacing a part
Repair and verify procedures can be used by both service providers who are
familiar with the task and those who are not. Education On Demand content is placed in the
procedure at the appropriate locations. Throughout the repair and verify procedure, repair
history is collected and provided to the Service and Support Problem Management Database
for storage with the serviceable event to ensure that the guided maintenance procedures are
operating correctly.
If a server is managed by an HMC, then many of the R&V procedures are performed from
the HMC. If the FRU to be replaced is a PCI adapter or an internal storage device, then the
service action is always performed from the operating system of the partition owning
that resource.
Clients can subscribe through the subscription services to obtain the notifications about the
latest updates available for service-related documentation. The latest version of the
documentation is accessible through the internet.
4.4 Manageability
Various functions and tools help manageability and enable you to efficiently and effectively
manage your system.
4.4.1 Service user interfaces
The Service Interface allows support personnel or the client to communicate with the service
support applications in a server using a console, interface, or terminal. Delivering a clear,
concise view of available service applications, the Service Interface allows the support team
to manage system resources and service information in an efficient and effective way.
Applications available through the Service Interface are carefully configured and placed to
give service providers access to important service functions.
Various service interfaces are used, depending on the state of the system and its operating
environment. The primary service interfaces are:
Light Path and Guiding Light
For more information, see “Light Path” on page 136 and “Guiding Light” on page 136.
Service processor, Advanced System Management Interface (ASMI)
Operator panel
Operating system service menu
Service Focal Point on the Hardware Management Console
Service Focal Point Lite on Integrated Virtualization Manager
Service processor
The service processor is a controller that is running its own operating system. It is a
component of the service interface card.
The service processor operating system has specific programs and device drivers for the
service processor hardware. The host interface is a processor support interface that is
connected to the POWER processor. The service processor is always working, regardless of
the main system unit’s state. The system unit can be in the following states:
Standby (power off)
Operating, ready-to-start partitions
Operating with running logical partitions
Functions
The service processor is used to monitor and manage the system hardware resources and
devices. The service processor checks the system for errors, ensuring the connection to the
HMC for manageability purposes and accepting Advanced System Management Interface
(ASMI) Secure Sockets Layer (SSL) network connections. The service processor provides
the ability to view and manage the machine-wide settings by using the ASMI, and enables
complete system and partition management from the HMC.
The service processor uses two Ethernet 10/100/1000 Mbps ports. Consider this information:
Both Ethernet ports are visible only to the service processor and can be used to attach the
server to an HMC or to access the ASMI. The ASMI options can be accessed through an
HTTP server that is integrated into the service processor operating environment.
Because of the firmware workload, these ports are supported only at the 10/100 Mbps
rate, although the Ethernet MAC is capable of 1 Gbps.
Both Ethernet ports have a default IP address, as follows:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.147.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.147.
When a redundant service processor is present, these default IP addresses are used:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.146.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.146.
These functions are available through the service processor:
Call Home
Advanced System Management Interface (ASMI)
Error Information (error code, PN, Location Codes) menu
View of guarded components
Limited repair procedures
Generate dump
LED Management menu
Remote view of ASMI menus
Firmware update through USB key
Advanced System Management Interface
Advanced System Management Interface (ASMI) is the interface to the service processor that
enables you to manage the operation of the server, such as auto-power restart, and to view
information about the server, such as the error log and vital product data. Various repair
procedures require connection to the ASMI.
The ASMI is accessible through the HMC. It is also accessible by using a web browser on a
system that is connected directly to the service processor (in this case, either a standard
Ethernet cable or a crossover cable) or through an Ethernet network. ASMI can also be
accessed from an ASCII terminal. Use the ASMI to change the service processor IP
addresses or to apply certain security policies and prevent access from undesired IP
addresses or ranges.
You might be able to use the service processor’s default settings. In that case, accessing the
ASMI is not necessary.
Tip: The service processor enables a system that does not boot to be analyzed. The error
log analysis can be performed from either the ASMI or the HMC.
To access ASMI, use one of the following steps:
Access the ASMI by using an HMC.
If configured to do so, the HMC connects directly to the ASMI for a selected system from
this task.
To connect to the Advanced System Management interface from an HMC, follow
these steps:
a. Open Systems Management from the navigation pane.
b. From the work pane, select one or more managed systems to work with.
c. From the System Management tasks list, select Operations → Advanced System
Management (ASM).
Access the ASMI by using a web browser.
The web interface to the ASMI is accessible through running Microsoft Internet
Explorer 7.0, Opera 9.24, or Mozilla Firefox 2.0.0.11 running on a PC or mobile computer
that is connected to the service processor. The web interface is available during all
phases of system operation, including the initial program load (IPL) and run time.
However, a few of the menu options in the web interface are unavailable during IPL or run
time to prevent usage or ownership conflicts if the system resources are in use during that
phase. The ASMI provides a Secure Sockets Layer (SSL) web connection to the service
processor. To establish an SSL connection, open your browser using this address:
https://<ip_address_of_service_processor>
Where <ip_address_of_service_processor> is the address of the service processor of
your Power Systems server, such as 9.166.196.7.
Access the ASMI using an ASCII terminal.
The ASMI on an ASCII terminal supports a subset of the functions that are provided by the
web interface and is available only when the system is in the platform standby state. The
ASMI on an ASCII console is not available during various phases of system operation,
such as the IPL and run time.
The operator panel
The service processor provides an interface to the operator panel, which is used to display
system status and diagnostic information.
Tip: To make the connection through Internet Explorer, click Tools → Internet Options.
Clear the Use TLS 1.0 check box and click OK.
The operator panel can be accessed in two ways:
By using the normal operational front view
By pulling it out to access the switches and view the LCD display. Figure 4-6 shows the
operator panel on a Power 710 or 730 pulled out.
Figure 4-6 Operator panel pulled out from the chassis (slide the release lever left to release the panel)
The operator panel includes features such as these:
A 2 x 16 character LCD display
Reset, enter, power on/off, increment, and decrement buttons
Amber System Information/Attention, green Power LED
Blue Enclosure Identify LED on the Power 710 and Power 730
Altitude sensor
USB port
Speaker/beeper
The following functions are available through the operator panel:
Error information
Generate dump
View machine type, model, and serial number
Limited set of repair functions
Operating system service menu
The system diagnostics consist of IBM i service tools, stand-alone diagnostics that are loaded
from the DVD drive, and online diagnostics (available in AIX).
Online diagnostics, when installed, are a part of the AIX or IBM i operating system on the disk
or server. They can be booted in single-user mode (service mode), run in maintenance mode,
or run concurrently (concurrent mode) with other applications. They have access to the AIX
error log and the AIX configuration data. IBM i has a service tools problem log, IBM i history
log (QHST), and IBM i problem log.
These are the available modes:
Service mode
Service mode requires a service mode boot of the system. It enables the checking of
system devices and features. Service mode provides the most complete checkout of the
system resources. All system resources, except the SCSI adapter and the disk drives
used for paging, can be tested.
Concurrent mode
Concurrent mode enables the normal system functions to continue while selected
resources are being checked. Because the system is running in normal operation, certain
devices might require additional actions by the user or diagnostic application before
testing can be done.
Maintenance mode
Maintenance mode enables the checking of most system resources. Maintenance mode
provides the same test coverage as service mode. The difference between the two modes
is the way that they are invoked. Maintenance mode requires that all activity on the
operating system be stopped. The shutdown -m command is used to stop all activity on
the operating system and put the operating system into maintenance mode.
The System Management Services (SMS) error log is accessible on the SMS menus.
This error log contains errors that are found by partition firmware when the system or partition
is booting.
You can access the service processor’s error log on the ASMI menus.
You can also access the system diagnostics from an AIX Network Installation Management
(NIM) server.
The IBM i operating system and associated machine code provide Dedicated Service Tools
(DST) as part of the IBM i licensed machine code (Licensed Internal Code) and System
Service Tools (SST) as part of the IBM i operating system. DST can be run in dedicated mode
(no operating system loaded). DST tools and diagnostics are a superset of those available
under SST.
The IBM i End Subsystem (ENDSBS *ALL) command can shut down all IBM and customer
application subsystems except the controlling subsystem QCTL. The Power Down System
(PWRDWNSYS) command can be set to power down the IBM i partition and restart the
partition in DST mode.
You can start SST during normal operations, which leaves all applications up and running,
using the IBM i Start Service Tools (STRSST) command (when signed onto IBM i with the
appropriately secured user ID).
With DST and SST, you can look at various logs, run various diagnostics, or take various
kinds of system dumps or other options.
Tip: When you order a Power Systems server, a DVD-ROM or DVD-RAM might be
optional. An alternate method for maintaining and servicing the system must be available if
you do not order the DVD-ROM or DVD-RAM.
Depending on the operating system, these are the service-level functions that you typically
see when using the operating system service menus:
- Product activity log
- Trace Licensed Internal Code
- Work with communications trace
- Display/Alter/Dump
- Licensed Internal Code log
- Main storage dump manager
- Hardware service manager
- Call Home/Customer Notification
- Error information menu
- LED management menu
- Concurrent/Non-concurrent maintenance (within scope of the OS)
- Managing firmware levels
  - Server
  - Adapter
- Remote support (access varies by OS)
- Service Focal Point on the Hardware Management Console
Service strategies become more complicated in a partitioned environment. The Manage
Serviceable Events task in the HMC can help to streamline this process.
Each logical partition reports errors that it detects and forwards the event to the Service Focal
Point (SFP) application that is running on the HMC, without determining whether other logical
partitions also detect and report the errors. For example, if one logical partition reports an
error for a shared resource, such as a managed system power supply, other active logical
partitions might report the same error.
By using the Manage Serviceable Events task in the HMC, you can avoid long lists of
repetitive call-home information by recognizing that these are repeated errors and
consolidating them into one error.
In addition, you can use the Manage Serviceable Events task to initiate service functions on
systems and logical partitions, including the exchanging of parts, configuring connectivity,
and managing dumps.
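The consolidation that the Manage Serviceable Events task performs can be pictured with a small sketch. This is illustrative only, not IBM code; the partition names, error codes, and resource names below are invented for the example. The idea is that reports of the same error code against the same shared resource from several logical partitions collapse into a single serviceable event that records all of the reporters.

```python
# Hypothetical sketch (not an IBM API): consolidating the same error reported
# by several partitions for one shared resource, so that repeated reports do
# not each trigger a separate call-home record.
from collections import defaultdict

def consolidate(events):
    """Group reports by (error_code, resource); keep one record per fault,
    remembering which partitions reported it."""
    merged = defaultdict(list)
    for partition, error_code, resource in events:
        merged[(error_code, resource)].append(partition)
    return {key: sorted(parts) for key, parts in merged.items()}

# Two partitions report the same fault on a shared power supply (invented data):
reports = [
    ("LPAR1", "B150F22A", "power-supply-1"),
    ("LPAR2", "B150F22A", "power-supply-1"),  # duplicate of the first fault
    ("LPAR2", "B7006971", "io-drawer-2"),
]
merged = consolidate(reports)  # two serviceable events, not three
```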
4.4.2 IBM Power Systems firmware maintenance
The IBM Power Systems Client-Managed Microcode is a methodology that enables you to
manage and install microcode updates on Power Systems and associated I/O adapters.
The system firmware consists of service processor microcode, Open Firmware microcode,
SPCN microcode, and the POWER Hypervisor.
The firmware and microcode can be downloaded and installed either from an HMC, from a
running partition, or from USB port number 1 on the rear of a Power 710 and Power 730, if
that system is not managed by an HMC.
Power Systems has a permanent firmware boot side, or A side, and a temporary firmware
boot side, or B side. New levels of firmware must be installed on the temporary side first to
test the update’s compatibility with existing applications. When the new level of firmware has
been approved, it can be copied to the permanent side.
For access to the initial web pages that address this capability, see the Support for IBM
Systems web page:
http://www.ibm.com/systems/support
For Power Systems, select the Power link. Figure 4-7 shows an example.
Figure 4-7 Support for Power servers web page
Although the content under the Popular links section can change, click the Firmware and
HMC updates link to go to the resources for keeping your system’s firmware current.
If there is an HMC to manage the server, the HMC interface can be used to view the levels of
server firmware and power subsystem firmware that are installed and that are available to
download and install.
Each IBM Power Systems server has the following levels of server firmware and power
subsystem firmware:
Installed level
This level of server firmware or power subsystem firmware has been installed and will be
installed into memory after the managed system is powered off and then powered on. It is
installed on the temporary side of system firmware.
Activated level
This level of server firmware or power subsystem firmware is active and running
in memory.
Accepted level
This level is the backup level of server or power subsystem firmware. You can return to
this level of server or power subsystem firmware if you decide to remove the installed
level. It is installed on the permanent side of system firmware.
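The relationship between the A/B boot sides and the installed, activated, and accepted levels can be sketched as follows. This is a conceptual model only (class and method names are invented, not an IBM interface): new firmware always lands on the temporary side, a power cycle activates it, and accepting it copies the tested level to the permanent side.

```python
# Illustrative model (invented names, not an IBM API) of the temporary (B) and
# permanent (A) firmware sides and the three firmware levels described above.
class FirmwareSides:
    def __init__(self, accepted_level: str):
        self.permanent = accepted_level   # accepted level (A side, backup)
        self.temporary = accepted_level   # installed level (B side)
        self.activated = accepted_level   # level currently running in memory

    def install(self, level: str) -> None:
        """New firmware levels must be installed on the temporary side first."""
        self.temporary = level

    def activate(self) -> None:
        """A power off/on loads the temporary side's level into memory."""
        self.activated = self.temporary

    def accept(self) -> None:
        """Once approved, copy the tested level to the permanent side."""
        self.permanent = self.temporary

srv = FirmwareSides("01AL710_086")
srv.install("01AL710_087")   # installed level: temporary side only
srv.activate()               # activated level: now running in memory
srv.accept()                 # accepted level: copied to the permanent side
```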
IBM provides the Concurrent Firmware Maintenance (CFM) function on selected Power
Systems. This function supports applying nondisruptive system firmware service packs to the
system concurrently (without requiring a reboot operation to activate changes). For systems
that are not managed by an HMC, the installation of system firmware is always disruptive.
The concurrent levels of system firmware can, on occasion, contain fixes that are known as
deferred. These deferred fixes can be installed concurrently but are not activated until the
next IPL. Deferred fixes, if any, will be identified in the Firmware Update Descriptions table of
the firmware document. For deferred fixes within a service pack, only the fixes in the service
pack that cannot be concurrently activated are deferred. Table 4-1 shows the file-naming
convention for system firmware.
Table 4-1 Firmware naming convention

PPNNSSS_FFF_DDD

PP    Package identifier        01, 02
NN    Platform and class        AL = Low End, AM = Mid Range, AS = Blade Server,
                                AH = High End, AP = Bulk Power for IH,
                                AB = Bulk Power for High End
SSS   Release indicator
FFF   Current fix pack
DDD   Last disruptive fix pack
The following example uses the convention:
01AL710_086 = POWER7 Entry Systems Firmware for 8233-E8B and 8236-E8B
An installation is disruptive if either of the following statements is true:
- The release levels (SSS) of the currently installed and new firmware differ.
- The service pack level (FFF) and the last disruptive service pack level (DDD) are equal in the new firmware.
Otherwise, an installation is concurrent if the service pack level (FFF) of the new firmware is
higher than the service pack level currently installed on the system and the conditions for
disruptive installation are not met.
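The naming convention in Table 4-1 and the disruptive/concurrent rules above can be sketched in a few lines. This is an illustrative parser, not an IBM tool; the field split follows the PPNNSSS_FFF_DDD pattern, and the sample level names below are invented for the test.

```python
# Illustrative sketch (not an IBM utility): parse a PPNNSSS_FFF_DDD firmware
# level name and apply the disruptive-installation rules described above.
import re

def parse_level(name: str) -> dict:
    """Split a level such as '01AL710_086_081' into its convention fields
    (the DDD suffix is optional, as in the example '01AL710_086')."""
    m = re.fullmatch(r"(\d{2})([A-Z]{2})(\w{3})_(\d{3})(?:_(\d{3}))?", name)
    if not m:
        raise ValueError(f"not a PPNNSSS_FFF_DDD level: {name}")
    pp, nn, sss, fff, ddd = m.groups()
    return {"package": pp, "platform": nn, "release": sss,
            "fixpack": fff, "last_disruptive": ddd}

def is_disruptive(installed: str, new: str) -> bool:
    """Disruptive if the release (SSS) changes, or if FFF equals DDD in the
    new firmware; otherwise the update can be applied concurrently (assuming
    the new FFF is higher than the installed one)."""
    cur, nxt = parse_level(installed), parse_level(new)
    if cur["release"] != nxt["release"]:
        return True
    return nxt["fixpack"] == nxt["last_disruptive"]
```

For example (invented level names), moving from `01AL710_086_081` to `01AL710_090_081` would be concurrent, while `01AL710_090_090` or any `01AL720_...` level would be disruptive.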
4.4.3 Electronic Services and Electronic Service Agent
IBM has transformed its delivery of hardware and software support services to help you
achieve higher system availability. Electronic Services is a web-enabled solution that offers
an exclusive, no-additional-charge enhancement to the service and support available for IBM
servers. These services provide the opportunity for greater system availability with faster
problem resolution and preemptive monitoring. The Electronic Services solution consists of
two separate, but complementary, elements:
Electronic Services news page
The Electronic Services news page is a single internet entry point that replaces the
multiple entry points that are traditionally used to access IBM internet services and
support. The news page enables you to gain easier access to IBM resources for
assistance in resolving technical problems.
IBM Electronic Service Agent
The Electronic Service Agent is software that resides on your server. It monitors events
and transmits system inventory information to IBM on a periodic, client-defined timetable.
The Electronic Service Agent automatically reports hardware problems to IBM.
Early knowledge about potential problems enables IBM to deliver proactive service that can
result in higher system availability and performance. In addition, information that is collected
through the Service Agent is made available to IBM service support representatives when
they help answer your questions or diagnose problems. Installation and use of IBM Electronic
Service Agent for problem reporting enables IBM to provide better support and service for
your IBM server.
To learn how Electronic Services can work for you, visit:
https://www.ibm.com/support/electronic/portal
4.5 Operating system support for RAS features
Table 4-2 gives an overview of a number of features for continuous availability that are
supported by the various operating systems running on the Power 710, Power 720,
Power 730, and Power 740 systems.
Table 4-2 Operating system support for RAS features
RAS feature                                    AIX 5.3 | AIX 6.1 | AIX 7.1 | IBM i | RHEL 5.7 | RHEL 6.1 | SLES11 SP1
System deallocation of failing components
Dynamic Processor Deallocation X X X X X X X
Dynamic Processor Sparing X X X X X X X
Processor Instruction Retry X X X X X X X
Alternate Processor Recovery X X X X X X X
Partition Contained Checkstop X X X X X X X
Persistent processor deallocation X X X X X X X
GX++ bus persistent deallocation X X X X - - X
PCI bus extended error detection X X X X X X X
PCI bus extended error recovery X X X X Most Most Most
PCI-PCI bridge extended error handling X X X X - - -
Redundant RIO or 12x Channel link X X X X X X X
PCI card hot-swap X X X X X X X
Dynamic SP failover at run-time X X X X X X X
Memory sparing with CoD at IPL time X X X X X X X
Clock failover runtime or IPL X X X X X X X
Memory availability
64-byte ECC code X X X X X X X
Hardware scrubbing X X X X X X X
CRC X X X X X X X
Chipkill X X X X X X X
L1 instruction and data array protection X X X X X X X
L2/L3 ECC & cache line delete X X X X X X X
Special uncorrectable error handling X X X X X X X
Fault detection and isolation
Platform FFDC diagnostics X X X X X X X
Run-time diagnostics X X X X Most Most Most
Storage Protection Keys - X X X - - -
Dynamic Trace X X X X - - X
Operating System FFDC - X X X - - -
Error log analysis X X X X X X X
Service Processor support for:
Built-in-Self-Tests (BIST) for logic and arrays X X X X X X X
Wire tests X X X X X X X
Component initialization X X X X X X X
Serviceability
Boot-time progress indicators X X X X Most Most Most
Electronic Service Agent Call Home from HMC (a) X X X X - - -
Firmware error codes X X X X X X X
Operating system error codes X X X X Most Most Most
Inventory collection X X X X X X X
Environmental and power warnings X X X X X X X
Hot-plug fans, power supplies X X X X X X X
Extended error data collection X X X X X X X
SP call home on non-HMC configurations X X X X - - -
I/O drawer redundant connections X X X X X X X
I/O drawer hot add and concurrent repair X X X X X X X
Concurrent RIO/GX adapter add X X X X X X X
Concurrent cold-repair of GX adapter X X X X X X X
SP mutual surveillance with POWER Hypervisor X X X X X X X
Dynamic firmware update with HMC X X X X X X X
Electronic Service Agent Call Home Application X X X X - - -
Lightpath LEDs X X X X X X X
System dump for memory, POWER Hypervisor, SP X X X X X X X
Infocenter/Systems Support Site service publications X X X X X X X
System Support Site education X X X X X X X
Operating system error reporting to HMC SFP X X X X X X X
RMC secure error transmission subsystem X X X X X X X
Health check scheduled operations with HMC X X X X X X X
Operator panel (real or virtual) X X X X X X X
Concurrent operator panel maintenance X X X X X X X
Redundant HMCs X X X X X X X
Automated server recovery/restart X X X X X X X
High availability clustering support X X X X X X X
Repair and Verify Guided Maintenance X X X X Most Most Most
Concurrent kernel update - X X X X X X
a. Electronic Service Agent via a managed HMC will report platform-level information but not Linux operating system
detected errors.
© Copyright IBM Corp. 2011. All rights reserved.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this paper.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in this
document. Note that some publications referenced in this list might be available in softcopy
only.
IBM BladeCenter PS700, PS701, and PS702 Technical Overview and Introduction,
REDP-4655
IBM BladeCenter PS703 and PS704 Technical Overview and Introduction, REDP-4744
IBM Power 720 and 740 (8202-E4C, 8205-E6C) Technical Overview and Introduction,
REDP-4797
IBM Power 750 and 755 (8233-E8B, 8236-E8C) Technical Overview and Introduction,
REDP-4638
IBM Power 770 and 780 (9117-MMC, 9179-MHC) Technical Overview and Introduction,
REDP-4798
IBM Power 795 (9119-FHB) Technical Overview and Introduction, REDP-4640
IBM PowerVM Virtualization Introduction and Configuration, SG24-7940
IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
IBM PowerVM Live Partition Mobility, SG24-7460
IBM System p Advanced POWER Virtualization (PowerVM) Best Practices, REDP-4194
PowerVM Migration from Physical to Virtual Storage, SG24-7825
IBM System Storage DS8000: Copy Services in Open Environments, SG24-6788
IBM System Storage DS8700 Architecture and Implementation, SG24-8786
PowerVM and SAN Copy Services, REDP-4610
SAN Volume Controller V4.3.0 Advanced Copy Services, SG24-7574
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
ibm.com/redbooks
Other publications
These publications are also relevant as further information sources:
IBM Power Systems Facts and Features POWER7 Blades and Servers
http://www.ibm.com/systems/power/hardware/reports/factsfeatures.html
Specific storage devices supported for Virtual I/O Server
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/datasheet.html
IBM Power 710 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03048usen/POD03048USEN.PDF
IBM Power 720 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03048usen/POD03048USEN.PDF
IBM Power 730 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03050usen/POD03050USEN.PDF
IBM Power 740 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03051usen/POD03051USEN.PDF
IBM Power 750 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03034usen/POD03034USEN.PDF
IBM Power 755 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03035usen/POD03035USEN.PDF
IBM Power 770 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03035usen/POD03035USEN.PDF
IBM Power 780 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03032usen/POD03032USEN.PDF
IBM Power 795 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03053usen/POD03053USEN.PDF
Active Memory Expansion: Overview and Usage Guide
http://public.dhe.ibm.com/common/ssi/ecm/en/pow03037usen/POW03037USEN.PDF
Migration combinations of processor compatibility modes for active Partition Mobility
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7hc3/iphc3pcmcombosact.htm
Advance Toolchain for Linux website
http://www.ibm.com/developerworks/wikis/display/hpccentral/How+to+use+Advance+Toolchain+for+Linux+on+POWER
Online resources
These websites are also relevant as further information sources:
IBM Power Systems Hardware Information Center
http://publib.boulder.ibm.com/infocenter/systems/scope/hw/index.jsp
IBM System Planning Tool website
http://www.ibm.com/systems/support/tools/systemplanningtool/
IBM Fix Central website
http://www.ibm.com/support/fixcentral/
Power Systems Capacity on Demand website
http://www.ibm.com/systems/power/hardware/cod/
Support for IBM Systems website
http://www.ibm.com/support/entry/portal/Overview?brandind=Hardware~Systems~Power
IBM Power Systems website
http://www.ibm.com/systems/power/
IBM Storage website
http://www.ibm.com/systems/storage/
Help from IBM
IBM Support and downloads
ibm.com/support
IBM Global Services
ibm.com/services
REDP-4796-00
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers, and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information:
ibm.com/redbooks
Redpaper™

IBM Power 710 and 730 Technical Overview and Introduction
- Features the 8231-E1C and 8231-E2C based on the latest POWER7 processor technology
- PowerVM and enterprise-level RAS, all in an entry server package
- 2U rack-mount design for midrange performance
This IBM Redpaper publication is a comprehensive guide covering the
IBM Power 710 (8231-E1C) and Power 730 (8231-E2C) servers
supporting AIX, IBM i, and Linux operating systems. The goal of this
paper is to introduce the major innovative Power 710 and Power 730
offerings and their prominent functions, including these:
- The IBM POWER7™ processor, available at frequencies of 3.0 GHz, 3.55 GHz, and 3.7 GHz
- The specialized POWER7 Level 3 cache that provides greater bandwidth, capacity, and reliability
- The 2-port 10/100/1000 Base-TX Ethernet PCI Express adapter included in the base configuration and installed in a PCIe Gen2 x4 slot
- The integrated SAS/SATA controller for HDD, SSD, tape, and DVD; this controller supports built-in hardware RAID 0, 1, and 10
- The latest IBM PowerVM™ virtualization, including PowerVM Live Partition Mobility and PowerVM IBM Active Memory™ Sharing
- Active Memory Expansion technology that provides more usable memory than is physically installed in the system
- IBM EnergyScale™ technology that provides features such as power trending, power saving, power capping, and thermal measurement
Professionals who want to acquire a better understanding of IBM
Power Systems products can benefit from reading this paper.
This paper expands the current set of IBM Power Systems
documentation by providing a desktop reference that offers a detailed
technical description of the Power 710 and Power 730 systems.
Back cover
© Copyright IBM Corp. 2011. All rights reserved.
Redpaper
IBM CICS Event Processing: New Features
in V4.2
Introduction
IBM® CICS® Transaction Server for z/OS® Version 4.1 made it possible to support event
processing for business applications. Event processing can help the business community to
gain insight into how their business processes are performing. Furthermore, event processing
can help companies take advantage of new business opportunities by providing a
non-invasive methodology for enhancing existing business applications.
With the potential offered by event processing support, CICS Transaction Server for z/OS
Version 4.2 enhances this capability with a number of new features. This IBM Redpapers™
publication describes several of the most significant of these new features, such as assured
event emission, separate event processing (EP) adapters, and search facilities to help you
understand the effect of application changes.
For more information about the new feature that provides CICS system events, see the IBM
Redpaper™ Gaining Insight into IBM CICS Systems with Events, REDP-4810, in this series.
Assured event emission
Business events carry business importance to users. In many situations, users can tolerate
the occasional loss of an event, for example, when they are looking for trends and patterns
over a period of time. However, in situations where the user needs to respond to
each and every event, it is vital to make sure that every event that is captured will be emitted
successfully.
CICS TS 4.2 provides assured event emission for your event-critical business solutions.
Assured event emission is achieved by selecting synchronous event emission in the CICS
event binding editor.
Catherine Moxey
Jenny He
With synchronous event emission, event formatting and emission processing are completed
synchronously within the unit of work of the capturing transaction. The unit of work completes
successfully only if the event is emitted. If the event emission fails, or CICS itself fails, the
capturing transaction will be abended.
Synchronous emission mode is used when the capturing unit of work can only be regarded as
successful if it can be assured that the event has been emitted. With asynchronous event
emission, the event is queued for asynchronous processing by a separate event processing
(EP) dispatcher thread, which reduces the overhead of the capturing transaction, but means
that in those rare situations where the event fails to be emitted, the capturing transaction will
still complete successfully.
If a synchronous event is not emitted, messages are produced in the CICS log to explain why,
statistics are updated, and the unit of work for the capturing transaction is backed out. Without
assured event emission, if an event fails to be emitted, CICS still provides information and
statistics about the failure, but the capturing unit of work continues successfully and the event
is discarded.
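The behavioral difference between the two emission modes can be modeled in a short sketch. This is purely illustrative Python, not CICS code, and it simplifies the asynchronous case (in CICS the event is queued to a separate EP dispatcher thread; here a late failure simply means the event is lost while the transaction commits).

```python
# Illustrative model (not CICS code) of synchronous vs asynchronous event
# emission: a synchronous emission failure backs out the capturing unit of
# work, while an asynchronous failure leaves the transaction committed but
# discards the event.
def run_unit_of_work(emit, synchronous: bool):
    """Return (committed, event_emitted) for one capturing transaction."""
    try:
        emit()
        return True, True
    except RuntimeError:
        if synchronous:
            return False, False   # emission failure abends the transaction
        return True, False        # async: transaction succeeds, event is lost

def failing_emit():
    raise RuntimeError("transport unavailable")

sync_result = run_unit_of_work(failing_emit, synchronous=True)    # (False, False)
async_result = run_unit_of_work(failing_emit, synchronous=False)  # (True, False)
```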
Emission mode for the event is specified in the advanced options when defining an EP
adapter using the CICS event binding editor. Figure 1 shows how synchronous emission
mode is specified. You can also specify whether an event is transactional or not. For more
information about emission and transactional modes, see the information about event
processing adapters in the CICS TS 4.2 Information Center.
Figure 1 Specifying emission mode and transactional mode in the CICS event binding editor
The following examples illustrate situations in which either synchronous or asynchronous
events would be used.
Example for asynchronous events
One example of when asynchronous events are appropriate is an event for bank account
transactions that exceed $1,000, used to monitor the frequency of large transactions in a day
or to recognize high-value customers. The emission of such events is entirely separate from
the application that processes the bank account transactions, which is not dependent on
whether the events are emitted successfully.
Example for synchronous events
Using an example of withdrawal of money from a bank account, a synchronous event can be
set up so that, if the balance is lower than the amount to be withdrawn, an event is emitted
immediately within the same money withdrawal transaction to trigger a balance transfer
process. In this example, the event is used to drive additional processing, so if the event
emission fails, the withdrawal of money should not be allowed to complete.
With assured event emission and a suitable choice of EP adapter, you can choose the level of
integrity between the business application and the emission and transportation of the event.
Assuring your event emission provides the opportunity to build business-critical, event-based
applications, and extend existing applications in as reliable a way as if you had, for example,
added an MQPUT statement to an application to write an event, but without actually changing
the application. The trade-off is that the synchronous processing that is essential to assuring
that events are emitted might have an impact on your application response time. The
synchronous event emission can also change the behavior of a transaction because an event
emission failure causes the application to back out. A judicious use of synchronous event
emission minimizes the application impact.
It is worth noting that all system events (as discussed in Gaining Insight into IBM CICS
Systems with Events, REDP-4810) are emitted asynchronously to minimize the impact to the
system.
Search facility
With the growing number of event bindings created in your enterprise, it becomes essential to
understand the impact of proposed changes to your CICS applications and systems on the
event specified within these event bindings. You might say, “We emit events from a number of
our applications. If we ever need to make a change to any of those applications, how will we
know which event capture specifications are affected and might need updating?” To help
answer this question, the event processing search facility was made available in CICS TS 4.2.
The event processing search facility is a custom search tool provided in the IBM CICS
Explorer®. This facility searches event bindings and EP adapters in the CICS Explorer
workspace or those installed in a CICSplex that Explorer is connected to. You can search
either by resource name or by variable, structure, and copybook name of an imported
language structure. The search facility will find all capture specifications which match the
searched string. Based on the search result, you can decide which event bindings must
change.
For example, if you need to change a program named EVPROC02, as shown in Figure 2,
using the EP search facility, you can search for the string EVPROC02 with the resource
type of PROGRAM. The scope for the search is the installed event bindings in a CICSplex
SDAYPEG to which the CICS Explorer is connected.
Figure 2 EP search facility
Figure 3 shows the event bindings that contain 'EVPROC02' after performing the search.
Expanding each event binding, you can see the details.
Figure 3 EP search results
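Conceptually, the search walks the installed event bindings and returns every capture specification that references the searched resource. The sketch below is a hypothetical illustration of that idea, not the CICS Explorer implementation; the capture specification names (cs1, cs2, cs7) and the second event binding are invented, while EVPROC02 and writetsq1_evb come from the examples in this paper.

```python
# Hypothetical sketch (invented data model, not the CICS Explorer API): find
# every capture specification that references a named resource of a given type.
capture_specs = [
    {"event_binding": "writetsq1_evb", "capture_spec": "cs1",
     "resource_type": "PROGRAM", "resource_name": "EVPROC02"},
    {"event_binding": "payments_evb", "capture_spec": "cs7",
     "resource_type": "PROGRAM", "resource_name": "PAYPROG1"},
    {"event_binding": "writetsq1_evb", "capture_spec": "cs2",
     "resource_type": "FILE", "resource_name": "EVPROC02"},
]

def ep_search(specs, resource_type, name):
    """Return the capture specifications that reference the named resource,
    so you can decide which event bindings a program change would affect."""
    return [s for s in specs
            if s["resource_type"] == resource_type
            and s["resource_name"] == name]

# Searching for PROGRAM EVPROC02 matches only cs1, not the FILE reference:
hits = ep_search(capture_specs, "PROGRAM", "EVPROC02")
```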
Separate EP adapters
Another new feature provided by CICS TS 4.2 is the separation of EP adapter definitions from
event bindings. This makes it easier to change the configuration details for how events are to
be formatted and emitted, and to reuse the same configuration details for events in multiple
event bindings.
With this feature, in the CICS event binding editor, you can define a CICS Event Processing
Adapter resource, as separate from a CICS Event Binding resource, as shown in Figure 4.
Figure 4 Separate EP adapter and event binding definition menu options
For example, you can define an IBM WebSphere® MQ EP adapter resource called
mq_queue, as shown in Figure 5 under the resource bundle adapterbun1. In an event binding
writetsq1_evb, you can then select the option Use a predefined EPADAPTER resource and
provide the name of mq_queue as the adapter definition. The same EP adapter can be
specified for as many other event bindings as required.
Figure 5 Choosing a separate EPADAPTER resource for an event binding
With the separate EP adapter configuration, you can share the same EP adapter resource
among a number of event bindings while having only one EP adapter resource to manage.
You can also create several types of EP adapter resources and use each of them for specific
emission and delivery requirements.
For example, you can specify one WebSphere MQ EP adapter destined for WebSphere MQ
queue A with emission mode of asynchronous and non-transactional, and another
WebSphere MQ EP adapter destined for WebSphere MQ queue B with emission mode of
synchronous and transactional. According to the business requirements of the events, you
can then choose the appropriate EP adapter for each event binding.
Other event processing features in CICS TS 4.2
Although we have described three of the new EP features available in CICS TS 4.2, other new
features are also available.
A number of additional data types are now supported by event processing, such as short and
long floating point numbers and COBOL zoned decimal data types. For a complete list of the
supported data types, see the CICS TS 4.2 Information Center.
The CICS Temporary Storage (TS) queue EP adapter now supports the XML event formats of
common base event, common base event REST, and WebSphere Business Events (WBE),
adding to its value by providing a way of quickly testing to ensure that the right events are
being produced.
The EXEC CICS INQUIRE CAPTURESPEC command provides new options for system
programmers and developers to determine information about any primary predicate or
application context filters. The new EXEC CICS INQUIRE CAPOPTPRED, CAPDATAPRED, and
CAPINFOSRCE commands can be used to determine information about application command
options, application data predicates, and information sources that are specified in a given
capture specification.
You can now define capture specifications to emit events from the file and temporary storage
commands issued by the CICS Atom support and the EXEC CICS LINK commands that are
issued by the CICS-WebSphere MQ bridge program.
Conclusion
In conclusion, CICS TS 4.2 enhances event processing support through both runtime
offerings and improved tooling. All of these enable you to make the best use of event
processing and move toward event-driven business in a fast-changing world.
The team who wrote this paper
This paper was produced by a team of specialists from around the world working at the
International Technical Support Organization, Hursley Center.
Catherine Moxey is an IBM Senior Technical Staff Member in CICS Strategy and Planning,
based at IBM Hursley near Winchester, UK. Catherine has more than 20 years' development
experience with IBM, primarily in CICS and System z®, but also in Web services
technologies. She is the architect for the event processing support in CICS Transaction
Server for z/OS.
Jenny He is a Software Engineer in the CICS Transaction Server product suite at Hursley, UK.
Jenny has worked at IBM for nine years and has a broad experience in software development,
including Eclipse plug-in development, mainframe product development, agile development
process, solution building and testing for IBM Business Process Management product suite,
and authoring IBM Redbooks®. Jenny holds a Bachelor of Engineering degree in Electronics
from South China University of Technology, and a PhD in optical networking from University of
Essex, UK.
Thanks to the following people for their contributions to this project:
Chris Rayns
International Technical Support Organization, Poughkeepsie Center
Now you can become a published author, too!
Here's an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Stay connected to IBM Redbooks
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on Twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by
GSA ADP Schedule Contract with IBM Corp.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Redpaper™
This document REDP-4809-00 was created or updated on December 8, 2011.
Send us your comments in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400 U.S.A.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
CICS Explorer®
CICS®
IBM®
Redbooks®
Redpaper™
Redpapers™
Redbooks (logo) ®
System z®
WebSphere®
z/OS®
The following terms are trademarks of other companies:
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel
SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.
Front cover
IBM Power 770 and 780
Technical Overview
and Introduction
Alexandre Bicas Caldeira
Carlo Costantini
Steve Harnett
Volker Haug
Craig Watson
Fabien Willmann
Features the 9117-MMC and 9179-MHC based on
the latest POWER7 processor technology
Describes MaxCore and TurboCore for
redefining performance
Discusses Active Memory
Mirroring for Hypervisor
International Technical Support Organization
IBM Power 770 and 780 Technical Overview and
Introduction
December 2011
REDP-4798-00
© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
First Edition (December 2011)
This edition applies to the IBM Power 770 (9117-MMC) and Power 780 (9179-MHC) Power Systems servers.
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
The team who wrote this paper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Chapter 1. General description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Systems overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 IBM Power 770 server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 IBM Power 780 server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Operating environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Physical package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 System features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.1 Power 770 system features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.2 Power 780 system features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.3 Minimum features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.4 Power supply features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.5 Processor card features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.6 Summary of processor features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.7 Memory features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5 Disk and media features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.6 I/O drawers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6.1 PCI-DDR 12X Expansion Drawers (#5796) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6.2 12X I/O Drawer PCIe (#5802 and #5877) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.6.3 EXP 12S SAS Drawer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.6.4 EXP 24S SFF Gen2-bay Drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.6.5 I/O drawers and usable PCI slot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.7 Comparison between models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.8 Build to Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.9 IBM Editions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.10 Model upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.11 Hardware Management Console models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.12 System racks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.12.1 IBM 7014 model T00 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.12.2 IBM 7014 model T42 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.12.3 IBM 7014 model S25 rack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.12.4 Feature number 0555 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.12.5 Feature number 0551 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.12.6 Feature number 0553 rack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.12.7 The AC power distribution unit and rack content . . . . . . . . . . . . . . . . . . . . . . . . 31
1.12.8 Rack-mounting rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.12.9 Useful rack additions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Chapter 2. Architecture and technical overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.1 The IBM POWER7 processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.1.1 POWER7 processor overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.1.2 POWER7 processor core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.1.3 Simultaneous multithreading. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1.4 Memory access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.1.5 Flexible POWER7 processor packaging and offerings . . . . . . . . . . . . . . . . . . . . . 44
2.1.6 On-chip L3 cache innovation and Intelligent Cache . . . . . . . . . . . . . . . . . . . . . . . 46
2.1.7 POWER7 processor and Intelligent Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.1.8 Comparison of the POWER7 and POWER6 processors . . . . . . . . . . . . . . . . . . . 47
2.2 POWER7 processor cards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.1 Two-socket processor card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2.2 Four-socket processor card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.2.3 Processor comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.3 Memory subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.1 Fully buffered DIMM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.2 Memory placement rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.3.3 Memory throughput. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.3.4 Active Memory Mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.4 Capacity on Demand. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.4.1 Capacity Upgrade on Demand (CUoD). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.4.2 On/Off Capacity on Demand (On/Off CoD). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.4.3 Utility Capacity on Demand (Utility CoD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.4.4 Trial Capacity On Demand (Trial CoD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.4.5 Software licensing and CoD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.5 CEC Enclosure interconnection cables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
2.6 System bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.6.1 I/O buses and GX++ card . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.6.2 Flexible Service Processor bus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.7 Internal I/O subsystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.7.1 Blind-swap cassettes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.7.2 System ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2.8 PCI adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.8.1 PCIe Gen1 and Gen2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.8.2 PCI-X adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.8.3 IBM i IOP adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.8.4 PCIe adapter form factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.8.5 LAN adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.8.6 Graphics accelerator adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.8.7 SCSI and SAS adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.8.8 iSCSI adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.8.9 Fibre Channel adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.8.10 Fibre Channel over Ethernet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.8.11 InfiniBand Host Channel adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.8.12 Asynchronous adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.9 Internal storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.9.1 Dual split backplane mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.9.2 Triple split backplane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.9.3 Dual storage IOA configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.9.4 DVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.10 External I/O subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.10.1 PCI-DDR 12X Expansion drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.10.2 12X I/O Drawer PCIe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.10.3 Dividing SFF drive bays in 12X I/O drawer PCIe . . . . . . . . . . . . . . . . . . . . . . . . 87
2.10.4 12X I/O Drawer PCIe and PCI-DDR 12X Expansion Drawer 12X cabling . . . . . 90
2.10.5 12X I/O Drawer PCIe and PCI-DDR 12X Expansion Drawer SPCN cabling . . . 91
2.11 External disk subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.11.1 EXP 12S Expansion Drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.11.2 EXP24S SFF Gen2-bay Drawer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.11.3 TotalStorage EXP24 disk drawer and tower . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.11.4 IBM TotalStorage EXP24 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.11.5 IBM System Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.12 Hardware Management Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.12.1 HMC functional overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.12.2 HMC connectivity to the POWER7 processor-based systems . . . . . . . . . . . . . 102
2.12.3 High availability using the HMC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.13 IBM Systems Director Management Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
2.14 Operating system support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.14.1 Virtual I/O Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.14.2 IBM AIX operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.14.3 IBM i operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.14.4 Linux operating system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.14.5 Java supported versions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.14.6 Boosting performance and productivity with IBM compilers . . . . . . . . . . . . . . . 112
2.15 Energy management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
2.15.1 IBM EnergyScale technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
2.15.2 Thermal power management device card. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Chapter 3. Virtualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.1 POWER Hypervisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
3.2 POWER processor modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
3.3 Active Memory Expansion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.4 PowerVM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.4.1 PowerVM editions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
3.4.2 Logical partitions (LPARs). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
3.4.3 Multiple Shared Processor Pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.4.4 Virtual I/O Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
3.4.5 PowerVM Live Partition Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
3.4.6 Active Memory Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3.4.7 Active Memory Deduplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.4.8 N_Port ID virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
3.4.9 Operating system support for PowerVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
3.4.10 POWER7 Linux programming support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
3.5 System Planning Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Chapter 4. Continuous availability and manageability . . . . . . . . . . . . . . . . . . . . . . . . 153
4.1 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.1.1 Designed for reliability. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.1.2 Placement of components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
4.1.3 Redundant components and concurrent repair. . . . . . . . . . . . . . . . . . . . . . . . . . 156
4.2 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
4.2.1 Partition availability priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.2.2 General detection and deallocation of failing components . . . . . . . . . . . . . . . . . 157
4.2.3 Memory protection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.2.4 Active Memory Mirroring for Hypervisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.2.5 Cache protection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.2.6 Special uncorrectable error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.2.7 PCI enhanced error handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.2.8 POWER7 I/O chip freeze behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
4.3 Serviceability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.3.1 Detecting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.3.2 Diagnosing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.3.3 Reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.3.4 Notifying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.3.5 Locating and servicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
4.4 Manageability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.4.1 Service user interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
4.4.2 IBM Power Systems firmware maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
4.4.3 Electronic Services and Electronic Service Agent . . . . . . . . . . . . . . . . . . . . . . . 190
4.5 Operating system support for RAS features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in
other operating environments may vary significantly. Some measurements may have been made on development-level
systems and there is no guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurement may have been estimated through extrapolation. Actual results may vary. Users of this
document should verify the applicable data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
Active Memory™
AIX®
BladeCenter®
DS8000®
Electronic Service Agent™
EnergyScale™
Focal Point™
IBM Systems Director Active Energy
Manager™
IBM®
Micro-Partitioning®
POWER Hypervisor™
Power Systems™
Power Systems Software™
POWER6+™
POWER6®
POWER7®
PowerHA®
PowerPC®
PowerVM®
Power®
POWER®
pSeries®
Rational Team Concert™
Rational®
Redbooks®
Redpaper™
Redbooks (logo) ®
Storwize®
System Storage®
System x®
System z®
Tivoli®
XIV®
The following terms are trademarks of other companies:
Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States and other countries.
LTO, Ultrium, the LTO Logo and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S.
and other countries.
Microsoft, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel
SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
© Copyright IBM Corp. 2011. All rights reserved.
Preface
This IBM® Redpaper™ publication is a comprehensive guide covering the IBM Power® 770
(9117-MMC) and Power 780 (9179-MHC) servers supporting IBM AIX®, IBM i, and Linux
operating systems. The goal of this paper is to introduce the major innovative Power 770 and
780 offerings and their prominent functions, including these:
The IBM POWER7® processor, available at frequencies of 3.3 GHz, 3.44 GHz, 3.72 GHz,
3.92 GHz, and 4.14 GHz
The specialized IBM POWER7 Level 3 cache that provides greater bandwidth, capacity,
and reliability
The 1 Gb or 10 Gb Integrated Multifunction Card that provides two USB ports, one
serial port, and four Ethernet connectors for a processor enclosure and does not require a
PCI slot
The new Active Memory™ Mirroring (AMM) for Hypervisor feature that mirrors the main
memory used by the firmware
IBM PowerVM® virtualization, including PowerVM Live Partition Mobility and PowerVM
Active Memory Sharing
Active Memory Expansion that provides more usable memory than what is physically
installed on the system
IBM EnergyScale™ technology that provides features such as power trending,
power-saving, capping of power, and thermal measurement
Enterprise-ready reliability, serviceability, and availability
Professionals who want to acquire a better understanding of IBM Power Systems™ products
should read this Redpaper publication. The intended audience includes the following:
Clients
Sales and marketing professionals
Technical support professionals
IBM Business Partners
Independent software vendors
This Redpaper publication expands the current set of IBM Power Systems documentation by
providing a desktop reference that offers a detailed technical description of the Power 770
and Power 780 systems.
This paper does not replace the latest marketing materials and configuration tools. It is
intended as an additional source of information that, together with existing sources, can be
used to enhance your knowledge of IBM server solutions.
The team who wrote this paper
This paper was produced by a team of specialists from around the world working at the
International Technical Support Organization, Poughkeepsie Center.
Alexandre Bicas Caldeira works on the Power Systems Field Technical Sales Support
Team for IBM Brazil. He holds a degree in computer science from the Universidade Estadual
Paulista (UNESP). Alexandre has more than 11 years of experience working for IBM and IBM
Business Partners on Power Systems hardware, AIX, and PowerVM virtualization products.
He is also skilled on IBM System Storage®, IBM Tivoli® Storage Manager, IBM System x®,
and VMware.
Carlo Costantini is a Certified IT Specialist for IBM and has over 33 years of experience
with IBM and IBM Business Partners. He currently works in Italy Power Systems Platforms
as Presales Field Technical Sales Support for IBM Sales Representatives and IBM
Business Partners. Carlo has broad marketing experience, and his current areas of focus
are competition, sales, and technical sales support. He is a Certified Specialist for
Power Systems servers. He holds a master’s degree in Electronic Engineering from
Rome University.
Steve Harnett is a Senior Accredited Professional, Chartered IT Professional, and a member
of the British Computer Society. He works as a pre-sales Technical Consultant in the IBM
Server and Technology Group in the UK. Steve has over 16 years of experience working in
post-sales support of Power Systems. He is a product Topgun and a recognized SME in
Electronic Service Agent™, Hardware Management Console, and high-end Power Systems.
He also has several years of experience in developing and delivering education to clients,
IBM Business Partners, and IBMers.
Volker Haug is a certified Consulting IT Specialist within the IBM Systems and Technology
Group based in Ehningen, Germany. He holds a bachelor's degree in business management
from the University of Applied Studies in Stuttgart. His career has included more than 24
years working in the IBM PLM and Power Systems divisions as a RISC and AIX Systems
Engineer. Volker is an expert in Power Systems hardware, AIX, and PowerVM virtualization.
He is a POWER7 Champion and a member of the German Technical Expert Council, an
affiliate of the IBM Academy of Technology. He has written several books and white papers
about AIX, workstations, servers, and PowerVM virtualization.
Craig Watson has 15 years of experience working with UNIX-based systems in roles
including field support, systems administration, and technical sales. He has worked in the
IBM Systems and Technology group since 2003 and is currently working as a Systems
Architect, designing complex solutions for customers that include Power Systems, System x,
and System Storage. He holds a master’s degree in electrical and electronic engineering
from the University of Auckland.
Fabien Willmann is an IT Specialist working with Techline Power Europe in France. He
has 10 years of experience with Power Systems, AIX, and PowerVM virtualization. After
teaching hardware courses about Power Systems servers, he joined ITS as an AIX
consultant, where he developed his competencies in AIX, HMC management, and
PowerVM virtualization. Building new Power Systems configurations for STG presales is his
major area of expertise today. He recently gave a workshop on the econfig configuration tool,
focused on POWER7 processor-based BladeCenters during the symposium for French
Business Partners in Montpellier.
The project that produced this publication was managed by:
Scott Vetter, IBM Certified Project Manager and PMP.
Thanks to the following people for their contributions to this project:
Larry Amy, Gary Anderson, Sue Beck, Terry Brennan, Pat Buckland, Paul D. Carey,
Pete Heyrman, John Hilburn, Dan Hurlimann, Kevin Kehne, James Keniston, Jay Kruemcke,
Robert Lowden, Hilary Melville, Thoi Nguyen, Denis C. Nizinski, Pat O’Rourke, Jan Palmer,
Ed Prosser, Robb Romans, Audrey Romonosky, Todd Rosedahl, Melanie Steckham,
Ken Trusits, Al Yanes
IBM U.S.A.
Stephen Lutz
IBM Germany
Tamikia Barrow
International Technical Support Organization, Poughkeepsie Center
Now you can become a published author, too!
Here’s an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our papers to be as helpful as possible. Send us your comments about this paper or
other IBM Redbooks® publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Stay connected to IBM Redbooks
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on Twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
Chapter 1. General description
The IBM Power 770 (9117-MMC) and IBM Power 780 servers (9179-MHC) utilize the latest
POWER7 processor technology designed to deliver unprecedented performance, scalability,
reliability, and manageability for demanding commercial workloads.
The innovative IBM Power 770 and Power 780 servers with POWER7 processors are
symmetric multiprocessing (SMP), rack-mounted servers. These modular systems are built
from one to four enclosures. Each enclosure is four EIA units (4U) tall and is housed in a
19-inch rack.
New to the Power 770 and Power 780 models are two new, powerful POWER7 processor
cards. Each drawer contains one of these processor cards and an enhanced POWER7
I/O backplane.
1.1 Systems overview
The following sections provide detailed information about the Power 770 and Power 780
systems.
1.1.1 IBM Power 770 server
Each Power 770 processor card features 64-bit architecture designed with two single-chip
module (SCM) POWER7 processors. Each POWER7 SCM provides either six or eight active
processor cores with 2 MB of L2 cache (256 KB per core), 24 MB of L3 cache (4 MB
per core) for the 6-core SCM, or 32 MB of L3 cache (4 MB per core) for the 8-core SCM.
A Power 770 server using 6-core SCM processors enables up to 48 processor cores
running at 3.72 GHz. A system configured with up to four CEC enclosures using 8-core
SCM processors enables up to 64 processor cores running at frequencies up to 3.30 GHz.
The Power 770 server is available with as few as four active cores, incrementing one core
at a time through built-in Capacity on Demand (CoD) functions to a maximum of 64
active cores.
A single Power 770 CEC enclosure is equipped with 16 DIMM slots running at speeds up to
1066 MHz. A system configured with four drawers and 64 GB DDR3 DIMMs supports up to a
maximum of 4.0 TB of DDR3 memory. All POWER7 DDR3 memory uses memory
architecture that provides increased bandwidth and capacity. This enables operating at a
higher data rate for large memory configurations.
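The 4.0 TB maximum likewise falls out of the DIMM-slot arithmetic; a quick sketch (helper name is hypothetical):

```python
# Each CEC enclosure provides 16 DIMM slots; the largest DIMM is 64 GB
# (feature #5564 ships as a 4 x 64 GB pack).
def max_memory_tb(enclosures, dimm_slots=16, dimm_gb=64):
    return enclosures * dimm_slots * dimm_gb / 1024

print(max_memory_tb(4))  # 4 drawers x 16 slots x 64 GB = 4.0 TB
```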
The Power 770 has two new integrated POWER7 I/O controllers that enhance I/O
performance while supporting a maximum of six internal PCIe adapters and six internal small
form-factor SAS DASD bays.
The Power 770 features Active Memory Mirroring (AMM) for Hypervisor, which is available as
an optional feature. AMM guards against system-wide outages due to any uncorrectable error
associated with firmware. Also available as an option is Active Memory Expansion, which
enhances memory capacity.
Figure 1-1 shows a Power 770 with the maximum four enclosures, and the front and rear
views of a single-enclosure Power 770.
Figure 1-1 Four-enclosure Power 770, a single-enclosure Power 770 front and rear views
1.1.2 IBM Power 780 server
Each Power 780 processor card comprises either two single-chip module (SCM) POWER7
processors or four SCM POWER7 processors, each designed with 64-bit architecture. Each
POWER7 SCM provides either six or eight active processor cores with 2 MB of L2 cache
(256 KB per core), 24 MB of L3 cache (4 MB per core) for the 6-core SCM, or 32 MB of L3
cache (4 MB per core) for the 8-core SCM.
For the Power 780, each POWER7 SCM processor is available at frequencies of 3.44 GHz
with six cores, 3.92 GHz with eight cores, or 4.14 GHz with four cores. The Power 780 server
is available starting as low as four active cores and incrementing one core at a time through
built-in Capacity on Demand (CoD) functions to a maximum of 96 active cores.
The Power 780 offers the unique ability to switch between its standard, throughput-optimized
mode and TurboCore mode. In TurboCore mode, performance per core is boosted with access
to both additional cache and a higher clock speed. Depending on the configuration option
selected, any Power 780 system can be booted in standard mode, enabling up to 64 processor
cores running at 3.92 GHz, or in TurboCore mode, enabling up to 32 processor cores running
at 4.14 GHz with twice the cache per core.
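To make the trade-off concrete, the two boot modes can be compared by aggregate clock rate. This is an illustrative back-of-the-envelope comparison only, not an IBM performance metric; real throughput depends on cache, memory bandwidth, and workload:

```python
# Standard mode: more cores at a lower frequency (throughput-oriented).
# TurboCore mode: half the cores at a higher frequency, with twice the
# cache per core benefiting per-thread performance.
def aggregate_ghz(cores, ghz_per_core):
    return cores * ghz_per_core

print(round(aggregate_ghz(64, 3.92), 2))  # standard mode: 250.88
print(round(aggregate_ghz(32, 4.14), 2))  # TurboCore mode: 132.48
```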
A single Power 780 CEC enclosure is equipped with 16 DIMM slots running at speeds up to
1066 MHz. A system configured with four drawers and 64 GB DDR3 DIMMs supports up to a
maximum of 4.0 TB of DDR3 memory. All POWER7 DDR3 memory uses memory
architecture that provides increased bandwidth and capacity. This enables operating at a
higher data rate for large memory configurations.
Power 770 with 4 enclosures Power 770 single enclosure front view
Power 770 single enclosure rear view
Note: TurboCore mode is not supported on the 6-core processor card.
The Power 780 has two new integrated POWER7 I/O controllers that enhance I/O
performance while supporting a maximum of six internal PCIe adapters and six internal small
form-factor SAS DASD bays.
The Power 780 features AMM for Hypervisor, which is available as a standard feature.
AMM guards against system-wide outages due to any uncorrectable error associated with
firmware. Also available as an option is Active Memory Expansion, which enhances
memory capacity.
1.2 Operating environment
Table 1-1 lists the operating environment specifications for the servers.
Table 1-1 Operating environment for Power 770 and Power 780 (for one enclosure only)
Power 770 and Power 780 operating environment
Description: Operating / Non-operating
Temperature: 5 - 35 degrees C (41 - 95 degrees F) / 5 - 45 degrees C (41 - 113 degrees F)
Relative humidity: 20 - 80% / 8 - 80%
Maximum dew point: 29 degrees C (84 degrees F) / 28 degrees C (82 degrees F)
Operating voltage: 200 - 240 V AC / Not applicable
Operating frequency: 50 - 60 +/- 3 Hz / Not applicable
Power consumption: Power 770: 1,600 watts maximum (per enclosure with 16 cores active); Power 780: 1,900 watts maximum (per enclosure with 24 cores active) / Not applicable
Power source loading: Power 770: 1.649 kVA maximum (per enclosure with 16 cores active); Power 780: 1.959 kVA maximum (per enclosure with 24 cores active) / Not applicable
Thermal output: Power 770: 5,461 Btu/hr maximum (per enclosure with 16 cores active); Power 780: 6,485 Btu/hr maximum (per enclosure with 24 cores active) / Not applicable
Maximum altitude: 3048 m (10,000 ft) / Not applicable
1.3 Physical package
Table 1-2 lists the physical dimensions of an individual enclosure. Both servers are available
only in a rack-mounted form factor. They are modular systems that can be constructed from
one to four building-block enclosures. Each enclosure takes 4U (EIA units) of rack space;
thus, a two-enclosure system requires 8U, three enclosures require 12U, and four enclosures
require 16U.
Power 770 and Power 780 operating environment (continued)
Noise level, one enclosure: Power 770 (16 active cores): 7.1 bels (operating/idle), 6.6 bels with acoustic rack doors; Power 780 (24 active cores): 7.1 bels (operating/idle), 6.6 bels with acoustic rack doors
Noise level, four enclosures: Power 770 (64 active cores): 7.6 bels (operating/idle), 7.1 bels with acoustic rack doors; Power 780 (96 active cores): 7.6 bels (operating/idle), 7.1 bels with acoustic rack doors
Table 1-2 Physical dimensions of a Power 770 and Power 780 enclosure
Dimension: Power 770 (Model 9117-MMC) single enclosure / Power 780 (Model 9179-MHC) single enclosure
Width: 483 mm (19.0 in) / 483 mm (19.0 in)
Depth: 863 mm (32.0 in) / 863 mm (32.0 in)
Height: 174 mm (6.85 in), 4U (EIA units) / 174 mm (6.85 in), 4U (EIA units)
Weight: 70.3 kg (155 lb) / 70.3 kg (155 lb)
Figure 1-2 shows the front and rear views of the Power 770 and Power 780.
Figure 1-2 Front and rear views of the Power 770 and Power 780
1.4 System features
The Power 770 processor card features 64-bit architecture designed with two single-chip
module (SCM) POWER7 processors. The Power 780 processor card comprises either two
single-chip module (SCM) POWER7 processors or four SCM POWER7 processors, each
designed with 64-bit architecture.
Each POWER7 SCM provides either six or eight active processor cores with 2 MB of L2
cache (256 KB per core), 24 MB of L3 cache (4 MB per core) for the 6-core SCM, or 32 MB
of L3 cache (4 MB per core) for the 8-core SCM.
1.4.1 Power 770 system features
The following features are available on the Power 770:
4U 19-inch rack-mount system enclosure
One to four system enclosures: 16U maximum system size
One processor card feature per enclosure (includes the voltage regulator):
– 0/12-core, 3.72 GHz processor card (#4983)
– 0/16-core, 3.3 GHz processor card (#4984)
POWER7 DDR3 Memory DIMMs (16 DIMM slots per CEC enclosure):
– 0/32 GB (4 X 8 GB), 1066 MHz (#5600)
– 0/64 GB (4 X 16 GB), 1066 MHz (#5601)
– 0/128 GB (4 X 32 GB), 1066 MHz (#5602)
– 0/256 GB (4 X 64 GB), 1066 MHz (#5564)
Six hot-swappable, 2.5-inch, small form factor, SAS disk or SSD bays per enclosure
One hot-plug, slim-line, SATA media bay per enclosure (optional)
Redundant hot-swap AC power supplies in each enclosure
Choice of Integrated Multifunction Card options; maximum one per enclosure:
– Dual 10 Gb Copper and Dual 1 Gb Ethernet (#1768)
– Dual 10 Gb Optical and Dual 1 Gb Ethernet (#1769)
One serial port included on each Integrated Multifunction Card
Two USB ports included on each Integrated Multifunction Card, plus another USB port on
each enclosure (maximum nine usable per system)
Two HMC ports per enclosure (maximum four per system)
Eight I/O expansion slots per enclosure (maximum 32 per system)
– Six Gen2 PCIe 8x slots plus two GX++ slots per enclosure
Dynamic LPAR support, Processor and Memory CUoD
PowerVM (optional)
– Micro-Partitioning®
– Virtual I/O Server (VIOS)
– Automated CPU and memory reconfiguration support for dedicated and shared
processor logical partition groups (dynamic LPAR)
– Support for manual provisioning of resources, namely PowerVM Live Partition
Migration (PowerVM Enterprise Edition)
Optional PowerHA® for AIX, IBM i, and Linux
12X I/O drawer with PCI slots
– Up to 16 PCIe I/O drawers (#5802 or #5877)
– Up to 32 PCI-X DDR I/O drawers (7314-G30 or #5796)
Additional considerations:
The Ethernet ports of the Integrated Multifunction Card cannot be used for an IBM i
console. If an IBM i LAN console is desired, use separate Ethernet adapters that can
be directly controlled by IBM i without the Virtual I/O Server. Alternatively, an
HMC can be used for an IBM i console.
The first and second CEC enclosure must contain one Integrated Multifunction Card
(#1768 or #1769). The Integrated Multifunction Card is optional for the third or fourth
CEC enclosure.
Each Integrated Multifunction Card has four Ethernet ports, two USB ports, and one
serial port. Usage of the serial port by AIX/Linux is supported for MODEM call home,
TTY console, and snooping even if an HMC/SDMC is attached to the server. Use of
the serial port to communicate with a UPS is not supported.
The first and second CEC enclosures each have two HMC/SDMC ports on the
service processor (#EU05). If there are two CEC enclosures, the HMC must be
connected to both service processor cards.
Disk-only I/O drawers
– Up to 56 EXP24S SFF SAS I/O drawers on external SAS controller (#5887)
– Up to 110 EXP12S SAS DASD/SSD I/O drawers on SAS PCI controllers (#5886)
– Up to 60 EXP24 SCSI DASD Expansion drawers on SCSI PCI controllers (7031-D24)
IBM Systems Director Active Energy Manager™
The Power 770 operator interface controls located on the front panel of the primary I/O
drawer consist of a power ON/OFF button with a POWER® indicator, an LCD display for
diagnostic feedback, a RESET button, and a disturbance or system attention LED.
1.4.2 Power 780 system features
The following features are available on the Power 780:
4U 19-inch rack-mount system enclosure
One to four system enclosures: 16U maximum system size
One processor card feature per enclosure (includes the voltage regulator):
– 0/16 core, 3.92 GHz or 0/8 core, 4.14 GHz (TurboCore) processor card (#5003)
– 0/24 core, 3.44 GHz processor card (#EP24)
POWER7 DDR3 Memory DIMMs (16 DIMM slots per processor card):
– 0/32 GB (4 X 8 GB), 1066 MHz (#5600)
– 0/64 GB (4 X 16 GB), 1066 MHz (#5601)
– 0/128 GB (4 X 32 GB), 1066 MHz (#5602)
– 0/256 GB (4 X 64 GB), 1066 MHz (#5564)
Six hot-swappable, 2.5-inch, small form factor, SAS disk or SSD bays per enclosure
One hot-plug, slim-line, SATA media bay per enclosure (optional)
Redundant hot-swap AC power supplies in each enclosure
Choice of Integrated Multifunction Card options; maximum one per enclosure:
– Dual 10 Gb Copper and Dual 1 Gb Ethernet (#1768)
– Dual 10 Gb Optical and Dual 1 Gb Ethernet (#1769)
One serial port included on each Integrated Multifunction Card
Two USB ports included on each Integrated Multifunction Card plus another USB port on
each enclosure (maximum nine usable per system)
Two HMC ports per enclosure (maximum four per system)
Eight I/O expansion slots per enclosure (maximum 32 per system)
– Six Gen2 PCIe 8x slots plus two GX++ slots per enclosure
Dynamic LPAR support, Processor and Memory CUoD
PowerVM (optional):
– Micro-Partitioning
– Virtual I/O Server (VIOS)
– Automated CPU and memory reconfiguration support for dedicated and shared
processor logical partition (LPAR) groups
– Support for manual provisioning of resources partition migration (PowerVM
Enterprise Edition)
Optional PowerHA for AIX, IBM i, and Linux
12X I/O drawer with PCI slots
– Up to 16 PCIe I/O drawers (#5802 or #5877)
– Up to 32 PCI-X DDR I/O drawers (7314-G30 or feature #5796)
Disk-only I/O drawers
– Up to 56 EXP24S SFF SAS I/O drawers on external SAS controller (#5887)
– Up to 110 EXP12S SAS DASD/SSD I/O drawers on SAS PCI controllers (#5886)
– Up to 60 EXP24 SCSI DASD Expansion drawers on SCSI PCI controllers (7031-D24)
IBM Systems Director Active Energy Manager
The Power 780 operator interface/controls located on the front panel of the primary I/O
drawer consist of a power ON/OFF button with a POWER indicator, an LCD display for
diagnostic feedback, a RESET button, and a disturbance or system attention LED.
Additional considerations:
The Ethernet ports of the Integrated Multifunction Card cannot be used for an IBM i
console. Separate Ethernet adapters that can be directly controlled by IBM i without
the Virtual I/O server should be used for IBM i LAN consoles if desired. Alternatively,
an HMC can also be used for an IBM i console.
The first and second CEC enclosure must contain one Integrated Multifunction Card
(#1768 or #1769). The Integrated Multifunction Card is optional for the third or fourth
CEC enclosure.
Each Integrated Multifunction Card has four Ethernet ports, two USB ports, and one
serial port. Usage of the serial port by AIX/Linux is supported for MODEM call home,
TTY console, and snooping even if an HMC/SDMC is attached to the server. Use of
the serial port to communicate with a UPS is not supported.
The first and second CEC enclosures each have two HMC/SDMC ports on the
service processor (#EU05). If there are two CEC enclosures, the HMC must be
connected to both service processor cards.
1.4.3 Minimum features
Each system has a minimum feature set in order to be valid. Table 1-3 shows the minimum
system configuration for a Power 770.
Table 1-3 Minimum features for Power 770 system
Power 770 minimum features Additional notes
1x CEC enclosure (4U) 1x System Enclosure with IBM Bezel (#5585) or OEM Bezel
(#5586)
1x Service Processor (#5664)
1x DASD Backplane (#5652)
2x Power Cords (two selected by customer)
– 2x A/C Power Supply (#5632)
1x Operator Panel (#1853)
1x Integrated Multifunction Card options (one of these):
– Dual 10 Gb Copper and Dual 1 Gb Ethernet (#1768)
– Dual 10 Gb Optical and Dual 1 Gb Ethernet (#1769)
1x primary operating system (one
of these)
AIX (#2146)
Linux (#2147)
IBM i (#2145)
1x Processor Card 0/12-core, 3.72 GHz processor card (#4983)
0/16-core, 3.3 GHz processor card (#4984)
4x Processor Activations
(quantity of four for one of these)
One Processor Activation for processor feature #4983
(#5329)
One Processor Activation for processor feature #4984
(#5334)
2x DDR3 Memory DIMMs (one of
these)
0/32 GB (4 X 8 GB), 1066 MHz (#5600)
0/64 GB (4 X 16 GB), 1066 MHz (#5601)
0/128 GB (4 X 32 GB), 1066 MHz (#5602)
0/256 GB (4 X 64 GB), 1066 MHz (#5564)
32x Activations of 1 GB DDR3 POWER7 Memory (#8212) -
For AIX and Linux: 1x disk drive
For IBM i: 2x disk drives
Formatted to match the system Primary O/S indicator selected,
or if using a Fibre Channel attached SAN (indicated by #0837) a
disk drive is not required.
1X Language Group (selected by
the customer)
-
1x Removable Media Device
(#5762)
Optionally orderable; a standalone system (not network
attached) requires this feature.
1x HMC Required for every Power 770 (9117-MMC)
Table 1-4 shows the minimum system configuration for a Power 780 system.
Table 1-4 Minimum features for Power 780 system
Note: Consider the following:
A minimum number of four processor activations must be ordered per system.
The minimum activations ordered with all initial orders of memory features #5600, #5601, and
#5602 must be 50% of their installed capacity.
The minimum activations ordered with MES orders of memory features #5600, #5601, and #5602
will depend on the total installed capacity of features #5600, #5601, and #5602. This allows
newly ordered memory to be purchased with less than 50% activations when the currently
installed capacity exceeds 50% of the existing features #5600, #5601, and #5602 capacity.
The minimum activations ordered with all initial orders of memory feature #5564 must be 192 GB
of 256 GB per each feature #5564 ordered (that is, 75% of the installed feature #5564 capacity).
The minimum activations purchased with MES orders of feature #5564 memory, 0/256 GB, will
depend on the total installed capacity of feature #5564. This allows MES orders of feature #5564
memory to be purchased with less than 192/256 GB per each feature #5564 ordered when the
system activations currently installed exceed 75% of the existing feature #5564 capacity.
Power 780 minimum features Additional notes
1x CEC enclosure (4U) 1x System Enclosure with IBM Bezel (#5595) or OEM Bezel
(#5596)
1x Service Processor (#5664)
1x DASD Backplane (#5652)
2x Power Cords (two selected by customer)
– 2x A/C Power Supply (#5532)
1x Operator Panel (#1853)
1x Integrated Multifunction Card options (one of these):
– Dual 10 Gb Copper and Dual 1 Gb Ethernet (#1768)
– Dual 10 Gb Optical and Dual 1 Gb Ethernet (#1769)
1x primary operating system (one
of these)
AIX (#2146)
Linux (#2147)
IBM i (#2145)
1x Processor Card (one of these) 0/16 core, 3.92 GHz or 0/8 core, 4.14 GHz (TurboCore)
processor card (#5003)
0/24 core, 3.44 GHz processor card (#EP24)
4x Processor Activations for
Processor Feature #4982 (#5469)
-
2x DDR3 Memory DIMM (one of
these)
0/32 GB (4 X 8 GB), 1066 MHz (#5600)
0/64 GB (4 X 16 GB), 1066 MHz (#5601)
0/128 GB (4 X 32 GB), 1066 MHz (#5602)
0/256 GB (4 X 64 GB), 1066 MHz (#5564)
32x Activations of 1 GB DDR3 POWER7 Memory (#8212) -
For AIX and Linux: 1x disk drive
For IBM i: 2x disk drives
Formatted to match the system Primary O/S indicator selected,
or if using a Fibre Channel attached SAN (indicated by #0837) a
disk drive is not required.
1X Language Group (selected by
the customer)
-
1.4.4 Power supply features
Two system AC power supplies are required for each CEC enclosure. The second power
supply provides redundant power for enhanced system availability. To provide full
redundancy, the two power supplies must be connected to separate power distribution
units (PDUs).
A CEC enclosure will continue to function with one working power supply. A failed power
supply can be hot-swapped but must remain in the system until the replacement power
supply is available for exchange. The system requires one functional power supply in each
CEC enclosure to remain operational.
Each Power 770 or Power 780 server with two or more CEC enclosures must have one
Power Control Cable (#6006 or similar) to connect the service interface card in the first
enclosure to the service interface card in the second enclosure.
1.4.5 Processor card features
Each of the up to four system enclosures contains one POWER7 processor card feature,
consisting of two single-chip module processors. Each of the POWER7 processors in the
server has a 64-bit architecture, includes six or eight cores on a single-chip module, and
contains 2 MB of L2 cache (256 KB per core), 24 MB of L3 cache (4 MB per core) for the
6-core SCM, and 32 MB of L3 cache (4 MB per core) for the 8-core SCM.
There are two types of Power 770 processor cards, offering the following features:
Two 6-core POWER7 SCMs with 24 MB of L3 cache (12-cores per processor card, each
core with 4 MB of L3 cache) at 3.72 GHz (#4983)
Two 8-core POWER7 SCMs with 32 MB of L3 cache (16-cores per processor card, each
core with 4 MB of L3 cache) at 3.3 GHz (#4984)
The Power 780 has two types of processor cards. One of these has two different processing
modes (MaxCore and TurboCore).
Power 780 minimum features (continued):
- 1x Removable Media Device (#5762): optionally orderable; a standalone system (not network attached) requires this feature.
- 1x HMC: required for every Power 780 (9179-MHC).
Note the following considerations:
- A minimum of four processor activations must be ordered per system.
- The minimum activations ordered with all initial orders of memory features #5600, #5601, and #5602 must be 50% of their installed capacity.
- The minimum activations ordered with MES orders of memory features #5600, #5601, and #5602 depend on the total installed capacity of those features. This allows newly ordered memory to be purchased with less than 50% activations when the currently installed capacity exceeds 50% of the existing #5600, #5601, and #5602 capacity.
- The minimum activations ordered with all initial orders of memory feature #5564 must be 192 GB of 256 GB per each feature #5564 ordered (that is, 75% of the installed feature #5564 capacity).
- The minimum activations purchased with MES orders of feature #5564 memory (0/256 GB) depend on the total installed capacity of feature #5564. This allows MES orders of #5564 memory to be purchased with less than 192 of 256 GB per feature when the system activations currently installed exceed 75% of the existing #5564 capacity.
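The activation minimums for memory features #5600/#5601/#5602 (50%) and #5564 (75% of each 256 GB feature) reduce to simple arithmetic. The following sketch is a hypothetical helper for initial orders only (the function and dictionary names are invented; this is not an IBM configurator):

```python
# Hypothetical helper (not an IBM configurator) computing the minimum
# permanent memory activations (in GB) required with an *initial* order.
# Features #5600/#5601/#5602 carry a 50% minimum; feature #5564 carries
# a 75% minimum (192 GB of each 256 GB feature).

FEATURE_GB = {"5600": 32, "5601": 64, "5602": 128, "5564": 256}

def min_initial_activations(order):
    """order maps feature code (str) -> quantity ordered (int)."""
    standard_gb = sum(FEATURE_GB[f] * q for f, q in order.items()
                      if f in ("5600", "5601", "5602"))
    turbo_gb = FEATURE_GB["5564"] * order.get("5564", 0)
    # 50% of the #5600/#5601/#5602 capacity, 75% of the #5564 capacity
    return standard_gb // 2 + (turbo_gb * 3) // 4

# Example: two #5602 features (256 GB) plus one #5564 (256 GB):
# 50% of 256 GB + 75% of 256 GB = 128 + 192 = 320 GB of activations.
print(min_initial_activations({"5602": 2, "5564": 1}))  # 320
```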
The processor card houses the two or four POWER7 SCMs and the system memory. The
Power 780 processor card offers the following features:
Feature #5003 offers two 8-core POWER7 SCMs with 32 MB of L3 cache (16 cores per
processor card are activated in MaxCore mode and each core with 4 MB of L3 cache) at
3.92 GHz.
Feature #5003 also offers two 8-core POWER7 SCMs with 32 MB of L3 cache (8 cores
per processor card are activated in TurboCore mode and each core is able to use 8 MB of
L3 cache) at 4.14 GHz.
Feature #EP24 offers four 6-core POWER7 SCMs with 24 MB of L3 cache (24 cores per
processor card, each core with 4 MB of L3 cache) at 3.44 GHz.
Figure 1-3 shows the top view of the Power 770 and Power 780 system having two SCMs
installed. The two POWER7 SCMs and the system memory reside on a single processor card
feature.
Figure 1-3 Top view of Power 770 and Power 780 system with two SCMs
Figure 1-4 shows the top view of the Power 780 system having four SCMs installed. The four
POWER7 SCMs and the system memory reside on a single processor card feature.
Figure 1-4 Top view of a Power 780 system with four SCMs
In standard or MaxCore mode, the Power 780 system uses all processor cores running at
3.92 GHz and has access to the full 32 MB of L3 cache. In TurboCore mode, only four of the
eight processor cores are available, but at a higher frequency (4.14 GHz), and these four
cores have access to the full 32 MB of L3 cache. Thus, in TurboCore mode there are fewer cores running at a higher frequency, with a higher L3-cache-per-core ratio.
For a more detailed description of MaxCore and TurboCore modes, see 2.1.5, “Flexible
POWER7 processor packaging and offerings” on page 44.
Several types of Capacity on Demand (CoD) processors are optionally available on the
Power 770 and Power 780 servers to help meet changing resource requirements in an on
demand environment by using resources installed on the system but not activated. CoD
allows you to purchase additional permanent processor or memory capacity and dynamically
activate it when needed.
More detailed information about CoD can be found in 2.4, “Capacity on Demand” on page 60.
Note: TurboCore mode is supported on the Power 780, but is not supported on the
Power 770.
1.4.6 Summary of processor features
Table 1-5 summarizes the processor feature codes for the Power 770.
Table 1-5 Summary of processor features for the Power 770
Feature code | Description | OS support
#4983 0/12-core 3.72 GHz POWER7 processor card:
12-core 3.72 GHz POWER7 CUoD processor planar containing two
six-core processors. Each processor has 2 MB of L2 cache (256 KB per
core) and 24 MB of L3 cache (4 MB per core). There are 16 DDR3 DIMM
slots on the processor planar (8 DIMM slots per processor), which can
be used as Capacity on Demand (CoD) memory without activating the
processors. The voltage regulators are included in this feature code.
AIX
IBM i
Linux
#5329 One processor activation for processor #4983:
Each occurrence of this feature permanently activates one processor on
Processor Card #4983. One processor activation for processor feature
#4983 with inactive processors.
AIX
IBM i
Linux
#5330 Processor CoD utility billing for #4983, 100 processor-minutes:
Provides payment for temporary use of processor feature #4983 with
supported AIX or Linux operating systems. Each occurrence of this
feature will pay for 100 minutes of usage. The purchase of this feature
occurs after the customer has 100 minutes of use on processor cores in
the Shared Processor Pool that are not permanently active.
AIX
Linux
#5331 Processor CoD utility billing for #4983, 100 processor-minutes:
Provides payment for temporary use of processor feature #4983 with
supported IBM i operating systems. Each occurrence of this feature will
pay for 100 minutes of usage. The purchase of this feature occurs after
the customer has 100 minutes of use on processor cores in the Shared
Processor Pool that are not permanently active.
IBM i
#5332 One processor-day on/off billing for #4983:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/Off Processor Core
Day Billing features and bill you. One #5332 must be ordered for each
billable processor core day of feature #4983 used by a supported AIX or
Linux operating system.
AIX
Linux
#5333 One processor-day on/off billing for #4983:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/Off Processor Core
Day Billing features and the client will be charged. One #5333 must be
ordered for each billable processor core day of feature #4983 used by a
supported IBM i operating system.
IBM i
#4984 0/16-core 3.3 GHz POWER7 processor card:
16-core 3.3 GHz POWER7 CUoD processor planar containing two
eight-core processors. Each processor has 2 MB of L2 cache (256 KB
per core) and 32 MB of L3 cache (4 MB per core). There are 16 DDR3
DIMM slots on the processor planar (8 DIMM slots per processor), which
can be used as Capacity on Demand (CoD) memory without activating
the processors. The voltage regulators are included in this feature code.
AIX
IBM i
Linux
#5334 One processor activation for processor #4984:
Each occurrence of this feature will permanently activate one processor
on Processor Card #4984. One processor activation for processor
feature #4984 with inactive processors.
AIX
IBM i
Linux
#5335 Processor CoD utility billing for #4984, 100 processor-minutes:
Provides payment for temporary use of processor feature #4984 with
supported AIX or Linux operating systems. Each occurrence of this
feature will pay for 100 minutes of usage. The purchase of this feature
occurs after the customer has 100 minutes of use on processor cores in
the Shared Processor Pool that are not permanently active.
AIX
Linux
#5336 Processor CoD utility billing for #4984, 100 processor-minutes:
Provides payment for temporary use of processor feature #4984 with
supported IBM i operating systems. Each occurrence of this feature will
pay for 100 minutes of usage. The purchase of this feature occurs after
the customer has 100 minutes of use on processor cores in the Shared
Processor Pool that are not permanently active.
IBM i
#5337 One processor-day on/off billing for #4984:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/Off Processor Core
Day Billing features and the client will be charged. One #5337 must be
ordered for each billable processor core day of feature #4984 used by a
supported AIX or Linux operating system.
AIX
Linux
#5338 One processor-day on/off billing for #4984:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/Off Processor Core
Day Billing features and the client will be charged. One #5338 must be
ordered for each billable processor core day of feature #4984 used by a
supported IBM i operating system.
IBM i
#7951 On/off processor enablement:
This feature can be ordered to enable your server for On/Off Capacity on
Demand. After it is enabled, you can request processors on a temporary
basis. You must sign an On/Off Capacity on Demand contract before you
order this feature.
Note: To renew this feature after the allowed 360 processor days have
been used, this feature must be removed from the system configuration
file and reordered by placing an MES order.
AIX
Linux
IBM i
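The utility and on/off CoD billing quantities described in Table 1-5 follow directly from reported usage. The sketch below is a hypothetical helper (function names invented); treating a partial 100-minute block as a full billable block is an assumption, not something the feature descriptions state explicitly:

```python
import math

# Hypothetical sketch of the CoD billing quantities in Table 1-5:
# utility CoD is billed in 100-processor-minute blocks (#5330/#5331);
# on/off CoD is billed per processor core day (#5332/#5333).
# Assumption: a partial 100-minute block rounds up to a full block.

def utility_billing_features(processor_minutes):
    """Number of 100-minute utility billing features owed."""
    return math.ceil(processor_minutes / 100)

def onoff_billing_features(core_days):
    """One on/off billing feature per billable processor core day."""
    return core_days

print(utility_billing_features(250))  # 3
print(onoff_billing_features(14))     # 14
```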
Table 1-6 summarizes the processor feature codes for the Power 780.
Table 1-6 Summary of processor features for the Power 780
Feature code | Description | OS support
#5003 0/16-core 3.92 GHz / 4.14 GHz POWER7 TurboCore processor card:
This feature has two modes. Standard mode utilizes all 16 cores at
3.92 GHz and TurboCore mode utilizes eight cores at 4.14 GHz. This
feature is a POWER7 CUoD processor planar containing two 8-core
processors. TurboCore mode utilizes cores one through eight with
enhanced memory caching. TurboCore mode must be turned off when
you want to utilize more than eight cores. Switching between modes
requires a system reboot.
AIX
IBM i
Linux
#5333 1-core activation for processor feature #5003:
Each occurrence of this feature will permanently activate one processor
core on Processor Card #5003.
AIX
IBM i
Linux
#EP2L 100 on/off processor days of CoD billing for processor #5003:
After the ON/OFF Processor function is enabled in a system, you must
report your on/off usage to IBM at least monthly. This information, used
to compute your billing data, is provided to your sales channel. The sales
channel will place an order on your behalf for the quantity of this feature
that matches your reported use. One #EP2L provides 100 days of on/off
processor billing for POWER7 CoD Processor Book #5003 for AIX/Linux.
AIX
Linux
#EP2M 100 on/off processor days of CoD billing for processor #5003:
After the ON/OFF Processor function is enabled in a system, you must
report your on/off usage to IBM at least monthly. This information, used
to compute your billing data, is provided to your sales channel. The sales
channel will place an order on your behalf for the quantity of this feature
that matches your reported use. One #EP2M provides 100 days of on/off
processor billing for POWER7 CoD Processor Book #5003 for IBM i.
IBM i
#5342 One processor day on/off billing for #5003:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/ Off Processor Core
Day Billing features and the client will be charged. One #5342 must be
ordered for each billable processor core day of feature #5003 used by a
supported AIX or Linux operating system.
AIX
Linux
#5343 One processor day on/off billing for #5003:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/ Off Processor Core
Day Billing features and the client will be charged. One #5343 must be
ordered for each billable processor core day of feature #5003 used by a
supported IBM i operating system.
IBM i
#EP24 0/24-core 3.44 GHz POWER7 processor card:
24-core 3.44 GHz POWER7 CUoD processor planar containing four
6-core processors. Each processor has 2 MB of L2 cache (256 KB per
core) and 24 MB of L3 cache (4 MB per core). There are 16 DDR3 DIMM
slots on the processor planar (eight DIMM slots per processor), which
can be used as CoD memory without activating the processors. The
voltage regulators are included in this feature code.
AIX
IBM i
Linux
#EP25 1-core activation for processor feature #EP24:
Each occurrence of this feature will permanently activate one processor
core on Processor Card #EP24.
AIX
Linux
IBM i
#EP2N 100 on/off processor days of CoD billing for processor #EP24:
After the ON/OFF Processor function is enabled in a system, you must
report your on/off usage to IBM at least monthly. This information, used
to compute your billing data, is provided to your sales channel. The sales
channel will place an order on your behalf for the quantity of this feature
that matches your reported use. One #EP2N provides 100 days of on/off
processor billing for POWER7 CoD Processor Book #EP24 for
AIX/Linux.
AIX
Linux
#EP2P 100 on/off processor days of CoD billing for processor #EP24:
After the ON/OFF Processor function is enabled in a system, you must
report your on/off usage to IBM at least monthly. This information, used
to compute your billing data, is provided to your sales channel. The sales
channel will place an order on your behalf for the quantity of this feature
that matches your reported use. One #EP2P provides 100 days of on/off
processor billing for POWER7 CoD Processor Book #EP24 for IBM i.
IBM i
#EP28 One processor day on/off billing for #EP24:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/ Off Processor Core
Day Billing features and the client will be charged. One #EP28 must be
ordered for each billable processor core day of feature #EP24 used by a
supported AIX or Linux operating system.
AIX
Linux
#EP29 One processor day on/off billing for #EP24:
After an On/Off Processor Enablement feature is ordered and the
associated enablement code is entered into the system, you must report
your on/off usage to IBM at least monthly. This information, used to
compute your billing data, is then provided to your sales channel. The
sales channel will place an order for a quantity of On/ Off Processor Core
Day Billing features and the client will be charged. One #EP29 must be
ordered for each billable processor core day of feature #EP24 used by a
supported IBM i operating system.
IBM i
#7951 On/Off Processor Enablement:
This feature can be ordered to enable your server for On/Off Capacity on
Demand. After it is enabled, you can request processors on a temporary
basis. You must sign an On/Off Capacity on Demand contract before you
order this feature.
Note: To renew this feature after the allowed 360 processor days have
been used, this feature must be removed from the system configuration
file and reordered by placing an MES order.
AIX
Linux
IBM i
1.4.7 Memory features
In POWER7 systems, DDR3 memory is used throughout. The POWER7 DDR3 memory uses a buffered memory architecture to provide greater bandwidth and capacity. This enables operating at a
higher data rate for large memory configurations. All processor cards have 16 memory DIMM
slots (eight per processor) running at speeds up to 1066 MHz and must be populated with
POWER7 DDR3 Memory DIMMs.
Figure 1-5 outlines the general connectivity of an 8-core POWER7 processor and DDR3 memory DIMMs. The eight memory channels (four per memory controller) can be clearly seen.
seen.
Figure 1-5 Outline of 8-core POWER7 processor connectivity to DDR3 DIMMs - Used in a 2-socket
Power 770 and Power 780
On each processor card for the Power 770 and Power 780 there are a total of 16 DDR3 memory DIMM slots. When using two SCMs per card, eight DIMM slots are
used per processor, and when using four SCMs per card in the Power 780 server, four DIMM
slots are used per processor.
The quad-high (96 mm) DIMM cards can have an 8 GB, 16 GB, 32 GB, or 64 GB capacity
and are connected to the POWER7 processor memory controller through an advanced
memory buffer ASIC. For each DIMM, there is a corresponding memory buffer. Each memory
channel into the POWER7 memory controllers is driven at 6.4 GHz.
Each DIMM (except the 64 GB DIMM) contains DDR3 x8 DRAMs, with 10 DRAMs per rank, and plugs into a 276-pin DIMM slot connector. The 64 GB DIMM is an 8-rank DIMM using x4 parts (1024Kx4), with 20 DRAMs per rank.
The Power 770 and Power 780 have memory features in 32 GB, 64 GB, 128 GB, and
256 GB capacities. Table 1-7 summarizes the capacities of the memory features and
highlights other characteristics.
Table 1-7 Summary of memory features
None of the memory in these features is active. Feature number #8212 or #8213 must be
purchased to activate the memory. Table 1-8 outlines the memory activation feature codes
and corresponding memory capacity activations.
Table 1-8 CoD system memory activation features
Note: DDR2 DIMMs (used in POWER6®-based systems) are not supported in
POWER7-based systems.
Feature code | Memory technology | Capacity | Access rate | DIMMs | DIMM slots used
#5600 | DDR3 | 32 GB | 1066 MHz | 4 x 8 GB DIMMs | 4
#5601 | DDR3 | 64 GB | 1066 MHz | 4 x 16 GB DIMMs | 4
#5602 | DDR3 | 128 GB | 1066 MHz | 4 x 32 GB DIMMs | 4
#5564 | DDR3 | 256 GB | 1066 MHz | 4 x 64 GB DIMMs | 4
Feature code | Activation capacity | Additional information | OS support
#8212 1 GB Activation of 1 GB of DDR3 POWER7 memory. Each
occurrence of this feature permanently activates 1 GB
of DDR3 POWER7 memory.
AIX
IBM i
Linux
#8213 100 GB Activation of 100 GB of DDR3 POWER7 memory.
Each occurrence of this feature permanently activates
100 GB of DDR3 POWER7 memory.
AIX
IBM i
Linux
#7954 N/A On/Off Memory Enablement: This feature can be
ordered to enable your server for On/Off Capacity on
Demand. After it is enabled, you can request memory
on a temporary basis. You must sign an On/Off
Capacity on Demand contract before this feature is
ordered. To renew this feature after the allowed
999 GB Days have been used, this feature must be
removed from the system configuration file and
reordered by placing an MES order.
AIX
IBM i
Linux
#4710 N/A On/Off 999 GB-Days, Memory Billing POWER7:
After the ON/OFF Memory function is enabled in a
system, you must report your on/off usage to IBM at
least monthly. This information, used to compute your
billing data, is provided to your sales channel. The
sales channel will place an order on your behalf for the
quantity of this feature that matches your reported
use. One #4710 feature must be ordered for each 999
billable days for each 1 GB increment of POWER7
memory that was used.
AIX
IBM i
Linux
1.5 Disk and media features
Each system building block features two SAS DASD controllers with six hot-swappable
2.5-inch Small Form Factor (SFF) disk bays and one hot-plug, slim-line media bay per
enclosure. SFF SAS disk drives and solid-state drives (SSDs) are supported internally. In
a full configuration with four connected building blocks, the combined system supports up to
24 disk bays. SAS drives and SSD drives can share the same backplane.
Table 1-9 shows the available disk drive feature codes that each bay can contain.
Table 1-9 Disk drive feature code description
Table 1-8 (continued):
#7377 | N/A | On/Off, 1 GB-1 Day, Memory Billing POWER7: After the ON/OFF Memory function is enabled in a system, you must report the client's on/off usage to IBM on a monthly basis. This information is used to compute IBM billing data. One #7377 feature must be ordered for each billable day for each 1 GB increment of POWER7 memory that was used. Note that inactive memory must be available in the system for temporary use. | AIX, IBM i, Linux
Note:
All POWER7 memory features must be purchased with sufficient permanent memory activation
features so that the system memory is at least 50% active.
The minimum activations ordered with all initial orders of memory feature #5564 must be
192 GB of 256 GB per each feature #5564 ordered (that is, 75% of the installed feature
#5564 capacity).
Note: Memory CoD activations activate memory hardware only for the system serial
number for which they are purchased. If memory hardware is moved to another system,
the memory might not be functional in that system until arrangements are made to move
the memory activations or purchase additional memory activations.
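The on/off memory billing quantities follow the same pattern as processor billing: one #7377 per billable GB-day, and one #4710 per 999 billable GB-days. The sketch below is a hypothetical helper (function names invented); rounding a partial 999 GB-day block up to a whole #4710 feature is an assumption:

```python
import math

# Hypothetical sketch of on/off memory billing quantities, per the
# feature descriptions above: #7377 covers one billable GB-day;
# #4710 covers 999 billable GB-days (partial blocks assumed to round up).

def memory_billing_7377(gb_days):
    """One #7377 per billable day per 1 GB increment used."""
    return gb_days

def memory_billing_4710(gb_days):
    """Number of #4710 features covering the reported GB-days."""
    return math.ceil(gb_days / 999)

print(memory_billing_7377(30))    # 30
print(memory_billing_4710(2000))  # 3
```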
Feature code Description OS support
#1886 146 GB 15 K RPM SFF SAS Disk Drive AIX, Linux
#1917 146 GB 15 K RPM SAS SFF-2 Disk Drive AIX, Linux
#1995 177 GB SSD Module with eMLC AIX, Linux
#1775 177 GB SFF-1 SSD with eMLC AIX, Linux
#1793 177 GB SFF-2 SSD with eMLC AIX, Linux
#1925 300 GB 10 K RPM SAS SFF-2 Disk Drive AIX, Linux
#1885 300 GB 10 K RPM SFF SAS Disk Drive AIX, Linux
#1880 300 GB 15 K RPM SAS SFF Disk Drive AIX, Linux
#1953 300 GB 15 K RPM SAS SFF-2 Disk Drive AIX, Linux
Certain disk drives are available for order in large quantities. Table 1-10 lists the disk drives
available in a quantity of 150.
Table 1-10 Available disk drives in quantity of 150
#1790 600 GB 10 K RPM SAS SFF Disk Drive AIX, Linux
#1964 600 GB 10 K RPM SAS SFF-2 Disk Drive AIX, Linux
#1947 139 GB 15 K RPM SAS SFF-2 Disk Drive IBM i
#1888 139 GB 15 K RPM SFF SAS Disk Drive IBM i
#1996 177 GB SSD Module with eMLC IBM i
#1787 177 GB SFF-1 SSD with eMLC IBM i
#1794 177 GB SFF-2 SSD with eMLC IBM i
#1956 283 GB 10 K RPM SAS SFF-2 Disk Drive IBM i
#1911 283 GB 10 K RPM SFF SAS Disk Drive IBM i
#1879 283 GB 15 K RPM SAS SFF Disk Drive IBM i
#1948 283 GB 15 K RPM SAS SFF-2 Disk Drive IBM i
#1916 571 GB 10 K RPM SAS SFF Disk Drive IBM i
#1962 571 GB 10 K RPM SAS SFF-2 Disk Drive IBM i
Feature code Description OS support
#7550 Quantity 150 of #1790 (600 GB 10 K RPM SAS SFF Disk Drive) AIX, Linux
#1928 Quantity 150 of #1880 (300 GB 15 K RPM SAS SFF Disk Drive) AIX, Linux
#7547 Quantity 150 of #1885 (300 GB 10 K RPM SFF SAS Disk Drive) AIX, Linux
#7548 Quantity 150 of #1886 (146 GB 15 K RPM SFF SAS Disk Drive) AIX, Linux
#1866 Quantity 150 of #1917 (146 GB 15 K RPM SAS SFF-2 Disk Drive) AIX, Linux
#1869 Quantity 150 of #1925 (300 GB 10 K RPM SAS SFF-2 Disk Drive) AIX, Linux
#1929 Quantity 150 of #1953 (300 GB 15 K RPM SAS SFF-2 Disk Drive) AIX, Linux
#1818 Quantity 150 of #1964 (600 GB 10 K RPM SAS SFF-2 Disk Drive) AIX, Linux
#1926 Quantity 150 of #1879 (283 GB 15 K RPM SAS SFF Disk Drive) IBM i
#7544 Quantity 150 of #1888 (139 GB 15 K RPM SFF SAS Disk Drive) IBM i
#7557 Quantity 150 of #1911 (283 GB 10 K RPM SFF SAS Disk Drive) IBM i
#7566 Quantity 150 of #1916 (571 GB 10 K RPM SAS SFF Disk Drive) IBM i
#1868 Quantity 150 of #1947 (139 GB 15 K RPM SAS SFF-2 Disk Drive) IBM i
#1927 Quantity 150 of #1948 (283 GB 15 K RPM SAS SFF-2 Disk Drive) IBM i
#1844 Quantity 150 of #1956 (283 GB 10 K RPM SAS SFF-2 Disk Drive) IBM i
#1817 Quantity 150 of #1962 (571 GB 10 K RPM SAS SFF-2 Disk Drive) IBM i
The Power 770 and Power 780 support both 2.5-inch SFF and 3.5-inch SAS hard disks. The
3.5-inch DASD hard disk can be attached to the Power 770 and Power 780 but must be
located in a feature #5886 EXP12S I/O drawer, whereas 2.5-inch DASD hard files can be
either mounted internally or in the EXP24S SFF Gen2-bay Drawer (#5887).
If you need more disks than available with the internal disk bays, you can attach additional
external disk subsystems. For more detailed information about the available external disk
subsystems, see 2.11, “External disk subsystems” on page 92.
SCSI disks are not supported in the Power 770 and 780 disk bays. However, if you want to
use SCSI disks, you can attach existing SCSI disk subsystems.
The disk/media backplane feature #5652 provides six SFF disk slots and one SATA media
slot. In a full configuration with four connected building blocks, the combined system supports
up to four media devices with Media Enclosure and Backplane #5652. The SATA Slimline
DVD-RAM drive (#5762) is the only supported media device option.
1.6 I/O drawers
The system has eight I/O expansion slots per enclosure, including two dedicated GX++ slots.
If more PCI slots are needed, such as to extend the number of LPARs, up to 32 PCI-DDR
12X Expansion Drawers (#5796) and up to 16 12X I/O Drawer PCIe (#5802 and #5877) can
be attached.
The Power 770 and the Power 780 servers support the following 12X attached I/O drawers,
providing extensive capability to expand the overall server expandability and connectivity:
Feature #5802 provides PCIe slots and SFF SAS disk slots.
Feature #5877 provides PCIe slots.
Feature #5796 provides PCI-X slots.
The 7314-G30 drawer provides PCI-X slots (supported, but no longer orderable).
Disk-only I/O drawers are also supported, providing large storage capacity and multiple
partition support:
Feature #5886 EXP 12S holds a 3.5-inch SAS disk or SSD.
Feature #5887 EXP 24S SFF Gen2-bay Drawer for high-density storage holds SAS Hard
Disk drives.
The 7031-D24 holds a 3.5-inch SCSI disk (supported but no longer orderable).
The 7031-T24 holds a 3.5-inch SCSI disk (supported but no longer orderable).
1.6.1 PCI-DDR 12X Expansion Drawers (#5796)
The PCI-DDR 12X Expansion Drawer (#5796) is a 4U tall (EIA units) drawer and mounts in a
19-inch rack. Feature #5796 takes up half the width of the 4U (EIA units) rack space. Feature
#5796 requires the use of a #7314 drawer mounting enclosure. The 4U vertical enclosure can
hold up to two #5796 drawers mounted side by side in the enclosure. A maximum of four
#5796 drawers can be placed on the same 12X loop.
The I/O drawer has the following attributes:
A 4U (EIA units) rack-mount enclosure (#7314) holding one or two #5796 drawers
Six PCI-X DDR slots: 64-bit, 3.3 V, 266 MHz (blind-swap)
Redundant hot-swappable power and cooling units
1.6.2 12X I/O Drawer PCIe (#5802 and #5877)
The #5802 and #5877 expansion units are 19-inch, rack-mountable, I/O expansion drawers
that are designed to be attached to the system using 12X double data rate (DDR) cables. The
expansion units can accommodate 10 generation 3 cassettes. These cassettes can be
installed and removed without removing the drawer from the rack.
A maximum of two #5802 drawers can be placed on the same 12X loop. Feature #5877 is the
same as #5802, except it does not support disk bays. Feature #5877 can be on the same
loop as #5802. Feature #5877 cannot be upgraded to #5802.
The I/O drawer has the following attributes:
Eighteen SAS hot-swap SFF disk bays (only #5802)
Ten PCI Express (PCIe) based I/O adapter slots (blind-swap)
Redundant hot-swappable power and cooling units
1.6.3 EXP 12S SAS Drawer
The EXP 12S SAS drawer (#5886) is a 2U (EIA units) drawer and mounts in a 19-inch rack. The
drawer can hold either SAS disk drives or SSD. The EXP 12S SAS drawer has twelve
3.5-inch SAS disk bays with redundant data paths to each bay. The SAS disk drives or SSDs
contained in the EXP 12S are controlled by one or two PCIe or PCI-X SAS adapters
connected to the EXP 12S via SAS cables.
1.6.4 EXP 24S SFF Gen2-bay Drawer
The EXP24S SFF Gen2-bay Drawer is an expansion drawer supporting up to twenty-four
2.5-inch hot-swap SFF SAS HDDs on POWER6 or POWER7 servers in 2U of 19-inch rack
space. The EXP24S bays are controlled by SAS adapters/controllers attached to the I/O
drawer by SAS X or Y cables.
The SFF bays of the EXP24S are different from the SFF bays of the POWER7 system units
or 12X PCIe I/O drawers (#5802 and #5803). The EXP24S uses Gen2 or SFF-2 SAS drives
that physically do not fit in the Gen1 or SFF-1 bays of the POWER7 system unit or 12X PCIe
I/O Drawers, or vice versa.
1.6.5 I/O drawers and usable PCI slots
The I/O drawer model types can be intermixed on a single server within the appropriate I/O
loop. Depending on the system configuration, the maximum number of I/O drawers that is
supported differs.
Note: Mixing #5802 or #5877 and #5796 on the same loop is not supported.
Table 1-11 summarizes the maximum number of I/O drawers supported and the total number
of PCI slots available when expansion consists of a single drawer type.
Table 1-11 Maximum number of I/O drawers supported and total number of PCI slots
Table 1-12 summarizes the maximum number of disk-only I/O drawers supported.
Table 1-12 Maximum number of disk only I/O drawers supported
1.7 Comparison between models
The Power 770 offers configuration options in which one of two different processor cards can be installed. Both cards contain two single-chip modules (SCMs) and provide the following processor configurations:
Two-socket card: Eight cores per SCM at 3.3 GHz
Two-socket card: Six cores per SCM at 3.72 GHz
Both of these Power 770 models are available starting as low as four active cores and
incrementing one core at a time through built-in CoD functions to a maximum of 48 active
cores with the 3.72 GHz processor or 64 active cores with the 3.3 GHz processor.
The Power 780 offers configuration options in which one of two processor cards can be installed: either a two-socket SCM card or a four-socket SCM card. These processor cards contain the following processor configurations:
Two-socket card: Eight cores per SCM at 3.92 GHz (MaxCore mode) or four cores per SCM at 4.14 GHz (TurboCore mode)
Four-socket card: Six cores per SCM at 3.44 GHz
Both of these Power 780 configurations are available starting with as few as four active
cores, incrementing one core at a time through built-in CoD functions, to a maximum of
64 active cores with the 3.92 GHz processor or 96 active cores with the 3.44 GHz processor.
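The activation rule above can be sketched as a small check (illustrative Python only — the maximums come from the text, while the function and table names are hypothetical, not IBM configurator logic):

```python
# Maximum active cores per model and processor frequency, from the text.
COD_MAX_ACTIVE = {
    ("Power 770", 3.72): 48,   # 6-core SCMs
    ("Power 770", 3.30): 64,   # 8-core SCMs
    ("Power 780", 3.92): 64,   # 8-core SCMs (MaxCore mode)
    ("Power 780", 3.44): 96,   # 6-core SCMs, four-socket card
}

def valid_activation(model: str, ghz: float, active_cores: int) -> bool:
    """True if the count is orderable: start at 4, step by 1, up to the max."""
    return 4 <= active_cores <= COD_MAX_ACTIVE[(model, ghz)]
```

Because activations step one core at a time, any integer between four and the configuration maximum is a valid activation count.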
Table 1-11

System drawers   Max #5796    Max #5802/#5877   #5796 slots      #5802/#5877 slots
                 drawers      drawers           (PCI-X / PCIe)   (PCI-X / PCIe)
1 drawer         8            4                 48 / 6           0 / 46
2 drawers        16           8                 96 / 12          0 / 92
3 drawers        24           12                144 / 18         0 / 138
4 drawers        32           16                192 / 24         0 / 184

Table 1-12

Server      Max #5886 drawers   Max #5887 drawers
Power 770   110                 56
Power 780   110                 56
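The slot totals in Table 1-11 follow from simple per-drawer arithmetic. The sketch below reproduces them, assuming 6 PCI-X slots per #5796 drawer, 10 PCIe slots per #5802/#5877 drawer, and 6 integrated PCIe slots per CEC enclosure — values inferred from the table itself, not from a feature description:

```python
PCIX_PER_5796 = 6    # PCI-X slots per #5796 drawer (inferred: 48 / 8)
PCIE_PER_5802 = 10   # PCIe slots per #5802/#5877 drawer (inferred: (46 - 6) / 4)
PCIE_PER_CEC = 6     # integrated PCIe slots per CEC enclosure

def slot_totals(cec_drawers: int) -> dict:
    """Total (PCI-X, PCIe) slots for each expansion type, per Table 1-11."""
    max_5796 = 8 * cec_drawers   # up to 8 #5796 drawers per CEC enclosure
    max_5802 = 4 * cec_drawers   # up to 4 #5802/#5877 drawers per CEC enclosure
    return {
        "#5796": (max_5796 * PCIX_PER_5796, cec_drawers * PCIE_PER_CEC),
        "#5802/#5877": (0, max_5802 * PCIE_PER_5802 + cec_drawers * PCIE_PER_CEC),
    }
```

For example, a three-enclosure system yields 144 PCI-X plus 18 PCIe slots with #5796 drawers, matching the table row.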
Table 1-13 summarizes the processor core options and frequencies and matches them to the
L3 cache sizes for the Power 770 and Power 780.
Table 1-13 Summary of processor core counts, core frequencies, and L3 cache sizes
1.8 Build to Order
You can perform a Build to Order (also called a la carte) configuration using the IBM
Configurator for e-business (e-config), where you specify each configuration feature that you
want on the system.
This is the only configuration method for the IBM Power 770 and Power 780 servers.
1.9 IBM Editions
IBM Edition offerings are not available for the IBM Power 770 and Power 780 servers.
1.10 Model upgrade
You can upgrade the 9117-MMA with IBM POWER6 or POWER6+™ processors to the IBM
Power 770 and Power 780 with POWER7 processors. For upgrades from POWER6 or
POWER6+ processor-based systems, IBM will install new CEC enclosures to replace the
enclosures that you currently have. Your current CEC enclosures will be returned to IBM in
exchange for the financial consideration identified under the applicable feature conversions
for each upgrade.
Clients taking advantage of the model upgrade offer from a POWER6 or POWER6+
processor-based system are required to return all components of the serialized MT-model
that were not ordered through feature codes. Any feature for which a feature conversion is
used to obtain a new part must also be returned to IBM. Clients can keep and reuse any
features from the CEC enclosures that were not involved in a feature conversion transaction.

Table 1-13

System                       Cores per     Frequency   L3 cache(a)   Enclosure           System maximum
                             POWER7 SCM    (GHz)                     summation(b)        (cores)(c)
Power 770                    6             3.72        24 MB         12 cores, 48 MB     48
Power 770                    8             3.30        32 MB         16 cores, 64 MB     64
Power 780                    6             3.44        24 MB         24 cores, 96 MB     96
Power 780 (MaxCore mode)(d)  8             3.92        32 MB         16 cores, 64 MB     64
Power 780 (TurboCore mode)(e) 4 activated  4.14        32 MB         8 cores active,     32
                                                                     64 MB

a. The total L3 cache available on the POWER7 SCM, maintaining 4 MB per processor core.
b. The total number of processor cores and L3 cache within a populated enclosure.
c. The maximum number of cores with four CEC enclosures and all cores activated.
d. MaxCore mode applies to the Power 780 only. Each POWER7 SCM has eight active cores and 32 MB of L3 cache.
e. TurboCore mode applies to the Power 780 only. Each POWER7 SCM uses four of its eight cores, at a higher frequency, with the full 32 MB of L3 cache.
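The arithmetic behind Table 1-13 can be checked directly: each SCM carries 4 MB of L3 per core, a processor card populates two SCMs per enclosure (four on the 4-socket Power 780 card), and a full system has four enclosures. TurboCore is the one exception, since each SCM keeps its full 32 MB of L3 while activating only four cores. This is an illustrative sketch, not IBM tooling:

```python
L3_PER_CORE_MB = 4   # footnote a: 4 MB of on-chip L3 per processor core
ENCLOSURES_MAX = 4   # footnote c: up to four CEC enclosures per system

def enclosure_summary(cores_per_scm: int, scms_per_enclosure: int):
    """Return (cores per enclosure, L3 MB per enclosure, system max cores).

    Does not model TurboCore mode, where L3 is 8 MB per active core.
    """
    cores = cores_per_scm * scms_per_enclosure
    return cores, cores * L3_PER_CORE_MB, cores * ENCLOSURES_MAX
```

The three standard rows of the table (Power 770 6-core and 8-core, Power 780 4-socket 6-core) all satisfy this rule.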
Upgrade considerations
Feature conversions have been set up for:
POWER6 and POWER6+ processors to POWER7 processors
DDR2 memory DIMMs to DDR3 memory DIMMs
Trim kits (A new trim kit is needed when upgrading to a 2-drawer, 3-drawer, or 4-drawer system.)
Enterprise enablement
The following features that are present on the current system can be moved to the
new system:
DDR3 memory DIMMs (#5600, #5601, and #5602)
Active Memory Expansion Enablement (#4791)
FSP/Clock Pass Through Card (#5665)
Service Processor (#5664)
175 MB Cache RAID - Dual IOA Enablement Card (#5662)
Operator Panel (#1853)
Disk/Media Backplane (#5652)
PCIe adapters with cables, line cords, keyboards, and displays
PowerVM Standard edition (#7942) or PowerVM Enterprise edition (#7995)
I/O drawers (#5786, #5796, #5802, #5877, and #5886)
Racks (#0551, #0553, and #0555)
Doors (#6068 and #6069)
SATA DVD-RAM (#5762)
The Power 770 and Power 780 can support the following drawers:
#5802 and #5877 PCIe 12X I/O drawers
#5797 and 7314-G30 PCI-X (12X) I/O Drawer
#5786 and #7031-D24 TotalStorage EXP24 SCSI Disk Drawer
#5886 EXP12S SAS Disk Drawer
The Power 770 and Power 780 support only the SAS DASD SFF hard disks internally. The
existing 3.5-inch DASD hard disks can be attached to Power 770 and Power 780, but must
be located in an I/O drawer such as #5886.
For POWER6 or POWER6+ processor-based systems that have the On/Off CoD function
enabled, you must reorder the On/Off enablement features (#7951 and #7954) when placing
the upgrade MES order for the new Power 770 or 780 system to keep the On/Off CoD
function active. To initiate the model upgrade, the On/Off enablement features should be
removed from the configuration file before the Miscellaneous Equipment Shipment (MES)
order is started. Any temporary use of processors or memory owed to IBM on the existing
system must be paid before installing the new Power 770 model MMC or Power 780
model MHC.
Features #8018 and #8030 are available to support migration of the PowerVM features #7942
or #7995 during the initial order and build of the MMC or MHC upgrade MES order.
Customers can add feature #8018 or #8030 to their upgrade orders in a quantity not to
exceed the quantity of feature #7942 or #7995 obtained for the system being upgraded.
Feature #7942 or #7995 must be migrated to the new configuration report in a quantity that
equals feature #8018 or #8030. Additional #7942 or #7995 features can be ordered during
the upgrade.
1.11 Hardware Management Console models
The Hardware Management Console (HMC) is required for managing the IBM
Power 770 and Power 780. It provides a set of functions that are necessary to manage
the system, including:
Creating and maintaining a multiple partition environment
Displaying a virtual operating system session terminal for each partition
Displaying a virtual operator panel of contents for each partition
Detecting, reporting, and storing changes in hardware conditions
Powering managed systems on and off
Acting as a service focal point for service representatives to determine an appropriate
service strategy
The IBM Power 770 and Power 780 are not supported by the Integrated Virtualization
Manager (IVM).
Several HMC models are supported to manage POWER7-based systems. Two models
(7042-C08 and 7042-CR6) are available for ordering at the time of writing, but you can also
use one of the withdrawn models listed in Table 1-14.
Table 1-14 HMC models supporting POWER7 processor technology-based servers
At the time of writing, base Licensed Machine Code Version 7 Revision 7.4.0 or later is
required to support the Power 770 and Power 780.
Existing 7310 HMC models can be upgraded to Licensed Machine Code Version 7 to support
environments that might include POWER5, POWER5+, POWER6, POWER6+, and
POWER7 processor-based servers. Licensed Machine Code Version 6 (#0961) is not
available for 7042 HMCs.
If you want to support more than 254 partitions in total, then the HMC might require a memory
upgrade to 4 GB.
Type-model Availability Description
7310-C05 Withdrawn IBM 7310 Model C05 Desktop Hardware Management Console
7310-C06 Withdrawn IBM 7310 Model C06 Deskside Hardware Management Console
7042-C06 Withdrawn IBM 7042 Model C06 Deskside Hardware Management Console
7042-C07 Withdrawn IBM 7042 Model C07 Deskside Hardware Management Console
7042-C08 Available IBM 7042 Model C08 Deskside Hardware Management Console
7310-CR3 Withdrawn IBM 7310 Model CR3 Rack-Mounted Hardware Management Console
7042-CR4 Withdrawn IBM 7042 Model CR4 Rack-Mounted Hardware Management Console
7042-CR5 Withdrawn IBM 7042 Model CR5 Rack-Mounted Hardware Management Console
7042-CR6 Available IBM 7042 Model CR6 Rack-Mounted Hardware Management Console
1.12 System racks
The Power 770 and its I/O drawers are designed to be mounted in the 7014-T00,
7014-T42, 7014-B42, 7014-S25, #0551, #0553, or #0555 rack. The Power 780 and I/O
drawers can be ordered only with the 7014-T00 and 7014-T42 racks. These are built to the
19-inch EIA standard. An existing 7014-T00, 7014-B42, 7014-S25, 7014-T42, #0551, #0553,
or #0555 rack can be used for the Power 770 and Power 780 if sufficient space and power
are available.
The 36U (1.8-meter) rack (#0551) and the 42U (2.0-meter) rack (#0553) are available for
order on MES upgrade orders only. For initial system orders, the racks must be ordered as
machine type 7014, models T00, B42, S25, or T42.
If a system is to be installed in a rack or cabinet that is not from IBM, that rack must meet
the applicable mechanical, power, cooling, and safety requirements.
1.12.1 IBM 7014 model T00 rack
The 1.8-meter (71-in.) model T00 is compatible with past and present IBM Power systems.
The features of the T00 rack are as follows:
It has 36U (EIA units) of usable space.
It has optional removable side panels.
It has an optional highly perforated front door.
It has optional side-to-side mounting hardware for joining multiple racks.
It has standard business black or optional white color in OEM format.
It has increased power distribution and weight capacity.
It supports both AC and DC configurations.
The rack height is increased to 1926 mm (75.8 in.) if a power distribution panel is fixed to
the top of the rack.
Up to four power distribution units (PDUs) can be mounted in the PDU bays (Figure 1-6 on
page 31), but others can fit inside the rack. See 1.12.7, “The AC power distribution unit
and rack content” on page 31.
Weights are:
– T00 base empty rack: 244 kg (535 lb)
– T00 full rack: 816 kg (1795 lb)
Note: The client is responsible for ensuring that the installation of the drawer in the
preferred rack or cabinet results in a configuration that is stable, serviceable, safe, and
compatible with the drawer requirements for power, cooling, cable management, weight,
and rail security.
1.12.2 IBM 7014 model T42 rack
The 2.0-meter (79.3-inch) Model T42 addresses the client requirement for a tall enclosure to
house the maximum amount of equipment in the smallest possible floor space. The features
that differ in the model T42 rack from the model T00 include:
It has 42U (EIA units) of usable space (6U of additional space).
The model T42 supports AC only.
Weights are:
– T42 base empty rack: 261 kg (575 lb)
– T42 full rack: 930 kg (2045 lb)
1.12.3 IBM 7014 model S25 rack
The 1.3-meter (49-inch) model S25 rack has the following features:
25U (EIA units)
Weights:
– Base empty rack: 100.2 kg (221 lb)
– Maximum load limit: 567.5 kg (1250 lb)
The S25 racks do not have vertical mounting space that accommodates feature number 7188
PDUs. All PDUs required for application in these racks must be installed horizontally in the
rear of the rack. Each horizontally mounted PDU occupies 1U of space in the rack, and
therefore reduces the space available for mounting servers and other components.
1.12.4 Feature number 0555 rack
The 1.3-meter rack (#0555) is a 25U (EIA units) rack. The rack that is delivered as #0555 is
the same rack delivered when you order the 7014-S25 rack. The included features might
differ. The #0555 is supported, but it is no longer orderable.
1.12.5 Feature number 0551 rack
The 1.8-meter rack (#0551) is a 36U (EIA units) rack. The rack that is delivered as #0551 is
the same rack delivered when you order the 7014-T00 rack. The included features might
differ. Several features that are delivered as part of the 7014-T00 must be ordered separately
with the #0551.
1.12.6 Feature number 0553 rack
The 2.0-meter rack (#0553) is a 42U (EIA units) rack. The rack that is delivered as #0553 is
the same rack delivered when you order the 7014-T42 or B42 rack. The included features
might differ. Several features that are delivered as part of the 7014-T42 or B42 must be
ordered separately with the #0553.
Note: A special door (#6250) and side panels (#6238) are available to make the rack
appear as a high-end server (but in a 19-inch rack format instead of a 24-inch rack).
Note: The Power 780 cannot be ordered with an S25 or B25 rack.
1.12.7 The AC power distribution unit and rack content
For rack models T00 and T42, 12-outlet PDUs are available. These include the Universal
PDU with UTG0247 connector (#9188 and #7188) and the Intelligent PDU+ with UTG0247
connector (#7109).
Four PDUs can be mounted vertically in the back of the T00 and T42 racks. Figure 1-6 shows
the placement of the four vertically mounted PDUs. In the rear of the rack, two additional
PDUs can be installed horizontally in the T00 rack and three in the T42 rack. The four vertical
mounting locations will be filled first in the T00 and T42 racks. Mounting PDUs horizontally
consumes 1U per PDU and reduces the space available for other racked components. When
mounting PDUs horizontally, use fillers in the EIA units occupied by these PDUs to facilitate
proper air flow and ventilation in the rack.
Figure 1-6 PDU placement and PDU view
For the Power 770 and Power 780 installed in IBM 7014 or #055x racks, the following PDU
rules apply:
For PDU #7188 and #7109 when using power cord #6654, #6655, #6656, #6657,
or #6658: Each pair of PDUs can power up to three Power 770 and Power 780
CEC enclosures.
For PDU #7188 and #7109 when using power cord #6489, #6491, #6492, or #6653: Each
pair of PDUs can power up to seven Power 770 and Power 780 CEC enclosures.
For detailed power cord requirements and power cord feature codes, see the IBM Power
Systems Hardware Information Center website:
http://publib.boulder.ibm.com/infocenter/systems/scope/hw/index.jsp
The Base/Side Mount Universal PDU (#9188) and the optional, additional, Universal PDU
(#7188) and the Intelligent PDU+ options (#7109) support a wide range of country
requirements and electrical power specifications. The PDU receives power through a
UTG0247 power line connector. Each PDU requires one PDU-to-wall power cord. Various
power cord features are available for different countries and applications by varying the
PDU-to-wall power cord, which must be ordered separately. Each power cord provides the
unique design characteristics for the specific power requirements. To match new power
requirements and save previous investments, these power cords can be requested with an
initial order of the rack or with a later upgrade of the rack features.
The PDU has 12 client-usable IEC 320-C13 outlets. There are six groups of two outlets fed by
six circuit breakers. Each outlet is rated up to 10 amps, but each group of two outlets is fed
from one 15 amp circuit breaker.
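The outlet and breaker limits above can be expressed as a small check — 12 C13 outlets in six breaker groups of two, each outlet rated at 10 amps, each group fed by one 15 amp breaker. This is an illustrative sketch, not an IBM planning tool:

```python
def pdu_load_ok(outlet_amps: list) -> bool:
    """Check planned currents for the 12 PDU outlets.

    Outlets 2i and 2i+1 share one 15 A circuit breaker; each individual
    outlet is rated up to 10 A.
    """
    if len(outlet_amps) != 12:
        raise ValueError("a PDU has 12 outlets")
    per_outlet = all(a <= 10 for a in outlet_amps)
    per_group = all(outlet_amps[i] + outlet_amps[i + 1] <= 15
                    for i in range(0, 12, 2))
    return per_outlet and per_group
```

Note that two outlets each drawing a permissible 8 amps would still trip their shared 15 amp breaker, so the group limit must be checked as well as the per-outlet rating.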
The Universal PDUs are compatible with previous models.
1.12.8 Rack-mounting rules
The system consists of one to four CEC enclosures. Each enclosure occupies 4U of
vertical rack space. The primary considerations to account for when mounting the system into
a rack are:
For configurations with two, three, or four drawers, all drawers must be installed together in
the same rack, in a contiguous space of 8U, 12U, or 16U within the rack. The uppermost
enclosure in the system is the base enclosure. This enclosure will contain the active
service processor and the operator panel. If a second CEC enclosure is part of the
system, the backup service processor is contained in the second CEC enclosure.
The 7014-T42, -B42, or #0553 rack is constructed with a small flange at the bottom of EIA
location 37. When a system is installed near the top of a 7014-T42, -B42, or #0553 rack,
no system drawer can be installed in EIA positions 34, 35, or 36. This approach is to avoid
interference with the front bezel or with the front flex cable, depending on the system
configuration. A two-drawer system cannot be installed above position 29. A three-drawer
system cannot be installed above position 25. A four-drawer system cannot be installed
above position 21. (The position number refers to the bottom of the lowest drawer.)
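The T42/B42/#0553 placement rules above can be sketched as a validator. This reads "no system drawer can be installed in EIA positions 34, 35, or 36" as applying to each drawer's bottom EIA position — an interpretation, and the function is a hypothetical helper rather than IBM configurator logic:

```python
FORBIDDEN_STARTS = (34, 35, 36)          # flange interference zone
MAX_BOTTOM_POSITION = {2: 29, 3: 25, 4: 21}  # per the rules above

def placement_ok(drawers: int, bottom_eia: int) -> bool:
    """Check a CEC stack in a T42/B42/#0553 rack; each drawer is 4U."""
    starts = [bottom_eia + 4 * i for i in range(drawers)]
    if any(s in FORBIDDEN_STARTS for s in starts):
        return False
    return drawers == 1 or bottom_eia <= MAX_BOTTOM_POSITION[drawers]
```

For example, a two-drawer system starting at EIA 29 places its second drawer at EIA 33, which is allowed; starting at EIA 30 would put the second drawer at the forbidden EIA 34.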
When a system is installed in an 7014-T00, -T42, -B42, #0551, or #0553 rack that has no
front door, a Thin Profile Front Trim Kit must be ordered for the rack. The required trim kit
for the 7014-T00 or #0551 rack is #6263. The required trim kit for the 7014-T42, -B42, or
#0553 rack is #6272. When upgrading from a 9117-MMA, trim kits #6263 or #6272 can be
used for one drawer enclosures only.
Note: Ensure that the appropriate power cord feature is configured to support the power
being supplied.
Notes: Based on the power cord that is used, the PDU can supply from 4.8 - 19.2 kVA.
The total kilovolt ampere (kVA) of all the drawers that are plugged into the PDU must not
exceed the power cord limitation.
Each system drawer to be mounted in the rack requires two power cords, which are not
included in the base order. For maximum availability, be sure to connect power cords from
the same system to two separate PDUs in the rack, and to connect each PDU to
independent power sources.
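The kVA note above amounts to a simple budget check: the PDU supplies between 4.8 and 19.2 kVA depending on its wall power cord, and the drawers plugged into it must stay within that figure. A minimal sketch (illustrative names, not an IBM tool):

```python
def pdu_budget_ok(pdu_kva: float, drawer_kva: list) -> bool:
    """True if the summed drawer load fits the PDU's power-cord rating."""
    if not 4.8 <= pdu_kva <= 19.2:
        raise ValueError("PDU rating outside the documented 4.8 - 19.2 kVA range")
    return sum(drawer_kva) <= pdu_kva
```

The same three drawers might fit comfortably behind a 19.2 kVA cord yet overload a 4.8 kVA one, which is why the power cord feature must be matched to the planned load.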
The design of the Power 770 and Power 780 is optimized for use in a 7014-T00, -T42,
-B42, -S25, #0551, or #0553 rack. Both the front cover and the processor flex cables
occupy space on the front left side of an IBM 7014, #0551, and #0553 rack that might not
be available in typical non-IBM racks.
Acoustic door features are available with the 7014-T00, 7014-B42, 7014-T42, #0551,
and #0553 racks to meet the lower acoustic levels identified in the specification section
of this document. The acoustic door feature can be ordered on new T00, B42, T42, #0551,
and #0553 racks or ordered for the T00, B42, T42, #0551, and #0553 racks that you
already own.
1.12.9 Useful rack additions
This section highlights several available solutions for IBM Power Systems
rack-based systems.
IBM 7214 Model 1U2 SAS Storage Enclosure
The IBM System Storage 7214 Tape and DVD Enclosure Express is designed to mount in
one EIA unit of a standard IBM Power Systems 19-inch rack and can be configured with
one or two tape drives, or either one or two Slim DVD-RAM or DVD-ROM drives in the
right-side bay.
The two bays of the 7214 Express can accommodate the following tape or DVD drives for IBM
Power servers:
DAT72 36 GB Tape Drive: Up to two drives
DAT72 36 GB Tape Drive: Up to two drives
DAT160 80 GB Tape Drive: Up to two drives
Half-high LTO Ultrium 4 800 GB Tape Drive: Up to two drives
DVD-RAM Optical Drive: Up to two drives
DVD-ROM Optical Drive: Up to two drives
IBM System Storage 7216 Multi-Media Enclosure
The IBM System Storage 7216 Multi-Media Enclosure (Model 1U2) is designed to attach to
the Power 770 and the Power 780 through a USB port on the server or through a PCIe SAS
adapter. The 7216 has two bays to accommodate external tape, removable disk drive, or
DVD-RAM drive options.34 IBM Power 770 and 780 Technical Overview and Introduction
The following optional drive technologies are available for the 7216-1U2:
DAT160 80 GB SAS Tape Drive (#5619)
DAT320 160 GB SAS Tape Drive (#1402)
DAT320 160 GB USB Tape Drive (#5673)
Half-high LTO Ultrium 5 1.5 TB SAS Tape Drive (#8247)
DVD-RAM - 9.4 GB SAS Slim Optical Drive (#1420 and #1422)
RDX Removable Disk Drive Docking Station (#1103)
To attach a 7216 Multi-Media Enclosure to the Power 770 and Power 780, consider the
following cabling procedures:
Attachment by an SAS adapter
A PCIe Dual-X4 SAS adapter (#5901) or a PCIe LP 2-x4-port SAS Adapter 3 Gb (#5278)
must be installed in the Power 770 and Power 780 server to attach to a 7216 Model 1U2
Multi-Media Storage Enclosure. Attaching a 7216 to a Power 770 and Power 780 through
the integrated SAS adapter is not supported.
For each SAS tape drive and DVD-RAM drive feature installed in the 7216, the appropriate
external SAS cable will be included.
An optional Quad External SAS cable (#5544) is available with each 7216 order. The
Quad External Cable allows up to four 7216 SAS tape or DVD-RAM features to
attach to a single System SAS adapter.
Up to two 7216 storage enclosure SAS features can be attached per PCIe Dual-X4 SAS
adapter (#5901) or the PCIe LP 2-x4-port SAS Adapter 3 Gb (#5278).
Attachment by a USB adapter
The Removable RDX HDD Docking Station features on 7216 only support the USB cable
that is provided as part of the feature code. Additional USB hubs, add-on USB cables, or
USB cable extenders are not supported.
For each RDX Docking Station feature installed in the 7216, the appropriate external USB
cable will be included. The 7216 RDX Docking Station feature can be connected to the
external, integrated USB ports on the Power 770 and Power 780 or to the USB ports on a
4-Port USB PCI Express Adapter (#2728).
The 7216 DAT320 USB tape drive or RDX Docking Station features can be connected to
the external, integrated USB ports on the Power 770 and Power 780.
The two drive slots of the 7216 enclosure can hold the following drive combinations:
One tape drive (DAT160 SAS or Half-high LTO Ultrium 5 SAS) with second bay empty
Two tape drives (DAT160 SAS or Half-high LTO Ultrium 5 SAS) in any combination
One tape drive (DAT160 SAS or Half-high LTO Ultrium 5 SAS) and one DVD-RAM SAS
drive sled with one or two DVD-RAM SAS drives
Up to four DVD-RAM drives
One tape drive (DAT160 SAS or Half-high LTO Ultrium 5 SAS) in one bay, and one RDX
Removable HDD Docking Station in the other drive bay
One RDX Removable HDD Docking Station and one DVD-RAM SAS drive sled with one
or two DVD-RAM SAS drives in the right bay
Two RDX Removable HDD Docking Stations
Note: The DAT320 160 GB SAS Tape Drive (#1402) and the DAT320 160 GB USB Tape
Drive (#5673) are no longer available as of July 15, 2011.
Figure 1-7 shows the 7216 Multi-Media Enclosure.
Figure 1-7 7216 Multi-Media Enclosure
In general, the 7216-1U2 is supported by the AIX, IBM i, and Linux operating systems.
However, the RDX Removable Disk Drive Docking Station and the DAT320 USB Tape Drive
are not supported with IBM i.
Flat panel display options
The IBM 7316 Model TF3 is a rack-mountable flat panel console kit consisting of a 17-inch
337.9 mm x 270.3 mm flat panel color monitor, rack keyboard tray, IBM Travel Keyboard,
support for IBM keyboard/video/mouse (KVM) switches, and language support. The IBM
7316-TF3 Flat Panel Console Kit offers:
Slim, sleek, lightweight monitor design that occupies only 1U (1.75 inches) in a 19-inch
standard rack
A 17-inch, flat screen TFT monitor with truly accurate images and virtually no distortion
The ability to mount the IBM Travel Keyboard in the 7316-TF3 rack keyboard tray
Support for IBM keyboard/video/mouse (KVM) switches that provide control of as many as
128 servers, and support of both USB and PS/2 server-side keyboard and mouse
connections
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 2. Architecture and technical
overview
The IBM Power 780 offers two versions of CEC enclosure. The first is a 2-socket CEC
enclosure, populated with 8-core POWER7 processor cards. This architecture (Figure 2-1 on
page 38) enables a maximum system configuration of 64 processors. The Power 780 also
offers a 4-socket CEC enclosure, populated with 6-core POWER7 processor cards
(Figure 2-2 on page 39), enabling a maximum system configuration of 96 cores.
The IBM Power 770 offers a 2-socket CEC enclosure, populated with 6-core or 8-core
POWER7 processors.
This chapter provides an overview of the system architecture and its major components. The
bandwidths that are provided are theoretical maximums used for reference.
The speeds shown are at an individual component level. Multiple components and application
implementation are key to achieving the best performance.
Always do the performance sizing at the application workload environment level and evaluate
performance using real-world performance measurements and production workloads.
Figure 2-1 shows the logical system diagram of the 2-socket Power 770 and Power 780.
Figure 2-1 Two-socket IBM Power 770 and Power 780 logical system diagram
The diagram shows two POWER7 chips (6-8 cores each), each with a memory controller
driving eight buffered DDR3 DIMMs at 136.448 GBps per socket, and SMP connectors
running at 3.24 GHz. The I/O subsystem comprises two P7-IOC hubs at 1.0 GHz, two GX++
slots, six PCIe Gen2 x8 (FH/HL) slots, dual SAS controllers with optional RAID expansion
cards serving six HDD bays and a DVD drive, 2 x 10 Gbps plus 2 x 1 Gbps Ethernet, a USB
controller with two ports, the TPMD, and a service processor with two system ports, two
HMC ports, two SPCN ports, and a VPD chip. The chip-to-I/O links run at 2.46 GHz
(2 * 4 bytes), or 19.712 GBps.
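The quoted link bandwidth follows from the link arithmetic: 2 transfers of 4 bytes per cycle at the link frequency. The 2.46 GHz figure appears to be rounded from 2.464 GHz, an assumption inferred here from the quoted 19.712 GBps:

```python
LINK_HZ = 2.464e9        # printed as 2.46 GHz; 2.464 GHz inferred from 19.712 GBps
BYTES_PER_CYCLE = 2 * 4  # two 4-byte transfers per cycle, as labeled in the figure

# Per-link bandwidth in GBps
gbps = LINK_HZ * BYTES_PER_CYCLE / 1e9
```

Multiplying out gives 19.712 GBps per link, matching the figure annotations.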
Figure 2-2 shows the logical system diagram of the 4-socket Power 780.
Figure 2-2 Four-socket IBM Power 780 logical system diagram
The diagram shows four 6-core POWER7 chips linked by 2.9 GHz SMP buses, each chip with
a memory controller driving buffered DDR3 DIMMs at 136.448 GBps per socket. The I/O
subsystem matches the two-socket diagram: two P7-IOC hubs at 1.0 GHz, two GX++ slots,
six PCIe Gen2 x8 (FH/HL) slots on 2.46 GHz (2 * 4 bytes) 19.712 GBps links, dual SAS
controllers with optional RAID expansion cards, six HDD bays, a DVD drive, 2 x 10 Gbps
plus 2 x 1 Gbps Ethernet, a USB controller with two ports, the TPMD, and a service
processor with two system ports, two HMC ports, two SPCN ports, and a VPD chip.
2.1 The IBM POWER7 processor
The IBM POWER7 processor represents a leap forward in technology achievement and
associated computing capability. The multi-core architecture of the POWER7 processor has
been matched with innovation across a wide range of related technologies to deliver leading
throughput, efficiency, scalability, and RAS.
Although the processor is an important component in delivering outstanding servers, many
elements and facilities have to be balanced on a server to deliver maximum throughput. As
with previous generations of systems based on POWER processors, the design philosophy
for POWER7 processor-based systems is one of system-wide balance in which the POWER7
processor plays an important role.
In many cases, IBM has been innovative in order to achieve required levels of throughput and
bandwidth. Areas of innovation for the POWER7 processor and POWER7 processor-based
systems include (but are not limited to) these:
On-chip L3 cache implemented in embedded dynamic random access memory (eDRAM)
Cache hierarchy and component innovation
Advances in memory subsystem
Advances in off-chip signaling
Exploitation of long-term investment in coherence innovation
The superscalar POWER7 processor design also provides a variety of other capabilities:
Binary compatibility with the prior generation of POWER processors
Support for PowerVM virtualization capabilities, including PowerVM Live Partition Mobility
to and from POWER6 and POWER6+ processor-based systems
Figure 2-3 shows the POWER7 processor die layout with the major areas identified:
Processor cores
L2 cache
L3 cache and chip interconnection
Symmetric multiprocessing (SMP) links
Memory controllers
Figure 2-3 POWER7 processor die with key areas indicated
2.1.1 POWER7 processor overview
The POWER7 processor chip is fabricated using the IBM 45 nm Silicon-On-Insulator (SOI)
technology using copper interconnect and implements an on-chip L3 cache using eDRAM.
The POWER7 processor chip is 567 mm² and is built using 1.2 billion components
(transistors). Eight processor cores are on the chip, each with 12 execution units, 256 KB of
L2 cache, and access to up to 32 MB of shared on-chip L3 cache.
For memory access, the POWER7 processor includes two DDR3 (double data rate 3)
memory controllers, each with four memory channels. To be able to scale effectively, the
POWER7 processor uses a combination of local and global SMP links with very high
coherency bandwidth and takes advantage of the IBM dual-scope broadcast coherence
protocol.
Table 2-1 summarizes the technology characteristics of the POWER7 processor.
Table 2-1 Summary of POWER7 processor technology
2.1.2 POWER7 processor core
Each POWER7 processor core implements aggressive out-of-order (OoO) instruction
execution to drive high efficiency in the use of available execution paths. The POWER7
processor has an Instruction Sequence Unit that is capable of dispatching up to six
instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to
the instruction execution units. The POWER7 processor has a set of 12 execution units:
Two fixed point units
Two load store units
Four double precision floating point units
One vector unit
One branch unit
One condition register unit
One decimal floating point unit
These caches are tightly coupled to each POWER7 processor core:
Instruction cache: 32 KB
Data cache: 32 KB
L2 cache: 256 KB, implemented in fast SRAM
Technology                         POWER7 processor
Die size                           567 mm²
Fabrication technology             45 nm lithography, copper interconnect,
                                   Silicon-on-Insulator, eDRAM
Components                         1.2 billion components/transistors, offering the
                                   equivalent function of 2.7 billion (for further details,
                                   see 2.1.6, "On-chip L3 cache innovation and
                                   Intelligent Cache" on page 46)
Processor cores                    4, 6, or 8
Max execution threads core/chip    4/32
L2 cache core/chip                 256 KB/2 MB
On-chip L3 cache core/chip         4 MB/32 MB
DDR3 memory controllers            1 or 2
SMP design point                   32 sockets with IBM POWER7 processors
Compatibility                      With prior generations of POWER processors
2.1.3 Simultaneous multithreading
An enhancement in the POWER7 processor is the addition of the SMT4 mode to enable four
instruction threads to execute simultaneously in each POWER7 processor core. Thus, these
are the instruction thread execution modes of the POWER7 processor:
SMT1: Single instruction execution thread per core
SMT2: Two instruction execution threads per core
SMT4: Four instruction execution threads per core
SMT4 mode enables the POWER7 processor to maximize the throughput of the processor
core by offering an increase in processor-core efficiency. SMT4 mode is the latest step in an
evolution of multithreading technologies introduced by IBM. Figure 2-4 shows the evolution of
simultaneous multithreading in the industry.
Figure 2-4 Evolution of simultaneous multi-threading
The various SMT modes offered by the POWER7 processor allow flexibility, enabling users to
select the threading technology that meets an aggregation of objectives such as
performance, throughput, energy use, and workload enablement.
Intelligent Threads
The POWER7 processor features Intelligent Threads that can vary based on the workload
demand. The system either automatically selects (or the system administrator can manually
select) whether a workload benefits from dedicating as much capability as possible to a
single thread of work, or if the workload benefits more from having capability spread across
two or four threads of work. With more threads, the POWER7 processor can deliver more
total capacity as more tasks are accomplished in parallel. With fewer threads, those
workloads that need very fast individual tasks can get the performance that they need for
maximum benefit.
(Figure 2-4 plots execution-unit occupancy (FX0/FX1, FP0/FP1, LS0/LS1, BRX, CRL) across four design points: 1995 single-thread out-of-order, 1997 hardware multithreading, 2004 2-way SMT, and 2010 4-way SMT, with up to four threads executing concurrently in the 2010 design.)
2.1.4 Memory access
Each POWER7 processor chip has two DDR3 memory controllers, each with four memory
channels (enabling eight memory channels per POWER7 processor). Each channel operates
at 6.4 GHz and can address up to 32 GB of memory. Thus, each POWER7 processor chip is
capable of addressing up to 256 GB of memory.
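As a quick sanity check, the per-chip limits follow directly from the controller and channel counts given above (a minimal sketch; the constants are taken from this paragraph):

```python
# Per-chip memory limits implied by section 2.1.4:
# two DDR3 memory controllers, four channels each, 32 GB per channel.
controllers = 2
channels_per_controller = 4
gb_per_channel = 32

channels = controllers * channels_per_controller  # memory channels per chip
max_memory_gb = channels * gb_per_channel         # addressable memory per chip

print(channels, max_memory_gb)  # 8 256
```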
Figure 2-5 gives a simple overview of the POWER7 processor memory access structure.
Figure 2-5 Overview of POWER7 memory access structure
2.1.5 Flexible POWER7 processor packaging and offerings
POWER7 processors have the unique ability to optimize to various workload types. For
example, database workloads typically benefit from very fast processors that handle high
transaction rates at high speeds. Web workloads typically benefit more from processors with
many threads that allow the breaking down of web requests into many parts and handle them
in parallel. POWER7 processors uniquely have the ability to provide leadership performance
in either case.
TurboCore mode
Users can opt to run selected servers in TurboCore mode. It uses four cores per POWER7
processor chip with access to the full 32 MB of L3 cache (8 MB per core) and at a faster
processor core frequency, which might save on software costs for those applications that are
licensed per core.
Note: In certain POWER7 processor-based systems, one memory controller is active with
four memory channels being used.
(Figure 2-5 shows the eight-core POWER7 processor chip with two integrated DDR3 memory controllers, each connected through an advanced buffer ASIC chip to the DDR3 DRAMs. Callouts: dual integrated DDR3 memory controllers with high channel and DIMM utilization, advanced energy management, and RAS advances; eight high-speed 6.4 GHz channels with new low-power differential signalling; and a new DDR3 buffer chip architecture with larger capacity support (32 GB/core), energy management support, and RAS enablement.)
MaxCore mode
MaxCore mode is for workloads that benefit from a higher number of cores and threads
handling multiple tasks simultaneously that take advantage of increased parallelism.
MaxCore mode provides up to eight cores and up to 32 threads per POWER7 processor.
POWER7 processor 4-core and 6-core offerings
The base design for the POWER7 processor is an 8-core processor with 32 MB of on-chip L3
cache (4 MB per core). However, the architecture allows for differing numbers of processor
cores to be active, 4 cores or 6 cores, as well as the full 8-core version.
In most cases (MaxCore mode), the L3 cache associated with the implementation is
dependent on the number of active cores. For a 6-core version, this typically means that
6 x 4 MB (24 MB) of L3 cache is available. Similarly, for a 4-core version, the L3 cache
available is 16 MB.
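The arithmetic above generalizes: in MaxCore mode the available L3 cache is simply 4 MB per active core. A one-line sketch:

```python
# L3 cache available in MaxCore mode: 4 MB per active core (section 2.1.5).
def l3_cache_mb(active_cores, mb_per_core=4):
    return active_cores * mb_per_core

print(l3_cache_mb(8), l3_cache_mb(6), l3_cache_mb(4))  # 32 24 16
```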
Optimized for servers
The POWER7 processor forms the basis of a flexible compute platform and can be offered in
a number of guises to address differing system requirements.
The POWER7 processor can be offered with a single active memory controller with four
channels for servers where higher degrees of memory parallelism are not required.
Similarly, the POWER7 processor can be offered with a variety of SMP bus capacities that
are appropriate to the scaling-point of particular server models.
Figure 2-6 outlines the physical packaging options that are supported with POWER7
processors.
Figure 2-6 Outline of the POWER7 processor physical packaging
Note: TurboCore is available on the Power 780 and Power 795.
Note: The 4-core processor is not available on the Power 770 and Power 780.
(Figure 2-6 shows two packaging options: a single-chip organic package with one memory controller and local broadcast SMP links active, and a single-chip glass ceramic package with two memory controllers and both local and global broadcast SMP links active.)
2.1.6 On-chip L3 cache innovation and Intelligent Cache
A breakthrough in material engineering and microprocessor fabrication has enabled IBM to
implement the L3 cache in eDRAM and place it on the POWER7 processor die. L3 cache is
critical to a balanced design, as is the ability to provide good signaling between the L3 cache
and other elements of the hierarchy, such as the L2 cache or SMP interconnect.
The on-chip L3 cache is organized into separate areas with differing latency characteristics.
Each processor core is associated with a Fast Local Region of L3 cache (FLR-L3) but also
has access to other L3 cache regions as shared L3 cache. Additionally, each core can
negotiate to use the FLR-L3 cache associated with another core, depending on reference
patterns. Data can also be cloned to be stored in more than one core's FLR-L3 cache, again
depending on reference patterns. This Intelligent Cache management enables the POWER7
processor to optimize the access to L3 cache lines and minimize overall cache latencies.
Figure 2-7 shows the FLR-L3 cache regions for each of the cores on the POWER7
processor die.
Figure 2-7 Fast local regions of L3 cache on the POWER7 processor
The innovation of using eDRAM on the POWER7 processor die is significant for
several reasons:
Latency improvement
A six-to-one latency improvement occurs by moving the L3 cache on-chip compared to L3
accesses on an external (on-ceramic) ASIC.
Bandwidth improvement
A 2x bandwidth improvement occurs with on-chip interconnect. Frequency and bus sizes
are increased to and from each core.
No off-chip drivers or receivers
Removing drivers or receivers from the L3 access path lowers interface requirements,
conserves energy, and lowers latency.
Small physical footprint
The performance of eDRAM when implemented on-chip is similar to conventional SRAM
but requires far less physical space. IBM on-chip eDRAM uses only a third of the
components used in conventional SRAM, which has a minimum of six transistors to
implement a 1-bit memory cell.
Low energy consumption
The on-chip eDRAM uses only 20% of the standby power of SRAM.
2.1.7 POWER7 processor and Intelligent Energy
Energy consumption is an important area of focus for the design of the POWER7 processor,
which includes Intelligent Energy features that help to dynamically optimize energy usage
and performance so that the best possible balance is maintained. Intelligent Energy features
like EnergyScale work with IBM Systems Director Active Energy Manager to dynamically
optimize processor speed based on thermal conditions and system utilization.
2.1.8 Comparison of the POWER7 and POWER6 processors
Table 2-2 shows comparable characteristics between the generations of POWER7 and
POWER6 processors.
Table 2-2 Comparison of technology for the POWER7 processor and the prior generation

                        POWER7                  POWER6+           POWER6
Technology              45 nm                   65 nm             65 nm
Die size                567 mm²                 341 mm²           341 mm²
Maximum cores           8                       2                 2
Maximum SMT
threads per core        4 threads               2 threads         2 threads
Maximum frequency       4.25 GHz                5.0 GHz           4.7 GHz
L2 cache                256 KB per core         4 MB per core     4 MB per core
L3 cache                4 MB of FLR-L3 cache    32 MB off-chip    32 MB off-chip
                        per core, with each     eDRAM ASIC        eDRAM ASIC
                        core having access to
                        the full 32 MB of L3
                        cache, on-chip eDRAM
Memory support          DDR3                    DDR2              DDR2
I/O bus                 Two GX++                One GX++          One GX++
Enhanced cache mode
(TurboCore)             Yes (a)                 No                No
Sleep and nap mode (b)  Both                    Nap only          Nap only

a. Not supported on the Power 770 and Power 780 4-socket systems.
b. For more information about sleep and nap modes, see 2.15.1, "IBM EnergyScale
technology" on page 114.
2.2 POWER7 processor cards
IBM Power 770 and Power 780 servers are modular systems built using one to four CEC
enclosures. The processor and memory subsystem in each CEC enclosure is contained on a
single processor card. The processor card contains either two or four processor sockets and
16 fully buffered DDR3 memory DIMMs.
The IBM Power 770 supports the 2-socket processor cards, populated with 6-core or 8-core
POWER7 processors. This enables a maximum system configuration of 64 cores, built from
four CEC enclosures.
The IBM Power 780 supports both the 2-socket and 4-socket processor cards. The 4-socket
processor cards are populated with 6-core POWER7 processors, enabling a maximum
system configuration of 96 cores.
2.2.1 Two-socket processor card
The 2-socket processor card (Figure 2-8) is supported in both the Power 770 and the
Power 780 system. Each processor is connected to eight DIMMs via two memory controllers
(four DIMMs on each).
Figure 2-8 IBM Power 770 and Power 780 2-socket processor card
Note: Mixing 2-socket and 4-socket CEC enclosures within a Power 780 is not supported.
Power 770 systems
IBM Power 770 systems support two POWER7 processor options of varying clock speed and
core counts. Table 2-3 summarizes these options.
Table 2-3 Summary of POWER7 processor options for the Power 770 server
With two POWER7 processors in each enclosure, systems can be equipped as follows:
Using 6-core POWER7 processors:
– 12 cores
– 24 cores
– 36 cores
– 48 cores
Using 8-core POWER7 processors:
– 16 cores
– 32 cores
– 48 cores
– 64 cores
Power 780 systems
The IBM Power 780 2-socket CEC enclosures offer POWER7 processors with 8 cores.
However, the system can be booted in one of two modes:
MaxCore mode
TurboCore mode
In MaxCore mode, all eight cores of each POWER7 processor are active, run at 3.92 GHz,
and have full access to the 32 MB of L3 cache. In TurboCore mode the system uses just four
of the POWER7 processor cores, but runs at the higher frequency of 4.14 GHz and has
access to the full 32 MB of L3 cache.
Table 2-4 summarizes the POWER7 processor and mode options for the Power 780 system.
Table 2-4 Summary of POWER7 processor options and modes for the Power 780 server
Cores per POWER7 processor   Frequency   L3 cache size available per POWER7 processor
6                            3.72 GHz    24 MB
8                            3.30 GHz    32 MB
Active cores per POWER7 processor   System mode   Frequency   L3 cache size available per POWER7 processor
8                                   MaxCore       3.92 GHz    32 MB
4                                   TurboCore     4.14 GHz    32 MB
With two POWER7 processors in each enclosure, systems can be equipped as follows:
MaxCore mode:
– 16 cores
– 32 cores
– 48 cores
– 64 cores
TurboCore mode:
– 8 cores
– 16 cores
– 24 cores
– 32 cores
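The configurations listed for both servers follow from the enclosure count (one to four), the two sockets per enclosure, and the active cores per processor (four per processor in TurboCore mode). A small sketch that reproduces the lists:

```python
# Core counts for 1-4 CEC enclosures with two processor sockets each.
def core_counts(active_cores_per_socket, sockets_per_enclosure=2, max_enclosures=4):
    return [n * sockets_per_enclosure * active_cores_per_socket
            for n in range(1, max_enclosures + 1)]

print(core_counts(6))  # Power 770 with 6-core processors: [12, 24, 36, 48]
print(core_counts(8))  # Power 770 8-core / Power 780 MaxCore: [16, 32, 48, 64]
print(core_counts(4))  # Power 780 TurboCore (4 active cores): [8, 16, 24, 32]
```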
2.2.2 Four-socket processor card
A 4-socket processor card is supported on the Power 780 (Figure 2-9), enabling a maximum
system configuration of 96 cores (6-core processors). Each POWER7 processor is connected
to four memory DIMMs through a single memory controller.
Figure 2-9 IBM Power 780 4-socket processor card
Power 780 Systems
Table 2-5 summarizes the POWER7 processor options for the Power 780 4-socket system.
Table 2-5 Summary of POWER7 processor options for the Power 780 4-socket system
The TurboCore option is not supported with the 4-socket processor cards.
Cores per POWER7 processor   Frequency   L3 cache size available per POWER7 processor
6                            3.44 GHz    24 MB
2.2.3 Processor comparison
The 2-socket and 4-socket processor cards available for the Power 780 utilize slightly different
POWER7 processors. Table 2-6 shows a comparison.
Table 2-6 Comparison of processors used with 2-socket and 4-socket processor cards
The most significant difference between the processors is the interconnect configuration. On
the 2-socket processor card, the POWER7 processor has two memory controllers, each
connected to four memory DIMMs (Figure 2-10).
Figure 2-10 Processor interconnects on 2-socket processor card
Area             POWER7 processor used on      POWER7 processor used on
                 2-socket CPU card             4-socket CPU card
Technology       45 nm                         45 nm
Die size         567 mm²                       567 mm²
Power            250 W                         150 W
Cores            8                             6
Max frequency    3.92 GHz                      3.44 GHz
                 (4.14 GHz with TurboCore)
L2/L3 cache      256 KB/4 MB per core          256 KB/4 MB per core
Memory support   DDR3                          DDR3
Fabric bus       Star fabric bus               Star fabric bus
I/O bus          Two GX++                      Two GX++
TurboCore mode   Yes                           No
Sleep/nap mode   Yes                           Yes
(Figure 2-10 shows the processor on the 2-socket card: both memory controllers, MC0 and MC1, drive four DIMMs each; the two GX++ buses, GX0 and GX1, are 4 bytes wide at 2.46 Gb/s; the intra-node Z, Y, and X buses are 8 bytes wide at 3.248 Gb/s; and the inter-node A and B buses are 8 bytes wide at 2.464 Gb/s.)
The POWER7 processor used on the 4-socket processor card also has two memory
controllers, but only one is used. This results in four DIMMs per memory controller, the same
as the processor used on the 2-socket processor card.
Similarly, the processor used on the 4-socket CPU card has two GX++ buses, but only one is
used (Figure 2-11).
Figure 2-11 Processor interconnects on 4-socket processor card
2.3 Memory subsystem
On the Power 770 and Power 780 servers, regardless of whether two or four Single Chip
Modules (SCMs) are used, each enclosure houses 16 DDR3 DIMM slots. The DIMM cards for
the Power 770 and Power 780 are 96 mm tall, fully buffered, and placed in one of the 16 DIMM
slots on the processor card.
2.3.1 Fully buffered DIMM
Fully buffered DIMM technology is used to increase reliability, speed, and density of
memory subsystems. Conventionally, data lines from the memory controllers have to be
connected to the data lines in every DRAM module. As memory width and access speed
increases, the signal decays at the interface of the bus and the device. This effect traditionally
degrades either the memory access times or memory density. Fully buffered DIMMs
overcome this effect by implementing an advanced buffer between the memory controllers
and the DRAMs with two independent signaling interfaces. This technique decouples the
DRAMs from the bus and memory controller interfaces, allowing efficient signaling between
the buffer and the DRAM.
2.3.2 Memory placement rules
The minimum DDR3 memory capacity for the Power 770 and Power 780 systems is 64 GB of
installed memory.
(Figure 2-11 shows the processor on the 4-socket card: only memory controller MC0 drives its four DIMMs; only the GX1 bus, 4 bytes wide at 2.46 Gb/s, is used; the W, Y, and X buses are 8 bytes wide at 2.8 Gb/s; and the B bus is 8 bytes wide at 2.464 Gb/s.)
All the memory DIMMs for the Power 770 and Power 780 are Capacity Upgrade on Demand
capable and must have a minimum of 50% of their physical capacity activated. For example,
the minimum installed memory for both servers is 64 GB of RAM, of which as little as 32 GB
needs to be active.
Figure 2-12 shows the physical memory DIMM topology for the Power 770 and Power 780
with two single-chip-modules (SCMs).
Figure 2-12 Physical memory DIMM topology for the Power 770 and Power 780 with two SCMs
For the Power 770 and Power 780 server models with two SCMs, there are 16 buffered
DIMM slots:
DIMM slots J1A to J8A are connected to the memory controllers on POWER7 processor 0.
DIMM slots J1B to J8B are connected to the memory controllers on POWER7 processor 1.
These DIMM slots are divided into four quads, each quad holding four DIMMs.
Note: DDR2 memory (used in POWER6 processor-based systems) is not supported in
POWER7 processor-based systems.
(Figure 2-12 shows the 2-SCM processor card layout: two POWER7 SCMs, each with memory controllers MC0 and MC1 driving the 16 DDR3 DIMM slots J1A-J8A and J1B-J8B at card positions P3-C1 through P3-C20, grouped into Quads 1-4; the card also carries seven voltage regulators, the TPMD slot, and the I/O connectors.)
Figure 2-13 shows the physical memory DIMM topology for the Power 780 with four
single-chip-modules (SCMs).
Figure 2-13 Physical memory DIMM topology for the Power 780 with four SCMs
For the Power 780 with four SCMs, there are 16 buffered DIMM slots available:
DIMM slots J1A to J4A are connected to the memory controller on POWER7 processor 0.
DIMM slots J5A to J8A are connected to the memory controller on POWER7 processor 1.
DIMM slots J1B to J4B are connected to the memory controller on POWER7 processor 2.
DIMM slots J5B to J8B are connected to the memory controller on POWER7 processor 3.
The memory-plugging rules are as follows:
DIMMs must be installed four at a time; such a group is referred to as a DIMM-quad and is
identified by color in Table 2-7 on page 55 and Table 2-8 on page 55.
DIMM-quads must be homogeneous. (Only DIMMs of the same capacity are allowed in
the same quad.)
A DIMM-quad is the minimum installable unit.
For maximum memory performance, the total memory capacity on each memory
controller must be equivalent.
The DIMM-quad placement rules for a single enclosure are as follows (see Figure 2-12 on
page 53 for the physical memory topology):
– Quad 1: J1A, J2A, J5A, J6A (mandatory minimum for each enclosure)
– Quad 2: J1B, J2B, J5B, J6B (mandatory minimum for each enclosure)
– Quad 3: J3A, J4A, J7A, J8A
– Quad 4: J3B, J4B, J7B, J8B
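These plugging rules lend themselves to a small validation sketch (a hypothetical helper, not an IBM tool; the quad-to-slot mapping is taken from the single-enclosure list above):

```python
# Validate DIMM-quad placement for one enclosure (sketch, not an IBM tool).
# `installed` maps slot name -> DIMM capacity in GB; an absent slot is empty.
QUADS = {
    1: ("J1A", "J2A", "J5A", "J6A"),   # mandatory minimum
    2: ("J1B", "J2B", "J5B", "J6B"),   # mandatory minimum
    3: ("J3A", "J4A", "J7A", "J8A"),
    4: ("J3B", "J4B", "J7B", "J8B"),
}

def check_enclosure(installed):
    errors = []
    for quad, slots in QUADS.items():
        sizes = {installed.get(s) for s in slots}
        if sizes == {None}:                    # quad entirely empty
            if quad in (1, 2):
                errors.append(f"quad {quad} is mandatory")
        elif None in sizes:                    # some, but not all, slots filled
            errors.append(f"quad {quad} is partially populated")
        elif len(sizes) > 1:                   # mixed capacities in one quad
            errors.append(f"quad {quad} mixes DIMM capacities")
    return errors

ok = {s: 8 for s in QUADS[1] + QUADS[2]}       # minimum valid configuration
print(check_enclosure(ok))                     # []
print(check_enclosure({"J1A": 8}))             # partial quad 1, missing quad 2
```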
(Figure 2-13 shows the 4-SCM processor card layout: four POWER7 SCMs, each using only memory controller MC0 over its B bus to drive four of the 16 DDR3 DIMM slots J1A-J8A and J1B-J8B at card positions P3-C1 through P3-C20, grouped into Quads 1-4; the card also carries the voltage regulators, the TPMD slot, and the I/O connectors.)
Table 2-7 shows the optimal placement of each DIMM-quad within a single enclosure system.
Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B, as shown in the table.
Table 2-7 Optimum DIMM-quad placement for a single enclosure system
When populating a multi-enclosure system with DIMM-quads, each enclosure must have at
least two DIMM-quads installed in slots J1A, J2A, J5A, J6A, J1B, J2B, J5B, and J6B. After
the mandatory requirements and memory-plugging rules are followed, there is an optimal
approach to populating the systems.
Table 2-8 shows the optimal placement of each DIMM-quad within a dual-enclosure system.
Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B.
Table 2-8 Optimum DIMM-quad placement for a dual enclosure system
Enclosure 0
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q1 Q1 Q3 Q3 Q1 Q1 Q3 Q3 Q2 Q2 Q4 Q4 Q2 Q2 Q4 Q4
Mandatory: Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B.
Note: For maximum memory performance, the total memory capacity on each memory controller must
be equivalent.
Enclosure 0
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q1 Q1 Q5 Q5 Q1 Q1 Q5 Q5 Q3 Q3 Q7 Q7 Q3 Q3 Q7 Q7
Enclosure 1
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q2 Q2 Q6 Q6 Q2 Q2 Q6 Q6 Q4 Q4 Q8 Q8 Q4 Q4 Q8 Q8
Mandatory: Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B.
Note: For maximum memory performance, the total memory capacity on each memory controller must
be equivalent.
Table 2-9 shows the optimal placement of each DIMM-quad within a three-enclosure system.
Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B.
Table 2-9 Optimum DIMM-quad placement for a three-enclosure system
Enclosure 0
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory Controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q1 Q1 Q7 Q7 Q1 Q1 Q7 Q7 Q4 Q4 Q10 Q10 Q4 Q4 Q10 Q10
Enclosure 1
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q2 Q2 Q8 Q8 Q2 Q2 Q8 Q8 Q5 Q5 Q11 Q11 Q5 Q5 Q11 Q11
Enclosure 2
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q3 Q3 Q9 Q9 Q3 Q3 Q9 Q9 Q6 Q6 Q12 Q12 Q6 Q6 Q12 Q12
Mandatory: Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B.
Note: For maximum memory performance, the total memory capacity on each memory controller must
be equivalent.
Table 2-10 shows the optimal placement of each DIMM-quad within a four-enclosure system.
Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B.
Table 2-10 Optimum DIMM-quad placement for a four-enclosure system
Enclosure 0
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q1 Q1 Q9 Q9 Q1 Q1 Q9 Q9 Q5 Q5 Q13 Q13 Q5 Q5 Q13 Q13
Enclosure 1
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q2 Q2 Q10 Q10 Q2 Q2 Q10 Q10 Q6 Q6 Q14 Q14 Q6 Q6 Q14 Q14
Enclosure 2
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q3 Q3 Q11 Q11 Q3 Q3 Q11 Q11 Q7 Q7 Q15 Q15 Q7 Q7 Q15 Q15
Enclosure 3
POWER7 processor 0 POWER7 processor 1
Memory controller 1 Memory controller 0 Memory controller 1 Memory controller 0
J1A J2A J3A J4A J5A J6A J7A J8A J1B J2B J3B J4B J5B J6B J7B J8B
Q4 Q4 Q12 Q12 Q4 Q4 Q12 Q12 Q8 Q8 Q16 Q16 Q8 Q8 Q16 Q16
Mandatory: Each enclosure must have at least two DIMM-quads installed in slots J1A, J2A, J5A, J6A,
J1B, J2B, J5B, and J6B.
Note: For maximum memory performance, the total memory capacity on each memory controller must
be equivalent.
2.3.3 Memory throughput
POWER7 has exceptional cache, memory, and interconnect bandwidths. Table 2-11 shows
the bandwidth estimate for the Power 770 system running at 3.3 GHz.
Table 2-11 Power 770 memory bandwidth estimates for POWER7 cores running at 3.3 GHz
With an increase in frequency, the Power 780 running at 3.92 GHz generates higher cache
bandwidth (Table 2-12).
Table 2-12 Power 780 memory bandwidth estimates for POWER7 cores running at 3.92 GHz
In TurboCore mode, the Power 780 will have its cores running at 4.14 GHz generating even
higher cache bandwidth (Table 2-13).
Table 2-13 Power 780 memory bandwidth estimates for POWER7 cores running at 4.14 GHz
Memory                               Bandwidth
L1 (data) cache                      158.4 GBps
L2 cache                             158.4 GBps
L3 cache                             105.6 GBps
System memory (four enclosures)      136.44 GBps per socket, 1091.58 GBps total
Inter-node buses (four enclosures)   158.02 GBps
Intra-node buses (four enclosures)   415.74 GBps
Memory                               Bandwidth
L1 (data) cache                      188.16 GBps
L2 cache                             188.16 GBps
L3 cache                             125.44 GBps
System memory (four enclosures)      136.45 GBps per socket, 1091.58 GBps total
Inter-node buses (four enclosures)   158.02 GBps
Intra-node buses (four enclosures)   415.74 GBps
Memory                               Bandwidth
L1 (data) cache                      198.72 GBps
L2 cache                             198.72 GBps
L3 cache                             132.48 GBps
System memory (four enclosures)      136.45 GBps per socket, 1091.58 GBps total
Inter-node buses (four enclosures)   158.02 GBps
Intra-node buses (four enclosures)   415.74 GBps
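The per-core cache estimates in Tables 2-11 through 2-13 scale linearly with core frequency, consistent with a fixed transfer width of 48 bytes per cycle for the L1 (data) and L2 interfaces and 32 bytes per cycle for L3. The per-cycle widths are an inference from the numbers, not stated in the text:

```python
# Reproduce the per-core cache bandwidth estimates of Tables 2-11 to 2-13,
# assuming 48 B/cycle for L1(data)/L2 and 32 B/cycle for L3 (inferred widths).
def cache_bandwidth(freq_ghz):
    return {"L1/L2": round(freq_ghz * 48, 2), "L3": round(freq_ghz * 32, 2)}

for ghz in (3.3, 3.92, 4.14):  # Power 770, 780 MaxCore, 780 TurboCore
    print(ghz, cache_bandwidth(ghz))
```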
2.3.4 Active Memory Mirroring
Power 770 and Power 780 servers have the ability to provide mirroring of the hypervisor
code among different memory DIMMs. This feature will enhance the availability of a server
and keep it operable in case a DIMM failure occurs in one of the DIMMs that hold the
hypervisor code.
The hypervisor code, which resides on the initial DIMMs (J1A to J8A), is mirrored within the
same group of DIMMs to allow for more usable memory. Table 2-9 on page 56 shows the
DIMMs involved in the Memory Mirroring operation.
Figure 2-14 shows how Active Memory Mirroring uses different DIMMs.
Figure 2-14 Memory Mirroring among different DIMMs
To enable the Active Memory Mirroring (AMM) feature, the server must have eight DIMMs of
the same size populated in slots J1A to J8A. It is also mandatory that the server has enough
free memory to accommodate the mirrored memory pages. Active Memory Mirroring is
required and automatically enabled on the Power 780. However, on the Power 770 AMM is
optional and is ordered and enabled via feature #4797.
Besides the hypervisor code itself, other components that are vital to the server operation are
also mirrored:
Hardware page tables (HPTs), responsible for tracking the state of the memory pages
assigned to partitions
Translation control entities (TCEs), responsible for providing I/O buffers for the
partition’s communications
Memory used by the hypervisor to maintain partition configuration, I/O states, Virtual I/O
information, and partition state
Some components are not mirrored, because they are not vital to regular server
operation and mirroring them would require a larger amount of memory to accommodate
their data:
Advanced Memory Sharing Pool
Memory used to hold the contents of platform dumps
Note: Active Memory Mirroring will not mirror partition data. It was designed to mirror
only the hypervisor code and its components, allowing this data to be protected against a
DIMM failure.
(Figure 2-14 shows the POWER7 processor chip with MC0 and MC1 driving DIMMs J1A through J8A: hypervisor data on J1A and J2A is mirrored to J5A and J6A, while J3A, J4A, J7A, and J8A remain available for unmirrored data.)
It is possible to check whether the Memory Mirroring option is enabled, and to change its
current status, via the HMC under the Advanced tab of the CEC Properties panel (Figure 2-15).
Figure 2-15 CEC Properties window on an HMC
If a failure occurs on one of the DIMMs containing hypervisor data, all server
operations remain active and the FSP isolates the failing DIMM. Because there are no longer
eight functional DIMMs behind a memory controller, Active Memory Mirroring will not be
available until the DIMM is replaced. The system stays in this partially mirrored state until the
failing DIMM is replaced.
2.4 Capacity on Demand
Several types of Capacity on Demand (CoD) are optionally available on the Power 770 and
Power 780 servers to help meet changing resource requirements in an on demand
environment by using resources that are installed on the system but that are not activated.
2.4.1 Capacity Upgrade on Demand (CUoD)
CUoD allows you to purchase additional permanent processor or memory capacity and
dynamically activate them when needed.
2.4.2 On/Off Capacity on Demand (On/Off CoD)
On/Off Capacity on Demand allows you to temporarily activate and deactivate processor
cores and memory units to help meet the demands of business peaks such as seasonal
activity, period-end, or special promotions. When you order an On/Off CoD feature, you
receive an enablement code that allows a system operator to make requests for additional
processor and memory capacity in increments of one processor day or 1 GB memory day.
The system monitors the amount and duration of the activations. Both prepaid and post-pay
options are available.
With the post-pay option, charges are based on usage reports collected monthly.
Processors and memory can be activated and turned off an unlimited number of times when
additional processing resources are needed.
This offering provides a system administrator an interface at the HMC to manage the
activation and deactivation of resources. A monitor that resides on the server records the
usage activity. This usage data must be sent to IBM on a monthly basis. A bill is then
generated based on the total amount of processor and memory resources utilized, in
increments of processor and memory (1 GB) days.
Before using temporary capacity on your server, you must enable your server. To do this, an
enablement feature (MES only) must be ordered, and the required contracts must be in place.
If a Power 770 or Power 780 server uses the IBM i operating system in addition to any other
supported operating system on the same server, the client must inform IBM which operating
system caused the temporary On/Off CoD processor usage so that the correct feature can be
used for billing.
The features that are used to order enablement codes and support billing charges on the
Power 770 and Power 780 can be seen in 1.4.6, “Summary of processor features” on
page 15, and 1.4.7, “Memory features” on page 19.
The On/Off CoD process consists of three steps:
1. Enablement
Before requesting temporary capacity on a server, you must enable it for On/Off CoD. To
do this, order an enablement feature and sign the required contracts. IBM will generate an
enablement code, mail it to you, and post it on the web for you to retrieve and enter on the
target server.
A processor enablement code allows you to request up to 360 processor days of
temporary capacity. If the 360 processor-day limit is reached, place an order for another
processor enablement code to reset the number of days that you can request back to 360.
A memory enablement code lets you request up to 999 memory days of temporary
capacity. If you have reached the limit of 999 memory days, place an order for another
memory enablement code to reset the number of allowable days that you can request
back to 999.
2. Activation requests
When On/Off CoD temporary capacity is needed, simply use the HMC menu for
On/Off CoD. Specify how many of the inactive processors or GB of memory are required
to be temporarily activated for a certain number of days. You will be billed for the days
requested, whether the capacity is assigned to partitions or left in the Shared
Processor Pool.
At the end of the temporary period (days that were requested), you must ensure that the
temporarily activated capacity is available to be reclaimed by the server (not assigned to
partitions), or you are billed for any unreturned processor days.
3. Billing
The contract, signed by the client before receiving the enablement code, requires the
On/Off CoD user to report billing data at least once a month (whether or not activity
occurs). This data is used to determine the proper amount to bill at the end of each billing
period (calendar quarter). Failure to report billing data for use of temporary processor or
memory capacity during a billing quarter can result in default billing equivalent to 90
processor days of temporary capacity.
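The enablement limits from step 1 (360 processor days per processor enablement code, 999 memory days per memory enablement code) behave like a running balance that each activation request draws down, and that a new enablement code resets. A minimal sketch of that bookkeeping, assuming simple accounting (this is illustrative, not an IBM tool):

```python
# Illustrative bookkeeping for On/Off CoD enablement limits: a processor
# enablement code covers up to 360 processor days; ordering another code
# resets the balance. The class and method names are invented.

PROCESSOR_LIMIT_DAYS = 360

class ProcessorEnablement:
    def __init__(self):
        self.remaining = PROCESSOR_LIMIT_DAYS

    def request(self, processors, days):
        """Deduct a temporary-capacity request; False means a new
        enablement code must be ordered before this request can be made."""
        needed = processors * days
        if needed > self.remaining:
            return False
        self.remaining -= needed
        return True

    def new_code(self):
        """Ordering another enablement code resets the balance to 360."""
        self.remaining = PROCESSOR_LIMIT_DAYS

e = ProcessorEnablement()
e.request(8, 30)            # 240 processor days used, 120 remain
print(e.remaining)          # 120
print(e.request(8, 30))     # False: 240 > 120, order a new code first
e.new_code()
print(e.remaining)          # 360
```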
For more information regarding registration, enablement, and usage of On/Off CoD, visit:
http://www.ibm.com/systems/power/hardware/cod
2.4.3 Utility Capacity on Demand (Utility CoD)
Utility CoD automatically provides additional processor performance on a temporary basis
within the Shared Processor Pool.
Utility CoD enables you to place a quantity of inactive processors into the server's Shared
Processor Pool, which then becomes available to the pool's resource manager. When the
server recognizes that the combined processor utilization within the Shared Processor Pool
exceeds 100% of the level of base (purchased and active) processors assigned across
uncapped partitions, a Utility CoD Processor Minute is charged, and this level of
performance is available for the next minute of use.
If additional workload requires a higher level of performance, the system automatically
allows the additional Utility CoD processors to be used, and the system automatically and
continuously monitors and charges for the performance needed above the base
(permanent) level.
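The charging rule above can be sketched as follows. The per-minute sampling, the convention of rounding excess usage up to whole processor minutes, and the function name are assumptions made for illustration; they are not IBM's billing implementation.

```python
# Hypothetical sketch of the Utility CoD charging rule: for each minute in
# which Shared Processor Pool utilization exceeds the base (permanently
# activated) processor capacity, the excess is charged in processor
# minutes. Rounding the excess up per minute is an assumed convention.
import math

def utility_cod_minutes(samples, base_processors):
    """samples: per-minute pool utilization, in processor units.
    Returns total chargeable Utility CoD processor minutes."""
    charged = 0
    for used in samples:
        if used > base_processors:
            charged += math.ceil(used - base_processors)
    return charged

# Base of 16 active processors; utilization spikes to 18.5 and 20.0.
# Minutes at or below the base (15.0 and 16.0) incur no charge.
print(utility_cod_minutes([15.0, 18.5, 20.0, 16.0], 16))  # 7
```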
Registration and usage reporting for Utility CoD is made using a public website, and payment
is based on reported usage. Utility CoD requires PowerVM Standard Edition or PowerVM
Enterprise Edition to be active.
If a Power 770 or Power 780 server uses the IBM i operating system in addition to any other
supported operating system on the same server, the client must inform IBM which operating
system caused the temporary Utility CoD processor usage so that the correct feature can be
used for billing.
For more information regarding registration, enablement, and use of Utility CoD, visit:
http://www.ibm.com/systems/support/planning/capacity/index.html
2.4.4 Trial Capacity On Demand (Trial CoD)
A standard request for Trial CoD requires you to complete a form including contact
information and vital product data (VPD) from your Power 770 or Power 780 system with
inactive CoD resources.
A standard request activates two processors, 4 GB of memory, or both, for 30 days.
Subsequent standard requests can be made after each
purchase of a permanent processor activation. An HMC is required to manage Trial
CoD activations.
An exception request for Trial CoD requires you to complete a form including contact
information and VPD from your Power 770 or Power 780 system with inactive CoD
resources. An exception request activates all inactive processors, all inactive memory, or
both, for 30 days. An exception request can be made only one
time over the life of the machine. An HMC is required to manage Trial CoD activations.
To request either a Standard or an Exception Trial, visit:
https://www-912.ibm.com/tcod_reg.nsf/TrialCod?OpenForm
2.4.5 Software licensing and CoD
For software licensing considerations with the various CoD offerings, see the most recent
revision of the Capacity on Demand User’s Guide at:
http://www.ibm.com/systems/power/hardware/cod
2.5 CEC Enclosure interconnection cables
IBM Power 770 or 780 systems can be configured with more than one system enclosure.
The connection between the processor cards in the separate system enclosures requires
a set of processor drawer interconnect cables. The system enclosures are connected to
each other through flat, flexible SMP cables, which attach on the front of the drawers.
Furthermore, service processor cables are needed to connect the components in each
system enclosure to the active service processor for system function monitoring.
These cables connect at the rear of each enclosure and are required for two-drawer,
three-drawer, and four-drawer configurations.
The star fabric bus topology that connects the processors together in separate drawers is
contained on SMP Flex cables that are routed external to the drawers. These flex cables
attach directly to the CPU cards at the front of the drawer and are routed behind the front
covers of the drawers.
For performance reasons, there are multiple link connections between the CEC
enclosures, and the SMP Flex cabling differs between the two-SCM processor card
configurations (#4983, #4984, #5003) and the four-SCM processor card
configuration (#EP24).
The SMP and FSP cable features described in Table 2-14 are required to connect
the processors together when system configuration is made of two, three, or four
CEC enclosures.
Table 2-14   Required flex cables feature codes

CEC enclosures   SMP cables, 0-12 core (#4983) and    SMP cables, 0-24 core      FSP cables
                 0-16 core (#4984) processor cards    (#EP24) processor cards
Two              #3711 and #3712                      #3715 and #3716            #3671
Three            #3712 and #3713                      #3715, #3716, and #3717    #3671 and #3672
Four             #3712, #3713, and #3714              #3716, #3717, and #3718    #3671, #3672, and #3673

Note: The #3712 and #3716 features provide two SMP cable sets, and #3714 and #3717
provide three SMP cable sets.

The cables are designed to support hot-addition of a system enclosure up to the maximum
scalability. When adding a new drawer, existing cables remain in place and new cables are
added. The only exception is cable #3711, which is replaced when growing from a
two-drawer to a three-drawer configuration.
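The mapping in Table 2-14 can be captured in a small lookup, handy for sanity-checking an order. The feature codes are transcribed from the table; the helper itself is illustrative and is not an IBM configurator.

```python
# Cable feature codes per enclosure count, transcribed from Table 2-14.
# "2-SCM" covers the #4983/#4984/#5003 processor cards; "4-SCM" covers
# the #EP24 card. The lookup helper is illustrative only.
SMP_CABLES = {
    (2, "2-SCM"): ["#3711", "#3712"],
    (3, "2-SCM"): ["#3712", "#3713"],
    (4, "2-SCM"): ["#3712", "#3713", "#3714"],
    (2, "4-SCM"): ["#3715", "#3716"],
    (3, "4-SCM"): ["#3715", "#3716", "#3717"],
    (4, "4-SCM"): ["#3716", "#3717", "#3718"],
}
FSP_CABLES = {2: ["#3671"], 3: ["#3671", "#3672"], 4: ["#3671", "#3672", "#3673"]}

def required_cables(enclosures, card_family):
    """Return SMP plus FSP cable features for a given configuration."""
    return SMP_CABLES[(enclosures, card_family)] + FSP_CABLES[enclosures]

print(required_cables(3, "2-SCM"))  # ['#3712', '#3713', '#3671', '#3672']
```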
The cables are also designed to allow the concurrent maintenance of the Power 770 or
Power 780 in case the IBM service representative needs to extract a system enclosure from
the rack. The design of the flexible cables allows each system enclosure to be disconnected
without any impact on the other drawers.
To allow such concurrent maintenance operations, it is extremely important to plug in the
SMP Flex cables in the order of their numbering. Each cable is numbered (Figure 2-16).
Figure 2-16 SMP cable installation order
Table 2-15 reports the SMP cable usage for the two-enclosure and three-enclosure
scenarios.

Table 2-15   SMP cable usage, two-enclosure and three-enclosure scenarios

Processor type           SMP cables   Cable number   From connector   To connector

Two CEC enclosures
#4983, #4984, #5003      #3711        1              U1-P3-T1         U2-P3-T1
(0-12 core, 0-16 core)   #3712        2              U1-P3-T4         U2-P3-T4
#EP24 (0-24 core)        #3715        1              U1-P3-T1         U2-P3-T1
                         #3716        4              U1-P3-T2         U3-P3-T2
                                      2              U1-P3-T4         U2-P3-T4

Three CEC enclosures
#4983, #4984, #5003      #3712        2              U1-P3-T4         U2-P3-T4
(0-12 core, 0-16 core)   #3713        3              U1-P3-T2         U3-P3-T1
                                      4              U2-P3-T2         U3-P3-T2
#EP24 (0-24 core)        #3715        1              U1-P3-T1         U2-P3-T1
                         #3716        4              U2-P3-T2         U3-P3-T2
                                      2              U1-P3-T4         U2-P3-T4
                         #3717        7              U2-P3-T3         U3-P3-T3
                                      3              U1-P3-T2         U3-P3-T1
                                      6              U1-P3-T3         U3-P3-T4

Table 2-16 reports the SMP cable usage for the four-enclosure scenario.

Table 2-16   SMP cable usage, four-enclosure scenario

Processor type           SMP cables   Cable number   From connector   To connector
#4983, #4984, #5003      #3712        2              U1-P3-T4         U2-P3-T4
(0-12 core, 0-16 core)   #3713        3              U1-P3-T2         U3-P3-T1
                                      4              U2-P3-T2         U3-P3-T2
                         #3714        5              U1-P3-T1         U4-P3-T4
                                      6              U2-P3-T3         U4-P3-T3
                                      7              U3-P3-T3         U4-P3-T3
#EP24 (0-24 core)        #3716        4              U2-P3-T2         U3-P3-T2
                                      2              U1-P3-T4         U2-P3-T4
                         #3717        7              U3-P3-T3         U4-P3-T3
                                      3              U1-P3-T2         U3-P3-T1
                                      6              U2-P3-T3         U4-P3-T4
                         #3718        5              U1-P3-T1         U4-P3-T1

When adding CEC enclosures in an MES upgrade, SMP cables will likely have to be added,
depending on how many enclosures are being added.
Adding more than one CEC enclosure to an existing configuration can be accomplished with
a hot-add if the enclosures are added one at a time. The other option is to bring the entire
system down and add all the additional enclosures at the same time. Depending on whether
the hot-add option is desired, certain SMP cable features might or might not be necessary.
Similarly, the Flexible Service Processor (FSP) flex cables must be installed in the correct
order (Figure 2-17), as follows:
1. Install a second node flex cable from node 1 to node 2.
2. Add a third node flex cable from node 1 and node 2 to node 3.
3. Add a fourth node flex cable from node 1 and node 2 to node 4.
Figure 2-17 FSP flex cables
The design of the Power 770 and Power 780 is optimized for use in an IBM 7014-T00 or
7014-T42 rack. Both the front cover and the external processor fabric cables occupy space on
the front left and right side of an IBM 7014 rack. Racks that are not from IBM might not offer
the same room. When a Power 770 or Power 780 is configured with two or more system
enclosures in a 7014-T42 or 7014-B42 rack, the CEC enclosures must be located in EIA 36 or
below to allow space for the flex cables.
The total width of the server, with cables installed, is 21 inches (Figure 2-18).
Figure 2-18 Front view of the rack with SMP cables overlapping the rack rails
In the rear of the rack, the FSP cables require only some room on the left side of the rack
(Figure 2-19).
Figure 2-19 Rear view of rack with detail of FSP flex cables
2.6 System bus
This section provides additional information related to the internal buses.
2.6.1 I/O buses and GX++ card
Each CEC enclosure of the Power 770 and Power 780 contains one POWER7 processor
card. Each processor card comprises either two single-chip module POWER7
processors (#4983, #4984, or #5003) or, new to the Power 780, four single-chip module
POWER7 processors (#EP24).
Within a CEC enclosure a total of four GX++ buses are available for I/O connectivity and
expansion. Two GX++ buses off one of the two sockets are routed through the midplane to
the I/O backplane and drive two POWER7 I/O chips (POWER7 IOC) on the I/O backplane.
The two remaining GX++ buses from either of the two sockets are routed to the midplane and
feed two GX++ adapter slots.
The optional GX++ 12X DDR Adapter Dual-port (#1808), which is installed in the GX++
adapter slot, enables the attachment of a 12X loop, which runs at either SDR or DDR speed
depending upon the 12X I/O drawers attached. These GX++ adapter slots are hot-pluggable
and do not share space with any of the PCIe slots.
Table 2-17 shows the I/O bandwidth for the available processor cards.

Table 2-17   I/O bandwidth

Processor card           Slot description              Frequency    Device            Bandwidth (maximum theoretical)
#4983, #4984, or #5003   CPU Socket 0 (CP0) GX bus 1   2.464 Gbps   P7IOC-B           19.712 GBps
                         CPU Socket 0 (CP0) GX bus 0   2.464 Gbps   P7IOC-A           19.712 GBps
                         CPU Socket 1 (CP1) GX bus 1   2.464 Gbps   GX slot 2         19.712 GBps
                         CPU Socket 1 (CP1) GX bus 0   2.464 Gbps   GX slot 1         19.712 GBps
                         Single enclosure                                             78.848 GBps
                         Total (4x enclosures)                                        315.392 GBps
#EP24                    CPU Socket 0 (CP0) GX bus 1   2.464 Gbps   P7IOC-B           19.712 GBps
                         CPU Socket 2 (CP2) GX bus 0   2.464 Gbps   P7IOC-A           19.712 GBps
                         CPU Socket 3 (CP3) GX bus 0   2.464 Gbps   GX slot (lower)   19.712 GBps
                         CPU Socket 1 (CP1) GX bus 1   2.464 Gbps   GX slot (upper)   19.712 GBps
                         Single enclosure                                             78.848 GBps
                         Total (4x enclosures)                                        315.392 GBps

2.6.2 Flexible Service Processor bus
The Flexible Service Processor (FSP) flex cable, located at the rear of the system, is used
for FSP communication between the system drawers. An FSP card (#5664) is installed in
system drawers 1 and 2, and the FSP/Clock Pass-Through card (#5665) is installed in
system drawers 3 and 4 to connect the FSP flex cables. The FSP cabling is point-to-point,
similar to the processor drawer interconnect cabling. When a system drawer is added,
another FSP flex cable is added. The detailed cable configuration is discussed in 2.5,
“CEC Enclosure interconnection cables” on page 63.

2.7 Internal I/O subsystem
The internal I/O subsystem resides on the I/O planar, which supports eight PCIe slots. All
PCIe slots are hot-pluggable and enabled with enhanced error handling (EEH). In the unlikely
event of a problem, an EEH-enabled adapter responds to a special data packet that is
generated from the affected PCIe slot hardware by calling system firmware, which examines
the affected bus, allows the device driver to reset it, and continues without a system reboot.
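The per-bus and aggregate figures in Table 2-17 are internally consistent: each quoted bus bandwidth is exactly eight times the quoted 2.464 figure (a plausible reading, assumed here rather than stated in the table, is 8 bytes per cycle at 2.464 GHz), four GX buses serve each enclosure, and a maximum system has four enclosures:

```python
# Reproducing Table 2-17's totals. The 8-bytes-per-cycle aggregate width
# is an interpretation of the table's numbers, not an IBM statement.
GHZ = 2.464
BYTES_PER_CYCLE = 8
per_bus = GHZ * BYTES_PER_CYCLE          # 19.712 GBps per GX bus
per_enclosure = 4 * per_bus              # four GX buses per CEC enclosure
four_enclosures = 4 * per_enclosure      # maximum of four enclosures
print(round(per_bus, 3), round(per_enclosure, 3), round(four_enclosures, 3))
# 19.712 78.848 315.392
```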
Table 2-18 lists the slot configuration of the Power 770 and Power 780.

Table 2-18   Slot configuration of the Power 770 and 780

Slot number   Description    Location code   PCI host bridge (PHB)   Max. card size
Slot 1        PCIe Gen2 x8   P2-C1           P7IOC A PCIe PHB5       Full length
Slot 2        PCIe Gen2 x8   P2-C2           P7IOC A PCIe PHB4       Full length
Slot 3        PCIe Gen2 x8   P2-C3           P7IOC A PCIe PHB3       Full length
Slot 4        PCIe Gen2 x8   P2-C4           P7IOC A PCIe PHB2       Full length
Slot 5        PCIe Gen2 x8   P2-C5           P7IOC B PCIe PHB5       Full length
Slot 6        PCIe Gen2 x8   P2-C6           P7IOC B PCIe PHB4       Full length
Slot 7        GX++           P1-C2           -                       -
Slot 8        GX++           P1-C3           -                       -

2.7.1 Blind-swap cassettes
The Power 770 and Power 780 use new fourth-generation blind-swap cassettes to manage
the installation and removal of adapters. This new mechanism requires an interposer card
that allows the PCIe adapters to plug into the system vertically, allows more airflow through
the cassette, and provides more room under the PCIe cards to accommodate the height of
the GX+ multifunctional host bridge chip heat sink. Cassettes can be installed and removed
without removing the CEC enclosure from the rack.

2.7.2 System ports
Each CEC enclosure is equipped with an Integrated Multifunction Card (#1768 or #1769).
This integrated card provides two USB ports, one serial port, and four Ethernet connectors
for a processor enclosure and does not require a PCIe slot. When ordering a Power 770 or
Power 780, the following options can be selected:
– Dual 10 Gb Copper and Dual 1 Gb Ethernet (#1768)
– Dual 10 Gb Optical and Dual 1 Gb Ethernet (#1769)
All of the connectors are on the rear bulkhead of the CEC, and one Integrated Multifunction
Card can be placed in an individual CEC enclosure. An Integrated Multifunction Card is
required in CEC enclosures one and two, but it is not required in CEC enclosures three or
four. On a multi-enclosure system, the Integrated Multifunction Card features can differ. The
copper twinax ports support cabling distances of up to 5 m. The RJ-45 ports support
distances of up to 100 m using a Cat5e cable. The optical ports support only 850 nm optical
(multimode) cables, at distances of up to 300 m.
The Power 770 and Power 780 each support one serial port in the rear of the system. This
connector is a standard 9-pin male D-shell, and it supports the RS232 interface. Because the
Power 770 and Power 780 are managed by an HMC/SDMC, this serial port is always
controlled by the operating system, and therefore is available in any system configuration. It
is driven by the integrated PLX serial chip, and it supports any serial device that has an
operating system device driver. The FSP virtual console is on the HMC/SDMC.
2.8 PCI adapters
This section covers the different types and functionalities of the PCI cards supported by IBM
Power 770 and Power 780 systems.
2.8.1 PCIe Gen1 and Gen2
Peripheral Component Interconnect Express (PCIe) uses a serial interface and allows for
point-to-point interconnections between devices (using a directly wired interface between
these connection points). A single PCIe serial link is a dual-simplex connection that uses two
pairs of wires, one pair for transmit and one pair for receive, and can transmit only one bit per
cycle. These two pairs of wires are called a lane. A PCIe link can consist of multiple lanes. In
such configurations, the connection is labeled as x1, x2, x4, x8, x12, x16, or x32, where the
number is effectively the number of lanes.
Two generations of PCIe interface are supported in Power 770 and Power 780 models:
Gen1: Capable of transmitting at the extremely high speed of 2.5 Gbps, which gives a
capacity of a peak bandwidth of 2 GBps simplex on an x8 interface
Gen2: Double the speed of the Gen1 interface, which gives a capacity of a peak
bandwidth of 4 GBps simplex on an x8 interface
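These peak figures can be checked with lane arithmetic. PCIe Gen1 and Gen2 use 8b/10b encoding, so 10 bits travel on the wire for every data byte; this overhead factor is standard PCIe background rather than something stated above, and the helper function is illustrative:

```python
# Checking the quoted PCIe peak bandwidths: with 8b/10b encoding, a lane's
# simplex data rate is the signaling rate divided by 10 bytes per second.
def simplex_bandwidth_gbps(signaling_gtps, lanes):
    """Peak simplex bandwidth in GBps for a PCIe Gen1/Gen2 link."""
    per_lane_gbps = signaling_gtps / 10  # 8b/10b: 10 wire bits per byte
    return per_lane_gbps * lanes

print(simplex_bandwidth_gbps(2.5, 8))  # Gen1 x8 -> 2.0 GBps
print(simplex_bandwidth_gbps(5.0, 8))  # Gen2 x8 -> 4.0 GBps
```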
PCIe Gen1 slots support Gen1 adapter cards and also most of the Gen2 adapters. In this
case, where a Gen2 adapter is used in a Gen1 slot, the adapter will operate at PCIe Gen1
speed. PCIe Gen2 slots support both Gen1 and Gen2 adapters. In this case, where a Gen1
card is installed into a Gen2 slot, it will operate at PCIe Gen1 speed with a slight performance
enhancement. When a Gen2 adapter is installed into a Gen2 slot, it will operate at the full
PCIe Gen2 speed.
The IBM Power 770 and Power 780 CEC enclosure is equipped with six PCIe x8 Gen2 slots.
2.8.2 PCI-X adapters
IBM offers PCIe adapter options for the Power 770 and Power 780 CEC enclosure. If a
PCI-extended (PCI-X) adapter is required, a PCI-X DDR 12X I/O Drawer (#5796) can be
attached to the system by using a GX++ adapter loop. PCIe adapters use a different type
of slot than PCI and PCI-X adapters. If you attempt to force an adapter into the wrong type
of slot, you might damage the adapter or the slot. All adapters support Extended Error
Handling (EEH).
2.8.3 IBM i IOP adapters
IBM i IOP adapters are not supported with the Power 770 and Power 780. This has the
following consequences:
– Existing PCI adapters that require an IOP are affected.
– Existing I/O devices are affected, such as certain tape libraries, optical drive libraries, or
any HVD SCSI device.
– Twinax displays or printers cannot be attached except through an OEM protocol converter.
Before adding or rearranging adapters, use the System Planning Tool to validate the new
adapter configuration. See the System Planning Tool website:
http://www.ibm.com/systems/support/tools/systemplanningtool/
If you are installing a new feature, ensure that you have the software required to support the
new feature, and determine whether there are any existing PTF prerequisites to install. See
the IBM Prerequisite website for information:
https://www-912.ibm.com/e_dir/eServerPreReq.nsf
2.8.4 PCIe adapter form factors
IBM POWER7 processor-based servers are able to support two form factors of
PCIe adapters:
PCIe low profile (LP) cards, which are used with the Power 710 and Power 730 PCIe
slots. Low profile adapters are also used in the PCIe riser card slots of the Power 720 and
Power 740 servers.
PCIe full-height and full-high cards, which plug into slots of the following servers:
– Power 720 and Power 740 (Within the base system, five PCIe half-length slots
are supported.)
– Power 750
– Power 755
– Power 770
– Power 780
– Power 795
– PCIe slots of the #5802 and #5877 drawers
Low-profile PCIe adapter cards are only supported in low-profile PCIe slots, and full-height
and full-high cards are only supported in full-high slots.
Figure 2-20 shows the PCIe adapter form factors.
Figure 2-20 PCIe adapter form factors
Many of the full-height card features are available in low-profile format. For example, the
#5273 8 Gb dual port Fibre Channel adapter is the low-profile adapter equivalent of the
#5735 adapter full-height. They have equivalent functional characteristics.
(The figure shows low-profile PCIe slots in the Power 710 and 730 and in the PCIe
expansion risers of the Power 720 and 740, and full-height/full-high PCIe slots in the
Power 720, 740, 750, 770, and 780 and in the 12X PCIe I/O drawers #5802/#5877
and #5803/#5873.)
Table 2-19 lists the low-profile adapter cards and their full-height equivalents.

Table 2-19   Equivalent adapter cards

Low profile            Adapter description                                           Full height
feature code (CCIN)                                                                  feature code (CCIN)
#2053 (57CD)           PCIe RAID and SSD SAS Adapter 3 Gb Low Profile                #2054 or #2055 (57CD)
#5269                  POWER GXT145 PCI Express Graphics Accelerator (LP)            #5748 (5748)
#5270                  10 Gb FCoE PCIe Dual Port adapter (LP)                        #5708 (2BCB)
#5271                  4-Port 10/100/1000 Base-TX PCI Express adapter (LP)           #5717 (5717)
#5272                  10 Gigabit Ethernet-CX4 PCI Express adapter (LP)              #5732 (5732)
#5273                  8 Gigabit PCI Express Dual Port Fibre Channel adapter (LP)    #5735 (577D)
#5274                  2-Port Gigabit Ethernet-SX PCI Express adapter (LP)           #5768 (5768)
#5275                  10 Gb ENet Fibre RNIC PCIe 8x                                 #5769 (5769)
#5276                  4 Gigabit PCI Express Dual Port Fibre Channel adapter (LP)    #5774 (5774)
#5277                  4-Port Async EIA-232 PCIe adapter (LP)                        #5785
#5278                  SAS Controller PCIe 8x                                        #5901 (57B3)

Before adding or rearranging adapters, use the System Planning Tool to validate the new
adapter configuration. See the System Planning Tool website:
http://www.ibm.com/systems/support/tools/systemplanningtool/
If you are installing a new feature, ensure that you have the software required to support the
new feature and determine whether there are any existing update prerequisites to install. To
do this, use the IBM Prerequisite website:
https://www-912.ibm.com/e_dir/eServerPreReq.nsf
The following sections discuss the supported adapters and provide tables of orderable
feature numbers. The tables indicate operating system support, AIX (A), IBM i (i), and
Linux (L), for each of the adapters.

2.8.5 LAN adapters
To connect a Power 770 or Power 780 to a local area network (LAN), you can use the
Integrated Multifunction Card. For more information, see 2.7.2, “System ports” on page 70.
Note: The Integrated Multifunction Card can be shared with LPARs using VIOS.
Other LAN adapters are supported in the CEC enclosure PCIe slots or in I/O enclosures that
are attached to the system using a 12X technology loop. Table 2-20 lists the additional LAN
adapters that are available.
Table 2-20 Available LAN adapters
Feature code   CCIN   Adapter description                                       Slot    Size                 OS support
#5269                 10 Gigabit Ethernet-SR PCI Express adapter                PCIe    Short, LP            A, L
#5287                 PCIe2 2-port 10 GbE SR adapter                            PCIe    Low profile, short   A, L
#5288                 PCIe2 2-Port 10 GbE SFP+Copper adapter                    PCIe    Full height, short   A, L
#5706          5706   IBM 2-Port 10/100/1000 Base-TX Ethernet PCI-X adapter     PCI-X   Full height, short   A, i, L
#5717          5717   4-Port 10/100/1000 Base-TX PCI Express adapter            PCIe    Full height, short   A, L
#5732          5732   10 Gigabit Ethernet-CX4 PCI Express adapter               PCIe    Full height, short   A, L
#5740                 4-Port 10/100/1000 Base-TX PCI-X adapter                  PCI-X   Full height, short   A, L
#5744          2B44   PCIe2 4-Port 10 GbE&1 GbE SR&RJ45 adapter                 PCIe    Full high            L
#5745          2B43   PCIe2 4-Port 10 GbE&1 GbE SFP+Copper&RJ45 adapter         PCIe    Short                L
#5767          5767   2-Port 10/100/1000 Base-TX Ethernet PCI Express adapter   PCIe    Full height, short   A, i, L
#5768          5768   2-Port Gigabit Ethernet-SX PCI Express adapter            PCIe    Full height, short   A, i, L
#5769          5769   10 Gigabit Ethernet-SR PCI Express adapter                PCIe    Full height, short   A, L
#5772          576E   10 Gigabit Ethernet-LR PCI Express adapter                PCIe    Full height, short   A, i, L
2.8.6 Graphics accelerator adapters
The IBM Power 770 and Power 780 support up to eight graphics adapters (Table 2-21). They
can be configured to operate in either 8-bit or 24-bit color modes. These adapters support
both analog and digital monitors, and do not support hot-plug. The total number of graphics
accelerator adapters in any one partition cannot exceed four.
Table 2-21   Available graphics accelerator adapters

Feature code   CCIN   Adapter description                                       Slot    Size    OS support
#2849 (a)      2849   POWER GXT135P Graphics Accelerator with Digital Support   PCI-X   Short   A, L
#5748                 POWER GXT145 PCI Express Graphics Accelerator             PCIe    Short   A, L

a. Supported, but no longer orderable.

2.8.7 SCSI and SAS adapters
To connect to external SCSI or SAS devices, the adapters that are listed in Table 2-22 are
available to be configured.

Table 2-22   Available SCSI and SAS adapters

Feature code   CCIN   Adapter description                                           Slot            Size    OS support
#1912 (a)      571A   PCI-X DDR Dual Channel Ultra320 SCSI adapter                  PCI-X           Short   A, i, L
#2055          57CD   PCIe RAID and SSD SAS Adapter 3 Gb with Blind Swap Cassette   PCIe            Short   A, i, L
#5646                 Blind Swap Type III Cassette - PCIe, Short Slot               PCIe            Short   N/A
#5647                 Blind Swap Type III Cassette - PCI-X or PCIe, Standard Slot   PCI-X or PCIe   Short   N/A
#5736          571A   PCI-X DDR Dual Channel Ultra320 SCSI adapter                  PCI-X           Short   A, i, L
#5901          57B3   PCIe Dual-x4 SAS adapter                                      PCIe            Short   A, i, L
#5903 (a, b)   574E   PCIe 380 MB Cache Dual-x4 3 Gb SAS RAID adapter               PCIe            Short   A, i, L
#5908          575C   PCI-X DDR 1.5 GB Cache SAS RAID adapter (BSC)                 PCI-X           Long    A, i, L
#5912          572A   PCI-X DDR Dual-x4 SAS adapter                                 PCI-X           Short   A, i, L
#5913 (b)      57B5   PCIe2 1.8 GB Cache RAID SAS adapter Tri-port 6 Gb             PCIe                    A, i, L
#7863                 PCI Blind Swap Cassette Kit, Double Wide Adapters, Type III   PCI             Short   N/A

a. Supported, but no longer orderable.
b. A pair of adapters is required to provide mirrored write cache data and adapter redundancy.
Table 2-23 compares Parallel SCSI to SAS attributes.

Table 2-23   Comparing Parallel SCSI to SAS

Items to compare        Parallel SCSI                                     SAS
Architecture            Parallel; all devices connected to a shared bus   Serial; point-to-point, discrete signal paths
Performance             320 MBps (Ultra320 SCSI); performance degrades    3 Gbps, scalable to 12 Gbps; performance
                        as devices are added to the shared bus            maintained as more devices are added
Scalability             15 drives                                         Over 16,000 drives
Compatibility           Incompatible with all other drive interfaces      Compatible with Serial ATA (SATA)
Max. cable length       12 meters total (sum of the lengths of all        8 meters per discrete connection; total domain
                        cables used on the bus)                           cabling hundreds of meters
Cable form factor       Multitude of conductors adds bulk and cost        Compact connectors and cabling save space and cost
Hot pluggability        No                                                Yes
Device identification   Manually set; user must ensure no ID number       Worldwide unique ID set at time of manufacture
                        conflicts on the bus
Termination             Manually set; user must ensure proper             Discrete signal paths enable each device to
                        installation and functionality of terminators     include termination by default

2.8.8 iSCSI adapters
iSCSI adapters in Power Systems provide the advantage of increased bandwidth through
hardware support of the iSCSI protocol. The 1 Gigabit iSCSI TOE (TCP/IP Offload Engine)
PCI-X adapters support hardware encapsulation of SCSI commands and data into TCP and
transport them over Ethernet using IP packets. The adapter operates as an iSCSI TOE.
This offload function eliminates host protocol processing and reduces CPU interrupts. The
adapter uses a small form factor LC type fiber optic connector or a copper RJ45 connector.
Table 2-24 lists the orderable iSCSI adapters.

Table 2-24   Available iSCSI adapters

Feature code   CCIN   Adapter description                                  Slot    Size    OS support
#5713          573B   1 Gigabit iSCSI TOE PCI-X on Copper Media Adapter    PCI-X   Short   A, i, L
#5714 (a)      573C   1 Gigabit iSCSI TOE PCI-X on Optical Media Adapter   PCI-X           A, i, L

a. Supported, but no longer orderable.
2.8.9 Fibre Channel adapter
The IBM Power 770 and Power 780 servers support direct or SAN connection to devices that
use Fibre Channel adapters. Table 2-25 summarizes the available Fibre Channel adapters.
All of these adapters except #5735 have LC connectors. If you attach a device or switch with
an SC type fibre connector, an LC-SC 50 Micron Fiber Converter Cable (#2456) or an LC-SC
62.5 Micron Fiber Converter Cable (#2459) is required.
Table 2-25   Available Fibre Channel adapters

Feature code   CCIN               Adapter description                                     Slot    Size    OS support
#5729 (a, b)                      PCIe2 8 Gb 4-port Fibre Channel Adapter                 PCIe            A, L
#5735 (b)      577D               8 Gigabit PCI Express Dual Port Fibre Channel Adapter   PCIe    Short   A, i, L
#5749          576B               4 Gbps Fibre Channel (2-Port)                           PCI-X   Short   i
#5758          1910, 280D, 280E   4 Gb Single-Port Fibre Channel PCI-X 2.0 DDR Adapter    PCI-X   Short   A, L
#5759          1910, 5759         4 Gb Dual-Port Fibre Channel PCI-X 2.0 DDR Adapter      PCI-X   Short   A, L
#5774          5774               4 Gigabit PCI Express Dual Port Fibre Channel Adapter   PCIe    Short   A, i, L

a. A Gen2 PCIe slot is required to provide the bandwidth for all four ports to operate at full speed.
b. N_Port ID Virtualization (NPIV) capability is supported through VIOS.

2.8.10 Fibre Channel over Ethernet
Fibre Channel over Ethernet (FCoE) allows for the convergence of Fibre Channel and
Ethernet traffic onto a single adapter and a converged fabric.
Figure 2-21 shows a comparison between an existing FC and network connection and an
FCoE connection.
Figure 2-21   Comparison between an existing FC and network connection and an FCoE connection
For more information about FCoE, read An Introduction to Fibre Channel over Ethernet, and
Fibre Channel over Convergence Enhanced Ethernet, REDP-4493.
IBM offers a 10 Gb FCoE PCIe Dual Port adapter (#5708). This is a high-performance 10 Gb
dual port PCIe Converged Network Adapter (CNA) utilizing SR optics. Each port can provide
Network Interface Card (NIC) traffic and Fibre Channel functions simultaneously. It is
supported on AIX and Linux for FC and Ethernet.
2.8.11 InfiniBand Host Channel adapter
The InfiniBand Architecture (IBA) is an industry-standard architecture for server I/O and
inter-server communication. It was developed by the InfiniBand Trade Association (IBTA) to
provide the levels of reliability, availability, performance, and scalability necessary for present
and future server systems with levels significantly better than can be achieved by using
bus-oriented I/O structures.
InfiniBand (IB) is an open set of interconnect standards and specifications. The main IB
specification has been published by the InfiniBand Trade Association and is available at:
http://www.infinibandta.org/
InfiniBand is based on a switched fabric architecture of serial point-to-point links, where these
IB links can be connected to either host channel adapters (HCAs), used primarily in servers,
or to target channel adapters (TCAs), used primarily in storage subsystems.
The InfiniBand physical connection consists of multiple byte lanes. Each individual byte lane
is a four-wire, 2.5, 5.0, or 10.0 Gbps bidirectional connection. Combinations of link width and
byte-lane speed allow for overall link speeds from 2.5 Gbps to 120 Gbps. The architecture
defines a layered hardware protocol, as well as a software layer to manage initialization and
the communication between devices. Each link can support multiple transport services for
reliability and multiple prioritized virtual communication channels.
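The relationship between byte-lane speed and link width described above can be sketched as a short calculation (an illustration only, not IBM tooling; the helper name is ours):

```python
# Illustrative sketch: the raw signaling rate of an InfiniBand link is the
# byte-lane speed multiplied by the link width.
LANE_SPEEDS_GBPS = (2.5, 5.0, 10.0)   # per-lane rates quoted in the text
LINK_WIDTHS = (1, 4, 8, 12)           # common 1X, 4X, 8X, 12X widths

def link_speed_gbps(lane_speed: float, width: int) -> float:
    """Raw link speed in Gbps for a given lane speed and link width."""
    if lane_speed not in LANE_SPEEDS_GBPS:
        raise ValueError(f"unsupported lane speed: {lane_speed}")
    if width not in LINK_WIDTHS:
        raise ValueError(f"unsupported link width: {width}")
    return lane_speed * width

# The extremes quoted above: one 2.5 Gbps lane up to twelve 10 Gbps lanes.
print(link_speed_gbps(2.5, 1))    # 2.5
print(link_speed_gbps(10.0, 12))  # 120.0
```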
For more information about InfiniBand, read HPC Clusters Using InfiniBand on IBM Power
Systems Servers, SG24-7767.
IBM offers the GX++ 12X DDR Adapter (#1808) that plugs into the system backplane (GX++
slot). There are two GX++ slots in each CEC enclosure. By attaching a 12X to 4X converter
cable (#1828), an IB switch can be attached.
Table 2-26 lists the available InfiniBand adapters.
Table 2-26 Available InfiniBand adapters

Feature code   CCIN   Adapter description               Slot   Size        OS support
#1808          -      GX++ 12X DDR adapter, Dual-port   GX++   -           A, L
#5285 (a)      -      2-Port 4X IB QDR adapter 40 Gb    PCIe   Full high   A, L

a. Requires a PCIe Gen2 full-high slot.

2.8.12 Asynchronous adapter
Asynchronous PCI adapters provide connection of asynchronous EIA-232 or RS-422 devices.
Recent PowerHA releases no longer support heartbeats over serial connections.
Table 2-27 lists the available asynchronous adapters.
Table 2-27 Available asynchronous adapters
2.9 Internal storage
Serial Attached SCSI (SAS) drives the Power 770 and Power 780 internal disk subsystem.
SAS provides enhancements over parallel SCSI with its point-to-point high frequency
connections. SAS physical links are a set of four wires used as two differential signal pairs.
One differential signal transmits in one direction. The other differential signal transmits in the
opposite direction. Data can be transmitted in both directions simultaneously.
The Power 770 and Power 780 CEC enclosures have an extremely flexible and powerful
backplane for supporting hard disk drives (HDD) or solid-state drives (SSD). The six small
form factor (SFF) bays can be configured in three ways to match your business needs. There
are two integrated SAS controllers that can be optionally augmented with a 175 MB Cache
RAID - Dual IOA Enablement card (Figure 2-22 on page 81). These two controllers provide
redundancy and additional flexibility. The optional 175 MB Cache RAID - Dual IOA
Enablement Card (#5662) enables dual 175 MB write cache and provides dual batteries for
protection of that write cache.
There are two PCIe integrated SAS controllers under the POWER7 I/O chip and also the SAS
controller that is directly connected to the DVD media bay (Figure 2-22 on page 81).
The Power 770 and Power 780 support various internal storage configurations:
Dual split backplane mode: The backplane is configured as two sets of three bays (3/3).
Triple split backplane mode: The backplane is configured as three sets of two
bays (2/2/2).
Dual storage IOA configuration using internal disk drives (Dual RAID of internal drives
only): The backplane is configured as one set of six bays.
Dual storage IOA configuration using internal disk drives and external enclosure (Dual
RAID of internal drives and external drives).
Configuration options will vary depending on the controller options and the operating system
selected. The controllers for the dual split backplane configurations are always the two
embedded controllers. But if the triple split backplane configuration is used, the two integrated
SAS controllers run the first two sets of bays and require a #5901 PCIe SAS adapter located
in a PCIe slot in a CEC enclosure. This adapter controls the third set of bays. By having three
controllers, you can have three boot drives supporting three partitions.
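The split options above can be captured in a small sketch (a hypothetical encoding; the mode names and helper are ours, the bay counts come from the text):

```python
# Hypothetical sketch of the Power 770/780 SFF bay split options.
SPLIT_MODES = {
    "dual_split":   [3, 3],     # two sets of three bays (embedded controllers)
    "triple_split": [2, 2, 2],  # needs a #5901 PCIe SAS adapter for the third set
    "dual_ioa":     [6],        # one set of six bays (#5662 paired controllers)
}

def max_boot_drives(mode: str) -> int:
    """One boot drive per independently controlled bay set."""
    return len(SPLIT_MODES[mode])

# All six SFF bays are accounted for in every mode.
assert all(sum(sets) == 6 for sets in SPLIT_MODES.values())
print(max_boot_drives("triple_split"))  # 3: three controllers, three boot drives
```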
Feature code   CCIN   Adapter description                        Slot   Size    OS support
#2728          57D1   4-port USB PCIe adapter                    PCIe   Short   A, L
#5785          -      4-Port Asynchronous EIA-232 PCIe adapter   PCIe   Short   A, L
#5289          2B42   2-Port Async EIA-232 PCIe adapter          PCIe   Short   A, L
You can configure the two embedded controllers together as a pair for higher redundancy or
you can configure them separately. If you configure them separately, they can be owned by
different partitions or they could be treated independently within the same partition. If
configured as a pair, they provide controller redundancy and can automatically switch over to
the other controller if one has problems. Also, if configured as a pair, both can be active at the
same time (active/active) assuming that there are two or more arrays configured, providing
additional performance capability as well as redundancy. The pair controls all six small form
factor (SFF) bays and both see all six drives. The dual split (3/3) and triple split (2/2/2)
configurations are not used with the paired controllers. RAID 0 and RAID 10 are supported,
and you can also mirror two sets of controller/drives using the operating system.
Power 770 and Power 780, with more than one CEC enclosure, support enclosures with
different internal storage configurations.
Adding the optional 175 MB Cache RAID - Dual IOA Enablement Card (#5662) causes the
pair of embedded controllers in that CEC drawer to be configured as dual controllers,
accessing all six SAS drive bays. With this feature you can get controller redundancy,
additional RAID protection options, and additional I/O performance. RAID 5 (a minimum of
three drives required) and RAID 6 (a minimum of four drives required) are available when
configured as dual controllers with one set of six bays. Feature #5662 plugs in to the disk or
media backplane and enables a 175 MB write cache on each of the two embedded RAID
adapters by providing two rechargeable batteries with associated charger circuitry.
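The drive-count rules above can be expressed as a minimal check (a sketch; the helper names are ours, the minimums come from the text):

```python
# RAID 5 and RAID 6 become available only when the #5662 card configures the
# embedded controllers as a dual-controller pair over one set of six bays.
RAID_MIN_DRIVES = {5: 3, 6: 4}  # minimum drives per RAID level, per the text

def raid_available(level: int, drives: int, has_5662: bool) -> bool:
    """True if the RAID level can be configured on the six-bay set."""
    if level not in RAID_MIN_DRIVES:
        raise ValueError("only RAID 5 and RAID 6 are modeled here")
    return has_5662 and drives >= RAID_MIN_DRIVES[level]

print(raid_available(5, 3, has_5662=True))   # True
print(raid_available(6, 3, has_5662=True))   # False: RAID 6 needs four drives
print(raid_available(5, 3, has_5662=False))  # False: requires the #5662 card
```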
The write cache can provide additional I/O performance for attached disk or solid-state drives,
particularly for RAID 5 and RAID 6. The write cache contents are mirrored for redundancy
between the two RAID adapters, resulting in an effective write cache size of 175 MB. The
batteries provide power to maintain both copies of write-cache information in the event that
power is lost.
Without feature #5662, each controller can access only two or three SAS drive bays.
Another expansion option is an SAS expansion port (#1819). The SAS expansion port can
add more SAS bays to the six bays in the system unit. A #5886 EXP 12S SAS disk drawer is attached using a SAS port on the rear of the processor drawer, and its twelve SAS bays are run
by the pair of embedded controllers. The pair of embedded controllers is now running 18 SAS
bays (six SFF bays in the system unit and twelve 3.5-inch bays in the drawer). The disk
drawer is attached to the SAS port with a SAS YI cable, and the embedded controllers are
connected to the port using a feature #1819 cable assembly. In this 18-bay configuration, all
drives must be HDDs.
IBM i supports configurations using one set of six bays but does not support logically splitting the backplane (dual or triple split). Thus, the 175 MB Cache RAID - Dual IOA Enablement Card (#5662) is required if IBM i is to access any of the SAS bays in that CEC enclosure. AIX and Linux support configurations using two sets of three bays (3/3) or three sets of two bays (2/2/2) without feature #5662. With feature #5662, they support dual controllers running one set of six bays.

Note: These solid-state drive (SSD) and hard disk drive (HDD) configuration rules apply:
You can mix SSDs and HDDs when configured as one set of six bays.
If you want both SSDs and HDDs within a dual split configuration, you must use the same type of drive within each set of three. You cannot mix SSDs and HDDs within a subset of three bays.
If you want both SSDs and HDDs within a triple split configuration, you must use the same type of drive within each set of two. You cannot mix SSDs and HDDs within a subset of two bays. The #5901 PCIe SAS adapter that controls the remaining two bays in a triple split configuration does not support SSDs.
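The SSD/HDD mixing rules in the note can be expressed as a small check (a sketch; the configuration encoding is ours):

```python
# Each bay set is a list of drive types ("SSD"/"HDD"). One set of six bays may
# mix types; in a dual (3/3) or triple (2/2/2) split, each set must be uniform.
def mix_allowed(sets: list[list[str]]) -> bool:
    if len(sets) == 1:  # one set of six bays: mixing is allowed
        return True
    return all(len(set(drives)) <= 1 for drives in sets)

print(mix_allowed([["SSD", "HDD", "SSD", "HDD", "SSD", "HDD"]]))    # True
print(mix_allowed([["SSD", "SSD", "SSD"], ["HDD", "HDD", "HDD"]]))  # True
print(mix_allowed([["SSD", "HDD", "HDD"], ["HDD", "HDD", "HDD"]]))  # False
```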
Figure 2-22 shows the internal SAS topology overview.
Figure 2-22 Internal SAS topology overview
The system backplane also includes a third embedded controller for running the DVD-RAM
drive in the CEC enclosure. Because the controller is independent from the two other SAS
disk/SSD controllers, it allows the DVD to be switched between multiple partitions without
impacting the assignment of disks or SSDs in the CEC drawer.
Table 2-28 summarizes the internal storage combinations and the feature codes required for each combination.
Table 2-28 SAS configurations summary

Two-way split backplane:
  #5662: No. External SAS components: None. SAS port cables: None. SAS cables: N/A.
  Notes: IBM i does not support this combination. Connecting to an external disk enclosure is not supported.
Three-way split backplane:
  #5662: No. External SAS components: Dual x4 SAS adapter (#5901). SAS port cables: Internal SAS port (#1815), the SAS cable for a three-way split backplane. SAS cables: AI cable (#3679), adapter to internal drive (1 meter).
  Notes: IBM i does not support this combination. An I/O adapter can be located in another enclosure of the system.
2.9.1 Dual split backplane mode
Dual split backplane mode offers two sets of three disks and is the standard configuration. If desired, one of the sets can be connected to an external SAS PCIe or PCI-X adapter if #1819 is selected. Figure 2-23 shows how the six disk bays are shared in dual split backplane mode. Although solid-state drives (SSDs) are supported with a dual split backplane configuration, mixing SSDs and hard disk drives (HDDs) in the same split domain is not supported. Also, mirroring SSDs with HDDs (or vice versa) is not possible.
Figure 2-23 Dual split backplane overview
Table 2-28 SAS configurations summary (continued)

Dual storage IOA with internal disk:
  #5662: Yes. External SAS components: None. SAS port cables: None. SAS cables: N/A.
  Notes: The internal SAS port cable (#1815) cannot be used with this or the HA RAID configuration.
Dual storage IOA with internal disk and external disk enclosure:
  #5662: Yes. External SAS components: Requires an external disk enclosure (#5886). SAS port cables: Internal SAS port (#1819), the SAS cable assembly for connecting to an external SAS drive enclosure. SAS cables: #3686 or #3687.
  Notes: #3686 is a 1-meter cable; #3687 is a 3-meter cable.
2.9.2 Triple split backplane
The triple split backplane mode offers three sets of two disk drives each. This mode requires the #1815 internal SAS cable, a #3679 SAS cable, and a SAS controller such as the #5901. Figure 2-24 shows how the six disk bays are shared in triple split backplane mode. The PCI adapter that drives two of the six disks can be located in the same Power 770 (or Power 780) CEC enclosure as the disk drives, or even in a different system enclosure or external I/O drawer.
Figure 2-24 Triple split backplane overview
Although SSDs are supported with a triple split backplane configuration, mixing SSDs
and HDDs in the same split domain is not supported. Also, mirroring SSDs with HDDs is
not possible.
2.9.3 Dual storage IOA configurations
The dual storage IOA configurations are available with internal disk drives only, or with internal plus external disk drives from another I/O drawer. Solid-state drives (SSDs) are not supported with this mode.
If #1819 is selected for an enclosure, selecting SAS cable #3686 or #3687 to support RAID
internal and external drives is necessary (Figure 2-25 on page 84). If #1819 is not selected
for the enclosure, the RAID supports only enclosure internal disks.
This configuration increases availability using dual storage IOA or high availability (HA) to
connect multiple adapters to a common set of internal disk drives. It also increases the
performance of RAID arrays. These rules apply to this configuration:
This configuration uses the 175 MB Cache RAID - Dual IOA enablement card.
Using the dual IOA enablement card, the two embedded adapters can connect to each
other and to all six disk drives, as well as the 12 disk drives in an external disk drive
enclosure if one is used.
The disk drives are required to be in RAID arrays.
There are no separate SAS cables required to connect the two embedded SAS RAID
adapters to each other. The connection is contained within the backplane.
RAID 0, 10, 5, and 6 support up to six drives.
Solid-state drives (SSD) and HDDs can be used, but can never be mixed in the same
disk enclosure.
To connect to external storage, use the #5886 disk drive enclosure.
Figure 2-25 shows the topology of the RAID mode.
Figure 2-25 RAID mode (external disk drives option)
2.9.4 DVD
The DVD media bay is directly connected to the integrated SAS controller on the I/O
backplane and has a specific chip (VSES) for controlling the DVD LED and power. The VSES
appears as a separate device to the device driver and operating systems (Figure 2-22 on
page 81).
Because the integrated SAS controller is independent from the two SAS disk/SSD controllers,
it allows the DVD to be switched between multiple partitions without impacting the
assignment of disks or SSDs in the CEC enclosure.
2.10 External I/O subsystems
This section describes the external 12X I/O subsystems that can be attached to the
Power 770 and Power 780, listed as follows:
PCI-DDR 12X Expansion Drawer (#5796)
12X I/O Drawer PCIe, small form factor (SFF) disk (#5802)
12X I/O Drawer PCIe, No Disk (#5877)
Table 2-29 provides an overview of all the supported I/O drawers.
Table 2-29 I/O drawer capabilities
The two GX++ buses from the second processor card feed two GX++ adapter slots. An optional GX++ 12X DDR Adapter, Dual-port (#1808), installed in a GX++ adapter slot, enables the attachment of a 12X loop, which runs at either SDR or DDR speed depending on the 12X I/O drawers that are attached.
2.10.1 PCI-DDR 12X Expansion drawer
The PCI-DDR 12X Expansion Drawer (#5796) is a 4U (EIA units) drawer and mounts in a
19-inch rack. Feature #5796 is 224 mm (8.8 in.) wide and takes up half the width of the 4U
(EIA units) rack space. The 4U enclosure can hold up to two #5796 drawers mounted
side-by-side in the enclosure. The drawer is 800 mm (31.5 in.) deep and can weigh up to
20 kg (44 lb).
The PCI-DDR 12X Expansion Drawer has six 64-bit, 3.3 V, PCI-X DDR slots, running at
266 MHz, that use blind-swap cassettes and support hot-plugging of adapter cards. The
drawer includes redundant hot-plug power and cooling.
Two interface adapters are available for use in the #5796 drawer:
Dual-Port 12X Channel Attach Adapter Long Run (#6457)
Dual-Port 12X Channel Attach Adapter Short Run (#6446)
The adapter selection is based on how close the host system or the next I/O drawer in the
loop is physically located. Feature #5796 attaches to a host system CEC enclosure with a
12X adapter in a GX++ slot through SDR or DDR cables (or both SDR and DDR cables). A
maximum of four #5796 drawers can be placed on the same 12X loop. Mixing #5802/5877
and #5796 on the same loop is not supported.
A minimum configuration of two 12X cables (either SDR or DDR), two AC power cables, and
two SPCN cables is required to ensure proper redundancy.
Drawer feature code   DASD                       PCI slots   Requirements for the Power 770 and Power 780
#5796                 -                          6 x PCI-X   GX++ adapter card #1808
#5802                 18 x SAS disk drive bays   10 x PCIe   GX++ adapter card #1808
#5877                 -                          10 x PCIe   GX++ adapter card #1808
Figure 2-26 shows the back view of the expansion unit.
Figure 2-26 PCI-X DDR 12X Expansion Drawer rear side
2.10.2 12X I/O Drawer PCIe
The 12X I/O Drawer PCIe is a 19-inch I/O and storage drawer. It provides a 4U-tall (EIA units) drawer containing 10 PCIe-based I/O adapter slots and 18 SAS hot-swap Small Form Factor disk bays, which can be used for either disk drives or SSDs (#5802). The adapter slots use blind-swap cassettes and support hot-plugging of adapter cards.
A maximum of two #5802 drawers can be placed on the same 12X loop. Feature #5877 is the
same as #5802 except that it does not support any disk bays. Feature #5877 can be on the
same loop as #5802. Feature #5877 cannot be upgraded to #5802.
The physical dimensions of the drawer are 444.5 mm (17.5 in.) wide by 177.8 mm (7.0 in.)
high by 711.2 mm (28.0 in.) deep for use in a 19-inch rack.
A minimum configuration of two 12X DDR cables, two AC power cables, and two SPCN
cables is required to ensure proper redundancy. The drawer attaches to the host CEC
enclosure with a 12X adapter in a GX++ slot through 12X DDR cables that are available in
various cable lengths:
0.6 m (#1861)
1.5 m (#1862)
3.0 m (#1865)
8 m (#1864)
The 12X SDR cables are not supported.
Figure 2-27 shows the front view of the 12X I/O Drawer PCIe (#5802).
Figure 2-27 Front view of the 12X I/O Drawer PCIe
Figure 2-28 shows the rear view of the 12X I/O Drawer PCIe (#5802).
Figure 2-28 Rear view of the 12X I/O Drawer PCIe
2.10.3 Dividing SFF drive bays in 12X I/O drawer PCIe
Disk drive bays in the 12X I/O drawer PCIe can be configured as one, two, or four sets. This
allows for partitioning of disk bays. Disk bay partitioning configuration can be done via the
physical mode switch on the I/O drawer.
Note: A mode change using the physical mode switch requires a power-off/on of the drawer.
Figure 2-29 indicates the mode switch in the rear view of the #5802 I/O Drawer.
Figure 2-29 Disk bay partitioning in #5802 PCIe 12X I/O drawer
Each disk bay set can be attached to its own controller or adapter. The #5802 PCIe 12X I/O
Drawer has four SAS connections to drive bays. It can connect to a PCIe SAS adapter or to
controllers on the host system.
Figure 2-29 shows the configuration rule of disk bay partitioning in the #5802 PCIe 12X I/O
Drawer. There is no specific feature code for mode switch setting.
The SAS ports associated with the mode selector switch map to the disk bays as shown in Table 2-30.
Table 2-30 SAS connection mappings

Location code   Mappings           Number of bays
P4-T1           P3-D1 to P3-D5     5 bays
P4-T2           P3-D6 to P3-D9     4 bays
P4-T3           P3-D10 to P3-D14   5 bays
P4-T4           P3-D15 to P3-D18   4 bays

Note: The IBM System Planning Tool supports disk bay partitioning. Also, the IBM configuration tool accepts this configuration from the IBM System Planning Tool and passes it through IBM manufacturing using the Customer Specified Placement (CSP) option.
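The port-to-bay mappings can be captured as a lookup (a sketch; the helper is ours, and the fourth set attaches to port P4-T4):

```python
# Table 2-30 as a lookup: SAS port location code -> inclusive bay range
# for the four-way (5 + 4 + 5 + 4) split of the #5802 drawer.
SAS_PORT_TO_BAYS = {
    "P4-T1": ("P3-D1", "P3-D5"),    # 5 bays
    "P4-T2": ("P3-D6", "P3-D9"),    # 4 bays
    "P4-T3": ("P3-D10", "P3-D14"),  # 5 bays
    "P4-T4": ("P3-D15", "P3-D18"),  # 4 bays
}

def bay_count(first: str, last: str) -> int:
    # Bay labels are P3-D<n>; the range is inclusive.
    lo, hi = (int(label.rsplit("D", 1)[1]) for label in (first, last))
    return hi - lo + 1

total = sum(bay_count(*bays) for bays in SAS_PORT_TO_BAYS.values())
print(total)  # 18: the four sets cover all of the drawer's disk bays
```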
[Figure 2-29 content: the mode switch on the #5802 12X I/O Drawer selects 1, 2, or 4 bay sets. AIX/Linux: one set (18 bays), two sets (9 + 9 bays), or four sets (5 + 4 + 5 + 4 bays). IBM i: two sets (9 + 9 bays).]
The location codes for the front and rear views of the #5802 I/O drawer are provided in
Figure 2-30 and Figure 2-31.
Figure 2-30 #5802 I/O drawer front view location codes
Figure 2-31 #5802 I/O drawer rear view location codes
Configuring the #5802 disk drive subsystem
The #5802 SAS disk drive enclosure can hold up to 18 disk drives. The disks in this enclosure can be organized in several configurations depending on the operating system used, the type of SAS adapter card, and the position of the mode switch.
Each disk bay set can be attached to its own controller or adapter. The feature #5802 PCIe
12X I/O Drawer has four SAS connections to drive bays. It connects to PCIe SAS adapters or
controllers on the host systems.
For detailed information about how to configure, see the IBM Power Systems Hardware
Information Center:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp
2.10.4 12X I/O Drawer PCIe and PCI-DDR 12X Expansion Drawer 12X cabling
I/O Drawers are connected to the adapters in the CEC enclosure with data transfer cables:
12X DDR cables for the #5802 and #5877 I/O drawers
12X SDR and/or DDR cables for the #5796 I/O drawers
The first 12X I/O Drawer that is attached in any I/O drawer loop requires two data transfer
cables. Each additional drawer, up to the maximum allowed in the loop, requires one
additional data transfer cable. Note the following information:
A 12X I/O loop starts at a CEC bus adapter port 0 and attaches to port 0 of an I/O drawer.
The I/O drawer attaches from port 1 of the current unit to port 0 of the next I/O drawer.
Port 1 of the last I/O drawer on the 12X I/O loop connects to port 1 of the same CEC bus
adapter to complete the loop.
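The loop rules above can be sketched as a small cable-list generator (an illustration; the endpoint naming is ours):

```python
# Sketch of 12X loop cabling: port 0 of the CEC bus adapter feeds the first
# drawer, drawers chain port 1 -> port 0, and the last drawer's port 1
# returns to port 1 of the same CEC bus adapter, closing the loop.
def loop_cables(n_drawers: int) -> list[tuple[str, str]]:
    assert n_drawers >= 1
    cables = [("CEC:0", "drawer1:0")]
    for i in range(1, n_drawers):
        cables.append((f"drawer{i}:1", f"drawer{i + 1}:0"))
    cables.append((f"drawer{n_drawers}:1", "CEC:1"))
    return cables

# The first drawer needs two cables; each additional drawer adds one more.
print(len(loop_cables(1)))  # 2
print(len(loop_cables(4)))  # 5
```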
Figure 2-32 shows typical 12X I/O loop port connections.
Figure 2-32 Typical 12X I/O loop port connections
Table 2-31 shows various 12X cables to satisfy the various length requirements.
Table 2-31 12X connection cables
Feature code Description
#1861 0.6-meter 12X DDR cable
#1862 1.5-meter 12X DDR cable
#1865 3.0-meter 12X DDR cable
#1864 8.0-meter 12X DDR cable
I/O I/O
1 0 1 0
I/O
0 1
I/O
0 1
C
0
1Chapter 2. Architecture and technical overview 91
General rule for the 12X I/O Drawer configuration
To optimize performance and distribute workload, use as many GX++ buses as possible. Figure 2-33 shows several examples of a 12X I/O Drawer configuration.
Figure 2-33 12X I/O Drawer configuration
Supported 12X cable length for PCI-DDR 12X Expansion Drawer
Each #5796 drawer requires one Dual Port PCI DDR 12X Channel Adapter, either Short Run
(#6446) or Long Run (#6457). The choice of adapters is dependent on the distance to the
next 12X Channel connection in the loop, either to another I/O drawer or to the system unit.
Table 2-32 identifies the supported cable lengths for each 12X channel adapter. I/O drawers
containing the short range adapter can be mixed in a single loop with I/O drawers containing
the long range adapter. In Table 2-32, a Yes indicates that the 12X cable identified in that
column can be used to connect the drawer configuration identified to the left. A No means that
it cannot be used.
Table 2-32 Supported 12X cable lengths
2.10.5 12X I/O Drawer PCIe and PCI-DDR 12X Expansion Drawer SPCN cabling
System Power Control Network (SPCN) is used to control and monitor the status of power
and cooling within the I/O drawer.
SPCN cables connect all ac-powered expansion units (Figure 2-34 on page 92):
1. Start at SPCN 0 (T1) of the first (top) CEC enclosure to J15 (T1) of the first expansion unit.
2. Cable all units from J16 (T2) of the previous unit to J15 (T1) of the next unit.
Connection type                                        0.6 m   1.5 m   3.0 m   8.0 m
#5796 to #5796, #6446 adapter in both drawers          Yes     Yes     No      No
#5796 with #6446 adapter to #5796 with #6457 adapter   Yes     Yes     Yes     No
#5796 to #5796, #6457 adapter in both drawers          Yes     Yes     Yes     Yes
#5796 with #6446 adapter to system unit                No      Yes     Yes     No
#5796 with #6457 adapter to system unit                No      Yes     Yes     Yes
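Table 2-32 can also be read as a lookup (a sketch; the encoding is ours, the supported lengths come from the table):

```python
# Which 12X cable lengths (meters) are supported between two endpoints, each a
# #6446 (short run) adapter, a #6457 (long run) adapter, or the system unit.
SUPPORTED_LENGTHS = {
    frozenset(["6446"]):           {0.6, 1.5},            # 6446 to 6446
    frozenset(["6446", "6457"]):   {0.6, 1.5, 3.0},
    frozenset(["6457"]):           {0.6, 1.5, 3.0, 8.0},  # 6457 to 6457
    frozenset(["6446", "system"]): {1.5, 3.0},
    frozenset(["6457", "system"]): {1.5, 3.0, 8.0},
}

def cable_ok(end_a: str, end_b: str, length_m: float) -> bool:
    """True if the given 12X cable length is supported between the endpoints."""
    return length_m in SUPPORTED_LENGTHS[frozenset([end_a, end_b])]

print(cable_ok("6446", "6446", 0.6))    # True
print(cable_ok("6446", "system", 0.6))  # False
```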
[Figure 2-33 panels: example configurations with one, two, three, and four PCIe I/O drawers attached to a Power 770 or 780, plus a mixed PCIe and PCI-X example.]
3. From J16 (T2) of the final expansion unit, connect to the second CEC enclosure,
SPCN 1 (T2).
4. To complete the cabling loop, connect SPCN 1 (T2) of the topmost (first) CEC enclosure
to the SPCN 0 (T1) of the next (second) CEC.
5. Ensure that a complete loop exists from the topmost CEC enclosure, through all attached
expansions and back to the next lower (second) CEC enclosure.
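The cabling steps above can be sketched as a generator of (from, to) cable pairs (an illustration; the connector naming is ours):

```python
# Sketch of SPCN loop cabling for n I/O expansion units. Only the first two
# CEC enclosures participate, per the steps in the text.
def spcn_cables(n_units: int) -> list[tuple[str, str]]:
    assert n_units >= 1
    cables = [("CEC1 SPCN0", "unit1 J15")]                # step 1
    for i in range(1, n_units):                           # step 2: chain units
        cables.append((f"unit{i} J16", f"unit{i + 1} J15"))
    cables.append((f"unit{n_units} J16", "CEC2 SPCN1"))   # step 3
    cables.append(("CEC1 SPCN1", "CEC2 SPCN0"))           # step 4: close loop
    return cables

print(len(spcn_cables(4)))  # 6 cables for four expansion units and two CECs
```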
Figure 2-34 SPCN cabling examples
Table 2-33 shows the SPCN cables to satisfy various length requirements.
Table 2-33 SPCN cables

Feature code   Description
#6006          SPCN cable drawer-to-drawer, 3 m
#6007          SPCN cable rack-to-rack, 15 m

Note: Only the first two CEC enclosures of a multi-CEC system are included in SPCN cabling with I/O expansion units. CEC enclosures number three and four are not connected.

2.11 External disk subsystems
This section describes the following external disk subsystems that can be attached to the Power 770 and Power 780:
EXP 12S SAS Expansion Drawer (#5886) (supported, but no longer orderable)
EXP24S SFF Gen2-bay Drawer for high-density storage (#5887)
TotalStorage EXP24 Disk Drawer (#5786)
IBM 7031 TotalStorage EXP24 Ultra320 SCSI Expandable Storage Disk Enclosure
(no longer orderable)
IBM System Storage
Table 2-34 provides an overview of the SAS external disk subsystems.
Table 2-34 I/O drawer capabilities
2.11.1 EXP 12S Expansion Drawer
The EXP 12S (#5886) is an expansion drawer with twelve 3.5-inch form factor SAS bays.
#5886 supports up to 12 hot-swap SAS HDDs or up to eight hot-swap SSDs. The EXP 12S
includes redundant ac power supplies and two power cords. Though the drawer is one set of
12 drives, which is run by one SAS controller or one pair of SAS controllers, it has two SAS
attachment ports and two service managers for redundancy. The EXP 12S takes up 2 EIA units in a 19-inch rack. The SAS controller can be a SAS PCI-X or PCIe adapter or a pair of adapters.
The drawer can either be attached using the backplane, providing an external SAS port, or
using one of the following adapters:
PCIe 380 MB Cache Dual-x4 3 Gb SAS RAID adapter (#5805)
PCI-X DDR Dual-x4 SAS adapter (#5900; supported but no longer orderable)
PCIe Dual-x4 SAS adapter (#5901)
PCIe 380 MB Cache Dual-x4 3 Gb SAS RAID adapter (#5903; supported but no longer orderable)
PCI-X DDR 1.5 GB Cache SAS RAID adapter (#5904)
PCI-X DDR Dual-x4 SAS adapter (#5912)
PCIe2 1.8 GB Cache RAID SAS Adapter (#5913)
The SAS disk drives or SSD contained in the EXP 12S Expansion Drawer are controlled by
one or two PCIe or PCI-X SAS adapters connected to the EXP 12S Expansion Drawer
through SAS cables. The SAS cable varies, depending on the adapter being used, the
operating system being used, and the protection desired.
The large cache PCI-X DDR 1.5 GB Cache SAS RAID Adapter (#5904) and PCI-X DDR 1.5 GB Cache SAS RAID Adapter (BSC) (#5908) use a SAS Y cable when a single port is running the EXP 12S Expansion Drawer. A SAS X cable is used when a pair of adapters is used for controller redundancy.
The medium cache PCIe 380 MB Cache Dual-x4 3 Gb SAS RAID Adapter (#5903) is always paired and uses a SAS X cable to attach the feature #5886 I/O drawer.
The zero cache PCI-X DDR Dual-x4 SAS Adapter (#5912) and PCIe Dual-x4 SAS Adapter (#5901) use a SAS Y cable when a single port is running the EXP 12S Expansion Drawer. A SAS X cable is used in AIX or Linux environments when a pair of adapters is used for controller redundancy.
Drawer feature code   DASD                       PCI slots   Requirements for a Power 770 and Power 780
#5886                 12 x SAS disk drive bays   -           Any supported SAS adapter
#5887                 24 x SAS disk drive bays   -           Any supported SAS adapter
The following SAS X cables are available for use with the PCIe2 1.8 GB Cache RAID SAS Adapter (#5913):
3 meters (#3454)
6 meters (#3455)
10 meters (#3456)
In all of these configurations, all 12 SAS bays are controlled by a single controller or a single
pair of controllers.
A second EXP 12S Expansion Drawer can be attached to another drawer by using two SAS
EE cables, providing 24 SAS bays instead of 12 bays for the same SAS controller port. This
configuration is called cascading. In this configuration, all 24 SAS bays are controlled by a
single controller or a single pair of controllers.
A maximum of 110 EXP 12S Expansion Drawers is supported on SAS PCI controllers.
The #5886 can be directly attached to the SAS port on the rear of the Power 770 and 780,
providing a very low-cost disk storage solution.
Adding the optional 175 MB Cache RAID - Dual IOA Enablement Card (#5662) to the
Power 770 and Power 780 causes the pair of embedded controllers in that processor
enclosure to be configured as dual controllers, accessing all six SAS bays. Using the internal
SAS Cable Assembly for SAS Port (#1819) connected to the rear port, the pair of embedded
controllers is now running 18 SAS bays (six SFF bays in the system unit and twelve 3.5-inch
bays in the drawer). The disk drawer is attached to the SAS port with a SAS YI cable. In this
18-bay configuration, all drives must be HDDs.
A second unit cannot be cascaded to an EXP 12S Expansion Drawer attached in this way.
Various disk options are available to be installed in the EXP 12S Expansion Drawer.
Table 2-35 shows the available disk drive feature codes.
Table 2-35 Disk options for the EXP 12S drawer
For detailed information about the SAS cabling, see the Serial-attached SCSI cable
planning documentation:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7had/p7
hadsascabling.htm
Feature code   Description                            OS support
#3586          69 GB 3.5-inch SAS Solid State Drive   AIX, Linux
#3646          73.4 GB 15K RPM SAS Disk Drive         AIX, Linux
#3647          146.8 GB 15K RPM SAS Disk Drive        AIX, Linux
#3648          300 GB 15K RPM SAS Disk Drive          AIX, Linux
#3649          450 GB 15K RPM SAS Disk Drive          AIX, Linux
#3587          69 GB 3.5-inch SAS Solid State Drive   IBM i
#3676          69.7 GB 15K RPM SAS Disk Drive         IBM i
#3677          139.5 GB 15K RPM SAS Disk Drive        IBM i
#3678          283.7 GB 15K RPM SAS Disk Drive        IBM i
#3658          428.4 GB 15K RPM SAS Disk Drive        IBM i
2.11.2 EXP24S SFF Gen2-bay Drawer
The EXP24S SFF Gen2-bay Drawer (#5887) is an expansion drawer supporting up to 24
hot-swap 2.5-inch SFF SAS HDDs on POWER6 or POWER7 servers in 2U of 19-inch
rack space.
The SFF bays of the EXP24S are different from the SFF bays of the POWER7 system units or 12X PCIe I/O Drawers (#5802, #5803). The EXP24S uses Gen-2 (SFF-2) SAS drives, which physically do not fit in the Gen-1 (SFF-1) bays of the POWER7 system unit or 12X PCIe I/O Drawers, and vice versa.
The EXP24S SAS ports are attached to SAS controllers, which can be a SAS PCI-X or PCIe adapter or pair of adapters. The EXP24S SFF Gen2-bay Drawer can also be attached to an embedded SAS controller in a server with an embedded SAS port. Attachment between the SAS controller and the EXP24S SAS ports is via the appropriate SAS Y or X cables.
The SAS adapters/controllers that support the EXP24S are:
PCI-X 1.5 GB Cache SAS RAID Adapter 3 Gb (#5904, #5906, #5908)
PCIe 380 MB Cache SAS RAID Adapter 3 Gb (#5805, #5903)
PCIe Dual-x4 SAS Adapter 3 Gb (#5901, #5278)
PCIe2 1.8GB Cache RAID SAS Adapter (#5913)
The SAS disk drives contained in the EXP24S SFF Gen2-bay Drawer are controlled by one
or two PCIe or PCI-X SAS adapters connected to the EXP24S through SAS cables. The SAS
cable varies, depending on the adapter being used, the operating system being used, and the
protection desired.
Adding the optional 175 MB Cache RAID - Dual IOA Enablement Card (#5662) to the
Power 770 and Power 780 causes the pair of embedded controllers in that processor
enclosure to be configured as dual controllers, accessing all six SAS bays. Using the internal
SAS Cable Assembly for SAS Port (#1819) connected to the rear port, the pair of embedded
controllers is then running 30 SAS bays (six SFF bays in the system unit and twenty-four
2.5-inch bays in the drawer). The disk drawer is attached to the SAS port with a SAS YI cable.
In this 30-bay configuration, all drives must be HDDs.
A second unit cannot be cascaded to an EXP24S SFF Gen2-bay Drawer attached in this way.
Note: The following considerations apply:
The large cache PCI-X DDR 1.5 GB Cache SAS RAID Adapter (#5904) and PCI-X
DDR 1.5 GB Cache SAS RAID Adapter (BSC) (#5908) use a SAS Y cable when a
single port is running the EXP24S. A SAS X cable is used when a pair of adapters is
used for controller redundancy.
The medium cache PCIe 380 MB Cache Dual - x4 3 Gb SAS RAID Adapter (#5903) is
always paired and uses a SAS X cable to attach the feature #5887 I/O drawer.
The zero cache PCI-X DDR Dual - x4 SAS Adapter (#5912) and PCIe Dual - x4 SAS
Adapter (#5901) use a SAS Y cable when a single port is running the EXP24S. A
SAS X cable is used for AIX or Linux environments when a pair of adapters is used for
controller redundancy.
The PCIe2 1.8 GB Cache RAID SAS Adapter (#5913) uses SAS YO cables.
In all of these configurations, all 24 SAS bays are controlled by a single controller or a
single pair of controllers.
A second EXP24S drawer can be attached to another drawer by using two SAS EE
cables, providing 48 SAS bays instead of 24 bays for the same SAS controller port. This
configuration is called cascading. In this configuration, all 48 SAS bays are controlled
by a single controller or a single pair of controllers.
The EXP24S SFF Gen2-bay Drawer can be directly attached to the SAS port on the
rear of the Power 770 and Power 780, providing a very low-cost disk storage solution.
96 IBM Power 770 and 780 Technical Overview and Introduction
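The single-adapter versus paired-adapter cable rules in the note above can be summarized in a small lookup. The following Python sketch is illustrative only (the function and its names are not from IBM); it simply encodes the rules as stated:

```python
def exp24s_cable_type(adapter_fc: str, paired: bool) -> str:
    """Pick the SAS cable type for an EXP24S attachment, per the note above.

    adapter_fc: adapter feature code, without the leading '#'.
    paired: True when a pair of adapters is used for controller redundancy.
    """
    y_or_x = {"5904", "5908",   # large cache PCI-X DDR 1.5 GB adapters
              "5912", "5901"}   # zero cache PCI-X/PCIe adapters
    if adapter_fc in y_or_x:
        return "X" if paired else "Y"   # Y cable single port, X cable when paired
    if adapter_fc == "5903":            # 380 MB cache adapter: always paired
        return "X"
    if adapter_fc == "5913":            # PCIe2 1.8 GB cache adapter
        return "YO"
    raise ValueError(f"no rule listed for adapter #{adapter_fc}")

print(exp24s_cable_type("5904", paired=False))  # Y
```

For example, a single #5904 port running the drawer calls for a Y cable, while a redundant pair of #5901 adapters calls for an X cable.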
The EXP24S SFF Gen2-bay Drawer can be ordered in one of three possible
manufacturing-configured mode settings (not customer-setup) of 1, 2, or 4 sets of disk bays.
With IBM AIX/Linux/VIOS, the EXP24S can be ordered with four sets of six bays (mode 4),
two sets of 12 bays (mode 2), or one set of 24 bays (mode 1). With IBM i, the EXP24S can be
ordered as one set of 24 bays (mode 1).
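As a quick illustration of the mode arrangements above, the bay grouping can be sketched as follows (this helper is hypothetical, not an IBM tool):

```python
def exp24s_bay_sets(mode: int) -> list[int]:
    """Return the disk-bay grouping for a given EXP24S manufacturing mode.

    Per the text above: mode 1 = one set of 24 bays, mode 2 = two sets of
    12 bays, mode 4 = four sets of 6 bays. IBM i supports mode 1 only.
    """
    if mode not in (1, 2, 4):
        raise ValueError("EXP24S supports modes 1, 2, and 4 only")
    return [24 // mode] * mode

print(exp24s_bay_sets(4))  # [6, 6, 6, 6]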
There are six SAS connectors on the rear of the EXP24S SFF Gen2-bay Drawer to which SAS
adapters/controllers are attached. They are labeled T1, T2, and T3, and there are two T1, two
T2, and two T3 (Figure 2-35).
In mode 1, two or four of the six ports are used. Two T2 ports are used for a single SAS
adapter, and two T2 and two T3 ports are used with a paired set of two adapters or a dual
adapters configuration.
In mode 2 or mode 4, four ports are used, two T2 and two T3, to access all SAS bays.
Figure 2-35 EXP24S SFF Gen2-bay Drawer rear connectors
An EXP24S SFF Gen2-bay Drawer in mode 4 can be attached to two or four SAS
controllers and provide a great deal of configuration flexibility. An EXP24S in mode 2 has
similar flexibility. Up to 24 HDDs can be supported with any of the supported SAS
adapters/controllers.
Note: Note the following information:
The modes for the EXP24S SFF Gen2-bay Drawer are set by IBM Manufacturing.
There is no option to reset the mode after the drawer has been shipped.
If you order multiple EXP24S drawers, avoid mixing modes within that order. There is no
externally visible indicator of the drawer's mode.
Multiple EXP24S drawers cannot be cascaded on the external SAS connector. Only one #5887
is supported.
The Power 770 or Power 780 supports up to 56 EXP24S SFF Gen2-bay Drawers.
Include the EXP24S SFF Gen2-bay Drawer no-charge specify codes with EXP24S
orders to indicate to IBM Manufacturing the mode to which the drawer should be set and
the adapter/controller/cable configuration that will be used. Table 2-36 lists the no-charge
specify codes and the physical adapters/controllers/cables with their own chargeable
feature numbers.
Table 2-36 EXP24S Cabling
Feature code Mode Adapter/controller Cable to drawer Environment
#9360 1 Pair #5901 2 YO Cables A, L, VIOS
#9361 2 Two #5901 2 YO Cables A, L, VIOS
#9365 4 Four #5901 2 X Cables A, L, VIOS
#9366 2 Two pair #5901 2 X Cables A, L, VIOS
#9367 1 Pair #5903, #5805 2 YO Cables A, i, L, VIOS
#9368 2 Four #5903, #5805 2 X Cables A, L, VIOS
#9382 1 One #5904/06/08 1 YO Cable A, i, L, VIOS
#9383 1 Pair #5904/06/08 2 YO Cables A, i, L, VIOS
#9384 1 CEC SAS port 1 YI Cable A, i, L, VIOS
#9385 1 Two #5913 2 YO Cables A, i, L, VIOS
#9386 2 Four #5913 4 X Cables A, L, VIOS
These cabling options for the EXP24S drawer are available:
X cables for #5278
– 3 m (#3661)
– 6 m (#3662)
– 15 m (#3663)
X cables for #5913 (all 6 Gb except for the 15 m cable)
– 3 m (#3454)
– 6 m (#3455)
– 10 m (#3456)
YO cables for #5278
– 1.5 m (#3691)
– 3 m (#3692)
– 6 m (#3693)
– 15 m (#3694)
YO cables for #5913 (all 6 Gb except for the 15 m cable)
– 1.5 m (#3450)
– 3 m (#3451)
– 6 m (#3452)
– 10 m (#3453)
YI cables for the system unit SAS port (3 Gb)
– 1.5 m (#3686)
– 3 m (#3687)
For detailed information about the SAS cabling, see the serial-attached SCSI cable
planning documentation:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7had/p7hadsascabling.htm
2.11.3 TotalStorage EXP24 disk drawer and tower
The TotalStorage EXP24 is available as a 4 EIA unit drawer and mounts in a 19-inch rack
(#5786). The front of the IBM TotalStorage EXP24 Ultra320 SCSI Expandable Storage Disk
Enclosure has bays for up to 12 disk drives organized in two SCSI groups of up to six drives.
The rear also has bays for up to 12 disk drives organized in two additional SCSI groups of up
to six drives, plus slots for the four SCSI interface cards. Each SCSI drive group can be
connected by either a Single Bus Ultra320 SCSI Repeater Card (#5741) or a Dual Bus
Ultra320 SCSI Repeater Card (#5742). This allows the EXP24 to be configured as four sets
of six bays, two sets of 12 bays, or two sets of six bays plus one set of 12 bays.
The EXP24 features #5786 and #5787 have three cooling fans and two power supplies to
provide redundant power and cooling. The SCSI disk drives contained in the EXP24 are
controlled by PCI-X SCSI adapters connected to the EXP24 SCSI repeater cards by SCSI
cables. The PCI-X adapters are located in the Power 770 or Power 780 system unit or in an
attached I/O drawer with PCI-X slots.
The system maximum of 336 disks is achieved with a maximum of 24 disks in each of up to 14
TotalStorage EXP24 disk drawers (#5786) or 14 TotalStorage EXP24 disk towers (#5787).
2.11.4 IBM TotalStorage EXP24
The IBM 7031 TotalStorage EXP24 Ultra320 SCSI Expandable Storage Disk Enclosure
supports up to 24 Ultra320 SCSI Disk Drives arranged in four independent SCSI groups of up
to six drives or in two groups of up to 12 drives. Each SCSI drive group can be connected by
either a Single Bus Ultra320 SCSI Repeater Card or a Dual Bus Ultra320 SCSI Repeater
Card, allowing a maximum of eight SCSI connections per TotalStorage EXP24.
The IBM 7031 Model D24 (7031-D24) is an Expandable Disk Storage Enclosure that is a
horizontal 4 EIA by 19-inch rack drawer for mounting in equipment racks.
The IBM 7031 Model T24 (7031-T24) is an Expandable Disk Storage Enclosure that is a
vertical tower for floor-standing applications.
Note: IBM plans to offer a 15-meter, 3 Gb bandwidth SAS cable for the #5913 PCIe2
1.8 GB Cache RAID SAS Adapter when attaching the EXP24S Drawer (#5887) for large
configurations where the 10-meter cable is a distance limitation.
The EXP24S drawer rails are fixed length and designed to fit Power Systems provided
racks of 28 inches (711 mm) deep. The EXP24S uses 2 EIA of space in a 19-inch-wide rack.
Other racks might have different depths, and these rails will not adjust. No adjustable-depth
rails are orderable at this time.
Note: The EXP24 SCSI disk drawer is an earlier-technology drawer compared to the later
SAS EXP24S drawer. It is used to house existing SCSI disk drives that are supported, but
that are no longer orderable.
2.11.5 IBM System Storage
The IBM System Storage Disk Systems products and offerings provide compelling storage
solutions with superior value for all levels of business, from entry-level up to high-end
storage systems.
IBM System Storage N series
The IBM System Storage N series is a Network Attached Storage (NAS) solution and
provides the latest technology to customers to help them improve performance, virtualization
manageability, and system efficiency at a reduced total cost of ownership. For more
information about the IBM System Storage N series hardware and software, see:
http://www.ibm.com/systems/storage/network
IBM System Storage DS3000 family
The IBM System Storage DS3000 is an entry-level storage system designed to meet the
availability and consolidation needs for a wide range of users. New features, including larger
capacity 450 GB SAS drives, increased data protection features such as RAID 6, and more
FlashCopies per volume, provide a reliable virtualization platform. For more information about
the DS3000 family, see:
http://www.ibm.com/systems/storage/disk/ds3000/index.html
IBM System Storage DS5000
New DS5000 enhancements help reduce cost by introducing SSD drives. Also with the new
EXP5060 expansion unit supporting 60 1 TB SATA drives in a 4U package, customers can
see up to a one-third reduction in floor space over standard enclosures. With the addition of
1 Gbps iSCSI host attach, customers can reduce cost for their less demanding applications
while continuing to provide high performance where necessary, utilizing the 8 Gbps FC host
ports. With the DS5000 family, you get consistent performance from a smarter design that
simplifies your infrastructure, improves your TCO, and reduces your cost. For more
information about the DS5000 family, see:
http://www.ibm.com/systems/storage/disk/ds5000/index.html
IBM Storwize V7000 Midrange Disk System
IBM Storwize® V7000 is a virtualized storage system to complement virtualized server
environments that provides unmatched performance, availability, advanced functions, and
highly scalable capacity never seen before in midrange disk systems. Storwize V7000 is a
powerful midrange disk system that has been designed to be easy to use and enable rapid
deployment without additional resources. Storwize V7000 is virtual storage that offers greater
efficiency and flexibility through built-in solid state drive (SSD) optimization and thin
provisioning technologies. Storwize V7000 advanced functions also enable non-disruptive
migration of data from existing storage, simplifying implementation and minimizing disruption
to users. Storwize V7000 also enables you to virtualize and reuse existing disk systems,
Note: A new IBM 7031 TotalStorage EXP24 Ultra320 SCSI Expandable Storage Disk
Enclosure cannot be ordered for the Power 770 and Power 780; thus, only existing
7031-D24 drawers or 7031-T24 towers can be moved to these servers.
AIX and Linux partitions are supported along with the use of an IBM 7031 TotalStorage
EXP24 Ultra320 SCSI Expandable Storage Disk Enclosure.
supporting a greater potential return on investment (ROI). For more information about
Storwize V7000, see:
http://www.ibm.com/systems/storage/disk/storwize_v7000/index.html
IBM XIV Storage System
IBM offers a mid-sized configuration of its self-optimizing, self-healing, resilient disk solution,
the IBM XIV® Storage System, storage reinvented for a new era. Now, organizations with
mid-size capacity requirements can take advantage of the latest IBM technology for their
most demanding applications with as little as 27 TB of usable capacity and incremental
upgrades. For more information about XIV, see:
http://www.ibm.com/systems/storage/disk/xiv/index.html
IBM System Storage DS8000
The IBM System Storage DS8000® family is designed to offer high availability, multiplatform
support, and simplified management tools. With its high capacity, scalability, broad server
support, and virtualization features, the DS8000 family is well suited for simplifying the
storage environment by consolidating data from multiple storage systems on a single system.
The high-end model DS8800 is the most advanced model in the IBM DS8000 family lineup
and introduces new dual IBM POWER6-based controllers that usher in a new level of
performance for the company’s flagship enterprise disk platform. The DS8800 offers twice the
maximum physical storage capacity of the previous model. For more information about the
DS8000 family, see:
http://www.ibm.com/systems/storage/disk/ds8000/index.html
2.12 Hardware Management Console
The Hardware Management Console (HMC) is a dedicated workstation that provides a
graphical user interface (GUI) for configuring, operating, and performing basic system tasks
for the POWER7 processor-based systems (and the POWER5, POWER5+, POWER6, and
POWER6+ processor-based systems) that function in either non-partitioned or clustered
environments. In addition, the HMC is used to configure and manage partitions. One HMC is
capable of controlling multiple POWER5, POWER5+, POWER6, POWER6+, and
POWER7 processor-based systems.
Several HMC models are supported to manage POWER7 processor-based systems. Two
models (7042-C08, 7042-CR6) are available for ordering at the time of writing, but you can
also use one of the withdrawn models listed in Table 2-37.
Table 2-37 HMC models supporting POWER7 processor technology-based servers
Type-model Availability Description
7310-C05 Withdrawn IBM 7310 Model C05 Desktop Hardware Management Console
7310-C06 Withdrawn IBM 7310 Model C06 Deskside Hardware Management Console
7042-C06 Withdrawn IBM 7042 Model C06 Deskside Hardware Management Console
7042-C07 Withdrawn IBM 7042 Model C07 Deskside Hardware Management Console
7042-C08 Available IBM 7042 Model C08 Deskside Hardware Management Console
7310-CR3 Withdrawn IBM 7310 Model CR3 Rack-mounted Hardware Management Console
7042-CR4 Withdrawn IBM 7042 Model CR4 Rack-mounted Hardware Management Console
7042-CR5 Withdrawn IBM 7042 Model CR5 Rack-mounted Hardware Management Console
7042-CR6 Available IBM 7042 Model CR6 Rack-mounted Hardware Management Console
Note: An HMC is a mandatory requirement for both the Power 770 and Power 780 systems,
but it is possible to share an HMC with other Power systems.
At the time of writing, the HMC must be running V7R7.4.0. It can support up to 48
POWER7 processor-based systems. Updates of the machine code, HMC functions, and
hardware prerequisites can be found on the Fix Central website:
http://www-933.ibm.com/support/fixcentral/
2.12.1 HMC functional overview
The HMC provides three groups of functions:
Server
Virtualization
HMC management
Server management
The first group contains all functions related to the management of the physical servers under
the control of the HMC:
System password
Status Bar
Power On/Off
Capacity on Demand
Error management
– System indicators
– Error and event collection reporting
– Dump collection reporting
– Call Home
– Customer notification
– Hardware replacement (Guided Repair)
– SNMP events
Concurrent Add/Repair/Upgrade
Redundant Service Processor
Firmware Updates
Virtualization management
The second group contains all of the functions related to virtualization features, such as
partition configuration or the dynamic reconfiguration of resources:
System Plans
System Profiles
Partitions (create, activate, shutdown)
Profiles
Partition Mobility
DLPAR (processors, memory, I/O, and so on)
Custom Groups
HMC Console management
The last group relates to the management of the HMC itself, its maintenance, security, and
configuration, for example:
Guided set-up wizard
Electronic Service Agent set up wizard
User Management
– User IDs
– Authorization levels
– Customizable authorization
Disconnect and reconnect
Network Security
– Remote operation enable and disable
– User definable SSL certificates
Console logging
HMC Redundancy
Scheduled Operations
Back-up and Restore
Updates, Upgrades
Customizable Message of the day
The HMC provides both a graphical interface and a command-line interface (CLI) for all
management tasks. Remote connection to the HMC using a web browser is possible (as of
HMC Version 7; previous versions required a special client program called WebSM).
The CLI is also available by using a Secure Shell (SSH) connection to the HMC, and it can
be used by an external management system or a partition to remotely perform many
HMC operations.
2.12.2 HMC connectivity to the POWER7 processor-based systems
POWER5, POWER5+, POWER6, POWER6+, and POWER7 processor technology-based
servers that are managed by an HMC require Ethernet connectivity between the HMC and
the server’s Service Processor. In addition, if Dynamic LPAR, Live Partition Mobility, or
PowerVM Active Memory Sharing operations are required on the managed partitions,
Ethernet connectivity is needed between these partitions and the HMC. A minimum of two
Ethernet ports are needed on the HMC to provide such connectivity. The rack-mounted
7042-CR5 HMC default configuration provides four Ethernet ports. The deskside 7042-C07
HMC standard configuration offers only one Ethernet port. Be sure to order an optional PCIe
adapter to provide additional Ethernet ports.
For any logical partition in a server it is possible to use a Shared Ethernet Adapter that is
configured via a Virtual I/O Server. Therefore, a partition does not require its own physical
adapter to communicate with an HMC.
For the HMC to communicate properly with the managed server, eth0 of the HMC must be
connected to either the HMC1 or HMC2 ports of the managed server, although other network
configurations are possible. You can attach a second HMC to HMC Port 2 of the server for
redundancy (or vice versa). These must be addressed by two separate subnets. Figure 2-36
shows a simple network configuration to enable the connection from HMC to server and to
enable Dynamic LPAR operations. For more details about HMC and the possible network
connections, see Hardware Management Console V7 Handbook, SG24-7491.
Figure 2-36 HMC to service processor and LPARs network connection
The default mechanism for allocation of the IP addresses for the service processor HMC
ports is dynamic. The HMC can be configured as a DHCP server, providing the IP address at
the time the managed server is powered on. In this case, the FSPs are allocated IP addresses
from a set of address ranges predefined in the HMC software. These predefined ranges are
identical for version 710 of the HMC code and for previous versions.
If the service processor of the managed server does not receive a DHCP reply before time
out, predefined IP addresses will be set up on both ports. Static IP address allocation is also
an option. You can configure the IP address of the service processor ports with a static IP
address by using the Advanced System Management Interface (ASMI) menus.
2.12.3 High availability using the HMC
The HMC is an important hardware component. When in operation, POWER7
processor-based servers and their hosted partitions can continue to operate when no HMC is
available. However, in such conditions, certain operations cannot be performed, such as a
DLPAR reconfiguration, a partition migration using PowerVM Live Partition Mobility, or the
creation of a new partition. You might therefore decide to install two HMCs in a redundant
configuration so that one HMC is always operational, even when performing maintenance of
the other one, for example.
If redundant HMC function is desired, a server can be attached to two independent HMCs to
address availability requirements. Both HMCs must have the same level of Hardware
Management Console Licensed Machine Code Version 7 and installed fixes to manage
POWER7 processor-based servers or an environment with a mixture of POWER5,
POWER5+, POWER6, POWER6+, and POWER7 processor-based servers. The HMCs
provide a locking mechanism so that only one HMC at a time has write access to the service
processor. It is recommended that both HMCs be available on a public subnet to allow full
synchronization of functionality. Depending on your environment, you have multiple options to
configure the network.
Note: The service processor is used to monitor and manage the system hardware
resources and devices. The service processor offers two Ethernet 10/100 Mbps ports as
connections. Note the following information:
Both Ethernet ports are visible only to the service processor and can be used to attach
the server to an HMC or to access the ASMI options from a client web browser using
the HTTP server integrated into the service processor internal operating system.
When not configured otherwise (DHCP or from a previous ASMI setting), both Ethernet
ports of the first FSP have predefined IP addresses:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.147 with netmask
255.255.255.0.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.147 with netmask
255.255.255.0.
For the second FSP of IBM Power 770 and 780, these default addresses are:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.146 with netmask
255.255.255.0.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.146 with netmask
255.255.255.0.
For more information about the service processor, see “Service processor” on page 169.
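As an aside (not from the source), the default addresses listed above can be sanity-checked with standard tooling. This Python sketch verifies that they are IPv4 link-local and that the HMC1 and HMC2 ports fall in separate /24 subnets, consistent with the guidance that the two HMC connections be addressed by separate subnets:

```python
import ipaddress

# Predefined FSP port addresses listed above (netmask 255.255.255.0).
defaults = {
    ("FSP1", "HMC1"): "169.254.2.147",
    ("FSP1", "HMC2"): "169.254.3.147",
    ("FSP2", "HMC1"): "169.254.2.146",
    ("FSP2", "HMC2"): "169.254.3.146",
}

# Every default falls in the 169.254.0.0/16 IPv4 link-local range.
assert all(ipaddress.ip_address(a).is_link_local for a in defaults.values())

# The HMC1 and HMC2 ports land in different /24 networks (separate subnets).
nets = {port: ipaddress.ip_network(addr + "/24", strict=False)
        for (_, port), addr in defaults.items()}
print(sorted({str(n) for n in nets.values()}))
# ['169.254.2.0/24', '169.254.3.0/24']
```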
Figure 2-37 shows one possible highly available HMC configuration managing two servers.
These servers have only one CEC and therefore only one FSP. Each HMC is connected to
one FSP port of all managed servers.
Figure 2-37 Highly available HMC and network architecture
Note that in Figure 2-37, only the hardware management networks (LAN1 and LAN2) are
shown as highly available, for simplicity. However, the management network (LAN3) can be
made highly available by using a similar concept and adding more Ethernet adapters to the
LPARs and HMCs.
Both HMCs must be on a separate VLAN to protect from any network contention. Each HMC
can be a DHCP server for its VLAN.
Redundant service processor connectivity
For the Power 770 and Power 780 with two or more CECs, two redundant service processors
are installed in CEC enclosures 1 and 2. Redundant service processor function requires that
each HMC must be attached to one Ethernet port in CEC enclosure 1 and one Ethernet port
in CEC enclosure 2.
(Figure 2-37 legend: LAN1 is the hardware management network for the first FSP ports
(private); LAN2 is the hardware management network for the second FSP ports (private), on
separate network hardware from LAN1; LAN3 is an open network for HMC access and
DLPAR operations.)
Figure 2-38 shows a redundant HMC and redundant service processor connectivity
configuration.
Figure 2-38 Redundant HMC connection and redundant service processor configuration
In a configuration with multiple systems or HMCs, the customer is required to provide switches
or hubs to connect each HMC to the server FSP Ethernet ports in each system:
One HMC should connect to the port labeled HMC Port 1 on the first two CEC drawers
of each system.
A second HMC must be attached to HMC Port 2 on the first two CEC drawers of
each system.
This solution provides redundancy for both the HMCs and the service processors.
(Figure 2-38 legend: as in Figure 2-37, LAN1 and LAN2 are private hardware management
networks for the first and second FSP ports, on separate network hardware, and LAN3 is an
open network for HMC access and DLPAR operations; here the two FSPs reside in CEC
enclosures 1 and 2.)
Figure 2-39 describes the four possible Ethernet connectivity options between HMCs
and FSPs.
Figure 2-39 Summary of HMC-to-FSP configuration options depending on the number of CEC drawers
For details about redundant HMCs, see Hardware Management Console V7 Handbook,
SG24-7491.
HMC code level
The HMC code must be at V7R7.4.0 to support the Power 770 and Power 780 systems.
In a dual HMC configuration, both must be at the same version and release of the HMC.
(Figure 2-39 shows these options: Configuration #1 – single drawer and one HMC, where the
hub is optional and the customer can have a direct connection to the FSP card;
Configuration #2 – single drawer and two HMCs, hubs optional; Configuration #3 –
multi-drawer with one HMC; Configuration #4 – multi-drawer with two HMCs.)
If you want to migrate an LPAR from a POWER6 processor-based server onto a POWER7
processor-based server using PowerVM Live Partition Mobility, consider this: If the source
server is managed by one HMC and the destination server is managed by a different HMC,
ensure that the HMC managing the POWER6 processor-based server is at V7R7.3.5 or later
and the HMC managing the POWER7 processor-based server is at V7R7.4.0 or later.
2.13 IBM Systems Director Management Console
The newly released IBM Systems Director Management Console (SDMC) is intended to be
used in the same manner as the HMC. It provides the same functionality, including hardware,
service, and virtualization management, for Power Systems server and Power Systems
blades. Because SDMC uses IBM Systems Director Express Edition, it also provides all
Systems Director Express capabilities, such as monitoring of operating systems and creating
event action plans.
No configuration changes are required when a client moves from HMC management to
SDMC management.
Much of the SDMC function is equivalent to the HMC. This includes:
Server (host) management.
Virtualization management.
Redundancy and high availability: The SDMC offers console redundancy similar to the HMC.
The scalability and performance of the SDMC matches that of a current HMC. This includes
both the number of systems (hosts) and the number of partitions (virtual servers) that can be
managed. Currently, 48 small-tier entry servers or 32 large-tier servers can be managed by
the SDMC with up to 1,024 partitions (virtual servers) configured across those managed
systems (hosts).
The SDMC can be obtained as a hardware appliance in the same manner as an HMC.
Hardware appliances support managing all Power Systems servers. The SDMC can
optionally be obtained in a virtual appliance format, capable of running on VMware (ESX/i 4
or later) and KVM (Red Hat Enterprise Linux (RHEL) 5.5). The virtual appliance is only
supported for managing small-tier Power servers and Power Systems blades.
Tips: Note the following tips:
When upgrading the code of a dual HMC configuration, a good practice is to disconnect
one HMC to avoid having both HMCs connected to the same server but running different
levels of code. If no profiles or partition changes take place during the upgrade, both
HMCs can stay connected. If the HMCs are at different levels and a profile change is
made from the HMC at level V7R7.4.0, for example, the format of the data stored in the
server could be changed, causing the HMC at a previous level (for example, 3.50) to
possibly go into a recovery state because it does not understand the new data format.
Compatibility rules exist between the various software that is executing within a
POWER7 processor-based server environment:
– HMC
– VIOS
– System firmware
– Partition operating systems
To check which combinations are supported and to identify required upgrades, you can
use the Fix Level Recommendation Tool web page:
http://www14.software.ibm.com/webapp/set2/flrt/home
Table 2-38 details whether the SDMC software appliance, hardware appliance, or both are
supported for each model.
Table 2-38 Type of SDMC appliance support for POWER7 processor-based servers
POWER7 models Type of SDMC appliance supported
7891-73X (IBM BladeCenter® PS703) Hardware or software appliance
7891-74X (IBM BladeCenter PS704) Hardware or software appliance
8202-E4B (IBM Power 720 Express) Hardware or software appliance
8205-E6B (IBM Power 740 Express) Hardware or software appliance
8406-70Y (IBM BladeCenter PS700) Hardware or software appliance
8406-71Y (IBM BladeCenter PS701 and PS702) Hardware or software appliance
8231-E2B (IBM Power 710 and IBM Power 730 Express) Hardware or software appliance
8233-E8B (IBM Power 750 Express) Hardware or software appliance
8236-E8C (IBM Power 755) Hardware or software appliance
9117-MMB (IBM Power 770) Hardware appliance only
9179-MHB (IBM Power 780) Hardware appliance only
9119-FHB (IBM Power 795) Hardware appliance only
The IBM SDMC Hardware Appliance requires an IBM 7042-CR6 Rack-mounted Hardware
Management Console and the IBM SDMC indicator (#0963).
Note: At the time of writing, the SDMC is not supported for the Power 770 (9117-MMC)
and Power 780 (9179-MHC) models.
IBM intends to enhance the IBM Systems Director Management Console (SDMC) to
support the Power 770 (9117-MMC) and Power 780 (9179-MHC). IBM also intends for the
current HMC 7042-CR6 to be upgradable to an IBM SDMC that supports the Power 770
(9117-MMC) and Power 780 (9179-MHC).
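For illustration only (the names are hypothetical; the data comes from Table 2-38), the appliance-support matrix can be expressed as a simple lookup:

```python
# Hypothetical helper encoding Table 2-38: which SDMC appliance formats
# each POWER7 machine type-model supports.
HW_AND_SW = {"7891-73X", "7891-74X", "8202-E4B", "8205-E6B", "8406-70Y",
             "8406-71Y", "8231-E2B", "8233-E8B", "8236-E8C"}
HW_ONLY = {"9117-MMB", "9179-MHB", "9119-FHB"}

def sdmc_appliances(type_model: str) -> set[str]:
    """Return the supported SDMC appliance formats for a type-model."""
    if type_model in HW_AND_SW:
        return {"hardware", "software"}
    if type_model in HW_ONLY:
        return {"hardware"}
    raise KeyError(f"{type_model} not listed in Table 2-38")

print(sdmc_appliances("9117-MMB"))  # the Power 770: hardware appliance only
```

As the table shows, the large-tier systems (Power 770, 780, and 795) are limited to the hardware appliance, while the blades and small-tier servers can also use the virtual appliance.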
Note: When ordering #0963, the features #0031 (No Modem), #1946 (additional 4 GB
memory), and #1998 (additional 500 GB SATA HDD) are configured automatically.
Feature #0963 replaces the HMC software with IBM Systems Director Management
Console Hardware Appliance V6.7.3 (5765-MCH).
Neither an external modem (#0032) nor an internal modem (#0033) can be selected with
the IBM SDMC indicator (#0963).
To run HMC LMC (#0962), you cannot order the additional storage (#1998). However, you
can order the additional memory (#1946).
The IBM SDMC Virtual Appliance requires an IBM Systems Director Management Console
V6.7.3 (5765-MCV).
SDMC on POWER6 processor-based servers and blades requires eFirmware level 3.5.7.
SDMC on Power Systems POWER7 processor-based servers and blades requires
eFirmware level 7.3.0.
For more detailed information about the SDMC, see IBM Systems Director Management
Console: Introduction and Overview, SG24-7860.
2.14 Operating system support
The IBM POWER7 processor-based systems support three families of operating systems:
AIX
IBM i
Linux
In addition, the Virtual I/O Server can be installed in special partitions that provide support to
the other operating systems for using features such as virtualized I/O devices, PowerVM Live
Partition Mobility, or PowerVM Active Memory Sharing.
2.14.1 Virtual I/O Server
The minimum required level of Virtual I/O server for both the Power 770 and Power 780 is
VIOS 2.2.1.0.
IBM regularly updates the Virtual I/O Server code. To find information about the latest
updates, visit the Fix Central website:
http://www-933.ibm.com/support/fixcentral/
2.14.2 IBM AIX operating system
The following sections discuss the various levels of AIX operating system support.
Note: If you want to use the software appliance, you have to provide the hardware and
virtualization environment.
At a minimum, the following resources must be available to the virtual machine:
2.53 GHz Intel Xeon E5630, Quad Core processor
500 GB storage
8 GB memory
The following hypervisors are supported:
VMware (ESXi 4.0 or later)
KVM (RHEL 5.5)
Note: For details about the software available on IBM Power Systems, visit the Power
Systems Software™ website:
http://www.ibm.com/systems/power/software/index.html
IBM periodically releases maintenance packages (service packs or technology levels) for the
AIX operating system. Information about these packages, downloading, and obtaining the
CD-ROM is on the Fix Central website:
http://www-933.ibm.com/support/fixcentral/
The Fix Central website also provides information about how to obtain the fixes shipping
on CD-ROM.
The Service Update Management Assistant, which can help you automate the task of
checking for and downloading operating system updates, is part of the base operating
system. For more information about the suma command, go to the following website:
http://www14.software.ibm.com/webapp/set2/sas/f/genunix/suma.html
IBM AIX Version 5.3
The minimum level of AIX Version 5.3 to support the Power 770 and Power 780 is AIX 5.3
with the 5300-12 Technology Level and Service Pack 5 or later.
A partition using AIX Version 5.3 runs in POWER6 or POWER6+ compatibility
mode. This means that although the POWER7 processor can run four hardware
threads per core simultaneously, AIX 5.3 limits the number of hardware threads per
core to two.
IBM AIX Version 6.1
The minimum level of AIX Version 6.1 to support the Power 770 and Power 780 is:
AIX 6.1 with the 6100-07 Technology Level or later
AIX 6.1 with the 6100-06 Technology Level and Service Pack 6 or later
AIX 6.1 with the 6100-05 Technology Level and Service Pack 7 or later
A partition using AIX 6.1 with TL6 can run in POWER6, POWER6+, or POWER7 mode. It is
best to run the partition in POWER7 mode to allow exploitation of new hardware capabilities
such as SMT4 and Active Memory Expansion (AME).
IBM AIX Version 7.1
The minimum level of AIX Version 7.1 to support the Power 770 and Power 780 is:
AIX 7.1 with the 7100-01 Technology Level or later
AIX 7.1 with the 7100-00 Technology Level and Service Pack 4 or later
A partition using AIX 7.1 can run in POWER6, POWER6+, or POWER7 mode. It is best to run
the partition in POWER7 mode to allow exploitation of new hardware capabilities such as
SMT4 and AME.
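The thread-count difference between these compatibility modes can be illustrated with a short sketch. The mapping and function names below are our own, derived from the text above; they are not an IBM interface.

```python
# A minimal sketch of the SMT limits described above: AIX 5.3 partitions run
# in POWER6/POWER6+ compatibility mode and are limited to two hardware
# threads per core, while POWER7 mode enables SMT4. Illustrative only.

MAX_SMT_THREADS = {
    "POWER6": 2,   # compatibility mode used by AIX 5.3 partitions
    "POWER6+": 2,
    "POWER7": 4,   # native mode: four hardware threads per core (SMT4)
}

def logical_cpus(cores: int, mode: str) -> int:
    """Logical processors visible to a partition: cores x SMT threads."""
    return cores * MAX_SMT_THREADS[mode]

print(logical_cpus(4, "POWER6"))  # a 4-core AIX 5.3 partition sees 8
print(logical_cpus(4, "POWER7"))  # the same cores in POWER7 mode see 16
```

Running the same partition in POWER7 mode therefore doubles the number of dispatchable hardware threads, which is why POWER7 mode is recommended where the operating system supports it.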
2.14.3 IBM i operating system
The IBM i operating system is supported on the Power 770 and Power 780 with these
minimum required levels:
IBM i Version 6.1 with i 6.1.1 machine code or later
IBM i Version 7.1 or later
IBM periodically releases maintenance packages (service packs or technology levels) for the
IBM i operating system. Information about these packages, downloading, and obtaining the
CD-ROM is on the Fix Central website:
http://www-933.ibm.com/support/fixcentral/
2.14.4 Linux operating system
Linux is an open source operating system that runs on numerous platforms from embedded
systems to mainframe computers. It provides a UNIX-like implementation across many
computer architectures.
The supported versions of Linux on POWER7 processor-based servers are:
SUSE Linux Enterprise Server 11 Service Pack 1, or later, with current maintenance
updates available from SUSE to enable all planned functionality
Red Hat Enterprise Linux AP 5 Update 7 for POWER, or later
Red Hat Enterprise Linux 6.1 for POWER, or later
If you want to configure Linux partitions in virtualized Power Systems, you have to be aware
of these conditions:
Not all devices and features that are supported by the AIX operating system are supported
in logical partitions running the Linux operating system.
Linux operating system licenses are ordered separately from the hardware. You can
acquire Linux operating system licenses from IBM to be included with the POWER7
processor-based servers, or from other Linux distributors.
For information about the features and external devices supported by Linux, go to:
http://www.ibm.com/systems/p/os/linux/index.html
For information about SUSE Linux Enterprise Server 10, go to:
http://www.novell.com/products/server
For information about Red Hat Enterprise Linux Advanced Server, go to:
http://www.redhat.com/rhel/features
2.14.5 Java supported versions
There are unique considerations when running Java 1.4.2 on POWER7 servers. For best
exploitation of the outstanding performance capabilities and most recent improvements of
POWER7 technology, IBM recommends upgrading Java-based applications to Java 7,
Java 6, or Java 5 whenever possible. For more information, visit:
http://www.ibm.com/developerworks/java/jdk/aix/service.html
2.14.6 Boosting performance and productivity with IBM compilers
IBM XL C, XL C/C++, and XL Fortran compilers for AIX and for Linux exploit the latest
POWER7 processor architecture. Release after release, these compilers continue to help
improve application performance and capability, exploiting architectural enhancements made
available through the advancement of the POWER technology.
IBM compilers are designed to optimize and tune your applications for execution on IBM
POWER platforms, to help you unleash the full power of your IT investment, to create and
maintain critical business and scientific applications, to maximize application performance,
and to improve developer productivity.
The performance gain from years of compiler optimization experience is seen in the
continuous release-to-release compiler improvements that support the POWER4 processors,
through to the POWER4+, POWER5, POWER5+, and POWER6 processors, and now
including the new POWER7 processors. With the support of the latest POWER7 processor
chip, IBM advances a more than 20-year investment in the XL compilers for POWER series
and PowerPC® series architectures.
XL C, XL C/C++, and XL Fortran features introduced to exploit the latest POWER7 processor
include:
Vector unit and vector scalar extension (VSX) instruction set support to efficiently
manipulate vector operations in your application
Vector functions within the Mathematical Acceleration Subsystem (MASS) libraries for
improved application performance
Built-in functions or intrinsics and directives for direct control of POWER instructions at the
application level
Architecture and tune compiler options to optimize and tune your applications
COBOL for AIX enables you to selectively target code generation of your programs to
either exploit POWER7 systems architecture or to be balanced among all supported
POWER systems. The performance of COBOL for AIX applications is improved by means
of an enhanced back-end optimizer. The back-end optimizer, a component common also
to the IBM XL compilers, lets your applications leverage the latest industry-leading
optimization technology.
The performance of PL/I for AIX applications has been improved through both front-end
changes and back-end optimizer enhancements. The back-end optimizer, a component
common also to the IBM XL compilers, lets your applications leverage the latest
industry-leading optimization technology. For PL/I, it produces code that is intended to
perform well across all supported hardware levels of AIX, including POWER7.
IBM Rational® Development Studio for IBM i 7.1 provides programming languages for
creating modern business applications. This includes the ILE RPG, ILE COBOL, C, and C++
compilers as well as the heritage RPG and COBOL compilers. The latest release includes
performance improvements and XML processing enhancements for ILE RPG and ILE
COBOL, improved COBOL portability with a new COMP-5 data type, and easier Unicode
migration with relaxed UCS-2 rules in ILE RPG. Rational has also released a product called
Rational Open Access: RPG Edition. This product opens the ILE RPG file I/O processing,
enabling partners, tool providers, and users to write custom I/O handlers that can access
other devices like databases, services, and web user interfaces.
IBM Rational Developer for Power Systems Software provides a rich set of integrated
development tools that support the XL C/C++ for AIX compiler, the XL C for AIX compiler, and
the COBOL for AIX compiler. Rational Developer for Power Systems Software offers
capabilities of file management, searching, editing, analysis, build, and debug, all integrated
into an Eclipse workbench. XL C/C++, XL C, and COBOL for AIX developers can boost
productivity by moving from older, text-based, command-line development tools to a rich set
of integrated development tools.
The IBM Rational Power Appliance solution provides a workload-optimized system and
integrated development environment for AIX development on IBM Power Systems. IBM
Rational Power Appliance includes a Power Express server preinstalled with a
comprehensive set of Rational development software along with the AIX operating system.
The Rational development software includes support for Collaborative Application Lifecycle
Management (C/ALM) through Rational Team Concert™, a set of software development tools
from Rational Developer for Power Systems Software, and a choice between the XL C/C++
for AIX or COBOL for AIX compilers.
2.15 Energy management
The Power 770 and 780 servers are designed with features to help clients become more
energy efficient. The IBM Systems Director Active Energy Manager exploits EnergyScale
technology, enabling advanced energy management features to dramatically and dynamically
conserve power and further improve energy efficiency. Intelligent Energy optimization
capabilities enable the POWER7 processor to operate at a higher frequency for increased
performance and performance per watt or dramatically reduce frequency to save energy.
2.15.1 IBM EnergyScale technology
IBM EnergyScale technology provides functions to help the user understand and dynamically
optimize the processor performance versus processor energy consumption, and system
workload, to control IBM Power Systems power and cooling usage.
On POWER7 processor-based systems, the thermal power management device (TPMD)
card is responsible for collecting the data from all system components, changing operational
parameters in components, and interacting with the IBM Systems Director Active Energy
Manager (an IBM Systems Director plug-in) for energy management and control.
IBM EnergyScale makes use of power and thermal information collected from the system in
order to implement policies that can lead to better performance or better energy utilization.
IBM EnergyScale features include:
Power trending
EnergyScale provides continuous collection of real-time server energy consumption. This
enables administrators to predict power consumption across their infrastructure and to
react to business and processing needs. For example, administrators can use such
information to predict datacenter energy consumption at various times of the day, week,
or month.
Thermal reporting
IBM Director Active Energy Manager can display measured ambient temperature and
calculated exhaust heat index temperature. This information can help identify data center
hot spots that need attention.
Power saver mode
Power saver mode lowers the processor frequency and voltage by a fixed amount,
reducing the energy consumption of the system while still delivering predictable
performance. This percentage is predetermined to be within a safe operating limit and
is not user configurable. The server is designed for a fixed frequency drop of up to
30% down from nominal frequency (the actual value depends on the server type and
configuration). Power saver mode is not supported during boot or re-boot, although it is
a persistent condition that will be sustained after the boot when the system starts
executing instructions.
Dynamic power saver mode
Dynamic power saver mode varies processor frequency and voltage based on the
utilization of the POWER7 processors. Processor frequency and utilization are inversely
proportional for most workloads, implying that as the frequency of a processor increases,
its utilization decreases, given a constant workload. Dynamic power saver mode takes
advantage of this relationship to detect opportunities to save power, based on measured
real-time system utilization.
When a system is idle, the system firmware will lower the frequency and voltage to power
energy saver mode values. When fully utilized, the maximum frequency will vary
depending on whether the user favors power savings or system performance. If an
administrator prefers energy savings and a system is fully utilized, the system is designed
to reduce the maximum frequency to 95% of nominal values. If performance is favored
over energy consumption, the maximum frequency can be increased to up to 109% of
nominal frequency for extra performance.
Dynamic power saver mode is mutually exclusive with power saver mode. Only one of
these modes can be enabled at a given time.
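The frequency limits described above can be expressed as simple arithmetic. This is a hedged sketch: the function names and the example nominal frequency are assumptions, not a management API, and the exact percentages vary by server type and configuration.

```python
# Sketch of the documented bounds: static power saver drops frequency by a
# fixed amount of up to 30% below nominal; dynamic power saver at full
# utilization runs at 95% of nominal when favoring savings, or up to 109%
# of nominal when favoring performance. Illustrative only.

def static_power_saver_freq(nominal_mhz: float, drop_pct: float = 30.0) -> float:
    """Frequency under static power saver mode (fixed percentage drop)."""
    return nominal_mhz * (1.0 - drop_pct / 100.0)

def dynamic_power_saver_max_freq(nominal_mhz: float, favor: str) -> float:
    """Maximum frequency under dynamic power saver at full utilization."""
    factor = 0.95 if favor == "savings" else 1.09
    return nominal_mhz * factor
```

For a hypothetical 1000 MHz nominal frequency, static power saver would run at about 700 MHz, while dynamic power saver at full utilization would run at 950 MHz (favoring savings) or up to 1090 MHz (favoring performance).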
Power capping
Power capping enforces a user-specified limit on power usage. Power capping is not a
power-saving mechanism. It enforces power caps by throttling the processors in the
system, degrading performance significantly. The idea of a power cap is to set a limit that
must never be reached, which frees up the extra power that would otherwise go unused in
the data center. This margined power is the extra power allocated to a server during its
installation in a data center. It is based on server environmental specifications that are
usually never reached, because server specifications are always based on maximum
configurations and worst-case scenarios. The user must set and enable an energy cap
from the IBM Director Active Energy Manager user interface.
Soft power capping
There are two power ranges into which the power cap can be set, power capping, as
described previously, and soft power capping. Soft power capping extends the allowed
energy capping range further, beyond a region that can be guaranteed in all configurations
and conditions. If the energy management goal is to meet a particular consumption limit,
then soft power capping is the mechanism to use.
Processor core nap mode
The IBM POWER7 processor uses a low-power mode called nap that stops processor
execution when there is no work to do on that processor core. The latency of exiting nap
mode is very small, typically with no impact on running applications. Because of that, the
POWER Hypervisor™ can use nap mode as a general-purpose idle state.
When the operating system detects that a processor thread is idle, it yields control of a
hardware thread to the POWER Hypervisor. The POWER Hypervisor immediately puts
the thread into nap mode. Nap mode allows the hardware to turn the clock off on most of
the circuits inside the processor core. Reducing active energy consumption by turning off
the clocks allows the temperature to fall, which further reduces leakage (static) power of
the circuits, causing a cumulative effect. Nap mode saves 10 - 15% of power
consumption in the processor core.
Processor core sleep mode
To be able to save even more energy, the POWER7 processor has an even lower power
mode called sleep. Before a core and its associated L2 and L3 caches enter sleep mode,
caches are flushed and transition lookaside buffers (TLB) are invalidated, and the
hardware clock is turned off in the core and in the caches. Voltage is reduced to minimize
leakage current. Processor cores inactive in the system (such as CoD processor cores)
are kept in Sleep mode. Sleep mode saves about 35% power consumption in the
processor core and associated L2 and L3 caches.
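The quoted savings can be turned into a back-of-envelope estimate. The baseline wattage and the 12.5% midpoint chosen for nap are illustrative assumptions, not measured values.

```python
# Sketch of the idle-state savings quoted above: nap saves roughly 10 - 15%
# of processor-core power (the 12.5% midpoint below is our own assumption),
# and sleep saves about 35% of the power of the core plus its L2/L3 caches.

SAVINGS_FRACTION = {
    "run": 0.0,     # core executing instructions, no savings
    "nap": 0.125,   # midpoint of the documented 10 - 15% range
    "sleep": 0.35,  # clocks off, voltage reduced, caches flushed
}

def estimated_power(baseline_watts: float, state: str) -> float:
    """Estimated power draw of a core (and caches, for sleep) in a given state."""
    return baseline_watts * (1.0 - SAVINGS_FRACTION[state])
```

For a hypothetical 100 W baseline, a napping core would draw about 87.5 W and a sleeping core about 65 W, which is why inactive cores such as CoD cores are kept in sleep mode.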
Fan control and altitude input
System firmware will dynamically adjust fan speed based on energy consumption,
altitude, ambient temperature, and energy savings modes. Power Systems are designed
to operate in worst-case environments, in hot ambient temperatures, at high altitudes, and
with high power components. In a typical case, one or more of these constraints are not
valid. When no power savings setting is enabled, fan speed is based on ambient
temperature and assumes a high-altitude environment. When a power savings setting is
enforced (either Power Energy Saver Mode or Dynamic Power Saver Mode), fan speed
varies based on power consumption, ambient temperature, and altitude.
System altitude can be set in IBM Director Active Energy Manager. If no altitude is set, the
system will assume a default value of 350 meters above sea level.
Processor folding
Processor folding is a consolidation technique that dynamically adjusts, over the short
term, the number of processors available for dispatch to match the number of processors
demanded by the workload. As the workload increases, the number of processors made
available increases. As the workload decreases, the number of processors made
available decreases. Processor folding increases energy savings during periods of low to
moderate workload because unavailable processors remain in low-power idle states (nap
or sleep) longer.
EnergyScale for I/O
IBM POWER7 processor-based systems automatically power off hot pluggable PCI
adapter slots that are empty or not being used. System firmware automatically scans all
pluggable PCI slots at regular intervals, looking for those that meet the criteria for being
not in use and powering them off. This support is available for all POWER7
processor-based servers and the expansion units that they support.
Server power down
If overall data center processor utilization is low, workloads can be consolidated onto
fewer servers so that some servers can be turned off completely. It makes sense to
do this when there will be long periods of low utilization, such as weekends. AEM provides
information, such as the power that will be saved and the time it will take to bring a server
back online, that can be used to help make the decision to consolidate and power off. As
with many of the features available in IBM Systems Director and Active Energy Manager,
this function is scriptable and can be automated.
Partition power management
Available with Active Energy Manager 4.3.1 or later, and POWER7 systems with the 730
firmware release or later, is the capability to set a power savings mode for partitions or the
system processor pool. As in the system-level power savings modes, the per-partition
power savings modes can be used to achieve a balance between the power consumption
and the performance of a partition. Only partitions that have dedicated processing units
can have a unique power savings setting. Partitions that run in shared processing mode
will have a common power savings setting, which is that of the system processor pool.
This is because processing unit fractions cannot be power-managed.
As in the case of system-level power savings, two Dynamic Power Saver options are
offered:
– Favor partition performance
– Favor partition power savings
The user must configure this setting from Active Energy Manager. When dynamic power
saver is enabled in either mode, system firmware continuously monitors the performance
and utilization of each of the computer's POWER7 processor cores that belong to the
partition. Based on this utilization and performance data, the firmware will dynamically
adjust the processor frequency and voltage, reacting within milliseconds to adjust
workload performance and also deliver power savings when the partition is under-utilized.
In addition to the two dynamic power saver options, the customer can select to have no
power savings on a given partition. This option will leave the processor cores assigned to
the partition running at their nominal frequencies and voltages.
A new power savings mode, called inherit host setting, is available and is only applicable
to partitions. When configured to use this setting, a partition will adopt the power savings
mode of its hosting server. By default, all partitions with dedicated processing units, and
the system processor pool, are set to the inherit host setting.
On POWER7 processor-based systems, several EnergyScale features are embedded in
the hardware and do not require an operating system or external management
component. More advanced functionality requires Active Energy Manager (AEM) and
IBM Systems Director.
Table 2-39 provides a list of all features supported, showing all cases in which AEM is not
required. Table 2-39 also details the features that can be activated by traditional user
interfaces (for example, ASMI and HMC).
Table 2-39 AEM support
The Power 770 and Power 780 systems implement all the EnergyScale capabilities listed
in 2.15.1, “IBM EnergyScale technology” on page 114.
2.15.2 Thermal power management device card
The thermal power management device (TPMD) card is a separate micro controller installed
on some POWER6 processor-based systems, and on all POWER7 processor-based
systems. It runs real-time firmware whose sole purpose is to manage system energy.
The TPMD card monitors the processor modules, memory, environmental temperature, and
fan speed. Based on this information, it can act upon the system to maintain optimal power
and energy conditions (for example, increase the fan speed to react to a temperature
change). It also interacts with the IBM Systems Director Active Energy Manager to report
power and thermal information and to receive input from AEM on policies to be set. The
TPMD is part of the EnergyScale infrastructure.
Feature                     Active Energy Manager (AEM) required  ASMI  HMC
Power Trending              Y                                     N     N
Thermal Reporting           Y                                     N     N
Static Power Saver          N                                     Y     Y
Dynamic Power Saver         Y                                     N     N
Power Capping               Y                                     N     N
Energy-optimized Fans       N                                     -     -
Processor Core Nap          N                                     -     -
Processor Core Sleep        N                                     -     -
Processor Folding           N                                     -     -
EnergyScale for I/O         N                                     -     -
Server Power Down           Y                                     -     -
Partition Power Management  Y                                     -     -
© Copyright IBM Corp. 2011. All rights reserved.
Chapter 3. Virtualization
As you look for ways to maximize the return on your IT infrastructure investments,
consolidating workloads becomes an attractive proposition.
IBM Power Systems combined with PowerVM technology are designed to help you
consolidate and simplify your IT environment with the following key capabilities:
Improve server utilization and share I/O resources to reduce total cost of ownership and
make better use of IT assets.
Improve business responsiveness and operational speed by dynamically re-allocating
resources to applications as needed, to better match changing business needs or handle
unexpected changes in demand.
Simplify IT infrastructure management by making workloads independent of hardware
resources, thereby enabling you to make business-driven policies to deliver resources
based on time, cost, and service-level requirements.
This chapter discusses the virtualization technologies and features on IBM Power Systems:
POWER Hypervisor
POWER Modes
Partitioning
Active Memory Expansion
PowerVM
System Planning Tool
3.1 POWER Hypervisor
Combined with features designed into the POWER7 processors, the POWER Hypervisor
delivers functions that enable other system technologies, including logical partitioning
technology, virtualized processors, IEEE VLAN compatible virtual switch, virtual SCSI
adapters, virtual Fibre Channel adapters, and virtual consoles. The POWER Hypervisor is a
basic component of the system’s firmware and offers the following functions:
Provides an abstraction between the physical hardware resources and the logical
partitions that use them
Enforces partition integrity by providing a security layer between logical partitions
Controls the dispatch of virtual processors to physical processors (See “Processing mode”
on page 131.)
Saves and restores all processor state information during a logical processor
context switch
Controls hardware I/O interrupt management facilities for logical partitions
Provides virtual LAN channels between logical partitions that help to reduce the need for
physical Ethernet adapters for inter-partition communication
Monitors the Service Processor and performs a reset or reload if it detects the loss of the
Service Processor, notifying the operating system if the problem is not corrected
The POWER Hypervisor is always active, regardless of the system configuration, even when
the system is not connected to the managed console. It requires memory to support the resource
assignment to the logical partitions on the server. The amount of memory required by the
POWER Hypervisor firmware varies according to several factors. Factors influencing the
POWER Hypervisor memory requirements include these:
Number of logical partitions
Number of physical and virtual I/O devices used by the logical partitions
Maximum memory values specified in the logical partition profiles
The minimum amount of physical memory required to create a partition will be the size of the
system’s Logical Memory Block (LMB). The default LMB size varies according to the amount
of memory configured in the CEC (Table 3-1).
Table 3-1 Configured CEC memory-to-default Logical Memory Block size
Configurable CEC memory          Default Logical Memory Block
Greater than 8 GB, up to 16 GB   64 MB
Greater than 16 GB, up to 32 GB  128 MB
Greater than 32 GB               256 MB
In most cases, however, the actual minimum requirements and recommendations of the
supported operating systems are above 256 MB. Physical memory is assigned to partitions in
increments of LMB.
The POWER Hypervisor provides the following types of virtual I/O adapters:
Virtual SCSI
Virtual Ethernet
Virtual Fibre Channel
Virtual (TTY) console
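The Table 3-1 mapping can be sketched as a small lookup. The function name is our own; the thresholds come from the table, which only covers configurations greater than 8 GB.

```python
# Lookup sketch of Table 3-1: configured CEC memory determines the default
# Logical Memory Block (LMB) size, and the minimum memory needed to create
# a partition is one LMB. Illustrative, not an HMC interface.

def default_lmb_mb(cec_memory_gb: float) -> int:
    if cec_memory_gb > 32:
        return 256
    if cec_memory_gb > 16:
        return 128
    if cec_memory_gb > 8:
        return 64
    raise ValueError("Table 3-1 covers configurations greater than 8 GB")

# Partition memory is assigned in increments of the LMB, so a system with
# 24 GB configured assigns memory in 128 MB increments.
print(default_lmb_mb(24))  # 128
```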
Virtual SCSI
The POWER Hypervisor provides a virtual SCSI mechanism for virtualization of storage
devices. The storage virtualization is accomplished using two, paired adapters:
A virtual SCSI server adapter
A virtual SCSI client adapter
A Virtual I/O Server partition or an IBM i partition can define virtual SCSI server adapters.
Other partitions are client partitions. The Virtual I/O Server partition is a special logical
partition, as described in 3.4.4, “Virtual I/O Server” on page 137. The Virtual I/O Server
software is included in all PowerVM Editions. With the PowerVM Standard Edition and
PowerVM Enterprise Edition, dual Virtual I/O Servers can be deployed to provide
maximum availability for client partitions during Virtual I/O Server maintenance.
Virtual Ethernet
The POWER Hypervisor provides a virtual Ethernet switch function that allows partitions on
the same server to use a fast and secure communication without any need for physical
interconnection. The virtual Ethernet allows a transmission speed in the range of 1 - 3 Gbps,
depending on the maximum transmission unit (MTU) size and CPU entitlement. Virtual
Ethernet support began with IBM AIX Version 5.3, or an appropriate level of Linux supporting
virtual Ethernet devices (see 3.4.9, “Operating system support for PowerVM” on page 148).
The virtual Ethernet is part of the base system configuration.
Virtual Ethernet has the following major features:
The virtual Ethernet adapters can be used for both IPv4 and IPv6 communication and can
transmit packets with a size up to 65,408 bytes. Therefore, the maximum MTU for the
corresponding interface can be up to 65,394 (65,390 if VLAN tagging is used).
The POWER Hypervisor presents itself to partitions as a virtual 802.1Q-compliant switch.
The maximum number of VLANs is 4096. Virtual Ethernet adapters can be configured as
either untagged or tagged (following the IEEE 802.1Q VLAN standard).
A partition can support 256 virtual Ethernet adapters. Besides a default port VLAN ID,
the number of additional VLAN ID values that can be assigned per virtual Ethernet
adapter is 20, which implies that each virtual Ethernet adapter can be used to access 21
virtual networks.
Each partition operating system detects the virtual local area network (VLAN) switch
as an Ethernet adapter without the physical link properties and asynchronous data
transmit operations.
Any virtual Ethernet can also have connectivity outside of the server if a layer-2 bridge to a
physical Ethernet adapter is set in one Virtual I/O Server partition (see 3.4.4, “Virtual I/O
Server” on page 137, for more details about shared Ethernet), also known as Shared
Ethernet Adapter.
Note: Virtual Ethernet is based on the IEEE 802.1Q VLAN standard. No physical I/O
adapter is required when creating a VLAN connection between partitions, and no access to
an outside network is required.
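The MTU figures quoted in the feature list above follow from simple frame-header arithmetic. This is an illustrative sketch, assuming the standard 14-byte Ethernet header and 4-byte IEEE 802.1Q tag; it is not hypervisor code.

```python
# The hypervisor accepts frames up to 65,408 bytes. Subtracting the 14-byte
# Ethernet header yields an MTU of 65,394, and a 4-byte 802.1Q VLAN tag
# reduces it further to 65,390, matching the figures in the text.

MAX_FRAME_BYTES = 65408
ETHERNET_HEADER_BYTES = 14
VLAN_TAG_BYTES = 4

def max_mtu(vlan_tagged: bool) -> int:
    """Largest MTU for a virtual Ethernet interface, per the text above."""
    extra = VLAN_TAG_BYTES if vlan_tagged else 0
    return MAX_FRAME_BYTES - ETHERNET_HEADER_BYTES - extra

# Likewise, one default port VLAN ID plus 20 additional VLAN IDs per adapter
# gives each virtual Ethernet adapter access to 21 virtual networks.
ACCESSIBLE_NETWORKS = 1 + 20

print(max_mtu(False))  # 65394
print(max_mtu(True))   # 65390
```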
Virtual Fibre Channel
A virtual Fibre Channel adapter is a virtual adapter that provides client logical partitions with a
Fibre Channel connection to a storage area network through the Virtual I/O Server logical
partition. The Virtual I/O Server logical partition provides the connection between the virtual
Fibre Channel adapters on the Virtual I/O Server logical partition and the physical Fibre
Channel adapters on the managed system. Figure 3-1 depicts the connections between the
client partition virtual Fibre Channel adapters and the external storage. For additional
information, see 3.4.8, “N_Port ID virtualization” on page 147.
Figure 3-1 Connectivity between virtual Fibre Channels adapters and external SAN devices
Virtual (TTY) console
Each partition must have access to a system console. Tasks such as operating system
installation, network setup, and various problem analysis activities require a dedicated system
console. The POWER Hypervisor provides the virtual console by using a virtual TTY or serial
adapter and a set of Hypervisor calls to operate on them. Virtual TTY does not require the
purchase of any additional features or software, such as the PowerVM Edition features.
Depending on the system configuration, the operating system console can be provided by the
Hardware Management Console virtual TTY, IVM virtual TTY, or from a terminal emulator that
is connected to a system port.
3.2 POWER processor modes
Although, strictly speaking, not a virtualization feature, the POWER modes are described
here because they affect various virtualization features.
On Power System servers, partitions can be configured to run in several modes, including:
POWER6 compatibility mode
This execution mode is compatible with Version 2.05 of the Power Instruction Set
Architecture (ISA). For more information, visit the following address:
http://www.power.org/resources/reading/PowerISA_V2.05.pdf
POWER6+ compatibility mode
This mode is similar to POWER6, with eight additional Storage Protection Keys.
POWER7 mode
This is the native mode for POWER7 processors, implementing Version 2.06 of the Power
Instruction Set Architecture. For more information, visit the following address:
http://www.power.org/resources/downloads/PowerISA_V2.06_PUBLIC.pdf
The selection of the mode is made on a per-partition basis, from the managed console, by
editing the partition profile (Figure 3-2).
Figure 3-2 Configuring partition profile compatibility mode from the managed console
Table 3-2 lists the differences between these modes.
Table 3-2 Differences between POWER6 and POWER7 mode
3.3 Active Memory Expansion
Active Memory Expansion enablement is an optional feature of POWER7 processor-based
servers that must be specified when creating the configuration in the e-Config tool, as follows:
IBM Power 770 #4791
IBM Power 780 #4791
This feature enables memory expansion on the system. Using compression/decompression
of memory content can effectively expand the maximum memory capacity, providing
additional server workload capacity and performance.
Active Memory Expansion is an innovative POWER7 technology that allows the effective
maximum memory capacity to be much larger than the true physical memory maximum.
Compression/decompression of memory content can allow memory expansion up to 100%,
which in turn enables a partition to perform significantly more work or support more users
with the same physical amount of memory. Similarly, it can allow a server to run more
partitions and do more work for the same physical amount of memory.
Active Memory Expansion is available for partitions running AIX 6.1, Technology Level 4 with
SP2, or later.
Active Memory Expansion uses CPU resource of a partition to compress/decompress the
memory contents of this same partition. The trade-off of memory capacity for processor
cycles can be an excellent choice, but the degree of expansion varies based on how
compressible the memory content is, and it also depends on having adequate spare CPU
capacity available for this compression/decompression. Tests in IBM laboratories, using
sample workloads, showed excellent results for many workloads in terms of memory
expansion per additional CPU utilized. Other test workloads had more modest results.

POWER6 and POWER6+ mode: 2-thread SMT
POWER7 mode: 4-thread SMT
Customer value: Throughput performance, processor core utilization

POWER6 and POWER6+ mode: Vector Multimedia Extension/AltiVec (VMX)
POWER7 mode: Vector Scalar Extension (VSX)
Customer value: High-performance computing

POWER6 and POWER6+ mode: Affinity OFF by default
POWER7 mode: 3-tier memory, Micropartition Affinity
Customer value: Improved system performance for system images spanning sockets and nodes

POWER6 and POWER6+ mode: Barrier Synchronization; fixed 128-byte array; Kernel Extension Access
POWER7 mode: Enhanced Barrier Synchronization; variable sized array; User Shared Memory Access
Customer value: High-performance computing parallel programming synchronization facility

POWER6 and POWER6+ mode: 64-core and 128-thread scaling
POWER7 mode: 32-core and 128-thread scaling; 64-core and 256-thread scaling; 256-core and 1024-thread scaling
Customer value: Performance and scalability for large scale-up single system image workloads (such as OLTP, ERP scale-up, and WPAR consolidation)

POWER6 and POWER6+ mode: EnergyScale CPU Idle
POWER7 mode: EnergyScale CPU Idle and Folding with NAP and SLEEP
Customer value: Improved energy efficiency
Clients have much control over Active Memory Expansion usage. Each individual AIX
partition can turn on or turn off Active Memory Expansion. Control parameters set the amount
of expansion desired in each partition to help control the amount of CPU used by the Active
Memory Expansion function. An initial program load (IPL) is required for the specific partition
that is turning memory expansion on or off. After it is turned on, monitoring capabilities are
available in standard AIX performance tools, such as lparstat, vmstat, topas, and svmon.
Figure 3-3 represents the percentage of CPU that is used to compress memory for two
partitions with separate profiles. The green curve corresponds to a partition that has spare
processing power capacity. The blue curve corresponds to a partition constrained in
processing power.
Figure 3-3 CPU usage versus memory expansion effectiveness
Both cases show that there is a knee-of-curve relationship for CPU resource required for
memory expansion:
Busy processor cores do not have resources to spare for expansion.
The more memory expansion done, the more CPU resource required.
The knee varies depending on how compressible the memory contents are. This
example demonstrates the need for a case-by-case study of whether memory expansion can
provide a positive return on investment.
To help you perform this study, a planning tool is included with AIX 6.1 Technology Level 4,
allowing you to sample actual workloads and estimate how expandable the partition's
memory is and how much CPU resource is needed. Any model Power System can run the
planning tool. Figure 3-4 shows an example of the output returned by this planning tool. The
tool outputs various real memory and CPU resource combinations to achieve the desired
effective memory. It also recommends one particular combination. In this example, the tool
recommends that you allocate 58% of a processor to benefit from 45% extra memory
capacity.
Figure 3-4 Output from Active Memory Expansion planning tool
Active Memory Expansion Modeled Statistics:
-----------------------
Modeled Expanded Memory Size : 8.00 GB
Expansion True Memory Modeled Memory CPU Usage
Factor Modeled Size Gain Estimate
--------- -------------- ----------------- -----------
1.21 6.75 GB 1.25 GB [ 19%] 0.00
1.31 6.25 GB 1.75 GB [ 28%] 0.20
1.41 5.75 GB 2.25 GB [ 39%] 0.35
1.51 5.50 GB 2.50 GB [ 45%] 0.58
1.61 5.00 GB 3.00 GB [ 60%] 1.46
Active Memory Expansion Recommendation:
---------------------
The recommended AME configuration for this workload is to configure
the LPAR with a memory size of 5.50 GB and to configure a memory
expansion factor of 1.51. This will result in a memory expansion of
45% from the LPAR's current memory size. With this configuration,
the estimated CPU usage due to Active Memory Expansion is
approximately 0.58 physical processors, and the estimated overall
peak CPU resource required for the LPAR is 3.72 physical processors.
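The Modeled Memory Gain column in the sample output above follows directly from the modeled sizes: the gain is the difference between the expanded and true memory sizes, as a percentage of the true size. A minimal sketch (the helper name is ours; the 8.00 GB target and modeled sizes are taken from the sample above):

```python
# Sketch: reproduce the "Modeled Memory Gain" percentages from the
# planning-tool sample output above. Gain = (expanded - true) / true.
def memory_gain_pct(expanded_gb: float, true_gb: float) -> int:
    """Percentage of extra effective memory over the true memory size."""
    return round((expanded_gb - true_gb) / true_gb * 100)

expanded = 8.00  # modeled expanded memory size from the sample output
for true in (6.75, 6.25, 5.75, 5.50, 5.00):
    print(true, memory_gain_pct(expanded, true))
```

Note that in this sample the printed expansion factors are slightly larger than the plain expanded/true ratio, so the factor should be taken from the tool's output rather than recomputed by hand.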
After you select the value of the memory expansion factor that you want to achieve, you can
use this value to configure the partition from the managed console (Figure 3-5).
Figure 3-5 Using the planning tool result to configure the partition
On the HMC menu describing the partition, select the Active Memory Expansion check box
and enter the true and maximum memory and the memory expansion factor. To turn off
expansion, clear the check box. In both cases, a reboot of the partition is needed to activate
the change.
In addition, a one-time, 60-day trial of Active Memory Expansion is available to provide more
exact memory expansion and CPU measurements. The trial can be requested using the
Capacity on Demand web page:
http://www.ibm.com/systems/power/hardware/cod/
Active Memory Expansion can be ordered with the initial order of the server or as an MES
order. A software key is provided when the enablement feature is ordered that is applied to
the server. Rebooting is not required to enable the physical server. The key is specific to an
individual server and is permanent. It cannot be moved to a separate server. This feature is
ordered per server, independently of the number of partitions using memory expansion.
From the HMC, you can view whether the Active Memory Expansion feature has
been activated (Figure 3-6).
Figure 3-6 Server capabilities listed from the HMC
For detailed information regarding Active Memory Expansion, you can download the
document Active Memory Expansion: Overview and Usage Guide from this location:
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&appname=S
TGE_PO_PO_USEN&htmlfid=POW03037USEN
3.4 PowerVM
The PowerVM platform is the family of technologies, capabilities, and offerings that deliver
industry-leading virtualization on the IBM Power Systems. It is the new umbrella branding
term for Power Systems Virtualization (Logical Partitioning, Micro-Partitioning, POWER
Hypervisor, Virtual I/O Server, Live Partition Mobility, Workload Partitions, and more). As with
Advanced Power Virtualization in the past, PowerVM is a combination of hardware
enablement and value-added software. Section 3.4.1, “PowerVM editions” on page 129,
discusses the licensed features of each of the three separate editions of PowerVM.
Note: If you want to move an LPAR using Active Memory Expansion to a different system
using Live Partition Mobility, the target system must support AME (the target system must
have AME activated with the software key). If the target system does not have AME
activated, the mobility operation fails during the pre-mobility check phase, and an
appropriate error message displays to the user.
3.4.1 PowerVM editions
This section provides information about the virtualization capabilities of PowerVM. The
three editions of PowerVM are suited for various purposes, as follows:
PowerVM Express Edition
PowerVM Express Edition is designed for customers looking for an introduction to
more advanced virtualization features at a highly affordable price, generally in
single-server projects.
PowerVM Standard Edition
This edition provides advanced virtualization functions and is intended for production
deployments and server consolidation.
PowerVM Enterprise Edition
This edition is suitable for large server deployments such as multi-server deployments and
cloud infrastructure. It includes unique features like Active Memory Sharing and Live
Partition Mobility.
Table 3-3 lists the versions of PowerVM that are available on the Power 770 and Power 780.
Table 3-3 Availability of PowerVM per POWER7 processor technology-based server model
For more information about the features included on each version of PowerVM, see IBM
PowerVM Virtualization Introduction and Configuration, SG24-7940-04.
3.4.2 Logical partitions (LPARs)
LPARs and virtualization increase utilization of system resources and add a new level of
configuration possibilities. This section provides details and configuration specifications
about this topic.
Dynamic logical partitioning
Logical partitioning was introduced with the POWER4 processor-based product line and the
AIX Version 5.1 operating system. This technology offered the capability to divide a pSeries®
system into separate logical systems, allowing each LPAR to run an operating environment
on dedicated attached devices, such as processors, memory, and I/O components.
Later, dynamic logical partitioning increased the flexibility, allowing selected system
resources, such as processors, memory, and I/O components, to be added and deleted from
logical partitions while they are executing. AIX Version 5.2, with all the necessary
enhancements to enable dynamic LPAR, was introduced in 2002. The ability to reconfigure
dynamic LPARs encourages system administrators to dynamically redefine all available
system resources to reach the optimum capacity for each defined dynamic LPAR.
PowerVM editions   Express   Standard   Enterprise
IBM Power 770      N/A       #7942      #7995
IBM Power 780      N/A       #7942      #7995
Note: At the time of writing, the IBM Power 770 (9117-MMC) and Power 780 (9179-MHC)
can only be managed by the Hardware Management Console.
Micro-Partitioning
Micro-Partitioning technology allows you to allocate fractions of processors to a logical
partition. This technology was introduced with POWER5 processor-based systems. A logical
partition using fractions of processors is also known as a Shared Processor Partition or
micro-partition. Micro-partitions run over a set of processors called a Shared Processor Pool,
and virtual processors are used to let the operating system manage the fractions of
processing power assigned to the logical partition. From an operating system perspective, a
virtual processor cannot be distinguished from a physical processor, unless the operating
system has been enhanced to be made aware of the difference. Physical processors are
abstracted into virtual processors that are available to partitions. The meaning of the term
physical processor in this section is a processor core. For example, a 2-core server has two
physical processors.
When defining a shared processor partition, several options have to be defined:
The minimum, desired, and maximum processing units
Processing units are defined as processing power, or the fraction of time that the partition
is dispatched on physical processors. Processing units define the capacity entitlement of
the partition.
The Shared Processor Pool
Pick one from the list with the names of each configured Shared Processor Pool. This list
also displays the pool ID of each configured Shared Processor Pool in parentheses. If the
name of the desired Shared Processor Pool is not available here, you must first configure
the desired Shared Processor Pool using the Shared Processor Pool Management
window. Shared processor partitions use the default Shared Processor Pool called
DefaultPool by default. See 3.4.3, “Multiple Shared Processor Pools” on page 132, for
details about Multiple Shared Processor Pools.
Whether the partition will be able to access extra processing power to “fill up” its virtual
processors above its capacity entitlement (selecting either to cap or uncap your partition)
If there is spare processing power available in the Shared Processor Pool or
other partitions are not using their entitlement, an uncapped partition can use
additional processing units if its entitlement is not enough to satisfy its application
processing demand.
The weight (preference) in the case of an uncapped partition
The minimum, desired, and maximum number of virtual processors
The POWER Hypervisor calculates partition processing power based on the minimum,
desired, and maximum values, the processing mode, and the requirements of other active
partitions. The actual entitlement is never smaller than the desired processing units value,
but can exceed that value in the case of an uncapped partition, up to the number of virtual
processors allocated.
A partition can be defined with a processor capacity as small as 0.10 processing units. This
represents 0.10 of a physical processor. Each physical processor can be shared by up to 10
shared processor partitions, and the partition’s entitlement can be incremented fractionally by
as little as 0.01 of the processor. The shared processor partitions are dispatched and
time-sliced on the physical processors under control of the POWER Hypervisor. The shared
processor partitions are created and managed by the HMC.
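The granularity rules in this paragraph (a 0.10 minimum entitlement and increments of 0.01 of a processor) can be captured in a small validation sketch. The function name and error messages below are our own illustration, not an HMC interface:

```python
# Illustrative sketch only: check a shared-processor partition's capacity
# entitlement against the granularity rules described above.
def validate_entitlement(processing_units: float) -> None:
    # A partition can be defined with as little as 0.10 processing units.
    if processing_units < 0.10:
        raise ValueError("entitlement must be at least 0.10 processing units")
    # Entitlement can be incremented fractionally by as little as 0.01.
    hundredths = round(processing_units * 100)
    if abs(processing_units * 100 - hundredths) > 1e-9:
        raise ValueError("entitlement must be a multiple of 0.01 processing units")

validate_entitlement(0.10)  # smallest allowed micro-partition
validate_entitlement(1.75)  # fractional capacity spanning multiple processors
```

At the 0.10 minimum, each physical core can host 10 micro-partitions, which is where the per-core micro-partition maximums quoted for these servers come from.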
The IBM Power 770 supports up to 64 cores, and has the following maximums:
Up to 64 dedicated partitions
Up to 640 micro-partitions (10 micro-partitions per physical active core)
The Power 780 allows up to 96 cores in a single system, supporting the following maximums:
Up to 96 dedicated partitions
Up to 960 micro-partitions (10 micro-partitions per physical active core)
An important point is that the maximums stated are supported by the hardware, but the
practical limits depend on application workload demands.
Additional information about virtual processors includes:
A virtual processor can be running (dispatched) either on a physical processor or as
standby, waiting for a physical processor to become available.
Virtual processors do not introduce any additional abstraction level. They are only a
dispatch entity. When running on a physical processor, virtual processors run at the same
speed as the physical processor.
Each partition’s profile defines CPU entitlement that determines how much processing
power any given partition should receive. The total sum of CPU entitlement of all partitions
cannot exceed the number of available physical processors in a Shared Processor Pool.
The number of virtual processors can be changed dynamically through a dynamic
LPAR operation.
Processing mode
When you create a logical partition you can assign entire processors for dedicated use, or you
can assign partial processing units from a Shared Processor Pool. This setting defines the
processing mode of the logical partition. Figure 3-7 shows a diagram of the concepts
discussed in this section.
Figure 3-7 Logical partitioning concepts
Dedicated mode
In dedicated mode, physical processors are assigned as a whole to partitions. The
simultaneous multithreading feature in the POWER7 processor core allows the core to
execute instructions from two or four independent software threads simultaneously. To
support this feature we use the concept of logical processors. The operating system (AIX,
IBM i, or Linux) sees one physical processor as two or four logical processors if the
simultaneous multithreading feature is on. It can be turned off and on dynamically while the
operating system is executing (for AIX, use the smtctl command). If simultaneous
multithreading is off, each physical processor is presented as one logical processor, and thus
only one thread.
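The mapping from physical cores to logical processors under simultaneous multithreading can be expressed directly. A sketch (the helper is ours; the 1, 2, and 4 thread values come from the text):

```python
# Sketch: logical processors the OS sees for a dedicated-mode partition.
def logical_processors(cores: int, smt_threads: int) -> int:
    # POWER7 presents each core as 1 (SMT off), 2, or 4 logical processors.
    if smt_threads not in (1, 2, 4):
        raise ValueError("SMT mode must be 1, 2, or 4 threads per core")
    return cores * smt_threads

print(logical_processors(4, 4))  # 4 dedicated cores in SMT4 mode -> 16
print(logical_processors(4, 1))  # SMT off -> one logical processor per core
```

On AIX the mode itself is changed dynamically with the smtctl command mentioned above.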
Shared dedicated mode
On POWER7 processor technology-based servers, you can configure dedicated partitions to
become processor donors for idle processors that they own, allowing for the donation of
spare CPU cycles from dedicated processor partitions to a Shared Processor Pool. The
dedicated partition maintains absolute priority for dedicated CPU cycles. Enabling this feature
can help to increase system utilization without compromising the computing power for critical
workloads in a dedicated processor.
Shared mode
In shared mode, logical partitions use virtual processors to access fractions of physical
processors. Shared partitions can define any number of virtual processors (the maximum
number is 10 times the number of processing units assigned to the partition). From the
POWER Hypervisor point of view, virtual processors represent dispatching objects. The
POWER Hypervisor dispatches virtual processors to physical processors according to the
partition’s processing units entitlement. One processing unit represents one physical
processor’s processing capacity. At the end of the POWER Hypervisor’s dispatch cycle
(10 ms), all partitions receive total CPU time equal to their processing unit’s entitlement. The
logical processors are defined on top of virtual processors. So, even with a virtual processor,
the concept of a logical processor exists, and the number of logical processors depends on
whether simultaneous multithreading is turned on or off.
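The entitlement accounting over the 10 ms hypervisor dispatch cycle can be illustrated numerically. A sketch under the figures given above (the helper name is ours):

```python
# Sketch: guaranteed physical CPU time a micro-partition receives per
# POWER Hypervisor dispatch cycle. One processing unit equals one physical
# processor's capacity, and the dispatch window is 10 ms.
DISPATCH_CYCLE_MS = 10.0

def cpu_time_per_cycle_ms(processing_units: float) -> float:
    """Entitled physical CPU time per 10 ms dispatch window."""
    return processing_units * DISPATCH_CYCLE_MS

print(cpu_time_per_cycle_ms(0.5))  # 0.5 PrU -> 5.0 ms per cycle
print(cpu_time_per_cycle_ms(1.5))  # 1.5 PrU -> 15.0 ms, spread across its VPs
```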
3.4.3 Multiple Shared Processor Pools
Multiple Shared Processor Pools (MSPPs) is a capability supported on POWER7 processor
and POWER6 processor-based servers. This capability allows a system administrator to
create a set of micro-partitions with the purpose of controlling the processor capacity that can
be consumed from the physical Shared Processor Pool.
To implement MSPPs, there is a set of underlying techniques and technologies. Figure 3-8
shows an overview of the architecture of Multiple Shared Processor Pools.
Figure 3-8 Overview of the architecture of Multiple Shared Processor Pools
Micro-partitions are created and then identified as members of either the default Shared
Processor Pool (SPP0) or a user-defined Shared Processor Pool (SPPn). The virtual
processors that exist
within the set of micro-partitions are monitored by the POWER Hypervisor, and processor
capacity is managed according to user-defined attributes.
If the Power Systems server is under heavy load, each micro-partition within a Shared
Processor Pool is guaranteed its processor entitlement plus any capacity that it might be
allocated from the reserved pool capacity if the micro-partition is uncapped.
If certain micro-partitions in a Shared Processor Pool do not use their capacity entitlement,
the unused capacity is ceded and other uncapped micro-partitions within the same Shared
Processor Pool are allocated the additional capacity according to their uncapped weighting.
In this way, the entitled pool capacity of a Shared Processor Pool is distributed to the set of
micro-partitions within that Shared Processor Pool.
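The ceding-and-redistribution step described above can be sketched as a proportional share by uncapped weight. This is a simplified model (the real hypervisor works per dispatch cycle, and the names here are ours):

```python
# Sketch: redistribute capacity ceded inside one Shared Processor Pool to
# its uncapped micro-partitions, proportionally to their uncapped weights.
def redistribute(ceded_units: float, uncapped_weights: dict) -> dict:
    total_weight = sum(uncapped_weights.values())
    return {name: ceded_units * w / total_weight
            for name, w in uncapped_weights.items()}

# Two uncapped micro-partitions; 1.0 unused processing unit is ceded.
extra = redistribute(1.0, {"lpar1": 128, "lpar2": 64})
print(extra)  # lpar1 gets twice lpar2's share
```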
All Power Systems servers that support the Multiple Shared Processor Pools capability have
a minimum of one (the default) Shared Processor Pool and a maximum of 64 Shared
Processor Pools.
Default Shared Processor Pool (SPP0)
On any Power Systems server supporting Multiple Shared Processor Pools, a default Shared
Processor Pool is always automatically defined. The default Shared Processor Pool has a
pool identifier of zero (SPP-ID = 0) and can also be referred to as SPP0. The default Shared
Processor Pool has the same attributes as a user-defined Shared Processor Pool, except that
these attributes are not directly under the control of the system administrator. They have fixed
values (Table 3-4).
Table 3-4 Attribute values for the default Shared Processor Pool (SPP0)
Creating Multiple Shared Processor Pools
The default Shared Processor Pool (SPP0) is automatically activated by the system and is
always present.
All other Shared Processor Pools exist, but by default are inactive. By changing the maximum
pool capacity of a Shared Processor Pool to a value greater than zero, it becomes active and
can accept micro-partitions (either transferred from SPP0 or newly created).
Levels of processor capacity resolution
The two levels of processor capacity resolution implemented by the POWER Hypervisor and
Multiple Shared Processor Pools are:
Level0
The first level, Level0, is the resolution of capacity within the same Shared Processor
Pool. Unused processor cycles from within a Shared Processor Pool are harvested and
then redistributed to any eligible micro-partition within the same Shared Processor Pool.
Level1
This is the second level of processor capacity resolution. When all Level0 capacity has
been resolved within the Multiple Shared Processor Pools, the POWER Hypervisor
harvests unused processor cycles and redistributes them to eligible micro-partitions
regardless of the Multiple Shared Processor Pools structure.
SPP0 attribute             Value
Shared Processor Pool ID   0
Maximum pool capacity      Equal to the capacity of the physical Shared Processor Pool
Reserved pool capacity     0
Entitled pool capacity     Sum (total) of the entitled capacities of the micro-partitions in
                           the default Shared Processor Pool
Figure 3-9 shows the levels of unused capacity redistribution implemented by the POWER
Hypervisor.
Figure 3-9 The levels of unused capacity redistribution
Capacity allocation above the entitled pool capacity (Level1)
The POWER Hypervisor initially manages the entitled pool capacity at the Shared Processor
Pool level. This is where unused processor capacity within a Shared Processor Pool is
harvested and then redistributed to uncapped micro-partitions within the same Shared
Processor Pool. This level of processor capacity management is sometimes referred to as
Level0 capacity resolution.
At a higher level, the POWER Hypervisor harvests unused processor capacity from the
Multiple Shared Processor Pools that do not consume all of their entitled pool capacity. If a
particular Shared Processor Pool is heavily loaded and several of the uncapped
micro-partitions within it require additional processor capacity (above the entitled pool
capacity), then the POWER Hypervisor redistributes some of the extra capacity to the
uncapped micro-partitions. This level of processor capacity management is sometimes
referred to as Level1 capacity resolution.
To redistribute unused processor capacity to uncapped micro-partitions in Multiple Shared
Processor Pools above the entitled pool capacity, the POWER Hypervisor uses a higher level
of redistribution, Level1.
Where there is unused processor capacity in under-utilized Shared Processor Pools,
the micro-partitions within the Shared Processor Pools cede the capacity to the
POWER Hypervisor.
In busy Shared Processor Pools, where the micro-partitions have used all of the entitled pool
capacity, the POWER Hypervisor allocates additional cycles to micro-partitions, in which all
of the following statements are true:
The maximum pool capacity of the Shared Processor Pool hosting the micro-partition has
not been met.
The micro-partition is uncapped.
The micro-partition has enough virtual processors to take advantage of the
additional capacity.
Under these circumstances, the POWER Hypervisor allocates additional processor capacity
to micro-partitions on the basis of their uncapped weights, independent of the Shared
Processor Pool hosting the micro-partitions. This can be referred to as Level1 capacity
resolution. Consequently, when allocating additional processor capacity in excess of the
entitled pool capacity of the Shared Processor Pools, the POWER Hypervisor takes the
uncapped weights of all micro-partitions in the system into account, regardless of the Multiple
Shared Processor Pool structure.
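The three conditions listed above can be combined into a single eligibility test. A sketch (the data class and field names are our own illustration, not hypervisor structures):

```python
# Sketch: can a micro-partition receive Level1 capacity (above its pool's
# entitled capacity)? All three conditions from the text must hold.
from dataclasses import dataclass

@dataclass
class MicroPartition:
    uncapped: bool
    virtual_processors: int
    consumed_units: float  # physical capacity currently consumed

def eligible_for_level1(p: MicroPartition, pool_consumed: float,
                        pool_max_capacity: float) -> bool:
    return (pool_consumed < pool_max_capacity             # pool max not yet met
            and p.uncapped                                # partition is uncapped
            and p.virtual_processors > p.consumed_units)  # spare VPs to run on

p = MicroPartition(uncapped=True, virtual_processors=4, consumed_units=2.5)
print(eligible_for_level1(p, pool_consumed=3.0, pool_max_capacity=4.0))  # True
```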
Dynamic adjustment of maximum pool capacity
The maximum pool capacity of a Shared Processor Pool, other than the default Shared
Processor Pool (SPP0), can be adjusted dynamically from the managed console, using either
the graphical interface or the command-line interface (CLI).
Dynamic adjustment of reserved pool capacity
The reserved pool capacity of a Shared Processor Pool, other than the default Shared
Processor Pool (SPP0), can be adjusted dynamically from the managed console, using either
the graphical interface or the CLI.
Dynamic movement between Shared Processor Pools
A micro-partition can be moved dynamically from one Shared Processor Pool to another
from the managed console, using either the graphical interface or the CLI. Because
the entitled pool capacity is partly made up of the sum of the entitled capacities of the
micro-partitions, removing a micro-partition from a Shared Processor Pool reduces the
entitled pool capacity for that Shared Processor Pool. Similarly, the entitled pool capacity of
the Shared Processor Pool that the micro-partition joins will increase.
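Because the entitled pool capacity is a sum of member entitlements, a dynamic move simply shifts one term between two sums. A sketch (the pool bookkeeping names are ours):

```python
# Sketch: entitled pool capacity bookkeeping when a micro-partition moves
# between Shared Processor Pools. The entitled pool capacity is the sum
# of the entitled capacities of the member micro-partitions.
pools = {
    "SPP0": {"lparA": 1.6, "lparB": 0.8},
    "SPP1": {"lparC": 0.5},
}

def entitled_pool_capacity(pool: dict) -> float:
    return sum(pool.values())

def move(lpar: str, src: str, dst: str) -> None:
    pools[dst][lpar] = pools[src].pop(lpar)

move("lparB", "SPP0", "SPP1")
print(entitled_pool_capacity(pools["SPP0"]))  # reduced by lparB's 0.8
print(entitled_pool_capacity(pools["SPP1"]))  # increased by lparB's 0.8
```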
Deleting a Shared Processor Pool
Shared Processor Pools cannot be deleted from the system. However, they are deactivated
by setting the maximum pool capacity and the reserved pool capacity to zero. The Shared
Processor Pool will still exist but will not be active. Use the managed console interface to
deactivate a Shared Processor Pool. A Shared Processor Pool cannot be deactivated unless
all micro-partitions hosted by the Shared Processor Pool have been removed.
Important: Level1 capacity resolution: When allocating additional processor capacity in
excess of the entitled pool capacity of the Shared Processor Pool, the POWER Hypervisor
takes the uncapped weights of all micro-partitions in the system into account, regardless
of the Multiple Shared Processor Pool structure.
Live Partition Mobility and Multiple Shared Processor Pools
A micro-partition can leave a Shared Processor Pool because of PowerVM Live Partition
Mobility. Similarly, a micro-partition can join a Shared Processor Pool in the same way. When
performing PowerVM Live Partition Mobility, you are given the opportunity to designate a
destination Shared Processor Pool on the target server to receive and host the migrating
micro-partition.
Because several simultaneous micro-partition migrations are supported by PowerVM Live
Partition Mobility, it is conceivable to migrate the entire Shared Processor Pool from one
server to another.
3.4.4 Virtual I/O Server
The Virtual I/O Server is part of all PowerVM Editions. It is a special-purpose partition that
allows the sharing of physical resources between logical partitions to allow more efficient
utilization (for example, consolidation). In this case, the Virtual I/O Server owns the physical
resources (SCSI, Fibre Channel, network adapters, and optical devices) and allows client
partitions to share access to them, thus minimizing the number of physical adapters in the
system. The Virtual I/O Server eliminates the requirement that every partition owns a
dedicated network adapter, disk adapter, and disk drive. The Virtual I/O Server supports
OpenSSH for secure remote logins. It also provides a firewall for limiting access by ports,
network services, and IP addresses. Figure 3-10 shows an overview of a Virtual I/O
Server configuration.
Figure 3-10 Architectural view of the Virtual I/O Server
Because the Virtual I/O Server is an operating system-based appliance server, redundancy
for physical devices attached to the Virtual I/O Server can be provided by using capabilities
such as Multipath I/O and IEEE 802.3ad Link Aggregation.
Installation of the Virtual I/O Server partition is performed from a special system backup DVD
that is provided to clients who order any PowerVM edition. This dedicated software is only for
the Virtual I/O Server (and IVM, if used) and is supported only in special Virtual I/O
Server partitions. Three major virtual devices are supported by the Virtual I/O Server:
Shared Ethernet Adapter
Virtual SCSI
Virtual Fibre Channel adapter
(Figure 3-10 labels: the Virtual I/O Server hosts a Shared Ethernet Adapter backed by a
physical Ethernet adapter, and virtual SCSI adapters backed by physical disks through a
physical disk adapter; Virtual I/O Clients 1 and 2 each attach a virtual Ethernet adapter and
a virtual SCSI adapter through the hypervisor, reaching the external network via the SEA.)

138 IBM Power 770 and 780 Technical Overview and Introduction
The Virtual Fibre Channel adapter is used with the NPIV feature, described in 3.4.8, “N_Port
ID virtualization” on page 147.
Shared Ethernet Adapter
A Shared Ethernet Adapter (SEA) can be used to connect a physical Ethernet network to a
virtual Ethernet network. The Shared Ethernet Adapter provides this access by connecting
the internal hypervisor VLANs with the VLANs on the external switches. Because the Shared
Ethernet Adapter processes packets at layer 2, the original MAC address and VLAN tags of
the packet are visible to other systems on the physical network. IEEE 802.1 VLAN tagging
is supported.
The Shared Ethernet Adapter also provides the ability for several client partitions to share
one physical adapter. With an SEA, you can connect internal and external VLANs using a
physical adapter. The Shared Ethernet Adapter service can only be hosted in the Virtual I/O
Server, not in a general-purpose AIX or Linux partition, and acts as a layer-2 network bridge
to securely transport network traffic between virtual Ethernet networks (internal) and one or
more (EtherChannel) physical network adapters (external). These virtual Ethernet network
adapters are defined by the POWER Hypervisor on the Virtual I/O Server.
Figure 3-11 shows a configuration example of an SEA with one physical and two virtual
Ethernet adapters. An SEA can include up to 16 virtual Ethernet adapters on the Virtual I/O
Server that share the same physical access.
Figure 3-11 Architectural view of a Shared Ethernet Adapter
Tip: A Linux partition can provide bridging function also, by using the brctl command.
(Figure 3-11 labels: on the VIOS, ent3 (sea) with interface en3 bridges the physical adapter
ent0 (phy.) and the virtual trunk adapters ent1 (virt., PVID 99, VID 2) and ent2 (virt., PVID 1)
to the Ethernet switch and external network; Clients 1 through 3 attach through virtual
adapters ent0 (virt.) with PVID 2 or PVID 1, reaching hypervisor VLANs 1 and 2.)
A single SEA setup can have up to 16 virtual Ethernet trunk adapters, and each virtual
Ethernet trunk adapter can support up to 20 VLAN networks. A single physical Ethernet
adapter can therefore be shared among up to 320 internal VLAN networks. The number of
Shared Ethernet Adapters that can be set up in a Virtual I/O Server partition is limited only
by available resources; there are no configuration limits.
Unicast, broadcast, and multicast are supported, so protocols that rely on broadcast or
multicast, such as Address Resolution Protocol (ARP), Dynamic Host Configuration
Protocol (DHCP), Boot Protocol (BOOTP), and Neighbor Discovery Protocol (NDP), can
work on an SEA.
For a more detailed discussion about virtual networking, see:
http://www.ibm.com/servers/aix/whitepapers/aix_vn.pdf
Virtual SCSI
The term virtual SCSI refers to a virtualized implementation of the SCSI protocol. Virtual SCSI
is based on a client/server relationship. The Virtual I/O Server logical partition owns the
physical resources and acts as a server or, in SCSI terms, a target device. The client logical
partitions access the virtual SCSI backing storage devices provided by the Virtual I/O Server
as clients.
The virtual I/O adapters (virtual SCSI server adapter and a virtual SCSI client adapter) are
configured using a managed console or through the Integrated Virtualization Manager on
smaller systems. The virtual SCSI server (target) adapter is responsible for executing any
SCSI commands that it receives. It is owned by the Virtual I/O Server partition. The virtual
SCSI client adapter allows a client partition to access physical SCSI and SAN attached
devices and LUNs that are assigned to the client partition. The provisioning of virtual disk
resources is provided by the Virtual I/O Server.
Physical disks presented to the Virtual I/O Server can be exported and assigned to a client
partition in a number of ways:
The entire disk is presented to the client partition.
The disk is divided into several logical volumes, which can be presented to a single client
or multiple clients.
As of Virtual I/O Server 1.5, files can be created on these disks, and file-backed storage
devices can be created.
The logical volumes or files can be assigned to separate partitions. Therefore, virtual SCSI
enables sharing of adapters and disk devices.
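The three export options above can be sketched as a toy model. This is purely illustrative and not a Virtual I/O Server interface; on a real VIOS the mappings are created with VIOS commands, and the mode names and function here are assumptions for the sketch:

```python
def plan_exports(disk_size_gb, mode, allocations):
    """Return (client, size_gb) pairs for one physical disk.

    mode: "whole-disk"      - the entire disk goes to a single client
          "logical-volumes" - the disk is carved into logical volumes
                              for one or more clients
          "file-backed"     - files on the disk back virtual devices
                              (available as of Virtual I/O Server 1.5)
    allocations: mapping of client name -> requested size in GB
    """
    if mode == "whole-disk":
        if len(allocations) != 1:
            raise ValueError("a whole disk is exported to exactly one client")
        return [(client, disk_size_gb) for client in allocations]
    if mode in ("logical-volumes", "file-backed"):
        if sum(allocations.values()) > disk_size_gb:
            raise ValueError("allocations exceed the physical disk")
        return sorted(allocations.items())
    raise ValueError("unknown export mode")
```

The point of the model: only the carved modes let one physical disk serve several client partitions, which is what makes virtual SCSI an adapter- and disk-sharing mechanism.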
Note: A Shared Ethernet Adapter does not need to have an IP address configured to be
able to perform the Ethernet bridging functionality. Configuring IP on the Virtual I/O Server
is convenient because the Virtual I/O Server can then be reached by TCP/IP, for example,
to perform dynamic LPAR operations or to enable remote login. This task can be done
either by configuring an IP address directly on the SEA device or on an additional virtual
Ethernet adapter in the Virtual I/O Server. This leaves the SEA without the IP address,
allowing for maintenance on the SEA without losing IP connectivity in case SEA failover
is configured.
Figure 3-12 shows an example where one physical disk is divided into two logical volumes by
the Virtual I/O Server. Each client partition is assigned one logical volume, which is then
accessed through a virtual I/O adapter (VSCSI Client Adapter). Inside the partition, the disk is
seen as a normal hdisk.
Figure 3-12 Architectural view of virtual SCSI
At the time of writing, virtual SCSI supports Fibre Channel, parallel SCSI, iSCSI, SAS, SCSI
RAID devices, and optical devices, including DVD-RAM and DVD-ROM. Other protocols such
as SSA and tape devices are not supported.
For more information about the specific storage devices supported for Virtual I/O Server, see:
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/datasheet.html
Virtual I/O Server functions
The Virtual I/O Server has a number of features, including monitoring solutions:
Support for Live Partition Mobility starting on POWER6 processor-based systems with the
PowerVM Enterprise Edition. For more information about Live Partition Mobility, see 3.4.5,
“PowerVM Live Partition Mobility” on page 141.
Support for virtual SCSI devices backed by a file, which are then accessed as standard
SCSI-compliant LUNs.
Support for virtual Fibre Channel devices that are used with the NPIV feature.
Virtual I/O Server Expansion Pack with additional security functions such as Kerberos
(Network Authentication Service for users and client and server applications), Simple
Network Management Protocol (SNMP) v3, and Lightweight Directory Access Protocol
(LDAP) client functionality.
System Planning Tool (SPT) and Workload Estimator, which are designed to ease the
deployment of a virtualized infrastructure. For more information about the System
Planning Tool, see 3.5, “System Planning Tool” on page 150.
(Figure 3-12 labels: in the I/O Server partition, the LVM divides a physical disk (SCSI, FC)
attached through a physical adapter into Logical Volume 1 and Logical Volume 2, each
exported through a VSCSI server adapter; Client Partitions 1 and 2 see them as hdisks
through VSCSI client adapters across the POWER Hypervisor.)
Includes IBM Systems Director agent and a number of pre-installed Tivoli agents, such as:
– Tivoli Identity Manager (TIM), to allow easy integration into an existing Tivoli Systems
Management infrastructure
– Tivoli Application Dependency Discovery Manager (ADDM), which creates and
automatically maintains application infrastructure maps including dependencies,
change-histories, and deep configuration values
vSCSI eRAS.
Additional CLI statistics in svmon, vmstat, fcstat, and topas.
Monitoring solutions to help manage and monitor the Virtual I/O Server and shared
resources. New commands and views provide additional metrics for memory, paging,
processes, Fibre Channel HBA statistics, and virtualization.
For more information about the Virtual I/O Server and its implementation, see IBM PowerVM
Virtualization Introduction and Configuration, SG24-7940.
3.4.5 PowerVM Live Partition Mobility
PowerVM Live Partition Mobility allows you to move a running logical partition, including its
operating system and running applications, from one system to another without any shutdown
or without disrupting the operation of that logical partition. Inactive partition mobility allows
you to move a powered-off logical partition from one system to another.
Partition mobility provides systems management flexibility and improves system availability,
as follows:
Avoid planned outages for hardware or firmware maintenance by moving logical partitions
to another server and then performing the maintenance. Live Partition Mobility can help
lead to zero downtime maintenance because you can use it to work around scheduled
maintenance activities.
Avoid downtime for a server upgrade by moving logical partitions to another server and
then performing the upgrade. This approach allows your users to continue their work
without disruption.
Avoid unplanned downtime. With preventive failure management, if a server indicates a
potential failure, you can move its logical partitions to another server before the failure
occurs. Partition mobility can help avoid unplanned downtime.
Take advantage of server optimization:
– Consolidation: You can consolidate workloads running on several small, under-used
servers onto a single large server.
– Deconsolidation: You can move workloads from server to server to optimize resource
use and workload performance within your computing environment. With active
partition mobility, you can manage workloads with minimal downtime.
Mobile partition’s operating system requirements
The operating system running in the mobile partition has to be AIX or Linux. The Virtual I/O
Server partition itself cannot be migrated. All versions of AIX and Linux supported on the IBM
POWER7 processor-based servers also support partition mobility.
Source and destination system requirements
The source partition must be one that has only virtual devices. If there are any physical
devices in its allocation, they must be removed before the validation or migration is initiated.
An N_Port ID virtualization (NPIV) device is considered virtual and is compatible with
partition migration.
The hypervisor must support the Partition Mobility functionality (also called the migration
process) that is available on POWER6 and POWER7 processor-based hypervisors. Firmware
must be at level eFW3.2 or later. All POWER7 processor-based hypervisors support Live
Partition Mobility. Source and destination systems can run separate firmware levels, but
they must be compatible with each other.
Partitions can be migrated back and forth between POWER6 and POWER7
processor-based servers. Partition Mobility leverages the POWER6 Compatibility Modes
that are provided by POWER7 processor-based servers. On the POWER7
processor-based server, the migrated partition then executes in POWER6 or POWER6+
Compatibility Mode.
If you want to move an active logical partition from a POWER6 processor-based server to a
POWER7 processor-based server so that the logical partition can take advantage of the
additional capabilities available with the POWER7 processor, perform these steps:
1. Set the partition-preferred processor compatibility mode to the default mode. When you
activate the logical partition on the POWER6 processor-based server, it runs in the
POWER6 mode.
2. Move the logical partition to the POWER7 processor-based server. Both the current
and preferred modes remain unchanged for the logical partition until you restart the
logical partition.
3. Restart the logical partition on the POWER7 processor-based server. The hypervisor
evaluates the configuration. Because the preferred mode is set to default and the logical
partition now runs on a POWER7 processor-based server, the highest mode available is
the POWER7 mode. The hypervisor determines that the most fully featured mode that is
supported by the operating environment installed in the logical partition is the POWER7
mode and changes the current mode of the logical partition to the POWER7 mode.
Now the current processor compatibility mode of the logical partition is the POWER7 mode,
and the logical partition runs on the POWER7 processor-based server.
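The three steps above amount to a small decision rule, sketched here. The mode names and ordering mirror the text, but the function itself is a hypothetical illustration, not hypervisor code:

```python
MODES = ["POWER6", "POWER6+", "POWER7"]  # ascending capability

def resolve_mode(preferred, server_max, os_max):
    """Pick the current compatibility mode when a partition is (re)started.

    preferred:  the partition-preferred mode, or "default"
    server_max: highest mode the hosting server provides
    os_max:     most fully featured mode the installed OS supports
    """
    ceiling = server_max if preferred == "default" else preferred
    # the current mode is the least capable of the ceiling and the OS limit
    return min(ceiling, os_max, key=MODES.index)

# A partition moved from POWER6 to POWER7 keeps its current mode until it
# restarts; on restart with preferred=default it comes up in POWER7 mode.
assert resolve_mode("default", "POWER7", "POWER7") == "POWER7"
assert resolve_mode("POWER6", "POWER7", "POWER7") == "POWER6"
```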
The Virtual I/O Server on the source system provides the access to the client resources and
must be identified as a mover service partition (MSP). The Virtual Asynchronous Services
Interface (VASI) device allows the mover service partition to communicate with the
hypervisor. It is created and managed automatically by the managed console and will be
configured on both the source and destination Virtual I/O Servers, which are designated as
the mover service partitions for the mobile partition, to participate in active mobility. Other
requirements include a similar time-of-day on each server, systems must not be running on
battery power, and shared storage (external hdisk with reserve_policy=no_reserve). In
addition, all logical partitions must be on the same open network with RMC established to the
managed console.
The managed console is used to configure, validate, and orchestrate. You use the managed
console to configure the Virtual I/O Server as an MSP and to configure the VASI device. A
managed console wizard validates your configuration and identifies issues that can cause the
migration to fail. During the migration, the managed console controls all phases of
the process.

Tip: The “Migration combinations of processor compatibility modes for active Partition
Mobility” web page offers presentations of the supported migrations:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7hc3/iphc3pcmcombosact.htm
Improved Live Partition Mobility benefits
The possibility to move partitions between POWER6 and POWER7 processor-based servers
greatly facilitates the deployment of POWER7 processor-based servers, as follows:
Installation of the new server can be performed while the application is executing on a
POWER6 server. After the POWER7 processor-based server is ready, the application can
be migrated to its new hosting server without application downtime.
When adding POWER7 processor-based servers to a POWER6 environment, you get the
additional flexibility to perform workload balancing across the entire set of POWER6 and
POWER7 processor-based servers.
When performing server maintenance, you get the additional flexibility to use POWER6
servers for hosting applications usually hosted on POWER7 processor-based servers,
and vice versa, allowing you to perform this maintenance with no planned application
downtime.
For more information about Live Partition Mobility and how to implement it, see IBM
PowerVM Live Partition Mobility, SG24-7460.
3.4.6 Active Memory Sharing
Active Memory Sharing is an IBM PowerVM advanced memory virtualization technology that
provides system memory virtualization capabilities to IBM Power Systems, allowing multiple
partitions to share a common pool of physical memory.
Active Memory Sharing is only available with the Enterprise version of PowerVM.
The physical memory of an IBM Power System can be assigned to multiple partitions either in
dedicated or shared mode. The system administrator has the capability to assign some
physical memory to a partition and some physical memory to a pool that is shared by other
partitions. A single partition can have either dedicated or shared memory:
With a pure dedicated memory model, the system administrator’s task is to optimize
available memory distribution among partitions. When a partition suffers degradation
because of memory constraints and other partitions have unused memory, the
administrator can manually issue a dynamic memory reconfiguration.
With a shared memory model, the system automatically decides the optimal distribution of
the physical memory to partitions and adjusts the memory assignment based on partition
load. The administrator reserves physical memory for the shared memory pool, assigns
partitions to the pool, and provides access limits to the pool.
Active Memory Sharing can be exploited to increase memory utilization on the system either
by decreasing the global memory requirement or by allowing the creation of additional
partitions on an existing system. Active Memory Sharing can be used in parallel with Active
Memory Expansion on a system running a mixed workload of several operating systems. For
example, AIX partitions can take advantage of Active Memory Expansion. Other operating
systems take advantage of Active Memory Sharing.
For additional information regarding Active Memory Sharing, see IBM PowerVM Virtualization
Active Memory Sharing, REDP-4470.
3.4.7 Active Memory Deduplication
In a virtualized environment, systems might have a considerable amount of duplicated
information stored in RAM, because each partition runs its own operating system and some
partitions might even share the same kinds of applications. On heavily loaded systems, this
can lead to a shortage of available memory resources, forcing paging by the AMS partition
operating systems, the AMS pool, or both, which might decrease overall
system performance.
Figure 3-13 shows the standard behavior of a system without Active Memory Deduplication
(AMD) enabled on its AMS shared memory pool. Identical pages within the same or different
LPARs each require their own unique physical memory page, consuming space with
repeated information.
Figure 3-13 AMS shared memory pool without AMD enabled
Active Memory Deduplication allows the hypervisor to dynamically map identical partition
memory pages to a single physical memory page within a shared memory pool. This enables
a better utilization of the AMS shared memory pool, increasing the system’s overall
performance by avoiding paging. Deduplication can cause the hardware to incur fewer cache
misses, which will also lead to improved performance.
(Figure 3-13 key: D = duplicate pages, U = unique pages. Without Active Memory
Deduplication, every logical page of LPAR1, LPAR2, and LPAR3 maps to its own physical
page in the AMS shared memory pool, so identical pages consume space repeatedly.)
Figure 3-14 shows the behavior of a system with Active Memory Deduplication enabled on its
AMS shared memory pool. Duplicated pages from different LPARs are stored just once,
providing the AMS pool with more free memory.
Figure 3-14 Identical memory pages mapped to a single physical memory page with Active Memory
Deduplication enabled
Active Memory Deduplication (AMD) depends on the Active Memory Sharing (AMS) feature
to be available, and consumes CPU cycles donated by the AMS pool's VIOS partitions to
identify deduplicated pages. The operating systems running on the AMS partitions can hint to
the PowerVM Hypervisor that some pages (such as frequently referenced read-only code
pages) are particularly good for deduplication.
To perform deduplication, the hypervisor cannot compare every memory page in the AMS
pool with every other page. Instead, it computes a small signature for each page that it visits
and stores the signatures in an internal table. Each time that a page is inspected, its signature
is looked up against the known signatures in the table. If a match is found, the memory pages
are compared to be sure that the pages are really duplicates. When a duplicate is found, the
hypervisor remaps the partition memory to the existing memory page and returns the
duplicate page to the AMS pool.
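The signature-table lookup described above can be sketched in a few lines. The hash choice, signature width, and page size here are illustrative assumptions, not the hypervisor's actual implementation:

```python
import hashlib

class DedupPool:
    """Toy model of signature-based page deduplication in an AMS pool."""

    def __init__(self):
        self.signatures = {}  # signature -> stored physical page contents
        self.freed = 0        # duplicate pages returned to the AMS pool

    def visit(self, page):
        """Return the physical page to map for `page`, deduplicating."""
        sig = hashlib.sha1(page).digest()[:8]  # small per-page signature
        match = self.signatures.get(sig)
        # signatures can collide, so confirm with a full byte comparison
        if match is not None and match == page:
            self.freed += 1        # duplicate page goes back to the pool
            return match           # remap to the existing physical page
        self.signatures[sig] = page  # first (or colliding) copy is kept
        return page

pool = DedupPool()
a = pool.visit(b"\x00" * 4096)
b = pool.visit(b"\x00" * 4096)  # identical page from another LPAR
```

After the second visit, both logical pages share one physical page and one page has been freed; a later write to a shared page would trigger the copy-on-write behavior described below.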
(Figure 3-14 key: D = duplicate pages, U = unique pages, plus free pages. With Active
Memory Deduplication, duplicate pages from LPAR1, LPAR2, and LPAR3 map to a single
physical page, leaving additional free pages in the AMS shared memory pool.)
Figure 3-15 shows two pages being written in the AMS memory pool and having their
signatures matched on the deduplication table.
Figure 3-15 Memory pages having their signatures matched by Active Memory Deduplication
From the LPAR point of view, the AMD feature is completely transparent. If an LPAR attempts
to modify a deduplicated page, the hypervisor grabs a free page from the AMS pool, copies
the duplicate page contents into the new page, and maps the LPAR's reference to the new
page so that the LPAR can modify its own unique page.
System administrators can dynamically configure the size of the deduplication table, ranging
from 1/8192 up to 1/256 of the configured maximum AMS memory pool size. Having this table
too small might lead to missed deduplication opportunities. Conversely, having a table that is
too large might waste a small amount of overhead space.
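A quick sizing sketch based on the stated ratio range; the helper function is illustrative, and the bounds are applied to the configured maximum AMS pool size as described above:

```python
# Administrator-tunable fraction of the maximum AMS pool size
MIN_FRACTION, MAX_FRACTION = 1 / 8192, 1 / 256

def dedup_table_size(max_pool_bytes, fraction):
    """Size of the deduplication table for a given pool size and fraction."""
    if not MIN_FRACTION <= fraction <= MAX_FRACTION:
        raise ValueError("fraction must lie in [1/8192, 1/256]")
    return int(max_pool_bytes * fraction)

# For a 64 GiB maximum AMS pool size:
gib = 1024 ** 3
small = dedup_table_size(64 * gib, MIN_FRACTION)  # 8 MiB table
large = dedup_table_size(64 * gib, MAX_FRACTION)  # 256 MiB table
```

The small end risks missed deduplication opportunities; the large end wastes only a modest amount of overhead space.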
The management of the Active Memory Deduplication feature is done via the managed
console, allowing administrators to take the following steps:
Enable and disable Active Memory Deduplication at an AMS Pool level.
Display deduplication metrics.
Display and modify the deduplication table size.
(Figure 3-15 labels: a signature function computes the signature of Page A from the AMS
memory pool and writes it, as Sign A, into the deduplication table; when Page B is later
visited, its signature matches Sign A in the deduplication table.)
Figure 3-16 shows Active Memory Deduplication being enabled for a shared memory pool.
Figure 3-16 Enabling the Active Memory Deduplication for a shared memory pool
The Active Memory Deduplication feature requires the following minimum components:
PowerVM Enterprise edition
System firmware level 740
AIX Version 6: AIX 6.1 TL7 or later
AIX Version 7: AIX 7.1 TL1 SP1 or later
IBM i: 7.1.4 or 7.2, or later
SLES 11 SP2 or later
RHEL 6.2 or later
3.4.8 N_Port ID virtualization
N_Port ID virtualization (NPIV) is a technology that allows multiple logical partitions to access
independent physical storage through the same physical Fibre Channel adapter. This adapter
is attached to a Virtual I/O Server partition that acts only as a pass-through, managing the
data transfer through the POWER Hypervisor.
Each partition using NPIV is identified by a pair of unique worldwide port names, enabling you
to connect each partition to independent physical storage on a SAN. Unlike virtual SCSI, only
the client partitions see the disk.
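The worldwide port name pair can be illustrated with a toy allocator. This is purely a sketch: real WWPNs are generated by the managed console from an IBM-administered range, and the prefix and numbering scheme below are made-up examples:

```python
import itertools

_counter = itertools.count(0)  # illustrative sequential allocator

def allocate_wwpn_pair(prefix=0xC0507600000000):
    """Return the pair of unique worldwide port names that identifies one
    NPIV client Fibre Channel adapter (a pair is assigned so the partition
    can be moved between servers)."""
    base = prefix + 2 * next(_counter)
    return tuple(f"{base + i:016X}" for i in range(2))

pair1 = allocate_wwpn_pair()  # for client partition 1
pair2 = allocate_wwpn_pair()  # for client partition 2
```

Each client partition zones and masks its own pair on the SAN, so only that partition sees its disks, unlike virtual SCSI where the Virtual I/O Server also sees the backing devices.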
For additional information and requirements for NPIV, see these resources:
PowerVM Migration from Physical to Virtual Storage, SG24-7825
IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
NPIV is supported in PowerVM Standard and Enterprise Editions on the IBM Power 770 and
Power 780 servers.
3.4.9 Operating system support for PowerVM
Table 3-5 summarizes the PowerVM features supported by the operating systems compatible
with the POWER7 processor-based servers.
Table 3-5 PowerVM features supported by AIX, IBM i, and Linux

Supported by all listed operating systems (AIX V5.3, AIX V6.1, AIX V7.1, IBM i 6.1.1,
IBM i 7.1, RHEL V5.7, RHEL V6.1, SLES V10 SP4, SLES V11 SP1): Virtual SCSI, Virtual
Ethernet, Shared Ethernet Adapter, Virtual Fibre Channel, Virtual Tape, Logical Partitioning,
DLPAR I/O adapter add/remove, DLPAR processor add/remove, DLPAR memory add,
Micro-Partitioning, Shared Dedicated Capacity, Multiple Shared Processor Pools, and
Virtual I/O Server.

Exceptions:
DLPAR memory remove: supported everywhere except SLES V10 SP4.
Suspend/Resume: AIX V6.1 and AIX V7.1 only.
Shared Storage Pools: AIX V5.3, V6.1, V7.1, IBM i 6.1.1, and IBM i 7.1 (a); not supported
on RHEL or SLES.
Thin Provisioning: AIX V5.3, V6.1, V7.1, IBM i 6.1.1 (b), and IBM i 7.1 (b); not supported
on RHEL or SLES.
Active Memory Sharing and Active Memory Deduplication: AIX V6.1, AIX V7.1, IBM i 6.1.1,
IBM i 7.1, RHEL V6.1, and SLES V11 SP1; not AIX V5.3, RHEL V5.7, or SLES V10 SP4.
Live Partition Mobility: supported everywhere except IBM i 6.1.1 and IBM i 7.1.
Simultaneous Multi-Threading (SMT): supported everywhere; AIX V5.3, RHEL V5.7,
RHEL V6.1, and SLES V10 SP4 (c); AIX V6.1 (d); IBM i 6.1.1 (e).
Active Memory Expansion: AIX V6.1 (f) and AIX V7.1 only.

a. Requires IBM i 7.1 TR1.
b. Will become a fully provisioned device when used by IBM i.
c. Only supports two threads.
d. AIX 6.1 up to TL4 SP2 only supports two threads, and supports four threads as of TL4 SP3.
e. IBM i 6.1.1 and up support SMT4.
f. On AIX 6.1 with TL4 SP2 and later.

3.4.10 POWER7 Linux programming support
IBM Linux Technology Center (LTC) contributes to the development of Linux by providing
support for IBM hardware in Linux distributions. In particular, the LTC makes tools and code
available to the Linux communities to take advantage of the POWER7 technology and to
develop POWER7-optimized software.
Table 3-6 lists the support of specific programming features for various versions of Linux.
Table 3-6 Linux support for POWER7 features
(Columns: SLES 10 SP4 / SLES 11 / RHEL 5.7 / RHEL 6.1, with comments.)

POWER6 compatibility mode: Yes / Yes / Yes / Yes.
POWER7 mode: No / Yes / No / Yes.
Strong Access Ordering: No / Yes / No / Yes (can improve Lx86 performance).
Scale to 256 cores/1024 threads: No / Yes / No / Yes (base OS support available).
4-way SMT: No / Yes / No / Yes.
VSX support: No / Yes / No / Yes (full exploitation requires the Advance Toolchain).
Distro toolchain mcpu/mtune=p7: No / Yes / No / Yes (the SLES 11 GA toolchain has only
the minimal POWER7 enablement necessary to support kernel builds).
Advance Toolchain support: Yes, execution restricted to POWER6 instructions / Yes / Yes,
execution restricted to POWER6 instructions / Yes (alternative IBM GNU toolchain).
64 KB base page size: No / Yes / Yes / Yes.
For information regarding Advance Toolchain, see the following website:
http://www.ibm.com/developerworks/wikis/display/hpccentral/How+to+use+Advance+Toolchain+for+Linux+on+POWER
Also see the University of Illinois Linux on Power Open Source Repository:
http://ppclinux.ncsa.illinois.edu
ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at05/suse/SLES_11/release_notes.at05-2.1-0.html
ftp://linuxpatch.ncsa.uiuc.edu/toolchain/at/at05/redhat/RHEL5/release_notes.at05-2.1-0.html
3.5 System Planning Tool
The IBM System Planning Tool (SPT) helps you design systems to be partitioned with logical
partitions. You can also plan for and design non-partitioned systems by using the SPT. The
resulting output of your design is called a system plan, which is stored in a .sysplan file. This
file can contain plans for a single system or multiple systems. The .sysplan file can be used
for the following reasons:
To create reports
As input to the IBM configuration tool (e-Config)
To create and deploy partitions on your system (or systems) automatically
System plans that are generated by the SPT can be deployed on the system by the Hardware
Management Console (HMC), Systems Director Management Console (SDMC), or
Integrated Virtualization Manager (IVM).
You can create an entirely new system configuration, or you can create a system
configuration based on any of these items:
Performance data from an existing system that the new system is to replace
Performance estimates that anticipate future workloads that you must support
Sample systems that you can customize to fit your needs
Integration between the SPT and both the Workload Estimator (WLE) and IBM Performance
Management (PM) allows you to create a system that is based on performance and capacity
data from an existing system or that is based on new workloads that you specify.
Tickless idle (continuation of Table 3-6; columns SLES 10 SP4 / SLES 11 / RHEL 5.7 /
RHEL 6.1): No / Yes / No / Yes (improved energy utilization and virtualization of partially to
fully idle partitions).
Note: Ask your IBM representative or Business Partner to use the Customer Specified
Placement manufacturing option if you want to automatically deploy your partitioning
environment on a new machine. SPT expects the resource allocation to match what is
specified in your .sysplan file.
You can use the SPT before you order a system to determine what you must order to support
your workload. You can also use the SPT to determine how you can partition a system that
you already have.
Using the System Planning Tool is an effective way of documenting and backing up key
system settings and partition definitions. It allows the user to create records of systems and
export them to their personal workstation or backup system of choice. These same backups
can then be imported back onto the same managed console when needed. This can be
useful when cloning systems, enabling the user to import the system plan to any managed
console multiple times.
The SPT and its supporting documentation can be found on the IBM System Planning
Tool site:
http://www.ibm.com/systems/support/tools/systemplanningtool/

© Copyright IBM Corp. 2011. All rights reserved.
Chapter 4. Continuous availability and
manageability
This chapter provides information about IBM reliability, availability, and serviceability (RAS)
design and features. This set of technologies, implemented on IBM Power Systems servers,
can improve your architecture’s total cost of ownership (TCO) by reducing unplanned
downtime.
RAS can be described as follows:
Reliability: Indicates how infrequently a defect or fault in a server manifests itself
Availability: Indicates how infrequently the functionality of a system or application is
impacted by a fault or defect
Serviceability: Indicates how well faults and their impacts are communicated to users and
services, and how efficiently and nondisruptively the faults are repaired
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. POWER7 processor-based servers have new features to support new levels of
virtualization, help ease administrative burden, and increase system utilization.
Reliability starts with components, devices, and subsystems designed to be fault-tolerant.
POWER7 uses lower voltage technology, improving reliability with stacked latches to reduce
soft error (SER) susceptibility. During the design and development process, subsystems go
through rigorous verification and integration testing processes. During system manufacturing,
systems go through a thorough testing process to help ensure high product quality levels.
The processor and memory subsystem contain a number of features designed to avoid or
correct environmentally induced, single-bit, intermittent failures, as well as handle solid faults
in components, including selective redundancy to tolerate certain faults without requiring an
outage or parts replacement.
IBM is the only vendor that designs, manufactures, and integrates its most critical server
components, including:
 POWER processors
 Caches
 Memory buffers
 Hub-controllers
 Clock cards
 Service processors
Design and manufacturing verification and integration, as well as field support information, is
used as feedback for continued improvement on the final products.
This chapter also includes a manageability section describing the means to successfully
manage your systems.
Several software-based availability features exist that are based on the benefits available
when using AIX and IBM i as the operating system. Support of these features when using
Linux can vary.
4.1 Reliability
Highly reliable systems are built with highly reliable components. On IBM POWER
processor-based systems, this basic principle is expanded upon with a clear design for
reliability architecture and methodology. A concentrated, systematic, architecture-based
approach is designed to improve overall system reliability with each successive generation of
system offerings.
4.1.1 Designed for reliability
Systems designed with fewer components and interconnects have fewer opportunities to fail.
Simple design choices, such as integrating processor cores on a single POWER chip, can
dramatically reduce the opportunity for system failures. In this case, an 8-core server can
include one-fourth as many processor chips (and chip socket interfaces) as with a double
CPU-per-processor design. Not only does this reduce the total number of system
components, it reduces the total amount of heat generated in the design, resulting in an
additional reduction in required power and cooling components. POWER7 processor-based
servers also integrate L3 cache into the processor chip for a higher integration of parts.
Parts selection also plays a critical role in overall system reliability. IBM uses three grades of
components, with grade 3 defined as industry standard (off-the-shelf). As shown in Figure 4-1,
using stringent design criteria and an extensive testing program, the IBM manufacturing team
can produce grade 1 components that are expected to be 10 times more reliable than
industry standard. Engineers select grade 1 parts for the most critical system components.
Newly introduced organic packaging technologies, rated grade 5, achieve the same reliability
as grade 1 parts.
Figure 4-1 Component failure rates (chart comparing relative failure rates of grade 3, grade 1, and grade 5 components)
4.1.2 Placement of components
Packaging is designed to deliver both high performance and high reliability. For example,
the reliability of electronic components is directly related to their thermal environment, that
is, large decreases in component reliability are directly correlated with relatively small
increases in temperature. POWER processor-based systems are carefully packaged to
ensure adequate cooling. Critical system components such as the POWER7 processor chips
are positioned on printed circuit cards so that they receive fresh air during operation. In
addition, POWER processor-based systems are built with redundant, variable-speed fans that
can automatically increase output to compensate for increased heat in the central
electronic complex.
4.1.3 Redundant components and concurrent repair
High-opportunity components, or those that most affect system availability, are protected with
redundancy and the ability to be repaired concurrently.
The use of redundant parts allows the system to remain operational. Among the parts are:
 POWER7 cores, which include redundant bits in L1-I, L1-D, and L2 caches, and in L2 and L3 directories
 Power 770 and Power 780 main memory DIMMs, which contain an extra DRAM chip for improved redundancy
 Power 770 and 780 redundant system clock and service processor for configurations with two or more central electronics complex (CEC) drawers
 Redundant and hot-swap cooling
 Redundant and hot-swap power supplies
 Redundant 12X loops to I/O subsystem
For maximum availability, be sure to connect power cords from the same system to two
separate Power Distribution Units (PDUs) in the rack and to connect each PDU to
independent power sources. Deskside form factor power cords must be plugged into two
independent power sources to achieve maximum availability.
4.2 Availability
The IBM hardware and microcode capability to continuously monitor execution of hardware
functions is generally described as the process of first-failure data capture (FFDC). This
process includes the strategy of predictive failure analysis, which refers to the ability to track
intermittent correctable errors and to vary components off-line before they reach the point of
hard failure, causing a system outage, and without the need to re-create the problem.
Note: Check your configuration for optional redundant components before ordering
your system.
The POWER7 family of systems continues to introduce significant enhancements that are
designed to increase system availability and ultimately achieve a high availability objective,
with hardware components that are able to perform the following functions:
 Self-diagnose and self-correct during run time.
 Automatically reconfigure to mitigate potential problems from suspect hardware.
 Self-heal or automatically substitute good components for failing components.
Throughout this chapter, we describe IBM POWER technology’s capabilities that are focused
on keeping a system environment up and running. For a specific set of functions that are
focused on detecting errors before they become serious enough to stop computing work, see
4.3.1, “Detecting” on page 169.
4.2.1 Partition availability priority
Also available is the ability to assign availability priorities to partitions. If an alternate
processor recovery event requires spare processor resources and there are no other means
of obtaining the spare resources, the system determines which partition has the lowest
priority and attempts to claim the needed resource. On a properly configured POWER
processor-based server, this approach allows that capacity to first be obtained from a
low-priority partition instead of a high-priority partition.
This capability is relevant to the total system availability because it gives the system an
additional stage before an unplanned outage. In the event that insufficient resources exist to
maintain full system availability, these servers attempt to maintain partition availability by
user-defined priority.
Partition availability priority is assigned to partitions using a weight value or integer rating, the
lowest priority partition rated at 0 (zero) and the highest priority partition valued at 255. The
default value is set at 127 for standard partitions and 192 for Virtual I/O Server (VIOS)
partitions. You can vary the priority of individual partitions.
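The priority-based reclamation described above can be sketched as a simple selection over active partitions. This is an illustrative model only, not IBM hypervisor code; the partition records and the `pick_donor_partition` helper are hypothetical, while the priority range (0 to 255) and the default values (127 for standard partitions, 192 for VIOS) come from the text.

```python
# Hypothetical sketch of partition availability priority selection.
# Priorities range from 0 (lowest) to 255 (highest); defaults per the
# text: 127 for standard partitions, 192 for VIOS partitions.

DEFAULT_PRIORITY = 127
VIOS_PRIORITY = 192

def pick_donor_partition(partitions):
    """Return the active partition with the lowest availability priority,
    i.e. the one whose processor capacity is reclaimed first when a spare
    core is needed for alternate processor recovery."""
    active = [p for p in partitions if p["active"] and p["cores"] > 0]
    if not active:
        return None
    return min(active, key=lambda p: p["priority"])

partitions = [
    {"name": "vios1", "priority": VIOS_PRIORITY, "active": True, "cores": 1},
    {"name": "prod", "priority": 200, "active": True, "cores": 4},
    {"name": "test", "priority": DEFAULT_PRIORITY, "active": True, "cores": 2},
]
donor = pick_donor_partition(partitions)
```

With the sample configuration above, the low-priority "test" partition is selected as the donor before either the production partition or the VIOS.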
Partition availability priorities can be set for both dedicated and shared processor partitions.
The POWER Hypervisor uses the relative partition weight value among active partitions to
favor higher priority partitions for processor sharing, adding and removing processor capacity,
and favoring higher priority partitions for normal operation.
Note that the partition specifications for minimum, desired, and maximum capacity are also
taken into account for capacity-on-demand options and if total system-wide processor
capacity becomes disabled because of deconfigured failed processor cores. For example, if
total system-wide processor capacity is sufficient to run all partitions, at least with the
minimum capacity, the partitions are allowed to start or continue running. If processor
capacity is insufficient to run a partition at its minimum value, then starting that partition
results in an error condition that must be resolved.
4.2.2 General detection and deallocation of failing components
Runtime correctable or recoverable errors are monitored to determine if there is a pattern of
errors. If these components reach a predefined error limit, the service processor initiates an
action to deconfigure the faulty hardware, helping to avoid a potential system outage and to
enhance system availability.
Note: POWER7 processor-based servers are independent of the operating system for
error detection and fault isolation within the central electronics complex.
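The error-limit behavior described above can be modeled as a per-component counter maintained by the service processor. This is a conceptual sketch only; the `ErrorMonitor` class, the threshold value, and the component names are hypothetical, not IBM firmware internals.

```python
# Illustrative model of predictive deallocation: correctable errors are
# counted per component, and a component that crosses a predefined error
# limit is flagged for deconfiguration before it can cause an outage.

class ErrorMonitor:
    def __init__(self, error_limit):
        self.error_limit = error_limit
        self.counts = {}
        self.deconfigured = set()

    def record_correctable_error(self, component):
        """Record one correctable error; return True if this event
        pushed the component over its limit and triggered deallocation."""
        if component in self.deconfigured:
            return False
        self.counts[component] = self.counts.get(component, 0) + 1
        if self.counts[component] >= self.error_limit:
            # Flag the component for (persistent) deallocation.
            self.deconfigured.add(component)
            return True
        return False

mon = ErrorMonitor(error_limit=3)
events = [mon.record_correctable_error("core3") for _ in range(3)]
```

The first two correctable errors are absorbed silently; the third crosses the (hypothetical) limit and deconfigures the component.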
Persistent deallocation
To enhance system availability, a component that is identified for deallocation or
deconfiguration on a POWER processor-based system is flagged for persistent deallocation.
Component removal can occur either dynamically (while the system is running) or at boot
time (IPL), depending both on the type of fault and when the fault is detected.
In addition, runtime unrecoverable hardware faults can be deconfigured from the system after
the first occurrence. The system can be rebooted immediately after failure and resume
operation on the remaining stable hardware. This prevents the same faulty hardware
from affecting system operation again. The repair action is deferred to a more convenient,
less critical time.
Persistent deallocation functions include:
 Processor
 L2/L3 cache lines (cache lines are dynamically deleted)
 Memory
 Deconfigure or bypass failing I/O adapters
Processor instruction retry
As in POWER6, the POWER7 processor has the ability to perform processor instruction retry
and alternate processor recovery for a number of core-related faults. This ability significantly
reduces exposure to both permanent and intermittent errors in the processor core.
Intermittent errors, often caused by cosmic rays or other sources of radiation, are generally
not repeatable.
With this function, when an error is encountered in the core, in caches and certain logic
functions, the POWER7 processor first automatically retries the instruction. If the source of
the error was truly transient, the instruction succeeds and the system continues as before.
On IBM systems prior to POWER6, this error caused a checkstop.
Alternate processor retry
Hard failures are more difficult, being permanent errors that are replicated each time that the
instruction is repeated. Retrying the instruction does not help in this situation because the
instruction will continue to fail.
As in POWER6, POWER7 processors have the ability to extract the failing instruction from
the faulty core and retry it elsewhere in the system for a number of faults, after which the
failing core is dynamically deconfigured and scheduled for replacement.
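The retry-then-migrate flow just described can be summarized in a small sketch. Everything here is illustrative: the function name, the core and spare lists, and the retry count are assumptions for the example, not the actual POWER7 recovery mechanism, which operates in hardware and firmware.

```python
# Conceptual sketch of processor instruction retry plus alternate
# processor recovery: a transient fault succeeds on retry; a persistent
# fault causes the work to move to a spare core and the failing core
# to be deconfigured.

def execute_with_recovery(instruction, cores, spare_cores, retries=1):
    """Run `instruction` on the first core; retry on transient faults,
    and fail over to a spare core on persistent ones. Returns the core
    that ran the work and the resulting core list."""
    core = cores[0]
    for _ in range(retries + 1):
        if instruction(core):
            return core, cores          # transient fault: retry succeeded
    # Persistent fault: deconfigure the bad core, resume on a spare.
    replacement = spare_cores.pop(0)
    cores = [replacement] + cores[1:]
    instruction(replacement)            # work resumes on the spare core
    return replacement, cores

# A "hard" fault: always fails on core 0, succeeds on any other core.
hard_fault = lambda core: core != 0
core_used, cores = execute_with_recovery(hard_fault, [0, 1, 2], [7])
```

For a hard failure on core 0, the instruction is retried, still fails, and the work is transparently moved to spare core 7 while core 0 is dropped from the core list.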
Dynamic processor deallocation
Dynamic processor deallocation enables automatic deconfiguration of processor cores when
patterns of recoverable core-related faults are detected. Dynamic processor deallocation
prevents a recoverable error from escalating to an unrecoverable system error, which might
otherwise result in an unscheduled server outage. Dynamic processor deallocation relies on
the service processor’s ability to use FFDC-generated recoverable error information to notify
the POWER Hypervisor when a processor core reaches its predefined error limit. The
POWER Hypervisor then dynamically deconfigures the failing core, and the core is called out
for replacement. The entire process is transparent to the partition owning the failing instruction.
If there are available inactivated processor cores or CoD processor cores, the system
effectively puts a CoD processor into operation after an activated processor is determined to
no longer be operational. In this way, the server retains its full processor capacity.
If there are no CoD processor cores available system-wide, total processor capacity is
lowered below the licensed number of cores.
Single processor checkstop
As in POWER6, POWER7 provides single-processor check-stopping for certain processor
logic, command, or control errors that cannot be handled by the availability enhancements in
the preceding section.
This significantly reduces the probability of any one processor affecting total system
availability by containing most processor checkstops to the partition that was using the
processor at the time that the full checkstop goes into effect.
Even with all these availability enhancements to prevent processor errors from affecting
system-wide availability, certain errors might still result in a system-wide outage.
4.2.3 Memory protection
A memory protection architecture that provides good error resilience for a relatively small L1
cache might be very inadequate for protecting the much larger system main store. Therefore,
a variety of protection methods is used in POWER processor-based systems to avoid
uncorrectable errors in memory.
Memory protection plans must take into account many factors, including:
 Size
 Desired performance
 Memory array manufacturing characteristics
POWER7 processor-based systems have a number of protection schemes designed to
prevent, protect, or limit the effect of errors in main memory. These capabilities include:
64-byte ECC code
This innovative ECC algorithm from IBM research allows a full 8-bit device kill to be
corrected dynamically. This ECC code mechanism works on DIMM pairs on a rank basis.
(Depending on the size, a DIMM might have one, two, or four ranks.) With this ECC code,
an entirely bad DRAM chip can be marked as bad (chip mark). After marking the DRAM
as bad, the code corrects all the errors in the bad DRAM. It can additionally mark a 2-bit
symbol as bad and correct the 2-bit symbol, providing a double-error detect or single-error
correct ECC, or a better level of protection in addition to the detection or correction of a
chipkill event.
This improvement in the ECC word algorithm replaces the redundant bit steering used on
POWER6 systems.
The Power 770 and 780, and future POWER7 high-end machines, have a spare DRAM
chip per rank on each DIMM that can be spared out. Effectively, this protection means that
on a rank basis, a DIMM pair can detect and correct two and sometimes three chipkill
events and still provide better protection than ECC, explained in the previous paragraph.
Hardware scrubbing
Hardware scrubbing is a method used to deal with intermittent errors. IBM POWER
processor-based systems periodically address all memory locations. Any memory
locations with a correctable error are rewritten with the correct data.
CRC
The bus that is transferring data between the processor and the memory uses CRC error
detection with a failed operation-retry mechanism and the ability to dynamically retune bus
parameters when a fault occurs. In addition, the memory bus has spare capacity to
substitute a spare data bit-line, for that which is determined to be faulty.
Chipkill
Chipkill is an enhancement that enables a system to sustain the failure of an entire DRAM
chip. Chipkill spreads the bit lines from a DRAM over multiple ECC words so that a
catastrophic DRAM failure does not affect more of what is protected by the ECC code
implementation. The system can continue indefinitely in this state with no performance
degradation until the failed DIMM can be replaced. Figure 4-2 shows an example of how
chipkill technology spreads bit lines across multiple ECC words.
Figure 4-2 Chipkill in action with a spare memory DRAM chip on a Power 770 and Power 780
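The bit-line spreading that makes chipkill work can be shown with a toy layout. This sketch is purely illustrative (the chip count, word count, and mapping function are assumptions for the example): because each DRAM chip contributes at most one bit to any ECC word, losing an entire chip costs every ECC word only a single, correctable bit.

```python
# Toy illustration of the chipkill layout: bit i of every ECC word is
# stored on DRAM chip i, so a whole-chip failure never corrupts more
# than one bit per ECC word. Sizes are illustrative only.

NUM_CHIPS = 8       # bits per ECC word == number of DRAM chips
WORDS = 4

def chip_for_bit(word_index, bit_index):
    """Map (ECC word, bit position) to the DRAM chip that stores it."""
    return bit_index

def bits_lost_per_word(failed_chip):
    """How many bits each ECC word loses if one entire chip fails."""
    return [
        sum(1 for bit in range(NUM_CHIPS)
            if chip_for_bit(word, bit) == failed_chip)
        for word in range(WORDS)
    ]

losses = bits_lost_per_word(failed_chip=3)
```

Whichever chip fails, every ECC word loses exactly one bit, which a single-error-correct ECC code can repair; packing all of a chip's bits into one word would instead produce an uncorrectable multi-bit error.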
POWER7 memory subsystem
The POWER7 chip contains two memory controllers with four channels per memory
controller. Each channel connects to a single DIMM, but because the channels work in pairs,
a processor chip can address four DIMM pairs, two pairs per memory controller.
The bus transferring data between the processor and the memory uses CRC error detection
with a failed operation-retry mechanism and the ability to dynamically retune bus parameters
when a fault occurs. In addition, the memory bus has spare capacity to substitute a spare
data bit-line, for that which is determined to be faulty.
Figure 4-3 shows a POWER7 chip, with its memory interface, consisting of two controllers
and four DIMMs per controller. Advanced memory buffer chips are exclusive to IBM and help
to increase performance, acting as read/write buffers. On the Power 770 and Power 780, the
advanced memory buffer chips are integrated into the DIMM that they support.
Figure 4-3 POWER7 memory subsystem (eight cores with 256 KB of L2 cache each, a shared 32 MB L3 cache, and two memory controllers, each attached through buffer chips to four DIMMs)
Memory page deallocation
Although coincident cell errors in separate memory chips are a statistical rarity, IBM POWER
processor-based systems can contain these errors by using a memory page deallocation
scheme for partitions that are running IBM AIX and IBM i operating systems, as well as for
memory pages owned by the POWER Hypervisor. If a memory address experiences an
uncorrectable or repeated correctable single cell error, the service processor sends the
memory page address to the POWER Hypervisor to be marked for deallocation.
Pages used by the POWER Hypervisor are deallocated as soon as the page is released.
In other cases, the POWER Hypervisor notifies the owning partition that the page should be
deallocated. Where possible, the operating system moves any data currently contained in
that memory area to another memory area and removes the page (or pages) that are
associated with this error from its memory map, no longer addressing these pages. The
operating system performs memory page deallocation without any user intervention, and the
process is transparent to users and applications.
The POWER Hypervisor maintains a list of pages that are marked for deallocation during the
current platform Initial Program Load (IPL). During a partition IPL, the partition receives a list
of all the bad pages in its address space. In addition, if memory is dynamically added to a
partition (through a dynamic LPAR operation), the POWER Hypervisor warns the operating
system when memory pages are included that need to be deallocated.
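The bad-page bookkeeping described above can be sketched as a set intersected with a partition's address space. The `BadPageList` class and its method names are hypothetical illustrations of the behavior in the text, not POWER Hypervisor structures.

```python
# Sketch of the hypervisor's bad-page bookkeeping: pages marked for
# deallocation during the current platform IPL are tracked, and a
# partition receives the bad pages within its address space at partition
# IPL (or when those pages are added later via a dynamic LPAR operation).

class BadPageList:
    def __init__(self):
        self.bad_pages = set()

    def mark_for_deallocation(self, page_addr):
        """Service processor reports a page address for deallocation."""
        self.bad_pages.add(page_addr)

    def pages_for_partition(self, partition_pages):
        """Bad pages inside a partition's address space, to be handed to
        the operating system so it stops addressing them."""
        return sorted(self.bad_pages & set(partition_pages))

hyp = BadPageList()
hyp.mark_for_deallocation(0x2000)
hyp.mark_for_deallocation(0x9000)   # outside this partition's range
reported = hyp.pages_for_partition(range(0x0, 0x8000, 0x1000))
```

Only the bad page that falls inside the partition's address range is reported to that partition; the other marked page stays on the platform-wide list.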
Finally, if an uncorrectable error in memory is discovered, the logical memory block
associated with the address with the uncorrectable error is marked for deallocation by the
POWER Hypervisor. This deallocation takes effect on a partition reboot if the logical memory
block is assigned to an active partition at the time of the fault.
In addition, the system deallocates the entire memory group that is associated with the error
on all subsequent system reboots until the memory is repaired. This is intended to guard
against future uncorrectable errors while waiting for parts replacement.
Memory persistent deallocation
Defective memory that is discovered at boot time is automatically switched off. If the service
processor detects a memory fault at boot time, it marks the affected memory as bad so that it
is not to be used on subsequent reboots.
If the service processor identifies faulty memory in a server that includes CoD memory, the
POWER Hypervisor attempts to replace the faulty memory with available CoD memory.
Faulty resources are marked as deallocated, and working resources are included in the active
memory space. Because these activities reduce the amount of CoD memory available for
future use, schedule repair of the faulty memory as soon as convenient.
Upon reboot, if not enough memory is available to meet minimum partition requirements, the
POWER Hypervisor reduces the capacity of one or more partitions.
Depending on the configuration of the system, the HMC Service Focal Point™, the OS
Service Focal Point, or the service processor receives a notification of the failed component
and triggers a service call.
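The CoD substitution arithmetic described above can be sketched in a few lines. The function name, the GB granularity, and the sample values are assumptions for the example; the text only specifies the behavior, faulty memory is backfilled from available CoD memory, and partition capacity is reduced if not enough remains.

```python
# Illustrative sketch of boot-time memory deallocation with CoD
# substitution: faulty memory is deconfigured, active capacity is
# backfilled from unlicensed CoD memory where possible, and whatever
# remains is what partitions must fit into.

def usable_memory(licensed_gb, faulty_gb, cod_spare_gb):
    """Return (usable_gb, cod_consumed_gb) after substituting available
    CoD memory for deconfigured faulty memory."""
    cod_consumed = min(faulty_gb, cod_spare_gb)
    usable = licensed_gb - faulty_gb + cod_consumed
    return usable, cod_consumed

# 256 GB active, 16 GB faulty, 32 GB CoD spare: full capacity is kept.
full = usable_memory(256, 16, 32)
# 256 GB active, 16 GB faulty, only 8 GB CoD spare: capacity drops.
reduced = usable_memory(256, 16, 8)
```

In the first case the server keeps its full 256 GB (consuming 16 GB of CoD memory, which is why the text advises scheduling the repair promptly); in the second, only 248 GB remain, and the hypervisor would reduce partition capacity if minimums cannot be met.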
4.2.4 Active Memory Mirroring for Hypervisor
Active Memory Mirroring (AMM) for Hypervisor is a hardware and firmware function of
Power 770 and Power 780 systems that provides the ability of the POWER7 chip to create
two copies of data in memory. Having two copies eliminates a system-wide outage due to an
uncorrectable failure of a single DIMM in the main memory used by the hypervisor (also
called System firmware). This capability is standard and enabled by default on the Power 780
server. On the Power 770 it is an optional chargeable feature.
What memory is mirrored
These are the areas of memory that are mirrored:
Hypervisor data that is mirrored
– Hardware Page Tables (HPTs) that are managed by the hypervisor on behalf of
partitions to track the state of the memory pages assigned to the partition
– Translation control entries (TCEs) that are managed by the hypervisor on behalf of
partitions to communicate with partition I/O buffers for I/O devices
– Hypervisor code (instructions that make up the hypervisor kernel)
– Memory used by hypervisor to maintain partition configuration, I/O states, Virtual I/O
information, partition state, and so on
Note: Memory page deallocation handles single cell failures, but because of the sheer size
of data in a data bit line, it might be inadequate for dealing with more catastrophic failures.
Hypervisor data that is not mirrored
– Advanced Memory Sharing (AMS) pool
– Memory used to hold contents of platform dump while waiting for offload to
management console
Partition data that is not mirrored
– Desired memory configured for individual partitions is not mirrored.
To enable mirroring, the requirement is to have eight equally sized functional memory DIMMs
behind at least one POWER7 chip in each CEC enclosure. The DIMMs are managed by
the same memory controller. The sizes of the DIMMs might differ from one POWER7 chip
to another.
A write operation in the memory begins on the first DIMM of a mirrored DIMM pair. When
this write is complete, the POWER7 chip writes the same data to a second DIMM of the
DIMM pair.
The read operations alternate between both DIMMs.
Figure 4-4 shows the hardware implementation of Memory Mirroring for Hypervisor.
Figure 4-4 Hardware implementation of Memory Mirroring for Hypervisor
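The mirrored data path just described can be modeled in a few lines: every write lands on both DIMMs of the pair, and reads alternate between them. This is an illustrative sketch only; the `MirroredPair` class is hypothetical, and the real mirroring is performed by the POWER7 chip's memory controllers, not software.

```python
# Simplified model of the Active Memory Mirroring data path: a write
# completes on the first DIMM and is then repeated on its mirror, while
# reads alternate between the two DIMMs of the pair.

class MirroredPair:
    def __init__(self, size):
        self.dimm0 = [0] * size
        self.dimm1 = [0] * size
        self._next_read = 0

    def write(self, addr, value):
        self.dimm0[addr] = value     # first write completes...
        self.dimm1[addr] = value     # ...then the mirror copy is written

    def read(self, addr):
        dimm = (self.dimm0, self.dimm1)[self._next_read]
        self._next_read ^= 1         # alternate reads between the DIMMs
        return dimm[addr]

pair = MirroredPair(size=4)
pair.write(2, 42)
values = [pair.read(2), pair.read(2)]
```

Both copies hold the same data, so either DIMM can satisfy any read, which is why a single-DIMM uncorrectable failure no longer takes down hypervisor memory.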
The impact on performance is very low. Although write operations are slightly slower
because two writes are actually done, reads are faster because two sources for the data are
used. Measured commercial workloads show no gain or loss in performance due to mirroring.
HPC workloads performing huge amounts of string manipulation might see a slight
performance effect.
The Active Memory Mirroring can be disabled or enabled on the management console using
the Advanced tab of the server properties (Figure 4-5).
Figure 4-5 Enabling or disabling Active Memory Mirroring
The system must be entirely powered off and then powered on to change from mirroring
mode to non-mirrored mode.
This same panel also gives information about the mirroring status:
Desired mirroring mode: Takes the values “Off” or “System firmware only”
System firmware mirroring status
– Fully mirrored: The mirroring is completely functional.
– Partially functional: Due to uncorrectable memory failures, some of the hypervisor
elements or objects are not mirrored. The system remains partially mirrored until the
DIMM is replaced and the system is rebooted.
– Not mirrored: At the last power on of the system, the desired state was “mirroring off.”
Mirrorable memory: Total amount of physical memory that can be mirrored, which is
based on the DIMMs that are plugged
Mirrored memory in use
Available mirrored memory
Mirroring optimization
Hypervisor mirroring requires specific memory locations. Those locations might be assigned
to other purposes (LPAR memory, for example) because memory is managed on a logical
memory block (LMB) basis. To “reclaim” those memory locations, an optimization tool is available
on the Advanced tab of the system properties (Figure 4-6).
Figure 4-6 Optimization Tool
You can define the amount of memory available for mirroring by either selecting a custom
value or making available as much mirrorable memory as possible. After you select OK, the
tool copies the active partitions’ contents from one LMB to another to free pairs of mirrored
memory. The copy operation will have a slight impact on performance while in progress.
The operation can be stopped by selecting Cancel. A time limit can also be specified.
DIMM guard at system boot
During system boot the FSP will guard a failing DIMM. Because there will not be eight
functional DIMMs behind a memory controller, hypervisor mirroring is not possible on this
chip. Then at boot time:
If there are other chips in the book with mirrorable memory, the system will boot
fully mirrored.
If this was the only mirrorable memory in this book, the hypervisor enters a partially mirrored
state. Not all of the hypervisor objects are mirrored, and those that are not are unprotected.
Hypervisor will continue to mirror as much as possible to continue to provide protection. If
a second uncorrectable error occurs in the same CEC while in partial mirror state, this will
likely result in system failure. The system remains partially mirrored until the DIMM is
replaced and the CEC is rebooted.
Advanced memory mirroring features
On the Power 770 server, the Active Memory Mirroring for Hypervisor function is an
optional chargeable feature. It must be selected in econfig.
On this server, the advanced memory mirroring is activated by entering an activation code
(also called Virtualization Technology Code, or VET) in the management console. If the
customer enables mirroring from the management console without entering the activation
code, the system boots only to standby and will wait for the customer to enter the VET code
(New SRC A700474A displays). If mirroring was enabled by mistake, you must disable it and
power cycle the CEC, as mirroring state requires a CEC reboot to change. Hypervisor
mirroring is disabled by default on the Power 770 server.
On the Power 780 server, this feature is standard. There is no individual feature code in
econfig. The mirroring is enabled by default on the server.
4.2.5 Cache protection
POWER7 processor-based systems are designed with cache protection mechanisms,
including cache-line delete in both L2 and L3 arrays, Processor Instruction Retry and
Alternate Processor Recovery protection on L1-I and L1-D, and redundant Repair bits in L1-I,
L1-D, and L2 caches, and in L2 and L3 directories.
L1 instruction and data array protection
The POWER7 processor’s instruction and data caches are protected against intermittent
errors by using Processor Instruction Retry and against permanent errors by Alternate
Processor Recovery, both mentioned previously. The L1 cache is divided into sets, and the
POWER7 processor can deallocate all but one set before doing a Processor Instruction Retry.
In addition, faults in the Segment Lookaside Buffer (SLB) array are recoverable by the
POWER Hypervisor. The SLB is used in the core to perform address translation calculations.
L2 and L3 array protection
The L2 and L3 caches in the POWER7 processor are protected with double-bit detect
single-bit correct error detection code (ECC). Single-bit errors are corrected before being
forwarded to the processor and are subsequently written back to L2 and L3.
In addition, the caches maintain a cache-line delete capability. A threshold of correctable
errors detected on a cache line can result in the data in the cache line being purged and the
cache line removed from further operation without requiring a reboot. An ECC uncorrectable
error detected in the cache can also trigger a purge and delete of the cache line. This
results in no loss of operation because an unmodified copy of the data held in system
memory can be used to reload the cache line. Modified data is handled through
Special Uncorrectable Error handling.
L2-deleted and L3-deleted cache lines are marked for persistent deconfiguration on
subsequent system reboots until the processor card can be replaced.
4.2.6 Special uncorrectable error handling
Although rare, an uncorrectable data error can occur in memory or a cache. IBM POWER7
processor-based systems attempt to limit the impact of an uncorrectable error to the least
possible disruption, using a well-defined strategy that first considers the data source.
Sometimes an uncorrectable error is temporary in nature and occurs in data that can be
recovered from another repository. For example:
Data in the instruction L1 cache is never modified within the cache itself. Therefore, an
uncorrectable error discovered in the cache is treated like an ordinary cache-miss, and
correct data is loaded from the L2 cache.
The L2 and L3 cache of the POWER7 processor-based systems can hold an unmodified
copy of data in a portion of main memory. In this case, an uncorrectable error simply
triggers a reload of a cache line from main memory.
In cases where the data cannot be recovered from another source, a technique called Special
Uncorrectable Error (SUE) handling is used to prevent an uncorrectable error in memory or
cache from immediately causing the system to terminate. Instead, the system tags the data
and determines whether it can ever be used again.
If the error is irrelevant, it does not force a checkstop.
If the data is used, termination can be limited to the program, kernel, or hypervisor owning
the data, or a freezing of the I/O adapters that are controlled by an I/O hub controller if
data is to be transferred to an I/O device.
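The tag-and-defer behavior of SUE handling can be illustrated with a small model. The `Memory` class, the sentinel tag, and the addresses are hypothetical stand-ins: the real mechanism modifies the ECC word in hardware, and "termination" may mean a process, kernel, or hypervisor action rather than a Python exception.

```python
# Conceptual sketch of Special Uncorrectable Error (SUE) handling:
# corrupt data is tagged rather than causing an immediate checkstop,
# and the error surfaces only if some consumer actually uses the data.

SUE_TAG = object()   # stands in for the modified (invalid) ECC word

class Memory:
    def __init__(self):
        self.words = {}

    def mark_sue(self, addr):
        self.words[addr] = SUE_TAG    # data is poisoned, not yet fatal

    def load(self, addr, default=0):
        value = self.words.get(addr, default)
        if value is SUE_TAG:
            # Only now does the error terminate the consuming process.
            raise RuntimeError("SUE: corrupt data consumed at %#x" % addr)
        return value

mem = Memory()
mem.mark_sue(0x100)           # no failure at detection time
untouched = mem.load(0x200)   # unrelated loads proceed normally
try:
    mem.load(0x100)
    consumed_ok = True
except RuntimeError:
    consumed_ok = False
```

Marking the data causes no disruption by itself; if the tagged word is never referenced again, the error is effectively irrelevant, exactly the deferral the text describes.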
When an uncorrectable error is detected, the system modifies the associated ECC word,
thereby signaling to the rest of the system that the standard ECC is no longer valid. The
service processor is then notified and takes appropriate actions. When running AIX V5.2 (or
later) or Linux, and a process attempts to use the data, the operating system is informed of
the error and might terminate, or only terminate a specific process associated with the corrupt
data, depending on the operating system and firmware level and whether the data was
associated with a kernel or non-kernel process.
Only when the corrupt data is being used by the POWER Hypervisor can the entire system be
rebooted, thereby preserving overall system integrity. If Active Memory Mirroring is enabled,
the entire system is protected and continues to run.
Depending on the system configuration and the source of the data, errors encountered during
I/O operations might not result in a machine check. Instead, the incorrect data is handled by
the PCI host bridge (PHB) chip. When the PHB chip detects a problem, it rejects the data,
preventing data from being written to the I/O device. The PHB then enters a freeze mode,
halting normal operations. Depending on the model and type of I/O being used, the freeze
can include the entire PHB chip, or simply a single bridge, resulting in the loss of all I/O
operations that use the frozen hardware until a power-on reset of the PHB. The impact to
partitions depends on how the I/O is configured for redundancy. In a server that is configured
for fail-over availability, redundant adapters spanning multiple PHB chips can enable the
system to recover transparently, without partition loss.
4.2.7 PCI enhanced error handling
IBM estimates that PCI adapters can account for a significant portion of the hardware-based
errors on a large server. Although servers that rely on boot-time diagnostics can identify
failing components to be replaced by hot-swap and reconfiguration, runtime errors pose a
more significant problem.
PCI adapters are generally complex designs involving extensive on-board instruction
processing, often on embedded microcontrollers. They tend to use industry-standard-grade
components, with an emphasis on product cost relative to high reliability. In certain
cases, they might be more likely to encounter internal microcode errors or many of the
hardware errors described for the rest of the server.
The traditional means of handling these problems is through adapter internal-error reporting
and recovery techniques, in combination with operating system device-driver management
and diagnostics. In certain cases, an error in the adapter can cause transmission of bad data
on the PCI bus itself, resulting in a hardware-detected parity error and causing a global
machine check interrupt, eventually requiring a system reboot to continue.
PCI enhanced error handling-enabled adapters respond to a special data packet that is
generated from the affected PCI slot hardware by calling system firmware, which examines
the affected bus, allows the device driver to reset it, and continues without a system reboot.
For Linux, enhanced error handling (EEH) support extends to the majority of frequently used
devices, although various third-party PCI devices might not provide native EEH support.
To detect and correct PCIe bus errors, POWER7 processor-based systems use CRC
detection and instruction retry correction. For PCI-X, it uses ECC.
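The detect-and-retry idea behind the PCIe bus protection can be illustrated with a toy transmitter that appends a CRC and retransmits on a mismatch. This is a hedged model using Python's `zlib.crc32`; the real PCIe link layer uses its own LCRC and hardware replay buffers, not code like this.

```python
import zlib

def send_with_retry(payload: bytes, channel, max_retries: int = 3):
    # Append a CRC32 to the payload; the "receiver" recomputes it, and a
    # mismatch triggers a retransmission (illustrative model only).
    for attempt in range(max_retries + 1):
        frame = payload + zlib.crc32(payload).to_bytes(4, "big")
        received = channel(frame)                     # may corrupt the frame
        data, crc = received[:-4], int.from_bytes(received[-4:], "big")
        if zlib.crc32(data) == crc:
            return data, attempt                      # clean frame delivered
    raise IOError("unrecoverable link error after retries")
```

A single corrupted transmission is absorbed by one retry; only persistent corruption surfaces as an unrecoverable link error.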
Figure 4-7 shows the location and mechanisms used throughout the I/O subsystem for
PCI-enhanced error handling.
Figure 4-7 PCI-enhanced error handling
4.2.8 POWER7 I/O chip freeze behavior
The POWER7 I/O chip implements a “freeze behavior” for uncorrectable errors borne on the
GX+ bus and for internal POWER7 I/O chip errors detected by the POWER7 I/O chip. With
this freeze behavior, the chip refuses I/O requests to the attached I/O, but does not checkstop
the system. This allows systems with redundant I/O to continue operating without an
outage instead of system checkstops seen in earlier chips, such as the POWER5 I/O chip
used on POWER6 processor-based systems.
4.3 Serviceability
IBM Power Systems design considers both IBM and client needs. The IBM Serviceability
Team has enhanced the base service capabilities and continues to implement a strategy that
incorporates best-of-breed service characteristics from diverse IBM systems offerings.
Serviceability includes system installation, system upgrades and downgrades (MES), and
system maintenance and repair.
The goal of the IBM Serviceability Team is to design and provide the most efficient system
service environment that includes:
- Easy access to service components; design for Customer Set Up (CSU), Customer
Installed Features (CIF), and Customer Replaceable Units (CRU)
- On demand service education
- Error detection and fault isolation (ED/FI)
- First-failure data capture (FFDC)
- An automated guided repair strategy that uses common service interfaces for a converged
service approach across multiple IBM server platforms
By delivering on these goals, IBM Power Systems servers enable faster and more accurate
repair and reduce the possibility of human error.
Client control of the service environment extends to firmware maintenance on all of the
POWER processor-based systems. This strategy contributes to higher systems availability
with reduced maintenance costs.
This section provides an overview of the progressive steps of error detection, analysis,
reporting, notification, and repairing that are found in all POWER processor-based systems.
4.3.1 Detecting
The first and most crucial component of a solid serviceability strategy is the ability to
accurately and effectively detect errors when they occur. Although not all errors are a
guaranteed threat to system availability, those that go undetected can cause problems
because the system does not have the opportunity to evaluate and act if necessary. POWER
processor-based systems employ System z® server-inspired error detection mechanisms
that extend from processor cores and memory to power supplies and hard drives.
Service processor
The service processor is a microprocessor that is powered separately from the main
instruction processing complex. The service processor provides the capabilities for:
POWER Hypervisor (system firmware) and Hardware Management Console
connection surveillance
Several remote power control options
Reset and boot features
Environmental monitoring
The service processor monitors the server’s built-in temperature sensors, sending
instructions to the system fans to increase rotational speed when the ambient temperature
is above the normal operating range. Using an architected operating system interface, the
service processor notifies the operating system of potential environmentally related
problems so that the system administrator can take appropriate corrective actions before
a critical failure threshold is reached.
The service processor can also post a warning and initiate an orderly system shutdown in
the following circumstances:
– The operating temperature exceeds the critical level (for example, failure of air
conditioning or air circulation around the system).
– The system fan speed is out of operational specification (for example, because of
multiple fan failures).
– The server input voltages are out of operational specification.
The service processor can immediately shut down a system in the
following circumstances:
– Temperature exceeds the critical level or remains above the warning level for too long.
– Internal component temperatures reach critical levels.
– Non-redundant fan failures occur.
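The tiered thermal responses described above can be sketched as a simple threshold function. The temperature values here are invented for illustration; the real service processor uses architected sensor limits and also escalates when a reading stays above the warning level for too long.

```python
def thermal_action(ambient_c: float,
                   normal_c: float = 28.0,
                   warn_c: float = 40.0,
                   crit_c: float = 55.0) -> str:
    # Threshold values are invented for this sketch; checks run from the
    # most severe condition down to the least severe.
    if ambient_c >= crit_c:
        return "immediate shutdown"
    if ambient_c >= warn_c:
        return "warn and orderly shutdown"
    if ambient_c > normal_c:
        return "increase fan speed"
    return "normal"
```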
Placing calls
On systems without a Hardware Management Console, the service processor can place
calls to report surveillance failures with the POWER Hypervisor, critical environmental
faults, and critical processing faults even when the main processing unit is inoperable.
Mutual surveillance
The service processor monitors the operation of the POWER Hypervisor firmware during
the boot process and watches for loss of control during system operation. It also allows
the POWER Hypervisor to monitor service processor activity. The service processor can
take appropriate action, including calling for service, when it detects that the POWER
Hypervisor firmware has lost control. Likewise, the POWER Hypervisor can request a
service processor repair action if necessary.
Availability
The auto-restart (reboot) option, when enabled, can reboot the system automatically
following an unrecoverable firmware error, firmware hang, hardware failure, or
environmentally induced (AC power) failure.
Figure 4-8 ASMI Auto Power Restart setting panel
Fault monitoring
Built-in self-test (BIST) checks processor, cache, memory, and associated hardware that
is required for proper booting of the operating system, when the system is powered on at
the initial installation or after a hardware configuration change (for example, an upgrade).
If a non-critical error is detected or if the error occurs in a resource that can be removed
from the system configuration, the booting process is designed to proceed to completion.
The errors are logged in the system nonvolatile random access memory (NVRAM). When
the operating system completes booting, the information is passed from the NVRAM to the
system error log where it is analyzed by error log analysis (ELA) routines. Appropriate
actions are taken to report the boot-time error for subsequent service, if required.
Note: The auto-restart (reboot) option has to be enabled from the Advanced System
Management Interface (ASMI) or from the Control (Operator) Panel. Figure 4-8 shows this
option using the ASMI.
Concurrent access to the service processor menus of the Advanced System
Management Interface (ASMI)
This access provides the nondisruptive ability to change system default parameters,
interrogate service processor progress and error logs, and set and reset server indicators
(Guiding Light for midrange and high-end servers, Light Path for low-end servers),
accessing all service processor functions without having to power down the system to the
standby state. This allows the administrator or service representative to dynamically
access the menus from any web browser-enabled console that is attached to the Ethernet
service network, concurrently with normal system operation.
Managing the interfaces for connecting uninterruptible power source systems to the
POWER processor-based systems, performing Timed Power-On (TPO) sequences, and
interfacing with the power and cooling subsystem
Error checkers
IBM POWER processor-based systems contain specialized hardware detection circuitry that
is used to detect erroneous hardware operations. Error checking hardware ranges from parity
error detection coupled with processor instruction retry and bus retry, to ECC correction on
caches and system buses. All IBM hardware error checkers have distinct attributes:
- Continuous monitoring of system operations to detect potential calculation errors
- Attempts to isolate physical faults based on runtime detection of each unique failure
- Ability to initiate a wide variety of recovery mechanisms designed to correct the problem;
the POWER processor-based systems include extensive hardware and firmware
recovery logic
Fault isolation registers
Error checker signals are captured and stored in hardware fault isolation registers (FIRs). The
associated logic circuitry is used to limit the domain of an error to the first checker that
encounters the error. In this way, runtime error diagnostics can be deterministic so that for
every check station, the unique error domain for that checker is defined and documented.
Ultimately, the error domain becomes the field-replaceable unit (FRU) call, and manual
interpretation of the data is not normally required.
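The latch-the-first-checker behavior of a FIR can be modeled in a few lines. This is a minimal sketch with invented checker and FRU names; actual FIRs are hardware registers written by detection circuitry, not software objects.

```python
class FaultIsolationRegister:
    # Toy model of a FIR: only the first checker to fire is latched, so the
    # error domain (and the resulting FRU call) is deterministic.
    def __init__(self):
        self.first_checker = None

    def report(self, checker_id: str):
        if self.first_checker is None:     # later checkers do not overwrite
            self.first_checker = checker_id

    def fru_call(self, domain_to_fru: dict):
        # Map the documented error domain straight to a FRU call, with no
        # manual interpretation of the data required.
        return domain_to_fru.get(self.first_checker)
```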
First-failure data capture
FFDC is an error isolation technique, which ensures that when a fault is detected in a
system through error checkers or other types of detection methods, the root cause of the fault
will be captured without the need to re-create the problem or run an extended tracing or
diagnostics program.
For the vast majority of faults, a good FFDC design means that the root cause is detected
automatically without intervention by a service representative. Pertinent error data related to
the fault is captured and saved for analysis. In hardware, FFDC data is collected from the
fault isolation registers and from the associated logic. In firmware, this data consists of return
codes, function calls, and so forth.
FFDC check stations are carefully positioned within the server logic and data paths to
ensure that potential errors can be quickly identified and accurately tracked to a
field-replaceable unit (FRU).
This proactive diagnostic strategy is a significant improvement over the classic, less accurate
reboot and diagnose service approaches.
Figure 4-9 shows a schematic of a fault isolation register implementation.
Figure 4-9 Schematic of FIR implementation
Fault isolation
The service processor interprets error data that is captured by the FFDC checkers (saved in
the FIRs or other firmware-related data capture methods) to determine the root cause of the
error event.
Root cause analysis might indicate that the event is recoverable, meaning that a service
action point or need for repair has not been reached. Alternatively, it could indicate that a
service action point has been reached, where the event exceeded a pre-determined
threshold or was unrecoverable. Based on the isolation analysis, recoverable error
threshold counts can be incremented. No specific service action is necessary when the
event is recoverable.
When the event requires a service action, additional required information is collected to
service the fault. For unrecoverable errors or for recoverable events that meet or exceed their
service threshold, meaning that a service action point has been reached, a request for
service is initiated through an error logging component.
4.3.2 Diagnosing
Using the extensive network of advanced and complementary error detection logic that is built
directly into hardware, firmware, and operating systems, the IBM Power Systems servers can
perform considerable self-diagnosis.
Boot time
When an IBM Power Systems server powers up, the service processor initializes the system
hardware. Boot-time diagnostic testing uses a multi-tier approach for system validation,
starting with managed low-level diagnostics that are supplemented with system firmware
initialization and configuration of I/O hardware, followed by OS-initiated software test
routines. Boot-time diagnostic routines include:
Built-in self-tests (BISTs) for both logic components and arrays ensure the internal
integrity of components. Because the service processor assists in performing these tests,
the system is enabled to perform fault determination and isolation, whether or not the
system processors are operational. Boot-time BISTs can also find faults undetectable by
processor-based power-on self-test (POST) or diagnostics.
Wire-tests discover and precisely identify connection faults between components such as
processors, memory, or I/O hub chips.
Initialization of components such as ECC memory, typically by writing patterns of data and
allowing the server to store valid ECC data for each location, can help isolate errors.
To minimize boot time, the system determines which of the diagnostics are required to be
started to ensure correct operation, based on the way that the system was powered off, or on
the boot-time selection menu.
Run time
All Power Systems servers can monitor critical system components during run time, and they
can take corrective actions when recoverable faults occur. IBM hardware error-check
architecture provides the ability to report non-critical errors in an out-of-band communications
path to the service processor without affecting system performance.
A significant part of IBM runtime diagnostic capabilities originate with the service processor.
Extensive diagnostic and fault analysis routines have been developed and improved over
many generations of POWER processor-based servers, and enable quick and accurate
predefined responses to both actual and potential system problems.
The service processor correlates and processes runtime error information using logic
derived from IBM engineering expertise to count recoverable errors (called thresholding) and
predict when corrective actions must be automatically initiated by the system. These actions
can include:
- Requests for a part to be replaced
- Dynamic invocation of built-in redundancy for automatic replacement of a failing part
- Dynamic deallocation of failing components so that system availability is maintained
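The thresholding described above, counting recoverable errors and initiating a corrective action once a predefined count is crossed, can be sketched as follows. The threshold value and action names are illustrative, not the values IBM firmware actually uses.

```python
class RecoverableErrorCounter:
    # Count recoverable errors per resource ("thresholding") and request a
    # corrective action once a predefined count is crossed. The threshold
    # and action names are invented for this sketch.
    def __init__(self, threshold: int = 24):
        self.threshold = threshold
        self.counts = {}

    def record(self, resource: str) -> str:
        self.counts[resource] = self.counts.get(resource, 0) + 1
        if self.counts[resource] >= self.threshold:
            return "deallocate"    # e.g. dynamic deallocation of the part
        return "log only"
```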
Device drivers
In certain cases diagnostics are best performed by operating system-specific drivers, most
notably I/O devices that are owned directly by a logical partition. In these cases, the operating
system device driver often works in conjunction with I/O device microcode to isolate and
recover from problems. Potential problems are reported to an operating system device driver,
which logs the error. I/O devices can also include specific exercisers that can be invoked by
the diagnostic facilities for problem recreation if required by service procedures.
4.3.3 Reporting
In the unlikely event that a system hardware or environmentally induced failure is diagnosed,
IBM Power Systems servers report the error through a number of mechanisms. The analysis
result is stored in system NVRAM. Error log analysis (ELA) can be used to display the failure
cause and the physical location of the failing hardware.
With the integrated service processor, the system has the ability to automatically send out an
alert through a phone line to a pager, or call for service in the event of a critical system failure.
A hardware fault also illuminates the amber system fault LED located on the system unit to
alert the user of an internal hardware problem.
On POWER7 processor-based servers, hardware and software failures are recorded in the
system log. When a management console is attached, an ELA routine analyzes the error,
forwards the event to the Service Focal Point (SFP) application running on the management
console, and has the capability to notify the system administrator that it has isolated a likely
cause of the system problem. The service processor event log also records unrecoverable
checkstop conditions, forwards them to the Service Focal Point (SFP) application, and
notifies the system administrator. After the information is logged in the SFP application, if the
system is properly configured, a call-home service request is initiated and the pertinent failure
data with service parts information and part locations is sent to the IBM service
organization. This information will also contain the client contact information as defined in the
Electronic Service Agent (ESA) guided set-up wizard.
Error logging and analysis
When the root cause of an error has been identified by a fault isolation component, an error
log entry is created with basic data such as:
- An error code uniquely describing the error event
- The location of the failing component
- The part number of the component to be replaced, including pertinent data such as
engineering and manufacturing levels
- Return codes
- Resource identifiers
- FFDC data
Data containing information about the effect that the repair will have on the system is also
included. Error log routines in the operating system and FSP can then use this information
and decide whether the fault is a call home candidate. If the fault requires support
intervention, then a call will be placed with service and support and a notification sent to the
contact defined in the ESA guided set-up wizard.
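The basic error log data listed above maps naturally onto a record type. This is a speculative Python shape with made-up sample values; the real log format is internal to the service processor and operating system.

```python
from dataclasses import dataclass, field

@dataclass
class ErrorLogEntry:
    # Fields mirror the basic data listed above; the shape is illustrative.
    error_code: str                    # uniquely describes the error event
    location: str                      # physical location of the failing part
    part_number: str                   # FRU part number with engineering level
    return_codes: list = field(default_factory=list)
    resource_ids: list = field(default_factory=list)
    ffdc_data: dict = field(default_factory=dict)
    repair_impact: str = ""            # effect the repair will have on the system

    def is_call_home_candidate(self, threshold_reached: bool) -> bool:
        # ELA-style decision: call home only when support intervention
        # is required (unrecoverable, or service threshold reached)
        return threshold_reached
```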
Remote support
The Resource Monitoring and Control (RMC) subsystem is delivered as part of the base
operating system, including the operating system running on the Hardware Management
Console. RMC provides a secure transport mechanism across the LAN interface between the
operating system and the Hardware Management Console and is used by the operating
system diagnostic application for transmitting error information. RMC also performs a number
of other functions, but these are not used for the service infrastructure.
Service Focal Point
A critical requirement in a logically partitioned environment is to ensure that errors are not lost
before being reported for service, and that an error should only be reported once, regardless
of how many logical partitions experience the potential effect of the error. The Manage
Serviceable Events task on the management console is responsible for aggregating duplicate
error reports, and ensures that all errors are recorded for review and management.
When a local or globally reported service request is made to the operating system, the
operating system diagnostic subsystem uses the Resource Monitoring and Control (RMC)
subsystem to relay error information to the Hardware Management Console. For
global events (platform unrecoverable errors, for example) the service processor will also
forward error notification of these events to the Hardware Management Console, providing a
redundant error-reporting path in case of errors in the RMC network.
The first occurrence of each failure type is recorded in the Manage Serviceable Events task
on the management console. This task then filters and maintains a history of duplicate reports
from other logical partitions on the service processor. It then looks at all active service event
requests, analyzes the failure to ascertain the root cause and, if enabled, initiates a call home
for service. This methodology ensures that all platform errors will be reported through at least
one functional path, ultimately resulting in a single notification for a single problem.
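The duplicate-aggregation behavior of the Manage Serviceable Events task can be sketched as a table keyed by failure signature, where only the first report opens a new serviceable event. The names and structure here are invented for illustration.

```python
class ServiceFocalPoint:
    # Aggregate duplicate reports of the same failure from many partitions
    # so that one problem yields exactly one serviceable event (sketch).
    def __init__(self):
        self.events = {}            # failure signature -> reporting partitions

    def report(self, signature: str, partition: str) -> bool:
        is_new = signature not in self.events
        self.events.setdefault(signature, []).append(partition)
        return is_new               # only the first report opens a new event
```

Later duplicates are retained as history for review, but they do not generate a second notification or call home.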
Extended error data
Extended error data (EED) is additional data that is collected either automatically at the time
of a failure or manually at a later time. The data collected is dependent on the invocation
method but includes information like firmware levels, operating system levels, additional fault
isolation register values, recoverable error threshold register values, system status, and any
other pertinent data.
The data is formatted and prepared for transmission back to IBM to assist the service support
organization with preparing a service action plan for the service representative or for
additional analysis.
System dump handling
In certain circumstances, an error might require a dump to be automatically or manually
created. In this event, it is off-loaded to the management console. Specific management
console information is included as part of the information that can optionally be sent to IBM
support for analysis. If additional information relating to the dump is required, or if it becomes
necessary to view the dump remotely, the management console dump record notifies the IBM
support center regarding on which management console the dump is located.
4.3.4 Notifying
After a Power Systems server has detected, diagnosed, and reported an error to an
appropriate aggregation point, it then takes steps to notify the client, and if necessary the IBM
support organization. Depending on the assessed severity of the error and support
agreement, this could range from a simple notification to having field service personnel
automatically dispatched to the client site with the correct replacement part.
Client Notify
When an event is important enough to report, but does not indicate the need for a repair
action or the need to call home to IBM service and support, it is classified as Client Notify.
Clients are notified because these events might be of interest to an administrator. The event
might be a symptom of an expected systemic change, such as a network reconfiguration or
failover testing of redundant power or cooling systems. Examples of these events include:
- Network events, such as the loss of contact over a local area network (LAN)
- Environmental events, such as ambient temperature warnings
- Events that need further examination by the client (although these events do not
necessarily require a part replacement or repair action)
Client Notify events are serviceable events, by definition, because they indicate that
something has happened that requires client awareness in the event that the client
wants to take further action. These events can always be reported back to IBM at the
client’s discretion.
Call home
A correctly configured POWER processor-based system can initiate an automatic or manual
call from a client location to the IBM service and support organization with error data, server
status, or other service-related information. The call-home feature invokes the service
organization in order for the appropriate service action to begin, automatically opening a
problem report and, in certain cases, also dispatching field support. This automated reporting
provides faster and potentially more accurate transmittal of error information. Although
configuring call-home is optional, clients are strongly encouraged to configure this feature to
obtain the full value of IBM service enhancements.
Vital product data (VPD) and inventory management
Power Systems servers store vital product data (VPD) internally, which keeps a record of how much
memory is installed, how many processors are installed, the manufacturing level of the parts,
and so on. These records provide valuable information that can be used by remote support
and service representatives, enabling them to provide assistance in keeping the firmware and
software on the server up-to-date.
IBM problem management database
At the IBM support center, historical problem data is entered into the IBM Service and
Support Problem Management database. All of the information that is related to the error,
along with any service actions taken by the service representative, is recorded for problem
management by the support and development organizations. The problem is then tracked
and monitored until the system fault is repaired.
4.3.5 Locating and servicing
The final component of a comprehensive design for serviceability is the ability to effectively
locate and replace parts requiring service. POWER processor-based systems use a
combination of visual cues and guided maintenance procedures to ensure that the identified
part is replaced correctly, every time.
Packaging for service
The following service enhancements are included in the physical packaging of the systems to
facilitate service:
- Color coding (touch points):
– Terra-cotta-colored touch points indicate that a component (FRU or CRU) can be
concurrently maintained.
– Blue-colored touch points delineate components that are not concurrently maintained
(those that require the system to be turned off for removal or repair).
- Tool-less design: Selected IBM systems support tool-less or simple-tool designs. These
designs require no tools, or simple tools such as flathead screwdrivers, to service the
hardware components.
- Positive retention: Positive retention mechanisms help to ensure proper connections
between hardware components, such as from cables to connectors, and between two
cards that attach to each other. Without positive retention, hardware components run the
risk of becoming loose during shipping or installation, preventing a good electrical
connection. Positive retention mechanisms such as latches, levers, thumb-screws, pop
Nylatches (U-clips), and cables are included to help prevent loose connections and aid in
installing (seating) parts correctly. These positive retention items do not require tools.
Light Path
The Light Path LED feature is for low-end systems, including Power Systems up to models
750 and 755, that can be repaired by clients. In the Light Path LED implementation, when a
fault condition is detected on the POWER7 processor-based system, an amber FRU fault
LED is illuminated, which is then rolled up to the system fault LED. The Light Path system
pinpoints the exact part by turning on the amber FRU fault LED that is associated with the
part to be replaced.
The system can clearly identify components for replacement by using specific
component-level LEDs, and can also guide the servicer directly to the component by
signaling (staying on solid) the system fault LED, enclosure fault LED, and the component
FRU fault LED.
After the repair, the LEDs shut off automatically if the problem is fixed.
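The Light Path rollup, with the component FRU LED, the enclosure fault LED, and the system fault LED all turned on solid, can be expressed as a small function over an invented topology map. This is a sketch of the signaling logic, not actual service firmware.

```python
def light_path(failing_fru: str, topology: dict) -> list:
    # LEDs to turn on solid for a FRU fault: the system fault LED, the
    # enclosure fault LED, and the component FRU LED (the "rollup").
    # The topology map (FRU -> enclosure) is invented for illustration.
    return [
        "system-fault",
        f"enclosure:{topology[failing_fru]}",
        f"fru:{failing_fru}",
    ]
```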
Guiding Light
Midrange and high-end systems, including models 770 and 780 and later, are usually
repaired by IBM Support personnel.
Guiding Light uses a series of flashing LEDs, allowing a service provider to quickly and easily
identify the location of system components. Guiding Light can also handle multiple error
conditions simultaneously, which might be necessary in some very complex high-end
configurations.
In these situations, Guiding Light waits for the servicer’s indication of which failure to attend to
first and then illuminates the LEDs leading to the failing component.
The enclosure and system identify LEDs turn on solid and can be used to follow the path from
the system to the enclosure and down to the specific FRU.
Data centers can be complex places, and Guiding Light is designed to do more than identify
visible components. When a component might be hidden from view, Guiding Light can flash a
sequence of LEDs that extends to the frame exterior, clearly guiding the service
representative to the correct rack, system, enclosure, drawer, and component.
Service labels
Service providers use these labels to assist them in performing maintenance actions. Service
labels are found in various formats and positions, and are intended to transmit readily
available information to the servicer during the repair process.
Several of these service labels and their purposes are described in the following list:
- Location diagrams are strategically located on the system hardware, relating information
regarding the placement of hardware components. Location diagrams can include location
codes, drawings of physical locations, concurrent maintenance status, or other data that is
pertinent to a repair. Location diagrams are especially useful when multiple components
are installed, such as DIMMs, sockets, processor cards, fans, adapter cards, LEDs, and
power supplies.
- Remove or replace procedure labels contain procedures and are often found on a cover of
the system or in other spots that are accessible to the servicer. These labels provide
systematic procedures, including diagrams, detailing how to remove and replace certain
serviceable hardware components.
- Numbered arrows are used to indicate the order of operation and the serviceability
direction of components. Various serviceable parts, such as latches, levers, and touch
points, must be pulled or pushed in a certain direction and order so that the mechanical
mechanisms can engage or disengage. Arrows generally improve the ease of serviceability.
The operator panel
The operator panel on a POWER processor-based system is a four-row by 16-element LCD
display that is used to present boot progress codes, indicating advancement through the
system power-on and initialization processes. The operator panel is also used to display error
and location codes when an error occurs that prevents the system from booting. It includes
several buttons, enabling a service support representative (SSR) or client to change various
boot-time options and for other limited service functions.
Concurrent maintenance
The IBM POWER7 processor-based systems are designed with the understanding that
certain components have higher intrinsic failure rates than others. The movement of fans,
power supplies, and physical storage devices naturally makes them more susceptible to
wearing down or burning out. Other devices such as I/O adapters can begin to wear from
repeated plugging and unplugging. For these reasons, these devices have been specifically
designed to be concurrently maintainable when properly configured.
In other cases, a client might be in the process of moving or redesigning a data center or
planning a major upgrade. At times like these, flexibility is crucial. The IBM POWER7
processor-based systems are designed for redundant or concurrently maintainable power,
fans, physical storage, and I/O towers.
The most recent members of the IBM Power Systems family, based on the POWER7
processor, continue to support concurrent maintenance of power, cooling, PCI adapters,
media devices, I/O drawers, GX adapter, and the operator panel. In addition, they support
concurrent firmware fix pack updates when possible. The determination of whether a
firmware fix pack release can be updated concurrently is identified in the readme file that is
released with the firmware.180 IBM Power 770 and 780 Technical Overview and Introduction
Hot-node add, hot-node repair, and memory upgrade
With the proper configuration and required protective measures, the Power 770 and
Power 780 servers are designed for node add, node repair, or memory upgrade without
powering down the system.
The Power 770 and Power 780 servers support the addition of another CEC enclosure (node)
to a system (hot-node add) or adding more memory (memory upgrade) to an existing node.
The additional Power 770 and Power 780 enclosure or memory can be ordered as a system
upgrade (MES order) and added to the original system. The additional resources of the newly
added CEC enclosure (node) or memory can then be assigned to existing OS partitions or
new partitions as required. Hot-node add and memory upgrade enable the upgrading of a
server by integrating a second, third, or fourth CEC enclosure or additional memory into the
server, with reduced impact to the system operation.
In the unlikely event that CEC hardware (for example, processor or memory) experiences a
failure, the hardware can be repaired by freeing the processors and memory in the node and
its attached I/O resources (node evacuation), depending on the partition configuration.
To guard against any potential impact to system operation during hot-node add, memory
upgrade, or node repair, clients must comply with these protective measures:
For memory upgrade and node repair, ensure that the system has sufficient inactive
or spare processors and memory. Critical I/O resources must be configured with
redundant paths.
Schedule upgrades or repairs during non-peak operational hours.
Move business applications to another server by using the PowerVM Live Partition
Mobility feature or quiesce them. The use of LPM means that all critical applications
must be halted or moved to another system before the operation begins. Non-critical
applications can remain running. The partitions can be left running at the operating
system command prompt.
Back up critical application and system state information.
Checkpoint the databases.
Blind-swap cassette
Blind-swap PCIe adapters represent significant service and ease-of-use enhancements in I/O
subsystem design while maintaining high PCIe adapter density.
Blind-swap allows PCIe adapters to be concurrently replaced or installed without having to
put the I/O drawer or system into a service position. Since the design was first delivered,
minor carrier adjustments have improved an already well-thought-out service design.
For PCIe adapters on the POWER7 processor-based servers, blind-swap cassettes include
the PCIe slot, to avoid the top-to-bottom movement for inserting the card into the slot that
was required on previous designs. The adapter is correctly connected by just sliding the
cassette in and actuating a latch.
Firmware updates
System firmware is delivered as a release level or a service pack. Release levels support
the general availability (GA) of new functions or features, and new machine types or models.
Upgrading to a higher release level is disruptive to customer operations. IBM intends to
introduce no more than two new release levels per year. These release levels will be
supported by service packs. Service packs are intended to contain only firmware fixes and
not to introduce new function. A service pack is an update to an existing release level.
Chapter 4. Continuous availability and manageability 181
If the system is managed by a management console, you will use the management console
for firmware updates. Using the management console allows you to take advantage of the
Concurrent Firmware Maintenance (CFM) option when concurrent service packs are
available. CFM is the IBM term used to describe the IBM Power Systems firmware updates
that can be partially or wholly concurrent or non-disruptive. With the introduction of CFM, IBM
is significantly increasing a client’s opportunity to stay on a given release level for longer
periods of time. Clients wanting maximum stability can defer until there is a compelling
reason to upgrade, such as:
A release level is approaching its end-of-service date (that is, it has been available for
about a year and hence will go out of service support soon).
Moving a system to a more standardized release level when there are multiple systems in
an environment with similar hardware.
A new release has new functionality that is needed in the environment.
A scheduled maintenance action will cause a platform reboot. This provides an
opportunity to also upgrade to a new firmware release.
The update and upgrade of system firmware depend on several factors, such as
whether the system is standalone or managed by a management console, the current
firmware installed, and what operating systems are running on the system. These scenarios
and the associated installation instructions are comprehensively outlined in the firmware
section of Fix Central:
http://www.ibm.com/support/fixcentral/
You might also want to review the best practice white papers, which can be found here:
http://www14.software.ibm.com/webapp/set2/sas/f/best/home.html
Repair and verify system
Repair and verify (R&V) is a system used to guide a service provider step-by-step through the
process of repairing a system and verifying that the problem has been repaired. The steps
are customized in the appropriate sequence for the particular repair for the specific system
being repaired. Repair scenarios covered by repair and verify include:
Replacing a defective field-replaceable unit (FRU) or a customer-replaceable unit (CRU)
Reattaching a loose or disconnected component
Correcting a configuration error
Removing or replacing an incompatible FRU
Updating firmware, device drivers, operating systems, middleware components, and IBM
applications after replacing a part
Repair and verify procedures can be used both by service providers who are familiar with the
task and by those who are not. Education On Demand content is placed in the
procedure at the appropriate locations. Throughout the repair and verify procedure, repair
history is collected and provided to the Service and Support Problem Management Database
for storage with the serviceable event, to ensure that the guided maintenance procedures are
operating correctly.
If a server is managed by a management console, then many of the R&V procedures are
performed from the management console. If the FRU to be replaced is a PCI adapter or an
internal storage device, then the service action is always performed from the operating
system of the partition owning that resource.
Clients can subscribe through the subscription services to obtain the notifications about the
latest updates available for service-related documentation. The latest version of the
documentation is accessible through the internet.
4.4 Manageability
Several functions and tools help manageability and enable you to efficiently and effectively
manage your system.
4.4.1 Service user interfaces
The service interface allows support personnel or the client to communicate with the service
support applications in a server using a console, interface, or terminal. Delivering a clear,
concise view of available service applications, the service interface allows the support team
to manage system resources and service information in an efficient and effective way.
Applications available through the service interface are carefully configured and placed to
give service providers access to important service functions.
Various service interfaces are used, depending on the state of the system and its operating
environment. The primary service interfaces are:
Light Path and Guiding Light
For more information, see “Light Path” on page 178 and “Guiding Light” on page 178.
Service processor, Advanced System Management Interface (ASMI)
Operator panel
Operating system service menu
Service Focal Point on the Hardware Management Console
Service Focal Point Lite on Integrated Virtualization Manager
Service processor
The service processor is a controller that is running its own operating system. It is a
component of the service interface card.
The service processor operating system has specific programs and device drivers for the
service processor hardware. The host interface is a processor support interface that is
connected to the POWER processor. The service processor is always working, regardless of
the main system unit’s state. The system unit can be in these states:
Standby (power off)
Operating, ready to start partitions
Operating with running logical partitions
The service processor is used to monitor and manage the system hardware resources and
devices. The service processor checks the system for errors, ensures the connection to the
management console for manageability purposes, and accepts Advanced System
Management Interface (ASMI) Secure Sockets Layer (SSL) network connections. The
service processor provides the ability to view and manage the machine-wide settings by
using the ASMI, and enables complete system and partition management from the
management console.
With two or more CEC enclosures, there are two redundant flexible service processors
(FSPs), one in each of the first two CEC enclosures. While one is active, the second is in
standby mode. In case of a failure, an automatic takeover occurs.
The service processor uses two Ethernet 10/100/1000 Mbps ports. Note this information:
Both Ethernet ports are visible only to the service processor and can be used to
attach the server to a management console or to access the ASMI. The ASMI options
can be accessed through an HTTP server that is integrated into the service processor
operating environment.
Because of the firmware-heavy workload, the firmware can support these ports only at the
10/100 Mbps rate, although the Ethernet MAC is capable of 1 Gbps.
Both Ethernet ports have a default IP address, as follows:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.147.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.147.
When a redundant service processor is present, the default IP addresses are:
– Service processor Eth0 or HMC1 port is configured as 169.254.2.146.
– Service processor Eth1 or HMC2 port is configured as 169.254.3.146.
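The default addresses above can be collected into a small lookup table. This is an illustrative sketch only (the `ASMI_DEFAULTS` table and `asmi_url` helper are hypothetical, not an IBM tool); it forms the HTTPS address at which the ASMI web server answers for each port:

```python
# Default service processor IP addresses, as documented above.
# (Hypothetical helper for illustration; not an IBM-provided interface.)
ASMI_DEFAULTS = {
    ("primary", "HMC1"): "169.254.2.147",
    ("primary", "HMC2"): "169.254.3.147",
    ("redundant", "HMC1"): "169.254.2.146",
    ("redundant", "HMC2"): "169.254.3.146",
}

def asmi_url(service_processor: str = "primary", port: str = "HMC1") -> str:
    """Return the HTTPS URL for the ASMI on the given service processor port."""
    return f"https://{ASMI_DEFAULTS[(service_processor, port)]}"
```

For example, `asmi_url("redundant", "HMC2")` yields the address of the HMC2 port on a redundant service processor.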
The functions available through the service processor include:
Call Home
Advanced System Management Interface (ASMI)
Error Information (error code, PN, Location Codes) menu
View of guarded components
Limited repair procedures
Generate dump
LED Management menu
Remote view of ASMI menus
Firmware update through USB key
Advanced System Management Interface (ASMI)
ASMI is the interface to the service processor that enables you to manage the operation of
the server, such as auto-power restart, and to view information about the server, such as the
error log and vital product data. Various repair procedures require connection to the ASMI.
The ASMI is accessible through the management console. It is also accessible by using a
web browser on a system that is connected directly to the service processor (in this case,
either a standard Ethernet cable or a crossed cable) or through an Ethernet network. ASMI
can also be accessed from an ASCII terminal, but this is only available while the system is in
the platform powered-off mode.
Use the ASMI to change the service processor IP addresses or to apply certain security
policies and prevent access from undesired IP addresses or ranges.
Note: The service processor enables a system that does not boot to be analyzed. The
error log analysis can be performed from either the ASMI or the management console.
You might be able to use the service processor’s default settings. In that case, accessing the
ASMI is not necessary. To access ASMI, use one of the following methods:
Access the ASMI by using a management console.
If configured to do so, the management console connects directly to the ASMI for a
selected system from this task.
To connect to the Advanced System Management interface from a management
console, follow these steps:
a. Open Systems Management from the navigation pane.
b. From the work pane, select one or more managed systems to work with.
c. From the System Management tasks list, select Operations → Advanced System
Management (ASM).
Access the ASMI by using a web browser.
At the time of writing, supported web browsers are Microsoft Internet Explorer
(Version 7.0), Mozilla Firefox (Version 2.0.0.11), and Opera (Version 9.24). Later versions
of these browsers might work but are not officially supported. The JavaScript language
and cookies must be enabled.
The web interface is available during all phases of system operation, including the initial
program load (IPL) and run time. However, several of the menu options in the web
interface are unavailable during IPL or run time to prevent usage or ownership conflicts if
the system resources are in use during that phase. The ASMI provides a Secure Sockets
Layer (SSL) web connection to the service processor. To establish an SSL connection,
open your browser using this address:
https://
Access the ASMI using an ASCII terminal.
The ASMI on an ASCII terminal supports a subset of the functions that are provided by the
web interface and is available only when the system is in the platform powered-off mode. The
ASMI on an ASCII console is not available during several phases of system operation, such
as the IPL and run time.
The operator panel
The service processor provides an interface to the operator panel, which is used to display
system status and diagnostic information.
Note: To make the connection through Internet Explorer, click Tools → Internet Options.
Clear the Use TLS 1.0 check box, and click OK.
The operator panel can be accessed in two ways:
By using the normal operational front view.
By pulling it out to access the switches and view the LCD display. Figure 4-10 shows
the operator panel on a Power 770 and Power 780 pulled out.
Figure 4-10 Operator panel is pulled out from the chassis
The operator panel features include:
A 2 x 16 character LCD display
Reset, enter, power On/Off, increment, and decrement buttons
Amber System Information/Attention, green Power LED
Blue Enclosure Identify LED on the Power 770 and Power 780
Altitude sensor
USB Port
Speaker/Beeper
The functions available through the operator panel include:
Error Information
Generate dump
View Machine Type, Model, and Serial Number
Limited set of repair functions
Operating system service menu
The system diagnostics consist of IBM i service tools, stand-alone diagnostics that are loaded
from the DVD drive, and online diagnostics (available in AIX).
Online diagnostics, when installed, are a part of the AIX or IBM i operating system on the disk
or server. They can be booted in single-user mode (service mode), run in maintenance mode,
or run concurrently (concurrent mode) with other applications. They have access to the AIX
error log and the AIX configuration data. IBM i has a service tools problem log, IBM i history
log (QHST), and IBM i problem log.
These are the modes:
Service mode
Requires a service mode boot of the system and enables the checking of system devices
and features. Service mode provides the most complete checkout of the system
resources. All system resources, except the SCSI adapter and the disk drives used for
paging, can be tested.
Concurrent mode
Enables the normal system functions to continue while selected resources are being
checked. Because the system is running in normal operation, certain devices might
require additional actions by the user or diagnostic application before testing can be done.
Maintenance mode
Enables the checking of most system resources. Maintenance mode provides the same
test coverage as service mode. The difference between the two modes is the way that
they are invoked. Maintenance mode requires that all activity on the operating system be
stopped. The shutdown -m command is used to stop all activity on the operating system
and put the operating system into maintenance mode.
The System Management Services (SMS) error log is accessible on the SMS menus.
This error log contains errors that are found by partition firmware when the system or
partition is booting.
The service processor’s error log can be accessed on the ASMI menus.
You can also access the system diagnostics from a Network Installation Management
(NIM) server.
The IBM i operating system and associated machine code provide Dedicated Service Tools
(DST) as part of the IBM i licensed machine code (Licensed Internal Code) and System
Service Tools (SST) as part of the IBM i operating system. DST can be run in dedicated
mode (no operating system loaded). DST tools and diagnostics are a superset of those
available under SST.
The IBM i End Subsystem (ENDSBS *ALL) command can shut down all IBM and customer
application subsystems except the controlling subsystem QCTL. The Power Down System
(PWRDWNSYS) command can be set to power down the IBM i partition and restart the
partition in DST mode.
You can start SST during normal operations, which leaves all applications up and running,
by using the IBM i Start Service Tools (STRSST) command (when signed on to IBM i with
the appropriately secured user ID).
With DST and SST, you can look at various logs, run various diagnostics, or take several
kinds of system dumps, among other options.
Note: When you order a Power System, a DVD-ROM or DVD-RAM might be optional. An
alternate method for maintaining and servicing the system must be available if you do not
order the DVD-ROM or DVD-RAM.Chapter 4. Continuous availability and manageability 187
Depending on the operating system, these are the service-level functions that you typically
see when using the operating system service menus:
Product activity log
Trace Licensed Internal Code
Work with communications trace
Display/Alter/Dump
Licensed Internal Code log
Main storage dump manager
Hardware service manager
Call Home/Customer Notification
Error information menu
LED management menu
Concurrent/Non-concurrent maintenance (within scope of the OS)
Managing firmware levels
– Server
– Adapter
Remote support (access varies by OS)
Service Focal Point on the Hardware Management Console
Service strategies become more complicated in a partitioned environment. The Manage
Serviceable Events task in the management console can help to streamline this process.
Each logical partition reports errors that it detects and forwards the event to the Service Focal
Point (SFP) application that is running on the management console, without determining
whether other logical partitions also detect and report the errors. For example, if one logical
partition reports an error for a shared resource, such as a managed system power supply,
other active logical partitions might report the same error.
By using the Manage Serviceable Events task in the management console, you can avoid
long lists of repetitive call-home information by recognizing that these are repeated errors and
consolidating them into one error.
In addition, you can use the Manage Serviceable Events task to initiate service functions on
systems and logical partitions, including the exchanging of parts, configuring connectivity,
and managing dumps.
4.4.2 IBM Power Systems firmware maintenance
The IBM Power Systems Client-Managed Microcode is a methodology that enables you to
manage and install microcode updates on Power Systems and associated I/O adapters.
The system firmware consists of service processor microcode, Open Firmware microcode,
SPCN microcode, and the POWER Hypervisor.
The firmware and microcode can be downloaded and installed from a management console,
a running partition, or USB port number 1 on the rear of a Power 770 and Power 780, if that
system is not managed by a management console.
Power Systems has a permanent firmware boot side, or A side, and a temporary firmware
boot side, or B side. New levels of firmware must be installed on the temporary side first in
order to test the update’s compatibility with existing applications. When the new level of
firmware has been approved, it can be copied to the permanent side.
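The permanent/temporary boot-side scheme above can be sketched as a minimal conceptual model. The class and method names here are illustrative only, not an IBM interface:

```python
from dataclasses import dataclass

@dataclass
class FirmwareSides:
    """Conceptual model of the permanent (A) and temporary (B) firmware
    boot sides; names are illustrative, not an IBM-provided API."""
    permanent: str  # accepted level (A side)
    temporary: str  # installed level (B side)

    def install(self, new_level: str) -> None:
        # New firmware levels are always installed on the temporary side first.
        self.temporary = new_level

    def accept(self) -> None:
        # Once the tested level is approved, copy it to the permanent side.
        self.permanent = self.temporary
```

Installing a new level leaves the accepted level on the permanent side untouched until `accept()` is called, mirroring how the permanent side serves as the backup during testing.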
For access to the initial web pages that address this capability, see the Support for
IBM Systems web page:
http://www.ibm.com/systems/support
For Power Systems, select the Power link (Figure 4-11).
Figure 4-11 Support for Power servers web page
Although the content under the Popular links section can change, click Firmware and HMC
updates to go to the resources for keeping your system’s firmware current.
If there is a management console to manage the server, the management console interface
can be used to view the levels of server firmware and power subsystem firmware that are
installed and are available to download and install.
Each IBM Power Systems server has the following levels of server firmware and power
subsystem firmware:
Installed level
This level of server firmware or power subsystem firmware has been installed and will be
installed into memory after the managed system is powered off and then powered on. It is
installed on the temporary side of system firmware.
Activated level
This level of server firmware or power subsystem firmware is active and running
in memory.
Accepted level
This level is the backup level of server or power subsystem firmware. You can return to
this level of server or power subsystem firmware if you decide to remove the installed
level. It is installed on the permanent side of system firmware.
IBM provides the Concurrent Firmware Maintenance (CFM) function on selected Power
Systems. This function supports applying nondisruptive system firmware service packs to the
system concurrently (without requiring a reboot operation to activate changes). For systems
that are not managed by a management console, the installation of system firmware is
always disruptive.
The concurrent levels of system firmware can, on occasion, contain fixes that are known as
deferred. These deferred fixes can be installed concurrently but are not activated until the
next IPL. For deferred fixes within a service pack, only the fixes in the service pack that
cannot be concurrently activated are deferred. Table 4-1 shows the file-naming convention
for system firmware.
Table 4-1 Firmware naming convention

PPNNSSS_FFF_DDD

PP   Package identifier     01, 02
NN   Platform and class     AL = Low End, AM = Mid Range, AS = IH Server,
                            AH = High End, AP = Bulk Power for IH, AB = Bulk Power
SSS  Release indicator
FFF  Current fix pack
DDD  Last disruptive fix pack

The following example uses the convention:
01AM710_086_063 = Managed System Firmware for 9117-MMB Release 710 Fixpack 086
An installation is disruptive if the following statements are true:
The release levels (SSS) of currently installed and new firmware differ.
The service pack level (FFF) and the last disruptive service pack level (DDD) are equal in
new firmware.
Otherwise, an installation is concurrent if the service pack level (FFF) of the new firmware is
higher than the service pack level currently installed on the system and the conditions for
disruptive installation are not met.
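The naming convention and the disruptive/concurrent rules lend themselves to a short sketch. The helpers below are hypothetical, not an IBM tool, and encode one literal reading of the conditions above (disruptive when the release levels differ or when the new level's fix pack equals its last disruptive fix pack); the authoritative determination is always the readme released with the firmware:

```python
import re

def parse_level(name: str) -> dict:
    """Split a firmware level name (PPNNSSS_FFF_DDD) into its fields:
    PP = package identifier, NN = platform and class, SSS = release,
    FFF = current fix pack, DDD = last disruptive fix pack."""
    m = re.fullmatch(r"(\d{2})([A-Z]{2})(\w{3})_(\d{3})_(\d{3})", name)
    if not m:
        raise ValueError(f"not a firmware level name: {name}")
    pp, nn, sss, fff, ddd = m.groups()
    return {"package": pp, "platform": nn, "release": sss,
            "fix_pack": int(fff), "last_disruptive": int(ddd)}

def is_disruptive(installed: str, new: str) -> bool:
    """One reading of the stated rules: disruptive if the release levels
    (SSS) differ, or if the new firmware's fix pack (FFF) equals its
    last disruptive fix pack (DDD)."""
    i, n = parse_level(installed), parse_level(new)
    return (i["release"] != n["release"]) or (n["fix_pack"] == n["last_disruptive"])
```

With the example level from Table 4-1, `parse_level("01AM710_086_063")` yields release 710, fix pack 086, and last disruptive fix pack 063, so applying it on top of an installed 01AM710_063_063 would be concurrent under this reading.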
4.4.3 Electronic Services and Electronic Service Agent
IBM has transformed its delivery of hardware and software support services to help you
achieve higher system availability. Electronic Services is a web-enabled solution that offers
an exclusive, no-additional-charge enhancement to the service and support available for IBM
servers. These services provide the opportunity for greater system availability with faster
problem resolution and preemptive monitoring. The Electronic Services solution consists of
two separate, but complementary, elements:
Electronic Services news page
The Electronic Services news page is a single internet entry point that replaces the
multiple entry points that are traditionally used to access IBM internet services and
support. The news page enables you to gain easier access to IBM resources for
assistance in resolving technical problems.
Electronic Service Agent
The Electronic Service Agent is software that resides on your server. It monitors events
and transmits system inventory information to IBM on a periodic, client-defined timetable.
The Electronic Service Agent automatically reports hardware problems to IBM.
Early knowledge about potential problems enables IBM to deliver proactive service that can
result in higher system availability and performance. In addition, information that is collected
through the Service Agent is made available to IBM service support representatives when
they help answer your questions or diagnose problems. Installation and use of IBM Electronic
Service Agent for problem reporting enables IBM to provide better support and service for
your IBM server.
To learn how Electronic Services can work for you, visit:
https://www.ibm.com/support/electronic/portal
4.5 Operating system support for RAS features
Table 4-2 gives an overview of features for continuous availability that are supported by the
various operating systems running on the Power 770 and Power 780 systems.
Table 4-2 Operating system support for RAS features
(Columns, left to right: AIX 5.3 | AIX 6.1 | AIX 7.1 | IBM i | RHEL 5.7 | RHEL 6.1 | SLES11 SP1)
System deallocation of failing components
Dynamic Processor Deallocation X X X X X X X
Dynamic Processor Sparing X X X X X X X
Processor Instruction Retry X X X X X X X
Alternate Processor Recovery X X X X X X X
Partition Contained Checkstop X X X X X X X
Persistent processor deallocation X X X X X X X
GX++ bus persistent deallocation X X X X - - X
PCI bus extended error detection X X X X X X X
PCI bus extended error recovery X X X X Most Most Most
PCI-PCI bridge extended error handling X X X X - - -
Redundant RIO or 12x Channel link X X X X X X X
PCI card hot-swap X X X X X X X
Dynamic SP failover at run time X X X X X X X
Memory sparing with CoD at IPL time X X X X X X X
Clock failover runtime or IPL X X X X X X X
Memory availability
64-byte ECC code X X X X X X X
Hardware scrubbing X X X X X X X
CRC X X X X X X X
Chipkill X X X X X X X
L1 instruction and data array protection X X X X X X X
L2/L3 ECC and cache line delete X X X X X X X
Special uncorrectable error handling X X X X X X X
Active Memory Mirroring X X X X X X X
Fault detection and isolation
Platform FFDC diagnostics X X X X X X X
Run-time diagnostics X X X X Most Most Most
Storage Protection Keys - X X X - - -
Dynamic Trace X X X X - - X
Operating System FFDC - X X X - - -
Error log analysis X X X X X X X
Freeze mode of I/O Hub X X X X - - -
Service Processor support for:
Built-in-Self-Tests (BIST) for logic and arrays X X X X X X X
Wire tests X X X X X X X
Component initialization X X X X X X X
Serviceability
Boot-time progress indicators X X X X Most Most Most
Electronic Service Agent Call Home from management console X X X X X X X
Firmware error codes X X X X X X X
Operating system error codes X X X X Most Most Most
Inventory collection X X X X X X X
Environmental and power warnings X X X X X X X
Hot-plug fans, power supplies X X X X X X X
Extended error data collection X X X X X X X
I/O drawer redundant connections X X X X X X X
I/O drawer hot add and concurrent repair X X X X X X X
Concurrent RIO/GX adapter add X X X X X X X
SP mutual surveillance with POWER Hypervisor X X X X X X X
Dynamic firmware update with management console X X X X X X X
Electronic Service Agent Call Home Application X X X X - - -
Guiding light LEDs X X X X X X X
System dump for memory, POWER Hypervisor, SP X X X X X X X
Infocenter / Systems Support Site service publications X X X X X X X
System Support Site education X X X X X X X
Operating system error reporting to management console SFP X X X X X X X
RMC secure error transmission subsystem X X X X X X X
Health check scheduled operations with management console X X X X X X X
Operator panel (real or virtual) X X X X X X X
Concurrent operator panel maintenance X X X X X X X
Redundant management consoles X X X X X X X
Automated server recovery/restart X X X X X X X
High availability clustering support X X X X X X X
Repair and Verify Guided Maintenance X X X X Most Most Most
Concurrent kernel update - X X X X X X
Concurrent Hot Add/Repair Maintenance X X X X X X X
© Copyright IBM Corp. 2011. All rights reserved.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this paper.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topic in this
document. Note that some publications referenced in this list might be available in softcopy
only.
IBM BladeCenter PS700, PS701, and PS702 Technical Overview and Introduction,
REDP-4655
IBM BladeCenter PS703 and PS704 Technical Overview and Introduction, REDP-4744
IBM Power 710 and 730 Technical Overview and Introduction, REDP-4796
IBM Power 720 and 740 Technical Overview and Introduction, REDP-4797
IBM Power 750 and 755 Technical Overview and Introduction, REDP-4638
IBM Power 795 Technical Overview and Introduction, REDP-4640
IBM PowerVM Virtualization Introduction and Configuration, SG24-7940
IBM PowerVM Virtualization Managing and Monitoring, SG24-7590
IBM PowerVM Live Partition Mobility, SG24-7460
IBM System p Advanced POWER Virtualization (PowerVM) Best Practices, REDP-4194
PowerVM Migration from Physical to Virtual Storage, SG24-7825
IBM System Storage DS8000: Copy Services in Open Environments, SG24-6788
IBM System Storage DS8700 Architecture and Implementation, SG24-8786
PowerVM and SAN Copy Services, REDP-4610
SAN Volume Controller V4.3.0 Advanced Copy Services, SG24-7574
You can search for, view, download, or order these documents and other Redbooks,
Redpapers, Web Docs, drafts, and additional materials at the following website:
ibm.com/redbooks
Other publications
These publications are also relevant as further information sources:
IBM Power Systems Facts and Features POWER7 Blades and Servers
http://www.ibm.com/systems/power/hardware/reports/factsfeatures.html
Specific storage devices supported for Virtual I/O Server
http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/datasheet.html
IBM Power 710 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03048usen/POD03048USEN.PDF
IBM Power 720 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03048usen/POD03048USEN.PDF
IBM Power 730 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03050usen/POD03050USEN.PDF
IBM Power 740 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03051usen/POD03051USEN.PDF
IBM Power 750 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03034usen/POD03034USEN.PDF
IBM Power 755 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03035usen/POD03035USEN.PDF
IBM Power 770 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03035usen/POD03035USEN.PDF
IBM Power 780 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03032usen/POD03032USEN.PDF
IBM Power 795 server Data Sheet
http://public.dhe.ibm.com/common/ssi/ecm/en/pod03053usen/POD03053USEN.PDF
Active Memory Expansion: Overview and Usage Guide
http://public.dhe.ibm.com/common/ssi/ecm/en/pow03037usen/POW03037USEN.PDF
Migration combinations of processor compatibility modes for active Partition Mobility
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7hc3/iphc3pcmcombosact.htm
Advance Toolchain for Linux website
http://www.ibm.com/developerworks/wikis/display/hpccentral/How+to+use+Advance+Toolchain+for+Linux+on+POWER
Online resources
These websites are also relevant as further information sources:
IBM Power Systems Hardware Information Center
http://publib.boulder.ibm.com/infocenter/systems/scope/hw/index.jsp
IBM System Planning Tool website
http://www.ibm.com/systems/support/tools/systemplanningtool/
IBM Fix Central website
http://www.ibm.com/support/fixcentral/
Power Systems Capacity on Demand website
http://www.ibm.com/systems/power/hardware/cod/
Support for IBM Systems website
http://www.ibm.com/support/entry/portal/Overview?brandind=Hardware~Systems~Power
IBM Power Systems website
http://www.ibm.com/systems/power/
IBM Storage website
http://www.ibm.com/systems/storage/
Help from IBM
IBM Support and downloads
ibm.com/support
IBM Global Services
ibm.com/services
REDP-4798-00
INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION: BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE
IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers, and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.
For more information: ibm.com/redbooks
Redpaper™
IBM Power 770 and 780 Technical Overview and Introduction
Features the 9117-MMC and 9179-MHC based on the latest POWER7 processor technology
Describes MaxCore and TurboCore for redefining performance
Discusses Active Memory Mirroring for Hypervisor
This IBM Redpaper publication is a comprehensive guide covering the
IBM Power 770 (9117-MMC) and Power 780 (9179-MHC) servers
supporting IBM AIX, IBM i, and Linux operating systems. The goal of
this paper is to introduce the major innovative Power 770 and
Power 780 offerings and their prominent functions, including:
The IBM POWER7 processor, available at frequencies of 3.3 GHz,
3.44 GHz, 3.72 GHz, 3.92 GHz, and 4.14 GHz
The specialized IBM POWER7 Level 3 cache that provides greater
bandwidth, capacity, and reliability
The 1 Gb or 10 Gb Integrated Multifunction Card that provides two
USB ports, one serial port, and four Ethernet connectors for a
processor enclosure and does not require a PCI slot
The new Active Memory Mirroring (AMM) for Hypervisor feature
that mirrors the main memory used by the firmware
IBM PowerVM virtualization, including PowerVM Live Partition
Mobility and PowerVM Active Memory Sharing
Active Memory Expansion that provides more usable memory than
what is physically installed on the system
IBM EnergyScale technology that provides features such as power
trending, power-saving, capping of power, and thermal
measurement
Enterprise-ready reliability, serviceability, and availability
Professionals who want to acquire a better understanding of IBM
Power Systems products should read this paper. This paper expands
the current set of IBM Power Systems documentation by providing a
desktop reference that offers a detailed technical description of the
Power 770 and Power 780 systems.
Back cover
© Copyright IBM Corp. 2011. All rights reserved.
Redpaper
Choosing eXFlash Storage on IBM eX5
Servers
Introduction
In keeping with its leadership position in continuous technology improvement and
innovation, IBM® recently refreshed the eXFlash offering to expand its capabilities.
Compared to the previous generation, the new features of the IBM eXFlash solution include:
Hot-swap capability
IOPS performance improved by six times to 240,000 IOPS per eXFlash unit
Storage capacity increased by four times to 1.6 TB per eXFlash unit
You can expect further expansion of IBM eXFlash capabilities for both IOPS and throughput
performance and capacity over time as SSD technology improvements are developed and
adopted by the industry.
The intent of this IBM Redpaper™ document is to discuss the benefits and advantages of
IBM eXFlash storage technology within the IBM eX5 portfolio, demonstrate how eXFlash fits
into a multi-tiered storage design approach, and provide recommendations on the most
effective use of eXFlash storage within the enterprise information infrastructure.
This paper covers the following topics:
“Application requirements for storage”
“IBM System x server and storage products”
“IBM eXFlash deployment scenarios”
Ilya Krutov
Application requirements for storage
Choosing the right storage for application data can be a complex task because you must
ensure that critical business and application requirements are met while costs are kept
optimized. In particular, storage performance capabilities should match the processing
capabilities of the server itself to ensure the most efficient utilization of system resources.
There is no “one size fits all” approach possible because different applications have different
storage data access patterns.
In general, the factors to consider during the planning process for application data storage
include:
Importance of data (Can I accept the loss of data?)
Sensitivity of data (Do I need advanced data protection and security?)
Availability of data (Do I need the data 24 hours per day, 7 days per week?)
Security of data (Who can read, modify, and delete the data?)
Data access speed (How quickly do I need to insert and extract the data?)
Performance or workload capacity (How many IOPS for I/O-intensive workloads and how
many MBps for throughput-intensive workloads do I need?)
Storage capacity (How much space do I need to store the data?)
Frequency of access (How often do I need the data?)
Backup and recovery strategy (How much time do I need to back up and restore the data?)
Retention policy (How long should I keep the data?)
Scalability for future growth (Do I expect the workload to increase in the near future?)
Storage deployment: internal or external (If external, then JBOD or storage controller? If
storage controller, then SAS, iSCSI, FC, or FCoE?)
Data access pattern (How does the application access the data?)
– Read or write intensive
– Random or sequential access
– Large or small I/O requests
Answers to these questions will help you to formalize the performance, availability, and
capacity requirements for your applications, and match these requirements with the tiered
storage design model.
Multi-tiered storage architecture
As we mentioned previously, the planning of information infrastructure includes choosing the
most cost-effective way to fulfill the application requirements for storage access with respect
to speed, capacity, and availability. To describe these requirements, and to establish the
framework for the deployment of the storage infrastructure, the storage tiering approach was
established.
Each storage tier defines a set of characteristics to meet the application requirements. There
are four tiers, each with performance, availability, capacity, and access pattern characteristics
for the data residing on that tier. Knowing your data access requirements will help you to
place data on the appropriate storage tier, thereby ensuring that your storage infrastructure is
capable of running your workloads in a cost-efficient manner.
The storage tiers, their corresponding characteristics, suitable storage media types, and
relative cost per gigabyte are listed in Table 1.
Table 1 Storage tiers and characteristics
Tier 0 (SSD): Random access, I/O-intensive, extreme performance, extreme availability, frequent access. Media: SSD, IBM eXFlash. Cost per gigabyte: very high.
Tier 1 (Online): Random access, I/O-intensive, high performance, high availability, frequent access. Media: SAS, FC HDD. Cost per gigabyte: high.
Tier 2 (Nearline): Sequential access, throughput-intensive, capacity, high availability, infrequent access. Media: NL SAS, NL SATA HDD. Cost per gigabyte: moderate.
Tier 3 (Offline): Sequential access, throughput-intensive, large capacity, long-term retention, no direct access. Media: Tape. Cost per gigabyte: low.
Tiers 0, 1, and 2 are considered primary storage, meaning data on these tiers can be directly
accessed by the application. Tier 3 is secondary storage, that is, data that cannot be
accessed directly; in order to access this data it must be moved to primary storage. Tier 0 has
been specifically added for the enterprise solid state drives.
Data storage closer to the main memory (that is, closer to the application processes residing
in the main memory) costs more to implement than storage that is farther away. In other
words, the price per GB of data storage increases from tier 3 to tier 0. To keep costs
optimized, the most demanded data (also referred to as hot data) from the working data set
should be placed closest to the main memory, whereas less demanded data can be placed
on a higher (more distant) storage tier. From a planning standpoint, the rules that define the
policy of placing data onto different storage tiers are part of the overall information life cycle
management strategy for the organization. From a technology standpoint, data management
and relocation policy can be implemented either manually by administrators or automatically
by management software that supports policy-based data relocation, for example, IBM
GPFS™, or by integrated features of storage systems, like the IBM Easy Tier™ feature of
IBM Storwize® V7000 external storage.
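As a minimal illustration of such a placement policy, the rule can be sketched as a function from access characteristics to a tier. The thresholds and the function itself are hypothetical; real information life cycle management policies in products such as GPFS or Easy Tier are far richer:

```python
# Hypothetical sketch of a policy-based tier-placement rule.
# Thresholds are illustrative assumptions, not product defaults.

def choose_tier(accesses_per_day: int, needs_direct_access: bool) -> str:
    """Map a data set's access pattern to one of the four storage tiers."""
    if not needs_direct_access:
        return "Tier 3 (Offline, tape)"        # archival, no direct access
    if accesses_per_day > 10_000:
        return "Tier 0 (SSD, eXFlash)"          # hot data, closest to memory
    if accesses_per_day > 100:
        return "Tier 1 (Online, SAS/FC HDD)"    # frequently accessed
    return "Tier 2 (Nearline, NL SAS/SATA)"     # infrequently accessed

print(choose_tier(50_000, True))   # hot OLTP working set lands on Tier 0
```

A policy engine would evaluate such rules periodically and relocate data whose access frequency has changed.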
Storage performance considerations for applications
In most current systems, the processor, memory, and I/O subsystems are well balanced
and are rarely performance bottlenecks. The main source of
performance issues tends to be storage I/O activity, because the speed of traditional
HDD-based storage systems still does not match the processing capabilities of the servers.
Various caching technologies can be implemented to increase HDD storage access speed.
Despite the overall size of a stored data set, only a portion of its data is actively used during
normal operations at certain time intervals. Data caching algorithms ensure that the most
demanded data (most frequently used) is always kept as close to the application as possible
to provide the fastest response time.
Caching exists at several levels. Storage controllers use fast DRAM cache to keep the most
frequently used data from disks; however, the cache size is normally limited to several GBs.
Operating systems and certain applications keep their own disk cache in the fast system
memory, but the cost per GB of RAM storage is high.
With the introduction of solid state drives there is an opportunity to dramatically increase the
performance of disk-based storage to match the capabilities of other server subsystems,
while keeping costs optimized because the solid state drives have a lower cost per MB
compared to DRAM memory, and lower latency compared to traditional hard drives. This is
illustrated in Figure 1.
Figure 1 Cost per gigabyte and latency for RAM, SSD, and HDD
In general, there are two key types of storage applications based on the workload they generate:
I/O-intensive applications require the storage system to process as many host read and
write requests (I/O requests) per second as possible, given the average I/O request size
used by the application, typically 8 - 16 KB. This behavior is most common for OLTP
databases.
Throughput-intensive applications require the storage system to transfer as many
gigabytes of information per second as possible to or from the host, and they typically use
an I/O request size of 64 - 128 KB. These characteristics are commonly inherent to file
servers, multimedia streaming, and backup.
Therefore, there are two key metrics for evaluating storage system performance, depending
on the application workload: input/output requests per second (IOPS) and throughput (MBps).
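The two metrics are linked by simple arithmetic: throughput equals IOPS multiplied by the average I/O request size. A small sketch, with purely illustrative numbers rather than measured values:

```python
# Throughput (MBps) = IOPS x average I/O size (MB).
# The workload figures below are assumptions for illustration only.

def throughput_mbps(iops: float, io_size_kb: float) -> float:
    """Convert an IOPS rate at a given I/O size into sustained MBps."""
    return iops * io_size_kb / 1024.0  # KB -> MB

# OLTP-style workload: many small 8 KB random requests
print(throughput_mbps(240_000, 8))    # 1875.0 MBps

# Backup-style workload: fewer, large 128 KB sequential requests
print(throughput_mbps(16_000, 128))   # 2000.0 MBps
```

Note that very different workloads can produce similar throughput figures, which is why IOPS and MBps must be evaluated separately against the application's request size.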
Another important factor to take into account is the response time (or latency): how much
time the application spends waiting for a response from the storage system after
submitting a particular I/O request. In other words, response time is the amount of time
required by the storage system to complete an I/O request. Response time has a direct impact
on the productivity of users who work with the application (because of how long it takes to get
the requested information) and also on the application itself. For example, slow response to
database write requests might cause multiple record locks and further performance
degradation of the application.
Key factors affecting the response time of the storage system are how quickly the required
data can be located on the media (seek time) and how quickly it can be read from or written to
the media, which in part depends on the size of the I/O requests (reading or writing more data
normally takes more time).
In addition, the majority of applications generate many storage I/O requests at the same time,
and these requests might spend some time in a queue if they cannot be immediately
handled by the storage system. The number of I/O requests that can be concurrently sent to
the storage system for execution is called the queue depth. This refers to the service queue,
that is, the queue of requests currently being processed by the storage system. If the number of
outstanding I/O requests exceeds the parallel processing capabilities of the storage system (I/O
queue depth), the requests are put into the wait queue, and then moved to the service queue
when a place becomes available. This also affects the overall response time.
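Queue depth, IOPS, and response time are tied together by Little's law: the number of requests in service equals the IOPS rate multiplied by the per-request response time. A sketch with assumed device latencies (not vendor specifications):

```python
# Little's law: requests in service = IOPS x response time (seconds).
# Rearranged to estimate achievable IOPS for a given queue depth.
# Device latencies below are rough assumptions for illustration.

def achievable_iops(queue_depth: int, response_time_ms: float) -> float:
    """Estimate IOPS a device can sustain at a given concurrency."""
    return queue_depth / (response_time_ms / 1000.0)

# HDD: ~5 ms per request at queue depth 4
print(round(achievable_iops(4, 5.0)))   # 800 IOPS

# SSD: ~0.1 ms per request at the same queue depth
print(round(achievable_iops(4, 0.1)))   # 40000 IOPS
```

The same relationship explains why reducing latency (moving to SSD) multiplies achievable IOPS even when the application issues the same number of concurrent requests.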
From the traditional spinning HDD perspective, improvement in latency is limited by the
mechanical design. Despite increases in the rotational speed of disk platters and higher densities
of stored data, the response time of an HDD is still several milliseconds, which effectively limits
its maximum IOPS. For example, a single 2.5-inch 15K RPM SAS HDD is capable of
approximately 300 IOPS.
With SSD-based eXFlash, latency is measured in dozens of microseconds (almost 100
times lower than for hard drives), which in turn leads to the 240,000 IOPS capability
identified earlier. Higher IOPS capability also supports a higher queue depth and therefore better
response time for almost all types of I/O-intensive storage applications.
In other words, if an application is multi-user, heavily loaded, and accesses storage with
random I/O requests of a small size, it is a good candidate for placing its entire data set
(or part of it) on IBM eXFlash or an external SSD-based storage system.
Conversely, if an application transfers large amounts of data, like backups or archiving, then
eXFlash might not provide any advantage because the limiting factor will be the bandwidth of
the SSD interface.
The knowledge of how the application accesses data—read-intensive or write-intensive, and
random data access or sequential data access—helps you design and implement the most
cost-efficient storage to meet required service level agreements (SLAs). Table 2 summarizes
the relationship between typical application workload patterns and application types.
Table 2 Typical application workload patterns
Application type | Read intensive | Write intensive | I/O intensive | Throughput intensive | Random access | Sequential access | Good for eXFlash
OLTP database: Yes Yes Yes Yes Yes
Data warehouse: Yes Yes Yes Yes
File server: Yes Yes Yes
Email server: Yes Yes Yes Yes Yes
Medical imaging: Yes Yes Yes Yes
Document imaging: Yes Yes Yes
Streaming media: Yes Yes Yes
Video on demand: Yes Yes Yes Yes
Web/Internet: Yes Yes Yes Yes Yes
CAD/CAM: Yes Yes Yes Yes
Archives/Backup: Yes Yes Yes
As a general rule, to deploy the most efficient storage that satisfies application performance
requirements given the required storage capacity, you should consider:
For I/O-intensive workloads: A higher number of hard drives (more drives of smaller
capacities because adding drives provides an almost linear increase in IOPS), or eXFlash
with solid state drives.
For throughput-intensive workloads: A higher bandwidth between the host controller and
storage arrays, utilizing more host ports on a controller and higher port speeds (for
example, 6 Gbps rather than 3 Gbps for SAS or 8 Gbps rather than 4 Gbps for Fibre
Channel) with a sufficient number of drives in the array to put the workload on these links.
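These sizing rules reduce to simple arithmetic: divide the target workload by a single drive's (or link's) capability and round up. The per-drive and per-link figures below are rough assumptions for illustration, not IBM specifications:

```python
import math

# IOPS scale almost linearly with drive count, so a first-order
# estimate divides the workload target by one drive's capability.
# Per-device figures are illustrative assumptions only.

def drives_needed(target: float, per_unit: float) -> int:
    """Units required to meet a workload target, rounded up."""
    return math.ceil(target / per_unit)

# I/O-intensive target of 60,000 IOPS:
print(drives_needed(60_000, 300))      # 200 x 15K RPM HDDs
print(drives_needed(60_000, 30_000))   # 2 SSDs

# Throughput-intensive target of 4,000 MBps over 6 Gbps SAS links
# (assuming ~600 MBps usable per link):
print(drives_needed(4_000, 600))       # 7 links
```

The contrast between 200 HDDs and 2 SSDs for the same IOPS target is the core of the eXFlash value proposition discussed in this paper.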
Based on Table 2, the following application types benefit from deployment
on IBM eXFlash and other SSD-based storage:
Databases
Data warehouse
Email
Medical imaging
Video on demand
Web
CAD/CAM storage
Typical IBM eXFlash usage scenarios include:
High-speed read cache in a local or SAN-based storage environment
Temporary local storage space for mid-tier applications and databases
Main (Tier 0) local data storage in single-server environments or in a distributed
scale-out environment with local-only storage or mixed local and SAN-based storage
Typical IBM SSD usage with external storage includes Tier 0 main data storage with
automated data movement capabilities, such as Easy Tier in the IBM Storwize V7000.
IBM System x server and storage products
The storage deployment scenarios described in this paper reference IBM eX5 systems, and
entry and midrange IBM System Storage® products. This section provides a brief overview of
these offerings.
IBM eX5 architecture and portfolio
The IBM eX5 product portfolio represents the fifth generation of servers built upon IBM
Enterprise X-Architecture®. Enterprise X-Architecture is the culmination of generations of
IBM technology and innovation derived from our experience in high-end enterprise servers.
Now with eX5, IBM scalable systems technology for Intel processor-based servers has also
been delivered to blades. These servers can be expanded on demand and configured by
using a building-block approach that optimizes the system design for your workload
requirements.
As a part of the IBM Smarter Planet™ initiative, our IBM Dynamic Infrastructure® charter
guides us to provide servers that improve service, reduce cost, and manage risk. These
servers scale to more CPU cores, memory, and I/O than previous systems, enabling them to
handle greater workloads than the systems they supersede. Power efficiency and machine
density are optimized, making them affordable to own and operate.
The ability to increase the memory capacity independently of the processors means that
these systems can be highly utilized, yielding the best return from your application
investment. These systems allow your enterprise to grow in processing, I/O, and memory
dimensions, so that you can provision what you need now, and expand the system to meet
future requirements. System redundancy and availability technologies are more advanced
than the technologies that were previously available in the x86 systems.
The IBM eX5 product portfolio is built on Intel® Xeon® processor E7-8800/4800/2800 product
families. With the inclusion of these processors, the eX5 servers became faster, more reliable,
and more power efficient. As with previous generations of IBM Enterprise X-Architecture
systems, these servers have delivered many class leading benchmarks, including the highest
TPC-E result for a system of any architecture.
IBM eX5 systems
The four systems in the eX5 family are the x3850 X5, x3950 X5, x3690 X5, and the HX5
blade. The eX5 technology is primarily designed around three major workloads: database
servers, server consolidation using virtualization services, and Enterprise Resource Planning
(application and database) servers. Each system can scale with additional memory by adding
an IBM MAX5 memory expansion unit to the server, and the x3850 X5, x3950 X5, and HX5
can also be scaled by connecting two systems to form a 2-node configuration.
Figure 2 shows the IBM eX5 family.
Figure 2 eX5 family (top to bottom): IBM BladeCenter® HX5 (2-node), IBM System x3690 X5, and IBM
System x3850 X5 (the IBM System x3950 X5 looks the same as the x3850 X5)
The IBM System x3850 X5 and x3950 X5 are 4U highly rack-optimized servers. The
x3850 X5 and the workload-optimized x3950 X5 are the new flagship servers of the IBM x86
server family. These systems are designed for maximum utilization, reliability, and
performance for compute-intensive and memory-intensive workloads. These servers can be
connected together to form a single system with twice the resources, or support memory
scaling with the attachment of a MAX5. With the new Intel Xeon E7 series processors, the
x3850 X5 and x3950 X5 now can scale to a two server plus two MAX5 configuration.
The IBM System x3690 X5 is a 2U rack-optimized server. This machine brings features and
performance to the middle tier, as well as a memory scalability option with MAX5.
The IBM BladeCenter HX5 is a single-wide (30 mm) blade server that follows the same
design as all previous IBM blades. The HX5 brings unprecedented levels of capacity to
high-density environments. The HX5 is expandable to form either a two-node system with
four processors, or a single-node system with the MAX5 memory expansion blade.
When compared to other machines in the IBM System x® portfolio, these systems represent
the upper end of the spectrum, are suited for the most demanding x86 tasks, and can handle
jobs which previously might have been run on other platforms. To assist with selecting the
ideal system for a given workload, IBM designed workload-specific models for virtualization
and database needs.
Table 3 gives an overview of the features of IBM eX5 systems.
Table 3 Maximum configurations for the eX5 systems
Processors, 1-node: x3850 X5/x3950 X5: 4; x3690 X5: 2; HX5: 2
Processors, 2-node: x3850 X5/x3950 X5: 8; x3690 X5: not available; HX5: 4
Memory, 1-node: x3850 X5/x3950 X5: 2048 GB (64 DIMMs) (a); x3690 X5: 1024 GB (32 DIMMs) (b); HX5: 256 GB (16 DIMMs)
Memory, 1-node with MAX5: x3850 X5/x3950 X5: 3072 GB (96 DIMMs) (a); x3690 X5: 2048 GB (64 DIMMs) (b); HX5: 640 GB (40 DIMMs)
Memory, 2-node: x3850 X5/x3950 X5: 4096 GB (128 DIMMs) (a); x3690 X5: not available; HX5: 512 GB (32 DIMMs)
Memory, 2-node with MAX5: x3850 X5/x3950 X5: 6144 GB (192 DIMMs) (a); x3690 X5: not available; HX5: not available
Disk drives (non-SSD) (c), 1-node: x3850 X5/x3950 X5: 8; x3690 X5: 16; HX5: not available
Disk drives (non-SSD) (c), 2-node: x3850 X5/x3950 X5: 16; x3690 X5: not available; HX5: not available
SSDs, 1-node: x3850 X5/x3950 X5: 16; x3690 X5: 24; HX5: 2
SSDs, 2-node: x3850 X5/x3950 X5: 32; x3690 X5: not available; HX5: 4
Standard 1 Gb Ethernet interfaces, 1-node: x3850 X5/x3950 X5: 2; x3690 X5: 2; HX5: 2
Standard 1 Gb Ethernet interfaces, 2-node: x3850 X5/x3950 X5: 4; x3690 X5: not available; HX5: 4
Standard 10 Gb Ethernet interface, 1-node: x3850 X5/x3950 X5: 2 (d); x3690 X5: 2 (d); HX5: 0
Standard 10 Gb Ethernet interface, 2-node: x3850 X5/x3950 X5: 4; x3690 X5: not available; HX5: 0
a. Requires full processors to install and use all memory.
b. Requires that the memory mezzanine board is installed along with processor 2.
c. For the x3690 X5 and x3850 X5, additional backplanes might be needed to support these numbers of drives.
d. Standard on most models.
IBM eX5 chip set
The members of the eX5 server family are defined by their ability to use IBM fifth-generation
chip sets for Intel x86 server processors. IBM engineering, under the banner of Enterprise
X-Architecture (EXA), brings advanced system features to the Intel server marketplace.
Previous generations of EXA chip sets powered System x servers from IBM with scalability
and performance beyond what was available with the chip sets from Intel.
The Intel QuickPath Interconnect (QPI) specification includes definitions for the following
items:
Processor-to-processor communications
Processor-to-I/O hub communications
Connections from processors to chip sets, such as eX5, referred to as node controllers
To fully utilize the increased computational ability of the new generation of Intel processors,
eX5 provides additional memory capacity and additional scalable memory interconnects
(SMIs), increasing bandwidth to memory. The eX5 also provides these additional reliability,
availability, and serviceability (RAS) capabilities for memory: Chipkill, Memory ProteXion, and
Full Array Memory Mirroring.
QPI uses a source snoop protocol. This technique means that a CPU, even if it knows
another processor has a cache line it wants (the cache line address is in the snoop filter, and
it is in the shared state), must request a copy of the cache line and wait for the result to be
returned from the source. The eX5 snoop filter contains the contents of the cache lines and
can return them immediately.
Memory that is directly controlled by a processor can be accessed faster than memory
attached through the eX5 chip set. However, chip-set-attached memory is connected to all
processors and introduces less delay than accesses to memory controlled by another
processor in the system.
The eX5 chip set also has, as with previous generations, connectors to allow systems to
scale beyond the capabilities provided by the Intel chip sets. We call this scaling Enterprise
X-Architecture (EXA) scaling. You can use EXA scaling to connect two x3850 X5 servers and
two MAX5 memory expansion units together to form a single system image with up to eight
Intel Xeon E7 processors and up to 6 TB of RAM.
Intel Xeon processors
The latest models of the eX5 systems use Intel Xeon E7 processors. Earlier models use Intel
Xeon 7500 or 6500 series processors.
The Intel Xeon E7 family of processors used in the eX5 systems (more precisely, the
E7-2800, E7-4800, and E7-8800 series) are follow-ons to the Intel Xeon 6500 and 7500
families of processors. Although the processor architecture is largely unchanged, the lithography size
was reduced from 45 nm to 32 nm, allowing for more cores (and thus more threads with
Hyper-Threading Technology), and more last level cache, while staying within the same
thermal design profile (TDP) and physical package size.
The three groups of the E7 family of processors support scaling to different levels:
The E7-2800 family is used in the x3690 X5 and BladeCenter HX5. Members of this series
only support two-processor configurations, so they cannot be used in a two-node HX5
configuration. Most processors in this family support connection to a MAX5 (except
E7-2803 and E7-2820).
The E7-4800 family is primarily used in the HX5 and the x3850 X5. This series supports
four-processor configurations, so can be used for single node x3850 X5 and two node
HX5s. All members of the E7-4800 family support connection to a MAX5, and can also be
used for two-node x3850 X5 with MAX5 configurations. Such configurations use EXA
scaling, which the E7-4800 processors support.
The E7-8800 family processors are used in the x3850 X5 to scale to two nodes without
MAX5s. There are specific high frequency and low power models of this processor
available for the x3690 X5 and HX5 as well.
These scalability capabilities are summarized in Table 4.
Table 4 Comparing the scalability configurations of the Intel Xeon E7 processors
x3690 X5: E7-2800: Yes; E7-4800: Yes; E7-8800: Yes
x3690 X5 with MAX5: E7-2800: Yes (a); E7-4800: Yes; E7-8800: Yes
HX5: E7-2800: Yes; E7-4800: Yes; E7-8800: Yes
HX5 with MAX5: E7-2800: Yes (a); E7-4800: Yes; E7-8800: Yes
HX5 2-node: E7-2800: not supported; E7-4800: Yes; E7-8800: Yes
x3850 X5: E7-2800: not supported; E7-4800: Yes; E7-8800: Yes
x3850 X5 with MAX5: E7-2800: not supported; E7-4800: Yes; E7-8800: Yes
x3850 X5 2-node without MAX5: E7-2800: not supported; E7-4800: not supported; E7-8800: Yes
x3850 X5 2-node with MAX5: E7-2800: not supported; E7-4800: Yes (EXA scaling); E7-8800: Yes (EXA scaling)
a. E7-2803 and E7-2820 processors do not support MAX5.
For additional information about the IBM eX5 portfolio refer to the following publication:
IBM eX5 Portfolio Overview: IBM System x3850 X5, x3950 X5, x3690 X5, and
BladeCenter HX5
http://www.redbooks.ibm.com/abstracts/redp4650.html
IBM eXFlash technology
IBM eXFlash technology is a server-based high performance internal storage solution that is
based on Solid State Drives (SSDs) and performance-optimized disk controllers (both RAID
and non-RAID).
A single eXFlash unit accommodates up to eight hot-swap SSDs, and can be connected to up
to two performance-optimized controllers. eXFlash is supported on IBM System x3690 X5,
x3850 X5, and x3950 X5 servers.
Figure 3 shows an eXFlash unit, with the status lights assembly on the left side.
Figure 3 IBM eXFlash unit
Each eXFlash unit occupies four 2.5-inch SAS hard disk drive bays. The eXFlash units can
be installed in the following configurations:
The x3850 X5 can have up to sixteen 1.8-inch SSDs with up to two eXFlash units (up to
eight SSDs per eXFlash unit).
The x3950 X5 database-optimized models have two eXFlash units standard with sixteen
200 GB SSDs installed (for the models with Intel Xeon E7 series processors), for a total of
32 SSDs in a dual-node configuration.
The x3690 X5 can have up to twenty-four 1.8-inch SSDs with up to three eXFlash units (up
to eight SSDs per eXFlash unit).
A single IBM eXFlash unit has the following characteristics:
Up to eight 1.8-inch hot-swap front-accessible SSDs
Up to 240,000 random read IOPS
Up to 2 GBps of sustained read throughput
Up to 1.6 TB of available storage space with IBM 200 GB 1.8-inch eMLC SSDs or up to
400 GB with IBM 50 GB 1.8-inch eMLC SSDs
In theory, the random I/O performance of a single eXFlash unit is equivalent to that of a
storage system consisting of about 800 traditional spinning HDDs. Besides the HDDs
themselves, building such a massive I/O-intensive high-performance storage system requires
external deployment with many additional infrastructure components including host bus
adapters (HBAs), switches, storage controllers, disk expansion enclosures, and cables.
Consequently, this leads to more capital expenses, floor space, electrical power
requirements, and operations and support costs. Because eXFlash is based on internal
server storage, it does not require all those additional components and their associated costs
and environmental requirements.
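The 800-drive equivalence follows directly from the figures quoted earlier: 240,000 random read IOPS per eXFlash unit divided by roughly 300 IOPS for a single 15K RPM SAS HDD.

```python
# Arithmetic behind the "about 800 traditional HDDs" equivalence claim,
# using the approximate per-device figures cited in the text.

exflash_iops = 240_000   # random read IOPS per eXFlash unit
hdd_iops = 300           # approximate IOPS per 15K RPM SAS HDD

print(exflash_iops // hdd_iops)  # 800 HDDs
```

This is a capability comparison only; it does not account for the capacity, redundancy, or throughput characteristics of an 800-drive array.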
In summary, an IBM eXFlash solution provides the following benefits:
Significantly lower implementation cost (up to 97% lower) for high performance
I/O-intensive storage systems with the best IOPS/$ performance ratio
Significantly higher performance (up to 30 times or more) for I/O-intensive applications,
such as databases and business analytics, with up to nine times lower response time
Significant savings in power and cooling (up to 90%) with a high performance per watt
ratio
Significant savings in floor space (up to 30 times less) with extreme performance per rack
U space ratio
Simplified management and maintenance with internal server-based configurations (no
external power and information infrastructure needed)
IBM eXFlash is optimized for a heavy mix of random read and write operations, such as
transaction processing, data mining, business intelligence and decision support, and other
random I/O-intensive applications. In addition to its superior performance, eXFlash offers
superior uptime with three times the reliability of mechanical disk drives. SSDs have no
moving parts to fail. They use Enterprise Wear-Leveling to extend their use even longer. All
operating systems that are listed in IBM ServerProven® for each machine are supported for
use with eXFlash.
The eXFlash SSD backplane uses two long SAS cables, which are included with the
backplane option. When more than one SSD backplane is installed, each backplane must be
connected to a separate disk controller.
In environments where RAID protection is required, that is, where eXFlash is used as primary
data storage, use two RAID controllers per backplane to ensure that peak IOPS can be
reached. Although a single RAID controller results in a functioning solution, peak IOPS can be
reduced by approximately 50%. Use the ServeRAID M5014 or M5015 with the
ServeRAID M5000 Performance Accelerator Key when all drives are SSDs. Alternatively, the
ServeRAID B5015 SSD Controller can be used instead.
The main advantage of B5015 and M5014 or M5015 with Performance Key controllers for
SSDs is a Cut Through I/O (CTIO) feature enabled. CTIO optimizes highly random read and
write I/O operations for small data blocks to support the high IOPS capabilities of SSD drives
and to ensure the fastest response time to the application. For example, enabling CTIO on a
RAID controller with SSDs allows the controller to achieve up to two times more IOPS
compared to a controller with the CTIO feature disabled.
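The controller guidance above amounts to two rules of thumb: a single controller roughly halves achievable peak IOPS, and disabling CTIO roughly halves it again. The following back-of-the-envelope sketch is purely illustrative (the function and the 87,000 IOPS baseline, the per-unit peak cited later in this paper, are our assumptions, not a published formula):

```python
def backplane_peak_iops(base_iops, controllers=2, ctio_enabled=True):
    """Rough per-backplane IOPS estimate from the rules of thumb in the text:
    one controller instead of two roughly halves peak IOPS, and disabling
    CTIO roughly halves it again."""
    iops = float(base_iops)
    if controllers < 2:
        iops *= 0.5   # single-controller configuration works but loses ~50%
    if not ctio_enabled:
        iops *= 0.5   # CTIO roughly doubles IOPS, so disabling it halves them
    return iops

# Assumption: ~87,000 IOPS peak for a fully configured eXFlash backplane.
print(backplane_peak_iops(87_000))                      # 87000.0
print(backplane_peak_iops(87_000, controllers=1))       # 43500.0
print(backplane_peak_iops(87_000, ctio_enabled=False))  # 43500.0
```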
In a non-RAID environment where eXFlash can be used as a high-speed read cache, use the
IBM 6 Gb Performance Optimized HBA to ensure maximum random I/O read performance is
achieved. Only one 6 Gb SSD HBA is supported per single SSD backplane.
It is possible to mix RAID and non-RAID environments; however, the maximum number of
disk controllers that can be used with all SSD backplanes in a single system is four.
IBM eXFlash requires the following components:
- IBM eXFlash hot-swap SAS SSD backplane (if not already installed on a standard pre-configured model)
- IBM solid state drives (SSDs)
- IBM disk controllers
Table 5 shows ordering information for eXFlash backplanes.

Table 5  IBM eXFlash 8x 1.8-inch HS SAS SSD Backplane
Part number  Feature code  Description
59Y6213      4191          IBM eXFlash 8x 1.8-inch HS SAS SSD Backplane for x3850 X5
60Y0360      9281          IBM eXFlash 8x 1.8-inch HS SAS SSD Backplane for x3690 X5

Note: IBM System x3850 X5 and x3690 X5 have different eXFlash backplanes.

Table 6 lists the 1.8-inch solid state disk (SSD) options that are supported in the eX5
systems. These drives are supported with the eXFlash SSD backplane, part number 59Y6213.

Table 6  Supported 1.8-inch SSDs
Part number  Feature code  Description
43W7726      5428          IBM 50 GB SATA 1.8-inch MLC SSD
43W7746      5420          IBM 200 GB SATA 1.8-inch MLC SSD

Table 7 lists the supported controllers.

Note: A single eXFlash unit requires a dedicated controller (or two controllers). When used
with eXFlash, these controllers cannot be connected to the HDD backplanes. The
ServeRAID B5015 SSD Controller is only supported with SSDs.
Table 7  Controllers supported with the eXFlash SSD backplane option
Part number  Feature code  Description
46M0912      3876          IBM 6 Gb Performance Optimized HBA
46M0916      3877          ServeRAID M5014 SAS/SATA Controller (a)
46M0829      0093          ServeRAID M5015 SAS/SATA Controller (a)
46M0969      3889          ServeRAID B5015 SSD Controller
81Y4426      A10C          ServeRAID M5000 Series Performance Accelerator Key (b)

a. Requires the M5000 Performance Accelerator Key when used with eXFlash.
b. Adds Cut Through I/O (CTIO) for SSD FastPath optimization on ServeRAID M5014, M5015, and M5025 controllers.

For more information about the devices mentioned here, see the relevant IBM Redbooks®
at-a-glance guides:
- Solid State Drives for IBM BladeCenter and System x servers
  http://www.redbooks.ibm.com/abstracts/tips0792.html
- IBM 6 Gb Performance Optimized HBA
  http://www.redbooks.ibm.com/abstracts/tips0744.html
- ServeRAID B5015 SSD Controller
  http://www.redbooks.ibm.com/abstracts/tips0763.html
- ServeRAID M5015 and M5014 SAS/SATA Controllers
  http://www.redbooks.ibm.com/abstracts/tips0738.html
- ServeRAID M5000 Series Performance Accelerator Key for IBM System x
  http://www.redbooks.ibm.com/abstracts/tips0799.html

IBM System Storage products

The IBM System Storage disk products portfolio covers the needs of a wide spectrum of
possible implementations, from entry-level to large enterprise. It combines the high
performance of the IBM System Storage DS8000® series and XIV® enterprise storage
systems with the Storwize V7000, N Series, and DS5000 series of midrange systems, and
with the DS3500 series of low-priced entry systems.

The family is further complemented by a range of expansion enclosures that expand the disk
storage capacities of individual systems into hundreds of terabytes (TB), or even to a
petabyte (PB). Furthermore, a full range of IBM System Storage capabilities, such as
advanced copy services, management tools, and virtualization services, is available to help
protect data.

For the purposes of this paper, we compare the key capabilities of the entry SAN (DS3500
series), midrange SAN (DS5000 series and V7000), and NAS (midrange N Series and Scale
Out Network Attached Storage, SONAS) storage families from the performance, scalability,
capacity, and advanced features points of view. Table 8 compares the DS3524, V7000,
DS5300, and N6270, the top models of the respective storage families. The purpose of this
comparison is to provide a brief overview of the relative capabilities of various IBM System
Storage families so that you can understand their positioning and evaluate which ones are
most suitable for your information infrastructure to handle projected workloads.
Table 8  IBM System Storage feature comparison: SAN storage

Feature                      DS3524 (SAN)                    V7000 (SAN)            DS5300 (SAN)          N6270 (NAS or file storage)
Scalability and capacity
Host connectivity            FC, iSCSI, SAS                  FC, iSCSI              FC, iSCSI             FC, iSCSI, FCoE, NAS
Host interface               8 Gb FC, GbE, 10 GbE, 6 Gb SAS  8 Gb FC, GbE, 10 GbE   8 Gb FC, GbE, 10 GbE  4 Gb FC, 8 Gb FC, GbE, 10 GbE
Max number of host ports     12                              32                     16                    56
Max number of drives         192                             240 (LFF), 480 (SFF)   480                   960
Drive types                  SAS, NL SAS, SSD                SAS, SSD               FC, SATA, SSD         FC, SAS, SATA
Max raw capacity, TB         384                             480 (LFF), 288 (SFF)   960                   2,880
Max cache size, GB           4                               32                     64                    32
Performance
SPC-1 IOPS                   24,449 (a)                      56,511 (b)             62,243 (c)            Not available
SPC-2 throughput, MBps       2,510                           3,132                  5,543                 Not available
SPECsfs2008_nfs, ops/s       Not available                   Not available          Not available         101,183 (d)
Advanced features
Snapshots                    Yes                             Yes                    Yes                   Yes
Remote replication           Yes                             Yes                    Yes                   Yes
Automated storage tiering    No                              Yes                    No                    Yes

a. 96x 300 GB 10K RPM 2.5" SAS HDDs
b. 240x 300 GB 10K RPM 2.5" SAS HDDs
c. 256x 146.8 GB 15K RPM DDMs
d. 360x 450 GB 15K RPM SAS HDDs

For more information about SPC benchmarks and results, visit the Storage Performance
Council (SPC) website:
http://www.storageperformance.org

For more information about SPECsfs2008 benchmarks and results, visit the Standard
Performance Evaluation Corporation (SPEC) website:
http://www.spec.org/sfs2008/

For more information about available IBM System Storage offerings, refer to the IBM System
Storage Solutions Handbook:
http://www.redbooks.ibm.com/abstracts/sg245250.html
IBM eXFlash deployment scenarios
The following subsections provide some ideas on where to deploy SSD-based IBM eXFlash
solutions locally or in conjunction with external storage to get significant performance benefits
while keeping costs optimized. The following scenarios are discussed:
- Transaction processing (OLTP databases)
- Data warehousing (OLAP databases)
- Corporate email
- Actively connected users (Web 2.0)
- Medical imaging
- Video on demand
Each subsection includes a workload description, possible sources of storage I/O
performance issues assuming no other bottlenecks exist in the system, and recommended
internal IBM eXFlash-based or external IBM System Storage solutions that also include
high-availability and scalability considerations.
OLTP databases
Online transaction processing (OLTP) is a multi-user, memory-, CPU-, and storage
I/O-intensive, random workload. It is characterized by a large number of small read and write
storage I/O requests (typically four or eight kilobytes and 70/30 read/write ratio) generated by
transactions originated by multiple users. The transactions are relatively simple; however,
every single transaction can generate dozens of physical storage I/O requests depending on
transaction type, application architecture, and business model used.
The key performance indicator of transactional systems is the response time: the client
expects to get the requested product information or place an order quickly. If those
expectations are not met, the chance that the client will go to a competitor is high. Because of
that, storage I/O performance is an important factor in meeting response time goals and in
keeping other system resources (CPU and memory) well utilized rather than waiting for data.
Typically, the OLTP workloads can be classified as light, medium, or heavy based on the
number of transactions per second (tps) as follows:
- Light workload: Up to 100 tps
- Medium workload: Up to 1,000 tps
- Heavy workload: More than 1,000 tps
For the purposes of this use case, assume a local 1 TB database and an OLTP workload of
1,000 tps, where each OLTP transaction generates about 25 physical storage I/O requests
on average. This translates into 25,000 IOPS that the storage system must deliver to ensure
acceptable response time and balanced performance. Because IOPS in a traditional
HDD-based system depend directly on the number of drives, we must ensure that we have
enough HDDs to support 25,000 IOPS. We assume that a single HDD is capable of 300 IOPS
with an OLTP workload, and that a single four-socket x3850 X5 can process about 3,000
transactions of this kind per second.
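The sizing arithmetic above can be sketched as a short calculation. This is a minimal Python sketch; the function name and structure are illustrative, and the inputs are exactly the assumptions stated in the text:

```python
import math

def oltp_storage_sizing(tps, ios_per_txn, hdd_iops, exflash_iops):
    """Translate an OLTP transaction rate into physical storage IOPS and
    the number of devices needed to sustain them."""
    required_iops = tps * ios_per_txn
    hdds_needed = math.ceil(required_iops / hdd_iops)
    exflash_units = math.ceil(required_iops / exflash_iops)
    return required_iops, hdds_needed, exflash_units

# Assumptions from the scenario: 1,000 tps, ~25 physical I/Os per transaction,
# ~300 IOPS per 15K SAS HDD, and up to 87,000 IOPS per eXFlash unit.
iops, hdds, units = oltp_storage_sizing(1000, 25, 300, 87_000)
print(iops, hdds, units)  # 25000 84 1
```

The 84-HDD result matches the drive count used in the traditional configuration that follows.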
Figure 4 illustrates this scenario using two approaches: achieving required IOPS with
traditional HDDs and with IBM eXFlash.
Figure 4 IBM eXFlash versus traditional HDDs: OLTP databases
Using the traditional HDD-based approach, we need to deploy one x3850 X5 connected
through an IBM ServeRAID M5025 SAS/SATA Controller to four IBM System Storage
EXP2524 enclosures with 84 hard drives.
With IBM eXFlash, which is capable of up to 87,000 IOPS for OLTP workloads, there is no
need for external disk storage at all because all I/O performance requirements are met
internally. Although both configurations can handle 25,000 IOPS, the eXFlash-based
configuration has much higher IOPS capacity of 87,000 IOPS, whereas the HDD-based
configuration is just about at its maximum of 25,200 IOPS for the given number of drives. In
addition, the response time of the IBM eXFlash-based configuration is significantly better.
As you can see, IBM eXFlash is able to significantly reduce power and cooling requirements,
occupied rack space, and management and operational costs, all while providing better
reliability and the same or better performance levels. Table 9 summarizes characteristics of
each scenario and highlights IBM eXFlash advantages.
Table 9  Traditional HDDs versus IBM eXFlash: Local OLTP workload scenario

Characteristic                         Traditional HDD           IBM eXFlash           IBM eXFlash advantage
Number of drives                       84                        8                     Fewer components to acquire and maintain, higher reliability, server-based management
Drive type                             73 GB 15K 2.5" SAS HDD    200 GB 1.8" SATA SSD
Location                               External                  Internal
Maximum IOPS capacity                  25,200                    87,000                Transparent performance scaling at no additional cost, significantly better IOPS/$ ratio, ~3 times faster response time
Used IOPS capacity                     25,000                    25,000
Processing time (25,000 I/O requests)  1 sec                     0.29 sec
To ensure the high availability requirements are met in case of a node failure, several
techniques can be utilized depending on the database vendor. These techniques include:
- Log shipping
- Replication
- Database mirroring
The partitioning feature of many databases (for example, IBM DB2®) can help to split the
workload between several nodes, thereby increasing overall performance, availability, and
capacity.
If, for some reason, the entire database cannot be placed onto IBM eXFlash, consider putting
at least part of the data there. The areas to look at include:
- Log files
- Temporary table space
- Frequently accessed tables
- Partition tables
- Indexes
Some databases (for example, Oracle) support extending their own data buffers to SSDs,
which provides a significant, cost-efficient performance increase.
There are more complex configurations, where tiered storage is implemented with both
internal and external storage, and the data is moved between storage tiers based on defined
policies. The data movement can be implemented manually using the available database
tools (like the DB2 Reorg utility in DB2) or automatically using the appropriate storage
management software, for example, IBM GPFS.
Data warehouses
Data warehouses are commonly used with online analytical processing (OLAP) workloads in
decision support systems, for example, financial analysis. Unlike OLTP, where transactions
are typically relatively simple and deal with small amounts of data, OLAP queries are much
more complex and process large volumes of data. By its nature, an OLAP workload is
sequentially read-intensive and throughput-intensive; however, in multipurpose, multi-user
environments it becomes truly random and therefore sensitive to IOPS for a given I/O
request size.
Table 9 (continued)

Characteristic                       Traditional HDD         IBM eXFlash         IBM eXFlash advantage
Raw storage capacity                 6.1 TB                  1.6 TB              Significantly better storage space utilization, no wasted storage space, comparable utilized GB/$ ratio
Effective RAID capacity              5.3 TB                  1.2 TB
Used capacity                        1 TB                    1 TB
Power consumption (unit of measure)  kW                      W                   99% savings in power and cooling costs, no external power infrastructure required
Rack U space                         12U                     4U                  No additional rack space required
Acquisition costs                    Comparable to eXFlash   Comparable to HDD   Higher scalability, reliability, and performance, and significantly lower power consumption at a comparable acquisition cost
OLAP databases are normally separated from OLTP databases, and OLAP databases
consolidate historical and reference information from multiple sources. Queries are submitted
to OLAP databases to analyze consolidated data from different points of view to make better
business decisions in a timely manner.
For OLAP workloads, it is critical to have a fast response time to ensure that business
decisions support an organization’s strategy and are made in a timely manner in response to
changing market conditions; delay might significantly increase business and financial risks.
Because of that, storage I/O capabilities must match the performance of other server
subsystems to ensure that queries are processed as quickly as possible.
For illustration purposes, consider the following scenario. Multiple business analysts need to
evaluate current business performance and discover new potential opportunities. They submit
ten queries, and their queries need to cumulatively process 500 GB of historical data.
Possible approaches to implement this solution are shown on Figure 5.
Figure 5 IBM eXFlash versus traditional HDDs: Data warehouses
In one case, the storage system used with an OLAP server consists of 96 hard disk drives
and is able to deliver 600 MBps of throughput in RAID-5 arrays with random 16 KB I/O
requests. Given this, the queries will be completed by the storage system in approximately 14
minutes, or 1.4 minutes per query on average.
Alternatively, with two IBM eXFlash units, each capable of approximately 1,300 MBps of
throughput in a RAID-5 array with random 16 KB I/O requests, the time to complete the tasks
decreases by more than four times, to 3.2 minutes, or about 20 seconds per query. IBM
eXFlash also provides additional benefits and advantages at comparable acquisition costs,
as shown in Table 10.
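The query-time arithmetic above follows directly from the stated throughput figures. A minimal Python sketch (the helper is illustrative; the 600 MBps and 2,600 MBps inputs are the figures from the text):

```python
def olap_scan_time(data_gb, throughput_mbps, num_queries):
    """Time for the storage system to deliver the scanned data set,
    in total and per query."""
    total_sec = data_gb * 1000 / throughput_mbps
    return total_sec, total_sec / num_queries

# Assumptions from the scenario: ten queries scanning 500 GB in total.
hdd_total, hdd_per_query = olap_scan_time(500, 600, 10)    # 96-HDD array
ssd_total, ssd_per_query = olap_scan_time(500, 2600, 10)   # 2x eXFlash units
print(round(hdd_total / 60, 1), round(ssd_total / 60, 1))  # 13.9 3.2
```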
Table 10  Traditional HDDs versus IBM eXFlash: Local OLAP workload scenario

Characteristic     Traditional HDD          IBM eXFlash            IBM eXFlash advantage
Number of drives   96                       16                     Fewer components to acquire and maintain, higher reliability, server-based management
Drive type         73 GB 15K 2.5" SAS HDD   200 GB 1.8" SATA SSD
Location           External                 Internal
Corporate email
Corporate email applications like IBM Lotus® Domino® or Microsoft Exchange use
databases to store messages and attachments, logging features for recovery purposes, and
indexes for fast searching of data. The email workload is multi-user, random, storage
I/O-intensive, and characterized by moderate to low numbers of small read and write I/O
requests per second.
Although the storage IOPS are relatively low for email workloads compared to OLTP, they
play an important role in increasing the utilization of the system resources and providing fast
response time for the users working with email.
Consider the following scenario. You need to deploy 10,000 active mailboxes across an
organization, each with a 250 MB disk space quota, and you expect heavy email exchange
resulting in about 8,000 peak storage IOPS. You also decided to implement local-only
storage for email servers, and you use replication of data between servers for high availability
purposes.
We assume that a single IBM System x3690 X5 server can host up to 10,000 active
mailboxes, and a single 2.5” 15K RPM SAS HDD is capable of 300 IOPS.
Given the scenario with traditional HDDs you need four IBM x3690 X5 servers to meet the
requirements outlined previously. In such a case, each server can host up to 5,000 mailboxes
because the number of mailboxes is limited by the capabilities of local HDD storage
(approximately 4,000 IOPS with 14 HDDs). The storage space required for 5,000 mailboxes
is approximately 1.3 TB. During normal operations, each server hosts 2,500 mailboxes (2,000
IOPS), and the remaining capacity is reserved for a failover scenario where the server would
pick up workload from a failed node.
With IBM eXFlash, one IBM x3690 X5 server can host 10,000 mailboxes because storage I/O
is no longer a limiting factor. In this case you need two x3690 X5 servers with two eXFlash
units in each server, and each server will support 5,000 mailboxes during normal operations,
and can accommodate 10,000 mailboxes in case of failover.
Table 10 (continued)

Characteristic                       Traditional HDD         IBM eXFlash         IBM eXFlash advantage
Throughput, MBps                     600                     2,600               Significantly faster query execution
Processing time per query, sec       84                      20
Raw storage capacity                 7.0 TB                  3.2 TB              Better storage space utilization, less wasted storage space, comparable utilized GB/$ ratio
Effective RAID capacity              6.4 TB                  2.8 TB
Used capacity                        500 GB                  500 GB
Power consumption (unit of measure)  kW                      W                   99% savings in power and cooling costs, no external power infrastructure required
Rack U space                         12U                     4U                  No additional rack space required
Acquisition costs                    Comparable to eXFlash   Comparable to HDD   Higher scalability, reliability, and performance, and significantly lower power consumption at a comparable acquisition cost
These scenarios are illustrated on Figure 6.
Figure 6 IBM eXFlash versus traditional HDDs: Corporate email
Table 11 summarizes the characteristics of different local online storage approaches.
Table 11  Traditional HDDs versus IBM eXFlash: Local email workload scenario

Characteristic                        Traditional HDD            IBM eXFlash           IBM eXFlash advantage
Number of servers                     4                          2                     Fewer components to acquire and maintain; higher reliability; lower management, maintenance, and support costs
Number of drives                      56                         32
Drive type                            146 GB 15K 2.5" SAS HDD    200 GB 1.8" SATA SSD
Location                              Internal                   Internal
Maximum IOPS                          16,000                     174,000               Significantly faster response time (~10 times) and higher IOPS capacity
Processing time (8,000 I/O requests)  0.5 sec                    0.05 sec
Raw storage capacity                  8.2 TB                     6.4 TB                Better storage space utilization, less wasted storage space, comparable utilized GB/$ ratio
RAID-5 capacity                       7.7 TB                     5.6 TB
Used capacity                         5.0 TB                     5.0 TB
Storage power (unit of measure)       W                          W                     Three times lower power and cooling costs
Rack U space                          8U                         4U                    Less rack space required
Acquisition costs                     Comparable to eXFlash (a)  Comparable to HDD     Higher scalability, reliability, and performance, and significantly lower power consumption at a comparable acquisition cost

a. Includes the acquisition cost of additional server hardware.
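The server-count arithmetic for the email scenario can be sketched as a small Python function. The failover model (each server normally runs at half capacity so a peer can absorb its load) follows the text, and the per-server limits are the stated assumptions, not measured values:

```python
import math

def email_servers_needed(mailboxes, peak_iops, mailbox_cap, iops_cap):
    """Servers needed so that both the mailbox count and the peak IOPS fit,
    then doubled so that a surviving server can absorb a failed peer's load
    (each server runs at half capacity during normal operations)."""
    by_mailboxes = math.ceil(mailboxes / mailbox_cap)
    by_iops = math.ceil(peak_iops / iops_cap)
    return 2 * max(by_mailboxes, by_iops)

# Assumptions from the scenario: 10,000 mailboxes and 8,000 peak storage IOPS.
# HDD servers: capped at 5,000 mailboxes / ~4,000 IOPS by 14 internal HDDs.
# eXFlash servers: 10,000-mailbox capable, 2x eXFlash (~174,000 IOPS) each.
hdd_servers = email_servers_needed(10_000, 8_000, 5_000, 4_000)
ssd_servers = email_servers_needed(10_000, 8_000, 10_000, 174_000)
print(hdd_servers, ssd_servers)  # 4 2
```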
Actively connected users
In a shared collaborative environment where many users work together, they might produce
and operate on large amounts of structured and unstructured content, which requires large
storage capacity as well as throughput and IOPS performance. This is especially true for
large Web 2.0 deployments with hundreds, thousands, or even millions of users who
participate in online gaming, photo and video sharing, social networking, and other activities
through the web interface. This workload is highly random, both read- and write-intensive,
and extremely I/O-intensive. In addition, because of its highly heterogeneous nature, the
workload can be difficult to predict, and it also requires a large amount of data to be stored.
Usually, to achieve the best response time and to meet capacity requirements, the storage
systems for such workloads are deployed using multi-tiered scale-out storage architecture
with automated storage tiering management. This approach allows the provision of petabytes
of storage capacity and hundreds of thousands of operations per second with dozens of
gigabytes per second of throughput to maintain fast response times for the users.
Consider this scenario. An organization provides the following online services for their
web-based users: gaming, messaging, chatting, and photo and video sharing. The total
number of active users is 250,000, and there are 50,000 concurrent users that generate
25,000 storage I/O requests per second. Each user has a storage space quota of 1 GB.
There are four IBM System x3690 X5 servers with load balancing that process user requests,
assuming each server is capable of hosting up to 12,500 concurrent users.
IBM Storwize V7000 is used as an external disk storage system. The nearline storage tier is
built with 144x 2 TB 7.2K RPM 3.5” SATA HDDs, providing 288 TB of total space in a RAID-5
configuration (12 RAID-5 arrays of 12 HDDs each). The online storage tier is built with 96x
73 GB 15K RPM 2.5” SAS HDDs in eight RAID-5 arrays of 12 HDDs each, providing 6.4 TB
of active storage space and up to 28,800 IOPS to meet the concurrent IOPS requirements
(assuming that each concurrent user actively works with 100 MB of data, 10% of the quota,
for a total of 5 TB across 50,000 concurrent users). IBM GPFS is used as a scale-out file
system, and it also provides policy-based automated storage tier management.
With IBM eXFlash, the nearline storage tier is still based on 144 SATA HDDs with IBM
Storwize V7000, but the online tier is implemented with eight internal 200 GB eXFlash SSDs
installed in each of four IBM System x3690 X5 servers. This provides 5.6 TB of tier 0 online
storage (1.4 TB with RAID-5 per server), and 174,000 IOPS (43,500 IOPS per server with
one IBM ServeRAID B5015 Controller). Automated tier management and scale-out
capabilities are also provided by IBM GPFS.
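As a quick sanity check, both online-tier designs above can be tested against the 25,000 IOPS and ~5 TB targets. The helper function below is illustrative; the drive counts, per-device IOPS, and usable capacities are the figures stated in the text:

```python
def online_tier_ok(supplied_iops, usable_tb, needed_iops, needed_tb):
    """Does a candidate online (Tier 0/1) design meet the concurrent IOPS
    and active-capacity targets?"""
    return supplied_iops >= needed_iops and usable_tb >= needed_tb

# Targets from the scenario: 25,000 IOPS and ~5 TB of actively used data.
needed_iops, needed_tb = 25_000, 5.0

# HDD tier: 96x 73 GB 15K SAS drives at ~300 IOPS each, 6.4 TB usable RAID-5.
hdd_ok = online_tier_ok(96 * 300, 6.4, needed_iops, needed_tb)

# eXFlash tier: 4 servers x 43,500 IOPS (one B5015 each), 4x 1.4 TB RAID-5.
ssd_ok = online_tier_ok(4 * 43_500, 4 * 1.4, needed_iops, needed_tb)
print(hdd_ok, ssd_ok)  # True True
```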
Figure 7 illustrates these use cases.
Figure 7 IBM eXFlash versus traditional HDDs: Actively connected users
Table 12 compares an IBM eXFlash-based approach with traditional HDD disk systems.
Table 12  Traditional HDDs versus IBM eXFlash: Actively connected users

Characteristic                                 Traditional HDD          IBM eXFlash           IBM eXFlash advantage
Number of servers                              4                        4                     Fewer components to acquire and maintain; higher reliability; lower management, maintenance, and support costs
Number of online drives                        96                       32
Online drive type                              73 GB 15K 2.5" SAS HDD   200 GB 1.8" SATA SSD
Location                                       External                 Internal
Maximum IOPS                                   28,800                   174,000               Significantly faster response time (6 times faster) and higher IOPS capacity
Storage processing time (25,000 I/O requests)  0.87 sec                 0.14 sec
Raw storage capacity                           7.0 TB                   6.4 TB                Better storage space utilization, less wasted storage space, comparable utilized GB/$ ratio
RAID-5 capacity                                6.4 TB                   5.6 TB
Used capacity                                  5.0 TB                   5.0 TB
Storage power (unit of measure)                kW                       W                     99% savings in power and cooling costs, no external power infrastructure required
Medical imaging
Medical imaging is widely used for diagnostics purposes, and it includes many sources of
such information, for example, magnetic resonance imaging, computed tomography, digital
X-ray, positron emission tomography, ultrasound, and digital cardiology. All this data must be
stored for a long period of time, and this requires large storage space, sometimes petabytes
of digital data, because the study sizes can range from dozens of megabytes to several
hundred megabytes. At the same time, for faster diagnostics, there is a need to quickly get
the required images when needed.
Medical imaging is a multi-user random read-intensive workload. The key performance goal
is to achieve high throughput in MBps with random reads to ensure the quick delivery of
current information. The tiered storage design model, with a scale-out approach and
automated tier management capabilities, fits such workloads well because it provides the
required throughput and response time together with high data storage capacity. In this case, after the
images are acquired they are placed on Tier 0 or Tier 1 storage (or short-term cache) to be
ready for examination by diagnosticians. After a certain period, for example when the patient
has been treated, these images are moved to the nearline tier (long-term cache). In addition,
when patients are scheduled to visit the doctor by appointment, their studies can be retrieved
from the long-term cache and put into short-term cache if needed.
Consider the following scenario. There are 500 patients treated every day in the hospital, 50
patients are treated by doctors at the same time, and the average patient’s study is 100 MB.
The throughput required from the Picture Archiving and Communication System (PACS)
server to fulfill doctors’ requests for studies should be 5 GBps to deliver the data in one
second or 2.5 GBps to deliver the data in two seconds. The space required to store the
short-term data is 5 GB, and about 1.5 TB of data is added to the medical archive on a
monthly basis.
The main archive is based on nearline storage consisting of IBM System Storage N6270 with
200x 2 TB SATA drives providing 400 TB of available storage space. If the short-term cache
is built on traditional HDDs, then 240 HDDs are required to achieve 5 GBps or 120 HDDs to
achieve 2.5 GBps. With IBM eXFlash, you need three units to meet 5 GBps throughput, or
two units to meet 2.5 GBps.
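The throughput sizing above can be sketched as follows. This is a Python sketch; the ~21 MBps per-HDD random-read rate is derived from the drive counts given in the text (roughly 5 GBps across 240 drives), and ~2,000 MBps per eXFlash unit is an assumption consistent with the read-throughput figure cited later:

```python
def tier_throughput_gbps(devices, per_device_mbps):
    """Aggregate random-read throughput of a storage tier, in GBps."""
    return devices * per_device_mbps / 1000

# Assumed per-device random-read rates: ~21 MBps per 15K SAS HDD
# and ~2,000 MBps per eXFlash unit.
print(tier_throughput_gbps(240, 21))    # ~5 GBps: one-second study delivery
print(tier_throughput_gbps(120, 21))    # ~2.5 GBps: two-second delivery
print(tier_throughput_gbps(3, 2000))    # 6.0 GBps with three eXFlash units
print(tier_throughput_gbps(2, 2000))    # 4.0 GBps with two eXFlash units
```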
Rack U space (Tier 0/1) 8U 0U Less rack space required
Acquisition costs Comparable to
eXFlash
Comparable to
HDD
Higher scalability, reliability, and
performance, and significantly
lower power consumption at the
comparable acquisition cost
Characteristic Traditional HDD IBM eXFlash IBM eXFlash advantage Choosing eXFlash Storage on IBM eX5 Servers 25
These scenarios are shown on Figure 8.
Figure 8 IBM eXFlash versus traditional HDDs: Medical imaging
Table 13 summarizes the characteristics of each scenario.
Table 13  Traditional HDDs versus IBM eXFlash: Medical imaging

Characteristic            Traditional HDD          IBM eXFlash           IBM eXFlash advantage
Number of servers         1                        1                     Fewer components to acquire and maintain; higher reliability; lower management, maintenance, and support costs
Number of online drives   120                      24
Online drive type         73 GB 15K 2.5" SAS HDD   200 GB 1.8" SATA SSD
Location                  External                 Internal
Maximum throughput        2.7 GBps                 6 GBps                Higher throughput capacity, faster transfer rates
Study load time           1.9 sec                  0.8 sec
Raw storage capacity      8.8 TB                   4.8 TB                Better storage space utilization, less wasted storage space, comparable utilized GB/$ ratio
RAID-5 capacity           8.0 TB                   4.2 TB
Used capacity             5 GB                     5 GB
Video on demand
While video on demand is traditionally a sequential, throughput-intensive workload, in a
multi-user environment, where every user receives their own data stream while watching
different content, or even the same content with some delay (for example, a recently
published movie), the workload becomes randomized. This requires faster response times to
ensure a better user experience and smoother video playback. In general, video streaming
applications use 64 KB I/O blocks to interact with the storage system.
As with medical imaging, video libraries require a significant amount of storage space and
sufficient throughput. This can be achieved using a tiered storage design approach with
automated tier management capabilities. With such an approach, movie libraries reside on
nearline storage, and the movies currently being watched are placed on the online (Tier 0 or
Tier 1) storage.
Consider the following scenario. A provider of on demand video content has 50,000
subscribers, and 10,000 movies in their video library. Five thousand subscribers are active at
the same time, and they watch 1,000 videos simultaneously (that is, one video is watched by
five subscribers on average). The provider uses SD video that requires about 4 Mbps or 0.5
MBps per stream, and the average movie size is 4 GB.
Let’s assume that a single IBM System x3690 X5 server used in the scenario is capable of
handling 5,000 simultaneous video streams. The total throughput for 5,000 concurrent
streams is 2.5 GBps. To store 1,000 videos we need 4 TB of storage space.
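The stream and capacity arithmetic above can be verified with a few lines of Python. The sketch is illustrative; the raw capacity of 8x 200 GB SSDs per eXFlash unit is used for the capacity check, and all inputs are the assumptions stated in the text:

```python
import math

# Assumptions from the scenario: 5,000 concurrent SD streams at ~0.5 MBps
# each, 1,000 active movies of ~4 GB each, and 8x 200 GB SSDs per eXFlash
# unit (raw capacity).
streams, stream_mbps = 5000, 0.5
movies, movie_gb = 1000, 4
exflash_raw_gb = 8 * 200

required_gbps = streams * stream_mbps / 1000   # aggregate stream throughput
required_tb = movies * movie_gb / 1000         # online-tier capacity target
units = math.ceil(movies * movie_gb / exflash_raw_gb)
print(required_gbps, required_tb, units)  # 2.5 4.0 3
```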
For the main video library storage, we use a nearline tier with IBM System Storage DS3524
with 48x 1 TB 7.2K RPM 2.5” SATA HDDs configured as a RAID-5 array. For the online tier we
can choose either SAS HDDs (Tier 1) or IBM eXFlash with SSDs (Tier 0), as shown in
Figure 9.
Table 13 (continued)

Characteristic                   Traditional HDD         IBM eXFlash         IBM eXFlash advantage
Storage power (unit of measure)  kW                      W                   99% savings in power and cooling costs, no external power infrastructure required
Rack U space (Tier 0/1)          10U                     0U                  Less rack space required
Acquisition costs                Comparable to eXFlash   Comparable to HDD   Higher scalability, reliability, and performance, and significantly lower power consumption at a comparable acquisition cost
Figure 9 IBM eXFlash versus traditional HDDs: Video on demand
With the traditional HDD scenario, to achieve a throughput of 2.5 GBps with random reads
and 4 TB of storage space, we need about 120x 73 GB 15K RPM SAS HDDs in Tier 1
storage. With the IBM eXFlash scenario, we need three eXFlash units to meet the capacity
requirements, and we also meet the throughput requirement because a single eXFlash unit is
capable of 2 GBps of read throughput. Table 14 summarizes the characteristics of these
scenarios and highlights IBM eXFlash advantages.
Table 14 Traditional HDDs versus IBM eXFlash: Video on demand

Characteristic            | Traditional HDD         | IBM eXFlash           | IBM eXFlash advantage
Number of servers         | 1                       | 1                     | Fewer components to acquire and maintain; higher reliability; lower management, maintenance, and support costs
Number of online drives   | 120                     | 24                    |
Online drive type         | 73 GB 15K 2.5” SAS HDD  | 200 GB 1.8” SATA SSD  |
Location                  | External                | Internal              |
Maximum throughput        | 2.7 GBps                | 6 GBps                | Higher throughput capacity, faster transfer rates
Raw storage capacity      | 8.8 TB                  | 4.8 TB                | Better storage space utilization, less wasted storage space, comparable utilized GB/$ ratio
RAID-5 capacity           | 8.0 TB                  | 4.2 TB                |
Used capacity             | 4.0 TB                  | 4.0 TB                |
Storage power consumption | Measured in kW          | Measured in W         | 99% savings in power and cooling costs, no external power infrastructure required
Rack U space (Tier 0/1)   | 10U                     | 0U                    | Less rack space required
Acquisition costs         | Comparable to eXFlash   | Comparable to HDD     | Higher scalability, reliability, and performance, and significantly lower power consumption at comparable acquisition cost
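The drive counts in this comparison can be reproduced with a rough estimate. The per-HDD sustained random-read rate (~21 MBps) and the 8-drive, RAID-5 eXFlash unit layout are assumptions inferred from the table figures; only the resulting counts of about 120 HDDs and 3 eXFlash units come from the paper:

```python
from math import ceil

# Scenario requirements from the paper
required_tp_mbps = 2500   # 2.5 GBps aggregate read throughput
required_cap_gb = 4000    # 4 TB of usable movie storage

# Traditional Tier 1 option: 73 GB 15K SAS HDDs.
# The ~21 MBps sustained random-read rate per drive is our assumption;
# the paper only states the resulting count of "about 120" drives.
hdd_tp_mbps = 21
hdd_cap_gb = 73
hdds = max(ceil(required_tp_mbps / hdd_tp_mbps),
           ceil(required_cap_gb / hdd_cap_gb))

# IBM eXFlash option: 8x 200 GB SSDs per unit (24 drives / 3 units),
# RAID-5 leaves ~7 data drives per unit; 2 GBps read per unit (per the paper).
unit_tp_mbps = 2000
unit_cap_gb = 7 * 200     # ~1.4 TB usable per unit in RAID-5
units = max(ceil(required_tp_mbps / unit_tp_mbps),
            ceil(required_cap_gb / unit_cap_gb))

print(hdds, units)        # 120 drives versus 3 eXFlash units
```

In both cases the sizing is driven by whichever requirement (throughput or capacity) demands more hardware: throughput for the HDDs, capacity for the eXFlash units.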
[Figure 9 diagram: the VoD server (IBM System x3690 X5, 5,000 client connections) with an
online Tier 0 of 3x IBM eXFlash (24x 200 GB SATA SSDs), an alternative online Tier 1 of
5x IBM EXP2524 (120x 73 GB SAS HDDs), and a nearline Tier 2 of 1x IBM DS3524 plus
1x IBM EXP3524 (48x 1 TB SATA HDDs)]
Summary
We described several scenarios with different workload patterns in which IBM eXFlash
provided clear benefits compared with traditional HDD-based approaches. In summary,
IBM eXFlash helps to:
- Significantly decrease the implementation costs (up to 97% lower) of high-performance,
I/O-intensive storage systems, with the best IOPS/$ ratio
- Significantly increase the performance (up to 30 times or more) of I/O-intensive
applications such as databases and business analytics
- Significantly save on power and cooling (up to 90%) with a high performance-per-watt ratio
- Significantly save on floor space (up to 30 times less) with an extreme performance per
rack U space ratio
In addition, the majority of current systems use a tiered storage approach, where “hot” data is
placed close to the application on the online storage (Tier 0 or Tier 1), “warm” data is placed
on the nearline storage (Tier 2), and “cold” data is placed on the offline storage (Tier 3). In
such a case, IBM eXFlash can be utilized as Tier 0 storage.
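As an illustration of the hot/warm/cold placement model described above, here is a toy tier-assignment rule. The function name, thresholds, and inputs are entirely hypothetical; real automated tier management products make this decision from measured access statistics:

```python
# Illustrative only: a toy classifier for the hot/warm/cold tiering model.
# Thresholds are invented for the example, not taken from any IBM product.
def assign_tier(accesses_last_day: int, accesses_last_month: int) -> str:
    if accesses_last_day > 100:       # "hot" data stays close to the application
        return "online (Tier 0/1)"
    if accesses_last_month > 10:      # "warm" data moves to nearline storage
        return "nearline (Tier 2)"
    return "offline (Tier 3)"         # "cold" data is archived

print(assign_tier(500, 600))   # a heavily watched movie -> online (Tier 0/1)
print(assign_tier(5, 50))      # occasionally watched    -> nearline (Tier 2)
print(assign_tier(0, 2))       # rarely watched          -> offline (Tier 3)
```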
In general, typical IBM eXFlash usage scenarios include:
- High-speed read cache in a local or SAN-based storage environment
- Temporary local storage space for mid-tier applications and databases
- Main (Tier 0) local data storage in single-server environments, or in a distributed
scale-out environment with local-only storage or mixed local and SAN-based storage
The author who wrote this paper
This paper was produced by a technical specialist working at the International Technical
Support Organization, Raleigh Center.
Ilya Krutov is an Advisory IT Specialist and project leader at the International Technical
Support Organization, Raleigh Center, and has been with IBM since 1998. Prior roles in IBM
included STG Run Rate Team Leader, Brand Manager for IBM System x and BladeCenter,
Field Technical Sales Support (FTSS) specialist for System x and BladeCenter products, and
instructor at IBM Learning Services Russia (IBM x86 servers, Microsoft NOS, Cisco). He
graduated from the Moscow Engineering and Physics Institute, and holds a Bachelor’s
degree in Computer Engineering.
Portions of this paper were taken from IBM eX5 Portfolio Overview: IBM System x3850 X5,
x3950 X5, x3690 X5, and BladeCenter HX5, REDP-4650. Thanks to the authors of that
document:
David Watts
Duncan Furniss
Scott Haddow
Jeneea Jervay
Eric Kern
Cynthia Knight
Thanks to the following for his contributions to this project:
Paul Nashawaty
Now you can become a published author, too!
Here's an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Stay connected to IBM Redbooks
Find us on Facebook:
http://www.facebook.com/IBMRedbooks
Follow us on Twitter:
http://twitter.com/ibmredbooks
Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
© Copyright International Business Machines Corporation 2011. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by
GSA ADP Schedule Contract with IBM Corp.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
This document REDP-4807-00 was created or updated on December 9, 2011.
Send us your comments in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400 U.S.A.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
BladeCenter®
DB2®
Domino®
DS8000®
Dynamic Infrastructure®
Easy Tier™
GPFS™
IBM®
Lotus®
Redbooks®
Redpaper™
Redbooks (logo) ®
ServerProven®
Smarter Planet™
Storwize®
System Storage®
System x®
X-Architecture®
XIV®
The following terms are trademarks of other companies:
Microsoft, and the Windows logo are trademarks of Microsoft Corporation in the United States, other
countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel
SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Other company, product, or service names may be trademarks or service marks of others.