PowerVM virtualization - IBM Power E1050

The PowerVM platform is the family of technologies, capabilities, and offerings that delivers industry-leading virtualization for enterprises. It is the umbrella branding term for IBM Power processor-based server virtualization:

Ê IBM Power Hypervisor

Ê Logical partitioning

Ê IBM Micro-Partitioning®

Ê Virtual I/O Server (VIOS)

Ê Live Partition Mobility (LPM)

PowerVM is a combination of hardware and software enablement.

Note: PowerVM Enterprise Edition License Entitlement is included with each Power10 processor-based mid-range server. PowerVM Enterprise Edition is available as a hardware feature (#EPVV); it supports up to 20 partitions per core, the VIOS, and multiple shared processor pools (MSPPs), and also offers LPM.

5.1.1 IBM Power Hypervisor

IBM Power processor-based servers are combined with PowerVM technology and offer the following key capabilities that can help to consolidate and simplify IT environments:

Ê Improve server usage and share I/O resources to reduce the total cost of ownership (TCO) and better use IT assets.

Ê Improve business responsiveness and operational speed by dynamically reallocating resources to applications as needed to better match changing business needs or handle unexpected changes in demand.

Ê Simplify IT infrastructure management by making workloads independent of hardware resources so that business-driven policies can be used to deliver resources that are based on time, cost, and service-level requirements.

Combined with features in the Power10 processor-based mid-range servers, the Power Hypervisor delivers functions that enable other system technologies, including logical partitioning technology, virtualized processors, IEEE virtual local area network (VLAN)-compatible virtual switches, virtual Small Computer System Interface (SCSI) adapters, virtual Fibre Channel (FC) adapters, and virtual consoles.

The Power Hypervisor is a basic component of the system’s firmware and offers the following functions:

Ê Provides an abstraction between the physical hardware resources and the LPARs that use them.

Ê Enforces partition integrity by providing a security layer between LPARs.

Ê Controls the dispatch of virtual processors to physical processors.

Ê Saves and restores all processor state information during a logical processor context switch.

Ê Controls hardware I/O interrupt management facilities for LPARs.

Ê Provides VLAN channels between LPARs that help reduce the need for physical Ethernet adapters for inter-partition communication.

Ê Monitors the enterprise Baseboard Management Controller (eBMC) and performs a reset or reload if needed, notifying the operating system (OS) if the problem is not corrected.

The Power Hypervisor is always active, regardless of the system configuration or whether it is connected to the managed console. It requires memory to support the resource assignment of the LPARs on the server. The amount of memory that is required by the Power Hypervisor firmware varies according to several factors:

Ê Memory usage for hardware page tables (HPTs)

Ê Memory usage to support I/O devices

Ê Memory usage for virtualization

Memory usage for hardware page tables

Each partition on the system includes its own HPT that contributes to hypervisor memory usage. The HPT is used by the OS to translate from effective addresses to physical real addresses in the hardware. This translation from effective to real addresses allows multiple OSs to run simultaneously in their own logical address space. Whenever a virtual processor for a partition is dispatched on a physical processor, the hypervisor indicates to the hardware the location of the partition HPT that can be used when translating addresses.

The amount of memory for the HPT is based on the maximum memory size of the partition and the HPT ratio. The default HPT ratio is 1/128th (for AIX, VIOS, and Linux partitions) of the maximum memory size of the partition. AIX, VIOS, and Linux use larger page sizes (16 and 64 KB) instead of using 4 KB pages. The use of larger page sizes reduces the overall number of pages that must be tracked; therefore, the overall size of the HPT can be reduced. For example, the HPT is 2 GB for an AIX partition with a maximum memory size of 256 GB.
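As a worked illustration of this sizing rule, the following Python sketch computes the HPT size from a partition's maximum memory size and an HPT ratio, and reproduces the 256 GB example above. The function name and example values are illustrative only; they are not an IBM interface.

def hpt_size_gib(max_partition_mem_gib: float, ratio_denominator: int = 128) -> float:
    """Estimate the hardware page table (HPT) size for a partition.

    The HPT is sized as (maximum partition memory) / (HPT ratio denominator).
    The default ratio for AIX, VIOS, and Linux partitions is 1/128.
    """
    return max_partition_mem_gib / ratio_denominator

# Example from the text: an AIX partition with a 256 GB maximum memory size
# and the default 1/128 ratio needs a 2 GB HPT.
print(hpt_size_gib(256))        # -> 2.0
print(hpt_size_gib(1024, 256))  # a 1 TB maximum memory size at a 1:256 ratio -> 4.0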

When defining a partition, the maximum memory size that is specified sets the limit on the amount of memory that can be dynamically added to the partition by using dynamic logical partitioning (DLPAR) without changing the configuration and restarting the partition.

In addition to setting the maximum memory size, the HPT ratio can be configured. The hpt_ratio parameter of the chsyscfg Hardware Management Console (HMC) command can be used to define the HPT ratio for a partition profile. The valid values are 1:32, 1:64, 1:128, 1:256, or 1:512.

Specifying a smaller absolute ratio (1/512 is the smallest value) decreases the overall memory that is assigned to the HPT. Testing is required when changing the HPT ratio because a smaller HPT might increase CPU consumption: the OS might need to reload the entries in the HPT more frequently. Most customers choose to use the IBM-provided default values for the HPT ratios.
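To make the trade-off concrete, the following Python sketch tabulates the HPT size for each valid hpt_ratio value for a given maximum memory size. It is a sizing aid only; the helper name is illustrative, and the values mirror the valid ratios listed above.

VALID_HPT_RATIOS = ("1:32", "1:64", "1:128", "1:256", "1:512")

def hpt_sizes_gib(max_partition_mem_gib: float) -> dict:
    """Return the HPT size (in GB) for each valid hpt_ratio setting."""
    sizes = {}
    for ratio in VALID_HPT_RATIOS:
        denominator = int(ratio.split(":")[1])
        sizes[ratio] = max_partition_mem_gib / denominator
    return sizes

# For a partition with a 512 GB maximum memory size, the default 1:128 ratio
# needs a 4 GB HPT; 1:512 reduces that to 1 GB, at the cost of possible extra
# HPT reloads by the OS.
for ratio, size in hpt_sizes_gib(512).items():
    print(f"hpt_ratio={ratio}: {size:g} GB HPT")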

Memory usage for I/O devices

In support of I/O operations, the hypervisor maintains structures that are called the translation control entities (TCEs), which provide an information path between I/O devices and partitions. The TCEs provide the address of the I/O buffer, indications of read versus write requests, and other I/O-related attributes. Many TCEs are used per I/O device, so multiple requests can be active simultaneously to the same physical device. To provide better affinity, the TCEs are spread across multiple processor chips or drawers to improve performance while accessing the TCEs.

For physical I/O devices, the base amount of space for the TCEs is defined by the hypervisor based on the number of I/O devices that are supported. A system that supports high-speed adapters also can be configured to allocate more memory to improve I/O performance. Linux is the only OS that uses these extra TCEs, so that memory can be freed for use by the partitions if the system uses only AIX.

Serviceability - IBM Power E1050

The purpose of serviceability is to efficiently repair the system while attempting to minimize or eliminate any impact to system operation. Serviceability includes system installation, Miscellaneous Equipment Specification (MES) (system upgrades/downgrades), and system maintenance or repair. Depending on the system and warranty contract, service may be performed by the client, an IBM representative, or an authorized warranty service provider. The serviceability features that are delivered in this system help provide a highly efficient service environment by incorporating the following attributes:

Ê Designed for IBM System Services Representative (IBM SSR) setup, install, and service.

Ê Error Detection and Fault Isolation (ED/FI).

Ê First Failure Data Capture (FFDC).

Ê Light path service indicators.

Ê Service and FRU labels that are available on the system.

Ê Service procedures are documented in IBM Documentation or available through the HMC.

Ê Automatic reporting of serviceable events to IBM through the Electronic Service Agent (ESA) Call Home application.

4.5.1 Service environment

In the PowerVM environment, the HMC is a dedicated server that provides functions for configuring and managing servers in either partitioned or full-system partition configurations by using a GUI, a command-line interface (CLI), or a Representational State Transfer (REST) API. An HMC that is attached to the system enables support personnel (with client authorization) to log in remotely or locally (by using the physical HMC that is in proximity of the server being serviced) to review error logs and perform remote maintenance if required.

The Power10 processor-based servers support several service environments:

Ê Attachment to one or more HMCs or virtual HMCs (vHMCs) is a supported option for the system with PowerVM. This configuration is the default one for servers that support logical partitions (LPARs) with dedicated or virtual I/O. In this case, all servers have at least one LPAR.

Ê No HMC. There are two service strategies for non-HMC systems:

– Full-system partition with PowerVM: A single partition owns all the server resources and only one operating system (OS) may be installed. The primary service interface is through the OS and the service processor.

– Partitioned system with NovaLink: In this configuration, the system can have more than one partition and can be running more than one OS. The primary service interface is through the service processor.

4.5.2 Service interface

Support personnel can use the service interface to communicate with the service support applications in a server by using an operator console, a GUI on the management console or service processor, or an OS terminal. The service interface helps to deliver a clear, concise view of available service applications, helping the support team to manage system resources and service information in an efficient and effective way. Applications that are available through the service interface are carefully configured and placed to grant service providers access to important service functions. Different service interfaces are used, depending on the state of the system, hypervisor, and operating environment. The primary service interfaces are:

Ê LEDs

Ê Operator panel

Ê BMC Service Processor menu

Ê OS service menu

Ê Service Focal Point (SFP) on the HMC or vHMC with PowerVM

In the light path LED implementation, the system can clearly identify components for replacement by using specific component-level LEDs, and it can guide the servicer directly to the component by signaling (turning on solid) the enclosure fault LED and the component FRU fault LED. The servicer also can use the identify function to flash the FRU-level LED. When this function is activated, a roll-up to the blue enclosure locate LED occurs. These enclosure LEDs turn on solid and can be used to follow the light path from the enclosure down to the specific FRU in the PowerVM environment.

4.5.3 First Failure Data Capture and error data analysis

FFDC is a technique that helps ensure that when a fault is detected in a system, the root cause of the fault is captured without the need to re-create the problem or run any sort of extended tracing or diagnostics program. For most faults, a good FFDC design means that the root cause also can be detected automatically without servicer intervention.

FFDC information, error data analysis, and fault isolation are necessary to implement the advanced serviceability techniques that enable efficient service of the systems and to help determine the failing items.

In the rare absence of FFDC and Error Data Analysis, diagnostics are required to re-create the failure and determine the failing items.

4.5.4 Diagnostics

The general diagnostic objectives are to detect and identify problems so that they can be resolved quickly. Elements of the IBM diagnostics strategy include:

Ê Provides a common error code format equivalent to a system reference code with PowerVM, a system reference number, a checkpoint, or a firmware error code.

Ê Provides fault detection and problem isolation procedures. Supports remote connection, which can be used by the IBM Remote Support Center or IBM Designated Service.

Ê Provides interactive intelligence within the diagnostics with detailed online failure information while connected to the IBM back-end system.

4.5.5 Automatic diagnostics

The processor and memory FFDC technology is designed to perform without re-creating the problem or requiring user intervention. Solid and intermittent errors are designed to be correctly detected and isolated at the time that the failure occurs. Runtime and boot-time diagnostics fall into this category.

4.5.6 Stand-alone diagnostics

As the name implies, stand-alone or user-initiated diagnostics requires user intervention. The user must perform manual steps, including:

Ê Booting from the diagnostics CD, DVD, Universal Serial Bus (USB), or network

Ê Interactively selecting steps from a list of choices

4.5.7 Concurrent maintenance

The determination of whether a firmware release can be updated concurrently is identified in the readme file that is released with the firmware. An HMC is required for a concurrent firmware update with PowerVM. In addition, concurrent maintenance of PCIe adapters and NVMe drives is supported with PowerVM. Power supplies, fans, and operator panel LCDs are hot-pluggable.

Original equipment manufacturer racks - IBM Power E1050

The system can be installed in a suitable OEM rack if the rack conforms to the EIA-310-D standard for 19-inch racks, which is published by the Electronic Industries Alliance. For more information, see IBM Documentation.

IBM Documentation provides the general rack specifications, including the following information:

Ê The rack or cabinet must meet the EIA Standard EIA-310-D for 19-inch racks, which was published August 24, 1992. The EIA-310-D standard specifies internal dimensions, for example, the width of the rack opening (width of the chassis), the width of the module mounting flanges, and the mounting hole spacing.

Ê The front rack opening must be a minimum of 450 mm (17.72 in.) wide, and the rail-mounting holes must be 465 mm plus or minus 1.6 mm (18.3 in. plus or minus 0.06 in.) apart on center (horizontal width between the vertical columns of holes on the two front-mounting flanges and on the two rear-mounting flanges).

Figure 3-12 is a top view showing the rack specification dimensions.

Figure 3-12 Rack specifications (top-down view)

Ê The vertical distance between mounting holes must consist of sets of three holes that are spaced (from bottom to top) 15.9 mm (0.625 in.), 15.9 mm (0.625 in.), and 12.7 mm (0.5 in.) on center, which makes each three-hole set of vertical hole spacing 44.45 mm (1.75 in.) apart on center.

Figure 3-13 shows the vertical distances between the mounting holes.

Figure 3-13 Vertical distances between mounting holes

Ê The following rack hole sizes are supported for racks where IBM hardware is mounted:

– 7.1 mm (0.28 in.) plus or minus 0.1 mm (round)

– 9.5 mm (0.37 in.) plus or minus 0.1 mm (square)

The rack or cabinet must be capable of supporting an average load of 20 kg (44 lb.) of product weight per EIA unit. For example, a four-EIA-unit drawer has a maximum drawer weight of 80 kg (176 lb.).
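The hole-spacing and load figures above reduce to simple arithmetic. The following Python sketch (helper names are illustrative) derives the 44.45 mm EIA-unit height from the three-hole vertical spacing and the weight budget for a drawer of a given height.

HOLE_SPACING_IN = (0.625, 0.625, 0.5)  # one EIA unit, bottom to top; sums to 1.75 in.
LOAD_PER_EIA_KG = 20                   # average supported product weight per EIA unit

def eia_unit_height_mm() -> float:
    """Height of one EIA unit, derived from the three-hole vertical spacing."""
    return sum(HOLE_SPACING_IN) * 25.4

def max_drawer_weight_kg(eia_units: int) -> int:
    """Average product weight that the rack must support for a drawer of this height."""
    return eia_units * LOAD_PER_EIA_KG

print(round(eia_unit_height_mm(), 2))  # -> 44.45 (1.75 in.)
print(max_drawer_weight_kg(4))         # -> 80 (the four-EIA-unit drawer example: 80 kg / 176 lb.)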

AC power distribution unit and rack content - IBM Power E1050

The IBM high-function PDUs provide more electrical power per PDU than earlier IBM PDUs, and they offer better PDU footprint efficiency. In addition, they are intelligent PDUs (iPDUs) that provide insight into actual power usage by receptacle and remote power on/off capability for each receptacle for easier support. The latest PDUs can be ordered as #ECJJ, #ECJL, #ECJN, and #ECJQ.

IBM Manufacturing integrates only the newer PDUs with the Power E1050 server.

IBM Manufacturing does not support integrating earlier PDUs, such as #7188, #7109, or #7196. Clients can choose to use older IBM PDUs in their racks, but must install those earlier PDUs at their site.

Table 3-25 summarizes the high-function PDU FCs for 7965-S42 followed by a descriptive list.

Table 3-25 High-function PDUs that are available with IBM Enterprise Slim Rack (7965-S42)

a. The Power E1050 server has an AC power supply with a C19/C20 connector.

Power sockets: The Power E1050 server takes IEC 60320 C19/C20 mains power and not C13. Ensure that the correct power cords and PDUs are ordered or available in the rack.

Ê High Function 9xC19 PDU plus (#ECJJ)

This intelligent, switched 200 – 240 V AC PDU includes nine C19 receptacles on the front of the PDU. The PDU is mounted on the rear of the rack, which makes the nine C19 receptacles easily accessible. For comparison, this PDU is most like the earlier generation #EPTJ PDU.

Ê High Function 9xC19 PDU plus 3-Phase (#ECJL)

This intelligent, switched 208 V 3-phase AC PDU includes nine C19 receptacles on the front of the PDU. The PDU is mounted on the rear of the rack, which makes the nine C19 receptacles easily accessible. For comparison, this PDU is most like the earlier generation #EPTL PDU.

Ê High Function 12xC13 PDU plus (#ECJN)

This intelligent, switched 200 – 240 V AC PDU includes 12 C13 receptacles on the front of the PDU. The PDU is mounted on the rear of the rack, which makes the 12 C13 receptacles easily accessible. For comparison, this PDU is most like the earlier generation #EPTN PDU.

Ê High Function 12xC13 PDU plus 3-Phase (#ECJQ)

This intelligent, switched 208 V 3-phase AC PDU includes 12 C13 receptacles on the front of the PDU. The PDU is mounted on the rear of the rack, which makes the 12 C13 receptacles easily accessible. For comparison, this PDU is most like the earlier generation #EPTQ PDU.

The PDU receives power through a UTG0247 power-line connector. Each PDU requires one PDU-to-wall power cord. Various power cord features are available for various countries and applications by varying the PDU-to-wall power cord, which must be ordered separately.

Each power cord provides the unique design characteristics for the specific power requirements. To match new power requirements and save previous investments, these power cords can be requested with an initial order of the rack or with a later upgrade of the rack features.

Table 3-26 shows the available wall power cord options for the PDU features, which must be ordered separately.

Table 3-26 PDU-to-wall power cord options for the PDU features

Notes: Ensure that a suitable power cord feature is configured to support the power that is being supplied. Based on the power cord that is used, the PDU can supply 4.8 – 19.2 kVA. The power of all the drawers that are plugged into the PDU must not exceed the power cord limitation.

The Universal PDUs are compatible with previous models.
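The note above amounts to a simple budget check: the combined load of the drawers on a PDU must stay within the rating of the PDU-to-wall power cord that was ordered. The following Python sketch illustrates that check; the cord rating and drawer loads are hypothetical placeholders, not measured values.

def pdu_budget_ok(power_cord_kva: float, drawer_loads_kva: list) -> bool:
    """Check that the drawers that are plugged into a PDU stay within the cord limit.

    power_cord_kva: rating of the PDU-to-wall power cord (4.8 - 19.2 kVA,
    depending on the power cord feature that is ordered).
    drawer_loads_kva: expected load of each drawer that is plugged into the PDU.
    """
    return sum(drawer_loads_kva) <= power_cord_kva

# Hypothetical example: drawers drawing 2.0, 1.6, and 1.0 kVA fit on a 4.8 kVA
# cord; adding another 1.0 kVA drawer does not.
print(pdu_budget_ok(4.8, [2.0, 1.6, 1.0]))       # -> True
print(pdu_budget_ok(4.8, [2.0, 1.6, 1.0, 1.0]))  # -> False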

To better enable electrical redundancy, the Power E1050 server has four power supplies that must be connected to separate PDUs; the PDUs are not included in the base order.

For maximum availability, a best practice is to connect power cords from the same system to two separate PDUs in the rack, and to connect each PDU to independent power sources.

For more information about the power requirements and power cords for the 7965-94Y rack, see IBM Documentation.

External I/O subsystems - IBM Power E1050

If more PCIe slots beyond the system node slots are required, the Power E1050 server supports adding I/O expansion drawers.

If you need more disks than are available with the internal disk bays, you can attach external disk subsystems to the Power E1050 server, such as:

Ê EXP24SX SAS Storage Enclosure

Ê IBM System Storage

Note: The existing EXP12SX SAS Storage Enclosure (#ESLL) is still supported. Earlier storage enclosures like the EXP12S SAS Drawer (#5886) and EXP24 SCSI Disk Drawer (#5786) are not supported on the Power E1050.

3.9.1 PCIe Gen3 I/O expansion drawer

This 19-inch, 4U (4 EIA) enclosure provides PCIe Gen3 slots outside of the system unit. It has two module bays. One 6-slot fanout module (#EMXH) can be placed in each module bay. Two 6-slot modules provide a total of 12 PCIe Gen3 slots. Each fanout module is connected to a PCIe3 Optical Cable adapter that is in the system unit over an Active Optical Cable (AOC) pair or a CXP copper cable pair.

The PCIe Gen3 I/O Expansion Drawer has two redundant, hot-plug power supplies. Each power supply has its own separately ordered power cord. The two power cords plug in to a power supply conduit that connects to the power supply. The single-phase AC power supply is rated at 1030 W and can use 100 – 120 V or 200 – 240 V. If using 100 – 120 V, then the maximum is 950 W. It is a best practice that the power supply connects to a power distribution unit (PDU) in the rack. Power Systems PDUs are designed for a 200 – 240 V electrical source.

The drawer has fixed rails that can accommodate rack depths of 27.5 in. (69.9 cm) to 30.5 in. (77.5 cm).


Attention: #EMX0 has a cable management bracket at the rear of the drawer that swings up to provide service access to the PCIe adapters. 2U (2 EIA) of space is required to swing up the bracket. Thus, the drawer cannot be placed at the top 2U of a rack. There is a power cord access consideration with vertically mounted PDUs on the right side of the rack when viewed from the rear of the rack. The #EMX0 cable management bracket makes accessing some of the PDU outlets at the same rack height as the #EMX0 drawer more challenging. Using a horizontally mounted PDU or placing the PDU or #EMX0 at a different vertical location is recommended.

A blind-swap cassette (BSC) is used to house the full-height adapters that go into these slots. The BSC is the same BSC that is used with the previous generation server's 12X-attached I/O drawers (#5802, #5803, #5877, and #5873). The drawer includes a full set of BSCs, even if the BSCs are empty.

Concurrent repair, and the adding or removing of PCIe adapters, is done through HMC-guided menus or by OS support utilities.

Figure 3-3 shows a PCIe Gen3 I/O expansion drawer.

Figure 3-3 PCIe Gen3 I/O expansion drawer

Figure 3-4 shows the back view of the PCIe Gen3 I/O expansion drawer.

Figure 3-4 Rear view of the PCIe Gen3 I/O expansion drawer

I/O drawers and usable PCI slots

Figure 3-5 shows the rear view of the PCIe Gen3 I/O expansion drawer that is equipped with two PCIe3 6-slot fanout modules with the location codes for the PCIe adapter slots.

Figure 3-5 Rear view of a PCIe Gen3 I/O expansion drawer with PCIe slots location codes


Table 3-20 provides details about the PCI slots in the PCIe Gen3 I/O expansion drawer that is equipped with two PCIe3 6-slot fanout modules.

Table 3-20 PCIe slot locations for the PCIe Gen3 I/O expansion drawer with two fanout modules

Ê All slots support full-length, full-height adapters or short (LP) adapters with a full-height tailstock in a single-wide, Gen3 BSC.
Ê Slots C1 and C4 in each PCIe3 6-slot fanout module are x16 PCIe3 buses, and slots C2, C3, C5, and C6 are x8 PCIe buses.
Ê All slots support enhanced error handling (EEH).
Ê All PCIe slots are hot-swappable and support concurrent maintenance.
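The slot counts in the list above can be rolled up per drawer. The following Python sketch (names are illustrative) totals the x16 and x8 slots for a given number of fully populated I/O expansion drawers; the supported maximum number of drawers comes from Table 3-21.

# One PCIe3 6-slot fanout module: C1 and C4 are x16 Gen3 slots;
# C2, C3, C5, and C6 are x8 slots. Each drawer holds two fanout modules.
FANOUT_MODULE_SLOTS = {"x16": 2, "x8": 4}
MODULES_PER_DRAWER = 2

def drawer_slot_totals(drawers: int) -> dict:
    """Total PCIe slots by width for the given number of I/O expansion drawers."""
    return {
        width: count * MODULES_PER_DRAWER * drawers
        for width, count in FANOUT_MODULE_SLOTS.items()
    }

print(drawer_slot_totals(1))  # -> {'x16': 4, 'x8': 8}, that is, 12 slots per drawer
print(drawer_slot_totals(2))  # -> {'x16': 8, 'x8': 16}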

Table 3-21 summarizes the maximum number of I/O drawers that are supported and the total number of PCI slots that are available.

Table 3-21 Maximum number of I/O drawers that are supported and total number of PCI slots