Live Partition Mobility - IBM Power E1050

With LPM, you can move a running LPAR from one system to another one without disruption. Inactive partition mobility allows you to move a powered-off LPAR from one system to another one.

LPM provides systems management flexibility and improves system availability by avoiding the following situations:

Ê Planned outages for hardware upgrade or firmware maintenance.

Ê Unplanned downtime. With preventive failure management, if a server indicates a potential failure, you can move its LPARs to another server before the failure occurs.

For more information and requirements for LPM, see IBM PowerVM Live Partition Mobility, SG24-7460.

HMC 10.1.1020.0 and VIOS 3.1.3.21 or later provide the following enhancements to the LPM feature:

Ê Automatically choose the fastest network for LPM memory transfer.

Ê Allow LPM when a virtual optical device is assigned to a partition.

5.1.5 Active Memory Mirroring

Active Memory Mirroring (AMM) for Hypervisor is available as an option (#EM8G) to enhance resilience by mirroring critical memory that is used by the PowerVM hypervisor so that it can continue operating in the event of a memory failure.

A portion of available memory can be proactively partitioned so that a duplicate set can be used if a non-correctable memory error occurs. This partitioning can be implemented at the granularity of DIMMs or logical memory blocks.

5.1.6 Remote Restart

Remote Restart is a high availability (HA) option for partitions. If an error occurs that causes a server outage, a partition that is configured for Remote Restart can be restarted on a different physical server. At times, it might take longer to start the server, in which case the Remote Restart function can be used for faster reprovisioning of the partition. Typically, this task can be done faster than restarting the server that stopped and then restarting the partitions. The Remote Restart function relies on technology that is similar to LPM, where a partition is configured with storage on a SAN that is shared (accessible) by the server that hosts the partition.

HMC 10R1 provides an enhancement to the Remote Restart feature that enables remote restart when a virtual optical device is assigned to a partition.

5.1.7 IBM Power processor modes

Although they are not virtualization features, the IBM Power processor modes are described here because they affect various virtualization features.

On IBM Power servers, partitions can be configured to run in several modes, including the following modes:

Ê Power8

This native mode for Power8 processors implements version 2.07 of the IBM Power instruction set architecture (ISA). For more information, see Processor compatibility mode definitions.

Ê Power9

This native mode for Power9 processors implements version 3.0 of the IBM Power ISA. For more information, see Processor compatibility mode definitions.

Ê Power10

This native mode for Power10 processors implements version 3.1 of the IBM Power ISA. For more information, see Processor compatibility mode definitions.

Figure 5-2 shows the available processor modes on a Power10 processor-based mid-range server.

Figure 5-2 Processor modes

Processor compatibility mode is important when LPM migration is planned between different generations of servers. An LPAR that might be migrated to a machine that is based on a processor from another generation must be activated in a specific compatibility mode.

Note: Migrating an LPAR from a Power7 processor-based server to a Power10 processor-based mid-range server by using LPM is not supported; however, the following steps can be completed to accomplish this task:

1. Migrate LPAR from a Power7 processor-based server to a Power8 or Power9 processor-based server by using LPM.

2. Migrate the LPAR from the Power8 or Power9 processor-based server to a Power10 processor-based mid-range server.

The OS running on the Power7 processor-based server must be supported on the Power10 processor-based mid-range server or must be upgraded to a supported level before completing these steps.
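
To illustrate the compatibility-mode rule that is described above, the following Python sketch maps the compatibility modes that are named in this section to their Power ISA versions and checks whether an LPAR that is activated in a given mode can be moved to a target server generation. The mode-support table and function names are illustrative assumptions for this example only, not an IBM tool or API.

# Illustrative sketch: processor compatibility modes and LPM targets.
ISA_VERSION = {"Power8": "2.07", "Power9": "3.0", "Power10": "3.1"}

# Assumed support matrix for this example: each generation runs its native
# mode plus the earlier modes listed here.
SUPPORTED_MODES = {
    "Power8": ["Power7", "Power8"],
    "Power9": ["Power7", "Power8", "Power9"],
    "Power10": ["Power8", "Power9", "Power10"],
}

def can_migrate(lpar_compat_mode: str, target_generation: str) -> bool:
    """True if the target server generation supports the LPAR's compatibility mode."""
    return lpar_compat_mode in SUPPORTED_MODES[target_generation]

print(can_migrate("Power9", "Power10"))   # True: direct LPM is possible
print(can_migrate("Power7", "Power10"))   # False: use the two-step migration in the note above
print(ISA_VERSION["Power10"])             # 3.1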

5.1.8 Single-root I/O virtualization

Single-root I/O virtualization (SR-IOV) is an extension to the Peripheral Component Interconnect Express (PCIe) specification that allows multiple OSs to simultaneously share a PCIe adapter with little or no runtime involvement from a hypervisor or other virtualization intermediary.

SR-IOV is PCI standard architecture that enables PCIe adapters to become self-virtualizing. It enables adapter consolidation through sharing, much like logical partitioning enables server consolidation. With an adapter capable of SR-IOV, you can assign virtual slices of a single physical adapter to multiple partitions through logical ports, which is done without a VIOS.

5.1.9 More information about virtualization features

The following IBM Redbooks publications provide more information about the virtualization features:

Ê IBM PowerVM Best Practices, SG24-8062

Ê IBM PowerVM Virtualization Introduction and Configuration, SG24-7940

Ê IBM PowerVM Virtualization Managing and Monitoring, SG24-7590

Ê IBM Power Systems SR-IOV: Technical Overview and Introduction, REDP-5065

Multiple shared processor pools - IBM Power E1050

MSPPs are supported on Power10 processor-based servers. This capability allows a system administrator to create a set of micropartitions with the purpose of controlling the processor capacity that can be used from the physical SPP.

Micropartitions are created and then identified as members of the default processor pool or a user-defined SPP. The virtual processors that exist within the set of micropartitions are monitored by the Power Hypervisor. Processor capacity is managed according to user-defined attributes.

If the IBM Power server is under heavy load, each micropartition within an SPP is assured of its processor entitlement, plus any capacity that might be allocated from the reserved pool capacity if the micropartition is uncapped.

If specific micropartitions in an SPP do not use their processing capacity entitlement, the unused capacity is ceded, and other uncapped micropartitions within the same SPP can use the extra capacity according to their uncapped weighting. In this way, the entitled pool capacity of an SPP is distributed to the set of micropartitions within that SPP.
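
The following Python sketch shows how ceded capacity could be apportioned among the uncapped micropartitions of an SPP in proportion to their uncapped weights. It is a simplified illustration of the weighting concept, not the Power Hypervisor dispatching algorithm; the partition names and weight values are assumptions.

# Minimal sketch: distribute spare processing units ceded within an SPP to
# its uncapped micropartitions in proportion to their uncapped weights.
def distribute_spare_capacity(spare_units: float, partitions: dict) -> dict:
    """partitions maps a partition name to its uncapped weight (0-255).
    Returns the extra processing units each partition would receive."""
    total_weight = sum(partitions.values())
    if total_weight == 0:
        return {name: 0.0 for name in partitions}
    return {
        name: spare_units * weight / total_weight
        for name, weight in partitions.items()
    }

# Example: 1.5 ceded processing units shared by three uncapped micropartitions.
print(distribute_spare_capacity(1.5, {"lpar1": 128, "lpar2": 64, "lpar3": 64}))
# {'lpar1': 0.75, 'lpar2': 0.375, 'lpar3': 0.375}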

All IBM Power servers that support the MSPP capability have a minimum of one (the default) SPP and up to a maximum of 64 SPPs.

This capability helps customers reduce total cost of ownership (TCO) when the cost of software or database licenses depends on the number of assigned processor cores.

5.1.3 Virtual I/O Server

The VIOS is part of PowerVM. It is the specific appliance that allows the sharing of physical resources among LPARs to allow more efficient usage (for example, consolidation). In this case, the VIOS owns the physical I/O resources (SCSI, FC, network adapters, or optical devices) and allows customer partitions to share access to them, which minimizes and optimizes the number of physical adapters in the system.

The VIOS eliminates the requirement that every partition owns a dedicated network adapter, disk adapter, and disk drive. The VIOS supports OpenSSH for secure remote logins. It also provides a firewall for limiting access by ports, network services, and IP addresses.

Figure 5-1 shows an overview of a VIOS configuration.

Figure 5-1 Architectural view of the VIOS

It is a best practice to run dual VIOSs per physical server.

Shared Ethernet Adapter

A Shared Ethernet Adapter (SEA) can be used to connect a physical Ethernet network to a virtual Ethernet network. The SEA provides this access by connecting the Power Hypervisor VLANs to the VLANs on the external switches. Because the SEA processes packets at Layer 2, the original MAC address and VLAN tags of the packet are visible to other systems on the physical network. IEEE 802.1Q VLAN tagging is supported.

By using the SEA, several customer partitions can share one physical adapter. You also can connect internal and external VLANs by using a physical adapter. The SEA service can be hosted only in the VIOS (not in a general-purpose AIX or Linux partition) and acts as a Layer 2 network bridge to securely transport network traffic between virtual Ethernet networks (internal) and one or more (Etherchannel) physical network adapters (external). These virtual Ethernet network adapters are defined by the Power Hypervisor on the VIOS.

Virtual SCSI

Virtual SCSI refers to a virtualized implementation of the SCSI protocol. Virtual SCSI is based on a client/server relationship. The VIOS LPAR owns the physical I/O resources and acts as a server or, in SCSI terms, a target device. The client LPARs access the virtual SCSI backing storage devices that are provided by the VIOS as clients.

The virtual I/O adapters (a virtual SCSI server adapter and a virtual SCSI client adapter) are configured by using an HMC. The virtual SCSI server (target) adapter is responsible for running any SCSI commands that it receives, and it is owned by the VIOS partition. The virtual SCSI client adapter allows a client partition to access physical SCSI and SAN-attached devices and LUNs that are mapped to be used by the client partitions. The provisioning of virtual disk resources is provided by the VIOS.

N_Port ID Virtualization

N_Port ID Virtualization (NPIV) is a technology that allows multiple LPARs to access one or more external physical storage devices through the same physical FC adapter. This adapter is attached to a VIOS partition that acts only as a pass-through that manages the data transfer through the Power Hypervisor.

Each partition features one or more virtual FC adapters, each with their own pair of unique worldwide port names. This configuration enables you to connect each partition to independent physical storage on a SAN. Unlike virtual SCSI, only the client partitions see the disk.

For more information and requirements for NPIV, see IBM PowerVM Virtualization Managing and Monitoring, SG24-7590.

Memory usage for virtualization features - IBM Power E1050

Virtualization requires more memory to be allocated by the Power Hypervisor for hardware state save areas and various virtualization technologies. For example, on Power10 processor-based systems, each processor core supports up to eight simultaneous multithreading (SMT) threads of execution, and each thread contains over 80 different registers.

The Power Hypervisor must set aside save areas for the register contents for the maximum number of virtual processors that are configured. The greater the number of physical hardware devices, the greater the number of virtual devices, the greater the amount of virtualization, and the more hypervisor memory is required. For efficient memory consumption, wanted and maximum values for various attributes (processors, memory, and virtual adapters) must be based on business needs, and not set to values that are significantly higher than actual requirements.

Predicting memory that is used by the Power Hypervisor

The IBM System Planning Tool (SPT) is a resource that can be used to estimate the amount of hypervisor memory that is required for a specific server configuration. After the SPT executable file is downloaded and installed, you can define a configuration by selecting the correct hardware platform and the installed processors and memory, and defining partitions and partition attributes. SPT can estimate the amount of memory that is assigned to the hypervisor, which helps you when you change a configuration or deploy new servers.

The Power Hypervisor provides the following types of virtual I/O adapters:

Ê Virtual SCSI

The Power Hypervisor provides a virtual SCSI mechanism for the virtualization of storage devices. The storage virtualization is accomplished by using two paired adapters: a virtual SCSI server adapter and a virtual SCSI client adapter.

Ê Virtual Ethernet

The Power Hypervisor provides a virtual Ethernet switch function that allows partitions on the same server to communicate quickly and securely without any need for physical interconnection. Connectivity outside of the server is possible if a Layer 2 bridge to a physical Ethernet adapter is set in one VIOS partition, which is also known as a Shared Ethernet Adapter (SEA).

Ê Virtual FC

A virtual FC adapter is a virtual adapter that provides customer LPARs with an FC connection to a storage area network (SAN) through the VIOS partition. The VIOS partition provides the connection between the virtual FC adapters on the VIOS partition and the physical FC adapters on the managed system.

Ê Virtual (tty) console

Each partition must have access to a system console. Tasks such as OS installation, network setup, and various problem analysis activities require a dedicated system console. The Power Hypervisor provides the virtual console by using a virtual tty or serial adapter and a set of hypervisor calls to operate on them. Virtual tty does not require the purchase of any other features or software, such as the PowerVM Edition features.

Logical partitions

Logical partitions (LPARs) and virtualization increase the usage of system resources and add a level of configuration possibilities.

Logical partitioning is the ability to make a server run as though it were two or more independent servers. When you logically partition a server, you divide the resources on the server into subsets, which are called LPARs. You can install software on an LPAR, and the LPAR runs as an independent logical server with the resources that you allocated to the LPAR.

An LPAR is also referred to in some documentation as a virtual machine (VM), which makes it appear similar to what other hypervisors offer. However, LPARs provide a higher level of security and isolation and other features that are described in this chapter.

Processors, memory, and I/O devices can be assigned to LPARs. AIX, IBM i, Linux, and VIOS can run on LPARs. VIOS provides virtual I/O resources to other LPARs with general-purpose OSs.

Note: The Power E1050 server does not support IBM i.

LPARs share a few system attributes, such as the system serial number, system model, and processor FCs. All other system attributes can vary from one LPAR to another.

Micro-Partitioning

When you use the Micro-Partitioning technology, you can allocate fractions of processors to an LPAR. An LPAR that uses fractions of processors is also known as a shared processor partition or micropartition. Micropartitions run over a set of processors that is called a shared processor pool (SPP), and virtual processors are used to enable the OS to manage the fractions of processing power that are assigned to the LPAR.

From an OS perspective, a virtual processor cannot be distinguished from a physical processor unless the OS is enhanced to determine the difference. Physical processors are abstracted into virtual processors that are available to partitions.

On a Power10 processor-based server, a partition can be defined with a processor capacity as small as 0.05 processing units. This number represents 0.05 of a physical core. Each physical core can be shared by up to 20 shared processor partitions, and the partition's entitlement can be incremented fractionally by as little as 0.01 of the processor. The shared processor partitions are dispatched and time-sliced on the physical processors under the control of the Power Hypervisor. The shared processor partitions are created and managed by the HMC.

Note: Although the Power10 processor-based mid-range server supports up to 20 shared processor partitions per core, the real limit depends on the application workload demands in use on the server.
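
The following Python sketch applies the limits that are described above: a minimum entitlement of 0.05 processing units, entitlement specified in 0.01 increments, and therefore at most 20 micropartitions per physical core. It is an illustrative check only; the function name is not an HMC or PowerVM interface.

# Illustrative validation of micropartition entitlement values.
MIN_ENTITLEMENT = 0.05   # smallest allowed micropartition (1/20 of a core)

def valid_entitlement(units: float) -> bool:
    """True if units meets the 0.05 minimum and the 0.01 granularity."""
    hundredths = round(units * 100)
    return units >= MIN_ENTITLEMENT and abs(units * 100 - hundredths) < 1e-9

print(valid_entitlement(0.05))    # True: smallest allowed value
print(valid_entitlement(0.04))    # False: below the minimum
print(valid_entitlement(0.375))   # False: not a 0.01 increment
print(int(1 / MIN_ENTITLEMENT))   # 20 micropartitions per physical core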

Processing mode

When you create an LPAR, you can assign entire processors for dedicated use, or you can assign partial processing units from an SPP. This setting defines the processing mode of the LPAR.

Dedicated mode

In dedicated mode, physical processors are assigned as a whole to partitions. The SMT feature in the Power10 processor core allows the core to run instructions from two, four, or eight independent software threads simultaneously.

Shared dedicated mode

On Power10 processor-based servers, you can configure dedicated partitions to become processor donors for idle processors that they own, which allows for the donation of spare CPU cycles from dedicated processor partitions to an SPP. The dedicated partition maintains absolute priority for dedicated CPU cycles. Enabling this feature can help increase system usage without compromising the computing power for critical workloads in a dedicated processor mode LPAR.

Shared mode

In shared mode, LPARs use virtual processors to access fractions of physical processors. Shared partitions can define any number of virtual processors (the maximum number is 20 times the number of processing units that are assigned to the partition). The Power Hypervisor dispatches virtual processors to physical processors according to the partition’s processing units entitlement. One processing unit represents one physical processor’s processing capacity. All partitions receive a total CPU time equal to their processing unit’s entitlement. The logical processors are defined on top of virtual processors. Therefore, even with a virtual processor, the concept of a logical processor exists, and the number of logical processors depends on whether SMT is turned on or off.
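
The following Python sketch works through the limits described above for a shared-mode partition: the maximum number of virtual processors is 20 times the assigned processing units, and the OS sees one logical processor per SMT thread of each virtual processor. The sample values are assumptions for illustration.

# Worked example of shared-mode virtual and logical processor limits.
def max_virtual_processors(entitled_units: float) -> int:
    """Upper bound on virtual processors for a shared-mode partition."""
    return int(entitled_units * 20)

def logical_processors(virtual_processors: int, smt_threads: int) -> int:
    """Logical processors seen by the OS (SMT can be 1, 2, 4, or 8 on Power10)."""
    return virtual_processors * smt_threads

print(max_virtual_processors(2.5))   # 50 virtual processors at most for 2.5 processing units
print(logical_processors(4, 8))      # 32 logical processors for 4 virtual processors with SMT8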

PowerVM virtualization - IBM Power E1050

The PowerVM platform is the family of technologies, capabilities, and offerings that delivers industry-leading virtualization for enterprises. It is the umbrella branding term for IBM Power processor-based server virtualization:

Ê IBM Power Hypervisor

Ê Logical partitioning

Ê IBM Micro-Partitioning®

Ê Virtual I/O Server (VIOS)

Ê Live Partition Mobility (LPM)

PowerVM is a combination of hardware and software enablement.

Note: PowerVM Enterprise Edition License Entitlement is included with each Power10 processor-based mid-range server. PowerVM Enterprise Edition is available as a hardware feature (#EPVV); supports up to 20 partitions per core, VIOS, multiple shared processor pools (MSPPs); and also offers LPM.

5.1.1 IBM Power Hypervisor

IBM Power processor-based servers are combined with PowerVM technology and offer the following key capabilities that can help to consolidate and simplify IT environments:

Ê Improve server usage and share I/O resources to reduce the total cost of ownership (TCO) and better use IT assets.

Ê Improve business responsiveness and operational speed by dynamically reallocating resources to applications as needed to better match changing business needs or handle unexpected changes in demand.

Ê Simplify IT infrastructure management by making workloads independent of hardware resources so that business-driven policies can be used to deliver resources that are based on time, cost, and service-level requirements.

Combined with features in the Power10 processor-based mid-range servers, the Power Hypervisor delivers functions that enable other system technologies, including logical partitioning technology, virtualized processors, IEEE virtual local area network (VLAN)-compatible virtual switches, virtual Small Computer Serial Interface (SCSI) adapters, virtual Fibre Channel (FC) adapters, and virtual consoles.

The Power Hypervisor is a basic component of the system’s firmware and offers the following functions:

Ê Provides an abstraction between the physical hardware resources and the LPARs that use them.

Ê Enforces partition integrity by providing a security layer between LPARs.

Ê Controls the dispatch of virtual processors to physical processors.

Ê Saves and restores all processor state information during a logical processor context switch.

Ê Controls hardware I/O interrupt management facilities for LPARs.

Ê Provides VLAN channels between LPARs that help reduce the need for physical Ethernet adapters for inter-partition communication.

Ê Monitors the enterprise Baseboard Management Controller (eBMC) and performs a reset or reload if needed, notifying the operating system (OS) if the problem is not corrected.

The Power Hypervisor is always active, regardless of the system configuration or whether it is connected to the managed console. It requires memory to support the resource assignment of the LPARs on the server. The amount of memory that is required by the Power Hypervisor firmware varies according to several factors:

Ê Memory usage for hardware page tables (HPTs)

Ê Memory usage to support I/O devices

Ê Memory usage for virtualization

Memory usage for hardware page tables

Each partition on the system includes its own HPT that contributes to hypervisor memory usage. The HPT is used by the OS to translate from effective addresses to physical real addresses in the hardware. This translation from effective to real addresses allows multiple OSs to run simultaneously in their own logical address space. Whenever a virtual processor for a partition is dispatched on a physical processor, the hypervisor indicates to the hardware the location of the partition HPT that can be used when translating addresses.

The amount of memory for the HPT is based on the maximum memory size of the partition and the HPT ratio. The default HPT ratio is 1/128th (for AIX, VIOS, and Linux partitions) of the maximum memory size of the partition. AIX, VIOS, and Linux use larger page sizes (16 and 64 KB) instead of using 4 KB pages. The use of larger page sizes reduces the overall number of pages that must be tracked; therefore, the overall size of the HPT can be reduced. For example, the HPT is 2 GB for an AIX partition with a maximum memory size of 256 GB.

When defining a partition, the maximum memory size that is specified should be based on the amount of memory that can be dynamically added to the partition by using dynamic logical partitioning (DLPAR) without changing the configuration and restarting the partition.

In addition to setting the maximum memory size, the HPT ratio can be configured. The hpt_ratio parameter for the chsyscfg Hardware Management Console (HMC) command can be issued to define the HPT ratio that is used for a partition profile. The valid values are 1:32, 1:64, 1:128, 1:256, or 1:512.

Specifying a smaller absolute ratio (1/512 is the smallest value) decreases the overall memory that is assigned to the HPT. Testing is required when changing the HPT ratio because a smaller HPT might incur more CPU consumption because the OS might need to reload the entries in the HPT more frequently. Most customers choose to use the IBM provided default values for the HPT ratios.
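
The following Python sketch reproduces the HPT sizing arithmetic above: the HPT size is the partition's maximum memory divided by the HPT ratio, so a 256 GB maximum-memory partition with the default 1:128 ratio gets a 2 GB HPT. The function is for illustration only and is not part of any HMC interface.

# Worked example of HPT sizing for the valid hpt_ratio settings (1:n).
VALID_RATIOS = (32, 64, 128, 256, 512)

def hpt_size_gb(max_memory_gb: float, ratio: int = 128) -> float:
    """HPT size in GB for a given maximum memory size and HPT ratio."""
    if ratio not in VALID_RATIOS:
        raise ValueError(f"1:{ratio} is not a valid HPT ratio")
    return max_memory_gb / ratio

for ratio in VALID_RATIOS:
    print(f"1:{ratio:<3} -> {hpt_size_gb(256, ratio):.2f} GB HPT for 256 GB maximum memory")
# The default 1:128 ratio yields 2.00 GB, which matches the example above.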

Memory usage for I/O devices

In support of I/O operations, the hypervisor maintains structures that are called the translation control entities (TCEs), which provide an information path between I/O devices and partitions. The TCEs provide the address of the I/O buffer, indications of read versus write requests, and other I/O-related attributes. Many TCEs are used per I/O device, so multiple requests can be active simultaneously to the same physical device. To provide better affinity, the TCEs are spread across multiple processor chips or drawers to improve performance while accessing the TCEs.

For physical I/O devices, the base amount of space for the TCEs is defined by the hypervisor that is based on the number of I/O devices that are supported. A system that supports high-speed adapters also can be configured to allocate more memory to improve I/O performance. Linux is the only OS that uses these extra TCEs so that the memory can be freed for use by partitions if the system uses only AIX.

Reliability and availability - IBM Power E1050

This section looks at the more general concept of RAS as it applies to any system in the data center.

The goal is to briefly define what RAS is and look at how reliability and availability are measured.

4.6.1 Reliability modeling

The prediction of system level reliability starts with establishing the failure rates of the individual components that make up the system. Then, by using the appropriate prediction models, the component-level failure rates are combined to provide a system-level reliability prediction in terms of a failure rate.

However, in documentation, system-level reliability is often described in terms of mean time between failures (MTBF) for repairable systems rather than a failure rate, for example, 50 years MTBF.

A 50-year MTBF might suggest that a system runs 50 years between failures, but what it actually means is that, across a large population of identical systems, on average one out of every 50 fails each year.

4.6.2 Different levels of reliability

When a component fails, the impact of that failure can vary depending on the component.

A power supply that fails in a system with redundant power supplies must be replaced. However, by itself the failure of a single power supply should not cause a system outage, and the repair can be done concurrently with no downtime.

Other components in a system might fail and cause a system-wide outage where concurrent repair is not possible. Therefore, it is typical to talk about different MTBF numbers:

Ê MTBF for failures that result in repair actions.

Ê MTBF for failures that require a concurrent repair.

Ê MTBF for failures that require a non-concurrent repair.

Ê MTBF for failures that result in an unplanned application outage.

Ê MTBF for failures that result in an unplanned system outage.

4.6.3 Measuring availability

Mathematically speaking, availability is often expressed as a percentage of the time that something is available or in use over a period. An availability number for a system can be mathematically calculated from the expected reliability of the system if both the MTBF and the duration of each outage are known.

For example, consider a system that always runs exactly one week between failures and each time it fails, it is down for 10 minutes. Of the 168 hours in a week, the system is down 10/60 of an hour and up 168 - (10/60) hours. As a percentage of the hours in the week, the system is ((168 - (10/60)) / 168) * 100% ≈ 99.9% available.

99.999% available means approximately 5.3 minutes of downtime in a year. On average, a system that failed once a year and was down for 5.3 minutes would be 99.999% available. This concept is often called “five 9s of availability”.
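
The following Python sketch reproduces the availability arithmetic above: availability is uptime divided by total time, and a given availability percentage implies a yearly downtime budget. It is a simple worked example under the stated assumption of one outage per failure interval.

# Worked example of the availability calculations in this section.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def availability(interval_hours: float, outage_minutes: float) -> float:
    """Fraction of time up, assuming one outage of outage_minutes per interval."""
    return (interval_hours - outage_minutes / 60) / interval_hours

def yearly_downtime_minutes(availability_pct: float) -> float:
    """Downtime budget per year for a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR

print(f"{availability(168, 10) * 100:.3f}%")             # ~99.901% for one 10-minute outage a week
print(f"{yearly_downtime_minutes(99.999):.1f} minutes")   # ~5.3 minutes a year for five 9s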

When talking about modern server hardware availability, short weekly failures like the one in the example are not the norm. Rather, the failure rates are much lower, and the MTBF is often measured in terms of years, perhaps more years than a system will be kept in service.

Therefore, when an MTBF of 10 years, for example, is quoted, it is not expected that on average each system will run 10 years between failures. Rather, it is more reasonable to expect that on average in a year that 1 server out of 10 will fail. If a population of 10 servers always had exactly one failure a year, a statement of 99.999% availability across that population of servers would mean that the one server that failed would be down about 53 minutes when it failed.

In theory, five 9s of availability can be achieved by having a system design that fails frequently, multiple times a year, but whose failures are limited to small periods of time. Conversely, five 9s of availability might mean a server design with a large MTBF, but where a server takes a fairly long time to recover from the rare outage.

Figure 4-3 shows that five 9s of availability can be achieved with systems that fail frequently for minuscule amounts of time, or infrequently with much larger downtime per failure.

Figure 4-3 Five 9s of availability

Figure 4-3 on page 136 is misleading in the sense that servers with low reliability are likely to have many components that, when they fail, take down the system and keep the system down until repair. Conversely, servers that are designed for great reliability often also are designed so that the systems, or at least portions of the system, can be recovered without having to keep a system down until it is repaired. So, systems with low MTBF would have longer repair times, and a system with five 9s of availability would be synonymous with a high level of reliability.

Non-volatile Memory Express bays - IBM Power E1050

The IBM Power E1050 server has a 10-NVMe backplane that is always present in the server. It offers up to 10 NVMe bays with all four processor sockets populated and six NVMe bays if only two or three processor sockets are populated. It connects to the system board through three Molex Impact connectors and a power connector. There is no cable between the NVMe backplane and the system board.

The wiring strategy and backplane materials are chosen to ensure Gen4 signaling to all NVMe drives. All NVMe connectors are PCIe Gen4 connectors. For more information about the internal connection of the NVMe bays to the processor chips, see Figure 2-13 on page 66.

Each NVMe interface is a Gen4 x4 PCIe bus. The NVMe drives can be in an OS-controlled RAID array. A hardware RAID is not supported on the NVMe drives. The NVMe thermal design supports 18 W for 15-mm NVMe drives and 12 W for 7-mm NVMe drives.

For more information about the available NVMe drives and how to plug the drives for best availability, see 3.5, “Internal storage” on page 92.

2.3.4 Attachment of I/O-drawers

The Power E1050 server can expand the number of its I/O slots by using I/O Expansion Drawers (#EMX0). The number of I/O drawers that can be attached to a Power E1050 server depends on the number of populated processor sockets, which changes the number of available internal PCIe slots of the server. Only some slots can be used to attach an I/O Expansion Drawer by using the #EJ2A CXP Converter adapter, which is also referred to as a cable card.

Feature Code #EJ2A is an IBM designed PCIe Gen4 x16 cable card. It is the only supported cable card to attach fanout modules of an I/O Expansion Drawer in the Power E1050 server. Previous cards from a Power E950 server cannot be used. Feature Code #EJ2A supports copper and optical cables for the attachment of a fanout module.

Note: The IBM e-config configurator adds 3-meter copper cables (Feature Code #ECCS) to the configuration if no cables are manually specified. If you want optical cables, make sure to configure them explicitly.

Table 2-11 lists the PCIe slot order for the attachment of an I/O Expansion Drawer, the maximum number of I/O Expansion Drawers and Fanout modules, and the maximum number of available slots (dependent on the populated processor sockets).

Table 2-11 I/O Expansion Drawer capabilities depend on the number of populated processor slots

For more information about the #EMX0 I/O Expansion Drawer, see 3.9.1, “PCIe Gen3 I/O expansion drawer” on page 99.

2.3.5 System ports

The Power E1050 server has two 1-Gbit Ethernet ports and two USB 2.0 ports to connect to the eBMC service processor. The two eBMC Ethernet ports are used to connect one or two HMCs. There are no other HMC ports, as in servers that have a Flexible Service Processor (FSP). The eBMC USB ports can be used for a firmware update from a USB stick.

The two eBMC Ethernet ports are connected by using four PCIe lanes each, although the eBMC Ethernet controllers need only one lane. The connections are provided by DCM0, one from each of its Power10 chips. For more information, see Figure 2-13 on page 66.

The eBMC module with its two eBMC USB ports also is connected to the DCM0 at chip 0 by using a x4 PHB, although the eBMC module uses only one lane.


For more information about attaching a Power E1050 server to an HMC, see Accessing the eBMC so that you can manage the system.

For more information about how to do a firmware update by using the eBMC USB ports, see Installing the server firmware on the service processor or eBMC through a USB port.

Service labels - IBM Power E1050

Service providers use these labels to assist them in performing maintenance actions. Service labels are found in various formats and positions, and are intended to transmit readily available information to the servicer during the repair process. Here are some of these service labels and their purposes:

Ê Location diagrams: Location diagrams are on the system hardware, relating information regarding the placement of hardware components. Location diagrams might include location codes, drawings of physical locations, concurrent maintenance status, or other data that is pertinent to a repair. Location diagrams are especially useful when multiple components such as DIMMs, processors, fans, adapters, and power supplies are installed.

Ê Remove/replace procedures: Service labels that contain remove/replace procedures are often found on a cover of the system or in other spots accessible to the servicer. These labels provide systematic procedures, including diagrams detailing how to remove or replace certain serviceable hardware components.

Ê Arrows: Numbered arrows are used to indicate the order of operation and the serviceability direction of components. Some serviceable parts such as latches, levers, and touch points must be pulled or pushed in a certain direction and in a certain order for the mechanical mechanisms to engage or disengage. Arrows generally improve the ease of serviceability.

4.5.9 QR labels

QR labels are placed on the system to provide access to key service functions through a mobile device. When the QR label is scanned, it goes to a landing page for Power10 processor-based systems, which contains the service functions of interest for each machine type and model (MTM) while you are physically at the server. These functions include installation and repair instructions, reference code lookup, and other items.

4.5.10 Packaging for service

The following service features are included in the physical packaging of the systems to facilitate service:

Ê Color coding (touch points): Blue-colored touch points delineate touch points on service components where the component can be safely handled for service actions, such as removal or installation.

Ê Tool-less design: Selected IBM systems support tool-less or simple tool designs. These designs require no tools or simple tools such as flathead screw drivers to service the hardware components.

Ê Positive retention: Positive retention mechanisms help to ensure proper connections between hardware components, such as cables to connectors, and between two cards that attach to each other. Without positive retention, hardware components run the risk of becoming loose during shipping or installation, preventing a good electrical connection. Positive retention mechanisms like latches, levers, thumbscrews, pop nylatches (U-clips), and cables are included to help prevent loose connections and aid in installing (seating) parts correctly. These positive retention items do not require tools.

4.5.11 Error handling and reporting

In the event of a system hardware failure or an environmentally induced failure, the system runtime error capture capability systematically analyzes the hardware error signature to determine the cause of failure. The analysis result is stored in system NVRAM. When the system can be successfully restarted either manually or automatically, or if the system continues to operate, the error is reported to the OS. Hardware and software failures are recorded in the system log. When an HMC is attached in the PowerVM environment, an Error Log Analysis (ELA) routine analyzes the error, forwards the event to the Service Focal Point (SFP) application running on the HMC, and notifies the system administrator that it has isolated a likely cause of the system problem. The service processor event log also records unrecoverable checkstop conditions, forwards them to the SFP application, and notifies the system administrator.

The system can call home through the OS to report platform-recoverable errors and errors that are associated with PCI adapters or devices.

In the HMC-managed environment, a Call Home service request is initiated from the HMC, and the pertinent failure data with service parts information and part locations is sent to an IBM service organization. Customer contact information and specific system-related data, such as the MTM and serial number, along with error log data that is related to the failure, are sent to IBM Service.

4.5.12 Live Partition Mobility

With PowerVM Live Partition Mobility (LPM), users can migrate an AIX, IBM i, or Linux VM partition that is running on one IBM Power server to another IBM Power server without disrupting services. The migration transfers the entire system environment, including processor state, memory, attached virtual devices, and connected users. It provides continuous OS and application availability during planned partition outages for repair of hardware and firmware faults. Power10 processor-based servers support secure LPM, where the VM image is encrypted and compressed before transfer. Secure LPM uses the on-chip encryption and compression capabilities of the Power10 processor for optimal performance.

4.5.13 Call Home

Call Home refers to an automatic or manual call from a client location to the IBM support structure with error log data, server status, or other service-related information. Call Home invokes the service organization so that the appropriate service action can begin. Call Home can be done through the ESA that is embedded in the HMC, through a version of the ESA that is embedded in the OSs for non-HMC-managed systems, or through a version of ESA that runs as a stand-alone Call Home application. Although configuring Call Home is optional, clients are encouraged to implement this feature to obtain service enhancements such as reduced problem determination and faster and potentially more accurate transmittal of error information. In general, using the Call Home feature can result in increased system availability.

4.5.14 IBM Electronic Services

ESA and Client Support Portal (CSP) comprise the IBM Electronic Services solution, which is dedicated to providing fast, exceptional support to IBM clients. ESA is a no-charge tool that proactively monitors and reports hardware events, such as system errors, and collects hardware and software inventory. ESA can help focus on the client’s company business initiatives, save time, and spend less effort managing day-to-day IT maintenance issues. In addition, Call Home Cloud Connect Web and Mobile capability extends the common solution and offers IBM Systems related support information that is applicable to servers and storage.

For more information, see IBM Call Home Connect Cloud.

Serviceability - IBM Power E1050

The purpose of serviceability is to efficiently repair the system while attempting to minimize or eliminate any impact to system operation. Serviceability includes system installation, Miscellaneous Equipment Specification (MES) (system upgrades/downgrades), and system maintenance or repair. Depending on the system and warranty contract, service may be performed by the client, an IBM representative, or an authorized warranty service provider. The serviceability features that are delivered in this system help provide a highly efficient service environment by incorporating the following attributes:

Ê Designed for IBM System Services Representative (IBM SSR) setup, install, and service.

Ê Error Detection and Fault Isolation (ED/FI).

Ê FFDC.

Ê Light path service indicators.

Ê Service and FRU labels that are available on the system.

Ê Service procedures are documented in IBM Documentation or available through the HMC.

Ê Automatic reporting of serviceable events to IBM through the Electronic Service Agent (ESA) Call Home application.

4.5.1 Service environment

In the PowerVM environment, the HMC is a dedicated server that provides functions for configuring and managing servers, either as partitioned systems or as a full-system partition, by using a GUI, command-line interface (CLI), or Representational State Transfer (REST) API. An HMC that is attached to the system enables support personnel (with client authorization) to log in remotely or locally (by using the physical HMC that is in proximity of the server being serviced) to review error logs and perform remote maintenance if required.

The Power10 processor-based servers support several service environments:

Ê Attachment to one or more HMCs or virtual HMCs (vHMCs) is a supported option by the system with PowerVM. This configuration is the default one for servers that support logical partitions (LPARs) with dedicated or virtual I/O. In this case, all servers have at least one LPAR.

Ê No HMC. There are two service strategies for non-HMC systems:

– Full-system partition with PowerVM: A single partition owns all the server resources and only one operating system (OS) may be installed. The primary service interface is through the OS and the service processor.

– Partitioned system with NovaLink: In this configuration, the system can have more than one partition and can be running more than one OS. The primary service interface is through the service processor.

4.5.2 Service interface

Support personnel can use the service interface to communicate with the service support applications in a server by using an operator console, a GUI on the management console or service processor, or an OS terminal. The service interface helps to deliver a clear, concise view of available service applications, helping the support team to manage system resources and service information in an efficient and effective way. Applications that are available through the service interface are carefully configured and placed to grant service providers access to important service functions. Different service interfaces are used, depending on the state of the system, hypervisor, and operating environment. The primary service interfaces are:

Ê LEDs

Ê Operator panel

Ê BMC Service Processor menu

Ê OS service menu

Ê Service Focal Point (SFP) on the HMC or vHMC with PowerVM

In the light path LED implementation, the system can clearly identify components for replacement by using specific component-level LEDs and also can guide the servicer directly to the component by signaling (turning on solid) the enclosure fault LED and component FRU fault LED. The servicer also can use the identify function to flash the FRU-level LED. When this function is activated, a roll-up to the blue enclosure locate occurs. These enclosure LEDs turn on solid and can be used to follow the light path from the enclosure and down to the specific FRU in the PowerVM environment.

4.5.3 First Failure Data Capture and error data analysis

FFDC is a technique that helps ensure that when a fault is detected in a system, the root cause of the fault is captured without the need to re-create the problem or run any sort of extended tracing or diagnostics program. For most faults, a good FFDC design means that the root cause also can be detected automatically without servicer intervention.

FFDC information, error data analysis, and fault isolation are necessary to implement the advanced serviceability techniques that enable efficient service of the systems and to help determine the failing items.

In the rare absence of FFDC and Error Data Analysis, diagnostics are required to re-create the failure and determine the failing items.

4.5.4 Diagnostics

The general diagnostic objectives are to detect and identify problems so that they can be resolved quickly. Elements of the IBM diagnostics strategy include:

Ê Provides a common error code format equivalent to a system reference code with a PowerVM, system reference number, checkpoint, or firmware error code.

Ê Provides fault detection and problem isolation procedures. Supports remote connection, which can be used by the IBM Remote Support Center or IBM Designated Service.

Ê Provides interactive intelligence within the diagnostics with detailed online failure information while connected to the IBM back-end system.

4.5.5 Automatic diagnostics

The processor and memory FFDC technology is designed to perform without the need for problem re-creation or user intervention. Solid and intermittent errors are designed to be correctly detected and isolated at the time that the failure occurs. Runtime and boot-time diagnostics fall into this category.

4.5.6 Stand-alone diagnostics

As the name implies, stand-alone or user-initiated diagnostics requires user intervention. The user must perform manual steps, including:

Ê Booting from the diagnostics CD, DVD, Universal Serial Bus (USB), or network

Ê Interactively selecting steps from a list of choices

4.5.7 Concurrent maintenance

The determination of whether a firmware release can be updated concurrently is identified in the readme file that is released with the firmware. An HMC is required for a concurrent firmware update with PowerVM. In addition, concurrent maintenance of PCIe adapters and NVMe drives is supported by PowerVM. Power supplies, fans, and operating panel LCDs are hot-pluggable.

Power10 processor RAS - IBM Power E1050

Although there are many differences internally in the Power10 processor compared to the Power9 processor that relate to performance, number of cores, and other features, the general RAS philosophy for how errors are handled remains the same. Therefore, information about the Power9 processor-based subsystem RAS can still be referenced to understand the design. For more information, see Introduction to IBM Power Reliability, Availability, and Serviceability for Power9 processor-based systems using IBM PowerVM.

The Power E1050 processor module is a dual-chip module (DCM) that differs from that of the Power E950, which has a single-chip module (SCM). Each DCM has up to 24 processor cores, which is up to 96 cores for a 4-socket (4S) Power E1050. In comparison, a 4S Power E950 supports 48 cores. The internal processor buses are twice as fast, with the Power E1050 running at 32 Gbps.

Despite the increased cores and the faster high-speed processor bus interfaces, the RAS capabilities are equivalent, with features like Processor Instruction Retry (PIR), L2/L3 cache ECC protection with cache line delete, and the CRC fabric bus retry that is a characteristic of Power9 and Power10 processors. As with the Power E950, when an internal fabric bus lane encounters a hard failure in a Power E1050, the lane can be dynamically spared out.

Figure 4-2 shows the Power10 DCM.

Figure 4-2 Power10 Dual-Chip Module

4.3.1 Cache availability

The L2/L3 caches in the Power10 processor, and the cache in the memory buffer chip, are protected with double-bit detect, single-bit correct ECC. In addition, a threshold of correctable errors that are detected on cache lines can result in the data in the cache lines being purged and the cache lines removed from further operation without requiring a restart in the PowerVM environment. Modified data is handled through Special Uncorrectable Error (SUE) handling. L1 data and instruction caches also have a retry capability for intermittent errors and a cache set delete mechanism for handling solid failures.

4.3.2 Special Uncorrectable Error handling

SUE handling prevents an uncorrectable error in memory or cache from immediately causing the system to terminate. Rather, the system tags the data and determines whether it will ever be used again. If the error is irrelevant, it does not force a checkstop. If and when the data is used, I/O adapters that are controlled by an I/O hub controller freeze if the data is transferred to an I/O device; otherwise, termination can be limited to the program or kernel that owns the data, or to the hypervisor if the hypervisor owns the data.

4.3.3 Uncorrectable error recovery

When the auto-restart option is enabled, the system can automatically restart following an unrecoverable software error, hardware failure, or environmentally induced (AC power) failure.

4.4 I/O subsystem RAS

The Power E1050 provides 11 general-purpose Peripheral Component Interconnect Express (PCIe) slots that allow for hot-plugging of I/O adapters, which makes the adapters concurrently maintainable. These PCIe slots operate at Gen4 and Gen5 speeds. Some of the PCIe slots support OpenCAPI and I/O expansion drawer cable cards.

Unlike the Power E950, the Power E1050 location codes start from index 0, as with all Power10 systems. However, slot C0 is not a general-purpose PCIe slot because it is reserved for the eBMC service processor card.

Another difference between the Power E950 and the Power E1050 is that all the Power E1050 slots are directly connected to a Power10 processor. In the Power E950, some slots are connected to the Power9 processor through I/O switches.

All 11 PCIe slots are available if three or four processor sockets are populated with DCMs. In the 2-socket DCM configuration, only seven PCIe slots are functional.

DASD options

The Power E1050 provides 10 internal Non-volatile Memory Express (NVMe) drives at Gen4 speeds, and they are concurrently maintainable. The NVMe drives are connected to DCM0 and DCM3. In a 2-socket DCM configuration, only six of the drives are available. To access all 10 internal NVMe drives, you must have a 4S DCM configuration. Unlike the Power E950, the Power E1050 has no internal serial-attached SCSI (SAS) drives. You can use an external drawer to provide SAS drives.

The internal NVMe drives support OS-controlled RAID 0 and RAID 1 arrays, but not hardware RAID. For best redundancy, use an OS mirror and a mirrored dual Virtual I/O Server (VIOS) configuration. To ensure as much separation as possible in the hardware path between mirror pairs, the following NVMe configuration is recommended:

Ê Mirrored OS: NVMe3 and NVMe4 pairs, or NVMe8 and NVMe9 pairs

Ê Mirrored dual VIOS:

– Dual VIOS: NVMe3 for VIOS1, NVMe4 for VIOS2.

– Mirrored dual VIOS: NVMe9 mirrors NVMe3, and NVMe8 mirrors NVMe4.

Service processor - IBM Power E1050

The Power10 E1050 comes with a redesigned service processor that is based on a BMC design with firmware that is accessible through open-source industry-standard application programming interfaces (APIs), such as Redfish. An upgraded Advanced System Management Interface (ASMI) web browser user interface preserves the required RAS functions while allowing the user to perform tasks in a more intuitive way.

Diagnostic monitoring of recoverable errors from the processor chipset is performed on the system processor itself, and the unrecoverable diagnostic monitoring of the processor chipset is performed by the service processor. The service processor runs on its own power boundary and does not require resources from a system processor to be operational to perform its tasks.

The service processor supports surveillance of the connection to the Hardware Management Console (HMC) and to the system firmware (hypervisor). It also provides several remote power control options, environmental monitoring, reset, restart, remote maintenance, and diagnostic functions, including console mirroring. The BMC service processor menus (ASMI) can be accessed concurrently during system operation, allowing nondisruptive abilities to change system default parameters, view and download error logs, and check system health.

Redfish, an industry-standard API for server management, enables IBM Power servers to be managed individually or in a large data center. Standard functions such as inventory, event logs, sensors, dumps, and certificate management are all supported by Redfish. In addition, new user management features support multiple users and privileges on the BMC through Redfish or ASMI. User management through the Lightweight Directory Access Protocol (LDAP) also is supported. The Redfish events service provides a means for notification of specific critical events such that actions can be taken to correct issues. The Redfish telemetry service provides access to a wide variety of data (such as power consumption and ambient, core, DIMM, and I/O temperatures) that can be streamed at periodic intervals.
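
As an illustration of managing the eBMC through Redfish, the following Python sketch reads the service root and the Systems collection, which are endpoints defined by the Redfish standard. The host name and credentials are placeholders, and certificate verification is disabled here only to keep the example short.

# Minimal sketch: query standard Redfish endpoints on the eBMC.
import requests

EBMC = "https://ebmc.example.com"                # placeholder eBMC address
AUTH = ("service_user", "service_password")       # placeholder credentials

# The service root describes the resources that the BMC exposes.
root = requests.get(f"{EBMC}/redfish/v1", auth=AUTH, verify=False).json()
print(root.get("RedfishVersion"))

# Walk the Systems collection and print each system's model and power state.
systems = requests.get(f"{EBMC}/redfish/v1/Systems", auth=AUTH, verify=False).json()
for member in systems.get("Members", []):
    system = requests.get(f"{EBMC}{member['@odata.id']}", auth=AUTH, verify=False).json()
    print(system.get("Model"), system.get("PowerState"))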

The service processor monitors the operation of the firmware during the boot process and also monitors the hypervisor for termination. The hypervisor monitors the service processor and reports a service reference code when it detects surveillance loss. In the PowerVM environment, it performs a reset/reload if it detects the loss of the service processor.

4.2 Memory subsystem RAS

The Power10 E1050 server introduces a new 4U-tall differential DIMM (DDIMM) memory subsystem, which has a new OpenCAPI memory interface that is called OMI, for resilient and fast communication to the processor.

Figure 4-1 Power10 E1050 OMI

This new memory subsystem design delivers solid RAS. Unlike the processor RAS characteristics, the E1050 memory RAS differs significantly from that of the IBM Power E950. The Power E1050 supports the same 4U DDIMM height as the IBM Power E1080.

Table 4-2 compares memory DIMMs, and highlights the differences between the Power E950 DIMM and the Power E1050 DDIMM. It also provides the RAS impacts of the DDIMMs, which are applicable to the Power E1080 servers.

Table 4-2 Power E950 DIMMs versus Power E1050 DDIMMs RAS comparison

4.2.1 Memory buffer

The DDIMM contains a memory buffer with key RAS features, including protection of critical data/address flows by using CRC, ECC, and parity; a maintenance engine for background memory scrubbing and memory diagnostics; and a Fault Isolation Register (FIR) structure, which enables firmware attention-based fault isolation and diagnostics.

4.2.2 Open Memory Interface

The OMI interface between the memory buffer and processor memory controller is protected by dynamic lane calibration, and a CRC retry/recovery facility to retransmit lost frames to survive intermittent bit flips. A lane fail also can be survived by triggering a dynamic lane reduction from 8 to 4, independently for both up and downstream directions. A key advantage of the OMI interface is that it simplifies the number of critical signals that must cross connectors from processor to memory compared to a typical industry-standard DIMM design.

4.2.3 Memory ECC

The DDIMM includes a robust 64-byte Memory ECC with 8-bit symbols, which can correct up to five symbol errors (one x4 chip and one additional symbol), and retry for data and address uncorrectable errors.

4.2.4 Dynamic row repair

To further extend the life of the DDIMM, the dynamic row repair feature can restore full use of a DRAM for a fault that is contained to a DRAM row while the system continues to operate.

4.2.5 Spare temperature sensors

Each DDIMM has spare temperature sensors so that the failure of one does not require a DDIMM replacement.

4.2.6 Spare DRAMs

4U DDIMMs include two spare x4 memory modules (DRAMs) per rank, which can be substituted for failed DRAMs during a runtime operation. Combined with ECC correction, the two spares allow a 4U DDIMM to continue to function with three bad DRAMs per rank, compared to 1 (single device data correct) or 2 (double device data correct) bad DRAMs in a typical industry-standard DIMM design. This setup extends self-healing capabilities beyond what is provided with the dynamic row repair capability.

4.2.7 Spare Power Management Integrated Circuits

4U DDIMMs include spare PMICs so that the failure of one PMIC does not require a DDIMM replacement.