Method for Checkpointing Instruction Groups with Out-of-Order Floating Point Instructions in a Multi-Threaded Processor

Inventors: James Wilson Bishop, Hung Qui Le, Michael James Mack, Jafar Nahidi, Dung Quoc Nguyen, Jose Angel Paredes, Scott Barnett Swaney, Brian William Thompto

Assignee: International Business Machines Corporation

Patent Number: US 7,478,276 B2

Date of Patent: January 13, 2009

Abstract

A method and apparatus are provided for dispatch group checkpointing in a microprocessor, including provisions for handling partially completed dispatch groups and instructions which modify system coherent state prior to completion. An instruction checkpoint retry mechanism is implemented to recover from soft errors in logic. The processor is able to dispatch fixed point unit (FXU), load/store unit (LSU), and floating point unit (FPU) or vector multimedia extension (VMX) instructions on the same cycle. Store data is written to a store queue when a store instruction finishes executing. The data is held in the store queue until the store instruction is checkpointed, at which point it can be released to the coherently shared level 2 (L2) cache.

Background of the Invention

1. Technical Field

The present invention relates to error detection in a data processing system. More specifically, the present invention is directed to a method and apparatus for checkpointing instruction groups with out-of-order floating point instructions in a multi-threaded processor.

2. Description of Related Art

Data processing systems employ mechanisms for error detection, diagnosis, and recovery. The Reliability, Availability, and Serviceability (RAS) concept addresses system failures, aiming to prevent, tolerate, or fix them. Error detection methods include parity codes (detecting single-bit errors) and Error Correcting Codes (ECCs) (detecting and correcting errors). Traditional Error Detection and Fault Isolation (EDFI) systems used complex checking logic integral to data flow, achieving high detection rates but increasing processor complexity, wire length, and cycle times.

Existing systems, like those using a recovery unit (Runit), maintain ECC-hardened checkpointed copies of registers. Working copies are updated during execution, while checkpoint copies are updated later. This strategy relies on fixed pipeline lengths for checkpointing, which is problematic for out-of-order execution with varying pipeline depths. Prior implementations had independent controls for register and storage checkpoints, creating cycle-time critical paths. The invention aims to provide a single point of control for blocking checkpointing and handle instructions that modify system coherent state before completion.

Summary of the Invention

The present invention provides a method and apparatus for dispatch group checkpointing in a microprocessor. It includes provisions for handling partially completed dispatch groups and instructions that modify system coherent state before completion. An instruction checkpoint retry mechanism recovers from soft errors in logic. The processor can dispatch fixed point unit (FXU), load/store unit (LSU), and floating point unit (FPU) or vector multimedia extension (VMX) instructions on the same cycle. Store data is written to a store queue when a store instruction finishes executing, held there until the store instruction is checkpointed, and then released to the coherently shared L2 cache.

The invention tracks instructions by dispatch group, so instructions bound for different execution units can be dispatched together. A group is checkpointed once all of its instructions are complete; if part of a group is flushed, the surviving instructions can still be checkpointed. A group tag (Gtag) denotes the age of a group and is compared with the next-to-complete (NTC) Itag to determine checkpoint readiness. When a group is partially flushed, its Gtag must be recalculated. The system keeps separate write queues for FPU/VMX results and for FXU/LSU/branch unit (BRU) results so that results are ordered and checkpointed correctly, including for instructions that modify system coherent state.

Brief Description of the Drawings

FIG. 1 depicts a block diagram of a data processing system, illustrating components like processors, memory controller, I/O bridge, PCI bus, network adapter, modem, and hard disk.

FIGS. 2A-2B illustrate an exemplary block diagram of a dual-threaded processor design, detailing functional units such as instruction fetch, decode, dispatch units, execution units (FXU, LSU, FPU, VMX), register sets, and the recovery unit.

FIG. 3 presents a flowchart of an exemplary operation of instruction implementation, showing the process of identifying instructions with tags (Itag), grouping them with a group tag (Gtag), performing instructions, storing data to a write queue, and incrementing a next-to-complete (NTC) counter to determine checkpoint readiness.

FIG. 4 details Gtag recalculation for flushed instruction groups, including adjustments for flushed FPU or VMX instructions, ensuring correct checkpointing context.

FIG. 5 describes the store queue operation during checkpointing. Store data is marked as checkpointed and released to the L2 cache once the Gtag condition is met. When an error occurs, the instructions that failed are identified, their instruction addresses are restored, and they are retried.

Detailed Description of the Preferred Embodiment

The invention provides a method and apparatus for checkpointing instruction groups in a multi-threaded processor, particularly for out-of-order floating-point instructions. It employs an instruction checkpoint retry mechanism to recover from soft errors.

Processor Architecture and Operation

The processor supports concurrent execution of multiple threads. It can dispatch fixed point unit (FXU), load/store unit (LSU), and floating point unit (FPU) or vector multimedia extension (VMX) instructions in the same cycle. Instructions are grouped into dispatch groups. FXU and LSU pipelines have similar depths, shorter than FPU and VMX pipelines. FPU and VMX instructions can execute out-of-order relative to each other and relative to FXU/LSU instructions. VMX instructions typically complete last.

The processor includes various execution units: branch unit (206), FXU (208a, 208b), LSU (207a, 207b), FPU (209a, 209b), and VMX (227a, 227b). These units operate on register sets (GPRs, FPRs, SPRs, VRs) specific to each thread or shared. Data cache (202) and Level 2 cache/memory (220) are also involved.

Checkpointing Mechanism

Checkpointing is managed by a recovery unit (Runit). Instructions are assigned an instruction tag (Itag). Instructions are grouped together and assigned a group tag (Gtag), which is the Itag of the youngest instruction in the group. Checkpointing occurs when the next-to-complete (NTC) Itag is greater than or equal to the Gtag.
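The readiness rule above can be sketched in a few lines of Python. This is only an illustration, assuming Itags are plain increasing integers with no wraparound; the names `DispatchGroup` and `can_checkpoint` are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DispatchGroup:
    """Hypothetical model of a dispatch group.

    Assumption: Itags are monotonically increasing integers, so the
    youngest instruction in a group has the largest Itag.
    """
    itags: List[int]

    @property
    def gtag(self) -> int:
        # Per the text: the Gtag is the Itag of the youngest instruction.
        return max(self.itags)


def can_checkpoint(ntc_itag: int, group: DispatchGroup) -> bool:
    """A group is ready to checkpoint once NTC Itag >= Gtag."""
    return ntc_itag >= group.gtag


group = DispatchGroup(itags=[4, 5, 6])
print(can_checkpoint(5, group))  # False: instruction 6 is still outstanding
print(can_checkpoint(6, group))  # True
```

A real design would also have to handle tag wraparound, which this sketch ignores.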

Store data is written to a store queue when a store instruction finishes. This data is held until the instruction group is checkpointed, then released to the L2 cache. The Runit manages write queues for FPU/VMX results (FPWQ, stage queue) and FXU/LSU/BRU results (FXWQ). These queues store results until they are ready for checkpointing.
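The hold-then-release behavior of the store queue might look roughly like the sketch below. It assumes integer tags and uses a dictionary as a stand-in for the L2 cache; `StoreQueue`, `finish_store`, and `checkpoint` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoreEntry:
    gtag: int                  # Gtag of the group the store belongs to
    address: int
    data: int
    checkpointed: bool = False


@dataclass
class StoreQueue:
    """Hypothetical store queue; `l2` is a dict standing in for the L2 cache."""
    entries: List[StoreEntry] = field(default_factory=list)
    l2: Dict[int, int] = field(default_factory=dict)

    def finish_store(self, gtag: int, address: int, data: int) -> None:
        # Called when a store finishes executing; nothing reaches L2 yet.
        self.entries.append(StoreEntry(gtag, address, data))

    def checkpoint(self, ntc_itag: int) -> None:
        # Mark and release entries whose group satisfies NTC Itag >= Gtag,
        # in queue order; entries of younger groups stay held.
        remaining: List[StoreEntry] = []
        for entry in self.entries:
            if ntc_itag >= entry.gtag:
                entry.checkpointed = True
                self.l2[entry.address] = entry.data
            else:
                remaining.append(entry)
        self.entries = remaining


sq = StoreQueue()
sq.finish_store(gtag=6, address=0x100, data=42)
sq.finish_store(gtag=9, address=0x108, data=7)
sq.checkpoint(ntc_itag=6)
print(sq.l2)            # {256: 42}
print(len(sq.entries))  # 1
```

Holding the data in the queue keeps uncheckpointed stores invisible to the rest of the system, so they can be discarded on recovery.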

The system handles potential errors by maintaining checkpointed copies of registers. In case of an error, working copies are restored from the checkpointed copies, and processing resumes from the last coherent instruction boundary.
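As a rough illustration of the working-copy and checkpoint-copy relationship, the sketch below models a register file in Python. The class and method names are hypothetical, and the ECC protection of the checkpoint copy is not modeled.

```python
from typing import List


class RegisterFile:
    """Hypothetical register file with a working copy and a checkpoint copy."""

    def __init__(self, size: int) -> None:
        self.working: List[int] = [0] * size
        self.checkpoint: List[int] = [0] * size

    def write(self, reg: int, value: int) -> None:
        # Execution updates only the working copy.
        self.working[reg] = value

    def commit_checkpoint(self) -> None:
        # Called once a group has checkpointed: the checkpoint copy catches up.
        self.checkpoint = list(self.working)

    def recover(self) -> None:
        # On an error, discard uncheckpointed results and restore the
        # working copy from the last checkpoint.
        self.working = list(self.checkpoint)


regs = RegisterFile(size=4)
regs.write(0, 11)
regs.commit_checkpoint()   # register 0 = 11 is now checkpointed
regs.write(0, 99)          # result from a later group that hits an error
regs.recover()
print(regs.working[0])     # 11
```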

Handling Out-of-Order Execution and Errors

The Gtag mechanism helps manage checkpointing for out-of-order execution. When instructions are flushed (e.g., due to branch misprediction or exceptions), the Gtag is recalculated for the remaining instructions in the group. This allows for partial checkpointing.
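A possible sketch of the recalculation follows. It assumes that a flush discards every instruction at or younger than a given Itag and that the new Gtag is the Itag of the youngest survivor, consistent with the Gtag definition given earlier. The function also recounts the surviving FPU/VMX instructions, which the claims describe sending to the recovery unit along with the new Gtag. The names `Instr` and `recalc_after_flush` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    itag: int
    unit: str  # e.g. "FXU", "LSU", "FPU", "VMX", "BRU"


def recalc_after_flush(
    group: List[Instr], flush_itag: int
) -> Optional[Tuple[int, int]]:
    """Return (new_gtag, fp_count) for the surviving instructions.

    Assumption: instructions with Itag >= flush_itag are flushed.
    Returns None when the whole group is flushed.
    """
    survivors = [i for i in group if i.itag < flush_itag]
    if not survivors:
        return None
    new_gtag = max(i.itag for i in survivors)
    fp_count = sum(1 for i in survivors if i.unit in ("FPU", "VMX"))
    return new_gtag, fp_count


group = [Instr(4, "FXU"), Instr(5, "FPU"), Instr(6, "FPU"), Instr(7, "LSU")]
print(recalc_after_flush(group, flush_itag=6))  # (5, 1)
```

With the new Gtag of 5, the surviving part of the group can checkpoint as soon as the NTC Itag reaches 5, without waiting for the flushed instructions.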

Special handling is provided for instructions that modify system coherent state before completion, such as STCX (Store Conditional) and CI Load (Cache-Inhibited Load). These instructions cannot be retried if an error occurs during their execution, because their effects may already be visible outside the processor. The invention ensures that such an instruction is not grouped with other instructions and is handled so that checkpointing can proceed correctly even if an error occurs.
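One way to picture the grouping constraint is the sketch below, in which a non-retryable instruction always closes the current group and is placed in a group of its own. The mnemonic strings, the `max_group` limit of four, and the function name `form_groups` are assumptions made for illustration; the patent text here does not specify them.

```python
from typing import List

# Hypothetical mnemonics for instructions that modify coherent state early.
NON_RETRYABLE = {"stcx", "ci_load"}


def form_groups(ops: List[str], max_group: int = 4) -> List[List[str]]:
    """Split an instruction stream into dispatch groups.

    Assumption: ordinary groups hold up to `max_group` instructions, and a
    non-retryable instruction is always dispatched alone.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for op in ops:
        if op in NON_RETRYABLE:
            if current:               # close the group in progress
                groups.append(current)
                current = []
            groups.append([op])       # non-retryable op gets its own group
        else:
            current.append(op)
            if len(current) == max_group:
                groups.append(current)
                current = []
    if current:
        groups.append(current)
    return groups


print(form_groups(["add", "fmul", "stcx", "load", "add"]))
# [['add', 'fmul'], ['stcx'], ['load', 'add']]
```

Isolating the instruction means an error elsewhere never forces a retry of a group that contains it.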

The Runit acts as a single point of control for blocking checkpointing when an error is detected, ensuring consistency and recoverability. It coordinates the checkpointing of registers, the release of store data to the L2 cache, and the deallocation of completion table entries.
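The idea of a single point of control can be sketched as one gate through which all three checkpoint actions pass. The class below is a simplified, hypothetical model: the `blocked` flag, the method names, and the action log are illustrative assumptions rather than the patent's design.

```python
from typing import List, Tuple


class RecoveryUnit:
    """Hypothetical Runit model: one flag gates every checkpoint action."""

    def __init__(self) -> None:
        self.blocked = False
        self.log: List[Tuple[str, int]] = []

    def report_error(self) -> None:
        # Any detected error blocks all further checkpointing.
        self.blocked = True

    def try_checkpoint(self, gtag: int, ntc_itag: int) -> bool:
        # The three actions happen together or not at all.
        if self.blocked or ntc_itag < gtag:
            return False
        self.log.append(("checkpoint_registers", gtag))
        self.log.append(("release_stores_to_l2", gtag))
        self.log.append(("deallocate_completion_entries", gtag))
        return True


runit = RecoveryUnit()
print(runit.try_checkpoint(gtag=6, ntc_itag=6))  # True
runit.report_error()
print(runit.try_checkpoint(gtag=9, ntc_itag=9))  # False: checkpointing is blocked
```

Because every action is gated by the same check, register state, store data, and completion table entries cannot get out of step with one another.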

Data Processing System Components (FIG. 1)

The described data processing system (100) can be a symmetric multiprocessor (SMP) system with multiple SMT-capable processors (102a-102n) connected via a system bus (106). It includes a memory controller/cache (108) with local memory (109), an I/O bridge (110), and PCI bus bridges (114, 122, 124) connecting to PCI local buses (116, 126, 128). Peripherals like modems (118), network adapters (120), graphics adapters (130), and hard disks (132) are connected via these buses. A service processor (104) performs system diagnostics and error reporting.

Processor Core Details (FIGS. 2A-2B)

The processor core (200) is a superscalar, SMT-capable microprocessor. It features an instruction cache (201) feeding an instruction fetch unit (IFU) (203), which passes instructions to an instruction decode unit (IDU) (204). The IDU groups instructions for dispatch to various execution units: branch unit (206), fixed-point execution units (FXUA 208a, FXUB 208b), load/store units (LSUA 207a, LSUB 207b), floating-point execution units (FPUA 209a, FPUB 209b), and vector multimedia extension units (VMXA 227a, VMXB 227b). These units use separate register sets for each thread (GPRs 210a, 210b; FPRs 211a, 211b; SPRs 212a, 212b; VRs 228a, 228b) and a set of shared special purpose registers (SPRs 212c). A recovery unit (215) maintains backup register copies.

Flowchart Operations (FIGS. 3-5)

FIG. 3 illustrates the instruction dispatch and checkpointing flow: instructions are tagged (Itag), grouped (Gtag), executed, and results sent to write queues. A next-to-complete (NTC) counter tracks instruction completion. Checkpointing is initiated when NTC meets or exceeds Gtag.


Claims

  1. A method for dispatch group checkpointing in a data processing system, with capability to handle partially completed groups of instructions, in a microprocessor, the method comprising: selectively grouping, by an instruction dispatch unit, ones of a plurality of decoded instructions into a set of instructions according to which ones of a plurality of execution units are available to process the plurality of decoded instructions; assigning a group identifier to the set of instructions, wherein the set of instructions includes at least one fixed point instruction and a plurality of floating point instructions, and further wherein the plurality of floating point instructions execute out of order with respect to each other as well as out of order with respect to the at least one fixed point instruction, and wherein the group identifier is used to determine whether the set of instructions can be checkpointed; dispatching, by an instruction dispatch unit, the set of instructions; sending, upon the set of instructions being dispatched, the group identifier with the set of instructions; monitoring, by the microprocessor, the processing of the set of instructions; storing result data from the processing of the set of instructions; incrementing a counter responsive to an instruction from the set of instructions completing processing, wherein the counter is maintained by completion logic; responsive to determining that the set of instructions has completed processing, moving the result data to a store queue; determining if each one of the instructions in the set of instructions completed processing without error; in response to a first subset of the set of instructions being flushed and leaving only a second subset of the set of instructions that are still being processed, calculating a new group identifier, wherein the second subset includes some, but not all, of the set of instructions; assigning the new group identifier to the second subset of the set of instructions, wherein the new group identifier is used to determine whether the second subset can be checkpointed; determining if the second subset of the set of instructions includes at least one floating point unit (FPU) or a vector multimedia extension (VMX) instruction; in response to the second subset of the set of instructions including at least one floating point unit (FPU) or a vector multimedia extension (VMX) instruction, recalculating a number of floating point instructions that are included in the second subset of the set of instructions; and sending the new group identifier and the recalculated number of floating point instructions that are included in the second subset of the set of instructions to the recovery unit.
  2. The method of claim 1, further comprising: in response to each one of the instructions in the set of instructions completing processing without error, marking the result data that was moved to the store queue as checkpointed; and releasing the result data that was moved to the store queue to a cache.
  3. The method of claim 1, further comprising: in response to an error occurring during processing of at least one particular one of the instructions in the set of instructions, identifying instructions in the set of instructions that completed processing without error; marking the result data that was moved to the store queue as checkpointed; and releasing the result data that was moved to the store queue to a cache.
  4. The method of claim 3, further comprising: identifying the at least one particular one of the instructions in the set of instructions; restoring an instruction address of the at least one particular one of the instructions in the set of instructions; and retrying the at least one particular one of the instructions in the set of instructions.
  5. The method of claim 1, wherein the instructions in the set of instructions includes one of a load/store unit (LSU), a vector multimedia extension (VMX), a branch instruction, and a non-retryable instruction in addition to the at least one fixed point instruction and the plurality of floating point instructions.
  6. The method of claim 5, further comprising: sending the group identifier and number of the plurality of floating point instructions to a recovery unit, wherein the determining that the set of instructions has completed processing is performed by comparing the group identifier to the counter.
  7. The method of claim 1, wherein each instruction in the set of instructions is identified by an instruction identifier.
  8. The method of claim 1, wherein the storing of result data includes storing the result data in one of a write queue, a reorder buffer and a stage queue.
  9. The method of claim 1, further comprising: in response to an error occurring during processing of at least one particular one of the instructions in the set of instructions, waiting for one more of the instructions in the set of instructions to checkpoint, wherein the at least one particular one of the instructions in the set of instructions comprises at least one non-retryable instruction; and blocking a checkpoint.
