Function Units

Using and Porting GNU CC

15.15.8: Specifying Function Units

On most RISC machines, there are instructions whose results are not available for a specific number of cycles. Common cases are instructions that load data from memory. On many machines, a pipeline stall will result if the data is referenced too soon after the load instruction.

In addition, many newer microprocessors have multiple function units, usually one for integer and one for floating point, and often will incur pipeline stalls when a result that is needed is not yet ready.

The descriptions in this section allow the specification of how much time must elapse between the execution of an instruction and the time when its result is used. It also allows specification of when the execution of an instruction will delay execution of similar instructions due to function unit conflicts.

For the purposes of the specifications in this section, a machine is divided into function units, each of which execute a specific class of instructions in first-in-first-out order. Function units that accept one instruction each cycle and allow a result to be used in the succeeding instruction (usually via forwarding) need not be specified. Classic RISC microprocessors will normally have a single function unit, which we can call `memory'. The newer ``superscalar'' processors will often have function units for floating point operations, usually at least a floating point adder and multiplier.

Each usage of a function units by a class of insns is specified with a define_function_unit expression, which looks like this:

(define_function_unit name multiplicity simultaneity
                      test ready-delay issue-delay
                     [conflict-list])

name is a string giving the name of the function unit.

multiplicity is an integer specifying the number of identical units in the processor. If more than one unit is specified, they will be scheduled independently. Only truly independent units should be counted; a pipelined unit should be specified as a single unit. (The only common example of a machine that has multiple function units for a single instruction class that are truly independent and not pipelined are the two multiply and two increment units of the CDC 6600.)

simultaneity specifies the maximum number of insns that can be executing in each instance of the function unit simultaneously or zero if the unit is pipelined and has no limit.

All define_function_unit definitions referring to function unit name must have the same name and values for multiplicity and simultaneity.

test is an attribute test that selects the insns we are describing in this definition. Note that an insn may use more than one function unit and a function unit may be specified in more than one define_function_unit.

ready-delay is an integer that specifies the number of cycles after which the result of the instruction can be used without introducing any stalls.

issue-delay is an integer that specifies the number of cycles after the instruction matching the test expression begins using this unit until a subsequent instruction can begin. A cost of N indicates an N-1 cycle delay. A subsequent instruction may also be delayed if an earlier instruction has a longer ready-delay value. This blocking effect is computed using the simultaneity, ready-delay, issue-delay, and conflict-list terms. For a normal non-pipelined function unit, simultaneity is one, the unit is taken to block for the ready-delay cycles of the executing insn, and smaller values of issue-delay are ignored.

conflict-list is an optional list giving detailed conflict costs for this unit. If specified, it is a list of condition test expressions to be applied to insns chosen to execute in name following the particular insn matching test that is already executing in name. For each insn in the list, issue-delay specifies the conflict cost; for insns not in the list, the cost is zero. If not specified, conflict-list defaults to all instructions that use the function unit.

Typical uses of this vector are where a floating point function unit can pipeline either single- or double-precision operations, but not both, or where a memory unit can pipeline loads, but not stores, etc.

As an example, consider a classic RISC machine where the result of a load instruction is not available for two cycles (a single ``delay'' instruction is required) and where only one load instruction can be executed simultaneously. This would be specified as:

(define_function_unit "memory" 1 1 (eq_attr "type" "load") 2 0)

For the case of a floating point function unit that can pipeline either single or double precision, but not both, the following could be specified:

(define_function_unit
   "fp" 1 0 (eq_attr "type" "sp_fp") 4 4 [(eq_attr "type" "dp_fp")])
(define_function_unit
   "fp" 1 0 (eq_attr "type" "dp_fp") 4 4 [(eq_attr "type" "sp_fp")])

Note: The scheduler attempts to avoid function unit conflicts and uses all the specifications in the define_function_unit expression. It has recently come to our attention that these specifications may not allow modeling of some of the newer ``superscalar'' processors that have insns using multiple pipelined units. These insns will cause a potential conflict for the second unit used during their execution and there is no way of representing that conflict. We welcome any examples of how function unit conflicts work in such processors and suggestions for their representation.