Optimization of Far Calls

Optimizing far calls can produce smaller executable files and improve performance. The optimization is most useful when logical segments are automatically grouped into physical segments, which the Watcom Linker does by default.

The Watcom C, C++ and FORTRAN 77 compilers automatically enable the far call optimization. The Watcom Linker optimizes far calls to procedures that reside in the same physical segment as the caller.

For example, a large code model program is likely to contain many far calls to procedures in the same physical segment. Because the caller and the called procedure share the same segment address, a near call is sufficient. A near call does not require an entry in the relocation table of the executable file, whereas a far call does. The far call optimization therefore yields smaller executable files that load faster. A near call also generally executes faster than a far call, particularly on 286- and 386-based machines where, for applications running in protected mode, segment switching is fairly expensive.
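The eligibility rule above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the linker's implementation: the FarCall record, its fields, and both function names are hypothetical, and segments are represented by arbitrary integer identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FarCall:
    """Hypothetical record for one far call seen at link time."""
    caller_segment: int  # physical segment holding the call instruction
    target_segment: int  # physical segment holding the called procedure


def can_optimize(call: FarCall) -> bool:
    # A far call can become a near call only when the caller and the
    # called procedure end up in the same physical segment.
    return call.caller_segment == call.target_segment


def relocation_entries_needed(calls) -> int:
    # Only the far calls that remain far still need a relocation entry.
    return sum(1 for call in calls if not can_optimize(call))


# Two calls stay inside segment 1; one crosses into segment 2.
calls = [FarCall(1, 1), FarCall(1, 1), FarCall(1, 2)]
```

In this example only the call that crosses into segment 2 still needs a relocation entry; the other two are candidates for the optimization.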

The following describes the far call optimization. The call far label instruction is converted to one of the following two sequences of code, shown side by side:

    push   cs               seg     ss
    call   near label       push    cs
    nop                     call    near label

Note the following:

  1. The nop or seg ss instruction is padding. A call far label instruction is five bytes, whereas the push cs instruction is one byte and the call near label instruction is three bytes, so one more byte is needed to keep the code the same size. The seg ss instruction is used because it is faster than the nop instruction.
  2. The called procedure still uses a retf instruction. Because push cs places the code segment on the stack and the near call places the return offset above it, the stack holds the same values a far call would have pushed, and the far return executes correctly.
  3. The position of the padding instruction is chosen so that the return address is word-aligned. A word-aligned return address improves performance.
  4. When two consecutive call far label instructions are optimized, and the first call far label instruction is word-aligned, the following sequence replaces both call far label instructions:

        push    cs
        call    near label1
        seg     ss
        push    cs
        seg     cs
        call    near label2
        
    
  5. If your program contains only near calls, this optimization has no effect.
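The byte-level rewrite described in the notes above can be sketched in Python. This is a minimal illustration under stated assumptions, not the linker's actual code: the function name optimize_far_call is hypothetical, the buffer is assumed to start at offset 0 of a 16-bit physical segment (so a buffer index equals a segment offset), and the rule "pad after the call at an even address, pad before the push at an odd address" is inferred from the statement that the return address should be word-aligned. The opcode values are the standard 8086 encodings.

```python
CALL_FAR = 0x9A   # call far ptr16:16 (opcode, 2-byte offset, 2-byte segment)
PUSH_CS = 0x0E    # push cs
CALL_NEAR = 0xE8  # call rel16 (opcode, 2-byte displacement)
NOP = 0x90        # nop
SEG_SS = 0x36     # SS segment-override prefix ("seg ss")


def optimize_far_call(code: bytearray, pos: int) -> None:
    """Rewrite the 5-byte call far at code[pos] in place.

    Assumes the target lies in the same physical segment as the caller,
    so the segment part of the far pointer can be discarded.
    """
    if code[pos] != CALL_FAR:
        raise ValueError("no call far instruction at this offset")
    target = int.from_bytes(code[pos + 1:pos + 3], "little")

    # Even start: pad after the call, return address is pos + 4 (even).
    # Odd start: pad before the push, return address is pos + 5 (even).
    pad_first = pos % 2 == 1
    call_pos = pos + 2 if pad_first else pos + 1
    return_address = call_pos + 3

    # A near call encodes its target relative to the next instruction.
    rel = (target - return_address) & 0xFFFF
    near = bytes([PUSH_CS, CALL_NEAR]) + rel.to_bytes(2, "little")
    if pad_first:
        code[pos:pos + 5] = bytes([SEG_SS]) + near
    else:
        code[pos:pos + 5] = near + bytes([NOP])
```

Both replacement forms occupy exactly five bytes, so no other offsets in the segment move and no other fixups need to be recomputed.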

A far jump optimization is also performed by the Watcom Linker. This has the same benefits as the far call optimization. A jmp far label instruction to a location in the same segment is replaced by the following sequence of code:

    jmp    near label
    mov    ax,ax

Note that for 32-bit segments, the mov ax,ax padding instruction becomes mov eax,eax.
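The far jump rewrite can be sketched the same way. Again this is an illustration under assumptions rather than the linker's implementation: optimize_far_jump and its bits parameter are hypothetical names, and the buffer is assumed to start at offset 0 of the physical segment. The two padding bytes 89 C0 decode as mov ax,ax in a 16-bit segment and as mov eax,eax in a 32-bit segment, which is why one byte pattern covers both cases.

```python
JMP_FAR = 0xEA    # jmp far ptr16:16 or ptr16:32
JMP_NEAR = 0xE9   # jmp rel16 or rel32
MOV_AX_AX = bytes([0x89, 0xC0])  # mov ax,ax (mov eax,eax in 32-bit code)


def optimize_far_jump(code: bytearray, pos: int, bits: int = 16) -> None:
    """Rewrite the far jump at code[pos] in place.

    bits is the segment's default operand size: 16 or 32.
    Assumes the target lies in the same physical segment.
    """
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    if code[pos] != JMP_FAR:
        raise ValueError("no jmp far instruction at this offset")
    width = bits // 8  # size of the offset field: 2 or 4 bytes
    target = int.from_bytes(code[pos + 1:pos + 1 + width], "little")

    # The near jump is relative to the instruction that follows it.
    next_ip = pos + 1 + width
    rel = (target - next_ip) & ((1 << bits) - 1)

    # Far jump: opcode + offset + 2-byte segment.
    # Replacement: opcode + displacement + 2-byte mov, the same length.
    far_length = 1 + width + 2
    code[pos:pos + far_length] = (
        bytes([JMP_NEAR]) + rel.to_bytes(width, "little") + MOV_AX_AX
    )
```

Because the mov instruction sits after an unconditional jump, it is never executed; it only preserves the size of the original instruction.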