SDCC Compiler User Guide: Optimizations

6. Optimizations

SDCC performs a a host of standard optimizations in addition to some MCU specific optimizations.

6.1 Sub-expression elimination

The compiler does local and global common subexpression elimination.

eg.

i = x + y + 1; 
j = x + y;

will be translated to

iTemp = x + y 
i = iTemp + 1 
j = iTemp

Some subexpressions are not as obvious as the above example.

eg.

a->b[i].c = 10; 
a->b[i].d = 11;

In this case the address arithmetic a->b[i] will be computed only once; the equivalent code in C would be.

iTemp = a->b[i]; 
iTemp.c = 10; 
iTemp.d = 11;

The compiler will try to keep these temporary variables in registers.

6.2 Dead-Code elimination.

eg.

int global; 
void f () { 
  int i; 
  i = 1;    /* dead store */ 
  global
 = 1; /* dead store */ 
  global = 2; 
  return; 
  global = 3; /* unreachable
 */ 
}

will be changed to

int global; void f () 
{     
 global = 2;     
 return; 
}

6.3 Copy-Propagation:

eg.

int f() { 
   int i, j; 
   i = 10; 
   j = i; 
   return j; 
}

will be changed to

int f() { 
    int i,j; 
    i = 10; 
    j = 10; 
    return 10;
 
}

Note: the dead stores created by this copy propagation will be eliminated by dead-code elimination .

Two types of loop optimizations are done by SDCC loop invariant lifting and strength reduction of loop induction variables.In addition to the strength reduction the optimizer marks the induction variables and the register allocator tries to keep the induction variables in registers for the duration of the loop. Because of this preference of the register allocator , loop induction optimization causes an increase in register pressure, which may cause unwanted spilling of other temporary variables into the stack / data space . The compiler will generate a warning message when it is forced to allocate extra space either on the stack or data space. If this extra space allocation is undesirable then induction optimization can be eliminated either for the entire source file ( with --noinduction option) or for a given function only (#pragma NOINDUCTION).

Loop Invariant:

for (i = 0 ; i < 100 ; i ++) 
     f += k + l;

changed to

itemp = k + l; 
for ( i = 0; i < 100; i++ ) f += itemp;

As mentioned previously some loop invariants are not as apparent, all static address computations are also moved out of the loop.

Strength reduction :

This optimization substitutes an expression by a cheaper expression.

eg.

for (i=0;i < 100; i++) ar[i*5] = i*3;

changed to

itemp1 = 0; 
itemp2 = 0; 
for (i=0;i< 100;i++) { 
     ar[itemp1]
 = itemp2; 
     itemp1 += 5; 
     itemp2 += 3; 
}

The more expensive multiplication is changed to a less expensive addition.

Loop reversing:

This optimization is done to reduce the overhead of checking loop boundaries for every iteration. Some simple loops can be reversed and implemented using a "decrement and jump if not zero" instruction. SDCC checks for the following criterion to determine if a loop is reversible (note: more sophisticated compiers use data-dependency analysis to make this determination, SDCC uses a more simple minded analysis).

The 'for' loop is of the form "for ( <symbol> = <expression> ; <sym> [< | <=] <expression> ; [<sym>++ | <sym> += 1]) <for body>"
The <for body> does not contain "continue" or 'break".
All goto's are contained within the loop.
No function calls within the loop.
The loop control variable <sym> is not assigned any value within the loop
The loop control variable does NOT participate in any arithmetic operation within the loop.
There are NO switch statements in the loop.

Note djnz instruction can be used for 8-bit values ONLY, therefore it is advantageous to declare loop control symbols as either 'char' or 'short', ofcourse this may not be possible on all situations.

6.5 Algebraic simplifications:

SDCC does numerous algebraic simplifications, the following is a small sub-set of these optimizations.

eg 
i = j + 0 ; /* changed to */ i = j; 
i /= 2; /* changed to */ i >>=
 1; 
i = j - j ; /* changed to */ i = 0; 
i = j / 1 ; /* changed to */ i = j;

Note the subexpressions given above are generally introduced by macro expansions or as a result of copy/constant propagation.

6.6 'switch' statements.

SDCC changes switch statements to jump tables when the following conditions are true.

The case labels are in numerical sequence , the labels need not be in order, and the starting number need not be one or zero.

switch(i) {                         switch (i) { 
case 4:...
                          case 1: ... 
case 5:...                          case
 2: ... 
case 3:...                          case 3: ... 
case 6:...        
                  case 4: ... 
}                                   }

Both the above switch statements will be implemented using a jump-table.

The number of case labels is at least three, since it takes two conditional statements to handle the boundary conditions.
The number of case labels is less than 84, since each label takes 3 bytes and a jump-table can be utmost 256 bytes long.

Switch statements which have gaps in the numeric sequence or those that have more that 84 case labels can be split into more than one switch statement for efficient code generation.

switch (i) { 
case 1: ... 
case 2: ... 
case 3: ... 
case 4: ... 
case
 9: ... 
case 10: ... 
case 11: ... 
case 12: ... 
}

If the above switch statement is broken down into two switch statements

switch (i) { 
case 1: ... 
case 2: ... 
case 3: ... 
case 4: ... 
}switch (i) { 
case 9: ... 
case 10: ... 
case 11: ... 
case 12:...
 
}

then both the switch statements will be implemented using jump-tables whereas the unmodified switch statement will not be .

6.7 bit-shifting operations.

Bit shifting is one of the most frequently used operation in embedded programming . SDCC tries to implement bit-shift operations in the most efficient way possible.

eg.

unsigned short i;... 
i>>= 4; 
..

generates the following code.

mov a,_i 
swap a 
anl a,#0x0f 
mov _i,a

In general SDCC will never setup a loop if the shift count is known. Another example

unsigned int i; 
... 
i >>= 9; 
...

will generate

mov a,(_i + 1) 
mov (_i + 1),#0x00 
clr c 
rrc a 
mov _i,a

Note that SDCC stores numbers in little-endian format (i.e. lowest order first)

Bit-rotation:

A special case of the bit-shift operation is bit rotation, SDCC recognizes the following expression to be a left bit-rotation.

unsigned char i; 
... 
i = ( ( i << 1) | ( i >> 7)); 
...

will generate the following code.

mov a,_i 
rl a 
mov _i,a

SDCC uses pattern matching on the parse tree to determine this operation .Variations of this case will also be recognized as bit-rotation i.e i = ((i >> 7) | (i << 1)); /* left-bit rotation */

6.8 Highest Order Bit.

It is frequently required to obtain the highest order bit of an integral type (int,long,short or char types). SDCC recognizes the following expression to yield the highest order bit and generates optimized code for it.

eg 
unsigned int gint; 
foo () { 
unsigned char hob; 
   ... 
   hob
 = (gint >> 15) & 1; 
   .. 
}

Will generate the following code.

                             61 ;  hob.c 7 
   000A E5*01               
 62         mov  a,(_gint + 1) 
   000C 33                   63         rlc 
 a 
   000D E4                   64         clr  a 
   000E 13                  
 65         rrc  a 
   000F F5*02                66         mov  _foo_hob_1_1,a

Variations of this case however will NOT be recognized . It is a standard C expression , so I heartily recommend this be the only way to get the highest order bit, (it is portable). Of course it will be recognized even if it is embedded in other expressions.

eg.xyz = gint + ((gint >> 15) & 1);

will still be recognized.

6.9 Peep-hole optimizer.

The compiler uses a rule based , pattern matching and re-writing mechanism for peep-hole optimization . It is inspired by 'copt' a peep-hole optimizer by Christopher W. Fraser (cwfraser@microsoft.com). A default set of rules are compiled into the compiler, additional rules may be added with the --peep-file <filename> option. The rule language is best illustrated with examples.

replace { 
mov %1,a 
mov a,%1 } by { mov %1,a
 }

The above rule will the following assembly sequence

mov r1,a 
mov a,r1

mov r1,a

Note: All occurrences of a '%n' ( pattern variable ) must denote the same string. With the above rule, the assembly sequence

mov r1,a 
mov a,r2

will remain unmodified. Other special case optimizations may be added by the user (via --peep-file option), eg. some variants of the 8051 MCU allow only 'AJMP' and 'ACALL' , the following two rules will change all 'LJMP' & 'LCALL' to 'AJMP' & 'ACALL'.

replace { lcall %1 } by { acall %1 }
 
replace { ljmp %1 } by { ajmp %1 }

The inline-assembler' code is also passed through the peep hole optimizer, thus the peephole optimizer can also be used as an assembly level macro expander. The rules themselves are MCU dependent whereas the rule language infra-structure is MCU independent. Peephole optimization rules for other MCU can be easily programmed using the rule language.

The syntax for a rule is as follows ,

rule := replace [ restart ] '{' <assembly sequence>
 '\n' 
                            '}' by '{' '\n' 
   
                             <assembly sequence> '\n' 
         
                   '}' [if <functionName> ] '\n' 
<assembly
 sequence> := assembly instruction (each instruction including labels must
 be on a separate line).

The optimizer will apply to the rules one by one from the top in the sequence of their appearance, it will terminate when all rules are exhausted. If the 'restart' option is specified, then the optimizer will start matching the rules again from the top, this option for a rule is expensive (performance), it is intended to be used in situations where a transformation will trigger the same rule again. A good example of this the following rule.

replace restart { 
pop %1 
push %1 } by {
 
; nop 
}

Note that the replace pattern cannot be a blank, but can be a comment line. Without the 'restart' option only the inner most 'pop' 'push' pair would be eliminated. i.e.

pop ar1 
pop ar2 
push ar2 
push ar1

would result in

pop ar1 
; nop 
push ar1

with the 'restart' option the rule will be applied again to the resulting code and the all the 'pop' 'push' pairs will be eliminated to yield

; nop 
; nop

A conditional function can be attached to a rule. Attaching rules are somewhat more involved, let me illustrate this with an example.

replace { 
     ljmp %5 
%2:} by { 
     sjmp
 %5 
%2:} if labelInRange

The optimizer does a look-up of a function name table defined in function 'callFuncByName' in the source file SDCCpeeph.c , with the name 'labelInRange', if it finds a corresponding entry the function is called. Note there can be no parameters specified for these functions, in this case the use of '%5' is crucial, since the function labelInRange expects to find the label in that particular variable (the hash table containing the variable bindings is passed as a parameter). If you want to code more such functions , take a close look at the function labelInRange and the calling mechanism in source file SDCCpeeph.c. I know this whole thing is a little kludgey , may be some day we will have some better means. If you are looking at this file, you will also see the default rules that are compiled into the compiler, you can your own rules in the default set there if you get tired of specifying the --peep-file option.

Next Previous Contents