-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
-<HTML>
-<HEAD>
- <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.7">
- <TITLE>SDCC Compiler User Guide: Optimizations</TITLE>
- <LINK HREF="SDCCUdoc-7.html" REL=next>
- <LINK HREF="SDCCUdoc-5.html" REL=previous>
- <LINK HREF="SDCCUdoc.html#toc6" REL=contents>
-</HEAD>
-<BODY>
-<A HREF="SDCCUdoc-7.html">Next</A>
-<A HREF="SDCCUdoc-5.html">Previous</A>
-<A HREF="SDCCUdoc.html#toc6">Contents</A>
-<HR>
-<H2><A NAME="Optimizations"></A> <A NAME="s6">6. Optimizations</A> </H2>
-
-<P>SDCC performs a a host of standard optimizations in addition to some MCU
-specific optimizations.
-<H2><A NAME="Sub-expression Elimination"></A> <A NAME="ss6.1">6.1 Sub-expression elimination</A>
- </H2>
-
-<P>The compiler does local and global common subexpression elimination.
-<P><CODE>eg. </CODE>
-<P>
-<PRE>
-i = x + y + 1;
-j = x + y;
-
-</PRE>
-<P>will be translated to
-<P>
-<PRE>
-iTemp = x + y
-i = iTemp + 1
-j = iTemp
-
-</PRE>
-<P>Some subexpressions are not as obvious as the above example.
-<P>eg.
-<P>
-<PRE>
-a->b[i].c = 10;
-a->b[i].d = 11;
-
-</PRE>
-<P>In this case the address arithmetic a->b[i] will be computed
-only once; the equivalent code in C would be.
-<P>
-<PRE>
-iTemp = a->b[i];
-iTemp.c = 10;
-iTemp.d = 11;
-
-</PRE>
-<P>The compiler will try to keep these temporary variables in registers.
-<H2><A NAME="Dead-code elimination"></A> <A NAME="ss6.2">6.2 Dead-Code elimination.</A>
- </H2>
-
-<P>eg.
-<P>
-<PRE>
-int global;
-void f () {
- int i;
- i = 1; /* dead store */
- global
- = 1; /* dead store */
- global = 2;
- return;
- global = 3; /* unreachable
- */
-}
-
-</PRE>
-<P>will be changed to
-<P>
-<PRE>
-int global; void f ()
-{
- global = 2;
- return;
-}
-
-</PRE>
-<H2><A NAME="Copy-Propagation"></A> <A NAME="ss6.3">6.3 Copy-Propagation:</A>
- </H2>
-
-<P>eg.
-<P>
-<PRE>
-int f() {
- int i, j;
- i = 10;
- j = i;
- return j;
-}
-
-</PRE>
-<P>will be changed to
-<P>
-<PRE>
-int f() {
- int i,j;
- i = 10;
- j = 10;
- return 10;
-
-}
-
-</PRE>
-<P>Note: the dead stores created by this copy propagation will be eliminated
-by dead-code elimination .
-<H2><A NAME="Loop Optimizations"></A> <A NAME="ss6.4">6.4 Loop optimizations</A>
- </H2>
-
-<P>Two types of loop optimizations are done by SDCC loop invariant lifting
-and strength reduction of loop induction variables.In addition to the strength
-reduction the optimizer marks the induction variables and the register allocator
-tries to keep the induction variables in registers for the duration of the
-loop. Because of this preference of the register allocator , loop induction
-optimization causes an increase in register pressure, which may cause unwanted
-spilling of other temporary variables into the stack / data space . The compiler
-will generate a warning message when it is forced to allocate extra space either
-on the stack or data space. If this extra space allocation is undesirable then
-induction optimization can be eliminated either for the entire source file
-( with --noinduction option) or for a given function only (#pragma NOINDUCTION).
-<H3><A NAME="Loop Invariant"></A> Loop Invariant: </H3>
-
-<P>eg
-<P>
-<PRE>
-for (i = 0 ; i < 100 ; i ++)
- f += k + l;
-
-</PRE>
-<P>changed to
-<P>
-<PRE>
-itemp = k + l;
-for ( i = 0; i < 100; i++ ) f += itemp;
-
-</PRE>
-<P>As mentioned previously some loop invariants are not as apparent, all static
-address computations are also moved out of the loop.
-<H3><A NAME="Strength Reduction"></A> Strength reduction : </H3>
-
-<P>This optimization substitutes an expression by a cheaper expression.
-<P>eg.
-<P>
-<PRE>
-for (i=0;i < 100; i++) ar[i*5] = i*3;
-
-</PRE>
-<P>changed to
-<P>
-<PRE>
-itemp1 = 0;
-itemp2 = 0;
-for (i=0;i< 100;i++) {
- ar[itemp1]
- = itemp2;
- itemp1 += 5;
- itemp2 += 3;
-}
-
-</PRE>
-<P>The more expensive multiplication is changed to a less expensive addition.
-<H3><A NAME="Loop reversing"></A> Loop reversing: </H3>
-
-<P>This optimization is done to reduce the overhead of checking loop boundaries
-for every iteration. Some simple loops can be reversed and implemented using
-a "decrement and jump if not zero" instruction. SDCC checks for the following
-criterion to determine if a loop is reversible (note: more sophisticated compiers
-use data-dependency analysis to make this determination, SDCC uses a more simple
-minded analysis).
-<P>
-<UL>
-<LI>The 'for' loop is of the form
-"for ( <symbol> = <expression>
-; <sym> [< | <=] <expression> ; [<sym>++
-| <sym> += 1])
-<for body>"</LI>
-<LI>The <for body> does not contain "continue" or 'break".</LI>
-<LI>All goto's are contained within the loop.</LI>
-<LI>No function calls within the loop.</LI>
-<LI>The loop control variable <sym> is not assigned any value within
-the loop</LI>
-<LI>The loop control variable does NOT participate in any arithmetic operation
-within the loop.</LI>
-<LI>There are NO switch statements in the loop.</LI>
-</UL>
-<P>Note djnz instruction can be used for 8-bit values ONLY, therefore it is
-advantageous to declare loop control symbols as either 'char' or 'short', ofcourse
-this may not be possible on all situations.
-<H2><A NAME="Algebraic Simplifications"></A> <A NAME="ss6.5">6.5 Algebraic simplifications:</A>
- </H2>
-
-<P>SDCC does numerous algebraic simplifications, the following is a small
-sub-set of these optimizations.
-<P>
-<PRE>
-eg
-i = j + 0 ; /* changed to */ i = j;
-i /= 2; /* changed to */ i >>=
- 1;
-i = j - j ; /* changed to */ i = 0;
-i = j / 1 ; /* changed to */ i = j;
-
-</PRE>
-<P>Note the subexpressions given above are generally introduced by macro expansions
-or as a result of copy/constant propagation.
-<H2><A NAME="Switch Statement"></A> <A NAME="ss6.6">6.6 'switch' statements.</A>
- </H2>
-
-<P>SDCC changes switch statements to jump tables when the following conditions
-are true.
-<P>
-<UL>
-<LI>The case labels are in numerical sequence , the labels need not be in order,
-and the starting number need not be one or zero.</LI>
-</UL>
-<P>eg
-<P>
-<PRE>
-switch(i) { switch (i) {
-case 4:...
- case 1: ...
-case 5:... case
- 2: ...
-case 3:... case 3: ...
-case 6:...
- case 4: ...
-} }
-
-</PRE>
-<P>Both the above switch statements will be implemented using a jump-table.
-<P>
-<UL>
-<LI>The number of case labels is at least three, since it takes two conditional
-statements to handle the boundary conditions.</LI>
-<LI>The number of case labels is less than 84, since each label takes 3 bytes
-and a jump-table can be utmost 256 bytes long. </LI>
-</UL>
-<P>Switch statements which have gaps in the numeric sequence or those that
-have more that 84 case labels can be split into more than one switch statement
-for efficient code generation.
-<P>eg
-<P>
-<PRE>
-switch (i) {
-case 1: ...
-case 2: ...
-case 3: ...
-case 4: ...
-case
- 9: ...
-case 10: ...
-case 11: ...
-case 12: ...
-}
-
-</PRE>
-<P>If the above switch statement is broken down into two switch statements
-<P>
-<PRE>
-switch (i) {
-case 1: ...
-case 2: ...
-case 3: ...
-case 4: ...
-}switch (i) {
-case 9: ...
-case 10: ...
-case 11: ...
-case 12:...
-
-}
-
-</PRE>
-<P>then both the switch statements will be implemented using jump-tables whereas
-the unmodified switch statement will not be .
-<H2><A NAME="bit shifting"></A> <A NAME="ss6.7">6.7 bit-shifting operations.</A>
- </H2>
-
-<P>Bit shifting is one of the most frequently used operation in embedded programming
-. SDCC tries to implement bit-shift operations in the most efficient way possible.
-<P>eg.
-<P>
-<PRE>
-unsigned short i;...
-i>>= 4;
-..
-
-</PRE>
-<P>generates the following code.
-<P>
-<PRE>
-mov a,_i
-swap a
-anl a,#0x0f
-mov _i,a
-
-</PRE>
-<P>In general SDCC will never setup a loop if the shift count is known. Another
-example
-<P>
-<PRE>
-unsigned int i;
-...
-i >>= 9;
-...
-
-</PRE>
-<P>will generate
-<P>
-<PRE>
-mov a,(_i + 1)
-mov (_i + 1),#0x00
-clr c
-rrc a
-mov _i,a
-
-</PRE>
-<P>Note that SDCC stores numbers in little-endian format (i.e. lowest order
-first)
-<H3><A NAME="bit rotation"></A> Bit-rotation: </H3>
-
-<P>A special case of the bit-shift operation is bit rotation, SDCC recognizes
-the following expression to be a left bit-rotation.
-<P>
-<PRE>
-unsigned char i;
-...
-i = ( ( i << 1) | ( i >> 7));
-...
-
-</PRE>
-<P>will generate the following code.
-<P>
-<PRE>
-mov a,_i
-rl a
-mov _i,a
-
-</PRE>
-<P>SDCC uses pattern matching on the parse tree to determine this operation
-.Variations of this case will also be recognized as bit-rotation i.e i = ((i
->> 7) | (i << 1)); /* left-bit rotation */
-<H2><A NAME="Highest Order Bit"></A> <A NAME="ss6.8">6.8 Highest Order Bit.</A>
- </H2>
-
-<P>It is frequently required to obtain the highest order bit of an integral
-type (int,long,short or char types). SDCC recognizes the following expression
-to yield the highest order bit and generates optimized code for it.
-<P>
-<PRE>
-eg
-unsigned int gint;
-foo () {
-unsigned char hob;
- ...
- hob
- = (gint >> 15) & 1;
- ..
-}
-
-</PRE>
-<P>Will generate the following code.
-<P>
-<PRE>
- 61 ; hob.c 7
- 000A E5*01
- 62 mov a,(_gint + 1)
- 000C 33 63 rlc
- a
- 000D E4 64 clr a
- 000E 13
- 65 rrc a
- 000F F5*02 66 mov _foo_hob_1_1,a
-
-</PRE>
-<P>Variations of this case however will NOT be recognized . It is a standard
-C expression , so I heartily recommend this be the only way to get the highest
-order bit, (it is portable). Of course it will be recognized even if it is
-embedded in other expressions.
-<P>
-<PRE>
-eg.xyz = gint + ((gint >> 15) & 1);
-
-</PRE>
-<P>will still be recognized.
-<H2><A NAME="Peep-Hole"></A> <A NAME="ss6.9">6.9 Peep-hole optimizer.</A>
- </H2>
-
-<P>The compiler uses a rule based , pattern matching and re-writing mechanism
-for peep-hole optimization . It is inspired by 'copt' a peep-hole optimizer
-by Christopher W. Fraser (cwfraser@microsoft.com). A default set of rules are
-compiled into the compiler, additional rules may be added with the --peep-file
-<filename> option. The rule language is best illustrated with examples.
-<P>
-<PRE>
-replace {
-mov %1,a
-mov a,%1 } by { mov %1,a
- }
-
-</PRE>
-<P>The above rule will the following assembly sequence
-<P>
-<PRE>
-mov r1,a
-mov a,r1
-
-</PRE>
-<P>to
-<P>
-<PRE>
-mov r1,a
-
-</PRE>
-<P>Note: All occurrences of a '%n' ( pattern variable ) must denote
-the same string. With the above rule, the assembly sequence
-<P>
-<PRE>
-mov r1,a
-mov a,r2
-
-</PRE>
-<P>will remain unmodified. Other special case optimizations may be added by
-the user (via --peep-file option), eg. some variants of the 8051 MCU allow
-only 'AJMP' and 'ACALL' , the following two rules will change all 'LJMP' &
-'LCALL' to 'AJMP' & 'ACALL'.
-<P>
-<PRE>
-replace { lcall %1 } by { acall %1 }
-
-replace { ljmp %1 } by { ajmp %1 }
-
-</PRE>
-<P>The inline-assembler' code is also passed through the peep hole optimizer,
-thus the peephole optimizer can also be used as an assembly level macro expander.
-The rules themselves are MCU dependent whereas the rule language infra-structure
-is MCU independent. Peephole optimization rules for other MCU can be easily
-programmed using the rule language.
-<P>The syntax for a rule is as follows ,
-<P>
-<PRE>
-rule := replace [ restart ] '{' <assembly sequence>
- '\n'
- '}' by '{' '\n'
-
- <assembly sequence> '\n'
-
- '}' [if <functionName> ] '\n'
-<assembly
- sequence> := assembly instruction (each instruction including labels must
- be on a separate line).
-
-</PRE>
-<P>The optimizer will apply to the rules one by one from the top in the sequence
-of their appearance, it will terminate when all rules are exhausted. If the
-'restart' option is specified, then the optimizer will start matching the rules
-again from the top, this option for a rule is expensive (performance), it is
-intended to be used in situations where a transformation will trigger the same
-rule again. A good example of this the following rule.
-<P>
-<PRE>
-replace restart {
-pop %1
-push %1 } by {
-
-; nop
-}
-
-</PRE>
-<P>Note that the replace pattern cannot be a blank, but can be a comment line.
-Without the 'restart' option only the inner most 'pop' 'push' pair would be
-eliminated. i.e.
-<P>
-<PRE>
-pop ar1
-pop ar2
-push ar2
-push ar1
-
-</PRE>
-<P>would result in
-<P>
-<PRE>
-pop ar1
-; nop
-push ar1
-
-</PRE>
-<P>with the 'restart' option the rule will be applied again to the resulting
-code and the all the 'pop' 'push' pairs will be eliminated to yield
-<P>
-<PRE>
-; nop
-; nop
-
-</PRE>
-<P>A conditional function can be attached to a rule. Attaching rules are somewhat
-more involved, let me illustrate this with an example.
-<P>
-<PRE>
-replace {
- ljmp %5
-%2:} by {
- sjmp
- %5
-%2:} if labelInRange
-
-</PRE>
-<P>The optimizer does a look-up of a function name table defined in function
-'callFuncByName' in the source file SDCCpeeph.c , with the name 'labelInRange',
-if it finds a corresponding entry the function is called. Note there can be
-no parameters specified for these functions, in this case the use of '%5'
-is crucial, since the function labelInRange expects to find the label in that
-particular variable (the hash table containing the variable bindings is passed
-as a parameter). If you want to code more such functions , take a close look
-at the function labelInRange and the calling mechanism in source file SDCCpeeph.c.
-I know this whole thing is a little kludgey , may be some day we will have
-some better means. If you are looking at this file, you will also see the default
-rules that are compiled into the compiler, you can your own rules in the default
-set there if you get tired of specifying the --peep-file option.
-<HR>
-<A HREF="SDCCUdoc-7.html">Next</A>
-<A HREF="SDCCUdoc-5.html">Previous</A>
-<A HREF="SDCCUdoc.html#toc6">Contents</A>
-</BODY>
-</HTML>