1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
3 <!--Converted with LaTeX2HTML 99.1 release (March 30, 1999)
4 original version by: Nikos Drakos, CBLU, University of Leeds
5 * revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan
6 * with significant contributions from:
7 Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
10 <TITLE>4.1 Optimizations</TITLE>
11 <META NAME="description" CONTENT="4.1 Optimizations">
12 <META NAME="keywords" CONTENT="sdccman">
13 <META NAME="resource-type" CONTENT="document">
14 <META NAME="distribution" CONTENT="global">
16 <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
17 <META NAME="Generator" CONTENT="LaTeX2HTML v99.1 release">
18 <META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
20 <LINK REL="STYLESHEET" HREF="sdccman.css">
22 <LINK REL="next" HREF="node40.html">
23 <LINK REL="previous" HREF="node38.html">
24 <LINK REL="up" HREF="node38.html">
25 <LINK REL="next" HREF="node40.html">
29 <!--Navigation Panel-->
32 <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next" SRC="next_motif.gif"></A>
35 <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" SRC="up_motif.gif"></A>
38 <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" SRC="previous_motif.gif"></A>
41 <IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents" SRC="contents_motif.gif"></A>
44 <IMG WIDTH="43" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="index" SRC="index_motif.gif"></A>
46 <B> Next:</B> <A NAME="tex2html842"
47 HREF="node40.html">4.2 Pragmas</A>
48 <B> Up:</B> <A NAME="tex2html836"
49 HREF="node38.html">4. SDCC Technical Data</A>
50 <B> Previous:</B> <A NAME="tex2html830"
51 HREF="node38.html">4. SDCC Technical Data</A>
52   <B> <A NAME="tex2html838"
53 HREF="node1.html">Contents</A></B>
54   <B> <A NAME="tex2html840"
55 HREF="node61.html">Index</A></B>
58 <!--End of Navigation Panel-->
59 <!--Table of Child-Links-->
60 <A NAME="CHILD_LINKS"><STRONG>Subsections</STRONG></A>
63 <LI><A NAME="tex2html843"
64 HREF="node39.html#SECTION00051100000000000000">4.1.1 Sub-expression Elimination</A>
65 <LI><A NAME="tex2html844"
66 HREF="node39.html#SECTION00051200000000000000">4.1.2 Dead-Code Elimination</A>
67 <LI><A NAME="tex2html845"
68 HREF="node39.html#SECTION00051300000000000000">4.1.3 Copy-Propagation</A>
69 <LI><A NAME="tex2html846"
70 HREF="node39.html#SECTION00051400000000000000">4.1.4 Loop Optimizations</A>
71 <LI><A NAME="tex2html847"
72 HREF="node39.html#SECTION00051500000000000000">4.1.5 Loop Reversing</A>
73 <LI><A NAME="tex2html848"
74 HREF="node39.html#SECTION00051600000000000000">4.1.6 Algebraic Simplifications</A>
75 <LI><A NAME="tex2html849"
76 HREF="node39.html#SECTION00051700000000000000">4.1.7 'switch' Statements</A>
77 <LI><A NAME="tex2html850"
78 HREF="node39.html#SECTION00051800000000000000">4.1.8 Bit-shifting Operations.</A>
79 <LI><A NAME="tex2html851"
80 HREF="node39.html#SECTION00051900000000000000">4.1.9 Bit-rotation</A>
81 <LI><A NAME="tex2html852"
82 HREF="node39.html#SECTION000511000000000000000">4.1.10 Highest Order Bit</A>
83 <LI><A NAME="tex2html853"
84 HREF="node39.html#SECTION000511100000000000000">4.1.11 Peep-hole Optimizer</A>
86 <!--End of Table of Child-Links-->
89 <H2><A NAME="SECTION00051000000000000000">
94 SDCC performs a host of standard optimizations in addition to some
95 MCU specific optimizations.
99 <H3><A NAME="SECTION00051100000000000000">
100 4.1.1 Sub-expression Elimination</A>
104 The compiler does local and global common subexpression elimination,
109 <TT>i = x + y + 1; </TT>
115 will be translated to
119 <TT>iTemp = x + y </TT>
121 <TT>i = iTemp + 1 </TT>
123 <TT>j = iTemp</TT>
127 Some subexpressions are not as obvious as the above example, e.g.:
131 <TT>a->b[i].c = 10; </TT>
133 <TT>a->b[i].d = 11;</TT>
137 In this case the address arithmetic a->b[i] will be computed only
138 once; the equivalent code in C would be.
142 <TT>iTemp = a->b[i]; </TT>
144 <TT>iTemp.c = 10; </TT>
146 <TT>iTemp.d = 11;</TT>
150 The compiler will try to keep these temporary variables in registers.
154 <H3><A NAME="SECTION00051200000000000000">
155 4.1.2 Dead-Code Elimination</A>
160 <TT>int global; </TT>
162 <TT>void f () { </TT>
164 <TT> int i; </TT>
166 <TT> i = 1; /* dead store */ </TT>
168 <TT> global = 1; /* dead store */ </TT>
170 <TT> global = 2; </TT>
172 <TT> return; </TT>
174 <TT> global = 3; /* unreachable */ </TT>
184 <TT>int global; void f () </TT>
188 <TT> global = 2; </TT>
190 <TT> return; </TT>
196 <H3><A NAME="SECTION00051300000000000000">
197 4.1.3 Copy-Propagation</A>
202 <TT>int f() { </TT>
204 <TT> int i, j; </TT>
206 <TT> i = 10; </TT>
208 <TT> j = i; </TT>
210 <TT> return j; </TT>
220 <TT>int f() { </TT>
222 <TT> int i,j; </TT>
224 <TT> i = 10; </TT>
226 <TT> j = 10; </TT>
228 <TT> return 10; </TT>
233 Note: the dead stores created by this copy propagation will be eliminated
234 by dead-code elimination.
238 <H3><A NAME="SECTION00051400000000000000">
239 4.1.4 Loop Optimizations</A>
243 Two types of loop optimizations are done by SDCC loop invariant lifting
244 and strength reduction of loop induction variables. In addition to
245 the strength reduction the optimizer marks the induction variables
246 and the register allocator tries to keep the induction variables in
247 registers for the duration of the loop. Because of this preference
248 of the register allocator, loop induction optimization causes an increase
249 in register pressure, which may cause unwanted spilling of other temporary
250 variables into the stack / data space. The compiler will generate
251 a warning message when it is forced to allocate extra space either
252 on the stack or data space. If this extra space allocation is undesirable
253 then induction optimization can be eliminated either for the entire
254 source file (with -noinduction option) or for a given function only
255 using #pragma NOINDUCTION.
263 <TT>for (i = 0 ; i < 100 ; i ++) </TT>
265 <TT> f += k + l;</TT>
273 <TT>itemp = k + l; </TT>
275 <TT>for (i = 0; i < 100; i++) </TT>
277 <TT> f += itemp;</TT>
281 As mentioned previously some loop invariants are not as apparent,
282 all static address computations are also moved out of the loop.
286 Strength Reduction, this optimization substitutes an expression by
287 a cheaper expression:
291 <TT>for (i=0;i < 100; i++)</TT>
293 <TT> ar[i*5] = i*3;</TT>
301 <TT>itemp1 = 0; </TT>
303 <TT>itemp2 = 0; </TT>
305 <TT>for (i=0;i< 100;i++) { </TT>
307 <TT> ar[itemp1] = itemp2; </TT>
309 <TT> itemp1 += 5; </TT>
311 <TT> itemp2 += 3; </TT>
317 The more expensive multiplication is changed to a less expensive addition.
321 <H3><A NAME="SECTION00051500000000000000">
322 4.1.5 Loop Reversing</A>
326 This optimization is done to reduce the overhead of checking loop
327 boundaries for every iteration. Some simple loops can be reversed
328 and implemented using a ``decrement and jump if not zero'' instruction.
329 SDCC checks for the following criterion to determine if a loop is
330 reversible (note: more sophisticated compilers use data-dependency
331 analysis to make this determination, SDCC uses a more simple minded
337 <LI>The 'for' loop is of the form
341 <TT>for (<symbol> = <expression> ; <sym> [< | <=] <expression>
342 ; [<sym>++ | <sym> += 1])</TT>
344 <TT> <for body></TT></LI>
345 <LI>The <for body> does not contain ``continue'' or 'break''.</LI>
346 <LI>All goto's are contained within the loop.</LI>
347 <LI>No function calls within the loop.</LI>
348 <LI>The loop control variable <sym> is not assigned any value within the
350 <LI>The loop control variable does NOT participate in any arithmetic operation
351 within the loop.</LI>
352 <LI>There are NO switch statements in the loop.</LI>
354 Note djnz instruction can be used for 8-bit values <I>only</I>, therefore
355 it is advantageous to declare loop control symbols as <I>char</I>.
356 Ofcourse this may not be possible on all situations.
360 <H3><A NAME="SECTION00051600000000000000">
361 4.1.6 Algebraic Simplifications</A>
365 SDCC does numerous algebraic simplifications, the following is a small
366 sub-set of these optimizations.
370 <TT>i = j + 0 ; /* changed to */ i = j; </TT>
372 <TT>i /= 2; /* changed to */ i >>= 1; </TT>
374 <TT>i = j - j ; /* changed to */ i = 0; </TT>
376 <TT>i = j / 1 ; /* changed to */ i = j;</TT>
380 Note the subexpressions given above are generally introduced by macro
381 expansions or as a result of copy/constant propagation.
385 <H3><A NAME="SECTION00051700000000000000">
386 4.1.7 'switch' Statements</A>
390 SDCC changes switch statements to jump tables when the following conditions
396 <LI>The case labels are in numerical sequence, the labels need not be
397 in order, and the starting number need not be one or zero.
401 <TT>switch(i) { switch (i)
404 <TT>case 4:... case 1: ... </TT>
406 <TT>case 5:... case 2: ... </TT>
408 <TT>case 3:... case 3: ... </TT>
410 <TT>case 6:... case 4: ... </TT>
412 <TT>} }</TT>
415 Both the above switch statements will be implemented using a jump-table.</LI>
416 <LI>The number of case labels is at least three, since it takes two conditional
417 statements to handle the boundary conditions.</LI>
418 <LI>The number of case labels is less than 84, since each label takes
419 3 bytes and a jump-table can be utmost 256 bytes long. </LI>
421 Switch statements which have gaps in the numeric sequence or those
422 that have more that 84 case labels can be split into more than one
423 switch statement for efficient code generation, e.g.:
427 <TT>switch (i) { </TT>
429 <TT>case 1: ... </TT>
431 <TT>case 2: ... </TT>
433 <TT>case 3: ... </TT>
435 <TT>case 4: ... </TT>
437 <TT>case 9: ... </TT>
439 <TT>case 10: ... </TT>
441 <TT>case 11: ... </TT>
443 <TT>case 12: ... </TT>
449 If the above switch statement is broken down into two switch statements
453 <TT>switch (i) { </TT>
455 <TT>case 1: ... </TT>
457 <TT>case 2: ... </TT>
459 <TT>case 3: ... </TT>
461 <TT>case 4: ... </TT>
469 <TT>switch (i) { </TT>
471 <TT>case 9: ... </TT>
473 <TT>case 10: ... </TT>
475 <TT>case 11: ... </TT>
477 <TT>case 12: ... </TT>
482 then both the switch statements will be implemented using jump-tables
483 whereas the unmodified switch statement will not be.
487 <H3><A NAME="SECTION00051800000000000000">
488 4.1.8 Bit-shifting Operations.</A>
492 Bit shifting is one of the most frequently used operation in embedded
493 programming. SDCC tries to implement bit-shift operations in the most
494 efficient way possible, e.g.:
497 <TT>unsigned char i;</TT>
501 <TT>i>>= 4; </TT>
507 generates the following code:
510 <TT>mov a,_i </TT>
512 <TT>swap a </TT>
514 <TT>anl a,#0x0f </TT>
520 In general SDCC will never setup a loop if the shift count is known.
525 <TT>unsigned int i; </TT>
529 <TT>i >>= 9; </TT>
539 <TT>mov a,(_i + 1) </TT>
541 <TT>mov (_i + 1),#0x00 </TT>
543 <TT>clr c </TT>
545 <TT>rrc a </TT>
551 Note that SDCC stores numbers in little-endian format (i.e. lowest
556 <H3><A NAME="SECTION00051900000000000000">
557 4.1.9 Bit-rotation</A>
561 A special case of the bit-shift operation is bit rotation, SDCC recognizes
562 the following expression to be a left bit-rotation:
566 <TT>unsigned char i; </TT>
570 <TT>i = ((i << 1) | (i >>
577 will generate the following code:
581 <TT>mov a,_i </TT>
589 SDCC uses pattern matching on the parse tree to determine this operation.Variations
590 of this case will also be recognized as bit-rotation, i.e.:
594 <TT>i = ((i >> 7) | (i <<
595 1)); /* left-bit rotation */</TT>
599 <H3><A NAME="SECTION000511000000000000000">
600 4.1.10 Highest Order Bit</A>
604 It is frequently required to obtain the highest order bit of an integral
605 type (long, int, short or char types). SDCC recognizes the following
606 expression to yield the highest order bit and generates optimized
611 <TT>unsigned int gint; </TT>
614 <TT>foo () { </TT>
616 <TT>unsigned char hob; </TT>
618 <TT> ... </TT>
620 <TT> hob = (gint >> 15) & 1; </TT>
622 <TT> .. </TT>
628 will generate the following code:
631 <TT> 61
632 ; hob.c 7 </TT>
634 <TT> 000A E5*01 62
635 mov a,(_gint + 1) </TT>
637 <TT> 000C 33 63
638 rlc a </TT>
640 <TT> 000D E4 64
641 clr a </TT>
643 <TT> 000E 13 65
644 rrc a </TT>
646 <TT> 000F F5*02 66
647 mov _foo_hob_1_1,a</TT>
650 Variations of this case however will <I>not</I> be recognized. It
651 is a standard C expression, so I heartily recommend this be the only
652 way to get the highest order bit, (it is portable). Of course it will
653 be recognized even if it is embedded in other expressions, e.g.:
657 <TT>xyz = gint + ((gint >> 15) & 1);</TT>
661 will still be recognized.
665 <H3><A NAME="SECTION000511100000000000000">
666 4.1.11 Peep-hole Optimizer</A>
670 The compiler uses a rule based, pattern matching and re-writing mechanism
671 for peep-hole optimization. It is inspired by <I>copt</I> a peep-hole
672 optimizer by Christopher W. Fraser (cwfraser@microsoft.com). A default
673 set of rules are compiled into the compiler, additional rules may
674 be added with the <I>-peep-file <filename></I> option. The rule language
675 is best illustrated with examples.
679 <TT>replace { </TT>
681 <TT> mov %1,a </TT>
683 <TT> mov a,%1</TT>
685 <TT>} by {</TT>
687 <TT> mov %1,a</TT>
693 The above rule will change the following assembly sequence:
697 <TT> mov r1,a </TT>
699 <TT> mov a,r1</TT>
711 Note: All occurrences of a <I>%n</I> (pattern variable) must denote
712 the same string. With the above rule, the assembly sequence:
716 <TT> mov r1,a </TT>
718 <TT> mov a,r2</TT>
722 will remain unmodified.
726 Other special case optimizations may be added by the user (via <I>-peep-file
727 option</I>). E.g. some variants of the 8051 MCU allow only <TT>ajmp</TT>
728 and <TT>acall</TT>. The following two rules will change all <TT>ljmp</TT>
729 and <TT>lcall</TT> to <TT>ajmp</TT> and <TT>acall</TT>
733 <TT>replace { lcall %1 } by { acall %1 } </TT>
735 <TT>replace { ljmp %1 } by { ajmp %1 }</TT>
739 The <I>inline-assembler code</I> is also passed through the peep hole
740 optimizer, thus the peephole optimizer can also be used as an assembly
741 level macro expander. The rules themselves are MCU dependent whereas
742 the rule language infra-structure is MCU independent. Peephole optimization
743 rules for other MCU can be easily programmed using the rule language.
747 The syntax for a rule is as follows:
751 <TT>rule := replace [ restart ] '{' <assembly sequence> '\n'
754 <TT> '}' by '{' '\n'
757 <TT> <assembly
758 sequence> '\n' </TT>
760 <TT> '}' [if <functionName>
761 ] '\n' </TT>
765 <assembly sequence> := assembly instruction (each instruction including
766 labels must be on a separate line).
770 The optimizer will apply to the rules one by one from the top in the
771 sequence of their appearance, it will terminate when all rules are
772 exhausted. If the 'restart' option is specified, then the optimizer
773 will start matching the rules again from the top, this option for
774 a rule is expensive (performance), it is intended to be used in situations
775 where a transformation will trigger the same rule again. A good example
776 of this the following rule:
780 <TT>replace restart { </TT>
782 <TT> pop %1 </TT>
784 <TT> push %1 } by { </TT>
786 <TT> ; nop </TT>
792 Note that the replace pattern cannot be a blank, but can be a comment
793 line. Without the 'restart' option only the inner most 'pop' 'push'
794 pair would be eliminated, i.e.:
798 <TT> pop ar1 </TT>
800 <TT> pop ar2 </TT>
802 <TT> push ar2 </TT>
804 <TT> push ar1</TT>
812 <TT> pop ar1 </TT>
814 <TT> ; nop </TT>
816 <TT> push ar1</TT>
820 <I>with</I> the restart option the rule will be applied again to the
821 resulting code and then all the pop-push pairs will be eliminated
826 <TT> ; nop </TT>
828 <TT> ; nop</TT>
832 A conditional function can be attached to a rule. Attaching rules
833 are somewhat more involved, let me illustrate this with an example.
837 <TT>replace { </TT>
839 <TT> ljmp %5 </TT>
843 <TT>} by { </TT>
845 <TT> sjmp %5 </TT>
849 <TT>} if labelInRange</TT>
853 The optimizer does a look-up of a function name table defined in function
854 <I>callFuncByName</I> in the source file SDCCpeeph.c, with the name
855 <I>labelInRange</I>. If it finds a corresponding entry the function
856 is called. Note there can be no parameters specified for these functions,
857 in this case the use of <I>%5</I> is crucial, since the function
858 <I>labelInRange</I> expects to find the label in that particular variable
859 (the hash table containing the variable bindings is passed as a parameter).
860 If you want to code more such functions, take a close look at the
861 function labelInRange and the calling mechanism in source file SDCCpeeph.c.
862 I know this whole thing is a little kludgey, but maybe some day we
863 will have some better means. If you are looking at this file, you
864 will also see the default rules that are compiled into the compiler,
865 you can add your own rules in the default set there if you get tired
866 of specifying the -peep-file option.
870 <!--Navigation Panel-->
871 <A NAME="tex2html841"
873 <IMG WIDTH="37" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="next" SRC="next_motif.gif"></A>
874 <A NAME="tex2html835"
876 <IMG WIDTH="26" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="up" SRC="up_motif.gif"></A>
877 <A NAME="tex2html829"
879 <IMG WIDTH="63" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="previous" SRC="previous_motif.gif"></A>
880 <A NAME="tex2html837"
882 <IMG WIDTH="65" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="contents" SRC="contents_motif.gif"></A>
883 <A NAME="tex2html839"
885 <IMG WIDTH="43" HEIGHT="24" ALIGN="BOTTOM" BORDER="0" ALT="index" SRC="index_motif.gif"></A>
887 <B> Next:</B> <A NAME="tex2html842"
888 HREF="node40.html">4.2 Pragmas</A>
889 <B> Up:</B> <A NAME="tex2html836"
890 HREF="node38.html">4. SDCC Technical Data</A>
891 <B> Previous:</B> <A NAME="tex2html830"
892 HREF="node38.html">4. SDCC Technical Data</A>
893   <B> <A NAME="tex2html838"
894 HREF="node1.html">Contents</A></B>
895   <B> <A NAME="tex2html840"
896 HREF="node61.html">Index</A></B>
897 <!--End of Navigation Panel-->
900 <BR><I>2001-07-13</I>