::::: : the wood : davidrobins.net

CSEP501: the compiler is done

News, School ·Tuesday December 15, 2009 @ 22:17 EST (link)

I had my final report for my CSEP 501: Compiler Construction course today; it was a 20-minute scheduled block with the professor; just a fairly informal discussion about how the project went. The project was to build a "MiniJava" compiler in Java targeting Intel x86 (32-bit). Variations were permitted with permission of the instructor (e.g., we used C++; others used F#, Python, etc., or targeted MIPS or Intel x86-64). For the first two parts (scanner and parser) I was in a group of two; we parted amicably after my partner realized I wanted do to several language extensions which (although I said I'd do them myself) and didn't think he'd have time that quarter; so I worked alone on type-checking, semantics, and the big final code generation project. My final project archive is available.

Now that I'm done, it's strange that my (GNU) Screen window #2 is no longer "501" and window #4 no longer "test"; I've had those windows set that way for months now, flipping back and forth as I worked on the source and ran tests.

Speaking of tests, one of the things that helped the most was having a lot of test programs that allowed gauging the state of the compiler at any given time. Toward the end, when code was generated, I had my test framework assemble, link, and run the tests that were intended to pass (I had another class that was intended to ensure the compiler detected errors when appropriate). The framework also compared the test output with known good output.

Another great help was using source control (Subversion in this case). Looking over the logs, I can find a few interesting points where good testing caught some bad problems which were all eventually fixed:

A recursive AST loop in nested compound statements ({ {} }) persisted for quite a while; it was a result of an optimization to ensure that the production top_statement consistently returned a compound statement node (CNC) even if the program only had a single statement at that point.
I tried to make the use of ECX for "this" part of the general register assignment code, but ended up having to make it a special case when that started getting unmanageable, and I hit cases where "this" was used, but the code decided not to save ECX around calls. Sometimes special cases can simplify.
Various too-aggressive optimizations actually lost information; sometimes temporaries were eliminated. For example, an overly aggressive optimization would optimize out something of the form "a = b / b = a" even in the case that one was a store. Fortunately, tests caught this pretty well.
Another optimization was accidentally erasing the wrong instruction: it removed a jump instruction, instead of an intended unused label.
Assumed for a while that the "not" instruction was boolean negation, when in fact it's one's-complement. (I actually wanted to xor with 1.)

Much of these were found by tests as said, and by having about 10 different debug options to allow dumping intermediate compiler states (the AST, IR before and after optimization, register assignment, add IR statements to generated code as comments, etc.).

As the project went on I tended to mark areas that needed fixing (where it was "good enough" now, or I'd used a temporary hack) with either "TODO" or "XXX" comments, which worked well: it kept the work items obvious and easy to find, and local to where they needed to be done. But in larger projects, they can become overused and get ignored: Word has many TODOs, most of which just don't matter any more (fixed another way; current algorithm is sufficient, etc.).

This is the final report I submitted, detailing the stages of my compiler.

CSE P 501 course MiniJava compiler project, part V - Report.

Tools (versions used in parentheses, although other versions may also work): C++ (GCC 4.4.1), GNU Flex (2.5.35), GNU Bison (2.4.1), SCons (1.2.0), Perl (5.8.8), Boost (1.39).

FEATURES

All features are believed to work, plus certain extensions (see below, EXTRA FEATURES).

TEST PROGRAMS

The test programs available from http://www.cambridge.org/resources/052182060X/#programs were used, as well as a few of my own, both programs designed to pass (test/*.java) and programs designed for the compiler to catch errors (test/error/*.{java,txt}). All programs in the main test folder compile, assemble, link (with the included mj library), and run to completion without errors (not all return 0; some with void main functions return nonzero, but none return error codes indicating premature termination). For the programs from the above site, I examined the source and ensured that the right results were printed; they were very helpful in diagnosing problems. Each program also has a '.out' file which is the verified output; the test/runtests script does a 'diff' against it and the actual output, which provides another level of security against bad changes.

PrintArgs.java illustrates the passing of command-line arguments and the handling of strings (an array of strings, in this case). PrintArray.java shows more array handling, including an array of base pointers to various derived classes.

IfNot.java illustrates an optimization to remove a 'not' instruction and reverse the jcc/setcc condition.

LongAdd.java and DeepArray.java were unsuccessful attempts to cause register spills.

VoidDivide.java illustrates optimizing out (most of) a divide in void context (we don't care about the numberator, just if it divides by zero).

Or.java tests the boolean or (||) extension, using System.assert(), which would cause test/runtests, the test host, to fail if any asserts fail.

SpillTest.java has a deeply nested hierarchical comparison, which ensures we run out of registers, and exercises the spill code.

EXTRA FEATURES

Most extra features were completed for the parser assignment, thus they are described in README.2. Optimizations are described below in the appropriate stage under DESCRIPTION.

DESCRIPTION

The compiler runs several stages and passes and each does particular transformations on the input tree or list, or produces a new representation.

0. Driver: The process is driven via the argument parser in main.cc and the driver class (SPD, driver.h) which runs the pipeline (SPD::Parse) and provides context (e.g., command-line options) and global functionality (e.g., error reporting). To avoid unnecessary error checking, if one stage fails the pipeline is usually aborted (e.g., after type checking the code would bloat enormously if later stages had to account for undefined types).

1. Scanner: (see scanner.l and README.1; extensions include "void", other comparison operators besides "<", divide ("/"); also "this", "super", and "System.out.println" are not keywords; they're general identifiers handled later). Scanned tokens have location information embedded.

2. Parser: (see parser.y and README.2 for language extensions). The parser creates an AST of nodes inheriting from the base syntax node (SN, node.h), the main types being expressions (XN, exp.h), statements (CN, stmt.h), and declarations (DN, decl.h, which also includes classes, SNC/class.h, and methods, SNM/method.h, and the top-level program, SPD/driver.h). Nodes contain a generic list of children; most have specific accessors to improve readability (e.g., binary expressions, XN2, have XnLhs and XnRhs child accessors). If the number of children is indeterminate (e.g., parameters), a secondary container node is used.

3. System: SPD::SPD creates the static methods System.out.println, and a few handy extensions: System.out.print (takes a string parameter), System.abort (calls C library abort()), and System.assert (calls abort if passed boolean is false). The String built-in class is also created here. System classes may not be inherited from, and only String may be instantiated. The implementation is provided in the target mj library (see mjlib.cc).

4. Debug: along the way there are several possible debug print-outs, controlled by flags (see main.cc, or mj --help). --debug-scanner, --debug-parser, --debug-gen, --debug-reg print debug output during scanning, parsing, code generation, and register allocation respectively. --print-ast, --print-sym, --print-ir-pre, --print-ir-post, and --print-live output the AST, symbol tables, IR (before and after optimizations), and live ranges. --debug-code inserts some debug comments into the generated code.

5. Fixups: (fixup.h) Since classes can be used before declaration, this pass links type information to named classes (whether base classes, declarations, etc.). This is aided by a declaration reference type (DR, ref.h) which uses a boost::variant to store either a class name or a pointer to a declaration (DN).

6. Type-checking: (typecheck.h; see README.3) Each expression node (XN) is assigned a type, which it gets from its content (constants) or children (operators). The TI class (type.h) stores type information and has an FSubtype method to check if one type is a subtype of another. TCV::PtiTypeCheck ensures that argument types match when they need to, even when an operator can take different types (e.g., my ==/!= can also compare boolean values or class values). Method calls take on the type of their return value, and arguments are checked against declared parameters. The destination of an assignment must be a subtype of the source (e.g., Base b = new Derived() is allowed). If a method name matches a base, its parameters and return type must match too. And a non-void method must end with return (required check due to my extension making return a general statement).

7. Contexts: (context.h) Borrowing a page from perl, three contexts are defined: void, boolean, and value. Void context occurs when the result of an expression is not used; boolean context is an optimization to allow certain comparisons to pass their results in flags. Expressions in void context can be removed from the AST entirely except for method calls and division (since it could be a division by zero which has the side effect of terminating).

8. Measuring: (quant.h) Determines sizes for classes, vtable offets for non- static methods, and frame offsets for parameters and local variables. If local variables aren't used, they don't take up any space (very basic check).

9. IR: (ir.h, irv.h) The IR visitor builds a linear, 2-operand IR. (In retrospect, a 2-operand IR is simpler which is both good - it's a lower-level representation in which the universe of instructions is more restricted - and bad: some complex expressions require multiple IR instructions.) IR operations are basic (set, store, load, arithmetic, call, jump, label) and the allowed operands are constants, temporaries (future registers), labels, the frame or stack pointers, "this", and a conceptual return value. We also build a string table in this step; duplicate strings are coalesced.

10. Constants: (const.h) (This series of IR manipulators, inheriting from class IM in ir.h, is evoked from CGV::GenerateCode in code.h) This IR manipulator replaces temporaries set to constant values with the constants themselves, and also replaces series of constant expressions with the evaluated result. It also eliminates identity operations (add/subtract 0, multiply/divide by 1), and either removes or removes the condition from conditional jumps based on a constant.

11. Jumps: (jump.h) A few IR manipulators here snap jumps to jumps, remove jumps to immediately following code, and remove dead code (code between an unconditional jump and the next label, after removal of unused labels).

12. Frame: (frame.h) This IR visitor checks if the frame pointer is used. If it never is, we may be able to optimize it out later.

13. Instruction selection: (insn.h, low.h) This final IR visitor generates machine instructions (a list of LL objects). It is primarily a direct translation of the IR, with some optimizations (e.g., t0 <- FP, t0 += const, t0 <- (t0) can be matched to a single instruction). A few marker instructions are emitted to help with register allocation: start of call, end of call, epilog, prolog.

IR temporaries are still temporaries, but some register names are emtited (see e.g., LloFromOpr in insn.cc): "this" becomes ECX, return value is EAX, frame/stack EBP/ESP. Some registers are already marked as being required, defined, or used by certain instructions (see LL::DefUse in low.cc). Some register affinities are set: e.g., "idivl" uses EDX:EAX and outputs to EAX, "cdq" requires EAX and outputs EDX:EAX, and "set<cc>" requires (the low 8-bits of) EAX, EBX, ECX, or EDX on the x86 (restrictions loosen with x86_64).

14. Peephole optimizer: (peep.h) Doesn't do very much, although I had hoped to do more by storing symbolic values of the expressions in live registers and eliminating temporaries that reuse these values.

15. Register assignment: (reg.h) Manipulates the low-level instruction list by "painting" temporaries with register names. It first walks the instruction list and finds live ranges, by tracking when a temporary or register is first assigned to, when it's read, and when it's reassigned (it is an error to use an undefined temporary or register; we don't generate such code; this is not related to undefined variables, which are at a higher level, although similar code could be used to detect them earlier). ECX is artificially added as live for the whole method for non-statics.

Next, an interference graph is built, and registers are assigned in order (a cleverer algorithm could be used, but I never found a case where it was necessary, although this is not to say they don't exist; I posted to the discussion board to ask for input here). All live registers are saved/restored around function calls. This is not optimal, since the platform only requires EBX, ESI, and EDI to be saved, but I decided the internal conventions would be that only EBP need be saved, since at this point saving registers in the prolog would be difficult (it would either require modifying EBP offsets or later compound statements); this is another case where a 3-operand IR could have passed along more information and made it easier to adjust after the fact.

If no registers are available, a temporary's adjacent registers are examined; we pick the least-used candidate that doesn't collide with us, and save it around our definition/use ranges. (Collision means that the register's definition and use is interleaved with the temporary.)

E.g., in this hypothetical code segment, t3 denoting an unassigned temporary:

... (%ebx populated earlier)
    1 movl $5, %edx
    2 movl $10, t3
    3 addl %ebx, t3
    4 movl -4(%ebp), t3
... (%edx used later; %ebx and t3 not used again)

%edx is live in [1, >4], %ebx is live in [<1, 3], and t3 is live in [2, 4], so %edx is a good candidate for t3 (since its definition is before t3's and its use is after, i.e. it encloses t3 completely) but %ebx is not, since its use is interleaved with t3's.

16. Peephole optimizer: (peep.h) Run the optimizer again to see if it can do anything after register assignment or other modifications.

17. Clean: (CLN, low.h) remove any marker instructions (they're emitted as comments if they do get to the listing, so for tidiness only) unless --debug-code was specified.

18. Output: generated code is written either to stdout, or to a file specified by the --output (-o) parameter. We don't do any assembling or linking, although the Python program test/build will, as will test/runtests, assuming the GNU toolchain (as to assemble, and g++ to link, since libmj is built as C++).

There is a list of optimizations I considered (but lacked time for or the design didn't match well with them) in the file src/OPTIMIZE.

To build and run the test programs, cd to test and run ./runtests. It will just show success/failure, but it leaves the output so assembly/binaries can be examined/run afterward. I have included the assembly in the archive.