Re-writing the RapidQ compiler: June 2008

Wednesday, June 4, 2008

How to go about creating a new RapidQ compiler ?

Let us see the main parts that a compiler should be made of.

First, we need a lexical analyser to tokenise the source code. Next, we need a parser, which analyses the syntax of the source code with the help of the lexical analyser. The parser will also need a code generator to output the compiled program. Optionally, there could be a code optimizer to make the output code more efficient and faster.
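To make the first stage concrete, here is a minimal sketch of a hand-coded lexical analyser in Python. The token kinds and the keyword list are my own assumptions for a tiny BASIC-like language. They are not taken from RapidQ's real grammar.

```python
import re

# Hypothetical token set for a tiny BASIC-like language (NOT RapidQ's real grammar).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r'"[^"\n]*"'),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",     r"[-+*/=()<>,]"),
    ("SKIP",   r"[ \t]+"),
    ("ERROR",  r"."),          # anything else is an unexpected character
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

# Assumed keyword list, for illustration only.
KEYWORDS = {"PRINT", "DIM", "AS", "IF", "THEN", "END"}


def tokenize(line):
    """Return a list of (kind, text) pairs for one line of source code."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                      # whitespace carries no meaning here
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind = "KEYWORD"              # BASIC keywords are case-insensitive
        tokens.append((kind, text))
    return tokens


print(tokenize("PRINT total + 2 * 3"))
```

The parser would then ask for these tokens one at a time instead of looking at raw characters.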

You can build a lexical analyser by coding one manually or by generating one with tools like 'Lex' or 'Flex'. Similarly, you can code the parser manually or generate it with software like 'Yacc' or 'Bison'. There are also other such tools (sometimes called compiler-compilers), like 'Gentle' and 'Gold Parser'. 'Flex' and 'Bison' are open-source and free software. The others are freeware. Many resources such as tutorials are available on the Net for these tools.

OK. How similar should our new compiler be to RapidQ ? As far as syntax is concerned, our new compiler should be as compatible with RapidQ as possible, because that syntax is what made programming easy for novices and attracted them to RapidQ. But certain changes could be made to make it more modern while keeping it easy to program in.

The main area I would like to talk about is code generation and the runtime environment. As I said in my previous post, the RapidQ compiler compiles the source code into an intermediate language, binds it with an interpreter and creates an executable file containing both. When you double-click the executable file, the interpreter inside it loads the compiled intermediate code, which is embedded in the same file, and runs it.
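The idea of compiling to an intermediate code and then interpreting it can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the opcodes (PUSH, LOAD, ADD and so on) and the nested-tuple expression tree are invented, since RapidQ's real intermediate language is not public.

```python
import operator

# Hypothetical stack-machine opcodes; none of these come from RapidQ.
BINARY_OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def compile_expr(node, code=None):
    """'Compiler': flatten a nested-tuple expression tree into instructions."""
    if code is None:
        code = []
    if isinstance(node, (int, float)):
        code.append(("PUSH", node))       # literal number
    elif isinstance(node, str):
        code.append(("LOAD", node))       # variable name
    else:
        op, left, right = node            # e.g. ("ADD", left, right)
        compile_expr(left, code)
        compile_expr(right, code)
        code.append((op, None))
    return code


def run(code, variables):
    """'Interpreter': execute the instruction list on an operand stack."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(variables[arg])
        elif opcode in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_OPS[opcode](left, right))
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()


tree = ("ADD", "total", ("MUL", 2, 3))    # stands for: total + 2 * 3
code = compile_expr(tree)
print(code)
print(run(code, {"total": 10}))           # prints 16
```

Only `run` has to exist on the user's machine. The instruction list produced by `compile_expr` is plain data that could be stored inside the executable file.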

If we want, we could follow the same technique. The disadvantages are:

Since each executable file produced also contains the interpreter, the size of the executable file is increased by the interpreter's size.
Since the compiled program is in an intermediate language and is interpreted at run time, it is slower than an equivalent program compiled into machine language.

But the advantages are:

The intermediate code produced on compilation could be the same on all platforms (both operating system and microprocessor), and it can be interpreted by an appropriate interpreter designed for each particular platform.
The compiler writer need not know the assembly language or the machine language of the different platforms, since the interpreter can also be written in a high-level language. They need to know only the intermediate code.

Another path that we could take is to compile the source code directly into machine language. The advantage is that the compiled code will be very fast, and the size will be smaller than in the above-mentioned case since there is no need for an interpreter to be embedded in the executable file. But the disadvantage is that, for code generation, the compiler writer must know the assembly language and architecture of the target platform. If you are planning to write the compiler for different platforms, you will need to learn the assembly language/machine language and architecture of each of them, which will be a daunting task.

Whichever route you choose, writing a new compiler is bound to be an uphill task, but also an interesting and adventurous one, since it is fraught with risks. Risks in the sense that you may get stuck at some point because of the complexity involved or a lack of theoretical knowledge about building compilers.

But is there a better route to take which could be easier ? Perhaps. I have an idea. But you can argue that the result cannot be called a compiler in the real sense! I agree. But by taking this route, you may be able to create a quick and dirty compiler to start with and later improve certain parts of it so that finally it could be called a real compiler. And as a bonus, you can boast that it runs on DotNet!!

Wait till my next post!

See you!

Tuesday, June 3, 2008

More about RapidQ and compilers

Is RapidQ really a compiler ?

Well, the answer is 'Yes' and 'No'!! In fact you can call it a pseudo-compiler. The compiler first translates the source code into an intermediate language. This intermediate language is a proprietary language which only (perhaps) William Yu knows! Then the compiler creates an executable file (.exe) by binding an interpreter program with the translated intermediate code. So what you see when you compile a program with the RapidQ compiler is a file with the extension '.exe'. But it contains both the compiled intermediate code and the interpreter for that code. So double-clicking the executable file runs the interpreter, which in turn loads the intermediate code from within the executable file and runs it.
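How exactly RapidQ binds the two parts together is not documented, so the following Python sketch only shows one common way such a binding could be done: append the intermediate code to the interpreter, followed by a small trailer. The marker `QBUNDLE1`, the trailer layout and the function names are all hypothetical.

```python
import struct

MARKER = b"QBUNDLE1"   # hypothetical signature, NOT RapidQ's real format


def bind(interpreter_image: bytes, intermediate_code: bytes) -> bytes:
    """Build 'executable' = interpreter + code + trailer (marker + code length)."""
    trailer = MARKER + struct.pack("<I", len(intermediate_code))
    return interpreter_image + intermediate_code + trailer


def extract(executable: bytes) -> bytes:
    """What the interpreter could do at start-up: locate its embedded program."""
    trailer_size = len(MARKER) + 4
    trailer = executable[-trailer_size:]
    if trailer[:len(MARKER)] != MARKER:
        raise ValueError("no embedded program found")
    (length,) = struct.unpack("<I", trailer[len(MARKER):])
    start = len(executable) - trailer_size - length
    if start < 0:
        raise ValueError("corrupt trailer")
    return executable[start:start + length]


stub = b"MZ" + b"\x00" * 16               # stand-in bytes for the interpreter program
program = b"PUSH 2;PUSH 3;MUL"            # stand-in bytes for the intermediate code
exe = bind(stub, program)
print(extract(exe) == program)            # prints True
```

Because the trailer sits at the very end of the file, the interpreter can find the program without knowing its own size in advance.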

Thus, we can see that RapidQ compiles the source into an intermediate code and then interprets it when run. That is why I answered the question 'Yes' and 'No'.

Regular compilers are those that compile the source code directly into the machine language of a computer. But the compiled executable also depends on the operating system of the computer on which it runs. Thus, though you may compile your program using a compiler for Windows on an Intel Pentium machine, you will not be able to run the executable on an Intel Pentium machine with Linux as the operating system. To run your program on Linux, you should compile it using a compiler for Linux.

A compiler vendor may sell different versions of their compiler for Windows as well as Linux. In that case you may take your program's source code and use the compiler version that matches your operating system to create an executable file. For example, if you have written a program in the C language which uses only features that conform to the ANSI C standard, you may be lucky enough to compile the source code on Linux with an ANSI C compiler without any changes. So, to an extent, C is called a portable language. But not fully.

At present, perhaps the most portable language is Java. Using Java, you can compile your source code into a '.class' file. This file contains intermediate code for a virtual machine called the 'Java Virtual Machine', or JVM for short. Note that it is not an executable file as such. It does not have a '.exe' or '.com' extension but only a '.class' extension. Then how is this '.class' file executed ? Well, you need to install the Java Runtime software on your computer. Once you have installed it, you will be able to run the '.class' file by giving the command 'java yourprogram' (the class name, without the '.class' extension) on your computer. One advantage of Java is that you can copy the compiled '.class' file directly from Windows to Linux or vice versa and then run it without any modification or tweaking. How is this done ? Well, Sun Microsystems (the creators of Java) releases different Java Runtime software for different platforms (operating system + microprocessor). The Java Runtime interprets the compiled intermediate code for the Java Virtual Machine and runs it. You just have to download the appropriate Java Runtime for your combination of operating system and hardware.

Are there any similarities between RapidQ and Java ?

Well, both are available for Windows and Linux. While the compiled code is portable in the case of Java, in the case of RapidQ only the source code is portable. Both create intermediate code which is then interpreted at run time. In the case of Java, separate software called the Java Runtime interprets the compiled code, whereas in the case of RapidQ the interpreter that is bound to the intermediate code interprets it.

So much regarding a comparison of RapidQ with other compilers.

Bye!

Sunday, June 1, 2008

What is RapidQ ? Why re-write the compiler ?

The RapidQ compiler written by William Yu was popular because it was a Basic compiler that was easy to learn, made it easy to create both DOS and GUI programs and was freeware. It is a cross-platform compiler in the sense that versions are available for both Windows and Linux. The ease with which one could learn RapidQ attracted people who were new to programming. Its being freeware added further to its popularity. To see how popular the compiler is even today, search for 'RapidQ' using Google and look at the number of results!

But RapidQ was not complete. It was in the beta stage of development when it was abandoned by William Yu. I read somewhere that the company that makes the 'RealBasic' compiler bought the source code of RapidQ from William Yu. So further development was stopped. But the compiler in its beta stage was released as freeware on the Net. There are reports of a 'memory leak' in the compiler. Somewhere on the Net, I have found someone claiming that the memory leak was fixed! But how ?! I wonder (more about it later).

The software release came with an IDE which was buggy. It looked like a little brother of the Visual Basic editor. You could drag and drop controls onto the form in the 'Design view' and edit the code in the 'Code view'. There was a property window where you could edit the properties of the controls on the form you were designing. But that IDE too was not complete.

There were some attempts to re-create the RapidQ compiler. But the only attempt that I have found to be successful is the 'HotBasic' compiler. Though it is not claimed to be 100% compatible, it is claimed to be very similar to RapidQ.

There was no way to develop the RapidQ compiler further since its source code was not released to the public. So the only option remaining was to re-write it from scratch, which is indeed a daunting task. Writing compilers is one of the most difficult areas in computer science. It is highly technical, and one look at the contents of the textbooks will make a lay hobby programmer dizzy! Writing a compiler is highly complicated and time-consuming. That may be one reason for the failure of such attempts.

But is there a way to write a compiler without going into such highly technical details ? Maybe, to some extent. But in some cases we may have to rely on some of the compiler theory in computer science.

I will discuss more about this in my next post. See you then!
