The Challenge

Erythro was born of a small challenge between me and a friend - a programming competition had been brewing for several months, and with the 2020 virus lockdown we'd both been left with nothing else to do, so the challenge was formalised:

The first person to create a language specification and a working compiler for said specification, that achieves the following requirements:

- is x86_64 native

- is self-hosted (the compiler can compile itself)

- supports constant folding (any arithmetic in the program gets folded into a constant)

- produces a valid ELF64 executable by default, but optionally produces an executable of the format defined below.

- can cross-compile to AARCH64 (tested with a Raspberry Pi 3B)

- compiles a mini-kernel for the chosen architecture if the argument --compile-example-program is passed

- outputs GDB-compatible debugging symbols by default, but with the option to not include them

wins the competition.

There is no prize, only the utility of having these tools ready for when we need them, because we both wanted to do this for our own separate projects - I wanted this for Chroma, destoer wanted this for his Nintendo 64 emulator.

The Format

This is a draft specification, but any changes will be incremental, in raised boxes like this one, above the real text. This means that any major revisions to the spec will appear in place of the old one, but there will be a record going back in time as you scroll down.

The first idea for a custom executable format is to use a Matroska container (.mkv file). This means that, theoretically, the file could both run in our kernels and be played as a video in any program supporting MKV, including any video players that are created on top of our kernels.

A video player inside this custom Matroska format, which is capable of playing itself as a video, is an amazing concept.

The Plan

Look & Feel

There's just something about a well-written piece of code, in a language that manages to make it look nice. I'm of the humble opinion that for some things, C++ manages this really well. For other things, Java is the way to go. I'd even go as far as to say that Python can make some godawful code look nice.

When it comes to functionality, though, there is one thing that stands out, especially for things like this: C. It's quite possible that it'll make the code look terrible, with loops and ifs nested 150 layers deep, but nothing beats the solidity of that platform.

This language is designed to be used for my kernel/OS: Chroma. Thus, I want it to have a feature set that makes it specifically adept at that. I'll be taking inspiration from a great many things, and the combination, mix and recipe will be documented here.

Some people would say that a feature set like Rust's is ideal for this: complete memory safety, easy and quick to iterate. To that I say no. Just, no.

So, let's have a quick look at features that I'd like to have in this language. To do that, let's first have a look at some parts of osdev that I find, quite frankly, annoying.


This isn't going to be a whistlestop tour - I'm going to try to get as in-depth as I possibly can, because this is an extremely important step.

Inline ASM

The single most important thing for me, in this language, is easy access to inline asm. The GCC way of doing this is with this ghastly mess:

GCC Inline ASM Syntax
__asm__ __volatile__("command" : outputs from asm : inputs to asm : clobbered registers);

So, take this example pulled straight from the Chroma source:

Chroma ASM snippet
uint32_t ReadPort(uint16_t Port, int Length) {
  uint32_t Data;
  if (Length == 1) { // Read a byte
    __asm__ __volatile__("inb %[address], %[value]" : : [value] "a" ((uint8_t) Data), [address] "d" (Port) :);
  } else if (Length == 2) { // Read a word
    __asm__ __volatile__("inw %[address], %[value]" : : [value] "a" ((uint16_t) Data), [address] "d" (Port) :);
  } else if (Length == 4) { // Read a long (dword)
    __asm__ __volatile__("inl %[address], %[value]" : : [value] "a" (Data), [address] "d" (Port) :);
  } else {
    printf("ReadPort: Invalid Read Length.\r\n");
  }

  return Data;
}

Take a long, hard look at this code. Try to understand it from what I've given you; the comments should let you infer what the ASM does even if you can't read assembly. Then, find the error.

That's right. This function, which is meant to read a hardware port and return the result as an integer, actually hands Data to the asm as an input. The compiler is never told that the instruction writes to it, so the value read from the port cannot be relied on to reach the variable that gets returned. This syntax is terrible: hard to read, write and understand. For things like this, where there is absolutely no escaping ASM, an alternative is desperately needed.
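For comparison, here is a minimal sketch of the byte-sized read with the operands in the right lists. This is not the actual Chroma fix: ReadPortByte is a hypothetical name, and the "=a" / "Nd" constraints are an assumption based on the usual GCC idiom for port reads on x86.

```c
#include <stdint.h>

// Sketch only: x86/x86_64 with GCC-style inline asm.
// ReadPortByte is a hypothetical helper, not a function from the Chroma source.
// The value read from the port is declared as an OUTPUT ("=a": write-only, in al),
// while the port number stays an INPUT ("Nd": an 8-bit immediate or the dx register).
static inline uint8_t ReadPortByte(uint16_t Port) {
    uint8_t Data;
    __asm__ __volatile__("inb %[address], %[value]"
                         : [value] "=a" (Data)      // outputs from asm
                         : [address] "Nd" (Port));  // inputs to asm
    return Data;
}
```

The only change that matters is which side of the second colon Data sits on, which is exactly why the mistake is so easy to miss. Actually calling this needs ring 0 (or explicit I/O permission), so it is not something to run from a normal user program.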

For my solution, we have to escape the idea that this language will ever be general purpose. Having easy access to raw assembly language is powerful, and should not be used lightly. Nonetheless, let's also look at the section of code that switches the code segment into the newly formed GDT entry:

Chroma GDT Segment Switch
__asm__ __volatile__ ("mov $16, %ax \n\t" // 16 = 0x10 = index 2 of GDT
                    "mov %ax, %ds \n\t" // Next few lines prepare the processor for the sudden change into the data segment
                    "mov %ax, %es \n\t"
                    "mov %ax, %fs \n\t"
                    "mov %ax, %gs \n\t"
                    "mov %ax, %ss \n\t"
                    "movq $8, %rdx \n\t" // 8 = 0x8 = index 1 of GDT
                    "leaq 4(%rip), %rax \n\t" // Returns execution to immediately after the lretq, required for changing %cs while in long mode - this is currently literally the only way to do it.
                    "pushq %rdx \n\t"
                    "pushq %rax \n\t"
                    "lretq \n\t");

Sure, this is readable, and with the \ns you can easily tell what is supposed to be happening here. However, that's a lot of typing for something that's incredibly common in osdev.
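As an aside, the magic numbers 16 and 8 in that snippet are segment selectors, and the "index 2" / "index 1" comments follow from how a selector is laid out. A small sketch of that arithmetic, assuming the standard x86 selector layout; MakeSelector and CheckSelectors are hypothetical names, not part of Chroma:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not from the Chroma source. An x86 segment selector
// packs the descriptor index into bits 3-15, the table indicator
// (0 = GDT, 1 = LDT) into bit 2, and the requested privilege level
// into bits 0-1.
static inline uint16_t MakeSelector(uint16_t Index, uint16_t UseLdt, uint16_t Rpl) {
    return (uint16_t)((Index << 3) | ((UseLdt & 1u) << 2) | (Rpl & 3u));
}

// Usage: ring 0 selectors into the GDT match the constants in the snippet.
void CheckSelectors(void) {
    assert(MakeSelector(1, 0, 0) == 0x08); // code segment, pushed for the lretq
    assert(MakeSelector(2, 0, 0) == 0x10); // data segment, loaded into ds/es/fs/gs/ss
}
```

So for ring 0 GDT entries the selector is simply the index multiplied by 8.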


Sure, calling this a "solution" is a stretch, but here's how Erythro is going to tackle it:

Erythro ASM Syntax
asm {
}

There are more details to consider here - consider [value] "a" ((uint8_t) Data).

It casts the 32-bit Data variable to 8 bits, and passes that to the [value] variable in the ASM.

To do something like that would take a lot more effort than I have for this full challenge, so I'll have to rule out casting in ASM snippets. However, since this is all lexed in the compiler, I should be good to simply use the variable's name in asm:

Erythro ASM Variable
uint8_t readValue;

asm {
	inb readValue, $26
}

And with that small adjustment, it all becomes a lot easier. The ghastly multi-line GCC behemoth above becomes streamlined:

Erythro ASM Multiline
asm {
  mov $16, ax
  mov ax, ds
  mov ax, es
  mov ax, fs
  mov ax, gs
  mov ax, ss
  movq $8, rdx
  leaq 4(rip), rax
  pushq rdx
  pushq rax
  lretq
}

This also introduces another feature I want:

Register Access

One exceedingly annoying thing about C, despite it being "one of the lowest level languages you can use" (apart from, of course, assembly language and raw machine code itself (and butterflies!)), is that it completely cuts you off from accessing the CPU directly. This makes sense in the modern world - security, right?

WRONG. Well, not wrong, just annoying. Erythro is designed specifically for writing kernels, which by default are ring 0. Thus, I want registers. I want them now.

One notable side-effect of the mov ax, ds syntax in the asm sections is that ax and ds are no longer usable as variable names. My hunch is that this is why GCC requires them in %register form. However, I actually want these register names to be keywords, so that I can access them in code! Thus, this is perfect.
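To make the "register names as keywords" idea concrete, here is a rough sketch of the check a lexer could run on each identifier. None of this is from the Erythro compiler: the table is deliberately partial, and RegisterKeywords, IsRegisterKeyword and CheckRegisterKeywords are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical, partial table of register names reserved as keywords.
static const char *const RegisterKeywords[] = {
    "rax", "eax", "ax", "al",
    "rdx", "edx", "dx", "dl",
    "ds", "es", "fs", "gs", "ss",
};

// Returns 1 if Name is reserved as a register keyword, 0 otherwise.
int IsRegisterKeyword(const char *Name) {
    size_t Count = sizeof(RegisterKeywords) / sizeof(RegisterKeywords[0]);
    for (size_t i = 0; i < Count; i++) {
        if (strcmp(Name, RegisterKeywords[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

// Usage: registers are rejected as variable names, ordinary identifiers pass.
void CheckRegisterKeywords(void) {
    assert(IsRegisterKeyword("ax") == 1);
    assert(IsRegisterKeyword("readValue") == 0);
}
```

A linear scan is fine for a table this small; a real lexer would more likely fold this into whatever keyword lookup it already has.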

So, let's make a list of all the registers that are useful in programming an OS:

Well, all of them. But if we made a list of all possible MSRs (Model Specific Registers) of every x86_64/AARCH64 CPU to ever exist, it'd get exhausting quickly. So, it seems we need to think this through.


x86_64 uses 16 general purpose registers:
  1. rax
  2. rbx
  3. rcx
  4. rdx
  5. rsi
  6. rdi
  7. rsp
  8. rbp
  9. r8
  10. r9
  11. r10
  12. r11
  13. r12
  14. r13
  15. r14
  16. r15

These names refer to the full 64-bit (uint64_t) registers.

There are variants:

  1. eax, ebx, ecx, edx, r8d, r9d, r10d... etc

    represent the (uint32_t) casts of the 64 bit versions - that is, they are the lower 32 bits.

  2. ax, bx, cx, dx, r8w, r9w, r10w... etc

    represent the (uint16_t) casts of the 64 bit versions - that is, they are the lower 16 bits.

  3. al, bl, cl, dl, r8b, r9b, r10b... etc

    represent the (uint8_t) casts of the 64 bit versions - that is, they are the lower 8 bits.

  4. There may be a pattern here that you've noticed - the r8 through r15 versions use word-size suffixes rather than the x/e/r naming:

    r8 is the 64 bit (quad-word) version,

    r8d is the 32 bit (double-word) version,

    r8w is the 16 bit (word) version,

    r8b is the 8 bit (byte) version.

    This feels really intuitive to me, so I'll steal that too.
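The "cast" wording above can be taken quite literally. A small sketch in plain C, treating a 64-bit register value as a uint64_t and each narrower name as a truncating cast of it; Low32, Low16, Low8 and CheckSubRegisters are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers modelling the low sub-registers as truncating casts.
static inline uint32_t Low32(uint64_t Reg) { return (uint32_t)Reg; } // eax / r8d
static inline uint16_t Low16(uint64_t Reg) { return (uint16_t)Reg; } // ax  / r8w
static inline uint8_t  Low8(uint64_t Reg)  { return (uint8_t)Reg;  } // al  / r8b

// Usage: each narrower view keeps only the low bits of the 64-bit value.
void CheckSubRegisters(void) {
    uint64_t rax = 0x1122334455667788ULL;
    assert(Low32(rax) == 0x55667788u); // lower 32 bits
    assert(Low16(rax) == 0x7788u);     // lower 16 bits
    assert(Low8(rax)  == 0x88u);       // lower 8 bits
}
```

This only models reading a sub-register; what a write to one does to the rest of the 64-bit register is a separate question that the cast analogy doesn't cover.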


AARCH64 is just a fancy name for the 64-bit ARMv8 architecture.

AARCH64 uses 31 general purpose registers, handily all named Xn: X0 through X30.
Amazingly, this is the same amount of information as is described above for x86_64. Well, moving on.

Writing the compiler: The Syntax