GoLang Brainfuck Interpreter

December 23, 2022

Let me open by saying I have no idea what I am doing other than learning and experimenting.

With that disclaimer out of the way, let’s start with: what is Brainfuck? I’ll let this blurb I stole from the wiki do the work for me…

Brainfuck is an esoteric programming language created in 1993 by Urban Müller.

Notable for its extreme minimalism, the language consists of only eight simple commands, a data pointer, and an instruction pointer. While it is fully Turing complete, it is not intended for practical use, but to challenge and amuse programmers. Brainfuck requires one to break commands into microscopic steps.

A program in Brainfuck might look something like this: >>++++[>++++++++<-]. The sample shows some of the valid characters. Most of the characters are visually descriptive of their actions: “>” shifts the data pointer in the direction the arrow points, and “+”/“-” increment or decrement the current cell.
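
For reference, the eight commands are: “>” and “<” move the data pointer right and left, “+” and “-” increment and decrement the current cell, “.” and “,” output and input a byte, and “[” and “]” open and close a loop that repeats while the current cell is nonzero.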

I want to create a program in Go that can take a Brainfuck program as input and execute it.

From the description, we can pull out most of the requirements.

To create an interpreter I implemented some concepts I learned about in college. First, a lexer will validate the syntax and convert the characters into a tokenized list. Next, a parser will turn the tokens into an AST (abstract syntax tree). Honestly, I’m not sure AST is the correct term here; the “tree” is a linear set of instructions because the language does not have conditionals or assignments. Let’s just consider it a list of “enhanced” operations. Finally, the interpreter will use the “AST” to execute the instruction set.
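
End to end, the stages chain together roughly like this. This is a hedged sketch, not code from the project: Lex and Parse are my assumed names for the entry points, and error handling is abbreviated.

tokens, err := lexer.Lex(source) // raw string -> []lexer.Token
if err != nil {
    return err
}
ops, err := parser.Parse(tokens) // []lexer.Token -> []parser.Operation
if err != nil {
    return err
}
bf := interpreter.Brainfuck{Instructions: ops}
bf.Interpret() // execute the operations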

Lexer

The lexer was the easiest part to implement. A lexer’s job is to take the raw string input and turn it into a flat list of Token types. For Brainfuck, I implemented a TokenType based on the language specification. Each token has an enum type and a literal that holds the raw input.

package lexer

type TokenType int

const (
    INCREMENT TokenType = iota // +
    DECREMENT                  // -
    SHIFT_LEFT                 // <
    SHIFT_RIGHT                // >
    OUTPUT                     // .
    INPUT                      // ,
    OPEN_LOOP                  // [
    CLOSE_LOOP                 // ]
    NEW_LINE                   // \n
    EOF                        // end of input
)

type Token struct {
    TokenType TokenType
    Literal   string
}

NEW_LINE allows newlines to be included in the input, and EOF represents the end of the input.

The lexer then uses a large switch statement to map each input character to the correct type. Using the default case as the error catch surfaces unknown characters to the user.

    switch char {
    case '+':
        tokens[i] = Token{TokenType: INCREMENT, Literal: string(char)}
    case '-':
        tokens[i] = Token{TokenType: DECREMENT, Literal: string(char)}
    case '>':
        tokens[i] = Token{TokenType: SHIFT_RIGHT, Literal: string(char)}
    // ... cases for '<', '.', ',', '[', ']' and '\n' follow the same pattern ...
    default:
        err := fmt.Sprintf("Error at %d: Invalid character \"%s\" ", i, string(char))
        return nil, errors.New(err)
    }
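
As a quick usage sketch (Lex is my assumed name for the lexer’s entry point; the snippet above only shows that it returns a token slice or an error):

tokens, err := lexer.Lex("+>-")
if err != nil {
    fmt.Println(err)
    return
}
// tokens now holds INCREMENT, SHIFT_RIGHT and DECREMENT entries,
// each carrying its source character as the Literal.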

Parser

The parser creates a flat list of Operations that will be given to the interpreter to run. The parser looks similar to the Lexer but with some extra features to handle loops.

type Operation struct {
    Token lexer.Token
    Count int // used for instruction optimization
    Jump  int // set only if loop
}

Operations have a lexer.Token and a Jump attribute. The Jump attribute is only set when loop tokens are parsed; it tells the interpreter where to move the instruction pointer (IP). I left out the Count parameter because I have not implemented it, but it represents the number of identical operations that can be compressed by the parser. For example, “>++++” reads as “shift the data pointer right, then increment 4 times”. Rather than generating 4 individual increment operations, we could optimize this by setting the count to 4 on the first operation and discarding the duplicates.
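
Here is a hedged sketch of what that compression pass could look like (my own illustration, not code from the project). Note that merging operations shifts absolute positions, so the loop Jump values would need to be recomputed afterward.

func compress(ops []Operation) []Operation {
    out := make([]Operation, 0, len(ops))
    for _, op := range ops {
        // Only runs of +, -, < and > are safe to fold together.
        mergeable := false
        switch op.Token.TokenType {
        case lexer.INCREMENT, lexer.DECREMENT, lexer.SHIFT_LEFT, lexer.SHIFT_RIGHT:
            mergeable = true
        }
        if n := len(out); mergeable && n > 0 &&
            out[n-1].Token.TokenType == op.Token.TokenType {
            out[n-1].Count++ // e.g. "++++" becomes one INCREMENT with Count 4
            continue
        }
        op.Count = 1
        out = append(out, op)
    }
    return out
}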

In the code, notice that the only operations needing extra consideration are the loops and EOF. For the loops, I used a stack to make sure all the loop brackets match and to store their jump points in the AST nodes. When the parser reaches the EOF, the stack should be empty, indicating that every bracket has a matching pair. If you have done any LeetCode questions on stacks, this might be familiar lmao.

        switch token.TokenType {
        case lexer.DECREMENT,
            lexer.INCREMENT,
            lexer.SHIFT_LEFT,
            lexer.SHIFT_RIGHT,
            lexer.INPUT,
            lexer.OUTPUT:
            operations[i] = createOperation(i, tokens)

        case lexer.OPEN_LOOP:
            stack.Push(i)
            operations[i] = createLoop(-1, lexer.OPEN_LOOP) // jump position is unknown for now

        case lexer.CLOSE_LOOP:
            if stack.IsEmpty() {
                return nil, errors.New("closing bracket does not match any opening bracket")
            }
            open_loop, _ := stack.Pop()
            operations[i] = createLoop(open_loop, lexer.CLOSE_LOOP) // send end loop jump
            operations[open_loop].Jump = i                          // set the open loop jmp to end

        case lexer.EOF:
            if !stack.IsEmpty() {
                return nil, errors.New("unmatched opening bracket")
            }
            continue
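
The stack itself isn’t shown here; below is a minimal slice-backed sketch, assuming only the Push, Pop, and IsEmpty methods used above (plus the standard errors package).

type Stack struct {
    items []int
}

// Push adds an index to the top of the stack.
func (s *Stack) Push(v int) {
    s.items = append(s.items, v)
}

// Pop removes and returns the top index, erroring if the stack is empty.
func (s *Stack) Pop() (int, error) {
    if s.IsEmpty() {
        return 0, errors.New("pop from empty stack")
    }
    v := s.items[len(s.items)-1]
    s.items = s.items[:len(s.items)-1]
    return v, nil
}

// IsEmpty reports whether the stack holds no items.
func (s *Stack) IsEmpty() bool {
    return len(s.items) == 0
}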

Interpreter

LET’S GO, now I have all the instructions lexed and parsed, and I can implement the execution flow. From the description, I can simulate the environment by adding the pointers and memory. The memory is set to 30,000 cells as described in the wiki, although I used int cells instead of bytes.

type Brainfuck struct {
    Instructions []parser.Operation
    Memory       [30000]int
    dp           int    // Data Pointer
    ip           int    // Instruction Pointer
}

Similar to all the examples above, I used a switch statement to execute the behavior for each operation.

        switch op.Token.TokenType {

        case lexer.INCREMENT:
            bf.Memory[bf.dp]++

        case lexer.DECREMENT:
            bf.Memory[bf.dp]--

        case lexer.SHIFT_LEFT:
            bf.dp--

        case lexer.SHIFT_RIGHT:
            bf.dp++

        case lexer.OUTPUT:
            fmt.Print(bf.Memory[bf.dp]) // prints the cell's decimal value, not its ASCII character

        case lexer.INPUT:
            var input int
            _, _ = fmt.Scan(&input) // reads a decimal integer, not a raw byte
            bf.Memory[bf.dp] = input

        case lexer.OPEN_LOOP:
            if bf.Memory[bf.dp] == 0 { // skip the loop when the current cell is zero
                bf.ip = op.Jump + 1
            }

        case lexer.CLOSE_LOOP:
            bf.ip = op.Jump // jump back to the matching open loop
        }
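
The loop that drives this switch is not shown above, so here is a hedged sketch of one way it could look that stays consistent with those jump targets; the trick is letting the jump cases skip the shared instruction-pointer increment.

func (bf *Brainfuck) Interpret() {
    for bf.ip < len(bf.Instructions) {
        op := bf.Instructions[bf.ip]
        switch op.Token.TokenType {
        // ... the non-jump cases shown above ...
        case lexer.OPEN_LOOP:
            if bf.Memory[bf.dp] == 0 {
                bf.ip = op.Jump + 1 // skip past the matching close loop
                continue            // bypass the shared increment below
            }
        case lexer.CLOSE_LOOP:
            bf.ip = op.Jump // return to the open loop so the condition is re-checked
            continue
        }
        bf.ip++
    }
}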

Testing

Now I can test the interpreter by running the following Brainfuck program; guess what a first program might be.

func main() {
    program := "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."
    bf, err := interpreter.NewBrainfuck(program)
    if err != nil {
        fmt.Println(err)
        os.Exit(2)
    }
    bf.Interpret()
}

// Prints: 7210110810811144328711111410810033 (wtf)

Convert the output from decimal to ASCII and you get Hello, World!. The OUTPUT case prints each cell as a decimal number, so the character codes run together (72 = H, 101 = e, and so on).
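
If you want the interpreter to print readable text directly, a small tweak (my suggestion, not part of the original code) is to cast the cell value to a rune in the OUTPUT case:

        case lexer.OUTPUT:
            fmt.Print(string(rune(bf.Memory[bf.dp]))) // 72 prints "H", 101 prints "e", etc.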

There is more testing to be done, but I am pretty happy with the results!
