Let me open by saying I have no idea what I am doing other than learning and experimenting.
With that disclaimer out of the way, what is Brainfuck? Let this blurb I stole from the wiki do the work for me…
Brainfuck is an esoteric programming language created in 1993 by Urban Müller.
Notable for its extreme minimalism, the language consists of only eight simple commands, a data pointer, and an instruction pointer. While it is fully Turing complete, it is not intended for practical use, but to challenge and amuse programmers. Brainfuck requires one to break commands into microscopic steps.
A program in Brainfuck might look something like this: >>++++[>++++++++<-]. In the sample, we can see some of the valid characters. Most of them are visually descriptive of their actions: “>” represents a shift in that direction, like an arrow, and “+”/“-” represent increment and decrement operations.
I want to create a program in Go that can take a Brainfuck program as input and execute it.
From the description, we can pull out most of the requirements.
- 8 Instructions
- Data pointer
- Instruction Pointer
- Memory
To create an interpreter I implemented some concepts I learned about in college. First, a lexer validates the syntax and converts the characters into a tokenized list. Next, a parser turns the tokens into an AST (abstract syntax tree). Honestly, I am not sure AST is the correct term here; the “tree” is a linear set of instructions because the language has no conditionals or assignments. Let’s just consider it a list of “enhanced” operations. Finally, the interpreter executes that instruction set.
Lexer
The lexer was the easiest part to implement. A lexer’s job is to take the raw string input and turn it into a flat list of tokens. For Brainfuck I implemented a Token type based on the language specification. A token has a type (an enum) and a literal that holds the raw input.
package lexer

type TokenType int

const (
	INCREMENT TokenType = iota
	DECREMENT
	SHIFT_LEFT
	SHIFT_RIGHT
	OUTPUT
	INPUT
	OPEN_LOOP
	CLOSE_LOOP
	NEW_LINE
	EOF
)

type Token struct {
	TokenType TokenType
	Literal   string
}
NEW_LINE allows new lines to be included in the input, and EOF represents the end of the input.
The lexer then uses a large switch statement to map the input characters to the correct token types. Using the default case as the error catch surfaces unknown characters to the user.
switch char {
case '+':
	tokens[i] = Token{TokenType: INCREMENT, Literal: string(char)}
case '-':
	tokens[i] = Token{TokenType: DECREMENT, Literal: string(char)}
case '>':
	tokens[i] = Token{TokenType: SHIFT_RIGHT, Literal: string(char)}
// ...
default:
	err := fmt.Sprintf("Error at %d: Invalid character \"%s\"", i, string(char))
	return nil, errors.New(err)
}
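As an aside, the same character-to-type mapping can also be expressed as a lookup table, which keeps the lexing loop short. This is just an alternative sketch with my own names, not the repository's code:

```go
package main

import "fmt"

type TokenType int

const (
	INCREMENT TokenType = iota
	DECREMENT
	SHIFT_LEFT
	SHIFT_RIGHT
	OUTPUT
	INPUT
	OPEN_LOOP
	CLOSE_LOOP
	NEW_LINE
	EOF
)

type Token struct {
	TokenType TokenType
	Literal   string
}

// charToType maps each valid character to its token type,
// replacing the big switch with a lookup table.
var charToType = map[rune]TokenType{
	'+': INCREMENT, '-': DECREMENT,
	'<': SHIFT_LEFT, '>': SHIFT_RIGHT,
	'.': OUTPUT, ',': INPUT,
	'[': OPEN_LOOP, ']': CLOSE_LOOP,
	'\n': NEW_LINE,
}

// Lex tokenizes the input and appends an EOF token at the end.
func Lex(input string) ([]Token, error) {
	tokens := make([]Token, 0, len(input)+1)
	for i, char := range input {
		t, ok := charToType[char]
		if !ok {
			return nil, fmt.Errorf("Error at %d: Invalid character %q", i, char)
		}
		tokens = append(tokens, Token{TokenType: t, Literal: string(char)})
	}
	return append(tokens, Token{TokenType: EOF}), nil
}

func main() {
	tokens, err := Lex(">+[-]")
	fmt.Println(len(tokens), err) // 6 <nil> (5 characters plus EOF)
}
```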
Parser
The parser creates a flat list of Operations that will be given to the interpreter to run. The parser looks similar to the lexer but has some extra features to handle loops.
type Operation struct {
	Token lexer.Token
	Count int // used for instruction optimization
	Jump  int // set only if loop
}
An Operation has lexer.Token, Count, and Jump attributes. The Jump attribute is only set when loop tokens are parsed; it tells the interpreter where to move the instruction pointer (IP). I have not implemented the Count attribute yet, but it represents the number of identical operations that can be compressed by the parser. For example, “>++++” reads as “shift the data pointer right, then increment 4 times”. Rather than generating 4 individual increment operations, we could set the count to 4 on the first operation and discard the duplicates.
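Since Count is not implemented yet, here is a rough sketch of what that compression pass could look like. It operates on raw characters to stay self-contained; the names are mine, not the repository's:

```go
package main

import "fmt"

// Operation mirrors the parser's type; Kind holds the raw character
// here to keep the example self-contained.
type Operation struct {
	Kind  rune
	Count int
}

// compress collapses runs of identical +, -, <, > characters into a
// single Operation with Count set; brackets and I/O are never merged
// since their meaning depends on position.
func compress(input string) []Operation {
	var ops []Operation
	for _, ch := range input {
		repeatable := ch == '+' || ch == '-' || ch == '<' || ch == '>'
		if n := len(ops); repeatable && n > 0 && ops[n-1].Kind == ch {
			ops[n-1].Count++ // same instruction as the last one: fold it in
			continue
		}
		ops = append(ops, Operation{Kind: ch, Count: 1})
	}
	return ops
}

func main() {
	for _, op := range compress(">++++") {
		fmt.Printf("%c x%d\n", op.Kind, op.Count)
	}
	// > x1
	// + x4
}
```

The interpreter could then apply each operation Count times (or better, add Count directly to the cell) in a single step.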
Notice in the code that the only operations needing extra consideration are the loops and EOF. For the loops, I used a stack to make sure all the brackets match and to store their jump points in the AST nodes. When the parser reaches the EOF, the stack should be empty, indicating that every bracket has a matching pair. If you have done any LeetCode questions on stacks, this might look familiar lmao.
switch token.TokenType {
case lexer.DECREMENT,
	lexer.INCREMENT,
	lexer.SHIFT_LEFT,
	lexer.SHIFT_RIGHT,
	lexer.INPUT,
	lexer.OUTPUT:
	operations[i] = createOperation(i, tokens)
case lexer.OPEN_LOOP:
	stack.Push(i)
	operations[i] = createLoop(-1, lexer.OPEN_LOOP) // jump position is unknown for now
case lexer.CLOSE_LOOP:
	if stack.IsEmpty() {
		return nil, errors.New("closing bracket does not match any opening bracket")
	}
	open_loop, _ := stack.Pop()
	operations[i] = createLoop(open_loop, lexer.CLOSE_LOOP) // jump back to the opening bracket
	operations[open_loop].Jump = i                          // set the open loop's jump to the end
case lexer.EOF:
	if !stack.IsEmpty() {
		return nil, errors.New("unmatched opening bracket")
	}
	continue
}
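The stack itself is not shown above. A minimal version matching the Push/Pop/IsEmpty calls might look like this (a sketch; the repository's implementation may differ):

```go
package main

import "fmt"

// Stack is a minimal int stack for tracking the positions of
// opening brackets while parsing.
type Stack struct {
	items []int
}

// Push adds a value to the top of the stack.
func (s *Stack) Push(v int) {
	s.items = append(s.items, v)
}

// Pop removes and returns the top value; ok is false when empty.
func (s *Stack) Pop() (int, bool) {
	if s.IsEmpty() {
		return 0, false
	}
	v := s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v, true
}

// IsEmpty reports whether the stack has no elements.
func (s *Stack) IsEmpty() bool {
	return len(s.items) == 0
}

func main() {
	var s Stack
	s.Push(4) // position of an opening bracket
	v, ok := s.Pop()
	fmt.Println(v, ok, s.IsEmpty()) // 4 true true
}
```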
Interpreter
LET'S GO! Now I have all the instructions lexed and parsed, and I can implement the execution flow. From the description, I can simulate the environment by adding the pointers and the memory. The memory is set to 30,000 cells as described in the wiki.
type Brainfuck struct {
	Instructions []parser.Operation
	Memory       [30000]int
	dp           int // Data Pointer
	ip           int // Instruction Pointer
}
Similar to the examples above, I used a switch statement to execute the behavior for each operation.
switch op.Token.TokenType {
case lexer.INCREMENT:
	bf.Memory[bf.dp]++
case lexer.DECREMENT:
	bf.Memory[bf.dp]--
case lexer.SHIFT_LEFT:
	bf.dp--
case lexer.SHIFT_RIGHT:
	bf.dp++
case lexer.OUTPUT:
	fmt.Print(bf.Memory[bf.dp]) // prints the cell's decimal value, not its ASCII character
case lexer.INPUT:
	var input int
	_, _ = fmt.Scan(&input)
	bf.Memory[bf.dp] = input
case lexer.OPEN_LOOP:
	if bf.Memory[bf.dp] <= 0 {
		bf.ip = op.Jump + 1 // cell is zero: skip past the loop body
	}
case lexer.CLOSE_LOOP:
	bf.ip = op.Jump // jump back to the opening bracket to re-check the cell
}
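The loop that drives this switch is not shown above. As a self-contained sketch of the whole execution flow, here is a compact interpreter that precomputes the jump targets (the parser's job in my design) and then steps through the program the same way; the names and structure are mine, and unlike my version it collects output as characters instead of printing decimal values:

```go
package main

import "fmt"

// run executes a Brainfuck program: a first pass records the
// matching-bracket positions, then the main loop steps through the
// instructions, moving the data and instruction pointers.
func run(program string) string {
	// Precompute jump targets for every bracket pair.
	jump := make(map[int]int)
	var stack []int
	for i := 0; i < len(program); i++ {
		switch program[i] {
		case '[':
			stack = append(stack, i)
		case ']':
			open := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			jump[open], jump[i] = i, open
		}
	}

	memory := make([]int, 30000)
	dp := 0
	var out []rune
	for ip := 0; ip < len(program); ip++ {
		switch program[ip] {
		case '+':
			memory[dp]++
		case '-':
			memory[dp]--
		case '>':
			dp++
		case '<':
			dp--
		case '.':
			out = append(out, rune(memory[dp]))
		case '[':
			if memory[dp] == 0 {
				ip = jump[ip] // cell is zero: skip past the loop body
			}
		case ']':
			ip = jump[ip] - 1 // jump back so the '[' re-checks the cell
		}
	}
	return string(out)
}

func main() {
	fmt.Println(run("++++++++[>++++++++<-]>+.")) // prints "A" (8*8+1 = 65)
}
```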
Testing
Now I can test the interpreter by running the following Brainfuck program; guess what a first program might be.
func main() {
	program := "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++."
	bf, err := interpreter.NewBrainfuck(program)
	if err != nil {
		fmt.Println(err)
		os.Exit(2)
	}
	bf.Interpret()
}
// Returns: 7210110810811144328711111410810033 (wtf)
Convert the output from decimal to ASCII and you get Hello, World!
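Since the interpreter prints each cell as a decimal number, the conversion back to text is just a rune cast per value. For example, decoding the values behind that output:

```go
package main

import "fmt"

func main() {
	// The decimal values emitted by the interpreter, split back
	// into individual cell outputs.
	codes := []int{72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33}
	var out []rune
	for _, c := range codes {
		out = append(out, rune(c)) // 72 -> 'H', 101 -> 'e', ...
	}
	fmt.Println(string(out)) // prints "Hello, World!"
}
```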
There is more testing to be done, but I am pretty happy with the results!
References
- https://github.com/sebastian-mora/brainfuck-interpreter
- https://en.wikipedia.org/wiki/Brainfuck
- https://esolangs.org/wiki/Brainfuck
- https://thorstenball.com/blog/2017/01/04/a-virtual-brainfuck-machine-in-go/