Writing your own programming language is incredibly daunting. But if you start by writing a language that evaluates very simple functions it opens your mind to the possibility of writing an entire language and gives you a better understanding of what that would entail. So let’s start there. Let’s write a programming language that adds and multiplies numbers.
What’s in a Programming Language?
1. Parsers: Functions that read input and then tell the program what to do with it
2. Data Structure: The parser puts our input into an abstract syntax which our program can then evaluate
3. Nodes: Each node on the tree represents a value or expression which can then be evaluated to give us a value
Parser
In my opinion, the parser is where the magic happens. Writing a parser tells us how input will be interpreted. This is where you really determine the syntax of the programming language. For instance, in Ruby you can write 2 + 2 or 2+ 2 or (2+2) or ( 2 + 2 ) and they will all equal 4. That’s because there is a Ruby parser that doesn’t care about whitespace here and wants you to have a value on the left and on the right of the ‘+’ sign.
We are going to be very strict with our programming language in order to simplify our code. Before I share the operations that our language can perform, let’s call our code MarkLang (in honor of Mark who taught me how to write a language).
MarkLang Syntax
Addition: open paren + plus sign + whitespace + number + whitespace + number …. + close paren
Valid Examples: (+ 3 3) or (+ 6 2 10 56) or (+ 1 1 0 5 8 9)
Multiplication: open paren + asterisk + whitespace + number + whitespace + number …. + close paren
Valid Examples: (* 4 5) or (* 6 2) or (* 7 7 8 10 12)
Data Structure
The parser reads our input and turns it into structured data. The structure it forms is an abstract syntax tree, or a tree representation of the source code written in a programming language. It creates nodes with branches, like a tree.
Nodes
The parser turns input into nodes. For example, (+ 3 3) will be turned into three nodes. The Plus Node will have two branches off of it with a NumberNode (value = 3) on each end. For another example, let’s think about (+ 3 (+ 5 5)). This input will be turned in a data structure with five nodes. At the top there will be a PlusNode with two branches off it, a Number Node (value = 3) and another PlusNode. That second Plus Node will also have two brances with NumberNode (value = 5) at each end.
Every nodes needs to have an eval function so we can treat all nodes the same from the outside. Then each type of node’s eval method will do something different when we call it. This sounds like a case of polymorphism.
Okay, here are the codes: NumberNode (which just returns the number value), PlusNode(which adds all of the NumberNodes it’s given) and MultiplyNode (which multiplies all of the NumbderNodes it’s given).
class NumberNode def initialize(value) @value = value end def eval @value end end class PlusNode def initialize(nodes) @nodes = nodes end def eval result = 0 @nodes.each do |node| result += node.eval end result end end class MultiplyNode def initialize(nodes) @nodes = nodes end def eval result = 1 @nodes.each do |node| result *= node.eval end result end end
Next, let’s write the Parser. Our parser will take in a string code like “(+ 2 2)” and parse it. We then go through the parse_expression method recursively to parse the expressions in parentheses and the numbers. We use a @position instance variable to keep our place in the code while we parse different parts of it.
class Parser attr_accessor :position, :code def initialize(code) @position = 0 @code = code end def parse_expression if is_digit? code[@position] parse_number_node elsif code[@position] == '(' parse_parenthesized_expression end end def parse_number_node number = "" while is_digit?(code[@position]) number += code[@position] @position += 1 end NumberNode.new number.to_i end def parse_parenthesized_expression if code[position + 1] == "*" parse_multiply_node elsif code[position + 1] == "+" parse_plus_node elsif code[(position + 1)..(position + 2)] == "if" parse_if_node end end def skip_whitespace until code[@position] != ' ' @position += 1 end end def parse_arguments @position += 3 node_array = [] while code[position] != ")" skip_whitespace node_array << parse_expression end @position += 1 node_array end def parse_multiply_node MultiplyNode.new(parse_arguments) end def parse_plus_node PlusNode.new(parse_arguments) end def is_digit?(char) ('0'..'9').include? char end end
Now we have a simple parser with three types of nodes. We just wrote the MarkLang programming language that can add and multiple numbers. Let's try it.
code = "(* 2 2 2)" parser = Parser.new(code) result = parser.parse_expression puts result.eval #=> 8
Ta-da!