# Parsing mathematical expressions with Shunting Yard, F#, and C#

14 Aug 2015

Part 1: Lexing mathematical expressions with F#
Part 2: Parsing mathematical expressions with Shunting Yard, F#, and C#
Part 3: Recursive descent parsing of mathematical expressions with C#

(Code for this post is available on GitHub. There's both an F# and a C# version. The C# version is more advanced and constructs an Abstract Syntax Tree (AST) during parsing. It then implements a tree-walking interpreter to output the numeric result, the AST, and the prefix and postfix form of the expression.)

In the previous post, focus was on how to construct an expression parser, turning a string such as "-12+45*56" into a list of tokens: UnaryMinOp, Integer 12, BinPlusOp, Integer 34, BinMulOp, and Integer 56. Building on the lexer, this post employs Dijkstra's Shunting Yard algorithm (see original '61 paper, page 22) to reduce the list of tokens to a single Integer, evaluating the infix expression. What Shunting Yard provides is an elegant solution to the problem of non-trivial, non-left to right processing of tokens due to operator associativity and precedence.

## Associativity and precedence

Operator associativity and operator precedence exist to disambiguate the order of evaluation of sub-expressions in a expression with no parentheses. The rules of associativity come into play when an expression contains multiple operators at the same level of precedence. For instance, with the binary exponentiation operator being right associative, 2^3^4 and 2^(3^4) are equivalent. As for precedence, its rules are needed when an expression contains operators at different levels of precedence. Multiplication trumping addition is what makes 2+3*4 and 2+(3*4) equivalent.

In case the exact definition of which operators are left and right associative and their levels of precedence appear somewhat arbitrary, it's because it is. Luckily, we all adhere to the same arbitrary rules in order to reduce the number of parentheses required.

## The Shunting Yard algorithm

Central to Shunting Yard is the table of associativity and precedence. During execution, the algorithm consults this table to determine if an expression is to be immediately evaluated or its evaluation deferred. The algorithm is naively processing tokens left to right and thus doesn't have the foresight of a human knowing what comes next. This apparent limitation is probably due to its 1961 heritage working with severely limited resources.

Following mathematical conventions, here's the table listing associativity and precedence for the operators supported by our language:

type Associativity =
| Left
| Right

let configuration =
[(Token.BinExpOp, 4, Right)
(Token.UnaryMinOp, 3, Left)
(Token.BinMulOp, 2, Left)
(Token.BinDivOp, 2, Left)
(Token.BinPlusOp, 1, Left)
(Token.BinMinOp, 1, Left)]


In order to defer evaluation and keep track of intermediate results, Shunting Yard makes use of two stacks: one for operators and one for operands. Rather than defining new data types to store on the stack, mimicking what's already stored in the Token type, we use the existing Token type:

let operators = Stack<Token>()
let operands = Stack<Token>()


With these introductory definitions out of the way, let's turn our attention to Shunting Yard in pseudo-code form (a modified version of the one on Wikipedia). The later implementation contains an inlined version of the pseudo-code for traceability:

while tokens to be read
if token is operand then push onto operand stack
if token is unary prefix operator then push onto operator stack
if token is binary operator, o1, then
while operator token, o2, at top of operator stack, and
either o1 is left associative and its precedence is <= to that of o2
or o1 is right associative and its precedence < that of o2
reduce expression
push o1 onto the operator stack
if token is left paren then push it onto operator stack
if token is right paren
until token at top of operator stack is left paren
reduce expression
if operator stack runs out without finding left paren then mismatched parens
pop left paren from stack
when no more tokens to read and
while still tokens on operator stack
if operator token on top of stack is paren then mismatched parens
reduce expression
pop result of operand stack

reduce expression
pop operator off operator stack
pop operands off operand stack
process expression
push result onto operand stack


At first glance, Shunting Yard may appear daunting. It's probably best understood through initial pen and paper evaluation of gradually more complex expressions.

Implementing the algorithm bottom up, we require a few functions to answer questions such as whether a token is an operand or an operator and whether the operator is unary or binary. We could've encoded these binary decisions in the configuration table, but encoding the mapping through functions seemed more natural:

// val lookupOperator : token:Token -> int * Associativity
let lookupOperator token =
configuration
|> List.find (fun (t, _, _) -> token = t)
|> fun (_, p, a) -> p, a

// val isOperand : _arg1:Token -> bool
let isOperand = function
| Integer _ -> true
| _ -> false

// val isUnaryOperator : _arg1:Token -> bool
let isUnaryOperator = function
| UnaryMinOp | UnaryPlusOp -> true
| _ -> false

// val isBinaryOperator : _arg1:Token -> bool
let isBinaryOperator = function
// not (isUnaryOperator token) will not work since Integer is neither
| BinPlusOp | BinMinOp | BinMulOp | BinDivOp | BinExpOp -> true
| _ -> false


Next follows the reduce expression part of the algorithm. When the main part decides that it's time, reduce expression is where operator and operand tokens get reduced to a simpler Integer form. At this point Shunting Yard could equally well construct an abstract syntax tree from the tokens. The algorithm would remain the same, but reduceExpression would no longer push and pop Integers off the operand stack, but types of syntax tree nodes. The node which remains on the operand stack after all tokens have been processed would be the root node of the syntax tree as exemplified by this C# example.

// val reduceExpression : unit -> unit
let reduceExpression() =
let extractValue t =
match t with
| Integer i -> i
| _ -> failwithf "Unsupported %A" t

let operator = operators.Pop()
if operator = UnaryMinOp then
let operand = operands.Pop() |> extractValue
operands.Push(Integer -operand)
else if operator = UnaryPlusOp then operands.Pop() |> ignore
else
let left = operands.Pop() |> extractValue
let right = operands.Pop() |> extractValue
match operator with
| BinPlusOp -> operands.Push(Integer (left + right))
| BinMinOp -> operands.Push(Integer (left - right))
| BinMulOp -> operands.Push(Integer (left * right))
| BinDivOp -> operands.Push(Integer (left / right))
| BinExpOp ->
operands.Push(Integer (Math.Pow(float left, float right) |> int))
| _ -> failwithf "Unsupported operator %A" operator


And finally, the main part of Shunting Yard, tying together the pieces and including Unquote tests:

// val evaluateExpression : tokens:Token list -> Token
let evaluateExpression tokens =
// while tokens to be read
// val eval : _arg1:Token list -> Token
let rec eval = function
| hd :: tl ->
// if token is operand then push onto operand stack
if isOperand hd then operands.Push(hd)

// if token is unary prefix operator then push onto operator stack
else if isUnaryOperator hd then operators.Push(hd)

// if token is binary operator, o1, then
else if isBinaryOperator hd then
// while operator token, o2, at top of operator stack, and
//     either o1 is left associative and its precedence <= o2
//     or o1 is right associative and its precedence < o2
//   reduce expression
// push o1 onto the operator stack
let pO1, aO1 = lookupOperator hd
let isReduceRequired operator =
let o2 =
if isBinaryOperator(operator)
then Some (lookupOperator(operator))
else None
match o2 with
| Some(pO2, aO2) ->
(aO1 = Left && pO1 <= pO2) || (aO1 = Right && pO1 < pO2)
| None -> false

while operators.Count > 0 && isReduceRequired(operators.Peek()) do
reduceExpression()
operators.Push(hd)

// if token is left paren then push it onto operator stack
else if hd = LParen then operators.Push(hd)

// if token is right paren
else if hd = RParen then
// until token at top of operator stack is left paren
while operators.Count > 0 && operators.Peek() <> LParen do
reduceExpression()

// if operator stack runs out without finding left paren
// then mismatched parens
if operators.Count = 0 then failwith "Unmatched parens"

// pop left paren from stack
if operators.Peek() = LParen then operators.Pop() |> ignore

eval tl
| [] ->
// when no more tokens to read and
// while still tokens on operator stack
while operators.Count > 0 do
// if operator token on top of stack is paren
// then mismatched parens
if operators.Peek() = LParen || operators.Peek() = RParen then
failwith "Unmatched paren"

// reduce expression
reduceExpression()

// pop result of operand stack
operands.Pop()

eval tokens

// val eval : (string -> Token)
let eval = parseExpression >> evaluateExpression

test <@ "1" |> eval = Integer 1 @>
test <@ "-1" |> eval = Integer -1 @>
test <@ "1+2" |> eval = Integer 3 @>
test <@ "1+-2" |> eval = Integer -1 @>
test <@ "-(1+2)" |> eval = Integer -3 @>
test <@ "2^3^2" |> eval = Integer 512 @>
test <@ "1+2*3" |> eval = Integer 7 @>
test <@ "4^5/1+2*3" |> eval = Integer 1030 @>


## Conclusion

Writing a lexer, parser, and evaluator by hand, even for this simple language of mathematical expressions, was no trivial task. Yet it proved a great learning experience and I've since taken the ideas to writing a parser and evaluator for a domain specific language. For this SPCalendarRecurrenceExpander, which turns each SharePoint calendar recurrence event into a series of individual events, taking into account recurrence exceptions, parsing entails generating an abstract syntax tree from the appointment. Evaluation is then consuming the abstract syntax tree, turning the recurrence into individual appointments.