14 Aug 2015

Part 1: Lexing mathematical expressions with F#

Part 2: Parsing mathematical expressions with Shunting Yard, F#, and C#

Part 3: Recursive descent parsing of mathematical expressions with C#

(Code for this post is available on GitHub. There's both an F# and a C# version. The C# version is more advanced and constructs an Abstract Syntax Tree (AST) during parsing. It then implements a tree-walking interpreter to output the numeric result, the AST, and the prefix and postfix form of the expression.)

In the previous post, focus was on how to construct an expression parser, turning a string such as "-12+45*56" into a list of tokens: UnaryMinOp, Integer 12, BinPlusOp, Integer 34, BinMulOp, and Integer 56. Building on the lexer, this post employs Dijkstra's Shunting Yard algorithm (see original '61 paper, page 22) to reduce the list of tokens to a single Integer, evaluating the infix expression. What Shunting Yard provides is an elegant solution to the problem of non-trivial, non-left to right processing of tokens due to operator associativity and precedence.

Operator
associativity
and operator
precedence exist to disambiguate the order of evaluation of
sub-expressions in a expression with no parentheses. The rules of
associativity come into play when an expression contains multiple
operators at the *same* level of precedence. For instance, with
the binary exponentiation operator being right associative, 2^3^4 and
2^(3^4) are equivalent. As for precedence, its rules are needed when
an expression contains operators at *different* levels of
precedence. Multiplication trumping addition is what makes 2+3*4 and
2+(3*4) equivalent.

In case the exact definition of which operators are left and right associative and their levels of precedence appear somewhat arbitrary, it's because it is. Luckily, we all adhere to the same arbitrary rules in order to reduce the number of parentheses required.

Central to Shunting Yard is the table of associativity and precedence. During execution, the algorithm consults this table to determine if an expression is to be immediately evaluated or its evaluation deferred. The algorithm is naively processing tokens left to right and thus doesn't have the foresight of a human knowing what comes next. This apparent limitation is probably due to its 1961 heritage working with severely limited resources.

Following mathematical conventions, here's the table listing associativity and precedence for the operators supported by our language:

type Associativity = | Left | Right let configuration = [(Token.BinExpOp, 4, Right) (Token.UnaryMinOp, 3, Left) (Token.BinMulOp, 2, Left) (Token.BinDivOp, 2, Left) (Token.BinPlusOp, 1, Left) (Token.BinMinOp, 1, Left)]

In order to defer evaluation and keep track of intermediate results, Shunting Yard makes use of two stacks: one for operators and one for operands. Rather than defining new data types to store on the stack, mimicking what's already stored in the Token type, we use the existing Token type:

let operators = Stack<Token>() let operands = Stack<Token>()

With these introductory definitions out of the way, let's turn our attention to Shunting Yard in pseudo-code form (a modified version of the one on Wikipedia). The later implementation contains an inlined version of the pseudo-code for traceability:

while tokens to be read read token if token is operand then push onto operand stack if token is unary prefix operator then push onto operator stack if token is binary operator, o1, then while operator token, o2, at top of operator stack, and either o1 is left associative and its precedence is <= to that of o2 or o1 is right associative and its precedence < that of o2 reduce expression push o1 onto the operator stack if token is left paren then push it onto operator stack if token is right paren until token at top of operator stack is left paren reduce expression if operator stack runs out without finding left paren then mismatched parens pop left paren from stack when no more tokens to read and while still tokens on operator stack if operator token on top of stack is paren then mismatched parens reduce expression pop result of operand stack reduce expression pop operator off operator stack pop operands off operand stack process expression push result onto operand stack

At first glance, Shunting Yard may appear daunting. It's probably best understood through initial pen and paper evaluation of gradually more complex expressions.

Implementing the algorithm bottom up, we require a few functions to answer questions such as whether a token is an operand or an operator and whether the operator is unary or binary. We could've encoded these binary decisions in the configuration table, but encoding the mapping through functions seemed more natural:

// val lookupOperator : token:Token -> int * Associativity let lookupOperator token = configuration |> List.find (fun (t, _, _) -> token = t) |> fun (_, p, a) -> p, a // val isOperand : _arg1:Token -> bool let isOperand = function | Integer _ -> true | _ -> false // val isUnaryOperator : _arg1:Token -> bool let isUnaryOperator = function | UnaryMinOp | UnaryPlusOp -> true | _ -> false // val isBinaryOperator : _arg1:Token -> bool let isBinaryOperator = function // not (isUnaryOperator token) will not work since Integer is neither | BinPlusOp | BinMinOp | BinMulOp | BinDivOp | BinExpOp -> true | _ -> false

Next follows the reduce expression part of the algorithm. When the main part decides that it's time, reduce expression is where operator and operand tokens get reduced to a simpler Integer form. At this point Shunting Yard could equally well construct an abstract syntax tree from the tokens. The algorithm would remain the same, but reduceExpression would no longer push and pop Integers off the operand stack, but types of syntax tree nodes. The node which remains on the operand stack after all tokens have been processed would be the root node of the syntax tree as exemplified by this C# example.

// val reduceExpression : unit -> unit let reduceExpression() = let extractValue t = match t with | Integer i -> i | _ -> failwithf "Unsupported %A" t let operator = operators.Pop() if operator = UnaryMinOp then let operand = operands.Pop() |> extractValue operands.Push(Integer -operand) else if operator = UnaryPlusOp then operands.Pop() |> ignore else let left = operands.Pop() |> extractValue let right = operands.Pop() |> extractValue match operator with | BinPlusOp -> operands.Push(Integer (left + right)) | BinMinOp -> operands.Push(Integer (left - right)) | BinMulOp -> operands.Push(Integer (left * right)) | BinDivOp -> operands.Push(Integer (left / right)) | BinExpOp -> operands.Push(Integer (Math.Pow(float left, float right) |> int)) | _ -> failwithf "Unsupported operator %A" operator

And finally, the main part of Shunting Yard, tying together the pieces and including Unquote tests:

// val evaluateExpression : tokens:Token list -> Token let evaluateExpression tokens = // while tokens to be read // val eval : _arg1:Token list -> Token let rec eval = function // read token | hd :: tl -> // if token is operand then push onto operand stack if isOperand hd then operands.Push(hd) // if token is unary prefix operator then push onto operator stack else if isUnaryOperator hd then operators.Push(hd) // if token is binary operator, o1, then else if isBinaryOperator hd then // while operator token, o2, at top of operator stack, and // either o1 is left associative and its precedence <= o2 // or o1 is right associative and its precedence < o2 // reduce expression // push o1 onto the operator stack let pO1, aO1 = lookupOperator hd let isReduceRequired operator = let o2 = if isBinaryOperator(operator) then Some (lookupOperator(operator)) else None match o2 with | Some(pO2, aO2) -> (aO1 = Left && pO1 <= pO2) || (aO1 = Right && pO1 < pO2) | None -> false while operators.Count > 0 && isReduceRequired(operators.Peek()) do reduceExpression() operators.Push(hd) // if token is left paren then push it onto operator stack else if hd = LParen then operators.Push(hd) // if token is right paren else if hd = RParen then // until token at top of operator stack is left paren while operators.Count > 0 && operators.Peek() <> LParen do reduceExpression() // if operator stack runs out without finding left paren // then mismatched parens if operators.Count = 0 then failwith "Unmatched parens" // pop left paren from stack if operators.Peek() = LParen then operators.Pop() |> ignore eval tl | [] -> // when no more tokens to read and // while still tokens on operator stack while operators.Count > 0 do // if operator token on top of stack is paren // then mismatched parens if operators.Peek() = LParen || operators.Peek() = RParen then failwith "Unmatched paren" // reduce expression reduceExpression() // pop result of operand stack operands.Pop() eval tokens // val eval : (string -> Token) let eval = parseExpression >> evaluateExpression test <@ "1" |> eval = Integer 1 @> test <@ "-1" |> eval = Integer -1 @> test <@ "1+2" |> eval = Integer 3 @> test <@ "1+-2" |> eval = Integer -1 @> test <@ "-(1+2)" |> eval = Integer -3 @> test <@ "2^3^2" |> eval = Integer 512 @> test <@ "1+2*3" |> eval = Integer 7 @> test <@ "4^5/1+2*3" |> eval = Integer 1030 @>

Writing a lexer, parser, and evaluator by hand, even for this simple language of mathematical expressions, was no trivial task. Yet it proved a great learning experience and I've since taken the ideas to writing a parser and evaluator for a domain specific language. For this SPCalendarRecurrenceExpander, which turns each SharePoint calendar recurrence event into a series of individual events, taking into account recurrence exceptions, parsing entails generating an abstract syntax tree from the appointment. Evaluation is then consuming the abstract syntax tree, turning the recurrence into individual appointments.