Writing Parsers and Compilers with PLY

David Beazley
http://www.dabeaz.com

February 23, 2007
Overview

• Crash course on compilers
• An introduction to PLY
• Notable PLY features (why use it?)
• Experience writing a compiler in Python
Background
• Programs that process other programs
• Compilers
• Interpreters
• Wrapper generators
• Domain-specific languages
• Code-checkers
Example
• Parse and generate assembly code
  /* Compute GCD of two integers */
  fun gcd(x:int, y:int)
      g: int;
      begin
          g := y;
          while x > 0 do
              begin
                  g := x;
                  x := y - (y/x)*x;
                  y := g
              end;
          return g
      end
Compilers 101
• Compilers have multiple phases
• First phase usually concerns "parsing"
• Read program and create abstract representation
/* Compute GCD of two integers */
fun gcd(x:int, y:int)
    g: int;
    begin
        g := y;
        while x > 0 do
            begin
                g := x;
                x := y - (y/x)*x;
                y := g
            end;
        return g
    end

    source text  -->  parser  -->  abstract representation
Compilers 101
• Code generation phase
• Process the abstract representation
• Produce some kind of output
   abstract representation  -->  codegen  -->  LOAD  R1, A
                                               LOAD  R2, B
                                               ADD   R1,R2,R1
                                               STORE C, R1
                                               ...
Commentary

• There are many advanced details
• Most people care about code generation
• Yet, parsing is often the most annoying problem
• A major focus of tool building
Parsing in a Nutshell
• Lexing : Input is split into tokens
                 b = 40 + 20*(2+3)/37.5


     NAME = NUM + NUM * ( NUM + NUM ) / FLOAT

• Parsing : Applying language grammar rules
             =
           /   \
       NAME     +
              /   \
           NUM     /
                 /   \
                *     FLOAT
              /   \
           NUM     +
                 /   \
              NUM     NUM
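
As a point of reference, the lexing half is easy to sketch in plain Python
with the re module (an illustration only; the token names mirror the ones
above, and the helper names are made up):

   import re

   # One capture group per token class; findall returns one group per alternative
   token_pat = re.compile(r'\s*(?:(\d+\.\d+)|(\d+)|([A-Za-z_]\w*)|(.))')

   def tokenize(text):
       for float_, num, name, op in token_pat.findall(text):
           if float_: yield ('FLOAT', float(float_))
           elif num:  yield ('NUM', int(num))
           elif name: yield ('NAME', name)
           else:      yield (op, op)       # operators and punctuation

   print(list(tokenize("b = 40 + 20*(2+3)/37.5")))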
Lex & Yacc
• Programming tools for writing parsers
• Lex - Lexical analysis (tokenizing)
• Yacc - Yet Another Compiler Compiler (parsing)
• History:
    - Yacc : ~1973. Stephen Johnson (AT&T)
    - Lex : ~1974. Eric Schmidt and Mike Lesk (AT&T)

• Variations of both tools are widely known
• Covered in compilers classes and textbooks
Lex/Yacc Big Picture
lexer.l
    token
specification
Lex/Yacc Big Picture
lexer.l
    token
specification

 /* lexer.l */
 %{
 #include "header.h"
 int lineno = 1;
 %}
 %%
 [ \t]*                  ;  /* Ignore whitespace */
 \n                      { lineno++; }
 [0-9]+                  { yylval.val = atoi(yytext);
                           return NUMBER; }
 [a-zA-Z_][a-zA-Z0-9_]*  { yylval.name = strdup(yytext);
                           return ID; }
 \+                      { return PLUS; }
 -                       { return MINUS; }
 \*                      { return TIMES; }
 /                       { return DIVIDE; }
 =                       { return EQUALS; }
 %%
Lex/Yacc Big Picture
lexer.l
    token
specification

          lex

lexer.c
Lex/Yacc Big Picture
lexer.l         parser.y
    token          grammar
specification   specification

          lex

lexer.c
Lex/Yacc Big Picture
lexer.l         parser.y
    token          grammar
specification   specification

      /* parser.y */
      %{
      #include "header.h"
      %}
      %union {
          char *name;
          int   val;
      }
      %token PLUS MINUS TIMES DIVIDE EQUALS
      %token<name> ID;
      %token<val> NUMBER;
      %%
      start : ID EQUALS expr;
      expr  : expr PLUS term
            | expr MINUS term
            | term
            ;
      ...
Lex/Yacc Big Picture
lexer.l         parser.y
    token          grammar
specification   specification

          lex              yacc

lexer.c         parser.c
Lex/Yacc Big Picture
lexer.l                      parser.y
     token                      grammar
 specification               specification

          lex                           yacc

lexer.c                      parser.c


typecheck.c      codegen.c      otherstuff.c
Lex/Yacc Big Picture
lexer.l                       parser.y
     token                       grammar
 specification                specification

          lex                            yacc

lexer.c                       parser.c


typecheck.c       codegen.c      otherstuff.c




                 mycompiler
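
On Unix, the pieces are traditionally glued together with something like the
following (a sketch; classic lex/yacc actually emit lex.yy.c and y.tab.c
rather than the lexer.c/parser.c shown in the diagram):

   lex lexer.l                      # emits lex.yy.c
   yacc -d parser.y                 # emits y.tab.c and y.tab.h
   cc -o mycompiler lex.yy.c y.tab.c \
         typecheck.c codegen.c otherstuff.c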
What is PLY?

• PLY = Python Lex-Yacc
• A Python version of the lex/yacc toolset
• Same functionality as lex/yacc
• But a different interface
• Influences : Unix yacc, SPARK (John Aycock)
Some History
• Late 90's : "Why isn't SWIG written in Python?"
• 2001 : Taught a compilers course. Students
  write a compiler in Python as an experiment.
• 2001 : PLY-1.0 developed and released
• 2001-2005: Occasional maintenance
• 2006 : Major update to PLY-2.x.
PLY Package
• PLY consists of two Python modules
   ply.lex
   ply.yacc


• You simply import the modules to use them
• Unlike lex/yacc, PLY is not a code generator; lexers
  and parsers are built at runtime
ply.lex

• A module for writing lexers
• Tokens specified using regular expressions
• Provides functions for reading input text
• An annotated example follows...
ply.lex example
import ply.lex as lex
tokens = [ 'NAME','NUMBER','PLUS','MINUS','TIMES',
           'DIVIDE','EQUALS' ]
t_ignore = ' \t'
t_PLUS   = r'\+'
t_MINUS  = r'-'
t_TIMES  = r'\*'
t_DIVIDE = r'/'
t_EQUALS = r'='
t_NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

lex.lex()         # Build the lexer
ply.lex example (annotated)
• The tokens list specifies all of the possible tokens
• Each token has a matching declaration of the form
  t_TOKNAME; these names must match the tokens list
• Tokens are defined by regular expressions
• For simple tokens, strings are used
• Functions are used when special action code must
  execute; the docstring holds the regular expression
• t_ignore specifies ignored characters between tokens
  (usually whitespace)
• lex.lex() builds the lexer by creating a master
  regular expression
• Introspection is used to examine the contents of the
  calling module:

   __dict__ = {
      'tokens'   : [ 'NAME' ...],
      't_ignore' : ' \t',
      't_PLUS'   : '\\+',
      ...
      't_NUMBER' : <function ...>
   }
ply.lex use
• Two functions: input() and token()
   ...
   lex.lex()         # Build the lexer
   ...
   lex.input("x = 3 * 4 + 5 * 6")
   while True:
        tok = lex.token()
        if not tok: break

        # Use token
        ...
ply.lex use (annotated)
• input() feeds a string into the lexer
• token() returns the next token or None
• Each returned token carries a few attributes:

   tok.type      token type; matches the t_ rule name
                 (e.g., t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*')
   tok.value     matching text
   tok.lineno    line number
   tok.lexpos    position in input text
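
Printing each token's type and value in the loop shown earlier gives output
along these lines (a sketch; t_NUMBER has already converted the values to
integers):

   lex.input("x = 3 * 4 + 5 * 6")
   while True:
        tok = lex.token()
        if not tok: break
        print tok.type, tok.value

   # NAME x
   # EQUALS =
   # NUMBER 3
   # TIMES *
   # NUMBER 4
   # PLUS +
   # NUMBER 5
   # TIMES *
   # NUMBER 6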
ply.lex Commentary


• Normally you don't use the tokenizer directly
• Instead, it's used by the parser module
ply.yacc preliminaries
• ply.yacc is a module for creating a parser
• Assumes you have defined a BNF grammar
    assign : NAME EQUALS expr
    expr   : expr PLUS term
           | expr MINUS term
           | term
    term   : term TIMES factor
           | term DIVIDE factor
           | factor
    factor : NUMBER
ply.yacc example
import ply.yacc as yacc
import mylexer               # Import lexer information
tokens = mylexer.tokens      # Need token list

def p_assign(p):
    '''assign : NAME EQUALS expr'''

def p_expr(p):
    '''expr : expr PLUS term
            | expr MINUS term
            | term'''
def p_term(p):
    '''term : term TIMES factor
            | term DIVIDE factor
            | factor'''
def p_factor(p):
    '''factor : NUMBER'''

yacc.yacc()               # Build the parser
ply.yacc example (annotated)
• token information is imported from the lexer
• grammar rules are encoded as functions with names of
  the form p_rulename (the name doesn't matter as long
  as it starts with p_)
• docstrings contain the grammar rules from the BNF
• yacc.yacc() builds the parser using introspection
ply.yacc parsing
• yacc.parse() function
    yacc.yacc()      # Build the parser
    ...
    data = "x = 3*4+5*6"
    yacc.parse(data)   # Parse some text



• This feeds data into lexer
• Parses the text and invokes grammar rules
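
yacc.yacc() also returns a parser object, and parse() accepts an explicit
lexer instead of defaulting to the most recently built one (a sketch):

   mylexer = lex.lex()       # built from the t_ rules in this module
   parser  = yacc.yacc()     # built from the p_ rules

   result = parser.parse("x = 3*4+5*6", lexer=mylexer)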
A peek inside

• PLY uses LR parsing, specifically LALR(1)
• Also known as shift-reduce parsing
• Widely used parsing technique
• Table driven
General Idea
• Input tokens are shifted onto a parsing stack
     Stack                 Input
                           X = 3 *   4   +   5
     NAME                    = 3 *   4   +   5
     NAME =                    3 *   4   +   5
     NAME = NUM                  *   4   +   5



• This continues until a complete grammar rule
  appears on the top of the stack
General Idea
• If rules are found, a "reduction" occurs
    Stack                       Input
                                X = 3 *   4   +   5
    NAME                          = 3 *   4   +   5
    NAME =                          3 *   4   +   5
    NAME = NUM                        *   4   +   5
                 reduce   factor : NUM

    NAME = factor


• RHS of grammar rule replaced with LHS
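
Run to completion with the expression grammar used in the examples that
follow, the parse of "X = 3 * 4 + 5" proceeds roughly like this (a sketch):

    Stack                       Input
    NAME = NUM                  * 4 + 5
    NAME = factor               * 4 + 5    reduce  factor : NUM
    NAME = term                 * 4 + 5    reduce  term : factor
    NAME = term *                 4 + 5    shift
    NAME = term * NUM               + 5    shift
    NAME = term * factor            + 5    reduce  factor : NUM
    NAME = term                     + 5    reduce  term : term TIMES factor
    NAME = expr                     + 5    reduce  expr : term
    NAME = expr +                     5    shift
    NAME = expr + NUM                      shift
    NAME = expr + factor                   reduce  factor : NUM
    NAME = expr + term                     reduce  term : factor
    NAME = expr                            reduce  expr : expr PLUS term
    assign                                 reduce  assign : NAME EQUALS expr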
Rule Functions

• During reduction, rule functions are invoked
    def p_factor(p):
        'factor : NUMBER'


• Parameter p contains grammar symbol values
    def p_factor(p):
        'factor : NUMBER'
        #   p[0]    p[1]      (p[0] = LHS value, p[1] = NUMBER value)
Using an LR Parser

• Rule functions generally process values on
  right hand side of grammar rule
• Result is then stored in left hand side
• Results propagate up through the grammar
• Bottom-up parsing
Example: Calculator
vars = { }                # symbol table for assignments

def p_assign(p):
    '''assign : NAME EQUALS expr'''
    vars[p[1]] = p[3]

def p_expr_plus(p):
    '''expr : expr PLUS term'''
    p[0] = p[1] + p[3]

def p_term_mul(p):
    '''term : term TIMES factor'''
    p[0] = p[1] * p[3]

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

def p_factor(p):
    '''factor : NUMBER'''
    p[0] = p[1]
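
For reference, a complete runnable version is a straightforward extension;
this sketch adds the pass-through and MINUS/DIVIDE rules that the slide omits:

   import ply.lex as lex
   import ply.yacc as yacc

   tokens = [ 'NAME','NUMBER','PLUS','MINUS','TIMES',
              'DIVIDE','EQUALS' ]
   t_ignore = ' \t'
   t_PLUS   = r'\+'
   t_MINUS  = r'-'
   t_TIMES  = r'\*'
   t_DIVIDE = r'/'
   t_EQUALS = r'='
   t_NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'

   def t_NUMBER(t):
       r'\d+'
       t.value = int(t.value)
       return t

   lex.lex()

   vars = { }                      # symbol table for assignments

   def p_assign(p):
       '''assign : NAME EQUALS expr'''
       vars[p[1]] = p[3]

   def p_expr_plus(p):
       '''expr : expr PLUS term'''
       p[0] = p[1] + p[3]

   def p_expr_minus(p):
       '''expr : expr MINUS term'''
       p[0] = p[1] - p[3]

   def p_expr_term(p):
       '''expr : term'''
       p[0] = p[1]

   def p_term_mul(p):
       '''term : term TIMES factor'''
       p[0] = p[1] * p[3]

   def p_term_div(p):
       '''term : term DIVIDE factor'''
       p[0] = p[1] / p[3]

   def p_term_factor(p):
       '''term : factor'''
       p[0] = p[1]

   def p_factor(p):
       '''factor : NUMBER'''
       p[0] = p[1]

   yacc.yacc()
   yacc.parse("x = 3*4 + 5*6")
   print vars                      # {'x': 42}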
Example: Parse Tree
def p_assign(p):
    '''assign : NAME EQUALS expr'''
    p[0] = ('ASSIGN',p[1],p[3])

def p_expr_plus(p):
    '''expr : expr PLUS term'''
    p[0] = ('+',p[1],p[3])

def p_term_mul(p):
    '''term : term TIMES factor'''
    p[0] = ('*',p[1],p[3])

def p_term_factor(p):
    '''term : factor'''
    p[0] = p[1]

def p_factor(p):
    '''factor : NUMBER'''
    p[0] = ('NUM',p[1])
Example: Parse Tree
>>> t = yacc.parse("x = 3*4 + 5*6")
>>> t
('ASSIGN','x',('+',
                  ('*',('NUM',3),('NUM',4)),
                  ('*',('NUM',5),('NUM',6))
               )
)
>>>

              ASSIGN
             /      \
          'x'        '+'
                    /   \
                 '*'     '*'
                /   \   /   \
               3     4 5     6
Why use PLY?

• There are many Python parsing tools
• Some use more powerful parsing algorithms
• Isn't parsing a "solved" problem anyway?
PLY is Informative
• Compiler writing is hard
• Tools should not make it even harder
• PLY provides extensive diagnostics
• Major emphasis on error reporting
• Provides the same information as yacc
PLY Diagnostics
• PLY produces the same diagnostics as yacc
• Yacc
    % yacc grammar.y
    4 shift/reduce conflicts
    2 reduce/reduce conflicts

• PLY
    % python mycompiler.py
    yacc: Generating LALR parsing table...
    4 shift/reduce conflicts
    2 reduce/reduce conflicts

• PLY also produces the same debugging output
Debugging Output
Grammar

Rule 1     statement  -> NAME = expression
Rule 2     statement  -> expression
Rule 3     expression -> expression + expression
Rule 4     expression -> expression - expression
Rule 5     expression -> expression * expression
Rule 6     expression -> expression / expression
Rule 7     expression -> NUMBER

Terminals, with rules where they appear

*      : 5
+      : 3
-      : 4
/      : 6
=      : 1
NAME   : 1
NUMBER : 7
error  :

Nonterminals, with rules where they appear

expression : 1 2 3 3 4 4 5 5 6 6
statement  : 0

Parsing method: LALR

state 0

       (0) S' -> . statement
       (1) statement -> . NAME = expression
       (2) statement -> . expression
       (3) expression -> . expression + expression
       (4) expression -> . expression - expression
       (5) expression -> . expression * expression
       (6) expression -> . expression / expression
       (7) expression -> . NUMBER

       NAME            shift and go to state 1
       NUMBER          shift and go to state 2

       expression      shift and go to state 4
       statement       shift and go to state 3

state 1

       (1) statement -> NAME . = expression

       =               shift and go to state 5

...

state 10

       (1) statement -> NAME = expression .
       (3) expression -> expression . + expression
       (4) expression -> expression . - expression
       (5) expression -> expression . * expression
       (6) expression -> expression . / expression

       $end            reduce using rule 1 (statement -> NAME = expression .)
       +               shift and go to state 7
       -               shift and go to state 6
       *               shift and go to state 8
       /               shift and go to state 9

state 11

       (4) expression -> expression - expression .
       (3) expression -> expression . + expression
       (4) expression -> expression . - expression
       (5) expression -> expression . * expression
       (6) expression -> expression . / expression

   !   shift/reduce conflict for + resolved as shift.
   !   shift/reduce conflict for - resolved as shift.
   !   shift/reduce conflict for * resolved as shift.
   !   shift/reduce conflict for / resolved as shift.

       $end            reduce using rule 4 (expression -> expression - expression .)
       +               shift and go to state 7
       -               shift and go to state 6
       *               shift and go to state 8
       /               shift and go to state 9

   !   +               [ reduce using rule 4 (expression -> expression - expression .) ]
   !   -               [ reduce using rule 4 (expression -> expression - expression .) ]
   !   *               [ reduce using rule 4 (expression -> expression - expression .) ]
   !   /               [ reduce using rule 4 (expression -> expression - expression .) ]
Debugging Output
• The same report pinpoints exactly where each conflict occurs
• Here, state 11 shows shift/reduce conflicts for +, -, * and /,
  each resolved as a shift (the lines flagged with !)
PLY Validation
• PLY validates all token/grammar specs
• Duplicate rules
• Malformed regexes and grammars
• Missing rules and tokens
• Unused tokens and rules
• Improper function declarations
• Infinite recursion
Error Example
import ply.lex as lex
tokens = [ 'NAME','NUMBER','PLUS','MINUS','TIMES',
           'DIVIDE','EQUALS' ]
t_ignore = ' \t'
t_PLUS   = r'\+'
t_MINUS  = r'-'
t_TIMES  = r'\*'
t_DIVIDE = r'/'
t_EQUALS = r'='
t_NAME   = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_MINUS  = r'-'       # oops: rule redefined
t_POWER  = r'^'       # oops: POWER not in the tokens list

def t_NUMBER():       # oops: missing argument
    r'\d+'
    t.value = int(t.value)
    return t

lex.lex()             # Build the lexer

PLY reports each of the mistakes:

    example.py:12: Rule t_MINUS redefined.
                   Previously defined on line 6

    lex: Rule 't_POWER' defined for an
    unspecified token POWER

    example.py:15: Rule 't_NUMBER' requires
    an argument.
Commentary

• PLY was developed for classroom use
• Major emphasis on identifying and reporting
  potential problems
• Reports errors rather than failing with an exception
PLY is Yacc
• PLY supports all of the major features of
  Unix lex/yacc
• Syntax error handling and synchronization
• Precedence specifiers
• Character literals
• Start conditions
• Inherited attributes
Precedence Specifiers
• Yacc
   %left PLUS MINUS
   %left TIMES DIVIDE
   %nonassoc UMINUS
   ...
   expr : MINUS expr %prec UMINUS {
       $$ = -$2;    /* $2 is the expr; $1 is the MINUS token */
   }

• PLY
   precedence = (
      ('left','PLUS','MINUS'),
      ('left','TIMES','DIVIDE'),
      ('nonassoc','UMINUS'),
   )
   def p_expr_uminus(p):
      'expr : MINUS expr %prec UMINUS'
      p[0] = -p[2]    # p[2] is the expr; p[1] is the MINUS token
Character Literals
• Yacc
  expr : expr '+' expr   { $$ = $1 + $3; }
       | expr '-' expr   { $$ = $1 - $3; }
       | expr '*' expr   { $$ = $1 * $3; }
       | expr '/' expr   { $$ = $1 / $3; }
       ;


• PLY
  def p_expr(p):
     '''expr : expr '+' expr
             | expr '-' expr
             | expr '*' expr
             | expr '/' expr'''
     ...
Error Productions
• Yacc
  funcall_err : ID LPAREN error RPAREN {
           printf("Syntax error in arguments\n");
       }
       ;


• PLY
  def p_funcall_err(p):
     '''funcall_err : ID LPAREN error RPAREN'''
     print "Syntax error in arguments\n"
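
PLY also supports yacc's p_error() hook for syntax errors that aren't caught
by an error production (a minimal sketch):

   def p_error(p):
       if p:
           print "Syntax error at '%s'" % p.value
       else:
           print "Syntax error at EOF"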
Commentary

• Books and documentation on yacc/bison were
  used to guide the development of PLY
• Tried to copy all of the major features
• Usage is kept as similar to lex/yacc as is reasonable
PLY is Simple
• Two pure-Python modules. That's it.
• Not part of a "parser framework"
• Use doesn't involve exotic design patterns
• Doesn't rely upon C extension modules
• Doesn't rely on third party tools
PLY is Fast
• For a parser written entirely in Python
• Underlying parser is table driven
• Parsing tables are saved and only regenerated if
  the grammar changes
• Considerable work went into optimization
  from the start (developed on a 200 MHz PC)
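
Concretely, building the parser caches its tables on disk, and later runs
simply reload them (file names are PLY's defaults):

   yacc.yacc()            # writes parsetab.py (cached tables) and parser.out (debug log)
   yacc.yacc(debug=0)     # suppress the parser.out debugging file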
PLY Performance
• Example: Generating the LALR tables
     •   Input: SWIG C++ grammar

     •   459 grammar rules, 892 parser states

     •   3.6 seconds (PLY-2.3, 2.66 GHz Intel Xeon)

     •   0.026 seconds (bison/ANSI C)

• Fast enough not to be annoying
• Tables only generated once and reused
PLY Performance
• Parse file with 1000 random expressions
  (805KB) and build an abstract syntax tree
      •   PLY-2.3   : 2.95 sec,     10.2 MB   (Python)

      •   YAPPS2    : 6.57 sec,     32.5 MB   (Python)

      •   PyParsing : 13.11 sec,    15.6 MB   (Python)

      •   ANTLR      : 53.16 sec,   94 MB     (Python)

      •   SPARK      : 235.88 sec, 347 MB     (Python)


• System: Mac Pro 2.66 GHz Xeon, Python-2.5
PLY Performance
• Parse file with 1000 random expressions
  (805KB) and build an abstract syntax tree
     •   PLY-2.3   : 2.95 sec,   10.2 MB   (Python)

     •   DParser   : 0.71 sec,   72 MB     (Python/C)

     •   BisonGen : 0.25 sec,    13 MB     (Python/C)

     •   Bison     : 0.063 sec, 7.9 MB     (C)

• 12x slower than BisonGen (mostly C)
• 47x slower than pure C
• System: Mac Pro 2.66 GHz Xeon, Python-2.5
Perf. Breakdown
• Parse file with 1000 random expressions
  (805KB) and build an abstract syntax tree
     •   Total time : 2.95 sec

     •   Startup    : 0.02 sec

     •   Lexing     : 1.20 sec

     •   Parsing    : 1.12 sec

     •   AST        : 0.61 sec


• System: Mac Pro 2.66 GHz Xeon, Python-2.5
Advanced PLY

• PLY has many advanced features
• Lexers/parsers can be defined as classes
• Support for multiple lexers and parsers
• Support for optimized mode (python -O)
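
For example (parameter names as documented for PLY):

   lexer  = lex.lex(optimize=1)                  # optimized mode; works under python -O
   parser = yacc.yacc(tabmodule='myparser_tab')  # give each parser its own table module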
Class Example
import ply.yacc as yacc

class MyParser:
    def p_assign(self,p):
        '''assign : NAME EQUALS expr'''
    def p_expr(self,p):
        '''expr : expr PLUS term
                | expr MINUS term
                | term'''
    def p_term(self,p):
        '''term : term TIMES factor
                | term DIVIDE factor
                | factor'''
    def p_factor(self,p):
        '''factor : NUMBER'''
    def build(self):
        self.parser = yacc.yacc(object=self)
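
Hypothetical usage, assuming matching token rules have been defined and a
lexer built elsewhere:

   m = MyParser()
   m.build()                                 # yacc.yacc(object=self) reads the p_ methods
   result = m.parser.parse("x = 3*4 + 5*6")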
Experience with PLY

• In 2001, I taught a compilers course
• Students wrote a full compiler
• Lexing, parsing, type checking, code generation
• Procedures, nested scopes, and type inference
• Produced working SPARC assembly code
Classroom Results

• You can write a real compiler in Python
• Students were successful with projects
• However, many projects were quite "hacky"
• Still unsure about dynamic nature of Python
• May be too easy to create a "bad" compiler
General PLY Experience

• May be very useful for prototyping
• PLY's strength is in its diagnostics
• Significantly faster than most Python parsers
• Not sure I'd rewrite gcc in Python just yet
• I'm still thinking about SWIG.
Limitations
• LALR(1) parsing
• Not easy to work with very complex grammars
  (e.g., C++ parsing)
• Retains all of yacc's black magic
• Not as powerful as more general parsing
  algorithms (ANTLR, SPARK, etc.)
• Tradeoff : Speed vs. Generality
PLY Usage

• Current version : Ply-2.3
• >100 downloads/week
• People are obviously using it
• Largest project I know of : Ada parser
• Many other small projects
Future Directions
• PLY was written for Python-2.0
• Not yet updated to use modern Python
  features such as iterators and generators
• May update, but not at the expense of
  performance
• Working on some add-ons to ease transition
  between yacc <---> PLY.
Acknowledgements
• Many people have contributed to PLY
    Thad Austin          Adam Kerrison
    Shannon Behrens      Daniel Larraz
    Michael Brown        David McNab
    Russ Cox             Patrick Mezard
    Johan Dahl           Pearu Peterson
    Andrew Dalke         François Pinard
    Michael Dyck         Eric Raymond
    Joshua Gerth         Adam Ring
    Elias Ioup           Rich Salz
    Oldrich Jedlicka     Markus Schoepflin
    Sverre Jørgensen     Christoper Stawarz
    Lee June             Miki Tebeka
    Andreas Jung         Andrew Waters
    Cem Karan

• Apologies to anyone I forgot
Resources

• PLY homepage
     http://www.dabeaz.com/ply

• Mailing list/group
     http://groups.google.com/group/ply-hack
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Parsing and Compilers with PLY

  • 1. Writing Parsers and Compilers with PLY David Beazley http://www.dabeaz.com February 23, 2007
  • 2. Overview • Crash course on compilers • An introduction to PLY • Notable PLY features (why use it?) • Experience writing a compiler in Python
  • 3. Background • Programs that process other programs • Compilers • Interpreters • Wrapper generators • Domain-specific languages • Code-checkers
  • 4. Example • Parse and generate assembly code /* Compute GCD of two integers */ fun gcd(x:int, y:int) g: int; begin g := y; while x > 0 do begin g := x; x := y - (y/x)*x; y := g end; return g end
  • 5. Compilers 101 • Compilers have multiple phases • First phase usually concerns "parsing" • Read program and create abstract representation /* Compute GCD of two integers */ fun gcd(x:int, y:int) g: int; begin g := y; parser while x > 0 do begin g := x; x := y - (y/x)*x; y := g end; return g end
  • 6. Compilers 101 • Code generation phase • Process the abstract representation • Produce some kind of output LOAD R1, A LOAD R2, B codegen ADD R1,R2,R1 STORE C, R1 ...
  • 7. Commentary • There are many advanced details • Most people care about code generation • Yet, parsing is often the most annoying problem • A major focus of tool building
  • 8. Parsing in a Nutshell • Lexing : Input is split into tokens b = 40 + 20*(2+3)/37.5 NAME = NUM + NUM * ( NUM + NUM ) / FLOAT • Parsing : Applying language grammar rules = NAME + NUM / * FLOAT NUM + NUM NUM
  • 9. Lex & Yacc • Programming tools for writing parsers • Lex - Lexical analysis (tokenizing) • Yacc - Yet Another Compiler Compiler (parsing) • History: - Yacc : ~1973. Stephen Johnson (AT&T) - Lex : ~1974. Eric Schmidt and Mike Lesk (AT&T) • Variations of both tools are widely known • Covered in compilers classes and textbooks
  • 10. Lex/Yacc Big Picture lexer.l token specification
  • 11. Lex/Yacc Big Picture lexer.l (token specification): /* lexer.l */ %{ #include "header.h" int lineno = 1; %} %% [ \t]* ; /* Ignore whitespace */ \n { lineno++; } [0-9]+ { yylval.val = atoi(yytext); return NUMBER; } [a-zA-Z_][a-zA-Z0-9_]* { yylval.name = strdup(yytext); return ID; } \+ { return PLUS; } - { return MINUS; } \* { return TIMES; } / { return DIVIDE; } = { return EQUALS; } %%
  • 12. Lex/Yacc Big Picture lexer.l token specification lex lexer.c
  • 13. Lex/Yacc Big Picture lexer.l parser.y token grammar specification specification lex lexer.c
  • 14. Lex/Yacc Big Picture lexer.l (token specification) -> lex -> lexer.c; parser.y (grammar specification): /* parser.y */ %{ #include "header.h" %} %union { char *name; int val; } %token PLUS MINUS TIMES DIVIDE EQUALS %token<name> ID; %token<val> NUMBER; %% start : ID EQUALS expr; expr : expr PLUS term | expr MINUS term | term ; ...
  • 15. Lex/Yacc Big Picture lexer.l parser.y token grammar specification specification lex yacc lexer.c parser.c
  • 16. Lex/Yacc Big Picture lexer.l parser.y token grammar specification specification lex yacc lexer.c parser.c typecheck.c codegen.c otherstuff.c
  • 17. Lex/Yacc Big Picture lexer.l parser.y token grammar specification specification lex yacc lexer.c parser.c typecheck.c codegen.c otherstuff.c mycompiler
  • 18. What is PLY? • PLY = Python Lex-Yacc • A Python version of the lex/yacc toolset • Same functionality as lex/yacc • But a different interface • Influences : Unix yacc, SPARK (John Aycock)
  • 19. Some History • Late 90's : "Why isn't SWIG written in Python?" • 2001 : Taught a compilers course. Students write a compiler in Python as an experiment. • 2001 : PLY-1.0 developed and released • 2001-2005: Occasional maintenance • 2006 : Major update to PLY-2.x.
  • 20. PLY Package • PLY consists of two Python modules ply.lex ply.yacc • You simply import the modules to use them • However, PLY is not a code generator
  • 21. ply.lex • A module for writing lexers • Tokens specified using regular expressions • Provides functions for reading input text • An annotated example follows...
  • 22. ply.lex example import ply.lex as lex tokens = [ 'NAME','NUMBER','PLUS','MINUS','TIMES','DIVIDE','EQUALS' ] t_ignore = ' \t' t_PLUS = r'\+' t_MINUS = r'-' t_TIMES = r'\*' t_DIVIDE = r'/' t_EQUALS = r'=' t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*' def t_NUMBER(t): r'\d+' t.value = int(t.value) return t lex.lex() # Build the lexer
  • 23. ply.lex example (callout on the slide-22 code: the tokens list specifies all of the possible tokens)
  • 24. ply.lex example (callout: each token has a matching declaration of the form t_TOKNAME)
  • 25. ply.lex example (callout: these names must match)
  • 26. ply.lex example (callout: tokens are defined by regular expressions)
  • 27. ply.lex example (callout: for simple tokens, strings are used)
  • 28. ply.lex example (callout: functions are used when special action code must execute)
  • 29. ply.lex example (callout: the docstring holds the regular expression)
  • 30. ply.lex example (callout: t_ignore specifies ignored characters between tokens, usually whitespace)
  • 31. ply.lex example (callout: lex.lex() builds the lexer by creating a master regular expression)
  • 32. ply.lex example (callout: introspection is used to examine the contents of the calling module)
  • 33. ply.lex example (callout: introspection, continued; effectively __dict__ = { 'tokens' : ['NAME', ...], 't_ignore' : ' \t', 't_PLUS' : '\+', ..., 't_NUMBER' : <function ...> })
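Slides 22-33 describe one complete lexer module. A runnable rendering of that lexer is sketched below; the regex backslashes dropped by the slide extraction are restored, and the t_error handler plus the mylexer.py module name are assumptions, not part of the slides:

    # mylexer.py : runnable sketch of the slide-22 lexer (module name assumed)
    import ply.lex as lex

    tokens = ['NAME', 'NUMBER', 'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'EQUALS']

    t_ignore  = ' \t'                       # characters skipped between tokens
    t_PLUS    = r'\+'
    t_MINUS   = r'-'
    t_TIMES   = r'\*'
    t_DIVIDE  = r'/'
    t_EQUALS  = r'='
    t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'

    def t_NUMBER(t):
        r'\d+'                              # the docstring is the token's regex
        t.value = int(t.value)              # convert the lexeme to an int
        return t

    def t_error(t):                         # not on the slides; PLY warns if missing
        print("Illegal character %r" % t.value[0])
        t.lexer.skip(1)

    lexer = lex.lex()                       # build the lexer via introspection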
  • 34. ply.lex use • Two functions: input() and token() ... lex.lex() # Build the lexer ... lex.input("x = 3 * 4 + 5 * 6") while True: tok = lex.token() if not tok: break # Use token ...
  • 35. ply.lex use (callout on the slide-34 loop: input() feeds a string into the lexer)
  • 36. ply.lex use (callout: token() returns the next token or None)
  • 37. ply.lex use (callout: each token carries tok.type, tok.value, tok.lineno, tok.lexpos)
  • 38. ply.lex use (callout: tok.type is the token name, e.g. NAME from t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*')
  • 39. ply.lex use (callout: tok.value is the matching text)
  • 40. ply.lex use (callout: tok.lexpos is the position in the input text)
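In released PLY the line-number attribute is spelled tok.lineno, and the object returned by lex.lex() is itself iterable, so the slide-34 loop can be written more compactly. A small usage sketch, assuming the lexer module above:

    lexer.input("x = 3 * 4 + 5 * 6")
    for tok in lexer:                       # same as calling lexer.token() until None
        print("%s %r line=%d pos=%d" % (tok.type, tok.value, tok.lineno, tok.lexpos))
    # first line printed: NAME 'x' line=1 pos=0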
  • 41. ply.lex Commentary • Normally you don't use the tokenizer directly • Instead, it's used by the parser module
  • 42. ply.yacc preliminaries • ply.yacc is a module for creating a parser • Assumes you have defined a BNF grammar assign : NAME EQUALS expr expr : expr PLUS term | expr MINUS term | term term : term TIMES factor | term DIVIDE factor | factor factor : NUMBER
  • 43. ply.yacc example import ply.yacc as yacc import mylexer # Import lexer information tokens = mylexer.tokens # Need token list def p_assign(p): '''assign : NAME EQUALS expr''' def p_expr(p): '''expr : expr PLUS term | expr MINUS term | term''' def p_term(p): '''term : term TIMES factor | term DIVIDE factor | factor''' def p_factor(p): '''factor : NUMBER''' yacc.yacc() # Build the parser
  • 44. ply.yacc example (callout on the slide-43 code: token information is imported from the lexer)
  • 45. ply.yacc example (callout: grammar rules are encoded as functions with names p_rulename; the name doesn't matter as long as it starts with p_)
  • 46. ply.yacc example (callout: docstrings contain the grammar rules from the BNF)
  • 47. ply.yacc example (callout: yacc.yacc() builds the parser using introspection)
  • 48. ply.yacc parsing • yacc.parse() function yacc.yacc() # Build the parser ... data = "x = 3*4+5*6" yacc.parse(data) # Parse some text • This feeds data into the lexer • Parses the text and invokes grammar rules
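Putting both halves together: assuming the slide-43 grammar functions are defined in the same file, a minimal end-to-end driver might look like this (the p_error callback is an addition; PLY invokes it when a syntax error is detected):

    import ply.yacc as yacc
    import mylexer                          # hypothetical lexer module from above
    tokens = mylexer.tokens

    # ... p_assign, p_expr, p_term, p_factor exactly as on slide 43 ...

    def p_error(p):                         # called by PLY on a syntax error
        if p:
            print("Syntax error at token %s" % p.type)
        else:
            print("Syntax error at end of input")

    parser = yacc.yacc()                    # yacc.yacc() also returns a parser object
    parser.parse("x = 3*4 + 5*6")           # equivalent to module-level yacc.parse(data)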
  • 49. A peek inside • PLY uses LR-parsing. LALR(1) • AKA: Shift-reduce parsing • Widely used parsing technique • Table driven
  • 50. General Idea • Input tokens are shifted onto a parsing stack Stack Input X = 3 * 4 + 5 NAME = 3 * 4 + 5 NAME = 3 * 4 + 5 NAME = NUM * 4 + 5 • This continues until a complete grammar rule appears on the top of the stack
  • 51. General Idea • If rules are found, a "reduction" occurs Stack Input X = 3 * 4 + 5 NAME = 3 * 4 + 5 NAME = 3 * 4 + 5 NAME = NUM * 4 + 5 reduce factor : NUM NAME = factor • RHS of grammar rule replaced with LHS
  • 52. Rule Functions • During reduction, rule functions are invoked def p_factor(p): 'factor : NUMBER' • Parameter p contains grammar symbol values def p_factor(p): 'factor : NUMBER' p[0] p[1]
  • 53. Using an LR Parser • Rule functions generally process values on right hand side of grammar rule • Result is then stored in left hand side • Results propagate up through the grammar • Bottom-up parsing
  • 54. Example: Calculator def p_assign(p): '''assign : NAME EQUALS expr''' vars[p[1]] = p[3] def p_expr_plus(p): '''expr : expr PLUS term''' p[0] = p[1] + p[3] def p_term_mul(p): '''term : term TIMES factor''' p[0] = p[1] * p[3] def p_term_factor(p): '''term : factor''' p[0] = p[1] def p_factor(p): '''factor : NUMBER''' p[0] = p[1]
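Slide 54 assumes a symbol table named vars, and it omits the expr : term pass-through rule needed to connect the grammar. A hedged completion that makes the calculator run:

    vars = {}                               # symbol table assumed by p_assign (not on the slide)

    def p_expr_term(p):                     # pass-through rule omitted on the slide
        '''expr : term'''
        p[0] = p[1]

    yacc.yacc()
    yacc.parse("x = 3*4 + 5*6")
    print(vars['x'])                        # prints 42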
  • 55. Example: Parse Tree def p_assign(p): '''assign : NAME EQUALS expr''' p[0] = ('ASSIGN',p[1],p[3]) def p_expr_plus(p): '''expr : expr PLUS term''' p[0] = ('+',p[1],p[3]) def p_term_mul(p): '''term : term TIMES factor''' p[0] = ('*',p[1],p[3]) def p_term_factor(p): '''term : factor''' p[0] = p[1] def p_factor(p): '''factor : NUMBER''' p[0] = ('NUM',p[1])
  • 56. Example: Parse Tree >>> t = yacc.parse("x = 3*4 + 5*6") >>> t ('ASSIGN','x',('+', ('*',('NUM',3),('NUM',4)), ('*',('NUM',5),('NUM',6)) ) ) >>> ASSIGN 'x' '+' '*' '*' 3 4 5 6
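Because the tree is plain nested tuples, consuming it is a short recursion. A sketch of an evaluator over this tuple format (the function and names are illustrative, not from the slides):

    def evaluate(node):
        tag = node[0]
        if tag == 'NUM':
            return node[1]
        if tag == '+':
            return evaluate(node[1]) + evaluate(node[2])
        if tag == '*':
            return evaluate(node[1]) * evaluate(node[2])
        if tag == 'ASSIGN':                 # yields (name, value) for the assignment
            return (node[1], evaluate(node[2]))

    t = yacc.parse("x = 3*4 + 5*6")
    print(evaluate(t))                      # ('x', 42)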
  • 57. Why use PLY? • There are many Python parsing tools • Some use more powerful parsing algorithms • Isn't parsing a "solved" problem anyways?
  • 58. PLY is Informative • Compiler writing is hard • Tools should not make it even harder • PLY provides extensive diagnostics • Major emphasis on error reporting • Provides the same information as yacc
  • 59. PLY Diagnostics • PLY produces the same diagnostics as yacc • Yacc % yacc grammar.y 4 shift/reduce conflicts 2 reduce/reduce conflicts • PLY % python mycompiler.py yacc: Generating LALR parsing table... 4 shift/reduce conflicts 2 reduce/reduce conflicts • PLY also produces the same debugging output
  • 60. Debugging Output Grammar state 10 Rule 1 statement -> NAME = expression (1) statement -> NAME = expression . Rule 2 statement -> expression (3) expression -> expression . + expression Rule 3 expression -> expression + expression (4) expression -> expression . - expression Rule 4 expression -> expression - expression (5) expression -> expression . * expression Rule 5 expression -> expression * expression (6) expression -> expression . / expression Rule 6 expression -> expression / expression Rule 7 expression -> NUMBER $end reduce using rule 1 (statement -> NAME = expression .) + shift and go to state 7 Terminals, with rules where they appear - shift and go to state 6 * shift and go to state 8 * : 5 / shift and go to state 9 + : 3 - : 4 / : 6 = : 1 state 11 NAME : 1 NUMBER : 7 (4) expression -> expression - expression . error : (3) expression -> expression . + expression (4) expression -> expression . - expression Nonterminals, with rules where they appear (5) expression -> expression . * expression (6) expression -> expression . / expression expression : 1 2 3 3 4 4 5 5 6 6 statement : 0 ! shift/reduce conflict for + resolved as shift. ! shift/reduce conflict for - resolved as shift. Parsing method: LALR ! shift/reduce conflict for * resolved as shift. ! shift/reduce conflict for / resolved as shift. state 0 $end reduce using rule 4 (expression -> expression - expression .) + shift and go to state 7 (0) S' -> . statement - shift and go to state 6 (1) statement -> . NAME = expression * shift and go to state 8 (2) statement -> . expression / shift and go to state 9 (3) expression -> . expression + expression (4) expression -> . expression - expression ! + [ reduce using rule 4 (expression -> expression - expression .) ] (5) expression -> . expression * expression ! - [ reduce using rule 4 (expression -> expression - expression .) ] (6) expression -> . expression / expression ! * [ reduce using rule 4 (expression -> expression - expression .) ] (7) expression -> . NUMBER ! / [ reduce using rule 4 (expression -> expression - expression .) ] NAME shift and go to state 1 NUMBER shift and go to state 2 expression shift and go to state 4 statement shift and go to state 3 state 1 (1) statement -> NAME . = expression = shift and go to state 5
  • 61. Debugging Output (magnified view of the same parser.out excerpt as slide 60, highlighting state 11: the four "! shift/reduce conflict ... resolved as shift" lines and the bracketed reductions rejected for rule 4)
  • 62. PLY Validation • PLY validates all token/grammar specs • Duplicate rules • Malformed regexs and grammars • Missing rules and tokens • Unused tokens and rules • Improper function declarations • Infinite recursion
  • 63. Error Example: the slide-22 lexer with three mistakes introduced: a second t_MINUS = r'-' rule, a t_POWER = r'^' rule for a POWER token missing from the tokens list, and def t_NUMBER(): written without its argument. PLY reports: example.py:12: Rule t_MINUS redefined. Previously defined on line 6
  • 64. Error Example (same code): lex: Rule 't_POWER' defined for an unspecified token POWER
  • 65. Error Example (same code): example.py:15: Rule 't_NUMBER' requires an argument.
  • 66. Commentary • PLY was developed for classroom use • Major emphasis on identifying and reporting potential problems • Report errors rather than fail with an exception
  • 67. PLY is Yacc • PLY supports all of the major features of Unix lex/yacc • Syntax error handling and synchronization • Precedence specifiers • Character literals • Start conditions • Inherited attributes
  • 68. Precedence Specifiers • Yacc %left PLUS MINUS %left TIMES DIVIDE %nonassoc UMINUS ... expr : MINUS expr %prec UMINUS { $$ = -$2; } • PLY precedence = ( ('left','PLUS','MINUS'), ('left','TIMES','DIVIDE'), ('nonassoc','UMINUS'), ) def p_expr_uminus(p): 'expr : MINUS expr %prec UMINUS' p[0] = -p[2]
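The precedence table is what lets PLY accept a deliberately ambiguous grammar without shift/reduce conflicts. A sketch of the usual pattern (rule names illustrative):

    precedence = (
        ('left', 'PLUS', 'MINUS'),          # lowest precedence
        ('left', 'TIMES', 'DIVIDE'),
        ('nonassoc', 'UMINUS'),             # pseudo-token, exists only for %prec
    )

    def p_expr_plus(p):
        'expr : expr PLUS expr'             # ambiguous alone; precedence resolves it
        p[0] = p[1] + p[3]

    def p_expr_times(p):
        'expr : expr TIMES expr'
        p[0] = p[1] * p[3]

    def p_expr_uminus(p):
        'expr : MINUS expr %prec UMINUS'
        p[0] = -p[2]                        # p[1] is the MINUS token, not the operand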
  • 69. Character Literals • Yacc expr : expr '+' expr { $$ = $1 + $3; } | expr '-' expr { $$ = $1 - $3; } | expr '*' expr { $$ = $1 * $3; } | expr '/' expr { $$ = $1 / $3; } ; • PLY def p_expr(p): '''expr : expr '+' expr | expr '-' expr | expr '*' expr | expr '/' expr''' ...
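For the character-literal form to work, the lexer side has to declare the characters; in PLY this is done with a module-level literals declaration, roughly as sketched here:

    literals = ['+', '-', '*', '/']         # single-character tokens; a string "+-*/" also works

    def p_expr_plus_lit(p):
        "expr : expr '+' expr"              # quoted literal instead of a named token
        p[0] = p[1] + p[3]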
  • 70. Error Productions • Yacc funcall_err : ID LPAREN error RPAREN { printf("Syntax error in arguments\n"); } ; • PLY def p_funcall_err(p): '''funcall_err : ID LPAREN error RPAREN''' print "Syntax error in arguments"
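An error production cooperates with the p_error() callback: PLY reports the offending token first, then discards input until it can resume at a rule containing the special error symbol, which lets the action salvage a value. A hedged sketch (rule and message are illustrative):

    def p_funcall_err(p):
        '''funcall_err : ID LPAREN error RPAREN'''
        print("Syntax error in arguments of %s" % p[1])
        p[0] = ('funcall', p[1], [])        # recover with an empty argument list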
  • 71. Commentary • Books and documentation on yacc/bison used to guide the development of PLY • Tried to copy all of the major features • Usage as similar to lex/yacc as reasonable
  • 72. PLY is Simple • Two pure-Python modules. That's it. • Not part of a "parser framework" • Use doesn't involve exotic design patterns • Doesn't rely upon C extension modules • Doesn't rely on third party tools
  • 73. PLY is Fast • For a parser written entirely in Python • Underlying parser is table driven • Parsing tables are saved and only regenerated if the grammar changes • Considerable work went into optimization from the start (developed on 200Mhz PC)
  • 74. PLY Performance • Example: Generating the LALR tables • Input: SWIG C++ grammar • 459 grammar rules, 892 parser states • 3.6 seconds (PLY-2.3, 2.66Ghz Intel Xeon) • 0.026 seconds (bison/ANSI C) • Fast enough not to be annoying • Tables only generated once and reused
  • 75. PLY Performance • Parse file with 1000 random expressions (805KB) and build an abstract syntax tree • PLY-2.3 : 2.95 sec, 10.2 MB (Python) • YAPPS2 : 6.57 sec, 32.5 MB (Python) • PyParsing : 13.11 sec, 15.6 MB (Python) • ANTLR : 53.16 sec, 94 MB (Python) • SPARK : 235.88 sec, 347 MB (Python) • System: MacPro 2.66Ghz Xeon, Python-2.5
  • 76. PLY Performance • Parse file with 1000 random expressions (805KB) and build an abstract syntax tree • PLY-2.3 : 2.95 sec, 10.2 MB (Python) • DParser : 0.71 sec, 72 MB (Python/C) • BisonGen : 0.25 sec, 13 MB (Python/C) • Bison : 0.063 sec, 7.9 MB (C) • 12x slower than BisonGen (mostly C) • 47x slower than pure C • System: MacPro 2.66Ghz Xeon, Python-2.5
  • 77. Perf. Breakdown • Parse file with 1000 random expressions (805KB) and build an abstract syntax tree • Total time : 2.95 sec • Startup : 0.02 sec • Lexing : 1.20 sec • Parsing : 1.12 sec • AST : 0.61 sec • System: MacPro 2.66Ghz Xeon, Python-2.5
  • 78. Advanced PLY • PLY has many advanced features • Lexers/parsers can be defined as classes • Support for multiple lexers and parsers • Support for optimized mode (python -O)
  • 79. Class Example import ply.yacc as yacc class MyParser: def p_assign(self,p): '''assign : NAME EQUALS expr''' def p_expr(self,p): '''expr : expr PLUS term | expr MINUS term | term''' def p_term(self,p): '''term : term TIMES factor | term DIVIDE factor | factor''' def p_factor(self,p): '''factor : NUMBER''' def build(self): self.parser = yacc.yacc(object=self)
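To use the class form, the object handed to yacc.yacc(object=self) also needs the token list visible on it (for example, a tokens attribute set from the lexer module). A usage sketch under that assumption:

    m = MyParser()                          # assumes MyParser carries a tokens attribute
    m.build()                               # runs yacc.yacc(object=self)
    result = m.parser.parse("x = 3 + 4*5", lexer=mylexer.lexer)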
  • 80. Experience with PLY • In 2001, I taught a compilers course • Students wrote a full compiler • Lexing, parsing, type checking, code generation • Procedures, nested scopes, and type inference • Produced working SPARC assembly code
  • 81. Classroom Results • You can write a real compiler in Python • Students were successful with projects • However, many projects were quite "hacky" • Still unsure about dynamic nature of Python • May be too easy to create a "bad" compiler
  • 82. General PLY Experience • May be very useful for prototyping • PLY's strength is in its diagnostics • Significantly faster than most Python parsers • Not sure I'd rewrite gcc in Python just yet • I'm still thinking about SWIG.
  • 83. Limitations • LALR(1) parsing • Not easy to work with very complex grammars (e.g., C++ parsing) • Retains all of yacc's black magic • Not as powerful as more general parsing algorithms (ANTLR, SPARK, etc.) • Tradeoff : Speed vs. Generality
  • 84. PLY Usage • Current version : Ply-2.3 • >100 downloads/week • People are obviously using it • Largest project I know of : Ada parser • Many other small projects
  • 85. Future Directions • PLY was written for Python-2.0 • Not yet updated to use modern Python features such as iterators and generators • May update, but not at the expense of performance • Working on some add-ons to ease transition between yacc <---> PLY.
  • 86. Acknowledgements • Many people have contributed to PLY Thad Austin Adam Kerrison Shannon Behrens Daniel Larraz Michael Brown David McNab Russ Cox Patrick Mezard Johan Dahl Pearu Peterson Andrew Dalke François Pinard Michael Dyck Eric Raymond Joshua Gerth Adam Ring Elias Ioup Rich Salz Oldrich Jedlicka Markus Schoepflin Sverre Jørgensen Christoper Stawarz Lee June Miki Tebeka Andreas Jung Andrew Waters Cem Karan • Apologies to anyone I forgot
  • 87. Resources • PLY homepage http://www.dabeaz.com/ply • Mailing list/group http://groups.google.com/group/ply-hack