Skip to content

A clean combinator language that compiles to POSIX regular expressions

Notifications You must be signed in to change notification settings

bzinberg/regex-combinator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Regex Combinator Language

Herein lies a combinator language for matchers which compiles down to POSIX regular expressions (both standards — BREs and EREs — are supported).

Why

The need for a clean combinator language arises because regular expression syntax is so bad that writing down a regular expression (let alone checking that a regular expression has the intended meaning) takes much more effort than expressing the desired matcher in plain English. As Prof. Gerald Sussman explains:

Regular expressions are ubiquitous. On the surface, regular expressions look like a combinator language, because expression fragments can be combined to make more complex expressions. But the meaning of a fragment is highly dependent on the expression it is embedded in. For example, to include a caret character in a bracket expression, [...], it must not be in the first character position. If the caret appears after the first character it is just an ordinary character, but if it appears as the first character it negates the meaning of the bracket expression. For example, a bracket expression may not contain just a caret.

Whereas, expressed a clean language like this one, the meaning of the matcher is more clear than it would be in plain English. And because the matcher can be compiled to a POSIX regular expression, standard tools like grep can still be used.

This matcher supports all features of POSIX regular expressions, including backreferences (implemented by labels rather than asking you to magically keep track of parenthesis counts).

For a detailed specification, as well as some entertaining prose by Gerald Sussman, see ps.txt.

Where

To use, load ps01.scm. Examples can be found in ps.txt, as well as in the test suite regexptest.scm.

This project was an assignment for 6.945 Adventures in Advanced Symbolic Programming at MIT. A set of accompanying questions, as well as a specification of the combinator language, can be found in ps.txt. The top-level include, as well as answers to the accompanying questions, are found in ps01.scm. (Some starter code was originally provided but it has all been rewritten, except for the Scheme interface to grep.)

About

A clean combinator language that compiles to POSIX regular expressions

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages