diff --git a/_data/authors.yml b/_data/authors.yml index 53469fc1..c0fd9fe7 100644 --- a/_data/authors.yml +++ b/_data/authors.yml @@ -88,3 +88,17 @@ Patrick J. Roddy: - label: Website icon: fas fa-fw fa-link url: https://paddyroddy.github.io +Eliot W. Robson: + name : Eliot W. Robson + bio : "PhD Student, University of Illinois Urbana-Champaign" + avatar : /images/people/eliot-w-robson.jpg + links: + - label: "Email" + icon: "fas fa-fw fa-envelope-square" + url: "mailto:eliot.robson24@gmail.com" + - label: GitHub + icon: fab fa-fw fa-github + url: https://github.com/eliotwrobson + - label: LinkedIn + icon: fab fa-fw fa-linkedin + url: https://www.linkedin.com/in/eliot-robson/ diff --git a/_posts/2024-07-10-automata.md b/_posts/2024-07-10-automata.md new file mode 100644 index 00000000..ffb2b971 --- /dev/null +++ b/_posts/2024-07-10-automata.md @@ -0,0 +1,126 @@ +--- +layout: single +title: "automata: Simulation and manipulation" +excerpt: automata is a package implementing structures and algorithms for manipulating finite automata, pushdown automata, and Turing machines, that was recently accepted into the pyOpenSci ecosystem. +author: Eliot W. Robson +permalink: /blog/automata +header: + overlay_image: /images/automata/finite_language_dfa.png + overlay_filter: rgba(20, 13, 36, 0.8) +categories: + - blog-post + - automata + - formal-languages + - models-of-computation + - pyos-accepted +classes: wide +comments: true +--- + + +Automata are abstract machines used to represent models of computation, and are a central object of study in theoretical computer science. Given an input string of characters over a fixed alphabet, these machines either accept or reject the string. A language corresponding to an automaton is +the set of all strings it accepts. Three important families of automata in increasing order of generality are the following: + +1. Finite-state automata +2. Pushdown automata +3. Turing machines + +The [`automata`](https://caleb531.github.io/automata/) package facilitates working with these families by allowing simulation of reading input and higher-level manipulation +of the corresponding languages using specialized algorithms. For an overview on automata theory, see [this Wikipedia article](https://en.wikipedia.org/wiki/Automata_theory), and +for a more comprehensive introduction to each of these topics, see [these lecture notes](https://jeffe.cs.illinois.edu/teaching/algorithms/#models). + +## Statement of need + +Automata are a core component of both computer science education and research, seeing further theoretical work +and applications in a wide variety of areas such as computational biology and networking. +Consequently, the manipulation of automata with software packages has seen significant attention from +researchers in the past. The similarly named Mathematica package [`Automata`](https://www.cs.cmu.edu/~sutner/automata.html) implements a number of +algorithms for use with finite-state automata, including regular expression conversion and binary set operations. +In Java, the [Brics package](https://www.brics.dk/automaton/) implements similar algorithms, while the [JFLAP package](https://www.jflap.org/) places an emphasis +on interactivity and simulation of more general families of automata. + +[`automata`](https://caleb531.github.io/automata/) serves the demand for such a package in the Python software ecosystem, implementing algorithms and allowing for +simulation of automata in a manner comparable to the packages described previously. As a popular high-level language, Python enables +significant flexibility and ease of use that directly benefits many users. The package includes a comprehensive test suite, +support for modern language features (including type annotations), and has a large number of different automata, +meeting the demands of users across a wide variety of use cases. In particular, the target audience +is both researchers that wish to manipulate automata, and for those in educational contexts to reinforce understanding about how these +models of computation function. + +## Package features + +The API of the package is designed to mimic the formal mathematical description of each automaton using built-in Python data structures +(such as sets and dicts). This is for ease of use by those that are unfamiliar with these models of computation, while also providing performance +suitable for tasks arising in research. In particular, algorithms in the package have been written for tackling +performance on large inputs, incorporating optimizations such as only exploring the reachable set of states +in the construction of a new finite-state automaton. The package also has native display integration with Jupyter +notebooks, enabling easy visualization that allows students to interact with [`automata`](https://caleb531.github.io/automata/) in an exploratory manner. + +Of note are some commonly used and technical algorithms implemented in the package for finite-state automata: + +- An optimized version of the Hopcroft-Karp algorithm to determine whether two deterministic finite automata (DFA) are equivalent. + +- The product construction algorithm for binary set operations (union, intersection, etc.) on the languages corresponding to two input DFAs. + +- Thompson's algorithm for converting regular expressions to equivalent nondeterministic finite automata (NFA). + +- Hopcroft's algorithm for DFA minimization. + +- A specialized algorithm for directly constructing a state-minimal DFA accepting a given finite language. + +- A specialized algorithm for directly constructing a minimal DFA recognizing strings containing a given substring. + +To the authors' knowledge, this is the only Python package implementing all of the automata manipulation algorithms stated above. + +## Example usage + +![A visualization of `target_words_dfa`. Transitions on characters leading to immediate rejections are omitted.]({{ site.url }}/images/automata/finite_language_dfa.png) + +The following example is inspired by the use case described in [this blog post](http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levenshtein-Automata). +We wish to determine which strings in a given set are within the target edit distance +to a reference string. We will first initialize a DFA corresponding to a fixed set of target words +over the alphabet of all lowercase ascii characters. + +```python +from automata.fa.dfa import DFA +from automata.fa.nfa import NFA +import string + +target_words_dfa = DFA.from_finite_language( + input_symbols=set(string.ascii_lowercase), + language={'these', 'are', 'target', 'words', 'them', 'those'}, +) +``` + +Next, we construct an NFA recognizing all strings within a target edit distance of a fixed +reference string, and then immediately convert this to an equivalent DFA. The package provides +builtin functions to make this construction easy, and we use the same alphabet as the DFA that was just created. + +```python +words_within_edit_distance_dfa = DFA.from_nfa( + NFA.edit_distance( + input_symbols=set(string.ascii_lowercase), + reference_str='they', + max_edit_distance=2, + ) +) +``` + +Finally, we take the intersection of the two DFAs we have constructed and read all of +the words in the output DFA into a list. The library makes this straightforward and idiomatic. + +```python +found_words_dfa = target_words_dfa & words_within_edit_distance_dfa +found_words = list(found_words_dfa) +``` + +The DFA `found_words_dfa` accepts strings in the intersection of the languages of the +DFAs given as input, and `found_words` is a list containing this language. Note the power of this +technique is that the DFA `words_within_edit_distance_dfa` +has an infinite language, meaning we could not do this same computation just using the builtin +sets in Python directly (as they always represent a finite collection), although the +syntax used by [`automata`](https://caleb531.github.io/automata/) is very similar to promote intuition. + +## Citing + +This post is adapted from [our JOSS paper](https://joss.theoj.org/papers/10.21105/joss.05759), which should be used for citations. diff --git a/images/automata/finite_language_dfa.png b/images/automata/finite_language_dfa.png new file mode 100644 index 00000000..1f813410 Binary files /dev/null and b/images/automata/finite_language_dfa.png differ diff --git a/images/people/eliot-w-robson.jpg b/images/people/eliot-w-robson.jpg new file mode 100644 index 00000000..4b7e0c4c Binary files /dev/null and b/images/people/eliot-w-robson.jpg differ