Merge pull request #2 from asmeurer/myst
Use Myst instead of recommonmark
asmeurer authored Jul 7, 2020
2 parents 25c9716 + 94eddcd commit 308eae5
Showing 12 changed files with 376 additions and 363 deletions.
14 changes: 10 additions & 4 deletions .travis.yml
@@ -24,20 +24,26 @@ matrix:
dist: xenial
sudo: true

allow_failures:
- python: "nightly"

install:
- pip install sphinx==2 doctr recommonmark
- pip install sphinx doctr myst_parser
- source activate test-environment
# Needed for show_relabars, https://github.com/bitprophet/alabaster/pull/135
- pip install -U git+https://github.com/asmeurer/alabaster/@rellinks-classes

script:
- set -e
# - set -e
- cd docs
- make doctest # tests rst files only
- ./run_doctests *.md
- if [[ "${BUILD}" == "true" ]]; then
make html;
make linkcheck;
cd ..;
doctr deploy --built-docs docs/_build/html .;
if [[ "${TRAVIS_BRANCH}" == "master" ]]; then
doctr deploy --built-docs docs/_build/html .;
else
doctr deploy --no-require-master --built-docs docs/_build/html "docs-$TRAVIS_BRANCH";
fi
fi
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -18,7 +18,7 @@ help:
html: exact_type_table.txt

livehtml: exact_type_table.txt
sphinx-autobuild -b html $(ALLSPHINXOPTS) "$(SOURCEDIR)" $(BUILDDIR)/html
sphinx-autobuild -B -p 0 -b html $(ALLSPHINXOPTS) "$(SOURCEDIR)" $(BUILDDIR)/html

exact_type_table.txt: exact_type_table.py
python exact_type_table.py
95 changes: 49 additions & 46 deletions docs/alternatives.md
@@ -15,6 +15,7 @@ definition, that is, every occurrence of the `def` keyword. Such a tool
could be used by a text editor to aid in jumping to function
definitions, for instance.

(regular-expressions)=
## Regular Expressions

Using naive regular expression parsing, you might start with something
@@ -86,8 +87,9 @@ a piece of (incomplete) Python code has any mismatched parentheses or braces.
In this case, you definitely don't want to do a naive matching of parentheses
in the source as a whole, as a single "mismatched" parenthesis in a string
could confuse the entire engine, even if the source as Python is itself valid.
We will see this example in more detail [later](examples.html#mismatched-parentheses).
We will see this example in more detail [later](mismatched-parentheses).
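
To make the pitfall concrete, here is a minimal sketch (not code from this guide; the helper name is invented) of the naive parenthesis check the paragraph above warns about:

```py
# Naive balance check: it counts parentheses in the raw text, so a "(" inside
# a string literal throws it off, which is exactly the failure mode described above.
def naive_balanced(source):
    return source.count("(") == source.count(")")

print(naive_balanced("print('(')"))  # False, even though this is valid Python
```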

(tokenize)=
## Tokenize

Now let's consider the tokenize module. It's quite easy to search for
@@ -121,17 +123,18 @@ depends on what your use-case is and what trade-offs you are willing to
accept.

It should also be noted that the above function is not fully correct, as it
does not properly handle [`ERRORTOKEN`](tokens.html#errortoken)s or
[exceptions](usage.html#exceptions). We will see
[later](examples.html#line-numbers) how to fix it.
does not properly handle [`ERRORTOKEN`](errortoken)s or
[exceptions](exceptions). We will see
[later](line-numbers) how to fix it.
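
For orientation, here is a minimal tokenize-based sketch of the `def` search discussed in this section (the helper name is invented and this is not the guide's own function; as noted above, it still ignores `ERRORTOKEN`s and exceptions):

```py
import io
import tokenize

def find_defs(source):
    """Yield the (line, column) position of each 'def' keyword."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # Keywords are NAME tokens, while strings and comments each produce a
        # single STRING or COMMENT token, so "def" inside a string is skipped.
        if tok.type == tokenize.NAME and tok.string == "def":
            yield tok.start

print(list(find_defs("def f():\n    return 'def'\n")))  # [(1, 0)]
```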

(ast)=
## AST

The `ast` module can also be used to avoid the pitfalls of detecting false
positives. In fact, the `ast` module will have NO false positives. The price
that is paid for this is that the input code to the `ast` module must be
completely valid Python code. Incomplete or syntactically invalid code will
cause `ast.parse` to raise a `SyntaxError`. <sup id="a1" style="font-size:12px">[1](#f1)</sup>
cause `ast.parse` to raise a `SyntaxError`.[^a1]

```py
>>> import ast
@@ -178,42 +181,42 @@ cons" because some things may be pros (like the ability to work with
incomplete code) or cons (like accepting invalid Python), depending on
what you are trying to do.

```eval_rst
.. list-table::
:header-rows: 1
* - Regular expressions
- ``tokenize``
- ``ast``
* - Can work with incomplete or invalid Python.
- Can work with incomplete or invalid Python, though you may need to
watch for ``ERRORTOKEN`` and exceptions.
- Requires syntactically valid Python (with a few minor exceptions).
* - Regular expressions can be difficult to write correctly and maintain.
- Token types are easy to detect. Larger patterns must be amalgamated
from the tokens. Some tokens mean different things in different contexts.
- AST has high-level abstractions such as ``ast.walk`` and
``NodeTransformer`` that make visiting and transforming nodes easy,
even in complicated ways.
* - Regular expressions work directly on the source code, so it is trivial
to do lossless source code transformations with them.
- Lossless source code transformations are possible with ``tokenize``, as all the
whitespace can be inferred from the ``TokenInfo`` tuples. However, it can
often be tricky to do in practice, as it requires manually accounting
for column offsets.
- Lossless source code transformations are impossible with ``ast``, as it completely
drops whitespace, redundant parentheses, and comments (among other
things).
* - Impossible to detect edge cases in all circumstances, such as code that
actually is inside of a string.
- Edge cases can be avoided. Differentiates between actual code and code
inside a comment or string. Can still be fooled by invalid Python (though this can
often be considered a `garbage in, garbage out
<https://en.wikipedia.org/wiki/Garbage_in,_garbage_out>`_ scenario).
- Edge cases can be avoided effortlessly, as only valid Python can even
be parsed, and each node class represents that syntactic construct
exactly.
```{list-table}
---
header-rows: 1
---
* - Regular expressions
- ``tokenize``
- ``ast``
* - Can work with incomplete or invalid Python.
- Can work with incomplete or invalid Python, though you may need to
watch for ``ERRORTOKEN`` and exceptions.
- Requires syntactically valid Python (with a few minor exceptions).
* - Regular expressions can be difficult to write correctly and maintain.
- Token types are easy to detect. Larger patterns must be amalgamated
from the tokens. Some tokens mean different things in different contexts.
- AST has high-level abstractions such as ``ast.walk`` and
``NodeTransformer`` that make visiting and transforming nodes easy,
even in complicated ways.
* - Regular expressions work directly on the source code, so it is trivial
to do lossless source code transformations with them.
- Lossless source code transformations are possible with ``tokenize``, as all the
whitespace can be inferred from the ``TokenInfo`` tuples. However, it can
often be tricky to do in practice, as it requires manually accounting
for column offsets.
- Lossless source code transformations are impossible with ``ast``, as it completely
drops whitespace, redundant parentheses, and comments (among other
things).
* - Impossible to detect edge cases in all circumstances, such as code that
actually is inside of a string.
- Edge cases can be avoided. Differentiates between actual code and code
inside a comment or string. Can still be fooled by invalid Python (though this can
often be considered a [garbage in, garbage
out](https://en.wikipedia.org/wiki/Garbage_in,_garbage_out) scenario).
- Edge cases can be avoided effortlessly, as only valid Python can even
be parsed, and each node class represents that syntactic construct
exactly.
```

As you can see, all three can be valid depending on what you are trying to do.
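
As a concrete illustration of the `ast` column in the table above (a hedged sketch, not code from this guide), finding every function definition takes only a few lines, provided the source parses cleanly:

```py
import ast

source = """
def spam():
    pass

class Eggs:
    def ham(self):
        pass
"""

tree = ast.parse(source)  # raises SyntaxError if the code is invalid
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(node.name, node.lineno)
# spam 2
# ham 6
```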
@@ -228,14 +231,15 @@ In addition to `tokenize` and `ast`, the Python standard library has several
[other modules](https://docs.python.org/3/library/language.html) which can aid
in inspecting and manipulating Python source code.

(parso)=
### Parso

As a final note, David Halter's
[parso](https://parso.readthedocs.io/en/latest/) library contains an
alternative implementation of the standard library `tokenize` and `ast`
modules for Python. Parso has many advantages over the standard library, such
as round-trippable AST, a `tokenize()` function that has fewer \"gotchas\" and
doesn't raise [exceptions](usage.html#exceptions), the ability to detect
doesn't raise [exceptions](exceptions), the ability to detect
multiple syntax errors in a single block of code, the ability to parse Python
code for a different version of Python than the one that is running, and more.
If you don\'t mind an external dependency and want to save yourself potential
@@ -246,7 +250,6 @@ library `tokenize` or `ast`. Parso's tokenizer
it.


<small>[1.](#a1) <span id="f1"></span> Actually there are a handful of syntax errors that
cannot be detected by the AST due to their context sensitive nature, such
as `break` outside of a loop. These are found only after compiling the
AST.</small>
[^a1]: Actually there are a handful of syntax errors that cannot be detected
by the AST due to their context sensitive nature, such as `break`
outside of a loop. These are found only after compiling the AST.
26 changes: 2 additions & 24 deletions docs/conf.py
@@ -39,23 +39,13 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'myst_parser',
'sphinx.ext.doctest',
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
from recommonmark.parser import CommonMarkParser
from recommonmark.transform import AutoStructify

source_parsers = {
'.md': CommonMarkParser,
}

enable_eval_rst = True
source_suffix = ['.rst', '.md']
# source_suffix = '.rst'
@@ -125,7 +115,7 @@
# Fonts
'font_family': "Palatino, 'goudy old style', 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro', serif",
'font_size': '18px',
'code_font_family': "'Menlo', 'Deja Vu Sans Mono', 'Consolas', 'Bitstream Vera Sans Mono', monospace",
'code_font_family': "'Menlo', 'DejaVu Sans Mono', 'Consolas', 'Bitstream Vera Sans Mono', monospace",
'code_font_size': '0.8em',
}

@@ -203,15 +193,3 @@
author, 'BrownWaterPython', 'One line description of project.',
'Miscellaneous'),
]


# -- Extension configuration -------------------------------------------------

def setup(app):
app.add_config_value('recommonmark_config', {
'enable_eval_rst': True,
'auto_toc_tree_section': 'Contents',
'enable_math': False,
'enable_inline_math': False,
}, True)
app.add_transform(AutoStructify)
83 changes: 48 additions & 35 deletions docs/exact_type_table.py
@@ -1,34 +1,23 @@
#!/usr/bin/env python
import sys
import tokenize
from pathlib import Path

HEADER = """
.. list-table::
:header-rows: 1
* - Exact token type
- String value
"""

TABLE_ENTRY = """
* - ``{token_name}``{note}
- ``{token_string}``
"""
# TODO: Make these footnotes appear right below the table. See
# https://github.com/executablebooks/MyST-Parser/issues/179

FOOTER = """
.. rubric:: Footnotes
.. [#f1] Due to a `bug <https://bugs.python.org/issue24622>`_, the
``exact_type`` for ``RARROW`` and ``ELLIPSIS`` tokens is ``OP`` in Python
versions prior to 3.7. See `above <#rarrow>`_.
[^f1]: Due to a [bug](https://bugs.python.org/issue24622), the `exact_type`
for `RARROW` and `ELLIPSIS` tokens is `OP` in Python versions prior to
3.7. See [above](rarrow).
.. [#f2] New in Python 3.8.
[^f2]: New in Python 3.8.
"""

def escape(s):
return '\\' + '\\'.join(s)
def code(s):
return '`' + s + '`'

token_types = {num: string for string, num in tokenize.EXACT_TOKEN_TYPES.items()}

@@ -37,23 +26,47 @@ def main():
sys.exit("This script should be run with Python 3.8 or newer.")

print("Generating exact_type_table.txt")

name_column = ['Exact token type']
string_column = ['String value']
for token_type in sorted(token_types):
token_name = tokenize.tok_name[token_type]
token_string = token_types[token_type]
if token_type in [tokenize.RARROW, tokenize.ELLIPSIS]:
note = " [^f1]"
elif token_type == tokenize.COLONEQUAL:
note = " [^f2]"
else:
note = ''

name_column.append(code(token_name) + note)
string_column.append(code(token_string))

name_column_width = len(max(name_column, key=len)) + 2
string_column_width = len(max(string_column, key=len)) + 2

assert len(name_column) == len(string_column)

with open('exact_type_table.txt', 'w') as f:
f.write(HEADER)
for token_type in sorted(token_types):
token_string = token_types[token_type]
if token_type in [tokenize.RARROW, tokenize.ELLIPSIS]:
note = " [#f1]_"
elif token_type == tokenize.COLONEQUAL:
note = " [#f2]_"
else:
note = ''

f.write(TABLE_ENTRY.format(
token_name=tokenize.tok_name[token_type],
token_string=token_string,
note=note,
))
for i, (typ, string) in enumerate(zip(name_column, string_column)):
f.write('|')
f.write(typ.center(name_column_width))
f.write('|')
f.write(string.center(string_column_width))
f.write('|')
f.write('\n')
if i == 0:
f.write('|')
f.write('-'*name_column_width)
f.write('|')
f.write('-'*string_column_width)
f.write('|')
f.write('\n')

f.write(FOOTER)

# touch tokens.md so it forces a rebuild
Path('tokens.md').touch()

if __name__ == '__main__':
main()