Merge pull request #2 from asmeurer/myst
Use Myst instead of recommonmark
asmeurer authored Jul 7, 2020
2 parents 25c9716 + 94eddcd commit 308eae5
Showing 12 changed files with 376 additions and 363 deletions.
14 changes: 10 additions & 4 deletions .travis.yml
@@ -24,20 +24,26 @@ matrix:
dist: xenial
sudo: true

allow_failures:
- python: "nightly"

install:
- pip install sphinx==2 doctr recommonmark
- pip install sphinx doctr myst_parser
- source activate test-environment
# Needed for show_relabars, https://github.com/bitprophet/alabaster/pull/135
- pip install -U git+https://github.com/asmeurer/alabaster/@rellinks-classes

script:
- set -e
# - set -e
- cd docs
- make doctest # tests rst files only
- ./run_doctests *.md
- if [[ "${BUILD}" == "true" ]]; then
make html;
make linkcheck;
cd ..;
doctr deploy --built-docs docs/_build/html .;
if [[ "${TRAVIS_BRANCH}" == "master" ]]; then
doctr deploy --built-docs docs/_build/html .;
else
doctr deploy --no-require-master --built-docs docs/_build/html "docs-$TRAVIS_BRANCH";
fi
fi
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -18,7 +18,7 @@ help:
html: exact_type_table.txt

livehtml: exact_type_table.txt
sphinx-autobuild -b html $(ALLSPHINXOPTS) "$(SOURCEDIR)" $(BUILDDIR)/html
sphinx-autobuild -B -p 0 -b html $(ALLSPHINXOPTS) "$(SOURCEDIR)" $(BUILDDIR)/html

exact_type_table.txt: exact_type_table.py
python exact_type_table.py
95 changes: 49 additions & 46 deletions docs/alternatives.md
@@ -15,6 +15,7 @@ definition, that is, every occurrence of the `def` keyword. Such a tool
could be used by a text editor to aid in jumping to function
definitions, for instance.

(regular-expressions)=
## Regular Expressions

Using naive regular expression parsing, you might start with something
@@ -86,8 +87,9 @@ a piece of (incomplete) Python code has any mismatched parentheses or braces.
In this case, you definitely don't want to do a naive matching of parentheses
in the source as a whole, as a single "mismatched" parenthesis in a string
could confuse the entire engine, even if the source as Python is itself valid.
We will see this example in more detail [later](examples.html#mismatched-parentheses).
We will see this example in more detail [later](mismatched-parentheses).
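
To make the pitfall concrete, here is a minimal sketch (not code from this guide; the helper name is invented) of the naive parenthesis check the paragraph above warns about:

```py
# Naive balance check: it counts parentheses in the raw text, so a "(" inside
# a string literal throws it off, which is exactly the failure mode described above.
def naive_balanced(source):
    return source.count("(") == source.count(")")

print(naive_balanced("print('(')"))  # False, even though this is valid Python
```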

(tokenize)=
## Tokenize

Now let's consider the tokenize module. It's quite easy to search for
@@ -121,17 +123,18 @@ depends on what your use-case is and what trade-offs you are willing to
accept.

It should also be noted that the above function is not fully correct, as it
does not properly handle [`ERRORTOKEN`](tokens.html#errortoken)s or
[exceptions](usage.html#exceptions). We will see
[later](examples.html#line-numbers) how to fix it.
does not properly handle [`ERRORTOKEN`](errortoken)s or
[exceptions](exceptions). We will see
[later](line-numbers) how to fix it.
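
For orientation, here is a minimal tokenize-based sketch of the `def` search discussed in this section (the helper name is invented and this is not the guide's own function; as noted above, it still ignores `ERRORTOKEN`s and exceptions):

```py
import io
import tokenize

def find_defs(source):
    """Yield the (line, column) position of each 'def' keyword."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # Keywords are NAME tokens, while strings and comments each produce a
        # single STRING or COMMENT token, so "def" inside a string is skipped.
        if tok.type == tokenize.NAME and tok.string == "def":
            yield tok.start

print(list(find_defs("def f():\n    return 'def'\n")))  # [(1, 0)]
```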

(ast)=
## AST

The `ast` module can also be used to avoid the pitfalls of detecting false
positives. In fact, the `ast` module will have NO false positives. The price
that is paid for this is that the input code to the `ast` module must be
completely valid Python code. Incomplete or syntactically invalid code will
cause `ast.parse` to raise a `SyntaxError`. <sup id="a1" style="font-size:12px">[1](#f1)</sup>
cause `ast.parse` to raise a `SyntaxError`.[^a1]

```py
>>> import ast
@@ -178,42 +181,42 @@ cons" because some things may be pros (like the ability to work with
incomplete code) or cons (like accepting invalid Python), depending on
what you are trying to do.

```eval_rst
.. list-table::
:header-rows: 1
* - Regular expressions
- ``tokenize``
- ``ast``
* - Can work with incomplete or invalid Python.
- Can work with incomplete or invalid Python, though you may need to
watch for ``ERRORTOKEN`` and exceptions.
- Requires syntactically valid Python (with a few minor exceptions).
* - Regular expressions can be difficult to write correctly and maintain.
- Token types are easy to detect. Larger patterns must be amalgamated
from the tokens. Some tokens mean different things in different contexts.
- AST has high-level abstractions such as ``ast.walk`` and
``NodeTransformer`` that make visiting and transforming nodes easy,
even in complicated ways.
* - Regular expressions work directly on the source code, so it is trivial
to do lossless source code transformations with them.
- Lossless source code transformations are possible with ``tokenize``, as all the
whitespace can be inferred from the ``TokenInfo`` tuples. However, it can
often be tricky to do in practice, as it requires manually accounting
for column offsets.
- Lossless source code transformations are impossible with ``ast``, as it completely
drops whitespace, redundant parentheses, and comments (among other
things).
* - Impossible to detect edge cases in all circumstances, such as code that
actually is inside of a string.
- Edge cases can be avoided. Differentiates between actual code and code
inside a comment or string. Can still be fooled by invalid Python (though this can
often be considered a `garbage in, garbage out
<https://en.wikipedia.org/wiki/Garbage_in,_garbage_out>`_ scenario).
- Edge cases can be avoided effortlessly, as only valid Python can even
be parsed, and each node class represents that syntactic construct
exactly.
```{list-table}
---
header-rows: 1
---
* - Regular expressions
- ``tokenize``
- ``ast``
* - Can work with incomplete or invalid Python.
- Can work with incomplete or invalid Python, though you may need to
watch for ``ERRORTOKEN`` and exceptions.
- Requires syntactically valid Python (with a few minor exceptions).
* - Regular expressions can be difficult to write correctly and maintain.
- Token types are easy to detect. Larger patterns must be amalgamated
from the tokens. Some tokens mean different things in different contexts.
- AST has high-level abstractions such as ``ast.walk`` and
``NodeTransformer`` that make visiting and transforming nodes easy,
even in complicated ways.
* - Regular expressions work directly on the source code, so it is trivial
to do lossless source code transformations with them.
- Lossless source code transformations are possible with ``tokenize``, as all the
whitespace can be inferred from the ``TokenInfo`` tuples. However, it can
often be tricky to do in practice, as it requires manually accounting
for column offsets.
- Lossless source code transformations are impossible with ``ast``, as it completely
drops whitespace, redundant parentheses, and comments (among other
things).
* - Impossible to detect edge cases in all circumstances, such as code that
actually is inside of a string.
- Edge cases can be avoided. Differentiates between actual code and code
inside a comment or string. Can still be fooled by invalid Python (though this can
often be considered a [garbage in, garbage
out](https://en.wikipedia.org/wiki/Garbage_in,_garbage_out) scenario).
- Edge cases can be avoided effortlessly, as only valid Python can even
be parsed, and each node class represents that syntactic construct
exactly.
```

As you can see, all three can be valid depending on what you are trying to do.
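
As a concrete illustration of the `ast` column in the table above (a hedged sketch, not code from this guide), finding every function definition takes only a few lines, provided the source parses cleanly:

```py
import ast

source = """
def spam():
    pass

class Eggs:
    def ham(self):
        pass
"""

tree = ast.parse(source)  # raises SyntaxError if the code is invalid
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        print(node.name, node.lineno)
# spam 2
# ham 6
```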
@@ -228,14 +231,15 @@ In addition to `tokenize` and `ast`, the Python standard library has several
[other modules](https://docs.python.org/3/library/language.html) which can aid
in inspecting and manipulating Python source code.

(parso)=
### Parso

As a final note, David Halter's
[parso](https://parso.readthedocs.io/en/latest/) library contains an
alternative implementation of the standard library `tokenize` and `ast`
modules for Python. Parso has many advantages over the standard library, such
as round-trippable AST, a `tokenize()` function that has fewer \"gotchas\" and
doesn't raise [exceptions](usage.html#exceptions), the ability to detect
doesn't raise [exceptions](exceptions), the ability to detect
multiple syntax errors in a single block of code, the ability to parse Python
code for a different version of Python than the one that is running, and more.
If you don\'t mind an external dependency and want to save yourself potential
@@ -246,7 +250,6 @@ library `tokenize` or `ast`. Parso's tokenizer
it.


<small>[1.](#a1) <span id="f1"></span> Actually there are a handful of syntax errors that
cannot be detected by the AST due to their context sensitive nature, such
as `break` outside of a loop. These are found only after compiling the
AST.</small>
[^a1]: Actually there are a handful of syntax errors that cannot be detected
by the AST due to their context sensitive nature, such as `break`
outside of a loop. These are found only after compiling the AST.
26 changes: 2 additions & 24 deletions docs/conf.py
@@ -39,23 +39,13 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'myst_parser',
'sphinx.ext.doctest',
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
from recommonmark.parser import CommonMarkParser
from recommonmark.transform import AutoStructify

source_parsers = {
'.md': CommonMarkParser,
}

enable_eval_rst = True
source_suffix = ['.rst', '.md']
# source_suffix = '.rst'
@@ -125,7 +115,7 @@
# Fonts
'font_family': "Palatino, 'goudy old style', 'minion pro', 'bell mt', Georgia, 'Hiragino Mincho Pro', serif",
'font_size': '18px',
'code_font_family': "'Menlo', 'Deja Vu Sans Mono', 'Consolas', 'Bitstream Vera Sans Mono', monospace",
'code_font_family': "'Menlo', 'DejaVu Sans Mono', 'Consolas', 'Bitstream Vera Sans Mono', monospace",
'code_font_size': '0.8em',
}

@@ -203,15 +193,3 @@
author, 'BrownWaterPython', 'One line description of project.',
'Miscellaneous'),
]


# -- Extension configuration -------------------------------------------------

def setup(app):
app.add_config_value('recommonmark_config', {
'enable_eval_rst': True,
'auto_toc_tree_section': 'Contents',
'enable_math': False,
'enable_inline_math': False,
}, True)
app.add_transform(AutoStructify)
83 changes: 48 additions & 35 deletions docs/exact_type_table.py
@@ -1,34 +1,23 @@
#!/usr/bin/env python
import sys
import tokenize
from pathlib import Path

HEADER = """
.. list-table::
:header-rows: 1
* - Exact token type
- String value
"""

TABLE_ENTRY = """
* - ``{token_name}``{note}
- ``{token_string}``
"""
# TODO: Make these footnotes appear right below the table. See
# https://github.com/executablebooks/MyST-Parser/issues/179

FOOTER = """
.. rubric:: Footnotes
.. [#f1] Due to a `bug <https://bugs.python.org/issue24622>`_, the
``exact_type`` for ``RARROW`` and ``ELLIPSIS`` tokens is ``OP`` in Python
versions prior to 3.7. See `above <#rarrow>`_.
[^f1]: Due to a [bug](https://bugs.python.org/issue24622), the `exact_type`
for `RARROW` and `ELLIPSIS` tokens is `OP` in Python versions prior to
3.7. See [above](rarrow).
.. [#f2] New in Python 3.8.
[^f2]: New in Python 3.8.
"""

def escape(s):
return '\\' + '\\'.join(s)
def code(s):
return '`' + s + '`'

token_types = {num: string for string, num in tokenize.EXACT_TOKEN_TYPES.items()}

@@ -37,23 +26,47 @@ def main():
sys.exit("This script should be run with Python 3.8 or newer.")

print("Generating exact_type_table.txt")

name_column = ['Exact token type']
string_column = ['String value']
for token_type in sorted(token_types):
token_name = tokenize.tok_name[token_type]
token_string = token_types[token_type]
if token_type in [tokenize.RARROW, tokenize.ELLIPSIS]:
note = " [^f1]"
elif token_type == tokenize.COLONEQUAL:
note = " [^f2]"
else:
note = ''

name_column.append(code(token_name) + note)
string_column.append(code(token_string))

name_column_width = len(max(name_column, key=len)) + 2
string_column_width = len(max(string_column, key=len)) + 2

assert len(name_column) == len(string_column)

with open('exact_type_table.txt', 'w') as f:
f.write(HEADER)
for token_type in sorted(token_types):
token_string = token_types[token_type]
if token_type in [tokenize.RARROW, tokenize.ELLIPSIS]:
note = " [#f1]_"
elif token_type == tokenize.COLONEQUAL:
note = " [#f2]_"
else:
note = ''

f.write(TABLE_ENTRY.format(
token_name=tokenize.tok_name[token_type],
token_string=token_string,
note=note,
))
for i, (typ, string) in enumerate(zip(name_column, string_column)):
f.write('|')
f.write(typ.center(name_column_width))
f.write('|')
f.write(string.center(string_column_width))
f.write('|')
f.write('\n')
if i == 0:
f.write('|')
f.write('-'*name_column_width)
f.write('|')
f.write('-'*string_column_width)
f.write('|')
f.write('\n')

f.write(FOOTER)

# touch tokens.md so it forces a rebuild
Path('tokens.md').touch()

if __name__ == '__main__':
main()