Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MD -> LaTeX: gather is not properly generated #974

Closed
LecrisUT opened this issue Mar 13, 2024 · 17 comments · Fixed by #977
Closed

MD -> LaTeX: gather is not properly generated #974

LecrisUT opened this issue Mar 13, 2024 · 17 comments · Fixed by #977
Labels
bug Something isn't working

Comments

@LecrisUT
Copy link
Contributor

Description

Consider the following document:

Some introduction text

\begin{gather}
    E=mc^2
\end{gather}

The issue is that the LaTeX document that it generates becomes:

Some introduction text

\begin{equation}
\begin{gather}
    E=mc^2
\end{gather}
\end{equation}

This should not be surrounded by equation block

Proposed solution

  • Distinguish which environments need to be within a math environment, e.g. gathered vs gather
  • Distinguish which environments handle numbering gathered vs gather
  • Distinguish when we want something to be numbered and not gather vs gather*

Then conditionally add either equation or equation* or nothing to the LaTeX document

@LecrisUT LecrisUT added the bug Something isn't working label Mar 13, 2024
@agoose77
Copy link
Contributor

The document that you are rendering is built in two phases:

  1. Parsing the LaTeX into our MDAST
  2. Writing the MDAST into LaTeX

Looking at the writer, we acknowledge that we don't yet handle AMS math. I did wonder whether we should flag the AST to indicate that the content is ASMMath, but that would probably require annotating the .data field, or adding a container.

So, my guess is we might want to just identify AMSMath from the content of the node. In the writer, this would probably be a simple as testing for \\\\\{\w+\*?\} at the start of the content. Would you be interested in adding this? I could walk you through it if you've not contributed before (but I do recognise your handle, so perhaps you have).

@rowanc1
Copy link
Collaborator

rowanc1 commented Mar 13, 2024

There is a TODO in the writer here:
https://github.com/executablebooks/mystjs/blob/main/packages/myst-to-tex/src/math.ts#L50

Depending on how deep this goes, we might need to have an update to the tokenizer:
https://github.com/executablebooks/markdown-it-amsmath

@LecrisUT
Copy link
Contributor Author

LecrisUT commented Mar 13, 2024

a simple as testing for \\\\\{\w+\*?\} It would be better to be more explicit with the \w+ and handle AMSMath specifics, unless we want to pass all \begin/\end environments as is? I am not sure how that would work on the site generation side.

Would you be interested in adding this? I could walk you through it

Sure, some help on this would be good, since this is a show-stopper for trying out mystmd

Depending on how deep this goes, we might need to have an update to the tokenizer:
https://github.com/executablebooks/markdown-it-amsmath

That does seem reasonable. I was thinking for this one, to add another MATH_HANDLER for ams math which just passes it down as is. It would be tricky to workout the label, but maybe we can regex the content and register any \label{*} command we see inside it?

I think the issue here is that \begin{amsmath_env} are implicitly wrapped by :::{math}

@agoose77
Copy link
Contributor

agoose77 commented Mar 13, 2024

If you open a PR, we can collaborate on it.

It would be better to be more explicit

I think we can assume that the contents of the math node are determined by the tokenizer/parser phase, so if we see an environment of any kind it has already been "validated". As such, we just want to re-use the existing environment rather than wrap in a math environment.

The question will be what to do about the label and enumerated options that the user can set on the AST node via directive options (i.e. for MyST sources, not LaTeX). There may be some merit in parsing the environment and modifying it if the user has e.g. written

:::{math}
:enumerated: true

\\begin{gather*}
...
\\end{gather*}
:::

Because these options are only set if the user explicitly creates a math environment in MyST markdown, and specifies them.

We could alternatively ignore them, but I'm somewhat against that.

@LecrisUT
Copy link
Contributor Author

:::{math}
:enumerated: true

\\begin{gather*}
...
\\end{gather*}
:::

I would say if the user combines math and amsmath it is ill-defined. Do they want the labeling to be as subequations, do they want one to have priority over the other, etc. We should throw an error in those situations and ask them to review it.

Caveat with gathered and friends that are meant to be used inside an already defined math environment. There the intent is well defined and we should just skip it. Should ensure that :::{math} remains a single line math for those though.

If you open a PR, we can collaborate on it.

As soon as I have any commit to open it with :D.

I think we can assume that the contents of the math node are determined by the tokenizer/parser phase

Thanks for the pointer. That would still be in the myst-to-tex? I went through a search of gather and couldn't find anything to tell me where that parsing occurs

@agoose77
Copy link
Contributor

That would still be in the myst-to-tex?

That should be tex-to-myst!

@rowanc1
Copy link
Collaborator

rowanc1 commented Mar 13, 2024

For some background/info on a few of the code-paths at play:

When parsing a tex file (no markdown):

tex-to-myst --> AST --> transformations --> myst-to-tex

When parsing tex AMS environment in a markdown file:

markdown-it-amsmath --> myst-parser --> AST --> transformations --> myst-to-tex

I think that in this case you are likely writing in markdown? In which case tex-to-myst is never involved.

For the labels, we parse those out of the equations so that we can do cross-references, etc. in myst. There is also some handling for sub-equations, with each of their own labels. All of this is done in a transformation on the math node in the AST.

@LecrisUT
Copy link
Contributor Author

Hmm, so if markdown-it-amsmath is used, then it is quite a red-herring. It seems that the main point is where does it choose to use the math handler/where does it inject the math block

@agoose77
Copy link
Contributor

@LecrisUT I just had a meeting with @rowanc1 where it became clear I think I misunderstood the document type that you're using; I'd thought you were using LaTeX -> LaTeX, but it seems like you're actually reading MyST -> LaTeX. I'm just working on some other PRs at the moment, but I'll circle back around to this shortly.

@agoose77
Copy link
Contributor

It seems to me that we want to include the environment in the math AST node. As such, we also want to support the environment in the math directive e.g.

:::{math}
\begin{gather}
...
\end{gather}
:::

We already strip out the "basic" environments in https://github.com/executablebooks/mystmd/blob/974dde179c40ff57908b55925baf9212455a16bf/packages/myst-transforms/src/math.ts#L251

I believe we should leave the AMSMath environments because there's no strong push factor to use an alternative. We could lift information about the environment to a new attribute of the math node, but I don't think there's a strong benefit, and presently it would probably complicate things more. It's really a question of "what is math".

My proposal is do the simple thing now, and we can "fix" it in future if we realise it was the wrong call.

@LecrisUT
Copy link
Contributor Author

LecrisUT commented Mar 13, 2024

Hmm, what about for a temporary fix, inside
https://github.com/executablebooks/mystmd/blob/974dde179c40ff57908b55925baf9212455a16bf/packages/myst-to-tex/src/math.ts#L49-L54
We add a quick check for \\begin\{(gather\*?|align\*?)\} and if that passes, we simply print the content as it?

For a more proper call, the node should tell us that it is a asmmath type and what environment it's from, but I'm not sure where that would have to go

@agoose77
Copy link
Contributor

agoose77 commented Mar 13, 2024

Yes, there are two questions here:

  1. Short-term fix?
  2. Long-term changes?

I think we should look to fix this as-is, and discuss longer-term AST changes in a new issue :)

I think we should try and strip the ASMMath environment so that we can add labels. This makes things more complicated, but is the "right" thing to do imo.

@LecrisUT
Copy link
Contributor Author

I think we should try and strip the ASMMath environment so that we can add labels. This makes things more complicated, but is the "right" thing to do imo.

Wait, I thought the opposite, i.e. strip the equation wrapper when asmmath is detected, and give an error if label is added there just in case. The issue is equation is always a single-line math environment, while gather/align are multi-line

@agoose77
Copy link
Contributor

The issue is equation is always a single-line math environment, while gather/align are multi-line

That's a good point! Hmm. Maybe the proper solution is then to properly parse AMSMath environments into multiple math nodes. But, that's a complex fix. So, for now, let's just pass-through the environment as-is, as you suggest!

@rowanc1
Copy link
Collaborator

rowanc1 commented Mar 13, 2024

This note is a bit off-topic (in that we should pursue the simple approach suggested in the PR), however, I want to explain some of the previous thinking/work we are trying to put in this direction.

There are some changes we made to parsing latex subequations and for align environments to allow these to be picked up and translated to both JATS (used for archiving scientific documents) and the HTML view with basic rendering and cross-references. The challenge in representing this in MyST (and in JATS) is that those environments need to be in charge of the equation labelling and numbering (with specific nodes/attributes for the cross references), and this fights with latex (or at least needs to share understanding). There is actually no way to do a multi-line, labeled subequations in JATS for example, and the way people do it with an disp-formula-group completely destroys the alignment that is the goal of the align and gather groups in the first place. So we are stuck somewhere between this:

image

and this:

image

The way that I am thinking that we get around that is actually by holding both of these variants in the tree, and we introduced a mathGroup (similar to a citeGroup) that has the raw latex to be rendered as well as the children nodes. Right now that mathGroup only renders the children/numbers, but we should update the UI to show the original with the labels they will need to be generated by katex rather than myst (then updating the enumerators with a post html transform or css perhaps to match the numbering). A start in that direction which works from tex-->jats is shown in a test here. I think the combination of these with the fixes suggested above should allow us to fully support both the semantics and the visual-alignment.

@LecrisUT
Copy link
Contributor Author

Could mathGroup behave like admonitions in that on the markdown side it's a container like the \begin{} environments that we can pass specific class: align? Something like

:::{mathGroup}
:class: align

x + y & \geq 0

x - y & \leq 4

x \times y & \geq 1

x^y + y^x &\geq 1
:::

would generate

\begin{align}
x + y & \geq 0 \\
x - y & \leq 4 \\
x \times y & \geq 1 \\
x^y + y^x &\geq 1
\end{align}

Not sure how to deal with label in that case either unless we just allow them to be defined in there as well and just parse the contents of it from TeX and back.

@LecrisUT LecrisUT changed the title LaTeX: gather is not properly generated MD -> LaTeX: gather is not properly generated Mar 13, 2024
@rowanc1
Copy link
Collaborator

rowanc1 commented Mar 15, 2024

I moved my comment to #992 to continue the conversation as #977 will close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants