Relative images are relative to working directory, not file #3752

Closed
Porges opened this issue Jun 21, 2017 · 68 comments

Comments

@Porges

Porges commented Jun 21, 2017

I'm using 1.19.2.1.

I have a setup like:

  • src
    • 001
      • 001.md
      • 001.jpg
    • 002
      • 002.md
      • 002.jpg

In the .md files I'm trying to include the images via relative links like ![](001.jpg), however when I do this and build from the top level (passing all the .md files as arguments), Pandoc cannot find the images. Instead I must supply the path as relative to the working directory (so src/001/001.jpg), which is a bit clunky.

So: a) is there any way to get my desired behaviour, and b) is the current behaviour intended? I would have expected paths to be relative to the file that they appear in.

Thanks!

@jgm
Owner

jgm commented Jun 21, 2017

Yes, current behavior is intended. Pandoc just acts on a stream of text that may come from files (possibly several files in different directories) or from stdin; it doesn't keep track of what directory the text came from. Thus pandoc foo/bar.txt is equivalent to cat foo/bar.txt | pandoc.

See #852, which added a --resource-path command line option. This should help in your case, though it's not released. (You could try compiling from source or using pandoc-nightly, but be aware there are significant changes in 2.0.) You may have problems with this, though, if you have files with the same name in different directories.
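To make the "stream of text" point concrete, here is a minimal Python sketch. It is not pandoc's code, and the helper names `read_as_stream` and `resolve_against_cwd` are made up for illustration. It shows that once the inputs are concatenated, nothing records which directory a reference came from, so the working directory is the only base left.

```python
import posixpath


def read_as_stream(inputs):
    """Concatenate (path, text) inputs the way `cat a.md b.md` would.

    The paths are dropped on purpose: only the text survives.
    """
    return "\n".join(text for _path, text in inputs)


def resolve_against_cwd(link, cwd):
    """With no record of the source file, a relative link can only be
    resolved against the working directory."""
    return posixpath.normpath(posixpath.join(cwd, link))


# Hypothetical inputs mirroring the layout from the original report.
inputs = [
    ("src/001/001.md", "![](001.jpg)"),
    ("src/002/002.md", "![](002.jpg)"),
]
stream = read_as_stream(inputs)
```

After concatenation, `stream` holds both image references but no trace of `src/001` or `src/002`, which is why `001.jpg` ends up being looked for in the working directory.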

@jgm
Owner

jgm commented Jun 21, 2017

I can think of a somewhat complex way in which this might be improved. Not sure it's worth it, though. Instead of having the reader take a Text as argument, we could have it take something like a list of pairs of filenames and Texts:

data Source = Stdin | File FilePath | Url String
newtype Sources = Sources  [(Source, Text)]
readMarkdown :: PandocMonad m => ReaderOptions -> Sources -> m Pandoc

Most of the readers use parsec parsers; we could define a custom Stream instance for Sources by defining

uncons :: s -> m (Maybe (t, s))

We could store the name of the current source file in the "common state" of the pandoc monad. The readers could then check for the image file, first in the local directory and then in the working directory or resource path, and adjust the path accordingly. (Note: we don't normally do anything like this until the writers.)

This would help with your use case, at the expense of making the pandoc API considerably more complicated. Not sure it's worth it.
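The idea of feeding the reader a list of (source, text) pairs can be sketched in Python as follows. This is only an illustration of the data shape, not pandoc's API: the `Source` class, `chars_with_source` and `source_at` are hypothetical names, and the character-by-character generator is just a loose analogue of a custom `uncons`.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Source:
    kind: str                   # "stdin", "file" or "url"
    name: Optional[str] = None  # path or URL; None for stdin


Sources = List[Tuple[Source, str]]


def chars_with_source(sources: Sources) -> Iterator[Tuple[str, Source]]:
    """Yield every character together with the source it came from."""
    for source, text in sources:
        for ch in text:
            yield ch, source


def source_at(sources: Sources, offset: int) -> Source:
    """Return the source that contributed the character at `offset`
    of the concatenated input."""
    for source, text in sources:
        if offset < len(text):
            return source
        offset -= len(text)
    raise IndexError("offset past end of input")
```

With this shape, a parser that has just read an image link can ask which file it is currently in and look for the image next to that file first.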

@mb21 @jkr I'd be curious if you have any thoughts about this. @jkr, can --file-scope help with this? Perhaps when --file-scope is used we could automatically add the input file's path to resource path. But this wouldn't really help, since file-scope only affects parsing, and currently we don't load resources until the writing phase.

@mb21
Collaborator

mb21 commented Jun 23, 2017

That's a hard one. I've run into this (kind of unexpected) behaviour myself. Then again, it's really nice to have pandoc behave consistently whether input is piped in or read from a file. With that in mind, I don't think making those intrusive and complicating changes is worth it.

@jgm jgm closed this as completed Jun 26, 2017
@Porges
Author

Porges commented Jul 5, 2017

The unfortunate thing is that this is inconsistent with the way GitHub processes Markdown. So when I make my paths relative to the working directory the images render as broken in the online view.

Maybe I'll just write a script to pre-process all my files into another directory before running pandoc...

@mb21
Collaborator

mb21 commented Jul 5, 2017

another thought: we could abstract the file handling from the readers, so they would only get a mediabag or similar interface of files to query which could be instantiated with either files from the working directory or current source file directory – depending on a command line setting for example.

@jgm
Owner

jgm commented Jul 5, 2017

@mb21 Any change that would search for images in the working directory of the source file (when multiple files are specified on the command line) would have to keep track of which source file includes the given image, so we'd need the more complex interface I sketched above.

I did have one thought for a more limited change: perhaps we could automatically set the resource path for images to include the directory of the first file argument. When --file-scope is used, we could set this for each file argument. I think that would work for this use case.

@jgm jgm reopened this Jul 5, 2017
@Wolf-SO

Wolf-SO commented Jul 5, 2017

@mb21 isn't this (specifying the image path via a command-line option) also helpful for multi-target publishing scenarios (HTML vs. PDF) where you need images in different resolutions, which could be accomplished by changing folders? I find using --default-image-extension= for managing image resolutions somewhat cumbersome.

@Porges
Author

Porges commented Jul 6, 2017

@jgm I did actually try --file-scope in the hope that it would work. The downside is the lack of being able to link across files in that case, but I might be able to live with that (as long as it doesn't break pandoc-citeproc).

@Prajjwal

--file-scope does not alleviate the issue.

A possible workaround for now could be writing a wrapper script that cd's into the correct directory, generates an output file for each input, and later concatenates the result. This is cumbersome, and I'm not even sure if it can be done with ePub output.

Another equally cumbersome workaround could be to pipe the markdown through sed, replacing the relative URLs with absolute ones.

I suppose we shall have to wait for the --resource-path option to hit stable and hope it works.
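The preprocessing workaround (rewriting relative image URLs to absolute ones before the text reaches pandoc) might look roughly like this in Python instead of sed. This is a sketch under stated assumptions: `absolutize_images` and `is_relative` are hypothetical helpers, the regular expression only handles simple inline images such as `![alt](path)`, and titles, reference-style images and LaTeX `\includegraphics` are ignored.

```python
import posixpath
import re

# Rough pattern for inline Markdown images: ![alt](target)
IMAGE = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)(\))")


def is_relative(target):
    """True for targets that are neither absolute paths, fragments nor URLs."""
    if target.startswith(("/", "#")):
        return False
    return re.match(r"[A-Za-z][A-Za-z0-9+.-]*:", target) is None


def absolutize_images(text, source_dir):
    """Rewrite relative image targets against `source_dir`, the absolute
    directory of the file that `text` was read from."""
    def fix(match):
        target = match.group(2)
        if is_relative(target):
            target = posixpath.join(source_dir, target)
        return match.group(1) + target + match.group(3)

    return IMAGE.sub(fix, text)
```

Each input file would be passed through `absolutize_images` with its own directory, and the rewritten texts concatenated and piped to pandoc on stdin.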

@archonic
Contributor

archonic commented Jun 2, 2018

Does anyone know if --resource-path has hit stable yet?

@mb21
Collaborator

mb21 commented Jun 3, 2018

Actually, --resource-path is already released. See the MANUAL for usage.

@Porges
Author

Porges commented Jun 3, 2018

I guess to emulate my desired behaviour I can provide all the directories as --resource-path arguments (as long as I don't duplicate any filenames...)
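A small Python sketch of that approach: collect the directory of every input file into a single --resource-path value, and check up front for file names that occur in more than one directory, since those would be ambiguous. The function names are hypothetical. The separator defaults to `os.pathsep`, on the assumption that it matches the separator pandoc expects on the current platform (`:` on Linux and macOS, `;` on Windows).

```python
import os
import posixpath
from collections import defaultdict


def resource_path_for(inputs, sep=os.pathsep):
    """Build a --resource-path value from the directories of the inputs.

    The working directory "." comes first; the other directories follow
    in order of first appearance.
    """
    dirs = ["."]
    for path in inputs:
        directory = posixpath.dirname(path) or "."
        if directory not in dirs:
            dirs.append(directory)
    return sep.join(dirs)


def duplicate_names(files):
    """Map each basename found in more than one place to its paths."""
    seen = defaultdict(list)
    for path in files:
        seen[posixpath.basename(path)].append(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

The result of `resource_path_for` could then be passed as `--resource-path=...`, after confirming that `duplicate_names` returns an empty dict for the image files involved.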

@agusmba
Contributor

agusmba commented Jun 5, 2018

> I guess to emulate my desired behaviour I can provide all the directories as --resource-path arguments (as long as I don't duplicate any filenames...)

@Porges, depending on your workflow, wrappers that automatically cd into different directories such as pandocomatic could be of use to you.

@Porges
Author

Porges commented Jun 5, 2018

@agusmba I'm generating a single file output from all the input files, so I don't think that would work.

@jacebenson

I came across this problem when exploring using Pandoc. I ended up renaming each image to something custom, like the endpoint or the post title, then a hyphen, then the image name. Then I ran this script to generate the command to run. YMMV https://gist.github.com/jacebenson/f6eba3a293def19bf8184defbf274dcc

@Agarwal-Nikhil

@tarleb by using --resource-path=.. Images defined using Markdown syntax work, but those defined with LaTeX syntax don't.

@qaisjp

qaisjp commented Jan 31, 2020

I've run into this issue as well. My directory structure is like this. And I just define input files like chapters/**/*.md which is (probably) expanded by Bash to each markdown file deep inside the chapters folder.

A filter like this one, except for pandoc 2, would work nicely. Except we'd still need a way to determine what folder our markdown files are in. #3342 (comment)

@mb21
Collaborator

mb21 commented Feb 2, 2020

> @mb21 Any change that would search for images in the working directory of the source file (when multiple files are specified on the command line) would have to keep track of which source file includes the given image, so we'd need the more complex interface I sketched above.

I keep re-reading this sentence and I'm not sure I get it. I mean, I get:

> would have to keep track of which source file includes the given image

but why does this imply that:

> we'd need the more complex interface I sketched above


Why do we have to call the reader with a list of inputs [(Source, Text)]? Couldn't we do this similarly to how --file-scope is implemented? Just call the reader multiple times in App.hs, once for each input file, but each time with the resource-path (in the PandocMonad env) set to the file's directory.

There seem to be four modes:

| | paths are relative to cwd (--resource-path=.) | paths are relative to each input file |
|---|---|---|
| share reader state across files | pandoc | pandoc --resource-relative-to='file' |
| reset reader state after call to each reader | pandoc --file-scope | pandoc --file-scope --resource-relative-to='file' |

I'm not sure what the best name for this --resource-relative-to='file'|'cwd' option is. It feels a bit like how a static site generator works... you might also want an option to give pandoc an input directory instead of individual input files, then it would default to --resource-relative-to='file'.

@qaisjp

qaisjp commented Feb 2, 2020

> you might also want an option to give pandoc an input directory instead of individual input files, then it would default to this new mode

I already kind of do this but lean on the shell to expand globs: pandoc chapters/**/*.md

@jgm
Owner

jgm commented Feb 2, 2020

@mb21 if you do it the way we do with file-scope, calling the reader for each one, then things like reference links and footnotes won't work between files (as with file-scope).

@mb21
Collaborator

mb21 commented Feb 10, 2020

> if you do it the way we do with file-scope, calling the reader for each one, then things like reference links and footnotes won't work between files (as with file-scope).

@jgm I meant not do it exactly the same way... but in a similar way: call the reader multiple times in App.hs, once for each input file, but share/pass the state along each time. I guess that would mean exporting a second function from each reader, e.g. readMarkdown opts carryOverState text, or somehow provide an ability in PandocMonad for a reader to store its state for when it's called the next time...

@jgm
Owner

jgm commented Feb 10, 2020

It's tricky because state types are very heterogeneous. I suppose one approach would be to put something in PandocMonad for the inputs: [(FilePath, Text)]. The parsers could then be rewritten so that after parsing blocks, they move to the next item in this list, reset the source position (and maybe the working directory?), and add the text to the input stream, then parse again. This would require a lot of rewriting of parsers (and maybe there'd be problems in some cases) but the reader types could at least stay stable.

Indeed, a lot of this logic is already included in our current facility for handling include files in latex, RST, and other formats.

@jgm
Owner

jgm commented May 24, 2021

As for implementation, I'm thinking this would involve a new reader option and changes to the Markdown reader. It would also be possible to implement this as an AST transform after the reader, but this would still require changes to the reader, which would have to insert the needed information about the source location of the image or link elements.

@qaisjp

qaisjp commented May 24, 2021

Btw I think there's an ambiguity in --rebase-relative-paths=. — I believe it's easy to confuse it with --rebase-relative-paths=$(pwd).

@brainchild0

On option 3, any opinions about the option name?

--resource-path-target, --resource-target-path, --resource-base.

I would suggest --resource-path-base or --resource-base-path, but I suppose they are too easy to confuse with the existing option.

@jgm
Owner

jgm commented May 24, 2021

--rebase-relative-paths=. — I believe it's easy to confuse it with --rebase-relative-paths=$(pwd)

I'm not quite sure what you have in mind here. . and $(pwd) refer to the same directory, after all. But there is a question whether these two things should do the same thing, or something different. E.g. should a reference to img.jpg occurring in a/x.md be rewritten to a/img.jpg with the former and /home/user/docs/a/img.jpg (or whatever) with the latter?

I'm somewhat tempted to drop the argument and just always make the rewriting relative to the working directory. I can think of a few cases where the argument could be helpful, but in all those cases you could just change directories before calling pandoc to deal with the issue.

--resource-path-target or anything like that should be avoided. We have --resource-path already, and that refers to something different.

@qaisjp

qaisjp commented May 24, 2021

Oh, I thought =. was for the behaviour of "reading from the directory of the markdown file", not for "reading from the cwd of this invocation"

@jgm
Owner

jgm commented May 25, 2021

Further simplification, removing the optional argument. (If there is a demand for this, we can always add it later without compromising backwards compatibility.)

--rebase-relative-paths

Rewrite relative paths for Link and Image elements, depending
on the path of the file containing the link or image link.
For each link or image, pandoc will compute the directory of
the containing file, relative to the working directory, and
prepend the resulting path to the link or image path.

The use of this option is best understood by example.
Suppose you have a subdirectory for each chapter of a
book, chap1, chap2, chap3. Each contains a file
text.md and a number of images used in the chapter. You
would like to have ![image](spider.jpg) in chap1/text.md
refer to chap1/spider.jpg and ![image](spider.jpg) in
chap2/text.md refer to chap2/spider.jpg. To do this,
use

pandoc chap*/*.md --rebase-relative-paths

Without this option, you would have to use
![image](chap1/spider.jpg) in chap1/text.md and
![image](chap2/spider.jpg) in chap2/text.md. Links with
relative paths will be rewritten in the same way as images.
This option currently only affects Markdown input.
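The rewriting rule described in this draft can be illustrated with a few lines of Python. This is a sketch of the path arithmetic only, not the Markdown reader's implementation. `rebase` is a hypothetical name, and leaving absolute paths, URLs, empty links and pure fragments untouched is an assumption of the sketch.

```python
import posixpath
import re


def rebase(link, containing_file):
    """Prepend the directory of `containing_file` (given relative to the
    working directory) to a relative link or image path."""
    if not link or link.startswith(("/", "#")):
        return link  # empty, absolute or fragment-only: leave alone
    if re.match(r"[A-Za-z][A-Za-z0-9+.-]*:", link):
        return link  # looks like a URL with a scheme: leave alone
    directory = posixpath.dirname(containing_file)
    return posixpath.join(directory, link) if directory else link
```

For the example in the text, `rebase("spider.jpg", "chap1/text.md")` gives `chap1/spider.jpg` and the same link in `chap2/text.md` gives `chap2/spider.jpg`, while a file in the working directory itself keeps its links unchanged.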

@jgm
Owner

jgm commented May 25, 2021

Implementing this would require:

  • update manual
  • add tests
  • add reader option, readerRebaseRelativePaths
  • modify markdown reader to be sensitive to this
  • add command-line option
  • handle option in defaults files

@dstadelm

dstadelm commented May 25, 2021

I'm having the same issue with relative paths for ReST. The ReST documentation at https://docutils.sourceforge.io/docs/ref/rst/directives.html#including-an-external-document-fragment states

> The "include" directive reads a text file. The directive argument is the path to the file to be included, relative to the document containing the directive. Unless the options literal, code, or parser are given, the file is parsed in the current document's context at the point of the directive.

Therefore for a directory structure as follows
[image: directory structure]

with the following contents:

foo.rst

.. include:: bar/bar.rst

bar.rst

.. include:: here.rst
.. include:: there/file.rst

This should work if I'm reading the reST specification correctly. I'm not sure about @jgm's comment #3752 (comment): even if the main file is passed via stdin (for example using cat), the include statements still refer to actual file locations. Since the main file is the only file initially given, the only assumption needed is about the location of the main file, and there I would assume pandoc's working directory.

However, I've seen that docutils, which defines this behaviour, also doesn't comply with its own specification 😥
Correction: this is wrong. It seems Sphinx is messing it up; docutils handles it correctly.

It would make life so much easier, as one could write modules containing documentation and move them around freely without worrying about the context they live in.

@qaisjp

qaisjp commented May 25, 2021

One last piece of feedback from me, as a person who encountered this issue in the wild, I would also like to propose this:

Hint for ambiguities

Flags: nothing special
Where: chap1/text.md refers to spider.jpg
Files present: chap1/spider.jpg and /spider.jpg
Outcome: print a warning like

[W] Reference "spider.jpg" in `chap1/text.md` is ambiguous.
It currently refers to "/spider.jpg" but could also refer to "chap1/spider.jpg".
To pick the latter, provide `--rebase-relative-paths`.
To suppress this warning, provide `--no-rebase-relative-paths`.

Hint for broken references

Flags: nothing special
Where: chap1/text.md refers to spider.jpg
Files present: chap1/spider.jpg but not /spider.jpg
Outcome: print a warning like

[W] Reference "spider.jpg" in `chap1/text.md` could not be found at "/spider.jpg".
If you meant to refer to "chap1/spider.jpg", provide `--rebase-relative-paths`.

Ignore ambiguities if a specific flag is specified

Flags: --no-rebase-relative-paths or --rebase-relative-paths
Where: chap1/text.md refers to spider.jpg
Files present: either of the above scenarios
Outcome: no warnings


The pattern of prefixing with --no- is derived from Git. It's been over a year since I've used pandoc, so I have zero context on how warnings work and what the CLI parameter patterns are.

One shortcoming of this proposal is that it doesn't work great if you want to opt one Markdown file into rebasing relative paths but not another, but I don't envisage that being a popular use case.
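The three cases in this proposal could be expressed as a small decision function. The Python below is only a sketch of the proposed logic: `check_reference` and its `flag` parameter are hypothetical, pandoc has no such function, and "/spider.jpg" from the examples is treated here as `spider.jpg` relative to the working directory.

```python
import posixpath


def check_reference(link, containing_file, files, flag=None):
    """Return a hint string for a relative reference, or None.

    files: set of existing paths, relative to the working directory.
    flag:  None, "rebase" or "no-rebase"; an explicit flag silences hints.
    """
    if flag is not None:
        return None
    local = posixpath.join(posixpath.dirname(containing_file), link)
    if local == link:
        return None  # the file lives in the working directory itself
    in_cwd = link in files
    in_local = local in files
    if in_cwd and in_local:
        return f'Reference "{link}" in `{containing_file}` is ambiguous.'
    if in_local:
        return (f'Reference "{link}" in `{containing_file}` could not be '
                f'found; did you mean "{local}"?')
    return None
```

Passing the set of existing files in as an argument keeps the sketch free of filesystem access, so the three scenarios can be exercised directly.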

@jgm
Owner

jgm commented May 25, 2021

@dstadelm this issue with RST includes is already fixed (#6632) but the fix isn't yet in an official release. I've done some experiments on image links in RST, and it doesn't look as if RST rebases image links in the way I've described above; so far, then, this would be a Markdown-only feature.

@jgm
Owner

jgm commented May 25, 2021

Here are some more options that should be considered:

Option 4: No extra command-line option. Do the image resolution (and load binary data into the media bag) in the Markdown reader. Always look for the image first relative to the file containing the image link, and then in the resource path. Modify the image path to match the first matching image. Emit an INFO message indicating which path has been matched. Emit a WARNING message if nothing is found (this is done already now, but in the writers, and only for formats that include image data and not just links).

Potential advantages:

  • This would just "do the right thing automatically" in the vast majority of cases.
  • It would allow the --resource-path to affect not just output formats like docx, but formats like HTML that employ image links; the resource path could affect what path is used in the link.

Potential drawbacks:

  • Some users may find it objectionable that their image links get rewritten based on what is actually on the file system.
  • We'd be doing unnecessary IO when converting to formats that don't require image data (like HTML); this would affect performance considerably for image-heavy conversions.
  • People who have set the resource-path so that it doesn't include the working directory might have breakages, since images would be sought relative to the including file (which could be the working directory) before the resource path is consulted.

Option 5: Like Option 4, but activate this behavior only if a command-line option is used (say, --resolve-image-links).
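The lookup order described for Option 4 (first next to the containing file, then along the resource path) can be sketched in Python as follows. This is an illustration of the search order only. `resolve_image` is a hypothetical name, and the `exists` callback stands in for a filesystem check such as `os.path.exists`.

```python
import posixpath


def resolve_image(link, containing_file, resource_path, exists):
    """Return the first existing candidate path for `link`, or None.

    Candidates are tried in this order: the directory of the containing
    file, then each directory listed in `resource_path`.
    """
    candidates = [posixpath.join(posixpath.dirname(containing_file), link)]
    candidates += [posixpath.join(d, link) for d in resource_path]
    for candidate in candidates:
        candidate = posixpath.normpath(candidate)
        if exists(candidate):
            return candidate  # this is the path the link would be rewritten to
    return None  # nothing found: the case that would trigger a WARNING
```

The third drawback listed above is visible here: because the containing file's directory is always tried first, a resource path that deliberately omits the working directory no longer fully controls which file is picked.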

jgm added a commit that referenced this issue May 26, 2021
- Add manual entry for `--rebase-relative-paths`.
- Add option `--rebase-relative-paths`, which rewrites
  relative image and link paths by prepending the (relative)
  directory of the containing file.
- Enable `rebase-relative-paths` in defaults files.
- Add `readerRebaseRelativePaths` to ReaderOptions record
  [API change].
- Make Markdown reader sensitive to `readerRebaseRelativePaths`.
- Add tests for #3752.

Closes #3752.
@jgm
Owner

jgm commented May 26, 2021

I've pushed a rebase-relative-paths branch that implements Option 3, if anyone wants to try it out.

jgm added a commit that referenced this issue May 26, 2021
- Add manual entry for (non-default) extension
  `rebase_relative_paths`.
- Add constructor `Ext_rebase_relative_paths` to `Extensions`
  in Text.Pandoc.Extensions [API change]. When enabled, this
  extension rewrites relative image and link paths by prepending
  the (relative) directory of the containing file.
- Make Markdown reader sensitive to the new extension.
- Add tests for #3752.

Closes #3752.

NB. currently the extension applies to markdown and associated
readers but not commonmark/gfm.
@jgm
Owner

jgm commented May 26, 2021

My new thought is that it makes more sense for this to be an extension.
This idea is developed in the rebase-relative-paths-extension branch.

@jgm jgm closed this as completed in 834da53 May 27, 2021
@jgm
Owner

jgm commented May 27, 2021

This is in master branch now. Currently the extension only works for markdown. I am working on making it work with commonmark and gfm, but this requires changes in commonmark-hs.

jgm added a commit that referenced this issue May 30, 2021
The immediate reason for this is to allow the test output of #3752
to work on both windows and linux.
@Altair-Bueno

Altair-Bueno commented Jan 3, 2022

TLDR

Book
   ├── resources
   │   └── A1.png
   ├── Chapter-1
   │   └── ej1.md
   ...
<!-- ej1.md -->
# Hello world

![Image](../resources/A1.png)

Render every chapter with:

pandoc "--from=markdown+rebase_relative_paths" --output=Book.pdf Chapter**/*.md

More info

Search for "Extension: rebase_relative_paths" in pandoc's manual.
