Better handling of tables in HTML to markdown conversion #222

Closed
jgm opened this issue Jun 10, 2011 · 7 comments

jgm commented Jun 10, 2011

Describe the proposed feature, including illustrative examples.

Currently Pandoc 'handles' tables in HTML to markdown conversion by just
stripping all table markup tags, usually leaving
each table cell's content as a normal text paragraph.

Best of all would be if Pandoc could convert HTML tables not containing any
nested tables into Pandoc markdown tables and leave tables with nested
tables in place as HTML. Second best would be if all tables were just left
as HTML, perhaps with an option controlling whether to strip them or leave
them alone.
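
To make the requested behavior concrete, here is a minimal Python sketch. It is not the attached Perl script and not anything pandoc itself does; the names `_TableReader` and `table_to_markdown` are hypothetical. It assumes the first row is the header, emits pandoc's pipe-table syntax, and returns `None` for a table that contains a nested table so the caller can leave that HTML in place.

```python
from html.parser import HTMLParser


class _TableReader(HTMLParser):
    """Collect rows of cell text from one <table> and note any nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # current <table> nesting depth
        self.nested = False  # set once a table inside a table is seen
        self.rows = []       # list of rows, each a list of cell strings
        self._cell = None    # text pieces of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.nested = True
        elif tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell with no enclosing <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
        elif tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace and escape pipes so the row stays intact.
            text = " ".join("".join(self._cell).split())
            self.rows[-1].append(text.replace("|", "\\|"))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_markdown(table_html):
    """Return a pipe table, or None if the table should stay as HTML."""
    reader = _TableReader()
    reader.feed(table_html)
    reader.close()
    rows = [r for r in reader.rows if r]
    if reader.nested or not rows:
        return None
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]  # pad short rows
    lines = ["| " + " | ".join(rows[0]) + " |",
             "|" + "|".join(["---"] * width) + "|"]
    lines += ["| " + " | ".join(r) + " |" for r in rows[1:]]
    return "\n".join(lines)
```

The sketch ignores `colspan`, `rowspan`, and any block-level content inside cells, so it only covers the simplest tables.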

I attach a Perl script which simulates the behavior I'm looking for as well
as the issue with

 blocks which I file at the same time as this issue.

/BP

Google Code Info:
Issue #: 132
Author: bpjonsson
Created On: 2009-03-05T13:56:10.000Z
Closed On: 2009-12-05T07:56:44.000Z

ghost assigned jgm Jun 10, 2011
jgm closed this as completed Jun 10, 2011

jgm commented Jun 10, 2011

I encountered an issue with my conversion script and some input files, so I changed
it to use a temporary file when converting content with tidy and pandoc through the
shell.

New version attached.

/BP

Google Code Info:
Author: bpjonsson
Created On: 2009-03-11T10:32:06.000Z


jgm commented Jun 10, 2011

If you run pandoc with --parse-raw, the table tags will be passed through unscathed.
Is this sufficient for your needs? I hesitate to try to write an HTML table ->
pandoc table converter, because (i) HTML tables are still sometimes used for layout
in web pages, and (ii) pandoc tables include information about the relative widths of
columns, which would be hard to derive from HTML tables.
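
As an illustration of point (ii), one conceivable heuristic is to guess each column's relative width from its longest cell text. This is only a sketch under that assumption: the function name `relative_widths` is hypothetical, it is not how pandoc works, and real HTML tables may set widths through attributes or CSS that this ignores.

```python
def relative_widths(rows):
    """Estimate each column's share of the table width from cell text length.

    `rows` is a list of rows, each a list of cell strings.
    """
    width = max(len(row) for row in rows)
    longest = [1] * width  # at least 1 so an empty column still gets a share
    for row in rows:
        for i, cell in enumerate(row):
            longest[i] = max(longest[i], len(cell))
    total = sum(longest)
    return [n / total for n in longest]
```

For the rows `[["ab", "abcdef"], ["a", "abc"]]` the longest cells are 2 and 6 characters, giving shares of 0.25 and 0.75. A layout table with one long paragraph in a single cell would skew such an estimate badly, which is part of why the derivation is hard.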

Google Code Info:
Author: [email protected]
Created On: 2009-03-25T18:08:30.000Z


jgm commented Jun 10, 2011

Sorry, I had totally missed --parse-raw. It is indeed what I needed. See, two days
of writing bad Perl for nothing! Not really: I still think the HTML to Pandoc
markdown conversion is kinda neat! :-)

Google Code Info:
Author: [email protected]
Created On: 2009-04-05T12:03:36.000Z


jgm commented Jun 10, 2011

Apologies for writing that last comment under the wrong identity.
[email protected], [email protected] and [email protected] are all
me, in case you haven't noticed!

/BP

Google Code Info:
Author: bpjonsson
Created On: 2009-04-05T12:07:32.000Z


jgm commented Jun 10, 2011

I'd like to close this bug. But I thought your script might be useful to others.
Would you mind if I put a link to it (or the script itself) on the pandoc website,
under Extras? If you think this is a good idea, you should perhaps add a license to
the source file.

Google Code Info:
Author: [email protected]
Created On: 2009-11-01T03:29:53.000Z


jgm commented Jun 10, 2011

Google Code Info:
Author: [email protected]
Created On: 2009-12-05T07:56:44.000Z


jgm commented Jun 10, 2011

I've written a new and IMHO improved script for converting HTML tables into markdown.
It no longer tries to integrate pandoc or tidy: you have to pipe output from pandoc's
HTML to markdown conversion (with the --parse-raw option) into it, like this:

pandoc -r html -R -w markdown in.html | perl html-table2pandoc >out.md

Input must be UTF-8 encoded, and output is always UTF-8 encoded too. Not much of a
limitation, since the same applies to Pandoc!

Google Code Info:
Author: bpjonsson
Created On: 2010-02-06T11:41:43.000Z

jgm added a commit that referenced this issue Feb 27, 2017
Add hypersetup options to beamer templates