Improve Pandoc's syntax highlighting output (Skylighting tokens and span styles are a bit too simplistic) #10115
Replies: 2 comments
Both of these require some programming, I suppose, but nothing too complex. I asked ChatGPT to do 1, and the result looked pretty credible (not tested). The version below replaces its io.popen calls, which could not both write the code and read the result, with pandoc.pipe:

-- A Lua filter for Pandoc that highlights code blocks using the external pygments program

-- Run pygmentize on the code and return the highlighted HTML
local function highlight_with_pygments(code, lang)
  -- io.popen opens a one-way pipe (write-only or read-only), so the code
  -- cannot be written and the output read back through the same process.
  -- pandoc.pipe sends `code` to stdin and returns what the program prints.
  -- Note: the `full` option makes pygmentize emit a complete HTML document.
  return pandoc.pipe(
    'pygmentize',
    {'-l', lang, '-f', 'html', '-O', 'full,style=default'},
    code
  )
end

-- The main function to process code blocks
function CodeBlock(elem)
  -- Get the language of the code block, if one is specified
  local lang = elem.classes[1]
  if lang then
    -- Highlight the code using pygments
    local highlighted_code = highlight_with_pygments(elem.text, lang)
    -- Replace the code block with the highlighted HTML
    return pandoc.RawBlock('html', highlighted_code)
  end
end
The above AI answer didn't work immediately for me, but it was a decent starting point. I'm no Lua expert, but I thought I'd share what I'm using now. It takes about twice as long as the default pandoc highlighter; I'm converting 1365 markdown files in a little over 3 min on a nine-year-old machine. Perhaps my Lua is inefficient, but this is good enough for me as a quick and dirty filter! I'm sure investing in a Haskell version would be the way to go for anyone concerned with speed. I'd love to hear any suggestions for improvement!

-- stx-highlight.lua
-- A Lua filter for Pandoc that highlights code blocks using the external
-- pygments program. Derived from:
-- https://github.com/jgm/pandoc/discussions/10115#discussioncomment-10433143

-- Pygmentize code for some lang
local function pygmentize(code, lang)
  -- write original code block contents to a temporary file
  -- https://stackoverflow.com/a/72273633
  local tmp_filename = os.tmpname()
  local tmp_handle = assert(io.open(tmp_filename, "w"))
  tmp_handle:write(code)
  tmp_handle:close()
  -- run the pygmentize command as a process and read its output
  local prog = string.format(
    "pygmentize -l %s -f html -O cssclass=stx-hl %s",
    lang,
    tmp_filename
  )
  local proc_handle = assert(io.popen(prog, "r"))
  local code_pygmentized = proc_handle:read("*a")
  -- clean up the process handle and temporary file
  proc_handle:close()
  os.remove(tmp_filename)
  return code_pygmentized
end

-- Replace code block content with pygmentized code if lang is specified.
function CodeBlock(elt)
  -- no assert here: a code block without a class is left unchanged
  local lang = elt.classes[1]
  if lang then
    ---@diagnostic disable-next-line: undefined-global
    return pandoc.RawBlock('html', pygmentize(elt.text, lang))
  end
end
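
One possible refinement, offered as a sketch rather than a tested fix: the filter interpolates the language class and the temporary file name into the shell command unquoted, so an unusual class name could break the command or be interpreted by the shell. The helpers below, `shell_quote` and `build_command`, are hypothetical names not present in the thread, and the quoting assumes a POSIX shell (it would not suit `cmd.exe` on Windows).

```lua
-- Hypothetical helper: wrap a string in single quotes for a POSIX shell,
-- escaping any embedded single quote as '\''.
local function shell_quote(s)
  -- gsub returns (string, count); keep only the string
  local escaped = s:gsub("'", "'\\''")
  return "'" .. escaped .. "'"
end

-- Hypothetical helper: build the pygmentize command line used by the
-- filter, with both interpolated values quoted.
local function build_command(lang, filename)
  return string.format(
    "pygmentize -l %s -f html -O cssclass=stx-hl %s",
    shell_quote(lang),
    shell_quote(filename)
  )
end

print(build_command("elm", "/tmp/example.txt"))
```

In the filter, `build_command(lang, tmp_filename)` could stand in for the `string.format` call that produces `prog`; everything else would stay the same.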
I've searched for similar issues, but there are a lot of (closed) syntax highlighting issues! I've made some observations here about how Pandoc outputs its syntax highlighting for Markdown fenced code blocks, as I've been running into some roadblocks when trying to create my own colour themes.
I understand you're using Haskell's Skylighting. I'm not a Haskell guy (I've seen a couple of workarounds, but I'm not a good enough programmer to understand them!).
Pandoc, in general, works just fine and is easy to use and compile for my purposes, but some languages (like Elm) render quite poorly. For example, the .fu class is applied to both functions and their arguments, so the Breeze Dark theme is impossible to recreate.