Pluggable parsing #113
Conversation
A pluggable content system is necessary to facilitate flexible content generation. The new `content` task is intended to allow easy extension of the content system so that it can grow beyond only markdown.
Will take a closer look later, but in the meantime have you seen #50?
I hadn't! I'll take a look.
The asciidoc implementation in this PR certainly leaves something to be desired compared to the one in #100. However, I only picked asciidoc as an example because it's the first thing listed here: https://en.wikipedia.org/wiki/Lightweight_markup_language :). The real feature here is a unified, user-pluggable content pipeline. If this idea gets good feedback, I would advocate for removing asciidoc from this PR.
@bhagany so one general thing: instead of the multimethod we could just invoke the task multiple times (as often as you have different content types) and pass the "spec" that the multimethods return to the task as a task option. For some supported content types we could provide these "specs" as part of the std-lib.
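If I understand the suggestion, the shape might look something like this sketch (all names here are assumptions for illustration, not perun's actual API):

```clojure
;; Hypothetical "spec" map, one per content type, provided either
;; by the user or by perun's std-lib:
(def markdown-spec
  {:file-exts  ["md" "markdown"]
   :parse-form (fn [files] `(parse-markdown ~files))})

;; The task is then invoked once per content type, with the spec
;; passed as an ordinary task option instead of extending a
;; multimethod:
(comp (content :spec markdown-spec)
      (content :spec asciidoc-spec))
```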
@bhagany also it might be nice to differentiate which content transformation was used for a given entry in the fileset. On that note: I'm still thinking it wouldn't hurt to use specific keys for the results like
It's interesting that you bring up the keyword thing, because that was the starting point of the thought process that led to this PR. I eventually came to the conclusion that markup-language-specific metadata keys aren't all that useful. Some points about that:
That said, I do think there's some tension here, both in parsing input and in rendering output. The tension arises when we want multiple ways to produce the same kind of thing. It feels weird to have multiple tasks set the same metadata key, and for the reasons outlined above, using separate keys for the same kinds of values isn't what I'd like to see either. In my opinion, the ideal situation is having a task set a key that corresponds to the task's name.

The best objection to this that I've been able to think of is that I'd rather keep perun's required dependencies to a minimum, and only pull in libraries that someone actually uses. I'm not sure how to dynamically pull in dependencies yet, but I think it's possible. If I'm on the wrong track there, please let me know - that's pretty close to a deal-breaker in adding more parsers and renderers to perun, imo.

Anyway, I'll try to replace the multimethod in the manner you described, but I don't quite understand what the objection is. I think that any implementation will have to involve a function call of some sort, if dynamically loading dependencies for the pods is a goal. I also think that the multimethod approach leads to a nicer API for
I like the ideas behind this PR.
This seems very reasonable to me. Also, I personally see no big problem in pulling in additional libraries for the included batteries. Other people might have a different opinion, but for me extra libraries were not an issue.
It's interesting to see the different approaches to this. I have been working on something similar where each parser has its own task that only affects the files matching that parser.
Except that this leads to an explosion of keys, and suddenly later tasks have to care about where their input comes from. It shouldn't matter if the content was generated by parsing a single file, processing the contents of a directory, or any other way. Keys should be based on the intent of the output. In fact, I would argue that, like Ring, we should be focused primarily on what the "request map" (or perhaps, for us, page map) contains. We should be writing a spec very much like this: https://github.com/ring-clojure/ring/blob/master/SPEC
This is a really good idea. It might also be good to have a trace stored in a key listing all the transformations applied to a file, something like `{:perun/trace [:content :render]}`. Could be really handy for debugging.
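As a concrete (hypothetical) illustration, a file's metadata after two transformations might look like:

```clojure
;; Assuming each task conj'es its own name onto :perun/trace
;; as it runs (keys other than :perun/trace are made up):
{:filename    "about.md"
 :content     "<p>...</p>"
 :perun/trace [:content :render]}
```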
Noted. I don't feel strongly about it - I was following the lead of someone else in one of the discussions here, I think maybe @Deraen? I probably won't focus on this without more prodding, then.
I think we actually agree here - I'm trying to avoid an explosion of keys, while improving the intuitiveness of the interface, by providing a common interface for input parsing.
I agree here as well. Metadata keys should follow a spec, and wherever possible, perun core task names should correspond to key names in the spec. For input parsing, this is possible (hence this pull request). However, for tasks like
This is a great idea.
Hahahahaha I just reread what I said about it being a dealbreaker above. I guess I am not terribly consistent, forgive me. |
`{:perun/trace [:content :render]}` 👍 I like this. I'm wondering if this could also be used in a nice way to indicate what kind of parser has been used (markdown, asciidoc, ...)? Otherwise also what @bhagany suggested
"Spec" has become a terribly overloaded word lately 😄 What I mean is this:

```clojure
{:file-exts ["md" "markdown"]
 :parse-form (fn [files] `(io.perun.content.markdown/parse-markdown ~files ~options))
 :pod (create-pod markdown-deps)}
```

You now essentially pass them to the task:

```clojure
(content :languages [p/markdown p/asciidoc])
;; p/markdown and p/asciidoc are maps like the one in the previous snippet
```

This way you could just write a map like the one above and pass it to the task. No extending of multimethods from foreign namespaces, no possibility of conflicts (think custom/different markdown implementations). You could easily add another file extension if you wanted to by using plain
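Because the spec is plain data, tweaking it needs no special API; for example (a sketch, with `p/markdown` assumed to be the spec map shown above):

```clojure
;; Add another recognized file extension with ordinary map
;; functions, then pass the modified spec to the task:
(def my-markdown
  (update p/markdown :file-exts conj "mdown"))

(content :languages [my-markdown])
```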
Okay, I understood you here. My initial thought was that if we wanted to keep required dependencies to a minimum, creating the pod outside of a function call wouldn't work. If dependency minimization isn't a goal, I don't think there's much of a difference between our implementations API-wise, and yours does have the advantages you listed, so I don't have any issue using yours. I will make that change soon. |
Also, don't forget to update the examples. (There really need to be some tests that make sure those always work.) |
Future content parsers will need to use the same yaml parsing that `markdown` does. This is a modified form of changes in hashobject#113.
Similarly to #109, I've split the ideas here into several different PR's and future PR's, and I'm pretty happy with that direction, so I'm closing this PR in favor of the others.
This is a proposal for a pluggable content parsing system that can be extended beyond only markdown, both in user code and in perun itself.

Changes:

- Generalized the `markdown` task into the more general `content` task. It takes a vector of keywords that name markup languages to parse, which defaults to `[:markdown]`. Options to each language parser can be passed through using a map that looks like, for example, `{:markdown {:smarts true} :asciidoc {"attributes" {"backend" "docbook"}}}`
- Added a `content-parser` multimethod as an extension point for plugging into the `content` task
- Reimplemented `markdown`'s functionality by implementing a `content-parser` method for `:markdown`, so that by default `(content)` is equivalent to `(markdown)`
- Added an `:asciidoc` method for `content-parser`, to demonstrate the addition of new parsers

I'm aware of one issue with the asciidoc parser: if a file has a yaml metadata block at the beginning, and the asciidoc contains any `----` delimited blocks, an error will be thrown. I'd appreciate some input on this.

Let me know what you think!