This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Generate ExDoc documentation for Erlang projects #4

Open
16 tasks done
erszcz opened this issue Nov 18, 2019 · 54 comments

Comments

@erszcz
Collaborator

erszcz commented Nov 18, 2019

This thread documents the progress on generating ExDoc documentation for Erlang projects.

Current status

erszcz/edoc@0c81cea can be used to generate chunks compatible with erlang/otp@dd3b015 as follows:

git clone https://github.com/ferd/recon
cd recon
cat >> rebar.config <<END

{plugins,
 [
  {rebar3_edoc_chunks, {git, "https://github.com/erszcz/edoc.git", {branch, "master"}}}
 ]}.

{provider_hooks,
 [
  {post, [{compile, {edoc_chunks, compile}}]}
 ]}.
END
rebar3 compile
ls _build/default/lib/recon/doc/chunks/

ToDo

  • Aim for 100% compatibility with application/erlang+html format.
  • Define a set of HTML tags which are allowed in the chunks (likely starting with tags from https://developer.mozilla.org/en-US/docs/Web/HTML/Element but removing those which are obviously not useful in an .erl file / non-dynamic webpage).
  • Make sure EDoc documentation discourages use of raw HTML in doc comments.
  • Make sure that links to modules/functions/types/callbacks are stored properly.
  • Include since and deprecated information in the chunks:
    • @since
    • @deprecated - partially supported, see erszcz/edoc@ed5f585. EDoc @deprecated might contain markup, but EEP-48 expects a flat binary() for this metadata field - currently, all markup is dropped in the chunks.
  • Remove temporary edoc_layout_chunk_markdown layout and refactor the rest for simplicity:
    • Remove edoc_layout_chunk_markdown.
    • Refactor.
  • Prepare a PR to erlang/otp - this will likely squash the forked project history.
  • Once the PR is created, if need be, also fix NEW-OPTIONS: / INHERIT-OPTIONS: doc comments - I'm not sure how to use these yet, though.
  • From 2020-03-10 WG meeting: have a command line tool to generate the chunks.
    The escript doesn't yet handle -I, i.e. adding custom include paths, but so far this has not been a problem.

2020-04-01 update:

  • tt, expr, see, strong tag handling - these pop up in Recon and EDoc itself
  • ex_doc consuming doc chunks could not generate EDoc documentation due to the types exported to chunk and types exported to Dbgi not matching. This is solved by rewriting all @spec / @type tags to -spec/-type attributes (currently only for edoc.erl)
  • HTML generated by ex_doc for edoc.erl (after spec/type rewrite) has some documentation entries truncated, e.g. for the deprecated file/2
@erszcz
Collaborator Author

erszcz commented Nov 18, 2019

Code blocks and inline code formatting are back.

@gomoripeti

I tried docs_chunks with the wm-erlang branch of ex_doc (without rebar3) on an Erlang app. All worked fine with a minor modification: I had to add config.source_dir to the code path, otherwise ExDoc.Retriever.docs_from_files couldn't load the module (although it has the full path to the beam).

@erszcz
Collaborator Author

erszcz commented Nov 18, 2019

@gomoripeti Thanks for the feedback!
The EDoc fork and docs_chunks are now separate codebases. Since the new formatter required some changes in (now called) edoc_chunks, could you give the rebar plugin (and therefore edoc_chunks) a go to see if it still behaves properly on your app?

Moreover, as far as I understand @wojtekmach is not really interested in supporting docs_chunks in the long term, while the EDoc fork has a chance of eventually landing in OTP.

@wojtekmach

Yeah, I consider docs_chunks a workaround and while I'm happy to accept bug fixes etc, I don't plan to extend it beyond what it is so far. I'm also happy to keep it around to test some ideas out. Long term, I'm looking forward to having chunk generation upstream in OTP, in edoc or erlc.

@erszcz
Collaborator Author

erszcz commented Nov 18, 2019

For the record:

ExDoc docs are generated properly for docsh, but are not for edoc itself.

ExDoc expects a -type attribute to be present in Dbgi if a @type EDoc tag of the same name is present in Docs. This causes it to crash on edoc.erl:

** (MatchError) no match of right hand side value: nil
    (ex_doc) lib/ex_doc/retriever.ex:416: ExDoc.Retriever.get_type/3
    (ex_doc) lib/ex_doc/retriever.ex:406: anonymous fn/4 in ExDoc.Retriever.get_types/2
    (elixir) lib/enum.ex:1948: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ex_doc) lib/ex_doc/retriever.ex:405: ExDoc.Retriever.get_types/2
    (ex_doc) lib/ex_doc/retriever.ex:134: ExDoc.Retriever.do_generate_node/3
    (ex_doc) lib/ex_doc/retriever.ex:120: ExDoc.Retriever.generate_node/3
    (elixir) lib/enum.ex:2994: Enum.flat_map_list/2
    (ex_doc) lib/ex_doc/retriever.ex:43: ExDoc.Retriever.docs_from_modules/2

@wojtekmach

wojtekmach commented Nov 19, 2019

I vaguely remember reading somewhere (but don't quote me on this) that writing typespecs is preferred over writing edoc typespec tags. If that's correct, I'd focus on that scenario and maybe even warn when edoc typespec is found. And if that's not correct, I'd be curious to learn about that use case. In any case, I made a note of this in my internal ExDoc list.

@KennethL

I think we have to decide what solutions and what formats we are going for before we join our forces and implement the things we agreed on.

@josevalim and I have had several discussions after summer about what problem to solve and how. @wojtekmach showed a very nice proof of concept about what at least I think we should go for. Below is my summary of that:

We want to use the doc chunk format specified in EEP-48 as the basis for a standardized way of making documentation available for use in the interactive shell and for use via the Language Server Protocol for presentation in an editor or IDE.

As doc chunks are also the input to ExDoc for generating nice HTML and EPUB documentation, we want to make ExDoc support Erlang documentation as well (and possibly other languages).

For Elixir we already have the doc chunks, the use from the shell and the ExDoc parts in place and now we want to make the same available for Erlang.

Since Edoc is currently the only tool for documenting Erlang APIs which in practice is available and used for Erlang components outside OTP, it would be good to add the possibility to generate doc chunks from Edoc. Note that Edoc is open for adding backends as plugins, so it should be possible to do this without changing the core parts of Edoc (for this reason at least). I also think Edoc should remain compatible in terms of the input it accepts.
So we want an Edoc markup to doc chunk tool probably based on Edoc.

Since almost everything in OTP is documented via the OTP XML format (with tooling in the erl_docgen application), we also want an OTP XML to doc chunk translator. The docs_chunks tool by @wojtekmach showed that this is a way forward. I have now taken @wojtekmach's work as inspiration to do a similar translation, but with a different implementation, which I think should be part of the erl_docgen application in OTP.

The goal is to have a make target for building OTP which can produce the doc chunks for all OTP modules (with public APIs).

For this to happen we have discussed the format of a doc chunk. There is probably no point in generating Markdown, since that format is tricky to parse and will lose information compared with what we have in the original source. Instead we think it would be good to have an Erlang term format along the lines of:

{Tag, Attributes, Content}
Content = [binary()|{Tag, Attributes, Content}]

The same format should be used when generated from OTP XML and from Edoc.

ExDoc can be extended to support this type of format.
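
As a rough illustration of the proposed term format, here is a minimal sketch of a shape check for it. The module and function names (chunk_shape_sketch, is_content/1) are hypothetical and not part of OTP, EDoc or ExDoc; the sketch only assumes that tags are atoms and attributes are a list, as in the definition above.

```erlang
-module(chunk_shape_sketch).
-export([is_content/1]).

%% Hypothetical helper, for illustration only.
%% Returns true if Term matches
%%   Content = [binary() | {Tag, Attributes, Content}]
%% with Tag an atom and Attributes a list.
-spec is_content(term()) -> boolean().
is_content(Content) when is_list(Content) ->
    lists:all(fun valid_node/1, Content);
is_content(_) ->
    false.

%% A node is either plain text or an element with nested content.
valid_node(Bin) when is_binary(Bin) ->
    true;
valid_node({Tag, Attrs, Content}) when is_atom(Tag), is_list(Attrs) ->
    is_content(Content);
valid_node(_) ->
    false.
```

A tool consuming chunks could run such a check before rendering, so that malformed input fails early instead of deep inside a renderer.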

-type docs_v1() :: #docs_v1{anno :: erl_anno:anno(),
                            beam_language :: beam_language(),
                            format :: mime_type(),
                            module_doc :: doc(),
                            metadata :: metadata(),
                            docs :: [docs_v1_entry()]}.
%% The Docs v1 chunk according to EEP 48.

-type docs_v1_entry() :: #docs_v1_entry{kind_name_arity :: {atom(), atom(), arity()},
                                        anno :: erl_anno:anno(),
                                        signature :: signature(),
                                        doc :: doc(),
                                        metadata :: metadata()}.

It is the exact contents of the metadata and of the doc part, for both modules and functions/types, that are of most interest. Also of interest is the representation of the -type and -spec parts, as they carry important information both for the documentation and for things like completion in the shell or via the LSP.

It is this format that is important to settle first. I will soon present more details about this, that we can discuss.

@erszcz
Collaborator Author

erszcz commented Nov 19, 2019

@wojtekmach

I vaguely remember reading somewhere (but don't quote me on this) that writing typespecs is preferred over writing edoc typespec tags.

Indeed, it's in the official EDoc documentation:

Note that although the syntax described in the following can still be used for specifying functions we recommend that Erlang specifications as described in Types and Function Specification should be added to the source code instead.

I left my previous comment here as a conclusion of the evening's research and also as an explanation of why the chunks can't be generated for EDoc itself yet.

I'd focus on that scenario and maybe even warn when edoc typespec is found.

The EDoc typespec, at least for now, is the only way to add a textual description to a type definition. In a fully fledged case we would have:

%% @type example_t(). An example type doc.
-type example_t() :: any().

I think a warning is appropriate when:

  • there's a @spec present, i.e. an entry in Docs chunk
  • but there's no -spec present, i.e. no corresponding entry in Dbgi
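
The condition above boils down to a set difference. A minimal sketch, assuming the {Name, Arity} pairs have already been collected from the Docs chunk and from Dbgi (that extraction is not shown, and the module and function names are hypothetical):

```erlang
-module(spec_check_sketch).
-export([missing_specs/2]).

%% Hypothetical helper, for illustration only.
%% EDocSpecs: {Name, Arity} pairs documented with an EDoc @spec tag.
%% AttrSpecs: {Name, Arity} pairs that have a -spec attribute.
%% Returns the sorted pairs that have an @spec but no -spec,
%% i.e. the ones a warning would be printed for.
-spec missing_specs([{atom(), arity()}], [{atom(), arity()}]) ->
          [{atom(), arity()}].
missing_specs(EDocSpecs, AttrSpecs) ->
    ordsets:subtract(ordsets:from_list(EDocSpecs),
                     ordsets:from_list(AttrSpecs)).
```

The same comparison would work for @type versus -type.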

@KennethL

KennethL commented Nov 19, 2019 via email

@wojtekmach

@erszcz we can document types with edoc like this.

-type foo() :: atom().
%% Docs for foo.

-type bar() :: atom().
%% Docs for bar.

Note, we can't use @doc (or @since etc) here.

@erszcz
Collaborator Author

erszcz commented Nov 19, 2019

@wojtekmach Indeed, I was not sure doc comments with spec/type attributes are already supported.
Having checked that, I see it's described in EDoc docs, but not mentioned in Types and Function Specifications.

@erszcz
Collaborator Author

erszcz commented Nov 19, 2019

@KennethL

It would actually be good if Edoc could warn for the use of @type so that
we encourage the use of -type/-spec.

Good point, I'll add that in my fork.

@erszcz
Collaborator Author

erszcz commented Feb 10, 2020

Here's an update on what https://github.com/erszcz/edoc/tree/extract-layouts-wip currently produces:

{docs_v1,0,erlang,<<"text/markdown">>,
         [{p,[<<"EDoc - the Erlang program documentation generator.">>]},
          <<"\n \n  ">>,
          <<"This module provides the main user interface to EDoc.\n  ">>,
          {ul,[<<"\n    ">>,
               {li,[{a,[{href,<<"overview-summary.html">>}],
                       [<<"EDoc User Manual">>]}]},
               <<"\n    ">>,
               {li,[{a,[{href,<<"overview-summary.html#Running_EDoc">>}],
                       [<<"Running EDoc">>]}]},
               <<"\n  ">>]}],
         #{},
         [{{type,edoc_module,0},
           0,
           [<<"edoc_module/0">>],
           [<<"  The EDoc documentation data for a module,\n  expressed as an XML document in ">>,
            {a,[{href,<<"http://www.erlang.org/edoc/doc/xmerl/doc/index.html">>},
                {target,<<"_top">>}],
               [<<"XMerL">>]},
            <<" format. See\n  the file ">>,
            {a,[{href,<<"edoc.dtd">>}],[{code,[<<"edoc.dtd">>]}]},
            <<" for details.">>],
           #{}},
          {{type,filename,0},0,[<<"filename/0">>],[],#{}},
          {{type,proplist,0},0,[<<"proplist/0">>],[],#{}},
          {{type,comment,0},0,[<<"comment/0">>],[],#{}},
          {{type,syntaxTree,0},0,[<<"syntaxTree/0">>],[],#{}},
          {{function,file,1},
           0,
           [<<"file/1">>],
           #{<<"en">> =>
                 <<"Equivalent to [file(Name, [])](`file/2`).">>},
           #{}},
          {{function,file,2},
           0,
           [<<"file/2">>],
           [{p,[<<"Reads a source code file and outputs formatted documentation to  \na corresponding file.">>]},
            <<"\n \n  ">>,<<"Options:\n  ">>,
            {dl,[<<"\n   ">>,
                 {dt,[{code,[<<"{dir, ">>,
                             {a,[{href,<<"#type-filename">>}],[<<"filename()">>]},
                             <<"}">>]},
                      <<"\n   ">>]},
                 <<"\n   ">>,
                 {dd,[<<"Specifies the output directory for the created file. (By\n       default, the output is written to the directory of the source\n       file.)\n   ">>]},
                 <<"\n   ">>,
                 {dt,[{code,[<<"{source_suffix, string()}">>]},<<"\n   ">>]},
                 <<"\n   ">>,
                 {dd,[<<"Specifies the expected suffix of the input file. The default\n       value is ">>,
                      {code,[<<"\".erl\"">>]},
                      <<".\n   ">>]},
                 <<"\n   ">>,
                 {dt,[{code,[<<"{file_suffix, string()}">>]},<<"\n   ">>]},
                 <<"\n   ">>,
                 {dd,[<<"Specifies the suffix for the created file. The default value is\n       ">>,
                      {code,[<<"\".html\"">>]},
                      <<".\n   ">>]},
                 <<"\n  ">>]},
            <<"\n \n  ">>,
            {p,[<<"See ">>,
                {a,[{href,<<"#get_doc-2">>}],[{code,[<<"get_doc/2">>]}]},
                <<" and ">>,
                {a,[{href,<<"#layout-2">>}],[{code,[<<"layout/2">>]}]},
                <<" for further  \noptions.">>]},
            <<"\n \n  ">>,
            <<"For running EDoc from a Makefile or similar, see\n  ">>,
            {a,[{href,<<"edoc_run.html#file-1">>}],
               [{code,[<<"edoc_run:file/1">>]}]},
            <<".\n ">>],
           #{}},
          {{function,files,1},0,[<<"files/1">>],[],#{}},
          {{function,files,2},
           0,
           [<<"files/2">>],
           #{<<"en">> =>
                 <<"Equivalent to [run([], Files, Options)](`run/3`).">>},
           #{}},
          {{function,application,1},
           0,
           [<<"application/1">>],
           #{<<"en">> =>
                 <<"Equivalent to [application(Application, [])](`application/2`).">>},
           #{}},
          {{function,application,2},
           0,
           [<<"application/2">>],
           [<<"Run EDoc on an application in its default app-directory. See\n  ">>,
            {a,[{href,<<"#application-3">>}],
               [{code,[<<"application/3">>]}]},
            <<" for details.">>],
           #{}},
         ...
	 ]}.

While for ExDoc and web browsers the whitespace present in the descriptions might not make much difference, it may for the shell viewer. Depending on how simple or smart it is going to be some whitespace cleanup and possibly promotion of freestanding text to <p> elements might be necessary.

@garazdawi, what do you think about it? I think the text layout generated by docsh looks decent and I don't mind it being reused, but I'm obviously biased :)

@garazdawi

Depending on how simple or smart it is going to be some whitespace cleanup and possibly promotion of freestanding text to <p> elements might be necessary.

For the OTP docs I've chosen to do the whitespace trimming before rendering the doc chunks as I would like to keep the renderer as simple as possible. However, if the renderer is supposed to work with multiple input sources I might as well make it work on the doc chunk html format and thus it can be used both by the parser and the renderer.

As for the format I've tried to mimic the man page format of the Erlang/OTP man pages. I think that it will have to do for now as it is not something that has to be backward compatible and we should be able to change it in the future.

@garazdawi

garazdawi commented Feb 19, 2020

Do you have any example module/library that this works on? I tried adding it to recon but that failed with this crash:

===> Uncaught error: {badmatch,false}
===> Stack trace to the error location:
[{edoc_chunks,xpath_to_chunk_format,3,
              [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/edoc_chunks.erl"},
               {line,170}]},
 {edoc_chunks,edoc_to_chunk,2,
              [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/edoc_chunks.erl"},
               {line,80}]},
 {rebar3_edoc_chunks,process_file,3,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,84}]},
 {rebar3_edoc_chunks,'-process_app/2-lc$^0/1-0-',3,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,77}]},
 {rebar3_edoc_chunks,process_app,2,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,77}]},
 {rebar3_edoc_chunks,'-do/1-lc$^0/1-0-',2,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,59}]},
 {rebar3_edoc_chunks,do,1,
                     [{file,"/home/eluklar/git/recon/_build/default/plugins/rebar3_edoc_chunks/src/rebar3_edoc_chunks.erl"},
                      {line,59}]},
 {rebar_core,do,2,
             [{file,"/home/eluklar/git/rebar3/src/rebar_core.erl"},
              {line,154}]}]

@erszcz
Collaborator Author

erszcz commented Feb 19, 2020

@garazdawi For now I've been testing on EDoc itself:

09:59:26 erszcz @ x5 : ~/work/erszcz/edoc (extract-layouts-wip)
$ r3 shell
===> Verifying dependencies...
===> Compiling edoc
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V10.1  (abort with ^G)
1> edoc:files(["src/edoc.erl"], [{doclet, edoc_doclet_chunks}, {dir, "doctest"},
1>                   {layout, edoc_layout_chunk_htmltree}]).
ok
2> {ok, BChunk} = file:read_file("doctest/chunks/edoc.chunk").
{ok,<<131,104,7,100,0,7,100,111,99,115,95,118,49,97,0,
      100,0,6,101,114,108,97,110,103,109,0,0,...>>}
3> Chunk = binary_to_term(BChunk).
{docs_v1,0,erlang,<<"text/markdown">>,...}

It's definitely still a WIP though.

@erszcz
Collaborator Author

erszcz commented Feb 23, 2020

I've cleaned up the new chunk layouts and adjusted the Rebar3 plugin a little bit. Here's what it looks like with Recon now (as of extract-layouts @ 13df5b1):

15:35:31 erszcz @ x5 : ~/work/erszcz/recon (master *)
$ cat rebar.config
{profiles, [
    {test, [
        {erl_opts, [nowarn_export_all, {d, 'TEST'}]}
    ]}
]}.

{plugins,
 [
  {rebar3_edoc_chunks, {git, "https://github.com/erszcz/edoc.git", {branch, "extract-layouts"}}}
 ]}.

{provider_hooks,
 [
  {post, [{compile, {edoc_chunks, compile}}]}
 ]}.
15:36:55 erszcz @ x5 : ~/work/erszcz/recon (master *)
$ r3 shell
===> Verifying dependencies...
===> Compiling recon
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Eshell V10.1  (abort with ^G)
1> f(ReadChunk).
ok
2> ReadChunk = fun (File) ->
2>                     {ok, BChunk} = file:read_file(File),
2>                     Chunk = binary_to_term(BChunk)
2>             end.
#Fun<erl_eval.6.128620087>
3> rp(ReadChunk("_build/default/lib/recon/doc/chunks/recon.chunk")).
{docs_v1,0,erlang,<<"text/markdown">>,
         [{p,[<<"Recon, as a module, provides access to the high-level functionality   \ncontained in the Recon application.">>]},
          <<"\n  \n   ">>,
          {p,[<<"It has functions in five main categories:">>]},
          <<"\n  \n   ">>,
          {dl,[<<"\n       ">>,
               {dt,[<<"1. State information">>]},
               <<"\n       ">>,
               {dd,[<<"Process information is everything that has to do with the\n           general state of the node. Functions such as ">>,
                    {a,[{href,<<"#info-1">>}],[{code,[<<"info/1">>]}]},
                    <<"\n           and ">>,
                    {a,[{href,<<"#info-3">>}],[{code,[<<"info/3">>]}]},
                    <<" are wrappers to provide more details than\n           ">>,
                    {code,[<<"erlang:process_info/1">>]},
                    <<", while providing it in a production-safe\n           manner. They have equivalents to ">>,
                    {code,[<<"erlang:process_info/2">>]},
                    <<" in\n           the functions ">>,
                    {a,[{href,<<"#info-2">>}],[{code,[<<"info/2">>]}]},
                    <<" and ">>,
                    {a,[{href,<<"#info-4">>}],[{code,[<<"info/4">>]}]},
                    <<", respectively.">>]},
...

@josevalim
Collaborator

That looks awesome. We will have to do something about the links though. {a,[{href,<<"#info-4">>}] won't work for ExDoc. We will need to formalize a way to write said annotations.

@KennethL

KennethL commented Feb 23, 2020 via email

@josevalim
Collaborator

I agree with everything. We don't need to be fully compatible with HTML and the formats don't need to be compatible between languages either.

My only suggestion is to not add a href to said links. That will make it easier for us to detect if something is an actual link or an internal reference. Another option is just to use a random tag, such as <erlang-ref> or <ref>.

In any case, regarding the format, here is what Elixir uses:

  • ref="Module" - modules
  • ref="Module.fun/arity" - functions
  • ref="t:Module.fun/arity" - types
  • ref="c:Module.fun/arity" - callbacks
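
To show how a consumer could tell these four forms apart, here is a minimal sketch that classifies a ref value by its prefix. The module and function names are hypothetical, and the sketch assumes the value arrives as a binary; it does not validate the module or function part.

```erlang
-module(ref_kind_sketch).
-export([kind/1]).

%% Hypothetical helper, for illustration only.
%% Classifies an Elixir-style ref value:
%%   "t:..."            -> type
%%   "c:..."            -> callback
%%   contains "/"       -> function (Module.fun/arity)
%%   anything else      -> module
-spec kind(binary()) -> module | function | type | callback.
kind(<<"t:", _/binary>>) -> type;
kind(<<"c:", _/binary>>) -> callback;
kind(Ref) when is_binary(Ref) ->
    case binary:match(Ref, <<"/">>) of
        nomatch -> module;
        _Found -> function
    end.
```

Because the kind is encoded in the value itself, a renderer does not have to guess whether something is a plain hyperlink or an internal reference.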

@erszcz
Collaborator Author

erszcz commented Mar 4, 2020

I've worked a bit on compat with OTP 23-devel shell_docs - https://github.com/erszcz/edoc/commits/extract-layouts now produces compatible chunks for all Recon modules apart from recon_trace.

@garazdawi One of the problems is that shell_docs throws unhandled for an h3 because of the guard on Pos:

render_element({h3,_,Content},State,Pos,_Ind,D) when Pos =< 2 ->

However, recon_trace does not seem to abuse the EDoc syntax:

%%% == Tracing Erlang Code ==

The other problem is the set of allowed tags - tt does not seem to be handled by shell_docs, while EDoc currently lets all tags from https://developer.mozilla.org/en-US/docs/Web/HTML/Element pass through, including the deprecated ones. I think I'll just make sure only tags defined with shell_docs ALL_ELEMENTS macro are output.

@garazdawi

@garazdawi One of the problems is that shell_docs throws unhandled for an h3 because of the guard on Pos:

render_element({h3,_,Content},State,Pos,_Ind,D) when Pos =< 2 ->

However, recon_trace does not seem to abuse the EDoc syntax:

%%% == Tracing Erlang Code ==

I've sent a PR to your edoc changes and with it I can render recon_trace, but without it I cannot even get the generation to work.

The other problem is the set of allowed tags - tt does not seem to be handled by shell_docs, while EDoc currently lets all tags from https://developer.mozilla.org/en-US/docs/Web/HTML/Element pass through, including the deprecated ones. I think I'll just make sure only tags defined with shell_docs ALL_ELEMENTS macro are output.

We cannot support all of HTML in the renderer. So either we should ignore unknown tags when rendering, or ignore them when creating the chunks.

@josevalim
Collaborator

We cannot support all of HTML in the renderer. So either we should ignore unknown tags when rendering, or ignore them when creating the chunks.

Yeah, we will have the same issue in Elixir's shell rendering. :(

However, we cannot ignore these tags when creating the chunk, because those tags are likely useful when generating actual HTML documentation.

We could ignore them in the renderer, but maybe the documentation will then appear incomplete. Maybe an alternative is to render them as HTML tags in the shell too? It would look weird in the shell but that's the best option given they were also written as HTML in the actual docs. I think that's what Elixir does for HTML in the markdown.
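
For illustration, rendering an unknown element back as literal HTML text could look roughly like the sketch below. It assumes the {Tag, Attributes, Content} shape with atom tags, atom attribute keys and binary attribute values; the names are hypothetical, no escaping is done, and this is not how shell_docs or Elixir's shell renderer is implemented.

```erlang
-module(raw_tag_sketch).
-export([to_html/1]).

%% Hypothetical helper, for illustration only.
%% Serialises a content tree back to HTML-looking text, so that a
%% renderer can print tags it does not know instead of dropping them.
-spec to_html([binary() | tuple()]) -> binary().
to_html(Content) when is_list(Content) ->
    iolist_to_binary([node_html(Node) || Node <- Content]).

node_html(Bin) when is_binary(Bin) ->
    Bin;
node_html({Tag, Attrs, Content}) ->
    Name = atom_to_binary(Tag, utf8),
    [<<"<">>, Name, [attr(A) || A <- Attrs], <<">">>,
     to_html(Content),
     <<"</">>, Name, <<">">>].

%% Attribute values are assumed to be binaries and are not escaped.
attr({Key, Value}) ->
    [<<" ">>, atom_to_binary(Key, utf8), <<"=\"">>, Value, <<"\"">>].
```

A renderer would call this only in its fallback clause, after all the tags it does know how to lay out have been handled.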

@KennethL

KennethL commented Mar 9, 2020 via email

@erszcz
Collaborator Author

erszcz commented Mar 9, 2020

@garazdawi

I've sent a PR to your edoc changes and with it I can render recon_trace, but without it I cannot even get the generation to work.

Thanks for the PR. Indeed, I experimentally added the shell_docs:normalize/1 call there to clean up non-meaningful whitespace, but apparently I did not do it correctly.

I wanted to check if artefacts like this leading space before "This module..." could be fixed by normalizing:

4> h(recon_rec).

        recon_rec

   This module handles formatting records for known record types. Record definitions are
  imported from modules by user. Definitions are distinguished by record name and its

But it doesn't seem to help :| The real problem is using the @doc tag in recon_rec comment:

%%% @doc
%% This module handles formatting maps.

The text ought to be on the same line, but it's on the next one.

BTW, I've cleaned up and merged the extract-layouts branch to master - it works fine (at least on Recon) in case anyone's curious to try.

@josevalim @KennethL

My approach so far was to let through only a certain set of tags (originally, the set from MDN, I later switched to the set Lukas uses in the renderer) and for all the other tags to just extract the text and disregard the tag name and attributes. For some tags (the aforementioned links/refs/<a href='...'>, but also tt) there could be some builtin translations. What do you think about this approach?
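
A minimal sketch of that allowlist approach, written against the three-element {Tag, Attributes, Content} shape discussed earlier in the thread. The module and function names are hypothetical, the allowed tag list is passed in by the caller rather than taken from shell_docs, and the builtin translations mentioned above are not covered.

```erlang
-module(tag_filter_sketch).
-export([filter/2]).

%% Hypothetical helper, for illustration only.
%% Keeps elements whose tag is in Allowed. Any other element is
%% replaced by its (filtered) children, so its text survives while
%% the tag name and attributes are dropped.
-spec filter([binary() | tuple()], [atom()]) -> [binary() | tuple()].
filter(Content, Allowed) when is_list(Content) ->
    lists:flatmap(fun(Node) -> filter_node(Node, Allowed) end, Content).

filter_node(Bin, _Allowed) when is_binary(Bin) ->
    [Bin];
filter_node({Tag, Attrs, Content}, Allowed) ->
    Children = filter(Content, Allowed),
    case lists:member(Tag, Allowed) of
        true -> [{Tag, Attrs, Children}];
        false -> Children
    end.
```

For example, filter([{p, [], [<<"a ">>, {blink, [], [<<"b">>]}]}], [p, code]) keeps the p element and splices the text of the blink element into it.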

@josevalim
Collaborator

@erszcz I don't think we should remove or rewrite the HTML when writing the chunk because when users compare the result from edoc_html with ex_doc in the future, ex_doc would seemingly discard information/markup and they probably won't be happy with that. :)

So if the plan is for edoc to also use "application/erlang+html", we need to change both Erlang/Elixir shell renderers to deal with markup they don't know upfront - even if it is just by discarding it or by rendering it as HTML.

I agree with @KennethL that ideally they wouldn't have to write any HTML in Edoc. But until then, we have to do what we have to do. :)

@garazdawi

@garazdawi

I've sent a PR to your edoc changes and with it I can render recon_trace, but without it I cannot even get the generation to work.

Thanks for the PR. Indeed, I experimentally added the shell_docs:normalize/1 call there to clean up non-meaningful whitespace, but apparently I did not do it correctly.

I wanted to check if artefacts like this leading space before "This module..." could be fixed by normalizing:

4> h(recon_rec).

        recon_rec

   This module handles formatting records for known record types. Record definitions are
  imported from modules by user. Definitions are distinguished by record name and its

But it doesn't seem to help :| The real problem is using the @doc tag in recon_rec comment:

%%% @doc
%% This module handles formatting maps.

The text ought to be on the same line, but it's on the next one.

The normalizer should eliminate that space, I'll see if I can find out what is going on.

@marianoguerra

one minute after writing an email about edoc I see this thread, I would like to know what's the plan for this edoc fork in relation with OTP/edoc, and if incremental improvements to OTP/edoc make sense or this is the way forward and will be merged eventually to OTP?

@josevalim
Collaborator

The goal is to make a contribution to OTP that converts EDoc to chunks.

@erszcz
Collaborator Author

erszcz commented Mar 11, 2020

@marianoguerra I'm aiming to prepare a PR to OTP, either for OTP 23 rc2 or rc3 - as soon as the TODOs are taken care of. Kenneth and the OTP team are fine with these changes - for example, please see #3 (comment).

@erszcz
Collaborator Author

erszcz commented Apr 5, 2020

I'm keeping the todo list in the top post more or less up to date, but here's a recap of the latest changes:

  • bin/edoc.escript is now available - it's a CLI interface to edoc:application/2 and edoc:files/2
  • since I needed some material for dogfooding, all @spec and @type tags are now rewritten to -spec and -type attributes; types are also moved from header files to corresponding modules to utilise module names as namespaces
  • @deprecated and @since are now exported to chunks
  • EDoc @private and @hidden are translated respectively to EEP-48 hidden and none
  • the chunks are normalized with shell_docs:normalize/1
  • EDoc can now generate chunks for itself 🎉 All of them pass shell_docs:validate/1.

I've encountered some issues on the way, though:

  • EEP-48 states that ModuleDoc and Doc fields can be #{DocLanguage := DocValue} | none | hidden. shell_docs:validate/1 did not accept the non-map variants, so I applied erszcz/otp@1e72aef to make it accept none | hidden. However, maybe its intention is to actually check if docs are present - @garazdawi what do you think? I can make this change part of the edoc PR if that's ok.
    It's funny that validate would only crash on hidden as, apparently, maps:map(..., none) returns #{} - none is a valid map iterator :)

  • @deprecated accepts any EDoc comment, and therefore nested HTML tags. EEP-48, however, expects this field to be a flat binary() - this requires discarding useful information. ExDoc uses this field for tooltips, where a flat binary is fine, but also for a disclaimer under the entry definition, where the discarded tags would be useful (links, etc). Currently, EDoc discards any nested tags and flattens @deprecated to a binary(), as it's the simplest working solution.

  • @private and @hidden information used to only be passed to EDoc doclets/layouts for the module, not for entries. I've changed that and now all entries, even not exported ones, are stored in the chunks. This means shell_docs prints proper warnings when accessing hidden | none fields, but it also means chunks are bigger in size. On the other hand, even with no human-readable docs, the chunk entries might still contain metadata useful for tools. I think the best would be to have exported and @private entries in the chunks (with proper metadata) and @hidden left out.

  • @wojtekmach, @josevalim when testing interop in order to avoid (bad signature) for types in ExDoc generated HTML, I've used erszcz/ex_doc@5981077 - a fix on top of wm-erlang.
    As of EDoc erszcz/edoc@967d475 this branch doesn't work anymore. I'll try to figure out the exact reason, but I'm including the error below so that it doesn't escape my memory:

    $ /Users/erszcz/work/elixir-lang/ex_doc/ex_doc edoc "0.11" _build/default/lib/edoc/ebin --main edoc
    ** (MatchError) no match of right hand side value: nil
        (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:74: anonymous fn/2 in ExDoc.Formatter.HTML.autolink_and_render/4
        (elixir 1.10.2) lib/enum.ex:2111: Enum."-reduce/3-lists^foldl/2-0-"/3
        (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:73: anonymous fn/2 in ExDoc.Formatter.HTML.autolink_and_render/4
        (elixir 1.10.2) lib/enum.ex:2111: Enum."-reduce/3-lists^foldl/2-0-"/3
        (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:71: ExDoc.Formatter.HTML.autolink_and_render/4
        (ex_doc 0.21.2) lib/ex_doc/formatter/html.ex:21: ExDoc.Formatter.HTML.run/2
        (elixir 1.10.2) lib/kernel/cli.ex:124: anonymous fn/3 in Kernel.CLI.exec_fun/2
    

    wm-erlang-3, on the other hand, does not recognise types provided by OTP apps, yet crashes on something else:

    $ /Users/erszcz/work/elixir-lang/ex_doc/ex_doc edoc "0.11" _build/default/lib/edoc/ebin --main edoc
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing t:edoc.syntaxTree/0 docs)
    
    warning: documentation references t::edoc.module/0 but it doesn't exist or isn't public (parsing edoc_extract.get_module_info/2 docs)
    
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing edoc_extract.get_module_info/2 docs)
    
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.header/4 docs)
    
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.header/5 docs)
    
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing edoc_extract.preprocess_forms/1 docs)
    
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.source/4 docs)
    
    warning: documentation references t::erl_syntax.forms/0 but it doesn't exist or isn't public (parsing edoc_extract.source/5 docs)
    
    warning: documentation references t::erl_syntax.syntaxTree/0 but it doesn't exist or isn't public (parsing t:edoc_specs.syntaxTree/0 docs)
    
    ** (TokenMissingError) nofile:1: missing terminator: end (for "do" starting at line 1)
        (elixir 1.10.2) lib/code.ex:645: Code.format_string!/2
        (ex_doc 0.21.3) lib/ex_doc/autolink.ex:331: ExDoc.Autolink.typespec/2
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
        (ex_doc 0.21.3) lib/ex_doc/formatter/html.ex:91: anonymous fn/5 in ExDoc.Formatter.HTML.render_all/4
        (elixir 1.10.2) lib/enum.ex:2111: Enum."-reduce/3-lists^foldl/2-0-"/3
        (ex_doc 0.21.3) lib/ex_doc/formatter/html.ex:88: anonymous fn/4 in ExDoc.Formatter.HTML.render_all/4
        (elixir 1.10.2) lib/enum.ex:1396: Enum."-map/2-lists^map/1-0-"/2
    

Ufff, well, that's a wall of text, but it added up over the last few days.
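
To illustrate the @deprecated point above, here is a minimal sketch of what flattening nested markup to a flat binary() could look like. It is not the code used in EDoc; the module name deprecated_flatten_sketch is hypothetical, and it assumes the input is shaped like application/erlang+html content, i.e. a list of binaries and {Tag, Attrs, Content} tuples.

```erlang
-module(deprecated_flatten_sketch).
-export([flatten/1]).

%% Assumption: the input follows the application/erlang+html shape,
%% i.e. binaries and {Tag, Attrs, Content} tuples, possibly nested.
%% All tags and attributes are dropped; only the text survives.
-spec flatten(term()) -> binary().
flatten(Bin) when is_binary(Bin) ->
    Bin;
flatten({_Tag, _Attrs, Content}) ->
    %% Discard the tag and its attributes (e.g. the href of a link).
    flatten(Content);
flatten(List) when is_list(List) ->
    iolist_to_binary([flatten(E) || E <- List]).
```

A link such as {a, [{href, ...}], [<<"g/1">>]} is reduced to <<"g/1">>, which shows exactly the information loss described above: the link target is gone and only the text remains.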

@josevalim
Collaborator

@wojtekmach, @josevalim when testing interop in order to avoid (bad signature) for types in ExDoc generated HTML

Good catch. I have posted a comment on the PR. If you want to submit a PR to ExDoc, it will be welcome!

@wojtekmach

Hi @erszcz, I'll continue working on the Erlang support on https://github.com/elixir-lang/ex_doc/tree/wm-erlang which I've just rebased with latest master. If you'd like to send patches to that branch that would be appreciated.

@garazdawi

@garazdawi what do you think? I can make this change part of the edoc PR if that's ok.

Yes, make it part of the PR. Thanks.

@KennethL

KennethL commented Apr 6, 2020 via email

@josevalim
Collaborator

I believe the reason we have both none and hidden is because there is a distinction between:

  1. is this meant to be public but it was not documented
  2. or this was not meant to be public at all

The issue is that it depends on what you consider to be the default. A function is made publicly available only if it is documented? Or are functions publicly available unless you say they are private/hidden?

Both have pros and cons. Assuming that functions are public by default means that, even if the authors don't write proper docs, the generated documentation will have something. On the other hand, this means you can accidentally make public a function that was meant to be private.

If EDoc says that everything is private by default, then I agree they are probably not necessary in the chunk. There may be places in Elixir that won't handle this assumption well, but this is an Elixir problem to fix. :)

@garazdawi

functions [are] publicly available unless you say they are private/hidden?

This is the way that I have done it with the Erlang/OTP docs. All modules have a .chunk file, even if they are not public; in that case the module_doc in the file is hidden and the docs list is empty.

https://github.com/erlang/otp/blob/master/lib/erl_docgen/src/docgen_xml_to_chunk.erl#L35-L36

If code:get_doc/1 tries to look up a module that does not have an EEP-48 chunk, then it will generate one on the fly from the AST.
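
A minimal sketch of how a tool might tell apart the cases discussed above (documented, none, hidden) when reading a chunk. The module and function names are hypothetical; it assumes OTP 23 or later, where code:get_doc/1 and the #docs_v1{} record from kernel/include/eep48.hrl are available.

```erlang
-module(doc_visibility_sketch).
-export([classify/1, module_visibility/1]).

%% Assumption: OTP 23+, which ships the #docs_v1{} record definition.
-include_lib("kernel/include/eep48.hrl").

%% Map the EEP-48 doc field to a coarse visibility class.
-spec classify(none | hidden | map()) -> undocumented | hidden | documented.
classify(none) -> undocumented;   % meant to be public, but no docs written
classify(hidden) -> hidden;       % not meant to be public at all
classify(Doc) when is_map(Doc) -> documented.

%% Look up the chunk of a module and classify its module_doc field.
-spec module_visibility(module()) ->
          undocumented | hidden | documented | {error, term()}.
module_visibility(Mod) ->
    case code:get_doc(Mod) of
        {ok, #docs_v1{module_doc = Doc}} -> classify(Doc);
        {error, Reason} -> {error, Reason}
    end.
```

The result of module_visibility/1 depends on which chunks are installed locally, so no particular output is claimed here.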

@erszcz
Collaborator Author

erszcz commented May 7, 2020

Status update

TL;DR:

  • Module, function and type annotations are now exported to chunks.
  • Callback locations and definitions, as well as function spec metadata, are not easily accessible from EDoc layouts (the plugins used to format generated docs).

TODO:

  • extend edoc_extract with -callback processing to get the full definitions
  • sidestep the XMERL layer and pass specs and callbacks directly to the layout

Here comes the long version. When we call edoc:files/2 or one of the other entry points, the EDoc app does the following:

  1. A doclet is invoked - the doclet understands the structure of Erlang projects / OTP apps and also encodes the final structure of the to-be-generated docs. It finds the source files and calls edoc:get_doc/3 on each of them.
  2. edoc:get_doc/3 calls edoc_extract:source/3 which parses the source file to AST forms and then converts the forms to a flat list of EDoc #entry records. These entries correspond to source level entities which are further processed later: comments, functions, specs, types, records. Callbacks are dropped from further processing at this stage. Another pass over the entries is done to unify @spec/@type and -spec/-type representation for later stages. The entries are passed to edoc_data:module/4.
  3. edoc_data:module/4 processes the list of entries and builds an expanded XMERL representation of a module's documentation. Basic callback information (M:behaviour_info/1) is also appended directly to this representation. Some information from the #entry{} records is dropped at this stage, while some is translated into a somewhat bloated XMERL representation. There's an attempt to document this representation with DTD specs inlined in comments in this file. This returns to edoc:get_doc/3.
  4. edoc:get_doc/3 returns to the doclet. The doclet runs the layout plugin with the input being the expanded XMERL format.
  5. In case of output to chunks, edoc_layout_chunks extracts the relevant information from the XMERL and outputs it as a #docs_v1{} record.
  6. The doclet converts the record to a binary and writes it to doc/chunks/.

Issues:

  1. No full callback info in the EDoc #entry{} records.
  2. No convenient way to pass complete specs or callback definitions through the XMERL layer. There's existing code to convert from #entry{} -> XMERL, but it doesn't make sense to do it just to convert back in the layout.
  3. I'm not sure if the XMERL representation is of any other use than to interface with the layouts. I haven't found any other code using it. @garazdawi? @KennethL?
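
Steps 5 and 6 of the pipeline above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual edoc_layout_chunks or doclet code: the module name chunk_writer_sketch and both functions are hypothetical, the chunk is assumed to be a plain term_to_binary/1 of the #docs_v1{} record, and the file is assumed to be named Mod.chunk. It requires OTP 23 or later for eep48.hrl.

```erlang
-module(chunk_writer_sketch).
-export([write_chunk/3, read_chunk/1]).

%% Assumption: OTP 23+, which ships the #docs_v1{} record definition.
-include_lib("kernel/include/eep48.hrl").

%% Build a (nearly empty) #docs_v1{} record and write it to Dir/Mod.chunk.
%% ModuleDoc is none, hidden, or a map such as #{<<"en">> => Content}.
-spec write_chunk(file:filename(), module(), none | hidden | map()) ->
          {ok, file:filename()}.
write_chunk(Dir, Mod, ModuleDoc) ->
    Chunk = #docs_v1{anno = erl_anno:new(0),
                     beam_language = erlang,
                     format = <<"application/erlang+html">>,
                     module_doc = ModuleDoc,
                     metadata = #{},
                     docs = []},   % a real layout would fill in the entries
    File = filename:join(Dir, atom_to_list(Mod) ++ ".chunk"),
    ok = filelib:ensure_dir(File),
    ok = file:write_file(File, term_to_binary(Chunk)),
    {ok, File}.

%% Read a chunk file back into a #docs_v1{} record.
-spec read_chunk(file:filename()) -> #docs_v1{}.
read_chunk(File) ->
    {ok, Bin} = file:read_file(File),
    binary_to_term(Bin).
```

For example, write_chunk("doc/chunks", my_mod, hidden) would produce a file comparable to the non-public module chunks mentioned earlier in the thread: hidden module_doc and an empty docs list.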

@erszcz
Collaborator Author

erszcz commented May 7, 2020

The latest changes are available at https://github.com/erszcz/edoc/tree/wip

@garazdawi

I'm not sure if the XMERL representation is of any other use than to interface with the layouts. I haven't found any other code using it.

I don't know either. Maybe @richcarl knows something?

@erszcz
Collaborator Author

erszcz commented May 13, 2020

Today's update.

Done:

  • sidestep the XMERL layer and pass EDoc #entry{} records directly to edoc_layout_chunks - this opens access to all information that's gathered from source code and doc comments
  • store function spec metadata
  • store type definition metadata

The latest commit as of writing this is https://github.com/erszcz/edoc/tree/20a80c37a56cfac6c7709fb69255ac1b0f9c4c3f.

Next steps:

  • extend edoc_extract with -callback processing
  • links

Preview:

20> h(edoc_layout_chunks, module).

  -spec module(edoc:xmerl_module(), proplists:proplist()) -> binary().

  Convert EDoc module documentation to an EEP-48 style doc chunk.
ok
21> ht(edoc_doclet).
   edoc_doclet

These types are documented in this module:

  -type doclet_toc() ::
            #doclet_toc{paths :: [string()], indir :: string()}.

  -type doclet_gen() ::
            #doclet_gen{sources :: [string()],
                        app :: no_app() | atom(),
                        modules :: [module()]}.

  -type no_app() :: [].

  -type context() ::
            #doclet_context{dir :: string(),
                            env :: edoc:env(),
                            opts :: [term()]}.

  -type command() :: doclet_gen() | doclet_toc().
ok
22> hcb(edoc_doclet).
   edoc_doclet

These callbacks are documented in this module:

  run/2
ok
23>

@erszcz
Collaborator Author

erszcz commented May 15, 2020

Preview of yesterday's result (https://github.com/erszcz/edoc/tree/callbacks):

1> hcb(edoc_layout).
   edoc_layout

These callbacks are documented in this module:

  -callback module(edoc:xmerl_module(), _) -> binary().
ok
2> hcb(edoc_layout, module).

  -callback module(edoc:xmerl_module(), _) -> binary().

  Layout entrypoint.
ok

Callback signatures and comments are now stored in doc chunks. The syntax of callback comments is the same as that of type comments, i.e. the comment follows the attribute:

-type command() :: doclet_gen()
		 | doclet_toc().
%% All doclet commands.

-type doclet_toc() :: #doclet_toc{paths :: [string()],
				  indir :: string()}.
%% Doclet command.

-callback run(command(), context()) -> ok.
%% Doclet entrypoint.

It's a bit less flexible than function/spec comments where the order of forms doesn't matter:

%% @doc Before spec.
-spec f() -> ok.
f() -> ok.

-spec g() -> ok.
%% @doc After spec.
g() -> ok.

This stems from the fact that both spec attributes and function comments are attached to function definitions (and processed together with them), while callback/type comments are attached to the actual attributes and it's necessary to define which comments an attribute "owns" - the preceding or the following ones.
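
Once callbacks are stored in the chunk, a tool can pick them out of the docs list by their kind. Below is a minimal sketch, assuming the EEP-48 entry shape {{Kind, Name, Arity}, Anno, Signature, Doc, Metadata} and OTP 23 or later; the module name chunk_callbacks_sketch is hypothetical.

```erlang
-module(chunk_callbacks_sketch).
-export([callbacks/1]).

%% Assumption: OTP 23+, which ships the #docs_v1{} record definition.
-include_lib("kernel/include/eep48.hrl").

%% Return the Name/Arity pairs of all callback entries in a chunk.
%% Entries of other kinds (function, type) do not match the generator
%% pattern and are silently skipped by the list comprehension.
-spec callbacks(#docs_v1{}) -> [{atom(), arity()}].
callbacks(#docs_v1{docs = Docs}) ->
    [{Name, Arity}
     || {{callback, Name, Arity}, _Anno, _Sig, _Doc, _Meta} <- Docs].
```

Given a chunk record obtained from code:get_doc/1, callbacks/1 would be expected to list entries such as run/2 for edoc_doclet, in line with the hcb/1 output shown above.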

@richcarl

I'm not sure if the XMERL representation is of any other use than to interface with the layouts. I haven't found any other code using it.

I don't know either. Maybe @richcarl knows something?

When I wrote it, being able to export as XML and apply XSLT seemed to be a good thing (so you didn't have to do the layout in plain Erlang if you didn't want to), but either nobody understood that it could be done, or nobody wanted to use XSLT. :-)

@richcarl

It also allowed there to be an actual DTD describing the intermediate format, which is another good thing.

@josevalim
Collaborator

Since we are standardizing on chunks, I wonder if it makes sense to remove those parts from edoc then, if they are really not being used (unless they are used internally?).

@erszcz
Collaborator Author

erszcz commented May 25, 2020

@richcarl Thanks for your input!

@josevalim They are used by the default / original doclet and layout, and @KennethL underlined that for the time being we should leave it functional, even if extending EDoc with chunk support.

I'll leave the current implementation, i.e. the chunk output converting some information directly from the internal EDoc #entry{} records, until the PR to OTP is made and wait for comments then.

The last remaining thing before I intend to make the PR are links - this is the TODO in focus now.

@KennethL

KennethL commented May 25, 2020 via email

@josevalim
Collaborator

Makes total sense, especially if it is used today. No need for removals. Thanks @erszcz and @KennethL!

@jfacorro

Hi 👋 ! This looks like a great idea, I was wondering what is the current status of this effort.

I was also curious about how would an Erlang project use ExDoc in a purely Erlang context (i.e. Elixir is not installed). I guess Elixir would be required in that case, am I correct?

@josevalim
Collaborator

We are working on it. At the moment, we are refactoring ExDoc so it can support multiple sources (Elixir's markdown and Erlang's doc AST).

I was also curious about how would an Erlang project use ExDoc in a purely Erlang context (i.e. Elixir is not installed).

It will be done with escripts - so Elixir, ExDoc, and everything else will be in a single file executable.

@erszcz
Collaborator Author

erszcz commented Aug 18, 2020

Hi, @jfacorro!

On the EDoc side of things the remaining big thing is to support linking. The progress has been slow lately due to the summer period and the pandemic before that, but I'm now hoping to get up to speed again.

BTW, funny things are exposed when trying to fill in the chunk fields with what we can get from EDoc - for example, @since or @deprecated were never supported on types, whereas prior to this effort callbacks could not have attached doc comments at all 🤷

@erszcz
Collaborator Author

erszcz commented Oct 14, 2020

FYI, the PR for the EDoc side of things is now ready - erlang/otp#2803.
