Feature Request/Idea: switching from Sphinx docs to using Docusaurus #8905

Open
kuhlaid opened this issue Aug 9, 2022 · 13 comments

Comments

@kuhlaid
Contributor

kuhlaid commented Aug 9, 2022

Overview of the Feature Request
What are your thoughts on switching from Sphinx to Docusaurus for the Dataverse documentation?

The built-in search in Sphinx handles quoted, exact-phrase searches poorly, if at all. For instance, if I search for "categorical labels", the top results are pages that do not contain those two words together. Docusaurus has built-in support for Algolia DocSearch, which is free for open source projects and works extremely well for exact-phrase searches. This makes finding things in the docs much easier.

I would be willing to work on transitioning the docs to Docusaurus if there is interest in it. Anyway, just something to think about that might improve the Dataverse documentation.
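To make the exact-phrase complaint concrete, here is a minimal sketch of a phrase lookup run against a local checkout of the guide sources. It is not how Sphinx or Algolia search works internally; `search_phrase` is a hypothetical helper, and the assumption that the sources are `.rst` files under a directory you pass in is mine. It relies on the `--include` option found in GNU and BSD grep.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exact-phrase search over reStructuredText sources.
# Usage: search_phrase "some phrase" [directory]
search_phrase() {
  local phrase="$1"
  local dir="${2:-.}"
  # -r recurse, -n show line numbers, -i ignore case,
  # -F treat the phrase as a literal string rather than a regex,
  # --include limits the search to .rst files.
  grep -r -n -i -F --include='*.rst' -- "$phrase" "$dir"
}
```

For example, `search_phrase "categorical labels" path/to/guides/source` prints only lines where the two words appear next to each other (the path is a placeholder). A phrase that wraps across a line break in the source will be missed, so treat this as a stopgap rather than a replacement for a real search index.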

@kuhlaid changed the title from "Feature Request/Idea:" to "Feature Request/Idea: switching from Sphinx docs to using Docusaurus" Aug 9, 2022
@pdurbin
Member

pdurbin commented Aug 9, 2022

@kuhlaid hi, there's a related issue here:

Back then I was suggesting downloading the PDF version of the guides but https://guides.dataverse.org/en/latest/Dataverse.pdf is a 404 these days so I guess we don't build a PDF anymore. It looks like we had PDFs as of 4.17 (see #6168): https://guides.dataverse.org/en/4.17/Dataverse.pdf

@kuhlaid I'm wondering, would you be able to do the searches you like in a PDF? Could that be a workaround?

I'm a pretty big fan of Sphinx so I'm personally not very excited about switching unless the benefits really outweigh the costs.

(Docusaurus does look pretty cool from a quick look.) 😄

@poikilotherm
Contributor

poikilotherm commented Aug 9, 2022

Please note that you can also add Algolia to Sphinx docs. A quick search revealed people succeeded with that already.

On a related note: just moving to Docusaurus won't make the underlying, substantial issues with our documentation go away.

@kuhlaid
Contributor Author

kuhlaid commented Aug 10, 2022

@pdurbin From a usability standpoint, I would not suggest the PDF as the main way to search the documentation. Most users will see the Sphinx search box in the documentation and go straight for that.

Does Sphinx search give you search stats to help improve documentation structuring and keywords? I'm not knocking all of Sphinx, just the search, which is not good. So, if Algolia or similar could replace the Sphinx search, that might be a step in the right direction.

@poikilotherm I would be curious to hear more regarding the underlying substantial issues of our documentation that need to be addressed. I will say that the API documentation is rather sparse, but I understand that there are only so many hours in a day to address these things. I think this is a valuable project and only want to help see it succeed.

@kuhlaid
Contributor Author

kuhlaid commented Aug 12, 2022

Just for giggles I started building out a Docusaurus version of the Dataverse docs (dataverse-docusaurus.vercel.app) as a proof of concept. Since the pages are static they load instantly and updates to the repository get automatically pushed to Vercel for testing. I will throw in the search feature once I have converted more of the docs.

@pdurbin
Member

pdurbin commented Aug 23, 2022

@kuhlaid neat! Is the source of the docs on GitHub? Ah, I think I found it: https://github.com/kuhlaid/dataverse-docusaurus

Markdown instead of .rst. Interesting. 😄

@pdurbin
Member

pdurbin commented Oct 1, 2022

https://podrocket.logrocket.com/docusaurus was a good listen. I dunno, it's a lot of work to switch! 😄

@kuhlaid
Contributor Author

kuhlaid commented Oct 3, 2022

I gave up on Algolia. With Algolia you can't create a testing/development environment, ONLY PRODUCTION, which was ridiculous. I also tested Typesense, but it is too immature at this point: it has limitations on searching items in quotes and code blocks, and it is a pain to set up with the scraper (a huge pain point that likely prompted Algolia to ditch it from their processes). So I decided to switch to local search. Anyway, the test docs I compiled are running at https://dataverse-docusaurus.vercel.app with local search installed. Local search does a better job of searching within code blocks and quotes.

@pdurbin
Member

pdurbin commented Dec 2, 2022

@kuhlaid very cool. The search seems nice.

I just mentioned this issue and your efforts to @siacus. Like me, he's a pretty big fan of Markdown. A lot of people haven't even heard of reStructuredText, making it harder to contribute beyond simple text changes. I even opened an issue about writing up some tips on using rst:

I'm still on the fence though. It's a lot of work to switch. Plus, I'm not sure what features we'd lose such as the ability to create PDFs and ePubs.

@kuhlaid
Contributor Author

kuhlaid commented Jan 4, 2023

@pdurbin I thought I had posted the following suggestion about keeping Sphinx (but I guess I never pressed send). Anyway, my thought was: keep Sphinx, but add a link to the PDF version of the Dataverse documentation beside the Sphinx search field, with a note along the lines of "for more robust searching, search within the PDF version of our documentation." That might improve the existing search situation. As things stand, I don't even know where to find a PDF version of the Dataverse docs within the docs site.

@pdurbin
Member

pdurbin commented Jan 4, 2023

@kuhlaid https://guides.dataverse.org/en/latest/Dataverse.pdf is a 404 😢

https://guides.dataverse.org/en/4.17/Dataverse.pdf works. There's a PDF to download if you want to look at it. So it's been a while (the 4.x days, I guess) since we built the PDF. This issue from 4.17 is related:

Anyway, I agree with you. It would be nice to have a PDF of the guides again. And we could link to them from the HTML guides, once they're working, and suggest it as an alternative for searching. Do you want to create an issue for this?

Also, how do you feel about ePub? The ePub build is still working, it seems: https://guides.dataverse.org/en/latest/Dataverse.epub

@kuhlaid
Contributor Author

kuhlaid commented Jan 4, 2023

From the #6168 issue, it appears as though the PDF generator is unhappy about the non-ASCII characters within the documentation source files.
To search for files containing any non-ASCII characters, you can run the following within a bash terminal:

LC_ALL=C find . -type f -exec grep -H -c -P "[^\x00-\x7F]" {} +        # for each file under the current directory, print the number of lines that contain non-ASCII bytes
LC_ALL=C grep --color='auto' -P -n "[\x80-\xFF]" somefile.txt   # shows the lines (with line numbers) where non-ASCII bytes are found in a file (copy the results to an empty text file or somewhere to reference)
nano somefile.txt   # use CTRL+W then CTRL+T to go to a specific line in the file

I was not able to find any non-ASCII characters in the current files.
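The two grep steps above can be folded into one recursive pass that reports only the offending lines. This is a minimal sketch, not something from the Dataverse repository: `find_non_ascii` is a hypothetical helper name, and it assumes GNU grep, since `-P` is not available in every grep build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "path:line:text" for every line under a
# directory that contains at least one non-ASCII byte.
# Usage: find_non_ascii [directory]
find_non_ascii() {
  local dir="${1:-.}"
  # LC_ALL=C makes grep compare raw bytes instead of decoded characters.
  # -r recurse, -n show line numbers, -I skip binary files such as images,
  # -P enable the \xNN byte escapes (GNU grep only).
  LC_ALL=C grep -r -n -I -P '[\x80-\xFF]' "$dir"
}
```

Running `find_non_ascii .` from the root of the docs source prints nothing and returns a non-zero exit status when every text file is pure ASCII, which would match the result reported above.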

@pdurbin
Member

pdurbin commented Nov 9, 2023

We're adding Markdown support. Please see this issue:

@pdurbin
Member

pdurbin commented Jul 25, 2024

The Contributor Guide is written in Markdown.
