-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RO-Crate previewer #41
Conversation
There is still a problem with the recognition of RO-Crate files in the dataverse. The file name that should map to RO-Crate metadata file is "ro-crate-metadata.json". These files are recognized as JSON files and get "application/json" mime type. It will require a pull request to ensure the correct RO-Crate mime type for these files: |
@ErykKul great stuff! With regard to the file extension, we probably need to treat JSON files differently. Instead of simply relying on .json as a file extension, we should probably read a bit of the file and try to figure out what flavor of JSON it is based on the content. Does that make sense? Please feel free to create an issue about this. Also, wow, what a long mime type! I keep thinking we should show mime types in the but maybe not this one in full. 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty of giving my 2 cents on this PR, which I'm very happy to see! :)
I am not familiar with dataverse codebase but isn't there a better way than include a 34.000 lines js file (ro-crate-dynamic.js)? Also this file starts with minified code, never a good thing to have minified js in source tree (and why not minify all the file then?).
It seems this huge file also includes all of the test code/library (assertion-error, chai, etc...), definitely not something we want to include, right? How was this file produced?
Also @ErykKul points to https://github.com/UTS-eResearch/ro-crate-html-js in their message but the active fork is: https://github.com/Language-Research-Technology/ro-crate-html-js. Was the maintained one used?
Hello @ErykKul 👋 I see you fixed the small issues I pointed, but what about the elephant in the room: meaning the 34k lines javascript file. Can you comment on that? Do you think this could be improved somehow? I'm also asking a few questions in my comment above, it would be great if you could address them :) Best regards, |
The big js file is the one of the two distribution files from ro-crate-html-js projects. It can be deleted and then loaded instead from https://unpkg.com/ro-crate-html-js/dist/ro-crate-dynamic.js (that resolves to https://unpkg.com/[email protected]/dist/ro-crate-dynamic.js if we want to point to a specific version). I copied it to this project, since it is sometimes a good idea to keep the library files in the case when some online resources change or disappear over time. Also, I needed to add one line of code in there; otherwise the viewer is not refreshed after the script is loaded. There might be a better solution to that, I did not find it yet. If we say we delete this file, then we would need to have some way to call the load function of that script after it is loaded in the previewer. |
I added this info in short at the top of the lib file. |
I respectfully disagree. Copying the libraries in the source is really bad practice that should have disappeared with the existence of I see two solutions:
A few notes:
Overall, my comments aren't really regarding only this PR but rather the existing structure of this repository and how dependencies are handled. I'm also suprised that basically everything needs to be duplicated in version folders. Isn't it possible to use git branches and tags? I can't imagine trying to find something with Sorry if my comment appears negative, it is hard to communicate constructive criticism in a pleasant manner. Best regards, |
I will go with the first solution, I am working on it (loading the script from the script tag with the call to the load function in it did not work for me at first, before doing this pr, but I will put some more effort into it). As far as I know, there is no build process to this project. You simply point directly to the HTML, e.g. https://gdcc.github.io/dataverse-previewers/previewers/v1.2/ImagePreview.html I am not sure if there is any other reason why we copy the script files to this project from other projects, maybe @qqmyers knows history of it better. I assume it is fine to point to external files. |
@NicolasCARPi I have tried again calling the load function from the script tag, but I still fail at that. My last attempt: ErykKul@af36cb6 My best guess is that it is being called on a wrong thread. When I call window.onload() manually with debug on the RO-Crate previewer thread, it loads the content: I have tried googling it, adding sleep, doing async, etc., with no results. I think that this is the best I can do with my limited knowledge of these technologies. If you have some good ideas, I am open for suggestions. |
In the commit you mention, you are using a You'll need a third script tag so the resulting html looks like: <script type='application/ld+json'>{ DATA }</script>
<script src='https://unpkg...'></script>
<script>window.onload()</script> Note that you do not need to specify the
(source) Note also that the onload() function needs to be made available in global namespace (window), which I believe is the case, haven't looked at it again. Finally, note that I don't have a running Dataverse instance locally where I could try your code, so I'm doing all this "matrix style" and I might miss something! I'll see about resolving that so my contributions can be more meaningful. |
It did not work (window.onload is not a function): This was also something I had tried before doing this PR. It does work with the delay of 5 seconds like this, but I do not think it is a good solution:
I like better some sort of solution like the current one, where the window.onload() is called directly after the script is loaded. |
I have also just tried this one:
The browser just goes into infinite loop and the site is not responsive, without the debug option, etc. |
I have also tried making the call async in a new function. |
Looking at the code here: https://unpkg.com/[email protected]/dist/ro-crate-dynamic.js We can see: window.onload = function () {
load();
}; Be careful as you seem to mix the But reading a bit more the code, it seems this library is very specific and isn't really a library as we can usually encounter. It is full of side effects and expects html elements with specific ids or content to be present on the page. This is something we often see in research software, as they are not written by javascript gurus, or people knowledgeable in the web technologies, but rather by researchers who get it to a point where it works on their machine. In a modern ideal world it would work like this: import { Previewer } from './ro-crate-previewer';
const ro-previewer = new Previewer();
ro-previewer.setData = '{"some": "json-ld"}';
const previewDiv = document.getElementById('previewDiv');
previewDiv.innerHTML = ro-previewer.getHtml(); Something like this. But here the issue is that the lib you're trying to use is clearly tailored for a unique use case rather than being generic, as a library should be. At this point, I wonder how difficult it would be to write it from scratch in typescript, with a proper api and packaging, rather than having to depend on this existing one 🤔 The more I look at the code, the more I see issues everywhere. Such as HTML content, english strings, mixed const/let/var, etc.... I'm sure something cleaner can be done. I'll try and have a look at it this week-end. |
Great! I appreciate that! What I also forgot to mention is that normally that script is loaded from in the header of the HTML, however, if that script loads before the tag Nevertheless, it is one of the "official" RO-Crate tools, as it is mentioned by https://www.researchobject.org/ro-crate/tools/. I did use it out of context of the intended use case, this may contribute to the encountered problems. Considering that it is just one line of code I needed to add, it is a kind of pragmatic solution that could be considered in the short term. |
This might be also useful: there is an easy way of setting up a local dataverse for development/testing using docker: https://guides.dataverse.org/en/latest/container/dev-usage.html |
So I started working on a library that would generate html from a ro-crate json-ld string, in order to adress the shortcomings mentioned above. Here is how one can use it (once it'll be published). The code below would be on dataverse side: import { Builder } from '@deltablot/ro-crate2html';
const data = 'our json-ld ro-crate as string, obtained with a fetch() for instance';
// initialize our library object
const builder = new Builder();
// get the results from the builder, much easier to call a function with data as argument rather than expect the lib to find the data in a script element in the page!!!
const result = builder.parse(data);
// now we simply add the results to our element of choice in our html
const targetDiv = document.getElementById('ro-crate-div');
/*
currently the lib returns an array of 2 HTMLElements,
the first one corresponds to the ro-crate-metadata.json,
so kind of like the metadata of the crate,
and the second one corresponds to the `@id` with `./`, which we know is present in the crate, per specification.
*/
result.forEach(el => {
targetDiv.appendChild(el);
}); And the HTML would need to be:
And here is how it looks like in the page: Of course this is very preliminary as I've only worked on it for a few hours. But I believe it would be far better to use a library like this one, with a clean interface and no assumption on where it will run. Currently the lib is less than 100 lines of code and has no dependencies. One question that I have is: what exactly do we want to extract from the RO-Crate, and how do we want to display it? I used the |
Hey @ErykKul can you give it a try? The npm package is published and the documentation is enough to get you started. You can load the lib with: https://unpkg.com/@deltablot/[email protected]/dist/main.js Then use it as shown above, or get inspiration from the demo code: https://github.com/deltablot/ro-crate2html/tree/master/demo |
@NicolasCARPi |
Ok, I'll try and join, didn't know this existed! |
I have tried integrating the previewer, but I get an error from the script: It works for the most part, but it looks strange (I think that the error prevents the other scripts from fixing the header, etcl.): The header is needed when you use "Open in new window" button, like this: I will try your examples of RO-Crate metadata to see if it fixes the problem. The example I have used comes from the https://github.com/UTS-eResearch/ro-crate-html-js/blob/master/test_data/sample-ro-crate-metadata.json The implementation I am testing: Kind regards, |
I have tried your examples, they don't generate that error, but it still looks the same: It might be related to the Kind regards, |
I think the issue is that your sample crate is using legacy As for why the page looks weird I have no idea, but at least the proof of concept works, you successfully replaced 34k lines of code with about 100, which in my book is a clear win! |
…k in Dataverse context
Downloading files via previewer is not possible right now, as it would require significant changes to the previewers' framework (it is intended for previewing single files and does not contain download links to other files). Therefore, I have removed the download button-link from the preview. I have also disabled the previews of PDF and image files due to the same problem; dataset files are not accessible from the previewer. |
@NicolasCARPi that 34K line file is my fault -- I wrote the original HTML preview code several years ago and I really had no idea what I was doing when I set up the packaging. It's certainly not an 'official' RO-Crate thing, but it has been used in a number of projects in the absence of anything else that just works for generic RO-Crate rendering. That code was developed over time, dealing with issues as they came up but I would be very happy to assist in distilling all the things that it does into a set of requirements for new rendering code. A quick summary is that the code was developed initially to make a static HTML preview for a crate that could be stored in an RO-Crate root directory. It turned out that for very large crates (over a few thousand entities -- and yes there are lots of these being created), this didn't work very well in browsers so a hybrid approach evolved - write out a summary of the metadata so there is a summary for posterity, and then when a user opens it, call in the viewer app to provide an interactive RO-Crate browser. It displays one element at a time, as trying to turn a JSON-LD @graph into a tree (eg nested elements) is NOT possible, there may be cycles in the @graph, and as a user clicks around, the 'tree' will just keep going.
Some of the considerations a preview needs to think about at least these things:
You may want to consider using the RO-Crate-js library which deals with finding the root, resolving context terms, validation etc and which will implement future RO-Crate versions. I'll get my team to have a look at removing as much of the extra code as possible from the distribution, thanks for pointing this out @NicolasCARPi |
Hello @ptsefton ! Thank you for your insight.
I'm sure the library is quite useful in the context of a full fledged visualizer, and has already been battle tested to handle all edge cases. But I believe in the context of a previewer on Dataverse website, there are a few things of concern. For instance, look at this line: if (p.match(/(\.txt$)|(\.html?$)/i)){
previews += `<iframe src='${p}' width='100%' height='500'></iframe>`; This will create an iframe in the page with attacker supplied URL/content, which will probably be able to run javascript because having a restrictive CSP for Dataverse isn't something users appear to be doing. I also have an issue with this kind of code: https://github.com/Language-Research-Technology/ro-crate-html-js/blob/1e81b76649753cef43ab68eb4ba8b0dc21b3feeb/lib/ro-crate-preview.js#L165-L183 which will add obsolete version of jquery library in the page along with a bunch of css and other js files. In the context of a standalone previewer, why not, use what works for you, but in the context of a standalone library, this is a big red flag. Ideally the library has no concept of webpage, its job is to convert JSON-LD to simple HTML so we can display the content in a "browsable" manner. And it definitely must not do any CSS, especially hardcoded values as style attribute. Given this, my take is to start from scratch with a proper typescript library, and while it still needs polishing, especially to handle many use cases, I believe it's enough to show a preview of the content, and IMHO we should not try and allow previewing too much: embedding pdfs and displaying html code in the page shouldn't be the concern of the previewer on Dataverse (but definitely is for your lib, which is a more complete visualizer (not previewer, visualizer)). I'll be happy to discuss that with you and others this afternoon: https://dataverse.org/community-calls :) Best regards, |
@NicolasCARPi unfortunately I won't be able to make todays (tomorrow's) call from this timezone. I was not proposing that you use the ro-crate-html-js which you have already critiqued, or to defend it, the way it's packaged is clearly not optimal and we have always assumed that embedding it in a server application would not be appropriate. I was referring to ro-crate-js which handles RO-Crate parsing and access to entities etc, so you can find the Root Data Entity, parse the context etc. This library is available here: https://www.npmjs.com/package/ro-crate -- if you have any concerns about that it would be good to hear them too. I'm not clear on what you're trying to achieve with the Dataverse crate-viewer but I think it would be useful if people could use it to inspect the contents of datasets -- including views of whatever can be safely viewed. That's part of the point on having a visualizer/viewer whatever you call it. If there are images and PDFs and so in a dataset why not let people look at them and see if they want to download the data? This might include looking at the first few rows of CSV files, and viewing movies inline. That's part of the the point of having the HTML file as part of the spec, and it would be good to see that available in a repository, in my opinion. You may not like the term preview, but that's why we sometimes refer to it as a preview -- it's taking a look at what would get if you download the package. |
Oh I didn't realize you were talking about another lib. But it seems the API doc is 404: https://arkisto-platform.github.io/ro-crate-js/ (following the link in the README). I agree it would make sense to use that one, and contribute to it if needed, but it looks like it works in Node and not in Browser context, right? As for where to place the cursor on what to display and what not to display, I guess it's a decision left to Dataverse's maintainers :) Cheers, |
Could it be an option to use the ro-crate-html-js offline to generate the preview-html server-side, but without the script |
@Stian that's an option which is suitable for small/usual sized crates, just generate a static preview and not have a script loaded at all. We're currently updating the code to improve the packaging, we can add an option for rendering a complete crate and whether or not to call the dynamic previewer/vizualizer |
This repository also functions as the backend of the previewers and can only serve static content (https://pages.github.com/); i.e., the same for every file being previewed as HTML. For the RO-Crate previewer the HTML file that is shown is previewers/betatest/ROCrate.html, which is also hosted on my fork: https://erykkul.github.io/dataverse-previewers/previewers/betatest/ROCrate.html For example, If you would like to try this previewer on your Dataverse installation without waiting for this PR to be merged, you could run this bash command to add it to your local Dataverse installation (note that it also needs the IQSS/dataverse#10016 to detect the RO-Crate files such that they can be shown with the correct previewer): curl -X POST -H 'Content-type: application/json' http://localhost:8080/api/admin/externalTools -d \
'{
"displayName": "Show RO-Crate content",
"description": "View the RO-Crate file.",
"toolName": "rocratePreviewer",
"scope": "file",
"types":["preview"],
"toolUrl": "[https://erykkul.github.io/dataverse-previewers/previewers/betatest/ROCrate.html"](https://erykkul.github.io/dataverse-previewers/previewers/betatest/ROCrate.html%22),
"toolParameters": {
"queryParameters":[
{"fileid":"{fileId}"},
{"siteUrl":"{siteUrl}"},
{"key":"{apiToken}"},
{"datasetid":"{datasetId}"},
{"datasetversion":"{datasetVersion}"},
{"locale":"{localeCode}"}
]
},
"contentType": "application/ld+json; profile=\"http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted [https://w3id.org/ro/crate\""](https://w3id.org/ro/crate/%22%22)
}' This kind of setup requires a loaded script that can dynamically adapt the statically hosted HTML to the specific content for the file being previewed. Nevertheless, it is possible to have a backend that does more than just hosting static content and have anything behind it, like docker container with any kind of code. In that case, you would simply set the I think that the previewer implemented in this PR is fine for the short term use. It uses the most complete and production ready library for RO-Crate viewing I have seen so far. There might be something better available in the future, when that time comes, we can reconsider and add a new previewer or replace this one. I think it is safe to merge this PR and wait until that happens. @ptsefton |
In case anyone missed the great conversation about this previewer (and RO-Crate in general), it's linked from here: https://groups.google.com/g/dataverse-community/c/jlRanZy4b38/m/lcyiT7F4HQAJ Direct link to the video: https://drive.google.com/file/d/163meDdiPpx7GtVYo0ue44iaypbAt4Pxf Also, there's an RO-Crate topic in our chat if anyone would like to discuss there: https://dataverse.zulipchat.com/#narrow/stream/379673-dev/topic/RO-Crate/near/393962020 |
Hi all, The github repo https://github.com/UTS-eResearch/ro-crate-html-js has been superseeded by https://github.com/Language-Research-Technology/ro-crate-html-js since about a year ago. |
Thanks! It looks like the https://www.npmjs.com/package/ro-crate-html has outdated information: It points to https://unpkg.com/ro-crate-html-js/dist/ro-crate-dynamic.js i.s.o. https://unpkg.com/ro-crate-html/dist/ro-crate-dynamic.js That is why the file I have used was outdated. I have checked out the project and can build it now. It will be easier to add the changes I needed in the source than in the minimized version. Also, A new commit on this branch will follow soon. |
Now the library is 168.1 kB, I will test the new version of the previewer to see if it still works as expected. |
It works great! Thanks! |
RO-Crate previewer for viewing and verifying RO-Crate files based on https://github.com/UTS-eResearch/ro-crate-html-js