Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

npm i @rdkit/rdkit, future directions #389

Open
MichelML opened this issue Sep 25, 2023 · 7 comments
Open

npm i @rdkit/rdkit, future directions #389

MichelML opened this issue Sep 25, 2023 · 7 comments
Labels
epic Something containing a series of things to do or requiring a long time to complete.

Comments

@MichelML
Copy link
Collaborator

MichelML commented Sep 25, 2023

npm i @rdkit/rdkit, future directions

Preparing slides for RDKit UGM 2023 got me thinking more seriously about this project's future directions. While there was not a lot of hands raised when I asked the audience who used the package in the past, the number of web/js talks + hackathon interest (without mentionning npm stats, issues created, and feature requests made on GH) mentionning the use of rdkit-js made it clear that there is a demand in using rdkit on the web, and that rdkit-js is usually the first thing people try to make this a reality.

The following Epic lays out what I think are the anticipated next steps for the project across different themes, which aim to improve various facets of the project: usability, usefulness/functionalities, robustness, and good DX. This issue (or related GH issues that will be logged soon) may evolve, but here are the current themes in no particular order:

  • CI
  • Initialization
  • API documentation strategy
  • Minimallib-level functionalities
  • Higher-level functionalities, and components
  • Local development setup
  • Deprecation of open source contributed examples

CI

Context: rdkit-js is a web assembly module (called the MinimalLib) at its core, which exposes functionality of the main RDKit project. Currently, when changes occur in the main rdkit project, there is no guarantee that these changes won't break a downstream build of the rdkit-js library. Solution:

  • Add an azure pipeline CI in the GH RDKit/RDKit project that will build the rdkit-js library for each change made in the main rdkit package.

Initialization

Context: Many users complained the library is hard to use in certain contexts. Namely, if you work in a modern development environment such as create react app or similar, you cannot simply import rdkit from "@rdkit/rdkit";, then use. Why this isn't possible is multi-faceted: 1) the minimallib is not built for >= es6, 2) some environments are "use strict"-enforced, and the minimallib isn't currently compiled with the "use strict" option, 3) the library being wasm based, you need to think about where to publicly serve this module for things to work, and since a lot of users do not have a web background or experiencing managing assets in a web context, this can be a barrier to entry or a reason to simply abandon using the package. Lastly, users have reported wanting mechanisms allowing to import the library offline, since currently the library is fetched on each page load (or if you're not careful, everytime you call initRDKitModule()). Solutions:

  • Experiment with building the wasm module with emscripten es6, strict and environment options to see if it solves the import problem in modern environments (and don't break loading the lib in environments it currently works in). If this doesn't solve the problem, investigate compiling an initRDKitModule for each target environment vs a generic one for all environments.
  • Based on findings, potentially expose more than one wasm module people can choose from depending on their use case. This also ties with potentially having different builds for different use cases (depiction only vs "fullstack" rdkit-js).
  • Add a wrapper on the low-level initializer using good defaults, making it so that users don't need to think about serving assets properly (the WASM module and JS file) before starting to use the library.
  • Use IndexDB as a caching mechanism of the wasm module, so that subsequent page loads will load the module from disk and not from the network. The version cached should be considered as part of the caching mechanism.
  • Rewrite the main README to showcase the way(s) users can start using the library in modern web environments.

Minimallib-level functionalities

Context: There is currently functionalities either 1) compiled but not exposed in the docs, 2) not-compiled but already available as part of the minimallib build, and 3) not-compiled and not available, yet user-requested. Solutions:

  • Add documentation for existing functionalities not exposed in the docs: Reaction API.
  • Compile functionalities already part of the build as part of the rdkit-js offering: MCS API.
  • Work toward gradually adding user-requested features: R-group decomposition, Similarity search, getters/setters for atom and bond properties are not exposed, enable an API allowing to draw "on top" of the 2D depiction images (ref draw polygons #295 and Any quick feedback on rdkit-js? Add it here! And let the community upvote! #150 (comment) ), multi-color substructure highlights, lasso highlight, etc.

Higher-level components

Context: considering we are now able to "install, then use" in various contexts, this lays out the foundation for easy to use Higher-level components both in pure JS and modern frameworks. This should also abstract some important performance considerations to keep in mind when leveraging a wasm module and making a large amount of manipulations on molecules. Solutions:

  • Move the rdkit-structure-renderer (or its adaptation) inside the rdkit-js GH repository, which already abstracts performance considerations. Add tests.
  • Implement a MoleculeStructure React component leveraging the rdkit-structure-renderer, with tests.
  • Move rdkit-js GH repo structure to a multi-npm package, mono repo architecture: @rdkit/rdkit (core functionalities) @rdkit/renderer (@ptosco's adaptation of his rdkit-structure-renderer), and @rdkit/react (proof of concept of building and maintaining framework specific components) .

API documentation strategy

Context: Examples only documentation (current) is insufficient. Manually maintained API documentation (current) is error prone. We must move towards an API documentation that is as automated as possible. Solutions:

  • Generate the index.d.ts file for the API during the wasm module build (possible via an emcc flag --embind-emit-tsd <name of .d.ts file>). Since this file is complete, but pretty raw, add proper description of each type manually on the first generation, and make it the official types for the library. Then, commit the initial/raw index.d.ts to be able to maintain a diff of the API changes before each release; edit the client-facing index.d.ts according to the changes before a release. Finally, deploy the API documentation via typedoc (already implemented).

  • Include TS definitions for API functionalities sitting on top of the core minimallib API (initializer, components, etc); this may simply imply implementing these functionalities in TypeScript directly.

Local development of rdkit-js.

Context: The above changes will add some complexity for the development of the library. Strategies to overcome it will be laid out once progress is made on segments of this epic. For example, to build higher-level JS functionalities on top of the minimallib, you need to build the right version of the minimallib locally first, and the local tooling to do that easily is not currently in place.

Deprecation of open source contributed examples

Context: Issues regarding examples implementation have stirred the rdkit-js efforts in a direction that does not improve the library itself, and making them mainstream in the library introduces maintenance overhead. Future work and GitHub issues should encourage contributions that will either 1) Make the library easier to use, 2) Add functionality to the library, or 3) Improve existing functionality of the library. Solutions:

  • Warn contributors that the examples besides rdkit-js and rdkit-react will be removed from the GH repository, and contributors will have X days to host the examples elsewhere. The rdkit-js project will maintain a list of these initiatives in the README of the project, but won't host/deploy the examples code anymore.
  • Proceed to removing examples and adapt README consequently.
  • For remaining examples, proceed to adding examples covering missing minimallib functionalities not currently covered: MCS and Reaction APIs.

Relevant links

@MichelML MichelML added the epic Something containing a series of things to do or requiring a long time to complete. label Sep 25, 2023
@MichelML
Copy link
Collaborator Author

MichelML commented Sep 25, 2023

GH issues will be logged according to the above in the next few days.

@ptosco @greglandrum (or anyone ), any thoughts on the above would be greatly appreciated. Greg, Paolo, it was nice to see you both in person at the UGM, and the whole community actually. I'm glad we were able to exchange a bit Paolo, and hopefully next time Greg we'll get to chat more.

@papillot
Copy link

papillot commented Oct 2, 2023

Thanks @MichelML for having laid out these directions!

One improvement that I can think about regarding the current JS API, would be to abstract the need to serialize parameters as JSON strings (such as the details option in the depiction functions) and to deserialize returned values to JavaScript Objects.
I am using RDKit js in a Typescript project and passing parameters in a type safe manner is not straightforward.
I realize that this may require breaking changes to the JSMol objects, but I guess that it would be useful to have this baked-in the library so that the users do not have to reinvent their own utilities (like an API to wrap the original API with complete typings).

Regarding the caching of the wasm module in IndexDB, this is not an issue I have been confronted to. In my experience, the wasm file is cached by the browser directly thanks to the request headers. This may not be achievable on every host environment though.

@adam-of-barot
Copy link
Contributor

Thank you for the writeup @MichelML!

While I'm sort of sad to see the examples go, it's probably the right direction to go.
RDKitJS is probably a bit tricky to meaningfully contribute to, since mainly JS devs want to use it, but it requires some C/C++ knowledge to actually move the project forward.

Also, I have started experimenting with different compilation options. Once the separate issues go up, I'll go into more detail, but I just wanted to share my initial findings.

  • adding the -s EXPORT_ES6=1 flag -> compilation successful, but js tests fail
  • adding the -s STRICT=1 flag -> compilation fails
  • adding the --embind-emit-tsd interface.d.ts flag -> compilation successful, d.ts file generated. Couldn't figure out how to copy it out of the docker container though.

@MichelML
Copy link
Collaborator Author

MichelML commented Oct 14, 2023

Thanks for this @adam-of-barot , I also experimented and got the same results as you.

When you compiled with --embind-emit-tsd interface.d.ts, did it compile the wasm & js files as well or only the .d.ts file? For me, it was only the .d.ts file , so if the same for you, we'll have to sort of build the lib twice as part of the release process. No big deal, just not the best.

For EXPORT_ES6, what was failing in the tests? Was it just a matter of adjusting how you import the lib prior to the tests?

I also found a promising avenue with the ENVIRONMENT link time options, where you can compile the wasm module for any runtime in those: web,webview,worker,node. I'm not entirely sure what webview is (i think it's a deprecated API for chrome), but after building including only the web,webview,worker runtime, I was able to successfully import the initializer without compile errors in the most recent version of create-react-app. Meaning, this:

import initRDKitModule from "@rdkit/rdkit";

worked out of the box without warnings or errors. This is encouraging to make the initialization/import easier and more modern.

@adam-of-barot
Copy link
Contributor

adam-of-barot commented Oct 14, 2023

I think all three files (.js, .wasm, .d.ts) got generated, but I'll have to double check. (EDIT: double checked, all files are generated)
I compiled it with "-lembind --embind-emit-tsd interface.d.ts" flags instead of just "--bind" btw.

Testing the EXPORT_ES6 compiled .js file results in the following error:

/src/rdkit/Code/MinimalLib/demo/RDKit_minimal.js:3
var _scriptDir = import.meta.url
                        ^^^^
SyntaxError: Cannot use 'import.meta' outside a module
...

Changing require to import and changing .js to .mjs for both tests.js and RDKit_minimal.js seems to solve the issue, but other require statements inside the test file need to be changed to import as well.

But it's good to know that the ENVIRONMENT flag is promising. I'll play around with it too.
I can think of Android System WebView and Microsoft WebView2 when seeing the webview environment.

@MichelML
Copy link
Collaborator Author

@adam-of-barot -lembind didn't work for me 🤷‍♂️ . Let me know when you try, and make sure to remove your previous build artifacts before you try.

To copy the interface.d.ts out of docker, you should just have to change this line https://github.com/rdkit/rdkit-js/blob/master/Dockerfile#L64 to:

RUN make -j2 RDKit_minimal && \
  cp Code/MinimalLib/RDKit_minimal.* ../Code/MinimalLib/demo/ && \
  cp Code/MinimalLib/interface.d.ts ../Code/MinimalLib/demo/

@adam-of-barot
Copy link
Contributor

adam-of-barot commented Oct 18, 2023

My bad @MichelML, sort of.

I was manually building the MinimalLib to better understand how the docker container looks during build time. I noticed that some parameters (like -- bind) are passed from CMakeLists.txt, so that's where I initially included --embind-emit-tsd. I read from the emscripten docs that --bind is deprecated in favor of -lembind, so I tried building with that, but it's probably not necessary to change that.

I wasn't sure if there were restrictions/best practices on where to pass which flag.
But thankfully adding the --embind-emit-tsd flag in the Dockerfile does the same thing, so that's good.

I was able to get the autogenerated d.ts out of the container with that code, thanks!

This was referenced Nov 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic Something containing a series of things to do or requiring a long time to complete.
Projects
None yet
Development

No branches or pull requests

3 participants