
DB Creation and Updating #114

Open
jthompson-arcus opened this issue Oct 16, 2024 · 4 comments
Labels
question Further information is requested

Comments

@jthompson-arcus
Collaborator

clinsight/R/run_app.R

Lines 66 to 74 in f32e09d

```r
if (!file.exists(user_db)) {
  warning("No user database found. New database will be created")
  db_create(get_review_data(data), db_path = user_db)
} else {
  # Skip if not needed for faster testing:
  if (isTRUE(get_golem_config("app_prod"))) {
    db_update(get_review_data(data), db_path = user_db)
  }
}
```

Why is the database creation and updating not part of the preprocessing? Since we already require the study data and metadata to be created beforehand, it seems to make more sense to update the database when the study data is created, not every time the application is run.
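To make the suggestion concrete, a hypothetical preprocessing helper (a sketch only; `prepare_user_db` and the `study_data`/`user_db` argument names are my assumptions, reusing the `db_create()`/`db_update()`/`get_review_data()` calls from `run_app.R`) could handle the database step once, at data-creation time:

```r
# Hypothetical preprocessing step (sketch): create or update the user
# database when the study data is (re)generated, rather than at every
# app launch. Mirrors the logic currently in run_app.R.
prepare_user_db <- function(study_data, user_db) {
  review_data <- get_review_data(study_data)
  if (!file.exists(user_db)) {
    message("No user database found; creating a new one at ", user_db)
    db_create(review_data, db_path = user_db)
  } else {
    db_update(review_data, db_path = user_db)
  }
  invisible(user_db)
}
```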

@jthompson-arcus jthompson-arcus added the question Further information is requested label Oct 16, 2024
@LDSamson
Collaborator

You can always update the database manually, of course, which is probably smart to do. Do you propose removing these checks, and if so, why? Don't you think they add a layer of robustness by verifying that the database is in sync with the study data when the application starts?

@jthompson-arcus
Collaborator Author

I do think we should have the check in place, and db_update() does return early thanks to its simple sync-time check.
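For readers unfamiliar with the mechanism: a sync-time early return of the shape described above might look roughly like this (a hypothetical sketch, not clinsight's actual `db_update()` implementation; the `timestamp` column and `stored_sync_time` argument are assumptions):

```r
# Hypothetical sketch of an early-return sync check: compare the sync
# time stored in the database with the newest timestamp in the review
# data, and skip the (expensive) update when nothing has changed.
db_update_sketch <- function(review_data, stored_sync_time) {
  new_sync_time <- max(review_data$timestamp)
  if (identical(new_sync_time, stored_sync_time)) {
    return(invisible(FALSE))  # already in sync; nothing to do
  }
  # ... perform the actual database update here ...
  invisible(TRUE)
}
```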

I think my question around it is really two-fold:

  1. As a process flow, running this in the preprocessing step makes more sense than running it at app startup. But maybe that is just a documentation problem, since this can function mostly as a check.
  2. The update only runs when app_prod = TRUE, and it does so silently. That configuration has to be set intentionally, because the default is FALSE and it is easy to forget.

To a lesser extent I'm worried about initialization time, but this process here isn't really the culprit there.
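For reference on point 2, the relevant toggle lives in the golem config file (this is a sketch of the standard `inst/golem-config.yml` layout from the golem template, not clinsight's exact file):

```yaml
# inst/golem-config.yml (sketch): `app_prod` is FALSE in the default
# profile, so db_update() is silently skipped unless a production
# profile is explicitly activated (e.g. via R_CONFIG_ACTIVE).
default:
  app_prod: no
production:
  app_prod: yes
```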

@jthompson-arcus
Collaborator Author

@LDSamson another thought: study_data is only used here and in get_appdata() in app_server.R. Revisiting our process could help with load times in the application. Right now the app_data object gets created for each user at runtime; as far as I can tell, we should see performance gains if the application relies on app_data instead of study_data.
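One hypothetical way to get that gain (a sketch under my own assumptions; `make_cached_appdata` is an invented helper, and only `get_appdata()` comes from the app itself) is to memoize app_data once per process instead of rebuilding it per session:

```r
# Sketch: build app_data once per process (e.g. in run_app() or global
# scope) instead of inside the per-session server function, so every
# user session shares the same precomputed object.
make_cached_appdata <- function(study_data, meta) {
  app_data <- NULL
  function() {
    if (is.null(app_data)) {
      # Computed once, on first access, then reused by later sessions.
      app_data <<- get_appdata(study_data, meta)
    }
    app_data
  }
}
```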

@LDSamson
Collaborator

It might indeed be better to rely on app_data. Historically, the app relied on multiple objects being available (app_data, app_tables, app_vars, metadata), which was too complicated and led to the current situation. I don't think the performance gain will be big, but relying only on app_data can help.

We can also simplify the in-memory metadata, since I think we only use metadata$items_expanded and some metadata-derived app_vars in the application.

Since the app is slow to start up, we should aim for better performance and identify the biggest bottlenecks. The biggest gains can probably be achieved by improving the plotly functions: designing the figures with native plotly functions instead of building them in ggplot2 and then converting them with plotly::ggplotly().
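To illustrate the two routes (a generic sketch using built-in `mtcars` data, not clinsight's actual figure code):

```r
library(ggplot2)
library(plotly)

# Conversion route: build a ggplot, then translate it to plotly.
# The ggplotly() translation step is typically the slow part.
p_converted <- ggplotly(
  ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
)

# Native route: construct the equivalent scatter plot directly with
# plot_ly(), skipping the ggplot build and translation entirely.
p_native <- plot_ly(
  mtcars, x = ~wt, y = ~mpg,
  type = "scatter", mode = "markers"
)
```

Both produce a plotly htmlwidget; the native route just avoids paying for the ggplot object construction and its conversion on every render.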
