
docs: transfer of text about runtime uploads #30

Merged
merged 10 commits into from
Jan 2, 2024
15 changes: 15 additions & 0 deletions docs/design/images/runtime-login-sequence.puml
@@ -0,0 +1,15 @@
@startuml runtime-login-sequence

title Login and Authentication Sequence of a Registered User

participant "User" as u
participant "Frontend" as f
participant "API" as api
participant "Backend" as db

u -> f ++: Enter login information
f -> api ++: Send login request
api -> db ++: Check permissions level
db --> api --: Send permissions level
api --> u --: Show permission-dependent content
@enduml
185 changes: 185 additions & 0 deletions docs/design/runtime-view.qmd
@@ -0,0 +1,185 @@
---
title: "Runtime View"
---

This section describes the concrete behaviour, interactions, and
pathways that data take within Sprout. "Runtime" in this case refers
to how the software works "in action".

## Login and Authentication

Almost all users will need to log into the Sprout-managed Data
Resources. The steps for logging in and having their permission levels
checked follow the sequence described in the figure below.

![Login and Authentication Sequence of a Registered User.](images/runtime-login-sequence.png)

## Data Input

The overall aim of this section is to describe the general path that
data takes through a Seedcase Data Resource, from input to the final
output. Specifically, these items are described as:

- *Input*: Because we currently focus on health research, the type of
input data and metadata is what is typically generated from health
studies. This could be in the form of, e.g., CSV, Excel, JSON, or
image files.
- *Output*: The final output is the input data stored together as a
single database, or as multiple databases and files explicitly linked
in such a way that they conceptually represent a single database.

### Expected Type of Input Data

Given the (current) focus on health data, as well as the team's
experience with research and health data, we make some assumptions
about the type of data that will be input into Sprout. Health data
tends to consist of specific types of data:

<!---
- **Clinical**: This data is typically collected during patient visits
to doctors. Depending on the country or administrative region, there
will likely already be well-established data processing and storage
pipelines in place. --->
- **Register**: This type of data is highly dependent on the country
or region. Generally, this data is collected for national or
regional administrative purposes, such as recording employment
status, income, address, medication purchases, and diagnoses. Like
routine clinical data, the pipelines in place for processing and
storage of this data are usually very extensive and well established.
<!---
- **Biological sample data**: This type of data is generated from
biological samples, like blood, saliva, semen, hair, or urine. Data
generated from sample analytic techniques often produce large
volumes of data per person. Samples may be generated in larger
established laboratories or in smaller research groups, depending on
what analytic technology is used and how new it is. The
structure and format of the generated data also tends to be highly
variable and depends heavily on the technology used, sometimes
requiring specialized software to process and output. --->
- **Survey or questionnaire**: This type of data is collected based
on a given study's aims and research questions. There are hundreds
of different questionnaires that can have highly specific purposes
and uses for their data. They are also highly variable in the volume
of data collected and in the format of that data, depending on the
survey.

These types of input data are commonly formatted as text (txt) files,
comma-separated value (csv) files, Excel (XLSX) files, and JSON
structures.
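As an illustration of these formats, the sketch below reads the text-based ones into rows using only the Python standard library. The function name is hypothetical, and Sprout's actual ingestion code may differ.

```python
# Illustrative sketch only: reading the common text-based input formats
# listed above into rows of dictionaries. Not Sprout's actual code.
import csv
import json
from pathlib import Path


def read_input(path: Path) -> list[dict]:
    """Read a txt, csv, or JSON input file into a list of row dicts."""
    suffix = path.suffix.lower()
    if suffix in {".csv", ".txt"}:
        with path.open(newline="") as f:
            return list(csv.DictReader(f))
    if suffix == ".json":
        return json.loads(path.read_text())
    # Excel (.xlsx) files would need a third-party reader, e.g. openpyxl.
    raise ValueError(f"Unsupported input format: {suffix}")
```

Note that CSV values arrive as strings; type checking against a table schema would happen in a later validation step.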

### Expected Flow of Input Data

The data described above tends to fall into two main categories for
data input.

- *Routine or continuous collection*, where data is ingested into
Sprout as soon as it is collected from one "observational
unit"[^1] or very shortly afterwards. Clinical data as well as
survey or questionnaire data would likely fall under this
category.
- *Batch collection*, where data is ingested some time after it was
collected and from multiple observational units. Biological
sample data would fall under this category, since laboratories
usually run several samples at once and input data after internal
quality control checks and machine-specific data processing. While
register-based data does get collected continuously, direct access
to it is only given on a batch basis, usually once every year.
Survey data may also come in batches, depending on the questionnaire
and software used for its collection.

[^1]: Observational unit is the "entity" that the data was collected
from at a given point in time, such as a human participant in a
cohort study or a rat in an animal study at a specific time point.

For sources of data from routine collection with well-established data
input processes, the data input pipeline would likely involve
redirecting these data sources from their generation into Seedcase via
a direct call to the API, so that the data continues on to the backend
and eventual data storage.
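A routine collection source could push each record to the API as it is collected, along the lines of the sketch below. The endpoint path, payload shape, and bearer-token authentication are assumptions for illustration, not Sprout's documented API.

```python
# Hypothetical sketch of routine data input: a collection site pushes
# each observational unit's record straight to the API as soon as it is
# collected. The endpoint path and auth scheme are assumptions.
import json
from urllib import request


def build_upload_request(api_url: str, record: dict, token: str) -> request.Request:
    """Build the POST request a routine collection source would send."""
    return request.Request(
        f"{api_url}/records",
        data=json.dumps(record).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The built request would then be sent with `urllib.request.urlopen`, letting the API pass the data on to the backend and storage.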

Sources of data that don't have well-established data input processes,
such as data from hospitals or medical laboratories, would need to use
the Sprout data batch-input Web Portal. This Portal only accepts data
that is in a pre-defined format (as determined and created by the Data
Management Administrators) that includes documentation and, potentially,
automation scripts describing how to pre-process the data prior to
uploading it.

These uploaded files might be a variety of file types, like `.csv`,
`.xls`, or `.txt`. Only users with the correct permission levels are
allowed to upload data. It will be the Data Access Administrator who
does the initial upload, as that entails setting up table schemas
and allocating space in the raw data file storage. The second way of
getting data into the Data Resource is to have it entered manually by
an authorized Data Contributor.

Once the data is submitted through the Portal, it is sent in an
encrypted, legally-compliant format to a server and stored in the way
defined by the API and common data model.

### Upload Data to Sprout

An approved user, i.e., a Data Access Administrator or a Data
Contributor, will open the login screen in the Web Portal. They will
enter their credentials, which will be transmitted to the API layer.
The API Security layer will check with the list of users and permissions
in the database and confirm that the specific user has permission to
enter data into a specific table (or set of tables) in the database.

Once this check is complete, the frontend will receive permission from
the API Security layer to display the data entry/upload options for
this kind of user role.

Before any of the actions described below can be done, it is expected that appropriate table schemas or entry forms have been created by one or more administrators of the system. This process is described elsewhere.

<!--TODO change elsewhere above to the actual location of where we describe table schema and data entry form creation-->

#### Batch Upload of Data

The user has selected a valid table schema to use and has uploaded the
file to the holding area. This prompts the system to check that the
data in the file match the schema in the database on headers and data
types. If this validation is successful, the system will inform the
user how many rows of data it found and validated. If the user agrees,
the system will write the data into the relevant table and display a
confirmation back to the user. Should the user disagree with the number
of rows, they cancel the upload and take the file away to investigate
why the system can't see the correct number of rows; this investigation
happens outside of Seedcase.
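The header-and-type check described above could look roughly like the sketch below, assuming a table schema expressed as a column-name-to-type mapping. The names are illustrative, not Sprout's implementation.

```python
# Minimal sketch of batch-upload validation: check that the file's
# headers and value types match a table schema, then report the row
# count. Schema representation and names are assumptions.


def validate_batch(rows: list[dict], schema: dict[str, type]) -> int:
    """Check headers and data types; return the number of valid rows."""
    if not rows or set(rows[0]) != set(schema):
        raise ValueError("File headers do not match the table schema")
    for i, row in enumerate(rows):
        for column, expected in schema.items():
            try:
                expected(row[column])  # e.g. int("42"), float("70.5")
            except (TypeError, ValueError):
                raise ValueError(f"Row {i}: {column!r} is not {expected.__name__}")
    return len(rows)
```

The returned row count is what the system would show the user for confirmation before writing the data into the relevant table.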

![Logged In User Who Chooses to Use the Batch Upload Function with Existing Table Schema.](/design/images/user-flow-data-upload.png){#fig-batch-data-entry}

<!--TODO Ensure that the link above will still work once SKB has finished updating the diagrams-->

#### Manual Data Entry: Done in One Session

The user completes all fields in the form and clicks "Save and Submit". This sends
the data to the API layer where it is confirmed as valid, parcelled up
and submitted to the database. The database will then write the data
into a new record in the table (or tables). Once done, the database
will confirm successful entry of the data to the API, which will in
turn send the confirmation back to the user via the frontend.

![Logged In User Who Manually Writes a New Row to the Data
Resource.](/design/images/runtime-manual-data-entry.png){#fig-manual-data-entry}

<!--TODO convert puml file to png so that the link above works-->

#### Manual Data Entry: Done in Multiple Sessions

There may be situations where an approved user is prevented from
completing the data entry form in one session. In that case, it is
beneficial to have the option of saving the data as it is and
returning to the data entry at a later time. Much of the initial
workflow is the same as above, until the user is interrupted and
selects "Save" instead of "Save and Submit". This will send the data
to the API with a flag showing that fields may be incomplete, thus
preventing the API from rejecting the data due to NULL values. The API
will submit the data to the database along with the incomplete flag.
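The difference between "Save" and "Save and Submit" can be sketched as follows. The payload shape and flag name are assumptions for illustration only.

```python
# Illustrative sketch of "Save" vs "Save and Submit": a record saved
# with the incomplete flag may contain missing (None) fields, while a
# submitted record must not. Field and flag names are assumptions.


def save_record(record: dict, submit: bool) -> dict:
    """Return the payload the API would pass to the database."""
    if submit and any(value is None for value in record.values()):
        raise ValueError("Cannot submit: record has empty fields")
    return {"data": record, "incomplete": not submit}
```

A "Save" keeps the record flagged as incomplete so it can be reopened later; a "Save and Submit" is only accepted once every field is filled in.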

When the Data Contributor goes back to the data entry at a later time, they will be
presented with the option of completing any incomplete records as well
as entering new data. If they click on "Complete Records" they are shown
the records that they have started but not submitted. Once they select a
partially completed record the frontend will request the currently
completed items from the database via the API layer before displaying
the entry form with the completed fields.

Once the user has completed more data, they can either click on "Save"
or "Save and Submit". The first option will put them back at the top of
this workflow; the second will send the data back to the API layer for
validation. Once the data is validated, it will be submitted to the
database. The database will then write the data into a new record in
the table (or tables) and update the flag to show the record is
complete. Once done, the database will confirm successful entry of the
data to the API, which will in turn send the confirmation back to the
user via the frontend.

![Logged In User Enters Data Manually in More Than One
Session.](/design/images/runtime-manual-data-update.png){#fig-manual-data-update}

<!--TODO convert puml file to png so that the link above works-->