Data Persistence Model

Content

  1. Persist
  2. But why not just use a real database
  3. Design overview
  4. Top-level API
  5. Persistent Object API
  6. Crypto API
  7. Implementation Notes

Persist

The data persistence module (a.k.a. "Persist") is essentially a self-contained library designed to store structured information in a folder shared across computers. It replicates the basic features of a document-oriented database. The library provides three core pieces of functionality:

  • User account management
  • Persistent object storage (a "database")
  • Lock/mutex implementation for network drives

Security and privacy

KoNote uses a multi-layer approach to securing data. First, all clinical information is stored in an encrypted form. KoNote uses 256-bit AES encryption, the same encryption technology used to secure online banking, to ensure that only users with the proper user name and password can view patient information. Second, KoNote and the encrypted files can be installed on a network drive. This gives IT the ability to control access to the application in a familiar way, using the same access controls (file permissions) used to protect documents and files on the network drive. This also allows KoNote to leverage any existing backup solution that is in place. Read more on Security and Privacy

But why not just use a real database?

Ideally, we would. In fact, it's very likely that we will at some point. Right now, though, our deployment target is a Windows network drive, not a full server, and this limits our options.

Popular databases such as PostgreSQL, MySQL, MongoDB, and SQL Server are designed to run using a client-server architecture. In other words, the database runs on a single computer somewhere, and any time an application wants to read or write information in the database, it sends a "request" or "query" to the database over a network connection, and the database sends back a "response".

Where would it go?

This works great when there is a designated physical server (or cloud VM) for the database to run on. But how would this work in our case? Where would the database software actually run?

The first candidate would be the server providing the shared folder (i.e. the network drive). Unfortunately, running software on this server would require special access from the org's IT department. It's unlikely we would get that access -- I doubt most IT departments would be eager to give us full access to the server running their network file shares. I bet they'd rather just give us a separate server (VM).

A second possibility is that one of the user's computers acts as the "server" and runs the database. All other users' computers would connect to this "server". The first problem here would be that this computer must be running at all times for the database to be accessible. Second, IT departments typically limit network communication between users' computers (i.e. P2P) -- that sort of network traffic is often indicative of malware.

Everyone gets a database!

The most promising idea actually does away with the concept of a designated "database server". Databases need to store their underlying data somewhere. There's a reason why a database doesn't lose its data when a server is restarted. Typically, databases are configured to store their data in a directory or a raw disk partition. Idea: what if we configured the database to store its data at X:\KoNode\data? In other words, instruct the database to store its data on the network drive!

This sounds great, and could actually work in isolation. However, problems arise when multiple users wish to use the database simultaneously. What if two users simultaneously try to modify the same part of the database? There would need to be some way for their two computers to coordinate the change -- otherwise, one might accidentally write over the wrong part of a file and corrupt the database. This sort of conflict might happen weekly or even daily in a large organization, due to the way databases are structured (indexes, etc).

We need more locks

Databases normally prevent these conflicts using "locks" ("mutex" or "semaphore" in fancy CS speak). Unfortunately, Windows network drives are notoriously bad at providing reliable locking mechanisms (quote from the SQLite FAQ: "file locking of network files [on Windows] is very buggy and is not dependable").

OK, so suppose we design our own locking mechanism that works reasonably well on Windows network drives (we did that). Could we adapt this to make a traditional database work?

Theoretically, we could modify the database software to use our locking mechanisms instead of its own. I suspect that would be a huge development effort. Modern databases are quite complex. Also, I strongly suspect that we would run into other complications (e.g. special file system operations that are improperly or only partially implemented for Windows network drives).

What if we just had one "global" lock for the entire database? In other words, we could force every computer to wait their turn before running a database query -- only one computer would be able to use the database at a time.

I suspect this would be way too slow in practice with even a moderate number of users. The process for performing each query would be:

  • Acquire the global database lock. This would require at least one network round trip to the network drive.
  • Start/open the database. At least one round trip. Databases do fancy things with caching and background processing (e.g. garbage collection). I suspect it would be unsafe to leave the database running while other computers are running queries.
  • Run the query. Likely a round trip, unless cached.
  • Stop/close the database.
  • Release the global database lock. At least one round trip.

With so many round trips, each query would take at least 100 ms. In some cases, it could even take seconds! This might not seem like a big deal, but don't forget that it's not just one user waiting -- all users on the entire system would need to wait, since queries are processed one at a time.

But what about distributed databases?

Another possibility is to use database "replication". This is a feature used by companies that run medium- or large-scale internet services. At some point, a service might grow to the point where a single database server isn't able to handle the volume of traffic. Replication allows two or more servers to keep their databases in sync. "Read" queries to either server will return the same data (pretty much), and "write" queries will affect the databases on both servers.

OK, so what if we run a database on every user's computer and keep them in sync using replication? That would be pretty cool. Unfortunately, we would have the same problem with IT departments restricting P2P traffic. Also, unless every user's computer runs 24/7, we would run into problems with change propagation. Example:

  • User A makes a change at 8pm after everyone else has shut down their computer.
  • User A leaves on vacation.
  • User B makes a different change to the same file.
  • User A gets back and tries to sync up. The application would now need to figure out how to handle the conflict between User A's change and User B's change.

Pretty soon, we're writing our own version of git.

The Solution

It's not practical to use a full database, so we'll go with something simpler. We'll assume that our customer maintains a network file share accessible to all staff (or at least the staff who wish to use our products), and performs regular backups. Then, we'll build our own custom data persistence layer on top of the network drive. The persistence layer will manipulate files and directories on the network share, and carefully avoid problems with concurrent access.

This will all be invisible to the application programmer. The persistence library will present an interface that makes it feel just like a server and database. The programmer won't need to think in terms of files and directories; instead, they'll work with higher-level concepts like "prog note" objects and client files. At the same time, the library will be flexible enough for use in other applications beyond KoNote.


Some relevant additional discussion (from email):

I'm looking at the section "we need more locks", in which you identify some of the performance and technical problems involved.

I'm still not quite clear on the following:

Let's say we use a node-webkit-friendly database, with a global lock that gives read/write privileges to only one user at a time (like the lock process you have written; users' DB requests would go through our software to vet access).

What are the big stoppers to using the database? Is it one or more of

  • opening and closing the database for each user's query, or
  • concerns about file system operations on Windows networks?

Put another way, if dealing with concurrent access is the big problem (as you suggest in The Solution), haven't we already dealt with that by developing a global lock system?

Response:

It depends on what kind of database. Let's suppose we used MySQL or Postgres, which are standalone databases that we would need to install separately alongside the app (or package into the installer, I suppose).

FS operations: It might be possible to run a database off of a network drive without crashes. I'm really not sure. It's possible that a database could rely on file system features that are not implemented for network drives. That would depend on the database.

Performance: On my brand new computer with top of the line specs (and SSD), a clean install of MySQL with no data takes 2s to start and 1s to stop. That's a minimum of 3 seconds overhead every time we want to do something with the database (read or write). Running it off a network drive will make things much slower. Worse, during that time, the application running on other computers cannot safely access the database, even to read. You could easily end up with situations where one slow computer blocks out everyone else for 20-30s at a time. This is the main issue.

Let's suppose we used WebSQL or IndexedDB, which are built into browsers (and NW.js). These databases do not allow us to specify where to save the data: the data is just saved where the browser keeps history, cookies, cache, etc.

We could tell NW.js to store cookies, history, etc on the network drive so that all KoNote instances point to the same folder. I'm pretty sure NW.js would blow up, since it doesn't expect multiple instances to be running off the same cookies, cache, etc at once.

We could extract data from WebSQL/IndexedDB every time we want to make a change, and somehow synchronize it with what's on the network drive. There might be something here, and it's worth considering for future development. I suspect it would be more complicated than the existing solution, but it could have some nice features like the ability to work offline.

Design overview

The persistence library was written with a number of design goals in mind:

  • Stand-alone: Obviously, this library was originally designed for use in KoNote. However, as much as possible, the code has been written to be generic and configurable so that the library can be used to create other KoNode apps.
  • Node.js only: NW.js is not needed to use this library.
  • Object-centric: The JSON-able JavaScript object is the basic unit of storage. Objects are automatically assigned unique IDs so that they can refer to one another.
  • Type safe: All objects are validated against a schema on both read and write to ensure that they contain the correct fields with values of the correct type. This helps catch bugs that might otherwise go unnoticed due to JavaScript's dynamic typing.
  • Append-only: Data should never be deleted or overwritten. Complete, accurate record-keeping is important for both medical and legal reasons. All mutable objects have a revision history. Whenever an object is updated, a new revision is added to its history. The old versions of the object are never lost.
  • Acceptably performant at scale: This library won't set any records for performance, but it should operate acceptably with hundreds of concurrent users.
  • Reliable at scale: The library should not corrupt or lose data, even if there are hundreds of concurrent users. Performance should be sacrificed before user data.

Data persistence

The Persistent Object API provides a MongoDB-like interface for reading and writing data.

Collections are the basic unit of the API. A collection is a set of objects of the same type (e.g. progress notes or metric definitions). An object can be any valid JSON object. This is roughly analogous to traditional table-based databases: collections are tables, objects are rows in a table, and an individual object's fields are cells in that table.

There are two types of collections: mutable and immutable. Immutable collections allow new objects to be created, but do not allow old objects to be changed or removed. Mutable collections, on the other hand, maintain a revision history for each object. This allows the objects to be changed, but the old versions of each object are kept forever.

Persistent objects can, themselves, contain collections. For example: KoNote maintains a collection of client files. Each client file contains a collection of progress notes. Collections can be nested to arbitrary depth, because, well, how could I resist?

Read more on Persistent Object API

Top-level API

The persist module can be imported as a normal Node.js module: var Persist = require('./persist'). Note that it should never be necessary for code outside the persist module to import any of persist's submodules. require('./persist/something') == BAD.

Three core Persist modules are exposed:

Persist.Users

Provides logic related to directly accessing and manipulating user account information.

More documentation will be written when the API has become more stable.

Persist.Session

Provides a mechanism for beginning a login session.

All operations on persistent objects (both read and write) must be done in the context of a login session. The actual persistent object API is therefore exposed as a field in every Session object.

Persist.Session provides a single method:

  • login(string dataDirectoryPath, string userName, string password, function cb(Error err, Session s))

    Attempt to start a login session under the specified credentials. If the operation succeeds, a Session object will be provided to the callback.

    Errors:

    • Persist.IOError: typically indicates a network connectivity problem
    • Persist.Session.UnknownUserNameError: no account exists with the specified user name
    • Persist.Session.IncorrectPasswordError: an account exists with the specified user name, but the password was incorrect

Session objects expose an object called persist. This is an instance of the Persistent Object API.
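
For illustration, here is a minimal login sketch; the data directory path, user name, and password are hypothetical:

var Persist = require('./persist');

Persist.Session.login('X:\\KoNote\\data', 'alice', 's3cret', function (err, session) {
	if (err) {
		if (err instanceof Persist.Session.UnknownUserNameError) {
			console.error('No account with that user name');
		} else if (err instanceof Persist.Session.IncorrectPasswordError) {
			console.error('Incorrect password');
		} else {
			console.error('Login failed:', err);
		}
		return;
	}

	// session.persist exposes the Persistent Object API
	console.log('Logged in as alice');
});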

More documentation for the Session class will be written once the API has become more stable.

Persist.Lock

Provides a locking mechanism that works (fairly) reliably on network drives.

In many cases, it is important that only one user be allowed to work on a document at a time. Since this varies greatly depending on the application, the API user is responsible for using locks as needed.

Note: this locking mechanism may not work reliably with network drives using Microsoft DFS.

API

Usage:

var Persist = require('./persist');
var Lock = Persist.Lock;
  • Lock.acquire(ActiveSession, string lockId, function cb(Error err, Lock lock))

    Attempt to acquire a mutually exclusive lock on the ID lockId (see the sketch after this method list). lockId should be a valid file name on all relevant platforms.

    It is the responsibility of the API user to decide how lock IDs should be chosen.

    The lock will be held until the lock object's release method is called. The lock mechanism is designed to withstand crashes: if the application dies, the lock will automatically "expire" after a configurable amount of time (usually a few minutes).

    Errors:

    • Persist.IOError
    • Lock.LockInUseError: indicates that the lock is currently held elsewhere.
  • Lock.acquireWhenFree(ActiveSession, string lockId, *optional number intervalMins, function cb(Error err, Lock lock))

    Runs Lock.acquire(...) repeatedly at the specified time interval (default is 30s) to try to acquire the lock.

    This loop runs in the background until either:

    • The lock has become available, and has been retrieved
    • The operation has been cancelled (must be done as part of deinit() cleanup)

    To cancel the operation:

    1. Save the operation to a variable (or state): lockOperation = Lock.acquireWhenFree(...)
    2. Call its public cancel() method: lockOperation.cancel(function cb)

    Currently, this method will not pass any errors to its callback, since its single purpose is to keep retrying even after failure. Instead, an informative console.warn() message is logged when an attempt fails. Be prepared to handle an Error err object anyway, since errors may be introduced in the future.

  • Lock.prototype.release(function cb(Error err))

    Release this lock object, allowing others to obtain the lock.

    As soon as this method is called, the application should assume that it no longer holds the lock, even if the callback receives an error.

    Errors:

    • Persist.IOError
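
For illustration, a hedged sketch of the acquire/release pattern; activeSession and the lock ID are hypothetical:

var Persist = require('./persist');
var Lock = Persist.Lock;

// activeSession is assumed to come from Persist.Session.login
Lock.acquire(activeSession, 'clientFile-3a9f', function (err, lock) {
	if (err) {
		if (err instanceof Lock.LockInUseError) {
			console.warn('This document is currently locked by another user');
			return;
		}
		throw err;
	}

	// ... do work that requires exclusive access ...

	lock.release(function (err) {
		// Whether or not err is set, we no longer hold the lock
		if (err) {
			console.error('Error while releasing lock:', err);
		}
	});
});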

Persistent Object API

The Persistent Object API provides a simple interface for reading and writing data objects. This API is accessible from any valid login session (see the Persist.Session documentation).

All timestamps use the Moment.js format string Persist.TimestampFormat. In most cases, Immutable.js data structures are used.

Data model definitions

The Persistent Object API requires an array of data model definitions -- a schema. This schema lists the collections that the application requires, as well as the structure of the objects that each collection will contain. Any object going in or out of the API is validated against the schema using a library called Joi.

Each data model definition in the array should be a plain JavaScript object with the following properties (see the sketch after this list):

  • name (required string): The name of the type of object that this collection will contain. The name should be alphanumeric, camelcase, and start with a lowercase letter (likeThis). The name must be unique among all data models, including those specified as children or descendants.

  • collectionName (required string): The name of the collection, subject to the same character and uniqueness restrictions as name. By convention, collectionName should be the plural form of name.

  • isMutable (required boolean): Whether or not the objects of this collection should be mutable. Mutable objects will maintain a revision history.

  • indexes (optional array of arrays of strings): An array of fields that should be indexed. In order to support nested object structures, a field is referred to using an array of property names.

    Example: Suppose the objects in the collection look like this:

    {
    	"name": {
    		"first": "Alice",
    		"last": "Smith"
    	},
    	"gender": "female"
    }
    

    We can refer to obj.name.last using ["name", "last"], and obj.gender using ["gender"]. Therefore, if we wanted to index both the last name field and the gender field, we would specify [["name", "last"], ["gender"]].

    For more information on indexes, see the list method of the Collection API.

  • schema (required Joi schema): A Joi validation schema that defines what fields each object should/can contain.

    The schema must be either a Joi.object(), or an array of Joi.object()s -- other Joi types (such as Joi.array()) are permitted within the schema, but not at the top level. This is because the schema will be automatically extended to add metadata fields (id, revisionId, author, and timestamp).

  • children (optional array of data model definitions): A list of subcollections that each object should contain. Each subcollection has a full data model definition as described in this section, and can have subcollections of its own.
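
Putting these properties together, here is a hedged sketch of a data model array; the collection and field names are illustrative (borrowed from the examples on this page), not KoNote's actual schema:

var Joi = require('joi');

var dataModels = [
	{
		name: 'customer',
		collectionName: 'customers',
		isMutable: true,
		indexes: [["name", "last"], ["gender"]],
		schema: Joi.object().keys({
			name: Joi.object().keys({
				first: Joi.string(),
				last: Joi.string()
			}),
			gender: Joi.string()
		}),
		children: [
			{
				name: 'purchase',
				collectionName: 'purchases',
				isMutable: false,
				schema: Joi.object().keys({
					total: Joi.number()
				})
			}
		]
	}
];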

Session.persist

The Persistent Object API can be accessed using the persist field of any Session object. This object will be referred to as persist in the remainder of this section.

persist always has the following properties:

  • persist.eventBus: A Backbone.Events object that allows API users to monitor for changes to persistent objects. Note that, due to the limited capabilities of network drives, a process is not guaranteed to receive events from changes that occurred outside of that process (e.g. on other computers).

    The following event types are implemented (see the sketch after this list):

    • create:$DATA_MODEL_NAME, where $DATA_MODEL_NAME is the name field of the relevant data model. This event will fire when a new object of the given type is created. The listener functions will receive a single argument: the object that was created.

    • createRevision:$DATA_MODEL_NAME, where $DATA_MODEL_NAME is the name field of the relevant data model. This event will fire when an object of the given type is updated. The listener functions will receive a single argument: the newly revised object.
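
For example, a sketch of a listener; session is assumed to come from Persist.Session.login, and progNote is a hypothetical data model name:

session.persist.eventBus.on('create:progNote', function (newProgNote) {
	// newProgNote is an Immutable.Map containing the created object
	console.log('New prog note by', newProgNote.get('author'));
});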

Collection API

The rest of persist's properties are generated dynamically based on the data model definitions provided. These properties are named after the collectionName field of the data model. Every collection has a set of methods appropriate for the type of objects (e.g. immutable vs mutable). All possible methods are listed here, followed by a combined usage sketch:

  • create(Immutable.Map newObject, function cb(Error err, Immutable.Map createdObject))

    Add a new object to this collection. The object must conform to the schema specified in the data model definition.

    If this collection is a subcollection (i.e. a child of another collection), newObject must contain fields identifying a parent object (see "Identifying a Parent" below).

    The following fields will automatically be added before the object is written:

    • id: A new random ID will be generated to identify the object.
    • revisionId: A new random ID will be generated to identify this revision of the object.
    • author: The user name from the Session object.
    • timestamp: The current date/time, with millisecond precision.

    If the operation succeeds, the callback will receive a copy of the object with these fields added.

    Errors:

    • IOError: usually indicates a connection problem
    • ObjectNotFoundError: if the specified parent object could not be found

    Implementation notes:

    This operation is atomic. It will either succeed 100% or make no changes at all. Any other processes performing read operations will either see the operation as complete or not yet started -- never partially complete.

    TODO performance

  • list(string parentIds..., function cb(Error err, Immutable.List objectHeaders))

    List all objects in the collection. For performance reasons, the resulting list will contain object headers, not entire objects.

    An object header is an Immutable.Map containing only the fields of the object that are "indexed". The data model definition controls which fields are indexed. The object's id field is always indexed.

    If this collection is a subcollection (i.e. a child of another collection), the first parameters to this method must identify a parent (see "Identifying a Parent" below).

    Errors:

    • IOError: usually indicates a connection problem
    • ObjectNotFoundError: if the specified parent object could not be found

    Implementation notes:

    TODO performance

  • read(string parentIds..., string objectId, function cb(Error err, Immutable.Map object))

    (only available for immutable objects)

    Read the object with the specified ID. The full object is returned as an Immutable.Map.

    If this collection is a subcollection (i.e. a child of another collection), the first parameters to this method must identify a parent (see "Identifying a Parent" below).

    Errors:

    • IOError: usually indicates a connection problem
    • ObjectNotFoundError: if either the parent object or the target object could not be found

    Implementation notes:

    TODO performance

  • createRevision(Immutable.Map updatedObject, function cb(Error err, Immutable.Map updatedObject))

    (only available for mutable objects)

    Update the object by adding a new entry to its revision history.

    The revisionId, author, and timestamp fields will be automatically revised. These changes will be reflected in the object passed to the callback.

    If this collection is a subcollection (i.e. a child of another collection), updatedObject must contain fields identifying a parent object (see "Identifying a Parent" below). These fields must match what was provided when the object was created.

    Errors:

    • IOError: usually indicates a connection problem
    • ObjectNotFoundError: if either the parent object or the target object could not be found

    Implementation notes:

    This method performs two operations: (1) create new object revision, and (2) update indexes.

    Each operation is atomic. It will either succeed 100% or make no changes at all. Any other processes performing read operations will either see the operation as complete or not yet started -- never partially complete.

    Note however that it is still possible for a crash between (1) and (2) to leave the indexes out of date.

    TODO performance

  • listRevisions(string parentIds..., string objectId, function cb(Error err, Immutable.List revisionHeaders))

    (only available for mutable objects)

    List all entries in the specified object's revision history. For performance reasons, the resulting list will contain revision headers, not the entire object. A revision header is an Immutable.Map containing only two fields: timestamp and revisionId. If the full object is needed with each revision, try one of the read methods.

    If this collection is a subcollection (i.e. a child of another collection), the first parameters to this method must identify a parent (see "Identifying a Parent" below).

    Errors:

    • IOError: usually indicates a connection problem
    • ObjectNotFoundError: if either the parent object or the target object could not be found

    Implementation notes:

    TODO performance

  • readRevisions(string parentIds..., string objectId, function cb(Error err, Immutable.List revisions))

    (only available for mutable objects)

    Read all revisions of the specified object. A revision is a snapshot of the object at a given point in time. This is an expensive operation for large revision histories, since it must read every revision file. See listRevisions for a light-weight alternative.

    If this collection is a subcollection (i.e. a child of another collection), the first parameters to this method must identify a parent (see "Identifying a Parent" below).

    Errors:

    • IOError: usually indicates a connection problem
    • ObjectNotFoundError: if either the parent object or the target object could not be found

    Implementation notes:

    TODO performance

  • readLatestRevisions(string parentIds..., string objectId, number maxRevisionCount, function cb(Error err, Immutable.List revisions))

    (only available for mutable objects)

    Read the maxRevisionCount most recent revisions of the specified object.

    If this collection is a subcollection (i.e. a child of another collection), the first parameters to this method must identify a parent (see "Identifying a Parent" below).

    Errors:

    • IOError: usually indicates a connection problem
    • ObjectNotFoundError: if either the parent object or the target object could not be found

    Implementation notes:

    TODO performance
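
Tying these methods together, here is a hedged sketch of a create-then-update flow; customers is the hypothetical mutable collection from the data model sketch above, and session comes from Persist.Session.login:

var Immutable = require('immutable');

var newCustomer = Immutable.Map({
	name: Immutable.Map({first: 'Alice', last: 'Smith'}),
	gender: 'female'
});

session.persist.customers.create(newCustomer, function (err, createdCustomer) {
	if (err) throw err;

	// id, revisionId, author, and timestamp have been added automatically
	var updatedCustomer = createdCustomer.setIn(['name', 'last'], 'Jones');

	session.persist.customers.createRevision(updatedCustomer, function (err, revisedCustomer) {
		if (err) throw err;
		console.log('Now at revision', revisedCustomer.get('revisionId'));
	});
});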

Identifying a Parent

The "subcollection" feature allow objects to contain collections of more objects. In order to use one of these subcollections, we must specify where to find it, i.e. what object contains the subcollection. This object is referred to as the "parent".

We could easily identify the parent object by providing its ID, but what if the parent object is itself in a subcollection? The parent's collection will itself have a parent, which we can identify by an ID. Depending on how deeply nested the collections are, we may need to specify many IDs in order to locate an object.

Example: Suppose that we have an e-commerce application. The application keeps a collection of customer objects. Every customer has a subcollection of purchases. Every purchase contains a subcollection of line items.

Now suppose that a customer with ID 3a9f has a purchase with ID 84ba, which contains a line item with ID e0d2. In order to read the line item e0d2, we would need to identify its parent, purchase 84ba. But, in order to identify purchase 84ba, we would need to identify its parent, customer 3a9f.

There are two ways to provide these parent IDs. Methods that accept the object itself as a parameter (e.g. create and createRevision) require that the IDs be included as fields in the object. The fields are named by appending "Id" to the data model name. In the above example, we would need to include the fields customerId and purchaseId.

The other methods expect these IDs to be provided as arguments to the function, in order from highest-level to lowest-level. Continuing with the above example, we would call read(customerId, purchaseId, lineItemId, cb) in order to read a line item.

Note that it's a useful memory aid to compare this to a file path: customers/3a9f/purchases/84ba/lineItems/e0d2.
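
In code, the e-commerce example might look like this sketch (assuming an immutable lineItems subcollection under purchases):

var Immutable = require('immutable');

// Reading line item e0d2: parent IDs come first, highest-level first
session.persist.lineItems.read('3a9f', '84ba', 'e0d2', function (err, lineItem) {
	if (err) throw err;
	console.log(lineItem.toJS());
});

// Creating a line item: parent IDs are included as fields on the object
var newLineItem = Immutable.Map({
	customerId: '3a9f',
	purchaseId: '84ba',
	productName: 'Widget'
});
session.persist.lineItems.create(newLineItem, function (err, createdLineItem) {
	if (err) throw err;
});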

Crypto API

The data persistence package includes an internal module that implements the various cryptographic operations needed for securing the persistent data and user account system. Although it is not necessary to use this module directly from outside of the persist package, its API is documented here.

Symmetric Encryption Key

This class implements a symmetric encryption scheme, i.e. encryption that uses the same key to encrypt and decrypt. The class provides two methods of creating an encryption key:

  • SymmetricEncryptionKey.generate() -> SymmetricEncryptionKey

    Randomly generates a new encryption key. Returns a SymmetricEncryptionKey object.

  • SymmetricEncryptionKey.derive(string password, object params, function cb(err, SymmetricEncryptionKey k))

    Transforms the specified password into an encryption key. This makes it easy to implement password-based encryption.

    params must be an object with two properties:

    • salt: must be a string generated using generateSalt() (see below). Different passwords should use different salts.
    • iterationCount: higher iteration counts slow down password cracking. Recommended value: 500,000 - 1,000,000

    Asynchronously returns a SymmetricEncryptionKey object, via the callback.

SymmetricEncryptionKey objects provide two operations, encrypt and decrypt.

  • key.encrypt(string or Buffer message) -> Buffer

    Encrypts the specified message with this encryption key. This method can encrypt any sequence of bytes (i.e. any Buffer), including images and documents. For convenience, this method also accepts strings, which it will encode as bytes using UTF-8.

  • key.decrypt(Buffer encryptedMessage) -> Buffer

    Decrypts the specified ciphertext with this encryption key, and returns the original message.

    Note: this method returns a Buffer. Call .toString() on the Buffer to convert back to a string.

Additionally, the class allows you to "import" and "export" keys. Exporting a key object converts it to a string. Importing uses that string and recreates the key object. This lets you save the keys you've generated and load them in again later.

  • key.export() -> string

    Export this SymmetricEncryptionKey as a string.

  • SymmetricEncryptionKey.import(string exportedKey) -> SymmetricEncryptionKey

    Convert the previously exported SymmetricEncryptionKey to a SymmetricEncryptionKey object.
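
A hedged sketch of password-based encryption using these methods; the Crypto binding is an assumption, standing for this internal module as seen from within the persist package:

var params = {
	salt: Crypto.generateSalt(),
	iterationCount: 600000
};

Crypto.SymmetricEncryptionKey.derive('correct horse battery staple', params, function (err, key) {
	if (err) throw err;

	var encrypted = key.encrypt('some private text'); // Buffer
	var original = key.decrypt(encrypted).toString(); // back to a string

	// Keys can be exported for storage and re-imported later
	var exported = key.export();
	var sameKey = Crypto.SymmetricEncryptionKey.import(exported);
});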

Asymmetric encryption

The two classes PrivateKey and PublicKey together provide an asymmetric encryption scheme. In asymmetric encryption, keys come in a pair consisting of a private key and a public key. The public key can only encrypt messages, while the private key can only decrypt messages.

Note: these classes are designed to be expanded to include digital signatures, but that functionality has not yet been implemented.

  • PrivateKey.generate(function cb(Error err, PrivateKey k))

    Randomly generates a new key pair, and asynchronously returns a PrivateKey object. The public key can be obtained using .getPublicKey().

    Note: this method is sloowwwwwww. Blame RSA.

  • privKey.getPublicKey() -> PublicKey

    Obtains the corresponding PublicKey to this PrivateKey.

Each key type implements the relevant operation:

  • pubKey.encrypt(string or Buffer message, function cb(Error err, Buffer encryptedMessage))

    Encrypts the specified message using this public key. Returns the ciphertext asynchronously via the callback.

    Note: this method accepts either a Buffer or a string for convenience. Strings will be encoded as bytes using UTF-8.

  • privKey.decrypt(Buffer encryptedMessage, function cb(Error err, Buffer message))

    Decrypts the specified ciphertext using this private key. Returns the original message asynchronously via the callback.

    Note: this method returns a Buffer. Call .toString() to convert back to a string if needed.

Both key types can be exported as a string, and later reimported as a key object:

  • privKey.export() -> string
  • pubKey.export() -> string
  • PrivateKey.import(string exportedPrivateKey) -> PrivateKey
  • PublicKey.import(string exportedPublicKey) -> PublicKey
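
A sketch of the full asymmetric round trip, using the methods above:

PrivateKey.generate(function (err, privKey) {
	if (err) throw err;

	var pubKey = privKey.getPublicKey();

	pubKey.encrypt('secret message', function (err, encryptedMessage) {
		if (err) throw err;

		privKey.decrypt(encryptedMessage, function (err, message) {
			if (err) throw err;
			console.log(message.toString()); // "secret message"
		});
	});
});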

Extra utilities

  • generateSalt() -> string

    Generate a password salt for use with SymmetricEncryptionKey.derive.

  • generatePassword() -> string

    Generates a secure password using uppercase letters (A-Z), lowercase letters (a-z), digits (0-9), underscores (_), and hyphens (-).

    Passwords output from this function have enough entropy to make them essentially uncrackable when used with an appropriate password hash.

  • obfuscate(string or Buffer message) -> Buffer

    Obfuscates the input to make it difficult for a non-technical user to tamper with or access the value. This is not a secure form of encryption.

    Note: strings are encoded as bytes using UTF-8

  • deobfuscate(Buffer obfuscatedMessage) -> Buffer

    Restore the original value provided to obfuscate.

    Note: this method returns a Buffer. If a string is needed, call .toString() on the Buffer.
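
For example, a round trip through the obfuscation helpers (assuming they are exposed on the same Crypto module as above):

var obfuscated = Crypto.obfuscate('{"accountType": "normal"}'); // Buffer
var original = Crypto.deobfuscate(obfuscated).toString(); // the original JSON string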

Implementation notes

This page contains notes about how the persist package works. This information is not needed to use the package, but it is useful for implementing or debugging new persist features.

User accounts

Each user account has a directory at data/_users/$USERNAME. This directory contains the following files:

  • public-info: Information about the account that is public. Contains an obfuscated JSON object with the following fields:

    • accountType (string): either "admin" or "normal"
    • isActive (boolean): if this account has not been deactivated
  • account-key-N, where N is an integer >= 1: A JSON object with the following fields:

    • kdfParams (object): parameters required for key derivation.

      • iterationCount: see Crypto.SymmetricEncryptionKey
      • salt: see Crypto.generateSalt
    • accountKey (string): the account key, encrypted with the user's password, and encoded in base64url

    When attempting to access this file, the application lists the files in the directory and uses the account key file with the highest N (see the sketch after this list). When a user changes their password, a new file with a larger N is created. This works around the fact that normal users do not have permission to modify files (create/append only).

    When run with an admin account, the application should, for security reasons, regularly clean up any obsolete account key files across all accounts.

  • account-recovery: The account key, encrypted with the system encryption key. This allows admins to recover an account if the password is lost.

  • private-info: Account information that is private to the user. Contains an encrypted JSON object with the following fields:

    • globalEncryptionKey (string): an exported SymmetricEncryptionKey
    • systemEncryptionKey (optional string): an exported SymmetricEncryptionKey. Only present for admin accounts.

    The file is encrypted using the user's account key.
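
As an illustration of the highest-N rule above, a hedged sketch using Node's fs module (the function name is made up; the real implementation may differ):

var fs = require('fs');

// Find the newest account key file in a user's directory
function latestAccountKeyFileName(userDirPath, cb) {
	fs.readdir(userDirPath, function (err, fileNames) {
		if (err) return cb(err);

		var highestN = fileNames
			.filter(function (name) { return /^account-key-\d+$/.test(name); })
			.map(function (name) { return parseInt(name.slice('account-key-'.length), 10); })
			.reduce(function (max, n) { return Math.max(max, n); }, 0);

		cb(null, highestN >= 1 ? 'account-key-' + highestN : null);
	});
}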

Cryptography

Most implementation notes for the crypto module are written as inline comments.

Obfuscation

Obfuscation is used on some data files to prevent inexperienced users from trying to modify data files by hand. Obfuscation is just encryption with a hard-coded symmetric encryption key (symmkey-v1:6f626675736361746520746865207374756666207769746820746865206b6579). The (en|de)cryption is performed using SymmetricEncryptionKey from the Crypto module, as usual. This encryption should not be expected to provide any real protection.