Review XForms cache architecture #1718

Closed
ebruchez opened this issue May 14, 2014 · 11 comments

Comments

@ebruchez
Collaborator

In particular, check and test for:

  • proper expiration of static vs. dynamic state entries
  • cost (CPU, memory) of dynamic state serialization
  • proper expiration of entries with session expiration

See also #1572, #1973.

@ebruchez
Collaborator Author

Also consider:

  • Can we do any type of automatic tuning of cache sizes?
  • Can we add a monitoring page to check the state of all caches?

@ebruchez ebruchez modified the milestones: Review, Consider for 4.7 Jun 26, 2014
@ebruchez
Collaborator Author

Tasks:

  • re-understand architecture of static vs. dynamic state caches and stores
  • test that dynamic state is properly serialized out of the cache, into the store, and does result in reduced memory consumption
  • test that many open sessions do not cause dynamic state to be incorrectly kept in memory, and that expiration/serialization takes place as expected

@ebruchez
Collaborator Author

ebruchez commented Sep 3, 2014

Some explanation of how things should work right now:

  • Every time you hit a form page, some server state is created. This is what we call the "dynamic state" or a "form session", and it is separate from the "static state" (which is the compiled form, basically).
  • The dynamic state includes all the data which is specific to the current form session: instance data, state of some controls, etc. It takes memory. For example, if heap usage after a GC grows by about 10 MB each time you open the form, it is probably fair to say that this is the size of one dynamic state.
  • A refresh in a browser is the same as opening the form a second time, for example in a new tab: this also creates new server state.
  • As long as tabs stay open, we have by default what we call the "session heartbeat" going: this keeps the session open so that it doesn't expire.
  • However, once you close all tabs, the server session eventually expires, and should cause all the associated server state to be freed. But that can take time (minutes, hours, more, depending on the session duration).
  • The dynamic state consists mainly of a tree of Java objects on the server. In addition, an initial serialization of that data is created to handle the browser back button.
  • All dynamic state instances are kept in the document cache. By default, its size is 50 entries, as set by the property oxf.xforms.cache.documents.size. So if each state is 10 MB, the cache can grow to 50 × 10 MB = 500 MB.
  • This explains why the memory is not fully reclaimed when you gc.
  • When the cache is full, based on LRU, dynamic state objects are serialized and pushed to what we call the XForms state store. This is configured in ehcache.xml (cache name is xforms.state). This is meant to serialize state (DynamicState) to disk and free memory. If any of this information is needed at a later time, it is deserialized from disk and put back in the document cache.
  • The more complex the form definition and/or the more data it holds, the more memory it is likely to take.
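
The interaction between the document cache and the state store described above can be sketched as follows. This is only an illustration in Python, not Orbeon's actual code: the `DocumentCache` class, the dictionary standing in for the Ehcache-backed `xforms.state` store, and the use of `pickle` as the serialization format are all assumptions. Removing an entry from the store when it is reactivated is also a simplification.

```python
import pickle
from collections import OrderedDict


class DocumentCache:
    """LRU cache of live dynamic states; evicted entries are serialized to a store."""

    def __init__(self, max_size, store):
        self.max_size = max_size      # plays the role of oxf.xforms.cache.documents.size
        self.store = store            # hypothetical stand-in for the state store
        self._live = OrderedDict()    # uuid -> live object tree, least recently used first

    def put(self, uuid, state):
        self._live[uuid] = state
        self._live.move_to_end(uuid)  # mark as most recently used
        while len(self._live) > self.max_size:
            old_uuid, old_state = self._live.popitem(last=False)
            # Passivate: keep only the serialized form, drop the object tree
            self.store[old_uuid] = pickle.dumps(old_state)

    def get(self, uuid):
        if uuid in self._live:
            self._live.move_to_end(uuid)
            return self._live[uuid]
        if uuid in self.store:
            # Reactivate: deserialize and put back in the cache
            state = pickle.loads(self.store.pop(uuid))
            self.put(uuid, state)
            return state
        return None


store = {}
cache = DocumentCache(max_size=2, store=store)
cache.put("a", {"instance": "<form/>"})
cache.put("b", {"instance": "<form/>"})
cache.put("c", {"instance": "<form/>"})  # cache is full: "a" is serialized to the store
```

With a cache of size 2, the third `put` passivates the least recently used entry. A later `cache.get("a")` brings it back and passivates another entry in turn. The live entries are what stay on the heap after a GC.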

@ebruchez
Collaborator Author

ebruchez commented Sep 3, 2014

Ideas for better caching:

  • Could we use a weak reference cache? Unfortunately, such references can be a bit too unpredictable. It's unclear whether this is promising.
  • We could use a heuristic to determine the memory weight of the dynamic state, based on data size, number of controls, maybe just the size of the serialization. This would be a rough approximation, but it could tell you whether the size is about 100 KB, 1 MB, 10 MB, 100 MB, say. This could be used to size the dynamic state cache in MB instead of number of elements.
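
A minimal sketch of the second idea, assuming the size of the serialization is used as the weight and the cache budget is expressed in bytes rather than in number of entries. The names `estimate_weight` and `WeightedCache` are hypothetical, and `pickle` again only stands in for the real serialization.

```python
import pickle
from collections import OrderedDict


def estimate_weight(state):
    """Rough weight heuristic: size in bytes of the serialized state."""
    return len(pickle.dumps(state))


class WeightedCache:
    """LRU cache bounded by an approximate memory budget instead of an entry count."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.total = 0
        self._entries = OrderedDict()  # key -> (state, weight), least recently used first

    def put(self, key, state):
        if key in self._entries:
            self.total -= self._entries.pop(key)[1]
        weight = estimate_weight(state)
        self._entries[key] = (state, weight)
        self.total += weight
        evicted = []
        # Evict least recently used entries until within budget,
        # but always keep the entry which was just added.
        while self.total > self.max_bytes and len(self._entries) > 1:
            old_key, (_, old_weight) = self._entries.popitem(last=False)
            self.total -= old_weight
            evicted.append(old_key)
        return evicted


cache = WeightedCache(max_bytes=3000)
cache.put("small", b"x" * 100)
cache.put("large-1", b"x" * 2000)
evicted = cache.put("large-2", b"x" * 2000)  # over budget: older entries must go
```

Here one large entry pushes out two older entries, which a count-based cache would not do. The weight is only an approximation of the heap footprint, since the live object tree is likely larger than its serialization.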

@ebruchez
Collaborator Author

Also, could we better passivate, or even in some cases delete, server state? See [customer suggestion](https://basecamp.com/1721271/projects/3269712/messages/25504605), and my reply:

It is a good suggestion in theory, but the issue here is: how can this be determined? If you refresh a page in a browser, the server sees it exactly as if you had opened the same URL in another browser tab or window. The server just doesn't know.

One way might be to figure out whether the client could somehow detect an event when the browser unloads the page. There are some of those events. But it is not clear that the browser can inform the server when such events are detected (that is: send an Ajax request, for example). (I remember an issue whereby in some cases Ajax events made it to the server with one browser, maybe Firefox, but not Chrome.)

Note that, similarly, this can happen when the user navigates away from a Form Runner page: state is kept on the server for the page which is now to the "left" in the browser history, and not active. In theory, the state could be immediately passivated on the server, and reactivated in the fairly unlikely event of a browser back button navigation.

@ebruchez
Collaborator Author

Some thoughts on deleting/passivating state when hitting the same form URL more than once. Scenarios:

  • refresh in browser tab
  • open in another tab

In general, it's not a great idea to delete state, because that would mean that your form in the other tab would stop working. For example, open the following URL in two tabs:

http://demo.orbeon.com/orbeon/fr/orbeon/contact/new

The first tab would stop working completely, which is not desirable.

For a given session, and a given form path, whatever it might be, we could automatically passivate (but not completely remove) form sessions with the same path. So in my example above:

A better approach might be to work with cache weights: instead of passivating state directly, such hits would just reduce the weight of the older entries in the cache, scheduling them for earlier passivation.

Going further, this might work not only in the case of same paths, but for any forms a user might be opening. This might make sense too: the likelihood of a user working on more than one form at a time is not 0, but it is probably low in practice. So maybe a system of weights could work here too.
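
The weight idea could look something like the following sketch. Everything here is an assumption made for illustration: the `PriorityCache` class, the integer priority, and the rule that each new hit on the same session and path demotes older form sessions by one. Insertion order stands in for recency of use, and the second form path is made up.

```python
import itertools


class PriorityCache:
    """Passivates the lowest-priority entry first, then the oldest one."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._clock = itertools.count()
        self._entries = {}  # uuid -> {"session", "path", "priority", "tick"}

    def open_form(self, uuid, session_id, path):
        # Demote older form sessions for the same user session and form path
        for entry in self._entries.values():
            if entry["session"] == session_id and entry["path"] == path:
                entry["priority"] -= 1
        self._entries[uuid] = {
            "session": session_id,
            "path": path,
            "priority": 0,
            "tick": next(self._clock),
        }
        passivated = []
        while len(self._entries) > self.max_size:
            victim = min(
                self._entries,
                key=lambda u: (self._entries[u]["priority"], self._entries[u]["tick"]),
            )
            del self._entries[victim]  # a real cache would serialize it to the store
            passivated.append(victim)
        return passivated


cache = PriorityCache(max_size=3)
cache.open_form("order-1", "session-2", "/fr/acme/order/new")  # hypothetical form path
cache.open_form("contact-1", "session-1", "/fr/orbeon/contact/new")
cache.open_form("contact-2", "session-1", "/fr/orbeon/contact/new")  # refresh or second tab
passivated = cache.open_form("order-2", "session-3", "/fr/acme/order/new")
```

When the cache overflows, `contact-1` is passivated even though `order-1` is older, because the second hit on the same path demoted it. The first tab keeps working, since its state is only passivated and can be reactivated.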

@avernet avernet removed this from the Review milestone Oct 17, 2014
@ebruchez ebruchez modified the milestones: Review, 4.8 Oct 17, 2014
@ebruchez
Collaborator Author

2014-10-27:

  1. Checked XFCD memory usage with a large form. The result: about 11 MB. Not low, but what I was expecting.
  2. Tested that form sessions are in fact serialized to disk via Ehcache, and that the heap is freed appropriately.
  3. Tested that Tomcat session expiration does indeed free form sessions.
  4. Checked memory usage in Form Builder with the same large form. Here, the result is more interesting: each new session takes about 65 MB, which is large. If I flood the cache with documents of this size, I can easily consume GBs of heap.

NOTE: When a session expires, the size of the Ehcache store on disk doesn't shrink. I don't know if this is preventable.
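
For reference, the behavior checked in point 3 can be pictured with the sketch below: when a servlet session expires, every form session created under it is dropped from both the document cache and the state store. The `FormSessionRegistry` class and its method names are hypothetical, and plain dictionaries stand in for the cache and the store. Removing an entry logically says nothing about whether the file on disk shrinks.

```python
from collections import defaultdict


class FormSessionRegistry:
    """Tracks which form sessions (dynamic states) belong to which servlet session."""

    def __init__(self, cache, store):
        self.cache = cache  # uuid -> live dynamic state
        self.store = store  # uuid -> serialized dynamic state
        self._by_session = defaultdict(set)

    def register(self, session_id, uuid):
        self._by_session[session_id].add(uuid)

    def on_session_expired(self, session_id):
        # Free everything created under the expired session, whether live or passivated
        freed = self._by_session.pop(session_id, set())
        for uuid in freed:
            self.cache.pop(uuid, None)
            self.store.pop(uuid, None)
        return freed


cache = {"u1": {"data": 1}, "u3": {"data": 3}}
store = {"u2": b"serialized"}
registry = FormSessionRegistry(cache, store)
registry.register("s1", "u1")
registry.register("s1", "u2")
registry.register("s2", "u3")
freed = registry.on_session_expired("s1")
```

After the call, only the form session belonging to `s2` remains.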

@ebruchez
Collaborator Author

2014-10-28:

@ebruchez
Collaborator Author

ebruchez commented Nov 5, 2014

We wish to simplify the code if possible, to make intents clearer and to make it easier to reason about the logic which is likely to remain non-obvious.

@ebruchez
Collaborator Author

Created #1989 for a bug found during the investigation.

@ebruchez
Collaborator Author

Closing the "review" part. Open new bugs for new issues.
