Es item (#111)
* Removed unused interface

* Added datastore param to PickStorage methods; move PickStorage to storage.py

* Added datastore parameters to connection.py

* initial test

* Fixed a couple imports, improved test_pick_storage

* request.datastore moved to storage.py. Misc fixes

* Disable some annoying loggers, improve PickStorage, couple test-related bugfixes

* Confirm self.read has value in PickStorage.storage

* small test fix

* Revised register_storage function to better handle existing PickStorage

* Use new register storage with esstorage and mpindexer

* test changes

* Test fix

* Storage reconfiguration and changes for ES-based items

* Fix for get_by_uuid direct, add TestingLinkTargetElasticSearch

* test_create_es_item_without_es

* A couple more misc test fixes

* Fix to PickStorage.find_uuids_linked_to_item

* Fix collection name

* One more small fix

* Messy, but got something working. Cleanup is needed, especially for request.force_datastore

* Refactoring, simplifying, fixing tests

* Fully remove linkFrom

* Test embedding with TestingLinkTargetElasticSearch

* Misc cleanup

* small test fix

* Polishing crud_views and connection, added agg_items to ES item tests

* Doc changes to cached_views.py

* doc updates for esstorage.py

* Slight refactor to mpindexer

* Final doc refactors

* Small fix for indexing-info when item is not yet indexed

* Slight refactor of purge_uuid to remove from ES before DB

* Refactored docs a bit and only include updated ones

* Some progress on docs

* Filled out storage overview doc

* Small doc-related changes

* Added some placeholder docs and made rst formatting consistent

* Correctly format inline code

* Change ES item designation to AbstractCollection.properties_datastore

* Fixes for links/uuids for ES items, as well as adjustment to properties_datastore

* Check request.datastore first in PickStorage.storage; adjustments for properties_datastore

* Doc changes for properties_datastore

* Test and version updates

* Small fixes and refactors related to default properties_datastore=database

* Addressed a couple of Will's PR comments

* Refactor TestingLinkTargetElasticSearch tests

* Handle ES-based collections in create mapping

* Use new Collection.default_properties_datastore for uuid cache invalidation in Connection.__getitem__

* More docs

* small review changes

* fix import

Co-authored-by: Will Ronchetti <[email protected]>
carlvitzthum and willronchetti authored Feb 3, 2020
1 parent e46ac9f commit b163e5a
Showing 44 changed files with 1,368 additions and 483 deletions.
11 changes: 10 additions & 1 deletion docs/source/conf.py
@@ -49,4 +49,13 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']

# can add a logo on sidebar with:
# html_logo = docs/source/img/...

# Read the Docs configuration.
# See: https://sphinx-rtd-theme.readthedocs.io/en/stable/configuring.html
html_theme_options = {
'navigation_depth': 2
}
17 changes: 17 additions & 0 deletions docs/source/es_indexing.rst
@@ -0,0 +1,17 @@
Elasticsearch Indexing
======================

**Work in progress!**

Indexing is the process of building a complete document that contains multiple views of an item and then putting that document into Elasticsearch (ES). It runs whenever an item is created or changed and is one of the backbones of Snovault: it enables searching of data and quick reading of complex views for items, which are "cached" by using ES as a read storage.

.. image:: img/indexing.png

Figure 1: Diagram of the indexing process.
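
As a rough illustration of the process described above, the following sketch builds one document holding several views of an item and stores it under the item's uuid. It is not Snovault's implementation: ``build_document``, ``index_item``, the view names, and the in-memory ``fake_es`` dict standing in for Elasticsearch are all hypothetical.

```python
# Hypothetical sketch; none of these names come from Snovault itself.
import copy


def build_document(item):
    """Combine several views of one item into a single document."""
    properties = item["properties"]
    return {
        "uuid": item["uuid"],
        "item_type": item["item_type"],
        # Raw stored properties, copied so later edits do not leak in.
        "properties": copy.deepcopy(properties),
        # A simplified "object" view: properties plus the identifying uuid.
        "object": dict(properties, uuid=item["uuid"]),
    }


def index_item(store, item):
    """Put the document in the store, replacing any older version."""
    document = build_document(item)
    store.setdefault(item["item_type"], {})[item["uuid"]] = document
    return document


fake_es = {}  # stands in for an Elasticsearch cluster
item = {"uuid": "1234", "item_type": "sample", "properties": {"name": "a"}}
index_item(fake_es, item)

# Re-indexing after a change overwrites the stored document.
item["properties"]["name"] = "b"
index_item(fake_es, item)
```

Keying the store by item type and uuid means that a changed item replaces its earlier document rather than accumulating stale copies, which is the behavior a "cached" read view would need.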

Code
-----------------
* `indexer.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/elasticsearch/indexer.py>`_: index endpoint and initialization, Indexer class
* `mpindexer.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/elasticsearch/mpindexer.py>`_: MPIndexer class and helper functions
* `indexer_queue.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/elasticsearch/indexer_queue.py>`_: QueueManager and endpoints for queueing and checking indexing
* `indexing_views.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/indexing_views.py>`_: index-data view and some other related endpoints
File renamed without changes.
Binary file added docs/source/img/connection_storage.png
Binary file added docs/source/img/indexing.png
Binary file added docs/source/img/traversal.png
68 changes: 4 additions & 64 deletions docs/source/index.rst
@@ -1,76 +1,16 @@
Snovault Documentation
========================
Snovault
========================

Snovault is a JSON-LD Database Framework that serves as the backend for the 4DN Data portal and CGAP.

|Build status|_

.. |Build status| image:: https://travis-ci.org/4dn-dcic/snovault.svg?branch=master
.. _Build status: https://travis-ci.org/4dn-dcic/snovault

Installation Instructions
=========================

Currently these are for Mac OSX using homebrew. If using linux, install dependencies with a different package manager.

Step 0: Install Xcode (from App Store) and homebrew: http://brew.sh::

Step 1: Verify that homebrew is working properly::

$ sudo brew doctor


Step 2: Install or update dependencies::

$ brew install libevent libmagic libxml2 libxslt openssl postgresql graphviz python3
$ brew install freetype libjpeg libtiff littlecms webp # Required by Pillow
$ brew tap homebrew/versions
$ brew install [email protected]

If you need to update dependencies::

$ brew update
$ brew upgrade

Step 3: Run buildout::

$ python3 bootstrap.py --buildout-version 2.9.5 --setuptools-version 36.6.0
$ bin/buildout

NOTE:
If you have issues with postgres or the python interface to it (psycopg2) you probably need to install postgresql
via homebrew (as above)
If you have issues with Pillow you may need to install new xcode command line tools:
- First update Xcode from AppStore (reboot)
$ xcode-select --install
If you are running macOS Mojave, you may need to run the below command as well:
$ sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /



If you wish to completely rebuild the application, or have updated dependencies:
$ make clean

Then goto Step 3.


Running tests
=============

To run specific tests locally::

$ bin/test -k test_name

To run with a debugger::

$ bin/test --pdb

Specific tests to run locally for schema changes::
Snovault is a JSON-LD Database Framework that serves as the backend for the `4DN Data portal <https://github.com/4dn-dcic/fourfront>`_ and `CGAP <https://github.com/dbmi-bgm/cgap-portal>`_. It is a very divergent fork of the work of the same name written by the ENCODE team at Stanford University. `See here <https://github.com/ENCODE-DCC/snovault>`_ for the original version.

$ bin/test -k test_load_workbook
Since Snovault is used for multiple deployments across a couple of projects, we use `GitHub releases <https://github.com/4dn-dcic/snovault/releases>`_ to version it. The releases page also acts as a changelog.

Run the Pyramid tests with::
To get started, read the following documentation on setting up and developing Snovault:

$ bin/test

44 changes: 44 additions & 0 deletions docs/source/local_installation.rst
@@ -0,0 +1,44 @@
Local Installation
==================

These instructions are currently for macOS using Homebrew. If you are using Linux, install the dependencies with a different package manager.

Snovault is known to work with Python 3.6.x and will not work with Python 3.7 or greater. If you are part of the HMS team, it is recommended to use Python 3.4.3, since that is what runs on our servers. A good tool for managing multiple Python versions is `pyenv <https://github.com/pyenv/pyenv>`_. It is best practice to create a fresh Python virtualenv using one of these versions before proceeding to the following steps.
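
If it helps, the version constraint above can be checked from inside the virtualenv with a few lines of Python. This is only a sketch: ``is_supported`` is a hypothetical helper, and the lower bound of 3.4 is an assumption taken from the recommended 3.4.3 rather than a documented minimum.

```python
import sys

# Assumption: the lower bound comes from the recommended Python 3.4.3.
MIN_VERSION = (3, 4)
# Python 3.7 and later are stated not to work.
FIRST_UNSUPPORTED = (3, 7)


def is_supported(version_info):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor < FIRST_UNSUPPORTED


if not is_supported(sys.version_info):
    print("Warning: Python %d.%d is outside the supported range" % sys.version_info[:2])
```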

Step 0: Obtain AWS keys. These will need to be added to your environment variables or configured through the AWS CLI (installed later in this process).

Step 1: Verify that homebrew is working properly::

$ brew doctor


Step 2: Install or update dependencies::

$ brew install libevent libmagic libxml2 libxslt openssl postgresql graphviz
$ brew install freetype libjpeg libtiff littlecms webp # Required by Pillow
$ brew install [email protected]

If you need to update dependencies::

$ brew update
$ brew upgrade

Step 3: Run buildout::

$ python3 bootstrap.py --buildout-version 2.9.5 --setuptools-version 36.6.0
$ bin/buildout

NOTE:
If you have issues with postgres or the python interface to it (psycopg2) you probably need to install postgresql
via homebrew (as above)
If you have issues with Pillow you may need to install new xcode command line tools:
- First update Xcode from AppStore (reboot)
$ xcode-select --install
If you are running macOS Mojave, you may need to run the below command as well:
$ sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /


If you wish to completely rebuild the application, or have updated dependencies::

    $ make clean

Then go to Step 3.
10 changes: 10 additions & 0 deletions docs/source/resource_views.rst
@@ -0,0 +1,10 @@
Resource Views
===========================

**Work in progress!**

This document outlines the different base resource views and their sources. It may be worth first reading the `traversal <https://snovault.readthedocs.io/en/latest/traversal.html>`_ and `storage <https://snovault.readthedocs.io/en/latest/storage_overview.html>`_ documentation.

**TODO: outline each resource view with context=Item.**

**TODO: Include relationship to storage and traversal (context and embed.py)**
12 changes: 12 additions & 0 deletions docs/source/resources.rst
@@ -0,0 +1,12 @@
Resources
===========================

**Work in progress!**

This document outlines different classes that compose a base Snovault item. Code is located in the following files:

- `resources.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/resources.py>`_: Root, AbstractCollection, Collection, Item classes
- `typeinfo.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/typeinfo.py>`_: AbstractTypeInfo, TypeInfo, TypesTool
- `config.py <https://github.com/4dn-dcic/snovault/blob/master/src/snovault/config.py>`_: CollectionsTool, collection and abstract_collection decorators

**TODO: outline the role of each resource class. Include a complete example**
4 changes: 0 additions & 4 deletions docs/source/search_info.rst

This file was deleted.

12 changes: 7 additions & 5 deletions docs/source/snowflakes.rst
@@ -1,21 +1,23 @@
================
Snowflakes
================

General
^^^^^^^^
-----------------

Snowflakes used to be the front-end component of Snovault meant to serve as a demo. Since we at 4DN have our own Snovault-backed applications (Fourfront, CGAP), Snowflakes has been entirely removed from our version of Snovault. It is still present in ENCODE's version, which you can find `here <https://github.com/ENCODE-DCC/snovault>`_.

Removing Snowflakes from Snovault proved more challenging than one may expect. Some parts of snowflakes were actually required for snovault to run, such as ``root.py``. These files have all been migrated into Snovault.

Testing
^^^^^^^^
-----------------

In addition, several relevant tests that lived in Snowflakes have been migrated into Snovault. These include only tests that are specific to Snovault and are not covered by existing Fourfront/CGAP testing. Properly configuring the tests proved challenging because the test framework, as previously configured, intertwined Snowflakes and Snovault in such a way that Snovault tests could not function without the presence of Snowflakes.

To fix this, several aspects of the tests have been refactored. We now load test schemas from files and have migrated many of the relevant fixtures from Snowflakes. ``config.py`` also required changes to account for behavior Snovault expected that it inherited from Snowflakes due to how includes work in PyTest.
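
Loading test schemas from files can be as simple as reading every JSON file in a directory. The sketch below is illustrative only: ``load_schemas``, the file name, and the schema contents are hypothetical and do not describe Snovault's actual fixtures.

```python
import json
import tempfile
from pathlib import Path


def load_schemas(directory):
    """Return {schema name: parsed JSON} for every *.json file in directory."""
    schemas = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as handle:
            schemas[path.stem] = json.load(handle)
    return schemas


# Demonstration with a throwaway directory instead of real test schemas.
with tempfile.TemporaryDirectory() as tmp:
    with (Path(tmp) / "testing_item.json").open("w") as handle:
        json.dump({"title": "Testing Item", "type": "object"}, handle)
    schemas = load_schemas(tmp)
```

Keeping schemas in files rather than importing them from another package is one way for a test suite to stand on its own, which matches the goal described above.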

Test coverage for Snovault should still be fairly strong, especially when combined with that of Fourfront/CGAP. Some indexing tests are marked as flaky as we've found they experience intermittent failures. Updating how we clear the SQS queue has also helped to remidy this issue.
Test coverage for Snovault should still be fairly strong, especially when combined with that of Fourfront/CGAP. Some indexing tests are marked as flaky as we've found they experience intermittent failures. Updating how we clear the SQS queue has also helped to remedy this issue.

Troubleshooting Notes
---------------------

One issue of note that was not solved involves a particular logging-related test that passes locally but fails on Travis. The associated test is ``test_indexing_logging``. This test makes an index POST to the application and checks that the correct log message was emitted. The log message itself is emitted, but for some reason it is truncated on Travis. Even spinning up Travis on an identical container could not reproduce the issue. The relevant line is marked in the test file.