
Es item #111

Merged: 58 commits, Feb 3, 2020
Conversation

@carlvitzthum (Contributor) commented Nov 7, 2019

Large change that refactors a lot of storage-related code and adds a new feature: creating items with ES-based properties. See this document for an overview. There are some other misc. changes as well.

Version is currently set to 1.3.8 but may need to change if subsequent releases are made before this is merged.

Summary

  • Added documentation, including a number of placeholders, and refactored some existing docs
  • Removed linkFrom code (unused)
  • Refactored PickStorage and datastore (on request) slightly and moved to storage.py
  • Added more resource views using ICachedItem (for ES-based items)
  • Refactored ESStorage and added create + update methods (for ES-based items)
  • Refactored resource.py and connection.py for ES-based items
  • create_mapping will now skip indices that are not empty for collections with properties_datastore='elasticsearch'
  • Added some tests to test_indexing.py and test_storage.py

Please note that links between documents will not work until the RTD build runs.
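The create_mapping change in the summary could be sketched roughly like this. This is a hypothetical helper, not the actual snovault implementation; the client is duck-typed so the check reads the same against a real Elasticsearch client:

```python
def should_skip_index(es, index_name, collection):
    """Return True if this collection stores item properties in ES and its
    index already contains documents, so recreating the index would destroy
    data. Hypothetical sketch, not the actual snovault code."""
    if getattr(collection, 'properties_datastore', None) != 'elasticsearch':
        return False  # properties live in Postgres; safe to recreate
    if not es.indices.exists(index=index_name):
        return False  # index doesn't exist yet; nothing to lose
    return es.count(index=index_name)['count'] > 0
```

The key design point is that for ES-backed collections the index is the system of record, so a non-empty index must never be dropped and rebuilt the way a derived search index can be.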

@willronchetti (Member) left a comment:

Generally looks good, mostly just small things I've commented on

src/snovault/cache.py (resolved)
return None

return find_it(our_dict)
used_datastore = 'elasticsearch' # datastore used by this model
Member:

Is there a reason this is necessary? May be a little confusing since used_datastore is an attribute on Item, and while CachedModel is related it isn't exactly the same thing. Don't you always know in this case implicitly that the data comes from elasticsearch?

document = existing_doc.source

index_name = get_namespaced_index(self.registry, document['item_type'])
# use `refresh='wait_for'` so that the ES model is immediately available
Member:

Doesn't this mean under heavy load this could block? May want to make this configurable
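A minimal sketch of the configurability suggested here, assuming a hypothetical registry setting (the setting name and helper are illustrative, not actual snovault API):

```python
def index_document(es, index_name, uuid, document, settings=None):
    """Index an ES-based item document. refresh='wait_for' blocks until the
    document is searchable, which can stall under heavy write load, so let
    deployments override it (e.g. set it to False)."""
    settings = settings or {}
    refresh = settings.get('snovault.es_model_refresh', 'wait_for')
    return es.index(index=index_name, id=uuid, body=document, refresh=refresh)
```

Elasticsearch's `refresh` parameter accepts `true`, `false`, and `'wait_for'`; defaulting to `'wait_for'` keeps the current read-your-writes behavior while giving heavily loaded deployments an escape hatch.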

@@ -255,6 +258,9 @@ class Item(Resource):
embedded_list = []
filtered_rev_statuses = ()
schema = None
# `used_datastore` determines where the properties of the item are stored:
# None for a traditional Postgres setup, or "elasticsearch" to use the ES document
used_datastore = None
Member:

I see that this is set, yet I suspect in some cases it must not be, given the errors I saw when running this branch on CGAP. To fix it I had to put a try-except where this field is accessed (for static sections, if I recall?).
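If the failures were missing-attribute errors, the try-except workaround described here could be replaced by a defensive accessor; `model_datastore` is a hypothetical name, not part of snovault:

```python
def model_datastore(model):
    """Return the model's datastore tag, or None when the attribute is
    absent (e.g. the static-section models that raised errors on CGAP)."""
    return getattr(model, 'used_datastore', None)
```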

for operations like getting current sid/max_sid, rev_links, and
updating. Leverage `model.used_datastore` to determine source
"""
if self.model.used_datastore != 'database':
Member:

Isn't the convention that used_datastore='elasticsearch' and anything else implies database? In which case if the default None were here you would grab the model twice when you didn't need to?

Author (carlvitzthum):

This was a bit confusing because the models themselves (storage.Resource or esstorage.CachedModel) had used_datastore parameters which are always set, as well as the used_datastore parameters on Item. I've refactored it a bit.
I think this is correct, though; if you added a third type of model, say used_datastore='redis', then we would still want to use the write model here
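The convention in this reply (anything other than 'database' means the model must be refetched from the write storage) could be sketched as follows; the function and storage names are illustrative, not the actual snovault code:

```python
def select_write_model(model, write_storage, uuid):
    """Return a model backed by the write (Postgres) storage. Any
    non-database datastore (e.g. 'elasticsearch', or a hypothetical
    'redis') means the current model cannot be written through directly,
    so refetch it from the write storage."""
    if getattr(model, 'used_datastore', None) != 'database':
        return write_storage.get_by_uuid(str(uuid))
    return model
```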

# properties and sheets, as those are exclusively stored in ES
self.write.update(model, {}, None, unique_keys, links)
# update contents of the ES documents
return storage.update(model, properties, sheets, unique_keys, links)
Member:

Don't think this return is necessary since the one below is identical

Author (carlvitzthum):

True!

@@ -1260,3 +1259,151 @@ def test_validators_on_indexing(app, testapp, indexer_testapp):
val_err_view = testapp.get(ppp_id + '@@validation-errors', status=200).json
assert val_err_view['@id'] == ppp_id
assert val_err_view['validation_errors'] == es_res['_source']['validation_errors']


def test_elasticsearch_item(app, testapp, indexer_testapp):
Member:

Would maybe consider splitting into 2 (even 3) separate tests if you can logically split the behavior. Your call though

Author (carlvitzthum):

Probably a good call. This test grew organically, hence its massive size. I'll split it up


url = '/testing-link-sources-sno/'
for item in sources:
testapp.post_json(url, item, status=201)
res = testapp.post_json(url, item)
@willronchetti (Member) commented Dec 5, 2019:

This and the above change was for testing, right? Should probably be changed back since I believe this one will fail silently (and you don't need res?)

Author (carlvitzthum):

Yep, definitely. Reverted.

@netsettler (Collaborator) left a comment:

I tried to read every line of this code, although reviewing the actual algorithmic choices is beyond the scope of what I had time to do. I did reason through things that are locally understandable and didn't see flaws in most of that, but I'm glad Will had eyes on it, too.

Some of the bigger sections of code look like they were just moving code around. I pulled up sources and tried to do a side-by-side of blocks of code that seemed to have moved a lot, though that's error-prone in its own ways. In general, for big changes like this, reading the commit-by-commit changes can be helpful, and maybe I should do that as well, though it has issues with changes getting superseded.

Anyway, reviewing the things I actually remarked on, I see they were largely cosmetic. I'm going to ponder whether the level of work I've done on it constitutes an approval; I'll likely talk to Will before signing off. But I'll just say in words here that I don't have any reason not to approve this, presuming it's passing tests. It's good not to let a change this big linger, and we have other big things, like PR #68, that may be affected and also need to get in.

Comment on lines -29 to +28
if not hasattr(context.model, 'source'):
if context.model.used_datastore != 'elasticsearch':
Collaborator:

I assume we can rely on every model to have a .used_datastore so we don't have to be so timid about the access?

src/snovault/cache.py (resolved)

Step 1: Verify that homebrew is working properly::

$ sudo brew doctor
Collaborator:

I did

~$ sudo brew doctor
Error: Running Homebrew as root is extremely dangerous and no longer supported.
As Homebrew does not drop privileges on installation you would be giving all
build scripts full access to your system.
~$ brew doctor
Your system is ready to brew.

I think we should change this line to say just

$ brew doctor


$ brew install libevent libmagic libxml2 libxslt openssl postgresql graphviz
$ brew install freetype libjpeg libtiff littlecms webp # Required by Pillow
$ brew tap homebrew/versions
Collaborator:

~$ brew tap homebrew/versions
Error: homebrew/versions was deprecated. This tap is now empty as all its formulae were migrated.

I recommend that we just remove this line. I don't actually think it's needed.

@@ -0,0 +1,45 @@
Local Installation
Collaborator:

I bet most of these instructions would be better if we borrowed the instructions I just created for forefront. (We could do that in a separate PR sometime. Not a blocker here, though.)

registry[STORAGE] = storage
# expect existing values to be used
register_storage(registry)
assert registry[STORAGE].write == 'dummy_db'
Collaborator:

It hurts my head that these are called simply read and write, since I keep assuming their value will be used like you'd use

sys.stdout.write("Foo")

Definitely not something to change on this PR, which is already very sprawling and hard to review. But one day I hope we fix this to have a more intuitive name like .storage_for_read, etc.

Comment on lines +465 to +466
return [request.resource_path(conn[uuid]) for uuid in
self.get_filtered_rev_links(request, rev_name)]
Collaborator:

I would rather see the line break before the for so that the fact of an iteration is more prominent. And this is a place I deviate from PEP. I tend to like to use a space after the bracket/brace when the text inside is computed, whether a literal or a comprehension. The space tells you something interesting is afoot and draws your attention. The line break in the proper place makes it like an upside-down for loop, and so a more familiar notation the eye can more easily scan. e.g.,

        return [ request.resource_path(conn[uuid]) 
                 for uuid in self.get_filtered_rev_links(request, rev_name) ]

@@ -1046,7 +1065,7 @@ def test_indexing_esstorage(app, testapp, indexer_testapp):
# test the following methods:
es_res_by_uuid = esstorage.get_by_uuid(test_uuid)
es_res_by_json = esstorage.get_by_json('required', 'some_value', TEST_TYPE)
es_res_direct = esstorage.get_by_uuid_direct(test_uuid, namespaced_test_type, TEST_TYPE)
Collaborator:

Use keyword calling here, please.

# purge_uuid fxn ensures that all links to the item are removed
namespaced_index = get_namespaced_index(request, item_type)
request.registry[STORAGE].purge_uuid(item_uuid, namespaced_index, item_type)
request.registry[STORAGE].purge_uuid(item_uuid, item_type)
Collaborator:

Oh, I thought I'd committed this comment, but apparently hadn't: please, whenever changing argument conventions like this, use keyword calling so that you're sure it's lining up consistently. In fact, I use keyword calling in nearly all cases when there are arbitrarily ordered arguments like these. (By arbitrarily, I mean that there's something natural about putting the uuid first in a function named purge_uuid, so I'm OK with that not being a keyword, though I also don't mind a keyword even then.) But in this PR there is a `def purge_uuid(self, rid, item_type=None):` in src/snovault/storage.py that takes one set of argument conventions, and there's also been a `max_sid` argument that's been used. I recommend writing at least `purge_uuid(item_uuid, item_type=item_type)` and optionally even `purge_uuid(rid=item_uuid, item_type=item_type)`. This makes sure the right method with the right args is being called while in transition.
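To illustrate the recommendation, here is a sketch with two hypothetical storage classes whose purge_uuid signatures differ in the way described; keyword calling lines the arguments up against either one:

```python
class PostgresStorage:
    """Hypothetical stand-in for the RDB storage, using the signature
    quoted above from src/snovault/storage.py."""
    def purge_uuid(self, rid, item_type=None):
        return ('pg', rid, item_type)

class ESStorage:
    """Hypothetical ES storage whose purge_uuid grew an extra parameter."""
    def purge_uuid(self, rid, item_type=None, max_sid=None):
        return ('es', rid, item_type, max_sid)

def purge(storage, item_uuid, item_type):
    # Keyword calling keeps the arguments aligned with either signature,
    # even if positional parameters are reordered or inserted later.
    return storage.purge_uuid(rid=item_uuid, item_type=item_type)
```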

Collaborator:

Some of my other comments, which intended to allude back to that one, may have been confusing in its absence. :)

@@ -1262,6 +1280,207 @@ def test_validators_on_indexing(app, testapp, indexer_testapp):
assert val_err_view['validation_errors'] == es_res['_source']['validation_errors']


def test_elasticsearch_item_basic(app, testapp, indexer_testapp, es_based_target):
Collaborator:

These tests look plausible, though I'm still not up on enough detail to review some aspects of them. I wouldn't mind sitting with Will and going over them in detail to make sure I get the things that are being tested. No action item for the code due to that, though. :)

Collaborator:

One question I can ask here, though: These tests are longish. Is that because there is state that runs through the whole thing and changes? Would they not work if carved up into more and smaller tests? Would that require repeating a constant bit of boilerplate setup, or would there be a (pardon pun) pyramidal cost in setup as the series of tests progressed? Or is the reason for lumping them together that the fixture setup is costly and we just don't want too many of such tests? I feel like this is not just a matter of ease of review, but of more concise noticing of particular problems in case of error, of having one error not mask another, and of modularity in test setup so that tests are easier to change or less state-dependent.

@netsettler (Collaborator) left a comment:

This all looks good.

@willronchetti merged commit b163e5a into master Feb 3, 2020
@willronchetti deleted the es_item branch February 3, 2020 18:31