This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Fix object/nested field select issue (#584)
* Fix object and deep object select issue

* Fix same field added more than once bug

* Add more IT cases

* Add doctest

* Add doctest

* Add doctest

* Update doctest with more examples

* Prepare PR
dai-chen authored Jul 20, 2020
1 parent 9d72867 commit d74cb81
Showing 17 changed files with 426 additions and 34 deletions.
3 changes: 2 additions & 1 deletion docs/category.json
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
],
"sql_cli": [
"user/dql/expressions.rst",
"user/dql/functions.rst"
"user/dql/functions.rst",
"user/beyond/partiql.rst"
]
}
111 changes: 110 additions & 1 deletion docs/user/beyond/partiql.rst
Expand Up @@ -12,16 +12,60 @@ PartiQL (JSON) Support
Introduction
============

PartiQL is a SQL-compatible query language that makes it easy and efficient to query semi-structured and nested data regardless of data format. For now our implementation is only partially compatible with PartiQL specification and more support will be provided in future.
In Elasticsearch, there are two types of JSON field (called "properties"): ``object`` and ``nested``. An object field can have inner fields, each of which is either a simple field or, recursively, another object field. A nested field is a special version of the object type that allows its inner fields to be queried independently.

To support queries for both types, we follow the query language syntax defined in the PartiQL specification. PartiQL is a SQL-compatible query language that makes it easy and efficient to query semi-structured and nested data regardless of data format. For now our implementation is only partially compatible with the PartiQL specification; more support will be provided in the future.
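
As a rough illustration of why the two field types behave differently, the sketch below mimics how an array of objects is indexed under the default ``object`` type: inner fields are flattened into multi-valued fields, so the pairing between values from the same array element is lost. A ``nested`` field avoids this by keeping each element separate. The function name ``flatten_object_array`` and the sample data are hypothetical and not part of this plugin; this is a simplified model, not Elasticsearch's actual indexing code.

```python
from collections import defaultdict


def flatten_object_array(field, elements):
    """Model how an array of objects is indexed under the ``object`` type.

    Every inner field becomes one multi-valued field, so the link between
    values that came from the same array element is no longer visible.
    """
    flattened = defaultdict(list)
    for element in elements:
        for key, value in element.items():
            flattened[f"{field}.{key}"].append(value)
    return dict(flattened)


# Hypothetical sample data, for illustration only.
sample_projects = [
    {"name": "A", "year": 2019},
    {"name": "B", "year": 2020},
]

print(flatten_object_array("projects", sample_projects))
```

After flattening, nothing records that ``A`` belonged with ``2019`` rather than ``2020``, which is the association a ``nested`` field preserves.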

Test Data
=========

Description
-----------

The test index ``people`` demonstrates our support for queries on deeply nested object fields.
The test index ``employees_nested`` used by all examples in this document is very similar to the one used in official PartiQL documentation.

Example: People
---------------

There are three fields in the test index ``people``: 1) a deeply nested object field ``city``; 2) an object field with an array value, ``accounts``; 3) a nested field ``projects``::

{
"mappings": {
"properties": {
"city": {
"properties": {
"name": {
"type": "keyword"
},
"location": {
"properties": {
"latitude": {
"type": "double"
}
}
}
}
},
"accounts": {
"properties": {
"id": {
"type": "keyword"
}
}
},
"projects": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
}
}
}
}
}
}
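
To make the structure of such a mapping easier to see, the following sketch walks a mapping and lists every field path with its type. It assumes, as in the mapping above, that a field with ``properties`` but no explicit ``type`` is an object field. The helper ``field_types`` and the trimmed ``sample_mapping`` (only ``city`` and ``projects``) are hypothetical names used for illustration.

```python
def field_types(properties, prefix=""):
    """Yield (full_path, type) for every field under a mapping's properties."""
    for name, definition in properties.items():
        path = prefix + name
        # No explicit "type" means the field is a plain object field.
        yield path, definition.get("type", "object")
        if "properties" in definition:
            # Recurse into inner fields, extending the dotted path.
            yield from field_types(definition["properties"], path + ".")


# Trimmed copy of the mapping, for illustration only.
sample_mapping = {
    "mappings": {
        "properties": {
            "city": {
                "properties": {
                    "name": {"type": "keyword"},
                    "location": {
                        "properties": {"latitude": {"type": "double"}}
                    },
                }
            },
            "projects": {
                "type": "nested",
                "properties": {"name": {"type": "keyword"}},
            },
        }
    }
}

for path, type_name in field_types(sample_mapping["mappings"]["properties"]):
    print(f"{path}: {type_name}")
```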

Example: Employees
------------------

Expand Down Expand Up @@ -78,6 +122,71 @@ Result set::
]
}

Querying Nested Tuple Values
============================

Description
-----------

Before looking into how nested object fields (tuple values) are queried, we need to identify the possible cases and how each is handled by our SQL implementation. The query support matrix below summarizes what is supported so far for queries involving object and nested fields. Note that a further complexity is that any field in Elasticsearch, regular or property, can contain more than one value in a single document. This means an object field is not always a tuple value, and that case needs to be handled separately.

+-------------------------+---------------+-----------------------+---------------------------------------------+-------------------------+
| Level/Field Type        | Object Fields | Object Fields (array) | Nested Fields                               | Comment                 |
+=========================+===============+=======================+=============================================+=========================+
| Selecting top level     | Yes           | Yes                   | Yes                                         | The original JSON of    |
|                         |               |                       |                                             | field value is returned |
|                         |               |                       |                                             | which is either a JSON  |
|                         |               |                       |                                             | object or JSON array.   |
+-------------------------+---------------+-----------------------+---------------------------------------------+-------------------------+
| Selecting second level  | Yes           | No                    | Yes                                         |                         |
|                         |               | (null returned)       | (or null returned if not in PartiQL syntax) |                         |
+-------------------------+---------------+-----------------------+---------------------------------------------+ PartiQL specification   |
| Selecting deeper levels | Yes           | No                    | No                                          | is followed             |
|                         |               | (null returned)       | (exception may                              |                         |
|                         |               |                       | be thrown)                                  |                         |
+-------------------------+---------------+-----------------------+---------------------------------------------+-------------------------+

Example 1: Selecting Top Level
------------------------------

Selecting the top level of object fields, object fields with array values and nested fields returns the original JSON object or array of the field. For example, the object field ``city`` is a JSON object, while the object field (with array value) ``accounts`` and the nested field ``projects`` are JSON arrays::

od> SELECT city, accounts, projects FROM people;
fetched rows / total rows = 1/1
+-----------------------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------+
| city | accounts | projects |
|-----------------------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------|
| {'name': 'Seattle', 'location': {'latitude': 10.5}} | [{'id': 1},{'id': 2}] | [{'name': 'AWS Redshift Spectrum querying'},{'name': 'AWS Redshift security'},{'name': 'AWS Aurora security'}] |
+-----------------------------------------------------+-----------------------+----------------------------------------------------------------------------------------------------------------+

Example 2: Selecting Deeper Levels
----------------------------------

Selecting at deeper levels for object fields of regular value returns the inner field value. For example, ``city.location`` is an inner object field and ``city.location.latitude`` is a regular double field::

od> SELECT city.location, city.location.latitude FROM people;
fetched rows / total rows = 1/1
+--------------------+--------------------------+
| city.location | city.location.latitude |
|--------------------+--------------------------|
| {'latitude': 10.5} | 10.5 |
+--------------------+--------------------------+

Example 3: Selecting Field of Array Value
-----------------------------------------

Selecting a deeper level of an object field with an array value returns ``NULL``. For example, because the inner field ``accounts.id`` has two values instead of a tuple in this document, null is returned. Similarly, selecting the inner field ``projects.name`` of a nested field directly returns null::

od> SELECT accounts.id, projects.name FROM people;
fetched rows / total rows = 1/1
+---------------+-----------------+
| accounts.id | projects.name |
|---------------+-----------------|
| null | null |
+---------------+-----------------+

For selecting the second level of nested fields, please read on; the following sections give more details.
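
The behavior shown in Examples 1 to 3 can be modeled with a short sketch: a dotted path is resolved one level at a time, and ``None`` is returned as soon as an intermediate value is an array rather than a tuple. The function ``select_path`` is a hypothetical helper that only reproduces the results documented above; it is not how the plugin itself is implemented. The ``people_doc`` value mirrors the single document in the test index.

```python
def select_path(document, path):
    """Resolve a dotted field path against one JSON document.

    Returns the raw value for a top-level field, walks into inner objects,
    and returns None when an intermediate value is not a JSON object.
    """
    value = document
    for name in path.split("."):
        if not isinstance(value, dict):
            # An array (or scalar) in the middle of the path is not a tuple.
            return None
        value = value.get(name)
    return value


people_doc = {
    "city": {"name": "Seattle", "location": {"latitude": 10.5}},
    "accounts": [{"id": 1}, {"id": 2}],
    "projects": [
        {"name": "AWS Redshift Spectrum querying"},
        {"name": "AWS Redshift security"},
        {"name": "AWS Aurora security"},
    ],
}

for field in ("city", "city.location.latitude", "accounts", "accounts.id", "projects.name"):
    print(field, "->", select_path(people_doc, field))
```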

Querying Nested Collection
==========================

Expand Down
1 change: 1 addition & 0 deletions doctest/test_data/people.json
@@ -0,0 +1 @@
{"city": {"name": "Seattle", "location": {"latitude": 10.5}}, "accounts": [{"id": 1}, {"id": 2}], "projects": [{"name": "AWS Redshift Spectrum querying"}, {"name": "AWS Redshift security"}, {"name": "AWS Aurora security"}] }
22 changes: 15 additions & 7 deletions doctest/test_docs.py
Expand Up @@ -13,6 +13,7 @@

import doctest
import os
import os.path
import zc.customdoctests
import json
import re
Expand All @@ -30,6 +31,7 @@
ENDPOINT = "http://localhost:9200"
ACCOUNTS = "accounts"
EMPLOYEES = "employees"
PEOPLE = "people"


class DocTestConnection(ESConnection):
Expand Down Expand Up @@ -88,17 +90,23 @@ def bash_transform(s):
ps1=r'sh\$', comment_prefix='#', transform=bash_transform)


def set_up_accounts(test):
def set_up_test_indices(test):
set_up(test)
load_file("accounts.json", index_name=ACCOUNTS)
load_file("people.json", index_name=PEOPLE)


def load_file(filename, index_name):
filepath = "./test_data/" + filename
# Create index with the mapping if mapping file exists
mapping_file_path = './test_mapping/' + filename
if os.path.isfile(mapping_file_path):
with open(mapping_file_path, 'r') as f:
test_data_client.indices.create(index=index_name, body=f.read())

# generate iterable data
data_file_path = './test_data/' + filename
def load_json():
with open(filepath, "r") as f:
with open(data_file_path, 'r') as f:
for line in f:
yield json.loads(line)

Expand All @@ -114,7 +122,7 @@ def set_up(test):

def tear_down(test):
# drop leftover tables after each test
test_data_client.indices.delete(index=[ACCOUNTS, EMPLOYEES], ignore_unavailable=True)
test_data_client.indices.delete(index=[ACCOUNTS, EMPLOYEES, PEOPLE], ignore_unavailable=True)


docsuite = partial(doctest.DocFileSuite,
Expand All @@ -140,7 +148,7 @@ def doc_suite(fn):
return docsuite(
fn,
parser=bash_parser,
setUp=set_up_accounts,
setUp=set_up_test_indices,
globs={
'sh': partial(
subprocess.run,
Expand Down Expand Up @@ -178,7 +186,7 @@ def load_tests(loader, suite, ignore):
docsuite(
fn,
parser=sql_cli_parser,
setUp=set_up_accounts
setUp=set_up_test_indices
)
)

Expand All @@ -188,7 +196,7 @@ def load_tests(loader, suite, ignore):
docsuite(
fn,
parser=ppl_cli_parser,
setUp=set_up_accounts
setUp=set_up_test_indices
)
)

Expand Down
35 changes: 35 additions & 0 deletions doctest/test_mapping/people.json
@@ -0,0 +1,35 @@
{
"mappings": {
"properties": {
"city": {
"properties": {
"name": {
"type": "keyword"
},
"location": {
"properties": {
"latitude": {
"type": "double"
}
}
}
}
},
"accounts": {
"properties": {
"id": {
"type": "keyword"
}
}
},
"projects": {
"type": "nested",
"properties": {
"name": {
"type": "keyword"
}
}
}
}
}
}
@@ -0,0 +1,127 @@
/*
* Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*
*/

package com.amazon.opendistroforelasticsearch.sql.legacy;

import static com.amazon.opendistroforelasticsearch.sql.legacy.TestsConstants.TEST_INDEX_DEEP_NESTED;
import static com.amazon.opendistroforelasticsearch.sql.util.MatcherUtils.rows;
import static com.amazon.opendistroforelasticsearch.sql.util.MatcherUtils.schema;
import static com.amazon.opendistroforelasticsearch.sql.util.MatcherUtils.verifyDataRows;
import static com.amazon.opendistroforelasticsearch.sql.util.MatcherUtils.verifySchema;

import com.amazon.opendistroforelasticsearch.sql.legacy.utils.StringUtils;
import org.json.JSONArray;
import org.json.JSONObject;
import org.junit.Test;

/**
* Integration test for Elasticsearch object field (and nested field).
* This class is focused on simple SELECT-FROM query to ensure right column
* number and value is returned.
*/
public class ObjectFieldSelectIT extends SQLIntegTestCase {

@Override
protected void init() throws Exception {
loadIndex(Index.DEEP_NESTED);
}

@Test
public void testSelectObjectFieldItself() {
JSONObject response = new JSONObject(query("SELECT city FROM %s"));

verifySchema(response, schema("city", null, "object"));

// Expect object field itself is returned in a single cell
verifyDataRows(response,
rows(new JSONObject(
"{\n"
+ " \"name\": \"Seattle\",\n"
+ " \"location\": {\"latitude\": 10.5}\n"
+ "}")
)
);
}

@Test
public void testSelectObjectInnerFields() {
JSONObject response = new JSONObject(query(
"SELECT city.location, city.location.latitude FROM %s"));

verifySchema(response,
schema("city.location", null, "object"),
schema("city.location.latitude", null, "double")
);

// Expect inner regular or object field returned in its single cell
verifyDataRows(response,
rows(
new JSONObject("{\"latitude\": 10.5}"),
10.5
)
);
}

@Test
public void testSelectNestedFieldItself() {
JSONObject response = new JSONObject(query("SELECT projects FROM %s"));

// Nested field type is also absent from the ES Get Field Mapping response, hence "object" is used
verifySchema(response, schema("projects", null, "object"));

// Expect nested field itself is returned in a single cell
verifyDataRows(response,
rows(new JSONArray(
"[\n"
+ " {\"name\": \"AWS Redshift Spectrum querying\"},\n"
+ " {\"name\": \"AWS Redshift security\"},\n"
+ " {\"name\": \"AWS Aurora security\"}\n"
+ "]")
)
);
}

@Test
public void testSelectObjectFieldOfArrayValuesItself() {
JSONObject response = new JSONObject(query("SELECT accounts FROM %s"));

// Expect the entire list of values is returned just like a nested field
verifyDataRows(response,
rows(new JSONArray(
"[\n"
+ " {\"id\": 1},\n"
+ " {\"id\": 2}\n"
+ "]")
)
);
}

@Test
public void testSelectObjectFieldOfArrayValuesInnerFields() {
JSONObject response = new JSONObject(query("SELECT accounts.id FROM %s"));

// Flattening an object field of list value is not supported, so expect null to be returned
verifyDataRows(response, rows(JSONObject.NULL));
}

private String query(String sql) {
return executeQuery(
StringUtils.format(sql, TEST_INDEX_DEEP_NESTED),
"jdbc"
);
}

}