Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clarification]: Clarification on Operator Precedence for SQL and NoSQL Databases #795

Open
otaviojava opened this issue Jul 10, 2024 · 7 comments
Labels
question Further information is requested

Comments

@otaviojava
Copy link
Contributor

Specification

https://github.com/jakartaee/data/blob/main/spec/src/main/asciidoc/query-language.asciidoc

I need clarification on ...

Please clarify or update documentation regarding the following statement:

Syntactically, logical operators are parsed with lower precedence than equality and inequality operators and other conditional expressions listed above. The NOT operator has higher precedence than AND and OR. The AND operator has higher precedence than OR.

Issue:

  • This statement is true for SQL but may not apply to NoSQL databases like MongoDB.
  • NoSQL databases rely on explicit query structures rather than predefined operator precedence.

Request:

  • Update documentation to highlight that while these precedence rules apply to SQL, they are not guaranteed to work for NoSQL databases.

Thank you for your attention to this matter.

Additional information

No response

@otaviojava otaviojava added the question Further information is requested label Jul 10, 2024
@njr-11
Copy link
Contributor

njr-11 commented Jul 10, 2024

JDQL also defines parentheses (, ) for grouping. If NoSQL databases are able to support those, I would expect the same mechanism to be usable to support the specified operator precedence, which is really just implicit grouping.

@otaviojava
Copy link
Contributor Author

Sorry by delay, I was checking this option:

MongoDB does not support parentheses for grouping. Instead, MongoDB uses explicit JSON-like syntax with $and and $or operators to define the structure and precedence of queries.

Redis and Cassandra do not have support for it as well.

At Graph, Neo4J follows the same idea of SQL on grouping and the "AND" precedence.

@njr-11
Copy link
Contributor

njr-11 commented Jul 12, 2024

I poorly worded my reply. It should say, if you able to implement the JDQL requirements of allowing parenthesis for grouping by some underlying mechanism of the NoSQL database, then you should be able to use that same mechanism to implement precedence.

So when you say:

MongoDB does not support parentheses for grouping. Instead, MongoDB uses explicit JSON-like syntax with $and and $or operators to define the structure and precedence of queries.

that should be fine. MongoDB doesn't need to have syntax that matches the JDQL with parenthesis. The mechanism you said MongoDB has seems good enough to support parentheses in JDQL.

For example, with this JDQL,

@Query("WHERE (yearProduced > 2020 AND price < 100) OR (condition = 'REFURBISHED' AND price < 80)")

you can send the following to MongoDB:

$or: [ { $and: [ { "yearProduced": { $gt: 2020 } },
                 { "price": { $lt: 100 } }
               ]
       },
       { $and: [ { "condition": "REFURBISHED" },
                 { "price": { $lt: 80 } },
               ]
       }
     ]

Note that you could send the exact same thing to MongoDB if the query omitted the parentheses and relied upon precedence rules:

@Query("WHERE yearProduced > 2020 AND price < 100 OR condition = 'REFURBISHED' AND price < 80")

@otaviojava
Copy link
Contributor Author

otaviojava commented Jul 14, 2024

My point is the AND as precedence.

MongoDB does not prioritize AND over OR or vice versa; the precedence is determined by the structure of the query:

Let's use this query:

WHERE numBitsRequired = :bits OR numType = :type AND id < :xmax
{
   "$or":[
      {
         "numBitsRequired":4
      },
      {
         "numType":"COMPOSITE"
      },
      {
         "$and":[
            {
               "_id":{
                  "$lt":20
               }
            }
         ]
      }
   ]
}

⚠️ As I said in the meeting, I am using MongoDB as a reference, the point is, unfortunately, I cannot have this precedence guarantee as we have in SQL.

@njr-11
Copy link
Contributor

njr-11 commented Jul 15, 2024

My point is the AND as precedence.

MongoDB does not prioritize AND over OR or vice versa; the precedence is determined by the structure of the query:

Let's use this query:

WHERE numBitsRequired = :bits OR numType = :type AND id < :xmax
{
   "$or":[
      {
         "numBitsRequired":4
      },
      {
         "numType":"COMPOSITE"
      },
      {
         "$and":[
            {
               "_id":{
                  "$lt":20
               }
            }
         ]
      }
   ]
}

⚠️ As I said in the meeting, I am using MongoDB as a reference, the point is, unfortunately, I cannot have this precedence guarantee as we have in SQL.

No, your example above indicates a bug in your implementation. Your implementation is responsible for interpreting the precedence according to the rules of the Jakarta Data specification and translating that to the mechanism of the underlying database, which for the example above, WHERE numBitsRequired = :bits OR numType = :type AND id < :xmax, you should be doing:

{
  "$and":[
    {
      "$or":[
        {
          "numBitsRequired":4
        },
        {
           "numType":"COMPOSITE"
        }
      ]
    }
    {
      "_id":{
        "$lt":20
      }
    }
  ]
}

@otaviojava
Copy link
Contributor Author

Thanks, but I am curious about the scope of this language, how much it should impact the database procedure or the native behavior of a database engine, and how much this SQL simulation on NoSQL could impact performance.

IMHO, if the database has precedence under the order explicitly defined by the query's nesting and logical operator usage, it should not change it by language.

We should embrace SQL behavior as with NoSQL and keep the boundaries of those databases.

It's worth noting that Cassandra, for instance, does not support certain operations like parentheses and grouping. Understanding these limitations is crucial for effective database management. Otherwise, we need to do multiple queries and handle them by memory.

While it's true that some NoSQL databases, like OrientDB and Oracle NoSQL, implement SQL language, it's important to remember that behavior may not be consistent across all NoSQL databases. This caution is necessary to avoid making assumptions that could lead to unexpected results in database management.

My research is ongoing, and I'm currently considering two potential options. I value your input and look forward to discussing these options further.

  1. Update the spec precedence and explain that it might change on NoSQL databases. Also, make the parentheses optional and throw unsupported operation exceptions.
  2. Explain that Jakarta Data Query language works only for SQL databases; once, Cassandra, Redis, and some document databases cannot implement it.

⚠️ I am still researching it and am super open to suggestions.

@njr-11
Copy link
Contributor

njr-11 commented Jul 25, 2024

Thanks, but I am curious about the scope of this language, how much it should impact the database procedure or the native behavior of a database engine, and how much this SQL simulation on NoSQL could impact performance.

Any impact to performance would only occur when the particular keyword or operator that you are implementing is used. So the user gets to opt in by choosing to use it if they value the function it provides. And if they choose not to use it, there should be no impact to them at all. I don't see a problem here. A vendor is free to document that certain operations might be less efficient on NoSQL if that is so, and recommend for and against whatever patterns they see fit.

  1. Update the spec precedence and explain that it might change on NoSQL databases. Also, make the parentheses optional and throw unsupported operation exceptions.

Parentheses (in JDQL) and default order of precedence when parentheses are not used should be handled the same as the other differences between relational/graph/document/column/key-value. The spec should document it in terms of what the category of database is capable of, and the TCK should arrange its assertions accordingly, just as we did for the other capabilities that differ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants