Add sqllogictests (v0) #4395

Merged
merged 20 commits into from
Dec 1, 2022
Changes from 6 commits
28 changes: 28 additions & 0 deletions .github/workflows/rust.yml
@@ -517,3 +517,31 @@ jobs:
# If you encounter an error, run './dev/update_config_docs.sh' and commit
./dev/update_config_docs.sh
git diff --exit-code

  # Run sqllogictests
  sql-logic-tests:
    name: run sqllogictests
    needs: [linux-build-lib]
    runs-on: ubuntu-latest
    container:
      image: amd64/rust
      env:
        # Disable full debug symbol generation to speed up CI build and keep memory down
        # "1" means line tables only, which is useful for panic tracebacks.
        RUSTFLAGS: "-C debuginfo=1"
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: true
      - name: Cache Cargo
        uses: actions/cache@v3
        with:
          path: /github/home/.cargo
          # this key equals the ones on `linux-build-lib` for re-use
          key: cargo-cache-
      - name: Setup Rust toolchain
        uses: ./.github/actions/setup-builder
        with:
          rust-version: stable
      - name: Run sqllogictests
        run: cargo run -p datafusion-sqllogictests
1 change: 1 addition & 0 deletions Cargo.toml
@@ -31,6 +31,7 @@ members = [
"test-utils",
"parquet-test-utils",
"benchmarks",
"tests/sqllogictests",
Contributor

I recommend moving this test into `datafusion/core/tests` so that it would then be run via

`cargo test -p datafusion --test sqllogictests`

I don't see any reason to put it into its own top-level crate (though if others feel differently, perhaps we could move the code into `datafusion/sqllogictest` to match the structure of the other crates in this repo).

Contributor Author

Well, TIL about Cargo tests' `harness = false`! Thanks for the tip.

]

[profile.release]
1 change: 1 addition & 0 deletions tests/sqllogictests/.gitignore
@@ -0,0 +1 @@
*.py
29 changes: 29 additions & 0 deletions tests/sqllogictests/Cargo.toml
@@ -0,0 +1,29 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "datafusion-sqllogictests"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
async-trait = "0.1.58"
datafusion = { path = "../../datafusion/core" }
sqllogictest = "0.8.0"
tokio = { version = "1.0", features = ["macros", "rt-multi-thread"] }
45 changes: 45 additions & 0 deletions tests/sqllogictests/README.md
@@ -0,0 +1,45 @@
#### Overview

This is the DataFusion implementation of [sqllogictest](https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki). We use [sqllogictest-rs](https://github.com/risinglightdb/sqllogictest-rs) as a parser/runner of the `.slt` files in `test_files`.

#### Running tests

`cargo run -p datafusion-sqllogictests`

#### Setup


#### sqllogictests

> :warning: **Warning**: DataFusion's sqllogictest implementation and migration are still in progress. Definitions are taken from https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki

sqllogictest is a program originally written for SQLite to verify the correctness of SQL queries against the SQLite engine. The program is engine-agnostic: it parses sqllogictest files (`.slt`), runs their queries against an SQL engine, and compares the output to the expected output.
Member

Suggested change
sqllogictest is a program originally written for SQLite to verify the correctness of SQL queries against the SQLLite engine. The program is engine-agnostic and can parse sqllogictest files (`.slt`), runs queries against an SQL engine and compare the output to the expected output.
sqllogictest is a program originally written for SQLite to verify the correctness of SQL queries against the SQLite engine. The program is engine-agnostic and can parse sqllogictest files (`.slt`), runs queries against an SQL engine and compare the output to the expected output.

Contributor

BTW, this is an amazing writeup (thank you!). I recommend we eventually move this content into the sqllogictest repo and link to that document here.

Contributor Author

Yep, makes sense! I'll track this in a later PR as I improve these docs iteratively.
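As a rough illustration of the run-and-compare cycle that sqllogictest performs, here is a minimal, synchronous sketch. `Engine`, `CannedEngine`, `Outcome`, and `check_query` are hypothetical names; the real sqllogictest-rs runner is async and drives the `AsyncDB` implementation in `src/main.rs`. Ignoring trailing whitespace is an assumption of this sketch, not necessarily what sqllogictest-rs does.

```rust
/// Hypothetical, synchronous stand-in for the engine under test.
pub trait Engine {
    fn run(&mut self, sql: &str) -> Result<String, String>;
}

/// Toy engine that returns the same canned output for every query (illustration only).
pub struct CannedEngine(pub Result<String, String>);

impl Engine for CannedEngine {
    fn run(&mut self, _sql: &str) -> Result<String, String> {
        self.0.clone()
    }
}

/// Result of checking one query record.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Pass,
    Mismatch { expected: String, actual: String },
    EngineError(String),
}

/// Drop trailing whitespace at the end of the text and of each line
/// (assumption: it is not significant for comparison).
fn normalize(s: &str) -> String {
    s.trim_end()
        .lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
}

/// Run `sql` on `engine` and compare its rendered output with `expected`.
pub fn check_query<E: Engine>(engine: &mut E, sql: &str, expected: &str) -> Outcome {
    match engine.run(sql) {
        Err(e) => Outcome::EngineError(e),
        Ok(actual) => {
            let (actual, expected) = (normalize(&actual), normalize(expected));
            if actual == expected {
                Outcome::Pass
            } else {
                Outcome::Mismatch { expected, actual }
            }
        }
    }
}
```

An engine error is reported separately from a result mismatch, so a failing query is never mistaken for a wrong answer.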


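Tests in an `.slt` file are a sequence of records, generally starting with `CREATE` statements that populate tables, followed by queries that test the populated data. arrow-datafusion is an exception here: some test tables are registered from Rust (see `register_test_tables` in `src/main.rs`) rather than created in the `.slt` file.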
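A sketch of how a runner might split an `.slt` file into records, assuming the SQLite convention that records are separated by blank lines and that lines starting with `#` are comments. `statement ok` / `statement error` records execute SQL such as `CREATE TABLE` and only check success or failure, while `query` records also compare results. `RecordKind` and `split_records` are hypothetical names, and real files support more record types than shown here.

```rust
/// Kind of record found in an `.slt` file (simplified).
#[derive(Debug, PartialEq)]
pub enum RecordKind {
    StatementOk,
    StatementError,
    Query,
    Other,
}

/// Split `.slt` text into records (blocks separated by blank lines), drop `#`
/// comment lines, and classify each record by its first remaining line.
pub fn split_records(text: &str) -> Vec<(RecordKind, Vec<String>)> {
    let mut records = Vec::new();
    let mut current: Vec<String> = Vec::new();
    // A trailing "" flushes the last record even if the file has no final blank line.
    for line in text.lines().chain(std::iter::once("")) {
        if line.trim().is_empty() {
            if !current.is_empty() {
                let kind = {
                    let words: Vec<&str> = current[0].split_whitespace().collect();
                    match words.as_slice() {
                        ["statement", "ok", ..] => RecordKind::StatementOk,
                        ["statement", "error", ..] => RecordKind::StatementError,
                        ["query", ..] => RecordKind::Query,
                        _ => RecordKind::Other,
                    }
                };
                records.push((kind, std::mem::take(&mut current)));
            }
        } else if !line.starts_with('#') {
            current.push(line.to_string());
        }
    }
    records
}
```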

Query records follow the format:
```sql
# <test_name>
query <type_string> <sort_mode> <label>
<sql_query>
----
<expected_result>
```
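The header line of a query record can be parsed roughly as follows. This sketch assumes whitespace-separated fields, treats any token after `<type_string>` that is not a known sort mode (`nosort`, `rowsort`, `valuesort`) as the label, and defaults to `nosort`, the default documented by SQLite. `SortMode`, `QueryHeader`, and `parse_query_header` are hypothetical names, not the sqllogictest-rs API.

```rust
/// How results are ordered before comparison.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SortMode {
    NoSort,
    RowSort,
    ValueSort,
}

/// Parsed `query <type_string> [<sort_mode>] [<label>]` header line.
#[derive(Debug, PartialEq)]
pub struct QueryHeader {
    pub column_types: Vec<char>, // one of 'T', 'I', 'R' per result column
    pub sort_mode: SortMode,
    pub label: Option<String>,
}

pub fn parse_query_header(line: &str) -> Result<QueryHeader, String> {
    let mut parts = line.split_whitespace();
    if parts.next() != Some("query") {
        return Err(format!("not a query record: {line:?}"));
    }
    let type_string = parts.next().ok_or("missing <type_string>")?;
    let column_types: Vec<char> = type_string.chars().collect();
    // Reject anything other than the three documented type codes.
    if let Some(bad) = column_types.iter().find(|&&c| !matches!(c, 'T' | 'I' | 'R')) {
        return Err(format!("unknown column type {bad:?}"));
    }
    let mut sort_mode = SortMode::NoSort; // default when omitted
    let mut label = None;
    for part in parts {
        match part {
            "nosort" => sort_mode = SortMode::NoSort,
            "rowsort" => sort_mode = SortMode::RowSort,
            "valuesort" => sort_mode = SortMode::ValueSort,
            other => label = Some(other.to_string()),
        }
    }
    Ok(QueryHeader { column_types, sort_mode, label })
}
```

The number of characters in `column_types` gives the expected number of result columns, so a runner can reject result rows with the wrong width before comparing values.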

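- `test_name`: A unique name identifying the test (arrow-datafusion only).
- `type_string`: A short string that specifies the number of result columns and the expected datatype of each result column. There is one character in the `<type_string>` for each result column. The character codes are "T" for a text result, "I" for an integer result, and "R" for a floating-point result.
- (Optional) `sort_mode`: One of "nosort", "rowsort", or "valuesort"; the default is "nosort". In nosort mode, results are compared in exactly the order the engine returned them, so it should only be used for queries with an `ORDER BY` clause or a single-row result. "rowsort" sorts the rendered rows as text on the client side before comparing, and "valuesort" works like rowsort but sorts every value individually, ignoring row boundaries.
- (Optional) `label`: sqllogictest stores a hash of the results of this query under the given label. If the label is reused, then sqllogictest verifies that the results are the same. This can be used to verify that two or more queries in the same test script that are logically equivalent always generate the same output.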
Member

There is no explanation for sort_mode

- `expected_result`: In the results section, integer values are rendered as if by printf("%d"). Floating point values are rendered as if by printf("%.3f"). NULL values are rendered as "NULL". Empty strings are rendered as "(empty)". Within non-empty strings, all control characters and unprintable characters are rendered as "@".
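The `expected_result` rendering rules can be sketched as a single function over a hypothetical `Value` type. Treating every character outside printable ASCII (`' '` through `'~'`) as unprintable is an assumption of this sketch.

```rust
/// A single result value (hypothetical type for illustration).
#[derive(Debug, Clone)]
pub enum Value {
    Null,
    Int(i64),
    Float(f64),
    Text(String),
}

/// Render one value following the sqllogictest result rules.
pub fn render(value: &Value) -> String {
    match value {
        Value::Null => "NULL".to_string(),
        Value::Int(i) => format!("{i}"),      // like printf("%d")
        Value::Float(f) => format!("{f:.3}"), // like printf("%.3f")
        Value::Text(s) if s.is_empty() => "(empty)".to_string(),
        // Replace control and non-printable-ASCII characters with '@'.
        Value::Text(s) => s
            .chars()
            .map(|c| if c < ' ' || c > '~' { '@' } else { c })
            .collect(),
    }
}
```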

##### Example

```sql
# group_by_distinct
query TTI
SELECT a, b, COUNT(DISTINCT c) FROM my_table GROUP BY a, b ORDER BY a, b
----
foo bar 10
foo baz 5
foo NULL 4
NULL NULL 3
```
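The example's `ORDER BY a, b` makes the row order deterministic, so its output can be compared as-is. Without an `ORDER BY`, the SQLite documentation recommends the `rowsort` or `valuesort` modes, which sort the rendered output on the client before comparing. A sketch of both over already-rendered values (`rowsort` and `valuesort` here are hypothetical helper names):

```rust
/// Sort whole rows, comparing them column by column as text.
pub fn rowsort(mut rows: Vec<Vec<String>>) -> Vec<Vec<String>> {
    rows.sort(); // lexicographic over columns; each String compares byte-wise
    rows
}

/// Flatten every rendered value and sort each one on its own,
/// discarding the grouping into rows.
pub fn valuesort(rows: Vec<Vec<String>>) -> Vec<String> {
    let mut values: Vec<String> = rows.into_iter().flatten().collect();
    values.sort();
    values
}
```

Because values are compared as text, `"10"` sorts before `"9"`, matching the `strcmp()`-style comparison SQLite describes.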
121 changes: 121 additions & 0 deletions tests/sqllogictests/src/main.rs
@@ -0,0 +1,121 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

use async_trait::async_trait;
use datafusion::arrow::csv::WriterBuilder;
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::prelude::SessionContext;
use std::path::PathBuf;

use sqllogictest::TestError;
pub type Result<T> = std::result::Result<T, TestError>;

mod setup;
mod utils;

const TEST_DIRECTORY: &str = "tests/sqllogictests/test_files";
const TEST_CATEGORIES: [TestCategory; 2] =
    [TestCategory::Aggregate, TestCategory::ArrowTypeOf];

pub enum TestCategory {
    Aggregate,
    ArrowTypeOf,
}

impl TestCategory {
    fn as_str(&self) -> &'static str {
        match self {
            TestCategory::Aggregate => "Aggregate",
            TestCategory::ArrowTypeOf => "ArrowTypeOf",
        }
    }

    fn test_filename(&self) -> &'static str {
        match self {
            TestCategory::Aggregate => "aggregate.slt",
            TestCategory::ArrowTypeOf => "arrow_typeof.slt",
        }
    }

    async fn register_test_tables(&self, ctx: &SessionContext) {
        println!("[{}] Registering tables", self.as_str());
        match self {
            TestCategory::Aggregate => setup::register_aggregate_tables(ctx).await,
            TestCategory::ArrowTypeOf => (),
        }
    }
}

pub struct DataFusion {
    ctx: SessionContext,
    test_category: TestCategory,
}

#[async_trait]
impl sqllogictest::AsyncDB for DataFusion {
    type Error = TestError;

    async fn run(&mut self, sql: &str) -> Result<String> {
        println!(
            "[{}] Running query: \"{}\"",
            self.test_category.as_str(),
            sql
        );
        let result = run_query(&self.ctx, sql).await?;
        Ok(result)
    }
}
Contributor

Suggested change
}
    /// Engine name of current database.
    fn engine_name(&self) -> &str {
        "DataFusion"
    }
    /// [`Runner`] calls this function to perform sleep.
    ///
    /// The default implementation is `std::thread::sleep`, which is universal to any async runtime
    /// but would block the current thread. If you are running in a tokio runtime, you should override
    /// this with `tokio::time::sleep`.
    async fn sleep(dur: Duration) {
        // Needs `use std::time::Duration;` and tokio's "time" feature; the Cargo.toml
        // in this diff enables only "macros" and "rt-multi-thread".
        tokio::time::sleep(dur).await;
    }
}


#[tokio::main]
pub async fn main() -> Result<()> {
    for test_category in TEST_CATEGORIES {
        let filename = PathBuf::from(format!(
            "{}/{}",
            TEST_DIRECTORY,
            test_category.test_filename()
        ));
        let ctx = SessionContext::new();
        test_category.register_test_tables(&ctx).await;

        let mut tester = sqllogictest::Runner::new(DataFusion { ctx, test_category });
        // TODO: use tester.run_parallel_async()
        tester.run_file_async(filename).await.unwrap();
Contributor

Suggested change
        tester.run_file_async(filename).await.unwrap();
        tester.run_file_async(filename).await?;

    }

    Ok(())
}

fn format_batches(batches: &[RecordBatch]) -> Result<String> {
    let mut bytes = vec![];
    {
        let builder = WriterBuilder::new().has_headers(false).with_delimiter(b',');
Contributor

Is the reason to write out CSV output so that we can reuse existing slt files?

Contributor Author

Nope, I actually strip the comma later in the function. I'll set the delimiter to a space here and remove the `replace` call below.

        let mut writer = builder.build(&mut bytes);
        for batch in batches {
            writer.write(batch).unwrap();
        }
    }

    // NOTE: this also turns commas that belong to string values into spaces.
    let formatted = String::from_utf8(bytes).unwrap().replace(',', " ");
    Ok(formatted)
}

async fn run_query(ctx: &SessionContext, sql: impl Into<String>) -> Result<String> {
    // NOTE: unwrap() panics on planning/execution errors instead of returning them to the runner.
    let df = ctx.sql(&sql.into()).await.unwrap();
    let results: Vec<RecordBatch> = df.collect().await.unwrap();
    let formatted_batches = format_batches(&results)?;
    Ok(formatted_batches)
}
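The review thread on `format_batches` proposes writing with a space delimiter instead of writing commas and replacing them afterwards, because `replace(',', " ")` also rewrites commas that belong to string values. A std-only sketch of the space-joined layout, working on already-rendered cells rather than on Arrow `RecordBatch`es (`format_rows` is a hypothetical helper, not part of the PR):

```rust
/// Join each row's already-rendered cells with single spaces, one row per line.
pub fn format_rows(rows: &[Vec<String>]) -> String {
    rows.iter()
        .map(|row| row.join(" "))
        .collect::<Vec<_>>()
        .join("\n")
}
```

For a row with cells `a,b` and `1`, `format_rows` keeps the comma (`a,b 1`), whereas joining with `,` and then calling `.replace(',', " ")` yields `a b 1` (ignoring CSV quoting), which hides the original value.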