Simplify Data Workflows by Combining Programming and Data Operations!
Beyond basic queries, SQL struggles with complex data manipulation, and APIs often return data in JSON format that requires additional parsing. YADL bridges the gap, offering built-in functionality for both: write less code, analyze more effectively. It is a programming language that lets you parse different file types (JSON, CSV, etc.) with a single `load` function and run data operations on the result.
YADL is a Turing-complete language and was created as a university group project. Thanks to everyone involved. This repository is a re-upload.
```
weather_data = load("./weather-data.json", "json") // open json file
bern = weather_data["bern"] // extract "bern" object

// example of a function (fat-arrow style)
has_freezing_days = (city) => {
    return check_any(city, (item) => item["temp"] < 0)
}

print3("Has Bern freezing days?: " + has_freezing_days(bern))
print3("Are all days in Bern freezing?: " + check_all(bern, (item) => item["temp"] < 0)) // use function in print-statement
print3("Is Bern the best city?: idk, im a computer")

// find continuous data with a while-loop and if/else
print3("Has Bern continuous data?:")
index = 1
continuous_data = true
while (index < len(bern) and continuous_data) {
    if (bern[index-1]["day"] + 1 != bern[index+0]["day"]) {
        continuous_data = false
    }

    index = index + 1
}
print3(continuous_data)
```
Let's examine the example above!

Assuming the file weather-data.json exists (you can also find it at fancy-tests/weather-data.json), this script uses functions, loops, and if-statements to analyze the data in the JSON. This is, of course, only a small demonstration.

Functions that are not explicitly declared in this example are in-built functions. For all in-built functions (with descriptions), click here.
Please keep in mind that this project was created in a short period of time and has not yet reached its full potential. The most important idea we had in mind is to load the data chunk-wise, so that not all of it needs to be held in memory.
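As a rough illustration of that chunk-wise idea, here is a minimal Scala sketch (a hypothetical helper, not YADL's actual loader) that reads a file lazily, so only one chunk of lines sits in memory at a time:

```scala
import scala.io.Source

// Process a file in fixed-size chunks instead of loading it whole.
// `getLines` returns a lazy iterator; `grouped` yields one chunk at a time.
def processInChunks[A](path: String, chunkSize: Int)(f: Seq[String] => A): Iterator[A] = {
  val lines = Source.fromFile(path).getLines() // lazy: nothing read yet
  lines.grouped(chunkSize).map(f)              // one chunk in memory at a time
}
```

Each chunk is handed to `f` and then becomes garbage-collectable, which is what keeps memory flat for large files.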
YADL comes with a lot of in-built functions - `map`, `filter`, and `reduce`, to name a few. All of them are described here.
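For readers who know Scala (the implementation language), these builtins behave like Scala's own higher-order list functions. The sketch below is illustrative only; YADL's exact signatures may differ:

```scala
// Scala sketches of what YADL's map/filter/reduce builtins do.
object BuiltinsSketch {
  // map: apply a function to every element of a list
  def mapList[A, B](xs: List[A], f: A => B): List[B] = xs.map(f)

  // filter: keep only the elements matching a predicate
  def filterList[A](xs: List[A], p: A => Boolean): List[A] = xs.filter(p)

  // reduce: combine all elements into a single value, left to right
  def reduceList[A](xs: List[A], f: (A, A) => A): A = xs.reduce(f)
}
```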
We also have anonymous functions!
This code from the example uses one:
```
has_freezing_days = (city) => {
    return check_any(city, (item) => item["temp"] < 0)
}

print3("Is it freezing?", has_freezing_days(bern))
```
Notice this line: `check_any(city, (item) => item["temp"] < 0)`

Right there, we use the anonymous function `(item) => item["temp"] < 0`, which returns true if the attribute "temp" of the passed object `item` is below 0. In this example, it is passed to `check_any`, one of our in-built functions from the standard library ("see here" for more information). `check_any` is a higher-order function, which takes the anonymous function as an argument.
But we can also immediately call anonymous functions instead:

```
print3("2 + 1:", ((a,b) => a+b)(2,1))
```
- In the example, you might have noticed the `index+0` in the if-statement. This is a workaround: the parser currently expects the result of an operation at that position. I will fix it when I have time! By the way, `index+0` is the easy fix - in our group we made fun of this bug by using an anonymous identity function and passing the wanted value: `((i) => i)(index)` 🤪
- After an if-statement, you have to insert a blank line - also an issue with the parser.
YADL is an interpreted language: a parser reads the yadl file and an interpreter executes what the parser has parsed. Scala was the language of our choice thanks to its functional-first style, and to make our lives easier we use a parser framework called FastParse. It is a combinator parser, meaning every parsing rule has its own parser and each parser can take another parser as input - they are higher-order functions. The interpreter keeps track of all variables and functions and evaluates operations (like `x = 2+2*3`; notice the precedence of multiplication over addition), loops, and conditionals. The data stream functions, as well as the interpreter and parser, are implemented in Scala.
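The combinator idea can be sketched in a few lines of plain Scala. This is not the actual YADL/FastParse code, just an illustration of the pattern: each grammar rule is its own parser, and `chainl` builds a new parser out of other parsers - which is also where the `2+2*3` precedence falls out:

```scala
object MiniParser {
  // A parser consumes a prefix of the input and returns a value plus the rest.
  type Parser[A] = String => Option[(A, String)]

  val number: Parser[Int] = s => {
    val digits = s.takeWhile(_.isDigit)
    if (digits.isEmpty) None else Some((digits.toInt, s.drop(digits.length)))
  }

  // Parse a single operator symbol, yielding the function it stands for.
  def sym(c: Char, f: (Int, Int) => Int): Parser[(Int, Int) => Int] = s =>
    if (s.nonEmpty && s.head == c) Some((f, s.tail)) else None

  // Combinator: parse `p (op p)*`, folding results left-to-right.
  def chainl(p: Parser[Int], op: Parser[(Int, Int) => Int]): Parser[Int] = s => {
    def loop(acc: Int, rem: String): Option[(Int, String)] =
      op(rem) match {
        case Some((f, r1)) => p(r1) match {
          case Some((x, r2)) => loop(f(acc, x), r2)
          case None          => None
        }
        case None => Some((acc, rem))
      }
    p(s).flatMap { case (x, rest) => loop(x, rest) }
  }

  // factor binds tighter than term, so "2+2*3" parses as 2+(2*3) = 8.
  val factor: Parser[Int] = chainl(number, sym('*', _ * _))
  val term: Parser[Int]   = chainl(factor, sym('+', _ + _))
}
```

FastParse provides the same building blocks (sequencing, repetition, mapping) in a far more efficient and ergonomic form; the sketch only shows why "parsers that take parsers" are higher-order functions.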
- Download the JAR from this GitHub repository.

Run this command:

```
java -jar <path-to-jar> <path-to-yadl>
```

`path-to-jar` is the path to the downloaded jar file and `path-to-yadl` the path to your YADL program. You can download and use the test file fancy-tests/weather-data.yadl.
- Scala 3.x
- A recent Java SDK (OpenJDK 22, for example)
Run the following commands in the project root.

Just building:

```
sbt compile
```

Running:

```
sbt run
```

Running with program arguments:

```
sbt "run args..."
```

The quotes are necessary here because otherwise the arguments would be interpreted as new sbt commands.
Install the Scala plugin from the JetBrains marketplace.

When you are in a project, go to the top right where you select your current task and choose 'Edit Configurations...' in the drop-down menu. In the configuration menu, select the `+` to add a new task and choose 'sbt Task'. Now you can give the task a meaningful name and pick a task to run (for example `run`, or `"run args..."` with arguments), among other settings. Once done, hit 'Apply' or 'OK' to finish the task setup.

Now you should be able to build/run/package/... the project, depending on what you chose as a task.
Similar to building in the terminal, execute the following for the Scala unit tests:

```
sbt test
```
These tests involve a bit more work to run. For the duration of these steps, I assume you are at the root of the project.

Install pytest.

Similar to building in the terminal, execute the `assembly` task added by the `project/plugin.sbt` build config:

```
sbt assembly
```

This will emit a jar file, which we use in the following steps.

The Python scripts rely on the `YADL_JAR` environment variable pointing to the YADL interpreter. To set the environment variable use:

For Linux and Mac:

```
export YADL_JAR=target/scala-3.4.1/yadl.jar
```

For Windows:

```
set YADL_JAR=target/scala-3.4.1/yadl.jar
```

Finally, run pytest:

```
pytest
```