Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import mechanism in DaphneDSL #214

Closed
daphne-eu opened this issue Mar 8, 2022 · 8 comments · Fixed by #410
Closed

Import mechanism in DaphneDSL #214

daphne-eu opened this issue Mar 8, 2022 · 8 comments · Fixed by #410
Assignees
Milestone

Comments

@daphne-eu
Copy link
Owner

In GitLab by @pdamme on Mar 8, 2022, 18:20

Like many other script/programming languages, DaphneDSL should have some kind of import/include mechanism enabling users to employ code in other files. This is necessary for both, user files and DaphneDSL standard lib files.

Open questions include how exactly the file to include shall be specified and how exactly it will be included.

@pdamme pdamme closed this as completed Jun 9, 2022
@pdamme pdamme reopened this Jun 9, 2022
@pdamme
Copy link
Collaborator

pdamme commented Jun 9, 2022

We need this feature to be able to reuse DaphneDSL code in other DaphneDSL scripts. For instance, we want to create a DAPHNE standard library of high-level data analysis and ML functions, which are in turn implemented as functions in DaphneDSL. Then, users should be able to simply import those library functions and use them in their own DaphneDSL scripts.

It could look like this:

import "kmeans.daphne";
in = readMatrix("someFile.csv");
print(kmeans(in, 5, 20));

The particulars of this feature are yet to be designed. Here is some inspiration:

  • Importing a DaphneDSL script file should have roughly the same effect as pasting its contents literally.
  • An exception to this could be that function and variable names could be prefixed with the file name, e.g., foo() in bar.daphne could become bar.foo() in the importing script, to avoid name clashes.
  • It would be okay to allow import statements only in the beginning of a DaphneDSL script.
  • Parser errors should always indicate the actual file where the problem was encountered. So if main.daphne imports bar.daphne and there is a syntax error in bar.daphne, then the error message should report bar.daphne.

@akroviakov
Copy link
Collaborator

Assigned to @akroviakov .

@akroviakov
Copy link
Collaborator

foo() in bar.daphne could become bar.foo() in the importing script, to avoid name clashes.
Identifier rule does not allow . in the name, could _ be used as a delimeter for concatenating names with the prefix or must . be used?

Importing a DaphneDSL script file should have roughly the same effect as pasting its contents literally.
The example with kmeans (test/api/cli/algorithms/kmeans.daphne) requires command line arguments, so I assume a function (i.e., def km(r:si64,c:si64,f:si64,i:si64)) was meant or am I wrong?

@pdamme
Copy link
Collaborator

pdamme commented Jun 20, 2022

Actually, I explicitly chose . because it is not normally allowed in identifiers ;) . The idea is to guarantee that the importing DaphneDSL script cannot itself define a variable/function whose name clashes with an imported variable/function. For instance, if we used _ (allowed in identifiers), importing bar.daphne could introduce bar_foo(), but nothing would prevent the importing script from defining a function bar_foo() itself which could cause ambiguities/errors, which could be hard to understand for the person writing the importing script, since the internals of imported scripts should not need to be (fully) known.

Thus, I would really use ., and syntactically, that's also inspired by other languages, such as Python. I could imagine the following bahavior in DaphneDSL:

  • When parsing a variable/function foo in an imported script bar.daphne, it should appear as bar.foo in the symbol table of the importing script.
  • INDENTIFIER should stay as it is.
  • References to variables in expressions should allow prefixes: | var=(( IDENTIFIER '.' )* IDENTIFIER) # identifierExpr. The fully qualified name can simply be looked up in a symbol table (no need to extract prefix and variable name separately).
  • Assignments to variables should also allow prefixes: (IDENTIFIER '.')* IDENTIFIER indexing? ( ',' (IDENTIFIER '.')* IDENTIFIER indexing? )* '=' expr ';' ;, but if the fully qualified name contains a ., the symbol table entry must already exist. This is to ensure that we can (a) update global variables in imported scripts (bar.foo = 123;), but cannot create any new variables in an imported script (since it would be useless and might hide bugs, such as assuming the wrong variable name).
  • Function definitions must not use prefixes (same rationale as for assignment: we don't want to allow defining functions in an imported script from within the importing script).

The k-means was only a syntax example. Was not meant to exactly match the kmeans script ;) .

@akroviakov
Copy link
Collaborator

I'm trying to write unit test cases that use information from UserConfig.json.
An illustrative test case for one file (something similar to ConfigTest.cpp):

const char* configFilePath = "test/api/cli/import/UserConfig.json";
DYNAMIC_SECTION(configFilePath) {
    compareDaphneToRef(
            "test/api/cli/import/import_success_4.txt",
            "test/api/cli/import/import_success_4.daphne", "--config", configFilePath
    );
}

It fails, so I added a check for user config existence:

ConfigParser::fileExists(args["--config"])

And it showed Parser error: could not open file '' for reading user config.
I then took a look at what LOG in Utils.h says and it displays the following:

stdout: 
stderr: Parser error: could not open file '' for reading user config

status: 256
daphne --config test/api/cli/import/UserConfig.json test/api/cli/import/import_success_4.daphne

The member args in DaphneDSLVisitor.cpp has size 0, so I'm a bit confused.
Even if I simply take a test from ConfigTest.cpp checking for success and only change paths, it still displays this error.

Those scripts that do not interact with UserConfig work fine regardless of whether --config was specified.
What would be the proper way to utilize UserConfig in unit test cases?

@pdamme
Copy link
Collaborator

pdamme commented Jul 4, 2022

Unfortunately, I cannot reproduce this problem on main. I tried modifying the existing script-level test case test/api/cli/vectorized/MultiThreadedOpsTest.cpp by passing an additional user config file via --config. When the specified file exists, everything works fine and the test passes. When the file doesn't exist, I get the following error message Error while reading user config: could not open file 'src/api/cli/UserConfig.json2' for reading user config. There are two significant differences to your observation:

  1. The error message containts the file name I passed.
  2. The error message starts with Error while reading user config:, while yours starts with Parser error:. These prefixes are prepended in src/api/cli/daphne.cpp and indicate at which stage of the DAPHNE program execution the exception/error happened. I'm surprised that it is a Parser error for you. This suggests that the system tries to read the config file during parsing. However, this should not be the case, since the user config is read very early even before the parsing. (And that seems to work even in your case, since you don't get the error there.)

When I modify the "config success" test case in test/api/cli/config/ConfigTest.cpp by specifying a non-existing file, I also get an error message starting with Error while reading user config:.

So I suspect that something is going wrong in the parser in your code. If you cannot find the error, please create a draft PR, so that I can try out your state of the code.

PS: Since you mention args in DaphneDSLVisitor.cpp: I'm not sure in how far this can be connected to this issue, since that variable is about the parameters of a DaphneDSL function call...

@akroviakov
Copy link
Collaborator

Please correct me if I'm wrong, but aren't scriptArgsFinal in daphne.cpp passed to DaphneDSLParser:

DaphneDSLParser parser(scriptArgsFinal);

where they are stored as args:

    DaphneDSLParser(std::unordered_map<std::string, std::string> args) : args(args) {
        //
    }

which are then passed to the visitor in DaphneDSLParser::parseStream():

DaphneDSLVisitor visitor(module, builder, args);

and so on the console call like

./build.sh; build/bin/daphne test/api/cli/import/import_success_4.daphne --config=test/api/cli/import/UserConfig.json

we can access --config inside DaphneDSLVisitor through args and read it?

Also, I found what was the issue with:

    compareDaphneToRef(
            "test/api/cli/import/import_success_4.txt",
            "test/api/cli/import/import_success_4.daphne", "--config", configFilePath
    );

It appears that in unit test cases "--config", configFilePath are treated as separate arguments instead of one single argument, so we'd want to use = as daphne's --help suggests:

--config test/api/cli/import/UserConfig.json
vs.
--config=test/api/cli/import/UserConfig.json

Do I misunderstand something about the current usage of "--config", configFilePath as arguments in test/api/cli/config/ConfigTest.cpp?

And a bit of a separate question, we need to read some paths in DaphneDSLVisitor, to get them we can:

  1. Pass DaphneUserConfig from daphne.cpp all the way to DaphneDSLVisitor as an argument.
  2. Rely on the already established args map, create DaphneUserConfig and read into it on imports.

The first option seems better than reading json on each import, but maybe you have some precautions against dragging DaphneUserConfig into the visitor?

@akroviakov
Copy link
Collaborator

Ok, I see my confusion with unit tests and config passing.
In the above message I mentioned one call, and although this call is allowed, it is wrong:

./build.sh; build/bin/daphne test/api/cli/import/import_success_4.daphne --config=test/api/cli/import/UserConfig.json

since it is not conform to daphne's usage stated in --help:

daphne [options] script [arguments]

so a correct call would be:

./build.sh; build/bin/daphne --config=test/api/cli/import/UserConfig.json test/api/cli/import/import_success_4.daphne 

And just if someone would need the difference explained:
The first call gets --config into scriptArgsFinal in daphne.cpp, but user_config will be default.
The second call correctly fills user_config and leaves scriptArgsFinal empty.

What I was trying to do is to retrieve --config info from what is basically a copy of scriptArgsFinal.
But since in unit testing (e.g., compareDaphneToStr in Utils.h) we pass args... before scriptFilePath, the scriptArgsFinal was empty and I was basically looking at an empty container.

So in short, I haven't noticed that I violated daphne's usage stated in --help and got confused.

And yes, the question above whether to use DaphneUserConfig or args map is no longer relevant. DaphneUserConfig will be used to pass config to DaphneDSLVisitor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants