Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add include_file statement #74

Merged
merged 23 commits into from
Aug 13, 2021
Merged

Add include_file statement #74

merged 23 commits into from
Aug 13, 2021

Conversation

dlurton
Copy link
Member

@dlurton dlurton commented May 21, 2021

Implements #57

Why this PR was created: PartiQL customers need the ability to make their own permuted version of the partiql_ast type domain. Since until now PIG only supported single-file type universes, the only way to do that is to copy and paste the partiql_ast into their own project and permute from there, but this is not a sustainable solution.

This is the implementation of the first phase of this task which adds the ability to "import" type universes into another using a new statement: (include_file <path-to-file>). The second phase of this feature will be to avoid duplication of the generated classes for the imported type domains. (Since the partiql_ast code will have been previously generated and will be part of the main PartiQL library.)

  • Imports are handled by the parser. Statements from the imported type universe are included in the current type universe being parsed as if they were "pasted" at the location of the import.
  • Recursive imports are supported.
  • Cycles are ignored--that is, if an include_file statement tries to include a file that was previously parsed or is currently being parsed, this does not cause an error--the include_file statement is simply ignored. Similar to #pragma once from modern C/C++ compilers.
  • Import cycles are detected using by using a set containing the "canonical" path to all either previously parsed or currently being parsed, this should prevent confusion when using relative paths.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Had a few questions/nits and may need some import error tests.

README.md Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/cmdline/Command.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/InputSource.kt Outdated Show resolved Hide resolved
pig/test-domains/main.ion Show resolved Hide resolved
Copy link

@almann almann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think you should really be explicit about the notion of what import is importing. It reads more like include with guards from a C/C++ context than what I would normally think of as import from a Java/Python perspective, where you are importing something logical.

I am okay with importing a "module" which as a collection of "type definitions", and having that map to some structure in file names like it does in its Python/Java counterparts. I am also okay with importing a single "type definition" by name which poses other problems (e.g. where is the definition). I am also okay with just saying this clubs a file into the current one, but I might spell that as include or the like to keep our options open for a logical import later.

What is the relationship between types imported in one universe versus another? I think the answer is no relationship, right?

README.md Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/cmdline/Command.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/InputSource.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/InputSource.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/main.kt Outdated Show resolved Hide resolved
pig/test/org/partiql/pig/domain/Util.kt Outdated Show resolved Hide resolved
Comment on lines 39 to 48
/** A minimal faux file system backed by a Map<String, String>. Used only for testing. */
class StringSource(val sources: Map<String, String>) : InputSource {
override fun openStream(sourceName: String): InputStream {
val text: String = sources[sourceName] ?: throw FileNotFoundException("$sourceName does not exist")

return ByteArrayInputStream(text.toByteArray(Charsets.UTF_8))
}

override fun getCanonicalName(sourceName: String): String = sourceName
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example, was the one that intuited to me that you had the beginnings of a Resolver style API, but a sub-optimal abstraction. The file names are not interesting, the type names are and I think the import should be based on that, otherwise you have something more like include that has this extra layer of "file" that the user shouldn't really care about.

Abstracting in this way does offer a challenge for PIG. What file gets parsed when I say (import Foo), but that is something worth considering. Otherwise I would add the concept of module which is a collection of type universes and (import foo) maps to this module which has a logical file name format if on a file system. This conceptual model is similar to Python modules and Java classes (except for the notion of one class per class file).

@dlurton
Copy link
Member Author

dlurton commented May 24, 2021

Overall, I think you should really be explicit about the notion of what import is importing. It reads more like include with guards from a C/C++ context than what I would normally think of as import from a Java/Python perspective, where you are importing something logical.

I specifically chose to avoid the term include here because that it is already used as part of the domain permutation feature. The next logical one is import, IMHO, however I would probably be okay with include_file so as to avoid any ambiguity with the include of permute_domain.

I am okay with importing a "module" which as a collection of "type definitions", and having that map to some structure in file names like it does in its Python/Java counterparts. I am also okay with importing a single "type definition" by name which poses other problems (e.g. where is the definition). I am also okay with just saying this clubs a file into the current one, but I might spell that as include or the like to keep our options open for a logical import later.

I don't believe PIG yet needs modules or the ability to import a single named named type domain from another file. For that, I would like to take the "wait until it's needed" approach. The main goal here is to give the user the ability to permute a domain contained in another file, and I don't think something as complex as modules are needed for that.

What is the relationship between types imported in one universe versus another? I think the answer is no relationship, right?

No direct relationship. However, any two domains (imported or otherwise) with a declared transform (transform <from-type-domain> <to-type-domain>) will get a generated VisitorTransform with abstract functions for the types that are different. That VisitorTransform will contain references to both type domains.

@almann
Copy link

almann commented May 24, 2021

I am okay with importing a "module" which as a collection of "type definitions", and having that map to some structure in file names like it does in its Python/Java counterparts. I am also okay with importing a single "type definition" by name which poses other problems (e.g. where is the definition). I am also okay with just saying this clubs a file into the current one, but I might spell that as include or the like to keep our options open for a logical import later.

I don't believe PIG yet needs modules or the ability to import a single named named type domain from another file. For that, I would like to take the "wait until it's needed" approach. The main goal here is to give the user the ability to permute a domain contained in another file, and I don't think something as complex as modules are needed for that.

I am fine with reserving the import term for the logical one, because I do think that is probably the more interesting thing to do for PIG in the future. include_file or the like is much clearer to me about what is going on, literally macro expand the thing and parse it into the type universe in question, it eliminates the notion of the relationship of the definition with other type universes (which is an oxymoron anyhow), and just says copy it into our translation unit (in which a single translation unit defines a type universe).

Copy link
Contributor

@abhikuhikar abhikuhikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My major concerns are the following:

  1. I think the way imports are resolved is not correct especially with the cyclic imports. The current implementation in my opinion doesn't really handle cyclic dependency imports.

  2. How are you handling the naming clashes (if any) from the multiple imports. (I may have missed this in the review in case this is handled).

README.md Outdated
stmt ::= <definition> | <transform>
stmt ::= <definition> | <transform> | <include>

import ::= `(import <path-to-include)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<path-to-import>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path-to-file actually.

pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some nits and a few minor suggestions. Also, if we're sticking with the term include rather than import, should we also change the usages in the PR title, description, branches, and/or issue?

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/test-domains/main.ion Outdated Show resolved Hide resolved
pig/test/org/partiql/pig/util/ParseHelpers.kt Outdated Show resolved Hide resolved
pig/test/org/partiql/pig/util/ParseHelpers.kt Outdated Show resolved Hide resolved
@dlurton dlurton changed the title Add import statement Add import_file statement Jun 15, 2021
@dlurton dlurton changed the title Add import_file statement Add include_file statement Jun 15, 2021
@dlurton
Copy link
Member Author

dlurton commented Jun 15, 2021

I think we can s/import/include/g most every place but I don't think it's necessary to do so in the branch names. Those are temporary.

@dlurton dlurton mentioned this pull request Jun 15, 2021
@dlurton dlurton requested a review from alancai98 June 15, 2021 23:54
Copy link

@almann almann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we talked about a bunch of this offline so some of the comments below are a little out of date, but here are my top-level suggestions (and I am also okay if you cut issues to address them and follow up unless you disagree).

InputStreamSource really should encapsulate the resolution of domain files. By that I mean that it probably should model the API that resolves some name relative to some other name. Right now that logic is baked into the parser.

Overall, I am okay with relative import resolution, but I think we need absolute paths too, and I think the concept of root should be made explicit meaning that I should not be able to include a file arbitrarily in the path as that will not be portable (I don't think we ever want (include_file "/usr/share/foo/bar.ion") referring to the root file system for example). This will be super important for composing different sets of universe files together (e.g. I want to include partiql_ast.ion). Right now I cannot address partiql_ast.ion without some relative path that implies I need to copy it into my root to have a chance of referring to it in a sane way. It also makes referring to that file at different levels of nesting require a different import, which we should have the flexibility to avoid.

This root concept could further support multiple roots (which is similar to classpaths and -I in GCC), this could make it easier to support distributing Pig domains as resource files in JARs and composing them in a more flexible way (again, not suggesting you do that now, just suggesting that the concept will allow us to do so without convoluted extraction and layout-fu in build logic.

Furthermore, I think paths are logical, can never address outside of the root (or roots), and always use forward slash. This simplifies the syntax and avoids portability concerns with file paths in different operating systems. Unlike my comments below, I do think you've convinced me that relative paths make sense, but I think composing different roots will make logical absolute paths make more sense (e.g. (include_file "/partiql_ast.ion") referring to a top-level file addressable from one of the roots).

README.md Outdated
Comment on lines 421 to 422
Paths are always relative to the directory containing the current file. The working directory at the time `pig`
is executed is irrelevant.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting choice and seems needlessly complex--shouldn't it be relative to some import path (which could contain the directory of the file in question)? This could lead to some really confusing include paths with nested folder structures...

Comment on lines 22 to 29
/**
* A simple abstraction for opening [InputStream], allows for type domains to be loaded from a file system or an
* in-memory data structure. The latter is useful for unit testing.
*/
internal interface InputStreamSource {
/** Opens an input stream for the given filename name. */
fun openInputStream(fileName: String): InputStream
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the path/file name implementation defined? E.g. \ works on Windows, etc.?

Also, if this is relative to a given file, doesn't that imply we need a parameter here to model the "working directory?"

Comment on lines 32 to 34
internal val FILE_SYSTEM_INPUT_STREAM_SOURCE = object : InputStreamSource {
override fun openInputStream(fileName: String) = FileInputStream(fileName)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation defines the file name in terms of current working directory of the JVM, does it not? (versus the file being operated on as your docs would imply)

Comment on lines 121 to 125
requireArityForTag(sexp, 1)
val relativePath = sexp.tail.single().asString().textValue

val workingDirectory = File(this.qualifedSourceStack.peek()).parentFile
val workingDirectory = File(this.inputFilePathStack.peek()).parentFile
val qualifiedSourcePath = File(workingDirectory, relativePath).canonicalPath
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this seems like overkill to me--and interestingly, you are relying on the parser to resolve paths, which seems like it should be the responsibility of the input stream resolver.

README.md Show resolved Hide resolved
pig/src/org/partiql/pig/domain/include/IncludeResolver.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/include/IncludeResolver.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt Outdated Show resolved Hide resolved
pig/src/org/partiql/pig/generator/custom/generator.kt Outdated Show resolved Hide resolved
assertEquals(tc.expectedConsdieredPaths, ex.consideredFilePaths)
}

// TODO: include tests for InvalidIncludePathException.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still a TODO for this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I missed that. only one test was needed.

Comment on lines +434 to +435
Note that paths starting with `/` do not actually refer to the root of any file system, but instead are treated as
relative to the include directories.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix / here is basically a sigil modifying the include.

There are two slightly distinct types of inclusion:

  • unsigiled includes
    • include the includee relative to the 1) includer or 2) "main" or 3) any -I path
    • these serve to allow to break a model into more manageable bits
  • sigiled includes:
    • include the includee relative to any -I path
    • these feel more like referencing an external model

I see from the previous discussions on this issue that a formal idea of an import has been deferred, but I wonder if at least we should consider a sigil that is not as overloaded as /.

Copy link
Contributor

@abhikuhikar abhikuhikar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, typos and comments. One major question/concern - How are we dealing with the duplicate domain names across multiple includee files?

README.md Show resolved Hide resolved

- The directory containing the includer (if the path to the includee does not start with `/`)
- The directory containing the "main" type universe that was passed to `pig` on the command-line.
- Any directories specified with the `--include` or `-I` arguments, in the order they were specified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to add a point here explicitly specifying the behavior of / i.e. all other directories are searched except the parent directory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was covered on line 428:

  • ... (if the path to the includee does not start with /)

What can I do to make this clearer?

pig/src/org/partiql/pig/domain/parser/SourceLocation.kt Outdated Show resolved Hide resolved
pig/test-domains/main.ion Outdated Show resolved Hide resolved
pig/test/org/partiql/pig/cmdline/CommandLineParserTests.kt Outdated Show resolved Hide resolved
Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bringing back a comment from Abhilash's review and an offline team discussion. Rest of the changes since the last revision LGTM

@Test
fun `include happy case`() {
val universe = parseWithTestRoots("test-domains/main.ion")
val allDomains = universe.statements.filterIsInstance<TypeDomain>()
Copy link
Member

@alancai98 alancai98 Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was brought up offline in a team discussion and in Abhilash's review. What happens in the case where an included domain has the same name as a previously defined domain?

For example, if universe_b.ion had defined domain_a again and not domain_b?

// in test-domains/root_b/dir_z/universe_b.ion
// defining `domain_a` (which has already been defined in `universe_a.ion`) rather than `domain_b`
(define domain_a (domain (product foo)))

Would this be an error or overridden? If it's an error, would this be thrown by the IncludeResolver? I think a test and/or a comment should be added detailing this behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently when I do the change (i.e. make domain_b -> domain_a in universe_b.ion), no error is given for a duplicated domain name.

Copy link
Member Author

@dlurton dlurton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently when I do the change (i.e. make domain_b -> domain_a in universe_b.ion), no error is given for a duplicated domain name.

FYI that's because the component detects that error (TypeDomainSemanticChecker) isn't exercised by the tests that load those domains.

The component is tested under that scenario here. Since TypeDomainSemanticChecker does not see any include_file statement and it appears to TypeDomainSemanticChecker as if all of the domains were all part of the same file, there was no need to modify it as part of this feature.

However I do agree there's a small possibility that some refactoring might occur in the future which might break that ability so I'll make sure to add a test to ensure this semantic check works across files.

pig/test/org/partiql/pig/cmdline/CommandLineParserTests.kt Outdated Show resolved Hide resolved
assertEquals(tc.expectedConsdieredPaths, ex.consideredFilePaths)
}

// TODO: include tests for InvalidIncludePathException.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I missed that. only one test was needed.

@dlurton dlurton merged commit e9f6dbd into imports Aug 13, 2021
@dlurton dlurton deleted the imports-phase-1 branch August 31, 2021 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants