Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove different behaviour when single input #1902

Merged
merged 1 commit into from
May 7, 2024

Conversation

olorin37
Copy link
Contributor

Addresses #1901. Proposition.

It could be so simple, but it seems that cli tests rely on specific behaviour for single file input. I'll think about it later, cause it requires grasping the way tests are made... What do you think? Is it good direction?

@vkleen
Copy link
Contributor

vkleen commented Apr 30, 2024

Personally, I like the approach. The extra indirection through an import should be entirely inconsequential. I don't see a good reason why tests couldn't be made to work with this either, but I haven't given it much thought.

@olorin37
Copy link
Contributor Author

Surely tests can be aligned, but I had to lest the issue for other duties, for a while.

@olorin37 olorin37 force-pushed the 1901-treat-all-inputs-the-same branch from 0d4b06f to 18ca2ed Compare May 1, 2024 22:28
@olorin37
Copy link
Contributor Author

olorin37 commented May 1, 2024

The main problem still remains. Snapshots for cli tests now contains import "FILE" instead actual content...

Some explicit occurences of new_from_file has been removed.

@olorin37 olorin37 force-pushed the 1901-treat-all-inputs-the-same branch from 18ca2ed to 59e6c28 Compare May 1, 2024 22:58
@yannham
Copy link
Member

yannham commented May 2, 2024

Thanks for tackling this, @olorin37. While it sounded like a quick and efficient solution initially, given the tests and the small annoyances coming from having this additional import indirection, I wonder if we shouldn't take a different path.

Here is the crux: when importing a file, the import resolver (the cache) parse the file according to its extension. So it selects a format based on the file name. While it's a bit fragile, it's currently still very useful and does the job. However, when loading a file without an import, the cache always parses it as Nickel.

I think it doesn't make a lot of sense: if we automatically use the file extension to drive how imported files are parsed, we should probably do the same for the main file as well. I believe it mostly amounts to get rid of all the xxx_multi methods in cache::Cache, or rather, to get rid of the non-multi variants and rename all xxx_multi to just xxx. As an example: get rid of parse and rename parse_multi to just parse. Then fix the callsites that will missing a new additional ImportFormat parameter by providing InputFormat::Nickel.

In a second step, maybe we can add a parse_auto as well, that will have the same signature as the current parse but will determine the format based on the file extension, and default to Nickel. This is exactly what Cache::resolve (used for import resolution) is doing, so this just amounts to extract a small part of it and putting in this new parse_auto function.

The last step is to use this new parse_auto inside cache::Prepare instead of parse, here precisely:

if let CacheOp::Done(_) = self.parse(file_id)? {

Overall this change doesn't require any new logic, but just re-organizing the cache::Cache methods. @olorin37, do you feel like tackling this? Of course I understand that you might not have the time or the additional motivation 🙂

@olorin37
Copy link
Contributor Author

olorin37 commented May 2, 2024

Surly I'll try to grasp it, but I am not sure if I'll manage, cause of time:) If not I'll inform you... but first I'll try to do it.

@yannham
Copy link
Member

yannham commented May 2, 2024

Great! If you think that would help at some point, we can even schedule a little pair programming session, as it seems we're in the same timezone. Let me know (e.g. on Discord)!

@olorin37
Copy link
Contributor Author

olorin37 commented May 2, 2024

After the re-organization I got exactly the same problems:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Snapshot Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Snapshot file: cli/tests/snapshot/snapshots/snapshot__pprint-ast_stdout_simple_record.ncl.snap
Snapshot: pprint-ast_stdout_simple_record.ncl
Source: cli/tests/snapshot/main.rs:59
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Expression: out
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
-old snapshot
+new results
────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    0       │-{
    1       │-  aaaaa | Number | default = 1,
    2       │-  bbbbb
    3       │-    : String
    4       │-    | force
    5       │-    = "some long string that goes past the 80 character line limit for pretty printing",
    6       │-  ccccc : { x : Number, y : Number } = { x = 999.8979, y = 500, },
    7       │-  ddddd
    8       │-    | Array std.string.NonEmpty
    9       │-    = [ "a", "list", "of", "non", "empty", "strings" ],
   10       │-}
          0 │+import "[INPUTS_PATH]/pretty/simple_record.ncl"
────────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

It seems, first layer of imports should be somehow resolved.

@yannham
Copy link
Member

yannham commented May 3, 2024

@olorin37 I think you're missing one step, but you're not far. Sorry if that wasn't clear, but my suggestion of going the "get rid of xxx_multi" road is a different direction that your initial take. The idea would thus be to restore the new_from_file specific case for single files, that you originally got rid of. Finally, we need to determine the format from the path automatically in cache::Cache::prepare. More concretely:

  1. Revert 59e6c28

  2. Add this helper method to cache::InputFormat, next to from_path:

    /// Returns an [InputFormat] based on the file extension of a source path.
    pub fn from_source_path(source_path: &SourcePath) -> Option<InputFormat> {
      if let SourcePath::Path(p) = source_path {
        Self::from_path(p)
      }
      else {
        None
      }
    }

    (I didn't check this code, so it might need a few tweaks to make the compiler happy, but to get the idea)

  3. Then, we can finally use that from within cache::Cache::prepare: replace the following line

    if let CacheOp::Done(_) = self.parse(file_id)? {

    by something like:

    let input_format = self.file_paths.get(file_id).and_then(|source_path InputFormat::from_source_path(source_path)).unwrap_or_default();
    if let CacheOp::Done(_) = self.parse(file_id, input_format)? { 

This should let the cache select the right format automatically now, even for a single program.

I think in the long term, we might want to also do your initial simplification - the two aren't mutually exclusive - but we then need to fix the pretty printer, which is making your tests fail. Some tests are checking that nickel pprint-ast - which parse a file and pretty-print the parsed ast - behave correctly, but now it only sees the root import instead of the actual program. In the meantime, this approach - which is also simplifying the code and make the input format parameter passing more explicit - is probably the fastest way to get there.

@olorin37
Copy link
Contributor Author

olorin37 commented May 3, 2024

Right, I missed that you proposal is an alternative to mine. And, yes I actually do what you proposed without understanding how it would help those snapshot tests (with pretty print). So, I'll follow your idea and tips.

"Less is more" removing this one unnecessary case from my initial proposition should be done some day I think, unless it does not complicates things even more 🤔

@olorin37 olorin37 force-pushed the 1901-treat-all-inputs-the-same branch from 1018c7b to 0726888 Compare May 3, 2024 21:39
@olorin37
Copy link
Contributor Author

olorin37 commented May 3, 2024

@yannham Thanks for guidance, now it is working. Please, make an review, anything to improve (with this "fast-path" approach)?

@olorin37 olorin37 force-pushed the 1901-treat-all-inputs-the-same branch 3 times, most recently from 80f7916 to 4298d8f Compare May 5, 2024 09:30
Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parse_auto part isn't mandatory, and we can do that in a subsequent PR. However I think we should fix the import issue in ncl_bench_group before proceeding with this PR.

@@ -194,7 +194,7 @@ macro_rules! ncl_bench_group {
.unwrap()
.transformed_term;
if bench.eval_mode == $crate::bench::EvalMode::TypeCheck {
cache.parse(id).unwrap();
cache.parse(id, InputFormat::Nickel).unwrap();
Copy link
Member

@yannham yannham May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should either:

  1. Use a fully qualified name for InputFormat, that is nickel_lang_core::cache::InputFormat
  2. Import it a few line above, inside the macro, and keep it as InputFormat

But in any case, the consumers/callers of ncl_bench_group should (Edit: + NOT) have to import it themselves (as you had to do in various bench files). This leads to code duplication, but more importantly, callers of nlc_bench_group have to magically guess that they need to do this import (or wait for a compilation error), while the macro should - and can - be self-contained.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: I meant should NOT have to import it themselves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, right now I see. I missed that the code I edited was inside a macro, sure it's a mistake. So, I'll import it in macro, and get rid of those imports from benches:)

@@ -1,6 +1,7 @@
use std::rc::Rc;

use criterion::{criterion_main, Criterion};
use nickel_lang_core::cache::InputFormat;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(See comment in ncl_bench_group, but example of import that we shouldn't have to do)

@@ -312,7 +312,7 @@ impl<EC: EvalCache> Program<EC> {
pub fn parse(&mut self) -> Result<RichTerm, Error> {
self.vm
.import_resolver_mut()
.parse(self.main_id)
.parse(self.main_id, InputFormat::Nickel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if those shouldn't guess the format as well, instead of InputFormat::Nickel. That's why it could be handy to have a parse_auto (the name isn't great, so feel free to use another if you find some!). However, this one can also be done in a subsequent PR, as the current code at least fixes the original issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_infer_format? less concise, but more descriptive ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olorin37 right, it's a reasonable suggestion and better than parse_auto, so go for it 🙂

self.vm.import_resolver_mut().parse(self.main_id)?;
self.vm
.import_resolver_mut()
.parse(self.main_id, InputFormat::Nickel)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, I think any parse from within Program methods should probably try to guess the format, to have the same behavior as prepare.

@olorin37 olorin37 force-pushed the 1901-treat-all-inputs-the-same branch from 4298d8f to 01e63c2 Compare May 6, 2024 19:43
Guess input format based on extension similarly like in multiple input case.
@olorin37 olorin37 force-pushed the 1901-treat-all-inputs-the-same branch from 01e63c2 to c6c6f18 Compare May 6, 2024 20:21
Copy link
Member

@yannham yannham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you! We've settled (in chat conversations) on implementing this new parse_guess_format or whatever in a follow-up PR, and not in this one, to get the ball rolling.

@yannham yannham enabled auto-merge May 7, 2024 08:10
@yannham yannham added this pull request to the merge queue May 7, 2024
Merged via the queue into tweag:master with commit aca39ef May 7, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants