-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix line ending bug #90
base: main
Are you sure you want to change the base?
Conversation
@jonasfj
|
I'm down for taking a look at this. I saw you had nice comments explaining some of the methods. That's awesome. Feel free to give some review hints in the PR description :) Also if any of these changes could be done independently, please do consider submitting smaller independent PRs. It's much easier to review, which often leads to faster turnaround time. |
2555b32
to
c7aec85
Compare
I've tried to split the 22 commits over 3 PRs based on the changes I made. |
lib/src/utils.dart
Outdated
/// [YamlList] or a top-level [YamlScalar] of the [yaml] string provided. | ||
/// | ||
/// [currentEndOffset] represents the end offset of [YamlScalar] or [YamlList] | ||
/// or [YamlMap] being replaced, that is, `end + 1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably better illustrated with an example.
/// ```dart
/// final yaml = 'my_key: "replaced-value"\n';
/// // ^--- currentEndOffset points to the newline
/// fimal currentEndOffset = 24;
/// ```
ofcourse, I might have counted wrong here 🤣
Also does this mean that currentEndOffset
can be greater than the length of the yaml
document? (currentEndOffset == yaml.length
, so yaml[currentEndOffset]
might throw)
Example: k: v
will have currentEndOffset: 4
(if I'm replacing v
, right).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, if I'm asking dumb questions here, but it might be worth making some examples.
And I think we should make sure we're doing it right, if we get he wrong invariants we'll have a lot of bugs to fix :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also does this mean that
currentEndOffset
can be greater than the length of theyaml
document? (currentEndOffset == yaml.length
, soyaml[currentEndOffset]
might throw)
currentEndOffset
shouldn't be greater than yaml.length
when passed in as an argument. May be I should throw if it is?
Example:
k: v
will havecurrentEndOffset: 4
(if I'm replacingv
, right).
I think 3
since it's always zero-based.
Sorry, if I'm asking dumb questions here, but it might be worth making some examples.
And I think we should make sure we're doing it right, if we get he wrong invariants we'll have a lot of bugs to fix :D
Sure, will add examples.
lib/src/utils.dart
Outdated
/// Normalizes an encoded [YamlNode] encoded as a string by pruning any | ||
/// dangling line-breaks. | ||
/// | ||
/// This function checks the last `YamlNode` of the [update] that is a | ||
/// `YamlScalar` and removes any unwanted line-break within the | ||
/// [updateAsString]. | ||
/// | ||
/// This is achieved by obtaining the chunk of the [yaml] that is after the | ||
/// current node being replaced using its [nodeToReplaceEndOffset]. If: | ||
/// 1. The chunk has any trailing line-break then the it is left untouched. | ||
/// 2. The node being replaced with [update] is not the last node, then it | ||
/// is left untouched. | ||
/// 3. The terminal node in [update] is a `YamlScalar`, that is, | ||
/// the last [YamlNode] within the [update] that is not a collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding some example.
Also why is it that we don't do this normalization when we create the updateAsString
? Wouldn't that be more correct -- I don't know -- I probably don't understand all the context here, that's why I'm asking :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I’ll add an example.
I do that because yamlEncodeBlock
has no context of which method called it or the current structure of the yaml document. It could be:
updateInList
or_appendToBlockList
or_insertInBlockList
for lists_replaceInBlockMap
or_addToBlockMap
for block mapsYaml.update
for top levelYamlNode
.
Thus, the callers should make a call to the function to prune the trailing line break because they have an idea of the yaml's structure.
Additionally, I found it quite advantageous on our part to try and include line breaks for an existing YamlNode
being replaced to edits. This reduces bugs and removes the need to try and guess where the next line break is.
It’s better to include it and check later if it was there just before suggesting an edit using a definite endOffset
Feel free to mark comments as "Resolved" once you've:
I think it would be useful to add some examples in the code. Especially for |
> Update function doc and inline comments > Make function return named record > Prefer named parameters > Refactor existing code referencing function
I've added commits I had moved to the other PRs. Without the changes, the The existing code was only scanning for the first instance of # Existing code works great here for edits
- valueToRemove # Comment with hyphen (-)
- nextValue
# Existing code fails horribly here for edits
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
- nextValue
I realized we can do it
|
/// Returns the `endOffset` of the last comment extracted that is `end + 1` | ||
/// and a `List<String> comments`. It is recommended (but not necessary) that | ||
/// the caller checks the `endOffset` is still within the bounds of the [yaml]. | ||
({int endOffset, List<String> comments}) skipAndExtractCommentsInBlock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give me a few examples of how this works.
Or maybe a few unit tests, even if we don't keep them, I'd like to understand a bit more what the intention here is.
It seems like it takes a string slice, and returns:
- Comments from within the slice (not sure why)
- Offset where the next non-comment thing in the slice begins, or ends? (or did I misunderstand that?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it takes a string slice, and returns:
- Comments from within the slice (not sure why)
- Offset where the next non-comment thing in the slice begins, or ends? (or did I misunderstand that?)
Yes. It returns where the non-comment starts but with a nuanced approach based on the bool
argument you pass into the greedy
parameter.
Could you give me a few examples of how this works.
Yeah, sure. See the examples below.
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
- nextValue
greedy
astrue
returns theoffset
of the-
indicating the start of the next value in the list.greedy
asfalse
returns theoffset
of the last\n
(line-break) + 1 which is just after the last closing bracket)
Or maybe a few unit tests, even if we don't keep them, I'd like to understand a bit more what the intention here is.
A nice example even without the unit tests would be an element in a list or map with multi-line comments being removed. For example:
Performing YamlEditor(yaml).remove([0]);
on the string below:
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
- nextValue
# Expected output without this PR
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
- nextValue
# Expected output with PR
- nextValue
The offending code can be found here which is similar to _removeFromBlockMap
:
yaml_edit/lib/src/list_mutations.dart
Lines 343 to 374 in 35f4248
if (start > 0) { | |
final lastHyphen = yaml.lastIndexOf('-', start - 1); | |
final lastNewLine = yaml.lastIndexOf('\n', start - 1); | |
if (lastHyphen > lastNewLine) { | |
start = lastHyphen + 2; | |
/// If there is a `-` before the node, we need to check if we have | |
/// to update the indentation of the next node. | |
if (index < list.length - 1) { | |
/// Since [end] is currently set to the next new line after the current | |
/// node, check if we see a possible comment first, or a hyphen first. | |
/// Note that no actual content can appear here. | |
/// | |
/// We check this way because the start of a span in a block list is | |
/// the start of its value, and checking from the back leaves us | |
/// easily confused if there are comments that have dashes in them. | |
final nextHash = yaml.indexOf('#', end); | |
final nextHyphen = yaml.indexOf('-', end); | |
final nextNewLine = yaml.indexOf('\n', end); | |
/// If [end] is on the same line as the hyphen of the next node | |
if ((nextHash == -1 || nextHyphen < nextHash) && | |
nextHyphen < nextNewLine) { | |
end = nextHyphen; | |
} | |
} | |
} else if (lastNewLine > lastHyphen) { | |
start = lastNewLine + 1; | |
} | |
} | |
return SourceEdit(start, end - start, ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: These thoughts are not fully conceived, I'm still trying to understand.
So if the input for YamlEditor(yaml).remove([0])
is:
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
# nextValue is awesome:
- nextValue
Then with this PR we'd get:
- nextValue
Right?
I'm just highlighting this example, because detecting that a comment is spanning multiple lines is difficult.
It's kind of obvious to us humans that the comment # nextValue is awesome:
is associated with - nextValue
and that all the comments with equal indentation are related to - valueToRemove
.
But if we tried to make a heuristic to detect this, such heuristic would fail if comments aren't perfectly aligned 🙈 I can see no end of corner cases 🤣
So maybe we have to ask, when faced with YamlEditor(yaml).remove([0])
on:
# A
# B
- valueToRemove # C
# D
# E
# F
# G
- nextValue # H
# I
What comments do we wish to remove / preserve?
At the top of my head, I'm thinking:
H
andI
should definition be preserved.- Removing
C
seems reasonable. - Do we want to treat
A
andB
the same way?- Would we want to detect that
A
andB
are different based on indentation?
- Would we want to detect that
- Do we want to treat
D
,E
,F
andG
the same way?- Would we want to detect that
D
andE
are not the same based on indentation? - Would we want to detect that
E
andF
are not the same based on empty line? - Would we want to detect that
F
andG
are not the same based on indentation?
- Would we want to detect that
I'm suspecting that maybe it's simplest to say that:
A
andB
are treated the same way: we preserve them.D
,E
,F
andG
are treated the same way: we preserve them.
It could be that it's best to say retain comments, unless they are inside the value we're removing (or on the same line). We risk leaving some half gabled comments, but which is worse:
- Removing too many comments?
- Retaining half a comment related to a value that's been removed?
If we make heuristics for all this to be more accurate, don't we risk things becoming extremely complicated? 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right?
Yes.
Funnily enough, I totally understand your point of view 😄. That’s how my head was spinning before I wrote skipAndExtractComments
.
However, I reduced this PR’s scope to actually do what YamlEditor
currently WANTS to do while ensuring we can have folded
/literal
strings. This is still a PoC.
From the PR, you can see I didn’t change how YamlEditor
works just nudged it to do what existing code wants to do.
Personally, I don’t see it as complexity but rather as a challenge to live up to the package’s description the RIGHT WAY since if we provide a way to preserve comments, wouldn’t it be great to at least provide a subjective way to remove them?
If you are open to it, I can come up with an issue by the end of the week with my view and where we can both float ideas on how to tackle this and once we settle on something I will submit a PR. If not, then please indicate what we can keep from this PR and what not to keep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not, then please indicate what we can keep from this PR and what not to keep.
I'll have to admit that I'm not entirely sure what this PR does 🙈
Also that we have things like skipAndExtractCommentsInBlock
but we only use endOffset
from it, scares me a bit. It might actually makes sense, if I understood what the future plans for skipAndExtractCommentsInBlock
was 🤣
If you are open to it, I can come up with an issue by the end of the week with my view and where we can both float ideas on how to tackle this and once we settle on something I will submit a PR.
I think that's a good idea!
Let's outline:
- What problems we're trying to solve.
- What the limitations to our approach will be.
- What will the downsides be? (if we're doing heuristics, there will be scenarios where it works less than perfect, what do those scenarios look like).
once we settle on something I will submit a PR.
Small PRs are MUCH easier to land!
If we can't do it all in one PR, maybe we should create an intermediate branch, try to land small PRs there and then eventually merge the branch into main
.
There isn't a whole lot of development on main
, so it's not like there'll be a lot of merge conflicts 🤣 (benefits of working in a small repository).
Of course, there is a non-trival risk that we get stuck somewhere along the way don't land anything.
But if we try this route we could perhaps consider writing a test cases upfront.
Maybe we could extend the setup in test/testdata to support more kinds of tests, and maybe also support tests that are known to fail now, but that we want to fix.
Sometimes it's a lot easier to discuss things through examples.
Personally, I don’t see it as complexity but rather as a challenge to live up to the package’s description the RIGHT WAY since if we provide a way to preserve comments, wouldn’t it be great to at least provide a subjective way to remove them?
Yes and no. Also scratching my head and wondering how ambitious a package description we wrote 🙈
Some background, this package started as part of GSoC, the original aim was to enable us to create a dart pub add <package>
command.
Since then it's also be used for dart pub upgrade --major-versions
, dart pub upgrade --tighten
, and a few other things left and right.
We needed some way to preserve comments, because while JSON is a subset of YAML, I don't think anyone would like to use dart pub add
if it parsed your pubspec.yaml
added the package and then saved pubspec.yaml
formatted as JSON. As people frequently have comments in their pubspec.yaml
.
But we also don't really care to be perfectionist. Ideally, we just want to not cause comments and whitespace from parts of the YAML file unrelated to the modification we're making from being changed. But if a comment next to the thing we're changing becomes garbled or whitespace changes a bit, then perhaps that's okay'ish.
In theory I wouldn't object to this package having support for reading, stripping, modifying and removing comments. In theory I also wouldn't mind this package using heuristics to guess whether a comment belongs to a value being removed or not.
In practice this package is used to make small changes to YAML configuration files. If we don't limit the scope of the package, then it'll have an endless stream of bugs and feature requests.
It might be better declare that some behavior or feature is out of scope, than it is to have a buggy implementation. It's not like we have lot resources to invest in the maintenance of this package.
IMO, we should aim for (in order of priority):
- (1) Reliably produce:
- Valid YAML
- A semantically correct mutation
(_performEdit ensures this, by parsing and comparing the result, throwing if there are errors)
- (2) Avoid internal errors
- Bugs
- Inconsistencies caught by
_performEdit
- (3) Stable API and predictable behavior.
- (4) Preserve comments and whitespace in parts of the YAML document unrelated to the mutation.
- (5) Reasonable heuristics for preservation or removal of comments and whitespace in part of the YAML document affected by the mutation.
I think that when adding features or tweaking (5) we should consider what implication it might have for maintainability and remember that the objective is to modify small configuration files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to admit that I'm not entirely sure what this PR does 🙈
-
We can now encode strings with a trailing line-break as
ScalarStyle.FOLDED
orScalarStyle.LITERAL
instead of falling back to double quotes. See: -
Because we can do
1
above, it comes with a cost. The edit we suggest may have a line-break already since we ensure we preserve any dangling line-breaks after callingnormalizeEncodedBlock
which:- Preserves line-breaks for
ScalarStyle.FOLDED
orScalarStyle.LITERAL
or if the value of theYamlScalar
itself has a line-break - Checks if the node being replaced had a line-break and preserves it in the new encoded block
- Preserves line-breaks for
The cost is to ensure we include any existing line-break in the edit without trying to introduce any breaking behavior to the current YamlEditor
functionality.
From my previous comment, YamlEditor
CURRENTLY wants to get rid of inline comments in the terminal node but does it poorly causing issues with edits with some of the tests actually enabling the bug. (See some of the failing tests when this PR is run. I’ll highlight them on request)
Thus, why I added skipAndExtractComments
which skips the inline comments if that’s what YamlEditor
wanted to do.
I must admit, maybe the function is a bit overzealous and returns the comments it skipped but I thought why not.
Also, this PR was just a PoC. Nothing more than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kekavc24 how about you start an issue outline what we want to do.
And we try to reach consensus on that first.
I think perhaps it's best to try write tests and make small changes (many small PRs are infinitely better, than large ones). It's more work to split logic into separate PRs. But the challenge is also getting there without introducing bugs (actually, it's getting there while convincing maintainers of this package that we're not introducing new bugs).
I debugged #89 further while submitting #91 and noted several issues that are closely coupled:
YamlList
orYamlMap
and never aYamlScalar
This PR:
YamlNode
encoded within a block.normalizeEncodeBlock
function that dynamically prunes the additional line break if:YamlScalar
isn't encoded asScalarStyle.LITERAL
orScalarStyle.FOLDED
. Also fixes Literal and folded strings cannot handle "\n \n" at the end of a string #89skipAndExtractComments
function which greedily skips any comments and whitespace belonging to theYamlNode
being replaced.Further changes made in #93
Contribution guidelines:
dart format
.Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.