Fix and improve `Chain`'s index offsetting logic #22

timvermeulen · 2020-10-13T16:57:21Z

`Chain`'s index offsetting logic is untested and contains a few bugs; all of these assertions fail:

```swift
import Algorithms
import XCTest

let chain = "abc".chained(with: "XYZ")

do {
  let i = chain.index(chain.startIndex, offsetBy: 3, limitedBy: chain.startIndex)
  XCTAssertNil(i)
}

do {
  let i = chain.index(chain.startIndex, offsetBy: 4)
  let j = chain.index(i, offsetBy: -2)
  XCTAssertEqual(chain[j], "c")
}

do {
  let i = chain.index(chain.startIndex, offsetBy: 3)
  let j = chain.index(i, offsetBy: -1, limitedBy: i)
  XCTAssertNil(j)
}
```

I've put the fixes for these in a separate commit for reference.
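What the three assertions expect can be checked against a small, self-contained model of a two-collection chain. The sketch below is hypothetical: `ChainSketch`, its `Index` cases, and `normalized(_:)` are illustrative names, not the swift-algorithms implementation. It assumes both bases are `BidirectionalCollection`s so that negative offsets work. The forward branch follows the approach described in this PR (check whether `limit` keeps the move inside `base1`, try a limited offset inside the current base, and only then measure a distance); the backward branch is an assumed mirror image. As a simplification, when `limit` does not constrain the move, offsets past either end return `nil` instead of trapping.

```swift
// Hypothetical, simplified model of a two-collection chain. `ChainSketch`,
// `normalized(_:)`, and the `Index` cases are illustrative names only.
struct ChainSketch<Base1: BidirectionalCollection, Base2: BidirectionalCollection>:
  BidirectionalCollection where Base1.Element == Base2.Element {
  let base1: Base1
  let base2: Base2

  // Synthesized `<` follows case order: every `.first` precedes every `.second`.
  enum Index: Comparable {
    case first(Base1.Index)
    case second(Base2.Index)
  }

  // The boundary is always stored as `.second(base2.startIndex)`, never as
  // `.first(base1.endIndex)`, so each position has one representation.
  func normalized(_ i: Base1.Index) -> Index {
    i == base1.endIndex ? .second(base2.startIndex) : .first(i)
  }

  var startIndex: Index { normalized(base1.startIndex) }
  var endIndex: Index { .second(base2.endIndex) }

  subscript(i: Index) -> Base1.Element {
    switch i {
    case .first(let j): return base1[j]
    case .second(let j): return base2[j]
    }
  }

  func index(after i: Index) -> Index {
    switch i {
    case .first(let j): return normalized(base1.index(after: j))
    case .second(let j): return .second(base2.index(after: j))
    }
  }

  func index(before i: Index) -> Index {
    switch i {
    case .first(let j): return .first(base1.index(before: j))
    case .second(let j) where j == base2.startIndex:
      return .first(base1.index(before: base1.endIndex))
    case .second(let j): return .second(base2.index(before: j))
    }
  }

  func index(_ i: Index, offsetBy n: Int) -> Index {
    guard let j = index(i, offsetBy: n, limitedBy: n >= 0 ? endIndex : startIndex)
    else { preconditionFailure("offset out of bounds") }
    return j
  }

  func index(_ i: Index, offsetBy n: Int, limitedBy limit: Index) -> Index? {
    if n >= 0 {
      switch (i, limit) {
      case (.first(let j), .first(let l)) where l >= j:
        // `limit` is relevant, so `base2` cannot be reached.
        return base1.index(j, offsetBy: n, limitedBy: l).map { normalized($0) }
      case (.first(let j), _):
        // Walk at most `n` steps inside `base1` before measuring anything.
        if let k = base1.index(j, offsetBy: n, limitedBy: base1.endIndex) {
          return normalized(k)
        }
        // Only now compute the distance (a second pass over `base1`'s suffix).
        let d = base1.distance(from: j, to: base1.endIndex)
        var upper = base2.endIndex
        if case .second(let l) = limit { upper = l }
        return base2.index(base2.startIndex, offsetBy: n - d, limitedBy: upper)
          .map(Index.second)
      case (.second(let j), .second(let l)) where l >= j:
        return base2.index(j, offsetBy: n, limitedBy: l).map(Index.second)
      case (.second(let j), _):
        return base2.index(j, offsetBy: n, limitedBy: base2.endIndex).map(Index.second)
      }
    }
    // Backward: the mirror image, with the roles of the two bases swapped.
    switch (i, limit) {
    case (.second(let j), .second(let l)) where l <= j:
      // `limit` is relevant, so `base1` cannot be reached.
      return base2.index(j, offsetBy: n, limitedBy: l).map(Index.second)
    case (.second(let j), _):
      if let k = base2.index(j, offsetBy: n, limitedBy: base2.startIndex) {
        return .second(k)
      }
      let d = base2.distance(from: j, to: base2.startIndex)  // d <= 0
      var lower = base1.startIndex
      if case .first(let l) = limit { lower = l }
      return base1.index(base1.endIndex, offsetBy: n - d, limitedBy: lower)
        .map(Index.first)
    case (.first(let j), .first(let l)) where l <= j:
      return base1.index(j, offsetBy: n, limitedBy: l).map(Index.first)
    case (.first(let j), _):
      return base1.index(j, offsetBy: n, limitedBy: base1.startIndex).map(Index.first)
    }
  }
}

let chain = ChainSketch(base1: "abc", base2: "XYZ")
```

For `"abc"` followed by `"XYZ"`, the three cases from the issue come out as `nil`, an index of `"c"`, and `nil` in this model.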

Another issue with the current implementation is that `chain.index(i, offsetBy: n)` (with `i` pointing into `base1`) always computes the distance from `i` to `base1.endIndex`, regardless of `n`. As a result, this code traverses the entire string:

```swift
// `someVeryLongString()` and `someVerySmallNumber()` are placeholders.
let chain = someVeryLongString().chained(with: [])
let n = someVerySmallNumber()
_ = chain.index(chain.startIndex, offsetBy: n)
```

This PR fixes this by first calling `base1.index(i, offsetBy: n, limitedBy: base1.endIndex)` before potentially computing the distance to `base1.endIndex`.
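The traversal difference can be made visible with a hypothetical forward-only collection that counts its `index(after:)` calls. `StepCounter`, `CountingCollection`, `offsetMeasuringFirst`, and `offsetWalkingFirst` are illustrative names. Each function models only how `base1` is handled and returns `nil` where the real code would continue into `base2`. The step counts assume the standard library's default `Collection` implementations of `distance(from:to:)` and `index(_:offsetBy:limitedBy:)`, which advance one `index(after:)` at a time.

```swift
// Shared, mutable step count (a class, so non-mutating methods can update it).
final class StepCounter { var steps = 0 }

// Hypothetical forward-only collection of 0..<size that counts traversal steps.
struct CountingCollection: Collection {
  let size: Int
  let counter: StepCounter

  var startIndex: Int { 0 }
  var endIndex: Int { size }
  subscript(i: Int) -> Int { i }

  func index(after i: Int) -> Int {
    counter.steps += 1  // record one step of traversal
    return i + 1
  }
}

// Measure the rest of `base` up front, whatever `n` is.
func offsetMeasuringFirst<C: Collection>(_ base: C, from i: C.Index, by n: Int) -> C.Index? {
  let d = base.distance(from: i, to: base.endIndex)
  return n <= d ? base.index(i, offsetBy: n) : nil
}

// Walk at most `n` steps; a caller would measure the distance only on `nil`.
func offsetWalkingFirst<C: Collection>(_ base: C, from i: C.Index, by n: Int) -> C.Index? {
  base.index(i, offsetBy: n, limitedBy: base.endIndex)
}

let counter = StepCounter()
let long = CountingCollection(size: 10_000, counter: counter)

_ = offsetMeasuringFirst(long, from: long.startIndex, by: 3)
let measuringSteps = counter.steps

counter.steps = 0
_ = offsetWalkingFirst(long, from: long.startIndex, by: 3)
let walkingSteps = counter.steps
```

With 10,000 elements and `n == 3`, the measuring-first version walks the whole collection before taking its 3 steps, while the walking-first version takes only the 3 steps it needs.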

This does still mean that when crossing the boundary between `base1` and `base2`, the suffix of `base1` is traversed twice: once by the limited offset and once by `distance(from:to:)`. If we could determine whether or not `Base1: RandomAccessCollection`, we could conditionally replace this logic with repeated `base1.index(after:)` calls when it isn't, but I'm not sure if that's possible...
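For a collection known not to be random-access, the single-pass alternative could look like the sketch below. `advance(_:from:by:)` is a hypothetical helper, not a standard library API. It steps with `formIndex(after:)` and reports how many steps it actually took, so crossing into `base2` would need no second `distance(from:to:)` pass over `base1`'s suffix.

```swift
// Hypothetical helper: walk forward from `i` one step at a time, stopping after
// `n` steps or at `endIndex`, and report how many steps were actually taken.
func advance<C: Collection>(
  _ base: C, from i: C.Index, by n: Int
) -> (index: C.Index, steps: Int) {
  precondition(n >= 0, "forward offsets only")
  var i = i
  var steps = 0
  while steps < n && i != base.endIndex {
    base.formIndex(after: &i)
    steps += 1
  }
  return (i, steps)
}

// Offsetting 5 positions into "abc" stops at its end after 3 steps,
// leaving 5 - 3 = 2 positions to take in the second collection.
let base1 = "abc"
let result = advance(base1, from: base1.startIndex, by: 5)
let remaining = 5 - result.steps
```

The trade-off is that this never uses a faster `index(_:offsetBy:limitedBy:)` the collection may provide, so it only pays off when offsetting really is linear.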

Checklist

I've added at least one test that validates that my change is working, if appropriate
I've followed the code style of the rest of the project
I've read the Contribution Guidelines
I've updated the documentation if necessary

natecook1000

Looks great, @timvermeulen — thank you! 👏

natecook1000 · 2020-10-13T19:38:43Z

Sources/Algorithms/Chain.swift

```diff
-      let d = base1.distance(from: i, to: base1.endIndex)
-      if n < d {
+      if limit >= i {
+        // `limit` is relevant, so `base2` cannot be reached
```


Good observation!

natecook1000 · 2020-10-13T19:40:53Z

Sources/Algorithms/Chain.swift

```diff
         return base1.index(i, offsetBy: n, limitedBy: limit)
           .map(Index.init(first:))
+      } else if let j = base1.index(i, offsetBy: n, limitedBy: base1.endIndex) {
```


I wish there were an operation that combined this calculation with finding `d` in the next branch, but no luck, right?

timvermeulen

I agree 100% to the extent that I recently added the equivalent for Rust's `Iterator`, specifically to speed up `Chain` 🙂 This would make for a good optimization point if we ever get the chance.

The next best thing would be to write a regular function for this that somehow detects whether the collection supports random access. Is there any hope of that being possible to implement? I assume that information does exist at runtime, but I haven't found a way to extract it.

natecook1000

> I agree 100% to the extent that I added the equivalent for Rust's Iterator recently, specifically to speed up Chain 🙂 This would make for a good optimization point if we ever get the chance.

Exactly. I think the only place we have anything like that is in the innards of capturing a sequence in an array, where we return the buffer along with the in-progress iterator.

> The next best thing would be to write a regular function for this which somehow detects whether the collection supports random-access, is there any hope of that being possible to implement? I assume that info does exist at runtime but I haven't found a way to extract it.

There is a way, but since even non-random-access collections can provide a faster `index(_:offsetBy:)`, I don't think we'd want to go that route…
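The point about non-random-access collections can be illustrated with a hypothetical `EvenNumbers` collection that conforms only to `Collection` but implements the offsetting requirements in constant time. A dispatch on `RandomAccessCollection` conformance would send it down a step-by-step path, whereas calling `index(_:offsetBy:limitedBy:)` directly, as this PR does, reaches its O(1) implementation through the protocol requirement.

```swift
// Hypothetical collection of the even numbers 0, 2, 4, ... It conforms only to
// `Collection`, not `RandomAccessCollection`, yet it customizes the offsetting
// requirements with constant-time arithmetic.
struct EvenNumbers: Collection {
  let count: Int

  var startIndex: Int { 0 }
  var endIndex: Int { count }
  subscript(i: Int) -> Int { 2 * i }
  func index(after i: Int) -> Int { i + 1 }

  // O(1) instead of the default one-step-at-a-time walk.
  func index(_ i: Int, offsetBy n: Int) -> Int {
    precondition(n >= 0, "forward-only collection")
    return i + n
  }

  // O(1); `limit` only matters when it is at or ahead of `i`.
  func index(_ i: Int, offsetBy n: Int, limitedBy limit: Int) -> Int? {
    precondition(n >= 0, "forward-only collection")
    return (i <= limit && i + n > limit) ? nil : i + n
  }

  func distance(from start: Int, to end: Int) -> Int { end - start }
}

let evens = EvenNumbers(count: 1_000_000)
let farAhead = evens.index(evens.startIndex, offsetBy: 500_000)
```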

timvermeulen added 3 commits October 13, 2020 18:22

Add tests

e47a4bd

Fix offsetting logic

75f5f75

Avoid unnecessary computations

b925460

timvermeulen changed the title ~~Fixing and improving Chain's index offsetting logic~~ Fix and improve Chain's index offsetting logic Oct 13, 2020

natecook1000 approved these changes Oct 13, 2020


natecook1000 merged commit 2beda36 into apple:main Oct 13, 2020

timvermeulen deleted the chain-index-offset-by branch October 13, 2020 20:56
