Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and improve Chain's index offsetting logic #22

Merged
merged 3 commits into from
Oct 13, 2020

Conversation

timvermeulen
Copy link
Contributor

Chain's index offsetting logic is untested and contains a few bugs, all these assertions fail:

let chain = "abc".chained(with: "XYZ")

do {
  let i = chain.index(chain.startIndex, offsetBy: 3, limitedBy: chain.startIndex)
  XCTAssertNil(i)
}

do {
  let i = chain.index(chain.startIndex, offsetBy: 4)
  let j = chain.index(i, offsetBy: -2)
  XCTAssertEqual(chain[j], "c")
}

do {
  let i = chain.index(chain.startIndex, offsetBy: 3)
  let j = chain.index(i, offsetBy: -1, limitedBy: i)
  XCTAssertNil(j)
}

I've put the fixes for these in a separate commit for reference.

Another issue with the current implementation is that chain.index(i, offsetBy: n) (with i pointing into base1) always computes the distance from i to base1.endIndex, regardless of n. As a result, this code traverses the entire string:

let chain = someVeryLongString().chained(with: [])
let n = someVerySmallNumber()
let _ = chain.index(chain.startIndex, offsetBy: n)

This PR fixes this by first calling base1.index(i, offsetBy: n, limitedBy: base1.endIndex) before potentially computing the distance to base1.endIndex.

This does still mean that when crossing the boundary between base1 and base2, the suffix of base1 is traversed twice. If we could determine whether or not Base1: RandomAccessCollection then we could conditionally replace this logic by repeated base1.index(after:) calls, but I'm not sure if that's possible...

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

@timvermeulen timvermeulen changed the title Fixing and improving Chain's index offsetting logic Fix and improve Chain's index offsetting logic Oct 13, 2020
Copy link
Member

@natecook1000 natecook1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, @timvermeulen — thank you! 👏

let d = base1.distance(from: i, to: base1.endIndex)
if n < d {
if limit >= i {
// `limit` is relevant, so `base2` cannot be reached
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation!

return base1.index(i, offsetBy: n, limitedBy: limit)
.map(Index.init(first:))
} else if let j = base1.index(i, offsetBy: n, limitedBy: base1.endIndex) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there was an operation that combined this calculation and finding d in the next branch, but no luck, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 100% to the extent that I added the equivalent for Rust's Iterator recently, specifically to speed up Chain 🙂 This would make for a good optimization point if we ever get the chance.

The next best thing would be to write a regular function for this which somehow detects whether the collection supports random-access, is there any hope of that being possible to implement? I assume that info does exist at runtime but I haven't found a way to extract it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree 100% to the extent that I added the equivalent for Rust's Iterator recently, specifically to speed up Chain 🙂 This would make for a good optimization point if we ever get the chance.

Exactly. I think the only place we have anything like that is in the innards of capturing a sequence in an array, where we return the buffer along with in-progress iterator.

The next best thing would be to write a regular function for this which somehow detects whether the collection supports random-access, is there any hope of that being possible to implement? I assume that info does exist at runtime but I haven't found a way to extract it.

There is a way, but since even non-random-access collections can provide a faster index(_:offsetBy:), I don't think we'd want to go that route…

@natecook1000 natecook1000 merged commit 2beda36 into apple:main Oct 13, 2020
@timvermeulen timvermeulen deleted the chain-index-offset-by branch October 13, 2020 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants