Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured COG band subsetting #2706

Closed
echeipesh opened this issue Jun 14, 2018 · 12 comments
Closed

Structured COG band subsetting #2706

echeipesh opened this issue Jun 14, 2018 · 12 comments
Labels
cog layers reading/writing GeoTiff layers enhancement

Comments

@echeipesh
Copy link
Contributor

Current multiband COG readers still produce all of the bands from the GeoTiff.
However, its possible that the user knows that useful information exists only in subset of the bands and wishes to read exclusively those layers. GeoTiffs that use band-interleave naturally lend themselves to this optimization.

The main concerns is how to deal with invalid band selection. Currently you get "all the bands", whatever that is, so there is no opportunity for errors. Given band subset of [1,2,100] its not desirable to fail the whole query if some of the tiffs do not have band 100. Two choices are fill missing bands with NODATA tiles or return Array[Option[Tile]]. The latter is more explicit and more efficient because avoids needing to call .isNoDataTile to verify query results. However its not consistent with current API. Part of the issue here is to think through the implications and possibly establish a new convention for when we querying for band subsets.

Although we are already in 2.0.0-RC1 this should be developed with intent to back-port the addition to 2.1 branch.

@echeipesh echeipesh added the cog layers reading/writing GeoTiff layers label Jun 21, 2018
@jbouffard
Copy link

I personally don't like the first option of returning missing bands with NODATA since that just seems like the reader is failing silently. The other alternative seems okay, but I think that throwing an error might be better. If there's a failure in the reading stage, then that means the user has an incorrect assumption about the data they're working with, which should be known as soon as possible.

Regardless of what we decided to return, though, I feel that this feature should just be temporary to the 2.0 release. In my opinion, it doesn't make sense to have two separate read methods for the ValueReaders where one reads the whole Tile while the other just reads a subset of bands. Rather it should just be a single read method that gives the decided return type. What that type is is something we should discus more here #2732

@moradology
Copy link
Contributor

The main issue Option has, and a cause for the complaints some have of 'maybeitis' in functional code, is that None is not supposed to serve as a stand-in for ⊥: if it can mean exception encountered and value not there - no exceptional state encountered, it is doing a disservice to both cases.

Because this code reaches out into the world and does stuff, exceptions aren't really avoidable. Because we need to call out such exceptional cases as well as represent successful computations which return None, I think the best bet would be for us to wrap code like this up in IO. IO[Array[Option[Tile]]] means that any None cases should be given the full semantic treatment and that we can always get at thrown exceptions in our callsite (IO.attempt converts IO[A] to IO[Either[Throwable, A]])

@pomadchin
Copy link
Member

pomadchin commented Jul 5, 2018

@jbouffard so you think that bandSubset method and the entire tile read should be the same method?

Separate methods were done 1. because of a different signature 2. different semantics. I can see that probably it makes sense just to call a bandSubset function in other read functions.

I also want to add that I'm very happy to see and agree with @moradology suggestion about exceptions handle via IO monad, but it's more a part of #2732. It looks like IO thing won't happen in terms of 2.0 though.

I think that exactly in this method, it looks like potentially not having a band is a correct behaviour. We're selecting some random bands from a tiff and we can get a band (Some) or can get nothing (None), all other exceptions would be thrown (or catched via IO monad attempt).

An alternative example to support Options use cases:
The same story can happen with a raster.crop(extent) function signature, where we can expect to have a non intersecting case, it means that it's fine if we have a non intersecting result, it's correct and no needs in throwing exceptions in this case. The incorrect behaviour can be smth else, not related to this non-intersection case.

@jbouffard
Copy link

@pomadchin Yeah, I think they should be but not right now. I was thinking that once we decide on the new Tile API and the return type for ValueReader we could merge the two together.

I'm still of the opinion that an error should just be thrown when attempting to read a band that doesn't exist. That way the user will know the data they want to work with is different from what they're expecting, and they can decide how they want to handle the error. IO[Either[Throwable, A]] seems like a lot for the user to deal with, and I think that complexity should be in an application that uses the ValueReader.

@pomadchin
Copy link
Member

@jbouffard yea, your idea would work only if this method would fail on any incorrect bands. I don't buy your love to Exceptions :D

@moradology
Copy link
Contributor

moradology commented Jul 5, 2018

Here's how you get/deal with IO[Either[Throwable, A]]:

val someIO: IO[A] = ???
someIO.attempt map {
  case Right(success) => success // what do we do with success?
  case Left(ElementNotFoundException) => ??? // what do we do if the element is not found?
  case Left(_) => ??? // what do we do when there's some error we don't have a special case for?
}

@jbouffard
Copy link

@pomadchin I wouldn't say that I have a love for Exceptions haha, just that I think they have a place in IO. I do like the IO monad, but I just think that it adds more complexity than is needed to the API. By having the method just throw an error, the user will be able to decide how they want to handle things.

@moradology
Copy link
Contributor

moradology commented Jul 5, 2018

For those poor, confused souls, this might be OK:

val someIO: IO[A] = ???
try {
  someIO map { result => ??? }.unsafeRunSync
} catch {
  case MyException(_) => ???
}

@pomadchin
Copy link
Member

pomadchin commented Jul 5, 2018

@jbouffard also from the last @moradology example it's clear how we can get async api for free (we wanted to investigate it #2306).
The next thing is that you can compose effects via IO monad, and not to try catch each particular block, semantics of IO itself mean encoding side effects as pure values, capable of expressing (a)synchronous computations.

It's a big question what is better to throw new Exception (which would always be an unexpected thing) or to add some semantics into the type description.

For sure every time i mention IO it's in terms of our future possible solution. And at this point (in terms of 2.0) throwing exceptions looks not too bad.

@jbouffard
Copy link

@pomadchin Should we close this issue and move the discussion elsewhere?

@pomadchin
Copy link
Member

@jbouffard not sure yet, will figure it out!

@pomadchin
Copy link
Member

Closed here #2775

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cog layers reading/writing GeoTiff layers enhancement
Projects
None yet
Development

No branches or pull requests

4 participants