Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VRT source filename paths #10

Open
GretaCB opened this issue Jun 23, 2014 · 21 comments
Open

VRT source filename paths #10

GretaCB opened this issue Jun 23, 2014 · 21 comments

Comments

@GretaCB
Copy link
Contributor

GretaCB commented Jun 23, 2014

In reference to this issue I ran into Friday, I'm wondering how this potential filepath issue can best be managed/error handled. Right now, the path is relative to the VRT file:
<SourceFilename relativeToVRT="1">sample.tif</SourceFilename>

@hrwgc , I spoke with @yhahn about this and he mentioned that in the past this was a pain, because you had to always input the absolute path. Any thoughts on a better way to control the source filename in a vrt?

@springmeyer Is there something that can be done on the mapnik/gdal side?

On the mapnik-omnivore side, a potential strategy to handle this possible file error would look something like...

  • User passes in path of vrt file
  • Parse the vrt file for <SourceFilename relativeToVRT="1">... and grab the filename
  • Verify that the source filname exists within the same directory as the vrt file
  • If not, throw proper error
@brandonreavis
Copy link

One thing you might want to look into is the getFileList() function in GDAL.

> var ds2 = gdal.open('test/data/sample.vrt');
> ds.getFileList();
[ 'test/data/sample.vrt',
  '/home/vagrant/repositories/node-gdal/test/data/sample.tif' ]
> var ds2 = gdal.open('test/data/sample_invalid.vrt');
> ds2.getFileList()
[ 'test/data/sample_invalid.vrt' ]

I haven't used it before but it seems like it would help. From the simple test above it seems to turn all relative paths into absolute ones and if the file referenced in the VRT doesn't exist it doesn't return it.

Unfortunately, this doesn't explicitly tell you which filenames are bad, which leaves you still needing to scan through the file to see what files are referenced. Maybe a generic error like "VRT file doesn't reference any existing files on the filesystem" would be fine instead though?

If a VRT file references 10 raster files and 2 filenames are bad, I would imagine getStatistics() would still work and it would just use the 8 valid rasters. For your application you might want to be able to throw an error saying which 2 filenames are bad though instead of pretending they arent there.

@hrwgc
Copy link

hrwgc commented Jun 23, 2014

@GretaCB I'm going to focus on how we handle input files in TileMill -- let me know if your context is different.
To my knowledge, relative path vrts fail in tilemill because we symlink the VRT to the tilemill project's layers directory. Relative paths in the symlinked VRT are no longer valid, because mapnik looks for the tiffs in the layers directory, instead of in the symlinked VRT's actual location.

One way to change this would be to treat VRTs in a similar manner to multi-file vector sources, like a shapefile, where each of the component files is symlinked to the layers directory. For a VRT with relative paths, that might look like:

  1. symlink all source tiffs named in source VRT to mapnik layers directory
  2. symlink source vrt.

For a vrt with absolute paths, we'd only need to symlink the source vrt, since the absolute paths would remain valid.

Without making any changes on tilemill/mapnik end, the easiest way to deal with the issue is to always use VRTs with absolute paths. I do this by making vrt with gdalbuildvrt and specifying source images on command line using absolute paths (gdalbuildvrt mynewvrt.vrt /path/to/tiffs/*tif). You can't make the VRT with gdal_translate, as it does not preserve absolute paths (!!!).

@yhahn
Copy link
Member

yhahn commented Jun 23, 2014

@hrwgc assuming there's no symlinking involved, have you run into any other issues with relative paths?

@hrwgc
Copy link

hrwgc commented Jun 23, 2014

If a VRT references GeoTIFFs with external overviews (which are $tiff.tif.ovr files produced in same directory as $tiff) , mapnik doesn't seem to know they exist or use them. To my knowledge, this behavior occurs regardless of absolute or relative paths in VRT.

@springmeyer
Copy link
Contributor

My guess is that its only tilemill symlinking/millstone that breaks vrt's and not Mapnik. But I'll change my mind if you can provide a testcase showing otherwise @hrwgc :)

@hrwgc
Copy link

hrwgc commented Jun 23, 2014

@springmeyer I'll defer to your expertise here :)

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 23, 2014

Thanks @brandonreavis , ds.getFileList() is a great starting point for handling this error.

getStatistics() still errors out if one or more invalid tiffs exist in the VRT:

Error: /Users/mapbox/dev/mapnik-omnivore/test/data/vrt/sample.vrt, band 1: Failed to compute statistics, no valid pixels found in sampling.

So having just one invalid source in the VRT will cause the entire VRT to be invalid in my case.

  • The error above is thrown using the following VRT: https://gist.github.com/GretaCB/e9f8af8fe3d64cdb9b22
  • The VRT contains three source filenames: One is valid 1-projection.tif and the other two are missing/invalid
  • Calling ds.getFileList() on this file returns...
[ '/Users/mapbox/dev/mapnik-omnivore/test/data/vrt/sample.vrt',
  '/Users/mapbox/dev/mapnik-omnivore/test/data/vrt/vrtTest/1-projected.tif' ]

Agreed, it would be ideal to list the specific sources that are invalid, but afraid there is no other way (that I can think of) to obtain these filenames except by parsing the xml for all SourceFilename elements and checking fs.exist for each. Also, failing VRT files that have any invalid sources filenames at all.

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 23, 2014

Thanks for your input @hrwgc ! I think regarding mapnik-omnivore, I want to be able to validate the VRT file properly, which would entail handling any errors in relation to mapnik/gdal and provide as much helpful information about the error as possible. And (best case) return metadata about the VRT file.

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 23, 2014

Ran gdalinfo on the vrt I referenced above and got the following:

mapboxs-MacBook-Pro-2:mapnik-omnivore mapbox$ gdalinfo ./test/data/vrt/sample.vrt
Driver: VRT/Virtual Raster
Files: ./test/data/vrt/sample.vrt
       /Users/mapbox/dev/mapnik-omnivore/./test/data/vrt/vrtTest/1-projected.tif
Size is 7976, 7237
Coordinate System is:
PROJCS["WGS 84 / Pseudo-Mercator",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Mercator_1SP"],
    PARAMETER["central_meridian",0],
    PARAMETER["scale_factor",1],
    PARAMETER["false_easting",0],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    EXTENSION["PROJ4","+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext  +no_defs"],
    AUTHORITY["EPSG","3857"]]
Origin = (6008775.072321455925703,2941109.730532602407038)
Pixel Size = (33.043018308499384,-33.043018308499384)
Corner Coordinates:
Upper Left  ( 6008775.072, 2941109.731) ( 53d58'39.88"E, 25d31'51.64"N)
Lower Left  ( 6008775.072, 2701977.407) ( 53d58'39.88"E, 23d34'38.08"N)
Upper Right ( 6272326.186, 2941109.731) ( 56d20'42.95"E, 25d31'51.64"N)
Lower Right ( 6272326.186, 2701977.407) ( 56d20'42.95"E, 23d34'38.08"N)
Center      ( 6140550.629, 2821543.569) ( 55d 9'41.42"E, 24d33'28.56"N)
Band 1 Block=128x128 Type=Byte, ColorInterp=Gray

So, GDAL itself is only picking up the one existing source 1-projected.tif and ignoring the others.

I then checked out the GDAL source for the GetFileList() function, and looks like the file list relies on this apoOverviews object. After searching the file, I'm not exactly sure where this apoOverviews object is instantiated. I'll keep digging.
/cc @springmeyer @brandonreavis

@brandonreavis
Copy link

@GretaCB . You're brave haha. I started digging into the source for the same thing just a while ago and ...yeah... it's dense. I haven't come up with anything productive yet either. As far as I can tell, the check for whether or not the file exists is done in vrtsources.cpp file here.

But I am not sure how we could change node-gdal's ds.getFileList() method to return all of filenames mentioned in the VRT without heavily patching the gdal source. The alternative you've mentioned of parsing the file for SourceFilename tags might be the only alternative. I really wish there was a cleaner solution though.

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 23, 2014

@brandonreavis haha, yeah definitely a whirlwind going through the GDAL source code. Since getFileList() is part of the node-gdal API, it might be best to handle this error and parse the XML for the missing sources from within node-gdal. That way, any application accessing the function doesn't have to repeat the same code. I'd be happy to create a branch and throw it together.

@brandonreavis
Copy link

Yeah I am totally for that. I talked it over briefly with @brianreavis and we agree that the best way to go about it would probably be to make another function like getMissingFiles().

gdal.Dataset.prototype.getMissingFiles = function() {
    var existing_filenames = this.getFileList();
    var all_filenames = [];
    if (this.driver.description === 'VRT') { 
         //scan through file for <SourceFilename>
         //add filenames that dont exist in existing_filenames
    } 
    return all_filenames;
}

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 24, 2014

Awesome, would this be adding a getMissingFiles() function in C++ or javascript? I was thinking of adding the new function in this class: https://github.com/naturalatlas/node-gdal/blob/master/src/gdal_dataset.cpp

@brandonreavis
Copy link

So I changed my mind after looking into it more. It would be far from a simple fix and it would be prone to break with any changes to the GDAL source.

I am just going to run down a list of complications

  • Writing the function purely with JS would require using an external library to parse the XML. This would bloat node-gdal just for this issue. The GDAL library already has an XML parser.
  • Exploiting the XML parser that GDAL uses would be no simple task.
  • We would need to open a VRT "file" the same way GDAL does. Just to open and get the contents of the VRT as a string takes 100 lines of code..
    • Turns out you can open a VRT dataset like this: gdal.open('VRTDataset rasterXSize="984" rasterYSize="804">...')
    • They also are doing weird things with symlinks that I don't have the patience to figure out now.
  • Different drivers will require writing different xml parsing methods. Vector VRTs will have the same problem (versions of GDAL above 2.0) as raster VRTs, but the xml is structured differently.

So there are just too many moving parts, and it seems like it would be beyond the scope of the node-gdal binding to support. It really should be something that the GDAL dudes are taking care of... it's not really the binding's job.

Since god only knows when GDAL will support this functionality, if you decide you really need to know which filenames in a VRT are bad, I think writing a VRT parser purely with JS that lives outside of node-gdal is the way to go for now. Again, I wish there was a better solution.

@brandonreavis
Copy link

When you and @brianreavis turned verbose debugging on with gdal.verbose() you both got this message to show up as the first of many errors:

ERROR 4: '/Users/brianreavis/Repositories/node-gdal/test/data/sample.tif' does not exist in the file system,
and is not recognised as a supported dataset name.

The default behavior was to throw the last error, but with the commit above I (crudely) made the get/computeStatistics() methods throw any file errors instead of the other weird errors.

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 24, 2014

Oofdah, yeah that's a lot of hurdles. I updated my local node-gdal master to try out your last commit, but not seeing a difference in the errors. I'm seeing that particular Error 4 error when I comment out gdal quiet();, but it only includes the next invalid source and not all of them. Should there be particular errors I should be seeing based on your change?

I'm not too familiar with C++ and this might be crazy talk, but would there be a way to integrate a separate javascript VRT parser module into the rasterband.cpp file? For example, can a C++ class import a javascript module or somehow use it as a separate library? If so, it would ideally send the file path to the module and then just receive the list of missing filenames to spit back with the error message.

Also, would it be realistic to add this functionality in GDAL's source and do a pull request?
cc @springmeyer @yhahn

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 24, 2014

Thinking it's probably best to chill on this issue for now, and maybe return to it later. I'm going to invalidate any VRTs that don't have ALL sources, and include the vague-ish error message that they're missing one or more sources. Thanks for falling down the dark tunnel with me @brandonreavis !

@brandonreavis
Copy link

You should be able to have gdal.quiet() uncommented and it should function like this:

Code:

var ds    = gdal.open('test/data/sample_invalid.vrt');
var band  = ds.bands.get(1)
var stats = band.getStatistics(false, true);

Result:

/home/vagrant/repositories/node-gdal/examples/vrt.js:5
var stats = band.getStatistics(false, true);
                 ^
Error: `/home/vagrant/repositories/node-gdal/test/data/sample_invalid.tif' does not exist in the file system,
and is not recognised as a supported dataset name.

    at Object.<anonymous> (/home/vagrant/repositories/node-gdal/examples/vrt.js:5:18)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Function.Module.runMain (module.js:497:10)
    at startup (node.js:119:16)
    at node.js:902:3

GDAL will only report the first file does not exist error that it notices, and thats the error that should be thrown in JS now. Is it throwing this error instead?

ERROR 1: test/data/sample_invalid.vrt, band 1: Failed to compute statistics, no valid pixels found in sampling.

@GretaCB
Copy link
Contributor Author

GretaCB commented Jun 24, 2014

After reinstalling https://github.com/naturalatlas/node-gdal/tarball/master, I'm getting the following error for an invalid VRT when calling band.getStatistics():

[Error: `/Users/mapbox/dev/mapnik-omnivore/test/data/vrt/vrtTest/2-projected.tif' does not exist in the file system, and is not recognised as a supported dataset name.]

@brandonreavis
Copy link

About patching the GDAL source:
As far as I understand this is the function you would have to patch : VRTSimpleSource:GetFileList().

Simply commenting out these 6 lines would make it work how you want, but it makes node-gdal function differently than everyone else's GDAL code which wouldn't be good. That function is called for each SimpleSource in a VRT, so its not like you could just overload it and call it through the binding with JS. You would have to also overload all of the other VRT getFileList() methods, and yeah so begins the nightmare to support.

About a separate javascript VRT parser:
I am not totally sure what you mean here. To make this special VRT parsing ability be a part of node-gdal it would require listing an XML parser as a dependency of node-gdal, so when you do an npm install it also installs all of the bazillion other dependencies that that the XML parser needs.

If you really want to be able to call something like ds.getMissingVRTFiles(), you should make a separate module that patches node-gdal with your functionality like:

var parser = require('somexmlparser');

function getMissingVRTFiles(){
    var driver_name = this.driver.description;
    if(driver_name != 'VRT') return [];

    var filename = this.description;
    //verify its actually a filename
    var missing_files = [];
    //parse the xml 

    return missing_files;
}

module.exports.applyPatch = function(gdal){
    gdal.Dataset.prototype.getMissingVRTFiles = getMissingVRTFiles;
}

then in your mapnik-omnivore module do something like:

var gdal = require('gdal');
require('gdal-patch-getmissingvrtfiles').applyPatch(gdal);

@sgillies
Copy link
Contributor

@brandonreavis @GretaCB Late to the party, but how about on user input of a VRT we just gdal translate to a proper TIFF file and go from there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants