export_workspace: support file copy #676

Closed
jdries opened this issue Feb 6, 2024 · 3 comments · Fixed by #685, Open-EO/openeo-python-driver#262 or #687

@jdries
Contributor

jdries commented Feb 6, 2024

Assume that the workspace is mounted as a POSIX directory; 'export_workspace' can then copy the relevant STAC and data files to the right directory.
Workspace metadata can be defined in a config file.
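
A minimal sketch of the idea, assuming the workspace is just a directory on a mounted file system. The function name `copy_to_workspace` and its parameters are hypothetical and not part of the openEO driver API:

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def copy_to_workspace(files: Iterable[Path], workspace_root: Path, subdirectory: str = "") -> List[Path]:
    """Copy STAC and data files into a (hypothetical) mounted workspace directory."""
    target_dir = workspace_root / subdirectory
    # Create the target directory, including missing parents, if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for file in files:
        # copy2 also tries to preserve file metadata such as the modification time.
        copied.append(Path(shutil.copy2(file, target_dir / file.name)))
    return copied
```

Files are copied flat into the target directory here; whether the real implementation keeps any directory structure is not specified in this issue.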

bossie added a commit that referenced this issue Feb 19, 2024
@bossie bossie linked a pull request Feb 20, 2024 that will close this issue
bossie added a commit that referenced this issue Feb 20, 2024
bossie added a commit that referenced this issue Feb 20, 2024
@bossie
Collaborator

bossie commented Feb 20, 2024

Needs more work.

@bossie bossie reopened this Feb 20, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 21, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 21, 2024
bossie added a commit that referenced this issue Feb 21, 2024
@bossie bossie linked a pull request Feb 21, 2024 that will close this issue
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 23, 2024
bossie added a commit that referenced this issue Feb 23, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 26, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 26, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 27, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 27, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 27, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 27, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 27, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 28, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 28, 2024
bossie added a commit to Open-EO/openeo-python-driver that referenced this issue Feb 28, 2024
bossie added a commit that referenced this issue Feb 28, 2024
bossie added a commit that referenced this issue Feb 29, 2024
I wonder why this worked before.

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/flask/app.py", line 870, in full_dispatch_request
    rv = self.dispatch_request()
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/flask/app.py", line 855, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/users/auth.py", line 88, in decorated
    return f(*args, **kwargs)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/views.py", line 655, in result
    result = backend_implementation.processing.evaluate(process_graph=process_graph, env=env)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 301, in evaluate
    return evaluate(process_graph=process_graph, env=env)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 377, in evaluate
    result = convert_node(result_node, env=env)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1563, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1563, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 416, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1595, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2204, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/openeogeotrellis/backend.py", line 1004, in load_stac
    opensearch_client.add_feature(
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/usr/local/spark/python/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
  File "/var/lib/jenkins/workspace/_openeo-geopyspark-driver_PR-687/venv38/lib64/python3.8/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o24174.add_feature.
: java.lang.NullPointerException
	at java.base/java.net.URI$Parser.parse(URI.java:3104)
	at java.base/java.net.URI.<init>(URI.java:600)
	at org.openeo.geotrellis.file.FixedFeaturesOpenSearchClient.$anonfun$add_feature$1(FixedFeaturesOpenSearchClient.scala:28)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.openeo.geotrellis.file.FixedFeaturesOpenSearchClient.add_feature(FixedFeaturesOpenSearchClient.scala:26)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)
@bossie
Collaborator

bossie commented Feb 29, 2024

Added disk workspace vdboschj-public-workspace on Terrascope to test:

workspaces = {
    "vdboschj-public-workspace": DiskWorkspace(root_directory=Path("/data/users/Public/vdboschj/workspace"))
}

bossie added a commit that referenced this issue Feb 29, 2024
In this case, it was an integration test with a "discard_result" process.

Traceback (most recent call last):
  File "batch_job.py", line 1339, in <module>
    main(sys.argv)
  File "batch_job.py", line 1014, in main
    run_driver()
  File "batch_job.py", line 985, in run_driver
    run_job(
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py", line 56, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "batch_job.py", line 1189, in run_job
    _export_workspace(result, result_metadata, stac_metadata_dir=job_dir)
  File "batch_job.py", line 1217, in _export_workspace
    asset_paths = [Path(asset["href"]) for asset in result_metadata["assets"].values()]
KeyError: 'assets'
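
The traceback above shows `result_metadata` without an `assets` key, which is plausible for a job that ends in `discard_result` and therefore writes no files. One defensive pattern, sketched here with a hypothetical helper (not necessarily how the referenced commits fixed it), is to treat a missing key as an empty set of assets:

```python
from pathlib import Path
from typing import List


def asset_paths_of(result_metadata: dict) -> List[Path]:
    """Return the asset paths of a job result, or an empty list if there are none."""
    # .get() with a default avoids "KeyError: 'assets'" for results without files.
    assets = result_metadata.get("assets", {})
    return [Path(asset["href"]) for asset in assets.values()]
```

With this, a result without assets simply yields nothing to copy instead of failing the batch job.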
bossie added a commit that referenced this issue Mar 4, 2024
Traceback (most recent call last):
  File "batch_job.py", line 1345, in <module>
    main(sys.argv)
  File "batch_job.py", line 1014, in main
    run_driver()
  File "batch_job.py", line 985, in run_driver
    run_job(
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py", line 56, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "batch_job.py", line 1189, in run_job
    _export_workspace(result, result_metadata, stac_metadata_dir=job_dir)
  File "batch_job.py", line 1224, in _export_workspace
    stac_paths = _write_exported_stac_collection(stac_metadata_dir, result_metadata)
  File "batch_job.py", line 1259, in _write_exported_stac_collection
    item_files = [write_stac_item_file(asset_id, asset) for asset_id, asset in result_metadata["assets"].items()]
KeyError: 'assets'
@bossie
Collaborator

bossie commented Mar 6, 2024

Running this process graph as a batch job copies the job's output assets to the subdirectory test_export_workspace/ of the test workspace vdboschj-public-workspace *:

{
  "loadcollection1": {
    "process_id": "load_collection",
    "arguments": {
      "id": "PROBAV_L3_S10_TOC_333M",
      "temporal_extent": [
        "2017-11-21",
        "2017-11-22"
      ],
      "spatial_extent": {
        "west": 3.5385,
        "south": 51.3548,
        "east": 3.9869,
        "north": 51.625
      },
      "bands": [
        "NDVI"
      ]
    }
  },
  "saveresult1": {
    "process_id": "save_result",
    "arguments": {
      "data": {
        "from_node": "loadcollection1"
      },
      "format": "GTiff"
    }
  },
  "exportworkspace1": {
    "process_id": "export_workspace",
    "arguments": {
      "data": {
        "from_node": "saveresult1"
      },
      "workspace": "vdboschj-public-workspace",
      "merge": "test_export_workspace"
    },
    "result": true
  }
}

* maps to directory /data/users/Public/vdboschj/workspace/test_export_workspace
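
The mapping in the footnote can be illustrated as a plain path join of the workspace root and the `merge` argument. This is a sketch under that assumption; `WORKSPACE_ROOTS` and `resolve_target` are hypothetical names, and the actual driver may resolve paths differently:

```python
from pathlib import PurePosixPath

# Hypothetical lookup table mirroring the workspace configuration shown earlier.
WORKSPACE_ROOTS = {
    "vdboschj-public-workspace": PurePosixPath("/data/users/Public/vdboschj/workspace"),
}


def resolve_target(workspace_id: str, merge: str) -> PurePosixPath:
    """Join the workspace root with the 'merge' argument of export_workspace."""
    return WORKSPACE_ROOTS[workspace_id] / merge


print(resolve_target("vdboschj-public-workspace", "test_export_workspace"))
# prints /data/users/Public/vdboschj/workspace/test_export_workspace
```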