Significant numbers of missing S1 RTC scenes #372

Open
JackDunnNZ opened this issue Aug 13, 2024 · 0 comments

We are observing that a large number of S1 RTC scenes are not present in the catalog, but are present in the raw data.

The following code searches an arbitrary AOI (here, the African continent) over a date range (January to July 2024) and compares the scenes in the RTC catalog with the scenes in the earthsearch catalog (not the true raw data, but a convenient proxy for it):

import planetary_computer
import pystac_client

bbox = [-17.578125000000004, -36.3151251474805, 54.84375000000001, 37.43997405227057]
datetime_end = "2024-08-01"
datetime_start = "2024-01-01"

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
search = catalog.search(
    collections=["sentinel-1-rtc"],
    datetime=f"{datetime_start}/{datetime_end}",
    bbox=bbox,
    fields={"include": ["id"], "exclude": ["links", "assets", "properties", "bbox", "geometry"]},
    limit=1000,
)
items1 = list(search.items_as_dicts())
print(f"Found {len(items1)} items")
ids1 = set(item["id"] for item in items1)

# Try the earth-search
client = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")
search = client.search(
    collections=["sentinel-1-grd"],
    datetime=f"{datetime_start}/{datetime_end}",
    bbox=bbox,
    fields={"include": ["id"], "exclude": ["links", "assets", "bbox", "geometry"]},
    limit=1000,
)
items2 = list(search.items_as_dicts())
print(f"Found {len(items2)} items")
ids2 = set(item["id"] + "_rtc" for item in items2)

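# IDs found in earth-search (with the "_rtc" suffix applied) but absent from the RTC catalog
n_missing = len(ids2 - ids1)
# Guard against an empty earth-search result to avoid dividing by zero
pct_missing = n_missing * 100 / len(items2) if items2 else 0.0
print(f"{n_missing} items ({pct_missing:.1f}%) missing from RTC catalog")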

For this particular query, the RTC catalog is missing 3116 out of 21290 scenes (~15%). At the risk of sounding ungrateful for such an excellent resource, this makes it hard for us to rely on the RTC as a data source, and we would ideally like to avoid the effort of managing it ourselves.
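
In case it helps narrow down where the gaps fall, here is a minimal sketch of one way to break the missing IDs down by month. It is not part of the query above: `missing_by_month` is a hypothetical helper, and it assumes each ID embeds the acquisition start time as its first `YYYYMMDDTHHMMSS` token (as in the usual Sentinel-1 naming pattern). IDs that do not match are counted under "unknown".

```python
import re
from collections import Counter

# Assumption: the first "_YYYYMMDDTHHMMSS_" token in an ID is the acquisition
# start time, e.g. "S1A_IW_GRDH_1SDV_20240105T043012_..._rtc".
_START_TIME = re.compile(r"_(\d{4})(\d{2})\d{2}T\d{6}_")


def missing_by_month(missing_ids):
    """Count IDs per 'YYYY-MM' month; IDs without a timestamp go to 'unknown'."""
    counts = Counter()
    for item_id in missing_ids:
        match = _START_TIME.search(item_id)
        key = f"{match.group(1)}-{match.group(2)}" if match else "unknown"
        counts[key] += 1
    # Sort by key so the months print in chronological order
    return dict(sorted(counts.items()))
```

With the sets from the snippet above, this could be called as `missing_by_month(ids2 - ids1)` to see whether the missing scenes cluster in particular months or are spread evenly across the date range.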

I see from older issues there was a plan for a validation process that would help prevent such gaps in the catalog. Has there been progress on that front, and is there any way we could help at all?
