Concatenating Files in download_merra2.ipynb raised an error #8

Open
Tjarke opened this issue Nov 29, 2020 · 1 comment

Comments


Tjarke commented Nov 29, 2020

I was using the notebook download_merra2.ipynb, and in the section "Setting up the DataFrame" the following call:

 xr.open_mfdataset(file_path, concat_dim='date', preprocess=extract_date)

raised:

xarray ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation

As far as I can tell, none of the per-file datasets carries a dimension coordinate that xarray can use to order them for concatenation.

I solved it (for Germany) by changing the code in the following way:


import re


def extract_date(data_set):
    """
    Extracts the date from the filename before merging the datasets.
    """
    try:
        # The attribute name changed during the development of this script
        # from HDF5_Global.Filename to Filename.
        if 'HDF5_GLOBAL.Filename' in data_set.attrs:
            f_name = data_set.attrs['HDF5_GLOBAL.Filename']
        elif 'Filename' in data_set.attrs:
            f_name = data_set.attrs['Filename']
        else:
            raise AttributeError('The attribute name has changed again!')

        # Find a match between "." and ".nc4" that does not contain ".".
        exp = r'(?<=\.)[^\.]*(?=\.nc4)'
        res = re.search(exp, f_name).group(0)
        # Extract the date.
        y, m, d = res[0:4], res[4:6], res[6:8]
        date_str = '%s-%s-%s' % (y, m, d)
        data_set = data_set.assign(date=date_str)
        # Make "date" a dimension coordinate so that open_mfdataset can
        # order the datasets along it.
        data_set = data_set.expand_dims("date")
        # Hard-coded coordinate values for the Germany bounding box used in
        # the notebook (0.5° latitude x 0.625° longitude grid spacing).
        data_set.coords["lat"] = [47.5, 48.0, 48.5, 49.0, 49.5, 50.0, 50.5, 51.0, 51.5, 52.0, 52.5, 53.0, 53.5, 54.0, 54.5, 55.0]
        data_set.coords["lon"] = [5.625, 6.25, 6.875, 7.5, 8.125, 8.75, 9.375, 10.0, 10.625, 11.25, 11.875, 12.5, 13.125, 13.75, 14.375, 15.0]
        data_set.coords["time"] = list(range(24))

        return data_set

    except KeyError:
        # The last dataset is the one all the other sets will be merged into.
        # Therefore, no date can be extracted.
        data_set.coords["lat"] = [47.5, 48.0, 48.5, 49.0, 49.5, 50.0, 50.5, 51.0, 51.5, 52.0, 52.5, 53.0, 53.5, 54.0, 54.5, 55.0]
        data_set.coords["lon"] = [5.625, 6.25, 6.875, 7.5, 8.125, 8.75, 9.375, 10.0, 10.625, 11.25, 11.875, 12.5, 13.125, 13.75, 14.375, 15.0]
        data_set.coords["time"] = list(range(24))
        return data_set
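
For reference, this is how the modified extract_date would plug back into the notebook's concatenation step. This is only a sketch (I am assuming the file_path glob from the notebook); note that on newer xarray releases concat_dim is only honoured together with combine='nested':

ds = xr.open_mfdataset(file_path,
                       combine='nested',      # required alongside concat_dim on newer xarray
                       concat_dim='date',
                       preprocess=extract_date)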

and by commenting out the last two lines (the lat/lon remapping) in the following cell:

df.drop('DISPH', axis=1, inplace=True)
df.drop(['time', 'date'], axis=1, inplace=True)
df.drop(['U2M', 'U10M', 'U50M', 'V2M', 'V10M', 'V50M'], axis=1, inplace=True)

# df['lat'] = df['lat'].apply(lambda x: lat_array[int(x)])
# df['lon'] = df['lon'].apply(lambda x: lon_array[int(x)])
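
Why those two lines can stay commented out (my assumption from the change above, not something I verified against the original notebook): with the modified extract_date the lat and lon coordinates already hold real degree values, so the DataFrame no longer contains integer grid indices that would need to be mapped through lat_array / lon_array. A quick sanity check could look like this:

import numpy as np

# Sketch of a sanity check, assuming df was built from the merged dataset
# via to_dataframe() and reset_index() as in the notebook.
assert np.issubdtype(df['lat'].dtype, np.floating)
print(sorted(df['lat'].unique()))  # 47.5 ... 55.0 for the Germany box
print(sorted(df['lon'].unique()))  # 5.625 ... 15.0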

I could not check whether the same error occurred on another machine.

Other than that, thank you for writing this awesome notebook!

@Hossein-Madadi

@Tjarke, I also had the same problem. Thanks for your solution.
