-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean import of code for GDAL in ESMF. Compiles/runs in ESMF #1981
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are writing a serial interface - you should only call GDAL from io_root
emulate the netcdf serial interface. NetCDF files are read on io_root and then results are distributed to all tasks. By the way - what does your test look like? You will want to add it to the tests/cunit directory.
@@ -19,6 +19,8 @@ | |||
#include <netcdf.h> | |||
#include <netcdf_meta.h> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also include an #ifdef here.
@@ -949,6 +949,10 @@ PIOc_read_darray(int ncid, int varid, int ioid, PIO_Offset arraylen, | |||
if ((ierr = pio_read_darray_nc(file, iodesc, varid, iobuf))) | |||
return pio_err(ios, file, ierr, __FILE__, __LINE__); | |||
break; | |||
case PIO_IOTYPE_GDAL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need #ifdef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey. Yeah. I tried to set up an #ifdef. It's on the list of to-do
@@ -606,6 +608,13 @@ typedef struct file_desc_t | |||
* feature. One consequence is that PIO_IOTYPE_NETCDF4C files will | |||
* not have deflate automatically turned on for each var. */ | |||
int ncint_file; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#ifdef here too
Also you don't need to create a new PR with additional changes, just push them back to the same branch. How is the GDAL library available? Can it be obtained with apt-git? Just trying to figure out how to add it to the github testing workflow. |
GDAL is available in several forms. It is available in my Fedora distro via dnf as an RPM. The source is also available, and is not complex to compile. We have it on the NCCS machines here at NASA. |
@jedwards4b, the ESMF test results show that for a file with 4 points, all 4 points are being read correctly on io_root. The data being read are [1 2 3 4]. If I run on 4 procs, the local buffers are [1] [1] [1] [1]. If I run on 2 procs, the local buffers are [1 2] [1 2]. So it looks like MPI is sending/recving but that indexing isn't correct. |
If you could provide your test program I'll give it a try. Please remove reference to ESMF, let's take that library out of the equation for now. |
ESMF does all the calling to PIO, and it would be a complex task (for me) to uncouple. ESMF starts in Fortran, can then calls into C bindings, sets up arrays etc, and then calls PIO. I'll see if I can isolate the PIO-specifc pieces but we may have to iterate. |
Is there an existing test in PIO that I could modify for this? |
Yes look in the test directories, specifically in tests/cunit for a number of possible tests. test_darray.c might be a good one to try. |
I've added test_gdal.c in the most recent commit to github.com/msl3v/ParallelIO:msl3v/GDAL_clean |
Great thanks - I have a day full of meetings today, but I'll try to get to this as soon as I can. |
What version of gdal is required - my system has gdal/3.7.1 built with gcc - can I test with that? |
3.7.1 works. Here's my cmake cmd env CC=mpicc FC=mpifort CFLAGS="-g -fPIC" cmake -DCMAKE_INSTALL_PREFIX=../Install -Wno-dev -DPIO_ENABLE_TIMING=OFF -DPIO_ENABLE_EXAMPLES=OFF -DPIO_ENABLE_FORTRAN=OFF -DPIO_ENABLE_LOGGING=ON -DCMAKE_BUILD_TYPE=DEBUG -DNetCDF_C_INCLUDE_DIR=/usr/lib64/gfortran/modules -DNetCDF_C_LIBRARY=/usr/lib64/libnetcdf.so -DWITH_PNETCDF=off -DGDAL_INCLUDE_DIR=<path_to_gdal>/include -DGDAL_LIBRARY=<path_to_gdal_lib64>/libgdal.so ../ParallelIO |
The first problem is because your test doesn't run from the source directory and so it |
Interesting. It opens, queries and reads data from the file in my local version. Curious what's different |
Is it maybe that I'm running the test directly from tests/cunit? I don't see a build dir after cmake, make install, make tests. Is 'build' in this case simply the path to src? |
Looks like you found it - that's great! |
OK. The issue with the segfault was that the recvbuf was NULL. It was fixed with a simple line in test_gdal.c |
@jedwards4b 0 sendcounts[0] = 1 sdispls[0] = 0 sendtypes[0] = 27742288 recvcounts[0] = 1 rdispls[0] = 0 recvtypes[0] = 27745072 Is this telling me that only one data point is being received (line 1)? (addenum: this is associated with the call to rearrange_io2comp()) (addendum 2: the data vector at the end of the test is [1 0 0 0]. The data vector read from file is [1 2 3 4]) |
@jedwards4b I'm at a loss. I can keep trying but it's just not clear what the problem might be. Is it the case that the data from the serially-read vector are broadcast via the MPI_AlltoAllw() call in rearrange_io2comp()->pio_swapm()? |
Yes that's correct, I can try to have another look at this tomorrow. |
Is it the case, then, that the first line in the output above |
yes, I will try to look at this again after lunch. |
No hurry. I'm just banging away at the code & posting what comes up. |
@jedwards4b just an update: Still no progress. It looks like the iodesc->map might not be getting set up correctly. MPI doesn't know to send what element to which proc. ... maybe. But I still haven't interpreted what is actually happening here. I'll keep trying. |
I think that you should walk through the netcdf serial test step by step - it should be exactly the same as this one. |
@jedwards4b Also, in a totally vanilla version of ParallelIO, all darray and the test_simple tests fail on my laptop. So maybe that is itself part of the problem. :-/ |
That does seem like a problem. On my system all of the tests are passing except the gdal test. In the test_darray.c there is a loop over num_flavors at line 110. One of those flavors is netcdf-serial (PIO_IOTYPE_NETCDF). Your test should behave exactly the same as test_darray.c does with this flavor. |
Switched to another system. All tests pass (from remotes/origin/main). It seems to be a local MPI problem on my laptop. |
Yeah. It may simply be that my laptop openMPI was upgraded (to v 4.1.9) and no longer works with ParallelIO. Switched to MPICH and things seem to be working, including the GDAL read :-/ |
ESMF_TYPEKIND_R4. MSL
Need to diagnose parallelization issues.