
Issue with ADCIRC mesh and prep step #170

Open
uturuncoglu opened this issue Mar 9, 2023 · 25 comments

@uturuncoglu

@pvelissariou1 @moghimis I am creating this new issue since I have a problem running CDEPS+ADCIRC through CMEPS. As you already know, the CDEPS+ADCIRC configuration works without any issue with NUOPC connectors, but when I add CMEPS into the picture, I get the following error from ESMF:

20230308 155252.352 ERROR            PET499 /glade/work/epicufsrt/contrib/hpc-stack/src-intel2022.1/pkg/esmf-8.3.0b09/src/Infrastructure/Mesh/src/ESMCI_Mesh_Glue.C:5598 ESMCI_meshcreateredistelems() Internal error: Bad condition  - /glade/work/epicufsrt/contrib/hpc-stack/src-intel2022.1/pkg/esmf-8.3.0b09/src/Infrastructure/Mesh/src/Legacy/ESMCI_DDir.C, line:251:processor=499 could not find gid=4293786343 even though it's the processor that should contain it. It's likely that that gid isn't in the directory.
20230308 155252.353 ERROR            PET499 ESMCI_MeshCap.C:2160 MeshCap::meshcreateredistelems() Internal error: Bad condition  - Internal subroutine call returned Error
20230308 155252.353 ERROR            PET499 ESMF_Mesh.F90:3325 ESMF_MeshCreateRedist() Internal error: Bad condition  - Internal subroutine call returned Error
20230308 155252.353 ERROR            PET499 MED-TO-OCN:src/addon/NUOPC/src/NUOPC_Connector.F90:4338 Internal error: Bad condition  - Passing error in return code
20230308 155252.353 ERROR            PET499 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2734 Internal error: Bad condition  - Phase 'IPDv05p5' Initialize for connectorComp 1 -> 3: MED-TO-OCN did not return ESMF_SUCCESS
20230308 155252.353 ERROR            PET499 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:1882 Internal error: Bad condition  - Passing error in return code
20230308 155252.353 ERROR            PET499 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:460 Internal error: Bad condition  - Passing error in return code
20230308 155252.353 ERROR            PET499 UFS.F90:386 Internal error: Bad condition  - Aborting UFS
20230308 155252.353 INFO             PET499 Finalizing ESMF

After some discussion with the ESMF team members, we decided to check the_data%NdIDs and the_data%ElIDs. These need to be >= 1. The output of the debug code looks like the following:

350: the_data%ElIDs < 1 =            1    -3291031
350: the_data%ElIDs < 1 =            2    -3291032
350: the_data%ElIDs < 1 =            3    -3291033
350: the_data%ElIDs < 1 =            4    -3291926
350: the_data%ElIDs < 1 =            5    -3291927
350: the_data%ElIDs < 1 =            6    -3291928
350: the_data%ElIDs < 1 =            7    -3291929
350: the_data%ElIDs < 1 =            8    -3291931
350: the_data%ElIDs < 1 =           15    -3292871
350: the_data%ElIDs < 1 =           17    -3292873
350: the_data%ElIDs < 1 =           18    -3292874
350: the_data%ElIDs < 1 =           19    -3292875
350: the_data%ElIDs < 1 =           20    -3292876
350: the_data%ElIDs < 1 =           21    -3292877
350: the_data%ElIDs < 1 =           22    -3292878
350: the_data%ElIDs < 1 =           23    -3292879
350: the_data%ElIDs < 1 =           24    -3292880
350: the_data%ElIDs < 1 =           25    -3292881
350: the_data%ElIDs < 1 =           26    -3292882

It seems that this data is read from the PEXXXX/fort.16 files. Right? If so, maybe I have an issue with the ADCIRC prep step. I just wonder if you have the PEXXXX folders for the florence_hsofs.atm2adc case, so I could check mine against them and be sure that they are fine. At this point, I only have access to Orion and Cheyenne; if you have the folders, you could copy the files over there.

@uturuncoglu
Author

Okay. I am running the same case on Orion with the out-of-the-box CoastalApp (not under UFS) and checking the element IDs there. If I see negative numbers in that case too, then there could be an issue in the ADCIRC cap.

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 10, 2023 via email

@uturuncoglu
Author

Yes, it seems to work with connectors but not with the CMEPS mediator. There might be an issue in the ADCIRC cap or in the definition of the decomposition. The element IDs need to be positive numbers by definition. I am waiting to run the case at this point, so I'll update you.

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 10, 2023 via email

@uturuncoglu
Author

uturuncoglu commented Mar 10, 2023

@pvelissariou1 here is my run directory.

/work/noaa/nems/tufuk/CoastalApp-testsuite/work/florence_hsofs.atm2adc/run

I wrote the element IDs in there too. If you look at nems_log.out you will see that there are lots of negative IDs. I am not sure whether those are expected, but according to Bob (an ESMF developer), they need to be positive numbers.

I did the following to write those (in the create_parallel_esmf_mesh_from_meshdata() routine in ADCIRC/thirdparty/nuopc/adc_mod.F90):

        integer :: i
        ! Report any non-positive node IDs (valid IDs must be >= 1)
        do i = lbound(the_data%NdIDs, dim=1), ubound(the_data%NdIDs, dim=1)
           if (the_data%NdIDs(i) < 1) then
              print*, "the_data%NdIDs < 1 = ", i, the_data%NdIDs(i)
           end if
        end do
        ! Report any non-positive element IDs
        do i = lbound(the_data%ElIDs, dim=1), ubound(the_data%ElIDs, dim=1)
           if (the_data%ElIDs(i) < 1) then
              print*, "the_data%ElIDs < 1 = ", i, the_data%ElIDs(i)
           end if
        end do
        out_esmf_mesh=ESMF_MeshCreate(parametricDim=dim1, spatialDim=spacedim, &
            nodeIDs=the_data%NdIDs, nodeCoords=the_data%NdCoords, &
            nodeOwners=the_data%NdOwners, elementIDs=the_data%ElIDs, &
            elementTypes=the_data%ElTypes, elementConn=the_data%ElConnect, &
            rc=rc)

@uturuncoglu
Author

If you look at PE0498/fort.18 (I think the element IDs are read from it), there are negative numbers in there too. Is that expected? I think those files are created by METIS (or by the ADCIRC prep step). Right?
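As a quick illustration (not ADCIRC code), a small sketch of the check being done here: scan a subdomain's global element ID list and separate positive (locally owned) IDs from negative ones, assuming - as discussed in this thread - that a negative global ID marks a ghost element owned by another PE. The function name and data layout are hypothetical.

```python
# Hypothetical helper: split a subdomain's global element IDs into
# owned (positive) and ghost (negative, reported by magnitude) sets.
# Assumes the sign convention discussed in this thread.
def split_element_ids(global_ids):
    """Return (owned, ghost) lists from a list of signed global IDs."""
    owned = [gid for gid in global_ids if gid > 0]
    ghost = [-gid for gid in global_ids if gid < 0]
    return owned, ghost

# Example resembling the nems_log.out output above:
ids = [1201, 1202, -3291031, 1203, -3291032]
owned, ghost = split_element_ids(ids)
print(owned)  # [1201, 1202, 1203]
print(ghost)  # [3291031, 3291032]
```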

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 10, 2023 via email

@uturuncoglu
Author

@pvelissariou1 Which folder level can you see on Orion? I could change the permissions. BTW, is it normal to have negative numbers in PE0498/fort.18? If you don't mind, could you check it on your side?

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 10, 2023 via email

@uturuncoglu
Author

Okay. Let me know if you need anything from my side. Do you know the exact place where ADCIRC prep writes the fort.18 files? Maybe we could find something in there.

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 10, 2023 via email

@uturuncoglu
Author

@pvelissariou1 BTW, if you look at this code,

https://github.com/adcirc/adcirc/blob/b2a68685b57f4490a3818df9617e214cde3524f6/thirdparty/nuopc/adc_mod.F90#L402

There is a line the_data%NdIds = abs(the_data%NdIds), but there is no equivalent for the element IDs. So, I'll test adding the same for the element IDs to see what happens.

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 10, 2023 via email

@uturuncoglu
Author

@pvelissariou1 @moghimis Adding the following to adc_mod solved the negative number issue. At this point, I am not sure whether this is the correct fix, but it allowed me to get past the issue in CMEPS.

@@ -405,6 +419,7 @@ module adc_mod
         DO I=1, the_data%NumEl  ! MNE   
           READ(funit,*) the_data%ElIDs(I) ; 
         ENDDO
+        the_data%ElIDs = abs(the_data%ElIDs)

But then I hit another issue like the following:

20230311 202853.807 ERROR            PET049 ESMF_FieldRegrid.F90:4102 ESMF_FieldRegridGetArea Invalid argument  - Can't currently calculate area on a mesh location other than elements
20230311 202853.807 ERROR            PET049 med.F90:2323 Invalid argument  - Passing error in return code
20230311 202853.807 ERROR            PET049 med.F90:1686 Invalid argument  - Passing error in return code
20230311 202853.807 ERROR            PET049 MED:src/addon/NUOPC/src/NUOPC_ModelBase.F90:1639 Invalid argument  - Passing error in return code
20230311 202853.807 ERROR            PET049 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2577 Invalid argument  - Phase 'IPDv03p7' Initialize for modelComp 1: MED did not return ESMF_SUCCESS
20230311 202853.807 ERROR            PET049 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2408 Invalid argument  - Passing error in return code
20230311 202853.807 ERROR            PET049 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2192 Invalid argument  - Passing error in return code
20230311 202853.807 ERROR            PET049 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:463 Invalid argument  - Passing error in return code
20230311 202853.807 ERROR            PET049 UFS.F90:386 Invalid argument  - Aborting UFS
20230311 202853.807 INFO             PET049 Finalizing ESMF

This basically means that the ADCIRC mesh is created on node coordinates and ESMF could not calculate the area. Anyway, I did a trick on the CMEPS side to get rid of this error, but then I had another one:

20230311 204503.716 INFO             PET050 before med_map_RouteHandles_init
20230311 204503.717 INFO             PET050  (module_med_map: med_map_routehandles_initfrom_field): mapname fillv_bilnr
20230311 204503.717 INFO             PET050 atm to ocn srcMask =          0 dstMask =          0
20230311 204504.917 INFO             PET050 after  med_map_RouteHandles_init
20230311 204504.917 INFO             PET050  Map type fillv_bilnr, destcomp ocn,  mapnorm one  Sa_pslv
20230311 204504.917 ERROR            PET050 ESMCI_Array.C:1061 ESMCI::Array::create() Value unrecognized or out of range  - LocalArray does not accommodate requested element count
20230311 204504.917 ERROR            PET050 ESMCI_Array_F.C:79 c_esmc_arraycreatelocalarray() Value unrecognized or out of range  - Internal subroutine call returned Error
20230311 204504.917 ERROR            PET050 ESMF_ArrayCreate.F90:25443 ESMF_ArrayCreateLocalArray() Value unrecognized or out of range  - Internal subroutine call returned Error
20230311 204504.917 ERROR            PET050 ESMF_ArrayCreate.F90:2844 ESMF_ArrayCreateFrmPtr Value unrecognized or out of range  - Internal subroutine call returned Error
20230311 204504.917 ERROR            PET050 ESMF_FieldEmpty.F90:61322 ESMF_FieldEmptyCompGBPtr Value unrecognized or out of range  - Internal subroutine call returned Error
20230311 204504.917 ERROR            PET050 ESMF_FieldCreate.F90:5585 ESMF_FieldCreateGBDataPtr Value unrecognized or out of range  - Internal subroutine call returned Error
20230311 204504.917 ERROR            PET050 ESMF_FieldCreate.F90:25437 ESMF_FieldCreateMeshDataPtr Value unrecognized or out of range  - Internal subroutine call returned Error
20230311 204504.917 ERROR            PET050 med_map_mod.F90:902 Value unrecognized or out of range  - Passing error in return code
20230311 204504.917 ERROR            PET050 med.F90:1827 Value unrecognized or out of range  - Passing error in return code
20230311 204504.917 ERROR            PET050 MED:src/addon/NUOPC/src/NUOPC_ModelBase.F90:1639 Value unrecognized or out of range  - Passing error in return code
20230311 204504.917 ERROR            PET050 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2577 Value unrecognized or out of range  - Phase 'IPDv03p7' Initialize for modelComp 1: MED did not return ESMF_SUCCESS
20230311 204504.917 ERROR            PET050 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2408 Value unrecognized or out of range  - Passing error in return code
20230311 204504.917 ERROR            PET050 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2192 Value unrecognized or out of range  - Passing error in return code
20230311 204504.917 ERROR            PET050 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:463 Value unrecognized or out of range  - Passing error in return code
20230311 204504.917 ERROR            PET050 UFS.F90:386 Value unrecognized or out of range  - Aborting UFS

It seems that the fields on the ADCIRC cap side are created without explicitly setting the meshloc argument, and ESMF creates all fields on ESMF_MESHLOC_NODE by default. CMEPS basically assumes that everything is on ESMF_MESHLOC_ELEMENT and creates its internal ESMF Field Bundles accordingly. So, I think this creates an incompatibility between CMEPS and ADCIRC and leads to the failure.

I am planning to modify the ADCIRC cap to create the fields with the meshloc=ESMF_MESHLOC_ELEMENT argument. This might solve the problem, but I am not sure the fields will be consistent with the ADCIRC internal data structures and their dimensions. So, at this point we need to discuss the ADCIRC cap a little more and decide on our approach. We might also find a way to create the ADCIRC mesh differently, since it is created on nodes but CMEPS expects it on elements. I am not sure the existing input files used to create the mesh will allow us to create it that way.

I also checked WW3 and its unstructured mesh usage under UFS and CMEPS. It seems that WW3 creates its fields with meshloc=ESMF_MESHLOC_ELEMENT, so it can be used with CMEPS without any issue.
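To make the node/element mismatch concrete, here is a minimal, purely illustrative sketch (not the ADCIRC cap's code) of one way a cap could derive element-located values from ADCIRC's node-located data: average the nodal values of each triangle. The function name and data layout are assumptions for illustration only.

```python
# Illustrative sketch: map node-located data to element-located data
# by averaging each triangle's nodal values. Not ADCIRC/cap code.
def node_to_element(node_vals, conn):
    """node_vals: one value per node; conn: 3-node (0-based) triplets
    per triangular element. Returns one averaged value per element."""
    return [sum(node_vals[n] for n in tri) / len(tri) for tri in conn]

# Two triangles sharing an edge:
node_vals = [0.0, 3.0, 6.0, 9.0]
conn = [(0, 1, 2), (1, 2, 3)]
print(node_to_element(node_vals, conn))  # [3.0, 6.0]
```

Whether such an averaging step would be acceptable physically (e.g. for conservative mapping of fluxes) is exactly the kind of question raised in this discussion.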

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 13, 2023 via email

@uturuncoglu
Author

@pvelissariou1 I think even if we fix the element ID issue, we will still have an issue with the mesh and fields. I'll talk more with Mariana and others from the ESMF team to look for a possible approach to tackle it.

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 13, 2023 via email

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 13, 2023 via email

@uturuncoglu
Author

@pvelissariou1 @saeed-moghimi-noaa According to Bob (the main ESMF developer for the mesh and interpolation routines), taking the absolute value may not be the right approach. He suggested eliminating the ghost points (the negative element IDs) instead. Anyway, we will have a meeting with Mariana, Denise, and Bob about it on Monday to look at both the ADCIRC cap and CMEPS to find the best approach. I'll keep you updated.
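A toy example (illustrative only, with made-up IDs) of why abs() on ghost IDs can be risky in a distributed mesh: the same global element ID can then appear on more than one PE, whereas dropping ghosts keeps the global ID set unique.

```python
# Toy illustration (not ADCIRC code): assume PE0 owns element 42 and
# PE1 holds it as a ghost with ID -42, per the sign convention above.
pe0_ids = [41, 42, 43]
pe1_ids = [44, 45, -42]  # -42 is a ghost copy of PE0's element 42

# Taking absolute values makes element 42 appear on both PEs:
abs_union = sorted(abs(i) for i in pe0_ids + pe1_ids)
print(abs_union)  # [41, 42, 42, 43, 44, 45] -> 42 duplicated globally

# Dropping ghosts instead keeps the global ID set unique:
owned_union = sorted(i for i in pe0_ids + pe1_ids if i > 0)
print(owned_union)  # [41, 42, 43, 44, 45]
```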

@saeed-moghimi-noaa
Collaborator

Thanks @uturuncoglu . Good progress! Please let us know if inviting an ADCIRC developer might be helpful at any point.

@uturuncoglu
Author

@saeed-moghimi-noaa Thanks. Sure. I'll let you know about the initial discussion. If needed, we could have another meeting that includes an ADCIRC developer.

@uturuncoglu
Author

@pvelissariou1 @saeed-moghimi-noaa I had a call with Mariana (the main CMEPS developer), Denise (from NOAA-EMC, who has experience developing the WW3 unstructured mesh configuration), and Bob (from the ESMF team, the main person for meshes) about the issue that we had with the ADCIRC mesh. Here are the questions that need to be clarified:

  • It seems that ADCIRC defines the fields on nodes. Is this a requirement? Is it possible to create the fields on elements? Is there any flexibility in the ADCIRC cap to support this?
  • Is it possible to create the ADCIRC mesh without ghost cells/elements? This would eliminate the negative indices for elements and nodes and make the model more efficient. It seems they are not used by ESMF.
  • Is it possible to access the element corner coordinates? These could be used to create a dual mesh - basically swapping nodes with elements. For regional applications, you lose some information around the edges, but this can be tested too. We could use this approach to eliminate the mesh-related issue.

It seems that supporting node-located fields under the CMEPS mediator requires a lot of development, and according to my discussion with Mariana it is not a feasible option. So, the development needs to be done in the ADCIRC cap to make it run with CMEPS. Anyway, let me know what you think. If you want to arrange another meeting with the ADCIRC developers, that works for me, and we could discuss the issue further.
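A rough sketch of the dual-mesh idea from the third bullet above: use the element centroids of the original triangulation as the nodes of a new mesh, so that data defined per element of the original mesh becomes node-located on the dual. The centroid computation is the only part shown; the connectivity of the dual mesh (and the loss of information near the boundary) is left aside. Names and layout are illustrative.

```python
# Illustrative sketch: compute triangle centroids, which would serve
# as the node coordinates of a dual mesh. Not ADCIRC/cap code.
def element_centroids(node_coords, conn):
    """node_coords: (x, y) per node; conn: 3-node (0-based) triplets
    per triangular element. Returns one (x, y) centroid per element."""
    cents = []
    for tri in conn:
        xs = [node_coords[n][0] for n in tri]
        ys = [node_coords[n][1] for n in tri]
        cents.append((sum(xs) / 3.0, sum(ys) / 3.0))
    return cents

# Unit square split into two triangles:
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
conn = [(0, 1, 2), (1, 3, 2)]
print(element_centroids(coords, conn))
```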

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 21, 2023 via email

@uturuncoglu
Author

@pvelissariou1 I think we will have similar problems with SCHISM since it also defines its fields on nodes.

@pvelissariou1
Collaborator

pvelissariou1 commented Mar 21, 2023 via email
