Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement regions #175

Closed
nschloe opened this issue Dec 21, 2017 · 36 comments
Closed

implement regions #175

nschloe opened this issue Dec 21, 2017 · 36 comments

Comments

@nschloe
Copy link
Owner

nschloe commented Dec 21, 2017

Some data formats have the concept of regions/submeshes/domains, e.g., gmsh and Exodus. Think about how to best integrate this into meshio.

@gdmcbain
Copy link
Contributor

Reading about Exodus boundary conditions, I see that besides what it calls 'element blocks', corresponding to the above, it also has 'nodesets' and 'sidesets', the last two being useful for boundary and initial conditions, and 'sides' being sides of elements.
I can immediately envisage 'nodesets' being useful for Dirichlet conditions.

@gdmcbain
Copy link
Contributor

Exodus Element Blocks are stricter than Gmsh Physical Regions; whereas the latter encode only the continuous concept of subdomain, the former mixes in the discrete notion of type of element.

Element Blocks (also referred to as simply, Blocks) are a logical grouping of elements all having the same basic geometry and number of nodes. All elements within an Element Block are required to have the same element type. (ibid)

In a 'quad-dominant' mesh of a surface, this would necessitate two blocks, for the triangles and quadrilaterals. I can see computational advantages to this: the connectivity would be described by a rectangular matrix rather than a list. But it also seems like an admixture of concerns if one just wants to specify that a subdomain is of one material.

@cbcoutinho
Copy link
Contributor

cbcoutinho commented Dec 22, 2017 via email

@gdmcbain
Copy link
Contributor

Hmm. I was trying to second-guess why Exodus insisted on one type of element per block.
In which case each element has the same number of nodes and the array with entry [e, n] the global index of the n-th node in the e-th element would be 100% dense and perfectly rectangular, which I thought (at the risk of prematurely optimizing) might be more efficient to allocate and look up.
I didn't find an explicit statement of the motivation for the Exodus restriction yet, but thought it might be something like that.

@gdmcbain
Copy link
Contributor

As to the relation between this and what's currently in the meshio cell_data and field_data, I think they're like inverses, as discussed in the final comments yesterday on #169.

@nschloe
Copy link
Owner Author

nschloe commented Jan 30, 2018

Hmm. I was trying to second-guess why Exodus insisted on one type of element per block.
In which case each element has the same number of nodes and the array with entry [e, n] the global index of the n-th node in the e-th element would be 100% dense and perfectly rectangular,

I also think this will be the reason. This is by the way also what comes out of a meshio read: The cells a segregated by their element type so the connectivity data is just one big n-by-k array, where n is the number of cells and k the number of nodes per cell. This makes many things really convenient.

@nschloe
Copy link
Owner Author

nschloe commented Jan 30, 2018

Anyhow, on to regions. Any opinion yet on how to implement them? The possibilities I see right now are

  • An dict of arrays that assigns an integer value to each cell, à la
    regions = {'triangle': [0 ,0, 1, 0, ...], 'quad': [1, 0, 1, ...]}
  • A dictionary of indices into the cells dict
    regions = {'boundary': [0, 2, 3, 4, 5, ...], 'copper': [0, 1, 2, 3, 10, ...]}
    or
    regions = {'boundary': {'triangle': [0, ...]}, 'copper': {'triangle': [0, ...]}}

I generally like the second approach better because it allows to naturally associate a string with the regions and permits cells to be a member of an arbitrary number of regions (also none at all). Perhaps Exodus-style point sets could also be incorporated in the this approach with the 'point' keyword.

@dalcinl
Copy link
Contributor

dalcinl commented Jan 30, 2018

Your last one is promising, however there is something missing. In some formats (e.g. Gmsh), the region names are mapped to an integer tag. Maybe

regions = {
    'boundary': (int_tag_1, {'triangle': [0, ...]}), 
    'copper': (int_tag_2, {'triangle': [0, ...]}),
}

Also, maybe the region dict should be allowed to have integer keys ? (again thinking on Gmsh, where physical regions are always identified with an integer, and an optional name). Or maybe keys could be tuples ("name", integer_tag) or "name" or integer_tag, the three variants allowed ?

PS: At this point I'm wondering whether meshio should use a container class to hold its data, rather than having read() return many things.

PS: What we are trying to achieve made me remember this joke https://xkcd.com/927/.

@nschloe
Copy link
Owner Author

nschloe commented Jan 30, 2018

Yeah, we should definitely allow integer tags. Also, parallel to cell_data, names and cell types should be ordered like

regions = {'triangle': {'boundary': [0, ...], 8: [0, ...]}}

@dalcinl Do you know if it's guaranteed in gmsh that physical tags and geometrical tags are always different?

@nschloe
Copy link
Owner Author

nschloe commented Jan 30, 2018

Also, good idea about the container type. This would allow us to add more data without breaking the API (as we have to do now).

@dalcinl
Copy link
Contributor

dalcinl commented Jan 30, 2018

In gmsh, physical tags and geometrical tags are orthogonal concepts, they can certainly be the same. You can have let say a boundary face with physical tag 1 that comes from a geometrical entity with tag 1.

PS: So is the tuple[str,int] or str or int not good enough for your?

@dalcinl
Copy link
Contributor

dalcinl commented Jan 30, 2018

Also, eventually you need something like a metadata entry where you can stash stuff that are too much specific to a mesh format, something like metadata = {'format': 'gmsh', stuff, stuff, stuff}, so that you can implement a roundtrip like write('mesh2.msh', *read('mesh1.msh')) and get a mesh2.msh file that matches as much as possible what is inside mesh1.msh without loss of information.

@nschloe
Copy link
Owner Author

nschloe commented Jan 30, 2018

Also, eventually you need something like a metadata entry

All good suggestions. This should probably be made a separate issue.

About the different gmsh tags: Tuples as keys is certainly possible, but seems a bit wasteful. The fact that there are multiple different tag types appears to be very specific to gmsh. A more natural approach seems to be to use

'physical{}'.format(k)
'geometrical{}'.format(k)

as keys. Optionally, the physical list could be replaced by their respective entris in $PhysicalNames.

@dalcinl
Copy link
Contributor

dalcinl commented Jan 30, 2018

Well, to add complications, gmsh physical regions can have names, I submitted a PR some time ago adding support for them. IMHO, Gmsh's geometrical tags are rather useless in a mesh, they are sort of references to the tags of geometrical entities in the *.geo file, abd they would be a good candidate for stashing in a metadata entry.

@nschloe
Copy link
Owner Author

nschloe commented Jan 30, 2018

Well, to add complications, gmsh physical regions can have names

No problem, I got this covered. I've now added gmsh region reading in this branch; will see about writing now. Feedback welcome.

@gdmcbain
Copy link
Contributor

PS: At this point I'm wondering whether meshio should use a container class to hold its data, rather than having read() return many things.

Yes, so at the moment I have something like this in my unpublished finite-element module based on meshio and pygmsh. (At least, I hadn't heard of "container class" before but I think I get the idea.)

class Mesh(object):

    dimension = NotImplemented

    def __init__(self, points, cells,
                 point_data=None, cell_data=None, field_data=None):

        self.points, self.cells = points, cells

        self.point_data = {} if point_data is None else point_data
        self.cell_data = {} if cell_data is None else cell_data
        self.field_data = {} if field_data is None else field_data
    
    @classmethod
    def from_geo(cls, geometry: Geometry, *args, **kwargs):
     
        return cls(*generate_mesh(geometry, dim=cls.dimension, *args, **kwargs))

Is that what you meant? If meshio had such a class, I would use it or subclass it, I think, rather than catching the multiple values returned by read (via generate_mesh) in my classmethod constructor.

@nschloe
Copy link
Owner Author

nschloe commented Jan 30, 2018

Let's not mix discussions. I've just opened an issue for the container discussion at #198.

@nschloe
Copy link
Owner Author

nschloe commented Jan 31, 2018

Alright, the regions now kind of work in the branch, but I'm not sure if I'm a hundred percent happy with it. The design is now such that the reader will return something like

regions = {
    'triangle': {
        'physical-1-copper': [...],
        'physical-5-iron': [...],
        'geometrical-8': [...],
        }}

from a gmsh mesh. The lists are indices in the cell array. When writing, you supply such an object to the writer, and tags and physical names are extracted from the keys.

While I think the above format is most useful for applications, I cannot see a mesh format that actually implements tags like that. (Please correct me if I'm wrong.) Even gmsh demands one and only one tag to be present for each cell in both physical and geometrical groups. I don't see much difference to a cell data array with int dtype. This is also how you're supposed to handle subdomains in VTK.

I'm wondering now if we should implement pseudo-regions simply as cell data arrays with a specific name, e.g., 'region-<something>'.

@gdmcbain
Copy link
Contributor

I'm not so keen on packing metadata into strings. @dalcinl proposed more structured tags, which generally seems more Pythonic.

Another idea is to store the metadata hierarchically. What's in master at the moment is hierarchical, where in cell_data, each type of cell ('vertex', 'line', …) has a dict with keys 'physical', 'geometric'. Here in #175 we're talking about inverting this, similar to the discussion ‘tags in VTK’ with Andy Bauer cited above, but a hierarchy could be used that way too; e.g. (without having thought this through), in field_data, instead of a flat dict keyed by user-level names of regions (as in master), put a 'physical' key there (and 'geometrical' if desired, though I agree with the assessment of @dalcinl that they're ‘rather useless’).

@gdmcbain
Copy link
Contributor

More generally, doesn't the current field_data fulfil what @dalcinl proposed for 'metadata'? What's the difference? Except that at the moment, the keys are user-level names of regions of various dimension, but couldn't those be pushed down one level?

@nschloe
Copy link
Owner Author

nschloe commented Feb 1, 2018

I'm not so keen on packing metadata into strings. dalcinl proposed more structured tags, which generally seems more Pythonic.

Well, Gmsh want's to store a string and an integer with each region. I guess that's what you get when you settle for a format like that, but it isn't pretty. I'm not a fan of making a tuple a key in dictionary, but, as suggested above, what we could do is simply store two separate things:

  • The regions as int-valued cell data arrays.
  • The string data as a dictionary (as part of field data)

I think that's more or less what we had done earlier anyways. Advantages of this:

  • The data structure maps 1:1 what gmsh does in physical and geometrical data.
  • It seamlessly integrates in our cell data array; no need to change the API.
  • Cell data can be written to almost all formats, so information isn't lost in translation.

Disadvantages:

  • Each cell needs to have and only one value for each region array. This is fine for materials (after all, each cell is made of one), but you cannot simply mark a particular boundary set or so.
  • The Gmsh standard actually allows for marking more than physical and geometrical regions, but that's not covered here. I haven't seen more markings than those two anywhere in the wild though.

What do you think?

@dalcinl
Copy link
Contributor

dalcinl commented Feb 1, 2018

@gdmcbain By metadata I mean a special dict of something that is very specific to the format that you are reading from or writing to. For example, if you read gmsh and next write vtk, the vtk writer can just ignore this metadata, but if you write back gmsh, then you honor it. The use case is doing some pre-processing on the input mesh data, and writing it back in the same format. Of course, it is up to the user to doing the read-transform-write to update the metadata if needed.

@dalcinl
Copy link
Contributor

dalcinl commented Feb 1, 2018

@gdmcbain @nschloe Can you define what field_data means for you? Maybe yes, field_data is what I called metadata, but I'm not sure whether that's the actual meaning.

@nschloe
Copy link
Owner Author

nschloe commented Feb 1, 2018

@dalcinl Remember we're discussing regions here. If it's about the a container class, please move you're contributions to the appropriate issue.

@dalcinl
Copy link
Contributor

dalcinl commented Feb 1, 2018

Yes, I know. Should be regions (or part of the info about them) be considered metadata that is too much format-specific?

@nschloe
Copy link
Owner Author

nschloe commented Feb 1, 2018

First we have to decide how to do regions. If we decide to just wrap them as part of cell data (see suggestion above), we hardly need any meta data.

@dalcinl
Copy link
Contributor

dalcinl commented Feb 1, 2018

Perhaps cell_data is too much generic for enconding regions? For me, point_data and cell_data means scalar or vector fields on either points or cells, in the VTK sense. The gmsh format also support point and cell data (real values, not integers!), but confusing/mixing such data with regions was the root cause of my initial complaints.

If you still want to encode regions with cell data, that's fine, but we need to devise a mechanism to distinguish integral values (for gmsh regions, aka physical/elementary tags) from scalar/vector data (for gmsh NodeData or ElementData).

@nschloe
Copy link
Owner Author

nschloe commented Feb 1, 2018

Perhaps cell_data is too much generic for enconding regions?

Generally you're right here. The thing is that Gmsh doesn't make use of the freedom: Both physical and geometrical data are simply integer arrays over cells, just like cell_data. The gmsh standard does actually allow for more flexibility by tagging, but I've never seen this used anywhere.

nut we need to devise a mechanism to distinguish integral values (for gmsh regions, aka physical/elementary tags)

That should be easy: The two region values of gmsh are called physical and geometrical.

@dalcinl
Copy link
Contributor

dalcinl commented Feb 1, 2018

What's in a name? :-) What would happen if you read from a VTK file with a cell data entry named physical with let say some scalar and vector field, and then you want to write out a gmsh file?

Anyway, you seem to suggest going back to previous approach where physical and geometrical were dumped in gmsh files as integer tags, and not as now in the $ElementData section. I'm certainly not opposed to that, I already said that IMHO the current behavior is a regression. However, maybe I would suggest to somehow namespace these special cell data names, let say something like gmsh:physical and gmsh:geometrical (or maybe region:physical to make it more format-independent), so now the gmsh writer can have some confidence that the data is actually meant to be dumped as region tags. With a bit of namespacing on the keys for cell data, I think you can go back to the original approach of using cell_data, and forget about a special treatment for regions.

@gdmcbain
Copy link
Contributor

gdmcbain commented Feb 2, 2018

Thanks for clarifications. This is all sounding O. K. to me as far as it goes, but…

I've just realized why I hadn't been bitten by the change as @dalcinl was this time, whereas I had been back in #169. I don't ask meshio to write .msh files, only read them, and that implicitly by using pygmsh.generate_mesh; the hidden temporary .msh files being written by Gmsh in a subprocess. Thus far, I'm working directly with the five objects returned by generate_mesh (i.e. by meshio.read) and haven't needed to serialize the mesh; if I did, and chose meshio.write and Gmsh .msh for the purpose, I would have hit this problem, yes. I would certainly have expected the physical tagging written by pygmsh and Gmsh into the $Element block to have persisted there.

So this isn't really about what I thought it was when this issue was launched in December; i.e. whether the data structure for the regions should be inverted. This sounds more like #197, but I see that the discussion from there was moved back here.

To return it for the moment to the original question of which way the looking-up of regions should go, I see that Gmsh in the MSH format in the $Element block stores it as element -> region. And if it were in $ElementData it would still be nearly the same: elm-number -> region, 'elm-number' being the index into the list or array of elements. This means that if I only want to integrate over a region, I need to filter either of those look-ups to get the elements or elm-numbers that lie in the region of interest. The intention of the inverse was to have dicts going the other way: region -> elm-number, so that such a traversal would be more direct. I believe, as mentioned above, that this is what Exodus's 'element blocks' do, but studying the spec for Gmsh MSH again, I can't see any way to encode a mapping in that direction without inverting it. It also sounds like VTK has no support for region -> elm-number either. Is Exodus on its own? Anyway, without support in Gmsh or VTK, it's maybe not worth having inverting the look-up read from the $Element block on the way in as one is just going to have to undo that on the way out in most cases.

@nschloe
Copy link
Owner Author

nschloe commented Feb 2, 2018

@gdmcbain Thanks for your comment! If I would sign all of this I could.

For my own applications, I would rather need region -> elm-index, but as you point out correctly, this is unfortunately not what mesh formats implement. Granted, Exodus node sets and side sets is approximately that, but only for, well nodes and "sides". I don't know why they make this restriction. I'm almost inclined

standards

Okay, I think this is how it's gonna go:

  • Implement regions as integer cell data. As for gmsh's groups, I like @dalcinl's suggestion to prefix the name; 'gmsh:physical' and 'gmsh:geometrical' are probably good enough.

  • If there ever comes a file format that does region -> element-index (or the need arises for Exodus node, side sets), we can always implement that as proper regions.

@nschloe
Copy link
Owner Author

nschloe commented Feb 2, 2018

Alright, thanks everyone for hanging in there. I've just published 1.11.7 with the proposed fix.

@nschloe nschloe closed this as completed Feb 2, 2018
@gdmcbain
Copy link
Contributor

gdmcbain commented Feb 5, 2018

What I'm thinking more generally, trying to not to be Gmsh-centric, is that some input or output formats will store regions as element -> region and some as region -> element. (I'm not imagining any other possibilities than these two, though I did try to.)

It's true that we don't know of any mesh formats that support region -> element-index with elements being allowed to belong to more than one region, Exodus II with its Element Blocks does more work in the direction of region -> element than element -> region.

Another reference for the format (especially ‘Anatomy of an Exodus II file’ in §3):

  • Kostka, T. D. (2013, January). Exomerge user's manual: A lightweight Python interface for manipulating Exodus files. Technical Report SAND2013-0725, Sandia National Laboratories, Livermore, California.

But anyway, not tying meshio too closely to any existing format, I wondered about whether the format-neutral data-structure that meshio builds and populates when called to read and uses when called to write, what if it had both possibilities?

  1. element -> region, say implemented as at present in cell_data[type]['gmsh:physical'] from Regions naive #199
  2. region -> element, not yet implemented but maybe with a new dict cell_data['regions'].

Then in meshio.*_io.read, if reading:

  1. a type 1 format such as Gmsh MSH, populate cell_data[type]['gmsh:physical']
  2. a type 2 format, populate cell_data['regions']

Then in meshio.*_io.write, if we have

  1. cell_data[type]['gmsh:physical'] and we have to write

    1. a type 1 format, that's easy, already implemented
    2. a type 2 format, do the inversion. Go through the cell_data[type]['gmsh:physical'] and construct the maps region -> element
  2. cell_data['regions'] and we have to write

    1. a type 1 format, invert
    2. a type 2 format, it should be straightforward.

I imagine not usually having both cell_data[type]['gmsh:physical'] and cell_data['regions'] in the same cell_data dict.

@gdmcbain
Copy link
Contributor

gdmcbain commented Feb 5, 2018

I was disappointed to learn by way of the above that

Even gmsh demands one and only one tag to be present for each cell in both physical and geometrical groups.

but indeed, from Gmsh's MSH ASCII file format:

By default, the first tag is the number of the physical entity to which the element belongs; the second is the number of the elementary geometrical entity to which the element belongs; the third is the number of mesh partitions to which the element belongs, followed by the partition ids (negative partition ids indicate ghost cells). A zero tag is equivalent to no tag. Gmsh and most codes using the MSH 2 format require at least the first two tags (physical and elementary tags).

That's quite a shortcoming. I wondered what happens if two physical entities are defined that overlap. I tried this with:

Point (1) = {0, 0, 0, 1};
Point (2) = {1, 0, 0, 1};
Point (3) = {2, 0, 0, 1};

Line (1) = {1, 2};
Line (2) = {2, 3};

Physical Line("left", 1) = {1};
Physical Line("all", 2) = {1, 2};

and found that Gmsh makes two lots of elements over 0 < x < 1.

$MeshFormat
2.2 0 8
$EndMeshFormat
$PhysicalNames
2
1 1 "left"
1 2 "all"
$EndPhysicalNames
$Nodes
3
1 0 0 0
2 1 0 0
3 2 0 0
$EndNodes
$Elements
3
1 1 2 1 1 1 2
2 1 2 2 1 1 2
3 1 2 2 2 2 3
$EndElements

The first two elements here have the same endpoints.

I suppose in practice (and I'm very surprised that I haven't hit this yet in my work) that what one would have to do is define three separate physical entities for the intersection and two relative complements and then use the union of the intersection and one of the two relative complements. Not nice.

@nschloe
Copy link
Owner Author

nschloe commented Feb 5, 2018

Thanks Geordie for the concise write-up!

what if it had both possibilities?

Yeah that'd be nice to have. I believe it'd make sense to wait for an actual use case of this and then design meshio around that. Having cell_data['regions'] will certainly a possibility, another is regions[name], but really we don't have to decide that before a user comes along with a mesh format that supports that.

That's quite a shortcoming. I wondered what happens if two physical entities are defined that overlap. I tried this with: [...]

Very interesting discovery! I'm surprised that not even Gmsh itself makes proper use of its tags. That makes me wonder why they didn't use $ElementData, or why that mesh format was created in the first place when VTK & friends were already around. Well, well...

@gdmcbain
Copy link
Contributor

gdmcbain commented Sep 7, 2018

I wondered what happens if two physical entities are defined that overlap.

This was touched on by C. Geuzaine (2018-09-06) as one of the motivations for the introduction of MSH4 #280.

The handling of physical groups in version 2 had numerous issues (see https://gitlab.onelab.info/gmsh/gmsh/issues/94). Two of the most serious were that

  • if you defined more than 1 group for a given geometrical entity, the elements would be duplicated
  • storing the physical group for each element was wasteful (as many integers as elements, instead of a single one)

These points might guide #289.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants