GNIP-51: GeoNode 4 #3228

Closed
pjdufour opened this issue Aug 23, 2017 · 28 comments
Labels
gnip A GeoNodeImprovementProcess Issue

Comments

@pjdufour
Member

pjdufour commented Aug 23, 2017

GeoNode 4

At the recent FOSS4G 2017, the community discussed our vision for GeoNode 3. Below you will find a document that describes a few of the principles we identified as important to adopt in the next major release of GeoNode. Take a read!

https://github.com/GeoNode/geonode-vision/blob/master/geonode-vision.md

This document is only a draft and we'd appreciate your input. You can provide feedback in the comments below or edit the document if there's agreement. Thanks!

@pjdufour pjdufour added the gnip A GeoNodeImprovementProcess Issue label Aug 23, 2017
@tomkralidis
Member

Looks good! Initial comments:

  • can we clarify fresh CSW?
  • should GeoNode be Python 3 only?

@pjdufour
Member Author

pjdufour commented Aug 24, 2017

With regard to "fresh CSW": if I recall correctly, GeoNode at one time supported GeoNetwork and other CSW backends, but right now pyCSW is the only recommended backend. That legacy code had prevented deep integration of permissions, users, etc. into our pyCSW metadata services. IMHO, we should generate CSW directly from the models as an API service, instead of the current flow where we save ISO 19115 XML directly into the model and then translate. I'd be in favor of explicitly promoting pyCSW as a top-level component, but we haven't discussed it yet.

With regard to Python 3, that's a good question! I don't know.

@tomkralidis
Member

Some clarifications:

  • we do generate CSW directly from the models. We additionally store a synced static ISO 19139:2007 document for the CSW workflow where a client asks for ISO with a full element set (specification requirement)
  • there's nothing stopping deep integration of permissions or users. pycsw is extensible such that repository plugins can handle this transparently
  • there is room for improvement in the pycsw GeoNode plugin query functionality. Currently, pycsw translates the CSW query into an SQL where clause. This would be better implemented by the plugin interpreting the CSW query (pycsw passes a dict of the CSW query into plugins) into its native query syntax (Django query, ES, SOLR, etc.). This would play nicer with users/permissions native to Django
  • what do we mean by top-level component?
  • I'd be in favour of Python 3 only to take advantage of its respective improvements
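
To illustrate the plugin-side translation mentioned above (interpreting the parsed query instead of building an SQL where clause), here is a minimal sketch in plain Python. The constraint layout (`op`, `field`, `value`, `args`) and the names `to_predicate` and `visible_records` are hypothetical and are not pycsw's actual plugin interface; a Django-backed plugin would presumably build ORM filter expressions with the same kind of recursion rather than Python callables.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]


def to_predicate(constraint: Dict[str, Any]) -> Predicate:
    """Recursively turn a parsed query constraint into a Python predicate.

    The constraint format is an assumption made for this sketch.
    """
    op = constraint["op"]
    if op in ("and", "or"):
        parts = [to_predicate(c) for c in constraint["args"]]
        combine = all if op == "and" else any
        return lambda rec: combine(p(rec) for p in parts)
    field, value = constraint["field"], constraint["value"]
    if op == "eq":
        return lambda rec: rec.get(field) == value
    if op == "like":
        # Case-insensitive substring match.
        return lambda rec: str(value).lower() in str(rec.get(field, "")).lower()
    raise ValueError(f"unsupported operator: {op}")


def visible_records(records: List[Record], constraint: Dict[str, Any], user: str) -> List[Record]:
    """Apply the query and a simple per-record permission check in one pass."""
    matches = to_predicate(constraint)
    return [r for r in records if matches(r) and user in r["viewers"]]


records = [
    {"title": "Roads 2017", "type": "dataset", "viewers": {"alice", "bob"}},
    {"title": "Rivers", "type": "dataset", "viewers": {"alice"}},
    {"title": "Road map", "type": "map", "viewers": {"bob"}},
]
query = {"op": "and", "args": [
    {"op": "like", "field": "title", "value": "road"},
    {"op": "eq", "field": "type", "value": "dataset"},
]}
result = visible_records(records, query, user="bob")
print([r["title"] for r in result])
```

Because the query and the permission check run in the same place, nothing that the user cannot see ever leaves the plugin, which is the benefit the bullet above hints at.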

@pjdufour
Member Author

  • By "top-level" I meant we should make pyCSW a required component rather than optional.
  • My thinking is that with a fresh codebase, we could build much deeper pyCSW integration.
  • Also, FWIW, caching the CSW directly in the model has led to issues when the IP address changes, etc. IMHO, it would be better not to cache it at all.
  • We shouldn't need to rely on updatelayers as much as we do now to fix synchronization issues.

@tomkralidis
Member

  • we'll also want to address CSW transactions (which would get easier with deeper integration) and CSW 3.0 by default (it's configurable in the pycsw API)
  • +1 to iron out synchronization

More comments/thoughts:

  • should we consider MapServer or Mapnik as a data provider option?

@afabiani
Member

Hi all, thanks for putting together this proposal. Overall it looks very good to me.
The points I would like to highlight, and possibly discuss further, are:

  1. The core of GeoNode 3 should be oriented more toward geospatial data than toward the simple concept of a Layer. A geospatial dataset may produce more than one Layer, and Layers can also be produced by analysis and processing.

  2. GeoNode 3 should be better at implementing and correctly managing workflows. This matters from both the users' and the maintainers' perspective. From the users' perspective, there is often a need to better manage the data flow: establishing who can publish data, who can check its quality, and who can access the results when ready. From the developers' perspective, there is often a need to introduce more checks on the data uploaded by users and, where necessary, to pre-process it before ingesting it into the system. It would also be very useful to allow GeoNode 3 to manage and follow remote processing.
    Generally speaking, the overall architecture should move in a direction where the GeoNode core is completely independent of the geospatial backend.

  3. Multi-tenancy support and clustering. The GeoNode 3 architecture should be ready for cloud deployment and for use by big organizations that need to tweak the portal, both in layout and in available functionality, according to the connected Group / User.

  4. Integration of DevOps tooling to have more control over access to the portal. It is very important to be able to understand who is doing what and, where necessary, to limit unwanted activity on the portal.

  5. Pluggable and extensible metadata system. Better to rely on the power of tools like pycsw and GeoNetwork to let GeoNode 3 support dynamic metadata forms.

@pjdufour
Member Author

Thanks for the feedback @afabiani! Below are comments/questions:

  1. My suggestion in the past has been the addition of a "collection" or "project" model that would encapsulate a set of layers, documents, and maps. The project could be manually created or automatically created. As mentioned, the UI could be quite different depending on the instance. Would that satisfy your requirement?

  2. We should certainly add the necessary hooks to the core for workflow management (database fields and APIs), but I wonder how much of this is more on the UI/UX side.

  3. We can certainly add a principle on multi-tenancy support and clustering. I think we all agree with that.

  4. Could you clarify? An ELK server could satisfy demands for continuous monitoring.

  5. What are dynamic metadata forms?
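
As a rough illustration of the "collection" or "project" idea in point 1, the sketch below groups layers, documents, and maps under one container using plain dataclasses. The class and function names (`Resource`, `Project`, `auto_project`) are hypothetical and do not correspond to existing GeoNode models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    name: str
    kind: str  # "layer", "document" or "map"


@dataclass
class Project:
    """Hypothetical container that encapsulates a set of resources."""
    title: str
    resources: List[Resource] = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        self.resources.append(resource)

    def of_kind(self, kind: str) -> List[Resource]:
        return [r for r in self.resources if r.kind == kind]


def auto_project(title: str, resources: List[Resource]) -> Project:
    """Create a project automatically, e.g. for everything produced by one upload."""
    project = Project(title)
    for resource in resources:
        project.add(resource)
    return project


flood = auto_project("Flood response", [
    Resource("flood_extent", "layer"),
    Resource("situation_report.pdf", "document"),
    Resource("overview", "map"),
])
print([r.name for r in flood.of_kind("layer")])
```

The same container can be filled by hand through `add`, which covers the "manually created" case from the comment.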

@capooti
Member

capooti commented Aug 24, 2017

Love all of the ideas behind this GNIP.
I agree with @afabiani on the need for a container of data (it could also be a project model, as @pjdufour defines it).
Workflow management could be handy, but I agree with @afabiani that it should come as a separate block.
Lots of good ideas here; I hope I will be able to find the time to work on it :)

@pjdufour
Member Author

I put together an image of the architecture we discussed at the code sprint.

https://github.com/GeoNode/geonode-vision/blob/master/GeoNode3_Vision_Architecture.jpg

@tomkralidis
Member

Thanks @pjdufour. How can we edit the diagram? Suggest we add CSW and OWS data services to articulate GeoNode's support for SDI and standards.

@pjdufour
Member Author

pjdufour commented Aug 25, 2017

I looked for a Hackpad-like app for diagrams but couldn't find anything good, so I ended up using a Google diagram. I'll add you to it.

@francbartoli
Member

Thanks all for the discussion! A few thoughts:

  • General:
    1. I love the diagram, which gives a quick view of the architecture of the next major version, but I'd prefer to list what GeoNode is and what it isn't, maybe in a meaningful format like Gherkin, which is also useful for BDD

      Feature: Backend agnostic configuration
          In order to ingest spatial dataset
          As a maintainer
          I want to be able to have abstract configuration for multiple backends
      
          Scenario:
              Given there is a spatial dataset somewhere
              And multiple backends configuration for storing spatial data
              When somebody ingests a dataset
              And any target backend has not been chosen
              Then I see the dataset ingested in all of them
      

      It's just an example, I'm not quite sure it is all formally correct.

    2. The more we open this exercise to everybody, including people who are not developers, the more we open our minds to features that are business-oriented rather than technically oriented. Duplication is not an issue right now; we can triage duplicated features/scenarios later

    3. Be as generic as possible, with no reference to user interface interaction at all

  • Technical:
    1. Minimal core with only a RESTful API, on the principle that anything that can be done from a GUI should also be doable by a machine
    2. Geospatial backend providers can be local or remote and should be modelled as Django models. They can vary: classic geospatial engines (GeoServer, MapServer, QGIS Server, ArcGIS) with all their complexities, big data stores (GeoMesa, GeoWave, etc.), but also simpler ones like plain files (GeoJSON) or DB storage (e.g. GeoDjango)
    3. I fully agree with the suggestions from @afabiani, in particular:
      • Introduce workflows: any ingestion should trigger a workflow, which can be more or less complicated and is composed of jobs, each defined by atomic tasks. Preprocessing, data cleaning, validation, and transformation are just a few examples
      • Introduce the concept of Dataset/Resource, where a single resource can lead to multiple layers; different workflows can be applied to a Dataset and to its individual layers
      • A dataset can be the output of any Web processing service/API that has a spatial component (any geometry with coordinates and a declared CRS), whether clearly defined or hidden. This spans from WPS (local, remote) to generic Web APIs, passing through all the intermediary file formats supported by GDAL
      • Metadata are crucial: if a Dataset already has its own metadata, they can be part of the workflow in terms of checks, validation, compliance, etc., and any action performed on the data has to be added to the metadata. If it doesn't, at least a minimal set of metadata fields has to be created to track the tasks performed by GeoNode
      • Multitenancy should be supported with a more structured, "organisation-centric" design for user and permission management:
        • Account Management ==> {Organisation,Role,Group,User}
        • Permission Management should rely on roles
        • There should be a dedicated Django web API app for this purpose and for single sign-on control, while social login should be supported to the largest possible extent, along with all use cases for passwordless login, password management, forgotten passwords, etc.
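
To make the Gherkin scenario and the workflow idea above a little more concrete, here is a minimal sketch in plain Python: an ingestion runs a list of atomic tasks and then stores the dataset in every configured backend when no target is chosen. All names here (`MemoryBackend`, `ingest`, `validate`, `record_provenance`) are hypothetical stand-ins, not part of GeoNode or of any backend's API.

```python
from typing import Any, Callable, Dict, List, Optional

Dataset = Dict[str, Any]
Task = Callable[[Dataset], Dataset]


def validate(ds: Dataset) -> Dataset:
    # Atomic task: reject datasets without a declared CRS.
    if not ds.get("crs"):
        raise ValueError("dataset has no declared CRS")
    return ds


def record_provenance(ds: Dataset) -> Dataset:
    # Atomic task: track what was done in a minimal metadata field.
    history = list(ds.get("history", [])) + ["validated"]
    return {**ds, "history": history}


class MemoryBackend:
    """In-memory stand-in for a geospatial backend (hypothetical interface)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stored: List[Dataset] = []

    def store(self, ds: Dataset) -> None:
        self.stored.append(ds)


def ingest(ds: Dataset, tasks: List[Task], backends: Dict[str, MemoryBackend],
           target: Optional[str] = None) -> List[str]:
    """Run the workflow, then store in the target backend, or in all if none is chosen."""
    for task in tasks:
        ds = task(ds)
    chosen = [backends[target]] if target else list(backends.values())
    for backend in chosen:
        backend.store(ds)
    return [backend.name for backend in chosen]


backends = {"files": MemoryBackend("files"), "postgis": MemoryBackend("postgis")}
used = ingest({"name": "roads", "crs": "EPSG:4326"}, [validate, record_provenance], backends)
print(used)
```

Keeping tasks as plain functions over a dataset description means a workflow is just an ordered list, which would make it straightforward to apply different workflows to a Dataset and to its individual layers.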

@jondoig
Contributor

jondoig commented Aug 31, 2017

Exciting ideas! We would second the need for data containers or projects, but urge that this be as flexible as possible and look towards a linked open data approach, using RDF to build relationships between datasets and possibly even between features across datasets. @rob-metalinkage may have advice on how to approach this within the proposed architecture.

@rob-metalinkage

The road map looks good IMHO and the discussion directions here are interesting. FYI I'm working on MapStory, a GeoNode project, and on its future directions, and have a couple of comments:

  1. strongly agree the "layer" paradigm is limited - often data-manager centric not user needs centric. Users generally need to know the relationships between layers. ISO metadata is extremely poor in this regard - agree with comments it should be a view on a richer model.

  2. Augmentation of metadata with custom elements can be done, as suggested, with static files using the target schema - this will be fragile if you try to extract anything more interesting from such data however. The approach I am using (which does not need to be core but should be considered as an example of a pluggable approach) is to use semantic models and create opportunities of rich reasoning to build the types of relationships and customised metadata views needed (using managed rules as just more types of content)

  3. A key part of metadata (also extremely poorly handled by both standards and REST API environments) is "what are the valid values in an attribute in a dataset". This is in fact the key to being able to link between features in individual layers. Exploiting this aspect of metadata about layers allows one to build a Linked Data UI for the individual features. (Note this is not Linked Data as in "turn everything to RDF in a bucket" but rather "build me a link to the service API endpoints that have specific data about the object I am looking at".)

This stuff is all nascent but in active development, and I hope to have the early backend betas integrated into the MapStory UI before the end of the year, so the intent is more visible.

The starting point is at https://github.com/rob-metalinkage/django-gazetteer, but feel free to contact me for a more detailed walkthrough.

@Coop56
Contributor

Coop56 commented Nov 28, 2017

Is there a planned date when work will start on GeoNode 3?

@francbartoli
Member

Not yet, @Coop56. Do you want to start a thread on the dev mailing list?

@gamesbook
Contributor

Has this work been continued elsewhere? Should this issue be closed now?

@francbartoli
Member

@gamesbook for now there is just an empty repo, which is supposed to be for an OpenAPI v3 model

@ingenieroariel
Member

Personally, I tried to give this problem a go but was not able to make a dent. The main issue is the discrepancy between describing what is there (not a good idea) vs describing something that is not a pipe dream and is actually achievable in 6-12 months with 3-4 people (I was not able to deal with the complexity).

To stay productive, I rethought GeoNode around only the core upload / permissions / download problem and, just like we did in 2009, looked at existing robust software packages to solve those problems. I settled on the following 3 tools:

  • Minio for raw data storage (private AWS S3)
  • ORY Hydra, Keto and Oathkeeper for permissions (private AWS IAM)
  • Nginx with small Lua functions (private AWS Lambda)

What I am doing now is writing OpenAPI 3 definitions for an upload / permissions / download process based on how those tools already work. This obviously ignores the even bigger problem of workflows, for example configuring datasets in postgres, geoserver, tegola, etc.

In order to write an API that is implementable, I am creating a minimal geonode with just those tools and expect to have results to share by the next summit. The working code is at:

https://github.com/piensa/puertico

@francbartoli
Member

@ingenieroariel I think the minimal problem for an OpenAPI 3 definition is to model the main entities for this new api. Layers don't convince me and I would prefer the concept of Datasets.

Then I'm wondering what a dataset is:

  1. A combination of vector and raster collections?
    • /datasets/{cool_dataset_name}/collections/{my_awesome_vector_collection}/items (WFS3 implementation or subset from remote WFS3)
    • /datasets/{cool_dataset_name}/collections/{my_awesome_raster_collection}/cogs (COGs bucket server or subset from remote collection. Here a collection can also be described by a STAC specification)
      • {my_awesome_raster_collection} has a collection_type=raster property
  2. Just one of the two above, with a property to distinguish them?
    • /datasets/{cool_vector_dataset_name}/collections/{my_awesome_collection}/items
    • /datasets/{cool_raster_dataset_name}/collections/{my_awesome_collection}/cogs
    • {cool_vector_dataset_name} has a dataset_type=vector
    • {cool_raster_dataset_name} has a dataset_type=raster
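
As a small sketch of the path shapes listed above, the helper below builds the access path for a collection from its type. The function name `collection_path` and the type-to-suffix table are assumptions made for illustration; only the `/datasets/.../collections/.../items` and `.../cogs` shapes come from the proposal itself.

```python
from typing import Dict
from urllib.parse import quote

# Path suffix per collection type, following the two shapes proposed above.
ACCESS_SUFFIX: Dict[str, str] = {"vector": "items", "raster": "cogs"}


def collection_path(dataset: str, collection: str, collection_type: str) -> str:
    """Build the access path for a collection inside a dataset (hypothetical helper)."""
    try:
        suffix = ACCESS_SUFFIX[collection_type]
    except KeyError:
        raise ValueError(f"unknown collection type: {collection_type}") from None
    # Percent-encode names so they are safe as single path segments.
    return (
        f"/datasets/{quote(dataset, safe='')}"
        f"/collections/{quote(collection, safe='')}/{suffix}"
    )


print(collection_path("floods", "rivers", "vector"))
print(collection_path("my data", "dem", "raster"))
```

With option 1 the type lives on the collection, so one dataset can mix both suffixes; with option 2 the same helper would take the type from the dataset instead.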

I'm open to different visions/models and happy if this work can be started. spotlight could also be a great tool for collaborating.

I'm going to open an issue and discuss this on the geonode-api repo

@tomkralidis
Member

tomkralidis commented Mar 16, 2019 via email

@francbartoli
Member

francbartoli commented Mar 16, 2019

> Definitely agree that we should put forth a dataset centric approach. This dovetails with the resource oriented architecture we are seeing with emerging OGC standards. Types of datasets can be enumerated as per ISO 19115 (vector, grid, etc.).

Good idea!

> We can also put forth the notion of virtual datasets (think WPS). We implemented the above in pygeoapi and are working on WCS REST.

pygeoapi is definitely something to have a look at. Do you have some examples of WCS REST? How would that be related to the concept of COG?

> I would suggest looking at the pygeoapi config and API (WFS3/WPS) for ideas for data access. Of course we would need a management API and so on for endpoints that do not fit into the above.

On Mar 15, 2019, at 14:00, Francesco Bartoli wrote:

@ingenieroariel I think the minimal problem for an OpenAPI 3 definition is to model the main entities for this new api. Layers don't convince me and I would prefer the concept of Datasets.

Then I'm wondering what is a dataset:

  1. A combination of vector and raster collections?
    • /datasets/{cool_dataset_name}/collections/{my_awesome_vector_collection}/items (WFS3 implementation or subset from remote WFS3)
      • {my_awesome_vector_collection} has a collection_type=vector property
      • example: http://geo.weather.gc.ca/geomet-beta/features/collections/hydrometric-daily-mean/items/
    • /datasets/{cool_dataset_name}/collections/{my_awesome_raster_collection}/cogs (COGs bucket server or subset from remote collection. Here a collection can be also described by a STAC specification)
      • {my_awesome_raster_collection} has a collection_type=raster property
  2. Just one of the two above with a property for being distinguished?
    • /datasets/{cool_vector_dataset_name}/collections/{my_awesome_collection}/items
    • /datasets/{cool_raster_dataset_name}/collections/{my_awesome_collection}/cogs
    • {cool_vector_dataset_name} has a dataset_type=vector
    • {cool_raster_dataset_name} has a dataset_type=raster

Open mind to different vision/model and happy if this work can be started. Also a great tool to collaborate can be spotlight

I'm going to open an issue and discuss this on the geonode-api repo

@francbartoli
Member

Opened issue geonode-api:#1 for continuing the discussion

@tomkralidis
Member

tomkralidis commented Mar 19, 2019

Nothing in pygeoapi that is stable (working on it). I would imagine COG would simply be a given raster resource for which HTTP Range requests would be supported. +1 to continue over in GeoNode/geonode-api#1
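
To show what supporting HTTP Range requests on a raster resource could amount to, here is a minimal sketch that resolves a single-range `Range` header against an in-memory byte string. The function names `parse_range` and `read_range` are hypothetical, multi-range headers are deliberately not handled, and a real server would also need to return status 206 or 416 as appropriate.

```python
import re
from typing import Tuple


def parse_range(header: str, size: int) -> Tuple[int, int]:
    """Return inclusive (start, end) offsets for a single 'bytes=' range."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not m or (m.group(1) == "" and m.group(2) == ""):
        raise ValueError("unsupported or malformed Range header")
    first, last = m.group(1), m.group(2)
    if first == "":
        # Suffix form 'bytes=-N': the final N bytes of the resource.
        length = int(last)
        if length == 0:
            raise ValueError("range not satisfiable")
        start, end = max(size - length, 0), size - 1
    else:
        # 'bytes=A-B' or open-ended 'bytes=A-'; the end is clamped to the size.
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        raise ValueError("range not satisfiable")
    return start, end


def read_range(data: bytes, header: str) -> Tuple[bytes, str]:
    """Return the requested slice and the matching Content-Range value."""
    start, end = parse_range(header, len(data))
    return data[start:end + 1], f"bytes {start}-{end}/{len(data)}"


data = bytes(range(256))  # stand-in for a raster file
chunk, content_range = read_range(data, "bytes=0-15")
print(len(chunk), content_range)
```

A client reading a COG would typically issue one such request for the file header and further ones for only the tiles it needs, which is why plain Range support on the server side may be enough.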

@capooti capooti changed the title GNIP: GeoNode 3 GNIP: GeoNode 4 Apr 4, 2019
@capooti
Member

capooti commented Apr 4, 2019

I have renamed this GNIP to GeoNode 4, as GeoNode 3 will still be based on the current architecture, using Python 3 and Django 2

@afabiani afabiani changed the title GNIP: GeoNode 4 GNIP-51: GeoNode 4 Aug 22, 2019
@afabiani afabiani reopened this Aug 22, 2019
@gannebamm
Contributor

Just my 2 cents:

With all my experience with the current GeoNode codebase and working hard to keep it maintained, I would give my +1 for not being agnostic but instead choosing frameworks and pinning them. Check which options are currently feasible and pick one as the reference implementation. Only maintain this implementation. No more QGIS Server, Leaflet, OpenLayers, GeoServer, React and Angular and maybe a bit of Vue codebase. We should be API first, which enables other frontend implementations, but we should not incorporate any of their code into our codebase.

This is a bit harsh, but I think will help us to stay clean for a longer period of time.

@giohappy
Contributor

giohappy commented May 9, 2022

RC of GeoNode 4 released

@giohappy giohappy closed this as completed May 9, 2022