OpenECPDS boosts the efficiency and productivity of data services by using proven and innovative technologies. It offers a portable, adaptable application for diverse environments, with a user-friendly tool for managing data acquisition, dissemination with push/pull mechanisms, and a notification system, all using standard protocols.

ecmwf/open-ecpds

Introduction

OpenECPDS has been designed as a multi-purpose repository, hereafter referred to as the Data Store, delivering three strategic data-related services:

  • Data Acquisition: the automatic discovery and retrieval of data from data providers.
  • Data Dissemination: the automatic distribution of data products to remote sites.
  • Data Portal: the pulling and pushing of data initiated by remote sites.

Data Acquisition and Data Dissemination are active services initiated by OpenECPDS, whereas the Data Portal is a passive service triggered by incoming requests from remote sites. The Data Portal service provides interactive access to the Data Dissemination and Data Acquisition services.

OpenECPDS enhances data services by integrating innovative technologies to streamline the acquisition, dissemination, and storage of data across diverse environments and protocols.

Data Storage and Retrieval

Unlike a conventional data store, OpenECPDS does not necessarily keep a physical copy of the data in its persistent repository. Instead, it works like a search engine, crawling and indexing metadata from data providers. OpenECPDS can still cache data in its Data Store, so that the data remains available even when a data provider cannot be reached immediately.

Data can be fed into the Data Store via:

  • The Data Acquisition service, discovering and fetching data from data providers.
  • Data providers actively pushing data through the Data Portal.
  • Data providers using the OpenECPDS API to register metadata, allowing asynchronous data retrieval.

Data products can be searched by name or metadata and either pushed by the Data Dissemination service or pulled from the Data Portal by users. OpenECPDS streams data on the fly or sends it from the Data Store if it was previously fetched.
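
The choice between streaming on the fly and serving a previously fetched copy can be sketched as follows. This is a minimal Python illustration, not OpenECPDS code: the names DataStoreSketch, prefetch, deliver, and fake_provider are hypothetical, and an in-memory dictionary stands in for the real Data Store.

```python
from typing import Callable, Dict, Iterator


class DataStoreSketch:
    """Hypothetical stand-in for the Data Store cache (not the OpenECPDS API)."""

    def __init__(self, fetch_from_provider: Callable[[str], Iterator[bytes]]):
        self._fetch = fetch_from_provider
        self._cache: Dict[str, bytes] = {}

    def prefetch(self, name: str) -> None:
        """Retrieve the data ahead of time and keep a local copy."""
        self._cache[name] = b"".join(self._fetch(name))

    def deliver(self, name: str) -> Iterator[bytes]:
        """Serve the cached copy if there is one, otherwise stream from the provider."""
        if name in self._cache:
            yield self._cache[name]
        else:
            yield from self._fetch(name)


calls = []  # records every request that reaches the (fake) data provider


def fake_provider(name: str) -> Iterator[bytes]:
    """Assumed provider that returns the data in two chunks."""
    calls.append(name)
    yield b"chunk-1:"
    yield name.encode()


store = DataStoreSketch(fake_provider)
streamed = b"".join(store.deliver("forecast.grib"))  # streamed on the fly
store.prefetch("forecast.grib")                      # fetched into the cache
cached = b"".join(store.deliver("forecast.grib"))    # served without the provider
```

In this sketch the provider is contacted twice (once for the streamed delivery, once for the prefetch), and the final delivery is served from the cached copy alone.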

Protocols and Connections

OpenECPDS interacts with a variety of environments and supports multiple standard protocols:

  • Outgoing connections (Data Acquisition & Dissemination): FTP, SFTP, FTPS, HTTP/S, Amazon S3, Azure, and Google Cloud Storage.
  • Incoming connections (Data Portal): FTP, HTTPS, S3 (SFTP and SCP coming soon).

Protocol configurations vary based on authentication and connection methods (e.g., password vs. key-based authentication, parallel vs. serial connections).

The OpenECPDS software is modular, supporting new protocols through extensions.

Object Storage

OpenECPDS stores data as objects, combining data, metadata, and a globally unique identifier. It employs a file-system-based solution with replication across multiple locations to ensure continuous data availability. For example, data can be replicated across local storage systems and cloud platforms to bring data closer to users and enhance performance.

The object storage system in OpenECPDS is hierarchy-free but can emulate directory structures when necessary, based on metadata provided by data providers. OpenECPDS presents different views of the same data, depending on user preferences.
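
To make the idea of a hierarchy-free store with emulated directories concrete, here is a small Python sketch. It rests on assumptions: StoredObject, directory_view, and the metadata keys provider, date, and name are hypothetical and do not reflect the actual OpenECPDS data model.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    """An object bundles data, metadata, and a globally unique identifier."""
    data: bytes
    metadata: Dict[str, str]
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def directory_view(objects: List[StoredObject], keys: List[str]) -> Dict[str, List[str]]:
    """Group object ids under a virtual path built from the chosen metadata keys."""
    view: Dict[str, List[str]] = {}
    for obj in objects:
        path = "/".join(obj.metadata.get(key, "unknown") for key in keys)
        view.setdefault(path, []).append(obj.object_id)
    return view


objects = [
    StoredObject(b"...", {"provider": "siteA", "date": "2024-01-01", "name": "t2m"}),
    StoredObject(b"...", {"provider": "siteB", "date": "2024-01-01", "name": "t2m"}),
]

# Two different "directory trees" over the same flat set of objects.
by_provider = directory_view(objects, ["provider", "date"])
by_date = directory_view(objects, ["date", "provider"])
```

The objects themselves are stored once and flat. Only the grouping changes, which is one way different users could be shown different views of the same data.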

Additional Features

  • Notification System: Provides an embedded MQTT broker to publish notifications and an MQTT client to subscribe to data providers.
  • Data Compression: Supports various algorithms (lzma, zip, gzip, bzip2, lbzip2, lz4, snappy) to reduce dissemination time and enable faster access to data.
  • Data Checksumming: Uses MD5 for data integrity checks at remote sites and ADLER32 for data integrity checks within the Data Store.
  • Garbage Collection: Automatically removes expired data, with no limit on expiry dates.
  • Data Backup: Can be configured to map data sets in OpenECPDS to existing archiving systems.
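
As an illustration of the checksumming and compression features listed above, the following Python sketch computes both checksum types and compares a few of the listed compression algorithms on a sample payload. It uses only the Python standard library, so lbzip2, lz4, and snappy are left out. The helper names and the payload are assumptions, and the sketch says nothing about how OpenECPDS applies these algorithms internally.

```python
import bz2
import gzip
import hashlib
import lzma
import zlib


def md5_hex(data: bytes) -> str:
    """MD5 digest as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def adler32_hex(data: bytes) -> str:
    """Adler-32 checksum as an 8-digit hexadecimal string."""
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")


# Hypothetical, highly repetitive payload; real data will compress differently.
payload = b"OpenECPDS sample payload\n" * 1000

compressed_sizes = {
    "gzip": len(gzip.compress(payload)),
    "bzip2": len(bz2.compress(payload)),
    "lzma": len(lzma.compress(payload)),
}

# A receiver can recompute the checksum after decompression to verify integrity.
restored = gzip.decompress(gzip.compress(payload))
checksums_match = md5_hex(restored) == md5_hex(payload)
```

Adler-32 is cheaper to compute than MD5 but detects fewer kinds of corruption.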

Getting Started

Building and Running OpenECPDS

OpenECPDS requires Docker to be installed and fully functional, with the default Docker socket enabled (Settings -> Advanced -> "Allow the default Docker socket to be used"). The build and run process has been tested on Linux and macOS (Intel/Apple Silicon) using Docker Desktop v4.34.2. It has also been reported to work on Windows with the WSL 2 backend and the host networking option enabled.

The default setup needs a minimum of 3 GB of available RAM. The disk space required depends on the volume of data you expect to handle, but the development and application containers alone require at least 15 GB.

To download the latest distribution, run the following command:

curl -L -o master.zip https://github.com/ecmwf/open-ecpds/archive/refs/heads/master.zip && unzip master.zip

A Makefile located in the open-ecpds-master directory can be used to create the development container that installs all the necessary tools for building the application. The Java classes are compiled, packaged into RPM files, and used to build Docker images for each OpenECPDS component.

Creating and Logging into the Development Container

To build the development container:

make dev

If successful, you should be logged into the development container.

Building and Configuring OpenECPDS

From there, you can run the following command to compile the Java classes, package the RPM files, and build the OpenECPDS Docker images:

make build

Warning: In a production environment, avoid using ENV in Dockerfiles for sensitive values such as MYSQL_ROOT_PASSWORD for the Database or KEYSTORE_PASSWORD for the Monitor and Mover. Use Docker secrets or environment variable files instead.

Once the build process is complete, navigate to the following directory where another Makefile is available:

cd run/bin/ecpds

The services are started using Docker Compose. The docker-compose.yml file contains all the necessary configurations to launch and manage the different components of OpenECPDS. You can find this file in the appropriate directory for your OS (Darwin-ecpds for macOS or Linux-ecpds for Linux).

To verify the configuration and understand how Docker Compose interprets the settings before running the services, use the following command:

make config

For advanced configurations, you can modify the default values in the Compose file to suit your environment. Each parameter is documented within the file itself, including its function and its effect on the system's behavior.

Starting OpenECPDS

To start the application:

make up

This will start the OpenECPDS master, monitor, mover, and database services.

It might take a few seconds for all the services to start. Once they are up, you can access the following URLs (please update them if you changed the configuration in the compose files):

Warning: The test environment uses a self-signed certificate, so you may need to disable certificate validation in your client when connecting.

Interface            URL                      Login Details
Monitoring           https://127.0.0.1:3443   admin/admin2021
Data Portal          https://127.0.0.1:4443   test/test2021
                     ftp://127.0.0.1:4021     test/test2021
MQTT Broker          mqtt://127.0.0.1:4883    test/test2021
Virtual FTP Server   ftp://127.0.0.1:2021     admin/admin2021
JMX Interfaces       http://127.0.0.1:2062    master/admin
                     http://127.0.0.1:3062    monitor/admin
                     http://127.0.0.1:4062    mover/admin
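
Because the services take a moment to start and the test environment uses a self-signed certificate, a small client-side check can be useful. The Python sketch below waits for a TCP port to accept connections and then queries an HTTPS endpoint with certificate validation turned off. The function names are hypothetical, the default URL is the Monitoring address from the table above, and disabling validation is only appropriate for a local test setup like this one.

```python
import socket
import ssl
import time
import urllib.request


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection succeeds or the timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(1.0)
    return False


def insecure_context() -> ssl.SSLContext:
    """TLS context that accepts self-signed certificates (test use only)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def https_status(url: str = "https://127.0.0.1:3443") -> int:
    """Return the HTTP status of the URL; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, context=insecure_context(), timeout=10) as resp:
        return resp.status
```

A typical use would be to call wait_for_port("127.0.0.1", 3443) after make up and only then call https_status(). Neither function is invoked here, so the sketch has no side effects when loaded.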

Checking the Containers and Logs

To verify that the containers are running, use:

make ps

To view the standard output (stdout) and standard error (stderr) streams generated by the containers, use:

make logs

To view the logs generated by OpenECPDS, you can browse the following directories mounted to the containers:

run/var/log/ecpds/master
run/var/log/ecpds/monitor
run/var/log/ecpds/mover
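
When browsing these directories, it can help to find the most recently written file for each component. The Python sketch below does this using the three directory names listed above. The log file names it creates (older.log, newer.log) are made up for the demonstration, which runs against a temporary directory so that it works without an OpenECPDS checkout.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Optional

LOG_ROOT = Path("run/var/log/ecpds")
COMPONENTS = ("master", "monitor", "mover")


def newest_log_files(root: Path = LOG_ROOT) -> Dict[str, Optional[Path]]:
    """Map each component to its most recently modified file, or None if absent."""
    newest: Dict[str, Optional[Path]] = {}
    for component in COMPONENTS:
        directory = root / component
        files = [p for p in directory.glob("*") if p.is_file()] if directory.is_dir() else []
        newest[component] = max(files, key=lambda p: p.stat().st_mtime, default=None)
    return newest


# Demonstration on a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "master").mkdir()
    older = root / "master" / "older.log"
    newer = root / "master" / "newer.log"
    older.write_text("first\n")
    newer.write_text("second\n")
    os.utime(older, (1_000, 1_000))  # force distinct modification times
    os.utime(newer, (2_000, 2_000))
    demo = {name: (path.name if path else None)
            for name, path in newest_log_files(root).items()}
```

Run from the directory that contains run/, newest_log_files() with no arguments inspects the real log directories.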

Additional Makefile Options

To log in to the database:

make mysql

To log in to the master container (replace master with monitor, mover, or database for the other containers):

make connect container=master

Stopping OpenECPDS

To stop the application, run:

make down

To clean the logs and data:

make clean

Support Materials

The Javadoc API documentation for OpenECPDS is available at the following link: Javadocs. It describes the available classes and methods and is intended as a reference for developers.

The options available in the various OpenECPDS editors are documented at this link: OpenECPDS Options.
