
Commit

Merge pull request #464 from GIScience/delete-policy
Implement deletion policy
matthiasschaub authored Aug 1, 2024
2 parents 92614b6 + 637bd16 commit 4d8e329
Showing 44 changed files with 124,635 additions and 12,411 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -250,3 +250,5 @@ debug-output/
# ml-models
*.pt
*.pth

celerybeat-schedule.*
2 changes: 2 additions & 0 deletions Dockerfile
@@ -10,6 +10,8 @@ RUN npm run build


FROM condaforge/mambaforge:23.3.1-0
# HTTP request timeout. Default is 30 seconds.
ENV POETRY_REQUESTS_TIMEOUT=60

RUN apt-get update \
&& apt-get install -y --no-upgrade \
1 change: 0 additions & 1 deletion docker-compose.yaml → compose.yaml
@@ -1,4 +1,3 @@
version: "3.9"
services:
flask:
# Web app
25 changes: 4 additions & 21 deletions config/sample.config.toml
@@ -1,23 +1,6 @@
data-dir = "/some/absolute/path"
user-agent = "sketch-map-tool"
broker-url = "redis://localhost:6379"
result-backend = "db+postgresql://smt:smt@localhost:5432"
wms-url-osm = "https://maps.heigit.org/osm-carto/service?SERVICE=WMS&VERSION=1.1.1"
wms-layers-osm = "heigit:osm-carto@2xx"
wms-url-esri-world-imagery = "https://maps.heigit.org/sketch-map-tool/service?SERVICE=WMS&VERSION=1.1.1"
wms-url-esri-world-imagery-fallback = "https://maps.heigit.org/sketch-map-tool/service?SERVICE=WMS&VERSION=1.1.1"
wms-layers-esri-world-imagery = "world_imagery"
wms-layers-esri-world-imagery-fallback = "world_imagery_fallback"
wms-read-timeout = 600
max-nr-simultaneous-uploads = 25
max_pixel_per_image = 100000000
# required configuration variables
neptune_api_token = "h0dHBzOi8aHR06E0Z...jMifQ"
neptune_project = "HeiGIT/SketchMapTool"
neptune_model_id_yolo_osm_cls = "SMT-CLR-1"
neptune_model_id_yolo_esri_cls = "SMT-CLR-3"
neptune_model_id_yolo_osm_obj = "SMT-OSM-9"
neptune_model_id_yolo_esri_obj = "SMT-ESRI-1"
neptune_model_id_sam = "SMT-SAM-1"
model_type_sam = "vit_b"
esri-api-key = ""
log-level = "INFO"
# required configuration variables for docker compose setup
# broker-url = "redis://redis:6379"
# result-backend = "db+postgresql://smt:smt@postgres:5432"
18 changes: 18 additions & 0 deletions docs/configuration.md
@@ -21,6 +21,20 @@ All lot of configuration values come with defaults. Required configuration value
- `neptune_api_token`
- `esri-api-key`

### ArcGIS ESRI

To get an ArcGIS/ESRI API key, sign up for [ArcGIS Location Platform](https://location.arcgis.com/sign-up/)
and follow [this tutorial](https://developers.arcgis.com/documentation/security-and-authentication/api-key-authentication/tutorials/create-an-api-key/).

> Note: Keep the referrer field empty.
### neptune.ai

Ask the team for an invite to the Sketch Map Tool project on neptune.ai.

To get the API key, go to "Project Metadata" and copy the key from the example code.


## Configuration for Docker Compose

For running the services using Docker Compose set broker URL and result backend to:
@@ -29,3 +43,7 @@ For running the services using Docker Compose set broker URL and result backend
broker-url = "redis://redis:6379"
result-backend = "db+postgresql://smt:smt@postgres:5432"
```

## Default Configuration

For a list of all configuration variables and their default values, please take a look at [config.py](sketch_map_tool/config.py).
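
As an illustrative sketch (not part of this pull request), configuration values, including the `cleanup-map-frames-interval` default added here, can be read via `get_config_value`, the accessor that `database/client_celery.py` imports from `sketch_map_tool.config`:

```python
# Minimal sketch; assumes the defaults from sketch_map_tool/config.py are not overridden.
from sketch_map_tool.config import get_config_value

broker_url = get_config_value("broker-url")  # "redis://localhost:6379" by default
interval = get_config_value("cleanup-map-frames-interval")  # "12 months" by default
print(broker_url, interval)
```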
22 changes: 16 additions & 6 deletions docs/development-setup.md
@@ -3,15 +3,17 @@
For contributing to this project please also read the [Contribution Guideline](/CONTRIBUTING.md).

> Note: To just run the Sketch Map Tool locally, provide the required [configuration](/docs/configuration.md)
> and use Docker Compose: `docker compose up -d`
> and use Docker Compose: `docker compose up -d`.
## Prerequisites (Requirements)

- Python: `>=3.11`
- [Mamba](https://github.com/conda-forge/miniforge#install): `>=1.4`
- Node: `>=14`

This project uses [Mamba](https://github.com/conda-forge/miniforge#install) for environment and dependencies management. Please make sure it is installed on your system: [Installation Guide](https://github.com/conda-forge/miniforge#install). Instead of Mamba, Conda can also be used.
This project uses [Mamba](https://github.com/conda-forge/miniforge#install) for environment and dependencies management.
Please make sure it is installed on your system: [Installation Guide](https://github.com/conda-forge/miniforge#install).
Instead of Mamba, Conda can also be used.

> Actually, Mamba and Poetry together are used to manage environment and dependencies.
> But only Mamba is required to be present on the system.
@@ -80,7 +82,7 @@ Please refer to the [configuration documentation](/docs/configuration.md).
```bash
mamba activate smt
docker start smt-postgres smt-redis
celery --app sketch_map_tool.tasks worker --beat --concurrency 4 --loglevel=INFO
celery --app sketch_map_tool.tasks worker --beat --pool solo --loglevel=INFO
```

### 2. Start Flask (Web App)
@@ -105,7 +107,7 @@ ruff format

### Tests

Provide required [configuration variables](/docs/configuration.md#required-configuration) in `config/test.config.toml`.
Provide required [configuration variables](/docs/configuration.md#required-configuration) in `config/test.config.toml`. Be sure *not* to set `broker-url` and `result-backend`.

To execute all tests run:
```bash
@@ -114,7 +116,7 @@ pytest

To get live logs with INFO log level and to ignore the verbose logging messages of VCR, run:
```bash
pytest -s --log-level="INFO" --log-disable="vcr"
pytest --capture=no --log-level="INFO" --log-disable="vcr"
```

The integration test suite utilizes the [Testcontainers framework](https://testcontainers.com/)
@@ -171,14 +173,22 @@ Bundle the code with:
npm run build
```

## Database

To connect to the Postgres database when running it as a Docker container with the previously mentioned Docker run command, run:
`psql -h localhost -d smt -U smt -p 5432 -W`.

If you run the database as a Docker Compose service, run:
`psql -h localhost -d smt -U smt -p 5444 -W`.
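
The same connection parameters also work from Python with psycopg2, the driver used elsewhere in this code base; a minimal sketch, assuming the Docker Compose service on port 5444 and the `smt`/`smt` credentials from the result backend DSN:

```python
# Minimal sketch; assumes the Docker Compose Postgres service is running on port 5444.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5444, dbname="smt", user="smt", password="smt"
)
with conn.cursor() as curs:
    curs.execute("SELECT version()")
    print(curs.fetchone()[0])
conn.close()
```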

## Setup in an IDE

If you set up sketch-map-tool in an IDE like PyCharm, please make sure that your IDE does not set up a Poetry-managed project/virtual environment.
Go through the setup steps above in the terminal and change the interpreter settings in the IDE to point to the mamba/conda environment.

Also make sure the environment variable `PROJ_LIB` points to the `proj` directory of the mamba/conda environment:
```bash
PROJ_LIB=/home/$USERDIR/mambaforge/envs/smt/share/proj
PROJ_LIB=/home/$USERDIR/miniforge3/envs/smt/share/proj
```

## Setup on an Apple Mac with M2 chip
7 changes: 4 additions & 3 deletions poetry.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion scripts/celery.sh
@@ -1,3 +1,3 @@
#!/bin/bash
# Run celery
poetry run celery --app sketch_map_tool.tasks worker --beat --concurrency 4 --loglevel=INFO
poetry run celery --app sketch_map_tool.tasks worker --beat --pool solo --loglevel=INFO
7 changes: 7 additions & 0 deletions sketch_map_tool/__init__.py
@@ -40,6 +40,13 @@
"worker_send_task_events": True, # send task-related events to be monitored
# Avoid errors due to cached db connections going stale through inactivity
"database_short_lived_sessions": True,
# Cleanup map frames and uploaded files stored in the database
"beat_schedule": {
"cleanup": {
"task": "sketch_map_tool.tasks.cleanup_map_frames",
"schedule": timedelta(hours=3),
},
},
}


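The scheduled task name `sketch_map_tool.tasks.cleanup_map_frames` refers to a task defined in `sketch_map_tool/tasks.py`, which is not shown in this diff. A minimal sketch of how such a beat-driven task could look, assuming a Celery app object named `celery` and the database client added in this pull request:

```python
# Sketch only; the actual task lives in sketch_map_tool/tasks.py (not shown here).
from celery import Celery

from sketch_map_tool.database import client_celery as db_client_celery

celery = Celery(__name__)  # assumption: broker/backend are wired from the config above


@celery.task(name="sketch_map_tool.tasks.cleanup_map_frames")
def cleanup_map_frames():
    """Null out old map frame files without consent (see cleanup_map_frames below)."""
    db_client_celery.cleanup_map_frames()
```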
1 change: 1 addition & 0 deletions sketch_map_tool/config.py
@@ -11,6 +11,7 @@
"user-agent": "sketch-map-tool",
"broker-url": "redis://localhost:6379",
"result-backend": "db+postgresql://smt:smt@localhost:5432",
"cleanup-map-frames-interval": "12 months",
"wms-url-osm": "https://maps.heigit.org/osm-carto/service?SERVICE=WMS&VERSION=1.1.1",
"wms-layers-osm": "heigit:osm-carto@2xx",
"wms-url-esri-world-imagery": "https://maps.heigit.org/sketch-map-tool/service?SERVICE=WMS&VERSION=1.1.1",
139 changes: 123 additions & 16 deletions sketch_map_tool/database/client_celery.py
@@ -1,12 +1,19 @@
import logging
from io import BytesIO
from uuid import UUID

import psycopg2
from psycopg2.errors import UndefinedTable
from psycopg2.extensions import connection

from sketch_map_tool import __version__
from sketch_map_tool.config import get_config_value
from sketch_map_tool.exceptions import CustomFileNotFoundError
from sketch_map_tool.exceptions import (
CustomFileDoesNotExistAnymoreError,
CustomFileNotFoundError,
)
from sketch_map_tool.helpers import N_
from sketch_map_tool.models import Bbox, Layer, PaperFormat

db_conn: connection | None = None

@@ -25,29 +32,125 @@ def close_connection():
db_conn.close()


def insert_map_frame(file: BytesIO, uuid: UUID):
"""Insert map frame as blob into the database with the uuid as primary key.
def insert_map_frame(
file: BytesIO,
uuid: UUID,
bbox: Bbox,
format_: PaperFormat,
orientation: str,
layer: Layer,
):
"""Insert map frame alongside map generation parameters into the database.
The map frame is later on needed for georeferencing the uploaded photo or scan of
a sketch map.
The UUID is the primary key.
The map frame is needed for georeferencing the uploaded files (sketch maps).
"""
create_query = """
CREATE TABLE IF NOT EXISTS map_frame(
uuid UUID PRIMARY KEY,
file BYTEA
)
"""
insert_query = "INSERT INTO map_frame(uuid, file) VALUES (%s, %s)"
CREATE TABLE IF NOT EXISTS map_frame(
uuid UUID PRIMARY KEY,
file BYTEA,
bbox VARCHAR,
lat FLOAT,
lon FLOAT,
format VARCHAR,
orientation VARCHAR,
layer VARCHAR,
version VARCHAR,
ts TIMESTAMP WITH TIME ZONE DEFAULT now()
)
"""
insert_query = """
INSERT INTO map_frame (
uuid,
file,
bbox,
lat,
lon,
format,
orientation,
layer,
version
)
VALUES (
%s,
%s,
%s,
%s,
%s,
%s,
%s,
%s,
%s)
"""
with db_conn.cursor() as curs:
curs.execute(create_query)
curs.execute(insert_query, (str(uuid), file.read()))
curs.execute(
insert_query,
(
str(uuid),
file.read(),
str(bbox),
bbox.centroid[0],
bbox.centroid[1],
str(format_),
orientation,
layer,
__version__,
),
)


def delete_map_frame(uuid: UUID):
"""Delete map frame of the associated UUID from the database."""
query = "DELETE FROM map_frame WHERE uuid = %s"
def cleanup_map_frames():
"""Cleanup map frames which are old and without consent.
Only set file to null. Keep metadata.
This function is called by a periodic celery task.
"""
query = """
UPDATE
map_frame
SET
file = NULL,
bbox = NULL
WHERE
ts < NOW() - INTERVAL %s
AND NOT EXISTS (
SELECT
*
FROM
blob
WHERE
map_frame.uuid = blob.map_frame_uuid
AND consent = TRUE);
"""
with db_conn.cursor() as curs:
try:
curs.execute(query, [get_config_value("cleanup-map-frames-interval")])
except UndefinedTable:
logging.info("Table `map_frame` does not exist yet. Nothing todo.")


def cleanup_blob(map_frame_uuids: list[UUID]):
"""Cleanup uploaded files (sketch maps) without consent.
Only set file and name to null. Keep metadata.
This function is called after digitization.
"""
query = """
UPDATE
blob
SET
file = NULL,
file_name = NULL
WHERE
map_frame_uuid = %s
AND consent = FALSE;
"""
with db_conn.cursor() as curs:
curs.execute(query, [str(uuid)])
try:
curs.executemany(query, [map_frame_uuids])
except UndefinedTable:
logging.info("Table `blob` does not exist yet. Nothing todo.")


def select_file(id_: int) -> bytes:
@@ -57,6 +160,10 @@ def select_file(id_: int) -> bytes:
curs.execute(query, [id_])
raw = curs.fetchone()
if raw:
if raw[0] is None:
raise CustomFileDoesNotExistAnymoreError(
N_("The file with the id: {ID} does not exist anymore"), {"ID", id_}
)
return raw[0]
else:
raise CustomFileNotFoundError(
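The `ts < NOW() - INTERVAL %s` condition in `cleanup_map_frames` relies on psycopg2 quoting the configured string, `"12 months"` by default, so that Postgres casts it to an interval. A standalone sketch of that behaviour, assuming a local database reachable as described in the development setup:

```python
# Illustration of the interval parameterization used by cleanup_map_frames().
# Assumption: a local dev database as described in docs/development-setup.md.
import psycopg2

from sketch_map_tool.config import get_config_value

conn = psycopg2.connect(
    host="localhost", port=5432, dbname="smt", user="smt", password="smt"
)
with conn.cursor() as curs:
    interval = get_config_value("cleanup-map-frames-interval")  # e.g. "12 months"
    # psycopg2 renders this as: SELECT NOW() - INTERVAL '12 months'
    print(curs.mogrify("SELECT NOW() - INTERVAL %s", [interval]))
    curs.execute("SELECT NOW() - INTERVAL %s", [interval])
    print(curs.fetchone()[0])  # the cutoff timestamp used by the cleanup query
conn.close()
```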
(The diffs of the remaining changed files are not shown.)
