Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValidationError with Postgres JSON arrays #189

Open
Lawiss opened this issue Sep 10, 2023 · 3 comments
Open

ValidationError with Postgres JSON arrays #189

Lawiss opened this issue Sep 10, 2023 · 3 comments
Labels
question Further information is requested

Comments

@Lawiss
Copy link

Lawiss commented Sep 10, 2023

Hello,
I have an issue when trying to load a table from Postgres to Postgres :

jsonschema.exceptions.ValidationError: [{'type': 'BIG_BAG', 'other': '', 'quantity': 5}] is not of type 'object'
Failed validating 'type' in schema['properties']['packagings']: {'properties': {}, 'type': 'object'} 
On instance['packagings']:  [{'other': '', 'quantity': 5, 'type': 'BIG_BAG'}]

The issue is that JSON(b) columns in Postgres can contain JSON arrays which are converted into Python list inside the target-postgres plugin causing an inconsistency with the JSON schema type returned by the tap-postgres.
I looked into the code of the transferwise's variant and found that for arrayand objectJSON schema types it creates jsonb Postgres columns : https://github.com/transferwise/pipelinewise-target-postgres/blob/bef5a2786afab3c1600e62a77b6d121625e7bc8e/target_postgres/db_sync.py#L44C5-L45C27

@visch
Copy link
Member

visch commented Sep 12, 2023

Can you create a reproducible example here like the DDL and DML that creates a sequence the meltanolabs tap-postgres and meltanolabs target-postgres doesn't like? This would be the most helpful here as it's not clear right now if this is a tap-postgres issue or target-postgres issue. To me it seems like the type of [{'type': 'BIG_BAG', 'other': '', 'quantity': 5}] should be an array not an object and array's should work properly (I think?)

@visch visch added the question Further information is requested label Sep 12, 2023
@Lawiss
Copy link
Author

Lawiss commented Sep 12, 2023

Hello @visch,
You can use those command to create an example table in Postgres with some data:

create table "public"."test_meltano" (
    id int not null,
    packagings jsonb not null,
    constraint "test_meltano_pkey" primary key (id)
);

INSERT INTO "public"."test_meltano" (id, "packagings") VALUES(1,'[{"type": "DEPOT_BAG", "other": "", "quantity": 1}]'::jsonb);
INSERT INTO "public"."test_meltano" (id, "packagings") VALUES(2,'[{"type": "BIG_BAG", "other": "", "quantity": 2},{"type": "DEPOT_BAG", "other": "", "quantity": 1}]'::jsonb);
INSERT INTO "public"."test_meltano" (id, "packagings") VALUES(3,'[]'::jsonb);

The meltano.yml contains:

plugins:
  extractors:
    - name: tap-postgres
      variant: meltanolabs
      pip_url: git+https://github.com/MeltanoLabs/tap-postgres.git
      select:
        - public-test_meltano.*
  loaders:
    - name: target-postgres
      variant: meltanolabs
      pip_url: meltanolabs-target-postgres~=0.0.7

After a first run, the tap generate this tap.proprerties.json:

{
      "tap_stream_id": "public-test_meltano",
      "table_name": "test_meltano",
      "replication_method": "",
      "key_properties": [
        "id"
      ],
      "schema": {
        "properties": {
          "id": {
            "type": [
              "integer"
            ]
          },
          "packagings": {
            "properties": {},
            "type": "object"
          }
        },
        "type": "object",
        "required": [
          "id",
          "packagings"
        ]
      },
      "is_view": false,
      "stream": "public-test_meltano",
      "metadata": [
        {
          "breadcrumb": [
            "properties",
            "id"
          ],
          "metadata": {
            "inclusion": "automatic",
            "selected": true
          }
        },
        {
          "breadcrumb": [
            "properties",
            "packagings"
          ],
          "metadata": {
            "inclusion": "available",
            "selected": true
          }
        },
        {
          "breadcrumb": [],
          "metadata": {
            "inclusion": "available",
            "table-key-properties": [
              "id"
            ],
            "forced-replication-method": "",
            "schema-name": "public",
            "selected": true
          }
        }
      ],
      "selected": true
    }

As you can see, packagings property has only the object type. The problem is that this columns contains in fact JSON arrays that will be received by the target as Python arrays of dicts. Then the target throw a validation error as the property type is not respected.

I don't know the best way to handle this particular case, Transferwise's tap appends an array type to the property and then the target creates JSON columns to store the data.

@sebastianswms
Copy link
Contributor

This seems like a problem with tap-postgres as opposed to target-postgres. If the tap was accurately reporting the fact that the column can contain arrays and not just objects, everything would be operating smoothly. I've put in a PR in tap-postgres to fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
Status: No status
Development

Successfully merging a pull request may close this issue.

3 participants