Data sharing #17

Closed · wants to merge 5 commits
Conversation

sworisbreathing (Contributor)
Adds a redshift_datashare resource to manage data sharing between Redshift clusters. This should be defined on the producer cluster. For managing the schemas and objects, we use a nested attribute block:

resource "redshift_datashare" "share" {
  name = "my_datashare"
  owner = "fred"

  schema {
    name = redshift_schema.schema.name
    mode = "auto"
  }

  schema {
    name = redshift_schema.other_schema.name
    mode = "manual"
    tables = ["foo"]
    functions = ["bar"]
  }
}

Note that we're rolling up the management of schemas (and their objects) into two modes, auto and manual. This avoids a lot of weird edge cases that would otherwise come up if we'd tried to expose individual settings for ALL TABLES, ALL FUNCTIONS, and INCLUDENEW in the ALTER DATASHARE command:

  • in auto mode, we add ALL TABLES IN SCHEMA and ALL FUNCTIONS IN SCHEMA and SET INCLUDENEW=true FOR SCHEMA, so that newly-created tables/functions are automatically exposed to the datashare by the redshift cluster itself, without needing to re-run terraform.
  • in manual mode, we only expose the specific tables/functions that are configured in the schema block.
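Concretely, the two modes roughly translate into DDL generation like the following. This is a minimal, hypothetical sketch (the `grantStatements` function is mine, not the provider's actual code); the statement syntax follows the behavior described above and the Redshift ALTER DATASHARE documentation:

```go
package main

import "fmt"

// grantStatements sketches the DDL issued for one schema block.
// In auto mode the cluster itself picks up new objects via INCLUDENEW;
// in manual mode only the listed tables/functions are added.
func grantStatements(share, schema, mode string, tables, functions []string) []string {
	stmts := []string{fmt.Sprintf("ALTER DATASHARE %s ADD SCHEMA %s", share, schema)}
	switch mode {
	case "auto":
		stmts = append(stmts,
			fmt.Sprintf("ALTER DATASHARE %s ADD ALL TABLES IN SCHEMA %s", share, schema),
			fmt.Sprintf("ALTER DATASHARE %s ADD ALL FUNCTIONS IN SCHEMA %s", share, schema),
			fmt.Sprintf("ALTER DATASHARE %s SET INCLUDENEW = TRUE FOR SCHEMA %s", share, schema),
		)
	case "manual":
		for _, t := range tables {
			stmts = append(stmts, fmt.Sprintf("ALTER DATASHARE %s ADD TABLE %s.%s", share, schema, t))
		}
		for _, f := range functions {
			stmts = append(stmts, fmt.Sprintf("ALTER DATASHARE %s ADD FUNCTION %s.%s", share, schema, f))
		}
	}
	return stmts
}

func main() {
	for _, s := range grantStatements("my_datashare", "other_schema", "manual", []string{"foo"}, []string{"bar"}) {
		fmt.Println(s)
	}
}
```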

This PR turned out to be big enough that I decided it was best not to include the corresponding data source. I intend for the data source to follow the same structure as this PR, and to allow it to be defined on either the producer or the consumer.

The update code turned out to be far more involved than I'd originally hoped, due to limitations in terraform-plugin-sdk. What I really wanted was to keep the nested schema blocks but treat them in the backend as a map rather than a set/list (in other words, treat name as the unique identifier for each nested schema block). Unfortunately, schema.TypeMap can only store primitive types. Hashing the set elements on name alone meant terraform wasn't picking up changes to the tables/functions inside the schema block, and I tried this in combination with CustomizeDiff but never could get it to properly detect the changes.

I ended up taking inspiration from the aws_security_group resource in terraform-provider-aws, which has to work around the same issue of doing incremental updates to nested attributes in a schema.TypeSet. The terraform plan output makes it appear that we're completely dropping the schema from the datashare and then re-adding it (I'd hoped that hashing on the name would solve that problem, but then terraform simply didn't detect changes to the schema configuration at all). During the update, however, there's extra logic to figure out which tables/functions actually need adding or removing. It's uglier than I'd like, but it works where none of the other approaches did.
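The core of that extra update logic amounts to a set difference between what the datashare currently contains and what the configuration asks for. A minimal, hypothetical sketch (the `diff` function and its shape are mine, not the PR's actual code):

```go
package main

import "fmt"

// diff returns which elements must be added to and removed from the
// datashare so that its contents (remote) match the configuration (desired).
func diff(remote, desired []string) (toAdd, toRemove []string) {
	remoteSet := make(map[string]bool, len(remote))
	for _, t := range remote {
		remoteSet[t] = true
	}
	desiredSet := make(map[string]bool, len(desired))
	for _, t := range desired {
		desiredSet[t] = true
		if !remoteSet[t] {
			toAdd = append(toAdd, t) // in config, missing from the share
		}
	}
	for _, t := range remote {
		if !desiredSet[t] {
			toRemove = append(toRemove, t) // in the share, dropped from config
		}
	}
	return toAdd, toRemove
}

func main() {
	add, remove := diff([]string{"foo", "bar"}, []string{"bar", "baz"})
	fmt.Println(add, remove) // [baz] [foo]
}
```

Each entry in toAdd/toRemove then becomes an ALTER DATASHARE ... ADD/REMOVE statement during the update.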

sworisbreathing (Contributor, Author) commented Aug 12, 2021

I'm putting this in draft for now because there's still a bit of work to finish off the update logic, and probably some cleanup as well. But I wanted to go ahead and open the PR to show what it will look like.

Before putting this into ready for review I'll probably squash the commits.

sworisbreathing (Contributor, Author) commented Aug 13, 2021

One other thing to note: functions in manual mode don't work, and instead fail with syntax error at end of input when trying to run ALTER DATASHARE... ADD FUNCTION.... This appears to be either a bug in Redshift or incorrect documentation. I've got a support case open with AWS; they've reproduced the issue and escalated it to the engineering team for further investigation.

UPDATE: I've identified the issue, but it's kind of a deal-breaker for being able to manage functions in a data share. The issue is that when you manage UDFs in Redshift, you have to include the parameter list, like so:

ALTER DATASHARE ADD FUNCTION foo(varchar);

(NB: this is also the syntax you need in order to DROP FUNCTION...)

However you can't read back the parameter list - SVV_DATASHARE_OBJECTS only includes the function name but not the signature. This is problematic for 3 reasons:

  1. If we're updating a datashare schema from auto mode to manual mode, then the functions list is computed from SVV_DATASHARE_OBJECTS. If we need to drop some functions from the datashare, we don't have enough information to generate a workable DDL statement.
  2. Redshift allows function overloading - it's perfectly acceptable to have both foo(varchar) and foo(varchar, varchar) - but since SVV_DATASHARE_OBJECTS doesn't include the parameter list, we can't know which of our foo functions is exposed to the datashare and which isn't. It also means that if we specify the parameter list as functions = ["foo(varchar)"], terraform will always report pending changes, because the value read back has lost the full signature.
  3. INCLUDENEW applies to both functions and tables in the schema, and there is no tables-only variant like ALTER DATASHARE ... SET INCLUDENEW=true FOR TABLES IN SCHEMA.... This means that even if we removed the functions attribute, new functions would still silently slip into the share undetected in auto mode.
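To illustrate point 2: matching the bare name that SVV_DATASHARE_OBJECTS reports against the cluster's defined signatures can return more than one candidate, so there's no way to generate a unique REMOVE FUNCTION statement. A hypothetical sketch (the helper and the data are illustrative only, not read from a real cluster):

```go
package main

import (
	"fmt"
	"strings"
)

// candidates returns every defined signature whose bare name matches
// the name reported by SVV_DATASHARE_OBJECTS.
func candidates(defined []string, shared string) []string {
	var out []string
	for _, sig := range defined {
		if strings.HasPrefix(sig, shared+"(") {
			out = append(out, sig)
		}
	}
	return out
}

func main() {
	defined := []string{"foo(varchar)", "foo(varchar, varchar)", "bar(int)"}
	// SVV_DATASHARE_OBJECTS reports only "foo" - both overloads match,
	// so we can't tell which one is actually in the datashare.
	fmt.Println(candidates(defined, "foo")) // [foo(varchar) foo(varchar, varchar)]
}
```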

I can think of 3 ways of handling this:

  1. Rewrite the redshift_datashare resource so that you only pass in a list of schema names, and all schemas operate in auto mode. This is the simplest solution, but the least flexible.
  2. Drop support for the functions attribute. This still allows you to manage tables manually, but means that you can only share functions in auto mode, and terraform won't actually tell you what functions are exposed.
  3. Leave everything as-is for now, but add a bunch of warnings in the documentation recommending against using functions in manual mode, and advise that it'll be fixed up in a future release of the provider once AWS starts providing the missing signature information in SVV_DATASHARE_OBJECTS. This is probably what I'll do in the PR for now, but it's the one I'm least happy with.

@winglot what is your preferred approach? I think option 1 makes the most sense from a user standpoint, but I don't want to rip out all of the code I've written yet if you think there's value in being able to manage specific tables within a datashare/schema.

sworisbreathing (Contributor, Author)
Further to my comment above, I think what I'll probably do is create an alternative branch/PR for option 1 and close this one for the time being. That will make things work in a reliable, predictable way, and leave the door open to more fine-grained controls if AWS ever rolls out a fix for the above issue, without having to rewrite all the code from scratch.
