Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Postgres not connecting in apache beam with go sdk #28446

Open
1 of 15 tasks
sroutweave opened this issue Sep 14, 2023 · 5 comments
Open
1 of 15 tasks

[Bug]: Postgres not connecting in apache beam with go sdk #28446

sroutweave opened this issue Sep 14, 2023 · 5 comments

Comments

@sroutweave
Copy link

sroutweave commented Sep 14, 2023

What happened?

We are trying to connect to postgres DB using apache beam go sdk . But it's throwing error each time. Tried both I/O connector databaseio & Jdbcio. Can we get any help on this please?

With databaseio connector

caused by: source failed caused by: DoFn[UID:4, PID:org.postgresql.Driver.Query/databaseio.queryFn, Name: github.com/apache/beam/sdks/v2/go/pkg/beam/io/databaseio.queryFn] failed: failed to open database: org.postgresql.Driver caused by: sql: unknown driver "org.postgresql.Driver" (forgotten import?)

With Jdbcio connector

panic: tried cross-language for beam:transform:org.apache.beam:schemaio_jdbc_read:v1 against autojava::sdks:java:extensions:schemaio-expansion-service:runExpansionService;org.postgresql:postgresql:42.3.3 and failed expanding external transform error in starting expansion service, StartService(): context deadline exceeded

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@johannaojeling
Copy link
Contributor

@sroutweave the native databaseio connector uses the database/sql package, which requires a supported driver to be registered. For Postgres, you could use for instance pq or pgx.

In your Go program, you will need to add a blank import to the selected driver package in order for its driver to get registered. The pq driver will be registered under the name postgres and pgx under pgx. You will need to pass the same name as the driver argument to the databaseio.Read function, instead of the JDBC name org.postgresql.Driver for the correct driver to be picked up.

Example program:

package main

import (
	"context"
	"flag"
	"log"
	"reflect"

	"github.com/apache/beam/sdks/v2/go/pkg/beam"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/databaseio"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
	"github.com/apache/beam/sdks/v2/go/pkg/beam/x/debug"
	_ "github.com/jackc/pgx/v5/stdlib" // driver: "pgx"
	//_ "github.com/lib/pq" // driver: "postgres"
)

var (
	driver = flag.String("driver", "pgx", "Database driver")
	dsn    = flag.String("dsn", "host=localhost port=5432 user=postgres password=pwd dbname=postgres sslmode=disable", "Database DSN")
	table  = flag.String("table", "users", "Database table")
)

type User struct {
	ID    string
	Name  string
	Email string
}

func main() {
	flag.Parse()

	beam.Init()
	p, s := beam.NewPipelineWithRoot()

	rows := databaseio.Read(s, *driver, *dsn, *table, reflect.TypeOf(User{}))
	debug.Print(s, rows)

	ctx := context.Background()
	if err := beamx.Run(ctx, p); err != nil {
		log.Fatalf("Failed to execute job: %v", err)
	}
}

@sroutweave
Copy link
Author

sroutweave commented Sep 18, 2023

_ "github.com/jackc/pgx/v5/stdlib" 

I am trying to connect cloudsql postgres db instance . And for that what should be the driver? I didn't find in the supported driver list. Can you please help in that? @johannaojeling

@johannaojeling
Copy link
Contributor

I am trying to connect cloudsql postgres db instance . And for that what should be the driver? I didn't find in the supported driver list. Can you please help in that? @johannaojeling

There are different ways to connect to a Cloud SQL database depending on your requirements but you should be able to use one of those mentioned Postgres drivers. Please see the Google Cloud docs for more information and code/configuration examples. The Go snippets use the pgx driver.

@sroutweave
Copy link
Author

I am trying to connect cloudsql postgres db instance . And for that what should be the driver? I didn't find in the supported driver list. Can you please help in that? @johannaojeling

There are different ways to connect to a Cloud SQL database depending on your requirements but you should be able to use one of those mentioned Postgres drivers. Please see the Google Cloud docs for more information and code/configuration examples. The Go snippets use the pgx driver.

So when I tried with pgx & postgres driver for cloudsql postgres db instance to connect even though the tables are there it shows error as table doesn't exist.

@johannaojeling
Copy link
Contributor

@sroutweave it would be helpful if you could tell how to reproduce the problem, including information such as:

  • relevant code and versions
  • how you are running the pipeline (locally, Dataflow, ..?)
  • how your Cloud SQL instance is configured (e.g. private vs public IP) and how you usually connect to it
  • logs showing the error message
  • any other details that may be useful for people help you

@lostluck lostluck changed the title Postgres not connecting in apache beam with go sdk[Bug]: [Bug]: Postgres not connecting in apache beam with go sdk Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants