DBLINK()
is a Vertica User Defined Transform Function coded in C++ to run SQL against other databases.
For example, the following statement runs a row count in PostgreSQL and retrieves the result (6,001,215) in Vertica:
=> SELECT DBLINK(USING PARAMETERS
cid='pgdb',
query='SELECT COUNT(*) FROM tpch.lineitem'
) OVER();
count
--------
6,001,215
--- 1 row selected in 0.228s (prep 0.17s, exec 0.211s, fetch 0.000/0.000s)
DBLINK()
is a Vertica function that pushes SQL to other databases and retrieves the result of the remote execution back in Vertica. DBLINK()
can push any type of SQL commands that the remote database accepts:
- DDL statements. For example,
CREATE
a table in the remote database. - DML statements to manipulate data in the remote database (
INSERT
,UPDATE
, etc.). - DQL statements to
SELECT
data from the remote database using the SQL dialect and functions available on the remote database. - DCL statements like
GRANT
andREVOKE
.
The following statement creates a table in Vertica named public.customer
that contains 10% of randomly selected data from the PostreSQL table tpch.customer
:
=> CREATE TABLE public.customer AS
SELECT DBLINK(USING PARAMETERS
cid='pgdb',
query='SELECT * FROM tpch.customer WHERE RANDOM() < 0.1')
OVER();
This statement creates an empty table in Vertica corresponding to the table definition in the remote database:
=> CREATE TABLE public.customer AS
SELECT DBLINK(USING PARAMETERS
cid='pgdb',
query='SELECT * FROM tpch.customer LIMIT 0')
OVER();
This statement will group-by the result of a JOIN
between the Vertica table tpch.nation
and the MySQL table tpch.region
:
=> SELECT r.r_name, count(*)
FROM tpch.nation n
LEFT OUTER JOIN
( SELECT DBLINK(USING PARAMETERS
cid='mypg’,
query='SELECT r_name, r_regionkey FROM tpch.region'
) OVER()) r
ON n.n_regionkey = r.r_regionkey
GROUP BY 1 ;
This statement drops a PostgreSQL table if exists:
=> SELECT DBLINK(USING PARAMETERS
cid='pgdb',
query='DROP TABLE IF EXISTS public.t1') OVER();
Sometimes the SQL that you want to push to the remote database is quite complex. In these cases, you might find useful to write the SQL in a file using your preferred editor, and then pass the file containing the SQL text to DBLINK()
using the following syntax:
=> SELECT DBLINK(USING PARAMETERS
cid='mysql',
query='@/tmp/myscript.sql') OVER()";
You can install DBLINK()
from the latest released binaries without needing to clone this repository. Or, you can clone the repository and build from source. Whichever way you chose, ODBC has to be configured on all nodes in order to use DBLINK()
..
DBLINK()
uses ODBC to interact with the remote databases. You must install and configure the following on all nodes in your cluster:
- ODBC Driver Manager (DBLINK has been tested with unixODBC)
- Specific ODBC Drivers for the remote database that you want to connect to
- Configure the ODBC layer
- Download the appropriate build of ldblink.so from the latest release. Example: ldblink.so.centos-v23.3.0 -> binary for vertica v23.3.0 on CentOS
- Rename the binary to
ldblink.so
- Copy
ldblink.so
to the initiator node. - Execute this SQL substituting the full path of ldblink.so on the initiator node:
CREATE OR REPLACE LIBRARY DBLink AS '/full/path/to/ldblink.so' LANGUAGE 'C++';
CREATE OR REPLACE TRANSFORM FUNCTION dblink AS LANGUAGE 'C++' NAME 'DBLinkFactory' LIBRARY DBLink ;
GRANT EXECUTE ON TRANSFORM FUNCTION dblink() TO PUBLIC ;
-
Create a Connection Identifier Database (a simple text file) under
/usr/local/etc/dblink.cids
. You can use a different location by changing theDBLINK_CIDS
define in the source code. For details, see Configure DBLINK(). -
(optional) Delete ldblink.so from initiator node.
Before you run make
commands, review the Makefile and make any necessary changes.
- First you need to Setup a C++ Development Environment
- Then to build using the installed Vertica SDK and devtoolset (centos): just run
make
- Install the library in Vertica (as dbadmin):
make install
- Create a Connection Identifier Database (a simple text file) under
/usr/local/etc/dblink.cids
. You can use a different location by changing theDBLINK_CIDS
define in the source code. For details, see Configure DBLINK().
This is mainly for testing and release, or when you want to build dblink for a specific vertica version on centos/ubuntu but you do not have the tools to do it. It uses the verticasdk docker image owned by Vertica to build dblink for any specified vertica version.
-
Compile the DBLINK source code with for the appropriate Vertica version and Linux distribution.
- To build without needing to set up a build environment, specify the version and target OS. For example:
it will create
$ make -C docker-dblink VERTICA_VERSION=12.0.2 OSTAG=ubuntu
ldblink.s0.ubuntu-v12.0.2
in the docker-dblink directory
- To build without needing to set up a build environment, specify the version and target OS. For example:
-
Install the library in Vertica (as dbadmin): Follow Installing pre-built libraries from step 2.
-
Create a Connection Identifier Database (a simple text file) under
/usr/local/etc/dblink.cids
. You can use a different location by changing theDBLINK_CIDS
define in the source code. For details, see Configure DBLINK().
You can uninstall the library with DROP LIBRARY DBLink
in vsql or by running make clean
.
DBLINK()
requires two parameters and accepts one optional parameter with the following syntax:
DBLINK(USING PARAMETERS cid=value, query=value[, rowset=value]);
Name | Required | Description |
---|---|---|
cid |
No | Connection Identifier Database. Identifies an entry in the connection identifier database. |
connect_secret |
No | The ODBC connection string containing the DSN and credentials. |
query |
Yes | The query being pushed on the remote database. If the first character of this parameter is @ , the rest is interpreted as the name of the file containing the query. |
rowset |
No | Number of rows retrieved from the remote database during each SQLFetch() cycle. Default is 100. |
For example, the following query retrieves data from the remote database 500 rows at a time:
=> SELECT DBLINK(USING PARAMETERS
cid='pgdb',
query='SELECT c_custkey, c_nationkey, c_phone FROM tpch.customer ORDER BY 1',
rowset=500) OVER();
c_custkey | c_nationkey | c_phone
-----------+-------------+-----------------
1 | 15 | 25-989-741-2988
2 | 13 | 23-768-687-3665
3 | 1 | 11-719-748-3364
4 | 4 | 14-128-190-5944
5 | 3 | 13-750-942-6364
6 | 20 | 30-114-968-4951
7 | 18 | 28-190-982-9759
...
One way to specify the connection parameters is to use a Connection Identifier
Database -- a simple text file containing the codes used with cid
. The
cid file must exist in the same location on all vertica nodes. For example:
$ cat /usr/local/etc/dblink.cids
# Vertica DBLINK Configuration File
#
# Connection IDs lines have the following format:
# <mnemonic code>:<ODBC configuration>
# and are terminated by a SINGLE '\n' (ASCII dec 10, ASCII hex 0x0a)
# Be aware of this! Windows editors might end lines with \r\n. In
# this case the Carriage Return is considered part of the ODBC config
# and can cause undefined ODBC Driver Behavior.
#
# Lines starting with '#' are considered comments
#
# Sample configuration:
pgdb:UID=mauro;PWD=xxx;DSN=pmf
myver:UID=mauro;PWD=xxx;DSN=vmf
mysql:USER=mauro;PASSWORD=xxx;DSN=mmf
Then use the cid
parameter to pick our connection:
SELECT DBLINK(USING PARAMETERS cid='myconnecction', query=...) ...
Another methods you can use to specify the connection parameters is to use connect_secret
.
This way you don't have to create a the dblink.cids database however defining
the connection parameters in the command line is not safe before Vertica
12.0.4. All queries are recorded under v_monitor.query_requests
and in
the log file, and that can expose passwords in the connect_secret
parameter.
SELECT DBLINK(USING PARAMETERS connect_secret='UID=mauro;PWD=secret;DSN=pmf', query=...) ...
Lastly, a final method is to set a UDPARAMETER in the session after
connecting to Vertica. The value for this SESSION PARAMETER won't be recorded
in query_requests
.
ALTER SESSION SET UDPARAMETER FOR ldblink dblink_secret = 'UID=mauro;PWD=secret;...' ;
SELECT DBLINK(USING PARAMETERS query='my first query') ...
SELECT DBLINK(USING PARAMETERS query='my second query') ...
SELECT DBLINK(USING PARAMETERS query='my third query') ...
You must install and configure the ODBC layer on all nodes for each database that you want to connect to.
The ODBC configuration depends on the specific ODBC Driver Manager and Database ODBC Drivers. The following is an example configuration file that configures unixODBC and the the PostgreSQL/MySQL drivers:
$ cat /etc/odbc.ini
[ODBC Data Sources]
PSQLODBC = PostgreSQL ODBC
MYODBC = MySQL ODBC
[pmf]
Description = PostgreSQL mftest
Driver = PSQLODBC
Trace = No
TraceFile = sql.log
Database = pmf
Servername = mftest
UserName =
Password =
Port = 5432
SSLmode = allow
ReadOnly = 0
Protocol = 7.4-1
FakeOidIndex = 0
ShowOidColumn = 0
RowVersioning = 0
ShowSystemTables = 0
ConnSettings =
Fetch = 1000
Socket = 4096
UnknownSizes = 0
MaxVarcharSize = 1024
MaxLongVarcharSize = 8190
Debug = 0
CommLog = 0
Optimizer = 0
Ksqo = 0
UseDeclareFetch = 0
TextAsLongVarchar = 1
UnknownsAsLongVarchar = 0
BoolsAsChar = 1
Parse = 0
CancelAsFreeStmt = 0
ExtraSysTablePrefixes = dd_
LFConversion = 0
UpdatableCursors = 0
DisallowPremature = 0
TrueIsMinus1 = 0
BI = 0
ByteaAsLongVarBinary = 0
LowerCaseIdentifier = 0
GssAuthUseGSS = 0
XaOpt = 1
UseServerSidePrepare = 0
[mmf]
Description = MySQL mftest
Driver = MYODBC
SERVER = mftest
PORT = 3306
SQL-MODE = 'ANSI_QUOTES'
$ cat /etc/odbcinst.ini
[ODBC]
Trace=off
Tracefile=/tmp/uodbc.trc
[PSQLODBC]
Description=PostgreSQL ODBC Driver
Driver64=/usr/lib64/psqlodbcw.so
UsageCount=1
[MYODBC]
Driver=/usr/lib64/libmyodbc8w.so
UsageCount=1
[MySQL ODBC 8.0 ANSI Driver]
Driver=/usr/lib64/libmyodbc8a.so
UsageCount=1
To ask a question, start a discussion. To report an issue, open up an issue and provide following information:
- The command that you ran and the associated output as shown on your screen by using the standard Vertica SQL client
vsql
. - Vertica version:
SELECT VERSION();
. DBLINK
library metadata by running statement the following as dbadmin:=> SELECT * FROM USER_LIBRARIES WHERE lib_name = 'ldblink';
- Attach the following ODBC configuration files:
odbc.ini
(please remove passwords or other confidential information)odbcinst.ini
- ODBC Driver Manager version and configuration. For example, with unixODBC, the output of the command
odbcinst -j
. - ODBC traces obtained while running the command (see 1.). To enable the ODBC traces you have to set
Trace = on
inodbcinst.ini
. Do not forget to switch ODBC tracing off at the end.