Anthony Virtuoso edited this page Nov 16, 2019 · 8 revisions

Q: What data sources are supported?

A: Please refer to Available Connectors, or look at the module list in the root of our GitHub repository.


Q: What limitations come with using Lambda for big data?

A: The most relevant restrictions are the 15-minute maximum runtime and 3 GB maximum memory available to any given Lambda invocation. If the source system that you are federating to supports partitioning or parallel scans, Athena will use multiple Lambda invocations, which extends both the total runtime and the total working memory available to the connector. Because Athena's execution engine delegates only Table Scan operations to your connector, your query can run much longer than 15 minutes and use considerably more than 3 GB of memory. These restrictions apply only to the fragments of the SQL Table Scan operations that Athena delegates to Lambda.
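To make the arithmetic concrete, here is a minimal sketch, assuming a hypothetical scan planner where each split of the source runs in its own Lambda invocation. The `Split` type, `plan_scan` function, and the per-split estimates are illustrative inventions, not part of the SDK; only the 15-minute and 3 GB limits come from the answer above. Each split must fit within one invocation's limits, but the totals across splits can be far larger.

```python
from dataclasses import dataclass
from typing import List

# Per-invocation limits as stated in the answer above (15 minutes, 3 GB).
MAX_RUNTIME_MINUTES = 15
MAX_MEMORY_GB = 3


@dataclass
class Split:
    """Hypothetical unit of parallel scan work handled by one Lambda invocation."""
    partition: str
    est_minutes: float
    est_memory_gb: float


def fits_in_one_invocation(split: Split) -> bool:
    """A split can only be scanned if it stays under the per-invocation limits."""
    return split.est_minutes <= MAX_RUNTIME_MINUTES and split.est_memory_gb <= MAX_MEMORY_GB


def plan_scan(splits: List[Split]) -> dict:
    """Summarize aggregate work when each split runs in its own invocation."""
    if not all(fits_in_one_invocation(s) for s in splits):
        raise ValueError("a split exceeds the per-invocation limits; partition more finely")
    return {
        "invocations": len(splits),
        # Summed compute time across invocations, not wall-clock time.
        "total_scan_minutes": sum(s.est_minutes for s in splits),
        # Summed memory; this is the peak only if every invocation runs concurrently.
        "total_memory_gb": sum(s.est_memory_gb for s in splits),
    }
```

For example, twelve monthly partitions that each take about 10 minutes and 2.5 GB would fit within the limits individually while adding up to 120 minutes of scan work and 30 GB of working memory. A single split that needs 20 minutes is rejected, which is why partitioning or parallel-scan support in the source matters.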


Q: What is a 'Connector'?

A: A 'Connector' is a piece of code that can translate between your target data source and Athena. Today this code is expected to run in an AWS Lambda function, but in the future we hope to offer more options. You can think of a connector as an extension of Athena's query engine: Athena will delegate portions of the federated query plan to your connector.

For a more detailed explanation of the functionality that comprises a connector, see: MetadataHandler, RecordHandler, UserDefinedFunctionHandler
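As a rough mental model of how the metadata and record responsibilities divide, here is a toy sketch in Python. It is not the SDK API: the real handlers are Java classes in the Athena Query Federation SDK, and every class, method, and data name below (`ToyMetadataHandler`, `get_splits`, `read_split`, `SOURCE`, and so on) is hypothetical. The metadata side answers planning questions (which tables exist, how a scan can be split), and the record side reads the rows for one split at a time.

```python
from typing import Dict, Iterator, List

# Hypothetical in-memory "source system": table name -> partition -> rows.
SOURCE: Dict[str, Dict[str, List[dict]]] = {
    "orders": {
        "2019-10": [{"id": 1, "total": 5}, {"id": 2, "total": 7}],
        "2019-11": [{"id": 3, "total": 11}],
    }
}


class ToyMetadataHandler:
    """Answers planning questions: which tables exist and how to split a scan."""

    def list_tables(self) -> List[str]:
        return sorted(SOURCE)

    def get_splits(self, table: str) -> List[str]:
        # One split per partition lets the engine fan the scan out in parallel.
        return sorted(SOURCE[table])


class ToyRecordHandler:
    """Reads the rows for exactly one split."""

    def read_split(self, table: str, split: str) -> Iterator[dict]:
        yield from SOURCE[table][split]


def engine_scan(table: str) -> List[dict]:
    """Stand-in for the query engine: plan with metadata, then read every split."""
    meta, records = ToyMetadataHandler(), ToyRecordHandler()
    rows: List[dict] = []
    for split in meta.get_splits(table):
        # In Athena, each of these reads could be a separate Lambda invocation.
        rows.extend(records.read_split(table, split))
    return rows
```

Keeping planning and reading separate is what allows the engine to issue many independent reads, one per split.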


Q: I'd like to connect to XYZ, but XYZ is not in the list of available connectors. What should I do?

A: The Amazon Athena Query Federation SDK allows you to customize Amazon Athena with your own code. This enables you to integrate with new data sources or proprietary data formats, or to build new user-defined functions. If you are comfortable developing your own connector, we recommend going through our Getting Started Guide.

Alternatively, you can raise an issue and let us know what data source you feel would be valuable for us to support.


Q: What connectivity / networking requirements are there for Athena Federation?

A: From the outset, we expected that customers wanting to run federated queries would have a diverse technology landscape, potentially composed of many microservices and application-specific VPCs that create islands of data. As such, Athena has no specific networking requirements for federating queries across your sources. Instead, Athena gives you the freedom to deploy connectors as Lambda functions in whichever VPC(s) allow each connector to communicate with its source. For example, you can deploy three different copies of the JDBC connector so that Athena can join across MySQL in VPC1, Redshift in VPC2, and Postgres in VPC3. At no point does Athena speak directly to these VPCs, and the VPCs do not require any peering or connectivity to each other.
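The isolation described above can be modeled with a small sketch, assuming three hypothetical connectors that each see only their own source's rows. The data, `make_connector`, and `engine_join` are illustrative only; in a real deployment each connector is a Lambda function attached to its own VPC, and the join happens in Athena's engine rather than in Python. The point is that no connector ever reads another connector's data: the engine combines whatever each one returns.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical data; in reality each list lives inside a separate VPC.
MYSQL_VPC1: List[Row] = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Bo"}]
REDSHIFT_VPC2: List[Row] = [{"customer_id": 1, "order_total": 40}, {"customer_id": 2, "order_total": 15}]
POSTGRES_VPC3: List[Row] = [{"customer_id": 1, "tier": "gold"}]


def make_connector(rows: List[Row]) -> Callable[[], List[Row]]:
    """Each connector closes over only its own source, mirroring one Lambda per VPC."""
    def scan() -> List[Row]:
        return [dict(r) for r in rows]
    return scan


def engine_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join done by the engine on rows returned by two connectors.

    Assumes `key` is unique on the right-hand side (fine for this toy data).
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


connectors = {
    "mysql": make_connector(MYSQL_VPC1),
    "redshift": make_connector(REDSHIFT_VPC2),
    "postgres": make_connector(POSTGRES_VPC3),
}
```

Joining the three scans on `customer_id` yields a single row for customer 1, since only that customer appears in all three sources.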