This tool retrieves a file from an HTTP or HTTPS location. The general flow is shown in the diagram below. The diagram reflects a desired state rather than the current state (see Note 1).

WARNING: anti-virus scanning is not currently included, even though it appears in the diagram as a desired feature.
Requests should be sent as JSON, for example:

```json
{
  "method": "GET",
  "url": "http://example.com/some/file",
  "timeout_seconds": 5
}
```
The following field is mandatory:
- `url`: the HTTP or HTTPS location of the file to retrieve

The following fields are optional:
- `method`: defaults to `GET`. Valid options are `GET` and `POST`.
- `timeout_seconds`: integer number of seconds; defaults to 5.
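The request rules above can be sketched as a small validation function that fills in the documented defaults. This is a minimal illustration, not the tool's actual code: `parse_request` and `InvalidRequest` are hypothetical names, and rejecting non-HTTP(S) URLs and non-positive timeouts are assumptions. A request rejected here would presumably correspond to exit code 0010 (Invalid Request).

```python
import json
from urllib.parse import urlparse

# Documented defaults; all names in this sketch are hypothetical.
DEFAULT_METHOD = "GET"
DEFAULT_TIMEOUT_SECONDS = 5
VALID_METHODS = ("GET", "POST")


class InvalidRequest(ValueError):
    """Malformed request (presumably reported as exit code 0010)."""


def parse_request(raw: str) -> dict:
    """Parse a JSON request body and apply the documented defaults."""
    try:
        body = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidRequest(f"request is not valid JSON: {exc}") from exc
    if not isinstance(body, dict):
        raise InvalidRequest("request must be a JSON object")

    # 'url' is the only mandatory field; assumption: only http/https allowed.
    url = body.get("url")
    if not isinstance(url, str) or urlparse(url).scheme not in ("http", "https"):
        raise InvalidRequest("'url' is mandatory and must be an HTTP(S) URL")

    method = body.get("method", DEFAULT_METHOD)
    if method not in VALID_METHODS:
        raise InvalidRequest(f"'method' must be GET or POST, got {method!r}")

    timeout = body.get("timeout_seconds", DEFAULT_TIMEOUT_SECONDS)
    # bool is a subclass of int in Python, so reject it explicitly.
    # Assumption: zero or negative timeouts are rejected.
    if not isinstance(timeout, int) or isinstance(timeout, bool) or timeout <= 0:
        raise InvalidRequest("'timeout_seconds' must be a positive integer")

    return {"method": method, "url": url, "timeout_seconds": timeout}
```

For example, `parse_request('{"url": "https://example.com/some/file"}')` yields a request with `method` set to `GET` and `timeout_seconds` set to 5.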
Exit codes are grouped into series by their hundreds digit:

- 01xx: access failures
- 02xx: file format failures
- 03xx: concerns with the file contents, such as malware
- 04xx: the file could not be uploaded to the staging area from which it is dispatched for further processing (Airbyte)
- 05xx: errors replying to the system that requested the file retrieval
Exit codes:

| Code | Meaning |
|------|---------|
| 0000 | Success |
| 0010 | Invalid Request |
| 0100 | Error Retrieving Credentials |
| 0101 | Login Error (OAuth failure, HTTP 401, etc.) |
| 0102 | Access Denied (HTTP 403) |
| 0103 | File Not Found (HTTP 404) |
| 0199 | Unknown Retrieval Error |
| 0200 | Decompression failed (unsupported format, corrupted, etc.) |
| 0300 | File, or embedded file, flagged by anti-malware |
| 0400 | Upload Failed |
| 0500 | Response to source system (Airflow) failed |
| 9999 | Operation unsupported |
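A minimal Python sketch of how the exit codes and their series might be handled in code. The names `series_of` and `exit_code_for_http_status` are hypothetical, and the HTTP status mapping is an assumption read off the table (401, 403 and 404 map to their listed codes; any other non-2xx status falls back to 0199). A 2xx response returns `None` because retrieval succeeded and later stages decide the final code.

```python
from typing import Optional

# Meanings copied from the exit-code table; helper names are hypothetical.
EXIT_CODES = {
    "0000": "Success",
    "0010": "Invalid Request",
    "0100": "Error Retrieving Credentials",
    "0101": "Login Error (OAuth failure, HTTP 401, etc.)",
    "0102": "Access Denied (HTTP 403)",
    "0103": "File Not Found (HTTP 404)",
    "0199": "Unknown Retrieval Error",
    "0200": "Decompression failed (unsupported format, corrupted, etc.)",
    "0300": "File, or embedded file, flagged by anti-malware",
    "0400": "Upload Failed",
    "0500": "Response to source system (Airflow) failed",
    "9999": "Operation unsupported",
}

# Failure series, keyed by the first two digits of the four-digit code.
SERIES = {
    "01": "access failure",
    "02": "file format failure",
    "03": "file contents concern",
    "04": "upload to staging area failed",
    "05": "reply to requesting system failed",
}


def series_of(code: str) -> str:
    """Return the failure series for a code such as "0103", else "other"."""
    return SERIES.get(code[:2], "other")


def exit_code_for_http_status(status: int) -> Optional[str]:
    """Map the retrieval's HTTP status to an exit code (assumed mapping)."""
    if 200 <= status < 300:
        return None  # retrieval succeeded; later stages pick the final code
    if status == 401:
        return "0101"
    if status == 403:
        return "0102"
    if status == 404:
        return "0103"
    return "0199"  # any other failure is an unknown retrieval error
```

Codes outside the numbered series (0000, 0010 and 9999) fall into `"other"` and are looked up individually in `EXIT_CODES`.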
Build the Docker image with the command below. Omitting `--no-cache` speeds up the build, but the build then occasionally seems to miss changes to the files (I'm probably doing it wrong).

```
docker build --no-cache -t timstestcntnrreg.azurecr.io/http_file_rtrvr:0.0.2 .
```
Push the image to Azure Container Registry (ACR), using the same tag that was built:

```
docker push timstestcntnrreg.azurecr.io/http_file_rtrvr:0.0.2
```
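One way to keep the build and push tags in sync is to derive both commands from a single version string. The Python sketch below is a hypothetical helper, not part of the project; it shells out to the same `docker build` and `docker push` commands and assumes the Docker CLI is already authenticated to the registry (for example via `az acr login --name timstestcntnrreg`).

```python
import subprocess
from typing import List

# Registry and image name taken from the commands above.
REGISTRY = "timstestcntnrreg.azurecr.io"
IMAGE = "http_file_rtrvr"


def image_ref(version: str) -> str:
    """Full image reference, e.g. timstestcntnrreg.azurecr.io/http_file_rtrvr:0.0.2."""
    return f"{REGISTRY}/{IMAGE}:{version}"


def build_and_push(version: str, no_cache: bool = True, dry_run: bool = False) -> List[List[str]]:
    """Build and push one image, using the same tag for both steps.

    With dry_run=True the commands are returned without being executed.
    """
    ref = image_ref(version)
    build = ["docker", "build"]
    if no_cache:
        build.append("--no-cache")
    build += ["-t", ref, "."]
    push = ["docker", "push", ref]
    if not dry_run:
        for cmd in (build, push):
            # check=True raises CalledProcessError if docker exits non-zero,
            # so a failed build is never followed by a push.
            subprocess.run(cmd, check=True)
    return [build, push]
```

Calling `build_and_push("0.0.2")` from the directory containing the Dockerfile runs both commands; `dry_run=True` only returns them for inspection.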