Replace routerify in file transfer server with axum #2461

Merged
merged 15 commits into from
Nov 20, 2023
45 changes: 24 additions & 21 deletions Cargo.lock

5 changes: 4 additions & 1 deletion Cargo.toml
@@ -43,6 +43,7 @@ az_mapper_ext = { path = "crates/extensions/az_mapper_ext" }
 backoff = { version = "0.4", features = ["tokio"] }
 base64 = "0.13"
 batcher = { path = "crates/common/batcher" }
+bytes = "1.4"
 c8y_api = { path = "crates/core/c8y_api" }
 c8y_config_manager = { path = "crates/extensions/c8y_config_manager" }
 c8y_firmware_manager = { path = "crates/extensions/c8y_firmware_manager" }
@@ -152,12 +153,14 @@ tedge_timer_ext = { path = "crates/extensions/tedge_timer_ext" }
 tedge_utils = { path = "crates/common/tedge_utils" }
 tedge-watchdog = { path = "crates/core/tedge_watchdog" }
 tempfile = "3.5"
-test-case = "2.2"
+test-case = "3.2"
 thiserror = "1.0"
 time = "0.3"
 tokio = { version = "1.23", default-features = false }
 tokio-util = { version = "0.7", features = ["codec"] }
 toml = "0.7"
+tower = "0.4"
+http-body = "0.4"
 tracing = { version = "0.1", features = ["attributes", "log"] }
 tracing-subscriber = { version = "0.3", features = ["time", "env-filter"] }
 try-traits = "0.1"

7 changes: 5 additions & 2 deletions crates/core/tedge_agent/Cargo.toml
@@ -12,6 +12,7 @@ repository = { workspace = true }
 [dependencies]
 anyhow = { workspace = true }
 async-trait = { workspace = true }
+axum = { workspace = true }
 camino = { workspace = true }
 clap = { workspace = true }
 flockfile = { workspace = true }
@@ -20,7 +21,6 @@ lazy_static = { workspace = true }
 log = { workspace = true }
 path-clean = { workspace = true }
 plugin_sm = { workspace = true }
-routerify = { workspace = true }
 serde = { workspace = true }
 serde_json = { workspace = true }
 tedge_actors = { workspace = true }
@@ -38,13 +38,16 @@ tedge_utils = { workspace = true }
 thiserror = { workspace = true }
 time = { workspace = true, features = ["formatting"] }
 tokio = { workspace = true, features = ["rt-multi-thread"] }
+tokio-util = { workspace = true }
 toml = { workspace = true }
 tracing = { workspace = true }
 which = { workspace = true }

 [dev-dependencies]
-serial_test = { workspace = true }
+bytes = { workspace = true }
+http-body = { workspace = true }
 tedge_actors = { workspace = true, features = ["test-helpers"] }
 tedge_mqtt_ext = { workspace = true, features = ["test-helpers"] }
 tedge_test_utils = { workspace = true }
 test-case = { workspace = true }
+tower = { workspace = true }

8 changes: 4 additions & 4 deletions crates/core/tedge_agent/src/agent.rs
@@ -153,10 +153,10 @@ impl Agent {
     pub fn init(&self) -> Result<(), anyhow::Error> {
         // `config_dir` by default is `/etc/tedge` (or whatever the user sets with --config-dir)
         create_directory_with_defaults(self.config.config_dir.join(".agent"))?;
-        create_directory_with_defaults(self.config.log_dir.clone())?;
-        create_directory_with_defaults(self.config.data_dir.clone())?;
-        create_directory_with_defaults(self.config.http_config.data_dir.file_transfer_dir())?;
-        create_directory_with_defaults(self.config.http_config.data_dir.cache_dir())?;
+        create_directory_with_defaults(&self.config.log_dir)?;
+        create_directory_with_defaults(&self.config.data_dir)?;
+        create_directory_with_defaults(&self.config.http_config.file_transfer_dir)?;
+        create_directory_with_defaults(self.config.data_dir.cache_dir())?;

         Ok(())
     }

34 changes: 16 additions & 18 deletions crates/core/tedge_agent/src/file_transfer_server/actor.rs
@@ -27,7 +27,7 @@ impl Actor for FileTransferServerActor {

     async fn run(mut self) -> Result<(), RuntimeError> {
         let http_config = self.config.clone();
-        let server = http_file_transfer_server(&http_config)?;
+        let server = http_file_transfer_server(http_config)?;

         tokio::select! {
             result = server => {
@@ -83,9 +83,12 @@ impl Builder<FileTransferServerActor> for FileTransferServerBuilder {
 #[cfg(test)]
 mod tests {
     use super::*;
+    use anyhow::bail;
+    use anyhow::ensure;
     use hyper::Body;
     use hyper::Method;
     use hyper::Request;
+    use std::time::Duration;
     use tedge_test_utils::fs::TempTedgeDir;
     use tokio::fs;

@@ -141,31 +144,26 @@ mod tests {
     }

     #[tokio::test]
-    #[serial_test::serial]
-    async fn check_server_does_not_panic_when_port_is_in_use() -> Result<(), anyhow::Error> {
+    async fn check_server_does_not_panic_when_port_is_in_use() -> anyhow::Result<()> {
         let ttd = TempTedgeDir::new();

         let http_config = HttpConfig::default()
             .with_data_dir(ttd.utf8_path_buf().into())
             .with_port(3746);
         let config_clone = http_config.clone();

-        // Spawn HTTP file transfer server
-        // handle_one uses port 3000.
-        let builder_one = FileTransferServerBuilder::new(http_config);
-        let handle_one = tokio::spawn(async move { builder_one.build().run().await });
-
-        // handle_two will not be able to bind to the same port.
-        let builder_two = FileTransferServerBuilder::new(config_clone);
-        let handle_two = tokio::spawn(async move { builder_two.build().run().await });
-
-        // although the code inside handle_two throws an error it does not panic.
-        tokio::time::sleep(std::time::Duration::from_millis(100)).await;
+        // Both servers will attempt to bind to port 3746.
+        let server_one = FileTransferServerBuilder::new(http_config).build().run();
+        // This server will not be able to bind to the same port.
+        let server_two = FileTransferServerBuilder::new(config_clone).build().run();

-        // to check for the error, we assert that handle_one is still running
-        // while handle_two is finished.
-        assert!(!handle_one.is_finished());
-        assert!(handle_two.is_finished());
+        tokio::select! {
+            // Ensure we bind server_one first by polling that future first
+            biased;
+            res = server_one => bail!("expected second server to finish first, but first finished with: {res:?}"),
+            res = server_two => ensure!(res.is_err(), "expected server two to fail with port binding error, but no error was found"),
+            _ = tokio::time::sleep(Duration::from_secs(5)) => bail!("timed out waiting for actor to stop running"),
+        }
@jarhodes314 (Contributor Author), Nov 16, 2023:

This refactoring is incidental. I noticed that the test had a hard sleep (so it was comparatively slow) and was not terribly deterministic (there was no guarantee that server one would attempt to bind first, and no guarantee that the port binding would fail within 100ms). The timeout-based approach gives the servers a lot more leniency to fail, and polling server_one first ensures that it binds to a port first (assuming that is the first thing it attempts to do, which I think is a reasonable assumption).

Contributor Author:

In response to #2461 (comment), I've now changed this test again, and as a result the remaining complexity around which port binds first (i.e. what I'm relying on the biased directive for) has disappeared entirely.
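
For readers unfamiliar with the failure this test exercises, the following is a minimal standalone sketch of a port-binding conflict. It uses only the standard library rather than the actor or axum code from this PR, `second_bind_error` is a hypothetical helper name, and it binds to port 0 (an assumption made so the OS picks a free port) instead of the hard-coded 3746.

```rust
use std::io::ErrorKind;
use std::net::{SocketAddr, TcpListener};

/// Binds one listener to an OS-assigned port, then tries to bind a second
/// listener to the same address. Returns the error kind of the second bind,
/// or `None` if the second bind unexpectedly succeeded.
/// (Hypothetical helper, not part of the PR.)
fn second_bind_error() -> std::io::Result<Option<ErrorKind>> {
    // Port 0 asks the OS for any free port, so the sketch cannot clash
    // with another process the way a fixed port could.
    let first = TcpListener::bind("127.0.0.1:0")?;
    let addr: SocketAddr = first.local_addr()?;

    // `first` is still alive here, so the port is definitely taken.
    let second = TcpListener::bind(addr);
    Ok(second.err().map(|err| err.kind()))
}

fn main() -> std::io::Result<()> {
    match second_bind_error()? {
        Some(kind) => println!("second bind failed as expected: {kind:?}"),
        None => println!("second bind unexpectedly succeeded"),
    }
    Ok(())
}
```

Because the first listener is created before the second bind is attempted, no ordering assumption or sleep is needed, which is the same property the reworked test aims for.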


         Ok(())
     }

88 changes: 82 additions & 6 deletions crates/core/tedge_agent/src/file_transfer_server/error.rs
@@ -1,5 +1,10 @@
+use axum::extract::rejection::PathRejection;
+use axum::response::IntoResponse;
+use hyper::StatusCode;
 use tedge_actors::RuntimeError;

+use super::request_files::RequestPath;
+
 #[derive(Debug, thiserror::Error)]
 pub enum FileTransferError {
     #[error(transparent)]
@@ -8,12 +13,6 @@ pub enum FileTransferError {
     #[error(transparent)]
     FromHyperError(#[from] hyper::Error),

-    #[error("Invalid URI: {value:?}")]
-    InvalidURI { value: String },
-
-    #[error(transparent)]
-    FromRouterServer(#[from] routerify::RouteError),
-
     #[error(transparent)]
     FromAddressParseError(#[from] std::net::AddrParseError),

@@ -24,8 +23,85 @@ pub enum FileTransferError {
     BindingAddressInUse { address: std::net::SocketAddr },
 }

+#[derive(Debug, thiserror::Error)]
+pub enum FileTransferRequestError {
+    #[error(transparent)]
+    FromIo(#[from] std::io::Error),
+
+    #[error("Request to delete {path:?} failed: {err}")]
+    DeleteIoError {
+        #[source]
+        err: std::io::Error,
+        path: RequestPath,
+    },
+
+    #[error("Request to upload to {path:?} failed: {err:?}")]
+    Upload {
+        #[source]
+        err: anyhow::Error,
+        path: RequestPath,
+    },
+
+    #[error("Invalid file path: {path:?}")]
+    InvalidPath { path: RequestPath },
+
+    #[error("File not found: {0:?}")]
+    FileNotFound(RequestPath),
+
+    #[error("Path rejection: {0:?}")]
+    PathRejection(#[from] PathRejection),
+}
+
 impl From<FileTransferError> for RuntimeError {
     fn from(error: FileTransferError) -> Self {
         RuntimeError::ActorError(Box::new(error))
     }
 }
+
+impl IntoResponse for FileTransferError {
+    fn into_response(self) -> axum::response::Response {
+        use FileTransferError::*;
+        let status_code = match self {
+            FromIo(_)
+            | FromHyperError(_)
+            | FromAddressParseError(_)
+            | FromUtf8Error(_)
+            | BindingAddressInUse { .. } => {
+                tracing::error!("{self}");
+                StatusCode::INTERNAL_SERVER_ERROR
+            }
+        };
+        status_code.into_response()
+    }
+}
+
+impl IntoResponse for FileTransferRequestError {
+    fn into_response(self) -> axum::response::Response {
+        use FileTransferRequestError::*;
+        match &self {
+            FromIo(_) | PathRejection(_) => {
+                tracing::error!("{self}");
Contributor:

suggestion: This will log an error not only when a request is handled, but every time .into_response() is called.

Axum has a tracing feature that makes it output traces when handling events. Could it be used for logging errors like we try to do here?

Contributor Author:

I'm not sure axum's tracing is quite what we want here. Here, I want to determine whether I'm hiding information from the user (which is when we have an internal server error), and if so, log that information. This avoids polluting the server logs with errors arising from clients misusing the API, but means we don't expose implementation details to the user of the API. If we rely on some sort of middleware for logging, either it would take an input of FileTransferRequestError, in which case we might as well do everything here as we'd still need to log selectively, or we'd take the converted response, by which point we know the status code, so it's easy to log selectively, but we've thrown away the pertinent information (e.g. an IO error).
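
The selective logging described in this reply can be sketched without axum. The example below is an illustration under stated assumptions, not the PR's code: `RequestError` and `respond` are hypothetical names, plain `u16` values stand in for `StatusCode`, and a `Vec<String>` stands in for the `tracing` log so the behaviour is observable.

```rust
use std::fmt;
use std::io;

/// Hypothetical, simplified stand-in for `FileTransferRequestError`.
#[derive(Debug)]
enum RequestError {
    Io(io::Error),
    FileNotFound(String),
}

impl fmt::Display for RequestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RequestError::Io(err) => write!(f, "IO error: {err}"),
            RequestError::FileNotFound(path) => write!(f, "File not found: {path:?}"),
        }
    }
}

/// Maps an error to `(status, public body)`. Only errors whose details are
/// hidden from the client are appended to `log` (a stand-in for `tracing`).
fn respond(err: &RequestError, log: &mut Vec<String>) -> (u16, String) {
    match err {
        // Internal failure: log the full detail, return a generic body.
        RequestError::Io(_) => {
            log.push(err.to_string());
            (500, "Internal error".to_owned())
        }
        // Client mistake: nothing is hidden, so nothing is logged.
        RequestError::FileNotFound(_) => (404, err.to_string()),
    }
}

fn main() {
    let mut log = Vec::new();
    let internal = RequestError::Io(io::Error::new(
        io::ErrorKind::PermissionDenied,
        "permission denied on an internal file",
    ));
    let missing = RequestError::FileNotFound("a.txt".to_owned());

    println!("{:?}", respond(&internal, &mut log));
    println!("{:?}", respond(&missing, &mut log));
    println!("server log: {log:?}");
}
```

Doing the logging at the point of conversion keeps the original error available; a middleware that only sees the finished response would know the status code but not the underlying IO error.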

+                (
+                    StatusCode::INTERNAL_SERVER_ERROR,
+                    "Internal error".to_owned(),
+                )
+            }
+            DeleteIoError { path, .. } => {
+                tracing::error!("{self}");
+                (
+                    StatusCode::FORBIDDEN,
Contributor Author:

Do these really want to be 403 Forbidden (it's what the code did previously)? It feels like an encapsulation failure to recognise for certain paths that we can't delete/upload to them.

@Bravo555 (Contributor), Nov 16, 2023:

IMO "could not delete path due to IO error" could happen due to, e.g., the path not existing, in which case we need to return 404, or due to the server not having permission to delete from the filesystem, in which case we should return 500. So if I'm not wrong, even using a single return code for all IO errors is kinda bad. As for 403 Forbidden, I also feel that it should be used when the client that made the request doesn't have permission to do something, which doesn't seem to be the case here.

@jarhodes314 (Contributor Author), Nov 16, 2023:

Yeah, that was roughly what I was thinking. We currently handle the not found case by returning 202 Accepted, like the successful deletion case (which I assume is to make DELETE requests idempotent). Returning 404 Not Found may also make sense, or if the file previously existed, 410 Gone, although a success response tells the user the file definitely no longer exists, and the request definitely succeeded.

I think if we've hit the case handled by this code, it's some error accessing the file system, which in my mind should definitely be a 500 Internal Server Error. This code may also be caused by trying to delete a directory, which I talk about in a different comment.

Contributor:

403 doesn't look bad, as the spec mentions the following: "The access is tied to the application logic, such as insufficient rights to a resource", which is the case here, right? I vaguely remember a similar discussion happening when the original author introduced this case in the first place, and we settled on 403 for the same reason.

Contributor Author:

I don't understand what application logic is relevant here. With the current API at least, we have no control over permissions. If there's a permissions error on the file system, that implies some other process is interfering with /var/tedge/file-transfer, which is not an application-level error, it's an error from the environment. The question here mostly boils down to whether we want to assume it's some file that should never have been there (in which case encapsulating the error entirely and responding with 404 Not Found is appropriate), or whether we assume the file was uploaded correctly, but something has changed the internal permissions so we can't access it (in which case a 500 Internal Server Error is appropriate).

Thinking more broadly, a useful litmus test to apply here would be "if one user receives a 403 for a resource, there must exist some other user that is able to access the resource in question". If not, and the 403 response is global, then I don't believe this can be considered an authorization issue.

Contributor Author:

@didier-wenzek @reubenmiller do you have opinions on this point?

@didier-wenzek (Contributor), Nov 17, 2023:

Deleting a resource that doesn't exist is definitely not an error. Returning 202 in such a case makes the request idempotent.

Trying to delete a non-empty directory should be considered a user error, as we don't want to support recursive deletes. So 400 makes sense. I would do the same for empty directories.

If the error is due to missing access, I would say this is a 500 as the file-transfer service should own all the files under file transfer.

Contributor:

> 403 doesn't look bad, as the spec mentions the following: "The access is tied to the application logic, such as insufficient rights to a resource."

From RFC 9110:

The 403 (Forbidden) status code indicates that the server understood the request but refuses to fulfill it.

If we get an IO error due to missing file permissions, I would say that the server tried to execute the request, but failed, due to a misconfiguration (e.g. not having write access to file-transfer directory), in which case 500 would be fitting.

@jarhodes314 (Contributor Author), Nov 17, 2023:

> Trying to delete a non-empty directory should be considered a user error, as we don't want to support recursive deletes. So 400 makes sense. I would do the same for empty directories.

I think if it is a directory, 404 makes more sense, as the error is in the URL and the issue is that the resource in question cannot be interacted with (and essentially doesn't exist), not that the request is inherently invalid. I'll make sure the error message clarifies that the file we attempted to delete is a directory.

Other than that, it seems like there is now broad consensus that 500 is the appropriate response for an IO error, so I'll sort that out too.
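
As a rough illustration of where this thread ends up (202 for a successful or repeated delete, 404 for a directory, 500 for any other IO failure), here is a std-only sketch. `delete_status` is a hypothetical function, the `u16` return values stand in for `StatusCode`, and the mapping reflects the proposal in the comment above rather than the code in this diff, which still returns 403 at this commit.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Hypothetical mapping from the outcome of a delete to an HTTP status code,
/// following the consensus reached in the review thread.
fn delete_status(path: &Path) -> u16 {
    // Directories are treated as if the resource does not exist:
    // recursive deletes are deliberately unsupported.
    if path.is_dir() {
        return 404;
    }
    match fs::remove_file(path) {
        // Deleted successfully.
        Ok(()) => 202,
        // Already gone: still a success, which keeps DELETE idempotent.
        Err(err) if err.kind() == ErrorKind::NotFound => 202,
        // Anything else (e.g. missing permissions) is an environment
        // problem on the server side, not a client error.
        Err(_) => 500,
    }
}

fn main() -> std::io::Result<()> {
    // Scratch directory for the demonstration; the name is arbitrary.
    let dir = std::env::temp_dir().join(format!("delete-status-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let file = dir.join("upload.txt");
    fs::write(&file, b"data")?;

    println!("first delete:  {}", delete_status(&file));
    println!("repeat delete: {}", delete_status(&file));
    println!("directory:     {}", delete_status(&dir));

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Checking `is_dir` before calling `remove_file` is a simplification; a real handler would also have to consider races between the check and the delete.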

+                    format!("Cannot delete path {path:?}"),
+                )
+            }
+            Upload { path, .. } => {
+                tracing::error!("{self}");
+                (
+                    StatusCode::FORBIDDEN,
+                    format!("Cannot upload to path {path:?}"),
+                )
+            }
+            InvalidPath { .. } | FileNotFound(_) => (StatusCode::NOT_FOUND, self.to_string()),
+        }
+        .into_response()
+    }
+}