Interface contract for robust error handling? #15

didibus · 2023-01-24T20:13:09Z


For a user to build a robust, production-grade use of the library APIs, the error scenarios need to be well documented. Ideally they should all follow well-defined patterns, so the user has a simple way to handle every error scenario appropriately.

For example, a use case I've tried to implement is fetching thousands of URLs while trying to get all of them to succeed.

The starting code looks like:

(ns script
  (:require [babashka.http-client :as http]))

(->> ["https://clojure.org"
      "https://google.com"
      "https://amazon.com"
      "https://localhost"
      "garbage"]
     (mapv #(http/get % {:async true}))
     (mapv deref)
     (println))

In practice, the vector of URL strings would contain thousands more URLs.

In a real-life scenario, we can't assume we have the cleanest data. First of all, that means some URLs might be malformed to begin with, such as "garbage" in my example.

In that case, http/get will not behave asynchronously even though we passed :async true, because its URL-to-URI parsing runs synchronously and throws synchronously on error, instead of always returning a CompletableFuture.

This means users must know that some errors are synchronous, while others are async. We might update the code as follows:

(->> ["https://clojure.org"
      "https://google.com"
      "https://amazon.com"
      "https://localhost"
      "garbage"]
     (mapv #(try (http/get % {:async true})
                 (catch Exception e (delay e))))
     (mapv deref)
     (println))

Wrapping the exception in a delay defers the synchronous errors so that they surface later, when we deref the results. This lets us move all the error handling to the deref step.
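A possible variation, sketched here under assumptions rather than taken from the library: instead of a delay, wrap the synchronous exception in an already-failed CompletableFuture, so every element fails the same way on deref. `fake-get` is a hypothetical stand-in for `http/get` with `:async true`, used only so the snippet runs without network access. `always-async` and `deref-as-value` are illustrative names, and `CompletableFuture/failedFuture` requires Java 9 or later.

```clojure
(import '(java.util.concurrent CompletableFuture ExecutionException))

;; Hypothetical stand-in for (http/get url {:async true}): throws synchronously
;; on a malformed URL, otherwise returns a CompletableFuture.
(defn fake-get
  [url]
  (if (re-find #"^https?://" url)
    (CompletableFuture/completedFuture {:status 200 :url url})
    (throw (IllegalArgumentException. (str "malformed URL: " url)))))

(defn always-async
  "Calls request-fn on url and guarantees a CompletableFuture comes back,
  even when request-fn throws synchronously."
  [request-fn url]
  (try
    (request-fn url)
    (catch Exception e
      (CompletableFuture/failedFuture e))))

(defn deref-as-value
  "Derefs a future. On failure, returns the underlying exception as a value,
  unwrapping the ExecutionException that a failed future throws on deref."
  [fut]
  (try
    (deref fut)
    (catch ExecutionException e
      (or (.getCause e) e))
    (catch Exception e
      e)))

(def results
  (->> ["https://clojure.org" "garbage"]
       (mapv #(always-async fake-get %))
       (mapv deref-as-value)))
```

With this shape, both synchronous and asynchronous failures reach the caller through the same path, at the cost of having to unwrap the ExecutionException.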

Next, the user needs to know how to handle errors in the deref part. For a robust implementation, the user wants to retry on all errors that indicate a transient issue, and to log and continue on all errors that indicate a problem with the input or a bug, for which retrying would be fruitless. Finally, if the retryable ones still fail after some number of retries, the user wants to log and continue on those as well.

The first code refactor might look like:

(->> ["https://clojure.org"
      "https://google.com"
      "https://amazon.com"
      "https://localhost"
      "garbage"]
     (mapv #(try (http/get % {:async true})
                 (catch Exception e (delay e))))
     (mapv #(try (deref %)
                 (catch Exception e e)))
     (println))

Here we're turning the exceptions thrown by deref into return values, so we can process them afterwards to check which requests succeeded and which errored.

Now we refactor the code again so we can handle errors:

(defn error-type
  [result]
  (cond (instance? clojure.lang.ExceptionInfo result)
        :retryable-error
        (instance? Exception result)
        :non-retryable-error
        (and (map? result) (not= 200 (:status result)))
        :retryable-error
        :else
        :success))

(defn batch-fetch
  [urls]
  (->> urls
       (mapv #(try (http/get % {:async true})
                   (catch Exception e (delay e))))
       (mapv #(try (deref %)
                   (catch Exception e e)))
       (group-by error-type)
       (doall)))

(batch-fetch ["https://clojure.org"
              "https://google.com"
              "https://amazon.com"
              "https://localhost"
              "garbage"])

This is the crux of the issue: look at the error-type function. Here we want a reliable way to identify everything that's not a success and, for each such result, whether it's retryable or not. But because babashka.http-client doesn't provide well-defined errors, it is very difficult to know how to exhaustively identify the types of errors that could happen.

Currently, it appears the error will sometimes be an ex-info, but not always. If it is an ex-info, it's not clear how to tell what type of error it is, or whether there will always be a :status key on its ex-data.

The user would need documented descriptions of where errors are thrown or returned, the set of all possible types of errors at each point, and their structure.

At a minimum, a consistent structure for all errors would help a lot. For example, all errors could be ex-info with the type of error documented on a :type key of the ex-data, along with a documented list of all possible error types and a guarantee that every error has an ex-data with the details.

That way, inside error-type we could simply inspect the :type key of the ex-data: if it's one of the retryable types, we classify the error as retryable, otherwise as non-retryable.
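As a rough sketch of what that could look like, assuming the proposed contract existed: the keywords in `retryable-types` are hypothetical, since the library does not document such a set today, and this version of error-type assumes failures arrive as exception values, as in the batch-fetch code above.

```clojure
;; Hypothetical error types; the library does not document such a set today.
(def retryable-types
  #{:timeout :connection-refused :server-error})

(defn error-type
  "Classifies a result, assuming every error is an ex-info whose ex-data
  carries a documented :type key (an assumption, not current behavior)."
  [result]
  (cond
    ;; Anything that is not an exception value counts as a success here.
    (not (instance? Throwable result))
    :success

    ;; ex-data returns nil for exceptions that are not ex-info, so those
    ;; fall through to the non-retryable branch.
    (contains? retryable-types (:type (ex-data result)))
    :retryable-error

    :else
    :non-retryable-error))
```

Treating unknown or missing types as non-retryable is a conservative choice in this sketch; a caller could just as well decide to retry unknown errors once.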

This would also tell us what to log for the non-retryable ones. Since we could always expect an ex-data with some common structure and an ex-message, we could log something like:

(log (str (:type (ex-data err)) ", " (ex-message err) ", " (ex-data err)))

Finally, it would be nice if all errors also contained the URL they failed on. For the retryable ones, for example, the user wants to retry, but if the error no longer carries the information needed to map it back to the async get call it came from, knowing which one to retry is non-trivial. Ideally, either the URL would be part of the error, so we can retry using the error data itself, or we could pass some ID to the async get call that is then included on the error, so we can map it back.
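Until something like that exists, one workaround is to carry the URL alongside each pending request on the caller's side. The following is only a sketch under assumptions: `start-fn` stands for something like `#(http/get % {:async true})`, `retryable?` is a caller-supplied predicate over results, and `fetch-all-tagged` and `batch-with-retries` are illustrative names, not library functions.

```clojure
(defn fetch-all-tagged
  "Starts every request first, then derefs each one. Returns a vector of
  {:url url :result value-or-exception}, so failures stay tied to their URL."
  [start-fn urls]
  (->> urls
       (mapv (fn [url]
               {:url url
                :pending (try
                           (start-fn url)
                           ;; Synchronous failure: keep it as a value for later.
                           (catch Exception e (delay e)))}))
       (mapv (fn [{:keys [url pending]}]
               {:url url
                :result (try
                          (deref pending)
                          (catch Exception e e))}))))

(defn batch-with-retries
  "Fetches urls, re-fetching those whose result satisfies retryable?, at most
  max-retries extra times. Results that are still failing after the last
  attempt are returned as they are."
  [start-fn retryable? max-retries urls]
  (loop [pending urls
         attempt 0
         done []]
    (let [results (fetch-all-tagged start-fn pending)
          grouped (group-by #(boolean (retryable? (:result %))) results)
          retry   (get grouped true)
          settled (get grouped false)]
      (if (or (empty? retry) (>= attempt max-retries))
        (into done results)
        (recur (mapv :url retry)
               (inc attempt)
               (into done settled))))))
```

Note that the returned vector is not in input order, because retried URLs are appended after the ones that settled earlier; the :url key is what maps each result back to its request.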

Regards.

lispyclouds (Maintainer) · 2023-01-31T08:22:29Z

@borkdude some observations from me, if they help:

As far as I can see, none of the existing HTTP clients in Clojure provide an error interface like the one described here.
All of them use the status code to determine whether an error has happened, and other classes of errors are raw exceptions. Also, none of the errors are datafied.

The closest thing I can see is the way aws-api does it, but even that has this caveat:

Note that AWS will sometimes (rarely, but not never) return a 200 with an error message in the response payload, e.g. https://aws.amazon.com/pt/premiumsupport/knowledge-center/s3-resolve-200-internalerror/. 
Since we don't have a consistent, reliable way to programatically check for this, aws-api does not convert these responses to anomalies.

Also, it doesn't handle network- or client-related exceptions: they are thrown verbatim, and not all errors are datafied.


borkdude (Maintainer) · 2023-01-31T10:01:31Z

@lispyclouds Good observations. I view http-client as a light-weight tool to make working with java.net.http easier, but not as a tool which wraps Java exceptions and translates them to an ex-info. But if we're throwing ex-info ourselves, I want to add data if this makes things easier for users, e.g. by adding a :type field. I think what we need are problems from real-world scenarios, and based on those we can make incremental improvements. I'll convert this issue to a GitHub Discussion and will reserve issues for directly actionable problems + solutions.


didibus (Author) · 2023-02-01T02:30:37Z

I think it's okay to punt the problem to the user, but at least the wrapper around java.net.http shouldn't make it harder to handle errors than if they used java.net.http directly.

Right now it feels like it does, because it obscures which exceptions will come raw from java.net.http and which ones are going to be altered by babashka.http-client.

Also, I find it strange that it wraps http error status codes behind an exception. I'd rather get a map back with status = to whatever status code, and the rest of the keys corresponding to that status code.

But if I look at https://stackoverflow.com/questions/67523816/how-where-to-check-for-java-net-http-httpclient-http-response-status-codes it seems people find that even java.net.http has badly documented error handling, and they struggle to handle errors with it as well.


lispyclouds (Maintainer) · Feb 1, 2023

Agreed. Here are some of my thoughts specifically on:

Also, I find it strange that it wraps http error status codes behind an exception. I'd rather get a map back with status = to whatever status code, and the rest of the keys corresponding to that status code.

This is built more with scripting in mind, where it's more helpful to bail out on exceptional status codes than to remember to check the status and program accordingly. The point is that failing fast seems to bode well for scripts.
I myself had similar thoughts but got the point later 😃.
Since this also tries to offer a smooth migration path from existing Clojure HTTP libs, it follows behaviour similar to:
- clj-http
- this section in hato: throw-exceptions? By default, the client will throw exceptions for exceptional response statuses. Set this to false to return the response without throwing.

I think these are some of the rationales for the current implementation choices, and I totally get the need to have something new and incrementally better, especially when building a thing from the ground up.
