Retry Mechanisms in Apache Camel
The Apache Camel library is, among other things, great for consuming messages on a retry schedule, but the setup can be a bit daunting. So in this article, I'll give you an overview of how to do it, as well as some useful tips.
Apache Camel is an incredibly powerful Java framework for (asynchronously) routing messages between different components within your system.
In this context, a component is very loosely defined as something capable of receiving or producing data, often in the form of so-called messages. Camel supports over 300 different components, examples of which include:
- ActiveMQ
- Bean (referring to Spring or Java EE beans)
- Dropbox (for reading or writing files)
- Google Sheets
- JDBC
- Log (using SLF4J)
When you're not only routing messages between different components but also processing them in Java (via the bean component, for example), there are many reasons why the processing of a particular message could fail.
So in this article, we will explore Apache Camel's retry pattern by taking a look at the different built-in retry mechanisms when consuming messages from a queue.
Table of Contents
- The Application
- Dead Letter Queues (DLQs)
- Reusable Builder Classes
- Basic Retries
- Exponential Backoff
- Logging
- Conclusion
The Application
We're going to need some kind of application to discuss in this article, so consider an e-mail microservice as part of some larger system.
This microservice is connected to a message broker, where it's constantly listening for CustomerCreated application events on a particular queue.
In reaction to such an event, the microservice should simply send the newly registered customer a welcome e-mail.
For our technologies, we'll go with Spring Boot for the microservice app and ActiveMQ for the message broker. You can find this application's source code on GitHub - to follow along, please make sure to have the following tools installed:
- Java 13
- Maven
- Docker Compose (for easily spinning up an ActiveMQ instance via the docker-compose up command)
To keep things simple, all e-mails will simply be logged to the console, without actually integrating with an external e-mail provider like Mailgun or Sendgrid.
Dead Letter Queues (DLQs)
Once you've added the required (Spring) dependencies for Camel itself, the ActiveMQ component and the Jackson data format, the following route builder can be used for consuming messages from a customer.created queue and sending a welcome e-mail in response.
Unfortunately, there's a potential downside to this (simple) approach.
If a CustomerCreated event comes in but an exception occurs while attempting to send the welcome e-mail, Camel will log the exception to the console, but the original CustomerCreated message will be lost.
This is where Dead Letter Queues (DLQs) come in handy.
You can think of a DLQ as a graveyard for messages which weren't successfully processed.
Camel makes it very easy to set up DLQs via the built-in errorHandler method, available in every route builder.
This errorHandler method accepts an ErrorHandlerBuilder object, which can be conveniently created via the built-in deadLetterChannel method.
Soon we will have a look at the many ways in which ErrorHandlerBuilder objects can be customized, but for now, we'll just enable the useOriginalMessage() option. This makes sure that if a message was put on the customer.created queue but couldn't be processed due to some error, the original message is moved to the customer.created.dead DLQ, ignoring any transformations which might have been applied while traversing the Camel route (like unmarshalling the message's JSON body to a CustomerCreated POJO).
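As a rough sketch, the setup described above could look like the route builder below. It assumes Camel 3.x with the ActiveMQ and Jackson components on the classpath. The nested CustomerCreated class, the emailService bean name and its sendWelcomeEmail method are placeholders chosen for illustration, not the article's actual code.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.model.dataformat.JsonLibrary;

// In a Spring Boot app this class would typically also be registered as a bean.
public class CustomerCreatedSubscriber extends RouteBuilder {

    // Hypothetical event payload, only here to keep the sketch self-contained.
    public static class CustomerCreated {
        public String email;
    }

    @Override
    public void configure() {
        // Exchanges that fail are sent to the DLQ. useOriginalMessage() asks Camel
        // to send the message as it was received, not the unmarshalled POJO.
        errorHandler(deadLetterChannel("activemq:queue:customer.created.dead")
                .useOriginalMessage());

        from("activemq:queue:customer.created")
                // Transform the JSON body into a POJO (requires camel-jackson).
                .unmarshal().json(JsonLibrary.Jackson, CustomerCreated.class)
                // Assumed bean and method names.
                .to("bean:emailService?method=sendWelcomeEmail");
    }
}
```

If the bean call throws, the message that lands on customer.created.dead should still carry the JSON body that was originally consumed.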
DLQs are a must-have if you want to be able to manually or automatically retry or replay failed messages. For example, the ActiveMQ web console, available at localhost:8161 by default, provides a way to move messages between queues, which can be used to manually replay a particular message by moving it back from the DLQ to the original queue.
Reusable Builder Classes
To keep things organized, I usually create:
- some kind of EndpointBuilder class (like this) which contains utility methods for constructing ActiveMQ endpoint strings
- some kind of EndpointDLQBuilder class which allows me to keep my DLQ configuration in a central location
This is what the new CustomerCreatedSubscriber class looks like when making use of these two helper classes.
Here, the queue(...) method is statically imported from the new EndpointBuilder class, and the dlq(...) method is statically imported from the new EndpointDLQBuilder class, which is defined as follows:
Basic Retries
The (Default)ErrorHandlerBuilder, used for configuring DLQs in Camel, provides an easy way to automatically retry messages which couldn't be processed the first time because of some exception.
To test this,
I've intentionally set up my EmailService to fail every request.
We can use the following DLQ configuration to:
- retry the processing of a message 3 times before moving it to the DLQ
- add a static delay of 1 second between each attempt
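The two settings listed above map onto the maximumRedeliveries and redeliveryDelay options of the error handler builder. The following is a minimal sketch, assuming Camel 3.x. The class name and the emailService bean with its sendWelcomeEmail method are hypothetical, and the endpoint strings are written out inline instead of using the helper classes.

```java
import org.apache.camel.builder.RouteBuilder;

public class BasicRetrySubscriber extends RouteBuilder {

    @Override
    public void configure() {
        errorHandler(deadLetterChannel("activemq:queue:customer.created.dead")
                .useOriginalMessage()
                // Up to 3 redeliveries after the original attempt (4 attempts in total).
                .maximumRedeliveries(3)
                // Fixed pause of 1000 ms before each redelivery.
                .redeliveryDelay(1000));

        from("activemq:queue:customer.created")
                // Assumed bean and method names.
                .to("bean:emailService?method=sendWelcomeEmail");
    }
}
```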
When I start the application (which I've set up to publish a dummy event to the customer.created queue during startup), I'm greeted with the following log statements:
The original attempt, followed by 3 retries, makes for a total of 4 attempts.
Also note how the timestamps are approximately 1 second apart from each other, as configured by the redeliveryDelay(1000) option.
This setup is known as the retry pattern, which is especially useful if you're integrating with external systems. For example, if some external system is known to be unavailable for a few minutes from time to time, it might make sense to configure a Camel retry schedule with a fixed delay of 3-5 minutes between different attempts.
Exponential Backoff
When integrating with some external system, it's important to be a good client 😇. If an external system becomes unavailable, possibly due to traffic overload, we mustn't contribute to the problem by continuously trying to hit some struggling server with our tight Camel retry schedules.
This is where the exponential backoff algorithm comes in. Instead of reattempting requests on a fixed schedule, the idea is to wait (exponentially) longer between consecutive requests.
Configuring exponential backoff in Camel is pretty easy:
Here, the backOffMultiplier(2) parameter doubles the wait time, which starts at the configured 1000 milliseconds, after each consecutive attempt, before eventually moving the message to the DLQ.
Attempt | Time |
---|---|
1 | 09:00:00 |
2 | 09:00:01 |
3 | 09:00:03 |
4 | 09:00:07 |
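A configuration along the following lines should produce a schedule like the one in the table. This is a sketch assuming Camel 3.x. The class name and the emailService bean with its sendWelcomeEmail method are made up for illustration.

```java
import org.apache.camel.builder.RouteBuilder;

public class BackoffSubscriber extends RouteBuilder {

    @Override
    public void configure() {
        errorHandler(deadLetterChannel("activemq:queue:customer.created.dead")
                .useOriginalMessage()
                .maximumRedeliveries(3)
                // Delay before the first redelivery, in milliseconds.
                .redeliveryDelay(1000)
                // Multiply the previous delay by 2 before each further redelivery:
                // 1 s, 2 s, 4 s.
                .useExponentialBackOff()
                .backOffMultiplier(2));

        from("activemq:queue:customer.created")
                // Assumed bean and method names.
                .to("bean:emailService?method=sendWelcomeEmail");
    }
}
```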
The exponential growth may cause the wait time between requests to grow too large.
For example, if we set maximumRedeliveries to 12, the wait time between the 11th and 12th (final) retry will be 2^11 = 2048 seconds (about 34 minutes).
To prevent the exponential backoff from growing too large, one can use the maximumRedeliveryDelay(20000) option to instruct Camel to never wait longer than 20 seconds between attempts. This is also known as the truncated exponential backoff algorithm.
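To make the arithmetic concrete, here is a small plain-Java sketch that computes the delay before each retry, with and without a cap. It only illustrates the schedule described in this section and is not Camel's own implementation. The BackoffSchedule class and its delaysMillis method are hypothetical names.

```java
import java.util.Arrays;

final class BackoffSchedule {

    /**
     * Returns the delay (in ms) to wait before each redelivery.
     * A maxDelayMillis of 0 or less means the delay is not capped.
     */
    static long[] delaysMillis(long initialDelayMillis, double multiplier,
                               long maxDelayMillis, int maximumRedeliveries) {
        long[] delays = new long[maximumRedeliveries];
        long previous = 0;
        for (int i = 0; i < maximumRedeliveries; i++) {
            // The first retry uses the initial delay, later ones grow by the multiplier.
            long next = (previous == 0)
                    ? initialDelayMillis
                    : Math.round(previous * multiplier);
            // Truncate the delay once it exceeds the configured maximum.
            if (maxDelayMillis > 0 && next > maxDelayMillis) {
                next = maxDelayMillis;
            }
            delays[i] = next;
            previous = next;
        }
        return delays;
    }

    public static void main(String[] args) {
        // 12 retries, starting at 1 s and doubling: first without a cap, then capped at 20 s.
        System.out.println(Arrays.toString(delaysMillis(1000, 2.0, 0, 12)));
        System.out.println(Arrays.toString(delaysMillis(1000, 2.0, 20_000, 12)));
    }
}
```

Without a cap, the last of the 12 delays is 2,048,000 ms. With a 20-second cap, every delay from the sixth retry onwards stays at 20,000 ms.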
There's one more consideration I would like to mention here. If the schedules of a large number of clients are closely aligned, it's still possible for them to bring a struggling server down (whether the clients are using exponential backoff or not). This is a phenomenon known as the thundering herd problem and also the reason why it's often advised to add some randomness ("jitter") to the wait time between each attempt, but that's outside the scope of this article.
Logging
At this point, it would be hard to know if a message in our system has ended up on the DLQ. We could have a look at the ActiveMQ web console, but it would be much better if we also saw the error printed to the console.
For this, Camel offers us the following configuration, where the stack trace of the last failed delivery will be logged to the console, right before the message gets moved to the DLQ.
Here, the log variable refers to a private static final org.slf4j.Logger field, auto-generated by Lombok's @Slf4j annotation.
This setup works fairly well, but usually, I also like to add an Exception header to the ActiveMQ message itself, containing the stack trace of the actual exception; this way, there's also some information on why a particular message has ended up on the DLQ.
We can use the onPrepareFailure option for this, which, according to the JavaDoc, was specifically designed to "enrich the message before sending it to a dead letter queue".
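Setting such a header requires the stack trace as a string first. The helper below is a minimal sketch using only the JDK. The StackTraces class name is hypothetical, and the commented-out usage assumes Camel's Exchange API (the EXCEPTION_CAUGHT property) rather than showing the article's original code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class StackTraces {

    /** Renders the full stack trace of the given throwable as a single string. */
    static String asString(Throwable throwable) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter writer = new PrintWriter(buffer)) {
            throwable.printStackTrace(writer);
        }
        return buffer.toString();
    }

    // Possible use on the error handler builder (assumption, not verified here):
    //
    //   .onPrepareFailure(exchange -> {
    //       Exception cause = exchange.getProperty(Exchange.EXCEPTION_CAUGHT, Exception.class);
    //       exchange.getIn().setHeader("Exception", StackTraces.asString(cause));
    //   })

    public static void main(String[] args) {
        System.out.println(asString(new IllegalStateException("SMTP server unavailable")));
    }
}
```

Keeping the conversion in a separate helper means the same string can be used for both the log output and the message header.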
Conclusion
As we've discussed, Camel's retry pattern works great when integrating with potentially unreliable external systems.
To learn more about Camel's Dead Letter Channel and message redelivery, feel free to have a look at this very readable section from the official Camel user manual on which most of this article is based.