Hello everyone! Event-driven architecture is widespread in the microservices world because it promotes decoupling between services. At Convenia we chose this architecture, and in this article I would like to share a little of what we did and how we did it.
First, I would like to say that our choices took our scale and growth forecast into account, so this stack may not be ideal for you. And because event-driven architecture is a very broad topic without a formal definition, we have taken some initiatives that make a lot of sense to us but might not fit your case well. Even so, what follows is likely to be useful if you are thinking of doing something asynchronous and decoupled.
How does event-driven architecture work?
Every mature framework currently ships with some kind of event bus. If you are familiar with this concept, think about it more broadly: instead of having one class that fires an event and N listener classes that react to it, we have a sender service and N "listener" services. In case that has not made things clear, I will explain a little better:
In the image above, we have some important elements:
Issuing service (Order service): the service from which the stimulus originates. It is only responsible for closing the order, and at the end it shouts: "Order Completed!" That way, all the other services that care about this stimulus can react to it. The sender service does not know which services listen to its message; its job ends once the message is issued.
Listener services (represented on the right of the figure): the three services care about the message issued by the issuing service but do not know about the issuing service itself. From then on, they can do whatever they want with the message without influencing the sender or the other listeners, quite different from what would happen in a procedural approach.
Message broker (the central piece): this element is responsible for the temporal decoupling between the sender and the listeners. The message is issued whenever the sender deems convenient, without caring whether the listeners are able to hear it at that moment, and the listeners can consume the message whenever they want. If a listener is offline or even "broken", the message broker will hold the message until that listener is able to receive it.
How does this all fit into the Convenia ecosystem?
When choosing the parts of the stack, we tried to keep complexity as low as possible so that the stack would not burden us and development would remain simple. We chose RabbitMQ for its simplicity of use and setup, and for being a robust enough option.
On the application side, we use PHP with the Laravel framework. This brought us a lot of agility in developing the applications, but there are almost no ready-made solutions for making all the services work as a unit, so we built a lot of things ourselves.
Right at the beginning, we handled the communication between services with the most widespread PHP lib, php-amqplib. Despite fulfilling its role and being performant enough, it feels clumsy, is practically untestable, and forces us to write ungainly boilerplate. To solve this problem we wrote Pigeon (I suggest reading the documentation). It wraps amqplib, giving us the ability to test emissions and write truly elegant code. Pigeon also allows us to treat RabbitMQ as disposable: instead of keeping a versioned queue-definition file, Pigeon can create the queues on the fly and set up the bindings correctly, so if something goes wrong we just spin up another RabbitMQ instance and everything works.
Here is an example of issuing an event with Pigeon:
Pigeon::dispatch('sample.event', [
    'scooby' => 'doo'
]);
To listen to the event above, we could write the following code:
Pigeon::events('sample.event')
    ->callback(
        Closure::fromCallable([$this, 'httpCallback'])
    )->fallback(
        Closure::fromCallable([$this, 'httpFallback'])
    )->consume(0, true);
In the example above, we defined one Closure as the callback (run when everything goes well) and another as the fallback (run when something goes wrong). Pigeon is very comprehensive, and I intend to go into more detail in another post, but for now I believe this demonstrates communication between services well enough.
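To make the example concrete, here is a minimal sketch of the two handler methods referenced above. The signatures are an assumption on my part (I presume the callback receives the decoded message body and the fallback receives the exception along with the message); check the Pigeon documentation for the exact contract.

public function httpCallback(array $message): void
{
    // everything went well: react to the event here
    logger()->info('sample.event received', $message);
}

public function httpFallback(\Throwable $exception, $message): void
{
    // something went wrong: make the failure visible
    logger()->error('sample.event failed: '.$exception->getMessage());
}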
How to Design a Listener?
The listener is the class that contains the code that listens to the events (the code presented in the previous item). This code opens a socket to RabbitMQ and waits for events, which means the listener process will never "die". That demands some special care: by default, a PHP programmer is used to the request life cycle, which is very short-lived and lets us get away with some transgressions that, when committed in a listener, can cause a lot of headaches. Here are some necessary precautions.
The listener needs to run under a supervisor, always: this is not advised only for programs written in PHP; even Vault, a secrets-management tool written in Go, is recommended to run under a supervisor. The supervisor can revive your process if it dies; without it, your listener would stay dead and you would need a manual procedure to bring it back, and with a dead listener the service is "deaf".
Correctly configure the supervisor: it is crucial to pay attention to the supervisor settings related to restart attempts. In extreme cases, the supervisor keeps trying to restart a broken process indefinitely, causing high CPU consumption; in simpler setups where the web server runs on the same instance as the listener, this would be catastrophic because the web server itself would be impacted. Also consider that if you use a service to manage exceptions, such as Sentry, there is usually a quota: the supervisor's infinite attempts can exceed that quota and cost you visibility. A sample configuration follows below.
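As an illustration, here is a minimal supervisord program entry with bounded restart attempts; the program name, command path, and limits are hypothetical and should be adapted to your setup:

[program:sample-event-listener]
command=php /var/www/app/artisan listen:sample-event
autostart=true
; restart the process when it dies unexpectedly, but not forever
autorestart=unexpected
; give up after a bounded number of failed starts instead of looping
startretries=5
; the process must stay up this long to count as successfully started
startsecs=10
stdout_logfile=/var/log/supervisor/sample-event-listener.log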
Separate listener instances from web-server instances: to avoid the situation described in the previous item, it is good practice to have "worker" instances that run only the listeners. These instances also tend to be simpler because they do not need to be reachable via HTTP; in clouds like AWS, this saves the cost and setup effort of a load balancer.
Avoid wasting resources: the process now stays alive for a long time, so we must take care to close the connections we open and avoid any procedure whose effects can accumulate across the consumption of RabbitMQ messages. Suppose you copy an image to resize it and forget to delete the copy at the end of the process: soon you will run out of storage. The same commonly happens with memory usage. A sketch of defensive cleanup follows.
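A hedged illustration of the point: the payload field and resize helper below are hypothetical; the pattern is simply "always clean up, even when the work fails".

// Hypothetical handler: copies an image, resizes it, and guarantees the
// temporary copy is removed even if the resize throws.
public function resizeFromEvent(array $message): void
{
    $tmp = tempnam(sys_get_temp_dir(), 'img_');
    copy($message['image_path'], $tmp);

    try {
        $this->resize($tmp); // hypothetical resize helper
    } finally {
        unlink($tmp); // without this, every message leaks one file
    }
}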
Artisan commands: Artisan commands are meant for one-off, ephemeral procedures, not for long-running tasks. Despite that conceptual transgression, they are a structural option for building a listener: inside one you have access to all of Laravel's facilities. Just keep in mind that Artisan commands consume a considerable amount of memory. A sketch follows below.
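For illustration, a listener wrapped in an Artisan command might look like the sketch below. The command name matches the hypothetical supervisor config above, the handler signatures are the same assumptions as before, and `\Pigeon` is assumed to be registered as a facade alias as in the earlier examples.

<?php

namespace App\Console\Commands;

use Closure;
use Illuminate\Console\Command;

class ListenSampleEvent extends Command
{
    // hypothetical command name, referenced by the supervisor config
    protected $signature = 'listen:sample-event';

    protected $description = 'Long-running consumer for sample.event';

    public function handle(): void
    {
        // consume() blocks, so this process stays alive waiting for messages
        \Pigeon::events('sample.event')
            ->callback(Closure::fromCallable([$this, 'httpCallback']))
            ->fallback(Closure::fromCallable([$this, 'httpFallback']))
            ->consume(0, true);
    }

    public function httpCallback(array $message): void
    {
        // react to the event
    }

    public function httpFallback(\Throwable $e, $message): void
    {
        // handle the failure (log, reject, etc.)
    }
}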
Idempotency: messages can be resent, and the same message can arrive at the same listener more than once. Your listener needs to process that message in an idempotent way. Imagine it creates a database record with an auto-increment id: if the message arrives again, it must not create another record. An upsert may be the way out in that case, as sketched below.
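A small sketch of the upsert idea, assuming a hypothetical Order model and an event payload that carries a natural key (here an illustrative order_uuid):

// Redelivery becomes harmless: a second arrival updates the same row
// instead of inserting a duplicate.
public function httpCallback(array $message): void
{
    \App\Models\Order::updateOrCreate(
        ['order_uuid' => $message['order_uuid']], // natural key from the event
        ['status' => $message['status']]          // columns safe to overwrite
    );
}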
Reject the message in case of failure: we will explain this in the next section.
And when things go wrong?
Certainly, at some point, things will break. A broken listener tries to reprocess the message the number of times stipulated by the supervisor configuration and then dies; in that case, the message is dammed up in the queue until the listener is fixed. After the fix is deployed, the message is consumed correctly and everything returns to normal.
The above situation is not ideal, since the listener often dies because of one specific message. RabbitMQ has a dead-letter-exchange configuration that allows us to send a message to a specific location in case of failure, and in our case we reject every message that caused a failure. A sketch of the configuration follows.
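For reference, a queue can be pointed at a dead letter exchange at declaration time. The sketch below uses php-amqplib directly, since Pigeon abstracts this away; the queue and exchange names are illustrative, and the dead.letter exchange must be declared and bound separately.

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Wire\AMQPTable;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Any message rejected (without requeue) on this queue is re-routed
// to the 'dead.letter' exchange instead of being lost.
$channel->queue_declare(
    'sample.queue', // illustrative queue name
    false,          // passive
    true,           // durable
    false,          // exclusive
    false,          // auto_delete
    false,          // nowait
    new AMQPTable(['x-dead-letter-exchange' => 'dead.letter'])
);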
Now we receive all the broken messages in a single place, but we needed more visibility into those messages, so we created LetterThief, a service that notifies us of every failure and gives us the ability to resend the messages that caused it.
The image above shows the LetterThief interface and some of its actions, with emphasis on "Try again". This button lets us resend the message to the service it originally came from.
When opening a message, we see information such as metadata, headers, and body, which allows us to reproduce and fix the error in a local environment. Only after deploying the fix should we resend the message.
Just damming the errors up in a specific place is not enough; we also need an alarm to warn us about each error. So we have a Slack channel where all the errors land, and the following figure shows how this works:
When we see the notification in Slack, the queue name tells us precisely which developer is responsible for fixing the failure, and that developer can immediately focus on the fix.
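As an illustration of the alarm, a notifier can be as small as a POST to a Slack incoming webhook. The payload shape below is Slack's own, but the config key and $failure object are hypothetical, and this assumes a Laravel version that ships the Http facade:

use Illuminate\Support\Facades\Http;

// Hypothetical notifier: fires whenever LetterThief registers a failure.
Http::post(config('services.slack.failures_webhook'), [
    'text' => sprintf(
        'Message failed on queue "%s": %s',
        $failure->queue,
        $failure->error
    ),
]);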
Conclusion
Every distributed architecture will be relatively more complex than a service in a single repository. Even so, I believe we achieved a relatively simple and safe architecture compared with the standard used in microservices on the market. Each case has specific needs, so what we did will hardly fit you completely, but you may take some ideas from it all. And above all: never distribute your architecture if it is not necessary.
I hope I have contributed in some way.
Original article published by the author here and republished on iMasters at the author’s request.