As many of you will have noticed, we suffered a serious incident on February 24th while rolling out a major change to our software. What was meant to be a silent release of our new feature "Customizable User Roles" ended in several hours of CHEQROOM downtime for a majority of our users.
Here's what happened:
In our attempt to silently roll out a greatly improved version of those user roles (making them dynamic and much more customizable), we encountered several concurrent problems.
The first problem occurred in the service that maps old roles to new roles. At the same time, we accidentally opened up the management of those roles in our web application.
The immediate impact was that some groups of users were assigned a nonexistent role and were thus unable to access the service.
The second problem arose at the same time: our second cluster, which provides the backup for people's original user roles, failed as well.
Getting back to a stable situation was complex, time-consuming and far from straightforward. During that period, CHEQROOM was effectively unusable for a portion of our users.
We will perform a complete post-mortem on what happened to make sure that we fully understand the root causes. It will allow us to put in place the necessary measures to minimize the chances of such interruptions in the future.
We're sorry for the inconvenience.