Login issues
Incident Report for CHEQROOM
Postmortem

As many of you will have noticed, we suffered a serious incident on February 24th while rolling out a major change to our software. What was meant to be a silent release of our new "Customizable User Roles" feature ended in several hours of downtime of CHEQROOM for a majority of our users.

Here's what happened:

While silently rolling out a greatly improved version of user roles (making them dynamic and much more customizable), we ran into several problems at once.

The first problem occurred in the service that maps old roles to new roles. At the same time, management of those roles was accidentally opened up in our web application.

The immediate impact was that some groups of users were assigned a nonexistent role and were therefore unable to access the service.

The second problem arose at the same time: our second cluster, which holds the backup of users' original roles, also failed.

Restoring a stable situation was complex and time-consuming. During that period, CHEQROOM was effectively unusable for a portion of our users.

We will perform a complete post-mortem to make sure we fully understand the root causes. This will allow us to put the necessary measures in place to minimize the chance of similar interruptions in the future.

We're sorry for the inconvenience.

Posted Feb 25, 2020 - 09:24 EST

Resolved
This incident has been resolved.
Posted Feb 25, 2020 - 07:38 EST
Update
All systems have been back online since our last update. All data is safe and restored. If you have any issues logging in, you can reach our support staff at support@cheqroom.com.
Posted Feb 25, 2020 - 01:49 EST
Update
The service has returned to normal.
User invites are still experiencing loading issues.
Posted Feb 24, 2020 - 18:07 EST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 24, 2020 - 16:44 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 24, 2020 - 14:42 EST
Identified
The issue has been identified and a fix is being implemented.
Posted Feb 24, 2020 - 08:10 EST
This incident affected: Web application and Mobile apps.