Click servers crashing
Incident Report for Arbeit
Resolved
Hey,

We are happy to share that Click server stability has been good recently. Our Support team is continuing to monitor the servers to mitigate any potential adverse effects and is available to help if your agents experience issues with Click.
Posted Oct 06, 2021 - 15:02 UTC
Update
Hey,

We're coming to you with good news about Click this morning: we've made substantial progress on the root issues that were causing instability.

Our team is continuing to monitor the fixes that have been applied and is investigating ways to make sure this doesn't happen again.

We appreciate your continued patience and, as always, we are available to answer any questions you may have.

- The Nerds at Arbeit
Posted Aug 23, 2021 - 16:44 UTC
Monitoring
We had another solid day of stability with Click. Our team made a change today that they are confident will further improve Click agent stability, and it seems to be working well. This is great news, but we will continue to monitor the situation and track anything that could adversely affect Click users. Thank you for your continued patience.
Posted Aug 20, 2021 - 19:14 UTC
Update
We have experienced a couple of Click crashes today that may have negatively impacted some clients. Our Development team is working to identify the exact causes of today's issues. Our team is monitoring the situation closely, and we will share updates on our investigation as soon as we know more. We apologize if you were impacted today, and we will continue to do our best to mitigate any adverse effects.
Posted Aug 18, 2021 - 20:24 UTC
Update
We're continuing to monitor and identify issues that may adversely affect Click users. Our Development Team has developed another change that we will implement tonight after Click usage ends for the day. There was a server crash today, but we quickly identified it and responded appropriately. We apologize for any inconvenience and are continuing to work on mitigating any adverse effects on Click users.
Posted Aug 18, 2021 - 18:02 UTC
Update
Earlier today, we experienced a crash that may have negatively impacted Click users. Our Development Team worked today on identifying exactly what caused it, and we believe we have found the cause. It was related to a patch we applied to prevent Click agents from getting "stuck" in a "Dialing" status after they click a call card to dial it. Using the information we gathered today, we applied a new patch to address this issue.
Posted Aug 17, 2021 - 19:48 UTC
Update
Hello,

We experienced stability issues yesterday, and our Development Team and Support Team worked last night on identifying exactly what caused them. We believe we have identified the cause and are actively working today to resolve it.

Thank you,
- The Arbeit Support Team
Posted Aug 17, 2021 - 16:33 UTC
Update
Our Development Team deployed a patch last Friday near the end of the day in response to the performance issues that affected some users during the work day. We are actively monitoring the change and will update you with more information as we gather it.
Posted Aug 16, 2021 - 20:37 UTC
Investigating
We experienced a server crash and are investigating the cause of the issue.
Posted Aug 13, 2021 - 15:48 UTC
Monitoring
The patch applied yesterday has improved the stability of our servers. We will continue to monitor the situation and update you with any pertinent information.
Posted Aug 13, 2021 - 15:04 UTC
Update
We are happy to say that since this morning's patch, server stability has improved. We will continue to monitor the situation until we've confirmed that no further issues arise.
Posted Aug 12, 2021 - 20:20 UTC
Update
Some good news this morning: we have applied a patch addressing the core issue that was causing the servers to crash. We will be monitoring the results of this patch to make sure there are no adverse effects.
Posted Aug 12, 2021 - 15:24 UTC
Update
The patch applied today has improved server stability; however, we will continue to monitor and make sure there aren't any adverse effects. The development team is still investigating the core issue, and we will update you once we know more.
Posted Aug 11, 2021 - 20:31 UTC
Update
Our development team has found the cause of yesterday's and today's crashes. It is an unfortunate by-product of the mitigation patch that went live on Monday. They are working on a fix now, and we will update you once it goes live.
Posted Aug 10, 2021 - 20:17 UTC
Update
Unfortunately, we have had some stability issues since yesterday. Our development team has made some changes that will go live after the next server restart and should reduce crashing while they investigate the issue in more depth. We apologize for any inconvenience this may cause you. We want to assure you that we are taking this issue very seriously and dedicating all available resources to resolving it as quickly as possible.
Posted Aug 10, 2021 - 17:45 UTC
Update
We have some good news: over the weekend, both parts of our mitigation plan were completed. All clients are now on their own servers, and the mitigation patch has been implemented. This patch will reduce server crashes as well as how they affect your agents. If the server does crash, it will automatically reconnect so agents won't get kicked out of Click. From the agents' perspective, their authentication will refresh and they can continue with their work.

If the server crashes while an agent is on a call or placing one, they will see the following, depending on the call state:

Dialing: The call card will get stuck, but they can simply close it.
On Hold: The card will be removed automatically.
In Call: The call will drop. The call will still appear as live in the agent's view, so they will need to "hang up" and assign a disposition.

The development team is still working on a permanent solution; however, we unfortunately do not have an ETA, as this involves fixing an issue in third-party open-source software. We also recognize that this mitigation patch is not perfect, and we will continue to work on quality-of-life improvements.
Posted Aug 09, 2021 - 16:11 UTC
Update
Our testing team found a bug in the mitigation patch we were planning for Monday; however, the development team resolved it quickly, and we are still on schedule for the Monday release. We have made some progress on the bug that is causing the crashes, but we still do not have a solid ETA for when it will be permanently resolved.
Posted Aug 06, 2021 - 19:32 UTC
Update
In our efforts to keep you up to date, we would like to share the following news. The overall stability of Click has improved considerably, but we won't be stopping our efforts to deliver a better user experience.

As a result of our continued mitigation efforts, all clients will be on their own servers by tonight. The development team is still testing the mitigation patch we hope to have live on Monday. They are also continuing their efforts to resolve the underlying bug that is causing the server crashes.
Posted Aug 06, 2021 - 16:48 UTC
Update
The changes made last night seem to have improved overall Click stability, which is great, but we won't be stopping our efforts to resolve the root cause.

Our development team has completed a mitigation patch that will be implemented once testing is complete; we hope to have it live by Monday. Tonight we will be adding more servers until each client is on their own server.
Posted Aug 05, 2021 - 20:38 UTC
Update
As promised, to keep you more up to date on the current status of Click, we would like to let you know about two changes we made last night to reduce client impact. We applied a patch that should reduce the frequency of crashes, and we activated more servers. New server activation will continue until each client is on their own server.

Our development team is still working on a permanent solution. They have also discovered another potential bug, which is being investigated.
Posted Aug 05, 2021 - 15:28 UTC
Update
We are continuing to actively monitor the situation in order to mitigate any kind of negative impact on Click agents and users.
Posted Aug 04, 2021 - 13:44 UTC
Update
We are continuing to actively monitor the situation and mitigate the frequency of server crashes. We're implementing and working on multiple changes to lessen the impact of these crashes on Click users.
Posted Aug 03, 2021 - 17:57 UTC
Update
The patch we applied is considerably reducing the number of crashes we are experiencing, which is very good. We're continuing to work on this issue to mitigate the impact on people using Click.
Posted Aug 03, 2021 - 14:18 UTC
Update
We are continuing to work on patching the issue.
Posted Aug 02, 2021 - 18:49 UTC
Update
We are continuing to work on patching the issue.
Posted Aug 02, 2021 - 16:06 UTC
Update
We are continuing to work on patching the issue.
Posted Aug 02, 2021 - 13:17 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 30, 2021 - 19:37 UTC
Update
Hello! We have no new updates yet on permanently resolving the issue, but we still haven't experienced any crashes today. We're continuing to investigate and monitor the situation!
Posted Jul 30, 2021 - 19:03 UTC
Update
We haven't experienced any server crashes yet today. We have made really good progress in identifying what may be causing the crashes but we're continuing to test and identify the root issue.
Posted Jul 30, 2021 - 17:44 UTC
Update
Tickets are open with the open-source framework's maintainers to get the bug fixed. We are also continuing to investigate on our side so that we can stabilize the servers and deploy a workaround in the meantime.
Posted Jul 30, 2021 - 16:19 UTC
Update
We are continuing to work on a fix for this issue.
Posted Jul 30, 2021 - 14:24 UTC
Identified
The open-source framework we built Click on has a bug that is causing our servers to crash for two different reasons. To mitigate this, we have been moving clients to other servers, but those servers also crash after some time. We are now able to reproduce the issue and have applied two patches directly to the framework. Those patches were deployed two days ago and have significantly reduced the number of crashes across all of our servers, but as you can see, the issue is not yet fixed.
Posted Jul 22, 2021 - 14:24 UTC
This incident affected: Arbeit Click.