How do you ensure CareLineLive remains stable?
Stability has always been an area that interests me. I'm proud of the fact that CareLineLive has maintained 99.998% uptime over the past 12 months, which equates to roughly 10 minutes of downtime across the whole year, including planned downtime.
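As a rough sanity check on that figure, the downtime implied by an uptime percentage is the unavailable fraction multiplied by the length of the period. The minimal Python sketch below assumes a 365-day year. The function name `downtime_minutes` is hypothetical and used only for illustration.

```python
# Assumption: a non-leap, 365-day year (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes(uptime_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the minutes of downtime permitted by a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # Fraction of the period during which the service is unavailable.
    unavailable_fraction = (100 - uptime_percent) / 100
    return unavailable_fraction * period_minutes


print(round(downtime_minutes(99.998), 1))  # → 10.5
```

At 99.998% the unavailable fraction is 0.002%, which works out to about 10.5 minutes over a year. By comparison, 99.9% uptime would allow roughly 525.6 minutes, or almost nine hours.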
The infrastructure behind CareLineLive is both simple and complex in nature. It's simple enough that anyone on the team can understand how it works, yet complex enough to keep the application available when adverse events happen, or simply when we're experiencing a period of higher than normal usage.
There's no reason to overcomplicate things; unnecessary complexity only introduces barriers later down the line when you're dealing with a serious incident. The technologies we've selected have been proven time and time again in the industry.
Of course, we also have to make sure we know how to react when things do go awry. We've invested a lot of time in developing and testing our Disaster Recovery Plan, so that if there is an issue with any element of our infrastructure, we know how to resolve it quickly or replace the affected components.
That said, remaining highly available is only a benefit if the software itself works as expected. Bugs are the bane of any developer's working day, whether they come from something small like a typo (we are human, after all) or from a cosmic ray flipping a bit in one of our servers. We invest heavily in testing as thoroughly as possible before a new feature reaches the hands of our users. Each developer is responsible for writing unit and feature tests, which confirm that their code behaves as they expect it to. The change then moves on to Quality Assurance, where an engineer performs a number of manual tests to make sure it works as defined by the specification. They also write automated tests that run every time a code change is made, giving us an opportunity to catch any regressions in functionality before release.
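For readers less familiar with automated testing, the sketch below shows the general shape of a small unit test suite in Python. The function `visit_duration_minutes` and its tests are hypothetical illustrations, not CareLineLive code. The midnight test shows how a regression test pins down an edge case so that a later change cannot silently break it.

```python
from datetime import datetime


def visit_duration_minutes(start: datetime, end: datetime) -> int:
    """Hypothetical helper: length of a care visit in whole minutes."""
    if end < start:
        raise ValueError("a visit cannot end before it starts")
    # total_seconds() returns a float; floor-divide to whole minutes.
    return int((end - start).total_seconds() // 60)


def test_regular_visit():
    start = datetime(2024, 1, 1, 9, 0)
    end = datetime(2024, 1, 1, 9, 45)
    assert visit_duration_minutes(start, end) == 45


def test_visit_crossing_midnight():
    # Regression-style test: guards an edge case that is easy to break.
    start = datetime(2024, 1, 1, 23, 30)
    end = datetime(2024, 1, 2, 0, 15)
    assert visit_duration_minutes(start, end) == 45


def test_end_before_start_is_rejected():
    start = datetime(2024, 1, 1, 10, 0)
    end = datetime(2024, 1, 1, 9, 0)
    try:
        visit_duration_minutes(start, end)
    except ValueError:
        return
    raise AssertionError("expected ValueError for an end time before the start")
```

Saved in a file named `test_visits.py`, these `test_` functions would be collected and run by a test runner such as pytest, which is how a suite like this could be executed automatically on every code change.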
Finally, we have an internal testing program in which team members are the first to use new features. This serves as a final opportunity to pick up any defects, and to make sure that new features or changes actually make sense to users.
Watch the video clip of Dec discussing downtime at CareLineLive