SRE and Security

SRE (Site Reliability Engineering) is normally viewed through the lens of reliability; it is in the name, after all. However, SRE can also play a fundamental role in securing systems. Reliability and security work hand in hand to ensure that systems do not lose confidentiality, integrity, or availability.

First, SRE is a movement that places software engineers in operations roles. Operations has generally been treated as a cost center. With SRE, though, operations produces value by automating processes that were previously handled manually. Software engineers know how to program, and can therefore write code to replace work that used to be done by hand. This is where the real benefit of SRE lies. The reliability of the system increases, which raises customer satisfaction and ultimately benefits the company's bottom line.

One of the trade-offs in securing a system is deciding whether it should fail open or fail closed. Take a common real-world scenario: in the case of a fire, you would want the doors of a building to fail open so that people can escape. Failing open, however, also lets people into the building. Someone could pull a fire alarm just to gain access to the building and its important documents or systems. Failing closed, on the other hand, could trap people inside. A third option is to fail safe, which balances security with safety. For example, access controls for entering the building remain in place, while anyone can leave the building regardless of those controls.
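The same three choices show up in software when an access check itself breaks. The sketch below is only an illustration: `FailureMode`, `is_allowed`, `door_decision`, and `broken_backend` are hypothetical names invented for this example, and a real system would also log and alert on the failure.

```python
from enum import Enum


class FailureMode(Enum):
    FAIL_OPEN = "open"      # grant access when the check itself breaks
    FAIL_CLOSED = "closed"  # deny access when the check itself breaks


def is_allowed(check, request, mode):
    """Run an access check and apply the failure policy if the check errors."""
    try:
        return bool(check(request))
    except Exception:
        # The check could not be completed, so the failure mode decides.
        return mode is FailureMode.FAIL_OPEN


def door_decision(direction, check, badge):
    """Fail safe: leaving is always allowed, entering stays access controlled."""
    if direction == "exit":
        return True
    return is_allowed(check, badge, FailureMode.FAIL_CLOSED)


def broken_backend(_request):
    """Stand-in for a policy service that is down (hypothetical)."""
    raise ConnectionError("policy backend unreachable")
```

With the backend down, `door_decision("exit", broken_backend, badge)` still lets people out, while `door_decision("enter", broken_backend, badge)` keeps the door shut.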

Redundant and segregated systems are another mechanism for making systems both reliable and secure. This is especially relevant to ransomware attacks. When one set of systems is taken down, another site can be brought up to replace the failed site's capabilities. This limits the downtime the ransomware causes and can remove the need to pay a ransom to recover one's data. Paying a ransom is a poor option in any case, since the stolen data may be sold even if the ransom is paid. It is better to ensure that all data, at rest or in transit, is encrypted so that stolen copies cannot be deciphered and sold.

Keep Dependencies Up To Date

Another common point of failure in securing systems is the use of outdated and insecure dependencies. This is especially true for open source projects such as OpenSSL and the Linux kernel. Heartbleed is a key example. A patched OpenSSL release was available on the day the bug was publicly disclosed, but many teams took too long to update their OpenSSL dependency and the bug was exploited anyway. Also, when your systems are up to date with the latest dependencies, you can more quickly merge new code that fixes a critical vulnerability. Building your code frequently, and confirming that the builds succeed, also helps ensure that critical vulnerabilities can be patched quickly.
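One way to make this routine is a small check in the build that compares pinned versions against the oldest version known to be safe. This is a minimal sketch assuming plain dotted numeric versions; the package names and version numbers are made up for illustration and are not real advisories.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.4.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def find_outdated(pinned, minimum_safe):
    """Return names of pinned dependencies older than their minimum safe version."""
    outdated = []
    for name, version in pinned.items():
        floor = minimum_safe.get(name)
        if floor is not None and parse_version(version) < parse_version(floor):
            outdated.append(name)
    return sorted(outdated)


# Hypothetical data: these packages and versions do not exist.
pinned = {"examplecrypto": "1.0.1", "examplehttp": "2.5.0"}
minimum_safe = {"examplecrypto": "1.0.2", "examplehttp": "2.4.0"}

outdated = find_outdated(pinned, minimum_safe)
print(outdated)
```

Failing the build when `outdated` is not empty keeps an insecure pin from sitting unnoticed. Real version strings with letters or pre-release suffixes would need a proper version parser rather than this simplified one.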

Release Frequently Using Automated Testing

SRE practice calls for releases to be cut regularly so that emergency fixes can be shipped quickly. Splitting a large release into many smaller releases reduces the risk of having to roll back any one release because of a problem. If a bug does slip through, smaller releases leave you less code to sift through to find it. This leads to faster reaction times, greater site reliability, and increased customer satisfaction.

Testing and validation of releases can be automated as well, so that good releases ship and deficient releases are blocked quickly. This increases confidence that security patches will work once released, which again shortens turnaround time and increases customer satisfaction.
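An automated gate of this kind can be very small. The sketch below assumes each check is a plain function that returns true on success; `run_gate` and the check names are hypothetical and do not belong to any particular CI system.

```python
def run_gate(checks):
    """Run each named check and return (approved, names_of_failed_checks)."""
    failed = []
    for name, check in checks:
        try:
            passed = bool(check())
        except Exception:
            passed = False  # a crashing check counts as a failed check
        if not passed:
            failed.append(name)
    return (not failed, failed)


# Hypothetical checks standing in for real test suites.
approved, failed = run_gate([
    ("unit tests", lambda: True),
    ("smoke test", lambda: False),
])
print(approved, failed)
```

A release is promoted only when `approved` is true. The list of failed checks tells whoever is on call where to start looking.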

Use Containers

Containers can be secured independently of the underlying OS. This allows application development teams to secure their application regardless of the system it will run on. Application developers can therefore focus on the application itself instead of how it interfaces with underlying systems. Likewise, the underlying system can be patched without changing the application. Containers are immutable and short-lived, which means they are rebuilt and redeployed often. You patch the images and containers are then created from those patched images, so the patch rollout process matches your code rollout process, including monitoring, canarying, and testing.

When images that were fully patched at deployment time turn out to be susceptible to a new vulnerability, the container registry can be used to discover the susceptible versions so they can be patched. This is much more efficient than scanning production clusters directly. Monitoring container age, to ensure that only recent images are in production, can also limit the need for that kind of patching.
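Both ideas, querying the registry for susceptible images and watching image age, reduce to filtering an inventory. The sketch below assumes the registry inventory has already been exported into simple records; `ImageRecord` and its fields are hypothetical and do not correspond to any real registry API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ImageRecord:
    name: str
    tag: str
    built_at: datetime
    packages: dict  # package name -> installed version string


def susceptible_images(inventory, package, bad_versions):
    """List images that ship a known-bad version of the given package."""
    return [
        f"{img.name}:{img.tag}"
        for img in inventory
        if img.packages.get(package) in bad_versions
    ]


def stale_images(inventory, now, max_age):
    """List images built longer ago than max_age."""
    return [
        f"{img.name}:{img.tag}"
        for img in inventory
        if now - img.built_at > max_age
    ]


# Hypothetical inventory; image and package names are made up.
utc = timezone.utc
inventory = [
    ImageRecord("api", "v1", datetime(2024, 1, 1, tzinfo=utc), {"examplelib": "1.0.1"}),
    ImageRecord("web", "v7", datetime(2024, 1, 30, tzinfo=utc), {"examplelib": "1.0.2"}),
]
now = datetime(2024, 1, 31, tzinfo=utc)

print(susceptible_images(inventory, "examplelib", {"1.0.1"}))
print(stale_images(inventory, now, timedelta(days=14)))
```

Because the checks run against the inventory rather than live hosts, production clusters are left alone while the affected images are identified and rebuilt.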

Use Microservices

Microservices naturally support zero-trust networking. Rather than trusting everything inside the network perimeter equally, microservices apply a heterogeneous notion of trust, in which each service decides which callers it trusts. Microservices also lead to a convergence of security practices, because some processes, tools, and dependencies are shared across teams. There can therefore be common cryptographic libraries, or common monitoring and alerting infrastructure. Critical security services can be separated into microservices of their own, which can then be maintained by a dedicated team.
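In practice, zero trust means a service authenticates every request, even one that originates inside the perimeter. The sketch below illustrates the idea with per-caller shared keys and HMAC signatures from the Python standard library. This is a simplification chosen for the example; real deployments commonly rely on mutual TLS or signed tokens instead, and the service names and keys here are hypothetical.

```python
import hashlib
import hmac


def sign(key, caller, body):
    """Compute a hex HMAC-SHA256 signature over the caller name and request body."""
    message = caller.encode() + b"\n" + body
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(keys, caller, body, signature):
    """Accept a request only if it is signed with the named caller's key."""
    key = keys.get(caller)
    if key is None:
        return False  # unknown callers are rejected, even on the internal network
    expected = sign(key, caller, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical per-service keys; in practice these come from a secret store.
keys = {"billing": b"billing-secret", "orders": b"orders-secret"}

body = b'{"order": 42}'
signature = sign(keys["orders"], "orders", body)
print(verify(keys, "orders", body, signature))
```

A request signed by `orders` does not verify when presented as `billing`, so a compromised service cannot impersonate its neighbours just by sitting on the same network.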

Workloads are separated into smaller, manageable units, which allows for better maintenance and discovery. Infrastructure changes are more flexible because you can independently scale, load balance, and roll out each microservice.

Summary

There are many tools in SRE's bag for securing systems. Most are already in use across the enterprise IT ecosystem. Keeping your dependencies up to date prevents critical vulnerabilities from lingering in production for an extended period of time. It also improves response time to critical vulnerabilities. Releasing frequently with automated testing means that each release carries fewer bugs and that they are easier to pinpoint, so that rollback and patching happen quickly. Using containers mitigates the security problems that come from the application interfacing with the infrastructure. Containers are immutable and short-lived, so the latest version of an application is usually in production, and images can be easily patched and rolled out. Microservices naturally support zero-trust networking and allow common processes, tools, and dependencies to be shared across teams.

These principles allow SRE to add value to the operations department. Further automation along these lines will increase the value that operations returns. Further innovation in the SRE space can also increase security.

References

Adkins, H., Beyer, B., Blankinship, P., Lewandowski, P., Oprea, A., & Stubblefield, A. (2020). Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems (1st ed.). Sebastopol, CA: O'Reilly Media.
