Over the past decade, the way we build and deploy applications has changed dramatically. The explosion of public cloud providers enables us to deploy software without engaging in a drawn-out process to procure and set up infrastructure. Agile, DevOps, Continuous Integration, Continuous Deployment, and other changes to how we work have dramatically accelerated the speed with which we can get new applications and updates in front of our users.
As these innovations and improvements have increased our velocity, they have also expanded the attack surface of our applications. We’ve made significant strides in how we secure our applications. Unfortunately, attackers have been innovating as well. Maintaining a secure production environment requires careful planning and constant vigilance. This article addresses how to monitor applications from a security perspective and which metrics require attention.
Understanding Your Application’s Attack Surface
If someone assigned you the task of securing a physical building, you’d start by taking an inventory of all potential points of entry. Understanding where an intruder might attempt to enter the building allows you to assess risk and implement security measures to protect entry points. Your applications are no different; before you can fortify them against attacks, you need to understand where those attacks might occur. The areas where your application is vulnerable are collectively known as your application’s attack surface.
For cloud-based applications, the attack surface includes, but is not limited to:
- Physical access to the infrastructure. 
- Network ingress from the internet and intranet. 
- APIs with public and private access. 
- Bugs within your application. 
- Bugs and vulnerabilities in libraries your application uses. 
- Traffic that isn’t appropriately validated and sanitized. 
To return to the analogy of securing a physical building: once you’ve fortified the entrances, you’ll want to install monitoring devices that can alert you if someone breaches the building. Devices like motion detectors and cameras help you identify unusual and unauthorized activity inside the building, and even on the perimeter.
Your software application is no different. You need a monitoring strategy that effectively covers the application, the infrastructure, and user interactions. Your plan should also include alerts to notify your support personnel in the event of a breach or an attempted breach, so they can respond appropriately. Cloud security requires a multi-faceted approach, so we’ll divide these metrics accordingly.
Active Monitoring
When we hear "monitoring," the first thing that comes to mind is watching our systems for signs of intrusion. The metrics in this section are vital indicators that someone or something is trying to access your system.
Access Attempts
Monitoring failed access attempts, especially those that occur rapidly or methodically over time, is critical. These attempts aren’t limited to user accounts. They can also target secured APIs or other resources, such as file stores, databases, or compute devices.
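As a rough illustration of the idea above, the following Python sketch flags sources that produce a burst of failed attempts inside a sliding time window. The function name, the threshold of five failures, and the one-minute window are illustrative assumptions rather than recommended values, and the input is assumed to be a list of (timestamp, source) pairs already extracted from your logs.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def find_bursts(failures, threshold=5, window=timedelta(minutes=1)):
    """Return the sources with at least `threshold` failures inside any `window`.

    `failures` is an iterable of (timestamp, source) pairs. The default
    threshold and window are placeholders, not recommendations.
    """
    recent = defaultdict(deque)  # source -> timestamps still inside the window
    flagged = set()
    for ts, source in sorted(failures):
        timestamps = recent[source]
        timestamps.append(ts)
        # Discard failures that have aged out of the window.
        while ts - timestamps[0] > window:
            timestamps.popleft()
        if len(timestamps) >= threshold:
            flagged.add(source)
    return flagged
```

A much longer window with the same function (for example, a day instead of a minute) is one way to surface slow, methodical attempts that never trip the short-window check.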
Traffic Volume
The amount of traffic moving around your systems should follow reasonably predictable patterns. Sudden spikes or other anomalies can alert you to potentially malicious activity. Additionally, if legitimate users are using network resources to download software, media, or other large files, these can potentially open the door to attacks. Monitoring the source and destination of traffic can also indicate problems, especially when compared against lists of networks known to support malicious content.
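Both checks described above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the baseline is a list of recent traffic measurements (for example, requests per minute), the z-score threshold of 3 is an arbitrary starting point, and KNOWN_BAD_NETWORKS is a hypothetical stand-in for whatever threat-intelligence list you actually use.

```python
import ipaddress
from statistics import mean, stdev

# Hypothetical blocklist for illustration; 203.0.113.0/24 is a
# documentation-only range, not a real threat feed entry.
KNOWN_BAD_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_volume_anomalous(baseline, current, z_threshold=3.0):
    """Return True if `current` deviates from the baseline mean by more
    than `z_threshold` standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # requires at least two baseline samples
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold


def is_known_bad(address):
    """Return True if `address` falls inside any listed network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in KNOWN_BAD_NETWORKS)
```

A simple z-score ignores daily and weekly seasonality, so in practice the baseline would likely need to be drawn from comparable time periods.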
Session Length
Network sessions that remain open for extended periods could indicate a VPN tunnel that a malicious actor could use for transferring data. In addition to monitoring the length of user sessions, keep track of the duration of connections established with ports used for remote access, like port 22 (SSH), port 23 (Telnet), and port 3389 (RDP).
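One way to act on this is to filter connection records for remote-access ports and flag those that exceed a duration limit. The sketch below assumes connection data is already available as simple records; the Connection class, its field names, and the eight-hour limit are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta

# Remote-access ports named in the text above.
REMOTE_ACCESS_PORTS = {22: "SSH", 23: "Telnet", 3389: "RDP"}


@dataclass
class Connection:
    """Hypothetical connection record; real data would come from flow logs."""
    source: str
    dest_port: int
    duration: timedelta


def long_remote_sessions(connections, max_duration=timedelta(hours=8)):
    """Return remote-access connections open longer than `max_duration`."""
    return [
        c for c in connections
        if c.dest_port in REMOTE_ACCESS_PORTS and c.duration > max_duration
    ]
```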
System Analysis
In addition to actively monitoring activity within your applications and infrastructure, it is also critical to perform regular analysis of the state of your systems, including configurations and access controls.
Policy Violations
Your organization should have documented standards and policies that govern how resources can be used and integrated into the cloud environment. For example, data stores should be encrypted and only allow connections from well-defined devices within your network. Your system should perform regular audits, preferably automated, that identify violations of these policies and alert the responsible team so the affected resources are brought back into compliance.
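The data store example above could be automated along these lines. This is only a sketch: the dictionary shape (an "encrypted" flag and a list of "allowed_sources" in CIDR notation) is a made-up representation, and 10.0.0.0/8 is an assumed internal network. A real audit would read this information from your cloud provider's configuration APIs.

```python
import ipaddress


def audit_data_store(store, allowed_network="10.0.0.0/8"):
    """Return a list of policy violations for one data store description.

    `store` is a hypothetical dict with an "encrypted" flag and an
    "allowed_sources" list of CIDR strings.
    """
    violations = []
    if not store.get("encrypted", False):
        violations.append("encryption disabled")
    network = ipaddress.ip_network(allowed_network)
    for cidr in store.get("allowed_sources", []):
        source = ipaddress.ip_network(cidr)
        # subnet_of() only compares networks of the same IP version.
        if source.version != network.version or not source.subnet_of(network):
            violations.append(f"source {cidr} is outside {allowed_network}")
    return violations
```

An empty list means the store passed both checks; anything else can be routed to the team that owns the resource.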
Certificate Configurations
User connections, transfer of data between devices, and data storage should all be encrypted using valid and correctly configured security certificates. The certificates themselves should also be secured and regularly audited for compliance with your security policies.
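One small piece of a certificate audit is checking how close each certificate is to expiry. The following sketch uses Python's standard ssl module; the function names and the idea of alerting on a remaining-days threshold are assumptions for illustration, and a fuller audit would also look at key sizes, signature algorithms, and the certificate chain.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to `host` and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Return the whole days remaining before the `not_after` timestamp.

    `not_after` uses the format returned by getpeercert(), for example
    "Jan  5 09:34:43 2030 GMT".
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days
```

A scheduled job could call fetch_not_after for each endpoint and raise an alert when days_until_expiry drops below whatever renewal lead time your policy specifies.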
User Policies
Your environment may require that some users have root or superuser access to the systems; however, this access should be tightly controlled and only used when necessary. Regular audits should monitor how these accounts are used, and prevent their proliferation.
You'll also want to monitor access by third parties and your business partners to ensure that they have the appropriate access levels and that they aren't able to access systems beyond the scope of your agreement.
Finally, implement policies that regulate how the system manages changes to an employee or partner's status. Each change in a user's status should trigger an audit of their access rights and remove any access that is no longer required. When you terminate an employee, policies and system processes should quickly remove all access from the system.
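The status-change audit described above reduces to comparing what a user has been granted with what their current status requires. The sketch below is deliberately simplified: access rights are modeled as plain strings, and the function name and parameters are hypothetical.

```python
def access_to_revoke(granted, required, active=True):
    """Return the access rights that should be removed from a user.

    `granted` is what the user currently holds and `required` is what
    their current role needs. A user who is no longer active loses
    everything.
    """
    if not active:
        return set(granted)
    return set(granted) - set(required)
```

For example, an engineer who moves from the database team to the web team would keep shared rights and lose database-specific ones, while a terminated employee would have every right returned for removal.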
When Something Happens
Even the strictest and best-designed security plans will eventually encounter problems. Whether these are actual breaches or attempted breaches, it's vital that you monitor how well your organization responds.
Measuring the Response
Mean-time-to-detect (MTTD) and mean-time-to-respond (MTTR) are essential metrics that the security team should track and measure. Making these metrics visible to your organization and reducing each measure will ensure that your team focuses on quickly identifying and mitigating security risks and incidents.
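Both metrics are simple averages over incident timestamps. The sketch below assumes each incident is recorded with three timestamps under the hypothetical keys "occurred", "detected", and "responded", and it measures response time from the moment of detection. Teams define these intervals differently, so treat the exact boundaries as an assumption.

```python
from datetime import datetime, timedelta


def mean_duration(durations):
    """Return the arithmetic mean of an iterable of timedeltas."""
    durations = list(durations)
    return sum(durations, timedelta()) / len(durations)


def response_metrics(incidents):
    """Return (MTTD, MTTR) for a non-empty list of incident dicts."""
    mttd = mean_duration(i["detected"] - i["occurred"] for i in incidents)
    mttr = mean_duration(i["responded"] - i["detected"] for i in incidents)
    return mttd, mttr
```

Tracking these two values per quarter, rather than as a single all-time figure, makes it easier to see whether the team's response is actually improving.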
Taking It Further
The metrics listed above are a baseline that your organization should be monitoring, but they are by no means a complete list. As I mentioned at the outset, as security evolves, so do the methods that hackers and other bad actors employ to circumvent it. Constant vigilance and continuous improvement are critical to ensure you keep your environment safe and secure.
