Consider the following scenario: you are asked by your leadership to find dedicated time for threat hunting activities within your network.
After some time, access to the shiny new tool of choice is granted and you are super excited to get started. You log into the tool and are greeted with a lovely search bar; how do you proceed from here?
The tool presenting the blank search bar is undoubtedly powerful and feature-packed. However, it is often very difficult to chart a path forward and harness these features.
Let us put ourselves in the shoes of a SOC analyst who has been placed in front of this proverbial blank search bar and tasked with proactively finding threats in the environment. Where would they begin to tackle such a task?
The question above forms the basis for this post, which offers guidance on addressing the “blank search bar” problem.
Before diving into this topic, let us step back for a moment and provide some base definitions for what threat hunting means in the context of this blog post.
What is threat hunting
Like many terms in the cybersecurity industry, the definition of threat hunting may change depending on the context and who is providing the definition.
The seminal “TTP-Based Hunting” paper by MITRE provides several early definitions of threat hunting:
A focused and iterative approach to searching out, identifying and understanding adversaries that have entered the defender’s networks
… the process of proactively and iteratively searching through networks to detect and isolate advanced threats that evade existing security solutions.
the proactive detection and investigation of malicious activity within a network
A few key themes shake out from these definitions:
Detect and investigate
In plain language, rather than waiting for your EDR or tool of choice to flag malicious activity, threat hunting calls for “hunt teams” to proactively search for and detect such threats, which are then investigated. The process is also iterative, meaning one hunt engagement can feed into another.
Threat hunting engagements can be kicked off through many “inputs” - be it a threat report, a hypothesis of some kind, a newly released technique or just simply a hunch.
In this blog post, we will be focusing on hypothesis-based threat hunting, where we articulate a hypothesis and aim to prove or disprove it using the data that are available to us.
Why the command line
Now that we have defined what threat hunting is, let us dive deeper into the command line and articulate why this particular data item is a fantastic starting point for threat hunting journeys.
To do so, let us take a hypothesis-based threat-hunting approach to MITRE ATT&CK data.
Our hypothesis here is that by examining the data source components within the MITRE ATT&CK data set, we can get an idea of which data sources apply to the most MITRE ATT&CK techniques.
In other words, we want to spend our valuable time looking at a data source that would - ostensibly - surface the most threats.
We can begin crunching the MITRE ATT&CK JSON data with the following query:
Looking at the returned results, an interesting dynamic begins to bubble up:
We see that large chunks of our pie chart are represented by the “Command Execution” and “Process Creation” categories - both of which have a command line data element.
Together, these represent almost thirty percent of the total data source components within the MITRE ATT&CK framework.
It stands to reason, then, that hunting through these data sources is worth the investment since, according to the data, they present the largest opportunity for finding malicious or at least suspicious activity.
Of course, this analysis paints in very broad brush strokes and does not consider environmental specifics which may warrant starting your threat hunting adventures with a different and perhaps more applicable data source.
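If you want to reproduce this kind of tally yourself, the following Python sketch is one way to approach it. It assumes the STIX 2.x layout of MITRE's published ATT&CK bundles (for example, enterprise-attack.json from the mitre/cti repository). In that layout, x-mitre-data-component objects point at attack-pattern objects through "detects" relationships. The function names and the tiny sample bundle are hypothetical. Counting relationships this way may not reproduce the exact percentages in our pie chart.

```python
from collections import Counter


def techniques_per_data_component(bundle: dict) -> Counter:
    """Count 'detects' relationships from each data component to a technique.

    Assumes the ATT&CK STIX 2.x layout, where x-mitre-data-component objects
    point at attack-pattern objects via relationship_type == "detects".
    """
    objects = bundle.get("objects", [])
    # Map data component STIX IDs to human-readable names.
    component_names = {
        obj["id"]: obj["name"]
        for obj in objects
        if obj.get("type") == "x-mitre-data-component"
    }
    counts: Counter = Counter()
    for obj in objects:
        if (
            obj.get("type") == "relationship"
            and obj.get("relationship_type") == "detects"
            and not obj.get("revoked", False)
            and obj.get("source_ref") in component_names
            and obj.get("target_ref", "").startswith("attack-pattern--")
        ):
            counts[component_names[obj["source_ref"]]] += 1
    return counts


def share_of_total(counts: Counter, names: list[str]) -> float:
    """Fraction of all counted relationships attributable to the named components."""
    total = sum(counts.values())
    return sum(counts[name] for name in names) / total if total else 0.0


# Tiny hypothetical bundle for illustration; in practice, load enterprise-attack.json.
sample_bundle = {
    "objects": [
        {"type": "x-mitre-data-component", "id": "x-mitre-data-component--1", "name": "Command Execution"},
        {"type": "x-mitre-data-component", "id": "x-mitre-data-component--2", "name": "File Creation"},
        {"type": "relationship", "relationship_type": "detects",
         "source_ref": "x-mitre-data-component--1", "target_ref": "attack-pattern--a"},
        {"type": "relationship", "relationship_type": "detects",
         "source_ref": "x-mitre-data-component--1", "target_ref": "attack-pattern--b"},
        {"type": "relationship", "relationship_type": "detects",
         "source_ref": "x-mitre-data-component--2", "target_ref": "attack-pattern--a"},
    ]
}

counts = techniques_per_data_component(sample_bundle)
print(counts.most_common())  # [('Command Execution', 2), ('File Creation', 1)]
print(f"{share_of_total(counts, ['Command Execution']):.0%}")  # 67%
```

To rank your own candidate data sources, pass the names of the components you care about to share_of_total. Process Creation and Command Execution are examples.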
Now that we have articulated what threat hunting is and why we focus on the command line, let’s dive into some use cases!
Hypothesis one: Long command lines are malicious
Let us start with a simple hypothesis: all long command lines in your environment are malicious.
We can start by getting a handle on our data with the following query:
_index=sec_record_endpoint | length(commandLine) as command_line_length | values(commandLine) by command_line_length | sort by command_line_length desc
This query searches the normalized event index in the Sumo Logic Cloud SIEM platform. It displays each command line length along with the corresponding command line values, sorted by length in descending order.
When prototyping your hunts, it is always nice to have a set of “malicious” data to test your hypothesis against. You can get this by running tests from a library such as Atomic Red Team. Alternatively, if you know when a penetration test or red team engagement took place, you can make sure that date range is included in your query.
In our lab environment, a long command line was executed as a “control”. Using a sixty-minute time window, our results look like this:
This is awesome, because we now know that a command line over eight thousand characters long is at least a little bit suspicious in our particular environment.
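You may want to sanity-check a length cutoff offline against exported events before building a Signal. Below is a minimal Python sketch of the same idea: group command lines by length, longest first, then flag anything over a threshold. The 8,000-character cutoff comes from our lab observation and is an assumption, not a universal value. The sample command lines are synthetic.

```python
from collections import defaultdict

# Assumed cutoff taken from the lab observation; tune it for your environment.
LENGTH_THRESHOLD = 8000


def lengths_desc(command_lines: list[str]) -> list[tuple[int, list[str]]]:
    """Group distinct command lines by length, longest first."""
    by_length: dict[int, set[str]] = defaultdict(set)
    for cmd in command_lines:
        by_length[len(cmd)].add(cmd)
    # Keys are unique lengths, so sorting the items sorts by length.
    return [(length, sorted(cmds)) for length, cmds in sorted(by_length.items(), reverse=True)]


def flag_long(command_lines: list[str], threshold: int = LENGTH_THRESHOLD) -> list[str]:
    """Return command lines strictly longer than the threshold."""
    return [cmd for cmd in command_lines if len(cmd) > threshold]


sample = [
    "whoami",
    "ipconfig /all",
    "powershell.exe -enc " + "A" * 9000,  # synthetic stand-in for the "control" event
]
for length, cmds in lengths_desc(sample):
    print(length, cmds[0][:40])
print(len(flag_long(sample)))  # 1
```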
As a next step, we can create a Cloud SIEM Signal that looks something like this:
We can then go ahead and configure a custom insight to include this signal:
Custom insights are so powerful for testing threat hunting hypotheses because they include comments and tags sections and can be assigned to different members of the team.
Going back to our original search, we broaden the time frame of the query and discover that our hypothesis is not as solid as it first appeared. The wider window also surfaces benign long command lines.
Navigating back to the custom insight we looked at earlier, we can go ahead and add a comment, letting our hunt team members know that this particular hunt needs some tweaking prior to becoming operationalized.
We can also go ahead and add relevant MITRE tags and assign the work to a particular user.
Now we have a centralized place to document our hunting efforts and track work progress, very cool!
Hypothesis two: Special characters in the command line are suspicious
Suppose we wanted to obfuscate the “whoami” command in order to evade basic detections based on this command line value.
Using Invoke-DOSfuscation, the command now looks something like this:
There are probably several ways to look for this activity. One is a count of obfuscation characters. Another is command line length, as covered in hypothesis one.
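The character-count idea can be sketched in a few lines of Python. The function measures what fraction of a command line is made up of characters commonly abused for cmd.exe obfuscation. The character set is an assumed, non-exhaustive list, and the obfuscated sample is a hypothetical illustration rather than actual Invoke-DOSfuscation output. Any alerting threshold on the ratio would need tuning against your own data.

```python
# Assumed, non-exhaustive set of characters abused by cmd.exe obfuscation.
OBFUSCATION_CHARS = set('^&%";,')


def obfuscation_ratio(command_line: str) -> float:
    """Fraction of characters in the command line that are obfuscation characters."""
    if not command_line:
        return 0.0
    hits = sum(1 for ch in command_line if ch in OBFUSCATION_CHARS)
    return hits / len(command_line)


print(obfuscation_ratio("whoami"))                        # 0.0
print(round(obfuscation_ratio("cmd /c w^h^o^a^m^i"), 3))  # 0.278
```

A ratio is less sensitive to overall length than a raw count. Long but clean command lines therefore do not drown out short, heavily obfuscated ones.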
As another approach, we can use conditions within our query to look for specific characters that usually do not appear in our command lines. We then flag an event only when all of these conditions are met.
In query form, this looks like:
_index=sec_record_endpoint AND metadata_vendor = "Microsoft"
| if(commandLine matches /(\^)/,1,0) as caret_match // match on a ^ character
| if(commandLine matches /(\&)/,1,0) as concat_match // match on an & character
| if(commandLine matches /(\%)/,1,0) as percent_match // match on a % character
| where caret_match = 1 and concat_match = 1 and percent_match = 1 // only return results if all three match
| fields caret_match, concat_match, percent_match, commandLine
And our results:
We can also broaden our search out a little by looking at additional obfuscation characters and setting a threshold for our match, something similar to:
_index=sec_record_endpoint AND metadata_vendor = "Microsoft"
| if(commandLine matches /(\^)/,1,0) as caret_match // match on a ^ character
| if(commandLine matches /(\&)/,1,0) as concat_match // match on an & character
| if(commandLine matches /(\%)/,1,0) as percent_match // match on a % character
| if(commandLine matches /(\"")/,1,0) as quote_match // match on a "" (double-quote pair)
| if(commandLine matches /(\;)/,1,0) as semicolon_match // match on a ; character
| (caret_match + concat_match + percent_match + quote_match + semicolon_match) as total_obfuscation // number of indicators present
| where total_obfuscation >= 3 // flag when at least three indicators are present
| fields commandLine, total_obfuscation
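To experiment with the threshold before turning the search into a Signal, here is a Python sketch of the same scoring. Each of the five indicators present adds one point, and an event is flagged at three or more. The double-quote check mirrors the query as written, matching two consecutive quotes. The sample command line is hypothetical.

```python
import re

# Mirrors the five checks in the broadened query; each indicator present adds 1.
INDICATORS = {
    "caret": re.compile(r"\^"),
    "ampersand": re.compile(r"&"),
    "percent": re.compile(r"%"),
    "double_quote_pair": re.compile(r'""'),  # two consecutive quotes, as in the query
    "semicolon": re.compile(r";"),
}


def obfuscation_score(command_line: str) -> int:
    """Number of distinct indicators present (0 to 5), like total_obfuscation."""
    return sum(1 for pattern in INDICATORS.values() if pattern.search(command_line))


def is_suspicious(command_line: str, threshold: int = 3) -> bool:
    """Equivalent of `where total_obfuscation >= 3`."""
    return obfuscation_score(command_line) >= threshold


sample = 'cmd /c "set x=who&& set y=ami&& call %x%^%y%"'  # hypothetical obfuscated command
print(obfuscation_score(sample), is_suspicious(sample))  # 3 True
print(obfuscation_score("ipconfig /all"))                # 0
```

Scoring distinct indicators, rather than requiring all of them, lets the hunt tolerate obfuscation variants that skip one of the characters.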
Once we are happy with our search, we can follow the instructions to turn it into a scheduled search that generates a CSE Signal. When our obfuscated execution occurs, a CSE Signal triggers:
From here, we can go ahead and add this signal to our existing insight or create a new custom insight to track our hunting activities.
Hypothesis three: I can detect Mark of the Web bypasses via the command line
At the start of this post, we crunched some MITRE ATT&CK data and saw how the command execution and process creation data components link to more MITRE tactics and techniques than other data source components.
However, when perusing the MITRE page for “Mark of the Web” (MOTW) bypasses - as one does - a lightbulb goes off!
We see that threat actors pack their payloads within ISO files to avoid having them tagged with the MOTW.
Our hypothesis is that when Windows mounts an ISO file, it usually assigns it a drive letter other than “C”. When a file is executed from this drive, the drive letter shows up in our command line logs, giving us something to hunt for.
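A rough Python sketch of this hypothesis pulls drive-letter paths out of a command line and keeps any that are not the system drive. It assumes the system drive is C. Mapped network drives and secondary disks will match as well, so expect some benign noise. The payload path is hypothetical.

```python
import re

# An absolute path that starts with a drive letter, e.g. E:\payload.exe.
DRIVE_PATH = re.compile(r"\b([a-z]):\\", re.IGNORECASE)


def non_system_drives(command_line: str, system_drive: str = "c") -> set[str]:
    """Drive letters referenced in the command line other than the assumed system drive."""
    return {
        letter.lower()
        for letter in DRIVE_PATH.findall(command_line)
        if letter.lower() != system_drive.lower()
    }


cmd = r'"C:\Windows\System32\rundll32.exe" E:\payload.dll,Start'  # hypothetical ISO-launched DLL
print(non_system_drives(cmd))                                       # {'e'}
print(non_system_drives(r"C:\Windows\System32\cmd.exe /c whoami"))  # set()
```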
It should be noted that for this particular technique (T1553.005) MITRE only has “File” as a data source:
This is why threat hunting is critical to your overall security program, and why the process is iterative and proactive. Sometimes you need to go beyond existing frameworks to squeeze every ounce of value from the data you are attempting to wrangle.
In order to test our hypothesis and generate the relevant data, we can either craft a custom payload or use the T1553.005 Atomic Red Team test.
For demonstration purposes, we will scope this particular hunt to Rundll32 executing a payload from an ISO container of some kind.
Here, our detection logic will look similar to:
toLowerCase(commandLine) matches /("c:\\windows\\system32\\rundll32.exe".)([^c]\:\\)/
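Before deploying the rule, it can help to replay the pattern against a few known command lines. Below is a Python port of the expression above. It lowercases the input first, as the rule does, and escapes the dot in ".exe", which the original leaves unescaped. The test strings are hypothetical. The third one shows a limitation of the pattern as written: an unquoted rundll32 path is not matched.

```python
import re

# Python port of the rule expression. The rule lowercases the command line first,
# so the pattern only needs lowercase literals. The dot in ".exe" is escaped here.
RUNDLL32_FROM_OTHER_DRIVE = re.compile(
    r'("c:\\windows\\system32\\rundll32\.exe".)([^c]:\\)'
)


def rundll32_from_non_c_drive(command_line: str) -> bool:
    return RUNDLL32_FROM_OTHER_DRIVE.search(command_line.lower()) is not None


tests = {
    r'"C:\Windows\System32\rundll32.exe" E:\payload.dll,Start': True,
    r'"C:\Windows\System32\rundll32.exe" C:\Windows\System32\shell32.dll,Control_RunDLL': False,
    r'C:\Windows\System32\rundll32.exe E:\payload.dll,Start': False,  # unquoted: missed
}
for cmd, expected in tests.items():
    assert rundll32_from_non_c_drive(cmd) == expected, cmd
print("all rule checks passed")
```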
Hypothesis four: I can detect enumeration across multiple operating systems via the command line
In today's networks, it is common to find a mix of operating systems in use.
Some folks may be using MacBooks and others Windows laptops, with some Linux workstations thrown into the mix.
Hunting across telemetry from three different operating systems can be a real challenge. The telemetry may arrive in different formats, and the command line values may sit in differently named fields depending on the operating system and telemetry source.
Sumo Logic’s Cloud SIEM product comes with various log mappings and parsers that give users threat-hunting superpowers!
In practice, this means you can search for a command line value across multiple operating systems, because the command line field name (commandLine in our queries) is normalized.
Knowing that we can use one field name to search across multiple operating systems, we can craft a hypothesis which states that discovery activity across multiple operating systems in a short period of time can be - at least potentially - indicative of suspicious behavior.
Let’s look at a practical example of this, using the chain rule functionality in Sumo Logic Cloud SIEM.
Our rule logic will look like this:
In the rule, we keep things fairly simple and look for three distinct matches: whoami executions on three different operating systems. The Microsoft vendor covers Windows logs, Laurel covers Linux, and Jamf covers macOS.
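To reason about what such a rule should and should not fire on, here is a simplified in-memory approximation in Python. It is not how Cloud SIEM evaluates chain rules. It assumes events have already been normalized into a vendor and a command line field. It treats the first token of the command line as the executable, and uses a hypothetical 30-minute window with no per-entity grouping.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

REQUIRED_VENDORS = {"Microsoft", "Laurel", "Jamf"}  # Windows, Linux, macOS
WINDOW = timedelta(minutes=30)                      # hypothetical chain window


@dataclass
class Event:
    timestamp: datetime
    vendor: str        # normalized vendor, as in metadata_vendor
    command_line: str  # normalized commandLine


def is_whoami(command_line: str) -> bool:
    """Assume the first token is the executable; drop quotes and any directory."""
    tokens = command_line.split()
    if not tokens:
        return False
    exe = tokens[0].strip('"').replace("\\", "/").rsplit("/", 1)[-1].lower()
    return exe in {"whoami", "whoami.exe"}


def multi_os_whoami(events: list[Event], window: timedelta = WINDOW) -> bool:
    """True if whoami ran under every required vendor within one window."""
    hits = sorted((e for e in events if is_whoami(e.command_line)), key=lambda e: e.timestamp)
    for i, start in enumerate(hits):
        vendors = set()
        for event in hits[i:]:
            if event.timestamp - start.timestamp > window:
                break
            vendors.add(event.vendor)
        if REQUIRED_VENDORS <= vendors:
            return True
    return False


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    Event(t0, "Microsoft", r"C:\Windows\System32\whoami.exe"),
    Event(t0 + timedelta(minutes=5), "Laurel", "/usr/bin/whoami"),
    Event(t0 + timedelta(minutes=9), "Jamf", "whoami"),
]
print(multi_os_whoami(events))      # True
print(multi_os_whoami(events[:2]))  # False
```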
As a quick test, we can go ahead and issue the whoami command to our three systems, and a signal should trigger:
Here we can see which hosts were involved, what users were involved, where the “whoami” processes spawned from, as well as what command lines were used.
Hypothesis five: I can detect APTs with “whoami”
Starting again from MITRE ATT&CK and looking at the System Owner/User Discovery technique, we notice an interesting command line value within the procedure examples section:
What if we could flag every occurrence of this command in our environment?
This seems like a great idea, so we prototype a quick search to validate our assumptions:
Unfortunately, a lot of results are returned, and it is not clear which whoami executions are suspicious and which are legitimate.
A certain command line value may or may not be malicious or suspicious on its own. But what if we could flag a “whoami” execution from a user or machine that has not executed this command for a period of time?
This temporal element may be the missing piece of information that we need to prove or disprove our hypothesis.
This use case lends itself well to our Cloud SIEM First Seen rule feature, which removes the need for complex queries that look over huge amounts of data to mark first and last seen events.
Below is an example of this kind of rule:
Now, instead of alerting on a “whoami” execution, which can occur frequently in the environment, we add a temporal and baseline element in order to increase the efficacy of the rule logic and raise either an alert or a Signal which can be part of a broader Insight when this activity occurs for the first time in our environment since the baseline period.
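The First Seen concept can be sketched conceptually in Python. The sketch is an approximation, not the product's implementation. It alerts on the first whoami for a (user, host) pair that occurs after the baseline period, and stays quiet for pairs already seen during the baseline. The tuple layout, the loose substring match, and the sample users and hosts are all hypothetical.

```python
from datetime import datetime


def first_seen_alerts(events, baseline_end: datetime):
    """Alert on the first whoami per (user, host) that occurs after the baseline period.

    events: iterable of (timestamp, user, host, command_line) tuples.
    Pairs that already ran whoami during the baseline never alert.
    """
    seen = set()
    alerts = []
    for ts, user, host, cmd in sorted(events, key=lambda e: e[0]):
        if "whoami" not in cmd.lower():  # loose match, good enough for a sketch
            continue
        key = (user, host)
        if key in seen:
            continue
        seen.add(key)
        if ts >= baseline_end:
            alerts.append((ts, user, host, cmd))
    return alerts


baseline_end = datetime(2024, 1, 8)
events = [
    (datetime(2024, 1, 2), "svc_admin", "srv01", "whoami"),       # during baseline
    (datetime(2024, 1, 9), "svc_admin", "srv01", "whoami /all"),  # known pair: no alert
    (datetime(2024, 1, 9, 1), "jdoe", "wks42", "whoami /priv"),   # first seen: alert
    (datetime(2024, 1, 9, 2), "jdoe", "wks42", "whoami"),         # already seen: no alert
]
for ts, user, host, cmd in first_seen_alerts(events, baseline_end):
    print(ts, user, host, cmd)
```

Keying on (user, host) is one design choice. Keying on user alone or host alone trades noise for coverage in different ways.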
Hypothesis six: I can enhance my command line hunts with additional data
Earlier, when looking at our very first hypothesis, we posited that a long command line in our environment is worthy of some kind of attention. We prototyped our hunt and found that, in our environment, long PowerShell command lines are a mixed bag of malicious and benign activity.
Returning to our MITRE ATT&CK data source analysis, we noted that command line data features heavily, taking the first and second positions. However, “Network Traffic Content” comes in a close third.
What if we could combine our command line and network traffic data?
We know that our long command line hunting logic can surface malicious activity, but we need to add some parameters to set that activity apart from more benign events. We can do this by adding the following parameters to our hypothesis:
- Look for a long PowerShell command line
- Followed by PowerShell making a network connection that is:
  - To an external IP address
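Below is a simplified Python sketch of this combined hypothesis, useful for reasoning about edge cases. It is not the CSE chain rule engine. It pairs a long PowerShell command line with a later PowerShell connection from the same host to a globally routable address. The 8,000-character threshold is carried over from hypothesis one. The five-minute follow-up window and the event field names are hypothetical. Python's ipaddress module decides what counts as "external".

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timedelta

LONG_CMD_THRESHOLD = 8000                # carried over from hypothesis one (assumption)
FOLLOW_UP_WINDOW = timedelta(minutes=5)  # hypothetical gap between the two events


@dataclass
class ProcessEvent:
    timestamp: datetime
    host: str
    image: str
    command_line: str


@dataclass
class NetworkEvent:
    timestamp: datetime
    host: str
    image: str
    dest_ip: str


def is_powershell(image: str) -> bool:
    name = image.replace("\\", "/").rsplit("/", 1)[-1].lower()
    return name in {"powershell.exe", "pwsh.exe"}


def is_external(ip: str) -> bool:
    # is_global is False for RFC 1918, loopback, link-local and other special ranges.
    return ipaddress.ip_address(ip).is_global


def long_powershell_then_external(procs, conns):
    """Pair each long PowerShell command line with later external PowerShell connections."""
    matches = []
    for p in procs:
        if not (is_powershell(p.image) and len(p.command_line) > LONG_CMD_THRESHOLD):
            continue
        for c in conns:
            if (
                c.host == p.host
                and is_powershell(c.image)
                and timedelta(0) <= c.timestamp - p.timestamp <= FOLLOW_UP_WINDOW
                and is_external(c.dest_ip)
            ):
                matches.append((p, c))
    return matches


PS = r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe"
t = datetime(2024, 1, 1, 9, 0)
proc = ProcessEvent(t, "wks01", PS, "powershell -enc " + "A" * 9000)
external = NetworkEvent(t + timedelta(seconds=30), "wks01", PS, "8.8.8.8")
internal = NetworkEvent(t + timedelta(seconds=30), "wks01", PS, "10.0.0.5")
print(len(long_powershell_then_external([proc], [external, internal])))  # 1
```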
Once again we can turn to Sumo Logic CSE Chain Rules to accomplish this for us:
Once we perform some validation testing, our signal should trigger and look something like this:
This scenario neatly illustrates how the pieces work together. We combine MITRE ATT&CK data source components with iterative threat hunting hypotheses, which cover execution, detection, rule modifications and further iterations. Together, these enhance the threat detection value of the telemetry ingested by your security tooling.
We began this blog by outlining the challenges that cybersecurity analysts and engineers face when tasked with crafting threat hunting engagements and activities for complex environments.
It is often difficult to decide where to begin such threat hunting activities. There are so many different strands of telemetry stemming from various cloud-based and on-premises systems, which makes finding the proverbial needle in the haystack even harder.
By examining the MITRE ATT&CK data closely, particularly the data source components, we can begin to prioritize and organize our threat hunting efforts, spending more time on data points that are tied to the most MITRE ATT&CK tactics and techniques. From here, we can drill down into the procedural level of these techniques and look for this activity in our own environments.
The powerful cloud-native log analytics platform offered by Sumo Logic provides us with the tools and features we need in order to take full advantage of this extremely rich and complex data source, offering us normalization, UBA features, case management as well as an extremely powerful search language that can all be used in tandem in order to find suspicious and malicious activity in environments by utilizing the command line data source component.