In our previous blog post, we showed how to get started with the Kestrel Threat Hunting Language, such as connecting to data sources and performing your first hunts using FIND commands. In this post, we’ll introduce the `APPLY` keyword, which adds powerful analytics and enrichment capabilities to hunts.
We will show a Kestrel hunt performing backward and forward tracking on a Windows host to unearth the root cause and impact of activities related to an IP address. We will walk the process tree and extend analysis to network traffic and IP addresses, and we will use both pattern matching and analytic hunting steps to build the huntflow.
Table of Contents
- Hunting Environment
- Start From An IP
- From IP to Process
- Backward Tracking
- Forward Tracking
- Applying Analytics
If you’re unfamiliar with setting up Kestrel for your environment, start by reviewing our previous blog post, Building A Huntbook to Discover Persistent Threats from Scheduled Windows Tasks. There you will find the steps to monitor a Windows host with Sysmon, stream the log data to Elasticsearch via Winlogbeat, install Kestrel in a Python virtual environment, and install and configure the STIX-shifter elastic_ecs connector to access the log data.
Sysmon traces Windows activities and stores individual events as records in Elasticsearch. Reasoning directly on records requires extensive knowledge of record semantics, and this gets worse when querying multiple data sources with different record formats. The Kestrel runtime identifies entities in records and automatically collects all available information about each entity from related records before presenting the complete view to the user. This lets us hunt through entities: for example, we can establish an IP address entity in a Kestrel variable from hundreds of records, each describing one aspect or appearance of the entity; find all process entities connected to that IP address; and trace other related entities such as network-traffic, file, or registry-key. This entity-based approach makes it easy for a human to organize hunts and pivot from one hunt step to another, and it enables composable huntflow development.
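To make the entity-versus-record distinction concrete, here is a minimal Python sketch of what "assembling an entity from records" means. It is not Kestrel's implementation: the record layout and the helper `assemble_entities` are hypothetical, chosen only to show several partial records collapsing into one entity that carries the union of their attributes.

```python
from collections import defaultdict

# Hypothetical, simplified records. Each one describes only one aspect of an
# IP address; the field names are illustrative, not Sysmon's or ECS's.
records = [
    {"ip": "203.0.113.5", "seen_in": "network_connection", "port": 443},
    {"ip": "203.0.113.5", "seen_in": "dns_query", "hostname": "example.com"},
    {"ip": "203.0.113.5", "seen_in": "network_connection", "port": 443},
    {"ip": "203.0.113.9", "seen_in": "network_connection", "port": 80},
]


def assemble_entities(records, key):
    """Group records by an identity attribute and merge all other attributes.

    Each entity maps attribute -> set of observed values, so duplicate
    records add nothing while every distinct aspect is kept.
    """
    entities = defaultdict(lambda: defaultdict(set))
    for rec in records:
        ident = rec[key]
        for attr, value in rec.items():
            if attr != key:
                entities[ident][attr].add(value)
    # Convert the nested defaultdicts to plain dicts for display.
    return {ident: dict(attrs) for ident, attrs in entities.items()}


entities = assemble_entities(records, key="ip")
print(len(records), "records ->", len(entities), "entities")
print(entities["203.0.113.5"])
```

Four records become two entities here; in the hunt below, Kestrel does the same thing at scale, turning 350 records into a single IP address entity.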
In this tutorial, we will start from the IP address 126.96.36.199 and walk the associated process tree to reason about whether the IP address is suspicious.
First, let’s create a Kestrel variable holding the IP address as an entity:
```
ip200 = GET ipv4-addr FROM stixshifter://host101 WHERE [ipv4-addr:value = '126.96.36.199'] START t'2021-04-01T00:00:00Z' STOP t'2021-04-06T00:00:00Z'
```
We specify the entity type `ipv4-addr` that we want to get from the monitored Windows host `host101`, and we describe the matching criteria with the STIX pattern `[ipv4-addr:value = '126.96.36.199']`. We limit the search to the first five days of April. The first GET command in a huntflow needs a time range to set the scope; otherwise, STIX-shifter defaults to the past five minutes.
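The scoping rule is easy to get wrong, so here is a small Python sketch of it. `resolve_time_range` is a hypothetical helper, not STIX-shifter code; it only encodes the behavior described above (an explicit range is used as-is, and a missing range falls back to the last five minutes). Rejecting a half-specified range is this sketch's own choice.

```python
from datetime import datetime, timedelta, timezone

# Default look-back described in the post when no time range is given.
DEFAULT_LOOKBACK = timedelta(minutes=5)


def resolve_time_range(start=None, stop=None, now=None):
    """Return the (start, stop) window a search would cover.

    Hypothetical helper, not STIX-shifter code: an explicit range is used
    as-is, and a missing range falls back to the last five minutes.
    """
    now = now or datetime.now(timezone.utc)
    if start is None and stop is None:
        return now - DEFAULT_LOOKBACK, now
    if start is None or stop is None:
        # This sketch's own choice: refuse a half-specified range.
        raise ValueError("give both START and STOP, or neither")
    if start >= stop:
        raise ValueError("START must be earlier than STOP")
    return start, stop


# The window from our GET command: the first five days of April 2021.
start, stop = resolve_time_range(
    datetime(2021, 4, 1, tzinfo=timezone.utc),
    datetime(2021, 4, 6, tzinfo=timezone.utc),
)
print(stop - start)  # 5 days, 0:00:00
```

With the dates from our GET command the window spans five days; without them it would shrink to five minutes, which on a historical dataset usually means no results at all.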
If you have set up multiple data sources besides `host101`, you can use auto-complete in Kestrel to list all available data sources by pressing TAB.
After putting the command in a Jupyter cell and executing it, we get a summary of the code block: one IP address entity and 350 records (logs), each of which holds some information about the IP. We need not worry about the number of records: we are doing entity-based reasoning, and Kestrel assembles the entity from the 350 records for us.
Finding hosts that connect to the IP address is good; pinning down the processes on those hosts that communicate with the IP is even better. To do this, we need an endpoint monitoring system such as Sysmon, mentioned in our previous blog post. From an endpoint’s view, a process creates network-traffic that reaches a remote host or IP address entity. We can use the FIND command to navigate through connected entities; the Kestrel runtime generates and executes the corresponding data source queries and assembles entities from the returned records.
According to the relation chart in the FIND command syntax, we use the `ACCEPTED BY` relation to state that the IP addresses in `ip200` are the destination IPs of the network-traffic to be returned. We then use the `CREATED` relation to state that the network-traffic was created by the processes to be returned. The `DISP` command prints selected attributes of the entities in a Kestrel variable without side effects. Now let’s put all four commands together into a code block and execute it in a Jupyter Notebook cell:
```
# obtain network traffic that has ip200 as the destination IP
ip200nt = FIND network-traffic ACCEPTED BY ip200
DISP ip200nt ATTR src_ref.value, src_port, dst_ref.value, dst_port

# obtain processes creating the network traffic
p = FIND process CREATED ip200nt
DISP p ATTR pid, name, command_line
```
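Conceptually, the two FIND commands above are pivots over a relation graph: from IP addresses to the traffic they accepted, then from that traffic to the processes that created it. The following Python sketch illustrates the idea on toy data; the data, the field names (`dst_ip`, `creator_pid`), and both helper functions are hypothetical and say nothing about how the Kestrel runtime actually generates data source queries. Keying processes by pid alone is also a simplification, since Windows reuses pids.

```python
# Toy, in-memory stand-ins for assembled entities (hypothetical data).
network_traffic = [
    {"id": "nt1", "dst_ip": "203.0.113.5", "dst_port": 443, "creator_pid": 4120},
    {"id": "nt2", "dst_ip": "203.0.113.5", "dst_port": 443, "creator_pid": 5332},
    {"id": "nt3", "dst_ip": "198.51.100.7", "dst_port": 80, "creator_pid": 4120},
    {"id": "nt4", "dst_ip": "203.0.113.5", "dst_port": 443, "creator_pid": 4120},
]
processes = {
    4120: {"pid": 4120, "name": "iexplore.exe"},
    5332: {"pid": 5332, "name": "BackgroundDownload.exe"},
    6000: {"pid": 6000, "name": "notepad.exe"},
}


def find_traffic_accepted_by(ips):
    """Network traffic whose destination is one of the given IPs."""
    return [nt for nt in network_traffic if nt["dst_ip"] in ips]


def find_processes_created(traffic):
    """Processes that created any of the given traffic, deduplicated by pid."""
    pids = {nt["creator_pid"] for nt in traffic}
    return [processes[pid] for pid in sorted(pids)]


ip200 = {"203.0.113.5"}
ip200nt = find_traffic_accepted_by(ip200)
p = find_processes_created(ip200nt)
print(len(ip200nt), "traffic entities,", len(p), "processes")
```

As in the real hunt, many traffic entities map back to fewer processes, because one process typically opens many connections.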
We get back 350 network-traffic entities in `ip200nt` and 34 process entities in `p`.
Here is a partial list of the 350 network-traffic entities in `ip200nt`:
Is all of this network traffic HTTPS (port 443)? If we display only the destination IP and port of the entities in `ip200nt`, DISP deduplicates the results before output, so another DISP command in a new cell answers the question:
```
DISP ip200nt ATTR dst_ref.value, dst_port
```
After executing the cell, it is clear we guessed right: all network traffic to the IP 126.96.36.199 goes to port 443:
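Deduplicating a projection is what makes this quick check work: hundreds of connections collapse to a handful of distinct (IP, port) rows once varying attributes such as the source port are left out. Here is a minimal Python sketch of that behavior; `project_distinct` is a hypothetical helper, not Kestrel's implementation, and keeping first-seen order is this sketch's own choice.

```python
def project_distinct(entities, attrs):
    """Project entities onto `attrs` and drop duplicate rows, keeping first-seen order."""
    seen = set()
    rows = []
    for ent in entities:
        row = tuple(ent.get(a) for a in attrs)
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


# Hypothetical traffic entities: same destination, different source ports.
traffic = [
    {"src_port": 50112, "dst_ip": "203.0.113.5", "dst_port": 443},
    {"src_port": 50113, "dst_ip": "203.0.113.5", "dst_port": 443},
    {"src_port": 50120, "dst_ip": "203.0.113.5", "dst_port": 443},
]
print(project_distinct(traffic, ["dst_ip", "dst_port"]))  # [('203.0.113.5', 443)]
print(len(project_distinct(traffic, ["src_port", "dst_ip", "dst_port"])))  # 3
```

Including `src_port` in the projection would keep every row distinct, which is why the check above displays only the destination attributes.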
Nothing in `ip200nt` is explicitly suspicious, so let’s move on to the results for `p` from the previously executed block. The `DISP p` command shows the 34 associated processes; here is an abridged list:
We know that all 34 processes in the Kestrel variable `p` connected to 126.96.36.199:443. Next, we pick the first process as a starting point for walking the process tree to deepen our understanding of the activities on the Windows host.
Backward tracking is a hunting strategy that walks back the control-flow or data-flow of entities to understand their origin or provenance. The most common task for process entities is to backtrack their control-flow and walk up the process tree to check whether given processes were created by a malicious or potentially compromised process.
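Walking up a process tree amounts to following parent links until the data runs out. The Python sketch below shows this on a hypothetical, hand-written process table that loosely mirrors the chain we explore in this section; the table layout and `walk_up` are assumptions for illustration only. The `visited` set guards against cycles, which can appear if processes are linked naively by reused pids.

```python
# Hypothetical process table: unique process id -> name and parent id.
# parent is None when the monitor never observed the parent's creation.
process_table = {
    "p1": {"name": "BackgroundDownload.exe", "parent": "p2"},
    "p2": {"name": "devenv.exe", "parent": "p3"},
    "p3": {"name": "explorer.exe", "parent": "p4"},
    "p4": {"name": "svchost.exe", "parent": None},
}


def walk_up(proc_id, table):
    """Return names from proc_id up to the oldest ancestor the data knows about."""
    names = []
    visited = set()
    current = proc_id
    while current is not None and current in table and current not in visited:
        visited.add(current)
        names.append(table[current]["name"])
        current = table[current]["parent"]
    return names


print(" <- ".join(walk_up("p1", process_table)))
# BackgroundDownload.exe <- devenv.exe <- explorer.exe <- svchost.exe
```

In Kestrel, each step of this loop corresponds to one `FIND process CREATED <var>` command, and the walk ends where the monitor's visibility ends.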
Let’s take a closer look at the subset of processes in `p` named `BackgroundDownload.exe`. From their command lines, we can guess they are benign and belong to Microsoft Visual Studio. Let’s check their parent processes to verify this.
```
# get a subset of entities from variable `p`
bgdownloads = GET process FROM p WHERE [process:name = 'BackgroundDownload.exe']

# obtain parent processes of bgdownloads
bgdp = FIND process CREATED bgdownloads
DISP bgdp ATTR pid, name, command_line
```
We get four `BackgroundDownload.exe` processes from `p` and three processes as their parents. From the parent process names and executable paths, we are confident that the `BackgroundDownload.exe` processes are part of Microsoft Visual Studio and were spawned by processes from the same suite.
Next, let’s see if we can trace back one more level to find the grandparent processes:
```
# grandparent processes of `BackgroundDownload.exe`
bgdpp = FIND process CREATED bgdp
```
Good: we see two processes. While gathering the most complete information about these entities, Kestrel also retrieves records with network activity: there are 146 network-traffic records related to the two processes. However, this is only a summary of the command execution, and we cannot tell whether the network traffic is directly or indirectly linked to the grandparent processes. Let’s print the details of the grandparent processes and their network traffic:
```
DISP bgdpp ATTR pid, name, command_line

bgdppnt = FIND network-traffic CREATED BY bgdpp
DISP bgdppnt ATTR src_ref.value, src_port, dst_ref.value, dst_port
```
Zero network activity: the 146 related network-traffic records that Kestrel cached are only indirectly associated with the processes in `bgdpp`, for example, through their parent or child processes.
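One way a process can be related to network-traffic records without having created any traffic is by appearing only as the parent of the process that did. The Python sketch below illustrates the difference between "related" and "created by" on hypothetical records; the record fields and both helper functions are assumptions, not Sysmon's schema or Kestrel's logic.

```python
# Hypothetical flattened records: each network event names the acting
# process and, as context, that process's parent.
records = [
    {"event": "network", "process": "chrome.exe#7", "parent": "explorer.exe#2"},
    {"event": "network", "process": "devenv.exe#5", "parent": "explorer.exe#2"},
    {"event": "network", "process": "svchost.exe#9", "parent": "services.exe#1"},
]


def related_records(records, proc):
    """Every record that mentions proc in any role (actor or parent)."""
    return [r for r in records if proc in (r["process"], r["parent"])]


def created_by(records, proc):
    """Only the records where proc itself is the acting process."""
    return [r for r in records if r["process"] == proc]


grandparent = "explorer.exe#2"
print(len(related_records(records, grandparent)))  # 2: it appears as a parent
print(len(created_by(records, grandparent)))       # 0: it created no traffic itself
```

This is the same pattern we just saw: records are related to the grandparents, yet `FIND network-traffic CREATED BY bgdpp` returns nothing.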
It is easy to guess that the `devenv.exe` process in `bgdp` was spawned by `explorer.exe` in `bgdpp` (likely a double-click by a human user), while the `svchost.exe` process was spawned by `services.exe`. We can verify the former:
```
bgdp_devenv = GET process FROM bgdp WHERE [process:name = 'devenv.exe']
bgdp_devenv_parent = FIND process CREATED bgdp_devenv
DISP bgdp_devenv_parent ATTR pid, name, command_line
```
Bingo. We can keep walking up the process chain from `bgdp_devenv_parent`:
```
# Let's go further up to pull out the parent process of `bgdp_devenv_parent`.
ppp = FIND process CREATED bgdp_devenv_parent
DISP ppp ATTR pid, name, command_line
```
`explorer.exe` was spawned by `svchost.exe -k DcomLaunch -p`, the Windows DCOM Server Process Launcher, which is expected behavior. The DCOM launcher, the great-grandparent of `BackgroundDownload.exe`, is one of the core Windows system services. We could stop here, but we can also trace back further to find the limit of Sysmon's visibility into the earliest phases of system bootup. Of course, Sysmon can only see events after it has itself been started as a Windows service.
```
# Let's see how far we can go toward the origin of the process tree in Sysmon
pppp = FIND process CREATED ppp
DISP pppp ATTR pid, name, command_line
```
OK. We just hit the limit of Sysmon, which does not log the birth of
After backward tracking and finding some interesting branches in the process tree, we can perform forward tracking, or walk down the tree, to check other activities of the entities. This is a common hunting strategy for understanding the impact of an entity.
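Forward tracking is the mirror image of backward tracking: a breadth-first walk down child links, followed by a look at what the reached processes did. Here is a minimal Python sketch on hypothetical data; the `children` and `traffic` maps and both helpers are assumptions for illustration. Treating every reached process without children as a leaf is this sketch's choice.

```python
from collections import deque

# Hypothetical process tree (parent -> children) and per-process traffic.
children = {
    "ie_1": ["ie_child"],
    "ie_2": [],
    "ie_child": [],
}
traffic = {
    "ie_child": [("198.51.100.20", 443), ("198.51.100.21", 443)],
}


def descendants(roots, children):
    """Breadth-first walk down the tree; returns every reachable process once."""
    order, seen = [], set()
    queue = deque(roots)
    while queue:
        proc = queue.popleft()
        if proc in seen:
            continue
        seen.add(proc)
        order.append(proc)
        queue.extend(children.get(proc, []))
    return order


def leaf_traffic(procs, children, traffic):
    """Network destinations of the reached processes that have no children."""
    return {p: traffic.get(p, []) for p in procs if not children.get(p)}


procs = descendants(["ie_1", "ie_2"], children)
print(procs)  # ['ie_1', 'ie_2', 'ie_child']
print(leaf_traffic(procs, children, traffic))
```

The two steps correspond to `FIND process CREATED BY` (repeated per tree level) and `FIND network-traffic CREATED BY` in Kestrel.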
We start from the processes that talk to the IP address 126.96.36.199; we already have all such processes in the variable `p`. Let’s list the names of all processes in `p`:
```
DISP p ATTR name
```
In the last section, we found the parent processes of `BackgroundDownload.exe`. We could find the siblings of `BackgroundDownload.exe` by walking down the process tree from `bgdp` (the parent processes of `BackgroundDownload.exe`). We can also go beyond the process tree and forward track files, network-traffic, and registry keys, and even reach other processes via files (data-flow tracking).
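Data-flow tracking via files chains two relations: a process writes a file, and another process is later launched from that file as its executable. The Python sketch below shows one such hop on hypothetical data; the `wrote` relation in particular is an assumption, since, as noted at the end of this post, Kestrel 1.0 only has the process–executable relation for files.

```python
# Hypothetical data-flow facts. `wrote` (process -> files it wrote) is the
# kind of relation Kestrel 1.0 lacks; `launched_from` (file -> processes
# whose executable it is) corresponds to the process-executable relation.
wrote = {"dropper.exe#1": [r"C:\Users\Public\update.exe"]}
launched_from = {r"C:\Users\Public\update.exe": ["update.exe#2"]}


def downstream_via_files(proc, wrote, launched_from):
    """Processes launched from any file that `proc` wrote (one data-flow hop)."""
    result = []
    for path in wrote.get(proc, []):
        result.extend(launched_from.get(path, []))
    return result


print(downstream_via_files("dropper.exe#1", wrote, launched_from))  # ['update.exe#2']
```

Repeating the hop gives multi-step data-flow tracking, just as repeating parent or child lookups gives control-flow tracking.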
Let’s try a simple task: start from the `iexplore.exe` processes in `p` to (i) walk down their process tree if they spawn child processes, and (ii) go beyond the process tree to the network activities of the leaf processes.
```
# first, walk down the tree from the iexplore processes in `p`
ie = GET process FROM p WHERE [process:name = 'iexplore.exe']
DISP ie ATTR pid, name, command_line

ie_children = FIND process CREATED BY ie
DISP ie_children ATTR pid, name, command_line
```
As shown in the execution summary, the Kestrel variable `ie` contains two process entities, with pids 7356 and 11368, and `ie_children` contains only one process, with pid 8508. Let’s check its network traffic as planned:
```
# second, go beyond the process tree for network activities of the child process
ient = FIND network-traffic CREATED BY ie_children
DISP ient ATTR dst_ref.value, dst_port
```
We find 76 network-traffic entities from the IE process with pid 8508, all of which look like web connections. However, could a malicious C&C server hide in the list? Attackers usually do not use IP addresses directly for C&C, but rather domain names created by domain generation algorithms (DGAs). If we can enrich the IP addresses with their domain names, we may discover something malicious.
Enriching entities is another type of hunting step besides pattern matching, and Kestrel supports such steps as analytics. A Kestrel analytic is given all records of a list of entities and then runs pre-programmed logic on the entities, checks them against external threat intelligence, or matches them with a pre-trained machine learning model. Finally, the analytic generates new attributes for the entities and hands them back to the Kestrel runtime to complete the enrichment.
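The contract is simple: entities go in, and the same entities come back with extra attributes. The Python sketch below illustrates only that contract; it is not the `domainnamelookup` analytic, and the input field `dst_ip` is hypothetical. The `x_` prefix on the new attribute follows the STIX convention for custom properties, matching the attribute names we will see from the real analytic. The resolver is passed in so the demo runs offline; `reverse_dns` shows a best-effort PTR lookup with Python's standard `socket.gethostbyaddr`.

```python
import socket
from typing import Callable, Dict, List, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Best-effort PTR lookup with the standard library; None on failure."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None


def enrich_domains(
    entities: List[Dict], resolve: Callable[[str], Optional[str]]
) -> List[Dict]:
    """Return copies of the entities with an added x_domain_name attribute.

    Entities in, the same entities plus x_-prefixed attributes out; the
    input dicts are left unmodified.
    """
    enriched = []
    for ent in entities:
        new = dict(ent)
        new["x_domain_name"] = resolve(ent["dst_ip"])
        enriched.append(new)
    return enriched


# Offline demo: a dict stands in for DNS (hypothetical PTR data).
fake_ptr = {"198.51.100.20": "cdn.example.net"}
rows = [
    {"dst_ip": "198.51.100.20", "dst_port": 443},
    {"dst_ip": "198.51.100.99", "dst_port": 443},
]
for row in enrich_domains(rows, fake_ptr.get):
    print(row)
```

Passing `reverse_dns` instead of `fake_ptr.get` would perform real lookups; entities without a PTR record simply get `None`.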
The analytic we use here is `domainnamelookup` from the Kestrel analytics repository; it runs through Kestrel's Docker analytics interface. To use it, first clone the repository and change into the `domainnamelookup` analytics directory. Then build the Docker image:
```
$ docker build -t kestrel-analytics-enrichdomain .
```
The analytic is now available in Kestrel as `enrichdomain`. Kestrel calls analytics with the `APPLY` command (see the `APPLY` command documentation for details). After typing `APPLY docker://`, we can press TAB to list all available analytics. If `enrichdomain` does not show up, restart the Jupyter kernel to re-initialize the Kestrel kernel and its analytics interface manager.
```
# next, apply the analytic to all entities in `ient`
APPLY docker://enrichdomain ON ient

# print out all attributes, including the ones added by `enrichdomain`
INFO ient
```
Looking for attributes about domain names, we find two attributes newly added by the analytic: `x_domain_name` and `x_domain_organization`. We can now display them alongside the destination addresses:
```
DISP ient ATTR dst_ref.value, dst_port, x_domain_name, x_domain_organization
```
None of the domain names here appear to have been generated by a naive DGA (see the examples in the article Domain Generation Algorithms – Why so effective?). That rules out one possible threat.
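Eyeballing works for a short list, but a crude automated screen can help with longer ones. The Python sketch below swaps in a simple technique that neither this post nor the analytic uses: flagging long labels with high Shannon entropy, a common first-pass heuristic for naive DGA output. The threshold of 3.5 bits per character and the minimum length of 10 are illustrative assumptions, and the label extraction ignores multi-part suffixes such as `co.uk`.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character of s; 0.0 for an empty string."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_random(domain: str, threshold: float = 3.5, min_len: int = 10) -> bool:
    """Crude DGA screen on the label just left of the TLD.

    Naive: ignores multi-part public suffixes such as co.uk.
    """
    labels = domain.lower().rstrip(".").split(".")
    label = labels[-2] if len(labels) >= 2 else labels[0]
    return len(label) >= min_len and shannon_entropy(label) >= threshold


for d in ["login.microsoftonline.com", "xjw9q2kd0zr7p1.com", "example.org"]:
    print(d, looks_random(d))
# login.microsoftonline.com False
# xjw9q2kd0zr7p1.com True
# example.org False
```

A label of 14 distinct characters scores log2(14), about 3.81 bits, while dictionary-like names score lower; dictionary-based DGAs would slip past this check, so treat it as a triage aid only.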
We hope you enjoyed this tutorial on how to use the Kestrel Threat Hunting Language to extend your searches and pivot between entity types to perform provenance tracking and impact analysis. More can be done by backward and forward tracking control-flow (through the process tree) and data-flow (through files) and applying other analytics in the hunts, and we hope to bring more powerful capabilities in future releases.
Kestrel 1.0 supports only the process–executable relation for files, which limits data-flow tracking. In the future, Kestrel will support STIX 2.1, including SROs (STIX Relationship Objects) and universal file type support, for more powerful data-flow tracking.
It is easy to build your own analytics, especially ones that run as Docker containers. Check out the analytics template to get started, and watch for a future blog post that will guide you in detail!
Until next time, happy threat hunting!