Detecting Malicious Remote Authentication Requests Using Graph Learning
April 28, 2023
Threat hunting is a proactive approach to identifying previously unknown or ongoing, non-remediated threats within an organization’s network. Given the dramatic rise in the number of cybersecurity attacks in recent years, threat hunting is essential to securing businesses and enterprises. My colleague, Dr. Mohammad Jafari Dehkordi, and I have proposed a new approach to detecting lateral movement at the Canadian Institute for Cybersecurity. In this post, I first briefly introduce the approach and then show how it can be applied to data read using STIX-Shifter in the Kestrel threat hunting platform.
Lateral Movement
Lateral movement is a tactic in the MITRE ATT&CK Matrix for Enterprise. It refers to the adversary's movement through the enterprise network after initial access. Remote connections play an important role in lateral movement. Our approach aims to detect the remote authentication requests that the adversary makes to move laterally within the target network.
Overview of the Approach
This is a semi-supervised anomaly detection approach for detecting malicious authentication requests made during lateral movement. Three entities are involved in each authentication request: a user, a source host from which the request is made, and a destination host that the user requests access to. Note that the source and destination can be the same host or different hosts. The approach models a benign authentication request as a connection among a cluster of users, a cluster of sources, and a cluster of destinations, represented as (Cu, Cs, Cd), where Cu, Cs, and Cd denote a cluster of users, sources, and destinations, respectively.
Our approach uses benign authentication requests as the training data and separately clusters the hosts that acted as sources of authentication requests, the hosts that acted as destinations, and the users. The goal of this clustering is to group entities that behave similarly in the relevant authentication requests. After clustering, we have a set of clusters for users, a set for sources, and a set for destinations. Training then derives the inter-cluster connections by iterating over the training data. If the training data includes a benign authentication request made by a user belonging to cluster Cu from a host belonging to cluster Cs to access a host in cluster Cd, the connection (Cu, Cs, Cd) is derived. The approach makes predictions based on the inter-cluster connections derived during training. To classify a request as benign or malicious, we first find the clusters to which its source, destination, and user belong. If a connection exists among those clusters, the request is considered benign; otherwise, it is labeled malicious. The approach takes the numbers of clusters as input.
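The connection-derivation and prediction steps described above can be sketched in Python. This is only an illustrative sketch, not the implementation shipped in the Kestrel analytics: it assumes the cluster assignments have already been computed (the clustering step itself is not shown), and the function names, the toy data, and the use of -1 for entities that never appeared in training are assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Request = Tuple[str, str, str]      # (user, source host, destination host)
Connection = Tuple[int, int, int]   # (Cu, Cs, Cd) cluster identifiers

UNSEEN = -1  # assumed identifier for entities absent from the training data


def derive_connections(
    benign_requests: Iterable[Request],
    user_cluster: Dict[str, int],
    src_cluster: Dict[str, int],
    dst_cluster: Dict[str, int],
) -> Set[Connection]:
    """Collect one (Cu, Cs, Cd) triple for every benign training request."""
    return {
        (user_cluster[user], src_cluster[src], dst_cluster[dst])
        for user, src, dst in benign_requests
    }


def predict(
    request: Request,
    connections: Set[Connection],
    user_cluster: Dict[str, int],
    src_cluster: Dict[str, int],
    dst_cluster: Dict[str, int],
) -> str:
    """Label a request benign only if its cluster triple was seen in training."""
    user, src, dst = request
    triple = (
        user_cluster.get(user, UNSEEN),
        src_cluster.get(src, UNSEEN),
        dst_cluster.get(dst, UNSEEN),
    )
    return "benign" if triple in connections else "malicious"


# Toy cluster assignments and training data (all hypothetical).
user_cluster = {"alice": 0, "bob": 0, "carol": 1}
src_cluster = {"10.0.0.1": 0, "10.0.0.2": 1}
dst_cluster = {"10.0.0.9": 0, "10.0.0.8": 1}
training = [
    ("alice", "10.0.0.1", "10.0.0.9"),
    ("carol", "10.0.0.2", "10.0.0.8"),
]
connections = derive_connections(training, user_cluster, src_cluster, dst_cluster)

clusters = (user_cluster, src_cluster, dst_cluster)
# bob shares alice's cluster, so his request maps to a known connection.
label_bob = predict(("bob", "10.0.0.1", "10.0.0.9"), connections, *clusters)
# carol's cluster was never seen connecting these source/destination clusters.
label_carol = predict(("carol", "10.0.0.1", "10.0.0.9"), connections, *clusters)
# mallory never appeared in training, so her user cluster is UNSEEN.
label_mallory = predict(("mallory", "10.0.0.1", "10.0.0.9"), connections, *clusters)
print(label_bob, label_carol, label_mallory)
```

Because prediction is a set lookup on cluster triples, a user who never issued a particular request can still be classified as benign when other members of the same cluster behaved that way during training.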
Required Data about Authentication Requests
The approach is semi-supervised and must be trained on information about a set of benign requests. The following information about each authentication request is needed to train our classifier and detect lateral movement.
- The time at which the authentication request is submitted
- The host from which the request is submitted (source host)
- The username of the user who submits the request
- The domain from which the request is sent
- The host that the user wants to access (destination host)
This information can be represented as a STIX observed-data object that wraps the STIX Cyber-observable Objects (SCOs), as follows.
{
  "id": "observed-data--10fb1d74-41f9-4761-a72a-0002",
  "type": "observed-data",
  "spec_version": "2.1",
  "created": "2022-12-08T00:03:19",
  "modified": "2022-12-08T00:03:19",
  "objects": {
    "0": {
      "type": "ipv4-addr",
      "value": "0.0.4.66"
    },
    "1": {
      "type": "ipv4-addr",
      "value": "0.0.4.68"
    },
    "2": {
      "type": "user-account",
      "user_id": "C466@DOM1",
      "status": "benign"
    },
    "3": {
      "type": "network-traffic",
      "src_ref": "0",
      "dst_ref": "1"
    }
  },
  "first_observed": "2022-10-05T00:03:19.000Z",
  "last_observed": "2022-10-05T00:03:19.000Z",
  "number_observed": 1
}
We assume that the information about all requests has been stored in a table in a MySQL database, sorted in ascending order of time. This table is illustrated in Fig. 1. The first column shows the times at which the authentication requests were generated, in timestamp format. The second column shows the usernames together with the domains. The IP addresses of the source and destination hosts are stored in the third and fourth columns. Finally, column “status” shows the status of the authentication request. The value of this column is either benign or unknown. The benign requests are used to train the approach, and the requests with unknown status are tested. In the rest of this blog, we show how to retrieve this data using STIX-Shifter in Kestrel.
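To make the relationship between a table row and the observed-data object concrete, the following Python sketch builds such an object from one row. It is only an illustration: in the setup described here STIX-Shifter performs this translation, the function name `row_to_observed_data` is hypothetical, and the sketch assumes the time column holds Unix epoch seconds and that `created` and `modified` can be set to the time of conversion.

```python
import uuid
from datetime import datetime, timezone

STIX_TIME = "%Y-%m-%dT%H:%M:%S.000Z"


def row_to_observed_data(timestamp: int, user: str, src_ip: str,
                         dst_ip: str, status: str) -> dict:
    """Map one table row (time, user, source, destination, status) to a dict
    shaped like the observed-data example above."""
    # Assumption: the time column stores Unix epoch seconds in UTC.
    observed = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(STIX_TIME)
    # Assumption: the object is created at conversion time.
    now = datetime.now(timezone.utc).strftime(STIX_TIME)
    return {
        "id": f"observed-data--{uuid.uuid4()}",
        "type": "observed-data",
        "spec_version": "2.1",
        "created": now,
        "modified": now,
        "objects": {
            "0": {"type": "ipv4-addr", "value": src_ip},
            "1": {"type": "ipv4-addr", "value": dst_ip},
            # "status" is carried on the user-account object, as in the example above.
            "2": {"type": "user-account", "user_id": user, "status": status},
            "3": {"type": "network-traffic", "src_ref": "0", "dst_ref": "1"},
        },
        "first_observed": observed,
        "last_observed": observed,
        "number_observed": 1,
    }


example = row_to_observed_data(1664928199, "C466@DOM1", "0.0.4.66", "0.0.4.68", "benign")
print(example["first_observed"], example["objects"]["2"]["user_id"])
```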
How to Use the Kestrel Analytics
To use this Kestrel analytics, take the following steps.
- Download the analytics
- Build the analytics
- Load the data about the authentication requests in Kestrel
- Apply the analytics
Download and Build
Our Kestrel analytics is available in the GitHub repository of the Open Cybersecurity Alliance. To use it, you need to download it and then build it with the following commands.
$ git clone https://github.com/opencybersecurityalliance/kestrel-analytics.git
$ cd kestrel-analytics   # then change into the directory of the lateral movement detection analytics
$ docker build -t kestrel-analytics-detect_lateral_movement .
How to Read Data in Kestrel
A simple approach that comes to mind for reading the data from the database is to retrieve it as observed-data objects using the GET command. However, this command returns neither the network-traffic nor the user-account objects encapsulated in an observed-data object.
observations = GET observed-data FROM stixshifter://database WHERE [user-account:user_id != null]
DISP observations LIMIT 10
Thus, we have to retrieve the user-account and network-traffic objects differently. First, we use the GET command to retrieve the user-account objects. Since the WHERE clause is mandatory, we use a simple STIX pattern that is always true, such as user_id != null. Afterward, we retrieve the network-traffic objects that are linked to the user-account objects using the FIND command.
users = GET user-account FROM stixshifter://database WHERE [user-account:user_id != null]
connections = FIND network-traffic LINKED users
The database likely contains more than one authentication request made by the same user, or several authentication requests with the same source and destination pair. However, the GET and FIND commands retrieve only one object per user or per source and destination pair involved in the authentication requests. To work around this, we apply the ADDOBSID transformer to variables users and connections. This transformer returns a table containing the attributes of all user-account or network-traffic objects together with the identifiers of the corresponding observed-data SDOs.
connections_obs = ADDOBSID (connections)
users_obs = ADDOBSID (users)
Using the DISP command, you can see the tables returned by the ADDOBSID transformer.
DISP users_obs LIMIT 4
Applying the Kestrel Analytics
Now, we can apply the Kestrel analytics to tables users_obs and connections_obs and variable observations to detect lateral movement. Unfortunately, tables users_obs and connections_obs cannot be modified, since the ADDOBSID transformer actually creates a view. Thus, we need to send variable observations to the Kestrel analytics so that the output, i.e., the detection result, is recorded as a custom property in variable observations. We pass three parameters, “ku”, “ks”, and “kd”, to the Kestrel analytics, which refer to the numbers of clusters of users, sources, and destinations, respectively.
APPLY docker://detect_lateral_movement ON observations, users_obs, connections_obs WITH ku=60, ks=60, kd=60
Output
Our Kestrel analytics modifies variable observations by adding columns that represent the source and destination hosts, username, updated status, and cluster identifiers. These new columns are shown in Fig. 4. Note that columns c_usr, c_src, and c_dst hold the identifiers of the clusters to which the user, source host, and destination host belong, respectively. A cluster identifier of -1 indicates that the user or host does not belong to any cluster because it was not involved in any training request. If the value of column status is “malicious”, the authentication request is recognized as lateral movement.
DISP observations ATTR id, created, first_observed, source, destination, username, status LIMIT 10
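As an illustration of how the output columns can be read, the Python sketch below filters a few rows for malicious requests and notes which of the entities involved were never seen in training (cluster identifier -1). The rows are invented for the example and the helper `summarize_detections` is hypothetical; only the column names follow the output described above.

```python
from typing import Dict, List

CLUSTER_COLUMNS = ("c_usr", "c_src", "c_dst")

# Hypothetical output rows; the values are made up for illustration.
rows: List[Dict] = [
    {"username": "U1@DOM1", "source": "0.0.4.66", "destination": "0.0.4.68",
     "status": "benign", "c_usr": 3, "c_src": 7, "c_dst": 2},
    {"username": "U2@DOM1", "source": "0.0.4.66", "destination": "0.0.9.10",
     "status": "malicious", "c_usr": 5, "c_src": 7, "c_dst": -1},
    {"username": "U3@DOM1", "source": "0.0.4.70", "destination": "0.0.4.68",
     "status": "malicious", "c_usr": 3, "c_src": 1, "c_dst": 2},
]


def summarize_detections(rows: List[Dict]) -> List[Dict]:
    """Return the malicious rows, each with the cluster columns that are -1."""
    findings = []
    for row in rows:
        if row["status"] != "malicious":
            continue
        unseen = [col for col in CLUSTER_COLUMNS if row[col] == -1]
        findings.append({
            "username": row["username"],
            "source": row["source"],
            "destination": row["destination"],
            "unseen": unseen,  # an empty list means all entities were clustered
        })
    return findings


findings = summarize_detections(rows)
for finding in findings:
    print(finding)
```

A malicious row with an empty `unseen` list means that all three entities were known from training but their clusters were never connected, whereas a non-empty list points to a user or host that did not appear in any training request.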
Leila Rashidi is a postdoctoral fellow at the Canadian Institute for Cybersecurity (CIC), University of New Brunswick (UNB), Canada. She currently leads a team at CIC working on user and entity behavior analysis in collaboration with IBM. Her research focuses on cybersecurity and performance evaluation.