Detecting Malicious Remote Authentication Requests Using Graph Learning

Published by Leila Rashidi on April 28, 2023

In my previous blog post, I introduced a Kestrel analytics that detects lateral movement by clustering sources, destinations, and users and deriving inter-cluster authentication paths. In this blog post, I introduce a new Kestrel analytics that detects lateral movement using graph learning. This Kestrel analytics was developed by my colleague, Dr. Mahdi Rabbani, and me at the Canadian Institute for Cybersecurity.

Why graph learning?

Graph learning refers to the analysis of a graph to learn its structure and, in particular, its node embeddings. The embedding of a node is a numeric vector that serves as a feature vector representing that node. To detect malicious remote authentication requests, it is important to analyze the patterns of the authentication events in which a user or host was involved. For instance, it is important to consider the set of users that have requested access to a host, or the set of hosts to which access has been requested from a given host. This information can be represented in a graph, defined as follows.

For each user or host, there is a node that represents that user or host.
If an authentication request has been submitted by a user to access a host, there is an edge directed from the node representing the user to the node representing that host.
If an authentication request has been submitted by a user from a host, there is an edge directed from the node representing that host to the node representing the user.

For instance, assume there is a network with five hosts, called C17693, C151, C1495, C1521, and C305, and three users whose usernames are U748, U636, and U1723. Fig. 1 represents a graph constructed based on the following authentication requests:

Req 1: User U748 requests to access host C305 from host C17693.
Req 2: User U748 requests to access host C305 from host C151.
Req 3: User U636 requests to access host C305 from host C1495.
Req 4: User U1723 requests to access host C305 from host C1521.

Fig. 1: A small graph constructed based on three benign authentication requests and one malicious request. The nodes and edges related to the malicious request are red.
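
The construction rules above can be sketched in a few lines of Python. This is only an illustration using a plain dictionary of sets; the function name build_auth_graph and the tuple layout (user, source host, destination host) are assumptions made for this example and are not taken from the analytics' source code.

```python
from collections import defaultdict

# (user, source_host, destination_host), copied from Req 1 to Req 4 above
REQUESTS = [
    ("U748", "C17693", "C305"),
    ("U748", "C151", "C305"),
    ("U636", "C1495", "C305"),
    ("U1723", "C1521", "C305"),
]


def build_auth_graph(requests):
    """Return a directed graph as {node: set of successor nodes}."""
    graph = defaultdict(set)
    for user, source, destination in requests:
        graph[source].add(user)       # edge: source host -> user
        graph[user].add(destination)  # edge: user -> destination host
        graph[destination]            # register hosts that have no outgoing edge
    return dict(graph)


graph = build_auth_graph(REQUESTS)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because edges are stored in sets, the repeated edge from U748 to C305 (Req 1 and Req 2) appears only once. A real implementation might instead keep edge multiplicities or timestamps.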

Using graph learning, we can analyze the connections between different users and hosts and the paths consisting of subsequent remote accesses that show the reachable hosts. The output of graph learning is an embedding vector for each node and edge of the graph, which can be used to differentiate the normal and malicious authentication requests.

Overview of the Machine Learning Approach

The machine learning approach implemented in this Kestrel analytics is supervised and consists of three phases, as follows.

Phase 1: Building Graph
Phase 2: Graph Learning
Phase 3: Training Classifier

In Phase 1, a graph is constructed from all available authentication events, regardless of whether they are malicious or normal, as explained in the previous section. In Phase 2, our Kestrel analytics derives a vector of 128 elements for each node using node embedding techniques. These vectors and the times of the authentication requests are used to generate a feature vector for each request. Phase 3, the last phase, is dedicated to training a classifier. Currently, five classifiers can be applied using this Kestrel analytics, namely SVM, logistic regression, KNN, XGBoost, and random forest.
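
This post does not name the exact node embedding technique used in Phase 2. One common family (DeepWalk- and node2vec-style methods) samples random walks over the graph and then learns embeddings from those walks. The following is a minimal sketch of the walk-sampling step only, assuming a graph stored as a dictionary of successor sets; the function random_walks, its parameters, and the toy graph are hypothetical and serve only as an illustration.

```python
import random


def random_walks(graph, walk_length, walks_per_node=2, seed=0):
    """Sample walks of at most `walk_length` nodes, starting from every node.

    Assumption: the length is counted in nodes, and a walk stops early
    when it reaches a node without outgoing edges.
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                successors = sorted(graph.get(walk[-1], ()))
                if not successors:  # dead end: stop this walk
                    break
                walk.append(rng.choice(successors))
            walks.append(walk)
    return walks


# Toy graph (hypothetical): source host -> user -> destination host
toy_graph = {
    "C151": {"U748"},
    "U748": {"C305"},
    "C305": set(),
}

walks = random_walks(toy_graph, walk_length=3)
for walk in walks:
    print(" -> ".join(walk))
```

In walk-based methods, the sampled walks are typically treated like sentences and passed to a skip-gram model, which yields one fixed-size vector per node (128 elements in our case).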

Providing Data about Authentication Requests in Kestrel

To load the training and test data using Kestrel, we can save the data into a database, as shown in the previous blog post. For example, we can have a table with five columns, as represented in Fig. 2.

Fig. 2: A database table that stores the information about training and test authentication requests

The data stored in the database can be read using the STIX-shifter data source interface. Using the following commands, we can load the data in the same way as for the previous Kestrel analytics to detect lateral movement. For further information, you can refer to my previous blog post.

users = GET user-account FROM stixshifter://database WHERE [user-account:user_id != null]

connections = FIND network-traffic LINKED users

connections_obs = ADDOBSID (connections)

users_obs = ADDOBSID (users)

observations = GET observed-data FROM stixshifter://database WHERE [user-account:user_id != null]

How to build and apply the Kestrel analytics?

This Kestrel analytics is available here. Before applying it, you need to download and build it using the following commands.

$ git clone https://github.com/opencybersecurityalliance/kestrel-analytics/Graph\ Learning-based\ Lateral\ Movement\ Detection.git

$ docker build -t kestrel-analytics-detect_lm .

After building the Kestrel analytics and reading the information stored in the database, we can apply the Kestrel analytics to the variable observations and the tables users_obs and connections_obs, in that order. This analytics takes two input parameters. The first, walkLength, is the maximum length of the random walks during Phase 2. The second, classifier, can be set to one of svm, knn, logisticRegression, xgboost, or randomforest. For more details, please see the following line of code.

APPLY docker://detect_lm ON observations, users_obs, connections_obs WITH walkLength=3, classifier=xgboost
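
As far as I know, Kestrel's docker analytics interface hands the WITH parameters to the container as environment variables. Under that assumption, a container could read and validate the two parameters roughly as sketched below. The function read_parameters and the default values are hypothetical and are not taken from the analytics' source code.

```python
import os

# Classifier names accepted by the analytics, as listed above
ALLOWED_CLASSIFIERS = {"svm", "knn", "logisticRegression", "xgboost", "randomforest"}


def read_parameters(env=os.environ):
    """Read walkLength and classifier from a mapping of environment variables.

    The defaults (3 and "xgboost") are assumptions made for this sketch.
    """
    walk_length = int(env.get("walkLength", "3"))
    classifier = env.get("classifier", "xgboost")
    if walk_length < 1:
        raise ValueError("walkLength must be a positive integer")
    if classifier not in ALLOWED_CLASSIFIERS:
        raise ValueError(f"unsupported classifier: {classifier}")
    return walk_length, classifier


# Values matching the APPLY command above
print(read_parameters({"walkLength": "3", "classifier": "xgboost"}))
```

Validating the classifier name early makes a typo in the APPLY command fail with a clear message rather than partway through training.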

Our Kestrel analytics modifies the variable observations by adding attributes, including “destination”, “source”, “status”, and “user_id”, to all entities. After these modifications, each entity represents an authentication request, and the value of its status attribute indicates whether the request is malicious or benign.

Leila Rashidi

Leila Rashidi is a postdoctoral fellow at the Canadian Institute for Cybersecurity (CIC), University of New Brunswick (UNB), Canada. She currently leads a team at CIC, working on user entity behavior analysis in collaboration with IBM. Her research focuses on cybersecurity and performance evaluation.