In my previous blog post, I introduced a Kestrel analytics that detects lateral movement by clustering sources, destinations, and users and deriving inter-cluster authentication paths. In this blog post, I introduce a new Kestrel analytics that detects lateral movement using graph learning. It was developed by my colleague, Dr. Mahdi Rabbani, and me at the Canadian Institute for Cybersecurity.
Graph learning refers to analyzing a graph to learn its structure and, in particular, its node embeddings. The embedding of a node is a numeric vector that serves as a feature vector representing that node. To detect malicious remote authentication requests, it is important to analyze the patterns of the authentication events in which a user or host was involved. For instance, it is important to consider the set of users that have requested access to a host, or the set of hosts that have been accessed from a given host. This information can be represented in a graph, defined as follows.
For instance, assume there is a network with five hosts, called C17693, C151, C1495, C1521, and C305, and three users whose usernames are U748, U636, and U1723. Fig. 1 represents a graph constructed based on the following authentication requests:
Using graph learning, we can analyze the connections between users and hosts, as well as the paths of successive remote accesses that show which hosts are reachable. The output of graph learning is an embedding vector for each node and edge of the graph, which can be used to distinguish normal authentication requests from malicious ones.
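As a rough illustration of the two kinds of information mentioned above (which users requested access to a host, and which hosts are reachable through successive remote accesses), here is a minimal Python sketch. It is not the code of the analytics. The host and user names come from the example network, but the events themselves, the (user, source, destination) layout, and the helper names build_graph and reachable_hosts are assumptions made for illustration.

```python
from collections import defaultdict, deque

# Hypothetical authentication events: (user, source host, destination host).
# These events are made up for illustration; they are not the ones in Fig. 1.
events = [
    ("U748", "C17693", "C151"),
    ("U748", "C151", "C1495"),
    ("U636", "C1495", "C1521"),
    ("U1723", "C305", "C1521"),
]


def build_graph(events):
    """Return (host_edges, users_of_host) built from authentication events."""
    host_edges = defaultdict(set)     # source host -> destination hosts
    users_of_host = defaultdict(set)  # destination host -> requesting users
    for user, src, dst in events:
        host_edges[src].add(dst)
        users_of_host[dst].add(user)
    return host_edges, users_of_host


def reachable_hosts(host_edges, start):
    """Breadth-first search over successive remote accesses from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in host_edges.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


host_edges, users_of_host = build_graph(events)
print(sorted(users_of_host["C1521"]))
print(sorted(reachable_hosts(host_edges, "C17693")))
```

With these made-up events, host C1521 was requested by two different users, and three hosts are reachable from C17693 through a chain of remote accesses. Node embeddings aim to capture this kind of structural context in a numeric form.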
The machine learning approach implemented in this Kestrel analytics is supervised and consists of three phases, as follows.
In Phase 1, a graph is constructed from all available authentication events, regardless of whether they are malicious or normal, as explained in the previous section. In Phase 2, our Kestrel analytics derives a vector of 128 elements for each node using node embedding techniques. These vectors and the times of the authentication requests are used to generate a feature vector for each request. Phase 3, the last phase, is dedicated to training a classifier. Currently, five classifiers are supported by this Kestrel analytics: SVM, logistic regression, KNN, XGBoost, and random forest.
To load the training and test data using Kestrel, we can store the data in a database, as shown in the previous blog post. For example, we can use a table with five columns, as shown in Fig. 2.
The data stored in the database can be read using the STIX-shifter data source interface. We can load the data with the following commands, in the same way as for the previous lateral movement analytics. For further information, you can refer to my previous blog post.
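To make Phases 2 and 3 more concrete, the following Python sketch shows one way a feature vector could be assembled from node embeddings and the request time, and then passed to a KNN classifier (one of the five listed above). It is only a sketch under stated assumptions: the embeddings are 4-element placeholders rather than the 128-element vectors the analytics derives, the feature layout (user, source, and destination embeddings followed by a scaled hour) is assumed, and the labeled samples and the functions feature_vector and knn_predict are hypothetical.

```python
import math
from collections import Counter

# Placeholder 4-element embeddings (the analytics derives 128 elements per node).
embeddings = {
    "U748": [0.1, 0.2, 0.0, 0.3],
    "U1723": [0.8, 0.9, 0.9, 0.7],
    "C151": [0.4, 0.1, 0.2, 0.0],
    "C1495": [0.0, 0.3, 0.1, 0.2],
    "C305": [0.9, 0.8, 0.7, 0.9],
}


def feature_vector(user, src, dst, hour):
    """Assumed layout: user, source, destination embeddings, then scaled hour."""
    return embeddings[user] + embeddings[src] + embeddings[dst] + [hour / 23.0]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x."""
    nearest = sorted((math.dist(tx, x), ty) for tx, ty in zip(train_x, train_y))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled requests: (user, source, destination, hour) and a status.
labeled = [
    (("U748", "C151", "C1495", 9), "benign"),
    (("U748", "C1495", "C151", 14), "benign"),
    (("U748", "C151", "C1495", 16), "benign"),
    (("U1723", "C305", "C151", 2), "malicious"),
    (("U1723", "C305", "C1495", 3), "malicious"),
    (("U1723", "C305", "C305", 1), "malicious"),
]
train_x = [feature_vector(*request) for request, _ in labeled]
train_y = [status for _, status in labeled]

query = feature_vector("U1723", "C305", "C151", 4)
print(knn_predict(train_x, train_y, query))
```

Because the classifier is supervised, every training request needs a known status. The same feature vectors could be fed to any of the other four classifiers.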
users = GET user-account FROM stixshifter://database WHERE [user-account:user_id != null]
connections = FIND network-traffic LINKED users
connections_obs = ADDOBSID (connections)
users_obs = ADDOBSID (users)
observations = GET observed-data FROM stixshifter://database WHERE [user-account:user_id != null]
This Kestrel analytics is available here. Before applying it, you need to download and build it using the following commands.
$ git clone https://github.com/opencybersecurityalliance/kestrel-analytics/Graph\ Learning-based\ Lateral Movement\ Detection.git
$ docker build -t kestrel-analytics-detect_lm .
After building the Kestrel analytics and reading the information stored in the database, we can apply the analytics to the variables observations, users_obs, and connections_obs. The analytics takes two input parameters. The first, walkLength, is the maximum length of the random walks in Phase 2. The second, classifier, selects the classifier to train and can be set to a value such as xgboost or randomforest. For more details, please see the following line of code.
APPLY docker://detect_lm ON observations, users_obs, connections_obs WITH walkLength=3, classifier=xgboost
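To give an intuition for what walkLength bounds, here is a minimal Python sketch of a length-limited random walk over a small graph. It is not the implementation used by the analytics: the adjacency data is made up, the function random_walk is hypothetical, and it assumes that the length counts the nodes in a walk (the analytics may count edges instead).

```python
import random

# Hypothetical undirected graph over users and hosts, as adjacency sets.
adjacency = {
    "U748": {"C17693", "C151"},
    "C17693": {"U748", "C151"},
    "C151": {"U748", "C17693", "C1495"},
    "C1495": {"C151"},
}


def random_walk(adjacency, start, walk_length, rng):
    """Walk from start, visiting at most walk_length nodes (assumed meaning)."""
    walk = [start]
    while len(walk) < walk_length:
        # Sort neighbors so a seeded generator gives reproducible walks.
        neighbors = sorted(adjacency.get(walk[-1], ()))
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk


rng = random.Random(0)
walks = [random_walk(adjacency, node, 3, rng) for node in sorted(adjacency)]
for walk in walks:
    print(walk)
```

Random-walk-based embedding methods typically treat such walks as sequences from which node vectors are learned, so a larger walk length lets each node's embedding reflect more distant parts of the graph.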
Our Kestrel analytics modifies the variable observations by adding five attributes, including “destination”, “source”, “status”, and “user_id”, to all entities. After these modifications, each entity represents an authentication request, and the value of its status attribute indicates whether the request is malicious or benign.
Leila Rashidi is a postdoctoral fellow at the Canadian Institute for Cybersecurity (CIC), University of New Brunswick (UNB), Canada. She currently leads a team at CIC working on user and entity behavior analysis in collaboration with IBM. Her research focuses on cybersecurity and performance evaluation.