PowerShell is Microsoft’s scripting language for task automation. It is very powerful and allows both IT professionals and programs to execute low-level Windows commands. Because it is built on top of the .NET Common Language Runtime (CLR), it has access to the same objects and processes as any other Windows-native program.
Because PowerShell can be launched from a command-line interface, it can also run silently, performing actions that go unnoticed by the end user. Its extensive functionality, native Windows integration, and support for both .NET calls and Windows scripts make it an attractive tool for those with other intentions.
The following command is a PowerShell example that prints the computer’s manufacturer, the owner’s name, and the model.
Figure 1. Source: Collecting Information About Computers, Microsoft Docs.
The script in Figure 1 shows how easy it is to learn that the computer belongs to Jane Doe, that it is part of a particular workgroup, and that it is a Compaq Presario 06: a model with some mileage that is probably no longer supported by its manufacturer.
Although the command in Figure 1 reveals some important information about the computer, it is not considered a malicious script, because the command under review ships out of the box with PowerShell.
What is a Malicious Script?
To the human eye, a malicious script looks like any other ordinary PowerShell script, but it carries a hidden intention. That intention might be extracting, deleting, or renaming files, installing unwanted programs, or downloading or sending files over the internet without the end user’s consent.
Malicious programs such as viruses, trojans, and ransomware use PowerShell scripts to achieve their objectives: damaging, stealing, or holding hostage the information on your computer or network.
One important aspect of PowerShell execution is that most of the time it is fileless, which means that running a script does not generate a file, trigger a download, or change a registry key. This makes it the perfect tool for those who want to go unnoticed.
Malicious PowerShell scripts often use commands that can be labeled as “not common”, and measuring that rarity is key to detecting them. The goal, therefore, is to build a solution that can tell which PowerShell scripts are normal and which are not.
Detecting Malicious PowerShell Scripts
The idea that malicious PowerShell scripts have something special that differentiates them from others will be used to build an algorithm that spots that difference with natural language processing techniques.
Figure 2 shows an example of a malicious PowerShell script. The command has been modified to make it safe for evaluation.
Figure 2. Encoded PowerShell from Emotet malware.
The message in Figure 2 is Base64-encoded, a technique commonly used by malware to obscure PowerShell scripts so that they look like a block of random text. After decoding it, we can learn more about what the PowerShell commands actually do. Figure 3 shows the decoded PowerShell. Again, this script has been modified to disable any execution.
Figure 3. Decoded PowerShell from Emotet malware.
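The decoding step described above can be sketched in a few lines of Python. This is a minimal illustration, not Knogin's tooling: it assumes the payload was passed through PowerShell's -EncodedCommand parameter, which expects Base64 over UTF-16LE text, and the function name is hypothetical. The sample payload is a harmless command encoded on the spot.

```python
import base64


def decode_encoded_command(encoded: str) -> str:
    """Decode a Base64 payload as used with PowerShell's -EncodedCommand.

    Assumption: the payload is Base64 over UTF-16LE text, which is what
    -EncodedCommand expects. Other malware may use different encodings.
    """
    raw = base64.b64decode(encoded)
    return raw.decode("utf-16-le")


# Harmless sample: encode a benign command, then decode it back.
benign = "Write-Output 'hello'"
sample = base64.b64encode(benign.encode("utf-16-le")).decode("ascii")

print(sample)                          # looks like random text
print(decode_encoded_command(sample))  # readable command again
```

Decoding only turns the payload back into text; nothing is executed, which makes this step safe to run on suspicious samples.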
Figure 3 shows a decoded PowerShell script that loops over a set of websites to download some files. After each download, the code uses the .NET System.Diagnostics library to start some processes.
Although this looks sketchy, it is not trivial for a computer system to determine if the script under execution is trustworthy.
At Knogin, we have reviewed over nine terabytes (9 TB) of information in search of encoded PowerShell scripts that can help build a sense of normality based on a statistical distribution. The main idea is to identify the words used in non-malicious PowerShell scripts and count how often each one appears. This yields a distribution that tells how common or how unique each word is. The following list is an example of those word counts for the PowerShell scripts in our database:
These are just the seven most common words, each shown with the number of times it occurred in the database. For example, the word “output” appeared 45,358 times.
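A word count of this kind can be sketched as follows. The tokenizer, the function names, and the three-script corpus are all assumptions made for illustration; the real corpus and tokenization rules are not described here. Note that splitting on non-alphanumeric characters turns a cmdlet such as Write-Output into the words “write” and “output”.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words that start with a letter.
TOKEN_RE = re.compile(r"[a-z][a-z0-9_]*")


def tokenize(script: str) -> list:
    """Split a script into lowercase word tokens."""
    return TOKEN_RE.findall(script.lower())


def count_words(scripts) -> Counter:
    """Count how often each word occurs across all scripts."""
    counts = Counter()
    for script in scripts:
        counts.update(tokenize(script))
    return counts


# Toy corpus standing in for the database of benign scripts.
corpus = [
    "Write-Output 'backup done'",
    "Get-Process | Write-Output",
    "Get-ChildItem C:\\Temp | Write-Output",
]

counts = count_words(corpus)
for word, n in counts.most_common(3):
    print(word, n)
```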
The malicious PowerShell detection strategy theorizes that ill-intentioned scripts use commands that are not normal. Normality here is an estimate of how far something lies from the mean.
To achieve this, Knogin has developed a scoring function Sf(p_command) that receives a PowerShell command as a parameter and returns a number that represents the combination of all the words in the script, weighted by the occurrence of each word in the database.
The final scores of the PowerShell scripts in the database form the following distribution:
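The exact form of Sf is not given here, so the following is only one plausible stand-in, not Knogin's function: it scores a command as the mean log relative frequency of its words, with add-one smoothing for words never seen in the database. The formula, the function names, and the toy counts are all assumptions. Under this choice, scripts made of rare words receive lower scores.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words that start with a letter.
TOKEN_RE = re.compile(r"[a-z][a-z0-9_]*")


def tokenize(script: str) -> list:
    return TOKEN_RE.findall(script.lower())


def score(p_command: str, counts) -> float:
    """Mean log relative frequency of the command's words.

    Assumed formula for illustration only. Add-one smoothing gives
    unseen words a small, non-zero probability.
    """
    words = tokenize(p_command)
    if not words:
        return 0.0
    total = sum(counts.values())
    vocab = len(counts)
    logs = [
        math.log((counts.get(w, 0) + 1) / (total + vocab + 1))
        for w in words
    ]
    return sum(logs) / len(logs)


# Toy word counts, made up for this example.
toy_counts = Counter({"write": 50, "output": 50, "get": 30, "process": 10})

common = score("Write-Output", toy_counts)
rare = score("Invoke-Xyzzy frombase64string", toy_counts)
print(common, rare)  # the rare-word command scores lower
```

Averaging over the words keeps long and short scripts comparable, which matters when scripts in the database vary widely in length.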
Figure 4. Distribution of PowerShell scripts based on their scoring function.
Figure 4 shows that the vast majority of the PowerShell scripts group together based on the scoring function. On the left-hand side, there is a small set of scripts that look different from the rest. Those scripts on the far left are indeed malicious.
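Picking out the scripts on the far left of such a distribution can be sketched with a simple standard-deviation rule. The threshold of k standard deviations below the mean, the function name, and the sample scores are assumptions for illustration, not values from the analysis above.

```python
import statistics


def flag_low_outliers(scores, k=3.0):
    """Return indices of scores more than k standard deviations below the mean."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        return []  # all scores identical, nothing stands out
    threshold = mean - k * std
    return [i for i, s in enumerate(scores) if s < threshold]


# Made-up scores: ten scripts that group together and one far to the left.
sample_scores = [-1.0, -1.1, -0.9, -1.05, -0.95, -1.0,
                 -1.02, -0.98, -1.03, -0.97, -6.0]

print(flag_low_outliers(sample_scores, k=2.0))
```

Because the rule needs only the mean and standard deviation of mostly benign scores, it does not depend on having many labeled malicious examples.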
The technique used here is one of many strategies for malicious PowerShell detection. It works particularly well with imbalanced datasets, where there are few examples of malicious scripts. Other strategies, such as using deep learning for script classification, are also recommended if sufficient examples of ill-intended scripts are available.
In this case, natural language processing was used to count, transform, and decompose each script into a discrete set of features. Those features serve as the input to an algorithm such as Knogin’s scoring function, and the resulting scores form a distribution in which the cases that lie far from the rest, measured in standard deviations, can be detected.