STAMAD

STAtic MAlware Detection

Overview

STAMAD is a static malware detector. It takes as input a set of malware samples (malwares) and a set of benign programs (benwares) and can either (1) extract a malicious API graph representing the malicious behaviors of the malwares in the set, or (2) learn to classify malwares without extracting the malicious behaviors. These are the training phases. Then, given a new program, STAMAD checks whether it is malicious or not.

STAMAD consists of the following modules:

Graph computation

The API Call Graph Computation Module takes a binary as input and produces the corresponding API call graph. It works in three steps.

In the first step, it uses PEfile to check whether the binary is packed. If so, the binary is unpacked by the corresponding unpacker (e.g., UPX) and then fed to the Oracle. Otherwise, it is passed to the Oracle directly.

In the second step, the Oracle takes the binary and outputs the assembly program, the control flow graph of this assembly code, and information about the API functions. The Oracle relies on Jakstab and IDA Pro. Jakstab performs static analysis of the binary and provides the corresponding assembly program and control flow graph. However, it does not extract information about the API functions in the program or about indirect calls to API functions, so IDA Pro is used to obtain this API information from the assembly code.

In the last step, the outputs of the Oracle are passed to a component that implements the algorithm in [1] to compute the API call graph from the control flow graph and the API information of the program.
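As a rough illustration of the last step, the sketch below derives an API call graph from a control flow graph. It assumes one plausible reading: an edge (f, g) is added when a call to g can be reached from a call to f without passing through another API call. The algorithm in [1] may define edges differently, and the data layout (`cfg`, `api_calls`) and the function name are hypothetical.

```python
from collections import deque

def api_call_graph(cfg, api_calls):
    """Build API call graph edges from a control flow graph.

    cfg:       dict mapping a CFG node to the list of its successors.
    api_calls: dict mapping a CFG node to the API function called there.
    Assumption (not taken from [1]): an edge (f, g) is added when a call
    to g is reachable from a call to f with no other API call in between.
    """
    edges = set()
    for start, caller in api_calls.items():
        seen = set()
        queue = deque(cfg.get(start, []))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            if node in api_calls:
                edges.add((caller, api_calls[node]))
                continue  # stop exploring past the next API call
            queue.extend(cfg.get(node, []))
    return edges

# Hypothetical CFG: node 1 branches to nodes 2 and 3, which both reach node 4.
cfg = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
api_calls = {0: "GetModuleFileNameA", 2: "CopyFileA", 4: "ExitProcess"}
graph = api_call_graph(cfg, api_calls)
```

Here the branch through node 3 contains no API call, so GetModuleFileNameA is linked both to CopyFileA and directly to ExitProcess.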

The Extraction of Malicious Behaviors Module takes as input a set of malwares and a set of benwares. After the previous module extracts their API call graphs, these graphs are fed to the Malicious Graph Computation component, which implements the TF-IDF term weighting scheme introduced in [1]. It outputs malicious API graphs representing the malicious behaviors. This phase is called the "training phase".
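To give an intuition for term weighting on graphs, here is a minimal sketch that scores graph terms (nodes or edges) with a textbook TF-IDF formula and keeps the N best. It is not the Rocchio or Ratio equation, nor any of the functions F1 to F4 from [1]. The function name and the representation of a graph as a set of terms are assumptions made for illustration.

```python
import math
from collections import Counter

def top_terms(malware_graphs, benign_graphs, n):
    """Rank terms (nodes or edges) found in malware graphs; return the n best.

    Each graph is a set of terms. The score is a generic TF-IDF variant:
    the fraction of malware graphs containing the term, multiplied by the
    log of (all graphs / graphs containing the term).
    """
    mal_count = Counter(term for g in malware_graphs for term in g)
    ben_count = Counter(term for g in benign_graphs for term in g)
    total = len(malware_graphs) + len(benign_graphs)
    scores = {}
    for term, count in mal_count.items():
        tf = count / len(malware_graphs)
        idf = math.log(total / (count + ben_count[term]))
        scores[term] = tf * idf
    # Sort by descending score, then by name to make ties deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n]

malware = [{"CopyFileA", "RegSetValueExA"}, {"CopyFileA", "ExitProcess"}, {"CopyFileA"}]
benign = [{"ExitProcess"}, {"ExitProcess", "MessageBoxA"}]
selected = top_terms(malware, benign, 2)
```

ExitProcess is dropped because it appears in both benign programs, so it says little about malicious behavior.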

The Malicious Behavior Detection Module takes a binary program as input and applies the first module to compute its API call graph. It then checks whether the program's graph contains any malicious behavior from the malicious API graphs (the output of the Extraction of Malicious Behaviors Module). If so, the output is Malicious!. Otherwise, the output is Benign!.
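The check can be pictured with the following sketch. It assumes, as a simplification, that a program "contains" a malicious behavior when every edge of a malicious API graph also appears in the program's API call graph. The actual matching used by STAMAD may be different, and the function names are hypothetical.

```python
def contains_behavior(program_edges, malicious_edges):
    """True if every malicious edge occurs in the program's graph.

    Assumption: containment means edge-set inclusion.
    """
    return set(malicious_edges) <= set(program_edges)

def classify(program_edges, malicious_graphs):
    """Return the verdict string for a program's API call graph.

    malicious_graphs maps a family name to the edges of its malicious graph.
    """
    for edges in malicious_graphs.values():
        if contains_behavior(program_edges, edges):
            return "Malicious!"
    return "Benign!"

# Hypothetical malicious graph for one family.
malicious_graphs = {
    "Worm": {("GetModuleFileNameA", "CopyFileA")},
}
suspect = {("GetModuleFileNameA", "CopyFileA"), ("CopyFileA", "ExitProcess")}
harmless = {("MessageBoxA", "ExitProcess")}
verdicts = (classify(suspect, malicious_graphs), classify(harmless, malicious_graphs))
```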

The Learning Malicious Behaviors Module implements two phases: the learning phase and the detection phase. In the learning phase, it takes as input a set of malwares and a set of benwares. It first applies the first module to compute their API call graphs. These graphs are then fed to the SVM training component, i.e., LIBSVM, to compute an SVM training model. In the detection phase, it takes a binary as input and applies the first module to compute its API call graph. It then uses the SVM classifier with the training model (the output of the first phase) to classify the program as either Malicious! or Benign!.
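The SVM compares graphs through a random walk kernel, whose walk length and lambda are set with the -T and -L options described under "How to use". The sketch below is a standard truncated random walk kernel, not necessarily the exact kernel of [2]. It assumes each node is identified by its API name, so the walks common to two graphs are those over their shared nodes and edges. The function names are hypothetical.

```python
def graph_nodes(edges):
    """Collect the endpoints of a set of (source, target) edges."""
    return {node for edge in edges for node in edge}

def random_walk_kernel(edges_a, edges_b, max_len, lam):
    """Weighted count of walks shared by two graphs.

    A shared walk of length t contributes lam**t, for t = 0..max_len.
    Assumption: nodes with the same API name are matched, so shared walks
    run over the edges present in both graphs.
    """
    common_edges = set(edges_a) & set(edges_b)
    common_nodes = graph_nodes(edges_a) & graph_nodes(edges_b)
    # walks[n] = number of shared walks of the current length ending at n
    walks = {node: 1 for node in common_nodes}
    total = float(len(common_nodes))  # walks of length 0
    for length in range(1, max_len + 1):
        extended = {node: 0 for node in common_nodes}
        for src, dst in common_edges:
            extended[dst] += walks[src]
        walks = extended
        total += lam ** length * sum(walks.values())
    return total

g1 = {("A", "B"), ("B", "C")}
g2 = {("A", "B"), ("B", "C"), ("C", "D")}
similarity = random_walk_kernel(g1, g2, max_len=2, lam=0.5)
```

With these two graphs there are 3 shared walks of length 0, 2 of length 1 and 1 of length 2, giving 3 + 0.5 * 2 + 0.25 * 1 = 4.25.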

Prerequisites

STAMAD uses the following tools:

Installing

The tool is set up in the folder STAMAD by the following steps:

How to use

Learning Malicious Behaviors

To understand the following, it is important that you first read the paper [2].

Training phase:

Usage: STAMAD.exe SVM Train [options] <ListFiles>
Options:
   -g<n>  specifies the kind of graph to compute
          n=0 denotes Extended API call graph (default),
          n=1 denotes API call graph,
   -T<t>  specifies the length of walk in the Random walk kernel.
   -L<l>  specifies the value of lambda (0< l <1) in the Random walk kernel.

STAMAD takes as input a ListFiles file, which lists a set of malwares and a set of benwares, and produces a training model (trainingModel). An example of ListFiles (trainSamples.txt) is as follows.


@Trojan-Dropper
SVMTrainData\Trojan-Dropper.Win32.ZomJoiner.200.exe.upx
SVMTrainData\Trojan-Dropper.Win32.Delf.cg.exe.upx
SVMTrainData\Trojan-Dropper.Win32.Small.np.exe.upx
@Worm
SVMTrainData\Worm.Win32.Shorm.120.a.exe
SVMTrainData\Worm.Win32.Lioten.exe.upx
SVMTrainData\Worm.Win32.Petik.c.exe.upx
SVMTrainData\Worm.Win32.Antinny.v.exe
SVMTrainData\Worm.Win32.Antinny.k.exe
@NegativeSet
SVMTrainData\411toppm.exe
SVMTrainData\CbLauncher.exe
SVMTrainData\Deadwood.exe
SVMTrainData\ELFDump.exe
SVMTrainData\a2p.exe
SVMTrainData\alternatives.exe
SVMTrainData\animate.exe
SVMTrainData\annoyance-filter.exe

This example of ListFiles contains the Trojan-Dropper family (Trojan-Dropper), the Worm family (Worm), and a set of benign programs (NegativeSet).
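The layout of a ListFiles file can be read with a few lines of code, for instance to validate a list before training. The sketch below infers the format from the example above (a line starting with @ opens a group, and the following lines are file paths). STAMAD's own parser may accept more than this, and the function name is hypothetical.

```python
def parse_list_file(text):
    """Group file paths under the most recent '@Name' header line."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            current = line[1:]
            groups.setdefault(current, [])
        elif current is None:
            raise ValueError(f"path listed before any @ header: {line}")
        else:
            groups[current].append(line)
    return groups

sample = r"""
@Worm
SVMTrainData\Worm.Win32.Shorm.120.a.exe
SVMTrainData\Worm.Win32.Lioten.exe.upx
@NegativeSet
SVMTrainData\a2p.exe
"""
groups = parse_list_file(sample)
```

In this reading, every group other than NegativeSet is a malware family and NegativeSet holds the benwares.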

We will compute the training model (an SVM classifier) from the set of malwares and benwares listed in trainSamples.txt. The following command computes the training model with a random walk of length 5 and lambda 0.5.

STAMAD.exe SVM Train -T5 -L0.5 trainSamples.txt

Test phase:

Using the training model computed in the training phase, STAMAD can check whether a new program is malicious or not.

Usage: STAMAD.exe SVM Test [options] <TestFile> <ListFiles>

We have to choose the options according to the specifications computed in the training phase.

Extraction of Malicious Behavior

To understand the following, it is important that you first read the paper [1].

Training phase:

Usage: STAMAD.exe TFIDF Train [options] <ListFiles>
Options:
   -g<n>  specifies the kind of graph to compute
          n=0 denotes Extended API call graph (default),
          n=1 denotes API call graph,
   -N<n>  specifies the number of nodes/edges in the malicious API graph.
   -F<f>  specifies the type of the function for term weight computation.
          f=0 denotes function F1 (a linear function),
          f=1 denotes function F2 (a rational function),
          f=2 denotes function F3 (a logarithmic function),
          f=3 denotes function F4 (a sigmoid function).
   -W<w>  specifies the Equation for term weight computation.
          w=1 denotes Rocchio Equation,
          w=2 denotes Ratio Equation.
   -S<s>  specifies the strategy (s=1,2,3) to compute graphs.

STAMAD takes as input a ListFiles file, which lists a set of malwares and a set of benwares, and produces the malicious API call graph extracted from these two sets. An example of ListFiles (trainSvm.txt) is as follows.

@Backdoor
Backdoor.Win32.Hupigon.are.exe
Backdoor.Win32.WCRat.11.exe
Backdoor.Win32.Plunix.c.exe.upx
Backdoor.Win32.Sneaker.exe.upx
Backdoor.Win32.DSNX.plugin.PortScan.exe.upx
@Worm
Worm.Win32.Shorm.120.a.exe
Worm.Win32.Lioten.exe.upx
Worm.Win32.Petik.c.exe.upx
@NegativeSet
aafire.exe
aaflip.exe
aainfo.exe
aasavefont.exe
aatest.exe

This example of ListFiles contains the Backdoor family (Backdoor), the Worm family (Worm), and a set of benign programs (NegativeSet).

The malicious API call graph can be computed in different ways, depending on the parameters S (strategy), F (frequency function), W (equation), and N (the number of nodes/edges). The output of the training phase is stored in the folder specification.

We will extract the malicious behaviors from the set of malwares and benwares listed in trainSvm.txt. With the following command, the malicious graph is extracted using strategy S1 (-S1), the Ratio equation (-W2), and term weight function option -F3, with N=55.

STAMAD.exe TFIDF Train -N55 -S1 -W2 -F3 trainSvm.txt

Training Output:

n55_S1_w2_f3_@Backdoor_Edge.txt
n55_S1_w2_f3_@Backdoor_Node.txt
n55_S1_w2_f3_@Worm_Edge.txt
n55_S1_w2_f3_@Worm_Node.txt

We get the malicious graphs extracted from the Backdoor family and the Worm family declared in the training set (trainSvm.txt). The malicious graph for the Backdoor family is specified by two files: n55_S1_w2_f3_@Backdoor_Edge.txt (contains edges) and n55_S1_w2_f3_@Backdoor_Node.txt (contains nodes). Likewise, n55_S1_w2_f3_@Worm_Edge.txt (contains edges) and n55_S1_w2_f3_@Worm_Node.txt (contains nodes) characterize the malicious behaviors of the Worm family.
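When scripting around the training phase, it can help to know which files to expect. The helper below reproduces the naming pattern seen in the output listed above. The pattern is inferred from that single example, so treat it as an assumption, and the function name is hypothetical.

```python
def expected_outputs(n, strategy, equation, function, families):
    """List the Edge/Node file names expected for each malware family.

    Pattern inferred from the example output:
    n<N>_S<S>_w<W>_f<F>_@<Family>_<Edge|Node>.txt
    """
    prefix = f"n{n}_S{strategy}_w{equation}_f{function}"
    return [
        f"{prefix}_@{family}_{kind}.txt"
        for family in families
        for kind in ("Edge", "Node")
    ]

# Options of the example command: -N55 -S1 -W2 -F3
names = expected_outputs(55, 1, 2, 3, ["Backdoor", "Worm"])
```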

Test phase:

Using the malicious API call graph extracted from the training phase, STAMAD can check whether a new program is malicious or not.

Usage: STAMAD.exe TFIDF Test [options] <TestFile> <ListFiles>

We have to choose the options according to the specifications computed in the training phase.

References

  1. Khanh Huu The Dam and Tayssir Touili. Automatic Extraction of Malicious Behaviors. In Proceedings of the 11th International Conference on Malicious and Unwanted Software, 2016.
  2. Khanh Huu The Dam and Tayssir Touili. Malware Detection Based on Graph Classification. In Proceedings of the 3rd International Conference on Information Systems Security and Privacy, 2017.