MADLIRA

Malware Detection using Learning and Information Retrieval for Android

Overview

MADLIRA is a static malware detector for Android applications. It takes as input a set of Android malware and a set of Android benwares (benign applications) and can either (1) extract a malicious API graph representing the malicious behaviors of the malware in the set, or (2) learn to classify Android malware without extracting the malicious behaviors. These are the training phases. Then, given a new Android application, MADLIRA checks whether it is malicious or not.

Installing

Download the file MADLIRA.7z, which contains the executable files, and decompress it.
Download the file MADLIRA_Data.7z, which contains the training data, and decompress it.

Installed Data:

MADLIRA.jar is the main application.
noAPI.txt declares the API prefixes.
family.txt lists the malware samples by family.
The folder TrainData contains the training configuration and the training model.
The folder Samples contains sample data.
The folder TempData contains data for the kernel computation.

Functionality

This tool has two main components: the TFIDF component, which extracts the malicious behaviors and uses them to check whether a new application is malicious or not (see [1] for details), and the SVM component, which applies support vector machines based on a random walk graph kernel to distinguish malware from benign programs (see [2] for details).

TFIDF component

The Extraction of Malicious Behaviors module takes as input a set of malware and a set of benwares. It first applies the Graph Computation component to extract their API call graphs, then feeds these graphs to the Malicious Graph Computation component, which implements the TFIDF term weighting scheme introduced in [1] to compute the malicious behaviors. The output is a set of malicious API graphs representing these malicious behaviors.
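To make the weighting idea concrete, here is a minimal Python sketch of a generic TF-IDF score over API call edges. It assumes that each API call graph is reduced to a set of (caller API, callee API) edges, that the term frequency of an edge is the fraction of one malware family's graphs containing it, and that the inverse document frequency is computed over all training graphs, benign and malicious. This is textbook TF-IDF, not necessarily the exact formula of [1]; the function names, the threshold, and the API names in the example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation: an API call graph as a set of (caller, callee) edges.
Edge = Tuple[str, str]
Graph = Set[Edge]


def tfidf_edge_weights(family_graphs: List[Graph], benign_graphs: List[Graph]) -> Dict[Edge, float]:
    """Score each edge seen in one malware family (generic TF-IDF, assumed scheme)."""
    corpus = family_graphs + benign_graphs
    # Document frequency: number of training graphs (malicious or benign) containing the edge.
    df = Counter(edge for graph in corpus for edge in graph)
    # Term frequency: number of graphs of this family containing the edge.
    tf = Counter(edge for graph in family_graphs for edge in graph)
    return {
        edge: (count / len(family_graphs)) * math.log(len(corpus) / df[edge])
        for edge, count in tf.items()
    }


def malicious_graph(family_graphs: List[Graph], benign_graphs: List[Graph], threshold: float) -> Graph:
    """Keep the edges whose weight reaches the (hypothetical) threshold."""
    weights = tfidf_edge_weights(family_graphs, benign_graphs)
    return {edge for edge, weight in weights.items() if weight >= threshold}


# Toy data with made-up API names.
LEAK = ("TelephonyManager.getDeviceId", "SmsManager.sendTextMessage")
UI = ("Activity.onCreate", "Activity.setContentView")
family = [{LEAK, UI}, {LEAK, UI}]
benign = [{UI}, {UI, ("View.setOnClickListener", "Toast.makeText")}]

print(malicious_graph(family, benign, threshold=0.1))
```

The UI edge occurs in every training graph, so its IDF is log(1) = 0 and it drops out; only the edge specific to the malware family is kept.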

The Malicious Behavior Detection module takes as input a binary program. It first applies the Graph Computation component to compute the program's API call graph. It then checks whether this graph contains any of the malicious behaviors described by the malicious API graphs (the output of the Extraction of Malicious Behaviors module). If the program contains a malicious behavior, the output is Malicious!; otherwise, the output is Benign!.
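One simple way to picture the detection step is as a subgraph test. The sketch below assumes, hypothetically, that a behavior is present when every edge of a malicious API graph also occurs in the application's API call graph; the matching actually performed by MADLIRA may be more general, and all names here are illustrative.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (caller API, callee API), an assumed representation
Graph = Set[Edge]


def contains_behavior(app_graph: Graph, behavior: Graph) -> bool:
    """True if every edge of the malicious behavior occurs in the app's graph."""
    return behavior <= app_graph


def detect(app_graph: Graph, malicious_graphs: Iterable[Graph]) -> str:
    # Empty behaviors are skipped: the empty set is a subset of every graph.
    if any(behavior and contains_behavior(app_graph, behavior) for behavior in malicious_graphs):
        return "Malicious!"
    return "Benign!"


leak = {("TelephonyManager.getDeviceId", "SmsManager.sendTextMessage")}
app = {("Activity.onCreate", "Activity.setContentView"),
       ("TelephonyManager.getDeviceId", "SmsManager.sendTextMessage")}
print(detect(app, [leak]))                                                # Malicious!
print(detect({("Activity.onCreate", "Activity.setContentView")}, [leak]))  # Benign!
```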

Command: MADLIRA TFIDF

For this component, there are two functions: the training function (malicious behavior extraction) and the test function (malicious behavior detection).

Malicious behavior extraction

Collect benign applications and malicious applications and put them in folders named benignApkFolder and maliciousApkFolder, respectively.
Pack the training data into two files named benignPack and maliciousPack with the command:
```
MADLIRA TFIDF packAPK -PB benignApkFolder -B benignPack -PM maliciousApkFolder -M maliciousPack
```
Extract the malicious behaviors from the two packed files (benignPack and maliciousPack) with the command:
```
MADLIRA TFIDF train -B benignPack -M maliciousPack
```

Malicious behavior detection

Collect new applications and put them in a folder named checkApk.
Detect malicious behaviors in the applications of the folder checkApk with the command:
```
MADLIRA TFIDF check -S checkApk
```

Command:

MADLIRA TFIDF train <Options>
        Compute the malicious specifications from the given training data.
                -B <filename>: the archive file that contains all graphs of the training benwares.
                -M <filename>: the archive file that contains all categories of the training malware.

MADLIRA TFIDF check <Options>
        Check the applications in a folder for malicious behaviors.
                -S <folder>: the folder that contains all applications (APK files).

MADLIRA TFIDF test <Options>
        Test the classifier on the given test data.
                -S <folder>: the folder that contains all graphs for testing.

MADLIRA TFIDF clear
        Clean all training data.

MADLIRA TFIDF install <Options>
        Clean the old training data and install new training data.
                -B <filename>: the archive file that contains all graphs of the training benwares.
                -M <filename>: the archive file that contains all categories of the training malware.

Examples:

Training new data:

First, collect the training applications (APK files) and store them in folders named MalApkFolder and BenApkFolder.
Pack the training applications into archive files named MalPack and BenPack with this command:
```
MADLIRA TFIDF packAPK -PB BenApkFolder -B BenPack -PM MalApkFolder -M MalPack
```
Clean old training data:
```
MADLIRA TFIDF clear
```
Compute the malicious graphs from the training packs (BenPack and MalPack):
```
MADLIRA TFIDF train -B BenPack -M MalPack
```

Checking new applications:

Put these applications in a folder named checkApk and use this command:
```
MADLIRA TFIDF check -S checkApk
```
Output:
MADLIRA labels the programs detected as malware Malicious! and the benign ones Benign!.
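The training and checking steps above are plain command lines, so they are easy to script. The following Python sketch only assembles the documented commands in order (pack, clear, train, check) and runs them with subprocess. It assumes the launcher is invoked as `MADLIRA`, as written in this README; on a given installation it may instead be something like `java -jar MADLIRA.jar`, which is why the launcher is a parameter. The helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def training_and_check_commands(component: str, ben_dir: str, ben_pack: str,
                                mal_dir: str, mal_pack: str, check_dir: str,
                                launcher: Sequence[str] = ("MADLIRA",)) -> List[List[str]]:
    """Build the pack, clear, train and check commands for TFIDF or SVM."""
    base = list(launcher) + [component]
    return [
        base + ["packAPK", "-PB", ben_dir, "-B", ben_pack, "-PM", mal_dir, "-M", mal_pack],
        base + ["clear"],
        base + ["train", "-B", ben_pack, "-M", mal_pack],
        base + ["check", "-S", check_dir],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # shlex.join needs Python 3.8+
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


cmds = training_and_check_commands("TFIDF", "BenApkFolder", "BenPack",
                                   "MalApkFolder", "MalPack", "checkApk")
run_all(cmds)  # dry run: prints the four commands without executing them
```

Passing "SVM" as the component yields the corresponding MADLIRA SVM commands, since both components accept packAPK, clear, train -B/-M and check -S.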

SVM component

The Learning Malicious Behaviors module implements two phases: a learning phase and a detection phase. In the learning phase, it takes as input a set of malware and a set of benwares. It first applies the Graph Computation component to compute their API call graphs. These API call graphs are then fed to the SVM training component, i.e., LIBSVM, to compute an SVM training model. In the detection phase, it takes as input the binary code of an application and applies the Graph Computation component to compute its API call graph. It then uses the SVM classifier with the training model (the output of the learning phase) to classify the program as either Malicious! or Benign!.
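For intuition about the detection phase, the sketch below evaluates the standard kernel SVM decision function f(x) = sum_i alpha_i y_i K(x_i, x) + b for a new graph, given support graphs, coefficients and a bias taken from a trained model. The coefficients and bias in the example are invented for illustration, the convention that a positive score means Malicious! is an assumption, and the edge-overlap kernel is a simple stand-in; MADLIRA itself uses a random walk graph kernel [2].

```python
from typing import Callable, FrozenSet, Sequence, Tuple

Graph = FrozenSet[Tuple[str, str]]  # assumed: a graph as a set of API call edges
Kernel = Callable[[Graph, Graph], float]


def edge_overlap_kernel(g1: Graph, g2: Graph) -> float:
    # Number of shared edges = dot product of edge-indicator vectors, hence a valid kernel.
    return float(len(g1 & g2))


def svm_decision(x: Graph, support: Sequence[Graph], coefs: Sequence[float],
                 bias: float, kernel: Kernel) -> float:
    """f(x) = sum_i coef_i * K(sv_i, x) + b, where coef_i = alpha_i * y_i."""
    return sum(c * kernel(sv, x) for sv, c in zip(support, coefs)) + bias


def classify(x: Graph, support: Sequence[Graph], coefs: Sequence[float],
             bias: float, kernel: Kernel = edge_overlap_kernel) -> str:
    # Assumed label convention: +1 = malware, -1 = benign.
    return "Malicious!" if svm_decision(x, support, coefs, bias, kernel) > 0 else "Benign!"


LEAK = ("TelephonyManager.getDeviceId", "SmsManager.sendTextMessage")
UI = ("Activity.onCreate", "Activity.setContentView")
support = [frozenset({LEAK, UI}), frozenset({UI})]  # one malicious, one benign support graph
coefs = [1.0, -1.0]                                 # made-up alpha_i * y_i values
print(classify(frozenset({LEAK}), support, coefs, bias=-0.5))  # Malicious!
print(classify(frozenset({UI}), support, coefs, bias=-0.5))    # Benign!
```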

Command: MADLIRA SVM

For this component, there are two functions: the training function and the test function.

Training phase

Collect benign applications in a folder named benignApkFolder and malicious applications in a folder named maliciousApkFolder.

Prepare the training data with the command:

```
MADLIRA SVM packAPK -PB benignApkFolder -B benignPack -PM maliciousApkFolder -M maliciousPack
```

Learn the malicious behaviors as an SVM training model with the following command:
```
MADLIRA SVM train -B benignPack -M maliciousPack
```

Malicious behavior detection

Collect new applications and put them in a folder named checkApk.

Detect malicious behaviors in the applications of the folder checkApk with the command:

```
MADLIRA SVM check -S checkApk
```

Command:

MADLIRA SVM train <Options>
        Compute the classifier from the given training data.
                -T <t>: maximum length of the common walks (default value = 3).
                -l <lambda>: lambda value controlling the importance of the length of walks (default value = 0.4).
                -B <filename>: the archive file that contains all graphs of the training benwares.
                -M <filename>: the archive file that contains all graphs of the training malware.

MADLIRA SVM check <Options>
        Check the applications in a folder for malicious behaviors.
                -S <folder>: the folder that contains all applications (APK files).

MADLIRA SVM test <Options>
        Test the classifier on the given test data.
                -S <folder>: the folder that contains all graphs for testing.

MADLIRA SVM clear
        Clean all training data.
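The -T and -l options of MADLIRA SVM train presumably correspond to the two parameters of a truncated random walk graph kernel: common walks are counted up to length T, and a walk of length t is weighted by lambda^t. As an illustration only, the following Python sketch computes such a kernel on two node-labeled API graphs through their direct product graph, using the documented defaults T = 3 and lambda = 0.4. The graph encoding (API-name labels plus index edges) and the function name are assumptions, and the exact kernel of [2] may differ in details such as normalization.

```python
from itertools import product
from typing import List, Sequence, Tuple

Edges = Sequence[Tuple[int, int]]  # directed edges between node indices


def random_walk_kernel(labels1: Sequence[str], edges1: Edges,
                       labels2: Sequence[str], edges2: Edges,
                       T: int = 3, lam: float = 0.4) -> float:
    """Truncated random walk kernel: sum over t = 0..T of lam**t * (#common walks of length t)."""
    # Vertices of the direct product graph: node pairs with the same API label.
    vertices = [(u, v) for u, v in product(range(len(labels1)), range(len(labels2)))
                if labels1[u] == labels2[v]]
    index = {pair: k for k, pair in enumerate(vertices)}

    # Product edge (u, v) -> (u2, v2) iff u -> u2 in graph 1 and v -> v2 in graph 2.
    successors: List[List[int]] = [[] for _ in vertices]
    for (u, u2), (v, v2) in product(edges1, edges2):
        if (u, v) in index and (u2, v2) in index:
            successors[index[(u, v)]].append(index[(u2, v2)])

    # walks[k] = number of common walks of the current length ending at product vertex k.
    walks = [1.0] * len(vertices)
    total = sum(walks)  # length-0 walks: one per product vertex
    for t in range(1, T + 1):
        nxt = [0.0] * len(vertices)
        for k, count in enumerate(walks):
            for k2 in successors[k]:
                nxt[k2] += count
        walks = nxt
        total += lam ** t * sum(walks)
    return total


labels = ["Activity.onCreate", "TelephonyManager.getDeviceId", "SmsManager.sendTextMessage"]
edges = [(0, 1), (1, 2)]
print(round(random_walk_kernel(labels, edges, labels, edges), 2))  # 3.96
```

Here the graph compared with itself has 3 product vertices, 2 walks of length 1 and 1 walk of length 2, giving 3 + 0.4 * 2 + 0.16 * 1 = 3.96. Larger T rewards longer shared API call sequences, and a smaller lambda damps their contribution.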

Packages

This tool uses the following packages:

References