Sunday, February 24, 2013

Malware Analysis 101 [Part 3]

Finally, here's the concluding portion of Malware Analysis 101. Both previous posts can be found here [Part 1] and here [Part 2].


Hashing:

A fundamental problem of communication (and language in general) is naming an object in order to identify it. Individual samples are identified via a mechanism known as cryptographic hashing. The basic definition of hashing is: "an algorithm that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that any (accidental or intentional) change to the data will (with very high probability) change the hash value. The data to be encoded are often called the "message," and the hash value is sometimes called the message digest or simply digest."[source] Typically, one of two algorithms (sometimes both) is used: MD5 (Message-Digest Algorithm 5) or SHA-1 (Secure Hash Algorithm 1). Due to known weaknesses in MD5 (practical collision attacks), many organizations have switched to SHA-1. That said, MD5 is still very common and is likely to remain so for quite some time. Generating a hash for any file is relatively simple, and there are numerous methods available across various operating systems. Some of these methods are:

1) *Nix CLI tools "md5sum" and "sha1sum": simply provide a filename and the tool outputs the hash for its algorithm (MD5 or SHA-1, respectively).

2) Windows CLI tool "md5deep" (includes functionality for SHA-1 and others). It works much like the *Nix tools. Available here.
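For scripted triage, the same digests can be computed with Python's standard-library hashlib module. This is a minimal sketch; the function name hash_file and the 64 KiB chunk size are arbitrary choices for illustration, not part of any tool mentioned above.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 65536) -> dict:
    """Return the MD5 and SHA-1 digests of a file as lowercase hex strings."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so a large sample never has to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}
```

Calling hash_file("sample.exe") (a placeholder filename) returns a dict with "md5" and "sha1" keys, which should match what md5sum and sha1sum print for the same file.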

Reputational Analysis:

When researching a sample, one of the simplest paths to an answer is to search for it on the internet. It is entirely possible that you are not the first person to question the integrity of "butts.jpg.exe". The fundamental follow-up to "I don't know" should generally be "I'll Google it." Use the information you have already gleaned from initial static analysis to construct intelligent queries (e.g., a cryptographic hash is likely to give better results than the file name of a sample alone). Google the hash, and Google the source if you have that information. The remainder of the tools on this list concern the reputation of the source itself, which is generally a very good indicator of the nature of the sample. As always, the "rule of thumb" is to use more than one tool to gather information. The exception is "factual information" such as associated IPs and Whois records, as these aren't likely to change between reporting sites.
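Because a hash is the most precise query, it is also what most reputation services accept. The sketch below checks that a value really is a hex digest (so a file name like "butts.jpg.exe" is rejected), then builds, but does not send, a Google search URL and a request against VirusTotal's v3 file-report endpoint. The function names are hypothetical, the API key is a placeholder you would replace with your own, and it is worth checking VirusTotal's current API documentation before relying on the endpoint.

```python
import re
import urllib.parse
import urllib.request

# Hex digest lengths for MD5, SHA-1, and (for completeness) SHA-256.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def identify_hash(value: str) -> str:
    """Return the hash type implied by the digest's length, or raise ValueError."""
    value = value.strip().lower()
    if not re.fullmatch(r"[0-9a-f]+", value) or len(value) not in HASH_LENGTHS:
        raise ValueError(f"not an MD5/SHA-1/SHA-256 hex digest: {value!r}")
    return HASH_LENGTHS[len(value)]


def build_lookup(value: str, api_key: str) -> dict:
    """Build a Google search URL and a VirusTotal v3 request for one hash."""
    identify_hash(value)  # reject file names and other non-hash input early
    digest = value.strip().lower()
    google = "https://www.google.com/search?q=" + urllib.parse.quote(digest)
    vt = urllib.request.Request(
        "https://www.virustotal.com/api/v3/files/" + digest,
        headers={"x-apikey": api_key},  # placeholder key; supply your own
    )
    return {"google": google, "virustotal": vt}
```

Passing the returned Request to urllib.request.urlopen would actually send it; keeping construction separate from sending makes it easy to log or review queries before they leave your analysis machine.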

Sites Listed on the Slide:

Strings:

A string is a sequence of printable characters, three or more in a row, such as "the". Searching the strings of a program is a simple but powerful way to examine its functionality. Oftentimes function imports and API calls, as well as hardcoded URLs and other useful information, turn up with a simple search! On Linux this functionality is provided by the command (you guessed it) "strings". For Windows-based operating systems, the free program "Strings" is available and provides nearly identical functionality to its Linux counterpart: both tools can search for ASCII and Unicode formatted strings within an executable (GNU strings needs its "-e l" option for the 16-bit little-endian Unicode used by Windows). The primary difference between the ASCII and Unicode formats is how characters and NULL terminators are stored. ASCII uses one byte per character, so a series of letters such as "B A D" is capped off by a single NULL byte that marks the end of the string. Windows Unicode (UTF-16) uses two bytes per character; for ordinary Latin letters the second byte is NULL, so each character is followed by a NULL, and the entire string is capped off by two NULL bytes.
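To make the ASCII/Unicode distinction concrete, here is a rough Python approximation of what strings-style tools do. It is a sketch, not a reimplementation of either tool: it assumes "Unicode" means UTF-16LE (as on Windows), only considers characters in the printable ASCII range, and uses the three-character minimum described above.

```python
import re

MIN_LEN = 3  # the "3 or more characters" definition used above

# Printable ASCII: one byte per character.
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# UTF-16LE: each printable character is followed by a NULL byte.
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def extract_strings(data: bytes) -> list:
    """Return sorted (offset, kind, text) tuples for ASCII and UTF-16LE strings."""
    found = [(m.start(), "ascii", m.group().decode("ascii"))
             for m in ASCII_RE.finditer(data)]
    found += [(m.start(), "unicode", m.group().decode("utf-16-le"))
              for m in UTF16_RE.finditer(data)]
    return sorted(found)
```

Note how the UTF-16 pattern needs its own regex: to an ASCII-only scan, "LoadLibrary" stored as UTF-16 looks like single letters separated by NULLs, which is why a tool that only looks for ASCII misses it.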
Searching an executable for strings ignores context, so a function name or interesting value may turn out to be irrelevant. Additionally, a sequence of 3 or more printable bytes isn't always a string: sometimes the bytes that strings mistakes for characters are actually a memory address. Fortunately, these erroneous strings are usually easy to identify, since they rarely form meaningful words.
The most interesting bits of data, at least for our purposes, are the Windows function names. They are all similarly formatted: they begin with a capital letter, contain no spaces, and each subsequent word starts with a capital letter.
They are very easy to identify and extremely well documented on MSDN (Google Search Galore!).
Common examples include:
LoadLibrary - Loads a DLL into a process; the DLL may NOT have been loaded when the program started. Common in nearly every Win32 program.
GetProcAddress - Retrieves the address of a function in a DLL loaded in memory. This can be used to locate and modify code in a module or to find a location to inject code.
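The naming convention above is regular enough to filter on. The following is a hypothetical heuristic, not an official rule: it keeps strings made of two or more capitalized words with no spaces, optionally ending in A or W (the ANSI and wide-character variants, e.g. LoadLibraryA). It will miss names with runs of capitals such as GetDC, so treat it as a first pass over strings output.

```python
import re

# Two or more capitalized "words", no spaces, optional trailing A/W suffix.
API_NAME_RE = re.compile(r"(?:[A-Z][a-z0-9]+){2,}[AW]?")


def likely_api_names(strings):
    """Filter extracted strings down to ones shaped like Windows API names."""
    return [s for s in strings if API_NAME_RE.fullmatch(s)]
```

For example, from ["LoadLibraryA", "GetProcAddress", "hello world", "KERNEL32.dll"] it keeps only the first two, leaving them ready to be looked up on MSDN.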

Packed vs. Unpacked (at a glance):

Packing is a technique used by malware authors specifically to make analysis more difficult (although it also hinders detection somewhat). Packing a program is considered a subset of obfuscation, and in general it is a major hindrance to static analysis. Fortunately, the fact that a file is packed is itself a good reason to be suspicious. Legitimate executables generally include a large number of strings; if a program is packed, the number of strings will be fairly low. That is what you'd expect, since you are only really seeing the strings of the small wrapper (unpacking) code. Packed executables will need more advanced techniques to be fully analyzed.
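The low-string-count observation can be turned into a crude triage check. This is a sketch under clear assumptions: it counts only ASCII strings of three or more characters, and the default threshold of one string per KiB is an arbitrary placeholder rather than a calibrated value. A low count is a reason to look closer, not proof of packing.

```python
import re

# Printable ASCII runs of 3+ bytes, matching the definition of a string above.
STRING_RE = re.compile(rb"[\x20-\x7e]{3,}")


def strings_per_kib(data: bytes) -> float:
    """Number of ASCII strings found per KiB of file data."""
    if not data:
        return 0.0
    return len(STRING_RE.findall(data)) / (len(data) / 1024)


def looks_packed(data: bytes, threshold: float = 1.0) -> bool:
    """Flag files whose string density falls below a hypothetical threshold."""
    return strings_per_kib(data) < threshold
```

Normalizing by file size matters here: a large legitimate program and a small packed one can have similar absolute string counts, but very different densities.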

An in-depth discussion of Packers can be found here (warning, PDF).
For an example of a packer, albeit one used by legitimate authors as well, here is UPX (link goes to SourceForge).

PE Headers:

Slide image: output of hexdump -C <file> | head

For static analysis, the PE file headers are a treasure trove of information. Besides imports and exports, the headers contain a time and date stamp (set at compile time, but easily faked), file metadata, and the various sections:

  1. .text - Instructions executed by the CPU (the program's code)
  2. .rdata - Import and export information
  3. .data - Global data (accessible from any part of the program)
  4. .rsrc - Resources used by the executable that are considered separate from the executable's code. Icons, images, menus, and strings are commonly located here.

This isn't an all-inclusive list, just a snapshot of the interesting sections. These headers and sections provide the majority of the information displayed by tools like Dependency Walker and PEview. This information is of the utmost importance when trying to assemble an overall contextual view of, and opinion about, the file being analyzed.
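hexdump shows the raw bytes; the sketch below decodes a few of them. It reads the e_lfanew field at offset 0x3C of the DOS header, checks the "PE\0\0" signature, unpacks the 20-byte COFF file header (machine type, section count, TimeDateStamp), and walks the 40-byte section table entries. It assumes a well-formed file and does minimal validation; the function name is hypothetical, and for real work a maintained parser such as the pefile library is the safer choice.

```python
import struct
from datetime import datetime, timezone


def parse_pe_header(data: bytes) -> dict:
    """Decode the COFF file header and section names from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # offset of PE header
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4
    (machine, num_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, coff)
    # The section table follows the COFF header and the optional header.
    table = coff + 20 + opt_size
    sections = []
    for i in range(num_sections):
        fields = struct.unpack_from("<8sIIIIIIHHI", data, table + 40 * i)
        name = fields[0].rstrip(b"\x00").decode("ascii", errors="replace")
        sections.append({"name": name, "virtual_size": fields[1],
                         "raw_size": fields[3]})
    return {
        "machine": hex(machine),
        # TimeDateStamp is seconds since 1970-01-01 UTC (and easily faked).
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "characteristics": hex(characteristics),
        "sections": sections,
    }
```

For a typical executable read from disk (for example with open("sample.exe", "rb").read(), where "sample.exe" is a placeholder), the section list should include names such as .text and .rsrc.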

Linked Libraries and Functions:

Imported functions are the key to basic static analysis. They are the backbone on which analysts can make informed decisions about an executable. An import is a function used by one program that is actually stored in another program (for example, a program that calls CreateFileA is using a function that actually lives in Kernel32.dll); this connection is called linking. There are three primary ways in which libraries can be linked:

  1. Static - The library code is copied into the executable itself.
  2. Runtime - Commonly used by packed/obfuscated malware; functions are connected "ad hoc" only when needed, through the use of popular functions like LoadLibrary.
  3. Dynamic - Libraries are loaded when the program is loaded for execution. (Most common)
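Runtime linking leaves a recognizable footprint in the import table. The sketch below is a hypothetical triage helper, assuming you already have the imports as a {dll_name: [function, ...]} mapping (for instance, transcribed from Dependency Walker). The "fewer than 10 functions" cutoff is an arbitrary placeholder, and the A/W and Ex variants listed are the real Win32 names of LoadLibrary's ANSI and wide-character forms.

```python
# Imports associated with runtime linking.
RUNTIME_LINKING = {"LoadLibraryA", "LoadLibraryW", "LoadLibraryExA",
                   "LoadLibraryExW", "GetProcAddress"}


def linking_hints(imports: dict, few: int = 10) -> list:
    """Return human-readable hints from an {dll_name: [function, ...]} map."""
    functions = {f for funcs in imports.values() for f in funcs}
    hints = []
    if functions & RUNTIME_LINKING:
        hints.append("resolves functions at runtime (LoadLibrary/GetProcAddress)")
        # Few visible imports plus runtime linking: the real imports may be
        # hidden behind a packer's unpacking stub.
        if len(functions) < few:
            hints.append("tiny import table plus runtime linking: possibly packed")
    return hints
```

Since nearly every Win32 program imports LoadLibrary, the first hint alone means little; it is the combination with a tiny import table that is worth a closer look.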

Identifying the libraries used is useful for hypothesizing what a program is meant to do. For example, imports from "Shell32.dll" can indicate that a file can launch other programs; while not malicious on its own, this does give us reason to be suspicious. Tools like the previously referenced Dependency Walker are extremely useful in exploring the capabilities of an executable.

Below is a list of common DLLs and their functionality. This list was constructed by the authors of "Practical Malware Analysis" (Sikorski & Honig, 2012, Page 17).
This is a very common DLL that contains core functionality, such as access and manipulation of memory, files, and hardware.

This DLL provides access to advanced core Windows components such as the Service Manager and Registry.

This DLL contains all the user-interface components, such as buttons, scroll bars, and components for controlling and responding to user actions.

This DLL contains functions for displaying and manipulating graphics.

not import this file directly, although it is always imported indirectly by Kernel32.dll. If an executable imports this file, it means that the author intended to use functionality not normally available to Windows programs. Some tasks, such as hiding functionality or manipulating processes, will use this interface.

WSock32.dll and
These are networking DLLs. A program that accesses either of these most likely connects to a network or performs network-related tasks.

This DLL contains higher-level networking functions that implement protocols such as FTP, HTTP, and NTP.
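Matching a sample's imported DLLs against notes like these is easy to automate. This hypothetical helper includes only DLLs named in this post, with capability notes paraphrased from the descriptions above; you would extend the table yourself. DLL names are compared case-insensitively, since import tables often list them in upper case.

```python
# Capability notes paraphrased from this post; extend as needed.
DLL_HINTS = {
    "kernel32.dll": "core functionality: memory, files, and hardware",
    "wsock32.dll": "networking: likely connects to a network",
    "shell32.dll": "can launch other programs",
}


def capability_hints(dll_names):
    """Map imported DLL names (any case) to capability notes, skipping unknowns."""
    return {name: DLL_HINTS[name.lower()] for name in dll_names
            if name.lower() in DLL_HINTS}
```

Unknown DLLs are skipped rather than flagged, so an empty result means "nothing in the table", not "benign".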


Here is a dump of all the references I have not mentioned or linked to specifically (no particular order):

  • Practical Malware Analysis – Michael Sikorski and Andrew Honig (available from No Starch Press)
  • Kahu Security -
  • Malware Analyst’s Cookbook – Michael Ligh, Steven Adair, Blake Hartstein, Matthew Richard
  • – Lenny Zeltser’s R(everse) E(ngineering) M(alware) distro
Hopefully they help you as much as they helped me.

I hope you enjoyed this series. Check back for more stuff soon!
