
Statistical Tools for Linking Engine-Generated Malware to Its Engine

2009·3.9 MB·English

Digitized by the Internet Archive in 2012 with funding from LYRASIS Members and Sloan Foundation
http://archive.org/details/statisticaltoolsOOmilg

Columbus State University
The College of Business and Computer Science
The Graduate Program in Applied Computer Science

Statistical Tools for Linking Engine-generated Malware to its Engine

A Thesis in Applied Computer Science
by Edna Chelangat Milgo

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

December 2009

© 2009 by Edna Chelangat Milgo

I have submitted this thesis in partial fulfillment of the requirements for the degree of Master of Science.
Edna Chelangat Milgo

We approve the thesis of Edna Chelangat Milgo as presented here.
Mohamed R. Chouchane, Assistant Professor of Computer Science, Thesis Advisor
Edward L. Bosworth, Associate Professor of Computer Science
Jianhua Yang, Associate Professor of Computer Science
Lei Li, Associate Professor of Computer Information Systems Management

ABSTRACT

Malware-generating engines challenge typical malware analysts by requiring them to quickly extract, and upload to their customers' machines, a signature for each of a possibly vast number of never-before-seen malware instances that an engine can generate in a short amount of time. In this thesis we propose and evaluate two methods for linking variants of engine-generated malware to their engine. The proposed methods use the n-gram frequency vector (NFV) of the opcode mnemonics of an engine-generated malware instance as a feature vector for the instance. An NFV is a tuple that maps n-grams to their frequencies. The information contained in the NFV of an engine-generated malware instance is then used to attribute the instance to its engine. The first method implements a Bayesian-like classifier that uses the 1-gram frequency vectors of programs as feature vectors.
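The abstract does not spell out the "Bayesian-like" classifier, so the following Python sketch is only one plausible reading and should not be taken as the thesis's method. It builds an NFV from a list of opcode mnemonics. It assumes frequencies are normalized to sum to 1, so that programs of different lengths are comparable. It then scores a program against each class with a naive-Bayes-style, frequency-weighted log-likelihood using Laplace smoothing. The names `nfv`, `OneGramBayes`, and `alpha`, and the toy training data, are hypothetical. Disassembling a binary into mnemonics is assumed to happen elsewhere.

```python
from collections import Counter
from math import log
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def nfv(opcodes: Sequence[str], n: int) -> Dict[NGram, float]:
    """n-gram frequency vector: each n-gram of opcode mnemonics -> relative frequency."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    if not grams:
        return {}
    return {g: c / len(grams) for g, c in Counter(grams).items()}


class OneGramBayes:
    """Hypothetical naive-Bayes-style scorer over 1-gram opcode frequencies."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant (assumed)
        self.log_probs: Dict[str, Dict[str, float]] = {}
        self.unseen: Dict[str, float] = {}

    def fit(self, programs: Dict[str, List[List[str]]]) -> None:
        """programs maps a class label (e.g. 'benign' or an engine name) to opcode lists."""
        vocab = {op for progs in programs.values() for p in progs for op in p}
        for label, progs in programs.items():
            counts = Counter(op for p in progs for op in p)
            # One extra smoothing slot reserves probability mass for unseen opcodes.
            denom = sum(counts.values()) + self.alpha * (len(vocab) + 1)
            self.log_probs[label] = {op: log((counts[op] + self.alpha) / denom) for op in vocab}
            self.unseen[label] = log(self.alpha / denom)

    def score(self, opcodes: Sequence[str], label: str) -> float:
        """Frequency-weighted log-likelihood: sum over opcodes of f(op) * log P(op | label)."""
        lp, fallback = self.log_probs[label], self.unseen[label]
        return sum(f * lp.get(g[0], fallback) for g, f in nfv(opcodes, 1).items())

    def classify(self, opcodes: Sequence[str]) -> str:
        """Return the label with the highest score (no class prior is used)."""
        return max(self.log_probs, key=lambda label: self.score(opcodes, label))


# Toy opcode sequences, made up for illustration only.
training = {
    "benign": [["mov", "push", "call", "ret"], ["mov", "mov", "add", "ret"]],
    "engine_A": [["xor", "jmp", "xor", "nop"], ["xor", "nop", "jmp", "xor"]],
}
clf = OneGramBayes()
clf.fit(training)
print(clf.classify(["xor", "xor", "nop", "jmp"]))  # prints: engine_A
```

The sketch weights each log-probability by the opcode's relative frequency rather than its raw count, which keeps scores comparable across programs of different lengths. That choice, and the omission of a class prior, are design decisions of the sketch, not details taken from the thesis.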
The first method was successfully evaluated on a sample of benign programs and a sample of malicious programs from the W32.Simile family of self-mutating malware. The second method, an extension of the first, uses optimized 2-gram frequency vectors as feature vectors. It classifies a malware instance by computing the instance's proximity to the average of the NFVs of instances known to have been generated by a given engine. The second method was successfully evaluated on four malware-generating engines: W32.Simile, W32.Evol, W32.NGVCK, and W32.VCL. The evaluation yielded a set of four 17-tuples of doubles, one signature for each engine, and achieved 95% discrimination accuracy between a sample of benign programs and samples of malware instances generated by these engines. Accuracies of 94.8% were achieved with engine signatures of 6, 8, and 14 doubles. We also used …
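The core idea of the second method is to attribute an instance to the engine whose average NFV lies closest to it. This can be sketched as a nearest-centroid classifier over 2-gram NFVs. The abstract names neither the distance measure nor how the 2-gram vectors are "optimized", so this Python sketch makes three assumptions. It uses Euclidean distance. It represents the optimization only as an optional list of selected 2-grams (`features`). It treats anything farther than a hypothetical `threshold` from every engine signature as unattributed (for example, benign). All function names, the threshold value, and the toy data are illustrative.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Optional, Sequence, Tuple

NGram = Tuple[str, ...]
Vector = Dict[NGram, float]


def nfv(opcodes: Sequence[str], n: int = 2) -> Vector:
    """n-gram frequency vector (relative frequencies) of a list of opcode mnemonics."""
    grams = [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]
    if not grams:
        return {}
    return {g: c / len(grams) for g, c in Counter(grams).items()}


def centroid(vectors: List[Vector]) -> Vector:
    """Average NFV of instances known to come from one engine (used as its signature)."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}


def distance(a: Vector, b: Vector, features: Optional[Sequence[NGram]] = None) -> float:
    """Euclidean distance, optionally restricted to a chosen subset of 2-grams (assumed)."""
    keys = features if features is not None else set(a) | set(b)
    return sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def attribute(opcodes: Sequence[str], signatures: Dict[str, Vector],
              threshold: float, features: Optional[Sequence[NGram]] = None) -> Optional[str]:
    """Closest engine by signature distance, or None if every engine is beyond threshold."""
    v = nfv(opcodes, 2)
    best = min(signatures, key=lambda e: distance(v, signatures[e], features))
    return best if distance(v, signatures[best], features) <= threshold else None


# Toy opcode sequences, made up for illustration only.
engine_samples = {
    "engine_A": [["xor", "jmp", "xor", "jmp"], ["xor", "jmp", "xor", "nop"]],
    "engine_B": [["push", "pop", "push", "pop"], ["push", "pop", "push", "ret"]],
}
signatures = {e: centroid([nfv(p) for p in progs]) for e, progs in engine_samples.items()}
print(attribute(["xor", "jmp", "xor", "jmp", "xor"], signatures, threshold=0.5))  # prints: engine_A
print(attribute(["mov", "add", "mov", "add"], signatures, threshold=0.5))  # prints: None
```

Restricting `features` to a small set of discriminating 2-grams would turn each engine signature into a short tuple of doubles, in the spirit of the 17-tuples reported above. How the thesis actually selects those 2-grams is not reproduced here.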
