A systematic method to execute simulated phishing tests
By Demetris Antoniou
Delft, The Netherlands, 2015

Acknowledgements.

This thesis represents the end of my studies as a master’s student at the Technical University of Delft. The past two years have been an eye-opening experience, both in terms of formal education and in all the experiences I have gained by living in the Netherlands. I owe this to the numerous people I had the opportunity to converse with, work with and laugh with, and to the amazing times I have had here. More importantly, however, it is experiences like these that prepare one for the future.

The story of how this project came to fruition is a peculiar one. It all started during a lecture on computer security where I asked the professor whether any research was being done on Social Engineering. The professor then instructed me to talk to Dr. Wolter Pieters, who is conducting research on the socio-technical aspects of cyber-security within the TREsPASS project. Wolter then got me in touch with Dr. Trajce Dimkov, who served as the external supervisor for this project under the auspices of Deloitte. At the time of writing this acknowledgment I am a full-time consultant at Deloitte. My most sincere thanks go to these two men, who have been instrumental in both the ending of my academic career (for now?) and the beginning of my professional one.

Wolter has been an amazing supervisor who always had meaningful advice on how the thesis should proceed. The project went through several iterations and mishaps, yet Wolter was always there to give form to the chaos of information that would come and go. I am thankful for his help and his guidance; even in cases where I felt lost, Wolter always found a way to make sense of the situation. If Wolter was the Yin, Trajce was the Yang.
His advice was instrumental in formulating the thesis in a business context and in producing something that would provide value to the organization and to society in general. Our bi-weekly thesis meetings evolved into mentoring sessions where he, with his pointed remarks, molded me into a professional in my way of thinking, understanding and behaving. To both: I owe you my sincerest thanks for being the ones who helped me step into the next chapter of my life.

Many thanks also go to Drs. Marianne Junger and Jan-Willem, who provided extensive help in the early stages of the project. I would also like to thank Dr. Robert Verburg and Dr. Michel van Eeten for being on my committee. They provided invaluable insight into the project, and their advice was instrumental in the decisions that eventually gave the project its final form. It is through difficulties that good things come about, and without their eye-opening remarks this project would not have reached the quality it has now. The green-light-turned-to-midterm session will be engraved in my memory as one of the best things that happened to me. For breaking the weak foundation and making me build a stronger one, I owe you many thanks.

Thanks also go to the many people who were directly or indirectly involved in the shaping of the project. I would especially like to thank Katrien for being my comrade in our Deloitte internship, and Hugo and Peter for their out-of-this-world phishing expertise. The entire Deloitte hacking team is truly an exceptional group of people; the trainings I received as part of it have been some of the most memorable learning experiences of my life. I also thank Anamaria for helping me with the translations.

Last but not least come all the people who have each laid a brick in shaping me as a person.
My physics teacher, who taught me to be better than expected; my history teacher, who vividly described the world and motivated me to discover it; the Greek military officer, who taught me that how you say something is more important than what you say. To my parents, who never doubted me and will always be there for me. And to Georgia, my mainstay, the joy of my life and the world’s most beautiful smile (and eyes!)

Management Summary.

This research project was done for the Management of Technology master program at TU Delft in cooperation with Deloitte Nederland. The purpose of this project was to systematize the creation of simulated phishing test e-mails and to enable their measurement so as to facilitate the statistical analysis of their results. The project focuses on simulated phishing tests, which are distinct from regular phishing e-mails in both their background and their form.

Phishing is a scalable act of deception whereby impersonation is used to obtain information from a target. It poses a risk to both individuals and corporations because of the losses incurred as a result of a successful phishing attack. Organizations have started conducting Simulated Phishing Tests (SPTs) to measure the susceptibility of their employees to actual phishing attacks, to compare themselves with others in their industry and to improve awareness through training. Deloitte offers SPTs to companies that wish to run them. This has resulted in a unique dataset with the results of previous SPTs. The dataset is unique because every e-mail is accompanied by its click-rate, something not found in public phishing repositories. Such an expanding dataset can be analyzed to gain insight into the nature of the e-mail and to make statistical inferences.

The purpose of this project is to create a coding scheme which will enable this analysis and be used to conduct SPTs in a systematic way.
Systematic here means that concrete design decisions will be formulated which the maker of the SPT can follow when composing the e-mail. These design decisions will result in an e-mail that is accompanied by a set of enumerated variables. For example, one e-mail can deliberately be designed around a happy event, whereas another can focus on corporate policies. These variables can then be correlated with the actual click-rate of the e-mail, which can result in predictive models of the click-rates of future SPTs. These predictive models can then be used to create phishing campaigns of increasing difficulty, where difficulty can be expressed numerically.

Currently (2015), phishing has evolved into a more personalized form, also known as spear-phishing, where the attacker modifies the phishing e-mail to make it more relatable to the recipient. For example, a phishing attack is sent only to employees of a particular company, using that company’s e-mail structure and logo. This introduces several variations on the regular phishing attack, which is sent to thousands or millions of people without any form of personalization or target segmentation. SPTs conducted by industry are similar to spear-phishing attacks: they are usually sent to various departments of a client company and purport to be addressed solely to employees of that company.

The first step in fulfilling the purpose of the project was an extensive literature review on phishing, how it evolved and the current state of affairs. The literature review set the definitions and investigated current research on phishing. It was found that most research investigates differences between individuals and their susceptibility to phishing. However, little research has investigated the effect of the e-mail itself on the click-rate.
This can be partly explained by the lack of data on click-rates over several different e-mails, a gap that our dataset fills.

The literature review established the definition of phishing as a deceptive piece of text. This definition then led to an exploration of deception in general, with a focus on deception in asynchronous computer-mediated communication. The result was a set of computer-readable indicators of deception, which form the first part of the coding scheme. Then, the persuasive elements of phishing attacks were investigated. This resulted in the most populous set of variables for the project, drawn from various established theories such as Maslow’s theory of motivation and Cialdini’s principles of influence. The enumeration of these variables requires human coders.

The final set of variables describing a simulated test e-mail was derived from the existing dataset of previous SPTs and from the literature. This resulted in variables which are unique to the dataset, or which were viewed as important but were not found in the literature. These variables also require rating by coders.

No conclusions can be drawn on the effect of each variable on the click-rate, but informed assumptions on the expected effect are provided. The number of tests available is too small to allow any inferential analysis at the moment; this project provides a measurement scheme which will enable it in the future. Moreover, guidelines are provided on how to conduct experiments to discover the actual impact of each variable on the click-rate. Using the results of these experiments, e-mails can then be designed around their perceived difficulty. More difficult e-mails are harder to detect as phishing e-mails, are much more persuasive and are thus expected to have a higher click-rate than less difficult ones.

The results of the literature review and the dataset were then combined and the coding scheme was operationalized.
A total of thirty-five variables, drawn from the literature and the dataset, were included in the coding scheme. They can be found in Table 18 in Chapter 5. Questions were formulated for the human-coded variables, with brief explanations of the nature of the variables, and the importance of coder training is discussed. Using human coders introduces considerable subjectivity into the measurement of e-mails; training is therefore of paramount importance in maintaining the replicability and reliability of the coding scheme.

Recommendations are also provided on what can be done with the coding scheme in future SPT engagements. The main benefits of using the coding scheme are outlined below:

- SPTs can now be constructed in a systematic way, where the characteristics of the e-mail can be manipulated along scientifically backed principles.
- The e-mails resulting from SPTs will have a measurement accompanying them, which can be correlated with the resulting click-rate.
- The e-mails sent to the same organization over time can be created with ascending difficulty, which can be a better indicator of the efficacy of anti-phishing awareness training.
- A methodological approach is proposed for conducting experiments that can determine the effect of each variable on the click-rate of the e-mail.

This experimentation can establish what the actual impact of each variable on the click-rate is. Organizations that apply this approach will be able to gain more information from their phishing tests. The use of A/B testing is recommended, where the recipients are split into two groups with equal representation of all subject-specific factors. This ensures that two different e-mails are sent to two groups which are as similar to each other as possible. With such an approach, the effect that differences between organizations have on the click-rate is diminished and the characteristics of the e-mail can be investigated with minimal bias.
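The split into two comparable groups described above could be sketched as follows. This is only a minimal illustration in Python, not the procedure used in the engagements: the function names `stratified_split` and `two_proportion_z`, the use of department as the only subject-specific factor, and the choice of a two-proportion z-test to compare click-rates are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def stratified_split(recipients, key, seed=0):
    """Split recipients into groups A and B so that every stratum
    (e.g. department) is represented about equally in both groups.
    `key` maps a recipient to its stratum; it is a hypothetical hook."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in recipients:
        strata[key(person)].append(person)

    group_a, group_b = [], []
    for stratum in sorted(strata):
        members = strata[stratum]
        rng.shuffle(members)  # random assignment within the stratum
        half = len(members) // 2
        # Odd-sized stratum: give the spare member to the smaller group.
        if len(members) % 2 and len(group_a) <= len(group_b):
            half += 1
        group_a.extend(members[:half])
        group_b.extend(members[half:])
    return group_a, group_b


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Compare the click-rates of e-mail A and e-mail B.
    Returns (difference in click-rate, z statistic, two-sided p-value)."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # nobody clicked, or everybody did: no detectable difference
        return p_a - p_b, 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p_a - p_b, z, p_value


if __name__ == "__main__":
    # Hypothetical recipients: (department, anonymous ID).
    staff = [("finance", i) for i in range(10)] + [("hr", i) for i in range(7)]
    a, b = stratified_split(staff, key=lambda r: r[0])
    print(len(a), len(b))
    print(two_proportion_z(30, 100, 15, 100))
```

Splitting within each stratum, rather than over the whole recipient list, is what keeps the two groups similar on the chosen factor; with more factors the key could return a tuple such as (department, seniority), at the cost of smaller strata.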
A/B testing can provide concrete numerical results on the impact that individual variables have on the click-rate of the e-mail.

In addition to the coding scheme and the experimental guidelines, recommendations are made on how the statistical analysis of the results should be done. Simpler linear regression models are recommended for analysis when the number of samples is small, and the adoption of more complex statistics is recommended as the dataset expands. The approach best suited for analysis once there is a sufficiently large dataset is Neuro-Fuzzy logic, since it can incorporate the experimental results from A/B testing whilst shaping the membership functions. In addition, fuzzy-based models handle linguistic input well, which is what the answers to the coding questions are. The approach is then expected to produce an accurate predictive model which will also be understandable to researchers, as it provides a good trade-off between explanatory and predictive power.

The implementation of the project’s recommendations will enable SPTs to be conducted in a scientific, enumerable and structured way, producing scientifically backed phishing e-mails that are ready for statistical analysis in the future.

This project has been conducted in cooperation with Deloitte Nederland B.V., who have graciously provided the infrastructure, data and expertise in the conduct of simulated phishing tests.

Table of Contents

Acknowledgements.
Management Summary.
1. Introduction.
1.1. A brief overview of Deloitte’s SPT approach.
1.2. Problem Identification.
1.3. The business benefits of the solution.
1.4. The research opportunities of the solution.
1.5. Research Objective and Research Questions
1.6. Methodology and Thesis Structure
1.7. Expected Results
2. Scientific Foundation
2.1. Social Engineering
2.2. Phishing
2.3. Spear Phishing and Targeted Attacks
2.4. Trends in spear phishing
2.5. Existing research on phishing: the lack of focus on the e-mail
2.5.1. The e-mail and the individual: the bigger picture
3. Deception and Persuasion
3.1. Deceptive elements of phishing
3.2. Persuasive elements of phishing
3.3. Extraction of phishing characteristics from literature
4. Dataset Inspection
4.1. Ordering of the current dataset
4.2. Exploration of representative examples
4.3. Identification of common themes
4.4. Extraction of phishing characteristics from the dataset
4.5. Limitations of the dataset
4.6. Possibilities of the dataset
5. The coding scheme and its application.
5.1. The coding scheme
5.1.1. On the method used in analyzing the e-mails
5.2. Informed Assumptions on the impact of variables.
5.3. The importance of coders and practical considerations.
5.3.1. Training of coders.
5.4. Quantitative attempts at the dataset and recommendations on analysis.
5.5. Towards prediction: Experimentation using the coding scheme
6. Evaluation of the coding scheme.
6.1. Evaluation of the coding scheme: a first test
6.1.1. The objectivity of our coding scheme
6.1.2. The reliability of our coding scheme
6.1.3. The replicability of our coding scheme
6.1.4. The systematicity of our coding scheme
6.2. Using the coding scheme for future SPT engagements
6.1. Limitations of the coding scheme
7. Conclusion
7.1. The deliverable
7.2. Conclusion
7.3. Recommendations
7.4. Reflection
Contribution to the Management of Technology Program.
List of Figures.
Bibliography.

1. Introduction.

Ever since their inception, computers have been operated by humans. As stand-alone computers evolved into networked computer systems, their complexity, usefulness and accessibility grew as well. This, however, also led to a growth in the number of malicious users who are willing to use the Internet for fraud, illegal profit, vandalism and other forms of illegal activity. These people are commonly (and often incorrectly) known as hackers, who actively try to gain unauthorized access to computer systems. Although most people see hackers as highly gifted, curious individuals, the current situation (2015) is different. Organized crime groups increasingly use cyber-space to gain illicit financial income. [1]

One of the methods that malicious hackers use is so-called 'Social Engineering' (SE). One of the earliest definitions of social engineering within the context of cyber-security came in 1995, when it was stated that SE is "the process of using social interactions to obtain information about a victim's computer system".[2] Although often overlooked[3], SE is a real and damaging threat to information-based organizations. For example, many cyber-security consultants have claimed a 100% success rate whenever they employed SE in penetration testing.1

Academics have recognized the imbalance that exists between the research conducted in the technical domain of cyber-security and that in the social domain. Bjorck and Yngstrom conclude that the human element has a strong impact on information security. In their review of existing publications they found that 80% of the research conducted on cyber-security is about technical issues, whereas only 20% deals with the social aspects.
[4] This makes SE an interesting and necessary research topic.

Phishing is a subset of social engineering whose latest exact definition will be discussed later. Phishing was first mentioned as such on January 2, 1996 on a Usenet newsgroup.2 The origins of phishing can be traced back to the time when America Online (AOL) was the prominent provider of internet access in the U.S. At the time, attackers were sending false e-mails to AOL users and managed to extract their login credentials, which in turn led to spam attacks being carried out using the hijacked accounts.[5] Attackers then moved on to trying to elicit financial information from their victims, especially credit card numbers. At the time, however, the amateur nature of phishing e-mails was a good enough giveaway for would-be victims, who quickly learned to recognize phishing attempts, usually through their bad spelling and grammar.

Nowadays, most e-mail accounts have their spam folders filled with various offerings of cheap drugs, lottery winnings and account suspensions. However, phishing has evolved into something more dangerous and professionally made. The widespread use of social media has provided malicious social engineers with large amounts of information that can be used against a target. [6] This leads to the phenomenon of spear phishing, or context-aware phishing, where the e-mail is targeted: it is tailored to the recipient based on their employment, family status or anything else that distinguishes them from others. [7] In contrast, regular phishing is one e-mail sent to many recipients (sometimes hundreds of thousands) without any personalization. The type of phishing under study here is the former, spear-phishing type. Simulated Phishing Tests (SPTs) conducted by industry are also of the spear-phishing type.

1 http://www.techrepublic.com/blog/it-security/securitys-weakest-link-technology-no-match-for-social-engineering/
2 http://www.phishing.org/history-of-phishing/
Existing research revolves around mass phishing, where well-studied elements of the e-mail are promptly filtered by existing software. [8] However, there seems to be little research specific to simulated phishing tests (SPTs) conducted by industry with the goal of raising awareness of phishing.

There are numerous instances where companies have been victimized by means of phishing. One of the most striking examples is the attack on RSA in 2011. RSA is a computer security organization mostly known for its SecurID two-factor authentication service. In 2011 it was successfully infiltrated by means of a spear-phishing attack.3 Over a two-day period an attacker (presumably state-backed) sent two e-mails to four RSA employees. Although the e-mails were filtered into the spam folder, one employee took the e-mail out of the spam folder and opened the malicious attachment. The targeted employees were not even critical members of the firm, such as high-level management or system administrators. Nevertheless, the SecurID service was severely compromised, which resulted in the recall of most security tokens, since users of SecurID such as Lockheed-Martin reported that attacks were being carried out using the compromised SecurIDs. The technology news website Wired.com also obtained a screenshot of the actual spear-phishing e-mail (EMC is the parent company of RSA):

Figure 1: The spear-phishing e-mail which compromised RSA. Source: wired.com

As can be seen above, the spear-phishing e-mail was not particularly lengthy or elaborate. However, the title ‘Recruitment Plan’ seems to have been a good enough trigger for an employee to open it. Someone who was unsure of their job security, or an intern, could have opened it out of a need to know whether they were included.

The above example demonstrates that even companies which are in the business of computer security can still be infiltrated using spear-phishing tactics. Organizations have
recognized the risk that such attacks present to the continuity of their business. Therefore, industry has begun to incorporate phishing awareness training as part of its overall cyber-security strategy. Gartner has claimed in a recent report that organizations which employ what it calls Anti-Phishing Behavior Management (APBM) “will experience up to 30% less breaches than organizations which rely solely on security awareness.”[9] The report recommends that chief information security officers ‘should leverage APBM solutions aggressively to establish, reinforce and maintain appropriate security behavior within all employee groups.’ Statements like these from organizations like Gartner suggest that SPTs will become a standard part of an organization’s cyber-security strategy.

3 https://blogs.rsa.com/anatomy-of-an-attack/

1.1. A brief overview of Deloitte’s SPT approach.

Simulated phishing attacks are conducted by industry for organizations that want to gauge the susceptibility of their employees to spear phishing and to better educate employees on how to detect and thwart phishing attacks. Deloitte in the Netherlands has been conducting such engagements since 2014, with more engagements planned in 2015. The engagements are conducted as follows:

1. The client agrees to the execution of a phishing exercise. The client’s departments that will take part are decided upon and timeframes are agreed.
2. A number of phishing scenarios are created according to the client’s specifications. The scenarios are presented to the client, who chooses among them and may suggest changes to the e-mails.
3. Using a specialized infrastructure, the e-mails are sent out within the agreed timeframe and responses to them are recorded. Each employee has a unique but anonymous ID with which their actions can be traced.
4. After some time has passed, the results are collected and a report is sent to the client.
The results include the percentage of recipients who clicked as well as the extent to which they interacted with the phishing website. They could have just opened the link in the phishing e-mail or gone all the way through and divulged their username/password information. This project deals only with the e-mail and whether there was a click or not.
5. The report is either sent as-is or follow-up training is provided. There is also the option to debrief the employees at the time they are phished, and they can be redirected to an e-learning platform.

The service offered by Deloitte has yielded an interesting and expanding dataset which contains not only the e-mail itself and the phishing website but also the results of each exercise. This makes the data unique in the sense that each e-mail is accompanied by data on its click-rate. Publicly available phishing databases, such as that of the Anti-Phishing Working Group (APWG), hold an enormous number of phishing e-mails but no accompanying data on their targets or their click-rates. The data generated by SPTs presents a massive opportunity to conduct statistical analysis in the future, which will be useful for both business and science in general.

1.2. Problem Identification.

The experience obtained from the current SPT engagements has generated a number of questions whose answers are not clear at first glance. For example, a particular client signed up for a phishing campaign where a total of four phishing e-mails would be sent over several months to their employees