Arrivals of Tourists in Cyprus: Mind the Web Search Intensity

Theologos Dergiades, Eleni Mavragani, Bing Pan

GreeSE Paper No.107
Hellenic Observatory Papers on Greece and Southeast Europe
February 2017

All views expressed in this paper are those of the authors and do not necessarily represent the views of the Hellenic Observatory or the LSE
© Theologos Dergiades, Eleni Mavragani, Bing Pan

TABLE OF CONTENTS
ABSTRACT
1. Introduction
2. A brief review of the literature
3. Methodology
3.1 Standard Granger Non-Causality Testing
3.2 Frequency Domain Non-Causality Testing
4. Data and preliminary econometric analysis
4.1 Data sources
4.2 Preliminary econometric analysis
5. Empirical results
5.1 Predictive Power of the Web Search Intensity per Country
5.2 Aggregate Predictive Power of the Web Search Intensity
6. Discussion of findings and policy implications
7. Conclusions
References
Appendix

Theologos Dergiades#, Eleni Mavragani*, Bing Pan†

ABSTRACT
This paper validates the raison d’être of the effortlessly recovered web Search Intensity Indices (SII) for predicting the arrivals of tourists in Cyprus.
By using monthly data (2004-2015) and two causality testing procedures, we find that, for properly selected key-phrases, web search intensity (adjusted for different languages and different search engines) conveys useful predictive content for the arrivals of tourists in Cyprus. Additionally, we show that whenever the prevailing shares of visitors come from countries with different languages, the identification of the aggregate SII becomes complex. Hence, we argue that blindly using key-phrases to identify an aggregate SII is like an immersion into the unknown, since two sources of bias (the language bias and the search engine bias) are fully neglected. Given the importance of the tourism sector in the total economic activity of Cyprus, our findings might prove quite useful to governmental agencies, policy makers and other stakeholders of the sector when their purpose is to allocate the existing limited resources effectively and to plan short- and long-run promotion and investment strategies.

Keywords: Cyprus tourism product; web search intensity; predictive content

# Theologos Dergiades, Department of International & European Studies, University of Macedonia, Greece, e-mail: [email protected]
* Eleni Mavragani, School of Economics, Business Administration and Legal Studies, International Hellenic University, [email protected]
† Bing Pan, Department of Recreation, Park & Tourism Management, Pennsylvania State University, e-mail: [email protected]

1.
Introduction

Over recent years, the availability of freely delivered data from copious web sources (social media, search engines, etc.) has sparked a new strand in the empirical literature, the so-called real-time economics.1 In one of the earliest studies in economics,2 authored by Hal Ronald Varian (chief economist at Google) and Hyunyoung Choi (senior economist at Google), which actually inaugurated the field, there are strong signs that properly selected query indices (provided by Google) are useful in predicting the activity in different economic sectors, such as the automobile industry and the tourism market.3 This cornerstone study has triggered a flurry of scientific publications that use web-related data to explain upcoming trends in various markets. Among others, empirical applications have been conducted for foreign exchange markets, stock markets, sovereign bond markets, labor markets and even real estate markets. In all the above markets, there is credible evidence that web-related data offer added value when it comes to predicting upcoming events.

1 The usefulness of web search intensity data in predicting events was first recognized by researchers conducting studies in the field of medicine (see for example: Cooper et al., 2005; Polgreen et al., 2008).
2 Ettredge et al. (2005) is the first study that uses web search intensity data from employment-related searches as a significant leading indicator for the U.S. unemployment level.
3 See the Choi and Varian (2009) technical report, which was later published as Choi and Varian (2012).

Turning to tourism markets, an essential desideratum for practitioners and policy makers, for several reasons, is the accurate prediction of the demand for the tourism products of interest.
It is widely accepted that accurate forecasts provide valuable aid for: a) the development of long-run marketing strategies, b) the formation of competent pricing policies, c) the appropriate scheduling of investments into the sector and d) the effective allocation of limited resources. Hence, the need for new leading indicators that may help to predict consumer preferences, both effectively and in a timely manner, is persistent and more than justified. Given that web search engines nowadays constitute the major workhorse in scheduling vacations, they can be seen as a new source of information that may improve our understanding of the consumption of the tourism product. However, it is well recognized that a fresh source of information may not be a competent leading indicator, while a competent leading indicator may not be new. Therefore, extensive empirical testing is imperative before such sources of information are adopted as leading indicators.

This study, concentrating on Cyprus, evaluates the impact of the relevant web search intensity, captured by Google, on the consumption of the tourism product. Accurate forecasts of tourism demand in the case of Cyprus are of major importance since the total economic activity of the island relies heavily on the tourism industry. According to the latest KPMG report (April, 2016), the overall contribution of the tourism industry to the economy, for the year 2014, is more than €3 billion, which corresponds to 21.3% of GDP. Projections for the next 10 years show that the absolute contribution of the tourism sector is expected to experience steady annual growth of somewhat below 5%. By 2025, the relative contribution of the tourism sector to overall economic activity is anticipated to be 25.5%.4 Additionally, we concentrate on the search engine of Google for two major reasons. As stated in Yang et al.
(2015), Google is the most popular search engine globally, with a market share of 66.7%, and at the same time Google provides historical information on the volume of the conducted queries.

Another aspect that makes Cyprus an ideal candidate for study is the observed arrival shares by country. While most studies focus on destinations where the dominant share of arrivals comes from English-speaking countries, this is not the case for Cyprus.5 More than 70% of the arrivals in Cyprus come from the United Kingdom (UK, hereafter), the Russian Federation (Russia, hereafter), Greece, Germany and Sweden. Such a composition in the origin of the arrivals undeniably complicates the process we need to follow in order to identify the related aggregate web search intensity from the Google search engine. A natural difficulty in tracking the aggregate web search intensity for a specific travel destination is the selection of the appropriate language. Indisputably, English is the prevailing language on the Internet (873 million users), followed by Chinese (705 million users).6

4 The 2016 tourism market report of KPMG for Cyprus is available at: https://www.kpmg.com/cy/
5 To the best of our knowledge, the only study that deals with a destination that receives visitors from countries with different languages is that of Choi and Varian (2012). Choi and Varian (2012) act at a disaggregated level only and do not provide much information about the construction of the search intensity index (e.g. keywords used).
6 Numbers refer to November 2015 (http://www.internetworldstats.com/stats7.htm); accessed June 2016.

Given the dominance of the English language, an apparent question is whether the aggregate web search intensity based on English keywords would be adequate to reveal the aggregate interest in a specific travel destination.
The answer is yes, if and only if all the arrivals to the destination of interest come solely from English-speaking countries (or, to be more precise, if all the visitors perform their web searches in English). In any other case, the constructed aggregate index will be biased.7 In particular, as the share of total arrivals from English-speaking countries decreases relative to non-English-speaking countries, the quality of the identified aggregate web search intensity index (based only on English keywords) is expected to deteriorate in an analogous manner. Hence, failure to take into account, for the entire sample period, all the languages that correspond to the respective source markets of the destination under investigation will give rise to the first source of bias; let’s call it language bias.

7 In more detail, when we use only one language (e.g. English), we correctly reveal the web search intensity that is attributed to only one set of countries (the countries that use the English language: US, UK, etc.), while at the same time we entirely neglect the web search intensity that is formed in other countries using other languages.

At this point, we need to stress that in most cases the construction of the aggregate web search intensity index (based on Google) for the tourism product of a country, especially when it concerns a popular destination, cannot be entirely free of the language bias. The presence of the language bias in such cases is attributed to an inherent feature of the Google Trends facility. In particular, the facility does not deliver data if the search volume for the keyword of interest is relatively small. It immediately becomes apparent that, for the source markets with small shares in the arrivals (implying small search volume), the construction of the corresponding web search intensity index (SII, hereafter) is not feasible.
Consequently, even if we wish, we cannot take into account the search volume from all languages in order to construct a unified aggregate index. As the cumulative share in the tourism product of the source markets without enough search volume to construct an index increases, the quality of the aggregate index is expected to fade. Overall, the language bias is clearly not a question of presence or absence, but rather a question of degree.

Even if at some point of our sample (e.g. in the beginning) all the major source markets use the same language, we continue to run the risk of encountering this so-called language bias, since there is no guarantee that this will be the case at any other point in time. New source markets, using different languages, may progressively earn greater arrival shares, thus reducing or displacing the share of existing source markets. In other words, a misleading web search intensity may be obtained once we fail to take into account source markets that gradually earn larger shares in the arrivals. For example, let’s assume the following: a) over a long period, German-speaking countries are consistently the dominant source markets for a destination but with a declining share over time and b) Russian-speaking countries initially had a small share in the arrivals (too small to generate enough search volume) but with an increasing trend over time. In the above example, if we extract the web search intensity solely based on German keywords, then the aggregate web search intensity for the destination of interest is misleading. Therefore, we need to examine the dynamic evolution of the shares that each source market has. Overall, it becomes apparent that accurate identification of the aggregate web search intensity necessitates knowledge of all those source markets that contribute to the total arrivals over the entire sample under investigation.
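The German/Russian example above can be illustrated with a minimal sketch. The arrival shares below are invented purely for illustration and are not data from this paper; the function names `covered_share` and `coverage_over_time` are likewise hypothetical. The sketch simply tracks, period by period, what fraction of total arrivals comes from source markets whose language is represented in the keyword set, which is one rough way to gauge the degree of language bias.

```python
# Hypothetical arrival shares (fractions of total arrivals) by source-market
# language. These numbers are invented for illustration only.
shares_by_year = {
    2004: {"German": 0.70, "Russian": 0.05, "Other": 0.25},
    2010: {"German": 0.55, "Russian": 0.20, "Other": 0.25},
    2015: {"German": 0.40, "Russian": 0.35, "Other": 0.25},
}


def covered_share(shares, languages):
    """Fraction of arrivals whose source-market language is in `languages`."""
    return sum(share for lang, share in shares.items() if lang in languages)


def coverage_over_time(shares_by_year, languages):
    """Covered share of arrivals for each year, in chronological order."""
    return {
        year: covered_share(shares, languages)
        for year, shares in sorted(shares_by_year.items())
    }


# An index built from German keywords only versus German and Russian keywords.
german_only = coverage_over_time(shares_by_year, {"German"})
german_russian = coverage_over_time(shares_by_year, {"German", "Russian"})

for year in german_only:
    print(year, round(german_only[year], 2), round(german_russian[year], 2))
```

Under these assumed shares, the German-only keyword set covers a steadily shrinking fraction of arrivals, while adding Russian keywords keeps the covered fraction stable. This is why the evolution of the source-market shares has to be examined over the whole sample rather than at a single point in time.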
In our effort to measure web search intensity, another source of bias may result from the usage of the Google Trends facility itself, if Google is not the dominant search engine in the source market of interest; let’s call it search engine bias. In such cases, the volume of queries measured by the Google Trends facility underestimates the true volume of relevant queries, thereby failing to convey the precise interest of users and its evolution over time. Obviously, the bias of the SII delivered by the Google Trends facility will be zero if the share of Google in the total number of web searches in the source market is 100%, and it increases as this share declines.

By using two alternative causality testing techniques (the first test takes place in the time domain, the second in the frequency domain) and introducing a simple way to select appropriate keywords, we investigate the predictive power of Google’s SII for the arrivals of tourists in Cyprus at an aggregate and a disaggregate level. The findings from our analysis are the following: a) all the country-specific SII are highly significant in predicting arrivals from the respective source market, b) the presence of both sources of bias, the language bias and the search engine bias, renders the aggregate SII ineffective in predicting the total number of tourist arrivals and, finally, c) once we account for the two sources of bias, the corrected aggregate SII turns out to convey valuable predictive content in relation to the arrivals that come from