Stübinger, Johannes; Mangold, Benedikt; Krauss, Christopher

Working Paper
Statistical arbitrage with vine copulas

FAU Discussion Papers in Economics, No. 11/2016

Provided in Cooperation with: Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics

Suggested Citation: Stübinger, Johannes; Mangold, Benedikt; Krauss, Christopher (2016): Statistical arbitrage with vine copulas, FAU Discussion Papers in Economics, No. 11/2016, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics, Nürnberg

This Version is available at: http://hdl.handle.net/10419/147450

ISSN 1867-6707
Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics
https://www.iwf.rw.fau.de/research/iwf-discussion-paper-series/

Statistical arbitrage with vine copulas

Johannes Stübinger^{a,1}, Benedikt Mangold^{a,1}, Christopher Krauss^{a,1}

^a University of Erlangen-Nürnberg, Department of Statistics and Econometrics, Lange Gasse 20, 90403 Nürnberg, Germany

Tuesday 25th October, 2016

Abstract

We develop a multivariate statistical arbitrage strategy based on vine copulas, a highly flexible instrument for linear and nonlinear multivariate dependence modeling. In an empirical application on the S&P 500, we find statistically and economically significant returns of 9.25 percent p.a. and a Sharpe ratio of 1.12 after transaction costs for the period from 1992 until 2015. Tail risk is limited, with maximum drawdown at 6.57 percent. The high returns can only partially be explained by common sources of systematic risk. We benchmark the vine copula strategy against other variants relying on the multivariate Gaussian and t-distribution and we find its results to be superior in terms of risk and return characteristics. The multivariate dependence structure of the vine copulas is time-varying, and we see that the share of copulas capable of modeling upper and lower tail dependence increases to well over 90 percent at times of high market turmoil.

Keywords: Finance, statistical arbitrage, pairs trading, quantitative strategies, copulas.

^1 The authors have benefited from many helpful discussions with Ingo Klein.

1. Introduction

Pairs trading is a relative-value arbitrage strategy, where an investor seeks to profit from mean-reversion properties of the price spread between two co-moving securities. Gatev et al. (2006) provide the first major academic study on this subject, with excess returns of up to 11 percent p.a. from 1962 until 2002 on the US stock universe. Ever since its publication, several pairs trading approaches have emerged, using different methodologies for pairs selection and pairs trading; Krauss (2016) provides a recent survey.

One stream of literature focuses on copula-based pairs trading. Key representatives are Liew and Wu (2013); Xie and Wu (2013); Stander et al. (2013); Xie et al. (2014); Krauss and Stübinger (2015); Rad et al. (2016). These studies use bivariate copulas to model the dependence structure between two stock return time series, and to identify mispricings that can potentially be exploited in a pairs trading application. The most comprehensive contribution is provided by Rad et al. (2016), which we briefly describe following Krauss (2016) and Rad et al. (2016). First, during a formation period, similar pairs are selected based on minimizing the sum of squared distances in normalized price space, as in Gatev et al. (2006). The top 20 pairs are retained. Second, the authors fit parametric marginal distribution functions to the return time series of each stock of the top 20 pairs. Third, the returns are transformed into their relative ranks. Then, several different copulas are fitted for each pair and the best-fitting one is selected based on information criteria. Fourth, conditional distributions are derived as first partial derivatives of the copula function $C(u_1, u_2)$:

$$
\begin{aligned}
h_1(u_1 \mid u_2) &= P(U_1 \le u_1 \mid U_2 = u_2) = \frac{\partial C(u_1, u_2)}{\partial u_2}, \\
h_2(u_2 \mid u_1) &= P(U_2 \le u_2 \mid U_1 = u_1) = \frac{\partial C(u_1, u_2)}{\partial u_1}.
\end{aligned}
\tag{1}
$$

In the next step, the conditional probabilities from equation (1) are transformed to daily mispricings $m_{1,t}$ and $m_{2,t}$ for a time $t$ by subtracting a median value of 0.5:

$$
m_{1,t} = h_1(u_{1,t} \mid u_{2,t}) - 0.5, \qquad m_{2,t} = h_2(u_{2,t} \mid u_{1,t}) - 0.5, \qquad t \in T. \tag{2}
$$

Fifth, $m_{1,t}$ and $m_{2,t}$ are used to construct mispricing indices $M_{1,t}$ and $M_{2,t}$, given as

$$
M_{1,t} = M_{1,t-1} + m_{1,t}, \qquad M_{2,t} = M_{2,t-1} + m_{2,t}, \qquad t \in T, \tag{3}
$$

with $M_{1,0} = M_{2,0} = 0$. Following Rad et al. (2016), positive values of $M_{1,t}$ and negative values of $M_{2,t}$ indicate that stock 1 is overvalued compared to stock 2 and vice versa. Rad et al. (2016) open a pairs trade at time $t$ if $M_{1,t} > 0.4$ and simultaneously $M_{2,t} < -0.4$ and vice versa. The trade is closed when both mispricing indices reach a level of zero again. Lau et al. (2016) provide an initial extension of this concept to three-dimensional space using Bernstein copulas, with a demonstration on three stocks.

We enhance the existing literature in several respects. First, instead of a two-dimensional pairs trading framework in the sense of Gatev et al. (2006); Rad et al. (2016), we construct a multivariate copula-based statistical arbitrage framework in the sense of Avellaneda and Lee (2010). Specifically, for each stock in our S&P 500 database, we find the three most suitable partners by leveraging different selection criteria. As such, we operate in four-dimensional space (one target stock, three partner stocks), one of the simplest showcases to benchmark the multivariate models we deploy. A generalization to higher dimensions is straightforward. Empirically, increasing the dimension of the partner portfolio usually leads to higher performance; see, for example, Perlin (2007); Avellaneda and Lee (2010); Chen et al. (2012). Second, we benchmark various multivariate copula models to capture the dependence structure of our quadruple, consisting of one target stock $i$ and three partner stocks.
We make use of the multivariate Gaussian and the multivariate t-distribution as baseline models for financial market data. These reference cases are compared against vine copulas, a novelty in high-dimensional dependence modeling and state-of-the-art in the copula literature due to their superior flexibility (Low et al. (2013); Weiß and Supper (2013)). Third, we perform a large-scale empirical study on the S&P 500 from January 1990 until October 2015. We find that our vine copula strategy produces statistically and economically significant returns of 9.25 percent p.a. after transaction costs. The results are far superior compared to the multivariate Student's t-copula (6.76 percent p.a.) or a naive strategy that neglects all partner stocks (0.57 percent p.a.). Similar to Gatev et al. (2006), returns of the vine strategy exhibit low exposure to systematic sources of risk, except for a short-term reversal factor. Monthly alpha after transaction costs still lies at 0.34 percent and tail risk is much lower compared to a simple buy-and-hold investment in the S&P 500. Especially surprising is the fact that the vine strategy does not suffer from consistently negative annualized returns in the recent part of our sample, an issue common among many pairs trading implementations (see, for example, Gatev et al. (2006); Do and Faff (2010); Clegg and Krauss (2016)). Fourth, we analyze the change in chosen copula families in the vine graph over time. We find that in recent years, and especially during financial turmoil, the demand for more flexible copulas increases, allowing for modeling both upper and lower tail dependence.

The rest of this paper is organized as follows. Section 2 briefly describes our data and the software packages we use. Section 3 outlines the methodology, i.e., the partner selection procedure, the workings of the different copula models, the generation of trading signals, and the backtesting approach.
In section 4, we present our results and discuss key findings in light of the relevant literature. Finally, section 5 concludes and provides suggestions for further research.

2. Data and Software

We run our empirical study on the S&P 500, a highly liquid subset of the U.S. stock market, covering 80 percent of available market capitalization (S&P Dow Jones Indices (2015)). Given intense analyst coverage and high investor attention, this market segment serves as a true acid test for any potential capital market anomaly. We follow Krauss and Stübinger (2015) in order to eliminate survivor bias from our database. First, using Thomson Reuters Datastream, we obtain all month-end constituent lists for the S&P 500 from December 1989 to September 2015. Then, we aggregate these lists into a binary matrix, where "1" indicates that a stock is a constituent of the S&P 500 in the subsequent month and "0" the opposite. For all these index constituents, we download the total return indices^2, covering the period from January 1990 until October 2015, also from Thomson Reuters Datastream. By combining both data sets, we are able to replicate the S&P 500 index constituency and the respective prices over time.

All relevant analyses are conducted in the programming language R. Table 1 lists the additional packages for dependence modeling, data handling, and financial modeling.

^2 Return indices reflect prices including reinvested dividends and adjusted for all further corporate actions and stock splits.
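The constituency matrix described in this section can be illustrated with a minimal sketch. It is written in Python for illustration only (the paper's analyses are conducted in R), and the function names, the `(year, month)` keys, and the tickers `AAA`, `BBB`, `CCC` are hypothetical. The only property taken from the text is that a month-end list determines membership in the subsequent month.

```python
def next_month(year_month):
    """Return the (year, month) tuple that follows the given one."""
    year, month = year_month
    return (year + 1, 1) if month == 12 else (year, month + 1)


def constituency_matrix(month_end_lists):
    """Build a binary constituency matrix from month-end constituent lists.

    month_end_lists maps (year, month) to the set of tickers on the
    month-end list. Row i of the result refers to the month AFTER the
    i-th list, so a 1 means "constituent in the subsequent month".
    """
    tickers = sorted({t for members in month_end_lists.values() for t in members})
    list_months = sorted(month_end_lists)
    row_months = [next_month(ym) for ym in list_months]
    matrix = [
        [1 if ticker in month_end_lists[ym] else 0 for ticker in tickers]
        for ym in list_months
    ]
    return row_months, tickers, matrix


# Hypothetical toy input: two month-end lists with made-up tickers.
toy_lists = {
    (1989, 12): {"AAA", "BBB"},
    (1990, 1): {"AAA", "CCC"},
}
row_months, tickers, matrix = constituency_matrix(toy_lists)
```

Combining such a matrix with the return series restricts each month's universe to the stocks that were index members at that time, which is how survivor bias is avoided.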
Application            R package               Authors of the R package
Dependence modeling    condMVNorm              Varadhan (2015)
                       copula                  Hofert et al. (2015)
                       fCopulae                Rmetrics Core Team et al. (2014)
                       permute                 Simpson (2015)
                       Rcpp                    Eddelbuettel et al. (2016)
                       VineCopula              Schepsmeier et al. (2015)
                       vines                   Gonzalez-Fernandez and Soto (2015)
Data handling          dplyr                   Wickham and Francois (2016)
                       ReporteRs               Gohel (2016)
                       xlsx                    Dragulescu (2014)
                       xts                     Ryan and Ulrich (2014)
                       zoo                     Zeileis et al. (2015)
Financial modeling     fUnitRoots              Wuertz (2013)
                       lmtest                  Hothorn et al. (2015)
                       PerformanceAnalytics    Peterson and Carl (2014)
                       QRM                     Pfaff and McNeil (2014)
                       quantmod                Ryan (2015)
                       sandwich                Lumley and Zeileis (2015)
                       texreg                  Leifeld (2015)
                       timeSeries              Rmetrics Core Team et al. (2015)
                       tseries                 Trapletti and Hornik (2016)
                       TTR                     Ulrich (2015)

Table 1: R packages used in this paper.

3. Methodology

We slice our data set into 281 overlapping study periods. Each study period consists of a twelve-month initialization, a twelve-month formation, and a six-month out-of-sample trading period. Consequently, we have a total of 281 trading periods, of which six overlap and run in parallel. Their resulting returns are averaged in the sense of Gatev et al. (2006), thus consolidating the six portfolio returns to one final return time series. For each study period $j$, we consider a total of $n_j$ stocks that (i) are index constituents on the last day of the formation period and (ii) exhibit full historical price data, i.e., no missing values (NAs).

The initialization period (subsection 3.1) is two-staged. The partner selection (subsection 3.1.1) deals with four different approaches for obtaining the most suitable partner stocks. Every approach is based on a different measure of association and emphasizes different aspects of the joint four-dimensional dependence structure. The model fit (subsection 3.1.2) characterizes four different variants to adequately describe this multivariate dependence structure.
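The study-period layout introduced at the beginning of this section can be checked with a short sketch. It is written in Python for illustration only (the paper's analyses are conducted in R), and it assumes that consecutive study periods are shifted by one month, which the text does not state explicitly but which is consistent with six trading periods running in parallel. The helper names are hypothetical.

```python
def month_range(start, end):
    """List all (year, month) tuples from start to end, both inclusive."""
    (year, month), last = start, end
    months = []
    while (year, month) <= last:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def study_periods(months, initialization=12, formation=12, trading=6):
    """Slide a window of initialization + formation + trading months
    over the sample, advancing by one month (an assumption)."""
    length = initialization + formation + trading
    periods = []
    for start in range(len(months) - length + 1):
        window = months[start:start + length]
        split = initialization + formation
        periods.append({
            "initialization": (window[0], window[initialization - 1]),
            "formation": (window[initialization], window[split - 1]),
            "trading": (window[split], window[-1]),
        })
    return periods


# Sample of return data as stated in section 2: January 1990 to October 2015.
months = month_range((1990, 1), (2015, 10))
periods = study_periods(months)
```

Under these assumptions, the 310 sample months yield 310 - 30 + 1 = 281 study periods, and the first trading period starts in January 1992, in line with the evaluation period reported in the abstract.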
At first, as a reference case, the naive E-model is created, which only incorporates past returns of the target stock in the mispricing index and thus neglects all partner stocks. Then, we construct the G-model and the T-model, relying on the multivariate Gaussian distribution and the multivariate t-distribution for identifying mispricings of the target stock relative to its partner stocks. Finally, we benchmark these implementations against the V-model, making use of highly flexible vine copulas for capturing multivariate mispricings.

The formation period (subsection 3.2) is used for creating one out-of-sample mispricing index per model for each target stock. Then, all mispricing indices per model type are ranked based on their augmented Dickey-Fuller (ADF) test statistics in ascending order. Afterwards, all models are re-estimated based on the new return data of the formation period, to achieve an updated calibration for the out-of-sample trading period (subsection 3.3). The top $r$ mispricing indices per model type are continued in the trading period and serve as trading signals for the corresponding top $r$ target stocks. Specifically, a stock is bought (sold short) for each of the strategy variants when its mispricing index falls below (exceeds) certain threshold levels. The models are called strategies in the trading process, i.e., the E-model corresponds to the E-strategy (E-strat), the G-model corresponds to the G-strategy (G-strat), the T-model corresponds to the T-strategy (T-strat), and the V-model corresponds to the V-strategy (V-strat).

3.1. Initialization period

3.1.1. Partner selection

The partner selection procedure aims at identifying a partner triple for each target stock, based on adequate measures of association. All four stocks together (one target stock and its three suitable partners) form the quadruple $Q$.
Given that every stock of the S&P 500 is consecutively considered as target stock, we effectively create $n_j$ such quadruples, which are logged in an $(n_j \times 4)$ output matrix.

We would like to make two preliminary remarks. First, all measures of association are calculated using the ranks of the daily discrete returns $X$ of our samples. The rank transformation provides some robustness, since the impact of large values (outliers) is reduced by only considering the position within the ordered sample, not the value itself. Second, we only take into account the top 50 most highly correlated stocks (approximately 10 percent of available stocks $n_j$) for a given target as potential partner stocks, in order to limit the computational burden. This bivariate preselection speeds up the required calculation time by a factor of 1,000.

Traditional approach. A natural way of describing bivariate linear dependence between two variables is correlation. As baseline approach, the high-dimensional relation between the four stocks is approximated by their pairwise bivariate correlations via Spearman's $\rho$. In addition to the robustness obtained by rank transformation, it allows us to capture nonlinearities in the data to a certain degree. Also, we ensure consistency with the other three approaches, which are equally calculated on ranks.

The procedure itself is rather simple. First, we calculate the sum of all pairwise correlations for all possible quadruples, consisting of a fixed target stock and one of the $\binom{50}{3}$ triples of partner stocks. Second, the quadruple with the largest sum of pairwise correlations is considered as $Q$ and saved to the output matrix.

Extended approach. Schmid and Schmidt (2007) introduce multivariate rank-based measures of association. We rely on a measure that generalizes Spearman's $\rho$ to arbitrary dimensions, a natural extension of the traditional approach.
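To make the baseline concrete before turning to the multivariate measures, the traditional approach can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' R implementation: the function names, the dictionary-based input, and the toy series are hypothetical, the preselection is taken to rank candidates by their Spearman correlation with the target, and constant return series (zero variance) are not handled.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_partners(returns, target, preselect=50):
    """Return the quadruple with the largest sum of pairwise Spearman
    correlations, searching over triples of preselected candidates."""
    ranked = {name: ranks(series) for name, series in returns.items()}
    rho = {}
    for a, b in combinations(ranked, 2):
        rho[a, b] = rho[b, a] = pearson(ranked[a], ranked[b])
    # Bivariate preselection: keep the candidates most correlated with the target.
    candidates = sorted(
        (s for s in returns if s != target),
        key=lambda s: rho[target, s],
        reverse=True,
    )[:preselect]
    best_quad, best_sum = None, float("-inf")
    for triple in combinations(candidates, 3):
        quad = (target,) + triple
        total = sum(rho[a, b] for a, b in combinations(quad, 2))  # six pairs
        if total > best_sum:
            best_quad, best_sum = quad, total
    return best_quad, best_sum


# Hypothetical toy data: B, C, D move monotonically with A; E moves against it.
toy = {
    "A": [0.01, 0.02, 0.03, 0.04, 0.05, 0.06],
    "B": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "C": [1, 4, 9, 16, 25, 36],
    "D": [-6, -5, -4, -3, -2, -1],
    "E": [6, 5, 4, 3, 2, 1],
    "F": [3, 1, 2, 6, 4, 5],
}
quad, total = select_partners(toy, "A", preselect=4)
```

With 50 preselected candidates, the inner loop visits 19,600 triples per target stock, which is what keeps the exhaustive search tractable.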
In contrast to the strictly bivariate case, this extended approach, like the two following approaches, directly reflects multivariate dependence instead of approximating it by pairwise measures only. We expect a more precise modeling of high-dimensional association and thus a better performance in trading strategies.

$Q$ for a given target stock is obtained by the following procedure: build every quadruple out of the $\binom{50}{3}$ possible combinations containing the target stock, calculate the multivariate version of Spearman's $\rho$ for each quadruple, and select for $Q$ the one with the largest value.

Geometric approach. We introduce an intuitive geometric approach for measuring multivariate association in order to select $Q$. For the sake of clarity, we illustrate this measure in the bivariate case. A generalization to higher dimensions is straightforward.

Consider the relative ranks of a bivariate random sample, where every observation takes on values in the $[0,1] \times [0,1]$ square. If there exists a perfect linear relation between the ranks of both components of the sample, a plot of the relative ranks would result in a perfect line of dots between the points $(0,0)$ and $(1,1)$, the diagonal line. However, if this relation is not perfectly linear, at least one point differs from the diagonal. By dropping a perpendicular from that deviating point to the diagonal, one could calculate the Euclidean distance of the deviation. The more the relative ranks deviate from the diagonal, the larger the sum of all their respective deviations. This sum can be used as a measure of deviation from linearity, the diagonal measure.

Hence, we try to find the quadruple $Q$ that leads to the minimal value of the sum of Euclidean distances from the relative ranks to the (hyper-)diagonal in four-dimensional space for a given target stock. As such, we calculate the four-dimensional diagonal measure for each of the $\binom{50}{3}$ combinations of partner stocks. The target stock together with the triple that induces the lowest value of the diagonal measure is saved as $Q$ in the output matrix.

Extremal approach. Mangold (2015) proposes a nonparametric test for multivariate independence. The resulting $\chi^2$ test statistic can be used to measure the degree of deviation from independence, i.e., the degree of dependence. The main focus of this measure is the occurrence of joint extreme events. A disproportionately high or low occurrence of joint extreme events inflates the measure. $Q$ is the combination of the target stock together with the triple of partner stocks that maximizes this extremal measure. With this approach, we focus more on the multivariate extremal regions of the unit cube, since those events are crucial for any kind of trading strategy.

Similar to the geometric approach, the partner selection operates as follows: for a given target stock, we calculate the extremal measure for every combination of the $\binom{50}{3}$ possible partner triples. The combination that leads to the largest value of the extremal measure is considered as $Q$ and saved to the output matrix.

It is important to highlight the differences between the four approaches. The traditional, the extended, and the geometric approach share a common feature: they measure the deviation from linearity in ranks. All three aim at finding the quadruple that behaves as linearly as possible to ensure that there is an actual relation between its components to model. While it is true that this aspiration for linearity excludes quadruples with components that are not connected (say, independent), it also rules out nonlinear dependencies in ranks. On the other hand, the extremal approach tries to maximize the distance to independence with focus on the joint extreme observations. This includes both linear and nonlinear relations among the components of $Q$. Since two of our introduced models (T-model and V-model)