
High Frequency Statistical Arbitrage via the Optimal Thermal Causal Path PDF

40 Pages·2012·0.71 MB·English


High Frequency Statistical Arbitrage via the Optimal Thermal Causal Path

V L Raju Chinthalapati
Department of Mathematics, London School of Economics
Department of Accounting & Finance, University of Greenwich
6th October 2011

Electronic copy available at: http://ssrn.com/abstract=2033172

Abstract

We consider the problem of identifying similarities and causality relationships in a given set of financial time series data streams. We develop further the "Optimal Thermal Causal Path" [28, 27] method, which is a non-parametric method proposed by Sornette et al. The method considers the mismatch between a given pair of time series in order to identify the expected minimum energy path lead-lag structure between the pair. Traders may find this a useful tool for directional trading, to spot arbitrage opportunities. We add a curvature energy term to the method and we propose an approximation technique to reduce the computational time. We apply the method and approximation technique on various market sectors of NYSE data and extract the highly correlated pairs of time series. We show how traders could exploit arbitrage opportunities by using the method.

Keywords: Statistical Arbitrage, Time-series Classification, Optimal Thermal Causal Path.

1 Introduction

The success of a trader in the stock market depends on various factors, including how well the trader predicts the market direction. A technical trading strategy is nothing but a set of investment decisions based on the patterns in stock prices. Both the academic literature and the practitioner literature report mixed results for the performance of such trading strategies. Lo et al. [19] study the process of identifying the most popular price patterns that are commonly used by practitioners and evaluate the performance of the associated trading strategies. For most traders, directional trading is an attractive trading strategy for making large profits [16].
Directional trading can be defined as opening a position (which means buying or selling) to take advantage of an expected price move in a security. It is essentially taking advantage of the forward price trends in securities. Directional trading involves a bet on the price direction of a given security. Traders will benefit from a rise in the price if they are long (buying) or from a decline in the price if they are short (selling). Trading strategies based on causality relation patterns between two time series are considered to be one of the important classes of directional trading strategies.

Limited academic literature is available on trading strategies based on causality relations. Brooks et al. [6] examine the lead-lag relationship between the FTSE 100 index and index futures prices using various time series models and use the predictive ability to derive a trading strategy. Shan et al. [26] study the causality relation between news and financial instruments' trading activities using data mining techniques in order to build trading models. Kleinberg et al. [18] study causal relationships in stock returns with temporal logic based methods and apply the methods to construct optimal trading rules. Directional trading strategies offer excellent returns (with risk, of course) [22]. The notable aspect of this trading strategy is that it tends to be most profitable during volatile conditions [1]. Hedge funds and proprietary trading desks of investment banks have identified directional trading as a profitable arbitrage strategy [2].

Informally, according to a directional trading strategy, we buy a security if we know the security price is going up, and we sell if we know the security price is going down. Although it seems simple, the non-trivial question is "how do we know the direction of a given security price?"
One possible answer to this question is to identify a security that could "lead" the given security, meaning that the price of the given security follows (approximately) the price trend of the leading one. Now, the next level of non-trivial question is "how do we find a security that leads the given security price?" It turns out that we have to identify causality relations between pairs of securities. Good causal relationships can be identified between a pair of similar time sequences. Various distance metrics can be used for similarity classification [17], but the most popular one is the Euclidean metric.

Identifying similar time series has many applications in fields like engineering, medicine, economics and finance. Numerous algorithms have been proposed for time series classification, for example [17, 23, 24]. Among all these algorithms, Dynamic Time Warping (DTW) enjoys the best reputation [29]. One key application of identifying similar time series is directional trading. In this paper, we consider multiple series which evolve over time, for example, stock prices. Filtering similar time series involves (a) extracting all pairs of similar time series and (or) (b) extracting all series similar to a specified series from a collection of time series.

For a given pair of time series $X = (x(0), x(1), \ldots, x(N-1))$ and $Y = (y(0), y(1), \ldots, y(M-1))$, in DTW we find a mapping function between $X$ and $Y$ that satisfies certain conditions. We consider the similarity measure under the Euclidean metric. In addition to minimising the cumulative Euclidean metric between the pair of time series, the other required conditions are (1) boundary conditions, (2) continuity, and (3) monotonicity. The smaller the cumulative Euclidean metric, the more similar the two time series are.
The mapping function $\varphi : X \to Y$ between the time series describes the causality relation, which can be considered as the lead-lag structure between the pair of time series. In reality the time series data is noisy and it might have unrelated patterns that could lead to wrong conclusions on the causality relation. This motivates us to find the expected causality relationship among the given set of time series data. We use the "Optimal Thermal Causal Path" [28, 27] method in order to identify the expected causality relation. We add a curvature energy term to the method to improve the accuracy of the causality relation.

Extracting a similar pair of time series from huge data sets is a computationally intensive process. In order to reduce the computational time, we propose an upper bounding measure that is similar to the lower bound of the fast search method for dynamic time warping (FTW) [24]. We propose an approximation scheme that reduces the computational time for NYSE TAQ data sets. From NYSE TAQ data sets, we extract the highly correlated stocks and conduct extensive experiments. We show how traders could exploit arbitrage opportunities by using the method.

The organisation of the rest of the paper is as follows. Section 2 discusses related work. Section 3 covers the required background material. In Section 4 we show how one can use the modified optimal thermal causal path method for directional trading. Section 5 describes the approximation scheme to reduce the computational time. Section 6 reviews the experiments for statistical arbitrage trading.

2 Related Work

A pure arbitrage is a financial transaction practice with no net investment that makes a strictly positive payoff with no possibility of a negative payoff. In reality, even in the ideal case, it is rare to find pure arbitrage. As a relaxation of pure arbitrage, we can define expectations arbitrage or statistical arbitrage, which is not risk-less.
A statistical arbitrage is a financial transaction practice with no net investment that makes a strictly positive expected payoff with no possibility of a negative payoff. But the Efficient Market Hypothesis (EMH) states that there will not be any pricing anomalies and that prices in the market fully reflect the available market information at any time. In the literature, we find a lot of empirical research that relates to the EMH [25, 9, 10]. According to the EMH, we cannot predict price movements. Random walk models for price movements were developed and their history can be traced back to Bachelier's Ph.D. thesis in 1900 [4]. Empirical research suggested that market prices can be partially predicted [20].

In the academic literature we find quite a few scholarly works related to statistical arbitrage [14, 12, 5, 7]. Bondarenko [5] proves that if the pricing kernel is path dependent, then no statistical arbitrage opportunities exist. Burgess [7] introduces general directions in which cointegration analysis can be generalised to statistical arbitrage. Getmansky and Lo [12] explain the limits of arbitrage and why some statistical arbitrage opportunities might not be exploitable by a small hedge fund. Hogan et al. [14] describe how to verify the existence of statistical arbitrage and argue against the EMH. They introduce a concept of statistical arbitrage that tests the efficiency of the market without specifying an equilibrium model.

Statistical arbitrage techniques were first used at Morgan Stanley in 1980 [11]. Popular mathematical concepts [8, 19] used in statistical arbitrage are time series analysis methods, neural networks and pattern recognition methods, particle physics concepts, etc. In this paper we use the particle physics concepts of free energy and energy minimisation [28, 27].

The most popular technique of statistical arbitrage is pairs trading [11]. It is essentially a mean reverting version of statistical arbitrage.
In pairs trading, the arbitrageur identifies a pair of securities whose price difference (spread) fluctuates around a constant mean (that is, the spread is believed to be mean reverting). If at some time the price difference deviates (by increasing or decreasing) from the believed constant mean, then the arbitrageur buys the underpriced security and short sells (sells a security that has been borrowed) the overpriced one. The hope of the arbitrageur is that the price difference will converge back to the believed constant and he/she will make a profit. It is difficult to predict how long the divergence trend will continue. If the divergence trend continues for a long time, the arbitrageur should either invest more or 'close' the positions. The arbitrageur might lose the game if he/she does not have enough capital to cover the expenditure for such a long time. As John Maynard Keynes is often quoted as saying: "Markets can remain irrational longer than you can remain solvent". The limits of the pairs trading technique of arbitrage are explained by Getmansky and Lo [12].

Since contrarian investment strategies are based on the mean reversion principle, the profits from pairs trading might be merely a disguised way of exploiting the previously documented profits of contrarian strategies. Gatev et al. [11] study the profitability of pairs trading in the U.S. equity market by considering daily data from 1962 to 2002. Their bootstrap results suggest that the pairs trading effect differs from previously documented mean reversion profits. In a single study, Nath [21] documents the pairs trading profitability of U.S. treasury bills, notes and bonds. Andrade et al. [3] study why the prices of similar stocks diverge by considering pairs trading as a framework.

Directional trading involves a bet on the price direction of a given security. Directional trading does not come under mean reversion based statistical arbitrage, as it depends on the lead-lag relation between the pair of securities.
But in this paper, we predict the direction of a given security price by understanding its causality relationship with some other security price in a statistical sense; that is, we consider the expected causality relation. Since we use the expected causality relation for betting on the price, the returns are not risk-free. Unfortunately, not much academic literature is available on the profitability and performance of these kinds of directional trading strategies. For example, Jorda and Taylor [15] study the performance of directional trading strategies and verify the claim that directional forecasts beat coin toss strategies.

3 A Prelude

3.1 Time Series Classification

A time series is a sequence of measurements for a variable at different time steps. Usually these measurements are taken at uniform time intervals. For time series analysis we consider large time series databases, for example all stock prices of the NYSE. For the given sample of a pair of time series,

$$X = (x(0), x(1), \ldots, x(N-1)) \tag{1}$$

$$Y = (y(0), y(1), \ldots, y(M-1)) \tag{2}$$

we try to figure out the similarities between them.

Similarity search helps in (a) extracting all pairs of similar time series (classification), (b) extracting all series similar to a specified series from a collection of time series (clustering) and (c) extracting causality relationships among the set of time series (association rules) [17]. Quantification of similarity can be considered as domain specific and subjective. A similar pair of time series has minimum dissimilarity, and the simplest way to compute dissimilarity is by the Euclidean distance metric. For a pair of time series $X$ and $Y$ of equal length ($M = N$),

$$D(X, Y) = \sqrt{\sum_{i=0}^{N-1} \bigl(x(i) - y(i)\bigr)^2} \tag{3}$$

The above Euclidean distance metric gives the total dissimilarity between $X$ and $Y$ based on a one-to-one alignment of the pair of time series. Even though the given two time series are similar to each other, they may still have a phase difference.
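As a minimal sketch of equation (3), assuming NumPy is available, the one-to-one Euclidean distance can be computed as follows. The function name `euclidean_distance` and the toy series are illustrative choices, not part of the paper. The example also shows the weakness noted above: a series and a phase-shifted copy of itself are assigned a non-zero distance.

```python
import numpy as np


def euclidean_distance(x, y):
    """Equation (3): one-to-one Euclidean distance between equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("equation (3) requires series of equal length (M = N)")
    return float(np.sqrt(np.sum((x - y) ** 2)))


# Hypothetical toy data: a periodic series and a copy shifted by one step.
series = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
shifted = np.roll(series, 1)

same = euclidean_distance(series, series)          # identical series
out_of_phase = euclidean_distance(series, shifted) # same shape, different phase
```

Here `same` is zero while `out_of_phase` is strictly positive, even though the two series have the same shape. This is the motivation for the nonlinear alignments discussed next.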
Dynamic Time Warping (DTW) is the most popular similarity measure technique that can incorporate nonlinear alignments [23]. When we are trying to capture a causality relationship, for example the lead-lag structure between two stock prices, we need to construct nonlinear alignments so that we can identify the appropriate lead-lag relation even if there exists a phase difference.

In DTW, we find the best possible alignment warp [17] $\varphi$ between $X$ and $Y$. For that we define a local distance measure $e(x(i), y(j)) = |x(i) - y(j)|$ and do a systematic comparison between all values of the given time series $X$ and $Y$.

[Figure 1: DTW. Grid with $x(1), x(2), \ldots, x(N)$ along the $t_1$ axis and $y(1), y(2), \ldots, y(M)$ along the $t_2$ axis.]

Figure 1 shows a two dimensional grid where each node $(x(i), y(j))$ is assigned a value that is equivalent to $e(x(i), y(j))$. As an analogy, $e(x(i), y(j))$ is considered as the local energy mismatch at the nodes of the grid for the given pair of time series. Extending the analogy further, one can consider the grid as the energy matrix for the given pair of time series. The causality relationship between $X$ and $Y$ can be explained by using the mapping from $X$ to $Y$ (the "warping path" between $X$ and $Y$). The dissimilarity $E_\varphi(X, Y)$ for the given mapping function $\varphi$ is the cumulative sum of the local energies

$$E_\varphi(X, Y) = \sum_i e\bigl(x(i), y(\varphi(i))\bigr). \tag{4}$$

Since we need to minimise the dissimilarity between $X$ and $Y$, we simply search for the mapping achieving minimum energy among all possible mappings in the energy grid. Let

$$E(X, Y) = \min_{\varphi(t_1),\; t_1 = 0, 1, \ldots, N-1} \; \sum_{t_1=0}^{N-1} \bigl|x(t_1) - y(\varphi(t_1))\bigr|. \tag{5}$$

We impose some constraints on $\varphi$ for this optimisation problem: (1) end constraints: $\varphi(0) = 0$ and $\varphi(N-1) = M-1$ (where $M$ is a parameter with $M \le N$, considered to be fixed for the moment) and (2) monotonicity and smoothing constraints: $0 \le \varphi(t_1 + 1) - \varphi(t_1) \le 1$. Note that the mapping $\varphi$ can be a multi-valued function, and in that case it is difficult to interpret.
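To make the energy grid and equation (4) concrete, here is a small sketch, assuming NumPy. It builds the matrix of local energies $e(x(i), y(j))$ and evaluates $E_\varphi(X, Y)$ for a given single-valued mapping. The helper names `energy_matrix` and `path_energy`, and the toy data, are hypothetical.

```python
import numpy as np


def energy_matrix(x, y):
    """Local energies e(x(i), y(j)) = |x(i) - y(j)| for every grid node (i, j)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Broadcasting an (N, 1) column against a (1, M) row gives the N x M grid.
    return np.abs(x[:, None] - y[None, :])


def path_energy(x, y, phi):
    """Equation (4): cumulative local energy along the mapping i -> phi[i]."""
    e = energy_matrix(x, y)
    return float(sum(e[i, phi[i]] for i in range(len(phi))))


# Toy example with N = 3 and M = 2 (so M <= N).
x = [1.0, 2.0, 3.0]
y = [1.0, 3.0]
# Both mappings satisfy phi(0) = 0, phi(N-1) = M-1 and steps of 0 or 1.
energy_a = path_energy(x, y, [0, 0, 1])  # |1-1| + |2-1| + |3-3|
energy_b = path_energy(x, y, [0, 1, 1])  # |1-1| + |2-3| + |3-3|
```

Both admissible mappings have the same energy in this toy case, which illustrates that the minimiser in equation (5) need not be unique.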
In order to ensure that $\varphi$ is a well-defined function that expresses the dependence relation between the two time series, we may follow the convention of [28, 27] and map $t_1$ to the largest value $t_2$ of the vertical segment corresponding to $t_1$. For instance, in Figure 1 we have $\varphi(3) = 2$.

Note that, subject only to the second constraint, we have some flexibility in imposing the end constraints. The smoothing constraint implies that (1) a path that starts at $(0, 0)$ will end at $(N-1, \varphi(N-1))$, where $\varphi(N-1) \le N-1$, and (2) a path that starts at $(k, \varphi(k))$ will end up at $(N-1, r)$, where $r \le \varphi(k) + N - 1 - k$.

In reality, the lead-lag relation between two time series could interchange. That means the leading and lagging roles can interchange between the given time series $X$ and $Y$. The solution to the above optimisation problem (5) is based on the following dynamic programming recursion for a given set of arbitrary end points. Let $E(t_1, t_2)$ be the minimum accumulated energy between the starting grid point $(0, 0)$ and the ending grid point $(t_1, t_2)$. Then

$$E(t_1, t_2) = e\bigl(x(t_1), y(t_2)\bigr) + \min\bigl[E(t_1 - 1, t_2),\; E(t_1, t_2 - 1),\; E(t_1 - 1, t_2 - 1)\bigr], \tag{6}$$

where $t_2 = \varphi(t_1)$.

3.2 Optimal Thermal Causal Path

Unfortunately, time series are not noise free, and the above method may extract unrelated structures between $X$ and $Y$. Another problem is that most financial time series are not strictly stationary, and the lead and lag roles would dynamically change between the given pair of stocks. One can use Granger causality methods [13], but these tests require a substantial amount of data. Sornette et al. [28, 27] choose an interesting approach to predict causality relations by using statistical physics techniques.

In the above optimisation method (5), Sornette et al. [28, 27] consider a weighted average over many potential mappings (which have more energy than the optimal path) around the optimal path.

[Figure 2: Optimal thermal causal path]
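The dynamic programming recursion (6) from Section 3.1 can be sketched as below, assuming NumPy. The function name `accumulated_energy` is hypothetical. The sketch implements recursion (6) exactly as written, with the three predecessor moves, and does not additionally enforce the stricter smoothing constraint $0 \le \varphi(t_1+1) - \varphi(t_1) \le 1$.

```python
import numpy as np


def accumulated_energy(x, y):
    """Fill the table E(t1, t2) of recursion (6), starting from node (0, 0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    e = np.abs(x[:, None] - y[None, :])  # local energy at every node
    acc = np.full((n, m), np.inf)
    acc[0, 0] = e[0, 0]
    for t1 in range(n):
        for t2 in range(m):
            if t1 == 0 and t2 == 0:
                continue
            # Predecessors outside the grid are treated as unreachable.
            best_previous = min(
                acc[t1 - 1, t2] if t1 > 0 else np.inf,
                acc[t1, t2 - 1] if t2 > 0 else np.inf,
                acc[t1 - 1, t2 - 1] if t1 > 0 and t2 > 0 else np.inf,
            )
            acc[t1, t2] = e[t1, t2] + best_previous
    return acc


# A series and a copy with one repeated value align with zero energy.
table = accumulated_energy([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
minimum_energy = table[-1, -1]
```

The entry `table[-1, -1]` is the minimum accumulated energy between the corner nodes. Recovering the warping path itself would require backtracking through the table, which is omitted here for brevity.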
The weight for a mapping $\varphi$ is proportional to $e^{-E_\varphi / T'}$, where $T'$ describes the allowed deviation from the minimum energy path. This amounts to finding, for each node $(t_1, s)$ on the grid, the probability $P(t_1, s)$ that a path passes through $s$ at time $t_1$. Here,

$$P(t_1, s) \propto \sum_l e^{-E_{sl} / T'}$$

where $E_{sl}$ represents the accumulated energy of a path $l$ between $(0, 0)$ and $(t_1, s)$. Equivalently,

$$P(t_1, s) = \hat{\omega} \sum_l e^{-E_{sl} / T'}$$

where $\hat{\omega}$ is the proportionality constant. Then the position of the optimal thermal causal path at $t_1$ is

$$\hat{s}(t_1) = \sum_{s=0}^{M-1} s \, P(t_1, s). \tag{7}$$

The sequence $(\hat{s}(0), \hat{s}(1), \ldots, \hat{s}(N-1))$ represents the optimal thermal path trajectory on the grid, as shown in Figure 2.

Let $G(t_1, s) = \sum_l e^{-E_{sl} / T'}$ and $G(t_1) = \sum_s G(t_1, s)$. Since $\sum_s P(t_1, s) = 1$ and

$$\hat{\omega} = \frac{1}{\sum_s G(t_1, s)}, \tag{8}$$

it can be verified that

$$P(t_1, s) = \frac{G(t_1, s)}{G(t_1)} \tag{9}$$

and

$$\hat{s}(t_1) = \sum_{s=0}^{M-1} s \, \frac{G(t_1, s)}{G(t_1)}. \tag{10}$$

In statistical mechanics, $G(t_1, s)$ and $G(t_1)$ are known as the partition function and the total partition function respectively. Since all paths reaching $(t_1 + 1, s + 1)$ must pass through one of the points $(t_1, s)$, $(t_1, s + 1)$ and $(t_1 + 1, s)$, it can be verified that the partition function satisfies the following recursive relation:

$$
\begin{aligned}
G(t_1 + 1, s + 1) ={}& \sum_{\text{all paths } l \text{ between } (0,0) \text{ and } (t_1, s)} \exp\!\left(\frac{-\bigl(E_l + e(x(t_1 + 1), y(s + 1))\bigr)}{T'}\right) \\
&+ \sum_{\text{all paths } l \text{ between } (0,0) \text{ and } (t_1, s+1)} \exp\!\left(\frac{-\bigl(E_l + e(x(t_1 + 1), y(s + 1))\bigr)}{T'}\right) \\
&+ \sum_{\text{all paths } l \text{ between } (0,0) \text{ and } (t_1 + 1, s)} \exp\!\left(\frac{-\bigl(E_l + e(x(t_1 + 1), y(s + 1))\bigr)}{T'}\right) \\
={}& \bigl[G(t_1, s) + G(t_1, s + 1) + G(t_1 + 1, s)\bigr] \exp\!\left(\frac{-e(x(t_1 + 1), y(s + 1))}{T'}\right)
\end{aligned}
$$

where $E_l$ represents the energy of a path $l$ between the given end points.

Synthetic Example: Let us consider a synthetic example and study the lead-lag relation. Figure 3 shows the daily price dynamics of stock $X$ over a period of time. Assume that there exists a stock $Y$ that is lagging behind $X$ by $L$ (some constant) units of time (of course, in reality it is almost impossible to find such a pair of stocks!).
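As an aside before the synthetic example continues, the partition-function recursion and equation (10) can be sketched as follows, assuming NumPy. This is an illustrative reading of the formulas above, not the author's implementation. The function name `thermal_causal_path`, the ramp-shaped test series and the value of the temperature are assumptions. Each finished row of $G$ is rescaled to sum to one. Because the recursion is linear, this leaves $P(t_1, s)$ in equation (9) unchanged and only guards against numerical overflow or underflow.

```python
import numpy as np


def thermal_causal_path(x, y, temperature):
    """Return s_hat(t1) of equation (10) for every t1 (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Boltzmann factor exp(-e(x(t1), y(s)) / T') at every grid node.
    weight = np.exp(-np.abs(x[:, None] - y[None, :]) / temperature)
    g = np.zeros((n, m))
    for t1 in range(n):
        for s in range(m):
            if t1 == 0 and s == 0:
                incoming = 1.0  # the single path consisting of node (0, 0)
            else:
                incoming = 0.0
                if t1 > 0:
                    incoming += g[t1 - 1, s]
                if s > 0:
                    incoming += g[t1, s - 1]
                if t1 > 0 and s > 0:
                    incoming += g[t1 - 1, s - 1]
            g[t1, s] = incoming * weight[t1, s]
        # Rescale the finished row: g[t1, s] now equals P(t1, s) of equation (9).
        g[t1] /= g[t1].sum()
    return g @ np.arange(m)  # equation (10): sum over s of s * P(t1, s)


# Noise-free toy pair in which y lags x by `lag` steps: y(t) = x(t - lag).
lag = 3
ramp = np.arange(43, dtype=float)
x = ramp[lag:]
y = ramp[:-lag]
s_hat = thermal_causal_path(x, y, temperature=0.5)
offset = s_hat[10:30] - np.arange(10, 30)  # estimated t2 - t1 away from the edges
```

In this toy setting the values in `offset` stay close to the imposed lag away from the edges of the grid. A lower temperature concentrates the weights on the minimum energy path, while a higher temperature averages over more neighbouring paths.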
Then the price dynamics of $Y$ can be expressed as $Y(t) = X(t - L)$. Since $Y(t_2) = X(t_2 - L)$ matches $X(t_1)$ exactly when $t_2 - L = t_1$, the lead-lag relation should be $t_2 = t_1 + L$. For this trivial synthetic example, if we apply the optimal thermal causal path method in order to get the lead-lag relation between $X$ and $Y$, then the thermal path must be (since there is no noise in the time series) $t_2 = \hat{s}(t_1) = t_1 + L$.

While considering the lead-lag relation between two time series, it is convenient to translate the co-ordinates from $(t_1, t_2)$ to $(\tilde{x}, t)$, where $\tilde{x} = t_2 - t_1$ and $t = t_2 + t_1$. Since $\tilde{x} = t_2 - t_1 = L$, the transformation makes the lead (or lag) between the two time series directly visible. Now, the optimal thermal causal path equation (10) can be expressed [28, 27] as
