JournalofDiscreteAlgorithms30(2015)1–12 ContentslistsavailableatScienceDirect Journal of Discrete Algorithms www.elsevier.com/locate/jda Range search on tuples of points ∗ Akash Agrawal, Saladi Rahul, Yuan Li, Ravi Janardan Dept. of Computer Science and Eng., Univ. of Minnesota-Twin Cities, 4-192 Keller Hall, 200 Union St. S.E., Minneapolis, MN 55455, USA a r t i c l e i n f o a b s t r a c t Article history: Range search is a fundamental query-retrieval problem, where the goal is to preprocess Received 2 April 2014 a given set of points so that the points lying inside a query object (e.g., a rectangle, or Received in revised form 20 October 2014 a ball, or a halfspace) can be reported efficiently. This paper considers a new version of AAvccaeilpatbelde o2n1l iOncet o1b1e rN o2v0e1m4ber 2014 tohf et hrea nsgeet ss. eTahrech g oparol bisle tmo :p Sreevperoracle sdsi stjhoein st estest ss oo ft hpaoti nfotsr aanrey gqiuveerny aplooningt wqiathn da na odridstearnincge δ, all the ordered sequences (i.e., tuples) of points can be reported (one per set and Keywords: Algorithms consistent with the given ordering of the sets) such that, for each tuple, the total distance Data structures traveled starting from q and visiting the points of the tuple in the specified order is no Computational geometry more than δ. The problem has applications in trip planning, where the objective is to Spatial query processing visit venues of multiple types starting from a given location such that the total distance Range search traveled satisfies a distance constraint. Efficient solutions are given for the fixed distance Geometric transformation and variable distance versions of this problem, where δis known beforehand or is specified as part of the query, respectively. ©2014 Elsevier B.V. Allrightsreserved. 1. Introduction 1.1. Motivation Range search is a fundamental query-retrieval problem. In this problem, we are given a set, S, of n points in Rd and for any query object q (e.g., a rectangle, or a ball, or a halfspace) we wish to report the points of S lying inside q. Since many queries may be asked, it is worthwhile investing effort to preprocess S into a suitable data structure (i.e., index) so that the query time is small. Thus, the resource measures of interest are the query time and the space used by the data structure. Due to its many applications in diverse domains such as spatial databases, robotics, VLSI design, CAD/CAM, computer graphics etc., the range search problem has been studied extensively in the computational geometry and database literature and space- and query time-efficient solutions have been devised for many instances of the problem (see, for instance, [2,3,8,20,22]). In this paper, we consider a new type of range search problem that is motivated by applications in trip planning. Consider the following scenario: Suppose that we are given a database that contains the locations of venues of different types (e.g., restaurants and theaters) in a large city. A tourist is interested in visiting a restaurant and a theater, in that order, but has a constraint on the maximum travel distance (due to reasons such as cost of travel, time, mileage restriction etc.). She wishes to identify a subset of (restaurant,theater) tuples, among the set of all such tuples, so that, for every tuple in the subset, the distance from her current location to the restaurant plus the distance from the restaurant to the theater is no * Corresponding author. E-mail addresses:[email protected](A.Agrawal), [email protected](S.Rahul), [email protected](Y.Li), [email protected](R.Janardan). http://dx.doi.org/10.1016/j.jda.2014.10.006 1570-8667/©2014 Elsevier B.V. Allrightsreserved. 2 A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 Fig.1.Locations of two different types of venues (shown as red and blue points). Point qis the query point. The disk with radius δrepresents the maximum travel distance constraint. (All figures in this paper are best viewed in color.) larger than the specified distance. Therefore, she issues such a query from her location using, say, an application running on her smartphone and gets back the set of (restaurant,theater) tuples satisfying the distance constraint. The returned set is usually much smaller than the set of all (restaurant,theater) tuples and enables the user to make a more informed choice of the tuple to visit. A naïve solution for this problem is to first find all the restaurants and theaters that are reachable from the query location within the specified travel distance and then check this set for the tuples to output. Unfortunately, this can be very expensive. For example, consider Fig.1which depicts a small data-set of restaurants (modeled as blue points bi in the plane) and theaters (red points rj). The user’s location is denoted by the query point q. Let us assume that the maximum travel distance, denoted by δ, is 4 units. This distance constraint is represented as a disk of radius δ centered at q. In this example, it is clear that even though points b1, b2, b3, r3, and r4 are all reachable within travel distance δ, only the (blue,red) tuples (b2, r3), (b3, r3), and (b3, r4) with total travel distance 3.4, 3.4, and 4, respectively, are part of the answer set. The other (blue,red) tuples formed by the points reachable from q within travel distance δ have total travel distance more than δ. (The other possible (blue,red) tuples are (b2, r4), (b1, r4), and (b1, r3) with total travel distance 5.7, 7 and 8.1, respectively.) Now consider a scenario where all the blue points lie on the perimeter of a disk centered at q and of radius δ and all the red points lie inside the disk. It is clear that all the points are reachable from q and yet the total travel distance for every (blue,red) tuple is larger than δ, so that the answer set is empty! This is the worst case scenario for the naive solution. Our search problem generalizes naturally to more than two types of venues (e.g., restaurants, theaters, gas stations, gro- cery stores, etc.) where the objective is to start from the query point, q, and visit one venue of each type, in a pre-specified order, so that the total distance traveled is no larger than the specified maximum travel distance δ. In fact, depending on the application there are two versions of the problem that are of interest. In the first version, δ is known beforehand (during preprocessing) while in the second version δ is known only at query time. (In both versions, q is known only at query time.) We will refer to these versions as the fixed distance and the variable distance problems, respectively. For example, some car rental companies place a restriction on the maximum distance a user can travel. In this case the maximum travel distance, δ, is fixed for all the users and is pre-specified. This is an instance of the fixed distance problem. On the other hand, for some applications, the distance constraint can be imposed by reasons like travel time, travel cost etc. In this scenario, the maximum travel distance, δ, depends on the user and is specified only during the query. This is an instance of the variable distance problem. One might suspect that the fixed distance problem might admit a more efficient solution than the variable distance problem and, as we will see, this is indeed the case. 1.2. Problem statement and contributions In what follows, it will be convenient to associate with each type of venue a distinct color and formulate our problem over colored point-sets in the plane. Let S be a set of n points in R2, where each point is assigned a color ci from (cid:2)a palette of m colors (m ≥2). Let Ci be the set of points of color ci and let ni=|Ci|. Note that Ci∩Cj=∅ if i (cid:6)= j and mi=1Ci=S. Let CS=(c1, ..., cm) be a given ordering of the colors. We wish to preprocess S into a suitable data structure so that fo(cid:3)r any query point q in R2 and a distance δ>0, we can report all tuples (p1, ..., pm), where pi∈Ci, such that d(q, p1) + mi=2d(pi−1, pi) ≤δ. (Here d(·, ·) denotes Euclidean distance.) Hereafter, we will call this the Colored Tuple Range Query (CTRQ) problem. Throughout, we will discuss the CTRQ problem for m =2 colors, which will henceforth be called the 2-CTRQ problem. As we will discuss in Section 4, our solutions for 2-CTRQ generalize to m >2 colors. For simplicity, we define the 2-CTRQ problem as follows. n1+Lent 2B=bne. Wa es ewt isohf btolu per epporionctes,s sB t h=e {sbe1t,s b, 2B, .a.n.,d b nR1,} ,s oa ntdh atR fobre aan ys eqtu eorf yr epdo inpto iqn t:s(,x qR, y=q){ r∈1,R r22, a.n..d, rdni2s}t,a ninc eR δ2>, w0h, ewree can report all tuples (bi, rj), such that d(q, bi) +d(bi, rj) ≤δ. In this paper, we present efficient solutions to the fixed and variable distance versions of theCTRQproblem. Our results include: A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 3 • An algorithm for the fixed distance 2-CTRQproblem that exhibits a trade-off between space and query time. Specifically, our algorithm uses O(nt) space and has a query time of O((1 +k) logn), where t is a user-specified integer parameter t that controls the trade-off, 1 ≤t≤logn, and k is the output size, i.e., the number of tuples reported. For instance, at the two extremes of the trade-off, i.e., t=1 (resp. t=logn), our algorithm uses O(n) (resp. O(n logn)) space and has aquery time of O((1 +k) logn) (resp. O(logn +k)). This is discussed in Section 2. • Several algorithms for the variable distance 2-CTRQ problem that exhibit√ different space and query time bounds. These include, for instance, an O(n)-space algorithm with a query time of O( nlogO(1)n +k logn) and an O(n log2n)-space algorithm with a query time of O((k +1) logn). In fact, if we allow probabilistic preprocessing then it is possible to reduce the query time to (deterministic) O(logn +k) while still using O(n log2n) space. These results are discussed in Section 3. (Moreover, as in the fixed distance problem, it is possible to obtain a general trade-off in terms of the parameter t, but we omit further discussion of this as it mirrors the approach for the fixed distance problem.) • Algorithms for theCTRQproblem for m >2colors. Specifically, for√ the fixed distanceCTRQproblem, our algorithm uses O(n) (resp. O(n log2n)) space and the query time is O(logn +km nlogO(1)n) (resp. O((1 +km) logn)). For √the variable distanceCTRQproblem, our algorithm uses O(n)(resp. O(n log2n)) space and the query time is O((1 + km) nlogO(1)n) (resp. O((1 +km) logn)). Also, if we allow probabilistic preprocessing then it is possible to reduce the query time for the fixed and the variable distance CTRQ problem to (deterministic) O(logn +k) while still using O(n log2n) space. These results are discussed in Section 4. (Note that the results for 2 colors are not a special case of the result for m >2colors. As will be seen in Section4, the reason is that the solutions for m >2colors involve solving the variable distance 2-CTRQ problem, instead of circular range search, in the second step.) 1.3. Approach and key ideas Our algorithms employ a 2-step approach. In the first step, we identify an appropriate subset of blue points from which to search for (blue,red) pairs to output (with respect to the query point, q, and query distance δ). In the second step we search from each such blue point and identify those red points that are reachable from q within distance δ and output the corresponding (blue,red) pairs. The identification of blue points in the first step needs to be done with care. Selecting too few of these will result in some valid (blue,red) pairs not being reported. On the other hand, selecting too many will lead to a high query time since many of them may not be part of any reported pair (as the example in Section 1.1shows). Thus, a key idea underlying our (exact) algorithms is to select efficiently precisely those blue points that are guaranteed to be part of at least one reported (blue,red) pair, with respect to the query point, q, and query distance δ. This, combined with the second step above, ensures that only those pairs that should be reported are actually reported. Moreover, time is not wasted in processing blue points that will not contribute to the output. We will refer to the blue points so selected as fruitfulpoints. A second key idea concerns tuning the query algorithm to the (unknown) output size. In general, a straightforward application of the 2-step method above does not necessarily result in the best possible query time, particularly if the output size is “small” since then the overall query time is dominated by other terms in the query time. To help balance these costs, we show how to identify a suitable threshold on the output size as a function of the input size, n, and how to query judiciously on the two sides of the threshold to achieve the desired balance; however, this improvement usually comes at the expense of more space. (We note that this is just a rough description of the approach and the general idea manifests itself in different ways in the fixed and the variable distance problems.) 1.4. Related work To the best of our knowledge, there is no prior work on theCTRQproblem. However, as mentioned earlier, range search is a well-studied problem and several theoretical and practical solutions have been proposed for different instances of this problem (see, for instance, [2,3,8,20,22]). The CTRQ problem is closely linked to the circular range query problem, whose goal is to preprocess a given set of n points in Rd so that the points inside a query ball can be reported efficiently. Due to applications in proximity queries, the circular range query problem has been studied extensively [20]. Furthermore, via a standard lifting transformation [9], the circular range query problem in Rd can be transformed into a halfspace range reporting problem in Rd+1. In [1], Afshani and Chan have shown that the halfspace range reporting problem in Rd, d ≥4, can be solved in O(n1−1/(cid:10)d/2(cid:11)logO(1)n +k)query time using an O(n)-space data structure, where k is the output size, which is the most efficient solution known so far. Moreover, as shown in [1], the halfspace range search problem in R3 (hence circular range search in R2) can be solved in O(n) space and O(logn +k) query time. In the context of spatial databases, the CTRQ problem is closely related to the optimal sequenced route (OSR) query [23] (also called multi-type nearest neighbor query [18]or transitive nearest neighbor query[25,26]). The goal of the OSR query is to preprocess the given m sets of points into a data structure such that given a query point and an order in which to visit the sets, we can report an m-tuple of points, one per set, that minimizes the total travel distance when visited in the specified order. The main difference between the CTRQ problem and the OSR problem is that, in the latter, the goal is to output only one tuple that minimizes the total travel distance, whereas in the former all m-tuples within a specified distance from the query point are to be reported. For the OSR problem, Sharifzadeh et al.[23]have proposed three 4 A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 algorithms, LORD, R-LORD, and PNE that utilize certain threshold values to filter out the non-candidate points. The proposed PNE technique can be adapted to report top-k optimal routes also. A version of the OSR problem where the order of visit to the sets is not specified is proven to be NP-hard and is known as the trip planning query (TPQ). In [15], Li et al. have developed approximation algorithms for the instance of TPQ where a source and a destination are specified. Kanza et al.[12] have shown polynomial-time heuristics for the constrained version of the TPQ problem. In [6], Chen et al. have proposed a generalization of the OSR query, called multi-rule partial sequenced route (MRPSR) query, where the order of traversal is defined by a partial order. Due to its practical applications, the OSR problem has been studied extensively and practical solutions have been proposed for many different versions of the problem such as OSR on road networks [17], k-OSR [24], interactive route search[13], traffic-aware OSR [14], and many others [4,10,11,16,19]. 2. Fixed distance 2-CTRQ qp o:i(nRxteqsc,,a yRllq )=t h∈{erR 1p,2 rro2a,bn .lde. m.d, risntth2a}an,t c inew Reδ ,2 ww, wies hhc easrnoe lvnree1p:+ oLrentt 2 a=Bll ntbu. epW laee s s we(btis ih,o r ft jo)b, lpusruee cphpr ootichneatssts, dtBh(q e=, bse{ib)t s1+,, bBd2(,ab .ni.,d .r ,jR b) n≤s1o}δ, t. ahNnadto tfeoR rt hbaanet y a iq nus eethtr yeo pffi oxrieenddt distance version, δ is known beforehand (during preprocessing). We first describe an algorithm that uses O(n) space and has a query time of O((k +1) logn), where k is the output size. We then show how to generalize this to a space-query time tradeoff. As mentioned in Section 1.3, the notion of fruitful blue points is central to our approach. Recall that a blue point is a fruitful point if it is part of at least one output tuple for a given q and δ. The following lemma provides a useful characterization of such points. Lemma 1. For any bi∈B, let rj∈Rbe its closest red point. Let qbe a query point and δthe (fixed) query distance. Then biis fruitful if and only if d(q, bi) +d(bi, rj) ≤δ. Proof. (⇐) If d(q, bi) +d(bi, rj) ≤δ then the tuple (bi, rj) is output. Hence, by definition, bi is fruitful. (⇒) Suppose that bi is fruitful. Thus, there is an rk∈R such that (bi, rk) is output. Thus, d(q, bi) +d(bi, rk) ≤δ. Since d(bi, rj) ≤d(bi, rk), we have d(q, bi) +d(bi, rj) ≤δ. (cid:2) In other words, bi is fruitful if (bi, rj) is part of the answer set, and not fruitful otherwise. Note that whether or not bi is fruitful (for a fixed δ) depends on q, i.e., bi may be fruitful for some q and may not fruitful for some other q. Lemma 1 suggests the following preprocessing step to help determine the fruitful point at query time. For each bi∈B, we compute its nearest red point rj∈R and associate with bi the real number d(bi, rj)as a weight w(bi). For a query point q, bi is fruitful if and only if d(q, bi) +w(bi) ≤δ, i.e., d(q, bi) ≤δ−w(bi), i.e., if and only if q lies inside the disk Di of radius δ−w(bi)centered at bi. Thus, we build a data structure on the disks Di that can identify the disks that are “stabbed” by q, and hence can identify the fruitful blue points. This facilitates the first step of our query algorithm. The second step is to identify for each fruitful blue point, bi, all the red points rk (not merely the nearest red point) that are reachable from q within a distance of δ. Each such rk satisfies the condition d(q, bi) +d(bi, rk) ≤δ, i.e., d(bi, rk) ≤ δ−d(q, bi), i.e., rk is in a disk of radius δ−d(q, bi) centered at bi. To facilitate the identification of such red points, rk, we build a data structure for circular range queries on the set, R, of red points. The above preprocessing steps are shown as Algorithm 1. As noted in Section 1.4, circular range search in R2 can be transformed to halfspace range search in R3 via the lifting map [9]. Likewise, disk stabbing can also be transformed to halfspace range search in R3 using the lifting map followed by geometric duality[9]. Therefore, the structure DSf and DSt in Algorithm1can be implemented as the halfspace range search structures developed in [1]. They occupy O(n) space and answer queries in O(logn +k) time, where k is the output size. Algorithm 1 :Preprocessing. Input: B, R: set of input points; δ: a real Output: DSf: Data structure to find fruitful points; DSt: Data structure to form tuples. 1: D ←∅//set of disks 2: Create Voronoi diagram, Vr, for points in R. 3: Build a point location structure on Vr. 4: for allbi∈Bdo 5: nni←nearest neighbor red point of bi via point location in Vr. 6: ifd(bi, nni) ≤δthen// bi cannot be fruitful ifd(bi, nni) >δ 7: w(bi) ←d(bi, nni) 8: Di←disk with radius δ−w(bi)and centered at bi 9: D ←D ∪{Di} 10: endif 11: endfor 12: Create disk stabbing data structure, DSf, on disks in D 13: Create circular range search data structure, DSt, on points in R A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 5 Algorithm 2 :Query. Input: DSf: Data structure to find fruitful points; DSt: Data structure to form tuples; δ: a real; q: query point Output: O: set of colored tuples satisfying the distance constraint δwith respect to q 1: O ←∅//Find fruitful points forq 2: D(cid:16)←Subset of disks Dstabbed by q, as found by a disk stabbing query on DSf with q 3: F←Subset of blue points that are centers of disks in D(cid:16)// F is the set of fruitful blue points 4: for allbi∈F do 5: ρi←δ−d(q, bi)//Query using fruitful points to form tuples 867::: fRo(cid:16)ir←O a l←lSrujOb∈s ∪eRt{ (cid:16)i(obdf ior, erdj) }points found by a circular range search on DSt with bi as query point and radius ρi 9: endfor 10: endfor 11: return O Additionally, the nearest neighbor-finding structure (Voronoi diagram and point location structure) can be implemented in O(n) space and has a query time of O(logn) [8]. The query algorithm is shown as Algorithm 2. The first step identifies the set F of blue points by querying DS and the f second step identifies tuples to output by querying DSt. We argue that a tuple (bi, rj) is reported if and only if d(q, bi) +d(bi, rj) ≤δ. Suppose that (bi, rj) is reported. Thus, rj is found when DSt is queried with bi and radius ρi=δ−d(q, bi), i.e., d(bi, rj) ≤δ−d(q, bi), i.e., d(q, bi) +d(bi, rj) ≤δ. On the other hand, suppose that d(q, bi) +d(bi, rj) ≤δ. Then, d(q, bi) +d(bi, nni) ≤δ, where nni is the red point nearest to bi. Thus, by Lemma 1, bi is a fruitful point. Hence DSt is queried with bi and radius ρi =δ−d(q, bi) ≥δ−(δ−d(bi, rj)) = d(bi, rj), which implies that rj is found by the query. Hence, (bi, rj)is reported. This establishes the correctness of the query algorithm. The first step takes O(logn +kF) =O(logn +k)time, where kF =|F|is the number of fruitful points and k is the size of the output for the 2-CTRQ query. Crucially, note that kF ≤k since, by definition, every fruitful point belongs to at least one output tuple. The second step takes O(logn +ki) time for each bi∈F, where ki is the size of output of the circular range search from bi, i.e., the number of red points repo(cid:3)rted, hence the number of tuples that bi (cid:3)is a part of in the answer set. Tstheepr edfoomrei, ntahtee st otthael otivmerea lflo rq utehrey steicmoen,d w shteicph isis OO(((kb +i∈F1()l looggnn +). Ekia)c)h = ofO t(h(ke +str1u)c ltougrnes) Vsirn, cDe Sf, bain∈dF kDiS=t uks. eTsh uOs(,n t)hsep saecceo, nsdo the total space used is O(n). 2.1. An example We illustrate our algorithm using the point-set of Fig. 1 with δ=4. First, for each point bi∈B, we compute its nearest neighbor in R and assign the distance as weight w(bi) to bi as shown in Fig. 2a. Next, we filter out all the bi’s for which w(bi) is larger than δ and create a disk, Di, around each remaining weighted point bi∈B with radius δ−w(bi), as shown in Fig.2b. (Since, in this example, we do not have any bi with weight larger than δ, we do not discard any weighted points.) For a given query point, q, as shown in Fig. 2c, we find the disks D2 and D3 (shown in green) stabbed by q. The points b2 and b3 corresponding to the centers of the stabbed disks D2 and D3, respectively, are the fruitful points. Next, as shown in Fig. 2d, we perform two circular range search queries, one centered at b2 with radius δ−d(q, b2) =4 −1.4 =2.6, and one centered at b3 with radius δ−d(q, b3) =4 −2 =2. We then form tuples (b2, r3) with total distance 3.41 from q, since r3 is the only point reported in the circular range search with b2 as center, and (b3, r3) and (b3, r4) with total distance 3.41 and 4, respectively, from q, since r3 and r4 are the points reported in the circular range search with b3 as center. 2.2. Space–time trade-off Ideally, one would like a query time of O(logn +k)for the 2-CTRQproblem instead of O((k +1) logn) =O(logn +k logn). The roadblock to this is the second step, i.e., the time to perform a circular range query from each fruitful blue point. For a fruitful point bi, the circular range query takes O(logn +ki) time, where ki is the number of red points reported. An important observation is that if ki≥logn then O(logn +ki) =O(ki); otherwise O(logn +ki) =O(logn). Based on this observation, we can divide the fruitful points into two sets; heavy fruitful points, bi, for which ki≥logn and light fruitful points, bi, for which ki<logn. The heavy fruitful points are, in fact, “good” points because the output size (ki) dominates the query overhead (O(logn)), which allows us to absorb the latter inside the former and obtain a query bound of O(k) for such points. On the other hand, for the light fruitful points, the query overhead dominates the output size and, therefore, results in the k logn term in the query time. Therefore, our approach is to use a simpler data structure for the light fruitful points that avoids the O(logn) query overhead. For the heavy fruitful points, we will continue to use the circular range search structure DSt. Note that we do not know ahead of time which blue points are fruitful and which among these are light or heavy, as this depends on q. Therefore, we will build the simpler data structure for each blue point bi ∈B. This structure is merely a linked list, Li, which contains the (logn)-nearest red points to bi, stored in non-decreasing order of their distances from bi (ties broken arbitrarily). 6 A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 Fig.2.Arunningexampletoillustrateoursolutionforfixeddistance2-CTRQwithδ=4. To answer a query q, we first identify the set, F, of fruitful blue points using DSf. Next, we classify each bi∈F as light or heavy as follows. Let rˆi denote the last point in Li, i.e., the one that is (logn)-nearest from bi. It is easy to see that if d(q, bi) +d(bi, rˆi) ≤δ then bi is heavy; otherwise bi is light. For each bi that is light, we scan Li in non-decreasing order and output (bi, rj)for each rj that is seen such that d(q, bi) +d(bi, rj) ≤δ. (We stop the scan as soon as this condition fails.) The query time is O(ki+1), which, as desired, avoids the O(logn) query overhead that was incurred previously. Thus, the total query time for all light points queried is O(k +kF) =O(k), since kF ≤k. For each heavy point bi, we query DSt, which takes O(logn +ki) =O(ki)time (since ki≥logn). Thus the query time for all heavy points queried is O(k). Thus the total time for the second step of the2-CTRQquery is O(k). The overall query time, inclusive of the first step, is hence O(logn +k), which improves upon the bound of O((k +1) logn) obtained previously. This improvement comes at the expense of a slightly higher space bound of O(n logn) since we store a list Li of length logn with each bi∈B. The preceding approach can be generalized. For a user-specified integer parameter t, 1 ≤t≤logn, we store a list of the t-nearest red points with each blue point. For a given q, there can be at most k heavy fruitful points, where k is the output t size, since the output size of a heavy blue point is, by definition, at least t. Therefore, we perform at most k circular range t queries, each of which takes O(logn +ki) time and yields a total of O(kt logn +k) =O(kt logn) over all heavy points. The time for the light points continues to be O(k), as before. Therefore, the overall time for the 2-CTRQ query, i(cid:4)nclusive of the first step, is O(logn +kt(cid:4)logn +k) =O((1 +kt) logn) a(cid:4)nd the space requirement is O(nt). For example, if t= logn then we get a scheme with O(n logn) space and O(logn +k logn) query time. Theorem 1. A fixed-distance2-CTRQ query on a set of nred and blue points can be answered in O((1 +k) logn)time using O(nt) t space, where tis a user-specified parameter, 1 ≤t≤lognand kis the output size. In particular, t=1(resp. t=logn) yields a solution with O((k +1) logn)(resp. O(logn +k)) query time and O(n)(resp. O(n logn)) space. 3. Variable distance 2-CTRQ In this section, we discuss our solution for the variable distance 2-CTRQ problem. As before, we present a two-step solution for this problem, where, in the first step, we find the set of fruitful points and, in the second step, we explore B and R using the fruitful points to form and report the tuples. However, the challenge now is that δ is not known at preprocessing time. Therefore it is not possible to build the disk stabbing data structure DS to find the fruitful blue points. f In the rest of this section we discuss alternative methods to find these fruitful points. (The second step of our query algorithm remains unchanged as it does not rely on knowing δ at preprocessing time.) A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 7 It is easy to see that Lemma 1 continues to hold even if δ is variable. Thus, as before, we precompute for each bi∈B its closest red point rj∈R and store with bi the weight w(bi) =d(bi, rj). Thus, the problem of finding fruitful points of B for q and δ boils down to finding a set of weighted points bi∈B such that d(q, bi) +w(bi) ≤δ. In Section 3.1, we describe analgorithm for finding fruitful points that is based on a certain geometric transformation. Then in Section 3.2, we describe a method based on additively-weighted higher-order Voronoi diagrams to find the fruitful points. In Section 3.2.1 we show how to improve the storage requirement of the method of Section 3.2 using a graph traversal technique to record certain information compactly. In Section 3.2.2 we discuss briefly how to also improve the query time by using a probabilistic method to suitably partition the points in preprocessing. (The query time remains deterministic.) Finally, in Section 3.3, we describe the overall algorithm for the variable distance 2-CTRQ problem. 3.1. Geometric transformation-based algorithm for fruitful points The main idea behind this approach is to transform the weighted blue points in R2 to (unweighted) points in R4 in such a way that the solution to a certain half-space range search problem on these points in R4 yields the desired fruitful points in R2. We propose the following transformation: 1. Point bi=(xi, yi)in R2, with weight w(bi)is mapped to point b(cid:16)i=(xi, yi, w(bi), zi)in R4, where zi=x2i +y2i −w(bi)2. 2. Query point q =(xq, yq) in R2 with query distance δ is mapped to hyperplane q(cid:16) in R4 defined by the equation a1x +a2y +a3w +a4z=c, where a1=−2xq, a2=−2yq, a3=2δ, a4=1, and c=δ2−xq2−yq2. The following lemma establishes the desired property of this transformation. Lemma 2. Consider the transformation given above. Then, for any bi∈B, d(q, bi) +w(bi) ≤δif and only if b(cid:16)iis on or below q(cid:16). Proof. We have d(q,b)+w(b)≤δ i (cid:5) i ⇔ (xq−xi)2+(yq−yi)2≤δ−w(bi) ⇔ xq2+yq2+x2i +y2i −2xqxi−2yqyi≤δ2+w(bi)2−2δw(bi) (cid:6) (cid:7) ⇔ (−2xq)xi+(−2yq)yi+(2δ)w(bi)+ x2i +y2i −w(bi)2 ≤δ2−xq2−yq2 ⇔ a1xi+a2yi+a3w(bi)+a4zi≤c ⇔ b(cid:16)isonorbelowq(cid:16) (cid:2) i Based on Lemma 2, we transform the weighted blue points in R2 to points in R4 and create a data structure for half-space range search on these points in R4. Given q and δ, we compute q(cid:16) and query the data structure with q(cid:16) to find the points that are on or below q(cid:16), hence the fruitful points. The query algorithm is shown as Algorithm 3. As mentioned in Section 1.4, for d ≥4, there is a data structure for halfspace range search in Rd that has a query time of O(n1−1/(cid:10)d/2(cid:11)logO(1)n +k) and uses O(n) space[√1]. We use this structure for DS4hs, with d =4. Therefore, the space used by this approach is O(n) and the query time is O( nlogO(1)n +kF). Lemma 3. Let Bbe a set of blue points in√ the plane, where |B| ≤n. The points of Bthat are fruitful with respect to a query point qand query distance δcan be computed in O( nlogO(1)n +kF)time and O(n)space, where kF is the number of fruitful points. Algorithm 3 :Geometric-transformation-based method. Input: DS4hs: Data structure for half space range search in R4; q: query point; δ: query distance Output: F: set of fruitful blue points 1: F←∅ 2: q(cid:16)←hyperplane in R4obtained by applying the above-mentioned transformation on qand δ //Perform half-space range search 3: F(cid:16)←set of points returned when DS4hsin queried with q(cid:16) 4: for allb(cid:16)∈F(cid:16)do i 56:: bFi←←Fp∪oin{bt ii}n R2corresponding to b(cid:16)i 7: endfor 8: return F 8 A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 3.2. Finding fruitful points via additively-weighted higher-order Voronoi diagrams Recall that each of the kF fruitful blue points, bi, satisfies a distance constraint relative to q; specifically, d(q, bi) + w(bi) ≤δ. Therefore, if the points of B are ordered by non-decreasing weighted distance from q, then the fruitful points are exactly the first kF points in this order. In other words, the fruitful points constitute the set of kF-closest neighbors of q, under the weighted distance function. Thus, we find fruitful points by finding the kF-closest neighbors of q, under weighted distance. To accomplish this, we could build on the points of B an order-kF additively-weighted Voronoi diagram in the plane [21]. This diagram partitions the plane into cells and associates with each cell a set of kF blue points (called generators) that are the kF-closest neighbors, under weighted distance, of any point p ∈R2 lying in the cell. (The ordering of the generators can be different for different points p in the cell.) Thus, the generators associated with the cell containing q (determined via point location in the diagram) constitute the desired fruitful points. Unfortunately, this approach is not feasible since q and δ are not known beforehand, hence kF is not known at prepro- cessing time. Therefore, we build the following data structure. We create a complete binary tree, T, on the points of B where at most h =clogn points, in any order, are stored at the leaves, for some constant c≥1. (In Section 3.2.2, we show that a more careful distribution of points can further improve the query time.) At each non-leaf node v ∈T (including the root), we store an order-h additively-weighted Voronoi diagram V (v) built on the set, B(v), of points stored at the leaves h of the subtree rooted at v. We also build a data structure for doing point location in V (v). h Given q and δ, we explore T, starting at the root, to determine the fruitful points as follows. At node v, if v is a leaf node then we compute the weighted distance from q to the points stored at v, and report the points with distance at most δ as fruitful points. Otherwise, we query V (v). This involves locating q in V (v), retrieving the h generators of the cell h h containing q, computing the weighted distance from q to each generator, and checking this distance against δ. If there is agenerator with weighted distance larger than δ, then kv<h, where kv is the number of fruitful points in B(v). We report the generators that have weighted distance at most δ from q as fruitful points and abandon searching below v. Otherwise, kv≥h. In this case, we do not report anything, but instead continue to explore the two children of v. We analyze the query time as follows. For a given q and δ, each node v of T that is explored in the search is of one of two types: (i) v is a non-leaf node for which kv ≥h. We do not report any fruitful points at v but instead explore v’s children. We call v a non-terminalnode. And(ii) v is a non-leaf for which kv<h or a leaf. We report fruitful points (if any) at v and abandon the search below v. We call v a terminalnode. The time spent at a non-terminal node, for point location and for checking the generators, is O(logn +h) =O(logn). Similarly, for the time spent at a terminal node that is not a leaf of T. At a terminal node that is a leaf, we spend O(h) =O(logn)time. Therefore, to complete the query time analysis, we need to only upper-bound the number of terminal and non-terminal nodes. The number of terminal nodes is at most twice the number of non-terminal nodes, since a terminal node that is not the root has a non-terminal node as parent and since T is binary. Therefore the total number of terminal and non-terminal nodes is at most three times the number of non-terminal nodes plus one. (The one extra node in the count handles the case where the root is the only terminal node in the search.) To upper-bound the number of non-terminal nodes, consider any such node v. We charge one unit for v and distribute this charge uniformly over the kv ≥h fruitful points in v’s subtree, which results in a charge of at most 1/h to each of these fruitful points. In the worst case, any fruitful point in T is charged at most 1/h in this way for each non-terminal node to whose subtree it belongs. Since the height of T is O(logn), this results in a total charge of O((logn)/h) =O(1) per fruitful point. It follows that the total number of terminal and non-terminal nodes is O(1 +kF). Thus, the query time is O((1 +kF) logn). The size of V (v)at a node v, inclusive of the hgenerators stored with each cell, is O(h2(|B(v)| −h)) =O(h2|B(v)|)[21]. h For nodes v at the same depth in T, the sets B(v) are pairwise disjoint. Therefore, the total size of V (v) at these nodes v h is O(h2n). Since the maximum depth in T is O(logn), the total space used is O(h2n logn) =O(n log3n). 3.2.1. Reducing the space used The space usage can be improved as follows. Note that the size of the symmetric difference of the sets of the generators associated with any two consecutive cells of a V (·), is two, i.e., consecutive cells have h −1 common generators. Based on h this observation, we do not store the list of h generators with each cell; instead, we create a query data structure to find the list of generators for a cell. We first create a dual graph G of V (·), where each vertex of G corresponds to a cell in V (·) h h and two vertices in G are connected by an edge if and only if the corresponding cells in V (·) share an edge. Note that the h number of vertices in G is the same as the number of cells in V (·), which we denote by |V (·)|. Next, we create aspanning h h tree of G and duplicate each edge in it and perform an Euler tour on it. (An Euler tour exists because the resulting graph is undirected and all the vertices have even degree.) We record the “time” when a vertex is visited during the tour using a sequence of integers 1, 2, ..., 2|V (·)| −1. (The h largest label is easily seen to be 2|V (·)| −1.) During the traversal, we maintain a list of h “active” generators, i.e., generators h associated with the Voronoi cell corresponding to the current vertex in tour. Each edge traversal results in deletion of a generator and insertion of another generator in this list. With each generator, we associate a time interval when it was active during the traversal. Note that each edge traversal ends an interval for a generator (deletion of a generator) and A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 9 starts a new interval for another generator (insertion of a generator). Therefore, the total number of intervals is O(|V (·)|). h We store these intervals in an interval tree TI. With each Voronoi cell, we also store a time-stamp of the visit of the corresponding vertex of the dual graph during the traversal. (If multiple time-stamps are associated with a vertex, we pick any one.) To find the generators of the cell containing q, we query TI with the time-stamp associated with the cell to find the intervals stabbed by it and report the generators corresponding to the stabbed intervals. Since there are O(|V (·)|)intervals, h TI uses O(|Vh(·)|) space and reports the intervals that are stabbed in O(log|Vh(·)| +h) =O(logn) time. With this technique, the space at any node v of T (inclusive of the interval tree) is O(h(|B(v)| −h)) = O(h|B(v)|), rather than O(h2|B(v)|) [21], since generators are not stored explicitly with each cell of V (v). This results in an overall h space bound of O(n log2n). Lemma 4. Let Bbe a set of blue points in the plane, where |B| ≤n. The points of Bthat are fruitful with respect to a query point qand query distance δcan be computed in O((1 +kF) logn)time and O(n log2n)space, where kF is the number of fruitful points. 3.2.2. Improving the query time We describe a different way of creating the data structure described in Section 3.2 so that the desired fruitful points can be found in O(logn +kF) time. The data structure is built in preprocessing using a probabilistic method; however, the query time is deterministic. The approach follows closely the techniques used by Chazelle et al. [5], so our discussion here is brief. We refer the reader to[5]for details. One drawback of assigning h =clogn points in any order to the leaf nodes of T is that, for some q and δ, it is possible that the number of fruitful points is h (for example) and all of them are stored in a single leaf node. In this case the fruitful points are each assigned a charge of 1/h at O(logn) nodes, which is too much as it leads to the O((1 +kF) logn) query time shown in Section3.2.1. To avoid this situation, we use a probabilistic approach to assign points to the leaf nodes of T is such a way that for any pair of intermediate nodes v, w ∈T, if w is a child of v, then the following holds true for any q: (cid:8) (cid:6) (cid:7) (cid:6) (cid:7)(cid:8) (cid:8) (cid:8) (cid:8)Nh(w) B(w),q −Nh(v) B(v),q (cid:8)≥dlog(cid:8)B(w)(cid:8), where Nh(·)(B(·), q) represents the set of h(·) =clog|B(·)| points in B(·) that are closest to q and d >0 is a constant. This condition ensures that sufficiently many new fruitful points are “exposed” at w to absorb the cost of exploring at w. It has been shown by Chazelle et al. [5], in the context of circular range search, that there exist constants c, d, and n0 such that for n ≥n0, the desired assignment is possible with probability at least 1/2. Their result does not depend on the underlying distance metric and, therefore, is valid for our weighted distance too. Based on this discussion, we create the data structure mentioned in Section 3.2 with the following modifications. We create a complete binary tree T on points of B and assign max{h, n0} points of B to the leaves of T as described above. Also, at each intermediate node v ∈T, we build an order-h(v) additively-weighted Voronoi diagram, Vh(v)(v), on B(v) and a point location data structure on Vh(v)(v). The query process remains the same. It is clear that the space usage for this data structure remains O(n log2n). By an analysis similar to the one in [5] it can be shown that the query time for this approach is O(logn +kF). Lemma 5. Let Bbe a set of blue points in the plane, where |B| ≤n. The points of Bthat are fruitful with respect to a query point qand query distance δcan be computed in O(kF +logn)time using a data structure of size O(n log2n)computed probabilistically, where kF is the number of fruitful points. 3.3. Putting it all together: The overall algorithm for variable distance 2-CTRQ Recall that the first step is to find the fruitful blue points with respect to q and δ and the second step is to explore from each fruitful blue point and form and report (blue,red) tuples. Section 3.1, Section 3.2.1, and Section 3.2.2 have addressed the first problem and the bounds are summarized in Lemma 3, Lemma4, and Lemma 5, respectively. The second step is identical to the second step for the fixed distance 2-CTRQ problem in Section 2. Theorem 1 summa- rizes the bounds for the fixed distance2-CTRQproblem and these are also the bounds for the second step for that problem. For simplicity, we will focus on the pair of bounds at the extremes, i.e., O(n) (resp. O(n logn)) space and O((k +1) logn) (resp. O(logn +k)) query time, corresponding to the parameter value t=1 (resp. t=logn), although it is possible to consider bounds corresponding to other values of t as well. It is now straightforward to combine a pair of bounds, one for each step, to obtain the overall bounds for the variable distance 2-CTRQ problem. (Note that the number, kF, of fruitful points is at most the output size k.) Theorem 2 below lists the four possibilities. (Of the six possibilities, two have the same space and query time bounds and one is strictly worse than the others; these have been eliminated.) √ Theorem 2. A variable distance 2-C√TRQ query on a set of nred and blue points can be answered in either (i)O( nlogO(1)n +k logn) time using O(n) space, or (ii) O( nlogO(1)n +k) time using O(n logn) space, or (iii) O((1 +k) logn) time using O(n log2n) 10 A.Agrawaletal./JournalofDiscreteAlgorithms30(2015)1–12 space, or (iv) O(logn +k)time using O(n log2n)space. (The bounds in (iv)involve a data structure that is built probabilistically in preprocessing; the query time is deterministic.) 4. Handling more than two colors In this section, we discuss an approach to solve the CTRQ problem for m >2 colors using the solution for 2-CTRQ. Recall the CTRQ problem for m >2 colors: Let S be a set of n points in R2, where each point is assigned a color ci from a(cid:2)palette of m colors (m >2). Let Ci be the set of points of color ci and let ni =|Ci|. Note that Ci∩Cj=∅ if i (cid:6)= j and mi=1Ci=S. Let CS=(c1, ..., cm) be a given ordering of the colors. We wish to preprocess S into a suitable data structure so that for(cid:3) any query point q in R2 and a distance δ>0, we can report all tuples (p1, ..., pm), where pi∈Ci, such that d(q, p1) + mi=2d(pi−1, pi) ≤δ. Note that here δ>0 can be fixed or variable. As before, we present a two-step solution for this problem: In the first step, we find the set of fruitful points of color c1 and, in the second step, we explore from each such fruitful point the points of color c2, ..., cm to form and report the tuples. However, the challenge now is that we have more than two colors and therefore Lemma1cannot be applied directly to assign weights to the points of color c1. Also, due to the same reason, in the second step we cannot use circular range search to explore the points of color c2, ..., cm to form and report the tuples. In the rest of this section we discuss how to overcome these issues. As mentioned in Section1.3, the idea of fruitful points is central to our approach. As in the case of two colors, we define fruitful points for m >2 colors as follows: For a given ordering, CS=(c1, ..., cm), a point of color c1 is fruitful if it is part of at least one output tuple for the query point, q, and query distance δ. The following lemma provides a generalization of Lemma1for m >2 colors. (cid:3) Lemma 6. For any p1∈C1, let (p1, ..., pm)be an m-tuple of C1×·(cid:3)··×Cmthat minimizes mi=2d(pi−1, pi). Let qbe a query point and δthe query distance. Then p1is fruitful if and only if d(q, p1) + mi=2d(pi−1, pi) ≤δ. (cid:3) (cid:3)Promi=(o⇒3f.d)(( p⇐S(cid:16)i−u)p1 I,pf pod(cid:16)is()eq ≤ , ppδ11. )S +iisn cfer umi(cid:3)=it2fmidu=(l2.p diT(−ph1iu,− sp1,i ,) t p≤hi)eδ r≤et hdie(spn 1 at,h pet(cid:16)2u )tp u+lpel(cid:3) e( mi(p=p131, d,p .((cid:16)2.p,.(cid:16) i.−,. p1.,m, pp)(cid:16)im(cid:16))is,) wotueht aphtua vti.es H deo(nuqct,p ep,u 1tb). y + Td(cid:3)heufimisn=, i2tdido((nqp,, i −pp111,) ips+i )f rd≤u(ipδtf1.u,l p.(cid:2)(cid:16)2) + Lemma 6 suggests the following preprocessing step to determine the (cid:3)fruitful points at query time. For each p1∈C1, we comp(cid:3)ute an m-tuple, (p1, ..., pm), of C1×···×Cm that minimizes mi=2d(pi−1, pi) and associate with p1 the real number mi=2d(pi−1, pi)as a weight w(p1). The following lemma provides a useful characterization of the desired m-tuple, (p1, ..., pm). (cid:3) Lemma 7. For some p1∈C1, let P =(p1, ..., pm)be an m-tuple of C1×···×Cm that minimizes(cid:3) mi=2d(pi−1, pi). Then for the point pi∈Ci, 1 ≤i <m, Pi=(pi, ..., pm)is an (m −i +1)-tuple of Ci×···×Cmthat minimizes mj=i+1d(pj−1, pj). Proof. The case i =1 is true by assumption since (cid:3)P1=P. Let i >1 and assume th(cid:3)at Pi is not optimal (cid:3)for pi and let Pi(cid:16)= ((cid:3)pi, p(cid:16)i+1, ..., pm(cid:16) )∈Ci(cid:3)×···×Cm be optimal. Then ij=2d(pj−1, pj) +d(pi, p(cid:16)i+1) + mj=i+2d(p(cid:16)j−1, p(cid:16)j) < ij=2d(pj−1, pj) + mj=i+1d(pj−1, pj) = mj=2d(pj−1, pj), which contradicts the optimality of P. (cid:2) Consider the optimal tuple P=(p1, ..., pm)defined in Lemma7. Then Lemma7implies that for the point pm−1∈Cm−1, the point pm∈Cm minimizes d(pm−1, pm), i.e., pm is a nearest neighbor of pm−1. Similarly, for the point pm−2∈Cm−2, the point pm−1∈Cm−1 minimizes d(pm−2, pm−1) +d(pm−1, pm) =d(pm−2, pm−1) +w(pm−1), where w(pm−1) =d(pm−1, pm) is a weightassigned t(cid:3)o pm−1. In general, for the point pi∈Ci, 1 ≤i <m, the point pi+1∈Ci+1 minimizes d(pi, pi+1) +w(pi+1), where w(pi+1) = mj=i+2d(pj−1, pj) is the weight of pi+1, i.e., w(pi+1) is the length of the shortest path from pi+1 that visits one point from each of Ci+2, ..., Cm, in that order. (For notational convenience, we take w(pm) =0.) Thus, pi+1 minimizes the sum of w(pi+1) and the Euclidean distance between pi and pi+1. Our goal is to eventually assign weights (as defined above) to the points of C1. Based on the above discussion, we preprocess the sets Cm−1, Cm−2, ..., C1, in that order, and assign weights to the points of each set. For any point pi∈Ci, 1 ≤i <m, w(pi) is computed by finding the point pi+1∈Ci+1 that minimizes the sum of w(pi+1) and the Euclidean distance between pi and pi+1. This involves building an additively-weighted Voronoi diagram on the points of Ci+1, using the weights found in the previous step (or weight zero if i +1 =m) and querying this with each point of Ci. With this weight assignment, the problem of finding fruitful points of C1 for a q and δ boils down to finding (similar to the case of m =2 colors) a set of weighted points p1∈C1 such that d(q, p1) +w(p1) ≤δ and, therefore, we can find the fruitful points by using the solution in Section2or Section3(for the fixed distance or the variable distanceCTRQproblem, respectively). Recall that the second step is to explore the points of color c2, ..., cm to form and report the tuples, i(cid:3).e., for each fruitful point p1∈C1, the goal is to identify all the (m −1)-tuples (p2, ..., pm) ∈C2×...×Cm such that d(q, p1) + mi=2d(pi−1, pi)≤δ,
Description: