Informática 30 (2006) 289-293 289 Expected Case for Projecting Points Sergio Cabello and Matt DeVos Institute for Mathematics, Physics and Mechanics, Ljubljana, Slovenia E-mail: cabello@imfm.uni-lj.si Bojan Mohar Faculty of Mathematics and Physics, Ljubljana, Slovenia E-mail: bojan.mohar@uni-lj.si Keywords: randomized algorithm, unit distance, closest pair Received: June 14, 2005 Consider a set of n points in the plane with the property that any pair of points is at least at distance one. We study the expected concentration of the point set afterprojecting it onto a random graduated line. There is a lower bound of n log n) given byMatousekin [4], and we provide an upper bound of O(n2/3). Povzetek: Analizirana je gostota tock v ravniniz razdaljo najmanj ena. 1 Introduction Let P be a set of n points in the plane. For a line L C R2, we can project the points P orthogonally onto L, which we denote by nL(P). Imagine that the line L is a graduated line, that is, a line decomposed into line segments (cells) of length one. For a cell c C L, let Pop(P, c) be the population of the cell c after the projection, that is Pop(P,c) = \{p e P \ nL(p) e c}|. For a graduated line L, we say that its concentration Conc(P, L) is the number of points that its most populated cell gets; that is, Conc(P, L) = max {Pop(P,c)}. c a cell of L In a recent paper, Díaz et al. [3] consider the algorithmic problem of computing a graduated line that minimizes the concentration, that is, they are interested in Conc(P) = min¿ Conc(P, L). However, an asymptotically equivalent problem was considered by Kucera et al. [4] when studying a map labelling problem. Here we are interested in the expected concentration that a point set has when projecting onto a random graduated line. Let L(a) be a graduated line through the origin with angle a with respect to the x-axis, and such that the origin is the boundary of a cell. We are interested in the expected concentration EConc(P) over all lines L(a) EConc(P) = Ea [Conc(P, L(a))], where a is chosen uniformly at random. Let us observe that, for an asymptotic bound on EConc(P), it is equivalent to consider that the lines L(a) pass through some other point of R2 instead of the origin. If the point set P is arbitrarily dense, then it may be that Conc(P, L) > n/2 for any line L, and so EConc(P) = Q(n). However, the problem becomes non-trivial if we put restrictions to the density of the point set. Definition 1. A point set P C R2 is 1-separated if its closest pair is at least at distance 1. Our objective1,2 is to bound the value EConc(P) for any 1-separated point set. Kucera et al. [4] have shown that Conc(P) = O(yJn log n) for any 1-separated point set P. More interestingly, they use Besicovitch's sets [1] for constructing 1-separated point sets P having Conc(P) = nlogn), which implies EConc(P) = nlogn). We will show that for any 1-separated point set P we have EConc(P) = O(n2/3). Therefore, it remains open to find tight bounds for EConc(P). The rationale behind considering projections onto random lines is the efficiency of randomized algorithms whose running time depends on the expected concentration. As an example, consider a set of disjoint unit disks and any sweep-line algorithm [2, Chapter 2] whose running time depends on the maximum number of disks that are intersected by the sweep line. Choosing the direction in which the line sweeps affects the running time, but computing the best direction, or an approximation, is expensive: Kucera et al. [4] claim that it can be done in polynomial time, and Diaz et al. [3] give a constant-factor approximation algorithm with O(nt log nt) running time, where t is the diameter of P. By choosing a random projection we avoid having to compute a good direction for projecting, and we get a randomized algorithm. The results in this paper become helpful for analyzing the expected running time of such randomized algorithms. The rest of the paper is organized as follows. In Section 2 we introduce some relevant random variables and give some basic facts. In Sections 3 and 4 we bound 1 Supported by the European Community Sixth Framework Programme under a Marie Curie Intra-European Fellowship. 2Supported in part by the Ministry of Education, Science and Sport of Slovenia, Research Program P1-0507-0101. 290 Informática 30 (2006) 289-293 S. Cabello et al. EConc(P) using the first and second moments, respectively. 2 Preliminaries Let P = {p\,... ,pn} be a 1-separated point set, and let di}j = d(pi,pj). We use the notation [n] = {1,...,n}. Without loss of generality, we can restrict ourselves to graduated lines passing through the origin. Let L(a) be the line passing through the origin that has angle a with the x-axis, and let p* (a) be the orthogonal projection of a point p onto L(a). Consider the following random variables for the angle a Xi,j (a) 1 if d(p*(a),p*(a)) < 1, 0 otherwise; pXij = i] = n = 11/di, □ 2 < E [XmaX(a)] < 2 EConc(P). 3 Using the first moment Using that the closest pair of P is at least one apart, we get the following result. Lemma 3. For every i G [n], we have E dr.— je[«]\{¿} Proof. Without loss of generality, assume that i — n. Let nd be the number of points in P whose distance from pn is in the interval [d,d +1). We have 1 œ / 1 E = E I E d~ je[n-1] Xiio.) — ^ Xij (a); j=i Xmax(a) — max{Xi(a),.. .,Xn(a)}; n n n X (a) — J2 Xi(a) — ^J2 Xi,j (a), i=1 i=1 j=1 where a is chosen uniformly at random from the values [0, n). In words: Xijj is the indicator variable for the event that p* (a) andp* (a) are at distance at most one in the projection; Xi is the number of points (including pi itself) whose projection is at distance at most one from p*(a); Xmax is the maximum among X1,... ,Xn; and X counts twice the number of pairs of points at distance at most one in the projection. It is clear that P[Xi i — 1] — 1 for any i G [n]. Otherwise we have the following result. Lemma 1. If i — j, then t 2arcsin1/dij P[Xij — 1] — -'-iA. n Proof. Assume without loss of generality that pi is placed at the origin andpj is vertically above it, on the y-axis. See Figure 1. We may also assume that the line L(a) passes through pi. Because di j > 1, there are values a such that Xi j (a) — 1. The angles that make Xi j (a) — 1 are indicated in the figure. In particular, if 3 is the angle indicated in the figure, and we choose a uniformly at random, then P[Xi , j — 1] — n .The angle 3 is such that sin 3 — j—, and so 3 — arcsin -j—. We conclude that d=1 \ditj e[d, d+1) œ I t/2 for all pj e P We conclude that X(a) > £ Xj (a) > £ t/2 > t/2 • \P\ > t2/4, pj ^P Pj and the claim is proved. We have shown that for any value t > 0 we have [Xmax > t] C [X > t2 /4], and using Markov's inequality we conclude P[Xm0x > t]< P[X > t2/4]< 4EX] < O(n^ t2 t2 Let r = \n^/4\. Since Xmax only takes natural numbers, we have (L je[n]\{i} and using Lemma 3 we conclude that E[Xi] = O(y/n). □ Using the first moment method, we can show that for any 1-separated point set P it holds that EConc(P) = O(n3/4). For this, consider a 1-separated point set P and its associated random variable X. We have X = Xi, and because of Lemma 4 we conclude E[X] = O(^%/n). We claim that, for any value t > 0, if we have Xmax(a) > t, then X(a) > t2/4. Intuitively, if some Xi = t, then there are ©(t2) pairs of points at distance at most one from each other, and so contributing to X. The formal proof of the claim is as follows. Let i be an index such that Xi(a) > t. Then, either to the right or to the left of p*(a), the projection of pi onto L(a), there are at least t/2 points p* (a) at distance at most one from p*(a). Assume that those points are to the left and let P c P be the set of those points. We have |P| > t/2. For any pj ,pj' e P we have Xjj> (a) = 1, and therefore we have ] =53 P[Xmax > t] t=1 r n = Y; P[Xmax > t] + P[Xmax > t] t=1 rn < E i+ E t=r+l O(n^/n) t=1 t=r+1 t2 1 r 1 < r + O(nVn) j t2dt < n3/4 + O(nVn)(1 - -) rn = O(n3/4). Using Lemma 2 it follows that EConc(P) = O(n3/4). However, observe that this bound will be improved in next section. We would like to point out that the random variables Xi do not have a strong concentration around their expectation. Therefore, we cannot use many of the results based on concentration of the measure that would reduce the bound on EConc(P). To see this, consider the example in Figure 2. The point pi is the center of a disc of n 292 Informatica 30 (2006) 289-293 S. Cabello et al. Figure 2: Example showing that Xi is not concentrated around its expectation. radius n3/4, and we consider a circular sector with arc-length n1/4. This region is grey in the picture. Imagine that we place a densest 1-separated point set P inside the grey region. Asymptotically, since the region has area Q(n), such a point set P has Q(n) points. Consider the lines L(a + n/2) passing through pi. If a is chosen uniformly at random, the line L(a) intersects the grey region with probability n1/4/(2nnji/4) = dikk whenever j > k; that is, the points are indexed according to their distance from pi. Like above, we assume that the line L(a) passes through pi. We have Xi,j Xi,k (a) = O(di,j). Therefore E [ Xitj^2 Xi,k ] = J21 ■ P k 0, if we have Xmax(a) > t, then T(a) > t3/8. The proof is as follows. Let i be an index such that Xi (a) > t. Then, either to the right or to the left of p*(a), the projection of pi onto L(a), there are at least t/2 points p*(a.) at distance at most one i,j Xi,k from p*(a). Assume that those points are to the left and let k t/2. For any pj ,pj> e P we have Xj,j' (a) = 1. Therefore for all = O(1), and so the pj e P we have Xj (a) > t/2, and X2(a) > t2/4. We 2 XIX! Xi j Xi k j k di, k. Therefore, by the way we indexed the points, we conclude that, if Xi, j (a) = 1, then w2kJ2 Xj (a) Pj eP > X t2/4 > t2/4 ■P Pj ep > t3/8, and the claim is proved. EXPECTED CASE FOR PROJECTING POINTS Informatica 30 (2006) 289-293 293 L(a) Figure 3: For the proof of Lemma 5. For any angle a we have (Xi,jEk 0 we have [Xmax > t] C [T > t3/8], and using Markov's inequality we conclude P[A_ > t]< P[T > |3/8]< 8E|3T] < ^. Let r = \n2/3\. Since Xmax only takes natural numbers, we have ] = X/ P[Xmax > t] t=1 P[Xmax > t] t=1 t=r+l ' ' O(n2) t3 E t=1 t=r+1 i'n 1 < r + O(n2) J dt < n2/3 + O(n2 )( ± - n2 = O(n2/3). 2 2 E[X3] = ©(n2), and in general E[Xf] = ©(np-1) for all naturals p > 2. From this we can only conclude weaker results of the type EConc(P) = O(np/(p+1) ). Conclusions We have studied the expected concentration of projecting 1-separated point sets onto random lines, a parameter that is relevant for sweep-line algorithms when the direction for sweeping is chosen at random. We have shown that, if P consists of n points, the expected concentration EConc(P) is O(n2/3), while the best known lower bound is Q(y/n log n). Therefore, it remains to close this gap. Acknowledgements The authors are grateful to Jin Matousek for the key reference [4]. Sergio is also grateful to Christian Knauer for early discussions. References [1] A. S. Besicovitch. The Kakeya problem. The American Mathematical Monthly, 70:697-706, 1963. [2] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, Berlin, Germany, 2nd edition, 2000. [3] J.M. Díaz and F. Hurtado and M. López and J.A. Sellares. Optimal point set projections onto regular grids. In T. Ibaraki et al., editor, 14th Inter. Symp. on Algorithms and Computation, volume 2906 of LNCS, pages 270-279. Springer Verlag, 2003. [4] L. Kucera, K. Mehlhorn, B. Preis, and E. Schwarze-necker. Exact algorithms for a geometric packing problem. In Proc. 10th Sympos. Theoret. Aspects Comput. Sci., volume 665 of Lecture Notes Comput. Sci., pages 317-322. Springer-Verlag, 1993. Using Lemma 2 it follows that EConc(P) = O(n2/3). □ Trying to use the same ideas with higher moments of Xi does not help. Consider for example the 1-separated point set P consisting of all n points in a horizontal row of length n, and let p1 be the leftmost point. We have