Zbornik 19. mednarodne multikonference INFORMACIJSKA DRUŽBA - IS 2016, Zvezek H
Proceedings of the 19th International Multiconference INFORMATION SOCIETY - IS 2016, Volume H
Middle-European Conference on Applied Theoretical Computer Science (MATCOS 2016)
Uredil / Edited by Prof. Andrej Brodnik
http://is.ijs.si
12.-13. oktober 2016 / 12-13 October 2016, Ljubljana, Slovenia

Editor: Andrej Brodnik, University of Ljubljana, Faculty of Computer and Information Science; University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies
Publisher: Institut »Jožef Stefan«, Ljubljana
Proceedings prepared by: Mitja Lasič, Vesna Lasič, Lana Zemljak
Cover design: Vesna Lasič
Access to the e-publication: http://library.ijs.si/Stacks/Proceedings/InformationSociety
Ljubljana, October 2016

CIP - Cataloguing-in-publication record, Narodna in univerzitetna knjižnica, Ljubljana
004(082)(0.034.2)
MIDDLE-European Conference on Applied Theoretical Computer Science (2016 ; Ljubljana)
Middle-European Conference on Applied Theoretical Computer Science (MATCOS 2016) [Elektronski vir] : zbornik 19. mednarodne multikonference Informacijska družba - IS 2016, 12.-13. oktober 2016, [Ljubljana, Slovenija] = proceedings of the 19th International Multiconference Information Society - IS 2016, 12.-13 October 2016, Ljubljana, Slovenia : zvezek H = volume H / uredil, edited by Andrej Brodnik. - El. zbornik. - Ljubljana : Institut Jožef Stefan, 2016
Access (URL): http://library.ijs.si/Stacks/Proceedings/InformationSociety/2016/IS2016_Volume_H%20-%20MATCOS.pdf
ISBN 978-961-264-104-7 (pdf)
1. Gl. stv. nasl. 2. Brodnik, Andrej 3.
Mednarodna multikonferenca Informacijska družba (19 ; 2016 ; Ljubljana) 29879847

FOREWORD TO THE INFORMATION SOCIETY 2016 MULTICONFERENCE

With its nineteenth consecutive edition, the Information Society Multiconference (http://is.ijs.si) is the central Central European event in the field of information society, computer science and informatics. This year's event is again held at several locations, with the main events at the Jožef Stefan Institute.

Information society, knowledge and artificial intelligence are once again at a crossroads, both in themselves and in their influence on human development. Will the exponential growth of electronics under Moore's law continue or stagnate? Will artificial intelligence continue its remarkable development, outperforming humans in more and more areas and thereby enabling civilization to flourish, or will exponential population growth, particularly in Africa, stifle that growth? More and more indicators point to both extremes: we are moving into the next era of civilization, while at the same time the planetary conflicts of modern society are becoming ever harder to manage.

This year we have brought twelve excellent independent conferences together in the multiconference. Around 200 presentations, abstracts and papers will be presented at the individual conferences and workshops. The event will be accompanied by round tables and discussions, as well as special events such as the award ceremony. Selected contributions will also be published in a special issue of the journal Informatica, which boasts a 39-year tradition as an excellent scientific journal. Next year the conference will therefore celebrate 20 years and the journal 40 years, a venerable achievement for the field of information society.
The Information Society 2016 Multiconference consists of the following independent conferences:
• 25th Anniversary of the First Internet Connection in Slovenia
• Slovenian Conference on Artificial Intelligence
• Cognitive Science
• Data Mining and Data Warehouses
• Collaboration, Software and Services in Information Society
• Education in Information Society
• Workshop »EM-zdravje« (Electronic and Mobile Health)
• Workshop »E-heritage«
• Third Student Computer Science Research Conference
• Computer Science and Informatics: Yesterday for Tomorrow
• Human-Computer Interaction in Information Society
• Applied Theoretical Computer Science (MATCOS 2016).

The co-organizers and supporters of the conference are various research institutions and societies, among them ACM Slovenija, SLAIS, DKZ and the second Slovenian national academy, the Slovenian Engineering Academy (Inženirska akademija Slovenije, IAS). On behalf of the conference organizers we thank the societies and institutions, and especially the participants for their valuable contributions and for the opportunity to share their experience of the information society with us. We also thank the reviewers for their help with the reviewing.

In 2016 we will present, for the fourth time, the award for lifetime achievement in honour of Donald Michie and Alan Turing. The Michie-Turing award for an outstanding lifetime contribution to the development and promotion of the information society will be received by Prof. Dr. Tomaž Pisanski. The award for the achievement of the year will go to Prof. Dr. Blaž Zupan. For the sixth time we are also presenting the »information lemon« and »information strawberry« awards for the most (un)successful moves related to the information society. The lemon went to Slovenia's renewed fall in the information society rankings, and the strawberry to the information support of the Pediatric Clinic. Congratulations to the award winners!
Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

FOREWORD - INFORMATION SOCIETY 2016

In its 19th year, the Information Society Multiconference (http://is.ijs.si) remains one of the leading conferences in Central Europe devoted to information society, computer science and informatics. In 2016 it is organized at various locations, with the main events at the Jožef Stefan Institute.

The pace of progress of information society, knowledge and artificial intelligence is speeding up, but it seems we are again at a turning point. Will the progress of electronics continue according to Moore's law, or will it start to stagnate? Will AI continue to outperform humans at more and more activities and in this way enable the predicted, previously unseen human progress, or will the growth of the human population, in particular in Africa, cause global decline? Both extremes seem more and more likely: fantastic human progress, and planetary decline caused by humans destroying our environment and each other.

The Multiconference runs in parallel sessions with 200 presentations of scientific papers at twelve conferences, round tables, workshops and award ceremonies. Selected papers will be published in the Informatica journal, which has a 39-year tradition of excellent research publication. Next year, the conference will celebrate 20 years and the journal 40 years, a remarkable achievement.
The Information Society 2016 Multiconference consists of the following conferences:
• 25th Anniversary of First Internet Connection in Slovenia
• Slovenian Conference on Artificial Intelligence
• Cognitive Science
• Data Mining and Data Warehouses
• Collaboration, Software and Services in Information Society
• Education in Information Society
• Workshop Electronic and Mobile Health
• Workshop »E-heritage«
• 3rd Student Computer Science Research Conference
• Computer Science and Informatics: Yesterday for Tomorrow
• Human-Computer Interaction in Information Society
• Middle-European Conference on Applied Theoretical Computer Science (MATCOS 2016)

The Multiconference is co-organized and supported by several major research institutions and societies, among them ACM Slovenia, i.e. the Slovenian chapter of the ACM, SLAIS, DKZ and the second national engineering academy, the Slovenian Engineering Academy. In the name of the conference organizers we thank all the societies and institutions, and particularly all the participants for their valuable contribution and their interest in this event, and the reviewers for their thorough reviews.

For the fourth year, the award for life-long outstanding contributions will be presented in memory of Donald Michie and Alan Turing. The Michie-Turing award will be given to Prof. Tomaž Pisanski for his life-long outstanding contribution to the development and promotion of information society in our country. In addition, an award for current achievements will be given to Prof. Blaž Zupan. The information lemon goes to another fall in the Slovenian international ratings on information society, while the information strawberry is awarded for the information system at the Pediatric Clinic. Congratulations!
Bojan Orel, Programme Committee Chair
Matjaž Gams, Organizing Committee Chair

KONFERENČNI ODBORI / CONFERENCE COMMITTEES

International Programme Committee
Vladimir Bajic, South Africa
Heiner Benking, Germany
Se Woo Cheon, South Korea
Howie Firth, UK
Olga Fomichova, Russia
Vladimir Fomichov, Russia
Vesna Hljuz Dobric, Croatia
Alfred Inselberg, Israel
Jay Liebowitz, USA
Huan Liu, Singapore
Henz Martin, Germany
Marcin Paprzycki, USA
Karl Pribram, USA
Claude Sammut, Australia
Jiri Wiedermann, Czech Republic
Xindong Wu, USA
Yiming Ye, USA
Ning Zhong, USA
Wray Buntine, Australia
Bezalel Gavish, USA
Gal A. Kaminka, Israel
Mike Bain, Australia
Michela Milano, Italy
Derong Liu, Chicago, USA
Toby Walsh, Australia

Organizing Committee
Matjaž Gams, chair
Mitja Luštrek
Lana Zemljak
Vesna Koricki
Mitja Lasič
Robert Blatnik
Aleš Tavčar
Blaž Mahnič
Jure Šorn
Mario Konecki

Programme Committee
Bojan Orel, chair
Nikolaj Zimic, co-chair
Franc Solina, co-chair
Viljan Mahnič, co-chair
Cene Bavec, co-chair
Tomaž Kalin, co-chair
Jozsef Györkös, co-chair
Tadej Bajd
Jaroslav Berce
Mojca Bernik
Marko Bohanec
Ivan Bratko
Andrej Brodnik
Dušan Caf
Saša Divjak
Tomaž Erjavec
Bogdan Filipič
Andrej Gams
Matjaž Gams
Marko Grobelnik
Nikola Guid
Marjan Heričko
Borka Jerman Blažič Džonova
Gorazd Kandus
Urban Kordeš
Marjan Krisper
Andrej Kuščer
Jadran Lenarčič
Borut Likar
Janez Malačič
Olga Markič
Dunja Mladenič
Franc Novak
Vladislav Rajkovič
Grega Repovš
Ivan Rozman
Niko Schlamberger
Stanko Strmčnik
Jurij Šilc
Jurij Tasič
Denis Trček
Andrej Ule
Tanja Urbančič
Boštjan Vilfan
Baldomir Zajc
Blaž Zupan
Boris Žemva
Leon Žlajpah

KAZALO / TABLE OF CONTENTS

Middle-European Conference on Applied Theoretical Computer Science (MATCOS 2016)
PREDGOVOR / FOREWORD
PROGRAMSKI ODBORI / PROGRAMME COMMITTEES

Industrial and Medical Applications
  Customizing Hybrid Optimization for Microwave Tomography / Subotić Miloš, Palfi Laszlo, Pjevalica Nebojša
  Schedule Assignment for Vehicles in Inter-City Bus Transportation over a Planning Period / Dávid Balázs
  A Self-Bounding Branch & Bound Procedure for Truck Routing and Scheduling / Csehi Csongor Gy., Farkas Márk, Tóth Ádám
  Improving Flow Lines by Unbalance / Mihály Zsolt, Lelkes Zoltán
  Process Network Solution of a Soleplate Manufacturer's Extended CPM Problem with Alternatives / Vincze Nándor, Ercsey Zsolt, Kovács Zoltán

Algorithm design and evaluation
  On NIST Test of a Novel Cryptosystem Based on Automata Compositions / Dömösi Pál, Gál József, Horváth Géza, Tihanyi Norbert
  ALGator - An Automatic Algorithm Evaluation System / Dobravec Tomaž
  A Graph to the Pairing strategies of the 9-in-a-Row Game / Győrffy Lajos, London András, Makay Géza
  Construction of Orthogonal CC-Set / Brodnik Andrej, Jovičić Vladan, Palangetić Marko, Silai Daniel
  Usage of Hereditary Colorings of Product Graphs in Clique Search Programs / Depolli Matjaž, Konc Janez, Szabo Sandor, Zavalnij Bogdan

Algorithms Optimization
  Testing the Markowitz Portfolio Optimization Method with Filtered Correlation Matrices / Gera Imre, Bánhelyi Balázs, London András
  Tight Online Bin Packing Algorithm with Buffer and Parametric Item Sizes / Békési József, Galambos Gábor
  A Branch-and-Cut Algorithm for the Multi-Depot Rural Postman Problem / Fernández Elena, Laporte Gilbert, Rodríguez Pereira Jessica
  Allocation and Pricing on a Network in Presence of Negative Externalities / Pekec Saša

Graph Theory
  The Vertex Sign Balance of (Hyper)graphs / Miklos Dezso
  Packing Tree Degree Sequences / Berczi Kristof, Kiraly Zoltan, Liu Changshuo, Miklós István
  Benchmark Problems for Exhaustive Exact Maximum Clique Search Algorithms / Szabo Sandor, Zavalnij Bogdan
  On Embedding Degree Sequences / Csaba Bela, Vasarhelyi Balint

Algorithms Complexity
  Computational Complexity of the Winner Determination Problem for Geometrical Combinatorial Auctions / Goossens Dries, Vangerven Bart, Spieksma Frits
  Diploid Genome Rearrangement / Miklós István
  Team Work Scheduling / Dosa Gyorgy, Kellerer Hans, Tuza Zsolt
  Incremental 2-D Nearest-Point Search with Evenly Populated Strips / Podgorelec David, Špelič Denis

Miscellaneous
  Exploratory Equivalence on Hypercube Graphs / Mihelič Jurij, Čibej Uroš, Fürst Luka
  Partitioning Polyominoes into Polyominoes of at Most 8 Vertices, Mobile vs Point Guards / Gyori Ervin, Mezei Tamas
  On Linear Grammars with Exact Control / Angyal Dávid, Nagy Benedek
  Some Computable Functions without Brouwer Fixed-Points / Potgieter Petrus H.

Indeks avtorjev / Author index

FOREWORD

MATCOS, the Middle-European Conference on Applied Theoretical Computer Science, took place at the University of Primorska on October 12th and 13th. This was its fifth edition overall, and over these years it has gained wide acceptance as a forum where theory and practice meet in a fruitful dialogue. Moreover, the dialogue arises not only between theory and practice, but also between senior and junior researchers. As is already customary, the first day was devoted to student papers, the invited talk and some of the regular papers. Overall, four student papers and 26 regular papers were presented at the conference. This year MATCOS was also co-located with the StuCosRec conference (3rd Student Computer Science Research Conference), where an additional ten student papers were presented. The invited talk at MATCOS, titled "Algorithms for robot navigation: From optimizing individual robots to particle swarms", was given by Sándor Fekete. The regular papers were grouped into six sessions, ranging from graph theory all the way to algorithm design and the use of theoretical computer science results in practice. This is by far the largest MATCOS conference, and we hope to make the next MATCOS an even bigger event.
Koper, Ljubljana, Szeged, October 2016
Programme Committee Chairs
Gábor Galambos and Andrej Brodnik

PROGRAMSKI ODBOR / PROGRAMME COMMITTEE
Jacek Blazevicz (Poznan, Poland)
Andrej Brodnik (Koper, Ljubljana, Slovenia) co-chair
Angel Corberan (Valencia, Spain)
Ruben Dorado Vicente (Jaén, Spain)
Elena Fernandez (Barcelona, Spain)
Gábor Galambos (Szeged, Hungary) co-chair
Gabriel Istrate (Timisoara, Romania)
Miklós Krész (Szeged, Hungary)
Silvano Martello (Bologna, Italy)
Benedek Nagy (Famagusta, Cyprus, Turkey)
Gerhard Reinelt (Heidelberg, Germany)
Giovanni Rinaldi (Rome, Italy)
Borut Žalik (Maribor, Slovenia)

Customizing Hybrid Optimization for Microwave Tomography

Milos Subotic ∗
RT-RK Institute for Computer Based Systems
Narodnog fronta 23a, 21000 Novi Sad, Serbia
milos.subotic@rt-rk.uns.ac.rs

Laszlo Palfi
Faculty of Technical Sciences
Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia
laslo.palfi@rt-rk.com

Nebojsa U. Pjevalica
Faculty of Technical Sciences, Computing and control engineering dept.
Trg Dositeja Obradovica 6, 21000 Novi Sad, Serbia
pjeva@uns.ac.rs

∗Corresponding author.

ABSTRACT
Microwave tomography is an inverse scattering problem, typically solved through optimization methods. The underlying objective function is ill-posed and expensive to evaluate, making microwave tomography a hard optimization problem. This paper presents a novel optimization heuristic for use in microwave tomography. A landscape analysis of the objective function is performed, and its results guided the design of the heuristic. Significant acceleration is obtained.

Categories and Subject Descriptors
G.1.6 [Numerical Analysis]: Optimization - global optimization, unconstrained optimization

General Terms
Algorithms

Keywords
Hybrid optimization, optimization heuristic, fitness landscape analysis, microwave tomography, inverse scattering problem

1. INTRODUCTION
Microwave tomography (MWT) [13] is an imaging modality which obtains an image of the dielectric properties of a scanned object from scattered microwaves. MWT is an inverse problem: the input and output values, i.e. the input and output waves, are known, and the function, i.e. the dielectric properties, has to be found.

MWT is an imaging method which does not produce radiation like X-ray CT and promises to be much cheaper than MRI. It is a multidisciplinary field involving electromagnetism, antenna design, numeric simulation, optimization methods, computing acceleration and parallelization.

According to the No Free Lunch theorem [20] it is impossible to make a universal meta-heuristic which would optimize equally well on all problems. In other words, an optimization heuristic should be customized for every problem at hand. This makes the MWT optimization problem interesting for research.

The best images are obtained by means of quantitative MWT, where the connection between dielectric properties and waves is described by non-linear scattering equations. Like all inverse problems, this one is also ill-posed, i.e. multimodal and ill-conditioned. MWT is built on top of a forward solver, a numeric algorithm for solving sets of scattering equations or simulating the propagation and scattering of electromagnetic waves. One evaluation of the forward solver can last for seconds [17] or even minutes [4]. This makes MWT a hard optimization problem.

One solution is linearization of the scattering equations using the Born or Rytov approximations [14]. Also, many regularization methods are used for linearization of inverse problems [18]. This makes the objective function convex, i.e. unimodal, so that it can be solved by some local optimization method, usually Gauss-Newton [14] or a similar method. All these methods are criticized because they can get stuck in a local minimum, work only if the contrast is small, and tend to over-smooth the resulting image [13].

Other methods use global optimization techniques. The direct optimization approach tries to find the values of every pixel (or voxel) of an image [17] [16]. The scanned object is partitioned into a grid of pixels and every pixel is an optimization variable. Since only a few degrees of freedom (pixels) can be optimized, the resulting images are limited to lower resolution. To downsize the search space, some constraints can be set on the image structure or pixel values [2] [17] [5]. The indirect optimization approach optimizes the shapes and dielectric properties of some objects [4]. This technique needs a priori knowledge of the objects being scanned. The indirect optimization approach has fewer optimization variables than the direct optimization approach.

This paper describes a novel hybrid optimization heuristic for solving the inverse problem in quantitative MWT, without any approximation, with quantized pixel values. First, the landscape of the objective function is analyzed. The landscape is described by measures used in literature [19] such as ruggedness, deceptiveness, neutrality and the number of local minima. Next, some proposed countermeasures [19] are used to decrease the difficulty of the objective function, which results in hybridization of a global optimization method known as Artificial Bee Colony (ABC) [10] with the local optimization methods Hooke-Jeeves (HJ) [7] [8] and a custom hill-climbing (HC) with memory. Finally, the parameters of the ABC method are tuned, after experimenting with the MWT objective function. The method is even more interesting because it uses quantized (integer) values for the search space, instead of real values.

The limitation of the presented method is that it assumes perfect conditions for the objective function. Noise is not modeled and an inverse crime is committed.

Chapter 2 describes the problem of optimization in MWT in detail. The method for the landscape analysis and the results of the analysis are presented in Chapter 3. Chapter 4 describes the proposed hybrid optimization algorithm. Performance analysis and comparison of the proposed hybrid algorithm and its variants is given in Chapter 5. Chapter 6 includes final conclusions and future work.

2. PROBLEM DESCRIPTION
This paper deals with the quantitative MWT problem, which is an inverse problem, as described in the introduction. The inverse problem is ill-conditioned and multimodal. Global optimization methods are needed to find a solution. Neither approximation nor regularization methods are used. The direct optimization approach is used, where every grid cell, i.e. pixel, is a separate optimization variable.

The cost (objective, fitness) function consists of:

• A search space G to problem space X conversion function. A candidate solution needs to be converted from a search space variable vector to a problem space grid of dielectric permittivities.
• Measurement and simulation forward solvers. The forward solver takes a grid of dielectric permittivities as the input and calculates scattered waves.
• A cost calculation formula. Waves from the measurement and simulation forward solvers are compared and the cost of the objective function is calculated.

FDTD [6], a numeric algorithm for simulating electromagnetic wave propagation, is used as a forward solver. Two separate grids are used for the two forward solvers: a fine and a coarse grid. The fine grid is used by the measurement forward solver for calculating measured waves, which in real MWT would be obtained by measuring waves on antennas. The fine grid measurement forward solver is evaluated only once at the start of inversion. Its input is later used as a target for optimization. The coarse grid simulation forward solver is used by the optimization algorithm, for calculating waves from the candidate solution. The waves are compared in the frequency domain with Mean Absolute Error (MAE) as the cost calculation formula.

The search space is quantized, i.e. optimization variables are represented with fixed-point numbers. The implementation presented in this paper uses integers to represent fixed-point numbers. The cost function converts the candidate solution from the integer search space to the floating-point problem space of dielectric permittivity values, which are later used as input for the forward solver.

Because of the lack of derivatives, only derivative-free optimization methods are used. Also, a cache for the cost function is used, to avoid unnecessary repeated and costly evaluations of the forward solver.

3. LANDSCAPE ANALYSIS
A list of landscape metrics and characteristics, the optimization issues that arise when these characteristics occur, and some solutions for these issues are given in literature [19]. Landscape analysis of the objective function is performed in order to obtain metrics such as ruggedness (bumpiness), deceptiveness (useless gradient information), multimodality (number of local minima) and neutrality (slow convergence). The analysis results are used for choosing the right strategy to improve the optimization process.

The problem being solved in this paper has a large number of variables, so visualization is impossible. For smaller problems the objective function can be completely sampled; note that the search space is quantized, so the sampling can be complete. For larger problems complete sampling is too expensive. Instead, the analysis is done on statistics from local minima searches. In this research, Hooke-Jeeves (HJ), a pattern-based local search method, is used. The quantum of the search space, 1, is set as the minimum step for HJ.

As expected from an ill-posed problem, the analysis found many local minima. Also, the landscape is very rugged, which is typical for ill-conditioned problems with weak causality. The most interesting landscape features are valleys, such as the one found in the Rosenbrock function. Valleys which are not oriented (stretched) along the coordinate axes are easily found by HJ, and HJ easily converges to the valley's floor. Since HJ searches only along the coordinate axes it cannot move towards the next point on the valley's floor, so it halts there, reporting that a local minimum is found. Unlike the Rosenbrock function, which has a curved valley, the valleys found here are straight. This makes tracking valleys easier. Note that a valley's floor is very neutral, i.e. it has a much smaller gradient than the valley's walls.

Changing the quantization shows another interesting behaviour. In order to obtain meaningful results, the quantization of the coarse grid has to be finer than the quantization of the fine grid, i.e. the coarse grid has to be quantized in more quantization levels than the fine grid. If the difference between the coarse grid's and the fine grid's numbers of quantization levels is below a certain lower threshold, a true minimum cannot be found. This can be considered a needle-in-a-haystack problem. Beyond the aforementioned lower threshold, a true minimum can be found. As the number of the coarse grid's quantization levels becomes larger, the number of local minima increases. New local minima mostly appear along the valleys. At very fine quantization, above a certain upper threshold, HJ can break through some valleys. Similar behaviour occurs when the Rosenbrock function is optimized by HJ: HJ needs to lower its step to very small values so that it can pass Rosenbrock's valleys, at the cost of a very large number of small steps. For example, for one problem, the lower threshold is at $2^6$ quantization levels, i.e. 6 quantization bits, and the upper threshold is at $2^{14}$ quantization levels, which is a very big difference.

4. HYBRID OPTIMIZATION
One solution for the ruggedness issue is hybridization [19]. HJ is chosen as the local search algorithm. HJ is a simple algorithm and it is not hard to implement in integer arithmetic. Other possible candidates were the Powell method and the Nelder-Mead method, but these are harder to implement in integer arithmetic. ABC [10], a novel popular meta-heuristic, is used as the global search algorithm. GA is also commonly used in literature [17] [16], but ABC is easier to understand than GA. PSO is harder to implement in integer arithmetic. Also, for inverse scattering problems, ABC is more effective than PSO [15].

Evaluation of one ABC+HJ hybrid approach [9] shows small acceleration on rugged functions like Ackley, Griewank and Rastrigin. The reason for this is that the aforementioned hybrid approach most of the time explores the search space by ABC and only occasionally executes HJ, and only on the best solution. Since ruggedness brings deceptiveness with it, ABC works most of the time with false gradient information, HJ converges to local minima, and convergence to the global minimum is slow. A better approach would be to make the global search algorithm use the results of the local search algorithm [19]. The landscape of the local minima could look less rugged from the global search algorithm's point of view. This way, ABC works only with local minima.

To overcome the issue of valleys not oriented along the axes, mentioned in Chapter 3, a modified version of the hill-climbing (HC) algorithm is used. The purpose of HC is to squeeze through the valley towards a better minimum, from a point where HJ stopped. The main feature of the modified HC is the ability to check the neighbours in other directions besides those along the coordinate axes. Neighbours lie in directions defined as having a greatest common divisor equal to 1 and a first (Manhattan) norm equal to 2. For example, there is a neighbour in direction (1, 1) but not in direction (2, 0), because although (2, 0) has first norm equal to 2, its greatest common divisor is not 1 but 2. A simple version of HC is used, where the algorithm moves from the current position to the position of the first neighbour with a smaller cost. Since HC tracks a straight valley, it moves in only a few directions. The algorithm memorizes the two most recently used different directions. The algorithm tries to search in the memorized directions first, starting from the most recently used direction, before searching in all directions. While tracking through a valley, it is common that two directions occur alternately. This indicates that the valley is oriented along the vector sum of these two directions. The algorithm tries to track the pattern from these two directions, before searching in the memorized directions. If a pattern movement is not successful, the valley has probably changed its orientation and the two directions in the memory are probably stale.

Additional landscape analysis shows that HC will merge many local minima found by HJ. [...] than HJ when solving the same problem. In the worst case HC searches $2N_d^2$ directions, while HJ searches just $2N_d$, where $N_d$ is the number of optimization variables. On the other side, HC is more effective at the lower quantization threshold than HJ at the upper quantization threshold, because the difference between the lower and upper quantization thresholds is large, as mentioned in Chapter 3, which amortizes the cost of HC probing in a larger number of directions.

The ABC algorithm used in this paper is slightly changed. Instead of terminating the algorithm when a certain number of iterations or function evaluations is reached [12], the algorithm is terminated when the best solution has not changed for a certain number of iterations, i.e. when the algorithm stagnates for a certain number of iterations. The ABC global search algorithm is hybridized with HJ and HC in the following way: every ABC candidate solution that needs a cost function evaluation is improved with HJ first, and then with HC. That way, the bees in ABC work with local minima only. One feature of this hybrid is that almost all exploitation is done by the local optimizations, so the values of the ABC parameters proposed in literature [1] [9] are too large. A meta-optimization based method for parameter tuning [1] demands trying multiple parameter candidates and evaluating ABC on the same candidate 30 or more times to obtain a good mean cost, which is unacceptably slow. A different approach to parameter tuning is tried. The maximum trial and stagnation thresholds are set to higher values. Every time a better solution is found by an employed or onlooker bee, or whenever a new scout is sent, the count of trials is logged before it is reset. Similarly, when a new best solution is found, the number of iterations under stagnation is saved before it is reset. Histograms are made from these logs, and they can help in choosing lower parameter values than the initial ones. The number of bees is twice the number of optimization variables. A larger number of bees would lead to more exploitation, which is already done by local search. A large number of bees also means more expensive iterations. On the other hand, literature [1] neglects the impact of the number of bees on the performance of the optimization.

5. EXPERIMENTAL VALIDATION
Table 1 presents a comparison of optimization algorithms. A 3-cell 1D problem is optimized over three cases: $2^6$, $2^8$ and $2^{14}$ quantization levels. Every algorithm is tuned to a 100% success rate on the simplest problem with $2^6$ quantization levels, in order to have a fair comparison. The first two rows show the maximum number of trial iterations $N_{max\,trials}$ and the maximum number of stagnation iterations $N_{max\,stagn}$. The number of bees $N_{bees}$ is kept at double the number of optimization variables. Every cell shows the cost in function evaluations and the success rate, averaged over 50 algorithm runs.

As seen in Table 1, the ABC method hybridized with HJ and the modified HC with memory gives the best performance. It
HC will lower number of it also seen that finer quantization demands more function the HJ’s local minima by order of magnitude. Also, the evaluations and degrades success rate. At upper threshold of probability that the true minimum is hit from the first at- 214 quantization levels success rates are better. The reason tempt of local search is much larger when using HJ with HC for that is the ability of HJ to break through more valleys, in comparison to HJ only. Modified HC is more expensive as described in Chapter 3. 7 the frequency range 10 hz to 20 ghz. Physics in Table 1: Comparison of optimization heuristics medicine and biology, 41(11):2251, 1996. Heuristics ABC ABC+HJ ABC+HJ+HC [6] S. Hagness and A. Taflove. Computational Nmax trials 150 30 12 electrodynamics: the finite-difference time-domain Nmax stagn 2000 60 14 method. Norwood, MA: Artech House, 2000. 100 79 [7] R. Hooke and T. A. Jeeves. Direct search solution of 26 121 100% 100% 100% numerical and statistical problems. Journal of the 1420 669 ACM (JACM), 8(2):212–229, 1961. 28 1890 50% 80% 98% [8] M. G. Johnson. Nonlinear optimization using the 33141 2044 algorithm of hooke and jeeves, 1994. 214 10030 0% 94% 100% [9] F. Kang, J. Li, Z. Ma, and H. Li. Artificial bee colony algorithm with local search for numerical optimization. Journal of Software, 6(3):490–497, 2011. 6. CONCLUSIONS [10] D. Karaboga and B. Basturk. A powerful and efficient This paper shows that landscape analysis could help in hy- algorithm for numerical function optimization: bridization, modification and tuning of optimization heuris- artificial bee colony (abc) algorithm. Journal of global tic. It is important to consider valley structures, when op- optimization, 39(3):459–471, 2007. timizing the MWT objective function landscape. Choosing [11] I. Loshchilov, M. Schoenauer, and M. Sebag. Adaptive the right level of quantization is important to perform suc- coordinate descent. In Proceedings of the 13th annual cessful search. 
If quantization is too coarse, then a true conference on Genetic and evolutionary computation, global minimum will never be found. Finer quantization pages 885–892. ACM, 2011. will decrease performance. [12] M. Mernik, S.-H. Liu, D. Karaboga, and M. Črepinšek. On clarifying misconceptions when One of the future tasks is examining objective function sep- comparing variants of the artificial bee colony arability. If the objective function is separable, additional algorithm by offering a new implementation. acceleration could be obtained [3]. An additional future task Information Sciences, 291:115–127, 2015. is implementing an adaptive local search method [11] which [13] S. Noghanian, A. Sabouni, T. Desell, and A. Ashtari. promises faster convergence, especially in valleys. Also, the Microwave Tomography: Global Optimization, proposed optimization heuristic should be tested on a more Parallelization and Performance Evaluation. Springer realistic forward solver, using realistic phantoms with hu- Publishing Company, Incorporated, 2014. man tissues [18]. [14] M. Ostadrahimi, P. Mojabi, A. Zakaria, J. LoVetri, and L. Shafai. Enhancement of gauss–newton 7. ACKNOWLEDGMENTS inversion method for biological tissue imaging. The authors want to thank God for all these great ideas and Microwave Theory and Techniques, IEEE all people who helped. Transactions on, 61(9):3424–3434, 2013. [15] A. Randazzo. Swarm optimization methods in This work was partially supported by the Ministry of Educa- microwave imaging. International Journal of tion, Science and Technological Development of the Republic Microwave Science and Technology, 2012, 2012. of Serbia, under grant number: TR32029. [16] A. Sabouni and S. Noghanian. Experimental results for microwave tomography imaging based on fdtd and 8. REFERENCES ga. Progress In Electromagnetics Research M, [1] B. Akay and D. Karaboga. Parameter tuning for the 33:69–82, 2013. artificial bee colony algorithm. In International [17] A. Sabouni, S. 
Noghanian, and S. Pistorius. A global Conference on Computational Collective Intelligence, optimization technique for microwave imaging of the pages 608–619. Springer, 2009. inhomogeneous and dispersive breast. Electrical and [2] A. Ashtari, S. Noghanian, A. Sabouni, J. Aronsson, Computer Engineering, Canadian Journal of, G. Thomas, and S. Pistorius. Using a priori 35(1):15–24, 2010. information for regularization in breast microwave [18] J. D. Shea, P. Kosmas, S. C. Hagness, and B. D. image reconstruction. IEEE Transactions on Van Veen. Three-dimensional microwave imaging of Biomedical Engineering, 57(9):2197–2208, 2010. realistic numerical breast phantoms via a [3] W. Chen, T. Weise, Z. Yang, and K. Tang. Large-scale multiple-frequency inverse scattering technique. global optimization using cooperative coevolution with Medical physics, 37(8):4210–4226, 2010. variable interaction learning. In International [19] T. Weise, R. Chiong, and K. Tang. Evolutionary Conference on Parallel Problem Solving from Nature, optimization: Pitfalls and booby traps. Journal of pages 300–309. Springer, 2010. Computer Science and Technology, 27(5):907–936, [4] M. Donelli, I. J. Craddock, D. Gibbins, and 2012. M. Sarafianou. A three-dimensional time domain [20] D. H. Wolpert and W. G. Macready. No free lunch microwave imaging method for breast cancer detection theorems for optimization. IEEE transactions on based on an evolutionary algorithm. Progress In evolutionary computation, 1(1):67–82, 1997. Electromagnetics Research M, 18:179–195, 2011. [5] S. Gabriel, R. Lau, and C. Gabriel. The dielectric properties of biological tissues: Ii. 
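As an illustration of the neighbour definition used for the modified HC in the hybrid optimization section (directions with greatest common divisor 1 and Manhattan norm 2), the following Python sketch enumerates such directions and runs a plain first-improvement hill climb over them. It is only a minimal sketch under stated assumptions: the names `hc_directions`, `hill_climb` and `valley` are hypothetical, the test function is invented for the example, and the direction memory and pattern tracking described in the text are deliberately left out.

```python
from functools import reduce
from itertools import product
from math import gcd


def hc_directions(n_dims):
    """Integer direction vectors with Manhattan norm 2 and GCD 1.

    Brute-force enumeration, intended only for a small number of variables.
    """
    directions = []
    for vec in product((-2, -1, 0, 1, 2), repeat=n_dims):
        if sum(abs(c) for c in vec) != 2:
            continue  # first (Manhattan) norm must equal 2
        if reduce(gcd, (abs(c) for c in vec)) != 1:
            continue  # rejects e.g. (2, 0), whose GCD is 2
        directions.append(vec)
    return directions


def hill_climb(cost, start, directions):
    """First-improvement HC: move to the first neighbour with a smaller cost."""
    current, current_cost = tuple(start), cost(start)
    improved = True
    while improved:
        improved = False
        for d in directions:
            candidate = tuple(c + s for c, s in zip(current, d))
            candidate_cost = cost(candidate)
            if candidate_cost < current_cost:
                current, current_cost = candidate, candidate_cost
                improved = True
                break
    return current


def valley(p):
    # Hypothetical integer test function: a narrow valley along x == y.
    return 100 * (p[0] - p[1]) ** 2 + (p[0] + p[1] - 10) ** 2


DIRS_2D = hc_directions(2)
END_POINT = hill_climb(valley, (0, 0), DIRS_2D)
```

In this toy valley a unit step along either axis from (0, 0) raises the cost (181 versus 100), while the diagonal direction (1, 1) lowers it, which is the situation the diagonal neighbours are meant to handle.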
Schedule assignment for vehicles in inter-city bus transportation over a planning period

Balázs Dávid
University of Szeged
davidb@jgypk.u-szeged.hu

ABSTRACT

In this paper, we examine the problem of assigning vehicles to each day of a planning period based on existing theoretical schedules in public transportation. The assignment of a vehicle to daily tasks has to satisfy certain requirements. If the problem addresses long-distance bus transportation, vehicles returning to their starting depots would usually result in a high additional cost. Because of this, we also have to assign a garage to each vehicle where it spends the night and from where it starts its next daily schedule. We also want to minimize the arising traveling and operational costs. We give a network-based mathematical model for the problem. We examine solutions both of the model and of heuristic methods, and present their results.

Categories and Subject Descriptors

J.m [Computer Applications]: Miscellaneous

General Terms

Vehicle scheduling, Application, Heuristic

1. INTRODUCTION

Public transportation companies usually create their schedule in advance for a longer planning period. The days of this period belong to different day-types (workdays, holidays, etc.). Days that share a day-type have the same underlying theoretical schedule. Such a daily schedule divides the set of trips into vehicle duties, which also give the execution order of the tasks in that duty. These duties are carried out by the vehicles of the company each day.

If several days share the same day-type, the same vehicle duty will exist for all of them, and a duty will always be executed by a single vehicle. However, it does not necessarily have to be the same vehicle every day for the same duty. The goal of this paper is to assign the above duties to vehicles, thus creating a unique roster for each vehicle over the desired planning period.

The outline of this paper is the following: first, we present the classic problem of vehicle scheduling, and demonstrate the concept of vehicle duties through it. Using this, we define the schedule assignment problem, where we aim to organize the duties of vehicles over a longer planning period. For this problem, we give a mathematical model, and also present a matching-based heuristic. Both the model and the heuristic are tested on real-life instances.

2. VEHICLE SCHEDULING

For the introduction of the VSP, we refer to our formalization in [5]. We are given a set $V$ of vehicles and a set $T$ of service trips. Every trip has a departure and arrival time, a starting and ending location, and a set of vehicles that are able to serve the trip. A pair of trips $(t, t')$ is compatible if a vehicle can service both trips with respect to the running time and distance between the arrival location of $t$ and the departure location of $t'$ (such a journey is called a deadhead trip). A set $D$ of depots can also be introduced for the problem. In this case, every vehicle $v \in V$ has a depot-type $d(v) \in D$. Vehicles that share the same depot-type share the same characteristics, and also have the same costs. If a vehicle $v$ belonging to depot $d$ is used in the solution, it contributes a cost of $dc(d) + tc(d) \times dist(v)$, where $dc(d)$ is a one-time daily cost, $tc(d)$ is the cost of traveling a unit distance for a vehicle belonging to depot $d$, and $dist(v)$ is the distance covered by vehicle $v$ in the solution. A binary depot-compatibility vector $v^t = (v_1, \dots, v_{|D|})$ can also be introduced for every trip $t \in T$. If such a vector exists, a vehicle belonging to depot $d$ can only service trip $t$ if $v^t_d = 1$. The VSP assigns the trips of the given timetable to vehicles, satisfying the following conditions: for every $v \in V$, the trips assigned to $v$ must be compatible with each other, and every trip $t \in T$ must be executed exactly once. The cost of this assignment has to be minimal.

If the problem has only 1 depot, it is called a single depot vehicle scheduling problem (SDVSP), and can be solved in polynomial time. A formulation for the SDVSP can be seen in [2]. If the number of depots is at least 2, we get a multiple depot vehicle scheduling problem (MDVSP). The MDVSP was introduced by Bodin et al. in [3], and proven to be NP-hard by Bertossi et al. [1]. An overview of different VSP models can be found in [4].

The result given by the above VSP corresponds to a set of vehicle duties for one day. A vehicle duty gives a set of tasks that have to be executed by the same vehicle on the given day. However, the VSP does not assign a specific vehicle to its duties; it only gives the required vehicle type. Because of this, we call the result a "theoretical" schedule, as further steps have to be taken to determine the exact vehicles in service on the current day.

3. SCHEDULE ASSIGNMENT PROBLEM

As seen in Section 2, the resulting schedules of the VSP only give vehicle duties for a single day. However, transportation companies create their schedules in advance for a planning period (e.g. several weeks or months). Their usual method is to separate the days of the planning period into different types (e.g. workday, Saturday, holiday, etc.), and have a theoretical vehicle schedule for each of these day-types. This means that days belonging to the same day-type will have the exact same vehicle duties throughout the entire planning period. The same duties will always have the same requirement for vehicle types over the planning period. However, they will not necessarily be executed by the same vehicle on different days.

The input for the schedule assignment problem is the $n$-day planning period of the company, with each day $i$ having an assigned day-type $dt(i)$. We are also given the set $V$ of vehicles that are available over the planning period. Similarly to the VSP, a set $D$ of depots is also introduced, and every vehicle $v \in V$ is given a depot-type $d(v) \in D$. As in the VSP, vehicles belonging to the same depot share the same costs and characteristics. The set $G$ represents garages where vehicles can stay for the night between two days of the planning period. For each day-type $dt$ we also have a daily vehicle schedule, which is the set $S(dt)$ of vehicle duties that have to be executed. Similarly to the trips of the VSP, a vehicle duty $j \in S(dt)$ also has a binary depot-compatibility vector $v^j = (v_1, \dots, v_{|D|})$. A vehicle from depot $d$ can service duty $j$ if and only if $v^j_d = 1$. Vehicles in inter-city transportation do not necessarily return to their starting garages after executing a duty, as that could potentially mean high extra costs depending on the distance they have to travel. Because of this, a garage $g \in G$ also has to be assigned to the vehicle at the end of each day, where it will spend the night and begin the next day of the planning period. The goal of our problem is to assign these duties to the vehicles of the company such that each duty is executed exactly once, and the arising costs are minimal. A vehicle $v$ from depot $d$ contributes $dc(d) \times work^v_i + tc(d) \times dist(v)$ to the cost of the problem, where $dc(d)$ and $tc(d)$ are the one-time daily and unit-distance costs of a vehicle from depot $d$ respectively, and $dist(v)$ is the distance travelled by vehicle $v$ during the planning period (either by servicing duties or traveling to/from garages). The binary vector $work^v = (work_1, \dots, work_n)$ denotes whether vehicle $v$ was working on day $i$ of the planning period, or not.

3.1 Model

In this subsection we will introduce an integer programming model for the schedule assignment problem over a planning period where vehicles have to spend the night at one of the pre-assigned garages. Some notations in this section will be different from the problem introduction above. Let us consider a planning period of $n$ days. Let $D$ be the set of nodes for the vehicle schedules for the planning period. Let $d_{i,j} \in D$ be the vehicle duty $j$ on day $i$, where $1 \le i \le n$ and $1 \le j \le k$, where $k$ is the number of duties on day $i$. Let $G$ be the set of nodes for the $l$ garages. A garage $g$ has to be considered as a potential night garage for vehicles at the end of every day of the planning period. To denote this, we introduce multiple nodes for each garage. Let $g_{i,j}$ represent the state of garage $j$ at the end of day $i$, where $0 \le i \le n$ and $1 \le j \le l$. This node will represent the number of vehicles staying at garage $j$ at the end of day $i$. The special node $g_{0,j}$ denotes the state of garage $j$ at the beginning of the planning period. Let $V$ be the set of nodes for the $m$ vehicles. We represent vehicle $i$ with two nodes: $v_{i,0}$ represents the vehicle at the beginning of the planning period and $v_{i,1}$ at the end of the planning period. The edges of our network will represent the possible traveling activities of vehicles throughout the planning period. Vehicles staying in their starting garage at the beginning of the planning period are given by edges

$$E_{vb} = \{(v_{i,0}, g_{0,j}) \mid 1 \le i \le m,\ \text{vehicle } i \text{ starts at garage } j\}.$$

Vehicles ending the planning period in one of the garages are represented by

$$E_{ve} = \{(g_{n,i}, v_{j,1}) \mid 1 \le i \le l,\ 1 \le j \le m\}.$$

Vehicles leaving the garages to execute a duty at the beginning of a day are represented by edges

$$E_{db} = \{(g_{i-1,j}, d_{i,h}) \mid 1 \le i \le n,\ 1 \le j \le l,\ 1 \le h \le k\}.$$

Vehicles returning to a garage at the end of the day from a duty are represented by edges

$$E_{de} = \{(d_{i,h}, g_{i,j}) \mid 1 \le i \le n,\ 1 \le j \le l,\ 1 \le h \le k\}.$$

Vehicles staying at a garage for a given day are represented by edges

$$E_{g} = \{(g_{i-1,j}, g_{i,j}) \mid 1 \le i \le n,\ 1 \le j \le l\}.$$

Circulation edges should also be added for each vehicle:

$$E_{f} = \{(v_{i,1}, v_{i,0}) \mid 1 \le i \le m\}.$$

Using the node set $N = \{D \cup V \cup G\}$ and edge set $E = \{E_{vb} \cup E_{ve} \cup E_{db} \cup E_{de} \cup E_{g} \cup E_{f}\}$ we can define the multi-commodity network $(N, E)$. Our network will have $m$ separate commodities, one for every vehicle. The commodities of this network will be denoted by $c \in C$. For each edge $e$ of this network, we give an integer vector $x_e$. This vector will have one component for every commodity $c$, which we will denote by $x^c_e$. The value $x^c_e$ represents whether vehicle $c$ is assigned the traveling activity connected to edge $e$. Based on the above network, we can formalize the mathematical model in the following way:

$$\sum_{e:(g_{i-1,j},\,d_{i,h}) \in E_{db}} x^c_e = 1, \quad \forall (i,h) \text{ pair} \tag{1}$$

$$\sum_{e:(d_{i,h},\,g_{i,j}) \in E_{de}} x^c_e = 1, \quad \forall (i,h) \text{ pair} \tag{2}$$

$$x_e = 1, \quad \forall e \in E_{vb} \tag{3}$$

$$\sum_{e:(g_{n,i},\,v_{j,1}) \in E_{ve}} x^c_e = 1, \quad \forall c \in C \tag{4}$$

$$\sum_{e \in n^+} x^c_e - \sum_{e \in n^-} x^c_e = 0, \quad \forall c \in C,\ \forall n \in N \tag{5}$$

$$x^c_e \in \{0, 1\}, \quad \forall e \in \{E_{vb} \cup E_{ve} \cup E_{db} \cup E_{de} \cup E_{f}\} \tag{6}$$

$$x^c_e \ge 0 \text{ integer}, \quad \forall e \in E_{g} \tag{7}$$

$$\sum_{c \in C} \sum_{e \in E} tr^c_e\, x^c_e \rightarrow \min$$

Constraints (1) and (2) require that exactly one vehicle executes each duty and returns from it to a garage. Constraint (3) does the starting setup for each vehicle, assigning it to a garage given by the network. Constraint (4) ensures that every vehicle ends the planning period in exactly one garage. Flow conservation for the vertices of the network is guaranteed by (5), while constraints (6) and (7) provide the binary and integrality constraints for all the variables. The objective function of the model minimizes the arising costs of the executed traveling activities. The value $tr^c_e$ gives the cost of a vehicle from commodity $c$ servicing the activity denoted by edge $e$.

3.2 Extensions of the model

Depending on the requirements and problem size, other constraints can also be added to the model. One of the easiest ways to decrease the problem size is to modify constraint (3). In its current form, there is a separate commodity for each vehicle available for the planning period, which can result in a large graph even for a small number of vehicles. However, vehicles that have the exact same requirements can be classified into groups. If we let $V$ be the set of such vehicle groups, and $k_i$ be the number of vehicles in group $i \in V$, then the following constraint can replace (3):

$$\sum_{j:\,v_{i,0},\,g_{0,j}} x_v \le k_i, \quad \forall i \in V \tag{8}$$

If garages $j \in G$ have a limited capacity $m_j$, then we have to introduce the following capacity constraint on all of their incoming edges:

$$\sum_{e:\,d_{i,h},\,g_{i,j}} x_e \le m_j, \quad \forall j \in G \tag{9}$$

If the vehicles also have to be refueled at the end of every day, we have to modify the underlying graph, and introduce the set $T$ of refueling stations. To assign a refueling station to every vehicle at the end of a day, we have to introduce two new sets of edges to the model instead of $E_{de}$. Vehicles heading towards a refueling station at the end of a day from a duty are represented by edges

$$E_{rb} = \{(d_{i,h}, t_{i,j}) \mid 1 \le i \le n,\ 1 \le j \le |T|,\ 1 \le h \le k\},$$

and vehicles returning to a garage after refueling are represented by edges

$$E_{re} = \{(t_{i,j}, g_{i,j}) \mid 1 \le i \le n,\ 1 \le j \le |T|\},$$

where $t_{i,j}$ represents the state of refueling station $j$ at the end of day $i$. In this case, constraint (2) is replaced with the following:

$$\sum_{e:(d_{i,h},\,t_{i,j}) \in E_{rb}} x^c_e = 1, \quad \forall (i,h) \text{ pair} \tag{10}$$

3.3 A matching heuristic

We also present a heuristic solution for the above problem. Given a planning period of $n$ days, this method will sequentially examine all day pairs $(d_k, d_{k+1})$, $0 \le k \le n-1$, over the planning period. For each such pair, a bipartite graph $G_k = (V_k \cup D_k, E_k)$ is constructed. The graph $G_k$ represents the state of the problem at the beginning of day $k$. The special value $k = 0$ denotes the beginning of the planning period. Let $G$ be the set of garages where vehicles can stay for the night.

Nodes $v \in V_k$ represent the vehicles from the fleet of the company with their status at the end of day $k$, while nodes $d \in D_k$ represent the vehicle duties of day $k + 1$. An edge $(v, d)$ exists in $E_k$ if vehicle $v$ is able to execute duty $d$. The cost of an edge is based on the minimum of the following distances: $s_{v,d} = \min\{s_{v,g} + s_{g,d}\}$, where $s_{v,g}$ is the distance between the location of vehicle $v$ at the end of day $k$ and garage $g$, while $s_{g,d}$ is the distance between garage $g$ and the starting location of duty $d$, for all $g \in G$ (the smaller the distance, the bigger this value will be). Based on the above graph, we can give the following matching model for our problem (for a big enough number $N$). The binary variable $x_{v,d}$ represents whether duty $d$ is executed by vehicle $v$ in the solution, or not.

$$\sum_{(v,d) \in E} x_{v,d} = 1, \quad \forall i \tag{11}$$

$$\sum_{(v,d) \in E} x_{v,d} = 1, \quad \forall j \tag{12}$$

$$c_{v,d} = \min\{N - s_{v,d}\}, \quad \forall (v,d) \in E \tag{13}$$

$$x_{v,d} \in \{0, 1\}, \quad \forall v, d \tag{14}$$

$$\sum_{(v,d) \in E} c_{v,d}\, x_{v,d} \rightarrow \max$$

We sequentially solve $n$ such matching models for all day pairs $(d_k, d_{k+1})$, $0 \le k \le n-1$, which will give us the schedule assignment for all vehicles over the planning period. After solving the last matching problem, the position of the vehicles will be the ending location of their last duty. Because of this, we need to solve a final matching problem, which sends every vehicle to the closest garage.

4. TEST RESULTS

We tested the model and the heuristic solution on real-life instances. These instances were part of a "what-if" scenario, trying to coordinate the transportation of three counties in Hungary. These counties organized their transportation semi-independently before. The transportation companies provided the input for a 3-month long planning period. The input consisted of vehicle duties belonging to 4 day-types. A single day had 90-170 vehicle duties depending on its type, and the combined fleet of the companies was separated into 3 vehicle types. One vehicle type was able to execute any of the duties, while the other two vehicle types were restricted to some of the duties (e.g. depending on the length of the duty). Using the input data above, we created two main groups of test instances: one with all three vehicle types, and another with the restricted vehicle types merged into one. We ran tests for the entire planning period of 3 months and also for smaller intervals of it. The mathematical model was solved using the COIN-OR Symphony MILP solver. The results can be seen in Table 1.

Table 1: Results of the mathematical model

Vehicle types   Planning period   Opt. cost (km)   Opt. time (s)
2               1 week            1 665.10         14.06
2               1 month           12 219.50        221.23
2               2 months          31 993.40        758.09
2               3 months          48 800.00        1 813.57
3               1 week            2 123.40         14.00
3               1 month           21 221.20        228.26
3               2 months          56 597.40        847.86
3               3 months          88 933.30        2 008.83

This table shows both the cost of the instances (measured in the km that the vehicles ran during the planning period) and the time in seconds required for the solution. The results of the heuristic method can be seen in Table 2.

Table 2: Results of the heuristic

Vehicle types   Planning period   Heur. cost (km)   Heur. time (s)   Heur. gap (%)
2               1 week            2 824.30          26.53            69.62
2               1 month           16 061.40         86.06            31.44
2               2 months          37 266.90         168.02           16.48
2               3 months          55 717.10         232.90           14.17
3               1 week            5 463.00          24.38            157.28
3               1 month           31 768.00         86.64            49.69
3               2 months          70 661.90         154.78           24.85
3               3 months          106 551.10        227.98           19.81

It can be seen from the tables that the solution of the model is possible even for larger instances. This means that it can be applied to practical problems, especially because the constraints of the model can easily be modified depending on the requirements of the given company. The heuristic performed poorly on small instances due to the large number of sequential matching problems it has to solve, but manages to significantly decrease the running time of bigger instances. The quality of its solutions is far from the optimal value, but the gap gets smaller as the length of the planning period increases. However, one of the main reasons behind developing the heuristic was the ability to generate an adequate solution for the problem in a short time, in case we want to use an initial solution for larger instances with a mathematical programming based solution process. The heuristic fits this requirement well.

5. CONCLUSIONS

In this paper, we examined the application-oriented problem of assigning schedules to vehicles over a planning period. We wanted these assignments to take into consideration the requirements of the vehicles themselves, and in this way provide more information than a "theoretical" solution. For this, we introduced the schedule assignment problem, and provided a mathematical model for it. The basic model considers the parking requirements of vehicles at the end of each day, but we also gave extensions for garage capacities and refueling at the end of each day. We also devised a matching-based sequential heuristic for the problem, which decreases the running time significantly, but comes at the cost of being far from the optimal solution in quality.

A future extension of the model will include the requirement of regular mechanical inspection: vehicles have to be sent for a daily inspection after executing duties for a given number of days. For this, we considered a state-expanded version of the current model, but its current version is still too large to yield a solution. One natural way to handle such a large problem is column generation, which needs an initial solution of acceptable quality. The matching heuristic solves the problem quickly, and its results can be applied effectively in such a solution process.

6. REFERENCES

[1] A. Bertossi, P. Carraresi, and G. Gallo. On some matching problems arising in vehicle scheduling models. Networks, 17(1):271–281, 1987.
[2] L. Bodin and B. Golden. Classification in vehicle routing and scheduling. Networks, 11(1):97–108, 1981.
[3] L. Bodin, B. Golden, A. Assad, and M. Ball. Routing and scheduling of vehicles and crews: The state of the art. Computers and Operations Research, 10(1):63–212, 1983.
[4] S. Bunte and N. Kliewer. An overview on vehicle scheduling models. Journal of Public Transport, 1(4):299–317, 2009.
[5] B. Dávid and M. Krész. A model and fast heuristics for the multiple depot bus rescheduling problem. In 10th International Conference on the Practice and Theory of Automated Timetabling (PATAT), pages 128–141, 2014.

A self-bounding Branch & Bound procedure for truck routing and scheduling
[Extended Abstract]

Csongor Gy. Csehi∗, Márk Farkas∗, Ádám Tóth∗
∗Nexogen Kft., csehi.csongor@nexogen.hu, farkas.mark@nexogen.hu, toth.adam@nexogen.hu

ABSTRACT

In this talk we will study a part of the core algorithm of a complex software solution for truck itinerary construction for one of the largest public road transportation companies in the EU. The problem is to construct a cost-optimal itinerary, given an initial location with an asset state, and the place and other properties of the tasks to be performed. Such an itinerary specifies the location and activity of the truck and the driver until the finish of the last routing task. The calculation of possible itineraries is a branch and bound algorithm. The nodes of the search tree have the following arguments: position, time, driver-state and truck-state. For each node we calculate the cumulated cost of the road reaching that state, and a heuristic lower bound for the cost of the remaining road. In each step the procedure expands the next unexpanded node with the best sum of cumulated and heuristic cost. To make a sharp heuristic we run the same branch and bound algorithm (from each node) but with hypothetical positions (with coarser data and simplified activities: no refuelling, no road costs, etc.). We anticipate significant gains in performance and quality compared to the previous approach.

CCS Concepts

•Computer systems organization → Embedded systems; Redundancy; Robotics; •Networks → Network reliability;

Keywords

Logistics; route optimization; branch and bound

1. INTRODUCTION

We will study a part of the core algorithm of a complex software solution for truck itinerary construction for one of the largest public road transportation companies in the EU. A minor improvement in the operational cost of each tour can result in a huge advantage for the freight services company. The problem is to construct a cost-optimal itinerary, given an initial location with an asset state, and the place and other properties of the tasks (we will call them routing tasks) to be performed. Such an itinerary specifies the location and activity of the truck and the driver until the finish of the last routing task. This means that the itinerary gives every order to the driver, including every turn on the road and every stop with its exact duration, etc. The working stops can be made only at the places of the tasks; the refueling and resting stops can be made only at previously fixed places (roughly 4000 fixed parking places and 100 fixed filling stations across Europe). To achieve such an itinerary we use mapping software to construct the routes and calculate the distance, duration and cost between any two places. Clearly the problem is much harder than path finding in a graph, because we can take many different actions in each place (different amounts of fuel, different durations of rest, etc.).

The software (which also performs the vehicle assignment) is already finished and has been applied with very good results (since 2015); the company has achieved large cost savings. For more formal definitions of the problem, and more information on the software, see [2]. The ongoing research aims to extend the functionality of the software. One goal is to improve optimality by planning the itinerary for a longer timespan. That means more routing tasks in each round.

The calculation of possible itineraries is a branch and bound algorithm. For detailed information on the widely used algorithms of operations research the reader should see [1]. The nodes of the search tree have the following arguments: position, time, driver-state and truck-state (we will call these data the state). For each node we calculate the cumulated cost of the road reaching that state, and a heuristic lower bound for the cost of the remaining road. Each node has a pointer to its parent (this makes it possible to calculate the route from a proper node). In each step the procedure expands the next unexpanded node with the best sum of cumulated and heuristic cost.

The following oversimplified example of [2] with Figure 1 illustrates the tree of the algorithm. Suppose that we are in position 'Start' in the beginning. From 'Start' we can go to different places, for example two parking places 'P1' and 'P2'
The state will be different in the two locations 'P1' and 'P2' if the duration and distance of the drivings are not equal. Supposing that we can rest 9 or 11 hours, we get two new nodes from each node that reaches a parking place. If we can reach place 'P3' from both 'P1' and 'P2', then this way we get four different nodes in the same place 'P3'. In general none of the four nodes can be bounded in the algorithm, because the states are different and hence we cannot predict which will give the best solution in the end.

Figure 1: An example subtree of the algorithm

The better the lower bounds are, the fewer nodes need to be expanded. However, it is always more time-consuming to make better estimations.

The difficulty with the heuristics is the relation of the state of the driver and the opening times of the routing tasks. Both would be easier to handle separately, but together they give an NP-hard problem.

The main steps of the algorithm are the following:

1. Create the starting node of the tree from the initial state and position of the driver. Put it in an empty list L.
2. While L has any element:
   (a) Pick X from L with the best TotalCost value.
   (b) If the itinerary given by X is a complete tour (finishing all the routing tasks), then RETURN X.
   (c) Select the best possible activities (set A) to do from X.
   (d) BRANCHING: For each element of A create the node (set N) which represents the state and position after that activity.
   (e) For each element of N calculate the cumulated cost (we can get it by adding the cost of the activity to the cumulated cost of X).
   (f) For each element of N calculate the heuristic cost (this step will be examined in detail in the next sections).
   (g) For each element of N compare the lower bound for the reaching time with the limitations (we get the lower bound during the calculation of the heuristics). If the node cannot reach the target in time, then delete it from N.
   (h) BOUNDING: For each element Y of N, where the place of Y is P, get the list L_P of the previously examined nodes in place P. Compare Y with every element of L_P, and if there exists such Z that every state related variable and the cost are not worse in Z than in Y, then delete Y from N.
   (i) Put each element of N into L and into the proper L_P list according to the place of the node.
3. RETURN: Unreachable target. The target cannot be reached in the given time limit.

The algorithm has many additional logics, but here we focus on the heuristics only. A more detailed description of the algorithm can be found in [2].

2. THE CONCEPT OF THE MAIN BOUNDING METHOD
As we mentioned in step 2/f of the algorithm, we need a good heuristic to bound the remaining cost from every node. For this we need to calculate a minimal road and we have to bound the needed duration.

First we estimate the remaining distance and driving duration. To optimize the running time we do not want to ask the mapping software for all these estimations, but store as much of the possibly needed information as we can. We construct two graphs where the nodes are the possible places of the tours (parking places, filling stations, etc.) and the lengths of the edges are the minimal distances and driving durations. From these graphs we generate the minimal distances and durations between each pair of nodes with the Floyd-Warshall algorithm (this is a precalculation before the itinerary generator algorithm mentioned before). How we handle the truck positions and the places of the routing tasks is a whole separate topic (since they are not permanent, they are not contained in the graphs).

After we have a lower bound for the remaining driving time, we estimate the total time needed by constructing a hypothetical itinerary. We suppose that the driver drives the maximal amount he can each time, and then reaches a parking place. In each parking place he rests the minimal amount needed and then goes further. When he reaches a routing task, sometimes he has to wait for the time-window. However, supposing that there are no time-windows, the heuristic can be calculated in linear time (we will call it the linear heuristic).

On the other hand, if we think about how to include the time-windows in the linear heuristic we face a problem. Namely, sometimes it would be better to rest more, not just the minimal needed amount, before making the task. The following example highlights that behavior.

Suppose that the driver arrives at 6 o'clock, after 9 hours of driving, but the routing task opens at 10 o'clock. To finish the routing task, the driver has to work 1 hour there, and we have one more routing task which is 2 hours away from this one. When will we finish the last routing task?

1. If we wait for the first opening and work 1 hour, then we cannot drive further because of the daily driving time limit (9 hours). That way we have to rest at least 9 hours. After the rest we can drive to the next routing task and finish it by 23 o'clock (10 + 1 + 9 + 2 + 1 = 23, if the last task also takes 1 hour of work).
2. If we rest 9 hours instead of the 4 hours of waiting, then we can start the work with a fresh state, drive to the next routing task and finish it by 19 o'clock (6 + 9 + 1 + 2 + 1 = 19 under the same assumption).

The above example shows that we cannot make good lower bounds with such a concept (as in the linear heuristic) if we try to optimize with the driver-state and the time-windows at the same time. However, to obtain better estimations for the branch and bound procedure we must include the time-windows in the heuristics. For the best fit (between the heuristics and the algorithm) we apply almost the same logic to calculate a lower bound for the duration as we use in the branch and bound algorithm itself.

3. THE SELF-BOUNDING BRANCH AND BOUND ALGORITHM
As we mentioned before, the main branch and bound algorithm works on nodes with position, time, driver-state and truck-state. The positions are real places on the map. To make a sharp heuristic in step 2/f of the algorithm (we will call it the B&B heuristic) we run the same branch and bound algorithm (from each node) but with hypothetical positions (with coarser data and simplified activities: no refuelling, no road costs, etc.). This means that we generate those positions which were used by the linear heuristic and let the different cases compete in total duration. The best solution gives the B&B heuristic, which is the lower bound for the remaining cost of the node in the main branch and bound algorithm. Observe that the B&B heuristic needs a heuristic lower bound too. For this we can use the linear heuristic.

It is easy to see that this extended procedure can give much better lower bounds for the main branch and bound algorithm, but it is in question whether it is worth the extra time consumed during the construction of the nodes (calculating their heuristic values). Observe that it is more likely to get better heuristics this way if we have more routing tasks (with time-windows).

4. RESULTS
We evaluated the differences using a sample pack of 4400 itineraries, containing 2-3 routing tasks on average. The original branch and bound procedure (with the linear heuristic) expands about $2.4 \times 10^{4}$ nodes during an itinerary construction. The total time of the algorithm was about $3.2 \times 10^{3}$ minutes, but we use about 100 parallel machines. Hence, it takes about 30 minutes to construct the 4400 itineraries. The heuristics were calculated in about $2.7 \times 10^{6}$ ms in total. There were 67 cases where the algorithm did not give a solution because of the running time limit.

The new branch and bound procedure (with the B&B heuristic) expands about $2 \times 10^{4}$ nodes in the same sample pack of itineraries, but each node creation needs more time. The heuristics were calculated in about $1.4 \times 10^{8}$ ms in total (50 times more than the original). Fortunately the total time of the algorithm was about $7.6 \times 10^{3}$ minutes, which is only about twice the original. That way it is not yet better than the original algorithm with this size. However, since the better lower bound thins the search tree, the reduction is exponential in the number of routing tasks, which means that this solution will be better for longer itineraries. We are not yet able to make statistics for more than 10 routing tasks in one plan.

On the other hand, the new algorithm failed to give a solution in only 35 cases because of the running time limit. We hope that with some modifications to make the heuristic branch and bound faster, we can calculate longer itineraries which cannot be calculated with the original heuristics. If we reach a proper running time with the new branch and bound heuristics, we will try to give more tasks for the planning (now it is about 2-3 routing tasks on average). It is anticipated that with this method we can plan itineraries that are twice as long.

5. ACKNOWLEDGMENTS
Research is partially supported by grant No. OTKA 108947 of the Hungarian Scientific Research Fund.

6. REFERENCES
[1] M. W. Carter, C. C. Price (2000) Operations research: a practical introduction, CRC Press.
[2] Cs. Gy. Csehi, M. Farkas (2016) Truck routing and scheduling, Central European Journal of Operations Research, 1-17. doi: 10.1007/s10100-016-0453-8

Improving flow lines by unbalance ∗

Z. Mihály, Optasoft Kft., 1051 Sas utca 10., Budapest, Hungary
Z. Lelkes, Pallasz Athéné University, Department of Information Technology, 6000 Izsáki út 10.,
Kecskemét, Hungary
E-mail: zsolt.mihaly@optasoft.hu (Z. Mihály), lelkes.zoltan@gamf.kefo.hu (Z. Lelkes)

To appear in the Middle-European Conference on Applied Theoretical Computer Science, October 2016, Koper, Slovenia
∗ to whom all correspondence should be addressed

ABSTRACT
The paper's aim is to provide some insight regarding the performance of balanced and unbalanced discrete manufacturing flow lines. The investigation is based on physical simulation systems. The performance characteristics are gathered with a discrete time simulation program using the next-event time advance mechanism. The model has been implemented in the AIMMS modelling language.

Categories and Subject Descriptors
H.4.2 [Information Systems Applications]: Types of Systems - Logistics; G.3 [Probability and Statistics]: Stochastic processes; J.6 [Computer-Aided Engineering]: Computer-aided manufacturing

General Terms
Management

Keywords
flow line, discrete time simulation, factory physics, throughput, cycle time

1. INTRODUCTION
Discrete manufacturing systems can be classified by several disciplines. Following Govil and Fu [4], manufacturing systems can be job shops, flow lines, flexible manufacturing systems or assembly systems. The research of manufacturing systems uses diverse modelling techniques, e.g., simulation models [13], queueing theory and Petri nets [9].

Companies make great efforts to diminish their ecological footprint, which is highly connected to supply chains. Interest in research on environmentally benign business practices has been continuously increasing. It is necessary to adopt some of these techniques in order to sustain a green supply chain [12]. The study of flow lines is important as they are frequently parts of complex supply chains. The manufacture of various products, such as cars, pharmaceutical ingredients and electrical goods, gives only a few instances where flow lines can be used for modelling.

In this paper, flow lines are investigated using physical experiments and a discrete time simulation model. Some examples from the literature contain investigations into a flow line with a common buffer [16], complex optimization problems where the flow line is only one element in the model [8], or more complicated systems. Huang and Li examined a two-stage hybrid flow shop with multiple product families [7]. Simulation modelling has a wide range of applications in engineering-aided manufacturing regarding system performance. Modelling apparel assembly cells [1], a Mercedes-Benz production facility [10], or analyzing the performance of a Korean motor factory [2] are only some of the examples.

Hopp and Spearman [6] have introduced the concept of factory physics consisting of useful theories. The type of material flow that they investigated is the flow line in which there is only one machine per station, one job class, and no capacity constraint.

Three main modelling measures are proposed by Hopp and Spearman:

• Throughput (TH): the number of entities (cars, apples, people, etc.) coming out from the system during a given time
• Cycle time (CT): the time an entity spends in the system
• Work-in-process (WIP): the number of entities residing in the system at the same time

The higher TH and the lower CT the system has, the better the performance will be. These parameters are not independent from each other. Little's law connects them:

$$WIP = TH \times CT$$

The variability of procedures is measured by the coefficient of variation (CV):

$$CV = \frac{\text{standard deviation}}{\text{mean}}$$

Hopp and Spearman use two so-called characteristic functions to analyze the performance. The dependent variables are the TH and the CT, while the independent variable is the WIP level both times. The flow line is modelled as a closed network. It means that the level of WIP is a model parameter [14].

Regarding performance analysis, three important concepts were introduced [6]:

• Best case performance: the best possible performance for a line. It is balanced, and there is no batching.
• Worst case performance: the worst possible performance for a line. All the entities move in one batch.
• Practical worst case (PWC): As the worst case performance is so bad that it is far from practical instances, PWC was introduced to define a realistic worst case.

The paper's aim is to provide some insight regarding the performance of balanced and unbalanced discrete manufacturing flow lines. The deteriorating effect of variability in balanced and unbalanced systems is examined in a quantitative manner.

2. METHOD OF EXAMINATION
In this research, the same characteristics are used to evaluate the performance as in [5]. Both physical and simulation model experiments are performed to gather data. The models were closed networks containing single machine stations and using CONWIP control. In the physical model experiment, a toy car factory has been realized with the assumption of infinite raw material stock and stochastic demand. The entire process to build a small car takes 4 minutes. The operations could be distributed in an arbitrary way among the stations, with one person of different abilities working at each station.

The simulation model is a discrete time simulation program with the next-event time advance mechanism. Compared with the fixed-increment time advance method, it is more complicated, but more efficient regarding computational need [15]. Figure 1 shows the basic mechanics of the model. W denotes the WIP level of the model while the actual WIP reflects the state of the simulation. In the model, the process times are stochastic variables with normal distributions.

Figure 1: The simplified flow chart of the simulation model

2.1 Implementation of the model
The simulation program is implemented in the AIMMS modelling language [11]. It has already been used in other studies with success. E.g., [3] used it on supply chain optimization with homogenous product transport constraints. It is chosen for a number of reasons. The simulation program can be easily extended in this environment. AIMMS is linked to the most modern solvers, which are easy to integrate. Furthermore, it has an advanced graphical user interface, which can be used for creating easy-to-use and aesthetic software.

3. COMPUTATIONAL RESULTS

3.1 The effect of variability
The results of the physical experiments showed a performance decrease because of the variability. This effect has already been shown in [6]. Balanced flow lines with different CVs are compared with the deterministic case (see Figure 3). As the CV grows, the TH decreases, and the CT increases. Comparing the lines at the optimal WIP level of the deterministic case, that is to say at WIP = 4, it can be stated that TH gets lower by 13% at CV = 0.2, 23% at CV = 0.4, 31% at CV = 0.6 and 37% at CV = 0.8 compared to the deterministic line. In the meantime, CT increases by 14% at CV = 0.2, 30% at CV = 0.4, 44% at CV = 0.6, and 58% at CV = 0.8 compared to the deterministic case.

In contrast, unbalanced systems are less sensitive to the influence of variability. A balanced and an unbalanced line are set against each other in Figure 2. Relative changes are displayed on the ordinate, which shows the deteriorating effect of variability from a different aspect. It is easier to see the difference in the drop of performance regarding WIP. These characteristics are calculated in the following way:

$$TH_{rel} = \frac{|TH_{stoch} - TH_{det}|}{TH_{det}}$$

$$CT_{rel} = \frac{|CT_{stoch} - CT_{det}|}{CT_{det}}$$

Table 1: Comparison of the maximal deteriorations

        Balanced    Unbalanced
TH      42%         23%
CT      174%        129%

Figure 2: The effect of variability on the performance of flow lines. (a) Decrease of TH, (b) Increase of CT.

Figure 3, panels (a) CV = 0.2 and (b) CV = 0.4.

The extent of deterioration is bigger when the line is balanced. In this case, the maximal TH decrease is 42%, against 23% in the unbalanced one. The maximal CT increase is 74% when the flow line is balanced and 29% when it is unbalanced.
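The percentages reported in this section can be cross-checked against Little's law, WIP = TH × CT. If the WIP level is held constant (as in a closed CONWIP line), then CT_stoch / CT_det = TH_det / TH_stoch, so a relative TH decrease d implies a relative CT increase of 1 / (1 − d) − 1. The following minimal Python sketch performs this check; it assumes that WIP is exactly the same in the deterministic and the stochastic run, and the function and variable names are illustrative only.

```python
def ct_increase_at_fixed_wip(th_decrease: float) -> float:
    """Relative CT increase implied by Little's law when WIP is held constant.

    WIP = TH * CT, hence CT_stoch / CT_det = TH_det / TH_stoch = 1 / (1 - th_decrease).
    """
    return 1.0 / (1.0 - th_decrease) - 1.0


# (reported relative TH decrease, reported relative CT increase)
reported_pairs = [
    (0.13, 0.14),  # balanced line, WIP = 4, CV = 0.2
    (0.23, 0.30),  # balanced line, WIP = 4, CV = 0.4
    (0.31, 0.44),  # balanced line, WIP = 4, CV = 0.6
    (0.37, 0.58),  # balanced line, WIP = 4, CV = 0.8
    (0.42, 0.74),  # maximal deterioration, balanced line
    (0.23, 0.29),  # maximal deterioration, unbalanced line
]

for th_drop, ct_rise in reported_pairs:
    implied = ct_increase_at_fixed_wip(th_drop)
    print(f"TH -{th_drop:.0%}: implied CT +{implied:.1%}, reported CT +{ct_rise:.0%}")
```

Under this assumption the implied CT increases are roughly 15%, 30%, 45% and 59% at WIP = 4, and about 72% and 30% for the two maxima, so every reported CT value lies within about two percentage points of the value implied by the corresponding TH value.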
These maxima mean that the deterioration of TH is about twice as high in balanced lines as in unbalanced lines, and the CT maximum is about 2.5 times as high. The loss of TH and the growth of CT increase until the deterministically optimal WIP value is reached. The curves of both systems move together until the lower deterministically optimal WIP value. After the peak, both functions begin to decrease. At high WIP levels, they will converge to 1. Table 1 sums up the results regarding the peaks.

Figure 3: The impairing effect of variance. Panels (c) CV = 0.6 and (d) CV = 0.8.

3.2 System unbalancing
In this research, a tradeoff is assumed between performance and stability. Balanced systems usually give better performance while unbalanced systems give more stability. However, there are situations when an unbalanced system is more efficient. This is illustrated by a case study here.

Three systems are compared: a balanced and two unbalanced ones. The balanced system has uniform process times of 1 hour. The CV of the first three stations equals 0.1, and the last station's CV is 1. In the unbalanced systems, the process times of three operations are 1.15 hours, and there is one with 0.55 hour. The CV values are the same as in the balanced system. The station with the process time of 0.55 hour has CV = 1. Two unbalanced cases are examined. They differ in the position of the non-bottleneck process.

A balanced and two unbalanced flow lines are examined in order to investigate the tradeoff (see Figure 4). While the balanced system has a better performance at any WIP level in the deterministic case, this is not true when stochastic processes are investigated. Around the WIP optimum, the unbalanced flow line has a better output. In practical cases, the optimal level of WIP is about where the derivatives of the functions change in the deterministic case. This is the region where unbalanced systems work better (see Figure 4). According to the experiments, the TH of the unbalanced system can be 9-11% higher compared with the balanced line, and its CT is 8-9% lower. The positions of the bottleneck procedures have no effect in the investigated cases. The results confirm the assumption that there is a tradeoff between performance and stability, and it can be handled as an optimization problem.

Figure 4: Comparison of the performance of a balanced and two unbalanced systems. (a) Deterministic, (b) Stochastic (first three processes are bottlenecks), (c) Stochastic (last three processes are bottlenecks).

4. CONCLUSIONS
Endeavours are generally made to balance flow lines. This is an intuitive idea, and earlier research showed examples where unbalanced systems had worse performance. In this paper, it was shown that unbalancing the flow line to a small extent achieves better performance at low WIP levels, that is to say higher TH and lower CT. In the examined case, the TH was 9-11% higher and the CT 8-9% lower at the optimal WIP level of the deterministic case.

5. REFERENCES
[1] J. T. Black and B. J. Schroer. Simulation of an apparel assembly cell with walking workers and decouplers. Journal of Manufacturing Systems, 12(2):170–180, 1993.
[2] K. Cho, I. Moon, and W. Yun. System analysis of a multi-product, small-lot-sized production by simulation: A Korean motor factory case. Computers & Industrial Engineering, 30(3):347–356, July 1996.
[3] T. Farkas, Z. Valentinyi, E. Rév, and Z. Lelkes. Supply chain optimization with homogenous product transport constraints. In Computer Aided Chemical Engineering, volume 25, pages 205–210, 2008.
[4] M. Govil and M. C. Fu. Queueing theory in manufacturing: A survey. Journal of Manufacturing Systems, 18(3):214–240, 1999.
[5] W. J. Hopp. Supply Chain Science. Waveland Pr Inc, Long Grove, Illinois, 2011.
[6] W. J. Hopp and M. L. Spearman. Factory Physics. McGraw-Hill Education, New York, New York, 2000.
[7] W. Huang and S. Li. A two-stage hybrid flowshop with uniform machines and setup times. Mathematical and Computer Modelling, 27(2):27–45, January 1998.
[8] J. Olhager and B. Rapp. Balancing capacity and lot sizes. European Journal of Operational Research, 19(3):337–344, March 1985.
[9] H. T. Papadopoulos and C. Heavey. Queueing theory in manufacturing systems analysis and design: A classification of models for production and transfer lines. European Journal of Operational Research, 92(1):1–27, July 1996.
[10] Y. H. Park, J. E. Matson, and D. M. Miller. Simulation and analysis of the Mercedes-Benz all activity vehicle (AAV) production facility. In Proceedings of the 1998 Winter Simulation Conference, pages 921–926, December 1998.
[11] M. Roelofs and J. Bisschop. AIMMS The user's guide. AIMMS B.V., AP Haarlem, The Netherlands, 2016.
[12] J. Sarkis. A strategic decision framework for green supply chain management. Journal of Cleaner Production, 11(4):397–409, June 2003.
[13] J. S. Smith. Survey on the use of simulation for manufacturing system design and operation. Journal of Manufacturing Systems, 22(2):157–171, 2003.
[14] W. Whitt. Open and closed models for networks of queues. AT&T Bell Laboratories Technical Journal, 63(9):1911–1979, November 1984.
[15] W. L. Winston. Operations Research: Applications and Algorithms. Cengage Learning, Boston, Massachusetts, 2003.
[16] H. Yamashita and S. Suzuki. An approximation method for line production rate of a serial production line with a common buffer. Computers & Operations Research, 15(5):395–402, 1988.

Process Network Solution of a Soleplate Manufacturer's Extended CPM Problem with Alternatives

Nándor Vincze, Department of Applied Informatics, Faculty of Education, University of Szeged, Boldogasszony u. 6, 6725 Szeged, Hungary, vincze@jgypk.u-szeged.hu
Zsolt Ercsey, Department of System and Software Technology, Faculty of Engineering and Information Technology, University of Pécs, Boszorkány u. 2, 7624 Pécs, Hungary, ercsey@mik.pte.hu
Zoltán Kovács, Department of Computational Optimization, Institute of Informatics, University of Szeged, Árpád tér 2, 6720 Szeged, Hungary, kovacsz@inf.u-szeged.hu

ABSTRACT
In this paper a Hungarian soleplate manufacturer's problem is described in detail, extended with alternatives and effectively solved. First, after the presentation of the industrial problem, the CPM graph of the problem is given and then it is transformed into a process network. Then the original problem is extended with alternatives specified by various industrial needs, for example an activity is performed in two different ways and with resources of different time and costs. Then the corresponding mathematical programming model is formulated: time optimal and time optimal with additional cost constraints mathematical programming models are given. Please note that only the former corresponds to the CPM problem and the latter is an extension. The solution illustrates the efficacy of the method.

INTRODUCTION
The CPM (critical path method) is an algorithmic approach to scheduling a set of activities, where the duration times of the activities are known together with their dependencies and the aim is to calculate the longest path of the planned activities. The method originates in the 1950s. For advances in CPM, please see Chanas and Zielinski, 2001, Li et al. 2015, Madhuri et al. 2013. It is worth mentioning that CPM orders the resources to the activities without a representation in the CPM graph. In case other types of resources are ordered to the activities, the parameters of the problem have to be reset and the problem has to be solved again. This may result in a large number of problems to be solved for a single case. Moreover, CPM also does not handle the case where a given subtask can be solved in different ways.

Process network synthesis is an optimization methodology basically used in the chemical industry. Based on mathematical rigor, graph theoretical approaches and combinatorial techniques are combined with the first focus on the synthesis step, i.e. structure generation. This method enables the consideration of alternatives as well as generates all feasible solutions within one model and one solution process. For details, see Friedler et al. 1992a, 1992b, Tick et al. 2013, Kovács et al. 1999 and 2000, Garcia-Ojeda et al. 2015, Losada et al. 2015.

Transforming CPM problems into process networks is described by Vincze et al. 2015. First the two terminologies are mapped: an event corresponds to a material, an activity corresponds to an operating unit, the dependencies between the activities correspond to the material flows and the CPM graph corresponds to the process network. After the structural mapping and the establishment of the logical connections, each parameter of the original CPM is represented in the resulting process network.

Industrial examples always raise the question of alternatives, where a given subproblem can be solved by performing more than one activity or more than one series of activities, or various resources can be ordered to the activities with different durations, costs etc. These situations cannot be represented within one CPM problem, but after the transformation they can be handled within the process network model.

SOLEPLATE MANUFACTURING
A manufacturer in the southern region of Hungary produces various types of soleplates for irons. For this research paper the production of the Gx3 and EP5 types was investigated. These two soleplate types are identical in terms of shape but they differ in terms of assembling. Gx3 soleplates are assembled by 4 operators in an 8 hour shift when the target is 1000 soleplates, and by 5 operators in case of larger quantities. Other operators support this production work with unpacking and material handling; since these workers belong to other work groups, their work tasks were considered to be handled by one additional operator. The production line is linear. One piece of Gx3 soleplate gets finished in 68 seconds.

Table 1. Production steps; with alternatives indicated in brackets

Gx3 soleplate production activities                          Time (sec)   Cost
Spider bending                                               7            5
Fuse welding                                                 7            3
Cut                                                          5            4
Thermostat welding (alternative thermostat welding)          10 (8)       5 (7)
Spider screwdriving                                          14           4
Spider soleplate welding                                     13           5
Edging                                                       3            4
Cut                                                          2            5
Test (alternative test)                                      7 (9)        5 (4)
Unpacking                                                    7            5
Put on the work table (alternative put on the work table)    5 (3)        4 (6)

Gx3 and EP5 types are different. While the Gx3 soleplate is fixed by 2 screws and the inserted part has to be edged, EP5 has a different formation and is fixed in two different phases with 3 screws driven in. Another major difference is that the soleplates arrive at the production line wrapped and thus a separate unpacking activity has to be performed, which requires extra time from the operator. The previously mentioned additional operator supporting this production line performs the unpacking in a series of steps: soleplates arrive in carton boxes, the boxes are first cut along a mark, then the foils are removed from the cut boxes, then the soleplates are put on the work table. EP5 soleplates are put on the work table on a plastic tray and with a paper separator; these have to be removed first and the plastic plus the paper have to be separately put away into their containers, then the soleplates have to be put on the work table. As a result of these additional packaging tasks, a piece of EP5 soleplate gets finished in 71 seconds. It can be seen that the two types of soleplates are very similar, yet different in some ways. The differences of the necessary operating times may be indicated for example in Yamazumi tables.

Figure 1. CPM graph of the soleplate manufacturing.

Figure 2. A. Process network of the soleplate manufacturing. B. Process network of the manufacturing with alternatives.

Please note that Figure 2.A illustrates the process network model of the soleplate manufacturing and Figure 2.B contains the extended model with alternative activities. Three activities may be performed in two ways, namely

• the put on the work table activity may be performed in 3 seconds or as an alternative in 5 seconds when a student or an assistant performs the activity; please note that obviously their costs also differ, namely the former has a cost of 6 and the latter has a cost of 4;
• the thermostat welding activity may be performed in 10 seconds or as an alternative in 8 seconds when a senior welder performs the activity; please note that obviously their costs also differ, namely the former has a cost of 5 and the latter has a cost of 7;
• the test activity may be performed in 7 seconds or as an alternative in 9 seconds when a junior worker performs the activity; please note that obviously their costs also differ, namely the former has a cost of 5 and the latter has a cost of 4.

When only time is considered in the mathematical programming model, it corresponds to the original CPM problem extended with the above mentioned alternatives. Since in this industrial case study financial issues were also taken into consideration, the time optimal mathematical programming model was also extended with cost constraints. Please note that these models are detailed in Vincze et al. 2016.

The overall cost constraint of the manufacturer was determined to be 55. Within this constraint the optimal production process contains the original put on the work table activity, the alternative thermostat welding activity and the original test activity. When the manufacturer determined the overall cost constraint to be 50, the optimal production process contains the alternative put on the work table activity, the alternative thermostat welding activity and the alternative test activity.

Please note that the same method was used when solving the industrial problem of the EP5 type soleplate manufacturing process.

CONCLUDING REMARKS
The CPM gives the longest path of the planned activities together with its overall duration; nevertheless, for industrial real case problems where financial issues also influence the decisions of the production processes it is very complicated to reach a true goal. Therefore in the present paper a new process network method was used which extends the problem range of CPM problems with alternatives as well as cost constraints.

ACKNOWLEDGEMENTS
The present scientific contribution is dedicated to the 650th anniversary of the foundation of the University of Pécs, Hungary.

REFERENCES
[1] Chanas S. and P. Zielinski, Critical path analysis in the network with fuzzy activity times, Fuzzy Sets and Systems, (2001) 122, 195–204
[2] Friedler, F., K. Tarjan, Y. W. Huang, and L. T. Fan, Graph-Theoretic Approach to Process Synthesis: Axioms and Theorems, Chem. Engng Sci., 47, 1973-1988 (1992a).
[3] Friedler, F., K. Tarjan, Y. W. Huang, and L. T. Fan, Combinatorial Algorithms for Process Synthesis, Computers Chem. Engng, 16, S313-320 (1992b)
[4] Zhenhong Li, Yankui Liu, Guoqing Yang: A New Probability Model for Insuring Critical Path Problem with Heuristic Algorithm. Neurocomputing 148: 129-135 (2015)
[5] Madhuri, U., Saradhi, P. and Shankar, R., (2014) "Fuzzy Linear Programming Model for Critical Path Analysis", Int. J. Contemp. Math. Sciences, Vol. 8, No. 2, 2013, pp. 93-116
[6] Garcia-Ojeda, J.C., B. Bertok, F. Friedler, A. Argoti, and L.T. Fan, A Preliminary Study of the Application of the P-graph Methodology for Organization-based Multiagent System Designs: Assessment, Acta Polytechnica Hungarica, 12, 103-122 (2015).
[7] József Tick, Csanád Imreh, Zoltán Kovács, Business Process Modeling and the Robust PNS Problem, Acta Polytechnica, DOI: 10.12700/APH.10.06.2013.6.11, Volume 10, Issue Number 6, 193-204, 2013
[8] Losada, J.P., I. Heckl, B. Bertok, F. Friedler, J. C. Garcia-Ojeda, and A. Argoti, Process-network Synthesis for Benzaldehyde Production: P-graph Approach, Chemical Engineering Transactions, 45, 1369-1374 (2015).
[9] Vincze, N., Z. Ercsey, T. Kovács, J. Tick, Z. Kovács, Process Network Solution of Extended CPM Problems with Alternatives, Acta Polytechnica Hungarica 13:(3) pp. 101-117. (2016)

ON NIST test of a Novel Cryptosystem Based on Automata Compositions

Pál Dömösi, Institute of Mathematics and Informatics, University of Nyíregyháza, H-4400 Nyíregyháza, Sóstói út 31/B, Hungary, domosi.pal@nye.hu
József Gáll, Faculty of Informatics, University of Debrecen, H-4028 Debrecen, Kassai út 26, Hungary, gall.jozsef@inf.unideb.hu
Géza Horváth, Faculty of Informatics, University of Debrecen, H-4028 Debrecen, Kassai út 26, Hungary, horvath.geza@inf.unideb.hu
Norbert Tihanyi, Faculty of Informatics, Eötvös Loránd University, H-1117 Budapest, Pázmány Péter sétány 1/C, Hungary, tihanyi.pgp@gmail.com

ABSTRACT
In this paper we discuss the NIST test results of a previously introduced cryptosystem based on automata compositions. Our conclusions based on the statistics confirm that the requirements of the NIST test are fulfilled.

Keywords
automata network, NIST, block cipher, statistics

1. INTRODUCTION
The history of cryptography is crowded with examples where encryption systems supposed to be very safe were proved breakable. Based on simple probability theory and mathematical logic, the one-time pad system (OTP), which is commonly called the Vernam cipher, is the only known cryptographic system that is completely unbreakable. Only this system is known to have a mathematical proof of its perfect secrecy [17]. Although the OTP is the most reliable form of encryption, in practice its use is not efficient. Each user must have a copy of the symmetric key and the key exchange can only be accomplished through secure communication channels. The key cannot be used more than once and the key size must be at least the size of the encoded text. OTP is a symmetric system, where the decryption and encryption key coincide, or any of them can be easily derived from the other.

... also called public-key cryptosystem, where the determination of the decryption key is infeasible for anyone knowing only the encryption key. (The principle of public-key cryptography was invented by Diffie and Hellman in 1976.) The discussed novel cipher is a symmetric system.

Several types of cryptosystems based on automata theory have been designed so far. Some of them are based on Mealy automata [11, 16, 19, 20] or their generalization [2], while others are based on cellular automata [5, 10, 14, 13]. Almost all of the best-known automaton based cryptosystems share the common problem of serious realization difficulties: some systems are easy to defeat [3, 4, 15], the technical realization of others results in slow performance [10], still others exhibit difficulties in the choice of the key-automaton [14, 13], some of them have no known rigorous security analysis and the security of some systems is largely unknown [5].

In [6, 7] we introduced new block ciphers based on the Gluškov-type product of automata. In this paper we investigate the system [6]. Both systems use the following simple idea: Consider a giant-size permutation automaton such that the set of states and the set of inputs consist of all strings of a given length over a non-trivial alphabet as all possible plaintext/ciphertext blocks. Moreover consider a cryptographically secure pseudo random number generator with large
Therefore, both of the encryptrion and decryption periodicity having the property that, getting its really ran- keys must be secret, and those secret keys should be known dom kernel, it serves a sequence of pseudo random strings only by the sender and the recipient of the message. An- as inputs for the automaton. For each plaintext block the other main type of cryptosystems is the asymmetric one – system calculates the new state into which the actual pseu- dorandom string takes the automaton from the state which is identified as the actual plaintext block. The string, iden- tified as the new state, will be the ciphertext block ordered to the considered plaintext block. Of course, the ciphertext will be the concatenation of the generated ciphertext block. The giant size of the automaton makes it infeasible to break the system by brute-force method. The problem of this idea is that store of the transition ma- trix of giant-size automata is impossible. Another idea is 24 that this problem can be overcomed considering automata (A, Σ, δ) is a temporal product (t-product) of A1 by A2 with which consists of composition of automata. In this case, we respect to Σ and ϕ if for any a ∈ A and x ∈ Σ, δ(a, x) = should store only the component-automata and the struc- δ2(δ1(a, x1), x2), where (x1, x2) = ϕ(x). The concept of tem- ture of the composition. Moreover, if the component au- poral product is generalized in the natural way to an arbi- tomata are isomorphic to each others then it is enough to trary finite family of n > 0 automata At (t = 1, . . . , n), store the transition matrix of one component automaton and all with the same state set A, for any mapping ϕ : Σ → the structure of the isomorphisms. By this recognition, the Qn Σ t=1 t, by defining δ(a, x) = δn(· · · δ2(δ1(a, x1), x2), · · · , xn) storage of automata having 2128 states and 2128 input signs when ϕ(x) = (x1, . . . , xn). In particular, a temporal prod- can be easily solved. 
The basic idea of this cipher is to oper- uct of automata with a single factor is just a (one-to-many) ate on a giant secret square matrix which is compressed into relabeling of the input letters of some input-subautomaton the memory using automata-theoretic methods. The matrix of its factor. has 2128 rows and 2128 columns such that each of its rows is a permutation of all bitstrings of 128 bit length. Using Lemma 2. Every temporal product of permutation au- automata-theoretic methods, we can easily handle this giant tomata is a permutation automaton. matrix. Because of the giant size of the matrix, there is no hope to attack the system by brute-force method. On the Given a function f : X1 × · · · × Xn → Y, we say that f other hand, this giant matrix can be generated unambigu- is really independent of its i-th variable if for every pair ously by a bitstring of 782 bytes length. Note that this less (x1, . . . , xn), (x1, . . . , xi−1, x0i, xi+1, . . . , xn) ∈ X1 × · · · × Xn, than 1 kilobyte long string can be generated by an appro- f (x1, . . . , xn) = f (x1, . . . , xi−1, x0i, xi+1, . . . , xn). Otherwise priate hash function using a secret password of any length. we say that f really depends on its i-th variable. For all notions and notation not defined here we refer to the A (finite) directed graph (or, in short, a digraph) D = (V, E) monographs [8, 9, 12, 1]. The discussed cryptosystem is a (of order n > 0) is a pair consisting of sets of vertices block cipher. Since the key automaton is a permutation au- V = {v1, . . . , vn} and edges E ⊆ V × V. Elements of V are tomaton, for every ciphertext there exists exactly one plain- sometimes called nodes. Moreover, if (v, v0) ∈ E then it is text making the encryption and decryption unambiguous. said that (v, v0) is an outgoing edge of v, and simultaneously, Moreover, there is a huge number of corresponding encoded (v, v0) is an incoming edge for v0. 
(In this way, a loop edge messages to each plaintext so that several encryptions of the (v, v) has both of these properties with respect to the vertex same plaintext yield several distinct ciphertexts. v.) An edge (v, v0) ∈ E is said to have source v and target v0. If |V | = n then we also say that D is a digraph of order n. If 2. THEORETICAL BACKGROUND V can be decomposed into two disjoint (nonempty) subsets V By an automaton we mean a deterministic finite automaton 1, V2 such that V1 is the set of all incoming edges and V2 is the set of all outgoing edges then we say that D is a bipartite without outputs. If all the rows of the transition matrix are digraph. permutations of the state set then we have a permutation automaton. Let Σ be the set of all binary strings with a given length ` ≥ 1 and let n be a positive integer, let A Lemma 1. An automaton A = (A, Σ, δ) is a permutation 1 = (Σ, Σ × Σ, δ ) be a permutation automaton and let A automaton if and only if for any a, b ∈ A, x ∈ Σ, δ(a, x) = A1 i = (Σ, Σ × Σ, δ ), i = 2, . . . , n be state-isomorphic copies of A δ(b, x) implies a = b. Ai 1 such that A1, . . . , An are pairwise distinct, and let n be a power of 2. Consider the following bipartite digraphs: Let Ai = (Ai, Σi, δi) be automata where i ∈ {1, . . . , n}, n ≥ 1. Take a finite nonvoid set Σ and a feedback function ϕi : D A 1 = ({1, . . . , n}, {(n/2 + 1, 1), (n/2 + 2, 2), . . . , (n, n/2)}), 1 × · · · × An × Σ → Σi for every i ∈ {1, . . . , n}. The Gluškov-type product of the automata Ai with respect to D the feedback functions ϕ 2 = ({1, . . . , n}, {(n/4 + 1, 1), (n/4 + 2, 2), . . . , (n/2, n/4), i (i ∈ {1, . . . , n}) is defined to be (3n/4 + 1, n/2 + 1), (3n/4 + 2, n/2 + 2), . . . , (n, 3n/4)}), the automaton A = A1 × · · · × An(Σ, (ϕ1, . . . , ϕn)) with state set A = A1 × · · · × An, input set Σ, transition function . . ., δ given by δ((a1, . . . , an), x) = (δ1(a1, ϕ1(a1, . . . , an, x)), . . . , δn(an, ϕn(a1, . . . , an, x))) for all (a1, . . . 
, an) ∈ A and x ∈ Σ. D In particular, if A log2n−1 = ({1, . . . , n}, {(3, 1), (4, 2), (7, 5), (8, 6), . . . , 1 = . . . = An then we say that A is a (n − 1, n − 3), (n, n − 2)}), Gluškov-type power. D We shall use the feedback functions ϕ log2n = ({1, . . . , n}, {(2, 1), (4, 3), . . . , (n, n − 1)}), i, i = 1, . . . , n in an extended sense as mappings ϕ∗i : A1 × · · · × An × Σ∗, where D ϕ∗ log2n+1 = D1, i (a1, . . . , an, λ) = λ and ϕ∗i(a1, . . . , an, px) = ϕ∗i(a1, . . . , an, p)ϕi(δ1(a1, ϕ∗1(a1, . . . , an, p)), . . . , . . . , δn(an, ϕ∗n(a1, . . . , an, p)), x), ai ∈ Ai, i = 1, . . . , n, p ∈ Σ∗,x ∈ Σ. In the sequel, ϕ∗i, i ∈ {1, . . . , n} will also be denoted by D ϕ 2log2n = Dlog2n. i. For every digraph D = (V, E) with D ∈ {D Let A 1, . . . , D2log2n}, t = (A, Σt, δt), t = 1, 2 be automata having a common let V state set A. Take a finite nonvoid set Σ and a mapping ϕ of Σ 1 be the set of all incoming edges and let V2 be the set of all outgoing edges, and define the Gluškov-type product, into Σ1 × Σ2. Then the automaton A = 25 called two-phase D-product, AD = A1 × · · · × An(Σn, (ϕ1, . . . , ϕn)) of A1, . . . , An so Table 1: Parameters used for NIST Test Suite that for every (a1, . . . , an), (x1, . . . , xn) ∈ Σn, i ∈ {1, . . . , n}, ϕ Test Name Block length i(a1, . . . , an, (x1, . . . , xn)) = (aj ⊕ xj , xi), if (j, i) ∈ V1 and aj ⊕ xj is the bitwise addition modulo 2 of aj and xj , Block Frequency 128 ϕi(a1, . . . , an, (x1, . . . , xn)) = (a0 ⊕ j xj , xi), if (j, i) ∈ V2, a0j denotes the state into which Non-overlapping Template 9 ϕj (a1, . . . , an, (x1, . . . , xn)) takes the automaton from its state Overlapping Template 9 aj , and a0 ⊕ j xj is the bitwise addition modulo 2 of a0j and x Approximate Entropy 10 j . Serial 16 Let B = (Σn, (Σn)2log2n, δB) be the temporal product of AD , . . . , A with respect to (Σn)2log2n and the iden- Linear Complexity 500 1 D2log2n tity map ϕ : (Σn)2log2n → (Σn)2log2n. 
We say that B is a In order to analyze the output of the algorithm we encrypted key-automaton with respect to A1, . . . , An.1 Obviously, B the Rockyou database, which contains more than 32 millions is unambigously defined by the transition matrix of A1 and of cleartext passwords. Applying the NIST test for the en- the bijective mappings τ1 : Σ → Σ, . . . , τn : Σ → Σ which crypted file it has turned out that the output of the algo- represent the state isomorphisms of A1, . . . , An to A. rithm can not be distinguished in polynomial time from true random sources by statistical tests. The exact p-values of An important property of key-automata is explained in the the evaluation of the ciphertext are shown in Table (2). We following result. also tested the uniformity of the distribution of the p-values obtained by the statistical tests included in NIST, which is a Theorem 1. Every key-automaton is a permutation au- usual requirement in the literature (see e.g. [18]). The uni- tomaton. formity of p-values provide no additional information about the type of the cryptosystem. We have also shown that the Both of the encryption and decryption algorithms use a proportions of binary sequences which passed the 0.01 level pseudo random generator and the above defined key-automaton. lie in the required confidence interval (see e.g. [18]). 3. THE NIST TEST Table 2: Results for the uniformity of p-values and the proportion of passing sequences The National Institute of Standards and Technology (NIST) published a statistical package consisting of 15 statistical P-value Proportion Statistical Test tests that were developed to test the randomness of arbi- trarily long binary sequences produced by either hardware 0.162606 296/300 F requency or software based cryptographic random or pseudorandom 0.407091 298/300 BlockF requency number generators. In case of each statistical test a set of P-values was produced. 
Given a significance level α, if the 0.574903 297/300 CumulativeSums P-value is less than or equal to α then test suggests that 0.840081 295/300 CumulativeSums the observed data is inconsistent with our null hypothesis, i.e. the ’hypothesis of randomness’, so we reject it. We used 0.205897 297/300 Runs α = 0.01 as it is common in such problems in cryptography. An α of 0.01 indicates that one would expect 1 sequence 0.284959 297/300 LongestRun in 100 sequences to be rejected under the null hyphothe- 0.527442 297/300 Rank sis. Hence a P-value exceeding 0.01 would mean that the sequence would be considered to be random, and P-value 0.623240 298/300 F F T ≤ 0.01 would lead to the conclusion that the sequence is 0.958773 295/300 N onOverlappingT emplate non-random. . . . One of the criteria used to evaluate the AES candidate algo- · · · · · · · · · rithms was their demonstrated suitability as random number generators. That is, the evaluation of their output utiliz- 0.419021 299/300 OverlappingT emplate ing statistical tests should not provide any means by which 0.220931 298/300 U niversal to distinguish them computationally from a truly random source. Randomness testing was performed using the same 0.935716 299/300 ApproximateEntropy parameteres as for the AES candidates in order to achieve the most reliable and comparable results. First the input pa- 0.516465 171/177 RandomExcursions rameters –such as the sequence length, sample size, and sig- 0.384836 172/177 RandomExcursionsV ariant nificance level– were fixed. Namely, these parameters were set at 220 bits, 300 binary sequences, and α = 0.01, respec- 0.042808 298/300 Serial tively. Furthermore, Table 1 shows the length parameteres 0.253551 296/300 Serial we used. 0.039244 295/300 LinearComplexity 1Recall that n should be a power of 2. 26 3.1 Sárközy and Mauduit randomness test [5] A. Clarridge and K. Salomaa. 
A cryptosystem based Different security audits and processes use different statis- on the composition of reversible cellular automata. tical tests and methods. In order to fulfill further require- LATA, LNCS 5457, pages 314–325, 2009. ments we performed the Sárközy and Mauduit methods in [6] P. Dömösi and G. Horváth. A novel cryptosystem order to study the behaviour of pseudorandom sequences based on abstract automata and latin cubes. Studia generated by our cryptosystem. Let EN = {e1, e2, . . . , eN } ∈ Scientiarum Mathematicarum Hungarica, {−1, +1}N represent a finite binary sequence. Let us define 52(2):221–232, 2015. t [7] P. Dömösi and G. Horváth. A novel cryptosystem X U (E based on gluškov product of automata. Acta N , t, a, b) = ea + jb j=0 Cybernetica, 22:359–371, 2015. [8] P. Dömösi and C. L. Nehaniv. Algebraic theory of automata networks: An introduction. ser. SIAM The well-distribution measure of EN is defined by monographs on Discrete Mathematics and M −1 Applications, vol. 11, Society for Industrial and X W (E N ) = max |U (EN , M, u, v)| = max eu+jv Applied Mathematics (SIAM), Philadelphia, PA, doi M,u,v M,u,v j=0 10.1137/1.9780898718492, ”2005”. where the maximum is taken over all M, u, v with u + (M − [9] F. Gécseg. Products of automata ser. EATCS 1)v ≤ N . Furthermore let us define Monographs on Theoretical Computer Science, vol. 7, doi 10.1007/978-3-642-61611-2. Springer-Verlag, M X Berlin, Heidelberg, New York, Tokyo, 1986. V (EN , M, D) = en+d e . . . e 1 n+d2 n+dk [10] P. Guan. Cellular automaton public key cryptosystem. n=1 Complex Systems, 1:51–56, 1987. [11] M. Gysin. One-key cryptosystem based on a finite The correlation measure of order k of EN is defined by non-linear automaton. In E. Dawson and J- Golic, M eds., Proc. Int. Conf. Proceedings of the Cryptography: X C k (EN ) = max |V (EN , M, D)| = max en+d e . . . 
e Policy and Algorithms, Lecture Notes in Computer 1 n+d2 n+dk M,D M,D n=1 Science 1029, Springer-Verlag, Berlin, pages 165–163. where the maximum is taken over all M and D = (d CPAC’95, Brisbane, Queensland, Australia, July 3-5 1, . . . , dk ) such that 0 ≤ d 1995. 1 ≤ · · · ≤ dk ≤ N − M . The goodness of a PRNG is determined by the order of W (E [12] R. M. J. E. Hopcroft and J. D. Ullman. Introduction N ) and Ck (EN ). We were not able to distinguish the output of our cryptosys- to Automata Theory, Languages and Computation, tem from true random sources by analyzing the deviation of second edition, Addison-Wesley series in computer W (E science. Addison-Wesley, 2001. N ) and Ck (EN ). [13] J. Kari. Reversibility of 2d cellular automata is 4. CONCLUSIONS undecidable. Physica D, 45:379–385, 1990. The output of our crypto algorithm has passed all statistical [14] J. Kari. Cryptosystems based on reversible cellular tests we performed (NIST test, Sárközy and Mauduit test) automata. University of Turku, Finland, preprint, and we were not able to distinguish it from true ran- April 1992. dom sources by statistical methods. Statistical analyses of [15] T. Meskaten. On finite automaton public key a cryptosystem is a must have requirement, and these tests cryptosystems. TUCS Technical Report, Turku Centre are good indicators that further analyses should be done. for Computer Science, Turku, No. 408:1–42, 2001. Exact cryptoanalyses like chosen-plaintext, known-plaintext [16] V. J. Rayward-Smith. Mealy machines as coding and related-key attack will be investigated in order to prove devices In: H. J. Beker and F. C. Piper, eds., or disprove the strength of this cryptosystem. These prob- Cryptography and Coding. Claredon Press, Oxford, lems are the subject of our future research. 1989. [17] C. Shannon. Communication theory of secrecy 5. REFERENCES systems. Bell System Technical Journal, [1] P. C. O. A. J. Menezes and S. A. Vanstone. Handbook 28(4):656–715, 1949. 
of Applied Cryptography ser. Discrete Mathematics [18] J. Soto. Statistical testing of random number and Its Applications, doi 10.1201/9781439821916. generators. National Institute of Standards and CRC Press, 1996. Technology, [2] A. Atanasiu. A class of coders based on gsm. Acta http://csrc.nist.gov/groups/ST/toolkit/rng/, Informatica, 29:779–791, 1992. downloaded in August 2016. [3] F. Bao. Cryptoanalysis of partially known cellular [19] R. Tao. Finite Automata and Application to automata. IEEE Trans. on Computers, 53:1493–1497, Cryptography. Springer-Verlag, Berlin, 2009. 2004. [20] R. Tao and S. Chen. The generalization of public key [4] E. Biham. Cryptoanalysis of the chaotic map cryptosystem fapkc4. Chinese Science Bulletin, cryptosystem suggested at eurocrypt’91. In D. W. 44(9):784–790, May 1999. Davies, ed., Proc. Conf. Advances in Cryptology, Workshop on the Theory and Application of Cryptographic Techniques, Brighton, UK, pages 532–534. EUROCRYPT’91, April 8-11 1991. 27 ALGator– An Automatic Algorithm Evaluation System Tomaž Dobravec Faculty of Computer and Information Science University of Ljubljana, Slovenia tomaz.dobravec@fri.uni-lj.si ABSTRACT activities user must be logged into a system and must have In this paper we present an automatic algorithm evaluation one of the following roles. system called ALGator, which was developed to facilitate the algorithm design and evaluation process. The system System administrator. enables unbiased tests of the correctness and quality of im- The system administrator installs and manages the whole plemented algorithms for solving various kinds of problems system (the software and hardware part), and has the ac- (e.g. sorting data, matrix multiplication, traveler salesman cess to all the resources of the system. problem, shortest path problem, and the like). Within the Project administrator. 
ALGator one can define a problem by specifying the problem The project administrator defines the project by a) imple- descriptors, test sets with corresponding test cases, input menting the predefined java or C++ interfaces that describe parameters and output indicators, algorithm specifications the problem and the structure of the algorithm; b) defining and criteria for measuring the quality of algorithms. When a sets of test cases on which algorithms will be executed, and user of the system submits an algorithm for solving the given c) characterizing the format of the input and the output of problem, ALGator automatically executes this algorithm on algorithms (i.e. defining the parameters of the input and in- predefined tests, measures the quality indicators and pre- dicators of the output). Project administrator has an access pares the results to be compared with the results of other to all the project resources. If the project is made public, algorithms in the system. ALGator in meant to be used by project data can be seen by all users, while private projects algorithm developers to perform independent quality tests can be seen only by project and system administrators. for their solutions. Researcher. The researcher defines an algorithm within the selected pro- ject, runs predefined tests and compares the results with the Keywords results of other algorithms. Public algorithms can be seen algorithm development, automatic execution and evaluation, by every user while the private algorithms can only be seen algorithm testing and analysis the owner (i.e. researcher) and the project administrator. 1. INTRODUCTION 1.2 A typical use case Algorithm evaluation is a very important part of an algo- A typical way of using the ALGator is as follows: rithm design and implementation process. The ALGator was designed to facilitate an automatic algorithm evaluation process. 
It is used to execute an algorithm implementation • The system administrator prepares the system by pro- on the given predefined sets of test cases and to analyze var- viding the hardware, installing the ALGator software ious indicators of the execution. Within every project of the packages, and publishing the internet address of the system user can define the problem to be solved, sets of test installed system. cases, parameters of the input and indicators of the output • The project administrator adds a new project and de- data and the criteria for the algorithm quality evaluation. fines all the project’s properties. When the project is When a project is defined, any number of algorithm im- completely defined and declared as public, the ALGa- plementations (programs) can be added. When requested, tor automatically generates an internet subpage with system executes all the implemented algorithms, checks the the project presentation and usage guide sections. correctness and compares the quality of their results. Using the ALGator user can add additional quality criteria, draw • The project administrator adds some state of the art graphs and perform evaluations and comparisons of defined algorithms for solving the problem of the project, which algorithms. will be used as a reference for the evaluation process (i.e. the results of the algorithms added by researchers will be compared with the results of these referential 1.1 User roles algorithms). The ALGator can be used by users with different roles. An unauthenticated user (guest) can execute only those actions • According to the rules, presented at the project’s web- that do not change the system, which basically means that site, the researcher adds a new algorithm. The ALGa- such user can only view the public data. For all the other tor will automatically run the new algorithm on prede- 28 fined tests. The researcher then checks the correctness the correct implementation of the getCurrent() method. 
and compares the results of his algorithm with the re- All the other methods are general and they can be used sults of the other algorithms defined in the project. without modification. The researcher can also decide to make the algorithm public c l a s s S o r t T e s t S e t I t e r a t o r public (by default, the algorithms are private). extends D e f a u l t T e s t S e t I t e r a t o r { • The guest of the system lists the results and prints public T e s t C a s e g e t C u r r e n t ( ) { the graphs and other data produced by the ALGator. S t r i n g [ ] f i e l d s = i n p u t L i n e . s p l i t ( ” : ” ) ; Guest can also perform some actions (like customiza- tion of the presentation) that do not alter the project p r o b S i z e = I n t e g e r . p a r s e I n t ( f i e l d s [ 1 ] ) ; configuration. S t r i n g group = f i e l d s [ 2 ] ; i n t [ ] a r r a y = new i n t [ p r o b S i z e ] ; 2. PROJECT DEFINITION switch ( group ) { The main task of the project administrator is to provide the case ”RND” : configuration files and to implement corresponding java or Random rnd = new Random ( ) ; C++ interfaces. Besides the definition of the output for- f o r ( i = 0 ; i < p r o b S i z e ; i ++) mat (where the sequence of the parameters and indicators a r r a y [ i ] = rnd . n e x t I n t ( 1 0 0 0 ) ; in output file is described), the test cases, the test sets and break ; the algorithm structure has to be defined precisely. // . . . } The test cases and the test sets S o r t T e s t C a s e t C a s e = new S o r t T e s t C a s e ( ) ; A test case in the ALGator execution environment is de- t C a s e . a r r a y T o S o r t = a r r a y ; fined by a subclass of a TestCase class, which contains data return t C a s e ; structures to hold the test case data. Since these data struc- } tures are project-specific (i.e. each problem needs data of its } own type) the project administrator has to implement the [Project]TestCase class and prepare the data structures. 
For example, in the data-sorting problem, the SortTestCase Algorithms class could be defined as follows. The “heart” of the each project are the implemented algo- rithms. Each algorithm is represented by a subclass of the public c l a s s S o r t T e s t C a s e extends T e s t C a s e { AbsAlgorithm class with the following methods: // An a r r a y o f d a t a t o b e s o r t e d public i n t [ ] a r r a y T o S o r t ; } ErrorStatus init(TestCase test). This method takes care for the input of the algorithm; it reads the test case A test set contains one or more test cases and it is a mini- and prepares the data. To enable fast algorithm execu- mal execution unit. Test set is defined by a single text file in tion all expensive initial tasks have to be done in this which every line defines one test case. The format of these method. When this method is done all the required lines is project-specific and it is defined by a project admin- algorithm’s input data has to be prepared in a proper istrator. If required, additional files can be used to specify format. the cases. Again, the syntax and the semantics of the con- void run(). In this method the execute(...) method is tent of these files is defined by a project administrator. The called. The parameters of the execute() method are following presents an example of the text file defining five project-specific and are provided by project adminis- test cases for the data-sorting problem. trator. The ALGator takes the time of the execution t e s t 1 : 1 0 0 0 0 :RND of the run() method as an algorithm execution time t e s t 2 : 2 0 0 0 0 :RND therefore nothing else as the execute() method call t e s t 3 : 3 0 0 0 0 :RND should be placed in the run() method body. t e s t 4 : 4 0 0 0 0 : FILE : numbers . t x t : 1 2 5 4 0 public void run ( ) { t e s t 5 : 5 0 0 0 0 : FILE : numbers . t x t : 1 6 5 3 4 e x e c u t e ( s o r t T e s t C a s e . a r r a y T o S o r t ) ; } To iterate the test set of a given project (i.e. 
to read the lines of the text file, to parse and interpret their meaning and to generate a test case for each line) the ALGator uses the AbstractTestSetIterator class. The main task of this class is to provide test cases (as [Project]TestCase classes) one by one. The AbstractTestSetIterator class contains the following methods: boolean hasNext() (returns true if there are some tests left in the test set), void readNext() (reads the next test case and stores it in internal data structures) and TestCase getCurrent() (returns the last test case read by the readNext() method). Since the representation of test cases is project-specific, the project administrator has to provide

ParameterSet done(). This method collects all the parameters and indicators of the execution and prepares them in a form suitable to be written to the output file.

The AbsAlgorithm class is abstract and the project administrator has to provide the [Project]AbsAlgorithm subclass with the above mentioned methods implemented. In addition, he has to declare fields for input data (in these fields the input data obtained from the test case will be stored during the execution of the init() method) and the abstract execute() method with the appropriate number and type of parameters. The task of the researcher is to implement a subclass of [Project]AbsAlgorithm and implement the execute(...) method. In other words, all the "dirty job" of preparing data and collecting the results is done by the project administrator. The researcher who wants to provide an algorithm only has to implement one method which returns a proper result. In the case of the data-sorting problem, an algorithm only needs to sort the array of data; a very simple (but technically correct) algorithm that can be executed in the ALGator is as follows.

    import java.util.Arrays;

    public class JavaSortAlgorithm extends SortAbsAlgorithm {
      public void execute(int[] data) {
        Arrays.sort(data);  // sort the input array in place
      }
    }

3. INDICATORS OF THE ALGORITHM
Since the ALGator was designed to be used for various kinds of problems, the criteria for measuring the quality of algorithms are not defined as a part of the system but have to be defined by the project administrator. The current version of the system enables measurements of three different kinds of indicators: a) the indicators to measure the speed and the quality of the algorithm (the so called EM indicators), b) the project-specific counters to count the usage of the parts of the algorithm's program code (the so called CNT indicators), and c) the counters of the Java byte code usage (the so called JVM indicators). These indicators are calculated with independent measurements that are performed as separate tasks so they do not interfere with one another. For example: when the ALGator measures time, the CNT and JVM indicators are disabled. To perform the JVM measurements a dedicated Java virtual machine is used.

The EM measurements.
These measurements are used to measure the time and other project-specific metrics. All measurements of the time are performed automatically. To provide time indicators that are as accurate as possible, the ALGator tries to reduce the influence of uncontrolled computer activities (e.g. a sudden increase of a system resource usage) by running each algorithm several times. The system measures the first, the best, the worst and the average time of the execution. The project administrator only needs to specify the phases of algorithm execution (e.g. the pre-processing phase, the main phase, the post-processing phase, ...) and to select which of the time indicators are to be presented as the result of execution.

The project-specific indicators are defined by the project administrator. They can be presented as a string or as a number. For example, for exact algorithms, the value of an indicator could be "OK" (if the algorithm produced the correct result) or "NOK" (if the result of the algorithm is not correct). For approximation algorithms the value of an indicator could be the quality of the result (i.e. the quotient of the correct result and the result of the algorithm).

The CNT measurements.
The CNT measurements are used to count the usage of the parts of the program code. This option is used to analyze the usage of a certain system resource or to count the usage of a selected type of commands on the programming language level. Using this one can, for example, measure how many times the memory allocation functions were executed during the algorithm execution and the amount of memory allocated by these calls. One can also use CNT measurements to detect which part of the algorithm is most frequently used. For example, if the problem in question is data sorting, using the CNT measurements one could count the number of comparisons, the number of swaps of elements and the number of recursive function calls (which are the measures that can predict the algorithm execution behavior [4]). To facilitate the CNT measurement in the project, the project administrator has to define the names and the meaning of the counters and the researchers have to tag the appropriate places in their code. Everything else is done automatically by the ALGator.

The JVM measurements.
An algorithm written in the Java programming language compiles into Java byte code. An interesting option offered by the ALGator is the ability to count how many times each byte code instruction was used while executing the algorithm on a given test case. To facilitate this option the ALGator uses a dedicated Java virtual machine which was developed as a part of the ALGator project [2, 3]. Besides counting the usage of each byte code instruction, this virtual machine also records data about the memory usage. In [1] Lambert and Power indicated that the frequency of the usage of each byte code instruction can be used to predict the execution time. Even though the ALGator's ability to count the byte code instruction usage is quite young, we expect that the data produced by the JVM measurements could be useful not only for the quantitative but also for the substantive analysis of the algorithms.

4. ANALYZING THE RESULTS
As a result of the algorithm execution the ALGator produces text output files. For each tuple (algorithm, test set, measurement) one file is created; each line in this file contains the parameters and indicators of one test case.

Figure 2: An example of data query with result.

The data in the output line is separated by semicolons (CSV format). For efficient work with this data the ALGator provides the analyzer with its own query language and with the visualization module for presenting data as graphs. For example, to get the minimal execution times for algorithms named JHoare and JWirth on the test set called TestSet3, the user can run the query depicted in Figure 2.

The ALGator query language is a powerful tool that enables all sorts of data manipulation. An example of a complex query to calculate the quotient of minimal times for the JHoare algorithm running on two different computers (F1.C1 and F1.C2) is presented below.

    queryF1C1 = FROM TestSet0
                WHERE (algorithm=*) AND ComputerID=F1.C1
                SELECT Tmin AS A1;
    queryF1C2 = FROM TestSet0
                WHERE (algorithm=*) AND ComputerID=F1.C2
                SELECT Tmin AS A2;
    FROM queryF1C1, queryF1C2
    WHERE (algorithm=JHoare)
    SELECT N, A1/A2 AS Q

The visualization module of the ALGator can be used to produce graphs as depicted in Figure 1.

Figure 1: The visualization module of the ALGator.

5. CONCLUSION
The execution part of the ALGator was developed in both the Java and C++ programming languages, therefore the algorithms to be tested can be implemented in either of these two languages. Measuring the exact execution time of algorithms written in Java is a challenging task since the system can only measure real time and because there is no way to eliminate the side effects of the Java virtual machine's background tasks (e.g. garbage collection). To overcome this problem, the ALGator executes each algorithm several times and reports the first, the minimal, the maximal and the average time of execution. By comparing and analyzing these times one can detect the influence of the execution environment on the overall execution time. In many cases this influence is negligible. Having a Java implementation of the algorithm also has some benefits. Namely, the ALGator counts and generates the statistics of the usage of the Java byte code instructions. As stated in [1] these statistics provide enough information to be used for the platform independent timing of the algorithms. Our preliminary tests indicate a strong correlation between the number of used Java byte code instructions (multiplied by the corresponding weight depending on the type of instruction) and the execution time.

The ALGator is a testing environment which aims to make the testing process as easy as possible for both the project administrators and the researchers. We tried to minimize the effort needed to prepare the project and to prepare the algorithm, and we think that this goal was achieved. The biggest challenge for the project administrator is to prepare adequate test cases and to write several lines of Java or C++ code (in an average case not more than about 100 lines of code), while the researcher has to write only a few lines of code to call the existing Java or C++ implementation of the algorithm. All the other tasks needed to execute the algorithm and to produce the desired indicators are performed automatically by the ALGator, therefore the researchers can focus on the analysis of the results. Furthermore, the ALGator uses the same test cases for all the algorithms of the project, therefore the researchers cannot tailor the tests to be optimal for their implementations, which makes the results of the evaluation fair and reliable.

6. REFERENCES
[1] J. M. Lambert and J. F. Power. Platform independent timing of java virtual machine bytecode instructions. Electronic Notes in Theoretical Computer Science, 220:79–113, 2008.
[2] J. Nikolaj. Predelava javanskega navideznega stroja za štetje ukazov zložne kode. Univerza v Ljubljani, Fakulteta za računalništvo in informatiko, diplomsko delo, 2014.
[3] J. Nikolaj. Vmep. github.com/nikolai5slo/jamvm, 2014.
[4] R. Sedgewick. The analysis of quicksort programs. Acta Informatica, 7:327–355, 1977.

A Graph to the Pairing strategies of the 9-in-a-row Game ∗ †

Lajos Győrffy, Institute of Mathematics, H-6701 Szeged, Hungary, lgyorffy@math.u-szeged.hu
András London, Institute of Informatics, H-6701 Szeged, Hungary, london@inf.u-szeged.hu
Géza Makay, Institute of Mathematics, H-6701 Szeged, Hungary, makayg@math.u-szeged.hu

ABSTRACT
In Maker-Breaker positional games two players, Maker and Breaker, play on a finite or infinite board with the goal of claiming a finite winning set or preventing the opponent from doing so, respectively. For different games there are several winning strategies either for Maker or Breaker. One class of winning strategies are the so-called pairing strategies. Generally, a pairing strategy means that the possible moves of a game are paired up; if one player plays one, the other player plays its pair. In this study we describe all possible pairing strategies for the 9-in-a-row game. Furthermore, as a concept, we define a graph of these pairings in order to find a structure for them.
The characterization of that graph will also be given.

Categories and Subject Descriptors
F.2 [Analysis of algorithms and problem complexity]: Nonnumerical Algorithms and Problems; G.2 [Discrete mathematics]: Graph Theory, Combinatorics

Keywords
Positional games, pairing strategies, Hales-Jewett pairing

∗ This work was partially supported by the National Research, Development and Innovation Office - NKFIH, SNN-117879.
† Corresponding author

Figure 1: Hales-Jewett pairing blocks the 9-in-a-row

1. INTRODUCTION
In this work, we study the pairing strategies of the 9-in-a-row Maker-Breaker game. Hales and Jewett [7] gave the first pairing strategy for this game, showing Breaker's win. However, neither the uniqueness of the Hales-Jewett pairing nor other examples had been established until Győrffy et al. [6] showed the following. There exist only 8- and 16-toric pairings (i.e. they are simply the repetitions of a pairing on the 8 × 8 and 16 × 16 square grids, respectively), where all 16-toric ones can be derived from some 8-toric ones.

After recalling positional games and pairing strategies in general, we focus on the 9-in-a-row game and its pairings. We provide a computer program which generates and distinguishes all (194543) different 8-toric pairings. Finally, we create and analyze a graph of these pairings to give them a structure.

1.1 Positional games
A positional game can be defined as a game on a hypergraph H = (V, E), where V = V(H) and E = E(H) ⊆ P(V) = {S : S ⊆ V} are the sets of vertices and edges, respectively. In general, V can be finite or infinite, but an edge A ∈ E is always finite. The first and second players take elements of V in turns. In the Maker-Maker (M-M) version of the game, the player who first takes all elements of some edge A ∈ E wins the game.
In contrast, in the Maker-Breaker (M-B) version, Maker wins by taking every element of some A ∈ E, while the other (usually the second) player, Breaker, wins by taking at least one vertex of every edge in E. Clearly, there is no draw in this game. The M-M and M-B games are closely related, since if Breaker wins as a second player, then the M-M game is a draw. On the other hand, if the first player has a winning strategy for the M-M game, then Maker also wins the M-B version. For more on these, see Berlekamp, Conway and Guy [3] or Beck [2].

In this work we deal with the hypergraph of the k-in-a-row game, which is defined as follows.

Definition 1. The vertices of the k-in-a-row hypergraph H_k are the squares of the infinite (chess)board, i.e. the infinite square grid. The edges of the hypergraph H_k are the k-element sets of consecutive squares in a row horizontally, vertically or diagonally. We refer to the whole infinite rows as lines.

For k-in-a-row M-B games Maker wins if k ≤ 5, see Allis et al. [1], and Breaker wins if k ≥ 8, see Zetters [5]. There is a Breaker winning pairing strategy only if k ≥ 9, see Csernenszky et al. [4]. For the case of k = 9 the first pairing strategy, found by Hales and Jewett, can be seen in Fig. 1. For k = 6, 7 the problem is open.

1.2 Pairing strategies
Given a hypergraph H = (V, E), a bijection ρ : X → Y, where X, Y ⊂ V(H) and X ∩ Y = ∅, is a pairing on the hypergraph H. A pair (x, ρ(x)) blocks an edge A ∈ E(H) if A contains both elements of the pair. If the pairs of ρ block all edges, we say that ρ is a good pairing of H.

Figure 2: 16-toric pairing for the 9-in-a-row game

Pairings are one way to show that Breaker has a winning strategy in positional games. A good pairing ρ for a hypergraph H can be turned into a winning strategy for Breaker in the M-B game on H.
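The notions of a blocking pair and a good pairing can be checked mechanically on a finite hypergraph. The following is a minimal sketch and not the authors' program: the class name PairingCheck, its helper methods and the toy one-dimensional board (cells 0..n-1, with every run of k consecutive cells as an edge) are assumptions introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the authors' software.
class PairingCheck {

    // A pair (x, rho(x)) blocks an edge if the edge contains both of its elements.
    static boolean blocks(int[] pair, Set<Integer> edge) {
        return edge.contains(pair[0]) && edge.contains(pair[1]);
    }

    // A pairing is good if its pairs are disjoint and every edge is blocked by some pair.
    static boolean isGoodPairing(List<int[]> pairs, List<Set<Integer>> edges) {
        Set<Integer> used = new HashSet<>();
        for (int[] p : pairs) {
            if (p[0] == p[1] || !used.add(p[0]) || !used.add(p[1])) {
                return false; // a vertex occurs twice, so this is not a pairing
            }
        }
        for (Set<Integer> edge : edges) {
            boolean blocked = false;
            for (int[] p : pairs) {
                if (blocks(p, edge)) {
                    blocked = true;
                    break;
                }
            }
            if (!blocked) {
                return false;
            }
        }
        return true;
    }

    // Toy board (assumption): edges are all sets of k consecutive cells among 0..n-1.
    static List<Set<Integer>> segments(int n, int k) {
        List<Set<Integer>> edges = new ArrayList<>();
        for (int start = 0; start + k <= n; start++) {
            Set<Integer> edge = new HashSet<>();
            for (int i = 0; i < k; i++) {
                edge.add(start + i);
            }
            edges.add(edge);
        }
        return edges;
    }

    // Domino pairing (0,1), (2,3), ... on the toy board.
    static List<int[]> dominoes(int n) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < n; i += 2) {
            pairs.add(new int[] {i, i + 1});
        }
        return pairs;
    }

    static boolean dominoPairingIsGood(int n, int k) {
        return isGoodPairing(dominoes(n), segments(n, k));
    }

    public static void main(String[] args) {
        System.out.println(dominoPairingIsGood(8, 3));
        System.out.println(dominoPairingIsGood(8, 2));
    }
}
```

On this toy board the dominoes block every 3-cell segment but leave the 2-cell segment {1, 2} unblocked, so the call with k = 3 returns true and the call with k = 2 returns false. The two-dimensional 9-in-a-row case needs the same test applied to the horizontal, vertical and diagonal edges.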
Following ρ on H in an M-B game, for every x ∈ X chosen by Maker, Breaker chooses ρ(x), or vice versa in case of x ∈ Y (if x ∉ X ∪ Y then Breaker can choose an arbitrary vertex). Hence Breaker can block all edges and wins the game. Hereafter we focus on the 9-in-a-row game and its pairings.

2. PAIRINGS FOR 9-IN-A-ROW

Definition 2. A pairing is a domino pairing on the grid if all pairs consist of only neighboring cells (horizontally, vertically or diagonally).

Note that the pairing in Fig. 1 is a domino pairing. From Győrffy et al. [6] it follows that if there is a good pairing for H_9 then this pairing is a domino pairing in which the dominoes follow each other with 8-periodicity in each line and all squares are covered by a pair. To handle the periodicity we define the concept of k-toric pairings.

Definition 3. A pairing of the infinite board is k-toric if it is an extension of a k × k square, where k is the smallest possible.

In [6] it was proved that a good pairing of H_9 is either 8-toric or 16-toric. Furthermore, all 16-toric pairings can be derived from two (or more) 8-toric pairings. Fig. 2 shows a 16-toric (but not 8-toric) good pairing. The four 8 × 8 squares differ from each other only in the colored squares, where the bold pairs show the actual pairs and the thin lines show the pairing of the other 8 × 8 square. From now on we only deal with the 8-toric pairings of H_9. A good 8-toric pairing is uniquely determined by an 8 × 8 section of it, by definition. Furthermore, that 8 × 8 section contains exactly one pair in each of the 32 (eight vertical, eight horizontal and 16 diagonal) torus lines. Three examples, other than the Hales-Jewett pairing, can be seen in Fig. 3. A diagonal torus line is colored on the middle one.

Figure 3: Other 8-toric examples

2.1 Generate pairings
To find all possible 8-toric pairing strategies of H_9 on the infinite board we wrote a computer program that will be introduced in this section. The main challenge here is not only finding all pairings, but deciding whether two pairings are the same.

We store a pairing in the 8 × 8 table such that each cell represents the actual pair of the cell according to the 8 possible pairs: 0 means East, 1 South-East, and so on, 7 North-East. Naturally, if a cell's pair is on the East, then its pair has its own pair on the West, i.e. we fill the table two cells at a time. The algorithm itself is the usual backtracking algorithm: we find possible pairs for the next cell in the table having no pair so far, and try all those by recursively calling the table filling function. While checking whether a pair is possible, we also make sure that there can be no overblocking, so we keep track of the blocked edges. A detailed example can be found on the webpage [8].

From previous experience we know that the running time is crucial, since there are too many such pairings. We try to reduce the number of cases to be considered. We consider two pairing strategies on the infinite board to be the same if they can be transformed into each other by translation, mirroring and rotation. Thus, in order not to find the same pairing several times, we apply all transformations to any pairing found on the 8 × 8 table. From these transformed pairings we select the smallest one with respect to the lexicographical order. That also means that such a pairing must start with 0 and 4 in the first row of the 8 × 8 table, so we can also reduce the number of searched cases by starting to fill the table with these two numbers. Naturally, we keep in mind that the 8 × 8 table is expanded in an 8-toric way to the whole infinite board while applying these transformations. More precisely:

1. We either mirror or not (2 possible cases) the table to the vertical line between columns 4 and 5.
2. We rotate the table by 0, 90, 180, and 270 degrees (4 possible cases).
3. We try all toric (that is, modulo 8) translations that result in a table starting with 0 and 4.

We select the lexicographically smallest table as a representative for the actual pairing. This method reduces the number of all pairings checked to 6210560, and the program found the 194543 different pairings in about 4 minutes on a desktop computer with a 3.2 GHz Core i7 processor using 12 MB of memory. The pairings themselves can be downloaded at the page [8]. Interestingly, the number of the different pairings turns out to be a prime number.

Since we have so many different pairings, an obvious way to find a structure is to store the pairings in a graph. In the next section we will show a natural method to find connections between pairings.

2.2 Graph of pairings
While trying to find pairings by hand one can observe that we can move a pair along the blocked edge by one step to create a new pairing using the following method.

1. Move the first pair on the table. This move creates a cell (say A) without a pair, and another cell (say B) with two pairs.
2. Move the pair containing cell B which was not the just moved pair so that cell B has one pair after the move. But then another cell may have two pairs.
3. Repeat step 2 as long as it creates a cell with two pairs.
4. This method will end when the last move creates a new pair for cell A, which had no pair before the move.

Naturally, we should keep in mind that we are on an 8-toric pairing and move the pairs accordingly. Since we are on a finite table, this method will either end at step 4, or create a repeating cycle. But the latter is not possible. Note that cell A cannot be part of the cycle, as it has no pair, and it would break the repetition. The first move that entered the cycle creates a hole "behind" (outside the cycle), and when the cycle comes to the same cell, the pair will move backwards, and the cycle is not entered again. Also, it is easy to see that we get an optimal pairing by this method. Since the original pairing was optimal, moving a pair (in an 8-toric way) along the blocked edge keeps that direction (i.e. 8 edges) blocked. Since the method ends in step 4, there are no cells without a pair. We also move the pairs on a torus, so no overblocking is possible.

We say that two pairings are connected if one can obtain the second pairing from the first one by the method described above (of course, we consider only different pairings as defined in the previous section). This relation is symmetric: moving back the last pair of the above method gives back the first pairing from the second one. This creates a graph, where the vertices are the pairings and the edges are defined by the moving transition. Fig. 4 shows two examples of this moving transition. In both cases, the first pairing contains only the blue pairs, and the red dominoes show the transition to the other pairing.

Figure 4: Two examples of connections between pairings

After computing all possible different pairings our program can easily find this graph. It tries to move all pairings (by trying to free up each cell in the 8 × 8 board), and uses the method described in the previous section to find the lexicographically smallest representative for the new pairing. It takes about 1 minute to finish this task on the same hardware as in the previous section.

In the next section we will investigate the properties of the obtained graph.

2.3 Analyzing the graph
The basic parameters of the obtained graph can be seen in Tab. 1. The graph is not connected, which means that by just repeating the moving transition described in the previous section we cannot reach an arbitrary pairing from another. One of the 14 components of the graph is a giant component containing almost all (194333) vertices. The diameter of this component is 34, which shows us that even this giant component does not seem to be a "small-world" network. There are five smaller components each of 10 and of 16 vertices, and one component each of size 6, 26 and 48. Note that every graph component containing 16 vertices is the edge graph of a 4-dimensional cube. Fig. 5 shows some small components.

The graph is triangle-free; moreover, the length of all induced cycles is four. The degree distribution of the graph can be seen in Tab. 2.

Table 1: Basic parameters of the constructed graph
vertices   edges    #components   max degree   min degree   avg. degree
194543     532107   14            11           1            5.47

Table 2: Degree distribution of the graph
degree     1    2     3     4       5       6       7       8      9      10    11
vertices   17   392   395   39811   66185   53222   25309   7547   1472   183   10

Figure 5: Some components of the obtained graph

3. CONCLUSIONS
In this study, we investigated the 9-in-a-row Maker-Breaker positional game, focusing on its pairing strategies which guarantee Breaker's win. We found all different 8-toric pairing strategies using a computer program. The main concepts of the program were described in detail. In order to find a structure of the 194543 pairings, we arranged them into a graph where the vertices are the pairings themselves and the edges are some moving transitions of pairs. Analyzing the graph and calculating standard parameters may help in a better understanding of pairing strategies in general.

4. REFERENCES
[1] L. V. Allis, H. J. van den Herik and M. P. Huntjens. Go-Moku solved by new search techniques. Proc. 1993 AAAI Fall Symp. on Games: Planning and Learning, AAAI Press Tech. Report FS93-02, pp. 1-9, Menlo Park, CA.
[2] J. Beck. Combinatorial Games, Tic-Tac-Toe Theory. Cambridge University Press, 2008.
[3] E. R. Berlekamp, J. H. Conway and R. K. Guy. Winning Ways for your mathematical plays, Volume 2. Academic Press, New York, 1982.
[4] A. Csernenszky, R. Martin and A. Pluhár. On the Complexity of Chooser-Picker Positional Games. Integers 11, 2011.
[5] R. K. Guy and J. L. Selfridge, Problem S.10, Amer. Math. Monthly 86 (1979); solution T.G.L. Zetters 87 (1980) 575–576.
[6] L. Győrffy, G. Makay and A. Pluhár. Pairing strategies for the 9-in-a-row game. Submitted, 2016. http://www.math.u-szeged.hu/~lgyorffy/predok/9_pairings.pdf downloaded: 28. 08. 2016.
[7] A. W. Hales and R. I. Jewett. Regularity and positional games. Trans. Amer. Math. Soc. 106 (1963) 222–229; M.R. # 1265.
[8] G. Makay. Personal homepage http://www.math.u-szeged.hu/~makay/amoba/ downloaded: 06. 04. 2016.

Construction of orthogonal CC-set

Andrej Brodnik, University of Primorska, Koper, Slovenia, andrej.brodnik@upr.si
Vladan Jovičić, Ecole Normale Superieure, Lyon, France, vladan94.jovicic@gmail.com
Marko Palangetić, University of Primorska, Koper, Slovenia, palangeticmarko95@hotmail.com
Daniel Silađi, University of Primorska, Koper, Slovenia, szilagyi.d@gmail.com

ABSTRACT
In this paper we present a graph-theoretical method for calculating the maximum orthogonal subset of a set of coiled-coil peptides. In chemistry, an orthogonal set of peptides is defined as a set of pairs of peptides, where the paired peptides interact only mutually, and not with any other peptide from any other pair.

The main method used is a reduction to the maximum independent set problem. Then we use a relatively well-known maximum independent set solving algorithm which turned out to be the best suited for our problem. We obtained an orthogonal set consisting of 29 peptides (homodimeric and heterodimeric) from the initial 5-heptad set. If we allow only heterodimeric interactions we obtain an orthogonal set of 26 peptides.

Keywords
Algorithms, NP-hard problem, Modeling

1. MOTIVATION
In the last 30 years, impressive 3D structures have been built using DNA, in a field called DNA origami. Complex structures built from proteins would have many advantages, since amino acids provide much more functionality. The main problem is that the simple Watson-Crick pairing rules present in DNA have no simple analogue for proteins. Using a special class of polypeptides, called coiled-coil polypeptides, the orthogonal binding rules of DNA can be emulated. By specifying only the primary structure of those polypeptides (the order of amino acids), complex 3D structures can be built, such as the recent protein tetrahedron [2]. More specifically, that structure is determined by taking the wireframe of the desired object, doubling every edge, and performing an Euler traversal of the obtained graph. Then, we know that the peptides associated with edges that were initially parallel must bind, and all others must not.

Essential for such designs is that each pair of peptides interacts only mutually, and not with any other pair. Thus, the notion of an orthogonal set is introduced. Obviously, the greater our orthogonal set is, the more complex are the structures we can create. Currently the limiting factor in designing larger structures is the small set of available peptides.

In this paper, we describe a method for determining a maximal orthogonal set from a given set of admissible peptides. Also, in section 6 we present a possible approach for extending an already-calculated orthogonal set.

2. PROBLEM DESCRIPTION
As input we are given a set of peptides P = {p_1, p_2, ..., p_n} (their primary structures, given as strings of fixed length) and an interaction matrix I. If I_{i,j} = 1, then p_i interacts with p_j, and if it is 0 they do not interact. We have to construct a set of pairs S, where (p_i, p_j) ∈ S iff I_{i,j} = 1 and for all other p_k that are in any pair of S, I_{i,k} = 0. Moreover, if i = j in (p_i, p_j) we are talking of a homodimer and otherwise of a heterodimer.

We can model this problem as a graph-theoretical one: we create an undirected graph G = (V, E) where V is the set of peptides P, and the edge set E contains an edge p_i p_j (or a loop at p_i, denoted by p_i p_i) if and only if p_i and p_j interact. Therefore, the problem definition is the following:

Definition. [Maximum Independent Set of Pairs (MISP)] Let G = (V, E) be an undirected graph and let k be a positive integer. Does there exist a set S ⊆ E with |S| ≥ k such that for any two distinct u_1v_1, u_2v_2 ∈ S we have {u_1, v_1} ∩ {u_2, v_2} = ∅ and {{u_1, u_2}, {u_1, v_2}, {v_1, u_2}, {v_1, v_2}} ∩ E = ∅?

3. HARDNESS OF THE PROBLEM
In order to determine the best possible solution of our problem, in this section we will prove that MISP is NP-complete.

Theorem 1. Maximum independent set of pairs is NP-complete.

Proof.

Algorithm 1 NP certifier
  S ← given set of pairs
  if |S| < k then
    return No
  for u_1v_1 ∈ S do
    for u_2v_2 ∈ S − u_1v_1 do
      if u_1u_2 ∈ E ∨ u_1v_2 ∈ E ∨ v_1u_2 ∈ E ∨ v_1v_2 ∈ E ∨ u_1v_1 ∉ E then
        return No
  return Yes

It is easy to check that Algorithm 1 is a polynomial certifier for MISP. Now we will reduce the independent set problem to MISP in order to show that MISP is NP-hard.

Let G = (V, E) be a graph. We want to check if there exists an independent set of size at least k. Define a new graph G′ = (V′, E′) as follows. Initially, let V′ = V and E′ = E. Then, for each vertex v ∈ V add another vertex v′ (twin vertex) to V′ and add the edge vv′ to E′.

Lemma 1. Every maximal independent set of pairs consists only of the edges of the form vv′.

Proof. Let S be a MISP in G′. Suppose the contrary, i.e. there is a pair uv ∈ S which is not of the form ww′ for w ∈ V. Then, for all other u_1v_1 ∈ S we have u_1u ∉ E′, u_1v ∉ E′, v_1u ∉ E′, v_1v ∉ E′. Then we can delete the pair uv from S and add pairs uu′ and vv′ where u′ and v′ are twin vertices of u and v, respectively. We can do this since the only neighbors of u′ and v′ are u and v, respectively. We obtained an independent set of pairs with more than |S| elements, a contradiction.

We want to prove now that there is an independent set S with |S| ≥ k of G if and only if there is an independent set of pairs S_P with |S_P| ≥ k of G′.

(⇒) Suppose that S is an independent set of G and |S| ≥ k. Then, define the independent set of pairs S_P of G′ in the following way: S_P = {vv′ | v ∈ S}. It is easy to verify that this is an independent set of pairs by the above definition. Then |S_P| = |S| ≥ k.

(⇐) Suppose that S_P is an independent set of pairs of G′ with |S_P| ≥ k. Then, by the previous lemma, we can define the following independent set S of G: S = {v ∈ V | vv′ ∈ S_P}. By the construction of graph G′ and by the lemma, one can show that S is an independent set of G. Then |S| = |S_P| ≥ k, which completes the proof that MISP is NP-hard.

Combining the NP-hardness with the earlier fact that MISP is in NP, we conclude that MISP is NP-complete.

4. REDUCING MISP TO THE MAXIMUM INDEPENDENT SET
Now that we know that MISP is NP-complete, we can use one of the vast number of algorithms already developed for solving various problems in NP, once we reduce MISP to that problem. The most natural choice is the maximum independent set problem.

Based on the MISP graph G = (V, E), we construct a new graph G′ = (V′, E′), where V′ = E, and two vertices are connected (in G′) if and only if their corresponding edges in G share a common vertex or have two of their vertices connected by an edge. It is easy to see that finding an independent set in G′ will give us an independent set of pairs, as per the definition in section 2. Moreover, due to our construction, an independent set of pairs in G also gives us a unique independent set in G′.

Thus, we have obtained a bijection between the independent sets of G′ and the independent sets of pairs of G.

5. RESULTS
We use results from the previous section to solve the MISP of the input graph G, which is constructed from the input set of peptides P = {p_1, p_2, ..., p_n} in several steps.

1. Based on previous work by [3], we calculate the interaction scores s_{ij} for each pair of peptides p_i p_j (including homodimers p_i p_i), and store that matrix for the following steps.
2. Choose thresholds t and T based on which we decide whether peptides p_i and p_j with interaction score s_{ij} will interact. If s_{ij} < t, we declare that p_i and p_j are not interacting (or, more precisely, interacting in a negligibly small proportion), and likewise, if s_{ij} > T, p_i and p_j interact. The greater the difference T − t, the more certain we are that in the obtained orthogonal set only the designated pairs will interact.
3. Construct the graph G on the set of peptides by connecting the interacting ones, as in section 2.
4. Reduce G to G′, suitable for calculating the independent set, as in section 4.
5. Find the maximum independent set in G′; as shown before, it corresponds to the MISP (or orthogonal set) in G. We use the (exact) maximum clique solving algorithm presented in [1] (an independent set of G′ is a clique in the complement of G′), which is based on greedy graph colorings, i.e. if we can color a particular subgraph with k colors, we know that the maximum clique in that subgraph has size at most k.

In order to test our algorithm, we generated synthetic initial sets of peptides, based on two observations. Firstly, the interaction scoring function is designed to consider only 4 positions in each heptad. Secondly, using electrostatic arguments about individual amino acids and their positions in the coiled-coil, we reduced the variation even further, by allowing only 2 different amino acids on 3 of those 4 positions, and completely fixing the remaining amino acid. Thus, we obtain 8 essentially different heptads, which we use to build up larger peptides. Our main result is the calculation of a 29-peptide orthogonal subset of the 5-heptad initial set (of 2^15 peptides generated as described above), as well as a 26-peptide purely heterodimeric orthogonal subset of the same initial set. The interaction score heatmaps can be seen in Figures 1 and 2.

Figure 1: 5-heptad orthogonal set, no restriction

Figure 2: 5-heptad orthogonal set, heterodimers only

The peptides which belong to the orthogonal set are colored dark red in both figures.

6. FUTURE WORK
Up to now, we have only considered orthogonal sets derived from synthetically generated peptides, as described in the previous section. To actually use such an orthogonal set, we have to manually synthesize all of those peptides.

An alternative is to construct a maximal orthogonal set from the set of all natural tetraheptads (coiled-coils where each of the 4 heptads occurs naturally). Since there are 1171 known natural heptads, we can combine them to get 1171^4 = 1880301880081 possible tetraheptads. Finding a maximal orthogonal subset of this set would require finding the maximum independent set of a graph with more than 10^12 vertices, a task clearly impossible to do in a reasonable amount of time.

Our idea is to use a heuristic to reduce the initial set to a more manageable size: since it is possible to calculate the interaction matrix for single natural heptads, we can approximate scores for tetraheptads as shown in Figure 3. More specifically, we will add up the precalculated scores between (adjacent) heptads which are connected as in Figure 3. Of course, some interactions will be left unaccounted for in the final score; for example, the last amino acid in heptad 1 in Figure 3 may interact with the first amino acid of heptad 7, which is not added to the final score.

Figure 3: Proposed way of scoring

This observation enables us to construct more meaningful initial peptide sets consisting of longer peptides, based on the already-calculated orthogonal sets of shorter peptides.

7. CONCLUDING REMARKS
In this paper, we presented an exact method for determining an orthogonal set of coiled-coil polypeptides, if we are given a numeric measure of their interaction strength. Our approach has been demonstrated to be successful for moderately large initial peptide sets (tens of thousands), and has given us optimal orthogonal sets that could not have been calculated by hand.

Unfortunately, for even larger initial sets, the maximum-clique solver becomes an apparent bottleneck, as it has to operate on graphs of size O(n^4), where n is the size of the initial set. In that case, we suggest investigating the bottom-up method described in section 6.

8. REFERENCES
[1] M. Depolli, J. Konc, K. Rozman, R. Trobec, and D. Janezic. Exact parallel maximum clique algorithm for general and protein graphs. Journal of chemical information and modeling, 53(9):2217–2228, 2013.
[2] H. Gradišar, S. Božič, T. Doles, D. Vengust, I. Hafner-Bratkovič, A. Mertelj, B. Webb, A. Šali, S. Klavžar, and R. Jerala. Design of a single-chain polypeptide tetrahedron assembled from coiled-coil segments. Nature chemical biology, 9(6):362–366, 2013.
[3] V. Potapov, J. B. Kaplan, and A. E. Keating. Data-driven prediction and design of bzip coiled-coil interactions. PLoS Comput Biol, 11(2):1–28, 02 2015.

Usage of hereditary colorings of product graphs in clique search programs

Matjaž Depolli, Department of Communication Systems, Jožef Stefan Institute, matjaz.depolli@ijs.si
Janez Konc, Laboratory for Molecular Modeling, National Institute of Chemistry, konc@cmm.ki.si
Sandor Szabo, Institute of Mathematics and Informatics, University of Pecs, sszabo7@hotmail.com
Bogdan Zavalnij, Institute of Mathematics and Informatics, University of Pecs, bogdan@ttk.pte.hu

ABSTRACT
There are computationally demanding problems that can be solved by k-clique search algorithms in auxiliary product graphs. The best clique search programs heavily rely upon good colorings. But obtaining a good coloring is a demanding task itself. We present some coloring schemes that exploit the property of the product graph itself and can be constructed with ease. We call these colorings hereditary. There are indications that using these colorings some hard problems would become feasible.

Keywords
clique, maximum clique, product graph, graph isomorphy

1. INTRODUCTION
Let G = (V, E) be a finite simple graph. Let D be a subset of V and let ∆ be the subgraph of G induced by D. The subgraph ∆ is called a clique in G if any two distinct elements of D are adjacent in G. If the set D has k elements, then we call ∆ a k-clique in G.

Finding cliques in a given graph is an important problem in discrete applied mathematics with many applications inside and outside of mathematics. For further details see [1], [2], [4], [7], [13], [14].

We formally state the following clique search problem.

Problem 1. Given a finite simple graph G and given a positive integer k. Decide if G contains a k-clique.

speed up the computation by reducing the search space. Finding optimal or nearly optimal colorings is itself a computationally demanding problem. For this reason, in the above computations computationally more feasible greedy algorithms are used to construct suboptimal colorings. It is customary to color the nodes of a graph G satisfying the following conditions.

1. Each node of G receives exactly one color.
2. Adjacent nodes in G cannot receive the same color.

This is the most commonly encountered coloring of the nodes of a graph and it is referred to as a legal coloring of the nodes. It is well known that coloring can be used for estimating clique size.

For each finite simple graph G there is a well defined non-negative integer k such that the nodes of G admit a legal coloring with k colors but the nodes of G cannot be colored legally using k − 1 colors. This number k is called the chromatic number of the graph G and it is denoted by χ(G).

Let us suppose that ∆ is an l-clique in G and let us suppose that the nodes of G have a legal coloring with k colors. Then l ≤ k holds.

Problem 2. Given a finite simple graph G and given a positive integer k. Decide if the nodes of G have a legal coloring using k colors.
Many practical clique search algorithms employ coloring to Both Problems 2 and 1 are decision problems. From the complexity theory of computations we know that these problems belong to the NP-complete complexity class. Let G = (V, E) be a finite simple graph and let s be a positive integer such that s ≥ 2. A subset U of V is called an s-free set if the graph spanned by U in G does not contain any s-clique. A partition U1, . . . , Ur of V is called an s-clique free partition of V if Ui is an s-clique free subset of V for 40 each i, 1 ≤ i ≤ r. We can look at this partitioning as an Note that if there is a solution to the spanned subgraph alternative coloring method and we will call it s-clique free isomorphism problem then the second coloring defines also coloring [12]. a “best” coloring as it uses equal to the chromatic number of colors, because k = |V 0| = χ(Γ). 2. PRODUCT GRAPHS AND THEIR COL- We also should point out an interesting phenomenon. In ORINGS real life if one constructs a product graph for a given prob- In our paper we are interested in some special problems. lem the nodes of this product graph will be listed in such We assume that these problems can be reduced to k-clique order that they will be also listed by color classes by one of search in a given product graph. The problems of graph the above scheme. It is because one lists the nodes of the isomorphism and spanned subgraph isomorphism are rep- product graph by a double nested for loop listing the nodes resentatives of this type of problems so we will use the of one graph and the nodes of the other graph. From this it spanned subgraph isomorphism problem to illustrate the follows that programs using sequential greedy coloring may method. Other problems can be dealt with by similar means. 
result in the best possible coloring and will run extremely Spanned subgraph isomorphism has important applications fast comparing to other programs which use other coloring for example in drug design, chemical database problems, ar- methods. By our knowledge this phenomenon was not de- tificial intelligence or pattern recognition. Let us state the tected previously. spanned subgraph isomorphism problem more formally: 2.2 Second hereditary coloring scheme We take an independent set I ⊆ V from G, and a clique Problem 3. Let G = (V, E), H = (V 0, E0) be finite sim- K ⊆ V 0 from H. Note that nodes of Γ (a, b) ∈ W, a ∈ ple graphs. Is there a spanned subgraph G0 in G such that I, b ∈ K form an independent set. This follows from that all G0 is isomorphic to H. In other words is there a G0 = the nodes of I are independent and all the nodes of K are (V0, E0) : V0 ⊆ V where v1, v2 ∈ V0 and {v1, v2} ∈ E then connected thus no (a and only then {v ∼ 1, b1) and (a2, b2) pair can be connected 1, v2} ∈ E0 such that G0 = H ? as {a1, a2} ∈ V and {b1, b2} / ∈ V 0. Thus if we partition the nodes from G into i independent sets and partition the nodes from H into k cliques then we can define i × k color A possible method of solving this problem is to construct classes in Γ where the color classes are formed by pairs of an auxiliary graph Γ = (W, F ) where |W | = |V ||V 0|. The an independent set from G and a clique from H. nodes of the graph Γ are labeled by ordered pairs of nodes from G and H. That is if a1 ∈ V, b1 ∈ V 0 then (a1, b1) ∈ W . Obviously we can partition G into cliques and H into inde- The edges of the graph Γ are constructed as follows. Let pendent sets as well. us consider (a1, b1), (a2, b2) as two distinct nodes of Γ. We put an edge between them if {a1, a2} ∈ E and {b1, b2} ∈ E0. The described method can be used with many different par- We also put and edge between them if {a1, a2} / ∈ E and titioning, thus resulting with several different colorings. 
{b1, b2} / ∈ E0 for a1 6= a2, b1 6= b2. This means that {a1, a2} and {b1, b2} both should be connected or both should not be connected. A k-clique where k = |V 0| in the graph Γ rep- 2.3 Third hereditary coloring scheme resents the function f : V Similarly to the previous method we partition the set of 0 → V 0 such that b1 = f (a1), b2 = f (a nodes of G and H graphs. But instead of independent sets 2), {a1, b2} ∈ E0 ⇔ (f (a1), f (a2)) ∈ E0. in G we shall use s-clique free set Is, and instead of cliques It is well known, that the k-clique search algorithm can be in H – which is equivalent to an independent set in the ¯ H sped up by using a good coloring of the given graph. We complement graph – we shall use an r-clique free set Kr in ¯ will describe several coloring schemes in the following sub- H. Using nodes from these two sets, a ∈ Is, b ∈ Kr the sections. We called these colorings “hereditary” because of nodes (a, b) in Γ form an (s + r − 1)-clique free set. For the fact that they derive solely from the two input graphs further details of s-clique free colorings in clique search see and the constructing method of the auxiliary product graph. [12]. 2.1 First hereditary coloring scheme 3. PROPOSED PROGRAM AND PRELIMI- Note that the nodes (a NARY RESULTS 1, b1), (a1, b2), (a1, b3), . . . , (a1, b|V 0|) of the graph Γ form an independent set. Intuitively this The proposed k-clique search program is working as follows. means that the node a1 can be paired only with only one of In the first part several different colorings are prepared and the nodes in V 0 at the same time. Thus we can define |V | saved. In the second part we use the standard Carraghan- number of color classes in Γ where the nodes labeled by the Pardalos clique search method. In this procedure we always same node from G fall into one color class. That is the first check the remaining nodes against the saved different color- color class will consist of nodes (a1, b1), (a1, b2), (a1, b3), . . 
., ings and use the best possible one. the second will consist of nodes (a2, b1), (a2, b2), (a2, b3), . . ., the third of nodes (a3, b1), (a3, b2), (a3, b3), . . ., and so on. As the presented work is still in progress the proposed soft- ware is yet to be completed. For the sole purpose of demon- Similarly, the nodes (a1, b1), (a2, b1), (a3, b1), . . . , (a|V |, b1) of stration we present here just two problem instances. The the graph Γ form an independent set. Thus we can define take a basic graph B, which is a 25 node graph named |V 0| number of color classes in Γ where the nodes labeled by s3myc-3x3.clq. From this graph two EVIL graphs were the same node from H fall into one color class. build using 3 and 10 instances as described in our paper 41 “Benchmark problems for exhaustive exact maximum clique search algorithms” presented in this same conference. The first graph, G1 has 75 nodes, the second one, G2 has 250 nodes. The 25 node B graph and 250 node G2 graph can be downloaded from the site http://clique.ttk.pte.hu/evil as the generator program which were used producing the 75 node G1 graph. We pictured these three graphs on Figure 2 and Figure 1. We state the problem of spanned subgraph isomorphism that if the 25 node B graph is isomorphic to a subgraph of the 75 node G1 and the 250 node G2 graph. For this purpose we produced the two auxiliary product graphs Γ1 = (W1, F1) and Γ2 = (W2, F2) having |W1| = 1875 and |W2| = 6250 nodes accordingly as described in the Introduction. We pic- tured the 1875 node Γ1 product graph on Figure 3. Obvi- ously the base graph B of 25 node is part of the 75 node G1 Figure 2: The 25 node B base graph and the 75 node and the 250 node G2 graph, so both auxiliary graphs have G1 graph. maximum cliques of size 25 representing a mapping between the base graph B and the 75 node G1 and the 250 node G2 graph. 1. San Segundo1 [8], [9], [10], [11] (BBMC, BBMC-R, BBMC-L and BBMC-X). 2. Li2 [5], [6] (MaxCLQ 10, MaxCLQ 13-1 and MaxCLQ 13-2). 
3. Prosser3 (who implemented Tomita’s algorithm [13]) (MCR) 4. Österg˚ ard4 [7] (Cliquer), 5. Konc5 [3] (mcqd and mcqd-dyn) There are three ways to use the 2013 version of C.-M. Li program. A switch can be set to either “1” or “2” to select between two built in orderings of the nodes of the graph. In case no value of the switch is specified the program chooses between the “1” and “2” possibilities. During our test we explicitly used the switch “1” and “2” ( M-cql 13-1 and M- cql 13-2). We compared these programs to our own program using the first hereditary coloring scheme. The program can be found on the same site as the EVIL instances and named antiB. The brief description of the program is the following. Figure 1: The 250 node G2 graph. 1. Use the given coloring to color the nodes and save these colors. 2. Set k to be the number of colors of the legal coloring we have been given. 3. Carry out a k-clique search. 4. If a k-clique is found, then it is a maximum clique of the graph. Otherwise reduce the value of k and go to step 3. 1https://www.biicode.com/pablodev/examples_clique We used the following 12 clique search algorithms on these 2http://home.mis.u-picardie.fr/~cli/EnglishPage. product graphs from the following researchers. To eliminate html the effect of pre-ordered nodes by color classes noted in Sub- 3http://www.dcs.gla.ac.uk/~pat/maxClique/ section 2.1 we shuffled randomly the nodes of the auxiliary distribution/ graphs. Note though, that most programs did not perform 4http://users.aalto.fi/~pat/cliquer.html much better even with the unshuffled variants. 
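The product graph construction of Section 2 and the role of the color classes can be illustrated on a toy instance. The following is a minimal sketch, not the antiB program: it builds Γ for two small graphs and searches for a |V'|-clique by taking exactly one node from each color class of the coloring in which nodes sharing the same node of H form a class. The function names and the toy graphs (a 4-cycle, a 3-node path, a triangle) are assumptions made for illustration, and the plain backtracking search stands in for the Carraghan-Pardalos procedure.

```python
from itertools import product


def product_graph(n_g, edges_g, n_h, edges_h):
    """Build the auxiliary graph Gamma as an adjacency dict.

    Nodes are pairs (a, b) with a in V(G) and b in V(H). Two nodes
    (a1, b1), (a2, b2) with a1 != a2 and b1 != b2 are adjacent exactly
    when {a1, a2} and {b1, b2} are both edges or both non-edges.
    """
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    nodes = list(product(range(n_g), range(n_h)))
    adj = {u: set() for u in nodes}
    for i, (a1, b1) in enumerate(nodes):
        for a2, b2 in nodes[i + 1:]:
            if a1 == a2 or b1 == b2:
                continue
            if (frozenset((a1, a2)) in eg) == (frozenset((b1, b2)) in eh):
                adj[(a1, b1)].add((a2, b2))
                adj[(a2, b2)].add((a1, b1))
    return adj


def find_embedding(n_g, edges_g, n_h, edges_h):
    """Search for a |V(H)|-clique in Gamma, one node per color class.

    Color class b holds the nodes (0, b), (1, b), ...; each class is an
    independent set, so a clique of size |V(H)| uses every class once.
    Returns a dict mapping each node of H to its image in G, or None.
    """
    adj = product_graph(n_g, edges_g, n_h, edges_h)
    chosen = []

    def extend(b):
        if b == n_h:
            return True
        for a in range(n_g):
            node = (a, b)
            if all(node in adj[c] for c in chosen):
                chosen.append(node)
                if extend(b + 1):
                    return True
                chosen.pop()
        return False

    if extend(0):
        return {b: a for a, b in chosen}
    return None


# Toy data (hypothetical): G is a 4-cycle; H is a 3-node path or a triangle.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
path3 = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(find_embedding(4, cycle4, 3, path3))     # a spanned path exists
print(find_embedding(4, cycle4, 3, triangle))  # a 4-cycle has no triangle
```

Because every color class is an independent set, the search never needs to consider two nodes from the same class, which is the pruning effect that the hereditary colorings provide without any coloring computation.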
^5 http://insilab.org/maxclique/ (footnote to Konc's mcqd programs)

The k-clique search is based on the Carraghan-Pardalos algorithm, where we utilized the original coloring of the nodes. The ordering of the nodes was done by the size of the color classes and the node degrees.

The results of our small test can be seen in Table 1. As one can easily see, our simple program is one to two orders of magnitude faster than the best performing programs on these examples (45 times on Γ1 and about 116 times on Γ2). This clearly indicates the potential of the proposed coloring schemes.

Table 1: Running time results for two test problems

Author        Program       Γ1 = (W1, F1)   Γ2 = (W2, F2)
                            |W1| = 1875     |W2| = 6250
Zavalnij      antiB         0.2s            2.5s
San Segundo   BBMC          64s             1768s
              BBMC-R        20s             365s
              BBMC-L        14s             291s
              BBMC-X        21s             387s
Li            MaxCLQ 10     9s              error
              MaxCLQ 13-1   136s            >1h
              MaxCLQ 13-2   2044s           >1h
Prosser       MCR           35s             >1h
Östergård     Cliquer       134s            1767s
Konc          mcqd          25s             >1h
              mcqd-dyn      14s             3065s

Figure 3: The 1875 node Γ1 auxiliary graph for testing if B graph is isomorphic to a subgraph of the 75 node G1 graph.

Acknowledgment
This research was supported by National Research, Development and Innovation Office – NKFIH Fund No. SNN-117879.

4. REFERENCES
[1] E. Balas, J. Xue, Weighted and unweighted maximum clique algorithms with upper bounds from fractional coloring, Algorithmica 15 (1996), 397–412.
[2] R. Carraghan, P. M. Pardalos, An exact algorithm for the maximum clique problem, Operations Research Letters 9 (1990), 375–382.
[3] J. Konc and D. Janežič, An improved branch and bound algorithm for the maximum clique problem, MATCH Communications in Mathematical and Computer Chemistry 58 (2007), 569–590.
[4] D. Kumlander, Some Practical Algorithms to Solve the Maximal Clique problem, PhD Thesis, Tallinn University of Technology, 2005.
[5] C.-M. Li, Z. Quan, An efficient branch-and-bound algorithm based on MaxSAT for the maximum clique problem, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), pp. 128–133.
[6] C.-M. Li, Z. Fang, K. Xu, Combining MaxSAT reasoning and incremental upper bound for the maximum clique problem, Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence (ICTAI2013), pp. 939–946.
[7] P. R. J. Östergård, A fast algorithm for the maximum clique problem, Discrete Applied Mathematics 120 (2002), 197–207.
[8] P. San Segundo, D. Rodriguez-Losada, A. Jimenez, An exact bit-parallel algorithm for the maximum clique problem, Computers & Operations Research 38 (2011), 571–581.
[9] P. San Segundo, F. Matia, D. Rodriguez-Losada, M. Hernando, An improved bit parallel exact maximum clique algorithm, Optimization Letters 7 (2013), 467–479.
[10] P. San Segundo, C. Tapia, Relaxed approximate coloring in exact maximum clique search, Computers & Operations Research 44 (2014), 185–192.
[11] P. San Segundo, A. Nikolaev, M. Batsyn, Infra-chromatic bound for exact maximum clique search, Computers & Operations Research 64 (2015), 293–303.
[12] S. Szabo and B. Zavalnij, Greedy algorithms for triangle free coloring, AKCE International Journal of Graphs and Combinatorics 9:(2) (2012), 169–186.
[13] E. Tomita and T. Seki, An efficient branch-and-bound algorithm for finding a maximum clique, Lecture Notes in Computer Science 2631 (2003), 278–289.
[14] D. R. Wood, An algorithm for finding a maximum clique in a graph, Oper. Res. Lett. 21 (1997), 211–217.

Testing the Markowitz Portfolio Optimization Method with Filtered Correlation Matrices∗

Imre Gera, Institute of Informatics, P.O. Box 652, H-6701 Szeged, Hungary
Balázs Bánhelyi, Institute of Informatics, P.O. Box 652, H-6701 Szeged, Hungary
András London†, Institute of Informatics, P.O. Box 652, H-6701 Szeged, Hungary, london@inf.u-szeged.hu

∗ This work was partially supported by the National Research, Development and Innovation Office - NKFIH, SNN-117879.
† Corresponding author

ABSTRACT
In this work we analyze the performance of the Markowitz portfolio optimization method on the Budapest Stock Exchange data set using two different filtering techniques defined for correlation matrices. The results show that the estimated risk is much closer to the realized risk when filtering methods are used. Bootstrap analysis shows that the ratio between the realized return and the estimated risk (Sharpe ratio) is also improved by filtering.

Categories and Subject Descriptors
I.6 [Simulation and Modelling]: Applications; G.1.6 [Optimization]: Constrained optimization, Nonlinear programming

Keywords
Portfolio optimization, Markowitz model, Correlation matrices, Random matrix theory, Hierarchical clustering

1. INTRODUCTION
Portfolio optimization is one of the most important problems in asset management; it aims at reducing the risk of an investment by diversifying it into independently fluctuating assets [5]. In his seminal work [14], Markowitz formulated the problem through the criterion that, given the expected return, the risk (measured by the variability of the return) has to be minimized. The classical model measures the risk as the variance of the asset returns, resulting in a quadratic programming problem. Recently, the analysis of the correlation coefficient matrix, which appears through the covariance matrix in the objective function of the model, has become the focus of interest [2, 4, 9, 10, 17, 19]. Many attempts have been made to quantify the degree of statistical uncertainty present in the correlation matrix and to filter the part of the information which is robust against this uncertainty [2, 7, 9, 10, 11]. The filtered correlation matrices have been successfully used in portfolio optimization in terms of risk reduction [10, 17, 19]. In these studies, it was often assumed that the investor has perfect knowledge of the future returns.

In this work we investigate the portfolio selection problem using different filtering procedures applied to the correlation matrix. We measure the performance of the procedures in terms of both the predicted and the realized risk and return. The future returns are not known at the time of the investment. In Section 2 we briefly describe the Markowitz portfolio optimization problem and two approaches to correlation matrix filtering (Random Matrix Theory, Clustering). In Section 3 we present our results using standard performance measures on the return and risk, and finally, in Section 4 we draw some conclusions and indicate future work.

2. PORTFOLIO OPTIMIZATION
In Markowitz’ formulation, the portfolio problem is a single period model of investment. At the beginning of the period ($t_0$), an investor allocates the capital among different assets. During the investment period ($[t_0, T]$), the portfolio produces a random rate of return and results in a new value of the capital. In the original model of Markowitz, the risk of a single asset is measured by the variance of its returns, while the risk of the portfolio is measured via the covariance matrix of the returns of the assets in the portfolio. In this section we briefly introduce the Markowitz portfolio optimization problem and describe two filtering procedures for the covariance matrix, used to obtain a less noisy matrix and so decrease the statistical uncertainty it contains.

2.1 Markowitz’s model
Given $n$ risky assets, a portfolio composition is determined by the weights $p_i$ ($i = 1, \ldots, n$), such that $\sum_{i=1}^{n} p_i = 1$, indicating the fraction of wealth invested in asset $i$. The expected return and the variance of the portfolio $p = (p_1, \ldots, p_n)$ are
\[ r_p = \sum_{i=1}^{n} p_i r_i = p r^T \tag{1} \]
and
\[ \sigma_p^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} p_i p_j \sigma_{ij} = p \Sigma p^T, \tag{2} \]
where $r_i$ is the expected return of asset $i$, $\sigma_{ij}$ is the covariance between assets $i$ and $j$, and $\Sigma$ is the covariance matrix. Vectors are considered as row vectors in this paper.

In the classical Markowitz model [14] the risk is measured by the variance, providing a quadratic optimization problem which consists of finding a vector $p$, assuming $\sum_{i=1}^{n} p_i = 1$, that minimizes $\sigma_p^2$ for a given “minimal expected return” value of $r_p$. Now, we assume that short selling is allowed and therefore $p_i$ can be negative. The solution of this problem, found by Markowitz, is
\[ p^* = \lambda \Sigma^{-1} \mathbf{1}^T + \gamma \Sigma^{-1} r^T, \tag{3} \]
where $\mathbf{1} = (1, \ldots, 1)$, while the other parameters are
\[ \lambda = (C - r_p B)/D \quad \text{and} \quad \gamma = (r_p A - B)/D, \]
where $A = \mathbf{1} \Sigma^{-1} \mathbf{1}^T$, $B = \mathbf{1} \Sigma^{-1} r^T$, $C = r \Sigma^{-1} r^T$, $D = AC - B^2$.

Considering the daily price time series of $n$ assets and denoting the closing price of asset $i$ at time $t$ ($t = 1, \ldots, T$) by $P_i(t)$, the daily logarithmic return of $i$ is defined as
\[ r_{it} = \log \frac{P_i(t)}{P_i(t-1)} = \log P_i(t) - \log P_i(t-1). \tag{4} \]
In the case of stationary independent normal returns, which is usually assumed for asset prices, the maximum likelihood estimator of $r_i$ is the sample mean of its past observations, defined as
\[ \hat{r}_i = \frac{1}{T} \sum_{t=1}^{T} r_{it}. \tag{5} \]
Hence, for the portfolio we define $\hat{r} = (\hat{r}_1, \ldots, \hat{r}_n)$.
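The estimators in Eq. 4 and Eq. 5 can be sketched in a few lines. The price series below are made-up numbers, not taken from the BSE data set, and the function names are chosen here for illustration only; note that T returns require T + 1 consecutive closing prices.

```python
import math


def log_returns(prices):
    """Eq. 4: r_t = log P(t) - log P(t-1) for consecutive closing prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]


def sample_mean(returns):
    """Eq. 5: average of the observed returns."""
    return sum(returns) / len(returns)


# Hypothetical closing prices for two assets (three prices give T = 2 returns).
prices = {
    "asset_a": [100.0, 110.0, 121.0],
    "asset_b": [50.0, 50.0, 50.0],
}
r_hat = {name: sample_mean(log_returns(p)) for name, p in prices.items()}
print(r_hat)
```

In this toy input asset_a grows by 10% on each day, so its estimated mean log return equals log 1.1, while the flat asset_b gets an estimate of zero.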
The covariance $\sigma_{ij}$ between assets $i$ and $j$ is estimated by
\[ \hat{\sigma}_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} (r_{it} - \hat{r}_i)(r_{jt} - \hat{r}_j) \tag{6} \]
and for the portfolio $\hat{\Sigma} = (\hat{\sigma}_{ij})_{i,j}$. The correlation coefficient between assets $i$ and $j$ is defined as
\[ \rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii} \sigma_{jj}}, \tag{7} \]
where $\sqrt{\sigma_{ii}}$ is often called the volatility of asset $i$.

Figure 1: Indexed hierarchical tree - obtained by the single linkage procedure - and the associated MST of the correlation matrix of 40 assets of the Budapest Stock Exchange

2.2 Random matrix theory and correlation matrices
A simple random matrix is a matrix whose elements are random numbers from a given distribution [15]. In the context of asset portfolios, random matrix theory (RMT) can be useful to investigate the effect of statistical uncertainty in the estimation of the correlation matrix [19]. Given the time series of length $T$ of the returns of $n$ assets and assuming that the returns are independent Gaussian random variables with zero mean and variance $\sigma^2$, then in the limit $n \to \infty$, $T \to \infty$ such that $Q = T/n$ is fixed, the distribution $P_{rm}(\lambda)$ of the eigenvalues of the random correlation matrix ($C_{rm}$) is given by
\[ P_{rm}(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{max} - \lambda)(\lambda - \lambda_{min})}}{\lambda}, \tag{8} \]
where $\lambda_{min}$ and $\lambda_{max}$ are the minimum and maximum eigenvalues, respectively [18], given in the form
\[ \lambda_{max,min} = \sigma^2 \left( 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}} \right). \tag{9} \]
Previous studies have pointed out that the largest eigenvalue of correlation matrices from returns of financial assets is completely inconsistent with Eq. 8 and reflects the common behavior of the stocks in the portfolio [9, 16]. Since Eq. 8 is strictly valid only for $n \to \infty$, $T \to \infty$, we constructed random matrices for certain $n$ and $T$ values of the data sets that are used and compared the largest eigenvalues and the spectrum with $C$. We found high consistency with Eq. 8. Since $\mathrm{Trace}(C) = n$, the variance of the part not explained by the largest eigenvalue can be quantified as $\sigma^2 = 1 - \lambda_{largest}/n$. Using this, we can recalculate $\lambda_{min}$ and $\lambda_{max}$ in Eq. 9 and construct a filtered diagonal matrix $C_{RMT}$, which we get by setting all eigenvalues of $C$ smaller than $\lambda_{max}$ to zero and transforming it to the basis of $C$, with the diagonal elements set to one (and using singular value decomposition). A possible RMT approach for portfolio optimization, following [17], is to use $\Sigma_{RMT}$ (which can be easily calculated from $C_{RMT}$) instead of $\Sigma$ in the Markowitz model.

2.3 Clustering
The correlation matrix $C$ has $n(n-1)/2 \sim n^2$ distinct elements, therefore it contains a huge amount of information even for a small number of assets considered in the portfolio selection problem. As shown by Mantegna and later many others [3, 8, 12, 19, 20], the single linkage clustering approach [6] (closely related to minimal spanning trees (MST), Fig. 1) provides economically meaningful information using only $n - 1$ distinct elements of the correlation matrix. To construct the filtered matrix, the correlation matrix $C$ is converted into a distance matrix $D$, for instance following [12, 13], using the ultrametric distance^1 $d_{ij} = \sqrt{2(1 - \rho_{ij})}$. The distance matrix $D$ can be seen as a fully connected graph of the assets with edge weights $d_{ij}$ representing the similarity between their time series. Then the filtered correlation matrix $C_{MST}$ is constructed with just $n - 1$ distinct correlation coefficients by converting the filtered ultrametric distance matrix back. It was proven that the ultrametric correlation matrix obtained by the single linkage clustering method is always positive definite if all the elements of the obtained ultrametric correlation matrix are positive [1]. This condition has been observed for all correlation matrices we used.
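The single linkage filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pairwise distances are processed in increasing order (a Kruskal-style pass that uses exactly the n - 1 MST edges), every pair of assets is assigned the distance at which their clusters merge, and the result is converted back with $\rho = 1 - d^2/2$. The function name and the 3-asset correlation matrix are hypothetical.

```python
import math


def single_linkage_filter(corr):
    """Return the ultrametric (single linkage) version of a correlation matrix.

    corr is a symmetric list of lists with ones on the diagonal.
    """
    n = len(corr)
    # Distance d_ij = sqrt(2 (1 - rho_ij)) for every pair i < j.
    edges = sorted(
        (math.sqrt(2.0 * (1.0 - corr[i][j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    cluster = {i: [i] for i in range(n)}   # representative -> members
    label = list(range(n))                 # node -> representative
    ultra = [[0.0] * n for _ in range(n)]
    for d, i, j in edges:                  # shortest edge first
        ci, cj = label[i], label[j]
        if ci == cj:
            continue                       # would close a cycle: not an MST edge
        # All pairs split across the two merging clusters get distance d.
        for u in cluster[ci]:
            for v in cluster[cj]:
                ultra[u][v] = ultra[v][u] = d
        for v in cluster[cj]:
            label[v] = ci
        cluster[ci].extend(cluster.pop(cj))
    # Convert back: rho = 1 - d^2 / 2 (diagonal distance 0 gives rho = 1).
    return [[1.0 - ultra[i][j] ** 2 / 2.0 for j in range(n)] for i in range(n)]


# Hypothetical 3-asset correlation matrix.
corr = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.2],
    [0.5, 0.2, 1.0],
]
filtered = single_linkage_filter(corr)
for row in filtered:
    print([round(x, 6) for x in row])
```

In this toy input the weakest correlation (0.2 between the second and third asset) is replaced by 0.5, the correlation at which the third asset joins the cluster of the first two, so only n - 1 = 2 distinct off-diagonal values remain.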
Then, for portfolio optimization, we can use the 0.35 RMT obtained ΣMST instead of Σ in the Markowitz model. 0.30 MST 3. RESULTS 0.25 3.1 Data set Risk 0.20 To compare the performance of the methods we analyze the data set of n = 40 stocks traded in the Budapest Stock 0.15 Exchange (BSE) in the period 1995-2016, using 5145 records 0.10 of daily returns per stock. 0.05 We consider t = t0 as the time when the optimization is 1.0 1.2 1.4 1.6 1.8 2.0 performed. Since the covariance matrix has ∼ n2 elements Expected Return while the number of records used in the estimation is nT , the length of the time series need to be T >> n in order to get Figure 2: The ratio of the realized risk σ2r and small errors on the covariance. On the other hand, for large the predicted risk ˆ σ2p as the function of expected T the non-stationarity of the time series likely appears. This portfolio return rp for the different procedures as problem is known as the curse of dimensionality. Because T = 50, 100, 500 (top-down). The data set contains 40 of this, we compute the covariance matrix and expected re- BSE stocks in the period turns using the [−T, 0] interval, i.e. using T = 50 ≈ n, T = 100 > n and T = 500 >> n days preceding t = 0. Furthermore, filtering techniques are able the filter the part portfolio p, the ex-ante Sharpe ratio measures the excess of the covariance matrix which is less affected by statistical return per unit of risk: uncertainty. To quantify and compare the different methods ˆ rp − rf are considered, we use the measures described below. Sp = , (10) σp while the ex-post Sharpe ratio uses the same equation but 3.2 Performance evaluation with the realized return rp. Here, rf is the risk-free rate To measure the performance of the portfolios determined by of return. 
The portfolio risk, due to the estimation of the the different models, we use the following quantities for the correlation matrix is calculated as estimated return and risk at the time of investment and the |σ2 realized risk and returns after the investment period. For r − ˆ σ2p| Rp = (11) ˆ σ2p 1Ultrametric distances are such distances that satisfy the in- equality d where ˆ σ2 ij ≤ max{dik , dkj }, which is a stronger assumption p is the predicted risk, while σ2 r is the realized risk that the standard triangular inequality. of the portfolio. 46 5. REFERENCES Table 1: Bootstrap experiments using 50 random [1] M. R. Anderberg. Cluster analysis for applications. samples for each value of T in case of 120% expected monographs and textbooks on probability and return mathematical statistics, 1973. rp = 1.2 Original RMT MST [2] T. Conlon, H. J. Ruskin, and M. Crane. Random matrix Return 0.145 (0.330) 0.180 (0.425) 0.186 (0.348) theory and fund of funds portfolio optimisation. Physica A: T=50 Sp 0.009 0.180 0.186 Statistical Mechanics and its applications, 382(2):565–576, Rp 16.66 0.99 0.99 2007. Return 0.319 (0.332) 0.315 (0.541) 0.362 (0.418) [3] T. Di Matteo, T. Aste, and R. N. Mantegna. An interest T=100 Sp 0.036 0.315 0.364 rates cluster analysis. Physica A: Statistical Mechanics and Rp 8.954 0.99 0.99 its Applications, 339(1):181–188, 2004. Return -0.185 (0.928) -0.313 (1.234) 0.264 (0.724) [4] M. El Alaoui. Random matrix theory and portfolio T=500 Sp -0.077 -0.313 0.264 optimization in moroccan stock exchange. Physica A: Rp 2.415 0.99 0.99 Statistical Mechanics and its Applications, 433:92–99, 2015. [5] E. J. Elton, M. J. Gruber, S. J. Brown, and W. N. Goetzmann. Modern portfolio theory and investment analysis. John Wiley & Sons, 2009. 3.3 Experiments [6] J. C. Gower and G. Ross. Minimum spanning trees and Fig. 2 shows the ratio of the realized risk σ2r and the pre- single linkage cluster analysis. Applied statistics, pages dicted risk ˆ σ2 54–64, 1969. 
p as the function of the estimated return rp obtained by the different procedures. For each T , the in- [7] T. Guhr and B. Kälber. A new method to estimate the vestment time t noise in financial correlation matrices. Journal of Physics 0 and the set of stocks were the same. The A: Mathematical and General, 36(12):3009, 2003. ratio is significantly smaller in case of the portfolios that [8] L. Kullmann, J. Kertész, and K. Kaski. Time-dependent obtained by using filtering. Interestingly, for T = 100 the cross-correlations between different stock returns: A MST method gave better results than the RMT. directed network of influence. Physical Review E, 66(2):026125, 2002. To check the robustness of the methods, we performed a [9] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters. bootstrap experiment as follows. We considered 50 random Noise dressing of financial correlation matrices. Physical initial times to solve the optimization problem using the review letters, 83(7):1467, 1999. time series on the intervals [−T, t [10] L. Laloux, P. Cizeau, M. Potters, and J.-P. Bouchaud. 0] (T = 50, 100, 500). For Random matrix theory and financial correlations. each portfolio, we computed the predicted risk using Eq. 2 International Journal of Theoretical and Applied Finance, for expected returns rp = 1, 1.1, . . . , 2 (0 − 100% gain). We 3(03):391–397, 2000. further constrained pi to the interval [−1, 1] and used the La- [11] Y. Malevergne and D. Sornette. Collective origin of the grange multiplier method for the optimization. In all cases, coexistence of apparent random matrix theory noise and of the portfolios with realized returns in the top and bottom factors in large sample correlation matrices. Physica A: 10% were neglected. We computed the realized risk using Statistical Mechanics and its Applications, 331(3):660–668, 2004. the calculated stock weights at t0 and the realized covariance [12] R. N. Mantegna. Hierarchical structure in financial matrix on [t0, T ]. 
We also computed the realized returns by comparing the value of the portfolio at t0 and T. The average Sp, Rp values and returns with standard deviations for rp = 1.2 are shown in Tab. 1. It can be seen that the Rp values are significantly smaller for the RMT and MST than for the original method for each T, confirming the reliability of the filtering methods. The ex-post Sharpe ratio, although much smaller than 1 in every case, also shows that the RMT and MST methods outperform the original method. We note, interestingly, that the highest average return was obtained for T = 100 (and not for T = 500) using the BSE data set.

4. CONCLUSIONS
In this study, we performed portfolio optimization using filtered correlation matrices obtained by two different procedures, namely a random matrix theory approach and the single linkage clustering. A large set of experiments have
shown that using filtered covariance matrices the original Markowitz solution is outperformed in terms of standard portfolio performance measures.

In the future, it would be interesting to analyze portfolio optimization using various estimators of expected returns together with different filtering procedures, and to check the methods using various stock exchange data sets and a varying number of stocks.

Tight Online Bin Packing Algorithm with Buffer and Parametric Item Sizes ∗

József Békési, Gábor Galambos
Department of Applied Informatics, Gyula Juhász Faculty of Education, University of Szeged, H-6701 Szeged, POB 396, Hungary
bekesi@jgypk.szte.hu, galambos@jgypk.szte.hu

∗ This research was supported by the Austrian-Hungarian Action Foundation (Project number: 91öu2).

ABSTRACT
In this paper we investigate the online bin packing problem with constant buffer size, where the item sizes are in the interval $(0, 1/r]$ and $r \ge 2$ is an integer. The problem was originally given by Zheng et al. [13]. They gave a lower bound and an algorithm, which were later improved by Zhang et al. [12]. We close the gap on the competitive ratio and give a First Fit based optimal solution for the parametric version for arbitrary r.

General Terms
Theory

Keywords
asymptotic competitive ratio, next fit, first fit, lower bound

1. INTRODUCTION
One-dimensional bin packing is a well-known problem of combinatorial optimization. It can be defined as follows: we are given a list $L = \{x_1, x_2, \ldots, x_n\}$ of real numbers (called items) from the interval $(0, 1]$, and we want to pack each item into a unit capacity bin. The aim is to use the minimal number of bins. It is known that finding the optimal assignment is NP-hard [6]. Consequently, it is interesting to find polynomial time approximation algorithms with good approximate behaviour (see surveys [3] [4]).

In practical situations it often happens that the input is not known completely by the algorithm. This is the reason that researchers focused on studying online problems. In this case the items come one by one and the algorithm should assign the next item to a bin immediately after its arrival. Later the items cannot be repacked. The algorithms defined for online problems are called online algorithms. Of course the approximation of the optimal algorithm by an online algorithm is harder than the one by an offline algorithm, which knows the whole input in advance.

To measure the efficiency of an online algorithm, we have several possibilities. In case of bin packing one of the classical methods is the worst case analysis. Traditionally the so-called asymptotic competitive ratio $R^\infty$ is used to measure the efficiency of an online algorithm in the bin packing literature. Its definition for algorithm A is the following:

$$R^\infty(A) := \limsup_{l \to \infty} \max_{L} \left\{ \frac{A(L)}{l} : OPT(L) = l \right\}. \qquad (1)$$

The online algorithm with the best known asymptotic competitive ratio 1.5815 is due to Heydrich and van Stee [9], while the best lower bound is 1.54037 . . . given by Balogh et al. in [2]. One of the simplest online algorithms is Next Fit (NF). It uses only one open bin and puts the next element into it, if it is possible. Otherwise it closes the bin and opens a new one. It is well-known that $R^\infty_{NF} = 2$. A very famous online algorithm is First Fit (FF). It keeps open all bins used during the algorithm, and packs the next item into the first bin where it fits. Ullman [11] proved first that the asymptotic competitive ratio of FF is 1.7.

It is clear that the online restriction results in a bad competitive ratio. To avoid this, several relaxations of the online property and space limitations appear in the literature. Using lookahead buffers, repacking or preordering of the input are the most common methods. In general the algorithms that use such techniques are called semi-online algorithms. For example, if the input arrives in decreasing order, the asymptotic competitive ratio of NF improves to 1.69103... [1]. A similar improvement can be achieved by repacking, which was proved by Galambos and Woeginger in [5]. Garey et al. [7] and Johnson et al. [8] proved that FF works much better if the elements of the input are sorted in decreasing order. In this case $R^\infty(FFD) = 11/9$. These techniques can be used in many practical applications.

Based on such an application Zheng et al. [13] defined a variant of the original bin packing problem. In this version a list of items with sizes bounded by a small interval arrive and they can be temporarily stored in a capacitated buffer before packing them into bins. Zheng et al. gave a lower bound of 4/3 and defined an algorithm with competitive ratio of 13/9 [13]. Later Zhang et al. improved this to 1.423 and 1.4243 respectively [12]. So a small gap of 0.0013 remained, and they analysed only the case when the upper bound of the sizes of the items is 1/2. In this paper we investigate the so-called r-parametric case where the item sizes are in the interval $(0, 1/r]$. First, we give an improved lower bound for any online algorithm with constant buffer size S. We also give an algorithm, which is based on the method given by Galambos and Woeginger in [5]. We prove that the competitive ratio of our algorithm is equal to the value of the new lower bound. So, we close the gap for arbitrary values of r, where r = 2, 3, ....

2. PRELIMINARIES
We will use a sequence which was first introduced by Sylvester in [10] (1880) for the case r = 1; therefore, we refer to this sequence as the generalized Sylvester sequence.

For integers k > 1 and r ≥ 1, the generalized Sylvester sequence $m^r_1, \ldots, m^r_k$ can be given by the following recursion.

$$m^r_1 = r + 1, \qquad m^r_2 = r + 2, \qquad m^r_j = m^r_{j-1}(m^r_{j-1} - 1) + 1, \quad \text{for } j = 3, \ldots, k.$$

m^r_j    r = 1    r = 2    r = 3     r = 4     r = 5
j = 1    2        3        4         5         6
j = 2    3        4        5         6         7
j = 3    7        13       21        31        43
j = 4    43       157      421       931       1807
j = 5    1807     24493    176821    865831    3263443

Table 1.1. The first few elements of the generalized Sylvester sequences if k ≤ 5.

These sequences have the following properties.

$$\sum_{i=j}^{k} \frac{1}{m^r_i} = \frac{1}{m^r_j - 1} - \frac{1}{m^r_{k+1} - 1}, \quad \text{if } j \ge 2,$$

and

$$\frac{r}{m^r_1} + \sum_{i=2}^{k} \frac{1}{m^r_i} = 1 - \frac{1}{m^r_{k+1} - 1}, \quad \text{if } r \ge 2.$$

The item sizes of the lists used below are derived from the generalized Sylvester sequences. For example, if r = 1, then the sizes are $\frac{1}{2} + \varepsilon, \frac{1}{3} + \varepsilon, \frac{1}{7} + \varepsilon, \frac{1}{43} + \varepsilon, \ldots$. We will use these sequences to give lower bounds. Similarly, we will allude to the following constants.

$$h_\infty(r) = 1 + \sum_{i=2}^{\infty} \frac{1}{m^r_i - 1}.$$

The first few values of $h_\infty(r)$: $h_\infty(1) \approx 1.69103$, $h_\infty(2) \approx 1.42312$, $h_\infty(3) \approx 1.30238$.

Generally, to avoid the piling of indexes we will denote $m^r_j$ by $m_j$.

3. LOWER BOUND CONSTRUCTION
We will construct the following instance. Let n > 0 be a large integer and let k > 4 be an integer. Then we will consider the concatenated list $L = (L_1, L_2, \ldots, L_k)$, where

• $L_1$ contains $n(m_k - 1)(m_1 - 1)$ items with size $\frac{1}{m_1} + \varepsilon$,
• $L_i$ contains $n(m_k - 1)$ items with size $\frac{1}{m_i} + \varepsilon$, for $2 \le i \le k - 1$,
• $L_k$ contains $n(m_k - 1)$ items with size $\frac{1}{m_k - 1} - k\varepsilon$,

where ε is arbitrarily small, i.e. $\frac{1}{m_k - 1} - (m_1 + k - 3)\varepsilon > \frac{1}{m_k}$.

Using the above construction we can prove the following theorem for the given problem.

Theorem 3.1. Let us consider the r-parametric case. If there is a buffer with buffer size |B| = S, then for any online algorithm A, $R^\infty(A) \ge h_\infty(r)$.

Since $h_\infty(2) = 1.423117\ldots$, this lower bound gives an improvement for the problem considered in [13] and [12].

4. THE WEIGHTING FUNCTION
In the investigation of online bounded-space algorithms in [5], a weighting function was defined for use in the analysis of an algorithm. Generalizing the idea we define the following weighting function.

$$W(x) = \begin{cases} x + \dfrac{1}{m_i(m_i - 1)}, & \text{if } \dfrac{1}{m_i} < x \le \dfrac{1}{m_i - 1}, \\[4pt] \dfrac{m_i + 1}{m_i}\, x, & \text{if } \dfrac{1}{m_{i+1} - 1} < x \le \dfrac{1}{m_i}. \end{cases}$$

The weight of a bin is defined as the weight of all elements in it, and generally, the weight of a set is the weight of all items in the set. It is easy to see that the following statements are true.

Fact 4.1.
(i) $W(x)$ is nondecreasing in $(0, 1]$.
(ii) For $i \ge 1$, $\frac{W(x)}{x} \le \frac{m_i + 1}{m_i}$ if $x \le \frac{1}{m_i}$.
(iii) For $i \ge 1$, $\frac{W(x)}{x} \ge \frac{m_i + 1}{m_i}$ if $x \ge \frac{1}{m_{i+1} - 1}$.

First we prove the following theorem.

Theorem 4.2. Let us consider the r-parametric problem. Then in any packing of a list L the weight of any bin is at most $h_\infty(r)$.

As a consequence of the above theorem, the following corollary is true.

Corollary 4.3. For any list L, $W(L) \le h_\infty(r)\,OPT(L)$.

5. THE ALGORITHM
Our algorithm, called First Fit Decreasing with Buffer-length 3 (FFD3B), is as follows.

(1) Fill up the buffer with the subsequent elements of the list as long as the next item fits into the buffer.
(2) Order the items in the buffer in nonincreasing order, and put the items in three virtual bins, each of them with capacity 1, using the FFD rule.
(3) Check the sum of the weights in the virtual bins. Find a set of items in the virtual bins with weight greater than or equal to one, open a new empty bin, put the items from the virtual bins into this newly opened bin, and close the bin.
(4) If there is an unplaced item then go to (1).
(5) Empty the contents of the virtual bins into newly opened bins. Close the bins, and quit.

Theorem 5.1. If we pack the items of any list by the algorithm FFD3B then either in step (3) we have a good subset of items (a subset of weight greater than or equal to one) or we have enough space in the buffer to accept a new item from the list.

As a consequence of the above theorem we get the following theorem.

Theorem 5.2. For any list L, $W(L) \ge FFD3B(L) - 3$.

Therefore we get the following corollary.

Corollary 5.3. For the r-parametric case $R^\infty(FFD3B) = h_\infty(r)$.

6. CONCLUSION
In this paper we gave an online bin packing algorithm for the special problem when the sizes of the items are in the interval $(0, 1/r]$ for arbitrary values of r, r = 2, 3, ..., and when we can use a capacitated lookahead buffer to temporarily store some elements. We proved that the asymptotic competitive ratio of our algorithm is tight.

7. REFERENCES
[1] B.S. Baker, E.G. Coffman, A tight asymptotic bound for next-fit-decreasing bin-packing, SIAM J. Algebraic Discrete Methods, 2, 147–152, (1981).
[2] J. Balogh, J. Békési, G. Galambos, New lower bounds for certain classes of bin packing algorithms, Theoretical Computer Science, 440–441, 1–13, (2012).
[3] E.G. Coffman, M. Garey, D. Johnson, Approximation algorithms for bin packing: A survey, In: Approximation algorithms for NP-hard problems, Edited by Dorit S. Hochbaum, PWS Publishing Co., 46–93, (1996).
[4] E.G. Coffman, G. Galambos, S. Martello, D. Vigo, Bin packing approximation algorithms: combinatorial analysis. In: DZ. Du, P. Pardalos (eds) Handbook of Combinatorial Optimization. Kluwer, Dordrecht, 151–208, (1999).
[5] G. Galambos, G. J. Woeginger, Repacking helps in bounded space online bin packing, Computing, 49, 329–338, (1993).
[6] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, New York, Freeman, (1979).
[7] M. R. Garey, R. L. Graham, J. D. Ullman, Worst-case analysis of memory allocation algorithms. Proc. 4th Symp. Theory of Computing (STOC), ACM, 143–150, (1973).
[8] D. S. Johnson, A. Demers, J. D. Ullman, M. R. Garey, R. L. Graham, Worst-case performance bounds for simple one-dimensional packing algorithms. SIAM J. Comput., 3, 256–278, (1974).
[9] S. Heydrich, R. van Stee, Beating the Harmonic lower bound for online bin packing, in Proceedings of the 43rd International Colloquium on Automata, Languages and Programming (ICALP), (2016).
[10] J. Sylvester, On a Point in the Theory of Vulgar Fractions, American Journal of Mathematics, 3, 332–335, (1880).
[11] J. D. Ullman, The performance of a memory allocation algorithm, Technical Report 100, Princeton Univ., Princeton, NJ, (1971).
[12] M. Zhang, X. Han, Y. Lan, H-F. Ting, Online bin packing problem with buffer and bounded size revisited. Journal of Combinatorial Optimization, DOI 10.1007/s10878-015-9976-5, (2015).
[13] F. Zheng, L. Huo, E. Zhang, NF-based algorithms for online bin packing with buffer and bounded item size. Journal of Combinatorial Optimization, 30(2), 360–369, (2015).

A Branch-and-Cut Algorithm for the Multi-Depot Rural Postman Problem

Elena Fernández, Universitat Politècnica de Catalunya-BcnTech, Barcelona, Spain, e.fernandez@upc.edu
Gilbert Laporte, HEC Montréal, Montréal, Canada, gilbert.laporte@cirrelt.ca
Jessica Rodríguez-Pereira, Universitat Politècnica de Catalunya-BcnTech, Barcelona, Spain, jessica.rodriguez@upc.edu

ABSTRACT
This paper studies the Multi-Depot Rural Postman Problem on an undirected graph. This problem is the extension of the well-known Undirected Rural Postman Problem to the case where there are several depots instead of just one. A linear integer programming formulation that only uses binary variables is proposed, which includes three families of constraints of exponential size. An exact branch-and-cut algorithm is presented, where violated constraints of these families are separated in polynomial time. Despite the difficulty of the problem, the numerical results from a series of computational experiments with various types of instances illustrate a quite good behavior of the algorithm.

Categories and Subject Descriptors
[Theory of computation]: Mathematical optimization; Branch-and-bound; Algorithm design techniques

General Terms
Theory

Keywords
Arc routing; multi-depot rural postman problem; worst-case analysis; polyhedral analysis; branch-and-cut

1. INTRODUCTION
In this work we develop a branch-and-cut algorithm for the Multi-Depot Rural Postman Problem (MDRPP), which extends the classical Rural Postman Problem (RPP) [13] where there is only one depot. Similarly to the RPP, routes must be designed to serve a given set of required edges. In contrast, in the MDRPP the depot from which each required edge is served is not known in advance. The MDRPP combines two types of decisions: the allocation of required edges to depots and the planning of routes. The objective is to determine a minimum cost set of routes, each starting and ending at the same depot, and such that each required edge is traversed at least once.

The motivation for studying the MDRPP comes not only from its theoretical interest but also from its real-life applications. Similarly to other arc routing problems, such applications arise in a wide variety of practical cases, namely garbage collection, road maintenance, mail delivery, snow plowing or pipeline inspection, to name just a few. In large-scale instances, there is usually more than one depot from which service demand can be satisfied. Such depots may be vehicle stations, dump sites, replenishment points or relay boxes. A way of handling such problems is to first define a smaller operating area for each depot, by using a districting procedure in which each district contains a single depot, and then solving the RPP associated with each district. This solution strategy is of course suboptimal.

The literature on Multi-Depot Arc Routing Problems (MDARP) is scarce. To the best of our knowledge, [4, 3] are the only existing exact algorithms for the MDRPP. Both use natural decision variables which explicitly indicate the depot with which each traversed edge or arc is associated. Other than this, previous work on MDARPs focused on multi-depot capacitated arc routing problems (MDCARPs). Although there are some theoretical works and exact algorithms [14, 9], most of the existing literature on MDCARPs focuses on heuristic solution algorithms (see [1, 12, 11, 8, 7, 6]).

Multi-depot routing problems are also related to districting-arc routing problems where a set of clusters or districts that suitably partition the required edge set is sought. The design of good districts at a strategic level, where demand points or edges are allocated to depots, allows finding efficient routes at each district at an operational level in a later phase. There exists a rich districting literature in relation to arc routing (see, for instance, [12, 11, 10]). Two recent works on districting for arc routing are [2, 5].

A natural option when dealing with routing problems with multiple depots is to associate routes with depots and then to define the routes for each depot. From a modeling point of view, this can be easily done by using decision variables that explicitly indicate the arcs/edges traversed by the routes of each depot. Such an alternative offers two main advantages. On the one hand, in the absence of capacity or other types of constraints, the feasibility of a route corresponding to a fixed depot is guaranteed by connectivity plus parity constraints. On the other hand, routes can be easily reproduced once the values of the decision variables are known. The obvious disadvantage of this option is that the number of variables increases with the number of depots, so the success of exact solution methods for large size instances becomes a challenge. The two previous MDRPP works referenced above [4, 3] use this type of decision variables. In [4], which deals with exactly the same undirected MDRPP that we study in this paper, instances with up to 100 vertices and 4 depots were solved to optimality. In [3], which addresses a directed MDARP dealing with carriers collaboration, instances with up to 50 vertices and 2 depots were optimally solved.

2. THE MULTI-DEPOT RURAL POSTMAN PROBLEM
In this work we carry out a worst-case analysis to study the potential savings that can be obtained, with respect to the RPP and some variations, when multiple depots are considered. We denote by $z^*(MDRPP)$ the optimal value of a MDRPP instance and by $z^*(RPP)$ the optimal value of the same instance with only one depot. It is possible to prove that savings can be obtained in both directions, since the best solutions are not necessarily obtained using more than one depot. A summary of the results that we prove is:

Theorem 2.1. There exists no finite upper bound for the ratio $z^*(RPP)/z^*(MDRPP)$.

Theorem 2.2. $z^*(RPP)/z^*(MDRPP) \ge 1/2$, and the bound is asymptotically tight.

Furthermore, we present a new integer linear formulation for the MDRPP with binary decision variables, which are solely associated with edges, but not with depots. In particular, two sets of binary variables are used, associated with the first and second traversals of edges, respectively. For each $e \in E$, let $x_e$ be a binary variable indicating whether or not edge e is traversed by some route. We denote by $E_y \subset E$ the set of edges that can be traversed twice in an optimal solution. For each $e \in E_y$, let $y_e$ be a binary variable that is equal to one if and only if edge e is traversed twice. The reduction in the number of decision variables used in our formulation comes at the expense of additional difficulties: connectivity and parity constraints are no longer enough to guarantee well-defined routes. To overcome this difficulty we propose a new set of constraints that guarantee that each route terminates at the same depot where it has started, and which can be separated in polynomial time.

Likewise, we study the polyhedral properties of the formulation. In this sense, we prove that the polyhedron given by the convex hull of the feasible solutions is full-dimensional under certain conditions. Furthermore, under mild conditions the trivial inequalities and some families of constraints are facet defining.

We finally propose an exact branch-and-cut algorithm with exact separation for all the families of inequalities of exponential size. The algorithm has been implemented and its computational effectiveness tested on a series of computational experiments with a set of benchmark instances. The numerical results confirm the good performance of the solution algorithm, as it is capable of solving to optimality, in reasonable computing times, instances with up to 700 vertices and four depots.

3. CONCLUSIONS
We have studied the Multi-Depot Rural Postman Problem (MDRPP), which is the extension of the RPP to the case of several depots. A worst-case analysis of the MDRPP with respect to the RPP indicates that the potential savings can be arbitrarily large, but also that the RPP may produce better solutions. A binary linear formulation for the MDRPP has been presented. The formulation includes a new family of inequalities that ensure that routes start and end at the same depot. The properties of the polyhedron associated with the formulation have been studied. Furthermore, we have developed a branch-and-cut algorithm for the MDRPP based on the proposed formulation. The algorithm is capable of solving to optimality, within reasonable computing times, instances with up to 700 vertices and four depots.

4. ACKNOWLEDGMENTS
This research has been partially supported by the Spanish Ministry of Economy and Competitiveness and ERDF funds through grants EEBB-I-16-10670, BES-2013-063633, MTM2012-36163-C06-05 and MTM2015-63779-R (MINECO/FEDER), and by the Canadian Natural Sciences and Engineering Research Council under grant 2015-06189. This support is gratefully acknowledged.

5. REFERENCES
[1] A. Amberg, W. Domschke, and S. Voß. Multiple center capacitated arc routing problems: A tabu search algorithm using capacitated trees. European Journal of Operational Research, 124(2):360–376, 2000.
[2] A. Butsch, J. Kalcsics, and G. Laporte. Districting for arc routing. INFORMS Journal on Computing, 26(4):809–824, 2014.
[3] E. Fernández, D. Fontana, and M. Speranza. On the collaboration uncapacitated arc routing problem. Computers & Operations Research, 67:120–131, 2016.
[4] E. Fernández and J. Rodríguez-Pereira. The multi-depot rural postman problem. Submitted for publication, 2016.
[5] G. García Ayala, J. González-Velarde, R. Ríos-Mercado, and E. Fernández. A novel model for arc territory design: Promoting Eulerian districts. International Transactions in Operational Research, 23:433–458, 2015.
[6] B. Golden and R. Wong. Capacitated arc routing problems. Networks, 11(3):305–315, 1981.
[7] H. Hu, T. Liu, N. Zhao, Y. Zhou, and D. Min. A hybrid genetic algorithm with perturbation for the multi-depot capacitated arc routing problem. Journal of Applied Sciences, 13(16):32–39, 2013.
[8] A. Kansou and A. Yassine. A two ant colony approaches for the multi-depot capacitated arc routing problem. In International Conference on Computers & Industrial Engineering, 2009. CIE 2009., pages 1040–1045. IEEE, 2009.
[9] D. Krushinsky and T. Van Woensel. Lower and upper bounds for location-arc routing problems with vehicle capacity constraints. European Journal of Operational Research, 244(1):100–109, 2015.
[10] L. Muyldermans. Routing, districting and location for arc traversal problems. Technical report, PhD dissertation, Catholic University of Leuven, Leuven, Belgium, 2003.
[11] L. Muyldermans, D. Cattrysse, and D. Van Oudheusden. Districting for arc-routing applications. Journal of the Operational Research Society, 54:1209–1221, 2003.
[12] L. Muyldermans, D. Cattrysse, D. Van Oudheusden, and T. Lotan. Districting for salt spreading operations. European Journal of Operational Research, 139(3):521–532, 2002.
[13] C. S. Orloff. A fundamental problem in vehicle routing.
Networks, 4(1):35–64, 1974.
[14] S. Wøhlk. Contributions to arc routing. Technical report, PhD dissertation, University of Southern Denmark, Odense, Denmark, 2004.

Allocation and Pricing on a Network in Presence of Negative Externalities
[Extended Abstract]

Saša Pekeč, Duke University, pekec@duke.edu

1. INTRODUCTION
We discuss network optimization problems that arise naturally in the context of optimally allocating and pricing indivisible homogeneous items to unit-demand agents in a network, with the caveat that the agents face negative allocative externalities. Specifically, an agent's value for (not) getting an item depends on whether any of its rivals did (not) get an item. The rivalry is represented by a network with nodes representing agents and arcs representing whether an agent considers another agent its rival.

An agent i could have four different values depending on the allocation structure: $w_i$ is the value if agent i gets an item but no rival gets it; $v_i$ is the value if agent i gets an item and at least one of its rivals also gets it; without loss of generality, we normalize to zero agent i's value for no item allocated to i nor any of its rivals; finally, agent i experiences a loss $-\alpha_i$ if i does not get an item but one of its rivals does get it. (Note that this valuation structure could be generalized beyond binary, so that agent i's value depends on the number of rivals who also got an item.) With normalization for one value, the agents' valuation function is three-dimensional. Such a valuation structure generalizes those studied in [10] and [2], where results analogous to those presented below were initially established.

The settings that can be represented with such an allocation structure are prevalent in business. For example, the representation $w_i \ge v_i \ge 0 = -\alpha_i$ describes a setting in which agents put a premium $w_i - v_i$ on an exclusive allocation, and lose nothing ($0 = -\alpha_i$) if a rival gets the item. Exclusivity is considered valuable in a variety of settings. For example, the right to sell a product or offer a service exclusively is more valuable than having to compete for a market share with rivals who might secure the same right. The scope of exclusivity rights might be limited to a geographic area or a market segment, as is common with franchising and with exclusive sales, service and distribution agreements. For another example, the representation $w_i = v_i \ge 0 \ge -\alpha_i$ describes a fierce competitive setting in which a rival obtaining an item imposes a loss on agent i who did not get an item. Industries that involve patent and intellectual property protection or a regulatory approval are examples in which an agent who obtains an item imposes negative externalities on all of its rivals. For example, when pharmaceutical companies race to develop a drug, the company that is first to obtain the patent and/or regulatory approval essentially eliminates competition from the market for that drug, thereby turning into losses all of the rivals' R&D investment in developing a drug for the same purpose.

We focus on the setting in which agents' values are private and the monopolist seller's objective is to maximize expected revenue. In other words, we are looking for the optimal deterministic mechanism in presence of multi-dimensional independent private values. The mechanism design problem with externalities has been studied in the computer science literature, mostly motivated by mechanisms for allocation and pricing of digital ads. (The prevailing approach to designing an auction procedure is to design a polynomial time algorithm for solving the given mechanism design problem and then interpret it as an auction.) Allocative externalities in the digital ad context arise naturally: an advertiser could value an exclusive ad placement; similarly, an advertiser who lost out on an ad placement might prefer a competitor's ad not being shown either. Some notable works that focus on related problems include [6],[11],[8],[3],[12],[9],[7],[4],[5].

2. EXCLUSIVITY MODEL
A monopolist seller has K identical items that can be allocated among $N = \{1, 2, \ldots, n\}$ unit-demand agents. Relationships among agents are defined by a network $(N, E)$ where E is the 0-1 adjacency matrix: $e_{ij} = 1$ if and only if agent i considers agent j, $j \ne i$, to be its rival (e.g., i considers j a competitor, or i and j are geographical neighbors or directly connected in a social network). Let $S(i) \subseteq N \setminus \{i\}$ denote the set of agent i's neighbors, i.e., the set of all other agents that i considers to be related to her: $S(i) = \{j \in N : e_{ij} = 1\}$.

Agent i's type is represented by a vector $\mathbf{v}_i = (w_i, v_i)$, where $w_i$ is agent i's (exclusivity) valuation for the item if none of her neighbors $j \in S(i)$ gets an item, and where $v_i$ is agent i's (non-exclusivity) valuation for the item if there is a neighbor $j \in S(i)$ who also obtains the item. We assume $w_i \ge v_i \ge 0$, where, without loss of generality, we can normalize by setting agent i's value for not getting an item to zero. Note that $w_i - v_i$ can be thought of as the exclusivity premium.

We consider the setting in which $\mathbf{v}_i$ is privately known, while the network $(N, E)$ and the number of available items K are publicly known. Agent i's private information $\mathbf{v}_i = (w_i, v_i)$ is drawn independently (across agents) from a joint cumulative distribution function $F_i$ with support $V = [\underline{w}, \overline{w}] \times [\underline{v}, \overline{v}]$. The corresponding density function is denoted by $f_i$. The seller's valuation vector is (0, 0).

By the Revelation Principle [14], we consider direct mechanisms that allocate items based on agents' reports. Reports from all agents are denoted by $\hat{\mathbf{v}} = (\hat{\mathbf{v}}_i, \hat{\mathbf{v}}_{-i}) \in V^2$, with the commonly (ab)used convention that subscript $-i$ denotes information corresponding to all agents except agent i (e.g., $\hat{\mathbf{v}}_{-i}$ is a shorthand notation for $\{\hat{\mathbf{v}}_j : j \ne i\}$). A direct mechanism specifies the allocation: $p_i : V^2 \to \{0, 1\}$ is agent i's probability to get an item, and payments: $m_i : V^2 \to \mathbb{R}$ is the payment from agent i to the seller, for each $\mathbf{v} \in V^2$. If agent i does not participate, she does not get any item.

Agent i's ex post utility when she reports her type as $\hat{\mathbf{v}}_i$, while her true type is $\mathbf{v}_i$, and when other agents report $\mathbf{v}_{-i}$, is

$$U_i(\hat{\mathbf{v}}_i, \mathbf{v}_i, \mathbf{v}_{-i}) = w_i\, p_i(\hat{\mathbf{v}}_i, \mathbf{v}_{-i}) \prod_{j \in S(i)} \bigl(1 - p_j(\hat{\mathbf{v}}_i, \mathbf{v}_{-i})\bigr) + v_i\, p_i(\hat{\mathbf{v}}_i, \mathbf{v}_{-i}) \Bigl(1 - \prod_{j \in S(i)} \bigl(1 - p_j(\hat{\mathbf{v}}_i, \mathbf{v}_{-i})\bigr)\Bigr) - m_i(\hat{\mathbf{v}}_i, \mathbf{v}_{-i}). \qquad (1)$$

Therefore, the LP relaxation of the seller's Revenue Maximization Problem (General-RMP) is

$$\max_{\{p_i, m_i\}_{i=1}^{N}} \; \sum_{i=1}^{N} \int m_i(\mathbf{v}_i, \mathbf{v}_{-i})\, dF(\mathbf{v})$$

subject to

(EPIC) $U_i(\mathbf{v}_i, \mathbf{v}_i, \mathbf{v}_{-i}) \ge U_i(\hat{\mathbf{v}}_i, \mathbf{v}_i, \mathbf{v}_{-i})$ for all i and all $\mathbf{v}_i, \hat{\mathbf{v}}_i, \mathbf{v}_{-i}$,
(EPIR) $U_i(\mathbf{v}_i, \mathbf{v}_i, \mathbf{v}_{-i}) \ge 0$ for all i and all $\mathbf{v}_i, \mathbf{v}_{-i}$,
(Feasibility) $\sum_{i=1}^{n} p_i(\mathbf{v}) \le K$ and $0 \le p_i(\mathbf{v}) \le 1$ for all i,

where (EPIC) is the ex-post incentive compatibility constraint to ensure truth-telling and (EPIR) is the ex-post individual rationality constraint to ensure participation. (In the rest of this paper, we simplify the notation by denoting $U_i(\mathbf{v}_i, \mathbf{v}_i, \mathbf{v}_{-i})$ by $U_i(\mathbf{v}_i, \mathbf{v}_{-i})$.)

Problem (General-RMP), as well as the corresponding social surplus maximization problem, is a multi-dimensional mechanism design problem which is extremely difficult to solve analytically. It is well known that solutions to multi-dimensional mechanism design problems are sensitive to various details of the environment, e.g., the seller's belief about the agents' types. Hence, there is little hope for finding closed-form solutions. This unappealing feature of multi-dimensional mechanism design problems has been demonstrated by [15], [1], and [13]. Furthermore, a numerical approach to solving this problem also has limited potential, given that even simplistic instances exhibit computational complexity obstacles: for example, even if $\mathbf{v}_i = (1, 0)$ for all i (i.e., agents only value exclusivity and this valuation is the same for all agents and is publicly known, so there is no private information at all in this setting), Problem (General-RMP) reduces to determining whether there exists a K-independent set on $(N, E)$.

To avoid this inherent analytical obstacle stemming from mechanism design, we focus on valuation structures that have a one-dimensional representation. The mechanism design problem can be solved for a large class of such valuations, including additive premium valuations $w_i = v_i + b_i$ or multiplicative premium valuations $w_i = b_i v_i$, where $b_i$ is publicly known. Here, we illustrate our findings with modular valuations which we name local linear exclusivity (LLE):

$$w_i = v_i + \sum_{j \in S(i)} \alpha_{ij} v_j \qquad (2)$$

with publicly known non-negative matrix $A = [\alpha_{ij}]$. Also of interest is a special case of LLE, where for every j,

$$\sum_{\{i : j \in S(i)\}} \alpha_{ij} \le 1. \qquad (3)$$

We say that such valuations satisfy bounded local linear exclusivity (BLLE).

Proposition 1. Suppose that valuations are BLLE and publicly known and that the seller's supply is unlimited, i.e., $K \ge n$. Then, allocating an item to every buyer i who pays $v_i$ is an optimal solution to the (FB-RMP) problem.

Hence, the problem is trivial with publicly known BLLE valuations. However, introducing private values drastically changes the complexity of the problem.

Proposition 2. Suppose that valuations are BLLE, and that the seller's supply is unlimited, i.e., $K \ge n$. Then allocating exclusively to some buyers could be optimal. Furthermore, finding a deterministic optimal solution to the (SB-RMP) problem is at least as hard as finding the maximum independent set in $(N, E)$, even if virtual valuations $\psi_i \ge 0$ for all i.

3. GENERALIZED EXTERNALITIES
We establish the following results within the exclusivity model, and hence they extend to a generalized negative externalities setting.

Theorem 1. The expected revenues from the optimal mechanism are not monotone with respect to any dimension of agents' valuations.

This result is in stark contrast with the standard mechanism design literature that assumes no externalities. In fact, even with positive externalities, the monotonicity of the expected revenues in the optimal mechanism is established in a straightforward manner.

A direct consequence of this result involves rivalry networks.

Corollary 1. The expected revenues of the optimal mechanism for a given rivalry network are not monotone with respect to arc addition/deletion.

Finally, consider the setting in which the seller could also design/impose a rivalry network and answer the question of the (non)-existence of an optimal (extremal) rivalry network.

conference on Electronic commerce, pages 361–370, San Jose, California, USA, 2011. ACM.
[10] C. Deng and S. Pekeč. Optimal allocation of exclusivity contracts. Technical report, Duke University, 2013.
[11] A. Ghosh and A. Sayedi. Expressive auctions for externalities in online advertising. In Proceedings of the 19th international conference on World Wide Web, WWW '10, pages 371–380, New York, NY, USA, 2010. ACM.
[12] N. Haghpanah, N. Immorlica, V. Mirrokni, and K. Munagala. Optimal auctions with positive network externalities. In Proceedings of the 12th ACM conference on Electronic commerce, EC '11, pages 11–20, New York, NY, USA, 2011. ACM.
[13] A. M. Manelli and D. R. Vincent. Multidimensional
In general, such an extremal network could depend mechanism design: Revenue maximization and the on the distributional assumptions on the bidder valuations. multiple-good monopoly. Journal of Economic Theory, However, we show that in some restricted settings such an 137(1):153–185, Nov. 2007. extremal network not only exists, but is also independent of [14] R. B. Myerson. Optimal auction design. Mathematics distributional assumptions. of Operations Research, 6(1):58–73, 1981. [15] J.-C. Rochet and P. Choné, P. Ironing, sweeping, and Theorem 2. Suppose there exists v > 0 such that multidimensional screening. Econometrica, 66(4):783–826, July 1998. vi = v > 0 > −αi for all agents i and that all values are publicly known. Sup- pose that the seller has k = 1 item to allocate. Then there ex- ists extremal networks that maximize seller’s revenue across all possible rivalry networks. 4. REFERENCES [1] M. Armstrong. Multiproduct nonlinear pricing. Econometrica, 64(1):51–75, Jan. 1996. [2] A. Belloni, C. Deng, and S. Pekeč. Mechanism and network design with private negative externalities. Technical report, Duke University, 2015. [3] S. Bhattacharya, J. Kulkarni, K. Munagala, and X. Xu. On allocations with negative externalities. In WINE, pages 25–36, 2011. [4] Z. Cao, X. Chen, X. Hu, and C. Wang. Pricing in social networks with negative externalities. In International Conference on Computational Social Networks, pages 14–25. Springer, 2015. [5] J. Chen and S. Micali. Auction revenue in the general spiteful-utility model. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 201–211. ACM, 2016. [6] P.-A. Chen and D. Kempe. Bayesian auctions with friends and foes. In M. Mavronicolas and V. Papadopoulou, editors, Lecture Notes in Computer Science, volume 5814, pages 335–346. Springer Berlin / Heidelberg, 2009. [7] V. Conitzer and T. Sandholm. Computing optimal outcomes under an expressive representation of settings with externalities. 
Journal of Computer and System Sciences, 78(1):2 – 14, 2012. [8] F. Constantin, M. Rao, C.-C. Huang, and D. C. Parkes. On expressing value externalities in position auctions. In 6th Workshop on Ad Auctions (at EC-10), 2010. [9] C. Deng and S. Pekeč. Money for nothing: exploiting negative externalities. In Proceedings of the 12th ACM 56 The vertex sign balance of (hyper)graphs ∗ Dezs˝ o Miklós Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences Budapest P.O.B. 127 H-1364 Hungary e-mail: dezso@renyi.hu Joint work with J. Ahmann, E. Collins-Wildman, J. Wallace, S. Yang and Yicong Guo Budapest Semesters in Mathematics and Gy. Y. Katona Budapest University of Technology and Economics Department of Computer Science and Information Theory Budapest P.O.B. 91 H-1521 Hungary e-mail: kiskat@cs.bme.hu August 29, 2016 We define the vertex sign balance of a (hyper)graph G as the minimum num- ∑ ber of non-negative edges over all ω : V ( G) → R satisfying ω( x) ≥ 0 x∈V ( G) (i.e., the minimum number of edges with non-negative sum of weights of ver- tices in it). Clearly, the vertex sign balance of a (hyper)graph is always less than or equal to the minimum degree, as it is shown by assigning a large pos- itive number to a minimum degree vertex and close to 0 negative numbers to all other vertices. We will denote the vertex sign balance of a (hyper)graph by µ( G), as it will be shown closely related to the matching number of the (hyper)graph, ν( G). Huang and Sudakov [1], and Pokrovsky[3] (probably inde- pendently) introduced the so-called Manickam-Miklós-Singhi (MMS-)property of a hypergraph: a hypergraph H has the MMS-property if µ( H) = δ( H), the minimum degree of the hypergraph. (The definition is based on the 30 years old conjecture of Manickam, Miklós[2], and Singhi, which says, in this language, that the complete r-uniform hypergraph on n vertices has the MMS-property if n ≥ 4 r.) 
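The upper bound $\mu(G) \le \delta(G)$ mentioned above can be illustrated with a minimal sketch. The function names and the integer weights below are assumptions made for this example, not part of the paper: instead of "a large positive number" and "negative numbers close to 0" the sketch uses $n - 1$ and $-1$, which is the same construction up to scaling by a positive constant and so gives the same edge signs.

```python
def count_nonnegative_edges(edges, weight):
    """Number of edges whose vertex weights sum to a non-negative value."""
    return sum(1 for edge in edges if sum(weight[v] for v in edge) >= 0)


def min_degree_weighting(n, edges):
    """Weighting that puts all positive weight on a minimum degree vertex.

    Assumes vertices are labelled 0..n-1 and each edge is a tuple of distinct
    vertices (pairs for graphs, r-tuples for r-uniform hypergraphs).
    Returns the weight list and the minimum degree.
    """
    degree = [0] * n
    for edge in edges:
        for v in edge:
            degree[v] += 1
    v_min = min(range(n), key=lambda v: degree[v])
    # Every other vertex gets -1 and v_min gets n - 1, so the total is exactly 0.
    weight = [-1] * n
    weight[v_min] = n - 1
    return weight, degree[v_min]


# Example: in the 5-cycle exactly the edges at the chosen vertex are non-negative.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
weight, delta = min_degree_weighting(5, cycle5)
assert sum(weight) >= 0
assert count_nonnegative_edges(cycle5, weight) == delta
```

Any single weighting only certifies an upper bound on $\mu(G)$; the sketch does not compute $\mu(G)$ itself.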
Both of the papers (of Huang and Sudakov and of Pokrovskiy) used the notion of the MMS-property to prove a better bound on $n$ than was known earlier to ensure the MMS-property of this complete $r$-uniform hypergraph on $n$ vertices. Huang and Sudakov showed that every $r$-uniform $n$-vertex hypergraph with equal codegrees and $n > 10r^3$ has the MMS-property, while Pokrovskiy proved that if a (any) $d$-regular $r$-uniform $n$-vertex hypergraph has the MMS-property, then the complete $r$-uniform hypergraph on $n$ vertices has the MMS-property as well.

∗ This research was partially supported by National Research, Development and Innovation Office – NKFIH Fund No. SNN-117879.

Here we will explore this newly introduced graph parameter, the vertex sign balance of graphs and, to some extent, hypergraphs.

Definition. The vertex sign balance of a (hyper)graph $G$ (or $H$), denoted by $\mu(G)$ ($\mu(H)$, resp.), is defined to be the minimum number of non-negative edges over all weightings of vertices $\omega : V(G) \to \mathbb{R}$ satisfying $\sum_{x \in V(G)} \omega(x) \ge 0$, i.e., the minimum number of edges with non-negative sum of weights of vertices in it. For a given weight assignment to the vertices, an edge will be called positive (non-negative, negative) if the sum of the weights of the vertices in the edge is positive (non-negative, negative, resp.).

Remark. The vertex sign balance of a (hyper)graph is always less than or equal to the minimum degree.

Theorem 1. For any graph $G$ the following statements are equivalent:
1. $\mu(G) \ge 1$.
2. The fractional matching number $\nu^*(G) = n/2$.
3. There exists no independent subset of vertices $S \subset V(G)$ such that $|N(S)| = |S| - 1$.
4. $G$ has a perfect 2-matching, that is, there exists a collection of edges (multiple choice of an edge is allowed) which covers every vertex exactly twice.

We mention here that a perfect 2-matching can always be pictured as the union of disjoint odd cycles (whose edges are counted once) and a matching, i.e., a collection of disjoint edges (which are counted twice).
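The equivalence of statements 3 and 4 of Theorem 1 can be checked by brute force on small graphs. The following is a minimal sketch under stated assumptions: the function names are hypothetical, vertices are labelled 0..n-1, and both searches are exponential, so this is only an illustration and not an algorithm from the paper.

```python
from itertools import combinations, product


def has_deficient_independent_set(n, edges):
    """True if some independent set S satisfies |N(S)| = |S| - 1 (negation of statement 3)."""
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            if any(u in chosen and v in chosen for u, v in edges):
                continue  # not independent
            neighbourhood = set()
            for u, v in edges:
                if u in chosen:
                    neighbourhood.add(v)
                if v in chosen:
                    neighbourhood.add(u)
            if len(neighbourhood) == size - 1:
                return True
    return False


def has_perfect_2_matching(n, edges):
    """True if edge multiplicities in {0, 1, 2} can cover every vertex exactly twice (statement 4)."""
    for multiplicities in product((0, 1, 2), repeat=len(edges)):
        cover = [0] * n
        for (u, v), m in zip(edges, multiplicities):
            cover[u] += m
            cover[v] += m
        if all(c == 2 for c in cover):
            return True
    return False


triangle = [(0, 1), (1, 2), (0, 2)]  # an odd cycle, each edge counted once
path3 = [(0, 1), (1, 2)]             # the two end vertices share one neighbour
for n, graph in ((3, triangle), (3, path3)):
    assert has_perfect_2_matching(n, graph) == (not has_deficient_independent_set(n, graph))
```

On the triangle both functions agree that a perfect 2-matching exists; on the path with three vertices the independent set of the two end vertices has a single neighbour, and no perfect 2-matching exists.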
A similar theorem can be stated for hypergraphs as well: Theorem 2. For r-uniform hypergraph H = ( V, E) with n vertices, µ( H) = 0 if and only if the fractional matching number of H, ν∗( G) , is less than n . r The existence and structure of a perfect 2-matching of a graph can be used not only for determining whether µ( G) > 0 but also to find the value of µ( G), as the following theorem gives: Theorem 3. 1. µ( G) = the min # of edges one can remove from G to get G∗ such that there exists S∗ ⊂ V ( G) and |S∗| > |N ( S∗) |. 2. µ( G) = the min # of edges one can remove from G so that the remaining graph does not have a perfect 2-matching. This characterization of µ( G), together with some recent result on the com- putational complexity of the existence of a bounded size set of edges in a bipar- tite graph covering all maximal size matchings by Zenklusen, Ries, Picouleau, Werra, and Bentz[4] we managed to show that finding the vertex sign balance of a graph is NP-complete: 2 58 Theorem 4. The following two problems about the vertex sign balance of a (hyper) graph are N P − complete: V SB( G, k) Instance: An undirected (hyper) graph G = ( V, E) and a positive integer 0 ≤ k ≤ |E|. Question: Is ε( G) ≤ k? V SB( G) Instance: An undirected (hyper) graph G = ( V, E) Question: Is ε( G) < δ( G) , that is, is it true that G does not have the MMS-property? Further, we succeeded in characterizing some classes of graphs which have MMS property and gave lower bounds on µ( G) in terms of the minimum degree (which is constant in case of regular graphs) of the graphs: Theorem 5. Suppose G is a graph with n many vertices. If δ( G) ≥ n 2 − 1 , then 2 n− 6 G has MMS-property. The following theorem will give the exact bounds on δ( G), the minimum degreeof a graph to ensure the MMS-property. Theorem 6. (Sharp bounds for minimum degree) For any graph G with n vertices where n ≥ 6 , if n is odd and the minimum degree δ( G) ≥ n+5 , or n is 2 even and δ( G) ≥ n+2 , then G has MMS-property. 
This bound is sharp. 2 In case the minimal degree is smaller than the required lower bound, it still implies high vertex sign balance value (though not as high as δ). Theorem 7. For any graph G with n many vertices, if δ( G) ≥ n 2+8 t− 1 , then 2 n+2 µ( G) ≥ t for all t ≤ δ( G) . The following theorems will give lower bound on the vertex sign balance for regular graphs: Theorem 8. For any k-regular graph G, µ( G) ≥ k 2 Theorem 9. For a k-regular graph, G, with n vertices, µ( G) = k if and only if 2 G has an independent subset S such that |S| = n− 1 (or G is disconnected and 2 has some component with this property). Corollary 1. For a k-regular graph G with n vertices, the lower bound µ( G) = k 2 can only be achieved if G has an odd number of vertices and k ≤ n+1 . (These 2 are necessary but not sufficient conditions) Corollary 2. For a connected k-regular graph G on an even number of vertices, µ( G) ≥ ⌊ k + 1 ⌋ is a sharp lower bound. 2 3 59 References [1] Huang, H., and Sudakov, B., The Minimum Number of Nonnegative Edges in Hypergraphs, The Electronic Journal of Combinatorics, 21(3), 2014, pp. 3-7. [2] Manickam, N., and Miklós, D., On the number of non-negative partial sums of a non-negative sum, in Colloq. Math. Soc. Janos Bolyai, Vol. 52, pp. 385-392. [3] Pokrovskiy, A., A linear bound on the Manickam–Miklós–Singhi conjec- ture, Journal of Combinatorial Theory, Series, 133 2015, pp. 280-306. [4] R. Zenklusen, B. Ries, C. Picouleau, D. de Werra, M. -C. Costa, and C. Bentz, Blockers and transversals, Discrete Mathematics, 309 2009, pp. 4306-4314. 4 60 Packing tree degree sequences [Extended Abstract] Kristóf Bérczi Zoltán Király Department of Computer Department of Computer Science Eötvös Loránd Science Eötvös Loránd University University Pázmány Péter sétány 1/c Pázmány Péter sétány 1/c 1117 Budapest, Hungary 1117 Budapest, Hungary berkri@cs.elte.hu kiraly@cs.elte.hu ∗ Changshuo Liu István Miklós Budapest Semesters in Rényi Institute Mathematics Reáltanoda u. 
13-15 Bethlen Gábor tér 2 1053 Budapest, Hungary 1071 Budapest, Hungary miklos.istvan@renyi.mta.hu cl20@princeton.edu ABSTRACT sequences are degree sequences of trees and they do not share A degree sequence D = d1, d2, . . . dn is a series on non- common leaves. negative integers. A degree sequence is graphical if there is a vertex labeled graph G in which the degrees of the 1. INTRODUCTION vertices are exactly D. Such graph G is called a realiza- Packing degree sequences is related to discrete tomography. tion of D. The color degree matrix problem also known as The central problem of tomography is to reconstruct spatial edge disjoint realization, edge packing or graph factorization objects from lower dimensional projections. The discrete 2D problem is the following: given a c × n degree matrix D = version is to reconstruct a coloured grid from vertical and {{d1,1, d1,2, . . . d1,n}, {d2,1, d2,2, d2,n}, . . . {dc,1, dc,2, dc,}}, in horisontal projections. In the simplest version, this prob- which each row of the matrix is a degree sequence, decide lem is to reconstruct the colouring of an n × m grid with if there is an ensemble of edge disjoint realizations of the the requirement that each row and colomn has a specific degree sequences. Such set of edge disjoint graphs is called number of entries for each colour. Such coloured matrix a realization of the degree matrix. A realization can also can be considered as a factorization of the complete bipar- be presented as an edge colored simple graph, in which the tite graph K edges with a given color form a realization of the degree n,m. Indeed, for each colour ci, the 0-1 matrix obtained by replacing c sequence in a given row of the color degree matrix. i to 1 and all other colours to 0 is an adjacency matrix of a simple bipartite graph such that the disjoint union of these simple graphs is K It is known that the color degree sequence problem is NP- n,m. 
The pre- scribed number of entries for each colour are the degrees of complete even if the number of colors is 3. Here we consider the simple bipartite graphs. Therefore, an equivalent prob- a special case when two of the degree sequences are degree lem is to give a factorization of the complete bipartite graph sequences of trees. We show that this special case is easy. given prescribed degree sequences. We also show that the problem is still NP-complete if only one of the degree sequences is a degree sequence of a tree. It is also possible to consider the simple (not bipartite) ver- sion of the graph factorization problem. Obviously, the sum We also consider counting the number of solutions. We show of the degrees for each vertex must be n − 1 when the com- that efficient approximations for the number of solutions ex- plete graph K ists as well as an almost uniform sampler exists if two degree n is factorized. Therefore, if there are k degree sequences, the last degree sequence is unequivocally deter- ∗Secondary affiliation: SZTAKI, 1111 Budapest, Lá- mined by the first k − 1 degree sequences. gymányosi u. 11, Hungary When k = 2, the problem is reduced to the degree sequence problem, and can be solved in polynomial time [2, 3]. When k = 3, the problem already becomes NP-complete [1]. How- ever, special cases are polynomial solvable even when k = 3. Such a special case is when one of the degree sequences is almost regular, that is, any two degrees differ at most by 1 [5]. In this paper we consider the case when k = 3 and two of the degree sequences are tree degree sequences. We show that 61 this special case is polynomial solvable. Some results on the The total variational distance dT V (p, π) between two dis- solution space is also presented. We also provide a negative crete distributions p and π over the set X is defined as result: when only one of the degree sequences is tree degree 1 X sequence, the problem is still NP-complete. 
dT V (p, π) := |p(x) − π(x)| (2) 2 x∈X 2. PRELIMINARIES In this section we give the definitions needed to state the Definition 7. A counting problem in #P is in FPAUS theorems. The central problem in this paper is the colour if there exists a randomized algorithm (a Fully Polynomial degree sequence problem. Almost Uniform Sampler that is also abbreviated as FPAUS) such that for any instance x, and > 0, it generates a ran- dom element of the solution space following a distribution p Definition 1. A degree sequence D = d1, d2, . . . dn is a series on non-negative integers. A degree sequence is graph- satisfying ical if there is a vertex labeled simple graph G in which dT V (p, U ) ≤ (3) the degrees of the vertices are exactly D. Such graph G is called a realization of D. The colour degree matrix prob- where U is the uniform distribution over the solution space, lem is the following: given a c × n degree matrix D = and the algorithm has a time complexity bounded by a poly- {{d nomial of |x|, and − log(). 1,1, d1,2, . . . d1,n}, {d2,1, d2,2, d2,n}, . . . {dc,1, dc,2, dc,}}, in which each row of the matrix is a degree sequence, decide if there is an ensemble of edge disjoint realizations of the de- 3. PACKING TWO TREES gree sequences. Such set of edge disjoint graphs is called a Our main result is about packing two tree sequences with realization of the degree matrix. no common leaves. Although it is well known, we also define the tree degree Theorem 1. Let D = d sequences and caterpillars. 1, d2, . . . dn and F = f1, f2, . . . fn be two tree degree sequences, such that mini{di + fi} ≤ 3. Then D and F have edge disjoint caterpillar realizations. Definition 2. A degree sequence D = d1, d2, . . . dn is called a tree sequence if Pn d i=1 i = 2n − 2 and each degree is positive. The theorem implicitely states that if two degree sequences do not share common leaves then their sum is graphical. The proof of this theorem is skipped here. If the two trees Definition 3. 
A tree is a caterpillar if the non-leaf ver- have common leaves, their sum is not necessarily graphical. tices form a path in it. However, when their sum is graphical, they do have edge disjoint realizations, as Kundu already proved it. In this paper, we are using complexity classes which might be unfamiliar for non-expert readers, therefore we give the Theorem 2. [6] Let D = d1, d2, . . . dn and F = f1, f2, . . . fn definition of them here. be two tree degree sequences. Then there exist edge dis- joint tree realizations of D and F iff D + F (= d1 + f1, d2 + f Definition 4. A decision problem is in NP if a non- 2, . . . dn + fn) is graphical. deterministic Turing Machine can solve it in polynomial time. An equivalent definition is that a witness proving the “yes” answer to the question can be verified in polynomial time. We also give another theorem that also provide edge disjoint A counting problem is in #P if it asks for the number of realizations. witnesses of a problem in NP. Theorem 3. Let D = d1, d2, . . . dn and F = f1, f2, . . . fn Definition 5. A counting problem in #P is in FP if be two tree degree sequences, such that mini{di + fi} ≤ 3. there is a polynomial running time algorithm which gives Let T1 and T2 be random realizations of D and F uniformly the solution. It is #P − complete if any problem in #P can distributed. Then the expected number of common edges of be reduced to it by a polynomial-time counting reduction. T1 and T2 is strictly less that 1 if there exists a vertex which is not a leaf in both trees and at most 1 if each vertex is a leaf in exactly one of the trees. Definition 6. 
A counting problem in #P is in FPRAS ( Fully Polynomial Randomized Approximation Scheme) if there exists a randomized algorithm such that for any in- If there is a vertex which is not a leaf in both trees then stance x, and , δ > 0, it generates an approximation ˆ f for there must exist edge disjoint realizations T1 and T2, oth- the solution f , satisfying erwise the number of common edges cannot be less than 1 in expectation. If each vertex is a leaf in exactly one of the f P ≤ ˆ f ≤ f (1 + ) ≥ 1 − δ (1) trees then there must be vertices v1 and v2 which have de- 1 + gree 1 in D and v3 and v4 which have degree 1 in F (recall and the algorithm has a time complexity bounded by a poly- that any tree contains at least 2 leaves). Then there exist a nomial of |x|, 1/ and − log(δ). pair of trees T1 and T2 such that both trees contain edges 62 (v1, v3) and (v2, v4). Indeed, the degree 1 vertices can be the fraction of edge disjoint realizations is also polynomially connected to any of the non-leaf vertices. This means that bounded. This means that FRPAS and FPAUS algorithms there are trees having at least 2 common edges, which is can be designed in exactly the same way than above. above the average. However, then there must be a pair of trees with less than average number of common edges. That It remains an open question whether or not similar theo- is, they are edge disjoint realizations. rems exist for the case when the tree degree sequences have common leaves. Also it is open if exact counting of the edge Now we turn to the case when there are common leaves. The disjoint solutions is possible in polynomial time, although following lemma helps here. the natural conjecture is that this counting problem is #P- complete. Lemma 4. Let D = d1, d2, . . . dn and F = f1, f2, . . . fn be 5. AN NP-COMPLETE THEOREM two tree degree sequences, such that d1 + f1 ≥ d2 + f2 ≥ What can we say when only one of the degree sequences is . . . ≥ dn + fn) is graphical. 
Define a tree degree sequence and the other is arbitrary? Unfortu- i := min{j|j > 1∧[(d nately, we have a negative result here. 1 > 1∧fj > 1)∨(f1 > 1∧dj > 1)]} (4) Furthermore, assume that dn + fn = 2. Then the degree sequence is also graphical which is obtained from D + F by Theorem 6. It is NP-complete to decide if there is an removing 1 from d1 + f1 and di + fi and deleting dn + fn. edge disjoint realization of a tree degree sequence and an arbitrary degree sequence. (It is not required that the tree degree sequence have a tree realization). This lemma says that we can construct an edge disjoint re- alization of D and F by iteratively removing the common leaves and modifying the remaining degree sequences, and Proof. We use the theorem by [1] that it is NP-complete the lemma guarantees that the remaining degree sequence to decide if two bipartie degree sequences has an edge dis- will be graphical. Once there is no common leaf, then we joint realizations. We have the following observations. can apply Theorem 1.The so obtained caterpillar realizations can be extended to edge disjoint realizations of the original • A bipartite degree sequence pair degree sequences by adding back the common leaves. D = (d1,1, d1,2, . . . d1,n ), (d ) 1 2,1, d2,2, . . . d2,n2 4. COUNTING AND SAMPLING REALIZA- and TIONS F = (f1,1, f1,2, . . . f1,n ), (f ) 1 2,1, f2,2, . . . f2,n2 Since typically there are more than one realizations when a realization exists, and typically the number of realizations has an edge disjoint realization iff the simple degree might grow exponentially, is is also a computational chal- sequence pair lenge to estimate their number and/or sample almost uni- D0 = (d1,1 + n1 − 1, . . . d1,n + n1 − 1, d2,1, . . . d2,n ) formly a solution. Here we have the following theorem. 1 2 and F 0 = (f , f + n Theorem 5. Let D = d 1,1, . . . f1,n 2,1 + n2 − 1, . . . f2,n 2 − 1) 1, d2, . . . dn and F = f1, f2, . . . 
fn 1 2 be two tree degree sequences, such that mini{di + fi} ≤ 3. has an edge disjoint realization. Indeed, if an edge Then there is an FPRAS for estimating the number of dis- disjoint bipartite realization of D and F is given, then joint realizations and there is an FPAUS to almost uniformly the complete graph on the first vertex class can be sample realizations. added to the first realization and the complete graph on the second vertex class can be added to the second realization to get a (now non-bipartite) realization of This theorem is based on Theorem 3. If there is a vertex D0 and F 0. On the other hand, it is easy to see that any which is not a leaf then it is easy to show that the expected realization of D0 contains Kn on the first n1 vertices, number of common edges is polynomially separated from 1, 1 and any realization of F 0 contains Kn on the last n2 that is, the inverse of 1 minus the expectation is polynomi- 2 vertices. Given an edge disjoint realization of D0 and ally bounded. It means that in a polynomial number of trials F 0, deleting Kn from D0 and Kn from F 0 yields an of random couple of trees, at least one edge disjoint realiza- 1 2 edge disjoint realization of D and F . tion is expected. It follows from the central limit theorem that an FPRAS algorithm can be designed based on this • The degree sequence pair D = d1, d2, . . . dn and F = property. It is also well known that an FPAUS algorithm f1, f2, . . . fn has an edge disjoint realization iff the de- can be desiged in this case, see [4] for techical details. gree sequence pair D0 = d1 + 1, d2 + 1, . . . dn + 1, n and F 0 = f1, f2, . . . fn, 0 has an edge disjoint realization. When each vertex is a leaf in exactly one of the trees than Indeed, let G1 and G2 be an edge disjoint realization it is easy to show that a non-negligible fraction of the ran- of D and F . Then add a vertex vn+1 to G1, and con- dom pair of trees contains at least two common edges. 
In nect it to all the other vertices to get a realization of fact, the inverse of the fraction of the couple of trees T1 and D0. Add an izolated vertex vn+1 to G2 to get a real- T2 that have the above mentioned common edges (v1, v3) ization of F 0. These realizations of D0 and F 0 are edge and (v2, v4) is polynomially bounded. Then the inverse of disjoint. On the other hand, in any realization of D0, 63 vn+1 is connected to all the other vertices. If edge dis- 7. REFERENCES joint realizations of D0 and F 0 are given, delete vn+1 [1] Dürr, C., Guinez, F., Matamala, M.: Reconstructing from both realizations to get edge disjoint realizations 3-colored grids from horizontal and vertical projections of D and F . is NP-hard. European Symposium on Algorithms, 776–787 (2009) • The degree sequence pair D = d1, d2, . . . dn and F = [2] S.L. Hakimi: On the degrees of the vertices of a f1, f2, . . . fn has an edge disjoint realization iff the de- directed graph. J. Franklin Institute, 279(4):290–308. gree sequence pair D0 = d1, d2, . . . dn, 1, 1 and F 0 = (1965) f1 + 1, f2 + 1, . . . fn + 1, n, 0 has an edge disjoint real- [3] V. Havel: A remark on the existence of finite graphs. ization. Indeed, any edge disjoint realization G1 and (Czech), Časopis Pěst. Mat. 80:477–480. (1955) G2 of D and F can be extended to an edge disjoint realization of D0 and F 0 by adding two vertices v [4] Jerrum, M.R., Valiant, L.G., Vazirani, V.V.: Random n+1 and v generation of combinatorial structures from a uniform n+2, and then connecting vn+1 to all v1, . . . vn in G distribution. Theoretical Computer Science, 2 and connecting vn+1 and vn+2 in G1. On the other hand, in any edge disjoint realizations G0 43(2-3):169–188 (1986) 1 and G0 [5] Kundu, S.: The k-factor conjecture is true. Discrete 2 of D0 and F 0, vn+1 is connected to all v1, . . . vn in G0 Mathematics, 6(4):367–376. (1973) 2, therefore, vn+1 must be connected to vn+2 in G0 [6] Kundu, S.: Disjoint Representation of Tree Realizable 1. 
Therefore deleting vn+1 and vn+2 yields an edge disjoint realization of D and F . Sequences. SIAM Journal on Applied Mathematics, 26(1):103–107. (1974) We can use the first observation to prove that it is also NP- complete to decide that two simple degree sequences have edge disjoint realizations. The second observation provides that it is NP-complete to decide if two degree sequences have edge disjoint realizations such that one of the degree sequences does not have 0 degrees. Finally, we can use the third observation to iteratively transform any D degree se- quence (that already does not have a 0 degree) to a tree degree sequence. Indeed, in each step, we add two vertices to D and extend the sum of the degrees only by 2. There- fore in a polynomial number of steps, we get a degree se- quence D0 in which the sum of the degrees is exatly twice the number of vertices minus 2. Therefore it follows that given any bipartite degree sequences D and F , we can con- struct in polynomial time two simple degree sequences D0 and F 0 such that D and F have edge disjoint realizations iff D0 and F 0 have edge disjoint realizations, furthermore, D0 is a tree degree sequence. 6. DISCUSSION AND CONSLUSIONS In this paper, we considered packing tree degree sequences. Our main theorem is that two tree degree sequences have edge disjoint tree realizations iff their sum is graphical. This is similar to the Kundu’s theorem [5] stating that a degree sequence and an almost regular degree sequence have an edge disjoint realization iff their sum is graphical. This raises the natural question if a degree sequence and a tree sequence have edge disjoint realizations iff their sum is graphical. We showed that the answer is no to this question, and actually, it is NP-complete to decide if an arbitrary degree sequence and a tree degree sequence have edge disjoint realizations. We also considered to approximately count and sample edge disjoint tree realizations with prescribed degrees. 
We showed that it is possible if there are no common leaves. It remains an open question when the two degree sequences have common leaves.

Benchmark problems for exhaustive exact maximum clique search algorithms

Sandor Szabo
Institute of Mathematics and Informatics, University of Pecs
sszabo7@hotmail.com

Bogdan Zavalnij
Institute of Mathematics and Informatics, University of Pecs
bogdan@ttk.pte.hu

ABSTRACT
There are well established, widely used benchmark tests to assess the performance of practical exact clique search algorithms. In this paper a family of further benchmark problems is proposed, mainly to test exhaustive clique search procedures.

Keywords
clique, maximum clique, random graph

Let G = (V, E) be a finite simple graph. Here V is the set of vertices of G and E is the set of edges of G. Let C be a subset of V. If two distinct nodes in C are always adjacent in G, then C is called a clique in G. When C has k elements, then we talk about a k-clique. A k-clique is a maximum clique in G if G does not contain any (k + 1)-clique. We call this well defined number the clique number of G and we denote it by ω(G).

A number of problems are referred to as clique search problems.

Problem 1. Given a finite simple graph G and a positive integer k. Decide if G has a k-clique.

Problem 2. Given a finite simple graph G. Determine ω(G).

The complexity theory of algorithms tells us that Problem 1 is in the NP-complete complexity class. (See for instance [2].) Consequently, Problem 2 must be NP-hard. Loosely speaking, this can be interpreted as saying that the maximum clique problem is computationally demanding.

As at this moment there are no readily available mathematical tools to evaluate the performance of practical clique search algorithms, the standard procedure is to carry out numerical experiments on a battery of well selected benchmark tests.

The most widely used test instances are the Erdős–Rényi random graphs, graphs from the second DIMACS challenge¹, combinatorial problems of monotonic matrices [5], and hard coding problems of Deletion-Correcting Codes².

Evaluating the performances of various clique search algorithms is a delicate matter. On one hand one would like to reach some practically relevant conclusion about the competing algorithms. On the other hand this conclusion is based on a finite list of instances.

One has to be ever cautious not to draw overly sweeping conclusions from experiments of this inherently limited nature. (We intended to contrast this approach with the asymptotic techniques, which are intimately tied to infinity.) The situation is of course not completely pessimistic. After all, these benchmarks were successful at shedding light on the practicality of many of the latest clique search procedures. However, we should strive to enhance the test procedures. The main purpose of this paper is to propose new benchmark instances.

There are occasions when we are trying to locate a large clique in a given graph such that the clique is not necessarily optimal. This approach is referred to as a non-exact method, to contrast it with the exhaustive search. For instance, constructing a large time table in this way can be practically important and useful even without a certificate of optimality.

The benchmark tests are of course relevant in connection with non-exact procedures too. In order to avoid any unnecessary confusion we would like to emphasize that in this paper we are focusing solely on the exact clique search methods.

Let n be a positive integer and let p be a real number such that 0 ≤ p ≤ 1. An Erdős–Rényi random graph with parameters n, p is a graph G with vertices 1, 2, ..., n. The probability that the unordered pair {x, y} is an edge of G is equal to p for each x, y, 1 ≤ x < y ≤ n. The events that the distinct pairs {x_{i(1)}, y_{i(1)}}, ..., {x_{i(s)}, y_{i(s)}} are edges of G are independent of each other for each subset {i(1), ..., i(s)} of {1, 2, ..., n}, where s ≥ 2.
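As a minimal sketch of the definition above, the following Python function samples an Erdős–Rényi random graph by deciding each unordered pair {x, y} independently with probability p. The function name erdos_renyi_edges, the optional seed parameter and the set-of-pairs representation are assumptions made for illustration; they are not taken from the authors' generator.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, seed=None):
    """Sample the edge set of an Erdos-Renyi random graph on vertices 1..n.

    Each unordered pair {x, y} with 1 <= x < y <= n becomes an edge
    independently with probability p. The name, the optional seed and the
    set-of-pairs representation are illustrative assumptions.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must satisfy 0 <= p <= 1")
    rng = random.Random(seed)  # private generator, so a fixed seed is reproducible
    # rng.random() lies in [0, 1): p = 0 keeps no pair, p = 1 keeps every pair.
    return {
        (x, y)
        for x, y in combinations(range(1, n + 1), 2)
        if rng.random() < p
    }
```

With p = 0 the function returns the empty edge set and with p = 1 it returns all n(n − 1)/2 pairs, so both extreme parameter values behave as the definition requires.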
¹ ftp://dimacs.rutgers.edu/pub/challenge/
² http://neilsloane.com/doc/graphs.html

In a more formal way the Erdős–Rényi random graph of parameters n, p is a random variable whose values are all the simple graphs with n vertices. The probability distribution over these graphs is specified in the manner we have described above. In this paper we can work safely at a more intuitive level. We start with a complete graph on n vertices and we decide the fate of each edge by flipping a biased coin.

In the case p = 0 we end up with a graph consisting of n isolated nodes. In the case p = 1 we end up with a complete graph on n nodes. (Paper [1] is the basic reference on Erdős–Rényi random graphs.)

Let l, n be positive integers. Let H_i = (V_i, E_i) be a graph consisting of l isolated nodes. This means that |V_i| = l and E_i = ∅ for each i, 1 ≤ i ≤ n. Let V_i = {v_{i,1}, ..., v_{i,l}}. We construct a new graph G = (V, E). We set V = V_1 ∪ ··· ∪ V_n. The nodes v_{i,r}, v_{j,s} are connected by an edge in G whenever i ≠ j. We may say that the graph G is isomorphic to the lexicographic product of the graphs H and K, where H consists of l isolated nodes and K is the complete graph on n nodes. (For further details of graph products see [3].)

Clearly, V_i is an independent set in G for each i, 1 ≤ i ≤ n. The subgraph induced by V_i ∪ V_j in G is a complete bipartite graph for each i, j, 1 ≤ i < j ≤ n. Obviously, χ(G) = n and ω(G) = n hold. In fact G contains l^n distinct n-cliques.

At this stage we choose a real number p such that 0 ≤ p ≤ 1. At each edge of G we flip a biased coin. The edge stays with probability p. We call this step randomizing G. The resulting random graph G′ belongs to the parameters l, n, p. The particular case l = 1 corresponds to the Erdős–Rényi random graph of parameters n, p.

It is clear that χ(G′) ≤ n and ω(G′) ≤ n. In order to guarantee that ω(G′) = n holds we will plant an n-clique into G′. One can achieve this by picking x_i ∈ V_i for each i, 1 ≤ i ≤ n, and connecting each pair of distinct nodes among x_1, ..., x_n by an edge in G′.

Benchmark tests based on these random graphs are collected in the BHOSLIB library.³ (The acronym BHOSLIB stands for Benchmarks with Hidden Optimum Solutions Library.)

³ http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm

After all these preparations we are ready to describe the graphs we would like to propose for testing clique search algorithms. Let k, m be positive integers. Let M_i(k) = (V_i, E_i) be the Mycielski graph of parameter k. (For the definition of Mycielski graphs see [4].) Let V_i = {v_{i,1}, ..., v_{i,n}} for each i, 1 ≤ i ≤ m. We construct a new graph G = (V, E). We set V = V_1 ∪ ··· ∪ V_m. Let v_{i,r}, v_{i,s} ∈ V_i. If the unordered pair {v_{i,r}, v_{i,s}} is an edge of M_i(k), then we add this pair as an edge to G. These edges will be the blue edges of G. In other words the subgraph induced by V_i in G is isomorphic to M_i(k) for each i, 1 ≤ i ≤ m.

Pick v_{i,r} ∈ V_i, v_{j,s} ∈ V_j. We connect the nodes v_{i,r}, v_{j,s} by an edge in G whenever i ≠ j. These edges will be the red edges of G.

Note that the graph G is isomorphic to the lexicographic product of the graphs M(k) and K, where M(k) is the Mycielski graph of parameter k and K is the complete graph on m nodes. One can verify that χ(G) = km and ω(G) = 2m.

We choose a real number p such that 0 ≤ p ≤ 1. We randomize the red edges of G. We flip a biased coin and keep each red edge with probability p. The resulting random graph is denoted by G′. It is obvious that χ(G′) ≤ km and ω(G′) ≤ 2m. By planting a (2m)-clique into G′ we can guarantee that ω(G′) = 2m. We pick x_i, y_i ∈ V_i such that the unordered pair {x_i, y_i} is an edge in G′ for each i, 1 ≤ i ≤ m. Finally, we construct a (2m)-clique whose nodes are x_1, y_1, ..., x_m, y_m.

Figure 1: The adjacency matrices of the Mycielski graph M(4) and the random graph G′.

Note that other graphs can be used instead of the Mycielski graphs, presumably the kind of graphs where the clique number is far from the chromatic number. Using this method we constructed several test problems. The proposed new collection of test graphs can be found on the site clique.ttk.pte.hu/evil. The source code of the program that generates the adjacency matrices of these graphs is also available on this site.

We carried out a large scale numerical experiment to check the proposed EVIL benchmark problems. We used 55 test graphs. We took 35 BHOSLIB graphs and 20 EVIL graphs. The experiment involved 7 programs implementing 12 different algorithms and so we are able to compare the running times of 660 clique searches. We shall present the results in the extended version of our paper. One particular result was that there is a test graph with 220 nodes (20 copies of the M(4) graph, p = 98% edge probability) whose clique number could be determined by only one program in slightly less than 12 hours. We suppose that this problem is the hardest one of such small size.

We would like to close the paper with a few remarks on why the reader should appreciate the proposed benchmark problems. Although it seems that there is a large number of benchmark problems for maximum clique search, the plain fact is that there are not enough of them. Many of these test problems are too easy for the modern solvers as the sizes of these problems are small. On the other hand there are test instances that are overly hard for the contemporary clique solvers. The proposed EVIL test graphs form parameterized families. The parameters can be tuned to produce benchmark problems of various degrees of difficulty.

Acknowledgments
This research was supported by National Research, Development and Innovation Office – NKFIH Fund No. SNN-117879 and the Pécsi Tudományegyetem Alapítvány.

1. REFERENCES
[1] P. Erdős, A. Rényi, On the evolution of random graphs, Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960), 17–61.
[2] M. R. Garey and D. S.
Johnson, Computers and Intractability: A Guide to the Theory of NP-completeness, Freeman, New York, 2003.
[3] R. Hammack, W. Imrich, S. Klavžar, Handbook of Product Graphs, CRC Press, Boca Raton, FL, 2011.
[4] J. Mycielski, Sur le coloriage des graphes, Colloq. Math. 3 (1955), 161–162.
[5] E. W. Weisstein, Monotonic Matrix, In: MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/MonotonicMatrix.html

On embedding degree sequences
[Extended Abstract]

Béla Csaba ∗
Bolyai Institute, University of Szeged
6720 Szeged, Hungary, Aradi vértanúk tere 1.
bcsaba@math.u-szeged.hu

Bálint Vásárhelyi †
Bolyai Institute, University of Szeged
6720 Szeged, Hungary, Aradi vértanúk tere 1.
mesti90@gmail.com

ABSTRACT
Assume that we are given two graphic sequences, π1 and π2. We consider conditions for π1 and π2 which guarantee that there exists a simple graph G2 realizing π2 such that G2 is the subgraph of any simple graph G1 that realizes π1.

Categories and Subject Descriptors
G.2.2 [Graph theory]: Extremal graph theory; Matchings and factors; Graph coloring

General Terms
Graph theory

Keywords
degree sequence, embedding, extremal graph theory

A finite sequence of natural numbers π = (d1, ..., dn) is a graphic sequence or degree sequence if there exists a graph G such that π is the (not necessarily monotone) degree sequence of G. Such a graph G realizes π. The largest value of π is denoted by ∆(π). We sometimes refer to the value of π at vertex v as π(v).

Let G and H be two graphs on n vertices. They pack if there exist edge-disjoint copies of G and H in K_n. Two degree sequences π1 and π2 pack if there are graphs G1 and G2 realizing π1 and π2, respectively, such that G1 and G2 pack. Equivalently, G1 and G2 pack if and only if G1 ⊂ Ḡ2, that is, G1 can be embedded into Ḡ2, where Ḡ denotes the complement of G.

It is an old and well-understood problem in graph theory to tell whether a given sequence of natural numbers is a degree sequence or not.
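One standard way to settle this recognition question is the Erdős–Gallai inequality test. The sketch below is an illustration under that assumption: the paper does not say which characterization it has in mind, and the function name is_graphic is hypothetical.

```python
def is_graphic(sequence):
    """Return True if the sequence is the degree sequence of a simple graph.

    Uses the Erdos-Gallai test: with the degrees sorted in nonincreasing
    order d_1 >= ... >= d_n, the sum must be even and for every k
        d_1 + ... + d_k <= k(k - 1) + sum over i > k of min(d_i, k).
    The function name is an illustrative assumption.
    """
    degrees = sorted(sequence, reverse=True)
    if any(d < 0 for d in degrees) or sum(degrees) % 2 != 0:
        return False
    for k in range(1, len(degrees) + 1):
        left = sum(degrees[:k])
        right = k * (k - 1) + sum(min(d, k) for d in degrees[k:])
        if left > right:
            return False
    return True
```

For instance, is_graphic([1, 1, 1, 1]) holds (two disjoint edges realize it), while is_graphic([3, 3, 1, 1]) fails at k = 2, where the left side is 6 and the right side is 4.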
1. INTRODUCTION
All graphs considered in this paper are simple. We use standard graph theory notation, see for example [4]. Let us provide a short list of a few perhaps not so common notions and notations. Given a bipartite graph G(A, B) we call it balanced if |A| = |B|. This notion naturally generalizes to r-partite graphs with r ∈ N, r ≥ 2.

If S ⊂ V for some graph G = (V, E), then the subgraph spanned by S is denoted by G[S]. Moreover, let Q ⊂ V so that S ∩ Q = ∅, then G[S, Q] denotes the bipartite subgraph of G on vertex classes S and Q, having every edge of G that connects a vertex of S with a vertex of Q. The number of edges of a graph is denoted by e(G). The chromatic number of a graph G is χ(G). The complete graph on n vertices is denoted by K_n, the complete bipartite graph with vertex class sizes n and m is denoted by K_{n,m}.

We consider a generalization of the problem of deciding whether a given sequence of natural numbers is a degree sequence, which is remotely related to the so-called discrete tomography [3] (or degree sequence packing) problem as well‡. The question whether a sequence π of n numbers is a degree sequence can be formulated as follows: Does K_n have a subgraph H such that the degree sequence of H is π? The question becomes more general if K_n is replaced by some (simple) graph G on n vertices. If the answer is yes, we say that π can be embedded into G, or equivalently, π packs with G.

In order to state our main result let δ(G) and ∆(G) denote the minimum and maximum degree of G, respectively. We prove the following.

Theorem 1. For every ε > 0 and D ∈ N there exists an n0 = n0(ε, D) such that for all n > n0 if G is a graph on n vertices with δ(G) ≥ n/2 + εn and π is a degree sequence of length n with ∆(π) ≤ D, then π is embeddable into G.

We also state Theorem 1 in an equivalent complementary form, as a packing problem.

Theorem 2. For every ε > 0 and D ∈ N there exists an n0 = n0(ε, D) such that for all n > n0 if π1 and π2 are graphic sequences of length n satisfying ∆(π1) < (1/2 − ε)n and ∆(π2) ≤ D then there exists a graph G2 that realizes π2 and packs with any G1 realizing π1.

It is easy to see that Theorem 1 is sharp up to the εn additive term. For that let n be an even number, and suppose that every element of π is 1. Then the only graph that realizes π is the union of n/2 vertex disjoint edges. Let G = K_{n/2−1,n/2+1} be the complete bipartite graph with vertex class sizes n/2 − 1 and n/2 + 1. Clearly G does not have n/2 vertex disjoint edges.

∗ Partially supported by ERC-AdG. 321400 and by the National Research, Development and Innovation Office - NKFIH Fund No. SNN-117879.
† Supported by TÁMOP-4.2.2.B-15/1/KONV-2015-0006.
‡ This relation is discussed in the full version of the paper.

2. PROOF OF THEOREM 1
We are going to construct a 3-colorable graph H that realizes π and has the following properties. There exists A ⊂ V = V(H) such that

(1) |A| ≤ 5∆³(π),
(2) the components of H[V − A] are balanced complete bipartite graphs, each having size at most 2∆(π),
(3) χ(H[A]) = 3 if A is non-empty, and
(4) e(H[A, V − A]) = 0.

In order to construct H we will use two types of "gadgets". Type 1 gadgets are balanced complete bipartite graphs on 2k vertices, where k ∈ {1, ..., ∆(π)}; these are the components of H[V − A]. Type 2 gadgets are composed of at least two type 1 gadgets and at most two other vertices; these are the components of H[A].

We find type 1 gadgets with the following algorithm.

Algorithm 3. Assign the elements of π arbitrarily to V. Set every vertex active. Let k = 1.

Step 1 If there are at least 2k active vertices with degree k, then take any 2k such vertices, create a balanced complete bipartite graph on these 2k vertices, and then deactivate them.
Step 2 If the number of active vertices with degree k drops below 2k, set k = k + 1.
Step 3 If k ≤ ∆(π), then go to Step 1. Else stop the algorithm.

This way we obtain several components, each being a balanced complete bipartite graph. These are type 1 gadgets. It is easy to see that for every k ∈ {1, ..., ∆(π)} at most 2k − 1 vertices are left out from the union of type 1 gadgets, a total of at most ∆²(π) − 2∆(π) vertices. Furthermore, if a vertex v belongs to some type 1 gadget, then its degree is exactly π(v).

Let R denote the set of vertices that are uncovered by the above set of type 1 gadgets. As we noted earlier |R| ≤ ∆²(π) − 2∆(π). In order to get the right degrees for the vertices of R we construct type 2 gadgets, using some type 1 gadgets as well.

Notice first that the sum of the degrees of the vertices of R must be an even number, hence R_o, the subset of R containing the odd degree vertices, has an even number of elements. Find |R_o|/2 disjoint pairs in R_o, and join the vertices that belong to the same pair by a new edge. With this we get that every vertex of R misses an even number of edges.

We construct the type 2 gadgets using the following algorithm.

Algorithm 4. Set every type 1 gadget unmarked and every vertex in R − R_o uncolored.

Step 1 Choose an uncolored vertex v from R − R_o and color it.
Step 2 Choose a type 1 unmarked gadget K and mark it.
Step 3 Choose an arbitrary perfect matching M_K in K (M_K exists since K is a balanced complete bipartite graph).
Step 4 Choose an arbitrary xy edge in M_K.
Step 5 Replace the edge xy with the new edges vx and vy.
Step 6 If v is still missing edges, then if M_K is not empty, go to Step 4, else go to Step 2.
Step 7 If v reaches its desired degree and there are still uncolored vertices in R − R_o, then go to Step 1, else stop the algorithm.

It is easy to see that in π(v)/2 steps v reaches its desired degree, while the degrees of vertices in the marked type 1 gadgets have not changed. It is straightforward to use this algorithm for vertices in R_o, since each of these misses an even number of edges.

Figure 1 shows examples of type 2 gadgets. Let F ⊂ H denote the subgraph containing the union of all type 2 gadgets, thus F = H[A]. Observe that type 2 gadgets of F are 3-chromatic, and all have less than 5∆²(π) vertices. This easily implies the following claim.

Claim 5. We have that |V(F)| ≤ 5∆³(π).

We are going to show that H ⊂ G. For that we first embed the 3-chromatic part F using the following strengthening of the Erdős–Stone theorem proved by Chvátal and Szemerédi [1].

Theorem 6. Let ϕ > 0 and assume that G is a graph on n vertices where n is sufficiently large. Let r ∈ N, r ≥ 2. If
|E(G)| ≥ ((r − 2)/(2(r − 1)) + ϕ) n²,
then G contains a K_r(t), i.e. a complete r-partite graph with t vertices in each class, such that
t > log n / (500 log(1/ϕ)). (1)

Since δ(G) ≥ n/2 + εn, the conditions of Theorem 6 are satisfied with r = 3 and ϕ = ε/2, hence G contains a balanced complete tripartite subgraph T on Ω(log n) vertices. Using Claim 5 and the 3-colorability of F this implies that F ⊂ T.

Figure 1: Type 2 gadgets of H with a 3-coloring

Observe that after embedding F into G every uncovered vertex still has at least δ(G) − v(F) > n/2 + εn/2 uncovered neighbors. Denoting the uncovered subgraph of G by G′ we obtain that δ(G′) > n/2 + εn/2. We need a definition from [2].

Definition 7. [2] A graph H on n vertices is well-separable, if it has a subset S ⊂ V(H) of size o(n) such that all components of H − S are of size o(n).

In order to prove that H − F ⊂ G′ we will apply a special case of the main theorem of [2], which is as follows:

Theorem 8. [2] For every γ > 0 and positive integer D there exists an n0 such that for all n > n0 if J is a bipartite well-separable graph on n vertices, ∆(J) ≤ D and δ(G) ≥ (1/2 + γ)n for a graph G of order n, then J ⊂ G.

Since H − F has bounded size components, we can apply Theorem 8 for H − F and G′, with parameter γ = ε/2. With this we finished proving what was desired.

3. A GENERALIZATION
While Theorem 1 is best possible up to the εn additive term, if π has a special property, one can claim much more, as Theorem 9 shows below. Let us call a bipartite graph H(A, B) u-unbalanced if |A| = u|B| for some u ∈ N. A bipartite degree sequence π is u-unbalanced if π can be realized by a u-unbalanced bipartite graph. We need the notion of edit distance of graphs: the edit distance between two graphs on the same labeled vertex set is defined to be the size of the symmetric difference of the edge sets.

A generalization of Theorem 1 is the following:

Theorem 9. For every ε > 0 and D, u ∈ N there exist an n0 = n0(ε, u) and a K = K(ε, D, u) such that if n ≥ n0, π is a u-unbalanced degree sequence of length n with ∆(π) ≤ D, G is a graph on n vertices with δ(G) ≥ n/(u + 1) + εn, then there exists a graph G′ on n vertices so that the edit distance of G and G′ is at most K, and π is embeddable into G′.

Hence, if π is unbalanced, the minimum degree requirement of Theorem 1 can be substantially decreased; what we pay for this is the "almost embedding" of π. For example, if π is a 10-unbalanced bounded degree sequence of length n and G is a graph on n vertices having δ(G) ≥ n/11 + εn for some ε > 0, then after deleting/adding a constant number (i.e. a function of ε§) of edges, we obtain a graph G′ from G into which π can be embedded.

In another direction, one can also show that if π has slightly fewer elements than the number of vertices in G, then π can be embedded into G under very similar conditions.

Theorem 10. For every ε > 0 and D, u ∈ N there exist an n0 = n0(ε, u) and an M = M(ε, D, u) such that if n ≥ n0, π is a u-unbalanced degree sequence of length n with ∆(π) ≤ D, G is a graph on n + M vertices with δ(G) ≥ (n + M)/(u + 1) + ε(n + M), then π is embeddable into G.
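As an aside on the proof of Theorem 1, the type 1 gadget construction of Algorithm 3 is simple enough to sketch in code. The following Python function is a minimal illustration, assuming that vertices are the indices of π and that vertices of degree 0 are simply left isolated; the function name type1_gadgets and the return format are hypothetical.

```python
from collections import defaultdict


def type1_gadgets(pi):
    """Group vertices into balanced complete bipartite graphs K_{k,k}.

    pi[v] is the prescribed degree of vertex v. Returns (gadgets, leftover):
    each gadget is a pair (left, right) of k-element vertex lists whose
    vertices all have prescribed degree k, and leftover lists the vertices
    that could not be placed because fewer than 2k of degree k remained.
    """
    active = defaultdict(list)
    for v, d in enumerate(pi):
        if d > 0:  # assumption: degree-0 vertices need no edges and are skipped
            active[d].append(v)

    gadgets, leftover = [], []
    for k in sorted(active):  # k runs through the degrees present, in increasing order
        vertices = active[k]
        while len(vertices) >= 2 * k:  # Step 1: at least 2k active vertices of degree k
            block, vertices = vertices[:2 * k], vertices[2 * k:]
            gadgets.append((block[:k], block[k:]))  # in K_{k,k} every vertex has degree k
        leftover.extend(vertices)  # Step 2: fewer than 2k remain, continue with the next k
    return gadgets, leftover
```

At most 2k − 1 vertices of each degree k end up in leftover, which plays the role of the set R that the type 2 gadgets take care of.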
The proofs of Theorem 9 and Theorem 10 are much more involved than that of Theorem 1; they are given in the full version of the paper. Let us note that the conditions for δ(G) are best possible in the above theorems up to the εn additive term.

§ Unfortunately, it can be a tower function of 1/ε.

4. REFERENCES
[1] V. Chvátal and E. Szemerédi, On the Erdős–Stone Theorem, Journal of the London Mathematical Society s2-23 (1981), no. 2, 207–214.
[2] B. Csaba, On embedding well-separable graphs, Discrete Mathematics 308 (2008), 4322–4331.
[3] J. Diemunsch, M. J. Ferrara, S. Jahanbekam, and J. M. Shook, Extremal theorems for degree sequence packing and the 2-color discrete tomography problem, SIAM Journal of Discrete Mathematics 29 (2015), no. 4, 2088–2099.
[4] Douglas B. West, Introduction to graph theory, second ed., Prentice Hall, 2001.

Computational complexity of the winner determination problem for geometrical combinatorial auctions

Dries Goossens
Faculty of Economics and Business Administration, Ghent University
Tweekerkenstraat 2, 9000 Gent, Belgium
dries.goossens@ugent.be

Bart Vangerven
Operations Research and Statistics, Faculty of Economics and Business, KU Leuven
Naamsestraat 69, 3000 Leuven, Belgium

Frits C.R. Spieksma
Operations Research and Statistics, Faculty of Economics and Business, KU Leuven
Naamsestraat 69, 3000 Leuven, Belgium

ABSTRACT
We consider auctions of items that can be arranged in rows, for instance pieces of land for real estate development. The objective is, given bids on subsets of items, to find a subset of bids that maximizes auction revenue (often referred to as the winner determination problem). We show that for a k-row problem with connected and gap-free bids, the winner determination problem can be solved in polynomial time, using a dynamic programming algorithm. We study the complexity for bids in a grid, complementing known results in the literature. Additionally, we study variants of the geometrical winner determination setting. We provide an NP-hardness proof for the 2-row setting with gap-free bids. Finally, we extend this dynamic programming algorithm to solve the case where bidders submit connected, but not necessarily gap-free bids in a 2-row and a 3-row problem.

Keywords
Auctions, winner determination problem, computational complexity, rows, dynamic programming

1. INTRODUCTION
In combinatorial auctions, bidders can place bids on combinations of items, called packages or bundles. Clearly, combinatorial auctions allow bidders to better express their preferences compared to the traditional auction formats, where bidders place bids on individual items. In particular, it makes sense to use a combinatorial auction when complementarities or substitution effects exist between different items. For an introduction to combinatorial auctions, we refer to [5]; for a survey of the literature, we refer to [1] and [6].

One important challenge within this domain is, given the bids, to decide which items should be allocated to which bidder, i.e., which bids to accept. In general, this winner determination problem is NP-hard [11], and does not allow good approximation results [10].

We discuss a combinatorial auction in a restricted topology. In this setting, an item corresponds to a rectangle, and all items are arranged in (a limited number of) rows, see Figure 1 for an example. Notice that the individual items (or rectangles) need not have the same size. A bid consists of a set of items satisfying some restrictions (see Section 2 for a precise problem definition), together with a value. The objective is to select a set of bids that maximizes the sum of the expressed values, while making sure that each item is present at most once in a selected bid.

Figure 1: An example of an instance with 3 rows and 5 bids.

There are several situations in practice that motivate this specific geometric setting. We mention the following:

• Real estate. Goossens et al. [7] describe how space in a newly erected building, to be used for housing and commercial purposes, is allocated using a combinatorial auction. The geometric structure of each of the levels of the building features the properties described here. Quan [8] reports on empirical studies in real estate auctions. Several of these studies have focused on verifying and quantifying the afternoon effect. This afternoon effect describes similar items consistently selling for significantly less in later rounds in multi-object sequential auctions. A combinatorial auction, by selling all items simultaneously, can mitigate this effect.

• Mineral rights. Imagine a region that is partitioned into lots, with the lots organized in rows. For sale is the right to extract minerals, oil or gas found on or below the surface of the lot. Clearly, having adjacent lots allows for exploration and production efficiencies, a complementarity. For more about this particular setting, we refer to [4]. Figure 2 shows an example of oil and gas leases neatly arranged in rows.

Figure 2: Oil and Gas Leases managed by the Texas General Land Office. Taken from: http://www.glo.texas.gov/GLO/agency-administration/gis/gis-data.html.

• Seats in a grandstand, theater or stadium. In some of these cases, one can even assume that a grid, consisting of rows and columns, is given where each cell represents a seat. Typically, demand exists for sets of adjacent seats - think of a family of four going to a ball game, or a group of friends visiting a concert. The complementarities that people perceive from adjacent seats offer possibilities for combinatorial auctions. Although tickets are usually sold at a fixed price, there are occasions where sports teams have auctioned off (part of) their seat licenses.

In all these cases, it is clear that complementarities between adjacent items exist; a combinatorial auction is best-placed to take these effects into account.

Goossens et al. [7] show that when a constraint is imposed stating that a bidder can have at most one winning bid, the winner determination problem is NP-hard even if all items are arranged on a single row. Hence, to have any prospect of coming up with a positive result, we allow bidders to win multiple bids.

Our problem is a special case of finding a maximum-weight independent set in a geometric intersection graph. In such a graph, there is a node for each bid (in our case: a (connected) set of rectangles), and two nodes are connected if and only if the corresponding bids overlap. Finding a maximum-weight independent set in a geometric intersection graph is a well-studied problem for several types of intersection graphs. For instance, in the work of [9], it is shown that if all items are arranged in a single row, and bids are only allowed for subsets of consecutive items, the resulting winner determination problem is polynomially solvable. These results follow from the equivalence of this problem to finding a maximum-weight independent set in an interval graph. For an overview of results for more general intersection graphs we refer to [3].

In this paper, we study the computational complexity of the winner determination problem for the specific geometric setting described above. We show that it can be used to efficiently solve the winner determination problem (which is hard in general), using dynamic programming procedures. Additionally, we settle the complexity of the winner determination problem for bidding in a grid.

2. PROBLEM DESCRIPTION
The geometric setting that we consider can be described as follows. Given are k rows. Each row contains an (ordered) set of items (or rectangles). If, on some row, an item u lies to the left of item v, then we write u ≺ v. We use X_j = {0, 1, ..., m_j} to denote the set of items in row j, j = 1, ..., k. The set of items that can be bid on is ⋃_{j=1}^{k} X_j \ {0}; item 0 cannot be part of any bid, and is only present for notational convenience. We assume that item ℓ lies directly to the left of item ℓ + 1, for each ℓ ∈ X_j \ {m_j}, j = 1, ..., k.

Definition 1. We say that a pair of items are adjacent if and only if they share a border with non-zero length.

Clearly, items ℓ and ℓ + 1 are adjacent. However, items on different (but consecutive) rows can be adjacent as well. We use m to denote the number of items in the instance, i.e., m = Σ_{j=1}^{k} m_j. Figure 3 visualizes this.

Figure 3: An example of an instance with k = 3 (i.e. 3 rows) and m_1 = 6, m_2 = 8, m_3 = 7.

We investigate the following problem, called the winner determination problem (WDP). Given is a set of bids B on subsets of items, with v(b) denoting the value of bid b, for each b ∈ B. We set n = |B|, i.e. there are n bids; specifying a bid implies specifying a set of items, as well as a value v(b) > 0. The problem is to find an allocation that maximizes the sum of the values of the accepted bids, ensuring that each item is allocated at most once.

Given a bid b, consider the item graph, H(b), which has a node for each item in bid b, and there is an edge between a pair of nodes in H(b) if and only if the corresponding items are adjacent. There are two main restrictions on the bids that we consider. We define the concept of a connected bid.

Definition 2. We say that bid b is connected if the subgraph H(b) induced by the items of bid b is connected. If bid b is not connected, we say that it is disconnected.

Further, let us define the concept of a bid that is gap-free. A formal definition of a bid having no gaps (i.e. being gap-free) is formulated as follows.

Definition 3. We say that bid b is gap-free if no three items u ≺ v ≺ w on a single row exist for which u ∈ b, v ∉ b, w ∈ b.

A bid that is not gap-free has at least one gap. Notice that it is easy to exhibit examples of connected bids that are not gap-free (see Figure 4a), and gap-free bids that are not connected (see Figure 4b). It is also easy to see that in the case of a single row, i.e. k = 1, connectedness of a bid is equivalent to a bid being gap-free.

(a) A bid that is connected and not gap-free.
(b) A bid that is disconnected and gap-free.
Figure 4: Examples illustrating the concepts of connectedness and gap-freeness.

Finally, it is important to see that bids on identical sets of items but with different values need not all be considered. Indeed, one need only consider the bid with the highest value. If more than one bid has the highest value, one could use the bid entry time as a tie-breaker. Thus, all but the highest value bid on a specific set of items can be eliminated and bids will be unique in the sense that they are all for different sets of items.

3. RESULTS
For the setting where items are arranged in rows, we show the following:

• For connected and gap-free bids, the winner determination problem is easy when the number of rows is fixed. We solve this problem using a polynomial time dynamic programming algorithm.
• For the setting where the bid space is a grid and both the number of rows and columns are a part of the input, we show that even when bids are constrained to be row bids or column bids, the resulting winner determination problem is NP-hard.
• For gap-free bids, the winner determination problem is NP-hard, even on two rows.
• For connected bids, the winner determination problem is easy on three rows or fewer. We show this by adapting and expanding upon the general dynamic programming algorithm developed for connected and gap-free bids.

We point out that the complexity of the winner determination problem with connected bids on a fixed number of rows k, with k ≥ 4, is still an open problem. If the number of rows is part of the input, a result in [9] implies the problem is NP-hard.

Due to the page limitation imposed on this manuscript, the following section only describes our dynamic program for winner determination for the case of k rows, with connected and gap-free bids. For the proofs of our other claims, we refer the reader to our working paper [12].

4. A DYNAMIC PROGRAM FOR WINNER DETERMINATION FOR CONNECTED AND GAP-FREE BIDS
In this section we assume that bids are connected and gap-free. We show how the winner determination problem for a setting with k rows can be solved as a shortest path problem on a graph G = (V, A), which is constructed as follows. There is a node in V for each element in the Cartesian product of the sets X_1, X_2, ..., X_k. We write V = ∏_{i=1}^{k} X_i. Nodes in V are k-tuples. We consider the k-tuple x = ⟨x_1, x_2, ..., x_k⟩, where x_1 ∈ X_1, x_2 ∈ X_2, ... and x_k ∈ X_k. This k-tuple represents a state, i.e. a collection of assigned items. More specifically, the k-tuple x represents a state where irrevocable decisions concerning the items {0, ..., x_1} ∪ {0, ..., x_2} ∪ ··· ∪ {0, ..., x_k} have been made, i.e. for each row i all items from left to right up to and including x_i. As there is a node in V for every k-tuple, this leads to O(m^k) nodes.

The arc set A includes two types of arcs: the zero arcs and bid arcs. The zero arcs have a weight of 0, and are used to handle items not included in the set of winning bids. Consider some node x = ⟨x_1, x_2, ..., x_i, ..., x_k⟩ ∈ V, with 1 ≤ i ≤ k and x_i ≠ m_i. A zero arc goes from node x to node ⟨x_1, ..., x_i + 1, ..., x_k⟩ ∈ V, for each 1 ≤ i ≤ k. Thus, up to k zero arcs emanate from node x ∈ V, giving rise to O(m^k) zero arcs in the graph G.

The bid arcs correspond to actual bids and have a weight equal to the value of the bid v(b). We represent a bid by listing k pairs of elements; each pair represents the first element, and the last element present in a bid on a particular row. For a bid b that contains elements from each of the k rows, we write: b = {(x^b_1, y^b_1), (x^b_2, y^b_2), ..., (x^b_k, y^b_k)}, where the element x^b_j ∈ X_j (1 ≤ j ≤ k) refers to the leftmost element of X_j present in bid b, and the element y^b_j ∈ X_j (1 ≤ j ≤ k) refers to the rightmost element of X_j present in bid b. We use the symbol (∅, ∅) to denote that a bid does not include items from that row. Thus, as an example, when we write b = {(∅, ∅), (x^b_2, y^b_2), (x^b_3, y^b_3), (∅, ∅)} this means that the bid b does not include any items on the first row, it includes items x^b_2 up to and including y^b_2 on the second row, it includes items x^b_3 up to and including y^b_3 on the third row, and it does not include any items on the fourth row.

The bid arcs can be described as follows. Let us, for convenience, first assume that bid b contains elements from each of the k rows. To represent bid b in the graph G, we draw an arc from node ⟨x^b_1 − 1, x^b_2 − 1, ..., x^b_k − 1⟩ to node ⟨y^b_1, y^b_2, ..., y^b_k⟩ with weight v(b). Consider now a bid b such that there are rows with no elements in b. Observe that, due to connectedness of b, these rows can only have indices 1, 2, ..., s(b) and f(b), f(b) + 1, ..., k with 0 ≤ s(b) < f(b) ≤ k + 1. Note that if a bid b is present on row 1 then s(b) = 0. Similarly, if a bid b is present on row k then f(b) = k + 1. Now, to represent bid b, for each x_1 ∈ X_1, x_2 ∈ X_2, ..., x_{s(b)} ∈ X_{s(b)}, x_{f(b)} ∈ X_{f(b)}, x_{f(b)+1} ∈ X_{f(b)+1}, ..., x_k ∈ X_k there is an arc from node ⟨x_1, x_2, ..., x_{s(b)}, x^b_{s(b)+1} − 1, ..., x^b_{f(b)−1} − 1, x_{f(b)}, ..., x_k⟩ to node ⟨x_1, x_2, ..., x_{s(b)}, y^b_{s(b)+1}, ..., y^b_{f(b)−1}, x_{f(b)}, ..., x_k⟩ with weight v(b). Notice that there are O(n m^{k−1}) bid arcs (of course it is conceivable that the number of bid arcs will be far less).

We now compute a longest path from node 0 = ⟨0, ..., 0⟩ to node m = ⟨m_1, ..., m_k⟩. The length of this path corresponds to the optimal revenue of the auction, and the winning bids can be derived from the arcs in the path. Notice that G = (V, A) is acyclic by construction and consists of O(m^k) nodes and O(m^{k−1}(n + m)) arcs. Hence, a longest path can be found efficiently by solving a shortest path problem in G = (V, A) with edge weights multiplied by −1. Since Ahuja et al.

if the number of rows is part of the input. Our results may also prove useful for experimental research on combinatorial auctions: our dynamic program will allow researchers to study bidder behavior in larger settings, involving more items and bidders than considered so far.

6. ACKNOWLEDGMENTS
This research is supported by the Interuniversity Attraction Poles Programme initiated by the Belgian Science Policy Office. The authors wish to thank James B. Orlin for an interesting and stimulating conversation.

7. REFERENCES
[1] J. Abrache, T. G. Crainic, M. Gendreau, and M. Rekik. Combinatorial auctions. Annals of Operations Research, 153(1):131–164, 2007.
[2] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.
[3] T. M. Chan and S. Har-Peled. Approximation Algorithms for Maximum Independent Set of Pseudo-Disks. Discrete Computational Geometry, 48:373–392, 2012.
[4] P. Cramton. How Best to Auction Oil Rights. In M. Humphreys, J. Sachs, and J. E. Stiglitz, editors, Escaping the resource curse, chapter 5, pages 114–151. Cambridge Univ Press, 2007.
[5] P. Cramton, Y. Shoham, and R. Steinberg. Combinatorial auctions. MIT Press, 2006.
[6] S. de Vries and R. V. Vohra. Combinatorial Auctions: A Survey. INFORMS Journal on Computing, 15(3):284–309, 2003.
[7] D. R. Goossens, S. Onderstal, J. Pijnacker, and
[2] show that shortest path prob- F. C. R. Spieksma. Solids: A Combinatorial Auction lems in directed acyclic graphs with p nodes and q arcs can for Real Estate. Interfaces, 44(4):351–363, 2014. be solved in O(p + q) time, our dynamic program requires [8] D. Quan. Real Estate Auctions: A Survey of Theory O(mk + nmk−1) time. and Practice. Journal of Real Estate Finance and Economics, 9(1):23–49, 1994. Once a longest path is found, it is easy to see which bids [9] M. H. Rothkopf, A. Pekeč, and R. M. Harstad. are accepted. Every arc that is not a zero arc in G = (V, A) Computationally Manageable Combinational corresponds to exactly one bid. To find the set of winning Auctions. Management Science, 44(8):1131–1147, bids, for every non-zero arc in the longest path simply accept 1998. the bid corresponding to that arc. For a numerical example [10] T. Sandholm. Algorithm for optimal winner and and proof of correctness of our algorithm, we refer the determination in combinatorial auctions. Artificial reader to [12]. Intelligence, 135(1-2):1–54, 2002. [11] S. Van Hoesel and R. Müller. Optimization in electronic markets: examples in combinatorial 5. CONCLUSIONS auctions. Netnomics, 3(1):23–33, 2001. We study the winner determination problem for a combina- [12] B. Vangerven, D. Goossens, and F. Spieksma. Winner torial auction with a specific geometric structure. We argue determination in geometrical combinatorial auctions. that this structure is relevant, as it occurs in real estate, Technical report, KBI 1614, Faculty of Economics and plots of land, and mineral rights. The complementarities Business, KU Leuven, 2016. present in these situations offer great potential for combina- torial auctions. With our dynamic programming algorithm, we present auc- tioneers a tool that enables them, under some reasonable assumptions on the bids and with a fixed number of rows, to efficiently compute the winning bids. 
Next, we comple- ment existing results by showing that bidding in a grid is difficult, even when only row and column bids are allowed, 75 Diploid Genome Rearrangement [Extended Abstract] ∗ István Miklós Adrienn Szabó Rényi Institute SZTAKI Reáltanoda u. 13-15 Lágymányosi u. 11 1053 Budapest, Hungary 1111 Budapest, Hungary miklos.istvan@renyi.mta.hu aszabo@ilab.sztaki.hu ABSTRACT A B C D E A B C D E Next Generation Sequencing (NGS) techniques revolution- ized the collection of genomic data. It allows massively par- A B C D E A B C D E allel sequencing of short fragments reducing the time and cost of sequencing. When pairs of fragments are sequenced, it is possible to detect rearrangement events using NGS, but in case of diploid genomes, rearrangement events might happen on both chromosomes of homologous pairs, and the A C B D E A C D B E entire rearranged genome cannot be directly read out from NGS data. A B D C E A B C D E We consider the problem of reconstructing the rearranged diploid genome from NGS data, and study the computa- A B A B tional complexity of the problem. We prove that finding one solution can be done in polynomial running time. On the other hand, deciding if there is a solution without non- E C E C homologous recombination between homologous chromoso- mes is NP-complete. D D a) b) 1. INTRODUCTION The Next Generation Sequencing technique breaks the geno- Figure 1: A pair of examples for diploid rearrange- me into small, overlapping pieces (several copies of the ge- ment and NGS graph. Both examples contain a pair nomic DNA are broken) and these small pieces are sequenced. of homologous chromosomes with 5 synteny blocks From these small, overlapping copies, the whole genome is (unit segments), labelled by A,B, ... E. The grey reconstructed. The diploid genomes contain two copies of edges of the NGS graphs are noted by dotted lines. 
each chromosome (except the sex chromosomes), and in case of healthy genomes, the two chromosomes are identical.

Figure 1 (continued): The two copies of each black edge are replaced with one single edge for the sake of simplicity. See text for more details.

However, cancer genomes might undergo a huge amount of genome rearrangement events, see for example [11]. In these genomes, the homologous chromosomes might be rearranged in different ways. It is possible to read out from the NGS data where rearrangement events happened in terms of chromosome positions; however, this data does not reveal in which copy of the homologous chromosomes a particular rearrangement happened. However, if the rearranged intervals overlap, then it is decidable if the two rearrangement events happened on one or two copies of homologous chromosomes, see Fig. 1. On the left, segment BC is inverted in one of the chromosomes and the segment CD is inverted on the other chromosome. This latter inversion affects the adjacency of segments B and D and the adjacency of segments D and E. On the right, first the BC segment is inverted in one of the chromosomes, then another segment is inverted on the same chromosome, also affecting the adjacency of segments B and C and the adjacency of segments D and E. The resulting rearranged genomes are different, and their NGS graphs are also different, as shown at the bottom of the picture.

In this paper, we consider the problem of reconstructing the diploid genomes from NGS data. We show that without constraints, finding one solution is easy. On the other hand, the reconstruction problem is NP-complete if a biologically relevant restriction is introduced.

∗ Secondary affiliation: SZTAKI, 1111 Budapest, Lágymányosi u. 11, Hungary

2. PRELIMINARIES
In this section, we transform the reconstruction problem into a graph theoretical problem.

Definition 1. A diploid genome $\{G(V, E), L\}$ is an edge labelled directed graph in which each vertex has a total degree 1 or 2, each label in $L$ is used exactly twice and the graph contains no cycles. The components of the diploid genome are called chromosomes. The edges are called synteny blocks. The beginning of an edge is called a tail, and the end of the edge is called a head. The edges with the same labels are called homologous synteny blocks. The degree 1 vertices are called telomers, the degree 2 vertices are called adjacencies.

A synteny block is a DNA sequence that can be identified in a biological way not detailed here. Sometimes synteny blocks are called genes; however, a synteny block might be a large cluster of genes. A diploid genome contains two (almost) identical copies of each synteny block; this is why each label is used exactly twice in the graph representation. The differences between the two copies of the synteny blocks are point mutations that happen at less than one percent of the nucleotides and cause the genetic variance of the individuals.

The NGS sequencing technique obtains short fragments from the genomes, typically at most one hundred nucleotides long. A synteny block typically contains tens of thousands or even more nucleotides. A run of a few tens of nucleotides is typically unique in a genome, and thus can identify a synteny block. Therefore the sequenced fragments are sufficiently long to identify which synteny blocks are neighbours (when a fragment covers the endings of two synteny blocks); however, a fragment does not tell which of the two identical copies is involved. Indeed, the rare point mutations do not provide sufficient information to distinguish the two copies of the synteny blocks. Furthermore, the sequenced fragments are not long enough to reveal the corresponding neighbours at both ends of one copy of a synteny block. The information revealed from the NGS data can be summarized in the NGS graph, defined below.

Definition 2. An NGS (Next Generation Sequencing) graph $\{G(V, E), L\}$ is an edge coloured directed multigraph with labels, having the following properties:

• The edges are coloured with black and grey. Black edges are directed and come in pairs, i.e. if there is a black edge from $u$ to $v$, then there are exactly two black edges going from $u$ to $v$.

• Each vertex has exactly 2 black edges and at most 2 grey edges. Loops are allowed only for grey edges; if a vertex has a grey loop, then it counts as 2 grey edges.

• Each pair of black edges has a unique label coming from the label set $L$.

The pairs of black edges are called diploid synteny blocks. A diploid synteny block is a genomic segment that underwent a rearrangement event in neither of the chromosomes. A vertex with no grey edge is the end of a pair of homologous synteny blocks that are telomers in two chromosomes; in an extreme case, it might be the two telomers of the same chromosome. A vertex with one grey edge is an end of a pair of synteny blocks of which one copy is a telomer and the other copy is in an adjacency with another synteny block. Finally, a vertex with two grey edges is the end of a pair of synteny blocks such that both copies are in adjacency with other synteny block ends.

Example genomes and NGS graphs can be seen in Fig. 1. For example, in Fig. 1 a), the two grey edges at the head of the synteny block A indicate the two synteny blocks adjacent to the head of synteny block A in the rearranged genome: head of C and tail of B.

Definition 3. A diploid genome $\{G'(V', E'), L'\}$ is a realization of an NGS graph $\{G(V, E), L\}$ if $L = L'$ and there is a bijection between the grey edges in $G$ and the degree 2 vertices in $G'$ such that the grey edges connect the same endpoints of the diploid synteny blocks that are adjacent in $G'$.

Equivalently, a realization is a decomposition of the NGS graph into alternating walks, defined below. We also define the alternating circuits for technical reasons.

Definition 4. An alternating walk on an NGS graph is a series of edges $e_1, e_2, \ldots, e_n$ such that all edges are different, for all $i = 1, 2, \ldots, n - 1$, edges $e_i$ and $e_{i+1}$ have a common vertex, and the edges have alternating colourings in the series. Similarly, an alternating circuit is a series of edges $e_1, e_2, \ldots, e_{2n}$ such that all edges are different, for all $i = 1, 2, \ldots, 2n - 1$, edges $e_i$ and $e_{i+1}$ have a common vertex, furthermore, $e_{2n}$ and $e_1$ have a common vertex, and the edges have alternating colourings in the series.

3. FINDING ONE SOLUTION FOR THE DIPLOID REARRANGEMENT PROBLEM
Here we consider two versions of the diploid rearrangement problem. The first version allows non-homologous recombinations between the same chromosomes. Such a rearrangement yields a chromosome that contains 2 copies of the same diploid synteny block. (In comparison, a homologous recombination swaps the almost identical synteny blocks between two chromosomes.) The second version does not allow such rearrangements; this happens, for example, when the rearrangement events contain only reversals. We show that the first version is an easy problem while the other is an NP-complete one.

3.1 Diploid rearrangement allowing non-homologous recombinations between homologous chromosomes
The diploid rearrangement with non-homologous recombinations is the following problem: given an NGS graph, construct a diploid genome which is a realization of the NGS graph such that one chromosome might contain two copies of the same diploid synteny block.

Theorem 1. Let $\{G(V, E), L\}$ be an NGS graph. It has at least one diploid genome realization if and only if each component of $G$ contains at least one vertex with degree less than 4.

Proof. If there is a component in $G$ whose vertices all have degree 4, then it is impossible to map its grey edges onto linear components of a diploid genome.
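The criterion stated in Theorem 1 can be tested mechanically. The following is a minimal Python sketch, not the authors' code. The input encoding is an assumption made for illustration: `blocks` maps each label to the tail and head vertex of its pair of black edges, `grey_edges` is a list of vertex pairs, and the function name `has_realization` is hypothetical. Since every vertex carries exactly 2 black edges, a vertex has degree less than 4 exactly when it has fewer than 2 grey edge ends.

```python
from collections import defaultdict


def has_realization(blocks, grey_edges):
    """Check the condition of Theorem 1 on an NGS graph.

    blocks: dict label -> (tail_vertex, head_vertex); each entry stands
            for one pair of black edges (a diploid synteny block).
    grey_edges: list of (u, v) pairs; a loop (u, u) counts as 2 grey
                edges at u, as in Definition 2.
    Returns True iff every connected component contains a vertex of
    degree less than 4.
    """
    degree = defaultdict(int)
    neighbours = defaultdict(set)

    # Each pair of black edges adds 2 to the degree of both end vertices.
    for tail, head in blocks.values():
        degree[tail] += 2
        degree[head] += 2
        neighbours[tail].add(head)
        neighbours[head].add(tail)

    # Each grey edge adds 1 per end; a loop therefore adds 2 to its vertex.
    for u, v in grey_edges:
        degree[u] += 1
        degree[v] += 1
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen = set()
    for start in list(degree):
        if start in seen:
            continue
        # Depth-first search over one component (black and grey edges).
        stack = [start]
        seen.add(start)
        found_low_degree = False
        while stack:
            x = stack.pop()
            if degree[x] < 4:
                found_low_degree = True
            for y in neighbours[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        if not found_low_degree:
            return False
    return True


# Two blocks A and B, adjacent in both copies: the ends At and Bh are telomers.
linear = {"A": ("At", "Ah"), "B": ("Bt", "Bh")}
linear_grey = [("Ah", "Bt"), ("Ah", "Bt")]

# One block whose head is joined to its tail in both copies: no telomer exists.
circular = {"A": ("At", "Ah")}
circular_grey = [("Ah", "At"), ("Ah", "At")]
```

In this toy input, `has_realization(linear, linear_grey)` holds because the telomer ends have degree 2, while the second graph fails because both of its vertices have degree 4. The check only decides existence; constructing the chromosomes requires the alternating walk decomposition used in the proof.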
Indeed, since each vertex has 2 grey edges in the component, both copies of the diploid synteny blocks must be in adjacency with another synteny block, and thus, they cannot be telomers.

On the other hand, if there is a vertex with a degree less than 4 then there is a vertex with degree 2 or there are at least 2 vertices with degree 3, since the sum of degrees must be an even number. Starting with a vertex which has fewer grey edges than black ones, take an alternating walk on the component, starting with a black edge, and ending with a vertex with no remaining edges with the alternating colour. Such a walk ends in a vertex which has fewer grey edges than black ones, and thus, it ends with a black edge.

Once the walk is finished, remove this walk from the component. Either the walk covers the entire component, or there are remaining vertices. In this latter case, removing the walk might create more than one component; take any of them. If there are remaining vertices in the component having fewer grey edges than black ones, start a new alternating walk in such a vertex, taking a black edge first, and finish it in another such vertex, remove this walk from the component, consider the remaining component, etc. After removing a few alternating walks, either the remaining component is empty or the remaining component contains only vertices having the same number of grey and black edges. There must exist a vertex with a degree less than 4, otherwise the removed alternating paths were vertex disjoint from the remaining component, thus disjoint from the component, a contradiction.

Choose any vertex with degree 2 from the remaining component, let it be denoted by $v$, and take an alternating circuit starting with $v$. Since this alternating circuit shares the vertex $v$ with one of the removed walks, it can be merged with this alternating walk, thus obtaining a larger walk. If the component is still not empty after removing the alternating circuit, keep processing it in the same way: take an alternating circuit having a vertex shared with one of the already removed walks, and merge the circuit and the path, thus obtaining a larger walk. Eventually, the component is decomposed into alternating walks.

Now, each alternating walk represents a chromosome by contracting the grey edges into a single vertex. Clearly, this set of chromosomes gives a realization of the component.

It is also trivial that the decomposition of a component into alternating paths and thus into chromosomes can be done in polynomial time.

3.2 Diploid rearrangement excluding non-homologous recombinations between homologous chromosomes
The diploid rearrangement without non-homologous recombinations between homologous chromosomes asks if a realization of an NGS graph $\{G(V, E), L\}$ exists in which the chromosomes can be split into two sets such that each set contains the entire label set $L$. Unfortunately, this question is hard to answer, as the following theorem states.

Theorem 2. The diploid rearrangement problem without non-homologous recombinations between homologous chromosomes is NP-complete.

Figure 2: The NGS graph for the CNF $(x_1 \vee \neg x_2) \wedge (\neg x_1 \vee x_2 \vee x_3) \wedge (x_2 \vee x_3)$. Grey edges are indicated with dotted lines. Pairs of black edges are represented with a single black edge for the sake of simplicity, and their directions are also omitted. The grey path for each boolean variable as well as the alternating grey-black cycle for each clause are labelled. See text for more details.

Proof. It is trivial that any solution can be verified in polynomial time, thus the problem is clearly in NP. Below we prove that the problem is NP-hard by proving that SAT is polynomially reducible to it.

Clearly, an NGS graph has a realization without non-homologous recombination between homologous chromosomes if the grey edges can be coloured with two colours, say, blue and red, such that the red edges and one copy from each pair of black edges can be decomposed into a collection of alternating paths, and furthermore, the blue edges and the other copies of the black edges can be decomposed into a collection of alternating paths. Each vertex of the NGS graph contains at most two grey edges, therefore the grey subgraph can be decomposed unequivocally into paths and cycles. It is obvious that in any solution to the problem, these paths and cycles are coloured alternatingly, hence there are two candidate colourings of each component. In the polynomial reduction, there will be a grey path for each boolean variable, and the two possible colourings will correspond to the logical true and false assignments of the boolean variables.

Consider a conjunctive normal form with $n$ boolean variables and $k$ clauses. Construct an NGS graph for the diploid rearrangement problem in the following way (see also Fig. 2): Make $n + 1$ chains of grey edges. For $j = 1, 2, \ldots, n$, the $j$th chain contains $4m_j - 1$ vertices, where $m_j$ is the number of clauses in which the $j$th boolean variable participates (either negated or not negated). Number the vertices in each chain starting with 1, and in each of these chains, the vertices with indices $4i - 2$ accommodate the incoming pair of black edges for the $i$th clause having the boolean variable in it. The vertices with indices $4i$ will have a "separator" pair of black edges, whose other vertex is a "dead end", namely, has a degree 2. The vertices with indices $4i - 1$ and $4i - 3$ are connected to the outgoing pair of black edges of the $i$th clause. If the logical true value of the $j$th boolean variable satisfies the $i$th clause, then the black edges going out from the $(4i - 1)$st vertex have a dead end, and the black edges going out from the $(4i - 3)$rd vertex will be the incoming edge for the next grey chain. Otherwise, the edges going out from the $(4i - 3)$rd vertex will have a dead end, and the other pair will be the incoming pair of the next grey chain.

The last grey chain contains $4k - 1$ vertices; the vertices with indices $4i - 2$ will have the incoming pair of black edges for the $i$th clause. The vertices with indices $4i$ will have a "separator" pair of black edges, whose other vertex is a "dead end", namely, has a degree 2. The vertices with indices $4i - 1$ and $4i - 3$ are connected to the outgoing pair of black edges of the $i$th clause. The black edges going out from the $(4i - 1)$st vertex are the incoming edges in the first grey chain for the $i$th clause, and the pair of black edges going out from the $(4i - 3)$rd vertex have a dead end. In this way, for each clause, we create a cycle, containing alternatingly pairs of black edges and grey edges. The grey edges indicate the logical assignments of the boolean variables under which the clause is not satisfied (and there is an additional grey edge in the last grey chain). If these grey edges have the same colour, then such a colouring cannot provide a solution to the problem, since it contains a cycle.

We claim that the diploid rearrangement is solvable for the so-constructed graph if and only if the CNF is satisfiable. If the CNF is satisfiable, then there is a colouring of the grey edges such that the red edges and one copy of the black edges have at least one dead end for each clause, so the red-black subgraph contains only paths. The last chain can be coloured such that the blue edges and the other copies of black edges will have dead ends in this chain, so the blue-black subgraph contains only paths, and thus, we have a solution for the diploid rearrangement problem.

On the other hand, if the CNF is not satisfiable, then for any colouring of the first $n$ grey chains, at least one of the clauses does not have a dead end for the blue-black subgraph in the first $n$ grey chains and also at least one of them does not have a dead end for the red-black subgraph. Whatever the colouring of the $(n + 1)$st grey chain is, one of the colourings will create a circular chromosome, hence there is no solution for the diploid rearrangement problem.

4. DISCUSSION AND CONCLUSIONS
In this paper, we considered the diploid rearrangement problem and showed that it is polynomially solvable when there is no restriction and NP-complete if the solution space is restricted.

It is therefore also an interesting question what we can say about the computational complexity of counting or sampling these solutions. These types of questions have impact in bioinformatics, and were considered and partially answered for other genome rearrangement problems, see [1, 2, 3, 5, 6, 7, 8, 9, 10].

The diploid rearrangement problem is slightly similar to finding Eulerian circuits in an Eulerian graph; in fact, if a component of an NGS graph contains one vertex with degree 2 and all other vertices have degree 4, then each solution is a closed Eulerian walk that in fact is an Eulerian cycle. It is known that counting the Eulerian cycles in an undirected graph is #P-complete, and it is an open question if there are efficient algorithms for approximating the number of solutions and sampling almost uniformly the solutions [4]. Therefore it is natural to conjecture that counting and sampling the solutions for the diploid genome rearrangement problem is also #P-complete.

5. REFERENCES
[1] Ajana, Y., Lefebvre, J.F., Tillier, E.R.M., El-Mabrouk, N. (2002) Exploring the set of all minimal sequences of reversals - an application to test the replication-directed reversal hypothesis. In: WABI '02: Proceedings of the Second International Workshop on Algorithms in Bioinformatics, London, UK, Springer-Verlag, 300–315.
[2] Braga, M.D.V., Sagot, M.-F., Scornavacca, C., Tannier, E. (2008) Exploring the Solution Space of Sorting by Reversals with Experiments and an Application to Evolution. IEEE-ACM Transactions on Computational Biology and Bioinformatics, 5, 348–356.
[3] Braga, M.D.V., Stoye, J. (2009) Counting All DCJ Sorting Scenarios. LNCS, 5817, 36–47.
[4] Brightwell, G.R., Winkler, P. (2004) Note on Counting Eulerian Circuits, http://arxiv.org/abs/cs/0405067
[5] Miklós, I., Darling, A. (2009) Efficient sampling of parsimonious inversion histories with application to genome rearrangement in Yersinia. Genome Biology and Evolution, 1(1):153–164.
[6] Miklós, I., Kiss, S.Z., Tannier, E. (2013) On sampling SCJ rearrangement scenarios, http://arxiv.org/abs/1304.2170
[7] Miklós, I., Mélykúti, B., Swenson, K. (2010) The Metropolized Partial Importance Sampling MCMC mixes slowly on minimum reversal rearrangement paths. ACM/IEEE Transactions on Computational Biology and Bioinformatics, vol. 4, no. 7, 763–767.
[8] Miklós, I., Smith, H. (2015) Sampling and counting genome rearrangement scenarios. BMC Bioinformatics, 16(Suppl 14): S6.
[9] Miklós, I., Tannier, E. (2012) Approximating the number of Double Cut-and-Join scenarios. Theoretical Computer Science, 439:30–40.
[10] Ouangraoua, A., Bergeron, A. (2009) Parking Functions, Labeled Trees and DCJ Sorting Scenarios. LNCS, 5817, 24–35.
[11] Stratton, B.R., Campbell, P.J., Futreal, P.A. (2009) The cancer genome. Nature 458, 719–724.
There might be more than one solution to the problem, and

Team Work Scheduling

Gyorgy Dosa
Department of Mathematics
University of Pannonia, Veszprém, Hungary
dosagy@almos.vein.hu

Hans Kellerer
Institut für Statistik und Operations Research, Universität Graz, Austria
hans.kellerer@uni-graz.at

Zsolt Tuza ∗
Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences
tuza@dcs.uni-pannon.hu

ABSTRACT
We introduce a quite general scheduling model we call Team Work Scheduling. It mainly means that a team works together to process any job. A special version of it was recently defined as MultiProfessor scheduling, and an even more special version is the RAR problem. This last one means that parallel machine scheduling is considered with job assignment restrictions, i.e., each job can only be processed on a certain subset of the machines. Moreover, each job requires a set of renewable resources. Any resource can be used by only one job at any time. The objective is to minimize the makespan. We present approximation algorithms with constant worst-case bound in the case that each job requires only a fixed number of resources. For some special cases optimal algorithms with polynomial running time are given. On the other hand we prove that the problem is APX-hard, even when there are just three machines and the input is restricted to unit-time jobs.

Keywords
multiprofessor scheduling, approximation algorithm

1. INTRODUCTION
We define a general problem we call Team Work Scheduling, TWS for short. In this model jobs are given (as usual in the area of scheduling), but now each job is executed simultaneously by certain machines, i.e. a team. We minimize the makespan. We now give the exact definition of the new model.

Given $m$ machines and $n$ jobs, the set of machines is denoted by $M$, and the set of jobs is denoted by $N$. Moreover, for any job $j \in N$,

• given an integer $1 \le t_j \le m$, this parameter means that the team that processes job $j$ consists of $t_j$ different types of collaborators/machines,

• also given a collection of $t_j$ sets $C_j = (M_j^1, M_j^2, \ldots, M_j^{t_j})$, these sets are pairwise disjoint and $M_j^k \subseteq M$ for any $k \in \{1, \ldots, t_j\}$,

• furthermore given a collection of $t_j$ integer numbers $n_j^1, n_j^2, \ldots, n_j^{t_j}$, such that $n_j^k \le |M_j^k|$. Then $n_j^k$ means the required number of machines from set $M_j^k$. (That is, from the $k$-th type of machines specified for the job, there are $|M_j^k|$ possible machines, and from these machines "only" $n_j^k$ machines will be chosen.) Let $n_j = \sum_{k=1}^{t_j} n_j^k$; this integer means that altogether exactly $n_j$ machines will be chosen to process job $j$, from all types.

When we schedule the jobs, for any job $j$, the required number $n_j^k$ of machines must be chosen from the set of machines $M_j^k$, for any $k$. In case $n_j^k < |M_j^k|$ these machines are elective: we can freely choose any $n_j^k$ machines from the $|M_j^k|$ machines. Otherwise, if $n_j^k = |M_j^k|$, these machines are mandatory: all of them are needed for the execution of the job. The chosen machines are denoted by $T_j$ and called the team (chosen for job $j$).

Finally, given the processing time $p_{i,j}$ for any $(i, j)$ pair ($i \in M$, $j \in N$), this is the time needed to execute job $j$ by machine $i$. Each job will be executed by the team chosen for the job, so all these intended machines will run in parallel. For any job $j$, we choose the team, and we take the maximum of the $p_{i,j}$ processing times for the chosen machines (i.e. the team). This is the processing time of the job by the team, denoted by $q_j$. Naturally, $q_j$ is not given in advance; it depends on the choice of the team to execute job $j$.
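The dependence of $q_j$ on the chosen team can be illustrated with a small Python sketch. It is only an illustration under assumptions: the dictionary encoding of $p_{i,j}$, the function names, and the sample data (one surgeon, three nurses) are hypothetical. The helper `fastest_team` picks, within each machine set, the $n_j^k$ machines with the smallest processing times, which minimizes $q_j$ for a single job considered in isolation; in a full schedule a slower team may still be preferable because machines are shared among jobs.

```python
def team_processing_time(p, team, job):
    """q_j: the longest processing time p[(i, job)] over the team members."""
    return max(p[(i, job)] for i in team)


def fastest_team(p, machine_sets, required, job):
    """Pick required[k] machines from machine_sets[k], fastest first.

    machine_sets: list of disjoint machine lists (the sets M_j^k).
    required: list of integers (the numbers n_j^k).
    """
    team = []
    for machines, need in zip(machine_sets, required):
        if need > len(machines):
            raise ValueError("more machines required than available in a set")
        ranked = sorted(machines, key=lambda i: p[(i, job)])
        team.extend(ranked[:need])
    return team


# Hypothetical job 1: one mandatory surgeon, two of three nurses are elective.
p = {("surgeon", 1): 5, ("n1", 1): 3, ("n2", 1): 7, ("n3", 1): 4}
machine_sets = [["surgeon"], ["n1", "n2", "n3"]]
required = [1, 2]

team = fastest_team(p, machine_sets, required, 1)
q = team_processing_time(p, team, 1)
```

Here the mandatory surgeon alone already forces $q_1 \ge 5$, so choosing the slow nurse `n2` would raise the team processing time to 7 without any benefit for this job.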
Any machine can process at most one job at any time, and if a team is chosen for a job, no matter if some machine's processing time is smaller and another machine's processing time is larger in the team, all machines of the team are considered busy during the longest processing time $p_{i,j}$ for $i \in T_j$. We ask for the minimum time (i.e. makespan) until all jobs are executed by the machines. We are interested in both the offline and online case.

∗ Another affiliation: Department of Computer Science and Systems Technology, University of Pannonia, Veszprém, Hungary

2. APPLICATIONS
A typical online model is the following one. Accidents happen in an unpredictable way in a city, and the injured people are taken to a hospital where the necessary operations are performed. These operations are the jobs. For any operation, according to the nature of the injury, a special team is needed; the members of the team play the role of the machines. Let us consider one operation. It is possible that the presence of some doctors is indispensable. For example, only one doctor can perform the anesthesia, so he/she will surely be there during the operation. Also, there is an expert, the only one who can perform a special kind of operation. So both of them will be there; they play the role of some mandatory machines. Moreover, there are also several nurses who are free at that time, and any of them can be chosen to help the doctors. For example, from five such persons three must be selected; they play the role of elective machines. Suppose there are two operating rooms that are available at the moment; one of them must be chosen, so this is also an elective machine in our model. Naturally, the injured person also plays the role of a mandatory machine. The duration of the operation may depend on the chosen persons (as a proficient worker completes an activity faster than a beginner).

For another (offline) application let us consider a fast food restaurant, where some kinds of salads are made (among other foods). The machines are of different types.

a, Members of the staff (called makers) who make the salad.

b, Machines for mixing, heating, and other preparing operations.

c, Ingredients. For example, mustard is stored in some bottle, and the whole bottle is reserved for a salad-maker while he makes the salad, but not all of the content will be used, only some portion. So the bottle of mustard is a (mobile) machine.

Then all machines (i.e. the member who makes the salad, the devices that are needed, and all ingredients) are collected together, and by use of them the salad will be made ready. We want to make ready all ordered foods as soon as possible.

3. RELATED MODELS
Multiprofessor Scheduling (MPS for short). The MPS problem is characterized by the following settings: For any job $j$, $1 \le t_j \le 2$. The team that processes job $j$ consists of at most 2 different types of collaborators; if $t_j = 2$, then one type is mandatory, another is elective. The set of mandatory machines contains several machines, also the set of elective

pairs: $(P_i, L_j) \in C$ means that professor $P_i$ can deliver lecture $L_j$ if it is assigned to him, while $(P_s, L_t)^* \in C^*$ means that professor $P_s$ has to be present when $L_t$ is delivered by some other professor, who is assigned to this lecture. The MPS problem is still quite general; in [1] many (other) applications are also given. For example, MPS is still a generalization of the Restricted assignment (RA for short) problem or the Hierarchical scheduling problem (HS for short).

Restricted Assignment with Resources Problem (RAR for short, [3]). Finally we define an even more special case of the TWS model, which is a special case of the MPS model. We are given $n$ independent jobs $1, \ldots, n$ that are to be scheduled on $m'$ parallel machines $M_1, \ldots, M_{m'}$. In the restricted assignment problem (RA, for short, [2]) each job $j$ can be executed on a specific subset $M(j)$ of the machines, and on those machines the processing time of job $j$ is $p_j$. The objective is to minimize the makespan. In the three field notation, we abbreviate this problem by $R \mid p_{ij} \in \{p_j, \infty\} \mid C_{\max}$. Assume that additionally there are $\mu$ renewable resources $R_1, \ldots, R_\mu$ (then $m' + \mu = m$). Let $\Lambda_k$ be the set of jobs which require resource $R_k$, and let $\lambda_k$ denote the cardinality of set $\Lambda_k$, $k = 1, \ldots, \mu$. Job $j$ requires simultaneous availability of all resources in the set $R(j) \subseteq \{R_1, \ldots, R_\mu\}$ for processing; we denote by $\rho_j$ the cardinality of $R(j)$, $j = 1, \ldots, n$. Any resource can be used by only one job at any time. It means that two jobs which require the same resource cannot be processed simultaneously. We abbreviate this problem by $R \mid p_{ij} \in \{p_j, \infty\}, \mathrm{res}\,\mu \mid C_{\max}$. The degree of the problem is defined as the quantity $B = \max_{j=1,\ldots,n} \rho_j$, that is, the maximum number of resources required by a job.

4. RESULTS
Professors of the MPS model correspond to machines of the RAR model; the lectures are the jobs, and the duration of a lecture is the processing time. But RAR is only a particular case of MPS: the distinction between machines and resources means a partition of professors into two classes: those only delivering lectures ('Professors'), and the others only attending ('Instructors'). This special case of MPS is termed the PI model.

Among several results, it is proved in [1] that PI with unit-time lectures is NP-hard to $O(n^{1-\varepsilon})$-approximate for any fixed $\varepsilon > 0$ if there are $n$ professors and $O(n^2)$ instructors, even in the more restricted PI model where $C$ establishes a bijection between lectures and professors and when it is assumed further that any instructor is involved in just two conditions of $C^*$. On the other hand, still considering unit times, if the number of professors, or the number of lectures, is fixed, then MPS can be solved in linear time.

For the RAR model, we can prove inapproximability results and design approximation algorithms.
Our main negative machines, but exactly one machine must be chosen from the result is that the problem with unit-time jobs is AP X-hard, elective machines. The problem is defined and considered already on three machines. In the case that each job requires in [1]. It is evident, that MPS is a special case of the TWS only a bounded number of resources, we design approxi- problem. mation algorithms with constant worst-case bound, without any restrictions on processing times. For some special cases To explain better the MPS model, in this model we have a (e.g., unit-time jobs with degree B = 1) we design optimal set P = {P1, . . . , Pu} of professors and a set L = {L1, . . . , Ln} algorithms with polynomial running time. To derive the of lectures with two sets C and C∗ of conditions given by main negative result, we prove a theorem on graph coloring, 81 which seems to be of interest on its own right, too. It states AP X-hardness of the chromatic number on a restricted class of graphs. 5. ACKNOWLEDGMENTS The first author is partially supported by the project VKSZ 12- 1-2013-0088 Development of cloud based smart IT solutions by IBM Hungary in cooperation with the University of Pan- nonia. All three authors are partially supported by Stiftung Aktion Österreich-Ungarn, under grant 92öu1. The first and the last author are supported in part by the National Re- search, Development and Innovation Office – NKFIH under the grant SNN 116095. 6. REFERENCES [1] G. Dosa, Zs. Tuza, Multiprofessor Scheduling, online first, DAM, 2016, http://dx.doi.org/10.1016/j.dam.2016.01.035. [2] C. A. Glass and H. Kellerer, Parallel machine scheduling with job assignment restrictions, Naval Research Logistics, 54 (3), 250–257, 2007. [3] G. Dosa, H. Kellerer, Zs. Tuza, Restricted Assignment Scheduling with Resource Constraints, manuscript, 2016. 
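
To make the TWS definitions of Section 1 concrete, the following is a minimal Python sketch that enumerates the feasible teams of a job and evaluates the team processing time $q_j$ as the maximum of $p_{i,j}$ over the team. The greedy rule in `greedy_makespan` is only an illustrative list-scheduling heuristic, not an algorithm from this paper. The function names, the data layout (`p[i][j]`, jobs as `(type_sets, counts)` pairs) and the assumption that the machine sets of different types of one job are disjoint are all assumptions made for this example.

```python
from itertools import combinations, product


def teams(type_sets, counts):
    """Yield every feasible team: counts[k] machines drawn from type_sets[k].

    A type with counts[k] == len(type_sets[k]) is mandatory, otherwise elective.
    Assumption of this sketch: the type sets of one job are pairwise disjoint.
    """
    per_type = [combinations(sorted(ms), n) for ms, n in zip(type_sets, counts)]
    for choice in product(*per_type):
        yield frozenset(m for group in choice for m in group)


def team_time(p, team, j):
    """q_j: the longest processing time p[i][j] over the machines of the team."""
    return max(p[i][j] for i in team)


def greedy_makespan(jobs, p):
    """Take jobs in the given order; each job gets the team that finishes earliest.

    Hypothetical heuristic for illustration only, with no optimality guarantee.
    """
    free_at = {}  # machine -> time at which it becomes idle
    makespan = 0
    for j, (type_sets, counts) in enumerate(jobs):
        best = None
        for team in teams(type_sets, counts):
            start = max(free_at.get(i, 0) for i in team)
            finish = start + team_time(p, team, j)
            if best is None or finish < best[0]:
                best = (finish, team)
        finish, team = best
        for i in team:  # the whole team is busy until the job ends
            free_at[i] = finish
        makespan = max(makespan, finish)
    return makespan


# Three machines 0, 1, 2; p[i][j] is the time of job j on machine i.
p = {0: [3, 9], 1: [5, 2], 2: [4, 6]}
jobs = [
    ([{0}, {1, 2}], [1, 1]),  # job 0: machine 0 is mandatory, one of {1, 2} is elective
    ([{1, 2}], [1]),          # job 1: one elective machine out of {1, 2}
]
print(greedy_makespan(jobs, p))
```

In this small instance job 0 prefers the team {0, 2} (time 4) over {0, 1} (time 5), which leaves machine 1 free for job 1, so both jobs run in parallel.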
82 Incremental 2-D nearest-point search with evenly populated strips David Podgorelec University of Maribor Denis Špelič University of Maribor Faculty of Electrical Engeneering and Computer Faculty of Electrical Engeneering and Computer Science Science Maribor, Slovenia Maribor, Slovenia david.podgorelec@um.si denis.spelic@um.si ABSTRACT search is trivially handled in θ(n) time, but the problem be- The incremental nearest-point search successively inserts query comes more demanding when a recurring nearest-point prob- points into the space partition data structure, and the nearest- lem has to be solved. A straightforward repetition of the ba- point for each of them is simultaneously found among the sic nearest-point search results in θ(n2) time when applied to previously inserted points. The paper introduces a new ap- θ(n) query points. More advanced approaches use space par- proach to solve this problem in 2D-space. Dynamic par- titioning to bound the number of possible nearest-point can- tition successfully prevents situations with over-populated didates in each iteration [11]. The partition is accomplished strips but still fails to reach optimality. A variant with two by constructing a hierarchical or a grid data structure, typ- perpendicular partitions and four types of deterministic skip ically a tree [4], the Voronoi diagram [7], a regular grid [2], lists is therefore discussed as a possible extension. or a multi-level organization of these structures [11]. Such a data structure is aimed to accelerate solving the point- Categories and Subject Descriptors location problem i.e. determination of the region where a E.1 [Data Structures]: Lists, Stacks and Queues, Trees; query point lies. A static partition does not adapt itself to F.2.2 [Analysis of Algorithms and Problem Complex- the point distribution. 
On the contrary, a dynamic partition ity]: Nonnumerical Algorithms and Problems—geometrical maintains the numbers of points in all cells within previously problems and computations, sorting and searching determined limits. Particularly in higher dimensions, where either query time or storage space must be sacrificed, a user may also be satisfied by approximate solutions provided by General Terms the reasonably fast locality sensitive hashing technique [10]. Algorithms, Performance, Theory In this paper, we introduce an original dynamic plane parti- Keywords tion into parallel strips and utilize it to handle the so-called Incremental nearest-point, dynamic partition, deterministic incremental nearest-point search in 2-D space. This rep- skip list resents a special case of the recurring nearest-point search where: (1) the set of target points S and the set of query 1. INTRODUCTION points coincide, and (2) the points p1, ..., pn are successively The nearest-point search means a search for the target point inserted into the data structure and their nearest-points are p simultaneously found. The incremental search adequately i ∈ S = {p1, ..., pn}, such that the distance between pi and a given query point p is minimal. It enables or at least models interactive processing of database queries where the facilitates solving numerous practical problems from vari- results of previous queries are usually irrelevant for pro- ous research and application areas, such as computational cessing the current one. In computational geometry, a re- geometry [12], GIS [6], motion planning [9], and computer markably fast incremental Delaunay triangulation algorithm graphics [1]. Note that the distance need not refer to pure is based on the incremental nearest-point search [12]. geometric relation between two spatial points (e.g. Euc- lidean distance). This generalization extends the usability 2. 
DP-DSL APPROACH TO INCREMENTAL of the nearest-point search to database quering in the most NEAREST-POINT SEARCH versatile applications. The Voronoi diagram enables optimal O(n log n) time in the preliminary points arrangement approach, but the incremental If the distance is computable in θ(1) time, the nearest-point nearest-point search requires some of the incremental Voro- noi diagram construction algorithms which all, although fast on average, require quadratic time in the worst case [5]. For this reason and because of a relatively complex mainten- ance of the Voronoi diagrams, we preferably study other space partitioning techniques. First of all, we wish to keep practical advantages of the HT-DSL approach [11] and, sim- ultaneously, to improve its theoretical behaviour. The pion- eering HT-DSL approach represents even nowadays the only work where the incremental nearest-point search is explicitly 83 a) b) Figure 1: (1, 3)-deterministic skip list. Figure 2: a) HT-DSL and b) DP-DSL approach em- ployed on clusters of points. considered. It is based on a uniform plane subdivision into parallel strips. These static strips are directly accessible in ectly designed to prevent from such situations. The idea is O(1) time through a hash table (HT). On the other hand, straightforward: when a particular strip contains too many our DP-DSL approach uses a dynamic partition (DP) into points, the algorithm splits it into a pair of strips, each con- evenly populated strips. In both methods, the points in taining half of the points of the original strip. Under certain a particular strip are stored in (a, b)-deterministic skip list conditions, splitting may also result in three strips. The DP- (DSL) [8], providing a point insertion in O(log n) time and, DSL approach in Fig. 2b cuts the clusters by many narrow on the average, efficient nearest-point search inside the strip. strips, and leaves wide undivided strips between the clusters. 
The DP-DSL approach must additionally provide the func- tionality of DSL splitting when an over-populated strip is The DP-DSL approach requires additional data structure to split into two (or three) strips. store the strips’ borders. We use additional DSL named Borders for this purpose. It plays the same role as the hash table in the HT-DSL approach, but requires longer 2.1 Deterministic skip lists search time (logarithmic instead of constant) and dynamic Our implementation of (a, b)-DSL, inherited from [11], con- construction. Two types of strips are stored in Borders. A sists of a doubly linked list of points sorted regarding the line strip is a horizontal line, and an interval strip is a re- x-coordinate. Double connectivity assures that the move gion between two horizontal lines. The role of line strips from an arbitrary point to its direct predecessor or successor is to keep sizes of the interval strips limited. A line strip takes O(1) time. This list represents the basic level (level is introduced when the y-coordinates of two or more points 1) of the DSL. Its nodes (leaves) are accessible from simply correspond to the splitting threshold. linked lists of the internal nodes at higher levels. Each par- ent node (for example P in Fig. 1) at level h, h > 1, points Points in each DSL are sorted according to x-coordinates, to a single child node (C) at level h − 1. Nodes at level h − 1, but an over-populated strip should be split with regard to y. between the child nodes (C and C0) of two successive parent All the points in a line strip have the same y-coordinate and, nodes (P and P 0) form a gap. The gap size must be in range therefore, splitting is only sensible for the interval strips. [a, b]. Values stored in a gap are lower or equal to the value The splitting algorithm must firstly determine the splitting in the parent node. Consequently, value M must be set to threshold. We utilize the well-known SELECT algorithm [3] some ”safely” high value. 
which performs this task in linear time. The physical DSL splitting is realized by the original bricklaying approach. To access a particular leaf or to insert a new one, at most This firstly constructs level 1 for each of the two or three one child node and the successive gap must be examined at separate DSLs. This is achieved by moving the leaves of each of the dlog (n + 1)e levels, resulting in O(b log n) worst b the input DSL, one after another, to the end of the cor- time. By keeping b small, the logarithmic access time is responding separate list. Upper levels are then built from provided. Typical pairs (a, b) in practice are (1, 2), (1, 3), the elements of the so-called global list of recyclable nodes, (2, 5), and (3, 7). Fig. 1 shows a (1,3)-DSL. consisting of the eventual unused nodes from previous split- ting operations, the input DSL’s internal nodes and, only if The actual search for the nearest point to the query point p necessary, from newly allocated nodes. At each level, the was also inherited from [11]. It consist of the local search in algorithm groups the nodes into gaps of size b − gsc, where the strip where p was inserted, and the inter-cluster search gsc is a user-selected gap size correction parameter. which progresses up and/or down through the adjacent strips. 3. RESULTS AND ANALYSIS 2.2 Dynamic partition The number of strips in HT-DSL was experimentally de- √ The HT-DSL is remarkably fast for nearly uniform point dis- termined in the range m = θ( n). Consequently, the num- √ tributions. However, examples with much slower perform- ber of points in a strip is O( n) in an optimal case of the ance and also strongly affected by the points ordering can uniform point distribution. We have retained this result in effortlessly be constructed and, not rarely, also met in prac- the DP-DSL approach as well, and experimentally determ- tice. Example in Fig. 
2a consists of a few over-populated ined the best performance by splitting a strip when its size √ strips and, on the other hand, of a large majority of strips reaches q = d3 ne. We use (1, 3)-DSLs in the HT-DSL ap- containing only few points. The DP-DSL approach is dir- proach, and (2, 5)-DSLs in the DP-DSL approach. We have 84 a) b) a) d) {p , ..., p } 1 n b) c) d) c) p n ... Figure 3: Dynamic partition into strips: a) uniform point distribution, b) grid, and c-d) GIS datasets. {p , ..., p } 1 n – 1 w Figure 5: Testing examples of a) a ladder across all p p 1 r + 1 strips, b) clusters with uniform noise, c) a ladder h in a single strip of the HT-DSL approach, and d) Gaussian distribution with additional point. p p 2 w r + 2 Table 1: Comparison of times between HT-DSL and DP-DSL approaches Fig. HT/DP Time HT Time DP ... ... 3a 0.79 O(n log n) O(n log n) 3b 0.84 O(n log n) O(n log n) 3c 0.50 O(n log n) O(n log n) w 3d 0.47 O(n log n) O(n log n) p √ r – p 1 2 r – 1 4b 1.46 O(n n) O(n log n) √ 2 7.21 O(n n) O(n log n) h √ 4d 9.91 O(n n) O(n log n) √ √ 4a 0.59 O(n n) O(n n) p √ w p r 2 r 4c 203.73 θ(n2) O(n n) l l 1 2 Figure 4: Point organization in the ladder example. 3.1 Theoretical time complexity analysis Table 2 gives expected worst-case time complexities of all phases of both approaches. The construction of strips and maintenance of DSLs are optimal in both cases, assuring the also determined the best long-term performance by using desired O(n log n) time. On the other hand, the local search the gap size correction parameter gsc = 1. and the inter-strip search time are both above this limit. Note that the local search time of the DP-DSL approach In Table 1, the comparison between the HT-DSL and DP- could be improved to O(n log n) by splitting the DSLs of √ DSL approach is given. The time ratios in the second column size q = O(log n) instead of current q = d3 ne. This change were obtained for configurations of 5.000.000 points. Figs. 
does not modify theoretical worst-case time complexities of 2, 3 and 5 show the tested distributions with reduced num- other phases above the desired limits, but it usually results bers of points. The realistic examples in Figs. 3c-d consist of in slower practical performance due to the increased number 70.334 and 193.360 points, respectively. Expected time com- of DSL splits and initial positioning operations in much more plexities are listed in the last two columns. The examples DSLs during the inter-strip search. from Figs 5a and 5c entitled the ladder, were synthetically generated and represent the worst-case for the local search. We have also managed to construct an example that requires The construction is emphasized in Fig. 4. The condition O(n2) inter-strip search time in the DP-DSL approach. The xr − x1 < w < h results in θ(r2) time for a ladder with construction is too extensive to find place in this paper. It 2r = n − 1 points in the same strip. Thus the HT-DSL ap- is based on a geometric progression of x-coordinates with proach spends θ(n2) local search time for the example from common ratio 2. Even for relatively low n, the exponential Fig. 5c. growth will quickly produce x-coordinates out of the range of 85 tional” HT-DSL approach still achieve quadratic inter-strip Table 2: Time complexities of particular phases in search time. The DP-4DSLs variant seems to solve the con- both approaches. sidered problematic examples in optimal time, but a formal Phase HT-DSLP DP-DSL proof is still missing. Construction of the Voronoi diagram Strip identification O(n) O(n log n) √ on θ( n) points and utilization of two perpendicular DSLs Point insertion O(n log n) O(n log n) in each Voronoi cell could have a potential, but one should DSL splitting 0 O(n) √ first prove that such dynamic partition is generally possible, Maintenance of Borders 0 O( n log n) √ and then provide an efficient region splitting algorithm. 
Local search O(n2) O(n n) Inter-strip search O(n2) O(n2) 5. REFERENCES [1] S. Bouaziz, A. Tagliasacchi, and M. Pauly. Sparse iterative closest point. In Computer graphics forum, Table 3: Comparison of three approaches in con- volume 32, pages 113–123. Wiley Online Library, 2013. sidered critical cases: I - ladder, II - rotated ladder, [2] J. G. Cleary. Analysis of an algorithm for finding III - geometric progression, IV - rotated geometric nearest neighbors in euclidean space. ACM progression, V - ”regular”. Transactions on Mathematical Software (TOMS), Example HT-DSLDP DP-DSL DP-4DSLs Winner √ 5(2):183–192, 1979. I O(n2) O(n n) O(n log n) YH [3] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and II O(n2) O(n2) O(n log n) XV C. Stein. Introduction to algorithms, volume 6. MIT III O(n2) O(n2) O(n log n) XV press Cambridge, 2001. IV O(n log n) O(n log n) O(n log n) YH V O(n log n) O(n log n) O(n log n) various [4] E. Gómez-Ballester, L. Micó, and J. Oncina. Some approaches to improve tree-based nearest neighbour search algorithms. Pattern Recognition, 39(2):171–179, the IEEE 754 floating-point specification, making this con- 2006. struction fully theoretical, as we do not expect such extreme [5] L. J. Guibas, D. E. Knuth, and M. Sharir. values in industrial, GIS and other practical applications. Randomized incremental construction of delaunay and However, a rotated ladder can, with some additional con- voronoi diagrams. Algorithmica, 7(1-6):381–413, 1992. straints, also represent the O(n2) inter-strip time example. [6] C. S. Jensen, J. Kolářvr, T. B. Pedersen, and I. Timko. Nearest neighbor queries in road networks. 3.2 DP-4DSLs approach In Proceedings of the 11th ACM international symposium on Advances in geographic information We have recently developed an engineering solution which systems, pages 1–8. ACM, 2003. handles all the considered problematic examples in the de- sired (optimal) time bounds. Besides the horizontal DP, it [7] T. Kanda and K. Sugihara. 
Comparison of various additionally performs the vertical DP. In each strip, two or- trees for nearest-point search with/without the thogonal DSLs are constructed, the horizontal one sorted voronoi diagram. Information processing letters, regarding the x-coordinate, and the vertical one sorted re- 84(1):17–22, 2002. garding the y-coordinate. Each point is therefore placed into [8] J. I. Munro, T. Papadakis, and R. Sedgewick. four DSLs: XH-DSL (the one used in the DP-DSL approach) Deterministic skip lists. In Proceedings of the third and YH-DSL are assigned to each horizontal strip, and XV- annual ACM-SIAM symposium on Discrete DSL and YV-DSL are constructed in each vertical strip. In algorithms, pages 367–375. Society for Industrial and each iteration of the local search, the method performs one Applied Mathematics, 1992. move in each DSL which are all addressing the same radius r. [9] J. Nagasue, Y. Konishi, N. Araki, T. Sato, and The nearest-point of q is found when the first DSL (the win- H. Ishigaki. Slope-walking of a biped robot with k ner) manages to examine all the points within the distance nearest neighbor method. In Innovative Computing, r around q. We have not managed to theoretically prove op- Information and Control (ICICIC), 2009 Fourth timal time complexity but the performance in the considered International Conference on, pages 173–176. IEEE, problematic cases appears promising as seen in table 3. On 2009. the other hand, the HT-DSL and the DP-DSL outperform [10] H. Wang, J. Cao, L. Shu, and D. Rafiei. Locality the DP-4DSLs approach in ”regular” cases as maintenance sensitive hashing revisited: filling the gap between of two partitions and four DSLs is quite expensive. theory and algorithm analysis. In Proceedings of the 22nd ACM international conference on Information & 4. CONCLUSION Knowledge Management, pages 1969–1978. ACM, 2013. The paper considers a new (DP-DSL) approach to the incre- √ mental nearest-point search in 2-D. 
It guarantees θ( n) [11] M. Zadravec, A. Brodnik, M. Mannila, M. Wanne, and √ strips, each containing O( n) points and, therefore, success- B. Žalik. A practical approach to the 2d incremental fully prevents situations with over-populated strips and de- nearest-point problem suitable for different point √ creases the local search time from O(n2) to O(n n). In our distributions. Pattern Recognition, 41(2):646–653, opinion, this is an important acceleration, although the al- 2008. gorithm still fails to achieve an optimal O(n log n) time per- [12] M. Zadravec and B. Žalik. An almost formance characteristic for the preliminary points arrange- distribution-independent incremental delaunay ment approach. In addition, examples can be constructed triangulation algorithm. The Visual Computer, (although hardly met in practice) which, just as the ”tradi- 21(6):384–396, 2005. 86 Exploratory Equivalence on Hypercube Graphs Jurij Mihelič Uroš Čibej Luka Fürst Faculty of Computer and Faculty of Computer and Faculty of Computer and Information Science, Information Science, Information Science, University of Ljubljana University of Ljubljana University of Ljubljana Večna pot 113 Večna pot 113 Večna pot 113 Ljubljana, Slovenia Ljubljana, Slovenia Ljubljana, Slovenia jurij.mihelic@fri.uni-lj.si uros.cibej@fri.uni-lj.si luka.fuerst@fri.uni-lj.si ABSTRACT can be proved that an EE partition consisting of equivalence An exploratory equivalent (EE) partition of the vertex set classes P1, . . . , Ps reduces the number of subgraph isomor- of a graph G comprises sets of vertices that can be regarded phisms f : G → H that have to be considered during the as interchangeable when searching for copies of G in some search by a factor of Qs |P i=1 i|!. The goal of the maximum other graph. This property may be used to speed up the EE partition problem (MaxEE) is to find an EE partition search process. Since a graph may have multiple EE parti- that maximizes the reduction factor. 
tions, a natural problem is to find one that gives rise to a greatest speedup factor, i.e., a maximum EE partition. This For general graphs, the MaxEE problem is GI-hard [2]. problem is GI-hard for general graphs, so it makes sense to Therefore, it makes sense to study restricted classes of graphs. study restricted graph classes. In this paper, we focus on In this paper, we deal with hypercube graphs, which are im- the challenging class of hypercube graphs. We present a set portant both theoretically and practically. Being regular of rules to construct an EE partition for any such graph and and hence highly symmetric, these graphs are considered a prove that the resulting partition is maximum. suitable choice for the topology of a communication network [4]. Categories and Subject Descriptors G.2 [Discrete Mathematics]: Graph Theory; F.2 [Analy- The MaxEE problem on hypercube graphs has turned out sis of algorithms and problem complexity]: Nonnu- to be surprisingly challenging. Nevertheless, we have devised merical Algorithms and Problems a set of simple rules to construct a maximum EE partition for any such graph. Following the necessary definitions (Sec- tion 2), we give the construction rules and prove that they General Terms indeed produce a maximum EE partition (Section 3). Sec- Graph Theory, Algorithm tion 4 concludes the paper. Keywords exploratory equivalence, hypercube, algorithm, Hamming 2. PRELIMINARIES distance Let G = (V, E) with the vertex set V = {1, . . ., n} and the edge set E ⊆ V ×V be a simple undirected graph. Given an- 1. INTRODUCTION other simple undirected graph, H = (U, F ), an isomorphism f : G → H is a bijective mapping such that (f (u), f (v)) ∈ F In the world of planetary-scale networks, efficient subgraph iff (u, v) ∈ E. An automorphism G is an isomorphism from search is of paramount importance. 
However, the problem G to itself, and a subgraph isomorphism f : G → H is an of determining whether a pattern graph G is a subgraph isomorphism between G and a subgraph of H. of a host graph H (the subgraph isomorphism problem) is N P-complete, and although several algorithms perform rea- The set of automorphisms of a graph G, Aut(G), forms a sonably well in practice [1, 5, 6], they may fail if G has group under composition. A set A ⊆ Aut(G) covers a set many isomorphisms. In such cases, exploratory equivalence P ⊆ V (denoted cover(A, P )) if for each permutation σ of P [3] can be used to speed up the search. In particular, an there exists an automorphism a ∈ A such that a(i) = σ(i) for exploratory equivalent (EE) partition of G comprises equiv- all i ∈ P . The pointwise stabilizer of a set A ⊆ Aut(G) with alence classes (disjoint sets) of vertices that can be regarded respect to a set P ⊆ V is the set PointStab(A, P ) = {a ∈ A | as interchangeable during the search for copies of G in H. It ∀i ∈ P : a(i) = i}. An ordered partition hP1 | P2 | . . . | Psi of G (with Ss P i=1 i = V , ∀i, j, i 6= j : Pi ∩ Pj = ∅, and ∀i : Pi 6= ∅) is exploratory equivalent if cover(Ai−1, Pi) and Ai = PointStab(Ai−1, Pi) for all i ∈ {1, . . . , s}, where A0 = Aut(G). If hP1 | . . . | Psi is an EE partition of G and Pi = {vi1, . . . , vik }, then for each copy G0 of G in H there exists an i isomorphism f : G → G0 with f (vi1) < . . . < f (vik ) for i all i ∈ {1, . . . , s}. The number of subgraph isomorphisms 87 to be considered during the search for copies of G in H is In what follows, we first show how to construct maximum thus reduced by Qs |P exploratory equivalent partition of a hypercube graph Q i=1 i|! — the score of an EE partition d of hP1 | . . . | Psi. The goal of the MaxEE problem is to find a dimension d, then we prove its correctness and optimality. maximum-score EE partition of G. 
3.1 Construction For example, the maximum EE partition of a graph G with There are at most two non-singleton classes in an optimal n = 4 and E = {(1, 2), (2, 3), (3, 4), (1, 4)} (a 4-cycle and exploratory equivalent partition of Qd for any dimension d. also a 2-hypercube) is h1, 3 | 2, 4i. The set Aut(G) includes In particular, hypercubes Q1 and Q3 result in one such class, the automorphisms 1234 and 3214 (1 7→ 3, 2 7→ 2, 3 7→ whereas all other Qd’s result in two such classes. Our con- 1, 4 7→ 4), which cover the set P1 = {1, 3}, and the set struction (described below and denoted with HCEE) results PointStab(Aut(G), P1) = {1234, 1432} covers the set P2 = in an exploratory equivalent partition, which is also opti- {2, 4}. The partitions h1, 2 | 3 | 4i and h1 | 2 | 3 | 4i are also mal for any hypercube Qd except for Q3 which we deal with EE but not maximum. separately. The classes are as folows: The Hamming distance between binary vectors p = (p1, . . . , • The first class consists of any two vertices which are the pd) and q = (q1, . . . , qd) is h(p, q) = Pd |p i=1 i − qi|. A binary farthest apart, i.e., their Hamming distance is d. For vector bind(r) = (b1, . . . , bd) is the binary representation of example, Q an integer r if r = Pd 2d−ib 1 gives {0,1}, Q2 gives {00,11} or {01,10}, i=1 i. The d-hypercube graph (or and Q4 gives {0000,1111}, etc. simply the d-hypercube) is the graph Qd with n = 2d and E = {(u, v) | h(bind(u), bind(v)) = 1}. The vertices of a d- • The second class (when d ≥ 2) consists of all the ver- hypercube will be labeled bind(1), . . . , bind(2d) rather than tices adjacent to one of the vertices in the first class. 1, . . . , 2d. For example, taking the vertex labeled with d zeros, Q2 gives {10,01}, and Q4 gives {1000,0100,0010,0001} 3. HYPERCUBES as the second class. In this section we focus on the exploratory equivalence of • All other classes (when d ≥ 4) are singletons, i.e., each hypercube graphs. 
Such a graph contains vertices and edges vertex not in the first or the second class is a separate of a d-dimensional hypercube and is denoted with Qd. It class. contains 2d vertices, d2d−1 edges, and is a regular graph of degree d. Its number of automorphisms is |Aut(Qd)| = d! 2d. Several examples of such construction of exploratory equiv- A straightforward procedure to generate the hypercube of alent partitions are shown in Figure 1 (for Q1, Q2, and Q4), a given dimension d is to create a vertex for each d-digit and Figure 2 a) (for Q3, non-optimal). binary number and connect two vertices with an edge if their Hamming distance is one. See Figure 1 for several examples Lemma 1. Given a hypercube graph Qd of dimension d, of hypercubes of dimensions from 1 to 4 as well as their where d ≥ 1, the HCEE construction produces an exploratory respective maximum exploratory equivalent partitions; for equivalent partition. example, the corresponding partition of Q4 is h0000, 1111 | 1000, 0100, 0010, 0001i. Proof. The first class is clearly exploratory equivalent; indeed, any two vertices of Qd would suffice, but selecting 011 111 the two farthest apart (denoted here with u and v) leaves the 01 11 010 110 most room for the second class. There is no other class in Q1. Alternatively, notice that, N (u) = N (v) in Q2, otherwise, 001 101 when d ≥ 3, neighborhoods are disjoint, i.e., N (u) ∩ N (v) = ∅ (since h(u, v) = d, hence, the distance between vertices 0 1 00 10 000 100 from N (u) and N (v) is at least d − 2 ≥ 1). Q1 Q2 Q3 Fix both vertices u and v from the first class, i.e., A1 = PointStab(Aut(Qd), {u, v}). Observe that, cover(A1, N (u)) 0110 1110 0111 1111 is satisfied, since interchanging any two vertices from N (u) is possible (and leaving all other from N (u) on their posi- 0100 1100 0101 1101 tion). Consequently, all possible permutations of N (u) are attainable. 0010 1010 0011 1011 Observing that |N (u)| = |N (v)| = d, gives the following 0000 1000 0001 1001 corollary. 
Corollary 1. Given a hypercube graph $Q_d$ of dimension $d$, where $d \ge 2$, the HCEE construction produces an exploratory equivalent partition having $d + 2$ vertices in its non-singleton classes and $2^d - d - 2$ vertices in its singleton classes. The score of such a partition is $2\,d!$.

Figure 1: Several hypercubes (panels $Q_1$ to $Q_4$) and their maximum exploratory equivalent partitions. Vertices in the same class are of the same shade of gray; singletons are white.

Following the steps of the construction gives rise to many different exploratory equivalent partitions. We give their count in the following lemma.

Lemma 2. The HCEE construction can produce $2^d$ different exploratory equivalent partitions of the hypercube graph $Q_d$ of dimension $d$, where $d \ge 3$.

Proof. The first pair of vertices may be selected in $2^{d-1}$ ways, but then there are only two available neighborhoods to select from.

Representing a hypercube $Q_d$ explicitly with a list of vertices and edges is very inefficient. Hence, we assume that the input to the partition construction algorithm is only the number $d$ of dimensions, which is, in general, $n = O(\lg d)$ bits long. In this sense, even outputting one vertex label requires exponential time, i.e., $O(d) = O(2^n)$. Furthermore, when $d \ge 2$, there are $2 + d$ vertices in non-singleton classes. Thus, we have the following lemma.

Lemma 3. The time complexity of the HCEE construction is $O(d^2)$.

In practice, when $d$ is small enough, i.e., $d \le 64$ on today's architectures, one can assume that outputting a $d$-bit number is $O(1)$.

3.2 Optimality

To prove the optimality of our construction we need an additional notion of vertices whose distance from each other is the same. First, for each graph $Q_d$, we define a parameterized family of sets containing vertices of $Q_d$, where the Hamming distance between any two nodes in the set equals $h$, i.e.,

$$H_d(h) = \{H \subseteq V(Q_d) \mid \forall u, v \in H : h(u, v) = h\}.$$

Now, we determine the size of a maximum set in $H_d(h)$, i.e.,

$$\xi_d(h) = \max_{H \in H_d(h)} |H|.$$

In what follows we are interested in an upper bound on $\xi_d(h)$, since exploratory equivalent classes belong to $H_d(h)$. In particular, we have the following lemma.

Lemma 4. Given a hypercube graph $Q_d$ of dimension $d$ and its exploratory equivalent partition $\langle P_1, \ldots, P_s \rangle$, for any $1 \le i \le s$, it holds that $P_i \in H_d(h)$ for some $h$.

Proof. Singleton classes are contained in $H_d(0)$, and any two-vertex class $\{u, v\} \in H_d(h(u, v))$. Now, consider a three-vertex class $\{u, v, w\}$. If we interchange $u$ and $v$, then $w$ remains fixed only if $h(u, w) = h(v, w)$, and similarly for all other pairs in the class.

The following two theorems specify $\xi_d(h)$ for odd and even $h$, respectively, and are also of independent interest (see also Figure 1).

Table 1: A tabular representation of $\xi_d(h)$ from Theorem 2. Exceptions using the third case are framed (shown here in brackets). Empty cells are zeros.

d \ h |  0   2   4   6   8  10  12  14  16
    1 |  1
    2 |  1   2
    3 |  1  [4]
    4 |  1   4   2
    5 |  1   5   2
    6 |  1   6  [4]  2
    7 |  1   7  [4]  2
    8 |  1   8   4   2   2
    9 |  1   9   4  [4]  2
   10 |  1  10   5  [4]  2   2
   11 |  1  11   5  [4]  2   2
   12 |  1  12   6   4  [4]  2   2
   13 |  1  13   6   4  [4]  2   2
   14 |  1  14   7   4  [4]  2   2   2
   15 |  1  15   7   5  [4] [4]  2   2
   16 |  1  16   8   5   4  [4]  2   2   2

Theorem 1. Given a hypercube graph $Q_d$ of dimension $d$, where $d \ge 1$, if $h$ is odd then $\xi_d(h) = 2$.

Proof. For $d = 1$ this is straightforward. Without loss of generality, consider the vertex labeled $0 \ldots 0$. Any vertex at distance $h$ from it contains $h$ ones and $d - h$ zeros. Now, to obtain a third vertex at distance $h$ from the second, toggle $p$ ones to zero and $q$ zeros to one, where $p + q = h$. However, the third vertex cannot be at distance $h$ from the first: their distance is $h - p + q = 2h - 2p \ne h$, since $h$ is odd.

Theorem 2. Given a hypercube graph $Q_d$ of dimension $d$, where $d \ge 1$, if $h$ is even then

$$\xi_d(h) = \begin{cases} 0 & d < h \\ 1 & h = 0 \\ 4 & \tfrac{3}{2}h \le d < 2h \\ \left\lfloor \frac{d}{h/2} \right\rfloor & h \le d < \tfrac{3}{2}h \ \lor\ d \ge 2h. \end{cases}$$

Proof. The first two cases are obvious: there are no vertices $u, v \in Q_d$ with $h(u, v) > d$, and $h(u, v) = 0$ if and only if $u = v$.

Now, consider the last two cases, where $0 < h \le d$. To construct a maximum size set, begin from any vertex and observe the positions (in binary representation) altered when constructing the next vertex. Obviously, at each step there are $h$ positions altered, but to permit further steps $h/2$ of them are the ones just altered in the previous step, and $h/2$ of them are new ones (i.e., not yet altered). Observe that any other technique blocks further steps. Proceeding in this manner one can produce $\lfloor d/(h/2) \rfloor$ vertices, since there are at most $d$ positions altered in total. For example, for $Q_4$ and $h = 2$ we get 1000, 0100, 0010, 0001, and for $Q_9$ and $h = 4$ we get 110000000, 001100000, 000011000, 000000110.

There is an exception to the technique (represented by the third case), when only three vectors are generated, e.g., for $Q_3$ and $h = 2$ we get 100, 010, 001, but a better solution is 000, 110, 101, 011. Notice that under the condition of the third case $\lfloor d/(h/2) \rfloor + 1 = 4$.

Using these two theorems and Lemma 4 we now have the upper bound specified by the following corollary.

Corollary 2. Given a hypercube graph $Q_d$ of dimension $d$ and its exploratory equivalent partition $\langle P_1, \ldots, P_s \rangle$, for any $1 \le i \le s$, it holds that

$$|P_i| \le \begin{cases} 4 & d = 3 \\ d & d \ne 3. \end{cases}$$

Now, separately consider the hypercube $Q_3$. Its optimal solution is shown in Figure 2 b). Indeed, observe that it is exploratory equivalent (by interchanging any two vertices in the non-singleton class the other vertices remain fixed). Additionally, its size is 4, which is the maximum possible since, due to Theorem 2, $\xi_3(h) \le 4$ for any $h$. The solution is thus optimal, and we have the following theorem.

Theorem 3. A maximum exploratory equivalent partition of the hypercube graph $Q_3$ is $\{000, 011, 101, 110\}$ with the score of 24.

Figure 2: 3-dimensional hypercube: constructed vs. optimal exploratory equivalent partition (panels a) constructed and b) optimal).

Our main result is summarized in the following theorem.

Theorem 4. Given a hypercube graph $Q_d$ of dimension $d$, where $d \ge 4$, the HCEE construction produces a maximum exploratory equivalent partition with the score $2\,d!$.

Proof. We will prove that 1) we cannot improve the two constructed classes and 2) using more than two classes would also produce a lower score. The optimality of the obtained solution thus follows.

1) Since the size of any class is $\le d$, we cannot increase the second class, i.e., the $d!$ factor. Assume that we can improve upon the first class, say $k = |P_1| \ge 3$, potentially decreasing the size of the second class by $x$. The improved score would thus be $k!\,(d - x)! > 2\,d!$. Furthermore, $k^{k-2} > d^{x}$. Thus, $x \le k - 3$ and the size of the second class is $\ge d - (k - 3)$. Observe that a class of size $l$ alters at least $l - 1$ positions in its containing vectors. Together both classes would now alter $(k - 1) + (d - (k - 3) - 1) = d + 1$ positions, which is impossible on $Q_d$.

2) Notice also that $d - k + 2$ is the upper bound on the size of the second class as well as on the total size of all non-singleton classes except the first. Thus, using even more than two classes cannot improve the score, since $c! \ge a!\,b!$, where $c = a + b$.

4. CONCLUSIONS

In this article we dealt with the exploratory equivalence on $d$-dimensional hypercube graphs. We presented an efficient construction algorithm for an exploratory equivalent partition, which was further proven to be optimal.

While proving the optimality we also observed another interesting property of hypercubes: the maximum cardinality of the vertex sets with a given (pairwise) Hamming distance.

In future work we will explore a generalisation of hypercubes, i.e., hypergrids.

5. ACKNOWLEDGMENTS

This work was partially supported by the Slovenian Research Agency and the projects "P2-0095 Parallel and distributed systems" and "N2-0053 Graph Optimisation and Big Data".

6. REFERENCES

[1] L. P. Cordella, P. Foggia, C. Sansone, and M. Vento. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(10):1367–72, Oct. 2004.
[2] L. Fürst, U. Čibej, and J. Mihelič. Maximum exploratory equivalence in trees. In 2015 Federated Conference on Computer Science and Information Systems, FedCSIS 2015, Lódź, Poland, September 13-16, 2015, pages 507–518, 2015.
[3] J. Mihelič, L. Fürst, and U. Čibej. Exploratory equivalence in graphs: Definition and algorithms. In 2014 Federated Conference on Computer Science and Information Systems, FedCSIS 2014, Warsaw, Poland, September 7-10, 2014, pages 447–456, 2014.
[4] T. H. Szymanski. On the permutation capability of a circuit-switched hypercube. In International Conference on Parallel Processing (ICPP '89), pages 103–110, 1989.
[5] J. R. Ullmann. An Algorithm for Subgraph Isomorphism. J. Assoc. for Computing Machinery, 23:31–42, 1976.
[6] U. Čibej and J. Mihelič. Improvements to Ullmann's algorithm for the subgraph isomorphism problem. International Journal of Pattern Recognition and Artificial Intelligence, 29(07):1550025, 2015.

Partitioning polyominoes into polyominoes of at most 8 vertices, mobile vs. point guards

Ervin Győri ∗, Tamás Róbert Mezei †

August 29, 2016

Abstract

We prove that every simply connected polyomino of $n$ vertices can be partitioned into $\lfloor \frac{3n+4}{16} \rfloor$ (simply connected) polyominoes of at most 8 vertices. It yields a new and shorter/simpler proof of the theorem of A. Aggarwal that $\lfloor \frac{3n+4}{16} \rfloor$ mobile guards are sufficient to control the interior of an $n$-vertex orthogonal polygon. Moreover, we strengthen this result by requiring combinatorial guards (visibility is only needed at the endpoints of patrols) and prohibiting intersecting patrols. This yields positive answers to two questions of O'Rourke [7, Section 3.4]. Our result is also a further example of the metatheorem that (orthogonal) art gallery theorems are based on partition theorems.
We also found an interesting sharp bound on the ratio of the necessary number of appropriate mobile and point guards.

∗ MTA Renyi Institute/CEU Math. Dept. (Budapest), gyori@renyi.hu
† CEU Math. Dept. (Budapest), tamasrobert.mezei@gmail.com

Kahn, Klawe and Kleitman in 1980 proved that $\lfloor n/4 \rfloor$ guards are sometimes necessary and always sufficient to cover the interior of an orthogonal polygon of $n$ vertices. Later the first author of this paper provided a simple and short proof of this result by proving the following theorem.

Theorem 1 ([3], [7, p. 68]). Every simply connected polyomino of $n$ vertices can be partitioned into $\lfloor n/4 \rfloor$ polyominoes of at most 6 vertices.

This theorem is a deeper result than that of Kahn, Klawe and Kleitman, and gave the first hint of the existence of a "metatheorem": (orthogonal) art gallery theorems have underlying partition theorems. The general case was proved by Hoffmann [5].

Theorem 2 ([5]). Every (not necessarily simply connected!) polyomino with $n$ vertices can be covered by $\lfloor n/4 \rfloor$ guards.

Hoffmann's method (partitioning into smaller polyominoes that can be covered by one guard) is another proof of the metatheorem. In this paper, we present further evidence that the metatheorem holds, namely we prove the following partition theorem:

Theorem 3. Any simple polyomino of $n$ vertices can be polyomino-partitioned into at most $\lfloor \frac{3n+4}{16} \rfloor$ polyominoes of at most 8 vertices.

The mobile guard art gallery theorem for simple orthogonal polygons follows immediately, as a polyomino of at most 8 vertices can be covered by a single mobile guard.

Theorem 4 ([1], proof in [7, p. 91]). $\lfloor \frac{3n+4}{16} \rfloor$ mobile guards are sufficient for covering an $n$-vertex simple orthogonal polygon.

The proof of Aggarwal is about 20 pages long, so we not only present a shorter proof, but also provide a stronger result which is very interesting on its own. It fits into the series of results in [3], [5], [7, p. 68] showing that rectilinear art gallery theorems are based on theorems on partitions of polyominoes into smaller (one guardable) polyominoes.

The proof of the main theorem is similar to O'Rourke's proof in that it finds a suitable cut and then uses induction on the parts created by the cut. However, here a cut along a line connecting two concave vertices is not automatically good. In case we have no such cuts, we also rely heavily on a tree structure of the polyomino. However, we must consider L-shaped cuts too, which are responsible for most of the extra complexity of our analysis.

The proof yields an $O(n^2)$ algorithm partitioning $P$ into at most $\lfloor \frac{3n+4}{16} \rfloor$ simple polyominoes. The running time can be improved to $O(n)$ by using linear-time triangulation (Chazelle [2]). Theorem 4 fills a gap between two already established (sharp) results: in [3] it is proved that polyominoes can be partitioned into at most $\lfloor n/4 \rfloor$ polyominoes of at most 6 vertices, and in [4] it is proved that any polyomino in general position (a polyomino without 2-cuts) can be partitioned into $\lfloor n/6 \rfloor$ polyominoes of at most 10 vertices. However, we do not know of a sharp theorem about partitioning polyominoes into polyominoes of at most 12 vertices. Furthermore, for $k \ge 4$, not much is known about partitioning (not necessarily simply connected) orthogonal polygons into polyominoes of at most $2k$ vertices. According to the metatheorem, the first step in this direction would be proving that an orthogonal polygon of $n$ vertices with $h$ holes can be partitioned into $\lfloor \frac{3n+4h+4}{16} \rfloor$ polyominoes of at most 8 vertices. This would generalize the corresponding art gallery result in [4, Thm. 5.].

From the mentioned results we can read off that, from an extremal point of view, point guards are 3/4 as efficient as mobile guards. The following theorem provides insight into why this is the case, as the 3/4 bound appears already for a single polygon.

Theorem 5.
Given a simple orthogonal polygon $P$, let $m_V$ be the minimum number of vertical mobile guards necessary to cover $P$, let $m_H$ be defined analogously for horizontal mobile guards, and finally let $p$ be the minimum number of point guards necessary to cover $P$. Then

$$\frac{m_V + m_H - 1}{p} \ge \frac{3}{4}.$$

The proof of this theorem can be turned into an $8/3$-approximation algorithm for covering simple orthogonal polygons with point guards.

References

[1] A. Aggarwal, The art gallery theorem: its variations, applications, and algorithmic aspects, Ph.D. thesis, Johns Hopkins Univ. (1984)
[2] B. Chazelle, Triangulating a simple polygon in linear time, Discrete and Computational Geometry 6 (1991), 485–524.
[3] E. Győri, A short proof of the rectilinear art gallery theorem, SIAM J. Alg. Disc. Meth. 7 (1986), 452–454.
[4] E. Győri, F. Hoffmann, K. Kriegel, T. Shermer, Generalized guarding and partitioning for rectilinear polygons, Computational Geometry 6 (1996), 21–44.
[5] F. Hoffmann, On the rectilinear art gallery problem, Proc. ICALP '90, Lecture Notes in Comp. Sci. 443 (1990), pp. 717–728.
[6] J. Kahn, M. Klawe, and D. Kleitman, Traditional galleries require fewer watchmen, SIAM J. Alg. Disc. Meth. 4 (1983), 194–206.
[7] J. O'Rourke, Art gallery theorems and algorithms, Oxford University Press 57 (1987)

On Linear Grammars with Exact Control

Dávid Angyal
Department of Computer Science, Faculty of Informatics, University of Debrecen
angyal.david@inf.unideb.hu

Benedek Nagy
Department of Mathematics, Faculty of Arts & Sciences, Eastern Mediterranean University
nbenedek.inf@gmail.com

ABSTRACT

Grammars with exact control are controlled grammars with the condition that every word of the control language results in at least one word of the derived language. In this paper, an infinite family of semi-linear languages is presented where the base grammar is a linear grammar and the control language is a linear language or a language class obtained in this manner. Already the class of languages generated by linear grammars with exact linear control contains some non-context-free languages. A normal form result for these systems and pumping lemmas are shown to help to prove the infinite hierarchy.

Keywords

linear grammars, controlled grammars, parsing, hierarchy

1. INTRODUCTION

The class of linear context-free languages was already defined by Chomsky, and it lies properly between the regular and context-free language classes; some of its properties are even inherited from the class of regular languages. They are recognized by finite automata equipped with two reading heads, starting from the two extremes of the input word, see [6, 7]. Because of the big gaps between language classes there are various formalisms to obtain other language classes. Several such formalisms belong to the field of regulated rewriting [2, 5]. As examples, matrix grammars and controlled grammars are mentioned here. Let us look at the latter in more detail. In a controlled grammar there is a base grammar and a control language. The control language is used to filter the derivations of the base grammar: only those derivations are valid in these systems whose derivation word belongs to the control language. We could say that one can obtain only words that belong to derivations of the intersection of the control language and the Szilard language. If the control language is the Szilard language [4] of the base grammar, then exactly the same language is generated without and with this control. Grammars with exact control were introduced in [1]. In these systems only those control languages are allowed that are subsets of the Szilard language (of the base grammar). In this way new classes of languages are obtained. In this paper, our starting point is the class of linear grammars, and linear grammars with linear exact control. We can also use the languages obtained in this manner as control languages for linear base grammars, and so on.

2. DEFINITIONS AND PRELIMINARIES

Let FIN, REG, LIN, CF, CS, RE denote the family of finite, regular, linear, context-free, context-sensitive, recursively enumerable languages, respectively. Let $\lambda$ denote the empty word.

Definition 1. [3] A linear grammar is a quadruple $G = (N, T, S, P)$, where $N$ is the finite set of nonterminal symbols, $T$ is the finite set of terminal symbols, $S \in N$ is the start-symbol, and $P$ is the finite set of productions of the form $R \to aQb$, where $R \in N$, $Q \in N \cup \{\lambda\}$, and $a, b \in T^*$. For $uxv, uyv \in (N \cup T)^*$, $uxv \Rightarrow uyv$ if $x \to y \in P$, and $\Rightarrow^*$ is the reflexive and transitive closure of $\Rightarrow$. The language generated by $G$ is defined as $L(G) := \{w \mid S \Rightarrow^* w \wedge w \in T^*\}$.

Now we recall a class of 2-head finite automata.

Definition 2. (Based on [6, 7]) A 2-head finite automaton $(Q, T, q_0, \delta, F)$ is defined, similarly to a non-deterministic finite automaton, with the finite set of states $Q$, input (or tape) alphabet $T$, initial state $q_0 \in Q$, transition function $\delta : Q \times (T \cup \{\lambda\})^2 \to 2^Q$, and set of final (or accepting) states $F \subseteq Q$. Initially the 2 heads are located at the two extremes of the input word, and can read the first and the last letter of the input, respectively (if any). The next state is chosen according to $\delta$ based on the actual state and the letters read by the two heads (it is also allowed to read the empty word by any or both heads). The heads move in opposite directions. The input word is accepted if the automaton reaches a final state when the heads meet. The accepted language consists of every word that can be accepted by the automaton.

Lemma 1. [6, 7] The class of 2-head finite automata accepts exactly the class of linear context-free languages.

Lemma 2. (Bar-Hillel Lemma) [3] For every $L \in$ CF, there exists a constant $n \in \mathbb{N}$ such that if $z \in L$ and $|z| \ge n$, then $z$ can be written as $z = x_0 y_1 x_1 y_2 x_2$ such that $y_1 y_2 \ne \lambda$, $|y_1 x_1 y_2| \le n$, and $\forall i \in \mathbb{N} : x_0 y_1^i x_1 y_2^i x_2 \in L$.

Definition 3. [1] $G = (N, T, S, P, C)$ is a grammar with exact control, if
• $N$ is the finite set of nonterminal symbols,
• $T$ is the finite set of terminal symbols,
• $S \in N$ is the start-symbol,
• $P$ is the finite set of productions of the form $\alpha : u \to v$, where
  – $\alpha$ is the label (id) of the production,
  – $u \in (N \cup T)^* N (N \cup T)^*$,
  – $v \in (N \cup T)^*$,
• and $C$ is a set of words over the alphabet $\mathrm{Labels}(P) := \{\alpha \mid \alpha : u \to v \in P\}$,
• moreover the following constraint must hold: $\forall c_1 c_2 \ldots c_n \in C : \exists w \in T^* : S \Rightarrow_{c_1} \cdots \Rightarrow_{c_n} w$; where $xuy \Rightarrow_c xvy$ if $c : u \to v \in P$.

The generated language is defined as $L(G) := \{w \mid \exists c_1 c_2 \ldots c_n \in C : (S \Rightarrow_{c_1} \cdots \Rightarrow_{c_n} w \wedge w \in T^*)\}$.

We say that $G$ is an X grammar with exact Y control, if
• $(N, T, S, \{u \to v \mid \alpha : u \to v \in P\})$ is a grammar of type X and
• $C$ is a language of type Y,
where X, Y $\in$ {FIN, REG, LIN, CF, CS, RE}. We denote the family of languages generated by X grammars with Y exact control by EC(X,Y).

Definition 4. Let
• $\mathrm{EC}_1$(X,Y) := EC(X,Y),
• $\mathrm{EC}_n$(X,Y) := EC(X, $\mathrm{EC}_{n-1}$(X,Y)), if $n > 1$.

3. A SIMPLE PARSING ALGORITHM FOR LINEAR GRAMMARS

In this section, we assume that in linear grammars, every production is in one of the following forms: $B \to aC$, $B \to Ca$, or $B \to \lambda$ (where $B, C \in N$ and $a \in T$).

Algorithm 1. Let $w = w_1 \ldots w_n$ be the input word, where $w_i \in T$ ($i \in \{1, \ldots, n\}$). Let $M$ denote an $(n + 1) \times (n + 1)$ upper-left-triangular matrix. We index the rows and columns from 0 to $n$. Each matrix entry is a subset of the nonterminal symbols, initially the empty set.

Apply the following rules until no new nonterminal symbol can be added to the matrix.
• Add $S$ to $M(0, 0)$.
• For $i > 0$: Add $A$ to $M(i, j)$, if $\exists B \in M(i - 1, j) : B \to A w_{n+1-i} \in P$.
• For $j > 0$: Add $A$ to $M(i, j)$, if $\exists B \in M(i, j - 1) : B \to w_j A \in P$.

After completing the matrix, check the diagonal entries for nonterminal symbols that can be erased, where $M(i, j)$ is a diagonal entry if and only if $i + j = n$. The word $w$ is in the language generated by the given grammar if and only if there exists $E \in N$ such that $E$ is in at least one of the diagonal entries of $M$, and $E \to \lambda \in P$.

Example 1. Let $G = (\{S, A, E\}, \{x, y\}, S, \{S \to yA, S \to Sx, S \to yE, A \to yS, E \to \lambda\})$ and $w = yyyxx$. We find that $w \in L(G)$, since $E \in M(2, 3)$ and $E \to \lambda \in P$.

4. THE FAMILIES $\mathrm{EC}_n$(LIN,LIN)

In this section, we investigate some interesting properties of the language family $\mathrm{EC}_n$(LIN,LIN) ($n \in \mathbb{N}$).

Theorem 1. Every language in $\mathrm{EC}_n$(LIN,LIN) ($n \in \mathbb{N}$) is semi-linear (in Parikh-sense).

Claim 1. There exists $L \in$ EC(LIN,LIN) such that $L$ is non-context-free.

Claim 2. $\mathrm{EC}_n$(LIN,LIN) is closed under union ($n \in \mathbb{N}$).

Lemma 3. If $L \in \mathrm{EC}_n$(LIN,LIN) and $c$ is a terminal letter, then $L \cdot \{c\}, \{c\} \cdot L \in \mathrm{EC}_n$(LIN,LIN) ($n \in \mathbb{N}$).

Theorem 2. $\mathrm{EC}_n$(LIN,LIN) is closed under (erasing) homomorphisms ($n \in \mathbb{N}$).

5. NORMAL FORMS

Lemma 4. LIN = EC(LIN,REG)

Definition 5. A linear context-free grammar is in simple form, if it is given as a linear grammar with regular exact control
• having a single nonterminal symbol (i.e., $N = \{S\}$), and
• having no chain rule in its production set (i.e., there is no production $S \to S$).

Lemma 5. For every grammar $G$ of type LIN there exists a grammar $G'$ of type EC(LIN,REG) that is in simple form and $L(G) = L(G')$.

Definition 6. A grammar $G$ of type $\mathrm{EC}_n$(LIN,LIN) ($n \in \mathbb{N}$) is in simple form if it has a single nonterminal symbol, and it has no chain rules.

Theorem 3. For every grammar $G$ of type $\mathrm{EC}_n$(LIN,LIN), there exists a grammar $G'$ of type $\mathrm{EC}_n$(LIN,LIN) in simple form ($n \in \mathbb{N}$).

6. PUMPING LEMMAS FOR LANGUAGES GENERATED BY EXACT CONTROL

Lemma 6. For every $L \in$ EC(LIN,LIN), there exists a constant $k \in \mathbb{N}$, such that if $w \in L$ and $|w| \ge k$, then $w$ can be written as $w = x_0 y_1 x_1 y_2 x_2 y_3 x_3 y_4 x_4$ such that
• $y_1 y_2 y_3 y_4 \ne \lambda$,
• $\forall i \in \mathbb{N} : x_0 y_1^i x_1 y_2^i x_2 y_3^i x_3 y_4^i x_4 \in L$.

Proof. Let $G = (N, T, S, P, C)$ be a grammar of type EC(LIN,LIN) in simple form that generates $L$. The control language $C$ is linear, therefore Lemma 2 can be applied. Let $n \in \mathbb{N}$ denote a pumping constant of $C$, and let $C|_n := \{w \in C \mid |w| < n\}$. Clearly $C|_n$ is a finite set. Consider the grammar $G' := (N, T, S, P, C|_n)$ of type EC(LIN,FIN). $L(G') \in$ FIN and let

$$k := 1 + \max_{w \in L(G')} \{|w|\}.$$

We claim that $k$ is a pumping constant for $L$. The number $n$ was a pumping constant for the control language of $L$, therefore every control word having length greater than or equal to $n$ can be pumped. By the definition of $k$, $k - 1$ is the maximal length for a word in $L$ that can be generated by control words having length less than $n$. Therefore, if a word in $L$ has length of at least $k$, it has to be obtained by a control word having length at least $n$, i.e., a control word which can be pumped.

Suppose $w \in L$, $|w| \ge k$ and $c$ is a control word for $w$. Then $|c| \ge n$ and by Lemma 2 $c = x_0 y_1 x_1 y_2 x_2$ and the subwords $y_1$ and $y_2$ can be pumped. Let $y_1 = y_{1,1} \ldots y_{1,l(1)}$ and $y_2 = y_{2,1} \ldots y_{2,l(2)}$, where $y_{i,j} : S \to a_{i,j} S b_{i,j}$ is a production ($S \in N$, $a_{i,j}, b_{i,j} \in T^*$), for every $j \in \{1, \ldots, l(i)\}$ and $i \in \{1, 2\}$. Finally, $w = x'_0 y'_1 x'_1 y'_2 x'_2 y'_3 x'_3 y'_4 x'_4$, where $y'_1 = a_{1,1} \ldots a_{1,l(1)}$, $y'_2 = a_{2,1} \ldots a_{2,l(2)}$, $y'_3 = b_{2,l(2)} \ldots b_{2,1}$, $y'_4 = b_{1,l(1)} \ldots b_{1,1}$. It is easy to see that as we pump the control word $c = x_0 y_1 x_1 y_2 x_2$, for any $i \in \mathbb{N}$ the control word $x_0 y_1^i x_1 y_2^i x_2$ will generate the word $x'_0 {y'_1}^i x'_1 {y'_2}^i x'_2 {y'_3}^i x'_3 {y'_4}^i x'_4$.

Theorem 4. For every $L \in \mathrm{EC}_n$(LIN,LIN), there exists a constant $k \in \mathbb{N}$, such that if $w \in L$ and $|w| \ge k$, then $w$ can be written as $w = x_0 y_1 x_1 y_2 x_2 \ldots y_{2^{n+1}} x_{2^{n+1}}$ such that
• $y_1 y_2 \ldots y_{2^{n+1}} \ne \lambda$,
• $\forall i \in \mathbb{N} : x_0 y_1^i x_1 y_2^i x_2 \ldots y_{2^{n+1}}^i x_{2^{n+1}} \in L$.

Proof. By induction on $n$.
• Base case: $n = 1$. Proved in Lemma 6.
• Suppose the claim is true for every $m < n$; we show that it is also true for $n$. If $L \in \mathrm{EC}_n$(LIN,LIN) and $G = (N, T, S, P, C)$ is a grammar in simple form that generates $L$, then $C \in \mathrm{EC}_{n-1}$(LIN,LIN). By the induction hypothesis, there exists $k \in \mathbb{N}$ such that every word in $C$ having length at least $k$ can be pumped. Now consider the grammar $G' = (N, T, S, P, \{w \in C \mid |w| < k\})$. Let

$$k' := 1 + \max_{w \in L(G')} \{|w|\}.$$

In other words, every word in $L$ with length at least $k'$ can be generated by a control word that can be pumped. Let $y_i := y_{i,1} \ldots y_{i,l(i)}$ denote the subwords of the control word to be pumped and let $y_{i,j} : S \to a_{i,j} S b_{i,j} \in P$ ($S \in N$, $a_{i,j}, b_{i,j} \in T^*$), for every $j \in \{1, \ldots, l(i)\}$ and $i \in \{1, \ldots, 2^n\}$. Next we choose the subwords of the generated word: let

$$y'_i := \begin{cases} a_{i,1} \ldots a_{i,l(i)}, & \text{if } 1 \le i \le 2^n, \\ b_{2^{n+1}-i+1,\,l(2^{n+1}-i+1)} \ldots b_{2^{n+1}-i+1,\,1}, & \text{if } 2^n + 1 \le i \le 2^{n+1}. \end{cases}$$

This formula is the generalised form of the formula shown in Lemma 6. There are $2^{n+1}$ subwords. The first $2^n$ subwords get the $a_{i,j}$ words in order, and the second $2^n$ subwords get the $b_{i,j}$ words in reverse order. It is easy to see that as the control word is pumped, the generated word is pumped too, i.e., for any $i \in \mathbb{N}$ the control word $x_0 y_1^i x_1 y_2^i \ldots y_{2^n}^i x_{2^n}$ generates the word $x'_0 {y'_1}^i x'_1 {y'_2}^i \ldots {y'_{2^{n+1}}}^i x'_{2^{n+1}}$.

7. INFINITE HIERARCHY

Now we are ready to establish our main result.

Theorem 5. $\mathrm{EC}_n$(LIN,LIN) $\subsetneq$ $\mathrm{EC}_{n+1}$(LIN,LIN) ($n \in \mathbb{N}$).

Proof. We are going to use the language family $L(k) := \{(a^n b^n)^k \mid n \in \mathbb{N}\}$ as separating languages, where $k \in \mathbb{N}$. Clearly $L(1) \in$ LIN, for instance $G = (\{S\}, \{a, b\}, S, \{S \to aSb, S \to ab\})$ generates it.

We show that $L(2^k) \in \mathrm{EC}_k$(LIN,LIN). In every case, the nonterminal alphabet, the terminal alphabet, the production set and the start symbol are always the same, only the control language changes.
• $N = \{S\}$,
• $T = \{a, b\}$,
• $P = \{\alpha : S \to aSb,\ \beta : S \to bSa,\ \gamma : S \to \lambda\}$.

Let $h : T^* \to \{\alpha, \beta\}^*$ be defined as $h(a) := \alpha$, $h(b) := \beta$. Note that $h$ can be interpreted as a simple letter substitution. There is no need for a general homomorphism.

Induction by $k$:
• Base case: $k = 1$: $L(2)$ is generated with the control set $h(L(1)) \cdot \{\gamma\}$, i.e., $\{\alpha^n \beta^n \gamma \mid n \in \mathbb{N}\} \in$ LIN.
• Suppose the claim is true for $m < k$ where $k > 1$; we show that it is also true for $k$: $L(2^k) \in \mathrm{EC}_k$(LIN,LIN) can be generated with the control set $h(L(2^{k-1})) \cdot \{\gamma\} \in \mathrm{EC}_{k-1}$(LIN,LIN).

See Lemma 3 on the control set modifications.

Now, we show that $L(2^{k+1}) \notin \mathrm{EC}_k$(LIN,LIN). We can use Theorem 4. Consider the word $(a^n b^n)^{2^{k+1}}$, where $n$ is a pumping constant of $L(2^{k+1})$. Every word in $L(2^{k+1})$ has exactly $2^{k+1}$ occurrences of the subword $ab$ and exactly $2^{k+1} - 1$ occurrences of the subword $ba$, and therefore the word consists of $2^{k+1}$ blocks of $a$'s and the same number of blocks of $b$'s. The subwords $y_1, y_2, \ldots, y_{2^{k+1}}$ have to be chosen in a way such that each $y_i$ may contain only $a$'s or only $b$'s. However, the $2^{k+1}$ subwords can take place in at most $2^{k+1}$ blocks, leaving at least another $2^{k+1}$ blocks out of pumping. Pumping subwords in the selected blocks will result in a different number of letters in the different blocks, which contradicts the language definition.

8. CONCLUSIONS

Grammars with exact control are a relatively new family of generative systems. In this paper linear grammars are considered. The derivations of these grammars are simple; actually, they have regular Szilard languages. However, by using an exact linear control, they are able to generate some non-context-free languages. By allowing a kind of iteration on the control language, an infinite hierarchy of language classes is presented here. We close the paper with some future directions of this research line.

We have left open some interesting questions about these classes. Is $\bigcup_{i=1}^{\infty} \mathrm{EC}_i$(LIN,LIN) closed under concatenation? In other words, for arbitrary $L_1, L_2 \in \mathrm{EC}_n$(LIN,LIN) is there a language $L \in \mathrm{EC}_m$(LIN,LIN) such that $L_1 \cdot L_2 = L$? Is the Dyck-language contained in $\mathrm{EC}_n$(LIN,LIN) for some $n \in \mathbb{N}$? And in general, what is the relation of the family of context-free languages to the above families?

As a matter of our future work we are working on an extension of Algorithm 1 to the family EC(LIN,LIN).

9. ACKNOWLEDGMENTS

Supported through the New National Excellence Program of the Ministry of Human Capacities.

The authors thank the anonymous reviewer for the helpful comments.

10. REFERENCES

[1] D. Angyal and B. Nagy. On language families generated by controlled grammars. In R. Freund, M. Holzer, N. Moreira, and R. Reis, editors, Seventh Workshop on Non-Classical Models of Automata and Applications - NCMA 2015, Porto, Portugal, August 31 - September 1, 2015. Proceedings, volume 318 of books@ocg.at, pages 59–72. Österreichische Computer Gesellschaft, 2015.
[2] J. Dassow and Gh. Păun. Regulated Rewriting in Formal Language Theory, volume 18 of EATCS Monographs in Theoretical Computer Science. Springer-Verlag Berlin, 1989.
[3] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
[4] E. Mäkinen. A bibliography on Szilard languages, technical report, Department of Computer Sciences, University of Tampere, 1998.
[5] A. Meduna and P. Zemek. Regulated Grammars and Automata. Springer, 2014.
[6] B. Nagy. On 5' → 3' sensing Watson-Crick finite automata. In M. H. Garzon and H. Yan, editors, DNA Computing, 13th International Meeting on DNA Computing, DNA13, Memphis, TN, USA, June 4-8, 2007, Revised Selected Papers, volume 4848 of LNCS, pages 256–262. Springer, 2008.
[7] B. Nagy. On a hierarchy of 5' → 3' sensing Watson-Crick finite automata languages. J. Log. Comput., 23(4):855–872, 2013.

Some computable functions without Brouwer fixed-points ∗ †

[Extended Abstract]

Petrus H. Potgieter
Department of Decision Sciences, University of South Africa (Pretoria)
php@member.ams.org, potgiph@unisa.ac.za

∗ Research supported in part by NRF incentive grant IFR2011041500051 and the EU project COMPUTAL as well as the University of South Africa's College of Economic and Management Sciences Research Committee. All opinions expressed in this work are the author's and are not necessarily endorsed or supported by his present, future or past employer(s) and/or funder(s).
† Most of the material in this paper can also be found in arXiv:0804.3199 [math.GM], with more detail and diagrammes.

ABSTRACT

This paper is an overview of results that show the Brouwer fixed-point theorem to be essentially non-constructive and non-computable and discusses some computable functions without computable fixed points. The counter-examples of Orevkov and Baigger imply that there is no procedure for finding the fixed point in general, and do so by giving an example of a computable function which does not fix any computable point. In this contribution, we discuss some examples of computable functions not fixing any computable point.

Categories and Subject Descriptors

G.1.0 [Mathematics of computing]: Numerical analysis, General

General Terms

Theory

Keywords

Computable analysis, Brouwer fixed-point theorem

1. INTRODUCTION

Recall that a computable real number is a number for which a Turing machine exists that, on input $n$, produces a rational approximation with error no more than $2^{-n}$. A computable point is a point all the coordinates of which are computable reals. The notation
• $\mathbb{N}_0$ for the non-negative natural numbers;
• $\mathbb{R}_c$ for the set of computable reals;
• $I_c$ for $I \cap \mathbb{R}_c$; and
• $\delta X$ for the boundary of a set $X$, being $\overline{X} \cap \overline{X^c}$
is also used.

We consider the Brouwer fixed-point theorem in the following form, where the standard unit interval is denoted by $I = [0, 1]$.

Theorem 1 (Brouwer). Any continuous function $f : I^2 \to I^2$ has a fixed point, i.e. there exists an $x \in I^2$ such that $f(x) = x$.

The two examples discussed use distinct definitions of a computable function of real variables.

Russian school

In the Russian school of Markov and others, a computable function maps computable reals to computable reals by a single algorithm for the function that translates an algorithm approximating the argument to an algorithm approximating the value of the function. It need not be possible to extend a function that is computable in the Russian school to a continuous function on all of the reals. These functions are often called Markov-computable.

Polish school

In the Polish school of Lacombe, Grzegorczyk, Pour-El and Richards, and others, a function is computable on a region if it maps every computable sequence of reals to a computable sequence of reals and it has a computable uniform modulus of continuity on the region [9].

2. OREVKOV'S EXAMPLE

One can construct a Markov-computable function $f$ through a computable mapping of descriptions of computable points $x \in I_c^2$ to descriptions of $f(x) \in I_c^2$, such that $f(x) \ne x$ for all $x \in I_c^2$. That is, no computable point is a fixed point for $f$. Unfortunately the $f$ which is constructed in this way cannot be extended to a continuous function on $I^2$. This is the (Russian school) construction of [7], another instance of which can be found in [12].

3. BAIGGER'S EXAMPLE

For the nowadays more current approach of the Polish school, a counter-example was constructed by Baigger [1]. Let $a$ be any non-computable point in $I^2$. Consider the function $f$ which moves each point half-way to $a$,

$$f(x) = x + \frac{1}{2}(a - x),$$

and has a single fixed point, namely $a$ itself. The function $f$ is continuous and defined on all of $I^2$ and has no computable fixed point. Nevertheless, this is not really interesting since
• the fixed point $a$ has no reasonable description, since it is itself not computable; and therefore
• the function $f$ has no reasonable description: it is not computable in any sense.

One would like to see a function which is computable, defined (and therefore continuous) on all of $I^2$ and yet avoids fixing any of the computable points $I_c^2$. The following example, having appeared in [1] and in [12], modifies the construction of Orevkov to produce a computable $f$ defined on all of $I^2$ having no computable fixed point. As in the example of Orevkov, we need the following fact.

Lemma 1 ([6], for example). There exist computable sequences of rational numbers $(a_n)$ and $(b_n)$ in the interval $I = [0, 1]$ such that the intervals $J_n = [a_n, b_n]$ have the following properties.
(i) If $n \ne m$ then $|J_n \cap J_m| \le 1$.
(ii) If $a_n \ne 0$ then $a_n \in \{b_0, b_1, \ldots\}$ and if $b_n \ne 1$ then $b_n \in \{a_0, a_1, \ldots\}$.
(iii) $I_c \subsetneq \bigcup_n J_n$, i.e. the $J_n$ cover the computable reals in $I = [0, 1]$.

Definition 1. For any $W \subseteq I^2$ we define

$$W^{\varepsilon} = \{x \in W \mid d(x, \delta W \setminus \delta I^2) \ge \varepsilon\}$$

and

$$W_{\varepsilon} = \{x \in W \mid d(x, \delta W \setminus \delta I^2) = \varepsilon\}.$$

One can define $f_n$ such that
1. $f_n$ moves every point in the interior of $C_n^{2^{-n}}$ but is the identity outside the set, and is computable;
2. $f_{n+1}$ agrees with $f_n$ on $C_n^{2^{-n} \cdot \frac{3}{2}}$; and therefore
3. $f = \lim_{n \to \infty} f_n$ is computable.

Every computable point eventually lies in some

$$\mathring{C}_n^{2^{-n} \cdot \frac{3}{2}} \subset C_n^{2^{-n}}$$

and is therefore moved by $f$. Clearly $f(I^2) \subseteq I^2$ and $f$ will be as required. In fact, $f$ has no fixed point in

$$\bigcup_n C_n = \bigcup_{k, \ell \ge 1} J_k \times J_\ell.$$

Also, $f$ as constructed here has no isolated fixed point: its fixed points all occur on horizontal and vertical lines spanning the height and breadth of the unit square. Further details of the construction appear in Appendix A. The construction cannot be applied in the one-dimensional case because it is impossible to effect a change of direction by continuous rotation and, of course, in one dimension one can compute the fixed point.

4. THE KÖNIG LEMMA

In reverse mathematics it is known that in $\mathrm{RCA}_0$, the system of recursive comprehension and $\Sigma^0_1$-induction, the weak König lemma, $\mathrm{WKL}_0$, is equivalent to the Brouwer FPT [11].

Lemma 2 ($\mathrm{WKL}_0$, Kőnig). Every infinite binary tree has an infinite branch.
The König lemma does not have a direct computable coun- One uses the intervals Jn = [an, bn] of the lemma above and terpart. sets [ Cn = Jk × J` Theorem 2 (Kleene [5]). There exists an infinite bin- k,`≤n ary tree, all the computable paths of which are finite. after which one defines f progressively, using the sets Cn. The points The relation of the Kleene tree to the Baigger counterexample tn = (vn, vn) can be relatively easily constucted and is detailed in [8]. where vn = min {x | (x, x) 6∈ Cn} 5. REMARKS x∈I We close with some remarks about the fixed points of com- are used as “target point” at each stage of the construction. putable functions that may be of interest. Note that v = lim vn n→∞ n n Theorem 3. If g :⊆ R → R is computable on [a1, b1] × is not a computable number and (v, v) will be one of the · · ·×[an, bn] and has an isolated fixed point in (a1, b1)×· · ·× fixed points of f . (an,bn) then that fixed point is a computable point. 100 This result is mentioned by [4], for example, with a proof can be returned as the required approximation of z. The outline. hypothesis that z was not computable is thereby contra- dicted. Proof. It is sufficient to consider h(x) = kg(x) − xk Theorem 4. For any ε > 0 there exists a computable f on I2 that fixes no computable point but the set of fixed points and to show that if h has an isolated zero then that zero is of which has Lebesgue measure at least 1 − ε. computable. Assume therefore that h(z) = 0 where z ∈ [e1, f1] × · · · × [en, fn] ⊆ (c1, d1) × · · · × (cn,dn) ⊆ (a This is a straight-forward consequence of the fact that the 1, b1) × · · · × (an,bn) intervals in Lemma 1 can be chosen to be arbitrarily small. and h(x) > 0 whenever x ∈ (c1, d1) × · · · × (cn,dn) \ {z} and all the given interval bounds are computable numbers. 6. FURTHER QUESTIONS Suppose now that z were non-computable. 
Let (q Suppose x is a non-computable fixed point of a computable n) be any computable enumeration of the rational points in (c function f . By the preceding, it is not an isolated fixed 1, d1) × · · · × (c point but is it necessarily an accumulation point of the non- n,dn). Since obviously z 6∈ {q1, q2, . . .}, computable fixed points? h(qn) > 0 ∀n and hence for each n it is possible to choose ε On the other hand, can an accumulation point of non-computable n by setting fixed points be a computable (fixed) point? 1 εn = h (qn) 4 7. CONCLUSION so that εn > 0 for each n. Since h is computable on [a1, b1]× · · · × The existence of the Kleene tree can quite easily be derived [an, bn] it has a decreasing modulus of uniform continu- from the impossibility of ensuring the existence of a comput- ity mh. Define able fixed point for a computable function (in both Russian δn = mh (εn) and Polish senses), in two dimensions (or higher). The in- genuous constructions of Orevkov and Baigger provide a way so that for each n it is true that of defining a computable function with no computable fixed 3 kx − q point from the set of intervals derived from the Kleene tree, nk < δn ⇒ h(x) > h (qn) 4 in a constructive manner. This correspondence is, perhaps, and that on the open ball B (qn, δn) the function h is bounded more attractive for the “working mathematician” than the strictly away from 0 by 3 h (q elegant derivation of the result in reverse mathematics. In 4 n) > 0. one dimension, any computable f : I → I does have a com- The B (qn, δn) now form an open cover of [e1, f1] × · · · × putable point x ∈ Ic such that f (x) = x, which can be seen [en, fn] \ {z}. For, given any x 666= z in [e1, f1] × · · · × [en, fn] by fairly straight-forward reductio ad absurdum. it is the case that h is bounded away from 0 on Non-computable fixed points of computable functions ap- 1 V = [e pear in a relatively complicated way. 
1, f1] × · · · × [en, fn] \ B z, kz − xk 2 and hence there exists a κ > 0 such that εn > κ whenever 8. REFERENCES qn ∈ V and therefore [1] Günter Baigger. Die Nichtkonstruktivität des δ Brouwerschen Fixpunktsatzes. Archiv für n > mh(κ) > 0 for each qn ∈ V. Mathematische Logik und Grundlagenforschung, As a result, x ∈ B (qi, mh(κ)) ⊂ B (qi, δi) for some qi ∈ V . 25(3-4):183–188, 1985. [2] V. Brattka, S. Le Roux, and A. Pauly. Connected We now have computable sequences (qn) and (δn) which Choice and the Brouwer Fixed Point Theorem. June allow us to approximate z arbitrarily closely. For, given any 2012. arXiv:1206.4809B. ε > 0 the balls B (qn, δn) form an open cover of the compact [3] Janusz Brzdek, Liviu C˘ adariu, and Krzysztof set , Ciepliński. Fixed point theory and the Ulam stability. ε [e1, f1] × · · · × [en, fn] \ B z, Journal of Function Spaces, 2014:1–16, 2014. 2 [4] Jeffry L. Hirst. Notes on Reverse Mathematics and so, if a computable ε is given, an enumeration of the balls Brouwer’s Fixed Point Theorem. Available online:- can be done, stopping as soon as the complement of http://www.mathsci.appstate.edu/~jlh/snp/ m pdfslides/bfp.pdf, 2000. [ B (qi, δi) [5] S. C Kleene. Recursive functions and intuitionistic i=1 mathematics. pages 679–685, Providence, R. I., 1952. has diameter less than ε at which stage any point of Amer. Math. Soc. [6] Joseph S. Miller. Degrees of unsolvability of m [ [e continuous functions. The Journal of Symbolic Logic, 1, f1] × · · · × [en, fn] \ B (qi, δi) 69(2):555–584, 2004. i=1 101 [7] V. P. Orevkov. A constructive map of the square into ways. The important part of the proof is that the construc- itself, which moves every constructive point. Doklady tion is, at each stage, extended at the boundary to “look Akademii Nauk SSSR, 152:55–58, 1963. right” from the outside. This ensures that, eventually every [8] Petrus H. Potgieter. 
Computable counter-examples to point is in fact moved towards one of a sequence of points the Brouwer fixed-point theorem, 2008. that converge to the non-computable fixed point (v, v) on the arXiv:0804.3199. diagonal. The Baigger construction is a somewhat delicate [9] Marian B. Pour-El and J. Ian Richards. Computability construction of a function that is in fact computable but in analysis and physics. Perspectives in Mathematical that—somehow—mimics a simple mapping of every point Logic. Springer-Verlag, Berlin, 1989. in I2 in the direction of (v, v). [10] J. Rocha, B. Rzepka, and K. Sadarangani. Fixed point theorems for contractions of rational type with PPF dependence in Banach spaces. Journal of Function Spaces, 2014:1–8, 2014. [11] Naoki Shioji and Kazuyuki Tanaka. Fixed point theory in weak second-order arithmetic. Annals of Pure and Applied Logic, 47(2):167–188, 1990. [12] Kam-Chau Wong and Marcel K. Richter. Non-computability of competitive equilibrium. Economic Theory, 14(1):1–27, 1999. Appendix: details for Section 3 The constructions should guarantee that at each stage, the function fn moves every point of ◦ 2−n· 5 D 4 n = C2−n n \ Cn in the direction of tn by an amount proportional to its dis- tance to C2−n n . The construction of f1 with this property is trivial. We proceed to construct fn+1 from fn. (i) Extend and modify fn to C2−n n+1 so that every point x of ◦ 2−n· 5 C2−n 4 n+1 \ Cn+1 is moved in the direction of tn by an amount propor- tional to d x, C2−n n+1 . (ii) Modify the resulting function so that each point in 2−n· 9 C2−n 8 n+1 \ Cn+1 is mapped a non-negative amount proportional to its distance to C2−(n+1) n+1 in the direction of tn. (iii) By rotation of the direction of the mapping, extend the function to C2−(n+1) n+1 such that every point x of ◦ 2−(n+1)· 5 D 4 n+1 = C2−(n+1) n+1 \ Cn+1 is mapped in the direction of tn+1 by an amount pro- portional to d x, C2−(n+1) n+1 . 
The final step is the only one in which we use the fact that we are working in two dimensions as this step requires the continuous (computable) rotation of a vector in the direction of tn to a vector in the direction of tn+1. A construction is given explicitly in [1] but it should be clear from the preceding that it can be done in many different 102 Indeks avtorjev / Author index Angyal Dávid ............................................................................................................................................................................... 95 Bánhelyi Balázs ............................................................................................................................................................................ 44 Békési József ................................................................................................................................................................................ 48 Berczi Kristof ............................................................................................................................................................................... 61 Brodnik Andrej ............................................................................................................................................................................. 36 Čibej Uroš .................................................................................................................................................................................... 87 Csaba Bela .................................................................................................................................................................................... 68 Csehi Csongor Gy. ....................................................................................................................................................................... 
13 Dávid Balázs .................................................................................................................................................................................. 9 Depolli Matjaž .............................................................................................................................................................................. 40 Dobravec Tomaž .......................................................................................................................................................................... 28 Dömösi Pál ................................................................................................................................................................................... 24 Dosa Gyorgy ................................................................................................................................................................................ 80 Ercsey Zsolt .................................................................................................................................................................................. 20 Farkas Márk ................................................................................................................................................................................. 13 Fernández Elena ........................................................................................................................................................................... 51 Fürst Luka .................................................................................................................................................................................... 87 Galambos Gábor ........................................................................................................................................................................... 
48 Gáll József .................................................................................................................................................................................... 24 Gera Imre ..................................................................................................................................................................................... 44 Goossens Dries ............................................................................................................................................................................. 72 Győrffy Lajos ............................................................................................................................................................................... 32 Gyori Ervin ................................................................................................................................................................................... 91 Horváth Géza ............................................................................................................................................................................... 24 Jovičić Vladan .............................................................................................................................................................................. 36 Kellerer Hans ............................................................................................................................................................................... 80 Kiraly Zoltan ................................................................................................................................................................................ 61 Konc Janez ................................................................................................................................................................................... 
40 Kovács Zoltán .............................................................................................................................................................................. 20 Laporte Gilbert ............................................................................................................................................................................. 51 Lelkes Zoltán ................................................................................................................................................................................ 16 Liu Changshuo ............................................................................................................................................................................. 61 London András ............................................................................................................................................................................. 32 Makay Géza ................................................................................................................................................................................. 32 Mezei Tamas ................................................................................................................................................................................ 91 Mihály Zsolt ................................................................................................................................................................................. 16 Mihelič Jurij ................................................................................................................................................................................. 87 Miklos Dezso ............................................................................................................................................................................... 
57 Miklós István .......................................................................................................................................................................... 61, 76 Nagy Benedek .............................................................................................................................................................................. 95 Palangetić Marko.......................................................................................................................................................................... 36 Palfi Laszlo ..................................................................................................................................................................................... 5 Pekec Saša .................................................................................................................................................................................... 54 Pjevalica Nebojša ........................................................................................................................................................................... 5 Podgorelec David ......................................................................................................................................................................... 83 Potgieter Petrus H......................................................................................................................................................................... 99 Rodríguez Pereira Jessica ............................................................................................................................................................. 51 Silai Daniel ................................................................................................................................................................................... 
36 Špelič Denis ................................................................................................................................................................................. 83 Spieksma Frits .............................................................................................................................................................................. 72 Subotić Miloš ................................................................................................................................................................................. 5 Szabo Sandor .......................................................................................................................................................................... 40, 65 Tihanyi Norbert ............................................................................................................................................................................ 24 Tóth Ádám ................................................................................................................................................................................... 13 Tuza Zsolt..................................................................................................................................................................................... 80 Vangerven Bart ............................................................................................................................................................................ 72 103 Vasarhelyi Balint .......................................................................................................................................................................... 68 Vincze Nándor ............................................................................................................................................................................. 
20 x 44 Zavalnij Bogdan ..................................................................................................................................................................... 40, 65 104 Konferenca / Conference Uredili / Edited by Middle-European Conference on Applied Theoretical Computer Science (MATCOS 2016) Prof. Andrej Brodnik Document Outline Blank Page Blank Page Blank Page 01-54.pdf 01-paper_4 02-paper_29 03-paper_19 04-paper_24 05-paper_28 11-paper_9 12-paper_8 13-paper_20 14paper_23 15paper_22 Introduction Product graphs and their colorings First hereditary coloring scheme Second hereditary coloring scheme Third hereditary coloring scheme Proposed program and preliminary results References 21-paper_10 22-paper_11 23-paper_17 24-paper_26 31-paper_14 32-paper_16 33-paper_18 References 34-paper_12 Introduction Proof of Theorem 1 A generalization References 41-paper_6 42-paper_7 43-paper_13 44-paper_25 51-paper_5 52-paper_21 53-paper_30 54-paper_15 Blank Page
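As a small illustration of the remark in Potgieter's paper above that, in one dimension, the fixed point can be computed, here is a minimal Python sketch. It is not the covering argument used in the proof of Theorem 3, but plain bisection on $g(x) = f(x) - x$. The map `f` and the helper names `g` and `approx_fixed_point` are assumptions chosen for illustration and do not appear in the paper. The sketch relies on two special features of this example: `f` maps rationals to rationals exactly, and its only fixed point in $[0, 1]$, $(3 - \sqrt{5})/2$, is irrational, so the sign of `g` at every rational midpoint is decidable. For a general computable function such a sign test is not available.

```python
from fractions import Fraction


def f(x: Fraction) -> Fraction:
    """Illustrative map of [0, 1] into itself (an assumption, not from the paper)."""
    return (x * x + 1) / 3


def g(x: Fraction) -> Fraction:
    """g(x) = f(x) - x; its zeros are exactly the fixed points of f."""
    return f(x) - x


def approx_fixed_point(n: int) -> Fraction:
    """Return a rational within 2**-n of the fixed point of f in [0, 1]."""
    lo, hi = Fraction(0), Fraction(1)  # invariant: g(lo) > 0 > g(hi)
    while hi - lo > Fraction(1, 2 ** n):
        mid = (lo + hi) / 2
        # g(mid) is never 0 here: the fixed point (3 - sqrt(5))/2 is irrational.
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Rational approximation with error at most 2**-20, printed as a float.
    print(float(approx_fixed_point(20)))
```

On input `n` the function returns a rational approximation with error at most $2^{-n}$, which is exactly the form of approximation required by the definition of a computable real in the Introduction of that paper.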