Communication and Fault-Tolerance in Distributed Computing

GR/M75105

Principal investigator: Dr. Leszek Gasieniec
Co-investigators: Prof. Alan Michael Gibbons and Prof. Wojciech Rytter

EPSRC Visiting Fellowship for Prof. Bogdan S. Chlebus, Instytut Informatyki, Uniwersytet Warszawski, Poland

Summary of objectives and results

This research is on the algorithmic aspects of information exchange and retrieval in communication networks, and on fault-tolerant distributed computing. We focused our attention on the efficiency and reliability of communication and computing in a distributed setting. The studies include communication in radio networks [1], as well as performing independent tasks in the presence of processors failures [2].

Deterministic radio broadcasting We consider broadcasting in radio networks: one node of the network knows a message that needs to be learned by all the remaining nodes. We seek distributed deterministic algorithms to perform this task. Radio networks are modeled as directed graphs. They are unknown, in the sense that nodes are not assumed to know their neighbors, nor the size of the network, they are aware only of their individual identifying numbers. If more than one message is delivered to a node in a step then the node cannot hear any of them. Nodes cannot distinguish between such collisions and the case when no messages have been delivered in a step. The fastest previously known deterministic algorithm for deterministic distributed broadcasting in unknown radio networks works in time $O(n^{11/6})$ and it is based on the notion of a {\sl selective family}. We develop three new deterministic distributed algorithms. Algorithm A develops further the ideas of the selective family and operates in time $O(n^{1.77291})=O(n^{9/5})$, for general networks, and in time $O(n^{1+a+\H(a)+o(1)})$ for sparse networks with in-degrees $O(n^a)$ for $a<1/2$; here H is the entropy function. Algorithm B uses a new approach to selective families and works in time $O(n^{3/2}\log^{1/2} n)$ for general networks or $O(n^{1+a+o(1)})$ for sparse networks. Algorithm C further improves the performance for general networks running in time $O(n^{3/2})$.

Performing Tasks in Networks We consider the problem Do-All in which $p$~processors need to perform $t$~independent tasks. Processors communicate by message passing and may fail by crashing. Faults are controlled by an adversary who is fully adaptive and subject to the only restriction that a constant fraction of processors cannot be failed. The complexity is measured by effort, which is work (measured by the available processors steps) and communication combined. We develop a number of algorithms; in this abstract their performance is given for the case $n=p=t$. The effort of a simple deterministic algorithm based on load-balancing is $O((n\log n)^{3/2})$, this algorithm is fully constructive. The expected effort of a randomized algorithm is $O(n\log^2n/\log \log n)$ with probability at least $1-\exp(-\alpha p)$, for some positive constant~$\alpha$. The main result is a deterministic algorithm with effort $O(n\cdot \log^2 n)$, this algorithm uses a family of permutations of tasks to safeguard the processors against any strategy of the adversary. All previous algorithms have the effort $\Omega(n^2)$ if the number of faults is as large as a linear fraction of processors. Each of our algorithms is an improvement with respect to the algorithms known before for the adversarial model considered. The most efficient algorithm has effort close to the lower bound $\Omega(n\log n/\log\log n)$.

During his visit in United Kingdom prof. B.S.Chlebus payed a number of scientific visits in King's College London.

Acknowledgement We would like to thank EPSRC for funding this project.

[1] B.Chlebus, L.Gasieniec, D.Kowalski, and A.A.Shvartsman, Performing Tasks in Networks with a Constant Fraction of Non-Faulty Processors, submitted.

[2] B.Chlebus, L.Gasieniec, A.Ostlin, and M.Robson, Deterministic Radio Broadcasting, In Proc. of 27th International Colloquium on Automata, Languages and Programming, ICALP'2000, Geneva, 9-15 July, 2000, Springer LNCS 1853, pp. 717--728.