Research

My research interest is centered around privacy, scalability, security and reliability of distributed systems. The applications may differ but the goal remains the same: study theoretical and fundamental limits of innovative storage and computing systems and design codes achieving those limits. My current work is motivated by federated learning, private and secure distributed computing, distributed storage, DNA-based storage systems and private and secure network coding. A brief description of the topics is given next.

Federated Learning

Tremendous amount of data is generated every minute by a huge collection of electronic devices owned by different users. Federated learning emerged as a natural mechanism to process and analyze the users' data without moving it to a central server. A federator asks the users to run a gradient descent algorithm on their local data and share the gradient with the federator at every iteration. The challenges of federated learning include reducing the exorbitant communication cost, maintaining privacy of the processed data and robustness against users dropping out and malicious users actively trying to corrupt the computation. Those challenges can be tackled by using gradient compression techniques and by designing new secret sharing schemes tailored to this specific problem and understand the fundamental limits that arise to best design the desired schemes.

 

Private and Secure Coded Computing

A parameter server (referred to as main node or master node) wants to offload intensive computation on sensitive data to cheap commodity nodes (referred to as worker nodes) to parallelize and speed up the computation. Two main challenges arise in this setting: 1) mitigating the effect of slow or unresponsive worker nodes, known as stragglers; and 2) maintaining privacy of the sensitive data while offloading computations to the workers. When the computation is linear or a polynomial function of the input data, those challenges can be tackled by adding redundancy and using tools from coding theory and secret sharing. However, adding redundancy artificially increases the computation load of the workers. Hence the need to understand the limits on the required redundancy for a certain number of stragglers and design schemes that match those limits. We tackle those challenges for several types of computations and for homogeneous and heterogeneous time-varying computing systems.

  1. N. Raviv, R.Bitar and E. Yaakobi, Information Theoretic Private Inference in Quantized Models, IEEE International Symposium on Information Theory (ISIT), 2022.
     
  2. M. Xhemrishi, R. Bitar and A. Wachter-Zeh, Distributed Matrix-Vector Multiplication with Sparsity and Privacy Guarantees, IEEE International Symposium on Information Theory (ISIT), 2022.

  3. C. Hofmeister, R. Bitar, M. Xhemrishi and A. Wachter-Zeh, Secure Private and Adaptive Matrix Multiplication Beyond the Singleton Bound, IEEE Journal on Selected Areas in Information Theory (JSAIT), Vol. 3, No. 2, June 2022. (arXiv:2108.05742)

  4. R. Bitar, M. Xhemrishi and A. Wachter-Zeh, Adaptive Private Distributed Matrix Multiplication, IEEE Transactions on Information Theory, Vol. 68, No. 4, April 2022. (arXiv:2004.12925)

  5. C. Hofmeister, R. Bitar, M. Xhemrishi and A. Wachter-Zeh, Secure Private and Adaptive Matrix Multiplication Beyond the Singleton Bound, Twelfth International Workshop on Coding and Cryptography (WCC), March 2022.

  6. R. Bitar, Y. Xing, Y. Keshtkarjahromi, V. Dasari, S. El Rouayheb, and H. Seferoglu, Private and Rateless Adaptive Coded Computation at the Edge, EURASIP Journal on Wireless Communications and Networking, Vol.1, January 2021. (arXiv:1909.12611)

  7. R. Bitar, M. Xhemrishi and A. Wachter-Zeh, Fountain Codes for Private Distributed Matrix-Matrix Multiplication, IEEE International Symposium on Information Theory and its Applications (ISITA), 2020.

  8. R. Bitar, P. Parag and S. El Rouayheb, Minimizing Latency for Secure Coded Computing Using Secret Sharing via Staircase Codes, IEEE Transactions on Communication, Vol. 68, No. 8, August 2020. (arXiv:1802.02640)

  9. Y. Keshtkarjahromi, R. Bitar, V. Dasari, S. El Rouayheb, and H. Seferoglu, Secure Coded Cooperative Computation at the Heterogeneous Edge against Byzantine Attacks, preprint, arXiv:1908.05385, 2020.

  10. Y. Keshtkarjahromi, R. Bitar, V. Dasari, S. El Rouayheb, and H. Seferoglu, Secure Coded Cooperative Computation at the Heterogeneous Edge against Byzantine Attacks, IEEE Global Communication Conference (GLOBECOM), Waikoloa, 2019

  11. R. Bitar, Y. Xing, Y. Keshtkarjahromi, V. Dasari, S. El Rouayheb, and H. Seferoglu, PRAC: Private and Rateless Adaptive Coded Computation at the Edge, SPIE Defense + Commercial Sensing, Baltimore, 2019.

  12. E. Klarlund, R. Bitar and S. El Rouayheb, Search Efficient Blockchain-Based Immutable Logging And Querying, submitted to BMC Medical Genomics, 2018.

  13. R. Bitar, P. Parag and S. El Rouayheb, Minimizing Latency for Secure Distributed Computing, IEEE International Symposium on Information Theory (ISIT), Aachen, 2017.

  14. R. Bitar, Codes for Private Distributed Computation with Applications to Machine Learning, Rutgers University school of graduate studies electronic theses and dissertations, 2020.

 

Distributed Stochastic Gradient Descent

In this setting, a parameter server (referred to as main node or master node) wants to offload the computations required to run a machine learning algorithm on sensitive data to cheap commodity nodes (referred to as worker nodes). The focus of this research direction is to mitigate the effect of stragglers without resorting to the use of redundancy. The main computation of machine learning algorithms is that of evaluating the gradient of a loss function on the input data. Under mild assumptions, stragglers can be simply ignored. The effect of ignoring the stragglers is a slower convergence of the machine learning algorithm. Hence the need to understand the effect of the number of ignored stragglers on the convergence of the algorithm and design mechanisms that gradually decrease the straggler tolerance to reduce the overall runtime. Further, assigning tasks to all workers and ignoring the stragglers incurs extra costs. Thus, it is more efficient to assign tasks only to a subset of workers deemed to be the fastest. We tackle the challenge of learning the speed of the workers while assigning them useful computations and reducing the overhead required for the learning process.

  1. M. Egger, R. Bitar, A. Wachter-Zeh and D. Gündüz, Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits, preprint, arXiv:2202.08302, 2022.
     
  2. M. Egger, R. Bitar, A. Wachter-Zeh and D. Gündüz, Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits, IEEE International Symposium on Information Theory (ISIT), 2022.

  3. S. Kas Hanna, R. Bitar, P. Parag, V. Dasari and S. El Rouayheb, Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning, preprint, arXiv:2208.03134, 2022.

  4. S. Kas Hanna, R. Bitar, P. Parag, V. Dasari and S. El Rouayheb, Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, 2020.

  5. R. Bitar, M. Wootters and S. El Rouayheb, Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning, IEEE Journal on Selected Areas in Information Theory,, Vol.1, No. 1, May 2020. (arXiv:1905.05383)

  6. R. Bitar, M. Wootters and S. El Rouayheb, Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning, IEEE Information Theory Workshop (ITW), 2019.

 

Privacy and Security in Distributed Storage

With cloud storage and large-scale data centers, data storage is becoming distributed by nature. Despite its great benefits, distributed storage is prone to node failures and adversarial attacks. Node failure is now the norm rather than the exception. Traditional error-correcting codes used to recover from node failures incur huge communication overheads. To reduce the communication overhead, regenerating codes are proposed. Those codes provide a tradeoff between the storage overhead (redundancy) and the communication overhead. However, regenerating codes are a dynamic family of codes that can repair nodes failures sequentially. Hence, if care is not taken, a malicious attacker jamming one node in the system can corrupt the whole stored data through what is known as pollution attack. In addition, guaranteeing the strong information-theoretic privacy of the stored data requires the use of secret sharing which are equivalent to traditional error-correcting codes. We study the interplay between the storage overhead, communication overhead and the achievable privacy and security (against malicious nodes) guarantees for private and secure distributed storage schemes. A crucial observation we make here is that in contrast to traditional secret sharing, novel secret sharing schemes can be designed to encode sparse input data into sparse output codewords if the privacy requirement is relaxed. We study this fundamental tradeoff between privacy and sparsity, which reduces the storage overhead of the scheme.

  1. M. Xhemrishi, M. Egger and R. Bitar, Efficient Private Storage of Sparse Machine Learning Data, IEEE Information Theory Workshop (ITW), invited paper, 2022.
  2. R. Bitar and S. Jaggi, Communication Efficient Secret Sharing in the Presence of Malicious Adversary, preprint, arXiv:2002.03374, 2020.

  3. R. Bitar and S. Jaggi, Communication Efficient Secret Sharing in the Presence of Malicious Adversary, IEEE International Symposium on Information Theory (ISIT), 2020.

  4. R. Bitar and S. El Rouayheb, Staircase Codes for Secret Sharing with Optimal Communication and Read Overheads, IEEE Transactions on Information Theory, Vol. 64, No. 2, February 2018. (arXiv:1512.02990)

  5. R. Bitar and S. El Rouayheb, Staircase-PIR: Universally Robust Private Information Retrieval, IEEE Information Theory Workshop (ITW), Guangzhou, 2018.

  6. R. Bitar and S. El Rouayheb, Staircase Codes for Secret Sharing with Optimal Communication and Read Overheads, IEEE International Symposium on Information Theory (ISIT), Barcelona, 2016.

  7. R. Bitar and S. El Rouayheb, Securing data against Limited-Knowledge Adversaries in Distributed Storage Systems, IEEE International Symposium on Information Theory (ISIT), Hong Kong, 2015.

 

Insertion/Deletion-Correcting Codes

Traditional error-correcting codes such as Reed-Solomon codes tackle the correction of substitution errors that arise in almost all current storage media. Due to the exponential increase of the amount of stored data, researchers and practitioners are aiming for a novel dense and sustainable long-term storage medium. DNA-based storage is arising as a competitive candidate that satisfies the required properties. Many practical and theoretical challenges need to be addressed until DNA-based storage becomes viable. From the theoretical perspective, the errors that arise in DNA-based storage are a mix of insertions, deletions and substitutions. The study of codes correcting insertions and deletions dates back to the 1960s to correct synchronisation errors and has now regained attention due to DNA-based storage. We study the fundamental limits codes able to correct insertions and deletions in d-dimensional arrays and construct such codes. We investigate the correction of insertions and deletions in many particular settings of interest.

  1. L. Welter, R. Bitar, A. Wachter-Zeh and E. Yaakobi, Multiple Criss-Cross Deletion and Insertion Correcting Codes, IEEE Transactions on Information Theory, Vol. 68, No. 6, June, 2022. (arXiv:2102.02727)

  2. E. Stylianou, L. Welter, R. Bitar, A. Wachter-Zeh and E. Yaakobi, Equivalence of Insertion/Deletion Correcting Codes for d-dimensional Arrays, IEEE International Symposium on Information Theory (ISIT), 2022.

  3. S. Kas Hanna and R. Bitar, Codes for Detecting Deletions and Insertions in Concatenated Strings, preprint, arXiv:2105.00212, 2021.

  4. S. Kas Hanna and R. Bitar, Codes for Detecting Deletions and Insertions in Concatenated Strings, IEEE International Symposium on Information Theory (ISIT), 2021.

  5. R. Bitar, S. Kas Hanna, N. Polyanskii and I. Vorobyev, Optimal Codes Correcting Localized Deletions, IEEE International Symposium on Information Theory (ISIT), 2021.

  6. L. Welter, R. Bitar, A. Wachter-Zeh and E. Yaakobi, Multiple Criss-Cross Deletion-Correcting Codes, IEEE International Symposium on Information Theory (ISIT), 2021.

  7. R. Bitar, L. Welter, I. Smagloy, A. Wachter-Zeh and E. Yaakobi, Criss-Cross Insertion and Deletion Correcting Codes, IEEE Transactions on Information Theory, Vol.67, No.12, December 2021. (arXiv:2004.14740)

  8. R. Bitar, I. Smagloy, L. Welter, A. Wachter-Zeh and E. Yaakobi, Criss-Cross Deletion Correcting Codes, IEEE International Symposium on Information Theory and its Applications (ISITA), 2020.

 

Private and Secure Network Coding

Network coding consists of using error-correcting codes for reliable data transmission through communication networks. The errors that arise in this setting can be corrected by shifting from the Hamming metric considered in traditional error-correcting codes and going to the rank metric that considers array codes and analyzes the rank of the error. We consider private and secure network codes that allow reliable transmission of private data in networks corrupted by an attacker that can eavesdrop and jam a subset of the communication links. We leverage the limitation of the adversary to guarantee higher transmission rates through the use of subspace codes.

  1. S. Li, R. Bitar, S. Jaggi and Y. Zhang, Network Coding with Myopic Adversaries, IEEE Journal on Selected Areas in Information Theory (JSAIT), Vol. 2, No. 4, December 2021. (arXiv:2102.09885).

  2. S. Li, R. Bitar, S. Jaggi and Y. Zhang, Network Coding with Myopic Adversaries, IEEE International Symposium on Information Theory (ISIT), 2021.