Master's Theses
Private and Secure Federated Learning
Description
In federated learning, a machine learning model shall be trained on private user data with the help of a central server, the so-called federator. This setting differs from other machine learning settings in that the user data shall not be shared with the federator for privacy reasons and/or to decrease the communication load of the system.
Even though only intermediate results are shared, extra care is necessary to guarantee data privacy. An additional challenge arises if the system includes malicious users that breach protocol and send corrupt computation results.
The goal of this work is to design, implement and analyze coding- and information-theoretic solutions for privacy and security in federated learning.
Prerequisites
- Information Theory
- Coding Theory (e.g., Channel Coding)
- Machine Learning (Theory and Practice)
Supervisor:
Coding theory for NVMs
Coding problems motivated by properties and asymmetries of NVMs
Description
Non-volatile memories (NVMs) are electronic data-storage technologies that do not require a continuous power supply to retain data; unlike traditional magnetic or optical media, they do not utilize mechanically movable components and can therefore offer better performance, and allow for three-dimensional scaling of storage devices. Under most realistic workloads, they also offer better energy efficiency.
However, these technologies also feature imbalances in behavior, performance and consequences, between the processes of reading data and writing it. To wit, in memory cells which represent data by the level of held charge (traditionally allowing for representation of several logical levels), the process of charge-injection is a simple and efficient, whereas charge-depletion is both technically complex (requiring the depletion of entire blocks of cells) and destructive, a main driver of cell-degradation over the device's life cycle.
Different coding theoretic approaches have been explored to alleviate this imbalance, including coding schemes that delay charge-depletion cycles [1]--[3], and such that seek to mitigate the effects of defective memory cells once those appear in a device [4], [5].
Theses are available in extending either approach, as well as combining them.
[1] A. Jiang, R. Mateescu, M. Schwartz and J. Bruck, "Rank Modulation for Flash Memories," in IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2659-2673, June 2009, doi: 10.1109/TIT.2009.2018336.
[2] M. Horovitz and E. Yaakobi, "On the Capacity of Write-Once Memories," in IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 5124-5137, Aug. 2017, doi: 10.1109/TIT.2017.2689034.
[3] M. Horovitz and T. Etzion, "Local Rank Modulation for Flash Memories," in IEEE Transactions on Information Theory, vol. 65, no. 3, pp. 1705-1713, March 2019, doi: 10.1109/TIT.2018.2859403.
[4] V. Sidorenko, G. Schmidt, E. Gabidulin, M. Bossert and V. Afanassiev, "On polyalphabetic block codes," IEEE Information Theory Workshop, 2005., Rotorua, New Zealand, 2005, pp. 4 pp.-, doi: 10.1109/ITW.2005.1531889.
[5] Y. Yehezkeally, H. A. Kim, S. Puchinger and A. Wachter-Zeh, "Bounds on Mixed Codes with Finite Alphabets," 2023 IEEE Information Theory Workshop (ITW), Saint-Malo, France, 2023, pp. 389-394, doi: 10.1109/ITW55543.2023.10161655.
Supervisor:
DNA-based data storage
Coding theoretic problems motivated by DNA-based stroage
Description
Contemporary global demand for storage capacity is increasing exponentially, even as traditional magnetic storage media has exhausted its potential for optimization, and requires unsustainable investments for both production and maintenance, in capital, energy and space.
One promising potential medium for archival data storage is DNA; it features high density, extreme longevity, convenient scalability and a lower maintenance footprint. Complete DNA-based storage ecosystems are in active development, raising multiple coding theoretic (as well as engineering, algorithmic, and biotechnological) challenges; correspondingly, increasing attention is recently given to the study of such systems, and they are drawing significant investments from both governments
and the private sector.
The following theses are available (other topics in this domain will also be entertained):
- The torn paper channel models the effects of DNA strand breakage in storage or processing. It has been studied from both an average-case and a worst-case [1] perspective, with several distinct adversarial models. Recently, the t-break model was studied [2] as a refinement of the previously studied min-max constraint. These developments open the way to study new problems in this setting.
- Reconstruction from substring spectra is a model motivated by the process of shot-gun sequencing, where short strands are drawn sufficiently many times to reconstruct a long information sequence. Adapting existing literature [3,4] to more realistic models is an open problem.
- Nanopore sequencing is a nascent technology that reads single-stranded DNA molecules by passing them through a narrow pore while passing electric current through it. More work studying its properties and designing codes capable of handling its relatively high error rate, extending existing literature [5,6], is necessary.
- Duplications are a type of mutation occurring in the process of cell replication, which may be responsible for large portions of our current genome. For data storage schemes in in vivo DNA (e.g., for watermarking research material) it is an error model that needs to be countered. We aim to extend and build upon existing literature [7,8].
[1] D. Bar-Lev, S. Marcovich, E. Yaakobi and Y. Yehezkeally, "Adversarial Torn-Paper Codes," in IEEE Transactions on Information Theory, vol. 69, no. 10, pp. 6414-6427, Oct. 2023, doi: 10.1109/TIT.2023.3292895.
[2] C. Wang, J. Sima and N. Raviv, "Break-Resilient Codes for Forensic 3D Fingerprinting," arXiv preprint arXiv:2310.03897v1 [cs.IT], Oct. 2023, doi: https://doi.org/10.48550/arXiv.2310.03897.
[3] Y. Yehezkeally, D. Bar-Lev, S. Marcovich and E. Yaakobi, "Generalized Unique Reconstruction From Substrings," in IEEE Transactions on Information Theory, vol. 69, no. 9, pp. 5648-5659, Sept. 2023, doi: 10.1109/TIT.2023.3269124.
[4] H. Wei, M. Schwartz, G. Ge, "Reconstruction from Noisy Substrings," arXiv preprint arXiv:2312.04790v1 [cs.IT], Dec. 2023, doi: 10.48550/arXiv.2312.04790.
[5] A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh and E. Yaakobi, "Error-Correcting Codes for Nanopore Sequencing," in IEEE Transactions on Information Theory, doi: 10.1109/TIT.2024.3380615.
[6] A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh and E. Yaakobi, "Correcting a Single Deletion in Reads from a Nanopore Sequencer," arXiv preprint arXiv:2401.15939v2 [cs.IT], May. 2024, doi: 10.48550/arXiv.2401.15939.
[7] S. Jain, F. Farnoud Hassanzadeh, M. Schwartz and J. Bruck, "Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms," in IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996-5010, Aug. 2017, doi: 10.1109/TIT.2017.2688361.
[8] D. Goshkoder, N. Polyanskii and I. Vorobyev, "Codes Correcting Long Duplication Errors," in IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, doi: 10.1109/TMBMC.2024.3403755.
Supervisor:
Homomorphic Encryption for Machine Learning
Partial/Somewhat Homomorphic Encryption, Federated Learning
Description
Homomorphic encryption (HE) schemes are increasingly attracting attention in the era of large scale computing. While lattice-based approaches have been well-studied, recently first progress has been made towards establishing code-based alternatives. Preliminary results show that such alterative approaches might enable undiscovered functionalities not present in current lattice-based schemes. In this project, we particularily study novel code-based Partial/Somewhat HE schemes tailored to applications in artificial intelligence and federated learning.
After familiarizing with SotA methods in relevant fields (such as [1]), the student should analyze the requirements for use-cases at hand and explore suitable modifications to current schemes and novel approaches.
[1] Aguilar-Melchor, Carlos, Victor Dyseryn, and Philippe Gaborit, "Somewhat Homomorphic Encryption based on Random Codes," Cryptology ePrint Archive (2023).
Prerequisites
- Strong foundation in linear algebra
- Channel Coding
- Security in Communications and Storage
- Basic understanding of Machine Learning concepts
Supervisor:
Research Internships (Forschungspraxis)
Private and Secure Federated Learning
Description
In federated learning, a machine learning model shall be trained on private user data with the help of a central server, the so-called federator. This setting differs from other machine learning settings in that the user data shall not be shared with the federator for privacy reasons and/or to decrease the communication load of the system.
Even though only intermediate results are shared, extra care is necessary to guarantee data privacy. An additional challenge arises if the system includes malicious users that breach protocol and send corrupt computation results.
The goal of this work is to design, implement and analyze coding- and information-theoretic solutions for privacy and security in federated learning.
Prerequisites
- Information Theory
- Coding Theory (e.g., Channel Coding)
- Machine Learning (Theory and Practice)
Supervisor:
Coding theory for NVMs
Coding problems motivated by properties and asymmetries of NVMs
Description
Non-volatile memories (NVMs) are electronic data-storage technologies that do not require a continuous power supply to retain data; unlike traditional magnetic or optical media, they do not utilize mechanically movable components and can therefore offer better performance, and allow for three-dimensional scaling of storage devices. Under most realistic workloads, they also offer better energy efficiency.
However, these technologies also feature imbalances in behavior, performance and consequences, between the processes of reading data and writing it. To wit, in memory cells which represent data by the level of held charge (traditionally allowing for representation of several logical levels), the process of charge-injection is a simple and efficient, whereas charge-depletion is both technically complex (requiring the depletion of entire blocks of cells) and destructive, a main driver of cell-degradation over the device's life cycle.
Different coding theoretic approaches have been explored to alleviate this imbalance, including coding schemes that delay charge-depletion cycles [1]--[3], and such that seek to mitigate the effects of defective memory cells once those appear in a device [4], [5].
Theses are available in extending either approach, as well as combining them.
[1] A. Jiang, R. Mateescu, M. Schwartz and J. Bruck, "Rank Modulation for Flash Memories," in IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2659-2673, June 2009, doi: 10.1109/TIT.2009.2018336.
[2] M. Horovitz and E. Yaakobi, "On the Capacity of Write-Once Memories," in IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 5124-5137, Aug. 2017, doi: 10.1109/TIT.2017.2689034.
[3] M. Horovitz and T. Etzion, "Local Rank Modulation for Flash Memories," in IEEE Transactions on Information Theory, vol. 65, no. 3, pp. 1705-1713, March 2019, doi: 10.1109/TIT.2018.2859403.
[4] V. Sidorenko, G. Schmidt, E. Gabidulin, M. Bossert and V. Afanassiev, "On polyalphabetic block codes," IEEE Information Theory Workshop, 2005., Rotorua, New Zealand, 2005, pp. 4 pp.-, doi: 10.1109/ITW.2005.1531889.
[5] Y. Yehezkeally, H. A. Kim, S. Puchinger and A. Wachter-Zeh, "Bounds on Mixed Codes with Finite Alphabets," 2023 IEEE Information Theory Workshop (ITW), Saint-Malo, France, 2023, pp. 389-394, doi: 10.1109/ITW55543.2023.10161655.
Supervisor:
DNA-based data storage
Coding theoretic problems motivated by DNA-based stroage
Description
Contemporary global demand for storage capacity is increasing exponentially, even as traditional magnetic storage media has exhausted its potential for optimization, and requires unsustainable investments for both production and maintenance, in capital, energy and space.
One promising potential medium for archival data storage is DNA; it features high density, extreme longevity, convenient scalability and a lower maintenance footprint. Complete DNA-based storage ecosystems are in active development, raising multiple coding theoretic (as well as engineering, algorithmic, and biotechnological) challenges; correspondingly, increasing attention is recently given to the study of such systems, and they are drawing significant investments from both governments
and the private sector.
The following theses are available (other topics in this domain will also be entertained):
- The torn paper channel models the effects of DNA strand breakage in storage or processing. It has been studied from both an average-case and a worst-case [1] perspective, with several distinct adversarial models. Recently, the t-break model was studied [2] as a refinement of the previously studied min-max constraint. These developments open the way to study new problems in this setting.
- Reconstruction from substring spectra is a model motivated by the process of shot-gun sequencing, where short strands are drawn sufficiently many times to reconstruct a long information sequence. Adapting existing literature [3,4] to more realistic models is an open problem.
- Nanopore sequencing is a nascent technology that reads single-stranded DNA molecules by passing them through a narrow pore while passing electric current through it. More work studying its properties and designing codes capable of handling its relatively high error rate, extending existing literature [5,6], is necessary.
- Duplications are a type of mutation occurring in the process of cell replication, which may be responsible for large portions of our current genome. For data storage schemes in in vivo DNA (e.g., for watermarking research material) it is an error model that needs to be countered. We aim to extend and build upon existing literature [7,8].
[1] D. Bar-Lev, S. Marcovich, E. Yaakobi and Y. Yehezkeally, "Adversarial Torn-Paper Codes," in IEEE Transactions on Information Theory, vol. 69, no. 10, pp. 6414-6427, Oct. 2023, doi: 10.1109/TIT.2023.3292895.
[2] C. Wang, J. Sima and N. Raviv, "Break-Resilient Codes for Forensic 3D Fingerprinting," arXiv preprint arXiv:2310.03897v1 [cs.IT], Oct. 2023, doi: https://doi.org/10.48550/arXiv.2310.03897.
[3] Y. Yehezkeally, D. Bar-Lev, S. Marcovich and E. Yaakobi, "Generalized Unique Reconstruction From Substrings," in IEEE Transactions on Information Theory, vol. 69, no. 9, pp. 5648-5659, Sept. 2023, doi: 10.1109/TIT.2023.3269124.
[4] H. Wei, M. Schwartz, G. Ge, "Reconstruction from Noisy Substrings," arXiv preprint arXiv:2312.04790v1 [cs.IT], Dec. 2023, doi: 10.48550/arXiv.2312.04790.
[5] A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh and E. Yaakobi, "Error-Correcting Codes for Nanopore Sequencing," in IEEE Transactions on Information Theory, doi: 10.1109/TIT.2024.3380615.
[6] A. Banerjee, Y. Yehezkeally, A. Wachter-Zeh and E. Yaakobi, "Correcting a Single Deletion in Reads from a Nanopore Sequencer," arXiv preprint arXiv:2401.15939v2 [cs.IT], May. 2024, doi: 10.48550/arXiv.2401.15939.
[7] S. Jain, F. Farnoud Hassanzadeh, M. Schwartz and J. Bruck, "Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms," in IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996-5010, Aug. 2017, doi: 10.1109/TIT.2017.2688361.
[8] D. Goshkoder, N. Polyanskii and I. Vorobyev, "Codes Correcting Long Duplication Errors," in IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, doi: 10.1109/TMBMC.2024.3403755.
Supervisor:
Homomorphic Encryption for Machine Learning
Partial/Somewhat Homomorphic Encryption, Federated Learning
Description
Homomorphic encryption (HE) schemes are increasingly attracting attention in the era of large scale computing. While lattice-based approaches have been well-studied, recently first progress has been made towards establishing code-based alternatives. Preliminary results show that such alterative approaches might enable undiscovered functionalities not present in current lattice-based schemes. In this project, we particularily study novel code-based Partial/Somewhat HE schemes tailored to applications in artificial intelligence and federated learning.
After familiarizing with SotA methods in relevant fields (such as [1]), the student should analyze the requirements for use-cases at hand and explore suitable modifications to current schemes and novel approaches.
[1] Aguilar-Melchor, Carlos, Victor Dyseryn, and Philippe Gaborit, "Somewhat Homomorphic Encryption based on Random Codes," Cryptology ePrint Archive (2023).
Prerequisites
- Strong foundation in linear algebra
- Channel Coding
- Security in Communications and Storage
- Basic understanding of Machine Learning concepts