Membership Inference Attacks Against Machine Learning Models
Description
It has been widely acknowledged that machine learning models can leak a significant amount of (potentially private) information about their training data. Quantifying this leakage is important for judging a model's privacy. In practice, so-called membership inference attacks [1,2] are employed for such privacy audits. A membership inference attack tries to predict whether a particular data sample was used to train a machine learning model. Beyond empirical research, membership inference attacks have been put on a theoretical foundation through a Bayesian decision framework [3].
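At its core, a simple (far from state-of-the-art) membership inference attack thresholds the model's per-sample loss: samples on which the model has unusually low loss are guessed to be training members. The following is a minimal sketch on synthetic loss distributions; all numbers, and the threshold choice, are hypothetical illustrations, not results from [1-3]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: due to overfitting, training members tend to
# have lower loss than non-members. We simulate both loss populations.
member_losses = rng.gamma(shape=2.0, scale=0.1, size=1000)
nonmember_losses = rng.gamma(shape=2.0, scale=0.3, size=1000)

# The attacker picks a threshold (in practice e.g. calibrated via
# shadow models [1]) and predicts "member" if the loss is below it.
threshold = 0.35

def predict_member(loss, thr=threshold):
    return loss < thr

# True-positive rate (members correctly flagged) vs. false-positive
# rate (non-members wrongly flagged) of this hypothesis test.
tpr = float(np.mean(predict_member(member_losses)))
fpr = float(np.mean(predict_member(nonmember_losses)))
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

The attack is exactly a binary hypothesis test on the loss, which is why the Bayesian decision framework of [3] and the TPR-at-low-FPR evaluation advocated in [1] apply naturally.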
The goal of this seminar topic is to understand state-of-the-art membership inference attacks [1,2] and the Bayesian decision framework [3]. Students are encouraged to produce their own results using openly available implementations.
[1] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis and F. Tramèr, "Membership Inference Attacks From First Principles," 2022 IEEE Symposium on Security and Privacy (SP), 2022.
[2] S. Zarifzadeh, P. Liu, R. Shokri, "Low-Cost High-Power Membership Inference Attacks," ICML 2024.
[3] A. Sablayrolles, M. Douze, C. Schmid, Y. Ollivier, H. Jégou, "White-box vs Black-box: Bayes Optimal Strategies for Membership Inference," Proceedings of the 36th International Conference on Machine Learning (ICML), 2019.
Prerequisites
Compulsory:
- Solid background in probability theory and hypothesis testing
- Basic knowledge about machine learning methods and neural networks
Optional:
- Implementations of machine learning methods in Python
Contact
E-mail: luis.massny@tum.de
Fundamental limits of Byzantine-resilient distributed learning
Description
In a distributed learning setting, multiple worker machines are hired to help a main server with the expensive training of a machine learning model. Each worker is assigned a subtask, and the main server reconstructs the overall computation result from the workers' responses.
In the presence of faulty or malicious workers, called Byzantine workers, a common strategy is to distribute the subtasks redundantly [3]. Since this redundancy imposes a large computational overhead on the workers, strategies to reduce it are required. One approach is to use interactive consistency checks at the main server, which can reduce the redundancy by up to 50% [1].
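The role of redundancy can be illustrated with the simplest conceivable scheme: assign each data partition to 2b+1 workers and let the server take a majority vote per partition, which tolerates up to b Byzantine workers. This is a hypothetical toy for intuition only, not the coded schemes of [1-3]:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4            # number of data partitions
b = 1            # number of Byzantine workers to tolerate
r = 2 * b + 1    # replication factor of plain repetition

# Toy per-partition partial gradients (integers, so votes match exactly).
true_grads = rng.integers(-5, 6, size=k)

# One worker per (replica, partition); honest workers report correctly.
reports = np.tile(true_grads, (r, 1))        # shape (r, k)

# A single Byzantine worker corrupts its report for partition 2.
reports[0, 2] = 99

# Server: majority vote per partition. Since at most b < r/2 copies are
# corrupted and honest copies agree, the majority is always correct.
recovered = np.empty(k, dtype=int)
for j in range(k):
    vals, counts = np.unique(reports[:, j], return_counts=True)
    recovered[j] = vals[np.argmax(counts)]

total = recovered.sum()   # full gradient = sum of partition gradients
```

Each partition is computed r = 2b+1 times, i.e. a (2b+1)-fold computation overhead; the schemes in [1,2] use coding and interaction precisely to push this overhead down.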
Interactive consistency checks are not free, however: they incur additional computation and communication costs. For specific parameter regimes, these costs are well studied, but in general they are unknown. We therefore ask the following research questions:
1) How much computation is needed to guarantee error-free reconstruction of the computation result?
2) How much communication is needed?
3) What is the best trade-off between communication and computation cost?
The focus of this project is to study these research questions fundamentally. That is, we aim to characterize the least amount of communication and computation possible. The student will analyze these questions with mathematical tools such as graph theory and information theory. The findings shall be compared against existing schemes [1,2] to evaluate their (sub-)optimality.
[1] C. Hofmeister, L. Maßny, E. Yaakobi and R. Bitar, "Byzantine-Resilient Gradient Coding Through Local Gradient Computations," in IEEE Transactions on Information Theory, vol. 71, no. 4, pp. 3142-3156, April 2025, doi: 10.1109/TIT.2025.3542896.
[2] S. Jain, L. Maßny, C. Hofmeister, E. Yaakobi and R. Bitar, "Interactive Byzantine-Resilient Gradient Coding for General Data Assignments," 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 2024, pp. 3273-3278, doi: 10.1109/ISIT57864.2024.10619596.
[3] R. Tandon, Q. Lei, A. G. Dimakis, N. Karampatziakis, "Gradient Coding: Avoiding Stragglers in Distributed Learning", in Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3368-3376, 2017.
Prerequisites
Mandatory:
- strong mathematical background
- prior knowledge in information theory
- basic knowledge in graph theory
- interest in fundamental theoretical research
Recommended:
- proficiency in algebra and probability theory
- basic knowledge in coding theory
Contact
luis.massny@tum.de, christoph.hofmeister@tum.de