Geometric 3D Gaussian Splatting Compression
Description
3D Gaussian Splatting (3DGS) has demonstrated strong performance in high-quality novel view synthesis and real-time rendering by representing scenes as dense sets of Gaussians with associated attributes [1]. This representation captures fine geometric detail and view-dependent appearance efficiently at render time, contributing to its practical success. However, trained 3DGS models often contain millions of Gaussians, resulting in substantial storage and memory requirements that limit scalability and deployment on resource-constrained systems.
This project focuses on exploiting geometric relationships among Gaussians (i.e., primitives) to reduce redundancy in the attribute space. The goal is to apply sampling and sparsification techniques to decrease the number of stored primitives while preserving perceptual and structural fidelity. The proposed methods will be evaluated against existing compression approaches [2]. The project aims to publish its results in highly recognized scientific venues; hence, applicants should be strongly motivated to conduct research.
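As a minimal sketch of the sparsification idea, the snippet below ranks Gaussians by a simple importance heuristic (opacity times a volume proxy from the per-axis scales) and keeps only the top fraction. Both the scoring function and the keep ratio are illustrative assumptions, not the project's prescribed method.

```python
import numpy as np

def sparsify_gaussians(opacity, scales, keep_ratio=0.5):
    """Keep the top-`keep_ratio` fraction of Gaussians ranked by a
    hypothetical importance score (opacity x volume proxy).

    opacity: (N,) array of per-Gaussian opacities.
    scales:  (N, 3) array of per-axis scale parameters.
    Returns sorted indices of the retained primitives.
    """
    volume = np.prod(scales, axis=1)        # crude volume proxy per Gaussian
    importance = opacity * volume           # illustrative scoring heuristic
    k = max(1, int(len(opacity) * keep_ratio))
    keep_idx = np.argsort(importance)[::-1][:k]  # indices of top-k scores
    return np.sort(keep_idx)

# Toy example: 6 Gaussians with random attributes.
rng = np.random.default_rng(0)
opacity = rng.uniform(0.1, 1.0, size=6)
scales = rng.uniform(0.01, 0.2, size=(6, 3))
kept = sparsify_gaussians(opacity, scales, keep_ratio=0.5)
print(len(kept))  # 3 primitives retained out of 6
```

In a real pipeline the retained indices would be used to subsample all per-primitive attribute tensors (positions, rotations, spherical-harmonic coefficients) consistently; more sophisticated scores could incorporate rendering contribution or local density.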
[1] Kerbl, Bernhard, et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." ACM Transactions on Graphics 42.4 (2023).
[2] Bagdasarian, Milena T., et al. "3DGS.zip: A Survey on 3D Gaussian Splatting Compression Methods." Computer Graphics Forum 44.2 (2025).
Prerequisites
Python, PyTorch, graphs, Gaussian Splatting, motivation for research
Contact
cem.eteke@tum.de
Supervisor:
Content Safety for Generative Multimedia: Automated Evaluation and Re-Prompting for Age-Appropriate AI-Generated Content
Description
Generative AI models (e.g., LLMs, diffusion models) are increasingly used to create multimodal content (text + images) for applications like interactive storytelling, educational tools, and digital media. However, ensuring that generated content is safe, unbiased, and age-appropriate remains a critical challenge. Manual moderation does not scale, and existing automated filters often lack contextual understanding or multimodal reasoning.
This thesis explores the development of automated pipelines to evaluate and refine AI-generated content, with a focus on:
- Real-time safety assessment of text-image pairs.
- Automatic re-prompting to guide models toward compliant outputs.
- Adaptability to diverse use cases (e.g., children’s toys, educational platforms).
Objectives
- Multimodal Safety Evaluation:
- Investigate state-of-the-art metrics (e.g., CLIP-based similarity, toxicity scores, emotional valence) to detect unsafe or age-inappropriate content in text-image pairs.
- Develop a lightweight, interpretable scoring system combining:
- Text analysis (e.g., Perspective API, custom fine-tuned classifiers).
- Image analysis (e.g., NSFW detectors, aesthetic/emotional classifiers).
- Cross-modal alignment (e.g., does the image match the text’s intent and safety constraints?).
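A lightweight, interpretable scoring system of this kind could be sketched as a weighted combination of per-modality scores. The component scores (text toxicity, image NSFW probability, cross-modal alignment) and the weights below are illustrative placeholders, assumed to be normalized to [0, 1].

```python
def composite_safety_score(text_toxicity, image_nsfw, clip_alignment,
                           weights=(0.4, 0.4, 0.2)):
    """Combine per-modality scores into one interpretable safety score
    in [0, 1]; higher means safer. All inputs and weights are
    illustrative assumptions, not a fixed design.
    """
    w_text, w_image, w_align = weights
    text_safety = 1.0 - text_toxicity    # invert toxicity: higher = safer
    image_safety = 1.0 - image_nsfw      # invert NSFW probability
    return w_text * text_safety + w_image * image_safety + w_align * clip_alignment

# Example: mildly risky text, clean image, well-aligned pair.
score = composite_safety_score(text_toxicity=0.1, image_nsfw=0.05,
                               clip_alignment=0.9)
print(round(score, 3))  # 0.92
```

Because each component and weight is visible, such a linear combination stays easy to audit and tune per use case (e.g., stricter weights for children's content), which is one motivation for favoring it over a single opaque classifier.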
- Re-Prompting Strategies:
- Design adaptive prompting techniques to iteratively refine outputs (e.g., using reinforcement learning or constrained decoding).
- Explore few-shot learning to generalize safety rules across domains (e.g., fairy tales vs. scientific explanations).
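The iterative refinement loop behind re-prompting can be sketched as follows. The `generate` and `evaluate` callables are placeholder stand-ins for a generative model and a safety scorer; the appended constraint text and the threshold are likewise illustrative assumptions.

```python
def refine_until_safe(prompt, generate, evaluate, threshold=0.8, max_rounds=3):
    """Illustrative re-prompting loop: regenerate with an appended safety
    constraint until the evaluator's score clears the threshold or the
    round budget is exhausted. Returns the last output and final prompt.
    """
    output = generate(prompt)
    for _ in range(max_rounds):
        if evaluate(output) >= threshold:     # output deemed safe enough
            break
        # Tighten the prompt with an explicit safety constraint and retry.
        prompt = prompt + " Please keep the content age-appropriate and non-violent."
        output = generate(prompt)
    return output, prompt

# Toy stand-ins: the "model" echoes its prompt in uppercase; the
# "evaluator" rewards outputs that mention age-appropriateness.
gen = lambda p: p.upper()
ev = lambda o: 0.9 if "AGE-APPROPRIATE" in o else 0.3
out, final_prompt = refine_until_safe("Tell a bedtime story.", gen, ev)
```

A thesis implementation would replace the string-append step with learned strategies (e.g., reinforcement learning over prompt edits or constrained decoding), but the evaluate-then-refine control flow stays the same.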
- Benchmarking and Evaluation:
- Curate a multimodal dataset of edge cases (e.g., subtle biases, ambiguous contexts).
- Compare against human annotations and existing tools (e.g., Google’s Perspective API, LAION filters).
- Optimize for latency and computational efficiency (critical for embedded/edge devices).