Content Safety for Generative Multimedia: Automated Evaluation and Re-Prompting for Age-Appropriate AI-Generated Content
Description
Generative AI models (e.g., LLMs, diffusion models) are increasingly used to create multimodal content (text + images) for applications such as interactive storytelling, educational tools, and digital media. However, ensuring that generated content is safe, unbiased, and age-appropriate remains a critical challenge: manual moderation does not scale, and existing automated filters often lack contextual understanding or multimodal reasoning.
This thesis explores the development of automated pipelines to evaluate and refine AI-generated content, with a focus on:
- Real-time safety assessment of text-image pairs.
- Automatic re-prompting to guide models toward compliant outputs.
- Adaptability to diverse use cases (e.g., children’s toys, educational platforms).
Objectives
- Multimodal Safety Evaluation:
  - Investigate state-of-the-art metrics (e.g., CLIP-based similarity, toxicity scores, emotional valence) to detect unsafe or age-inappropriate content in text-image pairs.
  - Develop a lightweight, interpretable scoring system combining:
    - Text analysis (e.g., the Perspective API, custom fine-tuned classifiers).
    - Image analysis (e.g., NSFW detectors, aesthetic/emotional classifiers).
    - Cross-modal alignment (e.g., does the image match the text’s intent and safety constraints?).
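As a rough illustration of what such an interpretable scoring system could look like, the sketch below combines three per-modality scores with a worst-case aggregation, so any failure is traceable to one named component. The component scores and threshold are hypothetical stand-ins; in practice they would come from a toxicity classifier, an NSFW detector, and a CLIP alignment check.

```python
from dataclasses import dataclass


@dataclass
class SafetyReport:
    """Per-modality safety scores in [0, 1], where 1.0 means fully safe."""
    text_score: float        # e.g. 1 - toxicity probability
    image_score: float       # e.g. 1 - NSFW probability
    alignment_score: float   # e.g. scaled CLIP text-image similarity

    @property
    def overall(self) -> float:
        # Worst-case aggregation: one unsafe modality fails the whole pair,
        # and the failing component is always identifiable by name.
        return min(self.text_score, self.image_score, self.alignment_score)

    def is_safe(self, threshold: float = 0.8) -> bool:
        return self.overall >= threshold


# Example: safe text and image, but the image does not match the text.
report = SafetyReport(text_score=0.95, image_score=0.90, alignment_score=0.70)
print(report.overall)    # 0.7
print(report.is_safe())  # False: the misaligned image fails the pair
```

Using the minimum rather than a weighted average is a deliberate choice here: averaging can let a strongly unsafe image be "compensated" by benign text, which is undesirable for child-facing content.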
- Re-Prompting Strategies:
  - Design adaptive prompting techniques to iteratively refine outputs (e.g., using reinforcement learning or constrained decoding).
  - Explore few-shot learning to generalize safety rules across domains (e.g., fairy tales vs. scientific explanations).
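The iterative refinement loop above can be sketched as follows. This is a minimal illustration, not a committed design: `generate` and `safety_score` are stand-ins for a generative model and the multimodal scorer, and the constraint texts are invented examples of how a prompt might be tightened between rounds.

```python
def refine(prompt, generate, safety_score, threshold=0.8, max_rounds=3):
    """Re-generate with increasingly explicit safety constraints until the
    output passes the scorer or the round budget is exhausted."""
    constraints = [
        "Keep the content friendly and age-appropriate.",
        "Avoid any violence, fear, or mature themes.",
    ]
    current = prompt
    for round_idx in range(max_rounds):
        output = generate(current)
        if safety_score(output) >= threshold:
            return output, round_idx
        # Tighten the prompt with the next constraint (or repeat the last).
        hint = constraints[min(round_idx, len(constraints) - 1)]
        current = f"{prompt}\n{hint}"
    return None, max_rounds  # caller falls back to a safe default


# Toy demonstration with a stubbed model and scorer:
def toy_generate(p):
    return "gentle story" if "friendly" in p else "scary story"


def toy_score(out):
    return 0.9 if "gentle" in out else 0.3


result, rounds = refine("Tell a bedtime story", toy_generate, toy_score)
print(result, rounds)  # gentle story 1
```

A capped round budget with a safe fallback matters in practice: re-prompting adds latency per round, which interacts directly with the efficiency constraints discussed under benchmarking.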
- Benchmarking and Evaluation:
  - Curate a multimodal dataset of edge cases (e.g., subtle biases, ambiguous contexts).
  - Compare against human annotations and existing tools (e.g., Google’s Perspective API, LAION filters).
  - Optimize for latency and computational efficiency (critical for embedded/edge devices).
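The comparison against human annotations could start with simple agreement metrics over binary safe/unsafe decisions, as in the sketch below. The labels are illustrative toy data, not results; a real benchmark would use the curated edge-case dataset and likely report chance-corrected agreement as well.

```python
def agreement_metrics(predicted, human):
    """Accuracy, precision, and recall of predicted unsafe flags (True =
    unsafe) against human annotations, treating humans as ground truth."""
    tp = sum(p and h for p, h in zip(predicted, human))      # both flag unsafe
    fp = sum(p and not h for p, h in zip(predicted, human))  # false alarms
    fn = sum(not p and h for p, h in zip(predicted, human))  # missed unsafe
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = sum(p == h for p, h in zip(predicted, human)) / len(human)
    return accuracy, precision, recall


pred = [True, True, False, False, True]   # pipeline decisions (toy data)
gold = [True, False, False, False, True]  # human annotations (toy data)
print(agreement_metrics(pred, gold))  # accuracy 0.8, precision 2/3, recall 1.0
```

For this use case recall on unsafe content is the critical number: a missed unsafe item reaches a child, whereas a false alarm merely triggers another re-prompting round.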