Information Retrieval in High Dimensional Data
Lecturer: PD Dr. Martin Kleinsteuber
Assistants: Anwaar Muhammad Umer/Rayyan Khan
Target group: Master
ECTS: 6
Scope: 2/2 (weekly hours lecture/tutorial)
Offered: Winter semester
Registration: Elective course
Admission requirements: -
Time & place:
Start:

Contents

From face recognition to gene data analysis, from the analysis of motor sensor data to a concise description of human body motion: engineers are often faced with the problem of analyzing high-dimensional data, i.e. data acquired from many sensors. The crucial step in retrieving information from this large amount of data is to reduce its dimension in an intelligent way, which is also important for visualizing high-dimensional data.
Starting with an overview of applications and a very basic method of dimensionality reduction, linear principal component analysis, we investigate modern methods and their fields of application.

  • Decisions from Data
  • Curse of Dimensionality
  • From Phenomena to Data
  • Logistic Regression
  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis
  • Support Vector Machines
  • Kernel PCA
  • Feedforward Neural Networks

At the end of the lecture, students understand several state-of-the-art dimensionality reduction and data analysis techniques and are able to implement them in Python.
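
To give an impression of the kind of implementation meant here, the following is a minimal sketch of linear PCA using NumPy. It is not taken from the course material: the function name `pca`, the synthetic data, and the choice of computing PCA via a singular value decomposition are assumptions made for illustration only.

```python
import numpy as np


def pca(X, k):
    """Project the rows of X onto the first k principal components.

    Hypothetical helper for illustration; not part of the course material.
    """
    # Center the data so that every feature has zero mean.
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD of the centered data: the rows of Vt are the principal
    # directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    # Coordinates of each sample in the k-dimensional subspace.
    scores = Xc @ components.T
    # Fraction of the total variance captured by each kept component.
    explained = (S[:k] ** 2) / np.sum(S ** 2)
    return scores, components, explained


# Synthetic data (an assumption): 200 samples in 50 dimensions that vary
# mostly along 2 latent directions, plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

scores, components, explained = pca(X, k=2)
print(scores.shape)      # shape of the reduced data
print(explained.sum())   # share of variance kept by 2 components
```

In this toy setting almost all of the variance lies in a two-dimensional subspace, so two components are enough to represent the 50-dimensional samples with little loss.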

The application examples will mainly be based on either image processing or natural language processing tasks, since both are typical settings for analyzing high-dimensional data.

Moreover, during the lab course, students will have the opportunity to improve their presentation and teamwork skills. This includes the design of a poster or PowerPoint presentation.

Prerequisite: Basic knowledge of linear algebra and statistics as well as basic knowledge of Python (or the motivation to learn it).

Teaching Format

The course consists partly of lectures using the blackboard and projected slides, and partly of discussions and buzz groups in which new definitions and concepts are learned by means of simple examples.
In the tutorials, the exercises and programming tasks are discussed and students are supported in solving them. Complementary presentations on mathematical questions are provided where required.

Recommended Literature

  • C.C. Aggarwal: Data Mining: The Textbook. Springer, 2015.
  • C.M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
  • A.J. Izenman: Modern Multivariate Statistical Techniques. Springer, 2008.
  • J.A. Lee, M. Verleysen: Nonlinear Dimensionality Reduction. Springer, 2007.
  • T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning. Springer, 2009.

Exam

  • Homework (33%)
  • Written exam (66%)