A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction
Deep convolutional neural networks (DCNNs) have led to breakthrough results in numerous practical machine learning tasks, such as classification of images in the ImageNet data set, control-policy-learning to play Atari games or the board game Go, and image captioning. Many of these applications first perform feature extraction and then feed the results thereof into a classifier. The mathematical analysis of DCNNs for feature extraction was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on a wavelet transform followed by the modulus non-linearity in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. This paper complements Mallat’s results by developing a theory that encompasses general convolutional transforms, or in more technical parlance, general semi-discrete frames (including Weyl–Heisenberg filters, curvelets, shearlets, ridgelets, wavelets, and learned filters), general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), and general Lipschitz-continuous pooling operators emulating, e.g., sub-sampling and averaging. In addition, all of these elements can be different in different network layers. For the resulting feature extractor, we prove a translation invariance result of vertical nature in the sense of the features becoming progressively more translation-invariant with increasing network depth, and we establish deformation sensitivity bounds that apply to signal classes such as, e.g., band-limited functions, cartoon functions, and Lipschitz functions.
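The layer structure described in this abstract (a convolutional transform, then a Lipschitz-continuous non-linearity, then pooling, with possibly different choices in each layer) can be illustrated with a small one-dimensional sketch. This is a toy under stated assumptions, not the construction analyzed in the paper: the names `layer` and `extract_features`, the two-tap Haar-type filters, the circular convolution, and the use of a plain average as the emitted feature are all hypothetical choices made for illustration.

```python
import numpy as np

def layer(signals, filters, nonlinearity=np.abs, stride=2):
    """One toy layer: circular convolution, pointwise non-linearity, sub-sampling."""
    outputs = []
    for f in signals:
        F = np.fft.fft(f)
        for g in filters:
            G = np.fft.fft(g, n=len(f))         # zero-pad the filter to the signal length
            conv = np.fft.ifft(F * G).real      # circular convolution of real inputs
            outputs.append(nonlinearity(conv)[::stride])
    return outputs

def extract_features(f, filters_per_layer, stride=2):
    """Propagate f through the layers; emit one averaged value per propagated signal."""
    signals = [np.asarray(f, dtype=float)]
    features = []
    for filters in filters_per_layer:           # filters may differ from layer to layer
        signals = layer(signals, filters, stride=stride)
        features.append(np.array([s.mean() for s in signals]))
    return features

# Assumed example filters: a two-tap low-pass and high-pass pair.
haar = [np.array([0.5, 0.5]), np.array([0.5, -0.5])]
f = np.sin(2 * np.pi * np.arange(64) / 16.0)
features = extract_features(f, [haar, haar])
```

With two filters per layer, the first layer yields two feature values and the second yields four. In this toy setting, a circular shift of the input by a multiple of `stride ** depth` leaves the averaged features unchanged; that is a property of the circular convolution and global averaging used here and should not be read as the translation invariance result of the paper.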
Local Sampling and Approximation of Operators with Bandlimited Kohn-Nirenberg Symbols
Recent sampling theorems allow for the recovery of operators with bandlimited
Kohn-Nirenberg symbols from their response to a single discretely supported identifier signal.
The available results are inherently non-local. For example, we show that in order to recover a
bandlimited operator precisely, the identifier can decay neither in time nor in frequency. Moreover, a
concept of local and discrete representation is missing from the theory. In this paper, we develop
tools that address these shortcomings.
We show that to obtain a local approximation of an operator, it is sufficient to test the
operator on a truncated and mollified delta train, that is, on a compactly supported Schwartz
class function. To compute the operator numerically, discrete measurements can be obtained
from the response function which are localized in the sense that a local selection of the values
yields a local approximation of the operator.
Central to our analysis is conceptualizing the meaning of localization for operators with
bandlimited Kohn-Nirenberg symbols.
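To make the phrase "truncated and mollified delta train" concrete, the following minimal sketch builds a finite sum of narrow smooth bumps, which is a compactly supported smooth (hence Schwartz class) function. The function names, the unit weights, the spacing, the number of impulses, and the particular bump `exp(-1 / (1 - x^2))` are assumptions made for illustration; the abstract does not specify them, and the bumps are not normalized to unit mass.

```python
import numpy as np

def bump(x):
    """Smooth bump function supported on the open interval (-1, 1)."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def mollified_delta_train(t, spacing=1.0, n_impulses=5, width=0.1):
    """Finite (truncated) train of impulses, each replaced by a narrow smooth bump."""
    t = np.asarray(t, dtype=float)
    # Impulse locations centered around zero, e.g. -2, -1, 0, 1, 2 for five impulses.
    centers = spacing * (np.arange(n_impulses) - n_impulses // 2)
    return sum(bump((t - c) / width) for c in centers) / width

t = np.linspace(-5.0, 5.0, 2001)
identifier = mollified_delta_train(t)
```

With the defaults above, `identifier` vanishes outside the interval [-2.1, 2.1] and is nonzero only within a distance `width` of each impulse location.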