Convolutional Neural Networks (CNN) Explained: Principles, Architecture, and Applications | AI Technology

July 1, 2025 · 4 min read

Present version 1 - Convolutional Layer

Image Classification

To a computer, an image is a three-dimensional tensor: height x width x channels. (A matrix with more than two dimensions is called a tensor.)
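As a minimal sketch of the tensor view, the snippet below stores a tiny image as nested Python lists. The 4x4 size, the pixel values, and the helper name `shape` are assumptions made for illustration; real images are much larger.

```python
# A tiny 4x4 RGB image stored as height x width x channels.
# Sizes and pixel values are made up for illustration.
HEIGHT, WIDTH, CHANNELS = 4, 4, 3

image = [
    [[(r * WIDTH + c) % 256, 0, 255] for c in range(WIDTH)]
    for r in range(HEIGHT)
]

def shape(tensor):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

# A fully connected network would first flatten the tensor into one long vector.
flat = [value for row in image for pixel in row for value in pixel]

print(shape(image))  # (4, 4, 3)
print(len(flat))     # 48
```

Flattening makes the cost visible: every extra pixel adds one more input per channel to each fully connected neuron.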

Observation 1

We can tell what object an image probably contains just by looking at a few of its distinctive patterns.

Simplification 1

A neuron does not need to see the whole image.
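One way to picture this is to list the small windows (receptive fields) that individual neurons look at. The sketch below assumes a 6x6 single-channel image, a 3x3 window, and a stride of 1 with no padding; the function names `receptive_fields` and `crop` are hypothetical.

```python
def receptive_fields(height, width, kernel=3, stride=1):
    """Yield the (top, left) corner of every kernel x kernel window."""
    for top in range(0, height - kernel + 1, stride):
        for left in range(0, width - kernel + 1, stride):
            yield top, left

def crop(image, top, left, kernel=3):
    """Cut the window a single neuron looks at out of a 2-D image."""
    return [row[left:left + kernel] for row in image[top:top + kernel]]

image = [[r * 6 + c for c in range(6)] for r in range(6)]  # 6x6, one channel

fields = list(receptive_fields(6, 6))
print(len(fields))        # 16 windows: 4 positions per axis
print(crop(image, 0, 0))  # [[0, 1, 2], [6, 7, 8], [12, 13, 14]]
```

Each neuron then needs only 3 x 3 weights per channel instead of one weight per pixel of the whole image.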

Simplification 1 - Typical Setting

Observation 2

The same patterns appear in different regions.

Simplification 2

Parameter sharing: neurons that watch different receptive fields use the same weights.

Simplification 2 - Typical Settings

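Each receptive field has a set of neurons (e.g., 64 neurons).

Every receptive field has neurons with the same set of parameters: the i-th neuron of one receptive field shares its weights with the i-th neuron of every other receptive field.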
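A rough parameter count shows how much the two simplifications save. The figures below assume a 100x100 RGB input, 3x3 receptive fields, stride 1, and no padding; only the 64 neurons per receptive field comes from the text.

```python
# Assumed sizes: 100x100 RGB input, 3x3 receptive fields, stride 1, no padding.
HEIGHT, WIDTH, CHANNELS = 100, 100, 3
KERNEL = 3
NEURONS_PER_FIELD = 64  # the example figure from the text

weights_per_neuron = KERNEL * KERNEL * CHANNELS + 1  # +1 for the bias
num_fields = (HEIGHT - KERNEL + 1) * (WIDTH - KERNEL + 1)

fully_connected = HEIGHT * WIDTH * CHANNELS + 1  # one neuron that sees every pixel
without_sharing = num_fields * NEURONS_PER_FIELD * weights_per_neuron
with_sharing = NEURONS_PER_FIELD * weights_per_neuron

print(fully_connected)  # 30001 parameters for a single fully connected neuron
print(without_sharing)  # 17210368
print(with_sharing)     # 1792
```

With sharing, the parameter count no longer depends on the image size, only on the window size and the number of neurons per receptive field.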

Benefit of Convolutional Layer

Receptive Field + Parameter Sharing = Convolutional Layer

Present version 2 - Convolutional Layer

Filter 1

Filter 2

Convolutional Layer

Each filter produces one channel of the output feature map.
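The sketch below applies two filters to one image and gets a two-channel feature map. The 6x6 binary image, the two 3x3 filters (a diagonal detector and a vertical-line detector), and the function names are illustrative assumptions; a single input channel, stride 1, and no padding are used to keep the code short.

```python
def convolve(image, kernel):
    """Slide one kernel over a 2-D image (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[top + i][left + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for left in range(out_w)
        ]
        for top in range(out_h)
    ]

def conv_layer(image, kernels):
    """Apply every filter; each one contributes one output channel."""
    return [convolve(image, kernel) for kernel in kernels]

image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
diagonal = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]   # assumed filter 1
vertical = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]   # assumed filter 2

feature_map = conv_layer(image, [diagonal, vertical])
print(len(feature_map))      # 2 channels, one per filter
print(feature_map[0][0])     # [3, -1, -3, -1]
print(feature_map[1][3][3])  # 3: the vertical line in the bottom-right corner
```

The largest responses appear exactly where the image contains the pattern the filter encodes. A layer with 64 filters would produce 64 such channels in the same way.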

Multiple Convolutional Layers

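If we keep using a 3x3 range in the deeper layers, will the network be unable to recognize large patterns? No. Each value in the first layer's output already summarizes a 3x3 region of the input, so a 3x3 window over that output combines several overlapping 3x3 regions and actually covers a larger area of the original image (5x5 with stride 1).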
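The growth of the covered area can be worked out directly. This sketch assumes every layer uses the same square kernel with stride 1 and no pooling in between; the function name `effective_receptive_field` is hypothetical.

```python
def effective_receptive_field(num_layers, kernel=3):
    """Input-side width seen by one output value after stacked stride-1 convolutions."""
    size = 1
    for _ in range(num_layers):
        size += kernel - 1  # each layer widens the view by kernel - 1 pixels
    return size

for depth in range(1, 5):
    side = effective_receptive_field(depth)
    print(depth, f"{side}x{side}")
# 1 3x3
# 2 5x5
# 3 7x7
# 4 9x9
```

So a deep enough stack of 3x3 layers can still respond to patterns that span a large part of the image.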

Comparison of Two Stories

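A filter uses the same weights while covering different regions of the image;
Sharing weights is the same thing as sliding a filter across the whole image;
Sliding a filter across the whole image is what we call convolution;
In short, sliding a filter across the whole image means that neurons with different receptive fields share their parameters, and that shared set of parameters is one filter.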

Observation 3

Subsampling the pixels will not change the object.
Shrinking a large image (removing the odd-numbered columns and the even-numbered rows) does not change the object in the image.
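As a small sketch of this kind of subsampling, the function below removes the odd-numbered columns and the even-numbered rows, assuming rows and columns are counted from 1. The 4x4 image and the name `subsample` are made up for illustration.

```python
def subsample(image):
    """Drop odd-numbered columns and even-numbered rows (counting from 1)."""
    kept_rows = image[0::2]                  # rows 1, 3, 5, ... survive
    return [row[1::2] for row in kept_rows]  # columns 2, 4, 6, ... survive

image = [[r * 4 + c for c in range(4)] for r in range(4)]
small = subsample(image)
print(small)  # [[1, 3], [9, 11]]
```

The result has a quarter of the original pixels while keeping the overall layout of the image.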

Pooling - Max Pooling

Pooling has no parameters to learn, so it is not a layer in the usual sense; some people say it is closer to an activation function (Sigmoid, ReLU).

Pooling makes the image smaller, which reduces the amount of computation.
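A minimal sketch of max pooling, assuming non-overlapping 2x2 blocks and a made-up 4x4 feature map; the function name `max_pool` is hypothetical.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]

feature_map = [
    [3, -1, -3, -1],
    [-3, 1, 0, -3],
    [-3, -3, 0, 1],
    [3, -2, -2, -1],
]
print(max_pool(feature_map))  # [[3, 0], [3, 1]]
```

The function contains no weights at all: it is a fixed operation that turns a 4x4 map into a 2x2 map.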

The whole CNN

Application: Playing Go

Why CNN for Playing Go?

Some patterns are much smaller than the whole image

The same patterns appear in different regions.

Subsampling the pixels will not change the object --> this does not hold for a Go board, so pooling is not used when playing Go

More Applications

To learn more

CNN is not invariant to scaling and rotation (we need data augmentation :) ).
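A minimal sketch of what such augmentation could look like on a toy 2-D image: the training set is extended with rotated and rescaled copies. The helper names `rotate90`, `upscale2x`, and `augment` are hypothetical, and real pipelines typically use arbitrary angles and scales rather than these two fixed transforms.

```python
def rotate90(image):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def upscale2x(image):
    """Enlarge a 2-D image by repeating every pixel twice along each axis."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def augment(image):
    """Return the original plus rotated and rescaled variants."""
    return [image, rotate90(image), upscale2x(image)]

image = [[1, 2],
         [3, 4]]
variants = augment(image)
print(variants[1])  # [[3, 1], [4, 2]]
print(variants[2])  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```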