AlexNet

Paper Reading

ImageNet Classification with Deep Convolutional Neural Networks


References: you can read this paper via BTS-DSN or star/fork this GitHub project.

Abstract

Using the ImageNet LSVRC-2010 contest dataset, this network model, which includes dropout and several other techniques, achieves excellent results.

1.Introduction

  1. Machine learning methods have achieved good results on small image datasets, but still perform poorly on large ones.

  2. To learn about thousands of objects from millions of images, we need a model with a large learning capacity.

2.The Dataset

ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories.
This section describes the dataset on which the paper trains and evaluates its model.

3.The Architecture

1.ReLU Nonlinearity

Why do we use ReLU instead of the sigmoid or tanh function?
In terms of training time with gradient descent, these saturating nonlinearities are much slower than the non-saturating nonlinearity f(x) = max(0, x).

ps : The main reason we use the ReLU function rather than sigmoid or tanh for activation is its faster training speed. Of course, we can use Leaky ReLU or even Randomized Leaky ReLU to avoid some other problems; further variants include PReLU and RReLU.
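A minimal PyTorch sketch (my own, not from the paper) that makes the saturation point concrete: for a large pre-activation value, sigmoid and tanh pass back almost no gradient, while ReLU's gradient stays at 1.

```python
import torch

# Compare the gradients of saturating and non-saturating activations
# at a large pre-activation value.
x = torch.tensor([5.0], requires_grad=True)

for name, fn in [("sigmoid", torch.sigmoid), ("tanh", torch.tanh),
                 ("relu", torch.relu)]:
    if x.grad is not None:
        x.grad.zero_()
    fn(x).backward()
    print(f"{name:7s} gradient at x=5: {x.grad.item():.4f}")

# Typical output: sigmoid ~0.0066, tanh ~0.0002, relu 1.0.
# The saturating units pass back almost no gradient, which slows
# gradient-descent training; ReLU keeps the gradient at 1 for x > 0.
```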

2.Training on Multiple GPUs

Current GPUs are particularly well-suited to cross-GPU parallelization, as they are able to read from and write to one another’s memory directly, without going through host machine memory. The paper exploits this to parallelize training across two GPUs. Note the parallelization scheme employed here, which is quite inspired: each GPU holds half of the kernels (or neurons), and the GPUs communicate only in certain layers.

ps : When parallelizing across multiple GPUs, attention must be paid to the communication and interaction between them. Improving the efficiency of data transfer between GPUs is the key to speeding up the computation.
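A toy sketch of the two-column idea, under my own simplifications (the device placement and layer shapes here are illustrative, not the paper's exact configuration): each column holds half of the kernels, and feature maps cross devices only where the architecture requires input from both columns.

```python
import torch
import torch.nn as nn

# Two columns, each holding half of the first layer's 96 kernels.
# On a two-GPU machine dev0/dev1 would be 'cuda:0'/'cuda:1';
# 'cpu' is used here so the sketch runs anywhere.
dev0, dev1 = torch.device("cpu"), torch.device("cpu")

conv1_a = nn.Conv2d(3, 48, kernel_size=11, stride=4).to(dev0)
conv1_b = nn.Conv2d(3, 48, kernel_size=11, stride=4).to(dev1)
# A layer that takes input from BOTH columns: the other column's feature
# maps must be copied across devices and concatenated first.
cross_conv = nn.Conv2d(96, 128, kernel_size=5, padding=2).to(dev0)

x = torch.randn(1, 3, 224, 224)
h_a = conv1_a(x.to(dev0))
h_b = conv1_b(x.to(dev1))
h = torch.cat([h_a, h_b.to(dev0)], dim=1)   # the only cross-device transfer
print(cross_conv(h).shape)                   # torch.Size([1, 128, 54, 54])
```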

3.Local Response Normalization

Local response normalization is used to aid generalization.

ps : This subsection mainly explains the normalization scheme. The effect the formula demonstrates is improved generalization (i.e., $b$ better represents a smoothed, continuous response). A brief look at the three hyperparameters: $k$ is an offset, giving the denominator a baseline shift; $n$ sets the range of neighboring channels considered, a trade-off governing how much generalization is applied; $\beta$ scales the strength of the normalization so that it fits the model.
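The formula in question (Section 3.3 of the paper) is $b^{i}_{x,y} = a^{i}_{x,y} \big/ \big(k + \alpha \sum_{j=\max(0,\,i-n/2)}^{\min(N-1,\,i+n/2)} (a^{j}_{x,y})^{2}\big)^{\beta}$, with $k=2$, $n=5$, $\alpha=10^{-4}$, $\beta=0.75$. A minimal PyTorch sketch using the built-in layer (note that PyTorch scales $\alpha$ by $1/n$ internally, a small difference from the paper's plain sum):

```python
import torch
import torch.nn as nn

# Local response normalization with the hyperparameters reported in the
# paper: n = 5, k = 2, alpha = 1e-4, beta = 0.75.
lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)

a = torch.randn(1, 96, 55, 55)   # activations after the first conv layer
b = lrn(a)                        # normalized activations, same shape
print(b.shape)                    # torch.Size([1, 96, 55, 55])
```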

4. Overlapping Pooling

Using overlapping pooling gives slightly better results than non-overlapping pooling. The authors also generally observe during training that models with overlapping pooling find it slightly more difficult to overfit.

ps : We should keep in mind the balance between overlapping and non-overlapping pooling. My suggestion is to start without overlapping; if the result turns out to be underfitting, switch to overlapping pooling.
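A small PyTorch sketch contrasting the paper's overlapping pooling (3x3 windows, stride 2) with traditional non-overlapping pooling (2x2 windows, stride 2); the input shape is just an example.

```python
import torch
import torch.nn as nn

# Overlapping pooling as used in the paper versus the traditional setting.
overlap = nn.MaxPool2d(kernel_size=3, stride=2)
no_overlap = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 96, 55, 55)
print(overlap(x).shape)      # torch.Size([1, 96, 27, 27])
print(no_overlap(x).shape)   # torch.Size([1, 96, 27, 27])
# Same output resolution, but each 3x3 window overlaps its neighbours,
# which the paper reports lowers top-1/top-5 error by about 0.4%/0.3%.
```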

5. Overall Architecture

The overall architecture is shown in the figure below: five convolutional layers followed by three fully-connected layers, ending in a 1000-way softmax.

ps : The main thing is to understand the intermediate parts of this architecture, and to connect that understanding with the components discussed above.
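To make the overall structure concrete, here is a single-GPU PyTorch sketch: five convolutional layers followed by three fully connected layers and a 1000-way output, with kernel counts from the paper (96, 256, 384, 384, 256 and 4096, 4096, 1000). The padding values are my own choice to make the spatial sizes work out, and the paper's two-GPU split and restricted cross-column connectivity are not reproduced here.

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Single-GPU sketch of the overall architecture."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

print(AlexNetSketch()(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```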

4.Reducing Overfitting

This model has 60 million parameters; the ILSVRC training data alone is insufficient to learn so many parameters without considerable overfitting.

1.Data Augmentation

  1. The first form of data augmentation consists of generating image translations and horizontal reflections.
  2. The second form of data augmentation consists of altering the intensities of the RGB channels in training images, by performing PCA on the set of RGB pixel values (see the sketch after the note below).

ps: Because the network is quite deep, overfitting has to be avoided. The first approach is to enlarge the amount of data, which can be done through some processing of the images, including cropping and color manipulation.
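A sketch of both augmentation forms (the torchvision pipeline and the helper function, including its name and the assumption of precomputed eigenvalues/eigenvectors, are mine; the paper's scheme is random 224x224 crops plus horizontal reflections, and PCA-based RGB intensity shifts with $\alpha \sim N(0, 0.1^2)$):

```python
import torch
from torchvision import transforms

# First form: random 224x224 crops from the 256x256 training images,
# plus horizontal reflections.
first_form = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def pca_color_jitter(img, eigvals, eigvecs, sigma=0.1):
    """Second form: add sum_i alpha_i * lambda_i * p_i to each RGB pixel.

    eigvals (3,) and eigvecs (3, 3) come from PCA over the training set's
    RGB pixel values (columns of eigvecs are the eigenvectors p_i);
    alpha is drawn once per image from N(0, sigma^2). img is a (3, H, W)
    tensor.
    """
    alpha = torch.randn(3) * sigma
    shift = eigvecs @ (alpha * eigvals)   # (3,) RGB offset
    return img + shift.view(3, 1, 1)
```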

2.Dropout

Dropout is an important topic that I cover mainly in the supplementary section, so I skip the details here. The paper simply applies the method without modifying it.

5.Details of learning

We trained our models using stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005.

The update rule for weight $w$ was:
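$$
v_{i+1} := 0.9\,v_i \;-\; 0.0005\,\epsilon\,w_i \;-\; \epsilon\left\langle \frac{\partial L}{\partial w}\,\Big|_{w_i} \right\rangle_{D_i},
\qquad
w_{i+1} := w_i + v_{i+1}
$$

Here $\epsilon$ is the learning rate (initialized at 0.01 and divided by 10 when the validation error stopped improving) and $\langle\cdot\rangle_{D_i}$ is the average over the $i$-th batch. A rough PyTorch sketch of this setting (an approximation: PyTorch's built-in SGD folds the learning rate into the momentum update slightly differently from the paper's formulation):

```python
import torch

# Stand-in model; the paper's setting is momentum 0.9, weight decay
# 0.0005, initial learning rate 0.01, batch size 128.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
```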

6.Results

This part is only summarized briefly here; you should look through the paper to see the detailed results.

7.Discussion

Depth really is important for achieving these results. From the discussion in this paper, you can see that it sparked broad interest in deep networks for image classification.

My conclusion

This paper mainly proposes the AlexNet architecture, which is built from several main ingredients:

  1. A network trained across two GPUs that exchange information with each other
  2. Dropout as a regularization method
  3. A deep, many-layer network for learning (an important starting point for deep neural networks, which strongly inspired later work)
  4. Data augmentation as preprocessing, to prevent overfitting
  5. Normalization to process the data, together with the ReLU activation function, which makes the whole computation faster without degrading the final result.

Further thoughts

1. Dropout

2. Problems of deep neural networks and how to address them