One-Hour Training: An Introduction to Neural Networks

Training series contents

➡️ Neural Networks ⬅️

Convolutional Neural Networks

Recurrent Neural Networks

Generative Adversarial Networks

Neural Networks

The simplest neural network – a single neuron

[Image: diagram of a single neuron]

  • Components:
    • Inputs: denoted x
    • Weights: denoted w
    • Bias: denoted b
    • Activation function: denoted f(h)
  • Mathematical form:
    \hat{y} = f(h) = f\left(\sum_i w_i x_i + b\right)
    where f is the activation function; a common choice is the sigmoid
    f(h) = \frac{1}{1 + e^{-h}}
    The sigmoid's output lies between 0 and 1, which behaves like a probability, so it is well suited to classification (a minimal code sketch follows the figure below).

[Image: the sigmoid activation function]
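A minimal sketch of the formula above. The input, weight, and bias values here are made-up numbers, purely for illustration:

import numpy as np

def sigmoid(h):
    """Sigmoid activation: squashes h into (0, 1)."""
    return 1 / (1 + np.exp(-h))

x = np.array([0.2, -1.0, 0.5])   # inputs (illustrative values)
w = np.array([0.4, 0.6, -0.3])   # weights (illustrative values)
b = 0.1                          # bias

h = np.dot(w, x) + b             # linear combination of inputs and weights
y_hat = sigmoid(h)               # neuron output, usable as a probability
print(y_hat)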

How do we find such a function?

Approach:

  • Train with supervised learning:
    • Have the machine find the function that minimizes the error between its outputs and the targets in the data
      • Tell the machine which function to use for measuring the error
      • Use gradient descent to update the neuron's weights (w) so that the error function is minimized

Commonly used error functions:

  • For regression: the sum of squared errors (SSE):

    E = \frac{1}{2}\sum_{\mu}\sum_{j}\left(y_j^{\mu} - \hat{y}_j^{\mu}\right)^2

    where the factor \frac{1}{2} is there purely for convenience: it cancels the exponent when differentiating. \mu indexes the rows (records) and j indexes the columns (output units). (A NumPy sketch follows this list.)

  • For classification: minimize the cross-entropy (covered later)
    [Image: cross-entropy formula]
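A minimal NumPy sketch of the SSE above. The target and prediction arrays are made-up values for illustration only:

import numpy as np

# Made-up targets y and predictions y_hat: 3 records (rows mu), 2 output units (columns j)
y     = np.array([[1.0, 0.0],
                  [0.5, 1.0],
                  [0.0, 0.2]])
y_hat = np.array([[0.8, 0.1],
                  [0.4, 0.7],
                  [0.1, 0.3]])

# Sum of squared errors: sum over records (mu) and output units (j)
sse = 0.5 * np.sum((y - y_hat) ** 2)
print(sse)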

Gradient descent – the math

Use the chain rule to take the partial derivative of the error function with respect to the weights w, and use it to update the network's weights w.
A caveat:
if we follow the gradient alone, we can get stuck in a local optimum.

[Image: gradient descent can get stuck in a local optimum]

The math:
To update a weight, take the partial derivative of the error function E with respect to that weight w and multiply it by a learning rate that controls the learning speed.
η is called the learning rate; it is the step size of each update to the weights w and controls how fast the network learns.

\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}
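A minimal one-step sketch of this update rule. The weight and gradient values are assumed numbers, purely for illustration:

learnrate = 0.5     # eta: controls the step size
w = 0.3             # current weight (illustrative value)
grad_E = 0.2        # assumed value of dE/dw, for illustration only

w += -learnrate * grad_E   # step against the gradient to reduce the error
print(w)                   # 0.2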

How to compute the partial derivative of the error function E with respect to a weight w:
Goal:

\frac{\partial E}{\partial w_i}

Apply the chain rule:

\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\,\frac{1}{2}\left(y - \hat{y}\right)^2 = -\left(y - \hat{y}\right)\frac{\partial \hat{y}}{\partial w_i}

Apply the chain rule again:

\frac{\partial \hat{y}}{\partial w_i} = f'(h)\,\frac{\partial h}{\partial w_i}

Substitute, then apply the chain rule one more time:

\frac{\partial h}{\partial w_i} = \frac{\partial}{\partial w_i}\sum_j w_j x_j = x_i

Finally, substitute back:

\frac{\partial E}{\partial w_i} = -\left(y - \hat{y}\right) f'(h)\, x_i

Final result:

\Delta w_i = \eta\left(y - \hat{y}\right) f'(h)\, x_i

Definition of the error term \delta:

\delta = \left(y - \hat{y}\right) f'(h), \qquad \Delta w_i = \eta\,\delta\, x_i

Gradient descent – code implementation

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

def sigmoid_prime(x):
    """
    Derivative of the sigmoid function
    """
    return sigmoid(x) * (1 - sigmoid(x))

learnrate = 0.5
x = np.array([1, 2, 3, 4])
y = 0.5

# Initial weights
w = np.array([0.5, -0.5, 0.3, 0.1])

### Calculate one gradient descent step for each weight
### Note: Some steps have been consolidated, so there are
###       fewer variable names than in the formulas above

# TODO: Calculate the node's linear combination of inputs and weights
h = np.dot(x, w)

# TODO: Calculate output of neural network
nn_output = sigmoid(h)

# TODO: Calculate error of neural network
error = y - nn_output

# TODO: Calculate the error term
#       Remember, this requires the output gradient, which we haven't
#       specifically added a variable for.
error_term = error * sigmoid_prime(h)
# Note: The sigmoid_prime function calculates sigmoid(h) twice,
#       but you've already calculated it once. You can make this
#       code more efficient by calculating the derivative directly
#       rather than calling sigmoid_prime, like this:
# error_term = error * nn_output * (1 - nn_output)

# TODO: Calculate change in weights
del_w = learnrate * error_term * x

print('Neural Network output:')
print(nn_output)
print('Amount of Error:')
print(error)
print('Change in Weights:')
print(del_w)
Neural Network output:
0.689974481128
Amount of Error:
-0.189974481128
Change in Weights:
[-0.02031869 -0.04063738 -0.06095608 -0.08127477]

Training procedure

Iterate until the error is minimized:

  1. Forward pass to get the prediction: move through the network with matrix dot products to compute the prediction \hat y
  2. Backward pass to get each layer's error gradient: use \hat y to evaluate the error function, then propagate the error backwards.
  3. Update the weights: adjust the weights according to the error (a runnable skeleton of this loop is sketched below)
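A compact, runnable sketch of this loop for a single neuron on a tiny made-up dataset. The X and y values below are illustrative only; the full admissions example follows:

import numpy as np

def sigmoid(h):
    return 1 / (1 + np.exp(-h))

# Tiny made-up dataset: 3 records, 2 features (illustrative values only)
X = np.array([[ 0.5,  1.0],
              [ 1.5, -0.5],
              [-1.0,  2.0]])
y = np.array([1, 0, 1])

weights = np.zeros(X.shape[1])
learnrate = 0.5

for e in range(100):
    for x_row, target in zip(X, y):
        y_hat = sigmoid(np.dot(x_row, weights))               # 1. forward pass: prediction
        error_term = (target - y_hat) * y_hat * (1 - y_hat)   # 2. backward pass: error term
        weights += learnrate * error_term * x_row             # 3. update the weights
print(weights)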

Example: predicting graduate school admission – single-neuron version

[Image: single-neuron network for the admissions data]

Load the raw data

import numpy as np
import pandas as pd
admissions=pd.read_csv("entry_admission.csv")
admissions.head()

   admit  gre   gpa  rank
0      0  380  3.61     3
1      1  660  3.67     3
2      1  800  4.00     1
3      1  640  3.19     4
4      0  520  2.93     4

Data preprocessing

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(42)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']
features.head()

          gre       gpa  rank_1  rank_2  rank_3  rank_4
209 -0.066657  0.289305       0       1       0       0
280  0.625884  1.445476       0       1       0       0
33   1.837832  1.603135       0       0       1       0
210  1.318426 -0.131120       0       0       0       1
93  -0.066657 -1.208461       0       1       0       0

targets.head()
209    0
280    0
33     1
210    0
93     0
Name: admit, dtype: int64

Single-neuron version

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))

# TODO: We haven't provided the sigmoid_prime function like we did in
#       the previous lesson to encourage you to come up with a more
#       efficient solution. If you need a hint, check out the comments
#       in solution.py from the previous lecture.

# Use the same seed to make debugging easier
np.random.seed(42)

n_records, n_features = features.shape
last_loss = None

# Initialize weights
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

# Neural Network hyperparameters
epochs = 1000
learnrate = 0.5

for e in range(epochs):  # number of training epochs
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # Loop through all records, x is the input, y is the target

        # Note: We haven't included the h variable from the previous
        #       lesson. You can add it if you want, or you can calculate
        #       the h together with the output

        # TODO: Calculate the output
        output = sigmoid(np.dot(x,weights))

        # TODO: Calculate the error
        error = y-output

        # TODO: Calculate the error term
        error_term = error*output*(1-output)

        # TODO: Calculate the change in weights for this sample
        #       and add it to the total weight change
        del_w += error_term*x

    # TODO: Update weights using the learning rate and the average change in weights
    weights += learnrate*del_w

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss


# Calculate accuracy on test data
test_out = sigmoid(np.dot(features_test, weights))
predictions = test_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
Train loss:  0.286196010415
Train loss:  0.257761346594
Train loss:  0.257722034703
Train loss:  0.257722749419   WARNING - Loss Increasing
Train loss:  0.257722752361   WARNING - Loss Increasing
Train loss:  0.257722752309
Train loss:  0.257722752309
Train loss:  0.257722752309   WARNING - Loss Increasing
Train loss:  0.257722752309   WARNING - Loss Increasing
Train loss:  0.257722752309   WARNING - Loss Increasing
Prediction accuracy: 0.725

Neural networks – built from neurons

A neural network is simply neurons with non-linear activation functions connected together.
The composition yields non-linear functions, which gives the network the capacity to fit a wide variety of functions.

[Image: a multi-layer neural network]

Mathematical form:
matrix multiplication; each hidden layer is a weight matrix (bias still to be added to the figure)

\mathbf{h} = f\left(\mathbf{x}\, W_{\text{input}\to\text{hidden}}\right), \qquad \hat{\mathbf{y}} = f\left(\mathbf{h}\, W_{\text{hidden}\to\text{output}}\right)

Neural network – in code

import numpy as np

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1/(1+np.exp(-x))

# Network size
N_input = 4
N_hidden = 3
N_output = 2

np.random.seed(42)
# Make some fake data
X = np.random.randn(N_input)

weights_input_to_hidden = np.random.normal(0, scale=0.1, size=(N_input, N_hidden))
weights_hidden_to_output = np.random.normal(0, scale=0.1, size=(N_hidden, N_output))

# TODO: Make a forward pass through the network (matrix dot products)
hidden_layer_in = np.dot(X, weights_input_to_hidden)
hidden_layer_out = sigmoid(hidden_layer_in)

print('Hidden-layer Output:')
print(hidden_layer_out)

output_layer_in = np.dot(hidden_layer_out, weights_hidden_to_output)
output_layer_out = sigmoid(output_layer_in)

print('Output-layer Output:')
print(output_layer_out)
Hidden-layer Output:
[ 0.41492192  0.42604313  0.5002434 ]
Output-layer Output:
[ 0.49815196  0.48539772]

Backpropagation – propagate the error gradient backwards to every neuron in the network to update the weights w

How backpropagation is computed:
start from the gradient at the last layer and apply the chain rule backwards to compute each layer's gradient

[Image: backpropagating the error layer by layer]

Formulas:
The error at layer j:

\delta_j = \left(\sum_k w_{jk}\,\delta_k\right) f'(h_j)

Here the Σ means that if the next layer (layer k) has more than one neuron, the errors propagated back from each of them are summed.
The update for each weight w feeding layer j:

\Delta w_{ij} = \eta\,\delta_j\, x_i

import numpy as np


def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


x = np.array([0.5, 0.1, -0.2])
target = 0.6
learnrate = 0.5

weights_input_hidden = np.array([[0.5, -0.6],
                                 [0.1, -0.2],
                                 [0.1, 0.7]])

weights_hidden_output = np.array([0.1, -0.3])

## Forward pass
hidden_layer_input = np.dot(x, weights_input_hidden)
hidden_layer_output = sigmoid(hidden_layer_input)

output_layer_in = np.dot(hidden_layer_output, weights_hidden_output)
output = sigmoid(output_layer_in)

## Backwards pass
## TODO: Calculate error
error = target - output

# TODO: Calculate error gradient for output layer
del_err_output = error * output * (1 - output)
# TODO: Calculate change in weights for hidden layer to output layer
delta_weights_hidden_output = learnrate * del_err_output * hidden_layer_output


# TODO: Calculate error gradient for hidden layer
del_err_hidden = np.dot(del_err_output, weights_hidden_output) * \
                 hidden_layer_output * (1 - hidden_layer_output)
# TODO: Calculate change in weights for input layer to hidden layer
delta_weights_input_hidden = learnrate * del_err_hidden * x[:, None]

print('Change in weights for hidden layer to output layer:')
print(delta_weights_hidden_output)
print('Change in weights for input layer to hidden layer:')
print(delta_weights_input_hidden)

Change in weights for hidden layer to output layer:
[ 0.00804047  0.00555918]
Change in weights for input layer to hidden layer:
[[  1.77005547e-04  -5.11178506e-04]
 [  3.54011093e-05  -1.02235701e-04]
 [ -7.08022187e-05   2.04471402e-04]]

Training procedure, revisited

Iterate until the error is minimized:

  1. Forward pass to get the prediction: move through the network with matrix dot products to compute the prediction \hat y
  2. Backward pass to get each layer's error gradient: use \hat y to evaluate the error function, then propagate the error backwards.
  3. Update the weights: adjust each layer's weights according to that layer's error

Example: predicting graduate school admission – neural network version

Data preprocessing

import numpy as np
import pandas as pd

admissions = pd.read_csv('entry_admission.csv')

# Make dummy variables for rank
data = pd.concat([admissions, pd.get_dummies(admissions['rank'], prefix='rank')], axis=1)
data = data.drop('rank', axis=1)

# Standardize features
for field in ['gre', 'gpa']:
    mean, std = data[field].mean(), data[field].std()
    data.loc[:,field] = (data[field]-mean)/std
    
# Split off random 10% of the data for testing
np.random.seed(21)
sample = np.random.choice(data.index, size=int(len(data)*0.9), replace=False)
data, test_data = data.loc[sample], data.drop(sample)

# Split into features and targets
features, targets = data.drop('admit', axis=1), data['admit']
features_test, targets_test = test_data.drop('admit', axis=1), test_data['admit']

Neural network version

import numpy as np

np.random.seed(21)

def sigmoid(x):
    """
    Calculate sigmoid
    """
    return 1 / (1 + np.exp(-x))


# Hyperparameters
n_hidden = 2  # number of hidden units
epochs = 900
learnrate = 0.005

n_records, n_features = features.shape
last_loss = None
# Initialize weights
weights_input_hidden = np.random.normal(scale=1 / n_features ** .5,
                                        size=(n_features, n_hidden))
weights_hidden_output = np.random.normal(scale=1 / n_features ** .5,
                                         size=n_hidden)

for e in range(epochs):
    del_w_input_hidden = np.zeros(weights_input_hidden.shape)
    del_w_hidden_output = np.zeros(weights_hidden_output.shape)
    for x, y in zip(features.values, targets):
        ## Forward pass ##
        # TODO: Calculate the output
        hidden_input = np.dot(x, weights_input_hidden)
        hidden_output = sigmoid(hidden_input)

        output = sigmoid(np.dot(hidden_output,
                                weights_hidden_output))

        ## Backward pass ##
        # TODO: Calculate the network's prediction error
        error = y - output

        # TODO: Calculate error term for the output unit
        output_error_term = error * output * (1 - output)

        ## propagate errors to hidden layer

        # TODO: Calculate the hidden layer's contribution to the error
        hidden_error = np.dot(output_error_term, weights_hidden_output)

        # TODO: Calculate the error term for the hidden layer
        hidden_error_term = hidden_error * hidden_output * (1 - hidden_output)

        # TODO: Update the change in weights
        del_w_hidden_output += output_error_term * hidden_output
        del_w_input_hidden += hidden_error_term * x[:,None]

    # TODO: Update weights
    weights_input_hidden += learnrate * del_w_input_hidden / n_records
    weights_hidden_output += learnrate * del_w_hidden_output / n_records

    # Printing out the mean square error on the training set
    if e % (epochs / 10) == 0:
        hidden_output = sigmoid(np.dot(features, weights_input_hidden))
        out = sigmoid(np.dot(hidden_output,
                             weights_hidden_output))
        loss = np.mean((out - targets) ** 2)

        if last_loss and last_loss < loss:
            print("Train loss: ", loss, "  WARNING - Loss Increasing")
        else:
            print("Train loss: ", loss)
        last_loss = loss

# Calculate accuracy on test data
hidden = sigmoid(np.dot(features_test, weights_input_hidden))
out = sigmoid(np.dot(hidden, weights_hidden_output))
predictions = out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
Train loss:  0.245943442947
Train loss:  0.224108177301
Train loss:  0.228908195703   WARNING - Loss Increasing
Train loss:  0.230352461418   WARNING - Loss Increasing
Train loss:  0.230651907986   WARNING - Loss Increasing
Train loss:  0.230865845199   WARNING - Loss Increasing
Train loss:  0.231183108301   WARNING - Loss Increasing
Train loss:  0.231499116961   WARNING - Loss Increasing
Train loss:  0.231737211823   WARNING - Loss Increasing
Train loss:  0.231882889013   WARNING - Loss Increasing
Prediction accuracy: 0.750

About me:

linxinzhe, full-stack engineer, currently working in the fintech division (AI, blockchain) of a Fortune Global 500 bank.

GitHub: https://github.com/linxinzhe

Comments and discussion are welcome, and feel free to follow me ~
I'll follow you back!

    Original author: linxinzhe
    Original article: https://www.jianshu.com/p/b9c35ffaf848