Neural Networks
Good explanatory links on neural networks:
Neural network basics and implementation (Python)
Neural network implementation (C++)
I. Fundamentals
A neural network is a broadly parallel, interconnected network composed of simple adaptive units.
- Perceptron:
The perceptron is a linear classification model.
A perceptron consists of only two layers of neurons, and only its output layer is made of M-P neurons, i.e. functional neurons.
It has a primal form and a dual form; the iterative update of the dual form is worth understanding. The backpropagation (BP) algorithm applies to multi-layer feedforward networks, and can also be used to train recurrent neural networks.
When people speak of "the BP algorithm", they usually mean training a multi-layer feedforward network with it. Basic deep-learning terms:
1. Convolutional neural network (CNN): a CNN composes multiple convolutional layers and pooling (subsampling) layers to process the input signal, and finally realizes the mapping to the output targets in the fully connected layers.
2. Convolutional layer: contains multiple feature maps, each of which is a plane made up of many neurons.
3. Pooling layer: subsamples based on local correlation, reducing the amount of data while keeping the useful information. Seen from another angle, the network lets the machine replace the expert's hand-crafted "feature engineering".
Activation functions of neural networks:
1. Logistic: the classic sigmoid activation function, very useful when computing classification probabilities. \[f(z)=\frac{1}{1+e^{-z}},\quad 0<f(z)<1\]
2. Tanh: \[f(z)=\tanh(z)=\frac{e^{z}-e^{-z}}{e^{z}+e^{-z}},\quad -1<f(z)<1\]
3. ReLU: the rectified linear unit. Its main purpose is to fight the vanishing-gradient problem: by the time the gradient has been backpropagated to the first layer, it easily shrinks to 0 or a very small value. \[f(z)=\max(0,z)\]
Convolutional neural networks (CNN):
Convolution: the blending of two operations along the time axis. \[(f*g)(t)=\int_{-\infty}^{\infty}f(\tau)g(t-\tau)\,d\tau\]
Convolution extends to the discrete domain, where it reads \[(f*g)\left[n\right]=\sum_{m=-\infty}^{\infty}f(m)g(n-m)\]
The most important ingredient of the convolution operation is the kernel: the kernel is multiplied element-wise with each local patch of points and summed, and the result becomes one element of the next layer, as in the sketch below.
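A minimal sketch of this discrete operation in plain NumPy (the input and kernel values here are made up for illustration; like most CNN frameworks, this slides the kernel without flipping it, i.e. cross-correlation):

import numpy as np

def conv2d(x, k):
    """Valid 2D convolution: slide kernel k over x, multiply
    element-wise with each patch and sum to get one output element."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input
k = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 kernel
print(conv2d(x, k))                            # 3x3 feature map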
II. Line of Thought
- 1. Use the training data to adjust the connection weights between neurons and the threshold of each functional neuron. In other words, everything the network learns is contained in the connection weights and thresholds.
- 2. Determine the parameters by iterative updates that adjust the perceptron's (network's) weights (see the sketch after this list): \(\omega_{i}\leftarrow \omega_{i}+\Delta \omega_{i}\), with \(\Delta \omega_{i}=\eta(y-\hat{y})x_{i}\).
- 3. Feed an input example to the input-layer neurons and propagate the signal forward layer by layer until the output layer produces a result.
- 4. Compute the output-layer error, then propagate the error backward to the hidden-layer neurons.
- 5. Finally, adjust the connection weights and thresholds according to the hidden-layer neurons' errors, and repeat this loop.
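A minimal sketch of the update rule from step 2, run on a made-up AND-gate dataset (data, learning rate, and epoch count are illustrative):

import numpy as np

# AND-gate training data (illustrative)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)   # connection weights
theta = 0.0       # threshold
eta = 0.1         # learning rate

for _ in range(50):
    for xi, yi in zip(X, y):
        y_hat = float(np.dot(w, xi) - theta >= 0)   # step activation
        w += eta * (yi - y_hat) * xi    # w_i <- w_i + eta*(y - y_hat)*x_i
        theta -= eta * (yi - y_hat)     # threshold = weight of a fixed -1 input

print(w, theta)   # converges to a separating line for AND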
III. Algorithm Derivation
The BP algorithm:
Training set: \(D=\{(x_{1},y_{1}),(x_{2},y_{2}),\dots,(x_{m},y_{m})\}\)
Input: d attributes
Output: an l-dimensional real-valued vector, with output-layer thresholds \(\theta_{j}\)
Hidden layer: q hidden neurons, with thresholds \(\gamma_{h}\)
Hidden output \(b_{h}=f(\alpha_{h}-\gamma_{h})\) and network output \(\hat{y}_{j}=f(\beta_{j}-\theta_{j})\), where \(\alpha_{h}=\sum_{i=1}^{d}v_{ih}x_{i}\) and \(\beta_{j}=\sum_{h=1}^{q}\omega_{hj}b_{h}\) are the inputs received by hidden neuron h and output neuron j (both layers use the same activation f, see below).
Update estimate for any parameter \(v\):
\[v \leftarrow v+\Delta v\]
The BP algorithm adjusts the parameters based on a gradient-descent strategy. Supplementary background:
Gradient descent
Gradient descent is a common first-order optimization method and one of the simplest, most classic ways to solve unconstrained optimization problems.
If f(x) is continuously differentiable and each step satisfies \[f(x^{t+1})<f(x^{t}),\quad t=0,1,2,\dots\]
then repeating the process converges to a local minimum. By a first-order Taylor expansion, \[f(x+\Delta x)\simeq f(x)+\Delta x^{T}\nabla f(x)\]
To make \(f(x+\Delta x)<f(x)\), we can choose \[\Delta x=-\gamma \nabla f(x)\] where \(\gamma\) is the step size, a small constant.
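A minimal gradient-descent sketch on a toy objective f(x) = x² (the starting point, step size, and iteration count are chosen only for illustration):

def f(x):      return x ** 2      # toy objective
def grad_f(x): return 2 * x       # its gradient

x = 5.0                            # starting point x^0
gamma = 0.1                        # step size, a small constant
for t in range(100):
    x = x - gamma * grad_f(x)      # x^{t+1} = x^t - gamma * grad f(x^t)
print(x)                           # approaches the minimum at 0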
Objective function: \(E_{k}=\frac{1}{2}\sum_{j=1}^{l}(\hat{y}_{j}^{k}-y_{j}^{k})^{2}\); we minimize this objective.
Deriving the update formula for \(\Delta v_{ih}\):
Differentiating the objective function by the chain rule gives the hidden-layer gradient term
\[e_{h}=-\frac{\partial E_{k}}{\partial b_{h}}\cdot \frac{\partial b_{h}}{\partial \alpha_{h}}=-\sum_{j=1}^{l}\frac{\partial E_{k}}{\partial \beta_{j}}\cdot \frac{\partial \beta_{j}}{\partial b_{h}}\,{f}'(\alpha_{h}-\gamma_{h})=\sum_{j=1}^{l}\omega_{hj}g_{j}\,{f}'(\alpha_{h}-\gamma_{h})=b_{h}(1-b_{h})\sum_{j=1}^{l}\omega_{hj}g_{j}\]
so that \(\Delta v_{ih}=\eta e_{h}x_{i}\). The last step uses the fact that the hidden layer and the output layer share the same sigmoid activation, whose derivative is \({f}'(x)=f(x)(1-f(x))\).
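For reference, the remaining update formulas take the same form. With the output-layer gradient term \(g_{j}=\hat{y}_{j}^{k}(1-\hat{y}_{j}^{k})(y_{j}^{k}-\hat{y}_{j}^{k})\), the watermelon book's Eqs. (5.11)-(5.14) give:
\[\Delta \omega_{hj}=\eta g_{j}b_{h},\qquad \Delta \theta_{j}=-\eta g_{j},\qquad \Delta v_{ih}=\eta e_{h}x_{i},\qquad \Delta \gamma_{h}=-\eta e_{h}\]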
Global minimum vs. local minimum
The whole algorithm is really a parameter-search process: finding one optimal set of parameters.
IV. Programming It
- BP algorithm: train a single-hidden-layer neural network on watermelon dataset 3.0
Pseudocode:
Input: training set
       learning rate
Process:
1. randomly initialize all connection weights and thresholds in (0,1)
2. repeat
3.   for all (x_k, y_k) do
4.     compute the current sample's output from the current parameters
5.     compute the gradient term of each output-layer neuron
6.     compute the gradient term of each hidden-layer neuron
7.     update the connection weights and thresholds
8.   end for
9. until the stopping condition is met
Output: the multi-layer feedforward network determined by the connection weights and thresholds
Note the distinction between the standard BP algorithm and accumulated BP (accumulated error backpropagation):
Accumulated BP: updates the parameters only after reading through the whole training set once.
Standard BP: updates after each single training example. A minimal sketch of the difference follows.
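The sketch below contrasts the two update schedules on a made-up one-parameter linear model (data, learning rate, and loss are illustrative stand-ins for the network's parameters and error):

import numpy as np

# toy data for a 1-parameter model y = w*x, squared-error loss
X = np.array([1.0, 2.0, 3.0]); Y = np.array([2.0, 4.0, 6.0])
eta = 0.05

# Standard BP: update the parameter after every single example
w = 0.0
for x, y in zip(X, Y):
    grad = (w * x - y) * x      # d/dw of (1/2)(w*x - y)^2
    w -= eta * grad             # immediate update
print(w)

# Accumulated BP: accumulate over the whole set, then update once
w = 0.0
grad = sum((w * x - y) * x for x, y in zip(X, Y))
w -= eta * grad                 # one update per pass over the data
print(w)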
# input() function
# Load watermelon dataset 3.0
# (note: this name shadows the Python builtin input())
def input():
    """
    @param : none or filepath
    @return : dataSet, a pandas DataFrame
    """
    try:
        import pandas as pd
    except ImportError:
        raise ImportError("pandas is required to load the dataset")
    with open('/home/dengshuo/GithubCode/ML/CH05/watermelon3.csv') as data_file:
        df = pd.read_csv(data_file)
    return df
# learningRatio() function
# Initialize the learning rate
def learningRatio():
    """
    @return : learningRatio, a random double from random.uniform()
    """
    import random
    learningRatio = random.uniform(0, 1)
    return learningRatio
ratio = learningRatio()
print(ratio)
input()
0.8475765311660175
编号 | 色泽 | 根蒂 | 敲声 | 纹理 | 脐部 | 触感 | 密度 | 含糖率 | 好瓜 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 青绿 | 蜷缩 | 浊响 | 清晰 | 凹陷 | 硬滑 | 0.697 | 0.460 | 是 |
1 | 2 | 乌黑 | 蜷缩 | 沉闷 | 清晰 | 凹陷 | 硬滑 | 0.774 | 0.376 | 是 |
2 | 3 | 乌黑 | 蜷缩 | 浊响 | 清晰 | 凹陷 | 硬滑 | 0.634 | 0.264 | 是 |
3 | 4 | 青绿 | 蜷缩 | 沉闷 | 清晰 | 凹陷 | 硬滑 | 0.608 | 0.318 | 是 |
4 | 5 | 浅白 | 蜷缩 | 浊响 | 清晰 | 凹陷 | 硬滑 | 0.556 | 0.215 | 是 |
5 | 6 | 青绿 | 稍蜷 | 浊响 | 清晰 | 稍凹 | 软粘 | 0.403 | 0.237 | 是 |
6 | 7 | 乌黑 | 稍蜷 | 浊响 | 稍糊 | 稍凹 | 软粘 | 0.481 | 0.149 | 是 |
7 | 8 | 乌黑 | 稍蜷 | 浊响 | 清晰 | 稍凹 | 硬滑 | 0.437 | 0.211 | 是 |
8 | 9 | 乌黑 | 稍蜷 | 沉闷 | 稍糊 | 稍凹 | 硬滑 | 0.666 | 0.091 | 否 |
9 | 10 | 青绿 | 硬挺 | 清脆 | 清晰 | 平坦 | 软粘 | 0.243 | 0.267 | 否 |
10 | 11 | 浅白 | 硬挺 | 清脆 | 模糊 | 平坦 | 硬滑 | 0.245 | 0.057 | 否 |
11 | 12 | 浅白 | 蜷缩 | 浊响 | 模糊 | 平坦 | 软粘 | 0.343 | 0.099 | 否 |
12 | 13 | 青绿 | 稍蜷 | 浊响 | 稍糊 | 凹陷 | 硬滑 | 0.639 | 0.161 | 否 |
13 | 14 | 浅白 | 稍蜷 | 沉闷 | 稍糊 | 凹陷 | 硬滑 | 0.657 | 0.198 | 否 |
14 | 15 | 乌黑 | 稍蜷 | 浊响 | 清晰 | 稍凹 | 软粘 | 0.360 | 0.370 | 否 |
15 | 16 | 浅白 | 蜷缩 | 浊响 | 模糊 | 平坦 | 硬滑 | 0.593 | 0.042 | 否 |
16 | 17 | 青绿 | 蜷缩 | 沉闷 | 稍糊 | 稍凹 | 硬滑 | 0.719 | 0.103 | 否 |
17 | 18 | 青绿 | 蜷缩 | 浊响 | 清晰 | 凹陷 | 硬滑 | 0.697 | 0.460 | NaN |
# outputlayer() function
# Compute the output-layer values Yk
def outputlayer(df):
    """
    @param df: the pandas DataFrame
    @return Yk: the output
    """
    pass  # left as a stub; the forward computation lives in BP_network.Pred below
# The sheer number of parameters is a headache.
# define class()
# Define the neural network structure: the skeleton of the whole algorithm.
'''
the definition of the BP network class
'''
class BP_network:
    def __init__(self):
        '''
        initialize variables
        '''
        # node number of each layer
        self.i_n = 0
        self.h_n = 0
        self.o_n = 0
        # output value of each layer
        self.i_v = []
        self.h_v = []
        self.o_v = []
        # parameters (w, t)
        self.ih_w = []  # weight of each link
        self.ho_w = []
        self.h_t = []   # threshold of each neuron
        self.o_t = []
        # alternative activation functions and their derivatives
        self.fun = {
            'Sigmoid': Sigmoid,                  # logistic function
            'SigmoidDerivate': SigmoidDerivate,
            'Tanh': Tanh,                        # hyperbolic tangent
            'TanhDerivate': TanhDerivate,
        }
# CreateNN() function
# Fill in the network skeleton
def CreateNN(self, ni, nh, no, actfun):
    '''
    build the BP network structure and initialize parameters
    @param ni, nh, no: the neuron number of each layer
    @param actfun: string, the name of the activation function
    '''
    import numpy as np
    # assign the node number of each layer
    self.i_n = ni
    self.h_n = nh
    self.o_n = no
    # initial output value of each layer
    self.i_v = np.zeros(self.i_n)
    self.h_v = np.zeros(self.h_n)
    self.o_v = np.zeros(self.o_n)
    # initialize the weight of each link randomly in (0,1)
    self.ih_w = np.zeros([self.i_n, self.h_n])
    self.ho_w = np.zeros([self.h_n, self.o_n])
    for i in range(self.i_n):
        for h in range(self.h_n):
            self.ih_w[i][h] = rand(0, 1)   # rand() is defined below
    for h in range(self.h_n):
        for j in range(self.o_n):
            self.ho_w[h][j] = rand(0, 1)
    # initialize the threshold of each neuron
    self.h_t = np.zeros(self.h_n)
    self.o_t = np.zeros(self.o_n)
    for h in range(self.h_n): self.h_t[h] = rand(0, 1)
    for j in range(self.o_n): self.o_t[j] = rand(0, 1)
    # pick the activation function and its derivative by name;
    # no extra import is needed because self.fun maps the names
    # to the plain Python functions defined later in this file
    self.af = self.fun[actfun]
    self.afd = self.fun[actfun + 'Derivate']
# Definition of the random-value function
'''
the definition of the random function
'''
def rand(a, b):
    '''
    random value generation for parameter initialization
    @param a, b: the lower and upper limits of the random value
    '''
    from random import random
    return (b - a) * random() + a
# define the needed functions
# some activation functions
'''
the definition of the activation functions
'''
def Sigmoid(x):
    '''
    sigmoid function; its derivative is SigmoidDerivate below
    '''
    from math import exp
    return 1.0 / (1.0 + exp(-x))
def SigmoidDerivate(y):
    # derivative expressed in terms of the output y = Sigmoid(x)
    return y * (1 - y)
def Tanh(x):
    '''
    tanh function; its derivative is TanhDerivate below
    '''
    from math import tanh
    return tanh(x)
def TanhDerivate(y):
    # derivative expressed in terms of the output y = tanh(x)
    return 1 - y*y
# predict process through the network
# compute one forward pass
def Pred(self, x):
    '''
    @param x: the input array for the input layer
    '''
    # activate the input layer
    for i in range(self.i_n):
        self.i_v[i] = x[i]
    # activate the hidden layer
    for h in range(self.h_n):
        total = 0.0
        for i in range(self.i_n):
            total += self.i_v[i] * self.ih_w[i][h]
        self.h_v[h] = self.af(total - self.h_t[h])
    # activate the output layer
    for j in range(self.o_n):
        total = 0.0
        for h in range(self.h_n):
            total += self.h_v[h] * self.ho_w[h][j]
        self.o_v[j] = self.af(total - self.o_t[j])
**One remaining question: in what form should the loaded watermelon data be fed into the network?
How should the dataset's discrete attributes be handled? For example, 色泽 {青绿, 乌黑, 浅白} = {0, 1, 2}??
If not that way, how else can computation on discrete attributes be implemented?** A sketch of the common options follows.
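As a hedged sketch of the usual answers: map each discrete attribute either to integer codes or to one-hot columns with pandas. The column names follow the printed watermelon table above; the exact codes chosen are an assumption for illustration:

import pandas as pd

df = input()   # the loader defined above (shadows the builtin)

# Option 1: integer codes, e.g. 色泽 {青绿, 乌黑, 浅白} -> {0, 1, 2}
df['色泽'] = df['色泽'].map({'青绿': 0, '乌黑': 1, '浅白': 2})

# Option 2: one-hot encoding, one 0/1 input neuron per category value
df = pd.get_dummies(df, columns=['根蒂', '敲声', '纹理', '脐部', '触感'])

# Label: 好瓜 {是, 否} -> {1, 0}
df['好瓜'] = df['好瓜'].map({'是': 1, '否': 0})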
# the implementation of the BP algorithm on one sample
# BackPropagate() function
# backward pass: compute the gradients and update the parameters
def BackPropagate(self, x, y, lr):
    '''
    @param x, y: arrays, input and output of one data sample
    @param lr: float, the learning rate of the gradient-descent iteration
    '''
    import numpy as np
    # get the current network output
    self.Pred(x)
    # calculate the gradient term of each output-layer neuron,
    # g_j in watermelon book Section 5.3, Eq. (5.10)
    o_grid = np.zeros(self.o_n)
    for j in range(self.o_n):
        o_grid[j] = (y[j] - self.o_v[j]) * self.afd(self.o_v[j])
        # self.afd() supplies the y_k(1 - y_k) factor
    # calculate the gradient term e_h of each hidden-layer neuron
    h_grid = np.zeros(self.h_n)
    for h in range(self.h_n):
        for j in range(self.o_n):
            h_grid[h] += self.ho_w[h][j] * o_grid[j]
        h_grid[h] = h_grid[h] * self.afd(self.h_v[h])
        # self.afd() supplies the b_h(1 - b_h) factor
    # update the parameters
    for h in range(self.h_n):
        for j in range(self.o_n):
            # update formulas
            self.ho_w[h][j] += lr * o_grid[j] * self.h_v[h]
    for i in range(self.i_n):
        for h in range(self.h_n):
            self.ih_w[i][h] += lr * h_grid[h] * self.i_v[i]
    for j in range(self.o_n):
        self.o_t[j] -= lr * o_grid[j]
    for h in range(self.h_n):
        self.h_t[h] -= lr * h_grid[h]
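As a sanity check on these gradient terms, here is a minimal finite-difference comparison on a toy 1-1-1 sigmoid network (all numbers are made up; this verifies the algebra of the update formulas, not the class above):

from math import exp

def sig(z): return 1.0 / (1.0 + exp(-z))

def forward(v, gamma, w, theta, x):
    b = sig(v * x - gamma)           # hidden output
    return b, sig(w * b - theta)     # network output

def error(v, gamma, w, theta, x, y):
    _, y_hat = forward(v, gamma, w, theta, x)
    return 0.5 * (y_hat - y) ** 2

v, gamma, w, theta, x, y = 0.3, 0.1, -0.4, 0.2, 1.5, 1.0
b, y_hat = forward(v, gamma, w, theta, x)

# analytic dE/dw = (y_hat - y) * y_hat * (1 - y_hat) * b
analytic = (y_hat - y) * y_hat * (1 - y_hat) * b

# central finite-difference approximation of dE/dw
eps = 1e-6
numeric = (error(v, gamma, w + eps, theta, x, y)
           - error(v, gamma, w - eps, theta, x, y)) / (2 * eps)
print(analytic, numeric)   # should agree to ~6 decimal places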
# define the TrainStandard() function
# standard BP over one pass of the data, returning the accumulated error
def TrainStandard(self, data_in, data_out, lr=0.05):
    '''
    @param lr: learning rate, default 0.05
    @param data_in: the network's input data
    @param data_out: the target data of the output layer
    @return: e, accumulated error
    @return: e_k, error array of each step
    '''
    e_k = []
    for k in range(len(data_in)):
        x = data_in[k]
        y = data_out[k]
        self.BackPropagate(x, y, lr)
        # squared error on the training sample at each step
        y_delta2 = 0.0
        for j in range(self.o_n):
            y_delta2 += (self.o_v[j] - y[j]) * (self.o_v[j] - y[j])
        e_k.append(y_delta2 / 2)
    # total error of training:
    # compute the accumulated error, which is what we minimize
    e = sum(e_k) / len(e_k)
    return e, e_k
# return predicted labels: good melon is 1, bad melon is 0
def PredLabel(self, X):
    '''
    predict process through the network
    @param X: the input sample set for the input layer
    @return: y, array, output set (0/1 classes) based on winner-takes-all
    '''
    import numpy as np
    y = []
    for m in range(len(X)):
        self.Pred(X[m])
        if self.o_v[0] > 0.5: y.append(1)
        else: y.append(0)
        # max_y = self.o_v[0]
        # label = 0
        # for j in range(1, self.o_n):
        #     if max_y < self.o_v[j]: label = j
        # y.append(label)
    return np.array(y)
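A hedged end-to-end sketch of wiring the pieces together. It assumes the methods above are assembled into the BP_network class and the discrete columns were encoded as in the earlier sketch; the hidden size, learning rate, and stopping threshold are illustrative:

import numpy as np

df = df.dropna(subset=['好瓜'])    # drop the unlabeled last row
data_in = df.drop(columns=['编号', '好瓜']).values.astype(float)
data_out = df['好瓜'].values.reshape(-1, 1)

nn = BP_network()
nn.CreateNN(ni=data_in.shape[1], nh=5, no=1, actfun='Sigmoid')

for epoch in range(1000):          # standard BP: one pass per epoch
    e, e_k = nn.TrainStandard(data_in, data_out, lr=0.1)
    if e < 0.01:                   # illustrative stopping condition
        break

print(nn.PredLabel(data_in))       # 1 = good melon, 0 = bad melon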
4.2 Implementing BP with TensorFlow
First, learn how to implement the BP algorithm this way.
Modeling automobile fuel efficiency: a nonlinear regression. We build a feedforward network with multivariate input and a single output.
1. Dataset description and loading
This is a famous, standard dataset. It is a very simple example; the point is to understand the main steps and methods.
Because the dataset comes prepackaged, no detailed data analysis is needed here.
Normally, a dataset would get visualization and a detailed analysis first.
2. Data preprocessing
Preprocessing is usually also done by calling functions from the sklearn package directly,
namely the preprocessing module in sklearn:
sklearn.preprocessing.StandardScaler
# Standardize features by removing the mean and scaling to unit variance
scaler=preprocessing.StandardScaler()
X_train=scaler.fit_transform(X_train)
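One pitfall worth flagging here: the test set must be transformed with the statistics fitted on the training set, never re-fitted. The variable name below is illustrative; the assembled script instead calls scaler.transform(X_test) inline at evaluation time:

X_test_scaled = scaler.transform(X_test)   # reuse training mean/variance; do not fit again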
This is, at my current stage, the hardest and least mechanical part of algorithm work:
processing the data so that it satisfies the algorithm's input requirements.
Usually it is the data that gets processed to fit the input conditions, i.e. the data moves toward the algorithm.
Is there ever the reverse, the algorithm moving toward the data? Isn't that just the initial algorithm-selection problem?
3. Model architecture
A feedforward network with multiple inputs, two hidden layers, and a single output:
seven input nodes, 10 units in the first hidden layer, 5 in the second, and one output node.
This one is simple enough to build directly through TensorFlow's skflow library; see: learning the skflow library.
4. Accuracy testing
Use the mean squared error to monitor accuracy,
again via sklearn.metrics, the model performance-measurement module.
Doesn't this example need parameter updates? It mainly comes down to optimizing the loss function, which this example does not show explicitly.
score = metrics.mean_squared_error(regressor.predict(scaler.transform(X_test)), y_test)
print("Total mean squared error: {}".format(score))
Putting the above code together and merging the steps:
The complete source code
from sklearn import datasets,cross_validation,metrics
from sklearn import preprocessing
from tensorflow.contrib import learn
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format='svg'
from keras.models import Sequential
from keras.layers import Dense
read the original dataset with pandas packages
df=pd.read_csv('mpg.csv',header=0)
df
mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name | |
---|---|---|---|---|---|---|---|---|---|
0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
5 | 15.0 | 8 | 429.0 | 198 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
6 | 14.0 | 8 | 454.0 | 220 | 4354 | 9.0 | 70 | 1 | chevrolet impala |
7 | 14.0 | 8 | 440.0 | 215 | 4312 | 8.5 | 70 | 1 | plymouth fury iii |
8 | 14.0 | 8 | 455.0 | 225 | 4425 | 10.0 | 70 | 1 | pontiac catalina |
9 | 15.0 | 8 | 390.0 | 190 | 3850 | 8.5 | 70 | 1 | amc ambassador dpl |
10 | 15.0 | 8 | 383.0 | 170 | 3563 | 10.0 | 70 | 1 | dodge challenger se |
11 | 14.0 | 8 | 340.0 | 160 | 3609 | 8.0 | 70 | 1 | plymouth ‘cuda 340 |
12 | 15.0 | 8 | 400.0 | 150 | 3761 | 9.5 | 70 | 1 | chevrolet monte carlo |
13 | 14.0 | 8 | 455.0 | 225 | 3086 | 10.0 | 70 | 1 | buick estate wagon (sw) |
14 | 24.0 | 4 | 113.0 | 95 | 2372 | 15.0 | 70 | 3 | toyota corona mark ii |
15 | 22.0 | 6 | 198.0 | 95 | 2833 | 15.5 | 70 | 1 | plymouth duster |
16 | 18.0 | 6 | 199.0 | 97 | 2774 | 15.5 | 70 | 1 | amc hornet |
17 | 21.0 | 6 | 200.0 | 85 | 2587 | 16.0 | 70 | 1 | ford maverick |
18 | 27.0 | 4 | 97.0 | 88 | 2130 | 14.5 | 70 | 3 | datsun pl510 |
19 | 26.0 | 4 | 97.0 | 46 | 1835 | 20.5 | 70 | 2 | volkswagen 1131 deluxe sedan |
20 | 25.0 | 4 | 110.0 | 87 | 2672 | 17.5 | 70 | 2 | peugeot 504 |
21 | 24.0 | 4 | 107.0 | 90 | 2430 | 14.5 | 70 | 2 | audi 100 ls |
22 | 25.0 | 4 | 104.0 | 95 | 2375 | 17.5 | 70 | 2 | saab 99e |
23 | 26.0 | 4 | 121.0 | 113 | 2234 | 12.5 | 70 | 2 | bmw 2002 |
24 | 21.0 | 6 | 199.0 | 90 | 2648 | 15.0 | 70 | 1 | amc gremlin |
25 | 10.0 | 8 | 360.0 | 215 | 4615 | 14.0 | 70 | 1 | ford f250 |
26 | 10.0 | 8 | 307.0 | 200 | 4376 | 15.0 | 70 | 1 | chevy c20 |
27 | 11.0 | 8 | 318.0 | 210 | 4382 | 13.5 | 70 | 1 | dodge d200 |
28 | 9.0 | 8 | 304.0 | 193 | 4732 | 18.5 | 70 | 1 | hi 1200d |
29 | 27.0 | 4 | 97.0 | 88 | 2130 | 14.5 | 71 | 3 | datsun pl510 |
… | … | … | … | … | … | … | … | … | … |
368 | 27.0 | 4 | 112.0 | 88 | 2640 | 18.6 | 82 | 1 | chevrolet cavalier wagon |
369 | 34.0 | 4 | 112.0 | 88 | 2395 | 18.0 | 82 | 1 | chevrolet cavalier 2-door |
370 | 31.0 | 4 | 112.0 | 85 | 2575 | 16.2 | 82 | 1 | pontiac j2000 se hatchback |
371 | 29.0 | 4 | 135.0 | 84 | 2525 | 16.0 | 82 | 1 | dodge aries se |
372 | 27.0 | 4 | 151.0 | 90 | 2735 | 18.0 | 82 | 1 | pontiac phoenix |
373 | 24.0 | 4 | 140.0 | 92 | 2865 | 16.4 | 82 | 1 | ford fairmont futura |
374 | 23.0 | 4 | 151.0 | 0 | 3035 | 20.5 | 82 | 1 | amc concord dl |
375 | 36.0 | 4 | 105.0 | 74 | 1980 | 15.3 | 82 | 2 | volkswagen rabbit l |
376 | 37.0 | 4 | 91.0 | 68 | 2025 | 18.2 | 82 | 3 | mazda glc custom l |
377 | 31.0 | 4 | 91.0 | 68 | 1970 | 17.6 | 82 | 3 | mazda glc custom |
378 | 38.0 | 4 | 105.0 | 63 | 2125 | 14.7 | 82 | 1 | plymouth horizon miser |
379 | 36.0 | 4 | 98.0 | 70 | 2125 | 17.3 | 82 | 1 | mercury lynx l |
380 | 36.0 | 4 | 120.0 | 88 | 2160 | 14.5 | 82 | 3 | nissan stanza xe |
381 | 36.0 | 4 | 107.0 | 75 | 2205 | 14.5 | 82 | 3 | honda accord |
382 | 34.0 | 4 | 108.0 | 70 | 2245 | 16.9 | 82 | 3 | toyota corolla |
383 | 38.0 | 4 | 91.0 | 67 | 1965 | 15.0 | 82 | 3 | honda civic |
384 | 32.0 | 4 | 91.0 | 67 | 1965 | 15.7 | 82 | 3 | honda civic (auto) |
385 | 38.0 | 4 | 91.0 | 67 | 1995 | 16.2 | 82 | 3 | datsun 310 gx |
386 | 25.0 | 6 | 181.0 | 110 | 2945 | 16.4 | 82 | 1 | buick century limited |
387 | 38.0 | 6 | 262.0 | 85 | 3015 | 17.0 | 82 | 1 | oldsmobile cutlass ciera (diesel) |
388 | 26.0 | 4 | 156.0 | 92 | 2585 | 14.5 | 82 | 1 | chrysler lebaron medallion |
389 | 22.0 | 6 | 232.0 | 112 | 2835 | 14.7 | 82 | 1 | ford granada l |
390 | 32.0 | 4 | 144.0 | 96 | 2665 | 13.9 | 82 | 3 | toyota celica gt |
391 | 36.0 | 4 | 135.0 | 84 | 2370 | 13.0 | 82 | 1 | dodge charger 2.2 |
392 | 27.0 | 4 | 151.0 | 90 | 2950 | 17.3 | 82 | 1 | chevrolet camaro |
393 | 27.0 | 4 | 140.0 | 86 | 2790 | 15.6 | 82 | 1 | ford mustang gl |
394 | 44.0 | 4 | 97.0 | 52 | 2130 | 24.6 | 82 | 2 | vw pickup |
395 | 32.0 | 4 | 135.0 | 84 | 2295 | 11.6 | 82 | 1 | dodge rampage |
396 | 28.0 | 4 | 120.0 | 79 | 2625 | 18.6 | 82 | 1 | ford ranger |
397 | 31.0 | 4 | 119.0 | 82 | 2720 | 19.4 | 82 | 1 | chevy s-10 |
398 rows × 9 columns
# convert the displacement column to float
df['displacement'] = df['displacement'].astype(float)
# take the feature columns from the dataset;
# the first and last columns (mpg and car name) are excluded from X
X = df[df.columns[1:8]]
y = df['mpg']
f, ax1 = plt.subplots()
for i in range(1, 8):
    number = 420 + i
    ax1.locator_params(nbins=3)
    ax1 = plt.subplot(number)            # 4 rows x 2 columns
    plt.title(list(df)[i])
    ax1.scatter(df[df.columns[i]], y)    # scatter plot of each attribute against mpg
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
plt.show()
[Output: a 4×2 grid of scatter plots, one attribute against mpg per panel]
# split the dataset
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.25)
# scale the data so the optimization converges well
scaler = preprocessing.StandardScaler()
# fit the scaler on the training data and transform it
X_train = scaler.fit_transform(X_train)
# build a 2-hidden-layer fully connected DNN with 10 and 5 units respectively
model = Sequential()
model.add(Dense(10, input_dim=7, init='normal', activation='relu'))
model.add(Dense(5, init='normal', activation='relu'))
model.add(Dense(1, init='normal'))
# compile the model, with the mean squared error as the loss function
model.compile(loss='mean_squared_error', optimizer='adam')
# fit the model for 1000 epochs
model.fit(X_train, y_train, nb_epoch=1000, validation_split=0.33, shuffle=True, verbose=2)
Training output:
Train on 199 samples, validate on 99 samples
Epoch 1/1000
 - 2s - loss: 617.0525 - val_loss: 609.8485
Epoch 2/1000
 - 0s - loss: 616.6131 - val_loss: 609.3912
Epoch 3/1000
 - 0s - loss: 616.1424 - val_loss: 608.8852
Epoch 4/1000
 - 0s - loss: 6.8414 - val_loss: 8.4878
Epoch 96
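To complete step 4 after training, evaluate on the held-out test set. This mirrors the scoring call shown earlier (with the format-string bug fixed), using the Keras model from the assembled script:

score = metrics.mean_squared_error(model.predict(scaler.transform(X_test)), y_test)
print("Total mean squared error: {}".format(score))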