We have seen how to define a neural network, compute the loss, and update the network's weights.
The next question is:
I. What about data?
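Generally, when you have to deal with image, text, audio, or video data, you can use standard Python packages that load the data into a numpy array. You can then convert this array into a torch.Tensor.
For images, packages such as Pillow and OpenCV are useful.
For audio, packages such as scipy and librosa are useful.
For text, either raw Python or Cython-based loading, or NLTK and SpaCy, are useful.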
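As a minimal sketch of the numpy-to-tensor step, assuming the image has already been decoded into a height × width × channels uint8 array (for example by Pillow or OpenCV; a random array stands in for a real file here), `torch.from_numpy` wraps the array and `permute` moves the channel axis first, which is the layout PyTorch's convolution layers expect:

```python
import numpy as np
import torch

# Stand-in for a decoded 32x32 RGB image: height x width x channels, uint8 in [0, 255]
array = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Wrap the array as a tensor (it shares memory with the numpy array)
tensor = torch.from_numpy(array)

# Reorder to channels x height x width and scale to floats in [0, 1]
image = tensor.permute(2, 0, 1).float() / 255.0

print(tensor.shape, tensor.dtype)  # torch.Size([32, 32, 3]) torch.uint8
print(image.shape, image.dtype)    # torch.Size([3, 32, 32]) torch.float32
```

For CIFAR10 you do not need to do this by hand: the torchvision transforms used below perform the equivalent conversion.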
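Specifically for vision, we have created a package called torchvision, which provides data loaders for common datasets such as ImageNet, CIFAR10, and MNIST, as well as data transformers for images, namely torchvision.datasets and torch.utils.data.DataLoader.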
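Here we use the CIFAR10 dataset. Its classes are 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', and 'truck'. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.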
II. Training an image classifier
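We will do the following steps in order:
1. Load and normalize the CIFAR10 training and test datasets using torchvision
2. Define a convolutional neural network
3. Define a loss function and optimizer
4. Train the network on the training data
5. Test the network on the test data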
1. Load and normalize CIFAR10
Import the relevant packages:
```python
import torch
import torchvision
import torchvision.transforms as transforms
```
Create a transform to preprocess the image data:
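```python
# ToTensor: PIL image -> float tensor in [0, 1]
# Normalize: (x - 0.5) / 0.5 per channel -> values in [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
```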
Download the training set into the ./data/ directory:
```python
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
```
Inspect the downloaded data:
```python
classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

image, label = trainset[0]
print(image.size())
print(label)
print(classes[label])
```
Output:
```
torch.Size([3, 32, 32])
6
frog
```
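Without a transform, torchvision datasets return PIL images. The transform defined above uses ToTensor to turn them into tensors with values in [0, 1] and Normalize to rescale them to the range [-1, 1]. We then wrap the training set in a DataLoader that yields shuffled mini-batches of 4 images, loaded by 2 worker processes: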
```python
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
```
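To see what such a loader yields without downloading anything, here is a sketch that replaces CIFAR10 with a `TensorDataset` of 10 random fake images (hypothetical data with the same 3x32x32 shape) and uses `num_workers=0` so everything runs in the main process. Each iteration returns one mini-batch of images stacked along a new first dimension, plus the matching labels; the last batch is smaller when the dataset size is not a multiple of `batch_size`:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for CIFAR10: 10 random 3x32x32 "images" with labels 0-9
fake_images = torch.randn(10, 3, 32, 32)
fake_labels = torch.arange(10)
dataset = TensorDataset(fake_images, fake_labels)

loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = []
for images, labels in loader:
    # images: (batch, channels, height, width); labels: (batch,)
    batch_sizes.append(images.shape[0])
    print(images.shape, labels.shape)

print(batch_sizes)  # [4, 4, 2]
```

For the real CIFAR10 training set, 50000 images with `batch_size=4` give 12500 full mini-batches per epoch.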
Similarly, download the test set and wrap it in a DataLoader (without shuffling):
```python
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
```
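2. Define a convolutional neural network
Copy the network from the Neural Networks section of the PyTorch tutorials and modify it to take 3-channel images (instead of the 1-channel images it was originally defined for).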
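```python
import torch.nn as nn
import torch.nn.functional as F
# Legacy wrapper; since PyTorch 0.4, plain tensors can be passed to the network directly
from torch.autograd import Variable


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 3 input channels (RGB), 6 output channels, 5x5 kernel
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # After two conv + pool stages, a 32x32 input becomes 16 feature maps of 5x5
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)  # one score per CIFAR10 class

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # flatten each sample into a vector
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()
```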
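The input size of fc1, 16 * 5 * 5, follows from the spatial arithmetic: a 5x5 convolution without padding shrinks each side by 4, and 2x2 max pooling halves it, so 32 → 28 → 14 → 10 → 5. A quick sketch that pushes a dummy batch through the same layers confirms the shapes (the random input is only a placeholder):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# The same convolution and pooling layers as in Net above
conv1 = nn.Conv2d(3, 6, 5)
pool = nn.MaxPool2d(2, 2)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(4, 3, 32, 32)  # dummy batch of 4 CIFAR10-sized images
shapes = []

x = F.relu(conv1(x))
shapes.append(tuple(x.shape))  # (4, 6, 28, 28): 5x5 kernel, no padding
x = pool(x)
shapes.append(tuple(x.shape))  # (4, 6, 14, 14): 2x2 pooling halves each side
x = F.relu(conv2(x))
shapes.append(tuple(x.shape))  # (4, 16, 10, 10)
x = pool(x)
shapes.append(tuple(x.shape))  # (4, 16, 5, 5)
x = x.view(-1, 16 * 5 * 5)
shapes.append(tuple(x.shape))  # (4, 400): the input size of fc1

for s in shapes:
    print(s)
```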
3. Define a loss function and optimizer
Here we use a classification cross-entropy loss and SGD with momentum:
```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```
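Both pieces of this step can be checked by hand. `nn.CrossEntropyLoss` takes raw class scores (logits) and integer labels; it applies log-softmax followed by negative log-likelihood, so the loss for one sample is -log(softmax(scores)[label]). With momentum, PyTorch's `optim.SGD` keeps a velocity buffer updated as v = momentum * v + grad (with v = grad on the first step) and then sets p = p - lr * v. The sketch below uses made-up scores and a toy parameter p minimizing p², not the CIFAR10 network:

```python
import math

import torch
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()

# Made-up scores for 1 sample over 3 classes; the true class is 2
scores = torch.tensor([[1.0, 2.0, 3.0]])
label = torch.tensor([2])

loss = criterion(scores, label)
manual = -math.log(math.exp(3.0) / (math.exp(1.0) + math.exp(2.0) + math.exp(3.0)))
print(loss.item(), manual)  # both about 0.4076

# Two SGD-with-momentum steps on f(p) = p^2, starting at p = 1
p = torch.tensor([1.0], requires_grad=True)
sgd = optim.SGD([p], lr=0.1, momentum=0.9)
for _ in range(2):
    sgd.zero_grad()
    (p ** 2).sum().backward()  # gradient is 2p
    sgd.step()
# Step 1: v = 2.0, p = 1.0 - 0.1 * 2.0 = 0.8
# Step 2: grad = 1.6, v = 0.9 * 2.0 + 1.6 = 3.4, p = 0.8 - 0.1 * 3.4 = 0.46
print(p.item())
```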
4. Train the network on the training data
We simply loop over the data iterator, feed the inputs to the network, and optimize.
```python
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics; loss.item() returns the loss as a Python float
        # (the older loss.data[0] fails on 0-dim tensors in recent PyTorch versions)
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
```
Output:
```
[1, 2000] loss: 2.224
[1, 4000] loss: 1.896
[1, 6000] loss: 1.721
[1, 8000] loss: 1.591
[1, 10000] loss: 1.542
[1, 12000] loss: 1.471
[2, 2000] loss: 1.411
[2, 4000] loss: 1.377
[2, 6000] loss: 1.334
[2, 8000] loss: 1.316
[2, 10000] loss: 1.290
[2, 12000] loss: 1.281
```
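As a rough reference point for these numbers: if the network gave all 10 classes the same score, softmax would assign probability 1/10 to the true class and the cross-entropy would be ln 10 ≈ 2.303. The first reported average, 2.224, is therefore only slightly better than chance, and the loss falls steadily from there. A quick check, with arbitrary labels for a batch of 4:

```python
import math

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Equal scores for all 10 classes; the labels are arbitrary
uniform_scores = torch.zeros(4, 10)
labels = torch.tensor([3, 8, 8, 0])

loss = criterion(uniform_scores, labels).item()
print(round(loss, 3), round(math.log(10), 3))  # 2.303 2.303
```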
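5. Test the network on the test data
We will check this by predicting the class label that the neural network outputs and comparing it with the ground truth. If the prediction is correct, we count the sample as a correct prediction.
Get the ground truth of the first four test images: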
```python
dataiter = iter(testloader)
# use the built-in next(); the .next() method is gone from newer DataLoader iterators
images, labels = next(dataiter)
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
```
Output:
```
GroundTruth:    cat  ship  ship plane
```
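The network's outputs are energies (scores) for the 10 classes. The higher the energy for a class, the more the network thinks the image belongs to that class. So we take the class with the highest energy as the prediction: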
```python
outputs = net(images)
# index of the highest score along the class dimension
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))
```
Output:
```
Predicted:    cat  ship  ship  ship
```
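In the prediction step, torch.max(..., 1) reduces over dimension 1 (the class dimension) and returns a pair: the maximum score in each row and the index where it occurs. Only the indices are needed, which is why the first value is discarded with `_`. A small example with made-up scores for 2 images and 3 classes:

```python
import torch

scores = torch.tensor([[0.1, 2.0, -1.0],
                       [3.0, 0.5, 0.2]])
values, predicted = torch.max(scores, 1)
print(values)     # tensor([2., 3.])
print(predicted)  # tensor([1, 0])

# torch.argmax returns the same indices directly
print(torch.argmax(scores, dim=1))  # tensor([1, 0])
```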
Let us look at how the network performs on the whole test set:
```python
correct = 0
total = 0
# gradients are not needed for evaluation
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
```
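The output shows that the network classifies 56% of the test images correctly, well above the 10% expected from picking one of the 10 classes at random:
```
Accuracy of the network on the 10000 test images: 56 %
```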
Next, we compute the accuracy separately for each class:
```python
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):  # batch_size is 4, and 10000 is divisible by 4
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
```
Result:
```
Accuracy of plane : 52 %
Accuracy of   car : 73 %
Accuracy of  bird : 45 %
Accuracy of   cat : 26 %
Accuracy of  deer : 39 %
Accuracy of   dog : 42 %
Accuracy of  frog : 73 %
Accuracy of horse : 73 %
Accuracy of  ship : 75 %
Accuracy of truck : 63 %
```
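The per-class loop above can also be written with tensor operations. The sketch below uses made-up predictions and labels for 3 classes rather than real CIFAR10 outputs: `torch.bincount` with `minlength` counts how many samples belong to each class, and counting only the labels of correctly predicted samples gives the numerators:

```python
import torch

num_classes = 3
# Hypothetical labels and predictions for 8 test images
labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2])
predicted = torch.tensor([0, 1, 1, 1, 0, 2, 2, 1])

correct_mask = predicted == labels
overall = 100 * correct_mask.sum().item() / labels.numel()

# Per-class totals and per-class correct counts
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels[correct_mask], minlength=num_classes)
per_class = 100 * class_correct.float() / class_total.float()

print(overall)    # 62.5
print(per_class)  # roughly [50.0, 66.7, 66.7]
```

The vectorized form does not depend on the batch size, so it also handles a final batch that is smaller than the others.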