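To speed up data augmentation for training a neural network, I am trying to use some form of parallel processing to feed data to my GPU. At the moment the bottleneck is how fast I can generate augmented data, not how fast the GPU can train the network.
If I try to use use_multiprocessing=True with a generator, I get the following error with keras 2.2.0 in Python 3.6.6 under Windows 10 (v1083) 64-bit: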
ValueError: Using a generator with `use_multiprocessing=True` is not supported on Windows (no marshalling of generators across process boundaries). Instead, use single thread/process or multithreading.
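I found, for example, the following on GitHub, so this is expected behavior for keras under Windows. That link seems to suggest moving to a Sequence instead of a generator. (The error message seems to suggest multithreading instead, but I could not figure out how to use multithreading rather than multiprocessing with keras. I may have overlooked it in the documentation, but I did not find it.) So I used the code below (a modified example that uses a Sequence), but it does not achieve a speed-up either, and the variant with use_multiprocessing=True simply freezes.
Am I missing something obvious about how to get some form of parallelism in generators?
Minimal (non-)working example: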
from keras.utils import Sequence
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
import numpy as np

class DummySequence(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return np.array(batch_x), np.array(batch_y)

x = np.random.random((100, 3))
y = to_categorical(np.random.random(100) > .5).astype(int)
seq = DummySequence(x, y, 10)

model = Sequential()
model.add(Dense(32, input_dim=3))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print('single worker')
model.fit_generator(generator=seq,
                    steps_per_epoch=100,
                    epochs=2,
                    verbose=2,
                    workers=1)

print('achieves no speed-up')
model.fit_generator(generator=seq,
                    steps_per_epoch=100,
                    epochs=2,
                    verbose=2,
                    workers=6,
                    use_multiprocessing=False)

print('Does not run')
model.fit_generator(generator=seq,
                    steps_per_epoch=100,
                    epochs=2,
                    verbose=2,
                    workers=6,
                    use_multiprocessing=True)
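Best answer
In combination with a Sequence, using use_multiprocessing=False and, for example, workers=4 does work.
I just realized that the example code in the question showed no speed-up because the data was being generated too quickly. Inserting time.sleep(2) into __getitem__ makes the effect obvious.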
import time

import numpy as np
from keras.utils import Sequence
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

class DummySequence(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        time.sleep(2)  # simulate slow data generation
        return np.array(batch_x), np.array(batch_y)

x = np.random.random((100, 3))
y = to_categorical(np.random.random(100) > .5).astype(int)
seq = DummySequence(x, y, 10)

model = Sequential()
model.add(Dense(32, input_dim=3))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print('single worker')
model.fit_generator(generator=seq,
                    steps_per_epoch=10,
                    epochs=2,
                    verbose=2,
                    workers=1)

print('achieves speed-up!')
model.fit_generator(generator=seq,
                    steps_per_epoch=10,
                    epochs=2,
                    verbose=2,
                    workers=4,
                    use_multiprocessing=False)
On my laptop this produces the following:
single worker
>>> model.fit_generator(generator=seq,
... steps_per_epoch = 10,
... epochs = 2,
... verbose=2,
... workers=1)
Epoch 1/2
- 20s - loss: 0.6984 - acc: 0.5000
Epoch 2/2
- 20s - loss: 0.6955 - acc: 0.5100
and
achieves speed-up!
>>> model.fit_generator(generator=seq,
... steps_per_epoch = 10,
... epochs = 2,
... verbose=2,
... workers=4,
... use_multiprocessing=False)
Epoch 1/2
- 6s - loss: 0.6904 - acc: 0.5200
Epoch 2/2
- 6s - loss: 0.6900 - acc: 0.5000
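These timings are consistent with a simple back-of-the-envelope model, assuming the 2-second sleep dominates and the workers fetch batches in parallel rounds. The helper below is hypothetical (ideal_epoch_seconds is not part of keras) and ignores training time and queue overhead:

```python
import math

def ideal_epoch_seconds(steps, workers, seconds_per_batch):
    """Idealized epoch time when batch generation is the only cost.

    Assumes every worker needs `seconds_per_batch` per batch and that
    workers run fully in parallel, so batches arrive in rounds.
    """
    rounds = math.ceil(steps / workers)
    return rounds * seconds_per_batch

print(ideal_epoch_seconds(10, 1, 2.0))  # 20.0
print(ideal_epoch_seconds(10, 4, 2.0))  # 6.0
```

With 10 steps and 4 workers the batches arrive in 3 rounds, which matches the 6 s per epoch reported above.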
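Important notes:
You will probably want to create self.lock = threading.Lock() in __init__ and then use a with self.lock: block in __getitem__. Try to do the absolute bare minimum inside the with self.lock: block. As far as I understand, that means any reference to self.xxxx, because the threads cannot run in parallel while one of them is inside that block.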
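A minimal sketch of that locking pattern, under some assumptions: LockedBatches is a hypothetical stand-in that uses plain lists and a thread pool so it runs without keras; in real use the class would subclass keras.utils.Sequence and keras would drive the worker threads. For data that is only read, the lock is probably unnecessary; it matters once __getitem__ mutates shared state.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class LockedBatches:
    """Stand-in for a keras Sequence (hypothetical; subclass Sequence in real use)."""

    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.lock = threading.Lock()

    def __len__(self):
        return -(-len(self.x) // self.batch_size)  # ceiling division

    def __getitem__(self, idx):
        # Hold the lock only while touching attributes on self.
        with self.lock:
            start = idx * self.batch_size
            stop = start + self.batch_size
            batch_x = list(self.x[start:stop])
            batch_y = list(self.y[start:stop])
        # Slow work (simulated here) runs outside the lock, so other
        # threads can fetch their own batches in the meantime.
        time.sleep(0.05)
        return batch_x, batch_y

seq = LockedBatches(list(range(100)), [i % 2 for i in range(100)], 10)
with ThreadPoolExecutor(max_workers=4) as pool:
    batches = list(pool.map(seq.__getitem__, range(len(seq))))
print(len(batches), batches[0][0])
```

Keeping the sleep outside the with block is the point of the design: if the slow part ran under the lock, the four threads would simply take turns and the speed-up would disappear.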
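Additionally, if you are hoping that multithreading will speed up computation (i.e. CPU operations are the bottleneck), do not expect any speed-up. The global interpreter lock (GIL) prevents threads from running Python bytecode in parallel, although NumPy and other C extensions may release the GIL for some operations. Multithreading will mainly help you if the bottleneck is in I/O operations. Apparently, to speed up CPU-bound computation we need true multiprocessing, which keras currently does not support on Windows 10. Perhaps it is possible to hand-craft a multiprocessing generator (I have no idea).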
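The difference between I/O-bound and CPU-bound work under threads can be checked with a small timing sketch. It uses only the standard library; the task functions and the elapsed helper are hypothetical names, the sleep stands in for real I/O, and the exact numbers depend on the machine and on running a standard CPython build with the GIL enabled.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_task(_):
    time.sleep(0.05)  # stands in for disk or network I/O; releases the GIL

def cpu_bound_task(_):
    total = 0
    for i in range(200_000):  # pure-Python arithmetic; holds the GIL
        total += i * i
    return total

def elapsed(task, n_tasks, workers):
    """Wall-clock seconds to run `task` n_tasks times on `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, range(n_tasks)))
    return time.perf_counter() - start

io_serial = elapsed(io_bound_task, 8, 1)
io_threaded = elapsed(io_bound_task, 8, 4)
cpu_serial = elapsed(cpu_bound_task, 8, 1)
cpu_threaded = elapsed(cpu_bound_task, 8, 4)
print(f"I/O-bound: 1 thread {io_serial:.2f}s, 4 threads {io_threaded:.2f}s")
print(f"CPU-bound: 1 thread {cpu_serial:.2f}s, 4 threads {cpu_threaded:.2f}s")
```

The I/O-bound time should drop roughly in proportion to the number of threads, while the CPU-bound time typically stays about the same or gets slightly worse.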