I am working on multithreading a basic 2-D DLA simulation. Diffusion-limited aggregation (DLA) is when particles perform a random walk and aggregate when they touch the current aggregate.

In the simulation, I have 10,000 particles walking in a random direction at each step. I use a pool of workers and a queue to feed them: I feed them lists of particles, and each worker calls the method .updatePositionAndAggregate() on every particle in its list.

If I have one worker, I feed it a list of 10,000 particles; if I have two workers, I feed them each a list of 5,000 particles; if I have three workers, lists of roughly 3,333 particles each, and so on.

Here is some code:
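For illustration, the even split described above can be sketched as a small chunking helper (`chunks` is a hypothetical name, not part of the simulation code):

```python
def chunks(items, n):
    """Split items into n contiguous, nearly equal slices."""
    size = len(items)
    return [items[i * size // n:(i + 1) * size // n] for i in range(n)]

# e.g. 10 particles among 3 workers -> slice sizes [3, 3, 4]
parts = chunks(list(range(10)), 3)
print([len(p) for p in parts])
```

Every item lands in exactly one slice, and slice sizes differ by at most one, which matches the 10,000 / 5,000 / ~3,333 partitioning above.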
class Worker(Thread):
    """
    The worker class processes a list of particles and tries to aggregate
    them.
    """

    def __init__(self, name, particles):
        """
        Initialize the worker and start it immediately.
        """
        Thread.__init__(self, name=name)
        self.daemon = True
        self.particles = particles  # the shared queue the worker reads from
        self.start()

    def run(self):
        """
        The worker starts right after its creation and waits to be fed a
        list of particles to process.
        """
        while True:
            particles = self.particles.get()
            # print(self.name + ': woke up with ' + str(len(particles)) + ' particles')
            # Process the particles we have been fed.
            for particle in particles:
                particle.updatePositionAndAggregate()
            self.particles.task_done()
            # print(self.name + ': is done')
And in the main thread:
# Create the workers.
workerQueue = Queue(num_threads)
for i in range(num_threads):
    Worker("worker_" + str(i), workerQueue)

# Run the simulation until all the particles have been created.
while some_condition():
    # Feed all the workers.
    startWorker = datetime.datetime.now()
    for i in range(num_threads):
        j = i * len(particles) // num_threads
        k = (i + 1) * len(particles) // num_threads
        # Feed the worker thread its slice of the particle list.
        # print("main: feeding worker_" + str(i) + ' ' + str(k - j) + ' particles')
        workerQueue.put(particles[j:k])
    # Wait for all the workers to finish their batch.
    workerQueue.join()
    workerDurations.append((datetime.datetime.now() - startWorker).total_seconds())

print(sum(workerDurations) / len(workerDurations))
So I print the average time spent waiting for the workers to finish their batch. I ran some experiments with different numbers of threads.
| num threads | average worker duration (s) |
|-------------|-----------------------------|
| 1           | 0.147835636364              |
| 2           | 0.228585818182              |
| 3           | 0.258296454545              |
| 10          | 0.294294636364              |
I would really like to know why adding workers increases the processing time. I expected that at least two workers would decrease it, but instead it jumps dramatically from 0.14 s to 0.23 s. Can you explain why?
EDIT:

So, the explanation is Python's threading implementation. Is there a way for me to get real multitasking?
Best answer: This happens because the threads do not execute simultaneously: due to the GIL (Global Interpreter Lock), Python can only execute one thread at a time.
When you spawn a new thread, everything freezes except that thread. When it stops, another one is executed. Spawning threads also takes a lot of time.

Frankly speaking, the details of your code do not matter, because any code using 100 threads is SLOWER than code using 10 threads in Python (assuming more threads means more efficiency and more speed, which is not always true).
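A minimal sketch illustrating the point (the prime-counting workload here is my own stand-in for CPU-bound work, not taken from the simulation):

```python
import threading

def count_primes(limit, out, idx):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    out[idx] = count

# Four threads each do the same CPU-bound job. Because the GIL lets only
# one thread execute Python bytecode at a time, the total wall time is
# roughly 4x that of a single call, not 1x.
results = [0] * 4
threads = [threading.Thread(target=count_primes, args=(5000, results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # -> [669, 669, 669, 669]
```

The threads interleave rather than run in parallel, which is exactly why adding workers to the particle loop above made it slower: the same work is done serially, plus the scheduling and queue overhead.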
Here is an exact quote from the Python docs:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
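A minimal sketch of the multiprocessing route the docs recommend, reusing the same hypothetical CPU-bound prime-counting task as a stand-in for the particle updates:

```python
from multiprocessing import Pool

def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    return sum(1 for n in range(2, limit)
               if all(n % d for d in range(2, int(n ** 0.5) + 1)))

if __name__ == "__main__":
    # Each worker runs in its own process with its own interpreter and its
    # own GIL, so the four calls can genuinely run in parallel on a
    # multi-core machine.
    with Pool(4) as pool:
        results = pool.map(count_primes, [5000] * 4)
    print(results)  # -> [669, 669, 669, 669]
```

One caveat for the DLA simulation: processes do not share memory, so workers cannot mutate particle objects in place the way the threaded version does. Each worker would have to return its updated particle positions for the main process to merge back into the aggregate.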