Why does this pthreads code consistently segfault on OS X but not on Linux?

February 23, 2023, 207 views

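I have some task-scheduling code that I want to compare against a baseline that simply creates a new pthread for every task (I know this is not a good idea, which is exactly why it is only the baseline for comparison). However, for some reason the pthreads version keeps segfaulting on OS X [1], while the same code runs fine when I try it on Linux [2].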

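On OS X it occasionally completes successfully, but it usually segfaults in pthread_create and sometimes in pthread_join. I also found that if I call pthread_create with the PTHREAD_CREATE_DETACHED attribute and skip the pthread_join calls, the segfaults go away.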
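For readers who have not used the detached workaround mentioned above, here is a minimal sketch of it. The helper name spawn_detached and the demo functions are hypothetical and not part of the code in question; only the pthread_attr_* calls and PTHREAD_CREATE_DETACHED are standard POSIX. A detached thread can never be joined, so the result has to come back through some other channel (here, a mutex-protected variable).

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: run fn(arg) on a detached thread.
 * Returns 0 on success or a pthread error code. */
static int spawn_detached(void *(*fn)(void *), void *arg) {
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Result channel for the demo, since pthread_join is not available. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static intptr_t done_value = -1;

static void *demo_worker(void *arg) {
    pthread_mutex_lock(&done_lock);
    done_value = (intptr_t)arg;          /* publish the result */
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Starts one detached worker and waits for the value it publishes. */
intptr_t demo_detached(intptr_t x) {
    pthread_mutex_lock(&done_lock);
    done_value = -1;
    pthread_mutex_unlock(&done_lock);
    if (spawn_detached(demo_worker, (void *)x) != 0) return -1;
    pthread_mutex_lock(&done_lock);
    while (done_value < 0) pthread_cond_wait(&done_cond, &done_lock);
    intptr_t v = done_value;
    pthread_mutex_unlock(&done_lock);
    return v;
}
```

Calling demo_detached(42) from a main function starts one detached worker and returns the value it published.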

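The bottom of this question contains a stripped-down version of the code. I tried to reduce it as much as possible while still triggering the problematic segfaults.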

My question is the following:

Why does this crash on OS X but not on Linux?

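Maybe there is a bug I am overlooking that just happens to be benign on Linux. I am fairly sure the mutexes and CAS operations provide sufficient synchronization, so I don't think this is a data-race problem.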

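As I said, I can work around this by using PTHREAD_CREATE_DETACHED, but I am really curious about the root cause of the segfaults. My feeling is that I am exceeding some system resource limit that is not released quickly enough when the threads have to be joined, whereas the problem goes away for detached pthreads because they can be destroyed as soon as the thread exits. However, I am not familiar enough with pthread internals to confirm or refute this hypothesis.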

Here is an overview of how the code works:

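> We have a stack of pthreads (accessed through wait_list_head) that are currently blocked, each waiting for a signal on a thread-specific condition variable.
> The main thread creates a single child thread and then waits for all transitive children to finish (by checking that the active-thread counter has reached zero).
> The child thread computes Fibonacci(N=10) by creating two sub-threads to compute Fibonacci(N-1) and Fibonacci(N-2), then joins two threads, adds their results together, and returns that sum as its own result. All child threads work this way, with the base case N<2 simply returning N.
> Note that the stack of blocked threads semi-randomizes which threads are joined by which parent. That is, a parent thread may join the child of one of its siblings instead of its own child; however, because integer addition is commutative, the final sum is still the same. Eliminating this "randomization" by having each parent join its own children also eliminates the segfaults.
> There is also a simple pure-recursive Fibonacci implementation (pure_fib) that computes the expected answer for validation.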
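To make the last-but-one point concrete, this is roughly what the "each parent joins its own children" variant could look like. It is only a sketch under my own assumptions: the names fib_own_join and fib_threads are hypothetical, and results are passed back through the pthread_join return value rather than through the wait list.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variant: every parent joins exactly the two threads it
 * created, so no shared wait list is involved at all. */
static void *fib_own_join(void *arg) {
    intptr_t n = (intptr_t)arg;
    if (n < 2) return (void *)n;              /* base case returns N */

    pthread_t a, b;
    void *ra = NULL, *rb = NULL;
    if (pthread_create(&a, NULL, fib_own_join, (void *)(n - 1)) != 0) abort();
    if (pthread_create(&b, NULL, fib_own_join, (void *)(n - 2)) != 0) abort();

    /* Join our own children and collect their results. */
    if (pthread_join(a, &ra) != 0) abort();
    if (pthread_join(b, &rb) != 0) abort();
    return (void *)((intptr_t)ra + (intptr_t)rb);
}

/* Runs the whole computation on a fresh root thread. */
intptr_t fib_threads(intptr_t n) {
    pthread_t root;
    void *result = NULL;
    if (pthread_create(&root, NULL, fib_own_join, (void *)n) != 0) abort();
    if (pthread_join(root, &result) != 0) abort();
    return (intptr_t)result;
}
```

With N = 10 as in the question, fib_threads(10) should agree with pure_fib(10).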

Here is some pseudocode for the core behavior:

Fibonacci(N):
    If N < 2:
        signal_parent(N)
    Else:
        sum = 0
        pthread_create(A, Fibonacci, N-1)
        pthread_create(B, Fibonacci, N-2)
        sum += suspend_and_join_child(); // not necessarily thread A
        sum += suspend_and_join_child(); // not necessarily thread B
        signal_parent(sum)

A minimal working example of the C code is included below.

[1] Apple LLVM version 7.0.0 (clang-700.1.76), Target: x86_64-apple-darwin14.5.0
[2] gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609

#include <assert.h>
#include <pthread.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <unistd.h>

#define N 10

#define RCHECK(expr)                                     \
    do {                                                 \
        int _rcheck_expr_return_value = expr;            \
        if (_rcheck_expr_return_value != 0) {            \
            fprintf(stderr, "FAILED CALL: " #expr "\n"); \
            abort();                                     \
        }                                                \
    } while (0);

typedef struct wait_state_st {
    volatile intptr_t val;
    pthread_t other;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct wait_state_st *next;
} wait_state;

wait_state *volatile wait_list_head = NULL;
volatile int active = 0;

static inline void push_thread(wait_state *ws) {
    do {
        ws->next = wait_list_head;
    } while (!__sync_bool_compare_and_swap(&wait_list_head, ws->next, ws));
}

static inline wait_state *pop_thread(void) {
    wait_state *ws, *next;
    do {
        ws = wait_list_head;
        while (!ws) {
            usleep(1000);
            ws = wait_list_head;
        }
        next = ws->next;
    } while (!__sync_bool_compare_and_swap(&wait_list_head, ws, next));
    assert(ws->next == next); // check for ABA problem
    ws->next = NULL;
    return ws;
}

intptr_t thread_suspend(int count) {
    intptr_t sum = 0;
    // WAIT TO BE WOKEN UP "count" TIMES
    for (int i = 0; i < count; i++) {
        wait_state ws;
        ws.val = -1;
        ws.other = pthread_self();
        RCHECK(pthread_mutex_init(&ws.lock, NULL));
        RCHECK(pthread_cond_init(&ws.cond, NULL));

        RCHECK(pthread_mutex_lock(&ws.lock));

        push_thread(&ws);

        while (ws.val < 0) {
            RCHECK(pthread_cond_wait(&ws.cond, &ws.lock));
        }

        assert(ws.other != pthread_self());
        pthread_join(ws.other, NULL);

        sum += ws.val;

        RCHECK(pthread_mutex_unlock(&ws.lock));
    }
    return sum;
}

void thread_signal(intptr_t x) {
    // wake up the suspended thread
    __sync_fetch_and_add(&active, -1);
    wait_state *ws = pop_thread();
    RCHECK(pthread_mutex_lock(&ws->lock));
    ws->val = x;
    ws->other = pthread_self();
    RCHECK(pthread_cond_signal(&ws->cond));
    RCHECK(pthread_mutex_unlock(&ws->lock));
}

void *fib(void *arg) {
    intptr_t n = (intptr_t)arg;
    if (n > 1) {
        pthread_t t1, t2;
        __sync_fetch_and_add(&active, 2);
        RCHECK(pthread_create(&t1, NULL, fib, (void *)(n - 1)));
        RCHECK(pthread_create(&t2, NULL, fib, (void *)(n - 2)));
        intptr_t sum = thread_suspend(2);
        thread_signal(sum);
    }
    else {
        thread_signal(n);
    }
    return NULL;
}

intptr_t pure_fib(intptr_t n) {
    if (n < 2) return n;
    return pure_fib(n-1) + pure_fib(n-2);
}

int main(int argc, char *argv[]) {
    printf("EXPECTED = %" PRIdPTR "\n", pure_fib(N));
    assert("START" && wait_list_head == NULL);

    active = 1;

    pthread_t t;
    RCHECK(pthread_create(&t, NULL, fib, (void *)N));

    while (active > 0) { usleep(100000); }
    intptr_t sum = thread_suspend(1);

    printf("SUM      = %" PRIdPTR "\n", sum);
    printf("DONE %p\n", wait_list_head);

    assert("END" && wait_list_head == NULL);

    return 0;
}

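Update: This Gist contains a slight variation of the code above that uses a global mutex for all thread push/pop operations, thereby avoiding the possible ABA problem with the CAS above. That version still segfaults regularly, but only about 30-50% of the time rather than 99% of the time like the code above.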
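The Gist itself is not reproduced here, so the following is only my guess at the general shape of a mutex-protected push/pop, not the actual Gist code. The node type and the names push_locked, pop_locked and demo_lifo are hypothetical. Unlike pop_thread in the code above, this pop returns NULL on an empty list instead of sleeping and retrying.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical node type standing in for wait_state. */
typedef struct node_st {
    int id;
    struct node_st *next;
} node;

static node *list_head = NULL;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Push under the global lock: no CAS, so no ABA window. */
static void push_locked(node *n) {
    pthread_mutex_lock(&list_lock);
    n->next = list_head;
    list_head = n;
    pthread_mutex_unlock(&list_lock);
}

/* Pop under the global lock; returns NULL when the list is empty. */
static node *pop_locked(void) {
    pthread_mutex_lock(&list_lock);
    node *n = list_head;
    if (n != NULL) {
        list_head = n->next;
        n->next = NULL;
    }
    pthread_mutex_unlock(&list_lock);
    return n;
}

/* Single-threaded check of LIFO order; returns 1 if the order holds. */
int demo_lifo(void) {
    static node a = {1, NULL}, b = {2, NULL};
    push_locked(&a);
    push_locked(&b);
    node *first = pop_locked();
    node *second = pop_locked();
    node *third = pop_locked();
    return first == &b && second == &a && third == NULL;
}
```

Serializing the list this way only removes the ABA hazard of the list itself. It says nothing about the lifetime of the nodes being pushed, which is consistent with the report that the segfaults became rarer but did not disappear.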

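Again, I feel this must be a problem with the pthreads library running out of resources when threads are not joined/destroyed quickly enough, but I don't know how to confirm that.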

Best answer

I looked at this for several hours because I wanted to know the solution.

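What I found is that the code runs over the stack and thread-private data, so that it overwrites the thread IDs. The linked list in the code points to, and uses, the addresses of stack variables. The code only works because of the timing of the threads and the number of threads spawned.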

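If it spawns fewer than 20 or so threads, the linked-list memory does not stomp on other data. It all comes down to how memory is laid out and how the threads are torn down. As long as the program terminates before a clobbered thread wakes up, it gets away with it.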
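One way to take stack lifetime out of the picture while testing this theory is to allocate each wait record on the heap and free it only after the signaling thread has been joined. This is just a sketch of that idea, not a verified fix for the code in the question; heap_wait, heap_wait_new, heap_wait_free, deliver and demo_heap_wait are all hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical heap-allocated counterpart of wait_state. */
typedef struct heap_wait_st {
    intptr_t val;              /* -1 until a result is delivered */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} heap_wait;

static heap_wait *heap_wait_new(void) {
    heap_wait *w = malloc(sizeof *w);
    if (w == NULL) return NULL;
    w->val = -1;
    if (pthread_mutex_init(&w->lock, NULL) != 0) { free(w); return NULL; }
    if (pthread_cond_init(&w->cond, NULL) != 0) {
        pthread_mutex_destroy(&w->lock);
        free(w);
        return NULL;
    }
    return w;
}

static void heap_wait_free(heap_wait *w) {
    pthread_cond_destroy(&w->cond);
    pthread_mutex_destroy(&w->lock);
    free(w);
}

/* Signaler: delivers a fixed result (55) to the waiting thread. */
static void *deliver(void *arg) {
    heap_wait *w = arg;
    pthread_mutex_lock(&w->lock);
    w->val = 55;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

intptr_t demo_heap_wait(void) {
    heap_wait *w = heap_wait_new();
    pthread_t t;
    if (w == NULL) return -1;
    if (pthread_create(&t, NULL, deliver, w) != 0) {
        heap_wait_free(w);
        return -1;
    }
    pthread_mutex_lock(&w->lock);
    while (w->val < 0) pthread_cond_wait(&w->cond, &w->lock);
    intptr_t v = w->val;
    pthread_mutex_unlock(&w->lock);
    pthread_join(t, NULL);     /* after this the signaler cannot touch w */
    heap_wait_free(w);         /* destroy and free only once it is unused */
    return v;
}
```

The point of the ordering at the end is that the record is destroyed only after the last thread that could touch it has exited.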

The reason it works on Linux and not on OS X is probably luck, combined with the memory layout and the time spent spinning in the usleep() loops.

The use of usleep in a multi-threaded application should be reviewed.
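As an illustration of what could replace a usleep polling loop such as the wait on the active counter, here is a sketch that blocks on a condition variable until the counter reaches zero. The names active_add, active_wait_zero, counting_worker and demo_wait_for_workers are hypothetical, and the counter is a separate variable, not the one from the question.

```c
#include <pthread.h>

#define MAX_WORKERS 16

static pthread_mutex_t active_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t active_zero = PTHREAD_COND_INITIALIZER;
static int active_count = 0;

/* Adjust the counter; wake all waiters when it reaches zero. */
static void active_add(int delta) {
    pthread_mutex_lock(&active_lock);
    active_count += delta;
    if (active_count == 0) pthread_cond_broadcast(&active_zero);
    pthread_mutex_unlock(&active_lock);
}

/* Block (without polling) until the counter is zero. */
static void active_wait_zero(void) {
    pthread_mutex_lock(&active_lock);
    while (active_count > 0) pthread_cond_wait(&active_zero, &active_lock);
    pthread_mutex_unlock(&active_lock);
}

static void *counting_worker(void *arg) {
    (void)arg;
    active_add(-1);            /* report completion */
    return NULL;
}

/* Starts up to MAX_WORKERS workers, waits for all of them, and
 * returns the final counter value. */
int demo_wait_for_workers(int n) {
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    if (n < 0) n = 0;
    if (n > MAX_WORKERS) n = MAX_WORKERS;
    active_add(n);             /* count workers before starting them */
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[started], NULL, counting_worker, NULL) == 0)
            started++;
        else
            active_add(-1);    /* this worker will never report */
    }
    active_wait_zero();
    for (int i = 0; i < started; i++) pthread_join(tids[i], NULL);
    return active_count;
}
```

The waiter wakes exactly when the last worker reports in, so there is no sleep interval to tune and no window where the counter is read without synchronization.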

This has been discussed at length in many places:

https://computing.llnl.gov/tutorials/pthreads/#Overview

https://en.wikipedia.org/wiki/ABA_problem

Along with W.R. Stevens, "Unix Network Programming, Volume 1", Chapter 23.

Reading these resources will explain why this code does not work and how it should work.