Python: extracting the first letter from each string in a NumPy array

February 21, 2024. Source: weixin_39827304

No regular expression is needed here. Just use astype to convert the array to one-character strings:

>>> import numpy as np

>>> v = np.array(['abc', 'def', 'ghi'])

>>> v.astype('<U1')

array(['a', 'd', 'g'], dtype='<U1')
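As a minimal sketch of what the cast does, assuming the target dtype is the one-character unicode type `'U1'`: each element is truncated to its first character, and the result is a fresh array rather than a window onto `v`. The variable names `first` and `is_copy` are illustrative only.

```python
import numpy as np

v = np.array(['abc', 'def', 'ghi'])

# Casting to a one-character unicode dtype truncates every element.
first = v.astype('U1')

# The cast allocates a new array, so `first` does not alias `v`.
is_copy = not np.shares_memory(first, v)

print(first, is_copy)
```

Because the result is a copy, it can be modified without touching `v`, at the cost of the allocation.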

Alternatively, you can change the array's view and stride. This is a slightly optimized version for strings of equal size:

>>> v.view('<U1')[::len(v[0])]

array(['a', 'd', 'g'], dtype='<U1')
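The view-and-stride idea can be sketched step by step. This assumes the array is one-dimensional and contiguous. As a small variation that is not part of the original answer, the step is derived from the dtype's item size instead of from the length of the first string, so it stays correct when the first string happens to be shorter than the longest one. The names `chars`, `width`, and `w` are illustrative.

```python
import numpy as np

v = np.array(['abc', 'def', 'ghi'])   # dtype '<U3': 3 character slots per element

# Reinterpret the same buffer as single characters: 'a', 'b', 'c', 'd', ...
chars = v.view('U1')

# Character slots per element, taken from the dtype rather than len(v[0]).
width = v.dtype.itemsize // np.dtype('U1').itemsize

# Every `width`-th character is the first character of an element.
first = chars[::width]

# With strings of unequal length the shorter ones are padded, so the
# dtype-based width still lines up with the start of each element.
w = np.array(['ab', 'cde'])
w_width = w.dtype.itemsize // np.dtype('U1').itemsize
w_first = w.view('U1')[::w_width]
```

Here `len(w[0])` would be 2 while the real slot width is 3, which is why the dtype-based width is the safer choice for mixed lengths.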

Here is a more general version of the .view approach, which also works for arrays whose strings differ in length. Thanks to Paul Panzer for the suggestion:

>>> v.view('<U1').reshape(v.shape + (-1,))[:, 0]

array(['a', 'd', 'g'], dtype='<U1')
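A rough sketch of why a reshape handles strings of differing length, assuming the general version views the buffer as single characters and reshapes it to one row per string. NumPy pads every element to the width of the longest string, so each row has the same number of character slots and column 0 always holds the first character. The sample array `w` and the name `grid` are illustrative.

```python
import numpy as np

w = np.array(['ab', 'cdef', ''])      # dtype '<U4'; shorter strings are padded

# One row per string, one column per character slot.
grid = w.view('U1').reshape(w.shape + (-1,))

# Column 0 is the first character of every string.
first = grid[:, 0]

print(grid.shape)
print(first)
```

Note that an empty string yields an empty first character rather than an error.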

Performance

y = np.array([x * 20 for x in v]).repeat(100000)

y.shape

(300000,)

len(y[0]) # they're all the same length - `abcabcabc…`

Now, the timings:

# `astype` conversion

%timeit y.astype('<U1')

100 loops, best of 3: 5.03 ms per loop

# `view` for equal sized string arrays

%timeit y.view('<U1')[::len(y[0])]

100000 loops, best of 3: 2.43 µs per loop

# Paul Panzer's version for differing length strings

%timeit y.view('<U1').reshape(y.shape + (-1,))[:, 0]

100000 loops, best of 3: 3.1 µs per loop
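`%timeit` is an IPython magic, so the measurements above cannot be run from a plain script. The following is one possible way to repeat the comparison with the standard `timeit` module, assuming the three candidates are the `astype` cast, the strided view, and the reshaped view. The dictionary `candidates` and the repeat counts are arbitrary choices for illustration, and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

v = np.array(['abc', 'def', 'ghi'])
y = np.array([x * 20 for x in v]).repeat(100000)

candidates = {
    'astype': lambda: y.astype('U1'),
    'view + stride': lambda: y.view('U1')[::len(y[0])],
    'view + reshape': lambda: y.view('U1').reshape(y.shape + (-1,))[:, 0],
}

# All three should agree before their timings are compared.
reference = candidates['astype']()
agree = all(np.array_equal(f(), reference) for f in candidates.values())

for name, f in candidates.items():
    # Best of 3 runs, 10 calls per run, reported per call.
    best = min(timeit.repeat(f, number=10, repeat=3)) / 10
    print(f'{name}: {best * 1e6:.1f} µs per call')
```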

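The view methods are far faster: microseconds versus milliseconds for astype, because they copy no data.

However, use them with care, because the result shares memory with the original array.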
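A small sketch of what shared memory means in practice, assuming the strided-view form of the extraction: writing into the result changes the original array, while an explicit `.copy()` is independent. The names `shares` and `safe` are illustrative.

```python
import numpy as np

v = np.array(['abc', 'def', 'ghi'])

first = v.view('U1')[::len(v[0])]     # a view: no data is copied
shares = np.shares_memory(first, v)

# Writing through the view also changes the original array.
first[0] = 'X'

# An explicit copy is independent of `v`.
safe = v.view('U1')[::len(v[0])].copy()
safe[1] = 'Y'

print(v)     # the first element now starts with 'X'
print(safe)
```

If the result will be modified or must outlive `v`, taking the copy trades a little speed for safety.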

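If you are interested in a more general solution that finds the first letter wherever it occurs in the string, I would say the fastest and simplest approach is to use the re module: compile a pattern and search within a list comprehension.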

>>> import re

>>> p = re.compile('[a-zA-Z]')

>>> [p.search(x).group() for x in v]

['a', 'd', 'g']

And its performance on the same setup:

%timeit [p.search(x).group() for x in y]

1 loop, best of 3: 320 ms per loop
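One caveat of the regex list comprehension is that `p.search(x)` returns `None` when a string contains no letter, so calling `.group()` on it raises `AttributeError`. A hedged sketch of a guarded variant follows; the helper `first_letters` and its `default` parameter are hypothetical names introduced here, not part of the original answer.

```python
import re
import numpy as np

p = re.compile('[a-zA-Z]')

def first_letters(strings, default=''):
    """Return the first ASCII letter of each string, or `default` if none."""
    out = []
    for s in strings:
        m = p.search(s)
        out.append(m.group() if m else default)
    return out

mixed = np.array(['12abc', '  def', '123'])
result = first_letters(mixed)
print(result)
```

The pattern `[a-zA-Z]` only matches ASCII letters, so strings whose first letter is accented or non-Latin fall back to `default` as well.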

    Original author: weixin_39827304
    Original URL: https://blog.csdn.net/weixin_39827304/article/details/110784337