python - Parsing the string representation of a numpy array

Suppose I only have the string representation of a numpy.array:

>>> import numpy as np
>>> arr = np.random.randint(0, 10, (4, 4))
>>> print(arr)  # this one!
[[9 4 7 3]
 [1 6 4 2]
 [6 7 6 0]
 [0 5 6 7]]

How can I convert it back into a numpy array? Inserting the , by hand isn't complicated, but I'm looking for a programmatic approach.

A simple regex that replaces whitespace with , actually works for single-digit integers:

>>> import re
>>> sub = re.sub(r'\s+', ',', """[[8 6 2 4 0 2]
...  [3 5 8 4 5 6]
...  [4 6 3 3 0 3]]
... """)
>>> sub
'[[8,6,2,4,0,2],[3,5,8,4,5,6],[4,6,3,3,0,3]],'  # the trailing "," is a bit annoying

It can be converted into an almost identical array (the dtype may be lost, but that's fine):

>>> import ast
>>> np.array(ast.literal_eval(sub)[0])
array([[8, 6, 2, 4, 0, 2],
       [3, 5, 8, 4, 5, 6],
       [4, 6, 3, 3, 0, 3]])

But it fails for multi-digit integers and floats:

>>> re.sub(r'\s+', ',', """[[ 0.  1.  6.  9.  1.  4.]
... [ 4.  8.  2.  3.  6.  1.]]
... """)
'[[,0.,1.,6.,9.,1.,4.],[,4.,8.,2.,3.,6.,1.]],'

because the padding whitespace after each opening bracket also becomes a , at the start of the row.

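The solution doesn't necessarily need to be regex-based; any other approach that works for unabridged (not shortened with ...) bool/int/float/complex arrays with 1-4 dimensions would be fine.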
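
For context, here is a minimal sketch of how an unabridged string could be produced in the first place, assuming the array is still available on the producing side. It relies on `np.array2string` with `threshold=sys.maxsize`, which turns off the `...` summarization; the array `big` is only an illustrative example.

```python
import sys
import numpy as np

# Illustrative array with more elements than the default print threshold (1000)
big = np.arange(2000).reshape(40, 50)

abridged = np.array2string(big)                     # summarized with "..."
full = np.array2string(big, threshold=sys.maxsize)  # every element printed

print('...' in abridged)
print('...' in full)
```

A string like `abridged` cannot be parsed back into the original array, since the elided elements are simply not in the text.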

Best answer

Here is a very manual solution:

import re
import numpy

def parse_array_str(array_string):
    tokens = re.findall(r'''             # Find all...
                            \[         | # opening brackets,
                            \]         | # closing brackets, or
                            [^\[\]\s]+   # sequences of other non-whitespace characters''',
                        array_string,
                        flags = re.VERBOSE)
    tokens = iter(tokens)

    # Chomp first [, handle case where it's not a [
    first_token = next(tokens)
    if first_token != '[':
        # Input must represent a scalar
        if next(tokens, None) is not None:
            raise ValueError("Can't parse input.")
        return float(first_token)  # or int(token), but not bool(token) for bools

    list_form = []
    stack = [list_form]

    for token in tokens:
        if token == '[':
            # enter a new list
            stack.append([])
            stack[-2].append(stack[-1])
        elif token == ']':
            # close a list
            stack.pop()
        else:
            stack[-1].append(float(token))  # or int(token), but not bool(token) for bools

    if stack:
        raise ValueError("Can't parse input - it might be missing text at the end.")

    return numpy.array(list_form)
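
The comments above note that `float(token)` has to be swapped out by hand for other element types. As a rough sketch of one way to avoid that, the hypothetical helper `convert_token` below (not part of the original answer) tries bool, int, float, and complex in turn, assuming the tokens look the way numpy prints them (`True`, `12`, `4.`, `1.+2.j`).

```python
def convert_token(token):
    """Convert one printed element to bool, int, float, or complex.

    Hypothetical helper for illustration; it assumes the token is a single
    element exactly as numpy prints it.
    """
    if token == 'True':
        return True
    if token == 'False':
        return False
    # Try the narrowest numeric type first, then fall back to wider ones
    for cast in (int, float, complex):
        try:
            return cast(token)
        except ValueError:
            continue
    raise ValueError("Can't parse token: %r" % token)

print([convert_token(t) for t in ['True', '12', '4.', '1.+2.j']])
```

Calling `convert_token(token)` in place of `float(token)` in `parse_array_str` would then let `numpy.array` infer the dtype from the converted values.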

Or a manual solution based on detecting where to insert commas:

import ast
import re
import numpy

pattern = r'''# Match (mandatory) whitespace between...
              (?<=\]) # ] and
              \s+
              (?= \[) # [, or
              |
              (?<=[^\[\]\s]) 
              \s+
              (?= [^\[\]\s]) # two non-bracket non-whitespace characters
           '''

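# Example input: the printed form of a float array
array_string = """[[ 0.  1.  6.  9.  1.  4.]
 [ 4.  8.  2.  3.  6.  1.]]"""

# Replace such whitespace with a comma
fixed_string = re.sub(pattern, ',', array_string, flags=re.VERBOSE)

# The result is a valid Python literal, so ast.literal_eval can build the nested lists
output_array = numpy.array(ast.literal_eval(fixed_string))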
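
To check the comma-insertion approach against the element types the question asks about, it can be wrapped in a function and run as a round trip. This is only a sketch: the names `SEPARATOR` and `array_from_string` are made up for illustration, the pattern is the same one as above written on a single line, and the inputs are assumed to be unabridged `str()` output.

```python
import ast
import re
import numpy as np

# Whitespace between "]" and "[", or between two element tokens
SEPARATOR = re.compile(r'(?<=\])\s+(?=\[)|(?<=[^\[\]\s])\s+(?=[^\[\]\s])')

def array_from_string(array_string):
    """Hypothetical wrapper: insert commas, then parse the result as a Python literal."""
    fixed_string = SEPARATOR.sub(',', array_string.strip())
    return np.array(ast.literal_eval(fixed_string))

samples = [
    np.arange(24).reshape(2, 3, 4) * 7,      # 3-D, multi-digit integers
    np.array([[0.5, 1.25], [4.0, 8.0]]),     # floats
    np.array([True, False, True]),           # bools
    np.array([1 + 2j, 3 - 1j]),              # complex
]

for original in samples:
    restored = array_from_string(str(original))
    print(restored.dtype, np.array_equal(restored, original))
```

Because `ast.literal_eval` only accepts Python literals, special printed values such as `nan` or `inf` would not survive this route; the token-based parser is the more forgiving option there.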

Regarding python - Parsing the string representation of a numpy array, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43879345/
