python - Changing a structured array with np.view() in numpy 1.14

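I have a numpy structured array with mixed data types (floats, integers, and strings). I want to select some of its columns (all of which contain only floats) and then get the column-wise sum over the rows as a standard numpy array. The initial array looks something like this: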

some_data = np.array([('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)], 
                     dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')])

For this example, I want to sum columns A and B, yielding np.array([7.5, 11.15]). With numpy ≤1.13, I could do this as follows:

get_cols = ['A', 'B']
desired_sum = np.sum(some_data[get_cols].view(('<f8', len(get_cols))), axis=0)

With the release of numpy 1.14, this method now fails with ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged, which is a result of the changes made in numpy 1.14 to the handling of structured arrays. (User bbengfort commented on the FutureWarning about this change in this answer.)

Given these changes to structured arrays, how can I get the desired sum from a subset of a structured array?

Best answer

In [165]: some_data = np.array([('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)], dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')])
     ...:                      
In [166]: get_cols = ['A','B']
In [167]: some_data[get_cols]
Out[167]: 
array([( 3.5,  2.15), ( 2.8,  5.3 ), ( 1.2,  3.7 )],
      dtype=[('A', '<f8'), ('B', '<f8')])

Just reading the field values is fine. Taking a view of the result is what causes trouble; in 1.13 we get a warning:

In [168]: some_data[get_cols].view(('<f8', len(get_cols)))
/usr/local/bin/ipython3:1: FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array. 

This code may break in numpy 1.13 because this will return a view instead of a copy -- see release notes for details.
  #!/usr/bin/python3
Out[168]: 
array([[ 3.5 ,  2.15],
       [ 2.8 ,  5.3 ],
       [ 1.2 ,  3.7 ]])

Using the recommended copy, there is no warning:

In [169]: some_data[get_cols].copy().view(('<f8', len(get_cols)))
Out[169]: 
array([[ 3.5 ,  2.15],
       [ 2.8 ,  5.3 ],
       [ 1.2 ,  3.7 ]])
In [171]: np.sum(_, axis=0)
Out[171]: array([  7.5 ,  11.15])
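A version-independent alternative, offered here as a minimal sketch rather than as part of the original answer, is to skip view altogether and stack the selected fields by name. Each single-field access returns an ordinary float array, so no dtype reinterpretation is involved. The name stacked is introduced here only for illustration.

```python
import numpy as np

some_data = np.array(
    [('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
    dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')],
)
get_cols = ['A', 'B']

# Each some_data[name] is a plain 1-D float64 array; stacking them along
# axis 1 gives a regular (n_rows, n_cols) array with no view() involved.
stacked = np.stack([some_data[name] for name in get_cols], axis=1)

# Column-wise sum over the rows.
desired_sum = stacked.sum(axis=0)
print(desired_sum)
```

This copies the selected columns, which is usually acceptable when the goal is a reduction such as a sum.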

In your original array, with the dtype

dtype([('col1', '<U20'), ('A', '<f8'), ('B', '<f8')])

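an A,B selection that is a view would have the two f8 items interspersed with the U20 items in memory. Changing the view dtype of such a mix is problematic. That is why working with a copy is more reliable.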

Since U20 takes up 4*20 bytes, the total itemsize is 96 (80 + 8 + 8), which is a multiple of 8. We can view the whole thing as f8, reshape, and "throw away" the U20 columns:

In [183]: some_data.view('f8').reshape(3,-1)[:,-2:]
Out[183]: 
array([[ 3.5 ,  2.15],
       [ 2.8 ,  5.3 ],
       [ 1.2 ,  3.7 ]])
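The itemsize arithmetic above can be checked directly from the dtype. The following is a small sketch that inspects the field offsets and confirms that each 96-byte record maps to 12 float64 slots, of which only the last two hold real numbers. The name as_floats is introduced here for illustration.

```python
import numpy as np

some_data = np.array(
    [('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
    dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')],
)

# dtype.fields maps each field name to a (dtype, byte offset) pair.
for name in some_data.dtype.names:
    field_dtype, offset = some_data.dtype.fields[name]
    print(name, field_dtype, offset)

# 20 UCS-4 characters (80 bytes) plus two 8-byte floats.
print(some_data.dtype.itemsize)

# Reinterpret every record as 96 / 8 = 12 float64 slots. The first 10
# slots are the raw bytes of the string field and are meaningless as floats.
as_floats = some_data.view('f8').reshape(len(some_data), -1)
print(as_floats.shape)
print(as_floats[:, -2:].sum(axis=0))
```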

It isn't pretty and I don't recommend it, but it may give you insight into how structured data is laid out.

Taking a view of a structured array is sometimes useful, but it is often a bit tricky to get right.
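For readers on a newer NumPy than the one discussed here, a helper exists that avoids the manual view. The sketch below assumes NumPy 1.16 or later, where numpy.lib.recfunctions.structured_to_unstructured is available; it is not part of the original answer and will not work on 1.13 or 1.14.

```python
import numpy as np
from numpy.lib import recfunctions as rfn  # assumption: NumPy >= 1.16

some_data = np.array(
    [('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
    dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')],
)
get_cols = ['A', 'B']

# Convert the selected fields into a regular (n_rows, n_cols) float array,
# without having to reason about itemsize or padding by hand.
plain = rfn.structured_to_unstructured(some_data[get_cols])

desired_sum = plain.sum(axis=0)
print(desired_sum)
```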

If these 2 numeric fields are usually used together, I'd suggest a compound dtype such as:

In [184]: some_data = np.array([('foo', [3.5, 2.15]), ('bar', [2.8, 5.3]), ('baz', [1.2, 3.7])],
     ...:                      dtype=[('col1', '<U20'), ('AB', '<f8',(2,))])
In [185]: some_data
Out[185]: 
array([('foo', [ 3.5 ,  2.15]), ('bar', [ 2.8 ,  5.3 ]),
       ('baz', [ 1.2 ,  3.7 ])],
      dtype=[('col1', '<U20'), ('AB', '<f8', (2,))])
In [186]: some_data['AB']
Out[186]: 
array([[ 3.5 ,  2.15],
       [ 2.8 ,  5.3 ],
       [ 1.2 ,  3.7 ]])
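If the data already exists with separate A and B fields, one way to move to the compound layout is to allocate an array with the new dtype and copy the fields across. This is a minimal sketch; the names flat, nested_dtype, and nested are introduced here for illustration.

```python
import numpy as np

flat = np.array(
    [('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
    dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')],
)

# Same string field, but the two floats become one field of shape (2,).
nested_dtype = np.dtype([('col1', '<U20'), ('AB', '<f8', (2,))])
nested = np.zeros(flat.shape, dtype=nested_dtype)

# Copy field by field; nested['AB'] is a (n_rows, 2) view into nested,
# so assigning to its columns fills the new array in place.
nested['col1'] = flat['col1']
nested['AB'][:, 0] = flat['A']
nested['AB'][:, 1] = flat['B']

# The column-wise sum now needs no view() at all.
ab_sum = nested['AB'].sum(axis=0)
print(ab_sum)
```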

genfromtxt accepts this style of dtype.

For more on "python - Changing a structured array with np.view() in numpy 1.14", see the similar question on Stack Overflow: https://stackoverflow.com/questions/48267058/
