I have a list with the following contents:
list1 = [(4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0),
(0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0),
(0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0),
(0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0),
(0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0),
(0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0),
(0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0),
(0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0),
(0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)]
I need to average these values along the vertical axis (column by column), so that the output looks like this:
[(average_col1, average_col2, average_col3, average_col4, average_col5, average_col6)]
However, the command np.mean(list1, axis=1) returns:
IndexError: tuple index out of range
So I tried creating a numpy array as follows:
a = np.array(list1)
a = array([ (4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0),
(0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0),
(0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0),
(0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0),
(0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0),
(0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0),
(0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0),
(0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0),
(0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)],
dtype=[('col1', '<f8'), ('col2', '<f8'), ('col3', '<f8'), ('col4', '<f8'), ('col5', '<i4'), ('col6', '<i4')])
If I run the same mean command as above on it, it returns:
IndexError: tuple index out of range
So I don't know where to go from here.
Best answer
The problem you are running into with numpy comes from how the matrix is declared in your example.
Given:
list1 = [(4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0),
(0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0),
(0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0),
(0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0),
(0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0),
(0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0),
(0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0),
(0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0),
(0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)]
You can easily use it to get the column-wise mean in numpy (note axis=0, which averages down each column, rather than axis=1):
>>> np.mean(list1, axis=0)
[ 0.68679585 0.34464285 0.20140261 5.83448231 0.44444444 0. ]
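To make the difference between the two axes concrete, here is a minimal sketch on a small made-up data set (the name rows and its values are illustrative assumptions, not taken from the question). It only shows which dimension each axis value collapses.

```python
import numpy as np

# Hypothetical data: 3 rows, 2 columns (not the data from the question).
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]

# axis=0 collapses the rows, leaving one mean per column.
col_means = np.mean(rows, axis=0)

# axis=1 collapses the columns, leaving one mean per row.
row_means = np.mean(rows, axis=1)

print(col_means.shape)  # one entry per column
print(row_means.shape)  # one entry per row
```

For a "vertical" average, the result should have as many entries as there are columns, which is what axis=0 gives.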
Next, you have an interesting declaration:
a = np.array([ (4.974874129422414, 0.4384932775564907, 0.1879318517703546, 5.820735609514166, 0, 0),
(0.15069597326856923, 0.2961961688603689, 0.21595885700786707, 5.848923022691187, 1, 0),
(0.15085612758502492, 0.28850876174946627, 0.18977362640233908, 5.826501216543082, 0, 0),
(0.15069597326856923, 0.2887489932217097, 0.2176404773200905, 5.834028536994648, 1, 0),
(0.15093620474325167, 0.3005203353595069, 0.18961347208652674, 5.849643723630468, 0, 0),
(0.15069597326856923, 0.3235825566813912, 0.21515808543054254, 5.849964035159586, 1, 0),
(0.15085612758502492, 0.3520099475391594, 0.18937324061280378, 5.814569613228549, 0, 0),
(0.15093620474325167, 0.3860427394179732, 0.2174803230046498, 5.858131979266134, 1, 0),
(0.1506158961103403, 0.42768286128894817, 0.18969354924443318, 5.807843071967709, 0, 0)],
dtype=[('col1', '<f8'), ('col2', '<f8'), ('col3', '<f8'), ('col4', '<f8'), ('col5', '<i4'), ('col6', '<i4')])
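This is not the same as matrix = np.array(list1).
What it does is declare a numpy structured array, naming each column and giving each column its own dtype.
Each element of this array is one row, stored as a tuple-like record: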
>>> a[0]
( 4.97487413, 0.43849328, 0.18793185, 5.82073561, 0, 0)
And you cannot access the columns in the usual way:
>>> a[:,0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
because it is actually a one-dimensional array:
>>> a.shape
(9,)
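The same behaviour can be reproduced on a smaller structured array. This is only a sketch: the two-field dtype and the three records below are made up for illustration, but shape, ndim and dtype.names are standard ndarray attributes.

```python
import numpy as np

# Hypothetical two-field dtype, mirroring the style of the one above.
dt = [('col1', '<f8'), ('col2', '<i4')]
a_small = np.array([(0.5, 0), (1.5, 1), (2.5, 0)], dtype=dt)

print(a_small.shape)        # one dimension: the records
print(a_small.ndim)         # number of dimensions
print(a_small.dtype.names)  # the field (column) names
```

The fields live inside the dtype rather than in a second array dimension, which is why there is no axis 1 to index or average over.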
Instead, you have to access the columns by name:
>>> a['col1']
array([ 4.97487413, 0.15069597, 0.15085613, 0.15069597, 0.1509362 ,
0.15069597, 0.15085613, 0.1509362 , 0.1506159 ])
Or, to take the mean of each column by name:
>>> [np.mean(a[col]) for col in ['col{}'.format(i) for i in range(1,7)]]
[0.68679584555500162, 0.34464284907500159, 0.20140260920884526, 5.8344823121106151, 0.44444444444444442, 0.0]
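Two variations on this may be worth considering; both are sketches on made-up data rather than part of the original answer. The first loops over a.dtype.names instead of rebuilding the column names by hand. The second assumes NumPy 1.16 or later and uses numpy.lib.recfunctions.structured_to_unstructured to get an ordinary 2-D array, after which axis=0 works as usual.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical structured array with mixed float and int fields.
dt = [('col1', '<f8'), ('col2', '<f8'), ('col3', '<i4')]
a_demo = np.array([(1.0, 4.0, 0), (2.0, 5.0, 1), (3.0, 6.0, 2)], dtype=dt)

# Variation 1: iterate over the field names stored in the dtype.
means_by_name = [a_demo[name].mean() for name in a_demo.dtype.names]

# Variation 2: convert to a plain 2-D array, then average down the columns.
# The mixed field types are promoted to a common dtype (float64 here).
plain = rfn.structured_to_unstructured(a_demo)
means_2d = plain.mean(axis=0)

print(means_by_name)
print(plain.shape)
print(means_2d)
```

If the named fields are not needed at all, building the array with np.array(list1) and no dtype argument avoids the structured array in the first place.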
Regarding "python - How to average a list of lists containing empty lists vertically?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44307952/