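I am reading the book "Introducing Data Science: Big Data, Machine Learning, and more, using Python tools". Chapter 4 contains the following code for blockwise matrix computation: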
import dask.array as da
import bcolz as bc
import numpy as np
import dask
n = 1e4 #A
ar = bc.carray(np.arange(n).reshape(n/2,2) , dtype='float64', rootdir = 'ar.bcolz', mode = 'w') #B
y = bc.carray(np.arange(n/2), dtype='float64', rootdir = 'yy.bcolz', mode = 'w') #B,
dax = da.from_array(ar, chunks=(5,5)) #C
dy = da.from_array(y,chunks=(5,5)) #C
XTX = dax.T.dot(dax) #D
Xy = dax.T.dot(dy) #E
coefficients = np.linalg.inv(XTX.compute()).dot(Xy.compute()) #F
coef = da.from_array(coefficients,chunks=(5,5)) #G
ar.flush() #H
y.flush() #H
predictions = dax.dot(coef).compute() #I
print (predictions)
I get a ValueError:
ValueError Traceback (most recent call last)
<ipython-input-4-7ae8e9cf2346> in <module>()
10
11 dax = da.from_array(ar, chunks=(5,5)) #C
---> 12 dy = da.from_array(y,chunks=(5,5)) #C
13
14 XTX = dax.T.dot(dax) #D
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in from_array(x, chunks, name, lock, fancy, getitem)
1868 >>> a = da.from_array(x, chunks=(1000, 1000), lock=True) # doctest: +SKIP
1869 """
-> 1870 chunks = normalize_chunks(chunks, x.shape)
1871 if len(chunks) != len(x.shape):
1872 raise ValueError("Input array has %d dimensions but the supplied "
C:\Users\F\Anaconda3\lib\site-packages\dask\array\core.py in normalize_chunks(chunks, shape)
1815 raise ValueError(
1816 "Chunks and shape must be of the same length/dimension. "
-> 1817 "Got chunks=%s, shape=%s" % (chunks, shape))
1818
1819 if shape is not None:
ValueError: Chunks and shape must be of the same length/dimension. Got chunks=(5, 5), shape=(5000,)
What is the problem?
Best answer
The problem is here:
np.arange(n/2).reshape(n)
You create an array of size n/2 and then try to reshape it to size n. reshape cannot change the number of elements in an array.
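A small sketch of that rule, using a made-up n = 10 rather than the value from the question: reshape succeeds when the element count is preserved and raises ValueError when it is not. The variable names grid and size_change_failed are only for illustration.

```python
import numpy as np

n = 10

# Same number of elements before and after: 10 -> 5 x 2 works.
# Integer division (//) is used because reshape expects integer dimensions.
grid = np.arange(n).reshape(n // 2, 2)
print(grid.shape)

# 5 elements cannot be reshaped into 10: NumPy raises ValueError.
try:
    np.arange(n // 2).reshape(n)
    size_change_failed = False
except ValueError as exc:
    size_change_failed = True
    print("reshape failed:", exc)
```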
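Could this be a copy/paste error? It is not in your original code; it looks like you are doing
np.arange(n).reshape(n/2,2)
elsewhere, which works as long as n is even (be careful: if n is not even, this fails too).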
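Separately from the reshape point, the traceback in the question stops at da.from_array(y, chunks=(5,5)) and reports chunks=(5, 5) against shape=(5000,): a two-entry chunks tuple for a one-dimensional array. The sketch below mirrors that length comparison in plain Python. chunks_match_shape is a hypothetical helper written for this illustration, not part of dask, and the shapes are the ones implied by the question's code and error message.

```python
def chunks_match_shape(chunks, shape):
    """Hypothetical helper: True when chunks has one entry per array dimension."""
    return len(chunks) == len(shape)

ar_shape = (5000, 2)  # ar: n/2 rows and 2 columns
y_shape = (5000,)     # y: one-dimensional, as reported in the error message

ar_ok = chunks_match_shape((5, 5), ar_shape)   # two entries, two dimensions
y_bad = chunks_match_shape((5, 5), y_shape)    # two entries, one dimension
y_ok = chunks_match_shape((5,), y_shape)       # one entry, one dimension

print(ar_ok, y_bad, y_ok)
```

If this reading of the traceback is right, a one-entry tuple such as chunks=(5,) for y would be the matching change. The later da.from_array(coefficients, chunks=(5,5)) call looks likely to hit the same check, since coefficients is also one-dimensional.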
For more on python - ValueError: Chunks and shape must be of the same length/dimension, see the similar question on Stack Overflow: https://stackoverflow.com/questions/44918438/