python - 'TableIterator' object has no attribute 'shape'

Tags: python unit-testing testing pandas pytest

I am trying to take a large dataframe, pass it to a function, and have it split into chunks.

To test this, I wrote the following:

def test_get_dataframe(workspace):
    dataframe = workspace.get_df('testing_df', True)
    assert dataframe.shape[0] == 500000

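Here testing_df is the large dataframe, but I get this error: 'TableIterator' object has no attribute 'shape'.

I am trying to use the shape attribute to test whether 500,000 rows are returned to me through chunking.

Any help?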

Best answer

I wrote a function to do this:

def test_chunks(data, chunk):
    # Split `data` into consecutive pieces of `chunk` rows and check each size.
    store = []
    ok = True

    for start in range(0, data.shape[0], chunk):
        # .iloc slices by position and excludes the end (replaces the removed .ix)
        piece = data.iloc[start:start + chunk]
        if piece.shape[0] != chunk:
            ok = False
            print('table chunked incorrectly')
        store.append(piece)

    if ok:
        print('table chunked correctly')

    return store

It returns a list of chunks so that you can verify that your table was split correctly:

In [72]: df = pd.DataFrame(np.random.randn(1000000,1))

In [73]: df
Out[73]: 
               0
0      -1.182168
1      -1.505764
2      -0.162236
3      -0.621377
4       2.341008
5      -1.506157
6      -0.116629
7       0.957325
8       0.367071
9       0.647191
10     -2.419967
11      0.442284
12      0.241475
13      0.171289
14     -0.624512
15     -0.780075
16     -1.627152
17     -0.100081
18     -0.540503
19     -1.126215
20      0.649648
21     -0.812951
22      0.596237
23     -1.413866
24      0.343937
25     -0.767372
26     -1.632577
27     -0.065164
28     -1.239659
29     -0.810848
...          ...
999970 -2.027269
999971 -0.149554
999972  1.217983
999973  0.453195
999974  0.514412
999975  0.151795
999976 -1.170795
999977 -0.945090
999978  1.385541
999979 -1.084080
999980 -0.564011
999981  1.497476
999982 -0.422143
999983  0.989664
999984  1.295070
999985 -0.838345
999986 -1.110576
999987  0.659037
999988 -1.099105
999989 -0.869162
999990  1.147460
999991  1.543114
999992  1.494555
999993 -1.526764
999994  0.025678
999995 -0.247338
999996 -0.985417
999997  0.356573
999998 -0.622785
999999 -0.100821

[1000000 rows x 1 columns]

In [74]: df = pd.DataFrame(np.random.randn(1000000,1))

In [75]: %paste
def test_chunks(data, chunk):
    # Split `data` into consecutive pieces of `chunk` rows and check each size.
    store = []
    ok = True

    for start in range(0, data.shape[0], chunk):
        # .iloc slices by position and excludes the end (replaces the removed .ix)
        piece = data.iloc[start:start + chunk]
        if piece.shape[0] != chunk:
            ok = False
            print('table chunked incorrectly')
        store.append(piece)

    if ok:
        print('table chunked correctly')

    return store
## -- End pasted text --

In [76]: test_chunks(df, 500000)
table chunked correctly
Out[76]: 
[               0
 0      -0.770808
 1      -0.941473
 2       0.508013
 3       0.424950
 4       0.101314
 5      -1.154268
 6      -0.932678
 7       0.844011
 8       0.281692
 9       2.376677
 10      0.555523
 11     -0.565176
 12     -0.091829
 13     -1.262907
 14      0.769793
 15     -0.369955
 16     -0.071488
 17     -2.051964
 18      1.101495
 19      0.355003
 20     -0.537814
 21      1.368524
 22     -1.164048
 23     -1.483500
 24      0.737210
 25      0.228551
 26     -1.500423
 27      1.013433
 28      0.722119
 29      0.253644
 ...          ...
 499970  1.266769
 499971  0.594241
 499972  0.210255
 499973  0.730457
 499974 -0.454487
 499975 -0.125958
 499976  0.655793
 499977 -0.169799
 499978 -2.051298
 499979  0.066739
 499980  0.011063
 499981  0.707727
 499982 -1.070386
 499983 -0.875807
 499984 -1.283149
 499985  0.685271
 499986 -0.981217
 499987 -1.978422
 499988 -0.424755
 499989  0.976395
 499990  0.892599
 499991  0.582446
 499992 -2.256608
 499993 -0.915423
 499994  0.080076
 499995  2.350798
 499996 -0.208804
 499997  0.303654
 499998  1.730798
 499999  1.833389

 [500000 rows x 1 columns],                0
 500000  0.232947
 500001  0.335351
 500002 -0.252290
 500003  1.251981
 500004 -0.190665
 500005  1.686744
 500006 -0.398652
 500007 -1.732415
 500008  1.441498
 500009  0.574721
 500010 -1.586857
 500011  0.090962
 500012  0.041795
 500013 -0.074869
 500014 -0.549962
 500015  0.726490
 500016 -2.686839
 500017  1.369451
 500018 -1.947568
 500019 -0.115681
 500020 -0.292935
 500021 -0.535109
 500022 -1.276597
 500023 -1.228783
 500024  0.705259
 500025  0.538611
 500026 -0.100649
 500027 -1.145738
 500028  0.716736
 500029 -0.354400
 ...          ...
 999970 -0.682481
 999971 -0.823475
 999972 -1.144725
 999973  0.305905
 999974 -1.520020
 999975 -0.049710
 999976 -0.171224
 999977 -0.133479
 999978 -0.259963
 999979 -1.618230
 999980 -0.042287
 999981 -1.204132
 999982 -1.195320
 999983  0.343836
 999984 -0.163967
 999985  0.285751
 999986  0.476105
 999987 -0.657065
 999988 -0.259893
 999989 -0.481626
 999990  0.615710
 999991  0.111523
 999992 -0.278765
 999993 -0.597503
 999994 -0.356952
 999995 -0.156546
 999996 -0.082010
 999997 -0.296540
 999998  0.184973
 999999  0.127719

 [500000 rows x 1 columns]]
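
The error message in the question suggests that get_df returned an iterator over chunks rather than a single DataFrame, and an iterator has no shape attribute. The workspace code is not shown, so this is an assumption. Under that assumption, a rough sketch of counting rows by consuming the chunks could look like the following. Here `iter_chunks` is a hypothetical stand-in for whatever chunked reader is actually used, and `summarize_chunks` is an illustrative helper, not part of pandas.

```python
import numpy as np
import pandas as pd


def iter_chunks(df, chunksize):
    """Hypothetical stand-in for a chunked reader: yields DataFrames
    of up to `chunksize` rows each."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


def summarize_chunks(chunks):
    """Consume an iterable of DataFrames and return
    (total number of rows, list of per-chunk row counts)."""
    sizes = [len(chunk) for chunk in chunks]
    return sum(sizes), sizes


df = pd.DataFrame(np.random.randn(1000, 1))

# A generator has no .shape, which mirrors the error in the question.
chunks = iter_chunks(df, 400)
assert not hasattr(chunks, "shape")

# Count rows by iterating over the chunks instead of calling .shape.
total_rows, sizes = summarize_chunks(chunks)
assert total_rows == 1000
assert sizes == [400, 400, 200]  # the last chunk may be shorter
```

If `workspace.get_df('testing_df', True)` does yield DataFrame chunks, the test in the question could pass its result to a helper like `summarize_chunks` and assert that the total equals 500000. Note that an iterator can only be consumed once, so the counts should be collected in a single pass.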

Regarding python - 'TableIterator' object has no attribute 'shape', a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31016480/
