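I have a CSV input file and need to sum all the values in one of its columns, but the values are not plain numbers and I'm not sure how to handle them.
The total, i.e. the sum of the whole column, should be around 15k. I'm using a pandas DataFrame to hold the .csv file.
Here is one of the columns from my input .csv file: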
DAMAGE_PROPERTY
0K
0K
2.5K
2.5K
.25K
.25K
2.5K
25K
2.5K
.25K
25K
25K
250K
2.5K
25K
2.5K
2.5K
2.5K
0K
2.5K
.25K
2.5K
25K
Best answer
I think you need to remove the K with str.replace first, then convert to float with astype, and finally call sum:
print (df.DAMAGE_PROPERTY.str.replace('K','').astype(float).sum())
401.0
That total is in thousands, so you can then multiply it by 1000:
print (df.DAMAGE_PROPERTY.str.replace('K','').astype(float).sum() * 1000)
401000.0
If you need to append K to the result:
print (str(df.DAMAGE_PROPERTY.str.replace('K','').astype(float).sum()) + 'K')
401.0K
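For reference, here is a self-contained sketch that reproduces the numbers above from the sample column in the question. Reading from an in-memory string with io.StringIO stands in for the real .csv file, and the names csv_text and total_k are assumptions of this sketch, not part of the original answer.

```python
import io

import pandas as pd

# Stand-in for the real .csv file: the sample column from the question.
csv_text = """DAMAGE_PROPERTY
0K
0K
2.5K
2.5K
.25K
.25K
2.5K
25K
2.5K
.25K
25K
25K
250K
2.5K
25K
2.5K
2.5K
2.5K
0K
2.5K
.25K
2.5K
25K
"""

df = pd.read_csv(io.StringIO(csv_text))

# Remove the literal 'K' suffix, convert to float, then sum (result is in thousands).
total_k = (
    df["DAMAGE_PROPERTY"]
    .str.replace("K", "", regex=False)
    .astype(float)
    .sum()
)

print(total_k)         # 401.0
print(total_k * 1000)  # 401000.0
print(f"{total_k}K")   # 401.0K
```

Passing regex=False makes it explicit that 'K' is a literal string, which keeps the behaviour the same across pandas versions.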
Edit (from the comments):
If the column also contains M values and you need the output in K:
print (df)
DAMAGE_PROPERTY
0 2.5K
1 2.5K
2 25M
# create a boolean mask that is True where the value contains `M`
mask = df.DAMAGE_PROPERTY.str.contains('M')
print (mask)
0 False
1 False
2 True
Name: DAMAGE_PROPERTY, dtype: bool
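# strip the suffix; regex=True is needed for the [KM] character class in pandas 2.0+
df['DAMAGE_PROPERTY'] = df.DAMAGE_PROPERTY.str.replace(r'[KM]', '', regex=True).astype(float)
# multiply by 1000 where the mask is True, so every value is expressed in thousands
df['DAMAGE_PROPERTY'] = df.DAMAGE_PROPERTY.mask(mask, df.DAMAGE_PROPERTY*1000)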
print (df)
DAMAGE_PROPERTY
0 2.5
1 2.5
2 25000.0
print (df['DAMAGE_PROPERTY'].sum())
25005.0
print (str(df['DAMAGE_PROPERTY'].sum()) + 'K' )
25005.0K
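If you need the output as plain numbers:
# start again from the original string column and the mask built above
df['DAMAGE_PROPERTY'] = df.DAMAGE_PROPERTY.str.replace(r'[KM]', '', regex=True).astype(float)
# scale M values to thousands, then convert everything from thousands to plain numbers
df['DAMAGE_PROPERTY'] = df.DAMAGE_PROPERTY.mask(mask, df.DAMAGE_PROPERTY*1000) * 1000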
print (df)
DAMAGE_PROPERTY
0 2500.0
1 2500.0
2 25000000.0
print (df['DAMAGE_PROPERTY'].sum())
25005000.0
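Both variants above overwrite DAMAGE_PROPERTY, so each one has to start from the original string column. A minimal sketch that leaves the column untouched and derives both results from it might look like this; the names raw, base, in_thousands and as_numbers are hypothetical and only used for illustration.

```python
import pandas as pd

df = pd.DataFrame({"DAMAGE_PROPERTY": ["2.5K", "2.5K", "25M"]})

raw = df["DAMAGE_PROPERTY"]

# True where the value carries an M suffix.
mask = raw.str.contains("M", regex=False)

# Numeric part only; the original column is not modified.
base = raw.str.replace(r"[KM]", "", regex=True).astype(float)

# Express everything in thousands: M values are 1000 times a K value.
in_thousands = base.mask(mask, base * 1000)

# Convert from thousands to plain numbers.
as_numbers = in_thousands * 1000

print(in_thousands.sum())  # 25005.0
print(as_numbers.sum())    # 25005000.0
```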
Edit 1:
If there are also B values:
print (df)
DAMAGE_PROPERTY
0 2.5K
1 2.5B
2 25M
maskM = df.DAMAGE_PROPERTY.str.contains('M')
print (maskM)
0 False
1 False
2 True
Name: DAMAGE_PROPERTY, dtype: bool
maskB = df.DAMAGE_PROPERTY.str.contains('B')
print (maskB)
0 False
1 True
2 False
Name: DAMAGE_PROPERTY, dtype: bool
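# strip the suffix; regex=True is needed for the [KMB] character class in pandas 2.0+
df['DAMAGE_PROPERTY'] = df.DAMAGE_PROPERTY.str.replace(r'[KMB]', '', regex=True).astype(float)
# express everything in thousands: M = 1,000 K and B = 1,000,000 K
df['DAMAGE_PROPERTY'] = df.DAMAGE_PROPERTY.mask(maskM, df.DAMAGE_PROPERTY*1000)
df['DAMAGE_PROPERTY'] = df.DAMAGE_PROPERTY.mask(maskB, df.DAMAGE_PROPERTY*1000000)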
print (df)
DAMAGE_PROPERTY
0 2.5
1 2500000.0
2 25000.0
print (df['DAMAGE_PROPERTY'])
0 2.5
1 2500000.0
2 25000.0
Name: DAMAGE_PROPERTY, dtype: float64
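As the number of suffixes grows, one mask per letter gets repetitive. A possible alternative, not taken from the original answer, is to split each value into its number and its suffix with str.extract and look the multiplier up in a dict. The MULTIPLIERS table, the DAMAGE_NUM column name and the regular expression are assumptions of this sketch; here the result is in plain numbers rather than in thousands.

```python
import pandas as pd

df = pd.DataFrame({"DAMAGE_PROPERTY": ["2.5K", "2.5B", "25M"]})

# Hypothetical lookup table: suffix -> multiplier to plain numbers.
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}

# Group 0 captures the numeric part (e.g. "2.5", ".25", "25"),
# group 1 captures the one-letter suffix.
parts = df["DAMAGE_PROPERTY"].str.extract(r"^\s*([0-9]*\.?[0-9]+)\s*([KMB])\s*$")

number = parts[0].astype(float)
factor = parts[1].map(MULTIPLIERS)

# Values that do not match the pattern become NaN and are skipped by sum().
df["DAMAGE_NUM"] = number * factor

total = df["DAMAGE_NUM"].sum()
print(df)
print(total)  # 2525002500.0
```

Supporting another suffix then only means adding an entry to the dict and the letter to the character class.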
Regarding "python - Sum a column of values that include letters in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38560126/