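I want to convert the entire dataset below to percentages.
https://cocl.us/datascience_survey_data
Each percentage should be computed against the sum of its row.
For example, Big Data (Spark/Hadoop) = 1332 + 729 + 127 = 2188
So the percentage for "Very interested" would be 1332 / 2188 ≈ 60.88%
I want to do this automatically for all rows. How can I do it?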
Best answer
You can use DataFrame.div to divide every column by the row-wise sum, then multiply by 100:
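import pandas as pd

# Use the first CSV column (the topic names) as the index
df = pd.read_csv('Topic_Survey_Assignment.csv', index_col=0)
# Divide each row by its own total, then scale to percent
df1 = df.div(df.sum(axis=1), axis=0).mul(100)
print(df1)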
                            Very interested  Somewhat interested  \
Big Data (Spark / Hadoop)         60.877514            33.318099
Data Analysis / Statistics        77.007299            20.255474
Data Journalism                   20.235849            50.990566
Data Visualization                61.580882            33.731618
Deep Learning                     58.229599            35.500231
Machine Learning                  74.724771            21.880734

                            Not interested
Big Data (Spark / Hadoop)         5.804388
Data Analysis / Statistics        2.737226
Data Journalism                  28.773585
Data Visualization                4.687500
Deep Learning                     6.270171
Machine Learning                  3.394495
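To try the same idea without downloading the CSV, here is a minimal sketch on a hand-built frame. The first row uses the counts quoted in the question; the second row's counts are reconstructed from the row sum and percentages printed above, so treat them as illustrative rather than as the authoritative file contents. The names `row_totals`, `pct`, and `pct_rounded` are made up for this example.

```python
import pandas as pd

# Row 1: counts from the question. Row 2: reconstructed from the printed
# percentages and row sum (an assumption, not read from the real CSV).
df = pd.DataFrame(
    {
        "Very interested": [1332, 1688],
        "Somewhat interested": [729, 444],
        "Not interested": [127, 60],
    },
    index=["Big Data (Spark / Hadoop)", "Data Analysis / Statistics"],
)

row_totals = df.sum(axis=1)                   # one total per row
pct = df.div(row_totals, axis=0).mul(100)     # row-wise percentages
pct_rounded = pct.round(2)                    # for display only

print(pct_rounded)
print(pct.sum(axis=1))                        # each row should total 100
```

Rounding is best left to the display step: the unrounded rows sum to 100, while rounded values can be off by a hundredth.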
Details:
print(df.sum(axis=1))
Big Data (Spark / Hadoop)     2188
Data Analysis / Statistics    2192
Data Journalism               2120
Data Visualization            2176
Deep Learning                 2169
Machine Learning              2180
dtype: int64
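The row sums are a Series indexed by topic name, which is why `axis=0` matters in the `div` call. The sketch below, using only the Big Data counts from the question, compares the call with and without it; `wrong` and `right` are hypothetical names for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Very interested": [1332], "Somewhat interested": [729], "Not interested": [127]},
    index=["Big Data (Spark / Hadoop)"],
)
totals = df.sum(axis=1)

# Default axis="columns": pandas matches the Series index against the
# column names. Nothing matches here, so every cell becomes NaN.
wrong = df.div(totals)

# axis=0 matches the Series index against the row labels instead.
right = df.div(totals, axis=0)

print(wrong)
print(right)
```

Without `axis=0` the result is a frame of NaN values rather than an error, so the mistake is easy to miss.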
The NumPy alternative is very similar:
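import numpy as np
import pandas as pd

df = pd.read_csv('Topic_Survey_Assignment.csv', index_col=0)
arr = df.values
# [:, None] turns the row sums into a column vector so they broadcast across each row
df1 = pd.DataFrame(arr / np.sum(arr, axis=1)[:, None] * 100,
                   index=df.index,
                   columns=df.columns)
print(df1)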
                            Very interested  Somewhat interested  \
Big Data (Spark / Hadoop)         60.877514            33.318099
Data Analysis / Statistics        77.007299            20.255474
Data Journalism                   20.235849            50.990566
Data Visualization                61.580882            33.731618
Deep Learning                     58.229599            35.500231
Machine Learning                  74.724771            21.880734

                            Not interested
Big Data (Spark / Hadoop)         5.804388
Data Analysis / Statistics        2.737226
Data Journalism                  28.773585
Data Visualization                4.687500
Deep Learning                     6.270171
Machine Learning                  3.394495
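As a variation on the NumPy approach, `keepdims=True` can replace the `[:, None]` indexing, since it keeps the summed axis as a length-1 dimension. This is a sketch on a small hand-built frame, not the original answer's code: the first row uses the counts from the question, and the second is reconstructed from the printed percentages, so it is illustrative only.

```python
import numpy as np
import pandas as pd

# Row 2 is reconstructed from the printed output (an assumption).
df = pd.DataFrame(
    {
        "Very interested": [1332, 1688],
        "Somewhat interested": [729, 444],
        "Not interested": [127, 60],
    },
    index=["Big Data (Spark / Hadoop)", "Data Analysis / Statistics"],
)

arr = df.to_numpy()
row_sums = arr.sum(axis=1, keepdims=True)   # shape (n_rows, 1), broadcasts per row
df1 = pd.DataFrame(arr / row_sums * 100, index=df.index, columns=df.columns)
print(df1)
```

Either spelling produces the same values as the `DataFrame.div` version. A row whose sum is zero would produce NaN or inf in both, so such rows may need handling first.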
Regarding "python - convert an entire dataset to percentages", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58785948/