I need to read the CTG.xls file with pandas from the following path: https://archive.ics.uci.edu/ml/machine-learning-databases/00193/ .
From this file I need to select the sheet Data. In addition, I need to select columns K through AT of the file. So in the end we get a dataset with these columns:
["LB","AC","FM","UC","DL","DS","DP","ASTV","MSTV","ALTV","MLTV","Width","Min","Max","Nmax","Nzeros","Mode","Mean","Median","Variance","Tendency","CLASS","NSP"]
How can I do this with the read function in pandas?
Best answer
Use:
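import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00193/CTG.xls'
# Read the 'Data' sheet, ignoring the last 3 rows of the sheet
df = pd.read_excel(url, sheet_name='Data', skipfooter=3)
# Drop the columns that pandas auto-named 'Unnamed: N'
df = df.drop(columns=df.filter(like='Unnamed').columns)
# Use the first data row as the column names, then drop that row
df.columns = df.iloc[0].to_list()
df = df[1:].reset_index(drop=True)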
Output
LB AC FM UC DL DS DP ASTV MSTV ALTV MLTV Width Min Max Nmax Nzeros Mode Mean Median Variance Tendency CLASS NSP
0 120 0 0 0 0 0 0 73 0.5 43 2.4 64 62 126 2 0 120 137 121 73 1 9 2
1 132 0.00638 0 0.00638 0.00319 0 0 17 2.1 0 10.4 130 68 198 6 1 141 136 140 12 0 6 1
2 133 0.003322 0 0.008306 0.003322 0 0 16 2.1 0 13.4 130 68 198 5 1 141 135 138 13 0 6 1
3 134 0.002561 0 0.007682 0.002561 0 0 16 2.4 0 23 117 53 170 11 0 137 134 137 13 1 6 1
4 132 0.006515 0 0.008143 0 0 0 16 2.4 0 19.9 117 53 170 9 0 137 136 138 11 1 2 1
... ... ... ... ... ... .. .. ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ..
2121 140 0 0 0.007426 0 0 0 79 0.2 25 7.2 40 137 177 4 0 153 150 152 2 0 5 2
2122 140 0.000775 0 0.006971 0 0 0 78 0.4 22 7.1 66 103 169 6 0 152 148 151 3 1 5 2
2123 140 0.00098 0 0.006863 0 0 0 79 0.4 20 6.1 67 103 170 5 0 153 148 152 4 1 5 2
2124 140 0.000679 0 0.00611 0 0 0 78 0.4 27 7 66 103 169 6 0 152 147 151 4 1 5 2
2125 142 0.001616 0.001616 0.008078 0 0 0 74 0.4 36 5 42 117 159 2 1 145 143 145 1 0 1 1
[2126 rows x 23 columns]
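One follow-up that may be worth considering: because the column names are taken from a data row, the columns probably remain `object` dtype after the steps above. The sketch below replays the same steps on a small made-up frame (the `raw` data and the `promote_first_row` helper are hypothetical and not part of the original answer) and then converts each column with `pd.to_numeric`. It assumes every remaining value is numeric.

```python
import pandas as pd

# Hypothetical stand-in for the raw sheet: the real column names sit in the
# first data row, and one column was read in as "Unnamed: 2".
raw = pd.DataFrame(
    {
        "a": ["LB", 120, 132],
        "b": ["AC", 0, 0.00638],
        "Unnamed: 2": [None, None, None],
        "c": ["NSP", 2, 1],
    }
)


def promote_first_row(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop 'Unnamed' columns, use the first row as header, convert to numbers."""
    out = frame.drop(columns=frame.filter(like="Unnamed").columns)
    out.columns = out.iloc[0].to_list()
    out = out[1:].reset_index(drop=True)
    # The columns are still object dtype here; convert each one to a numeric dtype.
    return out.apply(pd.to_numeric)


clean = promote_first_row(raw)
print(clean.dtypes)
```

If some column holds non-numeric text, `pd.to_numeric` raises by default; passing `errors='coerce'` would turn such values into NaN instead.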
Regarding python - reading an online Excel file with a specific sheet and only selected columns, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/71421344/