python-2.7 - Read multiple *.txt files into a Pandas DataFrame with the filename as column header

I am trying to import a set of *.txt files. Each file needs to go into its own consecutive column of a Pandas DataFrame in Python.

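Requirements and background information:

Each file has a single column of numbers

The files contain no header

Both positive and negative integers are possible

All *.txt files are the same size

The DataFrame columns must use the filename (without extension) as the header

The number of files is not known in advance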

Here is a sample *.txt file. All the others have the same format.

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all csv files in target directory
my_dir = "C:\\Python27\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I run into a problem at Step 3 and get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question:
Is there a better way to load these *.txt files into a Pandas DataFrame? Why does read_csv not accept a string for the filename?

Best answer

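You can read each file into its own DataFrame and then concatenate them side by side with axis=1. Suppose you have two of these files, containing the data shown below.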

In [6]:
filelist = ['val1.txt', 'val2.txt']
print pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1)
    val1  val2
0     16    16
1     54    54
2   -314  -314
3      1     1
4     15    15
5      4     4
6    153   153
7     86    86
8      4     4
9     64    64
10   373   373
11     3     3
12   434   434
13    31    31
14    93    93
15    53    53
16   873   873
17    43    43
18    11    11
19   533   533
20    46    46
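
As for the error in the question: read_csv does accept a string filename. The TypeError comes from the loop `for ijk in filelist`, which yields the file names themselves (strings), and those strings are then used as list indices in `filesList[ijk]`. Iterating over the paths directly avoids the problem. The following is a minimal sketch of how the approach above might be extended to an unknown number of files. The helper name `load_columns`, the temporary directory, and the sample values are made up for illustration and are not part of the original answer.

```python
import glob
import os
import tempfile

import pandas as pd


def load_columns(directory, pattern="*.txt"):
    """Read every matching single-column file into one DataFrame column.

    Hypothetical helper, not from the original answer.
    """
    # Sort so the column order is deterministic across runs.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    frames = []
    for path in paths:  # iterate over the paths themselves, not over indices
        # Column header = file name without directory or extension.
        name = os.path.splitext(os.path.basename(path))[0]
        # header=None: the files have no header row; names sets the column label.
        frames.append(pd.read_csv(path, header=None, names=[name]))
    # axis=1 places the frames side by side instead of stacking them.
    # Note: pd.concat raises ValueError if no file matched the pattern.
    return pd.concat(frames, axis=1)


# Demo with made-up sample files written to a temporary directory.
demo_dir = tempfile.mkdtemp()
samples = {"val1": [16, 54, -314], "val2": [1, 15, 4]}
for stem, values in samples.items():
    with open(os.path.join(demo_dir, stem + ".txt"), "w") as handle:
        handle.write("\n".join(str(v) for v in values) + "\n")

df = load_columns(demo_dir)
print(df)
```

Using os.path.splitext rather than item[:-4] keeps the header correct even if an extension is not exactly three characters long. The single-argument print(...) form is used so the snippet parses under both Python 2.7 and Python 3.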

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/26415906/
