python - How do I cap the number of rows in a Pandas DataFrame?

I have a Pandas DataFrame to which I keep appending one row of data every second, as shown below.

df.loc[time.strftime("%Y-%m-%d %H:%M:%S")] = [reading1, reading2, reading3]
>>>df
                     sensor1 sensor2 sensor3
2015-04-14 08:50:23    5.4     5.6     5.7
2015-04-14 08:50:24    5.5     5.6     5.8
2015-04-14 08:50:26    5.2     5.3     5.4

If I keep doing this, I will eventually start running into memory problems (every append involves the whole DataFrame).

I only need to keep X rows of data. That is, after the operation it would look like this:

>>>df
                     sensor1 sensor2 sensor3
(this row is gone)
2015-04-14 08:50:24    5.5     5.6     5.8
2015-04-14 08:50:26    5.2     5.3     5.4
2015-04-14 08:50:27    5.2     5.4     5.6

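Is there a way to specify a maximum number of rows, so that whenever a new row is added the oldest row is dropped at the same time, without having to "check the length of the DataFrame; if it is > X, delete the first row; append the new row"?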

Something like this, but for a Pandas DataFrame: https://stackoverflow.com/a/10155753/4783578

Best answer

One approach is to pre-allocate the rows and overwrite their values cyclically.

import numpy as np
import pandas as pd

# Say we want to limit to a thousand rows
N = 1000

# Create the DataFrame with N rows and 5 columns, all NaN
data = pd.DataFrame(np.full((N, 5), np.nan))

# To check the length of the DataFrame, we'll need to .dropna().
len(data.dropna())              # Returns 0

# Keep a running counter of the next index to insert into
counter = 0

# Insertion always happens at that counter
data.loc[counter, :] = np.random.rand(5)

# ... and increment the counter, wrapping back to 0 when it reaches N
counter = (counter + 1) % N

# Now, the DataFrame contains one row
len(data.dropna())              # Returns 1

# We can add several rows one after another. Let's add twice as many as N
for row in np.random.rand(2 * N, 5):
    data.loc[counter, :] = row
    counter = (counter + 1) % N

# Now that we added them, we still have only the last N rows
len(data)                       # Returns N

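This avoids moving or reallocating the existing data in any way, and is a fast way to append. However, reading the data back can be slower if:

The order of the data matters. If you need the rows in their original order, you have to slice data using counter to recover that order.
There are few rows. If you end up adding fewer than N rows, you will need .dropna() (or a count of the total rows inserted) to drop the unused rows.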
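
Both read-side caveats can be handled in one small helper. The following is only a sketch under assumptions: the names insert, in_order and total are hypothetical and not part of the answer above, and the capacity is kept tiny so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

N = 4  # small capacity for illustration (assumption, the answer uses 1000)
data = pd.DataFrame(np.full((N, 2), np.nan), columns=["a", "b"])
counter = 0   # next slot to overwrite
total = 0     # total number of rows ever inserted (hypothetical extra counter)


def insert(row):
    """Overwrite the slot at `counter` with `row` and advance the counter."""
    global counter, total
    data.loc[counter, :] = row
    counter = (counter + 1) % N
    total += 1


def in_order():
    """Return the stored rows oldest-first, leaving out unused slots."""
    if total < N:
        # Buffer not yet full: slots 0..total-1 are already in insertion order.
        return data.iloc[:total]
    # Buffer full: the oldest row sits at `counter`, so rotate around it.
    return pd.concat([data.iloc[counter:], data.iloc[:counter]])


for i in range(6):
    insert([i, i * 10])

ordered = in_order()
print(ordered)  # column "a" holds 2, 3, 4, 5 (the last N rows, oldest first)
```

Counting inserted rows instead of calling .dropna() also keeps rows that legitimately contain NaN readings.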

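In most of the scenarios I deal with where truncated-append performance matters, neither of the above applies, but your scenario may be different. In that case, @Alexander has a good solution involving .shift().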
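
For reference, a .shift()-based variant might look roughly like the sketch below. This is not necessarily @Alexander's exact code; the helper name push and the tiny capacity are assumptions made for illustration. Note that .shift() moves the values only, so index labels such as timestamps would need to be handled separately.

```python
import numpy as np
import pandas as pd

N = 3  # small capacity for illustration (assumption)
data = pd.DataFrame(np.full((N, 2), np.nan), columns=["a", "b"])


def push(frame, row):
    """Return a new frame with the oldest row dropped and `row` stored last."""
    shifted = frame.shift(-1)   # every row moves up one position; the last becomes NaN
    shifted.iloc[-1] = row      # fill the freed last position with the new row
    return shifted


for i in range(5):
    data = push(data, [i, i * 10])

print(data)  # column "a" holds 2, 3, 4 (rows stay in chronological order)
```

The trade-off relative to the cyclic counter is that each push builds a new N-row frame, while reads need no reordering.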

Source: the corresponding question on Stack Overflow: https://stackoverflow.com/questions/29609118/
