python - How do I make a dictionary with lists of pandas DataFrames as values?

Tags: python pandas dictionary dataframe

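I am trying to put pandas DataFrames into a dictionary, not the other way around.

When I try to store a list of DataFrame blocks as a dictionary value, Python raises an error that I cannot make sense of.

Here is what I am trying to do:

I imported a Messenger chat log CSV file into a pandas DataFrame, split it by date, and put all the pieces into a list.

Now I want to iterate over this list and split each piece further: whenever the chat pauses for more than 15 minutes, a new block starts. I want to build a list of the chat blocks for each date and then put these lists in a dictionary, where the key is the date and the value is the list of blocks for that date.

That is where Python raises the error. Below is the code where I am stuck.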

import pandas as pd
from datetime import datetime

# Get chatlog and turn it into Pandas Dataframe
ktlk_csv = pd.read_csv(r'''C:\Users\Jaepil\PycharmProjects\test_pycharm/5years.csv''', encoding="utf-8")
df = pd.DataFrame(ktlk_csv)

# Change "Date" column from String to DateTime 
df["Date"] = pd.to_datetime(df["Date"])

# Make a column "time_diff" holding the time difference between consecutive chat timestamps.
df["time_diff"] = df["Date"].diff()
df["time_diff"] = df["time_diff"].dt.total_seconds()

# Criteria to split chat chunks 
chunk_tolerance = 900 # 900: 15min of silence splits a chat
chunk_min = 5 # a chat less than 5 min is not a chunk. 

# Split a chatlog by date. (1st split)
df_byDate = []
for group in df.groupby(lambda x: df["Date"][x].day):
    df_byDate.append(group)

# Iterate over the list of split chats and split them into many chunks
df_chunk = {}
for day in df_byDate:
    table = day[1]
    list_of_daily_chunks = []
    for group in table.groupby(lambda x: table["time_diff"][x] < chunk_tolerance ):
        list_of_daily_chunks.append(group)

    # It does NOT return any error up to this point. 

    key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
    df_chunk[key] = list_of_daily_chunks

This raises an error:

> C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py
> Traceback (most recent call last):
>   File "C:/Users/Jaepil/PycharmProjects/test_pycharm/PYNEER_KatalkBot_-_CSV_to_Chunk.py", line 32, in <module>
>     key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")
>   File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
>     result = self.index.get_value(self, key)
>   File "C:\Users\Jaepil\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
>     tz=getattr(series.dtype, 'tz', None))
>   File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
>   File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
>   File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
>   File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14031)
>   File "pandas\_libs\hashtable_class_helper.pxi", line 765, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:13975)
> KeyError: 0

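What am I doing wrong? At first I got an error saying that Series objects cannot be hashed, so I converted the key to a string. Now a different error appears.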

"Series objects are mutable and cannot be hashed" error

Best answer

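I think you need to change this line:

key = table.loc[:, "Date"].dt.date[0].strftime("%Y-%m-%d")

On a Series with an integer index, [0] looks up the row labelled 0, not the first row. Each group keeps the row labels of the original DataFrame, so with the default index from read_csv only the group that contains the original first row has a label 0, and every other group raises KeyError: 0. Instead, first convert to strings with strftime and then select the first value by position with iat: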

key = table["Date"].dt.strftime("%Y-%m-%d").iat[0]
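The difference between label-based and positional lookup can be seen on a small frame. This is only a sketch: the two-row `table` below is a hypothetical stand-in for one group, with row labels 5 and 6 chosen to mimic a group that does not contain the original first row.

```python
import pandas as pd

# Hypothetical stand-in for one group: the original row labels (5, 6) are kept.
table = pd.DataFrame(
    {"Date": pd.to_datetime(["2017-11-02 09:00", "2017-11-02 09:03"])},
    index=[5, 6],
)

dates = table["Date"].dt.strftime("%Y-%m-%d")

try:
    dates[0]  # label-based lookup: there is no row labelled 0
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

key = dates.iat[0]  # positional lookup: always the first row of the group
print(label_lookup_failed, key)
```

Because `key` is a plain string, it is hashable and can be used as a dictionary key.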

Or use iloc to select the first row, with get_loc giving the position of the Date column:

key = table.iloc[0, df.columns.get_loc("Date")].strftime("%Y-%m-%d")
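For reference, here is one way the whole dictionary could be built. It is a sketch under assumptions, not the asker's code: the sample data and the "Message" column are made up, the groups are keyed by full calendar date rather than day of month, and the boolean groupby is replaced by a cumulative sum, since grouping by a True/False condition can yield at most two groups per day. The `chunk_min` filter from the question is not applied.

```python
import pandas as pd

# Hypothetical chat log; the "Message" column and all values are assumptions.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-11-01 09:00", "2017-11-01 09:05", "2017-11-01 10:00",
        "2017-11-02 08:00", "2017-11-02 08:10",
    ]),
    "Message": ["a", "b", "c", "d", "e"],
})

chunk_tolerance = 900  # seconds of silence that start a new block

df_chunk = {}
for day, table in df.groupby(df["Date"].dt.date):
    # Gap to the previous message within the same day, in seconds (NaN for the first row).
    gap = table["Date"].diff().dt.total_seconds()
    # Every gap above the tolerance starts a new block; cumsum numbers the blocks 0, 1, 2, ...
    block_id = (gap > chunk_tolerance).cumsum()
    # The group key is already a datetime.date, so no indexing into the Series is needed.
    key = day.strftime("%Y-%m-%d")
    df_chunk[key] = [block for _, block in table.groupby(block_id)]

for key, blocks in df_chunk.items():
    print(key, [len(block) for block in blocks])
```

Taking the key from the groupby loop avoids the label lookup entirely, so the KeyError cannot occur regardless of the row labels in each group.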

Regarding "python - How do I make a dictionary with lists of pandas DataFrames as values?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47451517/
