python - TypeError: string indices must be integers (using pandas DataFrame)

Once again, I humbly ask the community for help.

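I have a data analysis task: studying the relationships between different columns of a given dataset. To do that, I have to edit the columns I want to use. The column I need contains data that looks like a list of dictionaries but is actually a string. So I have to edit it to get the "name" values out of those would-be "dictionaries".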

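The code below is my magic ritual for getting the "name" values out of that string: it collects only those "name" values in a list and saves them as a string in another column. After that I apply the function to the whole column and group by the unique combinations of those strings of "name" values. (The ultimate task was to split these "name" values into several additional columns so that I could later sort by all of them. But the huge strings in the source column (df['specializations']) can contain any number of "dictionaries", so I cannot know exactly how many additional columns to create, and I abandoned the idea.)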

A typical string with a pseudo-list of dictionaries looks like this (the number of "dictionaries" varies):

[{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}]

import re

def get_names_values(df):
    for a in df['specializations']:
        for r in (("\'", "\""), ("[", ""), ("]", ""), ("}", "")):
            a = a.replace(*r)
        a = re.split("{", a)
        m = 0
        while m < len(a):
            if a[m] in ('', ': ', ', '):
                del a[m]
            m += 1
        a = "".join(a)
        a = re.split("\"", a)
        n = 0
        while n < len(a):
            if a[n] in ('', ': ', ', '):
                del a[n]
            n += 1
        nameslist = []
        for num in range(len(a)):
            if a[num] == 'name':
                nameslist.append(a[num+1])
        return str(nameslist)


df['specializations_names'] = df['specializations'].fillna('{}').apply(get_names_values)
df['specializations_names']

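The problem appears at for a in df['specializations']:, which raises TypeError: string indices must be integers. I checked that loop on its own, e.g. with print(a), and it gave me the correct result. I also tried: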

for k in range(len(df)):
  a = df['specializations'][k]

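Again, on its own it works the way I need, but inside my function it raises the TypeError. I feel like giving up on the ['specializations'] column and studying some other one instead, but I am still curious what went wrong here and how to fix it.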

Many thanks in advance to everyone willing to offer advice.

Best answer

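The "string with a pseudo-list of dictionaries" you have looks like JSON data (strictly speaking, with its single quotes it is the text form of a Python list of dicts). You can use eval() to convert it into an actual list of dictionaries and then work with it normally. Use eval() with caution, though, since it executes whatever expression the string contains. I tried recreating that string and got it working: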

str_dicts = str([{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'},
                 {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'},
                 {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}])

dicts = list(eval(str_dicts))
     
names = [d['name'] for d in dicts]

print(names)

[0]: ['Beginner', 'Testing', 'IT']
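
Because the string uses single quotes, it is a Python literal rather than strict JSON, so json.loads would reject it. A more cautious alternative to eval() is ast.literal_eval from the standard library, which evaluates only literals and refuses arbitrary expressions. A minimal sketch on the same sample string (the helper name parse_names is made up for illustration):

```python
import ast

str_dicts = str([{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'},
                 {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'},
                 {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}])


def parse_names(text):
    """Parse a string holding a list of dicts and return the 'name' values."""
    # literal_eval accepts only Python literals (lists, dicts, strings, numbers, ...)
    dicts = ast.literal_eval(text)
    return [d['name'] for d in dicts]


names = parse_names(str_dicts)
print(names)
```

If the data comes from an untrusted source, this is usually the safer choice, since a string that contains a function call raises an error instead of being executed.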

If your column is a Series of strings that are really lists of dictionaries, you probably want a list comprehension like this:

df['specializations_names'] = [[d['name'] for d in list(eval(row))] 
                               for row in df['specializations']]
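
As for why the original function fails: Series.apply calls the function once per element, so the argument named df in the question is in fact a single string, and df['specializations'] then indexes a string with a string, which raises the TypeError. A sketch of an apply-based variant that takes one cell at a time follows. The tiny sample frame is made up, and treating a missing cell as an empty list is an assumption (the question used fillna('{}') instead):

```python
import ast

import pandas as pd


def get_names_values(cell):
    """Return the 'name' values from one cell of the column (a single string)."""
    # Series.apply passes each element here, so `cell` is a str, not a DataFrame.
    # Writing cell['specializations'] would raise
    # "TypeError: string indices must be integers".
    if pd.isna(cell):
        return []  # assumption: a missing cell means no specializations
    return [d['name'] for d in ast.literal_eval(cell)]


# Made-up sample: one populated cell and one missing cell
sample = "[{'id': '1.172', 'name': 'Beginner'}, {'id': '1.117', 'name': 'Testing'}]"
df = pd.DataFrame({'specializations': [sample, None]})

df['specializations_names'] = df['specializations'].apply(get_names_values)
print(df['specializations_names'].tolist())
```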

Based on what you provided, I tried to partially reproduce what you are attempting to do:

import pandas as pd

str_dicts = str([{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'},
                 {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'},
                 {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}])

df = pd.DataFrame({'specializations': [str_dicts, str_dicts, str_dicts]})

df['specializations_names'] = [[d['name'] for d in list(eval(row))] 
                               for row in df['specializations']]

print(df)

The result is:

|   | specializations | specializations_names |
|---|---|---|
| 0 | [{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}] | ['Beginner', 'Testing', 'IT'] |
| 1 | [{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}] | ['Beginner', 'Testing', 'IT'] |
| 2 | [{'id': '1.172', 'name': 'Beginner', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '1.117', 'name': 'Testing', 'profarea_id': '1', 'profarea_name': 'IT'}, {'id': '15.93', 'name': 'IT', 'profarea_id': '15', 'profarea_name': 'Beginner'}] | ['Beginner', 'Testing', 'IT'] |

So instead of the dummy variable I used, there can be any number of strings containing lists of dictionaries, up to the length of df.
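
The question also mentions grouping by unique combinations of names, and the difficulty of not knowing how many extra columns to create. One possible approach is sketched here, assuming the specializations_names column already holds Python lists: join each list into one string so it can serve as a grouping key, or use DataFrame.explode to get one row per name instead of a variable number of columns. The sample data is made up.

```python
import pandas as pd

# Made-up sample: the column already holds lists of names
df = pd.DataFrame({
    'specializations_names': [
        ['Beginner', 'Testing', 'IT'],
        ['Beginner', 'Testing', 'IT'],
        ['Testing'],
    ]
})

# Lists are unhashable, so build a string key per row before counting combinations
combo_key = df['specializations_names'].apply(', '.join)
combo_counts = combo_key.value_counts()
print(combo_counts)

# One row per (original row, name) instead of a variable number of columns
exploded = df.explode('specializations_names')
name_counts = exploded['specializations_names'].value_counts()
print(name_counts)
```

The exploded form keeps the original index, so each name can still be traced back to its source row.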

Regarding python - TypeError: string indices must be integers (using pandas DataFrame), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/73759146/
