python - Create dictionary keys using a substring from another column

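I have a DataFrame with a column containing the names of New York City boroughs (Manhattan, Brooklyn, etc.). I want to create another column, "borough_num", that assigns a number to each borough (Manhattan -> 1, Brooklyn -> 2, Queens -> 3, Staten Island -> 4, Bronx -> 5, other -> 0).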

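However, in the "Borough" column, some rows contain a number before the borough name (for example, I have "07 Bronx" instead of "Bronx"). Since "07 Bronx" is still part of the Bronx, it should get the same value, 5, as "Bronx". So I need to create a dictionary that assigns the number 5 to any string containing the word "Bronx", and likewise for every other borough. Any clues on how to do this? I'm new to Python!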

Here is what I had before noticing the cells with numbers:

df['Borough'] = df['Borough'].fillna('OTHER')
borough_dict = {'MANHATTAN':1, 'BROOKLYN':2, 'QUEENS': 3, 'STATEN ISLAND': 4, 'BRONX': 5, 'OTHER':6}
df['borough_num'] = df['Borough'].apply(lambda x:0 if borough_dict.get(x) == None else borough_dict.get(x))

Best answer

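Since you are assigning integer codes for a small set of borough names, it is perfectly acceptable to do this as a series of explicit logical-indexing assignments, as in the example below with some sample data.

In particular, there is no need in this case to try to encapsulate the borough-to-code mapping in a dict, a helper function, or any fancier apply or map operations on the DataFrame.

Just a set of 5 boring, straightforward logical assignments.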

In [12]: import pandas

In [13]: df = pandas.DataFrame({
    'Borough': ["Manhattan", "Brooklyn", "Bronx", "07 Bronx", 
                "109 Staten Island", "03 Brooklyn", "04 Queens"], 
    'Value':[1, 2, 3, 4, 5, 6, 7]
})

In [14]: df
Out[14]:
             Borough  Value
0          Manhattan      1
1           Brooklyn      2
2              Bronx      3
3           07 Bronx      4
4  109 Staten Island      5
5        03 Brooklyn      6
6          04 Queens      7

In [15]: df['Borough_num'] = 6  # everything defaults to the 'other' case

In [16]: df.loc[df.Borough.str.contains("Manhattan"), 'Borough_num'] = 1

In [17]: df.loc[df.Borough.str.contains("Brooklyn"), 'Borough_num'] = 2

In [18]: df.loc[df.Borough.str.contains("Queens"), 'Borough_num'] = 3

In [19]: df.loc[df.Borough.str.contains("Staten Island"), 'Borough_num'] = 4

In [20]: df.loc[df.Borough.str.contains("Bronx"), 'Borough_num'] = 5

In [21]: df
Out[21]: 
             Borough  Value  Borough_num
0          Manhattan      1            1
1           Brooklyn      2            2
2              Bronx      3            5
3           07 Bronx      4            5
4  109 Staten Island      5            4
5        03 Brooklyn      6            2
6          04 Queens      7            3
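
The question's data appears to be upper case and may contain missing values, which the sample data above does not cover. The following is a minimal sketch of the same logical assignments under those assumptions; the sample rows are made up for illustration. `case=False` makes the match case-insensitive, and `na=False` treats missing values as non-matches, because a mask that contains missing values can raise an error when used with `.loc`, depending on the pandas version.

```python
import pandas as pd

# Hypothetical sample data: mixed case, a numeric prefix, and a missing value.
df = pd.DataFrame({"Borough": ["MANHATTAN", "07 Bronx", None, "03 brooklyn"]})

# Everything defaults to the 'other' case, as in the answer above.
df["Borough_num"] = 6

# case=False ignores letter case; na=False turns missing values into False
# so that the mask is purely boolean.
df.loc[df["Borough"].str.contains("Manhattan", case=False, na=False), "Borough_num"] = 1
df.loc[df["Borough"].str.contains("Brooklyn", case=False, na=False), "Borough_num"] = 2
df.loc[df["Borough"].str.contains("Bronx", case=False, na=False), "Borough_num"] = 5

print(df)
```

With `na=False`, the separate `fillna('OTHER')` step from the question is not needed for the matching itself, since rows with missing values simply keep the default code.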

If, for whatever reason, you do want to encapsulate the borough-to-code mapping, you can do so with a simple dict followed by a loop:

In [30]: borough_code = {'Manhattan': 1, 'Brooklyn': 2, 'Queens': 3,
                         'Staten Island': 4, 'Bronx': 5}

In [31]: for borough, code in borough_code.items():
    ...:     df.loc[df.Borough.str.contains(borough), 'Borough_num'] = code

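Unless the DataFrame is very large, repeating the vectorized str.contains computation makes no practical performance difference compared with mapping a function across the column, and it is much easier to understand.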
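
For comparison, here is one possible single-pass alternative. It is not part of the original answer, only a sketch under the assumption that each cell contains at most one borough name, spelled exactly as in the dict keys. It builds a regular expression from the dict keys, pulls the borough name out with `Series.str.extract`, and looks the code up with `Series.map`. The variable names `pattern` and `extracted` are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Borough": ["Manhattan", "Brooklyn", "Bronx", "07 Bronx",
                "109 Staten Island", "03 Brooklyn", "04 Queens", None],
})

borough_code = {"Manhattan": 1, "Brooklyn": 2, "Queens": 3,
                "Staten Island": 4, "Bronx": 5}

# One capture group that matches any of the known borough names.
pattern = "(" + "|".join(re.escape(name) for name in borough_code) + ")"

# expand=False returns a Series; rows without a match become NaN.
extracted = df["Borough"].str.extract(pattern, expand=False)

# Look up each extracted name; unmatched rows fall back to 6 ('other').
df["Borough_num"] = extracted.map(borough_code).fillna(6).astype(int)

print(df)
```

This keeps the mapping in one place, at the cost of a regular expression that is harder to read than the explicit assignments.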

Regarding "python - Create dictionary keys using a substring from another column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49909176/
