python - How do I print values with similar strings?

Tags: python python-3.x

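My goal is to read a large CSV file and print all the similar values. Since the data is all about hotels, for simplicity I'll use a list of dictionaries in this code: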

S1 = [{'name': 'Holiday Inn A', 'price': '552'},
      {'name': 'Holiday Inn B', 'price': '568'},
      {'name': 'Holiday Inn C', 'price': '589'},
      {'name': 'Grand Palace', 'price': '768'},
      # and so on...
      ]

What I mean is that I want to print every value whose name contains "Holiday Inn". This is the result I want:

Holiday Inn A
Holiday Inn B
Holiday Inn C

Here is my code:

import csv

name = []
value = []
linked = []
a = []

def filereader():
    line_count = 0
    with open('hotelRev.csv','r', encoding ='utf-8') as fileIn:
        reader = csv.reader(fileIn)
        for row in reader:
            line_count = line_count + 1
            if line_count == 1:
                name.append(row)
            else:
                value.append(row)

    for x in name:
        for y in value:
            linked.append(dict(zip(x,y)))

filereader()
for row in linked:
    a.append(row['name'])

b = sorted(set(a))

for row in linked:
    print(row['name']['Holiday Inn'])

Obviously this doesn't work, so does anyone know how to do this?

Edit 1: By "similar" I mean grouping all the Holiday Inn entries into one big group so they are easier to call and print.

A direct example from the dataset itself:

Holiday Inn Express & Suites Austin South
Holiday Inn Express & Suites Baton Rouge East
Holiday Inn Express & Suites Bethlehem
Holiday Inn Express & Suites Bloomington
Holiday Inn Express & Suites Butte
Holiday Inn Express & Suites Carmel-north Indianapolis
Holiday Inn Express & Suites Carpinteria
Holiday Inn Express & Suites Columbus - Polaris Parkway
Holiday Inn Express & Suites Columbus Univ Area - Osu
Holiday Inn Express & Suites Denver Northeast - Brighton

If possible, I'd love to find a way to print them in as few lines as possible.

Best answer

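Here is a basic solution using sets. I don't think it is efficient for very large datasets, but it can serve as a reference for building an efficient solution.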

import pandas as pd
import re

df = pd.read_csv('HotelNames.csv')

search_terms = input('Enter search terms: ')
# Lowercase the query and replace runs of non-alphanumeric characters with a space
search_terms = re.sub(r"[^a-zA-Z0-9]+", ' ', search_terms.lower())

# Set of unique query words; split() with no argument drops empty strings,
# so a trailing space or symbol in the query cannot add '' to the set
search_set = set(search_terms.split())

for i in range(len(df)):
    # First column of row i holds the hotel name
    hotel_name = df.iloc[i, 0]

    # Normalize the hotel name the same way as the query
    t = re.sub(r"[^a-zA-Z0-9]+", ' ', hotel_name.lower())
    hotel_set = set(t.split())

    # Print the row if every search term appears as a word in the hotel name
    if search_set.issubset(hotel_set):
        print(hotel_name)

HotelNames.csv currently contains a single column, the hotel name.
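
The same word-subset idea can also be sketched with only the standard library, which stays closer to the list of dictionaries in the question. This is a rough illustration, not a tuned solution: the helper names `words` and `find_hotels` are made up for this example, and `sample_csv` is a hypothetical in-memory stand-in for the real file, built from the question's sample rows and assuming a header row with `name` and `price` columns.

```python
import csv
import io
import re


def words(text):
    """Return the set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_hotels(rows, query, column="name"):
    """Return the values of `column` whose words include every word in `query`."""
    wanted = words(query)
    return [row[column] for row in rows if wanted <= words(row[column])]


# Hypothetical stand-in for the CSV file (assumed header: name,price)
sample_csv = io.StringIO(
    "name,price\n"
    "Holiday Inn A,552\n"
    "Holiday Inn B,568\n"
    "Holiday Inn C,589\n"
    "Grand Palace,768\n"
)

# DictReader keys each row by the header names, so no manual zip() is needed
rows = list(csv.DictReader(sample_csv))

matches = find_hotels(rows, "Holiday Inn")
print("\n".join(matches))
```

For a real file, the `io.StringIO` object would presumably be replaced by `open('hotelRev.csv', newline='', encoding='utf-8')` as in the question. Note that an empty query matches every row, because the empty set is a subset of any set.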

Regarding "python - How do I print values with similar strings?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55642447/
