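I have a DataFrame of brand URLs obtained through automated Google searches. I have split those URLs into words, and I am trying to compare the brand name and the manufacturer name against those words to check whether each URL is the right one (as most companies have a URL based on their brand name or their manufacturing company's name).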
try:
    from googlesearch import search
except ImportError:
    print("No module named 'google' found")
for i in search(Brand.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
for i in search(Manufacturer.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
for i in search(Brand.get_attribute("innerHTML") and Manufacturer.get_attribute("innerHTML"), tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
for i in search(Brand.get_attribute("innerHTML") and Manufacturer.get_attribute("innerHTML") and "Beverage", tld="com", num=15, stop=1, pause=2):
    webaddresses.append(i)
webaddresses = pd.DataFrame(webaddresses)
webaddresses.rename(columns={list(webaddresses)[0]: 'URL'}, inplace=True)
splitting_gurl = webaddresses['URL'].str.split(r'[.\:/?=\-&]+', expand=True)
for i in range(len(splitting_gurl.index)):
    row = splitting_gurl.loc[[i]]
    for j in range(0, 5):
        if row[[j]] == str(Brand_check) or row[[j]] == str(Manufacturer_check):
            a = webaddresses.loc[[i]]
            print(a)
Here is the error:
File "<ipython-input-124-0b002229b2b7>", line 4, in <module>
if row[[j]] == str(Brand_check) or row[[j]] == str(Manufacturer_check):
File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1576, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I just want to run the for loop and the if statement and compare the words.
Best answer
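We can use the FuzzyWuzzy package in Python. It compares words based on Levenshtein distance, which penalizes each single-character insertion, deletion, or substitution needed to turn one word into the other.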
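For context on the traceback: `row[[j]]` selects a one-cell DataFrame, so `row[[j]] == str(Brand_check)` yields a DataFrame of booleans, and `if` cannot reduce that to a single truth value. Comparing plain strings avoids the problem. Below is a minimal sketch of that idea combined with an edit-distance score, using only the standard library. The helper names (`url_tokens`, `levenshtein`, `similarity`, `matching_urls`), the 0.8 threshold, the normalisation, and the sample URLs are all assumptions made for illustration. The score is not FuzzyWuzzy's own; FuzzyWuzzy's `fuzz.ratio` returns a comparable 0 to 100 score if that package is installed.

```python
import re


def url_tokens(url):
    """Split a URL into lowercase tokens on the delimiters used in the question."""
    return [t.lower() for t in re.split(r"[.:/?=\-&]+", url) if t]


def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1], where 1.0 means identical (illustrative normalisation)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def matching_urls(urls, names, threshold=0.8):
    """Return the URLs that have at least one token close to one of the names."""
    wanted = [n.lower().replace(" ", "") for n in names if n]
    hits = []
    for url in urls:
        tokens = url_tokens(url)
        # Each comparison is between two plain strings, so `if` sees a real bool.
        if any(similarity(t, w) >= threshold for t in tokens for w in wanted):
            hits.append(url)
    return hits


# Hypothetical sample data, not taken from the question.
urls = [
    "https://www.coca-cola.com/",
    "https://www.pepsico.com/brands",
    "https://en.wikipedia.org/wiki/Soft_drink",
]
print(matching_urls(urls, ["Pepsi", "PepsiCo"]))
```

If the DataFrame loop is kept instead, reading a single cell with `splitting_gurl.iloc[i, j]` gives a scalar that can be compared with `==` directly. Note also that `x and y` between two non-empty strings evaluates to `y`, so the combined searches in the question query only the last term; joining the terms with a space is probably what was intended.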
For more on "python - Trying to compare text using a for loop", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55387227/