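I extracted some information from an HTML table, reorganized the data, and tried to write it to a CSV file. However, I see a lot of garbled characters in the "Price" column of the output CSV (see below). When I inspect the dataframe in Python, the Price column appears to contain extra spaces/tabs and is oddly aligned.
Result when printing the dataframe:
Garbled characters in the output CSV:
My code is attached below so that you can reproduce the problem: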
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd
import os
# Using Selenium to Load Page and Parse with BeautifulSoup
url = 'https://fuelkaki.sg/home'
options = Options()
options.binary_location = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get(url)
time.sleep(3)
page = driver.page_source
driver.quit()
soup = BeautifulSoup(page, 'html.parser')
# Using Pandas to read the table on the page and reorganize the data
df = pd.read_html(page)
df[0].columns=['Brand','Diesel','92','95','98','Premium']
df1 = df[0]
del df1['Brand']
df1.insert(0,"Brand",["Caltex", "Esso","Shell", "Sinopec","SPC"],True)
df2=pd.melt(df[0],id_vars=['Brand'],value_vars=['Diesel','92','95','98','Premium'],var_name='Grade',value_name='Price')
# Using Pandas to clean the data in the 'Price' column
df2['Price']=df2['Price'].apply(lambda x: x.replace("Diesel", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("Regular", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("Extra", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(Synergy Supreme+)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(Platinum 98)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(Shell V-Power)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(SINO X Power)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("S$ ", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("N.A.", "-"))
print(df2)
# Output the dataframe to CSV file
output_path='I:\\test.csv'
df2.to_csv(output_path, mode='a',index=False,encoding='utf-8',header=not os.path.exists(output_path))
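I would appreciate any advice on how to correct the spacing, remove the whitespace, and fix the garbled characters.
Best answer
Add this line after all of your existing apply/replace lines. After that, the dataframe prints cleanly. It looks like the Price values contain Unicode (non-ASCII) characters, which you can remove by encoding to ASCII and ignoring errors: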
df2['Price']=df2['Price'].apply(lambda x: x.encode("ascii", "ignore").decode())
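To see what that line does in isolation, here is a minimal sketch on made-up values. The sample strings are hypothetical: the non-breaking space (`\xa0`) and zero-width space (`\u200b`) are assumed stand-ins for whatever invisible characters the scraped cells actually contain. Because ordinary ASCII spaces survive the encode/decode step, the sketch also strips them and, as an optional extra, converts the result to numbers.

```python
import pandas as pd

# Hypothetical sample values; "\xa0" and "\u200b" are assumed examples of the
# invisible non-ASCII characters that HTML tables can contain.
prices = pd.Series(["\xa02.67\xa0", "2.66\u200b", " 2.90 ", "-"])

# The answer's approach: drop every character that cannot be encoded as ASCII.
ascii_only = prices.apply(lambda x: x.encode("ascii", "ignore").decode())

# Plain ASCII spaces are untouched by the step above, so strip them separately.
cleaned = ascii_only.str.strip()

# Optional: turn the strings into floats; the "-" placeholder becomes NaN.
numeric = pd.to_numeric(cleaned, errors="coerce")

print(cleaned.tolist())
print(numeric.tolist())
```

Note that this drops all non-ASCII characters, so it is only suitable when nothing outside ASCII needs to be kept in the column.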
Dataframe output
     Brand   Grade  Price
0   Caltex  Diesel   2.67
1     Esso  Diesel   2.66
2    Shell  Diesel   2.90
3  Sinopec  Diesel   2.66
4      SPC  Diesel   2.43
5   Caltex      92   3.00
6     Esso      92   3.00
CSV output
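If the non-ASCII characters need to be kept rather than dropped, a possible alternative is to write the CSV with a byte-order mark using the `utf-8-sig` encoding. This is a sketch under an assumption the question does not confirm: that the garbled text appeared when the file was opened in a spreadsheet program such as Excel, which may not treat a CSV as UTF-8 unless it starts with a BOM. The sample row and the temporary output path are made up for illustration, and the sketch writes a new file rather than appending with `mode='a'` as the original code does.

```python
import os
import tempfile

import pandas as pd

# Hypothetical one-row frame with a non-breaking space left in the price.
df = pd.DataFrame(
    {"Brand": ["Caltex"], "Grade": ["Diesel"], "Price": ["S$\xa02.67"]}
)

# Temporary path used only for this sketch.
out_path = os.path.join(tempfile.mkdtemp(), "test.csv")

# "utf-8-sig" is UTF-8 with a leading byte-order mark (BOM).
df.to_csv(out_path, index=False, encoding="utf-8-sig")

# Check that the file really starts with the three BOM bytes.
with open(out_path, "rb") as fh:
    has_bom = fh.read(3) == b"\xef\xbb\xbf"

# Reading back with the same encoding removes the BOM again.
round_trip = pd.read_csv(out_path, encoding="utf-8-sig")

print(has_bom)
print(round_trip.columns.tolist())
```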
Source: Stack Overflow question "python - Unable to remove whitespace in Pandas + garbled output in CSV": https://stackoverflow.com/questions/71404165/