python - Unable to remove whitespace in Pandas + garbled output in CSV

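I extracted some information from an HTML table, reorganized the data, and tried to write it to a CSV file. However, the "Price" column of the output CSV contains a lot of garbled characters (see below). When I inspect the DataFrame in Python, the Price column also appears to contain spaces/tabs and is oddly aligned.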

Result when printing the DataFrame:

Garbled characters in the output CSV:

My code is attached below so that you can reproduce the issue:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd
import os

# Using Selenium to Load Page and Parse with BeautifulSoup
url = 'https://fuelkaki.sg/home'
options = Options()
# Raw string so the backslashes in the Windows path are not treated as escape sequences
options.binary_location = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
driver.get(url)
time.sleep(3)
page = driver.page_source
driver.quit()
soup = BeautifulSoup(page, 'html.parser')

# Using Pandas to read the table on the page and reorganize the data
df = pd.read_html(page)
df[0].columns=['Brand','Diesel','92','95','98','Premium']
df1 = df[0]
del df1['Brand']
df1.insert(0,"Brand",["Caltex", "Esso","Shell", "Sinopec","SPC"],True)
df2=pd.melt(df[0],id_vars=['Brand'],value_vars=['Diesel','92','95','98','Premium'],var_name='Grade',value_name='Price')

# Using Pandas to clean the data in the 'Price' column
df2['Price']=df2['Price'].apply(lambda x: x.replace("Diesel", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("Regular", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("Extra", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(Synergy Supreme+)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(Platinum 98)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(Shell V-Power)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("(SINO X Power)", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("S$ ", ""))
df2['Price']=df2['Price'].apply(lambda x: x.replace("N.A.", "-"))

print (df2)


# Output the dataframe to CSV file
output_path='I:\\test.csv'
df2.to_csv(output_path, mode='a',index=False,encoding='utf-8',header=not os.path.exists(output_path))

I would appreciate any advice on how to correct the spacing, remove the whitespace, and fix the garbled characters.

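Best answer

Add this line after all of your existing apply/replace lines. With it in place, the DataFrame prints cleanly. It looks like the strings contain non-ASCII Unicode characters, which you can drop by encoding to ASCII and ignoring errors: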

df2['Price']=df2['Price'].apply(lambda x: x.encode("ascii", "ignore").decode())
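
To see what this line does in isolation, here is a minimal sketch on made-up values. The sample strings and the specific characters in them (a non-breaking space, U+00A0, and a thin space, U+2009) are assumptions for illustration; the question does not show which characters the scraped page actually contains.

```python
import pandas as pd

# Hypothetical sample values: the real scraped strings are not shown in the question.
prices = pd.Series(["\u00a02.67", "2.66\u2009", "\u00a0-"])

# Same expression as the answer's line: characters that cannot be encoded
# as ASCII are silently dropped, then the bytes are decoded back to str.
cleaned = prices.apply(lambda x: x.encode("ascii", "ignore").decode())

print(cleaned.tolist())
```

Printing `repr()` of a few raw values from `df2['Price']` before cleaning is one way to check which characters are really present.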

DataFrame output

      Brand    Grade Price
0    Caltex   Diesel  2.67
1      Esso   Diesel  2.66
2     Shell   Diesel  2.90
3   Sinopec   Diesel  2.66
4       SPC   Diesel  2.43
5    Caltex       92  3.00
6      Esso       92  3.00
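
Note that encoding to ASCII with errors ignored removes every non-ASCII character, not only whitespace. If that is broader than wanted, a narrower variant is sketched below. It assumes the stray characters are Unicode whitespace, which the question does not confirm, and it uses made-up sample values.

```python
import pandas as pd

# Hypothetical sample values, not the real scraped data.
prices = pd.Series(["\u00a02.67", " 2.66\u2009", "\t-"])

# str.split() with no arguments splits on any Unicode whitespace,
# so joining the pieces removes whitespace and nothing else.
stripped = prices.apply(lambda x: "".join(x.split()))

# Optional: convert the cleaned strings to numbers; "-" becomes NaN.
numeric = pd.to_numeric(stripped, errors="coerce")

print(stripped.tolist())
print(numeric.tolist())
```

The numeric conversion is optional and only makes sense if the "-" placeholder for missing prices can be treated as a missing value.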

CSV output
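
If garbled characters still show up when the CSV is opened in a spreadsheet program such as Excel, another possible cause is that the program does not detect the file as UTF-8. This is an assumption; the question does not say how the CSV was viewed. pandas accepts `encoding='utf-8-sig'`, which writes a byte-order mark at the start of a new file. The sketch below uses a made-up row and a temporary file rather than the asker's output path, and it writes in the default mode instead of append mode.

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample row; the path is a temporary file, not the asker's output path.
df = pd.DataFrame({"Brand": ["Caltex"], "Grade": ["Diesel"], "Price": ["2.67"]})
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# 'utf-8-sig' prepends a UTF-8 byte-order mark to a newly written file.
df.to_csv(path, index=False, encoding="utf-8-sig")

# Read the file back as bytes to confirm the mark is there.
with open(path, "rb") as f:
    raw = f.read()

print(raw[:3])
```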

This question and answer are based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/71404165/
