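I have read a data frame into Python, and one of its column names contains the euro sign: "price_€". Python shows the column as price_�. It won't let me refer to the column using either € or �.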
File "<ipython-input-53-d7f8249147e7>", line 1
df[price_€] = df[0].str.replace(r'[€,]', '').astype('float')
^
SyntaxError: invalid syntax
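Any ideas on how to remove it from the column name so that I can start referencing the column?
Best answer
You cannot use the euro sign in a variable name, and without quotes price_€ is parsed as one: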
Identifiers (also referred to as names) are described by the following lexical definitions:
identifier ::= (letter|"_") (letter | digit | "_")*
letter ::= lowercase | uppercase
lowercase ::= "a"..."z"
uppercase ::= "A"..."Z"
digit ::= "0"..."9"
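The grammar quoted above is the ASCII-only one from the Python 2 reference. Python 3 accepts many non-ASCII letters in identifiers, but € is a currency symbol rather than a letter, so it is still rejected. A minimal Python 3 sketch to check this, using made-up names chosen only for illustration:

```python
# Python 3 only: str.isidentifier() reports whether a string is a valid name.
# The candidate names below are illustrative, not taken from the original data.
candidates = ["price_eur", "price_€", "prix_é"]

validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(name, ok)
```

An accented letter passes while the euro sign does not, which is why the unquoted df[price_€] fails at parse time.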
You need to use a string:
df["price_€"] ...
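If the goal is to get the symbol out of the column name altogether, one possible approach is to rename the column. The sketch below assumes a small hypothetical frame (its values and the replacement name price_eur are invented for illustration) and that the name in df.columns really contains €; if the file was decoded with the wrong encoding, the key must match whatever df.columns actually shows.

```python
import pandas as pd

# Hypothetical frame standing in for the asker's data.
df = pd.DataFrame({"price_€": ["€1,000.00", "€2,500.50"]})

# Option 1: address the column with a string key.
first = df["price_€"].iloc[0]

# Option 2: rename the one column so the symbol is gone.
renamed = df.rename(columns={"price_€": "price_eur"})

# Option 3: strip the symbol from every column name (literal match, no regex).
stripped = df.copy()
stripped.columns = stripped.columns.str.replace("€", "", regex=False)

print(list(renamed.columns), list(stripped.columns))
```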
pandas itself has no problem with the euro sign for me:
import pandas as pd
df = pd.DataFrame([[1, 2]], columns=["£", "€"])
print(df["€"])
print(df["£"])
0 2
Name: €, dtype: int64
0 1
Name: £, dtype: int64
The file is cp1252-encoded, so you need to specify the encoding:
import pandas as pd
df = pd.read_csv("PPR-2015.csv", header=0, encoding="cp1252")
print(df.columns)
Index([u'Date of Sale (dd/mm/yyyy)', u'Address', u'Postal Code', u'County',
u'Price (€)', u'Not Full Market Price', u'VAT Exclusive', u'Description of Property', u'Property Size Description'], dtype='object')
print(df[u'Price (€)'])
0 €138,000.00
1 €270,000.00
2 €67,000.00
3 €900,000.00
4 €176,000.00
5 €155,000.00
6 €100,000.00
7 €120,000.00
8 €470,000.00
9 €140,000.00
10 €592,000.00
11 €85,000.00
12 €422,500.00
13 €225,000.00
14 €55,000.00
...
17433 €262,000.00
17434 €155,000.00
17435 €750,000.00
17436 €96,291.69
17437 €112,000.00
17438 €350,000.00
17439 €190,000.00
17440 €25,000.00
17441 €100,000.00
17442 €75,000.00
17443 €46,000.00
17444 €175,000.00
17445 €48,500.00
17446 €150,000.00
17447 €400,000.00
Name: Price (€), Length: 17448, dtype: object
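A small sketch of why the encoding matters: in cp1252 the euro sign is the single byte 0x80, which is not valid UTF-8 on its own. Decoding it as UTF-8 with replacement yields the � character seen in the question. This is a likely explanation of the symptom, not a confirmed trace of what happened on the asker's machine.

```python
# The euro sign as stored in a cp1252 file: a single byte, 0x80.
raw = "price_€".encode("cp1252")

# Decoding with the right codec recovers the symbol.
as_cp1252 = raw.decode("cp1252")

# Decoding as UTF-8 cannot interpret 0x80; errors="replace" substitutes U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

print(repr(raw), as_cp1252, as_utf8)
```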
Then convert the column to float:
df[u'Price (€)'] = df[u'Price (€)'].str.replace(ur'[€,]', '').astype('float')
print(df[u'Price (€)'])
Output:
0 138000
1 270000
2 67000
3 900000
4 176000
5 155000
6 100000
7 120000
8 470000
9 140000
10 592000
11 85000
12 422500
13 225000
14 55000
...
17433 262000.00
17434 155000.00
17435 750000.00
17436 96291.69
17437 112000.00
17438 350000.00
17439 190000.00
17440 25000.00
17441 100000.00
17442 75000.00
17443 46000.00
17444 175000.00
17445 48500.00
17446 150000.00
17447 400000.00
Name: Price (€), Length: 17448, dtype: float64
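The snippet above is Python 2 (note the u and ur string prefixes). A rough Python 3 equivalent might look like the following; the two-row frame is a stand-in built from values shown in the output above, not the real file.

```python
import pandas as pd

# Two sample rows in the same format as the real column.
df = pd.DataFrame({"Price (€)": ["€138,000.00", "€96,291.69"]})

# regex=True is passed explicitly: recent pandas versions treat the
# pattern as a literal string unless told otherwise.
df["Price (€)"] = (
    df["Price (€)"].str.replace(r"[€,]", "", regex=True).astype(float)
)

print(df["Price (€)"])
```

In Python 3 all strings are Unicode, so neither the u prefix nor an explicit decode call is needed.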
Regarding "python - unknown character problem �", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/30937395/