I am currently querying data from MS SQL Server 2008 based on user input. However, I get an error when I try to use the describe() function to get the five-number summary.
import pyodbc
import numpy as np
import pandas.io.sql as sql
import pandas
print "What Part Number will you examine?"
PartN = raw_input()
conn = pyodbc.connect('my connection info')
curs = conn.cursor()
sqlr = """SELECT partmadeperhour FROM Completions WHERE PartNumber = ?
AND endtime > '2012-12-31 23:59:00' ORDER BY partmadeperhour"""
q = curs.execute(sqlr,[PartN]).fetchall()
df = pandas.DataFrame(q, columns =['rate'])
print df
columnnames = list(df.columns.values)
print columnnames
df['rate'].describe()
My DataFrame looks like this:
rate
0 [0.25]
1 [0.67]
2 [0.93]
... ...
1474 [5400.00]
I get the following output and error:
[1475 rows x 1 columns]
['rate']
rate object
dtype: object
Traceback (most recent call last):
File "newr.py", line 30, in <module>
df['rate'].describe()
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 4034, in describe
return describe_1d(self, percentiles)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 4031, in describe_1d
return describe_categorical_1d(data)
File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 4007, in describe_categorical_1d
objcounts = data.value_counts()
File "C:\Python27\lib\site-packages\pandas\core\base.py", line 433, in value_counts
normalize=normalize, bins=bins, dropna=dropna)
File "C:\Python27\lib\site-packages\pandas\core\algorithms.py", line 245, in value_counts
keys, counts = htable.value_count_object(values, mask)
File "pandas\hashtable.pyx", line 983, in pandas.hashtable.value_count_object (pandas\hashtable.c:17616)
File "pandas\hashtable.pyx", line 994, in pandas.hashtable.value_count_object (pandas\hashtable.c:17353)
TypeError: unhashable type: 'pyodbc.Row'
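I know I need to convert the data in the DataFrame to something other than the current object dtype, but I'm not sure how to convert it to float.
Any help is appreciated.
Best answer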
Make sure you are using pandas 0.14 or later, since pandas.read_sql_query was introduced in 0.14.0:
>>> import pandas
>>> pandas.__version__
'0.14.1'
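Use pandas.read_sql_query to populate the DataFrame directly, passing it the query string and the pyodbc connection. Note that the column alias rate has been added to the T-SQL query, because pandas.read_sql_query does not accept a list or dict of column names: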
...
>>> sql = "select 0.25 as rate union select 0.67 union select 0.93"
>>> df = pandas.read_sql_query(sql, connection)
>>> df
rate
0 0.25
1 0.67
2 0.93
>>> df['rate'].describe()
count 3.000000
mean 0.616667
std 0.343123
min 0.250000
25% 0.460000
50% 0.670000
75% 0.800000
max 0.930000
dtype: float64
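If the cursor-based code from the question has to stay, the traceback suggests the cause: each cell of the rate column holds a whole pyodbc.Row rather than a number, so the column has dtype object and describe() fails when it tries to hash the rows. A minimal sketch of unpacking the rows by hand follows. It is an illustration, not part of the original answer: an in-memory sqlite3 database stands in for the SQL Server connection so the snippet runs anywhere, and the sample rows are made up. The same indexing should work on pyodbc.Row objects, and float() also covers the Decimal values that pyodbc returns for SQL Server decimal columns.

```python
import sqlite3

import pandas

# Assumption: sqlite3 stands in for the pyodbc connection; the table contents are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Completions (PartNumber TEXT, partmadeperhour REAL)")
conn.executemany(
    "INSERT INTO Completions VALUES (?, ?)",
    [("A1", 0.25), ("A1", 0.67), ("A1", 0.93), ("B2", 12.0)],
)

rows = conn.execute(
    "SELECT partmadeperhour FROM Completions "
    "WHERE PartNumber = ? ORDER BY partmadeperhour",
    ["A1"],
).fetchall()

# Take the first (and only) field out of every row and force it to float,
# so the column holds plain numbers instead of row objects.
rates = [float(row[0]) for row in rows]

df = pandas.DataFrame({"rate": rates})
summary = df["rate"].describe()
print(summary)
```

With a float64 column, describe() takes the numeric path and returns count, mean, std, min, the quartiles and max.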
Parameter values for the original query can be supplied through the params argument of pandas.read_sql_query.
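A rough sketch of how the question's query might look with params is shown below. The assumptions: an in-memory sqlite3 database replaces the pyodbc connection so the example is self-contained, the table rows are invented, and the part number "A1" is a placeholder for the user's input. sqlite3 and pyodbc both use the ? placeholder style, so the query text should carry over unchanged.

```python
import sqlite3

import pandas

# Assumption: sqlite3 stands in for pyodbc.connect(...); the data below is invented.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Completions (PartNumber TEXT, endtime TEXT, partmadeperhour REAL)"
)
connection.executemany(
    "INSERT INTO Completions VALUES (?, ?, ?)",
    [
        ("A1", "2013-01-05 08:00:00", 0.25),
        ("A1", "2013-02-01 09:30:00", 0.67),
        ("A1", "2013-03-10 10:00:00", 0.93),
        ("A1", "2012-06-01 10:00:00", 99.0),  # filtered out by the date condition
        ("B2", "2013-01-05 08:00:00", 12.0),  # filtered out by the part number
    ],
)

# The alias "rate" names the column, since read_sql_query takes no column-name list.
query = """SELECT partmadeperhour AS rate FROM Completions
WHERE PartNumber = ? AND endtime > '2012-12-31 23:59:00'
ORDER BY partmadeperhour"""

part_number = "A1"  # hypothetical value; the question reads this from user input
df = pandas.read_sql_query(query, connection, params=[part_number])
print(df["rate"].describe())
```

Passing the value through params keeps the user's input out of the SQL string itself, just as the ? placeholder did in the original curs.execute call.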
Source: "python - Rows of data pulled from SQL Server using pyodbc are 'unhashable type'" on Stack Overflow: https://stackoverflow.com/questions/30331663/