python-3.x - pandas_datareader SymbolWarning: Failed to read symbol: 'T', replacing with NaN

Tags: python-3.x pandas pandas-datareader

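I have this code, which gets a list of symbols from Wikipedia and then fetches stock data from Yahoo Finance. It is simple code that worked fine until a few days ago, but for some reason I now get the "Failed to read symbol" warning for many stocks. Is Yahoo doing this? What can I do to fix it? I can't ignore it, because more than 50 symbols come back as NaN, and when I rerun the code, different symbols show up in the warnings.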

pandas_datareader version: 0.8.1
Code:

import datetime
import pandas as pd
import numpy as np
import csv
from pandas_datareader import data as web
import matplotlib
import matplotlib.pyplot as plt
import requests
import bs4 as bs
from urllib.request import urlopen
from bs4 import BeautifulSoup
import tqdm
from pandas import DataFrame
import seaborn as sns

resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})
tickers = []
for row in table.findAll('tr')[1:]:
    ticker = row.findAll('td')[0].text.strip()
    tickers.append(ticker)

start = datetime.date(2008,11,1)
end = datetime.date.today()
# df = web.get_data_yahoo(tickers, start, end)
df = web.DataReader(tickers, 'yahoo', start, end)


Error:

C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py:270: SymbolWarning: Failed to read symbol: 'T', replacing with NaN.
  warnings.warn(msg.format(sym), SymbolWarning)
C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py:270: SymbolWarning: Failed to read symbol: 'BKR', replacing with NaN.
  warnings.warn(msg.format(sym), SymbolWarning)
C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py:270: SymbolWarning: Failed to read symbol: 'BRK.B', replacing with NaN.
  warnings.warn(msg.format(sym), SymbolWarning)

Best answer

It looks like the problem can be resolved by replacing '.' with '-' in the ticker symbols, according to this GitHub issue.
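As a minimal sketch of that replacement, the helper below converts symbols as they appear in the Wikipedia table into the dash form. The function name `to_yahoo_symbol` is hypothetical and not part of pandas_datareader; the sample symbols are the ones from the warnings in the question.

```python
def to_yahoo_symbol(symbol: str) -> str:
    """Strip surrounding whitespace and replace '.' with '-'."""
    return symbol.strip().replace('.', '-')


# Symbols taken from the warnings shown in the question
wiki_symbols = ['T', 'BKR', 'BRK.B']
yahoo_symbols = [to_yahoo_symbol(s) for s in wiki_symbols]
print(yahoo_symbols)  # ['T', 'BKR', 'BRK-B']
```

Symbols without a dot, such as 'T' and 'BKR', pass through unchanged, so this step only affects share-class style symbols like 'BRK.B'.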

Also, you don't need requests or BeautifulSoup; just use pd.read_html.

I successfully created a DataFrame for all 505 tickers with no warnings or errors, using Python 3.6.8 and pandas 0.24.2. See the example below:

import pandas as pd
from pandas_datareader import data as web
import datetime

# no need for requests or BeautifulSoup use read_html
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
# convert symbol column to list
tickers = df['Symbol'].values.tolist()

# list comprehension to replace data in strings
t = [x.replace('.', '-') for x in tickers] 

start = datetime.date(2008,11,1)
end = datetime.date.today()
df2 = web.DataReader(t, 'yahoo', start, end)


Here is the list of all 505 tickers in the DataFrame:

print(*df2.columns.levels[1])

A AAL AAP AAPL ABBV ABC ABMD ABT ACN ADBE ADI ADM ADP ADS ADSK AEE AEP AES AFL AGN AIG AIV AIZ AJG AKAM ALB ALGN ALK ALL ALLE ALXN AMAT AMCR AMD AME AMG AMGN AMP AMT AMZN ANET ANSS ANTM AON AOS APA APD APH APTV ARE ARNC ATO ATVI AVB AVGO AVY AWK AXP AZO BA BAC BAX BBT BBY BDX BEN BF-B BIIB BK BKNG BKR BLK BLL BMY BR BRK-B BSX BWA BXP C CAG CAH CAT CB CBOE CBRE CBS CCI CCL CDNS CDW CE CERN CF CFG CHD CHRW CHTR CI CINF CL CLX CMA CMCSA CME CMG CMI CMS CNC CNP COF COG COO COP COST COTY CPB CPRI CPRT CRM CSCO CSX CTAS CTL CTSH CTVA CTXS CVS CVX CXO D DAL DD DE DFS DG DGX DHI DHR DIS DISCA DISCK DISH DLR DLTR DOV DOW DRE DRI DTE DUK DVA DVN DXC EA EBAY ECL ED EFX EIX EL EMN EMR EOG EQIX EQR ES ESS ETFC ETN ETR EVRG EW EXC EXPD EXPE EXR F FANG FAST FB FBHS FCX FDX FE FFIV FIS FISV FITB FLIR FLS FLT FMC FOX FOXA FRC FRT FTI FTNT FTV GD GE GILD GIS GL GLW GM GOOG GOOGL GPC GPN GPS GRMN GS GWW HAL HAS HBAN HBI HCA HD HES HFC HIG HII HLT HOG HOLX HON HP HPE HPQ HRB HRL HSIC HST HSY HUM IBM ICE IDXX IEX IFF ILMN INCY INFO INTC INTU IP IPG IPGP IQV IR IRM ISRG IT ITW IVZ JBHT JCI JEC JKHY JNJ JNPR JPM JWN K KEY KEYS KHC KIM KLAC KMB KMI KMX KO KR KSS KSU L LB LDOS LEG LEN LH LHX LIN LKQ LLY LMT LNC LNT LOW LRCX LUV LVS LW LYB M MA MAA MAC MAR MAS MCD MCHP MCK MCO MDLZ MDT MET MGM MHK MKC MKTX MLM MMC MMM MNST MO MOS MPC MRK MRO MS MSCI MSFT MSI MTB MTD MU MXIM MYL NBL NCLH NDAQ NEE NEM NFLX NI NKE NLOK NLSN NOC NOV NOW NRG NSC NTAP NTRS NUE NVDA NVR NWL NWS NWSA O OKE OMC ORCL ORLY OXY PAYX PBCT PCAR PEAK PEG PEP PFE PFG PG PGR PH PHM PKG PKI PLD PM PNC PNR PNW PPG PPL PRGO PRU PSA PSX PVH PWR PXD PYPL QCOM QRVO RCL RE REG REGN RF RHI RJF RL RMD ROK ROL ROP ROST RSG RTN SBAC SBUX SCHW SEE SHW SIVB SJM SLB SLG SNA SNPS SO SPG SPGI SRE STI STT STX STZ SWK SWKS SYF SYK SYY T TAP TDG TEL TFX TGT TIF TJX TMO TMUS TPR TRIP TROW TRV TSCO TSN TTWO TWTR TXN TXT UA UAA UAL UDR UHS ULTA UNH UNM UNP UPS URI USB UTX V VAR VFC VIAB VLO VMC VNO VRSK VRSN VRTX VTR VZ WAB WAT WBA WCG WDC WEC 
WELL WFC WHR WLTW WM WMB WMT WRK WU WY WYNN XEC XEL XLNX XOM XRAY XRX XYL YUM ZBH ZION ZTS

print(len(df2.columns.levels[1]))
505
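Since the question reports that different symbols fail on each run, it may help to check which symbols came back entirely NaN instead of relying on the count alone. The following is a rough sketch under a few assumptions: the columns form a two-level MultiIndex with the symbols on level 1 (as `df2.columns.levels[1]` above suggests), a 'Close' field exists, and the helper name `symbols_all_nan` is hypothetical. The sample frame is synthetic, so no network access is needed.

```python
import numpy as np
import pandas as pd


def symbols_all_nan(frame: pd.DataFrame, field: str = 'Close') -> list:
    """Return the symbols whose `field` column contains only NaN."""
    prices = frame[field]               # columns are now the symbols
    mask = prices.isna().all(axis=0)    # True where every row is NaN
    return sorted(mask[mask].index)


# Synthetic stand-in for the DataReader result (assumed layout)
columns = pd.MultiIndex.from_product(
    [['Close', 'Volume'], ['AAPL', 'BRK-B', 'T']],
    names=['Attributes', 'Symbols'],
)
sample = pd.DataFrame(np.ones((3, 6)), columns=columns)
sample[('Close', 'T')] = np.nan    # simulate a symbol that failed to load
sample[('Volume', 'T')] = np.nan

print(symbols_all_nan(sample))  # ['T']
```

The returned list could then be passed to `web.DataReader` again to retry only the symbols that failed.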

Source: Stack Overflow question "pandas_datareader SymbolWarning: Failed to read symbol: 'T', replacing with NaN": https://stackoverflow.com/questions/58920176/
