python - Importing a space-delimited .csv into Python 3, ignoring the text at the start?

Tags: python python-3.x

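I want to import the following .csv data (a .txt file) into Python lists, one list per column, ignoring the text at the start. I cannot change the format of the file. I get this error: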

"Traceback (most recent call last):
  File "/Users/Hamish/Desktop/Python/AWBM/Import.py", line 13, in <module>
    rain_column = float(row[7])
IndexError: list index out of range"

Here is the code I am trying to run...

import csv
import numpy as np

file = open('Data_Bris.txt')
reader = csv.reader(file, delimiter=' ')

datelist = []
rainlist = []
evaplist = []
for row in reader:
    # row = [date, day, date2, T.Max, Smx, T.Min, Smn, Rain, Srn, Evap, Sev, Rad, Ssl, VP, Svp, maxT, minT, Span, Ssp]
    date_column = str(row[0])
    rain_column = float(row[7])
    evap_column = float(row[9])

    datelist.append([date_column])
    rainlist.append([rain_column])
    evaplist.append([evap_column])

date = np.array([datelist])
rain = np.array([rainlist])
evap = np.array([evaplist])

timeseries = np.arange(rain.size)

Here is the data file I want to import (it continues in the same pattern)...

"17701231" 365 31/12/1770 -99.9 999 -99.9 999 9999.9 999 999.9 999 999.9  999 999.9 999 9999.9 9999.9 9999.9  999
""
" This file is SPACE DELIMITED for easy import into both spreadsheets and programs."
"The first line 17701231 contains dummy data and is provided to allow spreadsheets to sense the columns"
" To read into a spreadsheet select DELIMITED and SPACE."
" "
" "
"=========  The following essential information and notes should be kept in the data file =========="
" "
"The Data Drill system and data are copyright to the Queensland Government Department of Science, Information Technology and Innovation (DSITI)."
"SILO data, with the exception of Patched Point data for Queensland, are supplied to the licencee only and may not be given, lent, or sold to any other party"
" "
"Notes:"
" * Data Drill for Lat, Long: -27.5000 153.0000 (DECIMAL DEGREES), 27 30'S 153 00'E Your Ref: Data_Bris"
" * Elevation:  102m "
" * Extracted from Silo on 20171214"
" * Please read the documentation on the Data Drill at http://www.longpaddock.qld.gov.au/silo"
" "
" * As evaporation is read at 9am, it has been shifted to the day before"
"    ie The evaporation measured on 20 April is in row for 19 April"
" * The 6 Source columns Smx, Smn, Srn, Sev, Ssl, Svp indicate the source of the data to their left, namely Max temp, Min temp, Rainfall, Evaporation, Radiation and Vapour Pressure respectively "
" "
"   35 = interpolated from daily observations using anomaly interpolation method for CLIMARC data
"   25 = interpolated daily observations,     75 = interpolated long term average"
"   26 = synthetic pan evaporation "
" "
" * Relative Humidity has been calculated using 9am VP, T.Max and T.Min"
"   RHmaxT is estimated Relative Humidity at Temperature T.Max"
"   RHminT is estimated Relative Humidity at Temperature T.Min"
"   Span = a calibrated estimate of class A pan evaporation based on vapour pressure deficit and solar radiation          
" * The accuracy of the data depends on many factors including date, location, and variable."
"   For consistency data is supplied using one decimal place, however it is not accurate to that precision."
"   Further information is available from http://www.longpaddock.qld.gov.au/silo"
"===================================================================================================="
" "
Date       Day Date2      T.Max Smx T.Min Smn Rain   Srn  Evap Sev Radn   Ssl VP    Svp RHmaxT RHminT Span   Ssp    
(yyyymmdd)  () (ddmmyyyy)  (oC)  ()  (oC)  ()   (mm)  ()  (mm)  () (MJ/m2) () (hPa)  ()   (%)    (%)    (mm)  () 
18890101     1  1-01-1889  29.5  35  21.5  35    0.3  25   6.2  75  23.0   35  26.0  35   63.1  100.0    5.6  26
18890102     2  2-01-1889  32.0  35  21.5  35    0.1  25   6.2  75  23.0   35  21.0  35   44.2   81.9    6.9  26
18890103     3  3-01-1889  31.5  35  21.5  35    0.0  25   6.2  75  23.0   35  24.0  35   51.9   93.6    6.4  26
18890104     4  4-01-1889  29.5  35  21.0  35    0.0  25   6.2  75  23.0   35  22.0  35   53.4   88.5    6.1  26
18890105     5  5-01-1889  30.0  35  19.0  35    0.0  25   6.2  75  23.0   35  19.0  35   44.8   86.5    6.5  26
18890106     6  6-01-1889  28.5  35  18.5  35    0.0  25   6.2  75  23.0   35  23.0  35   59.1  100.0    5.7  26
18890107     7  7-01-1889  30.0  35  18.5  35    0.1  25   6.2  75  23.0   35  20.0  35   47.1   94.0    6.4  26
18890108     8  8-01-1889  28.0  35  18.5  35    0.0  25   6.2  75  23.0   35  21.0  35   55.6   98.7    5.8  26
18890109     9  9-01-1889  28.5  35  19.0  35    0.0  25   6.2  75  24.0   35  22.0  35   56.5  100.0    6.0  26
18890110    10 10-01-1889  29.0  35  20.0  35    0.0  25   6.2  75  23.0   35  21.0  35   52.4   89.9    6.1  26

Best answer

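Here you want to skip every line of the header, including the column names and units. A simple way to do that is to ignore any line that does not start with a digit. Using a generator expression (so the whole file is not loaded into memory), you can create your reader like this: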

...
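# t is a string holding the file contents; with a real file, iterate over the open file object instead
reader = csv.reader((line for line in io.StringIO(t) if line[0].isdigit()),
    delimiter=' ', skipinitialspace=True)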
...

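skipinitialspace=True makes the reader ignore whitespace immediately after a delimiter, so a run of several spaces is treated as a single delimiter.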
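
For completeness, here is a minimal sketch of how the filtered reader might be combined with the column extraction from the question. The `read_columns` helper and the shortened `SAMPLE` text are hypothetical names introduced only for illustration; the column indices 7 (Rain) and 9 (Evap) are taken from the question. It appends plain values rather than one-element lists, so each column ends up as a flat list.

```python
import csv
import io

# Hypothetical, shortened stand-in for Data_Bris.txt (fewer columns and header lines).
SAMPLE = '''"17701231" 365 31/12/1770 -99.9 999 -99.9 999 9999.9 999 999.9 999
""
" This file is SPACE DELIMITED for easy import into both spreadsheets and programs."
Date       Day Date2      T.Max Smx T.Min Smn Rain   Srn  Evap Sev
(yyyymmdd)  () (ddmmyyyy)  (oC)  ()  (oC)  ()   (mm)  ()  (mm)  ()
18890101     1  1-01-1889  29.5  35  21.5  35    0.3  25   6.2  75
18890102     2  2-01-1889  32.0  35  21.5  35    0.1  25   6.2  75
'''


def read_columns(lines):
    """Return (dates, rain, evap) lists from an iterable of text lines."""
    # Keep only lines whose first character is a digit, i.e. the data rows.
    # line[:1] is used instead of line[0] so an empty string cannot raise.
    data_lines = (line for line in lines if line[:1].isdigit())
    reader = csv.reader(data_lines, delimiter=' ', skipinitialspace=True)

    dates, rain, evap = [], [], []
    for row in reader:
        dates.append(row[0])          # Date (yyyymmdd), kept as a string
        rain.append(float(row[7]))    # Rain (mm)
        evap.append(float(row[9]))    # Evap (mm)
    return dates, rain, evap


dates, rain, evap = read_columns(io.StringIO(SAMPLE))
print(dates, rain, evap)
```

With the real file, the same helper could be called as `read_columns(f)` inside `with open('Data_Bris.txt', newline='') as f:`, and the resulting lists can be passed to `np.array` if NumPy arrays are needed.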

Source: a similar question on Stack Overflow about importing a space-delimited .csv into Python 3 while ignoring the text at the start: https://stackoverflow.com/questions/47826517/
