python - Storing text file data in MySQL when timestamps are wrapped in quotes "" and some data values are missing

Tags: python mysql csv

I am reading data from a text file and storing it in a MySQL database. The data in the text file has the following format:

"sd","modu","datfil","244","de3000.Std.27","CPU:KC.EBM1_16.02.18.cd","13","ffm"
"TIMESTAMP","RNO","tem","t_vel","tem_acc","velc","sd","sd_ds_as"
"","","mp","mp","mp","mp","mp","mp"
"2009-02-25 14:28:36.76",missing RNO,8.277527,0.68,0.15,0.42,762.0303,4.6801
"2009-02-25 14:28:36.8",missing RNO,8.24408,0.7,0.03,0.3,761.878,4.682412
"2009-02-25 14:29:36.88",2,8.277527,0.55,0.09,0.31,762.0018,4.680709
"2009-02-25 14:30:36.92",3,8.277527,0.47,0.2,0.31,761.8914,4.684526
"2009-02-25 14:48:36.96",4,8.277527,0.46,0.14,0.28,761.9133,4.692356
"2009-02-25 14:58:37",5,8.210632,0.42,0.09,0.35,761.9025,4.696963
"2009-02-25 14:58:37.08",6,8.277527,0.51,0.19,0.27,761.8416,4.69718
"2009-02-25 14:58:37.12",7,8.277527,0.36,0.23,0.33,761.7534,4.701172
"2009-02-25 14:58:37.16",8,8.24408,0.44,0.08,0.5,761.8087,4.700504

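The problem is that the timestamps are enclosed in "" and some of them include fractional seconds. To work around this, I wrote the following code: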

    with open(filepath) as f:
        lines = f.readlines()
    max_lines = len(lines)
    for k, line in enumerate(lines):
        if k >= (int(skip_header_line) + int(index_line_number)): # skipping headerlines
            data_tmp = line.split(',')
            strDate = data_tmp[0].replace("\"", "")  #  2016-02-25 14:48:36.76
            strDate = strDate.split('.')[0]  #  2016-02-25 14:48:36
            timestamp = datetime.datetime.strptime(strDate, '%Y-%m-%d %H:%M:%S')     # 2016-02-25 14:48:36
            ts = calendar.timegm(timestamp.timetuple())  #  1456411716

            data_buffer = [ts] + data_tmp[1:]
            for val in data_buffer:
                if val == " ":
                    val = None
                    data_buffer.append(val)
                else:
                    continue
            print data_buffer

            cursor.execute(add_data, data_buffer)
            cnx.commit()

            with open(marker_file, "w") as f:
                f.write(" ".join([ str(item[0]), str(data_tmp[0]), str(max_lines), str(k-int(skip_header_line)+1) ]))
cursor.close()
cnx.close()

I get the following output and error:

 [1456411716, ' ', '8.277527', '0.68', '0.15', '0.42', '762.0303', '4.6801\n']

 mysql.connector.errors.DatabaseError: 1265 (01000): Data truncated for column 'RNO' at row 1

If anyone knows how to handle this situation, I would be very grateful.

Best answer

I'm not sure I understand your question, but you should not loop over the timestamp returned by the calendar.timegm function:

ts = calendar.timegm(timestamp.timetuple())  #  1456411716  

data_buffer = []
for val in ts:
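
To make the point above concrete, here is a minimal sketch showing that `calendar.timegm` returns a single integer, so a `for` loop over it cannot work. The sample timestamp string and the `loop_error` variable are illustrative assumptions, not part of the original code.

```python
import calendar
import datetime

# Parse one timestamp from the sample data (fractional seconds already removed).
timestamp = datetime.datetime.strptime("2009-02-25 14:28:36", "%Y-%m-%d %H:%M:%S")

# timegm() returns one integer (seconds since the epoch, UTC), not a sequence.
ts = calendar.timegm(timestamp.timetuple())

# Iterating over an int raises TypeError; capture it instead of crashing.
loop_error = None
try:
    for val in ts:
        pass
except TypeError as exc:
    loop_error = exc

print(type(ts).__name__, ts)
print(repr(loop_error))
```

Since `ts` is one value, it can simply be placed at the front of the row list, which is what the modified code does.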

Modifying your code slightly:

import calendar
import datetime

lines = '''
"2009-02-25 14:28:36.76",0,8.277527,0.68,0.15,0.42,762.0303,4.6801
"2009-02-25 14:28:36.8",1,8.24408,0.7,0.03,0.3,761.878,4.682412
"2009-02-25 14:29:36.88",2,8.277527,0.55,0.09,0.31,762.0018,4.680709
"2009-02-25 14:30:36.92",3,8.277527,0.47,0.2,0.31,761.8914,4.684526
"2009-02-25 14:48:36.96",4,8.277527,0.46,0.14,0.28,761.9133,4.692356
"2009-02-25 14:58:37",5,8.210632,0.42,0.09,0.35,761.9025,4.696963
"2009-02-25 14:58:37.08",6,8.277527,0.51,0.19,0.27,761.8416,4.69718
"2009-02-25 14:58:37.12",7,8.277527,0.36,0.23,0.33,761.7534,4.701172
"2009-02-25 14:58:37.16",8,8.24408,0.44,0.08,0.5,761.8087,4.700504
'''

skip_header_line = 0
index_line_number = 0

if 1:
    lines = lines.splitlines()
    for k, line in enumerate(lines):
        if k <= (int(skip_header_line) + int(index_line_number)):
            continue

        data_tmp = line.split(',')
        strDate = data_tmp[0].replace("\"", "")  #  2016-02-25 14:48:36.76
        strDate = strDate.split('.')[0]  #  2016-02-25 14:48:36
        timestamp = datetime.datetime.strptime(strDate, '%Y-%m-%d %H:%M:%S')     # 2016-02-25 14:48:36
        ts = calendar.timegm(timestamp.timetuple())  #  1456411716  

        # rebuild list, first element is ts the others from data_tmp (excluding the datetime)
        data_buffer = [ts] + data_tmp[1:]
        print(data_buffer)

        # run your INSERT here, e.g. cursor.execute(add_data, data_buffer)

The result is:

[1235572116, '0', '8.277527', '0.68', '0.15', '0.42', '762.0303', '4.6801']
[1235572116, '1', '8.24408', '0.7', '0.03', '0.3', '761.878', '4.682412']
[1235572176, '2', '8.277527', '0.55', '0.09', '0.31', '762.0018', '4.680709']
[1235572236, '3', '8.277527', '0.47', '0.2', '0.31', '761.8914', '4.684526']
[1235573316, '4', '8.277527', '0.46', '0.14', '0.28', '761.9133', '4.692356']
[1235573917, '5', '8.210632', '0.42', '0.09', '0.35', '761.9025', '4.696963']
[1235573917, '6', '8.277527', '0.51', '0.19', '0.27', '761.8416', '4.69718']
[1235573917, '7', '8.277527', '0.36', '0.23', '0.33', '761.7534', '4.701172']
[1235573917, '8', '8.24408', '0.44', '0.08', '0.5', '761.8087', '4.700504']
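
The sample above replaces the missing RNO values with 0 and 1, so it does not exercise the blank field that triggers error 1265. One possible way to handle it, sketched below, is to let the `csv` module strip the quotes and to turn blank fields into `None`, which `mysql.connector` sends as NULL when passed as a query parameter. This is only a sketch under assumptions: `SAMPLE`, `to_epoch`, and `clean_row` are hypothetical names, the sample has a single header line, and the `RNO` column is assumed to allow NULL.

```python
import calendar
import csv
import datetime
import io

# Hypothetical three-column excerpt; the first data row has a blank RNO field.
SAMPLE = '''"TIMESTAMP","RNO","tem"
"2009-02-25 14:28:36.76", ,8.277527
"2009-02-25 14:29:36.88",2,8.277527
'''


def to_epoch(text):
    """Convert 'YYYY-MM-DD HH:MM:SS[.fraction]' to whole epoch seconds (UTC)."""
    whole = text.split(".")[0]  # drop the fractional seconds, if any
    parsed = datetime.datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    return calendar.timegm(parsed.timetuple())


def clean_row(fields):
    """Return [epoch, value, ...] with blank fields replaced by None."""
    values = [field.strip() for field in fields[1:]]
    return [to_epoch(fields[0])] + [value if value else None for value in values]


# csv.reader removes the surrounding quotes and the trailing newline.
reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the single header line of this sample
rows = [clean_row(fields) for fields in reader if fields]

for row in rows:
    print(row)
```

Each cleaned row could then be passed to `cursor.execute(add_data, row)` in place of `data_buffer`. Building a new list avoids the problem in the original loop, where assigning `val = None` does not change the list and `append` adds an extra element.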

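This post on "python - Storing text file data in MySQL when timestamps are wrapped in quotes "" and some data values are missing" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/40982893/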
