mysql - MariaDB: convert string to int when importing from CSV, while removing spaces from the numbers

Tags: mysql csv replace import mariadb

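I have a "large" CSV file (about 1 GB of data, 3 million rows) to import into a MariaDB table.

The problem is that almost every field in every row is treated as a string. So I have to convert "1 337" (a string) to 1337 (an integer).

Here is the script I use to import into the table: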

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'data.txt'
    INTO TABLE `test`.`test_import`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`id`,
        `data`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberOne, ' ', ''),
            `Number Three` = REPLACE(@NumberOne, ' ', ''),
            `Number Four` = REPLACE(@NumberOne, ' ', '');

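With this script, numbers up to 999 import without problems. But from 1000 upward (written as "1 000" in my CSV), all I get is a warning (Truncated incorrect INTEGER value: '1 000') and the value 1 in my database.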
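
The stored value 1 matches how MariaDB converts a string to an integer: it reads the leading numeric prefix and stops at the first character that cannot belong to a number, which is what the truncation warning reports. The Python sketch below only imitates that rule for unsigned integers to illustrate the behaviour; `leading_int_prefix` is a hypothetical helper, not MariaDB code.

```python
import re

def leading_int_prefix(text: str) -> int:
    """Imitate a lenient string-to-integer cast: keep only the leading digits."""
    # Skip leading ASCII whitespace, then take the longest run of digits.
    match = re.match(r"\s*(\d+)", text, re.ASCII)
    # With no leading digits at all, this sketch falls back to 0.
    return int(match.group(1)) if match else 0

print(leading_int_prefix("1 000"))  # stops at the character after the "1"
print(leading_int_prefix("1000"))   # no separator, so the whole value is read
```

This suggests that the separator is still present in the string when the conversion happens, so the REPLACE() call is worth inspecting more closely.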

The "funny" thing is that when I try this:

SET `Number One` = REPLACE(@NumberOne, '1', 'k'),
                `Number Two` = REPLACE(@NumberOne, '1', 'k'),
                `Number Three` = REPLACE(@NumberOne, '1', 'k'),
                `Number Four` = REPLACE(@NumberOne, '1', 'k')

-> REPLACE() works, and "1 000" becomes "k 000".

So, how can I use REPLACE() to remove the spaces from the numbers? Or, how can I make CAST()/CONVERT() work correctly on strings like "1 337"?
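
One possibility worth ruling out, purely as an assumption because the question does not show the file's raw bytes, is that the thousands separator is not an ASCII space (U+0020) but, for example, a no-break space (U+00A0) or a narrow no-break space (U+202F). `REPLACE(@NumberOne, ' ', '')` would leave such characters in place. The sketch below shows the intended normalization in Python as a pre-processing idea; `to_int` is a hypothetical helper name.

```python
def to_int(text: str) -> int:
    """Drop every whitespace character, then parse the remaining digits."""
    # str.split() with no argument splits on any Unicode whitespace,
    # including U+00A0 and U+202F, so joining the pieces removes them all.
    return int("".join(text.split()))

print(to_int("1 337"))        # ASCII space as separator
print(to_int("1\u00a0337"))   # no-break space as separator
```

If the file does turn out to contain a non-ASCII separator, the same idea could be applied either to the file before import or by replacing that specific character in the SET clause.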

More information.

Here is a fresh test table:

CREATE OR REPLACE TABLE test_spaces_extr (
    `Identifier`   tinytext,
    `First name`   tinytext,
    `Last name`    tinytext,
    `Number One`   int unsigned,
    `Number Two`   int unsigned,
    `Number Three` int unsigned,
    `Number Four`  int unsigned,
    `Number Five`  int unsigned,
    `Number Six`   int unsigned,
    `Number Seven` int unsigned
);

Here is the script that imports the CSV:

LOAD DATA LOW_PRIORITY LOCAL
    INFILE 'some_data.txt'
    INTO TABLE `test`.`test_spaces_extr`
    CHARACTER SET utf8
    FIELDS TERMINATED BY ';'
    OPTIONALLY ENCLOSED BY '"'
    ESCAPED BY '"'
    LINES TERMINATED BY '\r\n'
    (`Identifier`,
        `First name`,
        `Last name`,
        @NumberOne,
        @NumberTwo,
        @NumberThree,
        @NumberFour,
        @NumberFive,
        @NumberSix,
        @NumberSeven)
        SET `Number One` = REPLACE(@NumberOne, ' ', ''),
            `Number Two` = REPLACE(@NumberTwo, ' ', ''),
            `Number Three` = REPLACE(@NumberThree, ' ', ''),
            `Number Four` = REPLACE(@NumberFour, ' ', ''),
            `Number Five` = REPLACE(@NumberFive, ' ', ''),
            `Number Six` = REPLACE(@NumberSix, ' ', ''),
            `Number Seven` = REPLACE(@NumberSeven, ' ', '');

Here is the full content of some_data.txt:

"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"

(One line, yes.)

Here is the result:

"Identifier"    "First name"    "Last name" "Number One"    "Number Two"    "Number Three"  "Number Four"   "Number Five"   "Number Six"    "Number Seven"
"3efa639b3a"    "Censored"  "Censored"  "7896"  "3468"  "3854"  "5000"  "1234"  "9654"  "0"

Indeed, the "Number" fields became integers here. All of them except the last one ("Number Seven" -> "0").

Things are getting stranger and stranger...
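
A result like this can be narrowed down by looking at the exact characters in the file rather than at how they are displayed. The sketch below assumes nothing about the actual cause; it simply lists the code point and Unicode name of every non-digit character in a field. `describe_non_digits` is a hypothetical helper.

```python
import unicodedata

def describe_non_digits(field: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every character that is not 0-9."""
    return [
        # Control characters such as "\r" have no Unicode name, hence the default.
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed control character>"))
        for ch in field
        if ch not in "0123456789"
    ]

print(describe_non_digits("1 337"))       # ASCII space
print(describe_non_digits("1\u00a0337"))  # no-break space
print(describe_non_digits("1 337\r"))     # stray carriage return at the end
```

Running it on the last field of each line would show whether that field carries something the other fields do not, such as a leftover line-ending character.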

Best answer

I cannot reproduce the problem:

$ mysql -u user -p --column-type-info
MariaDB [(none)]> SELECT VERSION();
Field   1:  `VERSION()`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       VAR_STRING
Collation:  utf8_general_ci (33)
Length:     72
Max_length: 24
Decimals:   31
Flags:      NOT_NULL 


+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [(none)]> SELECT CAST(REPLACE('1 337', ' ', '') AS UNSIGNED);
Field   1:  `CAST(REPLACE('1 337', ' ', '') AS UNSIGNED)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       LONGLONG
Collation:  binary (63)
Length:     5
Max_length: 4
Decimals:   0
Flags:      NOT_NULL UNSIGNED BINARY NUM 


+---------------------------------------------+
| CAST(REPLACE('1 337', ' ', '') AS UNSIGNED) |
+---------------------------------------------+
|                                        1337 |
+---------------------------------------------+
1 row in set (0.00 sec)

Update

File: /path/to/data.csv

"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"
MariaDB [_]> SELECT VERSION();
+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)

MariaDB [_]> DROP TABLE IF EXISTS `test_spaces_extr`;
Query OK, 0 rows affected (0.07 sec)

MariaDB [_]> CREATE OR REPLACE TABLE `test_spaces_extr` (
    ->     `Identifier`   tinytext,
    ->     `First name`   tinytext,
    ->     `Last name`    tinytext,
    ->     `Number One`   int unsigned,
    ->     `Number Two`   int unsigned,
    ->     `Number Three` int unsigned,
    ->     `Number Four`  int unsigned,
    ->     `Number Five`  int unsigned,
    ->     `Number Six`   int unsigned,
    ->     `Number Seven` int unsigned
    -> );
Query OK, 0 rows affected (0.00 sec)

MariaDB [_]> LOAD DATA LOW_PRIORITY LOCAL INFILE '/path/to/data.csv'
    ->   INTO TABLE `test_spaces_extr`
    ->   CHARACTER SET utf8
    ->   FIELDS TERMINATED BY ';'
    ->   OPTIONALLY ENCLOSED BY '"'
    ->   ESCAPED BY '"'
    ->   LINES TERMINATED BY '\r\n'
    ->   (
    ->     `Identifier`,
    ->     `First name`,
    ->     `Last name`,
    ->     @`NumberOne`,
    ->     @`NumberTwo`,
    ->     @`NumberThree`,
    ->     @`NumberFour`,
    ->     @`NumberFive`,
    ->     @`NumberSix`,
    ->     @`NumberSeven`
    ->   )
    ->   SET
    ->   `Number One` = REPLACE(@`NumberOne`, ' ', ''),
    ->   `Number Two` = REPLACE(@`NumberTwo`, ' ', ''),
    ->   `Number Three` = REPLACE(@`NumberThree`, ' ', ''),
    ->   `Number Four` = REPLACE(@`NumberFour`, ' ', ''),
    ->   `Number Five` = REPLACE(@`NumberFive`, ' ', ''),
    ->   `Number Six` = REPLACE(@`NumberSix`, ' ', ''),
    ->   `Number Seven` = REPLACE(@`NumberSeven`, ' ', '');
Query OK, 1 row affected (0.00 sec)                  
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

MariaDB [_]> SELECT
    ->   `Identifier`,
    ->   `First name`,
    ->   `Last name`,
    ->   `Number One`,
    ->   `Number Two`,
    ->   `Number Three`,
    ->   `Number Four`,
    ->   `Number Five`,
    ->   `Number Six`,
    ->   `Number Seven`
    -> FROM
    ->   `test_spaces_extr`;
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| Identifier | First name | Last name | Number One | Number Two | Number Three | Number Four | Number Five | Number Six | Number Seven |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| 3efa639b3a | Censored   | Censored  |       7896 |       3468 |         3854 |        5000 |        1234 |       9654 |         1337 |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
1 row in set (0.00 sec)

Regarding mysql - MariaDB: convert string to int when importing from CSV, while removing spaces from the numbers, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44521291/
