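I have a "large" CSV file (about 1 GB of data, 3 million rows) to import into a MariaDB table.
The problem is that almost every field in every row is treated as a string. So I have to convert "1 337" (string) to 1337 (integer).
This is the script I use to import into the table: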
LOAD DATA LOW_PRIORITY LOCAL
INFILE 'data.txt'
INTO TABLE `test`.`test_import`
CHARACTER SET utf8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
(`id`,
`data`,
@NumberOne,
@NumberTwo,
@NumberThree,
@NumberFour)
SET `Number One` = REPLACE(@NumberOne, ' ', ''),
`Number Two` = REPLACE(@NumberOne, ' ', ''),
`Number Three` = REPLACE(@NumberOne, ' ', ''),
`Number Four` = REPLACE(@NumberOne, ' ', '');
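With this script, numbers up to 999 import without any problem. But from 1000 onward (written as "1 000" in my CSV), all I get is a warning (Truncated incorrect INTEGER value: '1 000') and the value 1 in my database.
The "funny" thing is that when I try this: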
SET `Number One` = REPLACE(@NumberOne, '1', 'k'),
`Number Two` = REPLACE(@NumberOne, '1', 'k'),
`Number Three` = REPLACE(@NumberOne, '1', 'k'),
`Number Four` = REPLACE(@NumberOne, '1', 'k')
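-> REPLACE() works, and "1 000" becomes "k 000".
So, how can I use REPLACE() to remove the spaces in the numbers? Or, how can I make CAST()/CONVERT() work correctly on strings like "1 337"?
More information.
Here is a fresh test table: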
CREATE OR REPLACE TABLE test_spaces_extr (
`Identifier` tinytext,
`First name` tinytext,
`Last name` tinytext,
`Number One` int unsigned,
`Number Two` int unsigned,
`Number Three` int unsigned,
`Number Four` int unsigned,
`Number Five` int unsigned,
`Number Six` int unsigned,
`Number Seven` int unsigned
);
Here is the script that imports the CSV:
LOAD DATA LOW_PRIORITY LOCAL
INFILE 'some_data.txt'
INTO TABLE `test`.`test_spaces_extr`
CHARACTER SET utf8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
(`Identifier`,
`First name`,
`Last name`,
@NumberOne,
@NumberTwo,
@NumberThree,
@NumberFour,
@NumberFive,
@NumberSix,
@NumberSeven)
SET `Number One` = REPLACE(@NumberOne, ' ', ''),
`Number Two` = REPLACE(@NumberTwo, ' ', ''),
`Number Three` = REPLACE(@NumberThree, ' ', ''),
`Number Four` = REPLACE(@NumberFour, ' ', ''),
`Number Five` = REPLACE(@NumberFive, ' ', ''),
`Number Six` = REPLACE(@NumberSix, ' ', ''),
`Number Seven` = REPLACE(@NumberSeven, ' ', '');
Here is the full content of some_data.txt:
"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"
(One line, yes.)
Here is the result:
"Identifier" "First name" "Last name" "Number One" "Number Two" "Number Three" "Number Four" "Number Five" "Number Six" "Number Seven"
"3efa639b3a" "Censored" "Censored" "7896" "3468" "3854" "5000" "1234" "9654" "0"
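Indeed, the "Number" fields were converted to integers here. All of them except the last one ("Number Seven" -> "0").
Things are getting weirder and weirder...
Best answer
I can't reproduce the problem: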
$ mysql -u user -p --column-type-info
MariaDB [(none)]> SELECT VERSION();
Field 1: `VERSION()`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: VAR_STRING
Collation: utf8_general_ci (33)
Length: 72
Max_length: 24
Decimals: 31
Flags: NOT_NULL
+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)
MariaDB [(none)]> SELECT CAST(REPLACE('1 337', ' ', '') AS UNSIGNED);
Field 1: `CAST(REPLACE('1 337', ' ', '') AS UNSIGNED)`
Catalog: `def`
Database: ``
Table: ``
Org_table: ``
Type: LONGLONG
Collation: binary (63)
Length: 5
Max_length: 4
Decimals: 0
Flags: NOT_NULL UNSIGNED BINARY NUM
+---------------------------------------------+
| CAST(REPLACE('1 337', ' ', '') AS UNSIGNED) |
+---------------------------------------------+
|                                        1337 |
+---------------------------------------------+
1 row in set (0.00 sec)
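The same expression can be used directly in the SET clause of LOAD DATA. The following is only a minimal sketch: it reuses the table, file and variable names from the question, and it assumes the separator inside the numbers really is an ASCII space (0x20). The explicit CAST() just makes the conversion visible; assigning the REPLACE() result to an int column converts it implicitly as well.

```sql
-- Sketch only: names are taken from the question, nothing here is new API.
LOAD DATA LOW_PRIORITY LOCAL
INFILE 'some_data.txt'
INTO TABLE `test`.`test_spaces_extr`
CHARACTER SET utf8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
(`Identifier`,
 `First name`,
 `Last name`,
 @NumberOne,
 @NumberTwo,
 @NumberThree,
 @NumberFour,
 @NumberFive,
 @NumberSix,
 @NumberSeven)
-- Strip ASCII spaces first, then convert the remaining digits explicitly.
SET `Number One`   = CAST(REPLACE(@NumberOne,   ' ', '') AS UNSIGNED),
    `Number Two`   = CAST(REPLACE(@NumberTwo,   ' ', '') AS UNSIGNED),
    `Number Three` = CAST(REPLACE(@NumberThree, ' ', '') AS UNSIGNED),
    `Number Four`  = CAST(REPLACE(@NumberFour,  ' ', '') AS UNSIGNED),
    `Number Five`  = CAST(REPLACE(@NumberFive,  ' ', '') AS UNSIGNED),
    `Number Six`   = CAST(REPLACE(@NumberSix,   ' ', '') AS UNSIGNED),
    `Number Seven` = CAST(REPLACE(@NumberSeven, ' ', '') AS UNSIGNED);
```

If a value still contains a non-digit character after REPLACE(), CAST() stops at that character and raises the same "Truncated incorrect INTEGER value" warning described in the question.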
Update
File: /path/to/data.csv
"3efa639b3a";"Censored";"Censored";"7 896";"3 468";"3 854";"5 000";"1 234";"9 654";"1 337"
MariaDB [_]> SELECT VERSION();
+-----------------+
| VERSION()       |
+-----------------+
| 10.0.31-MariaDB |
+-----------------+
1 row in set (0.00 sec)
MariaDB [_]> DROP TABLE IF EXISTS `test_spaces_extr`;
Query OK, 0 rows affected (0.07 sec)
MariaDB [_]> CREATE OR REPLACE TABLE `test_spaces_extr` (
-> `Identifier` tinytext,
-> `First name` tinytext,
-> `Last name` tinytext,
-> `Number One` int unsigned,
-> `Number Two` int unsigned,
-> `Number Three` int unsigned,
-> `Number Four` int unsigned,
-> `Number Five` int unsigned,
-> `Number Six` int unsigned,
-> `Number Seven` int unsigned
-> );
Query OK, 0 rows affected (0.00 sec)
MariaDB [_]> LOAD DATA LOW_PRIORITY LOCAL INFILE '/path/to/data.csv'
-> INTO TABLE `test_spaces_extr`
-> CHARACTER SET utf8
-> FIELDS TERMINATED BY ';'
-> OPTIONALLY ENCLOSED BY '"'
-> ESCAPED BY '"'
-> LINES TERMINATED BY '\r\n'
-> (
-> `Identifier`,
-> `First name`,
-> `Last name`,
-> @`NumberOne`,
-> @`NumberTwo`,
-> @`NumberThree`,
-> @`NumberFour`,
-> @`NumberFive`,
-> @`NumberSix`,
-> @`NumberSeven`
-> )
-> SET
-> `Number One` = REPLACE(@`NumberOne`, ' ', ''),
-> `Number Two` = REPLACE(@`NumberTwo`, ' ', ''),
-> `Number Three` = REPLACE(@`NumberThree`, ' ', ''),
-> `Number Four` = REPLACE(@`NumberFour`, ' ', ''),
-> `Number Five` = REPLACE(@`NumberFive`, ' ', ''),
-> `Number Six` = REPLACE(@`NumberSix`, ' ', ''),
-> `Number Seven` = REPLACE(@`NumberSeven`, ' ', '');
Query OK, 1 row affected (0.00 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
MariaDB [_]> SELECT
-> `Identifier`,
-> `First name`,
-> `Last name`,
-> `Number One`,
-> `Number Two`,
-> `Number Three`,
-> `Number Four`,
-> `Number Five`,
-> `Number Six`,
-> `Number Seven`
-> FROM
-> `test_spaces_extr`;
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| Identifier | First name | Last name | Number One | Number Two | Number Three | Number Four | Number Five | Number Six | Number Seven |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
| 3efa639b3a | Censored   | Censored  |       7896 |       3468 |         3854 |        5000 |        1234 |       9654 |         1337 |
+------------+------------+-----------+------------+------------+--------------+-------------+-------------+------------+--------------+
1 row in set (0.00 sec)
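Since the import works here, one thing that may be worth checking is the exact bytes in the file that fails. REPLACE(x, ' ', '') only removes the ASCII space (0x20); any other character, for example a no-break space (bytes C2 A0 in UTF-8) or a stray carriage return (0D), would be left in place. Whether the original file contains such a character is an assumption, not something the question confirms. The sketch below uses a hypothetical scratch table `debug_raw` (the table and the @c1..@c10 variable names are illustrative) to dump the last field with HEX():

```sql
-- Hypothetical scratch table; its name and columns are illustrative only.
CREATE TABLE `debug_raw` (
  `raw_value` tinytext,
  `raw_hex`   tinytext
);

-- Read all ten fields into user variables and keep only the last one,
-- together with a hexadecimal dump of its bytes.
LOAD DATA LOCAL INFILE '/path/to/data.csv'
INTO TABLE `debug_raw`
CHARACTER SET utf8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\r\n'
(@c1, @c2, @c3, @c4, @c5, @c6, @c7, @c8, @c9, @c10)
SET `raw_value` = @c10,
    `raw_hex`   = HEX(@c10);

SELECT `raw_value`, `raw_hex` FROM `debug_raw`;

-- If unexpected bytes show up, one option is to drop every non-digit
-- character instead of only the ASCII space
-- (REGEXP_REPLACE() is available from MariaDB 10.0.5).
SELECT CAST(REGEXP_REPLACE('1 337', '[^0-9]', '') AS UNSIGNED);
```

A field that holds exactly the digits 1, 337 and one ASCII space dumps as 3120333337; anything else in `raw_hex` points to the character that REPLACE() is not matching.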
Regarding mysql - MariaDB : convert string to int when importing from CSV, while removing spaces in numbers, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/44521291/