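Desperately stuck on the following, and so far none of my programming speed-dial buddies have been able to help (most of them are not MySQL experts):
I have several tables whose column names and data types were auto-generated from a CSV file by the "Import Table Data Wizard", and the table does not contain an AUTO_INCREMENT column (yet). This particular table contains approximately 30,000 rows.
I am trying to correct the comma-separated values in one column using a "corrections" table. To do that, I am writing a stored procedure containing a WHILE loop that iterates through the corrections table row by row and checks whether an alias is found in the imported table. It starts at row id = 1 of a table that looks like this: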
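| id | material     | alias01     | alias02   | alias03      | alias04 | ... up to alias12
| 1  | Katoen       | Cotton      | Supima    | Pima         |         |
| 2  | Polyester    | Polyster    |           |              |         |
| 3  | Lyocell      | Lycocell    | Lyocel    |              |         |
| 4  | Linnen       | Linen       |           |              |         |
| 5  | Viscose      | Visose      | Viskose   | Viscoe       | Voscose |
| 6  | Scheerwol    |             |           |              |         |
| 7  | Polyamide    |             |           |              |         |
| 8  | Nylon        |             |           |              |         |
| 9  | Leer         | Leder       | Lamsleder | Varkensleder |         |
| 10 | Polyurethaan | Polyurethan | PU        | Polyuretaan  |         |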
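For testing purposes, just to get any kind of result, I am only using alias01 for now (it needs to check alias01, then alias02, and so on, but I will try to solve that later).
It needs to compare lengths (alias_string_length = found_string_length) to make sure that a string such as "wo" is not matched inside "wool" or "wol".
The values in the column that needs correcting look like this (the commas do not need to be there; that is just what I have to work with):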
| material |
,Katoen,Elastaan,Voering,Acetaat,Polyester
,Nylon,Polyester,Elastaan
,Katoen
,Leder,in,Leder,Loopzool,Leder
,Polyester
,Polyester,Elastaan,Voering,Polyester
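Update
Thanks to a tip from Drew, I changed the procedure. I added a tmp table that holds the material and a unique id for each row, and I loop through each row using alias01. It takes about 11 seconds to run over 9000 rows, but 0 rows are affected. Any tips on improving the speed are most welcome, but insight into what the problem might be would help even more.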
CREATE DEFINER=`root`@`localhost` PROCEDURE `replace_materials`()
BEGIN
    set @rownumber = 1;
    set @totalrows = 28;
    set @um ='';
    set @cm ='';
    set @corrected ='';
    set @correctme ='';
    TRUNCATE TABLE tmp;
    INSERT INTO tmp (material) SELECT material FROM vantilburgonline.productinfo;
    WHILE (@rownumber < @totalrows) DO
        SET @um = (SELECT alias01 FROM vantilburgonline.materials WHERE id=@rownumber);
        -- gives 'um' value from column alias01, from table materials, row(X)
        SET @cm = (SELECT material FROM vantilburgonline.materials WHERE id=@rownumber);
        -- gives 'cm' value from column material, from table materials, row(X)
        set @tmprow = 1;
        set @totaltmprow =9000;
        WHILE (@tmprow < @totaltmprow) DO
            SET @correctme = (SELECT material FROM vantilburgonline.tmp WHERE id = @tmprow);
            -- gives the value from column material from table tmp to correctme(X).
            SET @correctme = REPLACE(@correctme,@um,@cm);
            -- should run through column material from table productinfo and replace 'alias01' with correct 'material'.
            SET @tmprow = @tmprow +1;
        END WHILE;
        SET @rownumber = @rownumber +1;
    END WHILE;
END
I am sure, though, that alias01 contains strings that should be found in material. Workbench is also using 9 GB at this point, and I can only fix that by restarting.
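Best answer
I would suggest changing your materials table, which is unwieldy with its multiple columns (alias01 .. alias12), and moving to a normalized, extensible design. It would have a materials table and a materials_alias table. Since they sit side by side with the tables you have already created, I named them with a 2 suffix.
Schema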
drop table if exists materials2;
create table materials2
( material varchar(100) primary key, -- let's go with a natural key
active bool not null -- turn it LIVE and ON for string replacement of alias back to material name
-- so active is TRUE for ones to do replacement, or FALSE for skip
-- facilitates your testing of your synonyms, translations, slangs, etc
)engine=INNODB;
insert materials2 (material,active) values
('KARTON',true),
('Polyester',false),
('Lyocell',false),
('Linnen',true),
('Viscose',true),
('Scheerwol',false),
('Nylon',false),
('Leer',true),
('Polyurethaan',true),
('Polyacryl',true),
('Acryl',false),
('Modal',true),
('Acetaat',true),
('Papier',false),
('Wol',true),
('Zijde',true),
('Temcal',false),
('Polyamide',true),
('Wol-Merino',true),
('Elastan',true),
('Elastomultiester',true);
-- 21 rows
-- a few rows were skipped. The intent of them read as gibberish to me. Please review.
-- we need to restructure the materials2_alias table (after the first attempt)
-- 1. it might need special handling when `alias` is a legitimate substring of `material` (those 2 columns)
-- 2. it needs a unique composite index
drop table if exists materials2_alias;
create table materials2_alias
( id int auto_increment primary key,
material varchar(100) not null,
alias varchar(100) not null,
ais bool not null, -- Alias is Substring (alias is a legitimate substring of material, like Wo and Wol, respectively)
unique key(material,alias), -- Composite Index, do not allow dupe combos (only 1 row per combo)
foreign key `m2alias_m2` (material) references materials2(material)
)engine=INNODB;
insert materials2_alias (material,alias,ais) values
('KARTON','Cotton',false),('KARTON','Katoen',false),('KARTON','Pima',false),
('Polyester','Polyster',false),
('Lyocell','Lycocell',false),('Lyocell','Lyocel',false),
('Linnen','Linen',false),
('Viscose','Visose',false),('Viscose','Viskose',false),('Viscose','Viscoe',false),('Viscose','Voscose',false),
('Leer','Leder',false),('Leer','Lamsleder',false),('Leer','Varkensleder',false),('Leer','Schapenleder',false),('Leer','Geitenleder',false),
('Polyurethaan','Polyurethan',false),('Polyurethaan','PU',false),('Polyurethaan','Polyuretaan',false),('Polyurethaan','Polyurathane',false),('Polyurethaan','Polyurtaan',false),('Polyurethaan','Polyueretaan',false),
('Polyacryl','Polyacrylic',false),
('Acetaat','Leder',false),('Acetaat','Lamsleder',false),
('Wol','Schuurwol',false),('Wol','Wool',false),('Wol','WO',false),('Wol','Scheerwol',false),
('Zijde','Silk',false),('Zijde','Sede',false),
('Polyamide','Polyamie',false),('Polyamide','Polyamid',false),('Polyamide','Poliamide',false),
('Wol-Merino','Merino',false),
('Elastan','Elastaan',false),('Elastan','Spandex',false),('Elastan','Elataan',false),('Elastan','Elastane',false),
('Elastomultiester','elastomutltiester',false),('Elastomultiester','Elasomultiester',false);
-- this cleans up the above, where false should have been true
update materials2_alias
set ais=true
where instr(material,alias)>0;
-- 4 rows
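There were a few alter table statements and other things. I will try to document them or link to them. Considering that you had a few hundred lines of code, I am just trying to capture something to share. But mine boils down to one simple chunk of code that you put in a loop.
The update to run in the loop: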
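The alter table statements themselves are not reproduced in this answer. As a rough sketch only: the update relies on a touchCount column in productinfo and on material values that start and end with a comma, so preparation along the following lines would be needed. The column type, the default, and the two wrapping updates are assumptions for illustration, not the statements that were actually run.

```sql
-- Assumed preparation (hypothetical; the original ALTER statements are not shown).

-- 1. Counter that the UPDATE increments every time it touches a row.
ALTER TABLE productinfo
  ADD COLUMN touchCount INT NOT NULL DEFAULT 0;

-- 2. Make sure every material list starts and ends with a comma,
--    so that each item can be matched as ',item,'.
UPDATE productinfo
   SET material = CONCAT(',', material)
 WHERE LEFT(material, 1) <> ',';

UPDATE productinfo
   SET material = CONCAT(material, ',')
 WHERE RIGHT(material, 1) <> ',';
```

In MySQL Workbench, safe update mode may reject an UPDATE whose WHERE clause does not use a key column; running `SET SQL_SAFE_UPDATES = 0;` for the session is one way around that.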
UPDATE productinfo pi
join materials2_alias ma
on instr( pi.material, concat(',',ma.alias,',') )>0
join materials2 m
on m.material=ma.material and m.active=true
set pi.material=replace(lower(pi.material),lower(ma.alias),lower(ma.material)),
pi.touchCount=pi.touchCount+1;
Some notes about the update:
-- Note, pi.material starts and ends with a comma.
-- I forced that during the ETL. But `ma.alias` does not contain commas.
-- So add the commas with a concat() within the "Update with a Join" pattern shown
--
-- Note that the commas solved the problem with the Wol - Wo
Well, for the following 4 in particular:
select * from materials2_alias
where ais=true
order by material,alias;
+----+------------+----------+-----+
| id | material | alias | ais |
+----+------------+----------+-----+
| 6 | Lyocell | Lyocel | 1 |
| 33 | Polyamide | Polyamid | 1 |
| 28 | Wol | WO | 1 |
| 35 | Wol-Merino | Merino | 1 |
+----+------------+----------+-----+
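To see why wrapping the alias in commas protects these four rows, the comparison can be tried on literal strings. This is only an illustration using made-up sample values, not part of the original script.

```sql
SELECT INSTR('wol', 'wo')        AS bare_match,     -- 1: 'wo' is found inside 'wol'
       INSTR(',wol,', ',wo,')    AS wrapped_match,  -- 0: ',wo,' does not occur in ',wol,'
       INSTR(',wo,wol,', ',wo,') AS wrapped_item;   -- 1: matches the separate item 'wo'
```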
-- instr() is not case sensitive except for binary strings
-- REPLACE(str,from_str,to_str); -- case sensitive
-- http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_replace
--
-- so the update uses lower() or this won't work due to replace() case sensitivity
--
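One thing to watch: the join matches ',alias,' including the commas, but the REPLACE() in the update substitutes the bare alias wherever it occurs in the string. The SELECT statements below show the difference on made-up sample values, followed by a variant of the update that wraps both REPLACE() arguments in commas. Treat the variant as a sketch of an alternative, not as the statement behind the timings reported in this answer; it assumes the same tables and the touchCount column.

```sql
-- REPLACE() is case sensitive, so without LOWER() nothing changes:
SELECT REPLACE(',WO,', ',wo,', ',wol,');      -- ',WO,'
-- Replacing the bare alias also rewrites part of a longer item:
SELECT REPLACE(',wo,wol,', 'wo', 'wol');      -- ',wol,woll,'
-- Replacing the comma-wrapped alias only rewrites the whole item:
SELECT REPLACE(',wo,wol,', ',wo,', ',wol,');  -- ',wol,wol,'

-- Variant of the update with comma-wrapped REPLACE() arguments.
UPDATE productinfo pi
  JOIN materials2_alias ma
    ON INSTR(pi.material, CONCAT(',', ma.alias, ',')) > 0
  JOIN materials2 m
    ON m.material = ma.material AND m.active = TRUE
   SET pi.material = REPLACE(LOWER(pi.material),
                             CONCAT(',', LOWER(ma.alias), ','),
                             CONCAT(',', LOWER(ma.material), ',')),
       pi.touchCount = pi.touchCount + 1;
```

Because REPLACE() does not reuse a comma it has already consumed, a value such as ',elastaan,elastaan,' becomes ',elastan,elastaan,' after one pass and needs a second pass, which running the update in a loop takes care of.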
Stored procedure:
DROP PROCEDURE if exists touchCounts;
DELIMITER $$
CREATE PROCEDURE touchCounts()
BEGIN
select touchCount,count(*) as rowCount
from productinfo
group by touchCount
order by touchCount;
END $$
DELIMITER ;
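When that stored procedure returns the same row counts on two successive calls, you are done modifying the material column with the update.
The stored procedure would naturally return an out parameter for the row count. But it is late and time to sleep.
For the last data set from your side, the update statement needed to be called 4 times. On my ordinary laptop that took about 13 seconds. The idea is naturally flexible: you could create hundreds of aliases per material if you wanted.
I parked it on GitHub because otherwise it was too much to include here.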
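The "call it until nothing changes" idea can also be wrapped in a procedure that stops on its own. The following is a minimal sketch under stated assumptions: the procedure name fixMaterialsLoop, the passes out parameter, and the safety cap of 50 passes are all hypothetical, and the UPDATE is the one shown earlier in this answer. It uses ROW_COUNT(), which reports the number of rows changed by the preceding statement.

```sql
DROP PROCEDURE IF EXISTS fixMaterialsLoop;   -- hypothetical name
DELIMITER $$
CREATE PROCEDURE fixMaterialsLoop(OUT passes INT)
BEGIN
    DECLARE changed INT DEFAULT 1;
    SET passes = 0;

    -- Stop when a pass changes nothing; the cap of 50 is an arbitrary safety limit.
    WHILE changed > 0 AND passes < 50 DO
        UPDATE productinfo pi
          JOIN materials2_alias ma
            ON INSTR(pi.material, CONCAT(',', ma.alias, ',')) > 0
          JOIN materials2 m
            ON m.material = ma.material AND m.active = TRUE
           SET pi.material = REPLACE(LOWER(pi.material), LOWER(ma.alias), LOWER(ma.material)),
               pi.touchCount = pi.touchCount + 1;

        SET changed = ROW_COUNT();   -- rows changed by the UPDATE just executed
        SET passes = passes + 1;
    END WHILE;
END $$
DELIMITER ;

CALL fixMaterialsLoop(@passes);
SELECT @passes;
```

The reported number of passes includes the final pass that changed nothing, so it would be one higher than the number of passes that did useful work.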
Source: "mysql - Using a stored procedure or function to loop through one table and replace values in another", a similar question found on Stack Overflow: https://stackoverflow.com/questions/37841364/