MySQL: converting a database column from a blob into separate parts

Tags: mysql sql database migration

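I am trying to take a blob out of an old MySQL table and create a new table for it, in an effort to reach first normal form. However, converting the data already stored in the blob into multiple rows of the new table has turned out to be difficult.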

What is the easiest way to do this conversion with SQL commands?

Parent table:

CREATE  TABLE TEST.People (
  `id` INT  AUTO_INCREMENT,
  `age` INT,
  `height` INT,
  `weight` INT  ,
  `variations` BLOB DEFAULT NULL,
  PRIMARY KEY (`id`)
);

New table:

CREATE  TABLE TEST.Variations (
  `id` INT  AUTO_INCREMENT,
  `chr` INT,
  `start` INT,
  `stop` INT  ,
  `type` ENUM('SNP','INDEL','CNV') DEFAULT NULL,
  PRIMARY KEY (`id`)
);

When I run SELECT id, variations FROM TEST.People; I get:

+----+----------------------------------------------------------------------------------------------------------------------+
| id | variations                                                                                                           |
+----+----------------------------------------------------------------------------------------------------------------------+
|  3 | xp   t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV                                                           |
|  6 | xp                                                                                                                   |
|  9 | xp                                                                                                                   |
| 12 | xp   t !1:84289718-85466763/CNV                                                                                      |
| 15 | xp                                                                                                                   |
| 18 | xp                                                                                                                   |
| 21 | xp                                                                                                                   |
| 24 | xp                                                                                                                   |
| 27 | xp                                                                                                                   |
| 30 | xp   t !10:166909544-166909544/SNPt !2:66903445-66903445/SNPt !2:166897864-166897864/CNVt !7:6892788-6892788/SNP     |
+----+----------------------------------------------------------------------------------------------------------------------+

So after the conversion I would like the TEST.Variations table to contain:

+----+-----+-----------+-----------+----------+
| id | chr | start     | stop      | type     |  
+----+-----+-----------+-----------+----------+
|  3 |   3 | 124093754 | 124467278 | CNV      |
|  3 |   7 |  78030601 |  79638023 | CNV      |
| 12 |   1 |  84289718 |  85466763 | CNV      |
| 30 |  10 | 166909544 | 166909544 | SNP      |
| 30 |   2 |  66903445 |  66903445 | SNP      |
| 30 |   2 | 166897864 | 166897864 | CNV      |
| 30 |   7 |   6892788 |   6892788 | SNP      |
+----+-----+-----------+-----------+----------+

Best answer

Two things first:

  1. Your data for id 3 is inconsistent: there is no ! before 7:... I hope that is just a typo

    xp   t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV
                                      ^^
    
  2. If you want an auto_increment column in your target table, then your schema should look something like this

    CREATE  TABLE variations 
    (
      `var_id` INT NOT NULL AUTO_INCREMENT,
      `id`    INT, -- id from People goes here and it's not UNIQUE
      `chr`   INT,
      `start` INT,
      `stop`  INT ,
      `type`  ENUM('SNP','INDEL','CNV') DEFAULT NULL,
      PRIMARY KEY (`var_id`) 
    );
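
For the first point, one way to look for rows like id 3 before migrating is to compare two counts per blob. This is only a sketch, and it rests on an assumption read off the sample data rather than on any format specification: every variation contains exactly one '/' and every variation should be preceded by the 't !' marker. Rows where the two counts differ are worth checking by hand.

```sql
-- Assumption: one '/' per variation and one 't !' marker per variation,
-- as in the sample rows; the real blob format may differ.
SELECT id,
       LENGTH(variations) - LENGTH(REPLACE(variations, '/', ''))         AS slash_count,
       (LENGTH(variations) - LENGTH(REPLACE(variations, 't !', ''))) / 3 AS marker_count
  FROM people
 WHERE LENGTH(variations) - LENGTH(REPLACE(variations, '/', ''))
    <> (LENGTH(variations) - LENGTH(REPLACE(variations, 't !', ''))) / 3;
```

If the blob for id 3 holds exactly the text shown in the question, that row has two slashes but only one marker, so it should be listed. Rows with an empty blob or a NULL blob are not reported.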
    

Now you can transfer the data from People to Variations with a query

INSERT INTO variations (id, chr, start, stop, type)
SELECT id, 
       SUBSTRING_INDEX(variation, ':', 1) chr,
       SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', 1), ':', -1) start,
       SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', -1), '/', 1) stop,
       SUBSTRING_INDEX(variation, '/', -1) type
  FROM
(
  SELECT p.id, SUBSTRING_INDEX(SUBSTRING_INDEX(p.variations, 't !', n.n), 't !', -1) variation
    FROM 
  (
    SELECT id, SUBSTR(variations, 9) variations
      FROM people 
     WHERE variations LIKE 'xp   t !%'
  ) p CROSS JOIN 
  (
     SELECT a.N + b.N * 10 + 1 n
       FROM 
      (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
     ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
      ORDER BY n
  ) n
   WHERE n.n <= 1 + (LENGTH(p.variations) - LENGTH(REPLACE(p.variations, 't !', ''))) / 3
   ORDER BY id
) q
 ORDER BY id, chr, start, stop, type;
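
To see what the nested SUBSTRING_INDEX calls do, it can help to run them on a single literal. The sketch below uses two user variables, @list and @variation (names chosen here for illustration only), with a value copied from the id 3 row, assuming the missing ! is a typo and the 'xp   t !' prefix has already been stripped.

```sql
-- Illustrative value based on the id 3 row, with the '!' restored.
SET @list = '3:124093754-124467278/CNVt !7:78030601-79638023/CNV';

-- Step 1: pick the n-th element of the 't !'-separated list (here n = 2).
-- The inner call keeps everything before the 2nd separator, or the whole
-- string if there are fewer separators; the outer call keeps what follows
-- the last separator in that result.
SET @variation = SUBSTRING_INDEX(SUBSTRING_INDEX(@list, 't !', 2), 't !', -1);

-- Step 2: cut the element 'chr:start-stop/type' into its four parts.
SELECT @variation                                                    AS variation,
       SUBSTRING_INDEX(@variation, ':', 1)                           AS `chr`,
       SUBSTRING_INDEX(SUBSTRING_INDEX(@variation, '-', 1), ':', -1) AS `start`,
       SUBSTRING_INDEX(SUBSTRING_INDEX(@variation, '-', -1), '/', 1) AS `stop`,
       SUBSTRING_INDEX(@variation, '/', -1)                          AS `type`;
```

Here @variation becomes '7:78030601-79638023/CNV', and the four parts are '7', '78030601', '79638023' and 'CNV'. All of them are strings at this point; they are converted to the column types when inserted into variations.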

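Note: this query splits up to 100 variations per id. If you need more or fewer, you can adjust the limit by editing the inner subquery aliased n, which generates a numbers (tally) table on the fly.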
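
As an illustration of adjusting that limit, the numbers subquery could be extended with a third digit table so that it yields 1 to 1000. This is a sketch following the same pattern as the query above; the alias c is introduced here and is not part of the original query.

```sql
-- Generates n = 1 .. 1000 by combining three digit tables (ones, tens, hundreds).
SELECT a.N + b.N * 10 + c.N * 100 + 1 AS n
  FROM (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
        UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
 CROSS JOIN
       (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
        UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
 CROSS JOIN
       (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
        UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c;
```

Dropping the b table and the b.N * 10 term instead would reduce the range to 1 to 10. Because the outer WHERE clause only keeps values of n up to the number of variations in each row, a larger range costs extra work but does not produce extra rows.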

Result:

| VAR_ID | ID | CHR |     START |      STOP | TYPE |
|--------|----|-----|-----------|-----------|------|
|      1 |  3 |   3 | 124093754 | 124467278 |  CNV |
|      2 |  3 |   7 |  78030601 |  79638023 |  CNV |
|      3 | 12 |   1 |  84289718 |  85466763 |  CNV |
|      4 | 30 |  10 | 166909544 | 166909544 |  SNP |
|      5 | 30 |   2 | 166897864 | 166897864 |  CNV |
|      6 | 30 |   2 |  66903445 |  66903445 |  SNP |
|      7 | 30 |   7 |   6892788 |   6892788 |  SNP |

Here is a SQLFiddle demo

Regarding "MySQL: converting a database column from a blob into separate parts", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/18749635/
