python - Combining two columns of a join into one column of the result

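I am trying to find a time-efficient way to collapse two tables that we normally join into a single table. The tables hold readings: table A contains the reading type, and table B contains an FK to table A plus the actual reading value. On our production server, each table is about 1 GB.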

The schemas of the two tables are as follows.

Table A

id | fk_id | timestamp | type
 1 |   1   | 1510155021| type A
 2 |   1   | 1510155021| type B

Table B

id | fk_to_a | value 
1  |   1     | 30.5
2  |   2     | 50.7

We normally run a join that looks like this:

select * 
from a 
join b
on b.fk_to_a = a.id
order by a.timestamp desc

The key point here is that the join returns a set of rows in which every row n has a "partner" row n+1.

An example result of the join is

a.id | a.fk_id | a.timestamp | a.type | b.id | b.fk_to_a | b.reading
  1  |   1     |  1510155021 | type A |   1  |    1      | 30.5
  2  |   1     |  1510155021 | type B |   2  |    2      | 50.7

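The first row is n and the second row is n+1. The only thing n and n+1 have in common is their timestamp, which is always identical.

We want to compress these two rows into a single row, like this: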

c.id | c.fk_id | c.timestamp | c.a_reading | c.b_reading
 1   |    1    |  1510155021 |     30.5    |   50.7

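I currently have a very, very basic migration script. In it I use Python to run the query and store the join result, iterate over that result (which takes hours) to find n and n+1 so I can build the "pairs", and then write those pairs to the new table through INSERT statements.

Here is my for loop that iterates over the join result; it accounts for 99% of the time this ETL job takes.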

# Above this point: the join query and database initialization. I'm using pymysql.
combinedList = []
eventList = list(cursor.fetchall())
for idx, row in enumerate(eventList):
    # Compare against the current length: the list shrinks as partner rows are deleted.
    if (idx + 1) < len(eventList):
        # Column 2 is a.timestamp; equal timestamps mark rows n and n+1 as a pair.
        if eventList[idx][2] == eventList[idx+1][2]:
            insertStatement = 'INSERT INTO c (fk_to_a, timestamp, a_reading, b_reading) VALUES('
            insertStatement += str(eventList[idx][1]) + ',' + str(eventList[idx][2]) + ',' + str(eventList[idx][6]) + ',' + str(eventList[idx+1][6]) + ');'
            combinedList.append(insertStatement)
            # Drop the partner row so it is not processed again.
            del eventList[idx+1]
    else:
        print('end of the events')

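I know my migration strategy has room for improvement. Does anyone have experience with something like what I am trying to do?

Thank you for taking the time to read this.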

Best answer

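The whole purpose of a JOIN is to combine two separate rows into one, whether those rows come from different tables or from the same table. You can write a relatively simple query that produces the rows you want, for example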

select
  a1.id as id,
  a1.fk_id as fk_id,
  a1.timestamp as timestamp,
  b1.reading as a_reading,
  b2.reading as b_reading
from
  a as a1
  join a as a2 on a1.timestamp = a2.timestamp
  join b as b1 on b1.fk_to_a = a1.id
  join b as b2 on b2.fk_to_a = a2.id
where
  a1.type = 'type A' and a2.type = 'type B'

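In MySQL you can combine a query like this with a CREATE TABLE ... SELECT statement or an INSERT INTO ... SELECT statement (depending on whether the target table already exists) to populate the new table, keeping everything inside the database. Keeping the work in the database should bring a substantial improvement, since the rows never have to be transferred to the Python client and back.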
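As a minimal sketch of the INSERT INTO ... SELECT approach, the following uses Python's built-in sqlite3 module with an in-memory database so that it runs without a server. That substitution is an assumption made purely for illustration; against the MySQL server from the question, the same statement would be sent through a pymysql cursor instead. Table and column names follow the query above, and the definition of table c is guessed from the desired output shown in the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Recreate the sample data from the question (b's value column is named
# "reading" here, matching the answer's query).
cur.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, fk_id INTEGER, timestamp INTEGER, type TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, fk_to_a INTEGER, reading REAL);
CREATE TABLE c (id INTEGER PRIMARY KEY, fk_id INTEGER, timestamp INTEGER,
                a_reading REAL, b_reading REAL);
INSERT INTO a VALUES (1, 1, 1510155021, 'type A'), (2, 1, 1510155021, 'type B');
INSERT INTO b VALUES (1, 1, 30.5), (2, 2, 50.7);
""")

# The pairing happens entirely inside the database: no rows are fetched
# into Python and no per-row INSERT strings are built.
cur.execute("""
INSERT INTO c (fk_id, timestamp, a_reading, b_reading)
SELECT a1.fk_id, a1.timestamp, b1.reading, b2.reading
FROM a AS a1
JOIN a AS a2 ON a1.timestamp = a2.timestamp
JOIN b AS b1 ON b1.fk_to_a = a1.id
JOIN b AS b2 ON b2.fk_to_a = a2.id
WHERE a1.type = 'type A' AND a2.type = 'type B'
""")
conn.commit()

rows = cur.execute(
    "SELECT id, fk_id, timestamp, a_reading, b_reading FROM c"
).fetchall()
print(rows)  # [(1, 1, 1510155021, 30.5, 50.7)]
```

The two source rows collapse into the single row the question asks for, and the client only issues one statement regardless of how many pairs exist.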

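Suitable indexes on the original tables may help query performance. You may also find it more efficient to create any indexes you need on the result table only after it has been initially populated.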
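The ordering of those steps could look roughly like the sketch below. It again assumes an in-memory SQLite database (via Python's sqlite3 module) in place of MySQL, and the index names idx_a_timestamp, idx_b_fk_to_a and idx_c_timestamp are hypothetical. Which indexes actually pay off on the real 1 GB tables is something to measure rather than assume.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, fk_id INTEGER, timestamp INTEGER, type TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, fk_to_a INTEGER, reading REAL);
""")

# Made-up sample data: three timestamps, each with a 'type A' and a 'type B' row.
for i in range(3):
    ts = 1510155021 + i
    for kind, reading in (("type A", 30.5 + i), ("type B", 50.7 + i)):
        cur.execute(
            "INSERT INTO a (fk_id, timestamp, type) VALUES (?, ?, ?)", (1, ts, kind)
        )
        # lastrowid is the id of the row just inserted into a.
        cur.execute(
            "INSERT INTO b (fk_to_a, reading) VALUES (?, ?)", (cur.lastrowid, reading)
        )

# 1. Index the join columns of the source tables before the big query.
cur.execute("CREATE INDEX idx_a_timestamp ON a (timestamp)")
cur.execute("CREATE INDEX idx_b_fk_to_a ON b (fk_to_a)")

# 2. Create and populate the result table while it has no indexes of its own.
cur.execute("""
CREATE TABLE c AS
SELECT a1.fk_id AS fk_id, a1.timestamp AS timestamp,
       b1.reading AS a_reading, b2.reading AS b_reading
FROM a AS a1
JOIN a AS a2 ON a1.timestamp = a2.timestamp
JOIN b AS b1 ON b1.fk_to_a = a1.id
JOIN b AS b2 ON b2.fk_to_a = a2.id
WHERE a1.type = 'type A' AND a2.type = 'type B'
""")

# 3. Only now add the indexes the new table needs.
cur.execute("CREATE INDEX idx_c_timestamp ON c (timestamp)")
conn.commit()

pair_count = cur.execute("SELECT COUNT(*) FROM c").fetchone()[0]
index_names = sorted(
    r[0] for r in cur.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(pair_count, index_names)
```

Building the index on c after the bulk load means it is constructed once over the finished data instead of being updated on every inserted row.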

This question, "python - Combining two columns of a join into one column of the result", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/47184102/
