sql - Loading a matrix from a file into a PostgreSQL table

Tags: sql postgresql matrix file-io postgresql-copy

I have a universities.txt file that looks like this:

Alabama

Air University
Alabama A&M University
Alabama State University
Concordia College-Selma
Faulkner University
Huntingdon College
Jacksonville State University
Judson College
Miles College
Oakwood College
Samford University
Southeastern Bible College
Southern Christian University
Spring Hill College
Stillman College
Talladega College
University of North Alabama
University of South Alabama
University of West Alabama

Alaska

Alaska Bible College
Alaska Pacific University
Sheldon Jackson College
University of Alaska - Anchorage
University of Alaska - Fairbanks
University of Alaska - Southeast

Arizona

American Indian College of the Assemblies of God
Arizona State University
Arizona State University East
Arizona State University West
DeVry University-Phoenix
Embry-Riddle Aeronautical University
Grand Canyon University
Northcentral University
Northern Arizona University

... and so on, where in this case Alabama, Alaska and Arizona are locations and everything else is a university. What I want to do is load the locations into a table called Location and the universities into a table called University, where University.Location is a foreign key referencing Location.Id, like this:

CREATE TABLE Location (
Id          SERIAL PRIMARY KEY,
Name        TEXT
);

CREATE TABLE University (
Id          SERIAL PRIMARY KEY,
Location    INTEGER REFERENCES Location (Id) NOT NULL,
Name        TEXT
);

So what I want to do in Postgres is something like this:

for (int i = 0; i < universities.size(); i++) {
    // each entry in the universities vector is a tuple: the first element is the country/state
    // and the second element is a vector of the universities as Strings
    Vector tuple = (Vector) universities.get(i);
    String state = (String) tuple.get(0);
    // insert state into the location table
    Vector u = (Vector) tuple.get(1);
    for (int j = 0; j < u.size(); j++) {
        // insert u.get(j) into the university table, with this location's Id as the FK
    }
}

Does anyone know how to do this?

Best answer

Here is a pure SQL solution.

Use COPY to import your file into a temporary table, and a single DML statement with data-modifying CTEs (requires PostgreSQL 9.1 or later) to do the rest. Both steps should be fast:

A temporary table with a single text column (dropped automatically at the end of the session):

CREATE TEMP TABLE tmp (txt text);

Import the data from the file:

COPY tmp FROM '/path/to/file.txt';

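If you are doing this from a remote client, use the psql meta-command \copy instead (for example \copy tmp FROM '/path/to/file.txt'). COPY ... FROM 'file' reads the file on the database server, while \copy reads it on the client machine.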

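My solution depends on the data format shown in the question: there is an empty line before and after each location name (the code below calls these rows "city"). I assume the import file contains actual empty strings on those lines. Make sure there is a leading row with an empty string before the first location to avoid a special case: otherwise lag() returns NULL for the first location, the test in CTE x evaluates to NULL instead of true, and that location is not recognized.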

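COPY inserts the rows in file order, and I rely on that order in the window functions below instead of sorting. (SQL does not formally guarantee the order of an unsorted scan, but for a freshly filled temp table it matches insertion order in practice.)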
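If you would rather not depend on physical row order at all, one possible variant (my assumption, not part of the original answer) gives tmp a serial column, here hypothetically named rn, that records load order, and then orders every window by it. A minimal sketch, with a few literal rows standing in for the file:

```sql
-- Hypothetical variant of tmp: rn records the order in which rows are loaded
CREATE TEMP TABLE tmp (rn serial, txt text);

-- Leading empty row, so the first location also has an empty predecessor
INSERT INTO tmp (txt) VALUES ('');

-- Stand-in for: COPY tmp (txt) FROM '/path/to/file.txt';
-- rn is filled from its sequence as the rows arrive
INSERT INTO tmp (txt) VALUES
    ('Alabama'), (''), ('Air University'), ('Alabama A&M University'),
    (''), ('Alaska'), (''), ('Alaska Bible College'), ('Alaska Pacific University');

-- Same test as CTE x, but with an explicit sort key in every window
SELECT rn, txt
      ,lead(txt) OVER w = '' AND
        lag(txt) OVER w = '' AS city
FROM   tmp
WINDOW w AS (ORDER BY rn)
ORDER  BY rn;
```

With this sample, only Alabama (rn 2) and Alaska (rn 7) come back with city = true. In the full query, CTE x would then select rn directly and use OVER (ORDER BY rn) for lead() and lag() instead of row_number() OVER ().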

WITH x AS (
    SELECT txt
          ,row_number() OVER () AS rn
          ,lead(txt) OVER () = '' AND
            lag(txt) OVER () = '' AS city
    FROM   tmp                -- don't remove empty rows just yet
    ), y AS (
    SELECT txt, city
          ,sum(city::int) OVER w AS id
    FROM   x
    WHERE  txt <> ''          -- remove empty rows now
    WINDOW w AS (ORDER BY rn)
    ), l AS (
    INSERT INTO location (id, name)
    SELECT id, txt
    FROM   y
    WHERE  city
    ), u AS (
    INSERT INTO university (location, name)
    SELECT id, txt
    FROM   y
    WHERE  NOT city
    )
SELECT setval('location_id_seq', max(id))
FROM   y;

Voilà.

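  • CTE x flags locations (column city) based on the empty-string values in the rows before and after them.

  • CTE y removes the empty rows and adds a running sum of the location flags (id), which yields a perfectly valid id for each location and for the universities that belong to it.

  • CTEs l and u do the inserts, which is easy now.

  • The final SELECT sets the sequence attached to location.id. We have not used that sequence yet (the ids were supplied explicitly), so we have to set it to the current maximum; otherwise future INSERTs into location would run into duplicate key errors.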
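To see what x and y produce before anything is inserted, here is a self-contained sketch on a few hypothetical sample lines. It swaps the temp table for unnest(...) WITH ORDINALITY (PostgreSQL 9.4 or later), so the row numbers come from the array order rather than from physical table order, and it only SELECTs instead of inserting:

```sql
-- Hypothetical sample lines, including the leading empty string.
-- WITH ORDINALITY numbers them 1..n in array order (stands in for rn).
WITH x AS (
    SELECT txt, rn
          ,lead(txt) OVER w = '' AND
            lag(txt) OVER w = '' AS city           -- empty line before and after
    FROM   unnest(ARRAY['', 'Alabama', '', 'Air University', 'Alabama A&M University',
                        '', 'Alaska', '', 'Alaska Bible College', 'Alaska Pacific University'])
           WITH ORDINALITY AS t(txt, rn)
    WINDOW w AS (ORDER BY rn)
    ), y AS (
    SELECT rn, txt, city
          ,sum(city::int) OVER (ORDER BY rn) AS id  -- running count of locations so far
    FROM   x
    WHERE  txt <> ''                              -- drop empty lines after flagging
    )
SELECT txt, city, id
FROM   y
ORDER  BY rn;
```

Alabama and its two universities get id 1, Alaska and its two universities get id 2: each university row carries the id of the location above it, which is what CTEs l and u insert.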

A similar question, "sql - Loading a matrix from a file into a PostgreSQL table", can be found on Stack Overflow: https://stackoverflow.com/questions/10407545/
