I have a universities.txt file that looks like this:
Alabama

Air University
Alabama A&M University
Alabama State University
Concordia College-Selma
Faulkner University
Huntingdon College
Jacksonville State University
Judson College
Miles College
Oakwood College
Samford University
Southeastern Bible College
Southern Christian University
Spring Hill College
Stillman College
Talladega College
University of North Alabama
University of South Alabama
University of West Alabama

Alaska

Alaska Bible College
Alaska Pacific University
Sheldon Jackson College
University of Alaska - Anchorage
University of Alaska - Fairbanks
University of Alaska - Southeast

Arizona

American Indian College of the Assemblies of God
Arizona State University
Arizona State University East
Arizona State University West
DeVry University-Phoenix
Embry-Riddle Aeronautical University
Grand Canyon University
Northcentral University
Northern Arizona University
... and so on, where in this case Alabama, Alaska and Arizona are locations and everything else is a university. What I want to do is load the locations into a table called Location and the universities into a table called University, where the Location column of the University table is a foreign key referencing the Id of the Location table, like this:
CREATE TABLE Location (
Id SERIAL PRIMARY KEY,
Name TEXT
);
CREATE TABLE University (
Id SERIAL PRIMARY KEY,
Location INTEGER REFERENCES Location (Id) NOT NULL,
Name TEXT
);
So what I want to do in Postgres is something like this:
for (int i = 0; i < universities.size(); i++) {
    // each entry in the universities vector is a tuple whose first entry is the country/state
    // and whose second entry is a vector of the universities as Strings
    Vector tuple = (Vector) universities.get(i);
    // insert into location table
    String state = (String) tuple.get(0);
    Vector u = (Vector) tuple.get(1);
    for (int j = 0; j < u.size(); j++) {
        // insert into university table with i as FK to location table
    }
}
Does anyone know how to do this?
Best answer
Here is a pure SQL solution.
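Use COPY to import your file into a temporary table, and a single DML statement with data-modifying CTEs (requires PostgreSQL 9.1 or later) to do the rest. Both steps should be fast.
Temporary table with a single text column (dropped automatically at the end of the session):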
CREATE TEMP TABLE tmp (txt text);
Import the data from the file:
COPY tmp FROM '/path/to/file.txt';
If you are doing this from a remote client, use the psql meta-command \copy instead.
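For example, from a psql session on the client machine. This is only a sketch: the file name is taken from the question, and it is assumed to sit in the directory psql was started from.

```sql
-- psql meta-command: psql reads the file on the client and streams it to the server.
-- The whole command has to be written on a single line.
\copy tmp FROM 'universities.txt'
```

Unlike COPY, the path here is resolved on the client, so the server does not need access to the file.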
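My solution depends on the data format shown in the question, that is: there is an empty line before and after each city (the state or country names that go into the Location table). I assume the import file contains actual empty strings on those lines. Make sure there is a leading row with an empty string before the first city, to avoid a special case.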
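Rows are inserted in the order in which they appear in the file. I rely on that for the window functions below, which use an empty OVER () with no ORDER BY. (A freshly loaded table behaves this way in practice, but SQL itself does not guarantee any row order without ORDER BY.)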
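Both assumptions (a leading empty row and a stable physical row order) can also be made explicit in the staging step. The following is a minimal sketch of such a variant and not part of the original answer: the table tmp_ordered and its column rn are hypothetical names, and the sample rows stand in for the file.

```sql
-- Hypothetical staging table: a serial column records the load order.
CREATE TEMP TABLE tmp_ordered (
    rn  serial,
    txt text
);

-- With a real file, a column list makes COPY fill only txt; rn comes from the sequence:
-- COPY tmp_ordered (txt) FROM '/path/to/file.txt';
-- Sample rows standing in for the file (note: no leading empty row).
INSERT INTO tmp_ordered (txt) VALUES
    ('Alabama'), (''), ('Air University'), ('Alabama State University'),
    (''), ('Alaska'), (''), ('Alaska Bible College');

-- ORDER BY rn states the row order explicitly; the three-argument form of lag()
-- supplies '' for the first row, so no leading empty line is needed.
SELECT rn, txt,
       lead(txt) OVER w = '' AND
       lag(txt, 1, ''::text) OVER w = '' AS city
FROM   tmp_ordered
WINDOW w AS (ORDER BY rn)
ORDER  BY rn;
```

The same WINDOW clause could replace the empty OVER () in the query that follows, at the cost of one extra column in the staging table.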
WITH x AS (
    SELECT txt
          ,row_number() OVER () AS rn
          ,lead(txt) OVER () = '' AND
           lag(txt)  OVER () = '' AS city
    FROM   tmp                      -- don't remove empty rows just yet
), y AS (
    SELECT txt, city
          ,sum(city::int) OVER w AS id
    FROM   x
    WHERE  txt <> ''                -- remove empty rows now
    WINDOW w AS (ORDER BY rn)
), l AS (
    INSERT INTO location (id, name)
    SELECT id, txt
    FROM   y
    WHERE  city
), u AS (
    INSERT INTO university (location, name)
    SELECT id, txt
    FROM   y
    WHERE  NOT city
)
SELECT setval('location_id_seq', max(id))
FROM   y;
Voilà.
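Before running the inserts, it can help to look at what x and y produce. The following is a minimal sketch that runs the two CTEs as a plain SELECT; the staging table tmp_demo and its sample rows are assumptions for illustration only.

```sql
-- Hypothetical staging table with a few sample rows, including the leading empty row.
CREATE TEMP TABLE tmp_demo (txt text);
INSERT INTO tmp_demo (txt) VALUES
    (''), ('Alabama'), (''), ('Air University'), ('Alabama State University'),
    (''), ('Alaska'), (''), ('Alaska Bible College'), ('Alaska Pacific University');

WITH x AS (
    SELECT txt,
           row_number() OVER () AS rn,
           lead(txt) OVER () = '' AND
           lag(txt)  OVER () = '' AS city   -- true only between two empty rows
    FROM   tmp_demo
), y AS (
    SELECT rn, txt, city,
           sum(city::int) OVER (ORDER BY rn) AS id   -- running count of cities so far
    FROM   x
    WHERE  txt <> ''
)
SELECT id, city, txt
FROM   y
ORDER  BY rn;
```

Assuming the rows are read back in insertion order, Alabama and its universities share id 1 and Alaska and its universities share id 2, with city true only for the two location rows. Note that with this test a location that has exactly one university would see that university flagged as a city too, since it also sits between two empty rows.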
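CTE x marks cities based on the empty-string values in the rows before and after them. CTE y adds a running sum of cities (id), which yields a perfectly valid id for each city and for the universities that belong to it. CTEs l and u perform the inserts, which is easy at this point. The final SELECT sets the next value of the sequence attached to location.id. We have not used that sequence so far, so we have to set it to the current maximum; otherwise future INSERTs into location would fail with duplicate key errors.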
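The effect of the setval() call can be seen in isolation. The following is a minimal sketch on a hypothetical table demo_location, not the Location table from the question: rows inserted with explicit ids leave the sequence untouched, and setval() moves it past them.

```sql
-- Hypothetical demo table; the serial column creates the sequence demo_location_id_seq.
CREATE TEMP TABLE demo_location (
    id   serial PRIMARY KEY,
    name text
);

-- Explicit ids bypass the sequence, so it still sits at its start value.
INSERT INTO demo_location (id, name)
VALUES (1, 'Alabama'), (2, 'Alaska'), (3, 'Arizona');

-- Without this call, the next default id would be 1 and violate the primary key.
SELECT setval('demo_location_id_seq', (SELECT max(id) FROM demo_location));

-- The column default now yields max(id) + 1, so this row gets id 4.
INSERT INTO demo_location (name) VALUES ('Arkansas') RETURNING id;
```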
Source: a similar question on Stack Overflow, "sql - Load a matrix from a file into PostgreSQL tables": https://stackoverflow.com/questions/10407545/