xml - Problem importing XML data into PostgreSQL 9.5.12, Ubuntu 16.04.4

Tags: xml postgresql xpath

Note: this is a question about an error I ran into while trying to follow the advice in import-xml-files-to-postgresql

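I'm trying to import a single-row XML file to test the code I'll need to import all the rows, of which there should be more than 600,000. My XML looks like this: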

<response>
<row>
<row _id="1" _uuid="7A68A6C8-3E73-4976-A4BD-9995F97A580F" _position="1" _address="https://data.kcmo.org/resource/vrys-qgrz/1">
<objectid>471537</objectid>
<parcelid>2960</parcelid>
<kivapin>100064</kivapin>
<subdivision></subdivision>
<landusecode>1111 - Single Family (Non-Mobile Home Park)</landusecode>
<apn>CL1330600060270001</apn>
<parceltype>Parcels</parceltype>
<status>2 - Existing</status>
<condo>No</condo>
<prefix>N</prefix>
<own_name>Smith John</own_name>
<own_addr>123 Main Street</own_addr>
<own_city>Kansas City</own_city>
<own_zip>64114-1234</own_zip>
<shape_length>410.3620269</shape_length>
<shape_area>9314.662882</shape_area>
<latitude>39.2636</latitude>
<longitude>-94.5698</longitude>
<location_1 human_address="{&quot;address&quot;:&quot;123 Main Street&quot;,&quot;city&quot;:&quot;Kansas City&quot;,&quot;state&quot;:&quot;MO&quot;,&quot;zip&quot;:&quot;64114-1234&quot;}" latitude="39.2636" longitude="-94.5698" needs_recoding="false"/>
</row>
</row>
</response>

The query I'm using to extract the fields for insertion into a database table is:

SELECT
  (xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
  (xpath('//parcelid/text()', myTempTable.myXmlColumn))[1]::text AS parcelid,
  (xpath('//kivapin/text()', myTempTable.myXmlColumn))[1]::text AS kivapin,
  (xpath('//subdivision/text()', myTempTable.myXmlColumn))[1]::text AS subdivision,
  (xpath('//block/text()', myTempTable.myXmlColumn))[1]::text AS block,
  (xpath('//lot/text()', myTempTable.myXmlColumn))[1]::text AS lot,
  (xpath('//datecreated/text()', myTempTable.myXmlColumn))[1]::text AS datecreated,
  (xpath('//landusecode/text()', myTempTable.myXmlColumn))[1]::text AS landusecode,
  (xpath('//apn/text()', myTempTable.myXmlColumn))[1]::text AS apn,
  (xpath('//parceltype/text()', myTempTable.myXmlColumn))[1]::text AS parceltype,
  (xpath('//status/text()', myTempTable.myXmlColumn))[1]::text AS status,
  (xpath('//condo/text()', myTempTable.myXmlColumn))[1]::text AS condo,
  (xpath('//platname/text()', myTempTable.myXmlColumn))[1]::text AS platname,
  (xpath('//fraction/text()', myTempTable.myXmlColumn))[1]::text AS fraction,
  (xpath('//prefix/text()', myTempTable.myXmlColumn))[1]::text AS prefix,
  (xpath('//suite/text()', myTempTable.myXmlColumn))[1]::text AS suite,
  (xpath('//own_name/text()', myTempTable.myXmlColumn))[1]::text AS own_name,
  (xpath('//own_addr/text()', myTempTable.myXmlColumn))[1]::text AS own_addr,
  (xpath('//own_city/text()', myTempTable.myXmlColumn))[1]::text AS own_city,
  (xpath('//own_zip/text()', myTempTable.myXmlColumn))[1]::text AS own_zip,
  (xpath('//blvdfront/text()', myTempTable.myXmlColumn))[1]::text AS blvdfront,
  (xpath('//lastupdate/text()', myTempTable.myXmlColumn))[1]::text AS lastupdate,
  (xpath('//shape_length/text()', myTempTable.myXmlColumn))[1]::text AS shape_length,
  (xpath('//shape_area/text()', myTempTable.myXmlColumn))[1]::text AS shape_area,
  (xpath('//latitude/text()', myTempTable.myXmlColumn))[1]::text AS latitude,
  (xpath('//longitude/text()', myTempTable.myXmlColumn))[1]::text AS longitude,
  (xpath('//location1/text()' myTempTable.myXmlColumn))[1]::text AS location1,
  myTempTable.myXmlColumn as myXmlElement
FROM unnest(
  '//row'
  ,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8'))
) AS myTempTable(myXmlColumn);

Running this statement produces the following error:

[2018-03-26 19:42:50] Using batch mode (1000 insert/update/delete statements max)
SELECT
(xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
(xpath('//parcelid/text()', myTempTable.myXmlColumn))[1]::text AS parcelid,
(xpath('//kivapin/text()', myTempTable.myXmlColumn))[1]::text AS kivapin,
...
[2018-03-26 19:42:50] [42601] ERROR: syntax error at or near "myTempTable"
[2018-03-26 19:42:50] Position: 2058
[2018-03-26 19:42:50] Summary: 1 of 1 statements executed, 1 failed in 380ms (2293 symbols in file)

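I thought this might be caused by a syntax error somewhere in the body of the query, so I ran only the first xpath expression, but that gave an error too: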

[2018-03-26 19:46:17] Using batch mode (1000 insert/update/delete statements max)
SELECT
(xpath('//objectid/text()', myTempTable.myXmlColumn))[1]::text AS objectid,
myTempTable.myXmlColumn as myXmlElement
FROM unnest(
'//row'
,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8...
[2018-03-26 19:46:17] [42804] ERROR: could not determine polymorphic type because input has type "unknown"
[2018-03-26 19:46:17] Summary: 1 of 1 statements executed, 1 failed in 385ms (273 symbols in file)

I'm not sure where to go from here.

Best Answer

Once the XML document is in your table, you can parse it with something like this:

 WITH j AS (SELECT UNNEST(XPATH('//row',myXmlColumn)) AS myXmlColumn
 FROM myTempTable)
 SELECT
      (xpath('//objectid/text()', j.myXmlColumn))[1]::text AS objectid,
      (xpath('//parcelid/text()', j.myXmlColumn))[1]::text AS parcelid,
      (xpath('//kivapin/text()', j.myXmlColumn))[1]::text AS kivapin,
      (xpath('//subdivision/text()', j.myXmlColumn))[1]::text AS subdivision,
      (xpath('//block/text()', j.myXmlColumn))[1]::text AS block,
      (xpath('//lot/text()', j.myXmlColumn))[1]::text AS lot,
      (xpath('//datecreated/text()', j.myXmlColumn))[1]::text AS datecreated,
      (xpath('//landusecode/text()', j.myXmlColumn))[1]::text AS landusecode,
      (xpath('//apn/text()', j.myXmlColumn))[1]::text AS apn,
      (xpath('//parceltype/text()', j.myXmlColumn))[1]::text AS parceltype,
      (xpath('//status/text()', j.myXmlColumn))[1]::text AS status,
      (xpath('//condo/text()', j.myXmlColumn))[1]::text AS condo,
      (xpath('//platname/text()', j.myXmlColumn))[1]::text AS platname,
      (xpath('//fraction/text()', j.myXmlColumn))[1]::text AS fraction,
      (xpath('//prefix/text()', j.myXmlColumn))[1]::text AS prefix,
      (xpath('//suite/text()', j.myXmlColumn))[1]::text AS suite,
      (xpath('//own_name/text()', j.myXmlColumn))[1]::text AS own_name,
      (xpath('//own_addr/text()', j.myXmlColumn))[1]::text AS own_addr,
      (xpath('//own_city/text()', j.myXmlColumn))[1]::text AS own_city,
      (xpath('//own_zip/text()', j.myXmlColumn))[1]::text AS own_zip,
      (xpath('//blvdfront/text()', j.myXmlColumn))[1]::text AS blvdfront,
      (xpath('//lastupdate/text()', j.myXmlColumn))[1]::text AS lastupdate,
      (xpath('//shape_length/text()', j.myXmlColumn))[1]::text AS shape_length,
      (xpath('//shape_area/text()', j.myXmlColumn))[1]::text AS shape_area,
      (xpath('//latitude/text()', j.myXmlColumn))[1]::text AS latitude,
      (xpath('//longitude/text()', j.myXmlColumn))[1]::text AS longitude,
      (xpath('//location1/text()', j.myXmlColumn))[1]::text AS location1,
      j.myXmlColumn as myXmlElement
    FROM j
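For checking the extraction logic outside the database, here is a rough Python analogue of the query above. It is only a sketch: it uses the standard library's xml.etree.ElementTree as a stand-in for PostgreSQL's XPATH, FIELDS is a hypothetical subset of the column list, and extract_rows/first_text are illustrative names. It shows why an empty element such as <subdivision></subdivision>, or a tag that never occurs such as block, ends up as NULL (None here): the text() result is empty, so [1] has nothing to return. It also makes one thing easy to spot: the sample has no location1 element, only a self-closing location_1 whose data sits in attributes, so the location1 column will always be NULL.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical subset of the column names from the query; the rest work the same way.
FIELDS = ["objectid", "parcelid", "kivapin", "subdivision", "block", "landusecode", "own_zip"]


def first_text(row: ET.Element, tag: str) -> Optional[str]:
    """Approximates (xpath('//tag/text()', row))[1]::text.

    Returns the text of the first <tag> descendant that has any text, or None
    when no such element exists or every match is empty (PostgreSQL yields
    NULL when the array returned by xpath() is empty).
    """
    for el in row.iter(tag):  # document order, like an XPath node set
        if el.text is not None:
            return el.text
    return None


def extract_rows(doc: str) -> List[Dict[str, Optional[str]]]:
    """One dict per inner <row>, mirroring one output row of the CTE query."""
    root = ET.fromstring(doc)  # root element is <response>
    # Only the inner <row> elements carry data; the outer <row> is a wrapper.
    return [{f: first_text(r, f) for f in FIELDS} for r in root.findall("./row/row")]


if __name__ == "__main__":
    sample = (
        '<response><row><row _id="1"><objectid>471537</objectid>'
        "<subdivision></subdivision><own_zip>64114-1234</own_zip></row></row></response>"
    )
    print(extract_rows(sample))
```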

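CTEs are not always my first choice when dealing with large amounts of data, but they do make the code more readable and are worth considering for data imports.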

For importing XML files into PostgreSQL, I always use COPY, with an intermediate table that stores the XML document before it is unnested.

Similar to what is described here:

$ psql db -c "CREATE TABLE tmp (doc XML);"
$ cat xmlfile.xml | psql db -c "COPY tmp FROM STDIN"

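If PostgreSQL complains about the newlines (\n) in your data (COPY's text format treats each line as a separate row), you can preprocess the file with a tool like sed, tr or even perl -pe: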

$ cat xmlfile.xml | perl -pe 's/\n/\\n/g' | psql db -c "COPY tmp FROM STDIN"
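In COPY's text format, tab is the default column delimiter and backslash starts an escape sequence, so escaping only \n can still fail if the XML contains tabs or backslashes. Below is a minimal Python sketch of a fuller escape, assuming the whole file should land in the single doc column of tmp; the script name escape_for_copy.py and the function name are hypothetical.

```python
import sys


def to_copy_text_field(value: str) -> str:
    """Escape a string so COPY ... FROM STDIN (text format) reads it as one field."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so the escapes added below are not doubled
        .replace("\t", "\\t")       # tab would otherwise be read as a column delimiter
        .replace("\r", "\\r")
        .replace("\n", "\\n")       # newline would otherwise end the row
    )


if __name__ == "__main__":
    # Hypothetical usage:
    #   python escape_for_copy.py < xmlfile.xml | psql db -c "COPY tmp FROM STDIN"
    doc = sys.stdin.read()
    # One escaped line, terminated by a real newline, becomes one row in tmp.
    sys.stdout.write(to_copy_text_field(doc) + "\n")
```

Doubling backslashes first matters: if it ran last, the backslashes just introduced by \t, \r and \n would be doubled as well.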

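By the way: your query is missing a comma right after this xpath expression, which is what triggers the first error (syntax error at or near "myTempTable"): (xpath('//location1/text()' myTempTable.myXmlColumn))[1]::text AS location1,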

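Edit: If you are lucky enough to have the file directly on the database server's file system (most of us aren't), you can keep using pg_read_binary_file and convert_from with UNNEST. Keep in mind, though, that a bare literal such as '//row' has type unknown, which makes it tricky to pass as an argument to a polymorphic function like UNNEST; that is the "could not determine polymorphic type" error you got. Instead, let a simple XPATH expression do the job, so that UNNEST receives an xml[] array: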

SELECT
...
FROM UNNEST(XPATH(
  '//row'
  ,XMLPARSE(DOCUMENT convert_from(pg_read_binary_file('parcel_data_first_row.xml'), 'UTF8')))
) AS myTempTable(myXmlColumn);
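The array that UNNEST expands here is whatever XPATH('//row', ...) matches. '//row' selects every row element at any depth, and in the sample document that includes the outer wrapper <row> as well as the inner <row _id="1">. A quick way to see this is the sketch below. It uses Python's xml.etree.ElementTree as a stand-in for PostgreSQL's XPATH; ElementTree implements only a subset of XPath, so this illustrates the path semantics and does not test PostgreSQL itself. SAMPLE and matching_rows are illustrative names.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed-down copy of the sample document from the question.
SAMPLE = """<response>
<row>
<row _id="1" _uuid="7A68A6C8-3E73-4976-A4BD-9995F97A580F" _position="1">
<objectid>471537</objectid>
</row>
</row>
</response>"""


def matching_rows(doc: str, path: str) -> List[ET.Element]:
    """Return the elements selected by an ElementTree path (a subset of XPath)."""
    root = ET.fromstring(doc)  # root element is <response>
    return root.findall(path)


if __name__ == "__main__":
    every_row = matching_rows(SAMPLE, ".//row")      # like '//row': rows at any depth
    inner_rows = matching_rows(SAMPLE, "./row/row")  # like '/response/row/row'
    print(len(every_row), len(inner_rows))           # prints: 2 1
```

With '//row', the wrapper element becomes an extra output row whose columns come from the first matching element anywhere in the file, and with 600,000 records that wrapper is also one very large value to run xpath() against. A more specific path such as '/response/row/row' should select only the data rows, though that is an untested suggestion for this exact file.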

For this question (xml - Problem importing XML data into PostgreSQL 9.5.12, Ubuntu 16.04.4), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49502780/
