xml - Importing an Infopath .XML form into a data frame in R

Tags: xml r xml-parsing infopath infopath-forms-services

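What is the best way to import an Infopath .XML form into R and convert it to a data frame? If I open the Infopath .XML file in Excel, the rows and columns of the data frame display correctly.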

Here is what I tried in R using the XML package:

  1. I used xmlParse() to parse the XML file
  2. I used xmlToDataFrame() to try to convert the parsed XML file to a data frame

However, at step 2 I get the following error:

Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("touch your head13011000",  : 
  duplicate subscripts for columns

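When I open the XML file in Excel, though, there do not appear to be any duplicate columns. How can I convert this Infopath XML file to a data frame in R? The expected columns (as they appear in Excel) are: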

TCID, DateCoded, tcAge, T1_B3, T1_B2, T1_B1, T1_B0, T1_A3, T1_A2, T1_A1, T1_A0, T1_DelayTotal, T2_A3, T2_A2, T2_A1, T2_A0, T2_B3, T2_B2, T2_B1, T2_B0, T2_DelayTotal, Coder, notes_t1, note_t2, bachildpres30, baparpres30, bapassptgo, bapassptnogo, bamissgame, P1_B3, P1_B2, P1_B1, P1_B0, P1_A3, P1_A2, P1_A1, P1_A0, P1_DelayTotal, P1_action, P1_go-nogo, P1_score, P1_delay, P1_trial, P1_Ecommand, P1_imitation, P1_restraint, P1_ruleswitch, P1_trials, P1_gotrials, P1_nogotrials, T1_gotrials, T1_nogotrials, T1_trials, T2_gotrials, T2_nogotrials, T2_trials, P1_notplay, T1_trial, T1_go-nogo, T1_score, T1_delay, T1_action, T2_trial, T2_go-nogo, T2_score, T2_delay, T2_action

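For variables that occur more than once in the XML file, I would like them in long format in the data frame (that is, multiple rows for the same variable). I don't have much experience with XML files, so any help is greatly appreciated.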

Here is what the parsed XML file looks like in R after calling xmlParse():

<my:myFields lang="en-us" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:st="urn:schemas-microsoft-com:office:smarttags" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2009-07-01T18:12:59" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003">
 <my:SPSS>
  <my:TCID>10</my:TCID>
  <my:DateCoded>2013-04-01</my:DateCoded>
  <my:tcAge>30</my:tcAge>
  <my:T1_B3>6</my:T1_B3>
  <my:T1_B2>0</my:T1_B2>
  <my:T1_B1>0</my:T1_B1>
  <my:T1_B0>0</my:T1_B0>
  <my:T1_A3>0</my:T1_A3>
  <my:T1_A2>0</my:T1_A2>
  <my:T1_A1>1</my:T1_A1>
  <my:T1_A0>5</my:T1_A0>
  <my:T1_DelayTotal>1</my:T1_DelayTotal>
  <my:T2_A3 nil="true"/>
  <my:T2_A2 nil="true"/>
  <my:T2_A1 nil="true"/>
  <my:T2_A0 nil="true"/>
  <my:T2_B3 nil="true"/>
  <my:T2_B2 nil="true"/>
  <my:T2_B1 nil="true"/>
  <my:T2_B0 nil="true"/>
  <my:T2_DelayTotal nil="true"/>
  <my:Coder>Name</my:Coder>
 </my:SPSS>
 <my:notes_t1/>
 <my:note_t2/>
 <my:bachildpres30>0</my:bachildpres30>
 <my:baparpres30>0</my:baparpres30>
 <my:bapassptgo>1</my:bapassptgo>
 <my:bapassptnogo>0</my:bapassptnogo>
 <my:bamissgame>0</my:bamissgame>
 <my:P1_B3>4</my:P1_B3>
 <my:P1_B2>0</my:P1_B2>
 <my:P1_B1>0</my:P1_B1>
 <my:P1_B0>1</my:P1_B0>
 <my:P1_A3>0</my:P1_A3>
 <my:P1_A2>0</my:P1_A2>
 <my:P1_A1>1</my:P1_A1>
 <my:P1_A0>3</my:P1_A0>
 <my:P1_DelayTotal>0</my:P1_DelayTotal>
 <my:group2>
  <my:group3>
   <my:P1_action>touch your head</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>1</my:P1_trial>
   <my:P1_Ecommand>1</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your nose</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>2</my:P1_trial>
   <my:P1_Ecommand>1</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your tummy</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>3</my:P1_trial>
   <my:P1_Ecommand>1</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your head</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>4</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your head</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>5</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your nose</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>6</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>clap your hands</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>7</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your nose</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>8</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your ears</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>9</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your tummy</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>0</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>10</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your eyes</my:P1_action>
   <my:P1_go-nogo>0</my:P1_go-nogo>
   <my:P1_score>1</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>11</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>1</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
  <my:group3>
   <my:P1_action>touch your eyes</my:P1_action>
   <my:P1_go-nogo>1</my:P1_go-nogo>
   <my:P1_score>3</my:P1_score>
   <my:P1_delay>0</my:P1_delay>
   <my:P1_trial>12</my:P1_trial>
   <my:P1_Ecommand>0</my:P1_Ecommand>
   <my:P1_imitation>0</my:P1_imitation>
   <my:P1_restraint>0</my:P1_restraint>
   <my:P1_ruleswitch>0</my:P1_ruleswitch>
  </my:group3>
 </my:group2>
 <my:P1_trials>9</my:P1_trials>
 <my:P1_gotrials>5</my:P1_gotrials>
 <my:P1_nogotrials>4</my:P1_nogotrials>
 <my:T1_gotrials>6</my:T1_gotrials>
 <my:T1_nogotrials>6</my:T1_nogotrials>
 <my:T1_trials>12</my:T1_trials>
 <my:T2_gotrials>0</my:T2_gotrials>
 <my:T2_nogotrials>0</my:T2_nogotrials>
 <my:T2_trials>0</my:T2_trials>
 <my:P1_notplay/>
 <my:group4>
  <my:group5>
   <my:T1_trial>1</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>1</my:T1_delay>
   <my:T1_action>Touch your tongue</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>2</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your teeth</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>3</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your ear</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>4</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Clap your hands</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>5</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Clap your hands</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>6</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your eyes</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>7</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your feet</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>8</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your nose</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>9</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>1</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your nose</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>10</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your tummy</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>11</my:T1_trial>
   <my:T1_go-nogo>0</my:T1_go-nogo>
   <my:T1_score>0</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Wave your hand</my:T1_action>
  </my:group5>
  <my:group5>
   <my:T1_trial>12</my:T1_trial>
   <my:T1_go-nogo>1</my:T1_go-nogo>
   <my:T1_score>3</my:T1_score>
   <my:T1_delay>0</my:T1_delay>
   <my:T1_action>Touch your head</my:T1_action>
  </my:group5>
 </my:group4>
 <my:group6>
  <my:group7>
   <my:T2_trial>1</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your tongue</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>2</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your teeth</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>3</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your ear</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>4</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Clap your hands</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>5</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Clap your hands</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>6</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your eyes</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>7</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your feet</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>8</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your nose</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>9</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your nose</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>10</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your tummy</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>11</my:T2_trial>
   <my:T2_go-nogo>1</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Wave your hand</my:T2_action>
  </my:group7>
  <my:group7>
   <my:T2_trial>12</my:T2_trial>
   <my:T2_go-nogo>0</my:T2_go-nogo>
   <my:T2_score/>
   <my:T2_delay>0</my:T2_delay>
   <my:T2_action>Touch your head</my:T2_action>
  </my:group7>
 </my:group6>
</my:myFields>

Best answer

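In my experience, xmlToDataFrame only works when the XML is already structured in a very consistent way. The data you are working with is structured in several different ways: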

library(XML)

# Assuming you've already read your data into a character vector called `xml_file`
xml_file <- xmlParse(xml_file)
xml_file <- xmlToList(xml_file)

stack(sapply(xml_file, length))
   values           ind
1      22          SPSS
2       0      notes_t1
3       0       note_t2
4       1 bachildpres30
5       1   baparpres30
6       1    bapassptgo
7       1  bapassptnogo
8       1    bamissgame
9       1         P1_B3
10      1         P1_B2
11      1         P1_B1
12      1         P1_B0
13      1         P1_A3
14      1         P1_A2
15      1         P1_A1
16      1         P1_A0
17      1 P1_DelayTotal
18     12        group2
19      1     P1_trials
20      1   P1_gotrials
21      1 P1_nogotrials
22      1   T1_gotrials
23      1 T1_nogotrials
24      1     T1_trials
25      1   T2_gotrials
26      1 T2_nogotrials
27      1     T2_trials
28      0    P1_notplay
29     12        group4
30     12        group6
31      1        .attrs

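So most of your nodes contain a single value, and a few are empty. The "SPSS" node contains 22 values, each with a different name. "group2", "group4", and "group6" each contain 12 nodes; each of those nodes holds several values, and the nodes within a group share the same layout. When I looked at what Excel does when it imports the file, it stacks the components of the 12-node groups on top of one another, strings all 22 "SPSS" components together with all the single-value nodes, repeats that string for as many rows as the stacking produced, and then binds the two pieces together column-wise.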

To do this, separate the long string of single values from the 12-node blocks:

xml_file_singles <- xml_file[sapply(xml_file, length) != 12]
xml_file_singles[sapply(xml_file_singles, length) == 0] <- NA
xml_file_singles <- unlist(xml_file_singles)

xml_file_multiples <- xml_file[sapply(xml_file, length) == 12]
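
The split above keys on the length being exactly 12, which matches this file but would miss a form with a different number of trials. As a hedged alternative, here is a minimal sketch that picks out repeating blocks by shape instead. The `toy` list is a hand-built stand-in for the xmlToList() result, and `is_repeating` is a hypothetical helper; neither is part of the original answer, and the rule has not been checked against the real file.

```r
# Hand-built stand-in for xmlToList() output; all names and values are invented.
toy <- list(
  SPSS     = list(TCID = "10", tcAge = "30"),
  notes_t1 = NULL,
  P1_B3    = "4",
  group2   = list(
    group3 = list(P1_action = "touch your head", P1_trial = "1"),
    group3 = list(P1_action = "touch your nose", P1_trial = "2")
  )
)

# Hypothetical helper: treat a node as a repeating block when it is a
# non-empty list whose children are all lists themselves.
is_repeating <- function(x) {
  is.list(x) && length(x) > 0 && all(vapply(x, is.list, logical(1)))
}

repeating <- vapply(toy, is_repeating, logical(1))

# Single values: replace empty nodes with NA, then flatten to a named vector.
singles <- toy[!repeating]
singles[sapply(singles, length) == 0] <- NA
singles <- unlist(singles)

# Repeating blocks are kept as nested lists for the next step.
multiples <- toy[repeating]

print(names(singles))
print(names(multiples))
```

Note that unlist() prefixes nested names with their parent, so the "SPSS" values end up named like SPSS.TCID rather than TCID.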

Now take the 12-node blocks and convert each one to a data frame:

xml_file_multiples <- lapply(seq_along(xml_file_multiples), function(i) {

  # One single-row data frame per repeated node (e.g. each group3)
  x <- lapply(xml_file_multiples[[i]], function(y) {
    data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)})
  # Stack the rows, then record which block they came from
  x <- do.call("rbind", x)
  cbind("group" = names(xml_file_multiples)[i], x)
})
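
To make the conversion step concrete, here is a small sketch on a hand-built block with two rows (the values are invented, not taken from the real file). It also shows one side effect worth knowing about: data.frame() makes column names syntactic by default, so a name such as P1_go-nogo comes out as P1_go.nogo.

```r
# Hand-built stand-in for one repeating block; values are invented.
block <- list(
  group3 = list(P1_action = "touch your head", "P1_go-nogo" = "1", P1_trial = "1"),
  group3 = list(P1_action = "touch your nose", "P1_go-nogo" = "0", P1_trial = "2")
)

# Each child becomes a one-row data frame, then the rows are stacked.
rows <- lapply(block, function(y) {
  data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)
})
df <- do.call("rbind", rows)

print(names(df))
print(nrow(df))
```

If the hyphenated names matter downstream, they can be restored afterwards with names(df) <- ..., or the data.frame() call can be given check.names = FALSE.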

Now use the rbind.fill function from the plyr package to put all the new data frames together:

library(plyr)

xml_file_multiples <- do.call("rbind.fill", xml_file_multiples)
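
For readers without plyr installed, the padding behaviour can be illustrated with a base R stand-in. The `fill_bind` function below is a hypothetical helper written for this sketch, not plyr's implementation: it adds every missing column as NA and then stacks the frames. The two input frames are invented.

```r
# Hypothetical base R stand-in for the NA-padding that rbind.fill performs.
fill_bind <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    # Add each column this frame lacks, filled with NA
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]
  })
  do.call(rbind, padded)
}

# Invented frames whose column names differ only by prefix.
a <- data.frame(group = "group2", P1_trial = "1", stringsAsFactors = FALSE)
b <- data.frame(group = "group4", T1_trial = "1", stringsAsFactors = FALSE)

out <- fill_bind(list(a, b))
print(out)
```

Each row ends up with NA in the column that belongs to the other block, which is the source of the NAs discussed later in the answer.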

Now cbind your long string of single values to the bound data frame:

xml_final <- cbind(as.list(xml_file_singles), xml_file_multiples, 
  stringsAsFactors = FALSE)
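
The cbind call relies on recycling: the single values form one row, which is repeated to match the number of trial rows. A minimal sketch with invented values shows the effect.

```r
# Invented single values and a two-row trial table.
singles <- c(TCID = "10", tcAge = "30")
trials <- data.frame(
  group = c("group2", "group2"),
  trial = c("1", "2"),
  stringsAsFactors = FALSE
)

# The one-row list of single values is recycled across both trial rows.
out <- cbind(as.list(singles), trials, stringsAsFactors = FALSE)
print(out)
```

Every trial row now carries the same TCID and tcAge, which mirrors the layout Excel produces.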

Like Excel's approach, this introduces a lot of NAs, because the column names differ slightly between the 12-node blocks. If you do this instead before calling rbind.fill:

xml_file_multiples <- lapply(seq_along(xml_file_multiples), function(i) {

  x <- lapply(xml_file_multiples[[i]], function(y) {
    data.frame(as.list(unlist(y)), stringsAsFactors = FALSE)})
  x <- do.call("rbind", x)
  x <- cbind("group" = names(xml_file_multiples)[i], x)
  # Drop the block prefix (P1_, T1_, T2_) so the blocks share column names
  colnames(x) <- gsub("^\\w\\d_", "", colnames(x))
  x
})

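you produce fewer NAs because you produce fewer redundant columns, but you then have to rely on the values in the "group" column to keep track of which rows originally came from which node.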
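
As a quick check of what that regular expression does, here is a small sketch applied to a few column names of the kind produced above (the vector is written by hand for illustration).

```r
# Column names as they might look after the data.frame() step.
cols <- c("group", "P1_action", "T1_trial", "T2_go.nogo")

# Remove a leading word character, a digit, and an underscore.
stripped <- gsub("^\\w\\d_", "", cols)
print(stripped)
```

The "group" column is left alone because its second character is not a digit. Columns that exist only in the P1 block, such as P1_Ecommand, would still be filled with NA for the T1 and T2 rows.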

Source: Stack Overflow, https://stackoverflow.com/questions/16593301/
