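python - Extracting specific data from XML using Python

Tags: python python-3.x xml

I want to collect specific information from data.xml. root[0] 'CaplockSet' contains more than 100 'Caplock' elements, and I only need to extract the author information from them. Please help me; thank you very much for your support.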

<?xml version="1.0"?>

<CaplockSet>

<Caplock>
    <MedlineCitation Status="clonelisher" Owner="NLM">
        <PMID Version="1">32045906</PMID>
        <DateRevised>
            <Year>2020</Year>
            <Month>02</Month>
            <Day>11</Day>
        </DateRevised>
        <Article cloneModel="Print-Electronic">
            <Journal>
                <ISSN IssnType="Electronic">1423-0135</ISSN>
                <JournalIssue CitedMedium="Internet">
                    <cloneDate>
                        <Year>2020</Year>
                        <Month>Feb</Month>
                        <Day>11</Day>
                    </cloneDate>
                </JournalIssue>
                <Title>Journal of vascular research</Title>
                <ISOAbbreviation>J. Vasc. Res.</ISOAbbreviation>
            </Journal>
            <ArticleTitle>miR-96-5p Regulates Proliferation, Migration, and Apoptosis of Vascular Smooth Muscle Cell Induced by Angiotensin II via Targeting NFAT5.</ArticleTitle>
            <Pagination>
                <MedlinePgn>1-11</MedlinePgn>
            </Pagination>
            <ELocationID EIdType="doi" ValidYN="Y">10.1159/000505457</ELocationID>
            <Abstract>
                <AbstractText Label="BACKGROUND" NlmCategory="BACKGROUND">Aberrant proliferation, migration, and apoptosis of vascular smooth muscle cells (VSMCs) are major pathological phenomenon in hypertension. MicroRNAs (miRNAs/miRs) serve crucial roles in the progression of hypertension. We aimed to determine the role of miR-96-5p in the proliferation, migration, and apoptosis of VSMCs and its underlying mechanisms.</AbstractText>
                <AbstractText Label="METHODS" NlmCategory="METHODS">Angiotensin II (Ang II) was employed to treat VSMCs, and the expression of miR-96-5p was detected by RT-qPCR. Then, miR-96-5p mimic was transfected into VSMCs. Cell Counting Kit-8 assay, flow cytometry, transwell assay, and wound healing assay were applied to measure proliferation, cell cycle, and migration of VSMCs. The expression of proteins associated with proliferation, migration, and apoptosis was assessed. A luciferase reporter assay was applied to confirm the target binding between miR-96-5p and nuclear factors of activated T-cells 5 (NFAT5). Subsequently, siRNA was used to silence NFAT5, and cell proliferation, migration, and apoptosis were assessed.</AbstractText>
                <AbstractText Label="RESULTS" NlmCategory="RESULTS">The results revealed that the expression of miR-96-5p was downregulated in Ang II-induced VSMCs. MiR-96-5p overexpression inhibited cell proliferation and migration but promoted cell apoptosis, enhanced the percentages of cells in the G1 and G2 phases, and reduced those in the S phase, accompanied by changes in the expression associated proteins. NFAT5 was confirmed as a direct target of miR-96-5p. NFAT5 silencing had the same results with miR-96-5p overexpression on VSMC proliferation, migration, and apoptosis, whereas miR-96-5p inhibitor reversed these effects.</AbstractText>
                <AbstractText Label="CONCLUSIONS" NlmCategory="CONCLUSIONS">Our findings concluded that miR-96-5p could regulate proliferation, migration, and apoptosis of VSMCs induced by Ang II via targeting NFAT5.</AbstractText>
                <CopyrightInformation>© 2020 S. Karger AG, Basel.</CopyrightInformation>
            </Abstract>
            <AuthorList CompleteYN="Y">
                <Author ValidYN="Y">
                    <LastName>Tian</LastName>
                    <ForeName>Long</ForeName>
                    <Initials>L</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Cai</LastName>
                    <ForeName>Dinghua</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Zhuang</LastName>
                    <ForeName>Derong</ForeName>
                    <Initials>D</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Wenyuan</ForeName>
                    <Initials>W</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wang</LastName>
                    <ForeName>Xuan</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Bian</LastName>
                    <ForeName>Xiaoli</ForeName>
                    <Initials>X</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Xu</LastName>
                    <ForeName>Rui</ForeName>
                    <Initials>R</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Nephrology, Jiangdu People's Hospital, Yangzhou, China.</Affiliation>
                    </AffiliationInfo>
                </Author>
                <Author ValidYN="Y">
                    <LastName>Wu</LastName>
                    <ForeName>Guanji</ForeName>
                    <Initials>G</Initials>
                    <AffiliationInfo>
                        <Affiliation>Department of Cardiology, Xi'an Central Hospital of Xi'an Jiaotong University, Xi'an, China, guanjiguanji22@163.com.</Affiliation>
                    </AffiliationInfo>
                </Author>
            </AuthorList>
            <Language>eng</Language>
            <clonelicationTypeList>
                <clonelicationType UI="D016428">Journal Article</clonelicationType>
            </clonelicationTypeList>
            <ArticleDate DateType="Electronic">
                <Year>2020</Year>
                <Month>02</Month>
                <Day>11</Day>
            </ArticleDate>
        </Article>
        <MedlineJournalInfo>
            <Country>Switzerland</Country>
            <MedlineTA>J Vasc Res</MedlineTA>
            <NlmUniqueID>9206092</NlmUniqueID>
            <ISSNLinking>1018-1172</ISSNLinking>
        </MedlineJournalInfo>
        <CitationSubset>IM</CitationSubset>
        <KeywordList Owner="NOTNLM">
            <Keyword MajorTopicYN="N">Migration</Keyword>
            <Keyword MajorTopicYN="N">NFAT5</Keyword>
            <Keyword MajorTopicYN="N">Proliferation</Keyword>
            <Keyword MajorTopicYN="N">Vascular smooth muscle cell</Keyword>
            <Keyword MajorTopicYN="N">miR-96-5p</Keyword>
        </KeywordList>
    </MedlineCitation>
    <CardData>
        <History>
            <CardcloneDate cloneStatus="received">
                <Year>2019</Year>
                <Month>09</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="accepted">
                <Year>2019</Year>
                <Month>12</Month>
                <Day>16</Day>
            </CardcloneDate>
            <CardcloneDate cloneStatus="entrez">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="Card">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
            <CardcloneDate cloneStatus="medline">
                <Year>2020</Year>
                <Month>2</Month>
                <Day>12</Day>
                <Hour>6</Hour>
                <Minute>0</Minute>
            </CardcloneDate>
        </History>
        <clonelicationStatus>aheadofprint</clonelicationStatus>
        <ArticleIdList>
            <ArticleId IdType="Card">32045906</ArticleId>
            <ArticleId IdType="pii">000505457</ArticleId>
            <ArticleId IdType="doi">10.1159/000505457</ArticleId>
        </ArticleIdList>
    </CardData>
</Caplock>


</CaplockSet>

I tried several ways to pull this out with a .py script, but ran into many errors. One of the attempts is shown below.

import xml.etree.ElementTree as ET

mytree = ET.parse('data.xml')
myroot = mytree.getroot()
for x in myroot.findall('Author'):
    lastname = x.find('LastName').text
    forename = x.find('ForeName').text
    affiliation = x.find('AffiliationInfo/Affiliation').text

    print(lastname,forename,affiliation)

Error

Traceback (most recent call last):
  File "c:/Users/jeeva/Desktop/data/program.py", line 3, in <module>
    mytree = ET.parse('data/data.xml')
  File "C:\Users\jeeva\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1202, in parse
    tree.parse(source, parser)
  File "C:\Users\jeeva\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 595, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: syntax error: line 2, column 21
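
The traceback reports a position (line 2, column 21) inside the XML input rather than in the script, so the parser stopped before any `Author` lookup ran. A minimal sketch for printing the line the parser rejected is shown here; the helper name `describe_parse_error` and the `broken` sample string are hypothetical and only illustrate the idea, they are not taken from the original data.xml.

```python
import xml.etree.ElementTree as ET


def describe_parse_error(xml_text):
    """Return None if xml_text parses, else a message showing the rejected line."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is the (line, column) pair reported by the parser.
        line, _column = err.position
        lines = xml_text.splitlines()
        bad_line = lines[line - 1] if 0 < line <= len(lines) else ""
        return f"{err} -> {bad_line!r}"
    return None


# Hypothetical malformed input: the attribute value on the second line is unquoted.
broken = '<?xml version="1.0"?>\n<CaplockSet Owner=NLM></CaplockSet>'
print(describe_parse_error(broken))
```

For a file on disk, the same idea applies by reading the file's text first and passing it to the helper, which makes it easier to see what actually sits at the reported position.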

Best answer

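Maybe this will work. It walks the whole tree with iter() instead of looking only at the root's direct children, which is all that findall('Author') does: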

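import xml.etree.ElementTree as ET


def find_rec(node):
    # Walk the whole tree, so Author elements are found at any depth.
    for item in node.iter("Author"):
        author_values = {}
        # Collect the text of the element and all its descendants, keyed by tag name.
        for i in item.iter():
            author_values[i.tag] = i.text
        yield author_values


auth = find_rec(ET.parse('./data.xml').getroot())
for x in auth:
    print(x["LastName"], x["ForeName"], x["Affiliation"])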
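
As an alternative sketch, the same extraction can be written with ElementTree's limited XPath support: the path `.//Author` matches `Author` elements at any depth, and `findtext` with a default avoids an error when a field is missing. The helper name `extract_authors` and the shortened inline `SAMPLE` are assumptions made for illustration; they mirror the nesting shown in the question but are not the original file.

```python
import xml.etree.ElementTree as ET

# Shortened sample that mirrors the nesting in the question (illustrative only).
SAMPLE = """<CaplockSet>
  <Caplock>
    <MedlineCitation>
      <Article>
        <AuthorList>
          <Author>
            <LastName>Tian</LastName>
            <ForeName>Long</ForeName>
            <AffiliationInfo>
              <Affiliation>Department of Cardiology</Affiliation>
            </AffiliationInfo>
          </Author>
          <Author>
            <LastName>Cai</LastName>
            <ForeName>Dinghua</ForeName>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </Caplock>
</CaplockSet>"""


def extract_authors(root):
    """Return (last name, fore name, affiliation) for every Author under root."""
    authors = []
    # ".//Author" searches all descendants, not only direct children.
    for author in root.findall(".//Author"):
        authors.append((
            author.findtext("LastName", default=""),
            author.findtext("ForeName", default=""),
            author.findtext("AffiliationInfo/Affiliation", default=""),
        ))
    return authors


for last, fore, affiliation in extract_authors(ET.fromstring(SAMPLE)):
    print(last, fore, affiliation)
```

With a real file, `ET.parse('data.xml').getroot()` would take the place of `ET.fromstring(SAMPLE)`. Returning tuples with empty-string defaults keeps the loop from failing on authors that lack an affiliation.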

Regarding python - Extracting specific data from XML using Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60221531/
