r - 将表示多项逻辑回归的 PMML 转换回 R 系数

标签 r logistic-regression pmml

我最近的任务是将 PMML 解析回 R 模型。 (我进行了广泛搜索,没有库可以为您进行这种转换。)我正在尝试将包含多项逻辑回归的 PMML 转换回 R 模型,但我不知道如何转换任何PMML 文档中保存的系数转换为 R 模型保存的系数。

PMML 如下:

<?xml version="1.0"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
 <Header copyright="Copyright (c) 2014 hlin117" description="Generalized Linear Regression Model">
  <Extension name="user" value="hlin117" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2014-06-23 13:04:17</Timestamp>
 </Header>
 <DataDictionary numberOfFields="13">
  <DataField name="audit.train$TARGET_Adjusted" optype="continuous" dataType="double"/>
  <DataField name="ID" optype="continuous" dataType="double"/>
  <DataField name="Age" optype="continuous" dataType="double"/>
  <DataField name="Employment" optype="categorical" dataType="string">
   <Value value="Consultant"/>
   <Value value="Private"/>
   <Value value="PSFederal"/>
   <Value value="PSLocal"/>
   <Value value="PSState"/>
   <Value value="SelfEmp"/>
   <Value value="Volunteer"/>
  </DataField>
  <DataField name="Education" optype="categorical" dataType="string">
   <Value value="Associate"/>
   <Value value="Bachelor"/>
   <Value value="College"/>
   <Value value="Doctorate"/>
   <Value value="HSgrad"/>
   <Value value="Master"/>
   <Value value="Preschool"/>
   <Value value="Professional"/>
   <Value value="Vocational"/>
   <Value value="Yr10"/>
   <Value value="Yr11"/>
   <Value value="Yr12"/>
   <Value value="Yr1t4"/>
   <Value value="Yr5t6"/>
   <Value value="Yr7t8"/>
   <Value value="Yr9"/>
  </DataField>
  <DataField name="Marital" optype="categorical" dataType="string">
   <Value value="Absent"/>
   <Value value="Divorced"/>
   <Value value="Married"/>
   <Value value="Married-spouse-absent"/>
   <Value value="Unmarried"/>
   <Value value="Widowed"/>
  </DataField>
  <DataField name="Occupation" optype="categorical" dataType="string">
   <Value value="Cleaner"/>
   <Value value="Clerical"/>
   <Value value="Executive"/>
   <Value value="Farming"/>
   <Value value="Home"/>
   <Value value="Machinist"/>
   <Value value="Military"/>
   <Value value="Professional"/>
   <Value value="Protective"/>
   <Value value="Repair"/>
   <Value value="Sales"/>
   <Value value="Service"/>
   <Value value="Support"/>
   <Value value="Transport"/>
  </DataField>
  <DataField name="Income" optype="continuous" dataType="double"/>
  <DataField name="Gender" optype="categorical" dataType="string">
   <Value value="Female"/>
   <Value value="Male"/>
  </DataField>
  <DataField name="Deductions" optype="continuous" dataType="double"/>
  <DataField name="Hours" optype="continuous" dataType="double"/>
  <DataField name="IGNORE_Accounts" optype="categorical" dataType="string">
   <Value value="Canada"/>
   <Value value="China"/>
   <Value value="Columbia"/>
   <Value value="Cuba"/>
   <Value value="Ecuador"/>
   <Value value="England"/>
   <Value value="Fiji"/>
   <Value value="Germany"/>
   <Value value="Greece"/>
   <Value value="Guatemala"/>
   <Value value="Hong"/>
   <Value value="Hungary"/>
   <Value value="India"/>
   <Value value="Indonesia"/>
   <Value value="Iran"/>
   <Value value="Ireland"/>
   <Value value="Italy"/>
   <Value value="Jamaica"/>
   <Value value="Japan"/>
   <Value value="Malaysia"/>
   <Value value="Mexico"/>
   <Value value="NewZealand"/>
   <Value value="Nicaragua"/>
   <Value value="Philippines"/>
   <Value value="Poland"/>
   <Value value="Portugal"/>
   <Value value="Scotland"/>
   <Value value="Singapore"/>
   <Value value="Taiwan"/>
   <Value value="UnitedStates"/>
   <Value value="Vietnam"/>
   <Value value="Yugoslavia"/>
  </DataField>
  <DataField name="RISK_Adjustment" optype="continuous" dataType="double"/>
 </DataDictionary>
 <GeneralRegressionModel modelName="General_Regression_Model" modelType="generalizedLinear" functionName="regression" algorithmName="glm" distribution="binomial" linkFunction="logit">
  <MiningSchema>
   <MiningField name="audit.train$TARGET_Adjusted" usageType="predicted"/>
   <MiningField name="ID" usageType="active"/>
   <MiningField name="Age" usageType="active"/>
   <MiningField name="Employment" usageType="active"/>
   <MiningField name="Education" usageType="active"/>
   <MiningField name="Marital" usageType="active"/>
   <MiningField name="Occupation" usageType="active"/>
   <MiningField name="Income" usageType="active"/>
   <MiningField name="Gender" usageType="active"/>
   <MiningField name="Deductions" usageType="active"/>
   <MiningField name="Hours" usageType="active"/>
   <MiningField name="IGNORE_Accounts" usageType="active"/>
   <MiningField name="RISK_Adjustment" usageType="active"/>
  </MiningSchema>
  <Output>
   <OutputField name="Predicted_audit.train$TARGET_Adjusted" feature="predictedValue"/>
  </Output>
  <ParameterList>
   <Parameter name="p0" label="(Intercept)"/>
   <Parameter name="p1" label="ID"/>
   <Parameter name="p2" label="Age"/>
   <Parameter name="p3" label="EmploymentPrivate"/>
   <Parameter name="p4" label="EmploymentPSFederal"/>
   <Parameter name="p5" label="EmploymentPSLocal"/>
   <Parameter name="p6" label="EmploymentPSState"/>
   <Parameter name="p7" label="EmploymentSelfEmp"/>
   <Parameter name="p8" label="EmploymentVolunteer"/>
   <Parameter name="p9" label="EducationBachelor"/>
   <Parameter name="p10" label="EducationCollege"/>
   <Parameter name="p11" label="EducationDoctorate"/>
   <Parameter name="p12" label="EducationHSgrad"/>
   <Parameter name="p13" label="EducationMaster"/>
   <Parameter name="p14" label="EducationPreschool"/>
   <Parameter name="p15" label="EducationProfessional"/>
   <Parameter name="p16" label="EducationVocational"/>
   <Parameter name="p17" label="EducationYr10"/>
   <Parameter name="p18" label="EducationYr11"/>
   <Parameter name="p19" label="EducationYr12"/>
   <Parameter name="p20" label="EducationYr1t4"/>
   <Parameter name="p21" label="EducationYr5t6"/>
   <Parameter name="p22" label="EducationYr7t8"/>
   <Parameter name="p23" label="EducationYr9"/>
   <Parameter name="p24" label="MaritalDivorced"/>
   <Parameter name="p25" label="MaritalMarried"/>
   <Parameter name="p26" label="MaritalMarried-spouse-absent"/>
   <Parameter name="p27" label="MaritalUnmarried"/>
   <Parameter name="p28" label="MaritalWidowed"/>
   <Parameter name="p29" label="OccupationClerical"/>
   <Parameter name="p30" label="OccupationExecutive"/>
   <Parameter name="p31" label="OccupationFarming"/>
   <Parameter name="p32" label="OccupationHome"/>
   <Parameter name="p33" label="OccupationMachinist"/>
   <Parameter name="p34" label="OccupationMilitary"/>
   <Parameter name="p35" label="OccupationProfessional"/>
   <Parameter name="p36" label="OccupationProtective"/>
   <Parameter name="p37" label="OccupationRepair"/>
   <Parameter name="p38" label="OccupationSales"/>
   <Parameter name="p39" label="OccupationService"/>
   <Parameter name="p40" label="OccupationSupport"/>
   <Parameter name="p41" label="OccupationTransport"/>
   <Parameter name="p42" label="Income"/>
   <Parameter name="p43" label="GenderMale"/>
   <Parameter name="p44" label="Deductions"/>
   <Parameter name="p45" label="Hours"/>
   <Parameter name="p46" label="IGNORE_AccountsChina"/>
   <Parameter name="p47" label="IGNORE_AccountsColumbia"/>
   <Parameter name="p48" label="IGNORE_AccountsCuba"/>
   <Parameter name="p49" label="IGNORE_AccountsEcuador"/>
   <Parameter name="p50" label="IGNORE_AccountsEngland"/>
   <Parameter name="p51" label="IGNORE_AccountsFiji"/>
   <Parameter name="p52" label="IGNORE_AccountsGermany"/>
   <Parameter name="p53" label="IGNORE_AccountsGreece"/>
   <Parameter name="p54" label="IGNORE_AccountsGuatemala"/>
   <Parameter name="p55" label="IGNORE_AccountsHong"/>
   <Parameter name="p56" label="IGNORE_AccountsHungary"/>
   <Parameter name="p57" label="IGNORE_AccountsIndia"/>
   <Parameter name="p58" label="IGNORE_AccountsIndonesia"/>
   <Parameter name="p59" label="IGNORE_AccountsIran"/>
   <Parameter name="p60" label="IGNORE_AccountsIreland"/>
   <Parameter name="p61" label="IGNORE_AccountsItaly"/>
   <Parameter name="p62" label="IGNORE_AccountsJamaica"/>
   <Parameter name="p63" label="IGNORE_AccountsJapan"/>
   <Parameter name="p64" label="IGNORE_AccountsMalaysia"/>
   <Parameter name="p65" label="IGNORE_AccountsMexico"/>
   <Parameter name="p66" label="IGNORE_AccountsNewZealand"/>
   <Parameter name="p67" label="IGNORE_AccountsNicaragua"/>
   <Parameter name="p68" label="IGNORE_AccountsPhilippines"/>
   <Parameter name="p69" label="IGNORE_AccountsPoland"/>
   <Parameter name="p70" label="IGNORE_AccountsPortugal"/>
   <Parameter name="p71" label="IGNORE_AccountsScotland"/>
   <Parameter name="p72" label="IGNORE_AccountsSingapore"/>
   <Parameter name="p73" label="IGNORE_AccountsTaiwan"/>
   <Parameter name="p74" label="IGNORE_AccountsUnitedStates"/>
   <Parameter name="p75" label="IGNORE_AccountsVietnam"/>
   <Parameter name="p76" label="IGNORE_AccountsYugoslavia"/>
   <Parameter name="p77" label="RISK_Adjustment"/>
  </ParameterList>
  <FactorList>
   <Predictor name="Employment"/>
   <Predictor name="Education"/>
   <Predictor name="Marital"/>
   <Predictor name="Occupation"/>
   <Predictor name="Gender"/>
   <Predictor name="IGNORE_Accounts"/>
  </FactorList>
  <CovariateList>
   <Predictor name="ID"/>
   <Predictor name="Age"/>
   <Predictor name="Income"/>
   <Predictor name="Deductions"/>
   <Predictor name="Hours"/>
   <Predictor name="RISK_Adjustment"/>
  </CovariateList>
  <PPMatrix>
   <PPCell value="1" predictorName="ID" parameterName="p1"/>
   <PPCell value="1" predictorName="Age" parameterName="p2"/>
   <PPCell value="Private" predictorName="Employment" parameterName="p3"/>
   <PPCell value="PSFederal" predictorName="Employment" parameterName="p4"/>
   <PPCell value="PSLocal" predictorName="Employment" parameterName="p5"/>
   <PPCell value="PSState" predictorName="Employment" parameterName="p6"/>
   <PPCell value="SelfEmp" predictorName="Employment" parameterName="p7"/>
   <PPCell value="Volunteer" predictorName="Employment" parameterName="p8"/>
   <PPCell value="Bachelor" predictorName="Education" parameterName="p9"/>
   <PPCell value="College" predictorName="Education" parameterName="p10"/>
   <PPCell value="Doctorate" predictorName="Education" parameterName="p11"/>
   <PPCell value="HSgrad" predictorName="Education" parameterName="p12"/>
   <PPCell value="Master" predictorName="Education" parameterName="p13"/>
   <PPCell value="Preschool" predictorName="Education" parameterName="p14"/>
   <PPCell value="Professional" predictorName="Education" parameterName="p15"/>
   <PPCell value="Vocational" predictorName="Education" parameterName="p16"/>
   <PPCell value="Yr10" predictorName="Education" parameterName="p17"/>
   <PPCell value="Yr11" predictorName="Education" parameterName="p18"/>
   <PPCell value="Yr12" predictorName="Education" parameterName="p19"/>
   <PPCell value="Yr1t4" predictorName="Education" parameterName="p20"/>
   <PPCell value="Yr5t6" predictorName="Education" parameterName="p21"/>
   <PPCell value="Yr7t8" predictorName="Education" parameterName="p22"/>
   <PPCell value="Yr9" predictorName="Education" parameterName="p23"/>
   <PPCell value="Divorced" predictorName="Marital" parameterName="p24"/>
   <PPCell value="Married" predictorName="Marital" parameterName="p25"/>
   <PPCell value="Married-spouse-absent" predictorName="Marital" parameterName="p26"/>
   <PPCell value="Unmarried" predictorName="Marital" parameterName="p27"/>
   <PPCell value="Widowed" predictorName="Marital" parameterName="p28"/>
   <PPCell value="Clerical" predictorName="Occupation" parameterName="p29"/>
   <PPCell value="Executive" predictorName="Occupation" parameterName="p30"/>
   <PPCell value="Farming" predictorName="Occupation" parameterName="p31"/>
   <PPCell value="Home" predictorName="Occupation" parameterName="p32"/>
   <PPCell value="Machinist" predictorName="Occupation" parameterName="p33"/>
   <PPCell value="Military" predictorName="Occupation" parameterName="p34"/>
   <PPCell value="Professional" predictorName="Occupation" parameterName="p35"/>
   <PPCell value="Protective" predictorName="Occupation" parameterName="p36"/>
   <PPCell value="Repair" predictorName="Occupation" parameterName="p37"/>
   <PPCell value="Sales" predictorName="Occupation" parameterName="p38"/>
   <PPCell value="Service" predictorName="Occupation" parameterName="p39"/>
   <PPCell value="Support" predictorName="Occupation" parameterName="p40"/>
   <PPCell value="Transport" predictorName="Occupation" parameterName="p41"/>
   <PPCell value="1" predictorName="Income" parameterName="p42"/>
   <PPCell value="Male" predictorName="Gender" parameterName="p43"/>
   <PPCell value="1" predictorName="Deductions" parameterName="p44"/>
   <PPCell value="1" predictorName="Hours" parameterName="p45"/>
   <PPCell value="China" predictorName="IGNORE_Accounts" parameterName="p46"/>
   <PPCell value="Columbia" predictorName="IGNORE_Accounts" parameterName="p47"/>
   <PPCell value="Cuba" predictorName="IGNORE_Accounts" parameterName="p48"/>
   <PPCell value="Ecuador" predictorName="IGNORE_Accounts" parameterName="p49"/>
   <PPCell value="England" predictorName="IGNORE_Accounts" parameterName="p50"/>
   <PPCell value="Fiji" predictorName="IGNORE_Accounts" parameterName="p51"/>
   <PPCell value="Germany" predictorName="IGNORE_Accounts" parameterName="p52"/>
   <PPCell value="Greece" predictorName="IGNORE_Accounts" parameterName="p53"/>
   <PPCell value="Guatemala" predictorName="IGNORE_Accounts" parameterName="p54"/>
   <PPCell value="Hong" predictorName="IGNORE_Accounts" parameterName="p55"/>
   <PPCell value="Hungary" predictorName="IGNORE_Accounts" parameterName="p56"/>
   <PPCell value="India" predictorName="IGNORE_Accounts" parameterName="p57"/>
   <PPCell value="Indonesia" predictorName="IGNORE_Accounts" parameterName="p58"/>
   <PPCell value="Iran" predictorName="IGNORE_Accounts" parameterName="p59"/>
   <PPCell value="Ireland" predictorName="IGNORE_Accounts" parameterName="p60"/>
   <PPCell value="Italy" predictorName="IGNORE_Accounts" parameterName="p61"/>
   <PPCell value="Jamaica" predictorName="IGNORE_Accounts" parameterName="p62"/>
   <PPCell value="Japan" predictorName="IGNORE_Accounts" parameterName="p63"/>
   <PPCell value="Malaysia" predictorName="IGNORE_Accounts" parameterName="p64"/>
   <PPCell value="Mexico" predictorName="IGNORE_Accounts" parameterName="p65"/>
   <PPCell value="NewZealand" predictorName="IGNORE_Accounts" parameterName="p66"/>
   <PPCell value="Nicaragua" predictorName="IGNORE_Accounts" parameterName="p67"/>
   <PPCell value="Philippines" predictorName="IGNORE_Accounts" parameterName="p68"/>
   <PPCell value="Poland" predictorName="IGNORE_Accounts" parameterName="p69"/>
   <PPCell value="Portugal" predictorName="IGNORE_Accounts" parameterName="p70"/>
   <PPCell value="Scotland" predictorName="IGNORE_Accounts" parameterName="p71"/>
   <PPCell value="Singapore" predictorName="IGNORE_Accounts" parameterName="p72"/>
   <PPCell value="Taiwan" predictorName="IGNORE_Accounts" parameterName="p73"/>
   <PPCell value="UnitedStates" predictorName="IGNORE_Accounts" parameterName="p74"/>
   <PPCell value="Vietnam" predictorName="IGNORE_Accounts" parameterName="p75"/>
   <PPCell value="Yugoslavia" predictorName="IGNORE_Accounts" parameterName="p76"/>
   <PPCell value="1" predictorName="RISK_Adjustment" parameterName="p77"/>
  </PPMatrix>
  <ParamMatrix>
   <PCell parameterName="p0" df="1" beta="-12.0199804097759"/>
   <PCell parameterName="p1" df="1" beta="3.62329433275629e-08"/>
   <PCell parameterName="p2" df="1" beta="0.0380676635766761"/>
   <PCell parameterName="p3" df="1" beta="0.756901134378277"/>
   <PCell parameterName="p4" df="1" beta="0.375762595900717"/>
   <PCell parameterName="p5" df="1" beta="0.50309824514625"/>
   <PCell parameterName="p6" df="1" beta="0.470897191210805"/>
   <PCell parameterName="p7" df="1" beta="-2.10284542055317"/>
   <PCell parameterName="p8" df="1" beta="-15.5455611068614"/>
   <PCell parameterName="p9" df="1" beta="0.0997435072074993"/>
   <PCell parameterName="p10" df="1" beta="-1.22905386951777"/>
   <PCell parameterName="p11" df="1" beta="-6.76667195830752"/>
   <PCell parameterName="p12" df="1" beta="-1.01297363710822"/>
   <PCell parameterName="p13" df="1" beta="-0.340407862763258"/>
   <PCell parameterName="p14" df="1" beta="-15.8841924243017"/>
   <PCell parameterName="p15" df="1" beta="3.18173392385448"/>
   <PCell parameterName="p16" df="1" beta="-0.569821531302005"/>
   <PCell parameterName="p17" df="1" beta="-3.3033217141108"/>
   <PCell parameterName="p18" df="1" beta="-0.430994461878221"/>
   <PCell parameterName="p19" df="1" beta="-17.0972305473487"/>
   <PCell parameterName="p20" df="1" beta="-15.929168040244"/>
   <PCell parameterName="p21" df="1" beta="-17.7483980280451"/>
   <PCell parameterName="p22" df="1" beta="-16.1514804898207"/>
   <PCell parameterName="p23" df="1" beta="-10.3889654044557"/>
   <PCell parameterName="p24" df="1" beta="-0.690592385956069"/>
   <PCell parameterName="p25" df="1" beta="2.53630505787246"/>
   <PCell parameterName="p26" df="1" beta="1.41541804527502"/>
   <PCell parameterName="p27" df="1" beta="1.49491086815453"/>
   <PCell parameterName="p28" df="1" beta="0.174099244312997"/>
   <PCell parameterName="p29" df="1" beta="1.01865424623088"/>
   <PCell parameterName="p30" df="1" beta="1.73213477081248"/>
   <PCell parameterName="p31" df="1" beta="-1.80877402327631"/>
   <PCell parameterName="p32" df="1" beta="-12.4454410582178"/>
   <PCell parameterName="p33" df="1" beta="-0.417346874910574"/>
   <PCell parameterName="p34" df="1" beta="-12.475145396564"/>
   <PCell parameterName="p35" df="1" beta="1.45214141089004"/>
   <PCell parameterName="p36" df="1" beta="1.64050123149924"/>
   <PCell parameterName="p37" df="1" beta="0.134775653612853"/>
   <PCell parameterName="p38" df="1" beta="0.948585540443075"/>
   <PCell parameterName="p39" df="1" beta="0.144171863863442"/>
   <PCell parameterName="p40" df="1" beta="0.789971116324262"/>
   <PCell parameterName="p41" df="1" beta="0.842781801750256"/>
   <PCell parameterName="p42" df="1" beta="-9.63129083571953e-07"/>
   <PCell parameterName="p43" df="1" beta="-0.52313575926474"/>
   <PCell parameterName="p44" df="1" beta="0.00125611277933667"/>
   <PCell parameterName="p45" df="1" beta="0.0109489183058056"/>
   <PCell parameterName="p46" df="1" beta="-2.86790934232277"/>
   <PCell parameterName="p47" df="1" beta="-10.4586048958891"/>
   <PCell parameterName="p48" df="1" beta="-11.8078344468555"/>
   <PCell parameterName="p49" df="1" beta="-8.15369086351991"/>
   <PCell parameterName="p50" df="1" beta="-15.1509749621394"/>
   <PCell parameterName="p51" df="1" beta="-12.6588234930477"/>
   <PCell parameterName="p52" df="1" beta="7.44342418994783"/>
   <PCell parameterName="p53" df="1" beta="-8.80415604321149"/>
   <PCell parameterName="p54" df="1" beta="-0.909551298634999"/>
   <PCell parameterName="p55" df="1" beta="3.21333791872318"/>
   <PCell parameterName="p56" df="1" beta="-9.7080063371067"/>
   <PCell parameterName="p57" df="1" beta="-9.94640566996892"/>
   <PCell parameterName="p58" df="1" beta="-7.34469543656762"/>
   <PCell parameterName="p59" df="1" beta="-10.1375079207868"/>
   <PCell parameterName="p60" df="1" beta="4.03786237290128"/>
   <PCell parameterName="p61" df="1" beta="-9.95289672035589"/>
   <PCell parameterName="p62" df="1" beta="-11.2800534550324"/>
   <PCell parameterName="p63" df="1" beta="-8.5259456003378"/>
   <PCell parameterName="p64" df="1" beta="-11.1183864482514"/>
   <PCell parameterName="p65" df="1" beta="-3.17790587178398"/>
   <PCell parameterName="p66" df="1" beta="7.62183148791729"/>
   <PCell parameterName="p67" df="1" beta="-9.29840834254978"/>
   <PCell parameterName="p68" df="1" beta="5.87739404847556"/>
   <PCell parameterName="p69" df="1" beta="-11.0988711939497"/>
   <PCell parameterName="p70" df="1" beta="-5.78171399043641"/>
   <PCell parameterName="p71" df="1" beta="-11.009822161619"/>
   <PCell parameterName="p72" df="1" beta="-7.98831399897464"/>
   <PCell parameterName="p73" df="1" beta="-14.2857685874083"/>
   <PCell parameterName="p74" df="1" beta="4.89065048867447"/>
   <PCell parameterName="p75" df="1" beta="-2.21686920486685"/>
   <PCell parameterName="p76" df="1" beta="-10.0494769160447"/>
   <PCell parameterName="p77" df="1" beta="0.0044395180546043"/>
  </ParamMatrix>
 </GeneralRegressionModel>
</PMML>

R模型持有的系数如下:

Coefficients:
                               Estimate Std. Error z value Pr(>|z|)
(Intercept)                  -5.779e+00  1.108e+04  -0.001 0.999584
ID                            3.922e-08  6.187e-08   0.634 0.526164
Age                           2.705e-02  1.388e-02   1.949 0.051314 .
EmploymentPrivate             1.087e+00  6.774e-01   1.605 0.108550
EmploymentPSFederal           1.155e+00  1.050e+00   1.101 0.271105
EmploymentPSLocal             1.262e+00  8.811e-01   1.432 0.152036
EmploymentPSState             8.151e-01  1.011e+00   0.806 0.420221
EmploymentSelfEmp             2.217e-01  9.859e-01   0.225 0.822066
EmploymentVolunteer          -1.667e+01  1.075e+04  -0.002 0.998764
EducationBachelor             4.297e-01  7.768e-01   0.553 0.580154
EducationCollege             -1.234e+00  8.393e-01  -1.470 0.141592
EducationDoctorate            1.604e+00  1.697e+00   0.945 0.344690
EducationHSgrad              -5.332e-01  7.613e-01  -0.700 0.483661
EducationMaster              -3.705e-01  1.117e+00  -0.332 0.740081
EducationPreschool           -1.306e+01  3.588e+03  -0.004 0.997096
EducationProfessional         1.600e+00  1.251e+00   1.279 0.200733
EducationVocational          -3.887e-01  1.023e+00  -0.380 0.703998
EducationYr10                -2.121e+00  1.897e+00  -1.118 0.263626
EducationYr11                -3.222e-01  1.294e+00  -0.249 0.803322
EducationYr12                -4.786e+00  1.235e+01  -0.388 0.698298
EducationYr1t4               -1.588e+01  4.174e+03  -0.004 0.996965
EducationYr5t6               -1.779e+01  2.356e+03  -0.008 0.993976
EducationYr7t8               -1.659e+01  1.951e+03  -0.009 0.993214
EducationYr9                 -1.672e+01  2.680e+03  -0.006 0.995022
MaritalDivorced              -6.700e-01  8.277e-01  -0.809 0.418238
MaritalMarried                2.269e+00  5.238e-01   4.332 1.48e-05 ***
MaritalMarried-spouse-absent  1.299e+00  1.385e+00   0.938 0.348362
MaritalUnmarried              1.570e+00  9.025e-01   1.740 0.081926 .
MaritalWidowed                7.018e-01  1.209e+00   0.581 0.561438
OccupationClerical            1.060e+00  1.224e+00   0.866 0.386731
OccupationExecutive           1.851e+00  1.138e+00   1.627 0.103649
OccupationFarming             1.189e-01  1.530e+00   0.078 0.938065
OccupationHome               -1.296e+01  6.601e+03  -0.002 0.998434
OccupationMachinist           2.869e-01  1.299e+00   0.221 0.825190
OccupationMilitary           -1.318e+01  1.075e+04  -0.001 0.999022
OccupationProfessional        1.589e+00  1.187e+00   1.339 0.180656
OccupationProtective          1.099e+00  1.622e+00   0.678 0.497935
OccupationRepair              1.641e-01  1.204e+00   0.136 0.891597
OccupationSales               7.170e-01  1.205e+00   0.595 0.551929
OccupationService            -5.600e-02  1.348e+00  -0.042 0.966858
OccupationSupport             8.431e-01  1.348e+00   0.626 0.531515
OccupationTransport           3.488e-01  1.242e+00   0.281 0.778911
Income                        1.442e-06  3.112e-06   0.463 0.643050
GenderMale                    1.510e-01  5.361e-01   0.282 0.778254
Deductions                    1.476e-03  4.109e-04   3.593 0.000327 ***
Hours                         2.116e-02  1.433e-02   1.476 0.139922
IGNORE_AccountsChina         -2.048e+01  1.867e+04  -0.001 0.999125
IGNORE_AccountsColumbia      -2.085e+01  1.294e+04  -0.002 0.998715
IGNORE_AccountsCuba          -1.942e+01  1.544e+04  -0.001 0.998997
IGNORE_AccountsEcuador       -1.701e+01  1.544e+04  -0.001 0.999121
IGNORE_AccountsEngland       -1.418e+01  1.109e+04  -0.001 0.998980
IGNORE_AccountsGermany       -4.952e-02  1.108e+04   0.000 0.999996
IGNORE_AccountsGreece        -1.645e+01  1.544e+04  -0.001 0.999150
IGNORE_AccountsGuatemala     -2.767e+00  1.459e+04   0.000 0.999849
IGNORE_AccountsHong          -3.325e+00  1.557e+04   0.000 0.999830
IGNORE_AccountsIndia         -1.506e+01  1.110e+04  -0.001 0.998918
IGNORE_AccountsIndonesia     -1.692e+01  1.225e+04  -0.001 0.998897
IGNORE_AccountsIreland       -3.329e+00  1.108e+04   0.000 0.999760
IGNORE_AccountsItaly         -1.663e+01  1.304e+04  -0.001 0.998982
IGNORE_AccountsJamaica       -2.174e+01  2.163e+04  -0.001 0.999198
IGNORE_AccountsJapan         -1.577e+01  1.544e+04  -0.001 0.999185
IGNORE_AccountsMalaysia      -1.903e+01  1.206e+04  -0.002 0.998741
IGNORE_AccountsMexico        -9.440e+00  1.108e+04  -0.001 0.999320
IGNORE_AccountsNewZealand     1.773e-01  1.562e+04   0.000 0.999991
IGNORE_AccountsNicaragua     -1.786e+01  1.200e+04  -0.001 0.998812
IGNORE_AccountsPhilippines   -9.526e-01  1.108e+04   0.000 0.999931
IGNORE_AccountsPoland        -1.878e+01  1.544e+04  -0.001 0.999030
IGNORE_AccountsPortugal      -1.432e+00  1.557e+04   0.000 0.999927
IGNORE_AccountsSingapore     -1.778e+01  1.225e+04  -0.001 0.998842
IGNORE_AccountsTaiwan        -1.922e+01  1.259e+04  -0.002 0.998782
IGNORE_AccountsUnitedStates  -2.519e+00  1.108e+04   0.000 0.999819
IGNORE_AccountsVietnam       -1.984e+01  1.250e+04  -0.002 0.998734
IGNORE_AccountsYugoslavia    -1.774e+01  1.544e+04  -0.001 0.999083
RISK_Adjustment               3.802e-03  6.819e-04   5.575 2.47e-08 ***

(生成此 GLM 模型和相应的 PMML 的 R 脚本如下:

library(pmml)
auditDF <- read.csv("http://rattle.togaware.com/audit.csv")
auditDF <- na.omit(auditDF)
target <- auditDF$TARGET_Adjusted
N <- length(target); M <- N - 500
i.train <- sample(N, M)
audit.train <- auditDF[i.train,]
audit.test <- auditDF[-i.train,]
glm.model <- glm(audit.train$TARGET_Adjusted ~ ., data = audit.train, family = "binomial")
glm.pmml <- pmml(glm.model, name = "glm model", data = trainDF)
xmlFile <- file.path(getwd(), "audit-glm.xml")
saveXML(glm.pmml, xmlFile)

来源:http://blog.revolutionanalytics.com/2011/03/predicting-r-models-with-pmml.html)

最佳答案

我想这完全取决于您将模型放回 R 后要对其执行的操作。有一次我帮助某人创建了一个伪 gml 对象,该对象知道变量的系数并且可以与 预测()。许多其他功能需要存在填充数据集。

如果您可能对此感兴趣。该函数称为 makeglm.R .您将只想将该函数复制并粘贴到您的 R session 中。但是有必要首先转换您的数据。这里有一些辅助函数可以做到这一点。

getdata <- function(xml, ns=attr(xml,"ns")) {
    names<-xpathSApply(xml, "//d:DataField/@name", namespaces = ns)
    vals<-xpathApply(xml, "//d:DataField", function(x) {
        if(xmlGetAttr(x, "optype")=="categorical") {
            levels<-xpathSApply(x, "Value/@value")
            factor(character(0), levels=levels)
        } else if (xmlGetAttr(x, "optype")=="continuous"){
            numeric(0)
        }
    }, namespaces = ns)
    names(vals)<-names
    as.data.frame(vals)
}

getformula <- function(xml, ns=attr(xml,"ns")) {
    resp<-xpathSApply(xml, "//d:MiningField[@usageType=\"predicted\"]/@name",
        namespaces = ns)
    covar<-xpathSApply(xml, "//d:MiningField[@usageType=\"active\"]/@name",
        namespaces = ns)
    fmc<-paste(paste(resp, collapse=" + "), "~", paste(covar, collapse=" + "))
    as.formula(fmc)
}

getestimates <- function(xml, ns=attr(xml,"ns")) {
    betas <- setNames(as.numeric(xpathSApply(xml, "//d:PCell/@beta", namespaces = ns)), 
        xpathSApply(xml, "//d:PCell/@parameterName", namespaces = ns))
    numericparam <- unname(xpathSApply(xml, "//d:CovariateList/d:Predictor/@name", namespaces = ns))
    factorparam <- unname(xpathSApply(xml, "//d:FactorList/d:Predictor/@name", namespaces = ns))
    values <- do.call(rbind, Map(function(x,y,z) data.frame(p=x, val=y, pred=z, stringsAsFactors=F), 
        unname(xpathSApply(xml,"//d:PPCell/@parameterName", namespaces = ns)), 
        xpathSApply(xml, "//d:PPCell/@value", namespaces = ns),
        xpathSApply(xml, "//d:PPCell/@predictorName", namespaces = ns)))
    lf<-Map(function(x) {
        vv <- values[values$pred==x, ]
        setNames(betas[vv$p], vv$val)
    }, factorparam)
    ln<-Map(function(x) {
        vv <- values[values$pred==x, ]
        unname(betas[vv$p])
    }, numericparam)
    estimates<-c(lf,ln)
    intercept<-getNodeSet(xml,"//d:Parameter[@label=\"(Intercept)\"]", namespaces = ns)
    if(length(intercept)) {
        estimates<-c(unname(betas[xmlGetAttr(intercept[[1]],"name")]), estimates)
    }
    estimates
}

我一点也不熟悉 PMML 格式,但我根据您的示例文档将它们放在一起。我尝试从数据中提取构建公式、data.frame stub 和参数估计所需的所有正确信息,以便使用 makeglm() 函数。加载该函数和这些辅助函数后,您可以运行

library(XML)
mypmml <- xmlParse("pmml.xml")
attr(mypmml, "ns")<-"d"

dd <- getdata(mypmml)
ff <- getformula(mypmml)
ee <- getestimates(mypmml)
do.call(makeglm, c(list(ff, family="binomial", data=dd), ee))

实际运行函数。这将返回一个 glm 对象,您可以将其与 predict() 一起使用。我确实必须更改您的示例数据中的一件事。由于某种原因,您将表名作为 glm 模型中公式的一部分

glm(audit.train$TARGET_Adjusted ~ .,  data = audit.train, ...)

而不是

glm(TARGET_Adjusted ~ .,  data = audit.train, ...)

这可能会导致问题。所以我在读入之前从 xml 文件中取出了“audit.train$”。也许可以进行更多的错误检查,但我什至不确定这是否最终是您想要的。

关于r - 将表示多项逻辑回归的 PMML 转换回 R 系数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/24374503/

相关文章:

R 3.3.0 在 Windows : gcc not found error 上安装软件包

r - 如何在 R 中的 twitteR 包中提取推文地理编码

java - 在 Python 3.6 中调用 sklearn2pmml() 函数会抛出 RuntimeError

apache-spark - 如何在 Spark-MLlib PMML 文件中用精确的列名替换 DataField 值?

r - 在 R 中的数据框的每一行中保留唯一分数

r - 获取逗号分隔字符串列之间的公共(public)元素(来自同一行),保留行名

r - summary.connection(connection) : invalid connection 中的错误

matlab - Logistic回归中的成本函数得出NaN作为结果

r - 使用 rjags 包进行贝叶斯多项式回归

pmml - 是否存在以编程方式向现有 PMML 添加转换的东西?