r - Multiple regression in R: variable not found in data.frame

Tags: r, regression

This is my data.frame, beef:

> head(beef)
   YEAR....PBE  CBE  PPO  CPO  PFO DINC  CFO RDINC RFP
1 1925    59.7 58.6 60.5 65.8 65.8 51.4 90.9  68.5 877
2 1926    59.7 59.4 63.3 63.3 68.0 52.6 92.1  69.6 899
3   1927    63 53.7 59.9 66.8 65.5 52.1 90.9  70.2 883
4   1928    71 48.1 56.3 69.9 64.8 52.7 90.9  71.9 884
5   1929    71 49.0 55.0 68.7 65.6 55.1 91.1  75.2 895
6 1930    74.2 48.2 59.6 66.1 62.4 48.8 90.7  68.3 874

> dput(head(beef))
structure(list(YEAR....PBE = structure(1:6, .Label = c("1925    59.7", 
"1926    59.7", "1927    63", "1928    71", "1929    71", "1930    74.2", 
"1931    72.1", "1932    79", "1933    73.1", "1934    70.2", 
"1935    82.2", "1936    68.4", "1937    73", "1938    70.2", 
"1939    67.8", "1940    63.4", "1941    56"), class = "factor"), 
    CBE = c(58.6, 59.4, 53.7, 48.1, 49, 48.2), PPO = c(60.5, 
    63.3, 59.9, 56.3, 55, 59.6), CPO = c(65.8, 63.3, 66.8, 69.9, 
    68.7, 66.1), PFO = c(65.8, 68, 65.5, 64.8, 65.6, 62.4), DINC = c(51.4, 
    52.6, 52.1, 52.7, 55.1, 48.8), CFO = c(90.9, 92.1, 90.9, 
    90.9, 91.1, 90.7), RDINC = c(68.5, 69.6, 70.2, 71.9, 75.2, 
    68.3), RFP = c(877L, 899L, 883L, 884L, 895L, 874L)), .Names = c("YEAR....PBE", 
"CBE", "PPO", "CPO", "PFO", "DINC", "CFO", "RDINC", "RFP"), row.names = c(NA, 
6L), class = "data.frame")

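I want to build a multiple linear regression model for PBE based on the other variables. Following the tutorial at this link, I think I should run the following code: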

> lm(formula = PBE ~ CBE + PBO + CPO + PFO + 
+        DINC + CFO+RDINC+RFP+YEAR, data = beef)

Error in eval(expr, envir, enclos) : object 'PBE' not found

So I tried the following instead, but both give errors:

> lm(formula=PBE~YEAR,data=beef)
Error in eval(expr, envir, enclos) : object 'PBE' not found
> lm(formula=beef$PBE~beef$YEAR)
Error in model.frame.default(formula = beef$PBE ~ beef$YEAR, drop.unused.levels = TRUE) : 
  invalid type (NULL) for variable 'beef$PBE'

Can you tell me where the typo/mistake is?

P.S.: I read the file with beef=read.table("beef.txt", header = TRUE, sep = "\t", comment.char="%"), and the file looks like this:

% http://lib.stat.cmu.edu/DASL/Datafiles/agecondat.html
% 
% Datafile Name: Agricultural Economics Studies
% Datafile Subjects: Agriculture , Economics , Consumer
% Story Names: Agricultural Economics Studies
% Reference: F.B. Waugh, Graphic Analysis in Agricultural Economics,
%   Agricultural Handbook No. 128, U.S. Department of Agriculture, 1957.
% Authorization: free use
% Description: Price and consumption per capita of beef and pork
%   annually from 1925 to 1941 together with other variables relevant to
%   an economic analysis of price and/or consumption of beef and pork
%   over the period.
% Number of cases: 17
% Variable Names:
% 
%   PBE = Price of beef (cents/lb)
%   CBE = Consumption of beef per capita (lbs)
%   PPO = Price of pork (cents/lb)
%   CPO = Consumption of pork per capita (lbs)
%   PFO = Retail food price index (1947-1949 = 100)
%   DINC = Disposable income per capita index (1947-1949 = 100)
%   CFO = Food consumption per capita index (1947-1949 = 100)
%   RDINC = Index of real disposable income per capita (1947-1949 = 100)
%   RFP = Retail food price index adjusted by the CPI (1947-1949 = 100)
% 
% The Data:
YEAR    PBE CBE PPO CPO PFO DINC    CFO RDINC   RFP
1925    59.7    58.6    60.5    65.8    65.8    51.4    90.9    68.5    877
1926    59.7    59.4    63.3    63.3    68  52.6    92.1    69.6    899
1927    63  53.7    59.9    66.8    65.5    52.1    90.9    70.2    883
1928    71  48.1    56.3    69.9    64.8    52.7    90.9    71.9    884
1929    71  49  55  68.7    65.6    55.1    91.1    75.2    895
1930    74.2    48.2    59.6    66.1    62.4    48.8    90.7    68.3    874
1931    72.1    47.9    57  67.4    51.4    41.5    90  64  791

Here is the result of View(beef), as Patrick suggested: [screenshot of View(beef), not included]

Best answer

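You need to go back and look at the file you loaded these data from. The output of head() shows that the first variable is YEAR....PBE, i.e. the PBE data have been merged into the YEAR variable, probably because of a problem with the separator used in the file you read in. Go back and check the file carefully.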

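One way to do this in R is with count.fields(), to which you pass the name of the file to check. Be sure to read ?count.fields, because you may need to set the sep and quote arguments to match the file you are reading the data from. The function tells you how many fields (variables) it finds on each line; compare that with the known number of variables.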
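The count.fields() check can be sketched as follows. This is a minimal illustration, assuming the file mixes separators the way the head() output suggests (spaces between YEAR and PBE, tabs elsewhere); the sample lines, the temporary file and the variable names are hypothetical, not the asker's actual beef.txt.

```r
# Hypothetical sample mimicking the problem: YEAR and PBE are separated by
# spaces, while the remaining columns are separated by tabs.
sample_lines <- c(
  "YEAR    PBE\tCBE\tPPO",
  "1925    59.7\t58.6\t60.5",
  "1926    59.7\t59.4\t63.3"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Fields per line when splitting on tabs only (what sep = "\t" sees)
tab_counts <- count.fields(tmp, sep = "\t")
# Fields per line when splitting on any run of white space (sep = "")
ws_counts <- count.fields(tmp, sep = "")

print(tab_counts)  # 3 on every line: one fewer than the 4 real variables
print(ws_counts)   # 4 on every line
```

A count that is one lower than the number of variables listed in the file header is exactly the symptom of two columns being glued together.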

From your edit, it is clear that what I described above is what happened:

> names(beef)
[1] "YEAR....PBE" "CBE"         "PPO"         "CPO"         "PFO"        
[6] "DINC"        "CFO"         "RDINC"       "RFP"
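Why this produces the errors in the question can be reproduced on a tiny data frame. The sketch below assumes a hypothetical two-row beef_bad built from the names(beef) output above: there is no column called PBE, so beef_bad$PBE is NULL (the "invalid type (NULL)" error), and the merged name is what make.names() makes of the header text "YEAR    PBE" when it is read as a single tab-delimited field.

```r
# Hypothetical two-row data frame using the merged column name from names(beef)
beef_bad <- data.frame(
  YEAR....PBE = factor(c("1925    59.7", "1926    59.7")),
  CBE = c(58.6, 59.4)
)

# There is no column named PBE ...
print("PBE" %in% names(beef_bad))  # FALSE
# ... so $ returns NULL, which is what model.frame() rejects
print(is.null(beef_bad$PBE))       # TRUE

# read.table() (check.names = TRUE) turns each space into a dot
print(make.names("YEAR    PBE"))   # "YEAR....PBE"
```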

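It seems the file is not entirely tab-delimited. Using sep = "" (the default, which splits on any run of white space, tabs or spaces), I was able to read the portion of the data you attached: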

beef <- read.table("file.name", header = TRUE, sep = "", comment.char = "%")

> head(beef)
  YEAR  PBE  CBE  PPO  CPO  PFO DINC  CFO RDINC RFP
1 1925 59.7 58.6 60.5 65.8 65.8 51.4 90.9  68.5 877
2 1926 59.7 59.4 63.3 63.3 68.0 52.6 92.1  69.6 899
3 1927 63.0 53.7 59.9 66.8 65.5 52.1 90.9  70.2 883
4 1928 71.0 48.1 56.3 69.9 64.8 52.7 90.9  71.9 884
5 1929 71.0 49.0 55.0 68.7 65.6 55.1 91.1  75.2 895
6 1930 74.2 48.2 59.6 66.1 62.4 48.8 90.7  68.3 874
> str(beef)
'data.frame':   7 obs. of  10 variables:
 $ YEAR : int  1925 1926 1927 1928 1929 1930 1931
 $ PBE  : num  59.7 59.7 63 71 71 74.2 72.1
 $ CBE  : num  58.6 59.4 53.7 48.1 49 48.2 47.9
 $ PPO  : num  60.5 63.3 59.9 56.3 55 59.6 57
 $ CPO  : num  65.8 63.3 66.8 69.9 68.7 66.1 67.4
 $ PFO  : num  65.8 68 65.5 64.8 65.6 62.4 51.4
 $ DINC : num  51.4 52.6 52.1 52.7 55.1 48.8 41.5
 $ CFO  : num  90.9 92.1 90.9 90.9 91.1 90.7 90
 $ RDINC: num  68.5 69.6 70.2 71.9 75.2 68.3 64
 $ RFP  : int  877 899 883 884 895 874 791
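Once PBE is its own numeric column, the formula interface works. A minimal sketch, using only the 7 sample rows quoted in the question (read from an inline string instead of the real file) and an arbitrary two-predictor model chosen for illustration. Note also that the question's formula refers to PBO, which is not a column in this file; the price of pork is PPO. With the full 17-row file, lm(PBE ~ ., data = beef) would regress PBE on all other columns; with only 7 rows, 10 coefficients cannot all be estimated and lm() would report some of them as NA.

```r
# The 7 sample rows from the question, as inline text (% lines are comments)
beef_txt <- "% The Data:
YEAR PBE CBE PPO CPO PFO DINC CFO RDINC RFP
1925 59.7 58.6 60.5 65.8 65.8 51.4 90.9 68.5 877
1926 59.7 59.4 63.3 63.3 68 52.6 92.1 69.6 899
1927 63 53.7 59.9 66.8 65.5 52.1 90.9 70.2 883
1928 71 48.1 56.3 69.9 64.8 52.7 90.9 71.9 884
1929 71 49 55 68.7 65.6 55.1 91.1 75.2 895
1930 74.2 48.2 59.6 66.1 62.4 48.8 90.7 68.3 874
1931 72.1 47.9 57 67.4 51.4 41.5 90 64 791"

# sep = "" splits on any run of spaces or tabs
beef <- read.table(text = beef_txt, header = TRUE, sep = "", comment.char = "%")

# PBE is now a numeric column, so it can appear in a formula.
# Only two predictors here, because 7 rows cannot support all nine.
fit <- lm(PBE ~ CBE + PPO, data = beef)
print(coef(fit))
```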

For this question (r - Multiple regression in R: variable not found in data.frame), a similar question was found on Stack Overflow: https://stackoverflow.com/questions/21803531/
