This is my data.frame, beef:
> head(beef)
YEAR....PBE CBE PPO CPO PFO DINC CFO RDINC RFP
1 1925 59.7 58.6 60.5 65.8 65.8 51.4 90.9 68.5 877
2 1926 59.7 59.4 63.3 63.3 68.0 52.6 92.1 69.6 899
3 1927 63 53.7 59.9 66.8 65.5 52.1 90.9 70.2 883
4 1928 71 48.1 56.3 69.9 64.8 52.7 90.9 71.9 884
5 1929 71 49.0 55.0 68.7 65.6 55.1 91.1 75.2 895
6 1930 74.2 48.2 59.6 66.1 62.4 48.8 90.7 68.3 874
and
dput(head(beef))
structure(list(YEAR....PBE = structure(1:6, .Label = c("1925 59.7",
"1926 59.7", "1927 63", "1928 71", "1929 71", "1930 74.2",
"1931 72.1", "1932 79", "1933 73.1", "1934 70.2",
"1935 82.2", "1936 68.4", "1937 73", "1938 70.2",
"1939 67.8", "1940 63.4", "1941 56"), class = "factor"),
CBE = c(58.6, 59.4, 53.7, 48.1, 49, 48.2), PPO = c(60.5,
63.3, 59.9, 56.3, 55, 59.6), CPO = c(65.8, 63.3, 66.8, 69.9,
68.7, 66.1), PFO = c(65.8, 68, 65.5, 64.8, 65.6, 62.4), DINC = c(51.4,
52.6, 52.1, 52.7, 55.1, 48.8), CFO = c(90.9, 92.1, 90.9,
90.9, 91.1, 90.7), RDINC = c(68.5, 69.6, 70.2, 71.9, 75.2,
68.3), RFP = c(877L, 899L, 883L, 884L, 895L, 874L)), .Names = c("YEAR....PBE",
"CBE", "PPO", "CPO", "PFO", "DINC", "CFO", "RDINC", "RFP"), row.names = c(NA,
6L), class = "data.frame")
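I want to build a multiple linear regression model for PBE based on the other variables. Following the tutorial at this link, I think I should run the following code: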
> lm(formula = PBE ~ CBE + PBO + CPO + PFO +
+ DINC + CFO+RDINC+RFP+YEAR, data = beef)
Error in eval(expr, envir, enclos) : object 'PBE' not found
So I decided to try the following, but each attempt also gives an error:
> lm(formula=PBE~YEAR,data=beef)
Error in eval(expr, envir, enclos) : object 'PBE' not found
> lm(formula=beef$PBE~beef$YEAR)
Error in model.frame.default(formula = beef$PBE ~ beef$YEAR, drop.unused.levels = TRUE) :
invalid type (NULL) for variable 'beef$PBE'
Can you tell me where the typo or mistake is?
P.S.: I read the file with
beef=read.table("beef.txt", header = TRUE, sep = "\t", comment.char="%")
The file looks like this:
% http://lib.stat.cmu.edu/DASL/Datafiles/agecondat.html
%
% Datafile Name: Agricultural Economics Studies
% Datafile Subjects: Agriculture , Economics , Consumer
% Story Names: Agricultural Economics Studies
% Reference: F.B. Waugh, Graphic Analysis in Agricultural Economics,
% Agricultural Handbook No. 128, U.S. Department of Agriculture, 1957.
% Authorization: free use
% Description: Price and consumption per capita of beef and pork
% annually from 1925 to 1941 together with other variables relevant to
% an economic analysis of price and/or consumption of beef and pork
% over the period.
% Number of cases: 17
% Variable Names:
%
% PBE = Price of beef (cents/lb)
% CBE = Consumption of beef per capita (lbs)
% PPO = Price of pork (cents/lb)
% CPO = Consumption of pork per capita (lbs)
% PFO = Retail food price index (1947-1949 = 100)
% DINC = Disposable income per capita index (1947-1949 = 100)
% CFO = Food consumption per capita index (1947-1949 = 100)
% RDINC = Index of real disposable income per capita (1947-1949 = 100)
% RFP = Retail food price index adjusted by the CPI (1947-1949 = 100)
%
% The Data:
YEAR PBE CBE PPO CPO PFO DINC CFO RDINC RFP
1925 59.7 58.6 60.5 65.8 65.8 51.4 90.9 68.5 877
1926 59.7 59.4 63.3 63.3 68 52.6 92.1 69.6 899
1927 63 53.7 59.9 66.8 65.5 52.1 90.9 70.2 883
1928 71 48.1 56.3 69.9 64.8 52.7 90.9 71.9 884
1929 71 49 55 68.7 65.6 55.1 91.1 75.2 895
1930 74.2 48.2 59.6 66.1 62.4 48.8 90.7 68.3 874
1931 72.1 47.9 57 67.4 51.4 41.5 90 64 791
Here is the result of View(beef), as Patrick suggested: [screenshot not included]
Best answer
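You need to go back and look at the file from which you loaded these data into R. The output of head() shows that the first variable is named YEAR....PBE, which means the PBE data have been merged with the YEAR variable, probably because of a problem with the separator used in the file you read in. Go back and check the file carefully.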
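To illustrate the diagnosis, here is a minimal sketch that reproduces the symptom. It assumes a hypothetical miniature file in which YEAR and PBE are separated by spaces while the remaining columns are separated by tabs; the real file may differ in detail.

```r
# Hypothetical miniature of the data file (an assumption for illustration):
# YEAR and PBE are separated by spaces, the other columns by tabs.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "% a comment line, skipped because comment.char = \"%\"",
  "YEAR    PBE\tCBE\tPPO",
  "1925 59.7\t58.6\t60.5",
  "1926 59.7\t59.4\t63.3"
), tmp)

# Reading with sep = "\t" treats "YEAR    PBE" as a single column name,
# which R then converts into the syntactically valid name "YEAR....PBE".
bad <- read.table(tmp, header = TRUE, sep = "\t", comment.char = "%")

names(bad)             # first name is the fused "YEAR....PBE"
"PBE" %in% names(bad)  # FALSE: there is no column called PBE
is.null(bad$PBE)       # TRUE: this is the NULL that lm() complains about
```

Because no column is called PBE, both the formula lookup (object 'PBE' not found) and beef$PBE (NULL) fail in the way shown in the question.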
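One way to do this from within R is count.fields(), to which you pass the name of the file you want to check. Be sure to read ?count.fields, as you may need to set the sep and quote arguments to match the file you are reading. The function tells you how many fields (variables) it found on each line; compare that with the known number of variables.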
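A rough sketch of that check, again on a hypothetical four-column file in which only some of the separators are tabs (the file contents here are an assumption, not the asker's actual file):

```r
# Hypothetical file with four variables; only some separators are tabs.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "YEAR    PBE\tCBE\tPPO",
  "1925 59.7\t58.6\t60.5",
  "1926 59.7\t59.4\t63.3"
), tmp)

# Number of fields per line when only a tab counts as a separator.
tab_counts <- count.fields(tmp, sep = "\t", comment.char = "%")

# Number of fields per line when any whitespace counts as a separator.
ws_counts <- count.fields(tmp, sep = "", comment.char = "%")

tab_counts
ws_counts
```

If the tab-based counts are lower than the number of variables you expect while the whitespace-based counts match it, the file is not purely tab-delimited.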
From your edit, it is clear that what I described above is what happened:
> names(beef)
[1] "YEAR....PBE" "CBE" "PPO" "CPO" "PFO"
[6] "DINC" "CFO" "RDINC" "RFP"
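It seems the file is not entirely tab-delimited. I was able to read the portion of the data you included by using sep = "", which makes any run of whitespace (spaces or tabs) act as the separator: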
beef <- read.table("file.name", header = TRUE, sep = "", comment.char = "%")
> head(beef)
YEAR PBE CBE PPO CPO PFO DINC CFO RDINC RFP
1 1925 59.7 58.6 60.5 65.8 65.8 51.4 90.9 68.5 877
2 1926 59.7 59.4 63.3 63.3 68.0 52.6 92.1 69.6 899
3 1927 63.0 53.7 59.9 66.8 65.5 52.1 90.9 70.2 883
4 1928 71.0 48.1 56.3 69.9 64.8 52.7 90.9 71.9 884
5 1929 71.0 49.0 55.0 68.7 65.6 55.1 91.1 75.2 895
6 1930 74.2 48.2 59.6 66.1 62.4 48.8 90.7 68.3 874
> str(beef)
'data.frame': 7 obs. of 10 variables:
$ YEAR : int 1925 1926 1927 1928 1929 1930 1931
$ PBE : num 59.7 59.7 63 71 71 74.2 72.1
$ CBE : num 58.6 59.4 53.7 48.1 49 48.2 47.9
$ PPO : num 60.5 63.3 59.9 56.3 55 59.6 57
$ CPO : num 65.8 63.3 66.8 69.9 68.7 66.1 67.4
$ PFO : num 65.8 68 65.5 64.8 65.6 62.4 51.4
$ DINC : num 51.4 52.6 52.1 52.7 55.1 48.8 41.5
$ CFO : num 90.9 92.1 90.9 90.9 91.1 90.7 90
$ RDINC: num 68.5 69.6 70.2 71.9 75.2 68.3 64
$ RFP : int 877 899 883 884 895 874 791
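Once the columns are read correctly, the regression call works. The sketch below uses only the seven rows shown in the question, so it fits just two predictors (an arbitrary choice for illustration); with the full 17-row file, more of the variables could be included. Note also that the formula in the question refers to PBO, whereas the column in the file is PPO.

```r
# The seven rows shown in the question, read with whitespace as the separator.
beef <- read.table(text = "
YEAR PBE CBE PPO CPO PFO DINC CFO RDINC RFP
1925 59.7 58.6 60.5 65.8 65.8 51.4 90.9 68.5 877
1926 59.7 59.4 63.3 63.3 68 52.6 92.1 69.6 899
1927 63 53.7 59.9 66.8 65.5 52.1 90.9 70.2 883
1928 71 48.1 56.3 69.9 64.8 52.7 90.9 71.9 884
1929 71 49 55 68.7 65.6 55.1 91.1 75.2 895
1930 74.2 48.2 59.6 66.1 62.4 48.8 90.7 68.3 874
1931 72.1 47.9 57 67.4 51.4 41.5 90 64 791
", header = TRUE)

# PBE is now a proper numeric column, so the formula can find it.
fit <- lm(PBE ~ CBE + PPO, data = beef)
coef(fit)
```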
This question, "R: Variable not found in data.frame in multiple regression", originally appeared on Stack Overflow: https://stackoverflow.com/questions/21803531/