I have a fairly large .dta file containing 1,280,000 observations. It works fine in Stata, but I am having trouble importing it into R.
The data were created in Stata 15 and contain strL or str# variables with # > 244, so they cannot be saved in Stata 12 format.
I am trying to import the saved data with read_dta() from the haven package, but it gives me the following error message: "Failed to parse /Users/folder/my_data.dta: Unable to allocate memory."
Does anyone know what might be causing this and how to overcome it so that I can import the data into R?
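I have tried to solve this in several ways, but none of my attempts seems to have worked:
1. Running Sys.setenv('R_MAX_VSIZE'=32000000000) to expand the memory available to my R environment. The console reported the same error when I tried to import the data, so the problem does not seem to be related to the memory size in R.
2. Saving the data in Stata 13 format from within Stata with saveold my_data13, version(13). Importing that file into R with haven still produced the same error message.
3. Reading the file with read.dta13(my_data13) (from the readstata13 package), but this often crashes R.
Strangely, I can open the data correctly in Stata just by double-clicking the file.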
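Does anyone have suggestions on how to solve this? Any insight into a) what the error message means and how to resolve it, b) alternative packages that can handle Stata 15 files, or c) any method that lets me open the data in R would be most welcome.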
Thank you very much for your help.
Best regards
Best answer
I just want to summarize everything that was mentioned in the comments.
From the official Stata documentation, we have the following:
strL variables can be 0 to 2-billion bytes long. strL variables are not required to be longer than 2,045 bytes. str# variables can store strings of up to 2,045 bytes, so strL and str# overlap. This overlap is comparable to the overlap of the numeric types int and float. Any number that can be stored as an int can be stored as a float. Similarly, any string that can be stored as a str#, can be stored as a strL. The reverse is not true. In addition, strL variables can hold binary strings, whereas str# variables can only hold text strings. Thus the analogy between str#/strL and int/float is exact. There will be occasions when you will want to use strL variables in preference to str# variables, just as there are occasions when you will want to use float variables in preference to int variables.
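Before trying to replicate your problem, the information above led me to infer two things:
1. strL is not necessary.
2. The use of strL is the root of your problem and may cause compatibility issues with the haven library.
However, after trying to replicate what you describe, I reached a different conclusion.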
Please look carefully at the following code, which simulates your problem.
version 16
clear all
set obs 1
*4200 character string generated here: http://www.unit-conversion.info/texttools/random-string-generator/
gen strL str_var_in_strL = "eSw0qZcVs5DHU2GxgRo1Seo9uTwJ0MvHXyYUQidJMRWw8KW1310Ec242O6D4xrLziO4c56WgluSddTy0Q64QapkwGgOMZdy8ru0fyss0nwJvF4M3kBjYGF00ZsvQGYt4DjF51R3vxTzUx4xlApKwaoRADIgFlXvBh2Bug0VVhmXR3uInHDfpmID57kVWiyxX1gELdyPMVzJWizEHVx2GpjBsm1UdRphDdukFtFrnkr1HFRXBekxHkW3uOCHz0wnyDBfwitDGHosctRrWPhIjujnoalOaHkI5jbnENSNJEsOdGohoe5QKZIxtXmVbD4l8m8wLCbuSjZLw8NzU5vjPX57T2yWWasdFMIHk3kFipT0CG3dNForECS8UiW6ZWSIEmO2V62uakfrxTsRb9fIFVBUIHpGizeR0b27OnfSVB2wE2Ix0ij7kR19jz0wIh35fbwkJWqLq93pfHEtGu0FTb8H5A4XNOcR8chEAQBI7zV3rosSGnSP2h9QZtuSAcz1TrRHNMpCguvNf1DD72TCfCaiBXyflOCre7f5zchLA7k2cQ5qi4fBMVc9GnAdGB2vnjFeFlwaUD0AEUhfSJJINRQ2CKfJegqUL0jBgHBVy5cYCNxsP8Gu8NXRUo6vvyiTJMDcBkL0JKNOT4usSDi4v86cJNzQQa3ArafRzOv1RFz8BfI7pP7rXDLD6d1Z1miCqTZ8UtJBVQ0Z0eCQmTrlAvlu5busOjcAl4ZV7THH6qCV8tI53zh1THBfjnEgoPxy8UIaIK6tXDUM4RFMMd1366324mJEVwyvc5CWgzPian39Q3GFLl6zXCfD4pw7rSUmH5CNOmKgPihxPbV9NSBxiwVK3M07KFS2hZbf3ZDB4CBSJV9geFWKZlR3XNrsPudQgkpsdywNNjZTDwD2RiHF7kQAgyEW7q1w42OC2IbreBBtiPekx6yzCEWBEokLwfhrhbOnDwcnFmfKjnrxCbqypXrSnyvrUP2nUQ9vBmdxCqiVLrBHuDi6Wv2U4vyZ7dTqk84WmnwACXo5PbYY2dmhtjscLMpRw4Q6xVUEWC3qPMnQkbI1UKEq1NfOrF0X8nC0rqrwHQuNuJqHuebJj5AMXVgyZWTaqYIb4gkbGaEze3wNmHbbj1q2bmumiwd6RZRSdx7U3ZwozO9kTkZ69NHFSa2QDi8GrhgvBDMshJVaOR9K8tWcpa2QrFD7cI0ZqzneLXHXm6LsOmtZPFmikKfyts1pASGwZ8DzuWfT3j0daNmyk6y0HwHwM98KOyeuSXnQJOJzunAXkidv90hrgviWUhP70Nrx527JI7vpRr2dClBBnzO2O7YwjTdKTrmfcPs9z4iLeroo20Jg9ODHjvUYWtLRTOKrgvYAgywkj2PoVdwmuYK32UKcqH5EdhPHxWarjsqUuBb40u6nUGIQ0YS7ZuzsnVDerB8hO3rCl0FlMMYgRh4vdPcEG23JQoIwTvdujULg6Lpplyt6yK16UhVSklj6aVNIoA4zr51dULyOzWF9ZqlZz7l90QpXvLuRD9Elr5gxWvNW4fvkCAU3kpEv0s7gHS7ytjNxm0WLk74bN1iP8ZjcxXXBqwtatCo9e1Ayc59VYR9RVxtfvilb038WpHglhWEZoK91rumPSFiCJWUmlkL6P4SAbz5b6LDdW9ybiN8zdZmNtQ2px556d7DF5RRcXgLocLH37Uh6uU9cz2wmWRrcJS4rO9MkUe6KSuVjVLXSsk6J1bnvvagWl4BkY8ZPm0iBg4XXTkRAjfVgnfex1hee47b6k9c5gdS6AJSVazCPpXQJlGJ7NpyAn3hXdHkhaGtokTmne6Zag8DterOyDldPXHXwrG7PgtsmREc0VugLVPrYEbdf9QMHBtGQLwQz04Gyg2lspZ5HbGnOkfI0MTanuMN7XnWdcGBko9gmQKbpONgPqg8POcpxG2aRefswG090hvYKj5gzp3r1nitZZhBm8KDUT
8P2Wy06hPxrkZinMGmBv2SIDegXr5uzceHymEnyMQZINS96QCyTiV7z1X1NZ3IBDfVPZTZ9bRxpKyMbAnYzFhx9PYSkescyMMtsGOEli1gFp2PWcqO4bpj0EnKjgWf9ae2R5nDKIkVbsNRCik3JrCM7WjHPfwdZSiA335Eyl0yoHQWjp6YJrR8ykOtw3zL2XHa2ilKIRSypG5dtDwjuqLI1fb7fB5wiG3LuowKqam8HY86aDsuu0DkpED9mAxoSvE7V6WPs5ptg31yoUOgGK1rvGtdpY7CHkaBmmv0jKNYjcZiuET6Q0If2IO36HafXJN8onjMvYadAypEY4IAxkmU2yemCFQkdDBuhr8G4DWcGO46W24QOvMghH6k3HVHeDgj5dNXKIz3rqbCC6tSNMptuQhoM2eWnwBFw6PzNIqKwB6qxbJMs9wxqvDEJlQkKgMZN4HJu1hNFpXIePLN8dSsV8xhgb1pxwFaqhLQMoYXcgobOdcrb7DDpJFbWhUXKn3WHEjO8nk4EAuNmIUdyfWxwxPPmTLEyohT7QrcjvRph82n64aRJIcyDPCho8pqtCTve84PAp4jIeechI8sl92e94jsX7XTZu3LaqDGkEEtcmp69ZqPA5Ev9NZv5ovmWNiu39kKu7QW0YnXGCvorirdScdCow4NyLgnpoAEG0FPz52oN7xU5xyTgY0x5Hel5GDsA28Qoy463tyBNuT51gQROr9XqgiM9Voq0ax1vI8QKFMLXwaLtHwPC7TkFtJbXYNmPt2kXzQ1EjLq0DGiyKd0BFMww3zpEjeSUS7KhCrf5qU7aDjIkVniPs6TGkTBOwG9ItrUv50WJfqgOd6ngHfYWzJFIAgZnGjXhtkHartO3F19iPs5VHRhTEUw2HZbgnTjmf2NmJ1onUkMNSFUMPrNkrfxl78GHNjJrGAkRU0jlXqAuIK8v6uYh4oyqSrFtzPru8GNIWCbka6LKLrMcCysoDk7VQI12xzELxUebiUsnLYCrnvmJtD0T7yv9M8H8rI4YsdbzD3forc56uvwqS0h0Bl6Sw71n781EC0R0V6067RA2TRZ39fs7yXYZ4O5pQ1uHn0qV82aZI1kHxWVJ1omu81KqoFpTnB0QuNd62AeVKRiuMiAf2UFhy40vFgFElRZFipH1TrJAuFYcgwd38kJWTGTYyW51z1DQfVZnlIegEfZQPTjnhFroayXw55MKnJGiVPQ4R0A2nO5LCDmDFmC8SyLz62pn0aQb5tvlQs5Es7woSf3SbxcKh9JndW42V8hQn4uXxbEKhAX9f48VJg5xXYsMOaBz4h0UfsOrOucFH3YVA9c7TVszXbSq7kRQkpsy3xkunDaIfdiAx5wdLE7LbPhUwrC1FWnCa2qQQxNUimxrQ35Woar9tNQSwpVd8ybEaivgQ77HPSYjTdkKX2j2TBCzmGVBesOUnWI9r34kRO5xPsPPDoJvLQe6kns75Yjcuz82OEuUai1PLVRKmzkRjyLp3tt5YDkjzuYCOWNchY3Eup1IEDvGf64wu4S1qLvRrl6HI9jZj7Li2GZc9grCTxbqpUQgCbCxdgmS6a396AJNijmG8uNnchGPlnNVm6DskG7T2pWasVuuhYhkyFNoUWuY5mBXurDMEDyyZPxlY9nlQYKHBNgg6ZnNEYnwCqTLzudDBQ48YG9r3700uvz83jJAX18s2Kjm2LlmOuPJON6rzbPua8Ac2Y0HPuQZD9Ikcim2MOyR9mbvtRTPeLAX3issevCDYaBiG6BFMaN9rW7j1UnlKQZYgTCveE4oH8tT7QGwdWENAjW4kGjS93zCS6QYxyUjg03er43KivMQOVHaT3iznZnQD3Nk5c0T9IKqRpcytY7JaRV7kmayUmKc4d1ApFqY8imlu2iTMiVfY16qMqDeulTtcKjKUuyWBrJSENwv238nWXQudShLeCsiwUMUnJvXyHdSsmAaaoG5O3RA4GkiQVAiX63tWPl6GNfweFAcpxoD4x8hZ
pbQ1SaBo3pRNwwHuAvzOwm0jKWndfugKqlUmPoDZb9Bx6dzDolUJtHSNYVrOACFY26SyeyeiFHnnd6wZFypkDGL4LgeBbU8TqJTjO2lFXVmCQZjTnO75V43vJKoHUrRSUkZJYTyl3c8tqWJmrUZ7lJo4cUrGzoudgzDHH8N8H73TwXboF8rbQ8i4yb9w51L0EnCd3kdmSXI0PJxuQ9CQ8AD9WKTwvEaSWDTZ"
describe
Contains data
obs: 1
vars: 1
------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------
str_var_in_strL strL %9s
------------------------------------------------------------------------------------
save "route\all_in_strL" , replace
After that, I applied your modifications to the data I generated.
gen var1 = str_var_in_strL
foreach var of varlist var1 {
generate str `var'_str = `var'
replace `var' = ""
compress `var'
replace `var' = `var'_str
drop `var'_str
describe `var'
}
obs: 1
vars: 2 16 Dec 2020 12:07
------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------------
str_var_in_strL strL %9s
var1 strL %9s
------------------------------------------------------------------------------------
save "route\edited_strL" , replace
Strangely, I was able to import both files into R using the haven library with the following code.
library(haven)
file <- "route_to_first_file/all_in_strL.dta"
# Import first file into R.
dta_in_R <- read_dta(
file,
encoding = NULL,
col_select = NULL,
skip = 0,
n_max = Inf,
.name_repair = "unique")
# Import edited file using your loop method into R.
file <- "route_to_edited_file/edited_strL.dta"
edited_dta_in_R<- read_dta(
file,
encoding = NULL,
col_select = NULL,
skip = 0,
n_max = Inf,
.name_repair = "unique")
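As a side check on the imports above, here is a minimal sketch (not part of the original answer) that writes a 4,200-character string to a temporary .dta file, reads it back with haven, and confirms that the string survives the round trip. The temporary file, the column name str_var, and the worst-case assumption that each of the 1,280,000 rows holds a distinct string of that length are all hypothetical. It also assumes haven 2.4.0 or later, where write_dta() stores strings longer than strl_threshold (2,045 bytes by default) as strL.

```r
library(haven)

# Hypothetical stand-in for the simulated Stata file: one row, one long string.
long_string <- strrep("a", 4200)
demo <- data.frame(str_var = long_string, stringsAsFactors = FALSE)

# Assumption: haven >= 2.4.0, which writes strings above 2,045 bytes as strL.
path <- tempfile(fileext = ".dta")
write_dta(demo, path, version = 14)

# Read the file back and check that the long string was not truncated.
back <- read_dta(path)
n_chars <- nchar(back$str_var[1])

# Rough worst-case footprint if every one of 1,280,000 rows held such a string.
est_gb <- 1280000 * n_chars / 1e9
```

Under that worst-case assumption the string data alone would need roughly 5.4 GB, which is the kind of footprint where an allocation failure becomes plausible. The real file may of course hold much shorter strings.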
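The only difference here may be the following: in the end, I think the root of the problem is not the strL data type but the memory available on your machine, which was probably relieved by the compress you introduced into the for loop you describe.
PS: Everything was run on Win10, with R version 4.0.3 (2020-10-10) and haven_2.3.1.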
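If memory is indeed the limit, one way to test that is to read only part of the file. The following is a minimal sketch using the n_max, skip, and col_select arguments of read_dta() that already appear in the calls above. The small temporary file, its column names (id, note), and the chunk size are hypothetical stand-ins for the real data; whether partial reads avoid the allocation error on the real file is an assumption that would need to be checked.

```r
library(haven)

# Hypothetical stand-in for the real file: a small .dta written to a temp path.
path <- tempfile(fileext = ".dta")
demo <- data.frame(id = 1:10,
                   note = sprintf("row %d", 1:10),
                   stringsAsFactors = FALSE)
write_dta(demo, path)

# 1) Read only the first few rows to confirm the file parses at all.
head_rows <- read_dta(path, n_max = 3)

# 2) Read only selected columns, e.g. leaving out a large string variable.
ids_only <- read_dta(path, col_select = id)

# 3) Read the file in chunks of rows and combine them afterwards.
n_rows <- 10       # in practice, the observation count reported by Stata
chunk_size <- 4    # hypothetical; choose a size that fits in memory
chunks <- list()
for (start in seq(0, n_rows - 1, by = chunk_size)) {
  chunks[[length(chunks) + 1]] <- read_dta(path, skip = start, n_max = chunk_size)
}
all_rows <- do.call(rbind, chunks)
```

If the first few rows load but the full file does not, that points toward a memory limit rather than a format incompatibility. Reading in chunks also makes it possible to process or save each piece before loading the next.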