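I have a data frame named `prices` that I obtained by web scraping. The goal is to track the daily prices of stocks on the Zimbabwe Stock Exchange.
Scraping the web page: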
library(rvest)
library(stringr)
library(reshape2)

# Data from African Financials
url <- "https://africanfinancials.com/zimbabwe-stock-exchange-share-prices/"
prices <- url %>%
  read_html() %>%
  html_table(fill = TRUE)
# Keep the first table found on the page
prices <- prices[[1]]
The prices data frame:
> prices
Counter PriceRTGS cents Volume ChangeRTGS cents ChangePercent YTDPercent
1 AFDS.zw Afdis 169.75 4 Apr 19 0 0.00 0.00% 10.95%
2 ARIS.zw Ariston 2.90 4 Apr 19 572 -0.03 -1.02% 20.83%
3 ARTD.zw ART Holdings 9.20 4 Apr 19 0 0.00 0.00% 4.55%
I want to split the `PriceRTGS cents` column into two columns, `Price RTGS Cents` and `Date`.
I tried the code below, but it captures the 4 from the date in the price column.
str_split_fixed(prices$`PriceRTGS cents`," ", 2)
colsplit(prices$`PriceRTGS cents`," ",c("Price RTGS Cents", "Date"))
I would like the output to look like this:
Counter Price RTGS Cents Date Volume ChangeRTGS cents ChangePercent YTDPercent
1 AFDS.zw Afdis 169.75 4/04/2019 0 0.00 0.00% 10.95%
2 ARIS.zw Ariston 2.90 4/04/2019 572 -0.03 -1.02% 20.83%
3 ARTD.zw ART Holdings 9.20 4/04/2019 0 0.00 0.00% 4.55%
Sample data (dput output):
structure(list(Counter = c("AFDS.zw Afdis", "ARIS.zw Ariston",
"ARTD.zw ART Holdings", "ASUN.zw Africansun", "AXIA.zw Axia",
"BAT.zw BAT"), `PriceRTGS cents` = c("169.75 4 Apr 19", "2.90 4 Apr 19",
"9.20 4 Apr 19", "15.00 4 Apr 19", "35.05 4 Apr 19", "3,000.00 4 Apr 19"
), Volume = c("0", "572", "0", "0", "8,557", "0"), `ChangeRTGS cents` = c(0,
-0.03, 0, 0, 0, 0), ChangePercent = c("0.00%", "-1.02%", "0.00%",
"0.00%", "0.00%", "0.00%"), YTDPercent = c("10.95%", "20.83%",
"4.55%", "50.00%", "-22.11%", "-9.09%")), row.names = c(NA, 6L
), class = "data.frame")
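Best answer
I just copied and pasted your first prices data into a text editor and replaced the separating spaces with ";" (I had not yet seen the data version you posted).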
prices <- read.table("dat.txt", sep=";", header=T)
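A side note on the column names used in the code that follows: with its default `check.names = TRUE`, `read.table()` rewrites headers that are not syntactically valid R names, which is why `PriceRTGS cents` is referred to as `PriceRTGS.cents` in this answer. The small sketch below illustrates this with an inline two-line sample instead of `dat.txt`; the sample text and the names `dat` and `dat_raw` are assumptions made for illustration only.

```r
# Inline stand-in for dat.txt (illustrative sample, not the full data)
txt <- "Counter;PriceRTGS cents\nAFDS.zw Afdis;169.75 4 Apr 19"

# Default behaviour: the space in the header is replaced by a dot
dat <- read.table(text = txt, sep = ";", header = TRUE,
                  stringsAsFactors = FALSE)
names(dat)

# check.names = FALSE keeps the header exactly as written in the file
dat_raw <- read.table(text = txt, sep = ";", header = TRUE,
                      check.names = FALSE, stringsAsFactors = FALSE)
names(dat_raw)
```

If you work on the scraped data frame directly, the original names with spaces are kept, so the column has to be accessed with backticks, as in the question.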
The code is a bit "quick and dirty", but it works:
library(stringr)

# Split at the first space only: price on the left, date text on the right
parts <- str_split_fixed(prices$PriceRTGS.cents, " ", 2)
new_prices <- data.frame(prices$Counter, parts, prices$Volume, prices$ChangeRTGS.cents, prices$ChangePercent, prices$YTDPercent)
colnames(new_prices) <- c("Counter", "PriceRTGS_cents", "Date", "Volume", "ChangeRTGS cents", "ChangePercent", "YTDPercent")
# Turn "4 Apr 19" into "4/04/19" (the year stays two-digit)
new_prices$Date <- gsub("Apr", "04", new_prices$Date)
new_prices$Date <- gsub(" ", "/", new_prices$Date)
new_prices
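As a variation on the same idea, here is a minimal sketch that works on the original column name from the question and also turns the price into a number. It is not the answerer's code: it swaps `str_split_fixed()` for base R `sub()`, and it assumes every value has the form "price, whitespace, date" as in the sample data. The two-row data frame below is copied from the question's dput output.

```r
# Two sample rows taken from the question's dput output
prices <- data.frame(
  Counter = c("AFDS.zw Afdis", "BAT.zw BAT"),
  `PriceRTGS cents` = c("169.75 4 Apr 19", "3,000.00 4 Apr 19"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

raw <- prices$`PriceRTGS cents`

# Everything before the first run of whitespace is the price;
# everything after it is the date text.
price_text <- sub("^(\\S+)\\s+.*$", "\\1", raw)
date_text  <- sub("^\\S+\\s+", "", raw)

# Drop thousands separators so "3,000.00" can be read as a number
prices$`Price RTGS Cents` <- as.numeric(gsub(",", "", price_text))
prices$Date <- date_text

# Remove the combined column now that it has been split
prices$`PriceRTGS cents` <- NULL
prices
```

Converting the price to numeric at this point means values such as "3,000.00" are not left behind as text when you later sort or plot them.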
If you have months other than April, just add more lines (for example, for November):
new_prices$Date <- gsub("Nov", "11", new_prices$Date)
new_prices$Date <- gsub(" ", "/", new_prices$Date)
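Rather than writing one `gsub()` per month, the month abbreviation can be looked up in base R's built-in `month.abb` vector. The sketch below is one possible way to do that; `parse_short_date` is a hypothetical helper name, and it assumes the text is always "day month-abbreviation two-digit-year" with English abbreviations and years in the 2000s.

```r
# Hypothetical helper: convert text such as "4 Apr 19" to a Date
parse_short_date <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  day   <- as.integer(vapply(parts, `[`, character(1), 1))
  # month.abb is c("Jan", ..., "Dec"), so match() gives the month number
  month <- match(vapply(parts, `[`, character(1), 2), month.abb)
  # Assumption: two-digit years belong to the 2000s
  year  <- 2000L + as.integer(vapply(parts, `[`, character(1), 3))
  as.Date(sprintf("%04d-%02d-%02d", year, month, day))
}

dates <- parse_short_date(c("4 Apr 19", "15 Nov 19"))
format(dates, "%d/%m/%Y")
# [1] "04/04/2019" "15/11/2019"
```

Keeping the column as a `Date` and formatting it only for display makes it easier to sort and filter the daily prices later on.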
Regarding "r - Splitting a string column into 2 columns, one numeric and the other a date", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55536695/