r - Split a string column into 2 columns, one a number and the other a date

Tags: r dataframe split

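I have a data frame called "prices" that I obtained by web scraping. The goal is to track the daily prices of shares on the Zimbabwe Stock Exchange.

Scraping the page from the website: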

library(rvest)
library(stringr)
library(reshape2)
# Data from African Financials
url <- "https://africanfinancials.com/zimbabwe-stock-exchange-share-prices/"
prices <- url %>%
  read_html() %>%
  html_table(fill = T)
prices <- prices[[1]]

The prices data frame:

> prices

                   Counter   PriceRTGS cents  Volume ChangeRTGS cents ChangePercent YTDPercent
1            AFDS.zw Afdis   169.75 4 Apr 19       0             0.00         0.00%     10.95%
2          ARIS.zw Ariston     2.90 4 Apr 19     572            -0.03        -1.02%     20.83%
3     ARTD.zw ART Holdings     9.20 4 Apr 19       0             0.00         0.00%      4.55%

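I want to split the "PriceRTGS cents" column into two columns, "Price RTGS Cents" and "Date".

I tried the code below, but it puts the day of the month (4) from the date into the price column.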

str_split_fixed(prices$`PriceRTGS cents`," ", 2)
colsplit(prices$`PriceRTGS cents`," ",c("Price RTGS Cents", "Date"))

I would like the output to look like this:

                   Counter   Price RTGS Cents              Date         Volume ChangeRTGS cents ChangePercent YTDPercent
1            AFDS.zw Afdis             169.75         4/04/2019              0             0.00         0.00%     10.95%
2          ARIS.zw Ariston               2.90         4/04/2019            572            -0.03        -1.02%     20.83%
3     ARTD.zw ART Holdings               9.20         4/04/2019              0             0.00         0.00%      4.55%

Data (dput output):

structure(list(Counter = c("AFDS.zw Afdis", "ARIS.zw Ariston", 
"ARTD.zw ART Holdings", "ASUN.zw Africansun", "AXIA.zw Axia", 
"BAT.zw BAT"), `PriceRTGS cents` = c("169.75 4 Apr 19", "2.90 4 Apr 19", 
"9.20 4 Apr 19", "15.00 4 Apr 19", "35.05 4 Apr 19", "3,000.00 4 Apr 19"
), Volume = c("0", "572", "0", "0", "8,557", "0"), `ChangeRTGS cents` = c(0, 
-0.03, 0, 0, 0, 0), ChangePercent = c("0.00%", "-1.02%", "0.00%", 
"0.00%", "0.00%", "0.00%"), YTDPercent = c("10.95%", "20.83%", 
"4.55%", "50.00%", "-22.11%", "-9.09%")), row.names = c(NA, 6L
), class = "data.frame")

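Best answer

I just copied your first prices data, pasted it into a text editor, and replaced the column-separating spaces with ";" (I had not yet seen the dput version of your data).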

prices <- read.table("dat.txt", sep=";", header=T)

The code is a bit "quick and dirty", but it works:

library(stringr)
# Split each value at the first space: column 1 is the price, column 2 is the date text
parts <- str_split_fixed(prices$PriceRTGS.cents, " ", 2)
new_prices <- data.frame(prices$Counter, parts, prices$Volume, prices$ChangeRTGS.cents, prices$ChangePercent, prices$YTDPercent)
colnames(new_prices) <- c("Counter", "PriceRTGS_cents", "Date", "Volume", "ChangeRTGS cents", "ChangePercent", "YTDPercent")
# Turn "4 Apr 19" into "4/04/19": replace the month name first, then the spaces
new_prices$Date <- gsub("Apr", "04", new_prices$Date)
new_prices$Date <- gsub(" ", "/", new_prices$Date)
new_prices
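As a hedged alternative, the split can also be done in base R with a regular expression that breaks only at the first run of whitespace, which makes it easy to convert the price to a number at the same time. This is a minimal sketch, assuming the column looks like the dput sample in the question; the helper name `split_price_date` is hypothetical, and using `[[:space:]]` as the separator is an assumption (it may not match every kind of space a scraped page can contain).

```r
# Hypothetical helper: split "3,000.00 4 Apr 19" into a numeric price and a date string.
split_price_date <- function(x) {
  pattern <- "^([^[:space:]]+)[[:space:]]+(.*)$"
  # Everything before the first whitespace run is the price text
  price_txt <- sub(pattern, "\\1", x)
  # Everything after it is the date text
  date_txt <- sub(pattern, "\\2", x)
  data.frame(
    # Drop thousands separators such as the comma in "3,000.00" before converting
    Price = as.numeric(gsub(",", "", price_txt, fixed = TRUE)),
    Date = date_txt,
    stringsAsFactors = FALSE
  )
}

sample_col <- c("169.75 4 Apr 19", "2.90 4 Apr 19", "3,000.00 4 Apr 19")
split_price_date(sample_col)
```

The result can then be bound back onto the other columns with `cbind()` or `data.frame()`, as in the answer above.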

If you have months other than April, just add a line for each one before the space replacement (for example, for November):

new_prices$Date <- gsub("Nov", "11", new_prices$Date)
new_prices$Date <- gsub(" ", "/", new_prices$Date)
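Instead of one `gsub()` call per month, the month abbreviation can be looked up in R's built-in `month.abb` constant, which always holds the English abbreviations and so does not depend on the session locale. The following is only a sketch, assuming dates of the form "4 Apr 19" with a two-digit year in the 2000s; the function name `to_dmy` is hypothetical.

```r
# Hypothetical helper: convert "4 Apr 19" to "4/04/2019".
to_dmy <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(parts, function(p) {
    day <- as.integer(p[1])
    month <- match(p[2], month.abb)   # "Apr" -> 4; NA for unknown abbreviations
    year <- 2000L + as.integer(p[3])  # assumption: two-digit years mean 20xx
    sprintf("%d/%02d/%d", day, month, year)
  }, character(1))
}

to_dmy(c("4 Apr 19", "12 Nov 19"))
# [1] "4/04/2019"  "12/11/2019"
```

This produces the four-digit year shown in the desired output, which the `gsub()` approach leaves as "19".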

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/55536695/
