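I would like to know how to use the DataFrame below and regular expressions to put the values in each row into the correct order. As you can see at indices 2 and 4, for example, Quantity and Piece appear in the wrong order. Does anyone know how to solve this?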
import pandas as pd

data = [['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4'],['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4']]
df = pd.DataFrame(data, columns=['Information'])
df
+-------+--------------------------------------+
| index | Information                          |
+-------+--------------------------------------+
|     0 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
|     1 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
|     2 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
|     3 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
|     4 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
|     5 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
|     6 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
|     7 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
|     8 | Total 8\r\r\nQuantity 2\r\r\nPiece 4 |
|     9 | Total 8\r\r\nPiece 2\r\r\nQuantity 4 |
+-------+--------------------------------------+
import re

dt = pd.DataFrame(df)
data = []
for item in dt['Information']:
    # Grab the three numbers in the order they appear in the string
    regex = re.findall(r"(\d+)\D+(\d+)\D+(\d+)", item)
    quantity = re.findall(r"\bTotal\s?\d\D+(\bQuantity)", item)
    piece = re.findall(r"\bTotal\s?\d\D+(\bPiece)", item)
    regex = (map(list, regex))
    data.append(list(map(int, list(regex)[0])))
dftotal = pd.DataFrame(data, columns=['Total', 'Quantity', 'Piece'])
print(dftotal)
With this code I get the columns shown below:
+-------+----------+-------+
| Total | Quantity | Piece |
+-------+----------+-------+
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
+-------+----------+-------+
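How can I swap the misordered values in the `data` list and put the correct values into a single DataFrame, so that I get a DataFrame like the one below?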
+-------+----------+-------+
| Total | Quantity | Piece |
+-------+----------+-------+
|     8 |        2 |     4 |
|     8 |        4 |     2 |
|     8 |        2 |     4 |
|     8 |        4 |     2 |
|     8 |        2 |     4 |
|     8 |        2 |     4 |
|     8 |        4 |     2 |
|     8 |        2 |     4 |
|     8 |        4 |     2 |
+-------+----------+-------+
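One possible way to repair the loop-based approach, sketched here under the assumption that only the labels Total, Quantity and Piece ever occur: capture each label together with its number, then build every row from a dict so that the column is chosen by name instead of by position. The names `rows` and `records` are illustrative and do not come from the original post.

```python
import re
import pandas as pd

# Two sample strings in the same shape as the 'Information' column
rows = [
    "Total 8\r\r\nQuantity 2\r\r\nPiece 4",
    "Total 8\r\r\nPiece 2\r\r\nQuantity 4",
]

records = []
for item in rows:
    # Each match is a (label, number) tuple, e.g. ("Quantity", "2")
    pairs = re.findall(r"(Total|Quantity|Piece)\s+(\d+)", item)
    # Key the numbers by label so their order in the string is irrelevant
    records.append({label: int(number) for label, number in pairs})

# The columns argument fixes the column order of the result
dftotal = pd.DataFrame(records, columns=["Total", "Quantity", "Piece"])
print(dftotal)
```

Because the dict is keyed by label, a row written as "Piece 2 ... Quantity 4" still ends up with 4 under Quantity and 2 under Piece.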
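Best answer
Here is one approach using str.extract, which pulls each value out by its label rather than by its position in the string.
For example: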
import pandas as pd
data = [['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4'],['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'],['Total 8\r\r\nPiece 2\r\r\nQuantity 4'], ['Total 8\r\r\nQuantity 2\r\r\nPiece 4'], ['Total 8\r\r\nPiece 2\r\r\nQuantity 4']]
df = pd.DataFrame(data, columns = ['Information'])
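# Look up each number by its label, so the order inside the string does not matter
df["Total"] = df["Information"].str.extract(r"Total (\d+)", expand=False)
df["Quantity"] = df["Information"].str.extract(r"Quantity (\d+)", expand=False)
df["Piece"] = df["Information"].str.extract(r"Piece (\d+)", expand=False)
df = df.drop(columns="Information")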
print(df)
Output:
  Total Quantity Piece
0     8        2     4
1     8        2     4
2     8        4     2
3     8        2     4
4     8        4     2
5     8        2     4
6     8        2     4
7     8        4     2
8     8        2     4
9     8        4     2
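As a possible variation on the approach above, `str.extractall` can collect every label/number pair in a single pass and the result can then be reshaped into columns. This is only a sketch and not part of the original answer: the group names `label` and `value` and the variable names are assumptions, and it presumes each label occurs at most once per row. It also converts the extracted strings to integers, which the `str.extract` version leaves as text.

```python
import pandas as pd

df = pd.DataFrame({"Information": [
    "Total 8\r\r\nQuantity 2\r\r\nPiece 4",
    "Total 8\r\r\nPiece 2\r\r\nQuantity 4",
]})

# One output row per match, indexed by (original row, match number)
pairs = df["Information"].str.extractall(r"(?P<label>[A-Za-z]+)\s+(?P<value>\d+)")

wide = (
    pairs.droplevel("match")                  # keep only the original row index
         .set_index("label", append=True)["value"]
         .unstack("label")                    # labels become columns
)

# unstack sorts the columns alphabetically, so restore the desired order
wide = wide[["Total", "Quantity", "Piece"]].astype(int)
print(wide)
```

The reshaping step means new labels would show up as extra columns without any change to the regular expression, at the cost of being less explicit than one `str.extract` call per column.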
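Regarding "python - How to use regular expressions to change the order of data records and put them in one DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56148015/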