My table:
New York  3       books        1000
London    2,25                 2000
Paris     1.000   apples       3000
          30                   4000
Berlin            newspapers
I want to keep the empty fields of the table, fill them with the value xxxx, and put the whole table into a list:
New York  3       books        1000
London    2,25    xxxx         2000
Paris     1.000   apples       3000
xxxx      30      xxxx         4000
Berlin    xxxx    newspapers   xxxx
What I did was take each line and split it:
import re

finallist = []
for line in lines:  # lines: the table rows as strings (name assumed here)
    listtemp = re.split(r"\s{2,}", line)
    finallist.append(listtemp)
Then I zipped the list:
zippedlist = zip(*finallist)
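I then checked whether each column (now a row) had enough elements and appended the missing xxxx elements at the end. This doesn't work, because it zips the columns and the line split doesn't pick up the blanks inside a column.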
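To make the failure concrete, here is a small sketch showing that splitting on runs of two or more spaces swallows an empty field, so the row comes out one element short and the remaining values shift left. The row string and its exact spacing are assumptions typed in for illustration.

```python
import re

# The "London" row; the wide gap hides an empty third column.
# The exact spacing is an assumption for illustration.
row = "London    2,25                 2000"

fields = re.split(r"\s{2,}", row)
print(fields)       # ['London', '2,25', '2000']
print(len(fields))  # 3, although the table has 4 columns
```

Nothing in the split result records where the gap was, which is why padding after zipping cannot put xxxx in the right column.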
How can I fill the table with xxxx elements and put them into a list like this:
[['New York','3','books','1000'],['London','2,25','xxxx','2000'],['Paris','1.000','apples','3000'],['xxxx','30','xxxx','4000'],['Berlin','xxxx','newspapers','xxxx']]
Another table might be:
New York 3 books 1000
London 2,25 2000
Paris 1.000 3000
30 4000
Berlin apples newspapers
Update
Neither answer gave a complete solution, but by combining the two I found a different one (after many attempts...):
import re
import regex  # third-party module; allows the variable-length lookbehind below

# list of all lines
r = ['New York 3 books 1000 ', ' London 2,25 2000 ', ' Paris 1.000 3000 ', ' 30 4000 ', ' Berlin apples newspapers ']

# split each line into its non-empty fields
separator = r"\s{2,}"
mylist = []
for line in r:
    mylisttemp = re.split(separator, line.strip())
    mylist.append(mylisttemp)

# search for column matches: start offset of every field on every line
p = regex.compile(r"^(?<=\s*)\S|(?<=\s{2,})\S")
i = []
for line in r:
    itemp = []
    for m in p.finditer(line):
        itemp.append(m.start())
    i.append(itemp)

# find out which columns the fields on the following lines belong to by
# comparing each field offset with all the offsets of the first line
# (the one with the smallest difference is the match)
i_currentcols = []
i_0_indexes = list(range(len(i[0])))
for n in range(1, len(mylist)):
    if len(i[n]) == len(i[0]):
        continue
    i_new = []
    for b in range(len(i[n])):
        difference = []
        for c in range(len(i[0])):  # first line is always correct
            difference.append(abs(i[0][c] - i[n][b]))
        i_new.append(difference.index(min(difference)))
    i_notinside = sorted(elem for elem in i_0_indexes if elem not in i_new)
    # add line number as the first element
    i_notinside.insert(0, str(n))
    i_currentcols.append(i_notinside)

# insert missing fields in list
for cols in i_currentcols:
    for col in cols[1:]:  # cols[0] is the line number
        mylist[int(cols[0])].insert(col, "xxxx")
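Best answer
This was quite challenging, but I came up with a solution in two steps:
Step 1: Detect the column start positions
The complication here is that in some rows a column is empty.
The approach: every double space followed by a non-space character marks the start of a new column. Position 0 is always a column start. Scan every line for these column starts: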
import re

t = """New York  3       books        1000
London    2,25                 2000
Paris     1.000   apples       3000
          30                   4000
Berlin            newspapers """
p = re.compile("  [^ ]")  # two spaces followed by a non-space character
i = set([0])
for line in t.split('\n'):
    for m in p.finditer(line):
        i.add(m.start() + 2)  # skip the two spaces to reach the column start
i = sorted(i)
Output: [0, 10, 18, 31]
Step 2: Tokenize each line at these positions
def split_line_by_indexes(indexes, line):
    tokens = []
    indexes = indexes + [len(line)]
    for i1, i2 in zip(indexes[:-1], indexes[1:]):  # consecutive pairs
        tokens.append(line[i1:i2].rstrip())
    return tokens

for line in t.split('\n'):
    print(split_line_by_indexes(i, line))
Output:
['New York', '3', 'books', '1000']
['London', '2,25', '', '2000']
['Paris', '1.000', 'apples', '3000']
['', '30', '', '4000']
['Berlin', '', 'newspapers', '']
Of course, instead of printing, you can replace the empty values with xxxx and write the result back to a file.
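As a rough sketch of that last step, the snippet below replaces each empty token with xxxx and collects the rows in a list. It repeats a compact version of both steps so that it runs on its own. The names `detect_column_starts` and `fill_table` and the `placeholder` parameter are made up for illustration, and the spacing in `table` is assumed to match the offsets [0, 10, 18, 31] found in step 1.

```python
import re

# Fixed-width sample table; spacing assumed to match offsets [0, 10, 18, 31].
table = (
    "New York  3       books        1000\n"
    "London    2,25                 2000\n"
    "Paris     1.000   apples       3000\n"
    "          30                   4000\n"
    "Berlin            newspapers"
)


def detect_column_starts(lines):
    """Return sorted offsets where a column starts (0, or after 2+ spaces)."""
    starts = {0}
    for line in lines:
        for m in re.finditer(r"  [^ ]", line):
            starts.add(m.start() + 2)
    return sorted(starts)


def fill_table(text, placeholder="xxxx"):
    """Cut every row at the detected offsets and fill the empty cells."""
    lines = text.split("\n")
    bounds = detect_column_starts(lines)
    rows = []
    for line in lines:
        # Pair each start with the next start; the last cell runs to the end.
        cells = [line[a:b].strip() for a, b in zip(bounds, bounds[1:] + [None])]
        rows.append([cell or placeholder for cell in cells])
    return rows


result = fill_table(table)
for row in result:
    print(row)
```

This only works while the table is truly fixed-width: every value has to start at its column offset and be separated from its neighbour by at least two spaces, which is the same assumption step 1 relies on.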
Regarding "python - How to fill missing elements in a table?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36643456/