My table:

New York  3       books        1000
London    2,25                 2000
Paris     1.000   apples       3000
          30                   4000
Berlin            newspapers

I want to keep the empty fields in the table, fill them with the value xxxx, and put the whole table into a list.

New York  3       books        1000
London    2,25    xxxx         2000
Paris     1.000   apples       3000
xxxx      30      xxxx         4000
Berlin    xxxx    newspapers   xxxx

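What I did was take each line and split it.

import re

# lines: the table rows as strings (assumed to have been read already)
finallist = []
for line in lines:
    listtemp = re.split(r"\s{2,}", line)
    finallist.append(listtemp)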

Then I zipped the list:

zippedlist = zip(*finallist)

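I then checked whether the columns (now rows) had enough elements and appended xxxx at the end for the missing ones. This does not work, because zip operates on the rows after they have been split, and the row split does not pick up the blanks in a column.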
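To make the failure concrete, here is a minimal sketch using the first table. The variable names (rows, split_rows, lengths, columns) are only illustrative. It shows that splitting on runs of two or more whitespace characters returns rows of different lengths, and that zip then cuts everything down to the shortest row.

```python
import re

rows = [
    "New York  3       books        1000",
    "London    2,25                 2000",
    "Paris     1.000   apples       3000",
    "          30                   4000",
    "Berlin            newspapers",
]

# An empty field merges with the whitespace around it, so the split
# returns fewer items and the position of the gap is lost.
split_rows = [re.split(r"\s{2,}", row) for row in rows]
lengths = [len(fields) for fields in split_rows]

# zip() stops at the shortest row, so the remaining columns are dropped.
columns = list(zip(*split_rows))

print(lengths)
print(len(columns))
```

Because the split rows no longer record which column a value came from, padding at the end cannot put xxxx in the right place.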

How can I fill the table with xxxx elements and put it into a list like this:

[['New York','3','books','1000'],['London','2,25','xxxx','2000'],['Paris','1.000','apples','3000'],['xxxx','30','xxxx','4000'],['Berlin','xxxx','newspapers','xxxx']]

Another table could be:

New York      3         books   1000  
  London      2,25              2000  
   Paris  1.000                 3000  
             30                 4000  
  Berlin  apples    newspapers

Update

Neither answer gave a complete solution, but using both of them I found a different solution (after many attempts...):

import re
import regex  # third-party module (pip install regex); allows variable-length lookbehind

#list of all lines
r = ['New York      3         books   1000  ', '  London      2,25              2000  ', '   Paris  1.000                 3000  ', '             30                 4000  ', '  Berlin  apples    newspapers ']

#split list
separator = r"\s{2,}"
mylist = []
for i in range(0,len(r)):
   mylisttemp = re.split(separator, r[i].strip())
   mylist.append(mylisttemp)

#search for column matches
p = regex.compile(r"^(?<=\s*)\S|(?<=\s{2,})\S")

i = []
for n in range(0,len(r)):
   itemp = []
   for m in p.finditer(r[n]):
      itemp.append(m.start())
   i.append(itemp)

#find out which matches are on next lines comparing the column match with all the matches of first line (the one with the smallest difference is the match). 
i_currentcols = []
i_0_indexes = list(range(0,len(i[0])))
for n in range(1,len(mylist)):
   if len(i[n]) == len(i[0]):
      continue
   else:
      i_new = []
      for b in range(0,len(i[n])):
         difference = []
         for c in range(0,len(i[0])): #first line is always correct
             difference.append(abs(i[0][c]-i[n][b]))
         i_new.append(difference.index(min(difference)))
      i_notinside = sorted([elem for elem in i_0_indexes if elem not in i_new ], key=int)
      #add linenr.
      i_notinside.insert(0, str(n))
      i_currentcols.append(i_notinside)

#insert missing fields in list (k is used so the list i above is not overwritten)
for n in range(0,len(i_currentcols)):
    for k in range(1,len(i_currentcols[n])):
       mylist[int(i_currentcols[n][0])].insert(i_currentcols[n][k], "xxxx")
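The idea in the update (assign each cell to the column of the first line whose start position is nearest) can also be sketched more compactly with only the standard re module. This is a sketch, not the original solution. It assumes that the first line has every column filled, that cells never contain two consecutive spaces, and that no two cells in a row are nearest to the same column. The name fill_rows and its filler parameter are hypothetical.

```python
import re

def fill_rows(lines, filler="xxxx"):
    """Put each cell into the column whose start in the first line is nearest."""
    cell = re.compile(r"\S+(?: \S+)*")  # a cell may contain single spaces
    header_starts = [m.start() for m in cell.finditer(lines[0])]
    table = []
    for line in lines:
        row = [filler] * len(header_starts)
        for m in cell.finditer(line):
            # Index of the first-line column start closest to this cell's start.
            nearest = min(range(len(header_starts)),
                          key=lambda c: abs(header_starts[c] - m.start()))
            row[nearest] = m.group()
        table.append(row)
    return table

r = ['New York      3         books   1000  ',
     '  London      2,25              2000  ',
     '   Paris  1.000                 3000  ',
     '             30                 4000  ',
     '  Berlin  apples    newspapers ']

for row in fill_rows(r):
    print(row)
```

Starting every row as a list of fillers and overwriting the matched positions avoids the separate insert step, at the cost of silently overwriting a cell if two values land on the same column.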

Best answer

This was quite challenging, but I came up with a solution in two steps:

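Step 1: Detect the column start positions

The complication here is that in some rows a column is empty.

The approach: every double space followed by a non-space character marks the start of a new column. Position 0 is always a column start. Search every line for column starts and collect them all: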

import re

t = """New York  3       books        1000
London    2,25                 2000
Paris     1.000   apples       3000
          30                   4000
Berlin            newspapers """

p = re.compile("  [^ ]")

i = set([0])
for line in t.split('\n'):
    for m in p.finditer(line):
        i.add(m.start()+2)
i = sorted(i)

Output: [0, 10, 18, 31]

Step 2: Split each line at these positions

def split_line_by_indexes( indexes, line ):
    tokens=[]
    indexes = indexes + [len(line)]
    for i1,i2 in zip(indexes[:-1], indexes[1:]): #pairs
        tokens.append( line[i1:i2].rstrip() )
    return tokens

for line in t.split('\n'):
    print(split_line_by_indexes(i, line))

Output:

['New York', '3', 'books', '1000']
['London', '2,25', '', '2000']
['Paris', '1.000', 'apples', '3000']
['', '30', '', '4000']
['Berlin', '', 'newspapers', '']

Of course, instead of printing, you can replace the empty values with xxxx and write the result back to a file.
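Putting both steps together with the xxxx replacement could look like the following sketch. The function names detect_column_starts and split_and_fill and the filler parameter are hypothetical. It relies on the same assumption as the answer: at least one line shows each column preceded by two or more spaces, and columns are left-aligned.

```python
import re

def detect_column_starts(lines):
    """Offset 0 plus every offset where a non-space follows two spaces."""
    starts = {0}
    for line in lines:
        for m in re.finditer(r"  [^ ]", line):
            starts.add(m.start() + 2)
    return sorted(starts)

def split_and_fill(lines, filler="xxxx"):
    """Slice each line at the detected column starts and fill empty cells."""
    starts = detect_column_starts(lines)
    bounds = starts + [None]  # None lets the last slice run to the end of the line
    table = []
    for line in lines:
        cells = [line[a:b].strip() for a, b in zip(bounds[:-1], bounds[1:])]
        table.append([cell or filler for cell in cells])
    return table

t = """New York  3       books        1000
London    2,25                 2000
Paris     1.000   apples       3000
          30                   4000
Berlin            newspapers """

table = split_and_fill(t.split("\n"))
print(table)
```

For the first table this should produce the nested list asked for in the question. Tables whose cells are not left-aligned, such as the second example in the question, may yield extra column starts with this approach.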

Regarding "python - How to fill in missing elements in a table?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36643456/
