python - How can I merge two lists while preserving identical elements for set operations?

Tags: python list join merge set

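For the better part of a day or two I have been drawing Venn diagrams, coding loops, trying the different set operations (symmetric_difference, union, intersection, isdisjoint), and enumerating by line number, trying to figure out how to implement this in code.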

a = [1, 2, 2, 3] # <-------------|
b = [1, 2, 3, 3, 4] # <----------| Do not need to be in order.
result = [1, 2, 2, 3, 3, 4] # <--|

Or:

A = [1,'d','d',3,'x','y']
B = [1,'d',3,3,'z']
result =  [1,'d','d',3,3,'x','y','z']

Edit:

I am not trying to do a + b = [1, 1, 2, 2, 2, 3, 3, 3, 4]

I am trying to do something like:

a - b = [2]

b - a = [3, 4]

a∩b = [1,2,3]

So

[a - b] + [b - a] + a ∩ b = [1, 2, 2, 3, 3, 4] ?

This is the part I am not sure about.
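The identity above can be checked with multiset arithmetic. The following is only a minimal sketch, assuming collections.Counter is an acceptable stand-in for the lists; the names ca, cb, only_a, only_b, common and merged are illustrative and not from the question.

```python
from collections import Counter

a = [1, 2, 2, 3]
b = [1, 2, 3, 3, 4]
ca, cb = Counter(a), Counter(b)

only_a = ca - cb   # multiset difference: keeps only positive counts
only_b = cb - ca
common = ca & cb   # multiset intersection: minimum of the two counts

# (a - b) + (b - a) + (a ∩ b), with counts added per element
merged = only_a + only_b + common
result = sorted(merged.elements())
print(result)  # [1, 2, 2, 3, 3, 4]
```

For multisets this sum gives, for each element, the larger of its two counts, which is the same as the Counter union ca | cb.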

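I have two spreadsheets, each with a few thousand rows. I want to compare the two spreadsheets column by column.

I have already built a list from each column so that they can be compared/merged.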

def returnLineList(fn):
    with open(fn,'r') as f:
        lines = f.readlines()
    line_list = []
    for line in lines:
        line = line.split('\t')
        line_list.append(line)
    return line_list

def returnHeaderIndexDictionary(titles):
    tmp_dict = {}
    for x in titles:
        tmp_dict.update({x:titles.index(x)})
    return tmp_dict

def returnColumn(index, l):
    column = []
    for row in l:
        column.append(row[index])
    return column

def enumList(column):
    tmp_list = []
    for row, item in enumerate(column):
        tmp_list.append([row,item])
    return tmp_list

def compareAndMergeEnumerated(L1,L2):
    less = []
    more = []
    same = []
    for row1,item1 in enumerate(L1):
        for row2,item2 in enumerate(L2):
            if item1 in item2:
                count1 = L1.count(item1)
                count2 = L2.count(item2)
                dif = count1 - count2
                if dif != 0:
                    if dif < 0:
                        less.append(["dif:"+str(dif),[item1,row1],[item2,row2]])
                    if dif > 0:
                        more.append(["dif:"+str(dif),[item1,row1],[item2,row2]])
                else:
                    same.append(["dif:"+str(dif),[item1,row1],[item2,row2]])
                break
    return less,more,same,len(less+more+same),len(L1),len(L2)

def main():
    unsorted_lines = returnLineList('unsorted.csv')
    manifested_lines = returnLineList('manifested.csv')

    indexU = returnHeaderIndexDictionary(unsorted_lines[0])
    indexM = returnHeaderIndexDictionary(manifested_lines[0])

    u_j_column = returnColumn(indexU['jnumber'],unsorted_lines)
    m_j_column = returnColumn(indexM['jnumber'],manifested_lines)

    print(compareAndMergeEnumerated(u_j_column,m_j_column))

if __name__ == '__main__':
    main()
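As a possible simplification of the comparison above, the per-value counts of one column can be compared directly instead of looping over both lists. This is a rough sketch, not the asker's method: column_counts and compare_counts are hypothetical helpers, and the in-memory sample data only stands in for unsorted.csv and manifested.csv.

```python
import csv
import io
from collections import Counter

def column_counts(handle, column):
    """Count how often each value occurs in one column of a tab-separated file."""
    reader = csv.DictReader(handle, delimiter='\t')
    return Counter(row[column] for row in reader)

def compare_counts(left, right):
    """Group values by whether left has fewer, more, or the same count as right."""
    less, more, same = {}, {}, {}
    for key in left.keys() | right.keys():
        dif = left[key] - right[key]   # a missing key counts as 0
        bucket = less if dif < 0 else more if dif > 0 else same
        bucket[key] = dif
    return less, more, same

# Made-up sample data (assumption) in place of the real files
unsorted_file = io.StringIO("jnumber\tname\nJ1\ta\nJ2\tb\nJ2\tc\n")
manifested_file = io.StringIO("jnumber\tname\nJ1\ta\nJ2\tb\nJ3\td\nJ3\te\n")

u_counts = column_counts(unsorted_file, 'jnumber')
m_counts = column_counts(manifested_file, 'jnumber')
less, more, same = compare_counts(u_counts, m_counts)
print(less, more, same)  # {'J3': -2} {'J2': 1} {'J1': 0}
```

With real files, the StringIO objects would be replaced by open(fn, newline='') handles. Unlike the original loop, this compares values for exact equality rather than with a substring test.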

Update:

from collections import OrderedDict
A = [1,'d','d',3,'x','y']
B = [1,'d',3,3,'z']
M = A + B
R = [1,'d','d',3,3,'x','y','z']


ACount = {}
AL = lambda x: ACount.update({str(x):A.count(x)})
[AL(x) for x in A]

BCount = {}
BL = lambda x: BCount.update({str(x):B.count(x)})
[BL(x) for x in B]

MCount = {}
ML = lambda x: MCount.update({str(x):M.count(x)})
[ML(x) for x in M]


RCount = {}
RL = lambda x: RCount.update({str(x):R.count(x)})
[RL(x) for x in R]


print('^sym_difAB',set(A) ^ set(B)) # set(A).symmetric_difference(set(B))
print('^sym_difBA',set(B) ^ set(A)) # set(B).symmetric_difference(set(A))
print('|union    ',set(A) | set(B)) # set(A).union(set(B))
print('&intersect',set(A) & set(B)) # set(A).intersection(set(B))
print('-dif AB   ',set(A) - set(B)) # set(A).difference(set(B))
print('-dif BA   ',set(B) - set(A)) 
print('<=subsetAB',set(A) <= set(B)) # set(A).issubset(set(B))
print('<=subsetBA',set(B) <= set(A)) # set(B).issubset(set(A))
print('>=supsetAB',set(A) >= set(B)) # set(A).issuperset(set(B))
print('>=supsetBA',set(B) >= set(A)) # set(B).issuperset(set(A))

print(sorted(A + [x for x in (set(A) ^ set(B))]))
#[1, 3, 'd', 'd', 'x', 'x', 'y', 'y', 'z']

print(sorted(B + [x for x in (set(A) ^ set(B))]))
#[1, 3, 3, 'd', 'x', 'y', 'z', 'z']
cA = lambda y: A.count(y)
cB = lambda y: B.count(y)
cM = lambda y: M.count(y)
cR = lambda y: R.count(y)
print(sorted([[y,cA(y)] for y in (set(A) ^ set(B))]))
#[['x', 1], ['y', 1], ['z', 0]]

print(sorted([[y,cB(y)] for y in (set(A) ^ set(B))]))
#[['x', 0], ['y', 0], ['z', 1]]

print(sorted([[y,cA(y)] for y in A]))
print(sorted([[y,cB(y)] for y in B]))
print(sorted([[y,cM(y)] for y in M]))
print(sorted([[y,cR(y)] for y in R]))
#[[1, 1], [3, 1], ['d', 2], ['d', 2], ['x', 1], ['y', 1]]
#[[1, 1], [3, 2], [3, 2], ['d', 1], ['z', 1]]
#[[1, 2], [1, 2], [3, 3], [3, 3], [3, 3], ['d', 3], ['d', 3], ['d', 3], ['x', 1], ['y', 1], ['z', 1]]
#[[1, 1], [3, 2], [3, 2], ['d', 2], ['d', 2], ['x', 1], ['y', 1], ['z', 1]]

cAL = sorted([[y,cA(y)] for y in A])


Update 2:

Basically, I think it is time to learn pandas.

It looks like a combination of aggregation, groupby, and sum.

Best answer

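No need to learn pandas just yet! (Although it is an excellent library.) I am not sure I fully understand your question, but the collections.Counter data type is designed to act as a bag/multiset. One of the operators it implements is "or" (|), which may be what you need. Read the comments in this code example and see whether it fits your needs: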

a = [1, 2, 2, 3]
b = [1, 2, 3, 3, 4]

from collections import Counter

# A Counter data type counts the elements fed to it and holds
# them in a dict-like type.

a_counts = Counter(a) # {1: 1, 2: 2, 3: 1}
b_counts = Counter(b) # {1: 1, 2: 1, 3: 2, 4: 1}

# The union of two Counter types is the max of each value
# in the (key, value) pairs in each Counter. Similar to
# {(key, max(a_counts[key], b_counts[key])) for key in ...}

result_counts = a_counts | b_counts

# Return an iterator over the keys repeating each as many times as its count.

result = list(result_counts.elements())

# Result:
# [1, 2, 2, 3, 3, 4]
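The same approach can be tried on the question's second example, which mixes integers and strings. This is only a sketch; the key=str sort is an added assumption, used because Python 3 cannot order int and str values against each other.

```python
from collections import Counter

A = [1, 'd', 'd', 3, 'x', 'y']
B = [1, 'd', 3, 3, 'z']

# Union keeps, for each element, the larger of its two counts
merged = list((Counter(A) | Counter(B)).elements())

# Sorting by string form avoids the int/str comparison error in Python 3
ordered = sorted(merged, key=str)
print(ordered)  # [1, 3, 3, 'd', 'd', 'x', 'y', 'z']
```

The merged list holds the same elements as the expected result in the question; if a particular order matters, sort explicitly rather than relying on the order elements() happens to return.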

This question, "python - How can I merge two lists while preserving identical elements for set operations?", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/23923428/
