Python 多处理代码未启动

标签 python multiprocessing

由于我使用许多 .csv 文件,并且必须经常将它们转换为不同的格式,因此我真的很想在 Python 中为此制作一个应用程序,这样我就不必每次都在 notepad++ 中手动执行此操作。

下面的代码可以在单核模式下工作。但是,我想合并多处理来一次处理几个线程。

我的问题是,当我使用多重处理时,代码完成时没有任何错误,也没有打印任何内容。

你能帮我解决这个问题吗?

import pandas as pd
from multiprocessing import Process
import os
import time
start_time = time.time()
thread_count = 2

def timer():
    print('Script ran for ' + time.strftime("%H:%M:%S", time.gmtime(time.time() - start_time)))

class Imported:

    def __init__(self, filename, convert_type):
        self.filename, self.convert_type = filename, convert_type

        if self.convert_type == 'to_hana':
            self.delim = ';'
            self.decim = ','

        if self.convert_type == 'from_hana':
            self.delim = ','
            self.decim = '.'

        if self.convert_type not in ['to_hana' , 'from_hana']:
            raise ValueError('Convert type is wrong!')

    def convert(self):
        if self.convert_type == 'to_hana':

            self.file = pd.read_csv(self.filename, encoding = 'utf-8', sep=self.delim, decimal=self.decim)
            self.error_check()

            self.delim = ','
            self.decim = '.'

            process_id = os.getpid()
            print(f'Process ID: {process_id}')

            self.file.to_csv(self.filename, index = None, header=True, decimal=self.decim, sep=self.delim)

            print('test1')
            timer()

        if self.convert_type == 'from_hana':

            self.file = pd.read_csv(self.filename, encoding = 'utf-8', sep=self.delim, decimal=self.decim)
            self.error_check()

            self.delim = ';'
            self.decim = ','

            process_id = os.getpid()
            print(f'Process ID: {process_id}')

            self.file.to_csv(self.filename, index = None, header=True, decimal=self.decim, sep=self.delim)

            print('test2')
            timer()

    def error_check(self):
        if len(list(self.file.columns.values)[0]) > 20:
            raise ValueError('Pravědpodobně poškozený soubor. Název prvního sloupce je příliš dlouhý.')

if __name__ == '__main__':

    filenames = [['data.csv','from_hana'],['klasifikace.csv','to_hana'],['klasifikace_statistiky.csv','to_hana']]

    processes = []

    #for i in enumerate(filenames):
    #    Imported.convert(Imported(filenames[i[0]][0], filenames[i[0]][1]))

    for i in enumerate(filenames):
        process = Process(target=Imported.convert, args=(filenames[i[0]][0], filenames[i[0]][1]))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    print('DONE')

最佳答案

您可以通过创建类的对象然后通过将目标指定为 obj.convert 来启动进程来解决此问题

import pandas as pd
from multiprocessing import Process
import os
import time

start_time = time.time()
thread_count = 2


def timer():
    print('Script ran for ' + time.strftime("%H:%M:%S", time.gmtime(time.time() - start_time)))


class Imported:

    def __init__(self, filename, convert_type):
        self.filename, self.convert_type = filename, convert_type

        if self.convert_type == 'to_hana':
            self.delim = ';'
            self.decim = ','

        if self.convert_type == 'from_hana':
            self.delim = ','
            self.decim = '.'

        if self.convert_type not in ['to_hana', 'from_hana']:
            raise ValueError('Convert type is wrong!')

    def convert(self):
        if self.convert_type == 'to_hana':
            self.file = pd.read_csv(self.filename, encoding='utf-8', sep=self.delim, decimal=self.decim)
            self.error_check()

            self.delim = ','
            self.decim = '.'

            process_id = os.getpid()
            print('Process ID:', process_id)

            self.file.to_csv(self.filename, index=None, header=True, decimal=self.decim, sep=self.delim)

            print('test1')
            timer()

        if self.convert_type == 'from_hana':
            self.file = pd.read_csv(self.filename, encoding='utf-8', sep=self.delim, decimal=self.decim)
            self.error_check()

            self.delim = ';'
            self.decim = ','

            process_id = os.getpid()
            print('Process ID', process_id)

            self.file.to_csv(self.filename, index=None, header=True, decimal=self.decim, sep=self.delim)

            print('test2')
            timer()

    def error_check(self):
        if len(list(self.file.columns.values)[0]) > 20:
            raise ValueError('Pravědpodobně poškozený soubor. Název prvního sloupce je příliš dlouhý.')


if __name__ == '__main__':

    filenames = [['data.csv', 'from_hana'], ['klasifikace.csv', 'to_hana'], ['klasifikace_statistiky.csv', 'to_hana']]

    processes = []

    # for i in enumerate(filenames):
    #    Imported.convert(Imported(filenames[i[0]][0], filenames[i[0]][1]))

    for i in enumerate(filenames):
        obj = Imported(filenames[i[0]][0], filenames[i[0]][1])
        process = Process(target=obj.convert)
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    print('DONE')

关于Python 多处理代码未启动,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57727602/

相关文章:

python - 使用 opencv imread 读取时图像高度和宽度被交换

python - Pydantic 场没有值(value)

python - 装饰类的方法导致 2.7 中的未绑定(bind)方法 TypeError

python - 在 Pandas 图中仅隐藏轴标签,而不是整个轴

python - 避免用户中止 python 子进程

python - 是否可以按顺序启动 Pool 进程?

python - 使用多重处理以自己的方法映射数组的每个元素

python - Gunicorn 未在 AWS EC2 上创建 .sock 文件

python - 多处理中的共享内存

python - 线程和进程混合的奇怪行为