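java - Hitting the start or end of a line in a CSV file in Java

Tags: java csv opencsv

I am using this code to split and process a CSV file. The problem is that the block boundaries land at arbitrary positions: possibly at the start, in the middle, or at the end of a line!

How can I set start_loc to the start or end of a line, so that each block is a complete CSV file and no data is lost?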

public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();

        CSVReader reader = new CSVReader(new FileReader("x_tran.csv"));
        String[] columnsNames = reader.readNext();
        reader.close();
        FileInputStream fileInputStream = new FileInputStream("x_tran.csv");
        FileChannel channel = fileInputStream.getChannel();
        long remaining_size = channel.size(); //get the total number of bytes in the file
        long chunk_size = remaining_size / 4; //file_size/threads

        //Max allocation size allowed is ~2GB
        if (chunk_size > (Integer.MAX_VALUE - 5))
        {
            chunk_size = (Integer.MAX_VALUE - 5);
        }

        //thread pool
        ExecutorService executor = Executors.newFixedThreadPool(4);

        long start_loc = 0;//file pointer
        int i = 0; //loop counter
        boolean first = true;
        while (remaining_size >= chunk_size)
        {
            //launches a new thread
            executor.execute(new FileRead(start_loc, toIntExact(chunk_size), channel, i, String.join(",", columnsNames), first));
            remaining_size = remaining_size - chunk_size;
            start_loc = start_loc + chunk_size;
            i++;
            first = false;
        }

        //load the last remaining piece
        executor.execute(new FileRead(start_loc, toIntExact(remaining_size), channel, i, String.join(",", columnsNames), first));

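        //Tear down: stop accepting new tasks
        executor.shutdown();

        //Wait for all threads to finish without busy-spinning a CPU core
        try {
            executor.awaitTermination(Long.MAX_VALUE, java.util.concurrent.TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }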
        System.out.println("Finished all threads");
        fileInputStream.close();


        long finish = System.currentTimeMillis();
        System.out.println( "Time elapsed: " + (finish - start) );
    }

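Best answer

You can read the file once and then have each thread process the lines whose index, modulo the number of threads, matches that thread's number (for example, the first thread processes lines 0, 4, 8, and so on).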

package ...;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CsvParallelReader {

    private static final int THREAD_NUMBER = 4;

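    public static void main(String[] args) {

        ExecutorService executor = Executors.newFixedThreadPool(THREAD_NUMBER);

        try {
            // Read the whole file into memory once; every task shares this list.
            List<String> lines = Files.readAllLines(Path.of("yourfile.csv"));

            for (int i = 0; i < THREAD_NUMBER; i++) {
                Runnable readTask = new ReadTask(i, lines);
                executor.submit(readTask);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Without shutdown() the pool's non-daemon threads keep the JVM alive.
            executor.shutdown();
        }
    }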

    private static class ReadTask implements Runnable {

        private final List<String> lines;
        private int start;

        public ReadTask(int start, List<String> lines) {
            this.start = start;
            this.lines = lines;
        }

        @Override
        public void run() {
            for (int i = start; i < lines.size(); i += THREAD_NUMBER) {
                // do something with lines.get(i); index 0 is the header row if the file has one
            }
        }
    }
}
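
If the file is too large to load in one go, the question as asked (moving start_loc onto a line boundary) can also be approached directly. The following is only a sketch under stated assumptions, not part of the answer above: the class ChunkAligner and its methods alignedBoundaries and nextLineStart are hypothetical names, and it assumes that no quoted CSV field contains an embedded line break. If fields can contain newlines, a byte-level scan like this cannot tell a record boundary from a newline inside a field. Each nominal split point is pushed forward to the byte just after the next '\n'.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: computes chunk offsets that always fall on line starts.
public class ChunkAligner {

    /** Returns chunks + 1 offsets; chunk k spans bytes [result[k], result[k + 1]). */
    public static long[] alignedBoundaries(Path file, int chunks) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            long[] bounds = new long[chunks + 1]; // bounds[0] is 0
            bounds[chunks] = size;
            for (int k = 1; k < chunks; k++) {
                long nominal = size / chunks * k;
                // Never start searching before the previous boundary.
                long from = Math.max(nominal, bounds[k - 1]);
                bounds[k] = nextLineStart(channel, from, size);
            }
            return bounds;
        }
    }

    /** Offset of the first line start at or after pos, or size if there is none. */
    static long nextLineStart(FileChannel channel, long pos, long size) throws IOException {
        if (pos <= 0) return 0;
        if (pos >= size) return size;
        ByteBuffer buf = ByteBuffer.allocate(8192);
        long offset = pos - 1; // a newline at pos - 1 means pos is already a line start
        while (offset < size) {
            buf.clear();
            int read = channel.read(buf, offset); // positional read, channel position untouched
            if (read <= 0) break;
            for (int i = 0; i < read; i++) {
                if (buf.get(i) == '\n') return offset + i + 1;
            }
            offset += read;
        }
        return size;
    }
}
```

With these offsets, the loop in the question could pass bounds[k] as start_loc and bounds[k + 1] - bounds[k] as the chunk length, so every block begins and ends on a complete line. Chunks can come out empty or uneven when lines are long relative to the file, and the roughly 2 GB cap on a single allocation mentioned in the question still has to be enforced separately.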

Regarding java - Hitting the start or end of a line in a CSV file in Java, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57712132/
