spring - How to create multiple files (csv) per chunk?

Tags: spring spring-boot spring-batch

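Hi, I'm new to Spring Batch and I want to create a separate file (csv) for each chunk that is processed, with file names like timestamp.csv. Any idea how I can do that? Basically, it comes down to splitting one big file into smaller files.

Thanks!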

Best answer

I would use a command-line utility such as the split command (or an equivalent), or try to do it in plain Java (see Java - Read file and split into multiple files).
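
For the plain Java route, here is a minimal sketch that uses only the JDK. The class name FileSplitter, the method split, and the numbered output names (foos-1.txt, foos-2.txt, ...) are assumptions made for illustration; they do not come from the linked question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a text file into smaller files of N lines each.
public class FileSplitter {

    /**
     * Reads the source file line by line and writes every group of
     * linesPerFile lines to its own file in targetDir.
     * Returns the paths of the files written, in order.
     */
    public static List<Path> split(Path source, Path targetDir, String prefix, int linesPerFile)
            throws IOException {
        if (linesPerFile <= 0) {
            throw new IllegalArgumentException("linesPerFile must be positive");
        }
        Files.createDirectories(targetDir);
        List<Path> written = new ArrayList<>();
        List<String> buffer = new ArrayList<>(linesPerFile);
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == linesPerFile) {
                    written.add(flush(buffer, targetDir, prefix, written.size() + 1));
                }
            }
        }
        // Write the last, possibly shorter, group of lines.
        if (!buffer.isEmpty()) {
            written.add(flush(buffer, targetDir, prefix, written.size() + 1));
        }
        return written;
    }

    // Writes the buffered lines to <prefix>-<index>.txt and empties the buffer.
    private static Path flush(List<String> buffer, Path targetDir, String prefix, int index)
            throws IOException {
        Path target = targetDir.resolve(prefix + "-" + index + ".txt");
        Files.write(target, buffer, StandardCharsets.UTF_8);
        buffer.clear();
        return target;
    }

    public static void main(String[] args) throws IOException {
        List<Path> parts = split(Paths.get("foos.txt"), Paths.get("."), "foos", 3);
        parts.forEach(System.out::println);
    }
}
```

Only one group of lines is held in memory at a time, so this should also cope with large input files.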

But if you really want to do it with Spring Batch, you can use something like this:

import java.time.LocalDateTime;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class MyJob {

    private final JobBuilderFactory jobBuilderFactory;

    private final StepBuilderFactory stepBuilderFactory;

    public MyJob(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
    }

    @Bean
    public FlatFileItemReader<String> itemReader() {
        return new FlatFileItemReaderBuilder<String>()
                .name("flatFileReader")
                .resource(new FileSystemResource("foos.txt"))
                .lineMapper(new PassThroughLineMapper())
                .build();
    }

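    @Bean
    public ItemWriter<String> itemWriter() {
        // A single delegate writer is reused; it is re-targeted at a new file for each chunk.
        final FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
        writer.setLineAggregator(new PassThroughLineAggregator<>());
        writer.setName("chunkFileItemWriter");
        return items -> {
            // The step calls this lambda once per chunk, so each chunk ends up in its own file.
            writer.setResource(new FileSystemResource("foos" + getTimestamp() + ".txt"));
            writer.open(new ExecutionContext());
            try {
                writer.write(items);
            } finally {
                // Close the file even if the write fails.
                writer.close();
            }
        };
    }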

    private String getTimestamp() {
        // TODO tested on unix/linux systems, update as needed to not contain illegal characters for a file name on MS windows
        return LocalDateTime.now().toString();
    }

    @Bean
    public Step step() {
        return stepBuilderFactory.get("step")
                .<String, String>chunk(3)
                .reader(itemReader())
                .writer(itemWriter())
                .build();
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .start(step())
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }

}

The file foos.txt looks like this:

foo1
foo2
foo3
foo4
foo5
foo6

The example writes each chunk to a separate file whose name contains a timestamp:

File 1, foos2019-11-28T09:23:47.769.txt:

foo1
foo2
foo3

File 2, foos2019-11-28T09:23:47.779.txt:

foo4
foo5
foo6
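
The file names above contain colons, which are not allowed in file names on MS Windows (this is what the TODO in getTimestamp() refers to). One possible way around that is to format the timestamp with a pattern that has no colons. The sketch below assumes a hypothetical helper class SafeTimestamp and a pattern chosen only for illustration:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: formats a timestamp so it can be used in a file name.
public class SafeTimestamp {

    // The time fields are separated by '-' instead of ':'.
    private static final DateTimeFormatter FILE_NAME_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS");

    public static String format(LocalDateTime time) {
        return time.format(FILE_NAME_FORMAT);
    }

    public static void main(String[] args) {
        System.out.println("foos" + format(LocalDateTime.now()) + ".txt");
    }
}
```

With millisecond resolution, two chunks written within the same millisecond would still get the same name, so this is only a partial fix.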

By the way, I think it would be better to use numbers rather than timestamps.
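
A numbering scheme could look roughly like the sketch below. The class ChunkFileNamer and the zero-padded name format are assumptions made for illustration, not part of Spring Batch:

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out sequential file names such as foos-0001.txt.
public class ChunkFileNamer {

    private final String prefix;
    private final String extension;
    private final AtomicInteger counter = new AtomicInteger();

    public ChunkFileNamer(String prefix, String extension) {
        this.prefix = prefix;
        this.extension = extension;
    }

    // Returns the next name; the counter starts at 1 and is zero-padded to four digits.
    public String next() {
        return String.format(Locale.ROOT, "%s-%04d.%s", prefix, counter.incrementAndGet(), extension);
    }

    public static void main(String[] args) {
        ChunkFileNamer namer = new ChunkFileNamer("foos", "txt");
        System.out.println(namer.next());
        System.out.println(namer.next());
    }
}
```

In the writer above, the call to getTimestamp() would then be replaced by a call to next() on a shared ChunkFileNamer instance. Numbered names sort naturally and cannot collide within one run, but the counter restarts at 1 on every run, so files from an earlier run would be overwritten.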

Note: I did not worry much about restartability for this use case.

Regarding "spring - How to create multiple files (csv) per chunk?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59046620/
