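Hi, I'm new to Spring Batch and I'd like to create multiple CSV files, one for each chunk that is processed. The file names would be something like timestamp.csv. Any idea how I can do that? Basically, it comes down to splitting one large file into smaller files.
Thanks!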
Best answer
I would use a command-line utility such as the split command (or an equivalent), or do it in plain Java (see Java - Read file and split into multiple files).
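As a rough illustration of the plain Java route, here is a minimal sketch that copies a text file into parts of at most N lines each. The class name LineSplitter, the part-N.csv naming scheme, and the UTF-8 encoding are assumptions made for this example; they do not come from the linked question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Batch or of the original answer.
class LineSplitter {

    /**
     * Copies 'input' into files of at most 'linesPerFile' lines each, written
     * to 'outputDir' as part-1.csv, part-2.csv, ... (an assumed naming scheme).
     * Returns the paths of the created files, in order.
     */
    public static List<Path> split(Path input, Path outputDir, int linesPerFile) throws IOException {
        if (linesPerFile < 1) {
            throw new IllegalArgumentException("linesPerFile must be at least 1");
        }
        Files.createDirectories(outputDir);
        List<Path> parts = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                int linesInCurrentPart = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Start a new part for the first line and whenever the current part is full.
                    if (writer == null || linesInCurrentPart == linesPerFile) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path part = outputDir.resolve("part-" + (parts.size() + 1) + ".csv");
                        parts.add(part);
                        writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        linesInCurrentPart = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInCurrentPart++;
                }
            } finally {
                // Close the last part even if reading or writing failed.
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return parts;
    }
}
```

A call such as LineSplitter.split(Paths.get("foos.txt"), Paths.get("out"), 3) would then produce one file per three input lines. Only one line is held in memory at a time, so the size of the input file should not matter much.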
But if you really want to do this with Spring Batch, you can use something like:
import java.time.LocalDateTime;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class MyJob {

    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;

    public MyJob(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
    }

    @Bean
    public FlatFileItemReader<String> itemReader() {
        // Reads foos.txt line by line and passes each line through unchanged
        return new FlatFileItemReaderBuilder<String>()
                .name("flatFileReader")
                .resource(new FileSystemResource("foos.txt"))
                .lineMapper(new PassThroughLineMapper())
                .build();
    }

    @Bean
    public ItemWriter<String> itemWriter() {
        final FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
        writer.setLineAggregator(new PassThroughLineAggregator<>());
        writer.setName("chunkFileItemWriter");
        // Invoked once per chunk: point the delegate writer at a new file, then write the chunk to it
        return items -> {
            writer.setResource(new FileSystemResource("foos" + getTimestamp() + ".txt"));
            writer.open(new ExecutionContext());
            try {
                writer.write(items);
            } finally {
                // Close the file even if writing the chunk fails
                writer.close();
            }
        };
    }

    private String getTimestamp() {
        // TODO: tested on Unix/Linux only; the ':' characters in this timestamp are not allowed in Windows file names, so adjust as needed
        return LocalDateTime.now().toString();
    }

    @Bean
    public Step step() {
        // The chunk size (3) determines how many lines end up in each output file
        return stepBuilderFactory.get("step")
                .<String, String>chunk(3)
                .reader(itemReader())
                .writer(itemWriter())
                .build();
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .start(step())
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }
}
Given a foos.txt file with the following contents:
foo1
foo2
foo3
foo4
foo5
foo6
the example writes each chunk to a separate file whose name contains a timestamp:
File 1, foos2019-11-28T09:23:47.769.txt:
foo1
foo2
foo3
File 2, foos2019-11-28T09:23:47.779.txt:
foo4
foo5
foo6
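The file names above contain colons, which Windows does not allow in file names; this is what the TODO in getTimestamp() refers to. One possible fix, sketched here under assumptions, is to format the timestamp with an explicit pattern that avoids colons. The class name FileTimestamps and the exact pattern are illustrative choices, not part of the original answer.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for building timestamps that are safe to use in file names.
class FileTimestamps {

    // Same layout as LocalDateTime.toString(), but with '-' instead of ':' in the
    // time part and always exactly three fractional digits.
    private static final DateTimeFormatter FILE_SAFE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS");

    /** Formats the given time without any ':' characters. */
    public static String format(LocalDateTime time) {
        return time.format(FILE_SAFE);
    }

    /** Formats the current time without any ':' characters. */
    public static String now() {
        return format(LocalDateTime.now());
    }
}
```

With this in place, getTimestamp() could return FileTimestamps.now() instead of LocalDateTime.now().toString(). Millisecond resolution still leaves room for two chunks to get the same name if they are written within the same millisecond.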
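By the way, I think it would be better to use numbers rather than timestamps.
Note: I did not really care about restartability for this use case.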
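To illustrate the suggestion to use numbers, here is a minimal sketch of a sequential file-name generator. The class name ChunkFileNamer and the zero-padded four-digit format are assumptions made for this example.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper that hands out numbered file names, one per chunk.
class ChunkFileNamer {

    private final String prefix;
    private final String extension;
    private final AtomicInteger counter = new AtomicInteger();

    public ChunkFileNamer(String prefix, String extension) {
        this.prefix = prefix;
        this.extension = extension;
    }

    /** Returns the next name in the sequence: prefix-0001ext, prefix-0002ext, ... */
    public String next() {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s-%04d%s", prefix, counter.incrementAndGet(), extension);
    }
}
```

In the writer above, the resource could then be created from a shared instance, for example new FileSystemResource(namer.next()), where namer is a hypothetical ChunkFileNamer field. The counter lives in memory only, so numbering starts again at 1 when the job is restarted, which fits the note above about not handling restartability.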
Regarding "spring - How to create multiple files (csv) for each chunk?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59046620/