java - Spring Batch: Parsing a CSV file with quoteCharacter

Tags: java spring spring-batch

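I'm new to Spring Batch. As we know, CSV files come in all shapes and forms, and some of them are syntactically incorrect. I'm trying to parse a CSV file in which every line starts with '"' and ends with '"'. Here is my CSV: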

"1;Paris;13/4/1992;16/7/2006"
"2;Lyon;31/5/1993;1/8/2009"
"3;Metz;21/4/1990;27/4/2010"

I tried this:

  <bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="data-1.txt" />
    <property name="lineMapper">
      <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
        <property name="fieldSetMapper">
          <!-- Mapper which maps each individual items in a record to properties in POJO -->
          <bean class="com.sam.fourthTp.MyFieldSetMapper" />
        </property>
        <property name="lineTokenizer">
          <!-- A tokenizer class to be used when items in input record are separated by specific characters -->
          <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
            <property name="quoteCharacter" value="&quot;" />
            <property name="delimiter" value=";" />
          </bean>
        </property>
      </bean>
    </property>
  </bean>

But this approach only works when the CSV file looks like this:

"1";"Paris";"13/4/1992";"16/7/2006"
"2";"Lyon";"31/5/1993";"1/8/2009"
"3";"Metz";"21/4/1990";"27/4/2010"

My question is: how can I parse my CSV when each line starts with '"' and ends with '"'?

Best answer

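As you mentioned, the quoteCharacter applies to fields, not to records. When the whole line is wrapped in quotes, the tokenizer sees it as a single quoted field, and the ';' delimiters inside it are not treated as separators.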
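
To illustrate why the whole line ends up as one field, here is a minimal sketch of quote-aware splitting in plain Java. It is a simplified illustration only, not Spring Batch's actual DelimitedLineTokenizer implementation; the class name QuoteAwareSplit and its split method are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {

    // Simplified quote-aware splitter (illustration only, not Spring Batch code):
    // a delimiter that appears between two quote characters stays inside the field.
    public static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == quote) {
                inQuotes = !inQuotes;            // toggle quoted state, drop the quote itself
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString());  // field boundary only outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());          // last field on the line
        return fields;
    }

    public static void main(String[] args) {
        // Whole record quoted: every ';' is inside the quotes.
        System.out.println(split("\"1;Paris;13/4/1992;16/7/2006\"", ';', '"').size()); // prints 1
        // Each field quoted: every ';' is outside the quotes.
        System.out.println(split("\"1\";\"Paris\";\"13/4/1992\";\"16/7/2006\"", ';', '"').size()); // prints 4
    }
}
```

With the record-level quotes the splitter returns one field containing the whole line, which is why the quotes have to be removed before the line is tokenized.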

My question is how I can parse my CSV when a line starts with '"' and ends with '"'?

What you can do is:

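  • Read each line as a raw String
  • Use a composite item processor with two delegates: one that trims the " from the start and end of each record, and another that parses the line and maps it to your domain object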

Here is a quick example:

import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.support.CompositeItemProcessor;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class MyJob {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public ItemReader<String> itemReader() {
        return new ListItemReader<>(Arrays.asList(
                "\"1;Paris;13/4/1992;16/7/2006\"",
                "\"2;Lyon;31/5/1993;1/8/2009\"",
                "\"3;Metz;21/4/1990;27/4/2010\"",
                "\"4;Lille;21/4/1980;27/4/2011\""
                ));
    }

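    // First delegate: strips the leading and trailing quote from each raw line.
    // Assumes every line is wrapped in exactly one pair of quotes.
    @Bean
    public ItemProcessor<String, String> itemProcessor1() {
        return item -> item.substring(1, item.length() - 1);
    }

    // Second delegate: tokenizes the unquoted line on ';' and maps it to a Record.
    @Bean
    public ItemProcessor<String, Record> itemProcessor2() {
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setNames("id", "ville");
        lineTokenizer.setDelimiter(";");
        // Non-strict mode tolerates lines with more tokens than names (only two fields are mapped here).
        lineTokenizer.setStrict(false);
        BeanWrapperFieldSetMapper<Record> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Record.class);
        return item -> {
            FieldSet tokens = lineTokenizer.tokenize(item);
            return fieldSetMapper.mapFieldSet(tokens);
        };
    }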

    @Bean
    public ItemWriter<Record> itemWriter() {
        return items -> {
            for (Record item : items) {
                System.out.println(item);
            }
        };
    }

    @Bean
    public CompositeItemProcessor<String, Record> compositeItemProcessor() {
        CompositeItemProcessor<String, Record> compositeItemProcessor = new CompositeItemProcessor<>();
        compositeItemProcessor.setDelegates(Arrays.asList(itemProcessor1(), itemProcessor2()));
        return compositeItemProcessor;
    }

    @Bean
    public Step step() {
        return steps.get("step")
                .<String, Record>chunk(2)
                .reader(itemReader())
                .processor(compositeItemProcessor())
                .writer(itemWriter())
                .build();
    }

    @Bean
    public Job job() {
        return jobs.get("job")
                .start(step())
                .build();
    }

    public static class Record {

        private int id;
        private String ville;

        public Record() {
        }

        public int getId() {
            return id;
        }

        public void setId(int id) {
            this.id = id;
        }

        public String getVille() {
            return ville;
        }

        public void setVille(String ville) {
            this.ville = ville;
        }

        @Override
        public String toString() {
            return "Record{" +
                    "id=" + id +
                    ", ville='" + ville + '\'' +
                    '}';
        }
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }

}

I used a simple POJO called Record and mapped only two fields. The example prints:

Record{id=1, ville='Paris'}
Record{id=2, ville='Lyon'}
Record{id=3, ville='Metz'}
Record{id=4, ville='Lille'}
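
The trimming step in itemProcessor1 assumes that every line is wrapped in quotes. If the input might also contain unquoted or very short lines, a more defensive variant could look like the sketch below. The class name QuoteTrimmer and the method stripOuterQuotes are hypothetical names introduced for this illustration; they are not part of Spring Batch.

```java
public class QuoteTrimmer {

    // Removes one leading and one trailing double quote, but only when both are present.
    // Null, empty, too-short, or unquoted lines are returned unchanged.
    public static String stripOuterQuotes(String line) {
        if (line != null && line.length() >= 2
                && line.charAt(0) == '"'
                && line.charAt(line.length() - 1) == '"') {
            return line.substring(1, line.length() - 1);
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(stripOuterQuotes("\"1;Paris;13/4/1992;16/7/2006\"")); // prints 1;Paris;13/4/1992;16/7/2006
        System.out.println(stripOuterQuotes("2;Lyon;31/5/1993;1/8/2009"));       // prints 2;Lyon;31/5/1993;1/8/2009
    }
}
```

Under these assumptions, itemProcessor1 in the job above could return QuoteTrimmer::stripOuterQuotes instead of the substring lambda, so that a malformed line is passed through rather than failing with a StringIndexOutOfBoundsException.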

Hope this helps.

Regarding java - Spring Batch: Parsing a CSV file with quoteCharacter, the original question can be found on Stack Overflow: https://stackoverflow.com/questions/55730974/
