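I recently switched my application to Spring Boot 2. I rely on Spring Data JPA to handle all transactions, and I noticed a huge speed difference compared with my old configuration. Storing about 1000 elements used to take about 6 seconds; now it takes more than 25 seconds. I have read posts about batching with Data JPA, but none of them worked.
Let me show you both configurations:
Entity (common to both):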
import java.io.Serializable;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.Table;
@Entity
@Table(name = "category")
public class CategoryDB implements Serializable
{
    private static final long serialVersionUID = -7047292240228252349L;
    @Id
    @Column(name = "category_id", length = 24)
    private String category_id;
    @Column(name = "category_name", length = 50)
    private String name;
    @Column(name = "category_plural_name", length = 50)
    private String pluralName;
    @Column(name = "url_icon", length = 200)
    private String url;
    @Column(name = "parent_category", length = 24)
    @JoinColumn(name = "parent_category", referencedColumnName = "category_id")
    private String parentID;
    // Getters & Setters
}
Old repository (only the insert is shown):
@Override
public Set<String> insert(Set<CategoryDB> element)
{
    Set<String> ids = new HashSet<>();
    Transaction tx = session.beginTransaction();
    for (CategoryDB category : element)
    {
        String id = (String) session.save(category);
        ids.add(id);
    }
    tx.commit();
    return ids;
}
Old Hibernate XML configuration file:
<property name="show_sql">true</property>
<property name="format_sql">true</property>
<!-- connection information -->
<property name="hibernate.connection.driver_class">com.mysql.cj.jdbc.Driver</property>
<property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
<!-- database pooling information -->
<property name="connection_provider_class">org.hibernate.connection.C3P0ConnectionProvider</property>
<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">100</property>
<property name="hibernate.c3p0.timeout">300</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.idle_test_period">3000</property>
Old statistics:
18949156 nanoseconds spent acquiring 2 JDBC connections;
5025322 nanoseconds spent releasing 2 JDBC connections;
33116643 nanoseconds spent preparing 942 JDBC statements;
3185229893 nanoseconds spent executing 942 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
3374152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
New repository:
@Repository
public interface CategoryRepository extends JpaRepository<CategoryDB, String>
{
    @Query("SELECT cat.parentID FROM CategoryDB cat WHERE cat.category_id = :#{#category.category_id}")
    String getParentID(@Param("category") CategoryDB category);
}
I use saveAll() in my service.
New application.properties:
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.connection-timeout=6000
spring.datasource.hikari.maximum-pool-size=10
spring.jpa.properties.hibernate.show_sql=true
spring.jpa.properties.hibernate.format_sql=true
spring.jpa.properties.hibernate.generate_statistics = true
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.MySQLDialect
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
New statistics:
24543605 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
136919170 nanoseconds spent preparing 942 JDBC statements;
5457451561 nanoseconds spent executing 941 JDBC statements;
19985781508 nanoseconds spent executing 19 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
20256178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
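I have probably misconfigured something on the Spring side. This is a huge performance difference and I have hit a dead end. Any hints about what is going wrong here are greatly appreciated.
Best answer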
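Let's merge the statistics so they can be compared easily.
Old lines are prefixed with o and new lines with n. Lines where both counts are 0 are left out. The nanosecond values are formatted so that the milliseconds appear before the space.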
o: 18 949156 nanoseconds spent acquiring 2 JDBC connections;
n: 24 543605 nanoseconds spent acquiring 1 JDBC connections;
o: 33 116643 nanoseconds spent preparing 942 JDBC statements;
n: 136 919170 nanoseconds spent preparing 942 JDBC statements;
o: 3185 229893 nanoseconds spent executing 942 JDBC statements;
n: 5457 451561 nanoseconds spent executing 941 JDBC statements; // losing ~2 s
o: 0 nanoseconds spent executing 0 JDBC batches;
n: 19985 781508 nanoseconds spent executing 19 JDBC batches; // losing ~20 s
o: 3374 152568 nanoseconds spent executing 1 flushes (flushing a total of 941 entities and 0 collections);
n: 20256 178886 nanoseconds spent executing 3 flushes (flushing a total of 2823 entities and 0 collections); // losing ~17 s, processing 3 times the entities
o: 6485 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections)
n: 0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
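The following seem to be the relevant points:
- The new version has 19 batches taking 20 seconds, which don't exist at all in the old version.
- The new version has 3 flushes instead of 1, taking about 20 seconds in total instead of about 3.4, roughly 6 times as long. This is probably more or less the same extra time as the batches, since the batches are certainly executed as part of those flushes.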
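The rough figures in the comments above can be checked directly against the nanosecond values. The following Java sketch just does that arithmetic; the constants are copied from the two statistics blocks, while the class name and the batchesNeeded helper are hypothetical, added for illustration. It gives about 2.3 s of extra statement time, about 20 s spent in the 19 batches (roughly 1.05 s per batch), and about 16.9 s of extra flush time, i.e. a flush ratio of about 6.0. It also shows that 19 is exactly the number of batches needed for 941 inserts with hibernate.jdbc.batch_size=50, which is consistent with (but does not prove) the batches carrying the inserts.

```java
import java.util.Locale;

// Recomputes the old-vs-new differences from the Hibernate statistics quoted in the question.
final class StatsDiff {

    // Nanosecond values and counts copied from the "old" and "new" statistics.
    static final long OLD_STATEMENTS_NS = 3_185_229_893L;
    static final long NEW_STATEMENTS_NS = 5_457_451_561L;
    static final long NEW_BATCHES_NS = 19_985_781_508L;
    static final int NEW_BATCH_COUNT = 19;
    static final long OLD_FLUSH_NS = 3_374_152_568L;
    static final long NEW_FLUSH_NS = 20_256_178_886L;
    static final int OLD_FLUSHED_ENTITIES = 941;
    static final int NEW_FLUSHED_ENTITIES = 2823;

    // Whole milliseconds; the sub-millisecond part is truncated.
    static long toMillis(long nanos) {
        return nanos / 1_000_000L;
    }

    // Hypothetical helper: JDBC batches needed for `rows` inserts when at most
    // `batchSize` statements go into one batch (ceiling division).
    static int batchesNeeded(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println("extra statement time: "
                + toMillis(NEW_STATEMENTS_NS - OLD_STATEMENTS_NS) + " ms");
        System.out.println("batch time: " + toMillis(NEW_BATCHES_NS) + " ms, about "
                + toMillis(NEW_BATCHES_NS / NEW_BATCH_COUNT) + " ms per batch");
        System.out.println("extra flush time: " + toMillis(NEW_FLUSH_NS - OLD_FLUSH_NS) + " ms");
        // Locale.ROOT keeps the decimal separator a dot regardless of the JVM locale.
        System.out.println("flush time ratio: "
                + String.format(Locale.ROOT, "%.2f", (double) NEW_FLUSH_NS / OLD_FLUSH_NS));
        System.out.println("flushed entity ratio: "
                + (double) NEW_FLUSHED_ENTITIES / OLD_FLUSHED_ENTITIES);
        System.out.println("batches for 941 inserts at batch_size=50: " + batchesNeeded(941, 50));
    }
}
```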
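While batching should make things faster, there are reports of it making things slower, especially with MySQL: Why Spring's jdbcTemplate.batchUpdate() so slow?
This leaves a few things you can try or investigate:
- Disable batching to check whether you are actually hitting some kind of slow-batching problem.
- Use the linked SO post to speed up batching.
- Log the SQL statements that are actually executed, to find the differences. Since this produces rather long logs, try extracting only the SQL statements from both files and comparing them with a diff tool.
- Log the flushes to understand what triggers the extra flushes.
- Use breakpoints and a debugger, or extra logging, to find out which entities get flushed and why there are more of them in the second variant.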
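For the suggestion about extracting and diffing the SQL, the extraction step could look like the following Java sketch. It assumes the show_sql output format in which each statement starts on a line beginning with "Hibernate:" and, with format_sql=true, continues on the following indented lines; the class and method names are hypothetical. For each log file it writes one statement per line to a file with a ".sql" suffix, so the two results can be compared with any diff tool, and prints how many statements of each type (select, insert, ...) the log contains.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Extracts the SQL statements printed by Hibernate's show_sql from a log and counts them by type.
final class SqlLogSummary {

    private static final String PREFIX = "Hibernate:";

    // Assumption: every statement starts on a line beginning with "Hibernate:" and, with
    // format_sql=true, continues on the following lines that start with whitespace.
    static List<String> extractStatements(List<String> logLines) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = null;
        for (String line : logLines) {
            if (line.startsWith(PREFIX)) {
                addIfPresent(statements, current);
                current = new StringBuilder(line.substring(PREFIX.length()));
            } else if (current != null && !line.isEmpty() && Character.isWhitespace(line.charAt(0))) {
                current.append(' ').append(line.trim()); // continuation of a formatted statement
            } else {
                addIfPresent(statements, current); // any other log line ends the statement
                current = null;
            }
        }
        addIfPresent(statements, current);
        return statements;
    }

    // Adds the collected statement (collapsed to one line) unless it is missing or blank.
    private static void addIfPresent(List<String> statements, StringBuilder sb) {
        if (sb != null && !sb.toString().trim().isEmpty()) {
            statements.add(sb.toString().trim());
        }
    }

    // Counts statements by their first keyword (select, insert, update, ...).
    static Map<String, Integer> countByType(List<String> statements) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String sql : statements) {
            String type = sql.split("\\s+", 2)[0].toLowerCase(Locale.ROOT);
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    // Usage: java SqlLogSummary old.log new.log
    // Writes old.log.sql and new.log.sql (one statement per line) and prints the counts.
    public static void main(String[] args) throws IOException {
        for (String logFile : args) {
            List<String> statements = extractStatements(Files.readAllLines(Paths.get(logFile)));
            Files.write(Paths.get(logFile + ".sql"), statements);
            System.out.println(logFile + ": " + countByType(statements));
        }
    }
}
```

Collapsing each formatted statement onto a single line keeps the diff readable, since a line-based diff tool then compares whole statements instead of fragments.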
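All of the proposals above work on top of JPA.
But your statistics and the content of your question suggest that you are doing simple inserts into one or a few tables.
Doing this with plain JDBC, for example using JdbcTemplate, might be more efficient and is at least easier to understand.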
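As a rough illustration of the JDBC route, here is a sketch that uses plain java.sql batching instead of JdbcTemplate, so it needs nothing beyond the JDK. JdbcTemplate.batchUpdate relies on the same addBatch/executeBatch mechanism when the driver supports batch updates. The class, the CategoryRow holder and the insertAll method are hypothetical; the table and column names come from the entity in the question, and the batch size is a parameter (50 would match the question's hibernate.jdbc.batch_size). Note that MySQL Connector/J by default still sends a batch of prepared inserts as individual statements; it only rewrites them into multi-row inserts when the connection URL sets rewriteBatchedStatements=true, which may be related to the slow-batching reports linked above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Plain-JDBC batched insert into the category table from the question (hypothetical sketch).
final class CategoryJdbcInsert {

    // Hypothetical row holder mirroring the CategoryDB columns.
    static final class CategoryRow {
        final String id, name, pluralName, url, parentId;

        CategoryRow(String id, String name, String pluralName, String url, String parentId) {
            this.id = id;
            this.name = name;
            this.pluralName = pluralName;
            this.url = url;
            this.parentId = parentId;
        }
    }

    private static final String INSERT_SQL =
        "INSERT INTO category (category_id, category_name, category_plural_name, url_icon, parent_category) "
        + "VALUES (?, ?, ?, ?, ?)";

    // Inserts all rows in one transaction, handing them to the driver in JDBC batches
    // of at most batchSize rows. Returns the number of executeBatch() calls.
    static int insertAll(Connection conn, List<CategoryRow> rows, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        int batches = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (CategoryRow row : rows) {
                ps.setString(1, row.id);
                ps.setString(2, row.name);
                ps.setString(3, row.pluralName);
                ps.setString(4, row.url);
                ps.setString(5, row.parentId);
                ps.addBatch();
                if (++pending == batchSize) { // send a full batch
                    ps.executeBatch();
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) { // send the remaining partial batch
                ps.executeBatch();
                batches++;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo the partially inserted rows
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return batches;
    }
}
```

Called with the question's 941 categories and a batch size of 50, this issues 19 executeBatch() calls inside a single transaction, with no entity tracking or flushing involved.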
Source: java - Spring Data JPA too slow, Stack Overflow: https://stackoverflow.com/questions/61117931/