java - How to handle large datasets with JPA (or at least with Hibernate)?

Tags: java performance hibernate jpa jakarta-ee

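I need to make my web app work with really huge datasets. At the moment I either get an OutOfMemoryError or the output takes 1-2 minutes to generate.

To put it simply, suppose we have 2 tables in the DB: Worker and WorkLog, with about 1000 rows in the first and 10 000 000 rows in the second. The latter table has several fields, including "workerId" and "hoursWorked", among others. What we need is: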

  1. the total hours worked by each user;

  2. the list of work periods for each user.

The most straightforward approach (IMO) for each task in plain SQL is:

1)

select Worker.name, sum(hoursWorked) from Worker, WorkLog 
   where Worker.id = WorkLog.workerId 
   group by Worker.name;

//results of this query should be transformed to Multimap<Worker, Long>

2)

select Worker.name, WorkLog.start, WorkLog.hoursWorked from Worker, WorkLog
   where Worker.id = WorkLog.workerId;

//results of this query should be transformed to Multimap<Worker, Period>
//if this were JDBC, it would be vital
//to call resultSet.setFetchSize(someSmallNumber), ~100
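For reference, the JDBC variant that the comments above allude to might look roughly like the following. This is only a sketch under assumptions: the class name, the Period type, and the addPeriod helper are hypothetical, a plain Map of lists stands in for the Multimap, and the SQL and the fetch size of 100 are taken from the text above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WorkLogJdbcSketch {

    /** Hypothetical value type for one row of query 2. */
    static final class Period {
        final Timestamp start;
        final long hoursWorked;

        Period(Timestamp start, long hoursWorked) {
            this.start = start;
            this.hoursWorked = hoursWorked;
        }
    }

    /** Hypothetical helper: appends one period to the list kept for that worker. */
    static void addPeriod(Map<String, List<Period>> byWorker, String name, Period period) {
        byWorker.computeIfAbsent(name, k -> new ArrayList<>()).add(period);
    }

    /** Runs query 2 and reads the result row by row with a small fetch size. */
    static Map<String, List<Period>> loadPeriods(Connection con) throws SQLException {
        String sql = "select Worker.name, WorkLog.start, WorkLog.hoursWorked "
                   + "from Worker, WorkLog where Worker.id = WorkLog.workerId";
        Map<String, List<Period>> byWorker = new LinkedHashMap<>();
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            // Only a hint to the driver; some drivers honor it only under extra conditions.
            ps.setFetchSize(100);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    addPeriod(byWorker, rs.getString(1),
                              new Period(rs.getTimestamp(2), rs.getLong(3)));
                }
            }
        }
        return byWorker;
    }
}
```

Note that the fetch size only limits how many rows the driver buffers at a time. The returned map still holds every row, so with 10 000 000 entries the result itself can remain the memory bottleneck.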

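So, I have two questions:

  1. how do I implement each of my approaches with JPA (or at least with Hibernate);
  2. how would you handle this problem (with JPA or Hibernate, of course)?

Best answer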

suppose that we have 2 tables in DB: Worker and WorkLog with about 1000 rows in the first one and 10 000 000 rows in the second one

For bulk processing like this, my suggestion would be to use the StatelessSession interface from Hibernate:

Alternatively, Hibernate provides a command-oriented API that can be used for streaming data to and from the database in the form of detached objects. A StatelessSession has no persistence context associated with it and does not provide many of the higher-level life cycle semantics. In particular, a stateless session does not implement a first-level cache nor interact with any second-level or query cache. It does not implement transactional write-behind or automatic dirty checking. Operations performed using a stateless session never cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate's event model and interceptors. Due to the lack of a first-level cache, Stateless sessions are vulnerable to data aliasing effects. A stateless session is a lower-level abstraction that is much closer to the underlying JDBC.

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();

ScrollableResults customers = session.getNamedQuery("GetCustomers")
    .scroll(ScrollMode.FORWARD_ONLY);
while ( customers.next() ) {
    Customer customer = (Customer) customers.get(0);
    customer.updateStuff(...);
    session.update(customer);
}

tx.commit();
session.close();

In this code example, the Customer instances returned by the query are immediately detached. They are never associated with any persistence context.

The insert(), update() and delete() operations defined by the StatelessSession interface are considered to be direct database row-level operations. They result in the immediate execution of a SQL INSERT, UPDATE or DELETE respectively. They have different semantics to the save(), saveOrUpdate() and delete() operations defined by the Session interface.
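The forward-only loop in the quoted example also suggests one way to approach the question's second task without holding all 10 000 000 rows at once. If the query is ordered by worker (an assumption; the original query has no ORDER BY), each worker's rows arrive contiguously and can be handed off as soon as the next worker starts. The sketch below uses a plain Iterator as a stand-in for ScrollableResults so that it runs without Hibernate; Row, forEachWorker, and the sink callback are hypothetical names, not part of any Hibernate API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

class ForwardOnlyGrouping {

    /** Hypothetical row shape: one WorkLog entry joined with its (non-null) worker name. */
    static final class Row {
        final String workerName;
        final long hoursWorked;

        Row(String workerName, long hoursWorked) {
            this.workerName = workerName;
            this.hoursWorked = hoursWorked;
        }
    }

    /**
     * Consumes rows that are already sorted by worker name and passes one
     * (name, hours) group per worker to the sink. Only the current worker's
     * group is kept in memory at any time.
     */
    static void forEachWorker(Iterator<Row> sortedRows, BiConsumer<String, List<Long>> sink) {
        String current = null;
        List<Long> hours = new ArrayList<>();
        while (sortedRows.hasNext()) {
            Row row = sortedRows.next();
            // A change of worker name closes the previous group.
            if (current != null && !current.equals(row.workerName)) {
                sink.accept(current, hours);
                hours = new ArrayList<>();
            }
            current = row.workerName;
            hours.add(row.hoursWorked);
        }
        // Emit the last group, if there was any input at all.
        if (current != null) {
            sink.accept(current, hours);
        }
    }
}
```

With this shape, the sink could write each worker's periods straight to the response, so memory use depends on the largest single worker rather than on the whole WorkLog table.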

Source: the Stack Overflow question "How to handle large datasets with JPA (or at least with Hibernate)?": https://stackoverflow.com/questions/2761543/
