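I'm trying to build my portfolio with Spring Boot + jsoup + Hadoop + R. I successfully scraped data (title, content, IP, author, etc.) from a website with jsoup. I then added a repo.save() call to save the data to MySQL. When I run it, it suddenly stops on its own, with no error. Each page has about 40 posts, so if I scrape pages 1 to 30, about 1,200 rows should be saved to the database.
But it stops at around 400 rows, and sometimes at around 120 rows.
I'd like to know how to fix this.
I've attached the code. Thanks for reading my post! :)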
[==== console log ====
4.writer :dwk(175.223.*.*) 2018-03-19 02:10:40
Hibernate:
insert
into
dc_invest
(content, ip_address, regdate, title, tno, updatedate, upload_date, writer)
values
(?, ?, ?, ?, ?, ?, ?, ?)
1.tno : 3113177
2018-03-19 02:51:48.889 INFO 24571 --- [ Thread-3] o.s.w.c.s.GenericWebApplicationContext : Closing org.springframework.web.context.support.GenericWebApplicationContext@1033576a: startup date [Mon Mar 19 02:43:42 KST 2018]; root of context hierarchy
2018-03-19 02:51:48.904 INFO 24571 --- [ Thread-3] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2018-03-19 02:51:48.907 INFO 24571 --- [ Thread-3] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2018-03-19 02:51:48.918 INFO 24571 --- [ Thread-3] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
============================================
@Test
public void InsertToDB2() throws IOException {
    // Declared outside the loop so that page++ is not reset to 1 on every pass.
    int page = 1;
    // page Checker
    tunner: while (true) {
        System.out.println("======= " + page + "start ======");
        // target url.
        String url = "http://gall.dcinside.com/board/lists/?id=stock_new2&page=" + page;
        Document doc = Jsoup.connect(url).get();
        Elements title_no = doc.select("td.t_notice");
        // filtering: drop the first four entries of the list
        int cnt = 0;
        while (cnt < 4) {
            title_no.remove(0);
            cnt++;
        }
        // 1.Getting tno.
        for (Element no : title_no) {
            System.out.println("1.tno : " + no.text());
            String no_ = no.text();
            WebDc dc = new WebDc();
            dc.setTno(Long.parseLong(no_));
            // 2.Getting title.
            Document tno_doc = Jsoup
                    .connect("http://gall.dcinside.com/board/view/?id=stock_new2&no=" + no_ + "&page=" + page)
                    .get();
            Elements titles = tno_doc.select("dl.wt_subject dd");
            for (Element title : titles) {
                System.out.println("2.Title : " + title.text());
                dc.setTitle(title.text());
                Document title_content = Jsoup
                        .connect("http://gall.dcinside.com/board/view/?id=stock_new2&no=" + no_ + "&page=" + page)
                        .get();
                // 3. Getting contents
                Element content = title_content.select("div.s_write").first();
                // contents checker
                if (content.hasText()) {
                    System.out.println(content.text());
                    dc.setContent(content.text());
                } else {
                    System.out.println("there is no letters");
                }
                // 4. Getting writer&ip&date.
                Element writer = title_content.select("span.user_nick_nm").first();
                Element ip = title_content.select("li.li_ip").first();
                Element date = title_content.select("div.w_top_right b").first();
                System.out.println("4.writer :" + writer.text() + "(" + ip.text() + ")" + " " + date.text());
                dc.setWriter(writer.text());
                dc.setIpAddress(ip.text());
                dc.setUploadDate(date.text());
                repo.save(dc);
            }
        }
        // page limiter & checker
        System.out.println("==== Done Of " + page + " ====");
        if (page < 30) {
            page++;
        } else {
            break tunner;
        }
    } // while
}
============= domain =========
package com.example.domain;

import java.sql.Timestamp;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

import org.hibernate.annotations.CreationTimestamp;
import org.hibernate.annotations.UpdateTimestamp;

import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.Setter;
import lombok.ToString;

@Getter
@Setter
@Entity
@Table(name = "dc_invest")
@EqualsAndHashCode(of = "db_no")
@ToString
public class WebDc {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long db_no;
    private Long tno;
    // for making LONGTEXT
    @Column(length=1000000000)
    private String title;
    private String writer;
    @Column(length=1000000000)
    private String content;
    private String ipAddress;
    private String uploadDate;
    @CreationTimestamp
    private Timestamp regdate;
    @UpdateTimestamp
    private Timestamp updatedate;
}][1]
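Best answer
I'm just getting started with Hibernate, but I have run into this kind of problem before. It always happened on the same requests. I then found out that I was opening as many connections as possible, and I had to find a way not to open so many. From what I understand, you are heading in the same direction. Hope this helps.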
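One possible way to act on this advice, offered only as a sketch: collect the scraped rows first and hand them to the repository in batches, rather than calling repo.save() once per post. The example below uses only the standard library so it can run on its own. The class name `BatchSaveSketch` and the `partition` helper are hypothetical, integers stand in for `WebDc` entities, and the `repo.saveAll(batch)` call mentioned in the comment assumes a Spring Data 2.x `CrudRepository`. Whether this actually removes the silent stop in the question is not confirmed by the original post.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSaveSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for 1200 scraped posts (30 pages x 40 posts per page).
        List<Integer> scraped = new ArrayList<>();
        for (int i = 0; i < 1200; i++) {
            scraped.add(i);
        }

        int calls = 0;
        for (List<Integer> batch : partition(scraped, 40)) {
            // Assumption: in the real test, repo.saveAll(batch) would go here,
            // giving one repository call per page instead of one per post.
            calls++;
        }
        System.out.println(calls + " batched calls instead of " + scraped.size());
        // prints "30 batched calls instead of 1200"
    }
}
```

A batch size of 40 simply mirrors the page size from the question; any positive size works, and the last batch is allowed to be smaller.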
Regarding "java - Program stops by itself without an error when saving data to the database on Spring Boot", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/49354595/