python - How to read HTML in real time and insert into MySQL in a loop?

I read an HTML table and put it into MySQL with this code:

html = urllib.request.urlopen("http://xxx")
bt = BeautifulSoup(html,"lxml") 
alltable = bt.find_all('table')

def read_data(last_id):

    lst_df = pd.read_html(str(alltable))

    # Combine the list of DataFrames into a single DataFrame
    df = pd.concat(lst_df)


    l_id = last_id+1    
    res = df.loc[df.ID ==l_id]
    mycursor = mydb.cursor(buffered=True)

    if not res.empty:

        number = res['number'].item()
        user = res['User'].item()

        qt = check_user(user)

        if not number > qt:

            r = q - p

            sql = "UPDATE user SET p = %s WHERE user = %s"
            val = (r, user)
            mycursor.execute(sql, val)
            mydb.commit()

            print(mycursor.rowcount, "record(s) affected")

        # Insert log entry
        sql = "INSERT INTO log (id, user, number, l_id) VALUES (%s, %s, %s, %s)"
        val = [(None, user, number , l_id)]

        mycursor.executemany(sql, val)

        mydb.commit()

        print(mycursor.rowcount, "was inserted.") 

        mycursor.close()

I run the function in a while loop:

while True:
    last_id = get_last_id_db()
    read_data(last_id)

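It only seems to work once. I updated the HTML table, but MySQL was not updated. Since the `while True` loop is still running, MySQL should be updated automatically whenever I update the table.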

When I click the Run button once, it works without problems. But I want the HTML to be checked automatically, which is why I use `while True`.

Best answer

This should be inside the read_data() function, so that you fetch the latest state of the page on every call instead of only once at startup:
```
html = urllib.request.urlopen("http://xxx")
bt = BeautifulSoup(html,"lxml") 
alltable = bt.find_all('table')
```
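Are you always passing the same last_id to read_data()?
(Strongly recommended) You should use some kind of sleep() in the loop so that the data is fetched every few seconds or minutes rather than continuously.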
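
The points above can be put together in a small polling loop. This is only a minimal sketch, not the asker's actual code: `poll_for_new_rows`, `fetch_rows`, `get_last_id`, `save_row`, `interval_seconds` and `max_iterations` are hypothetical names, and the table is assumed to be available as a dict keyed by row ID. The download, the ID lookup and the database write are passed in as callables so the loop itself can be tried without a web page or a MySQL server.

```python
import time
from typing import Callable, Optional


def poll_for_new_rows(
    fetch_rows: Callable[[], dict],
    get_last_id: Callable[[], int],
    save_row: Callable[[int, dict], None],
    interval_seconds: float = 5.0,
    max_iterations: Optional[int] = None,
) -> int:
    """Re-fetch the table on every pass and save the row that follows the last stored ID.

    Returns the number of rows saved. With max_iterations=None the loop runs forever.
    """
    saved = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        rows = fetch_rows()          # download again on each pass, not once at startup
        next_id = get_last_id() + 1  # re-read the ID so it advances after each insert
        row = rows.get(next_id)
        if row is not None:
            save_row(next_id, row)
            saved += 1
        else:
            # Nothing new yet: wait before asking the server again.
            time.sleep(interval_seconds)
    return saved
```

In the question's setting, `fetch_rows` would presumably wrap the `urlopen` / `BeautifulSoup` / `pd.read_html` calls, `get_last_id` would be `get_last_id_db`, and `save_row` would run the UPDATE and INSERT statements. Sleeping only when no new row was found lets the loop catch up quickly on a backlog while still avoiding constant requests when the page has not changed.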

Regarding "python - How to read HTML in real time and insert into MySQL in a loop?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59280426/
