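The way I am doing this is rather cumbersome. How can I adjust the code below so that it builds the links from the country codes in the list and loads those JSON links in a loop, instead of unioning separate dataframes?
Thanks a lot for your help!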
import urllib.request
import json
import pandas as pd
from datetime import datetime

countries = ["nl", "us", "se"]

## load Dutch top episodes chart
with urllib.request.urlopen("https://podcastcharts.byspotify.com/api/charts/top_episodes?region=nl") as url_NL:
    dataFrameNL = json.load(url_NL)
    print(dataFrameNL)

## load US top episodes chart
with urllib.request.urlopen("https://podcastcharts.byspotify.com/api/charts/top_episodes?region=us") as url_US:
    dataFrameUS = json.load(url_US)
    print(dataFrameUS)

# creating the dataframe
## NL
dfNL = pd.json_normalize(dataFrameNL)
## US
dfUS = pd.json_normalize(dataFrameUS)

## add scraped_date
dfNL['scraped_date'] = pd.Timestamp.today().strftime('%Y-%m-%d')
dfUS['scraped_date'] = pd.Timestamp.today().strftime('%Y-%m-%d')

## add rank (each frame is ranked by its own row order)
dfNL["rank"] = dfNL.index + 1
dfUS["rank"] = dfUS.index + 1

## add country
dfNL['country'] = 'NL'
dfUS['country'] = 'US'

## concatenate
union_dataframes = pd.concat([dfNL, dfUS])

## create file name with date output
file_name = 'mycsvfile' + str(datetime.today().strftime('%Y-%m-%d')) + '.csv'

# convert the dataframe to a csv file
union_dataframes.to_csv(file_name, encoding='utf-8', index=False)
I am loading the separate datasets and concatenating them, rather than looping over the list.
Best answer
Create a loop that builds a DataFrame for each value in countries and appends it to a list; after the loop, join the list together with concat():
import json
import urllib.request
from pathlib import Path

import pandas as pd

countries = ['nl', 'us', 'se']
url_base = 'https://podcastcharts.byspotify.com/api/charts/top_episodes'
today = pd.Timestamp.today().strftime('%Y-%m-%d')

dfs = []
for country in countries:
    # set the country dynamically with an f-string
    with urllib.request.urlopen(f'{url_base}?region={country}') as url:
        dataFrame = json.load(url)
    df = pd.json_normalize(dataFrame)
    # add scraped_date
    df['scraped_date'] = today
    # add rank, based on this frame's own row order
    df['rank'] = df.index + 1
    # add country, generating the uppercase country code dynamically
    df['country'] = country.upper()
    dfs.append(df)

# concatenate
union_dataframes = pd.concat(dfs)

# create file name with date output
file_path = Path(f'mycsvfile{today}.csv')

# convert the dataframe to a csv file
union_dataframes.to_csv(file_path, encoding='utf-8', index=False)
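The per-country steps inside the loop can also be pulled into a small function, which makes them easy to check without calling the API. This is only a minimal sketch: the helper name `build_chart_frame` and the sample records are made up for illustration, and it assumes the endpoint returns a JSON list of records already ordered by chart position.

```python
import pandas as pd


def build_chart_frame(records, country, scraped_date):
    """Turn one country's parsed JSON records into an annotated DataFrame."""
    df = pd.json_normalize(records)
    df['scraped_date'] = scraped_date
    # rank follows the row order of this frame, not of another country's frame
    df['rank'] = df.index + 1
    df['country'] = country.upper()
    return df


# Made-up records standing in for json.load(url); the real API fields may differ.
sample = {
    'nl': [{'episodeName': 'A'}, {'episodeName': 'B'}],
    'us': [{'episodeName': 'C'}, {'episodeName': 'D'}, {'episodeName': 'E'}],
}

dfs = [build_chart_frame(records, country, '2023-03-16')
       for country, records in sample.items()]

# ignore_index=True gives the combined frame a fresh 0..n-1 index
union_dataframes = pd.concat(dfs, ignore_index=True)
print(union_dataframes)
```

Computing the rank from `df.index` inside the helper is what keeps each country's ranking independent of the others, even when the charts have different lengths.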
Edit:
import json
import urllib.request
from pathlib import Path

import pandas as pd

countries = ['nl', 'us', 'se']
# categories was not defined in the original snippet; 'top_episodes' is the only
# category shown in the question, so add further ones here as needed
categories = ['top_episodes']
url_base = 'https://podcastcharts.byspotify.com/api/'
today = pd.Timestamp.today().strftime('%Y-%m-%d')

dfs = []
for country in countries:
    for category in categories:
        # set the category and country dynamically with an f-string
        with urllib.request.urlopen(f'{url_base}charts/{category}?region={country}') as url:
            dataFrame = json.load(url)
        df = pd.json_normalize(dataFrame)
        # add scraped_date
        df['scraped_date'] = today
        # add rank, based on this frame's own row order
        df['rank'] = df.index + 1
        # add country, generating the uppercase country code dynamically
        df['country'] = country.upper()
        df['category'] = category
        dfs.append(df)

# concatenate
union_dataframes = pd.concat(dfs)

# create file name with date output
file_path = Path(f'mycsvfile{today}.csv')

# convert the dataframe to a csv file
union_dataframes.to_csv(file_path, encoding='utf-8', index=False)
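The nested loop over countries and categories can also be flattened with `itertools.product`, and the query string can be built with `urllib.parse.urlencode` instead of by hand. This is a sketch of the URL-building step only, with no requests sent; the helper `chart_url` is hypothetical, and `'top_podcasts'` is an assumed second category that does not appear in the question.

```python
from itertools import product
from urllib.parse import urlencode

countries = ['nl', 'us', 'se']
# Assumption: only 'top_episodes' comes from the question;
# 'top_podcasts' is a hypothetical second category for illustration.
categories = ['top_episodes', 'top_podcasts']
url_base = 'https://podcastcharts.byspotify.com/api/charts'


def chart_url(category, country):
    """Build the chart URL for one category/country pair."""
    query = urlencode({'region': country})
    return f'{url_base}/{category}?{query}'


# one (country, category, url) tuple per combination, countries varying slowest
requests_to_make = [
    (country, category, chart_url(category, country))
    for country, category in product(countries, categories)
]

for country, category, url in requests_to_make:
    print(country, category, url)
```

Each tuple can then be passed to the same fetch-and-normalize steps as in the loop above, so the body of the loop stays one level deep.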
This question, "python - basic Python question", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/75753527/