python - Automatically reading files in Python

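I want to write a Python script that automatically reads files with any of the following extensions (csv, tsv, json, xml, xls, xlsx, hdf5, sql) and displays the first 100 rows. The only argument I want to give my script is the path.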

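This is my first attempt. I know I could use a switch-case instead of if/elif as good practice, but apart from that, do you have any suggestions for improving the code?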

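import pandas as pd

file_path = input("Enter the path to the file you want to open: ")
# file_path is a single string, so test its suffix directly
if file_path.endswith('.csv'):
    df = pd.read_csv(file_path)
    print(df.head(100))
elif file_path.endswith('.tsv'):
    df = pd.read_csv(file_path, sep='\t')  # tab-separated values
    print(df.head(100))
elif file_path.endswith('.json'):
    df = pd.read_json(file_path)
    print(df.head(100))
elif file_path.endswith('.xml'):
    df = pd.read_xml(file_path)
    print(df.head(100))
elif file_path.endswith(('.xls', '.xlsx')):  # several suffixes go in one tuple
    df = pd.read_excel(file_path)
    print(df.head(100))
elif file_path.endswith(('.hdf', '.h5', '.hdf5')):
    df = pd.read_hdf(file_path)
    print(df.head(100))
elif file_path.endswith('.sql'):
    # NOTE: fails as written; pd.read_sql(sql, con) also needs a database connection
    df = pd.read_sql(file_path)
    print(df.head(100))
else:
    print("file format not supported")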

Best answer

Just condensing the productive stack of comments into a self-contained answer.

`if-elif`, slightly condensed and turned into a function

import pandas as pd

def read_any(file):
    if file.endswith('.csv'):
        df = pd.read_csv(file)
    elif file.endswith('.tsv'):
        df = pd.read_csv(file, sep='\t')  # tab-separated values
    elif file.endswith('.json'):
        df = pd.read_json(file)
    elif file.endswith('.xml'):
        df = pd.read_xml(file)
    elif file.endswith(('.xls', '.xlsx')):  # several suffixes go in one tuple
        df = pd.read_excel(file)
    elif file.endswith(('.hdf', '.h5', '.hdf5')):
        df = pd.read_hdf(file)
    elif file.endswith('.sql'):
        # NOTE: pd.read_sql(sql, con) also needs a connection; see the special case below
        df = pd.read_sql(file)
    else:
        raise ValueError(f'Unsupported filetype: {file}')
    return df

if __name__ == '__main__':
    # here or wherever it is used
    file_path = input("Enter the path to the file you want to open: ")
    df = read_any(file_path)
    print(df.head(100))
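The question asks for the path to be the script's only argument, while the code above prompts with `input()`. A minimal sketch of taking it from the command line with `argparse` instead; `build_parser` and `main` are hypothetical names, and `reader` is assumed to be a callable such as `read_any` above:

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # a single positional argument: the path to the data file
    parser = argparse.ArgumentParser(
        description="Print the first 100 rows of a data file."
    )
    parser.add_argument("path", help="file to read (.csv, .tsv, .json, ...)")
    return parser


def main(argv, reader) -> int:
    """Parse argv, read the file with `reader`, print up to 100 rows.

    `reader` is any callable taking a path and returning a DataFrame,
    e.g. read_any; it is expected to raise ValueError for unknown types.
    """
    args = build_parser().parse_args(argv)
    try:
        df = reader(args.path)
    except ValueError as exc:
        print(exc, file=sys.stderr)
        return 1  # non-zero exit status signals failure to the shell
    print(df.head(100))
    return 0
```

In the script itself this would be wired up as `if __name__ == '__main__': sys.exit(main(sys.argv[1:], reader=read_any))`, so it can be run as `python read_any.py data.csv` (script name assumed).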

Hacky (?) dictionary of functions

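The if-elif stack can now be refactored into a dictionary lookup. I don't find that particularly hacky, but it isn't really necessary either; it's more of a style choice: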

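import os.path

import pandas as pd

def my_read_sql(con_w_sql, sep='#'):
    """Made-up function that reads SQL from a connection in a single call

    Args:
        con_w_sql: connection string + query or table name,
            joined by separator string sep
        sep: separator

    Example:
        >>> my_read_sql('postgres:///db_name#table_name')
    """
    con, sql = con_w_sql.split(sep, 1)  # split at the first separator only
    return pd.read_sql(sql, con)

# Keys include the leading dot, because that is what os.path.splitext returns.
# my_read_sql has to be defined before it can be referenced here.
READER_MAP = {
    '.csv': pd.read_csv,
    '.json': pd.read_json,
    '.xlsx': pd.read_excel,
    '.xls': pd.read_excel,
    '.xml': pd.read_xml,  # .. and so on
    '.sql': my_read_sql,  # special case; only reached if the argument ends in .sql
}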

With all that in place, read_any shrinks to:

def read_any(file):
    _, ext = os.path.splitext(file)
    try:
        reader = READER_MAP[ext]
    except KeyError:
        # "from None" hides the internal KeyError from the traceback
        raise ValueError(f'Unsupported filetype: {ext}') from None
    return reader(file)
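One detail the lookup depends on is what `os.path.splitext` actually returns: the last suffix only, with its leading dot, and in its original case. A minimal sketch of a lookup that also normalizes case; `lookup_reader` is a hypothetical name, and the map is passed in rather than read from a global:

```python
import os.path


def lookup_reader(path, reader_map):
    """Pick a reader from reader_map by file extension.

    os.path.splitext keeps the leading dot ('.csv'), so the map's keys
    must include it; lower() lets 'DATA.CSV' match the '.csv' key.
    """
    _, ext = os.path.splitext(path)
    try:
        return reader_map[ext.lower()]
    except KeyError:
        raise ValueError(f"Unsupported filetype: {ext!r}") from None
```

Note that for `data.csv.gz` the extension seen here is `.gz`, so a `.csv` entry will not match it.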

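I really don't like having to invent a non-standard convention for joining the sql (or table name) with the con (connection string). I would keep separate user-facing functions for file-like and database-like reads.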
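A minimal sketch of the database-like side of that split, assuming an SQLite database; `read_query` and `read_sql_file` are hypothetical names. The query and the connection are separate parameters, so no separator convention is needed, and `read_any` can stay the file-like entry point:

```python
import sqlite3

import pandas as pd


def read_query(sql: str, con: sqlite3.Connection) -> pd.DataFrame:
    """Database-like source: run `sql` on an already open connection."""
    # pandas accepts a sqlite3 connection directly; for other databases it
    # expects a SQLAlchemy connectable or a connection URI string instead
    return pd.read_sql(sql, con)


def read_sql_file(path: str, con: sqlite3.Connection) -> pd.DataFrame:
    """Run the query stored in a .sql file; a connection is still required."""
    with open(path, encoding="utf-8") as f:
        return read_query(f.read(), con)
```

Usage would then look like `read_sql_file('query.sql', sqlite3.connect('example.db')).head(100)`, where both file names are made up for illustration.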

Conclusion

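OK, now that I've written it all out, I'd recommend sticking with the if-elif solution. It handles special cases with less obfuscation, and it can be refactored to call a my_read_sql-style special handler in the same number of lines as the READER_MAP version, just without that extra variable.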

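Also, the .endswith stack is more flexible with double extensions (e.g. .csv.gz), which are harder to get right with the READER_MAP approach because os.path.splitext only returns the last suffix (.gz).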
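A minimal sketch of keeping a dictionary while handling double extensions the way the `.endswith` stack does: match suffixes instead of splitting, and try the longest ones first. `reader_for` is a hypothetical name:

```python
def reader_for(path, reader_map):
    """Return the reader whose suffix is the longest match for path.

    Unlike os.path.splitext, this treats '.csv.gz' as a suffix of
    'data.csv.gz', so multi-part extensions can have their own entry.
    """
    lowered = path.lower()
    # check longer suffixes first, so '.csv.gz' wins over '.gz'
    for suffix in sorted(reader_map, key=len, reverse=True):
        if lowered.endswith(suffix):
            return reader_map[suffix]
    raise ValueError(f"Unsupported filetype: {path}")
```

Since `pd.read_csv` infers gzip compression from a `.gz` suffix by default (`compression='infer'`), a map such as `{'.csv': pd.read_csv, '.csv.gz': pd.read_csv}` would be enough to cover both cases.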

For python - Automatically reading files in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68594858/
