python - Filename chains in Python

Tags: python string list file slice

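I can't use any imported libraries. I have this assignment in which some directories contain a number of files; the first line of each file holds, besides some words, the name of the next file to open. Once every word of every file in the directory has been read, the words must be processed so that a single string is returned: its first position holds the most frequent first letter among all the words seen, its second position holds the most frequent second letter, and so on. I have managed to do this for a directory containing 3 files, but without any kind of chaining mechanism; the code just passes local variables along. Some of my university colleagues suggested that I should use list slicing, but I can't figure out how. This is what I have: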

'''
    The objective of the homework assignment is to design and implement a function
    that reads some strings contained in a series of files and generates a new
    string from all the strings read.
    The strings to be read are contained in several files, linked together to
    form a closed chain. The first string in each file is the name of another
    file that belongs to the chain: starting from any file and following the
    chain, you always return to the starting file.
    
    Example: the first line of file "A.txt" is "B.txt," the first line of file
    "B.txt" is "C.txt," and the first line of "C.txt" is "A.txt," forming the 
    chain "A.txt"-"B.txt"-"C.txt".
    
    In addition to the string with the name of the next file, each file also
    contains other strings separated by spaces, tabs, or carriage return 
    characters. The function must read all the strings in the files in the chain
    and construct the string obtained by concatenating the characters with the
    highest frequency in each position. That is, in the string to be constructed,
    at position p, there will be the character with the highest frequency at 
    position p of each string read from the files. In the case where there are
    multiple characters with the same frequency, consider the alphabetical order.
    The generated string has a length equal to the maximum length of the strings
    read from the files.
    
    Therefore, you must write a function that takes as input a string "filename"
    representing the name of a file and returns a string.
    The function must construct the string according to the directions outlined
    above and return the constructed string.
    
    Example: if the contents of the three files A.txt, B.txt, and C.txt in the
    directory test01 are as follows
    
    
    test01/A.txt          test01/B.txt         test01/C.txt                                                                 
    -------------------------------------------------------------------------------
    test01/B.txt          test01/C.txt         test01/A.txt
    house                 home                 kite                                                                       
    garden                park                 hello                                                                       
    kitchen               affair               portrait                                                                     
    balloon                                    angel                                                                                                                                               
                                               surfing                                                               
    
    the function most_frequent_chars ("test01/A.txt") will return "hareennt".
    '''

        def file_names_list(filename):
            intermezzo = []
            lista_file = []
        
            a_file = open(filename)
        
            lines = a_file.readlines()
            for line in lines:
                intermezzo.extend(line.split())
            del intermezzo[1:]
            lista_file.append(intermezzo[0])
            intermezzo.pop(0)
            return lista_file
        
        
        def words_list(filename):
            lista_file = []
            a_file = open(filename)
        
            lines = a_file.readlines()[1:]
            for line in lines:
                lista_file.extend(line.split())
            return lista_file
        
        def stuff_list(filename):
            file_list = file_names_list(filename)
            the_rest = words_list(filename)
            second_file_name = file_names_list(file_list[0])
            
            
            the_lists = words_list(file_list[0]) + words_list(second_file_name[0])
            the_rest += the_lists[0:]
            return the_rest
            
        
        def most_frequent_chars(filename):
            huge_words_list = stuff_list(filename)
            maxOccurs = ""
            list_of_chars = []
            for i in range(len(max(huge_words_list, key=len))):
                for item in huge_words_list:
                    try:
                        list_of_chars.append(item[i])
                    except IndexError:
                        pass
                    
                maxOccurs += max(sorted(set(list_of_chars)), key = list_of_chars.count)
                list_of_chars.clear()
            return maxOccurs
        print(most_frequent_chars("test01/A.txt"))

Best answer

If the code is well structured, this assignment is relatively easy. Here is a complete implementation:

def read_file(fname):
    with open(fname, 'r') as f:
        return list(filter(None, [y.rstrip(' \n').lstrip(' ') for x in f for y in x.split()]))

def read_chain(fname):
    seen   = set()
    new    =  fname
    result = []
    while new not in seen:
        A          = read_file(new)
        seen.add(new)
        new, words = A[0], A[1:]
        result.extend(words)
    return result

def most_frequent_chars (fname):
    all_words = read_chain(fname)
    result    = []
    for i in range(max(map(len,all_words))):
        chars = [word[i] for word in all_words if i<len(word)]
        result.append(max(sorted(set(chars)), key = chars.count))
    return ''.join(result)

print(most_frequent_chars("test01/A.txt"))
# output: "hareennt"

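The code above defines three functions:

  1. read_file: a simple function that reads the contents of a file and returns a list of strings. The call x.split() handles any spaces or tabs that separate the words. The final call, list(filter(None, arr)), ensures that empty strings are removed from the result.

  2. read_chain: a simple routine that iterates over the chain of files and returns all the words they contain.

  3. most_frequent_chars: a simple routine that computes the most frequent character at each position.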
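
The chain traversal in read_chain can be tried without any files on disk. The following is a minimal sketch that assumes the chain is modelled as a dictionary; `fake_files` and `walk_chain` are hypothetical names used only for illustration and are not part of the answer's code. It also shows where list slicing comes in: the first token is the link to the next file, and the slice `tokens[1:]` is everything else.

```python
# Hypothetical in-memory stand-in for the files of test01:
# each key is a file name, each value is the token list read_file would return.
fake_files = {
    "A.txt": ["B.txt", "house", "garden", "kitchen", "balloon"],
    "B.txt": ["C.txt", "home", "park", "affair"],
    "C.txt": ["A.txt", "kite", "hello", "portrait", "angel", "surfing"],
}

def walk_chain(start, files):
    """Follow the chain from `start` until a file name repeats."""
    seen = set()
    order = []      # file names in visiting order
    words = []      # every word except the leading file names
    current = start
    while current not in seen:
        seen.add(current)
        order.append(current)
        tokens = files[current]
        # Slicing separates the link (first token) from the payload (the rest).
        current, rest = tokens[0], tokens[1:]
        words.extend(rest)
    return order, words

order, words = walk_chain("B.txt", fake_files)
print(order)       # ['B.txt', 'C.txt', 'A.txt']
print(len(words))  # 12
```

Because the chain is closed, the loop stops as soon as it is about to revisit the starting file, whichever file it starts from.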


P.S. This line of your code is very interesting:

maxOccurs += max(sorted(set(list_of_chars)), key = list_of_chars.count)

I edited my code to include it.
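
The quoted line satisfies the alphabetical tie-breaking rule because `max` returns the first element that attains the largest key, so sorting the candidates beforehand makes the alphabetically smallest character win a tie. A small sketch, using the second letters of the twelve words from the test01 example as input:

```python
# Second letters of the twelve example words
# (house, garden, kitchen, balloon, home, park, affair,
#  kite, hello, portrait, angel, surfing).
chars = list("oaiaoafieonu")

# 'o' and 'a' both occur three times.
candidates = sorted(set(chars))          # ['a', 'e', 'f', 'i', 'n', 'o', 'u']
winner = max(candidates, key=chars.count)
print(winner)  # a

# Without sorting, a tie is resolved by the order of the input instead.
print(max(chars, key=chars.count))  # o
```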


Space complexity optimization

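The space complexity of the previous code can be reduced considerably if the words are not all stored while the files are scanned. Memory use then grows with the length of the longest word and the number of distinct characters, not with the total number of words: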

def scan_file(fname, database):
    with open(fname, 'r') as f:
        next_file = None
        for x in f:
            for y in x.split():
                if next_file is None:
                    next_file = y
                else:
                    for i,c in enumerate(y):
                        while len(database) <= i:
                            database.append({})
                        if c in database[i]:
                            database[i][c] += 1
                        else:
                            database[i][c]  = 1
        return next_file

def most_frequent_chars (fname):
    database  =  []
    seen      =  set()
    new       =  fname
    while new not in seen:
        seen.add(new)
        new  =  scan_file(new, database)
    return ''.join(max(sorted(d.keys()),key=d.get) for d in database)
print(most_frequent_chars("test01/A.txt"))
# output: "hareennt"

Now we scan the files while tracking the character frequencies in database, without storing any intermediate list of words.
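
To make the structure of database concrete, here is a minimal sketch that builds it from an in-memory word list instead of files. `count_positions` is a hypothetical helper introduced for illustration; it uses `dict.get` in place of the explicit if/else, which is a stylistic substitution and not the answer's code.

```python
def count_positions(words):
    """Return a list whose entry i maps each character to its count at position i."""
    database = []
    for word in words:
        for i, c in enumerate(word):
            if i == len(database):      # first word long enough to reach position i
                database.append({})
            database[i][c] = database[i].get(c, 0) + 1
    return database

db = count_positions(["home", "park", "affair"])
print(db[0])    # {'h': 1, 'p': 1, 'a': 1}
print(len(db))  # 6
# Same selection rule as in the answer: sort the keys, then take the most frequent.
print(''.join(max(sorted(d), key=d.get) for d in db))  # aafair
```

Each word is discarded as soon as its characters have been counted, so only one small dictionary per position is kept.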

Regarding python - filename chains in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74500987/
