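I can't use any imported libraries. I have an assignment with some directories that contain some files. The first line of each file contains, besides some words, the name of the next file to open. Once every word of every file in the directory has been read, the words must be processed so that a single string is returned. Its first character is the most frequent first letter among all the words seen, its second character is the most frequent second letter, and so on. I managed to do this for a directory with 3 files, but my code doesn't use any kind of chaining mechanism; it just passes local variables around. Some of my university colleagues suggested using list slicing, but I don't know how. This is what I have: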
'''
The objective of the homework assignment is to design and implement a function
that reads some strings contained in a series of files and generates a new
string from all the strings read.
The strings to be read are contained in several files, linked together to
form a closed chain. The first string in each file is the name of another
file that belongs to the chain: starting from any file and following the
chain, you always return to the starting file.
Example: the first line of file "A.txt" is "B.txt," the first line of file
"B.txt" is "C.txt," and the first line of "C.txt" is "A.txt," forming the
chain "A.txt"-"B.txt"-"C.txt".
In addition to the string with the name of the next file, each file also
contains other strings separated by spaces, tabs, or carriage return
characters. The function must read all the strings in the files in the chain
and construct the string obtained by concatenating the characters with the
highest frequency in each position. That is, in the string to be constructed,
at position p, there will be the character with the highest frequency at
position p of each string read from the files. In the case where there are
multiple characters with the same frequency, consider the alphabetical order.
The generated string has a length equal to the maximum length of the strings
read from the files.
Therefore, you must write a function that takes as input a string "filename"
representing the name of a file and returns a string.
The function must construct the string according to the directions outlined
above and return the constructed string.
Example: if the contents of the three files A.txt, B.txt, and C.txt in the
directory test01 are as follows
test01/A.txt test01/B.txt test01/C.txt
-------------------------------------------------------------------------------
test01/B.txt test01/C.txt test01/A.txt
house home kite
garden park hello
kitchen affair portrait
balloon angel
surfing
the function most_frequent_chars ("test01/A.txt") will return "hareennt".
'''
def file_names_list(filename):
    intermezzo = []
    lista_file = []
    a_file = open(filename)
    lines = a_file.readlines()
    for line in lines:
        intermezzo.extend(line.split())
        del intermezzo[1:]
        lista_file.append(intermezzo[0])
        intermezzo.pop(0)
    return lista_file
def words_list(filename):
    lista_file = []
    a_file = open(filename)
    lines = a_file.readlines()[1:]
    for line in lines:
        lista_file.extend(line.split())
    return lista_file
def stuff_list(filename):
    file_list = file_names_list(filename)
    the_rest = words_list(filename)
    second_file_name = file_names_list(file_list[0])
    # `+` concatenates the two lists; `and` would return only the second one
    the_lists = words_list(file_list[0]) + words_list(second_file_name[0])
    the_rest += the_lists[0:]
    return the_rest
def most_frequent_chars(filename):
    huge_words_list = stuff_list(filename)
    maxOccurs = ""
    list_of_chars = []
    for i in range(len(max(huge_words_list, key=len))):
        for item in huge_words_list:
            try:
                list_of_chars.append(item[i])
            except IndexError:
                pass
        maxOccurs += max(sorted(set(list_of_chars)), key = list_of_chars.count)
        list_of_chars.clear()
    return maxOccurs
print(most_frequent_chars("test01/A.txt"))
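Slicing, as the colleagues suggested, is what lets the hard-coded calls in stuff_list become a loop. For each file, tokens[0] is the name of the next file and tokens[1:] is a new list with the remaining words.

Below is a minimal sketch. It relies on the assignment's guarantee that the chain is closed, so it keeps reading files until it arrives back at the starting name. The names read_tokens and chain_words, and the read parameter (which lets a fake reader be passed in for testing), are hypothetical.

```python
def read_tokens(filename):
    """Return every whitespace-separated string in the file, in order."""
    with open(filename) as f:
        return f.read().split()


def chain_words(start, read=read_tokens):
    """Collect the words of every file in the closed chain beginning at `start`.

    Assumes, as the assignment states, that following the chain always leads
    back to `start`, and that file names are written exactly as `start` is.
    """
    words = []
    current = start
    while True:
        tokens = read(current)
        # Slicing: index 0 is the next file's name, [1:] is everything after it.
        current, rest = tokens[0], tokens[1:]
        words.extend(rest)
        if current == start:  # back at the first file: every file was read once
            return words


# Usage on the real files (run from the directory that contains test01/):
# print(chain_words("test01/A.txt"))
```

Note that tokens[1:] never raises IndexError (a one-element list just gives []), whereas tokens[0] does on an empty file. If the names stored in the files could be spelled differently from the starting name (for example "./test01/A.txt"), this loop would never stop. A set of already-visited names, as used in the answer, is the more defensive choice.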
Best answer
This assignment is relatively easy if the code is well structured. Here is a complete implementation:
def read_file(fname):
    with open(fname, 'r') as f:
        return list(filter(None, [y.rstrip(' \n').lstrip(' ') for x in f for y in x.split()]))
def read_chain(fname):
    seen = set()
    new = fname
    result = []
    while new not in seen:
        A = read_file(new)
        seen.add(new)
        new, words = A[0], A[1:]
        result.extend(words)
    return result
def most_frequent_chars(fname):
    all_words = read_chain(fname)
    result = []
    for i in range(max(map(len, all_words))):
        chars = [word[i] for word in all_words if i < len(word)]
        result.append(max(sorted(set(chars)), key=chars.count))
    return ''.join(result)
print(most_frequent_chars("test01/A.txt"))
# output: "hareennt"
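The code above defines 3 functions:
- read_file: a simple function that reads a file and returns its contents as a list of strings. The call x.split() takes care of any spaces, tabs, or newlines used to separate the words. The final list(filter(None, arr)) makes sure empty strings are removed from the result. Since split() without arguments never returns empty strings, this is only a safeguard.
- read_chain: a simple routine that iterates over the chain of files and returns all the words they contain. The seen set stops the loop as soon as the next file is one that has already been read.
- most_frequent_chars: a simple routine that, for each position i, collects the i-th character of every word long enough to have one. It then picks the most frequent character, breaking ties alphabetically.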
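Here is a quick check of the claim about x.split(). Called without arguments, it splits on any run of whitespace (spaces, tabs, \r, \n) and ignores leading and trailing whitespace, so it never yields empty strings. That is why the filter(None, ...) and the rstrip/lstrip calls in read_file are harmless but redundant. Passing an explicit separator behaves differently.

```python
line = "  house\tgarden  \r\n"

print(line.split())     # ['house', 'garden']
print(line.split(" "))  # ['', '', 'house\tgarden', '', '\r\n']

# On the result of split(), the extra cleanup done in read_file changes nothing:
tokens = line.split()
cleaned = list(filter(None, [t.rstrip(" \n").lstrip(" ") for t in tokens]))
print(cleaned == tokens)  # True
```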
PS. This line of your code is very interesting:
maxOccurs += max(sorted(set(list_of_chars)), key = list_of_chars.count)
I edited my code to include it.
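The line handles ties correctly for two reasons. sorted() puts the candidate characters in alphabetical (code-point) order, and max() returns the first maximal item it encounters. So among equally frequent characters, the alphabetically smallest one wins.

The sketch below uses position 1 of the test01 words, where 'o' and 'a' both occur three times. The pick and pick_explicit names are hypothetical. The explicit min() version, with a (-count, char) key, is an equivalent alternative shown only for comparison.

```python
def pick(chars):
    # Candidates in alphabetical order; max() keeps the first one with the top count.
    return max(sorted(set(chars)), key=chars.count)


def pick_explicit(chars):
    # Same rule spelled out: highest count first, then smallest character.
    return min(set(chars), key=lambda c: (-chars.count(c), c))


# Second letters of house, garden, kitchen, balloon, home, park,
# affair, kite, hello, portrait, angel, surfing:
second = list("oaiaoafieonu")
print(pick(second), pick_explicit(second))  # a a
```

Because the ordering is by code point, uppercase letters sort before lowercase ones ('B' < 'a'). With lowercase-only input, this is plain alphabetical order.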
Space-complexity optimization
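The memory usage of the previous code can be reduced substantially by not storing all the words while scanning the files. Instead of one list holding every word, only one character counter per position is kept. Memory then grows with the length of the longest word (times the number of distinct characters) rather than with the total size of the input.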
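To make the difference concrete, here is a small illustrative measurement under stated assumptions. The first version keeps every character of every word. The scan_file version keeps one counter per distinct (position, character) pair. The helper names are hypothetical, and the word list is assumed to be the twelve words of the test01 example.

```python
def stored_chars(words):
    # First version: all words are kept, so memory grows with total input length.
    return sum(len(w) for w in words)


def counter_entries(words):
    # scan_file version: one counter per distinct (position, character) pair,
    # at most (longest word length) x (number of distinct characters).
    return len({(i, c) for w in words for i, c in enumerate(w)})


words = ["house", "garden", "kitchen", "balloon", "home", "park",
         "affair", "kite", "hello", "portrait", "angel", "surfing"]

print(stored_chars(words), counter_entries(words))                # 68 45
print(stored_chars(words * 1000), counter_entries(words * 1000))  # 68000 45
```

Repeating the input a thousand times multiplies the first figure by 1000, while the second does not change. The optimized version that follows relies on exactly this.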
def scan_file(fname, database):
    with open(fname, 'r') as f:
        next_file = None
        for x in f:
            for y in x.split():
                if next_file is None:
                    next_file = y
                else:
                    for i, c in enumerate(y):
                        while len(database) <= i:
                            database.append({})
                        if c in database[i]:
                            database[i][c] += 1
                        else:
                            database[i][c] = 1
    return next_file
def most_frequent_chars(fname):
    database = []
    seen = set()
    new = fname
    while new not in seen:
        seen.add(new)
        new = scan_file(new, database)
    return ''.join(max(sorted(d.keys()), key=d.get) for d in database)
print(most_frequent_chars("test01/A.txt"))
# output: "hareennt"
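Now the files are scanned while the character frequencies are tracked in database, without storing any intermediate list of words.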
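The counting step of scan_file can also be written with dict.get, which folds the if/else into one line. The sketch below is decoupled from file reading so it can be checked on an in-memory word list. count_positions and best_string are hypothetical names. The optional database argument mimics how scan_file keeps adding to the same counters across files.

```python
def count_positions(words, database=None):
    """Per-position character counters, updated word by word as in scan_file."""
    if database is None:
        database = []
    for word in words:
        for i, c in enumerate(word):
            # enumerate advances one position at a time, so at most one new
            # counter dictionary is needed when position i is seen for the first time
            if len(database) == i:
                database.append({})
            database[i][c] = database[i].get(c, 0) + 1
    return database


def best_string(database):
    # Alphabetical tie-break: max() keeps the first maximal key of the sorted list.
    return "".join(max(sorted(d), key=d.get) for d in database)


words = ["house", "garden", "kitchen", "balloon", "home", "park",
         "affair", "kite", "hello", "portrait", "angel", "surfing"]
print(best_string(count_positions(words)))  # hareennt
```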
For more on python - chain of file names in python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/74500987/