python - 尝试在固定长度的多行上打印单个字符串并最小化成本

标签 python algorithm

首先是一些背景知识我刚开始学习算法(我现在觉得我缺乏擅长的逻辑和推理能力)我一直在尝试将“This is a sample text”打印到不同的行中,每行最多 7 个字符行所以第一行将有:

this is  (no spaces left in the end so cost 0)
a  
[cost=6*6*6(The spaces left at the end of each line are cubed which will be the cost) ]
sample [cost=1*1*1]
text [cost= 3*3*3]

(Total cost = 0+216+1+27=244)

现在这可以优化

this [cost 3*3*3]
is a [cost 3*3*3]
sample [cost 1*1*1]
text [cost 3*3*3]

[Total cost = 27+27+1+27 = 82]

很明显我们不能在这里使用贪婪的方法而是使用动态规划,但我的问题是我无法弄清楚将被重用的子结构。我真的很难弄清楚如何将成本条件与 python 中的打印联系起来,我可以为每个单词编制索引,我可以获得每个单词的长度,有点卡在我接下来要做的事情上打印时发生的一切都是整个每个字符串打印在一行上(这是我到目前为止所得到的)。 如果这是一个非常愚蠢的问题,我深表歉意,但我被困住了,真的需要一些帮助。 谢谢


这就是我尝试实现代码的方式,虽然我尝试对代码运行一些测试,测试是由我的 friend 编写的,但我认为我没有做对任何帮助或建议表示赞赏 打印测试.py

 import os
 import sys
 from glob import glob

  #TODO -- replace this with your solution 
 from printing import print_neatly

 log = open('output.log', 'w')

 #This tests the code against my own text
 maxline = 80
 for source in glob('*.txt'):
 with open(source) as f:
    fulltext = f.read()

 words = fulltext.split()
 (cost, text) = print_neatly(words, maxline)

 #double check the cost
 #lines = text.split('\n')
 truecost = 0
 for line in text[0:-1]:
    truecost += (maxline - len(line))**3


   #print the output and cost
   print >>log, '----------------------'
   print >>log, source
   print >>log, '----------------------'
   print >>log, text
   print >>log, '----------------------'
   print >>log, 'cost = ', cost
   print >>log, 'true cost = ', truecost
   print >>log, '----------------------'


log.close()

#print the log
with open('output.log') as f: print f.read()

打印.py

def print_neatly(wordlist, max):
   #strings='This is a sample string'

   #splitting the string and taking out words from it 
   #wordlist=strings.split()
   (cost, dyn_print) = print_line(wordlist, len(wordlist), max)
   for dyn in dyn_print:
      print dyn
   return cost, dyn_print

 def cost(lines, max):

    return sum([(max-len(x)) ** 3 for x in lines])

 def print_line(wordlist, count, max, results = {}):
  results = [([],0)]
  for count in range(1, len(wordlist) + 1):
    best = wordlist[:count]               
    best_cost = cost(best, max)
    mycount = count - 1
    line = wordlist[mycount]       
    while len(line) <= max: 
        attempt, attempt_cost = results[mycount]
        attempt = attempt + [line]
        attempt_cost += cost([line],max)
        if attempt_cost < best_cost:
            best = attempt
            best_cost = attempt_cost
        if mycount > 0:
            mycount -= 1
            line = wordlist[mycount] + ' ' + line
        else:
            break
    results += [(best, best_cost)]

 #print best
 #print best_cost
 return (best_cost, best)


#print_neatly(0,7)

需要测试的文本文件给了我这个输出,这里的两个成本需要相同,但我没有得到,有人能指出我哪里出错了


成本 = 16036

真实成本 = 15911

最佳答案

一种方法是列出所有可能的备选方案并选择成本最低的一个:

from functools import wraps

def cache(origfunc):
    d = {}
    @wraps(origfunc)
    def wrapper(*args):
        if args in d:
            return d[args]
        result = origfunc(*args)
        d[args] = result
        return result
    return wrapper

@cache
def alternatives(t, m=7):
    ''' Given a tuple of word lengths and a maximum line length,
        return a list of all possible line groupings
        showing the total length of each line.

        >>> alternatives((4, 2, 1, 3), 7)
        [[4, 2, 1, 3], [4, 2, 5], [4, 4, 3], [7, 1, 3], [7, 5]]

    '''
    if not t:
        return []
    alts = []
    s = 0
    for i, x in enumerate(t):
        s += x
        if s > m:
            break
        tail = t[i+1:]
        if not tail:
            alts.append([s])
            break
        for subalt in alternatives(tail, m):
            alts.append([s] + subalt)
        s += 1
    return alts

def cost(t, m=7):
    ''' Evaluate the cost of lines given to line lengths

            >>> cost((7, 1, 6, 4), m=7)  # 'this is', 'a', 'sample', 'text'
            244
            >>> cost((4, 4, 6, 4))       # 'this', 'is a', 'sample', 'text'
            82

    '''
    return sum((m - x) ** 3 for x in t)

def textwrap(s, m=7):
    ''' Given a string, result a list of strings with optimal line wrapping

        >>> print textwrap('This is a sample text', 7)
        ['This', 'is a', 'sample', 'text']

    '''
    words = s.split()
    t = tuple(map(len, words))
    lengths = min(alternatives(t, m), key=cost)
    result = []
    worditer = iter(words)
    for length in lengths:
        line = []
        s = 0
        while s < length:
            word = next(worditer)
            line.append(word)
            s += len(word) + 1
        result.append(' '.join(line))
    return result


if __name__ == '__main__':
    import doctest
    print doctest.testmod()

可以通过限制备选搜索的数量(可能限制为每行中三个最长的备选)来加快代码速度。

关于python - 尝试在固定长度的多行上打印单个字符串并最小化成本,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/7860461/

相关文章:

algorithm - 使用 Bhattacharyya 距离进行特征选择

java - 我在 map 上有一条路。如何将其转换为函数?

javascript - Python - 如何将多重赋值更改为 JavaScript 语法并获得给定字符串的正确排列

python - 如何使用 t-SNE 进行降维以可视化我的 300 维词嵌入?

Python 日志记录 : INFO, DEBUG 日志未显示

python - 全日历和 django

c++ - 无法访问构造函数定义之外的变量

python - 在 python 中调试 argparse

algorithm - 存储大量唯一字符串的最快方法是什么?

python - 使用 DFS 搜索图形时出现 KeyError