Haskell text encoder

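I'm new to Haskell and would like some guidance on my problem. I want a text encoding function that returns the list of unique words in a text together with a list in which every word of the text is represented by its index in that unique-word list. For example, given: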

["The more I like, the more I love.","The more I love, the more I hate."]

the output could be

   (["The", "more", "I", "like,", "the", "love.", "love,", "hate."],
   [1, 2, 3, 4, 5, 2, 3, 6, 1, 2, 3, 7, 5, 2, 3, 8])

I have already written the part that removes duplicates:

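-- Remove duplicates, keeping the first occurrence of each element in order.
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
  where
    -- seen holds the unique elements found so far, oldest first.
    rdHelper seen [] = seen
    rdHelper seen (x:xs)
      | x `elem` seen = rdHelper seen xs
      | otherwise = rdHelper (seen ++ [x]) xs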

Best answer

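You can iterate over the list of words and accumulate both the unique words and the indexes. If a word is already in the accumulated word list, append its index to the accumulated index list. If it is not, append the word to the word list and append a new index (the length of the word list + 1).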

To be honest, the Haskell code is easier to understand than my description:

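import Data.List (findIndex)

-- Fold step: extend the unique-word list and append the word's 1-based index.
-- The accumulator is named seen to avoid shadowing Prelude's words.
build :: ([String], [Int]) -> String -> ([String], [Int])
build (seen, indexes) word =
  case findIndex (== word) seen of
    -- Known word: reuse its position in the unique-word list.
    Just index ->
      (seen, indexes ++ [index + 1])
    -- New word: it takes the next free index.
    Nothing ->
      (seen ++ [word], indexes ++ [length seen + 1])

buildIndexes :: ([String], [Int])
buildIndexes =
  let
    listOfWords = words "The more I like, the more I love. The more I love, the more I hate."
  in
    foldl build ([], []) listOfWords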

Here I used a single concatenated string as input:

"The more I like, the more I love. The more I love, the more I hate."

Feel free to tailor the code to your needs.
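
For instance, if the input is a list of sentences, as in the question, one way to adapt the code is to split every sentence into words first and then run the same fold. This is only a minimal sketch: the name encode is my own and not part of the original answer, and build is repeated (with the accumulator renamed to seen) so that the snippet compiles on its own.

```haskell
import Data.List (findIndex)

-- Same fold step as in the answer: record the 1-based index of each word.
build :: ([String], [Int]) -> String -> ([String], [Int])
build (seen, indexes) word =
  case findIndex (== word) seen of
    Just i  -> (seen, indexes ++ [i + 1])
    Nothing -> (seen ++ [word], indexes ++ [length seen + 1])

-- Hypothetical helper (an assumption, not from the original answer):
-- split each sentence on whitespace, then fold over all the words.
encode :: [String] -> ([String], [Int])
encode = foldl build ([], []) . concatMap words
```

With the two sentences from the question, encode should return (["The","more","I","like,","the","love.","love,","hate."],[1,2,3,4,5,2,3,6,1,2,3,7,5,2,3,8]). Punctuation stays attached to the words because words only splits on whitespace.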

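By the way, it is probably more efficient to insert elements at the front of the lists and reverse the resulting lists at the end, because appending with ++ copies the whole left-hand list on every step, while prepending with : takes constant time.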

import Data.List (findIndex)

-- Fold step that prepends, so both lists are built in reverse order.
build :: ([String], [Int]) -> String -> ([String], [Int])
build (seen, indexes) word =
  case findIndex (== word) seen of
    -- seen is newest-first, so a word found at position p was the
    -- (length seen - p)-th unique word; using p + 1 here would be wrong.
    Just position ->
      (seen, (length seen - position) : indexes)
    Nothing ->
      (word : seen, (length seen + 1) : indexes)

buildIndexes :: ([String], [Int])
buildIndexes =
  let
    listOfWords = words "The more I like, the more I love. The more I love, the more I hate."
    (listOfUniqueWords, listOfIndexes) = foldl build ([], []) listOfWords
  in
    (reverse listOfUniqueWords, reverse listOfIndexes)
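
Both versions still search the unique-word list linearly for every word. If that becomes a bottleneck, one option is to keep a map from word to index next to the lists. The following is a sketch under assumptions rather than part of the original answer: it relies on Data.Map.Strict from the containers package, and the names Acc, step and encodeWords are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Accumulator: word-to-index table, unique words (reversed), indexes (reversed).
type Acc = (Map.Map String Int, [String], [Int])

-- Fold step: look the word up in the table instead of scanning a list.
step :: Acc -> String -> Acc
step (table, uniq, idxs) word =
  case Map.lookup word table of
    Just i  -> (table, uniq, i : idxs)
    Nothing ->
      -- A new word gets the next free 1-based index.
      let i = Map.size table + 1
      in (Map.insert word i table, word : uniq, i : idxs)

-- Hypothetical entry point: encode a list of words.
encodeWords :: [String] -> ([String], [Int])
encodeWords ws =
  let (_, uniq, idxs) = foldl' step (Map.empty, [], []) ws
  in (reverse uniq, reverse idxs)
```

Calling encodeWords (words "The more I like, the more I love. The more I love, the more I hate.") should give the same pair as buildIndexes, with each lookup costing O(log n) in the number of unique words instead of O(n).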

This question and answer about a Haskell text encoder come from a similar question on Stack Overflow: https://stackoverflow.com/questions/45452696/
