Haskell scraping: http-conduit problem

Tags: haskell

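I've been trying to write my first "real" Haskell program, which is ultimately meant to scrape information about movies from pages of the form http://www.boxofficemojo.com/weekly/chart/?yr=2012&wk=52&p=.htm. My first step was to write a function that can query the weekly information for every week between two dates. The code I came up with doesn't work, and the error messages are a bit beyond my current Haskell ability.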

Code:

import Network.HTTP.Conduit
import Data.Time.Clock
import Data.Time.Calendar.WeekDate
import Data.Time.Calendar (Day, addDays, fromGregorian)
import Control.Monad.Trans.Resource (runResourceT)

import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S


curDate :: IO Day
curDate = fmap utctDay getCurrentTime

dayToWkYr :: Day -> (S.ByteString, S.ByteString)
dayToWkYr day = (C.pack (show year), C.pack (show week))
                where (year, week, _) = toWeekDate day

mkDateList :: Day -> Day -> [Day] -> [Day]
mkDateList start end lst
    | start == end = lst
    | otherwise    = mkDateList (addWk start) end (start:lst)
    where addWk = addDays 7

getMovies' :: Manager -> [Day] -> [Response L.ByteString] -> [Response L.ByteString]
getMovies' manager (d:ds) results = runResourceT $ do
    let (year, week) = dayToWkYr d
    initreq <- parseUrl "http://boxofficemojo.com/weekly/chart/"
    let request = initreq { queryString = "?yr=" `S.append` year `S.append`
                                            "&wk=" `S.append` week}
    response <- httpLbs request manager
    getMovies' manager ds (response:results)

getMovies' _ [] results = results

Errors:

scraper.hs:27:37:
    Couldn't match type `[]' with `IO'
    When using functional dependencies to combine
      Control.Monad.Trans.Control.MonadBaseControl [] [],
        arising from the dependency `m -> b'
        in the instance declaration in `Control.Monad.Trans.Control'
      Control.Monad.Trans.Control.MonadBaseControl IO [],
        arising from a use of `runResourceT' at scraper.hs:27:37-48
    In the expression: runResourceT
    In the expression:
      runResourceT
      $ do { let (year, week) = dayToWkYr d;
             initreq <- parseUrl "http://boxofficemojo.com/weekly/chart/";
             let request = ...;
             response <- httpLbs request manager;
             .... }

scraper.hs:33:5:
    Couldn't match type `[]'
                  with `Control.Monad.Trans.Resource.ResourceT []'
    Expected type: Control.Monad.Trans.Resource.ResourceT
                     [] (Response L.ByteString)
      Actual type: [Response L.ByteString]
    In the return type of a call of getMovies'
    In a stmt of a 'do' block:
      getMovies' manager ds (response : results)
    In the second argument of `($)', namely
      `do { let (year, week) = dayToWkYr d;
            initreq <- parseUrl "http://boxofficemojo.com/weekly/chart/";
            let request = ...;
            response <- httpLbs request manager;
            .... }'

If anyone can point out what I'm doing wrong, I'd really appreciate it!

Best answer

I'm by no means a Haskell expert, but here are the changes I made to get it to compile.

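The problem is in the function getMovies'. First, its return type should be IO [Response L.ByteString], because it performs HTTP requests. The second problem is how you handle the conduit resource monad: runResourceT returns whatever you compute inside its block, which in your case should be the return value of httpLbs request manager. Your recursive call to getMovies' sits inside that block, so the compiler expects it to have a ResourceT type, which it does not. You therefore need to move the recursive call to getMovies' out of the resource monad.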

getMovies' :: Manager -> [Day] -> [Response L.ByteString] -> IO [Response L.ByteString]
getMovies' manager (d:ds) results = do
  response <- runResourceT $ do  -- we get the response here instead
    let (year, week) = dayToWkYr d
    initreq <- parseUrl "http://boxofficemojo.com/weekly/chart/"
    let request = initreq { queryString = "?yr=" `S.append` year `S.append`
                                            "&wk=" `S.append` week}
    httpLbs request manager
  getMovies' manager ds (response:results)

getMovies' _ [] results = return results -- wrap results in the IO monad.
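
Once each step returns its result in IO, the explicit accumulator can also be replaced by mapM. The following is only a minimal sketch of that shape, not part of the fix above: fetchWeek, weeksBetween and fetchAll are hypothetical names, and fetchWeek is a stand-in that builds the query string instead of calling runResourceT and httpLbs, so the sketch runs without a network connection or http-conduit.

```haskell
import Data.Time.Calendar (Day, addDays, fromGregorian)
import Data.Time.Calendar.WeekDate (toWeekDate)

-- Hypothetical stand-in for the runResourceT/httpLbs step: it only builds
-- the query string for the ISO week that contains the given day.
fetchWeek :: Day -> IO String
fetchWeek day = return ("?yr=" ++ show year ++ "&wk=" ++ show week)
  where (year, week, _) = toWeekDate day

-- Weekly dates from start (inclusive) up to end (exclusive), oldest first.
-- Comparing with (<) also terminates when end is not a whole number of
-- weeks after start, which an equality check would step over.
weeksBetween :: Day -> Day -> [Day]
weeksBetween start end = takeWhile (< end) (iterate (addDays 7) start)

-- Run one IO action per day and collect the results in list order.
fetchAll :: [Day] -> IO [String]
fetchAll = mapM fetchWeek

main :: IO ()
main = do
  results <- fetchAll (weeksBetween (fromGregorian 2012 12 14)
                                    (fromGregorian 2012 12 31))
  mapM_ putStrLn results
  -- prints:
  -- ?yr=2012&wk=50
  -- ?yr=2012&wk=51
  -- ?yr=2012&wk=52
```

To adapt this to the real scraper, the body of fetchWeek would presumably become the runResourceT block from the corrected getMovies', with Manager passed as an extra argument and Response L.ByteString as the result type.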

A similar question about this Haskell http-conduit scraping problem can be found on Stack Overflow: https://stackoverflow.com/questions/18966772/
