feed-collect-0.2.0.2: Watch RSS/Atom feeds (and do with them whatever you like).

Safe HaskellNone
LanguageHaskell2010

Web.Feed.Collect

Contents

Description

Intro

This module allows you to run a loop which visits web feeds (RSS, Atom, etc.) periodically and reports new items using a function you provide. You can also insert control commands into the loop, such as adding and removing a feed, so that stopping and restarting it isn't required.

Both http and https URLs are supported.

The original use case which motivated the creation of this library is FunBot.

Running

The run function runs the loop, and it takes a collector callback function (as one of its parameters) to be called when new feed items are found.

Here is a simple usage example.

import Data.Default.Class (def)
import Data.Time.Interval
import Data.Time.Units
import Web.Feed.Collect

collect :: Label -> URL -> Feed -> Item -> IO ()
collect label url feed item = do
    putStrLn $ label ++ " : " ++ url
    putStrLn "Got a new feed item!"
    putStrLn $ showFeed feed
    putStrLn $ showItem item

logError :: Label -> Error -> IO ()
logError l e = putStrLn $ l ++ " : " ++ show e

feeds :: [(Label, Url)]
feeds = [("democ-now", "http://www.democracynow.org/democracynow.rss")]

main :: IO ()
main = run def
    { wcCollect       = collect
    , wcLogError      = logError
    , wcVisitInterval = time (1 :: Minute)
    , wcMaxItems      = 3
    , wcFeeds         = fromPairs feeds
    }

For quick testing, you can use one of the predefined collector functions. For example, there is collectorNull which discards the feed items and does nothing (you can use it e.g. when testing success of HTTP requests or debugging the library itself), and collectorPretty which writes a short nicely formatted entry to stdout for each new feed item. And there are more.

If new items aren't being detected correctly, you can enable debugging through the wcDebug field. You can get detailed logs of the detection process. You can tail -f a log file from your terminal to watch new log entries get appended to it.

Using Control Commands

Now let's see how to push control commands into the loop. The wcCommandQueue field is an optional command queue for making changes while the watcher runs. In the example above we didn't provide one. Now let's provide a queue and use it.

Suppose we are writing a program with a command-line interface. It watches news feeds in the background, and can take commands from the user at the same time on the command-line. Feeds can be added, removed, etc. without restarting the program.

Assume we've written a function named parseCommand, which takes a line of user input and returns a command ready to push into the queue. The parsed commands are of type Command and are created using functions like addFeed, removeFeed and so on, which this module provides. Using our parseCommand we can write the program like this:

import Control.Concurrent (forkIO)
import Data.Default.Class (def)
import Data.Time.Interval
import Data.Time.Units
import System.IO
import Web.Feed.Collect

collect :: Label -> URL -> Feed -> Item -> IO ()
collect = collectPretty

feed :: FeedConfig
feed = mkFeed "fsf-news" "https://www.fsf.org/static/fsforg/rss/news.xml"

parseCommand :: String -> Maybe Command
parseCommand line = {- ... - }

main :: IO ()
main = do
    cqueue <- newCommandQueue
    forkIO $ run def
        { wcCollect      = collect
        , wcCommandQueue = Just cqueue
        , wcFeeds        = [feed]
        }
    let loop = do
        line <- getLine
        if line == "quit"
            then putStrLn "Bye!"
            else do
                case parseCommand line of
                    Just cmd -> sendCommand cqueue cmd
                    Nothing  -> putStrLn "Invalid input"
                loop
    loop

Synopsis

Misc Types

type Label = String Source

A short name tag for a feed, for quick reference and internal use.

type Url = String Source

An HTTP or HTTPS URL of a feed or a feed item.

data Error Source

An error occuring while reading from a news feed.

Constructors

HttpError HttpException

Error while creating an HTTP request or receiving a response.

FeedParsingFailed Url

Error while parsing the HTTP response body into feed content

Instances

Specifying Feeds

data FeedConfig Source

Details of a news feed to be watched. Some of these are just initial settings which you can change while the watcher runs, using the command queue.

You can use def and override fields a needed. In that case you should specify at least the label and URL! Alternatively, use mkFeed or fromPairs.

fcLabel :: FeedConfig -> Label Source

A short name tag by which you can refer to the feed.

fcUrl :: FeedConfig -> Url Source

The HTTP or HTTPS URL of the feed.

fcActive :: FeedConfig -> Bool Source

Whether the feed should initially be active. Disabled feeds aren't visited (except once to remember the old feed items). You can enable and disable feeds later, while the watcher runs, using the command queue.

fcDebug :: FeedConfig -> Bool Source

Whether per-feed debug logs should be written for this feed. These are detailed messages listing the results of each step of the new item detection process. You can later toggle this setting per feed using the command queue.

mkFeed :: Label -> Url -> FeedConfig Source

Initialize feed details from a feed label and its URL.

fromPairs :: [(Label, Url)] -> [FeedConfig] Source

Create a list of feed configs from a list of label-url pairs.

Watcher Configuration

data WatcherConfig m Source

Feed watcher behavior description. To create one, use def and override fields as needed.

Instances

wcCollect :: WatcherConfig m -> Label -> Url -> Feed -> Item -> m () Source

Collector, i.e. action to perform when receiving a feed item.

wcCollectMany :: WatcherConfig m -> Maybe (Label -> Url -> Feed -> [Item] -> m ()) Source

Action to perform when receiving feed items. This is just a chance to provide an efficient shortcut instead of repeated use of wcCollect. If there is no such shortcut, simply pass Nothing.

wcLogError :: WatcherConfig m -> Label -> Error -> m () Source

Error logging action. It will be called when an error occurs while trying to download and parse a feed. Such an error doesn't cause anything to stop. Execution simply goes on to read the next feed, and will try the erronous feed again in the next round.

wcCommandQueue :: WatcherConfig m -> Maybe CommandQueue Source

A command queue you can use to change settings while the program runs. For example, you can add a new feed or change the interval between polls without relaunching run.

wcVisitInterval :: WatcherConfig m -> TimeInterval Source

Time interval between visits of a watched feed.

wcMaxItems :: WatcherConfig m -> Int Source

Maximal number of items to collect per visit (if more are available, they will be collected in the next visit).

wcFeeds :: WatcherConfig m -> [FeedConfig] Source

List of feeds and their configuration details.

wcDebug :: WatcherConfig m -> DebugConfig Source

Debugging options. Useful when new items aren't being detected as expected.

Collectors

collectorNull :: Label -> Url -> Feed -> Item -> IO () Source

An item collector which discards the item, i.e. does nothing.

collectorPrint :: Label -> Url -> Feed -> Item -> IO () Source

An item collector which prints feed and item fields to stdout.

collectorPretty :: Label -> Url -> Feed -> Item -> IO () Source

An item collector which prints short friendly feed and item descriptions to stdout.

collectorLog :: (Label -> FilePath) -> Label -> Url -> Feed -> Item -> IO () Source

An item collector which writes friendly descriptions into a log file, determining the log file's name using the function given as the first argument (the feed label, i.e. second argument, is also used as the argument for that function).

Running

run :: MonadIO m => WatcherConfig m -> m () Source

Watch feeds and perform the given action on received feed items.

Control Commands

data CommandQueue Source

A queue of control commands for the run loop to execute while running.

Commands can be pushed concurrently from different threads, but there should be only one run invocation reading from the queue (i.e. although this is unlikely anyway, don't pass the same queue to multiple concurrent run calls).

newCommandQueue :: IO CommandQueue Source

Create a new empty command queue.

sendCommand :: CommandQueue -> Command -> IO () Source

Send a command into the queue.

sendCommands :: CommandQueue -> [Command] -> IO () Source

Send a series of commands into the queue.

data Command Source

A control command sent to the feed watching loop, and affecting its behavior.

addFeed :: FeedConfig -> Command Source

Add a new feed to watch.

removeFeed :: Label -> Command Source

Remove a previously added feed.

setFeedActive :: Label -> Bool -> Command Source

Set whether a given feed should be watched (active) or not (inactive).

setInterval :: TimeUnit t => t -> Command Source

Set the interval, in microseconds, between feed scans.

setMaxPerVisit :: Int -> Command Source

Set the maximal number of feed items to be collected per feed per scan. If more new items are found, they wait for the next scan.

Utilities and Debugging

showFeed :: Feed -> String Source

A short one-line description of a feed.

showItem :: Item -> String Source

A short one-line description of a feed item.

data DebugConfig Source

Debugging options. Useful when new items aren't being detected as expected. The Default instance has all debugging off. These are just the initial settings: you can change some of them while the feed watcher runs, using the command queue.

Since debugging is expected to be used only during development, changes to the fields of this type will cause only a micro version change in the package version. If you think some options here are useful for more than debugging, please contact the author.

Constructors

DebugConfig 

Fields

dcDebugCycle :: Bool

Whether to log a summary of the watcher cycle. The log lists for each iteration which feeds are active and which aren't, whether items were detected for the active feeds, and whether any command came through the command queue.

dcGeneralLog :: FilePath

File into which to write debug messages which aren't specific to a single feed. The cycle debug log (see dcDebugCycle) is written into this file.

dcFeedLog :: Label -> FilePath

A function used to determine the detection debug log file for a given feed.

setFeedDebug :: Label -> Bool -> Command Source

Enable or disable debug logs for a given feed.

setGeneralDebug :: Bool -> Command Source

Enable or disable general (i.e. not feed-specific) debug logs.