Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
Citeseer document classification dataset, from:
Qing Lu and Lise Getoor. "Link-based classification." ICML, 2003.
https://linqs.soe.ucsc.edu/data
The dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.
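To make the 0/1 word-vector description concrete, here is a minimal sketch of parsing one publication row into a sparse form that keeps only the indices of the words that are present. The `Doc` type, `splitTabs` and `parseRow` are hypothetical names and not part of this module, and the row layout (identifier, then tab-separated 0/1 flags, then the class label) is an assumption about the raw data files.

```haskell
-- | Hypothetical stand-in for one publication (not the library's own type).
data Doc = Doc
  { docId    :: String  -- publication identifier
  , docWords :: [Int]   -- indices of dictionary words whose flag is 1
  , docClass :: String  -- class label, kept as raw text
  } deriving (Eq, Show)

-- | Split a string on tab characters.
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- | Parse one row, ASSUMED to be: id, then 0/1 flags, then the class label.
-- Returns Nothing if the label is missing or a flag is neither "0" nor "1".
parseRow :: String -> Maybe Doc
parseRow line = case splitTabs line of
  (ident : rest@(_ : _)) -> do
    flags <- traverse toFlag (init rest)
    let present = [j | (j, True) <- zip [0 ..] flags]
    pure (Doc ident present (last rest))
  _ -> Nothing
  where
    toFlag "0" = Just False
    toFlag "1" = Just True
    toFlag _   = Nothing
```

For a made-up row such as `"p1\t0\t1\t1\tAI"`, `parseRow` yields a `Doc` whose `docWords` is `[1,2]`. Storing only the present indices keeps each record small, since most of the 3703 dictionary entries are absent from any single publication.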
Synopsis
- stash :: FilePath -> IO ()
- sourceCiteseerGraphEdges :: (MonadResource m, MonadThrow m) => FilePath -> Map String (Int16, Seq Int16, CiteSeerDoc) -> ConduitT i (Maybe (Graph (ContentRow Int16 CiteSeerDoc))) m ()
- loadCiteseerGraph :: FilePath -> IO (Graph (ContentRow Int16 CiteSeerDoc))
- data CiteSeerDoc
1. Download the dataset
2. Reconstruct the citation graph
sourceCiteseerGraphEdges
:: (MonadResource m, MonadThrow m) | |
=> FilePath | directory of data files |
-> Map String (Int16, Seq Int16, CiteSeerDoc) | |
-> ConduitT i (Maybe (Graph (ContentRow Int16 CiteSeerDoc))) m () | |
See sourceGraphEdges
loadCiteseerGraph
:: FilePath | directory where the data files were saved |
-> IO (Graph (ContentRow Int16 CiteSeerDoc)) | |
See loadGraph
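The two steps above (download, then reconstruct) share the same directory argument, so they can be sequenced with a small helper. This is a sketch only: `fetchAndLoad` is a hypothetical name, and the two actions are passed in as parameters so the block does not assume any particular module path for this package. It also assumes that `stash` is the download step listed under heading 1.

```haskell
-- | Run a download step, then a load step, against the same directory.
-- Both actions are parameters, so this sketch depends on no package.
fetchAndLoad
  :: (FilePath -> IO ())  -- download action (assumed: 'stash')
  -> (FilePath -> IO g)   -- load action (assumed: 'loadCiteseerGraph')
  -> FilePath             -- directory for the data files
  -> IO g
fetchAndLoad download load dir = download dir >> load dir
```

With this module's exports in scope, a call might look like `fetchAndLoad stash loadCiteseerGraph "data/citeseer"`, where the directory name is illustrative.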
Types
data CiteSeerDoc
Document classes of the Citeseer dataset.