| License | None |
|---|---|
| Maintainer | Volker Strobel (volker.strobel87@gmail.com) |
| Stability | experimental |
| Portability | None |
| Safe Haskell | None |
| Language | Haskell2010 |
Hag
Description
This module is the main interface for Tweet classification.
- type FeatureMap = Map String Float
- parseCsv :: Text -> Either String (Vector Tweet)
- getFiles :: FilePath -> IO [FilePath]
- extractFeatures :: Tweet -> FeatureMap
- frequency :: Vector String -> FeatureMap
- countItem :: Map String Float -> String -> FeatureMap
- insertInMap :: Map Tweet FeatureMap -> Tweet -> Map Tweet FeatureMap
- getNeighbors :: (Vector Tweet, Vector Tweet) -> Vector (Tweet, PSQ Tweet Float)
- featureIntersection :: Map Tweet FeatureMap -> Tweet -> (Tweet, PSQ Tweet Float)
- mergeTweetFeatures :: (FeatureMap -> FeatureMap -> Float) -> Tweet -> Tweet -> FeatureMap -> Binding Tweet Float
- cosineDistance :: FeatureMap -> FeatureMap -> Float
- idftf :: FeatureMap -> FeatureMap -> FeatureMap
- iFrequency :: FeatureMap -> String -> Float -> Float
- compareLabels :: Int -> Vector (Tweet, PSQ Tweet Float) -> Vector Float
- compareLabelsForScheme :: [Vector (Tweet, PSQ Tweet Float)] -> Int -> [Float]
- getLabel :: Int -> PSQ Tweet Float -> String
- getAccuracy :: Vector Float -> Float
- main :: IO ()
Documentation
type FeatureMap = Map String Float Source
getFiles :: FilePath -> IO [FilePath] Source
Get directory contents of FilePath. A better variant is at:
extractFeatures :: Tweet -> FeatureMap Source
Extract features (for the bag of words) for one Tweet.
Thereby, the Tweet will be (in order of application):
* tokenized
* converted to a Vector
* Strings will be converted to lowercase
* Strings that are not isAlpha are removed
* Strings that are element of stopWords are removed
* Empty Strings will be removed
frequency :: Vector String -> FeatureMap Source
countItem :: Map String Float -> String -> FeatureMap Source
Insert an item into a Map. Default value is 1 if the item is
not existing. If the item is already existing, its frequency will
be increased by 1.
insertInMap :: Map Tweet FeatureMap -> Tweet -> Map Tweet FeatureMap Source
Take a Map, consisting of
key: Tweet
value: FeatureMap
and one Tweet and create a new Map with the added features
from the Tweet
featureIntersection :: Map Tweet FeatureMap -> Tweet -> (Tweet, PSQ Tweet Float) Source
mergeTweetFeatures :: (FeatureMap -> FeatureMap -> Float) -> Tweet -> Tweet -> FeatureMap -> Binding Tweet Float Source
cosineDistance :: FeatureMap -> FeatureMap -> Float Source
idftf :: FeatureMap -> FeatureMap -> FeatureMap Source
Takes a dictionary and a mini dictionary (frequency of words in one Tweet) and calculates the idftf values for all words in the mini dictionary.
iFrequency :: FeatureMap -> String -> Float -> Float Source
compareLabels :: Int -> Vector (Tweet, PSQ Tweet Float) -> Vector Float Source
Calculate the amount of tweets where the predicted label matches the actual label.
getAccuracy :: Vector Float -> Float Source
Get sum total of a vector of floats (i.e., the number of correctly classified tweets) and return the accuracy