haggressive-0.1.0.0: Aggression analysis for Tweets on Twitter

License: None
Maintainer: Volker Strobel (volker.strobel87@gmail.com)
Stability: experimental
Portability: None
Safe Haskell: None
Language: Haskell2010

Hag

Description

This module is the main interface for Tweet classification.

Documentation

type FeatureMap = Map String Float

Features are represented by a Map whose keys are Strings (e.g., the words in the message of a Tweet) and whose values are Floats (e.g., the number of occurrences of a word).

parseCsv :: Text -> Either String (Vector Tweet)

IO and Parsing

parseCsv parses Text input in CSV format and returns either an error message (Left) or a Vector of Tweets (Right).

getFiles :: FilePath -> IO [FilePath]

Get directory contents of FilePath. A better variant is at:

extractFeatures :: Tweet -> FeatureMap

Extract features (for the bag of words) for one Tweet. The following steps are applied to the Tweet, in order:

* the message is tokenized
* the tokens are converted to a Vector
* Strings are converted to lowercase
* Strings that are not isAlpha are removed
* Strings that are elements of stopWords are removed
* empty Strings are removed
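The cleaning steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: it works on a plain String and a list instead of a Tweet and a Vector, tokenizes on whitespace, and uses a small hypothetical stop-word set (stopWordsSketch) because the module's real stopWords is not shown here. The name cleanTokens is likewise hypothetical.

```haskell
import Data.Char (isAlpha, toLower)
import qualified Data.Set as Set

-- Hypothetical stop-word set; the module's real stopWords is not shown here.
stopWordsSketch :: Set.Set String
stopWordsSketch = Set.fromList ["a", "an", "the", "is", "and"]

-- Apply the steps in the documented order (read the composition bottom-up):
-- tokenize, lowercase, keep purely alphabetic tokens, drop stop words,
-- drop empty strings.
cleanTokens :: String -> [String]
cleanTokens =
    filter (not . null)
  . filter (`Set.notMember` stopWordsSketch)
  . filter (all isAlpha)
  . map (map toLower)
  . words
```

The resulting tokens would then still need to be counted (as frequency does) to obtain a FeatureMap.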

frequency :: Vector String -> FeatureMap

Calculate the frequency of items in a Vector and return them in a Map.

countItem :: Map String Float -> String -> FeatureMap

Insert an item into a Map. If the item does not exist yet, it is inserted with a frequency of 1. If it already exists, its frequency is increased by 1.
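The counting behaviour described for frequency and countItem could look roughly like the sketch below. It is not the module's actual code: the names countItemSketch and frequencySketch are hypothetical, and a list stands in for Vector so that only the containers package is needed.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map

type FeatureMap = Map.Map String Float

-- Insert the word with a count of 1, or add 1 to its existing count.
countItemSketch :: FeatureMap -> String -> FeatureMap
countItemSketch m w = Map.insertWith (+) w 1 m

-- Fold the counting step over all tokens, starting from an empty map.
frequencySketch :: [String] -> FeatureMap
frequencySketch = foldl' countItemSketch Map.empty
```

For example, frequencySketch ["cat", "mat", "cat"] yields a map in which "cat" has the value 2 and "mat" the value 1.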

insertInMap :: Map Tweet FeatureMap -> Tweet -> Map Tweet FeatureMap

Take a Map whose keys are Tweets and whose values are FeatureMaps, together with one Tweet, and return a new Map that additionally contains the features of that Tweet.

getNeighbors :: (Vector Tweet, Vector Tweet) -> Vector (Tweet, PSQ Tweet Float)

Compare two Vectors of Tweets (the first is the test vector, the second the training vector) and return all neighbors for each Tweet. grandDict is a Map in which each entry consists of a Tweet and its features.

featureIntersection :: Map Tweet FeatureMap -> Tweet -> (Tweet, PSQ Tweet Float)

Take a dictionary and a Tweet and return a pair of this Tweet and all its nearest neighbors

mergeTweetFeatures :: (FeatureMap -> FeatureMap -> Float) -> Tweet -> Tweet -> FeatureMap -> Binding Tweet Float

Take a distance function, Tweet 1, Tweet 2 and a dictionary as FeatureMap and create a Binding between Tweet 2 and the distance from this Tweet to the other Tweet.

cosineDistance :: FeatureMap -> FeatureMap -> Float

Take the features of two Tweets and return the distance between them as a Float.
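As an illustration of how a cosine-based distance over two FeatureMaps can be computed, here is a minimal sketch. It assumes the distance is defined as 1 minus the cosine similarity and that an empty feature map is maximally distant from everything; the documentation above does not confirm either choice, and cosineDistanceSketch is a hypothetical name.

```haskell
import qualified Data.Map.Strict as Map

type FeatureMap = Map.Map String Float

-- Assumed definition: 1 - cosine similarity.
-- Returns 1 when either map has zero length, to avoid dividing by zero.
cosineDistanceSketch :: FeatureMap -> FeatureMap -> Float
cosineDistanceSketch a b
  | normA == 0 || normB == 0 = 1
  | otherwise                = 1 - dot / (normA * normB)
  where
    -- Only words present in both maps contribute to the dot product.
    dot    = sum (Map.elems (Map.intersectionWith (*) a b))
    norm m = sqrt (sum [v * v | v <- Map.elems m])
    normA  = norm a
    normB  = norm b
```

Under this definition, two Tweets with identical features have a distance of (approximately) 0, and two Tweets with no words in common have a distance of 1.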

idftf :: FeatureMap -> FeatureMap -> FeatureMap

Takes a dictionary and a mini dictionary (frequency of words in one Tweet) and calculates the idftf values for all words in the mini dictionary.
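The exact weighting used by idftf is not documented here. For orientation only, the sketch below shows the textbook tf-idf weighting, tf * log (n / df), which may differ from what the module actually computes. It assumes the dictionary maps each word to the number of training Tweets containing it, and it takes the total number of Tweets n as an extra, hypothetical parameter that the real function does not have. The name tfIdfSketch is also hypothetical.

```haskell
import qualified Data.Map.Strict as Map

type FeatureMap = Map.Map String Float

-- Textbook tf-idf (an assumption, not necessarily the module's formula).
-- n        : total number of Tweets (hypothetical extra parameter)
-- docFreq  : word -> number of Tweets containing the word (assumed meaning)
-- termFreq : word -> frequency of the word in one Tweet
-- Words missing from the dictionary are dropped from the result.
tfIdfSketch :: Float -> FeatureMap -> FeatureMap -> FeatureMap
tfIdfSketch n docFreq termFreq =
  Map.intersectionWith (\df tf -> tf * log (n / df)) docFreq termFreq
```

With this weighting, a word that occurs in every Tweet receives a weight of 0, while rarer words are weighted more heavily.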

compareLabels :: Int -> Vector (Tweet, PSQ Tweet Float) -> Vector Float

Calculate the number of Tweets whose predicted label matches the actual label.

getLabel :: Int -> PSQ Tweet Float -> String

Get the label for a Tweet by looking at the k nearest neighbors. If there are more aggressive than non_aggressive Tweets, the label will be aggressive, otherwise, it will be non-aggressive.
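The majority vote over the k nearest neighbors can be sketched as follows. This is a simplified illustration, not the module's implementation: a list of (label, distance) pairs stands in for the PSQ of Tweets, the label strings "aggressive" and "non_aggressive" are assumed spellings, and getLabelSketch is a hypothetical name.

```haskell
import Data.List (sortOn)

-- Majority vote among the k closest neighbors.
-- Each neighbor is a (label, distance) pair; smaller distance means closer.
-- A tie falls through to "non_aggressive", matching the "otherwise" rule above.
getLabelSketch :: Int -> [(String, Float)] -> String
getLabelSketch k neighbors
  | aggressive > nonAggressive = "aggressive"
  | otherwise                  = "non_aggressive"
  where
    nearest       = take k (sortOn snd neighbors)
    aggressive    = length (filter ((== "aggressive") . fst) nearest)
    nonAggressive = length nearest - aggressive
```

A priority search queue such as PSQ avoids sorting the whole neighbor list, since the entries with the smallest distances can be taken from it directly.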

getAccuracy :: Vector Float -> Float

Sum a Vector of Floats (i.e., count the correctly classified Tweets) and return the accuracy.
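A minimal sketch of the accuracy computation, assuming each entry is 1.0 for a correctly classified Tweet and 0.0 otherwise, and that accuracy is the sum divided by the number of entries. A list is used in place of Vector, and accuracySketch is a hypothetical name.

```haskell
-- Accuracy = number of correct classifications / total number of Tweets.
-- Returns 0 for an empty input to avoid dividing by zero.
accuracySketch :: [Float] -> Float
accuracySketch scores
  | null scores = 0
  | otherwise   = sum scores / fromIntegral (length scores)
```

For instance, three correct classifications out of four give accuracySketch [1, 0, 1, 1] = 0.75.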

main :: IO ()