The Sonnex package

[ Tags: gpl, library, natural-language-processing, text ]

This package implements an alternative to the Soundex algorithms for the French language. It approximates how a word should sound when spoken in French. Since it is very basic, it has no dependencies other than base.



Properties

Versions 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3
Dependencies base (==4.7.*)
License GPL-3
Copyright Copyright © 2014 Frédéric BISSON
Author Frédéric BISSON
Maintainer zigazou@free.fr
Category Text, Natural Language Processing
Home page https://github.com/Zigazou/Sonnex
Source repository head: git clone https://github.com/Zigazou/Sonnex.git
Uploaded Mon Dec 1 19:27:04 UTC 2014 by zigazou
Distributions NixOS:0.1.0.3
Downloads 860 total (6 in the last 30 days)
Rating 0.0 (0 ratings)
Status Docs uploaded by user
Build status unknown [no reports yet]


Readme for Sonnex-0.1.0.3


Sonnex

Sonnex is an alternative to Soundex for the French language.

The input string must contain a single word. A Sonnex code is made of the following characters, each standing for the spellings listed after it:

  • 1 ← un, ein, in, ain
  • 2 ← en, an
  • 3 ← on
  • a ← a, à, â
  • b ← b, bb
  • C ← ch
  • d ← d, dd
  • e ← e, eu
  • E ← ê, é, è, ai, ei
  • f ← f, ff, ph
  • g ← gu
  • i ← î, i, ille
  • j ← j, ge
  • k ← k, c, qu, ck
  • l ← l, ll
  • m ← m, mm
  • n ← n, nn
  • o ← o, ô
  • p ← p, pp
  • r ← r, rr
  • s ← s, ss
  • t ← t, tt
  • u ← u, ù, û
  • v ← v, w
  • z ← z, s
  • U ← ou
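The table reads like a substitution scheme. Below is a minimal sketch of one way to apply it in Haskell, assuming a greedy left-to-right scan that always takes the longest matching spelling (so "ain" wins over "ai" and "a"), plus one hypothetical context rule: a nasal spelling only counts as nasal when it is not followed by a vowel, "n" or "m". This is not the package's implementation. It ignores silent final letters and the "z ← s" case, copies letters missing from the table through unchanged, and so will not reproduce the examples below exactly. The names `table`, `nasalOk`, `byLength` and `encode` are illustrative only.

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf, sortBy)
import Data.Ord (Down (..), comparing)

-- Spelling -> code pairs copied from the table above.
-- Assumption: "s" always maps to 's'; the table's "z <- s" case
-- (s between vowels) needs context this sketch does not model.
table :: [(String, Char)]
table =
  [ ("un", '1'), ("ein", '1'), ("in", '1'), ("ain", '1')
  , ("en", '2'), ("an", '2'), ("on", '3')
  , ("a", 'a'), ("à", 'a'), ("â", 'a'), ("b", 'b'), ("bb", 'b')
  , ("ch", 'C'), ("d", 'd'), ("dd", 'd'), ("e", 'e'), ("eu", 'e')
  , ("ê", 'E'), ("é", 'E'), ("è", 'E'), ("ai", 'E'), ("ei", 'E')
  , ("f", 'f'), ("ff", 'f'), ("ph", 'f'), ("gu", 'g')
  , ("î", 'i'), ("i", 'i'), ("ille", 'i'), ("j", 'j'), ("ge", 'j')
  , ("k", 'k'), ("c", 'k'), ("qu", 'k'), ("ck", 'k')
  , ("l", 'l'), ("ll", 'l'), ("m", 'm'), ("mm", 'm')
  , ("n", 'n'), ("nn", 'n'), ("o", 'o'), ("ô", 'o')
  , ("p", 'p'), ("pp", 'p'), ("r", 'r'), ("rr", 'r')
  , ("s", 's'), ("ss", 's'), ("t", 't'), ("tt", 't')
  , ("u", 'u'), ("ù", 'u'), ("û", 'u'), ("v", 'v'), ("w", 'v')
  , ("z", 'z'), ("ou", 'U')
  ]

-- Hypothetical context rule: a nasal spelling ("an", "in", ...) only
-- counts as nasal when the next letter is not a vowel, 'n' or 'm',
-- so "cane" and "canne" keep an oral 'a'.
nasalOk :: String -> Bool
nasalOk []      = True
nasalOk (c : _) = c `notElem` "aàâeéèêiîouùûynm"

-- Candidate spellings, longest first, so "ain" wins over "ai" and "a".
byLength :: [(String, Char)]
byLength = sortBy (comparing (Down . length . fst)) table

-- Greedy left-to-right transcription. Letters with no entry in the
-- table (h, x, y, a lone g, ...) are copied through unchanged.
encode :: String -> String
encode = go . map toLower
  where
    go [] = []
    go s@(c : cs) =
      case [ (length k, code)
           | (k, code) <- byLength
           , k `isPrefixOf` s
           , code `notElem` "123" || nasalOk (drop (length k) s)
           ] of
        ((n, code) : _) -> code : go (drop n s)
        []              -> c : go cs
```

With this sketch, `encode "cane"` and `encode "canne"` both give `"kane"`, and `encode "chant"` gives `"C2t"`; the package itself gives `kan` and `C2`, because it also handles silent endings.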

Examples

Here are a few examples of Sonnex results; words that sound alike get the same code:

  • balade | ballade → balad
  • basilic | basilique → bazilik
  • boulot | bouleau → bUlo
  • cane | canne → kan
  • censé | sensé → s2sé
  • compte | comte | conte → k3t
  • cygne | signe → sin
  • date | datte → dat
  • dessin | dessein → dés1
  • différend | différent → difér2
  • cric | crique → krik
  • champ | chant → C2
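Since homophones share a code, one natural use is to bucket a word list by code. The sketch below assumes nothing about the package's API: the encoder is passed in as a parameter, so any `String -> String` phonetic code can be plugged in. `homophoneGroups` is a hypothetical helper name.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Bucket words by phonetic code and keep only buckets holding at least
-- two words. The encoder is a parameter (hypothetical; no package API
-- is assumed). sortOn is stable, so each bucket keeps input order.
homophoneGroups :: (String -> String) -> [String] -> [[String]]
homophoneGroups encoder ws =
  [ map snd bucket
  | bucket <- groupBy ((==) `on` fst) (sortOn fst [ (encoder w, w) | w <- ws ])
  , length bucket > 1
  ]
```

If the encoder returns the codes listed above (`dat` for date/datte, `kan` for cane/canne), then `homophoneGroups encoder ["date", "datte", "cane", "canne"]` returns `[["date","datte"],["cane","canne"]]`, ordered by code.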

Testing

The test directory contains two files:

  • test-sonnex-homonymes.txt: a list of French homonyms
  • test-sonnex-streets.txt: a list of streets from Rouen (Normandy)

You can run Sonnex on these files with the following command:

cat test-sonnex-homonymes.txt | runhaskell test-sonnex.hs
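The source of test-sonnex.hs is not shown on this page. As a hypothetical stand-in, a filter of this kind could read standard input and print each line next to the codes of its words, encoding word by word because a Sonnex code is defined for a single word. `annotateLines` is an illustrative name, and the encoder is again left as a parameter.

```haskell
-- Hypothetical stand-in for a test driver: for each input line, emit
-- the line, a tab, and the space-separated codes of its words.
annotateLines :: (String -> String) -> String -> String
annotateLines encoder = unlines . map annotate . lines
  where
    annotate line = line ++ "\t" ++ unwords (map encoder (words line))
```

Wired up as `main = interact (annotateLines encoder)`, it can be fed either test file through a pipe, as in the command above.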