The Sonnex package

[Tags:gpl, library, test]

This package implements an alternative to the Soundex algorithms for french language. It does so by approximating what the word should sound in french. Since it is very basic, it has no other dependencies than base.


[Skip to Readme]

Properties

Versions 0.1.0.0, 0.1.0.1, 0.1.0.2, 0.1.0.3
Dependencies base (==4.7.*) [details]
License GPL-3
Copyright Copyright © 2014 Frédéric BISSON
Author Frédéric BISSON
Maintainer zigazou@free.fr
Stability alpha
Category Text, Natural Language Processing
Home page https://github.com/Zigazou/Sonnex
Source repository head: git clone https://github.com/Zigazou/Sonnex.git
Uploaded Mon Dec 1 19:27:04 UTC 2014 by zigazou
Distributions NixOS:0.1.0.3
Downloads 769 total (5 in the last 30 days)
Votes
0 []
Status Docs uploaded by user
Build status unknown [no reports yet]

Modules

[Index]

Downloads

Maintainer's Corner

For package maintainers and hackage trustees

Readme for Sonnex

Readme for Sonnex-0.1.0.3

Sonnex

Sonnex is an alternative to Soundex for french language

The string must contain only one word. The Sonnex code contains the following characters:

  • 1 ← un, ein, in, ain
  • 2 ← en, an
  • 3 ← on
  • a ← a, à, â
  • b ← b, bb
  • C ← ch
  • d ← d, dd
  • e ← e, eu
  • E ← ê, é, è, ai, ei
  • f ← f, ff, ph
  • g ← gu
  • i ← î, i, ille
  • j ← j, ge
  • k ← k, c, qu, ck
  • l ← l, ll
  • m ← m, mm
  • n ← n, nn
  • o ← o, ô
  • p ← p, pp
  • r ← r, rr
  • s ← s, ss
  • t ← t, tt
  • u ← u, ù, û
  • v ← v, w
  • z ← z, s
  • U ← ou

Examples

Here are a few examples of sonnex results:

  • balade | ballade → balad
  • basilic | basilique → bazilik
  • boulot | bouleau → bUlo
  • cane | canne → kan
  • censé | sensé → s2sé
  • compte | comte | conte → k3t
  • cygne | signe → sin
  • date | datte → dat
  • dessin | dessein → dés1
  • différend | différent → difér2
  • cric | crique → krik
  • champ | chant → C2

Testing

The test directory contains two files:

  • test-sonnex-homonymes.txt: a list of french homonyms
  • test-sonnex-streets.txt: a list of streets from Rouen (Normandy)

You can try these files against Sonnex by using the following command:

cat test-sonnex-homonymes.txt | runhaskell test-sonnex.hs