jacinda: Functional, expression-oriented data processing language

[ agpl, data, gpl, interpreters, language, library, program, text ]

APL meets AWK. A command-line tool for summarizing and reporting, powered by Rust's regex library.


Versions 0.1.0.0, 0.2.0.0, 0.2.1.0
Change log CHANGELOG.md
Dependencies array, base (>=4.10.0.0 && <5), bytestring (>=0.11.0.0), containers, microlens, microlens-mtl, mtl, optparse-applicative, prettyprinter (>=1.7.0), recursion (>=1.0.0.0), regex-rure, text, transformers, vector
License AGPL-3.0-only
Author Vanessa McHale
Maintainer vamchale@gmail.com
Category Language, Interpreters, Text, Data
Bug tracker https://github.com/vmchale/jacinda/issues
Source repo head: git clone https://github.com/vmchale/jacinda
Uploaded by vmchale at 2022-01-15T19:48:26Z
Executables ja
Downloads 26 total (26 in the last 30 days)
Status Docs not available. All reported builds failed as of 2022-01-15 (2 reports).

Manual Flags

  • cross: Enable to ease cross-compiling (default: disabled)

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.


Readme for jacinda-0.2.1.0


Jacinda is a functional, expression-oriented data processing language, complementing AWK.

Installation

Releases

There are binaries for some platforms on the releases page.

From Source

First, install the C API of Rust's regex library (rure). You'll need to put librure.so (Linux), librure.dylib (macOS), or your platform's equivalent somewhere the linker can find it.

If you have cabal and GHC installed (perhaps via ghcup):

cabal install jacinda

Vim Plugin

There is a vim plugin.

SHOCK & AWE

ls -l | ja '(+)|0 {ix>1}{`5:i}'
curl -sL https://raw.githubusercontent.com/nychealth/coronavirus-data/master/latest/now-weekly-breakthrough.csv | \
    ja ',[1.0-x%y] {ix>1}{`5:} {ix>1}{`11:}' -F,
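
For readers coming from AWK, here is a rough translation of the two one-liners above. It is a sketch under assumptions, not a statement of Jacinda's exact semantics: it assumes `ix` is the line number, that `` `5:i `` is the fifth field parsed as an integer, that `(+)|0` folds with addition starting from 0, and that `,[1.0-x%y]` combines two columns pairwise using `%` as division. The function names `sum_sizes` and `one_minus_ratio` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Rough AWK counterparts of the two Jacinda one-liners (assumed semantics).

# sum_sizes (hypothetical name): add up field 5 of every line after the first.
sum_sizes() {
    awk 'NR > 1 { total += $5 } END { print total + 0 }'
}

# one_minus_ratio (hypothetical name): for each comma-separated line after
# the header, print 1 - (field 5 / field 11).
one_minus_ratio() {
    awk -F, 'NR > 1 { print 1 - $5 / $11 }'
}

# Sample input standing in for `ls -l`; prints 350.
printf 'total 8\n-rw-r--r-- 1 u g 100 Jan 1 00:00 a\n-rw-r--r-- 1 u g 250 Jan 1 00:00 b\n' | sum_sizes
```

In practice these would be used as `ls -l | sum_sizes` and `curl -sL <url> | one_minus_ratio`. Note that the AWK version aborts on a zero in field 11; how Jacinda treats that case is not covered here.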

Documentation

See the guide, which contains a tutorial on some of the features as well as examples.

The manpages document the builtins and provide a syntax reference.

Status

The project is in the alpha stage: it doesn't necessarily work and many features are missing, but the language itself will remain stable.

It is worse than AWK, but it has its place, and it avoids some of AWK's painful imperative and scoping defects.

Missing Features & Bugs

  • No sub/gsub function equivalents
  • No nested dfns
  • Obscure renamer edge cases during evaluation
  • Multiple folds are criminally inefficient
  • Tuples and the Option type are undocumented
  • No printf formatting for floats
  • No list literal syntax
  • Typeclasses are not documented
  • Type system is questionable
  • Postfix :f and :i are handled poorly
  • No file imports/includes
  • Various bugs in evaluation with regular expressions

Intentionally missing features:

  • No loops
  • No conditionals

I may add the latter (conditionals) if necessary.

Further Advantages

PERFORMANCE

Linux + x64

benchmarking bench/ja '(+)|0 {%/Bloom/}{1}' -i /tmp/ulysses.txt
time                 8.110 ms   (7.926 ms .. 8.304 ms)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 8.470 ms   (8.278 ms .. 8.771 ms)
std dev              693.0 μs   (437.4 μs .. 1.008 ms)
variance introduced by outliers: 47% (moderately inflated)

benchmarking bench/original-awk '/Bloom/ { total += 1; } END { print total }' /tmp/ulysses.txt
time                 13.24 ms   (13.04 ms .. 13.39 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 13.39 ms   (13.29 ms .. 13.49 ms)
std dev              256.0 μs   (197.8 μs .. 380.7 μs)

benchmarking bench/gawk '/Bloom/ { total += 1; } END { print total }' /tmp/ulysses.txt
time                 7.804 ms   (7.706 ms .. 7.931 ms)
                     0.996 R²   (0.991 R² .. 0.999 R²)
mean                 7.668 ms   (7.572 ms .. 7.783 ms)
std dev              303.4 μs   (229.7 μs .. 442.5 μs)
variance introduced by outliers: 17% (moderately inflated)

benchmarking bench/mawk '/Bloom/ { total += 1; } END { print total }' /tmp/ulysses.txt
time                 3.179 ms   (3.099 ms .. 3.240 ms)
                     0.997 R²   (0.995 R² .. 0.998 R²)
mean                 3.213 ms   (3.178 ms .. 3.270 ms)
std dev              148.9 μs   (97.11 μs .. 267.6 μs)
variance introduced by outliers: 29% (moderately inflated)

benchmarking bench/busybox awk '/Bloom/ { total += 1; } END { print total }' /tmp/ulysses.txt
time                 12.61 ms   (12.43 ms .. 12.77 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 12.98 ms   (12.86 ms .. 13.09 ms)
std dev              303.1 μs   (234.5 μs .. 396.2 μs)