cgrep: Command line tool

Tags: gpl, program, utils

CGrep: a context-aware grep for source code


Flags

Automatic Flags
Name | Description | Default
enable_pcre | Use PCRE regex engine (default: disabled) | Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.

Versions 6.4, 6.4.1, 6.4.2, 6.4.3, 6.4.3.1, 6.4.4, 6.4.5, 6.4.6, 6.4.7, 6.4.8, 6.4.9, 6.4.10, 6.4.11, 6.4.12, 6.4.13, 6.4.14, 6.4.15, 6.4.16, 6.4.17, 6.4.18, 6.4.19, 6.4.20, 6.4.21, 6.4.22, 6.5.0, 6.5.1, 6.5.2, 6.5.3, 6.5.4, 6.5.5, 6.5.6, 6.5.7, 6.5.8, 6.5.9, 6.5.10, 6.5.11, 6.5.12, 6.5.13, 6.5.15, 6.6, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.7, 6.6.8, 6.6.9, 6.6.10, 6.6.11, 6.6.12, 6.6.13, 6.6.14, 6.6.15, 6.6.16, 6.6.17, 6.6.20, 6.6.22, 6.6.23, 6.6.24, 6.6.25, 6.6.30, 6.6.32, 8.0.0, 8.1.0, 9.0.0
Dependencies aeson, ansi-terminal, array, async, atomic-primops, base (>=4.15.0.0 && <4.16), bitarray, bitwise, bytestring, bytestring-strict-builder, clock, concurrency, containers, deepseq, directory, dlist, either, exceptions, extra, filepath, ghc-prim, mmap, monad-loops, mtl, optparse-applicative, os-string, process, regex-base, regex-pcre-text, regex-tdfa, safe, split, stm, stringsearch, template-haskell, text, transformers, unicode-show, unix-compat, unordered-containers, utf8-string, vector, yaml
License GPL-2.0-or-later
Author Nicola Bonelli
Maintainer Nicola Bonelli <nicola@larthia.com>
Category Utils
Home page http://awgn.github.io/cgrep/
Uploaded by NicolaBonelli at 2025-11-09T10:28:58Z
Distributions Arch:8.1.0, NixOS:8.1.0
Reverse Dependencies 1 direct, 0 indirect
Executables cgrep
Downloads 48722 total (87 in the last 30 days)
Rating (no votes yet)
Status Docs not available
All reported builds failed as of 2025-11-09 (2 reports)

Readme for cgrep-9.0.0

CGrep: a context-aware grep for source code

Join the chat at https://gitter.im/awgn/cgrep

Version 9.0.0 - A powerful, context-aware search tool designed specifically for source code.

CGrep extends the capabilities of traditional grep by understanding the structure and semantics of source code across multiple programming languages. It allows developers to search within specific contexts like code, comments, or string literals, and provides advanced pattern matching with semantic awareness.


What's New in Version 9.0

🚀 Major Performance Improvements

  • 75% faster plain token search - About 1.75x the speed of v8 for standard token-based searches (see Benchmarks below)
  • 39% faster semantic search - About 1.39x the speed of v8 for semantic (-S) pattern matching
  • Full UTF-8 support - Switched from ByteString to Text processing, so UTF-8 and other multi-byte (international) characters are handled correctly

✨ Enhanced Features

  • Semantic Test Filtering - New capability to include or exclude test code in search results across 27+ programming languages and their respective testing frameworks (see Test Framework Support below)
  • Improved Text Processing - Native support for Unicode and multi-byte character encodings with accurate column positioning

Installation

From Hackage

cabal update
cabal install cgrep

From Source

git clone https://github.com/awgn/cgrep.git
cd cgrep
cabal install

or using stack:

stack build
stack install

Usage

cgrep 9.0.0 - Usage: cgrep [OPTION] [PATTERN] files...

Usage: cgrep [--file FILE] [-w|--word] [-p|--prefix] [-s|--suffix] [-e|--edit]
             [-G|--regex] [-i|--ignore-case] [-c|--code] [-m|--comment]
             [-l|--literal] [--identifier|--name] [--native|--type] [--keyword]
             [--number] [--string] [--op] [--type TYPE] [--kind KIND]
             [--code-only] [--hdr-only] [-T|--tests ARG] [--prune-dir DIR]
             [-r|--recursive] [-L|--follow] [-S|--semantic] [--strict]
             [--max-count INT] [--force-type TYPE] [--type-list]
             [-v|--invert-match] [-j|--threads INT] [--show-match] [--color]
             [--no-color] [-h|--no-filename] [--no-numbers] [--no-column]
             [--count] [--filename-only] [--json] [--vim] [--editor]
             [--fileline] [--verbose] [--stats] [--null-output] [--palette]
             [PATTERN FILES...] [--version]

  Context-aware grep for source codes

Available options:
  --file FILE              Read PATTERNs from file (one per line)
  -w,--word                Force word matching
  -p,--prefix              Force prefix matching
  -s,--suffix              Force suffix matching
  -e,--edit                Use edit distance
  -G,--regex               Use regex matching (posix)
  -i,--ignore-case         Ignore case distinctions
  -c,--code                Enable search in source code
  -m,--comment             Enable search in comments
  -l,--literal             Enable search in string literals
  --identifier,--name      Identifiers
  --native,--type          Native Types
  --keyword                Keywords
  --number                 Literal numbers
  --string                 Literal strings
  --op                     Operators
  --type TYPE              Specify file types. ie: Cpp, +Haskell, -Makefile
  --kind KIND              Specify file kinds. Text, Config, Language, Data,
                           Markup or Script
  --code-only              Parse code modules only (skip headers/interfaces)
  --hdr-only               Parse headers/interfaces only (skip modules)
  -T,--tests ARG           Filter tests: 'True' tests only, 'False' code only,
                           omitted (search all)
  --prune-dir DIR          Do not descend into dir
  -r,--recursive           Enable recursive search (don't follow symlinks)
  -L,--follow              Follow symlinks
  -S,--semantic            "code" pattern: _, _1, _2... (identifiers), ANY, KEY,
                           STR, LIT, NUM, HEX, OCT
  --strict                 Enable strict semantic for operators
  --max-count INT          Stop search in files after INT matches
  --force-type TYPE        Force the type of file
  --type-list              List the supported file types
  -v,--invert-match        Select non-matching lines
  -j,--threads INT         Approximate number of threads to run search
  --show-match             Show list of matching tokens
  --color                  Use colors to highlight the match strings
  --no-color               Do not use colors (override config file)
  -h,--no-filename         Suppress the file name prefix on output
  --no-numbers             Suppress both line and column numbers on output
  --no-column              Suppress the column number on output
  --count                  Print only a count of matching lines per file
  --filename-only          Print only the name of files containing matches
  --json                   Format output as json object
  --vim                    Run vim editor passing the files that match
  --editor                 Run the editor specified by EDITOR var., passing the
                           files that match
  --fileline               When edit option is specified, pass the list of
                           matching files in file:line format (e.g. vim
                           'file-line' plugin)
  --verbose                Enable verbose mode
  --stats                  Print statistics about the search
  --null-output            Disable output for performance evaluation
  --palette                Show color palette
  -h,--help                Show this help text
  --version                Show version information and exit

Examples

Basic Searches

Search for a simple pattern in source files:

cgrep "main" *.c

Search recursively in a directory:

cgrep -r "TODO" src/

Case-insensitive search:

cgrep -i "buffer" *.cpp

Context-Aware Searching

Search only in code (exclude comments and strings):

cgrep -c "malloc" *.c

Search only in comments:

cgrep -m "TODO" -r src/

Search only in string literals:

cgrep -l "hello" *.cpp

Search in both code and comments, but not in strings:

cgrep -c -m "config" *.js
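The context filters above depend on splitting each file into code, comments, and string literals. As an illustration only, the following Python sketch does that for C-like syntax (// and /* */ comments, single- and double-quoted literals with backslash escapes). The names split_contexts and matches are hypothetical; this is not cgrep's parser, which is written in Haskell and covers many more languages.

```python
def split_contexts(src: str) -> list[tuple[str, str]]:
    """Split C-like source into ("code" | "comment" | "literal", text) segments."""
    segments: list[tuple[str, str]] = []
    code_start = 0          # start of the pending run of plain code
    i, n = 0, len(src)
    while i < n:
        if src.startswith("//", i):            # line comment up to newline
            end = src.find("\n", i)
            end = n if end == -1 else end
            kind = "comment"
        elif src.startswith("/*", i):          # block comment up to */
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            kind = "comment"
        elif src[i] in "\"'":                  # string or char literal
            quote, end = src[i], i + 1
            while end < n and src[end] != quote:
                end += 2 if src[end] == "\\" else 1   # skip escaped chars
            end = min(end + 1, n)              # include the closing quote
            kind = "literal"
        else:
            i += 1
            continue
        if i > code_start:                     # flush code seen so far
            segments.append(("code", src[code_start:i]))
        segments.append((kind, src[i:end]))
        i = code_start = end
    if code_start < n:
        segments.append(("code", src[code_start:]))
    return segments


def matches(src: str, needle: str, contexts: set[str]) -> bool:
    """True if needle occurs inside a segment whose kind is in contexts."""
    return any(kind in contexts and needle in text
               for kind, text in split_contexts(src))
```

With this sketch, matches(src, "malloc", {"code"}) behaves roughly like -c, and {"code", "comment"} corresponds to combining -c -m. One simplification: a needle that straddles a segment boundary is not found.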

Token Filters

Search for identifiers only:

cgrep --identifier "main" *.c

Search for native types:

cgrep --native "int" *.c

Search for string literals containing specific text:

cgrep --string "hello" *.cpp
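Token filters go one step further and classify individual tokens. The sketch below is a hypothetical, much-simplified C tokenizer in Python: it labels tokens as identifiers, keywords, native types, numbers, strings, or operators, then keeps only the requested kinds. The keyword and type sets are small illustrative subsets, and the names classify and select are assumptions, not part of cgrep.

```python
import re

# Small illustrative subsets of C keywords and native types (not exhaustive).
KEYWORDS = {"if", "else", "for", "while", "return", "struct", "sizeof"}
NATIVE_TYPES = {"int", "char", "short", "long", "float", "double", "void"}

# One named alternative per token kind; finditer skips the whitespace between them.
TOKEN_RE = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<op>[-+*/%=<>!&|^~?:]+|[()\[\]{};,.])
''', re.VERBOSE)


def classify(src: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs: string, number, op, keyword, native or identifier."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "word":                      # refine words into three kinds
            if text in KEYWORDS:
                kind = "keyword"
            elif text in NATIVE_TYPES:
                kind = "native"
            else:
                kind = "identifier"
        tokens.append((kind, text))
    return tokens


def select(src: str, needle: str, kinds: set[str]) -> list[str]:
    """Tokens of the requested kinds containing needle (substring test, a simplification)."""
    return [text for kind, text in classify(src) if kind in kinds and needle in text]
```

For instance, select("int count = 42;", "int", {"native"}) keeps the type name, while the same needle with {"identifier"} finds nothing.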

File Type and Kind Filters

Search only in C++ files (recursively):

cgrep --type=Cpp -r "char" test/

Search in Haskell files, but exclude test code:

cgrep --type=Haskell -T False "function" -r .

Search by file kind (configuration files):

cgrep --kind=Config "database" -r /etc/

Test Filtering (New in v9)

Search only in production code, excluding all tests:

cgrep -T False "function" -r src/

Search only in test code:

cgrep -T True "mock" -r tests/

This feature automatically detects and filters test code based on language-specific conventions (see Test Framework Support).

Semantic Search

Semantic search (-S) lets you match code patterns using wildcards:

  • _, _1, _2, ... : Match any identifier
  • ANY : Match any token
  • KEY : Match any keyword
  • STR : Match any string literal
  • LIT : Match any literal
  • NUM : Match any number
  • HEX : Match any hexadecimal number
  • OCT : Match any octal number

Find variable assignments with numeric literals:

cgrep -S "_ = NUM" *.c
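To make the wildcard semantics concrete, here is a minimal token-level matcher in Python. It slides the tokenized pattern over the tokenized source and compares token by token. Several details are assumptions for illustration, not documented cgrep behavior: NUM also accepts hex and octal numbers, LIT accepts strings and numbers, and numbered placeholders (_1, _2, ...) are treated exactly like _ (the sketch does not check whether repeated placeholders refer to the same identifier). All function names are hypothetical.

```python
import re

# Tiny illustrative keyword subset; a real tool needs a full per-language table.
KEYWORDS = {"if", "else", "for", "while", "return", "break", "sizeof"}

# Strings, hex, decimal, identifiers, a few two-char operators, then any other char.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"|0[xX][0-9a-fA-F]+|\d+|[A-Za-z_]\w*'
    r'|==|!=|<=|>=|&&|\|\||\S'
)


def kind(tok: str) -> str:
    """Classify one source token."""
    if tok.startswith('"'):
        return "str"
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", tok):
        return "hex"
    if re.fullmatch(r"0[0-7]+", tok):
        return "oct"
    if tok.isdigit():
        return "num"
    if re.fullmatch(r"[A-Za-z_]\w*", tok):
        return "key" if tok in KEYWORDS else "ident"
    return "other"


def token_matches(pat: str, tok: str) -> bool:
    """Compare one pattern token (possibly a wildcard) with one source token."""
    k = kind(tok)
    if pat == "ANY":
        return True
    if re.fullmatch(r"_\d*", pat):          # _, _1, _2, ... (no binding check)
        return k == "ident"
    wildcards = {
        "KEY": {"key"},
        "STR": {"str"},
        "NUM": {"num", "hex", "oct"},       # assumption: NUM covers hex/octal too
        "HEX": {"hex"},
        "OCT": {"oct"},
        "LIT": {"str", "num", "hex", "oct"},  # assumption: strings and numbers
    }
    if pat in wildcards:
        return k in wildcards[pat]
    return pat == tok                       # ordinary token: exact match


def semantic_find(pattern: str, src: str) -> list[str]:
    """Return every token window in src matching the pattern, joined by spaces."""
    pat = TOKEN_RE.findall(pattern)
    toks = TOKEN_RE.findall(src)
    n = len(pat)
    hits = []
    for i in range(len(toks) - n + 1):
        window = toks[i:i + n]
        if n and all(token_matches(p, t) for p, t in zip(pat, window)):
            hits.append(" ".join(window))
    return hits
```

Under these assumptions, semantic_find("_ = NUM", "x = 42; y = 0x1F;") returns ["x = 42", "y = 0x1F"]. Note that "==" is a separate token, so "_ = NUM" does not match a comparison such as x == 1.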

Advanced Pattern Matching

Use regular expressions (POSIX):

cgrep -G "main|return" *.c

Use word boundaries:

cgrep -w "read" *.c    # Matches "read" but not "thread" or "reader"

Prefix matching:

cgrep -p "ma" *.c   # Matches "main", etc.

Suffix matching:

cgrep -s "rn" *.c  # Matches "return", etc.

Case-insensitive search:

cgrep -i "SED" test.cpp  # Matches "Sed", "sed", etc.
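The word, prefix, and suffix modes differ only in how the pattern is compared with each token. The following Python sketch reproduces the examples above, including the effect of -i. It is a simplification under stated assumptions: the name matching_tokens is hypothetical, and it only looks at identifier-like tokens without any language-aware lexing.

```python
import re


def matching_tokens(pattern: str, line: str, mode: str = "word",
                    ignore_case: bool = False) -> list[str]:
    """Identifier-like tokens of `line` that match `pattern` in the given mode.

    mode is "word" (like -w), "prefix" (like -p) or "suffix" (like -s);
    ignore_case mimics -i by case-folding both sides before comparing.
    """
    def norm(s: str) -> str:
        return s.casefold() if ignore_case else s

    p = norm(pattern)
    tests = {
        "word": lambda tok: tok == p,
        "prefix": lambda tok: tok.startswith(p),
        "suffix": lambda tok: tok.endswith(p),
    }
    test = tests[mode]
    # Simplification: only identifier-like tokens, no language-specific lexing.
    return [tok for tok in re.findall(r"[A-Za-z_]\w*", line) if test(norm(tok))]
```

On the line "n = read(fd); thread_t reader;", word mode with "read" keeps only read, while prefix mode also keeps reader; thread_t is never matched because the comparison is anchored at the token boundaries.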

Output Formatting

Show only filenames of files containing matches:

cgrep --filename-only "char" *.cpp

Count matches per file:

cgrep --count "char" test.cpp

JSON output (useful for scripting):

cgrep --json "error" *.log

Suppress filename prefix:

cgrep -h "pattern" file.c

Show matching tokens:

cgrep --show-match "std::" *.cpp

Editor Integration

Open matching files in vim:

cgrep --vim "FIXME" -r src/

Open with your default editor (set via EDITOR environment variable):

cgrep --editor "bug" -r src/

Use vim with file:line format (works with vim-file-line plugin):

cgrep --vim --fileline "error" -r src/

Performance and Control

Set the (approximate) number of threads:

cgrep -j 4 "pattern" -r large_codebase/

Stop after finding N matches in each file:

cgrep --max-count=5 "TODO" -r src/

Show search statistics:

cgrep --stats "function" -r src/

UTF-8 and International Characters

Version 9's improved UTF-8 support allows searching in files with international characters:

cgrep "Hello" test.utf8
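Accurate column positions matter because, in UTF-8, one character can occupy more than one byte. The short Python sketch below (illustrative only, not cgrep code; the name columns is hypothetical) reports both the character column and the byte column of a match. On a line containing é and ö the two differ, while on pure ASCII they coincide.

```python
def columns(line: str, needle: str) -> tuple[int, int]:
    """Return the 1-based (character column, byte column) of needle's first occurrence."""
    char_idx = line.index(needle)                      # counts code points
    byte_idx = len(line[:char_idx].encode("utf-8"))    # counts UTF-8 bytes
    return char_idx + 1, byte_idx + 1


if __name__ == "__main__":
    print(columns("héllo wörld", "wörld"))  # (7, 8): "é" takes two bytes in UTF-8
    print(columns("hello world", "world"))  # (7, 7): ASCII, one byte per character
```

A search that counted bytes would report every match after a multi-byte character one column further to the right per extra byte.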

Test Framework Support

Version 9 introduces convention-based test code filtering across 27+ programming languages. With the -T flag, cgrep detects test code from language-specific conventions, annotations, and testing-framework constructs (listed below), then either restricts the search to tests (-T True) or excludes them (-T False).

Language | Testing frameworks detected | Detection patterns
Rust | Built-in, cargo test | #[test], #[cfg(test)] modules
Go | Built-in testing | func Test*, func Benchmark*
Java | JUnit | @Test annotations
Kotlin | JUnit | @Test annotations
C | Google Test, Catch2 | TEST(), TEST_F(), TEST_CASE(), test_* functions
C++ | Google Test, Catch2, Boost.Test | TEST(), TEST_F(), TEST_CASE(), BOOST_AUTO_TEST, Test* functions
Python | pytest, unittest | test_* functions, Test* classes, @pytest, @unittest decorators
Zig | Built-in | test "..." blocks
JavaScript | Mocha, Jasmine, Jest | describe(), it(), test(), context()
TypeScript | Mocha, Jasmine, Jest | describe(), it(), test(), context()
Scala | ScalaTest, MUnit | test(), it(), describe(), scenario(), feature()
Haskell | HSpec, Tasty, QuickCheck, HUnit | describe, it, context, testCase, testGroup, testProperty, prop_* functions
C# | NUnit, xUnit, MSTest | [Test], [Fact], [Theory], [TestMethod], [TestFixture], [TestClass]
F# | NUnit, xUnit, Expecto | [<Test>], [<Fact>], [<Theory>], testCase, testList, test
Dart | Built-in test package, Flutter | test(), group(), testWidgets()
Elixir | ExUnit | test "...", describe "...", defmodule *Test
Ruby | RSpec, Minitest | describe, context, it, test_*, def test_*
PHP | PHPUnit | @test annotations, test* methods, *Test classes
Swift | XCTest | XCTestCase classes, func test*()
Objective-C | XCTest | XCTestCase classes, test methods
R | testthat | test_that(), describe(), context()
Julia | Built-in Test | @testset, @test
Perl | Test::More, Test::Simple | subtest, *.t files
OCaml | OUnit, Alcotest | let test_*, test_case
Erlang | EUnit | *_test(), *_test_() functions
Nim | unittest | unittest module, test "..."
Clojure | clojure.test | (deftest ...), (testing ...)
D | Built-in | unittest { ... } blocks
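Detection by naming convention can be illustrated for Python with the standard-library ast module, looking for the test_* functions and Test* classes listed in the table. This is a sketch of the general idea under those two conventions only: it ignores the @pytest and @unittest decorators, and the names find_test_ranges and production_lines are hypothetical, not cgrep functions. Lines outside the detected ranges are the kind of lines that -T False is meant to keep.

```python
import ast

SAMPLE = (
    "def parse(x):\n"
    "    return x\n"
    "\n"
    "def test_parse():\n"
    "    assert parse(1) == 1\n"
    "\n"
    "class TestParser:\n"
    "    def test_empty(self):\n"
    "        assert parse('') == ''\n"
)


def find_test_ranges(source: str) -> list[tuple[str, int, int]]:
    """(name, first line, last line) of test_* functions and Test* classes."""
    ranges = []
    for node in ast.walk(ast.parse(source)):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        is_test = ((is_func and node.name.startswith("test_"))
                   or (isinstance(node, ast.ClassDef) and node.name.startswith("Test")))
        if is_test:
            ranges.append((node.name, node.lineno, node.end_lineno))
    return ranges


def production_lines(source: str) -> list[int]:
    """1-based numbers of non-blank lines that lie outside every test range."""
    in_test: set[int] = set()
    for _, start, end in find_test_ranges(source):
        in_test.update(range(start, end + 1))
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if line.strip() and n not in in_test]


if __name__ == "__main__":
    print(find_test_ranges(SAMPLE))
    print(production_lines(SAMPLE))
```

Real test detection in other languages needs the per-language patterns from the table (attributes, annotations, framework calls), which is why the sketch should not be read as how cgrep itself works.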

Example: Production Code vs. Test Code

# Search for "config" only in production code, skipping all tests
cgrep -T False "config" -r src/

# Search for mock usage only in test files
cgrep -T True "mock" -r .

Supported Languages

CGrep supports a wide range of programming languages and file formats:

Programming Languages: C, C++, C#, Java, Kotlin, Scala, JavaScript, TypeScript, CoffeeScript, Python, Ruby, Perl, PHP, Go, Rust, Haskell, OCaml, F#, Erlang, Elixir, Clojure, Lisp, Scheme, Lua, R, Julia, Dart, Nim, Zig, D, Swift, Objective-C, Chapel, Awk, Shell scripts (Bash, Fish)

Markup & Config: HTML, XML, LaTeX, Markdown, YAML, JSON, TOML, INI, Dhall, CMake, Makefile, Cabal

To see the complete list of supported file types:

cgrep --type-list

Configuration

CGrep can be configured using a configuration file located at ~/.cgreprc (or $XDG_CONFIG_HOME/cgrep/cgreprc).

Example configuration:

colors: true
file-types:
  - +Cpp
  - +Haskell
  - -Test
jobs: 8

Performance Tips

  1. Use file type filters to reduce the number of files processed:

    cgrep --type=Cpp "pattern" -r .
    
  2. Limit the search scope with --prune-dir to exclude directories:

    cgrep --prune-dir=node_modules --prune-dir=.git "pattern" -r .
    
  3. Use context filters when you know where to search:

    cgrep -c "pattern" -r .  # Search only in code
    
  4. Use test filtering to focus on production code:

    cgrep -T False "pattern" -r .  # Exclude all test code
    
  5. Adjust thread count for optimal performance on your system:

    cgrep -j 16 "pattern" -r .
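Tips 1 and 2 pay off because a file that is never visited costs nothing to search. The Python sketch below shows the general technique with os.walk: pruned directory names are removed from dirnames in place so the walk never descends into them, and files are filtered by extension before anything is read. The prune list, the suffixes, and the name walk_sources are hypothetical; follow_links=False loosely mirrors -r (no symlink following) as opposed to -L. This is not cgrep's traversal code.

```python
import os
from collections.abc import Iterator


def walk_sources(root: str,
                 prune: tuple[str, ...] = (".git", "node_modules"),
                 suffixes: tuple[str, ...] = (".cpp", ".hpp"),
                 follow_links: bool = False) -> Iterator[str]:
    """Yield paths under root with the given suffixes, skipping pruned directories."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_links):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in prune]
        for name in filenames:
            if name.endswith(suffixes):      # cheap type filter before any I/O
                yield os.path.join(dirpath, name)
```

The key design point is that pruning happens at the directory level: a large node_modules tree is skipped with a single list filter instead of being enumerated file by file.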
    

Benchmarks (v9 vs v8)

Search type | v8 | v9 | Improvement
Plain token search | 1.0x | 1.75x | +75%
Semantic search | 1.0x | 1.39x | +39%

Speeds are relative, with v8 as the 1.0x baseline. Benchmarks were run on a codebase of roughly 100k lines of code.
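A note on reading these figures: assuming they are relative throughput on a fixed workload, a speedup factor s reduces wall-clock time by 1 - 1/s. The small Python helper below (hypothetical name time_saved) applies that formula to the table: 1.75x corresponds to roughly 43% less time, and 1.39x to roughly 28% less.

```python
def time_saved(speedup: float) -> float:
    """Fraction of wall-clock time saved by a speedup factor on a fixed workload."""
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    for label, s in [("Plain token search", 1.75), ("Semantic search", 1.39)]:
        print(f"{label}: {s}x faster -> {time_saved(s):.1%} less time")
```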


Contributing

Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.


License

CGrep is released under the GPL-2.0-or-later license. See the LICENSE file for details.


Author

Nicola Bonelli <nicola@larthia.com>