ngx-export-tools-extra-1.2.10: More extra tools for Nginx Haskell module
Copyright(c) Alexey Radkov 2021-2024
LicenseBSD-style
Maintaineralexey.radkov@gmail.com
Stabilitystable
Portabilitynon-portable (requires Template Haskell)
Safe HaskellSafe-Inferred
LanguageHaskell2010

NgxExport.Tools.PCRE

Description

PCRE matching and substitution from the more extra tools collection for nginx-haskell-module.

Synopsis

Matching against regular expressions

This module provides a simple handler matchRegex to match a value against a PCRE regex preliminary declared and compiled in configuration service simpleService_declareRegexes (which is an ignitionService in terms of module NgxExport.Tools.SplitService) and the corresponding service update hook (in terms of module NgxExport) compileRegexes at the start of the service.

Below is a simple example.

File test_tools_extra_pcre.hs

module TestToolsExtraPCRE where

import NgxExport.Tools.PCRE ()

The file does not contain any significant declarations as we are going to use only the exporters of the handlers.

File nginx.conf

user                    nobody;
worker_processes        2;

events {
    worker_connections  1024;
}

http {
    default_type        application/octet-stream;
    sendfile            on;

    haskell load /var/lib/nginx/test_tools_extra_pcre.so;

    haskell_run_service simpleService_declareRegexes $hs_regexes
            '[("userArea", "(?:\\\\|)(\\\\d+)$", "")
             ,("keyValue", "(k\\\\w+)(\\\\|)(v\\\\w+)", "i")
             ]';

    haskell_service_update_hook compileRegexes $hs_regexes;

    server {
        listen       8010;
        server_name  main;
        error_log    /tmp/nginx-test-haskell-error.log;
        access_log   /tmp/nginx-test-haskell-access.log;

        location / {
            haskell_run matchRegex $hs_user_area 'userArea|$arg_user';
            rewrite ^ /internal/user/area/$hs_user_area last;
        }

        location ~ ^/internal/user/area/(PCRE ERROR:.*) {
            internal;
            echo_status 404;
            echo "Bad input: $1";
        }

        location = /internal/user/area/ {
            internal;
            echo_status 404;
            echo "No user area attached";
        }

        location ~ ^/internal/user/area/(.+) {
            internal;
            echo "User area: $1";
        }
    }
}

In this example, we expect requests with argument user which should supposedly be tagged with an area code containing digits only. The user value should match against regex userArea declared alongside with another regex keyValue (the latter has an option i which corresponds to caseless; the regex compiler has also support for options s and m which correspond to dotall and multiline respectively). Notice that regex declarations require 4-fold backslashes as they are getting shrunk while interpreted sequentially by the Nginx configuration interpreter and then by the Haskell compiler too.

Handler matchRegex finds the named regex userArea from the beginning of its argument: the second part of the argument is delimited by a bar symbol and contains the value to match against. If the regex contains captures, then the matched value shall correspond to the contents of the first capture (in case of userArea, this is the area code), otherwise it must correspond to the whole matched value.

A simple test

$ curl 'http://localhost:8010/'
No user area attached
$ curl 'http://localhost:8010/?user=peter|98'
User area: 98
$ curl 'http://localhost:8010/?user=peter|98i'
No user area attached

matchRegex Source #

Arguments

:: ByteString

Key to find the regex, and the value

-> IO ByteString 

Matches a value against a named regex.

The regex must be preliminary declared and compiled by service handlers simpleService_declareRegexes and compileRegexes. The name of the regex and the value are passed in a single argument: the two parts are delimited by the first bar symbol met from the left, e.g. key|value.

This is the core function of the matchRegex handler.

Substitution with regular expressions

There are handlers to make substitutions using PCRE regexes. An ignitionService simpleService_mapSubs declares named plain substitutions which are made in run-time by handlers subRegex and gsubRegex. Functions subRegexWith and gsubRegexWith make it possible to write custom functional substitutions.

Let's extend our example by adding ability to erase the captured area code. We also going to implement a functional substitution to swap the keys and the values matched in the keyValue regex.

File test_tools_extra_pcre.hs

{-# LANGUAGE TemplateHaskell, LambdaCase #-}

module TestToolsExtraPCRE where

import           NgxExport
import           NgxExport.Tools.PCRE

import           Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

gsubSwapAround :: ByteString -> IO L.ByteString
gsubSwapAround = gsubRegexWith $ const $ \case 
    a : d : b : _ -> B.concat [b, d, a]
    _ -> B.empty

ngxExportIOYY 'gsubSwapAround

Functional substitution handler gsubSwapAround expects a regular expression with at least 3 capture groups to swap the contents of the first and the third groups around. We are going to apply this handler against regex keyValue.

File nginx.conf: erase area code and swap keys and values

    haskell_run_service simpleService_mapSubs $hs_subs
            '[("erase", "")]';

    haskell_var_empty_on_error $hs_kv;
        location /erase/area {
            haskell_run subRegex $hs_user_no_area 'userArea|erase|$arg_user';
            rewrite ^ /internal/user/noarea/$hs_user_no_area last;
        }

        location ~ ^/internal/user/noarea/(PCRE\ ERROR:.*) {
            internal;
            echo_status 404;
            echo "Bad input: $1";
        }

        location ~ ^/internal/user/noarea/(.*) {
            internal;
            echo "User without area: $1";
        }

        location /swap {
            haskell_run gsubSwapAround $hs_kv 'keyValue|$arg_kv';
            echo "Swap $arg_kv = $hs_kv";
        }

Service simpleService_mapSubs declares a list of named plain substitutions. In this example, it declares only one substitution erase which substitutes an empty string, i.e. erases the matched text. Notice that the argument of handler subRequest requires three parts delimited by bar symbols: the named regex, the named substitution, and the value to match against.

A simple test

$ curl 'http://localhost:8010/erase/area?user=peter|98'
User without area: peter
$ curl 'http://localhost:8010/swap?kv=kid|v0012a'
Swap kid|v0012a = v0012a|kid

type SubPasteF Source #

Arguments

 = ByteString

The full match

-> [ByteString]

List of captures

-> ByteString 

Type of functions to perform functional substitutions.

subRegex Source #

Arguments

:: ByteString

Keys to find the regex and the sub, and the value

-> IO ByteString 

Pastes a named plain substitution using a named regex.

The substitution and the regex must be preliminary declared and compiled by service handlers simpleService_declareRegexes, compileRegexes, and simpleService_mapSubs. The names of the regex and the substitution, and the value are passed in a single argument: the three parts are delimited by bar symbols, e.g. regex|sub|value. The substitution gets applied only to the first occurrence of the match.

This is the core function of the subRegex handler.

subRegexWith Source #

Arguments

:: SubPasteF

Function to paste substitutions

-> ByteString

Keys to find the regex and the sub, and the value

-> IO ByteString 

Pastes functional substitutions using a named regex and a function.

The substitutions get applied only to the first occurrence of the match.

gsubRegex Source #

Arguments

:: ByteString

Keys to find the regex and the sub, and the value

-> IO ByteString 

Pastes a named plain substitution using a named regex.

The same as subRegex except that the substitution gets applied globally, wherever the match occurs.

This is the core function of the gsubRegex handler.

gsubRegexWith Source #

Arguments

:: SubPasteF

Function to paste substitutions

-> ByteString

Keys to find the regex and the sub, and the value

-> IO ByteString 

Pastes functional substitutions using a named regex and a function.

The same as subRegexWith except that the substitutions get applied globally, wherever the match occurs.