Copyright | (c) Alexey Radkov 2021-2024 |
---|---|
License | BSD-style |
Maintainer | alexey.radkov@gmail.com |
Stability | stable |
Portability | non-portable (requires Template Haskell) |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
PCRE matching and substitution from the more extra tools collection for nginx-haskell-module.
Synopsis
- matchRegex :: ByteString -> IO ByteString
- type SubPasteF = ByteString -> [ByteString] -> ByteString
- subRegex :: ByteString -> IO ByteString
- subRegexWith :: SubPasteF -> ByteString -> IO ByteString
- gsubRegex :: ByteString -> IO ByteString
- gsubRegexWith :: SubPasteF -> ByteString -> IO ByteString
Matching against regular expressions
This module provides a simple handler matchRegex to match a value
against a PCRE regex preliminary declared and compiled in
configuration service simpleService_declareRegexes (which is an
ignitionService
in terms of module NgxExport.Tools.SplitService) and the
corresponding service update hook (in terms of module NgxExport)
compileRegexes at the start of the service.
Below is a simple example.
File test_tools_extra_pcre.hs
module TestToolsExtraPCRE where import NgxExport.Tools.PCRE ()
The file does not contain any significant declarations as we are going to use only the exporters of the handlers.
File nginx.conf
user nobody; worker_processes 2; events { worker_connections 1024; } http { default_type application/octet-stream; sendfile on; haskell load /var/lib/nginx/test_tools_extra_pcre.so; haskell_run_service simpleService_declareRegexes $hs_regexes '[("userArea", "(?:\\\\|)(\\\\d+)$", "") ,("keyValue", "(k\\\\w+)(\\\\|)(v\\\\w+)", "i") ]'; haskell_service_update_hook compileRegexes $hs_regexes; server { listen 8010; server_name main; error_log /tmp/nginx-test-haskell-error.log; access_log /tmp/nginx-test-haskell-access.log; location / { haskell_run matchRegex $hs_user_area 'userArea|$arg_user'; rewrite ^ /internal/user/area/$hs_user_area last; } location ~ ^/internal/user/area/(PCRE ERROR:.*) { internal; echo_status 404; echo "Bad input: $1"; } location = /internal/user/area/ { internal; echo_status 404; echo "No user area attached"; } location ~ ^/internal/user/area/(.+) { internal; echo "User area: $1"; } } }
In this example, we expect requests with argument user which should
supposedly be tagged with an area code containing digits only. The user
value should match against regex userArea declared alongside with another
regex keyValue (the latter has an option i which corresponds to
caseless
; the regex compiler has also support for options s and m which
correspond to dotall
and multiline
respectively). Notice that regex
declarations require 4-fold backslashes as they are getting shrunk while
interpreted sequentially by the Nginx configuration interpreter and then by
the Haskell compiler too.
Handler matchRegex finds the named regex userArea from the beginning of its argument: the second part of the argument is delimited by a bar symbol and contains the value to match against. If the regex contains captures, then the matched value shall correspond to the contents of the first capture (in case of userArea, this is the area code), otherwise it must correspond to the whole matched value.
A simple test
$ curl 'http://localhost:8010/' No user area attached $ curl 'http://localhost:8010/?user=peter|98' User area: 98 $ curl 'http://localhost:8010/?user=peter|98i' No user area attached
:: ByteString | Key to find the regex, and the value |
-> IO ByteString |
Matches a value against a named regex.
The regex must be preliminary declared and compiled by service handlers simpleService_declareRegexes and compileRegexes. The name of the regex and the value are passed in a single argument: the two parts are delimited by the first bar symbol met from the left, e.g. key|value.
This is the core function of the matchRegex handler.
Substitution with regular expressions
There are handlers to make substitutions using PCRE regexes. An
ignitionService
simpleService_mapSubs declares named plain
substitutions which are made in run-time by handlers subRegex and
gsubRegex. Functions subRegexWith
and gsubRegexWith
make it
possible to write custom functional substitutions.
Let's extend our example by adding ability to erase the captured area code. We also going to implement a functional substitution to swap the keys and the values matched in the keyValue regex.
File test_tools_extra_pcre.hs
{-# LANGUAGE TemplateHaskell, LambdaCase #-} module TestToolsExtraPCRE where import NgxExport import NgxExport.Tools.PCRE import Data.ByteString (ByteString) import qualified Data.ByteString as B import qualified Data.ByteString.Lazy as L gsubSwapAround :: ByteString -> IO L.ByteString gsubSwapAround =gsubRegexWith
$ const $ \case a : d : b : _ -> B.concat [b, d, a] _ -> B.emptyngxExportIOYY
'gsubSwapAround
Functional substitution handler gsubSwapAround expects a regular expression with at least 3 capture groups to swap the contents of the first and the third groups around. We are going to apply this handler against regex keyValue.
File nginx.conf: erase area code and swap keys and values
haskell_run_service simpleService_mapSubs $hs_subs '[("erase", "")]'; haskell_var_empty_on_error $hs_kv;
location /erase/area { haskell_run subRegex $hs_user_no_area 'userArea|erase|$arg_user'; rewrite ^ /internal/user/noarea/$hs_user_no_area last; } location ~ ^/internal/user/noarea/(PCRE\ ERROR:.*) { internal; echo_status 404; echo "Bad input: $1"; } location ~ ^/internal/user/noarea/(.*) { internal; echo "User without area: $1"; } location /swap { haskell_run gsubSwapAround $hs_kv 'keyValue|$arg_kv'; echo "Swap $arg_kv = $hs_kv"; }
Service simpleService_mapSubs declares a list of named plain substitutions. In this example, it declares only one substitution erase which substitutes an empty string, i.e. erases the matched text. Notice that the argument of handler subRequest requires three parts delimited by bar symbols: the named regex, the named substitution, and the value to match against.
A simple test
$ curl 'http://localhost:8010/erase/area?user=peter|98' User without area: peter $ curl 'http://localhost:8010/swap?kv=kid|v0012a' Swap kid|v0012a = v0012a|kid
= ByteString | The full match |
-> [ByteString] | List of captures |
-> ByteString |
Type of functions to perform functional substitutions.
:: ByteString | Keys to find the regex and the sub, and the value |
-> IO ByteString |
Pastes a named plain substitution using a named regex.
The substitution and the regex must be preliminary declared and compiled by service handlers simpleService_declareRegexes, compileRegexes, and simpleService_mapSubs. The names of the regex and the substitution, and the value are passed in a single argument: the three parts are delimited by bar symbols, e.g. regex|sub|value. The substitution gets applied only to the first occurrence of the match.
This is the core function of the subRegex handler.
:: SubPasteF | Function to paste substitutions |
-> ByteString | Keys to find the regex and the sub, and the value |
-> IO ByteString |
Pastes functional substitutions using a named regex and a function.
The substitutions get applied only to the first occurrence of the match.
:: ByteString | Keys to find the regex and the sub, and the value |
-> IO ByteString |
Pastes a named plain substitution using a named regex.
The same as subRegex
except that the substitution gets applied
globally, wherever the match occurs.
This is the core function of the gsubRegex handler.
:: SubPasteF | Function to paste substitutions |
-> ByteString | Keys to find the regex and the sub, and the value |
-> IO ByteString |
Pastes functional substitutions using a named regex and a function.
The same as subRegexWith
except that the substitutions get applied
globally, wherever the match occurs.