DOC: more info on developing filters

author: Daniel Black <grooverdan@users.sourceforge.net> 2013-09-19 13:10:21 +1000
committer: Daniel Black <grooverdan@users.sourceforge.net> 2013-09-19 13:10:21 +1000
commit: 3fac971a5a156f277b4ed25193fbe79ebc14e701 (patch)
tree: 3c08ab22d3d61542ef4707db734b1fffb229ffb7 /DEVELOP
parent: 596abde7126a46bf8ee3b0174fd1aba3b3609a14 (diff)
download: fail2ban-3fac971a5a156f277b4ed25193fbe79ebc14e701.tar.gz
1 files changed, 247 insertions, 17 deletions
diff --git a/DEVELOP b/DEVELOP
index 61248393..74bd8ff3 100644
--- a/DEVELOP
+++ b/DEVELOP
@@ -37,19 +37,258 @@ When submitting pull requests on GitHub we ask you to:
 Filters
 =======
 
-* Include sample logs with 1.2.3.4 used for IP addresses and 
-  example.com/example.org used for DNS names
-* Ensure sample log is provided in testcases/files/logs/ with same name as the
-  filter. Each log line should include match meta data for time & IP above
-  every line (see other sample log files for examples)
+Filters are tricky. They need to:
+* work with a variety of versions of the software that generates the logs;
+* work with the range of logging configuration options available in the
+  software;
+* work in multiple operating systems;
+* not make assumptions about the log format in excess of the software;
+* make assumptions as to how future versions of the software will log messages;
+* not be susceptible to DoS vulnerabilities; and
+* match intended log lines only.
+
+Please follow the steps from Filter Test Cases to Developing Filter Regular
+Expressions and submit a GitHub pull request afterward. If you get stuck,
+create a GitHub issue with what you have done and we'll attempt to help.
+
+Filter test cases
+-----------------
+
+Purpose:
+
+Start by finding the log messages that the application generates related to
+some form of authentication failure. If you are adding to an existing filter,
+think about whether the log messages are of a similar importance and purpose
+to the existing filter. If you were a user of fail2ban, and did a package
+update of fail2ban that started matching the new log messages, would anything
+unexpected happen? Would the bantime/findtime for the jail be appropriate for
+the new log messages? If not, perhaps they need to be in a separate
+filter definition; for example, exim contains authentication failures and
+exim-spam contains log messages related to spam.
+
+Even if it's a new filter you may consider separating the log messages into
+different filters based on purpose.
+
+Cause:
+
+Are some of the log lines a result of the same action? For example, is a PAM
+failure log message, followed by an application-specific failure message, the
+result of the same user/script action? If so, and you add regular
+expressions for both, you'll end up with two failures for a single action.
+Select the most appropriate log message and document the other log message with
+a test case not to match it and a description as to why you chose one over
+the other.
+
+With the log lines selected, consider what occurred to generate those log
+messages and whether they could have been generated by accidental means. Could
+the log message always occur because it is the first step towards the
+application asking for authentication? Could the log messages occur often? If
+some of these are true, make a note of this in the jail.conf example you provide.
+
+Samples:
+
+It's important to include log file samples so that any future change to the
+regular expression can be checked against the log lines you have identified.
+
+The sample log messages are provided in testcases/files/logs/ with the same name
+as the filter. Each log line should include failJSON metadata (so the log
+lines are tested in the test suite) directly above the log line. If there is
+any specific information about the log message, such as version or an
+application configuration option that is needed for the message to occur,
+include this in a comment (line beginning with #) above the failJSON metadata.
+
+Log samples should include only one, and definitely not more than 3, examples
+of log messages of the same form. If log messages differ between versions of
+the application, including log messages that show this is encouraged.
+
+If the mechanism to create the log message isn't obvious, provide a
+configuration and/or sample scripts in testcases/files/config/{filtername} and
+reference these in the comments above the log line.
+
+FailJSON metadata:
+
+The failJSON metadata is a comment immediately above the log message. It will
+look like:
+
+# failJSON: { "time": "2013-06-10T10:10:59", "match": true , "host": "193.169.56.211" }
+
+Time should match the time of the log message. It is in the specific format
+Year-Month-Day'T'Hour:Minute:Second. If your log message does not include a
+year, like the example below, the year will be 2005 if before Sun Aug 14 10am
+UTC, and 2004 if afterwards.
+
+# failJSON: { "time": "2005-03-24T15:25:51", "match": true , "host": "198.51.100.87" }
+Mar 24 15:25:51 buffalo1 dropbear[4092]: bad password attempt for 'root' from 198.51.100.87:5543
+
+The host will contain the IP or domain that should be blocked.
+
+For log lines that you don't want matched, like log injection vulnerabilities
+and log lines excluded (see "Cause" section above), put "match": false in the
+failJSON and the reason why in the comment above.
+
+After developing the regexs, the following command will test all the failJSON
+metadata against the log lines:
+
+./fail2ban-testcases testSampleRegex
+
+Developing Filter Regular Expressions
+-------------------------------------
+
+Date/Time:
+
+The first step in checking that your log line can have a filter is to check that
+the time format matches an existing regex. To test this, copy the time component
+from the log line and append an IP address. Then test it with:
+
+./fail2ban-regex "2013-09-19 02:46:12 1.2.3.4" "<HOST>"
+
+The output from this should include something like:
+
+Date template hits:
+|- [# of hits] date format
+|  [1] Year-Month-Day Hour:Minute:Second
+
+Ensure that the template description matches the parts of your time format. If
+there isn't a match, a format and date regex can be added to
+server/datedetector.py. Ensure this is added in an order that makes
+more specific matches occur first and that there is no confusion as to which
+is the day and which is the month.
+
+Filter file:
+
+The filter file is in config/filter.d/{filtername}.conf. The format of the
+filter file has two sections, INCLUDES and Definition, as follows:
+
+[INCLUDES]
+
+before = common.conf
+
+after = filtername.local
+
+[Definition]
+
+failregex = ....
+
+ignoreregex = ....
+
+This is also documented in the jail.conf man page (section 5). Other
+definitions can be added to make failregexes more readable and maintainable.
+
+
+General rules:
+
+Use "before" if you need to include a common set of rules, like syslog or if
+there's a common set of regexs for multiple filters.
+
+Use "after" if you wish to allow the user to override a set of customisations
+of the current filter. This file doesn't need to exist.
+
+Try to avoid using ignoreregex, mainly for performance reasons. The case where
+you would use it is when avoiding ignoreregex would leave you with
+an unreadable failregex.
+
+Syslog:
+
+If your application logs to syslog you can use the following to capture the
+syslog prefix. As a base use:
+
+[INCLUDES]
+
+before = common.conf
+
+[Definition]
+
+_daemon = app
+
+failregex = ^%(__prefix_line)s
+
+In this example common.conf defines __prefix_line, which also contains the
+_daemon name (in syslog terms, the service) you specified. _daemon can also be
+a regex.
+
+For example, the following log line is matched with _daemon set to "dovecot":
+
+Dec 12 11:19:11 dunnart dovecot: pop3-login: Aborted login (tried to use disabled plaintext auth): rip=190.210.136.21, lip=113.212.99.193
+
+So now ^%(__prefix_line)s matches "Dec 12 11:19:11 dunnart dovecot: ". Note it
+matches the trailing space. A regex with a space after ^%(__prefix_line)s
+will therefore probably not match.
+
+Substitutions:
+
+Substitutions are what the syslog example above uses. A %(_name)s in a regex
+substitutes the _name definition into the regex. They are useful for making the
+regexes more readable and for defining regex parts that occur in multiple log lines.
+
+Regular Expressions:
+
+The regular expression you will be writing will assume that the date/time has
+been removed from the log line because this is how fail2ban works internally.
+
+If the format is like '<date...> error 1.2.3.4 is evil' then you will need to
+match the < at the start, so use a regex like '^<> error <HOST> is evil$'.
+
+Use <HOST> where the IP/domain name appears in the log line.
+
+The following general rules apply to regular expressions:
+
 * Ensure regexs start with a ^ and are restrictive as possible. E.g. not .* if
   \d+ is sufficient
 * Use the functionality of regexs http://docs.python.org/2/library/re.html
-* Take a look at the source code of the application. You may see optional or
-  extra log messages, or parts there of, that need to form part of your regex.
+* Try to make the regular expression readable (as much as possible). E.g.
+  (?:...) represents a non-capturing group but (...) is more readable.
 
 If you only have a basic knowledge of regular repressions read
-http://docs.python.org/2/library/re.html first.
+http://docs.python.org/2/library/re.html first. Really. It doesn't take long
+and will remind you which bits you need to escape and which bits you don't.
+
+Developing/testing the regex:
+
+You can develop the regex in the file or on the command line depending on your
+preference. You can also use the samples you've created in the test cases or
+test them one at a time.
+
+The general tool is fail2ban-regex. To see how to use it run:
+
+./fail2ban-regex  --help
+
+Take note of -l heavydebug / -l debug and -v as they will be most useful.
+
+TIP: Take a look at the source code of the application. You may see optional or
+  extra log messages, or parts thereof, that need to form part of your regex.
+  It may also show how some parts are constrained and different formats
+  depending on configuration or less common usages.
+
+TIP: Some applications log spaces at the end. If you're not sure add \s*$ as the
+     end part of the regex.
+
+If your regex isn't matching take a look at http://www.debuggex.com/.
+
+Use the regex from the ./fail2ban-regex output (to ensure all substitutions
+are done), with <HOST> replaced with (?&.ipv4). Set the regex type to
+Python.
+
+For the test data put your log output with the time removed.
+
+When you've fixed the regex put it back into your filter file.
+
+Please give a donation to stoarca for debuggex. It's a great tool, isn't it?
+
+Finishing up:
+
+If you've created a new filter, add an entry in config/jail.conf. The theory
+here is that a user will create a jail.local with [filtername]\nenabled=true.
+
+So more specifically in the [filter] section in jail.conf:
+* Ensure that you have "enabled = false"; we want people to enable as needed.
+* Use "filter =" set to your filter name.
+* Use an action to disable ports associated with the application.
+* Set "logpath" to a usual location for the log file for the application.
+* If the default findtime or bantime isn't appropriate to the filter, set a value
+  that is more appropriate.
+
+Send the fail2ban project a pull request (see "Pull Requests" above) containing
+your great work.
 
 Filter Security
 ---------------
@@ -63,8 +302,6 @@ ability to deny any host they choose.
 So the <HOST> part must be anchored on text generated by the application, and not
 the user, to a sufficient extent that the user cannot insert the entire text.
 
-Filters are matched against the log line with their date removed.
-
 Ideally filter regex should anchor to the beginning and end of the log line
 however as more applications log at the beginning than the end, achoring the
 beginning is more important. If the log file used by the application is shared
@@ -73,13 +310,6 @@ use that log file do not log user generated text at the beginning of the line,
 or, if they do, ensure the regexs of the filter are sufficient to mitigate the
 risk of insertion.
 
-When creating a regex that extends back to the begining remember the date part
-has been removed within fail2ban so theres no need to match that. If the format
-is like '<date...> error 1.2.3.4 is evil' then you will need to match the < at
-the start so here the regex would start like '^<> <HOST> is evil$'.
-
-Some applications log spaces at the end. If you're not sure add \s*$ as the
-end part of the regex.
 
 Examples of poor filters
 ------------------------
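
The anchoring advice in the "Regular Expressions" and "Filter Security" text can be tried out with plain Python `re`. This is only a minimal sketch and not how fail2ban itself is implemented: `HOST_GROUP` is a simplified IPv4-only stand-in for the `<HOST>` tag, and `compile_failregex`, `find_host` and the sample lines are hypothetical names made up for illustration. It compares a regex anchored at both ends (with `\s*$` to tolerate trailing spaces) against an unanchored one when a user-controlled field contains the matched text.

```python
import re

# Assumption: a simplified IPv4-only stand-in for fail2ban's <HOST> tag.
HOST_GROUP = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"


def compile_failregex(failregex):
    """Expand <HOST> into the stand-in group and compile the result."""
    return re.compile(failregex.replace("<HOST>", HOST_GROUP))


def find_host(failregex, line):
    """Return the captured host, or None when the line does not match."""
    match = compile_failregex(failregex).search(line)
    return match.group("host") if match else None


# Log lines as the regex sees them, i.e. with the date already removed.
genuine = "<> error 1.2.3.4 is evil"
trailing = "<> error 1.2.3.4 is evil  "
# A user-controlled field that happens to contain the matched text.
injected = "<> user said: 5.6.7.8 is evil"

anchored = r"^<> error <HOST> is evil\s*$"
loose = r"<HOST> is evil"

print(find_host(anchored, genuine))   # 1.2.3.4
print(find_host(anchored, trailing))  # 1.2.3.4
print(find_host(anchored, injected))  # None
print(find_host(loose, injected))     # 5.6.7.8
```

The loose regex lets whoever controls the injected text choose which address gets banned, which is the risk the anchoring rules are meant to mitigate.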
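
The failJSON layout described in the patch (a `# failJSON:` comment directly above each sample log line) can be sanity-checked before running the test suite. The following is a rough sketch under stated assumptions, not the test suite's own parser: `parse_samples` and `PREFIX` are hypothetical names, and the first comment line in `sample_file` is invented for illustration. The metadata and log line are the dropbear sample from the patch.

```python
import json
from datetime import datetime

PREFIX = "# failJSON:"


def parse_samples(lines):
    """Pair each failJSON comment with the log line directly below it."""
    samples = []
    pending = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(PREFIX):
            # Everything after the prefix is expected to be a JSON object.
            pending = json.loads(line[len(PREFIX):])
        elif pending is not None and line and not line.startswith("#"):
            samples.append((pending, line))
            pending = None
    return samples


sample_file = [
    "# Hypothetical comment describing the sample",
    '# failJSON: { "time": "2005-03-24T15:25:51", "match": true , "host": "198.51.100.87" }',
    "Mar 24 15:25:51 buffalo1 dropbear[4092]: bad password attempt for 'root' from 198.51.100.87:5543",
]

for meta, log_line in parse_samples(sample_file):
    # Raises ValueError if the time is not Year-Month-Day'T'Hour:Minute:Second.
    when = datetime.strptime(meta["time"], "%Y-%m-%dT%H:%M:%S")
    print(when.year, meta["match"], meta["host"] in log_line)
```

A check like this only catches malformed metadata; whether the failregex actually matches is still decided by ./fail2ban-testcases testSampleRegex.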