Regex to ban clients that issue disconnects too frequently using fail2ban

I am running a small service and I want to use fail2ban to ban clients that issue these disconnects too frequently. I use pm2 as well, so I think I have to point fail2ban at the pm2 logs. What stumps me is how to write the regex for these two log lines:

Code:
2017-11-16 13:59:21+0000 [_GenericHTTPChannelProtocol,3,127.0.0.1] Player disconnected (False, 1006, connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake))
Code:
2017-11-16 13:51:50+0000 [ProxyClient,client] ERROR: *IP_ADDRESS* RETURNED NO RESULT; CONNECTION: CLOSED;

Any help would be great!
 
I think I'm going braindead this week. @doublethinker, could you explain exactly which part of the logs you're trying to write regex for? I've actually been planning on writing a tutorial for BuSo on that very subject! Regex plus advanced text editors like Sublime, Atom, etc. is probably one of the most useful things I ever learned! The stuff it can allow you to accomplish is insane!

As far as more general regex stuff, I use this regex tester occasionally, and it works well. Most of the time I use Sublime though, to manipulate data at scale, but any advanced text editor will do (Atom, Notepad++, etc.).

The thing to do is try and take advantage of regex building blocks where possible (character classes, anchors, groups, etc.). They help make your regex concise, clean, and sometimes easier to reason about. For example, you can use the ^ (caret) to mark the beginning of a line, and the $ sign to mark the end.

Also, say you want to be real specific. Let's say you're trying to pick out very specific URL strings, like a URL that contains geographic data that's always the same format.

Example: /widgets/san-diego-ca/

Sometimes it's tough, especially with URL or string structures that are not well formed. In this case, you could do something like this, looking specifically at those last 2 characters.

Example Regex: ^.*\-[a-z]{2}\/$

In that example:
  1. We chose to reference the beginning of the line (^).
  2. Then we threw in a wildcard for laziness' sake (.*).
  3. We add an escaped dash so the next part will work (\-) and not catch weird stuff like (/san-diego/).
  4. We then use the brackets to specify letters-only, a through z specifically ([a-z]).
  5. After that we add the number 2 in curly braces, to say precisely 2 characters from a through z.
  6. We follow that up with an escaped forward slash, and then finally reference the end of the line with ($).
Escaping is important, and not doing it on certain characters can create entirely different meanings for your regex.
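
To make the walkthrough above concrete, here is a minimal sketch that runs the same pattern through Python's `re` module. The helper name `is_geo_url` and the second test path are made up for illustration; only the pattern and the first path come from the example.

```python
import re

# The example pattern, unchanged, written as a raw string so the
# backslashes reach the regex engine as-is.
GEO_URL = re.compile(r"^.*\-[a-z]{2}\/$")

def is_geo_url(path):
    """Return True if the path ends in a dash, two lowercase letters, and a slash."""
    return GEO_URL.search(path) is not None

print(is_geo_url("/widgets/san-diego-ca/"))  # True
print(is_geo_url("/widgets/san-diego/"))     # False
```

The second path fails because nothing in it is a dash followed by exactly two letters and the closing slash.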

That was just an example. It is just like any programming language (and technically is one I guess), so there's TONS of different ways to write regex that accomplishes the same thing. There's probably even shorter and easier ways to write that same example, but I like to do them very linear like that so they're easier to reason about.
 
I hate regex with a passion, but there's also nothing like it. There is enough documentation for simpler, basic alphanumeric checks and such. But for this particular case, I am trying to flag specific messages in logs. Using the same example above:

Code:
2017-11-16 13:59:21+0000 [_GenericHTTPChannelProtocol,3,127.0.0.1] Player disconnected (False, 1006, connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake))

"Player disconnected (False, 1006, connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake))" is a static output, whereas the timestamp and header may change. So I'm trying to figure out how to get regex to match this exactly.

The second example is a bit more complex, because the static output has a dynamic variable (the IP address) in the middle. This can be changed, but an example showing how this can be done will loosen up the knot of my brain worms a little bit.
 
Code:
2017-11-16 13:59:21+0000 [_GenericHTTPChannelProtocol,3,127.0.0.1] Player disconnected (False, 1006, connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake))

Code:
2017-11-16 13:51:50+0000 [ProxyClient,client] ERROR: *IP_ADDRESS* RETURNED NO RESULT; CONNECTION: CLOSED;

Ex 1)
Code:
^[0-9]{4}-[0-9]{2}-[0-9]{2}.*Player disconnected \(False, 1006, connection was closed uncleanly \(peer dropped the TCP connection without previous WebSocket closing handshake\)\)

This looks for a line starting with the date by looking for 4 digits, a dash, 2 digits, a dash, and 2 digits:
^[0-9]{4}-[0-9]{2}-[0-9]{2}

then it gets lazy with:
.*
which says any character any amount of times until matching:
Player disconnected \(False, 1006, connection was closed uncleanly \(peer dropped the TCP connection without previous WebSocket closing handshake\)\)

The parentheses need to be escaped with a backslash, otherwise they are treated as grouping instead of literal characters.
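
As a quick sanity check of Ex 1, here is a small sketch using Python's `re` module against the sample log line from the question. The function name `matches_disconnect` is made up for illustration, and this only tests the pattern itself, not how fail2ban would apply it.

```python
import re

# Ex 1 pattern, split across lines only for readability.
DISCONNECT = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}.*Player disconnected \(False, 1006, "
    r"connection was closed uncleanly \(peer dropped the TCP connection "
    r"without previous WebSocket closing handshake\)\)"
)

def matches_disconnect(line):
    """Return True if the log line is the unclean-disconnect message."""
    return DISCONNECT.search(line) is not None

sample = (
    "2017-11-16 13:59:21+0000 [_GenericHTTPChannelProtocol,3,127.0.0.1] "
    "Player disconnected (False, 1006, connection was closed uncleanly "
    "(peer dropped the TCP connection without previous WebSocket closing handshake))"
)

print(matches_disconnect(sample))                         # True
print(matches_disconnect("2017-11-16 Player connected"))  # False
```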

Ex 2)
Code:
^[0-9]{4}-[0-9]{2}-[0-9]{2}.*ERROR:.*RETURNED NO RESULT; CONNECTION: CLOSED;
The wildcard will match the IP address and anything else in between.
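
If you also want to pull the address out rather than just skip over it, one option is to swap the middle wildcard for a group. The sketch below does that in plain Python. The group `(\S+)`, the helper name `offending_ip`, and the sample address 203.0.113.7 are my own assumptions, since the original log only shows *IP_ADDRESS* as a placeholder. In a fail2ban failregex, the `<HOST>` tag plays the role of that group.

```python
import re

# Ex 2 with the middle wildcard replaced by a group that grabs the
# non-space token between "ERROR: " and " RETURNED".
CLOSED = re.compile(
    r"^[0-9]{4}-[0-9]{2}-[0-9]{2}.*ERROR: (\S+) RETURNED NO RESULT; CONNECTION: CLOSED;"
)

def offending_ip(line):
    """Return the token in the IP position, or None if the line does not match."""
    m = CLOSED.search(line)
    return m.group(1) if m else None

# Hypothetical sample: the address is made up for illustration.
sample = (
    "2017-11-16 13:51:50+0000 [ProxyClient,client] "
    "ERROR: 203.0.113.7 RETURNED NO RESULT; CONNECTION: CLOSED;"
)

print(offending_ip(sample))  # 203.0.113.7
```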
 
So I'm trying to figure out how to get regex to match this exactly.

Always look for the simplest way to match it, especially if it's coming from a computer. If it comes from a human you have to make it more robust, but otherwise don't overthink it.

/(.*? .*?) \[.*,(.*)\] Player disconnected \(False, 1006, connection was closed uncleanly \(peer dropped the TCP connection without previous WebSocket closing handshake\)\)/
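
Here is a rough sketch of what the two groups in that pattern pick up, using Python's `re` module (the surrounding slashes are delimiters from other regex flavors, so they are dropped here). The helper name `parse_disconnect` is made up for illustration.

```python
import re

# Same pattern without the / delimiters. Group 1 is the timestamp,
# group 2 is whatever follows the last comma inside the brackets.
DISCONNECT = re.compile(
    r"(.*? .*?) \[.*,(.*)\] Player disconnected \(False, 1006, "
    r"connection was closed uncleanly \(peer dropped the TCP connection "
    r"without previous WebSocket closing handshake\)\)"
)

def parse_disconnect(line):
    """Return (timestamp, ip) for a disconnect line, or None if it does not match."""
    m = DISCONNECT.search(line)
    return (m.group(1), m.group(2)) if m else None

sample = (
    "2017-11-16 13:59:21+0000 [_GenericHTTPChannelProtocol,3,127.0.0.1] "
    "Player disconnected (False, 1006, connection was closed uncleanly "
    "(peer dropped the TCP connection without previous WebSocket closing handshake))"
)

print(parse_disconnect(sample))  # ('2017-11-16 13:59:21+0000', '127.0.0.1')
```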

Yes, regex is line noise, but it's definitely the most powerful string manipulation language.
 