So, that's the jumping-off point for this article's scripts: analyzing log files to understand what's going on and why.
To start, a handy check is to see how many processes are running, because my DDOS was characterized by a ridiculous number of comment and search scripts being triggered, hundreds a minute. How to check?
The ps command offers a list of running processes at any given time,
but for many versions, all you see is the Web server "httpd" without any
further details. The -C cmd flag narrows the output to only those
processes whose command name matches cmd, like this:
$ ps -C httpd
PID TTY TIME CMD
20225 ? 00:13:21 httpd
28162 ? 00:00:01 httpd
...
5681 ? 00:00:00 httpd
5683 ? 00:00:00 httpd <defunct>
(Note the "defunct" process that's about to vanish.)
So one easy test is to see how many httpd processes are running:
$ ps -C httpd | wc -l
108
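One wrinkle: ps prints a header line (PID TTY TIME CMD), and wc -l counts it along with the processes, so the figure is one higher than the real process count. Here's a minimal sketch of a header-free count, assuming the procps version of ps found on most Linux systems. The helper name count_procs is made up for illustration:

```shell
#!/bin/sh
# count_procs NAME: print how many processes have the command name NAME.
# "-o pid=" asks ps for just the PID column with an empty header,
# so no header line is printed and wc -l counts processes only.
count_procs() {
    ps -C "$1" -o pid= | wc -l
}

count_procs httpd
```

On the server above, that would report 107 rather than 108.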
That seems like a lot, but this server is hosting several sites, including the
super-busy AskDaveTaylor.com tech-support site, which sees more than 100k hits/day. So
how does this vary over time? Hmm...still working on the command line:
$ while /bin/true
> do
> ps -C httpd | wc -l
> sleep 5
> done
108
107
103
99
94
91
87
84
91
121
120
116
So there's a max of 121 and a min of 84. But, what if I actually want to analyze this and
get min, max and average over a longer period of time?
Here's how I solve it:
#!/bin/sh
# Calculates the number of processes running that match
# a set pattern over time, producing min, max and average.

min=999; max=0; average=0; tally=0; sumtotal=0
pattern="httpd"    # command name passed to ps -C

while true
do
  # wc -l also counts the ps header line, so count is one
  # higher than the number of matching processes.
  count=$(ps -C "$pattern" | wc -l)
  tally=$(( tally + 1 ))
  if [ "$count" -gt "$max" ] ; then
    max=$count
  fi
  if [ "$count" -lt "$min" ] ; then
    min=$count
  fi
  sumtotal=$(( sumtotal + count ))
  average=$(( sumtotal / tally ))
  echo "Current ps count=$count: min=$min, max=$max, tally=$tally and average=$average"
  sleep 5   # seconds
done

exit 0
Notice in the script that I'm not falling into the trap of calculating the
average by keeping a running average and somehow factoring in the latest value as
a diminishing additive. Instead, I use a sumtotal variable that keeps
having the latest process count added. That divided by tally is
always the average (truncated to a whole number, because shell arithmetic is
integer-only), although at some point sumtotal would exceed the largest integer
the shell can hold and start to produce bad results. Shell arithmetic uses at
least a signed long, which is 64 bits on most modern systems, so that should
take a very long while. (And the quantum, the period of time between
iterations, also can be adjusted. Five seconds might be too fine-grained for a
process that's going to be run for hours or even days.)
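To see the difference in isolation, here's a small sketch that reads counts from standard input instead of from ps. It computes the sum-based average alongside one naive running average (averaging the old average with each new value), which is only an illustration of how such shortcuts can drift. The function name summarize and the naive variable are made up for this example:

```shell
#!/bin/sh
# summarize: read one integer count per line on stdin and print
# min, max, sample count, the sum-based average and a naive
# "average of the old average and the new value" for comparison.
summarize() {
    min=""; max=0; tally=0; sumtotal=0; naive=0
    while read -r count
    do
        tally=$(( tally + 1 ))
        sumtotal=$(( sumtotal + count ))
        # The first sample seeds min; later samples can only lower it.
        if [ -z "$min" ] || [ "$count" -lt "$min" ] ; then
            min=$count
        fi
        if [ "$count" -gt "$max" ] ; then
            max=$count
        fi
        # Naive running average: weights recent samples too heavily.
        if [ "$tally" -eq 1 ] ; then
            naive=$count
        else
            naive=$(( (naive + count) / 2 ))
        fi
    done
    if [ "$tally" -eq 0 ] ; then
        echo "no samples"
        return 1
    fi
    echo "min=$min max=$max tally=$tally average=$(( sumtotal / tally )) naive=$naive"
}

printf '%s\n' 132 128 124 123 | summarize
# prints: min=123 max=132 tally=4 average=126 naive=125
```

With just four samples, the naive figure is already off by one from the sum-based average, and the gap tends to grow as older samples get discounted.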
The following are the first few lines of output from the script. Notice how the
min and max vary as the different values are calculated:
$ sh processes.sh
Current ps count=132: min=132, max=132, tally=1 and average=132
Current ps count=128: min=128, max=132, tally=2 and average=130
Current ps count=124: min=124, max=132, tally=3 and average=128
Current ps count=123: min=123, max=132, tally=4 and average=126
If I let the script run for a longer period of time, the values become a bit more
varied:
Current ps count=90: min=76, max=150, tally=70 and average=107
During the 15 minutes or so that I ran the script, an average of 107
"httpd" processes were running, with a minimum of 76 and a max of 150.
Armed with that information, another script could keep an eye on things via a cron job, like this:
#!/bin/sh
# DDOS - keep an eye on process count to
# detect a blossoming DDOS attack

pattern="httpd"
max=200    # avoid false positives
admin="d1taylor@gmail.com"

# Note: wc -l also counts the ps header line.
count="$(ps -C "$pattern" | wc -l)"

if [ "$count" -gt "$max" ] ; then
  echo "Warning: DDOS in process? Current httpd count = $count" | sendmail "$admin"
fi

exit 0
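For the cron side, an entry along these lines would run the check every five minutes. The script path is hypothetical, and the */5 step syntax assumes a Vixie-style cron such as the one shipped with most Linux distributions:

```
# m   h  dom mon dow  command
*/5   *  *   *   *    /usr/local/bin/ddos-check.sh
```

An entry like this can be added with crontab -e, and the script then reports on the process count as of each run.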
That's a superficial solution, however, and it has two problems:
1) what I'd really like is to be able to identify the potential DDOS based
on process count and watch to see if it's sustained over the next few
invocations of the script, and 2) once it's triggered, if it is a DDOS, in
addition to everything else, I'll also start drowning in e-mail from this
script saying essentially the same thing each time. Not good.
What the script needs is contextual memory so it can differentiate between a sudden spike in traffic and a persistent DDOS attack. In the former case, the script might trigger positive, then the next time it runs, it's all within acceptable limits again. In the latter case, once the attack starts, it'll probably just accelerate.
That's the opposite of the e-mail non-repeat condition though, because in the latter case, I want to know that the e-mail has been sent and not send it again within, say, a 60-minute window.
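As a rough preview of what that memory might look like, here's a minimal sketch that keeps two small state files between runs: one counting consecutive over-threshold readings, and one recording when the last alert went out. Everything here is an assumption for illustration: the function name check_count, the state directory, the file names and the three-in-a-row rule are all made up, not a finished design.

```shell
#!/bin/sh
# Sketch only: state lives in two small files (hypothetical names).
STATE_DIR="${STATE_DIR:-/tmp/ddos-watch}"
THRESHOLD=200    # process count treated as suspicious
SUSTAIN=3        # consecutive high readings before alerting
COOLDOWN=3600    # minimum seconds between alert e-mails

# check_count COUNT NOW: record this reading and print one of
# "ok", "watching", "alert" or "suppressed". NOW is in epoch seconds.
check_count() {
    count=$1; now=$2
    mkdir -p "$STATE_DIR"
    streak_file="$STATE_DIR/streak"
    sent_file="$STATE_DIR/last_sent"

    # Back within limits: it was only a spike, so reset the streak.
    if [ "$count" -le "$THRESHOLD" ] ; then
        echo 0 > "$streak_file"
        echo "ok"
        return 0
    fi

    # Over the limit: extend the streak of high readings.
    streak=0
    if [ -f "$streak_file" ] ; then
        streak=$(cat "$streak_file")
    fi
    streak=$(( streak + 1 ))
    echo "$streak" > "$streak_file"

    if [ "$streak" -lt "$SUSTAIN" ] ; then
        echo "watching"
        return 0
    fi

    # Sustained: alert, unless one went out within the cooldown window.
    if [ -f "$sent_file" ] ; then
        last_sent=$(cat "$sent_file")
        if [ $(( now - last_sent )) -lt "$COOLDOWN" ] ; then
            echo "suppressed"
            return 0
        fi
    fi
    echo "$now" > "$sent_file"
    echo "alert"
}

# Example use from the cron script (not run here; "date +%s" is
# widely supported but not required by POSIX):
#   if [ "$(check_count "$count" "$(date +%s)")" = "alert" ] ; then
#       echo "Warning: sustained high httpd count" | sendmail "$admin"
#   fi
```

A one-off spike prints "watching" and then "ok" on the next run, while a sustained climb produces one "alert" followed by "suppressed" until the 60-minute window passes.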
I'll dig in to both of those situations another time. For now, I need to get back to my server and keep bringing things back on-line, program by program, to try to avoid any problems. Stay tuned!