Showing posts with label mrtg. Show all posts

21 February, 2012

bugs that never go away

I had a problem trying to get mrtg to read ssCpuRawUser.0 and related SNMP variables. It was giving some error, even though it can read various other SNMP variables fine. I googled. I got a few pages of people going all the way back to 2001 with the same error. No solutions. There's even a bug where the mrtg author WORKSFORMEs it. Fuck this. I think there's an XKCD about this. So, I can read the variable with the net-snmp command-line tool and feed it in that way. Yay for being able to write my own plugins.

#!/bin/bash
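# $1 = name1 (first SNMP variable)
# $2 = name2 (second SNMP variable)
# $3 = host
# snmpget prints a line ending in "Counter32: <value>"; sed keeps only the value
snmpget -m + -c public -v 1 "$3" "$1" | sed 's/^.*Counter32: //'
snmpget -m + -c public -v 1 "$3" "$2" | sed 's/^.*Counter32: //'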

02 July, 2011

mrtg user counts

I had a graph in mrtg of two variables: number of login sessions (the blue line, the count of users from the unix users command), and number of unique users logged in (the green solid area - formed by deduping the output of users).

I decided that the number of login sessions was a bit useless especially with people using screen. A more interesting graph is perhaps something else based on the number of users doing things rather than the number of things a user is doing. So I chose to count the number of (normal, not system) users who have at least one process running.

I count it this way:

#!/bin/bash

# first, unique users logged in
echo $(users | sed 's/ /\n/g' | sort | uniq | wc -l)

# number of users who have 3xxx user numbers and have a process running
# 3xxx is the "normal user" range, though perhaps I could distinguish in other ways
# the pattern is anchored so only four-digit UIDs starting with 3 match (a bare '3...' also matches e.g. 13000)
U=$(ps -A --format user= | sort | uniq | while read a ; do id -u "$a" ; done | grep -E '^3[0-9]{3}$' | wc -l)

echo $U
echo 0
echo 0


and here's the output: (green is logged-in users, blue is users with processes running)



If you click through to the full page, you might see the change happening in the middle of week 17 of 2011, which is the beginning of May.
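The "3xxx" filter in the script above can also be written as a numeric range check. This is only a sketch: count_normal_uids is a made-up helper name, the UIDs in the example are invented, and the 3000-3999 range is just the convention on my hosts.

```shell
#!/bin/bash
# count_normal_uids is a hypothetical helper: it reads one numeric UID per line
# on stdin and prints how many distinct UIDs fall in the range 3000-3999.
count_normal_uids() {
  awk '$1 >= 3000 && $1 <= 3999 && !seen[$1]++ { n++ } END { printf "%d\n", n }'
}

# Canned example (made-up UIDs): root, two processes owned by 3001, 3042, and 13000.
printf '%s\n' 0 3001 3001 3042 13000 | count_normal_uids

# On a live system the input could come straight from ps, for example:
#   ps -A --format uid= | count_normal_uids
```

Asking ps for numeric UIDs avoids the id lookup for each user name, at the cost of assuming your ps supports the uid format specifier.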

21 May, 2011

ruby cloudwatch -> mrtg interface

I use mrtg to gather historical data on some of my servers. One of those servers lives in Amazon's Elastic Compute Cloud (EC2) and so is also monitored by Amazon CloudWatch.

Can I get cloudwatch data into mrtg?

mrtg has a fairly straightforward interface for plugging in arbitrary unix executables to collect data, so my first attempt was to use the main Java-based cloudwatch client to get data. That attempt started up one JVM for each metric collected, which massively overloaded my EC2 micro instance, keeping the load average around 3. Pretty lame.
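For reference, the plugin interface mentioned above boils down to printing four lines, as I understand the MRTG documentation: the first value, the second value, an uptime string and a target name. A minimal sketch of that shape follows; mrtg_report is a made-up helper name and the values are invented.

```shell
#!/bin/bash
# mrtg_report is a hypothetical helper that prints the four lines MRTG reads
# from an external command: first value, second value, uptime, target name.
# The uptime defaults to 0 and the name to "unknown" if they are not given.
mrtg_report() {
  printf '%s\n' "$1" "$2" "${3:-0}" "${4:-unknown}"
}

# Example with made-up values.
mrtg_report 1234 5678 "3 days" "example target"
```

Any executable that prints those four lines can be used as a Target in backticks, whatever language it is written in.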

Amazon also provides a ruby interface. I had never programmed in ruby before, but it's often interesting to learn a new language.

Here's what I ended up with.

First the config block for mrtg, which calls out to the ruby-mrtg-cloudwatch3 program that I wrote:

Target[cloudwatch_network]: `/home/mrtg/ruby-mrtg-cloudwatch/ruby-mrtg-cloudwatch3 NetworkIn NetworkOut AWS/EC2 InstanceId=i-26bcaf51`
Title[cloudwatch_network]: Network traffic according to cloudwatch
options[cloudwatch_network]: growright,absolute,logscale
MaxBytes[cloudwatch_network]: 100000000

This gives a graph of network traffic according to cloudwatch. I can compare that alongside the network traffic graph for eth0 gathered from the local interface statistics. They should roughly match up, and they do (well hopefully they still do by the time you read this - these are live images):

According to the on-host network interface:


According to cloudwatch:


Now the actual ruby code:

#!/usr/bin/ruby1.8

require 'rubygems'
require 'time' # for Time.parse, used below
require 'AWS'

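The two cloudwatch metric names are given on the command line. MRTG's first value is its input variable, so with the config above metrico actually holds NetworkIn and metrici holds NetworkOut (the variable names are the wrong way round, but the values land in the right places):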
metrico=ARGV[0]
metrici=ARGV[1]

My code has hardcoded access keys at the moment which is a bit shitty:
ACCESS_KEY_ID='foo'
SECRET_ACCESS_KEY='bar'


Using the above credentials, a new cloudwatch object is made, @cw.

@cw = AWS::Cloudwatch::Base.new(:access_key_id => ACCESS_KEY_ID, :secret_access_key => SECRET_ACCESS_KEY, :server => "eu-west-1.monitoring.amazonaws.com" )

Each of the two metrics will be probed with the probe function. This uses a state file based on the metric name to get only readings which have not already been seen by this script. The two metrics use separate state files because cloudwatch doesn't give an atomic read for multiple metrics at once. The state file stores the time of the last seen reading. If there is no state file, we have to invent a time. There is a subtlety here: data does not appear in cloudwatch until around 5 minutes after its time stamp, so using the current time as an initial value results in not seeing any results. Instead, I go back 15 minutes (900 seconds) the first time, which seems to be far enough back to get something.

def probe(metric)

  et = Time.now()

  statusfn="cloudwatch-"+ARGV[3]+"-"+metric+".status"
  if FileTest.exist?(statusfn) then

    f = File.new(statusfn, "r")
    tstring = f.gets
    ts = Time.parse(tstring)
    f.close
  else
    ts = et - 900 # needs to be more than 5 mins because otherwise we never get any data.
  end

  res = @cw.get_metric_statistics(:measure_name => metric,  :statistics => 'Average,Sum', :namespace => ARGV[2], :period => 300, :start_time => ts, :end_time => et, :dimensions=> ARGV[3])


Now we're going to look at the rows that come back. Usually only one row will come back, if we're running this at about the same rate that cloudwatch is adding readings, but sometimes there will be more, or fewer.

In the case of network traffic, I want to return the sum of all readings for this metric. In other cases, such as disk usage, I would want to return the mean. This distinction is the same as default vs gauge measurements in MRTG.

  samples = 0
  sum = 0
  avgsum = 0

  datapoints = res["GetMetricStatisticsResult"]["Datapoints"]

  lt = ts
  if datapoints.nil? then
   # nop
  else
    rows = datapoints["member"]

    rows.each { |r|
      nlt = Time.parse(r["Timestamp"])
      if(nlt < ts) then
        # nop - time was before requested start
      else
        samples += Float(r["Samples"])
        avgsum += Float(r["Average"])
        sum += Float(r["Sum"])
        nlt += 1
        if(nlt > lt) then
          lt = nlt
        end
      end
    } 

Now we can write out the new state file:

    f=File.new(statusfn, "w")
    f.puts(lt)
    f.close
  end
  return sum
end


and finally output the MRTG format information:

sumo=probe(metrico)
sumi=probe(metrici)

# output mrtg format
puts sumo
puts sumi
puts 0
puts "cloudwatch: "+metrico+" and "+metrici
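The sum-versus-mean distinction mentioned earlier is easy to see outside the Ruby script. This is a small sketch on made-up readings; readings_sum and readings_mean are hypothetical helper names, not part of the program above.

```shell
#!/bin/bash
# Both helpers are hypothetical; each reads one numeric reading per line on stdin.

# Total of all readings: suits traffic-style metrics such as bytes transferred.
readings_sum() {
  awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Mean of all readings: suits gauge-style metrics such as disk usage.
# Prints 0 when there are no readings at all.
readings_mean() {
  awk '{ s += $1; n++ } END { if (n) print s / n; else print 0 }'
}

# Three made-up readings collected since the last run.
printf '%s\n' 100 200 300 | readings_sum
printf '%s\n' 100 200 300 | readings_mean
```

The probe function above only returns the sum, but it already accumulates avgsum and samples, which is the raw material a gauge-style variant would need.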


The end.

06 February, 2011

Using mrtg and iptables to record IPv4 vs IPv6 traffic

I wanted to plot IPv6 vs IPv4 traffic on my hosts - I have most services enabled for IPv6 but I know they don't get used much. I had MRTG already.

iptables on Linux counts bytes that pass through it, even if there are no iptables rules:

$ iptables -L -v -x
Chain INPUT (policy ACCEPT 6079694 packets, 2474020715 bytes)
[...]

That counts ipv4 packets. The ipv6 equivalent is ip6tables.

So without needing to add any iptables rules at all, I can feed this output to mrtg with a script as follows, which outputs IPv4 traffic (summed over all three chains) as the first (input) variable, and IPv6 as the second (output) variable.


#!/bin/bash
A=0
IP4=$(/sbin/iptables -L -x -v | grep -e ^Chain | sed 's/.* \([0-9]*\) bytes)$/\1/' | ( while read n ; do A=$(( $A + $n )) ; done ; echo $A))

A=0
IP6=$(/sbin/ip6tables -L -x -v | grep -e ^Chain | sed 's/.* \([0-9]*\) bytes)$/\1/' | ( while read n ; do A=$(( $A + $n )) ; done ; echo $A))

echo $IP4
echo $IP6
echo 0
echo unknown
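The grep, sed and while-loop pipeline above can be squeezed into one helper that is testable without root. This is a sketch rather than what I actually run: sum_chain_bytes is a made-up name, and only the INPUT line in the example comes from real output; the other counts are invented.

```shell
#!/bin/bash
# sum_chain_bytes is a hypothetical helper: it reads `iptables -L -v -x` style
# output on stdin and adds up the byte counts from the
# "Chain ... (policy ... N bytes)" header lines. Any other line, including
# headers of user-defined chains (which carry no byte count), is ignored.
sum_chain_bytes() {
  sed -n 's/^Chain .* \([0-9][0-9]*\) bytes)$/\1/p' |
    awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Canned example input.
sum_chain_bytes <<'EOF'
Chain INPUT (policy ACCEPT 6079694 packets, 2474020715 bytes)
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
Chain OUTPUT (policy ACCEPT 5000000 packets, 1000000000 bytes)
EOF

# On a live system (as root) the input would come from iptables itself:
#   /sbin/iptables -L -x -v | sum_chain_bytes
#   /sbin/ip6tables -L -x -v | sum_chain_bytes
```

Using sed -n with the p flag means lines that do not match are dropped instead of being passed through to the arithmetic.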

On one host, there really is hardly any ipv6 traffic (2 bytes/sec!) so I turned on the log scale plot option in MRTG to show the ipv6 a bit more (though to be honest it's still pretty invisible).

Here's the config I used in MRTG to call the above script:

Target[ip46]: `/home/benc/bin/iptables-to-mrtg`
Target[ip46]: `curl http://dildano.hawaga.org.uk/mrtg-iptables.txt`
options[ip46]: growright,logscale
MaxBytes[ip46]: 1000000000000
Title[ip46]: IPv4 vs IPv6
YLegend[ip46]: bytes/sec
LegendI[ip46]: IPv4
LegendO[ip46]: IPv6

and here's an example graph (live, click for historical data):



Caveats:

I'm summing all three iptables chains (input, output and forwarded) for all interfaces, so some traffic can be counted unexpectedly. Forwarded packets are counted too, in the FORWARD chain (as far as I know they skip INPUT and OUTPUT, which only see packets addressed to or generated by the host itself), so this is not a good way to count a host's own traffic if your linux box is a router. The lo interface is also counted, so traffic to localhost (127.0.0.1 or ::1) shows up in this graph. This might be useful to remove.

When there's a tunnel endpoint on the machine, then traffic to that tunnel will be counted twice: once as it passes the tunnel interface, and once as the encapsulated form passes the physical ethernet interface.

These are not insurmountable, I think: I could set specific iptables rules that address these concerns and count traffic from those rules instead of from the main chain counters.
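Here is a rough idea of what that could look like for the lo problem. I have not run this: the chain name mrtg-count is made up, the setup commands are left as comments because they need root, and chain_rule_bytes is a hypothetical helper that only parses the listing.

```shell
#!/bin/bash
# Untested setup sketch (needs root, so left as comments). "mrtg-count" is a
# made-up chain name; the same would be repeated with ip6tables for IPv6.
#   iptables -N mrtg-count
#   iptables -A mrtg-count -j RETURN
#   iptables -I INPUT  ! -i lo -j mrtg-count
#   iptables -I OUTPUT ! -o lo -j mrtg-count

# chain_rule_bytes is a hypothetical helper: it reads the output of
# `iptables -L mrtg-count -v -x -n` on stdin, skips the two header lines,
# and adds up the bytes column (the second field) of every rule line.
chain_rule_bytes() {
  awk 'NR > 2 && $2 ~ /^[0-9]+$/ { s += $2 } END { printf "%.0f\n", s }'
}

# Canned example of such a listing (the counts are made up).
chain_rule_bytes <<'EOF'
Chain mrtg-count (2 references)
    pkts      bytes target     prot opt in     out     source               destination
     120    45678 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0
EOF
```

The tunnel double counting could presumably be handled the same way, by excluding the tunnel interface in the jump rules, but I have not tried it.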