System Garden

Habitat 2.0 Alpha User Manual

Contents

  1. Introduction
  2. Data Sources
  3. Displaying Data
  4. Customisation and Preferences
  5. The Collection Agent: Clockwork
  6. Command Line Utilities
  7. Data Formats
  8. Upload, Download and Replication with System Garden

Manual Pages

The Collection Agent: Clockwork

Starting and Stopping Clockwork

Habitat consists of a collection agent for data gathering, called clockwork, and a set of front ends for the display and extraction of data. MyHabitat is the main user interface and can be used to control local collection easily. Each time it starts up, it checks that clockwork is running and, unless configured otherwise, offers to start collection if it is not already in place. The menu option Data->Collection... or the Collection button on the status line gives you control over the local agent.

Clockwork can also be started on the command line independently of MyHabitat. Run

$ clockwork

on its own and the agent will start silently as a background daemon.

To see if clockwork is running, run the status command

$ statclock
Clockwork process 4616 is running
  was started at 23-Nov-11 01:24:31 PM, user nigel, on /dev/pts/4

To stop the daemon, run

$ killclock
Stopping Clockwork
  pid 4616, user nigel and started on /dev/pts/4 at 23-Nov-11 01:24:31 PM
Stopped

To collect data on several machines without the GUI, run clockwork on each one. As a daemon it will put itself into the background and, by default, log its own errors and warnings into the data store for later inspection (see Data Sources for a description of the data files).
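The loop below is a minimal sketch of starting collection on a group of hosts. The hostnames are hypothetical, and it assumes ssh access with clockwork on each remote PATH. The remote command is echoed rather than executed, so the loop can be checked before use.

```shell
# Hypothetical host list -- replace with your own machines.
hosts="web1 web2 db1"

for h in $hosts; do
    echo "starting clockwork on $h"
    # Dry run: remove the leading 'echo' to really start the agent.
    echo ssh "$h" clockwork
done
```

Because clockwork daemonises itself, no nohup or backgrounding is needed in the remote command.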

If there is a problem launching clockwork, starting

bin/clockwork -d

will cause diagnostic messages to be sent to stderr, including completion of collection jobs. If the failure is not obvious, send this output to [email protected]. If there are still problems, an exhaustive set of debug messages can be obtained with

bin/clockwork -D

This places clockwork into developer debug mode.

Data Collection

Data can be pulled from a source by clockwork, or pushed directly into a data store from the command line or a programming interface.

In order to pull data, clockwork has a table of jobs that are executed at regular intervals (described later). By default these jobs run probes that collect all manner of system information and send it to a data stream called a route. These routes usually address local persistent data storage or remote network storage, so that information sent to a route will end up being saved.

Data can also be pushed on to routes independently of clockwork, using an API or the command line tool habput. habput takes text in the format of a Fat Headed Array (FHA) and sends it to the route address specified.
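As a sketch, a minimal FHA can be composed with ordinary shell tools: a header row of column names, a "--" separator, then data rows, following the layout of the tables shown in this chapter. The column names here are invented for illustration, the habput invocation is left commented out, and the route address is a placeholder; see habput(1) for the real syntax.

```shell
# Build a minimal one-sample FHA: column names, a '--' separator,
# then a whitespace-separated data row.
cat > mydata.fha <<'EOF'
temp fanrpm
--
41.5 2400
EOF

# Hypothetical invocation -- consult habput(1) for the route syntax:
# habput <route address> < mydata.fha
cat mydata.fha
```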

A time series of data is built up by repeatedly storing a sequence of tables. Each table defines data at a period in time and is assigned a time stamp, sequence number and duration. This is often expressed in a tabular context by using the special columns _time, _seq and _dur.

A table of values is used for each sample, so that multiple instances may be expressed without the use of excessive columns. For example, if Habitat gathers information about storage, the FHA may look like the following.

_seq _time        id     mount         kread   kwritten   rios    wios
--
200  1107372138   hda1   /             1.98    15.85      15.09  12779.53
200  1107372138   hda2   /usr          6.78    0.00       60.76   0.00
200  1107372138   hda3   /mnt/windows  0.00    0.00       0.00    0.00
201  1107372143   hda1   /             2.03    9.20       30.94   459.53
201  1107372143   hda2   /usr          57.55   0.52       4.13    0.07
201  1107372143   hda3   /mnt/windows  0.00    0.00       0.00    0.00

In the example above, there are two sets of three lines, with each set sharing the same sequence number: 200 and 201. These rows belong to the same sample, share the same time stamp but have different values for id, which is the instance key. In the case of storage, the instance key is a subset of the device name. Thus, to get a time series for a particular disk (say hda1), select the rows with id=hda1 and sort on _seq.
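The row selection just described can be sketched with standard text tools; the file below is a pared-down, space-separated copy of the example table, so no Habitat commands are assumed.

```shell
# A cut-down copy of the io table above, space-separated.
cat > io.fha <<'EOF'
_seq _time id mount kread kwritten rios wios
--
200 1107372138 hda1 / 1.98 15.85 15.09 12779.53
200 1107372138 hda2 /usr 6.78 0.00 60.76 0.00
201 1107372143 hda1 / 2.03 9.20 30.94 459.53
201 1107372143 hda2 /usr 57.55 0.52 4.13 0.07
EOF

# Select the hda1 instance (id is field 3) and order by the sequence
# number (_seq, field 1) to recover its time series.
awk '$3 == "hda1"' io.fha | sort -n -k1,1
```

The same pattern works for any instance key, such as a network interface name from the net probe.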

Performance Gathering Probes

A probe is a small piece of code inside clockwork that extracts data from a running system and is called regularly by the job system. Its job is to sample data, process it and potentially repackage for data storage or display. The output is a single table per invocation (called a Fat Headed Array when represented as text) and an error or log stream.

Probes can also act as filters, taking a queue of data as input in order to perform mathematics (such as averaging) or other manipulation. In this way, a job configuration can describe a pipeline of collection and manipulation jobs, with stateful data stored at each stage.

You can run a probe independently of clockwork on the command line and see its results on standard output. For example, the following runs the system performance probe on Linux; only the first six columns are included for brevity (there are 35 columns). [The info line is staggered over two lines for readability.]

$ habprobe sys
load1	load5	load15	runque	nprocs	lastproc
1 minute load average		15 minute load average		num of procs
		5 minute load average		num runnable procs		last proc run	info
""		""		""		""		""		""		key
4		4		4		""		""		""		max
""		""		""		""		""		""		name
abs		abs		abs		abs		abs		abs		sense
nano		nano		nano		u32		u32		u32		type
--
0.08		0.03		0.05		1		335		5142

The following probes are available for use in clockwork and in the habprobe command; all output is in table format.

intr

Interrupt statistics

io

I/O data, storage and disk statistics

names

Symbolic data from the kernel

ps

Processes

sys

System data, including CPU and memory statistics

timer

Timer data

up

Uptime data, how long the system has been up

down

Downtime data, calculated from the up probe; can give a view of outages

net

Network device statistics

To confirm the probe list, run habprobe on its own or look at the manual page for habprobe(1) to see all the other probes that are available.

Data Gathering Methods

In addition to running probes to gather data (the probe method), jobs can use other methods to extract data, as follows:

exec

Direct submission to exec(2)

sh

Submission of a command line to sh(1)

snap

Take a snapshot of a route

tstamp

Timestamp in seconds since 1/1/1970 00:00:00

sample

Sample tables from a route, carry out a mathematical process and produce a single table as a result

pattern

Match patterns on groups of routes to raise events

event

Process event queues to carry out instructions

replicate

Replicate rings to and from a repository

probe

Extract data from the built in probes (see above and habprobe(1))

Whereas all the probes produce tables, the methods are not obliged to. The probe method is used to run probes (see above).

The utility habmeth runs the methods on the command line so they can be seen outside of clockwork.
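As an aside on the tstamp convention, seconds since the epoch is the same convention used by date(1), which makes timestamps easy to generate and decode by hand. The conversion below assumes GNU date for the -d @N extension; the example value is taken from the io table earlier.

```shell
# Current time in seconds since 00:00:00 on 1/1/1970 (the Unix epoch),
# the convention the tstamp method records:
date +%s

# Convert an epoch timestamp back to a readable UTC date;
# '-d @N' is a GNU date extension:
date -u -d @1107372138
```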

Job Execution

Clockwork uses methods and probes to sample data and build streams of data which get stored locally. See the manual pages for habprobe(1), habmeth(1) and clockwork(8) for details of the configuration.

There are several standard job files in the Habitat distribution, prepared for different collection scenarios (see lib/jobs.*). Select them by running clockwork with -j prefab where prefab is one of the following

norm

Normal job file, the current default sampling once every 60 seconds

normrep

Normal job file, with 24 hour replication to the defined repository

quick

Higher frequency collection, sampling at 10 seconds

Alternatively, job tables can be customised in a file or held in a route, and clockwork started with the switch -J route. The route should be specified using the pseudo-URL route syntax.

In MyHabitat you can see the current job file used by clockwork by selecting Edit->Collection...

Data Storage

Each job line has a method, command and set of arguments. Two outputs are produced from each job, the result stream and an error stream, analogous to standard output and standard error. Each output is sent to a stream defined by a route specification.

Data in Habitat is time series in nature, whether it is tabular or plain text. The primary storage method is called a ringstore, after the data structure on which it is based, the ring buffer. Examples are log and uptime data, which are time series sets of tables, time stamped and sequenced to make them ordered and unique. Rings have descriptions and a defined maximum length (which may be infinite), both of which can be customised. Data is removed when it exceeds the retained limit. This data can be examined with the command line utility habrs or directly in MyHabitat. Following normal operating system conventions, the data file has the permissions and ownership of the creating user.