The following are a selection of the manual pages distributed with the habitat package, considered pertinent to the User Guide.
NAME
clockwork - collection daemon for the Habitat suite
SYNTAX
clockwork [-c <purl>] [-C <cfcmd>] [-e <fmt>] [-dDhsv] [-j <jobs>]
DESCRIPTION
Clockwork is the local agent for the Habitat suite. It runs as a dae-
mon process on each machine to be monitored and is designed to carry
out data collection, log file monitoring, data-driven actions and the
distribution of collected data.
The default jobs are to collect system, network, storage and uptime
statistics on the local machine and make them available in a standard
place. The collection of process data and file monitoring is available
by configuring the jobs that drive clockwork. Configuration can be
carried out at a local, regional and global level to allow delegation.
One public and many private instances of clockwork can exist on a
single machine, allowing individual users to carry out custom data
collection. Data is normally held in ring buffers or queues on the
local machine using custom datastores to be self contained and scalable.
Periodic replication of data rings to a repository is used for archiv-
ing and may be done in reverse for central data transmission.
OPTIONS
-c <purl>
Append user configuration data from the route <purl>, rather
than the default file ~/.habrc.
-C <cfcmd>
Append a list of configuration directives from <cfcmd>, sepa-
rated by semicolons.
-d Place clockwork in diagnostic mode, giving an additional level
of logging and sending the text to stderr rather than the
default or configured destinations. In daemon mode, will send
output to the controlling terminal.
-D Place clockwork in debug mode. As -d above but generating a
great deal more information, designed to be used in conjunction
with the source code. Also overrides normal outputs and will
send the text to stderr. In daemon mode, will send output to
the controlling terminal.
-e <fmt>
Change the logging output to one of eight preset alternative
formats, some showing additional information. <fmt> must be
0-7. See LOGGING below.
-h Print a help message to stdout and exit
-v Print the version to stdout and exit
-j <jobs>
Override public job table with a private one provided by the
route <jobs>. Clockwork will not daemonise, run a data service
or take an exclusive system lock (there can only be one public
clockwork instance). Implies -s and alters the logging output
to stderr, unless overridden with the range of elog configura-
tion directives.
-s Disable the public data service from being run; clockwork will
continue to save data as dictated by configuration.
DEFAULTS
When clockwork starts it reads $HAB/etc/habitat.conf and ~/.habrc for
configuration data (see CONFIGURATION for more details). Unless
overridden, clockwork will then look for its jobs inside the default
public datastore for that machine, held in $HAB/var/<hostname>.rs (the
route address is rs:$HAB/var/<hostname>.rs,jobs,0, see below for an
explanation). If it does not find the jobs, clockwork bootstraps
itself by copying a default job template from the file
$HAB/lib/clockwork.jobs into the public datastore and then carries on
using the datastore version.
The default jobs run system, network and storage data gathering probes
every 60 seconds. They save results to the public datastore using the
template route rs:$HAB/var/<hostname>.rs,<jobname>,60 and errors to
rs:$HAB/var/<hostname>.rs,err_<jobname>,60
All other errors are placed in rs:$HAB/var/<hostname>.rs,log,0
ROUTES
To move data around in clockwork, an enhanced URL is used as a form of
addressing and is called a 'route' (also known as a pseudo-url or p-url
in documentation). The format is <driver>:<address>, where driver must
be one of the following:-
file: fileov:
read and write to paths on the filesystem. The format is
file:<file path>, which will always append text to the file when
writing. The fileov: driver will overwrite text when first
writing and is suitable for configuration files or states.
http: https:
reads and writes using HTTP or HTTPS to a network address. The
address is the server name and object name as a normal URL con-
vention.
rs: reads and writes to a ring store, the primary local storage
mechanism. Tabular data is stored in a time series in a queue or
ring buffer structure. Multiple rings of data can be stored in
a single ringstore file, using different names and durations.
sqlrs: reads and writes tabular data to a remote repository service
using the SQL Ringstore method, which is implemented over the
HTTP protocol. Harvest provides repository services. Stores
tabular data in a time series, addressed by host name, ring name
and duration. Data is stored in a queue or ring buffer storage.
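The driver/address split can be illustrated with plain shell parameter
expansion. This is a minimal sketch, assuming a ringstore route in the
installed Linux location; the parsing below is illustrative only and
not part of the habitat tools:

```shell
# A route is <driver>:<address>; for the rs: driver the address is
# <file>,<ring name>,<duration in seconds>.
route='rs:/var/lib/habitat/myhost.rs,sys,60'

driver=${route%%:*}       # text before the first ':'  -> rs
address=${route#*:}       # text after the first ':'

file=${address%%,*}       # -> /var/lib/habitat/myhost.rs
rest=${address#*,}
ring=${rest%%,*}          # -> sys
duration=${rest#*,}       # -> 60

echo "driver=$driver file=$file ring=$ring duration=$duration"
```

The same split applies to the other drivers; only the interpretation
of the address part changes.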
CONFIGURATION
By default, clockwork will collect system, network and storage statis-
tics for the system on which it runs. All the data is read and written
from a local datastore, apart from configuration items which come from
external sources. These external configuration sources govern the
operation of all the habitat commands and applications.
Refer to the habconf(5) man page for more details.
JOB DEFINITIONS
Jobs are defined in a multi-column text format, headed by the magic
string 'job 1'. Comments may appear anywhere, starting with '#' and
running to the end of the line.
Each job is defined on a single line containing 11 arguments, which in
order are:-
1. start
when to start the job, in seconds from the starting of clockwork
2. period
how often to repeat the job, in seconds
3. phase
not yet implemented
4. count
how many times the job should be run, with 0 repeating forever
5. name
name of the job
6. requester
who requested the job, by convention the email address
7. results
the route where results should be sent
8. errors
the route where errors should be sent
9. nslots
the number of slots created in the 'results' and 'errors'
routes, if applicable (applies to timestore and tablestore).
10.method
the job method
11.command
the arguments given to each method
See the habmeth(1) manpage for details of the possible methods that may
be specified and the commands that they accept.
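As a sketch only, a job table in this format might look like the
following. The column values echo the defaults described above, but
the method name ('probe') and its command are placeholders, so consult
habmeth(1) for the methods that actually exist:

```
job 1
# start period phase count name requester      results                  errors                       nslots method command
  0     60    0     0     sys  [email protected] rs:var/myhost.rs,sys,60  rs:var/myhost.rs,err_sys,60  1000   probe  "sys"
```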
DATA ORGANISATION
Data is stored in sequences of tabular information. All data is
ordered independently of time, allowing multiple separate samples that
share the same time interval. This data is stored in a ring buffer,
which allows data to grow to a certain number of samples before the
oldest are removed and their space recycled. Throughout the
documentation, each collection of samples is known as a ring, and may
be configured to be a simple queue, where data management is left up
to administrators.
To limit the amount of storage used, data in a ring can be sampled
periodically to form new summary data, stored in a new ring with a
different period. In habitat, this is known as cascading and takes
place on all the default collection rings. Several levels of
cascading can take place over several new rings. This allows
summaries at different frequencies to be collected and tuned to local
requirements.
See the habmeth(1) man page for more information about the cascade
method.
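The ring and cascade behaviour can be mimicked with ordinary shell
tools. This is a minimal sketch with an invented five-slot ring and a
single averaging cascade; habitat's own storage is a structured
datastore and is not managed this way, so this only illustrates the
idea:

```shell
ring=$(mktemp)

# Insert samples 1..8 into a 5-slot ring: after each insert, trim the
# file to its newest 5 lines so the oldest samples are recycled.
for sample in 1 2 3 4 5 6 7 8; do
    echo "$sample" >> "$ring"
    tail -n 5 "$ring" > "$ring.tmp" && mv "$ring.tmp" "$ring"
done

contents=$(xargs < "$ring")     # the surviving samples: 4 5 6 7 8

# Cascade: summarise the ring into one averaged sample destined for a
# slower ring with a longer period.
avg=$(awk '{ sum += $1 } END { print sum / NR }' "$ring")
echo "ring holds: $contents; cascaded average: $avg"
rm -f "$ring"
```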
DATA REPLICATION
Any ring of information can be sent to or from the repository at known
intervals, allowing a deterministic way of updating both repository and
collection agent.
This is implemented as a regular job which runs the replicate method.
Data for the method is provided by configuration parameters, which can
be set and altered across the organisation. Thus the replication job
does not normally need to be altered to change its behaviour.
See the habmeth(1) man page for the replicate method and the formation
of the configuration data.
LOGGING
Clockwork and the probes that provide data also generate information
and error messages. By convention, these are stored in the route
specification ts:$HAB/var/<host>.ts,log. The convention for probes is
to store their errors in ts:$HAB/var/<host>.ts,e.<jobname>.
To override the logging location, use the range of elog configuration
directives, or rely on the options -d, -D, -j, which will alter the
location to stderr as a side effect. See habconf(5) for details.
Probe logging is configurable for each job in the job table.
The logging format can be customised using one of a set of configura-
tion directives (see habconf(5)). For convenience, the -e flag speci-
fies one of eight preconfigured text formats that will be sent to the
configured location:-
0 all 17 possible log variables
1 severity character & text
2 severity & text
3 severity, text, file, function & line
4 long severity, short time, short program name, file, function,
line & text
5 date time, severity, long program name, process id, file, func-
tion, line, origin, code & text
6 unix ctime, seconds since 1970, short program name, process id,
thread id, file, function, line, origin, code & text
7 severity, file, line, origin, code, text
FILES
If run from a single directory $HAB:-
$HAB/bin/clockwork
$HAB/var/<hostname>.rs, $HAB/lib/clockwork.jobs
/tmp/clockwork.run
~/.habrc, $HAB/etc/habitat.conf
If run from installed Linux locations:-
/usr/bin/habitat
/var/lib/habitat/<hostname>.rs, /usr/lib/habitat/clockwork.jobs
/var/lib/habitat/clockwork.run
~/.habrc, /etc/habitat.conf
ENVIRONMENT VARIABLES
EXAMPLES
Type the following to run clockwork in the standard way. This assumes
it is providing public data using the standard job file, storing in a
known place and using the standard network port for the data service.
clockwork
On a more secure system, you can prevent the data service from being
started.
clockwork -s
Alternatively you can run it in a private mode by specifying '-j' and a
replacement job file.
clockwork -j <file>
AUTHORS
Nigel Stuckey <[email protected]>
SEE ALSO
killclock(1), ghabitat(1), habget(1), habput(1), irs(1), habedit(1),
habprobe(1), habmeth(1), habconf(5)
NAME
ghabitat - Gtk+ Graphical interface to Habitat suite
SYNTAX
ghabitat [-c <purl>] [-C <cfcmd>] [-e <fmt>] [-dDhsv]
DESCRIPTION
This is the standard graphical interface for Habitat, including the
ability to view repository data provided by Harvest.
When the tool starts, a check is made for the existence of the local
collection agent, clockwork. If it is not running, the user is asked
if they wish to run it and what starting behaviour they wish in the
future.
In appearance, ghabitat resembles a file manager, with choices on the
left and visualisation on the right. If files or other data sources
have been opened before, ghabitat attempts to re-open them and places
them under the my files node in the tree.
See DATA SOURCES section for details of the data that can be viewed,
NAVIGATION for how to interpret the data structures and VISUALISATION
for how to examine the data once displayed.
This GUI requires X Windows to run; use other front ends or command
line tools if you do not have that facility.
OPTIONS
-c <purl>
Append user configuration data from the route <purl>, rather
than the default file ~/.habrc.
-C <cfcmd>
Append a list of configuration directives from <cfcmd>, sepa-
rated by semicolons.
-d Place ghabitat in diagnostic mode, giving an additional level of
logging and sending the text to stderr rather than the default
or configured destinations. In daemon mode, will send output to
the controlling terminal.
-D Place ghabitat in debug mode. As -d above but generating a
great deal more information, designed to be used in conjunction
with the source code. Also overrides normal outputs and will
send the text to stderr. In daemon mode, will send output to
the controlling terminal.
-e <fmt>
Change the logging output to one of eight preset alternative
formats, some showing additional information. <fmt> must be
0-7. See LOGGING below.
-h Print a help message to stdout and exit
-v Print the version to stdout and exit
-s Run in safe mode, which prevents ghabitat automatically loading
data from files or over the network from peer machines or the
repository. Use if ghabitat start up time is excessively long.
Once started, all data resources can be loaded manually.
DATA SOURCES
Currently, data can be obtained from four types of sources:-
Storage file
The standard local data storage file known as a ringstore, which
is a structured format using GDBM. Open it with File->Open or
^O and use the file chooser. The file will appear under my
files in the choice tree.
Repository
Centralised data automatically appears under the repository node
in the choice tree if the configuration directive is set to a
valid location. The directive is route.sqlrs.geturl which must
contain the URL of a repository output interface.
(route.sqlrs.puturl works in the opposite direction for replica-
tion.)
Network data
Data for an individual machine can be read from the repository
or a peer clockwork instance on another host. Select File->Host
or ^H, type in the hostname and pick repository or host as a
source. (Currently, host access is not fully implemented.)
Your selection will appear under my hosts in the choice tree.
Route specification
Select File->Route or ^R and type the full route specification
of the data source. This is the most generic way of addressing
in habitat, encompassing all of the styles used above and more.
Files can be removed by selecting their entry from the list brought up
with File->Close (^C).
NAVIGATION
The repository source is special, in that the hierarchical nature of the
group organisation is shown. To get to a machine, one needs to know
its organisational location and traverse it in the tree. Whilst this
aids browsing, one may wish to use the File->Host option to go directly
to a machine.
Opening the data source trees will reveal the capabilities of the data
source, which include the following:-
perf graphs
Performance data is retrieved in a time series and will display
as a chart in the visualisation section
perf data
Performance data presented in a textual form, encompassing tabular
time-series data, key-data values or simple lists. Visualisation
is always in a table.
events Text scanning, pattern matching and threshold breaching func-
tionality is clustered under this node. The configuration tables
are presented here along with the events and logs that have been
generated.
logs Logs and errors from running the jobs in clockwork
replication
Logs and state of the data replication to and from the reposi-
tory
jobs The job table that clockwork follows to collect and refine data
data Contains all the data in the storage mechanism without
interpretation.
Under the performance nodes will be the available data collections,
also known as rings. The names of these collections are decided when
data is inserted into the storage. For example, sending data to the
route tab:fred.ts,mydata and mounting it under ghabitat will cause the
data to appear there as mydata.
There are conventions for the names of standard probes, but they will
only appear in a data store if their collection is configured in the
job table (usually just uncommenting it: see clockwork(1)):-
sys System data, such as cpu statistics and memory use. Labelled as
system in choice tree
io Disk statistics, such as read/write rates and performance lev-
els. Labelled as storage in the choice tree
net Network statistics, such as packets per second. Labelled as
network in the choice tree
ps Process table. This can contain a significant amount of data
over time, so generally only the most significant or useful pro-
cesses may be included. This is dependent on the configuration
of the ps probe. Labelled as processes in the choice tree
names A set of name-value pairs relating to the configuration of the
operating system. Generally captured at start up only.
The final set of nodes below the ring names are a set of time scales by
which to examine the data. These dictate how much data is extracted
from the data source and generally the speed at which the data will be
visualised. These are preset to useful values, commonly 5 minutes, 1
hour, 4 hours, 1 day, 7 days, 1 month, 3 months, 1 year, 3 years, etc.
VISUALISATION
The right hand section of the window is used for visualisation. Its
major uses are for charting and displaying tables.
When charting, the section is divided into several parts. The largest
is used for the graph itself, with other areas being used for curve
selection, zooming and data scaling. If the data is multi-instance,
such as with multiple disks, then a further area is added to control
the number of instance graphs being displayed.
The standard sets of data, such as sys and io have default curves that
are displayed when the graph is first drawn. The list of curves down
the right hand side are buttons used to draw or remove data on the
graph. When drawn, the button changes colour to that of the curve
displayed.
Whilst the largest amount of data displayed is selected from the
choice tree, it is possible to 'zoom in' to particular times very
easily using the graph. There are two methods: either drag the mouse
over the area of interest, creating a rectangle, and click the left
button inside it, or use the x and y axis zoom buttons from the Zoom &
Scale area. The display shows the enlarged view and changes the scale
of the x & y rulers. The time ruler changes mode to show the most
useful feedback of time at that scale. To move back and forth along
time, move the horizontal scrollbar. To zoom out, either click the
right mouse button over the graph or use the zoom-out button in the
Zoom & Scale area.
It is possible to alter the scale and offset of the curves by clicking
on the additional fields button in the Zoom & Scale area. This will
create additional scale and offset controls next to each curve button.
The values relate to the formula y = mx + c, where the offset is c and
the scale is m. Moving the scale changes the magnitude of the curve,
whereas the offset changes the point at which the curve originates.
Using these tools, simple parity can be gained between two curves that
you wish to superimpose on the same chart but do not share the same y
scale.
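The effect of the controls is plain y = mx + c arithmetic applied to
every sample of the curve. A sketch with awk, using invented sample
values and an invented scale m=2, offset c=10:

```shell
# Apply scale m and offset c to a column of curve samples: y = mx + c.
printf '1\n2\n3\n' | awk -v m=2 -v c=10 '{ print m * $1 + c }'
# prints 12, 14, 16: the curve is magnified by 2 and raised by 10
```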
MENU
The File menu adds and removes file and other data sources to the
choice tree. It also contains import and export routines to convert
between native datastores and plain text, such as csv and tsv files.
The View menu controls the display and refresh of choice and
visualisation. It also gives the ability to save or send data being
displayed to e-mail, applications or a file.
The Collect menu controls data collection, if you own the collection
process.
The Graph menu changes the appearance of the chart and is only dis-
played when the graph appears.
Finally, the Help menu gives access to spot help, documentation and
links to the system garden web site for product information. Most
help menu items need a common browser on the user's path to show help
information.
LOGGING
Ghabitat generates information and error messages. By default, errors
are captured internally and can be displayed in the visualisation area
by clicking on the logs node under this client.
Also available in this area are the log routes, which show how
information of different severity is dealt with, and configuration,
which shows the values of all the current configuration directives in
effect.
See habconf(5) for more information.
FILES
Locations alter depending on how the application is installed.
For the habitat configuration
~/.habrc
$HAB/etc/habitat.conf or /etc/habitat.conf
For graphical appearance: fonts, colours, styles, etc
$HAB/lib/ghabitat.rc or /usr/lib/habitat/ghabitat.rc
For the help information
$HAB/lib/help/ or /usr/lib/habitat/help/
ENVIRONMENT VARIABLES
DISPLAY
The X-Windows display to use
PATH Used to locate a browser to display help information. Typical
browsers looked for are Mozilla, Netscape, Konqueror, Opera,
Chimera
HOME User's home directory
AUTHORS
Nigel Stuckey <[email protected]>
SEE ALSO
clockwork(1), killclock(1), habget(1), habput(1), irs(1), habedit(1),
habprobe(1), habmeth(1), habconf(5)
NAME
habget - Send habitat data to stdout
SYNTAX
habget [-c <purl>] [-C <cfcmd>] [-e <fmt>] [-dDhv] [-E] <route>
DESCRIPTION
Open <route> using habitat's route addressing and send the data to std-
out.
See clockwork(1) for an explanation of the route syntax.
OPTIONS
-c <purl>
Append user configuration data from the route <purl>, rather
than the default file ~/.habrc.
-C <cfcmd>
Append a list of configuration directives from <cfcmd>, sepa-
rated by semicolons.
-d Place habget in diagnostic mode, giving an additional level of
logging and sending the text to stderr rather than the default
or configured destinations. In daemon mode, will send output to
the controlling terminal.
-D Place habget in debug mode. As -d above but generating a
great deal more information, designed to be used in conjunction
with the source code. Also overrides normal outputs and will
send the text to stderr. In daemon mode, will send output to
the controlling terminal.
-e <fmt>
Change the logging output to one of eight preset alternative
formats, some showing additional information. <fmt> must be
0-7. See LOGGING below.
-h Print a help message to stdout and exit
-v Print the version number to stdout and exit
-E Escape characters in data that would otherwise be unprintable
EXAMPLES
To output the job table from an established datastore file used for
public data collection. This uses the ringstore driver.
habget rs:var/myhost.rs,clockwork,0
To get the most recent data sample from the 60 second sys ring from the
same datastore as above.
habget rs:var/myhost.rs,sys,60
To find errors that may have been generated by clockwork.
habget rs:var/myhost.rs,log,0
AUTHORS
Nigel Stuckey <[email protected]>
SEE ALSO
clockwork(1), killclock(1), ghabitat(1), habput(1), irs(1), habedit(1),
habprobe(1), habmeth(1), habconf(5)
NAME
habput - Store habitat data from stdin
SYNTAX
habput [-s <nslots> -t <desc>] [-c <purl>] [-C <cfcmd>] [-e <fmt>]
[-dDhv] <route>
DESCRIPTION
Open <route> using habitat's route addressing and send data from stdin
to the route.
See clockwork(1) for an explanation of the route syntax.
OPTIONS
-c <purl>
Append user configuration data from the route <purl>, rather
than the default file ~/.habrc.
-C <cfcmd>
Append a list of configuration directives from <cfcmd>, sepa-
rated by semicolons.
-d Place habput in diagnostic mode, giving an additional level of
logging and sending the text to stderr rather than the default
or configured destinations. In daemon mode, will send output to
the controlling terminal.
-D Place habput in debug mode. As -d above but generating a
great deal more information, designed to be used in conjunction
with the source code. Also overrides normal outputs and will
send the text to stderr. In daemon mode, will send output to
the controlling terminal.
-e <fmt>
Change the logging output to one of eight preset alternative
formats, some showing additional information. <fmt> must be
0-7. See LOGGING below.
-h Print a help message to stdout and exit
-v Print the version to stdout and exit
-s <nslots>
Number of slots for creating ringed routes (default 1000);
<nslots> of 0 gives a queue behaviour where the oldest data is
not lost
-t <desc>
text description for creating ringed routes
EXAMPLES
To append a sample of tabular data to a table store, use a tablestore
driver. This will create a ring which can store 1,000 slots of data.
habput tab:var/myfile.ts,myring
To save the same data, but limit the ring to just the most recent 10
slots and give the ring a description
habput -s 10 -t "my description" tab:var/myfile.ts,myring
The same data, stored to the same location, but with an unlimited
history (technically a queue). To make the ring readable in ghabitat
with current conventions, we store with the prefix 'r.'
habput -s 0 -t "my description" tab:var/myfile.ts,r.myring
To save an error record, use a timestore driver
habput -s 100 -t "my logs" ts:var/myfile.ts,mylogs
AUTHORS
Nigel Stuckey <[email protected]>
SEE ALSO
clockwork(1), killclock(1), ghabitat(1), habget(1), irs(1), habedit(1),
habprobe(1), habmeth(1), habconf(5)
NAME
killclock - Stops clockwork, Habitat's collection agent
SYNTAX
killclock
DESCRIPTION
Stops the public instance of clockwork running on the local machine.
This shell script locates the lock file for clockwork, which is the
collection agent for the Habitat suite. It prints the process id, own-
ing user, controlling terminal and start time of the daemon, before
sending it a SIGTERM.
No check is made that the clockwork process has terminated before this
script ends.
Private instances of clockwork (started with the -j option) cannot be
stopped by this method, as they do not register in a lock file.
Instead, they should be controlled by conventional process control
methods.
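The script's essential steps can be sketched as below. The lock file
here is a temporary stand-in, and the assumption that its first line
holds the daemon's process id is ours, so treat the format as
hypothetical:

```shell
# Stand-in for /tmp/clockwork.run (or /var/run/clockwork.run); we write
# our own pid into it rather than a real daemon's.
lockfile=$(mktemp)
echo $$ > "$lockfile"

pid=$(head -n 1 "$lockfile")
# killclock reports the process before signalling it; here we only
# report, rather than send a real SIGTERM with: kill -TERM "$pid"
echo "would send SIGTERM to pid $pid"
rm -f "$lockfile"
```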
FILES
/tmp/clockwork.run
/var/run/clockwork.run
EXAMPLES
Typing the following:-
killclock
will result in a display similar to below and the termination of the
clockwork daemon.
Stopping pid 2781, user nigel and started on /dev/pts/2 at 25-May-04
08:08:55 AM
AUTHORS
Nigel Stuckey <[email protected]>
SEE ALSO
clockwork(1), ghabitat(1), habget(1), habput(1), irs(1), habedit(1),
habprobe(1), habmeth(1), habconf(5)
The tables below show the data that is collected by the standard probes in habitat, one table per operating system. They may not be up to date, so always check with the application itself.
Probe            Measure          Description
---------------  ---------------  ----------------------------------------
system (sys)     load1            1 minute load average
                 load5            5 minute load average
                 load15           15 minute load average
                 runque           number of runnable processes
                 nprocs           number of processes
                 lastproc         id of last process run
                 mem_tot          total memory (kB)
                 mem_used         memory used (kB)
                 mem_free         memory free (kB)
                 mem_shared       used memory shared (kB)
                 mem_buf          buffer memory (kB)
                 mem_cache        cache memory (kB)
                 swap_tot         total swap space (kB)
                 swap_used        swap space used (kB)
                 swap_free        swap space free (kB)
                 uptime           seconds that the system has been up
                 idletime         seconds that system has been idle
                 %user            % time cpu was in user space
                 %nice            % time cpu was at nice priority in user space
                 %system          % time cpu spent in kernel
                 %idle            % time cpu was idle
                 pagein           pages paged in per second
                 pageout          pages paged out per second
                 swapin           pages swapped in per second
                 swapout          pages swapped out per second
                 interrupts       hardware interrupts per second
                 contextsw        context switches per second
                 forks            process forks per second
storage (io)     id               mount or device identifier
                 device           device name
                 mount            mount point
                 fstype           filesystem type
                 size             size of filesystem or device (MBytes)
                 used             space used on device (MBytes)
                 reserved         reserved space in filesystem (KBytes)
                 %used            % used on device
                 kread            volume of data read (KB/s)
                 kwritten         volume of data written (KB/s)
                 rios             number of read operations per second
                 wios             number of write operations per second
                 read_svc_t       average read service time (ms)
                 write_svc_t      average write service time (ms)
network (net)    device           device name
                 rx_bytes         bytes received
                 rx_pkts          packets received
                 rx_errs          receive errors
                 rx_drop          receive dropped packets
                 rx_fifo          receive fifo
                 rx_frame         receive frames
                 rx_comp          receive compressed
                 rx_mcast         received multicast
                 tx_bytes         bytes transmitted
                 tx_pkts          packets transmitted
                 tx_errs          transmit errors
                 tx_drop          transmit dropped packets
                 tx_fifo          transmit fifo
                 tx_colls         transmit collisions
                 tx_carrier       transmit carriers
                 tx_comp          transmit compressed
uptime (up)      uptime           uptime in secs
                 boot             time of boot in secs from epoch
                 suspend          secs suspended
                 vendor           vendor name
                 model            model name
                 nproc            number of processors
                 mhz              processor clock speed
                 cache            size of cache in kb
                 fpu              floating point unit available
downtime         lastup           time last alive in seconds from epoch
                 boot             time of boot in secs from epoch
                 downtime         secs unavailable
processes (ps)   pid              process id
                 ppid             process id of parent
                 pidglead         process id of process group leader
                 sid              session id
                 uid              real user id
                 pwname           name of real user
                 euid             effective user id
                 epwname          name of effective user
                 gid              real group id
                 egid             effective group id
                 size             size of process image in Kb
                 rss              resident set size in Kb
                 flag             process flags (system dependent)
                 nlwp             number of lightweight processes within this process
                 tty              controlling tty device
                 %cpu             % of recent cpu time
                 %mem             % of system memory
                 start            process start time from epoch
                 time             total cpu time for this process
                 childtime        total cpu time for reaped child processes
                 nice             nice level for cpu scheduling
                 syscall          system call number (if in kernel)
                 pri              priority (high value=high priority)
                 wchan            wait address for sleeping process
                 wstat            if zombie the wait() status
                 cmd              command/name of exec'd file
                 args             full command string
                 user_t           user level cpu time
                 sys_t            sys call cpu time
                 otrap_t          other system trap cpu time
                 textfault_t      text page fault sleep time
                 datafault_t      data page fault sleep time
                 kernelfault_t    kernel page fault sleep time
                 lockwait_t       user lock wait sleep time
                 osleep_t         all other sleep time
                 waitcpu_t        wait-cpu (latency) time
                 stop_t           stopped time
                 minfaults        minor page faults
                 majfaults        major page faults
                 nswaps           number of swaps
                 inblock          input blocks
                 outblock         output blocks
                 msgsnd           messages sent
                 msgrcv           messages received
                 sigs             signals received
                 volctx           voluntary context switches
                 involctx         involuntary context switches
                 syscalls         system calls
                 chario           characters read and written
                 pendsig          set of process pending signals
                 heap_vaddr       virtual address of process heap
                 heap_size        size of process heap in bytes
                 stack_vaddr      virtual address of process stack
                 stack_size       size of process stack in bytes
hardware         name             device name
                 hard             interrupts from hardware device
                 soft             interrupts self induced by system
                 watchdog         interrupts from a periodic timer
                 spurious         interrupts for unknown reason
                 multisvc         multiple servicing during single interrupt
system           name             name
                 vname            value name
                 value            value of symbol
Probe |
Measure |
Description |
|
system (sys) |
updates |
|
|
runque |
+= num runnable procs |
|
|
runocc |
++ if num runnable procs > 0 |
|
|
swpque |
+= num swapped procs |
|
|
swpocc |
++ if num swapped procs > 0 |
|
|
waiting |
+= jobs waiting for I/O |
|
|
freemem |
+= freemem in pages |
|
|
swap_resv |
+= reserved swap in pages |
|
|
swap_alloc |
+= allocated swap in pages |
|
|
swap_avail |
+= unreserved swap in pages |
|
|
swap_free |
+= unallocated swap in pages |
|
|
%idle |
time cpu was idle |
|
|
%wait |
time cpu was idle waiting for IO |
|
|
%user |
time cpu was in user space |
|
|
%system |
time cpu was in kernel space |
|
|
wait_io |
time cpu was idle waiting for IO |
|
|
wait_swap |
time cpu was idle waiting for swap |
|
|
wait_pio |
time cpu was idle waiting for programmed I/O |
|
|
bread |
physical block reads |
|
|
bwrite |
physical block writes (sync+async) |
|
|
lread |
logical block reads |
|
|
lwrite |
logical block writes |
|
|
phread |
raw I/O reads |
|
|
phwrite |
raw I/O writes |
|
|
pswitch |
context switches |
|
|
trap |
traps |
|
|
intr |
device interrupts |
|
|
syscall |
system calls |
|
|
sysread |
read() + readv() system calls |
|
|
syswrite |
write() + writev() system calls |
|
|
sysfork |
forks |
|
|
sysvfork |
vforks |
|
|
sysexec |
execs |
|
|
readch |
bytes read by rdwr() |
|
|
writech |
bytes written by rdwr() |
|
|
rawch |
terminal input characters |
|
|
canch |
chars handled in canonical mode |
|
|
outch |
terminal output characters |
|
|
msg |
msg count (msgrcv()+msgsnd() calls) |
|
|
sema |
semaphore ops count (semop() calls) |
|
|
namei |
pathname lookups |
|
|
ufsiget |
ufs_iget() calls |
|
|
ufsdirblk |
directory blocks read |
|
|
ufsipage |
inodes taken with attached pages |
|
|
ufsinopage |
inodes taken with no attached pages |
|
|
inodeovf |
inode table overflows |
|
|
fileovf |
file table overflows |
|
|
procovf |
proc table overflows |
|
|
intrthread |
interrupts as threads (below clock) |
|
|
intrblk |
intrs blkd/preempted/released (switch) |
|
|
idlethread |
times idle thread scheduled |
|
|
inv_swtch |
involuntary context switches |
|
|
nthreads |
thread_create()s |
|
|
cpumigrate |
cpu migrations by threads |
|
|
xcalls |
xcalls to other cpus |
|
|
mutex_adenters |
failed mutex enters (adaptive) |
|
|
rw_rdfails |
rw reader failures |
|
|
rw_wrfails |
rw writer failures |
|
|
modload |
times loadable module loaded |
|
|
modunload |
times loadable module unloaded |
|
|
bawrite |
physical block writes (async) |
|
|
iowait |
procs waiting for block I/O |
|
|
pgrec |
page reclaims (includes pageout) |
|
|
pgfrec |
page reclaims from free list |
|
|
pgin |
pageins |
|
|
pgpgin |
pages paged in |
|
|
pgout |
pageouts |
|
|
pgpgout |
pages paged out |
|
|
swapin |
swapins |
|
|
pgswapin |
pages swapped in |
|
|
swapout |
swapouts |
|
|
pgswapout |
pages swapped out |
|
|
zfod |
pages zero filled on demand |
|
|
dfree |
pages freed by daemon or auto |
|
|
scan |
pages examined by pageout daemon |
|
|
rev |
revolutions of the page daemon hand |
|
|
hat_fault |
minor page faults via hat_fault() |
|
|
as_fault |
minor page faults via as_fault() |
|
|
maj_fault |
major page faults |
|
|
cow_fault |
copy-on-write faults |
|
|
prot_fault |
protection faults |
|
|
softlock |
faults due to software locking req |
|
|
kernel_asflt |
as_fault()s in kernel addr space |
|
|
pgrrun |
times pager scheduled |
|
|
nc_hits |
hits that we can really use |
|
|
nc_misses |
cache misses |
|
|
nc_enters |
number of enters done |
|
|
nc_dblenters |
num of enters when already cached |
|
|
nc_longenter |
long names tried to enter |
|
|
nc_longlook |
long names tried to look up |
|
|
nc_mvtofront |
entry moved to front of hash chain |
|
|
nc_purges |
number of purges of cache |
|
|
flush_ctx |
num of context flushes |
|
|
flush_segment |
num of segment flushes |
|
|
flush_page |
num of complete page flushes |
|
|
flush_partial |
num of partial page flushes |
|
|
flush_usr |
num of non-supervisor flushes |
|
|
flush_region |
num of region flushes |
|
|
var_buf |
num of I/O buffers |
|
|
var_call |
num of callout (timeout) entries |
|
|
var_proc |
max processes system wide |
|
|
var_maxupttl |
max user processes system wide |
|
|
var_nglobpris |
num of global scheduled priorities configured |
|
|
var_maxsyspri |
max global priorities used by system class |
|
|
var_clist |
num of clists allocated |
|
|
var_maxup |
max number of processes per user |
|
|
var_hbuf |
num of hash buffers to allocate |
|
|
var_hmask |
hash mask for buffers |
|
|
var_pbuf |
num of physical I/O buffers |
|
|
var_sptmap |
size of sys virt space alloc map |
|
|
var_maxpmem |
max physical memory to use in pages (if 0 use all available) |
|
|
var_autoup |
min secs before a delayed-write buffer can be flushed |
|
|
var_bufhwm |
high water mark of buf cache in KB |
|
|
var_xsdsegs |
num of XENIX shared data segs |
|
|
var_xsdslots |
num of slots in xsdtab[] per segment |
|
|
flock_reccnt |
num of records currently in use |
|
|
flock_rectot |
num of records used since boot |
|
|
processes (ps) |
pid |
process id |
|
ppid |
process id of parent |
|
|
pidglead |
process id of process group leader |
|
|
sid |
session id |
|
|
uid |
real user id |
|
|
pwname |
name of real user |
|
|
euid |
effective user id |
|
|
epwname |
name of effective user |
|
|
gid |
real group id |
|
|
egid |
effective group id |
|
|
size |
size of process image in Kb |
|
|
rss |
resident set size in Kb |
|
|
flag |
process flags (system dependent) |
|
|
nlwp |
number of lightweight processes within this process |
|
|
tty |
controlling tty device |
|
|
%cpu |
% of recent cpu time |
|
|
%mem |
% of system memory |
|
|
start |
process start time from epoch |
|
|
time |
total cpu time for this process |
|
|
childtime |
total cpu time for reaped child processes |
|
|
nice |
nice level for scheduling |
|
|
syscall |
system call number (if in kernel) |
|
|
pri |
priority (high value=high priority) |
|
|
wchan |
wait address for sleeping process |
|
|
wstat |
if zombie the wait() status |
|
|
cmd |
command/name of exec'd file |
|
|
args |
full command string |
|
|
user_t |
user level cpu time |
|
|
sys_t |
sys call cpu time |
|
|
otrap_t |
other system trap cpu time |
|
|
textfault_t |
text page fault sleep time |
|
|
datafault_t |
data page fault sleep time |
|
|
kernelfault_t |
kernel page fault sleep time |
|
|
lockwait_t |
user lock wait sleep time |
|
|
osleep_t |
all other sleep time |
|
|
waitcpu_t |
wait-cpu (latency) time |
|
|
stop_t |
stopped time |
|
|
minfaults |
minor page faults |
|
|
majfaults |
major page faults |
|
|
nswaps |
number of swaps |
|
|
inblock |
input blocks |
|
|
outblock |
output blocks |
|
|
msgsnd |
messages sent |
|
|
msgrcv |
messages received |
|
|
sigs |
signals received |
|
|
volctx |
voluntary context switches |
|
|
involctx |
involuntary context switches |
|
|
syscalls |
system calls |
|
|
chario |
characters read and written |
|
|
pendsig |
set of process pending signals |
|
|
heap_vaddr |
virtual address of process heap |
|
|
heap_size |
size of process heap in bytes |
|
|
stack_vaddr |
virtual address of process stack |
|
|
stack_size |
size of process stack in bytes |
|
|
storage (io) |
device |
device name |
|
nread |
number of bytes read |
|
|
nwritten |
number of bytes written |
|
|
reads |
number of read operations |
|
|
writes |
number of write operations |
|
|
wait_t |
cumulative wait (pre-service) time |
|
|
wait_len_t |
cumulative wait length*time product |
|
|
run_t |
cumulative run (service) time |
|
|
run_len_t |
cumulative run length*time product |
|
|
wait_cnt |
wait count |
|
|
run_cnt |
run count |
|
|
System |
name |
name |
|
vname |
value name |
|
|
value |
value |
|
|
system timers |
kname |
timer name |
|
name |
event name |
|
|
nevents |
number of events |
|
|
elapsed_t |
cumulative elapsed time |
|
|
min_t |
shortest event duration |
|
|
max_t |
longest event duration |
|
|
start_t |
previous event start time |
|
|
stop_t |
previous event stop time |
|
|
uptime (up) |
uptime |
uptime in secs |
|
boot |
time of boot in secs from epoch |
|
|
suspend |
secs suspended |
|
|
vendor |
vendor name |
|
|
model |
model name |
|
|
nproc |
number of processors |
|
|
mhz |
processor clock speed |
|
|
cache |
size of cache in kb |
|
|
fpu |
floating point unit available |
|
|
downtime (down) |
lastup |
time last alive in secs from epoch |
|
boot |
time of boot in secs from epoch |
|
|
downtime |
secs unavailable |
|
|
hardware |
name |
device name |
|
hard |
interrupt from hardware device |
|
|
soft |
interrupt self-induced by system |
|
|
watchdog |
interrupt from periodic timer |
|
|
spurious |
interrupt for unknown reason |
|
|
multisvc |
multiple servicing during single interrupt |
The Fat Headed Array (FHA) is a table of information designed for transport between systems and for external representation. Habitat uses FHAs when reading clockwork's central store and for I/O with harvest. FHAs are also used when loading data into harvest's repository and when linking with other systems.
There are three parts to the data in the following order:
Column names: tab-separated attribute names, which must appear on line 1.
Info rows: zero or more lines of meta information for each attribute. Each meta record takes one line, with its fields separated by tabs in the same order as the attribute names. Each info row is named by a trailing supplementary field, so these rows have one more column than the data or column-name rows. The info rows are terminated by a row that starts with two dashes ('--').
Data rows: one or more rows of data, following the column names and info rows.
The following is an example of an FHA.
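A hypothetical fragment illustrating the layout (fields are tab-separated) might look like this, with column names first, info rows ending at the '--' terminator, and data rows last. The column and value choices here are illustrative only, not taken from habitat:

```
pid	pwname	pc_cpu
process id	name of real user	% of recent cpu time	info
1	""	""	key
u32	str	nano	type
""	""	%cpu	name
--
1	root	0.2
20981	fred	12.6
```

Note that each info row carries one extra trailing field, its name, and that the 'name' row supplies the label %cpu, which the SQL-safe column name pc_cpu cannot express directly.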
To represent an empty value, two quotes should be used (“”). Occasionally, this may also be represented as a single dash.
To represent a value that contains tab characters (\t), the value should be contained in quotes (e.g. “embedded \t tab”).
Column names may only contain characters accepted by SQL database servers to ensure compatibility. This is generally accepted as the range [A-Za-z_]. See the 'name' info row below for a way to express richer labels.
In addition to pure character formatting, ghabitat, the graphical client, also understands certain named info rows.
info – the text in the row is used as informational help in the client. This can be seen when hovering the mouse over a column name in a table or over a curve's button when displaying charts. The information is shown in graphical 'pop-ups' or 'tool tips'.
max – Optional value which, if present, sets the maximum expected value for an attribute. This helps in making charts more understandable.
type – the data type of the column. In version 1, the data types are relatively simple:-
i32 – 32-bit signed integer
u32 – 32-bit unsigned integer
i64 – 64-bit signed integer
u64 – 64-bit unsigned integer
nano – nanosecond precision when used for timers. Currently also used for floating-point values with more limited accuracy
str – string value
key – The column that is the primary key of the table contains a 1; all the other values contain no value (“”). This may be expanded to show secondary or tertiary keys in later versions.
name – The unrestricted full name of the column, if it is not possible to express it in the column name. If blank, the column name is used as the attribute's label. This is used to include punctuation characters such as '-' or '%' in the label as they are disallowed by the SQL naming standard but can be very useful for compact expression.
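To make the three-part layout concrete, here is a minimal parsing sketch in Python. The function name, the return shape and the assumption that the '--' terminator is always present are ours, not habitat's:

```python
def parse_fha(text):
    """Parse a Fat Headed Array into columns, info rows and data rows.

    Layout: line 1 holds tab-separated column names; zero or more info
    rows follow, each carrying a trailing name field; a row starting
    with '--' ends the info section; the remaining lines are data rows.
    """
    lines = text.splitlines()
    columns = lines[0].split("\t")
    info = {}          # info-row name -> list of values, one per column
    data_start = 1
    for i, line in enumerate(lines[1:], start=1):
        if line.startswith("--"):
            data_start = i + 1
            break
        fields = line.split("\t")
        # the trailing supplementary field names the info row
        info[fields[-1]] = fields[:-1]
        data_start = i + 1
    data = [line.split("\t") for line in lines[data_start:] if line]
    return columns, info, data
```

A caller would then consult info.get("type") or info.get("key") to interpret the data rows.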
Clockwork reads a job table and uses the information to establish repeating and timed jobs. It is similar to the Unix scheduler cron, but with greater flexibility.
When first run, clockwork bootstraps an initial version of its jobs from the file lib/clockwork.jobs. The resulting table is stored in the ringstore location var/<hostname>.rs,jobs,0. Subsequent runs of clockwork use this table, so any amendments should be made with habedit on the ringstore route.
Clockwork may also be started with an alternative job table by using the -j switch. In this mode, clockwork runs a private data collector without starting a network service for the whole machine.
Jobs are defined in a multi-column text format, headed by the magic string 'job 1'. Comments may appear anywhere, starting with '#' and running to the end of the line. Each job is defined on a single line containing 11 arguments, which in order are:-
start – when to start the job, in seconds from the start of clockwork
period – how often to repeat the job, in seconds
phase – not yet implemented
count – how many times the job should be run, with 0 repeating forever
name – name of the job
requester – who requested the job, by convention the email address
results – the route where results should be sent
errors – the route where errors should be sent
nslots – the number of slots created in the 'results' and 'errors' routes, if applicable (applies to ringstore and SQL ringstore).
method – the job method
command – the arguments given to each method

Part of a job table taken from the default file lib/clockwork.jobs is printed below. The top line runs the sys probe every 60 seconds, gathering system data (which becomes the system node in the choice tree). The remaining lines take the collected high frequency data and transform it to lower frequencies using an averaging process, running every five minutes, fifteen minutes and one hour (300, 900 and 3600 seconds).
The job running every 300 seconds creates a storage ring with 288 entries, allowing a full day's data to be collected at five-minute intervals. The other jobs keep seven days of data at 15-minute intervals and one month at hourly intervals.
The methods are probe, which is given the command sys, and sample, which takes the command avg and the route to sample as its arguments. These methods are available on the command line using habmeth. The probe data is similarly available using the command habprobe.
Note that special tokens are used which are expanded while clockwork is running: %d for the ring's duration and %h for the hostname. Other tokens are available and are explored in the Administration manual.
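As an illustration of the format, a hypothetical job table fragment might look like the following. The routes use the var/<hostname>.rs,ring,duration form shown earlier, with the %h and %d tokens, but none of these values are the shipped defaults:

```
job 1
# start period phase count name   requester   results               errors                 nslots method command
  0     60     0     0     sys    admin@local var/%h.rs,sys,%d      var/%h.rs,err_sys,%d   1000   probe  sys
  0     300    0     0     sys5m  admin@local var/%h.rs,sys_5m,%d   var/%h.rs,err_5m,%d    288    sample avg var/%h.rs,sys,60
```

The second line samples the 60-second sys ring with the avg method every 300 seconds, keeping 288 slots, which matches the day of five-minute data described above.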
The pattern matching table is user configurable, using the processing mechanism described earlier in this document.
The pattern-matching table, which defines the behaviour, has the following columns:
pattern – the regular expression to look for as a pattern, which should normally match a single line. Each match is considered an event.
embargo time – number of seconds that must elapse after the first event before another event of the same pattern from the same route may be raised.
embargo count – maximum number of identical pattern matches that can take place before another event is raised for that pattern and route.
severity – importance of the event. One of: fatal, error, warning, info, diag, debug
action method – event's execution method
action arguments – method specific arguments
action message – text message to describe the event. It may contain special tokens of the form %<char> that describe other information about the event.
When a pattern is matched and is not subject to an embargo, an event is raised. A text message is prepared and turned into an instruction using the action method and arguments; the instruction is then appended to the event ring for execution (see below).
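As an illustration of the embargo columns, the suppression logic can be sketched in Python. The class name and the exact semantics (an event is raised again once either the embargo time elapses or the embargo count is reached) are our assumptions, not habitat's implementation:

```python
import time

class Embargo:
    """Per-(pattern, route) suppression sketch: after an event is raised,
    further matches raise no event until either embargo_time seconds have
    elapsed or embargo_count matches have accumulated."""

    def __init__(self, embargo_time, embargo_count):
        self.embargo_time = embargo_time
        self.embargo_count = embargo_count
        self.state = {}   # (pattern, route) -> (last_raised, matches_since)

    def should_raise(self, pattern, route, now=None):
        now = time.time() if now is None else now
        key = (pattern, route)
        if key not in self.state:
            self.state[key] = (now, 0)
            return True            # first match always raises an event
        last_raised, matches = self.state[key]
        matches += 1
        if now - last_raised >= self.embargo_time or matches >= self.embargo_count:
            self.state[key] = (now, 0)   # raise again, restart the embargo
            return True
        self.state[key] = (last_raised, matches)
        return False
```

Each (pattern, route) pair is tracked independently, matching the manual's statement that embargoes apply per pattern and per route.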
The watched sources table defines a set of routes associated with an identifier. A watching job ties together a set of sources with a set of patterns and executes them periodically. When the watching job starts, it checks all the routes defined in this table for changes in size. Those that have changed are checked for pattern matches (see the details above).
The format of the table is simple: one entry per line, with each being a valid route format.
The event table is filled by the activities of pattern matching and threshold detection. When there is a match not covered by an embargo, an event is raised: an instruction to execute a method supported by the habitat job environment. The instructions are queued as separate sequences in the event ring, which is stored in the system ringstore.
The table format is simple:-
method – execution method as supported by the habitat job environment.
command – command to give to method
arguments – command arguments, which may contain spaces. The '%' character must be escaped if it is to be used in an argument (see below)
stdin – input text for the method, which must be introduced with '%' to separate it from the arguments. Successive '%' characters represent newlines. To print a literal %, escape it with a backslash (\%).
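As an illustration of the stdin convention, a small Python helper might split a raw string into arguments and stdin text. The function and its exact handling of edge cases are assumptions based on the description above, not habitat code:

```python
def split_args_stdin(text):
    """Split a raw argument string into (arguments, stdin_text).

    An unescaped '%' introduces the stdin section; further unescaped '%'
    characters become newlines; a backslash-escaped '%' yields a literal
    '%' (assumed semantics, sketched from the manual's description).
    """
    parts = []       # characters of the current section
    sections = []    # completed sections, split at unescaped '%'
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] == "%":
            parts.append("%")    # escaped percent: keep the literal
            i += 2
        elif ch == "%":
            sections.append("".join(parts))
            parts = []
            i += 1
        else:
            parts.append(ch)
            i += 1
    sections.append("".join(parts))
    args = sections[0]
    stdin_text = "\n".join(sections[1:]) if len(sections) > 1 else ""
    return args, stdin_text
```

The first section becomes the arguments and the remaining sections, joined with newlines, become the method's standard input.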
When an event has completed, the next sequence to be processed is stored in a state table. The event ring has a finite length, so old events are removed automatically over time.