System Garden

Habitat 1.0 User Manual

Contents

  1. A Tour of Habitat
  2. Getting Started
  3. Concepts
  4. Clockwork: The Collection Agent
  5. Graphical Tools
  6. Text Terminal Tools
  7. Command Line Tools
  8. System Performance
  9. Events
  10. Administration
  11. Diagnostics
  12. Appendix

Manual Pages

The following are a selection of the manual pages distributed with the habitat package, considered pertinent to the User Guide.

clockwork

NAME
       clockwork - collection daemon for the Habitat suite

SYNTAX
       clockwork [-c <purl>] [-C <cfcmd>] [-e <fmt>] [-dDhsv] [-j <jobs>]

DESCRIPTION
       Clockwork  is the local agent for the Habitat suite.  It runs as a dae-
       mon process on each machine to be monitored and is  designed  to  carry
       out  data  collection, log file monitoring, data-driven actions and the
       distribution of collected data.

       The default jobs are to collect system,  network,  storage  and  uptime
       statistics  on  the local machine and make them available in a standard
       place.  The collection of process data and file monitoring is available
       by  configuring  the  jobs  that drive clockwork.  Configuration can be
       carried out at a local, regional and global level to allow delegation.
       One public and many private instances of clockwork can exist on a sin-
       gle machine, allowing individual users to carry out custom data collec-
       tion.  Data is normally held in ring buffers or queues on the local
       machine, using custom datastores so that it is self-contained and
       scalable.  Periodic replication of data rings to a repository is used
       for archiving and may be done in reverse for central data transmission.

OPTIONS
       -c <purl>
              Append user configuration data from  the  route  <purl>,  rather
              than the default file ~/.habrc.

       -C <cfcmd>
              Append  a  list  of configuration directives from <cfcmd>, sepa-
              rated by semicolons.

       -d     Place clockwork in diagnostic mode, giving an  additional  level
              of  logging  and  sending  the  text  to  stderr rather than the
              default or configured destinations.  In daemon mode,  will  send
              output to the controlling terminal.

       -D     Place  clockwork  in  debug  mode.  As -d above but generating a
              great deal more information, designed to be used in  conjunction
              with  the  source  code.  Also overrides normal outputs and will
              send the text to stderr.  In daemon mode, will  send  output  to
              the controlling terminal.

       -e <fmt>
              Change  the  logging  output  to one of eight preset alternative
              formats, some showing additional  information.   <fmt>  must  be
              0-7.  See LOGGING below.

       -h     Print a help message to stdout and exit

       -v     Print the version to stdout and exit

       -j <jobs>
              Override  public  job  table  with a private one provided by the
              route <jobs>.  Clockwork will not daemonise, run a data  service
              or  take  an exclusive system lock (there can only be one public
              clockwork instance).  Implies -s and alters the  logging  output
              to  stderr,  unless overridden with the range of elog configura-
              tion directives.

       -s     Disable the public data service from being run; data will con-
              tinue to be saved as dictated by configuration.

DEFAULTS
       When  clockwork  starts it reads $HAB/etc/habitat.conf and ~/.habrc for
       configuration  data  (see  CONFIGURATION  for  more  details).   Unless
       overridden,  clockwork  will  then  look for its jobs inside the default
       public datastore for that machine, held in $HAB/var/<hostname>.rs  (the
       route  address  is  rs:$HAB/var/<hostname>.rs,jobs,0,  see below for an
       explanation).  If it does not find the jobs, clockwork bootstraps itself
       by copying a default job template from the file $HAB/lib/clockwork.jobs
       into the public datastore and then carries on using the datastore  ver-
       sion.

       The default jobs run system, network and storage data gathering probes
       every 60 seconds, saving results to the public datastore using the
       template route rs:$HAB/var/<hostname>.rs,<jobname>,60 and errors to
       rs:$HAB/var/<hostname>.rs,err_<jobname>,60

       All other errors are placed in rs:$HAB/var/<hostname>.rs,log,0

ROUTES
       To move data around in clockwork, an enhanced URL is used as a form  of
       addressing and is called a 'route' (also known as a pseudo-url or p-url
       in documentation).  The format is <driver>:<address>, where driver must
       be one of the following:-

       file: fileov:
              reads and writes to paths on the filesystem.  The format is
              file:<file path>, which will always append text to the file when
              writing.   The  fileov:  driver  will  overwrite text when first
              writing and is suitable for configuration files or states.

       http: https:
              reads and writes using HTTP or HTTPS to a network address.   The
              address  is the server name and object name as a normal URL con-
              vention.

       rs:    reads and writes to a ring store, the primary local storage mech-
              anism.   Tabular  data  is stored in a time series in a queue or
              ring buffer structure.  Multiple rings of data can be stored  in
              a single ringstore file, using different names and durations.

       sqlrs: reads  and  writes  tabular  data to a remote repository service
              using the SQL Ringstore method, which is  implemented  over  the
              HTTP  protocol.   Harvest  provides repository services.  Stores
              tabular data in a time series, addressed by host name, ring name
              and duration.  Data is stored in a queue or ring buffer storage.
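
       For illustration, routes using some of the drivers above might look
       like the following (file paths and host names are hypothetical, not
       taken from a real installation):-

       file:/var/log/messages                    append to a log file
       fileov:/tmp/habitat.state                 overwrite a state file
       http://myhost.example.com/habitat/sys     fetch over HTTP
       rs:$HAB/var/myhost.rs,sys,60              60 second sys ring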

CONFIGURATION
       By default, clockwork will collect system, network and storage  statis-
       tics for the system on which it runs.  All the data is read and written
       from a local datastore, apart from configuration items which come  from
       external  sources.   These  external  configuration  sources govern the
       operation of all the habitat commands and applications.

       Refer to the habconf(5) man page for more details.

JOB DEFINITIONS
       Jobs are defined in a multi columned text format, headed by  the  magic
       string  'job  1'.   Comments may appear anywhere, starting with '#' and
       running to the end of the line.

       Each job is defined on a single line containing 11 arguments, which  in
       order are:-

       1. start
              when to start the job, in seconds from the starting of clockwork

       2. period
              how often to repeat the job, in seconds

       3. phase
              not yet implemented

       4. count
              how many times the job should be run, with 0 repeating forever

       5. name
              name of the job

       6. requester
              who requested the job, by convention the email address

       7. results
              the route where results should be sent

       8. errors
              the route where errors should be sent

       9. nslots
              the number of  slots  created  in  the  'results'  and  'errors'
              routes, if applicable (applies to timestore and tablestore).

       10. method
              the job method

       11. command
              the arguments given to each method

       See the habmeth(1) manpage for details of the possible methods that may
       be specified and the commands that they accept.

DATA ORGANISATION
       Data is stored in sequences of tabular information.  All data is
       ordered independently of time, allowing multiple separate samples that
       share the same time interval.  This data is stored in a ring buffer,
       which allows data to grow to a certain number of samples before the
       oldest are removed and their space recycled.  Throughout the documenta-
       tion, each collection of samples is known as a ring; a ring may instead
       be configured as a simple queue, where data management is left up to
       administrators.

       To limit the amount of storage used, data in a ring can be sampled
       periodically to form new summary data, which is stored in a new ring
       with a different period.  In habitat, this is known as cascading and
       takes place on all the default collection rings.  Several levels of
       cascading can take place over several new rings, allowing summaries at
       different frequencies to be collected and tuned to local requirements.

       See the habmeth(1) man page for  more  information  about  the  cascade
       method.

DATA REPLICATION
       Any  ring of information can be sent to or from the repository at known
       intervals, allowing a deterministic way of updating both repository and
       collection agent.

       This  is  implemented as a regular job which runs the replicate method.
       Data for the method is provided by configuration parameters  which  can
       be  set and altered in the organisation.  Thus the replication job does
       not normally need to be altered to change the behaviour.

       See the habmeth(1) man page for the replicate method and the  formation
       of the configuration data.

LOGGING
       Clockwork and the probes that provide data also generate information
       and error messages.  By convention, these are stored in the route
       specification ts:$HAB/var/<host>.ts,log.  The convention for probes is
       to store their errors in ts:$HAB/var/<host>.ts,e.<jobname>.

       To override the logging location, use the range of  elog  configuration
       directives,  or  rely  on  the options -d, -D, -j, which will alter the
       location to stderr as a  side  effect.   See  habconf(5)  for  details.
       Probe logging is configurable for each job in the job table.


       The  logging  format can be customised using one of a set of configura-
       tion directives (see habconf(5)).  For convenience, the -e flag  speci-
       fies  one  of eight preconfigured text formats that will be sent to the
       configured location:-

       0      all 17 possible log variables

       1      severity character & text

       2      severity & text

       3      severity, text, file, function & line

       4      long severity, short time, short program name,  file,  function,
              line & text

       5      date  time, severity, long program name, process id, file, func-
              tion, line, origin, code & text

       6      unix ctime, seconds since 1970, short program name, process  id,
              thread id, file, function, line, origin, code & text

       7      severity, file, line, origin, code, text
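
       For example, a foreground run combining diagnostic output with format 5
       logging (a hypothetical but valid combination of the options above)
       could be started as:-

       clockwork -d -e 5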

FILES
       If run from a single directory $HAB:-
       $HAB/bin/clockwork
       $HAB/var/<hostname>.rs, $HAB/lib/clockwork.jobs
       /tmp/clockwork.run
       ~/.habrc, $HAB/etc/habitat.conf

       If run from installed Linux locations:-
       /usr/bin/habitat
       /var/lib/habitat/<hostname>.rs, /usr/lib/habitat/clockwork.jobs
       /var/lib/habitat/clockwork.run
       ~/.habrc, /etc/habitat.conf

ENVIRONMENT VARIABLES
EXAMPLES
       Type  the following to run clockwork in the standard way.  This assumes
       it is providing public data using the standard job file, storing  in  a
       known place and using the standard network port for the data service.

       clockwork

       On  a  more  secure system, you can prevent the data service from being
       started

       clockwork -s

       Alternatively you can run it in a private mode by specifying '-j'  and  a
       replacement job file.

       clockwork -j <file>

AUTHORS
       Nigel Stuckey <[email protected]>

SEE ALSO
       killclock(1),  ghabitat(1),  habget(1),  habput(1), irs(1), habedit(1),
       habprobe(1), habmeth(1), habconf(5)

ghabitat

NAME
       ghabitat - Gtk+ Graphical interface to Habitat suite

SYNTAX
       ghabitat [-c <purl>] [-C <cfcmd>] [-e <fmt>] [-dDhsv]

DESCRIPTION
       This  is  the  standard  graphical interface for Habitat, including the
       ability to view repository data provided by Harvest.

       When the tool starts, a check is made for the existence  of  the  local
       collection  agent,  clockwork.  If it is not running, the user is asked
       if they wish to run it and what starting behaviour  they  wish  in  the
       future.

       In appearance, ghabitat resembles a file manager, with choices
       on the left and visualisation on the right.  If files or other data
       sources have been opened before, ghabitat attempts to re-open them
       and places them under the my files node in the tree.

       See DATA SOURCES section for details of the data that  can  be  viewed,
       NAVIGATION  for  how to interpret the data structures and VISUALISATION
       for how to examine the data once displayed.

       This GUI requires X Windows to run; use other front ends or command
       line tools if you do not have that facility.

OPTIONS
       -c <purl>
              Append  user  configuration  data  from the route <purl>, rather
              than the default file ~/.habrc.

       -C <cfcmd>
              Append a list of configuration directives  from  <cfcmd>,  sepa-
              rated by semicolons.

       -d     Place ghabitat in diagnostic mode, giving an additional level of
              logging and sending the text to stderr rather than  the  default
              or configured destinations.  In daemon mode, will send output to
              the controlling terminal.

       -D     Place ghabitat in debug mode.  As  -d  above  but  generating  a
              great  deal more information, designed to be used in conjunction
              with the source code.  Also overrides normal  outputs  and  will
              send  the  text  to stderr.  In daemon mode, will send output to
              the controlling terminal.

       -e <fmt>
              Change the logging output to one  of  eight  preset  alternative
              formats,  some  showing  additional  information.  <fmt> must be
              0-7.  See LOGGING below.

       -h     Print a help message to stdout and exit

       -v     Print the version to stdout and exit

       -s     Run in safe mode, which prevents ghabitat automatically  loading
              data  from  files  or over the network from peer machines or the
              repository.  Use it if ghabitat start-up time is excessively
              long.  Once started, all data resources can be loaded manually.

DATA SOURCES
       Currently, data can be obtained from four types of sources:-

       Storage file
              The standard local data storage file known as a ringstore, which
              is a structured format using GDBM.  Open it with  File->Open  or
              ^O  and  use  the  file  chooser.  The file will appear under my
              files in the choice tree.

       Repository
              Centralised data automatically appears under the repository node
              in  the  choice  tree if the configuration directive is set to a
              valid location.  The directive is route.sqlrs.geturl which  must
              contain    the   URL   of   a   repository   output   interface.
              (route.sqlrs.puturl works in the opposite direction for replica-
              tion.)

       Network data
              Data  for  an individual machine can be read from the repository
              or a peer clockwork instance on another host.  Select File->Host
              or  ^H,  type  in  the hostname and pick repository or host as a
              source.  (Currently, host  access  is  not  fully  implemented.)
              Your selection will appear under my hosts in the choice tree.

       Route specification
              Select  File->Route  or ^R and type the full route specification
              of the data source.  This is the most generic way of  addressing
              in  habitat, encompassing all of the styles used above and more.

       Files can be removed by selecting their entry from the list brought  up
       with File->Close (^C).

NAVIGATION
       The repository source is special, in that the hierarchical nature of the
       group organisation is shown.  To get to a machine, one  needs  to  know
       its  organisational  location and traverse it in the tree.  Whilst this
       aids browsing, one may wish to use the File->Host option to go directly
       to a machine.

       Opening  the  data source trees will reveal the capabilities of the data
       source, which include the following:-

       perf graphs
              Performance data is retrieved in a time series and will display
              as a chart in the visualisation section

       perf data
              Performance data presented in a textual form, encompassing tabu-
              lar time-series data, key-value data or simple lists.  Visuali-
              sation is always in a table.

       events Text  scanning,  pattern  matching and threshold breaching func-
              tionality is clustered under this node. The configuration tables
              are presented here along with the events and logs that have been
              generated.

       logs   Logs and errors from running the jobs in clockwork

       replication
              Logs and state of the data replication to and from  the  reposi-
              tory

       jobs   The job table that clockwork follows to collect and refine data

       data   Contains all the data in the storage mechanism without inter-
              pretation.

       Under  the  performance  nodes  will be the available data collections,
       also known as rings.  The names of these collections are  decided  when
       data  is  inserted  into the storage.  For example, sending data to the
       route tab:fred.ts,mydata and mounting it under ghabitat will cause the
       data to appear here as mydata.

       There  are  conventions for the names of standard probes, but they will
       only appear in a data store if their collection is  configured  in  the
       job table (usually just uncommenting it: see clockwork(1)):-

       sys    System data, such as cpu statistics and memory use.  Labelled as
              system in choice tree

       io     Disk statistics, such as read/write rates and  performance  lev-
              els.  Labelled as storage in the choice tree

       net    Network  statistics,  such  as  packets per second.  Labelled as
              network in the choice tree

       ps     Process table. This can contain a  significant  amount  of  data
              over time, so generally only the most significant or useful pro-
              cesses may be included.  This is dependent on the  configuration
              of the ps probe.  Labelled as processes in the choice tree

       names  A  set  of name-value pairs relating to the configuration of the
              operating system.  Generally captured at start up only.

       The final set of nodes below the ring names are a set of time scales by
       which  to  examine  the data.  These dictate how much data is extracted
       from the data source and generally the speed at which the data will  be
       visualised.   These  are preset to useful values, commonly 5 minutes, 1
       hour, 4 hours, 1 day, 7 days, 1 month, 3 months, 1 year, 3 years,  etc.

VISUALISATION
       The  right  hand  section of the window is used for visualisation.  Its
       major uses are for charting and displaying tables.

       When charting, the section is divided into several parts.  The largest
       part is used for the graph itself, with other areas used for curve
       selection, zooming and data scaling.  If the  data  is  multi-instance,
       such  as  with  multiple disks, then a further area is added to control
       the number of instance graphs being displayed.

       The standard sets of data, such as sys and io have default curves  that
       are  displayed  when the graph is first drawn.  The list of curves down
       the right hand side are buttons used to draw  or  remove  data  on  the
       graph.  When drawn, the button changes colour to that of the curve
       displayed.

       Whilst the largest amount of data displayed is selected from the choice
       tree, it is possible to 'zoom in' to particular times very easily using
       the graph.  There are two methods: either drag the mouse over the area
       of interest, creating a rectangle, and click the left button inside it,
       or use the x and y axis zoom buttons from the Zoom & Scale area.  The
       display shows the enlarged view and changes the scale of the x & y
       rulers.  The time ruler changes mode to show the most useful feedback
       of time at that scale.  To move back and forth along time, move the
       horizontal scrollbar.  To zoom out, either click the right mouse button
       over the graph or use the zoom-out button in the Zoom & Scale area.

       It  is possible to alter the scale and offset of the curves by clicking
       on the additional fields button in the Zoom & Scale  area.   This  will
       create additional scale and offset controls next to each curve button.
       The values relate to the formula y = mx + c, where the offset is c  and
       the  scale  is m.  Moving the scale changes the magnitude of the curve,
       whereas the offset changes the point at  which  the  curve  originates.
       Using  these tools, simple parity can be gained between two curves that
       you wish to superimpose on the same chart but do not share the  same  y
       scale.
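
       For example, if one curve were recorded in kB and another in MB, giving
       the kB curve a scale of 0.001 and an offset of 0 would plot both in MB,
       allowing them to be compared directly on the same chart.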

MENU
       The  File  menu  adds  and  removes  file and other data sources to the
       choice tree.  It also contains import and export  routines  to  convert
       between native datastores and plain text, such as csv and tsv files.

       The View menu controls the display and refresh of choice and visualisa-
       tion.  It also gives the ability to save or send data being displayed to
       e-mail, applications or a file.

       The  Collect  menu  controls data collection, if you own the collection
       process.

       The Graph menu changes the appearance of the chart  and  is  only  dis-
       played when the graph appears.

       Finally, the Help menu gives access to spot help, documentation and
       links to the System Garden web site for product information.  Most help
       menu items need a common browser on the user's path to show help infor-
       mation.

LOGGING
       Ghabitat generates information and error messages.  By default,  errors
       are  captured internally and can be displayed in the visualisation area
       by clicking on the logs node under this client.

       Also available in this area are the log routes, which show how
       information of different severity is dealt with, and configuration,
       which shows the values of all the current configuration directives in
       effect.

       See habconf(5) for more information.

FILES
       Locations alter depending on how the application is installed.

       For the habitat configuration
              ~/.habrc
              $HAB/etc/habitat.conf or /etc/habitat.conf

       For graphical appearance: fonts, colours, styles, etc
              $HAB/lib/ghabitat.rc or /usr/lib/habitat/ghabitat.rc

       For the help information
              $HAB/lib/help/ or /usr/lib/habitat/help/

ENVIRONMENT VARIABLES
       DISPLAY
              The X-Windows display to use

       PATH   Used  to  locate a browser to display help information.  Typical
              browsers looked for are  Mozilla,  Netscape,  Konqueror,  Opera,
              Chimera

       HOME   User's home directory

AUTHORS
       Nigel Stuckey <[email protected]>

SEE ALSO
       clockwork(1),  killclock(1),  habget(1), habput(1), irs(1), habedit(1),
       habprobe(1), habmeth(1), habconf(5)

habget

NAME
       habget - Send habitat data to stdout

SYNTAX
       habget [-c <purl>] [-C <cfcmd>] [-e <fmt>] [-dDhv] [-E] <route>

DESCRIPTION
       Open <route> using habitat's route addressing and send the data to std-
       out.

       See clockwork(1) for an explanation of the route syntax

OPTIONS
       -c <purl>
              Append user configuration data from  the  route  <purl>,  rather
              than the default file ~/.habrc.

       -C <cfcmd>
              Append  a  list  of configuration directives from <cfcmd>, sepa-
              rated by semicolons.

       -d     Place habget in diagnostic mode, giving an additional level of
              logging  and  sending the text to stderr rather than the default
              or configured destinations.  In daemon mode, will send output to
              the controlling terminal.

       -D     Place habget in debug mode.  As -d above but generating a
              great deal more information, designed to be used in  conjunction
              with  the  source  code.  Also overrides normal outputs and will
              send the text to stderr.  In daemon mode, will  send  output  to
              the controlling terminal.

       -e <fmt>
              Change  the  logging  output  to one of eight preset alternative
              formats, some showing additional  information.   <fmt>  must  be
              0-7.  See LOGGING below.

       -h     Print a help message to stdout and exit

       -v     Print the version number to stdout and exit

       -E     Escape characters in data that would otherwise be unprintable

EXAMPLES
       To  output  the  job  table from an established datastore file used for
       public data collection. This uses the ringstore driver.

       habget rs:var/myhost.rs,clockwork,0

       To get the most recent data sample from the 60 second sys ring from the
       same datastore as above.

       habget rs:var/myhost.rs,sys,60

       To find errors that may have been generated by clockwork.

       habget rs:var/myhost.rs,log,0

habput

NAME
       habput - Store habitat data from stdin

SYNTAX
       habput  [-s  <nslots>  -t  <desc>]  [-c <purl>] [-C <cfcmd>] [-e <fmt>]
       [-dDhv] <route>

DESCRIPTION
       Open <route> using habitat's route addressing and send data from  stdin
       to the route.

       See clockwork(1) for an explanation of the route syntax

OPTIONS
       -c <purl>
              Append  user  configuration  data  from the route <purl>, rather
              than the default file ~/.habrc.

       -C <cfcmd>
              Append a list of configuration directives  from  <cfcmd>,  sepa-
              rated by semicolons.

       -d     Place habput in diagnostic mode, giving an additional level of
              logging and sending the text to stderr rather than  the  default
              or configured destinations.  In daemon mode, will send output to
              the controlling terminal.

       -D     Place habput in debug mode.  As -d above but generating a
              great  deal more information, designed to be used in conjunction
              with the source code.  Also overrides normal  outputs  and  will
              send  the  text  to stderr.  In daemon mode, will send output to
              the controlling terminal.

       -e <fmt>
              Change the logging output to one  of  eight  preset  alternative
              formats,  some  showing  additional  information.  <fmt> must be
              0-7.  See LOGGING below.

       -h     Print a help message to stdout and exit

       -v     Print the version to stdout and exit

       -s <nslots>
              Number of slots  for  creating  ringed  routes  (default  1000);
              <nslots>  of  0 gives a queue behaviour where the oldest data is
              not lost

       -t <desc>
              text description for creating ringed routes

EXAMPLES
       To append a sample of tabular data to a table store, use  a  tablestore
       driver. This will create a ring which can store 1,000 slots of data.

       habput tab:var/myfile.ts,myring

       To  save  the  same data, but limit the ring to just the most recent 10
       slots and give the ring a description

       habput -s 10 -t "my description" tab:var/myfile.ts,myring

       The same data, stored to the same location, but with an unlimited his-
       tory (technically a queue).  To make the ring readable in ghabitat with
       current conventions, we store with the prefix 'r.'

       habput -s 0 -t "my description" tab:var/myfile.ts,r.myring

       To save an error record, use a timestore driver

       habput -s 100 -t "my logs" ts:var/myfile.ts,mylogs

AUTHORS
       Nigel Stuckey <[email protected]>

SEE ALSO
       clockwork(1), killclock(1), ghabitat(1), habget(1), irs(1), habedit(1),
       habprobe(1), habmeth(1), habconf(5)

killclock

NAME
       killclock - Stops clockwork, Habitat's collection agent

SYNTAX
       killclock

DESCRIPTION
       Stops the public instance of clockwork running on the local machine.

       This  shell  script  locates  the lock file for clockwork, which is the
       collection agent for the Habitat suite.  It prints the process id, own-
       ing  user,  controlling  terminal  and start time of the daemon, before
       sending it a SIGTERM.

       No check is made that the clockwork process has terminated before  this
       script ends.

       Private instances of clockwork (started with the -j option) cannot be
       stopped by this method, as they do not register in a lock file.
       Instead,  they  should  be  controlled  by conventional process control
       methods.

FILES
       /tmp/clockwork.run
       /var/run/clockwork.run

EXAMPLES
       Typing the following:-

       killclock

       will result in a display similar to below and the  termination  of  the
       clockwork daemon.

       Stopping  pid  2781,  user nigel and started on /dev/pts/2 at 25-May-04
       08:08:55 AM

AUTHORS
       Nigel Stuckey <[email protected]>

SEE ALSO
       clockwork(1), ghabitat(1), habget(1),  habput(1),  irs(1),  habedit(1),
       habprobe(1), habmeth(1), habconf(5)

Collected Data

The tables below show the data collected by the standard probes in habitat, one table per operating system. They may not be up to date, so always check with the application itself.

Linux Data

Probe / Measure        Description

system (sys)
    load1              1 minute load average
    load5              5 minute load average
    load15             15 minute load average
    runque             number of runnable processes
    nprocs             number of processes
    lastproc           id of last process run
    mem_tot            total memory (kB)
    mem_used           memory used (kB)
    mem_free           memory free (kB)
    mem_shared         used memory shared (kB)
    mem_buf            buffer memory (kB)
    mem_cache          cache memory (kB)
    swap_tot           total swap space (kB)
    swap_used          swap space used (kB)
    swap_free          swap space free (kB)
    uptime             seconds that the system has been up
    idletime           seconds that the system has been idle
    %user              % time cpu was in user space
    %nice              % time cpu was at nice priority in user space
    %system            % time cpu spent in kernel
    %idle              % time cpu was idle
    pagein             pages paged in per second
    pageout            pages paged out per second
    swapin             pages swapped in per second
    swapout            pages swapped out per second
    interrupts         hardware interrupts per second
    contextsw          context switches per second
    forks              process forks per second

storage (io)
    id                 mount or device identifier
    device             device name
    mount              mount point
    fstype             filesystem type
    size               size of filesystem or device (MBytes)
    used               space used on device (MBytes)
    reserved           reserved space in filesystem (KBytes)
    %used              % used on device
    kread              volume of data read (KB/s)
    kwritten           volume of data written (KB/s)
    rios               number of read operations per second
    wios               number of write operations per second
    read_svc_t         average read service time (ms)
    write_svc_t        average write service time (ms)

network (net)
    device             device name
    rx_bytes           bytes received
    rx_pkts            packets received
    rx_errs            receive errors
    rx_drop            receive dropped packets
    rx_fifo            received fifo
    rx_frame           receive frames
    rx_comp            receive compressed
    rx_mcast           received multicast
    tx_bytes           bytes transmitted
    tx_pkts            packets transmitted
    tx_errs            transmit errors
    tx_drop            transmit dropped packets
    tx_fifo            transmit fifo
    tx_colls           transmit collisions
    tx_carrier         transmit carriers
    tx_comp            transmit compressed

uptime (up)
    uptime             uptime in secs
    boot               time of boot in secs from epoch
    suspend            secs suspended
    vendor             vendor name
    model              model name
    nproc              number of processors
    mhz                processor clock speed
    cache              size of cache in kb
    fpu                floating point unit available

downtime (down)
    lastup             time last alive in seconds from epoch
    boot               time of boot in secs from epoch
    downtime           secs unavailable

processes (ps)
    pid                process id
    ppid               process id of parent
    pidglead           process id of process group leader
    sid                session id
    uid                real user id
    pwname             name of real user
    euid               effective user id
    epwname            name of effective user
    gid                real group id
    egid               effective group id
    size               size of process image in Kb
    rss                resident set size in Kb
    flag               process flags (system dependent)
    nlwp               number of lightweight processes within this process
    tty                controlling tty device
    %cpu               % of recent cpu time
    %mem               % of system memory
    start              process start time from epoch
    time               total cpu time for this process
    childtime          total cpu time for reaped child processes
    nice               nice level for cpu scheduling
    syscall            system call number (if in kernel)
    pri                priority (high value=high priority)
    wchan              wait address for sleeping process
    wstat              if zombie the wait() status
    cmd                command/name of exec'd file
    args               full command string
    user_t             user level cpu time
    sys_t              sys call cpu time
    otrap_t            other system trap cpu time
    textfault_t        text page fault sleep time
    datafault_t        data page fault sleep time
    kernelfault_t      kernel page fault sleep time
    lockwait_t         user lock wait sleep time
    osleep_t           all other sleep time
    waitcpu_t          wait-cpu (latency) time
    stop_t             stopped time
    minfaults          minor page faults
    majfaults          major page faults
    nswaps             number of swaps
    inblock            input blocks
    outblock           output blocks
    msgsnd             messages sent
    msgrcv             messages received
    sigs               signals received
    volctx             voluntary context switches
    involctx           involuntary context switches
    syscalls           system calls
    chario             characters read and written
    pendsig            set of process pending signals
    heap_vaddr         virtual address of process heap
    heap_size          size of process heap in bytes
    stack_vaddr        virtual address of process stack
    stack_size         size of process stack in bytes

hardware interrupts (intr)
    name               device name
    hard               interrupts from hardware device
    soft               interrupts self induced by system
    watchdog           interrupts from a periodic timer
    spurious           interrupts for unknown reason
    multisvc           multiple servicing during single interrupt

system values (names)
    name               name
    vname              value name
    value              value of symbol



Solaris Data



Probe / Measure        Description

system (sys)
    updates
    runque             += num runnable procs
    runocc             ++ if num runnable procs > 0
    swpque             += num swapped procs
    swpocc             ++ if num swapped procs > 0
    waiting            += jobs waiting for I/O
    freemem            += freemem in pages
    swap_resv          += reserved swap in pages
    swap_alloc         += allocated swap in pages
    swap_avail         += unreserved swap in pages
    swap_free          += unallocated swap in pages
    %idle              time cpu was idle
    %wait              time cpu was idle waiting for IO
    %user              time cpu was in user space
    %system            time cpu was in kernel space
    wait_io            time cpu was idle waiting for IO
    wait_swap          time cpu was idle waiting for swap
    wait_pio           time cpu was idle waiting for programmed I/O
    bread              physical block reads
    bwrite             physical block writes (sync+async)
    lread              logical block reads
    lwrite             logical block writes
    phread             raw I/O reads
    phwrite            raw I/O writes
    pswitch            context switches
    trap               traps
    intr               device interrupts
    syscall            system calls
    sysread            read() + readv() system calls
    syswrite           write() + writev() system calls
    sysfork            forks
    sysvfork           vforks
    sysexec            execs
    readch             bytes read by rdwr()
    writech            bytes written by rdwr()
    rawch              terminal input characters
    canch              chars handled in canonical mode
    outch              terminal output characters
    msg                msg count (msgrcv()+msgsnd() calls)
    sema               semaphore ops count (semop() calls)
    namei              pathname lookups
    ufsiget            ufs_iget() calls
    ufsdirblk          directory blocks read
    ufsipage           inodes taken with attached pages
    ufsinopage         inodes taken with no attached pages
    inodeovf           inode table overflows
    fileovf            file table overflows
    procovf            proc table overflows
    intrthread         interrupts as threads (below clock)
    intrblk            intrs blkd/preempted/released (switch)
    idlethread         times idle thread scheduled
    inv_swtch          involuntary context switches
    nthreads           thread_create()s
    cpumigrate         cpu migrations by threads
    xcalls             xcalls to other cpus
    mutex_adenters     failed mutex enters (adaptive)
    rw_rdfails         rw reader failures
    rw_wrfails         rw writer failures
    modload            times loadable module loaded
    modunload          times loadable module unloaded
    bawrite            physical block writes (async)
    iowait             procs waiting for block I/O
    pgrec              page reclaims (includes pageout)
    pgfrec             page reclaims from free list
    pgin               pageins
    pgpgin             pages paged in
    pgout              pageouts
    pgpgout            pages paged out
    swapin             swapins
    pgswapin           pages swapped in
    swapout            swapouts
    pgswapout          pages swapped out
    zfod               pages zero filled on demand
    dfree              pages freed by daemon or auto
    scan               pages examined by pageout daemon
    rev                revolutions of the page daemon hand
    hat_fault          minor page faults via hat_fault()
    as_fault           minor page faults via as_fault()
    maj_fault          major page faults
    cow_fault          copy-on-write faults
    prot_fault         protection faults
    softlock           faults due to software locking req
    kernel_asflt       as_fault()s in kernel addr space
    pgrrun             times pager scheduled
    nc_hits            hits that we can really use
    nc_misses          cache misses
    nc_enters          number of enters done
    nc_dblenters       num of enters when already cached
    nc_longenter       long names tried to enter
    nc_longlook        long names tried to look up
    nc_mvtofront       entry moved to front of hash chain
    nc_purges          number of purges of cache
    flush_ctx          num of context flushes
    flush_segment      num of segment flushes
    flush_page         num of complete page flushes
    flush_partial      num of partial page flushes
    flush_usr          num of non-supervisor flushes
    flush_region       num of region flushes
    var_buf            num of I/O buffers
    var_call           num of callout (timeout) entries
    var_proc           max processes system wide
    var_maxupttl       max user processes system wide
    var_nglobpris      num of global scheduled priorities configured
    var_maxsyspri      max global priorities used by system class
    var_clist          num of clists allocated
    var_maxup          max number of processes per user
    var_hbuf           num of hash buffers to allocate
    var_hmask          hash mask for buffers
    var_pbuf           num of physical I/O buffers
    var_sptmap         size of sys virt space alloc map
    var_maxpmem        max physical memory to use in pages (if 0 use all available)
    var_autoup         min secs before a delayed-write buffer can be flushed
    var_bufhwm         high water mark of buf cache in KB
    var_xsdsegs        num of XENIX shared data segs
    var_xsdslots       num of slots in xsdtab[] per segmt
    flock_reccnt       num of records currently in use
    flock_rectot       num of records used since boot

processes (ps)
    pid                process id
    ppid               process id of parent
    pidglead           process id of process group leader
    sid                session id
    uid                real user id
    pwname             name of real user
    euid               effective user id
    epwname            name of effective user
    gid                real group id
    egid               effective group id
    size               size of process image in Kb
    rss                resident set size in Kb
    flag               process flags (system dependent)
    nlwp               number of lightweight processes within this process
    tty                controlling tty device
    %cpu               % of recent cpu time
    %mem               % of system memory
    start              process start time from epoch
    time               total cpu time for this process
    childtime          total cpu time for reaped child processes
    nice               nice level for scheduling
    syscall            system call number (if in kernel)
    pri                priority (high value=high priority)
    wchan              wait address for sleeping process
    wstat              if zombie the wait() status
    cmd                command/name of exec'd file
    args               full command string
    user_t             user level cpu time
    sys_t              sys call cpu time
    otrap_t            other system trap cpu time
    textfault_t        text page fault sleep time
    datafault_t        data page fault sleep time
    kernelfault_t      kernel page fault sleep time
    lockwait_t         user lock wait sleep time
    osleep_t           all other sleep time
    waitcpu_t          wait-cpu (latency) time
    stop_t             stopped time
    minfaults          minor page faults
    majfaults          major page faults
    nswaps             number of swaps
    inblock            input blocks
    outblock           output blocks
    msgsnd             messages sent
    msgrcv             messages received
    sigs               signals received
    volctx             voluntary context switches
    involctx           involuntary context switches
    syscalls           system calls
    chario             characters read and written
    pendsig            set of process pending signals
    heap_vaddr         virtual address of process heap
    heap_size          size of process heap in bytes
    stack_vaddr        virtual address of process stack
    stack_size         size of process stack in bytes

storage (io)
    device             device name
    nread              number of bytes read
    nwritten           number of bytes written
    reads              number of read operations
    writes             number of write operations
    wait_t             cumulative wait (pre-service) time
    wait_len_t         cumulative wait length*time product
    run_t              cumulative run (service) time
    run_len_t          cumulative run length*time product
    wait_cnt           wait count
    run_cnt            run count

system values (names)
    name               name
    vname              value name
    value              value

system timers (timer)
    kname              timer name
    name               event name
    nevents            number of events
    elapsed_t          cumulative elapsed time
    min_t              shortest event duration
    max_t              longest event duration
    start_t            previous event start time
    stop_t             previous event stop time

uptime (up)
    uptime             uptime in secs
    boot               time of boot in secs from epoch
    suspend            secs suspended
    vendor             vendor name
    model              model name
    nproc              number of processors
    mhz                processor clock speed
    cache              size of cache in kb
    fpu                floating point unit available

downtime (down)
    lastup             time last alive in secs from epoch
    boot               time of boot in secs from epoch
    downtime           secs unavailable

hardware interrupts (intr)
    name               device name
    hard               interrupt from hardware device
    soft               interrupt self induced by system
    watchdog           interrupt from periodic timer
    spurious           interrupt for unknown reason
    multisvc           multiple servicing during single interrupt




Fat Headed Array Format

The Fat Headed Array (FHA) is a table of information designed for transportation between systems and external representation. Habitat uses FHAs when reading clockwork's central store and for I/O work with harvest. It is also used when loading data into harvest's repository when linking with other systems.

There are three parts to the data in the following order:

    1. Column names: tab separated attribute names that must be in line 1

    2. Info Rows: zero or more lines of meta information for each attribute. Each meta record takes one line, separating its fields with tabs which must be in the same order as the attribute names. Each info row is named with a trailing supplementary field, such that these rows have one more column than the data or column name rows. Info rows are terminated with a row which must start with two dashes '--'.

    3. Data rows: one or more rows of data following the column names and info rows.

The following is an example of a FHA.
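
(The listing below is an illustrative sketch rather than output from habitat; the column names, info row contents and data values are hypothetical, and the tab separators are shown here as aligned spaces.)

    pid         pcpu                    cmd
    process id  % of recent cpu time    command of process     info
    u32         nano                    str                    type
    1           ""                      ""                     key
    ""          %cpu                    ""                     name
    --
    1098        0.2                     clockwork
    2781        1.4                     ghabitat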

To represent an empty value, two quotes should be used (“”). Occasionally, this may also be represented as a single dash.

To represent a value that contains tab characters (\t), the value should be contained in quotes (eg “embedded \t tab”).

Column names may only contain characters accepted by SQL database servers to ensure compatibility. This is generally accepted as the range [A-Za-z_]. See the info row name below for greater expression.

In addition to pure character formatting, ghabitat, the graphical client, also understands certain named info rows.

  1. info – the text in the row is used as informational help in the client. This can be seen when hovering the mouse over a column name in a table or a curve's button when displaying charts. The information is contained in graphical 'pop-ups' or 'tool tips'.

  2. max – Optional value which, if present, sets the maximum expected value for an attribute. This helps in making charts more understandable.

  3. type – the data type of the column. In version 1, the data types are relatively simple:-

    1. i32 – 32-bit signed integer

    2. u32 – 32-bit unsigned integer

    3. i64 – 64-bit signed integer

    4. u64 – 64-bit unsigned integer

    5. nano – nanosecond precision when used for timers. Currently also used for floating point values with more restrictive accuracy

    6. str – string value

  4. key – The column that is the primary key of the table contains a 1; all the other values contain no value (“”). May be expanded to show secondary or tertiary keys in later versions.

  5. name – The unrestricted full name of the column, if it is not possible to express it in the column name. If blank, the column name is used as the attribute's label. This is used to include punctuation characters such as '-' or '%' in the label as they are disallowed by the SQL naming standard but can be very useful for compact expression.

Job Table Format

Clockwork reads a job table and uses the information to establish repeating and timed jobs. It is similar to the Unix scheduler cron, but with greater flexibility.

When first run, clockwork bootstraps an initial version of its jobs from the file lib/clockwork.jobs. The resulting table is stored in the ringstore location var/<hostname>.rs,jobs,0. Subsequent runs of clockwork will use this table, so any amendments should be made using habedit on the ringstore route.
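
For example, assuming habedit accepts a route argument in the same way as the other hab* commands, the public job table might be edited with:

habedit rs:var/<hostname>.rs,jobs,0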

Clockwork may also be started with an alternate job table by using the -j switch to clockwork. In this mode, clockwork runs a private data collector without starting a network service for the whole machine.

Jobs are defined in a multi columned text format, headed by the magic string 'job 1'. Comments may appear anywhere, starting with '#' and running to the end of the line. Each job is defined on a single line containing 11 arguments, which in order are:-

    1. start – when to start the job, in seconds from the starting of clockwork

    2. period – how often to repeat the job, in seconds

    3. phase – not yet implemented

    4. count – how many times the job should be run, with 0 repeating forever

    5. name – name of the job

    6. requester – who requested the job, by convention the email address

    7. results – the route where results should be sent

    8. errors – the route where errors should be sent

    9. nslots – the number of slots created in the 'results' and 'errors' routes, if applicable (applies to ringstore and SQL ringstore).

    10. method – the job method

    11. command – the arguments given to each method


Part of a job table taken from the default file lib/clockwork.jobs is printed below. The top line runs the sys probe every 60 seconds, gathering system data (which becomes the system node in the choice tree). The remaining lines use the collected high frequency data and transform it to lower frequencies using an averaging process, running every five minutes, fifteen minutes and one hour (300, 900 and 3600 seconds).
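
The original listing is not reproduced in this copy; the sketch below is an illustrative reconstruction that follows the eleven-column format described above, with hypothetical job names, requester addresses and routes (%h expands to the hostname and %d to the ring duration):

job 1
# start period phase count name   requester results              errors                     nslots method command
0       60     0     0     sys    hab@%h    rs:var/%h.rs,sys,%d  rs:var/%h.rs,err_sys,%d    1000   probe  sys
0       300    0     0     sys5m  hab@%h    rs:var/%h.rs,sys,%d  rs:var/%h.rs,err_sys5m,%d  288    sample avg rs:var/%h.rs,sys,60
0       900    0     0     sys15m hab@%h    rs:var/%h.rs,sys,%d  rs:var/%h.rs,err_sys15m,%d 672    sample avg rs:var/%h.rs,sys,60
0       3600   0     0     sys1h  hab@%h    rs:var/%h.rs,sys,%d  rs:var/%h.rs,err_sys1h,%d  744    sample avg rs:var/%h.rs,sys,60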



The job running every 300 seconds creates a storage ring with 288 entries, allowing a full day's data at five minute intervals to be collected. The other jobs collect seven days at 15 minutes and one month at hourly intervals.

The methods are probe, which is given the command sys, and sample, which takes the command avg and the route to sample as its arguments. These methods are available on the command line using habmeth. The probe data is similarly available using the command habprobe.

Note that special tokens are used which expand when clockwork is running. These are %d for the ring's duration and %h for the hostname. Other tokens are available which are explored in the Administration manual.

Pattern Matching Table Format

The pattern matching table is user configurable, using the processing mechanism described earlier in this document.

The pattern-matching table, which defines the behaviour, has the following columns:

    1. pattern – the regular expression to look for as a pattern, which should normally match a single line. Each match is considered an event.

    2. embargo time – number of seconds that must elapse after the first event before another event may be raised of the same pattern from the same route.

    3. embargo count – maximum number of identical pattern matches that can take place before another event is raised for that pattern and route.

    4. severity – importance of the event. One of: fatal, error, warning, info, diag, debug

    5. action method – event's execution method

    6. action arguments – method specific arguments

    7. action message – text message to describe the event. It may contain special tokens of the form %<char> that describe other information about the event.

When a match is detected and is not subject to embargo, an event is raised. A text message is prepared, which is turned into an instruction using the action method and arguments. It is then appended to the event ring for execution (see below).
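
As an illustrative sketch only (the pattern, method name, arguments and message are hypothetical, not taken from a real configuration), a single pattern table entry might contain fields such as:

    pattern            panic|oops
    embargo time       600
    embargo count      10
    severity           error
    action method      exec
    action arguments   /usr/local/bin/notify-admin
    action message     kernel fault reported in watched log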

Watched Sources Table Format

The watched sources table defines a set of routes associated with an identifier. A watching job then ties together a set of sources with a set of patterns and executes them periodically. When the watching job starts, it checks all the routes defined in this table for changes in size. Those that have changed will be checked for pattern matches (see the details above).

The format of the table is simple: one entry per line, with each being a valid route format.
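
For example, a hypothetical watched sources table monitoring two log files could simply contain:

    file:/var/log/messages
    file:/var/adm/syslog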

Event Table Format

The event table is filled from the activities of pattern matching and threshold detection. When there is a match not covered by an embargo, an event will be raised, which is an instruction to execute a method supported by the habitat job environment. The instructions are queued as separate sequences in the event ring, which is stored in the system ringstore.

The table format is simple:-

    1. method – execution method as supported by the habitat job environment.

    2. command – command to give to method

    3. arguments – command arguments, which may contain spaces. The '%' character must be escaped if it is to be used in an argument (see below)

    4. stdin – input text to the method, which must be introduced with '%' to separate it from the argument. Successive % characters represent new lines. To actually print %, escape it with backslash (\%).
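
As an illustrative sketch only (the method and command names are hypothetical), a single event entry might carry fields such as:

    method      exec
    command     /usr/local/bin/notify-admin
    arguments   disk nearly full on myhost
    stdin       %filesystem /var has exceeded its threshold%please investigate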

When an event has been completed, the next sequence to be processed is stored in a state table. The event ring has a finite length, so old events will be removed automatically over time.