|
|
Line 21: |
Line 21: |
| | | |
| 3. service gmond restart | | 3. service gmond restart |
− |
| |
− |
| |
− | <pre><nowiki>
| |
− |
| |
− | Name
| |
− | ganglia - distributed monitoring system
| |
− |
| |
− | Version
| |
− | ganglia 3.1.2
| |
− |
| |
− | The latest version of this software and document will always be found at
| |
− | http://ganglia.sourceforge.net/. You are currently reading $Revision:
| |
− | 1705 $ of this document.
| |
− |
| |
− | Synopsis
| |
− | ______ ___
| |
− | / ____/___ _____ ____ _/ (_)___ _
| |
− | / / __/ __ `/ __ \/ __ `/ / / __ `/
| |
− | / /_/ / /_/ / / / / /_/ / / / /_/ /
| |
− | \____/\__,_/_/ /_/\__, /_/_/\__,_/
| |
− | /____/ Distributed Monitoring System
| |
− |
| |
− | Ganglia is a scalable distributed monitoring system for high-performance
| |
− | computing systems such as clusters and Grids. It is based on a
| |
− | hierarchical design targeted at federations of clusters. It relies on a
| |
− | multicast-based listen/announce protocol to monitor state within
| |
− | clusters and uses a tree of point-to-point connections amongst
| |
− | representative cluster nodes to federate clusters and aggregate their
| |
− | state. It leverages widely used technologies such as XML for data
| |
− | representation, XDR for compact, portable data transport, and RRDtool
| |
− | for data storage and visualization. It uses carefully engineered data
| |
− | structures and algorithms to achieve very low per-node overheads and
| |
− | high concurrency. The implementation is robust, has been ported to an
| |
− | extensive set of operating systems and processor architectures, and is
| |
− | currently in use on over 500 clusters around the world. It has been used
| |
− | to link clusters across university campuses and around the world and can
| |
− | scale to handle clusters with 2000 nodes.
| |
− |
| |
− | The ganglia system is comprised of two unique daemons, a PHP-based web
| |
− | frontend and a few other small utility programs.
| |
− |
| |
− | Ganglia Monitoring Daemon (gmond)
| |
− | Gmond is a multi-threaded daemon which runs on each cluster node you
| |
− | want to monitor. Installation is easy. You don't have to have a
| |
− | common NFS filesystem or a database backend, install special
| |
− | accounts, maintain configuration files or other annoying hassles.
| |
− |
| |
− | Gmond has four main responsibilities: monitor changes in host state,
| |
− | announce relevant changes, listen to the state of all other ganglia
| |
− | nodes via a unicast or multicast channel and answer requests for an
| |
− | XML description of the cluster state.
| |
− |
| |
− | Each gmond transmits information in two different ways:
| |
− | unicasting/multicasting host state in external data representation
| |
− | (XDR) format using UDP messages or sending XML over a TCP
| |
− | connection.
| |
− |
| |
− | Ganglia Meta Daemon (gmetad)
| |
− | Federation in Ganglia is achieved using a tree of point-to-point
| |
− | connections amongst representative cluster nodes to aggregate the
| |
− | state of multiple clusters. At each node in the tree, a Ganglia Meta
| |
− | Daemon ("gmetad") periodically polls a collection of child data
| |
− | sources, parses the collected XML, saves all numeric, volatile
| |
− | metrics to round-robin databases and exports the aggregated XML over
| |
− | a TCP socket to clients. Data sources may be either "gmond"
| |
− | daemons, representing specific clusters, or other "gmetad" daemons,
| |
− | representing sets of clusters. Data sources use source IP addresses
| |
− | for access control and can be specified using multiple IP addresses
| |
− | for failover. The latter capability is natural for aggregating data
| |
− | from clusters since each "gmond" daemon contains the entire state of
| |
− | its cluster.
| |
− |
| |
− | Ganglia PHP Web Frontend
| |
− | The Ganglia web frontend provides a view of the gathered information
| |
− | via real-time dynamic web pages. Most importantly, it displays
| |
− | Ganglia data in a meaningful way for system administrators and
| |
− | computer users. Although the web frontend to ganglia started as a
| |
− | simple HTML view of the XML tree, it has evolved into a system that
| |
− | keeps a colorful history of all collected data.
| |
− |
| |
− | The Ganglia web frontend caters to system administrators and users.
| |
− | For example, one can view the CPU utilization over the past hour,
| |
− | day, week, month, or year. The web frontend shows similar graphs for
| |
− | Memory usage, disk usage, network statistics, number of running
| |
− | processes, and all other Ganglia metrics.
| |
− |
| |
− | The web frontend depends on the existence of the "gmetad" which
| |
− | provides it with data from several Ganglia sources. Specifically,
| |
− | the web frontend will open the local port 8651 (by default) and
| |
− | expects to receive a Ganglia XML tree. The web pages themselves are
| |
− | highly dynamic; any change to the Ganglia data appears immediately
| |
− | on the site. This behavior leads to a very responsive site, but
| |
− | requires that the full XML tree be parsed on every page access.
| |
− | Therefore, the Ganglia web frontend should run on a fairly powerful,
| |
− | dedicated machine if it presents a large amount of data.
| |
− |
| |
− | The Ganglia web frontend is written in the PHP scripting language,
| |
− | and uses graphs generated by "gmetad" to display history
| |
− | information. It has been tested on many flavours of Unix (primarily
| |
− | Linux) with the Apache webserver and the PHP module (4.1 or later).
| |
− |
| |
− | Installation
| |
− | The latest version of all ganglia software can always be downloaded from
| |
− | http://ganglia.info/
| |
− |
| |
− | Ganglia runs on Linux (i386, ia64, sparc, alpha, powerpc, m68k, mips,
| |
− | arm, hppa, s390), FreeBSD, NetBSD, OpenBSD, DragonflyBSD, MacOS X,
| |
− | Solaris, AIX, IRIX, Tru64, HPUX and Windows NT/XP/2000/2003/2008 making
| |
− | it as portable as it is scalable.
| |
− |
| |
− | Monitoring Core Installation
| |
− | If you use the Linux RPMs provided on the ganglia web site, you can skip
| |
− | to the end of this section.
| |
− |
| |
− | Ganglia uses the GNU autoconf so compilation and installation of the
| |
− | monitoring core is basically
| |
− |
| |
− | % ./configure
| |
− | % make
| |
− | % make install
| |
− |
| |
− | but there are some issues that you need to take a look at first.
| |
− |
| |
− | Kernel multicast support
| |
− | If you use the ganglia multicast support, you must have a kernel
| |
− | that supports multicast. The vast majority of machines have
| |
− | multicast support by default. If you have problems with ganglia,
| |
− | check kernel multicast support first.
| |
− |
| |
− | Gmetad is not installed by default
| |
− | Since "gmetad" relies on the Round-Robin Database Tool ( see
| |
− | http://www.rrdtool.org/ ) it will not be compiled unless you
| |
− | explicitly request it by using the --with-gmetad flag.
| |
− |
| |
− | % ./configure --with-gmetad
| |
− |
| |
− | The configure script will fail if it cannot find the rrdtool library
| |
− | and header files. By default, it expects to find them at
| |
− | /usr/include/rrd.h and /usr/lib/librrd.so. If you installed them in
| |
− | different locations then you need to instruct configure where to
| |
− | find them using:
| |
− |
| |
− | % ./configure --with-librrd=/rrd/path --with-gmetad
| |
− |
| |
− | Of course, you need to substitute "/rrd/path" with the real location
| |
− | of the rrd tool directory where the header file can be located
| |
− | inside an include subdirectory and the library can be located inside
| |
− | a lib subdirectory. As an alternative you could set "-L" in LDFLAGS,
| |
− | and "-I" in CFLAGS and CPPFLAGS for the library path and the header
| |
− | path respectively.
| |
− |
| |
− | AIX should not be compiled with shared libraries
| |
− | You must add the "--disable-shared" configure flag if you are
| |
− | running on AIX. For more details refer to the README.AIX file.
| |
− |
| |
− | % ./configure --disable-shared
| |
− |
| |
− | Solaris dependencies could be problematic
| |
− | Not really a Solaris specific problem, but since Solaris has several
| |
− | different package repositories, all of them unofficial, it is
| |
− | difficult to be sure that all possible permutations have been
| |
− | confirmed to work reliably.
| |
− |
| |
− | Be sure to have all dependencies covered, as explained in the
| |
− | INSTALL file and to use GNU make and a gcc compiler that builds
| |
− | 32bit binaries with all other libraries matching that ISA.
| |
− |
| |
− | When in doubt, build the problematic dependency from source and
| |
− | remember to distribute it together with your ganglia build as
| |
− | everything is dynamically linked by default.
| |
− |
| |
− | Be particularly careful with libConfuse, especially if using the old
| |
− | 2.5 version. LibConfuse 2.5 is known to be incorrectly packaged and
| |
− | to compile by default as a static library which will fail to link
| |
− | with ganglia.
| |
− |
| |
− | Proprietary *NIX systems might not work at all
| |
− | The good news is that the libmetrics code that used to work before
| |
− | 3.1 is still most likely working fine and so there is nothing
| |
− | fundamentally broken about it.
| |
− |
| |
− | But the bad news is that in order to add the dynamic metric
| |
− | functionality, the build system and the way gmond used to locate its
| |
− | metrics had to be changed significantly. Therefore getting gmond to
| |
− | build and work again required fixes to be implemented for all
| |
− | platforms.
| |
− |
| |
− | Since none of the developers had access to HPUX, IRIX, Tru64
| |
− | (OSF/1), or Darwin (MacOS X) those platforms might not be able to
| |
− | build or run a 3.1 gmond yet. If you have access to any of these
| |
− | platforms and want to run ganglia 3.1, feel free to drop by the
| |
− | ganglia-developers list with suggestions, or even better patches.
| |
− |
| |
− | GEXEC confusion
| |
− | GEXEC is a scalable cluster remote execution system which provides
| |
− | fast, RSA authenticated remote execution of parallel and distributed
| |
− | jobs. It provides transparent forwarding of stdin, stdout, stderr,
| |
− | and signals to and from remote processes, provides local environment
| |
− | propagation, and is designed to be robust and to scale to systems
| |
− | over 1000 nodes. Internally, GEXEC operates by building an n-ary
| |
− | tree of TCP sockets and threads between gexec daemons and
| |
− | propagating control information up and down the tree. By using
| |
− | hierarchical control, GEXEC distributes both the work and resource
| |
− | usage associated with massive amounts of parallelism across multiple
| |
− | nodes, thereby eliminating problems associated with single node
| |
− | resource limits (e.g., limits on the number of file descriptors on
| |
− | front-end nodes). (from http://www.theether.org/gexec )
| |
− |
| |
− | "gexec" is a great cluster execution tool but integrating it with
| |
− | ganglia is a bit clumsy. GEXEC can run standalone without access to
| |
− | a ganglia "gmond". In standalone mode gexec will use the hosts
| |
− | listed in your GEXEC_SVRS variable to run on. For example, say I
| |
− | want to run "hostname" on three machines in my cluster: "host1",
| |
− | "host2" and "host3". I use the following command line.
| |
− |
| |
− | % GEXEC_SVRS="host1 host2 host3" gexec -n 3 hostname
| |
− |
| |
− | and gexec would build an n-ary tree (binary tree by default) of TCP
| |
− | sockets to those machines and run the command "hostname"
| |
− |
| |
− | As an added feature, you can have "gexec" pull a host list from a
| |
− | locally running gmond and use that as the host list instead of
| |
− | GEXEC_SVRS. The list is load balanced and "gexec" will start the job
| |
− | on the *n* least-loaded machines.
| |
− |
| |
− | For example:
| |
− |
| |
− | % gexec -n 5 hostname
| |
− |
| |
− | will run the command "hostname" on the five least-loaded machines in
| |
− | a cluster.
| |
− |
| |
− | To turn on the "gexec" feature in ganglia you must configure ganglia
| |
− | with the "--enable-gexec" flag
| |
− |
| |
− | % ./configure --enable-gexec
| |
− |
| |
− | Enabling "gexec" means that by default any host running gmond will
| |
− | send a special message announcing that gexec is installed on it and
| |
− | open for requests.
| |
− |
| |
− | Now the question is, what if I don't want gexec to run on every host
| |
− | in my cluster? For example, you may not want to have "gexec" run
| |
− | jobs on your cluster frontend nodes.
| |
− |
| |
− | You simply add the following line to your "gmond" configuration file
| |
− | ("/etc/ganglia/gmond.conf" by default)
| |
− |
| |
− | no_gexec on
| |
− |
| |
− | Simple huh? I know the configuration file option, "no_gexec", seems
| |
− | crazy (and it is). Why have an option that says "yes to no gexec"?
| |
− | The early versions of gmond didn't use a configuration file but
| |
− | instead commandline options. One of the commandline options was
| |
− | simply "--no-gexec" and the default was to announce gexec as on.
| |
− |
| |
− | Once you have successfully run
| |
− |
| |
− | % ./configure <options>
| |
− | % make
| |
− | % make install
| |
− |
| |
− | you should find the following files installed in "/usr" (by default).
| |
− |
| |
− | /usr/bin/gstat
| |
− | /usr/bin/gmetric
| |
− | /usr/sbin/gmond
| |
− | /usr/sbin/gmetad
| |
− |
| |
− | If you installed ganglia using RPMs then these files will be installed
| |
− | when you install the RPM. The RPM is installed simply by running
| |
− |
| |
− | % rpm -Uvh ganglia-gmond-3.1.2.i386.rpm
| |
− | % rpm -Uvh ganglia-gmetad-3.1.2.i386.rpm
| |
− |
| |
− | Once you have the necessary binaries installed, you can test your
| |
− | installation by running
| |
− |
| |
− | % ./gmond
| |
− |
| |
− | This will start the ganglia monitoring daemon. You should then be able
| |
− | to run
| |
− |
| |
− | % telnet localhost 8649
| |
− |
| |
− | And get an XML description of the state of your machine (and any other
| |
− | hosts running gmond at the time).
| |
− |
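The same check can be scripted instead of done by hand with telnet. The following is a minimal Python sketch, not part of the ganglia distribution: it assumes gmond is listening on TCP port 8649 and that the XML tree uses HOST elements containing METRIC elements with NAME and VAL attributes. The function names fetch_xml and list_hosts are made up for this illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=5.0):
    """Read everything gmond writes on its TCP port until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def list_hosts(xml_bytes):
    """Return {host name: {metric name: value string}} from a Ganglia XML tree.

    Assumes HOST elements with a NAME attribute and METRIC children
    carrying NAME and VAL attributes.
    """
    root = ET.fromstring(xml_bytes)
    hosts = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        hosts[host.get("NAME")] = metrics
    return hosts
```

With a gmond running locally, list_hosts(fetch_xml()) should return one entry per host the daemon currently knows about.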
| |
− | If you are installing by source on Linux, scripts are provided to start
| |
− | "gmetad" and "gmond" at system startup. They are easy to install from
| |
− | the source root.
| |
− |
| |
− | % cp ./gmond/gmond.init /etc/rc.d/init.d/gmond
| |
− | % chkconfig --add gmond
| |
− | % chkconfig --list gmond
| |
− | gmond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
| |
− | % /etc/rc.d/init.d/gmond start
| |
− | Starting GANGLIA gmond: [ OK ]
| |
− |
| |
− | Repeat this step with gmetad.
| |
− |
| |
− | PHP Web Frontend Installation
| |
− | 1. The ./web directory of the ganglia distribution contains all the
| |
− | necessary PHP files for running your web frontend. Copy those files
| |
− | to "/var/www/html", however look for the variable "DocumentRoot" in
| |
− | your Apache configuration files to be sure. All the PHP script files
| |
− | use relative URLs in their links, so you may place the "ganglia/"
| |
− | directory anywhere convenient.
| |
− |
| |
− | 2. Ensure your webserver understands how to process PHP script files.
| |
− | Currently, the web frontend contains certain php language that
| |
− | requires PHP version 4 or greater. Processing PHP script files
| |
− | usually requires a webserver module, such as the "mod_php" for the
| |
− | popular Apache webserver. In RedHat Linux, the RPM package that
| |
− | provides this module is called simply "php".
| |
− |
| |
− | For Apache, the "mod_php" module must be enabled. The following lines
| |
− | should appear somewhere in Apache's *conf files. This example
| |
− | applies to RedHat and Mandrake Linux. The actual filenames may vary
| |
− | on your system. If you installed the php module using an RPM
| |
− | package, this work will have been done automatically.
| |
− |
| |
− | <IfDefine HAVE_PHP4>
| |
− | LoadModule php4_module extramodules/libphp4.so
| |
− | AddModule mod_php4.c
| |
− | </IfDefine>
| |
− |
| |
− | AddType application/x-httpd-php .php .php4 .php3 .phtml
| |
− | AddType application/x-httpd-php-source .phps
| |
− |
| |
− | 3. The webfrontend requires the existence of the gmetad package on the
| |
− | webserver. Follow the installation instructions on the gmetad page.
| |
− | Specifically, the webfrontend requires the rrdtool and the "rrds/"
| |
− | directory from gmetad. If you are a power user, you may use NFS to
| |
− | simulate the local existence of the rrds.
| |
− |
| |
− | 4. Test your installation. Visit the URL:
| |
− |
| |
− | http://localhost/ganglia/
| |
− |
| |
− | With a web-browser, where localhost is the address of your
| |
− | webserver.
| |
− |
| |
− | Installation of the web frontend is simplified on Linux by using rpm.
| |
− |
| |
− | % rpm -Uvh ganglia-web-3.1.2-1.i386.rpm
| |
− | Preparing... ########################################### [100%]
| |
− | 1:ganglia-web ########################################### [100%]
| |
− |
| |
− | Configuration
| |
− | Gmond Configuration
| |
− | The configuration file format has changed between gmond version 2.5.x
| |
− | and version 3.x. The change was necessary in order to allow more complex
| |
− | configuration options.
| |
− |
| |
− | Gmond has a default configuration it will use if it does not find the
| |
− | default configuration file /etc/ganglia/gmond.conf. To see the default
| |
− | configuration simply run the command:
| |
− |
| |
− | % gmond --default_config
| |
− |
| |
− | and gmond will output its default configuration to stdout. This default
| |
− | configuration can serve as a good starting place for building a more
| |
− | custom configuration.
| |
− |
| |
− | % gmond --default_config > gmond.conf
| |
− |
| |
− | would create a file gmond.conf which you can then edit to taste and copy
| |
− | to /etc/ganglia/gmond.conf or elsewhere.
| |
− |
| |
− | To start gmond with a configuration file other than
| |
− | /etc/ganglia/gmond.conf, simply specify the configuration file location
| |
− | by running
| |
− |
| |
− | % gmond --config /my/ganglia/configs/custom.conf
| |
− |
| |
− | If you want to convert a 2.5.x configuration file to 3.x file format,
| |
− | run the following command
| |
− |
| |
− | % gmond --convert ./old_25_config.conf
| |
− |
| |
− | and gmond will output the equivalent 3.x configuration file to stdout.
| |
− | You can then redirect that output to a new configuration file which can
| |
− | serve as a starting point for your configuration.
| |
− |
| |
− | % gmond --convert ./old_25_config.conf > ./new_26_config.conf
| |
− |
| |
− | For details about gmond configuration options, simply run
| |
− |
| |
− | % man gmond.conf
| |
− |
| |
− | for a complete listing of options with detailed explanations.
| |
− |
| |
− | Gmetad Configuration
| |
− | The behavior of the Ganglia Meta Daemon is completely controlled by a
| |
− | single configuration file which is by default
| |
− | "/etc/ganglia/gmetad.conf". For gmetad to do anything useful you must
| |
− | specify at least one "data_source" in the configuration. The format of
| |
− | the data_source line is as follows
| |
− |
| |
− | data_source "Cluster A" 127.0.0.1 1.2.3.4:8655 1.2.3.5:8625
| |
− | data_source "Cluster B" 1.2.4.4:8655
| |
− |
| |
− | In this example, there are two unique data sources: "Cluster A" and
| |
− | "Cluster B". The Cluster A data source has three redundant sources. If
| |
− | gmetad cannot pull the data from the first source, it will continue
| |
− | trying the other sources in order.
| |
− |
| |
− | If you do not specify a port number, gmetad will assume the default
| |
− | ganglia port which is 8649 (U*N*I*X on a phone key pad)
| |
− |
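To make the data_source format and the default-port rule concrete, here is a small Python sketch that splits such a line into a name and a list of (host, port) pairs. It is an illustration only, not gmetad's own parser: it handles just the form shown above (a quoted name followed by host or host:port entries), and the name parse_data_source is hypothetical.

```python
import shlex

DEFAULT_PORT = 8649  # default ganglia port, as stated in the text above


def parse_data_source(line):
    """Split a data_source line into (name, [(host, port), ...]).

    Only the form shown in the example is handled; anything else that
    gmetad may accept on this line is outside the scope of this sketch.
    """
    tokens = shlex.split(line)
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError('expected: data_source "name" host[:port] ...')
    name, addresses = tokens[1], tokens[2:]
    sources = []
    for address in addresses:
        host, sep, port = address.partition(":")
        sources.append((host, int(port) if sep else DEFAULT_PORT))
    return name, sources
```

The order of the returned list matches the order on the line, which is the order in which the redundant sources would be tried.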
| |
− | For a sample gmetad configuration file with comments, look at the
| |
− | gmetad.conf file provided as part of the distribution package in the
| |
− | gmetad directory
| |
− |
| |
− | "gmetad" has a "--conf" option to allow you to specify alternate
| |
− | configuration files
| |
− |
| |
− | % ./gmetad --conf=/tmp/my_custom_config.conf
| |
− |
| |
− | PHP Web Frontend Configuration
| |
− | Most configuration parameters reside in the "ganglia/conf.php" file.
| |
− | Here you may alter the template, gmetad location, RRDtool location, and
| |
− | set the default time range and metrics for graphs.
| |
− |
| |
− | The static portions of the Ganglia website are themable. This means you
| |
− | can alter elements such as section labels, some links, and images to
| |
− | suit your individual tastes and environment. The "template_name"
| |
− | variable names a directory containing the current theme. Ganglia uses
| |
− | TemplatePower to implement themes. A user-defined skin must conform to
| |
− | the template interface as defined by the default theme. Essentially, the
| |
− | variable names and START/END blocks in a custom theme must remain the
| |
− | same as the default, but all other HTML elements may be changed.
| |
− |
| |
− | Other configuration variables in "conf.php" specify the location of
| |
− | gmetad's files, and where to find the rrdtool program. These locations
| |
− | need only be changed if you do not run gmetad on the webserver.
| |
− | Otherwise the default locations should work fine. The "default_range"
| |
− | variable specifies what range of time to show on the graphs by default,
| |
− | with possible values of hour, day, week, month, year. The
| |
− | "default_metric" parameter specifies which metric to show on the cluster
| |
− | view page by default.
| |
− |
| |
− | Commandline Tools
| |
− | There are two commandline tools that work with "gmond" to add custom
| |
− | metrics and query the current state of a cluster: "gmetric" and "gstat"
| |
− | respectively.
| |
− |
| |
− | Gmetric
| |
− | The Ganglia Metric Tool (gmetric) allows you to easily monitor any
| |
− | arbitrary host metrics that you like, expanding on the core metrics that
| |
− | gmond measures by default.
| |
− |
| |
− | If you want help with the gmetric syntax, simply use the "help"
| |
− | commandline option
| |
− |
| |
− | % gmetric --help
| |
− | gmetric 3.1.2
| |
− |
| |
− | Purpose:
| |
− | The Ganglia Metric Client (gmetric) announces a metric
| |
− | on the list of defined send channels defined in a configuration file
| |
− |
| |
− | Usage: gmetric [OPTIONS]...
| |
− |
| |
− | -h, --help Print help and exit
| |
− | -V, --version Print version and exit
| |
− | -c, --conf=STRING The configuration file to use for finding send channels
| |
− | (default=`/etc/ganglia/gmond.conf')
| |
− | -n, --name=STRING Name of the metric
| |
− | -v, --value=STRING Value of the metric
| |
− | -t, --type=STRING Either
| |
− | string|int8|uint8|int16|uint16|int32|uint32|float|double
| |
− | -u, --units=STRING Unit of measure for the value e.g. Kilobytes, Celcius
| |
− | (default=`')
| |
− | -s, --slope=STRING Either zero|positive|negative|both (default=`both')
| |
− | -x, --tmax=INT The maximum time in seconds between gmetric calls
| |
− | (default=`60')
| |
− | -d, --dmax=INT The lifetime in seconds of this metric (default=`0')
| |
− | -S, --spoof=STRING IP address and name of host/device (colon separated) we
| |
− | are spoofing (default='')
| |
− | -H, --heartbeat spoof a heartbeat message (use with spoof option)
| |
− |
| |
− | Gmetric sends the metric specified on the commandline to all
| |
− | udp_send_channels specified in the configuration file
| |
− | /etc/ganglia/gmond.conf by default. If you want to send metric to
| |
− | alternate udp_send_channels, you can specify a different configuration
| |
− | file as such:
| |
− |
| |
− | % gmetric --conf=./custom.conf -n "wow" -v "it works" -t "string"
| |
− |
| |
− | All metrics in ganglia have a name, value, type and optionally units.
| |
− | For example, say I wanted to measure the temperature of my CPU
| |
− | (something gmond doesn't do by default) then I could send this metric
| |
− | with name="temperature", value="63", type="int16" and units="Celsius".
| |
− |
| |
− | Assume I have a program called "cputemp" which outputs in text the
| |
− | temperature of the CPU
| |
− |
| |
− | % cputemp
| |
− | 63
| |
− |
| |
− | I could easily send this data to all listening gmonds by running
| |
− |
| |
− | % gmetric --name temperature --value `cputemp` --type int16 --units Celsius
| |
− |
| |
− | Check the exit value of gmetric to see if it successfully sent the data:
| |
− | 0 on success and -1 on failure.
| |
− |
| |
− | To constantly sample this temperature metric, you just need to add this
| |
− | command to your cron table.
| |
− |
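If the sampling is driven from a script rather than directly from cron, the gmetric call can be wrapped as follows. This Python sketch is an assumption-laden illustration: it uses only the --name, --value, --type and --units options listed in the help output above, expects gmetric to be on the PATH, and the helper names gmetric_command and send_metric are invented here.

```python
import subprocess


def gmetric_command(name, value, value_type, units=""):
    """Build the gmetric argument list for one metric sample."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", value_type]
    if units:
        cmd += ["--units", units]
    return cmd


def send_metric(name, value, value_type, units=""):
    """Run gmetric and report whether it exited with status 0 (success)."""
    result = subprocess.run(gmetric_command(name, value, value_type, units))
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting problems when a metric value contains spaces, as in the "it works" example earlier.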
| |
− | Gstat
| |
− | The Ganglia Cluster Status Tool (gstat) is a commandline utility that
| |
− | allows you to get a status report for your cluster.
| |
− |
| |
− | To get help with the commandline options, simply pass "gstat" the
| |
− | "--help" option
| |
− |
| |
− | % gstat --help
| |
− | gstat 3.1.2
| |
− |
| |
− | Purpose:
| |
− | The Ganglia Status Client (gstat) connects with a
| |
− | Ganglia Monitoring Daemon (gmond) and output a load-balanced list
| |
− | of cluster hosts
| |
− |
| |
− | Usage: gstat [OPTIONS]...
| |
− | -h --help Print help and exit
| |
− | -V --version Print version and exit
| |
− | -a --all List all hosts. Not just hosts running gexec (default=off)
| |
− | -d --dead Print only the hosts which are dead (default=off)
| |
− | -m --mpifile Print a load-balanced mpifile (default=off)
| |
− | -1 --single_line Print host and information all on one line (default=off)
| |
− | -l --list Print ONLY the host list (default=off)
| |
− | -n --numeric Print numeric addresses instead of hostnames (default=off)
| |
− | -iSTRING --gmond_ip=STRING Specify the ip address of the gmond to query (default='127.0.0.1')
| |
− | -pINT --gmond_port=INT Specify the gmond port to query (default=8649)
| |
− |
| |
− | Note: gstat with no option will only show gexec-enabled hosts. To see
| |
− | all hosts that are UP (regardless of their gexec state) you need to add
| |
− | the --all flag.
| |
− |
| |
− | % gstat --all
| |
− |
| |
− | Extending Ganglia through metric modules
| |
− | There are currently two ways in which metric modules can be written and
| |
− | plugged into Gmond in order to extend the types of metrics that Ganglia
| |
− | is able to monitor. As of Ganglia 3.1, a pluggable interface has been
| |
− | added to allow the Gmond metric gathering agent to collect any type of
| |
− | metric that can be acquired through programmatic means. The primary
| |
− | metric module interface is C with a secondary python interface. This
| |
− | means that pluggable modules can either be written and compiled into
| |
− | dynamically loadable C based language modules or written and deployed as
| |
− | python pluggable modules.
| |
− |
| |
− | The basic steps for writing a pluggable module, in either C or
| |
− | Python, are as follows:
| |
− |
| |
− | 1. Create a module definition structure that contains callback data and
| |
− | metric information
| |
− | 2. Implement 3 callback functions that will serve as the links between
| |
− | the Gmond metric gathering agent and the metric module. These callback
| |
− | functions include module initialization, metric handler and module
| |
− | cleanup.
| |
− |
| |
− | There are simple metric module examples for both a C based and a python
| |
− | based module under the gmond/modules and gmond/python_modules source
| |
− | code sub-trees. Please see these module examples for more details.
| |
− |
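As a rough illustration of the two steps above, a python module might look like the sketch below. The function names metric_init and metric_cleanup and the descriptor keys follow the shape of the example module shipped under gmond/python_modules as far as I recall it; treat them as assumptions and check them against the example in your source tree. The temperature value is a random placeholder, not a real sensor reading.

```python
import random  # stands in for a real sensor reading in this sketch


def temperature_handler(name):
    """Metric handler: return the current value for the metric called `name`."""
    return random.randint(40, 70)


def metric_init(params):
    """Module initialization: return the metric definitions this module provides.

    `params` holds the module's configuration parameters; it is unused here.
    The descriptor keys below are assumed from the bundled example module.
    """
    descriptors = [{
        "name": "temperature",
        "call_back": temperature_handler,
        "time_max": 60,
        "value_type": "uint",
        "units": "Celsius",
        "slope": "both",
        "format": "%u",
        "description": "CPU temperature (placeholder value)",
        "groups": "example",
    }]
    return descriptors


def metric_cleanup():
    """Module cleanup: nothing to release in this sketch."""
    pass
```

The three functions correspond to the three callbacks named above: initialization, metric handler and cleanup.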
| |
− | Frequently Asked Questions (FAQ)
| |
− | What metrics does ganglia collect on platform x?
| |
− | To see a complete list of the metrics that a particular gmond
| |
− | supports, run the command:
| |
− |
| |
− | % gmond -m
| |
− |
| |
− | and gmond will output all the metrics that it is capable of
| |
− | collecting and sending.
| |
− |
| |
− | This table describes all the metrics that ganglia collects and shows
| |
− | which platforms each metric is supported on. (The following table is
| |
− | only partially complete).
| |
− |
| |
− | Metric Name Description Platforms
| |
− | -----------------------------------------------------------------------
| |
− | boottime System boot timestamp l,f
| |
− | bread_sec
| |
− | bwrite_sec
| |
− | bytes_in Number of bytes in per second l,f
| |
− | bytes_out Number of bytes out per second l,f
| |
− | cpu_aidle Percent of time since boot idle CPU l
| |
− | cpu_arm
| |
− | cpu_avm
| |
− | cpu_idle Percent CPU idle l,f
| |
− | cpu_intr
| |
− | cpu_nice Percent CPU nice l,f
| |
− | cpu_num Number of CPUs l,f
| |
− | cpu_rm
| |
− | cpu_speed Speed in MHz of CPU l,f
| |
− | cpu_ssys
| |
− | cpu_system Percent CPU system l,f
| |
− | cpu_user Percent CPU user l,f
| |
− | cpu_vm
| |
− | cpu_wait
| |
− | cpu_wio
| |
− | disk_free Total free disk space l,f
| |
− | disk_total Total available disk space l,f
| |
− | load_fifteen Fifteen minute load average l,f
| |
− | load_five Five minute load average l,f
| |
− | load_one One minute load average l,f
| |
− | location GPS coordinates for host e
| |
− | lread_sec
| |
− | lwrite_sec
| |
− | machine_type
| |
− | mem_buffers Amount of buffered memory l,f
| |
− | mem_cached Amount of cached memory l,f
| |
− | mem_free Amount of available memory l,f
| |
− | mem_shared Amount of shared memory l,f
| |
− | mem_total Total amount of memory l,f
| |
− | mtu Network maximum transmission unit l,f
| |
− | os_name Operating system name l,f
| |
− | os_release Operating system release (version) l,f
| |
− | part_max_used Maximum percent used for all partitions l,f
| |
− | phread_sec
| |
− | phwrite_sec
| |
− | pkts_in Packets in per second l,f
| |
− | pkts_out Packets out per second l,f
| |
− | proc_run Total number of running processes l,f
| |
− | proc_total Total number of processes l,f
| |
− | rcache
| |
− | swap_free Amount of available swap memory l,f
| |
− | swap_total Total amount of swap memory l,f
| |
− | sys_clock Current time on host l,f
| |
− | wcache
| |
− |
| |
− | Platform key:
| |
− | l = Linux, f = FreeBSD, a = AIX, c = Cygwin
| |
− | m = MacOS, i = IRIX, h = HPUX, t = Tru64
| |
− | e = Every Platform
| |
− |
| |
− | If you are interested in how the metrics are collected, just take a
| |
− | look in directory "./libmetrics" in the source distribution. There
| |
− | is a directory for each platform that is supported.
| |
− |
| |
− | What does the error "Process XML (x): XML_ParseBuffer() error at line x:
| |
− | not well-formed" mean?
| |
− | This is an error that occurs when a ganglia component reads data
| |
− | from another ganglia component and finds that the XML is not
| |
− | well-formed. The most common time this is a problem is when the PHP
| |
− | web frontend tries to read the XML stream from gmetad.
| |
− |
| |
− | To troubleshoot this problem, capture the XML from the ganglia
| |
− | component in question (gmetad/gmond). This is easy to do if you have
| |
− | telnet installed. Simply log in to the machine running the component
| |
− | and run:
| |
− |
| |
− | % telnet localhost 8651
| |
− |
| |
− | By default, gmetad exports its XML on port 8651 and gmond exports
| |
− | its XML on port 8649. Modify the port number above to suit your
| |
− | configuration.
| |
− |
| |
− | When you connect to the port you should get an XML stream. If not,
| |
− | look in the process table on the machine to ensure that the
| |
− | component is actually running.
| |
− |
| |
− | Once you are getting an XML stream, capture it to a file by running:
| |
− |
| |
− | % telnet localhost 8651 > XML.txt
| |
− | Connection closed by foreign host.
| |
− |
| |
− | If you open the file "XML.txt", you will see the captured XML
| |
− | stream. You will need to remove the first three lines of the
| |
− | "XML.txt" which will read...
| |
− |
| |
− | Trying 127.0.0.1...
| |
− | Connected to localhost.
| |
− | Escape character is '^]'.
| |
− |
| |
− | Those lines are output from "telnet" and not the ganglia component
| |
− | (I wish telnet would send those messages to "stderr" but they are
| |
− | sent to "stdout").
| |
− |
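The capture step can also be scripted without telnet, which avoids the
three banner lines entirely. The following is a minimal Python sketch and
is not part of ganglia; the function name "fetch_xml" is an illustrative
assumption, and the default port is the gmetad default mentioned above
(use 8649 for gmond). It relies on the component closing the connection
once the XML has been sent, as the telnet session above shows.

```python
import socket


def fetch_xml(host="localhost", port=8651, timeout=10.0):
    """Read everything the component sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: the stream is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Writing the returned bytes to "XML.txt" (opened in binary mode) should give
a file that contains only the XML stream, with nothing to trim by hand.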
| |
− | There are many ways that XML can be malformed. A great tool for
| |
− | validating XML is "xmllint". "xmllint" will read the file and find
| |
− | the line containing the error.
| |
− |
| |
− | % xmllint --valid --noout XML.txt
| |
− |
| |
− | will read your captured XML stream, validate it against the ganglia
| |
− | DTD and check that it is well-formed XML. "xmllint" will quietly exit
| |
− | if there are no errors. If there are errors they will be reported
| |
− | with line numbers. For example...
| |
− |
| |
− | /tmp/XML.txt:3393: error: Opening and ending tag mismatch: HOST and CLUSTER
| |
− | </CLUSTER>
| |
− | ^
| |
− | /tmp/XML.txt:3394: error: Opening and ending tag mismatch: CLUSTER and GANGLIA_XML
| |
− | </GANGLIA_XML>
| |
− | ^
| |
− | /tmp/XML.txt:3395: error: Premature end of data in tag GANGLIA_XML
| |
− |
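If "xmllint" is not installed, a rough well-formedness check is possible
with Python's standard library. This is only a sketch under stated
assumptions: the function name "first_xml_error" is hypothetical, and
unlike "xmllint --valid" it does not validate against the ganglia DTD. It
only reports the first place where the parser gives up.

```python
import xml.etree.ElementTree as ET


def first_xml_error(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # location where the parser stopped
        return line, column, str(err)
    return None
```

Pass it the contents of the captured file; a result of None means the
parser found nothing wrong, and otherwise the line number tells you where
to start looking in "XML.txt".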
| |
− | If you get errors, open "XML.txt" and go to the line numbers in
| |
− | question. See if you can understand based on your configuration how
| |
− | these errors could occur. If you cannot fix the problem yourself,
| |
− | please email your "XML.txt" and output from "xmllint" to
| |
− | "ganglia-developers@lists.sourceforge.net". Please include
| |
− | information about the version of each component in question along
| |
− | with the operating system they are running on. The more details we
| |
− | have about your configuration the more likely it is we will be able
| |
− | to help you. Also, all mail sent to "ganglia-developers" is archived
| |
− | and available to read on the web. You may want to modify "XML.txt"
| |
− | to remove any sensitive information.
| |
− |
| |
− | How do I remove a host from the list?
| |
− | A common problem that people have is not being able to remove a host
| |
− | from the ganglia web frontend.
| |
− |
| |
− | Here is a common scenario:
| |
− |
| |
− | 1. All hosts in a cluster are sending on the ganglia udp_send_channels.
| |
− | 2. One of the hosts fails or is moved for whatever reason.
| |
− | 3. All the hosts in the cluster report that the host is "dead" or
| |
− | "expired".
| |
− | 4. The sysadmin wants to remove this host from the "dead" list.
| |
− |
| |
− | Unfortunately there is currently no nice way to remove a single dead
| |
− | host from the list. All data in gmond is soft state so you will need
| |
− | to restart all gmond and gmetad processes. It is important to note
| |
− | that ALL dead hosts will be flushed from the record by restarting
| |
− | the processes (since they have to hear the host at least once to
| |
− | know it is expired).
| |
− |
| |
− | If you add the following to gmond.conf
| |
− |
| |
− | globals {
| |
− | host_dmax = 3600
| |
− | }
| |
− |
| |
− | then hosts will be removed from host tables when they haven't been
| |
− | heard from in 3600 seconds. See "man gmond.conf" for details.
| |
− |
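The host_dmax rule described above can be modelled in a few lines. This is
an illustrative Python sketch, not gmond's actual code: the function names
are made up, the exact boundary comparison gmond uses may differ, and the
treatment of 0 as "never remove" follows the gmond.conf man page as best
recalled, so check "man gmond.conf" on your system.

```python
def is_expired(last_heard, now, host_dmax):
    """True if a host would be dropped under the host_dmax rule.

    Assumption: host_dmax == 0 means hosts are never removed.
    """
    if host_dmax == 0:
        return False
    return (now - last_heard) > host_dmax


def prune(hosts, now, host_dmax=3600):
    """Keep only the hosts heard from within the last host_dmax seconds.

    hosts maps a host name to the time (in seconds) it was last heard.
    """
    return {name: seen for name, seen in hosts.items()
            if not is_expired(seen, now, host_dmax)}
```

With host_dmax = 3600, a host last heard 4000 seconds ago is dropped while
one heard a moment ago is kept; with host_dmax = 0 both remain.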
| |
− | How good is Solaris, IRIX, Tru64 support?
| |
− | Here is an email from Steve Wagner about the state of ganglia on
| |
− | Solaris, IRIX and Tru64. Steve is to thank for porting ganglia to
| |
− | Solaris and Tru64. He also helped with the IRIX port.
| |
− |
| |
− | State of the IRIX port:
| |
− |
| |
− | * CPU percentage stuff hasn't improved despite my efforts. I fear there
| |
− | may be a flaw in the way I'm summing counters for all the CPUs.
| |
− | * Auto-detection of network interfaces apparently segfaults.
| |
− | * Memory and load reporting appear to be running properly.
| |
− | * CPU speed is not being reported properly on multi-proc machines.
| |
− | * Total/running processes are not reported.
| |
− | * gmetad untested.
| |
− | * Monitoring core apparently stable in foreground, background being tested
| |
− | (had a segfault earlier).
| |
− |
| |
− | State of the Tru64 port:
| |
− |
| |
− | * CPU percentage stuff here works perfectly.
| |
− | * Memory and swap usage stats are suspected to be inaccurate.
| |
− | * Total/running processes are not reported.
| |
− | * gmetad untested.
| |
− | * Monitoring core apparently stable in foreground and background.
| |
− |
| |
− | State of the Solaris port:
| |
− | * CPU percentages are slightly off, but correct enough for trending
| |
− | purposes.
| |
− | * Load, ncpus, CPU speed, breads/writes, lreads/writes, phreads/writes,
| |
− | and rcache/wcache are all accurate.
| |
− | * Memory/swap statistics are suspiciously flat, but local stats bear
| |
− | this out (and they *are* being updated) so I haven't investigated
| |
− | further.
| |
− | * Total processes are counted, but not running ones.
| |
− | * gmetad appears stable
| |
− |
| |
− | Anyway, all three ports I've been messing with are usable and fairly
| |
− | stable. Although there are areas for improvement I think we really can't
| |
− | keep hogging all this good stuff - what I'm looking at is ready for
| |
− | release.
| |
− |
| |
− | Where are the debian packages?
| |
− | Debian packages for 2.5 are available from the main Debian archive
| |
− | for all releases.
| |
− |
| |
− | There was never an official Debian package for 3.0 but packages for
| |
− | 3.1 are available from Debian experimental and will be available in
| |
− | the Debian archive as soon as they are stabilized.
| |
− |
| |
− | If you are interested in using them (and helping them stabilize) you
| |
− | can get them from:
| |
− |
| |
− | http://packages.debian.org/experimental/ganglia-monitor
| |
− |
| |
− | How should I configure multihomed machines?
| |
− | Here is an email that Matt Massie sent to a user having problems
| |
− | with multihomed machines
| |
− |
| |
− | i need to add a section in the documentation talking about this since it
| |
− | seems to be a common question.
| |
− |
| |
− | when you use...
| |
− |
| |
− | mcast_if eth1
| |
− |
| |
− | .. in /etc/ganglia/gmond.conf that tells gmond to send its data out the "eth1"
| |
− | network interface but that doesn't necessarily mean that the source
| |
− | address of the packets will match the "eth1" interface. to make sure that
| |
− | data sent out eth1 has the correct source address run the following...
| |
− |
| |
− | % route add -host 239.2.11.71 dev eth1
| |
− |
| |
− | ... before starting gmond. that should do the trick for you.
| |
− |
| |
− | -matt
| |
− |
| |
− | > I have seen some post related to some issues
| |
− | > with gmond + multicast running on a dual nic
| |
− | > frontend.
| |
− | >
| |
− | > Currently I am experiencing a weird behavior
| |
− | >
| |
− | > I have the following setup:
| |
− | >
| |
− | > -----------------------
| |
− | > | web server + gmetad |
| |
− | > -----------------------
| |
− | > |
| |
− | > |
| |
− | > |
| |
− | > ----------------------
| |
− | > | eth0 A.B.C.112 |
| |
− | > | |
| |
− | > | Frontend + gmond |
| |
− | > | |
| |
− | > | eth1 192.168.100.1 |
| |
− | > ----------------------
| |
− | > |
| |
− | > |
| |
− | >
| |
− | > 26 nodes each
| |
− | > gmond
| |
− | >
| |
− | > In the frontend /etc/gmond.conf I have the
| |
− | > following statement: mcast_if eth1
| |
− | >
| |
− | > The 26 nodes are correctly reported.
| |
− | >
| |
− | > However the Frontend is never reported.
| |
− | >
| |
− | > I am running iptables on the Frontend, and I am seeing
| |
− | > things like:
| |
− | >
| |
− | > INPUT packet died: IN=eth1 OUT= MAC= SRC=A.B.C.112 DST=239.2.11.71
| |
− | > LEN=36 TOS=0x00 PREC=0x00 TTL=1 ID=53740 DF PROTO=UDP SPT=41608 DPT=8649
| |
− | > LEN=16
| |
− | >
| |
− | > I would have expected the source to be 192.168.100.1 with mcast_if eth1
| |
− | >
| |
− | > Any idea ?
| |
− |
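To see which source address the routing table selects for the multicast
group, before and after adding the host route from Matt's reply, a small
Python check like the one below may help. It is a sketch, not part of
ganglia: the function name is hypothetical, the address and port are the
ones from the log lines above, and it reflects only the kernel's route
lookup for an ordinary socket, not any socket options gmond itself sets.

```python
import socket


def source_address_for(dest="239.2.11.71", port=8649):
    """Return the local address the kernel would use to reach dest, or None."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only selects a route.
        sock.connect((dest, port))
        return sock.getsockname()[0]
    except OSError:
        return None  # no usable route to the destination
    finally:
        sock.close()
```

In the setup quoted above, the hope would be that the result changes from
the eth0 address to 192.168.100.1 once the route is in place.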
| |
− | How should I configure my Cisco Catalyst Switches?
| |
− | Perhaps information regarding gmond on networks set up through cisco
| |
− | catalyst switches should be mentioned in the ganglia documentation.
| |
− | I think by default multicast traffic on the catalyst will flood all
| |
− | devices unless configured properly. Here is a relevant snippet from a
| |
− | message forum, with a link to a Cisco document.
| |
− |
| |
− | If what you are trying to do is minimize the impact on your
| |
− | network due to a multicast application, this link may describe what
| |
− | you want to do: http://www.cisco.com/warp/public/473/38.html
| |
− |
| |
− | We set up our switches according to this after a consultant came in
| |
− | and installed an application multicasting several hundred packets
| |
− | per second. This made the network functional again.
| |
− |
| |
− | Getting Support
| |
− | The tired and thirsty prospector threw himself down at the edge of the
| |
− | watering hole and started to drink. But then he looked around and saw
| |
− | skulls and bones everywhere. "Uh-oh," he thought. "This watering hole
| |
− | is reserved for skeletons." --Jack Handey
| |
− |
| |
− | There are three mailing lists available to you: "ganglia-general",
| |
− | "ganglia-developers" and "ganglia-announce". You can join these lists or
| |
− | read their archives by visiting
| |
− | https://sourceforge.net/mail/?group_id=43021
| |
− |
| |
− | "All of the ganglia mailing lists are closed". That means that in order
| |
− | to post to the lists, you must be subscribed to the list. We're sorry
| |
− | for the inconvenience; however, it is very easy to subscribe and
| |
− | unsubscribe from the lists. We had to close the mailing lists because of
| |
− | SPAM problems.
| |
− |
| |
− | When you need help please follow these steps until your problem is
| |
− | resolved.
| |
− |
| |
− | 1. completely read the documentation
| |
− |
| |
− | 2. check the "ganglia-general" archive to see if other people have had
| |
− | the same problem
| |
− |
| |
− | 3. post your support request to the "ganglia-general" mailing list
| |
− |
| |
− | 4. check the "ganglia-developers" archive
| |
− |
| |
− | 5. post your question to the "ganglia-developers" list
| |
− |
| |
− | Please send all bugs, patches, and feature requests to the
| |
− | "ganglia-developers" list after you have checked the
| |
− | "ganglia-developers" archive to see if the question has already been
| |
− | asked and answered.
| |
− |
| |
− | Copyright
| |
− | Copyright (C) 2002,2003 University of California, Berkeley
| |
− |
| |
− | Authors
| |
− | The Ganglia Development Team...
| |
− |
| |
− | Bas van der Vlies basv Developer basv at users.sourceforge.net
| |
− | Neil T. Spring bluehal Developer bluehal at users.sourceforge.net
| |
− | Brooks Davis brooks_en_davis Developer brooks_en_davis at users.sourceforge.net
| |
− | Eric Fraser fraze Developer fraze at users.sourceforge.net
| |
− | greg bruno gregbruno Developer gregbruno at users.sourceforge.net
| |
− | Jeff Layton laytonjb Developer laytonjb at users.sourceforge.net
| |
− | Doc Schneider maddocbuddha Developer maddocbuddha at users.sourceforge.net
| |
− | Mason Katz masonkatz Developer masonkatz at users.sourceforge.net
| |
− | Mike Howard mhoward Developer mhoward at users.sourceforge.net
| |
− | Matt Massie massie Project Admin massie at users.sourceforge.net
| |
− | Oliver Mössinger olivpass Developer olivpass at users.sourceforge.net
| |
− | Preston Smith pmsmith Developer pmsmith at users.sourceforge.net
| |
− | Federico David Sacerdoti sacerdoti Developer sacerdoti at users.sourceforge.net
| |
− | Tim Cera timcera Developer timcera at users.sourceforge.net
| |
− | Mathew Benson wintermute11 Developer wintermute11 at users.sourceforge.net
| |
− | Brad Nicholes bnicholes Developer bnicholes at users.sourceforge.net
| |
− | Carlo Arenas carenas Developer carenas at users.sourceforge.net
| |
− |
| |
− | Contributors
| |
− | There have been dozens of contributors who have provided patches and
| |
− | helpful bug reports. We need to list them here later.
| |
− |
| |
− | </nowiki></pre>
| |