Lab : the Contract Subsystem
Lab : Working with process contracts.
=============================
Note : please send comments, suggestions and remarks to nieuwenj@nieuwenj.com
Last edited : 08/03/2005
A process contract is the formal definition of the relationship between a process A and its monitoring process. If process A terminates abnormally, the monitoring process can restart it.
This is a small exercise that uses the contract subsystem and its associated commands.
0. Get some information
# man contract
1. SMF and contracts
Check the state of the “syslog” service :
# svcs system-log
STATE STIME FMRI
online Dec_10 svc:/system/system-log:default
#
# pgrep -fl syslog
658 /usr/sbin/syslogd
# ps -e -o pid,comm,ctid | grep syslog
658 /usr/sbin/syslogd 35
# svcs -pv system-log
STATE NSTATE STIME CTID FMRI
online - Dec_10 35 svc:/system/system-log:default
Dec_10 658 syslogd
All these commands show that the service is online (syslogd is running with PID 658) and is monitored by the contract subsystem under contract ID 35.
Another command shows more information about the contract :
# ctstat -i 35
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
35 0 process owned 7 0 - -
Or in verbose mode :
# ctstat -vi 35
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
35 0 process owned 7 0 - -
cookie: 0x20
informative event set: none
critical event set: core signal hwerr empty
fatal event set: none
parameter set: inherit regent
member processes: 658
inherited contracts: none
The HOLDER field indicates that the process holding the contract has PID 7 :
# ps -fp 7
UID PID PPID C STIME TTY TIME CMD
root 7 1 0 Dec 10 ? 0:41 /lib/svc/bin/svc.startd
which is the main SMF daemon. Let’s find out more about the contracts held by this process. The /proc file system
can help here :
# cd /proc/7/contracts/
# ls
17 18 19 22 27 30 32 34 35 36 37 38 39 40 42 43 46 50 83
which shows why ‘svc.startd’, also known as svc:/system/svc/restarter:default, is called the master
restarter in the SMF framework : it holds contracts for many services.
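The two lookups above (contract to holder, holder to its list of contracts) can be chained. Here is a minimal sketch, assuming the column layout of the “ctstat -i” output shown earlier, with HOLDER in the fifth column. The name holder_of is a hypothetical helper, not a Solaris command, and the sample text is copied from this lab so that the parsing can be tried without a live system :

```shell
#!/bin/sh
# holder_of CTID: read `ctstat -i` style output on stdin and print the
# HOLDER pid of the given contract.
# Hypothetical helper; assumes HOLDER is the fifth column, as in this lab.
holder_of() {
    awk -v id="$1" 'NR > 1 && $1 == id { print $5 }'
}

# Sample output copied from the lab above.
sample='CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
35 0 process owned 7 0 - -'

holder=$(printf '%s\n' "$sample" | holder_of 35)
echo "contract 35 is held by pid $holder"

# On a live system the same helper would be fed by ctstat itself:
#   holder=$(ctstat -i 35 | holder_of 35)
#   ls /proc/$holder/contracts
```

On a real system, the commented lines at the end replace the sample text with the live “ctstat” output.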
Coming back to the verbose output of the ctstat command, we see that some events received by the holder of contract 35 are considered critical. Among them, “signal” is an event that indicates the reception of a fatal signal from another process. The restarter has the job of restarting syslogd if one ‘critical’ event is received.
Let’s check that…
2. The master restarter
Before helping syslogd to die, let’s open another terminal and type
# ctwatch -rv 35
CTID EVID CRIT ACK CTTYPE SUMMARY
which can be used to see all the events related to contract 35.
We can now use svcadm :
# svcadm refresh system-log
Nothing visible happens : a quick look at the PID of syslogd tells us that it only reread its configuration file. We did the
famous “pkill -HUP syslogd” the SMF way. Let’s try again :
# svcadm restart system-log
# svcs -pv system-log
STATE NSTATE STIME CTID FMRI
online - 10:24:37 89 svc:/system/system-log:default
10:24:37 2173 syslogd
while the other “ctwatch” terminal shows :
# ctwatch -rv 35
CTID EVID CRIT ACK CTTYPE SUMMARY
35 33 crit no process contract empty
What happened ? The syslogd process was terminated by svcadm. The contract tied to that process lost its last member and is reported as ‘empty’. When the new syslogd was started, a new contract, number 89, was created.
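To see the contract change without reading the columns by eye, the CTID can be pulled out of the “svcs -pv” output before and after the restart. This is only a sketch : svc_ctid is a hypothetical helper name, it assumes the header line names a CTID column as in the listings above, and the two samples are the listings from this lab :

```shell
#!/bin/sh
# svc_ctid: read `svcs -pv` style output on stdin and print the CTID of the
# first service line. The CTID column is located from the header line.
# Hypothetical helper, based on the output format shown in this lab.
svc_ctid() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "CTID") col = i; next }
         col && $NF ~ /^svc:\// { print $col; exit }'
}

# Listings copied from the lab: before and after "svcadm restart".
before='STATE NSTATE STIME CTID FMRI
online - Dec_10 35 svc:/system/system-log:default
Dec_10 658 syslogd'
after='STATE NSTATE STIME CTID FMRI
online - 10:24:37 89 svc:/system/system-log:default
10:24:37 2173 syslogd'

old=$(printf '%s\n' "$before" | svc_ctid)
new=$(printf '%s\n' "$after" | svc_ctid)
if [ "$old" != "$new" ]; then
    echo "service moved from contract $old to contract $new"
fi
```

On a live system, “svcs -pv system-log | svc_ctid” would take the place of the sample text.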
Now in the “ctwatch” terminal, let’s type
# ctwatch -rv 89
CTID EVID CRIT ACK CTTYPE SUMMARY
while in the other window, we kill the syslog daemon (only in the global zone) :
# pkill -9 -z 0 syslogd
We observe in the “ctwatch” terminal :
# ctwatch -rv 89
CTID EVID CRIT ACK CTTYPE SUMMARY
89 34 crit no process process 2173 received a fatal signal
signal: 9 (SIGKILL)
sender pid: 2187
sender ctid: 86
89 35 crit no process contract empty
This shows that the contract subsystem was notified that the syslogd process received a signal. We even know who sent it : in our case the “pkill” command (pid 2187), which has already terminated. As a result of the signal, the
contract is now empty.
But :
# svcs -pv system-log
STATE NSTATE STIME CTID FMRI
online - 10:27:04 90 svc:/system/system-log:default
10:27:04 2193 syslogd
shows that the master restarter has done its job. The system-log service is still online because “svc.startd” has
started a new instance of syslogd, in a new contract (90), to keep the service running.
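The interesting part of the “ctwatch -rv” output (which signal, sent by whom) can be reduced to one line per event. A small sketch, assuming the detail lines look like the ones captured above (“signal:” and “sender pid:”); signal_events is a hypothetical helper name and the sample is the listing from this lab :

```shell
#!/bin/sh
# signal_events: read `ctwatch -rv` style output on stdin and print one line
# per fatal-signal event: "<ctid> <signal number> <sender pid>".
# Hypothetical helper; assumes the detail-line labels seen in this lab.
signal_events() {
    awk '$1 ~ /^[0-9]+$/ { ctid = $1 }      # event line: remember its contract
         $1 == "signal:" { sig = $2 }        # detail line: signal number
         $1 == "sender" && $2 == "pid:" { print ctid, sig, $3 }'
}

# Listing copied from the lab above.
sample='CTID EVID CRIT ACK CTTYPE SUMMARY
89 34 crit no process process 2173 received a fatal signal
signal: 9 (SIGKILL)
sender pid: 2187
sender ctid: 86
89 35 crit no process contract empty'

printf '%s\n' "$sample" | signal_events
```

An event without a “sender pid:” line prints nothing in this sketch, so it only reports signals whose sender is known.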
3. The contract file system
Everyone knows the /proc filesystem, which gives system administrators information about running processes through a familiar file-based interface. Commands like “ps”, “pfiles”, “pgrep”,… get their information by opening and reading files in /proc, which are an interface to the process structures maintained by the kernel.
The same is true for ctfs, the contract file system. All the contract commands get their input from the kernel through another pseudo filesystem mounted on /system/contract.
Example :
# truss -t open ctstat -i 35
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
open("/lib/libcontract.so.1", O_RDONLY) = 3
open("/lib/libuutil.so.1", O_RDONLY) = 3
open("/lib/libc.so.1", O_RDONLY) = 3
open("/lib/libnvpair.so.1", O_RDONLY) = 3
open("/lib/libnsl.so.1", O_RDONLY) = 3
open("/platform/SUNW,Sun-Blade-100/lib/libc_psr.so.1", O_RDONLY) = 3
open("/usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3", O_RDONLY) = 3
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
open64("/system/contract/all/35/status", O_RDONLY) = 3
35 0 process owned 7 0 - -
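Since contracts appear as directories, the list of contract IDs can in principle be read straight from the file system. The sketch below assumes a layout of one numeric directory per contract under “all”, which is inferred only from the /system/contract/all/35/status path in the truss output. The name list_ctids is a hypothetical helper, and a temporary stand-in tree is used so that the example runs anywhere :

```shell
#!/bin/sh
# list_ctids ROOT: print the numeric entries found under ROOT/all, one per
# line, sorted numerically. On Solaris ROOT would be /system/contract.
# Hypothetical helper; the directory layout is an assumption (see above).
list_ctids() {
    for entry in "$1"/all/*; do
        name=${entry##*/}
        case $name in
            ''|*[!0-9]*) ;;           # skip non-numeric names and an unmatched glob
            *) echo "$name" ;;
        esac
    done | sort -n
}

# Stand-in tree that mimics the assumed layout.
root=$(mktemp -d)
mkdir -p "$root/all/35" "$root/all/89" "$root/all/7"
: > "$root/all/35/status"
list_ctids "$root"
rm -rf "$root"
```

On a live system the call would simply be “list_ctids /system/contract”.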
4. Monitor and restart ANY application
The ‘ctrun’ command can be used to run any application in a new contract. It then monitors the processes of the application for the events that you specify and restarts the application if a fatal event occurs. Check it out!
# ctrun -r 0 -o noorphan -f signal /usr/openwin/bin/xclock &
My xclock will be restarted any number of times (-r 0), ctrun will make sure that all processes in the contract get killed before restarting (-o noorphan), and ‘signal’ is the event type treated as fatal (-f signal).
The xclock process now belongs to a contract of its own, which can be inspected with ctstat and ctwatch like the ones above.
Then just kill the clock and see it nicely reappear…
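For readers without a Solaris system at hand, the restart behaviour can be imitated very roughly in plain shell. This is NOT how ctrun works (no contract is involved, and child processes are not tracked); it is only a loop that reruns a command as long as it dies from a signal. The name restart_on_signal and the run limit are assumptions made for this illustration :

```shell
#!/bin/sh
# restart_on_signal MAX CMD [ARGS...]: run CMD, and run it again each time it
# is killed by a signal, for at most MAX runs in total. The number of runs
# is left in the variable "runs".
# Hypothetical helper, a plain-shell imitation and not a contract tool.
restart_on_signal() {
    max_runs=$1
    shift
    runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        runs=$((runs + 1))
        status=0
        "$@" || status=$?
        # The shell reports death by signal as an exit status above 128
        # (an approximation: a command may also exit with such a value).
        [ "$status" -gt 128 ] || break
    done
}

# Demo: a command that kills itself with SIGKILL is run 3 times, then given up.
restart_on_signal 3 sh -c 'kill -KILL $$'
echo "command was run $runs times"
```

Unlike this loop, ctrun relies on the kernel to report the fatal event, which is why it can also see who sent the signal and clean up the remaining processes of the contract.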
