sinqwiki: SICSlog

SICSlog

SICS Log Files and Tracing

SICS Log Files

In the instrument account, there is a log directory. To this directory, the SICS server automatically writes an automatic log file. These log files contain all commands executed with either user or managers privilege. They thus allow to figure out what is going on at the instrument. It is always a good idea to look into the log files when a problem is reported: the statements of the scientists often are misleading or wrong.

The log files follow a naming convention. It is autoYYYY-MM-dd@HH-MM.log. Where YYYY is for the year, MM for the months, DD for the day, HH for the hour and MM for the minute. Normally, there is only one log file per day. But when SICS gets restarted, a new log file is generated. In order to use those log files, do this:

Extract from the scientists the date and time the problem happened
cd into the log directory
With ls autoYYY-MM-DD*.log figure out which log files were written that day. Replace placedolders as required: for example auto2014-07-9*.log to show all log files july, 9 in 2014.
Open the relevant log file
The log file contains time stamps, use these and the file writing times to navigate to the interesting section of the log file
Look for interesting information and error messages

Tracing

For the really hard problems, log files are not enough. This is why there is a tracing facility in SICS. Trace files contain everything interesting SICS does: communication with hardware, device start/stop messages, performance measurements, I/O, you name it. Not surprisingly, trace files can become quite large. This is why tracing is off by default. From a SICS command line, tracing can be controlled via the following commands:

trace log: shows if tracing is on or off

trace log filename: Starts writing a trace file into the filename given as a parameter

trace log close: switches tracing off again.

Analysing Trace Files

Obviously, analysing trace files depends on the problem at hand. Thus only some general information about trace files can be given. A trace file starts with a dump of all parameters known to SICS. This is followed by the real trace entries. A trace entry looks like this:

io:countersct:1404857127.006489:send:RS

This is subsystem specifier:name of component: time stamp: component specific data. For the example this means: io is the subsystem, countersct is the name, 1404857127.006489 is the time stamp, send:RS is the component specific data. The time stamp is unix time in seconds, after the comma is the base 10 converted sub second resolution.

Known sub systems include:

io: For input/output

par: for parameter changes

com: for client communication

sys: system messages

dev: device messages, mostly start and stop of devices

Any 10 minutes there are sys:TIMESTAMP message with time in easily human readable format. Search for these when it is known when something interesting has happened.

For everything else, grep is your friend for extracting the I/O to a certain device etc. The detailed timestamps give an information about response times which can be interesting when hardware is supected of being slow.