/**
\page howToUse How to Use the Interface

\brief Instructions and Code Snippets Showing How to Use the DiFX Python Interface

This document outlines how to use the DiFX Python Interface with Python coding examples. Most of this code is taken directly from the Example Programs; the text notes when and from where.

\tableofcontents

\section theDifxServer The DiFX Server to Your Python Client

The DiFX software package is primarily a collection of stand-alone processes originally designed to be run individually from the command line. These processes communicate with each other to some degree, but there is no overall "controlling" process that runs them all. The guiServer process was created as a server for DiFX GUI clients. Using a simple communications protocol based on one or more TCP connections, guiServer acts on instructions from the GUI to execute different DiFX processes (often using system() commands) and reports their results. The Python DiFX Interface utilizes this same protocol, appearing to guiServer like a GUI client.

As a consequence of this arrangement, guiServer must be running for the DiFX Python Interface to work.

\subsection runningGuiServer Running guiServer

The guiServer process is part of the DiFX source tree. It is a C++ program that must be run on a system and under a user that has access to all data required for DiFX work, write permission in locations where DiFX processes need to write things, and execute permission for all DiFX processes as well as mpirun. The best place to run guiServer is on the "head node" of your DiFX cluster, using whatever user name you would use to run a DiFX process by hand (in the following examples this user will be called "oper").

Assuming your DiFX environment variables have been set up properly, guiServer should run from the command line. It will respond with the port number at which clients can connect.
	oper@headnode DIFX-DEVEL ~> guiServer
	server at port 50200
	guiServer: wait for new client connection
Alternatively you can specify the port number you want:
	oper@headnode DIFX-DEVEL ~> guiServer 50400
	server at port 50400
	guiServer: wait for new client connection
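Before pointing a client at guiServer, it can be useful to confirm that the reported port accepts TCP connections from the machine where the client will run. The following is a minimal sketch using only the Python standard library. The helper name `port_is_open` is hypothetical and is not part of the DiFX Python Interface, and a successful connection only shows that something is listening on the port, not that it is guiServer.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; not part of the DiFX Python Interface.
    """
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolvable, or timed out.
        return False
    sock.close()
    return True
```

For the examples in this document the check would be `port_is_open( "headnode.difxcluster", 50400 )`.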
\section makingClientConnection Making a Client Connection

Before doing anything, the DiFX Python Interface must make a client connection to guiServer. This is done using an instance of the DiFXControl.Client class. The connect() method makes the connection. It has two optional arguments: a string containing the name of the host where guiServer is running (as it is addressed from where the client is running) and an integer giving the port number provided by guiServer.

\code{.py}
import DiFXControl

difx = DiFXControl.Client()
difx.connect( "headnode.difxcluster", 50400 )
\endcode

Your guiServer session will respond to this with the following (or something similar):
	guiServer: client connection from address 127.0.0.1
	guiServer: wait for new client connection
If the hostname or port is not provided, or is set to None, the connect() method will employ some defaults. If the hostname is not provided, it will use the value of the environment variable "DIFX_CONTROL_HOST". Failing that, it will guess the hostname is "localhost". If the port is omitted, the environment variables "DIFX_CONTROL_PORT" and "DIFX_MESSAGE_PORT" will be used, with 50401 serving as the final default if they don't exist.

\subsection closing Closing the Client Connection

When you are done with a client connection, it is best to close it. This is done with the close() method:

\code{.py}
import DiFXControl

difx = DiFXControl.Client()
difx.connect( "headnode.difxcluster", 50400 )
difx.close()
\endcode

The guiServer session will respond with:
	127.0.0.1 disconnected
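If the work done between connect() and close() can raise an exception, the close() call may never be reached. One way to guard against that is the standard library's `contextlib.closing`, which calls close() on any object when a `with` block exits. The sketch below is illustrative only: `StandInClient` and `run_session` are hypothetical names, and the stand-in class merely mimics the connect()/close() shape described above so the example can run without a guiServer.

```python
from contextlib import closing

class StandInClient(object):
    """Hypothetical stand-in with the same connect()/close() shape as the client."""

    def __init__(self):
        self.connected = False

    def connect(self, host=None, port=None):
        self.connected = True

    def close(self):
        self.connected = False

def run_session(client, work):
    """Call work(client) and close the client even if work() raises."""
    with closing(client):
        return work(client)
```

The same `with closing( difx ):` pattern should apply to any object that provides a close() method.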
\section runningTheMonitor Monitoring Communications from the Server

The client connection to guiServer is two-way: in addition to allowing commands to be sent to the server, it allows the server to send data back (the server can also be set to relay the UDP communications between DiFX processes - see \ref difxMessages "DiFX UDP Messages"). By default the client ignores this communication, but the monitor() function can be used to start a DiFXControl.MonitorThread that consumes and appropriately distributes it.

\code{.py}
import DiFXControl

difx = DiFXControl.Client()
difx.connect( "headnode.difxcluster", 50400 )
difx.monitor()
\endcode

\subsection whatThe Why Doesn't the Client Monitor Automatically?

It would appear that monitoring communication from the server is something the client would always want to do, which raises the question: why require the user to run the monitor() function by hand rather than making it part of the connect() function? The reason is that the DiFXControl.MonitorThread runs a select() on the TCP socket to respond to incoming data. There are times when you might want to run your own select() on the socket, and the DiFXControl.MonitorThread would interfere with it.

At the same time, you might want some of the functionality of the DiFXControl.MonitorThread, such as its ability to interpret the guiServer packet protocol and trigger callbacks. For these purposes there is a passiveMonitor() method which creates an instance of DiFXControl.MonitorThread but doesn't start it. If you wish to do your own select(), the DiFXControl.Client sock variable gives you access to the TCP socket.

\code{.py}
import DiFXControl
import select

difx = DiFXControl.Client()
difx.connect( "headnode.difxcluster", 50400 )
difx.passiveMonitor()

# Run select yourself
iwtd, owtd, ewtd = select.select( [difx.sock], [], [], .05 )

# Do stuff...
\endcode

\subsection relayPackets Collecting Relayed DiFX Communication

A particular type of server-to-client communication is the relay of DiFX UDP packets. Many of the DiFX processes use UDP broadcasts to report progress or status, and these packets can be intercepted by guiServer and relayed to the Python Interface. This topic is complex enough that it is given its own section: \ref difxMessages "DiFX UDP Messages".

\section environment Viewing (and Changing) Your DiFX Run Environment

The DiFXControl.Client class can query guiServer to obtain information about its run environment that may influence how DiFX processes will be run. In response to the DiFXControl.Client.versionRequest() method, guiServer will provide information about the user that is running it, the DiFX version it was run under and the version it will run other DiFX processes under (not necessarily the same!), available DiFX versions, and environment variable values. All these data will be stored in class variables.

\code{.py}
import DiFXControl
import time

difx = DiFXControl.Client()
difx.connect( "headnode.difxcluster", 50400 )
difx.monitor()

# Send a request for version information.  Response is threaded so give it a second
# to complete.
difx.versionRequest()
time.sleep( 1.0 )

# Print results
print "Server version: " + str( difx.serverVersion )
print "DiFX will be run by user: " + str( difx.serverUser )
print "DiFX will run using version: " + str( difx.versionPreference )
\endcode

The environment variables are stored as a dictionary, where the environment variable name is the key to its value. Here we print out the entire dictionary, which might be pretty long.

\code{.py}
# Print the environment variables seen by guiServer.
for key in difx.serverEnvironment.keys():
    print key + " = " + difx.serverEnvironment[key]
\endcode

\subsection differentVersions Using Different DiFX Versions

Depending on how DiFX is installed on your system, you may have access to multiple versions of the software.
There are occasionally reasons why you would wish to run using one version or another. GuiServer can run all DiFX processes using any available version as long as a specific structure is in place (this structure is installed automatically as part of the difxbuild process, and possibly other installation procedures). For guiServer to run a given version, it must have access to a "rungeneric" file. These are scripts stored in the path:
$DIFX_BASE/bin/rungeneric.{VERSION_NAME}
Each script sets up all required environment variables and whatever else needs to be done to execute a DiFX process using the given DiFX version. GuiServer runs DiFX processes through the script. For instance, to run the DiFX process vex2difx using version DIFX-DEVEL, guiServer will do the following:
$DIFX_BASE/bin/rungeneric.DIFX_DEVEL vex2difx [args]
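The wrapper-script convention can be sketched in a few lines of Python. This is only an illustration of the lookup described in this section, not guiServer's actual code (guiServer is a C++ program). The function name `build_command` is hypothetical, and it assumes the behavior this section describes: the command is run directly when no matching "rungeneric" script is found.

```python
import os

def build_command(difx_base, version, process, args=()):
    """Return the argument list for running `process` under a DiFX version.

    Assumption: wrapper scripts live at <difx_base>/bin/rungeneric.<version>;
    when no such file exists, the process is run without a wrapper.
    """
    wrapper = os.path.join(difx_base, "bin", "rungeneric." + version)
    command = [process] + list(args)
    if os.path.isfile(wrapper):
        return [wrapper] + command
    return command
```

The returned list is in the form accepted by `subprocess.call()`, should you want to try the same convention by hand.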
This structure is a complexity, but it allows guiServer to run any version of DiFX that you have installed, and to switch between them with ease. And you don't have to use it - guiServer will try to run this way, but if you don't have the proper "rungeneric" files in the right place, it will execute DiFX commands without any preceding script.

GuiServer itself is version-dependent, but it is designed with this "version flexibility" in mind so most versions of guiServer should be able to run most versions of DiFX. That being said, there is no guarantee that incompatibilities will not surface at some point.

The DiFXControl.Client.versionRequest() call allows you to see the DiFX versions that guiServer has access to:

\code{.py}
# Send a version request and wait a couple of seconds for it to complete
difx.versionRequest()
time.sleep( 2.0 )
if len( difx.availableVersion ) > 0:
    print "Available DiFX Versions:"
    for ver in difx.availableVersion:
        print str( ver )
else:
    print "No DiFX versions available to this server."
\endcode

You can then set your DiFX version at any time using the DiFXControl.Client.version() method. Any subsequent DiFX operations will use the version you set:

\code{.py}
# Set the version to DIFX_DEVEL.
difx.version( "DIFX_DEVEL" )

# do DiFX stuff...
\endcode

The DiFXControl.Client.version() method also queries the available versions and will not set the version you request unless it is available.

\section simpleOperations Performing Simple File Operations

GuiServer permits a limited number of simple file operations, including moving, removing, and creating new files and directories. The user running guiServer must have permission to run these operations for them to succeed. With the exception of \ref lsOperation "ls", these operations take place (and succeed or fail) silently (although failures will produce DiFX errors that can be collected via \ref relayPackets "relayed packets").
It should be noted that in allowing these operations guiServer is opening what is potentially a gaping security hole - a TCP connection without password protection is being given permission to create, move, and delete files. Argument lists to commands are terminated and limited in length, but are not examined for malicious activities. This is possibly a candidate for a future fix, but for the moment guiServer depends on your DiFX cluster being in a fire-walled, protected network, surrounded by friendly scientists and operators.

\subsection simpleCommands Simple Operations: mv, rm, mkdir, rmdir

The mv, rm, mkdir and rmdir commands have specific functions within the DiFXControl.Client class that can be used to perform them. The rm() function accepts arguments that are passed to the rm command on the server; the others do not accept any arguments. The mkdir operation will create all missing levels of a specified path (equivalent to running "mkdir -p" on the command line).

\code{.py}
import DiFXControl

difx = DiFXControl.Client()
difx.connect( "headnode.difxcluster", 50400 )

# Make a new directory
difx.mkdir( "/full/path/to/new/directory" )

# Remove the directory
difx.rmdir( "/full/path/to/undesired/directory" )

# Remove a path (using -r argument)
difx.rm( "/full/path/to/remove", "-r" )

# Move a file
difx.mv( "/full/path/of/file", "/full/path/of/destination" )

difx.close()
\endcode

\subsection lsOperation The ls Command

The ls operation is slightly more complicated because it generates a response, and the response will come in an undetermined amount of time (hopefully quickly). The DiFXControl.Client.ls() function utilizes the DiFXls.Client class to send the ls command and wait for a response (the length of the wait can be set using the DiFXControl.Client.waitTime() method). The result of the ls operation is returned as a list of strings. In this example, the ls operation is run with the "-l" argument - DiFXControl.Client.ls() accepts (almost) all normal ls arguments.
\code{.py}
import DiFXControl

difx = DiFXControl.Client()
difx.connect( "headnode.difxcluster", 50400 )
difx.monitor()

dirlist = difx.ls( "/full/path", "-l" )
if dirlist is None:
    print "No such file or directory"
else:
    for item in dirlist:
        print item
\endcode

While DiFXControl.Client.ls() is waiting, it isn't doing anything. You may wish to accomplish other things during this wait, or change the process in other ways. The DiFXls.Client class contains code that can be taken apart and rearranged to accomplish what you wish.

\section feedback Collecting Information Provided by Running Processes

The DiFXControl.Client class provides a number of callback structures that allow calling programs to monitor progress or become aware of problems. They all require the same structure on the part of the calling program:
  1. A function is defined to accept either zero or one arguments, depending on the callback type (see below).
  2. A callback request is added to the client. Whenever the condition associated with the callback type is encountered, a call to the callback function will be made.
Below is a code example that sets up a "message" type callback.

\code{.py}
# Define a callback function for messages.  They expect one argument (that
# would be the message!)
def myMessageCallback( argstr ):
    print "Hey!  You have a message: " + argstr

# Set message callback for subsequent DiFX processes
difx.messageCallback( myMessageCallback )

# Do DiFX Stuff
...
\endcode

The following callback "types" follow this structure. To use them, replace the name of the type in the above code ("Message" or "message") with the name of the type you are interested in. All types may be monitored at the same time.
<dl>
<dt>Message</dt>
<dd>Provides information about running processes when they are running well, triggered when something of significance happens. A string argument contains the message.</dd>
<dt>Warning</dt>
<dd>Triggered when something goes wrong in a process that (possibly) can be recovered from. Processing is continuing. A string argument contains the warning.</dd>
<dt>Error</dt>
<dd>Triggered when something goes wrong in a process that (probably) necessitates killing it. Processing is (almost always) stopping. A string argument contains the error.</dd>
<dt>Timeout</dt>
<dd>The DiFXControl.Client class maintains a "time out" interval after which it will give up waiting for a response from the server. This function is called when that occurs. No arguments.</dd>
<dt>Interval</dt>
<dd>This is called when some milestone of partial completion of a process is passed, for example a file transfer has moved a portion of the data. No arguments are included, however the callback may indicate that there are new data settings that can be consulted to measure progress.</dd>
<dt>Final</dt>
<dd>Triggered when a process completes. There are no arguments. This callback is occasionally employed by the control classes internally, however only in methods that do not return until complete (so this shouldn't be a problem).</dd>
</dl>
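To make the zero-argument versus one-argument distinction concrete, here is a small sketch of how callbacks of these types could be registered and dispatched. It is not the DiFXControl.Client implementation: the class `CallbackHub` and its `register()` and `fire()` methods are hypothetical names chosen for illustration, and only the argument convention is taken from the descriptions above.

```python
class CallbackHub(object):
    """Hypothetical registry mapping callback types to user functions."""

    # Types whose callbacks receive one string argument; the rest receive none.
    WITH_ARGUMENT = ("message", "warning", "error")
    WITHOUT_ARGUMENT = ("timeout", "interval", "final")

    def __init__(self):
        self._callbacks = {}

    def register(self, kind, func):
        """Remember func as the callback for the given type."""
        if kind not in self.WITH_ARGUMENT + self.WITHOUT_ARGUMENT:
            raise ValueError("unknown callback type: " + kind)
        self._callbacks[kind] = func

    def fire(self, kind, text=None):
        """Call the registered function for kind, if any; return True if called."""
        func = self._callbacks.get(kind)
        if func is None:
            return False
        if kind in self.WITH_ARGUMENT:
            func(text)
        else:
            func()
        return True
```

A caller would register a function once, for example `hub.register( "message", myMessageCallback )`, and the hub would then invoke it with the message string each time `hub.fire( "message", text )` is called.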
You can also obtain information from running processes by monitoring DiFX message traffic. This traffic is far more cluttered, but also provides considerably more detail. See the section on \ref messages "Monitoring DiFX Messages" for details.

\section fileTransfer Transferring File Data Between Client and Server

The DiFXFileTransfer.Client class can be used to obtain the contents of files on the DiFX server and to create files on the server. It inherits the DiFXControl.Client class and can also be called from that class.

To get the contents of a file, the DiFXFileTransfer.Client.getFile() method can be used (DiFXControl.Client also has a method with the same name that does the same thing).

\code{.py}
# New instance of the client...
difx = DiFXFileTransfer.Client()
difx.connect()
difx.monitor()

# Get the file...
fileStr = difx.getFile( "/full/path/to/the/file" )

# "fileStr" is a string that now contains the contents of the file
...
\endcode

Sending file content follows a similar pattern using the DiFXFileTransfer.Client.sendFile() method. The content is a string variable.

\code{.py}
# Some data...
fileData = "this is the file data"

# Send it
difx.sendFile( "/full/path/to/the/newfile", fileData )
\endcode

The \ref difxgetfile "DiFXgetFile" and \ref difxsendfile "DiFXsendFile" example programs show this class in action.

\section messages Monitoring DiFX Messages

Many DiFX processes broadcast UDP messages when they run. The guiServer process collects these broadcasts and sends them, if requested, to client connections. The DiFX Python Interface can be used to gather and parse this traffic to monitor the health and performance of the DiFX cluster and running processes.
To collect the DiFX Message traffic, the DiFXControl.Client class must be told to "relay" message traffic:

\code{.py}
# New Client instance
difx = DiFXControl.Client()

# Make connection to DiFX server
difx.connect( ( "localhost", 50401 ) )

# Start the monitor thread
difx.monitor()

# Tell the Client to relay UDP packets
difx.relayPackets()
\endcode

A specific callback can be defined for relay data. This function can call the DiFXControl.parseXML function, which returns a class containing the data for the DiFX Message. The class type can be determined from the typeStr variable of the DiFXControl.XMLMessage class (which is the base class for all of the other classes that might be returned).

\code{.py}
def messageCallback( data ):
    # Parse the message.
    xmlDat = DiFXControl.parseXML( data )
    print "got message of type " + xmlDat.typeStr

difx.addRelayCallback( messageCallback )
\endcode

You can request to have only specific message types relayed (by default all are relayed). This is more efficient than collecting every message type and throwing away those you don't want, because the instruction to relay a subset of messages is passed to the server, so unwanted messages never become part of the relay traffic. To do so, the DiFXControl.Client.messageSelection() method is passed a list of message types (in the form of strings):

\code{.py}
messageTypes = []
messageTypes.append( "DifxAlertMessage" )
messageTypes.append( "DifxStatusMessage" )
difx.messageSelection( messageTypes )
\endcode

All DiFX broadcast message types are defined in "difxmessage.h" in the difxmessage directory under the DiFX source tree. The following (possibly incomplete and probably not fully accurate) table lists the different message types (using the name that can be used to select them with DiFXControl.Client.messageSelection()), the class provided by DiFXControl.parseXML(), and a description of what (I think) they are used for (this is where the inaccuracies might appear).
Note that many of the message types were created (and are exclusively used for) communication with guiServer. These message types will probably not appear as UDP broadcasts.
<table>
<tr><th>Message Type</th><th>Class</th><th>Description</th></tr>
<tr><td>DifxLoadMessage</td><td>DiFXControl.DifxLoadMessage</td><td>Transmitted by processors or Mark5's (from mk5daemon), shows CPU load, transmit and receive rate.</td></tr>
<tr><td>DifxAlertMessage</td><td>DiFXControl.DifxAlertMessage</td><td>Transmits an "alert" associated with an .input file - presumably a running job.</td></tr>
<tr><td>Mark5StatusMessage</td><td>DiFXControl.Mark5StatusMessage</td><td>Transmitted by Mark5 units (from mk5daemon), shows VSNs for mounted modules, scan number being worked on, etc.</td></tr>
<tr><td>DifxStatusMessage</td><td>DiFXControl.DifxStatusMessage</td><td>Transmits the status of a running job (identified by .input file name) including current MJD and start/stop MJDs (all of which allow you to get a rough completion percentage).</td></tr>
<tr><td>DifxInfoMessage</td><td>DiFXControl.DifxInfoMessage</td><td>Contains a single piece of information in the form of a string.</td></tr>
<tr><td>DifxDatastreamMessage</td><td>not available</td><td>Definition appears to be missing in difxmessage.h. Is this message in use?</td></tr>
<tr><td>DifxCommand</td><td>DiFXControl.DifxCommandMessage</td><td>Contains a single command (which is just a string).</td></tr>
<tr><td>DifxParameter</td><td>DiFXControl.DifxParameter</td><td>Transmits a single parameter name and value.</td></tr>
<tr><td>DifxStart</td><td>DiFXControl.DifxStart</td><td>Instructs guiServer to start a job using its .input file name.</td></tr>
<tr><td>DifxStop</td><td>DiFXControl.DifxStop</td><td>Instructs guiServer to try to stop a running job using the .input file name. I say "try" because it is often unsuccessful.</td></tr>
<tr><td>Mark5VersionMessage</td><td>DiFXControl.Mark5VersionMessage</td><td>Contains information about a Mark5 unit - firmware versions, that sort of thing.</td></tr>
<tr><td>Mark5ConditionMessage</td><td>not available</td><td>Deprecated</td></tr>
<tr><td>DifxTransientMessage</td><td>DiFXControl.DifxTransientMessage</td><td>Used to tell Mk5daemon to copy some data at the end of the correlation to a disk file.</td></tr>
<tr><td>DifxSmartMessage</td><td>DiFXControl.DifxSmartMessage</td><td>Contains S.M.A.R.T. information for one disk in a Mark5 module. Typically 8 such messages will be needed to convey results from the conditioning of one module.</td></tr>
<tr><td>Mark5DriveStatsMessage</td><td>DiFXControl.Mark5DriveStatsMessage</td><td>Transmits drive statistics for Mark5 modules - serial numbers, VSNs, etc.</td></tr>
<tr><td>DifxDiagnosticMessage</td><td>DiFXControl.DifxDiagnosticMessage</td><td>Used to pass out diagnostic-type info like buffer states.</td></tr>
<tr><td>DifxFileTransfer</td><td>DiFXControl.DifxFileTransfer</td><td>Uses guiServer to grab the contents of a file from the DiFX server or push content into a named file on the DiFX server.</td></tr>
<tr><td>DifxFileOperation</td><td>DiFXControl.DifxFileOperation</td><td>Tells guiServer to perform operations such as mkdir, rmdir, ls and rm.</td></tr>
<tr><td>DifxVex2DifxRun</td><td>DiFXControl.DifxVex2DifxRun</td><td>Instructs guiServer to run the DiFX application vex2difx, which will create .input files out of .vex and .v2d files.</td></tr>
<tr><td>DifxMachinesDefinition</td><td>DiFXControl.DifxMachinesDefinition</td><td>Instructs guiServer to build .machines and .threads files associated with a particular .input file. Developed for the GUI/guiServer interface.</td></tr>
<tr><td>DifxGetDirectory</td><td>DiFXControl.DifxGetDirectory</td><td>Starts a session with guiServer to obtain the directory for a Mark5 module.</td></tr>
<tr><td>DifxMk5Control</td><td>DiFXControl.DifxMk5Control</td><td>Tells guiServer to run mk5control for a specific task - this will in turn generate another message type.</td></tr>
<tr><td>DifxMark5Copy</td><td>DiFXControl.DifxMark5Copy</td><td>Tells guiServer to run mk5cp to make file copies of Mark5 data.</td></tr>
</table>
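The DifxStatusMessage entry mentions that the current, start and stop MJDs allow a rough completion percentage to be computed. The arithmetic is sketched below. The function and parameter names are hypothetical, and the sketch deliberately takes plain numbers because this document does not list the names of the variables in which DiFXControl.DifxStatusMessage holds these values.

```python
def completion_percent(start_mjd, stop_mjd, current_mjd):
    """Rough percent complete from start, stop and current MJD values."""
    span = stop_mjd - start_mjd
    if span <= 0:
        raise ValueError("stop MJD must be later than start MJD")
    fraction = (current_mjd - start_mjd) / float(span)
    # Clamp so values reported slightly outside the job window stay in range.
    return 100.0 * min(1.0, max(0.0, fraction))
```

The estimate is only as good as the assumption that correlation proceeds at a steady rate through the observation time.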
The \ref difxmessages "DiFXMessages" example program shows how to collect and parse all of the different DiFX message types.

\section creatingJobs Creating New Jobs

Jobs are "created" on the DiFX server by running vex2difx and a "calc" procedure. For this to work, a .v2d file must exist on the server containing, at a minimum, the identity of a legal .vex file that describes the observations. The path to the identified .vex file is used as a destination for the .input, .calc, and .flag files created by vex2difx and the .im files created by the calc process that comprise DiFX-runnable jobs.

The DiFX Python Interface is slightly inflexible in that it expects the .v2d and .vex files to reside in the same writeable directory. It may be possible to trick it into working with other arrangements, but the assumption of this documentation is that a directory exists or has been \ref simpleCommands "created" with DiFX server write permission, and a .vex file and .v2d file referencing it have been \ref fileTransfer "put there".

\subsection runningvex2difx Running vex2difx

The DiFXvex2difx.Client class is used to run both vex2difx and the calc process. This class inherits the DiFXControl.Client class and has access to all of its methods.

To create jobs, you need to provide this class with the path of the directory in which your .v2d and .vex files live using the DiFXvex2difx.Client.passPath() method and the name of the .v2d file (without the .v2d extension) using the DiFXvex2difx.Client.v2dFile() method. Then run both vex2difx and the calc process using the DiFXvex2difx.Client.runVex2Difx() method. By default this method will run silently and will not return until everything is complete.
\code{.py}
import DiFXvex2difx

# New client instance, connecting to the DiFX server in the usual manner
difx = DiFXvex2difx.Client()
difx.connect( ( "localhost", 50401 ) )
difx.monitor()

# Set path and .v2d file name (the directory path should contain jobName.v2d)
difx.passPath( "/data/correlator/newExperiment" )
difx.v2dFile( "jobName" )

# Run vex2difx and calc
difx.runVex2Difx()
print "vex2difx and calc complete"

difx.close()
\endcode

You can also run vex2difx and calc in the background by providing "False" as an argument to the DiFXvex2difx.Client.runVex2Difx() method.

\code{.py}
...
# Run vex2difx and calc in the background
difx.runVex2Difx( False )
print "vex2difx and calc are running"

# Do other stuff
...
\endcode

Feedback can be collected from the DiFXvex2difx.Client.runVex2Difx() method using callbacks. One callback can be set to respond as each .input file is created (indicating vex2difx has created a new job) and each .im file is created (indicating the calc process has completed work on that job and it is ready to run). This callback must take an argument (the name of the newly-created file). The other callback will be triggered when all processing is complete - it does not require an argument. Callbacks must be defined and then assigned using the DiFXvex2difx.Client.newFileCallback() and DiFXvex2difx.Client.processCompleteCallback() methods. The callbacks can contain whatever code you like, and work whether you are running DiFXvex2difx.Client.runVex2Difx() such that it returns immediately or not.

\code{.py}
import DiFXvex2difx

vex2difxRunning = False

# Define some callback functions.
def myNewFileCallback( newFile ):
    print newFile + " was created"

def myProcessCompleteCallback():
    # Declare the flag global so the assignment changes the module-level variable
    global vex2difxRunning
    vex2difxRunning = False

# New client instance, connecting to the DiFX server in the usual manner
difx = DiFXvex2difx.Client()
difx.connect( ( "localhost", 50401 ) )
difx.monitor()

# Set path and .v2d file name
difx.passPath( "/data/correlator/newExperiment" )
difx.v2dFile( "jobName" )

# Assign callbacks
difx.newFileCallback( myNewFileCallback )
difx.processCompleteCallback( myProcessCompleteCallback )

# Run vex2difx and calc, return immediately
vex2difxRunning = True
difx.runVex2Difx( False )
print "vex2difx started"

# Do some other stuff.
while vex2difxRunning:
    print "still running"
    ...

print "vex2difx and calc complete!"
difx.close()
\endcode

\subsection calcprocess Setting the Calc Process

By default, the calc process run on the DiFX server is calcif2 - the process used at the time of the creation of the DiFX Python Interface. It is quite possible that this will not always be the case, as a DiFX-specific calc process is in development. You can specify a different calc process using the DiFXvex2difx.Client.calcCommand() method.

\code{.py}
...
difx.calcCommand( "mycalcprocess" )
difx.runVex2Difx()
\endcode

The calc process is expected to accept a .calc file path as an argument following the "-f" flag and produce an .im file (this is what calcif2 does). It is your own responsibility to ensure that the specified process exists and runs this way - the DiFX server will simply run whatever it is told to run without checking for sanity.

\subsection runningcalc Running the Calc Process Alone

Because the calc process is occasionally problematic (which is to say it fails a lot), it is not uncommon that it needs to be run on a set of existing jobs. This can be done using the DiFXvex2difx.Client.calcOnly() method.
A list of jobs on which to perform this operation must be specified by using the job name(s) and the DiFXvex2difx.Client.jobName() method (this method accepts "ls" wildcards).

\code{.py}
...
difx.calcOnly( True )
difx.jobName( "jobs_0*" )
difx.runVex2Difx()
\endcode

The vex2difx process will be skipped, and calc will be run on any existing .calc files where the job names match the specification. Any existing .im files for these jobs will be over-written.

\section jobControl Starting, Stopping and Monitoring DiFX Jobs

All job control is done using the DiFXJobControl.Client class, which inherits the DiFXControl.Client class. The class can be used to define data sources and processors, start and stop jobs, and (to a limited extent) produce real-time results from running jobs. An instance of this class for job control is created in the usual way.

\code{.py}
import DiFXJobControl

difx = DiFXJobControl.Client()
difx.connect()
difx.monitor()
\endcode

While this class inherits the DiFXControl.Client class, it is more complex than the \ref taskSpecific "Task-Specific Classes", and its methods cannot be called by the DiFXControl.Client class.

\subsection jobControl_inputFile Identifying a Job Using the .input File

Each job on the DiFX server can be uniquely identified by the full path to its .input file. The .input file (along with some other files) contains a full description of a job, and must exist for DiFX to process it. The .input file always has the extension ".input". It is created from the .v2d and .vex files by vex2difx.

The DiFXJobControl.Client class uses the full path of the .input file to direct control and monitoring instructions to the correct job. Before you start controlling a job, you must give the class an existing .input file to work with.
\code{.py}
# Give the client class the name of the .input file for our job
difx.inputFile( "/the/full/path/to/the/job.input" )
\endcode

\subsection jobControl_machines Defining Data Sources and Processors

To run on a multi-processor system, DiFX requires a list of the processing nodes that will be used as data sources, those that will be used for processing along with the number of processing threads to run on each, and the name of the node that will serve as the manager of the others, or the "head node". These specifications are contained in two files on the DiFX server with the extensions ".machines" and ".threads". If these files exist for your .input file you can download the content of them using the DiFXJobControl.Client.getMachines() method and examine it by using the DiFXJobControl.Client.headNode(), DiFXJobControl.Client.dataSources() and DiFXJobControl.Client.processors() methods.

\code{.py}
# Get the content of the .machines and .threads files
difx.getMachines()

# The head node is stored as a string
print difx.headNode()

# The data sources are stored as a list of strings
for node in difx.dataSources():
    print node

# The processors are stored as a list of tuples that have node names and threads
for node in difx.processors():
    print node[0] + " " + str( node[1] )
\endcode

You can change the node names and threads in these lists using the DiFXJobControl.Client.headNode(), DiFXJobControl.Client.addDataSource() and DiFXJobControl.Client.addProcessor() methods, or clear them completely using the DiFXJobControl.Client.clearDataSources() and DiFXJobControl.Client.clearProcessors() methods.
\code{.py}
# change the head node
difx.headNode( "myHeadNode" )

# empty the data source list
difx.clearDataSources()

# add a few data sources of our own
difx.addDataSource( "myDataSource1" )
difx.addDataSource( "myDataSource2" )

# add a few processors to the existing list, along with the threads to be used by
# each -- we are not emptying the list first so these are added to whatever was there.
difx.addProcessor( "myProcessor1", 10 )
difx.addProcessor( "myProcessor2", 8 )
\endcode

The above functions change the lists of nodes in the DiFXJobControl.Client class, but the .machines and .threads files on the DiFX server aren't changed until you run the DiFXJobControl.Client.defineMachines() method.

\code{.py}
# create new .machines and .threads files using our definitions
difx.defineMachines()
\endcode

\subsection jobControl_start Running a Job and Monitoring Progress

Once the .input file is defined and the .machines and .threads files are in place, a new instance of the DiFXJobControl.Client.JobRunner class must be created to actually run the job. This is done by calling the DiFXJobControl.Client.newJob() method. The class instance makes a "frozen" copy of all DiFXJobControl.Client settings (.input file, timeouts, machine specifications, etc.), the purpose of which is to allow DiFXJobControl.Client to specify and start different jobs simultaneously.

Once the class instance is created, starting a job is a simple matter of calling the DiFXJobControl.Client.JobRunner.start() method. Calling the method without arguments will make it return only after the job is complete. In the following example we give the client a long "time out" period using the DiFXControl.Client.waitTime() method. This keeps the start command from returning prematurely (which it may do if your system is slow to spawn a job - the wait time is something that has to be experimented with to tailor it for each system).
\code{.py}
# Set a very long wait time so we don't time out waiting for the job to start
difx.waitTime( 300.0 )
# Create a JobRunner instance to run the job
thisJob = difx.newJob()
# Start the job - method will return when it is done
thisJob.start()
print "job done!"
\endcode

Alternatively you can start a job such that the JobRunner.start() method returns immediately. This will allow you to do other things, but you won't necessarily know when the job is done unless you make other monitoring arrangements. There is no need to adjust the wait time in this case.

\code{.py}
# Create a JobRunner instance to run the job
thisJob = difx.newJob()
# Start the job - method will return immediately
thisJob.start( False )
print "job started!"
\endcode

\subsubsection monitor_progress How Can I See What is Going On?

If you run a job using DiFXJobControl.Client.JobRunner.start() it will run silently, and except for returning when it is done (or has failed), it won't tell you much about what is going on (you can also throw away that paltry piece of information by telling DiFXJobControl.Client.JobRunner.start() to return immediately). However, there is more information available. For instance, the DiFXJobControl.Client.JobRunner.jobComplete variable can be consulted to determine when a job is no longer running. In the following example a thread monitors a job to determine when it is complete.

\code{.py}
import threading
import time

# Simple class to monitor a job
class JobEndMonitor( threading.Thread ):
    def __init__( self, thisJob, jobName ):
        threading.Thread.__init__( self )
        self.jobName = jobName
        self.thisJob = thisJob
    def run( self ):
        quitNow = False
        # Poll the job's jobComplete flag until the job is no longer running
        while not self.thisJob.jobComplete and not quitNow:
            try:
                time.sleep( 0.1 )
            except KeyboardInterrupt:
                quitNow = True
        if not quitNow:
            print self.jobName + " completed!"
        self.thisJob.closeChannel()

...

# Set the .input file for the job
difx.inputFile( inputFile )
# Find the job name from the .input file name
jobName = inputFile[inputFile.rfind( "/" ) + 1:inputFile.rfind( "." )]

...

# Create a JobRunner instance to run the job
thisJob = difx.newJob()
# Start the job - method will return immediately
thisJob.start( False )
print "job started!"
# Start an instance of the thread to monitor the job
theThread = JobEndMonitor( thisJob, jobName )
theThread.start()
# Do other stuff...
...
\endcode

When guiServer runs a job in response to the DiFXJobControl.Client.JobRunner.start() method, it collects all output to stdout and stderr and transmits these data back to the client where they are treated as "messages", "warnings", and "errors". These data can be monitored by setting appropriate callback functions (see \ref feedback "Collecting Information Provided by Running Processes").

\code{.py}
# Callback functions for messages, warnings, and errors.
def messageCallback( argstr ):
    print str( argstr )

def warningCallback( argstr ):
    print "WARNING: " + str( argstr )

def errorCallback( argstr ):
    print "ERROR: " + str( argstr )

# Assign the callbacks
difx.messageCallback( messageCallback )
difx.warningCallback( warningCallback )
difx.errorCallback( errorCallback )
# Start the job - output will be printed by the callbacks
thisJob = difx.newJob()
thisJob.start()
\endcode

GuiServer provides some other feedback as well, mostly in the form of problems encountered when trying to start a job. The client translates these data into messages, warnings, and errors as it sees fit. You can also directly gather the feedback from guiServer yourself using your own assigned callback. See the DiFXJobControl.Client.JobRunner.setStartCallback() method for more information.

The trouble with monitoring messages, warnings, and errors is that DiFX simply does not provide much information to stdout and stderr while a job is running (there is a burst of information when it starts, but nothing after that).

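Because the callbacks are ordinary callables that take a single argument, they do not have to print anything. The following is a minimal sketch, not part of the DiFX Python Interface: the OutputLog class and its method names are hypothetical, invented here for illustration. It timestamps and stores whatever it is handed, so the startup burst can be inspected after the job has finished. It also assumes that callbacks may arrive from a thread other than the main one, which is why the list is guarded by a lock. With a live client, its bound methods would be passed to difx.messageCallback(), difx.warningCallback() and difx.errorCallback() in place of the printing functions above.

```python
import threading
import time

class OutputLog:
    """Hypothetical helper (not part of the DiFX interface) that stores job output."""

    def __init__( self ):
        self._lock = threading.Lock()
        # Each entry is a ( time stamp, severity, text ) tuple
        self.entries = []

    def _add( self, severity, argstr ):
        # Assumption: callbacks may be triggered from another thread, so guard the list.
        with self._lock:
            self.entries.append( ( time.time(), severity, str( argstr ) ) )

    # These three methods have the single-argument form used by the callbacks above.
    def message( self, argstr ):
        self._add( "message", argstr )

    def warning( self, argstr ):
        self._add( "warning", argstr )

    def error( self, argstr ):
        self._add( "error", argstr )

    def count( self, severity ):
        # Number of stored entries of the given severity
        with self._lock:
            return len( [e for e in self.entries if e[1] == severity] )

    def text( self, severity ):
        # Stored text of every entry of the given severity, in arrival order
        with self._lock:
            return [e[2] for e in self.entries if e[1] == severity]

# Exercise the log directly with made-up strings.  With a live client the methods
# would instead be assigned using difx.messageCallback( log.message ) and so on.
log = OutputLog()
log.message( "job starting" )
log.warning( "low disk space" )
log.error( "node not responding" )
log.error( "mpirun failed" )
```

After the job returns, log.count( "error" ) gives a quick pass/fail indication and log.text( "error" ) returns the stored error strings for closer inspection.
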
A slightly trickier approach to viewing the progress of a job, but one that will provide you with more information, is to monitor DiFX message traffic, which is described in the \ref messages "Monitoring DiFX Messages" section above. There are several DiFX UDP message types that contain information related to specific jobs (see the \ref difxbusy "DiFXBusy" example application), but for the following code example we will be collecting messages of the "DifxStatusMessage" type from which we can compute a (rough) completion percentage.

\code{.py}
# Define a class to respond to messages
class Responder:
    def __init__( self ):
        self.jobName = None
        self.jobProgress = 0
    # Callback triggered when the monitoring client receives a new DiFX message.
    def difxRelayCallback( self, data ):
        # Parse the message.
        xmlDat = DiFXControl.parseXML( data )
        # See if this is a message type we are interested in.
        if xmlDat.typeStr == "DifxStatusMessage":
            # Make sure the identifier (job name) is one we are interested in.
            if self.jobName is not None and xmlDat.identifier == self.jobName:
                # Compute a progress value.  The data for this might not be there, so we
                # have some default situations.
                try:
                    self.jobProgress = 100.0 * ( float( xmlDat.visibilityMJD ) - float( xmlDat.jobstartMJD ) ) / ( float( xmlDat.jobstopMJD ) - float( xmlDat.jobstartMJD ) )
                except:
                    # Exception caused by failures of float conversions - because fields are empty
                    if xmlDat.state == "Done" or xmlDat.state == "MpiDone":
                        self.jobProgress = 100
                    elif xmlDat.state == "Ending":
                        # Ending state is annoying - use the job's known progress
                        pass
                    else:
                        self.jobProgress = 0
                # print the results
                print xmlDat.state + " " + str( int( self.jobProgress ) ) + "% complete"

...

# Create a new Responder class instance
responder = Responder()
# Tell it what the job name is (which we generate from the full path to the .input file)
responder.jobName = inputFile[inputFile.rfind( "/" ) + 1:inputFile.rfind( "." )]
# Add the callback in the responder class instance
difx.addRelayCallback( responder.difxRelayCallback )
# Turn on packet "relays" - this causes guiServer to feed DiFX message traffic to the client
difx.relayPackets()
# Start the job and watch the results
thisJob = difx.newJob()
thisJob.start()
\endcode

\subsection jobControl_stop Stopping a Running Job

GuiServer uses mpirun to spawn a job, and has little control over it once it has been started. Once started, a job will continue until completion (or failure), even if guiServer is killed. This can be annoying if you realize that a job is doing the wrong thing and wish it to stop consuming resources. The DiFXJobControl.Client.stop() method can be used to attempt to stop a running job. It has an (optional) .input file argument that can be used to specify the job to stop. By default it tries to stop the job for whatever .input file DiFXJobControl.Client was last provided with.

\code{.py}
# Stop the previously referenced .input file
difx.stop()
# Stop some different job using its input file path
difx.stop( "/full/path/to/other/file.input" )
\endcode

The only problem with the DiFXJobControl.Client.stop() method is that it often, quite mysteriously, fails to work. Upon receipt of the stop request, guiServer will do everything in its power to stop a job, but often nothing happens, especially when some aspect of processing has become wedged (annoyingly, the very situation in which you would most want to stop a job).

\subsection jobControl_monitor Monitoring the Data Output of a Running Job

The DiFXJobControl.Client class has the ability to collect real-time data products from a running job. This ability depends on the presence of the monitor_server application in the DiFX bin on the DiFX server (it doesn't have to be running - it just has to be there), and can only monitor one running job at a time (DiFX can run many simultaneously). The process is also a bit buggy.

Real time monitoring can be instructed to collect a number of data "products" for each frequency in a scan. As a time segment is processed within a scan, the data products will become available. The products themselves are described in detail in the DiFXJobControl.Client.getMonitorProducts() method, but in brief summary they include:

These data are obtained by callback functions that you must create and assign. The following code (with limited explanation) shows the collection of some real time data.

\code{.py}
# Define a callback for lag data
def lagCallback():
    # Print the array of lag data
    print difx.lag
    # Print the delay and SNR
    print difx.meanLagDelay, difx.meanLagSnr

# Create your class instance, define an .input file, etc.
...
# Start the real time monitor
difx.startMonitor()
# Generate a list of available data products.  This will give you a set
# of "product numbers" that you put in a list (see below)
monProducts = difx.getMonitorProducts()
# Request a list of products using their product numbers (see above).
# The numbers used here are arbitrary.  You can request any number of
# products that you like.
difx.requestProducts( ( 3, 5, 8 ) )
# Turn monitoring on - the job will now run with the real time monitoring
difx.runWithMonitor( True )
# Select your callbacks for one or more of the six product types.  We
# are only interested in the accumulated lag values
difx.monitorDataCallbacks( None, None, None, None, None, lagCallback )
# Start the job - real time data will trigger the callback as each
# segment is processed.
thisJob = difx.newJob()
thisJob.start()
# Do this when you are done.
difx.stopMonitor()
\endcode

There are problems with the real time monitor that make it not quite ready for prime time. If multiple jobs are run in sequence guiServer will occasionally (or perhaps the word should be "eventually") crash. In addition, the monitor_server program will often not provide data after the first job and will need to be killed by hand.
More work is necessary to clean this process up.

\section jobHistory Obtaining the Run History of a Job

Any job that has been run using guiServer or the startdifx script adds a record of run-time activities and final run status to a .difxlog file (located in the same directory as the .input file). The DiFXJobControl.Client.jobStatus() method can be used to trigger guiServer to parse a list of .difxlog files and return a description of their run history.

\code{.py}
# Get status information for a list of .input files - normal "ls" wildcards permitted.
statusInfo = difx.jobStatus( "/full/path/to/*.input" )
# Print the overall time stamp - first item in a returned tuple
print statusInfo[0]
# Then results list - second item in the returned tuple
if statusInfo[1] is None:
    print "No .input files found."
else:
    # Each item (corresponds to each requested .input file) is a tuple itself.
    # Compose a line of output for each.
    for item in statusInfo[1]:
        # First part of tuple is the .input file
        outStr = item[0] + " "
        # Second item of the tuple is yet another tuple, the second item of which
        # is the final status!
        outStr += item[1][1]
        print outStr
\endcode

The .difxlog file contains quite a bit of information about the run history of a job, but currently the DiFXJobControl.Client.jobStatus() method will only return the most recent status of a job (never started, successful, failed, running, etc). This is useful information, but future expansion of the capability should be able to provide much more.

\section vsnStuff Mark5-Specific Operations

\subsection vsnStuff_mk5 Sending a Mark5 Command

\subsection vsnStuff_directory Obtaining the Directory of a Mark5 Module

\subsection vsnStuff_filelist Generating a New Mark5 "File List"

\subsection vsnStuff_copy Copying Data from a Mark5 Module

*/