How to Run DiFX With the GUI

This document contains an end-to-end description of how the DiFX user interface can be used to run DiFX jobs.  All steps in the process are described in roughly the sequence in which they would be employed in actually running a job.  Because the GUI has many options that can cause branching paths in step-by-step instructions, what follows is only a description of a "sample" procedure for running a job.  The specific needs of individual users, data and installations will likely require approaches that differ slightly, or possibly a great deal.  With this in mind, links are provided to other sections of the documentation that may provide additional details on each subject.

This document is not a comprehensive tutorial on all of the functionality of DiFX itself.  A good place to start looking for that sort of thing is here.

Install and Build

The DiFX GUI and its associated server process guiServer are part of the DiFX software tree.  The GUI is currently located in the sub-directory:
	applications/gui
The Java archive (".jar") file used to run the GUI itself is in:
	applications/gui/gui/dist/gui.jar
This directory also contains a number of other ".jar" files that are necessary to run the GUI.  If you want to move the GUI to another disk location it is best to simply copy the complete contents of the "dist" directory.

The top-level of this documentation is here:
	applications/gui/doc/intro.html
The guiServer application is also part of the DiFX software tree, in:
	applications/guiServer
Using DiFX build procedures (such as difxbuild) will compile guiServer and install it and the gui.jar file in the appropriate bin directories.  The ".jar" files do not need compilation.

Startup and Connecting to the DiFX Host

There are two components to the DiFX GUI.  A single instance of the server application guiServer runs on one of the processing nodes in the DiFX cluster (the processor running guiServer is often referred to in the documentation as the "DiFX Host" or the "DiFX head node").  GuiServer must be run by a user (not root, for security reasons) that has read/write permissions over all data directories used by DiFX.  This user must also be able to start processes on all other nodes using mpirun.  It probably makes most sense to run guiServer as the same user that you normally use to run DiFX from the command line.

GuiServer is run from the command line on the DiFX Host:
	guiServer [PORT #]
The optional port number is the TCP connection port used to communicate with the GUI.  If it is not specified, guiServer will use a default port number (it uses the value given by the DIFX_MESSAGE_PORT environment variable, or 50200 if that is not available).  As soon as it is started, guiServer will produce a message indicating the port it is using:
	server at port 50200
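The port selection order described above can be sketched as follows.  This is only an illustration of the stated rule (explicit port, then DIFX_MESSAGE_PORT, then 50200), not guiServer's actual code; the function name resolve_port is hypothetical.

```python
import os

DEFAULT_PORT = 50200  # fallback value quoted in the text above


def resolve_port(cli_port=None, environ=None):
    """Pick the TCP port in the order described above (hypothetical helper).

    1. the port given on the command line, if any
    2. the DIFX_MESSAGE_PORT environment variable, if set
    3. the default, 50200
    """
    environ = os.environ if environ is None else environ
    if cli_port is not None:
        return int(cli_port)
    env_value = environ.get("DIFX_MESSAGE_PORT")
    if env_value:
        return int(env_value)
    return DEFAULT_PORT
```

Calling resolve_port() with no arguments on the DiFX Host should give the port that a guiServer started without a port argument would be expected to report.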
The GUI itself is a Java program that can be run anywhere that a network connection to the DiFX cluster is available.  Because the GUI and guiServer communicate using insecure TCP connections there must be no intervening firewalls between them (there are ways to deal with firewalls and in fact run the GUI anywhere - see Running DiFX Remotely).  Run the GUI using its ".jar" (Java archive) file:
	java -jar [GUI DIST PATH]/gui.jar
The "GUI DIST PATH" is the location of the "dist" subdirectory in the gui portion of the DiFX installation tree (see here).  Once the GUI is running, the address of the DiFX Host (where guiServer is running) and the port number (what guiServer told you above) can be entered in the Settings menu to connect the two (see DiFX Control Connection in the Settings documentation for details).  A proper connection will be pretty obvious - the guiServer Connection Monitor will turn green, a "connection successful" message will appear, and, assuming you have mk5daemon operating properly, the GUI should start displaying information about the components of the DiFX cluster.
The GUI and guiServer can be started in any order - the GUI will connect as soon as a guiServer becomes available (for the most part - remote connections are sometimes more touchy about this).  Any number of GUI sessions can be run simultaneously using the same guiServer, although there are considerations one should take into account to make sure ports are always available.
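If the connection light does not turn green, it can help to check whether the guiServer port is reachable at all from the machine running the GUI.  The following is a minimal sketch using only the Python standard library; it tests plain TCP reachability and knows nothing about the guiServer protocol.  The function name can_connect is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False
```

For example, can_connect("difxhost.example.org", 50200) uses a made-up host name and the default port; substitute the values entered in the Settings menu.  A False result points to a firewall, a wrong address or port, or guiServer not running.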

Run mk5daemon!

For the GUI to work properly it is important that the mk5daemon process be running on every DiFX hardware component (processors, MK5 units, etc.) in the DiFX cluster.  The reason for this is that mk5daemon produces the periodic "heartbeats" for each component, including such information as CPU and memory load and read/write operations.  Mk5daemon is also important because it is the only way the GUI knows that a component exists and is available as a resource - without it the component will not be utilized in DiFX processing.  Your DiFX cluster may be set up such that mk5daemon is started by each component when it boots, but in the event it is not you will need to log into each component (use the DIFX_USER) and start it by typing:

	mk5daemon &

Soon after mk5daemon is run on a component, the component should appear in the GUI Hardware Monitor (see Monitoring the DiFX Cluster).

It is possible to run DiFX using the GUI with mk5daemon absent on some or all components, but this is not a subject covered here.

Some Important Settings

The GUI has many options the user can set to govern processing, how data are stored, and where necessary components are located.  Most of these needn't be touched on a job-by-job basis as long as the GUI is running smoothly and appears to be doing things correctly.  Below is a list of some of the settings that are more likely to require user changes (each item is linked to detailed explanations).  A comprehensive list of all settings and their options is contained in the Settings Documentation.

Note that settings are preserved between GUI sessions, so once you have things set up and running properly you should be able to restart the GUI and have it run properly right away.  You can also save specific setting configurations to files that can later be loaded.  See here.

Monitoring the DiFX Cluster

Monitoring DiFX Jobs

The Contents of the Queue Browser

The Queue Browser is described in greater detail here.

The Queue Browser organizes DiFX jobs under a three-level hierarchy with "Experiments" at the top level.

Adding Existing Experiments to the Queue Browser


Creating a New Experiment

To create a new experiment all that is required is a .vex file and appropriate data.  The GUI will perform (or facilitate) the various steps required to set up an experiment for DiFX processing based on instructions from the user.  In short, these steps amount to:

  1. Setting up a location to do the processing
  2. Creating a .v2d file to go with the .vex file
  3. Running vex2difx and calcif2 to create .input and .im files
The GUI tries to be as flexible as possible about this, although it has a "preferred" way of arranging things such that running DiFX processing on the created experiments is possible through the GUI as well. 
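For readers who want to see what step 3 amounts to when done by hand, the sketch below builds the two command lines as argument lists.  It assumes that vex2difx takes the .v2d file as its argument and that calcif2 is given the .calc files that vex2difx produces; both assumptions should be checked against the DiFX documentation for your installation.  The function name and file names are hypothetical.

```python
def setup_commands(v2d_file, calc_files):
    """Commands for step 3: vex2difx first, then calcif2 on each .calc file.

    The invocation forms are assumptions, not taken from this document.
    """
    commands = [["vex2difx", str(v2d_file)]]
    commands += [["calcif2", str(calc)] for calc in calc_files]
    return commands
```

Each returned list can be passed to subprocess.run from inside the experiment's working directory.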

To create a new experiment, select "Create New.." under the "Experiments" menu in the Queue Browser.
IMAGE
This will bring up the "Create New Experiment" window:
IMAGE
The purpose of this window is to allow the user to tailor a new experiment to meet their needs.  It creates a "working" directory for the new experiment, allows the specification of data sources, and puts all relevant DiFX files (.v2d, .input, etc.) in the working directory from which they can be run (either through the GUI or by hand).  Experimentation with different GUI settings while creating an experiment is not dangerous as the original .vex file and data files are not moved or altered in any way.  If you mess up, delete the experiment and try again.

Naming the New Experiment and Putting It Somewhere

Getting .vex File Content


Changing .vex File Content

Once you have obtained .vex data from some source, the data are displayed in the ".vex File Editor" panel.  This panel provides a (rather rudimentary) text editor that can be used to edit the .vex data by hand.  The final edited text is used in the .vex file assigned to your created experiment, a copy of which is put in your working directory. 

Some care should be taken in directly editing .vex data, as it is trivial to corrupt the .vex to the point where it can't be used (the DiFX operational paradigm says that users should never need to do this).  However, editing this content does not alter the original source .vex file, only the final .vex file associated with the experiment, so playing around with things is not permanently harmful.

Correlation Tuning Parameters

The Correlation Tuning Parameters section includes values that can be changed to adjust the quality of the correlation results and/or the total time processing takes.  Adjusting many of these values is something of an art in itself, and the details of what they do and what their "best" values should be are not covered here (some talks at DiFX Users Meetings have covered the subject - slides can be viewed here).

Each item has an associated "apply" check box.  If this box is not checked, no instructions regarding the item will be put in the .v2d file and vex2difx will be allowed to pick its own defaults.  Unless you know what you are doing, don't check the apply box - let vex2difx pick the values!  The GUI has default values for all items but they are not based on anything - they are essentially placeholders.  The default values that are picked by vex2difx are far better. 
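The effect of the "apply" check box can be pictured with a small sketch: only items whose box is checked produce a line in the .v2d file, and everything else is left for vex2difx to decide.  The SETUP block layout and the parameter names tInt and nChan used here are illustrative assumptions, not a description of what the GUI actually writes.

```python
def setup_block(params, name="default"):
    """Build a v2d-style SETUP block from (value, apply) pairs.

    params maps a parameter name to a (value, apply) tuple.  Only
    parameters whose apply flag is True are written; the rest are left
    out so that vex2difx falls back on its own defaults.
    """
    lines = ["SETUP %s" % name, "{"]
    for key, (value, apply) in params.items():
        if apply:
            lines.append("  %s = %s" % (key, value))
    lines.append("}")
    return "\n".join(lines)
```

With {"tInt": (2.0, True), "nChan": (64, False)} as input, the block contains a tInt line and no mention of nChan.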

Stations and Data Sources

Each antenna involved in the observations described by the .vex data triggers the creation of a panel in the "Stations" section.  The two-letter station/antenna code is used as a panel title (associated with each station is a check box that can be used to eliminate the station from the experiment - see below).  Each station panel contains four sub-sections: Data Source; Antenna; Site; and Settings.
IMAGE OF A STATION PANEL
In most cases users only make changes to the Settings and Data Source sections.

    Data Source

The Data Source section tells DiFX where the data for a particular station/antenna can be found.

Because filling out the Data Source section can be tedious, the DiFX GUI provides a way of pre-defining all Data Source settings for a station/antenna in the Settings "Job Creation Settings" section (see Antenna Defaults).

    Settings

The Settings section contains settings for Tone, Phase Calibration Interval and Delta Clock.  The Delta Clock value is often gleaned by running a "Clock Pass" on a subset of the experiment's data (see some sort of explanation here).

Selecting Specific Scans

When new .vex data are selected, the GUI begins with the assumption that all scans described in the data will be included in the new experiment.  There are a number of ways of adjusting which scans are ultimately used, and which stations are used in which scans.  These changes are reflected in the final .v2d and .vex files that are created as part of the new experiment.  The "Scan Selection" Editor can be used at any time to view the scans that will be included in the experiment when it is created.
IMAGE OF SCAN SELECTION EDITOR

Some of the scan and station selection controls can work at cross-purposes - effectively they provide more than one way to cause a scan or a station to be used.  When a conflict occurs, the GUI will give the most recent command precedence (if, for instance, a command is given that a scan be included in the final experiment when a previous command eliminated the scan, the GUI will include the scan). 
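The "most recent command wins" rule can be illustrated with a short sketch.  It models only the precedence rule described above; the command names "include" and "exclude" and the function name are hypothetical and do not correspond to anything in the GUI code.

```python
def resolve_scans(all_scans, commands):
    """Apply include/exclude commands in order; the latest one wins."""
    included = {scan: True for scan in all_scans}  # everything starts included
    for action, scan in commands:
        if scan not in included:
            raise KeyError("unknown scan: %s" % scan)
        included[scan] = (action == "include")
    return [scan for scan in all_scans if included[scan]]
```

A scan that is first excluded and later included ends up in the experiment, and the reverse sequence removes it.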

    Eliminating Stations in the "Source" .vex Data

Stations can be eliminated from individual scans by putting a "-1" in the "code" column within the appropriate "scan" section in the "source" .vex data.  When the GUI encounters the "-1", it will remove the station from the scan.  This duplicates hardware correlator behavior.  Starting with the .vex file snippet below, the final .vex file will not include the station "Bd" because of the "-1" in the final column.

  scan 128-1703;
    start = 2014y128d17h03m34s;
    mode = GEOSX8N.8F;
    source = 1846+322;
    station = Bd :    0 sec :    20 sec :     0 ft : 1A : &n : -1;
    station = Ho :    0 sec :    20 sec :     0 ft : 1A : &n : 1;
    station = Kk :    0 sec :    20 sec :     0 ft : 1A : &cw : 1;
    station = Ny :    0 sec :    20 sec :     0 ft : 1A : &ccw : 1;
    station = Ts :    0 sec :    20 sec :     0 ft : 1A : &cw : 1;
  endscan;
If you do not want the GUI to pay attention to the "-1" code in this way, un-check the Eliminate Stations With "-1" Code box in the Settings menu.
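A minimal sketch of the "-1" rule, applied to the text of a single scan block like the one above, is shown here.  It assumes that station lines are colon-separated and end with a semicolon, as in the snippet; it is not the GUI's own parser, and the function name is hypothetical.

```python
def drop_flagged_stations(scan_text):
    """Remove 'station = ...' lines whose final field is -1."""
    kept = []
    for line in scan_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("station"):
            # Fields are colon-separated; the last one ends with ';'.
            last_field = stripped.rstrip(";").split(":")[-1].strip()
            if last_field == "-1":
                continue
        kept.append(line)
    return "\n".join(kept)
```

Applied to the scan above, the result keeps the Ho, Kk, Ny and Ts lines and omits Bd.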

    Eliminating Stations in the "Stations" Section

The "Stations" section is primarily set up to change parameters related to each antenna involved in an experiment, and to select the data sources associated with them (see above).  However, it also includes a check box that can be used to completely remove each station from the experiment.  Any scans that no longer have enough stations to form a baseline (i.e. fewer than two) will be eliminated.
IMAGE
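The consequence of un-checking a station can be sketched as below: the station is dropped from every scan, and any scan left with fewer than two stations disappears.  The data layout (a dictionary of scan name to station list) and the function name are assumptions made for the example.

```python
def remove_stations(scans, removed):
    """Drop the given stations everywhere, then drop scans left with
    fewer than two stations (no baseline can be formed)."""
    result = {}
    for name, stations in scans.items():
        remaining = [s for s in stations if s not in removed]
        if len(remaining) >= 2:
            result[name] = remaining
    return result
```

Removing "Bd" from a scan observed only by "Bd" and "Ho" leaves a single station, so that scan is eliminated.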

    The "Scan/Station Timeline" Editor

The "Scan/Station Timeline" section provides a visual map of all scans and the stations used in them in a timeline.  It allows the selection/deselection of individual stations within scans or the inclusion of data from different stations based on time.
IMAGE
Somewhat more complex explanation here.

    Selecting by Source Using the "Sources" Editor

The "Sources" section shows all sources and the stations used to observe them.  It allows sources to be selected and deselected, and stations to be selectively used or eliminated from sources.
IMAGE

The Sources section is something of a work in progress, and not something anyone uses at the USNO, so it is a little confused at this point as to what it wants to be.  It was developed originally with the idea that astronomical observers would be interested in sources (in geodesy they are uninteresting).  Suggestions are welcome.

    Selecting Specific Scans With the "Scan Selection" Editor

At any time in the scan/station selection process, the "Scan Selection" editor will show which scans will be included in the final experiment (included scans are green, scans not included are gray).  It allows the user to make selections on a scan-by-scan basis by clicking on individual scans, or by turning all scans on or off using the "Select All" and "Clear All" buttons.
IMAGE OF SCAN SELECTION PANEL WITH LABELS HERE
The Scan Selection Editor includes a "Time Limits" plot that shows all scans from the original .vex file as a time sequence (again, scans in green are included, those in gray are not included).  The mouse wheel can be used to "zoom in" on different time limits, and the red and blue triangles can be grabbed and dragged to limit the final experiment in time.  This widget is somewhat redundant with the Scan/Station Timeline Editor, but it may be useful to someone.
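Limiting an experiment in time amounts to keeping only the scans that fall between two limits.  The sketch below parses .vex time stamps of the form shown in the scan snippet earlier (year, day of year, hours, minutes, seconds) and filters on scan start time.  Treating a scan as "inside" when its start time lies within the limits is an assumption; the GUI may handle partially overlapping scans differently.

```python
from datetime import datetime

VEX_TIME_FORMAT = "%Yy%jd%Hh%Mm%Ss"  # e.g. 2014y128d17h03m34s


def parse_vex_time(text):
    """Convert a .vex time stamp (year, day-of-year, h, m, s) to datetime."""
    return datetime.strptime(text, VEX_TIME_FORMAT)


def scans_within(scan_starts, earliest, latest):
    """Return names of scans whose start time lies in [earliest, latest]."""
    lo, hi = parse_vex_time(earliest), parse_vex_time(latest)
    return [name for name, start in scan_starts.items()
            if lo <= parse_vex_time(start) <= hi]
```

Here scan_starts is a hypothetical dictionary mapping scan names to their "start" values from the .vex file.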

Running DiFX Jobs

Running Jobs With the Scheduler

Using the Real-Time Monitor

monitor_server

If you wish to monitor running jobs through the GUI's real-time plotting capabilities, the DiFX application monitor_server needs to be running.  This program provides a TCP server at which real-time data from running DiFX processes can be obtained.  The absence of this process is not usually a problem - if you request the real-time plotting it should be started automatically.  However if you find that real-time plotting isn't working, this could be a cause.  For details, see the Real-Time Monitor Documentation.  Note that at this time real-time monitoring is best considered "experimental".



Running DiFX Remotely

The GUI/guiServer communications link, which handles all interaction between the GUI and DiFX, is based on insecure TCP socket connections.  This works fine if you run the GUI on the same LAN as the software correlator, but breaks down if the GUI is outside a firewall.  To get around such restrictions, an "ssh tunnel" can be set up through a firewall as long as you can ssh to the firewall.  Running a DiFX cluster that is located behind a firewall using a GUI running on a machine outside the firewall can be accomplished using the following steps (the order is usually unimportant, but see "Order Is (Maybe) Important" below):

1. Start guiServer normally on the head node of the DiFX cluster.  The TCP connection port will be referred to as the "connection port" in the following steps.
2. Start an ssh tunnel from the location where you wish to run the GUI through the firewall.  This is done using an ssh command with some options:
		ssh -N -L [LOCALPORT]:[WHAT FIREWALL CALLS DIFX HEADNODE]:[CONNECTION PORT] USER@FIREWALL
3.  Start the GUI on the machine outside the firewall.  In the "DiFX Control Connection" section of the "Settings" window, set "DiFX Host" to "localhost" and "Control Port" to the value of LOCALPORT you used in the ssh tunnel.  The "guiServer Connection" light should turn green and you should start seeing data from the nodes in the DiFX cluster.
4.  Click on "Channel All Data" in the "DiFX Control Connection" section of the "Settings" window.
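The ssh command in step 2 can be assembled programmatically, which is convenient if the tunnel is started from a script.  This sketch only builds the argument list using the -N and -L options shown above; the host names and port numbers in the usage note are made up, and the function name is hypothetical.

```python
def tunnel_command(local_port, headnode, connection_port, user, firewall):
    """Build the ssh port-forwarding command from step 2 as an argument list."""
    forward = "%d:%s:%d" % (local_port, headnode, connection_port)
    return ["ssh", "-N", "-L", forward, "%s@%s" % (user, firewall)]
```

For example, tunnel_command(50201, "headnode", 50200, "difx", "gateway.example.org") gives a list that can be handed to subprocess.Popen; the GUI would then use "localhost" and port 50201.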
What is "Channel All Data" Doing?
The GUI and guiServer do not use a single socket for communication.  Many activities, including creating/running jobs, examining directory structures on the correlator, and even tab completion for many GUI text fields, require opening new sockets.  The port numbers for these sockets are all within a specific range, which you can control (see BLAH), so in theory you could set up tunnels for all of them.  Instead, the GUI allows you to "channel" all of these exchanges through the single, tunnelled primary connection.  To do this, turn on the "Channel All Data" setting in the "DiFX Control Connection" section of the Settings window.  The GUI and guiServer handle the organization of packets on either side of the connection so the change should be seamless, and all activities that normally require independent sockets should act normally.  This arrangement has reassuringly little impact on the performance of the single connection socket.
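The general idea of carrying several independent exchanges over one socket can be sketched with simple length-prefixed frames tagged with a channel number.  This is a generic illustration of multiplexing only; the frame layout below is invented for the example and is not the actual GUI/guiServer packet format.

```python
import struct

# Invented header: channel id (2 bytes) and payload length (4 bytes),
# both in network byte order.
HEADER = struct.Struct("!HI")


def pack_frame(channel, payload):
    """Prefix a payload with its channel id and length."""
    return HEADER.pack(channel, len(payload)) + payload


def unpack_frames(buffer):
    """Split a byte buffer into (channel, payload) pairs.

    Returns the frames found and any trailing bytes of an incomplete frame.
    """
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        channel, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((channel, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]
```

The receiving side routes each payload to whatever handler owns that channel number, so exchanges that would otherwise need their own sockets can share the tunnelled connection.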
Order Is (Maybe) Important
Experience has shown that it is best to start guiServer and any ssh tunnel or tunnels before trying to connect with the GUI. 

Problems You May Encounter (and What to Do About Them)

GUI/guiServer Connection Problems

Run Permissions and RSA Problems on the DiFX Host

When guiServer runs multi-core processes, it needs to be able to execute things remotely on the other nodes in the DiFX cluster.  If remote keys are not set up correctly, remote hosts will prompt for permission keys.  GuiServer has no way of intercepting these requests, so the runs will fail.  You need to make sure all of your keys are in place beforehand.

To do so, log into your head node - where you are running guiServer - using the same user name (I'm calling this user name the "DiFX user" below) and network route that you are using to run guiServer itself.  The latter is quite important - if you are running guiServer by logging into the head node remotely, log in again that way.  If you are running guiServer from the head node console, log in that way.

Next, try using ssh to remotely log into all of the nodes on your cluster.  If you can do so without any key or password requests, you should be set.
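This check can be automated.  The sketch below runs a trivial remote command on each node with the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password or key confirmation.  The function names are hypothetical, and the "run" parameter exists only so the logic can be exercised without a real cluster.

```python
import subprocess


def ssh_check_command(user, node):
    """Command that succeeds only if login works without any prompts."""
    return ["ssh", "-o", "BatchMode=yes", "%s@%s" % (user, node), "true"]


def nodes_needing_keys(user, nodes, run=subprocess.run):
    """Return the nodes on which the non-interactive login failed."""
    failed = []
    for node in nodes:
        result = run(ssh_check_command(user, node))
        if result.returncode != 0:
            failed.append(node)
    return failed
```

Run it as the DiFX user on the head node, logged in the same way guiServer is started.  An empty list means every node accepted a login without prompting.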

Okay, let's say you can't.  What do you do about it?  The following may work (or you may wish to bother a system administrator or somebody who knows what they are doing).  Make sure you have an RSA key (in the file .ssh/id_rsa.pub in the DiFX user's home directory).  If you don't, create one with the following command (answer any questions by hitting "return"):

	ssh-keygen -t rsa
You will have to log out and log back in for the new key to be active, or type:
	ssh-add
Then type this for every node you are using, including the head node itself.  You should use complete addresses for machine names, not aliases (it is not entirely clear that this is a problem, but we had some issues with it):
	ssh-copy-id user@node
For instance, if your DiFX user is "difx", your head node is "king" and your other processing nodes and mark 5's are "pawn1", "pawn2", "mark5-1" and "mark5-2" you would need to do the following (as the DiFX user on "king"):
	ssh-copy-id difx@king
	ssh-copy-id difx@pawn1
	ssh-copy-id difx@pawn2
	ssh-copy-id difx@mark5-1
	ssh-copy-id difx@mark5-2
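On a large cluster the list of commands can be generated rather than typed.  This sketch only builds the command lines; ssh-copy-id normally asks for the remote password, so each command still has to be run interactively.  The function name is hypothetical.

```python
def copy_id_commands(user, nodes):
    """One ssh-copy-id command per node, head node included."""
    return [["ssh-copy-id", "%s@%s" % (user, node)] for node in nodes]
```

Printing " ".join(cmd) for each entry of copy_id_commands("difx", ["king", "pawn1", "pawn2", "mark5-1", "mark5-2"]) reproduces the five lines above.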
To test whether this has worked, you should be able to "ssh" to your DiFX user on all nodes from root on the DiFX head node without entering a password or key.  If you can't do this, things are not set up right and jobs will not run.  Seek professional help.

These changes should survive reboots, as well as RSA key stuff ever does.