How to Run DiFX With the GUI

Install and Build
Startup and Connecting to the DiFX Host
Some Important Settings
Monitoring the DiFX Cluster
Monitoring DiFX Jobs
Creating a New Experiment
Running a DiFX Job
Running Jobs With the Scheduler
Using the Real-Time Monitor
Problems You May Encounter (and What to Do About Them)

This document contains an end-to-end description of how the DiFX user interface can be used to run DiFX jobs. All steps in the process are described in roughly the sequence they would be employed in actually running a job. Because the GUI has many options that can cause branching paths in step-by-step instructions, what follows is only a description of a "sample" procedure for running a job. The specific needs of individual users, data and installations will likely require approaches that differ some, or possibly a lot. With this in mind, links are provided to other sections of the documentation that may provide additional details on each subject.

This document is not a comprehensive tutorial on all of the functionality of DiFX itself. A good place to start looking for that sort of thing is here.

Install and Build

The DiFX GUI and its associated server process guiServer are part of the DiFX software tree. The GUI is currently located in the sub-directory:

	applications/gui

The Java archive (".jar") file used to run the GUI itself is in:

	applications/gui/gui/dist/gui.jar

This directory also contains a number of other ".jar" files that are necessary to run the GUI. If you want to move the GUI to another disk location it is best to simply copy the complete contents of the "dist" directory.

The top-level of this documentation is here:

	applications/gui/doc/intro.html

The guiServer application is also part of the DiFX software tree, in:

	applications/guiServer

Using DiFX build procedures (such as difxbuild) will compile guiServer and install it and the gui.jar file in the appropriate bin directories. The ".jar" files do not need compilation.

Startup and Connecting to the DiFX Host

There are two components to the DiFX GUI. A single instance of the server application guiServer runs on one of the processing nodes in the DiFX cluster (the processor running guiServer is often referred to in the documentation as the "DiFX Host" or the "DiFX head node"). GuiServer must be run by a user (not root, for security reasons) that has read/write permissions over all data directories used by DiFX. This user must also be able to start processes on all other nodes using mpirun. It probably makes most sense to have the user that you normally use to run DiFX from the command line run guiServer.

GuiServer is run from the command line on the DiFX Host:

	guiServer [PORT #]

The optional port number is the TCP connection port used to communicate with the GUI. If it is not specified, guiServer will use a default port number (it uses the value given by the DIFX_MESSAGE_PORT environment variable, or 50200 if that is not available). As soon as it is started, guiServer will produce a message indicating the port it is using:

	server at port 50200
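The port selection described above can be pictured with a short shell sketch. It only illustrates the stated precedence (command-line argument, then DIFX_MESSAGE_PORT, then 50200) and is not taken from the guiServer source; the function name resolve_gui_port is hypothetical.

```shell
# Hypothetical helper mirroring the precedence described in the text:
#   1. an explicit port argument
#   2. the DIFX_MESSAGE_PORT environment variable
#   3. the built-in default, 50200
# This is an illustration, not guiServer's own code.
resolve_gui_port() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    elif [ -n "${DIFX_MESSAGE_PORT:-}" ]; then
        echo "$DIFX_MESSAGE_PORT"
    else
        echo 50200
    fi
}
```

Running resolve_gui_port in the shell you intend to start guiServer from shows which port to expect in the "server at port" message, assuming guiServer behaves as described here.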

The GUI itself is a Java program that can be run anywhere that a network connection to the DiFX cluster is available. Because the GUI and guiServer communicate using insecure TCP connections there must be no intervening firewalls between them (there are ways to deal with firewalls and in fact run the GUI anywhere - see Running DiFX Remotely). Run the GUI using its ".jar" (Java archive) file:

	java -jar [GUI DIST PATH]/gui.jar

The "GUI DIST PATH" is the location of the "dist" subdirectory in the gui portion of the DiFX installation tree (see here). Once the GUI is running, the address of the DiFX Host (where guiServer is running) and the port number (what guiServer told you above) can be entered in the Settings menu to connect the two (see DiFX Control Connection in the Settings documentation for details). A proper connection will be pretty obvious - the guiServer Connection Monitor will turn green, a "connection successful" message will appear, and, assuming you have mk5daemon operating properly, the GUI should start displaying information about the components of the DiFX cluster.

The GUI and guiServer can be started in any order - the GUI will connect as soon as a guiServer becomes available (for the most part - remote connections are sometimes more touchy about this). Any number of GUI sessions can be run simultaneously using the same guiServer, although there are considerations one should take into account to make sure ports are always available.

Run mk5daemon!

For the GUI to work properly it is important that the mk5daemon process be running on every DiFX hardware component (processors, MK5 units, etc.) in the DiFX cluster. The reason for this is that mk5daemon produces the periodic "heartbeats" for each component, including such information as CPU and memory load and read/write operations. Mk5daemon is also important because it is the only way the GUI knows that a component exists and is available as a resource - without it the component will not be utilized in DiFX processing. Your DiFX cluster may be set up such that mk5daemon is started by each component when it boots, but in the event it is not you will need to log into each component (use the DIFX_USER) and start it by typing:

	mk5daemon &

Soon after mk5daemon is run on a component, the component should appear in the GUI Hardware Monitor (see Monitoring the DiFX Cluster).

It is possible to run DiFX using the GUI with mk5daemon absent on some or all components, but this is not a subject covered here.

Some Important Settings

The GUI has many options the user can set to govern processing, how data are stored, and where necessary components are located. Most of these needn't be touched on a job-by-job basis as long as the GUI is running smoothly and appears to be doing things correctly. Below is a list of some of the settings that are more likely to require user changes (each item is linked to detailed explanations). A comprehensive list of all settings and their options is contained in the Settings Documentation.

DiFX Host is the host name of the "head node" of the DiFX cluster - where guiServer should be running. The DiFX host name should be whatever the machine on which the GUI is run calls the DiFX head node - i.e. a "ping" of this host name from the GUI host should be successful.
Control Port is the port number at which the GUI will repeatedly attempt to make a connection to guiServer if such a connection does not exist. The control port should match the port number given by guiServer when it is started.
Run w/DiFX Version is the version of DiFX software that will be run by the GUI. This version does not need to match that of the GUI or guiServer, but it does need to be installed on the DiFX cluster.
DiFX Execute Script is a script on the DiFX processing nodes that is used to execute all DiFX and mpi commands. The script defines environment variables and performs any other necessary setup before running things. Most of the time the script selected automatically by the GUI should be fine.
Relay Using guiServer Connection determines whether the GUI gathers DiFX messages using UDP directly or "relayed" via TCP from guiServer. It is selected by default, and generally should remain so. UDP messaging only works if the GUI is on the same LAN as the DiFX nodes and leaves the GUI in a strictly monitoring role, without the capacity to control anything.
Working Directory is the path under which new directories are stored for DiFX experiments that the user creates using the GUI. The "user" running DiFX needs to have write permission in this location or none of this will work.

Note that settings are preserved between GUI sessions, so once you have things set up and running properly you should be able to restart the GUI and have it run properly right away. You can also save specific setting configurations to files that can later be loaded. See here.

Monitoring the DiFX Cluster

Monitoring DiFX Jobs

The Contents of the Queue Browser

The Queue Browser is described in greater detail here.

The Queue Browser organizes DiFX jobs under a three-level hierarchy with "Experiments" at the top level.

An Experiment is usually bound to a single data set (one or more scans) collected over a specific time span - the results of a single observing session for instance. It can contain any number of "Passes" (including zero).
A Pass is used to contain a single analysis of a subset of the data. Often Experiments contain a "Clock Pass" run on a few scans to generate the time delays for each involved antenna, and a "Production Pass" run on all scans with those time delays in place.
Within each Pass is a series of Jobs, each controlling the processing of at least one scan.

Adding Existing Experiments to the Queue Browser

Creating a New Experiment

To create a new experiment all that is required is a .vex file and appropriate data. The GUI will perform (or facilitate) the various steps required to set up an experiment for DiFX processing based on instructions from the user. In short, these steps amount to:

Setting up a location to do the processing
Creating a .v2d file to go with the .vex file
Running vex2difx and calcif2 to create .input and .im files

The GUI tries to be as flexible as possible about this, although it has a "preferred" way of arranging things such that running DiFX processing on the created experiments is possible through the GUI as well.

To create a new experiment, select "Create New.." under the "Experiments" menu in the Queue Browser.
IMAGE
This will bring up the "Create New Experiment" window:
IMAGE
The purpose of this window is to allow the user to tailor a new experiment to meet their needs. It creates a "working" directory for the new experiment, allows the specification of data sources, and puts all relevant DiFX files (.v2d, .input, etc.) in the working directory from which they can be run (either through the GUI or by hand). Experimentation with different GUI settings while creating an experiment is not dangerous as the original .vex file and data files are not moved or altered in any way. If you mess up, delete the experiment and try again.

Naming the New Experiment and Putting It Somewhere

Getting .vex File Content


Changing .vex File Content

Once you have obtained .vex data from some source, the data are displayed in the ".vex File Editor" panel. This panel provides a (rather rudimentary) text editor that can be used to edit the .vex data by hand. The final edited text is used in the .vex file assigned to your created experiment, a copy of which is put in your working directory.

Take care when directly editing .vex data, as it is trivial to corrupt the .vex to the point where it can't be used (the DiFX operational paradigm says that users should never need to do this). Editing this content does not alter the original source .vex file, only the final .vex file associated with the experiment, so playing around with things is not permanently harmful.

Correlation Tuning Parameters

The Correlation Tuning Parameters section includes values that can be changed to adjust the quality of the correlation results and/or the total time processing takes. Adjusting many of these values is something of an art in itself, and the details of what things do and what their "best" values should be are not covered here (some talks at DiFX Users Meetings have covered the subject - slides can be viewed here).

Each item has an associated "apply" check box. If this box is not checked, no instructions regarding the item will be put in the .v2d file and vex2difx will be allowed to pick its own defaults. Unless you know what you are doing, don't check the apply box - let vex2difx pick the values! The GUI has default values for all items but they are not based on anything - they are essentially placeholders. The default values that are picked by vex2difx are far better.

Stations and Data Sources

Each antenna involved in the observations described by the .vex data triggers the creation of a panel in the "Stations" section. The two-letter station/antenna code is used as a panel title (associated with each station is a check box that can be used to eliminate the station from the experiment - see below). Each station panel contains four sub-sections: Data Source; Antenna; Site; and Settings.
IMAGE OF A STATION PANEL
In most cases users only make changes to the Settings and Data Source sections.

Data Source

The Data Source section tells DiFX where the data for a particular station/antenna can be found.

Because filling out the Data Source section can be tedious, the DiFX GUI provides a way of pre-defining all Data Source settings for a station/antenna in the Settings "Job Creation Settings" section (see Antenna Defaults).

Settings

The Settings section contains settings for Tone, Phase Calibration Interval and Delta Clock. The Delta Clock value is often gleaned by running a "Clock Pass" on a subset of the experiment's data (see some sort of explanation here).

Selecting Specific Scans

When new .vex data are selected, the GUI begins with the assumption that all scans described in the data will be included in the new experiment. There are a number of ways of adjusting which scans are ultimately used, and which stations are used in which scans. These changes are reflected in the final .v2d and .vex files that are created as part of the new experiment. The "Scan Selection" Editor can be used at any time to view the scans that will be included in the experiment when it is created.
IMAGE OF SCAN SELECTION EDITOR

Some of the scan and station selection controls can work at cross-purposes - effectively they provide more than one way to cause a scan or a station to be used. When a conflict occurs, the GUI will give the most recent command precedence (if, for instance, a command is given that a scan be included in the final experiment when a previous command eliminated the scan, the GUI will include the scan).

Eliminating Stations in the "Source" .vex Data

Stations can be eliminated from individual scans by putting a "-1" in the "code" column within the appropriate "scan" section in the "source" .vex data. When the GUI encounters the "-1", it will remove the station from the scan. This duplicates hardware correlator behavior. Starting with the .vex file snippet below, the final .vex file will not include the station "Bd" because of the "-1" in the final column.

  scan 128-1703;
    start = 2014y128d17h03m34s;
    mode = GEOSX8N.8F;
    source = 1846+322;
    station = Bd :    0 sec :    20 sec :     0 ft : 1A : &n : -1;
    station = Ho :    0 sec :    20 sec :     0 ft : 1A : &n : 1;
    station = Kk :    0 sec :    20 sec :     0 ft : 1A : &cw : 1;
    station = Ny :    0 sec :    20 sec :     0 ft : 1A : &ccw : 1;
    station = Ts :    0 sec :    20 sec :     0 ft : 1A : &cw : 1;
  endscan;

If you do not want the GUI to pay attention to the "-1" code in this way, un-check the Eliminate Stations With "-1" Code box in the Settings menu.
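The effect of the "-1" rule on a scan block can be approximated outside the GUI with a small awk filter. This is only a sketch of the behavior described above, not the GUI's own code; the function name filter_minus_one is hypothetical, and the filter does not handle the follow-on step of dropping scans left with too few stations.

```shell
# Illustration only: drop "station" lines whose final column is -1.
# filter_minus_one is a hypothetical name; the GUI does this internally.
filter_minus_one() {
    awk '
        /^[ \t]*station[ \t]*=/ {
            n = split($0, col, ":")
            code = col[n]
            gsub(/[ \t;]/, "", code)   # strip blanks and the trailing ";"
            if (code == "-1") next     # station eliminated from this scan
        }
        { print }
    ' "$@"
}
```

Given the snippet above as input (on standard input or as a file argument), the output keeps the Ho, Kk, Ny and Ts lines and omits the Bd line.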

Eliminating Stations in the "Stations" Section

The "Stations" section is primarily set up to change parameters related to each antenna involved in an experiment, and to select the data sources associated with them (see above). However, it also includes a check box that can be used to completely remove each station from the experiment. Any scans that no longer have enough stations to form a baseline (i.e. fewer than two) will be eliminated.
IMAGE

The "Scan/Station Timeline" Editor

The "Scan/Station Timeline" section provides a visual map of all scans and the stations used in them in a timeline. It allows the selection/deselection of individual stations within scans or the inclusion of data from different stations based on time.
IMAGE
Somewhat more complex explanation here.

Selecting by Source Using the "Sources" Editor

The "Sources" section shows all sources and the stations used to observe them. It allows sources to be selected and deselected, and stations to be selectively used or eliminated from sources.
IMAGE

All sources included in the .vex file are listed. Boxes show which sources are observed with which stations.
Hover over the name of a source to produce a tooltip that includes information about the source as well as the names of the scans used to observe it and the stations used for each of those scans (stations marked in red have been eliminated).
Hover over the boxes to produce a tooltip that includes which scans use the associated station on the associated source. A scan that appears in red text has been eliminated - either explicitly or because it lacks sufficient stations to form a baseline.
Use check boxes to add or eliminate a source. When a source is removed, all scans associated with it are eliminated from the final experiment. When a source is added, all scans associated with it are put into the final experiment (assuming they have sufficient stations to form a baseline).
Click on the boxes to add or eliminate a station from the observations of a given source. Scans will be added or eliminated from the final experiment based on whether changes give them enough stations to form a baseline.

The Sources section is something of a work in progress, and not something anyone uses at the USNO, so it is a little confused at this point as to what it wants to be. It was developed originally with the idea that astronomical observers would be interested in sources (in geodesy they are uninteresting). Suggestions are welcome.

Selecting Specific Scans With the "Scan Selection" Editor

At any time in the scan/station selection process, the "Scan Selection" editor will show which scans will be included in the final experiment (included scans are green, scans not included are gray). It allows the user to make selections on a scan-by-scan basis by clicking on individual scans, or by turning all scans on or off using the "Select All" and "Clear All" buttons.
IMAGE OF SCAN SELECTION PANEL WITH LABELS HERE
The Scan Selection Editor includes a "Time Limits" plot that shows all scans from the original .vex file as a time sequence (again, scans in green are included, those in gray are not included). The mouse wheel can be used to "zoom in" on different time limits, and the red and blue triangles can be grabbed and dragged to limit the final experiment in time. This widget is somewhat redundant with the Scan/Station Timeline Editor, but it may be useful to someone.

Running DiFX Jobs

Running Jobs With the Scheduler

Using the Real-Time Monitor

monitor_server

If you wish to monitor running jobs through the GUI's real-time plotting capabilities, the DiFX application monitor_server needs to be running. This program provides a TCP server at which real-time data from running DiFX processes can be obtained. The absence of this process is not usually a problem - if you request the real-time plotting it should be started automatically. However if you find that real-time plotting isn't working, this could be a cause. For details, see the Real-Time Monitor Documentation. Note that at this time real-time monitoring is best considered "experimental".

Running DiFX Remotely

The GUI/guiServer communications link, which handles all interaction between the GUI and DiFX, is based on insecure TCP socket connections. This works fine if you run the GUI on the same LAN as the software correlator, but breaks down if you move outside firewalls. To get around such restrictions, an "ssh tunnel" can be set up through a firewall as long as you can ssh to the firewall. Running a DiFX cluster that is located behind a firewall using a GUI running on a machine outside the firewall can be accomplished using the following steps (the order of which is unimportant):

1. Start guiServer normally on the head node of the DiFX cluster. The TCP connection port will be referred to as the "connection port" in the following steps.

2. Start an ssh tunnel from the location where you wish to run the GUI through the firewall. This is done using an ssh command with some options:

		ssh -N -L [LOCALPORT]:[WHAT FIREWALL CALLS DIFX HEADNODE]:[CONNECTION PORT] USER@FIREWALL

USER@FIREWALL is how you would log into the firewall from your local machine
CONNECTION PORT is the TCP connection port used by guiServer on the head node
WHAT FIREWALL CALLS DIFX HEADNODE is the node name of the head node as the firewall sees it (what you would use if logging into the head node from the firewall)
LOCALPORT is a port on the machine where the GUI is being run.

3. Start the GUI on the machine outside the firewall. In the "DiFX Control Connection" section of the "Settings" window, set "DiFX Host" to "localhost" and "Control Port" to the value of LOCALPORT you used in the ssh tunnel. The "guiServer Connection" light should turn green and you should start seeing data from the nodes in the DiFX cluster.
4. Click on "Channel All Data" in the "DiFX Control Connection" section of the "Settings" window.
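As a concrete illustration of the tunnel command in step 2, the sketch below fills in the four placeholders. All values (local port 50201, head node name difx-head, firewall login me@gateway.example.org) are invented for the example, and tunnel_command is a hypothetical helper that only prints the command. The -L option asks ssh to forward the local port to the given host and port as seen from the firewall, and -N tells ssh not to run a remote command.

```shell
# Hypothetical helper that prints the tunnel command for a given setup.
# Arguments, in order:
#   LOCALPORT, head node name as the firewall sees it,
#   guiServer CONNECTION PORT, USER@FIREWALL
tunnel_command() {
    printf 'ssh -N -L %s:%s:%s %s\n' "$1" "$2" "$3" "$4"
}

# Example values are invented for illustration.
tunnel_command 50201 difx-head 50200 me@gateway.example.org
```

With these example values the GUI would be pointed at "localhost" and Control Port 50201, while guiServer keeps listening on 50200 on the head node.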

What is "Channel All Data" Doing?

The GUI and guiServer do not use a single socket for communication. Many activities, including creating/running jobs, examining directory structures on the correlator, and even tab completion for many GUI text fields, require opening new sockets. The port numbers for these sockets are all within a specific range, which you can control (see BLAH), so in theory you could set up tunnels for all of them. Instead, the GUI allows you to "channel" all of these exchanges through a single, tunnelled primary connection. To do this, turn on the "Channel All Data" setting in the "DiFX Control Connection" section of the Settings window. The GUI and guiServer handle the organization of packets on either side of the connection so the change should be seamless, and all activities that normally require independent sockets should act normally. This arrangement has reassuringly little impact on the performance of the single connection socket.

Order Is (Maybe) Important

Experience has shown that it is best to start guiServer and any ssh tunnel or tunnels before trying to connect with the GUI.

Problems You May Encounter (and What to Do About Them)

GUI/guiServer Connection Problems
Run Permissions and RSA Problems on the DiFX Host

GUI/guiServer Connection Problems

Run Permissions and RSA Problems on the DiFX Host

When guiServer runs multi-core processes, it needs to be able to execute things remotely on the other nodes in the DiFX cluster. If remote keys are not set up correctly, remote hosts will prompt for permission keys. GuiServer has no way of intercepting these requests, so the runs will fail. You need to make sure all of your keys are in place beforehand.

To do so, log into your head node - where you are running guiServer - using the same user name (I'm calling this user name the "DiFX user" below) and network route that you are using to run guiServer itself. The latter is quite important - if you are running guiServer by logging into the head node remotely, log in again that way. If you are running guiServer from the head node console, log in that way.

Next, try using ssh to remotely log into all of the nodes on your cluster. If you can do so without any key or password requests, you should be set.
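Checking every node by hand gets tedious on a large cluster, so the test can be scripted. The sketch below is one possible approach, not part of DiFX; check_nodes and SSH_CMD are hypothetical names. It relies on the standard ssh option BatchMode=yes, which makes ssh fail rather than prompt for a password, passphrase or host key confirmation, so any node that would have asked a question is reported as FAILED.

```shell
# Sketch: report which nodes accept key-based login without prompting.
# check_nodes and SSH_CMD are hypothetical names used only in this example.
# Usage: check_nodes USER NODE [NODE ...]
check_nodes() {
    user=$1
    shift
    failed=0
    for node in "$@"; do
        # BatchMode=yes: never prompt; ConnectTimeout: do not hang on dead nodes.
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
                "$user@$node" true </dev/null >/dev/null 2>&1; then
            echo "ok      $node"
        else
            echo "FAILED  $node"
            failed=1
        fi
    done
    return "$failed"
}
```

Run it as the DiFX user on the head node, listing every node in the cluster (including the head node itself). A non-zero return status means at least one node still needs attention.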

Okay, let's say you can't. What do you do about it? The following may work (or you may wish to bother a system administrator or somebody who knows what they are doing). Make sure you have an RSA key (in the file .ssh/id_rsa.pub in the DiFX user's home directory). If you don't, create one with the following command (answer any questions by hitting "return"):

	ssh-keygen -t rsa

You will have to log out and log back in for the new key to be active, or type:

	ssh-add

Then type this for every node you are using, including the head node itself. You should use complete addresses for machine names, not aliases (it is not entirely clear that this is a problem, but we had some issues with it):

	ssh-copy-id user@node

For instance, if your DiFX user is "difx", your head node is "king" and your other processing nodes and Mark 5s are "pawn1", "pawn2", "mark5-1" and "mark5-2", you would need to do the following (as the DiFX user on "king"):

	ssh-copy-id difx@king
	ssh-copy-id difx@pawn1
	ssh-copy-id difx@pawn2
	ssh-copy-id difx@mark5-1
	ssh-copy-id difx@mark5-2
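On a cluster with many nodes the same commands can be issued in a loop. This is a minimal sketch assuming ssh-copy-id is available on the head node; copy_keys and COPY_CMD are hypothetical names, and each call may still ask for the DiFX user's password on the target node.

```shell
# Sketch: run ssh-copy-id for every node, as the DiFX user on the head node.
# copy_keys and COPY_CMD are hypothetical names used only in this example.
# Usage: copy_keys USER NODE [NODE ...]
copy_keys() {
    user=$1
    shift
    for node in "$@"; do
        ${COPY_CMD:-ssh-copy-id} "$user@$node" || echo "copy failed: $node" >&2
    done
}
```

For the example cluster above this would be called as: copy_keys difx king pawn1 pawn2 mark5-1 mark5-2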

To test whether this has worked, you should be able to "ssh" to your DiFX user on all nodes from the DiFX user account on the DiFX head node without entering a password or key. If you can't do this, things are not set up right and jobs will not run. Seek professional help.

These changes should survive reboots, as well as RSA key stuff ever does.