There exists a private document "Scatter-Gather File System Format" (Roger Cappallo, 2014.7.1) with a description of the overall structure of scatter-gather files. However, there is apparently no *public* documentation yet on the Mark6 non-RAID recording format. The information here was reverse-engineered from a disk set with actual Mark6 scatter-gather (SG) mode recordings, the data in the individual files, and the Mark6 d-plane 1.12 source code.

Contents:

0) Some points on SG mode vs. RAID
1) Description of scatter-gather mode file sets and mounted disks
2) The metadata files under /mnt/disks/.meta/[1-4]/[0-7]/ ('group', 'slist')
3) The layout of data and headers in files of a SG mode file set
4) Failure modes


0) Some points on SG mode vs. RAID

Disk failure after recording in:
   SG mode     --> a subset of data is lost (periodic chunks missing in the
                   reconstructed file)
   RAID 0 mode --> the XFS file system will be broken and all files are lost,
                   although a low-level VDIF-specific recovery might be
                   possible if there are suitable tools for this...
   RAID 5 mode --> a single disk failure is survivable, no data lost

Disk failure during recording in:
   SG mode     --> might be able to continue recording? Mark6 dplane
                   implementation not quite clear...
   RAID 0 mode --> recording probably freezes
   RAID 5 mode --> should be able to continue writing even with the degraded
                   RAID, at least until the next disk fails

Recording-time performance:
   SG mode     --> should allow the best throughput and should be resilient to
                   disk failures, although this depends on the Mark6 dplane
                   implementation
   RAID 5 mode --> probably fine for 4 Gbps recording, but for 4 to 16+ Gbps it
                   might be too slow? (untested...)

Data reassembly in:
   SG mode     --> data from files scattered across all disks has to be
                   combined, removing the embedded metadata during this
                   gathering process
   RAID 0&5    --> no special reassembly steps

Mark6 disk mounting:
   SG mode
      Each of the two LSI SAS HBA cards in the Mark6 handles 16 disks.
      The Mark6 software divides the disks into 8-disk groups (historical
      reasons?). Hence there can be 4 groups with 8 disks each (groups 1 to 4;
      disks 0 to 7). Each disk has its own XFS file systems (2 XFS partitions
      per disk: data and metadata).
      Automounting? Apparently not; one must mount with a script, or manually,
      or via dplane+cplane+da-client.
   RAID mode
      Certainly just a single XFS file system distributed over all disks.
      Not tested whether individual disks have two partitions (could have 1 in
      the RAID, 1 with disk-specific metadata).
      Automounting? mdadm automount perhaps?


1) Description of scatter-gather mode file sets and mounted disks

In scatter-gather mode there is a set of files associated with each VLBI scan.
The files reside on the scatter-gather related disk partitions, on XFS file
systems. All partitions are:

   cat /proc/partitions
      8     0   488386584  sda     -- OS disk (fixed)
      8     1   468596736  sda1       Linux, Mark6 root file system
      8     2           1  sda2       ?
      8     5    19786752  sda5       Linux, swap
      8    16  3907018584  sdb     -- VLBI disk, 1 of 16 (removable)
      8    17  3906919424  sdb1       data partition, XFS file system
      8    18       97280  sdb2       meta data partition, XFS file system
      ...
      8   240  3907018584  sdp     -- VLBI disk, 15 of 16 (removable)
      8   241  3906919424  sdp1       data partition, XFS file system
      8   242       97280  sdp2       meta data partition, XFS file system
     65     0  3907018584  sdq     -- VLBI disk, 16 of 16 (removable)
     65     1  3906919424  sdq1       data partition, XFS file system
     65     2       97280  sdq2       meta data partition, XFS file system

Mounted as:

   /dev/sdb1 on /mnt/disks/1/0 type xfs (rw)
   /dev/sdb2 on /mnt/disks/.meta/1/0 type xfs (rw)
   ...
   /dev/sdp1 on /mnt/disks/2/6 type xfs (rw)
   /dev/sdp2 on /mnt/disks/.meta/2/6 type xfs (rw)
   /dev/sdq1 on /mnt/disks/2/7 type xfs (rw)
   /dev/sdq2 on /mnt/disks/.meta/2/7 type xfs (rw)
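Since the SG disks are apparently not automounted, it can be useful to check that all expected data and metadata partitions are actually mounted before reading anything from them. Below is a minimal Python sketch (not part of the Mark6 software) that assumes the /mnt/disks/<group>/<disk> and /mnt/disks/.meta/<group>/<disk> layout shown above and that modules sit in slots 1 and 2; adjust the group range for other configurations.

   #!/usr/bin/env python
   # Minimal sketch: report which Mark6 SG mount points are present.
   # Assumes the /mnt/disks/<group>/<disk> and /mnt/disks/.meta/<group>/<disk>
   # layout shown above, with modules in slots 1 and 2 (adjust as needed).
   import os

   for group in (1, 2):
       for disk in range(8):
           for path in ('/mnt/disks/%d/%d' % (group, disk),
                        '/mnt/disks/.meta/%d/%d' % (group, disk)):
               status = 'mounted' if os.path.ismount(path) else 'NOT mounted'
               print('%-26s %s' % (path, status))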
Meta data files (see 2):
   /mnt/disks/.meta/[1-4]/[0-7]/group    diskpack volume serial number
   /mnt/disks/.meta/[1-4]/[0-7]/slist    scan list in JSON format

Scatter-gather files (see 3):
   /mnt/disks/[1-4]/[0-7]/filename

When scans are recorded across 16 disks ("2 diskpacks") in SG mode the
resulting set of data files is, for example:

   /mnt/disks/1/0:
   -rw-r--r-- 1 oper mark6 1.2G Oct 26 15:13 n14st02c_fila10_2014y299d06h12m40s.vdif
   -rw-r--r-- 1 oper mark6 1.3G Oct 26 15:09 n14st02c_fila10_2014y299d15h08m49s.vdif

   /mnt/disks/1/1:
   -rw-r--r-- 1 oper mark6 1.4G Oct 26 15:13 n14st02c_fila10_2014y299d06h12m40s.vdif
   -rw-r--r-- 1 oper mark6 1.3G Oct 26 15:09 n14st02c_fila10_2014y299d15h08m49s.vdif

   /mnt/disks/1/2:
   -rw-r--r-- 1 oper mark6 1.4G Oct 26 15:13 n14st02c_fila10_2014y299d06h12m40s.vdif
   -rw-r--r-- 1 oper mark6 1.3G Oct 26 15:09 n14st02c_fila10_2014y299d15h08m49s.vdif

   ...

   /mnt/disks/1/7:
   -rw-r--r-- 1 oper mark6 1.3G Oct 26 15:13 n14st02c_fila10_2014y299d06h12m40s.vdif
   -rw-r--r-- 1 oper mark6 1.4G Oct 26 15:09 n14st02c_fila10_2014y299d15h08m49s.vdif

   /mnt/disks/2/0:
   -rw-r--r-- 1 oper mark6 1.3G Oct 26 15:13 n14st02c_fila10_2014y299d06h12m40s.vdif
   -rw-r--r-- 1 oper mark6 1.4G Oct 26 15:09 n14st02c_fila10_2014y299d15h08m49s.vdif

   ...

   /mnt/disks/2/7:
   -rw-r--r-- 1 oper mark6 1.3G Oct 26 15:13 n14st02c_fila10_2014y299d06h12m40s.vdif
   -rw-r--r-- 1 oper mark6 1.3G Oct 26 15:09 n14st02c_fila10_2014y299d15h08m49s.vdif

In the above example two scans (n14st02c_fila10_2014y299d06h12m40s.vdif and
n14st02c_fila10_2014y299d15h08m49s.vdif) were recorded onto the two diskpacks.
The scatter files are about 1.3 GB in size each. The total size of one scan was
about 21 GB, i.e. 16 files x 1.3 GB/file.


2) The metadata files under /mnt/disks/.meta/[1-4]/[0-7]/ ('group', 'slist')

The 'group' meta data files contain a single line, without a newline. The
information reflects (hopefully) the volume serial number sticker on the
diskpack. When recording across two diskpacks (16 disks total) the metadata
might look like this:

   $ for ii in `seq 0 7`; do cat /mnt/disks/.meta/1/$ii/group; echo; done
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8

   $ for ii in `seq 0 7`; do cat /mnt/disks/.meta/2/$ii/group; echo; done
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   2:KVN00500/32000/4/8:KVN00600/32000/4/8
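The 'group' string appears to be colon-separated: a leading count of the modules in the group, followed by one field per module of the form VSN/... (KVN00500/32000/4/8 in the example above). The meaning of the numeric fields after the VSN is not documented here, so the following hedged sketch simply keeps them as strings; parse_group() is an illustrative helper, not part of the Mark6 software.

   #!/usr/bin/env python
   # Hedged sketch: split a Mark6 'group' metadata string such as
   #   2:KVN00500/32000/4/8:KVN00600/32000/4/8
   # into a module count and per-module fields. The numeric fields after the
   # VSN are kept uninterpreted since their exact meaning is not documented here.

   def parse_group(line):
       fields = line.strip().split(':')
       nmodules = int(fields[0])           # leading count of modules in the group
       modules = []
       for entry in fields[1:]:
           parts = entry.split('/')
           modules.append({'vsn': parts[0], 'extra': parts[1:]})
       return nmodules, modules

   if __name__ == '__main__':
       with open('/mnt/disks/.meta/1/0/group') as f:
           print(parse_group(f.read()))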
The 'slist' meta data file contains a list of the scans on the disks. Each disk
should contain a copy identical to the 'slist' files on the other disks of the
diskpack or disk group:

   $ for ii in `seq 0 7`; do md5sum /mnt/disks/.meta/1/$ii/slist; done
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/0/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/1/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/2/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/3/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/4/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/5/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/6/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/1/7/slist

   $ for ii in `seq 0 7`; do md5sum /mnt/disks/.meta/2/$ii/slist; done
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/0/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/1/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/2/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/3/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/4/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/5/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/6/slist
   ff872a296c46b64138041cb27b1197b1  /mnt/disks/.meta/2/7/slist

The 'slist' files are in JSON format, for example (with slight reformatting):

   { 1: {'status': 'recorded', 'num_str': 1, 'start_tm': 1410876577.706074,
         'create_time': '2014y259d14h09m38s', 'sn': 'wrtest_2nd_test_001',
         'dur': 100, 'spc': 0, 'size': '7.546'},
     2: {'status': 'recorded', 'size': '8.533', 'start_tm': 1410909599.106127,
         'create_time': '2014y259d23h20m00s', 'sn': 'wrtest_2nd_tvg_001',
         'dur': 100, 'spc': 0, 'num_str': 1},
     3: {'status': 'recorded', 'size': '0.000', 'start_tm': 1410938164.948671,
         'create_time': '2014y260d07h16m05s', 'sn': 'wrtest_2nd_cw_001',
         'dur': 100, 'spc': 0, 'num_str': 1},
     ...
     59: {'status': 'recorded', 'size': '15.226', 'start_tm': 1415584753.080439,
          'create_time': '2014y314d01h59m14s',
          'sn': 'wrtest_kjcc_test_2014y314d10h59m09s',
          'dur': 15, 'spc': 0, 'num_str': 1}}

We can look at the last scan ('wrtest_kjcc_test_2014y314d10h59m09s', 16 files
called "wrtest_kjcc_test_2014y314d10h59m09s.vdif"). The recording was started
manually and the file name part 2014y314d10h59m09s is not entirely correct; in
addition, the Mark6 system was in the KST time zone (Korean Standard Time). The
16 files can be combined into a readable VDIF file using Mark6 'gather' or the
'm6sg_gather' of this library package.

   JSON/slist: the create_time of the scan is 2014y314d01h59m14s
   VDIF file:  the first time stamp is MJD 56971/01:59:14.19
               (10 Nov 2014 / DOY 314), as set by the VLBI backend

The VDIF time stamp and the time stamp in the JSON-formatted metadata agree. If
one wanted to inspect the contents of a diskpack in, say, DiFX, and correlate a
full diskpack without explicitly specifying a list of individual scans, one
could *probably* use the JSON data to map different time ranges to scan names.
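The scan list lends itself to exactly this kind of time-range-to-scan-name mapping. Note that the quoting in the example above is Python-literal style (single quotes, integer keys) rather than strict JSON, so the hedged sketch below reads it with ast.literal_eval() instead of the json module; read_slist() and the default path are illustrative only.

   #!/usr/bin/env python
   # Minimal sketch: read one 'slist' file and print scan number, scan name,
   # creation time, duration and size. The example slist above uses Python-style
   # quoting rather than strict JSON, hence ast.literal_eval() instead of json.
   import ast

   def read_slist(fn='/mnt/disks/.meta/1/0/slist'):
       with open(fn, 'r') as f:
           return ast.literal_eval(f.read())

   if __name__ == '__main__':
       scans = read_slist()
       for num in sorted(scans.keys()):
           s = scans[num]
           print('%3d  %-40s  %s  %5s s  %7s GB' %
                 (num, s.get('sn'), s.get('create_time'), s.get('dur'), s.get('size')))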
3) The layout of data and headers in files of a SG mode file set

Each file in a SG file set (the files associated with the same scan) looks like
this:

   [file header]
   [block a header] [block a data (~10MB)]
   [block b header] [block b data (~10MB)]
   [block c header] [block c data (~10MB)]
   ...

The file header:

   sync_word      uint32   0xfeed6666
   version        uint32   version number of the SG file format
   block_size     uint32   nominal size of blocks, including the block header
   packet_format  uint32   0:vdif, 1:mark5, 2:unknown (not sure what this is
                           used for)
   packet_size    uint32   length of data packets (frames?) in bytes

The block header (version 1):

   blocknum       uint32   sequence number of the block, increasing and unique
                           within the file set of the scan

The block header (version 2):

   blocknum       uint32   sequence number of the block, increasing and unique
                           within the file set of the scan
   wb_size        uint32   size of this block including the block header; might
                           be shorter than the block_size of the file header,
                           e.g. the last block in a scan might be only
                           partially filled

The file and block headers describe mainly the packet size and the size of a
data block. In the Mark6 source code these blocks are sometimes referred to as
"cells" (?). The data section of each block is ensured to contain an integer
number of VLBI frames. In version 2 of the file format the blocks can differ in
size from one block to the next, perhaps to accommodate some kind of VLBI data
that yields a dynamically changing packet size.

The block numbers within a file are always (?) increasing. Consecutive blocks
in the same file have increasing block numbers that have "gaps" (e.g., a=0,
b=16, c=35, ...). The block numbers that are "missing" in one file (in this
example the missing blocks would be 1,2,3,...,15,17,18,...,34,...) are found in
one of the other files associated with the scan.

The Mark6 recording software is able to do on-the-fly conversion from Mark5B
into VDIF, and one may probably safely assume that the recorded data on the
disks are always VDIF.

During scatter-gather mode recording the Mark6 opens one file on each of the
disks. Network data are written to this set of open files. Because the 10GbE
recording in the Mark6 software is not done round-robin across these files, the
order of blocks across the files is somewhat random. In a 4-disk example, the
four files of one SG recording (one file on every disk) might contain:

   file 0    file 1    file 2    file 3
   ------------------------------------
   block 0   block 1   block 2   block 3
   block 4   block 7   block 5   block 6
   block 9   block 8   block 10  block 11
   ...

It is not clear from the Mark6 source code whether the following is also
possible, e.g., when the disk containing file 3 is very slow compared to the
other disks:

   file 0    file 1    file 2    file 3
   ------------------------------------
   block 0   block 1   block 2   block 3
   block 4   block 5   block 6   block 9
   block 7   block 8   block 10  block 13
   block 11  block 12  block 14  ...


4) Failure modes

Hard failures

   In RAID 0 mode most failures are probably catastrophic.

   In RAID 5 mode a disk failure probably requires the operator to insert a new
   disk and start the rebuild of the degraded array. At that time, or even when
   just reading data in degraded mode without starting the array rebuild,
   Murphy's law dictates that a second disk will fail, causing a catastrophic
   failure.

Soft failures

   Dead/unmounted disk in SG mode: regular pieces will be missing in the VLBI
   data. Equivalent to reading a VDIF file where some frames are missing.

   Missing file in the file set of some scan: equivalent to the above case.

   Disks attached but not mounted: just mount them normally.

Bizarre failures

   Scatter-gather file metadata corrupt, or XFS file systems partially corrupt
   and returning bad file data: probably needs manual intervention; one might
   try to rename the file in the file set that contains the bad metadata, or
   unmount that particular disk/file system.

   Mismatch between the /mnt/disks/.meta/[1-4]/[0-7]/group and ./slist JSON
   metadata and the actual contents of the XFS file systems on the
   scatter-gather disks: needs manual intervention? Or just ignore those
   metadata; they are largely just for convenience anyway?
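To tie sections 3 and 4 together, here is a hedged sketch of a gather-style reader: it parses the file header and block headers of every available file of a scan, then writes the block payloads out in increasing block-number order, silently tolerating block numbers that never appear (e.g. because a disk of the set is dead or unmounted, or a file is missing). Little-endian fields are assumed (x86 recorder), and index_blocks()/gather() are illustrative helpers; this is not the actual 'gather'/'m6sg_gather' implementation.

   #!/usr/bin/env python
   # Illustrative sketch (not the actual 'gather'/'m6sg_gather' code): index the
   # blocks of every file in one SG scan via the headers described in section 3,
   # then concatenate the block payloads in increasing block-number order.
   # Missing block numbers are simply skipped, mimicking the "soft failure"
   # behaviour described in section 4.
   import glob
   import struct
   import sys

   # file header: sync_word, version, block_size, packet_format, packet_size
   FILE_HDR = struct.Struct('<5I')

   def index_blocks(fn):
       """Return a list of (blocknum, payload offset, payload size) for one SG file."""
       blocks = []
       with open(fn, 'rb') as f:
           sync, ver, blksz, pktfmt, pktsz = FILE_HDR.unpack(f.read(FILE_HDR.size))
           if sync != 0xfeed6666:
               raise ValueError('%s: not an SG file (bad sync word)' % fn)
           while True:
               if ver == 1:                       # version 1: fixed-size blocks
                   hdr = f.read(4)
                   if len(hdr) < 4:
                       break
                   (blocknum,) = struct.unpack('<I', hdr)
                   payload = blksz - 4            # block_size includes the 4-byte header
               else:                              # assume version 2 block headers
                   hdr = f.read(8)
                   if len(hdr) < 8:
                       break
                   blocknum, wb_size = struct.unpack('<2I', hdr)
                   payload = wb_size - 8          # wb_size includes the 8-byte header
               blocks.append((blocknum, f.tell(), payload))
               f.seek(payload, 1)                 # skip payload, go to next block header
       return blocks

   def gather(scanname, outfile):
       files = sorted(glob.glob('/mnt/disks/[1-4]/[0-7]/%s' % scanname))
       index = []                                 # (blocknum, filename, offset, size)
       for fn in files:
           index += [(b, fn, off, size) for (b, off, size) in index_blocks(fn)]
       index.sort()                               # global block-number order; gaps tolerated
       handles = dict((fn, open(fn, 'rb')) for fn in files)
       with open(outfile, 'wb') as out:
           for blocknum, fn, off, size in index:
               handles[fn].seek(off)
               out.write(handles[fn].read(size))
       for f in handles.values():
           f.close()

   if __name__ == '__main__':
       gather(sys.argv[1], sys.argv[2])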