Notes on the developmental roadmap to vdifuse.
----------------------------------------------
(c) Massachusetts Institute of Technology, 2010..2023
(c) Geoffrey B. Crew, 2010..2023

Work began with the burst mode recorder (BMR) in May, 2010.  That
system worked well in tests, but a field deployment in science
observations was never funded (two units were needed and funds were
spent long before I showed up).  It did serve to validate the single
channel mode (Whitney et al., 2013) on the Westford-GGAO (Goddard, MD)
baseline.  That DBE was the prototype for the eventual R2DBE deployed
by the EHT at all single-antenna sites (Vertatschitsch et al., 2015).

Later, when it became clear that something more than a Mark5 was needed
for both VGOS and the EHTC, work began on a true successor, the Mark6.  The
initial version was developed by G. Crew and D. Lapsley and presented
at the TOW (May 2012) using the X-Cube storage system (M. Taveniku).
However, discussions with X-Cube broke down (money or IP, perhaps) late in
that year and the plan shifted (early 2013) to the current cplane/dplane
arrangement implemented by C. Ruszczyk and R. Cappallo.  The first version
used a RAID implementation on the modules, and later versions went through
two iterations of the scatter-gather (sg) plan.  To be honest, a third
version that used packet-sized headers for the packet blocks would have been
much easier to deal with, but that has never been implemented.  Components
of the burst mode system (grab and push) were used to develop support tools
for the Mark6.

One of the driving features of vdifuse was that "scan check" for the Mark5
was a somewhat challenging process as the data needed to be read off the
storage modules into a local file for examination...and that was a
sequential start.  (I.e. the file might be fine at the beginning and
then seriously corrupt later on in the recordings. Worse, the proprietary
nature of the Mark5 implementations meant that it was not possible to
recover in such cases.)  Early birthing pains with the sg system made it
clear that a complete examination of the fragments was necessary to be
sure that the recordings would be usable.  Thus random access to the sg
fragments was needed, and the sg_access.? library was born to be used in
the Mark6 "scan check" tool.  Unfortunately, resources were barely
available for the Mark6 cplane/dplane development (as is usual with such
things, the effort was grossly underestimated to meet the resources
available) and a production-ready plan for using the recordings was not
seriously addressed.  The developmental tools included
a dqa program that could check and then assemble flat recording files from
the sg fragments, but this requires substantial RAID space and is highly
inefficient.  This has ultimately worked well enough for VGOS with its
lower data rates, but for the EHTC (where money for the media was barely
available) it would be a non-starter.

Thus vdifuse was born to create a FUSE layer that could make the native
recordings on the Mark6 available as flat files.  Since DiFX already had
an interface for flat file recordings, the EHTC need was solved.  That
this sort of thing could be done had previously been proven for the Mark5;
however, the development here of vdifuse was ab initio.  As the EHTC case
required only a single thread, that case was worked out first.  And since
the block numbering of the sg format was in early cases suspect, the approach
taken was to fully validate the flow of data through the packet times (as is
done in DiFX once the data read has been processed).  It was also unclear what
all the thread use cases were going to be (and indeed, the plan in dplane
evolved at the same time so it was hard to code to a moving target).

So threads became a "later" thing.  They became urgent for the EHTC when NOEMA
appeared.  In the meantime, however, it seems the prejudice within the DiFX
community against a FUSE approach (which was not popular when vdifuse was
started) eventually yielded to acceptance, along with other efforts to support
Mark6 recordings "natively" in DiFX.  (I.e. develop infrastructure in DiFX
to support module access and automated access to the recordings as had
previously been done for the Mark5 at the VLBA and Bonn correlators.)

It is unclear whether all the potential use cases are addressed, and thus
it seems sensible to finish up the vdifuse application with proper thread
support and perhaps other features for the known use cases.

-----
Whitney, A. R., Beaudoin, C. J., Cappallo, R. J., Corey, B. E., Crew, G. B.,
    Doeleman, S. S., Lapsley, D. E., Hinton, A. A., McWhirter, S. R.,
    Niell, A. E., Rogers, A. E. E., Ruszczyk, C. A., Smythe, D. L.,
    SooHoo, J. and Titus, M. A.,  "Demonstration of a 16 Gbps Station
    Broadband-RF VLBI System", PASP 125, 196, 2013, 10.1086/669718,
    https://ui.adsabs.harvard.edu/abs/2013PASP..125..196W.
Vertatschitsch, L., Primiani, R., Young, A., Weintroub, J., Crew, G. B.,
    McWhirter, S. R., Beaudoin, C. J., Doeleman, S. S., and Blackburn, L.,
    "R2DBE: A Wideband Digital Backend for the Event Horizon Telescope",
    PASP 127, 1226, 2015, 10.1086/684513,
    https://ui.adsabs.harvard.edu/abs/2015PASP..127.1226V.

General FUSE:
    http://fuse.sourceforge.net/doxygen/fusexmp__fh_8c.html &c.
    man mount.fuse
    . user_allow_other is required in /etc/fuse.conf for non-private usage
    . with the default (async_read), the kernel read requests sometimes
      get out of order.

Error numbers (used by vdifuse):
    /usr/include/asm-generic/errno-base.h
    Currently return only these (not 100% sure about usage):
     2 ENOENT   No such file or directory
     5 EIO      I/O error
     9 EBADF    Bad file number
    14 EFAULT   Bad address
    29 ESPIPE   Illegal seek
    30 EROFS    Read-only file system
    No longer used:
     1 EPERM    Operation not permitted (EROFS is clearer)
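
    As a minimal sketch of how such values usually reach the caller: a FUSE
    callback reports failure by returning the negated errno value.  The two
    helpers below are hypothetical (they are not taken from the vdifuse
    sources) and the particular conditions mapped to each error are
    assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper, not from the vdifuse sources: validate a read
 * request against a read-only flat file of known size and return 0 or
 * a negated errno value, following the FUSE callback convention. */
int sketch_check_read(const void *handle, const char *buf,
                      off_t offset, off_t file_size)
{
    assert(file_size >= 0);
    if (handle == NULL)
        return -EBADF;      /* no open file behind the request */
    if (buf == NULL)
        return -EFAULT;     /* nowhere to put the data */
    if (offset < 0 || offset > file_size)
        return -ESPIPE;     /* seek outside the file (assumed mapping) */
    return 0;
}

/* Any attempt to modify the (read-only) recordings is refused. */
int sketch_check_write(void)
{
    return -EROFS;
}
```

    A caller would pass a nonzero result straight back to FUSE, which
    hands the positive errno value to the application.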

Wisdom:
    About I/O performance tuning:
        http://cromwell-intl.com/linux/performance-tuning/disks.html
    About xfs tuning:
        http://everything2.com/index.pl?node_id=1479435
    To clear out cached pages and memory for (repeat) benchmarking:
        free && sync && echo 3 > /proc/sys/vm/drop_caches && free
    See context-example.c for how to save and restore context.
    For improved performance (on sd?):
        echo 4096 > /sys/block/sd?/queue/read_ahead_kb
        home=/data-sk31/alma-apr2016 \
        mount=/data-sk31/alma-apr2016/rc48 prep-one-scan.sh 099
    (unclear if this is necessary on all kernels)
    Note that the Mark6 module disks are grouped by 4 {0..3} and {4..7}
        to share an e-SATA cable which limits each disk to 3 Gbps, so
        the read rate is capped at 11.2 Gbps or 1.4 GB/s.
    Several methods are coded to improve read performance.  The method is
        selected via the environment variable SG_ACCESS_ADVICE.  The default
        (2) uses the normal Linux kernel advice machinery on paging.  The
        POSIX version (3) works as well.  A version using p-threads (4) to
        access and deal with the page fault waits is also coded.  A version
        that does this with re-usable p-threads is coded, but buggy.
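
    The selection described above might look roughly like the following
    sketch.  Only the variable name SG_ACCESS_ADVICE and the values 2 and 3
    come from these notes; the function names, the choice of WILLNEED as
    the advice, and the fallback behaviour are assumptions and may differ
    from what sg_access.c actually does.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE     /* for madvise() under strict -std= modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Default per the notes above: 2 selects the Linux kernel advice path. */
#define SKETCH_ADVICE_DEFAULT 2

/* Read SG_ACCESS_ADVICE; fall back to the default when the variable is
 * unset, empty, or not a plain small non-negative integer. */
int sketch_advice_from_env(void)
{
    const char *val = getenv("SG_ACCESS_ADVICE");
    char *end;
    long n;

    if (val == NULL || *val == '\0')
        return SKETCH_ADVICE_DEFAULT;
    n = strtol(val, &end, 10);
    if (*end != '\0' || n < 0 || n > 99)    /* 99 is an arbitrary bound */
        return SKETCH_ADVICE_DEFAULT;
    return (int)n;
}

/* Apply read-ahead advice to an mmap()ed fragment.  The WILLNEED choice
 * is an assumption; only methods 2 and 3 are sketched here.
 * Returns 0 on success (or when no advice is given), nonzero on error. */
int sketch_advise(void *addr, size_t len, int method)
{
    assert(addr != NULL);
    switch (method) {
    case 2:
        return madvise(addr, len, MADV_WILLNEED);
    case 3:
        return posix_madvise(addr, len, POSIX_MADV_WILLNEED);
    default:
        return 0;           /* other methods are not covered */
    }
}
```

    The p-thread variants (4 and the re-usable version) are left out since
    these notes do not describe how they are structured.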

Nomenclature: since we have two types of threading (VDIF and CPU core)
    going on, the comments and variables use p-threads and v-threads
    (and variations) to distinguish the two usages.  With regard to VDIF:
    note that whether we have v-thread handling engaged or not, there is
    always at least one thread.  (It may be that the DBEs are not populating
    those bits, however.)
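
    To make the v-thread notion concrete, here is a small sketch that reads
    the thread ID out of VDIF headers.  The bit position (word 3, bits
    25..16) follows the VDIF header layout; the function names and the
    flat-array calling convention are hypothetical, and the header words
    are assumed to be in host byte order already.  Headers whose thread
    bits were never populated simply show up as v-thread 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VDIF header word 3: bits 25..16 hold the 10-bit thread ID. */
unsigned sketch_vdif_thread_id(const uint32_t *hdr)
{
    assert(hdr != NULL);
    return (hdr[3] >> 16) & 0x3FFu;
}

/* Count the distinct v-threads among n headers stored back to back in
 * 'words', each header occupying 'stride' 32-bit words (stride >= 4). */
unsigned sketch_count_vthreads(const uint32_t *words, size_t n,
                               size_t stride)
{
    unsigned char seen[1024] = { 0 };   /* one flag per possible ID */
    unsigned count = 0;
    size_t i;

    assert(words != NULL && stride >= 4);
    for (i = 0; i < n; i++) {
        unsigned id = sketch_vdif_thread_id(words + i * stride);
        if (!seen[id]) {
            seen[id] = 1;
            count++;
        }
    }
    return count;
}
```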

Pending changes to be checked:
    More than 6 epoch bits: the current VDIF epoch rolls over in 2032.
    The options are to indeed roll it over, or to co-opt the one or two
    reserved bits above it in the header.  A ROLLOVER define with if..else
    logic is now coded into these:  fix_the_file.?, push_vdif.c, sg_access.c,
    vdif_epochs.h, vdif.h and vdiftst.c.
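
    The arithmetic behind the rollover can be sketched as follows.  The
    epoch counts half-years since 2000-01-01 and sits in bits 29..24 of
    header word 1, with the two unassigned bits 31..30 directly above it.
    The function names here are hypothetical and the sketch does not
    reproduce the actual ROLLOVER logic in the files listed above; it only
    contrasts the two options.

```c
#include <assert.h>
#include <stdint.h>

/* Half-years since 2000-01-01: the unbounded reference epoch count. */
unsigned sketch_epoch_count(int year, int month)
{
    assert(year >= 2000 && month >= 1 && month <= 12);
    return (unsigned)(year - 2000) * 2u + (month >= 7 ? 1u : 0u);
}

/* Pack an epoch count into the top byte of word 1 (frame number 0). */
uint32_t sketch_pack_epoch(unsigned count)
{
    return (uint32_t)(count & 0xFFu) << 24;
}

/* Option 1: keep the 6-bit field (bits 29..24) and let it wrap. */
unsigned sketch_epoch_wrapped(uint32_t word1)
{
    return (word1 >> 24) & 0x3Fu;
}

/* Option 2: also use the two bits above it (31..30), giving 8 bits. */
unsigned sketch_epoch_extended(uint32_t word1)
{
    return (word1 >> 24) & 0xFFu;
}
```

    With 6 bits the last representable epoch is 63 (July 2031), so January
    2032 wraps to 0.  One extra bit would defer the wrap to 2064 and two
    extra bits to 2128.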

The current implementation was driven by EHT needs, but has been demonstrated
    to work efficiently on the VLBA, ALMA and NOEMA use cases.  Support
    for VGOS (which handles threads differently) is still a work in progress.

More complete documentation is pending.