Python ctypes Interface to mark5_access Jan Wagner April 14, 2015 0 Introduction ~~~~~~~~~~~~~~ This Python module provides an interface to the C/C++ mark5_access library and its VLBI data inspection and fast data decoding functions. The main intention of this Python interface is to allow one to prototype algorithms, and process raw VLBI data in numerical packages such as Numpy and Scipy that offer higher-level numerical computing. Note that there are also other options to use mark5_access from Python. These include SWIG or a hybrid language such as cython. The current interface ctypes-based due to the low complexity, and the one-way data path of mark5_access. 1 Preqrequisites ~~~~~~~~~~~~~~~~ 1.1 Prerequisites for using the Python wrapper ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The mark5_access Python interface makes use of a standard Python module called 'ctypes'. The ctypes project describes itself as "a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python." https://docs.python.org/library/ctypes.html You may install it with, for example, $ sudo aptitude install python-ctypes In your Python script that calls mark5_access, use import ctypes import mark5access as m5lib 1.2 Prerequisites for re-generating the Python wrapper ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In the event that the C/C++ interface of mark5_access has changed it is necessary to re-generate the Python wrapper code. This wrapper is based on code generators that convert C header files into ctypes interfaces. These generators are part of the ctypes library utilities, specifically, h2xml.py and xml2py.py. To install: $ sudo aptitude install python-ctypeslib Note that ctypeslib version 0.5.6 has a buggy implementation of regexp handling related to command line option "-r ". This option is used to extract C library function names. In that version, a patch may need to be applied to the ctypeslib file 'codegenerator.py'. The patch is included here under ./patch/codegenerator.patch 2 Compiling and installing the mark5_access Python wrapper ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ First, compile and install the mark5access library. Next, make sure you patched ctypeslib file codegenerator.py if necessary, see above. Compile and install the mark5access Python wrapper by $ make # generates the ctypes interface $ make check # checks if the generated file looks ok $ sudo make install # installs the Python binding 3 Calling mark5access from Python ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Import the mark5access library Python interface via import ctypes import mark5access as m5lib The mark5access functions have the same names and parameters as you would expect from the C/C++ library interface. Please refer to the "DeveloperGuide" and "UserGuide" documents for details on the mark5_access library. Usage in Python should be relatively straightforward. You might, however, want to consult the examples in the ctypes documentation at https://docs.python.org/library/ctypes.html, or see an online "ctypes tutorial". 3.1 Some peculiarities worth pointing out ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1) mark5access library function arguments in Python will have 'ctypes' data types, such as ctypes.c_int. 2) buffers passed between Python and the mark5access library are often declared as type ctypes.POINTER() 3) the contents of mark5access C data structures can be accessed via '.contents.' 3.2 Basic usage ~~~~~~~~~~~~~~~ An existing file on disk can be opened as follows try: m5file = m5lib.new_mark5_stream_file(filename, ctypes.c_longlong(offset)) m5fmt = m5lib.new_mark5_format_generic_from_string(fileformat) ms = m5lib.new_mark5_stream_absorb(m5file, m5fmt) except: print ("Error: problem opening or decoding %s\n" % (fn)); return EXIT_FAILURE The array to be filled with new sample data can be allocated as follows FLT32ARR = ctypes.c_float*nsamples PFLT32ARR = ctypes.POINTER(ctypes.c_float)*ms.contents.nchan pdata = PFLT32ARR() for i in range(ms.contents.nchan): pdata[i] = FLT32ARR() or alternatively by using a helper function in mark5access.helpers pdata = mark5access.helpers.make_decoder_array(ms, nsamples) New data can then be read with m5lib.mark5_stream_decode(ms, nsamples, pdata) To process decoded raw data of a particular channel in NumPy/SciPy/stats, chdata = pdata[channel_nr][0:nsamples] m = numpy.mean(chdata) Alternatively, to access the same data but as a numpy array, chdata_p = ctypes.cast(pdata[channel_nr], ctypes.POINTER(ctypes.c_float*nsamples)) chdata = numpy.frombuffer(chdata_p.contents, dtype='float32') 3.3 Helper functions in mark5access.helpers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The sub-module 'mark5access.helpers' offers some additional functions. They wrap and pythonize certain mark5access library functions that store their return values in arguments passed by pointers. 3.3.1 mark5access.helpers.get_frame_time(ms) Returns the timestamp of the most recently read data frame as a tuple. Example: (mjd,sec,ns) = mark5access.helpers.get_frame_time(ms) 3.3.2 mark5access.helpers.get_sample_time(ms) Returns the timestamp of the next sample to be decoded as a tuple. Example: (mjd,sec,ns) = mark5access.helpers.get_sample_time(ms) 3.3.3 mark5access.helpers.count_high_states(ms, nsamples) Returns the high-state counts of the next 'nsample' samples in each recorded channel. Example: counts = mark5access.helpers.count_high_states(ms, 8192) 3.3.4 mark5access.helpers.make_decoder_array(ms, nsamples, [dtype=ctypes data type]) Returns a 'nchan x nsamples' array that can be passed to one of the mark5_stream_decode variants. The default data type of ctypes.c_float can be used in conjunction with mark5_stream_decode(), while ctypes.c_double is required for mark5_stream_decode_double(). Example: pdata = mark5access.helpers.make_decoder_array(ms, 8192) 3.3.5 mark5access.helpers.get_VDIF_time_from_MJD(mjd, second) Returns a tuple (refep,refsec) containing the VDIF Reference Epoch and VDIF Seconds since Reference Epoch that correspond to the given MJD day and second-of-day. 3.3.6 mark5access.writers.VDIFEncapsulator class A basic VDIF encapsulator with a file-like interface. The values in VDIF frame headers can be specified with a mark5access-like format string (e.g., VDIF_8192-1024-1-32). User provided binary data (Python strings) write()'n to the class are encapsulated into VDIF frames. Frame numbers and timestamps are advanced automatically. The class assists in writing data that has been decoded via mark5access read functionality and processed in some way in the user application, e.g., filtered or upsampled. The class does not contain sample quantization conversions, this is up to the user application. 3.4 Example programs ~~~~~~~~~~~~~~~~~~~~ A few example Python scripts can be found under ./examples/ m5stat.py : A data statistics checker for raw VLBI data. Has the same command line arguments as the C program 'm5bstate'. It performs a similar statistical analysis of the sample data distribution (histogram) as m5bstate. In addition, it reports other moments such as standard deviation, skewness, and kurtosis that especially for >2-bit data allow a better check of the signal Gaussianity in each recorded channel. Example output for a VDIF_8192-8192-4-2 format file with poor thresholds: Ch mean std skew kurt gfact state counts from -HiMag to +HiMag 0 +0.01 +1.08 -0.40 +2.19 +2.62 1.5 / 46.3 / 52.1 / 0.1 1 -0.11 +1.29 -0.53 +3.10 +1.96 5.7 / 44.4 / 48.9 / 1.0 2 +0.02 +1.16 -0.26 +2.86 +2.29 2.2 / 45.7 / 51.0 / 1.2 3 +0.03 +1.07 -0.23 +2.13 +2.65 1.0 / 46.9 / 51.7 / 0.4 based on 81920 samples (0 invalid) m5time.py : Reports the first timestamp of a raw VLBI recording. In addition, it checks a few subsequent data frames in an attempt to verify the validity of the first timestamp and that the specified data format was indeed correct for the file. This check helps in particular with formats like VDIF that, unlike Mark5B, do not contain well identified magic bytes or "sync words" in their headers. m5subband.py : Extracts a narrow subband using Fourier filtering methods. Performs DFT-based filtering. A specified channel in the raw VLBI recording is Fourier transformed and the desired bins of the transform are copied and inverse Fourier transformed to produce a time-domain narrow-band output signal. In order to improve the filter steepness and reduce sidelobes, the narrowband output signal is generated using windowed overlapped (I)DFTs, with zero padding, and phase coherent overlap-add. Overlap factors of 2 (50%), 4 (75%), or more can be used. The bandwidth reduction ratio is quite freely selectable. For example, m5subband.py can output the time domain signal from a 64 kHz wide band around, say, a maser line that is found in a 512 MHz wide recording. The narrow band output signal can be processed in other tools.