Hi Geoff,

1- I modified the "architecture.h.in" file and have attached it to this email. You need to replace the original file with it, disable the "ipp_enabled" variable in the "configure.ac" file, and re-compile the code. (I re-compiled it once more this morning, and it compiled without error on a fresh Linux system.)

Basically, in this file I have asked CUDA cuFFT to take control of the fftw3 operations and do everything automatically by changing a header file. The benchmarks in my presentation are based on fully automatic operation (data transfer, memory allocation, device selection and more). In this case, there is an excessive amount of data transfer between the host's memory and the device's memory. So, although my experiments showed an improvement, I believe the improvement can become much more significant by optimizing the code.

There are many inline functions like the one below, as an example:

inline vecStatus genericAdd_32f_I(const f32 *src, f32 *srcdest, int length)
{
  for(int i=0;i

From: Geoff Crew
Sent: Tuesday, October 17, 2017 10:50:06 AM
To: Arash Roshanineshat
Cc: Jonathan Weintroub; Adam Deller
Subject: GPU acceleration of DiFX

Hi Aresh,

I'm at the DiFX meeting being held this week in Bologna, Italy, and the topic of GPU acceleration of DiFX came up. I shared your work from this past summer with Jonathan on using cuFFTW to speed up DiFX, along with the benchmarking (from the presentation you made at CASPER, which Jonathan pointed me at). They are *very* interested in following this up. So:

(a) Are the changes to the DiFX source available somewhere that we can scoop up, incorporate into DiFX, and try on other GPU-enabled architectures?

(b) What are your plans? Are you interested in continuing with this topic?

(I've CC'd Adam Deller, who can speak for the DiFX team.)

--
Geoff Crew (gbc@haystack.mit.edu)
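
Note on the inline function quoted in the reply above: its body is cut off in this copy of the message. The following is only a minimal sketch of what an in-place element-wise add with that signature might look like, going by the function name alone. The typedefs for f32 and vecStatus, the vecNoErr constant, and the loop body are all assumptions made for illustration, not the contents of the attached "architecture.h.in". The function is renamed sketchAdd_32f_I to make clear it is not the original.

```cpp
// Assumed stand-ins for types that architecture.h.in would define;
// the real definitions are not shown in this message.
typedef float f32;
typedef int vecStatus;
const vecStatus vecNoErr = 0;  // hypothetical "no error" status code

// Hypothetical in-place add: srcdest[i] += src[i] for i in [0, length).
// The loop body is a guess based on the function name only.
inline vecStatus sketchAdd_32f_I(const f32 *src, f32 *srcdest, int length)
{
  for (int i = 0; i < length; i++)
    srcdest[i] += src[i];
  return vecNoErr;
}
```

A loop of this kind does very little arithmetic per element, so if each call triggered its own copy between host and device memory, the transfer cost would likely outweigh the computation. That is consistent with the point in the reply about excessive data movement.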