Radix2 Fast Fourier Transform implemented in C++
Note: If you don't know much about Fourier transform algorithms, a simple review of whether I am doing anything inefficient with C++ in general would be appreciated.
I've been working on implementing an efficient Radix2 Fast Fourier Transform in C++ and I seem to have hit a roadblock. I have optimized it in every possible way I can think of and it is very fast, but when comparing it to the Numpy FFT in Python it is still significantly slower. Note that my FFT is not done in-place, but neither is the Python implementation so I should be able to achieve at least the same efficiency as Numpy.
I've already taken advantage of symmetry when the input is a real signal which allows the use of an N/2 point FFT for an N-length real signal, and I also pre-compute all of the twiddle factors and optimize the twiddle factor calculation so redundant twiddle factors are not re-calculated.
The code:
#define _USE_MATH_DEFINES // for M_PI on MSVC
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// To demonstrate runtime
#include <chrono>
#include <iostream>

static std::vector<std::complex<double>> FFTRadix2(const std::vector<std::complex<double>>& x, const std::vector<std::complex<double>>& W);
static bool IsPowerOf2(size_t x);
static size_t ReverseBits(const size_t x, const size_t n);

std::vector<std::complex<double>> FFT(const std::vector<double>& x)
{
    size_t N = x.size();

    // Radix2 FFT requires length of the input signal to be a power of 2
    // TODO: Implement other algorithms for when N is not a power of 2
    assert(IsPowerOf2(N));

    // Taking advantage of symmetry the FFT of a real signal can be computed
    // using a single N/2-point complex FFT. Split the input signal into its
    // even and odd components and load the data into a single complex vector.
    std::vector<std::complex<double>> x_p(N / 2);
    for (size_t n = 0; n < N / 2; ++n)
    {
        // x_p[n] = x[2n] + jx[2n + 1]
        x_p[n] = std::complex<double>(x[2 * n], x[2 * n + 1]);
    }

    // Pre-calculate twiddle factors
    std::vector<std::complex<double>> W(N / 2);
    std::vector<std::complex<double>> W_p(N / 4);
    for (size_t k = 0; k < N / 2; ++k)
    {
        W[k] = std::polar(1.0, -2 * M_PI * k / N);

        // The N/2-point complex DFT uses only the even twiddle factors
        if (k % 2 == 0)
        {
            W_p[k / 2] = W[k];
        }
    }

    // Perform the N/2-point complex FFT
    std::vector<std::complex<double>> X_p = FFTRadix2(x_p, W_p);

    // Extract the N-point FFT of the real signal from the results
    std::vector<std::complex<double>> X(N);
    X[0] = X_p[0].real() + X_p[0].imag();
    for (size_t k = 1; k < N / 2; ++k)
    {
        // Extract the FFT of the even components
        auto A = std::complex<double>(
            (X_p[k].real() + X_p[N / 2 - k].real()) / 2,
            (X_p[k].imag() - X_p[N / 2 - k].imag()) / 2);

        // Extract the FFT of the odd components
        auto B = std::complex<double>(
            (X_p[N / 2 - k].imag() + X_p[k].imag()) / 2,
            (X_p[N / 2 - k].real() - X_p[k].real()) / 2);

        // Sum the results and take advantage of symmetry
        X[k] = A + W[k] * B;
        X[k + N / 2] = A - W[k] * B;
    }

    return X;
}

std::vector<std::complex<double>> FFT(const std::vector<std::complex<double>>& x)
{
    size_t N = x.size();

    // Radix2 FFT requires length of the input signal to be a power of 2
    // TODO: Implement other algorithms for when N is not a power of 2
    assert(IsPowerOf2(N));

    // Pre-calculate twiddle factors
    std::vector<std::complex<double>> W(N / 2);
    for (size_t k = 0; k < N / 2; ++k)
    {
        W[k] = std::polar(1.0, -2 * M_PI * k / N);
    }

    return FFTRadix2(x, W);
}

static std::vector<std::complex<double>> FFTRadix2(const std::vector<std::complex<double>>& x, const std::vector<std::complex<double>>& W)
{
    size_t N = x.size();

    // Radix2 FFT requires length of the input signal to be a power of 2
    assert(IsPowerOf2(N));

    // Calculate how many stages the FFT must compute
    size_t stages = static_cast<size_t>(log2(N));

    // Pre-load the output vector with the input data using bit-reversed indexes
    std::vector<std::complex<double>> X(N);
    for (size_t n = 0; n < N; ++n)
    {
        X[n] = x[ReverseBits(n, stages)];
    }

    // Calculate the FFT one stage at a time and sum the results
    for (size_t stage = 1; stage <= stages; ++stage)
    {
        size_t N_stage = static_cast<size_t>(std::pow(2, stage));
        size_t W_offset = static_cast<size_t>(std::pow(2, stages - stage));
        for (size_t k = 0; k < N; k += N_stage)
        {
            for (size_t n = 0; n < N_stage / 2; ++n)
            {
                auto tmp = X[k + n];
                X[k + n] = tmp + W[n * W_offset] * X[k + n + N_stage / 2];
                X[k + n + N_stage / 2] = tmp - W[n * W_offset] * X[k + n + N_stage / 2];
            }
        }
    }

    return X;
}

// Returns true if x is a power of 2
static bool IsPowerOf2(size_t x)
{
    return x && (!(x & (x - 1)));
}

// Given x composed of n bits, returns x with the bits reversed
static size_t ReverseBits(const size_t x, const size_t n)
{
    size_t xReversed = 0;
    for (size_t i = 0; i < n; ++i)
    {
        xReversed = (xReversed << 1) | ((x >> i) & 1U);
    }
    return xReversed;
}

int main()
{
    size_t N = 16777216;
    std::vector<double> x(N);

    int f_s = 8000;
    double t_s = 1.0 / f_s;
    for (size_t n = 0; n < N; ++n)
    {
        x[n] = std::sin(2 * M_PI * 1000 * n * t_s)
            + 0.5 * std::sin(2 * M_PI * 2000 * n * t_s + 3 * M_PI / 4);
    }

    auto start = std::chrono::high_resolution_clock::now();
    auto X = FFT(x);
    auto stop = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << duration.count() << std::endl;
}
Output (running a few times and averaging):
3671677
This was compiled in Visual Studio 2019 in Release mode with the /O2, /Oi, and /Ot optimization compiler flags to try to squeeze as much speed as possible out of it.
A comparable snippet of Python code that uses the Numpy FFT is shown below:
import numpy as np
import datetime
N = 16777216
f_s = 8000.0
t_s = 1/f_s
t = np.arange(0, N*t_s, t_s)
y = np.sin(2*np.pi*1000*t) + 0.5*np.sin(2*np.pi*2000*t + 3*np.pi/4)
start = datetime.datetime.now()
Y = np.fft.fft(y)
stop = datetime.datetime.now()
duration = stop - start
print(duration.total_seconds()*1e6)
Output (running a few times and averaging):
2100411.0
As you can see, the Python implementation is still faster by about 43%, but I can't think of any ways my implementation can be improved.
From what I understand, the Numpy version is actually implemented in C underneath, so I'm not terribly disappointed in the performance of my own code, but it still leaves me wondering: what am I missing that I could do better?
c++ performance algorithm signal-processing complex-numbers
Can you give a hint about the C++ standard used (11, 14, 17)? Can you use additional libraries such as gsl or abseil?
– miscco, 7 hours ago

@miscco, C++11 is currently used, but C++17 is available. Additional libraries are not out of the question; if they are available through the Conan package manager, that would be a huge plus. It is also important that it stays cross-platform compatible, so OS-specific libraries are a no-go.
– tjwrona1992, 7 hours ago
asked 8 hours ago by tjwrona1992, edited 7 hours ago by 200_success
2 Answers
Putting this through the built-in profiler reveals some hot spots. Perhaps surprisingly: ReverseBits. It's not the biggest thing in the list, but it is more significant than it should be.
You could use one of the many alternate ways to implement ReverseBits, or generate the sequence of bit-reversed indexes directly (which does not require reversing each index), or apply the overall bit-reversal permutation (which does not require bit reversals at all).
For example, here is a way to compute the sequence of bit-reversed indexes without explicitly reversing any index:
for (size_t n = 0, rev = 0; n < N; ++n)
{
    X[n] = x[rev];
    size_t change = n ^ (n + 1);
    rev ^= change << (__lzcnt64(change) - (64 - stages));
}
On my PC, that reduces the time from around 2.8 million microseconds to 2.3 million microseconds.
This trick works by using that the XOR between adjacent indexes is a mask of ones up to and including the least significant zero (the +1 borrows through the least significant set bits and into that least significant zero), which has a form that can be reversed by just shifting it. The reversed mask is then the XOR between adjacent reversed indexes, so applying it to the current reversed index with XOR increments it.
__lzcnt64 is the MSVC intrinsic; you could use preprocessor tricks to select the right intrinsic for the current compiler. The leading-zero count can also be avoided by using std::bitset and its count method:
size_t change = n ^ (n + 1);
std::bitset<64> bits(~change);
rev ^= change << (bits.count() - (64 - stages));
count is recognized by GCC and Clang as an intrinsic for popcnt, but apparently not by MSVC, so it is not reliable for high-performance scenarios.
Secondly, there is a repeated expression: W[n * W_offset] * X[k + n + N_stage / 2]. The compiler is often relied on to remove such duplication, but here it didn't happen. Factoring that out reduced the time to under 2 million microseconds.
$endgroup$
– tjwrona1992 (7 hours ago): Is there a way that doesn't involve preprocessor tricks to keep this cross-platform compatible? I'd prefer to avoid the preprocessor, although it's not completely out of the question.
– harold (6 hours ago): @tjwrona1992 I added an alternative. You could try one of the other permutation algorithms to avoid this problem altogether.
– tjwrona1992 (6 hours ago): Thanks, this all looks like very useful advice! I'll try to take advantage of some of it and let you know how it goes. Out of curiosity, how did you know the compiler didn't factor out the duplicated code?
– harold (6 hours ago): @tjwrona1992 By proof-reading the assembly code, painful as it was.
– tjwrona1992 (6 hours ago): Ouch... hahaha. You, sir, are a boss.
You should be able to work in place by exploiting the fact that std::complex has a well-defined layout: you can treat an array of std::complex<double> as an array of (twice as many) doubles, and vice versa.

    std::vector<std::complex<double>> a(10);
    double *b = reinterpret_cast<double *>(a.data());

EDIT: To be clearer, I would write

    span<std::complex<double>> x_p(reinterpret_cast<std::complex<double>*>(x.data()), x.size() / 2);

This works in both directions. To get safe and modern access you should use a span object. Unfortunately std::span is only available in C++20, so you should either write your own (which is a nice exercise) or have a look at absl::Span or gsl::span. The code to implement one is rather minimal. With that, you can remove two copies from your code.
– tjwrona1992 (6 hours ago): Are you saying replace vector with span? I haven't used span before. I'll look into it!
– miscco (6 hours ago): No, a span is non-owning. But rather than copying the data into x_p, you should create a span that reinterprets the data in x as an array of std::complex.
– tjwrona1992 (1 hour ago): Ahh, I see. Maybe I will implement another one that does it in place like that. But I would also like to keep the option to do it out of place as well.
– miscco (7 hours ago): Can you give a hint about the C++ standard used (11, 14, 17)? Can you use additional libraries such as gsl or abseil?