
Fourier Feature Network (FFN) vs. Sinusoidal Representation Networks (Siren)

Moreover, can we combine them?

Abstract

Many recent works use multi-layer perceptrons (MLPs) as implicit data representations. The underlying data can be an image, a 3D structure, an analog signal, etc. However, so far there has been no quantitative comparison between the different methods.

In this blog post, we compare the performance of the Fourier Feature Network (FFN) and the Siren MLP, proposed in the paper Implicit Neural Representations with Periodic Activation Functions, in terms of image-fitting accuracy.

Finally, we propose a combined version, FourierFeatureSiren (FFSiren), which brings together the advantages of both networks. We show that it outperforms both existing models on this image-fitting task.

The Siren implementation is copied from the author's original implementation.
The FFN implementation is a PyTorch port of the author's original JAX implementation.

Image Fitting Comparison

Task: Given the pixel coordinate (x, y) of the input camera_man image (grayscale, 256x256), predict the intensity of that pixel.
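To make the task concrete, here is a minimal NumPy sketch of how an image can be turned into (coordinate, intensity) training pairs. The function name `make_pixel_dataset` and the normalization of coordinates to [-1, 1] are assumptions for illustration; the post does not state how the coordinates are scaled. A random array stands in for the camera_man image.

```python
import numpy as np

def make_pixel_dataset(image):
    """Turn an (H, W) grayscale image into (coords, targets) pairs."""
    h, w = image.shape
    # Assumption: coordinates are normalized to [-1, 1] along each axis.
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    # One row per pixel, in row-major order: (x, y) -> intensity.
    coords = np.stack([grid_x.ravel(), grid_y.ravel()], axis=-1)  # (H*W, 2)
    targets = image.reshape(-1, 1).astype(np.float64)             # (H*W, 1)
    return coords, targets

# Stand-in for the 256x256 camera_man image (random values, for shape checks only).
image = np.random.default_rng(0).random((256, 256))
coords, targets = make_pixel_dataset(image)
print(coords.shape, targets.shape)  # (65536, 2) (65536, 1)
```

Each model then learns a function from a row of `coords` to the matching row of `targets`.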

For a fair comparison, all three models have ~0.263 million parameters and are trained for 500 steps with the Adam optimizer at a learning rate of 1e-4.
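Matching the parameter budget across models comes down to counting weights and biases per layer. The helper below is a small sketch of that arithmetic; `count_mlp_params` and the layer widths are hypothetical and chosen only for illustration, not taken from the models used in this post.

```python
def count_mlp_params(widths):
    """Number of weights and biases in a fully connected MLP with the given layer widths."""
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(widths[:-1], widths[1:]))

# Hypothetical layout: 2 inputs, five hidden layers of width 256, 1 output.
widths = [2, 256, 256, 256, 256, 256, 1]
print(count_mlp_params(widths))  # 264193
```

Adjusting the hidden width or depth of each model until the counts roughly agree is one way to keep a comparison like this one fair.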

At first glance, the final results look equally good. But when we zoom in on the images, we can see that FFN suffers from noisy pixels and Siren's output is not sharp enough. Only the output of FourierFeatureSiren shows neither of these drawbacks.

FourierFeatureSiren

FourierFeatureSiren (FFSiren) is simply a combination of the two. It embeds the input using the Fourier feature mapping and feeds the embedding into a Siren network with sinusoidal activation functions. In terms of image fitting, FFSiren outperforms both existing networks: its pixel MSE is ~20 times lower than FFN's, yet it keeps Siren's fast convergence during training and low output noise.
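The combination described above can be sketched as a forward pass in NumPy. This is an illustration under stated assumptions, not the implementation used for the results: the mapping size (128), the Gaussian scale (10), the hidden width (256), the depth, w0 = 30, zero biases, and the use of the Siren hidden-layer initialization bound for every layer are all assumed values, and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(coords, B):
    """Map (N, 2) coordinates to (N, 2*m) features [cos(2*pi*B v), sin(2*pi*B v)]."""
    proj = 2.0 * np.pi * coords @ B.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

def init_sine_layer(fan_in, fan_out, w0=30.0):
    """Uniform init with the Siren hidden-layer bound sqrt(6 / fan_in) / w0."""
    bound = np.sqrt(6.0 / fan_in) / w0
    W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
    b = np.zeros(fan_out)  # simplification: zero biases
    return W, b

def ffsiren_forward(coords, B, layers, w0=30.0):
    """Fourier feature embedding followed by sine-activated layers and a linear output."""
    h = fourier_features(coords, B)
    for W, b in layers[:-1]:
        h = np.sin(w0 * (h @ W + b))
    W, b = layers[-1]
    return h @ W + b

# Assumed hyperparameters (not from the post).
B = rng.normal(0.0, 10.0, size=(128, 2))   # fixed Gaussian random matrix
widths = [2 * 128, 256, 256, 1]
layers = [init_sine_layer(i, o) for i, o in zip(widths[:-1], widths[1:])]

coords = rng.uniform(-1.0, 1.0, size=(4, 2))
out = ffsiren_forward(coords, B, layers)
print(out.shape)  # (4, 1)
```

In a trainable version, only the layer weights and biases would be optimized while `B` stays fixed, which matches the FFN setup of a random, non-trained mapping matrix.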

FFSiren model architecture:

Side note: I also tried making the Gaussian random matrix B from FFN trainable, but it did not improve performance.

Results

Model	PSNR (dB)	Pixel MSE	Gradient MSE
Siren	36.80	0.0008	33.28
FFN	38.43	0.00057	48.12
FFSiren	51.83	0.0000262	18.12

PSNR: higher is better; MSE: lower is better.
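For reference, PSNR and MSE are related by PSNR = 10 * log10(R^2 / MSE), where R is the range of the pixel values. The sketch below uses only the standard library. The value R = 2 (pixels normalized to [-1, 1]) is an assumption, since the post does not state the normalization, although with that choice the rounded MSE values in the table give PSNR values close to the reported ones.

```python
import math

def mean_squared_error(pred, target):
    """Mean squared error between two equally long sequences of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def psnr(mse, data_range=1.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    return 10.0 * math.log10(data_range ** 2 / mse)

# Assumption: pixel values lie in [-1, 1], so data_range = 2.
print(round(psnr(0.00057, data_range=2.0), 2))    # FFN row, roughly 38.5 dB
print(round(psnr(0.0000262, data_range=2.0), 2))  # FFSiren row, roughly 51.8 dB
```

Because PSNR is logarithmic in the MSE, a 10x reduction in MSE adds 10 dB regardless of the assumed range.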

Peak signal-to-noise ratio:

Mean squared error:

Comparison of output images after applying a Sobel kernel
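One way to compute a gradient error from Sobel-filtered images is sketched below in NumPy. Whether the Gradient MSE column in the results was computed exactly like this (kernel normalization, border handling, pixel scale) is not stated in the post, so treat the function names and details here as assumptions.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """'Valid' 3x3 cross-correlation implemented with shifted slices."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * image[i:i + h - 2, j:j + w - 2]
    return out

def sobel_gradient(image):
    """Return the horizontal and vertical Sobel responses, stacked on the last axis."""
    return np.stack([filter3x3(image, SOBEL_X), filter3x3(image, SOBEL_Y)], axis=-1)

def gradient_mse(pred, target):
    """MSE between the Sobel gradients of a predicted image and a reference image."""
    return float(np.mean((sobel_gradient(pred) - sobel_gradient(target)) ** 2))

# Sanity check on a horizontal ramp: the x-response is constant, the y-response is zero.
ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
print(sobel_gradient(ramp)[0, 0])  # [8. 0.]
```

A noisy output inflates this metric even when the pixel MSE is low, which is why it is a useful complement to PSNR when comparing the three models.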

Conclusions

Model	Pros	Cons
Siren	Converges very fast during training; its output, gradient, and Laplacian are all smooth	The output image is not sharp enough; high-frequency signals are not recovered
Fourier Feature Network (FFN)	Reaches a higher final PSNR than Siren; the image is sharper than Siren's	Converges the slowest during training; also introduces significantly more noise in the final image
Fourier Feature Siren (FFSiren)	Converges as fast as Siren during training; has the highest PSNR and much less noise than FFN	Training loss has slight spikes, which look large in the PSNR plot (due to the log operation)