Savitzky–Golay Filter: 60 Years 🎂 of Smoothing Noisy Data
Sixty years ago, in July of 1964, Abraham Savitzky and Marcel J. E. Golay published a seminal paper in Analytical Chemistry where they introduced a technique designed to smooth out noise in spectroscopic data and enhance the signal without significantly distorting the underlying trends. The Savitzky-Golay filter was one of the first signal processing techniques that I’ve learned in grad school, and the simplicity and elegance of this method has captured my heart ever since. As we approach the 60th anniversary of this influential technique, let’s explore its development, applications, and the enduring impact on science.
Oh, and we are also going to implement it in Excel VBA.
The Origins
The Savitzky-Golay filter arose from the need to correct for distortions in spectroscopic data, a common issue in analytical chemistry where precise data is crucial for accurate analysis. The genius of Savitzky and Golay was in using polynomial least squares fitting to a set of consecutive data points. This method applies a convolution operation using polynomial coefficients that are pre-computed, making the process computationally efficient.
What sets the Savitzky-Golay filter apart from other smoothing techniques is its ability to preserve important data features such as relative maxima, minima, and width, which are crucial in interpretative and quantitative analysis. Its design allows it to smooth out noise while maintaining the shape and features of the underlying signal, a crucial advantage when accuracy and precision are paramount.
The Math
The Savitzky-Golay filter works by fitting successive subsets of adjacent data points with a low-degree polynomial by the method of linear least squares. When fitting each subset, the central point is then replaced with the value evaluated from the polynomial equation. This process is repeated across the data set. This method increases the accuracy of the fit and, therefore, of the data smoothing, and it tends to preserve features of the distribution that are usually ‘flattened’ by other types of smoothing filters.
The key mathematical concept behind the Savitzky-Golay filter is the fitting of a polynomial to a subset of the data. For a set of data points \((x_i, y_i)\), where \(i\) ranges from \(1\) to \(n\), a polynomial of degree \(k\) (usually \(k \leq 4\) for most applications) is fitted to each subset of \(m\) points (also known as the “running window”), where \(m\) is odd so that it is symmetric around the central point.
The least squares method is used to find the polynomial coefficients \(a_0, a_1, \dots, a_k\) that minimize the sum of the squared residuals:
$$ \sum_{i=1}^m [y_i - f(x_i)]^2 $$
where \(f(x_i) = a_0 + a_1x_i + a_2x_i^2 + \dots + a_kx_i^k\).
The output of the Savitzky-Golay filter is a linear combination of the input data, which means the smoothed value \(y’_i \) at each point \( i \) can be represented as a convolution of the input data \( y_i \) with a set of coefficients \( c_j \):
$$ y’_i = \sum_{j=-\frac{m-1}{2}}^{\frac{m-1}{2}} c_j y_{i+j} $$
The coefficients \(c_j\) depend only on the relative position within the window of \(m\) points and are derived from the polynomial fit. They are computed such that they represent the convolution kernel for the least squares fitting of a polynomial of degree \(k\).
The Savitzky-Golay filter can also be used to calculate derivatives of the data by modifying the convolution coefficients to represent the derivative of the polynomial rather than the polynomial itself. This is particularly useful for calculating the first or second derivative of a data set without amplifying the noise significantly, as would be typical with simpler numerical differentiation methods like finite differences.
Implementing in VBA
The Savitzky-Golay filter was one of the first signal processing techniques I have implemented on my own in grad school, when I needed a good way to smooth data while analyzing it in Excel. It was a quick User-defined function (UDF) implementation in VBA. Let me bring you this bug-filled implementation in its full glory!
Option Explicit
Function sovgol (rrange As Range, points As Integer, Optional derOrder As Integer = 0) As Variant
' An array function that smoothes a range of equally spaced data using Savitzky-Golay method
'Description of Arguments:
'rrange = Contiguous range of data to be smoothed
'points = Number of points for a running filter (must be an odd number larger than 3)
'derOrder = (Optional) Order of derivation 0-2
Dim m As Integer, i As Integer, j As Integer
Dim c As Variant
Dim rsize As Long
Dim sumprod As Double
Dim p() As Double
Dim original() As Variant, smoothed() As Variant
'Check if $points is odd, and convert to m
If (points Mod 2 <> 1) Or (points < 3) Then
sovgol = CVErr(xlErrValue)
Exit Function
Else
m = (points-1)/2
End If
ReDim p(-m To m)
' Pick the right equation to define the coeficients
Select Case derOrder
Case 0
For i=-m To m
p(i)= (3*(3*(m^2)+3*m-1-5*(i^2)))/((2*m+3)*(2*m+1)*(2*m-1))
Next i
Case 1
For i=-m To m
p(i)= (3*i)/((2*m+1)*(m+1)*m)
Next i
Case 2
For i=-m To m
p(i)= (30*(3*i^2-m*(m+1)))/((2*m+3)*(2*m+1)*(2*m-1)*(m+1)*m)
Next i
Case Else
sovgol = CVErr(xlErrValue)
Exit Function
End Select
'Count number of points in $rrange
rsize = Application.WorksheetFunction.CountA(rrange)
ReDim original(1 To rsize)
ReDim smoothed(1 To rsize, 1 to 1)
' Import range
For i=1 To rsize
original(i) = rrange.Cells(i,1).Value
Next i
'Start with m'th point
For i=m+1 to rsize-m
sumprod = 0
For j=-m To m
sumprod = sumprod + (original(i+j)*p(j))
Next j
smoothed(i,1)=sumprod
Next i
'Fill in missing points at start
For i=1 to m
smoothed(i,1) = smoothed(m+1,1)
Next i
'Fill in missing points at end
For i=(rsize-(m-1)) to rsize
smoothed(i,1) = smoothed(rsize-m,1)
Next i
sovgol = smoothed
End Function
This sovgol
function takes a range of data, a number of points that define the “running window” for the filter, and an optional order of the derivative to calculate (0 for smoothing, 1 for first derivative, and 2 for second derivative). Based on the derivative order and the number of points, equation coefficients are pre-calculated for the filter using polynomial least squares fitting. The function computes the smoothed value for each data point starting from the midpoint of the “running window” to avoid boundary issues. It uses a nested loop where each point is smoothed by summing the product of neighboring points and their corresponding coefficients. The function assigns smoothed values at the edges of the data range by replicating the smoothed values near the boundary. This simple approach handles edge effects but might not be ideal for all applications. As an output, the function returns a “smoothed” two-dimensional array.
To use this function in Excel, paste it in as a VBA module. Then, you would enter a formula in a cell like =sovgol(A1:A100, 5)
for smoothing, or =sovgol(A1:A100, 5, 1)
to compute the first derivative. This function must be entered as an array formula, which involves selecting a range of cells for output, entering the formula, and pressing Ctrl+Shift+Enter
(in older versions of Excel) or simply Enter
in newer versions that support dynamic arrays.
Applications Across Disciplines
Over the decades, the versatility of the Savitzky-Golay filter has allowed it to be adopted in numerous fields beyond its original application in chemistry. In physics, it is used to process signals in instruments that require noise reduction without compromising the integrity of the signal. In astronomy, the filter helps enhance the visibility of celestial bodies by reducing background noise in images.
The digital world has also benefitted from this filter, particularly in the realm of digital signal processing where it is used to process audio signals and other digital data streams. In finance, the Savitzky-Golay filter assists in smoothing out stock market price data, providing a clearer view of market trends without the distraction of short-term volatility.
The Savitzky-Golay filter’s influence extends into the realm of technology development, particularly in the area of data science and machine learning. In these fields, data often contains a significant amount of noise, and maintaining the integrity of the signal during preprocessing is crucial. The filter’s ability to enhance signal clarity without distorting key data features makes it an invaluable tool for algorithms that require clean, accurate data for training and analysis.
Moreover, the algorithm behind the Savitzky-Golay filter has inspired advancements in software and hardware implementations, optimizing it for real-time data processing in various high-tech applications. This adaptability demonstrates its enduring relevance in a rapidly evolving technological landscape.
As we celebrate the 60th anniversary of the Savitzky-Golay filter, it is clear that the legacy of this simple yet powerful tool is ongoing. From its origins in analytical chemistry to its broad applications across numerous scientific and technological disciplines, the filter remains a testament to the enduring importance of fundamental research and its potential to offer practical solutions across decades. Its development not only enhanced our ability to analyze and interpret data more effectively but also set a foundation for future innovations in data processing. The Savitzky-Golay filter continues to be a cornerstone technique in scientific data analysis, symbolizing a bridge between the past and future of technological advancement.