CHAPTER 2
TIME SERIES ANALYSIS

The estimation of the power spectrum of a time series formed by equally-spaced observations is well defined in terms of the Fourier transform, and a large body of references is available describing the concepts of the Fourier transform, integral equations and orthogonal functions (Margenau and Murphy 1956; Linnik 1961; Wilf 1962; Hamilton 1964; Bracewell 1965; Hildebrand 1965; Goertzl 1966; Grenville 1967; Arfken 1970; Hamming 1971; Hauck 1971; Pall 1971; Brigham 1974; Brownwell 1974; Cooley, Tukey, Lewis and Welch 1974; Esch 1974; Newbury 1974; Ahmed and Rao 1975; Oppenheim, Johnson and Schafer 1975; Rabiner and Gold 1975; Davies 1978; Kovacs 1981). The problems associated with Fourier analysis of finite, equally-spaced data (leakage, aliasing, the Gibbs phenomenon, over-sampling, under-sampling and noise) are also well described in the literature (Schwartz 1959; Hofstetter 1964; Bracewell 1965; Van der Ziel 1970; Brault and White 1971; Brigham 1974; Fischel 1976; Gray 1976; Smith and Parsons 1976; Kovacs 1981). This chapter details the modified Scargle periodogram and the data-compensated discrete Fourier transform, which are used for the spectral analysis of unevenly-spaced data. A brief discussion of the general Fourier transform and its characteristics is included here for the sake of completeness.

2.1 Fourier Transforms

The continuous Fourier transform is defined to be a two-sided Laplace transform evaluated at a purely imaginary argument (s = iω, with ω real), such that

Equation 2-1

. (2-1)

The above transformation assumes that the function f(t) is both infinite and continuous. When the data to be analyzed are finite and discrete, the general formulation of the finite discrete Fourier transform (DFT) must be employed. The formulation for the DFT over N equidistant points of a function f(kΔt) is given by

Equation 2-2

(2-2)

where the function f(kΔt) is assumed to be sampled at evenly-spaced intervals Δt over a total interval ΔT. The transform is only applicable within the limits of the so-called spectral window. The fundamental frequency is the lowest frequency for which there is information in the data and is given by 1/ΔT. This frequency corresponds to a sine wave of period equal to the whole interval ΔT. The highest frequency for which there is information in the data is the Nyquist frequency, given by 1/(2Δt). This frequency is a result of the Shannon sampling theorem, which states that in order to reconstruct a signal containing frequency components in the spectrum 0 to ω_m from sampled data, uniformly-spaced samples must be taken at a rate of at least 2ω_m.
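As a small worked illustration of these limits, the sketch below computes the fundamental and Nyquist frequencies for an evenly sampled record. The function name, its arguments and the 100-day nightly example are illustrative assumptions, not part of the original analysis.

```python
import math

def spectral_window_limits(total_interval, dt):
    """Return (fundamental, nyquist) in cycles per unit time.

    total_interval: length of the whole record
    dt: spacing between consecutive, evenly spaced samples
    """
    if total_interval <= 0 or dt <= 0:
        raise ValueError("total_interval and dt must be positive")
    fundamental = 1.0 / total_interval   # one full cycle spans the record
    nyquist = 1.0 / (2.0 * dt)           # two samples per cycle
    return fundamental, nyquist

# Hypothetical example: 100 days of nightly observations.
f_low, f_high = spectral_window_limits(total_interval=100.0, dt=1.0)

# The same limits expressed as angular frequencies (radians per day).
omega_low = 2.0 * math.pi * f_low
omega_high = 2.0 * math.pi * f_high
```

Multiplying by 2π converts the limits to the angular frequencies used in the periodograms later in this chapter.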

The DFT and the continuous Fourier transform are exactly the same, within a scaling constant, only when the function f(t) is periodic, band-limited (meaning that the transform is negligible outside a finite range of frequencies), sampled at a rate of at least twice its highest frequency component and truncated at exactly an integer multiple of the period, in which case the spectrum is simply replicated at harmonics of the Nyquist frequency (Figure 2-1). If any of these conditions is not met, then leakage will occur (power from one frequency component will appear at other frequencies).
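The truncation condition can be checked numerically. The following sketch, which assumes a 64-point record and a direct (slow) DFT written out for clarity, compares a sine wave truncated after a whole number of periods with one cut off mid-period. The helper name leakage_fraction is hypothetical.

```python
import cmath
import math

def leakage_fraction(cycles, n=64):
    """Fraction of DFT power lying outside the strongest bin and its mirror
    for a sine wave that completes `cycles` periods in n even samples."""
    samples = [math.sin(2.0 * math.pi * cycles * k / n) for k in range(n)]
    power = []
    for m in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * m * k / n)
                    for k, x in enumerate(samples))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    # Strongest positive-frequency bin; bin n - peak is its mirror image.
    peak = max(range(1, n // 2), key=lambda m: power[m])
    return 1.0 - (power[peak] + power[n - peak]) / total

exact = leakage_fraction(4.0)   # record holds exactly 4 periods
leaky = leakage_fraction(4.5)   # record is truncated mid-period
```

With a whole number of periods essentially all of the power sits in a single bin pair, while the mid-period truncation spreads a substantial fraction of it over neighbouring bins.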

Spectral leakage is inherent to frequency analysis with a finite amount of data. Leakage to distant frequencies arises from the finite size of the interval between samples. The well-known phenomenon of aliasing (the leakage of power from high frequencies to much lower frequencies) is a special case of leakage caused by the strict enforcement of a uniform sample spacing. Irregular data spacing substantially reduces aliasing (Beutler 1966, 1970; Gaster and Roberts 1975, 1977; Masry and Lui 1975; Higgins 1976). Leakage to nearby frequencies (sidelobes) occurs because of the finite total interval over which the data are sampled. The shorter the total sampling interval, the more pronounced the sidelobe structure. In fact, the semi-regular spacing inherent in astronomical observations can result in significant leakage of power into the sidelobes, giving an effect very similar to conventional aliasing (Deeming 1975; Scargle 1981).
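Why uniform spacing enforces aliasing can be seen directly from the samples themselves. In the sketch below, two sine waves whose angular frequencies differ by 2π/Δt are indistinguishable at evenly spaced times but clearly different once the spacing is perturbed. The perturbation pattern (shifts of 0, 0.25 and 0.5 of a sample interval in rotation) is an arbitrary assumption chosen only to make the example deterministic.

```python
import math

def max_sample_difference(times, omega_a, omega_b):
    """Largest difference between sin(omega_a t) and sin(omega_b t) at the samples."""
    return max(abs(math.sin(omega_a * t) - math.sin(omega_b * t)) for t in times)

dt = 1.0
omega = 0.3
omega_alias = omega + 2.0 * math.pi / dt   # offset by the sampling angular frequency

even_times = [k * dt for k in range(50)]
# Hypothetical uneven pattern: shift samples by 0, 0.25 or 0.5 of dt in turn.
uneven_times = [k * dt + 0.25 * dt * (k % 3) for k in range(50)]

even_gap = max_sample_difference(even_times, omega, omega_alias)
uneven_gap = max_sample_difference(uneven_times, omega, omega_alias)
```

For the even grid the two signals agree to rounding error, so no analysis of those samples can tell them apart. The uneven grid breaks the degeneracy.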

The use of a data window, or tapering function, is a standard method for reducing sidelobe structures in Fourier transforms. The choice of the tapering function is somewhat arbitrary, but Bingham et al. (1967) have suggested the use of a cosine bell over the first and last 10% of the data. Unfortunately, the only effect tapering has on the data for this work is to reduce the overall power within the signal, because the sidelobe structures are primarily caused by the irregular spacing.
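A cosine bell of the kind suggested by Bingham et al. (1967) can be sketched as follows. The exact sample convention (the taper starts at zero on the first and last points) and the function name are assumptions made for illustration.

```python
import math

def cosine_bell_taper(n, fraction=0.10):
    """Weights that rise along a half cosine over the first `fraction` of the
    n samples, stay at 1 in the middle and fall symmetrically at the end."""
    m = int(fraction * n)            # number of tapered points at each end
    weights = [1.0] * n
    for k in range(m):
        value = 0.5 * (1.0 - math.cos(math.pi * k / m))
        weights[k] = value
        weights[n - 1 - k] = value
    return weights

# Apply the taper to a hypothetical 100-point data set.
data = [math.sin(0.3 * k) for k in range(100)]
taper = cosine_bell_taper(len(data))
tapered = [w * x for w, x in zip(taper, data)]
```

Because every weight is at most 1, tapering can only lower the total power in the data, which is the effect noted above.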

2.2 The Modified Scargle Periodogram

As discussed in Chapter 1, the DFT is inadequate for the spectral decomposition of unevenly-spaced data. Horne and Baliunas (1986) have described a method for the accurate period determination of irregularly spaced data based on the modified periodogram (Scargle 1982), which is given by:

Equation 2-3

(2-3)

where

Equation 2-4

(2-4)

and

Equation 2-5

(2-5)

This method not only gives an excellent estimation of the frequency of the periodic signal, but it also provides for the estimation of the statistical error in the frequency determination (Kovacs 1981) and a simple estimate of the significance of the height of a peak in the power spectrum (Horne and Baliunas 1986).
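A minimal sketch of a periodogram of this type is given below. It follows the form commonly attributed to Scargle (1982), with a time offset tau that decouples the sine and cosine sums and a normalization by the sample variance as described by Horne and Baliunas (1986). The notation and normalization are assumptions and may differ in detail from Equations 2-3 to 2-5. The test signal, sample times and frequency grid are hypothetical.

```python
import math
import random

def scargle_power(times, values, omega):
    """Variance-normalized periodogram power at angular frequency omega > 0."""
    n = len(times)
    mean = sum(values) / n
    resid = [v - mean for v in values]
    variance = sum(r * r for r in resid) / (n - 1)

    # Time offset tau that makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2.0 * omega * t) for t in times),
                     sum(math.cos(2.0 * omega * t) for t in times)) / (2.0 * omega)

    xc = xs = cc = ss = 0.0
    for t, r in zip(times, resid):
        c = math.cos(omega * (t - tau))
        s = math.sin(omega * (t - tau))
        xc += r * c
        xs += r * s
        cc += c * c
        ss += s * s
    return 0.5 * (xc * xc / cc + xs * xs / ss) / variance

# Hypothetical test: 80 randomly spaced samples of sin(0.7 t) over 100 time units.
rng = random.Random(1)
times = sorted(rng.uniform(0.0, 100.0) for _ in range(80))
values = [math.sin(0.7 * t) for t in times]
grid = [0.05 + 0.01 * k for k in range(296)]        # omega from 0.05 to 3.00
peak_omega = max(grid, key=lambda w: scargle_power(times, values, w))
```

Scanning a grid of trial frequencies and taking the maximum recovers the input frequency to within the grid spacing for this noise-free case.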

Kovacs (1981) found the standard deviation of the frequency to be

Equation 2-6

(2-6)

where A is the amplitude of the signal, σ_n² is the variance of the noise after the signal has been filtered, T is the total length of the data set and N is the number of points in the data set. Kovacs' derivation assumes a single signal with Gaussian noise and even data spacing, but Baliunas et al. (1985) have shown that uneven spacing does not degrade the uncertainty to any noticeable degree.
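For a sense of scale, the sketch below evaluates the uncertainty in the form quoted by Horne and Baliunas (1986), δω = 3πσ_n / (2 N^(1/2) T A). This expression is restated from memory of that paper and is an assumption here; it may differ in notation or constants from Equation 2-6. The numerical inputs are hypothetical.

```python
import math

def frequency_uncertainty(amplitude, noise_variance, total_time, n_points):
    """Standard deviation of the angular frequency, assuming the form
    delta_omega = 3 * pi * sigma_n / (2 * sqrt(N) * T * A)."""
    sigma_n = math.sqrt(noise_variance)
    return (3.0 * math.pi * sigma_n
            / (2.0 * math.sqrt(n_points) * total_time * amplitude))

# Hypothetical case: amplitude 0.1, noise standard deviation 0.02,
# 100 points spread over 100 days.
delta_omega = frequency_uncertainty(amplitude=0.1, noise_variance=4.0e-4,
                                    total_time=100.0, n_points=100)
```

The uncertainty shrinks in proportion to the signal amplitude and the length of the data set, and with the square root of the number of points.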

Horne and Baliunas (1986) have shown that for a periodogram scaled by the total variance of the data (σ²), the probability that some peak is of height z or higher is the false alarm probability F, given by:

Equation 2-7

(2-7)

Scargle (1982) showed that the expected noise peak can be expressed as

Equation 2-8

(2-8)

where the square brackets indicate the greatest integer function and

Equation 2-9

(2-9)

In these formulations, N is the total number of points represented in the data set and N_i represents a numerical fit, determined by Horne and Baliunas (1986), to the number of independent frequencies in the data.
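The significance estimate can be sketched as below, using the expressions published by Horne and Baliunas (1986): F = 1 - (1 - exp(-z))^N_i for the false alarm probability and a quadratic fit for N_i. The coefficients are restated from that paper as an assumption and should be checked against Equations 2-7 and 2-9. The function names are hypothetical.

```python
import math

def independent_frequencies(n_points):
    """Empirical fit for N_i (coefficients assumed from Horne and Baliunas 1986)."""
    return -6.362 + 1.193 * n_points + 0.00098 * n_points ** 2

def false_alarm_probability(z, n_independent):
    """Probability that pure noise yields some peak of height z or higher."""
    return 1.0 - (1.0 - math.exp(-z)) ** n_independent

def peak_height_for(false_alarm, n_independent):
    """Invert the relation: the height z that gives the requested probability."""
    return -math.log(1.0 - (1.0 - false_alarm) ** (1.0 / n_independent))

n_i = independent_frequencies(100)          # hypothetical 100-point data set
z_one_percent = peak_height_for(0.01, n_i)  # height needed for a 1% false alarm rate
```

The inverse form is convenient for drawing a significance threshold across a normalized periodogram.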

For uniformly-spaced data, the fundamental and Nyquist frequencies specify the limits of the spectral window in which periodicity analysis may be performed. For unevenly-spaced data, the fundamental frequency is still well defined, but the Nyquist frequency is not. Scargle (1982) has shown that for unevenly-spaced data, the spectral window which defines these "natural frequencies" cannot be put into closed form, so it must be estimated by analyzing synthetic data sets in which the amplitude component of the data is replaced with a high-frequency sine wave. The pseudo-Nyquist frequency can then be determined by measuring the separation of the harmonic signals from the primary signal.

For evenly-spaced data, there is no spectral information above the Nyquist limit. This limitation is not strictly true for unevenly-spaced data, since there may be data spacings smaller than the pseudo-Nyquist limit determined from the pseudo-window. If the data were evenly-spaced, all the Nyquist harmonics would be exact duplications of the primary frequency pattern, so it would be impossible to determine whether a given signal is real or simply a harmonic of a higher frequency (Figure 2-1). For unevenly-spaced data, however, the harmonics fall off in amplitude as one gets further away from the primary signal, so it is possible to distinguish a real signal from a harmonic, even if the real signal is above the pseudo-Nyquist limit (Figure 2-2). Problems can arise, however, because of noise and under-sampling. In some cases the nearest harmonic and primary signals will be negligibly different, so a clear determination based strictly on the photometric data cannot always be made.

2.3 The Data-Compensated Discrete Fourier Transform

The finite Fourier transform has proven inadequate for the harmonic filtering of unevenly-spaced data sets. Ferraz-Mello (1981) has proposed the use of the data-compensated discrete Fourier transform (DCDFT) to overcome the limitations of the finite Fourier transform and allow for the correct filtering of a signal. The measurement of the complex power spectrum F(ω), for a function f(t) consisting of N data points, using the DCDFT is given by:

Equation 2-10

(2-10)

where

Equation 2-11

(2-11)

with

Equation 2-12

(2-12)

and

Equation 2-13

(2-13)

In these formulas, the parentheses mean inner products

Equation 2-14.

(2-14)

The filtered time series, f'(t), is now specified by

Equation 2-15

(2-15)

where the filtering function, d(ω,t), for a specific angular frequency ω at time t, is given by

Equation 2-16

(2-16)

With a combination of the Scargle periodogram and the data-compensated discrete Fourier transform, the complete spectral characteristics of an unevenly-spaced data set may be determined down to the noise level of the observations.
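The idea behind this kind of filtering can be sketched with a Gram-Schmidt orthogonalization of the trial functions 1, cos(ωt) and sin(ωt) over the actual sample times, using the sum over samples as the inner product. This is an illustrative reading of the approach, not a transcription of Equations 2-10 to 2-16: the normalization, function names and sample times below are assumptions.

```python
import math

def inner(a, b):
    """Inner product over the sample times: sum of a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(times, omega):
    """Gram-Schmidt on the trial functions 1, cos(omega t), sin(omega t)."""
    basis = []
    for trial in ([1.0] * len(times),
                  [math.cos(omega * t) for t in times],
                  [math.sin(omega * t) for t in times]):
        for h in basis:
            overlap = inner(h, trial)
            trial = [x - overlap * y for x, y in zip(trial, h)]
        norm = math.sqrt(inner(trial, trial))
        basis.append([x / norm for x in trial])
    return basis

def filter_frequency(times, values, omega):
    """Remove the sinusoid at omega; return (c1, c2, filtered_values)."""
    _, h1, h2 = orthonormal_basis(times, omega)
    c1, c2 = inner(values, h1), inner(values, h2)
    filtered = [f - c1 * a - c2 * b for f, a, b in zip(values, h1, h2)]
    return c1, c2, filtered

# Hypothetical uneven sampling of a constant plus the sinusoid 0.1cos(0.6t - 1.96).
times = [k + 0.3 * math.sin(1.7 * k) for k in range(60)]
values = [2.0 + 0.1 * math.cos(0.6 * t - 1.96) for t in times]
c1, c2, residual = filter_frequency(times, values, 0.6)
```

Because the basis is built on the real sample times, a pure sinusoid at the filter frequency is removed completely even though the spacing is uneven, leaving only the mean level. In this sketch c1² + c2² serves as a measure of the power at ω.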

Secondary peaks can result from the filtering process itself as well as from true periodicity. Consider the periodogram of an unevenly sampled sine wave with ω = 1 (Figure 2-3a). Figure 2-3b shows the periodogram for the same sine wave after the primary frequency has been removed from the signal. The significance of the peak in Figure 2-3b would indicate, falsely, that there is a secondary frequency. In order to distinguish real secondaries from artifacts of the filtering process, the data are filtered at frequencies on either side of the primary frequency. The filtered data sets are then reanalyzed for the secondary frequency. If the secondary frequency depends on the filtering frequency, then the signal is an artifact of the filtering process and cannot be considered real. Figure 2-4 shows an example of the dependence of the residual frequency (ω_r) on the filtered frequency (ω_f) for an unevenly sampled sine wave [A = 0.1cos(0.6t - 1.96)]. The frequency dependence of the residual is clearly visible. The dashed line in Figure 2-4 shows the response when a true secondary (in this case at an angular frequency of 0.3 and an amplitude of 0.02) is present in the signal. This secondary frequency is clearly independent of the filtering process.
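The test for residual frequency dependence can be sketched as a loop over filter frequencies. As a simplification, the sketch uses one orthogonal projection both to filter the data and to measure the residual power, whereas the analysis described here uses the modified periodogram for the latter. The sample times, frequency grid and function names are hypothetical. The signal parameters are those quoted for Figure 2-4.

```python
import math
import random

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(times, values, omega):
    """Return (power at omega, values with the sinusoid at omega removed)."""
    basis = []
    for trial in ([1.0] * len(times),
                  [math.cos(omega * t) for t in times],
                  [math.sin(omega * t) for t in times]):
        for h in basis:                      # Gram-Schmidt against earlier vectors
            overlap = inner(h, trial)
            trial = [x - overlap * y for x, y in zip(trial, h)]
        norm = math.sqrt(inner(trial, trial))
        basis.append([x / norm for x in trial])
    c1, c2 = inner(values, basis[1]), inner(values, basis[2])
    filtered = [f - c1 * a - c2 * b
                for f, a, b in zip(values, basis[1], basis[2])]
    return c1 * c1 + c2 * c2, filtered

def residual_peaks(times, values, filter_omegas, grid):
    """For each filter frequency, locate the strongest residual frequency."""
    pairs = []
    for omega_f in filter_omegas:
        _, filtered = project(times, values, omega_f)
        omega_r = max(grid, key=lambda w: project(times, filtered, w)[0])
        pairs.append((omega_f, omega_r))
    return pairs

rng = random.Random(4)
times = sorted(rng.uniform(0.0, 100.0) for _ in range(100))
primary = [0.1 * math.cos(0.6 * t - 1.96) for t in times]
with_secondary = [p + 0.02 * math.cos(0.3 * t) for p, t in zip(primary, times)]
grid = [0.05 + 0.005 * k for k in range(291)]        # omega from 0.05 to 1.50
filter_omegas = [0.58, 0.59, 0.60, 0.61, 0.62]

artifact_track = residual_peaks(times, primary, filter_omegas, grid)
secondary_track = residual_peaks(times, with_secondary, filter_omegas, grid)
```

Plotting the second element of each pair against the first gives a diagram of the same kind as Figure 2-4. A residual that moves with the filter frequency points to a filtering artifact, while one that stays fixed points to a real secondary.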

Problems can arise if frequencies are too close together. As pointed out by Horne and Baliunas (1986), if the difference between the angular frequencies of two signals is less than 5.5662/T, where T is the length of the data set, then the resultant periodogram peak will lie between them. Filtering of this central peak will produce strong secondaries at, or near, the resolution limit of the periodogram, resulting in a multitude of closely spaced signals.
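The resolution criterion is simple to apply in practice. The helper names and the 100-day example below are illustrative assumptions.

```python
def resolution_limit(total_time):
    """Smallest angular-frequency separation that yields two distinct peaks."""
    return 5.5662 / total_time

def are_resolved(omega_1, omega_2, total_time):
    """True if two signals are far enough apart to appear as separate peaks."""
    return abs(omega_1 - omega_2) >= resolution_limit(total_time)

# Hypothetical 100-day data set: the limit is about 0.056 in angular frequency.
limit = resolution_limit(100.0)
close_pair = are_resolved(0.60, 0.63, 100.0)   # separation 0.03, below the limit
wide_pair = are_resolved(0.60, 0.30, 100.0)    # separation 0.30, above the limit
```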

2.4 Analysis Procedure

Spectral pseudo-windows are determined for each data set by replacing the actual data with a high-frequency sine wave (ω = 20) and then generating a periodogram over the angular frequency range 0 < ω ≤ 40 (Scargle 1982). The pseudo-Nyquist frequency is obtained by measuring the separation of the repeated patterns of the fundamental signal (see Chapter 6). For all the data sets, the pseudo-Nyquist frequency is on the order of 2 days⁻¹. This harmonic frequency is not surprising, considering the observing pattern of the APT (see Chapter 4).
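The pseudo-window construction can be sketched as follows. The sampling pattern (one observation per night with a small random offset) is a hypothetical stand-in for a real observing log, and the compact periodogram follows the form commonly attributed to Scargle (1982) without variance normalization. Both are assumptions made for illustration.

```python
import math
import random

def periodogram_power(times, values, omega):
    """Unnormalized periodogram power at angular frequency omega > 0."""
    mean = sum(values) / len(values)
    tau = math.atan2(sum(math.sin(2.0 * omega * t) for t in times),
                     sum(math.cos(2.0 * omega * t) for t in times)) / (2.0 * omega)
    xc = xs = cc = ss = 0.0
    for t, v in zip(times, values):
        c = math.cos(omega * (t - tau))
        s = math.sin(omega * (t - tau))
        xc += (v - mean) * c
        xs += (v - mean) * s
        cc += c * c
        ss += s * s
    cos_term = xc * xc / cc if cc > 0.0 else 0.0
    sin_term = xs * xs / ss if ss > 0.0 else 0.0
    return 0.5 * (cos_term + sin_term)

# Hypothetical observing pattern: 60 nights, each within 0.05 day of the same hour.
rng = random.Random(0)
times = [day + rng.uniform(-0.05, 0.05) for day in range(60)]

# Replace the measurements with a sine wave at omega = 20 and scan 0 < omega <= 40.
synthetic = [math.sin(20.0 * t) for t in times]
grid = [0.05 * k for k in range(1, 801)]
window = [periodogram_power(times, synthetic, w) for w in grid]

p_main = periodogram_power(times, synthetic, 20.0)
p_repeat = periodogram_power(times, synthetic, 20.0 - 2.0 * math.pi)
p_between = periodogram_power(times, synthetic, 16.0)
```

For this nearly nightly pattern the input peak is repeated almost undiminished one sampling frequency (2π per day) away, with little power in between. The spacing of such repeats in the window is what is measured to obtain the pseudo-Nyquist frequency.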

V-band periodograms from the fundamental frequency (ω = 2π/ΔT) to three times the pseudo-Nyquist frequency (ω ≈ 19), with a resolution of 2048 points over each harmonic group, are generated for each source. These spectra allow for the initial evaluation of periodic structure. If no structure is apparent, then periodograms for each color are generated and examined. For example, Figure 2-5 shows the periodogram, for the first harmonic group, for HD 136901 where periodic structure is clearly visible, while the periodogram for 13 Cet (A), Figure 2-6, shows no apparent structure. If periodic structure is visible, then the frequency ranges over the peaks are taken and the peak values, from the normalized periodograms, are determined for each color.
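One way to build such a frequency grid is sketched below. The interpretation of "2048 points over each harmonic group" as an evenly spaced grid with 2048 points per pseudo-Nyquist interval, the 200-day record length and the pseudo-Nyquist angular frequency of 2π per day (chosen so that three times its value is about 19) are all assumptions made for illustration.

```python
import math

def harmonic_group_grid(total_interval, pseudo_nyquist,
                        groups=3, points_per_group=2048):
    """Evenly spaced angular frequencies from the fundamental, 2*pi/total_interval,
    up to `groups` times the pseudo-Nyquist angular frequency."""
    start = 2.0 * math.pi / total_interval
    stop = groups * pseudo_nyquist
    count = groups * points_per_group
    step = (stop - start) / (count - 1)
    return [start + k * step for k in range(count)]

# Hypothetical 200-day record with an assumed pseudo-Nyquist of 2*pi per day.
grid = harmonic_group_grid(total_interval=200.0, pseudo_nyquist=2.0 * math.pi)
```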

If a frequency group lies near the orbital period, then that group is selected. If no such group exists, then the following consistency criteria are used for the selection: If the frequencies for all colors in each frequency group are similar and if all the amplitudes show a decline, or an initial rise and then a decline, as a function of frequency, then the peak amplitude frequency group is used.

If the frequencies within a selected group are not consistent, then the individual periodograms for the three colors are compared. For sources such as 47 Dra and 6 Tri, the discrepancy is simply caused by a lack of any visible signal at U-band (see Chapter 6). Other sources, however, show heavy side lobes. At this point it must be assumed that all colors of the observations will share the same periodic nature, but not necessarily identical leakage structures, so a convolution of the signals would amplify the primary signal while diminishing the harmonics, isolating the desired frequency.

Once the primary frequencies have been identified (see Chapter 3 for a description of the analysis environment), the amplitudes and phases are tabulated and the corresponding signals are filtered from the data sets. The entire procedure is then repeated to search for remaining structure. Residual frequency dependence is examined as described at the end of the previous section. The results and interpretations of the analysis, as well as the sensitivity limits, will be discussed in Chapters 6 and 7.


Figure 2-1
Nyquist Harmonic Power Pattern for Evenly-Spaced Data

This figure illustrates the repeated Nyquist harmonic power pattern in a periodogram for evenly-spaced data using the function sin(0.3t) with 100 data points and a Nyquist angular frequency of 1.7. It also illustrates the effects of aliasing and the inability to distinguish between the harmonic peaks for evenly-spaced data. A log amplitude scale was used to more strongly illustrate the pattern repetition.



Figure 2-2
Nyquist Harmonic Power Pattern for Unevenly-Spaced Data

This figure illustrates the repeated Nyquist harmonic power pattern in a periodogram for unevenly-spaced data using the same function, sin(0.3t), as seen in Figure 2-1. It is clear that the aliased signals have been removed and that there is a definite power drop-off in the Nyquist harmonic signals away from the primary.



Figure 2-3
Effects of Incomplete Filtering

These figures show an extreme case of the effects of incomplete filtering. Figure 2-3a is the periodogram of the function sin(t) sampled over 100 unevenly-spaced data points, while Figure 2-3b is the residual power pattern after the signal at ω = 1 has been removed.



Figure 2-4
Residual Frequency Dependence

This figure illustrates residual frequency dependence for the function 0.1cos(0.6t - 1.96) with a plot of the residual angular frequency versus the filtered angular frequency. The solid line is the response when no secondary signal is present, while the dashed line is the response when the secondary signal 0.02cos(0.3t) has been added.



Figure 2-5
HD 136901 Broad Band Periodogram

This figure displays the 2048-point periodogram over the angular frequency range 0 < ω ≤ 6.4 for the source HD 136901 at V-band. It is clear that there is a strong signal near ω = 0.7. The signal near ω = 5.7 is leakage from the first Nyquist harmonic, which is not visible on this plot.



Figure 2-6
13 Cet (A) Broad Band Periodogram

This figure displays the 2048-point periodogram over the angular frequency range 0 < ω ≤ 6.4 for the source 13 Cet (A) at V-band. No periodic structure is visible for this source.


05/22/02 ern

Copyright (c) 1988-1997, Eric R. Nelson, Ph.D.