Correlation functions

The functions in this section compute estimates of several space and time correlation functions. The precised definitions of the different quantities computed are summarised below. For more details, caveats and discussion of the correlation functions we refer to the review article by T. S. Grigera[1].

Time correlations and correlation time

Time correlation function

The time (auto-)correlation function of a signal $a(t)$ (which is a noisy quantity that can be found to take different values if it is measured several times, i.e. it is a stochastic process) is defined as

\[ C(t_0,t) = \langle a^*(t_0) a(t_0+t) \rangle,\]

where $a^*$ is the complex conjugate and brackets stands for a history-average (an average over many realizations of the stochastic process, i.e. over many repetitions of the experiment, resetting the initial conditions).

The connected time correlation, or auto-covariance, is

\[ C_c(t_0,t) = \left\langle \delta a^*(t_0) \delta a(t_0+t) \right\rangle = C(t_0,t) - \left\langle a^*(t_0) \right\rangle \left\langle a(t_0+t) \right\rangle,\]

where $\delta a(t)=a(t)-\langle a(t) \rangle$.

Assume the process $a(t)$ is sampled uniformly in time with interval $\Delta t$. Calling $N$ the number of time samples and $M$the number of experiments (i.e. different measurements of $a(t)$ after resetting the initial conditions), we represent these data as $M$ sequences $A_i^{(n)}$ with $i=1,\ldots,N$, $n=1,\ldots,M$. Then the following statistical estimators of the correlation functions can be computed.

In the general case,

\[\begin{align*} \langle a(t_i) \rangle & \approx \overline{A_i} =\frac{1}{M} \sum_{n=1}^M A_i^{(n)}, \\ C_c(t_i,t_k) & \approx \hat C^{(c)}_{i,k} =\frac{1}{M}\sum_n^M \delta A_i^{(n)} \delta A_{i+k}^{(n)} , \qquad \delta A_i^{(n)} = A_i^{(n)} - \overline{A_i}. \end{align*}\]

If the process $a(t)$ is stationary, then the correlation functions time-translation invariant (i.e. independent of $t_0$), and an estimate can be obtained with a single sample, averaging over the time origin:

\[\begin{align*} \overline{A} &= \frac{1}{N}\sum_{i=1}^N A_i, \\ \hat C^{(c)}_k &= \frac{1}{N-k} \sum_{j=1}^{N-k} \delta A_j \delta A_{j+k}, \qquad \delta A_j = A_j - \overline{A}. \end{align*}\]

Note that the estimate obtained this way is noisier the larger the value of $k$.

If the process is not stationary, then the first estimator, which needs several samples, must be used. Although we implement this here, note that in the connected case the covariance function cov from the Statistics package can be used instead and is more convenient. The present routine is more useful for stationary case.

Refer to the review article[1] for more details.

Correlation time

The correlation time is a time scale that measures how separated in time two measurements must be for them to be significantly decorrelated. A good, though abstract, definition is

\[ \tau = \lim_{t\to\infty} \frac{t}{-\log C_c(t)}.\]

which picks the slowest (i.e. longest-ranged) exponential decay rate (cf Correlation length).

Since this definition is not directly applicable to finite data, several practical alternatives have been proposed[1]. We mention here only those that are currently implemented in BioStatPhys.

Spectral correlation time

The spectral correlation time $\tau_S$ is defined as the inverse of the frequency $\omega_0$ such that half the spectral content of the Fourier transform of the connected correlation $C_c(t)$ is contained in the interval $[-\omega_0,\omega_0]$. Translated to the time domain, the definition is

\[ \int_0^\infty \!\!dt \, \frac{C_c(t)}{C_c(0)} \frac{\sin t/\tau}{t} = \frac{\pi}{4}\]

This is computed by correlation_time_spectral.

API

The stationary (or TTI) estimate for a single real or complex sequence is implemented with an algorithm that uses fast-Fourier-transforms, giving $O(N\log N)$ performance.

BioStatPhys.time_correlationFunction
time_correlation(A; connected=true, normalized=false, nt=nothing, i0=nothing, Amean=nothing)

Compute the time (auto-)correlation function for signal A. Returns a vector C[1:nt]

  • A: the time signal, assumed to be sampled at evenly-spaced intervals. If i0==nothing, it must be a Vector (real or complex), which is a single realisation or measurment of the random signal, otherwise it must be a matrix, where columns represent times and rows are different realizations of the random signal.

  • If i0==nothing (default), then A is assumed stationary (or time-translation invariant, TTI), and an estimate employing a single sequence is computed. Otherwise, it must be an integer in the range 1<=i0<=size(A,2) and is interpreted as an index for the desired reference time.

  • connected: if true, compute the connected (i.e. subtracting the time average) correlation.

  • normalized: if true, return the result normalized by C[1]. Not recommended if non-TTI.

  • nt: the maxium time difference to compute in the TTI case, otherwise ignored. Default size(A,1)÷2.

  • Amean: if connected is requested, then the signal mean can be given if known, otherwise it will be computed.

In the TTI case, an FFT-based implementation is used.

In the non-TTI, connected, case, it is probably better to use the covariance function of the Statistics package, as cov(A,dims=1).

source
BioStatPhys.correlation_time_spectralFunction
correlation_time_spectral(C,Δt)

Compute the spectral correlation time $\tau$ from the connected time correlation C (e.g. as computed by time_correlation). The scalar Δt is the time step for the sampling of C.

The exact definition $\tau$ is

\[ \int_0^\infty \!\!dt \, \frac{C_c(t)}{C_c(0)} \frac{\sin t/\tau}{t} = \frac{\pi}{4}\]

source

Density correlations

Here we consider point-like particles in continuous space. The (instantaneous) density of a given configuration $\{\mathbf{r}_i\}$ is

\[\begin{equation*} \rho(\mathbf{r}) = \sum_{i=1}^N \delta(\mathbf{r}-\mathbf{r}_i), \end{equation*}\]

where $\delta(\mathbf{r})$ is Dirac's delta. The density-density correlation function is

\[ G(\mathbf{r},\mathbf{r}_0) = \langle \rho(\mathbf{r}_0) \rho(\mathbf{r}_0+\mathbf{r})\rangle = \left\langle \sum_{i,j} \delta(\mathbf{r}_0-\mathbf{r}_i),\delta(\mathbf{r}_0+\mathbf{r}-\mathbf{r}_i) \right\rangle = \frac{1}{V} \left\langle \sum_{i,j} \delta(\mathbf{r}-(\mathbf{r}_i-\mathbf{r}_j)) \right\rangle, \]

where the last equality holds for homogeneous systems. The angle brackets denote an ensemble average, i.e. an average over configurations weighted with the appropriate probability distribution (e.g. Boltzmann's distribution in a physical system in equilibrium).

The routines in this section compute the density-density correlation function and a couple of other, related and very often used, correlation functions, for the case of homogeneous and isotropic systems in several geometries.

The other correlation functions computed are the radial distribution function

\[g(r) = \frac{1}{\rho^2}G(r) - \frac{\delta(r)}{4\pi\rho r^2} = \frac{1}{\rho N} \left\langle \sum_{ij} \delta\left( r - r_{ij}\right) \right\rangle.\]

and the correlation integral

\[ C(r) = \frac{\rho}{N} \int_0^r \!\!G(r')\,dr' = \frac{1}{N^2} \left\langle \sum_{ij} \Theta(r-r_{ij}) \right\rangle,\]

where $\Theta(r)$ is Heavisde's function.

The computation is done in two steps. First an internal DensityCorrelation object is built from a set of configurations, then the desired correlation is computed passing this object. The first step is slow ($O(N^2)$), while the second is fast. To build the DensityCorrelation object one needs to load one configuration in a vector of vectors and define the region (see Regions) in which the particles lie, then call density_correlation, specifying the desired range and resolution. If more configurations are available, they can be added by calls to density_correlation!. Finally, the desired correlation function is computed. For example:

pos = load_conf()
region = Rectangle(pos)
dc = density_correlation(region,pos,0.1,rmax=10.)
while more_confs()
   pos = load_conf()
   density_correlation!(dc,pos)
end
gr, Cr = rdf(pc)

The DensityCorrelation object is not altered by calling rdf and the like, so that more statics can be added by further calls to density_correlation. Internally, different subtypes of DensityCorrelation are used for different regions, so that the different cases are handled correctly. In particular, for non-periodic regions, the unweighted Hanisch [4][1] method is used.

Configurations are interpreted as a vector of vectors, where each element is a vector of the size equal to the dimension of the region being used. The functions expect AbstactVectors, so for example both Vector{Vector{Float64} and Vector{SVector{2,Float64}} are valid types for a 2-dimensional region, but the latter (using StaticArrays) can be much more efficient.

Fluctuating number of particles

density_correlation! accepts configurations with different number of particles. In this case, rdf will compute $g(r)$ using the definition for the grand canonical ensemble, which for homogeneous systems is [5]

\[ g(r) = \frac{\rho^{(2)}(r)}{\rho^2}, \qquad \rho^{(2)} = \sum_N p(N) \rho_N^{(2)},\]

where $p(N)$ is the probability of finding a configuration with $N$ particles, and $\rho_N^{(2)}(r) = \rho_N^2 g_N(r) = (N/V)^2 g_N(r)$. Note that this is not the same as averaging $g(r)$ over configurations. In particular, for completely uncorrelated systems one gets $g(r) \to \langle \rho^2\rangle / \left(\langle\rho\rangle\right)^2$, which is equal to one in the grand canonical ensemble in the thermodynamic limit, but is larger than 1 in general. It is up to you to decide whether you need this or an average over the radial distribution function.

API

BioStatPhys.density_correlationFunction
density_correlation(region,pos;Δr,rmax)

Return a Density_correlation object for the given region and configuration pos, with resolution Δr and maximum range rmax. The returned object can be passed to the functions that compute the final correlations, such as rdf, or more configurations can be added to it by calling density_correlation(::DensityCorrelation,pos).

pos must be a vector of vectors, i.e. each element of pos must be a vector of the appropriate dimensionality. Performance advantage may be obtained using StaticArrays to represent individual positions. More precisely, the type of pos is

AbstractVector{T} where T<:AbstractVector{W} where W<:Number
source
BioStatPhys.density_correlation!Function
density_correlation!(dcorr::Density_correlation{R},pos) where R <: Region

Use configuration pos to add statistics to the DensityCorrelation object dcorr. The configuration need not have the same number of particles as those used previously, but check with functions rdf and the like how these fluctuations are treated.

source
BioStatPhys.rdfFunction
rdf(dcorr::Density_correlation{R};two_particle_density=false)

If two_particle_densityis false (default), compute the radial distribution function $g(r)$ and the correlation integral $C(r)$. Both are returned as a tuple of ZBinnedVector.

If two_particle_density is true, return the density-density correlation function $G(r)$ instead of $g(r)$. For the isotropic case assumed here, $G(r) = \rho^2 g(r) + \rho \delta(r)/4\pi r^2$ (in 3-d). Since $G(r)$ is singular at $r=0$, here for $r=0$ we return $\rho$, i.e. the integral of $G(r)$ in a very small volume around the origin.

If the dcorr object was fed with configurations with fluctuating number of particles, then the radial distribution function will be computed using the definition for the grand canonical ensamble (see J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, Academic Press (2005)). This may or may not be what you want. See the online documentation of the package for more details.

source

Space correlations and correlation length

The space correlation functions of a space-dependent quantity $a(\mathbf{r})$ are defined as

\[\begin{align*} C(\mathbf{r}) & = \left\langle a(\mathbf{r_0}) a(\mathbf{r_0}+\mathbf{r}) \right\rangle,\\ C_c(\mathbf{r}) & = \left\langle \delta a(\mathbf{r_0}) \delta a(\mathbf{r_0}+\mathbf{r}) \right\rangle, \end{align*}\]

where $\delta a(\mathbf{r}) = a(\mathbf{r}) - \langle a\rangle$ and we are assuming the system is homogeneous (i.e. translation-invariant). $C_c(\mathbf{r})$ is called connected correlation in the physics literature, or auto covariance in the mathematical statistics literature.

The space_correlation function computes the estimate of $C(r)$ for an isotropic discrete system,

\[\begin{align*} \hat C_c(r) & = \frac{\sum_{ij} \delta a_i \delta a_j \delta(r-r_{ij}) } {\sum_{kl} \delta(r-r_{kl})}. \end{align*}\]

The average $\langle a\rangle$ can be estimated with space average or phase average (see below and the review[1] for details).

Correlation length

The correlation length is a length scale that measures how far apart two points in space must be taken for them to be significantly decorrelated. A good, though abstract, definition is

\[ \xi = \lim_{r\to\infty} \frac{r}{-\log C_c(r)},\]

which picks the slowest (i.e. longest-ranged) exponential decay rate (cf. Correlation time).

Since this definition is not directly applicable to finite data, several alternatives have been proposed, more suited to experimental or numerical determination but respecting the functional dependence with control parameters[1]. We mention here only those that are currently implemented in BioStatPhys.

$r_0$

If the connected correlation has been computed with space averaging, $C_c(r)$ will have at least one zero, and the location $r_0$ of the first of these can be used as a proxy of $\xi$. We stress that $r_0$ is not a correlation length, but a useful scale in the case the correlation can be measured for different system sizes $L$ (or in observation windows of different size[2]). $r_0$ scales with size as[3]

\[ \begin{align*} r_0 &\sim \xi \log(L/\xi), & L\gg\xi, \\ r_0 & \sim L, & L\ll \xi. \end{align*}\]

The second situation will always be the case in scale-free systems where $\xi=\infty$. See correlation_length_r0.

API

BioStatPhys.space_correlationFunction
space_correlation(binning::DistanceBinning, X; connected=true, normalized=false, Xmean=nothing)

Compute space correlation of scalar signal X in an isotropic, space-translation-invariant system using the given DistanceBinning. Since the system is assumed to be isotropic and homogeneous, correlations should depend only on the distances among the pairs, so return a tuple (r,C) where r is a vector of distances and C the correlation function at the corresponding distance.

Positions are assumed to be fixed in all configurations, with only the signal changing value.

  • X: a matrix where each column holds the signal at different positions in space and at a given time. All columns are used to average the correlation estimate over configurations.

  • binning: instead of positions, this function expects a pre-built DistanceBinning object holding lists of pairs separated by distances in the given range, such as the binnings created by distance_binning.

  • connected: If true, compute the connected correlation, i.e. subtracting the mean of the signal. If Xmean is nothing, space-averaging is used, i.e. at each configuration (column) the mean of X is computed and subtracted. For phase averaging, give a precomputed mean in Xmean.

  • normalized: If true, normalize by $C(r=0)$.

source
space_correlation(region::Region,Δr;rmax=nothing,connected=false)
space_correlation(region::Region,pos::ConfigurationT,Δr;rmax=nothing,connected=false)

Build and return a Space_correlation_vectorqty object to compute space correlations of some vectorial observable. If pos is given, it will be assumed that positions will be the same for all configurations, and a DistanceBinning object will be built to save time later. If posiitons change with every configuration, then omit this argument.

To add data to an object corr, call

space_correlation!(corr,pos,vec)

or

space_correlation!(corr,vec)

if particles are static. Correlation functions are obtained calling correlations(corr).

source
BioStatPhys.correlation_length_r0Function
correlation_length_r0(r,C)

Compute the correlation length proxy $r_0$ from $C_c(r)$ given as vectors r and C (as obtained e.g. from space_correlation(@ref).

Note that $r_0$ is not a correlation length, just a proxy that scales with system size $L$ as $\log L$ or $L$ if the actual correlation length is much shorter or much larger than $L$ respectively.

source

References

  • 1T. S. Grigera, Correlation functions as a tool to study collective behaviour phenomena in biological systems. J. Phys. Complex. 2, 045016 (2021). [DOI]. This review discusses the correlation functions and estimators computed by the routines documented here and is the general reference for this section.
  • 2D. A. Martin, T. L. Ribeiro, S. A. Cannas, T. S. Grigera, D. Plenz, and D. R. Chialvo. Box scaling as a proxy of finite size correlations. Sci Rep 11, 15937. (2021) [DOI]
  • 3A. Cavagna, I. Giardina, and T. S. Grigera. The physics of flocking: Correlation as a compass from experiments to theory. Physics Reports 728, 1–62 (2018). [DOI]
  • 4K. H. Hanisch, Some remarks on estimators of the distribution function of nearest neighbour distance in stationary spatial point processes. Ser. Stat. 15, 409 (1984). [DOI]
  • 5J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, Academic Press (2005)