Data assimilation in coastal ocean models

Alexander Barth, <>

This presentation is on-line at

Scripts are available at

Hit the space bar for next slide

Data assimilation (basic concepts)

Data assimilation aims to combine model results and observation in clever way.

Ways to represent uncertainty

Errors in your model

Errors in your model might be due to

Errors in your observations

Errors in your observations might be due to

Observations might not represent exactly the same as the model variables

Errors covariance

Data assimilation in coastal ocean models

Ensemble simulation

  1. Perturbation of uncertain model inputs (initial conditions, boundary conditions, parameters) within their range of uncertainty
    • Homogeneous and isotropic perturbations
    • EOF perturbations
  2. Model run with perturbed forcing
  3. Analysis
    • global scheme
    • local scheme

Ensemble perturbation

Generate 500 random 2D-fields representing an area of 300 km x 300 km and a spatial correlation length scale of 20 km with adequate resolution. Compute the covariance of the central grid point with all other grid points. The ensemble covariance between two random variables x and y (with zero mean) is computed as:
\mbox{cov}(x,y) = \frac{1}{K} \sum_{k=1}^{K} x_k y_k
where k is the ensemble member index. Repeat with K = 100. Describe the structures you are seeing and the differences between the case K = 100 and K = 500.


m = 100;
n = 100;
K = 500;
q = randomfield(m,n,K,3e3,3e3,20e3);

covar = zeros(m,n);

for j = 1:n
  for i = 1:m
    covar(i,j) = sum(q(i,j,:) .* q(m/2,n/2,:));
covar = covar/K;

% or with without loops
covar2 = sum(q .* repmat(q(m/2,n/2,:),[m n 1]),3)/K;

Ensemble perturbation

Ensemble perturbation using EOFs

Create wind perturbations using multivariate EOFs using the test data in The matrix \mathbf W containing the zonal u^a and meridional v^a wind components is defined as:
\begin{eqnarray} \mathbf W_{ij} &=& u^a_{ij} \\ \mathbf W_{i+m,j} &=& v^a_{ij} \end{eqnarray}
where i is the spatial index (1 \le i \le m) and j is the temporal index (1 \le j \le n) and m and n are the number of spatial grid points and time steps respectively. The size of the matrix \mathbf W is thus 2m \times n. Wind fields anomalies:
\begin{eqnarray} \overline{\mathbf{W}}_{i} &=& \frac{1}{n} \sum_{j=1}^n W_{ij} \\ \mathbf W'_{ij} &=& \mathbf W_{ij} - \overline{\mathbf{W}}_{j} \end{eqnarray}
for all i \le 2m.

Ensemble perturbation using EOFs

A SVD decomposition, limited to the r=10 dominant eigenvectors (EOFs):
\begin{equation} \mathbf W' \sim \mathbf U \mathbf \Sigma \mathbf V^T \end{equation}
where \mathbf U^T \mathbf U = \mathbf I, \mathbf V \mathbf V^T = \mathbf I and \mathbf \Sigma is a r \times r diagonal matrix. The EOFs \mathbf U are used to produce multivariate, spatial perturbations of the wind field:
\begin{equation} \mathbf W^{(k)} = \mathbf W + \mathbf U \mathbf \Sigma \mathbf A^{(k)} \end{equation}
where \mathbf A^{(k)} is a r \times n matrix containing random time series and where the superscript k represents the ensemble member. The random time series (rows of \mathbf A^{(k)}) have a standard deviation of \frac{1}{3} and a temporal correlation length scale of L=7 days.

Ensemble perturbation using EOFs

Useful functions:


Twin experiment

Scheme of a twin experiment:

Twin experiment

Test case

Ocean Assimilation Kit (OAK)


# Default compilation (using all variables in
# Override some variables in
make OPENMP=on

Concepts for the assimilation module

NetCDF files from Octave/Matlab

Initialization file

State vector

Model.variables = [                  'zeta',         'temp',         'salt']  
Model.gridX     = [',:,end)','','']  
Model.gridY     = [',:,end)','','']  
Model.gridZ     = [  ',:,end)',  '',  '']  
Model.mask      = [  ',:,end)',  '',  '']  
Model.path      = '/home/user/Data/' 

Note: The coordinate arrays gridX, gridY and gridZ should have the same size than the mask. If some dimensions are missing in grid array, then the values are repeated along this dimension. For example if mask is 25x20x10, but gridX is 25x20, then the values of gridX are repeated 10 times.

Observation vector

Obs001.time      = '2010-07-06T00:30:00.00'  ! time as YYYY-MM-DDThh:mm:ss[.ss]
Obs001.path      = 'Obs/'                    ! where the file can be found
Obs001.variables  = [               'TEM']   ! name as in Model.variables
Obs001.names      = [      'temp_profile']   ! descriptive name
Obs001.gridX      = [       '']   ! longitude
Obs001.gridY      = [       '']   ! latitude
Obs001.gridZ      = [         '']   ! depth
Obs001.value      = [      '']   ! value of the observations
Obs001.mask       = [      '']   ! mask of the observations (FillValue_)
Obs001.rmse       = [ '']   ! root mean square error

Reduced order analysis

The best linear unbiased estimator (BLUE) of the model's state vector given the model forecast \mathbf x^f with error covariance \mathbf P^f and the observation \mathbf y^o with error covariance \mathbf R is given by \mathbf x^a:
\begin{eqnarray} \mathbf x^a &=& \mathbf x^f + \mathbf K \left(\mathbf y^o - \mathbf H \mathbf x^f \right) \\ \mathbf K &=& \mathbf P^f \mathbf H^T \left( \mathbf H \mathbf P^f \mathbf H^T + \mathbf R \right)^{-1} \\ \mathbf P^a &=& \mathbf P^f - \mathbf K \mathbf H \mathbf P^f \end{eqnarray}
where \mathbf H is the observation operator extracting the observed part of the state vector and \mathbf P^a the error covariance of the analysis \mathbf x^a. As many other assimilation schemes (SEEK, RRSQRT, ESSE, EnKF) we decompose \mathbf P^f in:
\mathbf P^f = \mathbf S^f {\mathbf S^f}^T
Where \mathbf S^f is a n \times r matrix. The reduced order implementation is only effective if r is small (r << n).
For an ensemble {\mathbf x^f}^{(k)} \; (k=1,\dots,r), \mathbf x^f is the ensemble average and the columns of \mathbf S^f are constructed by:
(\mathbf S^f)^{(k)} = \frac{1}{\sqrt{r-1}} \left({\mathbf x^f}^{(k)} - {\mathbf x^f} \right)
In practice, the following eigenvalue decomposition is made,
\left(\mathbf H \mathbf S^f\right)^T \mathbf R^{-1} \mathbf H \mathbf S^f = \mathbf U \mathbf \Lambda \mathbf U^T
where \mathbf U^T \mathbf U = \mathbf I and where \Lambda is diagonal. The Kalman gain \mathbf K and \mathbf S^a can be computed by:
\begin{eqnarray} \mathbf K &=& \mathbf S^f \mathbf U (1+ \mathbf \Lambda)^{-1} \mathbf U^T (\mathbf H \mathbf S^f)^T \mathbf R^{-1} \\ \mathbf S^a &=& \mathbf S^f \mathbf U (1+ \mathbf \Lambda)^{-1/2} \mathbf U^T \mathbf \Omega \end{eqnarray}
\mathbf \Omega is an orthogonal matrix chosen such that average of all columns is zero. \mathbf S^a is the square root of \mathbf P^a:
\mathbf P^a = \mathbf S^a {\mathbf S^a}^T
Based on \mathbf x^a and \mathbf S^a, an ensemble can be reconstructed:
{\mathbf x^a}^{(k)} = \mathbf x^a + {\sqrt{r-1}} \; {\mathbf S^a}^{(k)}

Error space

ErrorSpace.dimension = 50
ErrorSpace.path      = './'
ErrorSpace.init      = ['("Ens",I3.3,"/")','("Ens",I3.3,"/")']
ErrorSpace.type      = 2 ! 1: modes; 2: ensemble
The strings in ErrorSpace.init are Fortran formats. For example for the first ensemble member, ("Ens",I3.3,"/") will be expanded as Ens001/


! same elements as state vector
Diag*.stddevxf  =  ['','','']
Diag*.stddevxa  =  ['','','']
Diag*.xf        =  ['','','']
Diag*.xa        =  [      '',      '',      '']
Diag*.xa-xf     =  [     '',     '',     '']

! same element as observation vector
Diag001.Hxf       = [        '']
Diag001.Hxa       = [        '']
Diag001.yo        = [        '']
Diag001.invsqrtR  = [  '']
Diag001.stddevHxf = [ '']
Diag001.stddevHxa = [ '']
Diag001.path   = 'Analysis001/'

Assimilation of a point measurement

Using the ensemble of the Ligurian Sea, assimilate a point measurement located at (lon = 9.5, lat = 43.2) and 3 meters depth. Choose sensible values for temperature, and root-mean square error. Repeat this with a point at (lon = 9, lat = 44.2). Describe the differences of the increments \mathbf x^a - \mathbf x^f and error standard deviation of the forecast and the analysis. Useful files: assim.init.template and A model output can be used as mask.
assim-gfortran-single assimTS.init 1

Global versus local assimilation

Error reduction

Local assimilation

! possible values of "schemetype" are:
! 0 = global assimilation (default)
! 1 = local assimilation

schemetype = 0
Definition of zones:
!Model.variables =       [                  'ETA',         'TEM',         'SAL']  
Zones.partition        = [',:,end)','','']
Zones.corrLength.const = [                   30e3,          30e3,          30e3]
Zones.maxLength.const  = [                 2000e3,        2000e3,        2000e3]
Zones.path = 'Common/'
All elements of those arrays are integer values. Variables are in the same zone if the same integer is associated to them.

Local assimilation

Repeat the point measurement assimilation with local scheme and describe the differences of the increments \mathbf x^a - \mathbf x^f. Use the script gen_part to create the partition file from the mask.
assim-gfortran-single-openmp assimTS.init 1

SST Assimilation

Describe the differences of the increments \mathbf x^a - \mathbf x^f.
Variable RMS(\mathbf x^f,\mathbf x^t) RMS(\mathbf x^a,\mathbf x^t)
Temperature 0.080 0.066
Salinity 0.0063 0.0052

Note: RMS is here a volume average

SST Assimilation

local versus global

4D Extension

Including the time dimension: Including the forcing fields:

Assimilation of HF radar surface currents

Assimilation of HF radar surface currents

local versus global
Radial currents on 2010-07-06 18:30 relative to the Palmaria site: WERA (left) and ROMS without assimilation (right).

Surface current assimilation (exercise)

Describe the differences of the increments \mathbf x^a - \mathbf x^f.

Surface current assimilation (exercise)

Variable RMS(\mathbf x^f,\mathbf x^t) RMS(\mathbf x^a,\mathbf x^t)
Temperature 0.080 0.067
Salinity 0.0063 0.0057
u-velocity 0.010 0.0097
v-velocity 0.010 0.0095
u-wind 0.61 0.40
v-wind 0.60 0.54

Challenges for data assimilation

Challenges for data assimilation (cont)