# Divand

### From GHER

Line 19: | Line 19: | ||

= Requirements = | = Requirements = | ||

- | Your need [http://www.octave.org GNU Octave] (version 3.4 or later) or MATLAB to run divand. | + | Your need [http://www.octave.org GNU Octave] (version 3.4 or later) or MATLAB to run <tt>divand</tt>. |

= Installing = | = Installing = |

## Revision as of 16:38, 16 January 2014

`divand` performs an n-dimensional variational analysis of arbitrarily located observations. Observation will be interpolated on a curvilinear grid in 2, 3 or more dimensions.

Please cite this paper as follows if you use `divand` in a publication:

Barth, A., Beckers, J.-M., Troupin, C., Alvera-Azcárate, A., and Vandenbulcke, L.: divand-1.0: n-dimensional variational data analysis for ocean observations, Geosci. Model Dev., 2013, in press. URL http://hdl.handle.net/2268/160913

The submitted manuscript is also available at Geoscientific Model Development 'Discussion', but please cite the final version (in Geoscientific Model Development) rather than the discussion paper.

## Contents |

# Requirements

Your need GNU Octave (version 3.4 or later) or MATLAB to run `divand`.

# Installing

- Download the source code from http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/divand-1.1.tar.gz.
- Extract the package
- In MATLAB add the directory divand-x.y to the search path. In octave add the command to your
`.octaverc`file (for testing it is sufficient if you move your current working directory to the divand-x.y directory):

addpath('/path/to/divand-x.y.');

# Testing

A test script `test_divand` is included to verify the correct functioning of the toolbox.
The script should be run in a Octave or MATLAB session. All tests should pass without warning or error.

run test_interp_1d: (max difference=2.22045e-16) OK run test_interp_2d: (max difference=2.22045e-16) OK run test_interp_regular: (max difference=2.22045e-16) OK run test_sparse_diff: (max difference=0) OK run test_1dvar: OK run test_2dvar: (max difference=4.28659e-16) OK run test_2dvar_check: (max rms=2.39788e-16) OK run test_2dvar_adv: OK run test_2dvar_rellen: OK run test_2dvar_lenxy: OK run test_2dvar_check_correrr: (max rms=7.45461e-16) OK run test_2dvar_constrain: (max rms=9.41909e-10) OK run test_2dvar_cyclic: (max rms=2.71352e-14) OK run test_2dvar_eof_var: OK run test_2dvar_eof_check: OK run test_3dvar: (max difference=0.0182573) OK run test_3dvar_large_stacked: (max difference=2.40796e-15) OK run test_4dvar: (max difference=0.0111807) OK

# Documentation

The main routine of this toolbox is called `divand` which performs an n-dimensional variational analysis of arbitrarily located observations. It can be called with a different number of input and output arguments:

[fi,err,s] = divand(mask,pmn,xi,x,f,len,lambda,...); [fi,err] = divand(mask,pmn,xi,x,f,len,lambda,...); [fi,s] = divand(mask,pmn,xi,x,f,len,lambda,...);

## Input arguments

- mask
- binary mask delimiting the domain. 1 is inside and 0 outside. This parameters has the same number of dimensions and grid points as the domain. For oceanographic application, this is the land-sea mask.
- pmn
- scale factor of the grid. pmn is a cell array with n arrays (each array has the same size as the mask). Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.
- xi
- cell array with n arrays (each array has the same size as the mask). Every element represents a coordinate of the final grid on which the observations are interpolated
- x
- cell array with n vectors (m-by-1 vector). Every vector represents a coordinate of the observations
- f
- vector with the value of the observations *minus* the background estimate (m-by-1 array). (see note)
- len
- correlation length (units must be consistent with scale factors)

- if len is scalar, then the correlation length is the same in all directions and everywhere in the domain
- if len is cell array of n scalars, then the correlation length can depend on the direction but for a given direction the correlation length is the same everywhere in the domain. The option is for example used when analysing 3D domain (longitude, latitude and depth) since the vertical scale is much smaller than the horizontal scale.
- if len is cell array of n arrays (each array has the same size as the mask), then the correlation length can depend on the direction and on the location. The individual arrays should have the same size as the domain.

- lambda
- signal-to-noise ratio of observations (if lambda is a scalar). lambda is defined as the ratio of background error variance over the observation error variance. The larger this value is, the closer is the interpolated field
`fi`to the observation.

- If lambda is a scalar, then R = 1/lambda I, where R is the observation error covariance matrix (scaled by the background error variance),
- If lambda is a vector, then R = diag(lambda)
- If lambda is a matrix (or matrix-like object such CovarParam), then R = lambda

## Optional input arguments

Optional input arguments specified as pairs of keyword and values:

- 'velocity', vel
- cell array with n arrays (each array has the same size as the mask) representing the velocity of advection constraint. The default is no advection constraint

- 'alpha'
- alpha is vector of coefficients multiplying various terms in the cost function.

The first element multiplies the norm. The other i-th element of alpha multiplies the (i+1)-th derivative. Per default, the highest derivative is m = ceil(1+n/2) where n is the dimension of the problem. The values of alpha are per default the (m+1)th row of the Pascal triangle:

m=0 1 m=1 1 1 m=1 1 2 1 (n=1,2) m=2 1 3 3 1 (n=3,4) ...

- 'diagnostics'
- 0 or 1 turns diagnostic and debugging information on (1) or off (0, default). If on, they will be returned as the last output argument

- 'constraint'
- a structure with user specified constraint

- 'moddim'
- modulo for cyclic dimension (vector with n elements). Zero is used for non-cyclic dimensions. Halo points should not be included for cyclic dimensions. For example if the first dimension is cyclic, then the grid point corresponding to mask(1,j) should be between mask(end,1) (left neighbord) and mask(2,j) (right neighbord)

- 'fracdim'
- fractional indices (n-by-m array). If this array is specified, then x and xi are not used.

Note: 'velocity' and 'constraint' may appear multiple times

## Output

- fi
- the analysed field
- err
- error variance of the analysis field relative to the error variance of the background
- s
- structure with some diagnostics mainly for debugging purposes.

### Note

If zero is not a valid first guess for your variable (as it is the case for e.g. ocean temperature), you have to subtract the first guess from the observations before calling divand and then add the first guess back in.

## Advanced usage

### Additional constraint

An arbitrary number of additional constraint can be included to the cost function which should have the following form:

For every constrain a structure with the following fields is passed to `divand`:

- yo
- the vector
- H
- the matrix
- R
- the matrix (symmetric and positive defined)

Internally the observations are also implemented as an additional constraint.

## Example

### Simple analytical example

see divand_simple_example.m

% A simple example of divand is 2 dimensions % with observations from an analytical function % observations x = rand(75,1); y = rand(75,1); f = sin(x*6) .* cos(y*6); % final grid [xi,yi] = ndgrid(linspace(0,1,30)); % reference field fref = sin(xi*6) .* cos(yi*6); % all points are valid points mask = ones(size(xi)); % this problem has a simple cartesian metric % pm is the inverse of the resolution along the 1st dimension % pn is the inverse of the resolution along the 2nd dimension pm = ones(size(xi)) / (xi(2,1)-xi(1,1)); pn = ones(size(xi)) / (yi(1,2)-yi(1,1)); % correlation length len = 0.1; % signal-to-noise ratio lambda = 20; % fi is the interpolated field fi = divand(mask,{pm,pn},{xi,yi},{x,y},f,len,lambda); % plotting of results subplot(1,2,1); pcolor(xi,yi,fref); shading flat,colorbar ca = caxis; set(findobj(gcf,'FontSize',10),'FontSize',16) title('Reference field and observation loc.'); hold on plot(x,y,'k.'); subplot(1,2,2); hold on pcolor(xi,yi,fi); shading flat,colorbar caxis(ca); set(findobj(gcf,'FontSize',10),'FontSize',16) title('Interpolated field'); set(gcf,'PaperUnits','centimeters','PaperPosition',[0 0 25 12 ]); print -dpng divand_simple_example.png

### Simple realistic example

A realistic example of divand in 2 dimensions with Salinity observations in the Mediterranean Sea at 30m is provided. Observations can be downloaded from http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/data.dat and the bathymetry is available at http://modb.oce.ulg.ac.be/mediawiki/upload/Alex/divand/diva_bath.nc.

% load observations load data.dat % extract columns of data lonobs = data(:,1); latobs = data(:,2); S = data(:,3); % load bathymetry % bath is negative in water and positive in air bat = ncread('diva_bath.nc','bat'); lon = ncread('diva_bath.nc','lon'); lat = ncread('diva_bath.nc','lat'); % compute grid metric % pm and pn are in meters^-1 [lon,lat] = ndgrid(lon,lat); [pm,pn] = divand_metric(lon,lat); % compute mean and anomalies Smean = mean(S); Sanom = S - mean(S); % correlation length (in meters) len = 200e3; % signal-to-noise ratio lambda = 0.5; % land-sea mask % mask everything below 30 m mask = bat < -30; % mask for the analysis maska = mask; % remove Atlantic maska(1:60,130:end) = 0; % remove Black Sea maska(323:end,100:end) = 0; % make analysis Sa = divand(maska,{pm,pn},{lon,lat},{lonobs,latobs},Sanom,len,lambda); % add mean back Sa2 = Sa + Smean; % create plots % color axis ca = [36.2520 39.4480]; % aspect ratio ar = [1 cos(mean(lat(:)) * pi/180) 1]; subplot(2,1,1); scatter(lonobs,latobs,10,S) caxis(ca); colorbar hold on, contour(lon,lat,mask,0.5,'k') xlim(lon([1 end])) ylim(lat([1 end])) title('Observations'); set(gca,'DataAspectRatio',ar,'Layer','top') subplot(2,1,2); pcolor(lon,lat,Sa2), shading flat,colorbar hold on plot(lonobs,latobs,'k.','MarkerSize',1) caxis(ca) contour(lon,lat,mask,0.5,'k') xlim(lon([1 end])) ylim(lat([1 end])) title('Analysis'); set(gca,'DataAspectRatio',ar,'Layer','top') set(gcf,'PaperUnits','centimeters','PaperPosition',[0 0 15 15 ]); print -dpng divand_realistic_example.png

This example gives the following output:

### Determining the parameters

The parameter lambda and parameter len are crucial for the analysis. lambda corresponds to the signal-to-noise ratio (variance of background error over variance of observation error). Therefore, its value depends on how accurate and how representative the observations are. The value of len can sometimes be determined by physical arguments. One statistical way to determine the parameter(s) is to do a cross-validation.

- choose, at random, a relatively small subset of observations (about 5%). This is the validation data set.
- make the analysis without your validation data set
- compare the analysis to your validation data set and compute the RMS difference
- repeat steps 2 and 3 with different values of the parameters and try to minimize the RMS difference.

The minimization of the RMS difference can be automated using e.g. the octave/matlab function `fminsearch`.
You can repeat all steps with a different validation data set to ensure that the optimal parameters values are robust.

# Fun

A educational web application has been developed to reconstruct a field based on point "observations". The user must choose in an optimal way the location of 10 observations such that the analysed field obtained by divand based on these observation is as close as possible to the original field.

# Troubleshooting

Cholesky factorization fails with the following error:

??? Error using ==> chol CHOLMOD internal error. Error in ==> CovarIS.factorize at 5 [self.RP, self.q, self.PP] = chol (self.IS);

This might happen if:

- memory is insufficient,
- matrix contains a NaN introduced through metric scale factor equal to zero or a user constraint.

## ichol failed

You get the error:

Error using ichol Encountered nonpositive pivot.

You might try diagonal compensation, e.g.

icholparam = struct('michol','on','diagcomp',0.0036);

or the incomplete Cholesky decomposition with threshold dropping (ict) and diagonal compensation:

icholparam = struct('type','ict','droptol',1e-6,'diagcomp',0.0001);

To pass the ichol parameters you must call `divand` like this:

[fi,s] = divand(mask,{pm,pn,pk},{xi,yi,zi},{x,y,z},f,len,lambda,... 'diagnostics',1,'primal',1,'inversion','pcg',... 'tol',1e-6,'maxit',50000,... 'compPC',@(iB,H,R) divand_pc_michol(iB,H,R,icholparam));

At bit of experimentation is necessary. Try to use lowest value of diagcomp (without triggering the error in ichol) and the lowest value of troptol for the ict decomposition (without running out of memory).