5 MOTION ESTIMATION AND COMPENSATION

To exploit temporal redundancy, motion estimation and compensation are used for prediction.

Prediction is called forward if reference is made to a frame in the past (in display order) and called backward if reference is made to a frame in the future. It is called interpolative if reference is made to both future and past.

For this TM the search range should be appropriate for each sequence, and therefore a vector search range per sequence is listed below:

Sequence        Frame vertical range    Field vertical range    Horizontal range
Table Tennis    ±15 samples             ±3 samples              ±7 samples
Flower Garden   ±15 samples             ±3 samples              ±7 samples
Calendar        ±15 samples             ±3 samples              ±7 samples
Popple          ±15 samples             ±3 samples              ±7 samples
Football        ±31 samples             ±7 samples              ±15 samples
PRL Car         ±63 samples             ±15 samples             ±31 samples

A positive value of the horizontal or vertical component of the motion vector signifies that the prediction is formed from pixels in the referenced frame, which are spatially to the right or below the pixels being predicted.

5.1 Motion Vector Estimation

For the P and B-frames, two types of motion vectors, Frame Motion Vectors and Field Motion Vectors, will be estimated for each macroblock. In the case of Frame Motion Vectors, one motion vector will be generated in each direction per macroblock, which corresponds to a 16x16 pels luminance area. For the case of Field Motion Vectors, two motion vectors per macroblock will be generated for each direction, one for each of the fields. Each vector corresponds to a 16x8 pels luminance area.

The algorithm uses two steps. First, a full search is applied to the original pictures with full-pel accuracy. Second, a half-pel refinement is performed using the locally decoded picture.

5.1.1 Full Search

A simplified Frame and Field Motion Estimation routine is listed below. In this routine the following relation is used:


  (AE of Frame) = (AE of FIELD1) + (AE of FIELD2)

where AE represents a sum of absolute errors.

With this routine three vectors are calculated, MV_FIELD1, MV_FIELD2 and MV_FRAME.

  /* Smallest error found so far for each prediction type */
  Min_FRAME  = MAXINT;
  Min_FIELD1 = MAXINT;
  Min_FIELD2 = MAXINT;

  for (y = -YRange; y < YRange; y++)
  {
    for (x = -XRange; x < XRange; x++)
    {
      /* Errors of the candidate at displacement (x,y), one per field */
      AE_FIELD1 = AE_Macroblock( prediction_mb(x,y), lines_of_FIELD1_of_current_mb );
      AE_FIELD2 = AE_Macroblock( prediction_mb(x,y), lines_of_FIELD2_of_current_mb );

      /* The frame error is the sum of the two field errors */
      AE_FRAME = AE_FIELD1 + AE_FIELD2;

      if (AE_FIELD1 < Min_FIELD1)
      {
        MV_FIELD1 = (x,y);
        Min_FIELD1 = AE_FIELD1;
      }

      if (AE_FIELD2 < Min_FIELD2)
      {
        MV_FIELD2 = (x,y);
        Min_FIELD2 = AE_FIELD2;
      }

      if (AE_FRAME < Min_FRAME)
      {
        MV_FRAME = (x,y);
        Min_FRAME = AE_FRAME;
      }
    }
  }

The search is constrained to take place within the boundaries of the significant pel area. Motion vectors which refer to pixels outside the significant pel area are excluded.
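
As a rough illustration, the routine above can be sketched in Python as follows. This is a minimal sketch, not the Test Model's reference code: the names `abs_err` and `full_search`, the picture representation (lists of rows of luminance samples) and the way out-of-picture candidates are skipped are assumptions made for the example. Field 1 and field 2 are assumed to be the even and odd lines (counting from 0) of the macroblock, so that the frame error is the sum of the two field errors.

```python
def abs_err(pred_rows, cur_rows):
    """Sum of absolute errors between two equally sized lists of sample rows."""
    return sum(abs(p - c)
               for pred_row, cur_row in zip(pred_rows, cur_rows)
               for p, c in zip(pred_row, cur_row))


def full_search(cur, ref, mb_x, mb_y, x_range, y_range, size=16):
    """Full-pel search for the macroblock whose top-left sample is (mb_x, mb_y).

    Returns a dict with the best (x, y) vector for "field1", "field2" and "frame".
    """
    height, width = len(ref), len(ref[0])
    cur_mb = [row[mb_x:mb_x + size] for row in cur[mb_y:mb_y + size]]
    min_ae = {"field1": float("inf"), "field2": float("inf"), "frame": float("inf")}
    mv = {"field1": None, "field2": None, "frame": None}

    # Same loop bounds as the listing: the upper limit itself is not visited.
    for y in range(-y_range, y_range):
        for x in range(-x_range, x_range):
            top, left = mb_y + y, mb_x + x
            # Exclude vectors that point outside the picture area.
            if top < 0 or left < 0 or top + size > height or left + size > width:
                continue
            pred = [row[left:left + size] for row in ref[top:top + size]]
            ae = {
                # Assumption: field 1 = even lines, field 2 = odd lines (0-based).
                "field1": abs_err(pred[0::2], cur_mb[0::2]),
                "field2": abs_err(pred[1::2], cur_mb[1::2]),
            }
            ae["frame"] = ae["field1"] + ae["field2"]
            for kind in mv:
                if ae[kind] < min_ae[kind]:  # strict "<": the first minimum wins
                    min_ae[kind] = ae[kind]
                    mv[kind] = (x, y)
    return mv
```

Computing the two field errors once and summing them for the frame error means each candidate position is visited a single time for all three vectors, which is the point of the relation given before the listing.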

5.1.2 Half pel search

The half pel refinement uses the eight neighbouring half-pel positions in the referenced corresponding local decoded field or frame which are evaluated in the following order:

  1 2 3
  4 0 5
  6 7 8

where 0 represents the previously evaluated integer-pel position. The values of the spatially interpolated pels are calculated as follows:

S(x+0.5, y) = (S(x,y) + S(x+1,y)) // 2,
S(x, y+0.5) = (S(x,y) + S(x,y+1)) // 2,
S(x+0.5, y+0.5) = (S(x,y) + S(x+1,y) + S(x,y+1) + S(x+1,y+1)) // 4.

where x, y are the integer-pel horizontal and vertical coordinates, and S is the pel value. If two or more positions have the same total absolute difference, the first is used for motion estimation.

NOTE: In field searches, the reference system is the corresponding field. In a field the line distance is 1.
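
The interpolation and the evaluation order can be sketched as follows. This is a minimal Python sketch under stated assumptions: `//` in the formulas is read as the usual MPEG convention of integer division with rounding to the nearest integer (halves rounded away from zero), which is not what Python's own `//` does; the names `div_round`, `half_pel_sample` and `refine_half_pel` are hypothetical; coordinates are carried in half-pel units; position 0 is treated as evaluated first; and the block is assumed to lie far enough inside the picture that all neighbours exist.

```python
def div_round(n, d):
    """Integer division rounded to nearest, halves away from zero (d > 0)."""
    q, r = divmod(abs(n), d)
    if 2 * r >= d:
        q += 1
    return q if n >= 0 else -q


def half_pel_sample(pic, x2, y2):
    """Sample at (x2/2, y2/2); x2 and y2 are non-negative half-pel coordinates."""
    x, y = x2 >> 1, y2 >> 1
    half_x, half_y = x2 & 1, y2 & 1
    if half_x and half_y:
        return div_round(pic[y][x] + pic[y][x + 1]
                         + pic[y + 1][x] + pic[y + 1][x + 1], 4)
    if half_x:
        return div_round(pic[y][x] + pic[y][x + 1], 2)
    if half_y:
        return div_round(pic[y][x] + pic[y + 1][x], 2)
    return pic[y][x]


# Offsets (dx, dy) in half-pel units: position 0 first, then 1..8 row by row.
SEARCH_ORDER = [(0, 0),
                (-1, -1), (0, -1), (1, -1),
                (-1, 0), (1, 0),
                (-1, 1), (0, 1), (1, 1)]


def refine_half_pel(cur_mb, ref, mb_x, mb_y, mv_full):
    """Refine full-pel vector mv_full; returns the best vector in half-pel units."""
    size = len(cur_mb)  # square block assumed
    base_x2 = 2 * (mb_x + mv_full[0])
    base_y2 = 2 * (mb_y + mv_full[1])
    best_mv, best_ae = None, None
    for dx, dy in SEARCH_ORDER:
        ae = sum(abs(half_pel_sample(ref, base_x2 + 2 * j + dx, base_y2 + 2 * i + dy)
                     - cur_mb[i][j])
                 for i in range(size) for j in range(size))
        if best_ae is None or ae < best_ae:  # ties keep the earlier position
            best_mv, best_ae = (2 * mv_full[0] + dx, 2 * mv_full[1] + dy), ae
    return best_mv
```

For a field search the same code would be applied to a picture made of the lines of one field only, in line with the note that the line distance within a field is 1.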

5.1.3 Motion estimation for Special prediction mode

The first step is to obtain four candidate motion vectors as follows:

First, four field motion vectors with half-pel accuracy from reference field 1 / field 2 to predicted field 1 / field 2 are searched by the normal motion vector search defined in the Test Model. Then these vectors are appropriately scaled if the parity of the reference field is opposite to that of the predicted field.

The second step is to evaluate the prediction errors of Dual-prime prediction using the possible combinations of the four candidate motion vectors obtained in the first step and 3 (vertical) x 3 (horizontal) = 9 candidate differential motion vectors.

The prediction error is computed using the reconstructed pictures. The combination with the smallest MSE is selected.
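
The second step amounts to an exhaustive evaluation of 4 x 9 = 36 combinations. A minimal Python sketch of that selection is given below. The names are hypothetical: `select_dual_prime` is introduced only for this example, and `prediction_mse` stands for a caller-supplied function that builds the Dual-prime prediction from the reconstructed pictures for a given candidate vector and differential vector and returns its mean squared error. How that prediction is formed is not modelled here.

```python
from itertools import product

DMV_VALUES = (-1, 0, 1)  # allowed values of each differential coordinate (half-pel units)


def select_dual_prime(candidates, prediction_mse):
    """Return (mse, mv, dmv) with the smallest error over all combinations."""
    best = None
    for mv, dmv in product(candidates, product(DMV_VALUES, repeat=2)):
        mse = prediction_mse(mv, dmv)
        if best is None or mse < best[0]:
            best = (mse, mv, dmv)
    return best
```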

5.2 Motion Compensation

Motion compensation is performed differently for field coding and for frame coding. General formulas for frame and field coding are listed below.

Forward motion compensation is performed as follows:

S(x, y) = S_1(x + FMV_x(x, y), y + FMV_y(x, y))

Backward motion compensation is performed as follows:

S(x, y) = S_(M+1)(x + BMV_x(x, y), y + BMV_y(x, y))

Temporal interpolation is performed by averaging:

S(x, y) = (S_1(x + FMV_x(x, y), y + FMV_y(x, y)) + S_(M+1)(x + BMV_x(x, y), y + BMV_y(x, y))) // 2

where FMV is the forward motion vector of the macroblock, making reference to a 'previous picture' (S_1), and BMV is the backward motion vector of the macroblock, making reference to a 'future picture' (S_(M+1)).
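
The three prediction forms can be sketched for a single sample as follows. This is a minimal Python sketch under stated assumptions: the name `predict_sample` is hypothetical, pictures are lists of rows of non-negative samples, vectors are full-pel `(dx, dy)` pairs that stay inside the picture, and the `//` of the averaging formula is read as division with rounding to the nearest integer, which for non-negative samples is the same as adding 1 before Python's floor division by 2.

```python
def predict_sample(prev_pic, next_pic, x, y, fmv=None, bmv=None):
    """Predicted sample at (x, y) from full-pel vectors fmv and/or bmv, each (dx, dy)."""
    if fmv is None and bmv is None:
        raise ValueError("at least one motion vector is required")
    forward = prev_pic[y + fmv[1]][x + fmv[0]] if fmv is not None else None
    backward = next_pic[y + bmv[1]][x + bmv[0]] if bmv is not None else None
    if backward is None:
        return forward   # forward prediction only
    if forward is None:
        return backward  # backward prediction only
    # Interpolative prediction: average, with halves rounded up.
    return (forward + backward + 1) // 2
```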

A displacement vector for the chrominance is derived by halving the component values of the corresponding MB vector, using the formula from CD 11172, section ......:

  right_for = (recon_right_for / 2) >> 1;
  down_for = (recon_down_for / 2) >> 1;
  right_half_for = recon_right_for/2 - 2*right_for;
  down_half_for = recon_down_for/2 - 2*down_for;
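
To make the arithmetic of this formula concrete, here is a small Python sketch for one vector component. It assumes that the luminance component is expressed in half-pel units, that `/` in the formula is C integer division (truncation toward zero) and that `>>` is an arithmetic shift. Python's `//` floors instead of truncating, so the truncation is written out explicitly. The names `trunc_div2` and `chroma_displacement` are hypothetical.

```python
def trunc_div2(value):
    """C-style integer division by 2, truncating toward zero."""
    return value // 2 if value >= 0 else -((-value) // 2)


def chroma_displacement(recon):
    """Return (full_pel, half_flag) for one luminance vector component."""
    chroma_half_units = trunc_div2(recon)   # chrominance vector in half-pel units
    full = chroma_half_units >> 1           # full-pel part (floors for negatives)
    half = chroma_half_units - 2 * full     # 1 if a half-pel offset remains, else 0
    return full, half
```

For example, a luminance component of 7 (3.5 pels) gives `(1, 1)`, that is 1.5 chrominance pels, and a component of -7 gives `(-2, 1)`, that is -2 + 0.5 = -1.5 chrominance pels.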

5.2.1 Frame Motion Compensation

In frame prediction macroblocks there is one vector per macroblock. Vectors measure displacements on a frame sampling grid. Therefore an odd-valued vertical displacement causes a prediction from the fields of opposite parity. Vertical half pixel values are interpolated between samples from fields of opposite parity. Chrominance vectors are obtained directly by using the formulae above. The vertical motion compensation is illustrated in Figure 5.1.


Figure 5.1: Frame Motion Compensation

5.2.2 Field Motion Compensation

Field-based MV is expressed in the very same way as frame-based vectors would be if the source (reference) field and the destination field were considered as "frames" (see Figure).

Considering that in each field, lines are numbered 1.0, 2.0, 3.0, ... (1 is the top line of the field), if the pel located at line "n" of the destination field is predicted from line "m" of the reference field, the vertical coordinate of the field vector is "n-m".

Note: when coding the motion vectors, "m" and "n" are expressed in units of one vertical half-pel in the field.

When necessary, motion_vertical_field_select (one bit) will be transmitted to identify the selected field.


[Figure panels: 4:2:0 format; 4:2:2 format and 4:4:4 format]

5.2.2.1. Chrominance Field-based MV

In 4:2:0 sequences:

The vertical coordinate of the chrominance Field-based MV is derived by dividing by 2 the vertical coordinate of the luminance Field-based MV, as done in MPEG-1.

The horizontal coordinate of the chrominance MV (Field-based or Frame-based) is derived by dividing by 2 the horizontal coordinate of the luminance MV, as done in MPEG-1.

In 4:2:2 sequences:

The vertical coordinate of the Field-based MV for chrominance is equal to the vertical coordinate of the luminance Field-based MV.

The horizontal coordinate of the chrominance MV (Field-based or Frame-based) is derived by dividing by 2 the horizontal coordinate of the luminance MV, as done in MPEG-1.

In 4:4:4 sequences:

The horizontal (resp. vertical) coordinate of MV for chrominance is equal to the horizontal (resp. vertical) coordinate of the luminance MV.
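
The three cases can be summarised in a short Python sketch. The names `halve` and `chroma_field_mv` are hypothetical, and "dividing by 2 as done in MPEG-1" is assumed here to mean integer division truncating toward zero. The rounding rule is an assumption of this example and is not restated in this section.

```python
def halve(value):
    """Divide by 2, truncating toward zero (assumed MPEG-1 style division)."""
    return value // 2 if value >= 0 else -((-value) // 2)


def chroma_field_mv(luma_mv, chroma_format):
    """Derive the chrominance field-based MV (x, y) from the luminance MV."""
    x, y = luma_mv
    if chroma_format == "4:2:0":
        return halve(x), halve(y)   # both coordinates halved
    if chroma_format == "4:2:2":
        return halve(x), y          # only the horizontal coordinate halved
    if chroma_format == "4:4:4":
        return x, y                 # same vector as luminance
    raise ValueError("unknown chroma format: " + chroma_format)
```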

5.3 Special prediction mode

5.3.1. Overview of Special Prediction mode

There is only one special prediction mode (Dual-prime) remaining in this Test Model and this is based on Field-based prediction. THIS IS ONLY USED FOR M=1 CODING (NO B_FRAMES) FOR THE MAIN PROFILE, MAIN LEVEL. FOR OTHER PROFILES AND LEVELS IT HAS NOT BEEN DECIDED. This mode has been included in particular for low delay applications.

Dual-prime prediction averages two forward field-based predictions taken from the two most recently decoded fields.

In the syntax of the Special prediction mode, for forward prediction, one field motion vector is transmitted, followed by a differential motion vector. Each of the coordinates of the differential motion vector is limited to the values [-1, 0, +1] (half pixel values), and is transmitted with a 1-2 bit code.

Combinations of the transmitted field motion vector (possibly scaled according to the field temporal distance) and of the differential motion vector are used for the prediction, as described in the following sections. A separate section defines precisely how field motion vectors are scaled.


Figure 1: Special prediction mode (frame structure picture coding mode)

Plain arrows represent the transmitted field motion vector. Dashed arrows represent the scaled-up or scaled-down field motion vectors. Vertical arrows represent the transmitted differential motion vector.

5.3.2. Specification of Dual-prime vectors

Motion vectors used for Dual-prime prediction are field motion vectors obtained as follows:

1. If the reference field and the predicted field have the same parity, the field motion vector used is equal to the transmitted field motion vector.

2. If the reference field and the predicted field have different parity, the field motion vector used is obtained by adding the differential motion vector to the scaled transmitted motion vector.

NOTE: The same differential motion vector is used for the scaled-down and the scaled-up field motion vectors.

5.3.3. Temporal Scaling of the Field Motion Vector

The transmitted field motion vector (x, y) corresponds to the temporal distance between two fields of the same parity. The horizontal and vertical coordinates are in 1/2-pel units.

The transmitted field motion vector is used for computing two scaled field motion vectors that serve in the Special prediction mode when reference field and predicted fields are opposite parity. One of the scaled field motion vectors is longer ("scaled-up"), the other one is shorter ("scaled-down").

Scaling is done as follows:

If the same-parity reference field is at a distance of 2*k fields from the predicted field, the coordinates (x', y') of the scaled motion vector used for accessing the different-parity field are computed as follows:

x' = (x * K) // 32 (x and x' are integers)
y' = ((y * K) // 32) + e (y and y' are integers)

K = (m * 16) // k (k is integer)

m = field-distance between the predicted field and the different-parity field.

NOTE: FURTHER APPROXIMATION OF SCALING SHALL BE REDEFINED (See MPEG93/227)

The "e" is an adjustment necessary to reflect the vertical shift between the lines of field 1 and field 2. To give an example, line 1 of field 2 is in fact located 1/2 line under line 1 of field 1.

e is defined as follows:

e = -1 if the reference field corresponding to the scaled vector is field 2
e = +1 if the reference field corresponding to the scaled vector is field 1

[NOTE: The formula assumes frame based coding and will be updated]
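
The scaling rule, together with the vector selection of section 5.3.2, can be sketched in Python as follows. This is only an illustration of the formulas as currently written, which the notes above mark as subject to revision. The names `div_round`, `scale_field_mv` and `dual_prime_vector` are hypothetical, and `//` is assumed to be the usual MPEG integer division with rounding to the nearest integer (halves away from zero), which Python's own `//` does not provide.

```python
def div_round(n, d):
    """Integer division rounded to nearest, halves away from zero (d > 0)."""
    q, r = divmod(abs(n), d)
    if 2 * r >= d:
        q += 1
    return q if n >= 0 else -q


def scale_field_mv(mv, k, m, ref_is_field1):
    """Scale the transmitted vector (half-pel units) to the different-parity field.

    k: half the field distance to the same-parity reference (distance = 2*k).
    m: field distance between the predicted field and the different-parity field.
    """
    x, y = mv
    big_k = div_round(m * 16, k)
    e = 1 if ref_is_field1 else -1  # vertical adjustment between the two fields
    return div_round(x * big_k, 32), div_round(y * big_k, 32) + e


def dual_prime_vector(mv, dmv, same_parity, k, m, ref_is_field1):
    """Vector actually used for one Dual-prime field prediction."""
    if same_parity:
        return mv  # transmitted vector used unchanged
    if dmv[0] not in (-1, 0, 1) or dmv[1] not in (-1, 0, 1):
        raise ValueError("differential coordinates must be -1, 0 or +1")
    sx, sy = scale_field_mv(mv, k, m, ref_is_field1)
    return sx + dmv[0], sy + dmv[1]
```

With k = 1, a transmitted vector (6, 4) scales to (3, 1) for m = 1 with field 2 as reference ("scaled-down") and to (9, 7) for m = 3 with field 1 as reference ("scaled-up").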

5.3.4. Prediction of Chrominance Blocks

The motion vector used for chrominance is obtained from the luminance Dual-prime motion vector with precisely the same rule as in the case of field-based prediction (for 4:2:0: divide each coordinate by 2, as described in section 5.2.2.1 of the TM). The rules of prediction are the same as for luminance.

