Frequently Asked Questions
about MPEG Splicing and the SMPTE Splicing Standard
December 8, 1997
Norm Hurst, Katie Cornog
What is splicing?
Splicing is the process of connecting the end of one MPEG stream (the "Old" stream) to the beginning of another (the "New" stream) to create a stream that may be decoded with minimal artifacts at the splice point. If the splice results in an unbroken sequence of frames, it is called a "seamless" splice. Otherwise it is called a "non-seamless" splice. The "end" of the Old stream may occur in the middle of a larger stream, as may the "beginning" of the New stream.
Elementary streams may be spliced, or PES streams may be spliced, but the proposed standard addresses the splicing of transport streams. Program streams, as used for example with DVD, are not considered by the proposed standard.
What is spliced?
Programs carried in MPEG-2 Transport streams. A program is a set of data streams such as video, audio, and related data that share a common timebase. Individual PID streams may also be spliced. Splicing occurs at transport packet boundaries.
Can this splicing standard be used with ATSC streams?
Can this splicing standard be used with non-ATSC MP@ML streams?
What does the proposed standard describe?
The proposed standard describes constraints for MPEG-2 transport streams that enable them to be spliced without modifying the payload of the PES packets contained therein.
What does the proposed standard not describe?
It does not describe how to create a splicing device, how to synchronize two streams for splicing, how often splice points should occur in a stream, or how to cook a chicken.
What is a "splice point"?
A splice point is a location in a bitstream that provides an opportunity to splice. It is an attribute of an individual PID stream, not of a program; each PID of a program must have a splice point which corresponds to each splice point in the PCR_PID. Such a set of corresponding splice points is called a "program splice point" (see below). A splice point is not a command to splice, any more than the vertical interval is a command to splice. A splice point is marked by syntax in a transport packet. The "point" itself is an imaginary point between two packets of a particular PID stream.
How often do splice points occur?
This is to be determined by users, and is not specified by the SMPTE proposed standard. The maximum rate is limited by GOP structure, and the minimum rate is zero. Splice Points may occur at irregular intervals. A commercial might have one at the beginning and one at the end. A live event might insert them regularly, perhaps once every second, or once every GOP. A commercial might fade to black and then have a splice point for each of the last 15 frames.
Is a splice point a "command" to splice?
No. It merely represents an opportunity to splice.
What are "In Points" and "Out Points"?
They are the two different types of splice points. An Out Point is a location where the underlying elementary stream is well-constrained for a clean exit: immediately after an I or P frame (in presentation order). An In Point is a location where the stream is well-constrained for a clean entry: just before a sequence header and I frame at a closed GOP (no prediction is allowed back before the In Point). An In Point is always co-located with an Out Point (except at the start of a stream); however, an Out Point is not necessarily co-located with an In Point.
What are "In Point Packets" (IPPs) and "Out Point Packets" (OPPs)?
An Out Point Packet is the packet of a PID stream immediately preceding an Out Point. All OPPs have splicing_point_flag = 1, splice_countdown = 0, and seamless_splice_flag = 1 (even for non-seamless out points, since non-seamless out points require DTS_next_AU to be present).
An In Point Packet is the packet of a PID stream immediately following an In Point. All IPPs have splicing_point_flag = 1, splice_countdown = -1, and seamless_splice_flag = 1 (even for non-seamless in points, since non-seamless in points require DTS_next_AU to be present). In addition, IPPs must have payload_unit_start_indicator = 1, random_access_indicator = 1, and data_alignment_indicator (in the PES header) = 1.
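As a rough illustration of how a splicing device might recognize these packets from the transport header alone, the Python sketch below reads the adaptation field flags and splice_countdown using the field layout of the MPEG-2 Systems transport packet. The function names and the returned labels are invented for this example. It is only a partial check: it ignores seamless_splice_flag and the data_alignment_indicator in the PES header.

```python
from typing import Optional


def splice_countdown(packet: bytes) -> Optional[int]:
    """Return the signed splice_countdown, or None if the packet carries none."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("expected a 188-byte packet starting with sync byte 0x47")
    adaptation_present = packet[3] & 0x20   # upper bit of adaptation_field_control
    if not adaptation_present or packet[4] == 0:
        return None                         # no adaptation field, or a zero-length one
    flags = packet[5]
    if not flags & 0x04:                    # splicing_point_flag
        return None
    pos = 6
    if flags & 0x10:                        # PCR_flag: skip the 6-byte PCR
        pos += 6
    if flags & 0x08:                        # OPCR_flag: skip the 6-byte OPCR
        pos += 6
    raw = packet[pos]
    return raw - 256 if raw >= 128 else raw  # splice_countdown is a signed 8-bit value


def classify(packet: bytes) -> str:
    """Label a packet as a candidate OPP, a candidate IPP, or neither."""
    countdown = splice_countdown(packet)
    if countdown == 0:
        return "out point packet"
    if countdown == -1:
        starts_payload_unit = bool(packet[1] & 0x40)  # payload_unit_start_indicator
        random_access = bool(packet[5] & 0x40)        # random_access_indicator
        if starts_payload_unit and random_access:
            return "in point packet"
    return "other"
```

A packet with splice_countdown = -1 but without the other IPP flags is reported as "other", since it does not meet the In Point Packet requirements listed above.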
How do "seamless" and "non-seamless" splice points differ?
Seamless splice points require that the underlying video stream be encoded with a specific buffer delay at the splice point, but non-seamless splice points do not require this.
How can you tell a seamless splice point from a non-seamless one?
If the seamless_splice_flag in the transport packet header is ‘1’ and the splice_type is not ‘0xF’, the splice point is seamless. splice_type of ‘0xF’ indicates an ‘unknown’ splice type. This case is allowed so that DTS_next_AU may be carried in non-seamless splices to allow splice calculations to be made without having to parse the PES packet header.
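The test described above can be sketched in Python as follows. The sketch assumes the five-byte layout that MPEG-2 Systems uses for the seamless splice fields in the adaptation field extension (a 4-bit splice_type followed by the 33-bit DTS_next_AU split into three pieces, each followed by a marker bit). The function names are made up for this example, and the encoder is included only so the decoder can be exercised.

```python
from typing import Tuple

DTS_MASK = (1 << 33) - 1  # DTS_next_AU is a 33-bit count


def encode_seamless_field(splice_type: int, dts_next_au: int) -> bytes:
    """Pack splice_type and DTS_next_AU into the 5-byte seamless splice field."""
    dts = dts_next_au & DTS_MASK
    high = ((splice_type & 0x0F) << 4) | (((dts >> 30) & 0x07) << 1) | 1
    mid = (((dts >> 15) & 0x7FFF) << 1) | 1   # bits 29..15 plus marker bit
    low = ((dts & 0x7FFF) << 1) | 1           # bits 14..0 plus marker bit
    return bytes([high]) + mid.to_bytes(2, "big") + low.to_bytes(2, "big")


def decode_seamless_field(field: bytes) -> Tuple[int, int]:
    """Return (splice_type, DTS_next_AU) from the 5-byte seamless splice field."""
    if len(field) != 5:
        raise ValueError("the seamless splice field is exactly 5 bytes long")
    splice_type = field[0] >> 4
    dts = ((field[0] >> 1) & 0x07) << 30
    dts |= (int.from_bytes(field[1:3], "big") >> 1) << 15
    dts |= int.from_bytes(field[3:5], "big") >> 1
    return splice_type, dts


def is_seamless(seamless_splice_flag: int, splice_type: int) -> bool:
    """Seamless only if the flag is set and the splice_type is not 'unknown' (0xF)."""
    return seamless_splice_flag == 1 and splice_type != 0xF
```

Note that a non-seamless splice point still yields a usable DTS_next_AU from decode_seamless_field; only the splice_type of 0xF marks it as non-seamless.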
If the buffer delay is not controlled at a splice point, doesn't a non-seamless splice risk buffer overflow?
It is possible to avoid buffer overflow if the splicer inserts an appropriate amount of delay (dead time) between the end of the old stream and the start of the new stream. This gives the buffer time to drain down to a level that is safe. The required amount of new-stream delay may be calculated by the splicing device from data carried in the transport stream headers of the old stream at the time of the splice.
Can the display time of the first frame of the new stream be controlled exactly (to the frame)?
For seamless splices, yes, since this is part of the definition of a Seamless Splice. With non-seamless splicing, since the splicing device delays the new stream by an amount that depends on how the old stream ends, the exact start time may vary by a number of frames.
Is audio spliced seamlessly?
The proposed standard does not prevent it, but much depends on the structure of the audio stream. If the audio is organized into frames, but the audio frame size does not exactly match the video frame size (e.g. 1/30 sec), then the boundaries of the audio and video frames will be rarely if ever aligned. This is the case with both AC-3 and MPEG-2 audio. Moreover, if the Old and New video frame boundaries of two programs are aligned, the audio frames of the two programs will likely not be aligned to each other. The result is a "gap" in the audio frame sequence which will be one audio frame or less in length. This will usually cause the audio buffer to become empty. An audio system with a frame size which matches the video frame size may be spliced seamlessly.
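To give a feel for how rarely the boundaries coincide, here is a small Python calculation. The numbers are assumptions chosen for illustration, not values from the proposed standard: a 32 ms audio frame (an AC-3 frame of 1536 samples at 48 kHz) and an NTSC video frame of 1001/30000 second, both expressed in 90 kHz ticks.

```python
from math import gcd

TICKS_PER_SECOND = 90_000


def realignment_period(video_frame_ticks: int, audio_frame_ticks: int) -> int:
    """Ticks between moments where an audio and a video frame boundary coincide."""
    common = gcd(video_frame_ticks, audio_frame_ticks)
    return video_frame_ticks * audio_frame_ticks // common  # least common multiple


VIDEO_FRAME = 3003   # assumed: 1001/30000 s in 90 kHz ticks
AUDIO_FRAME = 2880   # assumed: 32 ms in 90 kHz ticks

period = realignment_period(VIDEO_FRAME, AUDIO_FRAME)
print(period // VIDEO_FRAME, "video frames span", period // AUDIO_FRAME, "audio frames")
print("seconds between aligned boundaries:", period / TICKS_PER_SECOND)
```

Under these assumptions the boundaries line up only once every 960 video frames, which is roughly once every 32 seconds within a single stream.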
Does splicing introduce significant delay in the signal path?
Not necessarily. If the new stream is ready to start when the Out Point of the Old stream arrives, then the new stream can be cut in without delay. This presumes control over the start time of the new stream. However, if there is no control over either of the streams and no coordination of the location of splice points, a splicing device may use buffers to attempt to perform stream synchronization on the fly, and this may introduce delay.
How are spliceable streams different from streams that are not spliceable?
Spliceable streams contain one or more splice points. If a video splice point is of the "seamless" variety, then that video elementary stream must meet certain VBV buffer constraints: it must have a particular decoding delay at the splice point.
Is AC-3 supported?
Yes, in the sense that the Proposed Standard does not explicitly mention any particular audio compression scheme.
Is MPEG audio supported?
Yes, in the sense that the Proposed Standard does not explicitly mention any particular audio compression scheme.
Is uncompressed audio supported?
Yes, uncompressed audio could be used, and if the length of the audio frames matched the length of the video frames, then the so-called audio "gaps" at the splice could be avoided.
Gaps? What are audio gaps?
Because most audio compression systems deal with "frames" of audio which are slightly different in display duration than video frames (e.g. 32 ms vs. 33 ms) the frame boundaries almost never line up in a given stream. And the audio frames in a second stream will likely not align with the audio frames in the first. As a result you can butt-edit the audio or the video frames but not both. If you choose to align video frames, you can either have two overlapping audio frames or a gap between two audio frames. The first represents an audio buffer overflow (very bad) and the second represents a buffer underflow (not as bad).
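The two choices can be put into numbers with a short Python sketch. It assumes that the Old audio has already been cut at one of its own frame boundaries, that the New audio frames all have the same duration, and that all times are in 90 kHz ticks. The function and parameter names are hypothetical.

```python
from typing import Tuple


def audio_join_options(old_audio_end: int, new_audio_start: int,
                       audio_frame: int) -> Tuple[int, int]:
    """Return (gap, overlap) in ticks for the two ways of joining the audio.

    old_audio_end   : time at which the last kept Old audio frame ends
    new_audio_start : start time of any one New audio frame (fixes the phase)
    audio_frame     : duration of one audio frame

    gap     : silence left if the New audio begins at its first frame
              boundary at or after old_audio_end
    overlap : amount by which the frame just before that one would
              overlap the Old audio
    """
    gap = (new_audio_start - old_audio_end) % audio_frame
    overlap = (audio_frame - gap) % audio_frame
    return gap, overlap


# Example with an assumed 32 ms (2880-tick) audio frame.
print(audio_join_options(10_000, 10_500, 2880))
```

Either way the discrepancy is less than one audio frame, and gap plus overlap always add up to exactly one frame unless the boundaries happen to coincide.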
The standard defines "Program Splice Points" as groups of splice points in the various PIDs of a program which correspond in presentation time. Do I have to splice all the PID streams there?
No. Individual PID streams may be spliced at different times.
Can split audio edits be performed?
Yes. Individual PID streams may be spliced at different times.
Can a voiceover be performed by replacing just the audio stream?
Yes. Individual PID streams may be spliced.
Are multi-program transport streams supported?
Yes: any number of PID streams may be removed and replaced by splicing.
Is statmux supported?
Yes, in the sense that a stream may be removed and replaced by splicing, although splicing does not solve the problem of fitting a higher bitrate stream in place of a lower rate stream.
Can I splice streams of different bitrates?
Yes. If the new stream is a lower bitrate, then there is no danger of overflow. If the new stream is a higher bitrate, there is a danger of overflowing the VBV buffer during the time when the new stream is entering the buffer and old stream bits still remain in the buffer. A seamless Out Point is carefully constrained during encoding to protect against overflow during this time (up to some max_splice_rate). Non-seamless Out Points are not constrained in this way and the splicing device must protect against overflow.
Can streams of different picture sizes and aspect ratios be spliced? E.g. 1920x1080 to 704x480?
Yes. MPEG requires the video elementary stream to contain the four-byte sequence_end_code between these streams. The Out Point Packet (OPP) contains exactly four bytes of stuffing (zeros), so a splicing device may insert the sequence_end_code by simply swapping the OPP packet.
Can streams of different frame rates be spliced? E.g. 24 and 30? Can this be done seamlessly?
Yes. Again, a sequence_end_code is required between streams when changing any of the values in the sequence_header (except quant matrices). For streams conformant with the SMPTE splicing constraints, inserting this sequence_end_code (SEC) is simply a matter of swapping the OPP with one that contains an SEC.
The presentation process at a change of frame rate should be timed to the decode process (and not the other way around). Consider this sequence. Note the relative spacing showing the frame rate change:
          24 Hz                  -splice-           30 Hz
I0   P3   B1   B2   P6   B4   B5   |   i0  p3  b1  b2  p6  b4  b5
The MPEG rule for decode/present is that an anchor frame (I or P) is presented when the next anchor frame is decoded, and B frames are decoded and presented simultaneously.
decode: I0 P3 B1 B2 P6 B4 B5 i0 p3 b1 b2 p6 b4 b5
present: - I0 B1 B2 P3 B4 B5 P6 i0 b1 b2 p3 b4 b5 p6
Note that the duration of P6 must be 1/30 second instead of 1/24 second. For seamless splices the decode time for i0 is determined by the display duration of B5. P6 is presented when i0 is decoded. The frame p3 must be decoded 1/30 second after i0 was decoded, and i0 must be displayed at that time. Thus the presentation of P6 ends 1/30 second after it began, although it was a frame from a 24 fps stream.
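The timing argument can be checked with a small Python model of the decode/present rule. The model is a simplification written for this FAQ example: it assumes each frame is decoded one frame period (of its own stream) after the previous frame, starts the clock at zero, and uses exact fractions of a second. The function name and data layout are invented here.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def presentation_times(decode_order: List[Tuple[str, bool, int]]) -> Dict[str, Fraction]:
    """Map frame name -> presentation time in seconds.

    decode_order holds (name, is_b_frame, frame_rate_hz) tuples in decode order.
    """
    present: Dict[str, Fraction] = {}
    waiting_anchor = None
    t = Fraction(0)
    for name, is_b_frame, rate in decode_order:
        if is_b_frame:
            present[name] = t                 # B frames are shown as they are decoded
        else:
            if waiting_anchor is not None:
                present[waiting_anchor] = t   # anchor shown when the next anchor is decoded
            waiting_anchor = name
        t += Fraction(1, rate)                # next decode is one frame period later
    if waiting_anchor is not None:
        present[waiting_anchor] = t           # last anchor shown after the final decode
    return present


old = [(n, n.startswith("B"), 24) for n in "I0 P3 B1 B2 P6 B4 B5".split()]
new = [(n, n.startswith("b"), 30) for n in "i0 p3 b1 b2 p6 b4 b5".split()]
times = presentation_times(old + new)
order = sorted(times, key=times.get)

print(" ".join(order))
print("B5 shown for", times["P6"] - times["B5"], "s")
print("P6 shown for", times["i0"] - times["P6"], "s")
```

In this model B5 is displayed for 1/24 second while P6, although it comes from the 24 Hz stream, is displayed for only 1/30 second, which matches the description above.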
Is a stream either "seamless" or "non-seamless"?
No, rather it is splice points (In Points and Out Points) which have the characteristic of being seamless or non-seamless by virtue of the constraints at that point in the stream. A given stream may contain both seamless and non-seamless splice points.
How do you perform an unscheduled "emergency" splice?
The document does not describe how to build a splicing device. Various techniques of non-seamless splicing may be used, but because there may not be "splice points" marked as such in the stream, a splicing device may need to analyze the payload of the transport stream to find I frames and other relevant data.
Can encrypted streams be spliced without first decrypting them?
This breaks down into the following three sub-questions:
Can a splicer calculate buffer delays without being able to see anything but transport headers?
Yes, since all splice point packets (seamless and non-seamless) carry DTS_next_AU.
Can the encryption keys be sent "soon enough" after a splice?
This depends on the particular encryption system.
What about timestamps in encrypted-stream splicing?
MPEG requires that no PTS or DTS refer to a previous PCR clock after a PCR discontinuity, but at a splice there is an overlap of about 200 ms where the audio may contain an old PTS after the video has been spliced. If the audio PTS were in the clear, a splicer could remove or restamp the offending PTS, but it cannot do this on an encrypted stream. Encrypted streams might need to be created so as to avoid having audio PTSs in this overlap period, or at least to send these timestamps in the clear. The proposed standard does not require this.
How are two streams to be synchronized for splicing?
This is outside the scope of the proposed standard. However, the solution requires making an Old stream Out Point and a New stream In Point arrive simultaneously at the splicing device. Solutions to this problem should be determined at the studio architecture level.
Can you insert a pre-encoded stream into a live stream? Can you switch from one live stream to another live stream? Can you splice two pre-encoded streams?
The answers to these questions hinge on whether there is an Out Point in the Old stream where you need to go out, and whether you have control over the start time of the New stream (e.g. control of a local server). Networks and affiliates/headends should make agreements on timing of splice points. See "How often do splice points occur?", above.
How are PATs and PMTs handled at a splice?
The proposed standard does not address these issues, but splicing devices should be careful to avoid creating non-compliant streams. If a splice causes the contents of a transport stream, including the PID values within the transport stream, to change, then the changes must be reflected in valid PSI packets. Splicing devices are responsible for sending any alterations to the PMT required to accommodate changes in the number of PIDs after a splice. However, in order to prevent "commercial killing" devices from taking advantage of changes in PSI, systems with splicing are encouraged to avoid changes to the PMT by reusing existing video and audio PIDs after a splice.
Why does Table 4 use a lower value for ATSC max_splice_rate than MPEG's MP@HL?
The max_splice_rate in MPEG is the ratio of the maximum vbv_buffer_size for that profile/level to the splice_decoding_delay for the given splice_type. ATSC specifies a smaller vbv_buffer_size than Main@High, but that only lowers the max_splice_rate from 38 to 31 Mbps. Today, however, it is rather difficult to meet even this lower max_splice_rate constraint with a 19.39 Mbps stream, and since almost no one will initially be using a rate higher than 19.39 Mbps, the max_splice_rate has been chosen to maximize the picture quality for the most users.
How can the OPP contain both the last byte of the picture data and also be defined as 0x00000000?
MPEG does not allow sequence_header values to change in mid-sequence. This includes picture size and frame rate. In order to splice streams of different picture size or frame rate, the old stream needs to be "ended" by inserting a sequence_end_code, which is four bytes long. Just appending this code would upset the VBV buffer accounting, so to keep the total bitcount the same, we must remove four bytes in order to add the sequence end code. Since we don't want to remove four bytes that are important, the proposed standard requires that at least the last four bytes at the out point be stuffing bytes. Then these four bytes can be replaced with a sequence_end_code without changing the bitcount for the picture.
This change is further facilitated by requiring that these four bytes be in their own packet, so a splicer only needs to swap a packet to insert a sequence_end_code.
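A minimal Python sketch of the packet swap is shown below. It assumes the OPP's payload consists of just the four stuffing bytes, so that they are the last four bytes of the 188-byte packet, and it uses the MPEG-2 video sequence_end_code value 0x000001B7. The function name is hypothetical, and a real splicer might instead keep a pre-built replacement packet.

```python
SEQUENCE_END_CODE = bytes([0x00, 0x00, 0x01, 0xB7])  # MPEG-2 video sequence_end_code
STUFFING = bytes(4)                                   # four zero bytes


def opp_with_sequence_end(opp: bytes) -> bytes:
    """Return a copy of the Out Point Packet carrying a sequence_end_code.

    The four stuffing bytes are overwritten in place, so the packet length
    and the bit count seen by the VBV buffer model do not change.
    """
    if len(opp) != 188 or opp[0] != 0x47:
        raise ValueError("expected a 188-byte packet starting with sync byte 0x47")
    if opp[-4:] != STUFFING:
        raise ValueError("packet does not end in four stuffing bytes")
    return opp[:-4] + SEQUENCE_END_CODE
```

Because only the last four bytes differ, the transport header and adaptation field of the swapped packet are identical to those of the original OPP.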
What about broken_link?
The standard prohibits the use of backward prediction by B pictures which immediately follow an In Point. Splicers are not required to support broken_link in any way.
Do the audio and video splice points occur at the same point in the stream?
Usually not, because the video usually requires substantially more time to decode, and rather than have extra audio delay in every decoder, this delay is essentially placed at the encoder so that the video appears earlier in the stream, on the order of 200 ms. It is very much like the sound track along the side of a piece of movie film: because the projector's sound playback head is several inches downstream of the film gate, the sound track is displaced from the corresponding frames by several inches. The difference is that in MPEG this displacement varies with VBV buffer fullness.
What is "restamping"?
Imagine a server with many bitstream files on it, perhaps commercials, and each one begins with a PCR value of zero. Now we splice one after the other. At each splice the PCR suddenly resets to zero!
There are two options in this case. One is to declare a timing discontinuity by setting the discontinuity_indicator flag to '1' in the IPP after the splice. This bit tells decoders to jam their PCR PLL counters with the next PCR value rather than trying to slew the VCO to catch up. MPEG compliance requires that decoders perform well through timing discontinuities.
Another solution is for the splicing device to change all the timing values in the new stream (PCRs, PTSs, and DTSs) to correct for the offset induced by the splice. This process, called "restamping", eliminates the timing discontinuity (although it does not compensate for a change in clock rate). It is more complex than just setting the discontinuity_indicator at the splice point, and it is not possible for encrypted streams since PTS and DTS values are encrypted. However, the result is a timing sequence with no breaks.
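The arithmetic of restamping can be sketched in Python as follows. The sketch handles only the 33-bit, 90 kHz values (PTS, DTS, and the base part of the PCR) and leaves the 27 MHz PCR extension alone. The function names, and the idea of deriving the offset from the time the Old stream would have reached next, are assumptions made for this example.

```python
WRAP = 1 << 33  # PTS, DTS and the PCR base are 33-bit counters of 90 kHz ticks


def splice_offset(expected_next: int, first_new: int) -> int:
    """Offset that maps the New stream's first time value onto the Old timeline.

    expected_next : the value the Old stream's clock would have shown at the splice
    first_new     : the corresponding value in the New stream (zero for the
                    server files in the example above)
    """
    return (expected_next - first_new) % WRAP


def restamp(timestamp: int, offset: int) -> int:
    """Apply the splice offset to one time value, wrapping at 33 bits."""
    return (timestamp + offset) % WRAP


# Example: the Old stream is 10 seconds in, the New file starts at zero.
offset = splice_offset(expected_next=900_000, first_new=0)
print(restamp(0, offset), restamp(3003, offset))
```

The same offset has to be applied to every PCR, PTS, and DTS in the New stream so that their relationships to one another are preserved.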
Is PCR/PTS/DTS restamping after splicing required?
No. (Restamping refers to changing all the time-sample values that appear in a stream, usually by adding an offset to each one.) Restamping at a splice can be done to avoid causing a PCR (timing) discontinuity at the splice. Alternatively, the "discontinuity_indicator" flag may be set right after the splice to indicate to decoders that they should reset their local PLL counters. If the stream is encrypted, then restamping is impossible since the PTS and DTS values are carried in the PES headers, which are encrypted as part of the transport payload. PCRs could be restamped, but that would make the un-restamped PTSs and DTSs invalid.
Is setting the discontinuity_indicator at a splice sufficient to assure MPEG compliance with regard to timestamps?
No. PCRs are carried in only one PID of each program, called the PCR_PID (often it is the same PID that carries the video). MPEG requires that once a PCR discontinuity is declared in that PID stream, no other PID stream thereafter may carry a PTS or DTS that refers to the old PCR.
Now consider that the video is sent perhaps 200 ms earlier in the transport stream than its corresponding audio. After splicing to the new video PID (and declaring a PCR discontinuity) there remains about 200 ms of old audio packets multiplexed with the new video. If any of these audio packets contains a PTS value that refers to the old PCR clock, it must either be restamped or removed. Otherwise the stream will not conform to MPEG requirements.
How is closed captioning handled across a splice?
A new closed captioning message must begin at an In Point. The closed caption decoder sees the beginning of a new caption and terminates the previous message if it has not been completely received.
Can the splicing standard work with data streams?
Yes, if they are carried in PES packets.
Are both CBR and VBR VBV modes supported?
Yes, for both seamless and non-seamless splicing. VBR streams (vbv_delay = 0xFFFF) initialize the VBV buffer by completely filling it at the bitrate specified in the sequence_header() before decoding the first picture, so the startup delay is the vbv_buffer_size divided by that bitrate. The bitrate should therefore be chosen to yield the required splice_decoding_delay, i.e. it should be the rate that will fill the VBV buffer from empty in splice_decoding_delay seconds. For most profile/levels, this rate will be the max_splice_rate.
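The relationship between buffer size, bitrate, and startup delay can be written out in a few lines of Python. The numbers in the example are illustrative assumptions only: 1,835,008 bits is the MP@ML maximum VBV buffer size, and the 250 ms delay is a made-up value rather than one taken from the proposed standard's tables. The function names are likewise invented.

```python
from fractions import Fraction


def startup_delay(vbv_buffer_bits: int, bitrate_bps: int) -> Fraction:
    """Seconds needed to fill the VBV buffer from empty at the given bitrate."""
    return Fraction(vbv_buffer_bits, bitrate_bps)


def rate_for_delay(vbv_buffer_bits: int, splice_decoding_delay: Fraction) -> Fraction:
    """Bitrate (bits/s) that fills the buffer in exactly splice_decoding_delay seconds."""
    return Fraction(vbv_buffer_bits) / splice_decoding_delay


VBV_BITS = 1_835_008       # MP@ML maximum vbv_buffer_size, in bits
DELAY = Fraction(1, 4)     # hypothetical splice_decoding_delay of 250 ms

rate = rate_for_delay(VBV_BITS, DELAY)
print("sequence_header bitrate:", rate, "bits/s")
print("startup delay check:", startup_delay(VBV_BITS, int(rate)), "s")
```

Exact fractions are used so that the round trip from delay to rate and back gives the same delay without rounding error.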
Can VBR and CBR streams be spliced together?
Can splicing devices be cascaded? Can splice points be reused?
Yes, splice points remain after a splicing operation.
Do any MPEG syntax elements take on extended semantics in the proposed standard?
If the splice_countdown is -1 and random_access_indicator is 1, this indicates that an I frame follows (it is an In Point). Otherwise (and this is the more significant point) it means that the constraints for an In Point are not met.
How is splice_countdown used?
When it is '0' it means that an Out Point follows the packet. When it is '-1' and data_alignment_indicator = 1, it means that an In Point precedes the packet.
How do repeat_first_field and top_field_first affect splicing?
The last displayed field at an Out Point must be a bottom field. The first field after an In Point must be a top field.
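How the two flags determine the field order can be sketched in Python. The sketch assumes interlaced display of frame pictures (progressive_sequence = 0), where top_field_first selects which field is shown first and repeat_first_field adds a third field that repeats the first one. The helper names are hypothetical.

```python
from typing import List


def displayed_fields(top_field_first: bool, repeat_first_field: bool) -> List[str]:
    """Fields of one frame picture in display order."""
    first, second = ("top", "bottom") if top_field_first else ("bottom", "top")
    fields = [first, second]
    if repeat_first_field:
        fields.append(first)      # the first field is shown again as a third field
    return fields


def ok_before_out_point(top_field_first: bool, repeat_first_field: bool) -> bool:
    """True if the last displayed field of this frame is a bottom field."""
    return displayed_fields(top_field_first, repeat_first_field)[-1] == "bottom"


def ok_after_in_point(top_field_first: bool, repeat_first_field: bool) -> bool:
    """True if the first displayed field of this frame is a top field."""
    return displayed_fields(top_field_first, repeat_first_field)[0] == "top"
```

For example, a frame with top_field_first = 1 and repeat_first_field = 1 displays top, bottom, top, so under these assumptions it could not be the last frame displayed before an Out Point.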
Do splice points enable commercial killers?
It should be a simple matter to remove splicing information from a stream before it is aired. However, any discontinuity will be a clue to a commercial killer, and even seamless splices will probably have an audio discontinuity.
Can audio-only transport streams be spliced (streams without video)?
What is the Splice Information Section?
The Splice Information Section is a mechanism for carrying splicing commands in the transport stream. Each Splice Information Section contains information about a splice event. The splice event may be one of three types: schedule, preroll or execute.
What’s the difference between schedule, preroll and execute splicing commands?
The schedule command is used to tell a downstream splicer a schedule of upcoming splices. The schedule command must be followed by a separate execute command for each splice event. The preroll command is optional. It gives a relative time warning of a splice. Prerolls are sent a few seconds prior to the arrival of the splice point and may be sent repeatedly as a countdown to the splice execute command. The execute command is the only command that actually causes a splice to occur.
How is the correct splice point identified in the splice command?
The command contains a splice_time field. For the execute command the splice_time value must be given in 90 kHz clock ticks. For an Out Point, the splice_time value is set to be the same as the value in the DTS_next_AU field of the Out Point Packet of the PCR_PID. For an In Point, the splice_time value is set to be the same as the value in the DTS_next_AU field of the In Point Packet. The splice_time value may be optionally given as a SMPTE time code in addition to the DTS value. For the schedule command, time code alone may be used.
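Matching an execute command to its splice point then comes down to comparing two numbers. The Python sketch below assumes the splicer has already collected the DTS_next_AU value of each splice point packet it has seen, and the list-of-pairs data layout and function name are invented for this example.

```python
from typing import Iterable, Optional, Tuple


def find_splice_packet(splice_time: int,
                       candidates: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the splice point packet named by an execute command.

    splice_time : value from the command, in 90 kHz ticks
    candidates  : (packet_index, dts_next_au) pairs for the splice point
                  packets observed in the stream
    """
    for packet_index, dts_next_au in candidates:
        if dts_next_au == splice_time:
            return packet_index
    return None   # no splice point with that DTS_next_AU was seen


# Example: three Out Point Packets, one second (90,000 ticks) apart.
seen = [(120, 900_000), (5_400, 990_000), (10_950, 1_080_000)]
print(find_splice_packet(990_000, seen))
```

If no packet matches, the sketch simply returns None; what a real device should do in that case is outside what this FAQ describes.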
Where is the Splice Information Section carried in the stream?
The Splice Information Section is carried on a per program basis in a separate PID pointed to by the program’s PMT. The commands contained in that PID apply to only that program.