Exactly this. It isn’t even really “stitching” as YouTube videos are served as a series of short chunks anyways. So you simply tell the player that there are a few extra chunks which happen to be ads. There is no video processing required it is basically free to do it this way on the sever side.
That is true. But then you could probably use the chunk length to determine where the ads starts and ends since there is with a very high probability an unusually long chunk at those times.
I don’t know about YouTube but the chunks are often a fixed length. For example 1 or 2 seconds. So as long as the ad itself is an even number of seconds (which YouTube can require, or just pad the add to the nearest second) so there is no concrete difference between the 1s “content” chunks vs the 1s “ad” chunks.
If you are trying to predict the ad chunks you are probably better off doing things like detecting sudden loudness changes, different colour tones or similar. But this will always be imperfect as these could just be scene changes that happened to be chunk aligned in the content.
Exactly this. It isn’t even really “stitching” as YouTube videos are served as a series of short chunks anyways. So you simply tell the player that there are a few extra chunks which happen to be ads. There is no video processing required it is basically free to do it this way on the sever side.
That is true. But then you could probably use the chunk length to determine where the ads starts and ends since there is with a very high probability an unusually long chunk at those times.
I don’t know about YouTube but the chunks are often a fixed length. For example 1 or 2 seconds. So as long as the ad itself is an even number of seconds (which YouTube can require, or just pad the add to the nearest second) so there is no concrete difference between the 1s “content” chunks vs the 1s “ad” chunks.
If you are trying to predict the ad chunks you are probably better off doing things like detecting sudden loudness changes, different colour tones or similar. But this will always be imperfect as these could just be scene changes that happened to be chunk aligned in the content.