GSoC Red Hen Lab — Week 12

Harshith Mohan Kumar
4 min read · Sep 10, 2022
GSoC 2022 Red Hen Lab

Pipeline for Multimodal Television Show Segmentation

Hi, welcome back! If you haven’t already read the introductory article covering the community bonding period, I strongly recommend you do so before continuing.

As sad as it may be, this week marks the final week of the coding phase and the closing chapter of my wonderful GSoC journey. Over these twelve weeks, I’ve developed and refined many new skills, and I’m forever grateful to my mentors for giving me this opportunity. This week, the main focus was on wrapping up the project: refining the documentation and producing visualizations and performance metrics for the final analysis of the pipeline.

Goals for the week

  1. Map out the outputs for an individual file across the various stages of the pipeline.
  2. Improve documentation of the codebase.

Work Done

In order to track the outputs of the various stages for an individual mp4 file, I wrote a Jupyter notebook to extract and store them. Starting with the music segmentation stage: since all of the files had already been processed, all I had to do was open the CSV it produced.

The following screenshot displays the start and stop timestamps where music was found in the audio. This output was collected for the file 1996-08-01_0000_US_00017469_V2_VHS52_MB19_E4_MB.mp4.

Music segmentation output
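
For readers who want to reproduce this step, here is a minimal sketch of how such a CSV could be loaded and inspected with pandas. The file path and the column names `start` and `stop` are assumptions; the actual pipeline output may name them differently.

```python
# Minimal sketch (assumed file path and column names) for inspecting the
# music segmentation output of a single video.
import pandas as pd

csv_path = "music_segmentation/1996-08-01_0000_US_00017469_V2_VHS52_MB19_E4_MB.csv"
segments = pd.read_csv(csv_path)

# Each row is assumed to hold one detected music interval, in seconds.
for _, row in segments.iterrows():
    print(f"music from {row['start']:.2f}s to {row['stop']:.2f}s")
```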

During the next stage, five keyframes are selected from the range between the start and stop timestamps. The image classification model predicts whether each keyframe belongs to a commercial or a title sequence, and the confidences are averaged to produce the final decision for the interval’s overall prediction.
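
To make this step concrete, here is a rough sketch, assuming OpenCV for frame extraction and a hypothetical classify_frame helper standing in for the image classification model; it is a simplified illustration, not the pipeline’s exact implementation.

```python
# Hedged sketch: extract five evenly spaced keyframes from a music interval
# and average a classifier's confidence over them. `classify_frame` is a
# hypothetical stand-in for the image classification model.
import cv2
import numpy as np

def extract_keyframes(video_path, start_s, stop_s, n_frames=5):
    cap = cv2.VideoCapture(video_path)
    frames = []
    for t in np.linspace(start_s, stop_s, n_frames):
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)  # seek to the timestamp (ms)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

def average_decision(frames, classify_frame):
    # classify_frame(frame) -> (label, confidence): hypothetical interface
    results = [classify_frame(f) for f in frames]
    labels = [label for label, _ in results]
    confidences = [conf for _, conf in results]
    # majority label, with the mean confidence reported as the overall score
    final_label = max(set(labels), key=labels.count)
    return final_label, float(np.mean(confidences))
```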

The following screenshot depicts the keyframes extracted from the timestamps. Although the images are very small, if you zoom in on each keyframe you can see the classification output displayed at the top along with its corresponding confidence score.

Keyframes extracted from the music segmentation intervals

These keyframes are then passed to the clustering phase, where the RNN-DBSCAN algorithm assigns a cluster to each keyframe. The following screenshot shows the index values of the keyframes within a particular cluster.

Jupyter notebook output of index values of keyframes within a cluster.
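
The snippet below is a simplified stand-in for this step: since RNN-DBSCAN is not part of scikit-learn, plain DBSCAN is used here instead, and the features matrix is a placeholder for the keyframe embeddings.

```python
# Simplified stand-in: scikit-learn's DBSCAN is used in place of RNN-DBSCAN,
# and `features` is a placeholder (n_keyframes, n_dims) embedding matrix.
from collections import defaultdict

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 64))  # placeholder keyframe embeddings

labels = DBSCAN(eps=12.0, min_samples=3).fit_predict(features)

# Group keyframe indices by their assigned cluster label (-1 means noise).
clusters = defaultdict(list)
for idx, label in enumerate(labels):
    clusters[label].append(idx)

for label, indices in sorted(clusters.items()):
    print(f"cluster {label}: {indices}")
```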

These index values don’t tell us much on their own, so by backtracking I was able to find the keyframes corresponding to these indices.

Keyframe images with index values [61, 62, 63, 64, 65], starting from the leftmost image.

Notice that I’ve plotted an extra image (index=65); this is to visually illustrate the performance of the clustering algorithm.
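
A sketch of that backtracking is shown below, with a hypothetical keyframe_paths list standing in for the actual keyframe files produced by the extraction stage.

```python
# Sketch of mapping cluster indices back to keyframe images and plotting them
# side by side; `keyframe_paths` is hypothetical and would come from the
# keyframe-extraction stage in the real notebook.
import matplotlib.pyplot as plt
from matplotlib.image import imread

keyframe_paths = [f"keyframes/frame_{i:04d}.jpg" for i in range(100)]  # hypothetical
cluster_indices = [61, 62, 63, 64]   # members of one cluster
extra_index = 65                     # plotted as well, for visual comparison

fig, axes = plt.subplots(1, len(cluster_indices) + 1, figsize=(15, 3))
for ax, idx in zip(axes, cluster_indices + [extra_index]):
    ax.imshow(imread(keyframe_paths[idx]))
    ax.set_title(f"index={idx}")
    ax.axis("off")
plt.show()
```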

In addition to backtracking through the outputs, I also spent some time refining the code and adding documentation to all of the methods. This is one of the most important coding practices, since it allows others to interpret and build on your code.

Conclusion

An important conclusion to draw from this illustrative trace of the pipeline is that the Silhouette Coefficient, however low it may be, is not a good indicator of performance in this scenario. Previously I observed very low values of the Silhouette Coefficient, but by visually analyzing the keyframes within the clusters, we can see that the clustering algorithm is grouping together very similar keyframes.

It may be that the Silhouette Coefficient is being skewed by closely grouped clusters. Further analysis would need to be performed on this, which would be a potential future project.
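
For reference, here is a self-contained sketch of how the Silhouette Coefficient can be computed with scikit-learn; the features and labels below are placeholders for the keyframe embeddings and cluster assignments produced by the pipeline.

```python
# Hedged sketch: compute the Silhouette Coefficient for a clustering.
# `features` and `labels` are placeholders standing in for the keyframe
# embeddings and RNN-DBSCAN cluster assignments from the pipeline.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 64))          # placeholder embeddings
labels = DBSCAN(eps=12.0, min_samples=3).fit_predict(features)

# silhouette_score needs at least two clusters; DBSCAN's noise label (-1)
# is excluded so it does not count as a cluster of its own.
mask = labels != -1
if len(set(labels[mask])) > 1:
    score = silhouette_score(features[mask], labels[mask])
    print(f"silhouette coefficient: {score:.3f}")
else:
    print("need at least two clusters to compute the silhouette coefficient")
```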

The last few months have been quite wonderful! This Google Summer of Code project has made me realize how much I love designing and building Machine Learning-powered tools. The challenges of dealing with the enormous amounts of data and compute power required for these ML models have pushed me to explore the depths of software engineering and computer science.
