GSoC Red Hen Lab — Week 5

Jul 24, 2022

Pipeline for Multimodal Television Show Segmentation

Hi, welcome back! If you haven’t already read the introductory article based on the community bonding period, I strongly recommend you do so before continuing.

The objective for this week was to scale up the working pipeline and extract music segments for thousands of mp4 files. Additionally, I started to transition to the next step within stage one, which is filtering out false positives using image classification. The idea is to train an image classifier to detect commercials and title sequences. Since a majority of music segments also fall under commercials, classifying the frames from each segment would make it possible to filter out these false positives.

Goals for the week

Extract images from music segments.
Run the music segmentation pipeline as an array batch job on category one files.
Label extracted images as either title sequence or commercial.

Work Done

I started by submitting the existing code as an array job to the Slurm controller. Initially I ran into a few issues with the Slurm file. Since this was my first time running batch jobs, it was quite difficult to familiarize myself with the logs and job command fields. Ultimately I was not able to run the job, since the HPC Slurm controller was scheduled for maintenance later in the week and job scheduling had been stopped ahead of time.
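For readers new to array jobs, here is a minimal sketch of how a single task in a Slurm array can pick the file it should process. It is not the code from my pipeline: the function name and the example file names are hypothetical. The only Slurm-specific piece is the SLURM_ARRAY_TASK_ID environment variable, which Slurm sets for each task of a job submitted with the --array option.

```python
import os


def select_file_for_task(files, env=None):
    """Return the file this array task should process, or None if out of range."""
    env = os.environ if env is None else env
    # Slurm sets SLURM_ARRAY_TASK_ID for every task of an array job.
    # Defaulting to "0" lets the script also run outside of Slurm.
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    # Sort so that every task sees the same ordering of the file list.
    ordered = sorted(files)
    if 0 <= task_id < len(ordered):
        return ordered[task_id]
    return None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example_files = ["show_b.mp4", "show_a.mp4", "show_c.mp4"]
    print(select_file_for_task(example_files, env={"SLURM_ARRAY_TASK_ID": "1"}))
```

With this layout, each array task handles exactly one file, so a failed task can be resubmitted on its own without rerunning the others.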

The HPC was down for maintenance for three days this week, and the Slurm scheduler went down earlier than expected. Therefore, I switched to working locally on the frame extraction code. I was able to map timestamps to keyframes and display the images. From a quick visual inspection, a majority of the extracted keyframe images were from commercials and a few were from title sequences. However, since I was testing with only one video file, my results may have been skewed.
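To give a rough idea of what mapping timestamps to frames involves, here is a simplified sketch. Note that it is a stand-in and not my extraction code: it samples frames at a fixed interval inside a music segment rather than locating true codec keyframes, and the function name, the fps value and the sampling step are all assumptions made for illustration.

```python
def interval_to_frame_indices(start_second, stop_second, fps, every_n_seconds=1.0):
    """Return frame indices sampled every `every_n_seconds` in [start_second, stop_second)."""
    if stop_second <= start_second or fps <= 0 or every_n_seconds <= 0:
        return []
    indices = []
    k = 0
    while True:
        # Multiply instead of accumulating to avoid floating point drift.
        t = start_second + k * every_n_seconds
        if t >= stop_second:
            break
        # A timestamp in seconds times the frame rate gives the frame index.
        indices.append(int(round(t * fps)))
        k += 1
    return indices


if __name__ == "__main__":
    # A hypothetical 3 second music segment in a 30 fps video.
    print(interval_to_frame_indices(10, 13, 30))  # [300, 330, 360]
```

The resulting indices could then be passed to whichever video library is used to read and display the frames.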

Additionally, I started to process the commercial CSV data provided by Professor Tim Groeling. This CSV file contains several start and end timestamps of commercials for various mp4 files from 1989. I believe this data will be quite useful for extracting images for the commercial class.

The image below shows a small snapshot of how the files are stored once the music segmentation pipeline is finished. If you look closely, you can see that I’m storing the mfcc and loge audio features for each 45-minute interval.

Additionally, I store a *_feats.csv file which contains the start_second, stop_second, difflen, mfcc_path and loge_path fields. This can be observed in the screenshot provided below.
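As a small illustration of how such a file could be consumed, here is a sketch that reads a *_feats.csv with Python's csv module. The field names come from the file described above, but the loader function, the assumption that the three numeric fields parse as floats, and the sample row (including the .npy extensions and the difflen value) are made up for the example.

```python
import csv
import io


def load_feature_rows(csv_file):
    """Parse rows of a *_feats.csv style file into dictionaries."""
    rows = []
    for row in csv.DictReader(csv_file):
        rows.append({
            # Assumption: the numeric fields are stored as plain numbers.
            "start_second": float(row["start_second"]),
            "stop_second": float(row["stop_second"]),
            "difflen": float(row["difflen"]),
            "mfcc_path": row["mfcc_path"],
            "loge_path": row["loge_path"],
        })
    return rows


if __name__ == "__main__":
    # Invented sample data: one 45 minute (2700 second) interval.
    sample = io.StringIO(
        "start_second,stop_second,difflen,mfcc_path,loge_path\n"
        "0,2700,2700,example_0_mfcc.npy,example_0_loge.npy\n"
    )
    for entry in load_feature_rows(sample):
        print(entry["start_second"], entry["stop_second"], entry["mfcc_path"])
```

Keeping the feature paths in a CSV like this means later stages can look up the cached features for an interval without recomputing them.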

The pipeline also produces a filename.csv file which contains the music intervals within that mp4. This is shown in the screenshot example below.

Difficulties along the way

The problems I faced this week dealt with submitting and executing batch array jobs. Since the HPC was under maintenance, the jobs were not executing as planned. Apart from that, it was a fairly smooth week with a lot of progress.

Conclusion

Prior to the HPC maintenance, I was able to extract music segments for approximately 30 mp4 files. Using these segmentations, I developed the next feature for the pipeline. In order to separate the working music segmentation from the newly added keyframe extraction code, I created a new branch on GitHub. In the following week, I’ll continue to submit the array jobs once the HPC cluster is back online. Additionally, I’ll be extracting and labeling a lot more images and focusing on building an image classifier.