GSoC Red Hen Lab — Week 1

Harshith Mohan Kumar
Jun 15, 2022
GSoC 2022 Red Hen Lab

Pipeline for Multimodal Television Show Segmentation

Hi, welcome back! If you haven't already read the introductory article on the community bonding period, then I strongly recommend you do so before continuing.

June 13th officially marks the start of the coding phase! During this week, I’ll be focusing on setting up a Slurm job to perform music classification in a parallelized manner.

Goals for the week

I’ve compiled a list of the main tasks which I set out to complete by the end of this week.

  1. Extract category 1 (V1-V2) mp4 files and store them in a CSV
  2. Develop SLURM job
  3. Modify InaSpeechSegmenter
  4. Store audio vector representations
  5. Run music classification

Difficulties along the way

There were quite a few unforeseen challenges which popped up. The first was the process of parallelizing code. To make efficient use of the HPC cluster, my mentor Frankie suggested taking a deep dive into figuring out how to utilize array jobs. This is essentially a structure in Slurm which enables users to easily submit and run several instances of the same Slurm script independently in the queue.
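To make that concrete, here is a minimal sketch of how an array-job worker could divide the work in Python: each task launched with `sbatch --array=...` reads its `SLURM_ARRAY_TASK_ID` and takes its own slice of the file list. The CSV name and the round-robin split are my own placeholders, not the project’s actual layout.

```python
import csv
import os
import sys

# Slurm gives each array task a unique index via SLURM_ARRAY_TASK_ID
# (e.g. when the job is submitted with `sbatch --array=0-99 job.sh`).
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

# Hypothetical CSV listing the category 1 (V1-V2) mp4 files, one path per row.
with open("category1_files.csv", newline="") as f:
    all_files = [row[0] for row in csv.reader(f) if row]

# Round-robin split: task i handles files i, i + num_tasks, i + 2*num_tasks, ...
my_files = all_files[task_id::num_tasks]
print(f"Task {task_id}: {len(my_files)} of {len(all_files)} files", file=sys.stderr)

for path in my_files:
    # Placeholder for the per-file work (staging and segmentation).
    print(path)
```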

A great resource for learning about High Performance Computing (HPC) and Slurm is the HPC course from Aalto University uploaded to YouTube. The Case Western Reserve HPC documentation is also quite useful.

I had overlooked exactly how I would read/copy the files to the allocated GPU node. During my Slurm job, I’ll have to rsync my files over to the node and use its temporary directory to perform all of my operations.
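A rough sketch of that staging step, assuming the cluster exposes node-local scratch space through $TMPDIR (the exact location differs per cluster), might look like this:

```python
import os
import subprocess

def stage_to_node(src_path: str) -> str:
    """Copy one source file to the node-local temporary directory and
    return the local path. Assumes $TMPDIR points at node-local storage."""
    tmp_dir = os.environ.get("TMPDIR", "/tmp")
    dest_path = os.path.join(tmp_dir, os.path.basename(src_path))
    # -a preserves attributes; --partial lets an interrupted copy resume.
    subprocess.run(["rsync", "-a", "--partial", src_path, dest_path], check=True)
    return dest_path
```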

Conclusion

This week helped me establish a sense of direction and lay the foundational code needed to get the first stage of the pipeline functional. Along the way I’ve made a few architectural mistakes, which I plan to rectify in the following week. Significant progress was made in filtering out the category one data files and ingesting them.

I added the InaSpeechSegmenter library as a git submodule and compiled all of its dependencies within the Dockerfile. These changes required me to create a new Singularity image and test it on the HPC.

I will have to modify the featGenerator method within InaSpeechSegmenter in the following week. The issue is that the method is currently quite inefficient in the way it processes large batches of files. For my requirements, it should be able to process batches of 100 files, each consisting of 6–8 hour long videos. Therefore, it’s quite important to ensure that there won’t be any computational bottlenecks in this very first step of the pipeline.
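For reference, the straightforward (unbatched) way to run the music/speech segmentation with inaSpeechSegmenter looks roughly like the sketch below: one Segmenter call per staged file, writing the resulting (label, start, end) segments to a CSV. The output path and the one-file-at-a-time loop are my own placeholders; the whole point of the upcoming featGenerator work is to do better than this naive loop on 6–8 hour videos.

```python
import csv
from inaSpeechSegmenter import Segmenter

# Gender detection isn't needed for music-vs-speech filtering, so skip it.
segmenter = Segmenter(detect_gender=False)

def classify_music(local_path: str, out_csv: str) -> None:
    """Run inaSpeechSegmenter on one file and dump its segments to CSV."""
    segments = segmenter(local_path)  # list of (label, start_sec, end_sec)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["label", "start", "end"])
        writer.writerows(segments)
```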
