GSoC Red Hen Lab — Week 3

Harshith Mohan Kumar
3 min read · Jul 4, 2022

Pipeline for Multimodal Television Show Segmentation

Hi, welcome back! If you haven’t already read the introductory article covering the community bonding period, I strongly recommend you do so before continuing.

Although this week was cut short by travel to the IEEE Region 10 Symposium (TENSYMP 2022) — where my co-authors and I presented our work — I was able to fix a majority of last week’s issues, and to run the pipeline and extract the music labels for one mp4 file on the GPU cluster.

Goals for the week

  1. Process and store music labels for category one files in a CSV.
  2. Store audio features in the CSV as well.
  3. Fix issues with running Singularity on the GPU cluster.

Work Done

First things first: following Frankie’s advice, I changed media2feats to add the 45 segments to a queue as soon as a file was processed and its features (mspec, loge, difflen) were returned. Then, by modifying featGenerator, I was able to return the features one at a time. This allows a result to be yielded even before the entire thread has finished, and the optimization improved performance roughly threefold.
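
In code, the change boils down to a producer/consumer pattern with a generator on the consuming side. Here is a minimal sketch of the idea — not the actual media2feats/featGenerator internals; extract_features, producer, and feat_generator are illustrative stand-ins:

```python
import queue
import threading

def extract_features(media_file):
    """Stand-in for the real media2feats call: pretend each file
    yields 45 (mspec, loge, difflen) feature tuples."""
    for i in range(45):
        yield (f"mspec_{i}", f"loge_{i}", f"difflen_{i}")

def producer(files, out_queue):
    # Push each segment's features onto the queue as soon as they are
    # computed, instead of buffering all 45 segments per file.
    for f in files:
        for feats in extract_features(f):
            out_queue.put(feats)
    out_queue.put(None)  # sentinel: nothing more to come

def feat_generator(files):
    """Yield feature tuples one at a time, so downstream consumers can
    start working before the producer thread has finished."""
    q = queue.Queue()
    threading.Thread(target=producer, args=(files, q), daemon=True).start()
    while (feats := q.get()) is not None:
        yield feats

# Downstream segmentation can begin as each feature tuple arrives.
for mspec, loge, difflen in feat_generator(["show1.mp4"]):
    pass
```

The key design point is that the queue decouples the two sides: the expensive feature extraction keeps running in its thread while already-finished segments flow out immediately.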

Then I set out to fix the issue of Singularity not being able to access the .singularity cache folder. I ended up rerouting it via a symbolic link to my /scratch workspace. Since scratch is visible to the GPU nodes, Singularity no longer had trouble accessing the cache. Scratch also provides upwards of 1 TB of storage per member; the only catch is that it gets cleared every 14 days. With the Singularity issue fixed, I was able to run my code on the GPU nodes, and the speedup from the CUDA-enabled Tesla GPUs was substantial.
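
The fix itself is a one-time move-and-symlink. Here is a rough Python sketch of that step, assuming a /scratch/$USER workspace — adjust the paths to your own cluster’s layout:

```python
import os
import shutil

# Assumed paths: home cache and a per-user scratch workspace.
home_cache = os.path.expanduser("~/.singularity")
scratch_cache = f"/scratch/{os.environ['USER']}/.singularity"

if not os.path.islink(home_cache):
    os.makedirs(os.path.dirname(scratch_cache), exist_ok=True)
    if os.path.exists(home_cache):
        shutil.move(home_cache, scratch_cache)  # relocate the existing cache
    else:
        os.makedirs(scratch_cache, exist_ok=True)
    # ~/.singularity now transparently resolves to the scratch copy.
    os.symlink(scratch_cache, home_cache)
```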

I preponed my meeting with Frankie because of the conference. During this meeting, Frankie advised me to use the Memray memory profiler to determine where my program was running out of memory. We also talked about the next stage of the project, in which I’ll extract images from the music segments and annotate them to train a classifier that filters out false positives.
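
For reference, Memray can be wrapped around a single stage of the pipeline via its Tracker context manager. A minimal sketch — run_stage_one is a hypothetical stand-in for my stage-one entry point:

```python
import memray

def run_stage_one():
    # Stand-in for the pipeline's first stage; allocate something measurable.
    buffers = [bytearray(1024 * 1024) for _ in range(10)]
    return len(buffers)

# Allocations made inside the block are recorded to stage1_profile.bin.
with memray.Tracker("stage1_profile.bin"):
    run_stage_one()
```

The resulting .bin file can then be rendered into an HTML report with `memray flamegraph stage1_profile.bin`, which should show which call sites dominate memory usage.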

Thursday night, after settling into the guest house at IIT-Bombay, I attended a talk held by Mark Turner, Francis Steen, and Peter Uhrig. The mentors discussed the importance of organizing the metadata a pipeline produces in a standardized format for better reusability. This made me realize that I should structure the data I’m producing in the same standardized manner.

Difficulties along the way

Running the entire first stage of the pipeline on the GPU node raised a few errors which had been hidden in plain sight. While executing my code in a Jupyter Notebook on the GPU node, the program was terminated because memory usage far exceeded the requested allocation. After a few quick calculations, I realized that I can only rsync around 8 mp4 files at a time, as opposed to the 100 proposed in my original proposal.
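
Concretely, this means pulling files over in small batches rather than all at once. A hedged sketch of what that loop might look like — sync_in_batches is a hypothetical helper, and the batch size of 8 comes from my back-of-the-envelope memory math:

```python
import subprocess

BATCH_SIZE = 8  # empirical limit from the memory budget, down from 100

def sync_in_batches(remote_files, dest_dir):
    """Rsync and process a handful of mp4s at a time so the working
    set stays within the node's requested memory allocation."""
    for i in range(0, len(remote_files), BATCH_SIZE):
        batch = remote_files[i:i + BATCH_SIZE]
        subprocess.run(["rsync", "-av", *batch, dest_dir], check=True)
        # ...run stage one on the batch, write out labels, then
        # delete the local mp4s before fetching the next batch.
```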

Conclusion

Overall, the conference has set me a bit behind schedule, but I’m still progressing at a steady rate. I’ve made good progress in executing my code on the GPU node, though I’ll have to keep in mind the exact format in which I’ll store the results/metadata. I also need to ensure that my code can reliably scale up with minimal failures.
