GSoC Red Hen Lab — Community Bonding Period
Pipeline for Multimodal Television Show Segmentation
Hello there!
My name is Harshith Mohan Kumar and I’m quite fortunate to say that I’ll be working with Red Hen Lab under Google Summer of Code 2022. This is the first of many posts that will summarize my ongoing contributions to the organization.
In this article, I’ll briefly describe my project, summarize the outcomes of the community bonding period, and maintain a list of links to the weekly progress articles in the outline section.
In addition to these written blog posts, I’ll be creating video summaries on my YouTube channel.
Outline
*Will be updated as the project progresses*
- Community Bonding Phase
- Week 1
- Week 2
- Week 3
- Week 4
- Week 5
- Week 6
- Week 7 — Midway! :)
- Week 8
- Week 9
- Week 10
- Week 11
- Week 12
Mentors
Project Abstract
This project proposes a multimodal, multi-phase pipeline to tackle television show segmentation on the Rosenthal videotape collection. The two-stage pipeline will begin with “feature filtering” using pre-trained classifiers and heuristic-based approaches. This stage will produce noisy title-sequence segments containing audio, video, and possibly text. These extracted multimedia snippets will then be passed to the second stage, where the features extracted from them will be clustered using RNN-DBSCAN. Title sequence detection is possibly the most efficient and reliable path to high-precision segmentation for the first and second tiers of the Rosenthal collection (which have fairly structured recordings). This detection algorithm may not work well for the more unstructured V8+ and V4 VCR tapes in the Rosenthal collection. However, as Co-Director of Red Hen Lab Francis Steen pointed out in an email conversation, “the project will be a success if 90% of the boundaries are accurately detected” and “solutions that produce high accuracy for a small number and remain silent on some are still very useful”. Therefore, the goal is to produce accurate video cuts and split metadata results for the first and second tiers of the Rosenthal collection.
The entire proposal can be found here.
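To give a feel for the clustering idea in the second stage, here is a minimal sketch of the density test that RNN-DBSCAN is generally described as using: a point counts as a "core" point when at least k other points list it among their k nearest neighbors. This is only that one step, not the full clustering algorithm and not the project's actual implementation. The function names and the toy 2D feature vectors are assumptions made up for illustration.

```python
from math import dist


def k_nearest(points, k):
    """For each point index, return the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(others[:k])
    return neighbors


def reverse_neighbor_counts(points, k):
    """Count how many points include each point in their k-nearest-neighbor list."""
    counts = [0] * len(points)
    for nbrs in k_nearest(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts


def core_points(points, k):
    """Indices of points with at least k reverse nearest neighbors."""
    counts = reverse_neighbor_counts(points, k)
    return [i for i, c in enumerate(counts) if c >= k]


# Hypothetical feature vectors: four snippets that look alike and one outlier.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(core_points(features, k=2))
```

In this toy example the four nearby points support each other as neighbors, while the far-away point is nobody's nearest neighbor and is left out, which is the behavior one would hope for when noisy snippets are mixed in with recurring title sequences.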
Community Bonding Period Summary
During the community bonding period, I received and activated my Case Western Reserve University (CWRU) ID to gain access to the High Performance Computing (HPC) clusters.
I then set up the connection to the HPC clusters through VPN access. I’ve uploaded a video on my channel which goes through the detailed steps for setting it up on Ubuntu 20.04.
I then used GitHub workflows to build and publish a Docker container so that I could set up my Singularity image. To move on to the next stage and get an early look at the dataset, I set up Jupyter Notebook through the Singularity shell and forwarded the port so I could access the notebook on my local computer. If you’re interested in learning more, I have provided much more detailed instructions in my GitHub repository.
Then I wrote a small notebook to analyze the types of files in the Rosenthal collection. I plotted a few bar graphs showing the percentage of mp4, srt, and txt3 files in the directory. Unfortunately, all video data on the gallina was lost. A script is now running to restore the Rosenthal collection. As a result, I could only perform the visual analysis on a very small subset of the directory.
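For anyone curious what such a file-type analysis might look like, here is a minimal sketch using only the Python standard library. It is not the notebook I actually ran; the function names and the sample file names are hypothetical, and the plotting step is left out.

```python
from collections import Counter
from pathlib import Path


def extension_percentages(paths):
    """Return {extension: percentage of files} for an iterable of paths."""
    counts = Counter(Path(p).suffix.lower() or "(none)" for p in paths)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ext: 100.0 * n / total for ext, n in counts.items()}


def scan_directory(root):
    """Walk a directory tree and summarize its files by extension."""
    files = (p for p in Path(root).rglob("*") if p.is_file())
    return extension_percentages(files)


# Hypothetical file names, just to show the shape of the result.
sample = ["tape1.mp4", "tape2.mp4", "tape1.srt", "tape1.txt3"]
for ext, pct in sorted(extension_percentages(sample).items()):
    print(f"{ext}: {pct:.1f}%")
```

The resulting dictionary can be passed straight to a bar chart in whichever plotting library you prefer, and `scan_directory` can be pointed at a real folder once the data is available.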
I then had another call with my mentor Frankie to discuss the overall progress during the community bonding phase and figure out the objectives for the following week.