GSoC Red Hen Lab — Week 8
Pipeline for Multimodal Television Show Segmentation
Hi, welcome back! If you haven’t already read the introductory article based on the community bonding period, I strongly recommend you do so before continuing.
Last week I concluded the first stage of the Multimodal Television Show Segmentation (mtvss) pipeline. This stage produces a CSV file with the classification label (commercial/title sequence), start time, end time, and confidence of each prediction. Moving on to the second stage, I’ll store the features extracted from these title sequence keyframe images using the existing ResNet50V2 model, and then cluster those features using RNN-DBSCAN.
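To give a feel for how stage 2 consumes the stage-1 output, here is a minimal sketch of reading that CSV and keeping only the title sequence rows. The column names (`label`, `start_time`, `end_time`, `confidence`) and the label string `title_sequence` are assumptions for illustration; the real stage-1 CSV may name them differently.

```python
import csv
import io

def load_title_sequences(csv_text):
    """Parse stage-1 output and keep only the title-sequence rows.

    Column names are hypothetical; the actual stage-1 CSV may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (float(r["start_time"]), float(r["end_time"]), float(r["confidence"]))
        for r in reader
        if r["label"] == "title_sequence"
    ]

sample = """label,start_time,end_time,confidence
commercial,0.0,30.0,0.97
title_sequence,30.0,55.0,0.91
"""
print(load_title_sequences(sample))  # [(30.0, 55.0, 0.91)]
```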
Goals for the week
- Continue to run the pipeline stage-1 to extract title sequences.
- Configure and set up a new isolated code base for stage 2.
- Extract image features and store them as a .npy file.
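The third goal above boils down to persisting the feature matrix with NumPy's `.npy` format, which round-trips arrays exactly. A minimal sketch, using a random stand-in for the real features and a hypothetical file name:

```python
import numpy as np

# Dummy stand-in for the (num_images, 2048) feature matrix that the
# ResNet50V2 global-average-pooling layer would produce.
features = np.random.rand(10, 2048).astype(np.float32)

# Save and reload; "title_seq_features.npy" is a hypothetical name.
np.save("title_seq_features.npy", features)
restored = np.load("title_seq_features.npy")

# .npy preserves both the dtype and the exact values.
assert restored.dtype == np.float32
assert np.array_equal(features, restored)
```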
Work Done
As I continued to work on stage 2 of the pipeline, I made sure to isolate the existing working version so that the changes I keep making won’t conflict with it. This isolation is not only necessary to avoid errors but also makes the code base much more modular, which opens up a lot of room for further improvements later down the line.
The current tree structure of my project looks like this:
```
├── mtvss
│   ├── annotation
│   ├── constants.py
│   ├── data
│   ├── data_raw.py
│   ├── __init__.py
│   ├── pipeline_stage1
│   ├── pipeline_stage2
│   └── __pycache__
```
Here, pipeline_stage1 and pipeline_stage2 are directories containing Python scripts that work independently of one another:
```
│   ├── pipeline_stage1
│   │   ├── data.py
│   │   ├── __init__.py
│   │   ├── model.py
│   │   ├── PretrainedResNet50V2.py
│   │   ├── __pycache__
│   │   └── run_pipeline_stage1.py
│   ├── pipeline_stage2
│   │   ├── data.py
│   │   ├── __init__.py
│   │   ├── model.py
│   │   └── run_pipeline_stage2.py
```
Although I’m using the same file names in both stages, which might be a bit confusing, it does illustrate how similarly the two stages are structured.
Moving on, the input data for stage 2 of the pipeline is a matrix of image features of shape (num_images, 2048).
Just to quickly recap, this output is extracted from the global average pooling layer of the model. The architecture of the model is once again shown below.
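To illustrate where the 2048-dimensional vectors come from, here is a small NumPy sketch of what global average pooling does to a convolutional feature map. The (7, 7, 2048) shape is the standard output of ResNet50V2's final convolutional block for a 224×224 input; the array contents here are random placeholders.

```python
import numpy as np

# A single image's final convolutional feature map from ResNet50V2
# has shape (7, 7, 2048) for a 224x224 input image.
feature_map = np.random.rand(7, 7, 2048)

# Global average pooling takes the mean over the two spatial axes,
# collapsing the map into one 2048-dimensional vector per image.
pooled = feature_map.mean(axis=(0, 1))
print(pooled.shape)  # (2048,)
```

Stacking one such vector per keyframe yields the (num_images, 2048) matrix that stage 2 clusters.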
Conclusion
Overall, progress was quite slow this week due to a family emergency. However, I was able to lay down an executable plan for the second half of GSoC. By letting the existing stage-1 code run in the background, I was able to increase the number of category 1 files I’ve processed to a little under 50% (~5000 files). At the same time, I’ve been working on extracting and storing the title sequence images in a binary format.
Next week I hope to work on implementing RNN-DBSCAN, and I would love to explore and analyze the results obtained from stage 1 to see how they affect the performance of the clustering stage.
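As a preview of that clustering step, here is a simplified NumPy sketch of the core idea behind RNN-DBSCAN: a point is a core point when at least k other points count it among their k nearest neighbours (its reverse nearest neighbours), and clusters are grown from core points over the k-NN graph. This deliberately omits the border-point and density refinements of the full algorithm, and brute-force distances would not scale to the real feature matrix; it is only meant to show the mechanism.

```python
from collections import deque

import numpy as np

def rnn_dbscan_sketch(X, k):
    """Simplified sketch of RNN-DBSCAN (not the full published algorithm).

    Core points are those with at least k reverse nearest neighbours;
    clusters are grown from core points along the k-NN graph via BFS.
    Points never reached stay labelled -1 (noise).
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    knn = np.argsort(dist, axis=1)[:, :k]   # k nearest neighbours of each point

    rev = np.zeros(n, dtype=int)            # reverse-neighbour counts
    for i in range(n):
        rev[knn[i]] += 1
    core = rev >= k

    labels = np.full(n, -1)
    cluster_id = 0
    for seed in range(n):
        if not core[seed] or labels[seed] != -1:
            continue
        queue = deque([seed])
        labels[seed] = cluster_id
        while queue:
            p = queue.popleft()
            for nb in knn[p]:
                if labels[nb] == -1:
                    labels[nb] = cluster_id
                    if core[nb]:            # only core points keep expanding
                        queue.append(nb)
        cluster_id += 1
    return labels

# Two well-separated blobs should come out as two clusters.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
print(rnn_dbscan_sketch(X, k=2))  # [0 0 0 1 1 1]
```

One appeal of this approach over plain DBSCAN is that it replaces the hard-to-tune epsilon radius with a single neighbourhood size k, which should make it easier to apply across episodes with varying feature density.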