Every second, 5 hours of video are uploaded to YouTube. Cisco estimates the volume of video content this year at about 50 exabytes (50 EB = 50 million TB). We want to search that video. We want to extract meaningful insights from it. We want to ingest its content without attempting the near-impossible task of watching it all. So how do we do it?
Almost all video tagging relies entirely on human input: the name and description of a video, its keyframes, and so on. Teams of human taggers extract metadata from the Netflix content library, the corpus of Daily Show episodes, and other collections. But what if machines did most of the heavy lifting?
Muse.ai, a startup, is building a platform that unlocks insights from video by letting people apply state-of-the-art computer vision and machine learning.
As a fellow in the Insight Data Science Fellowship program, I worked with Muse.ai on a video recognition project. My project focused on building a Python pipeline that segments a video into scenes and identifies and tracks the faces in it. My main pipeline tools were IPython, scikit-learn, pandas, OpenCV, and ImageIO (with its ffmpeg plugin), and I used Jupyter notebooks for data exploration.
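To make "segmenting a video into scenes" concrete, here is a minimal sketch of one common baseline: compare the grey-level histograms of consecutive frames and mark a cut wherever they differ sharply. It is an illustration only and not necessarily the method used in this pipeline. The function names, the 8-bin histogram, and the 0.5 threshold are all assumptions, and the frames are small synthetic lists of pixel values rather than decoded video.

```python
# Hypothetical sketch: detect scene cuts from histogram differences between
# consecutive frames. Each frame is a flat list of grey values (0-255);
# a real pipeline would decode frames from a video file instead.


def histogram(frame, bins=8):
    """Return a normalized grey-level histogram for one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // 256] += 1
    total = len(frame)
    return [count / total for count in counts]


def histogram_distance(h1, h2):
    """Total variation distance: 0.0 for identical, 1.0 for disjoint histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def find_scene_cuts(frames, threshold=0.5, bins=8):
    """Return every index i at which frame i appears to start a new scene.

    The threshold is an assumed, untuned value chosen for this example.
    """
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0], bins)
    for i in range(1, len(frames)):
        current = histogram(frames[i], bins)
        if histogram_distance(previous, current) > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Synthetic "video": three dark frames followed by three bright frames.
dark = [10, 20, 30, 25]
bright = [200, 220, 240, 210]
frames = [dark, dark, dark, bright, bright, bright]

cuts = find_scene_cuts(frames)
print(cuts)  # [3]
```

A fixed threshold like this catches hard cuts but tends to miss gradual transitions such as fades, so a real pipeline would likely need something more robust.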