Philippe Petitpont, Moments Lab: How MXT-1.5 AI Transforms Content Production and Viewer Experience

TFT1957 Interviews Philippe Petitpont, CEO & Co-Founder of Moments Lab at IBC 2024

– What new products is your company showcasing at IBC2024?

At Moments Lab, we found that the main bottleneck in content production is the time it takes to build a video. Finding just one shot to tell a story takes about five minutes, and a ten-minute video story can involve more than 600 shots, which makes assembly long and complex. We have developed an AI that understands video and makes it searchable, so instead of spending five minutes per shot, it now takes two seconds.
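The jump from five minutes to two seconds per shot comes from searching AI-generated descriptions instead of scrubbing footage. MXT-1.5's internals are not public, so the following is only a toy illustration of the idea: each shot carries a machine-written description, and search reduces to matching against that text.

```python
# Toy sketch of searchable shot metadata. Shot IDs, timestamps, and
# descriptions below are invented for illustration; this is not
# Moments Lab's actual data model or search algorithm.

shots = [
    {"id": "s001", "start": 0.0, "desc": "wide shot of a stadium crowd cheering"},
    {"id": "s002", "start": 4.2, "desc": "close-up of a striker scoring a goal"},
    {"id": "s003", "start": 9.8, "desc": "coach reacting on the sideline"},
]

def find_shots(query: str, shots: list) -> list:
    """Return shots whose description contains every word of the query."""
    words = query.lower().split()
    return [s for s in shots if all(w in s["desc"].lower() for w in words)]

print([s["id"] for s in find_shots("goal", shots)])  # -> ['s002']
```

A production system would use embeddings rather than keyword matching, but the workflow shift is the same: the human queries descriptions instead of watching footage.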

Our AI is called MXT-1.5. It is what we call a multimodal AI, meaning it analyzes everything happening in the video and audio using more than 20 different AI models, then combines those outputs to understand video better, faster, and far more scalably than humans can. It also keeps improving: the model is trained on customer data and learns daily to describe the world more accurately.

Our recent benchmarking of Moments Lab’s multimodal AI model, MXT-1.5, shows it surpasses major models like GPT-4o, Google Gemini 1.5 Pro, and Nvidia VILA 1.5 on the VideoMME dataset.

  • Unique three-level hierarchical indexing for accurate video analysis.
  • Combination of generative and expert AI systems.
  • AI trained for specific industries (Television and Sports).
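The "three-level hierarchical indexing" bullet above is not documented in detail, so here is a hypothetical sketch of what a layered index might look like: summaries at the video and sequence levels let a search skip unrelated branches before it ever touches individual shots. All names and data are invented.

```python
# Hypothetical three-level index: video -> sequences -> shots.
# Illustrative only; the real MXT-1.5 hierarchy is not public.

index = {
    "video_summary": "highlights of a football match",
    "sequences": [
        {
            "summary": "first-half goals",
            "shots": [{"id": "s002", "desc": "close-up of a striker scoring a goal"}],
        },
        {
            "summary": "post-match interviews",
            "shots": [{"id": "s017", "desc": "coach answering press questions"}],
        },
    ],
}

def search(index: dict, term: str) -> list:
    """Collect shot IDs where the sequence summary or shot description mentions the term."""
    hits = []
    for seq in index["sequences"]:
        for shot in seq["shots"]:
            if term in seq["summary"] or term in shot["desc"]:
                hits.append(shot["id"])
    return hits

print(search(index, "goal"))  # -> ['s002']
```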

This multimodal AI is also highly advanced: many companies in the tech space estimate we are about two years ahead of the market. Our AI can analyze 500 hours of video per minute and is seven times more cost-efficient than other AI models. The goal is to make this AI ready for business units, providing accurate analysis and cost-effective solutions whether they are dealing with 1,000 hours or millions of hours of video.

When you can describe a huge volume of video at scale, it unlocks new ways of working. It lets you find videos quickly and build rough cuts faster: instead of working shot by shot, you can identify the 50 shots you need to build a video. Hunting for shots one by one is a thing of the past; we are moving toward assembling 10 or even 50 shots at once to tell a story. We believe the future of video editing is in prompting, which will make the process faster and accessible to more people. You won't need the skills of a video editor, making video storytelling far more widespread.
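The "prompt to rough cut" workflow described above can be sketched in a few lines: pull every shot whose description matches the prompt, then assemble them in timeline order. This is purely illustrative and not how Moments Lab's editor actually works; shots and descriptions are invented.

```python
# Hedged sketch of prompt-driven rough-cut assembly (hypothetical data).

shots = [
    {"id": "s014", "start": 62.0, "desc": "protesters marching downtown"},
    {"id": "s003", "start": 12.5, "desc": "mayor speaking at a podium"},
    {"id": "s021", "start": 90.1, "desc": "crowd waving banners downtown"},
]

def rough_cut(prompt: str, shots: list) -> list:
    """Select shots matching any prompt term, ordered by start time."""
    terms = prompt.lower().split()
    picked = [s for s in shots if any(t in s["desc"].lower() for t in terms)]
    return sorted(picked, key=lambda s: s["start"])

print([s["id"] for s in rough_cut("downtown", shots)])  # -> ['s014', 's021']
```

The point of the sketch is the inversion of effort: the editor reviews a candidate sequence rather than hunting for each shot individually.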

Now, regarding the viewer experience, viewers want a more tailored approach to content. People want to see only what’s interesting to them. With the ability to know exactly what’s inside the content, you can tailor an experience for the viewer. For instance, if someone is passionate about World War II stories, they can interact with an interface to find or get suggested content on that topic. The potential for content recommendation and ad targeting is huge.

As we move into a cookieless world, targeting becomes harder if you don’t know the viewer. That’s where understanding the content becomes essential. For example, if there’s an ad for a sandwich brand in a cooking show, we can make sure that ad appears when the content matches, like during a scene about food. Understanding video in this way will provide a better experience for viewers and offer more engagement, bigger audiences, and better-targeted advertising, especially in areas that are not well-targeted today.
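The contextual-targeting idea in the sandwich-brand example reduces to matching an ad's topics against a segment's detected topics. Below is a minimal, hypothetical sketch of that matching step; it is not Moments Lab's actual targeting logic, and all brands and topics are invented.

```python
# Minimal sketch of contextual ad placement: choose the ad whose topic
# set overlaps most with a segment's detected topics. Hypothetical data.

segments = [
    {"t": "00:05", "topics": {"cooking", "food", "kitchen"}},
    {"t": "00:12", "topics": {"travel", "city"}},
]
ads = [
    {"brand": "SandwichCo", "topics": {"food", "lunch"}},
    {"brand": "AirMiles", "topics": {"travel", "flights"}},
]

def best_ad(segment: dict, ads: list):
    """Return the ad sharing the most topics with the segment, or None."""
    scored = [(len(segment["topics"] & ad["topics"]), ad) for ad in ads]
    score, ad = max(scored, key=lambda pair: pair[0])
    return ad if score > 0 else None

print(best_ad(segments[0], ads)["brand"])  # -> SandwichCo
```

Because the signal comes from the content itself rather than from viewer tracking, this kind of matching keeps working in a cookieless world.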

– Thanks a lot for the interview!

Full coverage of IBC2024 here

About TFT1957

TFT1957 (Television and Film Technologies) is a multimedia platform with scientific and technical content focused on current research and developments in broadcasting, film production, and Pro AV. Our capabilities include:

- Reviews
- News of technologies and software solutions

About Moments Lab

Moments Lab, formerly known as Newsbridge, is a company focused on using AI to improve video content management and indexing. Founded in 2016, it works with TV networks, production companies, journalists, producers, and archivists to scale content creation workflows and open new revenue streams.
Company products: Media Hub, Live Asset Manager, Media Marketplace, Just Index.
The company's flagship technology, MXT-1.5, is a multimodal AI indexing model that provides advanced features such as automatic sound bites, sequence detection, and media topic categorization.
