Twelve Labs Secures $50 Million in Series A Funding to Revolutionize Video Understanding

June 5, 2024, 3:37 pm
Twelve Labs
Tags: AdTech, Artificial Intelligence, Data, Infrastructure, Search, Video
Location: United States, California, San Francisco
Employees: 11-50
Founded date: 2021
Total raised: $77M
Twelve Labs, a video understanding company based in San Francisco, has raised $50 million in Series A funding. The round, led by New Enterprise Associates (NEA) and NVentures, NVIDIA's venture capital arm, marks a pivotal moment in the company's effort to shape the future of multimodal AI.

The infusion of capital will accelerate Twelve Labs' ongoing work on state-of-the-art foundation models for video understanding. The company plans to use the funding to expand its research and development initiatives and to nearly double its current headcount, onboarding more than 50 new employees by the end of the year.

At the core of Twelve Labs' mission is the goal of making video content instantly accessible, intelligently searchable, and easily understandable. Through their cutting-edge video understanding technology, the company aims to empower users to discover valuable moments within a vast sea of videos, enabling them to engage more deeply and learn more effectively.

One of the key highlights of Twelve Labs' recent developments is the integration of various NVIDIA frameworks and services within its platform. By leveraging technologies such as the NVIDIA H100 Tensor Core GPU and NVIDIA L40S GPU, as well as inference frameworks like NVIDIA Triton Inference Server and NVIDIA TensorRT, Twelve Labs has been able to pioneer the development of foundation models for multimodal video understanding.

Furthermore, the company has embarked on strategic product and research collaborations with NVIDIA, with the aim of bringing best-in-class multimodal foundation models and enabling frameworks to the market. This partnership underscores Twelve Labs' commitment to driving innovation and pushing the boundaries of video understanding technology.

Advancing these capabilities further, Twelve Labs has introduced Marengo 2.6, a multimodal foundation model that represents a major leap in video understanding technology. The model offers a pioneering approach to multimodal representation tasks, encompassing not only video but also image and audio. By enabling any-to-any search tasks, including Text-To-Video, Text-To-Image, Text-To-Audio, Audio-To-Video, and more, Marengo 2.6 is poised to change the way we interact with multimedia content.
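The idea behind any-to-any search is that embeddings from different modalities live in one shared vector space, so a query from one modality can be scored directly against candidates from another. The following is a minimal sketch of that principle, not Twelve Labs' actual API: the vectors and file names are made up for illustration, and similarity is plain cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: because text and video share one space,
# a Text-To-Video query is just a similarity ranking across modalities.
text_query = [0.9, 0.1, 0.2]  # toy embedding of a text query
video_clips = {
    "clip_park.mp4":   [0.8, 0.2, 0.1],  # semantically close to the query
    "clip_office.mp4": [0.1, 0.9, 0.3],  # unrelated content
}

ranked = sorted(video_clips,
                key=lambda k: cosine_similarity(text_query, video_clips[k]),
                reverse=True)
print(ranked[0])  # the best-matching clip for the text query
```

Swapping the query vector for an audio or image embedding yields Audio-To-Video or Image-To-Video search with no change to the ranking logic, which is what makes the shared space powerful.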

Additionally, Twelve Labs has unveiled the latest version of its Pegasus model, Pegasus-1, which sets a new standard in video-language modeling. Designed to understand and articulate complex video content with exceptional accuracy and detail, Pegasus-1 is set to transform the landscape of video understanding and analysis. The open beta release of Pegasus-1 has been met with enthusiasm, offering enhanced performance and accessibility to users.

In line with their commitment to innovation, Twelve Labs has also introduced the Multimodal Embeddings API, a groundbreaking tool that provides users with direct access to raw multimodal embeddings. By supporting all data modalities, including image, text, audio, and video, the Embeddings API enables users to turn data into vectors in the same space, without the need for siloed solutions for each modality. This advancement represents a significant step towards achieving Twelve Labs' vision of making videos as easily accessible as text.
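"Vectors in the same space, without siloed solutions" means a single index can hold items of every modality and answer one nearest-neighbour query across all of them. This toy sketch illustrates that property with made-up vectors and names; it is not the Embeddings API itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical shared-space index: one store for every modality,
# instead of a separate (siloed) index per data type.
index = [
    ("video", "intro.mp4",     [0.7, 0.3, 0.1]),
    ("audio", "podcast.wav",   [0.2, 0.8, 0.4]),
    ("image", "diagram.png",   [0.6, 0.4, 0.2]),
    ("text",  "release-notes", [0.1, 0.2, 0.9]),
]

def search(query_vec, k=2):
    """Nearest neighbours across all modalities in a single pass."""
    scored = sorted(index,
                    key=lambda item: cosine_similarity(query_vec, item[2]),
                    reverse=True)
    return [(modality, name) for modality, name, _ in scored[:k]]

results = search([0.7, 0.3, 0.1])
print(results)  # top matches may come from different modalities
```

Because every item is comparable to every query, adding a new modality means adding rows to the same index rather than standing up another search system.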

The successful completion of the Series A round is a testament to the industry's recognition of Twelve Labs' technology and approach to video understanding. With 30,000 organizations using its APIs for tasks such as semantic video search and summarization, Twelve Labs is well-positioned to establish deep industry partnerships and integrations with leading companies across sectors.

As Twelve Labs continues to push the boundaries of video understanding technology and drive innovation in the field of multimodal AI, the company remains committed to its mission of making videos as accessible and understandable as text. With a talented team of experts and a strong focus on research and development, Twelve Labs is poised to lead the charge in shaping the future of video understanding and multimodal AI.