Video is taking over the internet, but in many ways, it has not changed significantly in the past 40 years. The way we discover and pay for video content has changed significantly, of course, but we still consume video in a continuous, linear sequence, and that’s about to change.
Sandeep Casi and his team at Videogram are using deep learning to change the way you and I discover and watch video. They’ve already had success in the enterprise realm, and they are now bringing the technology to consumers.
Interestingly, Videogram was not founded the way most startups are, and Sandeep’s approach to leveraging the intellectual property locked up inside Japan’s large corporations might represent a unique and important avenue for innovation here in Japan.
One that might become every bit as important as traditional seed-funded startups.
We also dive into the paradox of enterprise innovation, and Sandeep explains a few things that all startups need to understand about corporate accelerators before joining.
It’s an interesting discussion, and I think you’ll enjoy it.
Show Notes
Why the key-frame model of video presentation is broken
How General Motors pioneered VR in the early 90s
Why there are fewer breakthrough technologies than you think
What a startup can do when you are too early to market
Why technology companies need to be content companies
Why we might see more spinouts from Japanese enterprise
How to raise funds as a foreigner in Japan
How the Olympics will force Japan's video market and culture to change
How to overcome the aversion some Japanese VCs have to foreign founders
Links from the Founder
Learn more about Videogram
Check out Sandeep's home page
Follow Sandeep on Twitter @sandeepcasi
Friend him on Facebook
The Increasing Interplay of Video and Social Media
How Machine Learning Unlocks the Value of Video
Transcript
Welcome to Disrupting Japan, straight talk from Japan’s most successful entrepreneurs. I’m Tim Romero, and thanks for joining me.
Today, we’re going to be talking about the future of both how we create and how we consume video. Now I grant you, this may sound like something that’s pretty hard to do in an audio format, but I think in some ways it’s actually easier. Listening back to this interview after I recorded it, it became clear that imagining the possibilities in your mind’s eye is much more powerful than laying it all out for you in two dimensions. But we’ll get to all of that in just a little bit.
You see, today we sit down and talk with Sandeep Casi, founder and CEO of Videogram. We talk not only about the future of video but also about a new model for unlocking some of the intellectual property that’s currently locked up in large Japanese companies. Sandeep and his team followed a very different startup model than what we see in Silicon Valley. It’s something we might be seeing a lot more of in Japan because the model is so well-suited to conditions here in Japan.
Sandeep also has some really practical advice for participating in corporate accelerators and for a few things startups absolutely must keep in mind when trying to sell innovative products to large enterprises. There are definitely tradeoffs. In fact, you could say there’s almost a built-in conflict of interest. We also share some real-world suggestions on how foreign founders can successfully raise multiple funding rounds in Japan.
But you know, Sandeep tells that story much better than I can. So let’s hear from our sponsor and get right to the interview.
[Interview]
Tim: So I’m sitting here with Sandeep Casi of Videogram, which is an amazing video product. Thanks for sitting down with me.
Sandeep: Thank you, Tim, and thanks for the opportunity to talk to your audience.
Tim: It’s so hard to describe video on an audio podcast. But if I understand it correctly, you use AI to create paneled previews of videos. It’s kind of a storyboard or a comic book view.
Sandeep: What you see as an end product is what you just described, which is more of a pictorial summary of a video. But what’s going on in the backend is a much deeper technology. We actually use machine learning to understand the context of a video: whether the video has scenes that are of interest, celebrities with a certain status that we think we should be surfacing for discovery, as well as objects that might be interesting for monetization within the video.
Tim: As it scans through the video, it could actually recognize not just there is a person here but wow, that’s Angelina Jolie or --
Sandeep: Yes.
Tim: Wow.
Sandeep: Or it recognizes maybe something she’s wearing, maybe sunglasses that she’s wearing. And if you have trained the system, we can also recognize the brand of those sunglasses. So what we do is we index the video down to its most atomic level. If you look at what Google does with text indexing, it indexes and brings in contextual results when you type in the keyword. So we do very similar things for video. We break a video down into the most atomic level by understanding what’s inside of the video, from the perspective of the scene, the clip, the music, the lyrics, and the objects, as well as the context. For example, a beach, or a car on the beach. Once we get that metadata, it’s almost like a Lego block. Once we have all of these Lego pieces, then you could construct different use cases from those pieces.
Tim: Okay. You put those together in an engaging panel format.
Sandeep: That’s right.
Tim: It’s really cool technology but why is it important? What’s it good for?
Sandeep: There are many different use cases. The first use case is the discovery piece. If you notice, online video from its inception has usually been one frame with a play button. That one frame has no context because, by definition, a video is a bunch of frames. And then a publisher picks one frame in order to create an advertisement for that video, an enticement frame, as we call it. That frame is what entices you to click into the video. Most of the time, that frame is actually clickbait.
Tim: Yes. Of course, like any headline.
Sandeep: Like any headline, right. What happens is that even if people click into that, they kind of drop off immediately because there’s no instant gratification. They clicked into that frame because they want to consume that frame but they don’t see it. Or even when they scan it, maybe they don’t see it and they drop off. So what we do in terms of discovery is surface the storyboard of the video, a contextual storyboard, so everybody has a choice based on their interest in the video. Somebody might actually see a dog in the video that they might be interested in, or somebody might see the sunglasses that a celebrity is wearing. So everybody has a choice and then they can click on that frame and there’s instant gratification because what they clicked on is what they see.
Tim: Okay. It certainly makes sense that when you’re providing more variety, you should get a higher level of engagement and higher level of people watching the videos. Do the numbers bear that out?
Sandeep: Yes, they do. Usually the click-through rate for a single-frame video is in the range of 15%. We are actually seeing anywhere between 40% and 60%.
Tim: Okay. So three times plus.
Sandeep: Three times. And it’s very simple. The reason that we see a larger click-through rate is because there are more choices to click on. So imagine a website which has one headline and that’s the only headline you have. If I don’t like that headline, I don’t click on it. But imagine an article with multiple headlines, each headline going into a certain paragraph of that article. I’ll be more interested in choosing the paragraph that I want. So by nature, we were actually not trained to watch content from end to end. This is an issue that for some reason platforms like YouTube and Facebook have created, that a video should be consumed from beginning to end, and publishers have bought into it. But look at how a newspaper or a magazine is consumed. Let’s go into the analog world. You don’t read the newspaper from the front page to the back page or a magazine --
Tim: No. It’s very nonlinear.
Sandeep: Nonlinear. So you jump into the parts that actually entice you. You scan or browse and you jump in. You consume and you comment on it, maybe, to your friends, whatever you want. So why can’t video be the same way? That’s basically the vision behind Videogram, to create random access into video so that you can consume the parts that you like. And once you like that part, you should be able to share that part or clip or segment with others. And then number 1, you don’t have to consume end-to-end.
Tim: Right, right. And so you’re mentioning sharing. So if I decided to share a video, I could share just a particular snippet of it?
Sandeep: That’s right.
Tim: Interesting. Tell me about your customers. Who’s using Videogram and how?
Sandeep: We have a variety of customers that are using Videogram. We actually ran a few trials with close to 29 studios globally, in India, in Los Angeles, and even in Japan. Out of that, we came out with verticals. The first vertical is called Videogram Music. If you look at SoundCloud, SoundCloud is all about commenting on a certain clip or a certain segment of the audio. So we provide the same type of features for music videos, where you could stop at a frame and then comment on that frame, and that comment is attached to that particular frame. And when you’re scanning, you’re able to jump into that and clearly engage on that. The second vertical we do is Videogram Live, which is mostly focused on sports and e-sports. And the third is Videogram Ads.
