About the talk
Fine-grained action recognition in multiplayer sports has long been a challenging problem. Several research papers on soccer, baseball, and tennis have used OCR to find the start and end times of full, continuous game videos and to align video time with game events (e.g., at 3 minutes 26 seconds of a video, Messi scores a goal, or Stephen Curry scores a 3-pointer) by matching the game time against play-by-play commentary, in order to gather large amounts of training data for action/event recognition. Inspired by this research, and equipped with rich play-by-play commentary data obtained through business partnerships, we sought to bring such techniques into production (to annotate game videos with markers of players and actions within the video play time) for the sports leagues that Yahoo! Sports reports on and covers (NBA/NFL/NHL/MLB/Soccer). Our primary motivation was to let sports users skip to the exciting sections of sports videos (where their favorite player scores, etc.) so as to reduce abandonment and improve video watch times, and to give personalized clips of game highlights to Yahoo Sports app users who have declared an explicit interest in watching a team or a player (such as the Lakers in the NBA).
However, to use such automated training-data collection methods and turn them into models that serve production day to day, we face two challenges. First, general-purpose trained text detectors and recognizers are not accurate enough to identify box scores and game clocks reliably. Second, we do not have full, uninterrupted videos like the selected videos used to retrieve training data in the research literature above; composed/edited highlight videos of the game, on the other hand, are freely available all the time. We innovate by efficiently applying transfer learning to general-purpose text detectors and recognizers, collecting the training data efficiently with the help of knowledge constraints.
By applying transfer learning to cheaply collected training data, we develop high-precision, focused models. The training data is obtained by running general-purpose pre-trained models on unlabelled videos and correcting the noise in their predictions with constraints based on domain knowledge. These models, while being almost 100% accurate in identifying game times from box scores, help us expand to different leagues such as the NBA, NFL, NHL and several soccer leagues without any human-labelled training data. We also propose strategies for scaling prediction times and for scaling beyond our licensed videos to evergreen YouTube highlight videos from different broadcasting channels. Finally, we propose high-quality, fine-grained action recognition datasets for training non-OCR-based action recognition classifiers, which are bigger and more diverse than those proposed so far in the research community.
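The noise-correction idea can be illustrated with a minimal sketch. It is not the production system described in the talk: the clock formats, the 12-minute period, the `slack` value, and the function names `parse_clock` and `filter_shot` are all assumptions made for illustration. The sketch keeps an OCR reading only if it looks like a valid game time and if it is consistent with the previous accepted reading in the same continuous shot (the clock never runs backwards and never drops faster than the video time that has elapsed).

```python
import re
from typing import List, Optional, Tuple

# Assumed clock formats: "M:SS" / "MM:SS", or "SS.t" (seconds with tenths).
CLOCK_RE = re.compile(r"^(?:(\d{1,2}):([0-5]\d)|([0-5]?\d)\.(\d))$")


def parse_clock(text: str, period_minutes: int = 12) -> Optional[float]:
    """Return seconds remaining in the period, or None if the text
    violates the format or range constraints."""
    m = CLOCK_RE.match(text.strip())
    if m is None:
        return None
    if m.group(1) is not None:
        seconds = int(m.group(1)) * 60 + int(m.group(2))
    else:
        seconds = int(m.group(3)) + int(m.group(4)) / 10
    if seconds > period_minutes * 60:
        return None  # a clock cannot show more time than the period lasts
    return float(seconds)


def filter_shot(
    readings: List[Tuple[float, str]],
    period_minutes: int = 12,
    slack: float = 1.0,
) -> List[Tuple[float, float]]:
    """Filter (video_time, ocr_text) pairs from one continuous shot.

    A reading is kept when it parses and is consistent with the last kept
    reading. This greedy check assumes the first valid reading is correct.
    """
    kept: List[Tuple[float, float]] = []
    for video_t, text in readings:
        clock = parse_clock(text, period_minutes)
        if clock is None:
            continue
        if kept:
            prev_t, prev_clock = kept[-1]
            elapsed = video_t - prev_t
            dropped = prev_clock - clock
            if dropped < 0 or dropped > elapsed + slack:
                continue  # clock went backwards or jumped too far
        kept.append((video_t, clock))
    return kept


# Made-up OCR output: "7:12" is a misread of "1:12", "1:1O" has a letter O.
readings = [(0.0, "1:14"), (1.0, "1:13"), (2.0, "7:12"), (3.0, "1:11"), (4.0, "1:1O")]
clean = filter_shot(readings)  # [(0.0, 74.0), (1.0, 73.0), (3.0, 71.0)]
```

Readings that survive such constraints could then serve as automatically collected labels for fine-tuning, while rejected readings are simply discarded rather than corrected by hand.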
Our primary contributions include:
(1) a cost-effective training method that, without extensive manual labeling, accurately identifies bounding boxes for a specific type of object and the text inside them, and that lets us expand to several new sports domains quickly,
(2) an optimized core video-processing run time that meets the computational needs of processing O&O videos and a massive corpus of longer evergreen YouTube videos,
(3) novel, high-quality basketball, football, and ice hockey datasets for action recognition research, and
(4) an adaptation of a training-data generation technique designed for full-length videos so that it works on sliced video highlights, by training high-precision models, which in turn enables in-video semantic search.
Topojoy currently leads information extraction on text and videos in Yahoo Knowledge Graph, which powers search and information organization in Yahoo! products such as Finance, Sports, and entity search and browse. He has worked on Yahoo Knowledge Graph (YK) for 4 years on various aspects of creating knowledge graphs, such as reconciling source graphs, classifying semi-structured pages into the right ontology types, and ranking related entities beyond the obvious neighbors, to name a few. Before Yahoo Knowledge Graph, he worked for Yahoo Shopping on attribute extraction and on classifying shopping feeds into large product taxonomies.
Hello, everyone, and we have our next speaker, ready to go. We've got to Okeechobee zweilous and he is joining us from Sunnyvale California, to take it away. Are some of the presentation for the previous speaker on a very fascinating subject. On face detection of lines. I have never believed that with you or something like that. I'll move on to something more funky hour or so. I am a part of the research. We were crawling golf mining, but this one is about how we do Sports, understanding the president.
Give a brief idea about what Yahoo! Sports has and what not? See, I was bored is as being a brand white round with WWL radio sports news without a live scores. Like you can go online and you can check it out. Android apps iPhone apps to vary widely used. I don't like when a major leagues in u.s. European soccer college, basketball football and brightest and the things which Cricket tennis Olympics. In terms of what kind of data we have and what kind of services you provide, we have a variety of content
Partnerships starting with NBA NFL. NHL stalkers video on demand that you can browse anytime streaming on live play-by-play commentary and as well live media coverage for Olympics and other On the side. Yahoo! Knowledge is a platform to research platform, which focuses on crawling extracting and organizing entities, a relationship of the star Players Sports Awards, taking the scooters. They have done. And we also do to get entities across various domains. For example, if I get Stephen
Curry, who is NBA star NBA player from Wikipedia versus yahoo, sport versus other articles. I will try to be information from a weekend identify a player or team in inside information on to Images of sports our knowledge graph Cowboys players teams, all different kinds of sport, all different kind of team starting from the usual basketball football soccer Olympic cyclist competition. Like the different competitions like NBA season, 2029 and their scores and also place to watch
what we had done new and which is what we are trying percent is the Deep linking of entities. Like there's something inside videos, all the height of these players on what's Happening inside news, article says that we can combine information to multimodal fuses. I would actually be going into a little bit of experiences and user experiences, which we support and which are other computer experiences from other a Sportster bike. Send Yahoo! Sports video explaining. This is this is this particular one is an Apple TV HD
of the highlights of the game today yesterday or you know the place. you have a very similar view of looking at videos courses videos, but you can select your favorite teams and A similar. But slightly different experience from YouTube TV about Jeep linking inside. Looks like this. Where do you have the game going on? Inside the game? You have different players being shown at different points in time, taking different action-housing Oriental, forwarding a video.
Storage. Browsing was a trend and the start of the Olympics. So I will find out of fuel survey results from YouTube with John top search trend on sport videos and highlights of video, read the Premier League highlights for funny videos like So far from Bad Girls. Also interested in the daily interviews, post-game interviews of players and the star players on the light purple. This is to get into the minds of how they're performing. The other type of videos which are
quite looked at our, how to do videos. Like, once you were a fan of a game, you know, how to dance, how to play, how to do a soccer movies. And the latest and greatest moments in the game history. And then obviously the sports highlights as well. This is a very similar experience and similar kind of word is we get on with our search. Knowing the full context of this truck videos and DIYs Premier broadcasted videos and what kind of search and browse experiences,
we can take a look at the statistics. To breathe into the data composition freshness of these different, a broadcaster sport video. Do for highlight videos which covered up milk? When should I get my brother? Ali highlights or a team highlight or a particular weeks highlights? These videos are there be there, basically semi-automatic, find them. After the game has finished, the broadcaster's, the YouTube channel or, you know, we have fun. So these videos are kind of semi-automatic.
Kiss of the best video of the best moves and the funny mood. They're also kind of the similar in terms of they get posted every every week, or every week, and they consist of typical key player actions across different games. The most memorable ones and the idea of what's best. And what's funny is subjective weather kind of covered by what's been written about in the news articles and Finally, the interview videos are all about, you don't have been having one player for one-bedroom focused on talking about them and
also come into his tea. In terms of identifying words, being spoken of in the video, you can look at it. Support is highlighter videos of two different cameras. Always has the game clock going on. Notes, when you look up when you look at these different kinds of views, which major league composed, the highlights, and the broadcast build. What you got to do, identify what's happening? You don't actually me to look at the entire set of players just focus on the floor and try to align it to the
play-by-play,. We have a lot of our own. We should get from our provider like in the NFL's. Different Soccer League. Do a line. You also get a fair idea of what alternate views are going to be shown. And what do you want? What I want to see a close-up of the pair. I want to see a replay of the volunteering to go and see how the ball. Knowing yourself for interview videos. You don't need to align to the play-by-play to move from the image to Maine to detect when you already have a bundle of text to make
sense of what the video is about. With that, let me try to get into the email. Talk to each other in a blender. Jazzy spooky free online play by play. This is a typical play-by-play record, where you can see the first, the second and the third quarter, the player, the move and the time. If you can identify that and align it to the time on the clock, you should get a very good match of what's happening. Not, if we do that, well. Thinking of videos deep-linking inside videos of what the Browns p.
If you do that, well, what happens is video composition of the composition of the video is very well. Understood. What does it have? It is much more than the title and the text. Original sin 2 video composition you can personalize highlight-reel based based on these rich content. For example, I know. Images of video storage diamond. Game design identification of the Playboy, please. Do I have a lot of contact information and I can enable search Village. No, the other kinds of information as we spoke Eastern text information extraction from the
tickets available in the video and now Both of these things in place. What you can enable is very deep. Semantic, topical, Curry, like the ones we saw like highlights interviews except Know what's with this kind of information? What you can also enable is this kind of a topical search and browse the query, they give results for Steph Curry. 3-pointer deep freezer and wanted to know is if you're deep enough and accurate enough in your understanding and you can actually get much fresher
experiences than 2019 old games, you can give the game. The focus of the talk: right now, we're basically going to talk about alignment of the play-by-play to the game clock. The other kinds of extraction decks information extraction that is identified by please. And then finally, it enables all the different kinds of search and browse experiences. The name of the Stark River focus on doing play-by-play extraction, really, really well. Imagine that Getting deeper into the thing. What is play-by-play? The play-by-play transcription is basically a ton of detail on what happened in each play in the
game. It has the clock and the quarter in which it was happening. It has text on which player did what kind of move. For example? In this case is Aaliyah and if you can identify the clock well, that is, the bounding box of the clock and the text inside the clock, which has the quarter and the time, and match it with the play-by-play, you should get a ton of information about what is happening in the game. Doesn't really deserve what I'm trying and time in the quarter with high quality.
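The matching step just described can be sketched as follows. This is a minimal illustration and not the speaker's implementation: the `Play` and `ClockReading` types, the `align` function, the `tolerance` value, and the sample data are all hypothetical. Each play-by-play record is linked to the video frame whose recognized quarter matches and whose recognized clock is closest to the clock in the record.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Play:
    quarter: int
    clock: float  # seconds remaining in the quarter, from the play-by-play feed
    text: str     # e.g. which player did what kind of move


@dataclass
class ClockReading:
    video_time: float  # seconds from the start of the video
    quarter: int       # quarter recognized inside the on-screen clock
    clock: float       # seconds remaining, recognized inside the on-screen clock


def align(
    plays: List[Play],
    readings: List[ClockReading],
    tolerance: float = 2.0,
) -> List[Tuple[float, Play]]:
    """Return (video_time, play) links, sorted by video time.

    A play is linked only if some reading in the same quarter shows a clock
    within `tolerance` seconds of the play's clock; otherwise it is skipped,
    which is expected for edited highlights that omit most of the game.
    """
    links: List[Tuple[float, Play]] = []
    for play in plays:
        same_quarter = [r for r in readings if r.quarter == play.quarter]
        if not same_quarter:
            continue
        best = min(same_quarter, key=lambda r: abs(r.clock - play.clock))
        if abs(best.clock - play.clock) <= tolerance:
            links.append((best.video_time, play))
    return sorted(links, key=lambda link: link[0])


# Made-up sample data for illustration only.
plays = [
    Play(quarter=1, clock=298.0, text="Stephen Curry makes 3-pointer"),
    Play(quarter=3, clock=100.0, text="play that is not shown in this highlight"),
]
readings = [
    ClockReading(video_time=12.0, quarter=1, clock=300.0),
    ClockReading(video_time=14.0, quarter=1, clock=298.0),
    ClockReading(video_time=40.0, quarter=2, clock=500.0),
]
links = align(plays, readings)
```

Here only the first play is linked, at 14 seconds into the video, because no frame from the third quarter was recognized. The quality of the links depends entirely on how accurately the quarter and time are read from the clock, which is why the rest of the talk concentrates on that step.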
We break this whole thing down into smaller problems. The first one is the detection of the clock, and then text detection and text recognition, and we don't mix it up with the text here and the text otherwise in the field. In the end, we are heavily reliant on accurate. Understand. Why is there a challenge? If you look at the variety of different clocks which are present, they vary from provider to provider, from country to country
and even providers change their clocks from? So, what we do is, basically read.. Across the video, the benefits from this from this, open-source and crackers. And that should work out. Don't finally, I want to have his feelings. I want to have this video. So you have to realize that because some of them are coming from a lot of different types of 90% of the content in a transport. Okay, so now I'm down to how what do we do in the case of first one? We used to
the confidence laws on Hively besides the object. I presentation in our case. It is just one place. We don't need to listen chili. That's why we bump up the volume. And that along with the device? So now I'm too extremely accurate information to a completely different text domain, which has a lot of aggressive inference about what is happening inside. Look like generally Ozzy Ozzy Osbourne, both stations, which are tighter Bowling Green. And also I need to find
one. Approach. AWS or any other generalized text detector, when you apply it by default on this clock, does not work out well. The time gets divided into two different boxes, and we can end up alone. Will not be recognized because while training in need of nurses General finger that we don't know what that is, but it doesn't come out well. Into the forest, the line of the bounding box passes through the 1, which makes the 1 not identifiable. And then
You cannot use general-purpose models. Now, what do we do? So if we have to be together, we're using knowledge constraints about this. What does it mean? The big idea is that if you know these objects well, for example, if you know what constitutes time and what constitutes quarter, then you can write knowledge constraints around them. For example, in this case, we have finally. Depictions of time because is 4.58 or 45 minutes. Now, the other thing about clocks is that you can sequence them to find them. So if you
look at 1:12, which is behind fight properly of the clock. And the surface Warfare 114 is also behind, so you can see that. But really want to go back salad. So what happens is? There are two possibly second. And first of all, two contenders for the investigation of water. And is that the one, which is beside the time? No reason to object help us also. And if I depart from surface, pumpkin seeds, The lost because I was talking about certain times. Ridiculous. What is the seventh
about here at the proper time in between 1 and 1/2, you know, there's something wrong. So this is to be used. Basically, out of quality training by using predictions on general purpose. Not going to have the streams. What you do is with you. And then you find unitransfer, learn to Hotel hate entrapped. We have one minute warning, detecting Jersey. For example, you can detect the body parts of a person and say that. This whole thing basically is about out of the window.
The idea basically has to use contextual and accurate and we tried to print it out in the main level which are somewhat right. Somewhat wrong. Right comes, along comes from the fact that the barrier draw your boundaries, can go wrong because the objects are not super familiar. The reason to use a used knowledge, constraints to provide the data and then trained with Kylie, cantrall. Is what we use is Eastern Sierra. Ford Expedition recognition will find you and we basically modify some parts and that
gives us. I will talk about the details of the latest super fast. When we get down to each frame. We use our facilities, which have been fine-tuned on clock to do Bosch protection. Then, once we have those plus we use our tax deductions again, which is super accurate. Me train from Genoa to get these different pieces out. And once we have the rematch and that's the time that we have that we can match. And I would say that no action for Action, recognition, Oliver to make them work, we have to train with
us. End of the presentation. Let me briefly present the team to you. Who was the manager. I would like to see. looks like, Lemon tree very quickly, go through. Digital internal diesel YouTube video, playing the video. If you click on it, it'll get to that point in the video, which has the 3-pointer by James Harden and further down. It will get you to the next level. Three pointers of the dance. We know that there's a big deal with your other job by James Harden, and when you click on that, I think you will get you.