Peepaal

Continuous Community Learning

KeyVID - A Video Search with a Difference!!

 

1. Idea


Problem Statement and Scope

 

Colleges and universities increasingly record lectures and upload them for students to refer to. While this facility is very helpful, it also creates one big problem: "When in the lecture did the professor mention that word?"

Had this been text content, searching it would not have been a problem, but since these are video files CTRL+F isn't an option, and going through the entire lecture just to find a reference to a word is impractical and a waste of time. This is where our application comes into the picture. With our application it will be possible to search for a keyword in a video file, and the result will list all the occurrences of the keyword in the video along with the timestamp of each occurrence. The application will support avi, mp4 and mpeg files only.

At this point, other media formats won't be supported.

 

Novelty of Idea

       

It will be one of the very few applications capable of doing this, and the only one which is open-source. The application will take the video file as input, along with the keyword/s the user wants to search for, then strip the audio from the video and create subtitles for it. After the subtitles are created, it will search them for the keyword/s that were taken as input. For each occurrence found, it will add an entry to a list of occurrences and link it to the video, covering the range from 10 seconds before to 10 seconds after the occurrence so that the user gets the context in which the word occurs.

Eg: The user wants to search for the keyword "Virus" in a 30-minute video lecture. If the word occurs at 3:45 and 24:54, the list will show the results as 3:35 - 3:55 and 24:44 - 25:04.
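The window arithmetic in the example above can be sketched in Java as follows. This is only a minimal illustration, not KeyVID's actual code: the class name KeywordWindows, its method names, and the clamping of windows to the start and end of the video are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: turns keyword hit times into playback windows. */
final class KeywordWindows {

    /** The -10 s / +10 s context range described in the text. */
    static final int CONTEXT_SECONDS = 10;

    /** Formats a number of seconds as m:ss, for example 225 becomes "3:45". */
    static String formatTime(int totalSeconds) {
        return String.format(Locale.ROOT, "%d:%02d", totalSeconds / 60, totalSeconds % 60);
    }

    /** Returns one "start - end" window per hit, clamped to [0, videoLengthSeconds]. */
    static List<String> windows(int[] hitSeconds, int videoLengthSeconds) {
        List<String> result = new ArrayList<>();
        for (int hit : hitSeconds) {
            int start = Math.max(0, hit - CONTEXT_SECONDS);
            int end = Math.min(videoLengthSeconds, hit + CONTEXT_SECONDS);
            result.add(formatTime(start) + " - " + formatTime(end));
        }
        return result;
    }

    public static void main(String[] args) {
        // "Virus" spoken at 3:45 and 24:54 in a 30-minute lecture.
        int[] hits = {3 * 60 + 45, 24 * 60 + 54};
        for (String window : windows(hits, 30 * 60)) {
            System.out.println(window);
        }
        // prints:
        // 3:35 - 3:55
        // 24:44 - 25:04
    }
}
```

Clamping matters for hits in the first or last 10 seconds of a video, where the plain subtraction or addition would point outside the file.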

 

Relevance and Application

     

With the explosion of online content, the need for this application will keep increasing, and its use need not be restricted to video lectures in universities; it can be extended to general videos of any genre.

       


2. Solution 

 

Technology Layer

We intend to use JSAPI (the Java Speech API) and JMF (the Java Media Framework) in Java for the development of this application.

UPDATE: We are planning to use the Lucene engine to map the video files to their respective subtitle files, and to enhance the search options available on the subtitle files.

     Using it we will be able to search the text like a Google search, with wildcard support and proximity searches.

     Eg: To find where (in the video/s) two words occur within, say, 10 or 20 words of each other, all we have to search for is: "Router Hub"~10.

                This will search for occurrences of the words "router" and "hub" within 10 words of each other.
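To make the idea of a proximity search concrete, here is a rough plain-Java sketch of what "within 10 words of each other" means on a transcript. It does not use Lucene and does not reproduce how Lucene counts distance internally; the class ProximityCheck, the method within, and the simple tokenizer are assumptions made for illustration.

```java
import java.util.Locale;

/** Hypothetical helper that checks whether two words occur close together. */
final class ProximityCheck {

    /**
     * Returns true if some occurrence of a and some occurrence of b are at most
     * maxDistance word positions apart (in either order), ignoring case.
     */
    static boolean within(String text, String a, String b, int maxDistance) {
        String wordA = a.toLowerCase(Locale.ROOT);
        String wordB = b.toLowerCase(Locale.ROOT);
        // Very simple tokenizer: split on anything that is not a letter or digit.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        int lastA = -1; // position of the most recent occurrence of a
        int lastB = -1; // position of the most recent occurrence of b
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(wordA)) lastA = i;
            if (tokens[i].equals(wordB)) lastB = i;
            if (lastA >= 0 && lastB >= 0 && Math.abs(lastA - lastB) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String line = "The router forwards packets while a hub repeats them";
        System.out.println(within(line, "Router", "Hub", 10)); // prints true
        System.out.println(within(line, "Router", "Hub", 3));  // prints false
    }
}
```

Tracking only the most recent position of each word is enough, because the closest pair always consists of one occurrence and the nearest earlier occurrence of the other word.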

     Other features will include support for the * and ? wildcards and the + (required term) operator.

     Eg: 1."Rout*" will give the list of all the occurrences of words starting with "rout", including routes, route, router, etc.

           2."Route?" will give the list of all the occurrences of words starting with "route" followed by exactly one more character, like router and routes.
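The behaviour of the two wildcards in the examples above can be imitated in plain Java by translating the term into a regular expression. This is only a sketch of the matching rule, not how Lucene implements it; the class WildcardMatcher and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that mimics * and ? wildcard matching on single words. */
final class WildcardMatcher {

    /** Translates a wildcard term into a case-insensitive regular expression. */
    static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*"); // any run of characters, possibly empty
            } else if (c == '?') {
                regex.append('.');  // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns true if the whole word matches the wildcard term. */
    static boolean matches(String wildcard, String word) {
        return compile(wildcard).matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"route", "router", "routes", "routing", "hub"}) {
            System.out.println(word + "  Rout*=" + matches("Rout*", word)
                    + "  Route?=" + matches("Route?", word));
        }
    }
}
```

Note that under this rule "Route?" does not match "route" itself, because ? stands for exactly one character, while "Rout*" matches it because * may stand for any number of characters.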


 

3. R&D

 

R&D Elements in the solution

       

We are going to rely heavily on the concept of "subtitling", since audio-to-text conversion is the core of our application.


Improves on existing solution
There are a few applications on the market that offer subtitling, but that is all there is to them. We intend to take this further and give the user the ability to search through the subtitles and also see the exact point in the video where each occurrence is found.

 

4. Updates:

 

03/02/2011: Successfully extracted audio from .flv files.

05/02/2011: Successfully extracted audio from .flv, .mkv, .avi files.

07/02/2011: Got a working demo of Sphinx. Recognizes "Good morning" and "Hello" followed by ( Paul | Rita | Philip | Bhiksha | Will ).

Comment by Saad Ulde on May 25, 2011 at 8:27pm

Gave the final project demo today in college, and they liked the idea and the way we implemented it! :) :D After a week or so of resting we'll get back to KeyVID again! We'll upload an end-to-end video and give the link here so that you all can see the finalized project! :)

Just to show it in college we had to write the code in a way we didn't like: hard-coding a few parts, going around a problem instead of solving it, and many more crazy things :P But now that we have all the time we need, we can actually sit down and rewrite the code as it should have been, and also add a few concepts which, while working on the project, we thought would enhance the idea even more. :)

I'll be writing a blog post soon detailing the challenges and comical errors we made while developing the project, and also what our future plans for this project are! :)

Comment by Mandar Pande on May 21, 2011 at 3:29pm
This was the input file for the third module, on which the search is performed.
Comment by Mandar Pande on May 21, 2011 at 3:28pm
This is how the output of the third module will look :) Here the search is performed on the file "just.txt" and the keyword searched is "hello"; the times at which the keyword occurs are listed in the table. These occurrences will be converted to links so that the user can jump to the corresponding point in the video.
Comment by Saad Ulde on May 19, 2011 at 4:17am
@Kaushik, Anoop: I can't seem to get it to work for economics-related terms either, but here is what I could get working.

1. Getting bigger sentences transcribed properly -
For the sentence
"the green one on the middle of the right side half way back but on the right side"
I got the following output:
"the green one on the middle all then a side off we back but one that right side"

What I feel here is that there is a particular word set for which (in any combination) it transcribes properly, with nearly 80% accuracy as above, but for any word outside that word set it gives similar-sounding words from the original word set. For example, it transcribes "market" as "on it". :| :(

About the timings, I'll work on it and give an update as soon as I get something working.
Comment by Kaushik on May 17, 2011 at 4:44pm

Saad, let's focus on the basic helloworld voice recognition.

First check if the basic helloworld works,  for the standard words it already has. Do this in a quiet place.

 

Next check the link below, it allows you to add additional words to the grammar.

http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/14/add-recogn...

 

Then see if you can get the additional word to be recognized.

Comment by Saad Ulde on April 28, 2011 at 10:39pm
@Anoop, finally got Sphinx working in Eclipse! It was that stupid version issue that was causing the error: the code you compiled used sphinx4 (beta 6) while I was using sphinx4 (beta 5). Now that it's running I'll tweak the config files to get higher accuracy. Also, I've loaded an American-accent voice which has numbers spoken in it as well as some other words. I'll post the updates here! :)
Comment by Anoop Kunchukuttan on April 26, 2011 at 6:58pm

Mandar,

Searching for multiple occurrences is not important. Remember, we want to retrieve the entire video file given a query (just think of a Google query). Also, having larger text to index will help with ranking. This is because Lucene looks at how many times a word appears in a document, and the more often the word occurs, the more important the document is for the query. This kind of signal won't work well with a line-as-document approach.
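The ranking argument can be illustrated with a small plain-Java sketch. It is not Lucene code, and the raw occurrence count is only a stand-in for Lucene's real scoring; the class TwoStageSearch, the TimedLine type and the idea of storing a start time per transcript line are assumptions made for the example. It scores a whole transcript as one document, and then shows that the individual occurrence times can still be recovered afterwards by scanning the transcript of a returned document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: rank per document, then locate occurrences inside it. */
final class TwoStageSearch {

    /** One transcript line with its start time in seconds (assumed format). */
    static final class TimedLine {
        final int startSeconds;
        final String text;

        TimedLine(int startSeconds, String text) {
            this.startSeconds = startSeconds;
            this.text = text;
        }
    }

    /** Counts whole-word occurrences of keyword in text, ignoring case. */
    static int count(String text, String keyword) {
        String wanted = keyword.toLowerCase(Locale.ROOT);
        int occurrences = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(wanted)) occurrences++;
        }
        return occurrences;
    }

    /** Stage 1: document-level score, the total occurrences in the whole transcript. */
    static int documentScore(List<TimedLine> transcript, String keyword) {
        int total = 0;
        for (TimedLine line : transcript) total += count(line.text, keyword);
        return total;
    }

    /** Stage 2: start times of the lines in one transcript that contain the keyword. */
    static List<Integer> occurrenceTimes(List<TimedLine> transcript, String keyword) {
        List<Integer> times = new ArrayList<>();
        for (TimedLine line : transcript) {
            if (count(line.text, keyword) > 0) times.add(line.startSeconds);
        }
        return times;
    }

    public static void main(String[] args) {
        List<TimedLine> lecture = Arrays.asList(
                new TimedLine(0, "Hello and welcome"),
                new TimedLine(12, "a virus is not a cell"),
                new TimedLine(95, "this virus infects bacteria, that virus does not"));
        System.out.println(documentScore(lecture, "virus"));   // prints 3
        System.out.println(occurrenceTimes(lecture, "virus")); // prints [12, 95]
    }
}
```

With one line per document, almost every score would be 0 or 1, which leaves little to rank on; with one transcript per document, the counts differ between videos.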

Comment by Anoop Kunchukuttan on April 26, 2011 at 6:54pm

Saad, I tried out Sphinx 4. The documentation is good and usage is pretty simple. You could get started with these steps:

1. Download Sphinx 4.1 beta (download the source too, if you want to understand the sample examples).
2. Install JSAPI. Follow the instructions given in doc/jsapi_setup.html
3. See the architecture diagram (doc-files/) and read the Sphinx-4 Application Programmer's Guide (doc/ProgrammerGuide.html) to understand the architecture of a Sphinx app and how to write Sphinx apps
   - The first example (Hello World) recognizes a specific vocabulary. You need to learn the JSGF grammar format to define this vocabulary. I suggest you do not try changing this example. Just run it. (http://www.w3.org/TR/jsgf/)
   - The second example (Hello N-gram) is for the more general case of recognizing speech from all kinds of vocabularies. Understand the configuration file, and try playing around with the parameters.
4. Understand the configuration file. The important things to understand are: the decoder, the recognizer, the language and acoustic models, and the dictionary

The toolkit is pretty rich, and you can mix and match a lot of things with just the config file. For example, detecting and removing background noise will require some front-end configuration.


Comment by Mandar Pande on April 26, 2011 at 4:12pm

But Anoop, if I index the complete file as one document in the index, then I won't be able to search for multiple occurrences of a keyword in a file... but if we index each line separately, we can find multiple occurrences as well...

But still, I'll try to achieve what you are saying and get back to you shortly...

Comment by Anoop Kunchukuttan on April 26, 2011 at 3:10pm

Apologies, I haven't been able to reply over the last couple of days.

 

A few comments:

- Most of Lucene's configuration parameters are pretty low-level stuff like RAM for buffers, number of segments, etc. It should be OK to keep these in config files and change them when required. I don't think it is worth building a UI for them.

- Looks like you have indexed one line at a time. As we discussed earlier, what you should do is consider an entire document as a unit and index the document as a whole. The search should then return documents. By one document, I mean the text contents for one video. It is at this document level that search makes sense.

- Make sure the results are shown in the same order as returned in Lucene's TopDocs, because these are ranked results.

- So the Lucene index should contain the index over the content + metadata to fetch the original document/video. It could be some key to a DB where you are keeping the data or a file system path where you are maintaining the files.

 

Can you try out the following, so that you learn enough of the Lucene functionality required for your project?

- Index multiple files (one index entry per file). The index should contain the following fields - title, content, filepath

- Search with a query over the entire index, using either the content field or the title field

- Return to the user the ranked list of documents. The list should contain the title and the path to the file.

 

With this you should be ready to integrate Lucene into your project.
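Before writing the real thing against Lucene, the three exercise steps above can be mocked up in plain Java to check the intended data flow. The sketch below is an in-memory stand-in, not Lucene: the class TranscriptIndex, its Entry and Hit types, the example file paths, and the use of a raw term count as the score are all assumptions for illustration. Only the field names title, content and filepath come from the list above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical in-memory stand-in for an index with one entry per file. */
final class TranscriptIndex {

    /** One index entry with the three fields named in the exercise. */
    static final class Entry {
        final String title;
        final String content;
        final String filepath;

        Entry(String title, String content, String filepath) {
            this.title = title;
            this.content = content;
            this.filepath = filepath;
        }
    }

    /** What the user sees for each result: title, path, and the score used to rank. */
    static final class Hit {
        final String title;
        final String filepath;
        final int score;

        Hit(String title, String filepath, int score) {
            this.title = title;
            this.filepath = filepath;
            this.score = score;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String title, String content, String filepath) {
        entries.add(new Entry(title, content, filepath));
    }

    /** Counts whole-word occurrences of keyword in text, ignoring case. */
    static int termFrequency(String text, String keyword) {
        String wanted = keyword.toLowerCase(Locale.ROOT);
        int occurrences = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(wanted)) occurrences++;
        }
        return occurrences;
    }

    /** Searches the "title" or "content" field and returns hits, best score first. */
    List<Hit> search(String field, String keyword) {
        List<Hit> hits = new ArrayList<>();
        for (Entry entry : entries) {
            String text = "title".equals(field) ? entry.title : entry.content;
            int score = termFrequency(text, keyword);
            if (score > 0) hits.add(new Hit(entry.title, entry.filepath, score));
        }
        hits.sort(Comparator.comparingInt((Hit h) -> h.score).reversed());
        return hits;
    }

    public static void main(String[] args) {
        TranscriptIndex index = new TranscriptIndex();
        index.add("Networks 1", "a router connects networks and a router forwards packets", "/videos/net1.txt");
        index.add("Networks 2", "a hub repeats and a router routes", "/videos/net2.txt");
        index.add("Biology", "a virus is not a cell", "/videos/bio.txt");
        for (Hit hit : index.search("content", "router")) {
            System.out.println(hit.title + " -> " + hit.filepath);
        }
    }
}
```

In the Lucene version, the ranked order would come from the search results themselves rather than from a hand-written sort, and the filepath field would be stored so that it can be read back from each hit.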

© 2012   Created by Kaushik.
