I make horrible decisions. I’m guessing that’s why I was never very inclined to become a Supreme Court Judge or something of the sort. Whereas most people spend most of their time moving forward in life, I tend to spend most of it repairing all the damage I have caused. One such horrible decision of mine involved rushing into a project with ethically dubious implications. Despite recognizing the mistake, I started working on what I called “Shady Stuff”.
Shady Stuff is how I addressed the task of motion detection in video. As bad as it made me feel doing it (not very very very bad), I did learn a great deal. It was like re-inventing the wheel without knowing that I was re-inventing it. There I sat, in a computer lab, with finals just around the corner, pondering the very simple question: “How do we, humans, see objects and recognize them?”
My task was to write some code that would enable my contractor to detect moving objects in a video and get data relating to them. I started off with the idea that if I could somehow detect which parts of the picture had changed in appearance, I’d be able to detect where the moving object was. Turns out, doing this the naive way is far too time-consuming, even for a computer, because it has to look at each tiny bit (a ‘pixel’) of every picture (ordinarily, videos have thirty pictures for every second of recording). On top of that, computers store each pixel as separate red, green and blue color levels, and comparing each teeny tiny part of an image for variations in all those color levels across all the images is a LOT of work.
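To put a rough number on that workload, here is a small back-of-the-envelope sketch. The thirty pictures per second and the three color levels come from the paragraph above; the 640x480 resolution is purely an assumption for illustration, since the actual size of the video is not stated.

```python
# Back-of-the-envelope count of per-pixel color comparisons.
# ASSUMPTION: the 640x480 resolution is hypothetical; only the 30 frames
# per second and the 3 color channels come from the text.
WIDTH, HEIGHT = 640, 480
CHANNELS = 3            # red, green, blue
FRAMES_PER_SECOND = 30


def comparisons_per_second(width, height, channels, fps):
    """Number of color-level comparisons needed for one second of video."""
    return width * height * channels * fps


total = comparisons_per_second(WIDTH, HEIGHT, CHANNELS, FRAMES_PER_SECOND)
print(f"{total:,} color-level comparisons per second of video")
```

Under those assumed numbers that is a little over 27 million comparisons for every second of footage, before doing anything useful with the result.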
So I sat down the next day and tried to come up with a new method of doing the same thing. This time I thought that subtracting the intensities of two images must be easier. The idea is this: wherever both images have the same intensity I get a zero; otherwise I get some non-zero value, and thus I know that something moved around there. And here is the second problem that I came across: most videos are far from perfect and have some level of noise in them.
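The subtraction idea can be sketched in a few lines. This is a minimal illustration, not the code from the project: frames are assumed to be plain lists of rows of intensity values (0 to 255), and the function name `frame_difference` is made up for this example.

```python
def frame_difference(prev_frame, next_frame):
    """Absolute per-pixel intensity difference of two equal-sized frames.

    ASSUMPTION: each frame is a list of rows, each row a list of
    intensities in the range 0..255.
    """
    return [
        [abs(a - b) for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


# Two tiny 3x3 frames: only the middle pixel changes between them.
frame_a = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame_b = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]

diff = frame_difference(frame_a, frame_b)
for row in diff:
    print(row)
```

Every unchanged pixel comes out as zero, and the one pixel that changed stands out with a large value. With real footage the zeros are rarely exact zeros, which is where the noise problem comes in.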
Now the video I was working with was plagued with a great deal of noise. So began Problem Two - how to get rid of all that noise. Since I am a CS (Computer Science) major, I hadn’t yet taken any course which dealt with the math of filtering out noise. So I sat in labs one week before the exams thinking about how to remove the noise. Foolishly, I first tried to write some algorithms for sorting out noise and surprisingly they worked, but to little effect and with exceedingly slow speed. Then one day I browsed the internet to find how people remove noise from videos and struck upon some amazing signal processing ideas that I had to get my head around. Soon, I was back on track again reducing noise in no time.
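The text does not say which noise-removal technique ended up being used, so the following is only one common option, a 3x3 median filter, shown as a hedged sketch. Each pixel is replaced by the median of itself and its eight neighbours, which wipes out isolated specks. The function name and the list-of-rows frame layout are assumptions for this example.

```python
from statistics import median


def median_filter_3x3(frame):
    """Replace each pixel with the median of its 3x3 neighbourhood.

    ASSUMPTION: this is an illustrative filter, not necessarily the one
    used in the project. Pixels past the border are handled by repeating
    the nearest edge pixel.
    """
    height, width = len(frame), len(frame[0])
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                frame[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered


# A flat grey patch with one bright speck of noise in the middle.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
clean = median_filter_3x3(noisy)
```

A hand-written double loop like this is slow on full-size frames, much like my own first attempts were; it is only meant to show the idea.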
Now I faced a new problem: the feed that I was using was not from a stationary camera, so all this while I had been trying to make a moving camera behave like a stationary one. Two days before my exams were due to begin, I was really tense because I hadn’t studied much and the project was due right after the exams. I then got the go-ahead from the contractor to use a stationary feed, by which time I had also managed to cut the processing time further.
So I turned my awesome new stationary feed black and white, subtracted the single brightness level of each image from that of the next to make a new feed of differences, and filtered out a lot of the noise so that the computer could suggest where objects were at a given time. From that point on, work was simple, so I studied my butt off for the exams and in one night put the remainder of this nightmarishly long, ever-expanding, unpaid-for project to rest.
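Put together, the steps in that paragraph look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's code: the function names are made up, the threshold of 25 is an arbitrary value standing in for the noise adjustment, and the grayscale weights are the standard luma weights (0.299, 0.587, 0.114), which the text does not specify.

```python
def to_gray(rgb_frame):
    """Collapse (r, g, b) pixels into a single brightness level.

    ASSUMPTION: uses the common 0.299/0.587/0.114 luma weights.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_frame
    ]


def motion_mask(prev_gray, next_gray, threshold=25):
    """True wherever brightness changed by more than the threshold.

    ASSUMPTION: the threshold is a hypothetical value; small differences
    below it are treated as noise and ignored.
    """
    return [
        [abs(a - b) > threshold for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_gray, next_gray)
    ]


def bounding_box(mask):
    """Smallest (left, top, right, bottom) box around all changed pixels."""
    points = [(x, y) for y, row in enumerate(mask) for x, moved in enumerate(row) if moved]
    if not points:
        return None  # nothing moved
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Two 4x4 color frames: all black, then two white pixels appear on row 1.
black, white = (0, 0, 0), (255, 255, 255)
frame_1 = [[black] * 4 for _ in range(4)]
frame_2 = [[black] * 4 for _ in range(4)]
frame_2[1][2] = white
frame_2[1][3] = white

mask = motion_mask(to_gray(frame_1), to_gray(frame_2))
box = bounding_box(mask)
print(box)
```

The box is the computer's "suggestion" of where something moved between the two pictures. Working on one brightness level instead of three color levels also cuts the per-pixel work to a third.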
All this somewhat senseless rambling leads one to believe that what humans have achieved and are capable of at the moment is very hard to imitate in the digital world. Making human-like computers is very hard, albeit possible, at the present stage. I achieved a moderate (shhh, I’m being modest here) level of success in my attempts at doing the same thing in the same time as humans do, but who’s to say someone else can’t do it?
Working with images and videos made me understand the value in ‘a picture is worth a thousand words’. In the near future, we must anticipate a large number of applications of computer vision – for instance, I think it’s highly possible to make computers visually determine age, ethnicity, and even race (this is called Population Demographic Collection). In some advanced cases, phenotypic expression (appearance) can be used to make an estimate of the genes of the ‘living thing’, any living thing that is. This, of course, has limited applications for disease detection. The best applications of the future, however, will most likely rely on robotics. Computer vision is our first step towards making robots capable of human interaction. Consider the implications: robots that can recognize objects can be trained to work with them just like we teach young kittens and dogs how to play with a ball.
So: in retrospect, perhaps, I didn’t make the best decision by picking a project in a field I had never worked in before and that too so close to the finals. However, computer vision’s value in demonstrating the practical realities of the future is enormous. This causes me to stop and think - sometimes it pays to be a fool.
Haris