mattbell: (Default)
This video reminded me that a lot of our dreams about robotics from the last 50 years are coming to fruition.  Ideas long deferred are starting to roll out quickly in research labs.  There are of course significant barriers to deploying some of these systems for consumer use, but I expect that to change soon.

Watch a team of autonomous helicopters build a primitive structure:
http://www.youtube.com/watch?v=W18Z3UnnS_0 -- lj sucks... youtube embed is broken


Developments are coming quickly now; I was very excited by Google's ambitious yet underreported Manhattan Project-style effort to build a truly robust self-driving car.
Here's what I think will make the robotics revolution happen:

- Complex semi-automated production lines that can build robotic toys like the Pleo, which has almost 2000 parts -- This will make production costs for complex robots low enough to be viable for home use.
- 3D vision technologies like the PrimeSense camera (which is used in the Kinect I'm so fond of) -- This will enable robots to easily see and maneuver through a wide variety of environments.  A lot of hard 2D vision problems become easy in 3D (see the sketch after this list).
- Standards for robotics software systems such as Willow Garage's ROS, which will simplify the development of hardware and software ecosystems for robotics.  Willow Garage isn't trying very hard to make money despite being for-profit, but they are helping create the substrate for a new industry.
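To make the 2D-vs-3D point concrete: with a depth camera, separating a person from the background, a genuinely hard problem in 2D, collapses to a single threshold.  Here's a minimal sketch in Python, assuming a Kinect-style depth image as a numpy array of millimeter values (the function name and distance ranges are just illustrative):

```python
import numpy as np

def segment_foreground(depth_mm, near=500, far=1500):
    """Return a boolean mask of pixels between `near` and `far` millimeters.

    With a depth map, pulling a person out of the background is a single
    threshold; the same task on a 2D color image needs fragile background
    modeling.  Depth 0 means "no reading" on Kinect-style sensors, so it
    is excluded automatically by the lower bound.
    """
    return (depth_mm > near) & (depth_mm < far)

# Toy usage with a synthetic 480x640 depth frame:
depth = np.full((480, 640), 3000, dtype=np.uint16)   # far wall at 3 m
depth[100:300, 200:400] = 1000                       # "person" at 1 m
mask = segment_foreground(depth)
print(mask.sum(), "foreground pixels")
```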

---

Anyway, I'm home sick.   Time for a nap.
mattbell: (Default)
I've been working on other things with the Kinect, but I do want to keep making multiple reality videos.

I got a couple of friends who do acroyoga to come over.  Here's what we made:


In case you haven't been following along:

I wrote some software to merge multiple 3D video streams captured by the Kinect into a single 3D space. Objects from each video stream are superimposed as if they occupy the same physical space, with nearby objects from one video occluding more distant ones from another. Sometimes objects overlap, creating interesting mutant forms.
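The core compositing rule is simpler than it sounds.  My actual software isn't public, but here's a minimal sketch of the idea in Python, assuming each stream arrives as aligned RGB and depth numpy arrays (all names are illustrative):

```python
import numpy as np

def merge_streams(rgb_a, depth_a, rgb_b, depth_b):
    """Composite two RGB+depth frames into one, keeping whichever
    surface is nearer at each pixel -- so objects from one stream
    correctly occlude more distant objects from the other.

    Depth 0 means "no reading", so it is treated as infinitely far.
    """
    da = np.where(depth_a == 0, np.inf, depth_a)
    db = np.where(depth_b == 0, np.inf, depth_b)
    a_wins = da <= db                      # True where stream A is closer
    rgb = np.where(a_wins[..., None], rgb_a, rgb_b)
    depth = np.where(a_wins, depth_a, depth_b)
    return rgb, depth
```

The "mutant forms" fall out of this rule for free: wherever two bodies sit at nearly the same depth, the per-pixel winner flips back and forth and the surfaces interleave.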
mattbell: (Default)
As most of you know, I've spent the vast majority of the last two years not working.  Instead I've chosen to focus on developing other skills and experiences: traveling the world, fixing my insomnia, improving my nutrition, developing an exercise plan that's changed my body, improved my health, and taught me lots of fun physical skills (rock climbing, snowboarding, parkour, yoga, and hang gliding), doing lots of creative projects, and getting involved with the disorganization of the Ephemerisle festival.

Despite the fun of my laid-back adventures, I've been missing working on a big, meaty, potentially worldchanging project.  I have looked at various opportunities, but I've been hesitant to jump into anything, knowing firsthand just how much work a startup can be.

However, at this point, I'm excited enough about the new possibilities created by low-cost 3D computer vision that I'm eager to start something new.  Technologies like the Kinect allow people to capture the world around them in 3D, enabling them to easily bridge between the physical and virtual worlds.

How important is 3D capture?  I think it will ultimately become as important as photography.  By capturing objects and environments in 3D, you will be able to do many things you cannot do with photographs.  You will be able to rotate around objects and see them from many perspectives, or walk through real environments as virtual worlds.  It's the difference between looking at a scene and being *in* the scene.  Better yet, you will be able to seamlessly mix physical and virtual worlds: you could upload all your favorite physical objects into an online virtual world, drop virtual annotations and objects onto a physical environment, or preview changes to the physical world (such as new furniture in your living room or new clothing on your body), among numerous other things.  While many of these things are happening already, they have not been within reach of consumers until now.

While some of the more far-out visions for the seamless merging of physical and virtual worlds will take years to come to fruition, I'm looking at some ways that I can provide some useful tools (and make some money) in the short term.  Unlike my last company, which took on a lot of funding and became divorced from the realities of the market, I intend to dramatically shorten the cycle of market feedback.  

I'm developing a toolset that will make it as easy as possible to use a Kinect for various 3D capture applications.  I should leave the specifics out of this public post, but I encourage those of you who share an interest in the possibilities of 3D vision to contact me.  I'm already working with two potential clients.  

This is all very exciting, which is exactly what work should be.
mattbell: (Default)
Lest I get lazy on a Friday night, I made a third Kinect video.  Here's another fun thing you can do with your own software and a 3D camera:

By taking a 3D snapshot of the room with the furniture in it, I can remove the furniture and then wander through the 3D "ghost" space left behind.
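For those curious how "wandering" through a frozen snapshot works: the trick is to unproject the depth image into a point cloud, which a virtual camera can then fly through.  A rough sketch, assuming standard pinhole-camera math and placeholder Kinect intrinsics (real devices need calibration):

```python
import numpy as np

# Rough intrinsics for the Kinect depth camera (assumed values).
FX, FY = 594.2, 591.0   # focal lengths in pixels
CX, CY = 320.0, 240.0   # principal point

def depth_to_point_cloud(depth_mm):
    """Unproject a 640x480 depth image (millimeters) into an Nx3 array
    of points in camera space.  Once the snapshot lives as a point
    cloud, a virtual camera can wander through it freely, even after
    the physical furniture is gone."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0   # convert to meters
    valid = z > 0                              # drop missing readings
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.stack([x[valid], y[valid], z[valid]], axis=1)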
mattbell: (Default)
I made some improvements to my program from yesterday.  Now I can control how multiple RGB/Depth images are merged together to create a virtual 3D sculpture I can walk through.  This stuff is seriously fun.
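The "controlling how they're merged" part comes down to giving each capture its own rigid transform before the clouds are combined.  A minimal sketch, reusing the depth_to_point_cloud helper from the ghost-space sketch above (the transform values here are made up):

```python
import numpy as np

def place_cloud(points, rotation=np.eye(3), translation=np.zeros(3)):
    """Apply a rigid transform to one capture's Nx3 point cloud, so each
    RGB/depth snapshot can be positioned independently in the shared
    scene before merging."""
    return points @ rotation.T + translation

# Hypothetical usage: shift a second capture half a meter to the right,
# then merge the two clouds into one walkable "sculpture".
# cloud_a = depth_to_point_cloud(snap_a)
# cloud_b = depth_to_point_cloud(snap_b)
# sculpture = np.vstack([
#     cloud_a,
#     place_cloud(cloud_b, translation=np.array([0.5, 0.0, 0.0])),
# ])
```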

mattbell: (Default)
Videosurf is a new video search engine that has somehow managed to design a better video search site than YouTube while getting permission from YouTube to use their entire database, as well as the databases of other sites.  The interface is even more Google-y than YouTube's: it lets you refine searches by video length and other criteria, and offers little thumbnail snippets that are automatically chosen to give you a sense of the different parts of each video.  (It's doing image processing on the video to find the most relevant parts and to learn more about the content.)  You can click the snippets to jump directly to the relevant part of the video, which lets you deal with those annoying videos where nothing happens for the first minute or two.

Check it out:

videosurf.com


mattbell: (Default)
I'm giving a computer vision talk at the H+ conference on Dec 5-6 in Irvine near Los Angeles. I could just pop in and out of there via the Santa Ana airport, or I could turn it into an interesting adventure through LA. Anyone interested in meeting up?
mattbell: (Default)
I went to a computer vision conference today.  It reminded me of how much I like computer vision.  It has hard math.  It requires balancing the exactitude of computer science against the fuzziness of the real world.  It is a crucial stepping stone on the path to artificial intelligence.  It bridges the gap between the physical and virtual worlds.  It involves making demos with pretty pictures.  What's not to like?

Here are some of the coolest things I saw:

Computer vision and fashion:  like.com and covet.com
These services are an unlikely but highly innovative merging of computer vision and women's fashion.  When you select items you like, they use image recognition algorithms to find aesthetically similar alternatives and accessories.  Unsurprisingly, they're best on things with patterns (e.g. floral print dresses) and tend to confuse different styles that share the same color.  It's still very impressive though.  On covet.com, there's a "get to know your style" app where you repeatedly pick which of two clothing styles you like better; an algorithm then analyzes the clothing you chose for pattern, shape, and texture.  The trouble is that all the photos are of Hollywood actresses wearing what I usually regard as fairly ugly stuff.  I told them they need a "neither" button and more style variety.  Still, they've managed to do very well in a bad economy.  Although they are a website, they are a feeder for online retailers, and thus can make a ton of money off affiliate fees instead of depending on advertising.  It's a great place to be.
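Their actual algorithms are proprietary, but a toy color-histogram comparison shows why same-colored styles get confused: color dominates the signal.  A minimal sketch in Python (RGB images as numpy arrays assumed; this illustrates the failure mode, it is not their method):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Coarse, normalized RGB histogram: a crude 'aesthetic fingerprint'
    of an (H, W, 3) uint8 image."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins, bins, bins), range=[(0, 256)] * 3
    )
    return hist.ravel() / hist.sum()

def similarity(img_a, img_b):
    """Histogram intersection in [0, 1]; higher means more similar colors.
    Two very different dresses in the same shade of red score high here,
    which is exactly the confusion described above."""
    return np.minimum(color_histogram(img_a), color_histogram(img_b)).sum()
```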

SnapTell
SnapTell is one of the most useful iPhone applications I've seen.  You can take a photograph of any book, video game, CD, or DVD, and it will recognize it within a few seconds.  The recognition is done on a remote server.  Once the item is recognized, you can see the product on Amazon, Barnes & Noble, and various other retailers' sites, as well as read reviews and other useful information.  The company was bought by Amazon just in the last couple of weeks.
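SnapTell hasn't published its pipeline, but a toy "difference hash" conveys the flavor of cover matching: near-duplicate photos of the same cover produce nearly identical bit strings, so a small Hamming distance suggests a match.  Real systems use far more robust local features; this is only a sketch, using Pillow:

```python
from PIL import Image

def dhash(path, size=8):
    """Difference hash: shrink to (size+1) x size grayscale, then record
    whether each pixel is brighter than its right-hand neighbor.
    Lighting and resolution changes barely move the bits."""
    img = Image.open(path).convert("L").resize((size + 1, size))
    px = list(img.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = px[row * (size + 1) + col]
            right = px[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a, b):
    """Roughly: fewer than ~10 differing bits suggests the same cover."""
    return bin(a ^ b).count("1")
```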

Watching dreams with MRIs (research page here)
A Berkeley researcher has built a map of the human vision system that can, with very high accuracy, figure out what part of what movie you're watching from a dataset of 10,000 hours of video. It builds up a model of your vision system by watching you look at pictures in an MRI for a while.  Then this model can be run in reverse to find pictures that correspond to your current vision activity.  It can't tell in detail what you're looking at, but it's very good at finding similar scenes.  One of the things the researcher wants to do in the future is use the device to decipher the contents of people's dreams as they are having them.  The technology could potentially be used in the future to read your verbal thoughts.  It's unclear how far away these sci-fi goals are, but the amount of progress that's been made gives me goosebumps.  

I was also impressed at the various efforts at scene recognition.  Researchers are getting a lot better at labeling the various objects in a scene, which is something that even a mouse can do easily but computers have a very hard time with.  There's been a lot of progress in the last three years.

mattbell: (Default)
There's an interesting New York Times article about the beginnings of augmented reality programs that layer data onto the real world as seen through your cellphone's camera.  Thus, you can read about local businesses while looking at them.

http://www.nytimes.com/2009/07/12/business/12proto.html?hpw

mattbell: (Default)
Back in 2003 I took several hundred pictures of my apartment, covering every corner of it from multiple angles.  I figured that within a decade, it would be possible to use software tools to reconstruct the whole place in 3D.  As it turns out, they're already part of the way there:

This service requires 5-15 photos of an object to reconstruct it in 3D, and probably can't handle shiny surfaces very well.  However, it's progress.

mattbell: (Default)
It turns out that one of the most impressive tech demos from SIGGRAPH 2007 is already in Photoshop CS4.

Content-aware scaling lets you resize photos without cropping or stretching. Instead, it just compresses the least interesting parts of the image.

Confused?

Check out this video (skip the first minute if you don't want the backstory):
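For those who'd rather read code than watch a video: the technique comes from the SIGGRAPH paper on seam carving, and its core, an energy map plus dynamic programming to find the cheapest vertical path to delete, fits in a short sketch.  This assumes a grayscale numpy image; Photoshop's actual implementation is certainly more elaborate:

```python
import numpy as np

def remove_vertical_seam(gray):
    """Remove one 8-connected vertical path of lowest total gradient
    energy, shrinking the image by a single column while leaving the
    'interesting' high-gradient regions untouched."""
    h, w = gray.shape

    # Energy: absolute gradient magnitude (edges are "interesting").
    gy, gx = np.gradient(gray.astype(np.float64))
    energy = np.abs(gx) + np.abs(gy)

    # Dynamic programming: cost[i, j] = cheapest path cost from the top.
    cost = energy.copy()
    for i in range(1, h):
        left = np.roll(cost[i - 1], 1)    # upper-left neighbor
        right = np.roll(cost[i - 1], -1)  # upper-right neighbor
        left[0] = np.inf                  # kill the wrap-around
        right[-1] = np.inf
        cost[i] += np.minimum(np.minimum(left, cost[i - 1]), right)

    # Backtrack the cheapest seam from bottom to top.
    seam = np.zeros(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))

    # Drop the seam pixel from each row.
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return gray[mask].reshape(h, w - 1)
```

Run it repeatedly and the image narrows column by column, always eating through the flattest regions first.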

mattbell: (Default)
This is one of the most interesting applications I saw over the past weekend:




It takes everyone's vacation photos online, and then uses image processing algorithms to match features and figure out the shape of what had been photographed and the position it had been photographed from.  Then, the pictures are stitched together in 3D so that you can navigate from picture to picture.  It's a beautiful concept, and it shows how online photo sharing can be taken to a totally new level.
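The full pipeline is heavy machinery (structure from motion over thousands of photos), but its first step, matching the same physical features across two photos, can be sketched with OpenCV.  This is a generic modern illustration, not the system's actual code (assumes opencv-python is installed):

```python
import cv2

def match_features(path_a, path_b, max_matches=50):
    """Find corresponding feature points between two photos of the same
    scene -- the raw material that 3D photo stitching is built on."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=2000)          # fast binary features
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    # Each match pairs a pixel in one photo with a pixel in the other;
    # enough of these constrain the two cameras' relative 3D positions.
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt)
            for m in matches[:max_matches]]
```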


About 4 years ago, I took about 500 photos of every corner of my apartment, figuring that the technology would exist one day to use all those photos to reconstruct my apartment in 3D.  I guess that day is closer than I thought.
