Update: check out my new post about this https://www.morethantechnical.com/2012/10/17/head-pose-estimation-with-opencv-opengl-revisited-w-code/
Hi
Just wanted to share a small thing I did with OpenCV – Head Pose Estimation (sometimes loosely called Gaze Direction Estimation). Many people try to achieve this and there are a ton of papers covering it, including a recent overview of almost all known methods.
I implemented a very quick & dirty solution based on OpenCV’s built-in methods that produced surprisingly good results (I expected it to fail), so I decided to share. It is based on 3D-2D point correspondences: a few 2D points in the image are matched to the same points on a 3D model, and the pose that best fits the model to them is recovered. OpenCV provides a magical method – solvePnP – that does this. It expects camera calibration parameters, which I completely disregarded.
Here’s how it’s done
Hi
Been working hard on a project for school the past month, implementing one of the more interesting works I’ve seen in the AR arena: Parallel Tracking and Mapping (PTAM) [PDF]. This is a work by Georg Klein [homepage] and David Murray from Oxford University, presented at ISMAR 2007.
When I first saw it on YouTube [link] I immediately saw the immense potential – mobile markerless augmented reality. I thought I should get to know this work a bit more closely, so I chose to implement it as part of the advanced computer vision course given by Dr. Lior Wolf [link] at TAU.
The work is very extensive, and clearly is a result of deep research in the field, so I set out to achieve a few selected features: stereo initialization, tracking, and upkeep of a small map. I chose not to implement relocalization and full map handling.
This post is kind of a tutorial for 3D reconstruction with OpenCV 2.0. I will show practical use of the functions in cvtriangulation.cpp, which are not documented and in fact incomplete. Furthermore I’ll show how to easily combine OpenCV and OpenGL for 3D augmentations, a thing which is only briefly described in the docs or online.
Here are the steps I took and the things I learned in the process of implementing the work.
Update: A nice patch by yazor fixes the video mismatching – thanks! Also, a nice application by Zentium called “iKat” is doing some kick-ass mobile markerless augmented reality.
Hi All
It looks like it’s finally here – a way to grab the raw data of the camera frames on the iPhone OS 3.x.
Update: Apple officially supports this in iOS 4.x using AVFoundation, here’s sample code from Apple developer.
A gifted hacker named John DeWeese was nice enough to comment on a post from May ’09 with his method of hacking the APIs to get the frames. Though cumbersome, it looks like it should work, but I haven’t tried it yet. I promise to try it soon and share my results.
Way to go John!
Some code would be awesome…
Roy.
Hi
In the past few weeks I have been working hard on a few end-of-term projects at Uni. One of them, which I called “SmartHome”, is a home monitoring [link] application for the Embedded Computing [link] course. In the course the students were given an education board based on the LPC2148 ARM7 MCU (NXP), made by Embedded Artists [link]. My partner Gil and I decided to work with ZigBee extension modules [link] to enable remote communication.
Here are the steps we took to bring this project to life.
Links of the week
http://www.runnersworld.com/article/1,7124,s6-240-319–13001-0,00.html
Shoe-tying hacks
http://www.engadget.com/2010/01/18/misa-digital-guitar-cuts-the-strings-brings-the-noise/
Very nice! A digital guitar…
http://www.newscientist.com/article/dn18036
An interesting concept – see-through walls w/ augmented reality
http://gizmodo.com/5452140/one-third-of-us-11+year+olds-have-cellphones
The “Youth market”’s little brother – the “Toddler market” – is booming
http://gizmodo.com/5451876/rumor-apple-iphone-os-40-features-detailed
Some goodies from iPhone OS 4 – where is video-pixel-bytes access already?!
http://lifehacker.com/5452786/memorize-now-helps-you-commit-long-passages-to-memory
I like! A helper webapp to memorize text
http://gizmodo.com/5452684/voice-band-iphone-app-converts-bah-ba-ba-bah-into—
This is awesome.
http://gizmodo.com/5453436/googles-html5-youtube-videos-dont-need-flash
YouTube without Flash: I tried it on Chrome; the video was choppy, volume control didn’t work properly, and the progressive download & play made the position marker bounce around. But in the end, anything that replaces Flash, and Adobe’s reign over internet interactive animation, is good.
C ya’ll next week!
Roy.
Hi
Stuff I picked up on the web the last week:
http://www.billshrink.com/blog/nexus-one-vs-iphone-droid-palm-pre-total-cost-of-ownership/
Compare the leading smartphones on the market
http://www.techcrunch.com/2010/01/05/quantcast-mobile-web-apple-android/
Mobile web usage stats: iPhone 65%, Android 12%, RIM 9%
http://gizmodo.com/5442217/the-invisible-oled-laptop-to-end-all-laptops
A transparent screen – Cool? yes. Practical? Not so much.
http://www.techcrunch.com/2010/01/06/augmented-reality-vs-virtual-reality/
Augmented reality is officially more popular than virtual reality.
http://gizmodo.com/5439721/new-touchless-mobile-interface-could-eliminate-fingerprint-smudging-forever
You don’t need a mouse anymore (if you have a 154 frames-per-second camera, and very steady hands)
http://gizmodo.com/5442385/samsung-projector-phone-in-action
Samsung’s projector mobile phone in action at CES
http://weblogs.baltimoresun.com/news/technology/2010/01/apple_tablet_3d.html
Apple is putting proximity sensors on a new device to allow for 3D desktop manipulation.
http://gizmodo.com/5441682/att-sdk-for-dumbphones-announced
AT&T goes app-store on dumbphones, releases SDK for BREW
C y’all next week!
Roy
Just links [Links of the week]
Hi
Stuff I picked up on the web the last week:
http://www.techcrunch.com/2009/12/21/world-map-social-networks/
Never tired of infographics: World map of social networks
http://gizmodo.com/5429631/implausible-digital-forensics-in-tv-and-film-a-medley
Awe-some and then some. Enhance.
http://lifehacker.com/5431998/ribbit-app-delivers-voicemail-transcripts-to-your-iphone
Give Ribbit credit for the bold stand in front of G-Voice.
http://gizmodo.com/5428610/rumor-google-working-on-chrome-os+branded-netbook-with-one-secret-manufacturer
A G netbook! That’s what we’ve been missing! Not.
http://gizmodo.com/5428642/apple-patent-sees-you-computing-hands+free-in-3d
3D interface by Apple, based on position of user.
http://gizmodo.com/5433074/open-apps-on-a-virtual-iphone-thanks-to-augmented-reality
Orange Israel promoting iPhones in a cute way: iPhone inside iPhone with AR.
http://www.techcrunch.com/2009/12/23/confirmed-jajah-sold-207-million
JaJah sold to Telefonica (O2) – for 145 million euros!
See ya’ll next week!
Roy.
Leenky wiks! [Links of the week]
Hello people of high measure,
Forth are listed the hyperlinks thy humble servant hath collected in past days:
http://gizmodo.com/5423006/i-cant-stop-smiling-over-google-chromes-new-ad
Nice ad by G for Chrome!
http://www.techcrunch.com/2009/12/09/geoapi-creation/
Very interesting: huge database of geo-tagged information with API for developers.
http://gizmodo.com/5424468/mits-bidirectional-display-lets-you-control-objects-with-a-wave-of-your-hand
I saw this contraption in the lab and the guy demoed it for me. It’s strange looking, but an interesting concept.
http://gizmodo.com/5425146/the-real-google-phone-everything-is-different-now
A G phone!
This is only a fraction of the online buzz about…
http://gizmodo.com/5425012/the-pen-de-touch-for-driving-light-cycles
Air-Pen. Nice implementation. But is this how we’re going to interface with computers in the future? Don’t think so.
http://www.techcrunch.com/2009/12/14/4g-mobile-network-sweden-teliasonera/
Välkommen till Sverige – LTE! (“Welcome to Sweden – LTE!” in Swedish)
TeliaSonera launching LTE
http://www.techcrunch.com/2009/12/14/the-unofficial-google-text-to-speech-api/
Free Text-To-Speech from G! Hurrah!
http://gizmodo.com/5425874/fuse-what-your-next-touch-phone-is-going-to-feel-like
TAT are as usual a good indicator of future UI (Social AR…). This time: 3D UI, haptic interface.
http://gizmodo.com/5426963/the-android-market-is-getting-ready-to-explode
Some stats on the Android market, looks good.
http://www.techcrunch.com/2009/12/16/google-browser-size/
G seems very committed to making the web better: first tools for webmasters to make their sites faster, and now one for making sure the layout fits the visitor’s browser size.
http://gizmodo.com/5428174/shooting-challenge-anthropomorphism
Anthropomorphism: Look at the video (though it’s an ad), it will put a smile on your face!
and then go shoot some anthropomorphous objects.
http://gizmodo.com/5428233/microsoft-and-palm-treading-water-while-other-mobile-platforms-grow
Mobile OS stats (Feb-Oct): Apple and RIM skyrocket! WinMo, Symb, Palm, Android – stagnate.
Enjoy thy weekend!
Roy.
Justin Talbot has done a tremendous job implementing the GrabCut algorithm in C++ [link to paper, link to code]. What I was missing, though, was the option to load ANY kind of file, not just PPMs and PGMs.
So I tweaked the code a bit to receive a filename and determine how to load it: use the internal P[P|G]M loaders, or offload the work to the OpenCV image loaders, which take in many more types. If the OpenCV method is used, the IplImage is converted to the internal GrabCut code representation.
    // Pick a loader from the file name: the built-in PGM/PPM readers,
    // or OpenCV for everything else.
    Image<Color>* load( std::string file_name )
    {
        if( file_name.find( ".pgm" ) != std::string::npos )
        {
            return loadFromPGM( file_name );
        }
        else if( file_name.find( ".ppm" ) != std::string::npos )
        {
            return loadFromPPM( file_name );
        }
        else
        {
            return loadOpenCV(file_name);
        }
    }

    // Write a GrabCut mask into a single-channel IplImage:
    // mask values of 0 become 255, everything else becomes 0.
    void fromImageMaskToIplImage(const Image<Real>* image, IplImage* ipli)
    {
        for(int x=0;x<image->width();x++) {
            for(int y=0;y<image->height();y++) {
                //Color c = (*image)(x,y);
                Real r = (*image)(x,y);
                CvScalar s = cvScalarAll(0);
                if(r == 0.0) {
                    s.val[0] = 255.0;
                }
                // Rows are flipped: GrabCut row y goes to IplImage row height - y - 1.
                cvSet2D(ipli,ipli->height - y - 1,x,s);
            }
        }
    }

    // Convert an 8-bit BGR IplImage to the GrabCut Image<Color>
    // representation (RGB, each channel scaled to [0,1], rows flipped).
    Image<Color>* loadIplImage(IplImage* im)
    {
        Image<Color>* image = new Image<Color>(im->width, im->height);
        for(int x=0;x<im->width;x++) {
            for(int y=0;y<im->height;y++) {
                CvScalar v = cvGet2D(im,im->height-y-1,x);
                Real R, G, B;
                R = (Real)((unsigned char)v.val[2])/255.0f;
                G = (Real)((unsigned char)v.val[1])/255.0f;
                B = (Real)((unsigned char)v.val[0])/255.0f;
                (*image)(x,y) = Color(R,G,B);
            }
        }
        return image;
    }

    // Load any format OpenCV understands, forced to 3-channel color.
    Image<Color>* loadOpenCV(std::string file_name)
    {
        IplImage* im = cvLoadImage(file_name.c_str(),1);
        if( im == NULL )
        {
            return NULL; // cvLoadImage returns NULL when the file cannot be read
        }
        Image<Color>* i = loadIplImage(im);
        cvReleaseImage(&im);
        return i;
    }
Well, there’s nothing fancy here, but it does give you a fully working GrabCut implementation on top of OpenCV… so there’s the contribution.
    GrabCutNS::Image<GrabCutNS::Color>* imageGC = GrabCutNS::loadIplImage(orig);
    GrabCutNS::Image<GrabCutNS::Color>* maskGC = GrabCutNS::loadIplImage(mask);
    GrabCutNS::GrabCut *grabCut = new GrabCutNS::GrabCut( imageGC );
    grabCut->initializeWithMask(maskGC);
    grabCut->fitGMMs();
    //grabCut->refineOnce();
    grabCut->refine();
    IplImage* __GCtmp = cvCreateImage(cvSize(orig->width,orig->height),8,1);
    GrabCutNS::fromImageMaskToIplImage(grabCut->getAlphaImage(),__GCtmp);
    //cvShowImage("result",image);
    cvShowImage("tmp",__GCtmp);
    cvWaitKey(30);
I also added the GrabCutNS namespace, to differentiate the Image class from the rest of the code (that probably has an Image already).
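One small caveat about load() above: find() matches ".pgm" or ".ppm" anywhere in the name and is case-sensitive, so a file like scan.pgm.png would be sent to the PGM loader and FACE.PGM would go to OpenCV. If that matters to you, a stricter check could look roughly like the sketch below. The helper name hasExtension is hypothetical and is not part of the GrabCut code or my changes to it.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true when file_name ends with ext (for example
// ".pgm"), compared case-insensitively.
bool hasExtension(const std::string& file_name, const std::string& ext)
{
    if (file_name.size() < ext.size()) {
        return false;
    }
    // Walk both strings from the back so only the tail of file_name is checked.
    return std::equal(ext.rbegin(), ext.rend(), file_name.rbegin(),
                      [](char a, char b) {
                          return std::tolower(static_cast<unsigned char>(a)) ==
                                 std::tolower(static_cast<unsigned char>(b));
                      });
}
```

In load() this would replace each find() test, for example hasExtension(file_name, ".pgm") in the first branch.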
Code is as usual available online in the SVN repo.
Enjoy!
Roy.
Hi good people,
Well, after a two-week break, I’m back with some more interesting weekly links.
So here’s what I’ve picked up on the web the past 2 weeks:
http://gizmodo.com/5409898/3d-scanning-a-webcams-latest-trick
Awesome 3D reconstruction work with only a webcam! I dig that stuff.
http://www.techcrunch.com/2009/11/20/hot-potato-launch
Me like! An old-new mobile social concept comes to life: Ad-Hoc Social Networks meet Foursquare.
http://www.techcrunch.com/2009/11/23/apple-and-android-now-make-up-75-percent-of-u-s-mobile-web-traffic/
Current usage numbers for mobile OSs and devices:
iPhone is slightly rising still: 48% Sep09 to 55% Oct09, Android is growing – 17% Sep09 to 20% Oct09, and Palm is losing height – 10% Sep09 to 5% Oct09.
http://www.techcrunch.com/2009/11/23/linkedin-api-open/
LinkedIn API – well it’s about time!
http://gizmodo.com/5411779/swype-vs-qwerty-fight
The new Virtual Keyboard typing method: SWYPE, takes on the good ol’ QWERTY.
Check out Layar 3.0, awesome 3Ds in AR:
http://gizmodo.com/5417946/dear-new-layar-30-you-got-me-at-beatles
http://www.techcrunch.com/2009/12/02/layar-3-0-mobile-augmented-reality/
The HTML 5.0 craze is upon us, and it is going to change everything we know and think about web applications:
http://www.mobilecrunch.com/2009/12/02/video-webgl-might-eventually-bring-awesome-3d-to-web-apps
http://lifehacker.com/5416100/how-html5-will-change-the-way-you-use-the-web
http://lifehacker.com/5417088/create-abstract-light-art-by-snapping-a-camera+toss-photo/gallery/
Instant wallpaper photo maker: Slow shutter and just throw the camera in the air!
http://gizmodo.com/5420164/htc-2010-product-roadmap-leak-legends-salsa-buzz
This is what HTC has in store for us next year
http://gizmodo.com/5421983/lumino-project-next+generation-lego-crossed-with-microsoft-surface
Surface UI! Not new, but this advancement is very interesting.
http://lifehacker.com/5421670/negotiate-anything-by-keeping-three-things-in-mind
Nice video lecture, take a look at 5:00 – speaking about innovative ideas and solutions while negotiating:
Instead of thinking “what will I gain right now”, think about making a long-term partnership with the other side.
http://lifehacker.com/5422413/vevo-is-where-your-missing-music-videos-went-to
My intuition says this is the beginning of a war on YouTube, in the Copyrights arena. Sporadic things have been done already, now let’s have a showdown!
Google is taking over the world – again! Corner
http://lifehacker.com/5411108/google-puts-coupons-on-your-phone-so-you-can-forget-the-scissors
Google coupons!
And then, just as they started to pick up, came Google and busted them all down.
Big G steps into the restaurant-ranking business. Yelp, Zagat – behind you!!
http://gizmodo.com/5420737/in-the-future-we-all-will-be-google+approved
http://www.techcrunch.com/2009/12/06/google-local-maps-qr-code/
http://www.techcrunch.com/2009/12/07/google-realtime/
Big G goes realtime web… The REAL realtime web.
http://gizmodo.com/5420894/google-goggles-googles-scary-good-visual-search-app
Big G keeps on rolling… now conquering the visual search domain. What else?
http://lifehacker.com/5408499/youtube-adds-machine+generated-automatic-captions
Speech-to-text is a piece of cake for Google…
Have a cracking weekend and See ya’ll next week!
Roy.