Categories
Uncategorized

Israeli Hebrew names dataset (מאגר שמות ישראליים בעברית)

This is the first post on the blog written in Hebrew, since it deals with Israeli names in Hebrew. Recently I found myself mining names from web pages, and I quickly realized I would not get far without a list of words that are actually names, so that I could easily separate the names from the rest of the text.
I could not find a simple list like that, although the MILA (מ.י.ל.ה) site at the Technion has a fairly extensive lexicon of Hebrew words that also tags names. The names could easily be pulled out of it with JAXB over the XML schema, but I did not do that for lack of time and patience.
So I made a list myself. I started from a database of names I already had, split each entry into first name and last name on the spaces, and then started the mining work, which added a great many names to the database.
After that I went back to my database and counted how many times each name appears as a first name and as a last name, to help with future mining. This makes it possible to find more names, for example by taking the word that comes right before a very distinctive last name.
That said, some names are very ambiguous between first name and last name, for example “גל” (Gal), “שלום” (Shalom) or “ברק” (Barak). Others are distinctive one way or the other, like “אהוד” (Ehud) or “לוי” (Levi).
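The split-and-count step described above can be sketched roughly like this. It is a minimal illustration, not the code used to build the list: the function names and the four sample names are made up, and it assumes the first whitespace-separated token is the first name and the last token is the last name.

```python
from collections import Counter


def count_name_roles(full_names):
    """Count how often each token appears as a first name and as a last name."""
    first_counts, last_counts = Counter(), Counter()
    for full_name in full_names:
        parts = full_name.split()
        if len(parts) < 2:
            continue  # a single token cannot be assigned a role
        first_counts[parts[0]] += 1   # assumption: first token = first name
        last_counts[parts[-1]] += 1   # assumption: last token = last name
    return first_counts, last_counts


def last_name_ratio(name, first_counts, last_counts):
    """Fraction of sightings where `name` was a last name (None if unseen)."""
    total = first_counts[name] + last_counts[name]
    return last_counts[name] / total if total else None


# Made-up sample, only to show the shape of the data.
sample = ["אהוד לוי", "גל לוי", "אהוד גל", "ברק לוי"]
first_counts, last_counts = count_name_roles(sample)
```

A ratio close to 1.0 marks a distinctive last name, close to 0.0 a distinctive first name, and values in between mark the ambiguous ones.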
In any case, here is the list, free for you to use.
Please keep in mind that the list is very partial, and so are the counts.
hebrew_names
This is the first Hebrew-language post on the MTT blog, since it deals with names in Hebrew. This is not a translation of the above text, just a preamble to it. I’ve collected a list of Hebrew first and last names and counted how many times each name appears as a first name and as a last name in a private database of names. The result may be useful for anyone extracting Hebrew names from the web.
Enjoy!
Roy.

Categories
3d Augmented Reality code graphics Mapping opengl programming Tracking video vision

Bootstrapping planar AR and tracking without markers [w/code]

Years ago I wanted to implement PTAM. I was young and naïve 🙂
Well, I had a few moments to spare on a recent sleepless night, so I set out to implement the basic bootstrapping step: initializing a map from a planar object, with no known markers needed, and then tracking it for augmented reality purposes.

Categories
code Java programming

Apache Tapestry 5 Progress Bar with jQuery+Bootstrap [w/code]

Just sharing a code snippet about how to implement a jQuery+Bootstrap progress bar for a background operation in Tapestry 5. There’s not a lot to it, but it took me a while and serious digging through the internet to find how to make it work. Essentially it’s based on a couple of examples and references I found:

But I simplified things because I don’t like the over-design Java can easily make you do…

Categories
3d Augmented Reality code graphics opencv opengl programming qt Tracking video vision

Augmented Reality on libQGLViewer and OpenCV-OpenGL tips [w/code]

You already know I love libQGLViewer. So here’s a snippet on how to do AR in a QGLViewer widget. It only requires a couple of tweaks/overloads to the plain vanilla widget setup (using the matrices properly, disabling the mouse bindings) and it works.

The major problem I see with getting working AR out of OpenCV’s intrinsic and extrinsic camera parameters is translating them to OpenGL. I saw a whole lot of solutions online, and I contributed one from my own experience a while back, so I want to reiterate it here in the context of libQGLViewer, with a couple of extra tips.
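To make the intrinsics-to-OpenGL step concrete, here is a rough sketch of one common way to build a projection matrix from pinhole intrinsics. It is not necessarily what the snippet in this post does. It assumes zero skew, no lens distortion, image origin at the top-left corner, and an OpenGL camera looking down the negative Z axis with Y up; the function names are my own, and half-pixel offsets are ignored.

```python
def opengl_projection_from_intrinsics(fx, fy, cx, cy, width, height, near, far):
    """Row-major 4x4 OpenGL-style projection matrix from pinhole intrinsics."""
    return [
        [2.0 * fx / width, 0.0, 1.0 - 2.0 * cx / width, 0.0],
        [0.0, 2.0 * fy / height, 2.0 * cy / height - 1.0, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2.0 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def opencv_to_opengl_point(point):
    """OpenCV camera axes (x right, y down, z forward) to OpenGL (y up, z back)."""
    x, y, z = point
    return (x, -y, -z)


def project_to_pixel(projection, point_gl, width, height):
    """Apply the projection, divide by w, and map NDC back to pixel coordinates."""
    x, y, z = point_gl
    clip = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in projection]
    ndc_x, ndc_y = clip[0] / clip[3], clip[1] / clip[3]
    u = (ndc_x + 1.0) * 0.5 * width
    v = (1.0 - ndc_y) * 0.5 * height  # flip: pixel rows grow downward
    return u, v


# Made-up intrinsics for a 640x480 camera.
P = opengl_projection_from_intrinsics(800.0, 820.0, 310.0, 250.0, 640, 480, 0.1, 100.0)
u, v = project_to_pixel(P, opencv_to_opengl_point((0.1, -0.2, 2.0)), 640, 480)
```

The check is that `u, v` agree with the plain pinhole formula `u = fx * X / Z + cx`, `v = fy * Y / Z + cy`. The matrix here is row-major, so it would need transposing before being handed to anything that expects OpenGL's column-major layout.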

Categories
code Mapping opencv Tracking video vision

Simplest 20-lines OpenCV video stabilizer [w/ code]

Just sharing a simple recipe for a video stabilizer in OpenCV based on goodFeaturesToTrack() and calcOpticalFlowPyrLK().
Well… it’s a bit more than 20 lines, but it is short. And it doesn’t work for every kind of video (although the results are funny anyway! :).

Categories
code Music opencv programming vision

Using Hidden Markov Models for staff line removal (in OMR) [w/code]


So lately I’m into Optical Music Recognition (OMR), and a central part of that is doing staff line removal. That is when you get rid of the staff lines that obscure the musical symbols to make recognition much easier. There are a lot of ways to do it, but I’m going to share with you how I did it (fairly easily) with Hidden Markov Models (HMMs), which will also teach us a good lesson on this wonderfully useful approach.

OMR has been around for ages, and if you’re interested in learning about it, [Fornes 2014] and [Rebelo 2012] are good summary articles.
Staff line removal has occupied dozens of researchers for as long as OMR has existed; [Dalitz 2008] gives a good overview. Basically the goal is to remove the staff lines that obscure the musical symbols, so that the symbols are easier to recognize.

But the staff lines are connected to the symbols, so simply removing them will cut up the symbols and make them hard to recognize.
So let’s see how we could do this with HMMs.

Categories
code graphics opencv vision

Run length encoding in OpenCV [w/code]

Sharing a simple code snippet for run-length encoding with OpenCV…
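For anyone who hasn't met run-length encoding before, here is a tiny pure-Python sketch of the idea on a single row of pixel values. It is only an illustration of the technique, not the OpenCV snippet from the post, and the function names are made up.

```python
def run_length_encode(row):
    """Encode a sequence of pixel values as (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # same value: extend the current run
        else:
            runs.append([value, 1])   # new value: start a new run
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


row = [0, 0, 255, 255, 255, 0, 255]
runs = run_length_encode(row)  # [(0, 2), (255, 3), (0, 1), (255, 1)]
```

On a binary image the same thing would be applied row by row (or column by column), and decoding the runs gives back exactly the original pixels.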

Categories
graphics opencv vision

Simplest Color Balance with OpenCV [w/code]

I came across an extremely simple color balancing algorithm here, and I thought I’d quickly port it to OpenCV.
Here’s the gist:
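As a rough illustration in plain Python, and assuming the algorithm in question is the usual clip-and-stretch variant (saturate a small percentage of the darkest and brightest values in each channel, then stretch the rest linearly to 0..255), the idea looks something like this. The function names and the `clip_percent` parameter are my own, and this works on a list of pixel tuples rather than an OpenCV image.

```python
def balance_channel(values, clip_percent=1.0):
    """Stretch one channel to 0..255 after clipping clip_percent% at each end."""
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    cut = int(n * clip_percent / 100.0)   # how many values to saturate per end
    low, high = ordered[cut], ordered[n - 1 - cut]
    if high == low:
        return list(values)               # flat channel, nothing to stretch
    scale = 255.0 / (high - low)
    return [int(round((min(max(v, low), high) - low) * scale)) for v in values]


def simplest_color_balance(pixels, clip_percent=1.0):
    """Balance each channel of a list of pixel tuples independently."""
    channels = list(zip(*pixels))
    balanced = [balance_channel(channel, clip_percent) for channel in channels]
    return [tuple(pixel) for pixel in zip(*balanced)]


# Three made-up pixels; the first channel only spans 50..150 before balancing.
pixels = [(50, 100, 0), (75, 100, 128), (150, 100, 255)]
balanced = simplest_color_balance(pixels, clip_percent=0.0)
```

Each channel is treated on its own, which is what removes a color cast: a channel squeezed into a narrow range gets stretched back to the full range.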

Categories
code Music programming qt Recommended Software

Touch up your sound with SoundTouch [w/code]

So I needed to speed up / slow down an audio stream I had (speech generated with Flite TTS) and naively I thought it would suffice to simply sample it at the right intervals and interpolate.
I quickly discovered that plain re-sampling won’t do, because changing the playback rate also changes the pitch proportionally. Then I discovered the world of time scaling in audio, with its many algorithms and approaches for changing the tempo without changing the pitch.
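Here is a small sketch of why the naive approach fails. It is plain Python with a made-up `resample_linear` helper, not SoundTouch code: it reads a 440 Hz tone at double speed with linear interpolation and plays the result back at the same sample rate.

```python
import math


def resample_linear(samples, speed):
    """Naively change speed by reading samples at `speed`-spaced positions."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    out = []
    position = 0.0
    while position <= len(samples) - 1:
        i = int(position)
        frac = position - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)  # linear interpolation
        position += speed
    return out


sample_rate = 8000
tone_hz = 440.0
# One second of a 440 Hz sine tone.
tone = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(sample_rate)]
fast = resample_linear(tone, 2.0)  # twice as fast, half as long
```

The output is half as long, as intended, but every sample of `fast` matches a sine at 880 Hz: the tone went up an octave. Time-scaling algorithms exist precisely to get the first effect without the second.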
To my surprise there are a number of ready-made free libraries that do this. The first one I tried, RubberBand, did not work out: it had too many dependencies and I simply couldn’t be bothered to compile it for the Mac. SoundTouch, on the other hand, had a Homebrew formula, so it won by default.
I wrote a simple little wrapper around it that interfaces nicely with Qt.
Let’s see what’s going on there.

Categories
Android ffmpeg linux Music Networking Solutions Stream

FFMpeg with Lame MP3 and streaming for the Arduino Yun

So, I’ve been trying to stream audio off of a USB microphone connected to an Arduino Yun.
Looking into it online I found some examples using ffserver & ffmpeg, which sounded like they could do the trick.
However, right from the start I’ve had many problems playing the streams on Android and iOS devices.
It seems Android likes a certain list of codecs (http://developer.android.com/guide/appendix/media-formats.html) and iOS likes a different set of codecs (Link here), but they do have one codec in common: good ol’ MP3.
Unfortunately, the OpenWRT on the Arduino Yun has an ffmpeg build which does not provide MP3 encoding… it does have the MP3 muxer/container format, but streaming anything other than MP3 in it (for example MP2, which the Yun ffmpeg does have) simply doesn’t work on Android/iOS.
From experiments streaming an ffmpeg/libmp3lame MP3 stream from my PC, it looks like the mobile devices are quite happy with it, so I will need to recompile ffmpeg with Lame MP3 support to be able to stream from the Yun.