
Israeli Hebrew names dataset

This is the first post on the blog written in Hebrew, since it deals with Israeli names in Hebrew. Recently I had to work on mining names from web pages, and I quickly realized I wouldn't get far without a list of words that are actually names, so that I could easily separate the names from the rest of the text.
I couldn't find such a simple list, even though the Technion's MILA website has a fairly extensive lexicon of Hebrew words that also tags names. Although the names could easily be extracted from it with JAXB over its XML schema, I didn't do it, for lack of time and patience.
So I made a list myself. I started from a database of names I already had and split each entry into a first and a last name on whitespace. Then I started the mining work, which added a great many names to the database.
Afterwards I went back to my database and counted how many times each name appears as a first name and as a last name, to help with future mining. This makes it possible to find more names, for example by taking the word that comes right before a very distinctive last name.
Some names, however, are very ambiguous as to whether they are first or last names, for example “Gal”, “Shalom”, or “Barak”. In contrast, names like “Ehud” or “Levi” clearly belong to one category or the other.
In any case, here is the list, free for you to use.
Please keep in mind that the list is very partial, and so are the name counts.
This is the first Hebrew-language post on the MTT blog, since it deals with Hebrew names. I've collected a list of Hebrew first and last names and counted how many times each name appears as a first name and as a last name in a private database of names. The result may be useful for anyone extracting Hebrew names from the web.
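The counting step described above (split each full name on whitespace, then tally how often each token appears as a first name versus a last name) and the "word before a distinctive last name" trick can be sketched in Python roughly as follows. This is not the code used to build the list: the function names, the `min_ratio` threshold and the rule for deciding that a name is "clearly a last name" are hypothetical, and multi-word names would need extra handling.

```python
from collections import Counter


def count_name_roles(full_names):
    """Count how often each token appears as a first name and as a last name.

    Assumption: each entry is "first ... last" separated by whitespace; the
    first token is taken as the first name and the last token as the last name.
    """
    first_counts = Counter()
    last_counts = Counter()
    for full_name in full_names:
        parts = full_name.split()
        if len(parts) < 2:
            continue  # cannot split into first + last, skip it
        first_counts[parts[0]] += 1
        last_counts[parts[-1]] += 1
    return first_counts, last_counts


def guess_first_names(tokens, first_counts, last_counts, min_ratio=3.0):
    """Hypothetical heuristic: a token right before a 'clearly last' name is a first name.

    A name counts as clearly last if it appears as a last name at least
    min_ratio times more often than as a first name (plus one, to avoid /0).
    """
    guesses = []
    for prev, word in zip(tokens, tokens[1:]):
        if last_counts[word] >= min_ratio * (first_counts[word] + 1):
            guesses.append(prev)
    return guesses


# Transliterated toy data; "Barak" shows up in both roles, "Levi" only as a last name
first, last = count_name_roles(["Ehud Barak", "Gal Levi", "Dana Levi", "Moshe Levi", "Barak Cohen"])
print(guess_first_names("we met Yossi Levi yesterday".split(), first, last))
```

With this toy data "Barak" ends up ambiguous (once in each role), while "Levi" is counted three times as a last name and never as a first name, so the unseen "Yossi" before it is picked up as a candidate first name.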

3d Augmented Reality code graphics Mapping opengl programming Tracking video vision

Bootstrapping planar AR and tracking without markers [w/code]

Years ago I wanted to implement PTAM. I was young and naïve 🙂
Well, I had a few spare moments on a recent sleepless night, so I set out to implement the basic bootstrapping step: initializing a map from a planar object (no known markers needed) and then tracking it for augmented reality purposes.
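Bootstrapping from a planar object typically hinges on estimating a homography between the reference view of the plane and the current frame. The excerpt doesn't spell out the exact pipeline, so the following is only an illustrative sketch of the classic direct linear transform (DLT) for exactly four point correspondences, solved with plain Gaussian elimination. In practice you would use many feature matches with RANSAC (for example OpenCV's `findHomography`) and then recover the camera pose from the homography; the helper names here are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. three collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """DLT homography H (3x3, with H[2][2] fixed to 1) mapping src[i] to dst[i]."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, and likewise for v
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# The four corners of a reference rectangle and where they appear in a new frame
corners = [(0, 0), (4, 0), (4, 3), (0, 3)]
seen = [(10, 5), (18, 5), (18, 14), (10, 14)]
H = homography_from_4_points(corners, seen)
print(apply_homography(H, (2, 1.5)))
```

Fixing `H[2][2] = 1` keeps the system square (8 unknowns, 2 equations per point), which is why exactly four non-collinear points suffice here.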

code Java programming

Apache Tapestry 5 Progress Bar with jQuery+Bootstrap [w/code]

Just sharing a code snippet about how to implement a jQuery+Bootstrap progress bar for a background operation in Tapestry 5. There’s not a lot to it, but it took me a while and serious digging through the internet to find how to make it work. Essentially it’s based on a couple of examples and references I found:

But I simplified things, because I don't like the over-engineering Java can so easily push you into…
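The general pattern behind such a progress bar is a background worker that updates a shared progress value, plus a small endpoint that the browser polls (via jQuery) and feeds into the width of the Bootstrap bar. The sketch below shows only that pattern, in Python rather than Tapestry; the `ProgressTracker` class, the `{"progress": ...}` JSON shape and the handler name are hypothetical stand-ins for whatever the Tapestry page and its event handler actually expose.

```python
import json
import threading
import time


class ProgressTracker:
    """Thread-safe progress counter shared by the worker and the poll handler."""

    def __init__(self, total_steps):
        self._lock = threading.Lock()
        self._done = 0
        self._total = total_steps

    def step(self):
        with self._lock:
            self._done = min(self._done + 1, self._total)

    def percent(self):
        with self._lock:
            return int(100 * self._done / self._total) if self._total else 100


def run_job(tracker, steps, delay=0.0):
    """Hypothetical background operation that reports progress after each step."""
    for _ in range(steps):
        time.sleep(delay)  # stand-in for real work
        tracker.step()


def poll_handler(tracker):
    """What a polling endpoint might return to the client side as JSON."""
    return json.dumps({"progress": tracker.percent()})


tracker = ProgressTracker(total_steps=4)
worker = threading.Thread(target=run_job, args=(tracker, 4, 0.01))
worker.start()
print(poll_handler(tracker))  # some intermediate value while the job runs
worker.join()
print(poll_handler(tracker))
```

On the client, a timer would request this endpoint every few hundred milliseconds and set the bar's width to the returned percentage, stopping once it reaches 100.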

3d Augmented Reality code graphics opencv opengl programming qt Tracking video vision

Augmented Reality on libQGLViewer and OpenCV-OpenGL tips [w/code]

You already know I love libQGLViewer. So here's a snippet on how to do AR in a QGLViewer widget. It only requires a couple of tweaks/overloads to the plain vanilla widget setup (using the matrices properly, disabling the mouse binding), and it works.

The main difficulty I see in getting working AR out of OpenCV's intrinsic and extrinsic camera parameters is translating them to OpenGL. I've seen a whole lot of solutions online, and I contributed some of my own experience a while back, so I want to go over it again here in the context of libQGLViewer, with a couple of extra tips.
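One common way to do that translation, sketched here in plain Python rather than taken from the post: build an OpenGL projection matrix from the pinhole intrinsics `fx, fy, cx, cy` and the image size, and convert OpenCV's pose `[R|t]` by flipping the Y and Z axes (OpenCV's camera looks down +Z with Y pointing down; OpenGL's looks down -Z with Y up). The near/far planes are free choices, the matrices are row-major (transpose them before `glLoadMatrixd`, which expects column-major), and other sign conventions exist depending on how the viewport and image origin are set up.

```python
def opencv_to_opengl_projection(fx, fy, cx, cy, width, height, near, far):
    """Row-major 4x4 OpenGL projection matrix from OpenCV pinhole intrinsics.

    Assumes eye coordinates are OpenCV camera coordinates with Y and Z flipped,
    and a viewport covering the whole image with OpenGL's bottom-left origin.
    """
    return [
        [2 * fx / width, 0.0, 1 - 2 * cx / width, 0.0],
        [0.0, 2 * fy / height, 2 * cy / height - 1, 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def opencv_pose_to_modelview(rotation, translation):
    """Row-major 4x4 modelview from OpenCV's R (3x3 nested list) and t, Y/Z flipped."""
    flip = (1.0, -1.0, -1.0)
    rows = [[flip[i] * rotation[i][j] for j in range(3)] + [flip[i] * translation[i]]
            for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def project_to_pixel(proj, point_cv, width, height):
    """Project an OpenCV camera-frame point with `proj` and return (u, v) pixels, v down."""
    x, y, z = point_cv
    eye = (x, -y, -z, 1.0)  # OpenCV camera frame -> OpenGL eye frame
    clip = [sum(row[i] * eye[i] for i in range(4)) for row in proj]
    ndc_x, ndc_y = clip[0] / clip[3], clip[1] / clip[3]
    u = (ndc_x + 1) * width / 2
    v = (1 - ndc_y) * height / 2  # OpenGL y points up, image rows go down
    return u, v


P = opencv_to_opengl_projection(500, 500, 320, 240, 640, 480, 0.1, 100.0)
# Should agree with OpenCV's u = fx*X/Z + cx, v = fy*Y/Z + cy, i.e. (480, 290)
print(project_to_pixel(P, (0.32, 0.1, 1.0), 640, 480))
```

The round trip through `project_to_pixel` is a handy sanity check: if the OpenGL projection of a camera-frame point doesn't land on the same pixel as OpenCV's pinhole formula, one of the flips or the principal-point terms is off.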

code Mapping opencv Tracking video vision

Simplest 20-lines OpenCV video stabilizer [w/ code]

Just sharing a simple recipe for a video stabilizer in OpenCV based on goodFeaturesToTrack() and calcOpticalFlowPyrLK().
Well… it’s a bit more than 20 lines, but it is short. And it doesn’t work for every kind of video (although the results are funny anyway! :).
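A common way to turn `goodFeaturesToTrack()`/`calcOpticalFlowPyrLK()` matches into stabilization is to estimate a small rigid motion (dx, dy, da) between consecutive frames, integrate those into a camera trajectory, smooth the trajectory, and warp each frame by the difference. The sketch below covers only the smoothing step, in plain Python; the per-frame motion estimation (for instance with OpenCV's `estimateRigidTransform` or `estimateAffinePartial2D`) and the final warp are left out, the moving-average radius is an arbitrary assumption, and this is not necessarily the exact recipe used in the post.

```python
def moving_average(values, radius):
    """Centered moving average; the window is clipped at both ends of the sequence."""
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def stabilize_transforms(transforms, radius=15):
    """Given per-frame motions (dx, dy, da), return corrected motions.

    The running sum of the motions is the camera trajectory; each motion gets
    (smoothed trajectory - raw trajectory) added to it, which cancels jitter.
    """
    trajectory = []
    x = y = a = 0.0
    for dx, dy, da in transforms:
        x, y, a = x + dx, y + dy, a + da
        trajectory.append((x, y, a))
    smoothed = [moving_average([p[k] for p in trajectory], radius) for k in range(3)]
    corrected = []
    for i, (dx, dy, da) in enumerate(transforms):
        corrected.append((
            dx + smoothed[0][i] - trajectory[i][0],
            dy + smoothed[1][i] - trajectory[i][1],
            da + smoothed[2][i] - trajectory[i][2],
        ))
    return corrected


# A steady pan to the right with a jitter in the middle frames
motions = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(stabilize_transforms(motions, radius=1))
```

A larger radius gives a smoother (but laggier) virtual camera; the clipped window at the ends is the simplest edge handling, which is one reason results can look odd on some videos.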

code Music opencv programming vision

Using Hidden Markov Models for staff line removal (in OMR) [w/code]


So lately I’m into Optical Music Recognition (OMR), and a central part of that is doing staff line removal. That is when you get rid of the staff lines that obscure the musical symbols to make recognition much easier. There are a lot of ways to do it, but I’m going to share with you how I did it (fairly easily) with Hidden Markov Models (HMMs), which will also teach us a good lesson on this wonderfully useful approach.

OMR has been around for ages, and if you're interested in learning about it, [Fornes 2014] and [Rebelo 2012] are good survey articles.
The matter of staff line removal has occupied dozens of researchers for as long as OMR has existed; [Dalitz 2008] give a good overview. Basically, the goal is to remove the staff lines that obscure the musical symbols, so that the symbols are easier to recognize.

But the staff lines are connected to the symbols, so simply removing them cuts the symbols apart and leaves them hardly recognizable.
So let’s see how we could do this with HMMs.
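Decoding an HMM of this kind boils down to the Viterbi algorithm, which finds the single most likely sequence of hidden states for a sequence of observations. Below is a generic log-space Viterbi decoder in Python. The two states (`"line"`, `"space"`), the black/white pixel observations and all probabilities are toy values made up for illustration, not the model from the post, which would need its own states and probabilities tuned or learned on real scores.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -infinity instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for `observations` under an HMM."""
    scores = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    backpointers = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        backpointers.append({})
        for s in states:
            # Best predecessor state for being in state s at time t
            prev, best = max(
                ((p, scores[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[t][s] = best + _log(emit_p[s][observations[t]])
            backpointers[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backpointers[t][path[-1]])
    return path[::-1]


# Toy model (hypothetical numbers): scanning one pixel column top to bottom
STATES = ("line", "space")
START = {"line": 0.5, "space": 0.5}
TRANS = {"line": {"line": 0.9, "space": 0.1}, "space": {"line": 0.1, "space": 0.9}}
EMIT = {"line": {"black": 0.9, "white": 0.1}, "space": {"black": 0.1, "white": 0.9}}

print(viterbi(["black", "white", "black", "black"], STATES, START, TRANS, EMIT))
```

The "sticky" transition probabilities are what make this useful for noisy scans: in the example, the single white pixel inside a black run is absorbed into the surrounding `"line"` state instead of flipping the label.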

code graphics opencv vision

Run length encoding in OpenCV [w/code]

Sharing a simple code snippet for run-length encoding with OpenCV…
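Run-length encoding replaces each run of identical pixel values with a (value, length) pair. The post's snippet targets OpenCV images; as a library-independent illustration, here is the same idea for a single row of pixel values in plain Python. Encoding a whole image row by row is an assumption about the layout, not necessarily what the post does.

```python
def rle_encode(row):
    """Run-length encode a sequence into a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [tuple(run) for run in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


row = [0, 0, 255, 255, 255, 0]
print(rle_encode(row))  # [(0, 2), (255, 3), (0, 1)]
```

For binary images with long uniform stretches this is a compact representation, and runs of foreground pixels are also a convenient unit for later analysis.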

graphics opencv vision

Simplest Color Balance with OpenCV [w/code]

I came across an extremely simple color balancing algorithm here, and I thought I'd quickly port it to OpenCV.
Here’s the gist:
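Simplest color balance, as commonly described, works per channel: sort the values, clip a small percentage of the darkest and brightest pixels, and stretch what remains linearly to the full 0 to 255 range. The sketch below shows that per-channel step in plain Python on flat lists of values; it is not the gist from the post, and splitting `saturation_percent` evenly between both tails is an assumption. With OpenCV you would apply the same step to each channel obtained from `cv::split`.

```python
import math


def balance_channel(values, saturation_percent=1.0):
    """Simplest-color-balance step for one channel of 8-bit values.

    About saturation_percent / 2 percent of the pixels are clipped at each end,
    then the remaining range [low, high] is stretched linearly to [0, 255].
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    half = saturation_percent / 200.0  # fraction clipped at each tail
    low = ordered[min(int(n * half), n - 1)]
    high = ordered[max(math.ceil(n * (1.0 - half)) - 1, 0)]
    if high <= low:
        return list(values)  # flat channel: nothing to stretch
    scale = 255.0 / (high - low)
    return [round((min(max(v, low), high) - low) * scale) for v in values]


def balance_image(channels, saturation_percent=1.0):
    """Apply the per-channel balance to a list of channels, e.g. [blue, green, red]."""
    return [balance_channel(c, saturation_percent) for c in channels]


print(balance_channel([10, 20, 30, 40], saturation_percent=0))  # [0, 85, 170, 255]
```

Balancing each channel independently is what removes a global color cast: a channel that never reaches full intensity gets stretched more than the others.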


New Year, New Look

Hi everybody,
Taking another look at our blog made us think: why do we still look so 90's?
So we decided to make some cosmetic and functional changes:
We changed the theme (we thought about buying a WordPress theme, but for some reason they're way too expensive),
We installed a new comment system called “Disqus”,
We have a new logo,
And we generally want to make your stay more comfortable.
We hope you like it.
Roy and Arnon

code Music programming qt Recommended Software

Touch up your sound with SoundTouch [w/code]

So I needed to speed up / slow down an audio stream I had (speech generated with Flite TTS), and naively I thought it would suffice to simply sample it at the right intervals and interpolate.
I quickly discovered that plain re-sampling won't do, because changing the playback rate also changes the pitch proportionally. And then I discovered the world of time scaling in audio, with its many algorithms and approaches for changing the tempo without changing the pitch.
To my surprise there were a number of ready-made free libraries that do it, but the first one I tried, RubberBand, did not work out: it had too many dependencies, and I simply couldn't be bothered compiling them for the Mac. SoundTouch, on the other hand, had a Homebrew formula, so it won by default.
I wrote a simple little wrapper around it that interfaces nicely with Qt.
Let's see what's going on there.
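To see why the naive approach fails, here is a minimal sketch of it in plain Python: resampling by linear interpolation so the audio plays `factor` times faster. Played back at the original sample rate, the output is shorter, but every frequency is scaled by the same factor, so a 100 Hz tone sped up 2x comes out at 200 Hz. Time-scaling libraries such as SoundTouch avoid this by changing the tempo while keeping the pitch (in SoundTouch terms, a tempo change rather than a rate change). The function name is hypothetical and this is not the wrapper from the post.

```python
import math


def naive_speed_change(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    At the same sample rate the result is shorter AND every frequency is
    multiplied by `factor`, i.e. the pitch shifts along with the tempo.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor  # fractional read position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / rate + 0.3) for n in range(rate)]  # 1 s at 100 Hz
fast = naive_speed_change(tone, 2.0)
crossings = sum(1 for a, b in zip(fast, fast[1:]) if a * b < 0)
duration = len(fast) / rate
print(len(fast), crossings / 2 / duration)  # half as long, roughly twice the frequency
```

Counting zero crossings is a crude frequency estimate, but it is enough to show that the "sped up" speech would sound higher-pitched, which is exactly the chipmunk effect time-scaling algorithms are designed to avoid.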