Handling large data in MySQL

In a very large DB, very small details in indexing and querying make the difference between smooth sailing and catastrophe.

[This post was inspired by conversations I had with students in the workshop I’m attending on Mining Software Repositories. It has been updated a few times.]

Posted in advice, software repositories | 5 Comments

When History is rewritten and replaced with a Good Story

I recently watched a TED talk with a fun topic: why is ‘x’ used as the unknown variable in Algebra and beyond? If you haven’t seen it, this 4-min talk is above. The thesis is this: x is used because the Spanish, who were the first to translate the great scientific and engineering knowledge coming from the Arab world, don’t have in their language the ‘sh’ sound that dominated the original Arabic word, ‘something’. Hence, by convention, they used the sound ‘k’ as in the Greek letter chi (χ) (pronounced ‘kah-i’ in English); later translations of these works to Latin mapped that to the Latin letter x, because the symbols look the same. The speaker is engaging and confident, the talk is short and sweet, and it’s a great story to tell at a dinner party. Except for one detail: it’s full of historical inaccuracies.

Posted in academia, commentary, conferences | 5 Comments

Simulating a City

For the past 4 years or so, in my spare time, I have been working with a small start-up company, Encitra, whose goal is to help cities and real estate developers make sustainable urban plans come to life in the minds and hearts of stakeholders and the general public. We go at it with virtual reality. Not just computer-animated movies; we develop complete multi-user interactive virtual environments that are built and rebuilt over time by multiple people, and that simulate urban areas, in both their structural and dynamic aspects, as faithfully as possible. Recently, we reached an important milestone: we were able to simulate an area of 3 km x 1.5 km of the city of Uppsala, Sweden. This includes the actual terrain, the major landmarks of the city, several hundred assorted buildings, as well as traffic and pedestrians. It’s all live and accessible on the Internet, although not through a Web browser. This post explains the technology behind it. For the most part, it’s all based on open source software!

Posted in simulation, social software systems | 3 Comments

The Single Most Important Thing

What is the single most important feature of a programming system without which you can’t write programs effectively?

Posted in commentary, social software systems | 1 Comment

Research in Programming Languages

Is there still research to be done in Programming Languages? This essay touches both on the topic of programming languages and on the nature of research work. I am mostly concerned with analyzing this question in the context of Academia, i.e., within the expectations of academic programs and research funding agencies that support research work in the STEM disciplines (Science, Technology, Engineering, and Mathematics). This is not the only possible perspective, but it is the one I am taking here.

Posted in academia, research | 104 Comments

To Dish or Not To Dish

Dear @bby,

I am being asked to write a recommendation letter for someone who has been working with me for 3 years and who I think sucks. What should I do? Should I simply decline to do it? Or should I say what I honestly think about that person and his work? — because he deserves it!

Sincerely, Conflicted Recommender

Posted in academia, advice, sarcasm | Comments Off on To Dish or Not To Dish

Ethics in Economics

Imagine this. You have a brilliant idea for how to reverse the effects of aging on female fertility, a wonderful combination of drugs that you have been developing in your lab with your graduate students, and that will open the possibility of motherhood to hundreds of thousands of women who waited just too long to conceive. You have done your Math, your Chemistry, you have developed the model explaining why your idea works. You have tested it in mice. You have tested it in pigs. You got 90% success. You have very little doubt that it works in humans too. If only you could test it… Now imagine that this is 1925, there are no Institutional Review Boards, no Ethics committees to go through, no clinical protocols. In order to test your ideas, you simply need to recruit women who routinely come to your medical office lamenting that they would like to have children but are too old to conceive. You wholeheartedly believe in your cure and dream of the Nobel Prize. Those women’s desperation is a powerful context for testing your ideas; they want it, they will gladly try anything!

Posted in academia, commentary, ethics | 2 Comments

Producing SPLASH

I’m chairing SPLASH/OOPSLA this year. That means that I’m like a Producer: I get to do all the work behind the scenes in order to make the conference come to life. And it’s finally coming to life. After a year and a half of “programming,” I just pressed “Run.” It’s a little crazy if you believe in agile. A whole year and a half of designing and “programming,” with no testing whatsoever, no small chunks, just a long process of envisioning, estimating, guessing, coordinating, signing contracts, making decisions; then we unleash the event over 5 days on almost 600 people and hope for the best!

So what’s involved in producing a conference like SPLASH? Read on if you want to know.

Posted in academia, conferences | Comments Off on Producing SPLASH

A Theory of Aspects as Latent Topics

Underlying the work on Aspect-Oriented Programming (AOP) there is a premise that no one ever challenged: the existence of cross-cutting concerns that find their way into programs in a tangled and scattered manner. We’ve all seen it. But do tangling and scattering of program concerns really exist in real programs? Do they have a strong effect or is this one of those academic non-issues? That was the question we set out to answer in a paper we published at OOPSLA 2008. And the answer was: yes, these effects do exist in real programs, they are noticeable and detectable, and they reveal a few insights into the nature of those concerns. But they raise even more questions for AOP. Here is a summary of our study. For all the details, read the paper [1].

Posted in research, software repositories | Comments Off on A Theory of Aspects as Latent Topics

Vandalism Detection in Wikipedia

If you have to develop a classifier for detecting vandalism in Wikipedia with just a small number of features, what kind of features give the best results? According to our latest work on vandalism detection in Wikipedia, to be presented at WikiSym 2011, the best features are the ones pertaining to user behavior within the system: things like the deletion of other users’ content, the survivability of the user’s additions, the number of words deleted by a user, whether the user has a page on Wikipedia or not, etc. Other kinds of features, such as textual and language model features, are routinely used in email spam filters, but it turns out that these don’t do as well as the user behavior features. That’s right, user behavior within these systems carries a very strong signal for predicting what users are capable of doing in the future, and can therefore detect vandalism fairly well, especially the more subtle kinds of vandalism. I’ve been wanting to write an overview of this work for a long time, and finally here it is. For all the details, read the paper.

Posted in research, social software systems | Comments Off on Vandalism Detection in Wikipedia