Cool Free/Open Source Software from Google

As has been covered endlessly elsewhere (OStatic, OSnews), there is a great blog post from 0x1fff listing many open source projects from Google (it started at 35 and has grown since). Indeed, there is some cool stuff on there. I knew about Caja and Protocol Buffers (I wish there were a JS port of Protocol Buffers) but did not know about CRUSH and Skia. Honestly, there are plenty of cool projects out there and my already-positive opinion of Google is only bolstered by the fact that they give back so willingly. Gotta love it.

ES5 is finally here, JavaScript geeks rejoice!

So, earlier this week, it was announced that ECMAScript 5 has finally been released. This is a good thing and I caught the highlights on InfoQ. The full draft is a 252-page beast of a PDF which covers just about everything there is to cover. The things which strike me as interesting are the improved Array functions (like map, filter and reduce), some ways (finally) to harden Objects (in the form of freeze and seal) and JSON in the language. The other big deal which has me excited is the availability of a strict mode, which has been spoken about by Douglas Crockford in his Google Tech Talk as well as in his book JavaScript: The Good Parts, which you should buy. Honestly, it makes you appreciate JavaScript so very, very much as D-Crock highlights the best and worst features of JavaScript.
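
For the curious, here is a minimal sketch of those ES5 additions in one place. The function name demo and the sample scores are made up for illustration, and the strict-mode directive sits inside the function so that it applies no matter what code surrounds it.

```javascript
function demo() {
  "use strict"; // ES5 strict mode, scoped to this function only

  // Array extras: map, filter and reduce (the scores are made-up sample data)
  var scores = [72, 88, 95, 61];
  var curved = scores.map(function (s) { return s + 5; });
  var passing = curved.filter(function (s) { return s >= 70; });
  var total = passing.reduce(function (sum, s) { return sum + s; }, 0);

  // Hardening objects: seal blocks adding or removing properties,
  // freeze additionally blocks changing the existing values
  var sealed = Object.seal({ name: "sealed" });
  var frozen = Object.freeze({ name: "frozen" });
  sealed.name = "still writable";

  var freezeError = null;
  try {
    frozen.name = "nope"; // strict mode throws here instead of failing silently
  } catch (e) {
    freezeError = e;
  }

  // JSON in the language itself
  var text = JSON.stringify({ total: total, passing: passing });
  var roundTrip = JSON.parse(text);

  return {
    passing: passing,
    total: total,
    sealed: sealed,
    frozen: frozen,
    freezeError: freezeError,
    text: text,
    roundTrip: roundTrip
  };
}

var result = demo();
```

Outside of strict mode, the write to the frozen object would simply be ignored, which is exactly the kind of silent failure strict mode is meant to surface.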

In reality, this has been a big month for JavaScript, with Google open sourcing its internal JS toolkit, Closure, along with much attention being paid to projects like NodeJS (for network stuff) and CommonJS (for everything else).

Messing with OpenStreetMap

Some people might not see the reason for a project like OpenStreetMap when there are plenty of good mapping products and services lying around. I am not one of them. Whenever I use a GPS, I think quite a bit about its inner workings. How does it figure out which route is best? How does it calculate things on the fly? All of these questions usually lead me to think, at one point or another, that it depends very much on the data. While most mapping services and individual GPS devices use various algorithms for calculating routes, etc. (probably based on some weighted graph or something), they also rely on different sets of map data. A GPS can only tell you where on the planet you are, not what road you are on. For that, it needs map data. The only issue is that all of the map data used by popular services is proprietary!

Enter OpenStreetMap. Seeded with the geographical data made publicly available by various governments and public universities, OpenStreetMap provides Free (as in freedom, licensed under CC-BY-SA) map data to anyone who wants it. The data is usually in pretty good shape because the initial measurements are in good shape. However, things aren’t perfect. Lucky for the web, OpenStreetMap.org allows users to help improve the data in a number of ways.

First, users can upload GPS traces to help improve the quality of unmapped regions such as seriously-rural areas along with bike and hiking trails. Second, users are able to tweak the mapping data to correct errors. There are a number of ways to do this but OpenStreetMap.org has an online editor which lets you overlay OSM data onto satellite imagery so you can move those roads, landmarks and the like into the right location. In about an hour, I had cleaned up much of my hometown and began to add local landmarks, parks and buildings. It’s quite easy.

The project itself seems off to a great start and the planet shows a fair bit of activity. In particular, I like the idea of mapping parties where people get together and work on a given area. This seems like a great way to give back to the community and I plan to float the idea at the next SCOSUG meeting.
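
Coming back to my guess that routing boils down to "some weighted graph": here is a minimal sketch of what that could look like, using Dijkstra's shortest-path algorithm. I have no idea what any real GPS or mapping service actually runs, so treat this as an illustration only. The road network, the place names, the travel costs and the function name shortestRoute are all made up.

```javascript
// Hypothetical road network: each key is an intersection and each value
// maps a neighbouring intersection to a travel cost (say, minutes).
var roads = {
  home:   { market: 4, bridge: 10 },
  market: { home: 4, bridge: 3, office: 12 },
  bridge: { home: 10, market: 3, office: 5 },
  office: { market: 12, bridge: 5 }
};

// Dijkstra's algorithm with a simple linear scan for the next node.
// Returns { cost, path }, or null when no route exists.
function shortestRoute(graph, start, goal) {
  if (!(start in graph) || !(goal in graph)) return null;

  var dist = {};
  var prev = {};
  var visited = {};
  Object.keys(graph).forEach(function (node) { dist[node] = Infinity; });
  dist[start] = 0;

  while (true) {
    // Pick the closest intersection we have not settled yet
    var current = null;
    Object.keys(graph).forEach(function (node) {
      if (!visited[node] && (current === null || dist[node] < dist[current])) {
        current = node;
      }
    });
    if (current === null || dist[current] === Infinity) break; // nothing reachable left
    if (current === goal) break; // the goal's distance is now final
    visited[current] = true;

    // Relax every road leaving the current intersection
    Object.keys(graph[current]).forEach(function (next) {
      var candidate = dist[current] + graph[current][next];
      if (candidate < dist[next]) {
        dist[next] = candidate;
        prev[next] = current;
      }
    });
  }

  if (dist[goal] === Infinity) return null;

  // Walk the prev links backwards to rebuild the route
  var path = [goal];
  while (path[0] !== start) path.unshift(prev[path[0]]);
  return { cost: dist[goal], path: path };
}

var route = shortestRoute(roads, "home", "office");
```

The algorithm is the easy part. The costs on the edges are where the map data comes in, which is why the quality and openness of that data matters so much.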

It’s never too early for elevator calculus

So, this past Friday night we had some friends over for a lovely Shabbos dinner. It really was a delightful time. Good food, good company and good conversation. What could be more relaxing and appropriate for Shabbos after a long week? Anyway, after dinner, they all left and Sarah and I cleaned up and promptly fell asleep.

Unfortunately, the building fire alarm went off around 12:30 forcing us out into the cold Connecticut air in a hustle of confusion. Out there we met another couple who live far above us on the 17th floor. It turns out that there was some sort of fire in the penthouse and everything was resolved by New Haven’s finest. However, that left us with the problem of trying to catch the elevator up when there were hundreds of humans in the crowd waiting to get lifted. We only live on the 4th floor so it was no problem for us to walk up to our floor but the other couple was left to fend for themselves being too tired (obviously) to climb 17 floors! I told them to come up to the 4th floor with us.

From there, I suggested that they try to catch an elevator moving in either direction from the 4th floor. I figured that the volume of people moving up would make a space on an elevator the most hotly-sought-after commodity, so it’d be better for them to get two spots on the way down and ride down so they could then ride back up. This sort of made sense to me but I could not foresee one major factor: drunk people coming down at the same time! So this morning, I found a few references to special elevator algorithms.

Was my gut instinct correct about catching a lift from the 4th floor?

Data visualization on a web page

Quickie: Two of my favorite ways to get data visualized on a web page are the Google Chart API and flot, the amazing canvas-based plotting library built on top of jQuery.

The Google Chart API provides a ridiculously clever way to get high-quality information graphics which are generated on the back of the clearly-amazing Google infrastructure. You just use the URL layout provided and it sort of just works. All types of charts can be created. It’s very nice if you’re willing to take the time to piece together the URLs in the proper format. There are some abstractions, though.
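
To give a feel for the "piece together the URLs" part, here is a small sketch of a helper that builds one. It assumes the endpoint and parameter names as I remember them from the Chart API docs (cht for chart type, chs for size, chd for data in text encoding, chl for labels), so double-check against the documentation before relying on it. The helper name chartUrl is my own invention.

```javascript
// Builds a Google Chart API image URL from a few basic settings.
// Assumed parameters: cht (type), chs (WIDTHxHEIGHT), chd (t:comma-separated
// values) and chl (pipe-separated labels).
function chartUrl(type, width, height, values, labels) {
  var encodedLabels = labels.map(function (label) {
    return encodeURIComponent(label); // keep spaces, ampersands etc. URL-safe
  });
  var params = [
    "cht=" + type,
    "chs=" + width + "x" + height,
    "chd=t:" + values.join(","),
    "chl=" + encodedLabels.join("|")
  ];
  return "http://chart.apis.google.com/chart?" + params.join("&");
}

// A 3D pie chart, 250 by 100 pixels, with two slices
var url = chartUrl("p3", 250, 100, [60, 40], ["Hello", "World"]);
```

The resulting string is meant to be dropped straight into the src attribute of an img tag.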

Then there’s flot (which I’m told is Danish for “pretty”). Flot is a library written in JavaScript on top of jQuery which produces very nice charts inside a canvas element. The demos are quite gorgeous and its operation seems straightforward enough. As a side note, Lift has a built-in flot widget.

Good talk about DSL construction in JS

Neat talk, should remind everyone about the joys of doing cool stuff in the browser. Plus, it made me think about DSLs in general which is good because of all the Scala goodness I’ve been messing around with recently. The talk is on InfoQ.

Thoughts on different types of data

At work, my title is the Data Warehouse Manager. Mostly, I work with numbers. Lots of test scores, checklist data, composite variables and the like. Numbers are cool and they can often yield plenty of insight. However, numbers are not the only story that can be told! So much of business intelligence right now focuses on natural-language processing (NLP). I once heard someone say that getting to 80% in NLP was almost comically easy but getting to 90% has proven to be amazingly difficult! Obviously, I’m not content to wait around until the singularity (though I do believe it is near) to be able to utilize textual data for informing educational decisions. So then where is the merit in utilizing textual data if we can be only 90% sure of something? Numeric metrics like test scores are perfectly clear and definite, right?

Wrong! Any good educator knows that test scores (especially from a single test) are not the only way to gain additional insight. While the strengths and weaknesses of the standardized-testing model are a different discussion entirely, suffice it to say that the community needs to remember that, when it comes to education, all numbers are fuzzy. The learning process is not discrete, definite or clean-cut so why should we expect a single number to show singular accuracy with any great fidelity? In fact, we cannot and it is for this reason that an automatic evaluation of textual data might help educators make better decisions. Perhaps an example is in order.

Let us say that every so often in a math class, teachers write a brief one-paragraph essay about each student summing up their competency and general performance. Then let us also say that students are able to write a self-evaluation each time their teacher writes something. So, after a period of time, each student has a bit of text in addition to their test scores and general grades. I propose using some sort of automated classification mechanism to evaluate the textual data and produce best guesses as to the nature of the content. After all, if there are 25 students in a class and there are 10 classes taught each day, then that’s a lot of evaluation data to be read! So what about using something like a naive Bayesian classifier to sort through things?

But wait, doesn’t this just give us another mechanism to boil down the real-world intuition of the educational system? Well, no. If we let go of the idea that these numbers are perfectly discrete (which we know they aren’t), then we can use them to inform our decisions rather than letting the data decide for us. Probabilities are ok so long as we use them correctly. Few people would wear shorts in the winter just because the weatherman said there’s a chance of sun. So how might such a system be used?

Well, the point of a naive classifier is that it needs to be trained. First, teachers would need to assemble a training data set of some sort. In younger grades, this might be quite easy because students have a more limited vocabulary and might not construct sentences with as much nuance as they learn to do in later grades. As for teacher evaluations, there is a somewhat-finite set of words in the vocabulary of an educator to usefully describe educational progress. Relying on the consistency of such a set nomenclature might very well yield consistent results. Though, all of this would need to be tested anyway, right? This is just a thought experiment…

So the system would be able to chew through these evaluations and produce a guess about whether they reflect positively on student learning or not. Perhaps they could even be used to demonstrate growth. Getting this right would take work but it might lead to a useful way to model evaluative data. Obviously, relying too much on this might very well lead to a negation of the positive nature of narrative evaluations but it’s worth a try!
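
To make the thought experiment a little more concrete, here is a minimal sketch of such a classifier: a multinomial naive Bayes with add-one smoothing. Everything specific in it is an assumption of mine: the function names, the two labels, and especially the four made-up training evaluations, which are far too few for real use. Words the model has never seen are simply skipped.

```javascript
// Split text into lowercase words
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Count documents and words per label. Object.create(null) gives plain
// dictionaries, so words like "constructor" cannot collide with built-ins.
function train(examples) {
  var model = {
    docs: Object.create(null),   // label -> number of evaluations
    words: Object.create(null),  // label -> word -> count
    totals: Object.create(null), // label -> total word count
    vocab: Object.create(null),  // every word seen in training
    n: 0
  };
  examples.forEach(function (ex) {
    var label = ex.label;
    model.docs[label] = (model.docs[label] || 0) + 1;
    model.words[label] = model.words[label] || Object.create(null);
    model.totals[label] = model.totals[label] || 0;
    model.n += 1;
    tokenize(ex.text).forEach(function (w) {
      model.words[label][w] = (model.words[label][w] || 0) + 1;
      model.totals[label] += 1;
      model.vocab[w] = true;
    });
  });
  return model;
}

// Returns the best-guess label plus a probability per label,
// or null if the model has not been trained on anything.
function classify(model, text) {
  var labels = Object.keys(model.docs);
  if (labels.length === 0) return null;
  var vocabSize = Object.keys(model.vocab).length;
  var words = tokenize(text).filter(function (w) { return model.vocab[w]; });

  // Work in log space to avoid multiplying many tiny numbers
  var logScore = Object.create(null);
  labels.forEach(function (label) {
    var score = Math.log(model.docs[label] / model.n); // prior
    words.forEach(function (w) {
      var count = model.words[label][w] || 0;
      score += Math.log((count + 1) / (model.totals[label] + vocabSize));
    });
    logScore[label] = score;
  });

  var best = labels.reduce(function (a, b) {
    return logScore[a] >= logScore[b] ? a : b;
  });

  // Turn the log scores back into probabilities that sum to 1
  var sum = 0;
  labels.forEach(function (l) { sum += Math.exp(logScore[l] - logScore[best]); });
  var probability = Object.create(null);
  labels.forEach(function (l) {
    probability[l] = Math.exp(logScore[l] - logScore[best]) / sum;
  });
  return { label: best, probability: probability };
}

// Made-up training evaluations, purely for illustration
var model = train([
  { label: "positive", text: "Shows strong progress and solid understanding of fractions" },
  { label: "positive", text: "Confident and consistent work, great improvement this month" },
  { label: "negative", text: "Struggles with fractions and needs extra support" },
  { label: "negative", text: "Inconsistent homework, falling behind and confused by division" }
]);

var guess = classify(model, "Strong improvement and great progress");
```

Note that the result carries a probability alongside the label, which fits the point above: the number is there to inform a decision, not to make it.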

Podcasts I now listen to

I commute to work. Unfortunately, there is no way to carpool so I end up driving about 80 miles a day alone. Still, I drive a very fuel-efficient Honda Fit and try not to feel too guilty about the fuel consumption considering that I use very little. So, with all that time spent on the Merritt, what should I be listening to in the car? Well, I now have a few podcasts that I listen to.

First and foremost, I enjoy the LKML Summary podcast put out by Jon Masters. He does an exceptional job of summarizing things and his side commentary is usually rather hilarious, actually. I do wish he’d provide a touch more background on certain things as it can be hard to jump right into the high-level technical discussion on the mailing list. Masters’ radio-compatible voice is a pleasure to listen to in the car, even if each episode is only just under nine minutes long.

Next, I have Software Engineering Radio, which has some really neat discussions. The latest one is on APIs and then I’m going to listen to the one where they interview Odersky. The commentary on topics is quite lucid and I greatly appreciate the candor of guests when discussing specific issues. After all is said and done, the software community at large can be a political group (myself included) so it’s nice to have some really great technical discussions as well.

So far, that’s it. I want to also sample the OSNews podcast but haven’t gotten around to it yet. So with about 2 hours in the car every day, anyone know of anything else I should be listening to now that LugRadio is over? I’d be interested in some podcasts on the topics of emacs, Debian or Scala stuff.

Slow-cooked lentil stew

It’s simple, easy and really good if you have the patience to wait while the slow-cooker does its work. Just make sure you season well enough because lentils can be bland. That being said, it certainly can get too salty, so be careful not to add too much salt along with the other seasonings.

Ingredients:

  • 2 large yellow onions
  • 1 whole head of garlic (just cut each clove in half)
  • 2 tsp black pepper
  • 2 tsp adobo powder (without hot pepper)
  • 1.5 tsp cumin seeds
  • 1 tsp turmeric
  • 1 bag (500g?) of black lentils
  • Beef marrow bones
  • Water to cover

Dice the onions, cut the garlic cloves in half and throw them in the pot. Add the marrow bones and the lentils. Throw the spices in and cover the whole affair with water. Put on low and leave overnight in the slow cooker. The top will become very dark as it cooks but that’s ok. It just means it’s time to stir. Taste about an hour before serving and adjust the seasoning as needed.

The most important part of what makes this recipe great is the marrow bones. Don’t try to substitute any other kind of bones like neck or something because it just isn’t the same. Besides, there is something very primally satisfying about sucking on the marrow bones after eating the lentils. If you can eat it quickly enough, the bones will still be rather warm and the flavor from the spices makes the marrow both smooth and satisfying.

Getting somewhere with Scala

Ok, so when beginning with Scala, it was hard to figure certain things out. For example, what’s the best way to document my code? How should I go about testing my new programs? Which race in StarCraft best represents the ideals of Scala? Well, to prevent other people from getting stuck on these, I am writing this post to help out.

Writing code

How best to write code? Are there IDEs which support or maybe even encourage Scala? Yes! Yes, there are!

Project management

How best to keep track of Scala projects? Well, everyone’s got their favorite way to build stuff (Make and cousins) and those will most likely work fine. However, if you want something a little more specialized there are several tools which can help.

Testing

There is no one way to test Scala, and this is true of most languages. However, Scala does include a test framework called SUnit. That being said, it’s slated for deprecation as soon as it can be replaced and removed. Also, the current thinking out there seems to be that SUnit sucks. So, you know what, kids? We aren’t going to let that bother us because there are many great frameworks out there which may be used to effectively test Scala. Since Scala compiles to Java bytecode, there shouldn’t be trouble testing Scala from Java or Java from Scala.

Documentation

Who doesn’t love documentation? I know that I sure do! To help satisfy my deep-seated urge to author code-centric exposition, there is scaladoc. Ever hear of javadoc? Yeah, it’s like that. Read about javadoc here and then check out the scaladoc manpage. If, for some reason you don’t like scaladoc’s output, you’re in luck because there is an alternative generator called vscaladoc.

Notes

This list is most likely incomplete and certainly not exhaustive. If someone would like me to add something, please provide the link and info in the comments.