:: Swarthmore Projects

Upper level CS classes at Swarthmore end the semester with a self-directed project, normally with a partner or two. I've attached the papers that resulted from these projects. Classes are listed chronologically, starting with the most recent.

Research

I had the opportunity of working for the summers of my sophomore and junior years at the Swarthmore CS department with Andrew Danner. An informal abstract summarizing our work is below.

One application of Geographic Information System (GIS) data is the hydrologic analysis of gridded Digital Elevation Models (DEMs). Using Light Detection and Ranging (LIDAR) technology, large, hi-resolution elevation data can be collected. In addition to terrain, the data might include smaller features like trees, buildings, and bridges. Our research goals were to automate detection of bridges, which impede watershed analysis, and to alter the terrain to yield hydrologically correct DEMs.

A machine learning algorithm (AdaBoost) was used to combine sets of simple filters that detect local bridge-like features. It takes features describing hand-labeled positive and negative training samples as input. The algorithm combines these features into a powerful bridge classifier. This classifier is then used on unlabeled test data and outputs a grid that assigns to each likely bridge location a numerical confidence score. Using a training set of about 600 hand-tagged positive and negative samples, the classifier correctly classifies bridges about 95% of the time. Once bridges are classified, a two-pass conditioning approach is used to remove the bridges from the terrain data. The first pass makes large cuts that result in hydrologically correct but blocky terrain. Using the flow models made available by the first pass, a carving algorithm is implemented that makes smaller, more precise cuts in the original terrain. The river networks extracted from the minimally conditioned terrain show marked improvement over the original terrain. Future work may involve parallelizing the algorithm, which relies on independent classified regions making it an excellent candidate for parallelization.

We also wrote a paper that was accepted to the 2010 ACM GIS conference. (paper)

CS91 - Software Engineering: iOS Development

Our project is a music player called OpenHouse centered around an instant playlist. A typical user starts out as the DJ, adding songs to his or her playlist. Nearby users can then join as guests, view the DJ's library and current playlist, and request songs from it. The DJ can see the most requested songs and choose to add them to the playlist. While the player works great as a stand-alone music player, we designed it for parties or road trips where multiple users want to have some say over the music that's playing. You can grab it on the App Store.

CS97 - Databases

For our CS senior capstone course, a partner and I implemented a better query optimizer in PostgreSQL. The current system assumes independence between attributes when calculating selectivity (the fraction of rows a query returns), which is often wrong. Consider, for example, that education and income are highly correlated, or that the model implies the make of a car. Using this independence assumption, selectivity is often underestimated, which can impact runtimes. Our approach uses a two-dimensional histogram to better estimate selectivity. We show that it would not be difficult for PostegreSQL developers to implement more general multidimensional histograms without great changes to the existing code base. (paper)

CS65 - Natural Language Processing

We implemented a minimally supervised learning algorithm as a layer over OCR output. We used GOCR, which outputs an unknown marker when it is unsure of a character. Our algorithm first learns a set of characters that are commonly confused by our program (e.g. o,0). It then combines that information with the unknown markers to come up with possible words, and uses Google n-gram counts to identify the most likely candidate given the syntactic context. (paper)

CS87 - Parallel & Distributed Computing

GPGPU programming is simultaneously really cool and really difficult. We looked at some common use cases of CUDA and implemented a very high-level language with a graphical frontend that takes advantage of the graphics card while keeping the details hidden. Our interface provides a drag-and-drop interface for the user to create programs using the paradigms of a GPGPU programmer. (paper)

CS81 - Adaptive Robotics

We were interested to see if object tracking could be done without any explicit edge detection. We implemented a tracking algorithm using Growing Neural Gas that simultaneously tracks multiple objects. As a proof of concept we did some initial interfacing with an AIBO robot to direct its head to move with an object. (paper)

CS63 - Artificial Intelligence

This class had everyone competing at a single task: given a simulation of an emergency situation, our (virtual) robot was tasked with triaging victims stranded in a room. Our robot had the ability to sense a few spaces around itself and pick up/carry/put down victims. There were four major parts of this project. We used A* search to reach victims. We implemented a room detection algorithm to make sure we didn't check the same room for victims twice. We adapted AdaBoost to check if victims, who had dynamic health stats, were worth saving. And we generated a probabilistic model to track wandering victims. (paper)