
Gary Sieling is a software developer interested in DevOps, database technologies, and machine learning. He has a computer science degree from the Rochester Institute of Technology. He has worked on many products in the legal and regulatory industries, and has built and supported several data warehousing applications. Gary is a DZone MVB, is not an employee of DZone, and has posted 62 posts at DZone. You can read more from him at his website.

My 2012 Side Projects

01.02.2013

ExtJS Sample code
http://garysieling.com/blog/tag/extjs

Throughout the year I wrote a couple dozen how-to samples of ExtJS code. This product seems to be poorly documented, although it has improved, and is poorly covered on blogs outside the Sencha forums. This series has proved one of the most popular, and has received most of the comments on my blog.

Scraper in Chrome
http://garysieling.com/blog/building-a-website-scraper-using-chrome-and-node-js

I built a scraper as a Chrome plugin, to try building an entire project in a single programming language. I saw some comments that Google may use part of Chrome for parsing sites, and thought this would be an interesting thing to try. I set up a backend in Node.js and a controlling script in Windows Script Host. I found that different JavaScript engines differ enough in APIs and memory management to be a pain, although not a major stumbling block. The nice thing about this system is that it makes for easy RPC calls, and you can pass code over the wire.

Scraping Flippa data
http://garysieling.com/blog/one-third-site-auctions-abandoned
http://garysieling.com/blog/advertisers-used-by-banned-sellers-flippa
http://garysieling.com/blog/advertisers-referenced-in-flippa-auctions

I wanted to test out the difficulties of scraping, as well as do some market research. I pulled a small subset of auctions to run some tests. This had challenges that took more work than anticipated: memory management, error handling, and dealing with varied page formatting. This got me into looking at Weka and R for data mining (e.g. looking for indicators of good values or spam), though it proved difficult to get statistically significant conclusions from the data.

Scraping ads with PhantomJS
http://garysieling.com/blog/scraping-adsense-ads-with-phantomjs

I thought it’d be interesting to test the difficulty of scraping advertising, as a market research exercise. This was inspired by Mixrank, and I was thinking of using it to do due diligence on website purchases (e.g. a website with AdSense, or a competitor to a website with AdSense).

Expert Search
http://garysieling.com/blog/category/full-text-search

I went to a Solr/Lucene conference and wanted to build a project while there. I wrote a simple ETL script to convert git history to a Solr index, and a UI to facet search results by authors or companies. This addresses the interesting problem of working out which engineer to call when a client calls the front desk, as well as showing new engineers who worked on which project.
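As a rough illustration of the ETL step, here is a minimal Python sketch. It is not the original script: the `git log` format string, the field names, and the idea of using the email domain as a stand-in for the company are all assumptions made for the example. It only builds flat documents and counts facets in memory; sending the documents to a Solr server is left out.

```python
from collections import Counter

# Field separator: the ASCII unit separator, which `git log` emits for %x1f.
SEP = "\x1f"

# Assumed command that produces the input, one commit per line:
#   git log --pretty=format:%H%x1f%an%x1f%ae%x1f%aI%x1f%s


def parse_git_log(text):
    """Turn `git log` output into a list of flat documents for a search index."""
    docs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(SEP, 4)
        if len(parts) != 5:
            continue  # skip malformed lines rather than guess at their content
        commit, author, email, date, subject = parts
        docs.append({
            "id": commit,
            "author": author,
            # Assumption: the email domain stands in for the company.
            "company": email.rsplit("@", 1)[-1].lower(),
            "date": date,
            "subject": subject,
        })
    return docs


def facet_counts(docs, field):
    """Count documents per value of `field`, most common first."""
    return Counter(doc[field] for doc in docs).most_common()


# Hypothetical sample data, in the same shape the command above would emit.
sample = "\n".join([
    SEP.join(["a1", "Alice", "alice@example.com", "2012-10-01T09:00:00-04:00", "Add parser"]),
    SEP.join(["b2", "Bob", "bob@example.org", "2012-10-02T10:00:00-04:00", "Fix search UI"]),
    SEP.join(["c3", "Alice", "alice@example.com", "2012-10-03T11:00:00-04:00", "Tune facets"]),
])

docs = parse_git_log(sample)
print(facet_counts(docs, "author"))   # [('Alice', 2), ('Bob', 1)]
print(facet_counts(docs, "company"))  # [('example.com', 2), ('example.org', 1)]
```

In a real index the faceting would be done by Solr itself; the in-memory count here just shows what a facet by author or company returns.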

Beehive Plans
http://www.makingbeehives.com

This is an ongoing side project I’m working on with my father. He wrote several short books of beehive plans, which sell well at conferences and through a major beekeeping magazine. I set up a domain and an online store for him, as an exercise in learning a bit about SEO and marketing, which has been quite fruitful. I have a free account with Adzerk, a company that provides hosted ad servers. This enabled me to run a couple of A/B tests on Amazon products; nothing earth-shattering, but still interesting.

Sound Processing
http://garysieling.com/blog/tag/r
http://garysieling.com/blog/category/data-mining

I did some experimentation with R, partly because I like math, and partly to see an alternative way of managing data. One could think of R as a SQL-like language that operates on an in-memory database, and it makes for an interesting learning exercise. I did some experimentation with reverse-engineering music (working out which chords are played in a song), which is easy in some simple cases and difficult to do well: making this work effectively requires a very large and broad training set.
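To give a feel for the "simple case", here is a small Python sketch (Python rather than R, and not the approach used in the posts linked above). It assumes the note frequencies have already been extracted from the audio, assumes standard A4 = 440 Hz tuning, and only recognizes major and minor triads by matching against fixed interval patterns. The function and table names are made up for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Interval patterns (semitones above the root) for the two simplest chord types.
TEMPLATES = {"major": {0, 4, 7}, "minor": {0, 3, 7}}


def pitch_class(freq_hz):
    """Map a frequency to a pitch class 0-11 (C = 0), assuming A4 = 440 Hz."""
    midi = 69 + 12 * math.log2(freq_hz / 440.0)
    return round(midi) % 12


def name_chord(freqs_hz):
    """Return e.g. 'C major' if the notes form a major or minor triad, else None."""
    classes = {pitch_class(f) for f in freqs_hz}
    for root in sorted(classes):
        # Express every note as a distance above the candidate root.
        intervals = {(pc - root) % 12 for pc in classes}
        for quality, pattern in TEMPLATES.items():
            if intervals == pattern:
                return f"{NOTE_NAMES[root]} {quality}"
    return None


print(name_chord([261.63, 329.63, 392.00]))  # C major
print(name_chord([220.00, 261.63, 329.63]))  # A minor
print(name_chord([440.00, 466.16]))          # None
```

The hard part in practice is everything this sketch skips: pulling clean note frequencies out of real recordings with overlapping instruments, harmonics, and noise, which is where a large training set comes in.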

This Blog
http://garysieling.com/blog/

I wanted to improve my writing and document side projects. Much to my surprise, people have subscribed to this and follow along. It has also proven to be a good way to get feedback on these projects, as people periodically leave updates or further requests on sample code. I’ve also met a few people and started receiving more recruiter spam, although not enough to be overwhelming.

Future Projects

Invoicing
I’m currently building a simple invoicing application for someone. It will let me test out some new APIs, work on a workflow for moving from mock-ups to functional prototypes more quickly, and get a better handle on what is available for PDF APIs, Twilio, and mobile device APIs.

Scala
This will be significant work-wise, and my future JVM-based projects will likely move to Scala, so stay tuned.

Published at DZone with permission of Gary Sieling, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)