Wille Faler is an experienced software developer, architect and agile coach with experience across a number of different industries as an independent consultant. Wille specializes in backend, integration, and Java technologies, but has more recently found a passion for Scala and text mining/analysis. Wille is a DZone MVB and is not an employee of DZone and has posted 39 posts at DZone.

On Machine Intelligence and Predictive Power

05.31.2012
I launched GreedAndFearIndex - a SaaS platform that automatically reads thousands of financial news articles daily to deduce which companies are in the news and whether the financial sentiment around them is positive or negative.

It’s an app built largely on Scala, with MongoDB and Akka playing prominent roles in handling massive amounts of data on a small amount of relatively cheap hardware.

The app itself took about 4-5 weeks to build. The underlying technology took considerably longer to get right: web crawling, data cleansing/normalization, text mining, sentiment analysis, name recognition, language grammar comprehension such as subject-action-object resolution, and the “God”-algorithm that underpins it all.

Doing it all meant not only lots of late nights of coding, but also reading more academic papers than I ever did at university, on machine learning as well as on neuroscience and research into the human neocortex.

What I am getting at is that financial news and sentiment analysis might be a good showcase and a beginning, but it is only one part of a bigger problem to solve.

Unlocking True Machine Intelligence & Predictive Power
The human brain is an amazing pattern matching & prediction machine. In its ability to pull together, associate, correlate and understand causation between disparate, seemingly unrelated strands of information, it is unsurpassed in nature, and it makes much of what has passed for “Artificial Intelligence” look like a joke.

However, the human brain is also severely limited: it is slow and its immediate memory is small. We can famously only keep track of 7 (+/-2) things at any one time unless we put considerable effort into it. We are awash in amounts of data, information and noise that our brain is not yet evolutionarily adapted to deal with.

So what I’m really working on is not a SaaS sentiment analysis tool. That tool is the first step toward a bigger problem (which, admittedly, I may not solve, or not solve in my lifetime):

What if we could make machines match our own ability to find patterns based on seemingly unrelated data, but far quicker and with far more than 5-9 pieces of information at a time?

What if we could accurately predict the movements of financial markets, the best price point for a product, the likelihood of natural disasters, the spreading patterns of infectious diseases or even unlock the secrets of solving disease and aging themselves?

The Enablers
I see a number of enablers that are making this future a real possibility within my lifetime:

  • Advances in neuroscience: our understanding of the human brain is getting better year by year. The fact that we can now look inside the brain on a very small scale, and that we are starting to build a basic understanding of the neocortex, will be the key to the future of machine learning. Computer Science and Neuroscience must intermingle to a higher degree to further both fields.
  • Cloud Computing, parallelism & increased computing power: computing power is cheaper than ever with the cloud, the software to take advantage of multi-core computers is finally starting to arrive, and Moore’s law is still advancing as ever (the latest generation of MacBook Pros have roughly 2.5 times the performance of my barely 2 year old MBP).
  • “Big Data”: the data needed to both train and apply the next generation of machine learning algorithms is abundantly available to us. It is no longer locked away in the silos of corporations or the pages of paper archives; it’s available and accessible to anyone online.
  • Crowdsourcing: there are two things that are very time intensive when working with machine learning: training the algorithms and, once in production, providing them with feedback (“on the job training”) to continually improve and correct them. The internet and crowdsourcing lower these barriers immensely. Digg, Reddit, Tweetmeme and DZone are all early examples of simplistic crowdsourcing with little learning, but where participants have a personal interest in taking part. Combine that with machine learning and you have a very powerful tool at your disposal.

Babysteps & The Perfect Storms
All things considered, I think we are getting closer to the perfect storm for taking machine intelligence out of the dark ages where it has lingered far too long and quite literally into a brave new world where one day we may struggle to distinguish machine from man and artificial intelligence from biological intelligence.

It will be a road fraught with setbacks and with trial and error where the errors will seem insurmountable, but we’ll eventually get there one baby step at a time.
I’m betting on it, and the first natural step is predictive analytics & adaptive systems able to automatically detect and solve problems within well-defined domains.

Published at DZone with permission of Wille Faler, author and DZone MVB. (source)