Thursday, March 1, 2012

Got Big Data?

If competing based on time and information really will drive the next major economic era, then Big Data is real and represents a huge opportunity. If you’re a business analyst or technologist responsible for mapping data to decisions, then the variety, velocity, and volume of data available to you today have never been richer. And your responsibility has never been greater.

I’ve previously discussed the different classes of data source technologies that can legitimately be used to harness (or tame) Big Data. Hadoop, the most popular software framework associated with this rising trend, is one of those technologies. Others include NoSQL databases, MPP data stores and even ETL/Data Integration approaches (for moving Big Data by the batch into some more usable format). Each of these technologies aligns with an appropriate use case, which helps make sense of the variety of products emerging in this world of Big Data.

For simplicity, I like to talk about three popular approaches to connecting to and making use of Big Data for business intelligence reporting and analysis.

Interactive Exploration – the most dynamic because it involves native connectivity directly from the BI tool to the Big Data source and can offer results in near-real-time. Hadoop HBase, Hadoop HDFS, and MongoDB are just three of the most popular data sources to which direct connection would be an advantage.

Direct Batch Reporting – an important and mainstream approach (especially in this early market of Big Data) that relies on tried-and-true SQL access to Big Data. Hadoop Hive is the best-known example, but Cassandra offers CQL access that delivers similar results and functionality.

Batch ETL – using extract, transform and load techniques to create a more usable subset of the Big Data. This approach is also popular, especially when the insight being sought is less urgent, probably on the order of hours or days after data capture. Almost every ETL tool has now been improved to connect to and transform Big Data. Some even integrate nicely with underlying Hadoop technologies (like Pig), potentially making the data steward’s life simpler.
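To make the Batch ETL approach a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the records in RAW_EVENTS, the action_summary table and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever reporting store a real job would load. It is not how any particular ETL tool works.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw events, standing in for a large extract from a Big Data store.
RAW_EVENTS = [
    {"user": "alice", "action": "view", "ms": 120},
    {"user": "bob", "action": "view", "ms": 80},
    {"user": "alice", "action": "purchase", "ms": 300},
    {"user": "bob", "action": "view", "ms": 100},
]


def extract(source):
    """Extract: yield raw records one at a time."""
    yield from source


def transform(records):
    """Transform: reduce raw events to one summary row per action."""
    totals = defaultdict(lambda: [0, 0])  # action -> [event count, total ms]
    for rec in records:
        totals[rec["action"]][0] += 1
        totals[rec["action"]][1] += rec["ms"]
    return [
        (action, count, total_ms / count)
        for action, (count, total_ms) in sorted(totals.items())
    ]


def load(rows, conn):
    """Load: write the summary rows into a small SQL table for reporting."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS action_summary "
        "(action TEXT PRIMARY KEY, events INTEGER, avg_ms REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO action_summary VALUES (?, ?, ?)", rows)
    conn.commit()


def run_batch(source, conn):
    """Run one batch: extract, transform, then load."""
    load(transform(extract(source)), conn)


conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the reporting store
run_batch(RAW_EVENTS, conn)
report = conn.execute(
    "SELECT action, events, avg_ms FROM action_summary ORDER BY action"
).fetchall()
print(report)
```

The point of the sketch is the shape of the work: the raw data is reduced once, on a schedule, and the reporting tool then queries the much smaller summary table with ordinary SQL.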

Sometime last year, it occurred to me that Jaspersoft is in a unique position with regard to Big Data. Because of Jaspersoft’s data-agnostic architecture, we’ve quickly offered a broad variety of native Big Data connectors, many of which have been available as free downloads for more than a year. And because of our large, growing community of developers (we have more than 260,000 registered community members, growing at about 6,000/month at the time of this writing), we have important data about Big Data. This realization led us to the Big Data Index.

Big Data Index

We’ve tracked the downloads of our Big Data connectors over the last year, charting the rise and fall in popularity of each. Over this time, we’ve seen more than 15,000 downloads, so our view is pretty good. Here’s a static version of the latest data for the four most popular Big Data connector downloads:



During the course of the past year, the Hadoop technologies (HBase & Hive combined) proved the most popular. The fastest-growing connector, and the leader at the moment, is MongoDB (from 10gen). Cassandra holds a solid and consistent fourth position (which should benefit DataStax, the commercial company behind Cassandra). Many other Big Data connectors are tracked as well, with a dynamic chart updated monthly.

As interest in Big Data grows, so will the potential uses for these technologies that are designed to map this data to decisions and insights. At the moment, I’m just content knowing I have a front-row seat via the Big Data Index.

We’re at the very beginning of this era, which will surely rely on more data than we could have fathomed just ten years ago. This is why your thoughts and comments on this topic are appreciated.

Brian Gentile
Chief Executive Officer
Jaspersoft
