Tuesday, November 8, 2011

Making Sense of it All

I’ve been writing about how important it is to build and deliver big data projects that can succeed, because the opportunity has never been better and the business case has never been more compelling. It seems each week brings more tools and products that make big, complex data types useful for a variety of business purposes.

But, what about the unforgiving worlds of natural language and semi-structured data sources? Is there any hope to generate insight from them, even in this new big data world?

It’s one thing to make sense of more traditionally structured big data sources; it’s quite another to parse natural language and complex, industry-specific data types. For a quick sense of the difficulties these data environments pose, I recommend Brett Sheppard’s excellent blog post on this topic.

Informatica’s HParser to the Rescue

Enter Informatica’s HParser, announced last week. Accessing and then making sense of practically any data type has just become far simpler. You can learn more about this important new Informatica product here. HParser is a parsing technology that runs inside a MapReduce job, allowing users to structure unstructured or semi-structured data in Hadoop and ready it for analysis. This removes much of the complexity of writing the custom scripts that developers must rely on today. HParser is available in both community and commercial editions and features a visual development environment that, combined with its many out-of-the-box parsers for semi-structured, industry-standard data, can eliminate up to 80% of the time it takes to turn this data into insight.

Integration with Jaspersoft

I’m thrilled that Jaspersoft has collaborated with Informatica to deliver rich reporting and analysis of natural language and semi-structured data, working directly with Informatica’s new HParser. Through integration with Jaspersoft’s BI server, creating any variety of reports and analyses is drag-and-drop easy. You can learn more about our work together through this brief video.

In short, we’ve worked with Informatica to ensure the Jaspersoft BI platform can provide analytic access to Hadoop for anyone who needs to access and understand data – whether it’s an executive who wants a summarized dashboard or a manager who needs a detailed operational report. And, our BI platform can handle both batch processing (through Hive) and direct, ad hoc, near real-time access to this data, which we uniquely provide through direct HBase access. That should satisfy even the most analytic end user.

Now there’s no reason not to consider any big data source. Toward the goal of genuinely harnessing the opportunity all this new (big) data represents, it’s good to see Informatica and Jaspersoft help lead the way. Your comments are appreciated.

Brian Gentile

Chief Executive Officer


Tuesday, October 25, 2011

Too Big (Data) to Fail

Will we look back at 2011 and think of it as “the year of Big Data”? This does feel like the year when organizations can genuinely take advantage of the opportunity presented by big data – harnessing its volume, variety and velocity – in both concept and implementation.

The venture capital and investment community has been betting with its wallet. During the first three quarters of 2011, several high-profile acquisitions occurred and at least a dozen new, early-stage investments were made. The key theme is big data analytics, and the goal is big insights that drive new business decisions (presumably, decisions that couldn’t have been made without leveraging that big data).

The problem is that, currently, big data analytics is fraught with far too much data and far too little analytics. Should this continue without a more intelligent way to connect to and actually use all this data, the result will often be project failure.

The next generation of big data connectors must be more intelligent, providing views into these vast swaths of data, so the opportunity for big insights can be more commonly realized. Jaspersoft’s recent work and announcement this week with IBM and its InfoSphere BigInsights product take a major step in this direction.

IBM InfoSphere BigInsights

Building on the Apache Hadoop open source framework, IBM InfoSphere BigInsights adds administrative, workflow, provisioning, and security features, along with best-in-class analytical capabilities. The IBM software package comes in a Basic Edition (freely downloadable) and an Enterprise Edition. The Basic Edition includes the complete Apache Hadoop install, a web-based management console, and pre-built integration with IBM InfoSphere Warehouse, IBM Smart Analytics System, and DB2. The Enterprise Edition goes on to include text analytics capabilities with a rules engine, a spreadsheet-like browser-based tool (called BigSheets) for data exploration and job creation, a metric-driven scheduler, large-scale indexing, a JDBC connector, LDAP support, and Jaql, a query language that enables analysis of structured and non-traditional data types.

Finding insight from within all the data can be challenging. The BigInsights toolset is made far more useful with a modern, powerful BI server out in front of it. So, IBM’s partnership with Jaspersoft provides this critical component of a complete Big Data analytics solution.

2nd Generation “Intelligent” Connectors

Connecting to a Hadoop-class data source is only useful if done intelligently. Running a query that returns millions of rows (and columns) of data probably won’t answer the business question being posed. Intelligently interrogating the data structure during the query is necessary. To accomplish this, Jaspersoft has delivered a 2nd generation connector for the IBM InfoSphere BigInsights platform. This connector builds incrementally on data access via Hive and, far more dramatically, on direct, intelligent access to HBase. The Jaspersoft connector supports filters, delivers greater performance and usability, and enables new flexibility for interacting with Big Data.

1. Filters: Because HBase has no native query language, there's no automatic filtering capability. But there are filtering APIs. The new Jaspersoft connector not only supports simple range restrictions (e.g., StartRow and EndRow) but also supports a wide array of complex filters (like RowFilter, FamilyFilter, ValueFilter, SkipFilter, and so on). In fact, the universe of supported Apache HBase filters is listed here.
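To make the idea concrete, here is a minimal conceptual sketch in plain Python (not the HBase API – the table, row keys, and column names are invented for illustration) of how a connector can combine a start/stop-row range with a value filter, in the spirit of what HBase’s Scan plus filter classes provide:

```python
# Hypothetical in-memory stand-in for an HBase table:
# sorted row keys mapping to {column: value} cells, all raw bytes.
ROWS = {
    b"row-001": {b"cf:amount": b"120"},
    b"row-002": {b"cf:amount": b"75"},
    b"row-003": {b"cf:amount": b"310"},
    b"row-004": {b"cf:amount": b"50"},
}

def scan(start_row, stop_row, value_predicate):
    """Return rows in [start_row, stop_row) whose values pass the predicate."""
    out = {}
    for key in sorted(ROWS):
        if key < start_row or key >= stop_row:
            continue  # the simple range restriction: StartRow/EndRow
        # the complex filter: keep only cells whose value passes the test
        cells = {c: v for c, v in ROWS[key].items() if value_predicate(v)}
        if cells:  # drop rows with no surviving cells
            out[key] = cells
    return out

# Keep only rows in [row-001, row-004) whose amount exceeds 100.
result = scan(b"row-001", b"row-004", lambda v: int(v) > 100)
```

Pushing both the range and the value test into the scan is what keeps millions of irrelevant rows from ever reaching the reporting layer.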

2. Performance & Usability: In addition to the systems monitoring and management niceties provided by IBM, a Jaspersoft HBase query can specify exactly the ColumnFamilies and/or Qualifiers that are to be returned. This is particularly helpful for query performance tuning and usability, in that some HBase users have very wide tables, so accessing just the necessary fields offers a much faster and more usable solution.
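The benefit of naming exact ColumnFamilies and Qualifiers can be sketched the same way – again in plain Python with invented family and qualifier names, not the HBase API – by projecting only the requested cells out of a wide row:

```python
def project(row, families=None, qualifiers=None):
    """Keep only cells in the named families (and, optionally, qualifiers)."""
    return {
        (fam, qual): val
        for (fam, qual), val in row.items()
        if (families is None or fam in families)
        and (qualifiers is None or qual in qualifiers)
    }

# A "wide" row keyed by (family, qualifier) pairs.
wide_row = {
    ("info", "name"): b"Alice",
    ("info", "email"): b"alice@example.com",
    ("metrics", "clicks"): b"42",
    ("metrics", "views"): b"1000",
    ("audit", "last_login"): b"2011-10-01",
}

# Fetch just the metrics family instead of the whole wide row.
slim = project(wide_row, families={"metrics"})
```

For tables with hundreds of columns per row, transferring only the named cells is where the performance and usability gains come from.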

3. Flexibility: To unpack data from HBase and make sense of it within a reporting tool, Jaspersoft’s connector supports a deserialization engine framework. The connector automatically understands HBase's shell and Java default serializations. Then, a customer can plug in existing or customized Java deserializers so the connector will automatically convert from HBase's raw bytes into meaningful data types. This delivers flexible support for the widest array of data within Hadoop’s HBase environment.
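The pluggable deserializer idea can also be sketched conceptually – here as a small Python registry with made-up column names and encodings, standing in for the connector’s Java deserializer framework – turning raw bytes into typed values:

```python
import struct

# Hypothetical registry mapping columns to deserializer functions.
DESERIALIZERS = {
    b"cf:count": lambda b: struct.unpack(">q", b)[0],  # big-endian 8-byte long
    b"cf:label": lambda b: b.decode("utf-8"),          # UTF-8 string
}

def decode_cell(column, raw_bytes):
    """Apply the registered deserializer, falling back to raw bytes."""
    return DESERIALIZERS.get(column, lambda b: b)(raw_bytes)

# Raw bytes come back typed: an integer and a Unicode string.
count = decode_cell(b"cf:count", struct.pack(">q", 1234))
label = decode_cell(b"cf:label", "résumé".encode("utf-8"))
```

Because the registry is just a mapping, a customer could plug in a custom function per column without touching the scan logic, which is the flexibility the framework is after.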

We’ve truly come a long way from the earliest days of Apache Hadoop, moving beyond the technical elite, on to the IT team (thanks to IBM) and now on to the business user (thanks to Jaspersoft). The result of Jaspersoft’s integration with IBM InfoSphere BigInsights is a complete Big Data solution, including the ability to manage and process large volumes of data and the ability to extract key information using flexible and easy-to-use reporting, dashboard and analytic views in one integrated solution. There’s plenty more to learn about Jaspersoft’s integration with IBM InfoSphere BigInsights.

The fastest path toward uncovering real analytic insight from Hadoop comes through a combination of proven, best-in-class software. Just in time, because the untapped potential for bold new insight from within the growing volumes of data is too big to fail.

Brian Gentile

Chief Executive Officer