I’ve been writing about how important it is to build and deliver big data projects that succeed. The opportunity to do so has never been better, and the business reasons to do so have never been more compelling. It seems that each week, more tools and products become available to make big, complex data types useful for a variety of business purposes.
But what about the unforgiving worlds of natural language and semi-structured data sources? Is there any hope of generating insight from them, even in this new big data world?
It’s one thing to make sense of more traditionally structured big data sources; it’s quite another to parse natural language and complex, industry-specific data types. For a quick look at the difficulties of these data environments, I recommend Brett Sheppard’s excellent blog post on this topic.
Informatica’s HParser to the Rescue
Enter Informatica’s HParser, announced last week. Accessing, and then making sense of, practically any data type has just become far simpler. You can learn more about this important new Informatica product here. HParser is a parsing technology that runs inside a MapReduce job and lets users structure unstructured or semi-structured data in Hadoop, readying it for analysis. This removes much of the complexity of writing the custom scripts that developers must create today. HParser is available in both a community and a commercial edition. It features a visual development environment that, combined with its many out-of-the-box parsers for semi-structured, industry-standard data, can eliminate up to 80% of the time it takes to turn this data into insight.
Integration with Jaspersoft
I’m thrilled that Jaspersoft has collaborated with Informatica to deliver rich reporting and analysis of natural language and semi-structured data, working directly with Informatica’s new HParser. Through integration with Jaspersoft’s BI server, creating a wide variety of reports and analyses is as easy as drag and drop. You can learn more about our work together through this brief video.
In short, we’ve worked with Informatica to ensure the Jaspersoft BI platform can provide analytic access to Hadoop for anyone who needs to access and understand data, whether it’s an executive who wants a summarized dashboard or a manager who needs a detailed operational report. And our BI platform handles both batch processing (through Hive) and direct, ad hoc, near real-time access to this data, which we uniquely provide through direct HBase access. That should satisfy even the most demanding analytic end user.
Now there’s no reason not to consider any big data source. Toward the goal of genuinely harnessing the opportunity all this new (big) data represents, it’s good to see Informatica and Jaspersoft help lead the way. Your comments are appreciated.
Chief Executive Officer