The Information Management group at IBM Research – India develops next-generation technologies in areas such as advanced business intelligence and insight generation, context-oriented information integration, and extraction of semantic knowledge from unstructured data. This work is driven by IBM Research's goal of building intelligent solutions and services that address business problems in industrial sectors including financial services, telecommunications, retail, and healthcare.
We bring together the capabilities of information integration and data analytics to build next-generation integrated enterprise information management systems. This work encompasses techniques for extracting context at the time of data upload and new interfaces for on-demand access to information through enhanced business-driven search and dynamic faceted browsing. We are also exploring the value of incorporating text data into predictive analytic models for customer lifetime value (CLV), churn prediction, and targeted marketing.
The Information Management team develops novel techniques for loosely coupling structured and unstructured data, so that the two sources provide symbiotic, semantically disambiguated information within an enterprise. This is achieved by viewing the structured data in a relational database as a set of predefined "entities" and identifying the entities from this set that best match a given document.
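As an illustration of the entity-matching idea described above, the following is a minimal sketch and not the group's actual method. It assumes each relational row is one entity and scores an entity by the fraction of its attribute tokens that appear in the document. The function names, the scoring rule, and the sample rows are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_matching_entities(entities, document, top_k=2):
    """Rank entities (relational rows) by token overlap with a document.

    Assumption: the score is the fraction of an entity's attribute tokens
    that occur in the document. A real system would likely weight
    attributes and handle ambiguity far more carefully.
    """
    doc_tokens = set(tokenize(document))
    scored = []
    for entity_id, row in entities.items():
        entity_tokens = set()
        for value in row.values():
            entity_tokens.update(tokenize(str(value)))
        if not entity_tokens:
            continue
        overlap = len(entity_tokens & doc_tokens) / len(entity_tokens)
        if overlap > 0:
            scored.append((entity_id, overlap))
    # Highest score first; ties broken by entity id for determinism.
    scored.sort(key=lambda pair: (-pair[1], str(pair[0])))
    return scored[:top_k]


# Hypothetical rows from a customer table, keyed by primary key.
customers = {
    101: {"name": "Asha Rao", "city": "Bangalore", "product": "broadband"},
    102: {"name": "Vikram Mehta", "city": "Delhi", "product": "prepaid"},
}
note = "Asha Rao from Bangalore called about a broadband outage."
ranked = best_matching_entities(customers, note)
```

Here the note is linked to customer 101 only, because no attribute token of customer 102 appears in the text.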
We also focus on information extraction (IE) from unstructured data, developing technologies that identify entities such as organizations, places, and product names, as well as relationships among entities such as sellers and employees. To address the need for scalability in IE systems, we are developing innovative techniques that work on the inverted index of document collections. Research challenges in this domain include dealing with extremely noisy data (such as SMS, instant messenger logs, e-mail, and automatically transcribed conversational data) and modeling and maintaining the uncertainty and conflicts associated with information extraction.
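To make the inverted-index idea concrete, here is a small sketch under stated assumptions. It is not the group's technique. It only shows how posting lists let an extractor shortlist the documents that contain every token of a known entity name, instead of scanning the whole collection. The function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def candidate_documents(index, phrase):
    """Return ids of documents containing every token of the phrase.

    Posting lists are intersected shortest first, so the work is bounded
    by the rarest token rather than by the size of the collection.
    """
    tokens = tokenize(phrase)
    if not tokens:
        return set()
    postings = sorted((index.get(token, set()) for token in tokens), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical noisy messages.
docs = {
    1: "pls chk my acct, IBM Research office in Bangalore",
    2: "meeting at the Delhi office tomorrow",
    3: "ibm research is hiring",
}
index = build_inverted_index(docs)
hits = candidate_documents(index, "IBM Research")
```

Only the shortlisted documents (1 and 3 in this sample) would then need a more expensive extraction pass. This sketch ignores token order and misspellings, which matter for noisy sources.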