Data Cleansing as a Transient Service
Real-world data is noisy. Data error rates vary widely, from approximately 0.5% to 30%, with 1% to 5% being very common. Noisy data results in inaccurate reporting, poor customer service, and bad decision making. Small errors can cause big problems: a wrong address in the database can delay a shipment or send a product to the wrong place, and a billing discrepancy results in the wrong amount being billed to a customer. These errors lead to poor customer satisfaction, increased churn, and eventually loss of revenue. In fact, statistics reveal that poor data costs businesses billions of dollars. CaaTS offers solutions for cleansing noisy data and improving data quality.
The project focuses on delivering high-accuracy data cleansing that:
Massive Data Management and Analysis
Increasingly, large enterprises in telecom, finance, retail, and other sectors face the challenge of managing and exploiting the massive amounts of customer and network operations data that they accumulate at an ever-increasing rate. Companies that succeed in turning data into information and products can gain an important business advantage in an intensely competitive industry. Hadoop is a promising infrastructure for data-intensive distributed analytics on inexpensive commodity hardware. However, its focus is on scalable processing of large amounts of input data with shallow analytics, and it ignores the resource data requirements of deep analytics. Many large-scale data analytic tasks depend on massive amounts of analytic resource data (for example, statistical models), and it is difficult to co-locate or [...] We look at how existing data models, storage techniques, and access methods can be adapted to work with Hadoop, and how to leverage these adaptations for performing analysis.
Micro-Analytics for Channel Intelligence
With basic banking becoming a commodity and all banks offering similar services and capabilities, Micro-Analytics and Channel Intelligence give banks a unique opportunity to offer customers a personalized banking experience. Business Intelligence (BI) is a proven and established discipline for analyzing customer behavior and understanding important trends. Today, banks analyze customer data and discover patterns at the corporate level. However, these insights and recommendations are not available in real time to engage customers at an individual level as they visit a branch or call the bank. Real-time solutions that engage customers with accurate, personalized product recommendations are an emerging trend and have several successful adopters. Forecasting customer needs by applying micro-segmentation and predictive models is called Micro-Analytics; the solution will be developed and deployed on the SPSS platform.
Master Content Analytics
By leveraging enterprise content for reuse in master records, customers can deliver single-view-of-party applications that span all types of information, both data and content. IBM InfoSphere Master Content for InfoSphere Master Data Management Server identifies enterprise content (and its data) and synchronizes it with data in the InfoSphere Master Data Management Server. Such an integrated, trusted, and complete view of structured data and content supports better decisions and smarter business outcomes. In this project, we will work on several problems in this area, including identifying syntactically and semantically related content in a repository, identifying the master copy among the related content, and semi-automatically linking the master copy to the structured data based on the actual content. The content metadata can be enhanced by applying information extraction to the unstructured content to identify key pieces of information that can be used to link it with the structured data.
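As a rough illustration of the first problem, identifying syntactically related content, the sketch below compares documents by the Jaccard similarity of their word shingles. This is a generic near-duplicate technique chosen for illustration, not the project's actual method or any InfoSphere API. The function names, the shingle size `k=3`, the `0.5` threshold, and the sample documents are all assumptions. Semantic relatedness and master-copy selection are not covered.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short texts get a single shingle made of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(docs, threshold=0.5, k=3):
    """Return (id1, id2, similarity) for document pairs at or above threshold."""
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    pairs = []
    for a, b in combinations(sorted(sigs), 2):
        sim = jaccard(sigs[a], sigs[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return pairs


# Hypothetical repository content, invented for this example.
docs = {
    "contract_v1": "John Smith signed the service agreement on 5 May 2010 in Delhi",
    "contract_v2": "John Smith signed the service agreement on 5 May 2010 in New Delhi",
    "invoice": "Invoice for hardware shipped to the Bangalore office",
}

for a, b, sim in related_pairs(docs):
    print(a, b, round(sim, 2))
```

Here the two contract versions share most of their shingles and are reported as related, while the invoice is not. The pairwise comparison is quadratic in the number of documents, so a real repository would likely need an indexing or hashing scheme on top of this idea.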