SAS Institute (hereinafter, SAS) offers a number of data analysis products, including statistical analysis and business intelligence (BI). The company continues to refine the functions of its data cleansing tool, with a view both to making data analysis easier and to making the tool easy for clerical departments to use. We spoke with Izumi Kobayashi, manager of the Information Management & Analytics group in the company's Business Development Division, about the features of the SAS data cleansing tool.
SAS provides its data cleansing tool as one component of “SAS Data Management”, an offering that combines the data management methodologies and products that are essential to data analysis. Behind this packaging is the company's intention: “We comprehensively provide data cleansing for analysis, not data cleansing for name matching.”
SAS Data Management is available in a stand-alone version in addition to the server version. This is meant to meet the needs of clerical departments that say, “We would like to cleanse the data ourselves when we run an analysis.” Because the cleansing engine and user interface are identical in the stand-alone and server versions, it is easy to introduce the stand-alone version first and then move to the server version once the amount of data to be cleansed has grown.
“SAS Data Management”, a cleansing tool that emphasizes operability and manageability
The SAS data cleansing tool has three main features: “ease of configuration,” “rule-based cleansing,” and “enhanced profiling functions.” Let’s look at each in more detail.
Typical data cleansing products use a “dictionary” that covers personal names, place names, and company names, and cleanse the data by matching it against the dictionary entries one by one from the beginning. The SAS data cleansing tool differs in that it takes a rule-based cleansing approach. An address, for example, follows certain rules, such as its components being arranged in the order “State”, “city”, “address”, “building / apartment name”. Correcting the data on the basis of such rules is the basic concept of rule-based cleansing.
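As a rough illustration of the idea, the following Python sketch cleanses an address using only pattern rules, with no dictionary of place names. It is not SAS's implementation: the rule patterns, the fallback to “city”, and the names `RULES`, `classify`, and `cleanse_address` are all assumptions made for this example. Only the component order is taken from the example above.

```python
import re

# Component order taken from the article's example:
# State, city, address, building / apartment name.
CANONICAL_ORDER = ["state", "city", "address", "building"]

# Hypothetical rules (assumptions for this sketch): each rule pairs a
# component name with a pattern that recognizes it. No dictionary of
# real place names is consulted.
RULES = [
    ("building", re.compile(r"^(apt|suite|bldg|unit)\b", re.IGNORECASE)),
    ("address", re.compile(r"^\d+\s+\S+")),   # starts with a house number
    ("state", re.compile(r"^[A-Za-z]{2}$")),  # two-letter code
]


def classify(part: str) -> str:
    """Return the component name of the first rule that matches."""
    for name, pattern in RULES:
        if pattern.search(part):
            return name
    return "city"  # assumed fallback when no rule matches


def cleanse_address(raw: str) -> str:
    """Split on commas, classify each part by rule, emit in canonical order."""
    # Collapse runs of whitespace inside each part and drop empty parts.
    parts = [" ".join(p.split()) for p in raw.split(",") if p.strip()]
    fields = {}
    for part in parts:
        # Keep the first part found for each component.
        fields.setdefault(classify(part), part)
    if "state" in fields:
        fields["state"] = fields["state"].upper()
    return ", ".join(fields[n] for n in CANONICAL_ORDER if n in fields)


print(cleanse_address("apt 4B,  123 Main St, Springfield, il"))
# prints: IL, Springfield, 123 Main St, apt 4B
```

Supporting a new address style in this sketch means editing or adding a pattern in `RULES`, rather than obtaining a new list of names.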
With dictionary-based data cleansing products, the dictionary itself is expensive, and it must also be updated frequently to keep up with changes in place names and company names. A rule-based approach eliminates burdens such as purchasing and updating dictionaries.