Just knowledge it: data mining

The process of analyzing existing databases in order to find useful information is called data mining. Generally, a database, whether scientific or commercial, is designed for a particular purpose, such as recording scientific observations or keeping track of customers’ account histories. However, data often has potential applications beyond those conceived by its collector.

Conceptually, data mining involves a process of refining data to extract meaningful patterns, usually with some new purpose in mind. First, a promising set or subset of the data is selected or sampled. Particular fields (variables) of interest are identified. Patterns are then found using techniques such as regression analysis, which finds variables that are highly correlated with (or predicted by) other variables, or clustering, which finds the data records that are most similar along the selected dimensions. Once the “refined” data is extracted, a representation or visualization (such as a report or graph) is used to express the newly discovered information in a usable form.
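The steps just described (select a subset, pick fields, look for a pattern, report it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (region, visits, spend), and the helper functions pearson and mine are all invented here, and a plain Pearson correlation stands in for the fuller regression analysis a real project would use.

```python
from math import sqrt

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"region": "north", "visits": 2, "spend": 20.0},
    {"region": "north", "visits": 4, "spend": 40.0},
    {"region": "north", "visits": 6, "spend": 60.0},
    {"region": "south", "visits": 3, "spend": 90.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mine(records, region, x_field, y_field):
    """Select a subset, extract two fields, and summarize their correlation."""
    # Step 1: select a promising subset of the data.
    subset = [r for r in records if r["region"] == region]
    # Step 2: identify the fields (variables) of interest.
    xs = [r[x_field] for r in subset]
    ys = [r[y_field] for r in subset]
    # Step 3: look for a pattern (here, a linear correlation).
    r = pearson(xs, ys)
    # Step 4: express the result in a usable form (a one-line report).
    return f"{region}: correlation of {x_field} with {y_field} = {r:.2f} (n={len(subset)})"


report = mine(records, "north", "visits", "spend")
print(report)
```

In practice the final step would more likely be a chart or a formatted report than a single string, and the subset would be sampled from a much larger database.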

Similar (if simpler) techniques are being used to target or personalize marketing, particularly to online customers. For example, online bookstores such as Amazon.com can find which other books have been most commonly bought by people buying a particular title. (In other words, they identify a sort of reader profile.) If a new customer searches for that title, the list of correlated titles can be displayed, with an increased likelihood of triggering additional purchases. Businesses can also create customer profiles based on longer-term purchasing patterns, and then either use them for targeted mailings or sell them to other businesses (see e-commerce). In scientific applications, observations can be “mined” for clues to phenomena not directly related to the original observation. For example, changes in remote sensor data might be used to track the effects of climate or weather changes. Data-mining techniques can even be applied to the human genome (see bioinformatics).
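The bookstore example amounts to counting which titles appear together in the same customers' purchases. The sketch below shows that idea with simple co-occurrence counts. It is not a description of how Amazon.com or any other retailer actually computes recommendations; the order data and the also_bought function are hypothetical.

```python
from collections import Counter

# Hypothetical purchase history: each set holds the titles one customer bought.
orders = [
    {"Dune", "Neuromancer", "Foundation"},
    {"Dune", "Foundation"},
    {"Dune", "Emma"},
    {"Emma", "Persuasion"},
]


def also_bought(orders, title, top_n=3):
    """Return the titles most often bought together with `title`."""
    counts = Counter()
    for basket in orders:
        if title in basket:
            # Count every other title in the same customer's purchases.
            counts.update(basket - {title})
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


suggestions = also_bought(orders, "Dune")
print(suggestions)
```

With this toy data, "Foundation" ranks first because it shares two customers with "Dune". A production system would typically also correct for overall popularity, so that best sellers do not dominate every list.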

Trends

Data mining of consumer-related information has emerged as an important application as the volume of e-commerce continues to grow, the amount of data generated by large systems (such as online bookstores and auction sites) increases, and the value of such information to marketers becomes established. However, the use of consumer data for purposes unrelated to the original purchase, often by companies that have no pre-existing business relationship to the consumer, can raise privacy issues. (Data is often rendered anonymous by removing personal identification information before it is mined, but regulations or other ways to assure privacy remain incomplete and uncertain.)

The most controversial applications of data mining are in the area of intelligence and homeland security. Because such applications are often shrouded in secrecy, the public and even lawmakers have difficulty in assessing their value and devising privacy safeguards. According to the Government Accountability Office, as of 2007 some 199 different data-mining programs were in use by at least 52 federal agencies. One of the most controversial is ADVISE (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement), developed by the Department of Homeland Security since 2003. The program purportedly can match and create profiles using government records and users’ Web sites and blogs. Privacy advocates and civil libertarians have raised concerns, and legislation has been introduced that would require that all federal agencies report their data-mining activities to Congress (see also counterterrorism and computers and privacy in the digital age).


Monday, 28 October 2013
