Data mining



ns" (Weiss et al., 2005, p. 5).

According to Weiss et al. (2005), the process of getting the text ready for text mining is very much like the knowledge discovery steps described earlier. In text mining, the text is usually converted first to XML format for consistency. It is then converted to a series of tokens (punctuation is sometimes interpreted as a token, sometimes as a delimiter). Then, some form of stemming is applied to the tokens to create the standardized dictionary. Familiar IR and data mining techniques such as TF-IDF can be applied to assign different weights to the tokens. Once this has been done, classification and clustering algorithms are applied.

Depending on the goal of the text mining operation, it may or may not be important to incorporate linguistic processing in the text mining process. Examples of linguistic processing include marking certain types of words (part-of-speech tagging), clarifying the meaning of words (disambiguation), and parsing sentences. Per Benoit (2002),

[Text] mining brings researchers closer to computational linguistics, as it tends to be highly focused on natural language elements in texts (Knight, 1999). This means TM applications (Church & Rau, 1995) discover knowledge through automatic content summarization (Kan & McKeown, 1999), content searching, document categorization, and lexical, grammatical, semantic, and linguistic analysis (Mattison, 1999). (p. 291)
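The preprocessing steps attributed to Weiss et al. (2005) above (tokenization, stemming, and TF-IDF weighting) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names `tokenize`, `crude_stem`, and `tf_idf` are hypothetical, the suffix stripper is a toy stand-in for a real stemmer such as Porter's, punctuation is treated as a delimiter, and the weight used is the common tf * log(N / df) variant, which may differ from the formulation the authors had in mind.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters; punctuation acts as a delimiter."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Toy suffix stripper (an assumption for illustration, not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the token remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    stemmed = [[crude_stem(t) for t in tokenize(doc)] for doc in documents]
    n_docs = len(stemmed)
    # Document frequency: in how many documents each stemmed term occurs.
    doc_freq = Counter(term for doc in stemmed for term in set(doc))
    weights = []
    for doc in stemmed:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical three-document corpus, made up for this example.
docs = [
    "Mining text for patterns.",
    "Clustering documents and mining data.",
    "Parsing sentences in documents.",
]
weights = tf_idf(docs)
```

Terms that occur in fewer documents receive higher weights, and a term that occurs in every document receives a weight of zero. The resulting per-document dictionaries are the kind of weighted representation that classification and clustering algorithms would then consume.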

Data mining is a synonym for knowledge discovery. Data mining also refers to a specific step in the knowledge discovery process, a process that focuses on the application of specific algorithms used to identify interesting patterns in the data repository. These patterns are then conveyed to an end user who converts these patterns into useful knowledge and makes use of that knowledge.

Data mining has evolved out of the need to make sense of huge quantities of information. Usama M. Fayyad says that stored data is doubling every nine months and the "demand for data mining and reduction tools increase exponentially" (Fayyad, Piatetsky-Shapiro, & Uthurusamy, 2003, p. 192). In 2006, $6 billion in text and data mining activities are anticipated (Zanasi, Brebbia, & Ebecken, 2005).

The U.S. government is involved in many data mining initiatives aimed at improving services, detecting fraud and waste, and detecting terrorist activities. One such activity, the work of Able Danger, had identified one of the men who would, one year later, participate in the 9/11 attacks (Waterman, 2005). This fact emphasizes the importance of the final step of the knowledge discovery process: putting the knowledge to use.

The U.S. government's data mining activities have helped stir concerns about data mining and its impact on privacy (Boyd, 2006). Privacy-preserving data mining has only recently caught the attention of researchers (Verykios, Bertino, Fovino, Provenza, Saygin, & Theodoridis, 2004).

There is much work to be done in the area of knowledge discovery and data mining, and its future depends on developing tools and techniques that yield useful knowledge without causing undue threats to individuals' privacy.

References

1. Andrassyova, E., & Paralic, J. (1999, September). Knowledge discovery in databases - a comparison of different views. Presented at the 10th International Conference on Information and Intelligent Systems, Varazdin, Croatia.

2. Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York: ACM Press.

3. Benoit, G. (2002). Data mining. In Cronin, B. (Ed.), Annual Review of Information Science and Technology: Vol. 36 (pp. 265-310). Silver Spring, MD: American Society for Information Science and Technology.

4. Boyd, R.S. (2006, February 2). Data mining tells government and business a lot about you. Common Dreams Newscenter.

5. Buchanan, B.G. (2006). Brief history of artificial intelligence.

6. Dunham, M.H. (2003). Data mining: Introductory and advanced topics. Upper Saddle River, NJ: Pearson Education, Inc.

7. Fayyad, U.M., Piatetsky-Shapiro, G., & Smyth, P. (1996, Fall). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37-54.

8. Fayyad, U.M., Piatetsky-Shapiro, G., & Uthurusamy, R. (2003). Summary from the KDD-03 panel: Data mining: The next 10 years. SIGKDD Explorations, 5(2). Retrieved March 22, 2006 from ACM Digital Library database.

9. Frawley, W.J., Piatetsky-Shapiro, G., & Matheus, C.J. (1991). Knowledge discovery in databases: An overview. In Piatetsky-Shapiro, G., & Frawley, W.J. (Eds.), Knowledge discovery in databases (pp. 1-27). Cambridge, MA: AAAI Press/MIT Press.

10. Freitas, A.A. (1999). On rule interestingness measures. Knowledge-Based Systems, 12, 309-315. Retrieved February 28, 2005 from Elsevier database.

11. General Accounting Office. (2004, May 4). Data mining: Federal efforts cover a wide range of uses (GAO-04-548).

12. Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques (Morgan Kaufmann Series in Data Management Systems). San Diego: Academic Press.

13. Hearst, M. (1999, June). Untangling text data mining. Presentation at the 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, MD.

14. Hearst, M. (2003). What is text mining?

15. Howland, P., & Park, H. (2003). Cluster-preserving dimension reduction methods for efficient classification of text data. In Berry, M.W. (Ed.), Survey of text mining: Clustering, classification, and retrieval. New York: Springer Science+Business Media, Inc.

16. Kobayashi, M., & Aono, M. (2003). Vector space models for search and cluster mining. In Berry, M.W. (Ed.), Survey of text mining: Clustering, classification, and retrieval. New York: Springer Science+Business Media, Inc.

17. MetaCombine Project. (n.d.). A project of Emory University's MetaScholar Initiative.

18. Roddick, J.F., & Spiliopoulou, M. (1999, June). A bibliography of temporal, spatial and spatio-temporal data mining research. ACM SIGKDD Explorations Newsletter. Retrieved March 22, 2006 from ACM Digital Library.

19. Schneier, B. (2005, March 9). Why data mining won't stop terror. Wired News.

20. Senellart, P.P., & Blondel, V.D. (2003). Automatic discovery of similar words. In Berry, M.W. (Ed.), Survey of text mining: Clustering, classification, and retrieval. New York: Springer Science+Business Media, Inc.

21. Two Crows. (1999). About data mining (3rd ed.).

22. Verykios, V.S., Bertino, E., Fovino, I.N., Provenza, L.P., Saygin, Y., & Theodoridis, Y. (2004). State-of-the-art in privacy preserving data mining. SIGMOD Record, 33(1). Retrieved March 22, 2006 from ACM Digital Library.

23. Waterman, S. (2005, September 20). Probing Able Danger [editorial]. The Washington Times [online version]. Retrieved January 20, 2006 from NewsBank database.

24. Weiss, S.M., Indurkhya, N., Zhang, T., & Damerau, F.J. (2005). Text mining: Predictive methods for analyzing unstructured information. New York: Springer Science+Business Media, Inc.

25. Witten, I.H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed., Morgan Kaufmann Series in Data Management Systems). San Francisco: Elsevier.

26. Zaiane, O.R., Han, J., Li, Z., & Hou, J. (1998). Mining multimedia data. Proceedings of the 1998 Conference of the Centre for Advanced Studies on Collaborative Research, Toronto, Ontario, Canada. Retrieved March 22, 2006 from ACM Digital Library.

27. Zanasi, A., Brebbia, C.A., & Ebecken, N.F.F. (2005). Preface. In Zanasi, A., Brebbia, C.A., & Ebecken, N.F.F. (Eds.), Sixth International Conference on Data Mining: Data Mining VI. Southampton, England: WIT Press.