Data Warehousing/Data Mining

From Isopedia

Contents

Technical Information

Data Mining

Data Mining, also known as Knowledge Discovery in Databases, is by definition the process of searching data with mathematical algorithms to identify trends and hidden patterns and then predicting how these trends or patterns will behave in the future. In other words, it is the process of finding and extracting useful information from raw datasets. Data mining is the process on which a business is able to predict future trends and behaviors of the company by looking deep within databases for hidden patterns.

Data Mining automatically searches large volumes of data. It is a fairly recent topic in computer science but applies many older computational techniques especially those used in statistics.

Data mining can be used to generate an hypothesis. For example, an analyst might use a neural net to discover a pattern that analysts did not think to try - for example, that people over 30 years old with low incomes and high debt but who own their own homes and have children are good credit risks.

How is data mining able to tell you important things that you didn't know or what is going to happen next? That technique that is used to perform these feats is called modeling. Modeling is simply the act of building a model (a set of examples or a mathematical relationship) based on data from situations where the answer is known and then applying the model to other situations where the answers aren't known. Modeling techniques have been around for centuries, of course, but it is only recently that data storage and communication capabilities required to collect and store huge amounts of data, and the computational power to automate modeling techniques to work directly on the data, have been available.

Data Mining is primarily used today by companies with a strong consumer focus such as retail, financial, communication, and marketing companies. Data Mining allows these companies to determine relationships among internal factors which include price, product positioning or staff skills, and external factors like economic indicators, competition, and customer demographics. The process of Data Mining also enables them to determine the impact on sales, customer satisfaction, and corporate profits based on past events. Finally, it enables them to see any and all information and to view detailed transactional data. An advantage to data mining is the fact that it uses computer cycles to replace human cycles. A disadvantage is that it still requires a significant level of expertise from users.



Data Warehousing

Data Warehousing on the other hand is an application. It began in the late 1980’s and early 1990’because the previous software was very slow and the demand for a better way to store information was growing. It involves a computer database. The computer database collects, sorts, and stores information for a company. Its aim is to manage and support analysis techniques such as Data Mining. The goal of data warehousing is to free the information that is locked up in the operational databases (process data to support critical operational needs) and to mix it with information from other, often external, sources of data.

By definition Data Warehousing is a repository of an organization's data, where the informational assets of the organization are stored and managed, to support various activities such as reporting, analysis, decision-making and other various activities such as support for optimization of organizational operational processes.

Some benefits of datawarehousing are executives, managers and staff are provided with dramatically improved access to data from many databases within the organization, managers manage with the data they want rather than with the data they get, less time spent gathering data from disparate systems, and more time available to analyze and act, reduction in paper reporting and administration, easier to do general financial analysis for the organization, less time spent on data reconciliation and problem resolution, clients get quicker, more accurate answers to questions when they call, and minimal amount of administrative labor required to maintain the Data Warehouse. Data Warehousing allows everyone in a company to analyze data and make better decisions based on the data.


There are six major components of Data Warehousing.

Components of Data Warehousing.

a.) Data Sources-Data sources refers to any electronic repository of information that contains data of interest for management use or analytics.

b.) Data Transformation-The Data Transformation layer receives data from the data sources, cleans and standardises it, and loads it into the data repository.

c.) Data Warehouse-It must be organized to hold information in a structure that best supports not only query and reporting, but also advanced analysis techniques, like data mining. Most data warehouses hold information for at least 1 year and sometimes can reach half century.

d.) Reporting

e.) Metadata-Used not only to inform operators and users of the data warehouse about its status and the information held within the data warehouse, but also as a means of integration of incoming data and a tool to update and refine the underlying DW model.

f.) Operations-Comprised of the processes of loading, manipulating and extracting data from the data warehouse. Operations also cover user management, security, capacity management and related functions

Some drawbacks of data warehousing systems include the fact that they only include information that is generated from internal transactions allowing information to be limited in some aspects of the process. It also takes time for a business to interpret the findings via data warehousing and then put them to use to run a more efficient business. Sometimes strategic applications of data warehousing have a short life span. Costs may overpower the results, and data warehousing requires an immense amount of maintenance in order to keep the system running smoothly and up to date.


History


The process of Data Mining is a relatively new process although the components are actually very old. Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers. It discovers information within the data that queries and reports can't effectively reveal. The main component of Data Mining is Statistics, which is the foundation on which it is built. Like statistics, data mining is not a business solution, it is just a technology. For example, consider a catalog retailer who needs to decide who should receive information about a new product. The information operated on by the data mining process is contained in a historical database of previous interactions with customers and the features associated with the customers, such as age, zip code, their responses. The data mining software would use this historical information to build a model of customer behavior that could be used to predict which customers would be likely to respond to the new product. By using this information a marketing manager can select only the customers who are most likely to respond. The operational business software can then feed the results of the decision to the appropriate touch point systems (call centers, direct mail, web servers, email systems, etc.) so that the right customers receive the right offers.Data Mining is also built on Artificial Intelligence and Machine Learning. Machine Learning is the combination of Statistics and Artificial Intelligence. Artificial Intelligence is a branch of computer science and engineering that deals with intellignet behavior, learning, and adaptation in machines.

Data Warehouses became a very popular and distinct form of computer warehouse during the late 1980's and early 1990's. They developed to meet a growing demand for management information and analysis that could not be met by the current operational systems at that time.

Due to this problem new, separate databases were being built. These databases were designed to support management information and analysis. The main feature of these new data bases that made them so useful was that they were able to combine sources such as mainframe computers, minicomputers, as well as personal computers and office automation software such as spreadsheets, and integrate this information into a single place. This has lead to the widespread growth and popularity of Data Warehousing. Over time, as technology improved (lower cost for more performance) and user requirements increased (faster data load cycle times and more features), data warehouses have evolved as well.



“Bill Inmon's formal systems definition of a data warehouse is a computer database and its upporting components that is: • Subject-oriented, meaning that the data in the database is organised so that all the data elements relating to the same real-world event or object are linked together; • Time-variant, meaning that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time; • Non-volatile, meaning that data in the database is never over-written or deleted, but retained for future reporting; and, • Integrated, meaning that the database contains data from most or all of an organisation's operational applications, and that this data is made consistent.” (Data Warehousing, Wikipedia)


Sources


http://en.wikipedia.org/wiki/Data_Warehousing [1]

http://en.wikipedia.org/wiki/Data_Mining [2]

http://www.data-mining-software.com/data_mining_history.htm [3]

http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm [4]

http://www.thearling.com

http://www.the-data-mine.com/

http://www.eco.utexas.edu/~norman/BUS.FOR/course.mat/Alex/#5

http://www.kenorrinst.com/dwpaper.html

http://www.breuer.com/benefits.asp

http://www.dwinfocenter.org/

http://www.dbmsmag.com/9807m01.html

http://en.wikipedia.org/wiki/Artificial_intelligence

Team Members

Jason Nalbandian

Lauren Cuomo

Christina DiCioccio

Nicole Slabicki