Data Warehouse Interview Questions: DataMining Interview Questions Part 2

Monday, 26 March 2012

DataMining Interview Questions Part 2

Q.Explain the clustering algorithm.
A.A clustering algorithm groups sets of data with similar characteristics; these groups are called clusters. Clusters help in exploring data and in making faster decisions. The algorithm first identifies relationships in a dataset and then generates a series of clusters based on those relationships. The process of creating clusters is iterative: the algorithm repeatedly redefines the groupings to create clusters that better represent the data.
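
The iterative regrouping described above can be sketched with k-means, one common clustering technique. This is only an illustration under assumptions: the sample points, the starting centroids and the function name kmeans are made up here, and it is not the implementation used by any particular product.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Group 2-D points around centroids by repeating assign/update steps."""
    centroids = list(centroids)
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for centroid, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroid)  # an empty cluster keeps its centroid
        if updated == centroids:  # groupings stopped changing
            break
        centroids = updated
    return centroids, clusters

# Hypothetical sample data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Each pass reassigns the points and recomputes the centroids, and the loop stops once a pass leaves the centroids unchanged.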

Q.What is the Time Series algorithm in data mining?
A.The Time Series algorithm is used to predict continuous values of data. Once the algorithm has been trained to predict one series of data, it can predict the outcome of other series. The algorithm generates a model that can predict trends based only on the original dataset. New data can also be added, and it automatically becomes part of the trend analysis. E.g., the performance of one employee can be used to forecast profit.
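
As a rough illustration of predicting continuous values from a series, the sketch below fits a straight-line trend by least squares and extends it forward. The linear trend is a simple stand-in chosen for this example, not the model the algorithm itself uses, and the sales figures and function names (fit_trend, forecast) are assumptions.

```python
def fit_trend(series):
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_t

def forecast(series, steps):
    """Extend the fitted trend `steps` periods past the end of the series."""
    slope, intercept = fit_trend(series)
    n = len(series)
    return [intercept + slope * (n + k) for k in range(steps)]

monthly_sales = [100.0, 110.0, 120.0, 130.0]  # made-up figures
next_two = forecast(monthly_sales, steps=2)

# Adding a new observation means refitting on the longer series,
# so the new value becomes part of the trend.
monthly_sales.append(150.0)
updated = forecast(monthly_sales, steps=1)
```

The second forecast differs from the first because the appended value changes the fitted slope.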

Q.Explain the Association algorithm in data mining.
A.The Association algorithm is used for recommendation engines based on market basket analysis. Such an engine suggests products to customers based on what they bought earlier. The model is built on a dataset containing identifiers, both for individual cases and for the items that the cases contain. A group of items in a case is called an item set. The algorithm traverses the dataset to find items that appear together in a case, and then groups into item sets any associated items that appear in at least the number of cases specified by the MINIMUM_SUPPORT parameter.
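
The role of a minimum support threshold can be shown with a small brute-force count of item sets. This is a sketch under assumptions: the order data, the function name frequent_itemsets and the max_size limit are invented for illustration, and a real engine would use a more efficient search than enumerating every combination.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(cases, minimum_support, max_size=2):
    """Return item sets that occur in at least `minimum_support` cases."""
    counts = Counter()
    for items in cases.values():
        unique = sorted(set(items))  # count each item once per case
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {itemset: n for itemset, n in counts.items()
            if n >= minimum_support}

# Hypothetical cases: a case identifier mapped to the items it contains.
cases = {
    "order-1": ["bread", "milk"],
    "order-2": ["bread", "butter", "milk"],
    "order-3": ["butter", "milk"],
    "order-4": ["bread", "butter", "milk"],
}
itemsets = frequent_itemsets(cases, minimum_support=3)
```

With the threshold at 3, the pair bread and butter is dropped because it appears in only two orders, while bread and milk is kept.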

Q.What is the Sequence Clustering algorithm?
A.The Sequence Clustering algorithm groups similar or related paths, that is, sequences of data containing events. The data represents a series of events or transitions between states in a dataset, such as a series of web clicks. The algorithm examines all transition probabilities and measures the differences, or distances, between all the possible sequences in the dataset. This helps it determine which sequences are best to use as input for clustering.
E.g. the Sequence Clustering algorithm may help find the paths along which products of a "similar" nature are stored in a retail warehouse.
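
To make "transitions" and "distances between sequences" concrete, the sketch below turns each click path into the relative frequencies of its state-to-state transitions and compares two paths by summing the absolute differences. The distance measure, the visitor data and the function names are assumptions made for illustration; the algorithm's own model is not reproduced here.

```python
from collections import Counter

def transition_profile(sequence):
    """Relative frequency of each state-to-state transition in one sequence."""
    pairs = Counter(zip(sequence, sequence[1:]))
    total = sum(pairs.values())
    if total == 0:  # fewer than two events means no transitions
        return {}
    return {pair: n / total for pair, n in pairs.items()}

def profile_distance(a, b):
    """Sum of absolute differences between two transition profiles."""
    pa, pb = transition_profile(a), transition_profile(b)
    return sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in set(pa) | set(pb))

# Hypothetical web-click paths.
clicks = {
    "visitor-1": ["home", "search", "product", "cart"],
    "visitor-2": ["home", "search", "product", "checkout"],
    "visitor-3": ["home", "help", "contact"],
}
d12 = profile_distance(clicks["visitor-1"], clicks["visitor-2"])
d13 = profile_distance(clicks["visitor-1"], clicks["visitor-3"])
```

Visitors 1 and 2 share most of their transitions, so d12 is smaller than d13. A clustering step would place the first two paths together on that basis.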

Q.Explain the concepts and capabilities of data mining.
A.Data mining is used to examine or explore data using queries. These queries can be run against the data warehouse. Exploring the data helps in reporting, planning strategies, and finding meaningful patterns. It is most commonly used to transform large amounts of data into a meaningful form. Data here can be facts, numbers, or any real-time information such as sales figures, costs, or metadata. Information is the patterns and relationships among the data that give it meaning.

Q.Explain how to work with the data mining algorithms included in SQL Server data mining.
A.SQL Server data mining offers the Data Mining Add-ins for Office 2007, which allow you to discover patterns and relationships in the data and support enhanced analysis. The add-in called Data Mining Client for Excel is used to prepare data and then to build, evaluate, and manage models and predict results.

Q.Explain how to use DMX, the data mining query language.
A.DMX (Data Mining Extensions) is a language whose syntax is based on SQL. It builds on relational concepts and is mainly used to create and manage data mining models. DMX comprises two types of statements: data definition and data manipulation.

Data definition statements are used to define or create new models and structures.

Example:
CREATE MINING STRUCTURE
CREATE MINING MODEL

Data manipulation statements are used to work with the existing models and structures.

Example:
INSERT INTO
SELECT FROM <model>.CONTENT

Q.Explain how to mine an OLAP cube.
A.Data mining extensions can be used to slice the data in the source cube in the order discovered by data mining. When a cube is mined, the case table is a dimension.
