HowToGetSoftwareJob

HowToGetSoftwareJob
HowToGetSoftwareJob Google Hangout

Monday, 19 March 2012

Data Warehouse Concepts Interview Questions

1. Define Data Warehouse.
Ans.
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources.
2. Which are three layers in which data warehouse maintains it’s functions?
Ans.
A data warehouse maintains its functions in staging, integration and access.
3. Can you elaborate it?
Ans. Staging:
It is used to store raw data for use by developers.
Integration: It is used to integrate data and to have a level of abstraction from users.
Access: It is for accessing data out of data warehouse for it’s users.
4. What is Data Mart?
Ans.
A Data Mart is the access layer of the data warehouse environment that is used to make data accessible to it’s users. It is subset of Data Warehouse.
5. What is dimensional approach for storing data in data warehouse?
Ans.
In dimensional approach transaction data are partitioned into either “facts”, which are generally numeric transaction data, or “dimensions”, which are the reference information that gives context to the facts.
6. What is normalization approach for storing data in data warehouse?
Ans.
In normalization approach the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.).
7. How we organize sales transactions data into dimensional approach?
Ans.
A sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.
8. What is the main advantage in using dimensional approach?
Ans.
The main advantage in using dimensional approach is that the data warehouse is easier for the user to understand and to use. The retrieval of data from data warehouse tends to operate very quickly.
9. What are the main disadvantages of using dimensional approach?
Ans.
There are mainly two disadvantages of using dimensional approach to storing data in data warehouse.
1. For maintaining integrity of facts and dimensions, the process of loading the data from data warehouse from different operational systems gets complicated.
2. If organization is adopting the dimensional approach changes the way in which it does the business, it is difficult to modify the data warehouse structure.
10. What are the advantages and disadvantage of using normalization approach for storing data in data warehouse?
Ans. Advantages
1. The main advantage is that it is easy to add information into the database.
Disadvantages
1. It is difficult to join data from different sources into meaningful information
2. It is also difficult to access the information without a precise understanding of the source of data and the data structure of the data warehouse.
11. What is dimension?
Ans.
It is a data element. It categorizes each item in a data set into non-overlapping regions. It aids to “slice and dice” data in data warehouse.
12. Can you describe different types of dimensions?
Ans.
There are three types of dimensions are available in dimensional approach.
Conformed dimension: It is a set of data attributes which is physically implemented in multiple database tables using same structure, attributes, domain values, definitions and concepts in each implementation. It cuts across many facts.
Dimensions are conformed when they are either same(with keys) or one is perfect subset of other.
Junk Dimension: It is grouping of low cardinality flags and indicators. These flags and indicators are removed from the fact table while placing them into a useful dimensional framework.
Role Playing Dimension: Often dimensions are recycled for multiple applications within the same database. i.e. “date” dimension can be used for “sales date” or “hire date”. This is referred as role playing dimension.
13. What is cube?
Ans.
Cube is a data structure that allows fast analysis of data. It is the capability of manipulating and analyzing data from different perspective.
14. What data cleansing(or data scrubbing) will do?
Ans.
It will remove errors, redundancies and inconsistencies in the data that is being imported into a data mart or data warehouse. (part of quality assurance process)
15. What do you know about data mining?
Ans.
It is process of researching data marts and data warehouses to detect specific patterns in the data sets. It may be performed on databases and multi-dimensional data cubes with ad-hoc query tools and OLAP.
The queries and reports are designed to answer specific questions to uncover trends or hidden relationships in the data.
16. What you know about drill-down and drill-up?
Ans. Drill-down
is the ability of a data-mining tool to move down into increasing levels of detail in a data mart, data warehouse or multi-dimensional data cube.
Drill-up is the ability of a data-mining tool to move back up into higher levels of details in a data mart, data warehouse or multi-dimensional data cube.
17. What is a fact table?
Ans.
It is primary table that contains measurements(mostly numeric data like grade, wages, etc).
18. What is OLAP(On-Line Analytical Processing)?
Ans.
Online retrieval and analysis of data to disclose business trends and statistics not directly visible in the data is known as OLAP.
19. What is OLTP(On-Line Transaction Processing)?
Ans.
It refers to a class of systems that facilitates and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.
20. What are advantages and disadvantages of data warehouse?
Ans. Advantages
of using data warehouse includes:
1. Integrating data from multiple sources.
2. Performing new types of analysis and reducing cost to access historical data.
3. Improving turnaround time for reporting and analysis.
4. Supporting ad-hoc query and reporting.
Disadvantage
1. Long initial implementation time and associated high cost.
2. Difficult to accommodate changes in data types and ranges, data source schema, indexes and queries.
3. Data owners lose control over their data, raising ownership, privacy and security issues.
4. Adding new data source takes time and associated high cost.

No comments:

Post a Comment