Data Warehouse Interview Questions: Explain the difference between star and snowflake schemas

Q.Explain the difference between star and snowflake schemas.
A.A snow flake schema design is usually more complex than a start schema. In a start schema a fact table is surrounded by multiple fact tables. This is also how the Snow flake schema is designed. However, in a snow flake schema, the dimension tables can be further broken down to sub dimensions. Hence, data in a snow flake schema is more stable and standard as compared to a Start schema.

E.g. Star Schema: Performance report is a fact table. Its dimension tables include performance_report_employee, performance_report_manager
Snow Flake Schema: the dimension tables can be broken to performance_report_employee_weekly, monthly etc.
Star schema: A highly de-normalized technique. A star schema has one fact table and is associated with numerous dimensions table and depicts a star.
Snow flake schema: The normalized principles applied star schema is known as Snow flake schema. Every dimension table is associated with sub dimension table.

Differences:

A dimension table will not have parent table in star schema, whereas snow flake schemas have one or more parent tables.The dimensional table itself consists of hierarchies of dimensions in star schema, where as hierarchies are split into different tables in snow flake schema. The drilling down data from top most hierarchies to the lowermost hierarchies can be done.

Data warehousing - Differences between star and snowflake schema. - May 19, 2009 at 11:40 am by Rajmeet Ghai
Differences between star and snowflake schema.
A snowflake schema is a more normalized form of a star schema. In a star schema, one fact table is stored with a number of dimension tables. On the other hand, in a star schema, one dimension table can have multiple sub dimensions. This means that in a star schema, the dimension table is independent without any sub dimensions.

Q.What is the difference between view and materialized view?
A.A view is created by combining data from different tables. Hence, a view does not have data of itself.On the other hand, Materialized view usually used in data warehousing has data. This data helps in decision making, performing calculations etc. The data stored by calculating it before hand using queries.
When a view is created, the data is not stored in the database. The data is created when a query is fired on the view. Whereas, data of a materialized view is stored.

View:

Tail raid data representation is provided by a view to access data from its table.It has logical structure can not occupy space.
Changes get affected in corresponding tables.Materialized view

Pre calculated data persists in materialized view.
It has physical data space occupation. Changes will not get affected in corresponding tables.

Q.What is a Cube and Linked Cube with reference to data warehouse?
A.A data cube stores data in a summarized version which helps in a faster analysis of data. Where as linked cubes use the data cube and are stored on another analysis server. Linking different data cubes reduces the possibility of sparse data. E.g. A data cube may store the Employee_performance. However in order to know the hours which calculated this performance, one can create another cube by linking it to the root cube (in this case employee_performance). Logical data representation of multidimensional data is depicted as a Cube. Dimension members are represented by the edge of cube and data values are represented by the body of cube.

Q.What is junk dimension?
A.In scenarios where certain data may not be appropriate to store in the schema, this data (or attributes) can be stored in a junk dimension. The nature of data of junk dimension is usually Boolean or flag values.

E.g. whether the performance of employee was up to the mark? , Comments on performance.

A single dimension is formed by lumping a number of small dimensions. This dimension is called a junk dimension. Junk dimension has unrelated attributes. The process of grouping random flags and text attributes in dimension by transmitting them to a distinguished sub dimension is related to junk dimension.
They are linked in order to make the data remain constant.

Q.What are fundamental stages of Data Warehousing?
A.Stages of a data warehouse helps to find and understand how the data in the warehouse changes.At an initial stage of data warehousing data of the transactions is merely copied to another server. Here, even if the copied data is processed for reporting, the source data’s performance won’t be affected.
In the next evolving stage, the data in the warehouse is updated regularly using the source data.In Real time Data warehouse stage data in the warehouse is updated for every transaction performed on the source data (E.g. booking a ticket).When the warehouse is at integrated stage, It not only updates data as and when a transaction is performed but also generates transactions which are passed back to the source online data.

Offline Operational Databases: This is the initial stage of data warehousing. In this stage the development of database of an operational system to an off-line server is done by simply copying the databases.

Offline Data warehouse: In this stage the data warehouses are updated on a regular time cycle from operational system and the data is persisted in an reporting-oriented data structure.

Real time Data Warehouse: Data warehouses are updated based on transaction or event basis in this stage. An operational system performs a transaction every time.

Integrated Data Warehouse: The activity or transactions generation which are passed back into the operational system is done in this stage. These transactions or generated transactions are used in the daily activity of the organization.

Q.What is Virtual Data Warehousing?
A.The aggregate view of complete data inventory is provided by Virtual Warehousing. The metadata is utilized for forming logical enterprise data model which is a part of database of record infrastructure , is contained in virtual data warehousing. The infrastructure consists of publishments of legacy database sysems with their metadta extracted. The standards JEE, JMS and EJBs are used in the infrastructure for the purpose of transactional unit requests and extract-tranform-load tools are used for loading real time bulk data.A virtual data warehouse provides a compact view of the data inventory. It contains Meta data. It uses middleware to build connections to different data sources. They can be fast as they allow users to filter the most important pieces of data from different legacy applications.

Q.What is active data warehousing?
A.The transactional data captured and reposited in the Active Data Warehouse. This repository can be utilized in finding trends and patterns that can be used in future decision making.An Active data warehouse aims to capture data continuously and deliver real time data. They provide a single integrated view of a customer across multiple business lines. It is associated with Business Intelligence Systems

Q.Difference between dependent and independent data warehouse
A.Dependent data ware house are build ODS,where as independent data warehouse will not depend on ODS.A dependent data warehouse stored the data in a central data warehouse. On the other hand independent data warehouse does not make use of a central data warehouse.

Q.What is data modeling and data mining?
A.Designing a model for data or database is called data modelling. Data is reposited in fact table and dimension table. Fact table consists of data about transaction and dimensional table consists of master data. Data model is used to design abstract model of database.

The process of obtaining the hidden trends is called as data mining. Data mining is used to transform the hidden into information. Data mining is also used in a wide range of practicing profiles such as marketing, surveillance, fraud detection.

Data modeling aims to identify all entities that have data. It then defines a relationship between these entities. Data models can be conceptual, logical or Physical data models. Conceptual models are typically used to explore high level business concepts in case of stakeholders. Logical models are used to explore domain concepts. While Physical models are used to explore database design.

Data mining is used to examine or explore the data using queries. These queries can be fired on the data warehouse. Data mining helps in reporting, planning strategies, finding meaningful patterns etc. it can be used to convert a large amount of data into a sensible form.

Data Warehouse Interview Questions

HowToGetSoftwareJob

Monday, 26 March 2012

Explain the difference between star and snowflake schemas

No comments:

Post a Comment

Stats

About Me