HowToGetSoftwareJob

HowToGetSoftwareJob
HowToGetSoftwareJob Google Hangout

Monday, 26 March 2012

Datawarehouse Architecture Interview Questions


Q.What is real time data-warehousing?
A.Real-time data warehousing is a combination of two things: 1) real-time activity and 2) data warehousing. Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
Q.What are slowly changing dimensions?
A.SCD stands for Slowly changing dimensions. Slowly changing dimensions are of three types
1.SCD1: only maintained updated values.
Ex: a customer address modified we update existing record with new address.
2.SCD2: maintaining historical information and current information by using
a.Effective Date
b.Versions
c.Flags or combination of these
3.SCD3: by adding new columns to target table we maintain historical information and current information
Q.What are Semi-additive and factless facts and in which scenario will you use such kinds of fact tables?
A.Snapshot facts are semi-additive, while we maintain aggregated facts we go for semi-additive.
EX: Average daily balance
A fact table without numeric fact columns is called factless fact table.
Ex: Promotion Facts
While maintain the promotion values of the transaction (ex: product samples) because this table doesn’t contain any measures.
Q.Why build a Data Warehouse?
A.Data Warehouse can turn your company's data into timely information, facilitate intelligent decision making and give your organization a competitive edge.
A good Data Warehouse can:
1.Allow companies to build connections between the various disconnected islands of information
2.Provide one-stop shopping for all decision support and ad-hoc reporting needs
3.Provide a business view that simplifies end-user data access (masks underlying complexity of operational database structures)
4.Offload reporting and decision support processing from operational systems and onto the Data Warehouse (improved performance and reduced contention with OLTP activities)
5.Make end-users more self-sufficient by distributing the reporting and decision support function throughout the organization
6.Free-up I.T. resources for other (more complicated) development activities including building the Data Warehouse.
7.To achieve greater performance by separating the Decision Support and OLAP activities from the day to day OLTP activities. For instance, the star schema design with integer surrogate (synthetic) keys is faster than a third normal form relational structure.
8.To integrate data from different source systems by standardizing and transforming the data to common dimensions and measures. The banking industry, in particular, has seen considerable consolidation over the past decade.
9.To allow greater user interaction (self service) by providing the data in structures and terminology understandable by the user community.  This includes the ability to access the data using off the shelf query, statistical, predictive, and other OLAP softwares depending on the users' skills and responsibilities.
Information is power: providing your decision makers with access to complete, timely and reliable information will equip them to make better and more informed decisions.
Q.Why do many Data Warehouses fail?
A.There are many reasons why Data Warehouse efforts fail. Here are some of the most common reasons:
1.Lack of Business Sponsorship (Buy-in):The "If you build it they will come" scenario doesn't usually work.
2.Tackling Too Much Too Soon :By the time the Data Warehousing concept gains wide acceptance (and budget approval), ideas start flowing and everybody wants their data right away. This usually leads to over-committment, missed deadlines and cost over-runs.
3.Lack of DW expertise :Building a Data Warehouse requires an entirely different set of design and development expertise compared to building an OLTP application.
4.Poor DW design :Lack of DW expertise usually leads to poor initial DW database design - novice DW development teams tend to create inflexible DW models that are not condusive to change and/or the evolution of the Data Warehouse.
5.Lack of data integrity :Due to the complexity of managing and integrating multiple sets of source data, some DW efforts wind up with incorrect and/or duplicate data.
6.Missed deadlines and cost over-runs :Novice DW development teams tend to lack the expertise required to accurately estimate DW efforts, they also tend to lack the problem-solving bag of tricks required to keep the DW project on track. This usually results in missed deadlines, cost over-runs and erosion of end user confidence in the DW team.
7.Lack of timely delivery of new information : Once the initial phase of the Data Warehouse has been rolled out, and assuming that it was successful, demand for additional information (new subject areas) grows exponentially. This causes DW teams to delay and/or turn down requests, which results in disappointed end-users and the dreaded "too little, too late" scenario.
8.End-user analysis and reporting tools are too complex :Typical end-user Business Intelligence tools are overly complex for the infrequent and/or non-technically savy end-user. A large segment of the end-user community often reverts back to the operational reports and/or delegates reporting requests back to I.T. This unfortunately erodes one of the main benfits of building a Data Warehouse, which is empowering end-users and making them self-sufficient when it comes to decision support.
Q.How do I ensure the success of my Data Warehouse?
A.
1.Secure a strong Business Sponsor : A strong business sponsor will help secure funding and will help synchronize the data warehouse evolution with future business requirement changes.
2.Under-commit and over-deliver for your first phase: The Data Warehouse team's credibility is at stake (especially during the initial phases of the data warehouse). DW teams need to give themselves sufficient elbow room... in order to make sure that the initial phases are delivered on time, on budget, and that data integrity id paramount.
3.Supplement your team with seasoned DW experts:For the initial phases of the data warehouse, and until the development team is fully versed in the DW Design and Development activities, it is highly recommended that the internal design/development team is supplemented with 1 or 2 DW experts. These experts will bring proven DW design methodologies to the table, along with a substantial problem-solving bag of tricks, and will expedite the learning curve for the entire internal DW design/development team.
4.Design your DW for growth and constant change :The DW Database design is the foundation layer, on which future phases of the data warehouse are based - need to make sure that the foundation layer is flexible and robust enough to accomdate future DW changes and the addition of new subject areas without having to undo or redo previously built DW components.
5.Invest in a good ETL tool, it should pay for itself in the first phase of your DW project: Custom coding DW interfaces from scratch is both in-efficient and time consuming. ETL tools come pre-built with all the functionality one would need to develop even the more complex interfaces. Typical ETL tools will yield at a minimum 50% productivity improvement (i.e. at a minimum). you will be able to develop twice as many interfaces in the same timeframe. Not all ETL Tools are expensive, there are several full-featured second generation ETL tools on the market (like the Power*Loader) that sell for under $50,000.
6.Data integrity is paramount, since Data Integrity = End User Confidence = DW Credibility: Need to audit the integrity of the Data Warehouse on an on-going basis, to ensure that any data integrity issues are identified, communicated to the end-user and rectified in a timely fashion. Lack of end-user confidence in the underlying data will usually spell the end of their use of the Data Warehouse, as they will revert back to their operational reports and their source systems for more accurate and reliable data.
7.Promote the constant growth of the data warehouse: a static Data Warehouse is a dead Data Warehouse :A data warehouse that does not keep up with the evolution of business and data requirements will be obsolete very quickly. Plan on at least 2 or 3 data warehouse releases in the course of the year - this will keep the information fresh, should enable you to meet end-user demand for new information in a timely fashion, and should keep the DW users interest in the Data Warehouse high.
8.Involve the End-Users from start to finish : Business Users (especially power-users) are the Data Warehouse team's greatest allies: they will help direct the DW evolution, they will help you promote the use of the Data Warehouse, and they will be the first line support much of the data and reporting questions.
 
Q.What is Architecture?
A."Often, architecture and infrastructure are used interchangeably. There’s a subtle difference. Infrastructure consists of applications, hardware, and software. Architecture, on the other hand, refers to the overlying principles and processes that lead the organization’s infrastructure deployment.
Infrastructure is a snapshot, while architecture is a continuously evolving set of ideas and philosophies. Architecture includes infrastructure, but the two aren’t synonymous."
Q.What is Data Architecture?
A.Data architecture is "the blueprint of the data within your company."  It "includes things like enterprisewide data models, the meta-data catalog, and notions of data ownership."  Data infrastructure, on the other hand, covers the physical data structures and data transport mechanisms.
Q.What is EAI Architecture?
A.Enterprise Application Integration (EAI) typically divides the world into "point-to-point" or "hub-and-spoke".Organizations often have developed piecemail integrations incrementally from one application to another, that is, so-called "point-to-point" application integrations.  These incremental integrations span a variety of tools, and may be occasionally conflicting, since the integrations and data aren't viewed as a coherent, shared resource.
Forward-looking companies work toward a "hub-and-spoke" or enterprise approach.  Data sources and owners are defined, integrations are centrally controlled at the hub, and target systems are registered.  This probably sounds, and rightly so, very similar to data warehousing Extraction-Transformation-Load (ETL) architecture -- especially since a datamart often molds diverse data into a single "version of the truth."Suppose the company acquires a new business.The new business is just one more spoke on the company hub.  Data from the new business should flow (once the spoke is implemented) into the hub and be published as necessary through the enterprise.  For example, sales data from the new business should flow into the existing company sales datamart.
Q.What is ETL Architecture?
A.ETL architecture connects three areas:  source, staging, and target.  Sources are typically one or more OLTP systems. Staging is a common holding and work area.  Targets are typically a data warehouse or mart."The source to stage component is intended to focus the efforts of reading the source data (sourcing) and replicating the data to the staging area.""The stage to warehouse component focuses the effort of standardizing and centralizing the data from the source systems into a single view of the organization’s information. This centralized target can be a data warehouse, data mart, operational data store, customer list store, reporting database or any other reporting/data environment."The Best Practice is "to consolidate the business rules into the stage to warehouse ETL component".  This Best Practice can tie ETL architecture to EAI architecture.
Q.What is ODS?
A.ODS means Operational Data Store..A collection of operation or bases data that is extracted from operation databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as there operation databases.

No comments:

Post a Comment