These
tips have been developed over the course of my career in data
warehousing. They span information and lessons learned from multiple
clients across multiple industries. New tips as they are developed will
be added so be sure to check back often for new tips!
Never underestimate the complexity of a source system
Many data warehousing projects never fully appreciate the complexity of a source system until it is too late, which forces architecture changes late in development or data quality issues found after deployment. Often times ETL developers are pointed towards some source system tables and given no data dictionaries, no time to do substantial data analysis including data profiling and no SME from the source system owner. This is becoming increasingly common in larger enterprise organizations where distinct data warehousing groups exist seperated from the source system applications.
Obtain proper business requirements to guide development of data warehouse
Very rarely will building a data warehouse without proper requirements be successfull. However it is commonly done, especially in situations where IT feels it cannot wait for business requirements or that IT feels the business is going to want everything so lets just bring everything into the data warehouse. The end result is that the data warehouse team is tasked with too large of a project with no real clear direction or end. Issues also result from not knowing how the business wants to use the data which often leads to issues surrounding data quality, grain, implied relationships and calculations/derivations. Proper requirements are mandatory in every other IT project don't make them optional in building your data warehouse.
The data warehouse can't do all of the integration
If source systems put little to no effort into integratin then the job to further integrate data in the data warehouse is very difficult. The integration between multiple source systems at the data layer often makes it extremely difficult or impossible for the data warehouse to complete the integration process. Often times a tight integration at the application layer exists but at the data layer many of the details behind that integration are lost. Source systems are usually concerned with getting the data they need right now and are not concerned with how a data warehousing application might need the data in the future. The better that source system integration is the better data quality will be in the data warehouse and one of the primary costs of building a data warehouse which is data integration goes down signficantly.
Never underestimate the complexity of a source system
Many data warehousing projects never fully appreciate the complexity of a source system until it is too late, which forces architecture changes late in development or data quality issues found after deployment. Often times ETL developers are pointed towards some source system tables and given no data dictionaries, no time to do substantial data analysis including data profiling and no SME from the source system owner. This is becoming increasingly common in larger enterprise organizations where distinct data warehousing groups exist seperated from the source system applications.
Obtain proper business requirements to guide development of data warehouse
Very rarely will building a data warehouse without proper requirements be successfull. However it is commonly done, especially in situations where IT feels it cannot wait for business requirements or that IT feels the business is going to want everything so lets just bring everything into the data warehouse. The end result is that the data warehouse team is tasked with too large of a project with no real clear direction or end. Issues also result from not knowing how the business wants to use the data which often leads to issues surrounding data quality, grain, implied relationships and calculations/derivations. Proper requirements are mandatory in every other IT project don't make them optional in building your data warehouse.
The data warehouse can't do all of the integration
If source systems put little to no effort into integratin then the job to further integrate data in the data warehouse is very difficult. The integration between multiple source systems at the data layer often makes it extremely difficult or impossible for the data warehouse to complete the integration process. Often times a tight integration at the application layer exists but at the data layer many of the details behind that integration are lost. Source systems are usually concerned with getting the data they need right now and are not concerned with how a data warehousing application might need the data in the future. The better that source system integration is the better data quality will be in the data warehouse and one of the primary costs of building a data warehouse which is data integration goes down signficantly.
No comments:
Post a Comment