Добавить в избранное   Сделать стартовой   Главная   E-mail   Форум   Мой блог 
   
Cертификации

Errors

ETL

FAQ (по темам)

GIS

Web

wiki

Администрирование

Безопасность

Книги
Oracle, ...

Новости

ОС

Программирование

Проектирование БД

Производительность

Скачать

Советы

Тестирование

Установка

FAQ - по базам данных
FAQ - по базам данных
Установка СУБД
Oracle
Sybase
MySQL
PostgreSQL
MS SQL Server
Interbase, Firebird
Другие DB
Администрирование
Oracle
MySQL
Sybase
PostgreSQL
MS SQL Server
Interbase, Firebird
IBM DB2
Другие DB
Проектирование БД
Статьи
ETL
Теория БД
ErWin
Designer 2000
PowerDesigner
Хранилища данных
CASE средства
OLAP
Бизнес - анализ (BI)
Производительность
Oracle
MSSQL
Interbase, Firebird
IBM DB2
MySQL
PostgreSQL
SYBASE
Безопасность БД
Oracle
MS SQL Server
Инъекция SQL
Программирование
Transact-SQL
PL/SQL
C++
XML
SQL
PostgreSQL
MDX
Java
VBA Excel
Книги по базам
Oracle
Заказ книг
ОС
Установка и настройка
UBUNTU
ОС
Установка и настройка
UBUNTU
FAQ
FAQ - по базам данных
Главная arrow ETL arrow Generate surrogate key (ETL)

Generate surrogate key (ETL)

Печать E-mail

Generate surrogate key (ETL)

Goal

Fill in a data warehouse dimension table with data which comes from different source systems and assign a unique record identifier (surrogate key) to each record.

Scenario overview and details

To illustrate this example, we will use two made up sources of information to provide data about customers dimension. Each extract contains customer records with a business key (natural key) assigned to it.


In order to isolate the data warehouse from source systems, we will introduce a technical surrogate key instead of re-using the source system's natural (business) key.

A unique and common surrogate key is a one-field numeric key which is shorter, easier to maintain and understand, and independent from changes in source system than using a business key. Also, if a surrogate key generation process is implemented correctly, adding a new source system to the data warehouse processing will not require major efforts.

 

Surrogate key generation mechanism may vary depending on the requirements, however the inputs and outputs usually fit into the design shown below:
Inputs:
- an input respresented by an extract from the source system
- datawarehouse table reference for identifying the existing records
- maximum key lookup

Outputs:
- output table or file with newly assigned surrogate keys
- new maximum key
- updated reference table with new record

 

Proposed solution

Assumptions:
- The surrogate key field for our made up example is WH_CUST_NO.
- To make the example clearer, we will use SCD 1 to handle changing dimensions. This means that new records overwrite the existing data.
The ETL process implementation requires several inputs and outputs.
Input data:
- customers_extract.csv - first source system extract
- customers2.txt - second source system extract
- CUST_REF - a lookup table which contains mapping between natural keys and surrogate keys
- MAX_KEY - a sequence number which represents last key assignment

Output data:
- D_CUSTOMER - table with new records and correctly associated surrogate keys
- CUST_REF - new mappings added
- MAX_KEY sequence increased


The design of an ETL process for generating surrogate keys will be as follows:

 

ETL process to load extract and assign surrogate keys

The following spoon Transformation loads the customers extract into a data warehouse and assigns a surrogate (technical) key to all records. As a final step the transform updates the maximum key dictionary table.

ETL process design

The process has been designed in Pentaho Data Integration - Kettle Spoon 3.02.

There are probably many other ways to achieve the same goal, however our intention is also to show how various Spoon components work. For instance, most of the objects might be replaced by a Java Script component to do the calculations and descrease the number of objects.


Inputs:
- CSV input textfile, an extract from the source system
- Datawarehouse table (WH_PARAMS) reference for lookup of the maximum surrogate key

Outputs:
- Output table with newly loaded extract and assigned surrogate keys (D_CUSTOMER)
- New maximum key which updates the reference table (WH_PARAMS)

Please refer to the screenshot below for the transform design:

Surrogate key generation transform design in Kettle Spoon:

Surrogate key generation transform design in Kettle Spoon

 

Step by step ETL process flow

Below the description of all ETL process steps used in the transformation with the specification of used Spoon objects:

 
 
« Пред.   След. »
Взаимосвязанные статьи
     

Последние добавленные статьи
Поиск
Ссылки
Главная
Скачать
Курсы
Роль АБД (SYSDBA)
Карта сайта
Автостекла
Контакты
Войти на сайт
Популярные статьи
Online - тесты
1Z0-042
Rambler's Top100 МЕТА - Украина. Рейтинг сайтов хостинг от freehost.com.ua

Все права защищены.SYSDBA 2010 | Если у Вас есть хороший материал пришлите его нам.