ETL technology assimilates data, mostly through batch processing, from source systems within the enterprise and turns it into integrated, consistent data suitable for consumption by downstream decision support target systems. Source and target systems are usually databases and files, but they can also be other types of data store, such as a message queue. In more recent times ETL systems have also been utilised to migrate data from old or legacy solutions to new applications.
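As a rough illustration of the batch flow described above, the following Python sketch extracts records from two hypothetical source systems (a CSV feed and a list of dicts standing in for another application), maps them onto one consistent layout and loads them into a target table. The names `CRM_CSV`, `BILLING_ROWS` and `customer` are illustrative assumptions, and an in-memory SQLite database stands in for the decision support target.

```python
import csv
import io
import sqlite3

# Two hypothetical source systems that describe customers differently.
CRM_CSV = "id,name,country\n1, Alice ,uk\n2,Bob,US\n"
BILLING_ROWS = [{"cust_no": 3, "full_name": "Carol", "cc": "de"}]


def extract():
    """Extract: read the raw records from each source in one batch."""
    crm = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm, BILLING_ROWS


def transform(crm, billing):
    """Transform: map both source layouts onto one consistent target layout."""
    out = []
    for r in crm:
        out.append((int(r["id"]), r["name"].strip(), r["country"].strip().upper()))
    for r in billing:
        out.append((r["cust_no"], r["full_name"].strip(), r["cc"].strip().upper()))
    return out


def load(conn, rows):
    """Load: write the integrated rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


def run_batch(conn):
    """One scheduled batch run: extract, transform, load."""
    crm, billing = extract()
    load(conn, transform(crm, billing))


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    run_batch(target)
    print(target.execute("SELECT * FROM customer ORDER BY id").fetchall())
```

The point of the transform step is that downstream queries see one layout and one set of codes (here, upper-case country codes) regardless of which source a row came from.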
The traditional target for an ETL system is a database such as a data warehouse, data mart or operational data store. ETL systems integrate data between your operational systems and your decision support systems. Data can be extracted in schedule-driven pull mode or event-driven push mode. Pull mode supports data consolidation and is typically done in batch; push mode works online, propagating data changes to the target data store.
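The difference between the two extraction modes can be sketched in Python as follows. This is a minimal, single-process sketch under assumed conditions: the hypothetical `Source` class tags every row with an increasing version number, pull mode uses that number as a watermark to fetch only rows changed since its previous scheduled run, and push mode registers a callback that applies each change to the target immediately. Plain dicts stand in for real data stores.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical operational store; each write bumps a version counter."""
    rows: dict = field(default_factory=dict)         # key -> (value, version)
    version: int = 0
    subscribers: list = field(default_factory=list)  # push-mode callbacks

    def write(self, key, value):
        self.version += 1
        self.rows[key] = (value, self.version)
        # Push mode: notify every listener as soon as the change happens.
        for callback in self.subscribers:
            callback(key, value)


class PullLoader:
    """Schedule-driven pull: each run extracts everything changed since the last run."""

    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.watermark = 0  # highest version already copied

    def run_once(self):
        # Would be invoked by a scheduler (e.g. nightly) in a real system.
        changed = {
            k: v for k, (v, ver) in self.source.rows.items() if ver > self.watermark
        }
        self.target.update(changed)  # consolidate the whole batch at once
        self.watermark = self.source.version
        return len(changed)


def attach_push(source, target):
    """Event-driven push: propagate each change to the target immediately."""
    def on_change(key, value):
        target[key] = value
    source.subscribers.append(on_change)
```

Between scheduled runs the pull target lags behind the source, while the push target stays current; the trade-off is that push mode does some work on every individual change.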
Data transformation may involve restructuring and reconciling data records, cleansing data content and/or aggregating data content. Data loading may either completely refresh a target data store or update the existing target in place. Interfaces used here include de facto standards such as ODBC, JDBC and JMS, as well as native database and application interfaces. Early ETL solutions ran batch jobs at scheduled intervals to capture data from flat files and relational databases and consolidate it into a data warehouse database managed by a relational DBMS.
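The transformation and loading options above might look like this in Python. The sketch is illustrative only: the `RAW` extract, the `sales_by_region` table and all function names are hypothetical. `cleanse` normalises region codes, drops duplicate orders and discards unparseable amounts; `aggregate` totals amounts per region; the two load functions contrast a complete refresh with an in-place update. Python's `sqlite3` module, which follows the DB-API 2.0 interface (PEP 249), plays roughly the role that ODBC or JDBC would play in other environments.

```python
import sqlite3
from collections import defaultdict

# Hypothetical source extract: inconsistent codes, a duplicate and a bad value.
RAW = [
    {"order": "A1", "region": "north", "amount": "10.50"},
    {"order": "A2", "region": "NORTH ", "amount": "4.50"},
    {"order": "A2", "region": "North", "amount": "4.50"},  # duplicate of A2
    {"order": "A3", "region": "south", "amount": "bad"},   # fails cleansing
]


def cleanse(rows):
    """Cleansing and reconciliation: normalise codes, drop duplicates and bad values."""
    seen, clean = set(), []
    for r in rows:
        if r["order"] in seen:
            continue
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real job would route this row to an error table
        seen.add(r["order"])
        clean.append((r["region"].strip().lower(), amount))
    return clean


def aggregate(rows):
    """Aggregation: total amount per region, sorted by region."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return sorted(totals.items())


def create_target(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )


def load_full_refresh(conn, rows):
    """Complete refresh: replace the entire contents of the target table."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales_by_region")
        conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)


def load_incremental(conn, rows):
    """Update in place: insert new keys, overwrite existing ones, keep the rest."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", rows)
```

A full refresh is simpler to reason about but rewrites the whole table on every run; the incremental load touches only the affected keys. `INSERT OR REPLACE` is SQLite syntax; other databases provide their own equivalents, such as `MERGE`.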
Over recent years, commercial ETL vendors have made a wide range of improvements and extensions to their products, covering both design and operational functions, such as:
DESIGN
Additional sources (e.g. legacy data, application packages, XML files, Web logs, EAI sources, Web services and unstructured data), additional targets (e.g. EAI targets and Web services) and improved data transformation (e.g. user-defined exits, data profiling and data quality management, support for standard programming languages, DBMS engine exploitation and Web services).
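Two of the design features mentioned, data profiling and user-defined exits, can be sketched in Python as follows. This assumes a simple model in which an exit is any function that receives a record and returns it (possibly modified) or `None` to reject it; commercial tools expose exits through their own product-specific mechanisms. `profile`, `run_transform`, `require_email` and `mask_email` are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
# Assumed exit contract: return the (possibly modified) record, or None to reject it.
Exit = Callable[[Record], Optional[Record]]


def profile(records: Iterable[Record], column: str) -> Dict[str, object]:
    """Data profiling: simple statistics for one column."""
    values = [r.get(column) for r in records]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top": Counter(non_null).most_common(1),
    }


def run_transform(records: Iterable[Record], exits: List[Exit]) -> List[Record]:
    """Pass each record through the user-defined exits in order."""
    out = []
    for rec in records:
        current: Optional[Record] = dict(rec)  # copy so the source row is untouched
        for hook in exits:
            current = hook(current)
            if current is None:
                break  # an exit rejected the record
        if current is not None:
            out.append(current)
    return out


# Hypothetical exits a developer might plug in.
def require_email(rec: Record) -> Optional[Record]:
    """Data-quality rule: reject records without an email address."""
    return rec if rec.get("email") else None


def mask_email(rec: Record) -> Record:
    """Custom logic: mask the local part of the address."""
    user, _, domain = str(rec["email"]).partition("@")
    rec["email"] = user[:1] + "***@" + domain
    return rec
```

Profiling would typically run before the exits are designed, to discover how many values are missing or inconsistent; the order of exits matters, since `mask_email` assumes `require_email` has already filtered out empty addresses.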
OPERATIONAL
Better administration (e.g. job scheduling and tracking, metadata management, error recovery), better performance (e.g. parallel processing, load balancing, caching, support for native DBMS application and data load interfaces), improved usability (e.g. better visual development interfaces) and support for a data federation approach to data integration.
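A minimal sketch of three of these operational features (parallel processing, job tracking and error recovery) using only Python's standard library. The assumptions are that a job is split into independent partitions, that a failed partition can be safely re-run, and that a per-partition status dictionary is enough tracking for illustration; `with_retry` and `run_job` are hypothetical names, not any product's API.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("etl")


def with_retry(task, partition, attempts=3, delay=0.1):
    """Error recovery: re-run a failed partition a few times before giving up."""
    for attempt in range(1, attempts + 1):
        try:
            return task(partition)
        except Exception as exc:  # a real job would catch narrower error types
            log.warning("partition %s attempt %d failed: %s", partition, attempt, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)


def run_job(task, partitions, workers=4):
    """Parallel processing with simple job tracking (one status per partition)."""
    status = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(with_retry, task, p): p for p in partitions}
        for fut in as_completed(futures):
            part = futures[fut]
            try:
                status[part] = ("ok", fut.result())
            except Exception as exc:
                status[part] = ("failed", str(exc))
    return status
```

Threads suit I/O-bound work such as database loads; CPU-heavy transformations would more likely use `ProcessPoolExecutor`. Retrying is only safe when re-running a partition cannot duplicate data, so the task should be idempotent.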
These enhancements can extend the use of ETL systems beyond just consolidating data for data warehousing and legacy application migration to a wide range of other enterprise data integration projects.