Слайд 1Introduction to SQL Server 2008 Integration Services
Слайд 2Disclaimer
The information contained in this slide deck represents the current
view of Microsoft Corporation on the issues discussed as of
the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.
© 2008 Microsoft Corporation. All rights reserved.
Microsoft, SQL Server, Office System, Visual Studio, SharePoint Server, Office PerformancePoint Server, .NET Framework, ProClarity Desktop Professional are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Слайд 3Where Are We?
Data Warehouse
Client Access
Слайд 4Agenda
Reviewing ETL Fundamentals
Introducing SSIS Components
Designing ETL Packages
Slowly Changing Dimension Transformation
Presentation
title goes in here
Слайд 5Module Overview
Reviewing ETL Fundamentals
Introducing SSIS Components
Designing ETL Packages
Slowly Changing Dimension
Transformation
Слайд 6Reviewing ETL Fundamentals
Data Integration
Business Scenarios
SQL Server 2008 Data Integration
Populating Dimension
Tables
Populating Fact Tables
Слайд 7Data Integration
The ability to transform corporate data into meaningful and
actionable information
Challenges
Retrieve and merge data from multiple sources
Cleanse and transform
the data
Load the data into appropriate data stores for analysis and reporting
Enterprises spend 60%–80% of their resources in the data integration stage
Слайд 8Business Scenarios
Populating data warehouses (ETL)
Cleaning and standardizing data
Merging data from
heterogeneous data stores
Automating administrative tasks
Слайд 9SQL Server 2008 Integration Services
Satisfies traditional demands of ETL operations
and general-purpose data integration
Provides a robust, flexible, fast, scalable, and
extensible architecture
Challenges traditional ETL design approaches
Слайд 10Populating Dimension Tables
Y
Insert new
record
Update changed
column(s)
Expire existing
record
Transform
Correlate
records
N
N
Y
Type 2
change?
Y
Type 1
change?
New record?
Слайд 11Populating Fact Tables
Y
Insert new
record
Insert new
dimension record
Lookup
dimension key
N
Lookup failed?
Repeat for each
dimension key
Transform
Слайд 12Introducing SSIS Components
Architecture
The Package
Connection Manager
Control Flow
Data Flow
Слайд 13DEMO
Examining the SSIS build environment
Слайд 14Architecture
SQL Server Integration Services
(SSIS) service
SSIS object model
Two distinct runtime engines:
Control
flow
Data flow
32-bit and 64-bit editions
Слайд 15The Package
The basic unit of work, deployment, and execution
An organized
collection of:
Connection managers
Control flow components
Data flow components
Variables
Event handlers
Configurations
Can be designed
graphically or built programmatically
Saved in XML format to the file system or SQL Server
Слайд 16Connection Manager
Logical representation of a connection
Stored in the package and
cannot be shared between
packages
Used by package elements
Does not need
to connect
to SQL Server
Слайд 18Control Flow
Control flow is a process-oriented workflow engine
A package contains
a single control flow
Control flow elements
Containers
Tasks
Precedence constraints
Variables
Слайд 19Containers
Provide structure and services for
Grouping tasks
Implementing repeating flows
Execute in sequence
defined by precedence constraints in control flow
Manage variable and transactional
boundaries
Слайд 20Data Flow Task
Encapsulates the data flow engine
Extract
Transform
Load
Слайд 22Scripting Tasks
Use a Script Task to develop scripts that are
precompiled for better performance
Develop VB.NET or C# .NET scripts in
Visual Studio for Applications (VSTA) using these features
IntelliSense
Color coding
Integrated Help
References:
.NET Framework libraries
Managed assemblies
COM objects
Debugging
Consider a Custom Task to reuse functionality between packages
Слайд 23Precedence Constraints
Link containers and tasks (executables) to control the order
of execution
Configure conditions that determine whether the executable runs
Success, Failure,
or Completion Constraints
Expressions
Logical AND/OR for
multiple constraints
Слайд 24Variables
Variables customize package behavior by changing expression values or object
properties
System variables store values collected while the package runs
User variables
store
values assigned
By default in the
current package
By a variable
passed by a parent package
By an expression
All variables use
case-sensitive names
Variables can be scoped
at package, container, or task level
Слайд 25DEMO
Developing a Query-Driven E-Mail Delivery System
Слайд 26Data Flow
The Data Flow Task
Encapsulates the data flow engine
Exists in
the context of an overall control flow
Performs traditional ETL in
addition to other extended scenarios
Is fast and scalable
Data Flow Components
Extract data from Sources
Load data into Destinations
Modify data with Transformations
Service Paths
Connect data flow components
Create the pipeline
Слайд 27Data Flow Sources
Sources extract data from
Relational tables and views
Files
Analysis Services
databases
Слайд 28Data Flow Destinations
Destinations load data to
Relational tables and views
Files
Analysis Services
databases and objects
DataReaders and Recordsets
Слайд 29Row Transformations
Update column values or create new columns
Transform each row
in the pipeline input
Слайд 30Rowset Transformations
Create new rowsets that can include
Aggregated values
Sorted values
Sample rowsets
Pivoted
or unpivoted rowsets
Are also called asynchronous components
Mar-2008
Microsoft Developer & Platform
Evangelism
Слайд 31Split and Join Transformations
Distribute rows to different outputs
Create copies of
the transformation inputs
Join multiple inputs into one output
Perform lookup operations
Слайд 32DEMO 2
Loading and Integrating Data from Multiple Sources
Слайд 33Designing ETL Packages
Orchestrating the ETL Process
Populating Dimension Tables
Populating Fact Tables
Слайд 34Orchestrating the ETL Process
A single package performs ETL for a
single dimension or fact table
A “master package” orchestrates the sequence
of the entire ETL process by using Execute Package tasks
Execute all dimension packages first
Execute all fact table packages second
A robust design includes extensive logging to store execution history and audit trails
Implement package logging to capture most execution details
Customize and extend logging by using
control flow tasks
Слайд 35Populating Dimension Tables
Y
Insert new
record
Update changed
column(s)
Expire existing
record
Transform
Correlate
records
N
N
Y
Type 2
change?
Y
Type 1
change?
New record?
Most dimension
packages benefit from this transformation
Слайд 36Populating Fact Tables
Y
Insert new
record
Insert new
dimension record
Lookup
dimension key
N
Lookup failed?
Process each dimension
key
Transform
The Lookup transformation is ideal for dimension key lookups and
can be configured to manage lookup failures
Слайд 37Slowly Changing Dimension Transformation
The wizard-based configuration promotes rapid ETL development
Supports
Type
0 (Fixed Attribute)
Type 1 (Changing Attribute)
Type 2 (Historical Attribute)
Inferred member
management
Automatically constructs the downstream data flow
Handles the majority of slowly changing dimension scenarios
Слайд 38LastName update to Valdez-Smythe
SCD Type 1
Existing record is updated
History is
not preserved
Слайд 39SCD Type 2
Existing record is ‘expired’ and new record inserted
History
is preserved
Most common form of Slowly Changing Dimension
SalesTerritoryKey update to
10
Слайд 40ISV Innovation
ISV Innovation will now be the headquarters for all
U.S. ISV training. We’re dedicated to providing comprehensive training for
the entire developer community so be sure to make www.isvinnovation.com your first stop for all your training needs.
This site is filled with fantastic new resources, features, and functionality. We plan to make this site a one-stop shop for all your ISV needs so if you have any feedback please contact us as usisv@microsoft.com.