DBT
                        dbt, data build tool helps to transform data. dbt performs the T in ELT process
What is dbt
- dbt performs the T in ELT (Extract, Load, Transform) process
 
- dbt is extremely good at transforming data that’s already loaded into your warehouse
 
- dbt enables analytics engineers to transform data in their warehouses by simply writing select statements
 
What it is not
- dbt does not extract or load data from other systems or databases
 
Why Data Transformation
Data from source systems needs to be transformed for the purpose of:
- Data Validation & Cleansing
 - Data Standardization & Enrichment
 - Data Integration between systems
 - Apply Business Logic & Rules
 - Derive Business KPI
 - Modeling suited for Analytics
 
dbt in Data Platform Landscape

Features of dbt
Software Engineering best practices can also be applied to Data Engineering Pipelines-
- Version Control - Analytic code should be version controlled. Analysis changes as data and businesses evolve, and it’s important to know who changed what, when
 
- Quality Assurance - Bad data can lead to bad analyses, and bad analyses can lead to bad decisions. Any code that generates data or analysis should be reviewed and tested
 
- Documentation - Data analysis should come with a basic description of how it should be interpreted, and the team should be able to add to that documentation as additional questions arise
 
- Modularity - Data analysis code should also follow the principles of Reusability, Manageability & DRY
 
How dbt works
- Main focus will be on writing models (i.e. SELECT only queries) that reflect core business logic
 
- There’s no need to write boilerplate code to create tables and views, or to define the order of execution of your models
 
- Instead, dbt handles turning these models into objects in the warehouse
 
- Generates the DDL required to build the model, as per the model's materialization
 
- Executes the compiled queries against your data warehouse
 
- Maintain model dependencies
 
Data Platform without DBT

DBT Based Flow

DBT Overview
dbt Projects
- A dbt project is a directory of .sql and .yml files
 
- The directory must contain at a minimum:
 
- Project file: a dbt_project.yml file which configures and defines your dbt project
 
- Models: A model is a single .sql file. Each model contains a single select statement that either transforms raw data into a dataset that is ready for analytics, or, more often, is an intermediate step in such a transformation
 - Projects typically contain a few other resources as well, including snapshots, tests, and seed files
 
dbt Adapters
- dbt connects to and runs SQL against your database, warehouse, platform, or query engine. It works by using a dedicated adapter for each technology
 
- You will need a data warehouse with source data already loaded in it to use dbt
 
- dbt natively supports connections to Snowflake, BigQuery, Redshift and Postgres data warehouses
 - When you define your connection, you’ll also be able to specify the target schema where dbt should create your models as tables and views
 
dbt Sources
- Sources make it possible to name and describe the data loaded into your warehouse by your Extract and Load tools
 
- Select from source tables in your models using the source() function. It helps to define the lineage of your data
 
- Test your assumptions about your source data
 
- Calculate the freshness of your source data
 
dbt Models
- A model is a select statement.
 
- Models are defined in .sql files (typically in your models directory)
 
- Each .sql file must contain only one model
 
- The name of the file is used as the model name
 
- By default dbt will create the models as views
 
- Build models in a target schema you define
 - A model can be referenced in downstream models using the ref() function. It helps to define the lineage of your data
 
dbt Materializations
- dbt handles boilerplate code to materialize queries as relations. For each model you create, you can easily configure a materialization
 
- A materialization represents a build strategy for your select query – the code behind a materialization is robust, boilerplate SQL that wraps your select query in a statement to create a new, or update an existing, relation
 
- dbt ships with the following built-in materializations:
 
- view (default): The model is built as a view in the database
 
- table: The model is built as a table in the database
 - ephemeral: The model is not directly built in the database but is instead pulled into dependent models as common table expressions (CTEs)
 - incremental: The model is initially built as a table, and in subsequent runs, dbt inserts new rows and updates changed rows in the table
 
Advantages of dbt
- A well known and widely used Technology stack in modern Data Engineering space
 
- Quicker development, testing & documentation + less overhead on infrastructure management
 
- Less time spent troubleshooting data pipelines
 
- Compliments & co-exists with Airflow & Snowflake
 
- Ease to use with great documentation leads to low learning curve
 
- Data Transformation, Data Lineage & Data Documentation, Data Searching are the strong points which also covers the Data Governance aspects
 
- Many engineers are familiar with SQL & dbt hence onboarding new team members to data platform is fast
 
- Migration of Target Data warehouse will have minimal effect on existing data pipelines
 
dbt Cloud Features
- A web-based IDE to develop dbt projects
 
- A purpose-built Job scheduler
 
- Share dbt documentation with your team
 
- VCS Integration
 
- Logging & Alerting
 
- Single Sign On (SSO)
 
- Role-based ACLs
 - Audit logging
 
dbt Cloud vs Open Source
- Provide a place for data analysts to complete a request in a few hours
 
- Without getting caught up in the intricacies of infrastructure (CLI, k8s cluster access)
 
- CI is self-serve for less technical users. All in git / hosted over internet so can easily access
 
- Notifications - hosted logs in runs accessible to users. Not locked in k8s cluster
 
- Access for everyone - giving access to business users to reduce bottlenecks → increase change velocity
 
- without the overhead of manual coordination and infrastructure management
 
In-House Tooling vs. dbt Cloud

Reference:
- Business Case Guide & dbt Cloud TCO Calculator:
https://www.getdbt.com/resources/making-the-case-for-dbt-cloud