Build a Data Platform from scratch (Airflow 3.0, Databricks, Kubernetes)
Video Description
💻 source code: https://github.com/jayzern/databricks-airflow3.0-template

Build and deploy a Data Platform from scratch using cutting-edge tools like Airflow 3.0, Databricks, and Kubernetes. This year has been a crazy year for AI, but I wanted to go back to the fundamentals of creating a full-stack data platform that can support thousands of data scientists and engineers. Whether or not you're new to data engineering, this video will walk you through the basics of a data platform, Kubernetes deployment, and much more. Demand for data engineers will only increase as AI models make us more productive over time.

Key Features
1. Data-aware orchestration using Data Assets
2. Declarative transformations on top of Databricks notebooks
3. Incremental upsert loading using Delta Lake
4. Local Kubernetes cluster running on Kind
5. Data quality checks using the DQX library
6. Git-sync sidecar to mount `/dag/` folders in your Airflow deployment
7. Local Persistent Volume and Persistent Volume Claim for logging
8. Installing Airflow using Helm charts
9. CI/CD using GitHub Actions
10. AWS ECR as the container repository

Timestamps
0:00 intro
2:24 architecture overview
7:55 databricks setup
16:52 create a spark job
28:16 data quality check using DQX
34:56 declarative transformations with spark
43:45 incremental upsert using delta lake
51:00 basic spark optimization
54:38 one big table denormalization
1:01:38 dashboard
1:06:11 transformation using databricks workflows
1:09:04 save your work on github
1:11:08 airflow 3.0 + kubernetes overview
1:15:13 intro to kubectl
1:19:19 install airflow using helm charts
1:32:09 example dag
1:36:33 ingestion for posts and users
1:37:27 data aware orchestration
1:49:44 external volume mount
1:52:24 databricks workflow trigger operator
2:02:21 KubernetesExecutor + git-sync sidecar container
2:12:42 persistent volume / persistent volume claim for logs
2:21:38 deployment to AWS ECR using github actions
2:33:13 thank you

Socials
📱 linkedin: https://www.linkedin.com/in/jayzern/
insta: https://www.instagram.com/jayzern/

Sub Count: 20,090