I recently completed the Microsoft Fabric workshop (DP-600) and spent time building an end-to-end reporting pipeline. In this post I summarise my key takeaways and practical patterns that made the workflows robust.
 
Microsoft Fabric Overview
Microsoft Fabric is a unified, AI-powered data platform that simplifies the entire data lifecycle, from ingestion to visualization. It is a cloud-based solution providing end-to-end services for data management, analytics, and decision-making, and it integrates with Azure and Power BI to deliver real-time insights, data processing, and machine learning that support better business decisions.
Why Microsoft Fabric?
- Unified Data Platform: Covers data ingestion, preparation, storage, analysis, and visualization.
- AI-Powered Analytics: Automates tasks and provides real-time insights using built-in ML models.
- Seamless Azure Integration: Enhances processing capabilities for large and complex datasets.
- Cross-Platform Collaboration: Enables data scientists, analysts, and business users to work on the same datasets.
- Scalability & Flexibility: Adapts to growing data volumes and evolving business needs without compromising performance or security.
Key Features
- Power BI: Transform data into interactive visual insights.
- Data Factory: Automate data migration and integration for smooth analysis.
- Data Activator: Monitor data proactively and set automated alerts.
- Industry Solutions: Pre-built templates and industry-specific insights.
- Synapse Data Engineering: Clean, transform, and prepare data at scale in a low-code environment.
- Synapse Data Science: Build and deploy advanced ML models in a unified workspace.
- Synapse Data Warehouse: Secure, scalable storage optimized for fast querying.
- Synapse Real-Time Analytics: Gain real-time insights from streaming data.
Per-User Licenses
- Free: Allows you to create and share Fabric content other than Power BI items if you have access to a Fabric capacity (trial or paid).
- Pro: Lets you share Power BI content with other users. Every organization needs at least one user with a Pro or Premium Per User (PPU) license to use Power BI within Fabric.
    - SKUs smaller than F64 require a Power BI Pro or PPU license for each user consuming Power BI content.
    - Content in workspaces on F64 or larger Fabric capacities is available to users with a Free license if they have the Viewer role.
- Premium Per User (PPU): Allows organizations to access Power BI Premium features by licensing each user with PPU instead of purchasing Premium capacity.
    - More cost-effective when fewer than 250 users need Premium features.
    - Uses shared capacity across the organization to provide computing power for Power BI operations.
 
Delta Parquet Format
Parquet is a columnar file format commonly used in data lakes for efficient storage and faster queries. On its own, however, it does not support in-place updates and deletes, ACID transactions, or time travel.
Delta Parquet (Delta Lake) enhances Parquet by adding a transaction log (_delta_log), enabling:
- ACID transactions for reliable and consistent operations.
- Direct updates and deletes, not just append operations.
- Time travel to query older versions of data.
- Schema evolution for handling data structure changes over time.
In Microsoft Fabric, all tables are stored in Delta Parquet format by default, ensuring consistent, fast, and easily manageable data usable across SQL, Spark, and Power BI.
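To make the Delta behaviour concrete, here is a minimal PySpark sketch of the operations listed above: writing a Delta table, updating it in place, and reading an earlier version via time travel. The table path is a hypothetical placeholder, and the SparkSession setup only matters outside a Fabric notebook (where `spark` is provided automatically).

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# In a Fabric notebook a SparkSession called `spark` already exists;
# getOrCreate() simply reuses it.
spark = SparkSession.builder.getOrCreate()

# Hypothetical lakehouse-relative path for the example table.
path = "Tables/sales"

# Write a small DataFrame as a Delta table (this becomes version 0).
df = spark.createDataFrame(
    [(1, "North", 100.0), (2, "South", 250.0)],
    ["order_id", "region", "amount"],
)
df.write.format("delta").mode("overwrite").save(path)

# Update rows in place (not just append); the change is recorded in the _delta_log.
DeltaTable.forPath(spark, path).update(
    condition="region = 'North'",
    set={"amount": "amount * 1.1"},
)

# Time travel: read the table as it looked at version 0.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
v0.show()
```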
Workspace Roles
- Admin: Full control over the workspace, including role and permission management.
- Member: Can create, modify, and share items within the workspace.
- Contributor: Can create and modify items with limited sharing capabilities.
- Viewer: Can view items but cannot modify or share them.
Advantages
- Data-driven decisions with AI and machine learning.
- Reduced data complexity with end-to-end lifecycle management.
- Improved collaboration across teams.
- Cloud-native architecture for scalable, secure data management.
Use Cases by Industry
- Retail: Unify customer data, analyze sales forecasts, optimize inventory, and personalize shopping experiences.
- Healthcare: Integrate data from EHRs, medical devices, and surveys; personalize treatment and improve diagnostics.
- Sustainability: Process ESG metrics, optimize initiatives, and comply with reporting requirements.
Interoperability with Other Services
- Azure SQL Database: Structured, relational data storage alongside Fabric’s unstructured data processing.
- Power BI: Real-time visualization and reporting.
- Azure Integration: Secure, scalable cloud infrastructure.
Business Model
SaaS with pay-as-you-go or reserved Fabric Capacity. Benefits include:
- Single pool of compute for all workloads.
- Flexible scaling up/down as needed.
- Centralized dashboard for usage and cost monitoring.
Data Management & Architecture
Medallion Architecture:
- Bronze → Raw data
- Silver → Transformed data
- Gold → Curated/Final data
Data Storage: Delta Parquet format for ACID transactions, schema evolution, and time travel.
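As a rough illustration of the medallion flow, the PySpark sketch below promotes data from bronze to silver to gold. The table paths and column names are made up for the example; in practice these steps could just as well be dataflows or warehouse SQL.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in a Fabric notebook

# Bronze: raw data landed as-is (hypothetical path and schema).
bronze = spark.read.format("delta").load("Tables/bronze_orders")

# Silver: cleaned and conformed - fix types, drop duplicates, remove bad rows.
silver = (
    bronze.dropDuplicates(["order_id"])
          .withColumn("order_date", F.to_date("order_date"))
          .filter(F.col("amount").isNotNull())
)
silver.write.format("delta").mode("overwrite").save("Tables/silver_orders")

# Gold: curated, business-level aggregates ready for the semantic model and Power BI.
gold = silver.groupBy("region").agg(F.sum("amount").alias("total_sales"))
gold.write.format("delta").mode("overwrite").save("Tables/gold_sales_by_region")
```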
Mirroring vs OneLake
- Mirroring:
    - Makes an exact copy of data in another location.
    - Used mainly for backup or replication.
    - Changes in the original don’t automatically sync unless configured.
    - Think of it as a “photocopy” of your data.
- OneLake:
    - Fabric’s centralized data lake.
    - Stores all tables in Delta Parquet format.
    - Single source of truth: all tools (SQL, Power BI, Spark) access the same data.
    - Changes are immediately available everywhere.
    - Think of it as a “shared cloud notebook” for all your data.
 
Data Movement & Transformation
- Dataflow Gen2: Prepares and transforms data in Delta Parquet format; supports incremental refresh and time travel (a notebook-style sketch of incremental loading follows this list).
- Pipelines: Orchestrate multiple data tasks such as dataflows, copy-data activities, notebooks, and control flow.
- Data Wrangler: Clean, transform, and prepare data through a visual interface that generates reusable code.
- Semantic Models: Store metadata, measures, and relationships to make analytics easier.
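The incremental-refresh idea above can also be expressed directly in a notebook as a Delta MERGE, which upserts only new or changed rows instead of reloading the whole table. This is a sketch under assumed table paths and a hypothetical `customer_id` key, not the internal mechanics of Dataflow Gen2.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# New or changed rows arriving from a source extract (hypothetical path).
updates = spark.read.format("delta").load("Tables/staging_customers")

# Upsert into the target Delta table: matched rows are updated,
# unmatched rows are inserted, so the refresh stays incremental.
target = DeltaTable.forPath(spark, "Tables/silver_customers")
(
    target.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```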
Data Modeling Concepts
- Fact Table: Stores numeric measures; tall and narrow; key values can repeat across rows.
- Dimension Table: Stores descriptive/categorical attributes; wide; one row per entity, no duplicates.
- Flat Table: A single, denormalized table used as the starting point for analysis.
- Star Schema: One fact table connected to denormalized dimension tables for fast querying (see the sketch after this list).
- Snowflake Schema: Normalized dimension tables that save space and maintain consistency.
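To show how a flat table splits into a star schema, here is a small PySpark sketch that derives a product dimension with a surrogate key and a narrow fact table that references it. All table and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical flat sales extract: measures and descriptive attributes mixed together.
flat = spark.read.format("delta").load("Tables/flat_sales")

# Dimension: one row per product, descriptive attributes only, plus a surrogate key.
dim_product = (
    flat.select("product_name", "category", "brand")
        .dropDuplicates()
        .withColumn("product_key", F.monotonically_increasing_id())
)

# Fact: numeric measures plus the foreign key to the dimension (tall and narrow).
fact_sales = (
    flat.join(dim_product, ["product_name", "category", "brand"])
        .select("order_id", "order_date", "product_key", "quantity", "amount")
)

dim_product.write.format("delta").mode("overwrite").save("Tables/dim_product")
fact_sales.write.format("delta").mode("overwrite").save("Tables/fact_sales")
```

In the Power BI model, the relationship would then link fact_sales[product_key] to dim_product[product_key].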
Power BI Concepts
- Table vs Matrix: Table shows raw data; Matrix supports summarization, grouping, hierarchies, and drill-down.
- Row & Filter Context: Row context evaluates an expression row by row (for example, in a calculated column); filter context is the set of filters coming from slicers, visual filters, and report filters.
- Pivoting & Unpivoting: Restructure data between wide and long shapes for analysis (see the reshaping sketch after this list).
- Duplicate vs Reference: Duplicate = independent copy; Reference = linked version with transformations.
- Import Mode: Data loaded into memory; fast but requires refresh.
- DirectQuery Mode: Queries live data; always fresh but slower.
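Pivoting and unpivoting are normally done in Power Query, but the reshaping itself is easy to picture in PySpark. The sketch below unpivots a hypothetical wide table (one column per month) into the long shape that works best for measures and slicers.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical wide table: one column per month.
wide = spark.createDataFrame(
    [("North", 100, 120, 90), ("South", 80, 95, 110)],
    ["region", "jan", "feb", "mar"],
)

# Unpivot: turn the month columns into (month, amount) rows.
long_df = wide.selectExpr(
    "region",
    "stack(3, 'jan', jan, 'feb', feb, 'mar', mar) as (month, amount)",
)
long_df.show()
```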
Practical Tasks & Exercises
- Create pipelines to pull data from Azure Blob Storage and SharePoint (a notebook version of the ingestion step is sketched after this list).
- Load data from the raw (bronze) layer to silver, transform it, and store the curated output in a gold warehouse.
- Create semantic models and publish dashboards.
- Power Apps Invoice/Employee Salary Slip App using Fabric backend.
- Clean, transform, and create dashboards from IPL dataset using Snowflake and OneLake.
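For the ingestion exercise, the pipeline copy activity is configured in the UI, but the same step can be sketched in a notebook. The storage account, container, and paths below are placeholders, and the workspace is assumed to already have permission to read the source.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical ABFS path to a Blob Storage container the workspace can read.
source = "abfss://raw@examplestorage.dfs.core.windows.net/invoices/*.csv"

# Read the raw CSV files and land them in the lakehouse bronze layer as Delta.
raw = spark.read.option("header", True).csv(source)
raw.write.format("delta").mode("append").save("Tables/bronze_invoices")
```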
Additional Concepts
- Real-Time Intelligence: KQL (Kusto Query Language) for live analysis; OneLake for historical data.
- Mirroring vs OneLake: Mirroring = backup copy; OneLake = centralized single source of truth.
- Dataflow Gen1 vs Gen2: Gen2 is faster, scalable, supports Delta Parquet.
Outcome
Microsoft Fabric enables end-to-end data management, AI-driven insights, and collaborative analytics. By combining Delta Lake storage, pipelines, dataflows, semantic models, and Power BI visualizations, users can transform raw data into actionable intelligence efficiently and securely.