
Microsoft Fabric Mirroring changes the game for data ingestion, giving you near real-time data with a no-code framework.
Microsoft’s Fabric Mirroring will change how you perform data ingestion. If you use tools to automate batch data dumps, did you know that Fabric Mirroring might remove the need for those tools and give you near real-time access to data as it changes in the source systems?
If you have not yet heard of the medallion architecture, it uses bronze, silver, and gold layers to describe how data is processed, from intake into your data hub to consumption by your reporting applications of choice. This multi-layered approach existed before I started my analytics career in the early 2000s. Think of it simply: bronze is your unprocessed data, silver is cleaned and organized data processed from your bronze layer, and gold is aggregated and optimized data ready for prime-time business insights.
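To make the three layers concrete, here is a minimal Python sketch of data moving from bronze to silver to gold. It is an illustration under stated assumptions, not a Fabric implementation: the order rows, the column names, and the cleaning rules (trim and normalize names, drop rows whose amount is not numeric, drop duplicate order IDs) are all hypothetical choices made for the example.

```python
from collections import defaultdict
from datetime import date

# Bronze: hypothetical raw rows exactly as they arrived from a spoke.
# Every value is a string, much like landing data in varchar(max) staging columns.
bronze = [
    {"order_id": "1001", "customer": " Contoso ", "order_date": "2024-05-01", "amount": "250.00"},
    {"order_id": "1002", "customer": "Fabrikam", "order_date": "2024-05-01", "amount": "n/a"},
    {"order_id": "1003", "customer": "contoso", "order_date": "2024-05-02", "amount": "100.50"},
    {"order_id": "1001", "customer": " Contoso ", "order_date": "2024-05-01", "amount": "250.00"},  # duplicate
]


def to_silver(rows):
    """Clean and type bronze rows: parse values, normalize names, drop bad rows and duplicates."""
    seen = set()
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # fails validation; a real pipeline might quarantine the row instead
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # duplicate delivery of the same order
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "customer": row["customer"].strip().title(),
            "order_date": date.fromisoformat(row["order_date"]),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate silver rows into a reporting-ready total per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'Contoso': 350.5}
```

Bronze stays untouched so you can always reprocess it; the business rules live in the bronze-to-silver step, and gold is shaped for the questions your reports ask.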
It’s essential to understand how data management has evolved. From the ’90s to the early 2000s, getting data from each application (referred to as a spoke) into your data repository (data hub) was complex. In the Microsoft world, multiple SSIS packages or other processes pulled data into staging tables with varchar(max) columns. This was typically a batch process that ran on a schedule, so the hub was only as current as the last successful run, and every failed job was one more thing to troubleshoot. There were so many SSIS packages that we needed an automation language to generate them, rather than building each one by hand.
Many businesses’ analytics projects struggle to get the correct data into their hub quickly enough for data transformations and validations to be effective. If you get this wrong, you do not pass Go and you do not collect $200. Your data analytics project might go straight to jail.
How can we load data quickly and successfully?
Let me introduce you to a no-code, near real-time option for loading your data into your data lake (data hub) within Fabric. This new feature is known as Fabric Mirroring.
While I love the functionality of Fabric Mirroring, I am not a fan of the name. Many people with SQL Server experience assume it works like SQL Server Database Mirroring simply because the names are alike.
In my opinion, Fabric Mirroring is closer to implementing Change Data Capture (CDC) on your SQL Server databases and feeding the captured changes into a real-time streaming tool such as Apache Kafka, which copies the data from your spoke (SQL Server application database) into your hub (data lake).
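To show what a self-managed CDC pipeline has to do at the receiving end, here is a minimal Python sketch that replays ordered change events from a spoke onto a copy of the table in the hub. The operation codes follow SQL Server CDC’s `__$operation` convention (1 = delete, 2 = insert, 3 = update before image, 4 = update after image). The in-memory dictionary target, the integer LSNs (real ones are `binary(10)`), and the sample rows are simplifying assumptions, and this is not a description of how Fabric Mirroring works internally.

```python
# SQL Server CDC __$operation codes.
DELETE, INSERT, UPDATE_BEFORE, UPDATE_AFTER = 1, 2, 3, 4


def apply_changes(target, changes, key="id"):
    """Replay change events onto a target table held as {primary_key: row}."""
    # Apply in commit order; __$seqval orders changes within one transaction.
    ordered = sorted(
        changes,
        key=lambda c: (c["__$start_lsn"], c["__$seqval"], c["__$operation"]),
    )
    for change in ordered:
        op = change["__$operation"]
        # Strip the CDC metadata columns, keeping only the business columns.
        row = {k: v for k, v in change.items() if not k.startswith("__$")}
        if op == DELETE:
            target.pop(row[key], None)
        elif op in (INSERT, UPDATE_AFTER):
            target[row[key]] = row  # upsert the latest image of the row
        # UPDATE_BEFORE holds the old values and is not needed to rebuild current state.
    return target


# Hypothetical change stream for a small customers table (LSNs simplified to ints).
changes = [
    {"__$start_lsn": 1, "__$seqval": 1, "__$operation": INSERT, "id": 1, "name": "Sarah", "city": "London"},
    {"__$start_lsn": 2, "__$seqval": 1, "__$operation": INSERT, "id": 2, "name": "Raj", "city": "Austin"},
    {"__$start_lsn": 3, "__$seqval": 1, "__$operation": UPDATE_BEFORE, "id": 2, "name": "Raj", "city": "Austin"},
    {"__$start_lsn": 3, "__$seqval": 1, "__$operation": UPDATE_AFTER, "id": 2, "name": "Raj", "city": "Dallas"},
    {"__$start_lsn": 4, "__$seqval": 1, "__$operation": DELETE, "id": 1, "name": "Sarah", "city": "London"},
]

hub = apply_changes({}, changes)
print(hub)  # {2: {'id': 2, 'name': 'Raj', 'city': 'Dallas'}}
```

Ordering, upserts, deletes, and checkpointing the last applied LSN are exactly the pieces you would otherwise build and operate yourself alongside CDC and Kafka.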
The benefit here is twofold. First, you don’t have to manage the Change Data Capture or Kafka implementations yourself. Second, and most importantly, this is more than a SQL Server solution. Over time, you will be able to use Fabric Mirroring to ingest data from all your sources (spokes) into your data hub in near real-time, with minimal to no code required.
For example, you can use Fabric Mirroring to bring Dynamics 365 or Power Apps data into Fabric, and you can do the same for Azure Cosmos DB and Snowflake. SQL Server support is now available in preview, as the table below shows.
Currently, the following sources are available:
| Platform | Near real-time replication | Type of mirroring |
|---|---|---|
| Microsoft Fabric mirrored databases from Azure Cosmos DB (preview) | Yes | Database mirroring |
| Microsoft Fabric mirrored databases from Azure Databricks (preview) | Yes | Metadata mirroring |
| Microsoft Fabric mirrored databases from Azure Database for PostgreSQL flexible server (preview) | Yes | Database mirroring |
| Microsoft Fabric mirrored databases from Azure SQL Database | Yes | Database mirroring |
| Microsoft Fabric mirrored databases from Azure SQL Managed Instance (preview) | Yes | Database mirroring |
| Microsoft Fabric mirrored databases from Snowflake | Yes | Database mirroring |
| Microsoft Fabric mirrored databases from SQL Server (preview) | Yes | Database mirroring |
| Open mirrored databases | Yes | Open mirroring |
| Microsoft Fabric mirrored databases from Fabric SQL database (preview) | Yes | Database mirroring |
Now that I know I can use Fabric Mirroring to get near real-time data into my hub with no code required, why else is Fabric Mirroring a game-changer for my analytics projects?
Fabric Mirroring enables us to accomplish a lot more in less time.
Suppose you have an SLA to get data into a data warehouse within 24 hours, and processing through all the layers takes the full 24 hours: 12 hours into bronze, 6 hours from bronze to silver, and 6 hours from silver to gold. If near real-time mirroring cut the load into bronze to, say, 90 seconds, you would gain roughly 11 hours and 58 minutes to improve data quality, data validation, and the other processes in your silver and gold layers.
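The arithmetic behind this scenario can be checked with a short Python sketch. The stage durations (12, 6, and 6 hours) are the illustrative figures from the example, and the 90-second mirroring latency is an assumed number, not a measured one. The sketch also assumes mirroring replaces only the load into bronze while the silver and gold steps keep their original durations.

```python
from datetime import timedelta

# Illustrative stage durations from the example (not benchmarks).
batch_stages = {
    "source -> bronze": timedelta(hours=12),
    "bronze -> silver": timedelta(hours=6),
    "silver -> gold": timedelta(hours=6),
}
sla = timedelta(hours=24)

# Assumption: mirroring replaces only the source -> bronze load, at about 90 seconds.
mirrored_stages = {**batch_stages, "source -> bronze": timedelta(seconds=90)}

batch_total = sum(batch_stages.values(), timedelta())
mirrored_total = sum(mirrored_stages.values(), timedelta())
time_saved = batch_total - mirrored_total


def fmt(td):
    """Format a timedelta as hours, minutes, and seconds (e.g. '11h 58m 30s')."""
    total = int(td.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


print(f"Batch pipeline total:    {fmt(batch_total)}")     # 24h 00m 00s
print(f"Mirrored pipeline total: {fmt(mirrored_total)}")  # 12h 01m 30s
print(f"Time freed up:           {fmt(time_saved)}")      # 11h 58m 30s
```

Because the batch pipeline consumes the entire 24-hour SLA in this scenario, the time freed up is also the new headroom you can spend on the silver and gold work.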
Centralized Data Management
With a single hub that the applications (spokes) automatically send data to, you no longer need to install and maintain additional ingestion software, clients, and tools. You transition from pulling data from the spokes with batch processing to having the spokes push data in near real-time. A central hub also simplifies data governance and enhances security, because combining it with Microsoft Purview lets you see which spokes the data comes from.
For example, suppose you must comply with GDPR, and Sarah in the UK requests that her data be removed. From the hub, you can quickly find which spokes hold her data and determine what needs to be purged.
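Here is a minimal Python sketch of that lookup, assuming each mirrored table in the hub is tagged with the spoke it came from. Everything in it is hypothetical: the table names, the column that holds the person’s email in each table, and the hand-built dictionary that stands in for the lineage and metadata you would actually get from Microsoft Purview and your mirrored databases. It does not call any Purview or Fabric API.

```python
# Hypothetical hub catalog: each mirrored table records which spoke (source system) it came from.
hub_tables = {
    "crm.contacts": {"source_spoke": "Dynamics 365", "rows": [
        {"contact_id": 17, "email": "sarah@example.co.uk", "name": "Sarah"},
        {"contact_id": 18, "email": "tom@example.com", "name": "Tom"},
    ]},
    "sales.orders": {"source_spoke": "Azure SQL Database", "rows": [
        {"order_id": 501, "customer_email": "sarah@example.co.uk", "amount": 42.0},
    ]},
    "web.sessions": {"source_spoke": "Azure Cosmos DB", "rows": [
        {"session_id": "a1", "user_email": "tom@example.com"},
    ]},
}

# Hypothetical mapping of which column identifies the data subject in each table.
subject_columns = {
    "crm.contacts": "email",
    "sales.orders": "customer_email",
    "web.sessions": "user_email",
}


def locate_subject(email):
    """Return {source_spoke: [(table, row), ...]} for every hub row belonging to this person."""
    hits = {}
    for table, meta in hub_tables.items():
        column = subject_columns[table]
        for row in meta["rows"]:
            if row.get(column, "").lower() == email.lower():
                hits.setdefault(meta["source_spoke"], []).append((table, row))
    return hits


for spoke, matches in locate_subject("Sarah@example.co.uk").items():
    print(spoke, [table for table, _ in matches])
```

Because the hub is a replica, the purge itself would typically be carried out in each spoke the lookup identifies and then flow through to the mirrored copy.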
Simplified Data Ingestion
Instead of mixing and matching different data formats, Fabric Mirroring lands your data as Delta tables, whether it comes from Azure Cosmos DB, Azure SQL Database, Dynamics 365, or future Fabric Mirroring sources. You no longer need to worry about which sources arrive as Excel, CSV, flat files, JSON, and so on. Everything is in the same format, ready for you to perform transformations and data validation and to apply any business rules required for your silver layer.
Improved Query Performance
Those who know me know that I love discussing query performance tuning. I am passionate about making databases go as fast as your favorite F1 race car. I also know that you have at least one group of people running reporting queries against your line-of-business application database or an availability group replica. This adds locking and contention that slows down the workload your application databases exist to serve. By pointing those reports at your data hub instead, you remove that locking from your application databases.
The mirrored data is also stored in an analytics-ready format, Delta tables, which improves query performance across tools within Microsoft Fabric, including Power BI.
What if you cannot use Fabric Mirroring?
The sources supported by Fabric Mirroring are still limited. If you have on-premises data sources or other sources that are not ready for Fabric Mirroring, I would still encourage this architectural approach: use change data capture, where available, to stream your data into your data hub of choice.
About ProcureSQL
ProcureSQL is the industry leader in providing data architecture as a service, enabling companies to harness their data and grow their business. ProcureSQL is 100% onshore in the United States and supports the four quadrants of data, including application modernization, database management, data analytics, and data visualization. ProcureSQL serves as a guide, mentor, leader, and implementer, providing innovative solutions to drive better business outcomes for all businesses.