Setting up¶
Sign in¶
You can sign in using the account with which the Qualdo instance was created.
After you sign in successfully, you need to configure at least one Datasource before you can continue. Refer to the Datasources section to learn how to configure the different Datasource types.
To help with the initial setup, Qualdo presents a quick start and other helper video tutorials after your first login.
Environment¶
Before you can configure a Datasource, you must create an environment and organise your data within it. If your data is stored across multiple environments, such as testing, staging, and production, Qualdo lets you manage and monitor the data in each of them. You can also logically group your Datasources under their respective environments.
Configuring Datasources¶
- Datasources
Datasources are the databases, datasets, or files containing the data that needs to be monitored. Qualdo currently supports Azure Blob Storage, Azure Data Lake, Snowflake, AWS S3, Google BigQuery, Postgres, MySQL, AWS Redshift, Google Cloud Storage, Azure SQL Server, and Google Cloud SQL Server.
- Datasets
Each table or file in the configured Datasource, depending on the Datasource type, is referred to as a Dataset.
Data Refresh¶
- Qualdo Data Refresh
In Qualdo, Data Refresh refers to the operations that are used to provide Qualdo with the most current version of data for analysis.
- Files as Datasources
For file-based Datasources, which may contain one or more Datasets, Qualdo supports three types of data refresh:
File Replace - Qualdo will automatically discover any newly added file and identify it as a new Dataset. Future uploads for any discovered Dataset will be identified as a different version of the Dataset based on timestamp.
File Version - Qualdo will automatically discover any newly added file whose name matches the provided regex Refresh Format, identify it as a new Dataset, and perform a File Version data refresh. All files that qualify for File Version refresh and share a base name (e.g. sales_mmddyy, product_mmddyy) are treated as different versions of the same Dataset. Future uploads for any discovered Dataset will be identified as a different version of the Dataset based on timestamp. Files whose names do not match the provided Refresh Format are considered separate Datasets, and File Replace data refresh logic is applied to them.
Folder Version - Qualdo will automatically discover any newly added file whose folder name matches the provided Refresh Format, identify it as a new Dataset, and perform a Folder Version data refresh. All files that qualify for Folder Version refresh and have a matching “foldername/filename” pattern are treated as different versions of the same Dataset. Future uploads for any discovered Dataset will be identified as a different version of the Dataset based on timestamp. Folders whose names do not match the provided Refresh Format, along with the files inside them, are considered separate Datasets, and File Replace data refresh logic is applied. File Replace data refresh logic is also applied to any files in the folder tree that are not part of the Folder Version definition.
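The File Version grouping described above can be illustrated with a short sketch. This is not Qualdo's implementation: the regex, the function name `group_file_versions`, and the sample file names are assumptions made only for illustration. Files whose names match the refresh format are grouped by base name as versions of one Dataset; any other file is treated as its own Dataset.

```python
import re
from collections import defaultdict

# Hypothetical refresh format (an assumption, not a Qualdo default):
# a base name, an underscore, a six-digit mmddyy stamp, and a .csv extension.
REFRESH_FORMAT = re.compile(r"^(?P<base>.+)_(?P<stamp>\d{6})\.csv$")


def group_file_versions(file_names, refresh_format=REFRESH_FORMAT):
    """Group file names into Datasets, following the File Version idea."""
    datasets = defaultdict(list)
    for name in file_names:
        match = refresh_format.match(name)
        if match:
            # Matching files with the same base name are versions of one Dataset.
            datasets[match.group("base")].append(name)
        else:
            # Non-matching files are separate Datasets (File Replace logic).
            datasets[name].append(name)
    return dict(datasets)


files = ["sales_010124.csv", "sales_020124.csv", "product_010124.csv", "notes.csv"]
groups = group_file_versions(files)
for dataset, versions in groups.items():
    print(dataset, versions)
```

Folder Version refresh could be sketched the same way, with the pattern matched against the folder name and the "foldername/filename" pair used as the grouping key.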
- Databases as Datasources
For database Datasources, Qualdo supports three types of data refresh, depending on whether an Incremental Data Identifier attribute is provided and whether a date-type attribute is present:
Provided Incremental Data Identifier - When a timestamp attribute is provided as the Incremental Data Identifier, Qualdo will track the last known timestamp value of that attribute to handle any subsequent data refresh. If the configured attribute goes missing in any refresh, or its values are changed, Qualdo’s default refresh handling will be applied.
No Incremental Data Identifier - When the user has not provided any attribute to track as the Incremental Data Identifier, or the provided attribute is not available in the Dataset, Qualdo identifies all the timestamped attributes and tracks the last known timestamp values of all those attributes to handle subsequent data refreshes. This is the default refresh handling for database Datasources in Qualdo when one or more timestamp attributes exist.
No Timestamped Attribute - When the user has not provided any attribute to track as the Incremental Data Identifier, or the provided attribute is not available in the Dataset, and no other timestamped attribute is present in the Dataset, Qualdo will track the number of rows to detect subsequent data refreshes. This is the default refresh handling for database Datasources in Qualdo when there is no timestamp attribute.
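The choice between these three refresh types can be summarised as a small decision function. This is a hedged sketch of the rules as described above, not Qualdo's code: the `Column` class, the `choose_refresh_mode` function, and the returned labels are hypothetical names, and the sketch only checks whether the provided attribute exists in the Dataset.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Column:
    """Hypothetical description of one Dataset attribute."""
    name: str
    is_timestamp: bool


def choose_refresh_mode(columns: Sequence[Column],
                        incremental_id: Optional[str] = None) -> str:
    """Pick a refresh type from the three cases described in the text."""
    names = {column.name for column in columns}
    # Case 1: an identifier was provided and is available in the Dataset.
    if incremental_id is not None and incremental_id in names:
        return "provided_incremental_identifier"
    # Case 2: no usable identifier, but timestamp attributes exist,
    # so all of them are tracked.
    if any(column.is_timestamp for column in columns):
        return "all_timestamp_attributes"
    # Case 3: no timestamp attribute at all, so row counts are tracked.
    return "row_count"


orders = [Column("id", False), Column("updated_at", True), Column("created_at", True)]
lookup = [Column("code", False), Column("label", False)]

print(choose_refresh_mode(orders, "updated_at"))
print(choose_refresh_mode(orders))
print(choose_refresh_mode(lookup))
```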
The first step is to choose a Datasource you want to use with Qualdo.
- Verify Datasource setup requirements
While you are configuring a Datasource, we recommend that you carefully read the Help section displayed on the screen.
The on-screen Help section guides you through any additional permissions or requirements that must be met on the cloud storage destination to allow Qualdo to connect to the Datasource. We also recommend that you use the Test Connection feature to verify the setup before saving the details.
Note
You can see the configured Datasources, along with their processing status, in the grid at the bottom of the page.