Data Sources, Schemas and Streams

Definitions

A data source is a container or folder for collected data. It might represent a specific source system (e.g. Salesforce) or it might represent the purpose of a particular integration (e.g. MonthEndInvoicing).

A schema defines the structure of collected data. Most often this will be a simple table structure with one or more columns defined. A data source can have multiple schemas. Schemas can be created manually or they can be created automatically when a collection is run.

For example, an organization may have a data source named QuickBooks for all the data it collects from QuickBooks Online (QBO). The data source will contain one schema for each QBO report that the organization collects: PnL for the Profit and Loss Detail report and Balance for the Balance Sheet by Month report.

A stream is a set of data defined by a schema. Each schema can have multiple streams which are identified by creation timestamp and can be thought of as versions of the data over time. Each schema can have at most one open stream.

A stream may be created automatically by a collection or a file upload. For single-file schemas, streams may also be created manually and have data entered manually.
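The relationship between data sources, schemas and streams described above can be sketched in a few lines of Python. This is only an illustrative model, not DataBlend's implementation or API: the class names, fields and the `new_stream` helper are all hypothetical. It shows a data source holding several schemas, each schema holding streams identified by creation timestamp, and the rule that a schema can have at most one open stream.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Stream:
    """One version of a schema's data (hypothetical model)."""
    created_at: datetime          # streams are identified by creation timestamp
    is_open: bool = True
    rows: List[dict] = field(default_factory=list)


@dataclass
class Schema:
    """Structure of collected data; columns may be added after creation."""
    name: str
    columns: List[str] = field(default_factory=list)
    streams: List[Stream] = field(default_factory=list)

    def open_stream(self) -> Optional[Stream]:
        # Return the single open stream, or None if every stream is closed.
        return next((s for s in self.streams if s.is_open), None)

    def new_stream(self, created_at: datetime) -> Stream:
        # Enforce the rule: at most one open stream per schema.
        if self.open_stream() is not None:
            raise ValueError(f"schema {self.name!r} already has an open stream")
        stream = Stream(created_at=created_at)
        self.streams.append(stream)
        return stream


@dataclass
class DataSource:
    """Container for schemas, e.g. one per source system."""
    name: str
    schemas: Dict[str, Schema] = field(default_factory=dict)

    def add_schema(self, name: str) -> Schema:
        schema = Schema(name=name)
        self.schemas[name] = schema
        return schema


# Usage: the QuickBooks example, with PnL collected twice.
quickbooks = DataSource("QuickBooks")
pnl = quickbooks.add_schema("PnL")
balance = quickbooks.add_schema("Balance")

january = pnl.new_stream(datetime(2024, 1, 31, tzinfo=timezone.utc))
january.is_open = False  # close the first stream before collecting again
february = pnl.new_stream(datetime(2024, 2, 29, tzinfo=timezone.utc))
```

In this sketch, closing a stream is modeled as simply flipping `is_open`; how streams are actually closed in the product is not covered here.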


In the image below, the company has collected the Profit and Loss Detail report twice and the Balance Sheet by Month report once.

Schema Types

The schema type determines the behavior of the associated streams and queries. Once the schema is created, these settings cannot be changed.

Schema Type

Behavior

Default

Applies to all schemas automatically created by a collector. Can also be created manually by leaving Realtime = false and Single file = false.

Queries will use the most recent Closed stream. If no closed stream exists, the query will fail.*

This prevents a query from returning incomplete results if a stream is being updated by a collector at the moment the query is run.

Realtime

Queries will use an open stream if one exists. If there is not an open stream, then the most recent closed stream will be used.*

This allows for queries to run against a complete data set which is incrementally updated.

Single File

The open stream can be edited.

Also considered Realtime.

Queries will use an open stream if one exists. If there is not an open stream, then the most recent closed stream will be used.*

*Applies to queries with stream select type Default.
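The stream selection rules in the table above can be summarized as a short Python sketch. It is a hedged illustration only: `Stream`, `select_stream` and the `realtime` / `single_file` parameters are hypothetical names chosen to mirror the schema settings, not part of any DataBlend API. It covers only queries with stream select type Default, and it assumes that a Realtime or Single File query with no open and no closed stream also fails, which the table does not state.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Stream:
    created_at: datetime
    is_open: bool


def select_stream(
    streams: List[Stream],
    realtime: bool = False,
    single_file: bool = False,
) -> Stream:
    """Pick the stream a Default-select query would read (illustrative)."""
    # Single File schemas are also considered Realtime.
    prefers_open = realtime or single_file

    if prefers_open:
        # Realtime / Single File: use the open stream if one exists.
        for stream in streams:
            if stream.is_open:
                return stream

    # Default type, or no open stream: use the most recent closed stream.
    closed = [s for s in streams if not s.is_open]
    if not closed:
        # Default type: the query fails. For Realtime this is an assumption.
        raise LookupError("no closed stream available for this query")
    return max(closed, key=lambda s: s.created_at)
```

For example, given two closed streams and one newer open stream, `select_stream(streams)` returns the newer of the two closed streams, while `select_stream(streams, realtime=True)` returns the open one. This is why a Default schema never exposes a stream that a collector is still writing to.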

It is not necessary to create columns prior to saving a schema.

Usage

The Usage section lists the collectors and queries in which a data source is used. Users may use the search bar to easily navigate through multiple collectors or queries.

Details

The Details section records who created and updated the Data Source, and the corresponding times. This allows for easy tracking of multiple Data Sources.

Creating a Favorite

Users may favorite a Credential, Collector, Data Target, Query, Data Source, or Workflow. To create a favorite, click the star icon on the upper left next to Edit.

Please note that users cannot favorite an Unpivot, Data Quality Report, Schema, Agent or Notification.

Saved Views

Saved views are a DataBlend feature that allows users to quickly return to filtered searches. To set a saved view, click the gear icon in the upper right corner. A drop-down will appear with options to save the current view, restore the default view, or copy a share URL. Copying a share URL allows other users with the URL to see the same saved view.

 

 Want to see more? Visit our helpful demo page or attend an office hour. https://lp.datablend.com/demos