Collectors

Overview

A DataBlend collector defines how data should be pulled from an external system like Salesforce or FTP into DataBlend. It links how to access the external system (DataBlend credential), what data should be collected, and where the data should be stored (DataBlend data source and schema).

A collection is a specific execution of a collector. If the collection succeeds, then it creates a stream under the configured data source and schema. If it fails, an error will be reported in the collection log. Creating a schema with fields is not necessary before creating a collector. The collector will populate the schema.

Streams are created by collections, but exist independently of them; the collector configuration can be changed or the collection history purged without impacting the collected data.
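The relationship between collectors, collections, and streams described above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not DataBlend's implementation: the class names, the fields, and the fetch callable that stands in for the external system are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class Stream:
    """Collected data, stored under a data source and schema."""
    data_source: str
    schema: str
    records: list


@dataclass
class Collection:
    """One execution of a collector."""
    started_at: datetime
    state: str                        # "complete" or "error"
    log: list = field(default_factory=list)
    stream: Optional[Stream] = None   # set only when the collection succeeds


@dataclass
class Collector:
    """Links a credential, what to collect, and where to store it."""
    credential: str
    data_source: str
    schema: str
    fetch: Callable[[], list]         # hypothetical stand-in for the external system
    history: list = field(default_factory=list)

    def run(self) -> Collection:
        started = datetime.now(timezone.utc)
        try:
            records = self.fetch()
            stream = Stream(self.data_source, self.schema, records)
            result = Collection(started, "complete",
                                [f"collected {len(records)} records"], stream)
        except Exception as exc:
            # A failed collection creates no stream; the error goes to the log.
            result = Collection(started, "error", [f"error: {exc}"])
        self.history.append(result)
        return result
```

In this sketch a stream is a separate object from the collection that produced it, so clearing the collector's history leaves the stream's records untouched.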

Running a Collector

Collectors are used to pull data from source systems. Each collector is tied to a Credential (see: https://datablend.atlassian.net/wiki/spaces/DS1/pages/1196228684).

To set up a new Collector, click the Add button.

Select the type of system to collect data from. A credential for that system must already be set up to make the connection.

Depending on the Collector type, users can specify the schemas, fields, filters, and/or initial queries to collect from the source system.

To create a new Schema within a collector, select a data source and then click the Create New link.

Parameters

Many collectors can filter the data they collect. For example, the QuickBooks Online Profit and Loss Detail collector accepts a date range and returns only transactions dated within that range. If the filter values will not change, it is appropriate to set them directly in the collector configuration. In other situations, the filter value may change over time (for example, the start of the fiscal quarter) or may need to be set through a workflow (for example, when multiple collectors are run in sequence for the same set of accounts or the same date range). In these situations, users should set up collection parameters.
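The difference between fixed filter values and collection parameters can be sketched in Python as follows. This is a minimal illustration, not DataBlend's configuration format: the resolve_filters function, the {"param": ...} marker, and the filter names are hypothetical.

```python
def resolve_filters(filters: dict, parameters: dict) -> dict:
    """Replace parameter references with values supplied at run time.

    A filter value of the form {"param": "name"} is a hypothetical way to
    mark a value that comes from a collection parameter. Any other value is
    treated as fixed in the collector configuration.
    """
    resolved = {}
    for key, value in filters.items():
        if isinstance(value, dict) and "param" in value:
            name = value["param"]
            if name not in parameters:
                raise KeyError(f"missing collection parameter: {name}")
            resolved[key] = parameters[name]
        else:
            resolved[key] = value
    return resolved


# Fixed values: the same date range on every run.
fixed = {"start_date": "01/01/2024", "end_date": "03/31/2024"}

# Parameterized values: the range is supplied per run, for example by a workflow.
parameterized = {"start_date": {"param": "start"}, "end_date": {"param": "end"}}
```

With the parameterized form, several collectors can be given the same parameter values for a run, so they all collect over the same date range.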

String

A String parameter holds free-form text, including letters, numbers, and symbols.

Date

Date parameters let users collect data within a specific window of time. Values are entered as specific calendar dates, such as Month/Day/Year.

Relative Date

Relative Date parameters let users collect data within a relative window of time. Values are chosen from a wide variety of timeframes, such as the start of the first quarter or the end of the last quarter.
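To show how a relative timeframe turns into a concrete date at run time, here is a small Python sketch. It assumes calendar quarters starting in January, April, July, and October; the function names are hypothetical, and DataBlend's own presets and any fiscal-calendar handling may differ.

```python
from datetime import date, timedelta


def start_of_quarter(today: date) -> date:
    """First day of the calendar quarter that contains `today`."""
    first_month = 3 * ((today.month - 1) // 3) + 1
    return date(today.year, first_month, 1)


def end_of_previous_quarter(today: date) -> date:
    """Last day of the calendar quarter before the one that contains `today`."""
    return start_of_quarter(today) - timedelta(days=1)
```

Because the result depends on the day the collection runs, the same relative setting yields a different date range each quarter without editing the collector.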

Boolean

A Boolean parameter accepts True, False, or NULL values.
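As a rough illustration of the String, Date, and Boolean parameter types, the following Python sketch converts a text value into a typed value. The parse_parameter function and its type labels are hypothetical, and the Month/Day/Year parsing is an assumption based on the description above rather than a documented DataBlend format.

```python
from datetime import date, datetime
from typing import Union


def parse_parameter(kind: str, raw: str) -> Union[str, date, bool, None]:
    """Convert a raw text value according to a (hypothetical) type label."""
    if kind == "String":
        # Strings are kept exactly as entered.
        return raw
    if kind == "Date":
        # Assumes Month/Day/Year input such as 03/31/2024.
        return datetime.strptime(raw, "%m/%d/%Y").date()
    if kind == "Boolean":
        # Booleans allow three values: True, False, or NULL.
        lookup = {"true": True, "false": False, "null": None}
        key = raw.strip().lower()
        if key not in lookup:
            raise ValueError(f"not a Boolean value: {raw!r}")
        return lookup[key]
    raise ValueError(f"unsupported parameter type: {kind}")
```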

Advanced

Each setting below is marked as Required or Optional, followed by comments.

Schema Update Type

Required. Select one of the following options:

Add Only: Existing schema columns are preserved regardless of whether they exist in the first record collected. New columns identified in the first record collected are added to the schema. If the collection returns no records, the existing schema is unchanged.

Auto: Default. Schema is recreated from the first record collected. If the collection returns no records, the existing schema is unchanged.

Manual: No changes are made to the schema during collection.

If the schema is changed by a collection, data in previous streams may no longer be available (columns may “go missing”). When in doubt, set to Add Only.
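The three Schema Update Type options described above can be sketched in Python. This is an illustration of the documented behavior only: the update_schema function is hypothetical, and a schema is simplified here to an ordered list of column names.

```python
from typing import Optional


def update_schema(existing: list, first_record: Optional[dict], mode: str) -> list:
    """Return the schema columns after a collection, per the chosen mode.

    `first_record` is the first record collected, or None if the collection
    returned no records.
    """
    if mode == "Manual":
        # No changes are made to the schema during collection.
        return list(existing)
    if first_record is None:
        # No records collected: Add Only and Auto both leave the schema as-is.
        return list(existing)
    if mode == "Auto":
        # The schema is recreated from the first record collected.
        return list(first_record.keys())
    if mode == "Add Only":
        # Existing columns are preserved; new columns are appended.
        added = [col for col in first_record if col not in existing]
        return list(existing) + added
    raise ValueError(f"unknown schema update type: {mode}")
```

For an existing schema of id and name and a first record containing only id and email, Auto drops name while Add Only keeps it, which is why Add Only is the safer choice when in doubt.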

History Retention (Days)

Optional. Defaults to zero. Users may set the number of days for which collector data is retained.

Timeout (seconds)

Optional. Sets a limit, in seconds, after which a collection that is still collecting data is timed out.

Run As

Required. Run As lets users select, from a drop-down list of users, the user who runs the collector.

Schedule

Optional. A schedule makes sure collections run at the desired time. Select a schedule from the presets menu provided.

Is Paused

The Is Paused toggle enables or disables a schedule. The default is “false”. When the toggle is enabled (“true”), the schedule is paused.

Details

The Details section documents who created and updated the collector and the corresponding times. This allows for easy tracking of multiple collectors.

Latest Collection

The Latest Collection section documents the state of the collector, created time, and the status of the query. States include complete and error.

Logs

Job logs are accessible via the state link in the Latest Collection section. Clicking the linked state takes the user to the Collections section, where users can view the items, details, and logs for the job that was run. Logs can be downloaded with the download log button at the top right of the log section. Logs show how much data was collected, the steps taken, and when they occurred.

Collection

The Collections section documents when the collection was created, started, and completed, and the total amount of data scanned. The status includes information regarding the state of the collection. This allows for easy tracking of multiple collections.

Creating a Favorite

Creating a favorite is simple. Users may favorite a Credential, Collector, Data Target, Query, Data Source, or Workflow. To create a favorite, click the star icon on the upper left, next to Edit.

Please note that users cannot favorite an Unpivot, Data Quality Report, Schema, Agent or Notification.

Saved Views

Saved views are a DataBlend feature that lets users quickly return to filtered searches. To set a saved view, click the gear icon in the upper right corner. A drop-down will appear with options to save the current view, restore the default view, or copy a share URL. Copying a share URL allows other users with the URL to view the same saved view.