Custom Insights

Introduction

Custom insights provide a powerful way to create your own customer level insights, using your own logic.
These insights are defined using our SQL like grammar, and are executed at the time of ingestion. Custom insights can be managed using the custom insights API, or those unfamiliar with SQL can create and manage them through the Bud console.

How to navigate to custom insights in the console

Start by logging in to the Bud console.

Once logged in click on Projects.

Then click on Configure and choose Custom insights.

From here you can now start creating your own custom insights directly in the console.

The rest of this guide will focus on creating custom insights using the API.

How to write a custom insight query

Overview

Broadly a custom insight query consists of four parts:

INFER <name> FROM <datasource> [WHERE <expression>] [HAVING <expression>]

INFER The name of the insight we are defining.
FROM The data source we are querying (in this case transactions).
This datasource provides the columns we can query against, and any custom functions for that datatype.
WHERE (Optional) An expression that filters the data source, applied before the HAVING clause.
The expression must return a boolean vector equal in length to the data source.
HAVING (Optional) An expression that must return a boolean.

If there is just a where clause, the insight will apply to the customer if the data source, after filtering, is not empty.
If there is just a having clause, the insight will apply if the having clause returns true.
If there is neither a having nor a where clause, the insight will never apply.

For instance, the following insight will always apply to a customer:

INFER always_true FROM transactions HAVING SUM(1, 2, 3) = 6

Types

The current types (for terms) are:

Type	Description	Literal example
`string`	String	`"hello"`
`numeric`	Floating point	`42.42`
`boolean`	Boolean	`TRUE`\|`FALSE`
`datetime`	datetime	n/a
`stringArray`	Array of strings	`("A", "B", "C")`
`numericArray`	Array of booleans	`(1, 2, 3)`
`booleanArray`	Array of booleans	`(TRUE, TRUE, FALSE)`

These can either be vectors (from a symbol ref) or non vectors (from a literal).

Expression

An expression is composed of one or more conditions anded together:

<expression> ::= <condition> ("AND" <condition>)*

For instance:
1 + 1 = 2 AND 4 = 2 + 2
is a valid expression that will evaluate to a non vector TRUE.

Condition

A condition must return a boolean.
This can be achieved by using a condition operator:

Operator	Description
`=`	Equal to
`<>`	Not equal to
`!=`	Not equal to
`>`	Greater than
`>=`	Greater than or equal to
`<`	Less than
`<=`	Less than or equal to
`IN`	Exists in array

If one side of a condition is a vector and the other a non vector, the result will be a vector of the same length as the vector side.
If neither side are vectors, the result will be a non vector. If both sides are vectors they must be of the same length.

Note on IN operator

The IN operator is the only operator that can be used to compare against an array type.
The array can be a literal array, or a symbol reference to an array column in the datasource.
For instance:

"B" IN ("A", "B", "C") -> TRUE

Arithmetic

Arithmetic operators are also supported on numeric types.
For instance: 1 + 1 = 2 will evaluate to TRUE.
Expressions are evaluated left to right.
Currently supported operations are:

+ Addition
- Subtraction
* Multiplication
/ Division

Functions

The following special functions are supported:

Function	Description
SUM(...)	Returns a non vector `numeric` sum of all arguments, if an argument is a vector, it will return the sum of all its elements. Arguments must be `numeric` type
DATETIME_FROM(dt, y, m, d)	Returns a `boolean` vector of the same length as `dt`, where the value is true if the date is within `y` years, `m` months, `d` days of the current datetime. `dt` must be type `datetime`, `y`, `m`, `d` must be type `numeric` and integers
DATETIME_TO(dt, y, m, d)	Returns a `boolean` vector of the same length as `dt`, where the value is false if the date is within `y` years, `m` months, `d` days of the current datetime. `dt` must be type `datetime`, `y`, `m`, `d` must be type `numeric` and integers

Examples

If we have a data source with a numeric symbol ref (column) amount = [1, 2, 3, 4, 5], we could have the following condition:

amount > 3 -> [FALSE, FALSE, FALSE, TRUE, TRUE]

We could also have:

amount > 3 AND amount < 5 -> [FALSE, FALSE, FALSE, TRUE, FALSE]

Or:

SUM(amount) = 15 -> TRUE

If we have another symbol ref date_time of type datetime = [now, 1 day ago, 2 days ago, 3 days ago, 4 days ago], we could have the following condition:

DATETIME_FROM(date_time, 0, 0, 3) -> [TRUE, TRUE, TRUE, FALSE, FALSE]

Data source

The only data source that currently exists is transactions.
This data source has the following columns:

Symbol Ref	Type	Description
`date_time`	`datetime`	The date time of the transaction
`credit_debit_indicator`	`string`	The credit debit indicator of the transaction (`"credit"`\|`"debit"`)
`amount`	`numeric`	The value of the transaction (negative if debit transaction)
`currency`	`string`	The three-letter ISO currency code of the transaction
`status`	`string`	The status of the transaction. (`"booked"`\|`"pending"`)
`category_l1`	`string`	l1 category of the transaction
`category_l2`	`string`	l2 category of the transaction
`merchant`	`string`	merchant slug for the transaction
`processor`	`string`	Payment processor associated with a Transaction
`regularity_frequency`	`string`	The regularity frequency of the transaction. `""` if not regular
`country`	`string`	An ISO-3166 Alpha-3 country code representing the detected country the transaction was located
`transaction_types`	`stringArray`	Transaction types associated with the transaction
`names`	`stringArray`	Names that were detected in the transaction description
`tags`	`stringArray`	A list of potential tags associated with the transaction after contextual enrichment

Important note on transactions

You MUST filter (where clause) on date_time in your query using theDATETIME_FROM function, else the time window of transactions the query will include will be non-deterministic.

Embedded Select Statements

It is possible to embed select statements within a condition to build up more complex queries. An embedded statement should be wrapped in parentheses. The structure of an embedded select is:

SELECT <term> FROM <datasource> [WHERE <expression>]

A common use case for this is to compare on income (see examples).

Examples

Active Gambler

Starting with a simple example, we can mark any customer who has made a gambling transaction in the last month as an active gambler.

INFER active_gambler FROM transactions 
WHERE category_l1 = "gambling_and_lotteries" 
  AND credit_debit_indicator = "debit"
  AND currency = "GBP"
  AND DATETIME_FROM(date_time, 0, 1, 0)

Big Gambler

Let's expand this further, and mark a gambler who spent more than 10% of their last months income on gambling:

INFER big_active_gambler FROM transactions 
WHERE category_l1 = "gambling_and_lotteries" 
  AND credit_debit_indicator = "debit"
  AND currency = "GBP"
  AND DATETIME_FROM(date_time, 0, 1, 0)
HAVING ABS(SUM(amount)) > SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 0, 1, 0)) * 0.1

Segment Active Gambler

Let's only apply this to customers who have an annual income between £40,000 and £50,000:

INFER segment_big_active_gambler FROM transactions 
WHERE category_l1 = "gambling_and_lotteries" 
  AND credit_debit_indicator = "debit"
  AND currency = "GBP"
  AND DATETIME_FROM(date_time, 0, 1, 0)
HAVING ABS(SUM(amount)) > SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 0, 1, 0)) * 0.1
  AND SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 1, 0, 0)) > 40000
  AND SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 1, 0, 0)) < 50000