Custom Insights

Introduction

Custom insights provide a powerful way to create your own customer level insights, using your own logic.
These insights are defined using our SQL like grammar, and are executed at the time of ingestion. Custom insights can be managed using the custom insights API, or those unfamiliar with SQL can create and manage them through the Bud console.


How to navigate to custom insights in the console

Start by logging in to the Bud console.

Once logged in click on Projects.

Then click on Configure and choose Custom insights.


From here you can now start creating your own custom insights directly in the console.

The rest of this guide will focus on creating custom insights using the API.


How to write a custom insight query

Overview

Broadly a custom insight query consists of four parts:

INFER <name> FROM <datasource> [WHERE <expression>] [HAVING <expression>]
  • INFER The name of the insight we are defining.
  • FROM The data source we are querying (in this case transactions).
    This datasource provides the columns we can query against, and any custom functions for that datatype.
  • WHERE (Optional) An expression that filters the data source, applied before the HAVING clause.
    The expression must return a boolean vector equal in length to the data source.
  • HAVING (Optional) An expression that must return a boolean.

If there is just a where clause, the insight will apply to the customer if the data source, after filtering, is not empty.
If there is just a having clause, the insight will apply if the having clause returns true.
If there is neither a having nor a where clause, the insight will never apply.

For instance, the following insight will always apply to a customer:

INFER always_true FROM transactions HAVING SUM(1, 2, 3) = 6

Types

The current types (for terms) are:

TypeDescriptionLiteral example
stringString"hello"
numericFloating point42.42
booleanBooleanTRUE|FALSE
datetimedatetimen/a
stringArrayArray of strings("A", "B", "C")
numericArrayArray of booleans(1, 2, 3)
booleanArrayArray of booleans(TRUE, TRUE, FALSE)

These can either be vectors (from a symbol ref) or non vectors (from a literal).

Expression

An expression is composed of one or more conditions anded together:

<expression> ::= <condition> ("AND" <condition>)*

For instance:
1 + 1 = 2 AND 4 = 2 + 2
is a valid expression that will evaluate to a non vector TRUE.

Condition

A condition must return a boolean.
This can be achieved by using a condition operator:

OperatorDescription
=Equal to
<>Not equal to
!=Not equal to
>Greater than
>=Greater than or equal to
<Less than
<=Less than or equal to
INExists in array

If one side of a condition is a vector and the other a non vector, the result will be a vector of the same length as the vector side.
If neither side are vectors, the result will be a non vector. If both sides are vectors they must be of the same length.

Note on IN operator

The IN operator is the only operator that can be used to compare against an array type.
The array can be a literal array, or a symbol reference to an array column in the datasource.
For instance:

"B" IN ("A", "B", "C") -> TRUE

Arithmetic

Arithmetic operators are also supported on numeric types.
For instance: 1 + 1 = 2 will evaluate to TRUE.
Expressions are evaluated left to right.
Currently supported operations are:

  • + Addition
  • - Subtraction
  • * Multiplication
  • / Division

Functions

The following special functions are supported:

FunctionDescription
SUM(...)Returns a non vector numeric sum of all arguments, if an argument is a vector, it will return the sum of all its elements. Arguments must be numeric type
DATETIME_FROM(dt, y, m, d)Returns a boolean vector of the same length as dt, where the value is true if the date is within y years, m months, d days of the current datetime. dt must be type datetime, y, m, d must be type numeric and integers
DATETIME_TO(dt, y, m, d)Returns a boolean vector of the same length as dt, where the value is false if the date is within y years, m months, d days of the current datetime. dt must be type datetime, y, m, d must be type numeric and integers

Examples

If we have a data source with a numeric symbol ref (column) amount = [1, 2, 3, 4, 5], we could have the following condition:

amount > 3 -> [FALSE, FALSE, FALSE, TRUE, TRUE]

We could also have:

amount > 3 AND amount < 5 -> [FALSE, FALSE, FALSE, TRUE, FALSE]

Or:

SUM(amount) = 15 -> TRUE

If we have another symbol ref date_time of type datetime = [now, 1 day ago, 2 days ago, 3 days ago, 4 days ago], we could have the following condition:

DATETIME_FROM(date_time, 0, 0, 3) -> [TRUE, TRUE, TRUE, FALSE, FALSE]

Data source

The only data source that currently exists is transactions.
This data source has the following columns:

Symbol RefTypeDescription
date_timedatetimeThe date time of the transaction
credit_debit_indicatorstringThe credit debit indicator of the transaction ("credit"|"debit")
amountnumericThe value of the transaction (negative if debit transaction)
currencystringThe three-letter ISO currency code of the transaction
statusstringThe status of the transaction. ("booked"|"pending")
category_l1stringl1 category of the transaction
category_l2stringl2 category of the transaction
merchantstringmerchant slug for the transaction
processorstringPayment processor associated with a Transaction
regularity_frequencystringThe regularity frequency of the transaction. "" if not regular
countrystringAn ISO-3166 Alpha-3 country code representing the detected country the transaction was located
transaction_typesstringArrayTransaction types associated with the transaction
namesstringArrayNames that were detected in the transaction description
tagsstringArrayA list of potential tags associated with the transaction after contextual enrichment

Important note on transactions

You MUST filter (where clause) on date_time in your query using theDATETIME_FROM function, else the time window of transactions the query will include will be non-deterministic.

Embedded Select Statements

It is possible to embed select statements within a condition to build up more complex queries. An embedded statement should be wrapped in parentheses. The structure of an embedded select is:

SELECT <term> FROM <datasource> [WHERE <expression>]

A common use case for this is to compare on income (see examples).

Examples

Active Gambler

Starting with a simple example, we can mark any customer who has made a gambling transaction in the last month as an active gambler.

INFER active_gambler FROM transactions 
WHERE category_l1 = "gambling_and_lotteries" 
  AND credit_debit_indicator = "debit"
  AND currency = "GBP"
  AND DATETIME_FROM(date_time, 0, 1, 0)

Big Gambler

Let's expand this further, and mark a gambler who spent more than 10% of their last months income on gambling:

INFER big_active_gambler FROM transactions 
WHERE category_l1 = "gambling_and_lotteries" 
  AND credit_debit_indicator = "debit"
  AND currency = "GBP"
  AND DATETIME_FROM(date_time, 0, 1, 0)
HAVING ABS(SUM(amount)) > SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 0, 1, 0)) * 0.1

Segment Active Gambler

Let's only apply this to customers who have an annual income between £40,000 and £50,000:

INFER segment_big_active_gambler FROM transactions 
WHERE category_l1 = "gambling_and_lotteries" 
  AND credit_debit_indicator = "debit"
  AND currency = "GBP"
  AND DATETIME_FROM(date_time, 0, 1, 0)
HAVING ABS(SUM(amount)) > SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 0, 1, 0)) * 0.1
  AND SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 1, 0, 0)) > 40000
  AND SUM(SELECT amount FROM transactions 
    WHERE "income" IN tags 
      AND currency = "GBP"
      AND DATETIME_FROM(date_time, 1, 0, 0)) < 50000