Custom Insights
Introduction
Custom insights provide a powerful way to create your own customer level insights, using your own logic.
These insights are defined using our SQL like grammar, and are executed at the time of ingestion. Custom Insights can be managed either through the Bud console, for those unfamiliar with SQL, or by using the custom insights API. The rest of this guide will focus on using the custom insights API.
How to write a custom insight query
Overview
Broadly a custom insight query consists of four parts:
INFER <name> FROM <datasource> [WHERE <expression>] [HAVING <expression>]
- INFER The name of the insight we are defining.
- FROM The data source we are querying (in this case transactions).
This datasource provides the columns we can query against, and any custom functions for that datatype. - WHERE (Optional) An expression that filters the data source, applied before the HAVING clause.
The expression must return a boolean vector equal in length to the data source. - HAVING (Optional) An expression that must return a boolean.
If there is just a where clause, the insight will apply to the customer if the data source, after filtering, is not empty.
If there is just a having clause, the insight will apply if the having clause returns true.
If there is neither a having nor a where clause, the insight will never apply.
For instance, the following insight will always apply to a customer:
INFER always_true FROM transactions HAVING SUM(1, 2, 3) = 6
Types
The current types (for terms) are:
Type | Description | Literal example |
---|---|---|
string | String | "hello" |
numeric | Floating point | 42.42 |
boolean | Boolean | TRUE |FALSE |
datetime | datetime | n/a |
stringArray | Array of strings | ("A", "B", "C") |
numericArray | Array of booleans | (1, 2, 3) |
booleanArray | Array of booleans | (TRUE, TRUE, FALSE) |
These can either be vectors (from a symbol ref) or non vectors (from a literal).
Expression
An expression is composed of one or more conditions anded together:
<expression> ::= <condition> ("AND" <condition>)*
For instance:
1 + 1 = 2 AND 4 = 2 + 2
is a valid expression that will evaluate to a non vector TRUE
.
Condition
A condition must return a boolean.
This can be achieved by using a condition operator:
Operator | Description |
---|---|
= | Equal to |
<> | Not equal to |
!= | Not equal to |
> | Greater than |
>= | Greater than or equal to |
< | Less than |
<= | Less than or equal to |
IN | Exists in array |
If one side of a condition is a vector and the other a non vector, the result will be a vector of the same length as the vector side.
If neither side are vectors, the result will be a non vector. If both sides are vectors they must be of the same length.
Note on IN operator
The IN operator is the only operator that can be used to compare against an array type.
The array can be a literal array, or a symbol reference to an array column in the datasource.
For instance:
"B" IN ("A", "B", "C") -> TRUE
Arithmetic
Arithmetic operators are also supported on numeric types.
For instance: 1 + 1 = 2
will evaluate to TRUE
.
Expressions are evaluated left to right.
Currently supported operations are:
+
Addition-
Subtraction*
Multiplication/
Division
Functions
The following special functions are supported:
Function | Description |
---|---|
SUM(...) | Returns a non vector numeric sum of all arguments, if an argument is a vector, it will return the sum of all its elements. Arguments must be numeric type |
DATETIME_FROM(dt, y, m, d) | Returns a boolean vector of the same length as dt , where the value is true if the date is within y years, m months, d days of the current datetime. dt must be type datetime , y , m , d must be type numeric and integers |
DATETIME_TO(dt, y, m, d) | Returns a boolean vector of the same length as dt , where the value is false if the date is within y years, m months, d days of the current datetime. dt must be type datetime , y , m , d must be type numeric and integers |
Examples
If we have a data source with a numeric
symbol ref (column) amount
= [1, 2, 3, 4, 5]
, we could have the following condition:
amount > 3 -> [FALSE, FALSE, FALSE, TRUE, TRUE]
We could also have:
amount > 3 AND amount < 5 -> [FALSE, FALSE, FALSE, TRUE, FALSE]
Or:
SUM(amount) = 15 -> TRUE
If we have another symbol ref date_time
of type datetime
= [now, 1 day ago, 2 days ago, 3 days ago, 4 days ago]
, we could have the following condition:
DATETIME_FROM(date_time, 0, 0, 3) -> [TRUE, TRUE, TRUE, FALSE, FALSE]
Data source
The only data source that currently exists is transactions
.
This data source has the following columns:
Symbol Ref | Type | Description |
---|---|---|
date_time | datetime | The date time of the transaction |
credit_debit_indicator | string | The credit debit indicator of the transaction ("credit" |"debit" ) |
amount | numeric | The value of the transaction (negative if debit transaction) |
currency | string | The three-letter ISO currency code of the transaction |
status | string | The status of the transaction. ("booked" |"pending" ) |
category_l1 | string | l1 category of the transaction |
category_l2 | string | l2 category of the transaction |
merchant | string | merchant slug for the transaction |
processor | string | Payment processor associated with a Transaction |
regularity_frequency | string | The regularity frequency of the transaction. "" if not regular |
country | string | An ISO-3166 Alpha-3 country code representing the detected country the transaction was located |
transaction_types | stringArray | Transaction types associated with the transaction |
names | stringArray | Names that were detected in the transaction description |
tags | stringArray | A list of potential tags associated with the transaction after contextual enrichment |
Important note on transactions
You MUST filter (where clause) on date_time in your query using theDATETIME_FROM
function, else the time window of transactions the query will include will be non-deterministic.
Embedded Select Statements
It is possible to embed select statements within a condition to build up more complex queries. An embedded statement should be wrapped in parentheses. The structure of an embedded select is:
SELECT <term> FROM <datasource> [WHERE <expression>]
A common use case for this is to compare on income (see examples).
Examples
Active Gambler
Starting with a simple example, we can mark any customer who has made a gambling transaction in the last month as an active gambler.
INFER active_gambler FROM transactions
WHERE category_l1 = "gambling_and_lotteries"
AND credit_debit_indicator = "debit"
AND currency = "GBP"
AND DATETIME_FROM(date_time, 0, 1, 0)
Big Gambler
Let's expand this further, and mark a gambler who spent more than 10% of their last months income on gambling:
INFER big_active_gambler FROM transactions
WHERE category_l1 = "gambling_and_lotteries"
AND credit_debit_indicator = "debit"
AND currency = "GBP"
AND DATETIME_FROM(date_time, 0, 1, 0)
HAVING ABS(SUM(amount)) > SUM(SELECT amount FROM transactions
WHERE "income" IN tags
AND currency = "GBP"
AND DATETIME_FROM(date_time, 0, 1, 0)) * 0.1
Segment Active Gambler
Let's only apply this to customers who have an annual income between £40,000 and £50,000:
INFER segment_big_active_gambler FROM transactions
WHERE category_l1 = "gambling_and_lotteries"
AND credit_debit_indicator = "debit"
AND currency = "GBP"
AND DATETIME_FROM(date_time, 0, 1, 0)
HAVING ABS(SUM(amount)) > SUM(SELECT amount FROM transactions
WHERE "income" IN tags
AND currency = "GBP"
AND DATETIME_FROM(date_time, 0, 1, 0)) * 0.1
AND SUM(SELECT amount FROM transactions
WHERE "income" IN tags
AND currency = "GBP"
AND DATETIME_FROM(date_time, 1, 0, 0)) > 40000
AND SUM(SELECT amount FROM transactions
WHERE "income" IN tags
AND currency = "GBP"
AND DATETIME_FROM(date_time, 1, 0, 0)) < 50000
Updated 8 months ago