Use and meaning of Expressions in CollectiveAccess
Expressions are statements that are evaluated by CollectiveAccess and result in a quantity.
Currently, Expressions can be used to set triggers in an import mapping. In this context the result that matters is true or false. Truth here is "true" as defined in the mathematical boolean logic. Expression can do sophisticated comparisons of values in a record and use the results of these comparisons to trigger actions, including (for now) skipping of records or (soon) rewriting data in the process of importing it.
At its simplest an expression is a number or text quantity. These are examples of perfectly valid expressions:
- "Software is great"
You'll notice in the examples above that numbers are just numbers while text must be enclosed in quotes (single or double). Any quantity that is non-empty and non-zero will evaluate to "true" meaning that 5 = true while 0 = false. -1 is also true, since it's non-zero. Any strings besides "" (no text at all) is true, even " " (a single space).
While single values are valid expressions, they're not very useful. The utility really begins with operators. Operators are symbols that take two operands (values), perform some operation, and return a new value based upon the operands. There are several kinds of operators available to our expressions. They are:
Comparison operators compare two operands and return true or false. The most common operator is "=", which returns true if the operands are exactly the same, false if they are not. For example, "wood" = "wood" is true whereas "wood" = "cement" is not.
In an import mapping it is possible to use the "=" operator to check if an input field is a certain value.
Other comparison operators are:
- > greater than
- < less than
- >= greater than or equal
- <= less than or equal
- <> not equal
- != not equal (alternate form)
Greater than/less than operators only work with numeric values. Equal and not equal work with numbers or text.
With expressions you can perform mathematical operations on numbers using +, -, * and /. These are addition, subtraction, multiplication and division respectively. The + operator also works on text, and will merge two text values together into a single run-on text value. For example:
- 4 + 5 returns 9
- "Julia" + " plus " + "Allison" returns "Julia plus Allison"
It is also possible to string together many expressions into a larger composite expression using the boolean logic operators "AND" and "OR". "AND" returns true if, and only if, both operands evaluates as true. "OR" returns true if, and only if, at least one operand evaluates as true. For example:
- (5 > 10) AND ("seth" = "seth") is false because 5 is not greater than 10, and both expressions need to be true for the composite AND to be true
- (5 > 10) OR ("seth" = "seth") is true because "seth" = "seth" is true and only one needs to be true for logical OR to return true
Additional comparison operators
The comparison operators shown above are useful but limited. There are a couple of additional ones that are really where the action is :-) They are:
The "IN" operator.
"IN" lets you compare a value to a list of values. It returns true if ANY value in the list matches the value you are comparing. For example:
- "Seth" IN ("Julia", "Allison", "Sophie", "Maria", "Angie", "Seth") returns true
- "Joe" IN ("Julia", "Allison", "Sophie", "Maria", "Angie", "Seth") returns false
There's also the related "NOT IN" operator which returns true if the value is not in the list.
from v1.6 onwards, the notation changes slightly. The arrays after the IN and NOT IN operators must be enclosed in square parens, like so:
- "Joe" IN ["Julia", "Allison", "Sophie", "Maria", "Angie", "Seth"]
The =~ (regular expression) operator.
You can compare a value against a regular expression using the =~ operator. Regular expressions are a very powerful and very flexible pattern matching syntax. At its most basic a regular expression is a simple bit of text that is matched anywhere in the value being compared. For example:
- "Software is great" =~ /soft/ returns true
Notice that the regular expression is on the right side of the operator and is enclosed in "/" characters. This is a traditional notation for regular expressions; they are enclosed in the forward slashes to set them off from normal text.
There's also a related !~ operator that returns true when the value does not match the regular expression.
This is all well and good, but the above examples are not terribly useful with hardcoded values in them. Where things start getting truly useful is variables. Any source in an import record can be used as a variable by prefixing its name with a "^" character. So if you were importing an Excel spreadsheet and wanted to apply rules when the word "allison" appears anywhere in the value of column 4 you'd write
- ^4 =~ /allison/
Similarly, if you want to make sure that the value in the 10th column is equal to "metal" then you use the expression:
- ^10 = "metal"
If you wanted to make sure that both conditions applied to a record then you'd use:
- (^4 =~ /allison/) AND (^10 = "metal")
If either would suffice you could use "OR" rather than "AND"
For XML input data the variable names are the XML paths – the exact same thing used in the source specification but with a "^" tacked onto the front.
Functions are black-boxes that you put a number of values into in order to get a single value out of. The expression system current allows the following functions, all of which are mathematical:
- abs = returns the absolute value of a number (eg. changes negative numbers to positive ones); takes a single value as input
- ceil = rounds a fractional number up to the next highest integer; takes a single value as input
- floor = rounds a fractional number down to the next lower integer; takes a single value as input
- int = forces a number to be an integer. If the number has a decimal component it is discarded; takes a single value as input
- max = returns the largest value of those passed to it; takes any number of values as input
- min = returns the smallest value of those passed to it; takes any number of values as input
- round = rounds the number to the closest integer; takes a single value as input
- random = returns a random number between zero and the number provided as input ; takes a single value as input
- current = evaluates to true if the supplied date expression encompasses the current server date/time [available from version 1.5]
- future = evaluates to true if the supplied date expression ends any time after the current server date/time. The start date is not considered, so the range may start before or after the current date/time and still evaluate to true [available from version 1.5]
- wc = returns number of words (wc = "word count") in a supplied text value [available from version 1.5]
- length = returns number of characters in a supplied text value [available from version 1.5]
- sizeof - returns number of parameters. useful for counting values. see example below [available from version 1.6]
- age - calculates age in years. accepts an arbitrary number of parameters greater than 1. It'll take the earliest and latest dates in the parameter list as start and end of the time span, so you don't have to worry about the order. If the result is a span of 0 years (e.g. because only 1 date was passed), it'll retry with the current date added to the list. This is useful to calculate something's/someone's current age. [available from version 1.6]
- ageyears - alias for age [available from version 1.6]
- agedays - same as age/ageyears, only for days. [available from version 1.6]
- avgdays - calculates the average length of the time spans passed as parameters. Accepts an arbitrary number of parameters (>1). [available from version 1.6]
- formatdate - formats a valid date expression using PHP's date() function. Formats dates as ISO by default but accepts an optional second parameter to specify the format that gets passed to date(). See the PHP documentation for available options. [available from version 1.6]
- formatgmdate - formats a valid date expression in UTC using PHP's gmdate() function. Formats dates as ISO by default but accepts an optional second parameter to specify the format that gets passed to gmdate(). See the PHP documentation for available options. [available from version 1.6]
- isvaliddate - returns true if parameter parses as a valid date [available from version 1.7]
- join - returns a list of values delimited by the first argument. All other arguments are values. Alias implode. [available from version 1.7 or 1.6.2]
- trim - trims leading and trailing whitespace from a string. [available from version 1.7 or 1.6.2]
To include the function-produced value in your expression just add the function name with a paren-enclosed list of values following. For example:
- random(10) > 5 returns true if the random number between 0 and 10 is greater than 5
- ceil(5.2) returns 6
- floor(5.6) returns 5
- round(5.2) returns 5
- round(5.6) returns 6
- length("hello") returns 5
- sizeof(1,2,3,4) returns 4
- age("23 June 1912", "7 June 1954") returns 41
- age("7 June 1954", "23 June 1912") returns 41 (order doesn't matter)
- age("7 June 1954", "9 May 1945", "23 June 1912") returns 41 ('extra' dates don't matter)
- age("28 January 1985") returns something > 29; 30 if you run it before 28 January 2016
- agedays("23 June 1912", "7 June 1954") returns 15324
- agedays("1912/06/23") returns something > 37653
- avgdays("1912/06/23 - 1954/06/07", "1985/01/28 - 2015/07/24") returns 13229
- avgdays("1945/01/02 - 1945/01/03", "1985/01/28 - 1985/01/29") returns 1
- formatdate("1985/01/28") returns 2015-08-05T14:28:31-04:00. Note that this result can vary based on your time zone setting in setup.php!
- formatgmdate("1985/01/28") returns 1985-01-28T05:00:00+00:00. Note that this result can vary based on your time zone setting in setup.php!
- formatgmdate("1985/01/28", "Y") returns 1985
- trim(" this text has spaces at the end ") returns "this text has spaces at the end"
- join(", ", "Smith", "Bob") returns "Smith, Bob"
You may have noticed that parens have been sprinkled through some of the examples. You can use matched parens to group elements of an expression. This makes it easier to read and also ensures that operators are applied in the desired sequence in complex expressions. The three things you need to know about parens are: (1) each paren'ed sub-expression is evaluated as a single unit, before being combined with other sub-expressions (2) you must always match each opening paren with a closing paren and (3) parens don't hurt anything, but can improve readability of the expression so you are encouraged to use them liberally.