Hands-On Graph Analytics with Neo4j
上QQ阅读APP看书,第一时间看更新

Using aggregation functions

It is often very useful to compute some aggregated quantities for the entities in our database, such as the number of friends in a social graph or the total price of an order for an e-commerce website. We will discover here how to do those calculations with Cypher.

Count, sum, and average

In a similar way to SQL, you can compute aggregates with Cypher. The main difference with SQL is that there is no need to use a GROUP BY statement; all fields that are not in an aggregation function will be used to create groups:

MATCH (FL:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN FL.name as state_name, COUNT(n.code) as number_of_neighbors

The result is the following one, as expected:

╒════════════╤═════════════════════╕
│"state_name"│"number_of_neighbors"│
╞════════════╪═════════════════════╡
│"Florida" │2 │
└────────────┴─────────────────────┘

The following aggregate functions are available:

  • AVG(expr): available for numeric values and durations
  • COUNT(expr): the number of rows with non-null expr
  • MAX(expr): the maximum value of expr over the group
  • MIN(expr): the minimum value of expr over the group
  • percentileCont(expr, p): the p (percentage) of expr over the group, interpolated
  • percentileDisc(expr, p): the p (percentage) of expr over the group
  • stDev(expr): the standard deviation of expr over the group
  • stDevP(expr): the population standard deviation of expr over the group
  • SUM(expr): available for numeric values and duration
  • COLLECT(expr): see the next section

For instance, we can compute the ratio between a state population and the sum of all people living in its neighboring states like so:

MATCH (s:State)-[:SHARE_BORDER_WITH]-(n)
WITH s.name as state, toFloat(SUM(n.population)) as neighbor_population, s.population as pop
RETURN state, pop, neighbor_population, pop / neighbor_population as f
ORDER BY f desc

The WITH keyword is used to perform intermediate operations.

Creating a list of objects

It is sometimes useful to aggregate several rows into a single list of objects. In that case, we will use the following:

COLLECT

For instance, if we want to create a list containing the code of the states sharing a border with Colorado:

MATCH (:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN COLLECT(n.code)

This returns the following result:

["GA","AL"]

Unnesting objects

Unnesting consists of converting a list of objects into rows, each row containing an item of the list. It is the exact opposite of COLLECT, which groups objects together into a list.

With Cypher, we will use the following statement:

UNWIND

For instance, the following two queries are equivalent:

MATCH (:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
WITH COLLECT(n.code) as codes
UNWIND codes as c
RETURN c

// is equivalent to, since COLLECT and UNWIND cancel each other:
MATCH (CO:State {code: "FL"})-[:SHARE_BORDER_WITH]-(n)
RETURN n.cod

This returns our well-known two state codes.

The UNWIND operation will be useful for data imports, since some files are formatted in a way that several pieces of information can be aggregated on a single row. Depending on the data format, this function can be useful when importing data into Neo4j, as we will see in the next section.