
Creating the users table
Our first table will store basic user account information: username, email, and password. To create the table, fire up the CQL shell (don't forget to use the USE "my_status";
statement if you are starting a fresh session) and enter the following CQL statement:
CREATE TABLE "users" ( "username" text PRIMARY KEY, "email" text, "encrypted_password" blob );
In the above statement, we created a new table called users
, which has three columns: username
and email
, which are text
columns, and encrypted_password
, which has the type blob
. The username
column acts as the primary key for the table.
Structuring of tables
Cassandra structures tables in rows and columns, just like a relational database. Also like a relational database, the columns available to a table are defined in advance. New columns cannot be added on-the-fly when inserting data, although it's possible to update an existing table's schema.
Every table defines one or more columns to act as the primary key; each row is uniquely identified by the value(s) in its primary key column(s), and those columns cannot be left blank in any row. Cassandra does not offer auto-incrementing primary keys; each row, when created, must be explicitly assigned a primary key by the client. One good way to structure the primary key is to use a natural key, which is a value that is fundamentally unique for each row you want to store. That's what we do in the users
table by making username
, a naturally unique identifier, the primary key.
Note
Before we continue, I should emphasize the importance of using the latest version of Cassandra. Cassandra and CQL have changed substantially over the past few years, and the current CQL version 3.1 is not backward compatible with older versions of the language. The examples in this book target the latest Cassandra version (at the time of writing, 2.1.2); make sure you're running this version or a newer one.
Table and column options
The table creation statement given in the previous section is very simple: little more than a list of column names and their respective types. In contrast, an SQL CREATE TABLE
statement will often look quite complex, with multiple options set on each column, such as defaults, constraints, and nullability flags. What sorts of bells and whistles can we add to a CQL table?
At the table level, Cassandra does have quite a few configuration options, known as table properties. These properties allow you to tune a wide range of under-the-hood aspects of the table, such as caching, compression, garbage collection, and read repair. Table properties do not, however, bear on the table's behavior from the application's standpoint. For this reason, we won't go into detail about them in this book.
Columns, on the other hand, have very few knobs to turn. In fact, outside of the type, each column is pretty much the same. There are several column options you may be accustomed to from SQL databases that don't carry over to Cassandra, such as the following:
- Cassandra doesn't have a concept of
NULL
values; columns either have data, or they don't. Primary key columns are always required to have a value; non-key columns are always optional. You will see the wordnull
appear in the cqlsh output, but that simply means there is no data in this column, and should not be confused with the concept ofNULL
in a relational database. - Cassandra doesn't support default values for columns. If a row is inserted without a value for a certain column, that column just doesn't have a value.
- Cassandra doesn't provide data validations like length limits or other more complex column constraints. As long as a value is of the right type for the column you're putting it in, it's valid.
Happily, most modern applications do not need to rely on the sorts of constraints listed above; domain modeling libraries and object mappers typically allow you to easily apply these constraints at the application level.
The type system
Each column defined for a table has a defined type. In the users
table, we used the text
and blob
types, but those are only two of the many types built in to Cassandra.
Cassandra has two types that store string data:
Neither of the above string types has a limit on the length of strings that can be stored. In CQL, string literals are surrounded in single quotes, 'like this'
.
Cassandra has three types that store integers:
- The
int
type stores 32-bit integers, which can store values ranging from approximately -2.1 billion to 2.1 billion - The
bigint
type stores 64-bit integers, which can store values from about -9 quintillion to 9 quintillion - The
varint
type stores integers of arbitrary size; it has no minimum or maximum value
All integer types are signed, meaning they can hold positive or negative integers. There are no unsigned numeric types in Cassandra. Integer literals in CQL, like in most languages, consist of an optional minus sign followed by one or more digits, such as 3549
.
Cassandra has three types that store non-integer numbers:
- The
float
type stores 32-bit floating point numbers. - The
double
type stores 64-bit floating point numbers. - The
decimal
type stores variable-precision decimal numbers, with no upper bound on size. Unlike a floating point number, a variable-precision decimal will never suffer from base 10 rounding errors in the fractional part of the number.
Like the integer types, the floating point and decimal types are always signed. In CQL, floating point and decimal literals are represented by an optional minus sign, followed by a series of digits, followed by a period, followed by another series of digits, such as 7152.6846
. You can also write them using exponential notation, like 9.021e14
.
Dates and times can be stored using the timestamp
type, which holds date/time data at millisecond precision. Timestamp literals are enclosed in single quotation marks like string literals, and take the format 'yyyy-mm-dd HH:mm:ssZ'
, for example '2014-05-18 15:49:31-0400'
. There is no type that stores dates without times, although the time portion of a timestamp literal can be omitted, which defaults the time to midnight in the given time zone. To represent timestamps with millisecond precision, you may also use a numeric literal with the number of milliseconds since midnight UTC on January 1, 1970, for instance, 1400442761830
.
Cassandra has two types that store universally unique identifiers:
Note
A UUID, which is short for universally unique identifier, is essentially a very large number generated in a specific way, designed to guarantee that the same UUID will never be generated anywhere in the world at any time. Version 1 UUIDs are generated using a high-precision timestamp and the generating computer's MAC address; the timestamp can be extracted from the UUID. Version 4 UUIDs use random or pseudorandom numbers.
CQL uses the canonical representation of UUIDs, which is a sequence of hexadecimal digits broken up in specific places by dashes, in the form 8-4-4-4-12. A CQL UUID literal is not surrounded by quotation marks or any other delimiter; for example, 550e8400-e29b-41d4-a716-446655440000
is a valid UUID literal. Most languages have UUID libraries available that will generate UUIDs and output them in the canonical format.
The boolean
type stores simple true/false values. A boolean literal is either true
or false
, with no surrounding quotation marks.
The blob
type stores unstructured binary data. Blobs are a good choice for images, audio, and encrypted data. In CQL, the blob literal is a sequence of hexadecimal digits, prefixed with 0x
, for instance, 0x1d4375023013dba2d5f9a
. Blob literals are not surrounded by quotation marks.
Now that you know the full range of types available in Cassandra, you may be wondering what purpose those types serve. In fact, the type system in Cassandra plays a few roles:
- Types are used for input validation. If you attempt to put a string value in an integer column, for instance, Cassandra will return an error.
- Type information is made available to client libraries; most adapters will return the results of queries with values represented using the appropriate data type for the language.
- In some scenarios, rows can be ordered by the value of a certain column. In that case, the type of the column determines the order of values in the column. For instance, in a
text
column,'2'
is larger than'10'
, but in anint
column,10
is larger than2
.
Note
You can find a full list of all the CQL data types in the DataStax CQL documentation, at http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/cql_data_types_c.html.
Now that we've got a firm grasp of how to create a Cassandra table and what options are available to us when creating columns, it's time to put the table to use.