Creating the users table_Learning Apache Cassandra-QQ阅读男生中文玄幻网

上QQ阅读APP看书，第一时间看更新

Creating the users table

Our first table will store basic user account information: username, email, and password. To create the table, fire up the CQL shell (don't forget to use the USE "my_status"; statement if you are starting a fresh session) and enter the following CQL statement:

CREATE TABLE "users" (
  "username" text PRIMARY KEY,
  "email" text,
  "encrypted_password" blob
);

In the above statement, we created a new table called users, which has three columns: username and email, which are text columns, and encrypted_password, which has the type blob. The username column acts as the primary key for the table.

Structuring of tables

Cassandra structures tables in rows and columns, just like a relational database. Also like a relational database, the columns available to a table are defined in advance. New columns cannot be added on-the-fly when inserting data, although it's possible to update an existing table's schema.

Every table defines one or more columns to act as the primary key; each row is uniquely identified by the value(s) in its primary key column(s), and those columns cannot be left blank in any row. Cassandra does not offer auto-incrementing primary keys; each row, when created, must be explicitly assigned a primary key by the client. One good way to structure the primary key is to use a natural key, which is a value that is fundamentally unique for each row you want to store. That's what we do in the users table by making username, a naturally unique identifier, the primary key.

Note

Before we continue, I should emphasize the importance of using the latest version of Cassandra. Cassandra and CQL have changed substantially over the past few years, and the current CQL version 3.1 is not backward compatible with older versions of the language. The examples in this book target the latest Cassandra version (at the time of writing, 2.1.2); make sure you're running this version or a newer one.

Table and column options

The table creation statement given in the previous section is very simple: little more than a list of column names and their respective types. In contrast, an SQL CREATE TABLE statement will often look quite complex, with multiple options set on each column, such as defaults, constraints, and nullability flags. What sorts of bells and whistles can we add to a CQL table?

At the table level, Cassandra does have quite a few configuration options, known as table properties. These properties allow you to tune a wide range of under-the-hood aspects of the table, such as caching, compression, garbage collection, and read repair. Table properties do not, however, bear on the table's behavior from the application's standpoint. For this reason, we won't go into detail about them in this book.

Columns, on the other hand, have very few knobs to turn. In fact, outside of the type, each column is pretty much the same. There are several column options you may be accustomed to from SQL databases that don't carry over to Cassandra, such as the following:

Cassandra doesn't have a concept of NULL values; columns either have data, or they don't. Primary key columns are always required to have a value; non-key columns are always optional. You will see the word null appear in the cqlsh output, but that simply means there is no data in this column, and should not be confused with the concept of NULL in a relational database.
Cassandra doesn't support default values for columns. If a row is inserted without a value for a certain column, that column just doesn't have a value.
Cassandra doesn't provide data validations like length limits or other more complex column constraints. As long as a value is of the right type for the column you're putting it in, it's valid.

Happily, most modern applications do not need to rely on the sorts of constraints listed above; domain modeling libraries and object mappers typically allow you to easily apply these constraints at the application level.

The type system

Each column defined for a table has a defined type. In the users table, we used the text and blob types, but those are only two of the many types built in to Cassandra.

Strings

Cassandra has two types that store string data:

The text type stores UTF-8 encoded strings. It's also aliased as the varchar type; you can use these interchangeably.
The ascii type stores strings of ASCII characters (bytes 0–127).

Neither of the above string types has a limit on the length of strings that can be stored. In CQL, string literals are surrounded in single quotes, 'like this'.

Integers

Cassandra has three types that store integers:

The int type stores 32-bit integers, which can store values ranging from approximately -2.1 billion to 2.1 billion
The bigint type stores 64-bit integers, which can store values from about -9 quintillion to 9 quintillion
The varint type stores integers of arbitrary size; it has no minimum or maximum value

All integer types are signed, meaning they can hold positive or negative integers. There are no unsigned numeric types in Cassandra. Integer literals in CQL, like in most languages, consist of an optional minus sign followed by one or more digits, such as 3549.

Floating point and decimal numbers

Cassandra has three types that store non-integer numbers:

The float type stores 32-bit floating point numbers.
The double type stores 64-bit floating point numbers.
The decimal type stores variable-precision decimal numbers, with no upper bound on size. Unlike a floating point number, a variable-precision decimal will never suffer from base 10 rounding errors in the fractional part of the number.

Like the integer types, the floating point and decimal types are always signed. In CQL, floating point and decimal literals are represented by an optional minus sign, followed by a series of digits, followed by a period, followed by another series of digits, such as 7152.6846. You can also write them using exponential notation, like 9.021e14.

Dates and times

Dates and times can be stored using the timestamp type, which holds date/time data at millisecond precision. Timestamp literals are enclosed in single quotation marks like string literals, and take the format 'yyyy-mm-dd HH:mm:ssZ', for example '2014-05-18 15:49:31-0400'. There is no type that stores dates without times, although the time portion of a timestamp literal can be omitted, which defaults the time to midnight in the given time zone. To represent timestamps with millisecond precision, you may also use a numeric literal with the number of milliseconds since midnight UTC on January 1, 1970, for instance, 1400442761830.

UUIDs

Cassandra has two types that store universally unique identifiers:

The uuid type stores Version 1 and Version 4 UUIDs
The timeuuid type stores Version 1 UUIDs, and has special functionality for conversion between UUIDs and timestamps

Note

A UUID, which is short for universally unique identifier, is essentially a very large number generated in a specific way, designed to guarantee that the same UUID will never be generated anywhere in the world at any time. Version 1 UUIDs are generated using a high-precision timestamp and the generating computer's MAC address; the timestamp can be extracted from the UUID. Version 4 UUIDs use random or pseudorandom numbers.

CQL uses the canonical representation of UUIDs, which is a sequence of hexadecimal digits broken up in specific places by dashes, in the form 8-4-4-4-12. A CQL UUID literal is not surrounded by quotation marks or any other delimiter; for example, 550e8400-e29b-41d4-a716-446655440000 is a valid UUID literal. Most languages have UUID libraries available that will generate UUIDs and output them in the canonical format.

Booleans

The boolean type stores simple true/false values. A boolean literal is either true or false, with no surrounding quotation marks.

Blobs

The blob type stores unstructured binary data. Blobs are a good choice for images, audio, and encrypted data. In CQL, the blob literal is a sequence of hexadecimal digits, prefixed with 0x, for instance, 0x1d4375023013dba2d5f9a. Blob literals are not surrounded by quotation marks.

The purpose of types

Now that you know the full range of types available in Cassandra, you may be wondering what purpose those types serve. In fact, the type system in Cassandra plays a few roles:

Types are used for input validation. If you attempt to put a string value in an integer column, for instance, Cassandra will return an error.
Type information is made available to client libraries; most adapters will return the results of queries with values represented using the appropriate data type for the language.
In some scenarios, rows can be ordered by the value of a certain column. In that case, the type of the column determines the order of values in the column. For instance, in a text column, '2' is larger than '10', but in an int column, 10 is larger than 2.

Note

You can find a full list of all the CQL data types in the DataStax CQL documentation, at http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/cql_data_types_c.html.

Now that we've got a firm grasp of how to create a Cassandra table and what options are available to us when creating columns, it's time to put the table to use.