Vitess is a database solution for deploying, scaling and managing large clusters of MySQL instances. It’s architected to run as effectively in a public or private cloud architecture as it does on dedicated hardware. It combines and extends many important MySQL features with the scalability of a NoSQL database. Vitess can help you with the following problems:
- Scaling a MySQL database by allowing you to shard it, while keeping application changes to a minimum.
- Migrating from baremetal to a private or public cloud.
- Deploying and managing a large number of MySQL instances.
Vitess includes compliant JDBC and Go database drivers using a native query protocol. Additionally, it implements the MySQL server protocol, which is compatible with virtually any other language.
Vitess has been serving all YouTube database traffic since 2011, and has now been adopted by many enterprises for their production needs.
The following example will use a simple commerce database to illustrate how Vitess can take you through the journey of scaling from a single database to a fully distributed and sharded cluster. This is a fairly common story, and it applies to many use cases beyond e-commerce.
It’s 2018 and, no surprise to anyone, people are still buying stuff online. You recently attended the first half of a seminar on disruption in the tech industry and want to create a completely revolutionary e-commerce site. In classic tech postmodern fashion, you call your products widgets instead of a more meaningful identifier and it somehow fits.
Naturally, you realize the need for a reliable transactional datastore. Because of the new generation of hipsters, you’re probably going to pull traffic away from the main industry players just because you’re not them. You’re smart enough to foresee the scalability you need, so you choose Vitess as your best scaling solution.
Prerequisites
Before we get started, let’s get a few things out of the way.
- Download vitess
- Install Minikube
- Start a minikube engine:

minikube start --cpus=4 --memory=5000

Note the additional resource requirements: in order to go through all the use cases, many vttablet and mysql instances will be launched, and these require more resources than the defaults used by minikube.
- Install etcd-operator
- Install helm
- After installing, run
helm init
Optional
- Install mysql client. On Ubuntu:
apt-get install mysql-client
- Install go 1.11+
- Install vtctlclient:

go get vitess.io/vitess/go/cmd/vtctlclient

vtctlclient will be installed at $GOPATH/bin/
Starting a single keyspace cluster
So you searched keyspace on Google and got a bunch of stuff about NoSQL… what’s the deal? It took a few hours, but after diving through the ancient Vitess scrolls you figure out that in the NewSQL world, keyspaces and databases are essentially the same thing when unsharded. Finally, it’s time to get started.
Change to the helm example directory:
cd examples/helm
In this directory, you will see a group of yaml files. The first digit of each file name indicates the phase of the example. The next two digits indicate the order in which to execute them. For example, ‘101_initial_cluster.yaml’ is the first file of the first phase. We shall execute that now:
helm install ../../helm/vitess -f 101_initial_cluster.yaml
This will bring up the initial Vitess cluster with a single keyspace.
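The file-naming convention above is easy to check mechanically. As a sketch, here is a hypothetical helper (the function name is invented; `101_initial_cluster.yaml` is from the example) that splits a file name into its phase and execution order:

```python
# Hypothetical helper illustrating the naming convention of the helm
# example files: first digit = phase, next two digits = execution order.
def parse_example_filename(name: str) -> dict:
    prefix = name.split("_", 1)[0]  # e.g. "101"
    return {"phase": int(prefix[0]), "order": int(prefix[1:3])}

print(parse_example_filename("101_initial_cluster.yaml"))
# → {'phase': 1, 'order': 1}
```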
Verify cluster
Once successful, you should see the following state:
~/...vitess/helm/vitess/templates> kubectl get pods,jobs
NAME READY STATUS RESTARTS AGE
po/etcd-global-2cwwqfkf8d 1/1 Running 0 14m
po/etcd-operator-9db58db94-25crx 1/1 Running 0 15m
po/etcd-zone1-btv8p7pxsg 1/1 Running 0 14m
po/vtctld-55c47c8b6c-5v82t 1/1 Running 1 14m
po/vtgate-zone1-569f7b64b4-zkxgp 1/1 Running 2 14m
po/zone1-commerce-0-rdonly-0 6/6 Running 0 14m
po/zone1-commerce-0-replica-0 6/6 Running 0 14m
po/zone1-commerce-0-replica-1 6/6 Running 0 14m
NAME DESIRED SUCCESSFUL AGE
jobs/commerce-apply-schema-initial 1 1 14m
jobs/commerce-apply-vschema-initial 1 1 14m
jobs/zone1-commerce-0-init-shard-master 1 1 14m
If you have installed the mysql client, you should now be able to connect to the cluster using the following command:
~/...vitess/examples/helm> ./kmysql.sh
mysql> show tables;
+--------------------+
| Tables_in_commerce |
+--------------------+
| corder |
| customer |
| product |
+--------------------+
3 rows in set (0.01 sec)
You can also browse to the vtctld console using the following command:
./kvtctld.sh
Minikube Customizations
The helm example is based on the `values.yaml` file provided as the default helm chart for Vitess. The following overrides have been performed in order to run under minikube:
- `resources` have been nulled out. This instructs the Kubernetes environment to use whatever is available. Note, this is not recommended for a production environment. In such cases, you should start with the baseline values provided in `helm/vitess/values.yaml` and iterate from those.
- etcd and vtgate replicas are set to 1. In a production environment, there should be 3-5 etcd replicas. The number of vtgates will need to scale up based on cluster size.
- `mysqlProtocol.authType` is set to `none`. This should be changed to `secret` and the credentials should be stored as Kubernetes secrets.
- A serviceType of `NodePort` is not recommended in production. You may choose not to expose these endpoints to anyone outside Kubernetes at all. Another option is to create Ingress controllers.
Topology
The helm chart specifies a single unsharded keyspace: `commerce`. Unsharded keyspaces have a single shard named `0`.
NOTE: keyspaces/shards are global entities of a cluster, independent of a cell. Ideally, you should list the keyspaces/shards separately, and for a cell only specify which of those keyspaces/shards are deployed in that cell. However, for simplicity, the existence of a keyspace/shard is implicitly inferred from the fact that it is mentioned under each cell.
In this deployment, we are requesting two `replica` type tablets and one `rdonly` type tablet. When deployed, one of the `replica` tablets will automatically be elected as master. In the vtctld console, you should see one `master`, one `replica` and one `rdonly` vttablet.
Replica tablets serve OLTP read traffic, whereas rdonly tablets serve analytics queries or perform cluster maintenance operations like backups and resharding. rdonly tablets are allowed to lag far behind the master because replication must be stopped to perform some of these functions.
In our use case, we are provisioning one rdonly replica per shard in order to perform resharding operations.
Schema
create table product(
sku varbinary(128),
description varbinary(128),
price bigint,
primary key(sku)
);
create table customer(
customer_id bigint not null auto_increment,
email varbinary(128),
primary key(customer_id)
);
create table corder(
order_id bigint not null auto_increment,
customer_id bigint,
sku varbinary(128),
price bigint,
primary key(order_id)
);
The schema has been simplified to include only those fields that are significant to the example:
- The `product` table contains the product information for all of the products.
- The `customer` table has an auto-increment customer_id column. A typical customer table would have a lot more columns, and sometimes additional detail tables.
- The `corder` table (named so because `order` is an SQL reserved word) has an auto-increment order_id column. It also has foreign keys into customer(customer_id) and product(sku).
VSchema
Since Vitess is a distributed system, a VSchema (Vitess schema) is usually required to describe how the keyspaces are organized.
{
"tables": {
"product": {},
"customer": {},
"corder": {}
}
}
With a single unsharded keyspace, the VSchema is very simple; it just lists all the tables in that keyspace.
NOTE: In the case of a single unsharded keyspace, a VSchema is not strictly necessary because Vitess knows that there are no other keyspaces, and will therefore redirect all queries to the only one present.
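Since the VSchema above is plain JSON, the "redirect everything to the only keyspace" behavior can be sketched in a few lines. The routing function below is a conceptual stand-in, not Vitess's actual implementation:

```python
import json

# The VSchema from above, for the single unsharded keyspace.
vschema = json.loads('{"tables": {"product": {}, "customer": {}, "corder": {}}}')

KEYSPACE = "commerce"

def resolve(table: str) -> str:
    # Conceptual stand-in: with one keyspace, every listed table resolves to it.
    if table not in vschema["tables"]:
        raise KeyError(f"unknown table: {table}")
    return f"{KEYSPACE}.{table}"

print(resolve("customer"))  # → commerce.customer
```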
Vertical Split
Due to a massive ingress of free-trade, single-origin yerba mate merchants to your website, hipsters are swarming to buy stuff from you. As more users flock to your website and app, the `customer` and `corder` tables start growing at an alarming rate. To keep up, you’ll want to separate those tables by moving `customer` and `corder` to their own keyspace. Since you only have as many products as there are types of yerba mate, you won’t need to shard the product table!
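Conceptually, a vertical split just changes which keyspace each table routes to. Here is an illustrative before/after mapping (the new keyspace name `customer` follows the example's convention; the mapping itself is a sketch, not Vitess internals):

```python
# Illustrative table→keyspace routing before the vertical split:
# everything lives in the commerce keyspace.
before = {"product": "commerce", "customer": "commerce", "corder": "commerce"}

# After the split, customer and corder move to their own keyspace,
# while product stays behind in commerce.
after = dict(before, customer="customer", corder="customer")

print(after)
# → {'product': 'commerce', 'customer': 'customer', 'corder': 'customer'}
```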
Let us add some data into our tables to illustrate how the vertical split works.
./kmysql.sh < ../common/insert_commerce_data.sql