Overview
In the previous post, a repository image running a web server was download from the Docker Hub Repository and a container instance was started. Additionally, an introduction to Docker networking was presented and a walkthrough of mapping external ports to container ports was demonstrated. This port mapping allowed a web browser running on the host computer to connect and communicate with the web server running in the Docker container. This posting will continue exploring containers with a discussion about data persistence using an example of running Microsoft SQL Server (for Linux) in a container on a development machine.
Prerequisites:
For this walkthrough, the following prerequisites must be met:
- Docker installed on the appropriate platform:
- Mac – Docker for Mac
- Windows with Hyper-V – Docker for Windows
- Windows with Oracle VirtualBox – Docker Toolbox
- At least 4GB of free memory (RAM)
- Available disk space for downloading repository images and running containers (for this initial post, 2GB will be required)
- Microsoft SQL Server Management Studio or another SQL Server Management tool such as Toad for SQL Server)
Note: for this walkthrough, all commands will be illustrated in a PowerShell prompt on Windows. Alternatively, a command prompt on windows or a shell/terminal prompt for Mac can be used as necessary.
Getting Started
This posting is going to take an iterative approach to running the Microsoft SQL Server (for Linux) container. The iterative approach was intentionally chosen to walk through the hurdles that might be encountered when running a more complex container as well as to demonstrate how volumes can be used to handle issues of data persistence.
Downloading a Repository Image from Docker Hub
As mentioned in the first posting, the process of pulling a repository image locally to the machine running Docker is optional as the docker run command will implicitly perform the pull if the image is not found. However, the Microsoft SQL Server repository image that will be used in this article is 1.31GB at the time of creating this post. Because of the large image size, it is recommended to pull this image locally using a fast, reliable internet connection. To pull a copy of the Developer Edition of Microsoft SQL Server for Linux locally, run the following command:
docker pull microsoft/mssql-server-linux:latest
One difference in this command from the ones used previously that is worth mentioning is that the name of the image has two parts separated by a forward slash (‘/’). This convention is used in public repositories to denote not only the image name but the author of the image. In this case, the author of the image is Microsoft and the image is named mssql-server-linux. This convention prevents conflicts when multiple repository image authors use the same name for a package.
Starting the SQL Server Container
Now that the image is pulled local, a SQL Server container can be started. However, when looking at the Docker Hub page for the repository, it states that there are more options than have been discussed in this blog series to date. The argument not previously covered in this blog series is the argument ‘-e’. The ‘-e’ argument is used to set environment variables inside the container at startup. In the case of the microsoft/mssql-server-linux repository image, there are two environment variables necessary to start up the container.
- The first environment variable is to set that the End User License Agreement (EULA) is accepted. To agree to the EULA, the value “ACCEPT_EULA=Y” must be passed.
- The other environment variable is to set the initial database administrator account password for the server. To set the initial database administrator account (userid = ‘sa’) password the value “SA_PASSWORD=<your_strong_password>” must be passed (where <your_strong_password> is an appropriately strong initial password). Also of note, the initial database administrator password, while a strong password, should not be the final password for the SQL Server instance in the container. Later in the blog, it will be shown that the value passed as an environment variable can be seen later after the container is running.
Now that the parameters have been discussed, start up the instance of the microsoft/mssql-server-linux repository image mapping the standard SQL Server port 1433 to an unused port on the host system using the following command:
docker run -p 1401:1433 -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=Pa%%word12#4" microsoft/mssql-server-linux:latest
When executing the command, one of two things will happen. Either the container will start and show several screens of logs from the SQL Server process or an error message will be displayed that indicates that the container requires more memory to run (“sqlservr: This program requires a machine with at least 3250 megabytes of memory”). The latter memory issue will likely be the experience the first time this container is attempted to be started as the default Docker install only pre-allocates 2048MB of memory for the virtual machine.
Updating the Docker Virtual Machine’s Allocated Memory
If the container failed to start because of the error indicating that the machine needs at least 3250MB of memory, then the Docker machine needs to be reconfigured to increase the amount of available memory. Changing the allocated memory to the Docker machine (at least with Docker for Windows) will require restarting the Docker machine. Therefore, before changing the allocated memory setting, ensure that all Docker containers are stopped. To increase the memory to 4098MB (recommendation for this article), the Docker settings screen must be launched (in Docker for Windows, right-clicking on the icon in the system tray and selecting Settings). Once in settings, the memory can be adjusted on the Advanced screen and then applied. After the Docker machine restarts, re-execute the command above.
Connecting to the SQL Server Container
With the SQL Server container running using the example command above, it is exposed to the host system on port 1401 (instead of the default SQL Server port of 1433). SQL Server Management Studio (or another similar SQL management tool) can be used to connect to the running instance. To connect to the instance, set up a connection with the following properties:
Once connected, it is highly advisable to change the initial database administrator account (‘sa’) password to a value different than what was used to build the container because all environment variables used to start up the container are not secure and visible in several ways when inspecting a container. (This instance of the container will be destroyed shortly, so changing the database administrator password can wait until later.)
To illustrate this point, the docker exec command will be used to run a command in the container. Begin by using the docker ps command in a new terminal window to retrieve the Container ID for the SQL Server container that was started. Once the Container ID is determined (for this illustration, the Container ID used will be ed24bc83fcd9) use the docker exec command to print the environment variables as follows:
docker exec ed24bc83fcd9 printenv
The result of the command will be a printout of the environment variables set within the container and the database administrator’s initially chosen password will be displayed in clear text.
Stopping and Restarting a Container
The next point to cover with respect to the new SQL Server container instance is to prove that it can save data and survive being restarted. To prove that the container is storing the data, begin by using a SQL management tool to create a database called ‘test’ in the SQL Server instance. Once the database is created, close the SQL management tool, return to the original command window showing the container logs and press CTRL+C to return to the command prompt. Once at the command prompt, stop the container by running the following command (again, using the Container ID determined above from the docker ps command):
docker stop ed24bc83fcd9
Since the container is now stopped, the docker ps command will no longer return it in the list. However, the docker ps -a command (where -a includes all instances, including stopped instances) will still show the container in the list. Next, start the container again by using the following command:
docker start ed24bc83fcd9
After the container starts, connect again using the SQL management tool and verify that the ‘test’ database is still present in the instance. What does this show? For development purposes, a single SQL Server container could be created and that data will be saved. Then the container can be stopped and started as necessary, day after day, without losing data.
Container Idempotence and Data Persistence
First, a little theory on containers. Containers are built around the concept of idempotence. That is, every time a container is created, it is identical to all the containers created previously or in the future (given the same version of the repository image). This works great in cases where a container is built from the binaries of an application (say a web server instance with a single web site). In this case, each instance of the container will have all the same settings, the same version of the website and can be placed behind a load balancer or used with a container orchestration scale-set. When a new version of an application is produced, a new version of the repository image is created and then all container instances can be swapped out for new instances based on the new repository image.
Given that description, putting SQL Server in a container is a little of an anti-pattern for most usages. If the SQL Server container is used to host static, read-only data for advanced querying, it still matches the above idempotence pattern. However, read/write databases, by their nature, are not idempotent. Remember, the container created above had a database added to it after it was created. Therefore, it has changed data and the next container image created from the repository image will not have the changes (e.g. the ‘test’ database). To prove this, stop and remove the container with the following commands:
docker stop ed24bc83fcd9 docker rm ed24bc83fcd9
Next, recreate the SQL Container again using the original command we used above:
docker run -p 1401:1433 -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=Pa%%word12#4" microsoft/mssql-server-linux:latest
Again, connect to the instance using the information in the “Connecting to the SQL Server Container” section above. Remember to use the database administrator password set in the command line. Upon connecting to the instance, the ‘test’ database will not be present anymore.
Using Volumes
So now that it is apparent that storing databases in the container is not a good idea, how can this be changed so that data is not lost when a container is recreated from the repository image? Docker volumes are an answer. Think of Docker volumes as a mapped drive to external storage — or, more specifically in Linux, a mount point to external storage. In Microsoft Windows terms, think of a volume as being a thumb-drive inserted into a USB port and having it show up in Explorer as a new drive letter. If Windows is reinstalled and the drive is inserted again, the data will still be there at that drive letter. In Linux, this is similar but drive letters are not used and drives show up as folders in a folder usually called mnt (short for mount).
How does this apply to Docker Volumes? Docker Volumes are created in the Docker Machine and can be attached to containers when started and are mounted at a provided location. So, given that Microsoft SQL Server for Linux stores all its databases and configuration information in the /var/opt/mssql folder, the command to run the container will be changed to mount a Docker Volume in that location instead. This means that at startup, the master, model, tempdb (as well as all other databases) will be created in the volume instead. If a container instance is removed and recreated from the repository image with the volume is specified again, the data will still be present in the new container. To illustrate this point, stop and remove the current SQL Server container and start up a new container using the following command (adding another parameter):
docker run -p 1401:1433 -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=Pa%%word12#4" --mount source=sql-data-container,target=/var/opt/mssql microsoft/mssql-server-linux:latest
Notice the ‘–mount’ argument was used. The syntax of this arguments specifies the source location to be a named volume and the target location to be a path inside the container. Additionally, notice that the above command used a volume name for the source location. Docker will auto-create the named volume if it is not found. Alternatively, the command docker volume create sql-data-container could have been executed manually to create the volume ahead of launching the new container.
Now that the container is running with the databases in an attached volume, connect again with your SQL management tool and create a new database ‘volumetest’. After creating the database, close the SQL management tool, stop and remove the new container and then recreate the container with the same command:
docker run -p 1401:1433 -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=Pa%%word12#4" --mount source=sql-data-container,target=/var/opt/mssql microsoft/mssql-server-linux:latest
Using the SQL management tool to reconnect to the instance, it will be observed that the ‘volumetest’ database is present. Again, this is because in the Microsoft SQL Server for Linux version, all databases are stored in the /var/opt/mssql folder location and because we mounted a volume there before the first startup, the SQL Server put all the databases in the volume.
To wrap up this article’s discussion of volumes, the Docker command to display volumes will be reviewed. To display the listing of volumes that have been created use the following command:
docker volume list
Disk Usage
Given that the container has been created and a volume was used to persist the databases, it might be interesting to look at the space consumed in the Docker machine. To look at the space usage of the containers, repository images and volumes, the docker system command can be used. Run the following command to see the usages:
docker system df -v
From the reported listing the reported space utilizations (based on the time of authoring of this article) are approximately:
- Repository Image: 1.315GB
- container: 0B
- volume: 80.82MB
Wrapping up
Like the earlier posts, this post would not be complete without a discussion of cleaning up all traces of what was created. After stopping and removing the container, the volume can also be removed if it is no longer needed.
Removing the Volume
To delete a volume (after removing the container that uses it), the following command can be executed:
docker volume rm sql-data-container
The removal can be confirmed by the docker volume list command again.
Need help deploying or automating containers on-premise or in the cloud? Check out our Cloud & Hybrid IT offerings or reach out. We’d love to get you started.