How I Learned to Stop Worrying and Love Couchbase?
TL;DR 🤓
echo "ns_1@$(cat /proc/sys/kernel/random/uuid).\${!DNS_PRIVATE}" > /opt/couchbase/var/lib/couchbase/couchbase-server.node\
&& exec /entrypoint.sh couchbase-server
I inherited an Ansible playbook with Couchbase resources, which unfortunately did not survive an idempotency test. Since the infrastructure was already hosted in AWS on EC2 instances, I decided to replace it with a CloudFormation stack and run the database as Docker containers on ECS. We also use the vendor-maintained Dockerfile to keep things simple, which at the time of writing deploys Community Edition 6.0.0 build 1693.
This approach posed a number of challenges, since Couchbase doesn't play well in dynamic environments, especially with changing IP addresses. The Couchbase blog article on this subject is also incomplete. So, the following describes in some detail what I've done to solve (or work around) these challenges.
In broad terms, this solution is designed to automatically scale the Couchbase cluster as ECS container instances are added/removed by the EC2 Auto Scaling group. In our case, this is performed by a tiny orchestrator container called CouchbaseHelper. This container utilises EFS shared storage to manage cluster state and takes care of initialising the cluster, creating indexes, seeding data and adding/removing Couchbase servers (re-balancing). Local volume-mapped storage on the ECS container instances is used for Couchbase container data. We use awsvpc network mode for the Couchbase service, enabling the containers to be assigned VPC (private) IPs instead of local Docker bridge IPs. There is always exactly one helper container per ECS cluster and one Couchbase container per ECS container instance, enforced using the ECS service scheduling strategy, e.g.:
CouchbaseECSService:
  Type: 'AWS::ECS::Service'
  Properties:
    SchedulingStrategy: 'DAEMON'
    ...
As a starting point, I've used the excellent CFN template by https://cloudonaut.io/. This template builds out an ECS cluster with (almost) everything required for extension. In my fork, I've added a number of stack exports required to extend the solution further.
We start by creating:
- shared resources (e.g. VPC, IAM, ACM, ECR, S3, Route53, CloudWatch, etc.)
- EFS (NFS) shared storage
- ECS cluster
Having a main.yml CFN stack containing nested resources of Type: 'AWS::CloudFormation::Stack' is a relatively scalable way to organise your software stack. The following is a stub example of what your main.yml parent template may look like.
---
AWSTemplateFormatVersion: '2010-09-09'
Description: Couchbase
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: 'Nested templates'
        Parameters:
          - VPCTemplate
          - ECSTemplate
          - R53Template
          ...
Parameters:
  VPCTemplate:
    Description: 'Nested template containing VPC resources.'
    Type: String
    Default: ''
  ECSTemplate:
    Description: 'Nested template containing ECS resources.'
    Type: String
    Default: ''
  ...
Conditions:
  HasVPC: !Not [ !Equals [ '', !Ref 'VPCTemplate' ]]
  ...
Resources:
  VPCStack:
    Type: 'AWS::CloudFormation::Stack'
    Condition: HasVPC
    Properties:
      TemplateURL: !Ref 'VPCTemplate'
      Parameters:
        NameTag: !Sub '${AWS::StackName}'
        ...
  ECSStack:
    Type: 'AWS::CloudFormation::Stack'
    ...
Outputs:
  StackName:
    Value: !Ref 'AWS::StackName'
    Export:
      Name: !Sub 'StackName-${AWS::StackName}'
  VPCStack:
    Condition: HasVPC
    Value: !GetAtt [ VPCStack, Outputs.VPCStackName ]
    Export:
      Name: !Sub 'VPCStackName-${AWS::StackName}'
  ...
The first prerequisite for the orchestration to work is a private DNS namespace, where we can create unique DNS records for our Couchbase cluster nodes. While ECS automatically registers our containers using the AWS::ServiceDiscovery::PrivateDnsNamespace resource (which effectively creates a private DNS hosted zone in Route53), this hosted zone doesn't allow us to add our own custom DNS records to it. So within our main stack we create a route53.yml nested template containing a private hosted zone.
---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Route53 resources'
Parameters:
  NameTag:
    Type: String
  HostedZone:
    Type: String
  VpcId:
    Type: String
Resources:
  PrivateHostedZone:
    Type: 'AWS::Route53::HostedZone'
    Properties:
      HostedZoneConfig:
        Comment: !Sub 'Private hosted zone for ${VpcId}.'
      Name:
        Fn::Join:
          - ''
          - - 'private.'
            - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
            - !Sub '.${HostedZone}.'
      VPCs:
        - VPCId: !Ref 'VpcId'
          VPCRegion: !Ref 'AWS::Region'
      HostedZoneTags:
        - Key: Name
          Value: !Ref 'NameTag'
Outputs:
  R53StackName:
    Value: !Ref 'AWS::StackName'
    Export:
      Name: !Sub 'R53StackName-${AWS::StackName}'
  PrivateHostedZone:
    Value: !Ref 'PrivateHostedZone'
    Export:
      Name: !Sub 'PrivateHostedZone-${AWS::StackName}'
  DNSPrivate:
    Value:
      Fn::Join:
        - '.'
        - - 'private'
          - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
          - !Ref 'HostedZone'
    Export:
      Name: !Sub 'DNSPrivate-${AWS::StackName}'
We assemble our DNS name using the unique alphanumeric stack Id from AWS::StackName (e.g. private.y6w42p6ucx4m.grsThr!ve.com), so make sure to select the correct element from the split array.
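To make that concrete, here is roughly what the !Select/!Split pair does, expressed in plain bash (the stack name below is hypothetical; your naming depth determines which index holds the random suffix):
# hypothetical nested stack name generated by CloudFormation
stack_name='grs-thrive-prod-app-R53Stack-Y6W42P6UCX4M'
IFS='-' read -r -a parts <<< "${stack_name}"
echo "${parts[5]}"   # zero-based element 5 -> Y6W42P6UCX4M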
Next, within our main stack we nest our application stack (e.g. app.yml), which will contain all of our custom resources, such as ECS tasks, services and service discovery for Couchbase.
---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Application resources'
Parameters:
  NameTag:
    Type: String
  ...
Mappings:
  InstanceLookup:
    t3.medium:
      'DCPU': 512   # CPU limit
      'DMEM': 1920  # memory limit = (DATA+IDX+FTS) - reserve
      'DATA': 1024  # data memory size (Mb) = sum(bucket_memory)
      'IDX': 256    # index memory size (Mb)
      'FTS': 256    # full-text search memory size (Mb)
      # bucket-spec: "[<bucket_name>:<bucket_type>:<bucket_memory> ...]"
      'bucketspec': 'bucketA:couchbase:256 bucketB:couchbase:512 memcached:memcached:256'
  ...
Resources:
  ServiceDiscoveryNamespace:
    Type: 'AWS::ServiceDiscovery::PrivateDnsNamespace'
    Properties:
      Description: !Sub '${NameTag} discovery namespace.'
      Vpc: !Ref 'VpcId'
      Name:
        Fn::Join:
          - ''
          - - 'private.'
            - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
            - !Sub '.${HostedZone}.'
  CouchbaseDiscoveryService:
    Type: 'AWS::ServiceDiscovery::Service'
    Properties:
      Description: !Sub '${NameTag} Couchbase discovery service.'
      Name: !Sub '${NameTag}-couchbase'
      NamespaceId: !Ref 'ServiceDiscoveryNamespace'
      DnsConfig:
        DnsRecords:
          - Type: A
            TTL: 60
        NamespaceId: !Ref 'ServiceDiscoveryNamespace'
      HealthCheckCustomConfig:
        FailureThreshold: 1
  CouchbaseTaskDefinition:
    Type: 'AWS::ECS::TaskDefinition'
    Properties:
      Volumes:
        - Name: 'efs'
          Host:
            SourcePath: !Sub '/mnt/efs/${NameTag}'
        - Name: 'local'
          Host:
            SourcePath: !Sub '/opt/${NameTag}'
        - Name: 'local-couchbase-data'
          Host:
            SourcePath: !Sub '/opt/${NameTag}/couchbase-data'
      NetworkMode: awsvpc
      ContainerDefinitions:
        - Image: 'couchbase:community-6.0.0'
          Environment:
            - Name: NAME_TAG
              Value: !Sub '${NameTag}'
            - Name: AWS_REGION
              Value: !Ref 'AWS::Region'
            - Name: AWS_ACCOUNT_ID
              Value: !Ref 'AWS::AccountId'
            - Name: AWS_STACK_NAME
              Value: !Ref 'AWS::StackName'
            - Name: AWS_STACK_ID
              Value: !Ref 'AWS::StackId'
            - Name: ECS_CLUSTER
              Value: !Ref 'Cluster'
            - Name: PRIVATE_DNSNAME
              Value:
                Fn::Join:
                  - '.'
                  - - !Sub '${NameTag}-couchbase'
                    - 'private'
                    - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
                    - !Sub '${HostedZone}'
            - Name: DNS_PRIVATE
              Value: !Ref 'DNSPrivate'
          Command:
            - '/local-data/couchbase-bootstrap.sh'
          Cpu: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'DCPU' ]
          Memory: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'DMEM' ]
          DockerLabels:
            Name: !Sub '${NameTag}-couchbase'
          Ulimits:
            - HardLimit: 70000
              Name: nofile
              SoftLimit: 70000
          Privileged: true
          LinuxParameters:
            Capabilities:
              Add:
                - ALL
          LogConfiguration:
            LogDriver: 'awslogs'
            Options:
              awslogs-group: !Ref 'LogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: 'couchbase-server'
          MountPoints:
            - ContainerPath: '/shared-data'
              SourceVolume: 'efs'
            - ContainerPath: '/local-data'
              SourceVolume: 'local'
            - ContainerPath: '/opt/couchbase/var'
              SourceVolume: 'local-couchbase-data'
          Name: 'couchbase-container'
  HelperTaskDefinition:
    Type: 'AWS::ECS::TaskDefinition'
    Properties:
      Volumes:
        - Name: 'efs'
          Host:
            SourcePath: !Sub '/mnt/efs/${NameTag}'
        - Name: 'local'
          Host:
            SourcePath: !Sub '/opt/${NameTag}'
      ContainerDefinitions:
        - Image: 'couchbase:community-6.0.0'
          Environment:
            - Name: NAME_TAG
              Value: !Sub '${NameTag}'
            - Name: PRIVATE_DNSNAME
              Value:
                Fn::Join:
                  - '.'
                  - - !Sub '${NameTag}-couchbase'
                    - 'private'
                    - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
                    - !Sub '${HostedZone}'
            - Name: CB_MEM_DATA
              Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'DATA' ]
            - Name: CB_MEM_INDEX
              Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'IDX' ]
            - Name: CB_MEM_FTS
              Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'FTS' ]
            - Name: CB_BUCKETS
              Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'bucketspec' ]
            - Name: AWS_REGION
              Value: !Ref 'AWS::Region'
            - Name: AWS_ACCOUNT_ID
              Value: !Ref 'AWS::AccountId'
            - Name: AWS_STACK_NAME
              Value: !Ref 'AWS::StackName'
            - Name: AWS_STACK_ID
              Value: !Ref 'AWS::StackId'
            - Name: ECS_CLUSTER
              Value: !Ref 'Cluster'
          Command:
            - '/local-data/couchbase-init.sh'
          Cpu: 128
          Memory: 128
          DockerLabels:
            Name: !Sub '${NameTag}-helper'
          User: root
          Privileged: true
          LinuxParameters:
            Capabilities:
              Add:
                - ALL
          LogConfiguration:
            LogDriver: 'awslogs'
            Options:
              awslogs-group: !Ref 'LogGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: 'couchbase-helper'
          MountPoints:
            - ContainerPath: '/shared-data'
              SourceVolume: 'efs'
            - ContainerPath: '/local-data'
              SourceVolume: 'local'
          Name: 'helper-container'
  CouchbaseECSService:
    Type: 'AWS::ECS::Service'
    Properties:
      SchedulingStrategy: 'DAEMON'
      Cluster: !Ref 'Cluster'
      ServiceRegistries:
        - RegistryArn: !GetAtt CouchbaseDiscoveryService.Arn
          ContainerName: 'couchbase-container'
      TaskDefinition: !Ref 'CouchbaseTaskDefinition'
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          SecurityGroups:
            - !Ref 'SecurityGroup'
          Subnets: !Split [ ',', !Ref 'PrivateSubnets' ]
  CouchbaseHelperService:
    Type: 'AWS::ECS::Service'
    DependsOn: CouchbaseECSService
    Properties:
      DeploymentConfiguration:
        MinimumHealthyPercent: 0
      PlacementConstraints:
        - Type: 'memberOf'
          Expression: 'agentConnected == true'
      DesiredCount: !Ref 'DesiredCount'
      Cluster: !Ref 'Cluster'
      TaskDefinition: !Ref 'HelperTaskDefinition'
  ...
Outputs:
  AppStackName:
    Value: !Ref 'AWS::StackName'
    Export:
      Name: !Sub 'AppStackName-${AWS::StackName}'
  ...
Our task definitions are started using shell scripts that we create on each ECS container instance in /opt and map into the containers using a volume mount. To create the shell scripts, we use an EC2 sub-service called Systems Manager (SSM), nesting ssm.yml within our parent stack.
Firstly, SSM allows us to mount the EFS (NFS) storage on each ECS container instance using an AWS-RunShellScript association as follows.
MountNFS:
  Type: 'AWS::SSM::Association'
  Properties:
    Name: 'AWS-RunShellScript'
    Parameters:
      commands:
        - !Sub |
            echo ${CurrentTimeStamp}
            yum list installed nfs-utils || yum install -y nfs-utils
            which telnet || yum install -y telnet
            which dig || yum install -y bind-utils
            [ -d /mnt/efs ] || mkdir -p /mnt/efs
            grep ${StorageNFS}.efs.${AWS::Region}.amazonaws.com /etc/fstab\
              || echo '${StorageNFS}.efs.${AWS::Region}.amazonaws.com:/ /mnt/efs nfs nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 0 0' >> /etc/fstab
            mount | grep -q /mnt/efs\
              || while ! (echo > /dev/tcp/${StorageNFS}.efs.${AWS::Region}.amazonaws.com/2049) >/dev/null 2>&1; do sleep 10; done
            sleep 10
            mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 "${StorageNFS}.efs.${AWS::Region}.amazonaws.com:/" /mnt/efs
            chown -R 1000:1000 /mnt/efs/${NameTag}
    Targets:
      - Key: 'tag:aws:autoscaling:groupName'
        Values:
          - !Ref 'AutoScalingGroup'
The association is assigned to the AutoScalingGroup, meaning new ECS container instances automatically mount shared storage on start-up.
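A quick spot check on a fresh instance (e.g. over an SSM session) confirms the association did its job:
# verify the EFS mount created by the MountNFS association
mount | grep -q /mnt/efs && echo 'EFS mounted'
grep efs /etc/fstab   # persistent mount entry
df -h /mnt/efs        # capacity sanity check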
Using the following associations, we create a bootstrap script for the Couchbase containers as well as our main helper script.
The Couchbase bootstrap script generates unique hostnames for each fresh Couchbase node, upserts Route53 CNAME records to point to the internal EC2 hostname (e.g. ip-172-31-24-238.ec2.internal) and records these hostnames in the .uuids state database, which is effectively a text file on the shared storage mapped to each container. Lastly, it renames the Couchbase node from the default IP address to the uniquely generated hostname. For existing nodes (ECS container restarts) coming up on different EC2 private IPs, the script simply updates the Route53 record before handing back over to the default entrypoint script.
CouchbaseScripts:
  Type: 'AWS::SSM::Association'
  DependsOn: MountNFS
  Properties:
    Name: 'AWS-RunShellScript'
    Parameters:
      commands:
        - !Sub |
            echo ${CurrentTimeStamp}
            mkdir -p /opt/${NameTag}
            mkdir -p /opt/${NameTag}/couchbase-data
            chown -R 1000:1000 /opt/${NameTag}
            # -------------------------- #
            # Couchbase bootstrap script #
            # -------------------------- #
            cat << EOF > /opt/${NameTag}/couchbase-bootstrap.sh
            #!/usr/bin/env bash
            curl_opts='--silent --fail --retry 3'
            PATH=/root/.local/bin:\${!PATH}
            [ -f /root/.local/bin/pip ] || (wget --quiet https://bootstrap.pypa.io/get-pip.py && python get-pip.py --user)
            [ -f /root/.local/bin/aws ] || /root/.local/bin/pip install awscli --user --quiet
            ec2_ip=\$(hostname -i)
            ec2_hostname=\$(hostname)
            mkdir -p /opt/couchbase/var/lib/couchbase
            if [ -f /opt/couchbase/var/lib/couchbase/ip ] || [ -f /opt/couchbase/var/lib/couchbase/ip_start ]; then
              cb_hostname=\$(cat /opt/couchbase/var/lib/couchbase/ip || cat /opt/couchbase/var/lib/couchbase/ip_start)
            fi
            if ! [[ "\${!cb_hostname}" =~ ^[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+\.\${!DNS_PRIVATE}\$ ]]; then
              cb_hostname="\$(cat /proc/sys/kernel/random/uuid).\${!DNS_PRIVATE}"
            fi
            echo "ec2_ip=\${!ec2_ip} ec2_hostname=\${!ec2_hostname} cb_hostname=\${!cb_hostname}"
            grep \${!cb_hostname} /shared-data/\${!ECS_CLUSTER}.uuids || echo \${!cb_hostname} >> /shared-data/\${!ECS_CLUSTER}.uuids
            change_id=\$(/root/.local/bin/aws route53 change-resource-record-sets\
              --hosted-zone-id ${PrivateHostedZoneId}\
              --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"\${!cb_hostname}.\",\"Type\":\"CNAME\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"\${!ec2_hostname}\"}]}}]}"\
              | grep Id | awk '{print \$2}' | sed 's/"//g')
            /root/.local/bin/aws route53 wait resource-record-sets-changed --id \${!change_id}
            echo "set Couchbase hostname: \${!cb_hostname}"
            echo "\${!cb_hostname}" > /opt/couchbase/var/lib/couchbase/ip
            echo "\${!cb_hostname}" > /opt/couchbase/var/lib/couchbase/ip_start
            echo "ns_1@\${!cb_hostname}" > /opt/couchbase/var/lib/couchbase/couchbase-server.node
            exec /entrypoint.sh couchbase-server
            EOF
            chmod +x /opt/${NameTag}/couchbase-bootstrap.sh
            ...
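Once the bootstrap has run, the generated node hostname should resolve through the private hosted zone. A quick sanity check (a sketch, run from any container with the shared volume mounted and ECS_CLUSTER set, as in the task definitions above):
# resolve the first registered node's CNAME back to its EC2 hostname
cb_hostname=$(head -n 1 "/shared-data/${ECS_CLUSTER}.uuids")
dig +short "${cb_hostname}"   # expect the CNAME target, e.g. ip-172-31-24-238.ec2.internal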
This next long snippet contains our main orchestrator script, which takes care of cluster initialisation, data seeding, index creation and adding/removing nodes. It uses the .uuids state information recorded by the Couchbase bootstrap script in the shared-data location to create the initial cluster, always taking the first hostname from the list. Note that this node becomes the master cluster node, where indexes are created.
Any failed nodes are flagged in the .remove file on NFS shared storage, removed and the cluster re-balanced.
Similarly, any new nodes found in the .uuids file are added to the cluster and the cluster re-balanced.
# --------------------- #
# Couchbase init script #
# --------------------- #
cat << EOF > /opt/${NameTag}/couchbase-init.sh
#!/usr/bin/env bash
echo "\$@"
curl_opts='--silent --fail --retry 3'
aws_opts='--region ${AWS::Region}'
printenv
apt-get -qq update > /dev/null
which python || apt-get -qq install -y python > /dev/null
which pip || (apt-get -qq install -y python-pip > /dev/null && pip install pip --upgrade --quiet)
which dig || apt-get -qq install -y dnsutils > /dev/null
which openssl || apt-get -qq install -y openssl > /dev/null
which git || apt-get -qq install -y git > /dev/null
which jq || apt-get -qq install -y jq > /dev/null
which curl || apt-get -qq install -y curl > /dev/null
which aws || pip install awscli --upgrade --quiet
pip list | grep bcrypt || pip install bcrypt --upgrade --quiet
if ! [ -d /opt/bmemcached-cli ]; then
  mkdir -p /opt/bmemcached-cli
  git clone https://github.com/RedisLabs/bmemcached-cli.git /opt/bmemcached-cli
  pushd /opt/bmemcached-cli
  pip install . -r requirements.pip
  popd
fi
ecs_metadata=\$(curl \${!curl_opts} \${!ECS_CONTAINER_METADATA_URI} | jq -r '.')
ecs_cluster=\$(echo \${!ecs_metadata} | jq -r '.Labels."com.amazonaws.ecs.cluster"')
while ! [ -f /shared-data/\${!ecs_cluster}.uuids ]; do sleep 5s; done
cb_cluster=\$(cat /shared-data/\${!ecs_cluster}.uuids | head -n 1)
cb_admin_passwd=\$(openssl rand -base64 18)
echo "\${!cb_admin_passwd}"
while true; do
  cb_cluster=\$(cat /shared-data/\${!ecs_cluster}.uuids | head -n 1)
  cluster_ip=\$(dig +short \${!cb_cluster})
  if ! [ -f /shared-data/\${!ecs_cluster}.init ]; then
    while ! curl \${!curl_opts} http://\${!cluster_ip}:8091/pools; do
      echo "waiting for cluster \${!cluster_ip} to become available..."
      sleep 5s
    done
    cb_pools=\$(curl \${!curl_opts} http://\${!cb_cluster}:8091/pools | jq -r '.pools | length')
    echo "cluster=\${!cb_cluster} cluster_ip=\${!cluster_ip} pools=\${!cb_pools}"
    # initialise new cluster
    if [[ "\${!cb_cluster}" != '' ]] && [[ "\${!cluster_ip}" != '' ]] && [[ \${!cb_pools} -eq 0 ]]; then
      echo "initialise cluster \${!cb_cluster}"
      /opt/couchbase/bin/couchbase-cli cluster-init\
        --cluster \${!cb_cluster}\
        --cluster-name \${!ecs_cluster}\
        --services 'data,index,query,fts'\
        --cluster-ramsize \${!CB_MEM_DATA}\
        --cluster-index-ramsize \${!CB_MEM_INDEX}\
        --cluster-fts-ramsize \${!CB_MEM_FTS}\
        --cluster-username admin\
        --cluster-password "\${!cb_admin_passwd}"
      while [[ \$(curl \${!curl_opts} --user "admin:\${!cb_admin_passwd}"\
        http://\${!cluster_ip}:8091/pools\
        | jq -r '.pools[] | select(.name=="default").name') != 'default' ]]; do
        sleep 5s
      done
      echo "bucket-spec=\${!CB_BUCKETS}"
      for spec in \$(echo \${!CB_BUCKETS}); do
        bucket_name=\$(echo \${!spec} | awk -F':' '{print \$1}')
        bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
        bucket_mem=\$(echo \${!spec} | awk -F':' '{print \$3}')
        bucket_replica=''
        [[ \${!bucket_name} != 'memcached' ]] && bucket_replica='--enable-index-replica 1 --bucket-replica ${DesiredCapacity}'
        echo "create bucket=\${!bucket_name} type=\${!bucket_type} mem=\${!bucket_mem} cluster=\${!cb_cluster}"
        /opt/couchbase/bin/couchbase-cli bucket-create\
          --cluster \${!cb_cluster}\
          --bucket-type \${!bucket_type}\
          --bucket \${!bucket_name}\
          --bucket-ramsize \${!bucket_mem}\
          \${!bucket_replica}\
          --username admin\
          --password "\${!cb_admin_passwd}"
        bucket_passwd=\$(openssl rand -base64 18)
        echo "\${!bucket_passwd}" > /shared-data/\${!ecs_cluster}.\${!bucket_name}
        echo "create user=\${!bucket_name} cluster=\${!cb_cluster}"
        /opt/couchbase/bin/couchbase-cli user-manage\
          --cluster \${!cb_cluster}\
          --username admin\
          --password "\${!cb_admin_passwd}"\
          --set\
          --rbac-username "\${!bucket_name}"\
          --rbac-password "\${!bucket_passwd}"\
          --rbac-name "\${!bucket_name}"\
          --roles "bucket_full_access[\${!bucket_name}]"\
          --auth-domain local
      done
      sleep 30s
      echo stats | bmemcached-cli memcached:\$(cat /shared-data/\${!ecs_cluster}.memcached | head -n 1)@\${!cb_cluster}:11210
      echo \${!cb_cluster} > /shared-data/\${!ecs_cluster}.init
    fi
  fi
  cluster_hosts=\$(/opt/couchbase/bin/couchbase-cli server-list\
    --cluster \${!cb_cluster}\
    --username admin\
    --password "\${!cb_admin_passwd}")
  echo "\${!cluster_hosts}"
  for failed_host in \$(/opt/couchbase/bin/couchbase-cli server-list\
    --cluster \${!cb_cluster}\
    --username admin\
    --password "\${!cb_admin_passwd}"\
    | grep 'unhealthy inactiveFailed'\
    | awk '{print \$1}' | awk -F'@' '{print \$2}'); do
    if [[ "\${!failed_host}" != "\${!cb_cluster}" ]]; then
      echo "flagging \${!failed_host} for removal"
      echo \${!failed_host} >> /shared-data/\${!ecs_cluster}.remove
    fi
  done
  # remove defunct nodes
  if [ -f /shared-data/\${!ecs_cluster}.remove ]; then
    for cb_host in \$(cat /shared-data/\${!ecs_cluster}.remove); do
      if [[ "\${!cb_host}" != "\${!cb_cluster}" ]]; then
        echo "(hard) failover \${!cb_host} on cluster \${!cb_cluster}"
        /opt/couchbase/bin/couchbase-cli failover\
          --cluster \${!cb_cluster}\
          --server-failover \${!cb_host}\
          --force\
          --username admin\
          --password "\${!cb_admin_passwd}"
        while /opt/couchbase/bin/couchbase-cli server-list\
          --cluster \${!cb_cluster}\
          --username admin\
          --password "\${!cb_admin_passwd}" | grep 'warmup'; do
          for spec in \$(echo \${!CB_BUCKETS}); do
            bucket=\$(echo \${!spec} | awk -F':' '{print \$1}')
            bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
            bucket_pass=\$(cat /shared-data/\${!ecs_cluster}.\${!bucket} | head -n 1)
            if [[ "\${!bucket_type}" != 'memcached' ]]; then
              /opt/couchbase/bin/cbstats\
                \${!cb_cluster}\
                -b \${!bucket}\
                -p "\${!bucket_pass}"\
                -j warmup
            fi
          done
          sleep 60s
        done
        echo "remove \${!cb_host} from cluster \${!cb_cluster}"
        /opt/couchbase/bin/couchbase-cli rebalance\
          --cluster \${!cb_cluster}\
          --server-remove \${!cb_host}\
          --username admin\
          --password "\${!cb_admin_passwd}"
        ec2_hostname=\$(dig +short \${!cb_host})
        change_id=\$(/root/.local/bin/aws route53 change-resource-record-sets\
          --hosted-zone-id ${PrivateHostedZoneId}\
          --change-batch "{\"Changes\":[{\"Action\":\"DELETE\",\"ResourceRecordSet\":{\"Name\":\"\${!cb_host}.\",\"Type\":\"CNAME\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"\${!ec2_hostname}\"}]}}]}"\
          | grep Id | awk '{print \$2}' | sed 's/"//g')
        /root/.local/bin/aws route53 wait resource-record-sets-changed --id \${!change_id}
        tmpfile=\$(mktemp)
        sed "/\${!cb_host}/d" /shared-data/\${!ecs_cluster}.remove > \${!tmpfile}
        cat \${!tmpfile} > /shared-data/\${!ecs_cluster}.remove
        sed "/\${!cb_host}/d" /shared-data/\${!ecs_cluster}.init > \${!tmpfile}
        cat \${!tmpfile} > /shared-data/\${!ecs_cluster}.init
      fi
    done
  fi
  cb_hosts=(\$(cat /shared-data/\${!ecs_cluster}.uuids))
  # add the server to existing cluster
  for cb_host in \${!cb_hosts[@]}; do
    if [ -f /shared-data/\${!ecs_cluster}.init ]\
      && [[ "\${!cb_host}" != "\${!cb_cluster}" ]]\
      && ! grep \${!cb_host} /shared-data/\${!ecs_cluster}.init; then
      if ! /opt/couchbase/bin/couchbase-cli server-list\
        --cluster \${!cb_cluster}\
        --username admin\
        --password "\${!cb_admin_passwd}"\
        | grep 'unhealthy'; then
        echo "adding \${!cb_host} to Couchbase cluster \${!cb_cluster}"
        echo \${!cb_host} > /shared-data/\${!ecs_cluster}.add
        /opt/couchbase/bin/couchbase-cli server-add\
          --cluster \${!cb_cluster}\
          --server-add \${!cb_host}\
          --services 'data,index,query,fts'\
          --server-add-username admin\
          --server-add-password "\${!cb_admin_passwd}"\
          --username admin\
          --password "\${!cb_admin_passwd}"
        while /opt/couchbase/bin/couchbase-cli server-list\
          --cluster \${!cb_cluster}\
          --username admin\
          --password "\${!cb_admin_passwd}" | grep 'warmup'; do
          for spec in \$(echo \${!CB_BUCKETS}); do
            bucket=\$(echo \${!spec} | awk -F':' '{print \$1}')
            bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
            bucket_pass=\$(cat /shared-data/\${!ecs_cluster}.\${!bucket} | head -n 1)
            if [[ "\${!bucket_type}" != 'memcached' ]]; then
              /opt/couchbase/bin/cbstats\
                \${!cb_cluster}\
                -b \${!bucket}\
                -p "\${!bucket_pass}"\
                -j warmup
            fi
          done
          sleep 60s
        done
        echo "rebalancing \${!cb_host}"
        /opt/couchbase/bin/couchbase-cli rebalance\
          --cluster \${!cb_cluster}\
          --username admin\
          --password "\${!cb_admin_passwd}"
        echo 'update bucket replicas'
        cluster_hosts=\$(/opt/couchbase/bin/couchbase-cli server-list\
          --cluster \${!cb_cluster}\
          --username admin\
          --password "\${!cb_admin_passwd}" | wc -l)
        echo "bucket-spec=\${!CB_BUCKETS} cluster_hosts=\${!cluster_hosts}"
        for spec in \$(echo \${!CB_BUCKETS}); do
          bucket_name=\$(echo \${!spec} | awk -F':' '{print \$1}')
          if [[ \${!bucket_name} != 'memcached' ]]; then
            echo "edit bucket=\${!bucket_name} cluster_hosts=\${!cluster_hosts}"
            /opt/couchbase/bin/couchbase-cli bucket-edit\
              --cluster \${!cb_cluster}\
              --bucket \${!bucket_name}\
              --bucket-replica \${!cluster_hosts}\
              --username admin\
              --password "\${!cb_admin_passwd}"
          fi
        done
        echo stats | bmemcached-cli memcached:\$(cat /shared-data/\${!ecs_cluster}.memcached | head -n 1)@\${!cb_host}:11210
        echo \${!cb_host} >> /shared-data/\${!ecs_cluster}.init
        rm -rf /shared-data/\${!ecs_cluster}.add
      fi
    fi
  done
  while /opt/couchbase/bin/couchbase-cli server-list\
    --cluster \${!cb_cluster}\
    --username admin\
    --password "\${!cb_admin_passwd}" | grep 'warmup'; do
    for spec in \$(echo \${!CB_BUCKETS}); do
      bucket=\$(echo \${!spec} | awk -F':' '{print \$1}')
      bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
      bucket_pass=\$(cat /shared-data/\${!ecs_cluster}.\${!bucket} | head -n 1)
      if [[ "\${!bucket_type}" != 'memcached' ]]; then
        /opt/couchbase/bin/cbstats\
          \${!cb_cluster}\
          -b \${!bucket}\
          -p "\${!bucket_pass}"\
          -j warmup
      fi
    done
    sleep 60s
  done
  sleep 300s
done
EOF
chmod +x /opt/${NameTag}/couchbase-init.sh
The helper script in our case also handles minor configuration tasks, such as enabling email alerts, as well as bootstrapping our application containers by (re)setting application credentials. I've left these out for brevity, but these tasks follow the same documented approach, such as inserting data into the database using cbq and writing state out to /shared-data.
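For completeness, a minimal sketch of what such a seeding task could look like (the bucket, document and .seeded state marker are hypothetical; this runs outside the !Sub heredoc, so normal shell expansion applies):
# seed a document via cbq, then record state on shared storage
cb_cluster=$(head -n 1 "/shared-data/${ECS_CLUSTER}.uuids")
/opt/couchbase/bin/cbq -e "http://${cb_cluster}:8093" -u admin -p "${cb_admin_passwd}" \
  -s "UPSERT INTO bucketA (KEY, VALUE) VALUES ('seed::1', {'type': 'seed'})"
touch "/shared-data/${ECS_CLUSTER}.seeded"   # hypothetical state marker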
The above helper script is limited to automatically scaling the cluster to a maximum of two nodes. When three or more nodes are in the cluster, the default automatic failover mechanism in Couchbase will prevent the script from completing the re-balancing activities, since it will never exit the warmup wait loops. However, it should be trivial to change the helper script to enable it to automate more than two nodes if desired. Manually failing over one of the nodes using the Couchbase UI or CLI will allow the script to proceed as is.
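One way to lift that limitation would be to relax automatic failover around the re-balancing activities; a sketch using couchbase-cli, assuming cb_cluster and cb_admin_passwd as set in the helper script:
# disable auto-failover so the rebalance/warmup loops can complete
/opt/couchbase/bin/couchbase-cli setting-autofailover \
  --cluster "${cb_cluster}" \
  --username admin --password "${cb_admin_passwd}" \
  --enable-auto-failover 0
# re-enable (with a timeout in seconds) once the cluster settles
/opt/couchbase/bin/couchbase-cli setting-autofailover \
  --cluster "${cb_cluster}" \
  --username admin --password "${cb_admin_passwd}" \
  --enable-auto-failover 1 --auto-failover-timeout 120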
The other noteworthy item is the protection of the ECS container instance holding at least the master (first) cluster node. It would be wise to protect this resource using Auto Scaling group scale-in protection as well as the EC2 termination protection mechanism. Having said that, it's also worth remembering that the Community Edition of Couchbase does not support index replication, and in this example the indexes are always created on the first (master) node.
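Both protections can be applied with the AWS CLI; a sketch with placeholder identifiers you would resolve at runtime (e.g. from the first .uuids record):
asg_name='my-ecs-asg'                 # hypothetical Auto Scaling group name
instance_id='i-0123456789abcdef0'     # hypothetical instance running the master node
# protect against Auto Scaling scale-in
aws autoscaling set-instance-protection \
  --auto-scaling-group-name "${asg_name}" \
  --instance-ids "${instance_id}" \
  --protected-from-scale-in
# protect against API-initiated termination
aws ec2 modify-instance-attribute \
  --instance-id "${instance_id}" \
  --disable-api-termination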
For the record, I don't and have never loved Couchbase.
-- belodetek