1.环境: mysql 5 ubuntu10.04 x86_64
mdb1 eth0 192.168.5.11 mdb2 eth0 192.168.5.12 sdb1 eth0 192.168.5.21 sdb2 eth0 192.168.5.22 sdb3 eth0 192.168.5.23 sdb4 eth0 192.168.5.24 haproxy eth0 192.168.5.10 (mdb vip write) eth1 192.168.5.20 (sdb vip read) 说明:mdb vip用于DB的写,sdb vip用于DB读,实现读写分离和负载均衡,带故障检测自动切换
2.架构图 web1 web2 web3 | | | —————————- | haproxy(lb db write/read) | ———————————- | | mdb1 mdb2 | | ————– —————- | | | | sdb1 sdb2 sdb3 sdb4 说明: 1)mdb1和mdb1配置成主-主模式,相互同步,通过haproxy提供一个lb的写ip 2)sdb1和sdb2配置为mdb1的从,sdb3和sdb4配置为mdb2的从 3)sdb1,sdb2,sdb3,sdb4这4台从库,通过haproxy提供一个lb的读ip 4) 当mdb2停止复制,mdb1为主库,haproxy停止发送请求到mdb2和sdb3,sdb4 5) 当mdb1停止复制,mdb2为主库,haproxy停止发送请求到mdb1和sdb1,sdb2 6) 当mdb1和mdb2同时停止复制,这时2台主库变成readonly模式,数据库不能写入 7)当mdb2 offline时,mdb1进入backup mode,停止发送请求到mdb2,sdb3,sdb4 8)当mdb1 offline时,mdb2进入backup mode,停止发送请求到mdb1,sdb1,sdb2 9) 当mdb1 mdb2同时offline,整个DB停止工作
3.安装mysql-server 登录mdb1,mdb2,sdb1,sdb2,sdb3,sdb4,输入以下命令进行安装: apt-get install mysql-server -y 安装时会提示输入mysql root用户密码,输入gaojinbo.com 修改mysql配置,监听所有接口 vi /etc/mysql/my.cnf 修改为: bind-address = 0.0.0.0 重启mysql /etc/init.d/mysql restart
4.配置mdb1,mdb2主-主同步 1)mdb1: vi /etc/mysql/my.cnf server-id = 1 log_bin = mysql-bin log-slave-updates #很重要,从前一台机器上同步过来的数据才能同步到下一台机器 expire_logs_days = 10 max_binlog_size = 100M auto_increment_offset = 1 auto_increment_increment = 2
2)mdb2: vi /etc/mysql/my.cnf server-id = 2 log_bin = mysql-bin log-slave-updates #很重要,从前一台机器上同步过来的数据才能同步到下一台机器 expire_logs_days = 10 max_binlog_size = 100M auto_increment_offset = 2 auto_increment_increment = 2
3)mdb1和mdb2: 重启mysql /etc/init.d/mysql restart 添加复制用户 mysql -uroot -pgaojinbo.com GRANT REPLICATION SLAVE ON . TO .%’ IDENTIFIED BY ‘gaojinbo’; 记录日志文件和pos mysql -uroot -pgaojinbo.com show master status\G
4)mdb1: change master to master_host=’192.168.5.12′,master_port=3306,master_user=’repl’,master_password=’gaojinbo’,master_log_file=’mysql-bin.000003′,master_log_pos=106; start slave; show slave status\G 说明:mysql-bin.000003和106是主库配置第3)步记录的信息 出现以下内容,说明同步ok Slave_IO_Running: Yes Slave_SQL_Running: Yes
5)mdb2: change master to master_host=’192.168.5.11′,master_port=3306,master_user=’repl’,master_password=’gaojinbo’,master_log_file=’mysql-bin.000001′,master_log_pos=249; start slave; show slave status\G 说明:mysql-bin.000001和249是主库配置第3)步记录的信息 出现以下内容,说明同步ok Slave_IO_Running: Yes Slave_SQL_Running: Yes
6)测试主-主同步 mdb1: mysql -uroot -pgaojinbo.com show databases; create database gaojinbo; mdb2: mysql -uroot -pgaojinbo.com show databases; 即可看到在mdb1上建立的数据库gaojinbo
至此mdb1,mdb2主-主配置完成!
5.4台从库配置 sdb1-4配置(注:server-id不能相同): vi /etc/MySQL/my.cnf server-id = 3 log_bin = mysql-bin 重启mysql /etc/init.d/mysql restart
sdb1和sdb2配置成mdb1的从库: mysql -uroot -pgaojinbo.com change master to master_host=’192.168.5.11′,master_port=3306,master_user=’repl’,master_password=’gaojinbo’,master_log_file=’mysql-bin.000001′,master_log_pos=345; start slave; show slave status\G 说明:mysql-bin.000001和345是主库配置第3)步记录的信息 出现以下内容,说明同步ok Slave_IO_Running: Yes Slave_SQL_Running: Yes
sdb3和sdb4配置成mdb2的从库: change master to master_host=’192.168.5.12′,master_port=3306,master_user=’repl’,master_password=’gaojinbo’,master_log_file=’mysql-bin.000003′,master_log_pos=106; start slave; show slave status\G 说明:mysql-bin.000003和106是主库配置第3)步记录的信息 出现以下内容,说明同步ok Slave_IO_Running: Yes Slave_SQL_Running: Yes
测试: mdb1: mysql -uroot -pgaojinbo.com show databases; create database eossc; 在其他DB上,这时会看到刚建立的数据库eossc
至此4台从数据库配置完成!
6.编写mysql检测脚本 1)mdb1和mdb2: vi /etc/xinetd.d/mysqlchk
/etc/xinetd.d/mysqlchk
service mysqlchk_write { flags = REUSE socket_type = stream port = 9200 wait = no user = nobody server = /opt/mysqlchk_status.sh log_on_failure += USERID disable = no only_from = 192.168.5.0/24 # recommended to put the IPs that need # to connect exclusively (security purposes) }
service mysqlchk_replication { flags = REUSE socket_type = stream port = 9201 wait = no user = nobody server = /opt/mysqlchk_replication.sh log_on_failure += USERID disable = no only_from = 192.168.5.0/24 # recommended to put the IPs that need # to connect exclusively (security purposes) } 添加服务端口 vi /etc/services mysqlchk_write 9200/tcp #mysqlchk_write mysqlchk_replication 9201/tcp #mysqlchk_replication mdb1上操作: vi /opt/mysqlchk_status.sh #!/bin/bash MYSQL_HOST="192.168.5.11" MYSQL_PORT="3306" MYSQL_USERNAME="root" MYSQL_PASSWORD="gaojinbo.com"
ERROR_MSG=/usr/bin/mysql –host=$MYSQL_HOST –port=$MYSQL_PORT –user=$MYSQL_USERNAME –password=$MYSQL_PASSWORD -e "show databases;" 2>/dev/null
if [ "$ERROR_MSG" != "" ] then # mysql is fine, return http 200 /bin/echo -e "HTTP/1.1 200 OK\r\n" /bin/echo -e "Content-Type: Content-Type: text/plain\r\n" /bin/echo -e "\r\n" /bin/echo -e "MySQL is running.\r\n" /bin/echo -e "\r\n" else # mysql is down, return http 503 /bin/echo -e "HTTP/1.1 503 Service Unavailable\r\n" /bin/echo -e "Content-Type: Content-Type: text/plain\r\n" /bin/echo -e "\r\n" /bin/echo -e "MySQL is down.\r\n" /bin/echo -e "\r\n" fi vi /opt/mysqlchk_replication.sh #!/bin/bash MYSQL_HOST="192.168.5.11" MYSQL_PORT="3306" MYSQL_USERNAME="root" MYSQL_PASSWORD="gaojinbo.com"
/usr/bin/mysql –host=$MYSQL_HOST –port=$MYSQL_PORT –user=$MYSQL_USERNAME –password=$MYSQL_PASSWORD -e "show slave status\G;" >/tmp/check_repl.txt iostat=grep "Slave_IO_Running" /tmp/check_repl.txt |awk ‘{print $2}’
sqlstat=grep "Slave_SQL_Running" /tmp/check_repl.txt |awk ‘{print $2}’
#echo iostat:$iostat and sqlstat:$sqlstat if [ "$iostat" = "No" ] || [ "$sqlstat" = "No" ]; then # mysql is down, return http 503 /bin/echo -e "HTTP/1.1 503 Service Unavailable\r\n" /bin/echo -e "Content-Type: Content-Type: text/plain\r\n" /bin/echo -e "\r\n" /bin/echo -e "MySQL replication is down.\r\n" /bin/echo -e "\r\n" else # mysql is fine, return http 200 /bin/echo -e "HTTP/1.1 200 OK\r\n" /bin/echo -e "Content-Type: Content-Type: text/plain\r\n" /bin/echo -e "\r\n" /bin/echo -e "MySQL replication is running.\r\n" /bin/echo -e "\r\n" fi 测试同步检测脚本: mysql -uroot -pgaojinbo.com stop slave sql_thread; #或者 stop slave io_thread; /opt/mysqlchk_replication.sh mdb2上操作: 添加和mdb1一样的脚本,把 /opt/mysqlchk_status.sh 里面的192.168.5.11修改为192.168.5.12 /opt/mysqlchk_replication.sh 里面的192.168.5.11修改为192.168.5.12
2)sdb1,sdb2,sdb3,sdb4上操作: vi /etc/xinetd.d/mysqlchk
/etc/xinetd.d/mysqlchk
service mysqlchk_replication { flags = REUSE socket_type = stream port = 9201 wait = no user = nobody server = /opt/mysqlchk_replication.sh log_on_failure += USERID disable = no only_from = 192.168.5.0/24 # recommended to put the IPs that need # to connect exclusively (security purposes) } vi /opt/mysqlchk_replication.sh #!/bin/bash MYSQL_HOST="192.168.5.21" MYSQL_PORT="3306" MYSQL_USERNAME="root" MYSQL_PASSWORD="gaojinbo.com"
/usr/bin/mysql –host=$MYSQL_HOST –port=$MYSQL_PORT –user=$MYSQL_USERNAME –password=$MYSQL_PASSWORD -e "show slave status\G;" >/tmp/check_repl.txt iostat=grep "Slave_IO_Running" /tmp/check_repl.txt |awk ‘{print $2}’
sqlstat=grep "Slave_SQL_Running" /tmp/check_repl.txt |awk ‘{print $2}’
#echo iostat:$iostat and sqlstat:$sqlstat if [ "$iostat" = "No" ] || [ "$sqlstat" = "No" ]; then # mysql is down, return http 503 /bin/echo -e "HTTP/1.1 503 Service Unavailable\r\n" /bin/echo -e "Content-Type: Content-Type: text/plain\r\n" /bin/echo -e "\r\n" /bin/echo -e "MySQL replication is down.\r\n" /bin/echo -e "\r\n" else # mysql is fine, return http 200 /bin/echo -e "HTTP/1.1 200 OK\r\n" /bin/echo -e "Content-Type: Content-Type: text/plain\r\n" /bin/echo -e "\r\n" /bin/echo -e "MySQL replication is running.\r\n" /bin/echo -e "\r\n" fi 注:脚本/opt/mysqlchk_replication.sh里面的ip sdb1 MYSQL_HOST="192.168.5.21" sdb2 MYSQL_HOST="192.168.5.22" sdb3 MYSQL_HOST="192.168.5.23" sdb4 MYSQL_HOST="192.168.5.24" 添加服务端口 vi /etc/services mysqlchk_replication 9201/tcp #mysqlchk_replication
3)所有DB上操作: 增加检测脚本执行权限 chmod +x /opt/mysql*.sh 重启系统 reboot 查看监听端口 netstat -antup|grep xinetd tcp 0 0 0.0.0.0:9200 0.0.0.0:* LISTEN 903/xinetd
tcp 0 0 0.0.0.0:9201 0.0.0.0:* LISTEN 903/xinetd 注:sdb只有9201监听7.haproxy安装配置 下载编译安装: wget tar xvzf haproxy-1.4.11.tar.gz cd haproxy-1.4.11 make TARGET=linux26 ARCH=x86_64 make install
配置 vi /etc/haproxy.cfg global maxconn 40000 debug #quiet user haproxy group haproxy nbproc 1 log 127.0.0.1 local3 spread-checks 2 defaults timeout server 3s timeout connect 3s timeout client 60s timeout http-request 3s timeout queue 3s frontend db_write bind 192.168.5.10:3306 default_backend cluster_db_write frontend db_read bind 192.168.5.20:3306 default_backend cluster_db_read frontend web_haproxy_status bind :80 default_backend web_status frontend monitor_mdb1 bind 127.0.0.1:9301 mode http acl no_repl_mdb1 nbsrv(mdb1_replication) eq 0 acl no_repl_mdb2 nbsrv(mdb2_replication) eq 0 acl no_mdb1 nbsrv(mdb1_status) eq 0 acl no_mdb2 nbsrv(mdb2_status) eq 0 monitor-uri /dbs monitor fail unless no_repl_mdb1 no_repl_mdb2 no_mdb2 monitor fail if no_mdb1 no_mdb2 frontend monitor_mdb2 bind 127.0.0.1:9302 mode http acl no_repl_mdb1 nbsrv(mdb1_replication) eq 0 acl no_repl_mdb2 nbsrv(mdb2_replication) eq 0 acl no_mdb1 nbsrv(mdb1_status) eq 0 acl no_mdb2 nbsrv(mdb2_status) eq 0 monitor-uri /dbs monitor fail unless no_repl_mdb1 no_repl_mdb2 no_mdb1 monitor fail if no_mdb1 no_mdb2 frontend monitor_sdb1 bind 127.0.0.1:9303 mode http acl no_repl_sdb1 nbsrv(sdb1_replication) eq 0 acl no_repl_mdb1 nbsrv(mdb1_replication) eq 0 acl no_mdb2 nbsrv(mdb2_status) eq 1 monitor-uri /dbs monitor fail if no_repl_sdb1 monitor fail if no_repl_mdb1 no_mdb2 frontend monitor_sdb2 bind 127.0.0.1:9304 mode http acl no_repl_sdb2 nbsrv(sdb2_replication) eq 0 acl no_repl_mdb1 nbsrv(mdb1_replication) eq 0 acl no_mdb2 nbsrv(mdb2_status) eq 1 monitor-uri /dbs monitor fail if no_repl_sdb2 monitor fail if no_repl_mdb1 no_mdb2 frontend monitor_sdb3 bind 127.0.0.1:9305 mode http acl no_repl_sdb3 nbsrv(sdb3_replication) eq 0 acl no_repl_mdb2 nbsrv(mdb2_replication) eq 0 acl no_mdb1 nbsrv(mdb1_status) eq 1 monitor-uri /dbs monitor fail if no_repl_sdb3 monitor fail if no_repl_mdb2 no_mdb1 frontend monitor_sdb4 bind 127.0.0.1:9306 mode http acl no_repl_sdb4 nbsrv(sdb4_replication) eq 0 acl no_repl_mdb2 nbsrv(mdb2_replication) eq 0 acl no_mdb1 nbsrv(mdb1_status) eq 1 monitor-uri /dbs monitor fail if no_repl_sdb4 monitor fail if no_repl_mdb2 no_mdb1 frontend monitor_splitbrain bind 127.0.0.1:9300 mode http acl no_repl01 nbsrv(mdb1_replication) eq 0 acl no_repl02 nbsrv(mdb2_replication) eq 0 acl mdb1 nbsrv(mdb1_status) eq 1 acl mdb2 nbsrv(mdb2_status) eq 1 monitor-uri /dbs monitor fail unless no_repl01 no_repl02 mdb1 mdb2 backend mdb1_replication mode tcp balance roundrobin option tcpka option httpchk server mdb1 192.168.5.11:3306 check port 9201 inter 1s rise 1 fall 1
backend mdb2_replication mode tcp balance roundrobin option tcpka option httpchk server mdb2 192.168.5.12:3306 check port 9201 inter 1s rise 1 fall 1 backend sdb1_replication mode tcp balance roundrobin option tcpka option httpchk server sdb1 192.168.5.21:3306 check port 9201 inter 1s rise 1 fall 1 backend sdb2_replication mode tcp balance roundrobin option tcpka option httpchk server sdb2 192.168.5.22:3306 check port 9201 inter 1s rise 1 fall 1 backend sdb3_replication mode tcp balance roundrobin option tcpka option httpchk server sdb3 192.168.5.23:3306 check port 9201 inter 1s rise 1 fall 1 backend sdb4_replication mode tcp balance roundrobin option tcpka option httpchk server sdb4 192.168.5.24:3306 check port 9201 inter 1s rise 1 fall 1 backend mdb1_status mode tcp balance roundrobin option tcpka option httpchk server mdb1 192.168.5.11:3306 check port 9200 inter 1s rise 2 fall 2
backend mdb2_status mode tcp balance roundrobin option tcpka option httpchk server mdb2 192.168.5.12:3306 check port 9200 inter 1s rise 2 fall 2 backend cluster_db_write mode tcp option tcpka balance roundrobin option httpchk GET /dbs server mdb1 192.168.5.11:3306 weight 1 check port 9201 inter 1s rise 5 fall 1 server mdb2 192.168.5.12:3306 weight 1 check port 9201 inter 1s rise 5 fall 1 backup server mdb1_backup 192.168.5.11:3306 weight 1 check port 9301 inter 1s rise 2 fall 2 addr 127.0.0.1 backup server mdb2_backup 192.168.5.12:3306 weight 1 check port 9302 inter 1s rise 2 fall 2 addr 127.0.0.1 backup backend cluster_db_read mode tcp option tcpka balance roundrobin option httpchk GET /dbs server mdb1 192.168.5.11:3306 weight 1 track cluster_db_write/mdb1 server mdb2 192.168.5.12:3306 weight 1 track cluster_db_write/mdb2 server mdb1_backup 192.168.5.11:3306 weight 1 track cluster_db_write/mdb1_backup server mdb2_backup 192.168.5.12:3306 weight 1 track cluster_db_write/mdb2_backup server mdb1_splitbrain 192.168.5.11:3306 weight 1 check port 9300 inter 1s rise 1 fall 2 addr 127.0.0.1 server mdb2_splitbrain 192.168.5.12:3306 weight 1 check port 9300 inter 1s rise 1 fall 2 addr 127.0.0.1 server sdb1_slave 192.168.5.21:3306 weight 1 check port 9303 inter 1s rise 5 fall 1 addr 127.0.0.1 server sdb2_slave 192.168.5.22:3306 weight 1 check port 9304 inter 1s rise 5 fall 1 addr 127.0.0.1 server sdb3_slave 192.168.5.23:3306 weight 1 check port 9305 inter 1s rise 5 fall 1 addr 127.0.0.1 server sdb4_slave 192.168.5.24:3306 weight 1 check port 9306 inter 1s rise 5 fall 1 addr 127.0.0.1
backend web_status mode http stats enable # stats scope # stats hide-version stats refresh 5s stats uri /status stats realm Haproxy\ statistics stats auth ylmf:gaojinbo
8.测试 1)正常情况,backup和splitbrain状态down Highslide JS
2)停止mdb2复制,mdb2和sdb3,sdb4状态down,数据库仍可读写 Highslide JS
3)同时停止mdb1,mdb2复制,mdb1和sdb1,sdb2,sdb3,sdb4状态down,数据库只能读 Highslide JS
4)关闭mdb1数据库,mdb1,mdb2和sdb1,sdb2状态down,数据库仍可读写
Highslide JS 5)关闭mdb2数据库,mdb1,mdb2和sdb3,sdb4状态down,数据库仍可读写 Highslide JS