客服微信
前言
通过本文的学习,您将可以
1. 了解TBase对部署环境的要求
2. 掌握部署TBase前需要对操作系统做的准备性工作
3. 掌握TBase的安装部署操作
4. 了解TBase安装中的其他注意事项
一
部署环境介绍
1.1 机器配置
1.2 管理角色设备分配方案
1.3.1 操作系统部署要求
1.4 TBase安装包与许可证的获取安装TBase,你需要
以下文件:——以上请联系腾讯商务代表,或在 https://github.com/Tencent/TBase 获取。
二
服务器环境准备
2.1 测试数据盘的随机同步写入性能
time dd if=/dev/zero of=test bs=1M count=1024 oflag=dsync
records in
records out
1073741824 bytes (1.1 GB) copied, 17.4811 s, 61.4 MB/s
real 0m17.482s
user 0m0.002s
sys 0m0.748s
2.2 较准机器的时间
创建定时任务前,确保ntp / chrony 服务关闭
创建计划任务进行时间同步
crontab –e (写入以下内容)
#NTP服务器可自行选择
*/20 * * * * /usr/sbin/ntpdate ntpupdate.tencentyun.com >/dev/null &
2.3 防火墙与selinux配置
关闭SELinux
setenforce 0
设置SELinux开机不自启动
vi /etc/sysconfig/selinux
将其中的SELINUX= XXXXXX修改为SELINUX=disabled
关闭防火墙&禁止防火墙开机自启
2.4 修改limits.conf参数(生产环境)
[root@tbase01]#cat /etc/security/limits.conf
#准许单个进程打开文件数
* soft nofile 131072
* hard nofile 131072
#准放运行的进程数
* soft nproc 131072
* hard nproc 131072
#限制内核文件的大小
* soft core unlimited
* hard core unlimited
2.5 修改内核参数sysctl.conf(生产环境)
[ ]
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmall = 4294967296
kernel.shmmax=137438953472
kernel.shmmni = 4096
kernel.sem = 50100 64128000 50100 1280
fs.file-max = 6553600
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 6
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.ip_forward=1
vm.dirty_background_bytes = 102400000
2.6 主机名与IP配置
修改主机名
hostnamectl set-hostname XXXXXX
IP地址配置
IP地址无特别要求,可以存在多个网段,只能保证网络是连通的就行。
带宽要求
同IDC,同城最好是万兆,并且要求是低于3ms的延迟。
异地视要同步数据量要求,在做备机时需要比较高的带宽,延迟按设计是30ms左右。
2.7 sshd配置
修改/etc/ssh/sshd_config配置
vi /etc/ssh/sshd_config
UseDNS no
修改完成后重启sshd服务
CentOS 6 版本service sshd restart
CentOS 7 版本systemctl restart sshd
三
应用程序部署
3.1 上传安装包并解压
上传到/root目录下
[ ]
/root
[ ]
如果安装包是tar.gz包,则需要这样解压
tar -zxf package_name.tar.gz
3.2 配置安装选项
[root@tbase01~ ]# vi tbase_mgr/conf/role.info
#eth ip idc_name root password role( OssCenterMaster OssCenterSlave
OssAgent Confdb Alarm TStudio Zookeeper)
#Notice: The machine that will be installed OssCenterMaster or
OssCenterSlave can't be installed OssAgent
ens33 192.168.43.129 idc_1 root oracle
OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 idc_1 root oracle OssCenterSlave|Confdb|Alarm
#Etcd组件是confdb倒换和center倒换的容灾组件,Etcd部署个数必须是奇数个
#Etcd建议跟center或者confdb部署在一起,OssCenter支持一主多备。如果机器数量有足够多,除部署以上所示组件的两台机器,其他机器可以只部署OssAgent即可
#(建议最少部署3个Etcd组件,可靠性更高)。
3.3 运行安装程序
[root@tbase01 tbase_mgr]# ./tbase_mgr.sh install
Hey, Welcome and now we will install TBase OSS by the flowing steps:
0. Check role configuration read from conf/role.info ...
1. Install some base rpm packages such as dos2unix/createrepo/expect and so on ...
2. Check root password and do some initalization on all machines read from conf/role.info ...
3. Check package requires ...
4. Create OS user tbase specifided in conf/oss/oss_init.conf ...
5. Create yum repository which store all the rpm packages needed by TBase OSS ...
6. Install and Start ConfdbMaster ...
7. Install and Start all ConfdbSlaves ...
8. Install and Start Alarm Server ...
9. Install and Start Zookeeper If needed ...
10. Install and Start OssCenterMaster and all OssCenterSlaves ...
11. Install and Start all Agents ...
12. Install default tbase_pgxz package...
13. Install and Start Tstudio ...
Ready to continue (Yy|Nn) ?
3.4 安装完成
0. Now start to check role configuration ...
1. Now start to install dos2unix/createrepo/expect and so on ...
2. Now start to check root password and do some initalization on all machines ...
3. Now start to check package requires ...
4. Now start to create OS user tbase ...
5. Now start to create yum repository ...
6. Now start to install Etcd ...
7. Now start to install all Confdbs ...
8. Now start to install Alarm Server ...
9. Now start to install all OssCenters ...
10. Now start to install all OssAgents ...
11. Now start to install default tbase_pgxz...
--[ 2020-12-01 22:39:21 ] install studio
12. Now start to install TStudio ...
13. Now start to write etcd keys ...
Successed to install TBase OSS, visit http://192.168.43.129:8080 to continue ...
Successed to install TStudio, visit http://192.168.43.129:5050
default account: postgres@postgres.com password: postgres
3.5 安装失败解决办法
在安装过程中如果遇到失败,先查看tbase_mgr/logs文件夹下的日志文件
[root@tbase01 logs]# cd /root/tbase_mgr/logs
[root@tbase01 logs]# ll
total 780
-rw------- 1 root root 784 Dec 1 22:30 authorized_keys.root
-rw-r--r-- 1 root root 392 Dec 1 22:30 authorized_keys.root.192.168.43.129
-rw-r--r-- 1 root root 392 Dec 1 22:30 authorized_keys.root.192.168.43.130
-rw-r--r-- 1 root root 524730 Dec 1 22:39 tbase_mgr.sh.info.log.2020-12-01
-rw-r--r-- 1 root root 38726 Dec 4 17:07 tbase_mgr.sh.info.log.2020-12-04
-rw-r--r-- 1 root root 70527 Dec 5 22:10 tbase_mgr.sh.info.log.2020-12-05
-rw-r--r-- 1 root root 4622 Dec 6 00:07 tbase_mgr.sh.info.log.2020-12-06
-rw-r--r-- 1 root root 133901 Dec 7 15:11 tbase_mgr.sh.info.log.2020-12-07
在提示查看的日志文件的末尾,有报错中的具体执行语句,手动执行即可显示详细的错误信息
解决后再次执行安装程序即可
[ ]
docker版本与内核冲突
如果docker版本与内核冲突,我们可以忽略安装docker,这样的Tstudio组件也要放弃安装
以下是修改方法:
编辑tbase_mgr.sh文件,修改以下内容:
1)把tstudio_install > /dev/null 2>&1注释掉
2)把yum install -y dos2unix expect lrzsz net-tools lsof createrepo httpd httpd-tools php docker中“docker”删除
如果一定要安装tstduio,找到当前内核版本下的docker,安装好后,再独立部署tstudio组件。
3.6 yum安装源恢复方法
[ ]
[ ]
[ ]
[ ]
所有机器都要恢复,如不再需要安装其它软件,可以不需要恢复
四
安装失败问题汇总
4.1 IDC参数写为hostname错误
[root@tbase01 tbase_mgr]# cat logs/tbase_mgr.sh.info.log.2020-09-25
[ 2020-09-25 11:08:33 ] [ read_machine_info ] [ERROR]: Agent must be specified, exit
[ 2020-09-25 11:08:33 ] [ check_machine_idc ] [ERROR]: The machine_ip in (tbase01), but the tbase01 is not exist.
[ 2020-09-25 11:08:33 ] [ check_machine_idc ] [ERROR]: Fail to check machine idc in idc_1:local, exit
[ 2020-09-25 11:08:33 ] [ check_progress ] [ERROR]: from [read_machine_info] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-09-25 for more details ...
解决方法
--这里idc_1在OSS配置文件中写死,不建议修改
查看tbase_mgr/conf/role.info配置文件里的idc配置,不是hostname
[root@tbase01 tbase_mgr]# cat conf/role.info
eth0 192.168.43.129 idc_1 root xxxxxxxx OssCenterMaster|Confdb|TStudio|Etcd
eth0 192.168.43.130 idc_1 root xxxxxxxx OssCenterSlave|Confdb|Alarm
4.2、安装日志报yum源问题
httpd权限配置问题
Other repos take up 737 M of disk space (use --verbose for details)
[ 2020-09-25 11:26:09 ] [ conf_yum_repo ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.199 "yum -y install dos2unix expect readline createrepo net-tools lsof uuid" < /dev/null, retcode: 1, retmsg: Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors, exit
[ 2020-09-25 11:26:09 ] [ check_progress ] [ERROR]: from [conf_yum_repo] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-09-25 for more details ...
解决方法
1、通过执行日志中的命令,发现yum源因权限问题无法访问
http://192.168.43.199/repodata/repomd.xml: [Errno 14] HTTP Error 403 - Forbidde
2、修改/etc/httpd/conf/httpd.conf
将denied改为granted
AllowOverride none
Require all granted
安装包冲突问题
Error: Package: policycoreutils-python-2.5-11.el7_3.x86_64 (PGXZ)
Requires: policycoreutils = 2.5-11.el7_3
Installed: policycoreutils-2.5-29.el7.x86_64 (@anaconda)
policycoreutils = 2.5-29.el7
Available: policycoreutils-2.5-11.el7_3.x86_64 (PGXZ)
policycoreutils = 2.5-11.el7_3
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
解决方法
安装包冲突,需要删除原来的包
yum -y remove policycoreutils-2.5-29.el7.x86_64
4.3、 Etcd数量为偶数的问题
[ 2020-12-01 15:41:59 ] [parse_machine_role] [Info]: 192.168.43.130, OssCenterSlave|Confdb|Alarm|Etcd r=[Alarm]
[ 2020-12-01 15:41:59 ] [parse_machine_role] [Info]: 192.168.43.130, OssCenterSlave|Confdb|Alarm|Etcd r=[Etcd]
[ 2020-12-01 15:41:59 ] [ parse_machine_role ] [Info]: XZEtcds 192.168.43.130 r=[Etcd]
[ 2020-12-01 15:41:59 ] [ read_machine_info ] [ERROR]: Agent must be specified, exit
[ 2020-12-01 15:41:59 ] [ read_machine_info ] [ERROR]: The number of Etcd must be a odd number. etcd_cnt=2, etcd_server=192.168.43.129
[ 2020-12-01 15:41:59 ] [ check_progress ] [ERROR]: from [read_machine_info] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-12-01 for more details ...
解决方案
查看role.info,发现Etcd配置了两个,必须为奇数
[root@tbase01 tbase_mgr]# cat conf/role.info
ens33 192.168.43.129 cdh root oracle OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 cdh02 root oracle OssCenterSlave|Confdb|Alarm|Etcd --删掉Etcd组件
4.4、root密码错误
[ 2020-12-01 15:51:32 ] [ selinux_policy_off ] [INFO]: spawn_pswd=oracle1
[ 2020-12-01 15:51:32 ] [ selinux_policy_off ] [INFO]: after spawn_pswd=oracle1
[ 2020-12-01 15:51:34 ] [ selinux_policy_off ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.129 "setenforce 0;iptables -I INPUT -j ACCEPT;service iptables save", retcode: 3, exit
解决方案
查看role.info,发现root密码错误
[root@tbase01 tbase_mgr]# cat conf/role.info
ens33 192.168.43.129 idc_1 root oracle1 OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 idc_1 root oracle2 OssCenterSlave|Confdb|Alarm
4.5、网卡参数错误
[ 2020-12-01 16:00:46 ] [ check_net_eth ] [ERROR]: 192.168.43.129 not match 1ens33
解决方案
查看role.info,发现网络端口不存在
[root@tbase01 tbase_mgr]# cat conf/role.info
1ens33 192.168.43.129 idc_1 root oracle1 OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.130 idc_1 root oracle2 OssCenterSlave|Confdb|Alarm
4.6、IP错误
测试发现,这步不影响YUM安装软件
[ 2020-12-01 16:04:33 ] [ selinux_policy_off ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.131 "sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config", retcode: 5, exit
解决方案
查看role.info,发现IP
[root@tbase01 tbase_mgr]# cat conf/role.info
ens33 192.168.43.129 idc_1 root oracle OssCenterMaster|Confdb|TStudio|Etcd
ens33 192.168.43.131 idc_1 root oracle OssCenterSlave|Confdb|Alarm
4.7、不进入 tbase_mgr目录执行脚本
sh tbase_mgr/tbase_mgr.sh install
日志报错:This system is not registered with an entitlement server. You can use subscription-manager to register.Determining fastest mirrors, exit
解决方法
[ ]
[ ]
/root/Centos7/tbase_mgr
[ ]
4.8、docker安装失败
[10 -o StrictHostKeyChecking=no root@192.168.43.129 "service docker start && sleep 10 && docker info" < /dev/null, retcode: 1, retmsg: , exit ] [ tstudio_install ] [ERROR]: Failed to exec: ssh -o ConnectTimeout=
[from [tstudio_install] Abort progress due to an error occurs during step above, check logs/tbase_mgr.sh.info.log.2020-12-01 for more details ... ] [ check_progress ] [ERROR]:
解决
测试上述命令,发现docker进程启动失败
[root@tbase01 tbase_mgr]# ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.129 "service docker start && sleep 10 && docker info" < /dev/null
Redirecting to /bin/systemctl start docker.service
Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
[root@cdh tbase_mgr]# systemctl status docker.service
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2020-12-01 20:30:19 CST; 11s ago
Docs: http://docs.docker.com
Process: 5490 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY (code=exited, status=1/FAILURE)
Main PID: 5490 (code=exited, status=1/FAILURE)
Dec 01 20:30:17 cdh systemd[1]: Starting Docker Application Container Engine...
Dec 01 20:30:17 cdh dockerd-current[5490]: time="2020-12-01T20:30:17.953932215+08:00" level=info msg="libcontainerd: new containerd process, pid: 5497"
Dec 01 20:30:18 cdh dockerd-current[5490]: time="2020-12-01T20:30:18.965695547+08:00" level=warning msg="devmapper: Usage of loopback devices ...section."
Dec 01 20:30:19 cdh dockerd-current[5490]: time="2020-12-01T20:30:19.006156809+08:00" level=warning msg="devmapper: Base device already exists...ignored."
Dec 01 20:30:19 cdh dockerd-current[5490]: time="2020-12-01T20:30:19.021358030+08:00" level=fatal msg="Error starting daemon: error initializi...DRIVER>)"
Dec 01 20:30:19 cdh systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Dec 01 20:30:19 cdh systemd[1]: Failed to start Docker Application Container Engine.
Dec 01 20:30:19 cdh systemd[1]: Unit docker.service entered failed state.
Dec 01 20:30:19 cdh systemd[1]: docker.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
journalctl -xe
--详细日志
-- Unit docker-storage-setup.service has begun starting up.
Dec 01 20:38:48 cdh docker-storage-setup[14921]: ERROR: Docker has been previously configured for use with devicemapper graph driver. Not creating a new t
Dec 01 20:38:48 cdh systemd[1]: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Dec 01 20:38:48 cdh docker-storage-setup[14921]: INFO: Docker state can be reset by stopping docker and by removing /var/lib/docker directory. This will d
Dec 01 20:38:48 cdh systemd[1]: Failed to start Docker Storage Setup.
-- Subject: Unit docker-storage-setup.service has failed
Dec 1 20:28:01 cdh docker-storage-setup: ERROR: Docker has been previously configured for use with devicemapper graph driver. Not creating a new thin pool as existing docker metadata will fail to work with it. Manual cleanup is required before this will succeed.
Dec 1 20:28:01 cdh docker-storage-setup: INFO: Docker state can be reset by stopping docker and by removing /var/lib/docker directory. This will destroy existing docker images and containers and all the docker metadata.
Dec 1 20:28:01 cdh systemd: docker-storage-setup.service: main process exited, code=exited, status=1/FAILURE
Dec 1 20:28:01 cdh systemd: Failed to start Docker Storage Setup.
Dec 1 20:28:01 cdh systemd: Unit docker-storage-setup.service entered failed state.
根据上述描述,发现是/var/lib/docker/已经存在,删除就可以。这个情况是我之前安装使用过DOCKER导致的
tbase_mgr]# rm -rf /var/lib/docker/
tbase_mgr]#
tbase_mgr]# ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@192.168.43.129 "service docker start && sleep 10 && docker info" < /dev/null
Redirecting to /bin/systemctl start docker.service
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-253:0-69112218-pool
Pool Blocksize: 65.54 kB