Playbooks Usage Guide
1. Hosts and Users
In the playbook YAML file, hosts specifies a host group or a comma-separated list of patterns matching hosts; remote_user specifies the remote user to run as; sudo makes the remote user execute commands with sudo privileges.
Note:
Each task can also define its own remote user, and sudo can be enabled in an individual task while being left off globally.
---
- hosts: webservers
  remote_user: pe
  vars:
    http_port: 80
    max_clients: 200
  # sudo: yes
  tasks:
    - name: Test connectivity
      ping:
      sudo: yes
    - name: Restart the nginx service {{ http_port }}    # once vars are defined globally, they can be referenced anywhere
      template: src=/srv/nginx.j2 dest=/etc/nginx.conf   # sync the nginx configuration file
      # service: name=nginx state=restarted
      sudo: yes            # use sudo within a single task
      sudo_user: supdev    # use sudo to switch to another user
      notify:              # run the handlers below once the file is detected as modified
        - restart nginx    # the "restart nginx" entry is defined under handlers at the end
    - name: Adjust the selinux setting
      command: /sbin/setenforce 0
    - name: Run a command whose failure should be tolerated
      shell: /usr/bin/somecommand || /bin/true   # use this when a successful run may still return a non-zero code
      ignore_errors: True                        # alternatively, ignore_errors on the shell task has the same effect
    - name: Copy a file
      copy: src=/etc/ansible/hosts dest=/etc/ansible/hosts owner=root group=root mode=0644
  handlers:
    - name: restart nginx
      service: name=nginx state=restarted
Note:
If sudo requires a password, add --ask-sudo-pass to the ansible-playbook command line.
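For example (the playbook name is illustrative):
ansible-playbook playbooks.yml --ask-sudo-pass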
2. Task Lists
Note:
1. Each play contains a list of tasks; a task must finish on all of its target hosts before the next task starts.
2. Playbooks execute from top to bottom; if a host fails a task, that host is removed from the rotation for the rest of the playbook.
3. The goal of each task is to execute a module, usually with specific arguments, and variables can be used in those arguments.
Example modules: shell, command, user, template (copy), service, yum, etc., each followed by that module's parameters (a minimal sketch follows this list).
4. Every task should have a name, so that the runtime output clearly identifies what each task is doing.
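For example, a minimal named task that passes a variable to its module might look like this (the pkg variable and the nginx value are illustrative):
vars:
  pkg: nginx
tasks:
  - name: Install {{ pkg }} with yum
    yum: name={{ pkg }} state=present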
3. Handlers: Operations Run on Change
When a task changes a configuration file, the handlers listed under notify are triggered once the change is detected:
- name: template configuration file
  template: src=template.j2 dest=/etc/foo.conf
  notify:
    - restart memcached
    - restart apache
handlers:
  - name: restart memcached
    service: name=memcached state=restarted
  - name: restart apache
    service: name=apache state=restarted
Note:
Handlers run in the order in which they are declared. The best use cases for handlers are restarting services or triggering a system reboot.
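A reboot-style handler could be sketched like this (the task and handler names are illustrative; the command module is used because this generation of Ansible had no dedicated reboot module):
tasks:
  - name: update kernel
    yum: name=kernel state=latest
    notify: reboot server
handlers:
  - name: reboot server
    command: /sbin/shutdown -r now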
4. Running a Playbook
ansible-playbook playbooks.yml -f 10   # run ansible in parallel with 10 forks
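Two companion flags that are often useful before a real run (the playbook name is illustrative):
ansible-playbook playbooks.yml --syntax-check   # validate the playbook without executing it
ansible-playbook playbooks.yml --check          # dry run: report what would change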
5. Using Ansible-Pull (pull-mode configuration)
Ansible-pull is a small script that checks out a repo of configuration instructions from git and then runs ansible-playbook against those instructions.
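A minimal invocation might look like the following (the repository URL and playbook name are placeholders):
ansible-pull -U https://example.com/ansible-config.git local.yml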
6. Tips and Tricks
When running playbooks, use the --verbose flag if you want to see the output of modules that execute successfully (otherwise only failed modules produce output).
To see which hosts a playbook would affect before executing it, you can do:
ansible-playbook playbook.yml --list-hosts
Appendix: Playbook Case Studies
1. Using playbooks to adjust an application's JVM-related settings
Directory layout:
sh-4.1$ tree
.
├── playbooks.yml
├── start.sh.j2
├── stop.sh.j2
└── vars.yml
playbooks.yml configuration:
---
# file: playbooks.yml
- hosts: local
  # remote_user: pe
  # sudo: yes
  vars:
    service: "Nginx service"
  vars_files:
    - vars.yml
  tasks:
    - name: "{{ service }} connectivity test {{ ansible_date_time.iso8601 }}"
      ping:
    - name: Update the tomcat startup configuration
      remote_user: pe
      sudo: yes
      template:
        # src: "start.sh.j2"
        # dest: "/tmp/start{{ ansible_date_time.iso8601_basic }}.sh"
        # src: "stop.sh.j2"
        # dest: "/tmp/stop{{ ansible_date_time.iso8601_basic }}.sh"
        src: "{{ item.src }}"
        dest: "{{ item.dest }}"
        owner: admin
        group: admin
        mode: 0755
      with_items:
        - { src: "start.sh.j2", dest: "/tmp/start{{ ansible_date_time.iso8601 }}.sh" }
        - { src: "stop.sh.j2", dest: "/tmp/stop{{ ansible_date_time.iso8601 }}.sh" }
Variable definitions file vars.yml:
---
# define tomcat_version
tomcat_version: tomcat6.0.33
# define jdk_version
jdk_version: jdk1.6.0_25
# define app_name
app_name: xxbandy.test.local
# define server_id
server_id: 1
start.sh.j2 template file:
#!/bin/bash
#chown 555 -R /export/home/tomcat/domains/
export CATALINA_HOME=/export/servers/{{ tomcat_version }}
export CATALINA_BASE=/export/Domains/{{ app_name }}/server{{ server_id }}
export CATALINA_PID=$CATALINA_BASE/work/catalina.pid
export LANG=zh_CN.UTF-8
###JAVA
export JAVA_HOME=/export/servers/{{ jdk_version }}
export JAVA_BIN=/export/servers/{{ jdk_version }}/bin
export PATH=$JAVA_BIN:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/bin
export CLASSPATH=.:/lib/dt.jar:/lib/tools.jar
export JAVA_OPTS="-Djava.library.path=/usr/local/lib -server -Xms2048m -Xmx2048m -XX:MaxPermSize=512m -XX:+UnlockExperimentalVMOptions -Djava.awt.headless=true -Dsun.net.client.defaultConnectTimeout=60000 -Dsun.net.client.defaultReadTimeout=60000 -Djmagick.systemclassloader=no -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.ttl=300 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CATALINA_BASE/logs -XX:ErrorFile=$CATALINA_BASE/logs/java_error_%p.log"
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
$CATALINA_HOME/bin/startup.sh -config $CATALINA_BASE/conf/server.xml
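stop.sh.j2 is listed in the directory tree but not shown in the original notes; a minimal sketch, assuming it mirrors start.sh and calls Tomcat's shutdown.sh with the same environment, might be:
#!/bin/bash
export CATALINA_HOME=/export/servers/{{ tomcat_version }}
export CATALINA_BASE=/export/Domains/{{ app_name }}/server{{ server_id }}
export CATALINA_PID=$CATALINA_BASE/work/catalina.pid
export JAVA_HOME=/export/servers/{{ jdk_version }}
# -force falls back to killing the JVM via CATALINA_PID if graceful shutdown fails
$CATALINA_HOME/bin/shutdown.sh -force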
2. Using playbooks to update the telegraf configuration on docker monitoring clients
telegraf.conf template file (telegraf.j2):
[global_tags]
dc = "bigdata-1"
# dc = "us-east-1" # will tag all metrics with dc=us-east-1
# rack = "1a"
## Environment variables can be used as tags, and throughout the config file
# user = "$USER"
[agent]
## Default data collection interval for all inputs
# collection interval
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
# round collections to the interval boundary
round_interval = true
## Telegraf will send metrics to outputs in batches of at
## most metric_batch_size metrics.
# batch size of metrics sent to each output
metric_batch_size = 1000
## For failed writes, telegraf will cache metric_buffer_limit metrics for each
## output, and will flush this buffer on a successful write. Oldest metrics
## are dropped first when this buffer fills.
# buffer size cached per output
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
# collection jitter, so that multiple inputs do not all queue data at the same moment
collection_jitter = "0s"
## Default flushing interval for all outputs. You shouldn't set this below
## interval. Maximum flush_interval will be flush_interval + flush_jitter
# default interval for flushing data to outputs (at most flush_interval + flush_jitter)
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
# jitter added to the flush interval
flush_jitter = "0s"
## By default, precision will be set to the same timestamp order as the
## collection interval, with the maximum being 1s.
## Precision will NOT be used for service inputs, such as logparser and statsd.
## Valid values are "ns", "us" (or "µs"), "ms", "s".
precision = ""
## Run telegraf in debug mode
debug = false
## Run telegraf in quiet mode
quiet = false
## Override default hostname, if empty use os.Hostname()
hostname = ""
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false
[[outputs.influxdb]]
## The full HTTP or UDP endpoint URL for your InfluxDB instance.
## Multiple urls can be specified as part of the same cluster,
## this means that only ONE of the urls will be written to each interval.
# urls = ["udp://localhost:8089"] # UDP endpoint example
urls = ["http://10.0.0.1:8086"] # required
## The target database for metrics (telegraf will create it if not exists).
database = "bigdata" # required
## Retention policy to write to. Empty string writes to the default rp.
retention_policy = ""
## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
write_consistency = "any"
## Write timeout (for the InfluxDB client), formatted as a string.
## If not provided, will default to 5s. 0s means no timeout (not recommended).
timeout = "5s"
# username = "telegraf"
# password = "metricsmetricsmetricsmetrics"
## Set the user agent for HTTP POSTs (can be useful for log differentiation)
# user_agent = "telegraf"
## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
# udp_payload = 512
## Optional SSL Config
# ssl_ca = "/etc/telegraf/ca.pem"
# ssl_cert = "/etc/telegraf/cert.pem"
# ssl_key = "/etc/telegraf/key.pem"
## Use SSL but skip chain & host verification
# insecure_skip_verify = false
[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## Comment this line if you want the raw CPU time metrics
fielddrop = ["time_*"]
[[inputs.disk]]
## By default, telegraf gather stats for all mountpoints.
## Setting mountpoints will restrict the stats to the specified mountpoints.
mount_points = ["/export"]
fieldpass = ["inodes*"]
## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
## present on /run, /var/run, /dev/shm or /dev).
[[inputs.diskio]]
## By default, telegraf will gather stats for all devices including
## disk partitions.
## Setting devices will restrict the stats to the specified devices.
# devices = ["sda", "sdb"]
## Uncomment the following line if you need disk serial numbers.
# skip_serial_number = false
[[inputs.kernel]]
# no configuration
[[inputs.mem]]
# no configuration
[[inputs.processes]]
# no configuration
[[inputs.swap]]
# no configuration
[[inputs.system]]
# no configuration
[[inputs.docker]]
endpoint = "tcp://127.0.0.1:5256"
container_names = []
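Before rolling the template out, the rendered config can be sanity-checked on a host (a hedged example; telegraf's -test flag runs the configured inputs once and prints the gathered metrics instead of sending them):
telegraf -config /etc/telegraf/telegraf.conf -test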
The playbook that deploys the configuration and restarts telegraf:
---
# file: playbooks.yml
- hosts: bigdata
  remote_user: root
  vars:
    service: "dockers telegraf update"
  tasks:
    - name: "{{ service }} connectivity test {{ ansible_date_time.iso8601 }}"
      ping:
  tasks:   # note: this duplicate tasks key overrides the list above, so the ping task never runs (see the warning in the output below); both tasks belong in a single tasks list
    - name: "{{ service }} - update the config file"
      template:
        src: "telegraf.j2"
        dest: "/etc/telegraf/telegraf.conf"
      notify: restart telegraf
  handlers:
    - name: restart telegraf
      service: name=telegraf state=restarted
Execution result:
sh-4.2# ansible-playbook telegraf.yml
[WARNING]: While constructing a mapping from /export/ansible/telegraf.yml, line 3, column 3, found a duplicate dict key (tasks). Using last
defined value only.
PLAY [bigdata] *****************************************************************
TASK [setup] *******************************************************************
ok: [10.0.0.1]
ok: [10.0.0.2]
ok: [10.0.0.3]
ok: [10.0.0.4]
ok: [10.0.0.5]
TASK [dockers telegraf update - update the config file] ***********************
ok: [10.0.0.1]
ok: [10.0.0.2]
ok: [10.0.0.3]
ok: [10.0.0.4]
ok: [10.0.0.5]
PLAY RECAP *********************************************************************
10.0.0.1 : ok=2 changed=0 unreachable=0 failed=0
10.0.0.2 : ok=2 changed=0 unreachable=0 failed=0
10.0.0.3 : ok=2 changed=0 unreachable=0 failed=0
10.0.0.4 : ok=2 changed=0 unreachable=0 failed=0
10.0.0.5 : ok=2 changed=0 unreachable=0 failed=0
Because the configuration file had not been modified, the subsequent restart telegraf handler was not triggered.
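If the service does need to be restarted even though the template is unchanged, one option is an ad-hoc service call against the same group (a sketch, not part of the recorded run above):
ansible bigdata -m service -a "name=telegraf state=restarted"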