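I have recently been using the swap-root-volume method to create persistent spot instances, as described here (approach 2). Normally my spot instance takes 2-5 minutes to come up and complete the swap. On some days, however, the process never finishes (or at least I lose patience after waiting anywhere from 20 minutes to an hour).
To be clear, the instance is created but the swap never happens: I can ssh into the server, but my persistent files are not there. I can also see this in the AWS console, where "spotter" (my persistent storage) shows no attachment information:
Because the swap script I use never reports any errors, it is hard to tell where it fails. So I would like to know whether, based on my screenshot, I can perform the swap "manually" through the AWS EC2 management console and, if so, how.
And, in case it helps, @Vorsprung,
I start the process by running the following script: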
# The config file was created in ondemand_to_spot.sh
export config_file=my.conf
cd "$(dirname ${BASH_SOURCE[0]})"
. ../$config_file || exit -1
export request_id=`../ec2spotter-launch $config_file`
echo Spot request ID: $request_id
echo Waiting for spot request to be fulfilled...
aws ec2 wait spot-instance-request-fulfilled --spot-instance-request-ids $request_id
export instance_id=`aws ec2 describe-spot-instance-requests --spot-instance-request-ids $request_id --query="SpotInstanceRequests[*].InstanceId" --output="text"`
echo Waiting for spot instance to start up...
aws ec2 wait instance-running --instance-ids $instance_id
echo Spot instance ID: $instance_id
echo 'Please allow the root volume swap script a few minutes to finish.'
if [ "x$ec2spotter_elastic_ip" = "x" ]
then
# Non elastic IP
export ip=`aws ec2 describe-instances --instance-ids $instance_id --filter Name=instance-state-name,Values=running --query "Reservations[*].Instances[*].PublicIpAddress" --output=text`
else
# Elastic IP
export ip=`aws ec2 describe-addresses --allocation-ids $ec2spotter_elastic_ip --output text --query 'Addresses[0].PublicIp'`
fi
export name=fast-ai
if [ "$ec2spotter_key_name" = "aws-key-$name" ]
then function aws-ssh-spot {
ssh -i ~/.ssh/aws-key-$name.pem ubuntu@$ip
}
function aws-terminate-spot {
aws ec2 terminate-instances --instance-ids $instance_id
}
echo Jupyter Notebook -- $ip:8888
fi
where my.conf is:
# Name of root volume.
ec2spotter_volume_name=spotter
# Location (zone) of root volume. If not the same as ec2spotter_launch_zone,
# a copy will be created in ec2spotter_launch_zone.
# Can be left blank, if the same as ec2spotter_launch_zone
ec2spotter_volume_zone=us-west-2b
ec2spotter_launch_zone=us-west-2b
ec2spotter_key_name=aws-key-fast-ai
ec2spotter_instance_type=p2.xlarge
# Some instance types require a subnet to be specified:
ec2spotter_subnet=subnet-c9cba8af
ec2spotter_bid_price=0.55
# uncomment and update the value if you want an Elastic IP
# ec2spotter_elastic_ip=eipalloc-64d5890a
# Security group
ec2spotter_security_group=sg-2be79356
# The AMI to be used as the pre-boot environment. This is NOT your target system installation.
# Do Not Modify this unless you have a need for a different Kernel version from what's supplied.
# ami-6edd3078 is ubuntu-xenial-16.04-amd64-server-20170113
ec2spotter_preboot_image_id=ami-bc508adc
and the ec2spotter-launch script is:
#!/bin/bash
# "Phase 1" this is the user-facing script for launching a new spot instance
if [ "$1" = "" ]; then echo "USER ERROR: please specify a configuration file"; exit -1; fi
cd $(dirname $0)
. $1 || exit -1
# New instance:
# Desired launch zone
LAUNCH_ZONE=$ec2spotter_launch_zone
# Region is LAUNCH_ZONE minus the last character
LAUNCH_REGION=$(echo $LAUNCH_ZONE | sed -e 's/.$//')
PUB_KEY=$ec2spotter_key_name
# Existing Volume:
# If no volume zone
if [ "$ec2spotter_volume_zone" = "" ]
then # Use instance zone
ec2spotter_volume_zone=$LAUNCH_ZONE
fi
# Name of volume (find it by name later)
ROOT_VOL_NAME=$ec2spotter_volume_name
# zone of volume (needed if different than instance zone)
ROOT_ZONE=$ec2spotter_volume_zone
# Region is Zone minus the last character
ROOT_REGION=$(echo $ROOT_ZONE | sed -e 's/.$//')
#echo "ROOT_VOL_NAME=${ROOT_VOL_NAME}; ROOT_ZONE=${ROOT_ZONE}; ROOT_REGION=${ROOT_REGION}; "
#echo "LAUNCH_ZONE=${LAUNCH_ZONE}; LAUNCH_REGION=${LAUNCH_REGION}; PUB_KEY=${PUB_KEY}"
AWS_ACCESS_KEY=`aws configure get aws_access_key_id`
AWS_SECRET_KEY=`aws configure get aws_secret_access_key`
aws ec2 describe-volumes \
--filters Name=tag-key,Values="Name" Name=tag-value,Values="$ROOT_VOL_NAME" \
--region ${ROOT_REGION} --output=json > volumes.tmp || exit -1
ROOT_VOL=$(jq -r '.Volumes[0].VolumeId' volumes.tmp)
ROOT_TYPE=$(jq -r '.Volumes[0].VolumeType' volumes.tmp)
#echo "ROOT_TYPE=$ROOT_TYPE; ROOT_VOL=$ROOT_VOL";
if [ "$ROOT_VOL_NAME" = "" ]
then
echo "root volume lacks a Name tag";
exit -1;
fi
cat >user-data.tmp <<EOF
#!/bin/sh
echo AWSAccessKeyId=$AWS_ACCESS_KEY > /root/.aws.creds
echo AWSSecretKey=$AWS_SECRET_KEY >> /root/.aws.creds
apt-get update
apt-get install -y jq
apt-get install -y python-pip python-setuptools
apt-get install -y git
pip install awscli
cd /root
git clone --depth=1 https://github.com/slavivanov/ec2-spotter.git
echo Got spotter scripts from github.
cd ec2-spotter
echo Swapping root volume
./ec2spotter-remount-root --force 1 --vol_name ${ROOT_VOL_NAME} --vol_region ${ROOT_REGION} --elastic_ip $ec2spotter_elastic_ip
EOF
userData=$(base64 user-data.tmp | tr -d '\n');
cat >specs.tmp <<EOF
{
"ImageId" : "$ec2spotter_preboot_image_id",
"InstanceType": "$ec2spotter_instance_type",
"KeyName" : "$PUB_KEY",
"EbsOptimized": true,
"Placement": {
"AvailabilityZone": "$LAUNCH_ZONE"
},
"BlockDeviceMappings": [
{
"DeviceName": "/dev/sda1",
"Ebs": {
"DeleteOnTermination": true,
"VolumeType": "gp2",
"VolumeSize": 128
}
}
],
"NetworkInterfaces": [
{
"DeviceIndex": 0,
"SubnetId": "${ec2spotter_subnet}",
"Groups": [ "${ec2spotter_security_group}" ],
"AssociatePublicIpAddress": true
}
],
"UserData" : "${userData}"
}
EOF
SPOT_REQUEST_ID=$(aws ec2 request-spot-instances --launch-specification file://specs.tmp --spot-price $ec2spotter_bid_price --output="text" --query="SpotInstanceRequests[*].SpotInstanceRequestId" --region ${LAUNCH_REGION})
echo $SPOT_REQUEST_ID
# Clean up
rm user-data.tmp
rm specs.tmp
rm volumes.tmp
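Best answer: This is not an exact answer, but it may help you find a way to debug the problem.
As I understand it, this is the part of your setup, in the ec2spotter-launch script, that is responsible for the volume swap: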
...
cat >specs.tmp <<EOF
{
"ImageId" : "$ec2spotter_preboot_image_id",
...
"UserData" : "${userData}"
}
EOF
SPOT_REQUEST_ID=$(aws ec2 request-spot-instances --launch-specification file://specs.tmp --spot-price $ec2spotter_bid_price --output="text" --query="SpotInstanceRequests[*].SpotInstanceRequestId" --region ${LAUNCH_REGION})
specs.tmp is used as the instance launch specification: --launch-specification file://specs.tmp.
The "UserData" in the launch specification is a script that is also generated in ec2spotter-launch:
cat >user-data.tmp <<EOF
#!/bin/sh
echo AWSAccessKeyId=$AWS_ACCESS_KEY > /root/.aws.creds
echo AWSSecretKey=$AWS_SECRET_KEY >> /root/.aws.creds
apt-get update
...
cd /root
git clone --depth=1 https://github.com/slavivanov/ec2-spotter.git
echo Got spotter scripts from github.
cd ec2-spotter
echo Swapping root volume
./ec2spotter-remount-root --force 1 --vol_name ${ROOT_VOL_NAME} --vol_region ${ROOT_REGION} --elastic_ip $ec2spotter_elastic_ip
EOF
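Since ec2spotter-launch base64-encodes user-data.tmp before placing it in specs.tmp, it may be worth confirming what the instance actually received. The following is only a sketch of that encode/decode round trip; the helper names `encode_user_data` and `decode_user_data` are made up for illustration and are not part of ec2-spotter.

```shell
#!/bin/bash
# Hypothetical helpers (not part of ec2-spotter) that mirror how
# ec2spotter-launch prepares the "UserData" field.

# Encode a script file the way the launch script does:
# base64, with newlines stripped so it fits in a JSON string.
encode_user_data() {
    base64 < "$1" | tr -d '\n'
}

# Decode a base64 UserData string back into the script text.
decode_user_data() {
    printf '%s' "$1" | base64 --decode
}
```

For a running instance, the encoded value can be fetched with `aws ec2 describe-instance-attribute --instance-id <instance-id> --attribute userData --query UserData.Value --output text` and passed to `decode_user_data` to see the script that was supposed to run at boot.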
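The actual work of swapping the root volume is done by the ec2spotter-remount-root script, which is downloaded from GitHub.
That script contains many echo statements, so if you can find where its output ends up, you should be able to see what went wrong.
So when the problem occurs, ssh into the instance and check the log file.
The question is which file to check (and whether the script output is being logged to a file at all).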
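One way to narrow this down is to search the log directory for the messages that the user-data script echoes. This is a minimal sketch, assuming the marker strings below (copied from the echo lines in ec2spotter-launch) appear verbatim in whatever log captures the output; `find_spotter_output` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: list log files under a directory that contain
# one of the messages echoed by the generated user-data script.

find_spotter_output() {
    log_dir="${1:-/var/log}"
    # -r: recurse into subdirectories
    # -l: print only the names of matching files
    # -E: extended regex, so the two markers can be alternated
    # Errors from unreadable files are discarded.
    grep -rlE 'Swapping root volume|Got spotter scripts from github' "$log_dir" 2>/dev/null
}
```

Run as root on the instance, for example `find_spotter_output /var/log`. On Ubuntu images, cloud-init typically writes user-data script output to /var/log/cloud-init-output.log, so that file is a likely first hit, though I have not verified it for this particular AMI.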
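Here is what I suggest trying:
1. Check the standard logs under /var/log that are generated at instance startup (cloud-init.log, syslog, etc.) and see whether you can find the ec2spotter-remount-root output there.
2. Try enabling logging yourself, similar to what is described here.
I would try modifying the user-data.tmp section of ec2spotter-launch like this: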
#!/bin/bash
set -x
exec > >(tee /var/log/user-data.log|logger -t user-data ) 2>&1
echo AWSAccessKeyId=$AWS_ACCESS_KEY > /root/.aws.creds
...
echo Swapping root volume
./ec2spotter-remount-root --force 1 --vol_name ${ROOT_VOL_NAME} --vol_region ${ROOT_REGION} --elastic_ip $ec2spotter_elastic_ip
EOF
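Here I changed the first three lines to enable logging to /var/log/user-data.log.
3. If 1 and 2 don't work, I would try asking the script author on GitHub. Since the script has so many echo statements, the author should know where to look for the output.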
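Hope that helps. You also don't need to wait for the problem to occur before trying this; you can look for the script output on a run that succeeds.
Or, if you are able to do a few test runs, do that and make sure you can locate the log containing the script output.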
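Before spending money on a spot instance, the logging redirect itself can be tried locally. The sketch below is a dry run of the `exec > >(tee ...)` technique with no AWS calls; `run_logged_demo` and the log path argument are hypothetical, and the `logger -t user-data` stage and `set -x` are left out so that nothing is written to syslog.

```shell
#!/bin/bash
# Local dry run of the logging preamble (hypothetical helper name).
# On the instance the log file would be /var/log/user-data.log.

run_logged_demo() {
    log_file="$1"
    # The heredoc delimiter is quoted so that $1 and $log_file are
    # expanded by the inner bash, not by the calling shell.
    bash -s "$log_file" <<'EOF'
log_file="$1"
# From here on, stdout and stderr both go through tee, so they are
# written to the log file and still shown on the original stdout.
exec > >(tee "$log_file") 2>&1
echo "Swapping root volume"
echo "simulated error" >&2
EOF
}
```

Calling `out=$(run_logged_demo /tmp/user-data-test.log)` and then inspecting the file should show both lines, including the one written to stderr. Capturing the output with `$( )` also makes the caller wait until tee has finished writing.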