I'd like to find a system for keeping track of multiple AWS accounts: somewhere around 130+ accounts, each containing around 200+ servers.
I want to know methods to keep track of machine failures, service failures, etc.
I also want to know methods for automatically bringing up a machine if the underlying hardware fails or the machine is terminated while running as a Spot Instance.
I'm open to all solutions, including Chef/Terraform automation, healing scripts, etc.
You guys will be saving me a lot of sleepless nights :)
Thanks in advance!!
Manage multiple AWS accounts
Asked by Shardool Singh

There are 2 answers

Answer by Tim:
AWS Organizations is useful for managing many accounts centrally. You should also look at a multi-account billing strategy and a security strategy. Keeping your IAM users in a single shared-services account, and having them assume roles into the other accounts, will make things easier.
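A minimal sketch of the shared-services pattern, assuming you have created a role in every member account whose trust policy allows principals from the shared-services account to assume it. The role name `SharedServicesReadOnly` and the session-name prefix `ops-` are hypothetical; `sts_client` is expected to be `boto3.client("sts")` authenticated in the shared-services account.

```python
from typing import Any, Dict

# Hypothetical role name: create a role like this in each member account and
# trust the shared-services account in its trust policy.
DEFAULT_ROLE_NAME = "SharedServicesReadOnly"


def role_arn(account_id: str, role_name: str = DEFAULT_ROLE_NAME) -> str:
    """Build the ARN of an IAM role that lives in a member account."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def credentials_to_session_kwargs(assume_role_response: Dict[str, Any]) -> Dict[str, str]:
    """Map an STS AssumeRole response onto boto3.Session keyword arguments."""
    creds = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def session_kwargs_for(sts_client: Any, account_id: str,
                       role_name: str = DEFAULT_ROLE_NAME) -> Dict[str, str]:
    """Assume the role in account_id and return temporary-credential kwargs."""
    response = sts_client.assume_role(
        RoleArn=role_arn(account_id, role_name),
        RoleSessionName=f"ops-{account_id}",  # hypothetical naming scheme
    )
    return credentials_to_session_kwargs(response)
```

Usage would look like `boto3.Session(**session_kwargs_for(boto3.client("sts"), "123456789012")).client("ec2")`, repeated for each of the 130+ account IDs. The temporary credentials expire, so a long-running tool would need to re-assume the role periodically.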
Regarding tracking failures, you can set up automatic instance recovery using CloudWatch. CloudWatch alarms can also email you (through an SNS topic) when something you don't expect happens, though setting them up individually for every instance could be time-consuming. At your scale, I think you should look into third-party tools.
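The "set them up individually" problem can be scripted. A minimal sketch, assuming `cloudwatch_client` is `boto3.client("cloudwatch")` in the same region as the instances: it creates one alarm per instance on the `StatusCheckFailed_System` metric whose action is EC2's recover action, optionally also notifying an SNS topic you subscribe your email to. The alarm naming scheme is hypothetical, and the recover action only works for instance types and configurations that support it, so check the EC2 documentation for your fleet.

```python
from typing import Any, Dict, Iterable, List, Optional


def recovery_alarm_kwargs(instance_id: str, region: str,
                          sns_topic_arn: Optional[str] = None) -> Dict[str, Any]:
    """Build put_metric_alarm arguments that recover an instance after a failed system check."""
    actions: List[str] = [f"arn:aws:automate:{region}:ec2:recover"]
    if sns_topic_arn:
        # Also notify an SNS topic; an email subscription on it gives you the alert.
        actions.append(sns_topic_arn)
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


def create_recovery_alarms(cloudwatch_client: Any, instance_ids: Iterable[str], region: str,
                           sns_topic_arn: Optional[str] = None) -> int:
    """Create (or overwrite, since names are reused) one alarm per instance; return the count."""
    count = 0
    for instance_id in instance_ids:
        cloudwatch_client.put_metric_alarm(
            **recovery_alarm_kwargs(instance_id, region, sns_topic_arn))
        count += 1
    return count
```

Running this per account and region (for example from a scheduled job) keeps newly launched instances covered without clicking through the console.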
This is purely my take on implementing your problem statement.
1) For managing and keeping track of multiple AWS accounts, you can use AWS Organizations. It lets you centrally manage all 130+ accounts from a single management (root) account, and you can enable consolidated billing as well.
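As a starting point for central tracking, a minimal sketch that enumerates member accounts, assuming `org_client` is `boto3.client("organizations")` running in the management account (or a delegated administrator account). `ListAccounts` is paginated, so the sketch uses the paginator and keeps only accounts whose `Status` is `ACTIVE`.

```python
from typing import Any, Dict, Iterable, Iterator, List


def active_accounts(pages: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, str]]:
    """Filter ListAccounts response pages down to ACTIVE accounts (id and name only)."""
    for page in pages:
        for account in page.get("Accounts", []):
            if account.get("Status") == "ACTIVE":
                yield {"Id": account["Id"], "Name": account["Name"]}


def list_active_accounts(org_client: Any) -> List[Dict[str, str]]:
    """Return every ACTIVE member account in the organization."""
    paginator = org_client.get_paginator("list_accounts")
    return list(active_accounts(paginator.paginate()))
```

The resulting list is a convenient driver for any per-account job (inventory, alarms, reports), so new accounts are picked up automatically instead of being maintained in a hand-written list.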
2) As far as keeping track of failures goes, you may need to customize this according to your requirements. For example, you can build a microservice on top of Docker containers or ECS whose sole purpose is to keep track of failures, generate a report and push it to S3 on a daily basis. You can then create a dashboard from these reports in S3 using AWS QuickSight. Another microservice could rectify the failures. It just depends on how exhaustive and fine-grained you want your implementation to be.
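The core of such a failure-tracking service might look like the sketch below, assuming `ec2_client` and `s3_client` are boto3 EC2 and S3 clients for one account. It reads EC2 status checks via `describe_instance_status` (with `IncludeAllInstances=True`), keeps instances whose system or instance check is `impaired`, and writes a CSV to S3. The bucket, the `failure-reports/<date>/<account>.csv` key layout and the CSV columns are hypothetical choices; date-prefixed keys are just one convenient layout for pointing QuickSight (for example through Athena) at the reports.

```python
import csv
import io
from datetime import date
from typing import Any, Dict, Iterable, List, Optional

FIELDS = ["account_id", "instance_id", "az", "state", "system_status", "instance_status"]


def failing_instances(statuses: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep items from DescribeInstanceStatus whose system or instance check is impaired."""
    rows = []
    for s in statuses:
        system = s.get("SystemStatus", {}).get("Status", "unknown")
        instance = s.get("InstanceStatus", {}).get("Status", "unknown")
        if "impaired" in (system, instance):
            rows.append({
                "instance_id": s["InstanceId"],
                "az": s.get("AvailabilityZone", ""),
                "state": s.get("InstanceState", {}).get("Name", ""),
                "system_status": system,
                "instance_status": instance,
            })
    return rows


def to_csv(account_id: str, rows: List[Dict[str, str]]) -> str:
    """Render the failing instances of one account as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({"account_id": account_id, **row})
    return buf.getvalue()


def push_daily_report(ec2_client: Any, s3_client: Any, account_id: str,
                      bucket: str, day: Optional[date] = None) -> str:
    """Collect status checks for one account, upload the CSV report, return its S3 key."""
    day = day or date.today()
    statuses: List[Dict[str, Any]] = []
    paginator = ec2_client.get_paginator("describe_instance_status")
    for page in paginator.paginate(IncludeAllInstances=True):
        statuses.extend(page["InstanceStatuses"])
    key = f"failure-reports/{day.isoformat()}/{account_id}.csv"  # hypothetical layout
    body = to_csv(account_id, failing_instances(statuses)).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body, ContentType="text/csv")
    return key
```

Service-level failures (a process dying while the instance stays healthy) do not show up in EC2 status checks, so those would need their own probe feeding the same report.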
3) Replacing Spot Instances when they are terminated can be achieved with a simple Auto Scaling configuration. Here are some articles you may want to go through, which will give you some ideas:
Using Spot Instances with On-Demand instances
Optimizing Spot Fleet+Docker with High Availability
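To make the Auto Scaling idea concrete, a minimal sketch that builds arguments for `boto3.client("autoscaling").create_auto_scaling_group`, assuming a launch template already exists. It uses a mixed instances policy: an On-Demand base plus Spot for the rest, several instance types to widen the Spot pools, and `CapacityRebalance` so Auto Scaling can launch a replacement when a Spot Instance receives a rebalance recommendation. The group is fixed-size (min = max = desired) purely for illustration; the names, subnet IDs and instance types in the usage are hypothetical.

```python
from typing import Any, Dict, List


def spot_asg_kwargs(name: str, launch_template_name: str, subnet_ids: List[str],
                    instance_types: List[str], desired: int,
                    on_demand_base: int = 1, on_demand_pct_above_base: int = 0) -> Dict[str, Any]:
    """Arguments for a self-healing Auto Scaling group mixing On-Demand and Spot capacity.

    With EC2 health checks, an instance that stops running or fails its status
    checks is marked unhealthy and replaced to get back to DesiredCapacity.
    """
    if not instance_types:
        raise ValueError("need at least one instance type")
    return {
        "AutoScalingGroupName": name,
        # Fixed-size group for illustration: it only heals, it does not scale.
        "MinSize": desired,
        "MaxSize": desired,
        "DesiredCapacity": desired,
        "VPCZoneIdentifier": ",".join(subnet_ids),  # spread across AZs
        "HealthCheckType": "EC2",
        "HealthCheckGracePeriod": 300,
        "CapacityRebalance": True,
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # Several instance types give Auto Scaling more Spot pools to draw from.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": on_demand_base,
                "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
                "SpotAllocationStrategy": "capacity-optimized",
            },
        },
    }
```

For example, `spot_asg_kwargs("web", "web-lt", ["subnet-a", "subnet-b"], ["m5.large", "m5a.large"], desired=4)` keeps one On-Demand instance and three Spot Instances running. The same structure maps fairly directly onto Terraform's `aws_autoscaling_group` resource if you prefer to manage it there.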