I am running a script on AWS Batch that fetches 100,000+ domains.
- The script runs in a Docker container.
- The domains are randomized and stored in a Redis queue.
- The script pulls 20,000 domains from the queue, processes them, and writes the results back to Redis (rough sketch below).
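For context, the worker loop is roughly the sketch below. This is only illustrative: the Redis host, the queue names, and the `check_domain` logic are assumptions, not the actual script.

```python
import socket
import redis

BATCH_SIZE = 20000

# Assumed names: "domains" is the input queue, "good_domains" the results list.
r = redis.Redis(host="my-redis-host", port=6379, decode_responses=True)

def check_domain(domain):
    # Placeholder check: treat a domain as "good" if it resolves.
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:
        return False

def process_batch():
    # Pull up to BATCH_SIZE domains off the queue.
    domains = [d for d in (r.lpop("domains") for _ in range(BATCH_SIZE)) if d]
    good = [d for d in domains if check_domain(d)]
    if good:
        # Write the results back to Redis.
        r.rpush("good_domains", *good)
    return len(domains), len(good)
```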
When I run the script on an EC2 instance that I launched myself, I get:
AMI: Custom
EBS Optimized: True
Time Elapsed : 2007.7330884933472
Good Domains : 53517
Processed Domains: 240000
When I run it on an EC2 instance spun up by Batch, I get:
AMI: Amazon Linux AMI 2017.03.e x86_64 ECS HVM
EBS Optimized: False
Time Elapsed : 2313.34757232666
New Domains : 51243
Processed Domains: 400000
Are the instances that AWS Batch spins up throttling my connection? Because I am using Docker, I can't think of why the results would be different other than bandwidth issues. The Docker image is stored in ECR, pulled down, and then the script is run.
I have run this test over millions of randomized domains and the results are the same, so statistically speaking it's not down to the sampling of domains either. Also, the good-domain rate is roughly 2x higher on an instance I spin up myself than on one launched by Batch.
UPDATE 1: One difference: EBS Optimized is True on one instance and False on the other, but I can't seem to change that with AWS Batch.
UPDATE 2: Tested with EBS Optimized set to False on a machine with lower specs and it still can't account for the lower network performance. Maybe it's the AMI?
UPDATE 3: I tested the AMI and that could be the problem; ami-c6f81abe is the one used by Batch. Not sure why yet.
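To test a different AMI with Batch, the compute environment can point at a custom `imageId`. A hedged boto3 sketch, where the instance types, role names, subnet, security group, and AMI ID are all placeholders:

```python
import boto3

batch = boto3.client("batch")

# Sketch only: IDs and role ARNs below are placeholders, not real resources.
batch.create_compute_environment(
    computeEnvironmentName="domain-fetcher-custom-ami",
    type="MANAGED",
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 16,
        "desiredvCpus": 0,
        "instanceTypes": ["m4.large"],
        "imageId": "ami-0123456789abcdef0",  # custom ECS-compatible AMI to test
        "subnets": ["subnet-01234567"],
        "securityGroupIds": ["sg-01234567"],
        "instanceRole": "ecsInstanceRole",
    },
)
```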
UPDATE 4: Turns out it was the ulimit parameter on the jobDefinition that was causing my problem.
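For anyone hitting the same thing: the per-container open-files limit can be raised via `ulimits` in the job definition's `containerProperties`. A minimal boto3 sketch, where the job name, image URI, resources, and limit values are only illustrative:

```python
import boto3

batch = boto3.client("batch")

# Sketch only: name, image, and resource values are illustrative.
batch.register_job_definition(
    jobDefinitionName="domain-fetcher",
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/domain-fetcher:latest",
        "vcpus": 4,
        "memory": 8192,
        "command": ["python", "fetch_domains.py"],
        "ulimits": [
            # Raise the open file descriptor limit so a large number of
            # concurrent connections isn't starved inside the container.
            {"name": "nofile", "softLimit": 65536, "hardLimit": 65536},
        ],
    },
)
```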