So over the last couple of weeks, I have been hitting an issue several times. I have jobs, with 1 or 2 tests within, running on docker containers (chrome, firefox) on AWS instances.
This has run pretty much flawlessly until recently where the jobs are failing and it causes the instance to crash and go offline/disconnected state.
I have done a bunch of troubleshooting on my end, for example running through every single test and making sure there are no “auto-heal” occurrences that seem to cause timeouts here and there.
I am also getting jobs crashing that have had zero issues over the last year, which are now all of a sudden hitting this issue. I am posting a couple of screenshots of that most recent failure from this morning. This particular job is one that has worked flawlessly this entire year.
I would love to know how to fix this for consistent test runs.
The normal duration is under 3 minutes.
If I can provide more info please let me know.