K8 Agent crash: Failed to delete Agent identity

While executing a job in a Kubernetes cluster using the TestProject-recommended YAML, everything works well until the final step of deleting the agent. That step fails and results in a crash, as shown below. Unless I missed something, the problem is that this triggers a retry of the execution: 5 retries at ~1 minute intervals (the job itself runs a short 30-second test). After a while, another batch of 5 retries occurs.

Is there any way to control the retry logic and avoid retries, considering the execution and reporting were successful? Is this coming from Kubernetes or TestProject?

`2022-05-10 07:13:38.072 [ERROR] i.t.a.s.IdentityManager Failed to delete Agent identity at /var/testproject/agent/id.dat`
`java.nio.file.NoSuchFileException: /var/testproject/agent/id.dat`

`at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)`

`at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)`

`at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)`

`at java.base/sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:249)`

`at java.base/sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:105)`

`at java.base/java.nio.file.Files.delete(Files.java:1145)`

`at io.testproject.agent.security.IdentityManager.b(TestProjectAgent:105)`

`at io.testproject.agent.security.IdentityManager.c(TestProjectAgent:357)`

`at io.testproject.agent.h.h(TestProjectAgent:514)`

`at io.testproject.agent.fsm.local.a.T.a(TestProjectAgent:66)`

`at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1807)`

`at java.base/java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1799)`

`at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)`

`at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1016)`

`at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1665)`

`at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1598)`

`at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:177)`

`2022-05-10 07:13:38.072 [INFO ] i.t.a.h                                  All modules stopped.`

`2022-05-10 07:13:38.073 [INFO ] i.t.a.f.l.d                              Agent state machine stopped.`

`2022-05-10 07:13:38.074 [INFO ] i.t.a.Program                            Agent will exit with code: 0`

`2022-05-10 07:13:38.074 [INFO ] i.t.a.Program                            *** AGENT MANAGER - STOP ***`

Any updates on this? I’m moving to Kubernetes soon, and I would expect TestProject not to have this type of issue.

I have not heard anything back regarding this issue. BTW, this issue applies to the TP Agent running in any container/Docker setup.
I’ve worked around it in my EKS cluster by setting the Pod `restartPolicy: Never` to avoid restarts from Kubernetes. The issue does not affect test execution or reporting, so in reality it is a medium-low priority issue.
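
For anyone who wants to try the same workaround, here is a minimal sketch of where `restartPolicy: Never` goes in a bare Pod manifest. The Pod name, container name, and image tag are assumptions for illustration; keep the environment variables and volumes from your own TestProject YAML.

```yaml
# Sketch only: names and image tag below are placeholders, not taken from this thread.
apiVersion: v1
kind: Pod
metadata:
  name: testproject-agent          # hypothetical name
spec:
  # Do not restart the container after the agent exits, whatever the exit code.
  restartPolicy: Never
  containers:
    - name: agent                  # hypothetical name
      image: testproject/agent:latest   # assumed image; use the one from your manifest
      # env, volumeMounts, etc. from your existing manifest go here
```

If the agent runs as a Kubernetes Job rather than a bare Pod, the same field sits under `spec.template.spec`, and the Job's `backoffLimit` controls how many times a failed Pod is retried.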
