Conversation

Contributor

@surajitshil-03 surajitshil-03 commented Jan 13, 2026

Context

Add a delay in the agent listener code before retrying a request when the server throws a retriable exception.

Related work-item: AB#2338439


Description

When the server is unavailable (as opposed to an authentication error), it returns a different exception (e.g., VssServiceResponseException with status code 404), and the agent retries the request indefinitely. Each retry invokes the OAuth token provisioning mechanism on the server (the server checks for a cached token and creates one if none exists). This behavior significantly increases the load on an already unavailable server.
We have therefore increased the backoff delay in the agent code before retrying requests, implementing an exponential backoff capped at 5 minutes. All changes are behind a feature flag so that the behavior can be controlled.
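The delay described above can be sketched as follows. This is a minimal illustrative sketch, not the PR's actual code (the agent is written in C#, and the base delay and growth factor here are assumptions; only the 5-minute cap comes from the description):

```python
# Hypothetical sketch of an exponential backoff capped at 5 minutes.
# The real agent code is C#; the base delay and doubling factor below
# are illustrative assumptions, not values taken from the PR.

MAX_DELAY_SECONDS = 300  # cap from the PR description (5 minutes)
BASE_DELAY_SECONDS = 1   # assumed starting delay

def backoff_delay(consecutive_errors: int) -> int:
    """Return the delay (seconds) before the next retry.

    Grows exponentially with each consecutive error and is capped
    at 5 minutes. With these assumed values the cap is reached
    around the 8th-9th consecutive error, similar to the behaviour
    shown in the testing screenshots below.
    """
    return min(BASE_DELAY_SECONDS * 2 ** consecutive_errors, MAX_DELAY_SECONDS)
```

With this shape, early failures retry quickly while a prolonged outage settles into one request every five minutes instead of a tight retry loop.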


Risk Assessment (Low / Medium / High)

Low


Unit Tests Added or Updated (Yes / No)

Yes. Added L0 test cases for the delay behaviour in three methods (CreateSessionAsync, GetNextMessageAsync, and KeepAlive), covering both states of the new feature flag.


Additional Testing Performed

Manually tested by connecting the agent to devfabric and then stopping the TFS devfabric web service. The agent delayed its requests based on the consecutive error count.

For the main listener loop:

  • Exponential backoff delay for attempts < 8: [screenshot]
  • When the attempt number is >= 8, the delay is capped at 5 minutes (300 seconds): [screenshot]

For the Keep Alive call:

  • Exponential backoff delay for attempts < 8: [screenshot]
  • When the attempt number is >= 8, the delay is capped at 5 minutes (300 seconds): [screenshot]

For the Create Session call (when there is a session conflict):

  • Exponential backoff delay for attempts < 8: [screenshot]
  • As soon as the retry limit is reached, session creation attempts stop: [screenshot]

Change Behind Feature Flag (Yes / No)

No


Tech Design / Approach

This change reduces the load on the server in scenarios where the agent would otherwise make continuous back-to-back requests.


Documentation Changes Required (Yes/No)

No


Logging Added/Updated (Yes/No)

NA


Telemetry Added/Updated (Yes/No)

NA


Rollback Scenario and Process (Yes/No)

NA


Dependency Impact Assessed and Regression Tested (Yes/No)

NA

@surajitshil-03
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@surajitshil-03 surajitshil-03 changed the title Adding a delay in the listener when it retries to get connection Adding a delay in the listener when there is exception in the connection Jan 13, 2026
@surajitshil-03 surajitshil-03 marked this pull request as ready for review January 13, 2026 12:35
@surajitshil-03 surajitshil-03 requested review from a team as code owners January 13, 2026 12:35
@surajitshil-03 surajitshil-03 changed the title Adding a delay in the listener when there is exception in the connection Enhancing the delay in the listener when there is exception in the connection Jan 13, 2026
Contributor

@rajmishra1997 rajmishra1997 left a comment
Suggestion: for a high number of consecutive error responses from the server, can we add a log/warning indicating that continuous error responses are being received?


@surajitshil-03 surajitshil-03 force-pushed the users/surajitshil/retryLogic branch from ad39d51 to 16a1ba3 Compare January 27, 2026 09:12

@rishabhmalikMS
Contributor

Please explore how we can simplify the retry logic, which is currently spread out. If possible, we could have a common function for calculating the retry delay, which right now appears in multiple places in the message listener.
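One way to address this suggestion is to centralise the delay calculation in a single helper shared by all the call sites. This is a hypothetical sketch (the class and method names are invented, the agent is C#, and the jitter is an added assumption, not something in the PR):

```python
import random

MAX_DELAY_SECONDS = 300  # 5-minute cap from the PR description

class RetryDelayCalculator:
    """Hypothetical shared helper so the backoff formula lives in one
    place instead of being duplicated across CreateSessionAsync,
    GetNextMessageAsync, and KeepAlive."""

    def __init__(self, base_seconds: float = 1.0):
        self.base_seconds = base_seconds
        self.consecutive_errors = 0

    def record_failure(self) -> float:
        """Register a failed call and return the delay before retrying."""
        self.consecutive_errors += 1
        delay = min(self.base_seconds * 2 ** self.consecutive_errors,
                    MAX_DELAY_SECONDS)
        # Jitter avoids many agents retrying in lockstep after a shared
        # outage (an illustrative addition, not part of this PR).
        return delay * random.uniform(0.8, 1.0)

    def record_success(self) -> None:
        """Reset the counter once the server responds normally."""
        self.consecutive_errors = 0
```

Each listener method would then call `record_failure()` in its catch block and `record_success()` after a successful response, keeping the cap and growth factor in one place.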

@raujaiswal
Contributor

raujaiswal commented Jan 29, 2026

We are not enabling the FF for RM in this PR; could you please check whether we need to? https://dev.azure.com/mseng/AzureDevOps/_git/AzureDevOps/pullrequest/891094?_a=files

@surajitshil-03 surajitshil-03 force-pushed the users/surajitshil/retryLogic branch from 2ab1629 to 6be937b Compare February 2, 2026 04:32

@surajitshil-03 surajitshil-03 force-pushed the users/surajitshil/retryLogic branch from b7977f4 to 05c42b5 Compare February 2, 2026 05:10

