Why Does Throttling Happen?
Now that a good majority of organizations have at least a portion of their systems in the cloud, what does that mean for your Office 365 apps?
First, we need to understand how these new workloads are architected. At the end of the day, underneath all of that fancy “cloud talk,” data and applications still reside on a server somewhere. The big difference is that instead of managing your own servers—scaling up and down as needed—Office 365 is managed by Microsoft. The key goal of Microsoft’s engineers and systems is to make sure the end user does not experience any interruption or performance issues in those top layer applications; they do care the most about their customers, after all!
Another thing to consider is that throttling is normal and is merely a result of Office 365 ensuring that it can keep the system healthy and fast for its users.
Regardless of which entry point into Office 365 we use (Graph, CSOM, etc.), those calls are turned into SQL queries or calls to lower-level infrastructure, which can quickly ramp up CPU usage on the servers handling them.
These calls are unpredictable and hard to plan for on the database side, since Office 365 doesn’t necessarily know when users are going to perform an action like a full backup or a security search that needs to touch almost every object in SharePoint. Office 365’s way of protecting itself is to “throttle” the traffic, which means temporarily rejecting some of these calls. These show up as failed calls (HTTP 429 “Too many requests” or 503 “Server too busy” responses). Without this mechanism, servers could easily become unresponsive or even crash!
Though high call volume on your own tenant is typically the culprit, it is not the only cause of throttling. Another cause can be an overload of tenants sharing the same Office 365 infrastructure, an issue that can lead to crowded servers. For instance, what if another tenant on the same infrastructure is undergoing a migration while you’re attempting to run a backup? This is known as the “noisy neighbor problem.”
Best Practices to Prevent Throttling
As an ISV (independent software vendor), AvePoint has made many changes to our products to ensure best practices wherever possible in the new world of interacting with Office 365. Things such as App Profile, Dynamic Object Registration (DOR), etc. have all been taken into account. In addition, all of our calls are “decorated” with ISV tags to help Office 365 know where our traffic is coming from. (Note that decoration of calls is only valuable after you have been throttled and are troubleshooting logs so, while useful, this does not help much to avoid throttling itself.)
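To make “decoration” a little more concrete: as we understand Microsoft’s published guidance for SharePoint Online, traffic can be identified with a User-Agent header shaped like ISV|CompanyName|AppName/Version (or NONISV|... for in-house tools). The snippet below is only a sketch of building such a string; the function name and the “Contoso” and “BackupTool” values are made up for illustration and are not taken from any AvePoint product.

```python
def build_user_agent(company: str, app: str, version: str, is_isv: bool = True) -> str:
    """Build a decorated User-Agent string (hypothetical helper, illustration only).

    Assumes the 'ISV|CompanyName|AppName/Version' convention; check Microsoft's
    current guidance before relying on the exact format.
    """
    for part in (company, app, version):
        # '|' and '/' are the separators, so they must not appear inside a field.
        if not part or "|" in part or "/" in part:
            raise ValueError(f"invalid User-Agent field: {part!r}")
    prefix = "ISV" if is_isv else "NONISV"
    return f"{prefix}|{company}|{app}/{version}"


# Example header dictionary to pass to whichever HTTP client you use.
headers = {"User-Agent": build_user_agent("Contoso", "BackupTool", "1.0")}
```

As noted above, a header like this mostly helps when reading logs after the fact; it does not change how much traffic you send.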
Most of our products also have built-in retry logic. This means that when Office 365 signals us to wait, our products will wait before trying again once they have the go-ahead (as per Microsoft’s best practice recommendations).
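For readers building their own integrations, here is a rough sketch of what retry logic of this kind can look like. It is not our product code. It assumes a hypothetical send callable that returns a status code plus a headers mapping, treats 429 and 503 as throttling, honors a Retry-After value given in whole seconds, and otherwise falls back to exponential backoff with a little jitter. All names and default values are illustrative.

```python
import random
import time
from typing import Callable, Mapping, Tuple

THROTTLE_STATUSES = {429, 503}          # assumed throttling signals
Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and back off whenever the service asks us to wait."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers = send()
        # Return on success, on non-throttling errors, or when out of attempts.
        if status not in THROTTLE_STATUSES or attempt >= max_attempts:
            return status, headers
        # Header lookup is case-sensitive here; normalize keys if your client does not.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.strip().isdigit():
            delay = float(retry_after)  # the server told us how long to wait
        else:
            # No usable hint: exponential backoff, capped, plus up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, delay * 0.1)
        sleep(delay)
        attempt += 1
```

The sleep parameter is injectable purely so the behavior can be exercised without actually waiting. The important design point is that the server’s own Retry-After hint takes priority over any locally computed delay.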
That being said, there are still a few things that remain configurable in our tools that can be proactively optimized to avoid throttling!
These best practices include:
- Off hours are key! This may seem like a no-brainer, but traffic is typically much higher during the working day as end users are interacting with the system.
- Limiting scope and settings (like the maximum number of versions being backed up or restored at one time) can drastically reduce our calls and lessen the chance of being throttled. Only act on the areas of the system that need it; try to cast a smaller net.
- Work with our Cloud Ops team! Our cloud operations team can reduce or cap the number of threads your jobs use to prevent you from overloading the Microsoft servers. Get acquainted with our team and see how we can help.
- Don’t immediately react by increasing the number of accounts talking to Office 365; try the app profile instead. If Microsoft is reporting these issues, that truly means their servers are busy; cramming in more calls won’t help the situation. Try to work with our Cloud Ops team to reduce the number of threads assigned to this tenant and prevent overload, or work with your engineer to set up an app profile if you haven’t yet.
These best practices should always be attempted before any escalation or “panic.” Microsoft provides some great guidance around what to do in case of throttling, so always refer to that before taking any drastic measures.
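If you script your own jobs, two of the ideas above (running off hours and capping threads) can be sketched in a few lines. This is only an illustration under stated assumptions: the 08:00 to 18:00 weekday business window, the default cap of two threads, and both function names are invented for the example and are not settings from our tools.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, time as dtime
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def is_off_hours(
    now: datetime,
    business_start: dtime = dtime(8, 0),   # assumed start of the working day
    business_end: dtime = dtime(18, 0),    # assumed end of the working day
) -> bool:
    """Return True on weekends or outside the assumed business window."""
    if now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (business_start <= now.time() < business_end)


def run_capped(tasks: Iterable[Callable[[], T]], max_threads: int = 2) -> List[T]:
    """Run tasks with a hard cap on concurrent threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(lambda task: task(), tasks))
```

A job runner could check is_off_hours(datetime.now()) before starting a large backup and pass its work items through run_capped so that the number of simultaneous calls never exceeds the cap.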
Be Proactive About Throttling in Large Projects
Now that we have addressed what to do after throttling has occurred, how can we be proactive about it?
The key is to have a process in place to address these challenges. You know when major projects are coming, so why not give Microsoft a heads up? This is helpful, not only as a best practice for planning a migration but also to ensure that you get the best performance possible.
Hopefully this clears up some confusion around throttling and how best to address it! Please do not hesitate to reach out in the comments if you have any questions!