Monday, August 3, 2015
Customers occasionally contact Google Cloud Platform Support to ask for help with troubleshooting latency issues in a Google App Engine application. In this post, I'll discuss how I typically isolate the root cause of this type of problem.
I start by creating a dynamic script that returns only a short text string, and then add it to the customer’s App Engine app so that it can be accessed through a known URL. For an example of such a page in Python, see the hello world tutorial.
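As a rough illustration, such a test page can be as small as the sketch below. It is written as a plain WSGI application using only the Python standard library; the tutorial may use a web framework instead, and the function name, response text, and local port are my own assumptions. How the handler is wired into the app's configuration is left out.

```python
from wsgiref.simple_server import make_server


def application(environ, start_response):
    """Return a short plain-text body without calling any backend services."""
    body = b"Hello, latency test!\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    # Local smoke test only (hypothetical port); stop it with Ctrl+C.
    with make_server("127.0.0.1", 8080, application) as server:
        server.serve_forever()
```

Because the handler does no real work, any latency measured against it comes from the network and the serving path rather than from application code.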
Then, I run this curl command from a terminal window:
curl -s -o /dev/null -w "@curl-format.txt"
The curl command uses a format file to define its output. You need to create and save this file as curl-format.txt before you run curl. Here are the contents of the format file:
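A minimal sketch of one way to generate such a file is shown below. The variable names are the timing variables documented for curl's -w (--write-out) option; the labels, alignment, and helper function names are assumptions on my part, so the result may not match the exact file used for this post.

```python
from pathlib import Path

# Timing variables documented for curl's --write-out (-w) option.
TIMING_VARIABLES = [
    "time_namelookup",
    "time_connect",
    "time_appconnect",
    "time_pretransfer",
    "time_redirect",
    "time_starttransfer",
    "time_total",
]


def build_format(variables=TIMING_VARIABLES):
    """Build the text of a curl -w format file, one labelled timing per line."""
    width = max(len(name) for name in variables)
    # Each line ends with a literal backslash-n escape for curl to interpret.
    lines = [f"{name:>{width}}:  %{{{name}}}\\n" for name in variables]
    return "\n".join(lines) + "\n"


def write_format_file(path="curl-format.txt"):
    """Write the format text to disk and return the path that was written."""
    target = Path(path)
    target.write_text(build_format(), encoding="utf-8")
    return target


if __name__ == "__main__":
    print("wrote", write_format_file())
```

Running the script once in the directory where you plan to run curl creates curl-format.txt with one line per timing variable.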
The output will look something like this, showing latencies in seconds (curl reports its time_* variables in seconds, so 0.050 means 50 milliseconds):
The value for time_connect generally represents the latency of the client’s connection to the nearest Google datacenter. If this connection is slow, you can troubleshoot further using traceroute to determine which hop on the network causes the delay, as packets traverse your ISP’s network and Google’s production network to reach the Google frontend server.
You can run tests from clients in different geographical locations. Google Cloud Platform will automatically route requests to the closest data center, which will vary based on the client’s location.
If packets reach the Google frontend server with acceptable latency, then you need to troubleshoot the source of latency problems within App Engine’s serving infrastructure or your application code or configuration.
Look at your logs for the corresponding request in the Google Developers Console. It may help to record the time at which you ran the curl command, so that you can match it to the right log entry.
The key field is the wall clock time for the request. This value doesn't include time spent between the client and the server that's running your application. You can calculate the time that the request spent within App Engine's serving infrastructure before reaching your application: subtract the time to reach the Google frontend server from the wall clock time.
All App Engine applications are hosted in the United States, unless their app ID is prefixed by e~, which signifies that the application is hosted in Europe. If your client is in a different geographical region from your application, you will see a significant delay as packets traverse Google’s internal network between the Google frontend server and the server running your application. You will see this delay, for example, if your application is in the US and your client is in Europe or Asia. One of the advantages of hosting your application on App Engine is that this latency is usually significantly less than if you used the public Internet to route requests to an application in another region.
Assuming that your client is in the same geographical region as your application, you can expect the App Engine serving infrastructure to add negligible latency.
Here are some additional troubleshooting tips for isolating latency problems:
- Was the latency caused by the time to start up a new instance of your application? You will see these start-ups flagged as loading requests in the logs. Try running your tests with the default scheduler settings. In most cases, the default scheduler settings will provide an optimal tradeoff between cost and latency. If you make changes to these settings, run load tests to determine the impact. Also consider adding resident instances.
- Are you serving a static file or using the Blobstore API to serve the request? Both of these approaches use a serving path that doesn't run any of your application’s code. Run separate tests for latency in these cases. For images, use Google’s high-performance image serving infrastructure to reduce latency.
- Do slow requests have a large response size, according to the logs? If so, determine whether there is a bandwidth limitation between your client and Google.
- For consistency during tests, ensure that your requests aren't cached. When running in production, add a Cache-Control HTTP header to your response in order to improve latency.
- Does your request make API calls? If so, use Appstats to determine the time taken for API calls.
- Are you using HTTPS or a custom domain? Compare latency with HTTP requests to your appspot.com domain to isolate whether the latency is caused by these factors.
- If you think the slowdown occurs in your code, add application logging to record timing events in your code.
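Two of the tips above, adding a Cache-Control header and logging timing events, can be sketched together. This is only an illustration under my own assumptions: the logger name, the timed() helper, the five-minute max-age, and the placeholder response body are all hypothetical, and the right cache lifetime depends on your content.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("latency")  # hypothetical logger name


@contextmanager
def timed(label):
    """Log how long the enclosed block took, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        logger.info("%s took %.1f ms", label, elapsed_ms)


def application(environ, start_response):
    """WSGI handler that times its work and marks the response as cacheable."""
    with timed("build_response"):
        body = b"cacheable content\n"  # stand-in for the real work
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Allows browsers and shared caches to reuse the response for 5 minutes.
        ("Cache-Control", "public, max-age=300"),
    ]
    start_response("200 OK", headers)
    return [body]
```

Wrapping each suspect section in its own timed() block gives you per-step timings in the request logs, which you can compare with the overall request time.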
If you have purchased a support package, you can contact Google Cloud Platform's support team for further help. Here is information you should have at hand to help us quickly diagnose latency caused by network issues:
- Your IP address. You can get that by looking at the Developers Console logs for a request sent to App Engine.
- The URL of your App Engine application.
- The IP address to which the domain name in the above URL resolves.
- The output of ping and traceroute from your client to the above IP address.
- The output from running the curl command, shown earlier in this blog post. You may want to run this a few times to ensure you have a representative result.
- The Developers Console logs for the above request.
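If you collect this information often, a small helper can resolve the IP address and print the commands to run. The sketch below is an assumption-laden example, not an official tool: the URL is a placeholder, the function names are hypothetical, and the ping -c option is the Linux and macOS form (Windows uses different options, and tracert instead of traceroute).

```python
import socket
from urllib.parse import urlsplit


def hostname_from_url(url):
    """Extract the host name from a URL, raising if there is none."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    return host


def diagnostic_commands(url, ip_address):
    """Return the ping, traceroute, and curl commands as argument lists."""
    return [
        ["ping", "-c", "5", ip_address],
        ["traceroute", ip_address],
        ["curl", "-s", "-o", "/dev/null", "-w", "@curl-format.txt", url],
    ]


if __name__ == "__main__":
    # Placeholder URL; replace it with your application's URL.
    url = "https://your-app-id.appspot.com/latency-test"
    ip = socket.gethostbyname(hostname_from_url(url))
    print("resolved IP:", ip)
    for command in diagnostic_commands(url, ip):
        print(" ".join(command))
```

The script only prints the commands; run them yourself from the affected client so that the output reflects that client's network path.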
If you’d like to explore this topic further, check out our methodology for YouTube video quality and read about Mobile analysis in PageSpeed Insights.
- Posted by John Lowry, Technical Account Manager