Posted:
In case you happened to miss some of the Cloud Platform news in July, we’ve got a round-up for you:

Expanding the Kubernetes community
This month, we announced that Microsoft, Red Hat, IBM, Docker, Mesosphere, CoreOS and SaltStack are joining the Kubernetes community. Kubernetes is our open source container management solution. These companies are going to work with us to ensure that Kubernetes is a strong container management framework for any application and in any environment - whether in a private, public or hybrid cloud.

Cloud Platform predicts the World Cup
We kicked off the month with a focus on the World Cup. We used Google Cloud Dataflow to ingest touch-by-touch gameplay data from World Cup matches going back to 2006 as well as three years of English Barclays Premier League, two seasons of Spanish La Liga, and two seasons of U.S. MLS. We then polished the raw data into predictive statistics using Google BigQuery. At the end of the day, we correctly predicted the final outcome as well as 11 of 12 of the games leading up to it. You can read our posts after the round of 16, after the quarterfinals, and before the final.

A great new way to learn about App Engine
We launched a new course on Udacity: Developing Scalable Apps with Google App Engine. We’ve already gotten great feedback from developers, and a few of our favorite sections are Urs talking about what makes App Engine unique as well as a brief history of the data center (pizza boxes included).

More container news: Red Hat Enterprise Linux Atomic Host comes to Compute Engine
Jim Totton, Vice President and General Manager at Red Hat, wrote on our blog about Red Hat Enterprise Linux Atomic Host coming to Google Compute Engine. This provides a secure, lightweight and minimal footprint operating system optimized to run Linux Containers on Google’s infrastructure.

More great customers
We featured lots of great customers who are using Google Cloud Platform to power their business. Webydo, a B2B solution for professional web design, cut costs by 37% when they moved to Google Cloud Platform. And U.S. Cellular is using BigQuery for “highly flexible analysis of large datasets.” This has allowed them to better measure the effectiveness of marketing campaigns.

David LaBine, Director of education software for SMART Technologies, wrote on our blog that using App Engine means “developers [at SMART Technologies] are more productive because they’re able to focus on writing new features rather than worrying about infrastructure…” Rafael Sanches, co-founder of Allthecooks, wrote on our blog that “Google Cloud Platform played a key role in helping us grow... Since launching, we’ve grown to over 12 million users with a million monthly active users. Our application now sees millions of interactions daily that run through Google App Engine and Google Cloud Datastore.”

Finally, Brightcove and Fastly wrote on our blog that “because Google Cloud Platform launches instances in less than half the time of the rest of the industry, Fastly is able to launch new customers through Brightcove in a turnkey way.”

More product news
We introduced the Google Cloud Monitoring Read API, giving developers programmatic access to over 30 different metrics about their services, including CPU usage, disk IO and much more. Cloud Monitoring Read API allows you to query current and historical metric data for up to the past 30 days.

Also, click-to-deploy Apache Cassandra makes it easy to launch a dedicated Apache Cassandra cluster on Google Compute Engine. All it takes is one click after you provide some basic information. In a matter of minutes, you can get a complete Cassandra cluster deployed and configured.

The roadshows kicked off
The Google Cloud Platform developer roadshow visited Los Angeles, San Francisco and Seattle in July. Much of the tour is still to come, so join us on the road to speak with the Cloud Platform team. You can still catch us in New York City (August 5), Cambridge (August 7), Boulder (August 12), Toronto (August 12), Austin (August 14), Atlanta (August 19), and Chicago (August 22). Click here to register.

-Posted by Benjamin Bechtolsheim, Product Marketing Manager

Posted:
We recently published a case study, Fast and Reliable Ranking in Datastore, that describes how we helped one of our Google App Engine customers shorten their ranking latency from one hour to five seconds. They applied unique design patterns such as job aggregation to achieve over 300 updates per second with strong consistency on Cloud Datastore. The following are highlights from the article.

The problem of ranking
Tomoaki Suzuki, an App Engine lead engineer at Applibot, a major game studio in Japan, has been trying to solve the common, yet difficult problem faced by every large gaming service: ranking.
[Figure: Tomoaki Suzuki, App Engine lead engineer at Applibot, Inc., and their game Legend of the Cryptids (#1 ranked game in the Apple App Store North America gaming category in October 2012)]

The requirements are simple:

  • Your game has hundreds of thousands (or more!) players.
  • Whenever a player fights enemies (or performs other activities), their score changes.
  • You want to show the latest ranking for the player on a web portal page.
Getting a rank is easy, if it's not expected to also be scalable and fast. For example, you could execute the following query:
SELECT count(key) FROM Players WHERE Score > YourScore

This query counts all the players who have a higher score than yours. But do you want to execute this query for every request from the portal page? How long would it take when you have a million players?
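
To make the cost concrete, here is a minimal Python sketch of what that query does. It assumes the scores are already loaded into a plain list, and the function name `count_higher` is hypothetical. The work grows linearly with the number of players, because every score is inspected on every call.

```python
def count_higher(scores, your_score):
    """Count players with a strictly higher score by scanning every player: O(n)."""
    return sum(1 for score in scores if score > your_score)


# Hypothetical sample data: five players' scores.
scores = [120, 300, 300, 75, 910]

# A player's rank is one more than the number of players ahead of them.
rank = count_higher(scores, 300) + 1
print(rank)
```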

Tomoaki initially implemented this approach, but it took a few seconds to get each response. This was too slow, too expensive, and performed progressively worse as scale increased.
[Figure: The easiest way: scan all players]
Next, Tomoaki tried to maintain ranking data in Memcache. This was fast, but not reliable, because Memcache entries are just caches and could be evicted at any time. With a ranking service that depended solely on in-memory-key-values, it was difficult to maintain consistency and availability.

Looking for an O(log n) Algorithm
I was assigned to Applibot under a platinum support contract. I knew that ranking was a classic and yet hard-to-solve problem for any scalable distributed service. The simple query solution requires scanning all players with a higher score to count the rank of one player. The time complexity of this algorithm is O(n); that is, the time required for query execution increases proportionally to the number of players. In practice, this means that the algorithm is not scalable. Instead, we need an O(log n) or faster algorithm, where the time will only increase logarithmically as the number of players grows.

If you ever took a computer science course, you may remember that tree algorithms, such as binary trees, red-black trees, or B-Trees, can perform at O(log n) time complexity for finding an element. Tree algorithms can also be used to calculate an aggregate value of a range of elements, such as count, max/min, and average by holding the aggregated values on each branch node. Using this technique, it is possible to implement a ranking algorithm with O(log n) performance.
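
The idea of holding aggregated counts on each branch node can be sketched in a few lines of Python. This is an in-memory illustration only, not the Code Jam Ranking Library or its Datastore layout: it assumes integer scores in a known range, uses a binary rather than N-ary tree, and the class name `ScoreCountTree` is hypothetical. Both operations touch one node per tree level, so their cost is logarithmic in the size of the score range.

```python
class ScoreCountTree:
    """Complete binary tree over scores 0..max_score; each node stores a player count."""

    def __init__(self, max_score):
        # Round the number of leaves up to a power of two so the tree is complete.
        self.size = 1
        while self.size <= max_score:
            self.size *= 2
        # counts[1] is the root; the leaf for score s lives at index size + s.
        self.counts = [0] * (2 * self.size)

    def add(self, score, delta=1):
        """Add a player at `score` (or remove one with delta=-1), updating each ancestor."""
        node = self.size + score
        while node >= 1:
            self.counts[node] += delta
            node //= 2

    def rank(self, score):
        """Return 1 + the number of players with a strictly higher score."""
        higher = 0
        node = self.size + score
        while node > 1:
            if node % 2 == 0:
                # Left child: the right sibling's subtree holds only higher scores.
                higher += self.counts[node + 1]
            node //= 2
        return higher + 1


tree = ScoreCountTree(max_score=1000)
for s in (120, 300, 300, 75, 910):
    tree.add(s)
print(tree.rank(300))
```

Tied players share a rank in this sketch, which matches the "count everyone with a higher score" definition used by the query earlier.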

I found an open source implementation of a tree-based ranking algorithm for Datastore, written by a Google engineer: the Google Code Jam Ranking Library.
[Figure: Getting the rank of a score in a ternary tree with the Google Code Jam Ranking Library]
Concurrent Updates Limit Scalability
However, during load testing, I found a critical limitation with the Code Jam ranking library. Its scalability in terms of update throughput was quite low. When I increased the load to three updates per second, the library started to return transaction retry errors. It was obvious that the library could not satisfy Applibot's requirement for 300 updates per second. It could handle only about 1% of that throughput.

Why is that? The reason is the cost of maintaining the consistency of the tree. In Datastore, you must use an entity group to assure strong consistency when updating multiple entities in a transaction—see "Balancing Strong and Eventual Consistency with Google Cloud Datastore". The Code Jam ranking library uses a single entity group to hold the entire tree to ensure consistency of the counts in the tree elements.

However, an entity group in Datastore has a performance limitation. Datastore only supports about one transaction per second on an entity group. Furthermore, if the same entity group is modified in concurrent transactions, they are likely to fail and must be retried. The Code Jam ranking library is strongly consistent, transactional, and fairly fast, but it does not support a high volume of concurrent updates.

Datastore Team's Solution: Job Aggregation
I remembered that a software engineer on the Datastore team had mentioned a technique to obtain much higher throughput than one update per second on an entity group. This could be achieved by aggregating a batch of updates into one transaction, rather than executing each update as a separate transaction. So I asked the Datastore team for a solution to this problem.

In response to my request, the Datastore team started discussing this issue and advised us to consider using Job Aggregation, one of the design patterns used with Megastore, the underlying storage layer of Datastore that manages the consistency and transactionality of entity groups. The basic idea of Job Aggregation is to use a single thread to process a batch of updates. Because there is only one thread and only one transaction open on the entity group, there are no transaction failures due to concurrent updates. You can find similar ideas in other storage products such as VoltDB and Redis.

Proposed Solution Runs at 300 Updates per Second Sustained
Based on the advice from the Datastore team, I wrote Proof of Concept (PoC) code that combines the Job Aggregation pattern with the Code Jam ranking library. The PoC creates a pull queue, which is a kind of Task Queue in App Engine that allows developers to implement one or multiple workers that consume the tasks added to the queue. The backend instance has a single thread in an infinite loop that keeps pulling as many tasks as possible (up to 1000) from the queue. The thread passes each update request to the Code Jam ranking library, which executes them as a batch in a single transaction. The transaction may be open for a second or more, but because there is a single thread driving the library and Datastore, there is no contention and no concurrent modification problem.
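
The shape of that worker loop can be sketched in Python. This is not the PoC code: it assumes an in-memory `queue.Queue` as a stand-in for the App Engine pull queue and a plain dict as a stand-in for the ranking tree in Datastore, and every function name here is hypothetical. Unlike the real worker, which loops forever, the sketch stops when the queue is empty so that it terminates.

```python
import queue

MAX_BATCH = 1000  # per-pull limit mentioned in the text


def lease_batch(pending, max_tasks=MAX_BATCH):
    """Pull as many queued updates as are available, up to max_tasks."""
    batch = []
    while len(batch) < max_tasks:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch


def apply_in_one_transaction(scores, batch):
    """Stand-in for a single transaction: stage every update, then commit once."""
    staged = dict(scores)
    for player_id, new_score in batch:
        staged[player_id] = new_score
    scores.clear()
    scores.update(staged)


def drain(pending, scores):
    """Single worker: one batch and one transaction at a time, so no contention."""
    transactions = 0
    while True:
        batch = lease_batch(pending)
        if not batch:
            return transactions
        apply_in_one_transaction(scores, batch)
        transactions += 1


pending = queue.Queue()
for i in range(2500):
    pending.put((f"player-{i % 100}", i))

scores = {}
transactions = drain(pending, scores)
print(transactions, len(scores))
```

With roughly one transaction per second available on an entity group, throughput is bounded by the batch size rather than by the transaction rate, which is why batches of several hundred updates can meet a 300 updates per second target.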

The following figure shows the load testing result of the final PoC implementation. I also incorporated another design pattern, Queue Sharding, to effectively minimize the performance fluctuations in each task queue. The final proposed solution can sustain 300 updates per second over several hours. Under usual load, each update is applied to Datastore within a few seconds of receiving the request.
[Figure: Performance graph of the solution]
With the load testing results and the PoC code, I presented the solution to Tomoaki and other Applibot engineers. Tomoaki plans to incorporate the solution in their production system, expects to reduce the latency of updating the ranking info from one hour to five seconds, and hopes to dramatically improve the user experience.

-Posted by Kazunori Sato, Solutions Architect

Notes
Any performance figures described in this article are sampled values for reference and do not guarantee any absolute performance of App Engine, Datastore, or other services.

Posted:
Today’s guest blog comes from Casey Wilms, product lead for cloud media services provider Brightcove, and Lee Chen, head of product at content delivery network Fastly. Fastly leverages Brightcove's Zencoder cloud-based encoding service, which runs on Google Compute Engine, to offer a powerful live video streaming solution.

In the streaming video world, streaming live events without hiccups is a bit like the Holy Grail. Now, with a bundled transcoding and delivery package from Fastly, customers can leverage the power of Brightcove’s Zencoder cloud transcoding solution and Fastly's unparalleled content delivery network to offer events with an impeccable user experience.

Background on Brightcove and Fastly
Brightcove's Zencoder is a cloud-based transcoding solution for live video and VOD. Customers like Funny or Die and SmugMug have built their video workflow around Zencoder to encode large volumes of video affordably and quickly. Performance is paramount for Zencoder, so Brightcove relies on the Google Cloud Platform for consistently great service around the globe.

Fastly is a next-generation content delivery network (CDN) that can cache any type of content and has zero-delay, instant purge (in ~150 milliseconds) across dynamic content, compared to two minutes (or even as long as six hours) on a traditional CDN. This allows Fastly to truly cache dynamic content for businesses like Twitter, GitHub, and Foursquare. Since Fastly’s goal is to deliver content that is as close to real time as possible, it made sense to package Zencoder as part of a broader live video delivery solution. Fastly’s event-based transcoding, powered by Zencoder, offers businesses a faster and more seamless video streaming service. For example, one customer test streamed video from Mexico to the central U.S. and had virtually no latency, around 50 milliseconds.

How the Brightcove/Fastly Integration Works
Let’s say you have a live event that you’re streaming online for your audience. Your production team sends a Program Out signal to your live encoders. From it, you generate the source signal, which is sent to Fastly’s event-based transcoding, powered by Zencoder. Zencoder then transcodes that signal into different bitrates and resolutions (renditions), and Fastly picks them up and distributes them to the end user’s app or Web player. When the event is over, the transcoding instance spins down, so you don’t pay for more time than you’re broadcasting.
Why Brightcove Chose Google Cloud Platform
One of the benefits in using Fastly and Brightcove’s live online video delivery package is that Zencoder runs on Google Compute Engine, which means everything is hosted in the cloud. Brightcove tried other infrastructure hosting platforms, and found that only Google Compute Engine delivers the kind of high responsiveness, consistency, and performance needed to offer seamless live video streaming.

Because Google Cloud Platform is built on the same network Google uses, network latency is extremely low, resulting in best-in-class speed and reliability. Thus, Brightcove can spin up server instances as needed to meet spikes in demand. Google has achieved a level of consistency across their cloud in terms of compute and network performance that is simply impossible to find with other vendors. That means the 100th server that's spun up can be expected to exhibit the same compute profile as the first. Instances start up in about half the time they take on Amazon. Also, thanks to Google Cloud Platform's per-minute billing granularity and automatic discounts, customers using the live online video delivery package will get an unbeatable price point.

Another thing Brightcove doesn’t get from anyone else is the level of customer and engineering support that Google offers. If the team has an issue, they can email Google’s support and get a response right away. They can end up talking to a product manager, and if they file a support ticket, they get a call the next day, which is totally unheard of in the cloud platform world.

Faster Streaming, Happier Customers
Because Google Cloud Platform launches instances in less than half the time of the rest of the industry, Fastly is able to launch new customers through Brightcove in a turnkey way. If a customer's on-demand broadcast, like a music broadcast or sporting event, hits a CPU bottleneck, the result is latency, and viewers end up watching a stuttering stream or getting completely disconnected. If anything goes wrong with the live stream, it’s urgent to fix the problem immediately, since it’s impossible to redo an event that happens live. The amount of support required to satisfy customers for a live event is an order of magnitude higher than for other types of video service. Thanks to Google Cloud Platform's consistency and speed, Fastly and Brightcove can rest comfortably knowing their customers will be happy.

-Contributed by Casey Wilms, product lead for media processing technology at Brightcove, and Lee Chen, Head of Product at Fastly

Posted:
Cross-posted from the Google Analytics Blog

With 10.6 million cell phone customers and retail stores in 400+ markets, U.S. Cellular needs to reach a lot of people with marketing messages. That's why U.S. Cellular uses many marketing channels -- online, in-store and telesales -- to drive mobile phone activations.

U.S. Cellular faced a challenge, though: they didn’t know how many of their offline sales were driven by their digital marketing. This made it harder to adjust their media mix accordingly and also to forecast sales. To fix that situation, U.S. Cellular and its digital-analytics firm, Cardinal Path, turned to Google Analytics Premium and its integration with Google BigQuery.

Part of Google Cloud Platform, BigQuery allows for highly flexible analysis of large datasets. The U.S. Cellular team used it to integrate and analyze terabytes of data from Google Analytics Premium and other systems. Then they mapped consumer behavior across online and offline marketing channels. Each transaction was attributed to the consumer touchpoints that the buyer had made across various sales channels.

The result: U.S. Cellular got real insight into digital’s role in their sales. They were surprised to find that they could reclassify nearly half of all their offline activations to online marketing channels.

U.S. Cellular now uses this complete (and fully automatic) analytics framework to really see the consumer journey and forecast sales for each channel. Their team has the data they need to make better business decisions.

“We’re now in the enviable position of having an accurate view at each stage of our customer journey," says Katie Birmingham, a digital & e-commerce analyst for the company. "The Google Analytics Premium solution not only gives us a business advantage, but helps us shape a great customer experience, and ultimately ties in to our values of industry-leading innovation and world-class customer service.”

Be sure to read the full case study.

-Posted by: Suzanne Mumford, Google Analytics Premium Marketing

Posted:
Editor’s note: Today’s guest blog comes from Ron Zalkind, co-founder and CTO of Waltham, Massachusetts-based CloudLock, a leading cloud security provider. The largest organizations in the world trust CloudLock to secure their data in the cloud, increase collaboration, and reduce their risk.

At a time when more and more organizations are moving their most sensitive data assets and applications to the cloud, security takes center stage, often at the price of user productivity. At CloudLock, we believe that each and every organization should work hard to protect their data and users in the cloud, not from it. With this philosophy in mind, CloudLock provides cloud security applications that help over 700 businesses using Google Apps and Salesforce to enforce regulatory, operational and security compliance.

We’ve been building enterprise products on top of Google Cloud Platform, specifically Google App Engine, for four years now. Collaborating with Google early on allowed us to leverage its best-in-class infrastructure security and scalability - both paramount for us as a security provider.

Our business must be as agile as our customers. Their accelerated SaaS platform adoption means that our business is data intensive, continuously processing changes in billions of objects and thousands of third-party applications connected to Google Apps user accounts. We’ve built a massive real-time data processing solution using App Engine, enabling us to focus on delivering core value for customers instead of focusing on infrastructure. App Engine’s auto-scaling capabilities provide our customers with a security solution that grows with their business, and its development features enable us to release code improvements frequently and seamlessly.

As a SaaS company, high-quality service is our bond. To keep delivering best-in-class service, our team leverages central management features available through the admin panel, such as Google BigQuery and Google Cloud Storage, for advanced service analysis, monitoring and delivery. We have also been using Premiere Support since its launch, which boosts our ability to provide enterprise-level customer support.

As a premier Google Apps partner, we rely on Cloud Platform to provide enterprise-class cloud security to over five million users. It’s a beautiful synergy.

-Posted by Ron Zalkind, co-founder and CTO of CloudLock

Posted:
A new, free Udacity online course, Developing Scalable Apps with Google App Engine, helps Java developers learn how to build scalable App Engine applications. As you work through the course, you'll build a conference management application that lets users create and query conferences.


The course starts with an entertaining introduction to Platform as a Service (PaaS). Magnus Hyttsten, Google Developer Advocate, discusses the evolution of server-side computing, from apps that could run on a computer under your desk, to applications that require the computing power of a data center. (This is not without disadvantages, as he points out. "It is no longer possible to warm your feet on the fan outlet.")

Urs Hölzle, Senior VP of infrastructure at Google, gives the background on Google Cloud Platform: "We built our internal cloud a long time ago, for our own needs. We had very large-scale applications, so we needed a very capable cloud, and now we're making our cloud available to everyone. In Cloud Platform, App Engine is the one system that makes it really easy for you to start very small and then scale to a very large user base."


After learning about the evolution of the data center and the need for scalability, you'll get right down to business and learn how to store data in the Datastore, use Memcache to speed up responses and cut down on Datastore quota usage, write queries, understand indexes, and use queues for tasks that execute outside front end requests.

Along the way, you'll build a backend App Engine application that uses Google Cloud Endpoints to expose its API to other applications.



You'll learn how to implement Endpoints to make the API available externally, and how to use the Endpoints API from an Android application.

If you take this course, you'll not only learn about App Engine, but you'll use it behind the scenes too. Udacity uses App Engine to serve its online courses. Mike Sokolsky, Udacity co-founder and CTO, talks about Udacity's decision to use App Engine to host Udacity's MOOCs. He says, "It pushes you in the right direction. It pushes you to the best design practices for building a scalable application." And that's what this course aims to help you do, too.

You can take the course, Developing Scalable Apps with Google App Engine, at www.udacity.com/course/ud859.

The full course materials — all the videos, quizzes, and forums — are available for free for all students by selecting “View Courseware”. Personalized ongoing feedback and guidance from coaches is also available to anyone who chooses to enroll in Udacity’s guided program.

For more courses that Google and Udacity are developing together, see www.udacity.com/google.

-Posted by Jocelyn Becker, Developer Advocate

Posted:
Starting today, Google Cloud Monitoring Read API is generally available, allowing you to programmatically access metric data from your running services, such as CPU usage or disk IO. For example, you can use Cloud Monitoring Read API with Nagios to plug in to your existing alerting/event framework, or use it with Graphite to combine the data with your existing graphs. Third party providers can also use the API to integrate Google Cloud Platform metrics into their own monitoring services.

Cloud Monitoring Read API allows you to query current and historical metric data for up to the past 30 days. Also, you can use labels to filter data to more specific metrics (e.g. zones). Currently Cloud Monitoring Read API supports reading metric time series data from the following Cloud Platform services:
  • Google Compute Engine - 13 metrics
  • Google Cloud SQL - 12 metrics
  • Google Cloud Pub/Sub - 14 metrics

Our documentation provides a full list of supported metrics. Over time we will be adding support for more Cloud Platform services metrics and enhancing the metrics for existing services. You can see an example of usage and try these metrics for yourself on our getting started page. For samples and libraries, click here.

Example: getting CPU usage time series data
GET \
https://www.googleapis.com/cloudmonitoring/v2beta1/ \  # Access API
projects/YOUR_PROJECT_NAME/ \                          # For YOUR_PROJECT_NAME
timeseries/ \                                          # get time series of points
compute.googleapis.com%2Finstance%2Fcpu%2Fusage_time?\ # of CPU usage
youngest=2014-07-11T10%3A29%3A53.108Z& \           # with this latest timestamp
key={YOUR_API_KEY}                                     # using this API key
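
As a rough illustration, the same request URL could be assembled with Python's standard library. Only the URL pieces come from the example above; the function name `build_timeseries_url` and its parameters are hypothetical, and the sketch builds the URL without sending the request.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.googleapis.com/cloudmonitoring/v2beta1"


def build_timeseries_url(project, metric, youngest, api_key):
    """Return the time series URL for one metric of one project."""
    # The metric name is a single path segment, so each "/" must be escaped as %2F.
    metric_segment = quote(metric, safe="")
    # urlencode percent-escapes the colons in the timestamp.
    query = urlencode({"youngest": youngest, "key": api_key})
    return f"{BASE_URL}/projects/{project}/timeseries/{metric_segment}?{query}"


url = build_timeseries_url(
    project="YOUR_PROJECT_NAME",
    metric="compute.googleapis.com/instance/cpu/usage_time",
    youngest="2014-07-11T10:29:53.108Z",
    api_key="YOUR_API_KEY",
)
print(url)
```

Escaping the metric name and timestamp in code avoids hand-editing the `%2F` and `%3A` sequences when you switch to a different metric or time.
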
Your feedback is important!
We look forward to receiving feedback and suggestions at cloud-monitoring-feedback@googlegroups.com.

-Posted by Amir Hermelin, Product Manager