Google Prediction API in the App Engine

I've now integrated the Google Prediction API into a Google App Engine project to supply sentiment predictions at runtime. I wanted to use a service account to access the model via the Google API client libraries for Java. This proved trickier than I first imagined, but the code is ultimately straightforward.

Some caveats

Use your service account, not the APIs Explorer

I originally set up my model and queried it using the APIs Explorer. Unfortunately, I didn't realise that although I used the same Google account to train the model as I used to configure the app's access to the API (see below), the app couldn't see that model. In other words, the service account and the Google account are separate, and they can't see each other's data. The upshot is that if I want to query my model programmatically, I have to train it programmatically using the service account too.

If you set up your Google APIs Console project from a Google Apps account, don't

The problem is that your Google APIs Console project needs to grant access to your App Engine app's "Service Account Name". You'll find it under the Application Settings for your app; it will be of the form something@appspot.gserviceaccount.com. You need to add this address to the Team for the Google APIs Console project, and the console won't let you if you're logged in to a Google Apps account. For example, I'm logged in under extropy.net, and it complains if I add an address that doesn't belong to this domain. I couldn't find any mention of this problem in the documentation.

The solution is to create a new Google APIs Console project under a regular Gmail account. You can then add any email address, it seems, including appspot ones.

If you have a custom domain for your app that matches your Google APIs Console account you may be able to ignore this, because the domain names will match, but I'm not able to confirm that.

The Code

In the end the code is quite simple, although the documentation is misleading and in a number of cases out of date. This is what I found that worked...

Following the examples from the Google API client libraries I created a utility class like so:

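// Imports for the top of the file. Package names are from the 1.x Google API
// client libraries; JacksonFactory lives in json.jackson instead if you use
// the Jackson 1 artifact.
import java.util.Arrays;

import com.google.api.client.extensions.appengine.http.UrlFetchTransport;
import com.google.api.client.googleapis.extensions.appengine.auth.oauth2.AppIdentityCredential;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.prediction.Prediction;
import com.google.api.services.prediction.PredictionScopes;

// Routes HTTP calls through App Engine's URL Fetch service.
static final HttpTransport HTTP_TRANSPORT = new UrlFetchTransport();

static final String MODEL_ID = "your_model_id";
static final String STORAGE_DATA_LOCATION = "path_to_your_training_data.csv";

static final String API_KEY = "the_key_from_the_apis_console";

/**
 * Global instance of the JSON factory.
 */
static final JsonFactory JSON_FACTORY = new JacksonFactory();
public static final String APPLICATION_NAME = "Grokmood";

public static Prediction getPrediction() throws Exception {

    // Authenticates as the deployed app's own service account.
    AppIdentityCredential credential =
            new AppIdentityCredential(Arrays.asList(PredictionScopes.PREDICTION));

    return new Prediction.Builder(HTTP_TRANSPORT, JSON_FACTORY, credential)
            .setApplicationName(APPLICATION_NAME)
            .build();
}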

This will create a Prediction object for you. It uses an AppIdentityCredential, which lets the deployed app authenticate as its own service account. I found the documentation for this somewhat scarce.

To train the model call this method:

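// Also needs: java.io.IOException and
// com.google.api.services.prediction.model.Training
public static void train(Prediction prediction) throws IOException {

    // Describe the model to create and where its training data lives.
    Training training = new Training();
    training.setId(MODEL_ID);
    training.setStorageDataLocation(STORAGE_DATA_LOCATION);

    // Starts training and returns straight away; it does not wait for completion.
    prediction.trainedmodels().insert(training).setKey(API_KEY).execute();
}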

And you can query it like so:

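// Also needs: java.io.IOException, java.util.Collections and
// com.google.api.services.prediction.model.Input / Output
public static String predict(Prediction prediction, String text) throws IOException {
    // The model takes a single text feature, so the CSV instance has one column.
    Input input = new Input();
    Input.InputInput inputInput = new Input.InputInput();
    inputInput.setCsvInstance(Collections.<Object>singletonList(text));
    input.setInput(inputInput);

    // predict() already receives the model id, so no separate setId() call is needed.
    Output output = prediction.trainedmodels()
            .predict(MODEL_ID, input)
            .setKey(API_KEY)
            .execute();
    return output.getOutputLabel();
}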

This last method will simply return positive, negative or neutral for my sentiment model.

When training the model, don't forget that it takes some time: the method just kicks off the training process and returns immediately. For an example of how to wait until the model is finished, take a look at PredictionSample.java. I just kicked off training and came back a quarter of an hour later. Remember that you can't see the status except by querying from the service account; one could add a similar method using trainedmodels().get() to review the training status too.
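As a rough illustration of that waiting step, here is a minimal polling sketch that is deliberately independent of the Prediction client. The class and method names (TrainingPoller, waitUntilTrained) are hypothetical, and the "DONE" status string is an assumption that should be checked against PredictionSample.java. The status source is passed in as a Supplier, so the loop itself can be exercised without touching the API.

```java
import java.util.function.Supplier;

/** Hypothetical helper: polls a status source until it reports "DONE". */
final class TrainingPoller {

    /**
     * Returns true once statusSource reports "DONE" (assumed status string),
     * or false if maxAttempts checks pass without that happening or the
     * thread is interrupted while waiting.
     */
    static boolean waitUntilTrained(Supplier<String> statusSource,
                                    int maxAttempts,
                                    long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("DONE".equals(statusSource.get())) {
                return true;
            }
            try {
                // Pause between checks so we do not hammer the API.
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

In the app, the supplier would wrap a call along the lines of prediction.trainedmodels().get(MODEL_ID).execute() and return the training status from the result, converting any IOException to an unchecked exception since Supplier cannot throw checked ones. The exact getter for the status is worth confirming in PredictionSample.java.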

One last caveat: none of the above will work locally. This is another difference between the local and production Google App Engine environments. Once deployed, the app will correctly identify itself, but there's no way to do that when running on your own machine. You will either have to fake your API responses locally or use a different authentication method; for example, you could use OAuth to authenticate yourself after logging in with a Google account. You'd then have two different means of authenticating, one for local and one for production...
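For the "fake your API responses locally" option, one possible shape is sketched below. Everything here is hypothetical scaffolding rather than part of the client library: the SentimentPredictor interface, the FakeSentimentPredictor with its toy keyword rules, and the Predictors chooser are all names made up for illustration. The sketch assumes the App Engine runtime sets the system property com.google.appengine.runtime.environment to "Production" when deployed, which is what the SDK's SystemProperty.environment reads.

```java
import java.util.Locale;

/** Hypothetical seam between the app and whichever predictor is available. */
interface SentimentPredictor {
    String predict(String text);
}

/** Canned stand-in for local development; the keyword rules are purely illustrative. */
final class FakeSentimentPredictor implements SentimentPredictor {
    @Override
    public String predict(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        if (lower.contains("love") || lower.contains("great")) {
            return "positive";
        }
        if (lower.contains("hate") || lower.contains("awful")) {
            return "negative";
        }
        return "neutral";
    }
}

/** Picks the real predictor in production and the fake one everywhere else. */
final class Predictors {
    /** Assumed to be set by the App Engine runtime ("Production" or "Development"). */
    static final String ENVIRONMENT_PROPERTY = "com.google.appengine.runtime.environment";

    static boolean isProduction() {
        return "Production".equals(System.getProperty(ENVIRONMENT_PROPERTY));
    }

    static SentimentPredictor choose(SentimentPredictor real) {
        return isProduction() ? real : new FakeSentimentPredictor();
    }
}
```

The real implementation of SentimentPredictor would simply delegate to the predict method shown earlier. Keeping the choice in one place means the rest of the app never needs to know which environment it is running in.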

Posted on April 12, 2013 and filed under dev.