Google Prediction API in the App Engine

I've now integrated the Google Prediction API into a Google App Engine project, in order to supply sentiment prediction at runtime. I wanted to use a service account to access the model via the Google API client libraries for Java. This has proven trickier than I first imagined, but the code is ultimately straightforward.

Some caveats

Use your service account, not the APIs Explorer

I originally set up my model and queried it using the APIs Explorer. Unfortunately I didn't realise that although I was using the same Google account to configure API access for the app (see below) as I used to train the model, the service account and the Google account are separate, and they can't see each other's data. The upshot of this is that if I want to query my model programmatically via the service account, I have to train it programmatically too.

If you set up your Google APIs Console project from a Google Apps account, don't

The problem is that you need to grant access to your Google App Engine "Service Account Name" - you'll find it under the Application Settings for your app, and it will be of the form something@appspot.gserviceaccount.com. Unfortunately you have to add this address to the Team for your Google APIs Console project, and it won't let you if you're logged in to your Google Apps account. For example, I'm logged in under extropy.net, and it complains if I try to add an address that doesn't belong to this domain. I couldn't find any mention of this problem in the documentation anywhere.

The solution is to create a new Google APIs Console project under a regular gmail account. It seems you can then add any email address, including appspot ones.

If you have a custom domain for your app that matches your Google APIs Console account you may be able to ignore this, because the domain names will match, but I'm not able to confirm that.

The Code

In the end the code is quite simple, although the documentation is misleading and in a number of cases out of date. This is what I found that worked...

Following the examples from the Google API client libraries I created a utility class like so:

// imports from the google-api-java-client and the Prediction v1.5 library -
// package names may differ slightly between client library versions
import com.google.api.client.extensions.appengine.http.UrlFetchTransport;
import com.google.api.client.googleapis.extensions.appengine.auth.oauth2.AppIdentityCredential;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson.JacksonFactory;
import com.google.api.services.prediction.Prediction;
import com.google.api.services.prediction.PredictionScopes;
import java.util.Arrays;

static final HttpTransport HTTP_TRANSPORT = new UrlFetchTransport();

static final String MODEL_ID = "your_model_id";
static final String STORAGE_DATA_LOCATION = "path_to_your_training_data.csv";

static final String API_KEY = "the_key_from_the_apis_console";

/**
 * Global instance of the JSON factory.
 */
static final JsonFactory JSON_FACTORY = new JacksonFactory();
public static final String APPLICATION_NAME = "Grokmood";

public static Prediction getPrediction() throws Exception {

    // the AppIdentityCredential identifies the app as its own service account
    AppIdentityCredential credential =
            new AppIdentityCredential(Arrays.asList(PredictionScopes.PREDICTION));

    Prediction prediction = new Prediction.Builder(
            HTTP_TRANSPORT, JSON_FACTORY, credential)
            .setApplicationName(APPLICATION_NAME)
            .build();

    return prediction;
}

This will create a Prediction object for you. It uses the AppIdentityCredential to access your service account. I found the documentation for this somewhat scarce.

To train the model call this method:

public static void train(Prediction prediction) throws IOException {

    Training training = new Training();
    training.setId(MODEL_ID);
    training.setStorageDataLocation(STORAGE_DATA_LOCATION);

    prediction.trainedmodels().insert(training).setKey(API_KEY).execute();
}

And you can query it like so:

public static String predict(Prediction prediction, String text) throws IOException {
    Input input = new Input();
    Input.InputInput inputInput = new Input.InputInput();
    inputInput.setCsvInstance(Collections.<Object>singletonList(text));
    input.setInput(inputInput);
    Output output = prediction.trainedmodels().predict(MODEL_ID, input)
            .setKey(API_KEY).execute();
    return output.getOutputLabel();
}

This last method will simply return positive, negative or neutral for my sentiment model.

When training the model, don't forget that this takes some time - the method just kicks off the training process and returns immediately. For an example of how to wait until the model is finished, take a look at PredictionSample.java; I just kicked off training and came back a quarter of an hour later. Remember that you can't see the status except by querying from the service account, so it's worth adding a similar method using trainedmodels().get() to review the training status, like the sketch below.
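Something like this sketch works, assuming the v1.5 client's Training model (its getTrainingStatus() returns "RUNNING" until the model is ready):

public static boolean isTrainingComplete(Prediction prediction) throws IOException {
    // fetch the model's current state via the service account
    Training training = prediction.trainedmodels().get(MODEL_ID).setKey(API_KEY).execute();
    return "DONE".equals(training.getTrainingStatus());
}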

One last caveat - none of the above will work locally. This is another difference between the local and production Google App Engine environments. Once it's been deployed the app will correctly identify itself, but there's no way to do that when running on your own machine. You will either have to fake your API responses locally, or use a different authentication method - one could use OAuth to authenticate after logging in with a Google account. You'd then have two different means of authenticating, one for local and one for production - see the sketch below.
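One way to handle that switch is App Engine's SystemProperty. Here's a sketch - getLocalOAuthCredential() is a hypothetical stand-in for whatever you set up locally:

// SystemProperty comes from com.google.appengine.api.utils
public static HttpRequestInitializer getCredential() {
    if (SystemProperty.environment.value() == SystemProperty.Environment.Value.Production) {
        // deployed: the app can identify itself as its service account
        return new AppIdentityCredential(Arrays.asList(PredictionScopes.PREDICTION));
    }
    // local dev server: fall back to your alternative, eg an OAuth flow
    return getLocalOAuthCredential(); // hypothetical
}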

Posted on April 12, 2013 and filed under dev.

Google Prediction Revisited

A few weeks ago I had a brief foray with the Google Prediction API, and I resolved to revisit it armed with some techniques for cleaning up the training data.

Briefly, I've cleaned up the data to remove stop words, Twitter orthography (eg 'RT', but not #hashtags), @usernames and links.
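The Twitter-specific part of that cleanup is mostly regular expressions. A sketch along these lines covers it (the names here are mine; the stop word removal is as per the previous post):

import java.util.regex.Pattern;

// matches retweet markers, @usernames and links - but leaves #hashtags alone
private static final Pattern TWITTER_NOISE =
        Pattern.compile("\\bRT\\b|@\\w+|https?://\\S+");

public static String clean(String tweet) {
    return TWITTER_NOISE.matcher(tweet).replaceAll(" ")
            .replaceAll("\\s+", " ") // tidy up the leftover whitespace
            .trim();
}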

The results are more positive. Gone are the dubious classifications in the 0.66666... to 0.33333 range. Here are some examples, showing the query in italics and the response from the model below it:

I just love ice cream

  {
   "label": "positive",
   "score": 0.531505
  },
  {
   "label": "negative",
   "score": 0.11532
  },
  {
   "label": "neutral",
   "score": 0.353176
  }

this is relevant to my interests

  {
   "label": "positive",
   "score": 0.117893
  },
  {
   "label": "negative",
   "score": 0.333303
  },
  {
   "label": "neutral",
   "score": 0.548804
  }

I absolutely hate this rubbish

  {
   "label": "positive",
   "score": 0.07385
  },
  {
   "label": "negative",
   "score": 0.737656
  },
  {
   "label": "neutral",
   "score": 0.188494
  }

I have of course cherry-picked these examples. The training data is still heavily burdened with neutral examples, and this shows in some queries:

I'm incredibly happy

  {
   "label": "positive",
   "score": 0.150057
  },
  {
   "label": "negative",
   "score": 0.186907
  },
  {
   "label": "neutral",
   "score": 0.663037
  }

But on the whole this is much more useful than my previous experiment, and I'll continue to refine the processing and try to get some more training data.

Posted on April 10, 2013 and filed under dev.

Stop And Stem

After looking at the results of my brief foray into sentiment analysis of tweets a couple of weeks ago, and reading about the problem, it became clear that pre-processing may well help clean up the data and improve training. The goal is to reduce the number of possible features. Put simply, there are too many different words, and a lot of them are too noisy!

There are various techniques to do this, such as removing stop words ("and", "the" etc. - words that don't add to the sentiment), and stemming, to reduce the variants of the same word (eg plurals and other endings) to the same token.

In Java the Lucene libraries help a great deal here. Here's how to remove stop words using Lucene's StopFilter:

    // Lucene 4.1: StandardTokenizer/StandardFilter live in org.apache.lucene.analysis.standard,
    // StopFilter and StopAnalyzer in org.apache.lucene.analysis.core
    Tokenizer tokenizer = new StandardTokenizer(Version.LUCENE_41,
            new StringReader("I've got a brand new combine harvester, and I'm giving you the key"));

    final StandardFilter standardFilter = new StandardFilter(Version.LUCENE_41, tokenizer);
    final StopFilter stopFilter = new StopFilter(Version.LUCENE_41, standardFilter, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

    // the filters share the tokenizer's attributes, so we can read each term from here
    final CharTermAttribute charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);

    stopFilter.reset();
    while (stopFilter.incrementToken()) {
        final String token = charTermAttribute.toString();
        System.out.println("token: " + token);
    }

This will give you the following output:

token: I've
token: got
token: brand
token: new
token: combine
token: harvester
token: I'm
token: giving
token: you
token: key

Note that this assumes that the language is English; you'll have to find your own list of stop words for other languages. This example also uses the StandardFilter, which helps with tokenization - it recognises things like email addresses so that they are tokenized correctly.

Stemming can also be achieved with the help of Lucene, via the PorterStemmer:

    final PorterStemmer stemmer = new PorterStemmer();

    stemmer.setCurrent("weakness");

    stemmer.stem();

    final String current = stemmer.getCurrent();

    System.out.println("current: " + current);

This will print out:

    current: weak

Again this is for English only.

Some more ideas to clean up the data: removing @usernames, excessive punctuation!!! and characters repeated too many times (eg "cooool") - see the sketch below. Armed with these I'll attempt my sentiment training again.
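For the repeated characters, a single regular expression does the job. A sketch:

public static String collapseRepeats(String text) {
    // any character repeated three or more times is squashed to two:
    // "cooool" becomes "cool", and "!!!!!" becomes "!!"
    return text.replaceAll("(.)\\1{2,}", "$1$1");
}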

Posted on April 8, 2013 and filed under dev.

Monitoring quotas on Google App Engine

One of my periodic chores with the Google App Engine is monitoring the quotas, particularly in my apps without billing enabled. Unfortunately Google provides no programmatic way of doing this, and it doesn't look likely that it will. There is a QuotaService, but it isn't well documented and only shows quota use during a request.

However, one can report on quota exceptions that occur using the LogService. With this it's possible to find all exceptions within the last hour, say, that involved an OverQuotaException, like so:

    final LogService logService = LogServiceFactory.getLogService();

    LogQuery query = LogQuery.Builder.withDefaults();
    query.includeAppLogs(true);
    query.minLogLevel(LogService.LogLevel.ERROR);

    Calendar cal = Calendar.getInstance();
    cal.add(Calendar.MINUTE, -60);

    query.startTimeMillis(cal.getTimeInMillis());

    final Iterable<RequestLogs> requestLogsIterable = logService.fetch(query);

    int quotaFailures = 0;

    for (RequestLogs requestLog : requestLogsIterable) {

        LOGGER.info(requestLog.toString());

        for (AppLogLine appLogLine : requestLog.getAppLogLines()) {

            if (appLogLine.getLogMessage().contains("OverQuotaException")) {
                quotaFailures++;
            }
        }
    }

I can use the total number of quota exceptions within the last hour to create a healthcheck servlet (sketched below), which can be queried by an automated monitor (I use ServerMojo to ping this URL once an hour).
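The servlet itself can be as simple as this sketch, where countQuotaFailures() is a hypothetical wrapper around the LogService query above:

public class QuotaHealthcheckServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        int quotaFailures = countQuotaFailures(); // the LogService query above
        if (quotaFailures > 0) {
            // a 5xx response makes the monitor flag the app as unhealthy
            resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR,
                    quotaFailures + " over-quota failures in the last hour");
        } else {
            resp.setContentType("text/plain");
            resp.getWriter().println("OK");
        }
    }
}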

Of course, this doesn't warn you that you're about to go over quota, but it's given me a good handle on how the app fares over the course of a day.

One warning: LogService querying is subject to its own quota. During my early experiments I managed to get the date range wrong, and blew my LogService read quota in one hit! YMMV.

Posted on April 8, 2013 and filed under dev.

Links

I've started collecting useful links on all manner of subjects here. I hope you find these helpful. I will keep them up to date.

Posted on March 25, 2013 and filed under links, dev.

Objectify and Google Guice

I've been working over several Google App Engine Java apps recently to introduce Google Guice and Objectify to them. Guice is a lightweight dependency injection framework, and Objectify is a superb replacement for JDO/JPA in your Java GAE projects.

Google Guice lets you bind interfaces to implementations and annotate dependencies for injection, eg:

public interface MyService...

public class ClientCode {

    private MyService myService;

    @Inject
    public void setMyService(MyService myService) {
        this.myService = myService;
    }

}

If you're familiar with Spring then you'll find this a doddle. There's no XML in sight - Guice concentrates pretty much only on dependency injection, and the Java-based configuration classes one uses instead of XML seem perfectly adequate for this.

It also works nicely with Objectify, a data access API for the App Engine datastore. Take a look at the examples, they are extremely straightforward:

@Entity
class Car {
    @Id String vin; // Can be Long, long, or String
    String color;
}

ofy().save().entity(new Car("123123", "red")).now();
Car c = ofy().load().type(Car.class).id("123123").get();
ofy().delete().entity(c);

There's an Objectify servlet filter, somewhat similar in purpose to open-session-in-view filters, which can easily be set up in a couple of lines in Guice, as sketched below.
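Those couple of lines live in a ServletModule. A sketch, assuming Objectify 4's ObjectifyFilter:

public class AppServletModule extends ServletModule {

    @Override
    protected void configureServlets() {
        // run every request through Objectify's filter
        bind(ObjectifyFilter.class).in(Singleton.class);
        filter("/*").through(ObjectifyFilter.class);
    }
}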

Moreover, now I can easily write pretty concise DAO and Service classes that are easily testable, which is something essential I've been sorely missing.

Posted on March 25, 2013 and filed under dev.

The Google Prediction API

In a previous post I explored some sample sentiment training data available from Sanders. Now let's try using it in the Google Prediction API.

The API lets you upload a set of training data. It will then create a model which you can interrogate. Training data is stored in Google Cloud Storage, and the API is accessible via REST, secured by OAuth in the usual Google style.

To get a good idea of what's involved I recommend reading the Hello Prediction! tutorial. I pretty much followed their example, except instead of detecting the language I used it to detect sentiment.

I had to refine my aforementioned training data to be in a form suitable for the API. In this case that just means it has to be a CSV file like so:

"positive","I love the whole world and everything in it"
"negative","You guys suck"
"neutral","Cheese is a kind of dairy product"

After following the steps described in the tutorial I was then in a position to query the model. Here's the prediction for an actual example taken from the positive data set:

{
 "kind": "prediction#output",
 "id": "my_model_id",
 "selfLink": "https://www.googleapis.com/prediction/v1.5/trainedmodels/my_model_id/predict",
 "outputLabel": "positive",
 "outputMulti": [
  {
   "label": "positive",
   "score": 0.666667
  },
  {
   "label": "negative",
   "score": 0
  },
  {
   "label": "neutral",
   "score": 0.333333
  }
 ]
}

Note that it doesn't give a unanimous positive vote, although it clearly chooses positive as the most likely category. I suspect this is because there is a lot more neutral data in the training set than either positive or negative, so that there is always a tendency to treat things as neutral. This is a useful quality where borderline cases are involved.

The other thing worth noting is the suspicious-looking 2/3 and 1/3 score values themselves. Playing around with different queries always shows this 1/3 to 2/3 split, never any other numbers. I don't know what the cause of this is.

I need to spend some more time with this model, and probably get some more training data. One thing I will say is that it's both easy to use and fast. In Java terms the google-api-java-client covers a lot of ground here. I will post some more on developing with the Prediction API, and how well it performs in future posts.

Posted on March 22, 2013 and filed under dev.

Googomi

One of the great things about Google App Engine is, if you stay inside the box, so to speak, many things are a doddle. So much so that I was able to create this new app, Googomi, in a day or two, most of which involved fiddling with and learning about the Google+ API.

The Googomi app is a very simple beast with only one purpose: it will take your public Google+ stream and turn it into an RSS feed.

I've put a modicum of processing into it, so that it should correctly guess the most appropriate title for each RSS item, eg choosing the annotation, or the remote URL's title, where appropriate.

I personally had a use case for this (apart from learning about various Google APIs) whereby I wanted to export Google+ posts to other services automatically. For example, with this I can post from Google+ to Buffer and beyond.

Posted on March 18, 2013 and filed under dev.

Google App Engine and the Google+ API

I've been playing with what the Google+ API has to offer and I've found it quite easy to integrate into my Google App Engine apps using the google-api-java-client.

I initially followed the Quick start for Java tutorial with regard to creating the OAuth tokens and so forth, but the google-api-java-client has some good tutorials regarding making the actual OAuth calls. See for example this section about how to make the calls from a Google App Engine app. The library handles all the plumbing for you.

I only had to make one amendment to their example. I found that the refresh token wasn't being returned along with the access token after it was granted. However, this was simply fixed by adding a call to setApprovalPrompt("force") on the GoogleAuthorizationCodeFlow.Builder, like so:

public static GoogleAuthorizationCodeFlow newFlow() throws IOException {
    return new GoogleAuthorizationCodeFlow.Builder(HTTP_TRANSPORT, JSON_FACTORY,
            getClientCredential(), Collections.singleton(PlusScopes.ME.getUri()))
            .setCredentialStore(new AppEngineCredentialStore())
            .setAccessType("offline")
            .setApprovalPrompt("force")
            .build();
}
Posted on March 18, 2013 and filed under dev.

Twitter Sentiment Data

I've been delving into some Twitter sentiment analysis and have been casting about for some useful training data. I've found various sources, but few have any neutral data, which I think is important for any training as a sort of control.

One useful source is Sanders Analytics, which provides a set of tweet ids and a script to download the actual tweets from those ids (Twitter's terms & conditions do not allow the tweets themselves to be distributed).

This script takes a couple of days to download all the tweets because it has to honour Twitter's API limits.

I found one issue in the script which is easily fixed. It could cope with the presence of "error" in the response, but not "errors", eg:

{"errors":[{"message":"Sorry, that page does not exist","code":34}]}

The simple fix is to add this to the parse_tweet_json function, after the error check:

if 'errors' in tweet_json:
    raise RuntimeError('errors in downloaded tweet')

When the script finishes it will produce a file called full-corpus.csv. The final data has this format:

"apple","positive","126360398885687296","Tue Oct 18 18:14:01 +0000 2011","a tweet of some sort"

That is, the subject, the sentiment, the tweet id, the date and the tweet content.

The subject is what the tweet is about. This is important: the sentiment refers to the subject (in this case "apple"), and not to anything else in the tweet content.

Regardless, for my purposes I do actually need the tweet content without the subject. This can be simply achieved using grep and awk. Eg to extract the neutral tweets:

grep "\"neutral\"" full-corpus.csv | awk -F"\",\"" '{print $5}' | cut -d "\"" -f1

The output of this will just be the tweets themselves.

Posted on March 18, 2013 and filed under dev.

Updating to GAE 1.7.5

Today I updated a maven-based Google App Engine app from 1.7.4 to 1.7.5. As before, it didn't turn out as straightforward as I expected (maybe I should stop expecting this).

Once I'd installed 1.7.5 and set gae.version to 1.7.5 the build failed yet again - the issue this time boiled down to this error:

Could not find artifact net.kindleit:maven-gae-parent:pom:0.9.6-SNAPSHOT

As usual I turned to stackoverflow for help, where several others have had the same problem.

The key for me was to specify 1.7.5.1 for the GAE runtime version.

<gae.version>1.7.5</gae.version>
<gae-runtime.version>1.7.5.1</gae-runtime.version>

<dependency>
    <groupId>net.kindleit</groupId>
    <artifactId>gae-runtime</artifactId>
    <version>${gae-runtime.version}</version>
    <type>pom</type>
</dependency>   

<plugin>
    <groupId>net.kindleit</groupId>
    <artifactId>maven-gae-plugin</artifactId>
    <version>0.9.5</version>
    <dependencies>
        <dependency>
            <groupId>net.kindleit</groupId>
            <artifactId>gae-runtime</artifactId>
            <version>${gae-runtime.version}</version>
            <type>pom</type>
        </dependency>
    </dependencies>
</plugin>

I continue to use 1.7.5 for other dependencies, eg appengine-api-stubs. I have no idea about the whys and wherefores regarding this inconsistency, I'm afraid.

Posted on March 13, 2013 and filed under dev.

Facebook Apps in Heroku

A couple of years ago or so Heroku and Facebook teamed up to make creating Facebook apps a doddle. Indeed one can do so with a few clicks from the app creation centre in Facebook if you already have a Heroku account.

Here are pretty comprehensive instructions from Heroku on how to do this, and I can attest that it all works well.

I've added to this setup with a staging instance for team testing purposes using the facility Heroku has for managing different environments by pushing to different remotes. See this handy guide for full details.

To create a staging app with a git remote called staging:

heroku create --remote staging

And to add Facebook app credentials for the staging version of your app just do:

heroku config:add FACEBOOK_APP_ID=123456 FACEBOOK_SECRET=789102323etc --remote staging
Posted on March 3, 2013 and filed under dev.

Managing Javascript Resources In Maven

One of the fiddly steps in setting up and maintaining a web app is managing the various JavaScript libraries your pages use. But it's quite easy to manage resources like jQuery in Maven thanks to WebJars. Here's how to use it in Dropwizard.

If you take a look at WebJars you'll see all sorts of supported libraries. I'll use jQuery in this example.

Adding jQuery to Dropwizard

First add your jQuery dependency in your pom:

<dependency>
    <groupId>org.webjars</groupId>
    <artifactId>jquery</artifactId>
    <version>1.9.0</version>
</dependency>

Now add an AssetBundle in your Dropwizard service class:

@Override
public void initialize(Bootstrap<StreamWebAppConfiguration> bootstrap) {
    bootstrap.setName("webjars-demo");
    ... other assets ...
    bootstrap.addBundle(new AssetsBundle("/META-INF/resources/webjars", "/webjars"));
}

This will map the path "/webjars" to that jar resource - which will contain the jQuery js files in our example.

Now you can reference them in your HTML pages:

<script src="/webjars/jquery/1.9.0/jquery.min.js"></script>

And that's that. But you can go one step further. You can remove references to the version number in your pages by using the dropwizard-webjars-resource library.

Dropwizard Webjars Resource

To do this add another maven dependency:

<dependency>
    <groupId>com.bazaarvoice.dropwizard</groupId>
    <artifactId>dropwizard-webjars-resource</artifactId>
    <version>0.2.0</version>
</dependency>

In your service class remove the aforementioned AssetBundle and instead add a WebJarResource to your run method:

environment.addResource(new WebJarResource());

This will handle all the asset mapping (which is why an AssetBundle is no longer required). Now in your pages you can reference:

<script src="/webjars/jquery/jquery.min.js"></script>

i.e. without the version number. Simple! If you need to update your whole site to the next version of jQuery, just update the pom.

Posted on February 19, 2013 and filed under dev.

Streaming Twitter with Twitter4J

Twitter4J is an excellent Java library for all sorts of Twitter work. I've been using it recently to connect to the "garden hose", ie Twitter's streaming API. Here's how to follow a particular user with it.

You can load this into your project via Maven:

<dependency>
  <groupId>org.twitter4j</groupId>
  <artifactId>twitter4j-core</artifactId>
  <version>3.0.3</version>
</dependency>
<dependency>
  <groupId>org.twitter4j</groupId>
  <artifactId>twitter4j-stream</artifactId>
  <version>3.0.3</version>
</dependency>

Now you can construct your TwitterStream class:

ConfigurationBuilder cb = new ConfigurationBuilder();
cb.setDebugEnabled(true)
.setOAuthConsumerKey("******************")
.setOAuthConsumerSecret("************************************")
.setOAuthAccessToken("************************************")
.setOAuthAccessTokenSecret("************************************");
TwitterStreamFactory twitterStreamFactory = new TwitterStreamFactory(cb.build());
TwitterStream twitterStream = twitterStreamFactory.getInstance();

Of course you'd put your own OAuth tokens etc. here.

To listen to a particular user you can use a FilterQuery object:

  FilterQuery filterQuery = new FilterQuery();
  filterQuery.follow(new long[] {1234567890L}); // a placeholder user id

The follow method takes an array of user ids to follow.

To track the user you need to add a listener and attach this filter:

  twitterStream.addListener(new MyStatusListener());
  twitterStream.filter(filterQuery);

Now the MyStatusListener class merely implements StatusListener. The important method we implement here is onStatus. For our purposes we just print the statuses out:

  public void onStatus(Status status) {
          System.err.println("status: " + status);
  }

We don't need to do anything else in our StatusListener implementation for our current purpose.
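For completeness, here's a minimal sketch of the whole class, with the remaining Twitter4J 3.0 callbacks left empty:

  public class MyStatusListener implements StatusListener {

      @Override
      public void onStatus(Status status) {
          System.err.println("status: " + status);
      }

      // nothing to do in the other callbacks for this example
      @Override public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) { }
      @Override public void onTrackLimitationNotice(int numberOfLimitedStatuses) { }
      @Override public void onScrubGeo(long userId, long upToStatusId) { }
      @Override public void onStallWarning(StallWarning warning) { }
      @Override public void onException(Exception ex) { }
  }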

If you execute this code and let it run - Twitter4J will start a thread for you - you will see the results coming in, eg:

  status: StatusJSONImpl{createdAt=Mon Feb 18 15:51:26 GMT 2013, id=303532182444584960, text='RT @stephenfry: Oh no, ...
  status: StatusJSONImpl{createdAt=Mon Feb 18 15:51:30 GMT 2013, id=303532201159557123, text='RT @stephenfry: Oh no, ...
  etc...
Posted on February 18, 2013 and filed under dev.

This Week's Reading

Once again this week has flown by with hardly any time to catch up on the outside world. Here are a few tidbits of interest...

Microsoft is looking into, in their own words, Mining the Web to Predict Future Events. I've not had time to dive into this in any detail yet but it looks very interesting. From the abstract:

We describe and evaluate methods for learning to forecast forthcoming events of interest from a corpus containing 22 years of news stories.

The StackExchange on Working Remotely. I'm still amazed by how many companies don't let their staff do more of this. More often than not it's nothing to do with any technical or communication issues, in my experience.

This article on the difference between public and private unit tests is not new, over a year old in fact, but well worth the read. So many malpractices like these creep in, particularly in places where the team is only partly on board with the idea.

Finally, I've just started getting me some Headspace. This is a simple set of meditations to listen to. I've started their 10 day series, and I've already managed to miss a couple of days, but I'm going to persevere. My wife is trying it too.

Posted on February 8, 2013 and filed under links.

Up the Google App Engine Backend

I've been playing with the Google App Engine Backend service recently. Google's documentation on this is clear, but it could really do with a tutorial. This post isn't anything like that desired tutorial, but it's a recap of what I discovered and what I ended up with while trying to address my particular need, which may be helpful!

I've been using the Backend in a game engine I'm writing. I'm using Google's Channel API to send messages to the client, but the game needs a background thread to handle the game engine itself. More precisely, each game process is likely to run longer than the 60-second limit normally imposed on GAE requests. This is where the backend comes in.

Here I create a backend called zengine:

<backends>
    <backend name="zengine">
        <options>
            <dynamic>true</dynamic>
            <fail-fast>true</fail-fast>
        </options>
    </backend>
</backends>

I put this configuration in a file called backends.xml. A word about the options used here:

  • dynamic: the backend starts on demand, and shuts down again when idle
  • fail-fast: requests fail immediately if the backend is at capacity, rather than waiting in a pending queue

To start the backend from my game launching code I use a queue, via the following code:

// withUrl is statically imported from TaskOptions.Builder;
// BackendService comes from BackendServiceFactory.getBackendService()
BackendService backendService = BackendServiceFactory.getBackendService();

Queue queue = QueueFactory.getDefaultQueue();
queue.add(withUrl("/zengine").header("Host", backendService.getBackendAddress("zengine")));

This will create a GAE queue task which will execute the path /zengine in the backend environment called zengine. The code backendService.getBackendAddress("zengine") creates an address that works both in the production and local development environments. This is crucial, because the dynamic instances have an otherwise random and unguessable address - in dev environments they are on a random port!

By passing in the Host header with the address of the backend my /zengine servlet will run in the zengine backend environment instead of the normal, 60-second-limited one.

The actual task item will in practice contain parameters that specify which game process to start, ie for what user and what game. My /zengine path is mapped to the game process itself. It merely needs to start the process. Using the supplied queue task parameters I can get a handle on the channel for communication back to the client, and start the game - the servlet ends up looking something like the sketch below.
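Here gameId, channelKey and GameProcess are hypothetical stand-ins for my actual game engine classes:

public class ZengineServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // the queue task's parameters tell us which game to run
        // and which channel to talk back to the client on
        String gameId = req.getParameter("gameId");
        String channelKey = req.getParameter("channelKey");

        // we're running in the backend, so the usual 60-second deadline doesn't apply
        new GameProcess(gameId, channelKey).run();
    }
}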

Posted on February 5, 2013 and filed under dev.

Maven GAE Plugin Version Issues

While deploying something to Google App Engine this morning I noticed this minor but annoying warning:

[INFO] Updating Google App Engine Server...
********************************************************
There is a new version of the SDK available.
-----------
Latest SDK:
Release: 1.7.4
Timestamp: Sun Nov 11 09:09:13 GMT 2012
API versions: [1.0]

-----------
Your SDK:
Release: 1.7.3
Timestamp: Wed Oct 24 02:01:39 BST 2012
API versions: [1.0]

-----------
Please visit https://developers.google.com/appengine/downloads for the latest SDK.

I say annoying because I'd already specified the gae.version property to be 1.7.4, as per the plugin documentation. It was also definitely pointing to a 1.7.4 SDK installation.

If I run mvn gae:version it does indeed show the unwanted version:

[INFO] Plugin version: 0.9.5
[INFO] SDK directory: /Users/nick/dev/support/appengine-java-sdk-1.7.4
[INFO] SDK version:
Release: 1.7.3
Timestamp: Wed Oct 24 02:01:39 BST 2012
API versions: [1.0]

Searching about I found a similar problem in previous versions of the plugin, a couple of years ago. Unfortunately the solutions given there no longer work, because neither the appengine-local-runtime nor the appengine-tools-api artifact is available any more, at least from the central Maven repository.

However, a bit of trial and error gave me this solution: include appengine-tools-sdk instead, as follows:

 <plugin>
    <groupId>net.kindleit</groupId>
    <artifactId>maven-gae-plugin</artifactId>
    <version>0.9.5</version>

    <dependencies>
        <dependency>
            <groupId>com.google.appengine</groupId>
            <artifactId>appengine-tools-sdk</artifactId>
            <version>${gae.version}</version>
            <scope>runtime</scope>
        </dependency>
    </dependencies>
</plugin>

Then when running gae:version we get the correct version shown:

[INFO] Plugin version: 0.9.5
[INFO] SDK directory: /Users/nick/dev/support/appengine-java-sdk-1.7.4
[INFO] SDK version:
Release: 1.7.4
Timestamp: Tue Dec 11 11:41:31 GMT 2012
API versions: [1.0]

and the warning no longer shows when running mvn gae:deploy.

Posted on January 29, 2013 and filed under dev.

Google App Engine & Maven Tips

I found this blog posting by Gunawan Deng very useful in getting me started on using Maven to create Google App Engine projects.

Here's another tip. I ran into an error when trying to run the app locally with gae:run:

[ERROR] Failed to execute goal net.kindleit:maven-gae-plugin:0.9.5:run (default-cli) on project extropymvnspike: ${gae.home} is not a directory: ${gae.home} is not a directory:
[ERROR] Please set the sdkDir configuration in your pom.xml to a valid directory. Make sure you have correctly extracted the app engine sdk.

The error was easy to fix - add a property called gae.home to your pom.xml, pointing to the installation directory of your App Engine SDK. However, I found it more useful to add it to my settings.xml, as I imagined I would want it for all my GAE projects:

<profiles>
        <profile>
                <id>global.properties.profile</id>
                <properties>
                    <gae.home>/Users/nick/dev/support/appengine-java-sdk-1.7.4</gae.home>
                </properties>
        </profile>
</profiles>

<activeProfiles>
        <activeProfile>global.properties.profile</activeProfile>
</activeProfiles>

Any other tips for Maven/GAE integration?

Posted on January 25, 2013 and filed under dev.

Session-based Security in Dropwizard

Dropwizard is an incredibly useful framework for creating REST APIs very quickly. One thing that it doesn't come with out of the box (yet) is support for session-based security, that is, holding principal information in a plain old HttpSession.

There are excellent reasons not to do this, particularly for REST APIs. Session management can be a fiddly business that isn't particularly scalable. However, sometimes you need it anyway.

Here's how to add a simple annotation-based scheme, drawing heavily on the very useful posting by Antoine Vianey.

Code

Let's add an annotation to represent a logged in user, i.e. a user that's been set into the session as an attribute:

@Target({ElementType.PARAMETER, ElementType.METHOD, ElementType.FIELD})
@Retention(RetentionPolicy.RUNTIME)
public @interface SessionUser {
}

With this we can write an authentication provider:

@Provider
public class SessionUserProvider
        implements Injectable<User>, InjectableProvider<SessionUser, Type> {

    private final HttpServletRequest request;

    public SessionUserProvider(@Context HttpServletRequest request) {
        this.request = request;
    }

    @Override
    public Injectable<User> getInjectable(ComponentContext cc, SessionUser a, Type c) {
        if (c.equals(User.class)) {
            return this;
        }
        return null;
    }

    @Override
    public ComponentScope getScope() {
        return ComponentScope.PerRequest;
    }

    @Override
    public User getValue() {
        final User user = (User) request.getSession().getAttribute("user");
        if (user == null) {
            throw new WebApplicationException(Response.Status.UNAUTHORIZED);
        }
        return user;
    }

}

This provider will inject a user object from the session, which can then be used in our resource methods. If there is no such object, Response.Status.UNAUTHORIZED (i.e. a 401) is returned.

Configuration

To wire up our provider we need to add these lines to our Service run method:

environment.setSessionHandler(new SessionHandler());
environment.scanPackagesForResourcesAndProviders(SessionUserProvider.class);

The first line enables session support, which is not enabled by default. The second line wires up our new authentication provider in the request scope.

Usage

To use this we just add a @SessionUser annotated parameter to our resource methods, eg:

@GET
public MessageRepresentation getMessage(@SessionUser User user) {

    return new MessageRepresentation("hello");
}

and always have a valid user object available in our resource methods.

Login

To set the User into the session in the first place, for example from the submission of a login form, we just need to get hold of it from the request context, eg:

@POST
public HomeView login(@FormParam("username") String username, @FormParam("password") String password,
                      @Context HttpServletRequest request) {

    // should lookup the user etc. here - let's assume the user is fine and dandy to proceed for this example

    request.getSession().setAttribute("user", new User());

    return new HomeView();
}

Conclusion

This is an easy and flexible way to add session management to your resources when you need it. As Antoine adds, one can create other annotations, eg a @SessionAdminUser, if you want to restrict different methods to different roles.
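An admin-only variant, for instance, might look like this sketch, assuming your User class exposes an isAdmin() flag (the provider is otherwise identical to SessionUserProvider):

@Target({ElementType.PARAMETER, ElementType.METHOD, ElementType.FIELD})
@Retention(RetentionPolicy.RUNTIME)
public @interface SessionAdminUser {
}

// and in the corresponding provider's getValue():
final User user = (User) request.getSession().getAttribute("user");
if (user == null) {
    throw new WebApplicationException(Response.Status.UNAUTHORIZED);
}
if (!user.isAdmin()) {
    // authenticated but not allowed: 403 rather than 401
    throw new WebApplicationException(Response.Status.FORBIDDEN);
}
return user;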

Posted on January 23, 2013 and filed under dev.