Testing Your Service With Poorly Remote Services

ToxiProxy allows you to create a controllable proxy between your app and a remote service when you want to test network reliability and related effects.

It can be downloaded as a command-line executable that, when run, starts a web service on port 8474. You can configure it using a REST client (eg Postman or cURL).

Say you want to configure a proxy for a remote service running on somewhere.else.com:2000. You can proxy it through localhost:3000 by POSTing this:

{
  "name":"badendpoint",
  "listen":"localhost:3000",
  "upstream":"somewhere.else.com:2000"
}

to http://localhost:8474/proxies
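With cURL, creating that proxy might look something like this (a sketch, assuming Toxiproxy is running locally on its default port and using the same JSON as above):

curl -X POST http://localhost:8474/proxies \
  -H "Content-Type: application/json" \
  -d '{"name":"badendpoint","listen":"localhost:3000","upstream":"somewhere.else.com:2000"}'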

You can then GET this proxy, and update it via POST, at http://localhost:8474/proxies/badendpoint

By default the proxy will be enabled. You can change this, or anything else about it, by POSTing to that URL:

{
  "name":"badendpoint",
  "listen":"localhost:3000",
  "upstream":"somewhere.else.com:2000",
  "enabled": false
}

In the example above the proxy has been disabled, which is a simple way to test network disconnections.

Once you have a proxy you can add "toxics" to it, which simulate a variety of faults. The currently configured toxics for your proxy can be seen by making a GET request to:

http://localhost:8474/proxies/yourproxyname/toxics

To add a toxic, simply POST a toxic definition to the same endpoint. This example sets a timeout of 0 on the upstream stream, which means requests will hang forever:

{
  "type": "timeout",
  "stream": "upstream",
  "attributes": {
      "timeout": 0
  }
}
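Again with cURL, adding that toxic might look something like this (assuming the proxy created earlier is named badendpoint):

curl -X POST http://localhost:8474/proxies/badendpoint/toxics \
  -H "Content-Type: application/json" \
  -d '{"type":"timeout","stream":"upstream","attributes":{"timeout":0}}'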

Take a look at the toxics section of the main docs to see what can be set - options include latency, jitter, bandwidth limits, and custom toxics.

All of this can be performed programmatically too if you want to incorporate it into your testing framework.

Posted on March 20, 2018 and filed under dev.

If I Arrive Then Record How Long My Journey Was

I've been trying out If This Then That for logging my commute times. I live on the far edge of Dartmoor from the office where I work, which is in Plymouth. The drive to Plymouth is pleasant enough but there are seemingly wild variations in the amount of time it takes me to cross the city to the office.

So, I've set up a recipe to log the following:

  1. the time I leave the house
  2. the time I reach the edge of the city
  3. the time I reach the office

Strictly speaking what I'm logging is the time I enter or leave a circle of a few hundred meters or so centred on:

  1. my house
  2. the point where I join the Plymouth Express Way
  3. the office

Using IFTTT I then log the dates and times into Google Sheets. Each log point will log to a separate spreadsheet, which looks something like this:

Each date needs to be matched with the dates in the sheets for the other log points, and then we can start analysing.

I will do some proper analysis in due course, but here's a quick taster: a plot of time at each log point over the course of several weeks.

Posted on April 8, 2017 and filed under data.

Keeping Spring Boot Camels alive

We had a situation where a Camel route running in a non-web Spring Boot application would die occasionally, taking down the application with it.

The route in question served as a bridge between two JMS brokers, one remote, not under our control, and one local to us. It turned out that the remote broker would become unreachable from time to time, usually not for more than a few seconds - a not unexpected or unreasonable state of affairs. When this happened, the following appeared in the logs:

2017-03-07 13:39:33.792 [Camel (MyContext) thread #0 - JmsConsumer[myTopic]] ERROR o.a.c.c.j.DefaultJmsMessageListenerContainer - Could not refresh JMS Connection for destination 'myTopic' - retrying using FixedBackOff{interval=5000, currentAttempts=0, maxAttempts=unlimited}. Cause: Could not connect to broker URL: ssl://broker.somewhere.com:61616. Reason: java.net.ConnectException: Connection refused: connect

The exception looks fine - it's the behaviour of the application that is not. It should retry until the other end comes online again.

It turns out that this behaviour is caused by the way we're starting Camel in our application's main method. We have:

public class OurService {
...
    public static void main(String[] args) {
        ApplicationContext applicationContext = SpringApplication.run(OurService.class, args);
    }
}

The problem here is that when the Camel route dies (eg when there is a connection issue at the "from" end of a route) then this application thread will also die. An attempt is actually made to reconnect to the remote broker, but not in time to save our app.

Camel provides a way of changing this behaviour, which is to use the CamelSpringBootApplicationController class. Add a couple of lines to the main method like so:

ApplicationContext applicationContext = SpringApplication.run(OurService.class, args);
CamelSpringBootApplicationController applicationController
        = applicationContext.getBean(CamelSpringBootApplicationController.class);
applicationController.run();

Now CamelSpringBootApplicationController will effectively hang around until Spring Boot is shut down. When the remote broker dies, reconnect attempts are made (forever) until successful, and then life can go on as before.

For further hints and tips about using Spring Boot with Camel check out Camel's page on the subject.

Posted on March 8, 2017 and filed under dev.

Useful Tools

I've recently stumbled across two great command line tools that I'd like to mention.

ack

ack is a superb alternative to grep, with programmers in mind. Take a look at the website for bullet points, but here's an example to give you the flavour:

ack --java InterestingClass

This will show all occurrences of "InterestingClass" in Java files, searching recursively from your current directory.

sdkman

The other useful tool is sdkman. Think of it as a more general-purpose rbenv, if you're familiar with that tool. It lets you select which particular version of (for example) gradle you want to work with. Or scala. Or ant. Or a number of other options. You can install these tools using sdkman too - it's as simple as:

sdk install gradle
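Switching between installed versions is just as easy. A quick sketch (the version numbers here are purely for illustration):

sdk list gradle            # see available and installed versions
sdk install gradle 2.10    # install a specific version
sdk use gradle 2.10        # use it in the current shell
sdk default gradle 2.10    # make it the default everywhere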

Posted on January 26, 2016 and filed under dev.

Countotron

I've submitted my first app to the iTunes App Store. Don't expect anything that useful - it's part of my homework. It's the most minimal viable product I could think of! All it does is count when you tap it.

This has been a good learning experience for me. Swift is an interesting language to learn. The most painful part of the whole experience was trying to get the box to stay in the middle of the screen in all orientations and screen sizes. I had flashbacks to my early attempts at responsive design in HTML.

Anyway, enjoy!

Posted on January 18, 2016 and filed under dev.

Mathsdown

I've been casting about for a way to show mathematical symbols easily in these blog postings. This problem has of course already been solved. There are a number of options, such as MathJax and KaTeX for example.

I've chosen MathJax as it seems a bit simpler and more appropriate to my needs.

To get it working here on Squarespace I found a good example on Stack Overflow. In short, go to settings->advanced and then to the code injection section.

In the header put:

<script type="text/javascript" src="//cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

In the footer put:

<script> 
MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
</script>

And before you can say "Algebraic!", here is a lovely equation:

$$x = {-b \pm \sqrt{b^2-4ac} \over 2a}.$$

Posted on January 14, 2016 and filed under data.

Converting Apple HealthKit exports to CSV

I've been diving a bit deeper into Clojure recently, and have knocked up this shaky skeleton of a converter that takes Apple HealthKit export XML files and writes them out as pipe-delimited CSV files.

It's far (very, very far) from perfect, but it serves my needs for the moment! I'll be coming back to it from time to time as I learn more about the language.
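For a flavour of the approach, here's a minimal sketch (not the actual project code) that pulls a few attributes from each Record element of an export.xml and writes them out pipe-delimited. The attribute names are the ones HealthKit uses in its exports:

(ns healthkit-csv.core
  (:require [clojure.xml :as xml]
            [clojure.string :as str]))

(defn record->line [attrs]
  ;; pick out a handful of attributes and join them with pipes
  (str/join "|" (map attrs [:type :startDate :endDate :value])))

(defn convert [in-file out-file]
  (let [records (->> (xml/parse (java.io.File. in-file))
                     :content
                     (filter #(= :Record (:tag %)))
                     (map :attrs))]
    (spit out-file (str/join "\n" (map record->line records)))))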

Posted on December 17, 2015 and filed under dev, health.

Minimalist Clojure Tooling

I've been trying to get started again with Clojure, after a fairly shaky start a couple of years ago. To this end I've enlisted the excellent book Seven Languages in Seven Weeks by Bruce Tate. Each chapter covers a different language and the exercises are challenging enough to be interesting.

I've always been fond of a minimalist development environment (especially coming from a heavy Java background), so I tried out a couple of suggested approaches.

vim + fireplace

The first thing I tried was a combination of vim and the vim-fireplace plugin. Updating the version of vim that came with my Mac, which turned out to be too old for the plugin, was only mildly fiddly with the assistance of Homebrew. Regardless, MacVim does work quite nicely. I also installed the vim-clojure-static plugin, which handles highlighting and syntax and so forth.

I found this a workable solution. I'm probably not as fast with vim as I should be, but I only needed a handful of key combinations to get a decent eval loop going.

However, it is not as slick as I would like. I had some trouble getting Eval to work in conjunction with lein repl to begin with, and the author himself states:

You know that one plugin that provides a REPL in a split window and works absolutely flawlessly, never breaking just because you did something innocuous like backspace through part of the prompt? No? Such a shame, you really would have liked it.

The screenshot above shows the evaluation for all the code. Note how it only shows the evaluation results for the last line.

Lighttable

Then I tried Lighttable. This project aims to provide instant feedback on your code as you develop. I backed the Kickstarter for this project quite a while ago and then never got a chance to use it. But it's come a long way since then, and I'm impressed.

It provides a nice clear view of your code, and best of all the eval-loop is near instant. The only delay I encountered was setting up the dependencies the first time. Above you can see a screenshot showing how the evaluation results appear inline. Note how all the line evaluations are shown, unlike the vim example.

Problems are shown inline too, for example:

Verdict

In the end I chose Lighttable. While it's still in alpha, I found it less fragile and fiddly than vim+fireplace, and it provided quicker and better feedback.

Posted on December 15, 2015 and filed under dev.

Streaming twitter searches with node.js and docker

I've created an example of how to wrap up a simple node.js app in a docker container. I'm indebted to William Blankenship's very clear article on how to do this. In fact I suggest you read that article first.

The example code is available on github. I've simply taken the twitter node.js streams example code and dockerised it. I've made a couple of small changes in how to run it.

Firstly I've used the dotenv library to read a .env file from your local checkout. This file can contain all your Twitter api keys etc.

Secondly I've changed the Dockerfile slightly so that command line arguments can be passed to the script, using the ENTRYPOINT directive. Here's the Dockerfile in full:

FROM nodesource/node:5.1.0

ADD package.json package.json
RUN npm install
ADD . .

ENTRYPOINT ["node","streamer.js"]

ENTRYPOINT simply allows command line arguments to be tacked onto the docker run command at runtime. In this case I use it to specify the search term.
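So, assuming you've built the image with a tag of your choosing (twitter-streamer here is just an example name), running a search might look like:

docker build -t twitter-streamer .
docker run twitter-streamer "some search term"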

More details are available in the project's README.

Posted on November 23, 2015 and filed under dev.

Running the ironworker tutorial on OSX

I've been trying out ironworker to check out its asynchronous, (almost?) serverless job running abilities. I've had a lot of trouble getting Docker working in OSX, mainly network-related.

I've been concentrating on ironworker's node.js example. The first stumbling block I hit was that the first docker command in the tutorial failed:

docker run --rm -v "$PWD":/worker -w /worker iron/node:dev npm install

The reason was that npm couldn't see out to the great beyond:

npm WARN package.json @ No repository field.
npm WARN package.json @ No license field.
npm ERR! Linux 4.1.13-boot2docker
npm ERR! argv "/usr/bin/node" "/usr/bin/npm" "install"
npm ERR! node v4.1.2
npm ERR! npm v2.14.4
npm ERR! code EAI_FAIL
npm ERR! errno EAI_FAIL
npm ERR! syscall getaddrinfo

npm ERR! network getaddrinfo EAI_FAIL registry.npmjs.org:443
npm ERR! network This is most likely not a problem with npm itself
npm ERR! network and is related to network connectivity.
npm ERR! network In most cases you are behind a proxy or have bad network settings.
npm ERR! network
npm ERR! network If you are behind a proxy, please make sure that the
npm ERR! network 'proxy' config is set properly. See: 'npm help config'

npm ERR! Please include the following file with any support request:
npm ERR! /worker/npm-debug.log

This is easily fixed on an ad hoc basis by adding a dns flag to the command, eg:

docker run --rm -v "$PWD":/worker --dns 8.8.8.8 -w /worker iron/node:dev npm install

I don't as yet know what to do to fix this permanently.

The second issue, which is merely something not spelled out explicitly in the tutorial, is that an iron.js file needs to be present before zipping up the project for upload. This file contains the project id for ironworker. Without it you get:

iron worker upload --name hellojs --zip hello.zip iron/node node hello.js
did not find project id in any config files or env variables

Thereafter everything went smoothly.

Posted on November 23, 2015 and filed under dev.

Docker networking issue in OSX

I've been having trouble recently setting up Docker on OSX (El Capitan). I've found some respite by following the suggestions in this issue:

I had the problem earlier but it turned out that dhcp is off on the VirtualBox host only network that the docker "default" image attaches to. I'm on OSX but I think Windows too needs the "default" virtual machine to handle all the containers.
Posted on November 23, 2015 and filed under dev.

The Making Of Maps

I've started playing roleplaying games with my two children, and to this end I've been drawing fantasy maps. I've found the following guides by Jonathan Roberts very helpful.

Here is a general guide on drawing a map in Photoshop. The author provides a series of steps that will help to create a fantasy map at any scale.

In this article the author expands on a simple design technique for creating world maps prior to illustration.

As a bonus, here's a Wired slideshow of the sketches Tolkien used to develop his stories.

Posted on October 17, 2015 and filed under games.

Health Checks and Metrics in Spring

Spring Boot provides a comprehensive metrics package which you can use and extend with a minimum of fuss, but on occasion I've wished to add the same functionality to a Spring web application which doesn't use Spring Boot. 

We can do this using Dropwizard's metrics package without too much difficulty but it does need a little bit of glue. I won't go into detail about how to write healthchecks and metrics but I will elaborate on the wiring. It is my aim here to let Spring manage any healthcheck and metrics beans, and let the Metrics package handle the web endpoints.

Firstly you'll need a couple of dependencies for the metrics package. Here are the gradle entries I used:

compile 'com.codahale.metrics:metrics-core:3.0.2'
compile 'com.codahale.metrics:metrics-servlets:3.0.2'

Let's say I want a health check for the database connection. Here's a simple implementation which checks that a connection can be obtained:

@Component
public class DatabaseConnectionHealthCheck extends HealthCheck {

    public static final String HEALTHCHECK_NAME = "databaseConnection";

    @Autowired
    private HealthCheckRegistry healthCheckRegistry;

    @Autowired
    private DataSource dataSource;

    @PostConstruct
    public void addToRegistry() {
        healthCheckRegistry.register(HEALTHCHECK_NAME, this);
    }

    @Override
    protected Result check() {

      try (Connection connection = dataSource.getConnection()) {
          return HealthCheck.Result.healthy();
      } catch (SQLException e) {
          return HealthCheck.Result.unhealthy(e.getMessage());
      }
    }
}

This class is wired into our app as a Spring bean, and importantly it has a @PostConstruct method which registers it with Metrics' own registry, the HealthCheckRegistry. Now the fiddly part is injecting this registry, and the MetricRegistry, into the Spring context as well.

We will need a few classes of our own to do this. The Metrics endpoints may be configured in our web.xml, eg:

<servlet>
    <servlet-name>metrics</servlet-name>
    <servlet-class>com.codahale.metrics.servlets.AdminServlet</servlet-class>
</servlet>

<servlet-mapping>
    <servlet-name>metrics</servlet-name>
    <url-pattern>/metrics/*</url-pattern>
</servlet-mapping>

The AdminServlet looks for a MetricRegistry and a HealthCheckRegistry to find the metrics and healthchecks it should handle. We can supply our own Spring configuration which creates these:

@Configuration
public class MetricsConfiguration {

    @Bean
    public HealthCheckRegistry newHealthCheckRegistry() {
        return new HealthCheckRegistry();
    }

    @Bean
    public MetricRegistry newMetricRegistry() {
        return new MetricRegistry();
    }
}

This creates the two registries as Spring beans. In order to make these visible to the Metrics code (i.e. the AdminServlet) we can use a ServletContextListener of our own:

public class MetricsServletsWiringContextListener implements ServletContextListener {

  @Autowired
  private MetricRegistry metricRegistry;

  @Autowired
  private HealthCheckRegistry healthCheckRegistry;

  private MetricsServletContextListener metricsServletContextListener;
  private HealthCheckServletContextListener healthCheckServletContextListener;

  @Override
  public void contextInitialized(ServletContextEvent event) {
          WebApplicationContextUtils.getRequiredWebApplicationContext(event.getServletContext())
              .getAutowireCapableBeanFactory()
              .autowireBean(this);

      metricsServletContextListener = new MetricsServletContextListener(metricRegistry);
      healthCheckServletContextListener = new HealthCheckServletContextListener(healthCheckRegistry);

      metricsServletContextListener.contextInitialized(event);
      healthCheckServletContextListener.contextInitialized(event);
  }

  @Override
  public void contextDestroyed(ServletContextEvent event) {
  }

}

Let's unpack this a bit. Firstly the two registries are @Autowired in from Spring. Next we have two classes, MetricsServletContextListener and HealthCheckServletContextListener. More on these later.

The meat of this class lies in the contextInitialized method. This is called when the web context has been initialized. The first thing the method does is autowire this listener itself into the Spring context:

WebApplicationContextUtils.getRequiredWebApplicationContext(event.getServletContext())
            .getAutowireCapableBeanFactory()
            .autowireBean(this);

Then we take the two aforementioned listener classes and inject the registries into them. The magic happens inside their contextInitialized methods. This part is a bit of a hack - what we're doing here is re-using some Metrics servlet context code to make various beans available as servlet context attributes.

Now these two other classes are subclasses of ContextListeners supplied by the Metrics package. Here is the health check one:

public class HealthCheckServletContextListener extends HealthCheckServlet.ContextListener {

    private final HealthCheckRegistry registry;

    public HealthCheckServletContextListener(HealthCheckRegistry healthCheckRegistry) {
        this.registry = healthCheckRegistry;
    }

    @Override
    protected HealthCheckRegistry getHealthCheckRegistry() {
        return registry;
    }
}

If one were to take a look inside the contextInitialized method one would see things like this:

context.setAttribute(METRICS_REGISTRY, getMetricRegistry());

Now one could perform these steps manually without extending the Metrics ContextListener classes, but I'll leave that as an exercise for the reader. The MetricsServletContextListener works the same way, but extends MetricsServlet.ContextListener instead.

Finally one needs to configure the MetricsServletsWiringContextListener in the web.xml:

<listener>
    <listener-class>uk.co.tpplc.digitaltrade.app.metrics.MetricsServletsWiringContextListener</listener-class>
</listener>

And there you have it. It's a bit convoluted but at this point one can add HealthChecks and Metrics as required. 
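For example, a metric can follow the same pattern as the health check above. This sketch (the class and metric names are mine, purely for illustration; Timer is Dropwizard's com.codahale.metrics.Timer) registers a timer with the Spring-managed MetricRegistry:

@Component
public class RequestTimingMetric {

    public static final String TIMER_NAME = "requestTimer";

    @Autowired
    private MetricRegistry metricRegistry;

    private Timer requestTimer;

    @PostConstruct
    public void addToRegistry() {
        // registers (or reuses) a timer under the given name
        requestTimer = metricRegistry.timer(TIMER_NAME);
    }

    public <T> T time(Callable<T> work) throws Exception {
        // times the supplied work and records the duration against the timer
        return requestTimer.time(work);
    }
}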

Posted on July 9, 2015 and filed under dev.

Camel Event Logging With Spring Java Configuration

Here's how to set up a Camel EventNotifier which is managed by Spring Java Configuration. You may wish to do this if your EventNotifier implementation has further Spring-based dependencies of its own.

We can do this by having our Spring config extend CamelConfiguration:

@Configuration
public class MyConfig extends CamelConfiguration {
...
}

Then we can wire in our custom EventNotifier (and any other beans it requires):

@Bean
public EventNotifier newEventNotifier() {
    CustomEventNotifier eventNotifier = new CustomEventNotifier();
    eventNotifier.setIgnoreCamelContextEvents(true);
    eventNotifier.setIgnoreRouteEvents(true);
    eventNotifier.setIgnoreServiceEvents(true);
    eventNotifier.setIgnoreExchangeCreatedEvent(true);
    // etc.
    return eventNotifier;
}
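For completeness, a CustomEventNotifier might look something like this sketch. It assumes Camel 2.x's EventNotifierSupport base class (which supplies the setIgnore... setters used above and a protected log field); what it does with the events here is purely illustrative:

public class CustomEventNotifier extends EventNotifierSupport {

    @Override
    public void notify(EventObject event) throws Exception {
        // illustrative only - log completed exchanges
        if (event instanceof ExchangeCompletedEvent) {
            ExchangeCompletedEvent completed = (ExchangeCompletedEvent) event;
            log.info("Exchange completed: {}", completed.getExchange().getExchangeId());
        }
    }

    @Override
    public boolean isEnabled(EventObject event) {
        return event instanceof ExchangeCompletedEvent;
    }
}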

Finally, and here's the critical part for Camel, we need to inject it into the CamelContext. We can do this by overriding CamelConfiguration's setupCamelContext method, which will do our work at the right time in the Spring+Camel lifecycle:

@Override
protected void setupCamelContext(CamelContext camelContext) {
    ApplicationContext ctx = getApplicationContext();
    ManagementStrategy mgtStrategy = camelContext.getManagementStrategy();
    mgtStrategy.addEventNotifier(ctx.getBean(EventNotifier.class));
}
Posted on July 9, 2015 and filed under dev.

Features Over Artifacts

Artifactitis

Engineers have a natural tendency to focus on artifacts - classes, layers, deployable services and the like. This can be problematic when trying to slice requirements for delivery, all the way from the task up to the feature level. In this article I'll concentrate on issues with marking features as done while still thinking primarily of artifacts.

I've seen this recently where a feature which spans a number of microservices was split into stories - let's say one for each service. A not unreasonable view is that the story should be marked as done when that service change reaches production.

However, the service changes have some interdependencies. Given that the service changes don't all make it to production at the same time, some stories would be blocked until their dependencies also made it to production. Now in one sense this is perfectly true: if the service can't be released until its dependencies have also been released then clearly it's unreleasable. But in another sense all the work - all the potential business value, all the coding, and probably even any infrastructure work required for physical deployment - is ready for delivery. If we are using these stories to track progress or burndowns then this information is lost.

The problem here is treating the artifact (the new or updated service) as the deliverable, whereas it's more useful to say that it's the feature that needs to go live. The artifacts are just a means of delivering the feature.

Features As Deliverables

So instead let's make features first-class trackable items in our development workflow. In our imaginary kanban our stories fly (or crawl) across the board. But we can have another kanban at the feature level.

Note that the columns here naturally fit deployment environments, or more generally, configurations. Here, green means that all the tests in that combination of feature and configuration passed; red means failure, either test failure or because the code hasn't reached that configuration yet (one could presumably refine this further if one wants to see those two failure states separately).

BDD From End To End

Using Cucumber (or any other similar BDD framework) we can write sets of scenarios for each of these features and have the underlying step definitions do the right thing in each configuration.

Consider this simple scenario for sending invoices from a backend system to an external client:

Given an invoice is produced in the backend

When the invoice batch is loaded by the system

Then the invoice is delivered to the client

Let's assume the system above consists of a number of services.

Now for the Integration configuration this might involve mocking various services or using a certain subset, whereas for Production this would be the entire set of interdependent live services. And all the scenarios have to pass in a given configuration for that stage in the board to go green.
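To make this concrete, the step definitions might switch implementations depending on the configuration under test. A rough sketch (imports are omitted, and the InvoiceClient abstraction and its implementations are hypothetical names, just to show the shape):

public class InvoiceDeliverySteps {

    // which configuration we're running against, e.g. set by the build or CI pipeline
    private final String configuration = System.getProperty("test.configuration", "integration");

    // hypothetical abstraction over the real or mocked services
    private InvoiceClient client;

    @Given("^an invoice is produced in the backend$")
    public void anInvoiceIsProducedInTheBackend() {
        client = "production".equals(configuration)
                ? new LiveInvoiceClient()
                : new StubbedInvoiceClient();
        client.produceInvoice();
    }

    @When("^the invoice batch is loaded by the system$")
    public void theInvoiceBatchIsLoadedByTheSystem() {
        client.loadBatch();
    }

    @Then("^the invoice is delivered to the client$")
    public void theInvoiceIsDeliveredToTheClient() {
        assertTrue(client.invoiceWasDelivered());
    }
}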

Posted on June 25, 2015 and filed under dev.

Escaping Endpoint URIs in Camel

Camel endpoint URIs (eg for file or sftp endpoints) are URI encoded. This can cause problems when the URI needs to contain special characters such as "+". This might occur with a regex filter for example:

filter=my-awesome-regex\d+

Here the + would be treated as a URI-encoded space, and the regex would likely not match as expected.

A simple way around this is to use the RAW operator. This can be wrapped around the value. To return to the previous example, one could put:

filter=RAW(my-awesome-regex\d+)

and all should be well.

Posted on December 23, 2014 and filed under dev.

How to get character encoding correct on Google App Engine | MacGyver Development

I've been having endless trouble trying to force a particular encoding for some content on the Google App Engine. It was complicated by my Mac's insistence on MacRoman, but even when forcing a file encoding of UTF-8 my web pages would still show up with funny ?s all over the shop.

The Spring CharacterEncodingFilter described in the linked blog post did the trick. Can you think of any other way of doing this without a filter?
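For reference, the filter is configured in web.xml along these lines (the standard Spring setup from memory, rather than the linked post's exact snippet):

<filter>
    <filter-name>characterEncodingFilter</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>

<filter-mapping>
    <filter-name>characterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>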

Posted on May 22, 2013 and filed under dev.

Everything announced at the Google I/O 2013 keynote in one handy list

Google has completed its mammoth 3-hour I/O 2013 keynote, and many announcements were made. We’ve compiled a handy list so you can catch up and make sure you haven’t missed anything.

 A very handy list from The Next Web about all the exciting announcements made at Google I/O 2013 - well worth checking out. I'm particularly interested in exploring the Compute Engine, more of which later.

Posted on May 17, 2013 and filed under dev.

Simulating the High Replication Datastore Locally

Recently I was trying to add transactional support to certain batch processes in a Google App Engine app and it was coming up with strange errors. I was using Objectify. In particular it would tell me that it

can't operate on multiple entity groups in a single transaction

To my knowledge, I wasn't trying to operate on multiple entity groups. The problem turned out to be that the local dev datastore doesn't simulate the eventual consistency of the live datastore properly.

However, it's possible to turn on a simulation of this behaviour in your local app by following these instructions, ie by passing this command option:

-Ddatastore.default_high_rep_job_policy_unapplied_job_pct=1

The 1 on the end is the percentage of datastore jobs left unapplied - in effect, how much eventual consistency you want to see in your datastore. In practice any number bigger than zero is enough to get Objectify transactions working properly locally in these situations.

If you are using Maven to run your apps using the maven-gae-plugin, then you can configure this option in your pom.xml as follows:

        <plugin>
            <groupId>net.kindleit</groupId>
            <artifactId>maven-gae-plugin</artifactId>
            <version>0.9.6</version>
            <configuration>
                <jvmFlags>
                    <jvmFlag>-Ddatastore.default_high_rep_job_policy_unapplied_job_pct=1</jvmFlag>
                </jvmFlags>
            </configuration>
            <dependencies>
                <dependency>
                    <groupId>net.kindleit</groupId>
                    <artifactId>gae-runtime</artifactId>
                    <version>${gae-runtime.version}</version>
                    <type>pom</type>
                </dependency>
            </dependencies>
        </plugin>
Posted on May 8, 2013 and filed under dev.

Google Prediction API in the App Engine

I've now integrated the Google Prediction API into a Google App Engine project, in order to supply sentiment prediction at runtime. I wanted to use a service account to access the model via the Google api client libraries for Java. This has proven trickier than I first imagined, but the code is ultimately straightforward.

Some caveats

Use your service account, not the APIs Explorer

I originally set up my model and queried it using the APIs Explorer. Unfortunately I didn't realise that although I was using the same Google account to configure access to the API from the app (see below) as I was using to train the model, the one can't see the other. In other words, the service account and the Google account are separate, and they can't see each other's data. The upshot of this is that I have to train my model programmatically using the service account, if I want to query it programmatically too.

If you set up your Google APIs Console project from a Google Apps account, don't

The problem is that your service account needs to allow access to your Google App Engine "Service Account Name" - see under the Application Settings for your app, it will be of the form something@appspot.gserviceaccount.com. Unfortunately you need to add this to your Team for the Google APIs Console project. And it won't let you if you're logged in to your Google Apps account. For example, I'm logged in under extropy.net, and it will complain if you add an address that doesn't belong to this domain. I couldn't find any mention of this problem in the documentation anywhere.

The solution is to create a new Google APIs Console project under a regular gmail account. You can then add any email address it seems, including appspot ones.

If you have a custom domain for your app that matches your Google APIs Console account you may be able to ignore this, because the domain names will match, but I'm not able to confirm that.

The Code

In the end the code is quite simple, although the documentation is misleading and in a number of cases out of date. This is what I found that worked...

Following the examples from the Google API client libraries I created a utility class like so:

static final HttpTransport HTTP_TRANSPORT = new UrlFetchTransport();

static final String MODEL_ID = "your_model_id";
static final String STORAGE_DATA_LOCATION = "path_to_your_training_data.csv";

static final String API_KEY = "the_key_from_the_apis_console";

/**
 * Global instance of the JSON factory.
 */
static final JsonFactory JSON_FACTORY = new JacksonFactory();
public static final String APPLICATION_NAME = "Grokmood";

public static Prediction getPrediction() throws Exception {

    AppIdentityCredential credential =
            new AppIdentityCredential(Arrays.asList(PredictionScopes.PREDICTION));

    Prediction prediction = new Prediction.Builder(
            HTTP_TRANSPORT, JSON_FACTORY, credential).setApplicationName(APPLICATION_NAME)
            .build();

    return prediction;
}

This will create a Prediction object for you. It uses the AppIdentityCredential to access your service account. I found the documentation for this somewhat scarce.

To train the model call this method:

public static void train(Prediction prediction) throws IOException {

    Training training = new Training();
    training.setId(MODEL_ID);
    training.setStorageDataLocation(STORAGE_DATA_LOCATION);

    prediction.trainedmodels().insert(training).setKey(API_KEY).execute();
}

And you can query it like so:

public static String predict(Prediction prediction, String text) throws IOException {
    Input input = new Input();
    Input.InputInput inputInput = new Input.InputInput();
    inputInput.setCsvInstance(Collections.<Object>singletonList(text));
    input.setInput(inputInput);
    Output output = prediction.trainedmodels().predict(MODEL_ID, input).setKey(API_KEY).setId(MODEL_ID).execute();
    return output.getOutputLabel();
}

This last method will simply return positive, negative or neutral for my sentiment model.

When training the model don't forget that this takes some time - the method just kicks off the training process and returns immediately. For an example of how to wait until the model is finished please take a look at PredictionSample.java. I just kicked off training and came back a quarter of an hour later. Remember that you can't see the status except by querying from the service account - one could add a similar method using trainedmodels().get() to review the training status too.
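That last suggestion might look something like this - a sketch only, and note that the exact return type of trainedmodels().get() varies between versions of the client library, so treat this as an assumption based on the v1.5 client:

public static String trainingStatus(Prediction prediction) throws IOException {
    // returns e.g. "RUNNING" or "DONE"
    Training training = prediction.trainedmodels().get(MODEL_ID).setKey(API_KEY).execute();
    return training.getTrainingStatus();
}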

One last caveat - none of the above will work locally. This is another difference between local and production Google App Engine environments. Once it's been deployed the app will correctly identify itself, but there's no way to do that when running on your own machine. You will either have to fake your API responses locally, or use a different authentication method - one could use OAuth to authenticate yourself after logging in with a Google account. You'd then have two different means of authenticating, one for local, and one for production...

Posted on April 12, 2013 and filed under dev.