What is Koursaros

Background

Machine learning models are excellent specializers; they're fantastic at being trained for a particular task and doing it really well. In many cases, these models can perform at superhuman levels on narrowly defined tasks, such as Facebook AI's RoBERTa model beating human baselines on the GLUE benchmark. That said, things get trickier when applying these models to real-world problems, so you can bottle your fears of an impending Skynet future.

Interestingly though, many machine learning tasks utilize the same fundamental information to solve problems. For instance, many text classification tasks can leverage an understanding of the English language, and image recognition models can use the same knowledge of shapes and colors to identify different objects in pictures. However, just producing a model with a baseline understanding of something as general as a language is expensive. Google's XLNet model, which recently set SOTA for certain NLP tasks, cost roughly $61,000 to train. Thankfully, big tech companies open source these pretrained models for the rest of the AI community. Developers take these complex models and train them a wee bit more so they can identify cat facts, recognize pictures of potted plants, or what have you. This process is called fine-tuning. The pretrained-model-to-fine-tuned-application pattern is an emerging trend, and for good reason: it is a powerful method of achieving cutting-edge (and economically sane) results with ML applications.
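
To make the process concrete, here is a minimal fine-tuning sketch using the Hugging Face transformers library; the model choice, data, and training loop are illustrative placeholders rather than any pipeline described in this post:

  # Minimal fine-tuning sketch (illustrative; toy data only).
  import torch
  from transformers import BertTokenizer, BertForSequenceClassification

  tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
  model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
  optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

  texts = ["cats are mammals", "the moon is made of cheese"]  # toy examples
  labels = torch.tensor([1, 0])

  model.train()
  for _ in range(3):  # a few passes over the toy batch
      batch = tokenizer(texts, padding=True, return_tensors="pt")
      loss = model(**batch, labels=labels).loss  # cross-entropy over 2 classes
      loss.backward()
      optimizer.step()
      optimizer.zero_grad()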

An Introduction to the Platform

Designing for scale

There is currently no widely adopted computing platform designed specifically for efficiently fine-tuning these models and deploying them, whether to a development environment or at production scale (e.g. on Kubernetes). If such a platform existed, it would let engineers focus on what matters: the training data.

To fill this gap we propose Koursaros, a robust platform for training and deploying ensembles of pretrained models, such as BERT and other SOTA natural language models, to a development environment or a production Kubernetes cluster with a single command. Rather than orchestrating the entire training pipeline from a single monolithic script, Koursaros relies on a distributed microservice architecture that leverages a robust messaging system (namely RabbitMQ) to handle buffering between services. This takes the burden off of Python, which, while slow, is necessary for using most cutting-edge machine learning libraries.
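
As a rough illustration of the buffering idea (not Koursaros's actual internals), here is what a publisher and consumer look like with RabbitMQ's Python client, pika; the queue name and payload are placeholders:

  # Sketch of broker-buffered messaging with RabbitMQ via pika
  # (illustrative only; queue name and payload are placeholders).
  import pika

  conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
  channel = conn.channel()
  channel.queue_declare(queue="retrieval")

  # The upstream service publishes; the broker buffers the message
  # until a downstream consumer is ready.
  channel.basic_publish(exchange="", routing_key="retrieval", body=b"the earth is round")

  def on_claim(ch, method, properties, body):
      # The downstream service consumes at its own pace.
      print("received claim:", body.decode())
      ch.basic_ack(delivery_tag=method.delivery_tag)

  channel.basic_consume(queue="retrieval", on_message_callback=on_claim)
  channel.start_consuming()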


At a glance

The fundamental unit of Koursaros is a stub. Each microservice has a predefined set of stubs that designate where messages are sent to and from. Stubs can be combined in different ways to define an action in a YAML file. For example, here is a simplified version of our fact-checking pipeline defined as an action:

  actions:
    factchecking:
      stubs:
        send: client.sendClaims() -> Claim                                  | retrieval
        retrieval: retriever.getRelatedArticles(Claim) -> ClaimWithArticles | compare
        compare: model.factcheck(ClaimWithArticles)                       ->|

Koursaros can take this YAML definition, spin up each microservice in a separate process, and set up message routing between the services. If a microservice has multiple stubs, each stub gets its own thread. The type of each message is also defined ahead of time. So for the send stub, the client executes the sendClaims function and sends a Claim message to the retrieval stub. The Claim message gets routed to the retriever, which takes that Claim, finds related articles with the getRelatedArticles function, and returns a ClaimWithArticles message.
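
For illustration, the stub functions above might look something like the following in Python; Claim and ClaimWithArticles stand in for the compiled protobuf classes, and search_index is a placeholder for the retriever's backing store:

  # Hypothetical stub implementations (names and imports are illustrative).
  from messages_pb2 import Claim, ClaimWithArticles  # compiled protobuf classes

  def search_index(text):
      # Placeholder for a real retrieval call (e.g. an Elasticsearch query).
      return ["example article about " + text]

  def sendClaims():
      # Runs on the client; each yielded Claim is routed to the retrieval stub.
      for text in ["the earth is flat", "water boils at 100C"]:
          yield Claim(text=text)

  def getRelatedArticles(claim):
      # Runs on the retriever; returns a ClaimWithArticles to the compare stub.
      return ClaimWithArticles(claim=claim, articles=search_index(claim.text))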

Separating out our services like this allows us to independently scale each component of the pipeline. Additionally, the flow of jobs on Koursaros is unidirectional: messages are sent downstream without recipients talking back to senders. Overall, this creates a scalable, non-blocking, low-latency architecture.

Features

  • Distributed computing environment with pub-sub architecture to remove backchatter
  • Protobuf message schemas that are compiled into Python classes, and actions that describe how those messages are routed between services (see the schema sketch after this list).
  • Each stub (function) within a Python microservice can be referenced directly for routing, so multiple stubs can live in the same microservice when it makes sense for them to share memory space (e.g. using the same model for encoding and classifying), or in different microservices when it doesn't.
  • Deploy to production with a built-in build pipeline, and define replicas for deploying to Kubernetes or another orchestration service, with support for GPU acceleration.
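
As an example, the message schemas for the fact-checking action above might be declared like this (a hypothetical sketch; field names and numbers are illustrative) and then compiled to Python classes with protoc:

  // Hypothetical protobuf schemas for the fact-checking action
  // (field names and numbers are illustrative).
  syntax = "proto3";

  message Claim {
    string text = 1;
  }

  message ClaimWithArticles {
    Claim claim = 1;
    repeated string articles = 2;
  }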

Applying Koursaros: Automated Fact-Checking

The goal

We became aware of the need for such a platform while working on the task of automated fact-checking. Our goal was to beat the top performers in competitions in this space, such as the Fake News Challenge (FNC) and the Fact Extraction and Verification (FEVER) workshop. We also wanted to see whether the winning approaches could be employed for real-world, low-latency fact-checking.

Training Elasticsearch

Rather than relying on hand-built matching and filtering heuristics, such as the "entity linking" described by Athene (the top team by information retrieval (IR) score for FEVER), we were interested in building a system that could be trained for optimal recall on an IR task such as the one proposed by FEVER. Additionally, such a system could be trained in production by incorporating user feedback.

One of our primary concerns in designing such a system was performance and scalability. This is in keeping with our overall design for Koursaros, which relies on a pub-sub architecture and a robust broker.

We divide our search efforts into those powered primarily by machine learning and those powered primarily by traditional text search. However, this distinction is blurred by our use of Bayesian optimization to tune the hyperparameters of text search.
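
As a rough sketch of the latter, tuning Elasticsearch's BM25 similarity parameters (k1 and b) with Bayesian optimization could look like the following, using scikit-optimize; the index name and the recall measurement are placeholders for the real FEVER evaluation:

  # Hedged sketch: Bayesian optimization of BM25 hyperparameters
  # (the index name and recall measurement are placeholders).
  from elasticsearch import Elasticsearch
  from skopt import gp_minimize
  from skopt.space import Real

  es = Elasticsearch()

  def recall_at_k(k1, b):
      # Changing similarity settings requires closing the index first.
      es.indices.close(index="wiki")
      es.indices.put_settings(
          index="wiki",
          body={"index": {"similarity": {"default": {"type": "BM25", "k1": k1, "b": b}}}},
      )
      es.indices.open(index="wiki")
      return 0.0  # placeholder: run labeled queries and measure recall here

  # gp_minimize minimizes, so negate recall to maximize it.
  result = gp_minimize(
      lambda params: -recall_at_k(*params),
      dimensions=[Real(0.5, 3.0, name="k1"), Real(0.0, 1.0, name="b")],
      n_calls=30,
  )
  print("best k1, b:", result.x)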

Written on September 25, 2019