Tech Talk: How The Blue Alliance Gets Data

Welcome to the next installment in The Blue Alliance Blog’s tech talk series. If you haven’t already, be sure to check out past tech talks on TBA’s caching layers and how we predict match times.

If you’ve ever wondered where The Blue Alliance gets its data from, wonder no more! This blog post will walk you through how TBA’s incoming datafeeds have evolved over the years. At its core, The Blue Alliance is just a collection of cron jobs that periodically fetch data from FIRST’s official sources. During a live event, match data and rankings are updated every minute or two. Data that changes less frequently (like the list of all events in a given year) is updated once per day. This approach is called polling, as opposed to an event-driven method like interrupts or webhooks.
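To make the polling model concrete, here is a minimal Python sketch of a scheduler that decides which feeds are due for a refresh. The feed names, intervals, and fetch callables are hypothetical, and TBA itself relies on cron jobs rather than a hand-rolled loop like this one; the sketch only makes the "different feeds, different intervals" logic explicit.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PollTask:
    """One datafeed to poll. Names and intervals here are hypothetical."""
    name: str
    interval_s: float                # minimum seconds between fetches
    fetch: Callable[[], None]        # hypothetical function that pulls and stores data
    last_run: float = float("-inf")  # never run yet, so the task is due immediately


def poll_once(tasks: List[PollTask], now: float) -> List[str]:
    """Run every task whose interval has elapsed; return the names that ran."""
    ran = []
    for task in tasks:
        if now - task.last_run >= task.interval_s:
            task.fetch()
            task.last_run = now
            ran.append(task.name)
    return ran


def run_forever(tasks: List[PollTask], tick_s: float = 10.0) -> None:
    """Check for due tasks every tick_s seconds (a stand-in for cron)."""
    while True:
        poll_once(tasks, time.monotonic())
        time.sleep(tick_s)
```

For example, live matches might use `interval_s=60`, rankings `interval_s=120`, and the yearly event list `interval_s=86400`; polling at t=0, 60, and 120 would fetch matches every time but rankings only at 0 and 120.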

The Beginning: HTML Pages

You may remember how official match results were posted before the 2015 season – they were HTML pages that contained a table of results. After each match, FMS would generate these pages and upload them to FIRST’s servers via FTP. Believe it or not, these pages were also how TBA obtained its data.

The Blue Alliance servers would fetch these pages at predefined intervals and parse the match information out of the HTML. Parsing HTML is notoriously tricky, so TBA used a third-party library called BeautifulSoup to help. The HTML parsing code wasn’t the prettiest code ever written, but it got the job done until 2014.

[Image: perl_problems. Caption: “Not a good way to parse HTML…”]
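The scraping step can be sketched with Python’s standard-library `html.parser` in place of BeautifulSoup, so the example runs with no third-party dependencies. The ten-column table layout (time, match number, three red teams, three blue teams, red score, blue score) is a hypothetical stand-in, not the exact layout of FIRST’s old results pages.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_matches(html: str) -> List[dict]:
    """Turn result rows of the (hypothetical) 10-column table into match dicts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    matches = []
    for cells in parser.rows:
        # Skip header rows and anything that doesn't look like a result row.
        if len(cells) != 10 or not cells[1].isdigit():
            continue
        matches.append({
            "match_number": int(cells[1]),
            "red_teams": [int(t) for t in cells[2:5]],
            "blue_teams": [int(t) for t in cells[5:8]],
            "red_score": int(cells[8]),
            "blue_score": int(cells[9]),
        })
    return matches
```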

Even back then, it was important to make sure that the data import code worked reliably – if the datafeed code didn’t work, then TBA wouldn’t have any data! We wrote unit tests to ensure it functioned properly and to catch bugs before they made it into production. Having tests gave us greater confidence to make changes all across the site, since we knew that our data import code would work as long as the tests passed.
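A unit test for datafeed code might look like the following minimal sketch using Python’s built-in `unittest`. The `parse_score_row` helper and its three-column row format are hypothetical; the point is that each test pins down one behavior, including how the parser fails on bad input.

```python
import unittest


def parse_score_row(cells):
    """Hypothetical helper: turn [match_number, red_score, blue_score] into a dict."""
    if len(cells) != 3:
        raise ValueError(f"expected 3 cells, got {len(cells)}")
    # int() raises ValueError on non-numeric text, which the tests rely on.
    match_number, red_score, blue_score = (int(c.strip()) for c in cells)
    return {"match_number": match_number, "red_score": red_score, "blue_score": blue_score}


class ParseScoreRowTest(unittest.TestCase):
    def test_parses_valid_row(self):
        self.assertEqual(
            parse_score_row(["12", "45", "38"]),
            {"match_number": 12, "red_score": 45, "blue_score": 38},
        )

    def test_rejects_short_row(self):
        with self.assertRaises(ValueError):
            parse_score_row(["12", "45"])

    def test_rejects_non_numeric_score(self):
        with self.assertRaises(ValueError):
            parse_score_row(["12", "N/A", "38"])
```

Saved to a file, these tests run with `python -m unittest <file>`.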

Real-Time Data from Twitter

Starting in 2009, FMS began posting match results in real time to the Twitter account @FRCFMS. At a glance, the tweets appeared somewhat cryptic, but they actually contained enough information to import match results to The Blue Alliance.

Take the tweet for Einstein 2014 Finals match 3 as an example. It starts with the short event code (CMP for Einstein), followed by the match type and number (E 9 means Elims match #9, the final match in the Einstein sequence), the red and blue final scores (RF and BF), and the teams on each alliance (RA and BA). TBA parsed these fields out of each tweet to build its match records.

The Twitter datafeed was especially useful for offseason events, since FMS would tweet results even when it wasn’t uploading the “official” HTML pages. These tweets continued through the 2014 season, after which FMS underwent a major rewrite.
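Here is a minimal sketch of how such a tweet might be parsed. It assumes the tweet begins with a `#FRC` hashtag carrying the event code and that the match type and number are labeled `TY` and `MC`; only `RF`, `BF`, `RA`, and `BA` come from the description above, so treat the exact layout as an assumption. Any other uppercase label is collected but ignored, which keeps extra fields from being mistaken for team numbers.

```python
from typing import Dict, List


def _is_label(token: str) -> bool:
    # Assumption: labels are 2+ uppercase letters (RF, RA, ...), while values are
    # numbers or single letters such as the match type "E".
    return len(token) >= 2 and token.isalpha() and token.isupper()


def parse_fms_tweet(text: str) -> dict:
    """Parse an @FRCFMS-style result tweet (assumed layout) into a match dict."""
    tokens = text.split()
    if not tokens or not tokens[0].startswith("#FRC"):
        raise ValueError("not an FMS result tweet")
    event_code = tokens[0][len("#FRC"):]

    # Group every value under the most recent label.
    fields: Dict[str, List[str]] = {}
    label = None
    for token in tokens[1:]:
        if _is_label(token):
            label = token
            fields[label] = []
        elif label is not None:
            fields[label].append(token)

    # A missing required label raises KeyError.
    return {
        "event_code": event_code,
        "match_type": fields["TY"][0],
        "match_number": int(fields["MC"][0]),
        "red_score": int(fields["RF"][0]),
        "blue_score": int(fields["BF"][0]),
        "red_teams": [int(t) for t in fields["RA"]],
        "blue_teams": [int(t) for t in fields["BA"]],
    }
```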

2015 and Beyond: Iterate

The 2015 season brought a lot of changes to FMS that you probably didn’t notice. Under the hood, the field software underwent a major rewrite, and the way FIRST published official match results changed drastically (for the better). The old HTML tables were gone and a more modern solution was in: that year, FIRST announced a developer API that would provide all the information the community had come to expect regarding event results. As an aside, it’s interesting to go back to my notes from the meeting where the API was announced and my reaction at the time.

Since then, the API has gone through two revisions and has become a reliable source of detailed information for TBA. One of the biggest improvements is that the FRC API includes detailed score breakdowns, so you can see exactly how an alliance arrived at its final score. This enables TBA to compute more advanced analytics and statistics, but that’s for another blog post.
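As a rough sketch of what talking to the FRC API involves, the following builds an authenticated request for an event’s match results. It assumes the version 2.0 base URL, a `/{season}/matches/{eventCode}` path, and HTTP Basic authentication with a registered username and authorization key; check FIRST’s API documentation before relying on any of these. The request is only constructed here, not sent.

```python
import base64
import urllib.request

# Assumed base URL for version 2.0 of the FRC Events API.
FRC_API_BASE = "https://frc-api.firstinspires.org/v2.0"


def build_matches_request(season: int, event_code: str,
                          username: str, auth_key: str) -> urllib.request.Request:
    """Build (but don't send) a GET request for one event's match results."""
    # HTTP Basic auth: base64-encode "username:key".
    token = base64.b64encode(f"{username}:{auth_key}".encode("utf-8")).decode("ascii")
    url = f"{FRC_API_BASE}/{season}/matches/{event_code}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
```

Sending it would be a matter of `urllib.request.urlopen(req)` inside the polling job, with the response body handed to a JSON parser.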

Even with all the changes in the way data is presented, TBA still functions the same way at its core. Every minute or two, a new request is issued to update a live event’s matches. The only difference between an event in 2017 and one in 2012 is that we’re now parsing JSON output from a dedicated API instead of HTML tables from a webpage.
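Parsing the JSON response is much simpler than scraping HTML. The sketch below assumes a response shaped like `{"Matches": [...]}`, where each match carries `tournamentLevel`, `matchNumber`, `scoreRedFinal`, `scoreBlueFinal`, and a `teams` list whose `station` values look like `Red1` or `Blue3`. These field names are assumptions about the FRC API, and the output dictionary format is hypothetical rather than TBA’s actual data model.

```python
import json
from typing import List


def parse_matches_json(body: str) -> List[dict]:
    """Convert an (assumed) FRC API match-results payload into match dicts."""
    data = json.loads(body)
    matches = []
    for m in data.get("Matches", []):
        # Sort by station so Red1..Red3 and Blue1..Blue3 come out in order.
        teams = sorted(m["teams"], key=lambda t: t["station"])
        matches.append({
            "level": m["tournamentLevel"],
            "match_number": m["matchNumber"],
            "red_teams": [t["teamNumber"] for t in teams if t["station"].startswith("Red")],
            "blue_teams": [t["teamNumber"] for t in teams if t["station"].startswith("Blue")],
            "red_score": m["scoreRedFinal"],
            "blue_score": m["scoreBlueFinal"],
        })
    return matches
```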

Other Ways to Get Data

We’ve used many other methods over the years to import offseason event data into TBA, but I’ll just touch on a few here. Again, we’ve come a long way, from sharing CSV files in a Facebook group to building tools that let events post their own results. Team 254 built Cheesy Arena, a field management system that can automatically post results to TBA. We built the TBA Event Wizard, a full-fledged webpage that lets events import FMS reports and live scores (it grew out of a 2 AM coding session during Championship 2015, but that’s also for another blog post). If you know of an offseason event that needs a way to share its results with the FRC community, check out our guide here!
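Shared CSV files were the lowest-tech option. A minimal importer for one might look like this; the column names (`match`, `red1` through `blue3`, `red_score`, `blue_score`) are hypothetical, since each event’s spreadsheet could differ. Rows with an empty score are skipped on the assumption that those matches haven’t been played yet.

```python
import csv
import io
from typing import List


def parse_results_csv(text: str) -> List[dict]:
    """Convert a CSV of match results (hypothetical columns) into match dicts."""
    matches = []
    for row in csv.DictReader(io.StringIO(text)):
        # An empty score cell is assumed to mean the match hasn't been played.
        if not row["red_score"].strip() or not row["blue_score"].strip():
            continue
        matches.append({
            "match_number": int(row["match"]),
            "red_teams": [int(row[f"red{i}"]) for i in (1, 2, 3)],
            "blue_teams": [int(row[f"blue{i}"]) for i in (1, 2, 3)],
            "red_score": int(row["red_score"]),
            "blue_score": int(row["blue_score"]),
        })
    return matches
```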

In conclusion, there are many ways for FRC event data to flow into The Blue Alliance. These methods have changed significantly over the years, and the TBA team is constantly working to ensure that the site has up-to-date, complete data from reliable sources.

If you think these problems are interesting and would like to help our team solve them, don’t forget that TBA is open source and we’re always looking for new contributors!
