Tech Talk: We <3 Big Data

At The Blue Alliance, we pride ourselves on maintaining the largest database of FRC event results. This data powers everything we do – showing it to you on thebluealliance.com and in the Android app, calculating numerical insights into how various games play out, and powering your own applications via our APIs.

Often, people ask us interesting questions – like “how many times have teams 254 and 971 played against each other?” Traditionally, one of the admins would write a script in python that would calculate this by directly querying the TBA datastore or anyone could write a script using the API. While a fun exercise in programming, this is not always the best approach. Having an easier way to compute various queries over the TBA data would unblock many insightful observations across the FRC community.

To solve this, we have created a public Google BigQuery dataset with all of TBA’s data. BigQuery is a large-scale data warehouse that is easy to query. You can run queries across the TBA database using standard SQL 2011. The dataset is available at https://thebluealliance.com/bigquery.

Note: The schema of the BigQuery tables exactly matches the schema of the internal TBA datastore, so some fields have been deprecated or are not ideally represented. Many times, enums are stored as integers – you can match them to human readable values in the code

For example, if you wanted to answer the above question (“how many times have teams 254 and 971 played against each other?”), you can run the following query:

#standardSQL
SELECT match_key, red_score, blue_score, red_teams, blue_teams
FROM (
SELECT __key__.name as match_key, team_key_names,
JSON_EXTRACT_SCALAR(alliances_json, '$.red.score') as red_score,
JSON_EXTRACT_SCALAR(alliances_json, '$.blue.score') as blue_score,
REGEXP_EXTRACT_ALL(JSON_EXTRACT(alliances_json, '$.red.teams'), '"(frc[0-9]+)"') AS red_teams,
REGEXP_EXTRACT_ALL(JSON_EXTRACT(alliances_json, '$.blue.teams'), '"(frc[0-9]+)"') AS blue_teams
FROM `tbatv-prod-hrd.the_blue_alliance.match`
WHERE 'frc971' IN UNNEST(team_key_names) AND 'frc254' IN UNNEST(team_key_names))
WHERE 'frc971' IN UNNEST(red_teams) AND 'frc254' IN UNNEST(blue_teams)
OR 'frc971' IN UNNEST(blue_teams) AND 'frc254' IN UNNEST(red_teams)
ORDER BY match_key
LIMIT 100

view raw
254vs971.sql
hosted with ❤ by GitHub

In less than 5 seconds, we can see that the most recent of these matches was Quals 59 at the 2017 San Francisco Regional.

What if we wanted to find the event that had the most Unicorn Matches (where the winning alliance obtained the Rotor Bonus and KPA Bonus) during qualifications this year? Easy.

#standardSQL
SELECT
event_key,
COUNT(*) as unicorn_count
FROM (
SELECT
__key__.name as match_key,
event.name as event_key,
JSON_EXTRACT_SCALAR(alliances_json, '$.red.score') as red_score,
JSON_EXTRACT_SCALAR(alliances_json, '$.blue.score') as blue_score,
JSON_EXTRACT_SCALAR(score_breakdown_json , '$.red.kPaRankingPointAchieved') as red_kpa,
JSON_EXTRACT_SCALAR(score_breakdown_json , '$.red.rotorRankingPointAchieved') as red_rotor,
JSON_EXTRACT_SCALAR(score_breakdown_json, '$.blue.kPaRankingPointAchieved') as blue_kpa,
JSON_EXTRACT_SCALAR(score_breakdown_json, '$.blue.rotorRankingPointAchieved') as blue_rotor
FROM `tbatv-prod-hrd.the_blue_alliance.match`
WHERE comp_level = 'qm'
)
WHERE (red_score > blue_score AND red_kpa = 'true' AND red_rotor = 'true') OR (blue_score > red_score AND blue_kpa = 'true' AND blue_rotor = 'true')
GROUP BY event_key
ORDER BY unicorn_count DESC

view raw
find_unicorn.sql
hosted with ❤ by GitHub

Running this query, we see that IRI far and away has the most Unicorn Matches, but the Darwin Division and the Carver Division were tied for the most of any official event.

Screenshot from 2017-08-31 22-10-49

As an exercise, try changing this query to find out if there were any qualification matches where an alliance earned both bonuses but still lost.

In the coming weeks, we will build a system that automatically keeps this dataset up-to-date with the TBA database, so you’ll always be able to get the most accurate results for your queries.

We hope that you’ll enjoy having the full power of the TBA dataset available to query and can compute some interesting insights. When you do, don’t forget to share them with the community!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s