Tech Talk: We <3 Big Data

At The Blue Alliance, we pride ourselves on maintaining the largest database of FRC event results. This data powers everything we do – showing it to you on thebluealliance.com and in the Android app, calculating numerical insights into how various games play out, and powering your own applications via our APIs.

Often, people ask us interesting questions – like “how many times have teams 254 and 971 played against each other?” Traditionally, one of the admins would write a script in python that would calculate this by directly querying the TBA datastore or anyone could write a script using the API. While a fun exercise in programming, this is not always the best approach. Having an easier way to compute various queries over the TBA data would unblock many insightful observations across the FRC community.

To solve this, we have created a public Google BigQuery dataset with all of TBA’s data. BigQuery is a large-scale data warehouse that is easy to query. You can run queries across the TBA database using standard SQL 2011. The dataset is available at https://thebluealliance.com/bigquery.

Note: The schema of the BigQuery tables exactly matches the schema of the internal TBA datastore, so some fields have been deprecated or are not ideally represented. Many times, enums are stored as integers – you can match them to human readable values in the code

For example, if you wanted to answer the above question (“how many times have teams 254 and 971 played against each other?”), you can run the following query:

	#standardSQL
	SELECT match_key, red_score, blue_score, red_teams, blue_teams
	FROM (
	SELECT __key__.name as match_key, team_key_names,
	JSON_EXTRACT_SCALAR(alliances_json, '$.red.score') as red_score,
	JSON_EXTRACT_SCALAR(alliances_json, '$.blue.score') as blue_score,
	REGEXP_EXTRACT_ALL(JSON_EXTRACT(alliances_json, '$.red.teams'), '"(frc[0-9]+)"') AS red_teams,
	REGEXP_EXTRACT_ALL(JSON_EXTRACT(alliances_json, '$.blue.teams'), '"(frc[0-9]+)"') AS blue_teams
	FROM `tbatv-prod-hrd.the_blue_alliance.match`
	WHERE 'frc971' IN UNNEST(team_key_names) AND 'frc254' IN UNNEST(team_key_names))
	WHERE 'frc971' IN UNNEST(red_teams) AND 'frc254' IN UNNEST(blue_teams)
	OR 'frc971' IN UNNEST(blue_teams) AND 'frc254' IN UNNEST(red_teams)
	ORDER BY match_key
	LIMIT 100

view raw

254vs971.sql

hosted with ❤ by GitHub

In less than 5 seconds, we can see that the most recent of these matches was Quals 59 at the 2017 San Francisco Regional.

What if we wanted to find the event that had the most Unicorn Matches (where the winning alliance obtained the Rotor Bonus and KPA Bonus) during qualifications this year? Easy.

	#standardSQL
	SELECT
	event_key,
	COUNT(*) as unicorn_count
	FROM (
	SELECT
	__key__.name as match_key,
	event.name as event_key,
	JSON_EXTRACT_SCALAR(alliances_json, '$.red.score') as red_score,
	JSON_EXTRACT_SCALAR(alliances_json, '$.blue.score') as blue_score,
	JSON_EXTRACT_SCALAR(score_breakdown_json , '$.red.kPaRankingPointAchieved') as red_kpa,
	JSON_EXTRACT_SCALAR(score_breakdown_json , '$.red.rotorRankingPointAchieved') as red_rotor,
	JSON_EXTRACT_SCALAR(score_breakdown_json, '$.blue.kPaRankingPointAchieved') as blue_kpa,
	JSON_EXTRACT_SCALAR(score_breakdown_json, '$.blue.rotorRankingPointAchieved') as blue_rotor
	FROM `tbatv-prod-hrd.the_blue_alliance.match`
	WHERE comp_level = 'qm'
	)
	WHERE (red_score > blue_score AND red_kpa = 'true' AND red_rotor = 'true') OR (blue_score > red_score AND blue_kpa = 'true' AND blue_rotor = 'true')
	GROUP BY event_key
	ORDER BY unicorn_count DESC

view raw

find_unicorn.sql

hosted with ❤ by GitHub

Running this query, we see that IRI far and away has the most Unicorn Matches, but the Darwin Division and the Carver Division were tied for the most of any official event.

Screenshot from 2017-08-31 22-10-49

As an exercise, try changing this query to find out if there were any qualification matches where an alliance earned both bonuses but still lost.

In the coming weeks, we will build a system that automatically keeps this dataset up-to-date with the TBA database, so you’ll always be able to get the most accurate results for your queries.

We hope that you’ll enjoy having the full power of the TBA dataset available to query and can compute some interesting insights. When you do, don’t forget to share them with the community!

The Blue Alliance Blog

Analysis, stories, tips, and updates related to FIRST Robotics and The Blue Alliance

Tech Talk: We <3 Big Data

Leave a comment Cancel reply

Tech Talk: We <3 Big Data

Share this:

Related

Leave a comment Cancel reply