137 things the Republican party wants to know about every American voter

A sample of data collected by a Republican National Committee contractor

It’s common knowledge that political parties in the United States collect information about potential voters, but exactly how comprehensive is the data they collect?

To explore that question, we set out to find detailed descriptions of 176 data points the Republican National Committee (RNC) has been gathering about voters since at least 2008. The data points were published on June 19 by a cybersecurity company after one of its analysts found a trove of voter records that had been inadvertently left on a public Amazon server. Our search led us to previously unreported data sources, providing a more complete view of what the Republican party tracks about American voters.

The voter records were discovered on June 12 by Chris Vickery, a risk analyst at the security firm UpGuard. While scanning the internet for misconfigured systems, Vickery came across an 11-terabyte cache of election-related data that he later learned had been compiled by three Republican contractors: TargetPoint Consulting, the Data Trust, and Deep Root Analytics. In a public statement, Deep Root took responsibility for leaving the data on the Amazon server where Vickery found it, saying the data had only been used to “inform local television ad buying.”

Included within the files Vickery found were 102 massive spreadsheets, two for every state and the District of Columbia. For each state, one file contained voter data based on the 2008 election, and the other contained data based on the 2012 election. In its blog post, UpGuard listed the 176 categories that made up the column headers in those spreadsheets. Some were self-explanatory, such as “FirstName” and “OfficialParty,” but others were not, such as “VH12PP” and “RNCCalcParty.” A few were somewhat clear, such as “ModeledEthnicGroup,” which suggests data about a voter’s ethnic group as determined by predictive modeling, but what those groups were was less clear. UpGuard declined to share further details about the data, citing the inherent privacy violations that could come with such a disclosure.

However, we were able to match up the categories revealed by UpGuard with other sources and obtain detailed descriptions of most of them.

First, we found that the unique field names listed in UpGuard’s blog post match up with those used in a now-offline API that appears to have been built by the Data Trust for the RNC. The RNC’s API, which was previously hosted at docs.api.gop.com, is no longer online, and cached versions of it only show an Amazon AWS login page. But very specific Google searches, such as site:docs.api.gop.com VH12G, matched 137 of the 176 categories UpGuard listed, and most of those revealed the categories’ descriptions. Some fields were slightly different, listed in UpGuard’s post as “RegistrationAddr1” and in the API as “Registration_Addr1,” for example, but the added underscores were the only inconsistencies.
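
The kind of name matching described above can be sketched in a few lines of Python. This is only an illustration, not the method we or any of the contractors used: the function names are hypothetical, the short lists are drawn from the field names in this article, and lowercasing is an added assumption to cover API spellings such as “reg_st_name.”

```python
def normalize(field_name: str) -> str:
    """Drop underscores and lowercase so both spellings compare equal."""
    return field_name.replace("_", "").lower()


def match_fields(upguard_fields, api_fields):
    """Map each UpGuard column header to the API field with the same normalized name."""
    api_by_key = {normalize(name): name for name in api_fields}
    return {
        name: api_by_key[normalize(name)]
        for name in upguard_fields
        if normalize(name) in api_by_key
    }


# A handful of names from the lists in this article; "PG01" has no API match.
upguard = ["RegistrationAddr1", "RegStName", "VH12G", "PG01"]
api = ["Registration_Addr1", "reg_st_name", "VH12G"]

matches = match_fields(upguard, api)
unmatched = [name for name in upguard if name not in matches]
```

Here `matches` pairs “RegistrationAddr1” with “Registration_Addr1,” and `unmatched` holds the one header that has no counterpart in the API list.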

A Google search revealing the description of “VH12G”
Image: Google

Additionally, a GitHub account owned by the Data Trust includes a repository called “direct-api-examples” that also references many of the field names, and includes example uses of what appears to be an early version of the API, which it calls the “GOP Data Trust API.”

A repository in the Data Trust’s GitHub account
Image: GitHub

Of course, the link between this API and the data found by Vickery is unclear and unconfirmed, but it is apparent that the matching fields describe the same data. Asked for a comment about the API, the Data Trust referred us to the RNC, and the RNC did not respond to our questions.

Further insight into the nature of the data came from a post on Stack Overflow that included JSON data, which also used many of the field names. The provenance of the data is unclear, but nearly all of the 59 categories it contains match the categories in the RNC’s API and those shared by UpGuard, including uniquely named fields like “RNCCalcParty” and “MADR_LastCleanse.” Because the data on Stack Overflow contained actual values, it helped us to expand the descriptions of some of the columns.

A post on Stack Overflow containing JSON data that shares field names with the voter data found by Vickery
Image: Stack Overflow

In aggregate, these clues allowed us to compile the lists below. They contain descriptions of 137 data points the Republican party knows, or at least wants to know, about every American voter. (According to UpGuard, the database that Vickery found did not contain data in every field for every voter.) All of the descriptions came from the RNC’s API, except in cases where the category names had a match in the API, but where the descriptions of those categories did not show up. In those cases, we put the descriptions we inferred from the field names in italics. Some descriptions include “sample data,” which came from the data posted on Stack Overflow. The 39 fields we were unable to identify were those ranging from “PG01” to “PG39.”
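
The counts in this article can be checked with simple arithmetic. The sketch below assumes the unidentified fields follow an unbroken, zero-padded sequence from “PG01” to “PG39”; only the two endpoints are confirmed, so the names in between are an assumption.

```python
TOTAL_FIELDS = 176      # column headers listed by UpGuard
DESCRIBED_FIELDS = 137  # headers that matched a field in the RNC's API

# Assumed naming pattern: PG01, PG02, ... PG39 (only the endpoints are confirmed).
unidentified = [f"PG{n:02d}" for n in range(1, 40)]

# Two spreadsheets (2008 and 2012) for each of the 50 states plus DC.
spreadsheets = 2 * (50 + 1)

assert len(unidentified) == TOTAL_FIELDS - DESCRIBED_FIELDS
assert spreadsheets == 102
```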

Your likely religion and ethnicity

The RNC and the Democratic National Committee both pay millions of dollars to data analysis firms like Deep Root to combine information provided by states with data gathered from cold-calls, canvassing efforts, campaign contributions, and social media. Then those data points are synthesized to determine how you’re likely to vote and what kind of messaging you’ll respond to. The fields below refer to data derived through that kind of analysis. Notably, the codes for “ModeledEthnicGroup” are limited to “H” for Hispanic and “B” for black, but the field in the Stack Overflow data was populated with a “Z.”

  • RNCCalcParty: RNC Calculated Partisan score: 1=Hard Rep, 2=Lean Rem [SIC], 3=Swing/Ind, 4=Lean Dem, 5=Hard Dem
  • StateCalcParty: Likely a state-level partisanship score similar to RNCCalcParty
  • ModeledEthnicity: Modeled Ethnicity – Ethnicity Code. See supplemental documention [SIC] for code values. Sample data: “E1”
  • ModeledReligion: Modeled Religion – Ethnicity Religious Affiliation Code: B = Buddhist, C = Catholic, G = Greek Orthodox, H = Hindu, I = Islamic, J = Jewish, K = Sikh, L = Lutheran […information cuts off here]. Sample data: “P”
  • ModeledEthnicGroup: Modeled Ethnic Coding (H=Hispanic, B=Black). Sample data: “Z”
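
To illustrate how coded values like these might be read, here is a minimal Python sketch. The lookup tables copy the codes listed above; the `decode` helper is hypothetical, and storing the party score as a string is an assumption, since we don’t know how the values are typed in the underlying data.

```python
RNC_CALC_PARTY = {
    "1": "Hard Rep",
    "2": "Lean Rep",  # the API text reads "Lean Rem"; assumed here to mean "Rep"
    "3": "Swing/Ind",
    "4": "Lean Dem",
    "5": "Hard Dem",
}

MODELED_ETHNIC_GROUP = {"H": "Hispanic", "B": "Black"}


def decode(table, code):
    """Return the documented label for a code, or flag it as undocumented."""
    key = str(code)
    if key in table:
        return table[key]
    return f"undocumented code {key!r}"


party = decode(RNC_CALC_PARTY, 3)
group = decode(MODELED_ETHNIC_GROUP, "Z")  # the value seen in the Stack Overflow data
```

A lookup like this flags the “Z” as a value that the documented codes do not cover, rather than guessing at its meaning.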

Your voting history

The voting data retained by each state varies, but is generally considered public information. These fields list which party citizens voted for in each election going back to 2002.

  • LastActiveDate (last_activedate): Last Active Date – Date of Last Voter Activity (if provided on source data)
  • VoterStatus: Voter Status – Current Status of registration as observed by jurisdiction. A – Active, I – Inactive, C – Cancelled, D – Deceased.
  • VH12G: Vote History 2012 General – 2012 General Election
  • VH12P: Vote History 2012 Primary – 2012 Primary Election
  • VH12PP: Vote History 2012 Presidential – 2012 Presidential Primary Election
  • VH11G: Vote History 2011 General – 2011 General Election
  • VH11P: Vote History 2011 Primary – 2011 Primary Election
  • VH10G: Vote History 2010 General – 2010 General Election
  • VH10P: Vote History 2010 Primary – 2010 Primary Election
  • VH09G: Vote History 2009 General – 2009 General Election
  • VH09P: Vote History 2009 Primary – 2009 Primary Election
  • VH08G: Vote History 2008 General – 2008 General Election
  • VH08P: Vote History 2008 Primary – 2008 Primary Election
  • VH08PP: Vote History 2008 Presidential – 2008 Presidential Primary Election
  • VH07G: Vote History 2007 General – 2007 General Election
  • VH07P: Vote History 2007 Primary – 2007 Primary Election
  • VH06G: Vote History 2006 General – 2006 General Election
  • VH06P: Vote History 2006 Primary – 2006 Primary Election
  • VH05G: Vote History 2005 General – 2005 General Election
  • VH05P: Vote History 2005 Primary – 2005 Primary Election
  • VH04G: Vote History 2004 General – 2004 General Election
  • VH04P: Vote History 2004 Primary – 2004 Primary Election
  • VH04PP: Vote History 2004 Presidential – 2004 Presidential Primary Election
  • VH03G: Vote History 2003 General – 2003 General Election
  • VH03P: Vote History 2003 Primary – 2003 Primary Election
  • VH02G: Vote History 2002 General – 2002 General Election
  • VH02P: Vote History 2002 Primary – 2002 Primary Election
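
The vote-history field names follow a visible pattern: “VH,” a two-digit year, and a suffix for the type of election. A short Python sketch can take such a name apart. The function name is hypothetical, and the century is an assumption that holds for the 2002 to 2012 fields listed here.

```python
import re

# "PP" is listed before "P" so the longer suffix is tried first.
VH_PATTERN = re.compile(r"^VH(\d{2})(PP|P|G)$")

ELECTION_TYPES = {
    "G": "General",
    "P": "Primary",
    "PP": "Presidential Primary",
}


def parse_vote_history_field(name):
    """Return (year, election type) for a vote-history field, or None if it isn't one."""
    match = VH_PATTERN.match(name)
    if match is None:
        return None
    year, kind = match.groups()
    return 2000 + int(year), ELECTION_TYPES[kind]


presidential_2012 = parse_vote_history_field("VH12PP")
general_2002 = parse_vote_history_field("VH02G")
not_vote_history = parse_vote_history_field("RNCID")
```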

What messages you’ll respond to

These fields are a bit ambiguous, but are clearly based on a micro-targeting campaign conducted in 2010, which appears to have examined voter sentiment on several factors.

  • MT10_Party: MT10 Party – 2010 Regional Microtargeting – Party Model.
  • MT10_GenericBallot: MT10 Generic Ballot – 2010 Regional Microtargeting – Generic Ballot Model
  • MT10_Turnout: MT10 Turnout – 2010 Regional Microtargeting – Turnout Model
  • MT10_ObamaDisapproval: MT10 Obama Disapproval – 2010 Regional Microtargeting – Obama Disapproval Model
  • MT10_Jobs: MT10 Jobs – 2010 Regional Microtargeting – Jobs Model
  • MT10_Healthcare: MT10 Healthcare – 2010 Regional Microtargeting – Healthcare Model
  • MT10_SoCo: MT10 SoCo – 2010 Regional Microtargeting – Social Conservative Model

What kind of voter you are, where you live, and how to contact you

Each state keeps track of its citizens’ voting records, party registrations, and contact details, and all of that data is generally considered public information. Some states sell the information to campaigns and other organizations; others give it away for free. The fields listed below include that kind of data, which every voter should assume their state keeps track of. One notable discovery here is that when voters move, there’s a field that describes whether it’s an “individual” or “family” move, presumably to account for cases where children move out of their parents’ home. Another is that telephone numbers appear to be obtained or otherwise verified with reverse-lookups using voters’ addresses.

  • RNCID: RNCID Primary Key for registration
  • RNC_RegID: RNC GUID for registration
  • SOURCEID: Likely refers to where some or all of the voter’s data came from
  • OfficialParty: Clearly indicates the party the voter is registered with
  • SelfReportedDemographic: Voter-Provided Demographic code (H=Hispanic, B=Black)
  • FTC_DoNotCall: UpGuard confirmed in its blog post that this field indicates whether the voter is on the federal do-not-call list.
  • State: Character Abbreviation State Code. Sample data: “DC”
  • PermAbs: Likely indicates whether the voter is signed up as a permanent absentee voter
  • AffidavitID: AffidavitID – Affidavit Number. Note: Affidavits are paper ballots voters use when their names do not appear on rolls at their polling stations.
  • RegistrationDate: Clearly indicates the date the voter registered to vote. Sample data: “20030521”
  • Juriscode: Registration Juriscode Code- A nationally unique numeric representation of each election jurisdiction responsible for voter registration data
  • Jurisname: Jurisname – County or Municipality Name. Sample data: “District of Columbia”
  • CountyFIPS: County code as defined by the jurisdiction. Coding scheme is based on Federal Information Processing Standard (FIPS) municipality assignments.
  • MCD: Minor Civil Division – Indicates municipality in which voter is registered. Coding scheme is based on Federal Information Processing…
  • CNTY: County – State Assigned County Code
  • Town: Field is self-explanatory
  • Ward: Ward – Jurisdiction Assigned Ward Code. Sample data: “07”
  • Precinct: Precinct. Sample data: “097”
  • PrecinctName: Precinct – long form name
  • Ballotbox: Ballot Box – Jurisdiction Assigned Precinct Sub-Division Code / Ballot Box
  • CD_Current: US Congressional District Pre 2011 Redistricting
  • CD_NextElection: CD Next Election – US Congressional District Post 2011 Redistricting
  • SD_Current: State Upper House District Name Pre 2011 Redistricting
  • SDProper_Current: SD Proper Name Current – State Upper House District Name Pre 2011 Redistricting
  • SD_NextElection: State Upper House District Name Post 2011 Redistricting
  • SDProper_NextElection: SD Proper Name Next Election – State Upper House District Name Post 2011 Redistricting
  • LD_Current: LD Current – State Lower House District Pre 2011 Redistricting
  • LDS_Current: LDS Current – State Lower House District Subdivision Pre 2011 Redistricting
  • LDProper_Current: LD Proper Name Current – State Lower House District Pre 2011 Redistricting
  • LD_NextElection: LD Next Election – State Lower House District Post 2011 Redistricting.
  • LDS_NextElection: LDS Next Election – State Lower House District Subdivision Post 2011 Redistricting
  • LDProper_NextElection: LD Proper Name Current – State Lower House District Post 2011 Redistricting
  • NamePrefix: Voter’s Name Prefix
  • FirstName: Voter’s First Name. If first name value passed does not match name provided during registration, no match will be made.
  • MiddleName: Voter’s Middle Name
  • LastName: Voter’s Last Name
  • NameSuffix: Voter’s Name Suffix
  • Sex: Voter’s Gender (M/F/U)
  • BirthYear: Voter’s Birth Year
  • BirthMonth: Voter’s Birth Month
  • BirthDay: Voter’s Birth Day
  • StateVoterID: State Assigned Voter ID Number
  • JurisdictionVoterID: Jurisdiction Voter ID – Locality Assigned Voter Identification Number (if provided on source data)
  • LegacyID: Presumably an obsolete registration ID number
  • HTSEQ: Description not accessible. Sample data: “1” and “2”
  • HHSEQ: Household Sequence – Household Sequence Number. Sample data: “194398”
  • ChangeOfAddress: Change of Address – N=NCOA move, A= 48 Month NCOA move, D= Multisourced Non-USPS move, L=LACS address conversion
  • COADate: Change of Address Date – Change of Address Date (data only present if there is an address change)
  • COAType: Change of Address Type – Change of Address Type: F = Family Move, I = Individual Move (data only present if there is an address) […description cuts off]
  • RegistrationAddr1 (Registration_Addr1): Field is self-explanatory
  • RegistrationAddr2 (Registration_Addr2): Field is self-explanatory
  • RegHouseNum (Reg_HouseNum): Field is self-explanatory
  • RegHouseSfx (Reg_House_Sfx): Field is self-explanatory
  • RegStPrefix (Reg_St_Prefix): Field is self-explanatory
  • RegStName (reg_st_name): Field is self-explanatory
  • RegStType (Reg_St_Type): Field is self-explanatory
  • RegstPost (Reg_st_Post): Field is self-explanatory
  • RegUnitType (Reg_Unit_Type): Field is self-explanatory
  • RegUnitNumber (Reg_UnitNumber): Field is self-explanatory
  • RegCity (Reg_City): Field is self-explanatory
  • RegSta (Reg_Sta): Field is self-explanatory
  • RegZip5 (Reg_Zip5): Field is self-explanatory
  • RegZip4 (Reg_Zip4): Field is self-explanatory
  • RegLatitude (Reg_Latitude): Field is self-explanatory
  • RegLongitude (Reg_Longitude): Field is self-explanatory
  • RegGeocodeLevel (Reg_GeocodeLevel): Field is self-explanatory
  • MailingAddr1 (Mailing_Addr1): Field is self-explanatory
  • MailingAddr2 (Mailing_Addr2): Field is self-explanatory
  • MailHouseNum (Mail_HouseNum): Field is self-explanatory
  • MailHouseSfx (Mail_HouseSfx): Field is self-explanatory
  • MailStPrefix (Mail_StPrefix): Field is self-explanatory
  • MailStName (Mail_StName): Field is self-explanatory
  • MailStType (Mail_StType): Field is self-explanatory
  • MailStPost (Mail_StPost): Field is self-explanatory
  • MailUnitType (Mail_UnitType): Field is self-explanatory
  • MailUnitNumber (Mail_UnitNumber): Field is self-explanatory
  • MailCity (Mail_City): Field is self-explanatory
  • MailSta (Mail_Sta): Field is self-explanatory
  • MailZip5 (Mail_Zip5): Field is self-explanatory
  • MailZip4 (Mail_Zip4): Field is self-explanatory
  • MailSortCodeRoute (Mail_SortCodeRoute): Field is self-explanatory
  • MailDeliveryPt (Mail_DeliveryPt): Field is self-explanatory
  • MailDeliveryPtChkDigit (Mail_DeliveryPtChkDigit): Field is self-explanatory
  • MailLineOfTravel (Mail_LineOfTravel): Mail Line of Travel – Mail Address Enhanced Line of Travel
  • MailLineOfTravelOrder (Mail_LineOfTravelOrder): Clearly related to the above field.
  • MailDPVStatus: Mail Delivery Point Verification Status – USPS Delivery Point Verification Flag
  • RADR_LastCleanse: Likely refers to when the voter’s registration address was last updated. Sample data: “2013-02-04”
  • RADR_LastGeoCode: Likely refers to when the voter’s geographic information for their registration was last updated. Sample data: “2013-02-04”
  • RADR_LastCOA: Likely refers to when the voter’s change-of-address information for their registration was last updated
  • MADR_LastCleanse: Likely refers to when the voter’s mailing address was last updated. Sample data: “2013-02-04”
  • MADR_LastCOA: Likely refers to when the voter’s change-of-mailing-address information was last updated
  • AreaCode: Telephone Area Code
  • TelSourceCode: Telephone Source Code – Telephone Source Code: N=New append, V=Verified number, S=Source file number, R=Reverse Verify – Name & Address…
  • TelephoneNUm: Telephone Number – 7-Digit Telephone Number
  • TelMatchLevel (tel_matchlevel): Related to telephone number
  • TelReliability (tel_reliability): Telephone Reliability – Telephone Reliability Code: 9=TML of ‘1’ or sum of lower TML recode and number of same number matches in household [Highest… [description cuts off]
  • PhoneAppendDate (phone_appenddate): Related to telephone number
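
The sample data above shows dates written two ways: “20030521” for RegistrationDate and “2013-02-04” for the address fields. As a rough sketch of how both could be read, the hypothetical helper below tries each format in turn. We don’t know whether other date fields use these formats, so anything it cannot read is returned as None instead of being guessed at. The move-type table copies the two COAType codes listed above.

```python
from datetime import date, datetime

# The two date layouts seen in the sample data; others may exist.
DATE_FORMATS = ("%Y%m%d", "%Y-%m-%d")

COA_TYPES = {"F": "Family Move", "I": "Individual Move"}


def parse_date(value):
    """Parse a date in either sample format; return None if it is empty or unreadable."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


registered = parse_date("20030521")
last_cleanse = parse_date("2013-02-04")
move_type = COA_TYPES.get("F")
```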