Electoral Process Education Corporation (EPEC), a 501(c)(3) educational charity devoted to voter participation and education, performed a Levenshtein Distance Analysis on Virginia’s Registered Voter List (RVL) as of June 30, 2023. The results are summarized in this data visualization, which shows voters who may have more than one voter ID assigned to their registration. The background on the algorithms applied follows the data map.
Backgrounder on the Dashboard Numbers
The EPEC Levenshtein Distance Analysis
Algorithmic Detection of Voter Registrations with “Multiple Vote” Risk
1. Background and Methodology
Electoral Process Education Corporation (EPEC), a 501(c)(3) educational charity devoted to voter participation and education, performed a Levenshtein Distance Analysis on Virginia’s Registered Voter List (RVL) as of June 30, 2023.
The results identified voters who may be registered multiple times in the Commonwealth. EPEC has reported these results in briefings with state and local election officials, who can check them against other voter identification data that is not available to the public or to nonprofits qualified to receive election data.
2. Levenshtein Distance Analysis Overview
Levenshtein distance measures how different two strings are: it is the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other. A distance of 0 means the strings are identical. Jon Lareau, who volunteers as EPEC’s Chief Technology Officer, applied this technique to the RVL. His results found matches in registration-name data that suggest some voters have more than one registration ID assigned to them.
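To make the distance measure concrete, here is a minimal Python sketch of the standard dynamic-programming computation. It is an illustration of the general technique, not EPEC’s actual code; the function name and the sample names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the empty string and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# Hypothetical sample names, for illustration only
print(levenshtein("SMITH", "SMITH"))     # identical strings
print(levenshtein("JOHN", "JON"))        # one deleted character
print(levenshtein("kitten", "sitting"))  # classic textbook pair
```

Identical strings give 0, "JOHN" versus "JON" gives 1, and the textbook pair "kitten" versus "sitting" gives 3.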
His work on the RVL compares each voter registration record in the official RVL, pair-wise, to every other voter registration record. Records with exact or near-exact matches in Personal Identifying Information (PII) were flagged as possible registration errors. This dashboard shows summary counts of the flagged records.
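The pair-wise comparison can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code EPEC ran: the record IDs and names are made up, the cutoff of 3 is borrowed from the dashboard categories, and combining name and date of birth into a single comparison string is an assumption, since the source does not say how the fields are combined.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein edit distance (two-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # deletion
                current[j - 1] + 1,                # insertion
                previous[j - 1] + (ch_a != ch_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def flag_similar_records(records, max_distance=3):
    """Compare every record with every other record exactly once and
    return (id_a, id_b, distance) for pairs within max_distance edits."""
    flagged = []
    for (id_a, key_a), (id_b, key_b) in combinations(records.items(), 2):
        distance = levenshtein(key_a, key_b)
        if distance <= max_distance:
            flagged.append((id_a, id_b, distance))
    return flagged


# Hypothetical records: ID -> "FULL NAME DOB" (assumed key layout)
records = {
    "ID-001": "JANE A DOE 1980-01-15",
    "ID-002": "JANE A DOE 1980-01-15",
    "ID-003": "JANE A DOW 1980-01-15",
    "ID-004": "ROBERT Q PUBLIC 1975-07-04",
}
flagged = flag_similar_records(records)
for pair in flagged:
    print(pair)
```

Comparing every record to every other record means n(n-1)/2 comparisons for n records, so this naive loop is practical only for small samples. An analysis of a full statewide list would presumably need some way to narrow the candidate pairs first.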
3. Categories Ranked by Highest Risk
In this Dashboard, we group the flagged registrations into four categories, ranked from highest risk to lowest, to explain the numbers in the Data Visualization:
–Category 1 (L-Distance = 0)
Exact Date of Birth (month, day, year) and Exact Full Name (First Name, Middle Name, Last Name, Suffix)
–Category 2 (L-Distance = 1)
Similar DOB and Full Name (Difference of 1 character)
–Category 3 (L-Distance = 2)
Similar DOB and Full Name (Difference of 2 characters)
–Category 4 (L-Distance = 3)
Similar DOB and Full Name (Difference of 3 characters)
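The mapping from distance to category can be written as a short Python sketch. The function names and the sample distances are hypothetical and for illustration only; they are not EPEC’s results.

```python
from collections import Counter
from typing import Optional


def risk_category(distance: int) -> Optional[int]:
    """Map an L-Distance to its dashboard category (1 = highest risk).
    Distances above 3 fall outside the four categories."""
    if 0 <= distance <= 3:
        return distance + 1
    return None


def summarize(distances):
    """Count how many flagged pairs fall into each of the four categories."""
    counts = Counter(risk_category(d) for d in distances)
    return {category: counts.get(category, 0) for category in (1, 2, 3, 4)}


# Made-up distances for a handful of record pairs
print(summarize([0, 1, 1, 3, 7]))
```

In this sketch a pair with a distance of 7 is not counted in any category, because the dashboard covers only distances 0 through 3.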
##
Original reporting by Jon Lareau. See the Digital PollWatchers.org Sandbox for deeper analysis.