Voting System Proposal

The recent Presidential election has brought up a lot of issues surrounding voting and the tallying of votes. After thinking about it a bit, I figured I'd write up a proposal for voting and vote-tabulating systems that would be verifiable, open, and transparent. This document mostly looks at the technical issues of validating votes and counting them. Human issues, such as who should be able to vote, are not really within its scope, with the exception of the Bonus section near the end. This document is fairly nerdy, as I'm a nerdy kinda guy. I believe, however, that these issues need to be more fully considered than they have been in the past.

  1. All voters shall be positively identified via standard state or national ID (e.g., driver's license, state ID card, military ID, or passport). Each positively identified individual shall be logged locally and also reported immediately to a centralized location to combat fraud. This list of individuals will be consulted to validate any mail-in ballots. Mail-in voting shall be restricted to military or expatriate individuals. If you want to vote, show up.

  2. All actual ballots shall be paper. They shall be human readable, and also able to be processed easily by machines for tabulation.

  3. Ballots shall be printed on demand via laser printers at each polling location. The only thing that should be necessary at the polling places is toner and paper.

  4. All ballot printing devices shall be identical to the maximum degree possible. The computer used to produce ballots shall have no local hard drive, and shall be booted from read-only media such as CD-ROM or DVD. Information about ballots that will be used to print them shall be contained in XML or CSV format that is human-readable and verifiable. All config files shall have a cryptographic hash that can be verified and validated by any concerned party. (See PRINTING below.)

  5. All tabulating systems shall be identical to the maximum degree possible. All software on the tabulating systems should be open source, so that it can be validated by any organization that cares to do so.

  6. Tabulating systems shall be booted from read-only media such as CD-ROM or DVD.

  7. Tabulating systems should have no local storage, except for removable media, such as SD cards, which will be individually numbered. Micro-SD cards are too small to be individually numbered, so they should not be used.

  8. Each SD card would initially be identical, and verifiably so. All configuration files shall have a cryptographic hash that can be validated both before and after the election. Each configuration file would be human readable, as either XML or CSV data.

  9. Upon the conclusion of all voting, a copy shall be made of each SD card used by any printing and/or tabulating device. (see IMAGING below). Once this copy is made, all cards shall be sealed in a tamper-evident enclosure. A copy of the results of each cryptographic hash and actual disk images shall be provided to any interested party. A copy of each of these should also be provided to each candidate listed on the ballot if requested.
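
The before-and-after check on configuration files described in items 4 and 8 might look something like the following. This is only a minimal sketch: the file name precinct-001.csv and its CSV columns are made up for illustration, and the only real tool assumed is 'sha256sum'.

```shell
#!/bin/sh
# Sketch only: the file name and CSV columns below are hypothetical.
workdir=$(mktemp -d)
cfg="$workdir/precinct-001.csv"

# A made-up, human-readable ballot configuration.
printf 'race,district\nCONG,020\nSt. Sen,011\n' > "$cfg"

# Before the election: record the hash (this is the value that gets published).
hash_before=$(sha256sum "$cfg" | cut -d' ' -f1)

# After the election: hash the same file again and compare.
hash_after=$(sha256sum "$cfg" | cut -d' ' -f1)
if [ "$hash_before" = "$hash_after" ]; then
    config_status="unchanged"
else
    config_status="ALTERED"
fi
echo "config is $config_status"

# Simulate tampering by appending a line, then check once more.
printf 'City,999\n' >> "$cfg"
hash_tampered=$(sha256sum "$cfg" | cut -d' ' -f1)
if [ "$hash_before" = "$hash_tampered" ]; then
    tamper_status="unchanged"
else
    tamper_status="ALTERED"
fi
echo "after tampering, config is $tamper_status"
```

The point of the comparison is that nobody has to read the config file twice; two short strings either match or they don't.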


 

Printing

As mentioned above, there will be no pre-printed ballots. This prevents issues arising from not having enough ballots at a particular location. A given polling location might want to use touchscreens. That is OK, but the ballots produced by these touchscreen devices must be human-readable. Ideally, the only difference between a touchscreen ballot and a standard paper ballot would be that all of the squares or boxes used to indicate a voter's preferences would be filled in by the printer as the ballot is produced.

The individual choices available on the ballot will be determined when the voter presents his ID. If a touchscreen is used, the voter will be handed a slip of paper that will contain whatever information is necessary to display/print the correct ballot. In those locations using strictly a paper system the printed ballot given to the voter would be generated in a similar manner.

I live in Texas, so I am going to use the information found on my voter registration card as an example. My address indicates exactly which races/districts and whatnot are appropriate for me. Here's the information I get on my card (none of the numbers are actually the numbers on my personal card):

Voter number:   1234567890
Gender:         M
Valid from:     01/01/2020
Valid Through:  12/31/2021
Year of Birth:  1968

Prec. No:       4112-01
CONG:           020
St. Sen:        011
St. Rep:        050
Comm:           003
JP/Con:         016
City:           032
City Ward:      000
St. Edu:        12
QR Code:        (image)

Using the data from the above table, the exact proper ballot can be printed. This information should be printed on the ballot in both human-readable form and an easily verifiable machine-readable format (such as a QR Code, which can be read by almost any cellphone). Only the information above starting with "Prec. No" would be used. The voter number would not be printed on the ballot (else you'd be able to correlate a particular voter to a particular ballot). Perhaps a random unique string could be printed and logged instead to facilitate forensics, as long as it could not be associated with an individual voter. If the string printed on a ballot is not in the logs, it would be an invalid vote.

One way to keep the string from being tied to a voter would be to have a stack of 10 or so otherwise identical ballots. The voter would pick randomly from the stack, and another blank ballot for that precinct would be added to the stack. Once all voting is completed, each unused ballot in the stack would be marked as spoiled (which would be a checkbox on the ballot) and either placed aside or fed into the counting machine as a spoiled, null ballot. Thus a log entry for every valid and invalid vote would be maintained.
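
As a rough illustration of the random-string idea, the sketch below generates a serial for each printed ballot, logs it, and later checks whether a ballot's serial is in the log. The log file name and the choice of a 128-bit serial written as 32 hex digits are my own assumptions, not part of any existing system.

```shell
#!/bin/sh
# Sketch only: the log file name and the 128-bit serial format are assumptions.
workdir=$(mktemp -d)
ballot_log="$workdir/ballot-serials.log"

# Generate a random 128-bit serial as 32 hexadecimal digits.
new_serial() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Tabulation helper: a ballot counts only if its serial appears in the log.
is_logged() {
    grep -Fxq "$1" "$ballot_log"
}

# Printing a ballot: log the serial only, never the voter number.
serial=$(new_serial)
echo "$serial" >> "$ballot_log"

if is_logged "$serial"; then real_ballot="valid"; else real_ballot="invalid"; fi
if is_logged "not-a-real-serial"; then fake_ballot="valid"; else fake_ballot="invalid"; fi
echo "printed ballot: $real_ballot, unknown ballot: $fake_ballot"
```

Note that the log holds nothing but serials, so it can be published without revealing who received which ballot.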

I'm tempted to say that there should also be a QR Code image of the information on the ballot, but that would tempt someone who might be interested in bribing folks to vote a certain way, because the person paying for the vote could actually validate what was voted. This is something that has to be considered in any voting system. Of course, these days it's also possible for the voter to take a picture of his vote given that just about every phone on the planet has a camera built in, so maybe that might be less of a concern than it might have been in the past. I would lean against it in any case.


 

Imaging

As mentioned above, the only things in the Printing / Tabulating systems that can be written to are removable media such as SD cards. Those interested in the integrity of the vote will want a verifiable way to obtain copies of all data relevant to the vote. The following is a method that might be useful to generate such documentation.

All hardware would be designed in such a way that a given device can be used at any polling location. The information printed or displayed to the voter would be based on config files contained on the removable media, which for our purposes at the moment, we'll assume are SD cards. I'm also going to focus below primarily on cards utilized for tabulation purposes. Those used for printing ballots would all be identical. Using the same methods below, this could be trivially validated.
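
To show what "trivially validated" could mean for the printing cards, here is a small sketch that hashes a set of images and counts how many distinct hashes turn up. The three stand-in files and their names are hypothetical; real card images would be produced by the imaging procedure in this section.

```shell
#!/bin/sh
# Sketch only: three small stand-in files play the role of printer card images.
workdir=$(mktemp -d)
for n in 0001 0002 0003; do
    printf 'identical printer config\n' > "$workdir/printcard$n.img"
done

# Count the distinct hashes across all images; 1 means every card is identical.
distinct=$(sha256sum "$workdir"/printcard*.img | cut -d' ' -f1 | sort -u | wc -l | tr -d ' ')
echo "distinct hashes: $distinct"

# Alter one card and count again; the odd one out now shows up.
printf 'extra\n' >> "$workdir/printcard0002.img"
distinct_after=$(sha256sum "$workdir"/printcard*.img | cut -d' ' -f1 | sort -u | wc -l | tr -d ' ')
echo "distinct hashes after change: $distinct_after"
```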

With observers present, the procedure below will create a validated image of the card that can be saved and provided to anyone who wants to look at it from a forensics or data perspective.

All of the following can be performed from just about any Unix/Linux computer, and is completely read-only on the card itself. At no point is the card even mounted for writing. In the following, lines that start with "###" are my comments explaining what is being done. Lines that start with "$" are the actual commands being issued.
### First, create an empty directory
$ mkdir votecards

### change to that directory
$ cd votecards/

### verify the directory is empty
$ ls -l
total 0

### Without mounting the card, create an image of it on the local hard disk
$ sudo dd if=/dev/sdd1 of=card0001.img
246175+0 records in
246175+0 records out
126041600 bytes (126 MB, 120 MiB) copied, 16.2103 s, 7.8 MB/s

### verify that the image file exists.
$ ls -l
total 123092
-rw-r--r-- 1 root root 126041600 Dec  5 19:49 card0001.img

### Get a cryptographic hash of the image. Have all observers write this hash down.
$ sha256sum card0001.img 
6f4624afb94125a4ca0ac0c3a1cde7b4e9566f5de89f26eb1125d2977b44cf08  card0001.img


### Do the same thing, except this time dump the results into a file.
$ sha256sum card0001.img > card0001.img.sha256sum.txt


### Validate the contents of the hash file. Observers can compare against written value.
### If the number above and the number below do not match, something is wrong. 
$ cat card0001.img.sha256sum.txt 
6f4624afb94125a4ca0ac0c3a1cde7b4e9566f5de89f26eb1125d2977b44cf08  card0001.img

### Mount the image file read-only, so that mounting cannot modify the image
$ sudo mount -o loop,ro card0001.img /mnt

### Check contents of the mounted filesystem
$ ls -lR /mnt
/mnt:
total 6
drwxr-xr-x 2 root root 2048 Dec  5 17:23 config
drwxr-xr-x 2 root root 2048 Dec  5 17:14 logs
drwxr-xr-x 2 root root 2048 Dec  5 17:13 votedata

/mnt/config:
total 14
-rwxr-xr-x 1 root root  73 Dec  5 17:21 precinct-001.cfg
-rwxr-xr-x 1 root root  73 Dec  5 17:21 precinct-002.cfg
-rwxr-xr-x 1 root root  73 Dec  5 17:21 precinct-003.cfg
-rwxr-xr-x 1 root root  73 Dec  5 17:21 precinct-004.cfg
-rwxr-xr-x 1 root root  73 Dec  5 17:21 precinct-005.cfg
-rwxr-xr-x 1 root root  73 Dec  5 17:22 precinct.cfg
-rwxr-xr-x 1 root root 494 Dec  5 17:23 precinct.sha256.txt

/mnt/logs:
total 2
-rwxr-xr-x 1 root root 84 Dec  5 17:14 logfile.01.txt

/mnt/votedata:
total 2
-rwxr-xr-x 1 root root 82 Dec  5 17:13 votes.txt


### Get a cryptographic hash of each individual file. Write these down or take screen shot.
### Note, piping the output through sort will make sure all files are displayed in the same 
###   order each time.  
$ find /mnt -type f -exec sha256sum {} \; | sort -k2
4511277a6fd1f513ef6448e7b89e554aa155351960501c69f050b77434aac0c5  /mnt/config/precinct-001.cfg
0e940e44a02c22217af9f40eab2f55c1bb763a85baf84f7c78068ab9a95d8e87  /mnt/config/precinct-002.cfg
f87612e4c850324a3dd7999d1b48078a154d35319989c304d8681c7b64a0d953  /mnt/config/precinct-003.cfg
eb7b5c0bba630a60abba2919543fb4374b0d392f6aa9fd2de0fa6deb93035321  /mnt/config/precinct-004.cfg
5b6eb9e719edb9b53675cec35a19fcc0d68c012e068a47ded4f141cab25b790e  /mnt/config/precinct-005.cfg
0e940e44a02c22217af9f40eab2f55c1bb763a85baf84f7c78068ab9a95d8e87  /mnt/config/precinct.cfg
b940d2ae1447984dd41285a63b056270ff2f1b5df32525944c7ad95cbfb384a9  /mnt/config/precinct.sha256.txt
a9b71823d534f6f7dcb04af1f4975057d4045b27c1e795e828b513790afae881  /mnt/logs/logfile.01.txt
acb1018d99ec642ffcc006b2885f9bc5ff0ef70ce4b3f070d3b9ac3c8d1ef9f5  /mnt/votedata/votes.txt


### Get cryptographic hash of each individual file, and store it in a file.
$ find /mnt -type f -exec sha256sum {} \; | sort -k2 > card0001.files.sha256sum.txt

### Check contents of file hashes. Make sure the contents of the file matches 
###  the written hashes or screenshot.
$ cat card0001.files.sha256sum.txt 
4511277a6fd1f513ef6448e7b89e554aa155351960501c69f050b77434aac0c5  /mnt/config/precinct-001.cfg
0e940e44a02c22217af9f40eab2f55c1bb763a85baf84f7c78068ab9a95d8e87  /mnt/config/precinct-002.cfg
f87612e4c850324a3dd7999d1b48078a154d35319989c304d8681c7b64a0d953  /mnt/config/precinct-003.cfg
eb7b5c0bba630a60abba2919543fb4374b0d392f6aa9fd2de0fa6deb93035321  /mnt/config/precinct-004.cfg
5b6eb9e719edb9b53675cec35a19fcc0d68c012e068a47ded4f141cab25b790e  /mnt/config/precinct-005.cfg
0e940e44a02c22217af9f40eab2f55c1bb763a85baf84f7c78068ab9a95d8e87  /mnt/config/precinct.cfg
b940d2ae1447984dd41285a63b056270ff2f1b5df32525944c7ad95cbfb384a9  /mnt/config/precinct.sha256.txt
a9b71823d534f6f7dcb04af1f4975057d4045b27c1e795e828b513790afae881  /mnt/logs/logfile.01.txt
acb1018d99ec642ffcc006b2885f9bc5ff0ef70ce4b3f070d3b9ac3c8d1ef9f5  /mnt/votedata/votes.txt

### Pro Tip: 
### Rather than staring at that huge mass of random characters, pipe the entire 
###   output through sha256sum so that only one line of output prints. If the two lines
###   below are the same, then the data is the same in both raw output and the file. 
$ find /mnt -type f -exec sha256sum {} \; | sort -k2 | sha256sum
85aee5b269910bcf47bf9096a136e8cc80722142826e40cd99cfea5c1d4e41fa  -

$ sha256sum card0001.files.sha256sum.txt 
85aee5b269910bcf47bf9096a136e8cc80722142826e40cd99cfea5c1d4e41fa  card0001.files.sha256sum.txt

### Unmount the image file

$ sudo umount /mnt

### take a look at the files that now exist in the directory.
$ ls -l
total 123100
-rw-rw-r-- 1 amp  amp        845 Dec  5 19:59 card0001.files.sha256sum.txt
-rw-r--r-- 1 root root 126041600 Dec  5 19:57 card0001.img
-rw-rw-r-- 1 amp  amp         79 Dec  5 19:51 card0001.img.sha256sum.txt

See the section below about cryptographic hashes for more detail on why the above hashes are so incredibly useful.

Once all of the above is complete on each card, the originals should be sealed until all of the legal issues have been dealt with. The state can buy a new stack of fresh cards/drives or whatever to use in the upcoming election.

Any competent Unix/Linux nerd can validate the above procedure. The disk images can be provided to any person or organization that would like to take a look at them. One of the cool things about using the 'dd' command to image the cards is that it provides a byte-for-byte copy of whatever device it reads. It does not just copy the files and directories in the filesystem; there are tools you can use on the image to see deleted files and other information. Note that the example above reads /dev/sdd1, which is the first partition on the card; reading the whole device (/dev/sdd in this example) would also capture the partition table and any space outside that partition.

Anyone can validate after that point that the hashes match. The hash data should be publicly published so that anyone can look at it. In fact, I would strongly argue that the individual images should also be made publicly available. The computer used to generate all of this data can be a completely stand-alone box that has no network connection, and for the truly paranoid, could be installed from validated media immediately before this imaging process is initiated.
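
For someone who receives a copy of an image along with its published hash file, the check could be as simple as the sketch below. It uses a small stand-in file in place of a real card image, and reuses the card0001.img and card0001.img.sha256sum.txt names from the procedure above; the '-c' option tells 'sha256sum' to recompute each hash listed in the file and compare.

```shell
#!/bin/sh
# Sketch only: a small stand-in file plays the role of the real card image.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for the image and hash file produced by the imaging procedure.
printf 'stand-in for card data\n' > card0001.img
sha256sum card0001.img > card0001.img.sha256sum.txt

# What a recipient would run: recompute the hash and compare to the published one.
if sha256sum -c card0001.img.sha256sum.txt; then
    image_check="ok"
else
    image_check="failed"
fi

# If even one byte of the image changes, the same check fails.
printf 'x' >> card0001.img
if sha256sum -c card0001.img.sha256sum.txt 2>/dev/null; then
    altered_check="ok"
else
    altered_check="failed"
fi
echo "original: $image_check, altered: $altered_check"
```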


 

Cryptographic Hashes

A Cryptographic Hash is a strong one-way function that can be used to validate that specified data has not been altered. Wikipedia has a pretty good article about it, that explains it in much better detail than I can. However, the following is an attempt at explaining it in general terms that hopefully is understandable by most folk.

A cryptographic hash value is usually written as a human-readable string of hexadecimal digits. The number of digits depends on the type of hash being used. In the examples below, I'm using a program called 'sha256sum' that will take any data input and reduce it to a 64-character string. For all practical purposes, this string is unique to its input. It is theoretically possible for two different files to create the same hash, but the likelihood of this happening by chance is astronomically small. Picture yourself standing on one of Jupiter's moons, and hitting a golfball that flies across the almost unimaginable distance to Earth, and lands directly in the cup on the first hole of your favorite golf course. A chance collision is even less likely than that. One of the cool things about a hash of this type is that its length is completely independent of the amount of data that is fed into it. No matter how big the file is, you always get exactly 64 characters as output. It can be easily written down, or otherwise saved, and then used as a comparison at a later date.
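
The fixed-length property is easy to see for yourself. The sketch below hashes an empty input, a three-byte input, and a million zero bytes, then prints the length of each result; the variable names are just for illustration.

```shell
#!/bin/sh
# Hash an empty input, a 3-byte input, and a million zero bytes.
empty_hash=$(printf '' | sha256sum | cut -d' ' -f1)
short_hash=$(printf 'abc' | sha256sum | cut -d' ' -f1)
large_hash=$(head -c 1000000 /dev/zero | sha256sum | cut -d' ' -f1)

# Each digest is 64 hexadecimal characters regardless of input size.
echo "${#empty_hash} ${#short_hash} ${#large_hash}"

# The hash of 'abc' is a well-known SHA-256 test value.
echo "$short_hash"
```

The first line printed should be "64 64 64", and the second should be the standard SHA-256 value for 'abc', which begins with ba7816bf.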

Here's a quick example of using a hash to see if a file has been altered...

The following is something that you can do using just about any standard Linux or Unix computer. Equivalent tools ship with MS-Windows as well (for example, 'certutil -hashfile' and PowerShell's 'Get-FileHash'). In the following, the lines that start with '##' are my comments about what is being done. The lines that start with '$' are the actual commands being executed.

## The following is the original file. It is the Project Gutenberg version of 
## the King James version of the bible.
$ ls -l
total 4844
-rw-r--r-- 1 amp amp 4959549 Nov 28 20:30 The_Bible-KJV.txt

## This is the hash generated via the 'sha256sum' program.
$ sha256sum The_Bible-KJV.txt 
6d1c5625cad6b6f619bd8b5cb5e77ea20dcf052082743f27bc8c8be2fb7e8a55  The_Bible-KJV.txt

## Now I make a copy of that file.
$ cp The_Bible-KJV.txt The_Bible-KJVa.txt

## I check the hash of both files, and they show as being identical
$ sha256sum The_Bible-KJV.txt The_Bible-KJVa.txt
6d1c5625cad6b6f619bd8b5cb5e77ea20dcf052082743f27bc8c8be2fb7e8a55  The_Bible-KJV.txt
6d1c5625cad6b6f619bd8b5cb5e77ea20dcf052082743f27bc8c8be2fb7e8a55  The_Bible-KJVa.txt

## I edit the copy...
$ vi The_Bible-KJVa.txt

## The following is a listing of the first 3 lines of each file. 
## Note only difference is the first line starts with "T" in the first
## and "t" in the second.
$ head -3 The_Bible-KJV.txt
*This King James' Bible is the SECOND Project Gutenberg Version*
This 10th edition should be labeled biblea10.txt or biblea10.zip
****This edition is being officially released on Easter 1992****

$ head -3 The_Bible-KJVa.txt
*this King James' Bible is the SECOND Project Gutenberg Version*
This 10th edition should be labeled biblea10.txt or biblea10.zip
****This edition is being officially released on Easter 1992****

## Now, let's check the hash again...
$ sha256sum The_Bible-KJV.txt The_Bible-KJVa.txt
6d1c5625cad6b6f619bd8b5cb5e77ea20dcf052082743f27bc8c8be2fb7e8a55  The_Bible-KJV.txt
2cedfa1ddd401af877a03c9f9e84f675c89f86a3474372b2e45b0e777dd88c21  The_Bible-KJVa.txt

## Note that even the tiniest of changes to the file generates a completely different hash.
## You'll also note below that the two files are still exactly the same size, yet
## produce much different output even if that difference is only a single character.
$ ls -l 
-rw-r--r-- 1 amp amp 4959545 Nov 28 20:44 The_Bible-KJVa.txt
-rw-r--r-- 1 amp amp 4959545 Nov 28 20:42 The_Bible-KJV.txt

None of the above is rocket science to anyone who knows anything about security. Not only can you generate a hash for each individual file on the card, but after doing so and saving the resulting list of hashes, you can hash that resulting file as well, so that if any individual file is changed, the overall hash will no longer match either. You can print, save, email and otherwise disseminate these hashes so everyone involved will have confidence in the data.

I'd also note that if I were setting up something to assist with validating election results, I would have strong cryptographic hashes not only of all data, but of the files on the computer as well, such that any change made would be readily apparent. I'd also implement digital signatures using strong cryptographic functions like those available with the PGP or GPG encryption programs, but that is a much longer discussion for another day.


Initial and Final Validation

If you have read down to this part, all of the basics of validation using cryptographic hashes have been discussed, in probably more detail than most people would care to know about. However, it's all necessary to a certain degree because it is an attempt to make vote validation systematic and easily verifiable by anyone who can compare two strings and conclude that, yes, they are the same. They do not necessarily have to understand why the two strings need to match, only that they must before one proceeds.

Briefly, I'd like to talk about initial setup of the systems. As mentioned earlier, as much of the computer system as possible should be read-only. The main reason this is important to me is that if all of the systems start from a known good state that can be trivially validated, it leaves less room for malicious individuals to subvert the systems. As envisioned, almost none of the computer systems in place need to be internet capable. In fact, it is advisable that the fewest possible systems have the ability to communicate with the external world, because any such communication is, by its nature, two-way, and presents issues that must be dealt with from a security standpoint.

Here's how I would envision initial preparation for the vote.


 

Bonus!
Nully's modest proposal to end voter and election fraud:

Of course, all this would only apply to Federal elections, for federal offices, as that is the legitimate concern of the federal government.

Let the states who have local authority use whatever system they wish to force the elections of their favorite sons and daughters to alderman, mayor or goobernor. They can do it the cheap way, by just following the federal rules for all voting, or they can have separate ballots for local and federal. Their call. It's a free country, ain't it?