Software

Contact Tracing using Location based Encryption

Background

A number of solutions to contact tracing using Bluetooth have been developed by the likes of Google and Apple and many others. This requires user devices to monitor the Bluetooth spectrum for potential contacts and record those contacts for a particular amount of time. The issue here is that other devices would be monitoring your Bluetooth and that data could be combined to determine your identity, places you visit, and who you associate with.

Concept

The basic concept is that your personal device records your own GPS location data for a certain period of time and is not sent out to any centralized service. If you end up contracting a contagious disease like COVID-19 and have been diagnosed you are given a unique set of data IDs and the ability to publish data on a public database or website. You would then use your location data to encrypt these unique IDs. To do this the GPS position/time data is used to get a key that corresponds to a set of grid cells and time units. The location keys are used to encrypt your unique set of data IDs creating a set of encrypted data that can be published on the public website.

Most users would then take the data off the public website and try to decrypt the data with their own location data. This would create a large data set of decryption attempts or potential data IDs. The user would then query a private database with this data set. The private database contains all of the unique data IDs that it has given out to the infected users. The private database would then return the number of hits against the known infected data ID database. This number of hits allows an app to calculate the amount of time the user could have been in contact with infected persons.

This method means you never share location data with a centralized service. So infected users only publish encrypted location data and most users only share potential data IDs with the centralized service. Users are also only collecting their own data never anyone else’s.

Potential Issues

User processing power is needed to try to decrypt the encrypted data with all of their GPS location data creating a large set of data. Server processing power is also needed to check all of the potential decryptions against the private data ID database. GPS position accuracy is severely limited in indoor locations and in altitude.

Enabling Technologies

  • GPS sensor for position and time data
  • pseudo-random number generators
  • database search methods, bloom filters, and hash functions
  • AES or other encryption methods

Entropy of The Encryption

Assumptions for entropy calculations:

  • Grid area 100km^2
  • Grid Cell area 100m^2
  • 1 min reporting Period

The number of cells that divides up the desired area.

Cells = \frac{Area}{CellArea}

Cells = \frac{100km^2}{100m^2} = 1000000

Each day is divided up by the GPS reporting interval to give the number of location points.

Locations = \frac{24h*60min}{Period}

Locations = \frac{24h*60min}{1min} = 1440

Location division into grid cells example:

The grids have unique cell IDs based on their latitude and longitude that is also determined by the time base where t0 is the first followed by t1 till the end of the set epoch.

The entropy of the encryption:

EncryptEntropy = Cells*Locations

EncryptEntropy = 1000000*1440 = 1440000000

This is aproximatly 1.5 Billion combinations with the number of equivelent bits becomes:

log2(1.5Billion) = 30bits

Proof of Concept

The proof of concept is composed of two parts. The first part is the population simulation that generates the people movement data and tracks true contacts with infected persons. The second part is the actual tracing algorithm that utilizes location-based encryption.

Simulation

In order to test the contact tracing algorithms a data set of population movement was needed. Since I do not have access to real fine-scale GPS movement data, I created it in simulation. I used Python and Jupyter Notebooks to create a simple population movement simulation. The simulation has configurable generics that set up the simulation area, number of people, the percent infected, and the simulation length.

The simulation area is divided into a grid of cells that are used to determine contacts between people. That cell area is configurable as well. The Jupyter Notebook is shown below.

https://github.com/Tschucker/ELECTRA/blob/master/popsim.ipynb

The simulation outputs two lists of data, the clear population which contains the majority of the people, and the infected population. The data is an array of people class objects. The person class contains an array of location objects as well as several other objects that will be used with the contact tracing algorithms. The location objects contain time data along with the (x,y) fine position, the grid cell number used for contact calculation, and data ID that will be used by the contact tracing algorithm.

Contact Tracing Algorithm

The output of the simulation is pulled into the contact tracing algorithm Jupyter Notebook. The infected users get assigned data IDs for all of their location data objects. These are then encrypted by each infected user with the associated location cell ID. This encrypted data is then added to a master array that all users can access.

The majority of the population then checks for contacts with the infected by trying to decrypt this data with their cell IDs. These potential decrypted data IDs are checked against the private array of all the data IDs given out. The number of hits against the database is then relayed back to the user as time in contact with an infected user.

The proof of concept python code operates on only a few people known to be in contact with infected users to limit the processing time.

https://github.com/Tschucker/ELECTRA/blob/master/contact_trace.ipynb

The output of the contact tracing algorithm notebook is a set of statistics on how long the processing takes and the number of operations that would be needed to perform the necessary checks, decryptions, and encryptions.

Code

The code for this proof of concept can be found on my GitHub account at the link below.

https://github.com/Tschucker/ELECTRA

Future Improvements

As this project progresses I hope to improve the population movement simulation based on this paper: https://royalsocietypublishing.org/doi/10.1098/rsif.2014.0642

I also want to create faster database lookup operations for the private database and investigate better unique key generation.