Location Based Encryption Contact Tracing Algorithm and Processing Statistics

Import Libraries

In [1]:
import numpy as np
from Crypto.Cipher import AES
import matplotlib.pyplot as plt
import pickle
from classes import Person, Location, EDataPair
import string, random
import time

#constant
min_per_day = 1440

Set Plot Style

In [2]:
plt.rcParams['figure.figsize'] = [25/2.54, 20/2.54]
plt.style.use('dark_background')
plt.rcParams['figure.dpi'] = 300
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']

Import Data from Population Simulation

In [12]:
#Context managers close the files even if unpickling fails
with open('clear_population.pickle', 'rb') as if1:
    people = pickle.load(if1)

with open('infected_population.pickle', 'rb') as if2:
    infected = pickle.load(if2)

print("Number of people in Clear Population:", len(people))
print("Number of people in Infected Population:", len(infected))
Number of people in Clear Population: 198
Number of people in Infected Population: 2

Assign Random Value ID to Each Location in Infected Array

A unique value ID is attached to each location of an infected person to mark it as belonging to a confirmed case. One ID is generated for each of the infected persons' locations. Together these IDs form the private master database that decryption attempts are checked against.

In [13]:
#Determine the number of data ids to produce
num_data_ids = sum(len(person.locs) for person in infected)

#Create unique data ids (as number of infected grows creating enough unique ids could get difficult)
total_data_ids = []
while len(total_data_ids) < num_data_ids:
    #needs to be updated to not just ascii to blend into attempted decryptions better
    temp_id = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for x in range(16))
    #encode to match decryption output format (wont be needed when id format changes)
    temp_id = temp_id.encode('UTF-8')
    #compare bytes against bytes so duplicate ids are actually rejected
    if temp_id not in total_data_ids:
        total_data_ids.append(temp_id)

#Assign a data id to each location (assignment could be random index)
#A running index gives every location of every person its own id
next_id = 0
for person in infected:
    for loc in person.locs:
        loc.data_id = total_data_ids[next_id]
        next_id += 1
        
print("Example Unique Data ID:",total_data_ids[0])
Example Unique Data ID: b'hwqqIhlmWL0O4P1l'
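
The comment in the cell above notes that the IDs should eventually stop being ASCII-only so that they blend in with the output of failed decryptions. The following is a minimal sketch of one way to do that, assuming 16 uniformly random bytes per ID are acceptable. The helper name `make_data_ids` is hypothetical and not part of the notebook or of `classes.py`.

```python
import secrets

def make_data_ids(count, size=16):
    """Return `count` distinct random byte strings, each `size` bytes long."""
    ids = set()
    while len(ids) < count:
        # A set ignores duplicates, so the loop keeps drawing until
        # `count` distinct values have been collected.
        ids.add(secrets.token_bytes(size))
    return list(ids)

example_ids = make_data_ids(5)
print(len(example_ids), len(example_ids[0]))
```

Collecting the IDs in a set also keeps the uniqueness check at constant time per ID, rather than scanning a growing list.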

Create Encrypted Data

Each infected person uses their location data as the key to encrypt the associated data_id. The encrypted data is then placed in a public database array for users to compare against.

In [18]:
encrypted_data = []
encrypt_time_array = []
for person in infected:
    start = time.time()
    for loc in person.locs:
        #Create initialization vector for the aes encryption (stored in plain text)
        iv = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for x in range(16))

        #Extend cell number with x's to create 16byte key
        key = str(loc.cell).ljust(16, 'x')

        #Create the aes object (key and iv are passed as bytes, which PyCryptodome requires)
        aes = AES.new(key.encode('UTF-8'), AES.MODE_CBC, iv.encode('UTF-8'))

        #Encrypt the assigned data id
        temp_edata = aes.encrypt(loc.data_id)

        #pair up the initialization vector and the encrypted data
        encrypted_data.append(EDataPair(iv,temp_edata))
    end = time.time()
    encrypt_time_array.append(end-start)

In [19]:
num_encrypted_data = len(encrypted_data)
print('Example Key Generated with Location Cell ID:', key)
print('Number of Encrypted Points in Public Database:',num_encrypted_data)
print('Average Encryption time per Infected Person:',np.mean(encrypt_time_array),'sec')
Example Key Generated with Location Cell ID: 635138xxxxxxxxxx
Number of Encrypted Points in Public Database: 2880
Average Encryption time per Infected Person: 0.26625287532806396 sec
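
The key derivation used for encryption (a cell ID padded with `x` characters to 16 bytes) has to be reproduced exactly on the decrypting side. A minimal sketch of how it could be factored into one shared helper follows. The name `cell_to_key` is hypothetical, and raising an error for cell IDs longer than 16 characters is an assumption, since the notebook does not say how that case should be handled.

```python
def cell_to_key(cell, size=16, pad="x"):
    """Pad a location cell ID with `pad` characters to a `size`-byte AES key."""
    key = str(cell)
    if len(key) > size:
        # Assumption: overly long cell IDs are treated as an error.
        raise ValueError(f"cell ID {key!r} is longer than {size} characters")
    # Return bytes so the result can be passed straight to AES.new
    return key.ljust(size, pad).encode("utf-8")

print(cell_to_key(635138))  # b'635138xxxxxxxxxx'
```

Calling the same helper from both the encryption and the decryption cells would keep the two key formats from drifting apart.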

Perform Decryption Attempts Test

Each user then tries to decrypt every entry in the public encrypted database, using each of their own locations as a candidate key. This produces a block of decryption attempts per user. For this Python proof of concept, only people in the clear population with known contacts are tested, both to validate the algorithm and to speed up development iterations.

In [16]:
test_people = []
for person in people:
    if len(person.contacts) > 0:
        test_people.append(person)
        break #used for single person test

print('Known people with contacts:',len(test_people))

decrypt_time_array = []
for person in test_people:
    start = time.time()
    for loc in person.locs:
        #Extend cell number with x's to create 16byte key (bytes, as PyCryptodome requires)
        key = str(loc.cell).ljust(16, 'x').encode('UTF-8')

        for pair in encrypted_data:
            #Create the aes object with the plain text iv stored alongside the data
            aes = AES.new(key, AES.MODE_CBC, pair.iv.encode('UTF-8'))
            attempt = aes.decrypt(pair.edata)
            person.decypted_attempts.append(attempt)
    end = time.time()
    decrypt_time_array.append(end-start)
            
num_decypted_attampts = len(test_people[0].decypted_attempts)
print('Number of Decrypted Attempts per Person:',num_decypted_attampts)
print('Average Decryption Attempts time per Person:',np.mean(decrypt_time_array),'sec')
Known people with contacts: 1
Number of Decrypted Attempts per Person: 4147200
Average Decryption Attempts time per Person: 31.930400133132935 sec

Check Matching Decryptions

Each user's decryption attempts are checked against the private data_id database array to find matches. This check runs on the server side, which reports only the number of contacts. The real known contacts are printed next to the algorithm-generated contacts to verify that they match. A Bloom filter may be worth investigating for this lookup in the future.

In [9]:
db_check_time_array = []
for person in test_people:
    start = time.time()
    for attempt in person.decypted_attempts:
        if attempt in total_data_ids:
            person.alg_contacts.append(attempt)
    end = time.time()
    db_check_time_array.append(end-start)
    print('Known Contacts:', len(person.contacts), 'Decrypted Contacts', len(person.alg_contacts))
print('Average Check time:', np.mean(db_check_time_array)/60, 'min')
Known Contacts: 100 Decrypted Contacts 200
Average Check time: 3.231235619386037 min
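
The Bloom filter mentioned above could replace the linear scan of `total_data_ids`. The following is a minimal standard-library sketch, not the notebook's implementation: the `BloomFilter` class, its parameters, and the double-hashing scheme built on SHA-256 are all assumptions chosen for illustration. A Bloom filter never misses an item that was added, but it can report false positives, so any hit would still need confirming against the exact IDs (for example with `set(total_data_ids)`, which on its own already gives constant-time exact lookups).

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter over byte strings (illustrative sketch)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Example IDs in the same 16-byte format as total_data_ids
known_ids = [b'hwqqIhlmWL0O4P1l', b'0123456789abcdef']
bloom = BloomFilter(num_bits=1024, num_hashes=4)
for data_id in known_ids:
    bloom.add(data_id)

print(b'hwqqIhlmWL0O4P1l' in bloom)  # True: added items are always found
```

The filter size and hash count trade memory against the false-positive rate, so they would need tuning to the real number of data IDs.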

Processing Statistics

In [17]:
checks = num_encrypted_data * num_decypted_attampts
print('Number of checks performed by the database per Person:', checks)
print('Total Number of checks performed by the database:', checks*len(people))
Number of checks performed by the database per Person: 11943936000
Total Number of checks performed by the database: 2364899328000
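
The figures above follow from a simple worst-case cost model, sketched below. It assumes every person contributes the same number of locations (1440, one per minute of the day, as inferred from the outputs above) and that each decryption attempt is compared against every data ID. The function name `database_checks` is hypothetical.

```python
def database_checks(num_infected, locs_per_person, population):
    """Worst-case attempt and comparison counts under the stated assumptions."""
    # One encrypted point per infected location
    encrypted_points = num_infected * locs_per_person
    # Every user location is tried as a key against every encrypted point
    attempts_per_person = locs_per_person * encrypted_points
    # Every attempt is compared against every private data id
    checks_per_person = attempts_per_person * encrypted_points
    return attempts_per_person, checks_per_person, checks_per_person * population

attempts, per_person, total = database_checks(
    num_infected=2, locs_per_person=1440, population=198)
print(attempts, per_person, total)
# 4147200 11943936000 2364899328000
```

Under this model the checks per person grow with the square of the number of encrypted points, so doubling the number of infected people quadruples the per-person check count.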