Andrew is a Data Scientist with experience in both academic and commercial environments. As an Experimental Physicist he refined his skills in research, data analysis, scientific computing, writing and presentation. He now leverages this expertise as a Data Scientist. Things he likes to do (and is pretty good at):

  • transforming raw and messy data into a tidy and workable data set;
  • separating signal from noise in large volumes of data;
  • making sense of analytical results within a business or scientific context;
  • communicating illuminating results in a clear and intuitive way with attractive visualisations and insightful discussion; and
  • doing all of this in plain English, accessible to experts and non-experts alike.

Andrew is a builder and a problem solver. His passion for learning and sharing is evident on his blog and Stack Overflow profile.

Citizenship: UK & South Africa

Skills

Advanced
  • R
  • Python
  • SQL
  • C
  • C++
  • HTML
  • CSS
  • AI/ML
  • web scraping
  • Flask
  • Linux
  • Bash
  • Git
  • CI/CD
  • Docker
  • automation
Intermediate
  • PostgreSQL
  • Redis
  • AWS
  • MPI
  • SQLAlchemy
  • Scrapy
  • Playwright
  • Selenium

Education

Jun 2006 | PhD (Physics) | Royal Institute of Technology, Stockholm, Sweden
Jun 2004 | Licentiate (Physics) | Royal Institute of Technology, Stockholm, Sweden
Sep 1998 | MSc (Nuclear Engineering) | University of Potchefstroom, Potchefstroom, South Africa
Apr 1994 | BSc (Honours: Physics & Mathematics) | University of Natal, Durban, South Africa

Work Experience

Sep 2024 - Current | Quantitative Technology Director / Web Crawling Specialist | Qube Research & Technologies, London (hybrid)
Mar 2020 - Current | Senior Python Engineer / Web Crawling Specialist (contract) | Unrival, London (remote)
  • Python
  • R
  • web scraping
  • Selenium
  • SQLAlchemy
  • Scrapy

Unrival is a cloud-based intelligence platform for understanding corporate structure.

  • Developed a suite of web crawlers for extracting data from LinkedIn and Sales Navigator.
  • Built a generic web crawler for extracting data from corporate C-Suite pages.
  • Created standalone web crawlers to be run by analysts (supporting Linux, Windows and macOS platforms), saving hundreds of hours of manual data extraction.
  • Implemented a collection of scripts for automated documentation and presentation generation.
  • Built and managed a Flask API over an AWS Aurora database.
Oct 2022 - Current | Backend Engineer (contract) | Domino Data Lab, San Francisco, CA (remote)
  • Gatsby
  • TypeScript

Domino Data Lab provides a Data Science development platform. Andrew was the backend engineer responsible for the documentation system.

  • Compiled documentation and recorded instructional videos for the Low Code Assistant feature.
  • Migrated site from Gatsby Cloud to Vercel.
  • Implemented a full restyling of the site.
  • Improved the efficiency of the build process.
  • Fixed Coveo search.
Apr 2017 - Mar 2024 | Founder & Lead Data Scientist | Fathom Data, South Africa (remote)
  • R
  • Python
  • management
  • leadership
  • mentoring

Fathom Data is a consulting company focused exclusively on Data Science.

  • Established a fully remote company.
  • Recruited & led a team of 10 Data Scientists.
  • Grew revenue from 501k ZAR (2017) to 5,838k ZAR (2023).
  • Gained international clients organically without the use of paid advertising.
  • Built a spatial stochastic optimisation model, allowing a South African security company to optimise the size and operation of their vehicle fleet.
  • Created a pipeline for efficiently processing enormous Transparency in Coverage files, enabling the client to integrate these data into their offering.
  • Translated an SQL iterative solver first into R and then into Python, massively reducing memory and compute requirements.
  • Built a robust suite of web crawlers to gather data from online vehicle dealers.
  • Created web crawlers to gather pricing data from a selection of South African online retailers.
  • Performed raking and analysis of survey data for a South African opposition party.
May 2023 - Mar 2024 | Backend Engineer (contract) | HumanOS, London (remote)
  • Python
  • Flask
  • PostgreSQL
  • WebSockets

HumanOS is a comprehensive wellbeing platform. Andrew was the first (and only) backend engineer, responsible for establishing the product's support infrastructure.

  • Designed, deployed and maintained a PostgreSQL database hosted on RDS.
  • Created a Flask API to support both web and mobile front ends. The API is served via Docker and NGINX from an EC2 instance.
  • Added WebSockets to the API, keeping the front ends responsive.
  • Integrated the API with the WeFitter API for acquiring data from wearable devices.
Jan 2020 - Apr 2022 | Data Engineer (contract) | BluePath Solutions, Los Angeles, CA (remote)
  • R
  • Python
  • AWS
  • web scraping

BluePath Solutions is a consulting firm specialising in health economics.

  • Built, deployed and maintained web crawlers for gathering pharmaceutical data (Red Book, NDC codes and crosswalk, and RxNorm).
  • Maintained and extended a Shiny application.
Oct 2019 - Feb 2020 | Data Scientist (contract) | HOF Capital, New York, NY (remote)
  • R
  • AWS
  • web scraping

HOF Capital is a seed fund venture capital company.

  • Built web crawlers for gathering data on new company registrations.
  • Deployed automated crawls on AWS.
Jan 2015 - May 2017 | Senior Data Scientist | Derivco, Durban, South Africa
Sep 2013 - Jan 2015 | Game Mathematician | Derivco, Durban, South Africa
Jan 2013 - Jan 2014 | Researcher | University of Bergen, Bergen, Norway
Apr 2011 - Jul 2013 | Senior Researcher | South African National Space Agency (SANSA), Hermanus, South Africa
Jan 2006 - Apr 2011 | Postdoctoral Researcher | Hermanus Magnetic Observatory (HMO), Hermanus, South Africa

Projects

paddle: Kayak Race Management
https://github.com/datawookie/paddle
  • Python
  • Flask
  • SQLAlchemy

A Flask application for managing entries and results for the Waterside Series of kayak races run by the Newbury Canoe Club.

Medusa Multi-Headed Tor Proxy
https://github.com/datawookie/medusa-proxy
  • Docker

The Medusa Proxy Docker image provides a flexible interface to a set of proxies operating on the Tor network. The operation of the image is explained in a blog post.

{emayili}
https://github.com/datawookie/emayili
  • R
  • email

The {emayili} package provides a simple, tidyverse-compliant interface for sending emails from R. It is lightweight, avoiding some of the bulky dependencies associated with other similar packages.

{binance}
https://github.com/datawookie/binance
  • R

An R wrapper for the Binance API, enabling automated cryptocurrency trading from R.

{clockify}
https://github.com/datawookie/clockify
  • R

An R wrapper for the Clockify API.

{filebin}
https://github.com/datawookie/filebin
  • R

An R wrapper for the Filebin API, making it possible to create and manage ephemeral file shares from R.

{tomtom}
https://github.com/datawookie/tomtom
  • R

An R wrapper for the TomTom Developer API.