Andrew is a Data Scientist with experience in both academic and commercial environments. As an Experimental Physicist he refined his skills in research, data analysis, scientific computing, writing and presentation. He now leverages this expertise as a Data Scientist. Things he likes to do (and is pretty good at):

  • transforming raw and messy data into a tidy and workable data set;
  • separating signal from noise in large volumes of data;
  • making sense of analytical results within a business or scientific context;
  • communicating illuminating results in a clear and intuitive way with attractive visualisations and insightful discussion; and
  • doing all of this in plain English, accessible to experts and non-experts alike.

Andrew is a builder and a problem solver. His passion for learning and sharing is evident on his blog and Stack Overflow profile.

Citizenship: UK & South Africa

Skills

Advanced
  • R
  • Python
  • SQL
  • C
  • C++
  • HTML
  • CSS
  • AI/ML
  • web scraping
  • Flask
  • Linux
  • Bash
  • Git
  • CI/CD
  • Docker
  • automation
Intermediate
  • PostgreSQL
  • Redis
  • AWS
  • MPI
  • SQLAlchemy
  • Scrapy
  • Playwright
  • Selenium

Education

Jun 2006 | PhD (Physics) | Royal Institute of Technology, Stockholm, Sweden
Jun 2004 | Licentiate (Physics) | Royal Institute of Technology, Stockholm, Sweden
Sep 1998 | MSc (Nuclear Engineering) | University of Potchefstroom, Potchefstroom, South Africa
Apr 1994 | BSc (Honours: Physics & Mathematics) | University of Natal, Durban, South Africa

Work Experience

Sep 2024 - Current | Quantitative Technology Director / Web Crawling Specialist | Qube Research & Technologies, London (hybrid)
Mar 2020 - Current | Senior Python Engineer / Web Crawling Specialist (contract) | Unrival, London (remote)
  • Python
  • R
  • web scraping
  • Selenium
  • SQLAlchemy
  • Scrapy

Unrival is a cloud-based intelligence platform for understanding corporate structure.

  • Developed a suite of web crawlers for extracting data from LinkedIn and Sales Navigator.
  • Built a generic web crawler for extracting data from corporate C-Suite pages.
  • Created standalone web crawlers to be run by analysts (supporting Linux, Windows and macOS platforms), saving hundreds of hours of manual data extraction.
  • Implemented a collection of scripts for automated documentation and presentation generation.
  • Built and managed a Flask API over an AWS Aurora database.
Oct 2022 - Current | Backend Engineer (contract) | Domino Data Lab, San Francisco, CA (remote)
  • Gatsby
  • TypeScript

Domino Data Lab provides a Data Science development platform. Andrew was the backend engineer responsible for the documentation system.

  • Compiled documentation and recorded instructional videos for the Low Code Assistant feature.
  • Migrated site from Gatsby Cloud to Vercel.
  • Implemented a full restyling of the site.
  • Improved the efficiency of the build process.
  • Fixed Coveo search.
Apr 2017 - Mar 2024 | Founder & Lead Data Scientist | Fathom Data, South Africa (remote)
  • R
  • Python
  • management
  • leadership
  • mentoring

Fathom Data is a consulting company focused exclusively on Data Science.

  • Established a fully remote company.
  • Recruited & led a team of 10 Data Scientists.
  • Grew revenue from 501k ZAR (2017) to 5,838k ZAR (2023).
  • Gained international clients organically without the use of paid advertising.
  • Built a spatial stochastic optimisation model, allowing a South African security company to optimise the size and operation of their vehicle fleet.
  • Created a pipeline for efficiently processing enormous Transparency in Coverage files, enabling the client to integrate these data into their offering.
  • Translated an SQL iterative solver first into R and then into Python, massively reducing memory and compute requirements.
  • Built a robust suite of web crawlers to gather data from online vehicle dealers.
  • Created web crawlers to gather pricing data from a selection of South African online retailers.
  • Performed raking and analysis of survey data for a South African opposition party.
May 2023 - Mar 2024 | Backend Engineer (contract) | HumanOS, London (remote)
  • Python
  • Flask
  • PostgreSQL
  • WebSockets

HumanOS is a comprehensive wellbeing platform. Andrew was the first (and only) backend engineer, responsible for establishing the product's support infrastructure.

  • Designed, deployed and maintained a PostgreSQL database hosted on RDS.
  • Created a Flask API to support both web and mobile front ends. The API is served via Docker and NGINX from an EC2 instance.
  • Added WebSockets to the API, keeping the front ends responsive.
  • Integrated the API with the WeFitter API for acquiring data from wearable devices.
Jan 2020 - Apr 2022 | Data Engineer (contract) | BluePath Solutions, Los Angeles, CA (remote)
  • R
  • Python
  • AWS
  • web scraping

BluePath Solutions is a consulting firm specialising in health economics.

  • Built, deployed and maintained web crawlers for gathering pharmaceutical data (Red Book, NDC codes and crosswalk, and RxNorm).
  • Maintained and extended a Shiny application.
Oct 2019 - Feb 2020 | Data Scientist (contract) | HOF Capital, New York, NY (remote)
  • R
  • AWS
  • web scraping

HOF Capital is a seed fund venture capital company.

  • Built web crawlers for gathering data on new company registrations.
  • Deployed automated crawls on AWS.
Jan 2015 - May 2017 | Senior Data Scientist | Derivco, Durban, South Africa
Sep 2013 - Jan 2015 | Game Mathematician | Derivco, Durban, South Africa
Jan 2013 - Jan 2014 | Researcher | University of Bergen, Bergen, Norway
Apr 2011 - Jul 2013 | Senior Researcher | South African National Space Agency (SANSA), Hermanus, South Africa
Jan 2006 - Apr 2011 | Postdoctoral Researcher | Hermanus Magnetic Observatory (HMO), Hermanus, South Africa

Projects

paddle: Kayak Race Management
https://github.com/datawookie/paddle
  • Python
  • Flask
  • SQLAlchemy

A Flask application for managing entries and results for the Waterside Series of kayak races run by the Newbury Canoe Club.

Medusa Multi-Headed Tor Proxy
https://github.com/datawookie/medusa-proxy
  • Docker

The Medusa Proxy Docker image provides a flexible interface to a set of proxies operating on the Tor network. The operation of the image is explained in a blog post.

{emayili}
https://github.com/datawookie/emayili
  • R
  • email

The {emayili} package provides a simple, tidyverse-compliant interface for sending emails from R. It is lightweight, avoiding some of the bulky dependencies associated with other similar packages.

{binance}
https://github.com/datawookie/binance
  • R

An R wrapper for the Binance API, enabling automated cryptocurrency trading from R.

{clockify}
https://github.com/datawookie/clockify
  • R

An R wrapper for the Clockify API.

{filebin}
https://github.com/datawookie/filebin
  • R

An R wrapper for the Filebin API, making it possible to create and manage ephemeral file shares from R.

{tomtom}
https://github.com/datawookie/tomtom
  • R

An R wrapper for the TomTom Developer API.