Andrew is a Data Scientist with experience in academic and commercial environments. Working as an Experimental Physicist he refined his skills in research, data analysis, scientific computing, writing and presentation. He now applies this expertise as a Data Scientist. Things he likes to do (and is pretty good at):

  • transforming raw and messy data into a tidy and workable data set;
  • separating signal from noise in large volumes of data;
  • making sense of analytical results within a business or scientific context;
  • communicating illuminating results in a clear and intuitive way with attractive visualisations and insightful discussion; and
  • doing all of this in plain English, accessible to both experts and laymen.

Comfortable working in a remote or hybrid environment.

Andrew is principally a builder. This is what he enjoys doing the most and where he excels. He has a passion for solving problems, as demonstrated on his blog and StackOverflow profile.

Skills

Advanced
  • R
  • Python
  • SQL
  • C
  • C++
  • HTML
  • CSS
  • AI/ML
  • web scraping
  • Flask
  • Linux
  • BASH
  • Git
  • CI/CD
  • Docker
  • automation
Intermediate
  • PostgreSQL
  • Redis
  • AWS
  • MPI
  • SQLAlchemy
  • Scrapy
  • Playwright
  • Selenium

Education

PhD
(Space Physics)
|
Royal Institute of Technology
Stockholm, Sweden
Licentiate
(Space Physics)
|
Royal Institute of Technology
Stockholm, Sweden
MSc
(Nuclear Engineering)
|
University of Potchefstroom
Potchefstroom, South Africa
BSc (Honours)
|
University of Natal
Durban, South Africa

Work Experience

Apr 2017 - Mar 2024
Founder & Lead Data Scientist
|
Fathom Data
South Africa (remote)
  • R
  • Python
  • management
  • leadership
  • mentoring
  • developer

Technical leadership on a team of Data Scientists working on a diverse range of projects.

  • Grew revenue from R 501k (2017) to R 5,838k (2023).
  • Grew team from 1 to 10 people.
  • Built a spatial stochastic optimisation model which allowed a South African security company to reduce the size of their vehicle fleet and operate it more efficiently.
  • Initiated a pipeline for efficiently processing enormous Transparency in Coverage files, enabling the client to integrate these data into their offering.
  • Built a robust suite of web crawlers to gather data from online vehicle dealers.
  • Translated an SQL iterative solver first into R and then into Python, massively reducing memory and compute requirements.
  • Created web crawlers to gather pricing data from a selection of South African online retailers.
  • Performed raking and analysis of political survey data.
May 2023 - Mar 2024
Backend Engineer (contract)
|
HumanOS
London (remote)
  • Python
  • Flask
  • PostgreSQL
  • WebSockets

HumanOS is a comprehensive wellbeing platform. I was the first (and only) backend engineer, responsible for establishing the product's support infrastructure.

  • Designed, deployed and maintained a PostgreSQL database hosted on RDS.
  • Created a Flask API to support both web and mobile front ends. The API is served via Docker and NGINX from an EC2 instance.
  • Added WebSockets to the API, ensuring that the front ends are responsive.
  • Integrated the API with the WeFitter API for acquiring data from wearable devices.
Mar 2020 - Current
Web Crawling Specialist (contract)
|
Unrival
London (remote)
  • Python
  • R
  • web scraping
  • Selenium
  • SQLAlchemy
  • Scrapy

Unrival is a cloud-based intelligence platform for understanding corporate structure.

  • Created standalone crawlers to be run by analysts (supporting Linux, Windows and macOS platforms), saving hundreds of hours of manual data extraction.
  • Developed a suite of web crawlers for extracting data, principally from LinkedIn.
  • Built a generic web crawler for extracting data from corporate C-Suite pages.
  • Implemented a collection of scripts for automated documentation and presentation generation.
  • Built and maintained a Flask API over an AWS Aurora database.
Oct 2022 - Current
Backend Engineer (contract)
|
Domino Data Lab
San Francisco, CA (remote)
  • Gatsby
  • TypeScript

Domino Data Lab provides a Data Science development platform. I was the backend engineer responsible for the documentation system.

  • Compiled documentation and recorded instructional videos for the Low Code Assistant feature.
  • Migrated site from Gatsby Cloud to Vercel.
  • Implemented full restyling of site.
  • Improved efficiency of build process.
  • Fixed Coveo search.
Jan 2020 - Apr 2022
Data Engineer (contract)
|
BluePath Solutions
Los Angeles, CA (remote)
  • R
  • Python
  • AWS
  • web scraping

BluePath Solutions is a consulting firm specialising in health economics.

Oct 2019 - Feb 2020
Data Scientist (contract)
|
HOF Capital
New York, NY (remote)
  • R
  • AWS
  • web scraping

HOF Capital is a seed fund venture capital company.

Jan 2015 - May 2017
Senior Data Scientist
|
Derivco
Durban, South Africa
Sep 2013 - Jan 2015
Game Mathematician
|
Derivco
Durban, South Africa
Jan 2013 - Jan 2014
Researcher
|
University of Bergen
Bergen, Norway
Apr 2011 - Jul 2013
Senior Researcher
|
South African National Space Agency (SANSA)
Hermanus, South Africa
Jan 2006 - Apr 2011
Postdoctoral Researcher
|
Hermanus Magnetic Observatory (HMO)
Hermanus, South Africa

Projects

paddle: Kayak Race Management
https://github.com/datawookie/paddle
  • Python
  • Flask
  • SQLAlchemy

A Flask application for managing entries and results for the Waterside Series of kayak races run by the Newbury Canoe Club.

Medusa Multi-Headed Tor Proxy
https://github.com/datawookie/medusa-proxy
  • Docker

The Medusa Proxy Docker image provides a flexible interface to a set of proxies operating on the Tor network. The operation of the image is explained in a blog post.

{emayili}
https://github.com/datawookie/emayili
  • R
  • email

The {emayili} package provides a simple, tidyverse-compliant interface for sending emails from R. It is lightweight, avoiding some of the bulky dependencies associated with other similar packages.

{binance}
https://github.com/datawookie/binance
  • R

An R wrapper for the Binance API, enabling automated cryptocurrency trading from R.

{clockify}
https://github.com/datawookie/clockify
  • R

An R wrapper for the Clockify API.

{filebin}
https://github.com/datawookie/filebin
  • R

An R wrapper for the Filebin API, making it possible to create and manage ephemeral file shares from R.

{tomtom}
https://github.com/datawookie/tomtom
  • R

An R wrapper for the TomTom Developer API.