I am a builder and a problem solver with expertise in software development, data engineering and scientific computing. I use my technical capabilities to build robust, maintainable data pipelines, web scrapers and automation solutions. A PhD in Physics with extensive and diverse experience gives me the ability to work independently and comfortably handle complex technical challenges. I lead by example and have refined communication skills, with experience mentoring and effectively collaborating across technical and non-technical teams. My passion for learning and sharing is evident on my blog and Stack Overflow profile.

Citizenship: UK, ZA

Languages: English (native), Afrikaans (proficient), French (learning)

Skills

Advanced
  • Python
  • web scraping
  • data engineering
  • R
  • SQL
  • HTML
  • CSS
  • AI/ML
  • Flask
  • Linux
  • Bash
  • Git
  • CI/CD
  • Docker
  • automation
Intermediate
  • PostgreSQL
  • Redis
  • AWS
  • SQLAlchemy
  • Scrapy
  • Playwright
  • Selenium
  • C
  • C++

Education

Jun 2006
PhD
(Physics)
|
Royal Institute of Technology
Stockholm, Sweden
Jun 2004
Licentiate
(Physics)
|
Royal Institute of Technology
Stockholm, Sweden
Sep 1998
MSc
(Nuclear Engineering)
|
University of Potchefstroom
Potchefstroom, South Africa
Apr 1994
BSc (Honours: Physics & Mathematics)
|
University of Natal
Durban, South Africa

Work Experience

Sep 2024 - Current
Quantitative Technology Director / Web Crawling Specialist
|
QRT
London (hybrid)

Web Scraping Specialist at QRT (Qube Research & Technologies), a global multi-strategy systematic hedge fund.

  • Responsible for the architecture, reliability, validation and maintenance of resilient automated data acquisition systems.
  • Design and implement large-scale, production-grade web scraping pipelines that ingest diverse, complex and high-value data.
  • Reverse-engineer dynamic sites via their underlying APIs.
  • Integrate data sources into research and trading platforms.
  • Lead a small distributed team.
Mar 2020 - Apr 2024
Senior Python Engineer / Web Crawling Specialist (contract)
|
Unrival
London (remote)
  • Python
  • R
  • web scraping
  • Selenium
  • SQLAlchemy
  • Scrapy

Unrival is a cloud-based intelligence platform for understanding corporate structure.

  • Developed a suite of web crawlers for extracting data from LinkedIn and Sales Navigator.
  • Built a generic web crawler for extracting data from corporate C-Suite pages.
  • Created standalone web crawlers to be run by analysts (supporting Linux, Windows and macOS platforms), saving hundreds of hours of manual data extraction.
  • Implemented a collection of scripts for automated documentation and presentation generation.
  • Built and managed a Flask API over an AWS Aurora database.
Oct 2022 - Current
Backend Engineer (contract)
|
Domino Data Lab
San Francisco, CA (remote)
  • Gatsby
  • TypeScript

Domino Data Lab provides a Data Science development platform. I was the backend engineer responsible for the documentation system.

  • Compiled documentation and recorded instructional videos for the Low Code Assistant feature.
  • Migrated site from Gatsby Cloud to Vercel.
  • Implemented full restyling of site.
  • Improved efficiency of build process.
  • Fixed Coveo search.
Apr 2017 - Mar 2024
Founder & Lead Data Scientist
|
Fathom Data
South Africa (remote)
  • R
  • Python
  • management
  • leadership
  • mentoring

Fathom Data is a Data Science consulting company.

  • Established 100% remote company.
  • Recruited & led a team of 10 Data Scientists.
  • Grew revenue from 501k ZAR (2017) to 5,838k ZAR (2023).
  • Gained international clients organically without the use of paid advertising.
  • Built a spatial stochastic optimisation model to determine optimal fleet size and operations for a security company's vehicle fleet.
  • Created a pipeline for efficiently processing enormous Transparency in Coverage files.
  • Translated an SQL iterative solver first into R and then into Python, massively reducing memory and compute requirements.
  • Built a robust suite of web crawlers to gather data from online vehicle dealers.
  • Created web crawlers to gather pricing data from a selection of South African online retailers.
  • Performed raking and analysis of survey data for a South African opposition party.
May 2023 - Mar 2024
Backend Engineer (contract)
|
HumanOS
London (remote)
  • Python
  • Flask
  • PostgreSQL
  • WebSockets

HumanOS is a comprehensive wellbeing platform. I was responsible for establishing the database and API infrastructure.

  • Designed, deployed and maintained a PostgreSQL database hosted on RDS.
  • Created a Flask API to support both web and mobile front ends. The API is served via Docker and NGINX from an EC2 instance.
  • Added WebSockets to the API, ensuring that the front ends are responsive.
  • Integrated the API with the WeFitter API for acquiring data from wearable devices.
Jan 2020 - Apr 2022
Data Engineer (contract)
|
BluePath Solutions
Los Angeles, CA (remote)
  • R
  • Python
  • AWS
  • web scraping

BluePath Solutions is a consulting firm specialising in health economics.

  • Built, deployed and maintained web crawlers for gathering pharmaceutical data.
  • Maintained and extended a Shiny application.
Oct 2019 - Feb 2020
Data Scientist (contract)
|
HOF Capital
New York, NY (remote)
  • R
  • AWS
  • web scraping

HOF Capital is a seed fund venture capital company.

  • Built web crawlers for gathering data on new company registrations.
  • Deployed automated crawls on AWS.
Jan 2015 - May 2017
Senior Data Scientist
|
Derivco
Durban, South Africa
Sep 2013 - Jan 2015
Game Mathematician
|
Derivco
Durban, South Africa
Jan 2013 - Jan 2014
Researcher
|
University of Bergen
Bergen, Norway
Apr 2011 - Jul 2013
Senior Researcher
|
South African National Space Agency (SANSA)
Hermanus, South Africa
Jan 2006 - Apr 2011
Postdoctoral Researcher
|
Hermanus Magnetic Observatory (HMO)
Hermanus, South Africa

Projects

ibauth: IBKR Authentication Workflow
https://github.com/datawookie/ibauth
  • Python

A package for managing connections to the IBKR API.

ibproxy: IBKR Proxy
https://github.com/datawookie/ibproxy
  • Python

A proxy for relaying requests to the IBKR API.

paddle: Kayak Race Management
https://github.com/datawookie/paddle
  • Python
  • Flask
  • SQLAlchemy

A Flask application for managing entries and results for the Waterside Series of kayak races run by the Newbury Canoe Club.

Medusa Multi-Headed Tor Proxy
https://github.com/datawookie/medusa-proxy
  • Docker

The Medusa Proxy Docker image provides a flexible interface to a set of proxies operating on the Tor network. The operation of the image is explained in a blog post.

{emayili}
https://github.com/datawookie/emayili
  • R
  • email

The {emayili} package provides a simple, tidyverse-compliant interface for sending emails from R. It is lightweight, avoiding some of the bulky dependencies associated with other similar packages.

{binance}
https://github.com/datawookie/binance
  • R

An R wrapper for the Binance API, enabling automated cryptocurrency trading from R.

{clockify}
https://github.com/datawookie/clockify
  • R

An R wrapper for the Clockify API.

{filebin}
https://github.com/datawookie/filebin
  • R

An R wrapper for the Filebin API, making it possible to create and manage ephemeral file shares from R.