A Primer on Data Normalization

Normalizing data is a common data engineering task. It prepares information to be stored in a way that minimizes duplication and is digestible by machines. It also aims to solve other problems that are out of scope for this article but worth reading about if you find yourself struggling to understand jokes …

Let PyCharm Use WSL’s Git Executable

This post is mostly for me, but I ran into a ton of conflicting information while troubleshooting my Windows Subsystem for Linux (WSL) and PyCharm integration and figured it may help someone else. First things first: versions matter! Before wasting your time trying to get PyCharm and WSL to play nicely, make sure you are …

Speed Up Your REST Workflows with asyncio

I have been waiting for a project that would allow me to dig into Python’s asyncio library. Recently, such a project presented itself. I was tasked with hitting a rate-limited REST API with just under 4 million requests. My first attempt was simple: gather and build a block of search queries, POST each …

How to Get the First N Bytes of a File

There comes a time when you just need to take a little off the top of a file to see what you are working with. That is where knowing how to use a utility like head can help. Just running:

    $ head filename.txt

will get you:

    Print the first 10 lines of each FILE to standard …

Search for a String in a List of Encrypted Values

Imagine a scenario where one party wants to check whether a name they have exists in a list of names kept by another party, but does not want the other party to know what name they are searching for. This problem may seem unrealistic but imagine a data breach where tons of personal information …

Your Simple Guide to Collecting Oral History

Collecting memories from people is an excellent way to celebrate the experience of others. I have found it helps me learn more about why people hold certain beliefs, how they overcame hardships, and the world we live in. Interviewing other people has helped me learn more about myself, which is why I wanted to write …

Troubleshooting Windows Subsystem for Linux and SSH

The Windows Subsystem for Linux (WSL) is one of the best features on Windows 10. It makes development so much easier than it used to be but still has a few hiccups. Kinda like Linux, some things don’t “just work.” One pesky thing is getting SSH to work with a keypair file from WSL. This post details how to get SSH working on WSL.

Kafkacat Amazon Workspace

Below are some notes on getting kafkacat installed on an Amazon Workspace with admin access. The commands listed on the GitHub page will not work without a little preparation. A Linux Amazon Workspace image is based on Amazon Linux. Attempts to use a package manager like yum go through a plugin, amzn_workspaces_filter_updates. This filter only …

Processing Audio Files with Amazon Transcribe

I have been working on collecting a family’s oral history for the past few months. During the process I took notes with simple descriptions of what the speaker was describing or telling and a rough timestamp of when in the file the conversation took place. After collecting hours of stories, I realized that having a …