MilkCrunch

Advent of Code Prep: Foundation Skills

by Michael Doornbos · 2242 words

This is part 1 of a three-part series on preparing for Advent of Code. See the main overview for the complete roadmap, or jump to Part 2: Core Skills or Part 3: Advanced Techniques.

Before you learn fancy algorithms, you need to be fluent in the basics. I’ve watched people struggle with AoC puzzles not because the algorithm was hard, but because they spent twenty minutes fighting with string parsing or couldn’t remember how to use a dictionary properly.

This article covers the foundation: the skills you’ll use in literally every puzzle. Master these first.

String Parsing: Where Every Puzzle Begins

Every AoC puzzle hands you a text file. Before you can solve anything, you need to turn that text into usable data. This is where most beginners lose time.

The split() method

split() is your workhorse. It breaks a string into a list at every occurrence of a delimiter.

# Split on whitespace (default)
"hello world".split()           # ['hello', 'world']

# Split on a specific character
"a,b,c".split(',')              # ['a', 'b', 'c']

# Split on a string
"one->two->three".split('->')   # ['one', 'two', 'three']

The default (no argument) splits on any whitespace and handles multiple spaces gracefully:

"hello    world".split()        # ['hello', 'world']
"hello    world".split(' ')     # ['hello', '', '', '', 'world']

See the difference? split(' ') splits at every single space, so each extra space in a run leaves an empty string behind. Usually you want the first behavior.
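Default split() is what makes column-aligned input painless. Here is a minimal sketch, assuming a hypothetical input of two number columns padded with a varying number of spaces:

```python
# Hypothetical input: two columns of numbers separated by runs of spaces
data = """3   4
4   3
2   5"""

left, right = [], []
for line in data.split('\n'):
    # split() with no argument collapses each run of spaces into one break
    a, b = line.split()
    left.append(int(a))
    right.append(int(b))

print(left)   # [3, 4, 2]
print(right)  # [4, 3, 5]
```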

Splitting input into lines

AoC input is typically multiple lines. Here’s the pattern:

data = """line one
line two
line three
"""

lines = data.strip().split('\n')
# ['line one', 'line two', 'line three']

strip() removes the trailing newline, along with any other leading or trailing whitespace. Without it:

data.split('\n')
# ['line one', 'line two', 'line three', '']

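That empty string at the end will cause bugs, so strip first. One caveat: strip() also removes leading spaces. If indentation in your input is meaningful, as in puzzles that draw ASCII diagrams, use data.rstrip('\n') instead.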
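Your real input comes from a file. Here is a minimal sketch using the standard library's pathlib and str.splitlines(), which does not produce an empty string for a final newline. The helper name read_lines and the file name input.txt are hypothetical:

```python
from pathlib import Path

def read_lines(path):
    """Return the lines of a puzzle input file with no trailing empty entry."""
    # splitlines() drops the final newline cleanly, unlike split('\n')
    return Path(path).read_text().splitlines()

# Usage, assuming you saved your puzzle input as 'input.txt' (hypothetical name):
# lines = read_lines('input.txt')

print("line one\nline two\n".splitlines())  # ['line one', 'line two']
print("line one\nline two\n".split('\n'))   # ['line one', 'line two', '']
```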

Splitting into sections

Some puzzles have blank-line-separated sections:

data = """section one
more of section one

section two
continues here"""

sections = data.strip().split('\n\n')
# ['section one\nmore of section one', 'section two\ncontinues here']

# Then split each section into lines
for section in sections:
    lines = section.split('\n')

This pattern appears constantly—puzzle rules in one section, input data in another.
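Here is a sketch of parsing both sections into numbers, assuming a hypothetical format with "a|b" ordering rules followed by comma-separated updates:

```python
# Hypothetical two-section input: rules, a blank line, then updates
data = """47|53
97|13

75,47,61
97,61,53"""

rules_text, updates_text = data.strip().split('\n\n')

# Each rule "47|53" becomes a tuple of ints
rules = [tuple(map(int, line.split('|'))) for line in rules_text.split('\n')]

# Each update "75,47,61" becomes a list of ints
updates = [[int(x) for x in line.split(',')] for line in updates_text.split('\n')]

print(rules)    # [(47, 53), (97, 13)]
print(updates)  # [[75, 47, 61], [97, 61, 53]]
```

Unpacking into two names also acts as a sanity check. If the input ever has more or fewer than two sections, Python raises a ValueError right away instead of silently misparsing.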

String slicing

Slicing extracts parts of a string:

s = "hello world"

s[0]      # 'h' - first character
s[-1]     # 'd' - last character
s[0:5]    # 'hello' - characters 0-4
s[:5]     # 'hello' - same thing, start defaults to 0
s[6:]     # 'world' - from index 6 to end
s[-5:]    # 'world' - last 5 characters
s[::2]    # 'hlowrd' - every 2nd character
s[::-1]   # 'dlrow olleh' - reversed

The syntax is [start:end:step]. All three are optional. The end index is exclusive—s[0:5] gets indices 0, 1, 2, 3, 4.
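Slices combine naturally with len(), and that covers a lot of puzzle chores. Here is a small sketch; halves() and is_palindrome() are hypothetical helper names:

```python
def halves(s):
    """Split a string into two halves (assumes an even length)."""
    mid = len(s) // 2
    return s[:mid], s[mid:]

def is_palindrome(s):
    # s[::-1] is the string reversed
    return s == s[::-1]

print(halves("abcdef"))          # ('abc', 'def')
print(is_palindrome("racecar"))  # True
print(is_palindrome("hello"))    # False
```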

Parsing structured data

AoC loves input like "move 3 from 1 to 2". Break it down step by step:

line = "move 3 from 1 to 2"

# Split into words
parts = line.split()  # ['move', '3', 'from', '1', 'to', '2']

# Extract what you need
amount = int(parts[1])  # 3
source = int(parts[3])  # 1
dest = int(parts[5])    # 2

Or use unpacking if the format is fixed:

_, amount, _, source, _, dest = line.split()
amount, source, dest = int(amount), int(source), int(dest)

The _ is a convention for “I don’t care about this value.”

Extracting numbers with different formats

# "R25" - direction and number
line = "R25"
direction = line[0]       # 'R'
amount = int(line[1:])    # 25

# "3-7" - range
line = "3-7"
start, end = line.split('-')
start, end = int(start), int(end)   # 3, 7

# "(123, 456)" - coordinates
line = "(123, 456)"
# Remove parentheses, split on comma
coords = line[1:-1].split(', ')
x, y = int(coords[0]), int(coords[1])

# Or in one step: map() applies int() to each piece
x, y = map(int, line[1:-1].split(', '))
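Sometimes the format varies, or you only need the numbers. In those cases a regular expression from Python's standard re module can pull out every integer at once. This is a different technique from the split()-based parsing above, and ints() is a hypothetical helper name:

```python
import re

def ints(line):
    """Return every integer in a line, including negatives, in order."""
    # -? allows an optional minus sign; \d+ matches one or more digits
    return [int(n) for n in re.findall(r'-?\d+', line)]

print(ints("move 3 from 1 to 2"))  # [3, 1, 2]
print(ints("(123, 456)"))          # [123, 456]
print(ints("x=-2, y=15"))          # [-2, 15]
print(ints("3-7"))                 # [3, -7]
```

The last line shows the trade-off. A hyphen used as a range separator is read as a minus sign, so for "3-7" style ranges, split('-') is the safer choice.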

The join() method

join() is the opposite of split()—it combines a list into a string:

words = ['hello', 'world']

' '.join(words)     # 'hello world'
','.join(words)     # 'hello,world'
''.join(words)      # 'helloworld'
'\n'.join(words)    # 'hello\nworld'

The separator goes before .join(). That feels backwards until you remember that join() is a method of the separator string.
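A common use of join() is printing a grid back out to check your parsing. Here is a sketch; render() is a hypothetical helper name:

```python
def render(grid):
    """Turn a 2D list of booleans back into '#' and '.' text."""
    return '\n'.join(''.join('#' if cell else '.' for cell in row) for row in grid)

print(render([[True, True, False], [False, True, False]]))
# ##.
# .#.

# join() only accepts strings, so convert numbers first
print(','.join(map(str, [1, 2, 3])))  # 1,2,3
```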

Practice problem: Parse a grid

Given this input:

###.#
#...#
#.#.#
#...#
#####

Parse it into a 2D list where # is True and . is False:

data = """###.#
#...#
#.#.#
#...#
#####"""

grid = []
for line in data.strip().split('\n'):
    row = [char == '#' for char in line]
    grid.append(row)

# Or as a one-liner
grid = [[c == '#' for c in line] for line in data.strip().split('\n')]
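Another representation you will see often in grid puzzles is a dictionary keyed by (row, col) tuples. Dictionaries and tuples as keys are covered below. The sketch uses the same grid:

```python
data = """###.#
#...#
#.#.#
#...#
#####"""

# Map each (row, col) position to its character
grid = {
    (r, c): ch
    for r, line in enumerate(data.strip().split('\n'))
    for c, ch in enumerate(line)
}

print(grid[(0, 3)])                            # .
print(sum(ch == '#' for ch in grid.values()))  # 16 walls
print((9, 9) in grid)                          # False: off the grid
```

The design advantage is that bounds checking becomes a simple `in` test, with no need to compare indices against the row and column counts.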

List Comprehensions: Write Less, Do More

List comprehensions are Python’s way of building lists concisely. Once you’re fluent with them, you’ll use them everywhere.

Basic syntax

# Traditional loop
squares = []
for x in range(5):
    squares.append(x ** 2)
# [0, 1, 4, 9, 16]

# List comprehension
squares = [x ** 2 for x in range(5)]
# [0, 1, 4, 9, 16]

The pattern is [expression for item in iterable].

Adding conditions

Filter with if:

# Only even squares
[x ** 2 for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]

# Only non-empty strings
[s for s in strings if s]

# Only positive numbers
[n for n in numbers if n > 0]

Transforming data

Comprehensions shine for data transformation:

# Convert strings to integers
numbers = [int(x) for x in "1 2 3 4 5".split()]

# Extract first character from each word
firsts = [word[0] for word in words]

# Uppercase everything
upper = [s.upper() for s in strings]

# Get lengths
lengths = [len(s) for s in strings]

Nested comprehensions

For 2D structures, you can nest:

# Create a 3x3 grid of zeros
grid = [[0 for c in range(3)] for r in range(3)]
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Flatten a 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [item for row in matrix for item in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

The nested version reads like nested loops: “for each row in matrix, for each item in row.”
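The nested comprehension matters when building grids. The tempting shortcut of multiplying a list of lists copies references to one row rather than the row itself:

```python
# Looks equivalent, but every row is the SAME list object
bad = [[0] * 3] * 3
bad[0][0] = 1
print(bad)   # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

# The comprehension builds three independent rows
good = [[0] * 3 for _ in range(3)]
good[0][0] = 1
print(good)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```

[0] * 3 is safe inside the comprehension because integers are immutable. Only the outer multiplication shares a mutable row.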

Conditional expressions (ternary)

You can use if-else in the expression part:

# Replace negatives with zero
[x if x > 0 else 0 for x in numbers]

# Convert boolean to character
['#' if cell else '.' for cell in row]

Note the position. An if after the for is a filter that drops items. An if-else before the for is a conditional expression that picks a value for every item.

# Filter: only process positive numbers
[x * 2 for x in nums if x > 0]

# Transform: double positives, zero negatives
[x * 2 if x > 0 else 0 for x in nums]

Dictionary and set comprehensions

Same syntax works for dicts and sets:

# Dictionary comprehension
{word: len(word) for word in words}
# {'hello': 5, 'world': 5}

# Set comprehension
{x % 10 for x in numbers}  # unique last digits
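Dictionary comprehensions are handy for building small lookup tables. Here is a sketch mapping letters to scores, with priority and letter_for as hypothetical names:

```python
import string

# Map 'a'..'z' to 1..26 (enumerate counts from 0, so add 1)
priority = {ch: i + 1 for i, ch in enumerate(string.ascii_lowercase)}
print(priority['a'], priority['z'])  # 1 26

# Invert a dict by swapping keys and values (assumes values are unique)
letter_for = {v: k for k, v in priority.items()}
print(letter_for[3])  # c
```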

Generator expressions

Replace [] with () for a generator that doesn’t build the whole list in memory:

# Sum of squares (doesn't build intermediate list)
total = sum(x ** 2 for x in range(1000000))

# Any/all with generator
any(x > 100 for x in numbers)
all(len(s) > 0 for s in strings)

For large data, generators save memory. For AoC, lists are usually fine.
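One generator idiom shows up constantly in AoC answers: counting how many items satisfy a condition. It relies on the fact that True sums as 1 and False as 0:

```python
lines = ["abc", "", "de", "fghi"]

# Summing a generator of booleans counts the True values
non_empty = sum(bool(line) for line in lines)
long_lines = sum(len(line) > 2 for line in lines)

print(non_empty)   # 3
print(long_lines)  # 2
```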

Practice problem: Transform input

Given lines like "Player 1: 7, 4, 9, 5, 11", extract player number and their numbers:

line = "Player 1: 7, 4, 9, 5, 11"

# Split into parts
player_part, numbers_part = line.split(': ')
player_num = int(player_part.split()[1])
numbers = [int(x) for x in numbers_part.split(', ')]

# player_num = 1
# numbers = [7, 4, 9, 5, 11]

Sets: Fast Membership Testing

Sets are unordered collections of unique elements. Their superpower is average-case O(1) membership testing, because elements are stored in a hash table.

When to use sets

Use a set when you need to answer “is X in this collection?” repeatedly.

# Slow: O(n) for each check
visited = []
if position not in visited:  # Scans entire list
    visited.append(position)

# Fast: O(1) for each check
visited = set()
if position not in visited:  # Hash lookup
    visited.add(position)

For a list of 10,000 items, the list version does up to 10,000 comparisons per check. The set version computes one hash and usually checks a single slot.

Basic operations

s = set()           # Empty set
s = {1, 2, 3}       # Set with values
s = set([1, 2, 3])  # Set from list

s.add(4)            # Add element
s.remove(4)         # Remove (raises KeyError if missing)
s.discard(4)        # Remove (no error if missing)

4 in s              # Membership test
len(s)              # Number of elements

Set operations

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

a | b       # Union: {1, 2, 3, 4, 5, 6}
a & b       # Intersection: {3, 4}
a - b       # Difference: {1, 2}
a ^ b       # Symmetric difference: {1, 2, 5, 6}

a.issubset(b)      # Is a contained in b?
a.issuperset(b)    # Does a contain b?
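Intersection is the tool when a puzzle asks what several groups have in common. set.intersection() accepts any number of iterables, so one hypothetical helper covers two groups or ten:

```python
def common(*groups):
    """Return the items present in every group."""
    # Start from the first group, then intersect with all the others
    return set(groups[0]).intersection(*groups[1:])

print(common("abc", "bcd", "cde"))   # {'c'}
print(common([1, 2, 3], [2, 3, 4]))  # {2, 3}
```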

Deduplication

Sets automatically remove duplicates:

numbers = [1, 2, 2, 3, 3, 3, 4]
unique = set(numbers)  # {1, 2, 3, 4}

# Count unique elements
len(set(items))

Coordinate tracking

Perfect for grid puzzles where you track visited positions:

visited = set()
position = (0, 0)

visited.add(position)
visited.add((1, 0))
visited.add((0, 1))

if (1, 0) in visited:
    print("Been there")

# All positions visited
print(len(visited))

Tuples are hashable, so they work as set elements and dictionary keys. Lists are mutable and unhashable, so adding one to a set raises a TypeError.
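Here is a sketch of the typical use: walking a path and counting distinct positions. The single-letter move format and the STEPS and visited_positions names are assumptions made for illustration:

```python
# Hypothetical moves: R/L/U/D, one step each; U increases y in this sketch
STEPS = {'R': (1, 0), 'L': (-1, 0), 'U': (0, 1), 'D': (0, -1)}

def visited_positions(moves):
    x, y = 0, 0
    visited = {(x, y)}       # include the starting square
    for m in moves:
        dx, dy = STEPS[m]
        x, y = x + dx, y + dy
        visited.add((x, y))  # revisiting a square adds nothing new
    return visited

print(len(visited_positions("RRUL")))  # 5
print(len(visited_positions("RLRL")))  # 2
```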

Practice problem: Find duplicates

Find the first character that appears twice in a string:

def first_duplicate(s):
    seen = set()
    for char in s:
        if char in seen:
            return char
        seen.add(char)
    return None

first_duplicate("abcdefgc")  # 'c'
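A related trick combines slicing with sets. A window of n characters contains no repeats exactly when its set has n elements. Here is a sketch that finds the first such window, with first_distinct_window as a hypothetical name:

```python
def first_distinct_window(s, n):
    """Return how many characters are read when the last n are all different."""
    for end in range(n, len(s) + 1):
        # The set is smaller than n whenever a character repeats
        if len(set(s[end - n:end])) == n:
            return end
    return None

print(first_distinct_window("mjqjpqmgbljsphdztnvjfqwrcgsmlb", 4))  # 7
```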

Dictionaries: Key-Value Mapping

Dictionaries map keys to values. Essential for counting, grouping, and fast lookups.

Basic operations

d = {}                    # Empty dict
d = {'a': 1, 'b': 2}      # Dict with values
d = dict(a=1, b=2)        # Alternative syntax

d['a']                    # Get value (KeyError if missing)
d.get('a')                # Get value (None if missing)
d.get('z', 0)             # Get value (0 if missing)

d['c'] = 3                # Set value
del d['c']                # Delete key

'a' in d                  # Key exists?
len(d)                    # Number of keys

Iteration

d = {'a': 1, 'b': 2, 'c': 3}

# Iterate keys (default)
for key in d:
    print(key)

# Iterate values
for value in d.values():
    print(value)

# Iterate key-value pairs
for key, value in d.items():
    print(f"{key}: {value}")

Counting with dictionaries

The classic pattern:

# Count character frequencies
text = "abracadabra"
counts = {}
for char in text:
    if char in counts:
        counts[char] += 1
    else:
        counts[char] = 1
# {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}

The get() method simplifies this:

counts = {}
for char in text:
    counts[char] = counts.get(char, 0) + 1

defaultdict: Automatic defaults

defaultdict eliminates the need for existence checks:

from collections import defaultdict

# Counting
counts = defaultdict(int)  # Missing keys default to 0
for char in text:
    counts[char] += 1

# Grouping
groups = defaultdict(list)  # Missing keys default to []
for item in items:
    groups[item.category].append(item)

# Nested dicts
graph = defaultdict(dict)
graph['a']['b'] = 5  # No KeyError for 'a'

The argument to defaultdict is a function that returns the default value. int returns 0, list returns [], set returns set().
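Here is a concrete grouping example with a made-up word list. It also shows one gotcha: merely reading a missing key inserts it.

```python
from collections import defaultdict

words = ["apple", "fig", "kiwi", "plum", "pear", "date"]

# Group words by length; a missing key starts out as an empty list
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)

print(dict(by_length))
# {5: ['apple'], 3: ['fig'], 4: ['kiwi', 'plum', 'pear', 'date']}

# Caution: looking up a missing key creates it
print(by_length[10])    # []
print(10 in by_length)  # True
```

If you only want to check for a key, use `in` or get(), which do not insert anything.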

Counter: Purpose-built counting

For pure counting, Counter is even cleaner:

from collections import Counter

# Count frequencies
freq = Counter("abracadabra")
# Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

freq['a']           # 5
freq['z']           # 0 (missing keys return 0, no error)
freq.most_common(2) # [('a', 5), ('b', 2)]

# Count anything iterable
Counter([1, 1, 2, 3, 3, 3])
Counter(words)

Building graphs

Dictionaries naturally represent graphs:

# Adjacency list
graph = {
    'a': ['b', 'c'],
    'b': ['a', 'd'],
    'c': ['a', 'd'],
    'd': ['b', 'c']
}

# With weights
graph = {
    'a': {'b': 5, 'c': 3},
    'b': {'a': 5, 'd': 2},
}
# Distance from a to b is graph['a']['b']
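In practice you build the weighted graph from input lines. Here is a sketch assuming a hypothetical "a to b = 5" line format with undirected edges:

```python
from collections import defaultdict

# Hypothetical input format: "a to b = 5", one undirected edge per line
data = """a to b = 5
a to c = 3
b to d = 2"""

graph = defaultdict(dict)
for line in data.split('\n'):
    a, _, b, _, dist = line.split()
    graph[a][b] = int(dist)
    graph[b][a] = int(dist)  # undirected: store both directions

print(graph['a']['b'])  # 5
print(graph['d'])       # {'b': 2}
```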

Practice problem: Word frequency

Count word frequencies in text, ignoring case:

from collections import Counter

text = "The quick brown fox jumps over the lazy dog"

words = text.lower().split()
freq = Counter(words)

# Most common word
freq.most_common(1)  # [('the', 2)]

Putting It Together

Here’s a realistic AoC-style problem that uses all these skills:

Problem: Given input describing connections between rooms, find how many rooms are reachable from “start” (counting “start” itself).

start-A
start-B
A-C
A-D
B-D
C-end
D-end

Solution:

from collections import defaultdict, deque

data = """start-A
start-B
A-C
A-D
B-D
C-end
D-end"""

# Parse into adjacency list
graph = defaultdict(list)
for line in data.strip().split('\n'):
    a, b = line.split('-')
    graph[a].append(b)
    graph[b].append(a)

# BFS to find reachable rooms.
# deque.popleft() is O(1); list.pop(0) shifts every remaining element.
visited = set()
queue = deque(['start'])

while queue:
    room = queue.popleft()
    if room in visited:
        continue
    visited.add(room)

    for neighbor in graph[room]:
        if neighbor not in visited:
            queue.append(neighbor)

print(f"Reachable rooms: {len(visited)}")
# Reachable rooms: 6
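A small variation is to store a step count instead of just a visited flag. Because BFS reaches rooms in order of distance, the first time it sees a room is along a shortest path. Here is a sketch on the same connections, with bfs_distances as a hypothetical name:

```python
from collections import defaultdict, deque

edges = ["start-A", "start-B", "A-C", "A-D", "B-D", "C-end", "D-end"]

graph = defaultdict(list)
for edge in edges:
    a, b = edge.split('-')
    graph[a].append(b)
    graph[b].append(a)

def bfs_distances(graph, source):
    """Return the fewest steps from source to every reachable room."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        room = queue.popleft()
        for neighbor in graph[room]:
            if neighbor not in dist:  # first visit is the shortest in BFS
                dist[neighbor] = dist[room] + 1
                queue.append(neighbor)
    return dist

dist = bfs_distances(graph, 'start')
print(len(dist))    # 6
print(dist['end'])  # 3
```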

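This uses string parsing (strip(), split(), and tuple unpacking), a defaultdict(list) as an adjacency list, and a set to track visited rooms.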


Exercises

Try these before moving to Part 2:

  1. Parse coordinates: Given "(3, 7) -> (5, 7)", extract both coordinate pairs as tuples of integers.

  2. Count letters: Given a string, return a dictionary mapping each letter to its count, ignoring non-letters.

  3. Find intersection: Given two lists of numbers, find which numbers appear in both (use sets).

  4. Group by first letter: Given a list of words, create a dictionary mapping first letters to lists of words starting with that letter.

  5. Parse grid: Given a grid of digits as a string, parse it into a 2D list of integers.


Ready to practice? Head to adventofcode.com and try some past puzzles. The 2015-2024 archives are all available.

Next in this series: Part 2: Core Skills covers 2D grid navigation, BFS/DFS, and recursion with memoization.
