A Brief Explanation of yield
In Python, when a function uses the yield keyword, it becomes a special type of function called a generator.
Think of a normal function with return as a worker who does a job, gives you the single final result, and goes home. A generator function with yield is like a worker on an assembly line who gives you one finished product at a time, waits for you to be ready for the next one, and then continues working from where they left off.
- return ends the function completely.
- yield pauses the function, “hands off” a value, and remembers its exact state, ready to resume right where it paused.
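To make the contrast concrete, here is a minimal sketch that computes the same values twice, once with return and once with yield. The function names first_squares_list and first_squares_gen are made up for this illustration.

```python
def first_squares_list(n):
    """Normal function: builds every result, then returns them all at once."""
    results = []
    for i in range(1, n + 1):
        results.append(i * i)
    return results


def first_squares_gen(n):
    """Generator function: hands back one result at a time."""
    for i in range(1, n + 1):
        yield i * i


all_at_once = first_squares_list(3)   # the whole list exists immediately
one_at_a_time = first_squares_gen(3)  # no squares have been computed yet

print(all_at_once)          # [1, 4, 9]
print(next(one_at_a_time))  # 1
print(next(one_at_a_time))  # 4 (resumed right where it paused)
```

The list version does all of its work before you see anything. The generator version does just enough work to produce the value you asked for.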
Simple Example of yield
Here is a simple counter that shows how yield works.
# This is a generator function
def simple_counter(max_number):
    print("Generator starting...")
    num = 1
    while num <= max_number:
        # The function pauses here and hands 'num' back to the loop
        yield num
        num += 1
    print("Generator finished!")

# --- How to use the generator ---
# Note: The "Generator starting..." message does not print yet!
# We are just creating the generator object.
my_gen = simple_counter(3)
print("Now, let's start the loop...")
for number in my_gen:
    print(f"The loop received: {number}")
Output:
Now, let's start the loop...
Generator starting...
The loop received: 1
The loop received: 2
The loop received: 3
Generator finished!
Notice how the code inside the generator only runs when the for loop asks for the next item.
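A for loop "asks for the next item" by calling the built-in next() on the generator until it raises StopIteration. The sketch below does the same thing by hand; count_up_to is a hypothetical, print-free variant of the counter used only for this illustration.

```python
def count_up_to(max_number):
    """A quiet counter: yields 1, 2, ..., max_number."""
    num = 1
    while num <= max_number:
        yield num
        num += 1


gen = count_up_to(2)

first = next(gen)   # runs the body until the first yield
second = next(gen)  # resumes after the yield, loops, and yields again

try:
    next(gen)       # the while loop ends, so the generator is exhausted
    finished = False
except StopIteration:
    finished = True

print(first, second, finished)  # 1 2 True
```

A for loop catches StopIteration for you, which is why the loop simply ends when the generator has nothing left to yield.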
Visualizing parse_fasta Step-by-Step
Now, let’s apply that “pausing” concept to our FASTA parser. The yield is the moment the machine hands off a completed Sequence object.
Our File: test.fasta
>SEQ1
ACGT
GGGG
>SEQ2
TTT
Step 1: The Machine Starts
The function is called and, once the first item is requested, the file is opened and the initial state is set. The code is ready, but the loop over the file's lines hasn’t started.
- name is None.
- sequence_lines is [].
Code Executing:
def parse_fasta(cls, filepath: str):
    with open(filepath, 'r') as f:
        name = None
        sequence_lines = []
        for line in f:
            # The loop is about to start...
Step 2: Reading >SEQ1
The for loop over the file asks for its first line and reads >SEQ1. Because name is still None, there is nothing to yield yet, so the generator skips the yield and only updates its state.
Code Executing:
# line is ">SEQ1"
if line.startswith('>'):
    if name is not None:  # This is false
        yield ...
    # The state is updated for the new sequence.
    name = line[1:].strip()  # name becomes "SEQ1"
    sequence_lines = []
Step 3: Reading Sequence Data
The for loop continues. The generator reads the ACGT and GGGG lines, appending them to its internal list. It hasn’t hit a yield yet.
Code Executing:
# For line "ACGT" and then for "GGGG":
if line.startswith('>'):  # This is false.
    ...
elif name is not None:  # This is true. The line is appended.
    sequence_lines.append(line)
Step 4: The First YIELD
This is the key! The for loop asks for the next item, and the generator reads >SEQ2. It sees the > and knows it has just finished SEQ1.
Code Executing:
# line is ">SEQ2"
if line.startswith('>'):
    # This time, `name` is "SEQ1", so the condition is TRUE!
    if name is not None:
        # The function PAUSES here and hands off the value.
        yield cls("".join(sequence_lines), name)
    # After yielding, it resets for the new sequence.
    name = line[1:].strip()
    sequence_lines = []
Step 5: Finishing Up
- The caller's for loop asks for another item. The generator resumes from where it paused and resets its state for SEQ2.
- It reads the TTT line and adds it to sequence_lines.
- It reaches the end of the file, and the generator's own for loop over the lines finishes.
- The code after the loop runs to handle the very last sequence.
Code Executing (After the loop):
# The for loop has finished.
if name is not None:  # `name` is "SEQ2", so this is TRUE.
    # The last sequence is yielded.
    yield cls("".join(sequence_lines), name)
The machine yields the final Sequence("TTT", "SEQ2") object, and the function is complete.
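Putting the steps together, the whole parser might look like the sketch below. Several pieces are assumptions rather than the original code: the minimal Sequence class is a hypothetical stand-in with a (sequence, name) constructor, the @classmethod decorator is inferred from the cls parameter, and each line is stripped (with blank lines skipped) so that trailing newlines do not end up in the joined sequence.

```python
import os
import tempfile


class Sequence:
    """Hypothetical minimal stand-in for the tutorial's Sequence class."""

    def __init__(self, sequence, name):
        self.sequence = sequence
        self.name = name

    @classmethod
    def parse_fasta(cls, filepath: str):
        with open(filepath, "r") as f:
            name = None
            sequence_lines = []
            for line in f:
                line = line.strip()  # assumption: drop the trailing newline
                if not line:
                    continue  # assumption: skip blank lines
                if line.startswith(">"):
                    if name is not None:
                        # A new header means the previous record is complete.
                        yield cls("".join(sequence_lines), name)
                    name = line[1:].strip()
                    sequence_lines = []
                elif name is not None:
                    sequence_lines.append(line)
            # After the loop: hand off the last record, if there was one.
            if name is not None:
                yield cls("".join(sequence_lines), name)


# Write the example file to a temporary location and parse it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.fasta")
    with open(path, "w") as handle:
        handle.write(">SEQ1\nACGT\nGGGG\n>SEQ2\nTTT\n")
    records = [(seq.name, seq.sequence) for seq in Sequence.parse_fasta(path)]

print(records)  # [('SEQ1', 'ACGTGGGG'), ('SEQ2', 'TTT')]
```

Because parse_fasta is a generator, only one record's lines are held in memory at a time, which is the main reason to prefer yield over building a full list for large FASTA files.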