A Brief Explanation of yield
In Python, when a function uses the yield keyword, it becomes a special type of function called a generator.
Think of a normal function with return as a worker who does a job, gives you the single final result, and goes home. A generator function with yield is like a worker on an assembly line who gives you one finished product at a time, waits for you to be ready for the next one, and then continues working from where they left off.
- return ends the function completely.
- yield pauses the function, “hands off” a value, and remembers its exact state, ready to resume right where it paused.
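The difference can be seen side by side in a minimal sketch. The two function names below are made up for illustration and are not part of the FASTA parser discussed later.

```python
def first_three_with_return():
    # A regular function: builds the whole result, returns once, and is done.
    return [1, 2, 3]


def first_three_with_yield():
    # A generator function: hands back one value per request and pauses in between.
    yield 1
    yield 2
    yield 3


numbers = first_three_with_return()  # the body runs to completion right away
gen = first_three_with_yield()       # nothing in the body has run yet

first = next(gen)   # runs up to the first yield, then pauses
second = next(gen)  # resumes after the first yield, runs to the second
print(numbers, first, second)
```

Each call to next() resumes the generator exactly where it paused, which is what a for loop does for you automatically.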
Simple Example of yield
Here is a simple counter that shows how yield works.
# This is a generator function
def simple_counter(max_number):
    print("Generator starting...")
    num = 1
    while num <= max_number:
        # The function pauses here and hands 'num' back to the loop
        yield num
        num += 1
    print("Generator finished!")

# --- How to use the generator ---
# Note: The "Generator starting..." message does not print yet!
# We are just creating the generator object.
my_gen = simple_counter(3)
print("Now, let's start the loop...")
for number in my_gen:
    print(f"The loop received: {number}")
Output:
Now, let's start the loop...
Generator starting...
The loop received: 1
The loop received: 2
The loop received: 3
Generator finished!
Notice how the code inside the generator only runs when the for loop asks for the next item.
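To make the "asks for the next item" part concrete, here is a rough sketch of what a for loop does behind the scenes: it calls next() repeatedly and stops when the generator raises StopIteration. The count_up_to function is a hypothetical stand-in for simple_counter with the print calls removed.

```python
def count_up_to(max_number):
    # Same idea as simple_counter, without the print calls.
    num = 1
    while num <= max_number:
        yield num
        num += 1


gen = count_up_to(2)
received = []
while True:
    try:
        value = next(gen)      # ask the generator to run until its next yield
    except StopIteration:      # raised once the generator function returns
        break
    received.append(value)

print(received)
```

This is only an approximation of the loop machinery, but it shows that the generator does no work until something asks it for a value.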
Visualizing parse_fasta Step-by-Step
Now, let’s apply that “pausing” concept to our FASTA parser. The yield is the moment the machine hands off a completed Sequence object.
Our File: test.fasta
>SEQ1
ACGT
GGGG
>SEQ2
TTT
Step 1: The Machine Starts
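The function is called, which only creates the generator object. As soon as the first item is requested, the file is opened and the initial state is set. The code is ready, but the loop over the file's lines hasn't started.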
- name is None.
- sequence_lines is [].
Code Executing:
def parse_fasta(cls, filepath: str):
    with open(filepath, 'r') as f:
        name = None
        sequence_lines = []
        for line in f:
            # The loop is about to start...
Step 2: Reading >SEQ1
The caller's for loop asks for its first item, so the generator starts running and will keep going until it reaches a yield. It first reads the >SEQ1 line and updates its state.
Code Executing:
# line is ">SEQ1"
if line.startswith('>'):
    if name is not None: # This is false
        yield ...
    # The state is updated for the new sequence.
    name = line[1:].strip() # name becomes "SEQ1"
    sequence_lines = []
Step 3: Reading Sequence Data
The generator's own loop over the file continues. It reads the ACGT and GGGG lines, appending them to its internal list. It hasn’t hit a yield yet, so the caller is still waiting for its first item.
Code Executing:
# For line "ACGT" and then for "GGGG":
if line.startswith('>'): # This is false.
    ...
elif name is not None:
    # This is true. The line is appended.
    sequence_lines.append(line)
Step 4: The First YIELD
This is the key! The generator's loop moves on to the next line of the file and reads >SEQ2. It sees the > and knows it has just finished SEQ1.
Code Executing:
# line is ">SEQ2"
if line.startswith('>'):
    # This time, `name` is "SEQ1", so the condition is TRUE!
    if name is not None:
        # The function PAUSES here and hands off the value.
        yield cls("".join(sequence_lines), name)
    # After yielding, it resets for the new sequence.
    name = line[1:].strip()
    sequence_lines = []
Step 5: Finishing Up
- The caller's for loop asks for another item. The generator resumes from where it paused and resets its state for SEQ2.
- It reads the TTT line and adds it to sequence_lines.
- It reaches the end of the file, and the generator's for loop over the file finishes.
- The code after the loop runs to handle the very last sequence.
Code Executing (After the loop):
# The for loop has finished.
if name is not None:
    # `name` is "SEQ2", so this is TRUE.
    # The last sequence is yielded.
    yield cls("".join(sequence_lines), name)
The machine yields the final Sequence("TTT", "SEQ2") object, and the function is complete.
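Putting the five steps together, the whole parser might look roughly like the sketch below. The Sequence class here is a hypothetical minimal stand-in (only its cls(sequence, name) argument order is taken from the snippets above), and stripping each line and skipping blank lines are assumptions added so the joined sequence contains no newlines.

```python
import os
import tempfile


class Sequence:
    # Hypothetical minimal class; argument order follows cls(sequence, name).
    def __init__(self, sequence: str, name: str):
        self.sequence = sequence
        self.name = name

    @classmethod
    def parse_fasta(cls, filepath: str):
        with open(filepath, "r") as f:
            name = None
            sequence_lines = []
            for line in f:
                line = line.strip()  # assumption: drop the trailing newline
                if line.startswith(">"):
                    if name is not None:
                        # A new header means the previous record is complete.
                        yield cls("".join(sequence_lines), name)
                    name = line[1:].strip()
                    sequence_lines = []
                elif name is not None and line:  # assumption: skip blank lines
                    sequence_lines.append(line)
            if name is not None:
                # The last record has no following header, so yield it here.
                yield cls("".join(sequence_lines), name)


# Write the example file to a temporary directory, then parse it.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.fasta")
    with open(path, "w") as f:
        f.write(">SEQ1\nACGT\nGGGG\n>SEQ2\nTTT\n")
    records = [(seq.name, seq.sequence) for seq in Sequence.parse_fasta(path)]

print(records)
```

Because parse_fasta is a generator, only one record's lines are held in memory at a time, which is the main reason to prefer yield over building a full list for large FASTA files.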