A Brief Explanation of yield and the parse_fasta() Algorithm

A Brief Explanation of yield

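In Python, when a function uses the yield keyword, it becomes a special type of function called a generator function. Calling it does not run the body right away; it returns a generator object that produces values one at a time.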

Think of a normal function with return as a worker who does a job, gives you the single final result, and goes home. A generator function with yield is like a worker on an assembly line who gives you one finished product at a time, waits for you to be ready for the next one, and then continues working from where they left off.

  • return ends the function completely.
  • yield pauses the function, “hands off” a value, and remembers its exact state, ready to resume right where it paused.
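
To make the contrast concrete, here is a minimal sketch. The function names first_square and squares are hypothetical, chosen only for illustration.

```python
def first_square(numbers):
    # return ends the function on the very first pass through the loop
    for n in numbers:
        return n * n

def squares(numbers):
    # yield hands back one value, then pauses until the next one is requested
    for n in numbers:
        yield n * n

returned = first_square([1, 2, 3])
yielded = list(squares([1, 2, 3]))

print(returned)  # 1
print(yielded)   # [1, 4, 9]
```

The loop in first_square never reaches 2 or 3, while squares keeps its place in the loop between requests.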

Simple Example of yield

Here is a simple counter that shows how yield works.

# This is a generator function
def simple_counter(max_number):
    print("Generator starting...")
    num = 1
    while num <= max_number:
        # The function pauses here and hands 'num' back to the loop
        yield num
        num += 1
    print("Generator finished!")

# --- How to use the generator ---
# Note: The "Generator starting..." message does not print yet!
# We are just creating the generator object.
my_gen = simple_counter(3)

print("Now, let's start the loop...")
for number in my_gen:
    print(f"The loop received: {number}")

Output:

Now, let's start the loop...
Generator starting...
The loop received: 1
The loop received: 2
The loop received: 3
Generator finished!

Notice how the code inside the generator only runs when the for loop asks for the next item.
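
A for loop is doing this "asking" for you by calling next() behind the scenes. As a small sketch, the same counter (with the print calls removed for brevity) can be stepped by hand:

```python
def simple_counter(max_number):
    num = 1
    while num <= max_number:
        yield num
        num += 1

gen = simple_counter(2)
first = next(gen)    # runs the body up to the first yield
second = next(gen)   # resumes after the yield, loops, and yields again

try:
    next(gen)        # the body runs off the end, which raises StopIteration
    finished = False
except StopIteration:
    finished = True

print(first, second, finished)  # 1 2 True
```

StopIteration is the signal a for loop uses to know the generator is done.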


Visualizing parse_fasta Step-by-Step

Now, let’s apply that “pausing” concept to our FASTA parser. The yield is the moment the machine hands off a completed Sequence object.

Our File: test.fasta

>SEQ1
ACGT
GGGG
>SEQ2
TTT

Step 1: The Machine Starts

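Calling parse_fasta only creates a generator object; no code in the body runs yet. Once the first item is requested, the file is opened and the initial state is set. The generator’s internal loop over the file is about to start.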

  • name is None.
  • sequence_lines is [].

Code Executing:

@classmethod
def parse_fasta(cls, filepath: str):
    with open(filepath, 'r') as f:
        name = None
        sequence_lines = []
        for line in f:
            # The loop is about to start...
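
One consequence worth checking: because the body of a generator runs only on demand, open() is not called at the moment the generator is created. The sketch below uses a hypothetical stand-in, read_lines, and a path that is assumed not to exist, to show that the error appears only when the first item is requested.

```python
import os
import tempfile

def read_lines(filepath):
    # Hypothetical stand-in for parse_fasta: a generator that opens a file
    with open(filepath, "r") as f:
        for line in f:
            yield line

# A path inside a fresh, empty temporary directory, so the file cannot exist
missing_path = os.path.join(tempfile.mkdtemp(), "missing.fasta")

gen = read_lines(missing_path)  # no error here: the body has not run yet

try:
    next(gen)                   # open() runs only now
    opened_lazily = False
except FileNotFoundError:
    opened_lazily = True

print(opened_lazily)  # True
```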

Step 2: Reading >SEQ1

The generator is still working toward its first yield. Its internal for loop reads the >SEQ1 line; no record is in progress yet, so nothing is yielded and only the state is updated.

Code Executing:

# line is ">SEQ1"
if line.startswith('>'):
    if name is not None: # This is false
        yield ...

    # The state is updated for the new sequence.
    name = line[1:].strip()  # name becomes "SEQ1"
    sequence_lines = []

Step 3: Reading Sequence Data

The generator’s internal for loop continues. It reads the ACGT and GGGG lines, appending them to its internal list. It hasn’t hit a yield yet, so the caller is still waiting for its first item.

Code Executing:

# For line "ACGT" and then for "GGGG":
if line.startswith('>'):  # This is false.
    ...
elif name is not None:
    # This is true. The line is appended without its trailing newline.
    sequence_lines.append(line.strip())
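
Lines read from a file keep their trailing newline, so whether each line is stripped before it is appended changes what "".join(sequence_lines) produces later. A quick check, using io.StringIO as a stand-in for the open file:

```python
import io

# io.StringIO stands in for the open file; the two lines come from test.fasta
f = io.StringIO("ACGT\nGGGG\n")
raw_lines = list(f)

joined_raw = "".join(raw_lines)
joined_stripped = "".join(line.strip() for line in raw_lines)

print(repr(joined_raw))       # 'ACGT\nGGGG\n'
print(repr(joined_stripped))  # 'ACGTGGGG'
```

Only the stripped version gives a clean sequence string.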

Step 4: The First YIELD

This is the key! The generator’s internal loop reads >SEQ2. It sees the > and knows it has just finished SEQ1, so it yields the completed SEQ1 record. This is the first item the caller’s for loop receives.

Code Executing:

# line is ">SEQ2"
if line.startswith('>'):
    # This time, `name` is "SEQ1", so the condition is TRUE!
    if name is not None:
        # The function PAUSES here and hands off the value.
        yield cls("".join(sequence_lines), name)

    # When the generator is resumed, it resets for the new sequence.
    name = line[1:].strip()
    sequence_lines = []

Step 5: Finishing Up

  • The caller’s for loop asks for another item. The generator resumes right after the yield and resets its state: name becomes "SEQ2" and sequence_lines becomes [].
  • It reads the TTT line and adds it to sequence_lines.
  • It reaches the end of the file, and the generator’s internal for loop finishes.
  • The code after the loop runs to handle the very last sequence.

Code Executing (After the loop):

        # The for loop has finished.
        if name is not None:
            # `name` is "SEQ2", so this is TRUE.
            # The last sequence is yielded.
            yield cls("".join(sequence_lines), name)

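The machine yields the final Sequence("TTT", "SEQ2") object. When the caller asks for one more item, the function runs to its end, the with block closes the file, and the caller’s for loop stops.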
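
Putting the steps together, the fragments above can be assembled into one runnable sketch. The Sequence class here is a minimal hypothetical version (its attribute names sequence and name are assumptions), each line is stripped before it is stored, and test.fasta is written to a temporary directory so the example is self-contained.

```python
import os
import tempfile

class Sequence:
    # Minimal hypothetical class; the real Sequence may hold more than this.
    def __init__(self, sequence: str, name: str):
        self.sequence = sequence
        self.name = name

    @classmethod
    def parse_fasta(cls, filepath: str):
        with open(filepath, "r") as f:
            name = None
            sequence_lines = []
            for line in f:
                if line.startswith(">"):
                    # A new header means the previous record is complete
                    if name is not None:
                        yield cls("".join(sequence_lines), name)
                    name = line[1:].strip()
                    sequence_lines = []
                elif name is not None:
                    sequence_lines.append(line.strip())
            # No header follows the last record, so yield it after the loop
            if name is not None:
                yield cls("".join(sequence_lines), name)

# Recreate the test.fasta file from the walkthrough
path = os.path.join(tempfile.mkdtemp(), "test.fasta")
with open(path, "w") as f:
    f.write(">SEQ1\nACGT\nGGGG\n>SEQ2\nTTT\n")

records = [(s.name, s.sequence) for s in Sequence.parse_fasta(path)]
print(records)  # [('SEQ1', 'ACGTGGGG'), ('SEQ2', 'TTT')]
```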