Codepth.dev

From .NET to AI Engineer — Part 1: Unlearning the Loop

This is Part 1 of a series on how an experienced software engineer can stay current by adding AI to their skill set, stage by stage: theory first, then the thing I built with it. I came at it from a .NET background; if you're a backend or .NET developer, this is the series I wish I'd had.

I spent years writing C#. So when I opened my first real Python data script, my hands did what they always did: reached for a loop. A row comes in, you iterate, you process it, you move on. That instinct served me well for a decade.

It's also the first thing you have to unlearn.

Stage 1 took me about three days, and almost all of it was retraining that one reflex. The Python syntax is easy to pick up when you already know a C-family language. The genuinely hard part — the part that actually matters for everything later in AI engineering — is changing how you think about data. So that's where I'll spend most of this post.

One note before we start: I learned this in Python because I already knew it and it's where most AI tooling lives — but if you'd rather stay in .NET, you can, and I'll publish a .NET version of this path soon. The mindset shift in this part matters far more than the language it's written in.

The plan: learn first, then build

Every stage in this series follows the same rhythm, because it's the rhythm I studied with: a few hours of theory in the morning, then four or five hours building something real in the afternoon, and a push to GitHub at night. No copy-pasting — I typed everything, because typing is where the errors (and the learning) actually happen.

For Stage 1, the theory is four ideas, and the build is one project: a stock data engine that cleans a large CSV and computes financial metrics — without a single for loop over the rows.

Theory

1. The Python mindset, coming from C#

Most of the syntax maps over cleanly. The differences that bit me early were the idioms — the Pythonic way of doing things that looks alien at first and then becomes second nature.

List comprehensions are the obvious one. Where in C# I'd reach for LINQ:

var squares = numbers.Select(n => n * n).ToList();

In Python that's a comprehension:

squares = [n * n for n in numbers]

Generators were the idea that took longer to click. A generator produces values lazily, one at a time, instead of building the whole list in memory — which matters enormously when you're streaming through a file too big to load at once. And decorators (@something above a function) are just wrappers — close enough to C# attributes combined with middleware that the concept felt familiar even if the syntax didn't.
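Here's a minimal sketch of both ideas. The names (read_closes, log_calls, total) and the sample lines are made up for illustration, and a short in-memory list stands in for a large file:

```python
import functools

def read_closes(lines):
    """Yield one closing price at a time instead of building a full list."""
    for line in lines:
        # Each line is assumed to look like "2024-01-02,101.5" (date,close)
        _, close = line.strip().split(",")
        yield float(close)

def log_calls(func):
    """Decorator: wrap func so every call is recorded before it runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append(func.__name__)
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper

@log_calls
def total(values):
    return sum(values)

sample_lines = ["2024-01-02,101.5", "2024-01-03,102.0", "2024-01-04,99.5"]
closes = read_closes(sample_lines)   # nothing has been parsed yet
result = total(closes)               # values are pulled one at a time here
```

Calling read_closes does no work until something consumes it, and once consumed the generator is spent. The decorator replaces total with wrapper at definition time, which is where the "attribute plus middleware" feel comes from.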

2. OOP in Python — and why Pydantic exists

Coming from C#, the unsettling thing about Python is dynamic typing. There's no compile-time type check standing between you and a string you accidentally treated as an int; the mistake only surfaces at runtime, if it surfaces at all. That freedom is nice until your data pipeline silently passes garbage downstream.

This is where Pydantic earned a permanent spot in my toolkit. Pydantic lets you define data models as classes with typed fields, and it validates and coerces the data at runtime — the closest thing Python has to the safety of a strongly-typed DTO:

from pydantic import BaseModel

class LoanApplication(BaseModel):
    customer_id: int
    income: float
    debt: float

# Values that convert cleanly are coerced to the declared types;
# anything that can't be converted (e.g. income="abc") raises a ValidationError
app = LoanApplication(customer_id="42", income="55000", debt=12000)
print(app.customer_id)  # 42  -> coerced to int

If you've ever leaned on model validation in ASP.NET, Pydantic will feel like home. It became the boundary I put around every piece of untrusted data entering my systems.
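To see the failure path as well as the happy path, here's a small sketch. The parse_application helper and the sample records are hypothetical, not part of Pydantic; only BaseModel, ValidationError, and errors() come from the library:

```python
from pydantic import BaseModel, ValidationError

class LoanApplication(BaseModel):
    customer_id: int
    income: float
    debt: float

def parse_application(raw: dict):
    """Return (model, None) on success or (None, bad_field_names) on failure."""
    try:
        return LoanApplication(**raw), None
    except ValidationError as exc:
        # exc.errors() is a list of dicts; "loc" names the offending field
        return None, [err["loc"][0] for err in exc.errors()]

# Strings that parse as numbers are coerced
good, good_errors = parse_application(
    {"customer_id": "42", "income": "55000", "debt": 12000}
)

# A non-numeric id and a missing field are both reported in one error
bad, bad_errors = parse_application(
    {"customer_id": "forty-two", "income": "55000"}
)
```

Wrapping construction in one place like this keeps the boundary explicit: everything past it can assume the types are what the model says they are.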

3. The no-loop rule (this is the whole point)

Here is the mental shift. In C#, processing a million rows looks like this:

foreach (var row in rows)
{
    row.DailyReturn = (row.Close - row.Open) / row.Open;
}

You tell the computer how to walk the data, one element at a time. In NumPy and Pandas, you don't. You describe the operation on the entire column at once and let the library run it in optimized, compiled code underneath:

df["daily_return"] = (df["close"] - df["open"]) / df["open"]

That single line does the work of the whole loop, across every row, and on large frames it typically runs far faster (often by an order of magnitude or more), because the iteration happens in compiled C code under the hood instead of in your Python for loop. This is called vectorization, and once it clicks, you start seeing loops as a code smell in data work.
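A quick way to convince yourself the two forms are equivalent is to run both on a tiny frame. This is a sketch with made-up prices; only the column names match the ones used in this post:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "open":  [100.0, 102.0, 101.0],
    "close": [102.0, 101.0, 104.03],
})

# Loop version: Python visits each row and computes one value at a time
looped = []
for _, row in df.iterrows():
    looped.append((row["close"] - row["open"]) / row["open"])

# Vectorized version: one expression over whole columns
df["daily_return"] = (df["close"] - df["open"]) / df["open"]

# Both approaches produce the same numbers
same = np.allclose(looped, df["daily_return"])
```

On three rows the speed difference is invisible; the gap only shows up as the row count grows, so it's worth timing both on your own data before trusting any rule of thumb.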

The rule I gave myself: if I'm writing a for loop over rows of a DataFrame, I'm probably doing it wrong. Almost every per-row operation has a vectorized form — and the times it genuinely doesn't are rare enough to be worth pausing over.
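The per-row logic that most often tempts you back into a loop is an if/else. One common vectorized form, sketched here with made-up data and hypothetical column names (direction, big_moves), uses np.where and boolean masks:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "open":  [100.0, 102.0, 101.0, 50.0],
    "close": [102.0, 101.0, 101.0, 55.0],
})

# Per-row if/elif/else, expressed on whole columns with nested np.where
df["direction"] = np.where(
    df["close"] > df["open"], "up",
    np.where(df["close"] < df["open"], "down", "flat"),
)

# A boolean mask replaces an "if" filter inside a loop:
# keep only rows that moved more than 5% from the open
move = (df["close"] - df["open"]).abs() / df["open"]
big_moves = df[move > 0.05]
```

The condition is evaluated for every row at once, and the mask selects rows without any explicit iteration.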

4. Virtual environments (a two-minute habit that saves hours)

One unglamorous but essential piece: every project gets its own isolated environment, so one project's library versions can't break another's.

python -m venv .venv
source .venv/bin/activate      # macOS/Linux
.venv\Scripts\activate         # Windows
pip install numpy pandas pydantic

I learned this the hard way later in the series when a stray dependency from one project crashed an unrelated one. Isolate from day one.
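If you want a script to check for itself that it's running inside an environment, one option is to compare sys.prefix with sys.base_prefix. This is a small sketch; the in_virtualenv name is my own, and the check covers environments created with python -m venv:

```python
import sys

def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv-created environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

if not in_virtualenv():
    print("Warning: no virtual environment is active for this interpreter.")
```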

Build: the stock data engine

With the theory in hand, the Stage 1 project was a data engine that ingests a large CSV of market data, cleans it, and computes a set of metrics — all vectorized.

The features I built, and the vectorized one-liners behind them:

import pandas as pd

df = pd.read_csv("prices.csv")

# Daily return
df["daily_return"] = (df["close"] - df["open"]) / df["open"]

# 7-day and 30-day moving averages
df["ma_7"]  = df["close"].rolling(window=7).mean()
df["ma_30"] = df["close"].rolling(window=30).mean()

# Rolling volatility (30-day standard deviation)
df["volatility"] = df["close"].rolling(window=30).std()

# Volume spike: today's volume more than 2x the recent average
df["avg_volume"]   = df["volume"].rolling(window=20).mean()
df["volume_spike"] = df["volume"] > 2 * df["avg_volume"]

Not one loop, and it runs over the whole dataset in a blink.
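One detail worth knowing about the rolling calls: until a window is full, the result is NaN. A tiny sketch with made-up prices and a 3-row window shows the default behaviour and the min_periods option:

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Default: the first two entries are NaN because the window is not full yet
ma_3 = close.rolling(window=3).mean()
# ma_3 -> [NaN, NaN, 11.0, 12.0, 13.0]

# min_periods=1 averages whatever rows are available so far
ma_3_partial = close.rolling(window=3, min_periods=1).mean()
# ma_3_partial -> [10.0, 10.5, 11.0, 12.0, 13.0]
```

The same applies to the 30-day columns above: their first 29 rows are NaN. Comparisons against NaN evaluate to False, so volume_spike is False for the first 19 rows, whatever the volume was.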

For files too large to fit comfortably in memory, this is also where chunking comes in — reading the CSV in batches rather than all at once:

for chunk in pd.read_csv("huge_prices.csv", chunksize=100_000):
    process(chunk)   # handle one batch at a time, keep memory flat

That for loop is the legitimate kind — it's iterating over batches of work, not over individual rows. Worth noticing the difference.
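As a concrete version of what process(chunk) might do, here's a sketch that accumulates totals across batches. The data is made up, and an in-memory string stands in for the large file so the example runs on its own:

```python
import io
import pandas as pd

# In-memory stand-in for a large file (hypothetical data)
csv_text = "close,volume\n10.0,100\n12.0,300\n11.0,200\n13.0,400\n14.0,500\n"

total_volume = 0
close_sum = 0.0
row_count = 0

for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Inside each batch the work is still vectorized
    total_volume += chunk["volume"].sum()
    close_sum += chunk["close"].sum()
    row_count += len(chunk)

# Combine the per-batch totals into a whole-file statistic
mean_close = close_sum / row_count
```

Sums and counts combine cleanly across batches. Rolling metrics are trickier, because a window can straddle two chunks; one approach is to carry the last window-minus-one rows of each chunk into the next.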

The takeaway

Stage 1 isn't really about learning Python syntax — if you're a .NET developer, you'll pick that up in an afternoon. It's about rewiring the instinct that says "loop over the rows." Vectorization is the foundation everything else in AI engineering sits on: the data cleaning, the feature engineering, eventually the embeddings. Get comfortable thinking in whole columns and arrays now, and the later stages get a lot easier.

If you take one thing from this part, let it be the rule: describe the operation on the whole dataset, not the steps to walk through it.

The 3-day plan (if you want to follow along)

This is the actual structure I used — theory in the first part of the day, building in the second. If you want to retrace Stage 1 yourself, here it is at a glance.

| Day | Time | Learn (theory) | Build | Why it matters | Reference | Output |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ~4h | Pythonic syntax, OOP with dynamic typing, Pydantic for type-safe models | Set up a virtual environment; write a Pydantic model that validates sample records | In a language with no compiler, validation is your only safety net for incoming data | Real Python OOP; Pydantic docs | A model that rejects bad data with a clear error |
| 2 | ~6h | The no-loop rule: NumPy & Pandas vectorization | Rewrite per-row calculations as vectorized column operations | Vectorization is the foundation under every later stage: cleaning, features, embeddings | NumPy/Pandas vectorization video | Daily return + moving averages computed with zero loops |
| 3 | ~8h | Consolidate Stage 1 | The stock data engine: load a large CSV, clean it, compute all metrics (chunked if the file is big) | A finished, pushed project is the real proof you can do this, and your future README | | Project pushed to GitHub with a README |

What I used to learn this

Related code: https://github.com/ashaniwale-codestack/stock_alert_intelligence_system

Next up — Part 2: Data Engineering & ETL, where my existing SQL knowledge finally paid off, and I learned how SQLAlchemy maps onto everything I already knew from .NET.