From .NET to AI Engineer — Part 7: A Notebook Isn't a Product

Part 7 and the finale of a series on staying current as a .NET developer by adding AI to your skill set. This part covers days 20–24 of the journey: turning a clever notebook into something other people can actually use.

Here's the good news to end on: this final stage is the one where a backend developer is most at home. Everything that makes an AI feature production-grade — APIs, packaging, caching, rate limits, retries, security, observability — is the work you've done your whole career. The AI is just the new thing inside a very familiar box.

This stage took about five days, and it's less about new concepts than about applying old discipline to a new payload.

Theory and build, together

Wrap it in an API

A model behind a notebook helps nobody. FastAPI exposes your logic as a clean HTTP service, and it'll feel immediately familiar — typed request/response models (Pydantic again), async handlers, dependency injection. If you've built Web APIs in .NET, you already know this shape:

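from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

class Answer(BaseModel):
    answer: str

def rag_pipeline(question: str) -> str:
    # Placeholder: swap in your own retrieval + generation pipeline here.
    raise NotImplementedError

@app.post("/ask", response_model=Answer)
async def ask(q: Query) -> Answer:
    # rag_pipeline is assumed to be a blocking, synchronous call,
    # so run it in a worker thread instead of stalling the event loop.
    answer = await run_in_threadpool(rag_pipeline, q.question)
    return Answer(answer=answer)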

For a quick front end to demo it, Streamlit turns a Python script into a usable web UI in minutes — handy for showing stakeholders something clickable.

Package it

Docker makes "works on my machine" irrelevant by shipping the app and its environment together. Same reasoning you'd use for any service — for AI apps it matters even more, because the dependency stack is heavy and fussy about versions.

Make it cheap and resilient

This is where AI apps differ from ordinary services, and where costs quietly explode if you're careless:

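Caching. Identical or near-identical requests shouldn't hit the model twice. A cache (Redis, say) in front of the model answers repeats without a model call, so every hit saves both the round-trip latency and the token cost. Exact-match keys only catch identical requests; catching near-identical ones means normalizing the input first or matching on embeddings.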
Rate limiting. Protect yourself from runaway usage — and runaway bills.
Retries and timeouts. Model APIs fail and stall. Wrap calls in sensible retries with backoff and hard timeouts, exactly as you would any flaky dependency.
Cost tracking. Log token usage per request so spend never surprises you.
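
To make the four items above concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: `TTLCache` is an in-memory stand-in for Redis, `TokenBucket` is one common rate-limiting scheme among several, and `model_call` is a hypothetical function that returns a dict with `answer` and `tokens` keys. Hard timeouts are left out because they usually belong on the HTTP client you use to reach the model.

```python
import hashlib
import random
import time


def cache_key(question: str) -> str:
    """Collapse whitespace and case so trivially different requests share a key."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis: values expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)


class TokenBucket:
    """Permit `rate` calls per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def call_with_retries(fn, *, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); after a failure wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))


def resilient_ask(question, model_call, cache, bucket, usage_log):
    """Cache first, then rate limit, then call the model with retries."""
    key = cache_key(question)
    cached = cache.get(key)
    if cached is not None:
        usage_log.append({"key": key[:8], "tokens": 0, "cached": True})
        return cached
    if not bucket.allow():
        raise RuntimeError("rate limit exceeded")
    result = call_with_retries(lambda: model_call(question))
    cache.set(key, result["answer"])
    usage_log.append({"key": key[:8], "tokens": result["tokens"], "cached": False})
    return result["answer"]
```

The order matters: cache hits are checked before the rate limiter, so repeated questions cost neither tokens nor quota. In a real service you would probably catch only the provider's transient errors rather than every `Exception`, and write the usage records to your logging or metrics system instead of a list.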

Secure it

AI brings one genuinely new threat: prompt injection — a user (or a document the model reads) sneaking in instructions that hijack its behavior. Treat all model input and output as untrusted, keep tools on a tight allowlist, and never let a model's raw output trigger a dangerous action unchecked. The old rule holds, with a new attacker: never trust input.
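
One way to apply the allowlist idea is to validate a model's proposed tool call before anything executes it. The sketch below assumes the model is asked to reply with JSON shaped like `{"tool": ..., "args": {...}}`; that shape, the tool names, and the `validate_tool_call` helper are all hypothetical, not any provider's API.

```python
import json

# Hypothetical tools and the exact argument names each one accepts.
ALLOWED_TOOLS = {
    "search_docs": {"query"},
    "get_order_status": {"order_id"},
}
MAX_ARG_LENGTH = 500


class ToolCallRejected(Exception):
    """Raised when model output fails validation; the caller should not act on it."""


def validate_tool_call(raw_model_output: str) -> dict:
    """Parse and validate a model-proposed tool call before anything executes it."""
    try:
        call = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise ToolCallRejected("output is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ToolCallRejected("expected a JSON object")

    name = call.get("tool")
    args = call.get("args")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ToolCallRejected(f"tool {name!r} is not on the allowlist")
    if not isinstance(args, dict) or set(args) != ALLOWED_TOOLS[name]:
        raise ToolCallRejected(f"unexpected arguments for {name!r}")
    for value in args.values():
        if not isinstance(value, str) or len(value) > MAX_ARG_LENGTH:
            raise ToolCallRejected("arguments must be short strings")
    return {"tool": name, "args": args}
```

A check like this does not stop injection from happening; it limits what a hijacked model can do. Anything the model asks for that is not explicitly permitted is refused, which is the same deny-by-default stance you would take with any untrusted caller.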

See inside it

You can't fix what you can't see. Observability tooling (LangSmith and similar) traces each step of a chain or agent — what was retrieved, what the model was sent, what it returned — so a bad answer becomes debuggable instead of mysterious. It's logging and tracing, adapted to non-deterministic calls.
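
To show what such a trace boils down to, here is a hand-rolled sketch of per-step tracing. It is not LangSmith's API; the `Trace` class and its record fields are assumptions chosen for illustration. Each step records its inputs, output, any error, and how long it took, all tied together by one trace ID.

```python
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Collects one record per pipeline step, linked by a shared trace_id."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.spans = []

    @contextmanager
    def span(self, name, **inputs):
        record = {
            "trace_id": self.trace_id,
            "name": name,
            "inputs": inputs,
            "output": None,
            "error": None,
        }
        start = time.perf_counter()
        try:
            yield record  # the step fills in record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise  # tracing must not swallow failures
        finally:
            record["duration_ms"] = (time.perf_counter() - start) * 1000
            self.spans.append(record)


# Usage: wrap each step of a (hypothetical) RAG pipeline.
trace = Trace()
with trace.span("retrieve", question="What is RAG?") as step:
    step["output"] = ["doc-1", "doc-2"]  # stand-in for real retrieval results
with trace.span("generate", prompt="Answer using doc-1, doc-2") as step:
    step["output"] = "stand-in model answer"
```

After a bad answer, `trace.spans` tells you whether retrieval returned the wrong documents or the model mishandled good ones. Hosted tools add storage, search, and a UI on top of the same basic record.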

The takeaway — and the end of the road

The thing I most want a fellow backend developer to hear: you were already most of the way there. The path from .NET to AI engineering isn't about abandoning what you know — it's about adding a thin, learnable layer of AI concepts on top of the engineering foundation you've spent years building. Data pipelines, APIs, validation, caching, security, deployment: that's the hard, durable part, and it's already yours.

Seven parts, six stages, about five weeks. From unlearning a loop to deploying an agent. If you've followed along, you don't just understand AI engineering — you've built the portfolio that proves it.

The 5-day plan (if you want to follow along)

Day	Time	Learn (theory)	Build	Why it matters	Reference	Output
20	~6h	FastAPI — async, Pydantic models, DI	Expose your pipeline as an HTTP endpoint	A model behind a notebook helps nobody	FastAPI tutorial	A working `/ask` API
21	~6h	Streamlit; Docker	Add a simple UI; containerize the app	"Works on my machine" stops being your problem	FastAPI + Docker docs	A containerized, clickable app
22	~7h	Caching, rate limiting, retries	Add a Redis cache and retry/timeout logic	This is where AI costs and failures are tamed	Redis docs	A cheaper, resilient service
23	~6h	Prompt-injection security; cost tracking	Sanitize I/O; log token usage per request	New attacker, old rule: never trust input	provider safety guidance	A safer, cost-aware app
24	~6h	Observability; deployment	Add tracing; deploy it somewhere	You can't fix what you can't see	LangSmith docs	A deployed, observable app — done

What I used to learn this

FastAPI tutorial: https://fastapi.tiangolo.com/tutorial/
Redis docs: https://redis.io/docs/latest/
LangSmith (observability): https://docs.smith.langchain.com/
DeepLearning.AI — Prompt Compression and Query Optimization (cost/latency): https://www.deeplearning.ai/short-courses/

That's the series. If it helped, the best thing you can do is build your own version of one of these stages and write up where it broke — the broken parts are where the real learning is.