From .NET to AI Engineer — Part 7: A Notebook Isn't a Product
Part 7, the finale of the series on staying current by adding AI skills to a .NET background. Days 20–24 of the journey: turning a clever notebook into something other people can actually use.
Here's the good news to end on: this final stage is the one where a backend developer is most at home. Everything that makes an AI feature production-grade — APIs, packaging, caching, rate limits, retries, security, observability — is the work you've done your whole career. The AI is just the new thing inside a very familiar box.
This stage took about five days, and it's less about new concepts than about applying old discipline to a new payload.
Theory and build, together
Wrap it in an API
A model behind a notebook helps nobody. FastAPI exposes your logic as a clean HTTP service, and it'll feel immediately familiar — typed request/response models (Pydantic again), async handlers, dependency injection. If you've built Web APIs in .NET, you already know this shape:
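from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

@app.post("/ask")
async def ask(q: Query):
    # rag_pipeline is your existing pipeline function (not defined in this snippet).
    # If it blocks, declare this handler with plain `def` so FastAPI runs it in a thread pool.
    answer = rag_pipeline(q.question)
    return {"answer": answer}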
For a quick front end to demo it, Streamlit turns a Python script into a usable web UI in minutes — handy for showing stakeholders something clickable.
Package it
Docker makes "works on my machine" irrelevant by shipping the app and its environment together. Same reasoning you'd use for any service — for AI apps it matters even more, because the dependency stack is heavy and fussy about versions.
Make it cheap and resilient
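This is where AI apps differ from ordinary services: every model call is billed by the token, so costs quietly explode if you're careless.
- Caching. Identical requests shouldn't hit the model twice, and near-identical ones (same question, different casing or whitespace) can share a cache key once normalized. A cache (Redis, say) in front of the model removes both the latency and the token cost of every repeated call.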
- Rate limiting. Protect yourself from runaway usage — and runaway bills.
- Retries and timeouts. Model APIs fail and stall. Wrap calls in sensible retries with backoff and hard timeouts, exactly as you would any flaky dependency.
- Cost tracking. Log token usage per request so spend never surprises you.
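The four bullets above can be combined into one request path. What follows is a minimal standard-library sketch, not a production implementation: `TTLCache` is an in-memory stand-in for Redis, the names `TokenBucket`, `call_with_retries`, and `answer_question` are hypothetical, and `model_call` is assumed to be any function that returns an `(answer, tokens_used)` pair. Hard timeouts are assumed to be configured on the HTTP client inside `model_call`, so they don't appear here.

```python
import hashlib
import time


class TTLCache:
    """Exact-match cache with expiry; an in-memory stand-in for Redis."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def key(question):
        # Collapse case and whitespace so trivially different requests share a key.
        normalised = " ".join(question.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def get(self, question):
        entry = self._store.get(self.key(question))
        if entry is None or time.monotonic() >= entry[0]:
            return None  # missing or expired
        return entry[1]

    def set(self, question, answer):
        self._store[self.key(question)] = (time.monotonic() + self.ttl, answer)


class TokenBucket:
    """Allow bursts of `capacity` calls, refilled at `rate` calls per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def call_with_retries(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay, then 2x, 4x, ... and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def answer_question(question, model_call, cache, limiter, usage_log):
    """Cache first, then rate limit, then call the model with retries."""
    cached = cache.get(question)
    if cached is not None:
        return cached  # cache hits cost no tokens and skip the limiter
    if not limiter.allow():
        raise RuntimeError("rate limit exceeded")
    answer, tokens_used = call_with_retries(lambda: model_call(question))
    usage_log.append({"question": question, "tokens": tokens_used})  # cost tracking
    cache.set(question, answer)
    return answer
```

Checking the cache before the limiter means repeated questions never consume rate budget. Matching requests that are only similar in meaning would need something like an embedding-based lookup, which this sketch does not attempt.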
Secure it
AI brings one genuinely new threat: prompt injection — a user (or a document the model reads) sneaking in instructions that hijack its behavior. Treat all model input and output as untrusted, keep tools on a tight allowlist, and never let a model's raw output trigger a dangerous action unchecked. The old rule holds, with a new attacker: never trust input.
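One way to read "keep tools on a tight allowlist" is as a validation gate between the model's output and any code that acts on it. The sketch below assumes the model proposes a tool call as a JSON string shaped like `{"tool": ..., "args": {...}}`. That format, the `search_docs` tool, and `run_tool_call` are all hypothetical, chosen only for illustration.

```python
import json


def search_docs(query: str) -> str:
    # Hypothetical read-only tool, stubbed for the example.
    return f"results for {query!r}"


# Only tools listed here can ever run; each maps to (function, exact argument names).
ALLOWED_TOOLS = {
    "search_docs": (search_docs, {"query"}),
}


class ToolCallRejected(Exception):
    """Raised when model output fails validation; nothing has been executed."""


def run_tool_call(raw_model_output: str) -> str:
    """Parse and validate a model-proposed tool call before executing it."""
    try:
        call = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise ToolCallRejected("output is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ToolCallRejected("output is not a JSON object")

    name = call.get("tool")
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ToolCallRejected(f"tool {name!r} is not on the allowlist")

    func, allowed_args = ALLOWED_TOOLS[name]
    args = call.get("args", {})
    if not isinstance(args, dict) or set(args) != allowed_args:
        raise ToolCallRejected("unexpected or missing arguments")
    if not all(isinstance(value, str) for value in args.values()):
        raise ToolCallRejected("arguments must be strings")

    return func(**args)
```

The gate fails closed: anything it cannot parse or does not recognise is rejected before any tool runs, which is the same posture you would take with user-supplied input to any other endpoint.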
See inside it
You can't fix what you can't see. Observability tooling (LangSmith and similar) traces each step of a chain or agent — what was retrieved, what the model was sent, what it returned — so a bad answer becomes debuggable instead of mysterious. It's logging and tracing, adapted to non-deterministic calls.
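To make "traces each step" concrete, here is a minimal standard-library sketch of the idea. It is not LangSmith's API; `Trace`, `step`, `traced_pipeline`, and the stubbed retrieval and model calls are hypothetical names used only to show what gets recorded per step.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects one record per pipeline step for a single request."""

    def __init__(self):
        self.steps = []

    @contextmanager
    def step(self, name, **inputs):
        record = {"name": name, "inputs": inputs, "output": None, "error": None}
        start = time.perf_counter()
        try:
            yield record  # the caller stores its result in record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and failure, so every step leaves a record.
            record["seconds"] = time.perf_counter() - start
            self.steps.append(record)


def traced_pipeline(question, trace):
    with trace.step("retrieve", question=question) as rec:
        docs = ["doc about " + question]  # stand-in for a vector search
        rec["output"] = docs
    with trace.step("generate", question=question, docs=docs) as rec:
        answer = f"Based on {len(docs)} document(s): ..."  # stand-in for a model call
        rec["output"] = answer
    return answer
```

After a request, `trace.steps` holds what was retrieved, what the model was sent, what it returned, and how long each step took, which is the information you need to work out where a bad answer came from.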
The takeaway — and the end of the road
The thing I most want a fellow backend developer to hear: you were already most of the way there. The path from .NET to AI engineering isn't about abandoning what you know — it's about adding a thin, learnable layer of AI concepts on top of the engineering foundation you've spent years building. Data pipelines, APIs, validation, caching, security, deployment: that's the hard, durable part, and it's already yours.
Seven parts, six stages, about five weeks. From unlearning a loop to deploying an agent. If you've followed along, you don't just understand AI engineering — you've built the portfolio that proves it.
The 5-day plan (if you want to follow along)
| Day | Time | Learn (theory) | Build | Why it matters | Reference | Output |
|---|---|---|---|---|---|---|
| 20 | ~6h | FastAPI — async, Pydantic models, DI | Expose your pipeline as an HTTP endpoint | A model behind a notebook helps nobody | FastAPI tutorial | A working /ask API |
| 21 | ~6h | Streamlit; Docker | Add a simple UI; containerize the app | "Works on my machine" stops being your problem | FastAPI + Docker docs | A containerized, clickable app |
| 22 | ~7h | Caching, rate limiting, retries | Add a Redis cache and retry/timeout logic | This is where AI costs and failures are tamed | Redis docs | A cheaper, resilient service |
| 23 | ~6h | Prompt-injection security; cost tracking | Sanitize I/O; log token usage per request | New attacker, old rule: never trust input | Provider safety guidance | A safer, cost-aware app |
| 24 | ~6h | Observability; deployment | Add tracing; deploy it somewhere | You can't fix what you can't see | LangSmith docs | A deployed, observable app — done |
What I used to learn this
- FastAPI tutorial: https://fastapi.tiangolo.com/tutorial/
- Redis docs: https://redis.io/docs/latest/
- LangSmith (observability): https://docs.smith.langchain.com/
- DeepLearning.AI — Prompt Compression and Query Optimization (cost/latency): https://www.deeplearning.ai/short-courses/
That's the series. If it helped, the best thing you can do is build your own version of one of these stages and write up where it broke — the broken parts are where the real learning is.