<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Felix Gogodae]]></title><description><![CDATA[Felix Gogodae]]></description><link>https://blog.felixgogodae.xyz</link><image><url>https://cdn.hashnode.com/uploads/logos/62bdc54f79c4ef14aeaad2b5/084277ee-4296-43d7-8fec-1a536c0874d5.jpg</url><title>Felix Gogodae</title><link>https://blog.felixgogodae.xyz</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 10 Jun 2026 22:13:54 GMT</lastBuildDate><atom:link href="https://blog.felixgogodae.xyz/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[I Built a Self-Service Sandbox Platform with Docker, Nginx, and Bash — Here's How It All Works]]></title><description><![CDATA[I wanted to build something that captures the real operational concerns of a platform team in a small, understandable package — not a toy "Hello World in Docker," but a system where you can actually s]]></description><link>https://blog.felixgogodae.xyz/building-a-mini-heroku</link><guid isPermaLink="true">https://blog.felixgogodae.xyz/building-a-mini-heroku</guid><category><![CDATA[Devops]]></category><category><![CDATA[Docker]]></category><category><![CDATA[nginx]]></category><category><![CDATA[Bash]]></category><dc:creator><![CDATA[Felix Gogodae]]></dc:creator><pubDate>Wed, 13 May 2026 19:12:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62bdc54f79c4ef14aeaad2b5/5186bd5c-b8c3-48aa-93be-653aeae99dff.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>I wanted to build something that captures the real operational concerns of a platform team in a small, understandable package — not a toy "Hello World in Docker," but a system where you can actually see environments spin up, routes register, health monitors fire, and chaos strike. What came out is a self-service sandbox platform: a miniature internal Heroku with a chaos engineering toggle, running entirely on one Linux VM.</p>
<p>Full source: <a href="https://github.com/Trojanhorse7/devops-sandbox">Trojanhorse7/devops-sandbox</a></p>
<hr />
<h2>What the platform does</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ltlroc1nkkz5acl7m2lr.png" alt="System overview diagram showing the nginx edge router, API container, and per-environment workload containers" /></p>
<p>The platform lets you spin up isolated, temporary environments on demand. Each environment is:</p>
<ul>
<li>A Docker container running an app, on its own Docker network</li>
<li>Reachable at a unique URL: <code>http://&lt;host&gt;/env/&lt;ENV_ID&gt;/</code></li>
<li>Tracked with a JSON state file that records name, creation time, TTL, and status</li>
<li>Self-destructing after its TTL expires</li>
<li>Observable through a live health monitor and a REST API</li>
</ul>
<p>The whole control plane boots with one command:</p>
<pre><code class="language-bash">make up
</code></pre>
<p>From there, creating an environment, simulating an outage, tailing logs, and triggering auto-cleanup all flow from either the Makefile or the HTTP API.</p>
<hr />
<h2>Repo layout</h2>
<pre><code>devops-sandbox/
├── platform/
│   ├── create_env.sh        # spin up a new environment
│   ├── destroy_env.sh       # tear it down
│   ├── cleanup_daemon.sh    # TTL-based auto-expiry loop
│   ├── simulate_outage.sh   # chaos modes: crash / pause / network / recover / stress
│   ├── api.py               # FastAPI control plane
│   └── common.sh            # shared config + atomic-write helpers
├── nginx/
│   ├── nginx.conf           # static master config
│   └── conf.d/
│       └── env-bootstrap.conf  # always-present stub
├── monitor/
│   └── health_poller.py     # polls /health every 30s
├── envs/                    # runtime state files (.gitignored)
├── logs/                    # per-env app + health logs (.gitignored)
├── Makefile
└── docker-compose.yml
</code></pre>
<hr />
<h2>The core architecture decision: one nginx, many routes</h2>
<p>The platform's front door is a single nginx container. It has a static master config that never changes at runtime. What changes are the files in <code>conf.d/</code>.</p>
<pre><code class="language-nginx">server {
    listen 80 default_server;
    server_name _;

    location / {
        default_type text/plain;
        return 200 'sandbox edge — routes live under /env/&lt;ENV_ID&gt;/\n';
    }

    # Dynamically populated — one file per active environment
    include /etc/nginx/conf.d/env-*.conf;
}
</code></pre>
<p>The <code>include</code> directive with a glob expression tells nginx to read every matching file and paste its contents inline before the config is parsed. When a new environment is created, a snippet is written to disk. When it's destroyed, the snippet is deleted. After each change, <code>nginx -s reload</code> sends <code>SIGHUP</code> to the master process — it forks new workers with the fresh config, drains the old ones gracefully, and replaces them. Existing connections are never dropped.</p>
<p><strong>The glob edge case.</strong> If <code>env-*.conf</code> matches zero files, nginx refuses to start:</p>
<pre><code>nginx: [emerg] open() "/etc/nginx/conf.d/env-*.conf" failed (2: No such file or directory)
</code></pre>
<p>The fix is a committed stub called <code>env-bootstrap.conf</code> that always matches the glob and doubles as an internal health endpoint:</p>
<pre><code class="language-nginx"># Always present so `include env-*.conf` matches ≥1 file.
location = /__sandbox_nginx_ok {
    access_log off;
    return 204;
}
</code></pre>
<p><strong>The Docker Compose volume binding</strong> is what makes host-side writes immediately visible inside the container:</p>
<pre><code class="language-yaml">volumes:
  - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro   # master config — immutable
  - ./nginx/conf.d:/etc/nginx/conf.d               # snippets — read-write bind mount
</code></pre>
<p><code>nginx.conf</code> is mounted <code>:ro</code> so nothing inside the container can accidentally corrupt it. <code>conf.d/</code> is a plain bind mount: writes from the API container to <code>./nginx/conf.d/</code> on the host filesystem appear instantly at <code>/etc/nginx/conf.d/</code> inside the nginx container. No rebuild, no copy, no restart — just a reload.</p>
<hr />
<h2>Environment lifecycle</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2ua4dhso1n6anrvx6oi0.png" alt="Environment lifecycle diagram showing create, healthy, outage simulation, and destroy states" /></p>
<h3>Creating an environment</h3>
<p><code>create_env.sh</code> takes a name and an optional TTL (default 30 minutes). Here's what it does in sequence:</p>
<p><strong>1. Generate a unique ID and set up infrastructure</strong></p>
<pre><code class="language-bash">ENV_ID="env-$(openssl rand -hex 8)"
NETWORK_NAME="sandbox-net-${ENV_ID}"
CONTAINER_NAME="sandbox-app-${ENV_ID}"

docker network create "${NETWORK_NAME}"
docker run -d \
  --name "${CONTAINER_NAME}" \
  --network "${NETWORK_NAME}" \
  --label "sandbox.env=${ENV_ID}" \
  --label "sandbox.role=workload" \
  -e "SANDBOX_ENV_ID=${ENV_ID}" \
  -e "PORT=8080" \
  "${DEMO_IMAGE}"
</code></pre>
<p>Every container is labeled <code>sandbox.env=&lt;ID&gt;</code> and <code>sandbox.role=workload</code>. These labels are how <code>destroy_env.sh</code> finds containers to remove and how <code>simulate_outage.sh</code> refuses to run against control-plane containers.</p>
<p><strong>2. Write the nginx snippet — atomically</strong></p>
<pre><code class="language-bash">cat &gt; "${NGINX_SNIPPET}.tmp" &lt;&lt;EOF
location /env/${ENV_ID}/ {
    proxy_pass http://${CONTAINER_NAME}:8080/;
    proxy_http_version 1.1;
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto \$scheme;
}
EOF
mv -f "${NGINX_SNIPPET}.tmp" "${NGINX_SNIPPET}"
</code></pre>
<p>Writing to a <code>.tmp</code> file first, then renaming with <code>mv</code>, is critical. <code>mv</code> within the same filesystem is an atomic syscall (<code>rename(2)</code>). If nginx reloads mid-write and reads a partial file, the config parse fails. The <code>.tmp</code> → <code>mv</code> pattern ensures nginx either sees the complete file or nothing.</p>
<p><strong>3. Connect nginx to the environment's network, then reload</strong></p>
<pre><code class="language-bash">docker network connect "\({NETWORK_NAME}" "\){NGINX_CONTAINER_NAME}"
docker exec "${NGINX_CONTAINER_NAME}" nginx -s reload
</code></pre>
<p>Nginx needs to be on the same Docker network as the workload container to resolve <code>proxy_pass http://sandbox-app-${ENV_ID}:8080/</code> by container name. The connect happens before the reload so the upstream is reachable the moment the route goes live.</p>
<p><strong>4. Start log shipping and write the state file</strong></p>
<pre><code class="language-bash">nohup docker logs -f "\({CONTAINER_NAME}" &gt;&gt; "\){SANDBOX_ROOT}/logs/${ENV_ID}/app.log" 2&gt;&amp;1 &amp;
LOG_SHIPPER_PID=$!
disown "${LOG_SHIPPER_PID}"
</code></pre>
<p>A <code>docker logs -f</code> process is backgrounded with <code>nohup</code> and its PID is stored in the state file. <code>nohup</code> matters here because the script is launched by the API via subprocess: without it, a SIGHUP delivered when the parent goes away could kill the log shipper before it ships anything. <code>disown</code> removes the job from the shell's job table so it outlives the script.</p>
<p>The state file is also written atomically using <code>python3</code>'s <code>tempfile.mkstemp</code> + <code>os.replace</code>:</p>
<pre><code class="language-json">{
  "id": "env-a1b2c3d4e5f6a7b8",
  "name": "my-feature-branch",
  "created_at": "2026-05-13T18:00:00Z",
  "ttl": 1800,
  "status": "healthy",
  "network": "sandbox-net-env-a1b2c3d4e5f6a7b8",
  "container_name": "sandbox-app-env-a1b2c3d4e5f6a7b8",
  "nginx_snippet": "/sandbox/nginx/conf.d/env-a1b2c3d4e5f6a7b8.conf",
  "log_shipper_pid": 4291
}
</code></pre>
<p>Output on completion:</p>
<pre><code>Created environment 'my-feature-branch' (env-a1b2c3d4e5f6a7b8)
URL: http://127.0.0.1:80/env/env-a1b2c3d4e5f6a7b8/
TTL: 1800s (cleanup daemon enforces expiry)
</code></pre>
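<p>The atomic state-file write mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the repo's actual helper: the function name <code>write_state_atomic</code> and the temp-file prefix are assumptions, and only <code>tempfile.mkstemp</code> and <code>os.replace</code> come from the description.</p>
<pre><code class="language-python">import json
import os
import tempfile


def write_state_atomic(path, data):
    """Write JSON so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must sit in the same directory (same filesystem) as the
    # destination, otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic rename over the destination
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
</code></pre>
<p>Creating the temp file next to the destination is the important design choice: a rename across filesystems falls back to copy-and-delete, which would reintroduce the partial-read window.</p>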
<h3>Destroying an environment</h3>
<p><code>destroy_env.sh</code> works through the inverse operations in a safe order:</p>
<pre><code class="language-bash"># 1. Kill the log shipper (prevents zombie docker logs processes)
kill "${LOG_PID}"

# 2. Remove all labeled containers
docker rm -f $(docker ps -aq --filter "label=sandbox.env=${ENV_ID}")

# 3. Disconnect nginx from the network, then remove the network
docker network disconnect -f "${NETWORK_NAME}" "${NGINX_CONTAINER_NAME}"
docker network rm "${NETWORK_NAME}"

# 4. Delete the nginx snippet and reload
rm -f "${NGINX_SNIPPET}"
docker exec "${NGINX_CONTAINER_NAME}" nginx -s reload

# 5. Archive logs, delete state file
mv "${SANDBOX_ROOT}/logs/${ENV_ID}/"* "${ARCHIVE_DIR}/"
rm -f "${STATE_PATH}"
</code></pre>
<p>Killing the log shipper before removing the container is important: if you remove the container first while the shipper is still running, you can be left with an orphaned <code>docker logs</code> process attached to a dead container ID.</p>
<p>After the reload, requests to <code>/env/env-a1b2c3d4e5f6a7b8/</code> fall through to the default <code>location /</code> handler with a 200 plain text response.</p>
<hr />
<h2>TTL-based auto-cleanup</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/a5ve1c75vdztsgpjgo8r.png" alt="TTL auto-cleanup flow showing environments expiring automatically" /></p>
<p><code>cleanup_daemon.sh</code> runs in a loop, waking every 60 seconds. For each state file in <code>envs/</code>, it checks whether <code>now &gt; created_at + ttl</code> using an inline Python snippet:</p>
<pre><code class="language-bash">should_destroy="\((python3 - "\){state_file}" &lt;&lt;'PY'
import datetime, json, sys
from pathlib import Path

data = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
created = datetime.datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
ttl = int(data["ttl"])
expires = created + datetime.timedelta(seconds=ttl)
now = datetime.datetime.now(datetime.timezone.utc)
print("true" if now &gt; expires else "false")
PY
)"

if [[ "${should_destroy}" == "true" ]]; then
  sandbox_ts_log "destroying ${ENV_ID} (past ttl)"
  bash "${SCRIPT_DIR}/destroy_env.sh" "${ENV_ID}"
fi
</code></pre>
<p>Every action is timestamped and appended to <code>logs/cleanup.log</code>. The daemon starts at <code>make up</code> with <code>nohup</code> and its PID is saved to <code>.cleanup.pid</code> so <code>make down</code> can kill it cleanly.</p>
<hr />
<h2>Health monitoring</h2>
<p>The health poller in <code>monitor/health_poller.py</code> wakes every 30 seconds, iterates over all active state files, and hits each environment's <code>/health</code> endpoint through nginx:</p>
<pre><code class="language-python">url = f"http://127.0.0.1:{port}/env/{env_id}/health"
started = time.perf_counter()
try:
    with urllib.request.urlopen(request, timeout=5) as response:
        status = int(response.status)
except urllib.error.HTTPError as exc:
    status = int(exc.code)
except Exception as exc:
    err = f"{type(exc).__name__}: {exc}"
    status = None

latency_ms = int((time.perf_counter() - started) * 1000)
</code></pre>
<p>Each result is appended as a JSON line to <code>logs/&lt;ENV_ID&gt;/health.log</code>:</p>
<pre><code class="language-json">{"ts": "2026-05-13T18:12:00+00:00", "http_status": 200, "latency_ms": 4, "error": null}
</code></pre>
<p>A separate <code>health_tracker.json</code> file tracks consecutive failures per environment. After 3 consecutive failures, the poller flips the environment's <code>status</code> field to <code>"degraded"</code> atomically and emits a warning to stderr:</p>
<pre><code>[2026-05-13T18:14:30+00:00] WARNING env=env-a1b2c3d4e5f6a7b8 degraded after 3 consecutive health failures
</code></pre>
<p>When a check succeeds again, <code>status</code> is reset to <code>"healthy"</code> and the failure counter is zeroed.</p>
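<p>The consecutive-failure rule can be sketched as a small pure function. This is an illustration under assumptions: the name <code>record_check</code>, the in-memory <code>tracker</code> dict (standing in for <code>health_tracker.json</code>), and the return convention are hypothetical, not taken from <code>health_poller.py</code>.</p>
<pre><code class="language-python">FAILURE_THRESHOLD = 3  # consecutive failures before an env is marked degraded


def record_check(tracker, env_id, ok):
    """Update the failure count for env_id.

    Returns "healthy" after a successful check, "degraded" on the check that
    reaches the threshold, and None when the stored status should not change.
    """
    if ok:
        tracker[env_id] = 0  # any success resets the streak
        return "healthy"
    tracker[env_id] = tracker.get(env_id, 0) + 1
    if tracker[env_id] == FAILURE_THRESHOLD:
        return "degraded"
    return None
</code></pre>
<p>Returning <code>"degraded"</code> only on the transition means the warning is emitted once per incident rather than on every failed poll.</p>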
<hr />
<h2>Outage simulation</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gk6yag3lmk81qcj19xle.png" alt="Outage simulation state diagram showing transitions between Healthy, Paused, Stressed, Crashed, Isolated, and Degraded states" /></p>
<p><code>simulate_outage.sh</code> is the chaos engineering toggle. It accepts <code>--env</code> and <code>--mode</code> and supports five modes:</p>
<table>
<thead>
<tr>
<th>Mode</th>
<th>What it does</th>
</tr>
</thead>
<tbody><tr>
<td><code>crash</code></td>
<td><code>docker kill</code> — sends SIGKILL to the workload container</td>
</tr>
<tr>
<td><code>pause</code></td>
<td><code>docker pause</code> — freezes all processes in the container (via the cgroup freezer, so the processes never observe a signal such as SIGSTOP)</td>
</tr>
<tr>
<td><code>network</code></td>
<td><code>docker network disconnect</code> — severs the container from its network</td>
</tr>
<tr>
<td><code>stress</code></td>
<td>Runs a hot CPU busy-loop inside the container (no extra packages needed)</td>
</tr>
<tr>
<td><code>recover</code></td>
<td>Undoes whichever mode was previously applied</td>
</tr>
</tbody></table>
<p>Before doing anything, the script verifies it is targeting a workload container, not a control-plane container:</p>
<pre><code class="language-bash">guard_container() {
  local cid="$1"
  local name
  name="$(docker inspect --format '{{.Name}}' "${cid}" | sed 's#^/##')"

  if [[ "${name}" == "${NGINX_CONTAINER_NAME}" || "${name}" == "${API_CONTAINER_NAME}" ]]; then
    echo "Refusing to simulate outage on platform container (${name})." &gt;&amp;2
    exit 1
  fi

  local plat
  plat="$(docker inspect --format '{{index .Config.Labels "sandbox.platform"}}' "${cid}")"
  if [[ "${plat}" == "control" ]]; then
    echo "Refusing to simulate outage on control-plane container (${name})." &gt;&amp;2
    exit 1
  fi
}
</code></pre>
<p>The applied mode is saved to <code>envs/.sim/&lt;ENV_ID&gt;</code> so <code>recover</code> knows what to undo. The <code>crash</code> recovery path has an extra detail: since <code>docker kill</code> terminates the container, the <code>docker logs -f</code> shipper also dies. Recovery restarts the container, spawns a new log shipper, and patches the stored PID in the state file atomically.</p>
<pre><code class="language-bash">crash)
  docker start "${cid}"
  nohup docker logs -f "${cid}" &gt;&gt; "${APP_LOG}" 2&gt;&amp;1 &amp;
  NEW_LOG_PID=$!
  disown "${NEW_LOG_PID}"
  # atomic patch of log_shipper_pid in state file
  ...
</code></pre>
<p>Because each environment's container is the only thing behind its <code>proxy_pass</code>, stopping a container without touching nginx produces a clean 502 Bad Gateway at the edge — which is exactly the signal your monitoring stack should catch.</p>
<hr />
<h2>The control API</h2>
<p><code>platform/api.py</code> is a FastAPI application that wraps the shell scripts. It exposes six endpoints:</p>
<pre><code>POST   /envs              → create env (body: {name, ttl})
GET    /envs              → list all active envs with TTL remaining
DELETE /envs/{id}         → destroy env
GET    /envs/{id}/logs    → last 100 lines of app.log
GET    /envs/{id}/health  → last 10 health check records
POST   /envs/{id}/outage  → trigger simulation (body: {mode})
</code></pre>
<p>Each endpoint either reads state files directly (for reads) or calls the corresponding shell script via <code>subprocess.run</code>:</p>
<pre><code class="language-python">def _run_script(rel: str, *args: str, timeout: int = 600) -&gt; str:
    script = ROOT / "platform" / rel
    result = subprocess.run(
        ["bash", str(script), *[str(a) for a in args]],
        cwd=str(ROOT),
        env={**os.environ, "SANDBOX_ROOT": str(ROOT)},
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        detail = (result.stderr or result.stdout or "").strip()
        raise HTTPException(status_code=500, detail=detail)
    return (result.stdout or "").strip()
</code></pre>
<p>The API container mounts the Docker socket (<code>/var/run/docker.sock</code>) so the shell scripts running inside it can call <code>docker</code> commands against the host daemon. The entire repo is also bind-mounted into the container at <code>/sandbox</code> so script paths, state files, and log directories are all consistent.</p>
<p>The <code>GET /envs</code> endpoint computes TTL remaining live on each request:</p>
<pre><code class="language-python">def _ttl_remaining_seconds(data: dict) -&gt; int:
    created = datetime.fromisoformat(str(data["created_at"]).replace("Z", "+00:00"))
    expires = created + timedelta(seconds=int(data["ttl"]))
    now = datetime.now(timezone.utc)
    return max(0, int((expires - now).total_seconds()))
</code></pre>
<p>Interactive docs are at <code>http://127.0.0.1:9090/docs</code> once the stack is up.</p>
<hr />
<h2>Makefile ergonomics</h2>
<p>Every operation has a <code>make</code> target so the platform is scriptable and human-friendly at the same time:</p>
<pre><code class="language-makefile">make up                        # start nginx + api, launch daemon + health poller
make down                      # stop everything, destroy all active envs
make create                    # interactive: prompts for name + TTL
make destroy ENV=env-a1b2c3d4  # destroy specific env
make logs ENV=env-a1b2c3d4     # tail app.log (falls back to archived logs)
make health                    # print status table for all envs
make simulate ENV=... MODE=... # run outage simulation
make clean                     # wipe all state, logs, and generated nginx snippets
</code></pre>
<p><code>make up</code> also handles idempotency: it checks whether the daemon and health poller are already running before starting new instances.</p>
<hr />
<h2>What I learned building this</h2>
<p><strong>Atomicity shows up everywhere.</strong> Three separate layers of the system use the write-to-temp-then-rename pattern: the nginx snippet writer in <code>create_env.sh</code>, the state file writer in <code>common.sh</code>, and the health tracker updater in <code>health_poller.py</code>. The pattern costs nothing and eliminates an entire class of race conditions where a reader sees a partial file.</p>
<p><strong>Labels are your namespace.</strong> Using <code>sandbox.env=&lt;ID&gt;</code> and <code>sandbox.role=workload</code> labels on every container means <code>destroy_env.sh</code> can find exactly what it needs with <code>docker ps --filter</code> without hardcoding any names. It also makes the control-plane guard trivial to implement — just check the label.</p>
<p><strong><code>nohup</code> + <code>disown</code> for background processes launched by subprocess.</strong> Running <code>docker logs -f ... &amp;</code> in a script called by a Python subprocess sounds simple, but without <code>nohup</code> the log shipper does not reliably survive. When the parent goes away, a SIGHUP can reach the background job and the shipper dies before writing anything. <code>nohup</code> makes the process ignore that signal; <code>disown</code> removes the job from the shell's job table so it truly outlives the shell.</p>
<p><strong>The bootstrap stub is not optional.</strong> The nginx glob edge case (<code>[emerg] open()... No such file or directory</code>) is easy to miss in development because you always have at least one env running. It only surfaces on a clean boot with zero environments, which is exactly when you need the platform to start reliably.</p>
<p><strong><code>nginx -s reload</code> is not instantaneous but it is safe.</strong> New workers pick up the fresh config; old workers stop accepting new connections and finish draining in-flight requests. Under low load this takes milliseconds. Under heavy load you'll see a brief overlap while old workers finish requests against the previous config. New connections go to the new workers, so the new route is live as soon as they start, and nothing breaks.</p>
<hr />
<h2>When this pattern fits (and when it doesn't)</h2>
<p>Good fit:</p>
<ul>
<li>Routes are created/destroyed by external events — deployments, tenant provisioning, CI pipelines</li>
<li>You can't afford connection drops on reload</li>
<li>The route set is unbounded or changes frequently enough that hardcoding into a master config isn't maintainable</li>
</ul>
<p>Poor fit:</p>
<ul>
<li>Sub-second reconfiguration at very high frequency (nginx reload takes ~100ms to drain workers; for that latency class, look at OpenResty/Lua or Envoy xDS)</li>
<li>Environments that need to survive restarts with complex state beyond what a JSON file can hold</li>
</ul>
<hr />
<h2>Running it</h2>
<pre><code class="language-bash">git clone https://github.com/Trojanhorse7/devops-sandbox
cd devops-sandbox
make up

# create an environment
make create
# → Environment name: demo
# → TTL seconds [1800]: 300
# → URL: http://127.0.0.1:80/env/env-a1b2c3d4e5f6a7b8/

# check health
make health

# simulate a crash
make simulate ENV=env-a1b2c3d4e5f6a7b8 MODE=crash
# health monitor catches it within 90 seconds, status → degraded

# recover
make simulate ENV=env-a1b2c3d4e5f6a7b8 MODE=recover

# or let the TTL expire — cleanup daemon destroys it automatically
</code></pre>
<hr />
<p><em>Full source: <a href="https://github.com/Trojanhorse7/devops-sandbox">github.com/Trojanhorse7/devops-sandbox</a></em></p>
]]></content:encoded></item><item><title><![CDATA[Building Insighta: Django + React + CLI Profiles Platform]]></title><description><![CDATA[Introduction
Most tutorials stop at “here’s a REST API.” They rarely walk through shipping a multi-client platform: a browser SPA that authenticates differently from a CLI, rate limiting that must wor]]></description><link>https://blog.felixgogodae.xyz/building-insighta-django-react-cli-profiles-platform</link><guid isPermaLink="true">https://blog.felixgogodae.xyz/building-insighta-django-react-cli-profiles-platform</guid><dc:creator><![CDATA[Felix Gogodae]]></dc:creator><pubDate>Thu, 30 Apr 2026 14:21:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/62bdc54f79c4ef14aeaad2b5/c186a392-8853-40a4-bd33-e3c791027c85.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Introduction</h2>
<p>Most tutorials stop at “here’s a REST API.” They rarely walk through shipping a <strong>multi-client</strong> platform: a browser SPA that authenticates differently from a CLI, rate limiting that must work <strong>before</strong> and <strong>after</strong> JWTs, a natural-language parser that turns plain English into SQL-safe filters without calling an LLM, and the moment Django REST Framework quietly returns <strong>404</strong> because you reused a reserved query parameter.</p>
<p><strong>Insighta</strong> is that stack — a profiles intelligence system split across three repos that share one Django backend:</p>
<ul>
<li><strong>Backend</strong> — Django 5 + DRF, PostgreSQL, JWT (HS256), GitHub OAuth with PKCE, RBAC, layered rate limiting, deterministic NL search, CSV export  </li>
<li><strong>Frontend</strong> — Vite + React 19 SPA: httpOnly cookies, CSRF, silent refresh  </li>
<li><strong>CLI</strong> — Python Typer app: loopback OAuth, Bearer tokens, Rich terminal UI</li>
</ul>
<p>This post is the architecture tour: what we built, why two OAuth apps exist, where limits apply, and the bugs that only show up in production.</p>
<hr />
<h2>Table of Contents</h2>
<ol>
<li><a href="#architecture-at-a-glance">Architecture at a glance</a></li>
<li><a href="#data-model">Data model</a></li>
<li><a href="#authentication-two-github-apps-one-backend">Authentication: two GitHub apps, one backend</a></li>
<li><a href="#middleware-pipeline">Middleware pipeline</a></li>
<li><a href="#rate-limiting">Rate limiting</a></li>
<li><a href="#profile-aggregation">Profile aggregation</a></li>
<li><a href="#natural-language-search">Natural language search</a></li>
<li><a href="#csv-export-and-the-drf-format-trap">CSV export and the DRF format trap</a></li>
<li><a href="#react-spa-cookies-done-right">React SPA: cookies done right</a></li>
<li><a href="#cli-oauth-in-the-terminal">CLI: OAuth in the terminal</a></li>
<li><a href="#deployment">Deployment</a></li>
<li><a href="#lessons-learned">Lessons learned</a></li>
</ol>
<hr />
<h2>Architecture at a glance</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gftuui0ry18bs8e4pefo.png" alt="System Architecture" /></p>
<p>Three clients, one API:</p>
<table>
<thead>
<tr>
<th>Piece</th>
<th>Stack</th>
<th>Auth</th>
<th>Deploy</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Backend API</strong></td>
<td>Django 5 + DRF</td>
<td>JWT (HS256)</td>
<td>Leapcell (Gunicorn)</td>
</tr>
<tr>
<td><strong>Web portal</strong></td>
<td>Vite + React 19 + React Router v7</td>
<td>httpOnly cookies + CSRF</td>
<td>Vercel</td>
</tr>
<tr>
<td><strong>CLI</strong></td>
<td>Python 3.12+, Typer, Rich, httpx</td>
<td><code>Authorization: Bearer …</code></td>
<td>PyPI / local</td>
</tr>
</tbody></table>
<p>The browser and CLI both log in with GitHub — but they use <strong>different OAuth Apps</strong> on purpose. GitHub requires callback URLs to match exactly; you cannot cleanly register both a public <code>https://api…/callback</code> and <code>http://127.0.0.1:8765/callback</code> on a single OAuth app without unsupported wildcard tricks. Two apps, two client IDs, one user table.</p>
<hr />
<h2>Data model</h2>
<p>Four logical pieces power persistence:</p>
<h3>User (<code>accounts</code>)</h3>
<p>GitHub is the identity provider — no local passwords. A <code>role</code> field (<code>admin</code> | <code>analyst</code>) gates capabilities: analysts read; admins create/delete profiles.</p>
<h3>Profile (<code>classify</code>)</h3>
<p>Each row is a <strong>unique name</strong> enriched from external APIs: inferred gender and confidence, estimated age, derived <strong>age_group</strong> (child / teenager / adult / senior), and primary country with probabilities. Creating the same name twice is <strong>idempotent</strong> — you get the existing row back.</p>
<h3>Refresh tokens &amp; OAuth state</h3>
<p>Refresh tokens are stored as <strong>hashes</strong>, never plaintext. <code>GitHubOAuthState</code> rows are short-lived: minted at the start of login, validated at callback, then discarded or expired.</p>
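<p>A minimal sketch of hash-only refresh-token storage follows. The post does not say which hash Insighta uses, so SHA-256 is an assumption here, and the function names are hypothetical.</p>
<pre><code class="language-python">import hashlib
import hmac
import secrets


def hash_token(token):
    """Hash a token for storage (SHA-256 is assumed for this sketch)."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_refresh_token():
    """Return (plaintext_token, stored_hash); only the hash is persisted."""
    token = secrets.token_urlsafe(32)
    return token, hash_token(token)


def verify_refresh_token(presented, stored_hash):
    # Constant-time comparison avoids leaking how much of the hash matched.
    return hmac.compare_digest(hash_token(presented), stored_hash)
</code></pre>
<p>A fast unsalted hash is reasonable in this sketch because the token is a long random value rather than a human-chosen password; a database leak then exposes no usable tokens.</p>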
<hr />
<h2>Authentication: two GitHub apps, one backend</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/3saijjl0hh6bgyk18dbw.png" alt="Browser Flow" /></p>
<table>
<thead>
<tr>
<th></th>
<th>Browser (portal)</th>
<th>CLI</th>
</tr>
</thead>
<tbody><tr>
<td>Credentials</td>
<td><code>GITHUB_CLIENT_ID</code> / <code>SECRET</code></td>
<td><code>GITHUB_CLI_CLIENT_ID</code> / <code>SECRET</code></td>
</tr>
<tr>
<td>Callback</td>
<td>Backend public URL</td>
<td><code>http://127.0.0.1:8765/callback</code></td>
</tr>
<tr>
<td>Token delivery</td>
<td><code>Set-Cookie</code> (httpOnly, <code>Secure</code> in prod)</td>
<td>JSON body → <code>~/.insighta/credentials.json</code></td>
</tr>
<tr>
<td>Refresh</td>
<td><code>POST /auth/refresh/web</code> (cookies)</td>
<td><code>POST /auth/refresh</code> (JSON body)</td>
</tr>
<tr>
<td>CSRF</td>
<td>Required on mutating requests</td>
<td>Not applicable</td>
</tr>
</tbody></table>
<h3>PKCE everywhere</h3>
<p>Both flows implement <strong>PKCE (S256)</strong> — random verifier, SHA-256 challenge, <code>code_challenge_method=S256</code> on authorize, verifier on token exchange. Even with server-side client secrets, PKCE closes the authorization-code interception window.</p>
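<p>The S256 derivation is short enough to show in full. The sketch below follows RFC 7636; the function names are illustrative and are not taken from the Insighta codebase.</p>
<pre><code class="language-python">import base64
import hashlib
import secrets


def make_code_verifier():
    # 32 random bytes become a 43-character URL-safe string
    # (RFC 7636 allows verifiers of 43 to 128 characters).
    return secrets.token_urlsafe(32)


def make_code_challenge(verifier):
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # base64url without "=" padding, as the S256 method requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
</code></pre>
<p>The client sends the challenge on the authorize request and the verifier on the token exchange, so an intercepted authorization code is useless without the verifier.</p>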
<h3>Portal flow (abbreviated)</h3>
<pre><code class="language-plaintext">User clicks “Sign in with GitHub”
  → SPA navigates to GET /auth/github (full redirect)
  → Backend stores PKCE + state, redirects to GitHub

User approves on GitHub
  → Redirect to /auth/github/callback?code=…&amp;state=…
  → Backend validates state, exchanges code, loads GitHub profile
  → Issues JWT access + refresh, sets httpOnly cookies, redirects to SPA
  → SPA calls GET /auth/me with credentials included
</code></pre>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rfaamvatq99iyzelhi84.png" alt="CLI Authentication" /></p>
<h3>CLI flow (abbreviated)</h3>
<pre><code class="language-plaintext">insighta login
  → CLI binds HTTP listener on 127.0.0.1:8765
  → Opens GitHub authorize URL (PKCE + state)

GitHub redirects to loopback with ?code=…
  → CLI POST /auth/github/cli with code + code_verifier
  → Backend exchanges with GitHub using CLI app credentials
  → JSON tokens saved locally; listener shuts down
</code></pre>
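<p>On the CLI side, the loopback listener mainly needs to pull <code>code</code> out of the callback URL and confirm that <code>state</code> matches what was sent. A minimal sketch, assuming GitHub echoes <code>state</code> back as a query parameter; the helper name <code>extract_code</code> is hypothetical.</p>
<pre><code class="language-python">from urllib.parse import parse_qs, urlsplit


def extract_code(callback_path, expected_state):
    """Return the authorization code from a loopback callback path.

    Raises ValueError if the state does not match or no code is present.
    """
    query = parse_qs(urlsplit(callback_path).query)
    state = query.get("state", [""])[0]
    if state != expected_state:
        raise ValueError("state mismatch; discarding callback")
    code = query.get("code", [""])[0]
    if not code:
        raise ValueError("callback did not include a code")
    return code
</code></pre>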
<hr />
<h2>Middleware pipeline</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/aapd4jz91u45y9aa8dga.png" alt="Request Lifecycle" /></p>
<p>Every request walks through <strong>eight</strong> middleware layers (order matters):</p>
<pre><code class="language-plaintext">1. CorsMiddleware           — allow SPA origin with credentials
2. SecurityMiddleware       — HSTS, SSL redirect (production)
3. CommonMiddleware         — APPEND_SLASH / PREPEND_WWW URL normalization
4. ApiVersionMiddleware     — X-API-Version: 1 required on /api/*
5. RateLimitMiddleware      — IP bucket on selected /auth/* (pre-DRF)
6. CsrfViewMiddleware       — cookie-auth writes need CSRF token
7. XFrameOptionsMiddleware  — clickjacking defaults
8. RequestLoggingMiddleware — method, path, status, duration, user id
</code></pre>
<h3>API version gate</h3>
<p><code>/api/*</code> without <code>X-API-Version: 1</code> returns a deliberate <strong>400</strong> with a clear JSON body — versioning is enforced <strong>before</strong> authentication so anonymous misconfigured clients fail fast.</p>
<pre><code class="language-json">{ "status": "error", "message": "API version header required" }
</code></pre>
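<p>The gate itself is a small decision. Here is a framework-free sketch of that check; <code>check_api_version</code> is a hypothetical helper, and a real Django middleware would read the case-insensitive <code>request.headers</code> rather than a plain dict:</p>
<pre><code class="language-python">import json

REQUIRED_VERSION = "1"


def check_api_version(path, headers):
    """Return None to let the request continue, or (status, body) to reject it."""
    if not path.startswith("/api/"):
        return None  # only /api/* is gated
    if headers.get("X-API-Version") == REQUIRED_VERSION:
        return None
    body = json.dumps({"status": "error", "message": "API version header required"})
    return 400, body
</code></pre>
<p>Because the function never looks at credentials, it can sit ahead of authentication in the pipeline without knowing who the caller is.</p>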
<hr />
<h2>Rate limiting</h2>
<p>Limits are <strong>split by layer</strong> so brute-force login attempts and authenticated API abuse are both covered:</p>
<h3>Layer 1 — middleware (mostly pre-auth)</h3>
<p>Roughly <strong>10 requests/minute per IP</strong> on sensitive <code>/auth/*</code> routes that handle redirects and CSRF priming. Paths already throttled inside DRF views are <strong>skipped</strong> so you don’t double-penalize the same call:</p>
<pre><code class="language-python"># Example idea: skip what DRF handles with AuthBurstThrottle
SKIP_PATHS = [
    "/auth/me",
    "/auth/github/cli",
    "/auth/refresh",
    "/auth/logout",
]
</code></pre>
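<p>To make the middleware layer concrete, here is a minimal fixed-window sketch. It assumes a single process with an in-memory dict (a shared cache would be needed across workers), and the class name <code>IpRateLimiter</code> and its parameters are illustrative rather than the project's real code:</p>
<pre><code class="language-python">import time

SKIP_PATHS = {"/auth/me", "/auth/github/cli", "/auth/refresh", "/auth/logout"}


class IpRateLimiter:
    """Fixed-window counter: at most `limit` hits per IP in each `window` seconds."""

    def __init__(self, limit=10, window=60, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.buckets = {}  # ip: (window_id, count)

    def allow(self, ip, path):
        if not path.startswith("/auth/") or path in SKIP_PATHS:
            return True  # not covered by this layer
        window_id = int(self.clock() // self.window)
        stored_id, count = self.buckets.get(ip, (window_id, 0))
        if stored_id != window_id:
            count = 0  # a new window started, reset the counter
        if count == self.limit:
            return False  # the caller should answer 429
        self.buckets[ip] = (window_id, count + 1)
        return True
</code></pre>
<p>Skipped paths return early without touching the bucket, so a call that DRF throttles later is counted once, not twice.</p>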
<h3>Layer 2 — DRF throttles (post-auth)</h3>
<p>After JWT resolution, DRF applies per-user (or per-IP for anonymous) limits on <code>/api/*</code>. Auth-heavy endpoints can carry their own burst throttle.</p>
<p><strong>Rule of thumb:</strong> OAuth redirect surfaces → middleware IP limits; JSON token APIs → DRF. Mixing them blindly either double-counts or leaves holes.</p>
<hr />
<h2>Profile aggregation</h2>
<p>An admin creates a profile with:</p>
<pre><code class="language-http">POST /api/profiles
Content-Type: application/json

{ "name": "Ada Lovelace" }
</code></pre>
<p>The service fans out to three public APIs:</p>
<table>
<thead>
<tr>
<th>API</th>
<th>Role</th>
</tr>
</thead>
<tbody><tr>
<td><strong>Genderize</strong></td>
<td>gender + probability</td>
</tr>
<tr>
<td><strong>Agify</strong></td>
<td>estimated age</td>
</tr>
<tr>
<td><strong>Nationalize</strong></td>
<td>country candidates + probabilities</td>
</tr>
</tbody></table>
<p>The aggregator picks the <strong>top</strong> country by probability, derives <strong>age_group</strong> from numeric age, maps ISO codes to display names, and inserts or returns the existing profile (same unique <code>name</code>).</p>
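<p>A rough sketch of that merge step, given the three upstream JSON payloads already fetched. The age cut-offs, the output field names, and the tiny <code>COUNTRY_NAMES</code> table are assumptions for illustration; the payload shapes follow what the three public APIs return, and empty or null upstream results are not handled here:</p>
<pre><code class="language-python">import bisect

COUNTRY_NAMES = {"NG": "Nigeria", "GB": "United Kingdom", "US": "United States"}  # illustrative subset

# Hypothetical boundaries: up to 12 child, up to 19 teenager, up to 59 adult, else senior
AGE_UPPER_BOUNDS = [12, 19, 59]
AGE_LABELS = ["child", "teenager", "adult", "senior"]


def age_group(age):
    """Map a numeric age onto a label using the assumed upper bounds."""
    return AGE_LABELS[bisect.bisect_left(AGE_UPPER_BOUNDS, age)]


def build_profile(name, genderize, agify, nationalize):
    """Combine the three API payloads into one flat profile dict."""
    # Pick the candidate country with the highest probability
    top = max(nationalize["country"], key=lambda c: c["probability"])
    return {
        "name": name,
        "gender": genderize["gender"],
        "gender_probability": genderize["probability"],
        "age": agify["age"],
        "age_group": age_group(agify["age"]),
        "country_id": top["country_id"],
        "country_name": COUNTRY_NAMES.get(top["country_id"], top["country_id"]),
        "country_probability": top["probability"],
    }
</code></pre>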
<hr />
<h2>Natural language search</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/4luq34wfb902rg6ehjsl.png" alt="NL search" /></p>
<p>Endpoint: <code>GET /api/profiles/search?q=…</code></p>
<p>No embeddings, no chat completions — just <strong>deterministic rules</strong>:</p>
<pre><code class="language-plaintext">Input: "young males from nigeria above 20"

1. Normalize (lowercase, strip accents, collapse whitespace)
2. Longest-match country names against a curated dictionary (~65 countries)
3. Gender keywords ("male", "female", combined phrases)
4. Age-group vocabulary ("young", "teenager", "adult", …)
5. Numeric phrases ("above 20", "under 30", …) merged with age-group bounds

Output: structured filters → queryset
</code></pre>
<h3>Example mapping</h3>
<table>
<thead>
<tr>
<th>Query fragment</th>
<th>Effect</th>
</tr>
</thead>
<tbody><tr>
<td><code>young</code></td>
<td>tightens upper age bound</td>
</tr>
<tr>
<td><code>males</code></td>
<td><code>gender = male</code></td>
</tr>
<tr>
<td><code>from nigeria</code></td>
<td><code>country_id = NG</code></td>
</tr>
<tr>
<td><code>above 20</code></td>
<td>raises minimum age</td>
</tr>
</tbody></table>
<p>Unparseable noise yields <strong>422</strong> — better than silently returning everything.</p>
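<p>The pipeline above can be sketched in a few dozen lines. This is a simplified illustration, not the production parser: the dictionaries are tiny, the age-group bounds for words like <code>young</code> are assumed values, combined gender phrases are left out, and <code>UnparseableQuery</code> is a hypothetical exception that the API layer would translate into the 422:</p>
<pre><code class="language-python">import re
import unicodedata

COUNTRIES = {"nigeria": "NG", "guinea": "GN", "equatorial guinea": "GQ"}  # illustrative subset
GENDERS = {"male": "male", "males": "male", "men": "male",
           "female": "female", "females": "female", "women": "female"}
AGE_GROUPS = {"young": (16, 24), "teenager": (13, 19), "adult": (20, 59)}  # assumed bounds


class UnparseableQuery(ValueError):
    """Raised when no rule matches the query."""


def normalize(text):
    # Strip accents, lowercase, collapse whitespace
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def parse(query):
    q = normalize(query)
    filters = {}
    # Longest name first so "equatorial guinea" wins over "guinea"
    for name in sorted(COUNTRIES, key=len, reverse=True):
        if re.search(rf"\b{re.escape(name)}\b", q):
            filters["country_id"] = COUNTRIES[name]
            break
    for word in q.split():
        if word in GENDERS:
            filters["gender"] = GENDERS[word]
        if word in AGE_GROUPS:
            filters["min_age"], filters["max_age"] = AGE_GROUPS[word]
    # Numeric phrases only ever tighten the bounds set by an age-group word
    above = re.search(r"\b(?:above|over) (\d+)\b", q)
    if above:
        filters["min_age"] = max(filters.get("min_age", 0), int(above.group(1)))
    under = re.search(r"\b(?:under|below) (\d+)\b", q)
    if under:
        filters["max_age"] = min(filters.get("max_age", 150), int(under.group(1)))
    if not filters:
        raise UnparseableQuery(query)
    return filters
</code></pre>
<p>Because the output is a plain dict, each phrase can be covered by a table-driven test that compares the parsed filters against an expected dict.</p>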
<h3>Why not an LLM here?</h3>
<ul>
<li><strong>Deterministic</strong> — same string, same filters; no temperature surprises  </li>
<li><strong>Fast / free</strong> — microseconds, no vendor rate limits  </li>
<li><strong>Testable</strong> — table-driven unit tests for phrases</li>
</ul>
<p>When the vocabulary is small and the output shape is fixed, rules beat models.</p>
<hr />
<h2>CSV export and the DRF format trap</h2>
<p>Everything passed locally — then production returned <strong>404</strong> for the export URL.</p>
<p>Root cause: Django REST Framework treats <strong><code>format</code></strong> as a <strong>first-class content negotiation switch</strong>. A request like:</p>
<pre><code class="language-http">GET /api/profiles/export?format=csv
</code></pre>
<p>makes DRF look for a renderer whose format is <code>csv</code>. If none is registered, content negotiation responds with <strong>404</strong> <strong>before your view’s handler runs</strong>. Your URLconf is fine, and tests that do not go through the same negotiation path will not catch it.</p>
<p><strong>Fix:</strong> rename the parameter:</p>
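<pre><code class="language-python"># Broken: DRF consumes ?format= during content negotiation, so this line is never reached
fmt = request.query_params.get("format", "")

# Working: a parameter name DRF does not reserve
fmt = request.query_params.get("export_format", "")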
</code></pre>
<p>CLI and docs must send <code>export_format=csv</code> instead. <strong>Lesson:</strong> grep your framework for reserved query keys before naming business parameters.</p>
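<p>For illustration, the renamed parameter can drive a plain <code>csv</code> export like this. It is a framework-free sketch: <code>export_profiles</code>, the column list, and the 400 error body are hypothetical, and a real view would wrap the result in an HTTP response:</p>
<pre><code class="language-python">import csv
import io

FIELDS = ["name", "gender", "age", "country_id"]  # assumed column set


def export_profiles(query_params, rows):
    """Return (status, content_type, body) for an export request."""
    # Read export_format, never "format", which DRF reserves for renderer selection
    fmt = query_params.get("export_format", "")
    if fmt != "csv":
        error = '{"status": "error", "message": "unsupported export_format"}'
        return 400, "application/json", error
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in FIELDS instead of raising
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return 200, "text/csv", buffer.getvalue()
</code></pre>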
<hr />
<h2>React SPA: cookies done right</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x469m9o2eqh7iaxrovtm.png" alt="Dashboard" /></p>
<h3>Why httpOnly cookies?</h3>
<p>JavaScript cannot read httpOnly cookies, so an XSS payload cannot exfiltrate the tokens the way it could if they lived in <code>localStorage</code>. The browser attaches the cookies automatically on same-site requests and on cross-site requests that are configured for credentials; refresh can rotate both tokens server-side.</p>
<h3>Boot sequence</h3>
<pre><code class="language-plaintext">App mounts → GET /auth/me (cookies attached)
  → 200: hydrate user context
  → 401: POST /auth/refresh/web
       → success: retry /auth/me
       → failure: clear client state, redirect to landing
</code></pre>
<h3>CSRF for cross-origin cookie auth</h3>
<p>The SPA calls <strong><code>GET /auth/csrf</code></strong> early, mirrors the token into <code>X-CSRFToken</code> on <code>POST</code>/<code>DELETE</code>/<code>PUT</code>/<code>PATCH</code>, and relies on trusted origins + cookie flags in production.</p>
<h3>Silent retry wrapper</h3>
<p>A thin <code>apiFetch</code> helper: on <strong>401</strong>, attempt cookie refresh once, then replay the original request. Users stay signed in until the refresh token itself expires.</p>
<hr />
<h2>CLI: OAuth in the terminal</h2>
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tqlmnwpvqhhupkjg8msg.png" alt="CLI" /></p>
<h3>Loopback login</h3>
<pre><code class="language-bash">insighta login --api-url https://api.example.com
</code></pre>
<p>PKCE + random state → temporary localhost server → browser authorization → POST consolidated token exchange → credentials file.</p>
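<p>The temporary localhost server in that chain only has to live for one request. A minimal standard-library sketch, assuming the redirect lands on <code>127.0.0.1:8765</code>; <code>wait_for_callback</code> is an illustrative name, not the CLI's actual function:</p>
<pre><code class="language-python">from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def wait_for_callback(expected_state, host="127.0.0.1", port=8765):
    """Serve exactly one request and return the authorization code from its query string."""
    captured = {}

    class CallbackHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            captured["code"] = params.get("code", [None])[0]
            captured["state"] = params.get("state", [None])[0]
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(b"Login complete. You can close this tab.")

        def log_message(self, *args):
            pass  # keep the terminal quiet

    with HTTPServer((host, port), CallbackHandler) as server:
        server.handle_request()  # blocks until the browser redirect arrives

    # Reject callbacks that do not echo the state we generated
    if captured.get("state") != expected_state:
        raise RuntimeError("state mismatch, aborting login")
    if not captured.get("code"):
        raise RuntimeError("no authorization code in callback")
    return captured["code"]
</code></pre>
<p>Binding to the loopback address keeps the listener unreachable from other machines, and leaving the <code>with</code> block closes the socket as soon as the code is captured.</p>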
<h3>Automatic refresh</h3>
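<p>HTTP wrapper sketch (<code>self.base_url</code> is an assumed attribute name):</p>
<pre><code class="language-python">def request(self, method, path, **kwargs):
    url = f"{self.base_url}{path}"  # base_url: assumed attribute holding the API root
    r = self.http.request(method, url, headers=self._headers(), **kwargs)
    if r.status_code == 401 and self.refresh_token:
        # Access token expired: refresh once, then replay the original request
        self._do_refresh()
        r = self.http.request(method, url, headers=self._headers(), **kwargs)
    return r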
</code></pre>
<p>Persist new tokens so the next invocation stays logged in.</p>
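<p>One way to persist them safely is an atomic write with owner-only permissions. This is a sketch under assumptions: the file location and token keys are up to the CLI, and <code>save_tokens</code> / <code>load_tokens</code> are hypothetical helper names:</p>
<pre><code class="language-python">import json
import os
import tempfile


def save_tokens(path, tokens):
    """Write tokens atomically; the file is readable by the owner only (POSIX)."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # mkstemp creates the file with mode 0o600, so tokens are never world-readable
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".credentials-")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(tokens, handle)
    os.replace(tmp_path, path)  # atomic swap, no half-written file after a crash


def load_tokens(path):
    """Return the saved tokens, or None when the user has never logged in."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
</code></pre>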
<p><img src="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yl6tb9yuzy4zlh88gdav.png" alt="profiles list" /></p>
<h3>Commands (typical)</h3>
<pre><code class="language-shell">insighta login
insighta logout
insighta whoami
insighta profiles list
insighta profiles search
insighta profiles show &lt;uuid&gt;
insighta profiles create      # admin
insighta profiles delete      # admin
insighta profiles export      # CSV via export_format=
insighta classify "Ada Lovelace"
</code></pre>
<p>Rich handles tables, spinners, and readable HTTP errors — the CLI is a <strong>product</strong>, not a thin <code>curl</code> script.</p>
<hr />
<h2>Deployment</h2>
<table>
<thead>
<tr>
<th>Piece</th>
<th>Platform</th>
<th>Notes</th>
</tr>
</thead>
<tbody><tr>
<td>API</td>
<td>Leapcell</td>
<td>Gunicorn, managed PostgreSQL, TLS termination</td>
</tr>
<tr>
<td>SPA</td>
<td>Vercel</td>
<td>Static assets + <code>vercel.json</code> rewrites for client routing</td>
</tr>
<tr>
<td>CLI</td>
<td>Local / PyPI</td>
<td><code>pip install</code> or <code>python -m insighta_cli</code></td>
</tr>
</tbody></table>
<h3>Backend env (representative)</h3>
<pre><code class="language-properties">DJANGO_SECRET_KEY
JWT_SIGNING_KEY
DATABASE_URL
GITHUB_CLIENT_ID
GITHUB_CLIENT_SECRET
GITHUB_CLI_CLIENT_ID
GITHUB_CLI_CLIENT_SECRET
BACKEND_PUBLIC_URL
WEB_PORTAL_ORIGIN
INSIGHTA_CLI_OAUTH_REDIRECT
</code></pre>
<h3>Cross-origin cookies</h3>
<p>SPA on Vercel talking to API elsewhere requires:</p>
<ul>
<li><code>CORS_ALLOW_CREDENTIALS = True</code>  </li>
<li>Explicit <code>CORS_ALLOWED_ORIGINS</code> (no <code>*</code> with credentials)  </li>
<li><code>CSRF_COOKIE_SAMESITE = "None"</code> and <code>CSRF_COOKIE_SECURE = True</code> in production  </li>
<li><code>CSRF_TRUSTED_ORIGINS</code> including the SPA origin</li>
</ul>
<p>Local <code>DEBUG=True</code> can relax <code>SameSite</code> / <code>Secure</code> so <code>http://localhost</code> stays ergonomic.</p>
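<p>Put together, the relevant settings might look like the fragment below. Treat it as a sketch: it assumes the <code>CORS_*</code> names come from django-cors-headers, <code>DJANGO_DEBUG</code> is an invented variable name, and the default origin is a placeholder:</p>
<pre><code class="language-python">import os

# DJANGO_DEBUG is an assumed variable name; the default origin is a placeholder
DEBUG = os.environ.get("DJANGO_DEBUG", "false").lower() == "true"
WEB_PORTAL_ORIGIN = os.environ.get("WEB_PORTAL_ORIGIN", "https://app.example.com")

# django-cors-headers: credentials require an explicit origin list, never "*"
CORS_ALLOW_CREDENTIALS = True
CORS_ALLOWED_ORIGINS = [WEB_PORTAL_ORIGIN]

# Django CSRF: cross-site cookies need SameSite=None plus Secure in production
CSRF_TRUSTED_ORIGINS = [WEB_PORTAL_ORIGIN]
CSRF_COOKIE_SAMESITE = "Lax" if DEBUG else "None"
CSRF_COOKIE_SECURE = not DEBUG
</code></pre>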
<hr />
<h2>Lessons learned</h2>
<ol>
<li><strong>Reserved framework words</strong> — <code>format</code> looked innocent; DRF disagreed. When behavior is “impossible,” read how negotiation runs before your view.  </li>
<li><strong>Two OAuth apps</strong> — simpler and clearer than fighting callback URL constraints for browser vs CLI.  </li>
<li><strong>Rate limit by responsibility</strong> — IP buckets where there is no user identity yet; per-user throttles after JWT.  </li>
<li><strong>Cookies in SPAs</strong> — CSRF + CORS + cookie flags are real work, but XSS-resistant token storage is worth it.  </li>
<li><strong>Small-domain NL</strong> — rules beat LLMs when inputs are bounded and outputs must be exact.  </li>
<li><strong>CLI quality bar</strong> — OAuth, refresh, and UX decide whether developers keep the tool installed.</li>
</ol>
<hr />
<h2>Wrapping up</h2>
<p>Insighta grew from a Stage 1-shaped exercise into a coherent platform: three clients, two OAuth registrations, layered security, and search that never phones home to an inference API.</p>
<p>Repos:</p>
<ul>
<li><strong>Backend</strong>: <a href="https://github.com/Trojanhorse7/insighta-backend">github.com/Trojanhorse7/insighta-backend</a>  </li>
<li><strong>Frontend</strong>: <a href="https://github.com/Trojanhorse7/insighta-frontend">github.com/Trojanhorse7/insighta-frontend</a>  </li>
<li><strong>CLI</strong>: <a href="https://github.com/Trojanhorse7/insighta-cli">github.com/Trojanhorse7/insighta-cli</a></li>
</ul>
<p>If you’re wiring cookie auth to a remote API, splitting OAuth between browser and terminal, or staring at a “404” that should be a 200 — I hope this saves you a night of debugging.</p>
<h3>What I’d iterate on next</h3>
<ul>
<li><strong>Observability</strong> — structured request IDs end-to-end from SPA → API → DB slow queries.  </li>
<li><strong>Webhook-style exports</strong> — async CSV generation for huge datasets instead of holding the connection open.  </li>
<li><strong>Parser fuzzing</strong> — generative tests on the NL layer so odd unicode and punctuation never slip past normalization.</li>
</ul>
<hr />
<p><em>Questions or corrections? Comment below or open an issue on any of the repos.</em></p>
]]></content:encoded></item></channel></rss>