Elixir / Phoenix Learnings — Cortex Status Dashboard

Patterns, gotchas, and reference notes from building the cortex_status Phoenix app.

Project: cortex_status

  • Location on cortex: /data/cortex_status/
  • Service: symbiont-ex-api.service (systemd)
  • Port: 4000 (behind Caddy at status.hydrascale.net)
  • Framework: Phoenix 1.7.14, LiveView 1.0.0, LiveDashboard 0.8.4

LiveView Patterns

PubSub for Real-Time Updates

The status page subscribes to PubSub topics on mount and receives broadcast updates:

def mount(_params, _session, socket) do
  if connected?(socket) do
    Phoenix.PubSub.subscribe(CortexStatus.PubSub, "service_status")
  end
  {:ok, assign(socket, ...)}
end

def handle_info({:status_update, new_status}, socket) do
  {:noreply, assign(socket, status: new_status)}
end
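The broadcasting side isn't shown above. In the app it would be a call like Phoenix.PubSub.broadcast(CortexStatus.PubSub, "service_status", {:status_update, status}) from whichever process collects service status (the notes don't say which). To watch the message flow without a Phoenix app, here is a minimal sketch that uses Elixir's built-in Registry as a stand-in for Phoenix.PubSub. StatusBus is a hypothetical name. The part that carries over is the message shape: the broadcast tuple must match the handle_info/2 clause exactly.

```elixir
defmodule StatusBus do
  # Hypothetical stand-in for Phoenix.PubSub, built on Elixir's Registry.
  # A :duplicate registry lets many processes register under the same key (the topic).
  @registry StatusBus.Registry

  def start_link do
    Registry.start_link(keys: :duplicate, name: @registry)
  end

  # Like Phoenix.PubSub.subscribe/2: the calling process registers under the topic.
  def subscribe(topic) do
    {:ok, _owner} = Registry.register(@registry, topic, nil)
    :ok
  end

  # Like Phoenix.PubSub.broadcast/3: send the message to every subscriber of the topic.
  def broadcast(topic, message) do
    Registry.dispatch(@registry, topic, fn entries ->
      for {pid, _value} <- entries, do: send(pid, message)
    end)
  end
end
```

A subscriber receives exactly the {:status_update, status} tuple that was broadcast. A message with any other shape would raise FunctionClauseError in the LiveView unless it also defines a catch-all handle_info/2 clause.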

Polling Pattern (Process.send_after)

For polling an external API from a LiveView (e.g., task progress):

@poll_interval 1_000

# Start polling (e.g., in the handle_event that submits the task)
timer = Process.send_after(self(), :poll_task, @poll_interval)
{:noreply, assign(socket, poll_timer: timer)}

# Handle poll
def handle_info(:poll_task, socket) do
  case fetch_progress(socket.assigns.task_id) do
    {:ok, data} when is_map(data) ->
      terminal = data["status"] in ["completed", "failed"]
      timer = if terminal, do: nil, else: Process.send_after(self(), :poll_task, @poll_interval)
      {:noreply, assign(socket, data: data, poll_timer: timer)}
    _ ->
      # Don't crash on bad data — keep polling
      timer = Process.send_after(self(), :poll_task, @poll_interval)
      {:noreply, assign(socket, poll_timer: timer)}
  end
end
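The loop can also be exercised outside LiveView. Below is a minimal sketch, assuming a hypothetical TaskPoller GenServer and a caller-supplied fetch function that stands in for the app's fetch_progress/1. It follows the same rules as the LiveView version: reschedule after a bad or non-terminal response, and stop scheduling once status is "completed" or "failed".

```elixir
defmodule TaskPoller do
  # Hypothetical GenServer mirroring the LiveView poll loop above.
  use GenServer

  @terminal ["completed", "failed"]

  # fetch: 0-arity function standing in for fetch_progress(task_id);
  # expected to return {:ok, map} on success, anything else on failure.
  def start_link(fetch, interval_ms \\ 1_000) do
    GenServer.start_link(__MODULE__, {fetch, interval_ms})
  end

  def state(pid), do: GenServer.call(pid, :state)

  @impl true
  def init({fetch, interval}) do
    timer = Process.send_after(self(), :poll_task, interval)
    {:ok, %{fetch: fetch, interval: interval, data: nil, polls: 0, poll_timer: timer}}
  end

  @impl true
  def handle_info(:poll_task, state) do
    state = %{state | polls: state.polls + 1}

    case state.fetch.() do
      {:ok, data} when is_map(data) ->
        # Terminal status: store the data and stop scheduling.
        timer = if data["status"] in @terminal, do: nil, else: schedule(state.interval)
        {:noreply, %{state | data: data, poll_timer: timer}}

      _bad ->
        # Bad or failed response: keep polling and keep the last good data.
        {:noreply, %{state | poll_timer: schedule(state.interval)}}
    end
  end

  @impl true
  def handle_call(:state, _from, state), do: {:reply, Map.drop(state, [:fetch]), state}

  defp schedule(interval), do: Process.send_after(self(), :poll_task, interval)
end
```

Injecting fetch as a function keeps the loop testable: a scripted sequence such as error, "running", "completed" should produce exactly three polls and leave poll_timer as nil.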

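Gotcha: When the LiveView process exits, the runtime cancels timers addressed to it, so a closed tab needs no cleanup. But when polling should stop while the LiveView stays alive (logout, starting a new task), cancel the timer explicitly. Otherwise a stale :poll_task still arrives, or a second loop runs alongside the first: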

if socket.assigns.poll_timer, do: Process.cancel_timer(socket.assigns.poll_timer)
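A nil-safe helper can also handle the "already fired" case. Process.cancel_timer/1 returns false when the timer has already expired, and the :poll_task message may already be in the mailbox. The sketch below (hypothetical PollTimer module) flushes that message on a best-effort basis. Per the Erlang docs, a false result doesn't guarantee the message has arrived yet.

```elixir
defmodule PollTimer do
  # Hypothetical helper: nil-safe wrapper around Process.cancel_timer/1.
  # Returns nil so the caller can write: assign(socket, poll_timer: PollTimer.cancel(timer))
  def cancel(nil), do: nil

  def cancel(timer) when is_reference(timer) do
    case Process.cancel_timer(timer) do
      false ->
        # Timer already fired: drop a :poll_task that is already in the mailbox.
        # Best effort only; the message could in principle still be in transit.
        receive do
          :poll_task -> :ok
        after
          0 -> :ok
        end

      _remaining_ms ->
        :ok
    end

    nil
  end
end
```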

Component Attrs and Passing Assigns

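Inside a function component, read values from that component's own assigns (declared with attr). To hand the parent's whole assigns map to a sub-component, pass it under a named attribute. The trade-off is that LiveView can't change-track the fields inside a map passed this way, so the sub-component re-renders whenever anything in the map changes. Passing only the fields it needs (e.g. prompt={@prompt}) avoids that.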

# Don't try to pass @assigns directly — use a named prop
<.auth_gate socket_assigns={assigns} />

defp auth_gate(assigns) do
  ~H"""
  <%= @socket_assigns.prompt %>
  """
end

phx-change Goes on the Form, Not Individual Inputs

# WRONG: a bare textarea with no enclosing <form> never sends phx-change
<textarea phx-change="update_prompt" name="prompt"></textarea>

# RIGHT: phx-change on the form, inputs trigger it
<form phx-submit="submit" phx-change="update">
  <textarea name="prompt"><%= @prompt %></textarea>
</form>

Enum.with_index Returns {element, index}

# WRONG — destructuring is backwards:
for {index, element} <- Enum.with_index(list)

# RIGHT:
for {element, index} <- Enum.with_index(list)
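Getting the order wrong doesn't raise. Both patterns match any two-element tuple, so the names are silently swapped and only the output is wrong. Here is a quick check in plain Elixir, using a hypothetical Numbered module and the optional offset argument for 1-based numbering:

```elixir
defmodule Numbered do
  # Hypothetical helper: render "1. item" lines from a list.
  # Enum.with_index/2 yields {element, index}; the second argument is the starting index.
  def numbered(list) do
    for {element, index} <- Enum.with_index(list, 1), do: "#{index}. #{element}"
  end

  # The backwards pattern still matches every tuple; the names are just swapped.
  def numbered_wrong(list) do
    for {index, element} <- Enum.with_index(list, 1), do: "#{index}. #{element}"
  end
end
```

Numbered.numbered(["caddy", "phoenix"]) returns ["1. caddy", "2. phoenix"], while numbered_wrong/1 returns ["caddy. 1", "phoenix. 2"].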

LiveDashboard Custom Pages

Basic Structure

defmodule MyApp.DashboardPages.MyPage do
  use Phoenix.LiveDashboard.PageBuilder

  @impl true
  def menu_link(_, _), do: {:ok, "Page Title"}

  @impl true
  def render_page(_assigns) do
    {:ok, row(components: [card(value: "Hello", inner_title: "Title")])}
  end
end

Registration in Router

live_dashboard "/dashboard",
  metrics: MyAppWeb.Telemetry,
  additional_pages: [
    my_page: MyApp.DashboardPages.MyPage
  ]

Available Components

  • card(value:, inner_title:) — simple KV display
  • table(columns:, rows:, id:, title:, row_attrs:) — data table
  • row(components:) — horizontal layout
  • columns(columns:) — multi-column layout

Gotcha: row_attrs must be a function: fn row -> [{"data-id", row.id}] end


DateTime Gotchas

DateTime.from_iso8601 Returns {:error, reason}, Not :error

# WRONG:
case DateTime.from_iso8601(str) do
  {:ok, dt, _} -> dt
  :error -> str          # Never matches; an invalid string raises CaseClauseError instead!
end

# RIGHT:
case DateTime.from_iso8601(str) do
  {:ok, dt, _} -> dt
  {:error, _} -> str     # Correct error tuple
end
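The same logic as a reusable helper (the Timestamps module name is hypothetical). One case worth knowing: a timestamp with no UTC offset, such as "2024-05-01T12:00:00", returns {:error, :missing_offset}, so it lands in the fallback clause. Timestamps that carry an offset are converted to UTC.

```elixir
defmodule Timestamps do
  # Hypothetical helper: parse an ISO 8601 timestamp, or fall back to the raw string.
  def parse_or_raw(str) when is_binary(str) do
    case DateTime.from_iso8601(str) do
      # The third element is the original UTC offset in seconds; dt itself is in UTC.
      {:ok, dt, _offset_seconds} -> dt
      # e.g. {:error, :missing_offset} or {:error, :invalid_format}
      {:error, _reason} -> str
    end
  end
end
```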

HTTP Calls from LiveView/GenServer

Using Req Library

# GET
case Req.get("http://localhost:8111/status", receive_timeout: 5000) do
  {:ok, %{status: 200, body: body}} when is_map(body) -> {:ok, body}
  {:ok, %{status: 200, body: body}} -> {:error, "non-JSON body: #{inspect(body)}"}
  {:ok, %{status: code}} -> {:error, "HTTP #{code}"}
  {:error, reason} -> {:error, reason}
end

# POST with JSON
case Req.post(url, json: %{"prompt" => prompt}) do
  {:ok, %{status: 200, body: body}} -> {:ok, body}
  ...
end

Gotcha: Req decodes the body automatically only when the response content-type is application/json; otherwise body is a plain string. Handle that case explicitly (the second clause of the GET example). Without it, a 200 with a string body falls through and is reported as "HTTP 200".


Application Configuration

Reading Config at Runtime (Not Compile-Time)

Use function calls instead of module attributes for config that may change:

# WRONG — baked in at compile time:
@symbiont_url Application.get_env(:cortex_status, :services)[:symbiont_url]

# RIGHT — reads at runtime:
defp symbiont_url do
  config = Application.get_env(:cortex_status, :services, [])
  Keyword.get(config, :symbiont_url, "http://127.0.0.1:8111")
end
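A self-contained demonstration of the difference, using only the Elixir standard library. The URLs are made up and SymbiontConfig is a hypothetical module name. The script sets config, compiles a module that uses both patterns, and then the config can be changed to see which one notices.

```elixir
# Simulate config that exists when the module below is compiled.
Application.put_env(:cortex_status, :services, symbiont_url: "http://old-host:8111")

defmodule SymbiontConfig do
  # WRONG pattern, for contrast: read once, when this module is compiled.
  # (Elixir may emit a compile warning here suggesting Application.compile_env/3.)
  @baked_url Application.get_env(:cortex_status, :services)[:symbiont_url]
  def baked_url, do: @baked_url

  # RIGHT pattern from the notes: read on every call, with a default.
  def symbiont_url do
    config = Application.get_env(:cortex_status, :services, [])
    Keyword.get(config, :symbiont_url, "http://127.0.0.1:8111")
  end
end
```

After Application.put_env(:cortex_status, :services, symbiont_url: "http://new-host:8111"), baked_url/0 still returns the old URL while symbiont_url/0 returns the new one.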

os_mon for System Metrics

The app uses Erlang's os_mon for host-level metrics (requires :os_mon in extra_applications):

cpu_load = :cpu_sup.avg1() / 256  # 1-min load average (avg1 returns load * 256); can exceed 1.0
{mem_total, mem_alloc, _} = :memsup.get_memory_data()  # sizes in bytes
disk_data = :disksup.get_disk_data()  # [{mount, total_kb, percent_used}]
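To turn those raw readings into dashboard numbers, here is a minimal sketch with a hypothetical HostMetrics module. It keeps the helpers pure (raw values in, numbers out) so they can be tested without starting :os_mon. Dividing by System.schedulers_online/0 as a proxy for core count is an assumption; it usually matches the logical core count.

```elixir
defmodule HostMetrics do
  # Hypothetical helpers that post-process raw os_mon readings.

  # :cpu_sup.avg1/0 returns the 1-minute load average multiplied by 256.
  # Dividing by 256 gives the load average; dividing again by the scheduler
  # count gives a rough per-core figure (1.0 means roughly every core busy).
  def load_per_core(avg1_raw, schedulers \\ System.schedulers_online()) do
    avg1_raw / 256 / schedulers
  end

  # :memsup.get_memory_data/0 returns {total, allocated, worst}, sizes in bytes.
  def memory_used_fraction({total, allocated, _worst}) when total > 0, do: allocated / total

  # :disksup.get_disk_data/0 returns [{mount, total_kib, percent_used}]; mount is a charlist.
  def full_disks(disk_data, threshold_percent \\ 90) do
    for {mount, _total_kib, percent} <- disk_data, percent >= threshold_percent, do: to_string(mount)
  end
end
```

In the app these would be fed directly, e.g. HostMetrics.load_per_core(:cpu_sup.avg1()).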

Release & Deployment

Build Commands

cd /data/cortex_status
MIX_ENV=prod mix deps.get
MIX_ENV=prod mix compile
MIX_ENV=prod mix assets.deploy  # tailwind + esbuild
MIX_ENV=prod mix release --overwrite
systemctl restart symbiont-ex-api

Environment Variables (runtime.exs)

  • SECRET_KEY_BASE — required in prod
  • PHX_HOST — defaults to cortex.hydrascale.net
  • PORT — defaults to 4000
  • SYMBIONT_URL — override Symbiont API base URL
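For orientation, a runtime.exs that honors those four variables might look roughly like this. It is a sketch modeled on the Phoenix generator's defaults, not the project's actual file. The :services key is assumed to match the Application.get_env(:cortex_status, :services, []) call used elsewhere in these notes.

```elixir
import Config

if config_env() == :prod do
  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise "environment variable SECRET_KEY_BASE is missing"

  host = System.get_env("PHX_HOST") || "cortex.hydrascale.net"
  port = String.to_integer(System.get_env("PORT") || "4000")

  config :cortex_status, CortexStatusWeb.Endpoint,
    url: [host: host, port: 443, scheme: "https"],
    http: [ip: {127, 0, 0, 1}, port: port],
    secret_key_base: secret_key_base,
    # explicit origin, no port (see the check_origin notes below)
    check_origin: ["https://" <> host]

  # Only override the Symbiont URL when the variable is set;
  # otherwise the code falls back to its own default.
  if symbiont_url = System.get_env("SYMBIONT_URL") do
    config :cortex_status, :services, symbiont_url: symbiont_url
  end
end
```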

check_origin: The LiveView Connection Killer

When LiveView connections silently fail (liveSocket.isConnected() returns false, _mount_attempts climbs into the thousands), the most likely culprit is a check_origin mismatch. Phoenix checks the HTTP Origin header against its configured URL.

Symptom: Phoenix logs show:

[error] Could not check origin for Phoenix.Socket transport.
Origin of the request: https://cortex.hydrascale.net

Root cause: Setting url: [host: h, port: 443, scheme: "https"] in runtime.exs causes Phoenix to expect an origin of https://cortex.hydrascale.net:443, but browsers send https://cortex.hydrascale.net (no port — 443 is implicit for HTTPS). String comparison fails.

Fix: Explicitly set check_origin in the endpoint config in runtime.exs:

config :cortex_status, CortexStatusWeb.Endpoint,
  url: [host: host, port: 443, scheme: "https"],
  http: [ip: {127, 0, 0, 1}, port: port],
  secret_key_base: secret_key_base,
  check_origin: ["https://cortex.hydrascale.net"]   # ← explicit, no port

Note: In this deployment LiveView ends up on the longpoll transport, since WebSocket upgrades may not make it through every Caddy config. Longpoll is fully supported and behaves the same for most purposes, at the cost of slightly more latency.


LiveView Auth Gate Pattern

Password-Protected LiveView Pages

For pages that need a simple password gate (like Mission Control), use assigns to track auth state and conditionally render:

def mount(_params, _session, socket) do
  {:ok, assign(socket, authenticated: false, error: nil, prompt: "")}
end

def handle_event("authenticate", %{"password" => pw}, socket) do
  if pw == Application.get_env(:my_app, :task_password) do
    {:noreply, assign(socket, authenticated: true, error: nil)}
  else
    {:noreply, assign(socket, error: "Invalid password")}
  end
end

In the template, wrap content with <%= if @authenticated do %>.

Gotcha: The password check runs server-side in the LiveView process, so the expected password never reaches the browser. The rendered HTML does, though: LiveView renders on the server and sends the result to the client. mount/3 also runs twice, once for the initial static HTTP render and again when the socket connects, and both start unauthenticated. Don't put sensitive data in the assigns until after authentication.


LiveView Silent Failures — Always Show Error Feedback

The Problem

When a LiveView event handler hits an error (API call fails, validation error, etc.) and you just return {:noreply, socket} without updating assigns, the user sees nothing happen. The button appears to do nothing. This is extremely confusing.

The Fix Pattern

Always maintain an error assign and display it:

def handle_event("submit_task", %{"prompt" => prompt}, socket) do
  case submit_to_api(prompt) do
    {:ok, task_id} ->
      {:noreply, assign(socket, task_id: task_id, error: nil)}
    {:error, reason} ->
      {:noreply, assign(socket, error: "Task failed: #{reason}")}
  end
end

In the template:

<%= if @error do %>
  <div class="alert alert-danger"><%= @error %></div>
<% end %>

Lesson learned the hard way: The Mission Control "Execute" button did nothing for a while because the Symbiont API was rejecting the auth token, but the error was swallowed silently. Always surface errors to the UI.


API Authentication from LiveView

Bearer Token vs Query Param vs JSON Body

When calling external APIs from LiveView, be careful about where auth tokens go. FastAPI's Depends() for auth reads from specific locations — if the API expects a query param and you send it in the JSON body, auth silently fails.

Preferred pattern: Use Authorization: Bearer <token> header — it's unambiguous:

headers = [{"authorization", "Bearer #{token}"}, {"content-type", "application/json"}]
case Req.post(url, json: payload, headers: headers) do
  {:ok, %{status: 200, body: body}} -> {:ok, body}
  {:ok, %{status: code, body: body}} -> {:error, "HTTP #{code}: #{inspect(body)}"}
  {:error, reason} -> {:error, inspect(reason)}
end

Gotcha: Always match on the status code, not just {:ok, _}. A 401 or 500 response is still {:ok, %Req.Response{}} — it's only :error if the HTTP request itself fails (timeout, DNS, connection refused).
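The header and the status matching can be pulled into small pure functions and tested with plain maps instead of live HTTP calls. SymbiontClient and its functions are hypothetical names for this sketch. It relies on the fact that Req's json: option sets the content-type header itself, so only the authorization header needs adding.

```elixir
defmodule SymbiontClient do
  # Hypothetical helpers around the Req.post/2 call above.

  # Bearer token in the Authorization header.
  def auth_headers(token) when is_binary(token) do
    [{"authorization", "Bearer " <> token}]
  end

  # Classify a Req result. A 401 or 500 still arrives as {:ok, response};
  # only transport failures (timeout, DNS, refused connection) are {:error, _}.
  def classify({:ok, %{status: 200, body: body}}) when is_binary(body),
    do: {:error, "HTTP 200 but body was not decoded as JSON: #{inspect(body)}"}

  def classify({:ok, %{status: 200, body: body}}), do: {:ok, body}

  def classify({:ok, %{status: status, body: body}}),
    do: {:error, "HTTP #{status}: #{inspect(body)}"}

  def classify({:error, reason}), do: {:error, inspect(reason)}
end
```

In the LiveView this would read Req.post(url, json: payload, headers: SymbiontClient.auth_headers(token)) |> SymbiontClient.classify(), and the {:error, message} branch feeds straight into the error assign.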


LiveView Form Gotchas (Expanded)

phx-submit Doesn't Fire Without a Submit Button

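If your form has phx-submit="do_thing" but no <button type="submit">, pressing Enter often won't submit it. Browsers only submit implicitly without a button when the form has a single text-like input, and Enter in a <textarea> inserts a newline instead. Add a submit button.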

Textarea Value Persistence

When using phx-change on a form with a textarea, the server receives the current value on every keystroke. If your handle_event for phx-change doesn't re-assign the textarea value, it can appear to reset or flicker:

# In handle_event("update", params, socket):
def handle_event("update", %{"prompt" => prompt}, socket) do
  {:noreply, assign(socket, prompt: prompt)}  # ← must re-assign
end

Form Params Are Always Strings

All form params arrive as strings, even for number inputs:

# WRONG:
def handle_event("set_count", %{"count" => count}, socket) when is_integer(count)
# This clause NEVER matches — count is always a string

# RIGHT:
def handle_event("set_count", %{"count" => count_str}, socket) do
  count = String.to_integer(count_str)
  {:noreply, assign(socket, count: count)}
end
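String.to_integer/1 raises ArgumentError on "" or "abc", which crashes the LiveView process on bad input. A gentler sketch uses Integer.parse/1 and insists the whole string was consumed. FormParams.parse_int/1 is a hypothetical name.

```elixir
defmodule FormParams do
  # Hypothetical helper: parse a form string into an integer without raising.
  def parse_int(str) when is_binary(str) do
    case Integer.parse(String.trim(str)) do
      # Whole string consumed: a clean integer.
      {int, ""} -> {:ok, int}
      # Empty input, or trailing junk such as "12abc".
      _ -> :error
    end
  end
end
```

In the handler, {:ok, count} gets assigned and :error sets the error assign, so the user sees feedback instead of a crashed view.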

LiveDashboard Gotchas

table() Component Limitations

The LiveDashboard table() component expects very specific data shapes and can be finicky with dynamic data. If your data doesn't fit cleanly, use card() with formatted text instead — it's more flexible and less error-prone.

What went wrong: Mission Control initially tried to use table() for task display but hit issues with dynamic columns. Switched to card() with pre-formatted text, which worked immediately.

Custom Page render_page/1 Returns Tuples

render_page/1 must return {:ok, component_tree}, not just a component:

# WRONG:
def render_page(assigns), do: row(components: [...])

# RIGHT:
def render_page(_assigns), do: {:ok, row(components: [...])}

Debugging LiveView Connections

Diagnosis Checklist (when LiveView "doesn't work")

  1. Check liveSocket.isConnected() in browser console — false means the WebSocket/longpoll connection failed
  2. Check _mount_attempts — if climbing into thousands, it's retrying and failing
  3. Check Phoenix logs for check_origin errors (most common cause)
  4. Check Caddy/reverse proxy — WebSocket upgrade headers may be stripped
  5. Check runtime.exs — host/port/scheme must match the actual public URL
  6. Use Dendrite to automate this: navigate to the page, run liveSocket.isConnected() via JS, check the result programmatically

Longpoll vs WebSocket

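Phoenix LiveView supports both transports. WebSocket is the default; behind Caddy, longpoll is often more reliable. In app.js:

let liveSocket = new LiveSocket("/live", Socket, {
  longPollFallbackMs: 2500,  // switch to longpoll if WebSocket hasn't connected within 2.5 s
  params: {_csrf_token: csrfToken},
  // transport: LongPoll     // uncomment to force longpoll (also import LongPoll from "phoenix")
})

The automatic fallback only happens when longPollFallbackMs is set (recent Phoenix 1.7 generators include it). With it in place, a silently failing WebSocket is replaced by longpoll, which is fine for most use cases.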


Caddy + Phoenix Integration Notes

Reverse Proxy Config

cortex.hydrascale.net {
    reverse_proxy localhost:4000
    encode gzip
}

Important: Caddy handles TLS termination. Phoenix should listen on plain HTTP (127.0.0.1 only). Don't configure Phoenix for HTTPS — let Caddy do it.

The Self-Check Trap

If your Phoenix status page monitors URLs including its own domain (e.g., cortex.hydrascale.net), the HTTP request goes through Caddy, back to Phoenix, creating a circular dependency that times out. The timeout handler may also lose the site name, producing mysterious "?" entries.

Fix: Don't have the app check itself. If the status page is loading, it's up.
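One way to handle both problems, as a sketch: filter out the app's own host up front, and run the checks with Task.async_stream/3. Its results come back in input order, so zipping them with the input list keeps each name attached even when a check times out. SiteChecker and the check_fun stand-in are hypothetical; the notes don't show the real checker.

```elixir
defmodule SiteChecker do
  # Hypothetical sketch: check sites concurrently, skip the app's own host,
  # and keep each site's name attached to its result.

  # check_fun: 1-arity function taking a site, returning e.g. :up or :down
  # (stands in for the real HTTP check; it should rescue its own errors,
  # because a crashing task also takes down the caller).
  def check_all(sites, own_host, check_fun, timeout_ms \\ 5_000) do
    sites = Enum.reject(sites, &(&1 == own_host))

    results =
      Task.async_stream(sites, check_fun,
        timeout: timeout_ms,
        # Kill slow checks and emit {:exit, :timeout} instead of crashing the caller.
        on_timeout: :kill_task
      )

    # Results are ordered like the input, so zipping restores the names.
    sites
    |> Enum.zip(results)
    |> Enum.map(fn
      {site, {:ok, status}} -> {site, status}
      {site, {:exit, :timeout}} -> {site, :timeout}
    end)
  end
end
```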


Release Build Gotchas

Mix Release vs Mix Run

In development, run mix phx.server (or iex -S mix phx.server). In production, always use a release build:

MIX_ENV=prod mix deps.get
MIX_ENV=prod mix compile
MIX_ENV=prod mix assets.deploy
MIX_ENV=prod mix release --overwrite

Gotcha: mix assets.deploy must run BEFORE mix release. The release bundles the compiled assets — if you skip this step, the release will serve stale CSS/JS or no assets at all.

Config Hierarchy Matters

config/config.exs   → compile-time defaults (all envs)
config/dev.exs      → compile-time dev overrides
config/prod.exs     → compile-time prod overrides
config/runtime.exs  → runtime config (reads env vars, runs at boot)

Critical: Application.get_env/3 in module attributes (@foo Application.get_env(...)) reads at compile time. Use Application.compile_env/3 to make this explicit, or better yet, read config in a function that runs at runtime.

SECRET_KEY_BASE

Required in prod. Generate one with mix phx.gen.secret, then set it as an environment variable or hardcode it in runtime.exs; on a single-server deploy where the .env file is secured, this is fine.