# Elixir / Phoenix Learnings — Cortex Status Dashboard
Patterns, gotchas, and reference notes from building the cortex_status Phoenix app.
## Project: cortex_status
- **Location on cortex**: `/data/cortex_status/`
- **Service**: `symbiont-ex-api.service` (systemd)
- **Port**: 4000 (behind Caddy at status.hydrascale.net)
- **Framework**: Phoenix 1.7.14, LiveView 1.0.0, LiveDashboard 0.8.4
---
## LiveView Patterns
### PubSub for Real-Time Updates
The status page subscribes to PubSub topics on mount and receives broadcast updates:
```elixir
def mount(_params, _session, socket) do
  # Subscribe only on the connected mount, not during the initial static render
  if connected?(socket) do
    Phoenix.PubSub.subscribe(CortexStatus.PubSub, "service_status")
  end

  {:ok, assign(socket, ...)}
end

def handle_info({:status_update, new_status}, socket) do
  {:noreply, assign(socket, status: new_status)}
end
```
### Polling Pattern (Process.send_after)
For polling an external API from a LiveView (e.g., task progress):
```elixir
@poll_interval 1_000

# Start polling (inside an event handler or handle_info clause)
timer = Process.send_after(self(), :poll_task, @poll_interval)
{:noreply, assign(socket, poll_timer: timer)}

# Handle poll
def handle_info(:poll_task, socket) do
  case fetch_progress(socket.assigns.task_id) do
    {:ok, data} when is_map(data) ->
      # Stop rescheduling once the task reaches a terminal state
      terminal = data["status"] in ["completed", "failed"]
      timer = if terminal, do: nil, else: Process.send_after(self(), :poll_task, @poll_interval)
      {:noreply, assign(socket, data: data, poll_timer: timer)}

    _ ->
      # Don't crash on bad data; keep polling
      timer = Process.send_after(self(), :poll_task, @poll_interval)
      {:noreply, assign(socket, poll_timer: timer)}
  end
end
```
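**Gotcha**: Cancel the timer whenever polling should stop while the LiveView stays alive (logout, task dismissed). `Process.cancel_timer/1` raises on `nil`, so guard the call. If the LiveView process itself exits, pending timer messages are simply discarded.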
```elixir
if socket.assigns.poll_timer, do: Process.cancel_timer(socket.assigns.poll_timer)
```
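The cancel step can be wrapped in a small helper. This is a minimal sketch, not code from the app: `PollTimer` and its function names are hypothetical. It also drains a poll message that may already have been delivered just before the cancel.

```elixir
defmodule PollTimer do
  @moduledoc "Hypothetical helper for scheduling and cancelling a poll message."

  # Schedule `msg` to be sent to the calling process after `interval_ms`.
  def schedule(msg, interval_ms) when is_integer(interval_ms) and interval_ms >= 0 do
    Process.send_after(self(), msg, interval_ms)
  end

  # No timer stored (polling already stopped): nothing to do.
  def cancel(nil, _msg), do: :ok

  def cancel(ref, msg) when is_reference(ref) do
    Process.cancel_timer(ref)

    # If the timer fired just before the cancel, the message is already in the
    # mailbox; drop it so a stale poll does not run.
    receive do
      ^msg -> :ok
    after
      0 -> :ok
    end
  end
end
```

In a LiveView this would be called as `PollTimer.cancel(socket.assigns.poll_timer, :poll_task)` before assigning `poll_timer: nil`.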
### Component Attrs and Passing Assigns
When defining function components with `attr`, use `assigns` directly. For passing whole assigns bundles to sub-components, use a named assign:
```elixir
# Don't try to pass @assigns directly; use a named prop
<.auth_gate socket_assigns={assigns} />

defp auth_gate(assigns) do
  ~H"""
  <%= @socket_assigns.prompt %>
  """
end
```
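### phx-change Needs an Enclosing Form
`phx-change` only fires for inputs inside a `<form>`. Putting it on the form is the simplest setup; it can also go on an individual input, but that input must still live inside a form.
```elixir
# WRONG: a textarea outside any <form> never fires phx-change
<textarea phx-change="update_prompt" name="prompt"></textarea>
# RIGHT: phx-change on the form; every input inside it triggers the event
<form phx-submit="submit" phx-change="update">
  <textarea name="prompt"><%= @prompt %></textarea>
</form>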
```
### Enum.with_index Returns {element, index}
```elixir
# WRONG — destructuring is backwards:
for {index, element} <- Enum.with_index(list)
# RIGHT:
for {element, index} <- Enum.with_index(list)
```
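A quick illustration of the tuple order, including the optional offset argument. The service names are placeholders, not the app's actual list.

```elixir
# Placeholder data for illustration only.
services = ["caddy", "symbiont", "phoenix"]

# Zero-based by default; each tuple is {element, index}.
indexed = Enum.with_index(services)

# Pass an offset to start counting from 1, e.g. for display.
numbered =
  for {name, position} <- Enum.with_index(services, 1) do
    "#{position}. #{name}"
  end
```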
---
## LiveDashboard Custom Pages
### Basic Structure
```elixir
defmodule MyApp.DashboardPages.MyPage do
  use Phoenix.LiveDashboard.PageBuilder

  @impl true
  def menu_link(_, _), do: {:ok, "Page Title"}

  @impl true
  def render_page(_assigns) do
    {:ok, row(components: [card(value: "Hello", inner_title: "Title")])}
  end
end
```
### Registration in Router
```elixir
live_dashboard "/dashboard",
  metrics: MyAppWeb.Telemetry,
  additional_pages: [
    my_page: MyApp.DashboardPages.MyPage
  ]
```
### Available Components
- `card(value:, inner_title:)` — simple KV display
- `table(columns:, rows:, id:, title:, row_attrs:)` — data table
- `row(components:)` — horizontal layout
- `columns(columns:)` — multi-column layout
**Gotcha**: `row_attrs` must be a function: `fn row -> [{"data-id", row.id}] end`
---
## DateTime Gotchas
### DateTime.from_iso8601 Returns {:error, reason}, Not :error
```elixir
# WRONG:
case DateTime.from_iso8601(str) do
  {:ok, dt, _} -> dt
  :error -> str # This clause never matches!
end

# RIGHT:
case DateTime.from_iso8601(str) do
  {:ok, dt, _offset} -> dt
  {:error, _reason} -> str # Correct error tuple
end
```
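Wrapped as a reusable function, the correct match might look like the sketch below. `TimeParse.parse_or_raw/1` is a hypothetical name; the fallback clause for non-string input is an added assumption, since `DateTime.from_iso8601/1` only accepts binaries.

```elixir
defmodule TimeParse do
  # Return a DateTime when the string is valid ISO 8601, otherwise hand the
  # original value back so the template can still display something.
  def parse_or_raw(str) when is_binary(str) do
    case DateTime.from_iso8601(str) do
      {:ok, dt, _utc_offset} -> dt
      {:error, _reason} -> str
    end
  end

  # nil or other non-string values pass through untouched.
  def parse_or_raw(other), do: other
end
```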
---
## HTTP Calls from LiveView/GenServer
### Using Req Library
```elixir
# GET
case Req.get("http://localhost:8111/status", receive_timeout: 5000) do
  {:ok, %{status: 200, body: body}} when is_map(body) -> {:ok, body}
  {:ok, %{status: code}} -> {:error, "HTTP #{code}"}
  {:error, reason} -> {:error, reason}
end

# POST with JSON
case Req.post(url, json: %{"prompt" => prompt}) do
  {:ok, %{status: 200, body: body}} -> {:ok, body}
  ...
end
```
**Gotcha**: Req decodes the body automatically only when the response content-type is JSON. Otherwise `body` arrives as a raw string, so guard with `is_map/1` (as in the GET example) before reading keys from it.
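One way to make the body-shape check explicit is a small classifier over the response. This is a sketch with a hypothetical module name; it pattern-matches on the `status` and `body` fields only, so it accepts a `%Req.Response{}` struct or any map with those keys.

```elixir
defmodule BodyCheck do
  # Decoded JSON object: safe to read keys from.
  def expect_map(%{status: 200, body: body}) when is_map(body), do: {:ok, body}

  # 200 with a raw string usually means the server did not send a JSON
  # content-type; keep a short excerpt for the error message.
  def expect_map(%{status: 200, body: body}) when is_binary(body) do
    {:error, {:unparsed_body, String.slice(body, 0, 200)}}
  end

  # Anything else is reported by status code.
  def expect_map(%{status: code}), do: {:error, {:http_status, code}}
end
```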
---
## Application Configuration
### Reading Config at Runtime (Not Compile-Time)
Use function calls instead of module attributes for config that may change:
```elixir
# WRONG: baked in at compile time
@symbiont_url Application.get_env(:cortex_status, :services)[:symbiont_url]

# RIGHT: reads at runtime
defp symbiont_url do
  config = Application.get_env(:cortex_status, :services, [])
  Keyword.get(config, :symbiont_url, "http://127.0.0.1:8111")
end
```
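The difference is easy to see in a standalone script. In this sketch `ConfigDemo` and the `10.0.0.5` address are made up for the demonstration; the module attribute deliberately uses the discouraged pattern (Elixir may print a warning about it) to show the stale value.

```elixir
defmodule ConfigDemo do
  @default "http://127.0.0.1:8111"

  # Evaluated once, while the module is being compiled.
  @baked_url :cortex_status
             |> Application.get_env(:services, [])
             |> Keyword.get(:symbiont_url, @default)

  def baked_url, do: @baked_url

  # Evaluated on every call, so it sees later configuration changes.
  def runtime_url do
    :cortex_status
    |> Application.get_env(:services, [])
    |> Keyword.get(:symbiont_url, @default)
  end
end

# Simulate config arriving after compilation (as runtime.exs does at boot).
Application.put_env(:cortex_status, :services, symbiont_url: "http://10.0.0.5:8111")
```

After the `put_env` call, `ConfigDemo.runtime_url/0` returns the new address while `ConfigDemo.baked_url/0` still returns the default that was captured at compile time.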
---
## os_mon for System Metrics
The app uses Erlang's `os_mon` for host-level metrics (requires `:os_mon` in `extra_applications`):
```elixir
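cpu_load = :cpu_sup.avg1() / 256 # 1-minute load average (avg1 is scaled by 256); not capped at 1.0
{mem_total, mem_alloc, _worst} = :memsup.get_memory_data() # total and allocated memory, in bytes
disk_data = :disksup.get_disk_data() # [{mount, total_kb, percent_used}]; mount is a charlist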
```
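Turning the raw `os_mon` values into display-friendly data could look like this. It is a sketch with hypothetical names (`HostMetrics` and its functions); the functions take the raw values as arguments so they can be exercised without starting `os_mon`.

```elixir
defmodule HostMetrics do
  # :cpu_sup.avg1/0 reports the 1-minute load average multiplied by 256.
  def load_average(raw) when is_integer(raw), do: raw / 256

  # :memsup.get_memory_data/0 returns {total_bytes, allocated_bytes, worst_process}.
  def memory_used_percent({total, allocated, _worst}) when total > 0 do
    Float.round(allocated / total * 100, 1)
  end

  # :disksup.get_disk_data/0 returns [{mount_charlist, total_kb, percent_used}].
  def disks(disk_data) do
    for {mount, total_kb, percent_used} <- disk_data do
      %{mount: List.to_string(mount), total_kb: total_kb, percent_used: percent_used}
    end
  end
end
```

Usage in the app would be along the lines of `HostMetrics.load_average(:cpu_sup.avg1())`.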
---
## Release & Deployment
### Build Commands
```bash
cd /data/cortex_status
MIX_ENV=prod mix deps.get
MIX_ENV=prod mix compile
MIX_ENV=prod mix assets.deploy # tailwind + esbuild
MIX_ENV=prod mix release --overwrite
systemctl restart symbiont-ex-api
```
### Environment Variables (runtime.exs)
- `SECRET_KEY_BASE` — required in prod
- `PHX_HOST` — defaults to cortex.hydrascale.net
- `PORT` — defaults to 4000
- `SYMBIONT_URL` — override Symbiont API base URL
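A `config/runtime.exs` that reads these variables might look roughly like the fragment below. This is a sketch, not the app's actual file: the defaults come from the list above, while the `:services` key layout is assumed from the runtime-config example earlier in these notes.

```elixir
import Config

if config_env() == :prod do
  # Required: fail at boot with a clear error if it is missing.
  secret_key_base = System.fetch_env!("SECRET_KEY_BASE")

  host = System.get_env("PHX_HOST", "cortex.hydrascale.net")
  port = String.to_integer(System.get_env("PORT", "4000"))

  config :cortex_status, CortexStatusWeb.Endpoint,
    url: [host: host, port: 443, scheme: "https"],
    http: [ip: {127, 0, 0, 1}, port: port],
    secret_key_base: secret_key_base

  # Only override the Symbiont URL when the variable is set; the code-level
  # default applies otherwise.
  if symbiont_url = System.get_env("SYMBIONT_URL") do
    config :cortex_status, :services, symbiont_url: symbiont_url
  end
end
```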
---
## check_origin: The LiveView Connection Killer
When LiveView connections silently fail (`liveSocket.isConnected()` returns `false`,
`_mount_attempts` climbs into the thousands), the most likely culprit is a `check_origin`
mismatch. Phoenix checks the HTTP `Origin` header against its configured URL.
**Symptom**: Phoenix logs show:
```
[error] Could not check origin for Phoenix.Socket transport.
Origin of the request: https://cortex.hydrascale.net
```
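**Root cause** (as diagnosed at the time): with `url: [host: h, port: 443, scheme: "https"]`
in `runtime.exs`, Phoenix was taken to expect an origin of `https://cortex.hydrascale.net:443`,
while browsers send `https://cortex.hydrascale.net` (no port, since 443 is implicit for HTTPS).
With the default `check_origin: true`, Phoenix compares the `Origin` host against the configured
`url` host, so also confirm that `PHX_HOST` resolves to the hostname the browser actually uses.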
**Fix**: Explicitly set `check_origin` in the endpoint config in `runtime.exs`:
```elixir
config :cortex_status, CortexStatusWeb.Endpoint,
  url: [host: host, port: 443, scheme: "https"],
  http: [ip: {127, 0, 0, 1}, port: port],
  secret_key_base: secret_key_base,
  check_origin: ["https://cortex.hydrascale.net"] # explicit, no port
```
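When debugging an origin mismatch by hand, comparing parsed components is less error-prone than comparing strings, because `URI.parse/1` fills in the default port for `http` and `https`. The helper below is an illustrative sketch with a hypothetical name; it is not how Phoenix implements `check_origin`.

```elixir
defmodule OriginCheck do
  # Two origins match when scheme, host, and (default-filled) port all agree.
  def same_origin?(a, b) when is_binary(a) and is_binary(b) do
    normalize(a) == normalize(b)
  end

  defp normalize(origin) do
    uri = URI.parse(origin)
    {uri.scheme, uri.host, uri.port}
  end
end
```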
**Note**: In this deployment LiveView runs over `longpoll` (WebSocket upgrades may not work
through all Caddy configs). For this dashboard `longpoll` behaves the same as WebSocket,
with slightly more latency, and it is fully supported.
---
## LiveView Auth Gate Pattern
### Password-Protected LiveView Pages
For pages that need a simple password gate (like Mission Control), use assigns to track
auth state and conditionally render:
```elixir
def mount(_params, _session, socket) do
  {:ok, assign(socket, authenticated: false, error: nil, prompt: "")}
end

def handle_event("authenticate", %{"password" => pw}, socket) do
  if pw == Application.get_env(:my_app, :task_password) do
    {:noreply, assign(socket, authenticated: true, error: nil)}
  else
    {:noreply, assign(socket, error: "Invalid password")}
  end
end
```
In the template, wrap content with `<%= if @authenticated do %>`.
**Gotcha**: The password check runs server-side in the LiveView process, and the protected
markup is only rendered and sent once `@authenticated` is true, so the browser never receives
gated content early. Two caveats: the initial static render (before the socket connects) always
shows the unauthenticated state, so don't put sensitive data in the assigns until after
authentication; and because the flag lives only in assigns, a page reload or reconnect resets it.
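The `==` comparison above also accepts nothing useful when the configured password is missing, and it is not constant-time. A minimal sketch of a stricter check follows; `PasswordCheck` is a hypothetical name, and in a Phoenix app `Plug.Crypto.secure_compare/2` already provides the constant-time comparison, so the hand-rolled version here is only to keep the example dependency-free.

```elixir
defmodule PasswordCheck do
  import Bitwise

  # Both values must be binaries of equal length before bytes are compared.
  def valid?(given, expected) when is_binary(given) and is_binary(expected) do
    byte_size(given) == byte_size(expected) and constant_time_equal?(given, expected)
  end

  # A missing configured password (nil) or non-string input never authenticates.
  def valid?(_given, _expected), do: false

  # XOR every byte pair and OR the results together, so the loop always
  # walks the full length regardless of where the first difference is.
  defp constant_time_equal?(a, b) do
    diff =
      a
      |> :binary.bin_to_list()
      |> Enum.zip(:binary.bin_to_list(b))
      |> Enum.reduce(0, fn {x, y}, acc -> acc ||| bxor(x, y) end)

    diff == 0
  end
end
```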
---
## LiveView Silent Failures — Always Show Error Feedback
### The Problem
When a LiveView event handler hits an error (API call fails, validation error, etc.)
and you just return `{:noreply, socket}` without updating assigns, the user sees
*nothing happen*. The button appears to do nothing. This is extremely confusing.
### The Fix Pattern
Always maintain an `error` assign and display it:
```elixir
def handle_event("submit_task", %{"prompt" => prompt}, socket) do
  case submit_to_api(prompt) do
    {:ok, task_id} ->
      {:noreply, assign(socket, task_id: task_id, error: nil)}

    {:error, reason} ->
      {:noreply, assign(socket, error: "Task failed: #{reason}")}
  end
end
```
In the template:
```elixir
<%= if @error do %>
  <div class="alert alert-danger"><%= @error %></div>
<% end %>
```
**Lesson learned the hard way**: The Mission Control "Execute" button did nothing
for a while because the Symbiont API was rejecting the auth token, but the error
was swallowed silently. Always surface errors to the UI.
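Error reasons are not always strings: a tuple or an exception struct interpolated directly into a message raises `Protocol.UndefinedError`. A small formatter, sketched here under the hypothetical name `ErrorText`, keeps the error path itself from crashing.

```elixir
defmodule ErrorText do
  # Strings are shown as-is.
  def format(reason) when is_binary(reason), do: reason

  # Exceptions carry a human-readable message.
  def format(reason) when is_exception(reason), do: Exception.message(reason)

  # Atoms, tuples, and anything else fall back to inspect/1.
  def format(reason), do: inspect(reason)
end
```

The handler would then assign `error: "Task failed: " <> ErrorText.format(reason)`.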
---
## API Authentication from LiveView
### Bearer Token vs Query Param vs JSON Body
When calling external APIs from LiveView, be careful about where auth tokens go.
FastAPI's `Depends()` for auth reads from specific locations — if the API expects
a query param and you send it in the JSON body, auth silently fails.
**Preferred pattern**: Use `Authorization: Bearer <token>` header — it's unambiguous:
```elixir
headers = [{"authorization", "Bearer #{token}"}, {"content-type", "application/json"}]
case Req.post(url, json: payload, headers: headers) do
  {:ok, %{status: 200, body: body}} -> {:ok, body}
  {:ok, %{status: code, body: body}} -> {:error, "HTTP #{code}: #{inspect(body)}"}
  {:error, reason} -> {:error, inspect(reason)}
end
```
**Gotcha**: Always match on the status code, not just `{:ok, _}`. A 401 or 500
response is still `{:ok, %Req.Response{}}` — it's only `:error` if the HTTP
request itself fails (timeout, DNS, connection refused).
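To make auth rejections stand out from other failures, the result can be classified by status before it reaches the UI. This is a sketch: `ApiResult` and the message wording are hypothetical, and it matches on plain `{:ok, response}` / `{:error, reason}` tuples of the shape Req returns.

```elixir
defmodule ApiResult do
  # Any 2xx counts as success.
  def classify({:ok, %{status: status, body: body}}) when status in 200..299 do
    {:ok, body}
  end

  # Call out auth failures explicitly so they are not mistaken for a no-op.
  def classify({:ok, %{status: status}}) when status in [401, 403] do
    {:error, "Auth rejected (HTTP #{status}): check where the token is sent"}
  end

  def classify({:ok, %{status: status, body: body}}) do
    {:error, "HTTP #{status}: #{inspect(body)}"}
  end

  # Transport-level failure: timeout, DNS, connection refused.
  def classify({:error, reason}) do
    {:error, "Request failed: #{inspect(reason)}"}
  end
end
```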
---
## LiveView Form Gotchas (Expanded)
### phx-submit Doesn't Fire Without a Submit Button
If your form has `phx-submit="do_thing"` but no `<button type="submit">`, pressing
Enter in a text input may not trigger the event in all browsers.
### Textarea Value Persistence
When using `phx-change` on a form with a textarea, the server receives the current
value on every keystroke. If your `handle_event` for `phx-change` doesn't re-assign
the textarea value, it can appear to reset or flicker:
```elixir
# The phx-change handler must write the value back into assigns:
def handle_event("update", %{"prompt" => prompt}, socket) do
  {:noreply, assign(socket, prompt: prompt)} # must re-assign
end
```
### Form Params Are Always Strings
All form params arrive as strings, even for number inputs:
```elixir
# WRONG:
def handle_event("set_count", %{"count" => count}, socket) when is_integer(count)
# This clause NEVER matches; count is always a string

# RIGHT:
def handle_event("set_count", %{"count" => count_str}, socket) do
  count = String.to_integer(count_str) # raises ArgumentError on "" or non-numeric input
  {:noreply, assign(socket, count: count)}
end
```
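Because `String.to_integer/1` raises on an empty or malformed value (which a half-typed number input can easily produce), a non-raising parse is often safer in a `phx-change` handler. A minimal sketch, with `FormParams.parse_count/1` as a hypothetical name:

```elixir
defmodule FormParams do
  # Accept only a string that is entirely an integer (surrounding whitespace allowed).
  def parse_count(value) when is_binary(value) do
    case Integer.parse(String.trim(value)) do
      {count, ""} -> {:ok, count}
      _ -> :error
    end
  end

  # Missing param or unexpected type.
  def parse_count(_other), do: :error
end
```

The handler can then set an `error` assign on `:error` instead of crashing the LiveView process.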
---
## LiveDashboard Gotchas
### table() Component Limitations
The LiveDashboard `table()` component expects very specific data shapes and can be
finicky with dynamic data. If your data doesn't fit cleanly, use `card()` with
formatted text instead — it's more flexible and less error-prone.
**What went wrong**: Mission Control initially tried to use `table()` for task display
but hit issues with dynamic columns. Switched to `card()` with pre-formatted text,
which worked immediately.
### Custom Page render_page/1 Returns Tuples
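In this app, `render_page/1` must return `{:ok, component_tree}`, not a bare component. (LiveDashboard 0.8 documents a `render/1` callback that returns HEEx for custom pages, so check the PageBuilder docs for the version you have pinned.)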
```elixir
# WRONG:
def render_page(assigns), do: row(components: [...])
# RIGHT:
def render_page(_assigns), do: {:ok, row(components: [...])}
```
---
## Debugging LiveView Connections
### Diagnosis Checklist (when LiveView "doesn't work")
1. **Check `liveSocket.isConnected()`** in browser console — `false` means the
WebSocket/longpoll connection failed
2. **Check `_mount_attempts`** — if climbing into thousands, it's retrying and failing
3. **Check Phoenix logs** for `check_origin` errors (most common cause)
4. **Check Caddy/reverse proxy** — WebSocket upgrade headers may be stripped
5. **Check `runtime.exs`** — host/port/scheme must match the actual public URL
6. **Use Dendrite** to automate this: navigate to the page, run
`liveSocket.isConnected()` via JS, check the result programmatically
### Longpoll vs WebSocket
Phoenix LiveView supports both transports. Behind Caddy, longpoll is often more
reliable. In `app.js`:
```javascript
let liveSocket = new LiveSocket("/live", Socket, {
  params: {_csrf_token: csrfToken},
  // transport: WebSocket // uncomment to force WebSocket
})
```
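LiveView only falls back to longpoll automatically when `longPollFallbackMs` is set in the
`LiveSocket` options; without it, a failed WebSocket connection just keeps retrying.
With the fallback enabled, this is fine for most use cases.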
---
## Caddy + Phoenix Integration Notes
### Reverse Proxy Config
```
cortex.hydrascale.net {
    reverse_proxy localhost:4000
    encode gzip
}
```
**Important**: Caddy handles TLS termination. Phoenix should listen on plain HTTP
(127.0.0.1 only). Don't configure Phoenix for HTTPS — let Caddy do it.
### The Self-Check Trap
If your Phoenix status page monitors URLs including its own domain
(e.g., `cortex.hydrascale.net`), the HTTP request goes through Caddy, back to
Phoenix, creating a circular dependency that times out. The timeout handler may
also lose the site name, producing mysterious "?" entries.
**Fix**: Don't have the app check itself. If the status page is loading, it's up.
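One way to enforce this is to drop the app's own host from the monitored list before any checks run. The sketch below uses a hypothetical `MonitorTargets` module and placeholder URLs.

```elixir
defmodule MonitorTargets do
  # Remove every URL whose host is the app's own public hostname.
  def without_self(urls, own_host) when is_list(urls) and is_binary(own_host) do
    Enum.reject(urls, fn url -> URI.parse(url).host == own_host end)
  end
end
```

For example, `MonitorTargets.without_self(urls, "cortex.hydrascale.net")` keeps every entry except those pointing back at the dashboard.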
---
## Release Build Gotchas
### Mix Release vs Mix Run
In development: `mix phx.server` or `iex -S mix phx.server`
In production: always use a release build:
```bash
MIX_ENV=prod mix deps.get
MIX_ENV=prod mix compile
MIX_ENV=prod mix assets.deploy
MIX_ENV=prod mix release --overwrite
```
**Gotcha**: `mix assets.deploy` must run BEFORE `mix release`. The release bundles
the compiled assets — if you skip this step, the release will serve stale CSS/JS
or no assets at all.
### Config Hierarchy Matters
```
config/config.exs → compile-time defaults (all envs)
config/dev.exs → compile-time dev overrides
config/prod.exs → compile-time prod overrides
config/runtime.exs → runtime config (reads env vars, runs at boot)
```
**Critical**: `Application.get_env/3` in module attributes (`@foo Application.get_env(...)`)
reads at **compile time**. Use `Application.compile_env/3` to make this explicit, or
better yet, read config in a function that runs at runtime.
### SECRET_KEY_BASE
Required in prod. Generate with: `mix phx.gen.secret`
Set as environment variable or hardcode in runtime.exs (on a single-server deploy
where the .env file is secured, this is fine).