Consider an HMM with possible hidden states $\mathcal{H} = \{h_1,\dots,h_{n_H}\}$, emissions $\mathcal{X} = \{x_1,\dots,x_{n_E}\}$, actual hidden states $H_{1:T}=\{h_1,\dots,h_T\}$ and emissions $X_{1:T}=\{x_1,\dots,x_T\}$, and transition/emission probabilities $\mathbf\Theta = \{\mathbf \Pi\in\mathbb{R}^{n_H\times n_H}, \mathbf E\in\mathbb{R}^{n_E\times n_H}\}$, where $\pi_{i,j}\in\mathbf\Pi$ is the probability of transitioning to state $i$ from state $j$, and $e_{i,j}\in\mathbf E$ is the probability of emitting $x_i$ from state $j$.
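To fix these conventions concretely, here is a minimal sketch of a hypothetical two-state, two-emission HMM (all numeric values are illustrative). Since $\pi_{i,j}=p(h=i\mid h'=j)$ and $e_{i,j}=p(x_i\mid h_j)$, each column of $\mathbf\Pi$ and $\mathbf E$ is a conditional distribution and must sum to one:

```python
import numpy as np

# Hypothetical 2-state, 2-emission HMM; all values are illustrative.
# Pi[i, j] = p(h_t = i | h_{t-1} = j): columns index the previous state.
Pi = np.array([[0.7, 0.4],
               [0.3, 0.6]])
# E[i, j] = p(x_t = i | h_t = j): columns index the hidden state.
E = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# Each column is a conditional distribution, so columns sum to one.
assert np.allclose(Pi.sum(axis=0), 1.0)
assert np.allclose(E.sum(axis=0), 1.0)
```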
The overall likelihood of a sequence of hidden states $H_{1:t}$ and observed states $X_{1:t}$ is
\[\begin{align*} p(X_{1:t},H_{1:t}|\mathbf\Theta) &= \prod_{\tau = 1}^t p(x_\tau|h_\tau) p(h_\tau|h_{\tau - 1})\\ &= \prod_{\tau = 1}^t e_{x_\tau,h_\tau} \pi_{h_\tau,h_{\tau-1}}, \end{align*}\]with the convention that $p(h_1|h_0) := p(h_1)$, the initial hidden-state distribution. By the definition of conditional probability,
\[p(h_t|X_{1:t},\mathbf\Theta) = \frac{p(X_{1:t},h_t|\mathbf\Theta)}{p(X_{1:t}|\mathbf\Theta)}.\]The observation likelihood in the denominator could be computed naively by summing the joint likelihood over every possible hidden-state sequence,
\[\begin{align*} p(X_{1:t}|\mathbf\Theta) &= \sum_{h_1,\dots,h_t} p(X_{1:t},H_{1:t}|\mathbf\Theta)\\ &= \sum_{h_1\in\mathcal{H},\dots,h_t\in\mathcal{H}} \prod_{\tau = 1}^t p(x_\tau|h_\tau) p(h_\tau|h_{\tau - 1}), \end{align*}\]which is $O(n_H^t)$ and thus intractable. Likewise, the numerator could be computed as
\[p(X_{1:t},h_t|\mathbf\Theta) = \sum_{h_1,\dots,h_{t-1}} p(X_{1:t},H_{1:t}|\mathbf\Theta),\]which also requires exponential time and is intractable.
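For a tiny chain these exponential-time sums can be written out directly. The sketch below (again a hypothetical two-state, two-emission HMM with illustrative values) enumerates all $n_H^t$ hidden paths to compute the joint $p(X_{1:t},H_{1:t}|\mathbf\Theta)$ and the marginal $p(X_{1:t}|\mathbf\Theta)$; it is feasible only because $t$ is small:

```python
import numpy as np
from itertools import product

# Hypothetical 2-state, 2-emission HMM; all values are illustrative.
Pi = np.array([[0.7, 0.4],
               [0.3, 0.6]])   # Pi[i, j] = p(h_t = i | h_{t-1} = j)
E = np.array([[0.9, 0.2],
              [0.1, 0.8]])    # E[i, j] = p(x_t = i | h_t = j)
p1 = np.array([0.5, 0.5])     # initial distribution p(h_1)

def joint_likelihood(X, H):
    """p(X_{1:t}, H_{1:t} | Theta), with p(h_1 | h_0) taken to be p(h_1)."""
    prob = p1[H[0]] * E[X[0], H[0]]
    for tau in range(1, len(X)):
        prob *= Pi[H[tau], H[tau - 1]] * E[X[tau], H[tau]]
    return prob

def brute_force_likelihood(X):
    """p(X_{1:t} | Theta) by summing the joint over all n_H^t hidden paths."""
    n_H = Pi.shape[0]
    return sum(joint_likelihood(X, H)
               for H in product(range(n_H), repeat=len(X)))
```

Summing `brute_force_likelihood` over every emission sequence of a fixed length should give 1, a useful sanity check; the loop over `product(...)` is exactly the $O(n_H^t)$ enumeration that the forward algorithm below avoids.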
We can instead sum over the hidden states inductively: rather than summing over all $n_H^t$ possible hidden paths in the full likelihood, we sum over just the $n_H$ possible hidden states at each timestep. Given initial hidden state probability $p(h_1)$, the joint probability of $x_1,h_1$ is
\[p(x_1,h_1|\mathbf\Theta) = p(x_1|h_1)p(h_1).\]The joint probability of $x_1,x_2,h_2,h_1$ is
\[p(x_1,x_2,h_1,h_2|\mathbf\Theta) = p(x_1,h_1|\mathbf\Theta)p(x_2|h_2)p(h_2|h_1);\]marginalizing over $h_1$, we have
\[p(x_1,x_2,h_2|\mathbf\Theta) = \sum_{h_1}p(h_1,x_1|\mathbf\Theta)p(x_2|h_2)p(h_2|h_1).\]Likewise, for $x_1,x_2,x_3,h_3$, we have
\[p(x_1,x_2,x_3,h_3|\mathbf\Theta) = \sum_{h_2}p(x_1,x_2,h_2|\mathbf\Theta)p(x_3|h_3)p(h_3|h_2).\]This is the forward algorithm; given base step
\[p(x_1,h_1|\mathbf\Theta) = p(x_1|h_1)p(h_1),\]the $\tau$th inductive step is
\[p(x_1,\dots,x_\tau,h_\tau|\mathbf\Theta) = \sum_{h_{\tau-1}}p(x_1,\dots,x_{\tau-1},h_{\tau-1}|\mathbf\Theta)p(x_\tau|h_\tau)p(h_\tau|h_{\tau-1}).\]From this it is straightforward to compute
\[p(h_t|X_{1:t},\mathbf\Theta)=\frac{p(x_1,\dots,x_t,h_t|\mathbf\Theta)}{\sum_{h_t}p(x_1,\dots,x_t,h_t|\mathbf\Theta)}.\]We may also be interested in computing the probability of the hidden state at step $t$, given all observed states, i.e. including those that come after $t$. We do this by starting with the full joint probability of all observed states and the hidden state at $t$, which we factor as
\[\begin{align*} p(X_{1:t}, X_{(t+1):T},h_t)&=p(X_{(t+1):T}|X_{1:t}, h_t)p(X_{1:t}, h_t)\\ &=p(x_{t+1},\dots,x_T|h_t)p(x_1,\dots,x_t,h_t). \end{align*}\]The right factor comes from the forward algorithm above. For the left factor $p(x_{t+1},\dots,x_T|h_t)$, the conditioning on $X_{1:t}$ disappears because, by the HMM graphical model, the future emissions $X_{(t+1):T}$ are conditionally independent of the past emissions $X_{1:t}$ given the hidden state $h_t$. We then take the joint probability of $x_T,h_T|h_{T-1}$, and note that this can be marginalized to $p(x_T|h_{T-1})$,
\[p(x_T|h_{T-1}) = \sum_{h_T} p(x_T|h_T)p(h_T|h_{T-1}).\]Likewise, the joint probability of $x_T,x_{T-1},h_{T-1}|h_{T-2}$ is
\[p(x_T,x_{T-1},h_{T-1}|h_{T-2}) = p(x_T|h_{T-1})p(h_{T-1}|h_{T-2})p(x_{T-1}|h_{T-1});\]marginalizing out $h_{T-1}$, we get
\[p(x_T,x_{T-1}|h_{T-2}) = \sum_{h_{T-1}}p(x_T|h_{T-1})p(h_{T-1}|h_{T-2})p(x_{T-1}|h_{T-1}).\]Explicitly writing out one more recursion for clarity:
\[p(x_T,x_{T-1},x_{T-2}|h_{T-3}) = \sum_{h_{T-2}}p(x_T,x_{T-1}|h_{T-2})p(h_{T-2}|h_{T-3})p(x_{T-2}|h_{T-2}).\]This is the backward algorithm; given base step
\[p(x_T|h_{T-1}) = \sum_{h_T} p(x_T|h_T)p(h_T|h_{T-1}),\]the $b$th inductive step yields
\[p(x_T,x_{T-1},\dots, x_{T-b}|h_{T-b-1})=\sum_{h_{T - b}} p(x_T,x_{T-1},\dots, x_{T-b+1}|h_{T-b})p(h_{T-b}|h_{T-b-1})p(x_{T-b}|h_{T-b}).\]We can thus compute $p(h_t|X_{1:t},X_{(t+1):T})$ from the full joint distribution by normalizing,
\[\begin{align*} p(h_t|X_{1:t}, X_{(t+1):T})&=\frac{p(X_{1:t}, X_{(t+1):T},h_t)}{p(X_{1:t}, X_{(t+1):T})}\\ &=\frac{p(x_{t+1},\dots,x_T|h_t)p(x_1,\dots,x_t,h_t)}{\sum_{h_T}p(x_1,\dots,x_T,h_T)}. \end{align*}\]
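The full forward-backward computation can be sketched as follows, again on a hypothetical two-state, two-emission HMM with illustrative values. Using 0-indexed arrays, `alpha[t]` holds the forward quantities $p(x_1,\dots,x_{t+1},h_{t+1}|\mathbf\Theta)$ and `beta[t]` the backward quantities $p(x_{t+2},\dots,x_T|h_{t+1})$; their elementwise product is the joint in the numerator above, which we normalize to obtain the smoothed posterior:

```python
import numpy as np

# Hypothetical 2-state, 2-emission HMM; all values are illustrative.
Pi = np.array([[0.7, 0.4],
               [0.3, 0.6]])   # Pi[i, j] = p(h_t = i | h_{t-1} = j)
E = np.array([[0.9, 0.2],
              [0.1, 0.8]])    # E[i, j] = p(x_t = i | h_t = j)
p1 = np.array([0.5, 0.5])     # initial distribution p(h_1)

def forward_backward(X):
    """Return (alpha, beta): alpha[t, h] = p(x_1..x_{t+1}, h_{t+1}=h | Theta),
    beta[t, h] = p(x_{t+2}..x_T | h_{t+1}=h); beta's last row is all ones."""
    T, n_H = len(X), Pi.shape[0]
    alpha = np.zeros((T, n_H))
    beta = np.ones((T, n_H))
    alpha[0] = E[X[0]] * p1                      # forward base step
    for tau in range(1, T):
        # alpha_tau(h) = p(x_tau | h) * sum_{h'} p(h | h') alpha_{tau-1}(h')
        alpha[tau] = E[X[tau]] * (Pi @ alpha[tau - 1])
    for tau in range(T - 2, -1, -1):
        # beta_tau(h) = sum_{h'} p(x_{tau+1} | h') p(h' | h) beta_{tau+1}(h')
        beta[tau] = Pi.T @ (E[X[tau + 1]] * beta[tau + 1])
    return alpha, beta

def smoothed_posterior(X, t):
    """p(h_t | X_{1:T}, Theta) for 1-indexed t: normalize alpha * beta."""
    alpha, beta = forward_backward(X)
    joint = alpha[t - 1] * beta[t - 1]   # p(X_{1:T}, h_t | Theta)
    return joint / joint.sum()           # divide by p(X_{1:T} | Theta)
```

Note that `alpha[-1].sum()` gives $p(X_{1:T}|\mathbf\Theta)$, and normalizing `alpha[-1]` alone recovers the filtering posterior $p(h_T|X_{1:T},\mathbf\Theta)$. In practice the rows are usually rescaled at each step (or computed in log space) to avoid floating-point underflow on long sequences.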