Modules over PIDs

Theorem B.

Let \(R\) be a PID and \(M\) a finitely generated \(R\)-module.

(i) If \(F\) is a finitely generated free \(R\)-module of rank \(r\) and \(N\subseteq F\) is a submodule, then \(N\) is a finitely generated free \(R\)-module of rank less than or equal to \(r\).
(ii) If \(M\) is torsion-free, then \(M\) is a free \(R\)-module.
(iii) If \(M\) can be generated by \(n\) elements, then any submodule of \(M\) can be generated by \(n\) or fewer elements. In particular, a submodule of a cyclic module is cyclic.
(iv) There exists a submodule \(F\subseteq M\) such that \(F\) is a free \(R\)-module and \(M = F\oplus T(M)\).

Proof:

In order to make the notation a bit less confusing, henceforth (throughout these notes), if \(x\) is an element of an \(R\)-module, we will write \(\langle x\rangle\) for the cyclic submodule generated by \(x\) and if \(a\in R\), we will write \(aR\) for the principal ideal generated by \(a\). For (i), we induct on \(r\). If \(r = 1\), then \(F = \langle x\rangle\) for some \(x\in F\). If \(N = 0\), there is nothing to prove. Otherwise, since \(N \subseteq \langle x\rangle\), we consider the ideal \(J := \{t\in R\ |\ tx\in N\}\). We then have \(J = aR\), for some \(a\in R\), and \(a\not = 0\) since \(N\not = 0\). We claim \(N = \langle ax\rangle\), which is a free \(R\)-module, since \(F\) is torsion-free. Clearly \(\langle ax\rangle \subseteq N\), by definition of \(a\). On the other hand, if \(n \in N\), then \(n = tx\), for some \(t\in J\), so \(t = t'a\), for some \(t'\in R\). Thus, \(n = tx = (t'a)x = t'(ax)\), which shows that \(N\subseteq \langle ax\rangle\), which gives what we want.

Suppose \(r > 1\). Let \(x_1, \ldots, x_r\) be a basis for \(F\). Set \(G := \langle x_1, \ldots, x_{r-1}\rangle\), a free module of rank \(r-1\). By induction on \(r\), \(N\cap G\) is either \((0)\) or a free \(R\)-module of rank \(r-1\) or less. Now every element \(n\) in \(N\) can be written in the form \(n = a_1x_1+\cdots + a_{r-1}x_{r-1}+ a_rx_r\), with each \(a_j \in R\). If, for every element in \(N\), \(a_r = 0\), then \(N \subseteq G\), and we are done by induction on \(r\). Otherwise, let \(J\) denote the set of all coefficients of \(x_r\) as \(n\) varies over the elements of \(N\). Since \(N\) is a submodule, \(J\) is a non-zero ideal of \(R\). Thus, \(J = aR\), for some \(0\not = a\in R\). Since \(a\in J\), there exists \(n_0 \in N\) of the form \(n_0 = s_1x_1 +\cdots +s_{r-1}x_{r-1}+ax_r\).

We claim \(N = (N\cap G)\oplus \langle n_0\rangle\). If so, then, on the one hand, \(\langle n_0\rangle\) is a free \(R\)-module of rank one. On the other hand, \(N\cap G\) has a basis consisting of \(r-1\) or fewer elements. Putting these bases together gives a basis for \(N\) having no more than \(r\) elements (see Homework 5), which is what we want. For the claim, take \(n = c_1x_1+\cdots +c_{r-1}x_{r-1}+ c_rx_r\) in \(N\). If \(c_r = 0\), then \(n\in N\cap G\). Otherwise, by definition of \(J\), \(c_r = da\), for some \(d\in R\). It follows that \(n-dn_0 \in N\cap G\). Therefore, \(n \in (N\cap G) + \langle n_0\rangle\), and therefore \(N = (N\cap G)+ \langle n_0\rangle\). On the other hand, suppose \(t_1x_1+\cdots +t_{r-1}x_{r-1} = sn_0\) belongs to \((N\cap G)\cap \langle n_0\rangle\). Then

t_1x_1+\cdots +t_{r-1}x_{r-1} = ss_1 x_1+\cdots +ss_{r-1}x_{r-1} + sax_r.

Since the \(x_j\) are linearly independent, comparing the coefficients of \(x_r\) gives \(sa = 0\), and since \(a\not = 0\) and \(R\) is an integral domain, this forces \(s = 0\). Thus, \(sn_0 = 0\), showing \((N\cap G)\cap \langle n_0\rangle = 0\), and thus, \(N = (N\cap G)\oplus \langle n_0\rangle\), as required.

For (ii), suppose \(M\) is generated by \(x_1, \ldots, x_r\). If \(M = 0\), there is nothing to prove, so assume \(M\not = 0\). Since \(M\) is torsion-free, any single non-zero \(x_i\) is linearly independent over \(R\). Thus, after reordering the generators, we may choose \(s\geq 1\) maximal such that \(x_1, \ldots, x_s\) are linearly independent over \(R\). Set \(F := \langle x_1, \ldots, x_s\rangle\), a free \(R\)-module of rank \(s\). Fix \(s < i\leq r\). By the maximality of \(s\), there is a non-trivial relation \(c_1x_1+\cdots +c_sx_s+a_ix_i = 0\), and \(a_i\not = 0\), since otherwise \(x_1, \ldots, x_s\) would be linearly dependent. Thus, \(a_ix_i\in F\), with \(a_i\not = 0\). Set \(a := a_{s+1}\cdots a_r\) (and \(a := 1\), if \(s = r\)). Then \(a\not = 0\) and \(ax_i \in F\) for all \(1\leq i\leq r\), so \(aM\subseteq F\). By (i), \(aM\) is a free \(R\)-module. Finally, the map \(M\to aM\) given by \(m\mapsto am\) is a surjective \(R\)-module homomorphism, and it is injective because \(M\) is torsion-free and \(a\not = 0\). Therefore, \(M\cong aM\), showing that \(M\) is free.

For (iii), since \(M\) is a quotient of the free module \(R^n\), any submodule \(N\) of \(M\) is a quotient of a submodule of \(R^n\), which by (i) is a finitely generated module having at most \(n\) generators.

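For (iv), by (iii), the torsion submodule \(T(M)\) is finitely generated. The quotient \(M/T(M)\) is finitely generated and torsion-free, hence free by (ii). Choose \(y_1, \ldots, y_s\in M\) whose images form a basis of \(M/T(M)\), and set \(F := \langle y_1, \ldots, y_s\rangle\). If \(a_1y_1+\cdots +a_sy_s \in T(M)\), then passing to \(M/T(M)\) shows each \(a_i = 0\). Thus, \(F\) is free with basis \(y_1, \ldots, y_s\) and \(F\cap T(M) = 0\). Since every \(m\in M\) agrees with an element of \(F\) modulo \(T(M)\), we have \(M = F+T(M)\), and therefore \(M = F\oplus T(M)\). ∎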
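
The rank-one case of part (i) can be checked concretely when \(R = \mathbb{Z}\) and \(F = \mathbb{Z}\): a subgroup generated by finitely many integers is \(a\mathbb{Z}\), where \(a\) is their greatest common divisor. The following is a minimal sketch under that assumption; the function name `subgroup_generator` is hypothetical and chosen only for illustration.

```python
from functools import reduce
from math import gcd

def subgroup_generator(gens):
    """Return a >= 0 such that the subgroup of Z generated by gens equals aZ.

    The zero subgroup (empty or all-zero input) gives a = 0.
    """
    return reduce(gcd, gens, 0)

# N = <12, 18, 30> inside F = Z; the proof says N = <a> for a single a.
a = subgroup_generator([12, 18, 30])
print(a)
```

Every listed generator is a multiple of the returned value, which mirrors the inclusion \(N\subseteq \langle ax\rangle\) in the proof.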

Proposition C.

Let \(R\) be a PID and \(M\) a finitely generated torsion \(R\)-module. Suppose \(p\in R\) is a prime element. Set \(M(p) := \{x \in M\ |\ p^nx = 0, \ \textrm{for some}\ n\geq 0\}\). Then

(i) \(M(p)\) is a submodule of \(M\), and \(M(p)\) is a torsion module whose annihilator is generated by a power of \(p\).
(ii) \(M(p)(q) = 0\), for any prime element \(q\in R\) that is not an associate of \(p\). In other words, no non-zero element of \(M(p)\) is annihilated by a power of \(q\).

Proof:

For (i), suppose \(x, y\in M(p)\) and \(r\in R\). Then there exist \(m, n\geq 0\) such that \(p^mx = 0\) and \(p^ny = 0\). Set \(s := \textrm{max}\{m, n\}\). Then \(p^s (x+ry) = p^sx +rp^sy = 0\), which shows \(M(p)\) is a submodule. Since every element in \(M(p)\) is annihilated by a power of \(p\), \(M(p)\) is a torsion module. By Theorem B (iii), \(M(p)\) is finitely generated, say \(M(p) = \langle y_1, \ldots, y_k\rangle\), with \(p^{n_i}y_i = 0\) for each \(i\). Setting \(n := \textrm{max}\{n_1, \ldots, n_k\}\), we have \(p^n\in \textrm{ann}(M(p))\). Write \(\textrm{ann}(M(p)) = cR\). Then \(c \mid p^n\), so by unique factorization, \(c\) is a unit times \(p^e\), for some \(0\leq e\leq n\). Thus, \(\textrm{ann}(M(p)) = p^eR\).

For (ii), suppose \(q\) is a prime element that is not an associate of \(p\) and \(0\not = x\in M(p)\) satisfies \(q^n x = 0\), for some \(n\geq 0\). Write \(\textrm{ann}(x) = cR\). Since \(q^n\in \textrm{ann}(x)\), \(c\) divides \(q^n\). But \(\textrm{ann}(M(p))\subseteq \textrm{ann}(x)\), and by (i), \(\textrm{ann}(M(p)) = p^eR\), for some \(e\geq 0\). Thus, \(c\) also divides \(p^e\). Since \(q\) is not an associate of \(p\), unique factorization forces \(c\) to be a unit, so \(x = 0\), which is a contradiction. ∎

Proposition D.

Let \(R\) be a PID, \(p\in R\) a prime element, and \(M\) a finitely generated torsion \(R\)-module with \(\textrm{ann}(M) = pR\). Then

(i) \(M\) is a vector space over the field \(R/pR\).
(ii) If \(x_1, \ldots, x_t\) generate \(M\) as an \(R\)-module, then their images \(\overline{x_1}, \ldots, \overline{x_t}\) span \(M\) as a vector space over \(R/pR\).
(iii) Suppose \(x\in M\) is such that \(\textrm{ann}(x) = pR = \textrm{ann} (M)\). If \(N\) is a non-trivial cyclic submodule of \(\langle x\rangle\), then \(N = \langle x\rangle\).
(iv) Suppose \(N\subseteq \langle x\rangle\) is a non-trivial submodule of \(\langle x\rangle\), where \(\textrm{ann}(x) = pR\). Then \(N\) is a cyclic submodule and \(N = \langle x\rangle\).
(v) Suppose \(M\) cannot be generated by fewer than \(n\) elements and \(M = \langle x_1, \ldots, x_n\rangle\). Then \(M = \langle x_1\rangle\oplus \cdots \oplus \langle x_n\rangle\).

Proof:

For (i), since \(pM = 0\), for any \(x\in M\) and \(r_1, r_2\in R\), if \(r_1-r_2 \in pR\), we have \((r_1-r_2)x = 0\), which shows that \(r_1x = r_2x\). Thus, for any residue class \(\overline{r}\in R/pR\), we may define \(\overline{r}\cdot x := rx\). It is now routine to check that \(M\) is a vector space over \(R/pR\).

For (ii), let \(x\in M\). Since \(x_1, \ldots, x_t\) generate \(M\) as an \(R\)-module, we may write \(x = r_1x_1+\cdots + r_tx_t\), for \(r_i\in R\). Thus, \(x = \overline{r_1}\cdot x_1+\cdots +\overline{r_t}\cdot x_t\), which shows that \(\overline{x_1}, \ldots, \overline{x_t}\) span \(M\) as a vector space over \(R/pR\).

For (iii), write \(N = \langle z\rangle\), with \(z\not = 0\). Since \(N\subseteq \langle x\rangle\), \(z = rx\), for some \(r\in R\). Since \(z\not = 0\), we have \(r\not \in \textrm{ann}(x) = pR\). Because \(p\) is prime and does not divide \(r\), the ideal generated by \(r\) and \(p\) is \(R\), so we may write \(1 = ur+vp\), for some \(u, v\in R\). Multiplying this equation by \(x\) and using \(px = 0\), we get \(x = urx = uz\). Thus, \(x\in \langle z\rangle\), showing \(\langle z\rangle = \langle x\rangle\).

For (iv), by Theorem B (iii), \(N\) is a cyclic module, say \(N = \langle n\rangle\), with \(n\not = 0\). We argue slightly more generally, assuming only that \(\textrm{ann}(x) = p^eR\) for some \(e\geq 1\); in the present setting, \(e = 1\). Since \(N\subseteq \langle x\rangle\), we have \(n = rx\), for some \(0\not = r \in R\). Since \(R\) has the unique factorization property, we may write \(r = r_0p^c\), for \(r_0 \in R\) not divisible by \(p\). Thus, \(n = r_0p^cx\). Since \(n \not = 0\), we must have \(0\leq c < e\). On the one hand, we have \(\langle n \rangle \subseteq \langle p^cx\rangle\). On the other hand, we may write \(1 = ur_0+vp^{e-c}\), since \(p\) does not divide \(r_0\). Multiplying this equation by \(p^cx\), and using the fact that \(p^ex = 0\), we have \(p^cx = ur_0p^cx\). Thus, \(p^cx = un\), showing that \(p^cx\in \langle n\rangle\), and thus \(\langle p^cx\rangle \subseteq \langle n\rangle\), which gives \(\langle p^cx\rangle = \langle n\rangle = N\). When \(e = 1\), this forces \(c = 0\), so \(N = \langle x\rangle\), which is what we want.

For (v), let \(n\geq 1\) be such that \(M\) can be generated by \(n\) elements and \(M\) cannot be generated by fewer than \(n\) elements. Suppose \(M = \langle x_1, \ldots, x_n\rangle\). We show by induction on \(n\) that \(M = \langle x_1\rangle \oplus \cdots \oplus \langle x_n\rangle\). If \(n = 1\), there is nothing to prove. Suppose \(n > 1\). Set \(M' := \langle x_1, \ldots, x_{n-1}\rangle\). Clearly \(M'\) cannot be generated by fewer than \(n-1\) elements, otherwise these elements together with \(x_n\) would generate \(M\), contradicting the choice of \(n\). Thus, \(M' = \langle x_1\rangle \oplus \cdots \oplus \langle x_{n-1}\rangle\). It suffices to show \(M = M'\oplus \langle x_n\rangle\). Clearly \(M = M'+\langle x_n\rangle\). Note that \(x_n\not = 0\) and \(px_n = 0\), so \(\textrm{ann}(x_n) = pR\), since \(pR\) is a maximal ideal. Suppose \(z\in M'\cap \langle x_n\rangle\). If \(z \not = 0\), \(\langle z\rangle \subseteq \langle x_n\rangle\), so by part (iii), \(\langle z\rangle = \langle x_n\rangle\), so \(x_n\in \langle z\rangle \subseteq M'\), which is a contradiction, since this would imply \(M = M'\). Therefore, \(z = 0\) and thus, \(M'\cap \langle x_n\rangle = 0\), which gives \(M = M'\oplus \langle x_n\rangle\), as required. ∎

Proposition F.

Let \(R\) be a PID and \(M\) a finitely generated, torsion \(R\)-module. Suppose \(\textrm{ann} (M) = aR\), and \(a = p_1^{e_1}\cdots p_r^{e_r}\), for primes \(p_i\in R\) and \(e_i\geq 1\). Then

(i) \(M = M(p_1)\oplus \cdots\oplus M(p_r)\).
(ii) \(\textrm{ann} (M(p_i)) = p_i^{e_i}R\).

Proof:

For (i), set \(s_i := \prod_{j\not = i} p_j^{e_j}\), for \(1\leq i\leq r\). Since the GCD of the \(s_i\) equals 1, the ideal generated by the \(s_i\) is \(R\). Thus, we may write \(1 = t_1s_1+\cdots + t_rs_r\). For any \(x\in M\), we have \(x = (t_1s_1x)+\cdots + (t_rs_rx)\). Since \(p_i^{e_i}\cdot (t_is_ix) = t_iax = 0\), each \(t_is_ix\in M(p_i)\). This shows that \(M = M(p_1)+\cdots +M(p_r)\). On the other hand, by Proposition C, each \(M(p_i)\) is annihilated by a power of \(p_i\), so we take \(\alpha _i\) to be the least exponent such that \(p_i^{\alpha_i}\) annihilates \(M(p_i)\). Then there exist \(c, d\in R\) such that \(1 = cp_i^{\alpha _i} + du_i\), where \(u_i = \prod_{j\not = i}p_j^{\alpha _j}\). Now suppose \(y \in M(p_i)\cap (\sum_{j\not = i}M(p_j))\). Then \(y = (cp_i^{\alpha_i}y) + (du_iy)\). Since \(y \in M(p_i)\), \(cp_i^{\alpha_i}y = 0\), while on the other hand, \(du_iy = 0\), since \(y \in \sum_{j\not = i} M(p_j)\). Thus, \(y = 0\), showing \(M = M(p_1)\oplus \cdots \oplus M(p_r)\), as required.

For (ii), by the previous paragraph, \(\textrm{ann} (M(p_i)) = p_i^{\alpha _i}R\), where \(\alpha _i\) is the least exponent such that \(p_i^{\alpha_i}\) annihilates \(M(p_i)\). Set \(a' := p_1^{\alpha _1}\cdots p_r^{\alpha _r}\). Then \(a'\) annihilates \(M\), since every element in \(M\) is a sum of elements from the \(M(p_i)\). Thus, \(a \mid a'\). It follows that each \(\alpha _i \geq e_i\). Now suppose, for example, \(\alpha _1 > e_1\). Then there exists a non-zero \(x\in M(p_1)\) such that \(p_1^{e_1} x \not = 0\). Since \(p_1^{e_1}x\) is annihilated by \(p_2^{e_2}\cdots p_r^{e_r}\), the first part of the proof shows that \(p_1^{e_1}x\) belongs to \(M(p_2)+\cdots + M(p_r)\), contradicting the directness of the sum in part (i). Thus, \(p_1^{e_1}\) annihilates \(M(p_1)\) and we have \(\alpha _1 = e_1\). Similarly, \(\alpha _j = e_j\) for \(2\leq j\leq r\). ∎
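
The decomposition in part (i) can be illustrated for \(R = \mathbb{Z}\) and \(M = \mathbb{Z}/n\mathbb{Z}\). The sketch below is only an illustration under stated assumptions: the helper names `factor` and `primary_components` are hypothetical, and the coefficients \(t_i\) are chosen as inverses of \(s_i\) modulo \(p_i^{e_i}\), so that \(t_1s_1+\cdots +t_rs_r\) is congruent to 1 modulo \(n\) rather than equal to 1 in \(\mathbb{Z}\). This suffices here because \(n\) annihilates \(M\). It requires Python 3.8 or later for the modular inverse via `pow`.

```python
def factor(n):
    """Trial-division factorization: return {prime: exponent} for n >= 2."""
    out, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            out[p] = out.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out

def primary_components(x, n):
    """Split x in Z/nZ into components t_i * s_i * x, one in each M(p_i)."""
    comps = {}
    for p, e in factor(n).items():
        q = p ** e
        s = n // q            # s_i: product of the other prime powers
        t = pow(s, -1, q)     # t_i with t_i * s_i congruent to 1 mod p^e
        comps[p] = (x * t * s) % n
    return comps

# M = Z/360Z, with 360 = 2^3 * 3^2 * 5, and x = 7.
parts = primary_components(7, 360)
print(parts)
```

Each component is annihilated by the corresponding prime power, and the components sum to \(x\) modulo \(n\), matching \(M = M(p_1)+\cdots +M(p_r)\).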

Theorem G.

Let \(R\) be a PID and \(M\) a finitely generated \(R\)-module with \(\textrm{ann}(M) = p^eR\), where \(p\in R\) is prime and \(e\geq 1\). Then \(M\) is a direct sum of cyclic modules. In fact, there exist \(x_1, \ldots, x_n\in M\) and \(e = e_1\geq \cdots \geq e_n\) such that \(M = \langle x_1\rangle\oplus \cdots \oplus \langle x_n\rangle\), with \(\textrm{ann} (x_i) = p^{e_i}R\), for all \(i\).

Proof:

We begin by noting that there exists \(0 \not = x\in M\) such that \(p^{e-1}x \not = 0\). Otherwise, \(p^{e-1}\) annihilates every \(x\) in \(M\), and thus is divisible by \(p^e\), which cannot happen. So we start with \(0 \not = x\) such that \(p^{e-1}x \not = 0\), i.e., \(\textrm{ann}(x) = \textrm{ann} (M)\). If \(M = \langle x\rangle\), we are done. If \(M \not = \langle x \rangle\), we set \(x_1 := x\) and claim there is a submodule \(M_1\) such that \(M = \langle x_1\rangle \oplus M_1\). Suppose we could always find such an \(M_1\) whenever a cyclic submodule has the same annihilator as the module. Then, taking \(x_2 \in M_1\) so that \(\textrm{ann}(x_2) = \textrm{ann} (M_1)\), either \(M_1 = \langle x_2\rangle\), and thus, \(M = \langle x_1\rangle \oplus \langle x_2\rangle\) or there exists \(M_2\subseteq M_1\) such that \(M_1 = \langle x_2\rangle \oplus M_2\), so that \(M = \langle x_1\rangle \oplus \langle x_2\rangle \oplus M_2\). If we apply the construction inductively, then we have a chain of submodules,

\langle x_1\rangle \subseteq \langle x_1\rangle \oplus \langle x_2\rangle \subseteq \langle x_1\rangle \oplus \langle x_2\rangle\oplus \langle x_3 \rangle \subseteq \cdots.

Since \(M\) satisfies the ascending chain condition, this process must stop when \(M\) is a direct sum of cyclic submodules. For the statement about annihilators, first note that since \(p^e\) belongs to the annihilator of every element and submodule of \(M\), each such annihilator is generated by a divisor of \(p^e\), and thus by a power of \(p\). Moreover, since \(M_{i+1}\subseteq M_i\), \(\textrm{ann}(M_i)\subseteq \textrm{ann} (M_{i+1})\), and therefore, if \(\textrm{ann}(M_i) = p^{e_i}R\) and \(\textrm{ann}(M_{i+1}) = p^{e_{i+1}}R\), \(e_i \geq e_{i+1}\). Since \(\textrm{ann} (x_i) = \textrm{ann}(M_i)\), the statement concerning annihilators follows.

Thus, we must prove the following statement: If \(M\) is a finitely generated \(R\)-module with \(\textrm{ann}(M) = p^eR\) and \(x\in M\) satisfies \(\textrm{ann}(x) = p^eR\), then there exists a submodule \(K\subseteq M\) such that \(M = \langle x \rangle \oplus K\). To see this, we will show that there exists an \(R\)-module homomorphism \(\alpha : M\to \langle x\rangle\) that is the identity on \(\langle x\rangle\). Suppose \(\alpha\) exists. Set \(K\) to be the kernel of \(\alpha\). Let \(m\in M\). Then \(\alpha (m)\in \langle x\rangle\), and hence \(\alpha(\alpha(m)) = \alpha(m)\). Thus, \(\alpha (m-\alpha(m)) = \alpha(m) -\alpha(\alpha(m)) = 0\), so that \(m - \alpha(m) \in K\). Thus, \(m\in \langle x\rangle +K\), since \(\alpha (m)\in \langle x\rangle\), by definition. Therefore, \(M = \langle x\rangle+K\). Suppose \(rx\) belongs to \(\langle x\rangle \cap K\). Then \(rx = \alpha(rx) = 0\). Thus, \(\langle x\rangle\cap K = 0\), showing \(M = \langle x\rangle\oplus K\).

To find \(\alpha : M\to \langle x\rangle\) which is the identity on \(\langle x\rangle\), let \(\mathcal{C}\) denote the collection of submodules \(N\subseteq M\) containing \(\langle x\rangle\) for which there exists a homomorphism \(\gamma : N \to \langle x\rangle\) which is the identity on \(\langle x\rangle\). Note that \(\langle x\rangle\) belongs to \(\mathcal{C}\) by just taking the identity map on \(\langle x\rangle\), so \(\mathcal{C}\) is not empty. Since \(M\) satisfies the ascending chain condition, \(\mathcal{C}\) has a maximal element, say \(N\), together with a homomorphism \(\alpha : N\to \langle x\rangle\), which is the identity on \(\langle x\rangle\). We claim \(N = M\). If so, then we are done. Suppose not. Take \(m \in M\backslash N\) such that \(pm \in N\). Such an \(m\) exists: for any \(m'\in M\backslash N\), we have \(p^em' = 0\in N\), so if \(k\geq 1\) is least such that \(p^km'\in N\), we may take \(m := p^{k-1}m'\). Then \(p^e m = 0\), so that \(p^{e-1} \alpha (pm) = 0\). Now, since \(\alpha(pm) \in \langle x\rangle\), we may write \(\alpha (pm) = rx\), for some \(r\in R\). Thus, \(0 = p^{e-1} (rx) = (p^{e-1}r)x\), so \(p^{e-1}r \in \textrm{ann}(x) = p^eR\). Thus, \(p^{e-1}r\) is divisible by \(p^e\), so \(r\) is divisible by \(p\). Thus, we may write \(r = r_0p\) and therefore \(\alpha (pm) = p(r_0x)\). Set \(z := r_0x\), so that

\alpha (pm) = pz \in Rx.

We now define \(\gamma : N+\langle m\rangle \to \langle x\rangle\) as follows: \(\gamma (n+rm) = \alpha(n)+rz\), for all \(n\in N\) and \(r\in R\). If \(\gamma\) is well defined, then the fact that \(\gamma\) extends \(\alpha\) and \(N +\langle m\rangle\) is strictly larger than \(N\) contradicts the maximality of \(N\). Thus, we must have \(N = M\), which gives what we want.

To see that \(\gamma\) is well defined, suppose \(n+rm = n'+r'm\). Then \((n-n')+(r-r')m = 0\), so \((r-r')m = -(n-n') \in N\). The set \(I := \{s\in R\ |\ sm\in N\}\) is an ideal of \(R\) containing \(p\), and \(I\not = R\), since \(m\not \in N\). Since \(pR\) is a maximal ideal, \(I = pR\), so \(r-r' = cp\), for some \(c\in R\). Therefore, \(\alpha ((r-r')m) = c\alpha (pm) = cpz = (r-r')z\). On the other hand, \(\alpha ((r-r')m) = \alpha (-(n-n')) = -\alpha (n-n')\). Thus, \((r-r')z = \alpha (n')-\alpha (n)\), which gives \(\alpha(n)+rz = \alpha (n')+r'z\), showing that \(\gamma\) is well defined. It is now straightforward to check that \(\gamma\) is an \(R\)-module homomorphism, which extends \(\alpha\), contradicting the maximality of \(N\). Thus, \(N = M\), and we are done. ∎
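
The splitting statement used in the proof can be checked by brute force in a small case. The sketch below assumes \(R = \mathbb{Z}\), \(p = 2\), and \(M = \mathbb{Z}/4\oplus \mathbb{Z}/2\), so that \(\textrm{ann}(M) = 4\mathbb{Z}\); the helper names (`scale`, `add`, `cyclic`, `complements`) are hypothetical, and the search only looks for cyclic complements \(K = \langle y\rangle\).

```python
from itertools import product

MODS = (4, 2)  # M = Z/4 (+) Z/2, so p = 2 and ann(M) = 4Z

def scale(k, v):
    """The element k*v of M."""
    return tuple((k * c) % m for c, m in zip(v, MODS))

def add(u, v):
    """The element u + v of M."""
    return tuple((a + b) % m for a, b, m in zip(u, v, MODS))

def cyclic(v):
    """The cyclic submodule <v>, as a set of tuples."""
    return {scale(k, v) for k in range(max(MODS))}

def complements(x):
    """All cyclic K = <y> with <x> + K = M and <x> meeting K only in 0."""
    M = set(product(*(range(m) for m in MODS)))
    X, zero = cyclic(x), tuple(0 for _ in MODS)
    found = []
    for y in sorted(M):
        K = cyclic(y)
        if X & K == {zero} and {add(a, b) for a in X for b in K} == M:
            if K not in found:
                found.append(K)
    return found

ks = complements((1, 1))     # ann((1, 1)) = 4Z = ann(M)
none = complements((2, 0))   # ann((2, 0)) = 2Z, strictly larger than ann(M)
print(ks, none)
```

For \(x = (1,1)\) the search finds complements, as the statement predicts, while for \(x = (2,0)\), whose annihilator is strictly larger than \(\textrm{ann}(M)\), the search finds none.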

Rational Canonical Form via elementary divisors. Let \(V\) be a finite dimensional vector space over the field \(F\) and \(T: V\to V\) a linear transformation. Factor the minimal polynomial of \(T\) as \(q(x) = p_1(x)^{e_1} \cdots p_r(x)^{e_r}\), with each \(p_i(x)\) irreducible over \(F\). Then \(V\) is a direct sum of cyclic subspaces. In particular, for each \(1\leq i\leq r\) there exist positive integers \(e_i = e_{i,1}\geq \cdots \geq e_{i,n_i}\), and a basis \(\mathcal{B}\) for \(V\) such that \(A\), the matrix of \(T\) with respect to \(\mathcal{B}\), has the block diagonal form

A = \begin{pmatrix} A_1 & 0 & \cdots & 0\\ 0 & A_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & A_r\\ \end{pmatrix},

and for each \(1\leq i\leq r\),

A_i = \begin{pmatrix} C(p_i(x)^{e_{i,1}}) & 0 & \cdots & 0\\ 0 & C(p_i(x)^{e_{i,2}}) & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & C(p_i(x)^{e_{i,n_i}})\\ \end{pmatrix}.\qed

We note that from the module perspective, \(V = V(p_1(x))\oplus \cdots \oplus V(p_r(x))\). Each \(V(p_i(x))\) is a direct sum of cyclic subspaces that yield a companion matrix of the form \(C(p_i(x)^{e_{i,j}})\), and each \(A_i\) is the matrix of \(T \rvert_{V(p_i(x))}\) with respect to the union of the cyclic bases from each of those cyclic subspaces.
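
As a small illustration of the block form, the following sketch builds companion blocks and assembles them into a block diagonal matrix. It assumes the convention that \(C(f)\) has ones on the subdiagonal and the negated coefficients of \(f\) in the last column (these notes do not fix a convention), and all function names are hypothetical. The example takes \(p_1(x) = x^2+1\) over \(\mathbb{Q}\) with elementary divisors \(p_1(x)^2\) and \(p_1(x)\).

```python
def companion(c):
    """Companion matrix of x^d + c[d-1] x^(d-1) + ... + c[0].

    Assumed convention: ones on the subdiagonal, -c[i] in the last column.
    """
    d = len(c)
    C = [[0] * d for _ in range(d)]
    for i in range(1, d):
        C[i][i - 1] = 1
    for i in range(d):
        C[i][d - 1] = -c[i]
    return C

def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def evaluate(c, A):
    """Evaluate the monic polynomial with low-order coefficients c at A (Horner)."""
    d = len(A)
    I = [[int(i == j) for j in range(d)] for i in range(d)]
    R = I
    for coeff in reversed(c):
        R = matmul(R, A)
        R = [[R[i][j] + coeff * I[i][j] for j in range(d)] for i in range(d)]
    return R

def block_diag(blocks):
    """Block diagonal matrix with the given square blocks."""
    n = sum(len(B) for B in blocks)
    A = [[0] * n for _ in range(n)]
    k = 0
    for B in blocks:
        for i, row in enumerate(B):
            A[k + i][k:k + len(B)] = row
        k += len(B)
    return A

f = [1, 0, 2, 0]   # (x^2 + 1)^2 = x^4 + 2x^2 + 1
g = [1, 0]         # x^2 + 1
A = block_diag([companion(f), companion(g)])
is_zero = all(v == 0 for row in evaluate(f, A) for v in row)
print(is_zero)
```

Here \((x^2+1)^2\) annihilates \(A\) while \(x^2+1\) does not, consistent with the largest exponent \(e_{1,1} = 2\) being the exponent of \(p_1(x)\) in the minimal polynomial.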

An alternate approach to Theorem G. The proof of Theorem G above conceptually follows the same approach to the Rational Canonical Form Theorem taken in the Fall Math 790 class, in the following sense: If \(M\) is a finitely generated module over the PID \(R\) with \(p\in R\) prime, and \(\textrm{ann}(M) = p^eR\), then this corresponds to the case that the linear operator \(T\) on the finite dimensional vector space \(V\) over the field \(F\) has minimal polynomial \(p(x)^e \in F[x]\), with \(p(x)\) irreducible over \(F\). Taking \(x\in M\) with \(\textrm{ann} (x) = \textrm{ann}(M)\) corresponds to taking a maximal vector \(v\in V\). In the module case, we seek a summand \(K\) such that \(M = \langle x\rangle \oplus K\) and in the linear algebra case we seek a \(T\)-invariant complement of \(\langle v, T\rangle\), the cyclic subspace of \(V\) determined by \(v\) and \(T\). In each case, induction finishes the proof. The approach below to Theorem G relies more on a commutative algebra perspective, so we begin with a standard result.

Nakayama's Lemma.

Let \(R\) be a commutative ring, \(M\) a finitely generated \(R\)-module, and \(J\subseteq R\) the Jacobson radical of \(R\). Suppose \(N\subseteq M\) is a submodule and \(M = N+JM\). Then \(N = M\).

Proof:

We first note that \(M/N = (JM+N)/N = J(M/N)\), so we have a finitely generated \(R\)-module \(A:= M/N\) satisfying \(A = JA\). If we show \(A = 0\), then \(M = N\). Suppose, to the contrary, that \(A\not = 0\), let \(n\geq 1\) be the least number of elements required to generate \(A\), and write \(A = \langle x_1, \ldots, x_n\rangle\). Then \(x_1\in A = JA\), so we can write \(x_1 = j_1x_1+\cdots + j_nx_n\), for some \(j_i \in J\). Thus, \((1-j_1)x_1 = j_2x_2+\cdots +j_nx_n\). Since \(j_1\) belongs to the Jacobson radical, \(1-j_1\) is a unit, so we have \(x_1\in \langle x_2, \ldots, x_n\rangle\), which implies \(A = \langle x_2, \ldots, x_n\rangle\), contradicting the minimality of \(n\). Thus, \(A = 0\), and \(M = N\). ∎

Corollary J.

Suppose \(R\) has a unique maximal ideal \(P\) and \(M\) is a finitely generated \(R\)-module. Set \(\tilde{R} := R/P\) and \(\tilde{M} := M/PM\), so that \(\tilde{M}\) is a finite dimensional vector space over \(\tilde{R}\). Take \(x_1, \ldots, x_n\) in \(M\). Then \(x_1, \ldots,x_n\) is a minimal generating set for \(M\), i.e., \(x_1, \ldots, x_n\) generate \(M\), but no proper subset of the \(x_j\) generates \(M\), if and only if \(\tilde{x_1}, \ldots, \tilde{x_n}\) forms a basis for \(\tilde{M}\).

Here, we are writing \(\tilde{x_j}\) for the class of \(x_j\) in \(\tilde{M}\). It follows that every minimal generating set for \(M\) has the same number of elements, namely the dimension of the vector space \(\tilde{M}\) over \(\tilde{R}\).

Proof:

First assume that \(x_1, \ldots, x_n\) is a minimal generating set for \(M\). We clearly have that \(\tilde{x_1}, \ldots, \tilde{x_n}\) span \(\tilde{M}\). Suppose we have \(\tilde{r_1}\tilde{x_1}+\cdots +\tilde{r_n}\tilde{x_n} \equiv 0\) in \(\tilde{M}\). We want each \(\tilde{r_j}\equiv 0\) in \(\tilde{R}\), in other words, in \(R\), we should have \(r_j \in P\), for all \(j\). In \(M\), we have \(r_1x_1+\cdots +r_nx_n = z\), for some \(z\in PM\). Writing \(z = \sum_j t_jx_j\), with each \(t_j \in P\), we have \(\sum_{i=1}^n(r_i-t_i)x_i = 0\). Suppose, for example, \(\tilde{r_1} \not \equiv 0\) in \(\tilde{R}\). Then \(r_1\not \in P\), and thus, \(r_1-t_1 \not \in P\), so that \(r_1-t_1\) is a unit in \(R\). From the equation \(\sum_i (r_i-t_i)x_i = 0\), it follows that \(x_1\) is in the submodule of \(M\) generated by \(x_2, \ldots, x_n\). This gives \(M = \langle x_2, \ldots, x_n\rangle\), contradicting the minimality assumption. It follows that \(\tilde{r_1} \equiv 0\) in \(\tilde{R}\), and similarly, \(\tilde{r_i}\equiv 0\), for all \(i\), showing that \(\{\tilde{x_1}, \ldots, \tilde{x_n}\}\) is a basis for \(\tilde{M}\).

Conversely, suppose \(\tilde{x_1}, \ldots, \tilde{x_n}\) is a basis for \(\tilde{M}\). Set \(N := \langle x_1, \ldots, x_n\rangle\). Since the \(\tilde{x_j}\) span \(\tilde{M}\), we have \(M/PM = (N+PM)/PM\), so \(M = N+PM\). Since \(P\) is the Jacobson radical of \(R\), Nakayama's lemma gives \(N = M\), i.e., \(x_1, \ldots, x_n\) generate \(M\). Suppose this generating set is not minimal, say \(x_1 = r_2x_2+\cdots + r_nx_n\), for some \(r_i\in R\). In \(\tilde{M}\) we have \(\tilde{x_1} \equiv \tilde{r_2}\tilde{x_2}+\cdots +\tilde{r_n}\tilde{x_n}\), contradicting the linear independence of the \(\tilde{x_j}\). Thus, \(x_1, \ldots, x_n\) is a minimal generating set for \(M\). ∎
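
Corollary J can be tested by brute force in a small case. The sketch below assumes \(R = \mathbb{Z}/4\), whose unique maximal ideal is \(P = 2R\), and \(M = R^2\), so that \(M/PM = (\mathbb{Z}/2)^2\); two vectors reduce to a basis of \(M/PM\) exactly when their determinant is odd. The names `generates` and `basis_mod_p` are hypothetical.

```python
from itertools import product

Q, P = 4, 2                      # R = Z/4 with maximal ideal 2R; M = R^2
VECS = list(product(range(Q), repeat=2))

def generates(u, v):
    """True if u and v generate M = (Z/4)^2 over Z/4 (brute-force span)."""
    span = {((a * u[0] + b * v[0]) % Q, (a * u[1] + b * v[1]) % Q)
            for a in range(Q) for b in range(Q)}
    return len(span) == Q * Q

def basis_mod_p(u, v):
    """True if the images of u and v form a basis of M/PM = (Z/2)^2."""
    return (u[0] * v[1] - u[1] * v[0]) % P != 0

# Compare the two conditions over all 256 ordered pairs of vectors.
agree = all(generates(u, v) == basis_mod_p(u, v) for u in VECS for v in VECS)
print(agree)
```

Since \(M/PM\) has dimension 2, a generating pair is automatically minimal, so the two conditions compared here are the two sides of the corollary in this case.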

For the remainder of this note, we assume that \(R\) is a PID, and \(M\) is a finitely generated \(R\)-module with \(\textrm{ann} (M) = p^eR\), for \(p\in R\) prime. We begin with a couple of observations regarding \(M\).

Remarks K. (i) From Homework 1, we have that \(M\) is also an \(\overline{R} := R/p^eR\)-module, and moreover, for any residue class \(\overline{r}\in \overline{R}\), \(\overline{r}x = rx\), for all \(x\in M\).

(ii) By the correspondence theorem between ideals of \(R\) and \(\overline{R}\), it is easily seen that \(\overline{R}\) has just one maximal ideal, namely \(p\overline{R}\). Since the action of \(R\) on \(M\) is the same as the action of \(\overline{R}\) on \(M\), it follows from Corollary J that \(x_1, \ldots, x_n\in M\) is a minimal generating set for \(M\) if and only if their images in \(M/pM\) form a basis for \(M/pM\) over the field \(R/pR\). Thus, the number of elements in a minimal generating set for \(M\) as an \(R\)-module is well defined.

(iii) If \(S\) is an arbitrary commutative ring, and \(A\) is an \(S\)-module with submodules \(B_1, \ldots, B_r\) satisfying \(A = B_1+\cdots + B_r\), it is straightforward to check that \(A = B_1\oplus \cdots \oplus B_r\) if and only if whenever \(b_1+\cdots + b_r = 0\), for \(b_i\in B_i\), then \(b_i = 0\), for all \(i\). In particular, if each \(B_i = \langle x_i\rangle\), then \(A = B_1\oplus \cdots \oplus B_r\) if and only if whenever \(s_1x_1+\cdots + s_rx_r = 0\), each \(s_ix_i = 0\).

The next lemma is reminiscent of the proof of Cauchy's theorem for abelian groups.

Lemma L.

Let \(S\) be an integral domain, \(L\) an \(S\)-module, \(x\in L\) such that \(\textrm{ann}(L) = aS = \textrm{ann}(x)\). Set \(\overline{L} := L/\langle x\rangle\). Suppose \(z\in L\) satisfies \(\textrm{ann}(\overline{z}) = bS\). Then there exists \(t\in L\) such that \(\overline{t} = \overline{z}\) in \(\overline{L}\) and \(\textrm{ann}(t) = bS\).

Proof:

It is enough to find \(t\in L\) such that \(bt = 0\) and \(\overline{t} = \overline{z}\), for then \(bS\subseteq \textrm{ann}(t)\). On the other hand, for \(r\in S\), \(rt = 0\) implies \(r\overline{t} =r\overline{z} \equiv 0\) in \(\overline{L}\), so \(r \in bS\). Thus, \(\textrm{ann} (t) = bS\).

Now, \(bz = fx\), for some \(f\in S\), and since \(az = 0\), \(a\overline{z} \equiv 0\), so \(a = \gamma b\), for some \(\gamma\in S\). Thus, we have, \(0 = az = \gamma bz = \gamma fx\), which implies \(\gamma f \in \textrm{ann} (x) = aS\). Thus, \(\gamma f = \tau a\), for some \(\tau \in S\), and thus, \(\gamma f = \tau \gamma b\), showing that \(f = \tau b\). Set \(t := z-\tau x\). Then \(\overline{t} = \overline{z}\) in \(\overline{L}\). Moreover,

bt = b(z-\tau x) = bz-b\tau x = bz-fx = 0,

so \(b\in \textrm{ann}(t)\), as required. ∎

Theorem G Revisited.

Proof:

Let \(n\) denote the minimal number of generators of \(M\), which is well defined by Remark K(ii) above. We induct on \(n\) to show that \(M\) is the direct sum of \(n\) cyclic submodules with the required annihilators, the case \(n = 1\) being trivial. Now suppose \(n > 1\). Take \(0\not = x \in M\) such that \(p^{e-1}x \not = 0\). Such an \(x\) exists, otherwise \(p^{e-1}\in \textrm{ann} (M) = p^eR\), which cannot happen. Now, \(p^ex = 0\), so \(p^e\in \textrm{ann}(x)\). On the other hand, \(\textrm{ann} (x) = cR\), for some \(c\in R\), so we can write \(p^e = rc\), for \(r\in R\). Unique factorization implies that \(c\) must be a unit times \(p^i\), for some \(0\leq i\leq e\), and since \(p^{e-1}x\not = 0\), this forces \(c\) to be a unit multiple of \(p^e\). Thus, \(\textrm{ann}(x) = p^eR = \textrm{ann}(M)\).

We now note that \(x\) can be extended to a minimal generating set for \(M\). Since \(\textrm{ann}(x) = p^eR\), we cannot have \(x\in pM\), otherwise \(p^{e-1}x = 0\). Thus, the image of \(x\) in the \(R/pR\) vector space \(M/pM\) is non-zero. It can therefore be extended to a basis of \(M/pM\). The pre-images of these basis elements in \(M\) form a minimal generating set for \(M\) as a module over \(R/p^eR\), by Corollary J, and hence they form a minimal generating set of \(M\) as a module over \(R\). Let us write \(x, y_2, \ldots, y_n\) for this minimal generating set. Set \(\overline{M} := M/\langle x\rangle\). Then \(\overline{M}\) is a finitely generated \(R\)-module and since \(p^e M = 0\), \(p^e \overline{M} = 0\), and this forces \(\textrm{ann}(\overline{M}) = p^fR\), for some \(1\leq f\leq e\). Now, \(\overline{M}\) is minimally generated by \(n-1\) elements, namely, the residue classes of \(y_2, \ldots, y_n\). By induction on \(n\), there exist \(z_2, \ldots, z_n\in M\) such that \(\overline{M} = \langle \overline{z_2}\rangle \oplus \cdots \oplus \langle \overline{z_n}\rangle\), and moreover, there exist \(f = e_2\geq \cdots \geq e_n\) such that \(\textrm{ann}(\overline{z_i}) = p^{e_i}R\), for all \(2\leq i\leq n\). By Lemma L, there exist \(x_2, \ldots, x_n \in M\) such that \(\overline{x_i} = \overline{z_i}\) and \(\textrm{ann}(x_i) = p^{e_i}R\), for all \(2\leq i\leq n\). If we set \(x _1 := x\), we are done if we show \(M = \langle x_1\rangle\oplus \langle x_2\rangle \oplus \cdots \oplus \langle x_n\rangle\). For this, we must show that \(x_1, \ldots, x_n\) generate \(M\) and if \(r_1x_1+\cdots +r_nx_n = 0\), then each \(r_ix_i = 0\).

Take \(h\in M\). Then, \(\overline{h} = \sum_{i=2}^n r_i \overline{x_i}\) in \(\overline{M}\), for some \(r_i\in R\), since \(\overline{x_2}, \ldots, \overline{x_n}\) generate \(\overline{M}\). Therefore, \(h - \sum_{i=2}^n r_ix_i = r_1x_1\), for some \(r_1\in R\). It follows that \(h = \sum_{i=1}^n r_ix_i\), showing \(M = \langle x_1, \ldots, x_n\rangle\). In other words, \(M = \langle x_1\rangle +\cdots + \langle x_n\rangle\). Now, suppose \(r_1x_1+\cdots +r_nx_n = 0\), for \(r_i\in R\). Then, \(r_2\overline{x_2}+\cdots + r_n\overline{x_n} \equiv 0\) in \(\overline{M}\). By the direct sum property for \(\overline{M}\), each \(r_i\overline{x_i} \equiv 0\) in \(\overline{M}\). Thus, for each \(2\leq i\leq n\), \(r_i\in \textrm{ann} (\overline{x_i}) = \textrm{ann} (x_i)\), and hence \(r_ix_i = 0\), for \(2\leq i\leq n.\) But then \(r_1x_1 = 0\). Therefore, by Remark K (iii), \(M = \langle x_1\rangle\oplus \cdots \oplus \langle x_n\rangle\).

Finally, we have that for \(2\leq i\leq n\), \(\textrm{ann} (x_i) = p^{e_i}R\), with \(e_2\geq \cdots \geq e_n\). Moreover, we have \(p^e M = 0\), so \(p^e \langle x_i\rangle = 0\), for \(2\leq i\leq n\). Thus, \(p^e\in p^{e_i}R\), and hence \(e\geq e_i\), for all such \(i\). Setting \(e_1 := e\) finishes the proof. ∎