PSK's blog

PSK的学习研究经验分享站

华中科技大学2023年代数学作业-练习7-大整数分解

给定,试用数域筛法对其进行分解,要求写出详细过程并附代码。

对应文件:Quadratic.ipynb

运行时间:约20~30分钟,单线程。

注意,本题使用的是二次筛法而非数域筛法,有点偏题了,各位慎重参考。能力有限,我还是不太能理解数域筛法最后的线性方程组求解步骤。

这部分的代码很大部分来自于MattInglisWhalen/FastFactor,遵循 MIT 版权协议。

我使用了它的代码,并利用sympy减少了它的代码复杂度。

初始化

选择勒让德符号为1的素数,并确定一个素数的上界。
对于所有的选定素数,都满足:

注意到,由于代码限制,2的勒让德符号的结果也为1,受到API限制只能手动添加。

这是为了能让在mod p的情况有解所设立的先决条件

为了后文计算方便,预先生成的解以及所有选定素数的log对数,前者存在递推公式(Hensal Lifting)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
from math import log, exp, sqrt, floor, gcd
from sympy import primerange, prevprime, nextprime
from sympy.ntheory.residue_ntheory import legendre_symbol, sqrt_mod
from tqdm.notebook import tqdm, trange

n = 1234268228312430759578090015472355712114804731217710966738223
M = 234722
bound = 2 * int(exp(0.5 * sqrt(0.707 * log(n) * log(log(n)))))
prime_max = prevprime(bound + 1)
thresh = log(M * sqrt(2*n), 10) - log(prime_max, 10) * 2
# filter out too small primes
factor_base = [2] + [p for p in primerange(3 , prime_max + 1) if legendre_symbol(n, p) == 1]

# calc sqrt_mod(n, p^i) & log(p)
sieve_primes = []
mod_root = {}
log_p = {}
for prime in factor_base:
log_p[prime] = log(prime,10)
if prime > 2: # int(2 * thresh):
sieve_primes.append(prime)
r = sqrt_mod(n, prime)
roots = [r]
q = prime * prime
while q < M:
# find p^n via Hensal Lifting
r = (r - (r * r - n) * pow(r + r, -1, q)) % q
assert (r * r) % q == n % q
roots.append(r)
q *= prime
mod_root[prime] = roots

bound, len(factor_base), len(sieve_primes), factor_base[:10], sieve_primes[:10]
(117408,
 5463,
 5462,
 [2, 11, 17, 23, 29, 37, 41, 47, 53, 59],
 [11, 17, 23, 29, 37, 41, 47, 53, 59, 61])

确定多项式参数

构造二次函数$y(a) = (Aa + B)^2 - n = A2a2 + 2ABa + B^2 - nx = Aa+By(a) \equiv (Aa+B)^2 \equiv x^2 (mod\ N)$

现在需要寻找一组,其连乘积为一个完全平方数。

不妨令 ,可以写作

$$y(a) = A2a2 + 2ABa + AC = A(Aa^2 + 2Ba + C) = AQ(a)$$

不妨再令,q为素数且n与q的勒让德符号为1。这样只需要去凑就行了

为了使计算过程简便,我们自然希望尽可能小, 不妨将a的搜索区间长度M的中间设定为的极小值点处,其极小值点的绝对值为:

且Q(a)在搜索曲线 上的最大值与极小值点的绝对值之差(极小值点往往在x轴下方,取绝对值之后函数从极小值点下降,再上升)。我们不希望Q(a)太大,不妨令尽可能小甚至为0:

$$ A2M2 = 2N q^2 = A = \frac{\sqrt{2N}}{M} $$

令q逼近即可。

1
2
3
4
5
6
7
8
9
10
11
12
13
ideal_A = int(sqrt(2 * n) / M)
ideal_q = int(sqrt(ideal_A))
def q_generator(ideal_q, cnt = 100000000):
lower = prevprime(ideal_q)
upper = nextprime(ideal_q)
for _ in range(cnt):
if ideal_q - lower < upper - ideal_q:
yield lower
lower = prevprime(lower)
else:
yield upper
upper = nextprime(upper)
int(ideal_A), int(ideal_q), [prime * prime for prime in q_generator(int(ideal_q),10)]
(6693695863116716427968512,
 2587217784245,
 [6693695863092909627946081,
  6693695863341282535235521,
  6693695862875583334071601,
  6693695863393026890921401,
  6693695863486166731156489,
  6693695863517213344568329,
  6693695862678954782473921,
  6693695863558608829117561,
  6693695862585814942244449,
  6693695862461628488606161])

不妨对模A,这样式子变为了,可以通过算法解出这个同余式,得到B的值

进一步的,我们可以根据获得C的值,解出构造的二次函数

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
# A = next(A_generator(ideal_A, 10))
def Q_generator(ideal_q):
A,B,C = 0,0,0
def Q(a):
return A * a ** 2 + 2 * B * a + C
for q in q_generator(ideal_q):
A = q * q
b = sqrt_mod(n, A)
if b is None:
continue
B = (b + (n - b*b) * pow(b+b, -1, q)) % A
C = (B * B - n) // A
yield q,A,B,C,Q

next(Q_generator(ideal_q))
(2587217784241,
 6693695863092909627946081,
 2266262553726809679453608,
 -184392636527257014150946233403311439,
 <function __main__.Q_generator.<locals>.Q(a)>)

二次筛与高斯方程求解

然后,进行二次筛。我们使用一种类似于埃拉托斯特尼筛法的方法对搜索范围内的所有Q(a)进行搜索。这部分的筛选方法可参照维基百科中二次筛选法下借由筛选来筛查光滑度一小节所解释。与维基百科不同的是,我们在筛选过程中刨除了部分较小素数,因为这些较小素数往往的模值很小,需要对很多的Q(a)的对应项加lg(p)。而且,临界值设定为一个比维基百科小得多的值。

完成二次筛筛选后,我们进入分解阶段。分解阶段会有两种情况:
1)这个数是平滑的——我们可以直接把它加入平滑数列表
2)这个数有很多光滑的值,但留下来一个非常大的无法分解的数:
这个情况我们会把这个无法分解的数保留下来。如果在后续的过程中发现了同样的不可分解的数。我们可以让这两个Q(a)乘起来,这样这个不可分解的数就凑成了平方数。其余的平滑素数结合起来,形成一个新的所谓的平滑数,加入到平滑数列表。

生成了足够多的数(这里设定为了平滑数的数量超过所使用的素数数量——这样后面的线性方程有解),我们就可以解一个线性方程。我们把这些平滑数的因子变为一个列向量,把奇数次因子置1(也就是要去凑的因子),把所有的平滑数组成一个use_primenum_smooth列的矩阵A,解x是一个非0的列向量。解的每一个位置若为1,则选择这个数作为后面连乘的一个平滑数。这个连乘的平滑数的各个因子的奇偶性对应着A的某一行乘以x的和。根据线性方程我们很容易知道连乘的平滑数的值正好就是一个完全平方数,我们计。我们可以使用高斯消元法来解出x,假设我们选择的每个平滑数为 ,有:





我们获得了形的式子!在接下来只需要计算下式,就很有可能计算出N的一个非平凡因子

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
cnt = 0
poly_cnt = 0
used_prime = set()
smooth = []
partial = {}
num_partial = 0
num_smooth = 0
num_used_prime = 0
print("Thresh: ",thresh)

def list_prod(a):
while len(a) > 1:
a = [m * n for m, n in zip(a[::2], a[1::2] + [1])]
return a[0]

with tqdm(Q_generator(ideal_q), total=len(factor_base)+1, desc="Finding No.0 Q(a)") as pbar:
for q,A,B,C,Q in Q_generator(ideal_q):
poly_cnt += 1
pbar.set_description(f"Finding No.{poly_cnt} Q(a)")

# seive init
fudge = sum([log_p[factor_base[i]] // (factor_base[i] - 1) for i in range(len(factor_base) - len(sieve_primes))])
# print(fudge)
V = [fudge for _ in range(-M, M)]
# V = {a:0 for a in range(-M, M)}
# sieve for prime factors
for prime in sieve_primes:

lg_p = log_p[prime]

e = 0 # mod prime^(e+1)
pe = prime # value of prime^(e+1)

while pe < M:
inv_A = pow(A, -1, pe)

sroot_a = ((mod_root[prime][e] - B) * inv_A) % pe
sroot_b = ((pe - mod_root[prime][e] - B) * inv_A) % pe

# print(mod_root[prime][e], pe - mod_root[prime][e], sqrt_mod(n, pe, True))

# assert Q(sroot_a) % pe == 0
# assert Q(sroot_b) % pe == 0

amx = sroot_a
bmx = sroot_b

apx = sroot_a - pe
bpx = sroot_b - pe

for delta in range(pe, M, pe):
V[apx + delta] += lg_p
V[bpx + delta] += lg_p
V[amx - delta] += lg_p
V[bmx - delta] += lg_p

e += 1
pe *= prime # goto p^e+1
# check for smooths
for a in range(-M,M):
if V[a] < thresh:
continue
factor_vec = set()
factor_square = []
value = Q(a)

if value < 0:
factor_vec.add(-1)
value = -value

for prime in factor_base:
if prime > value:
break
while value % prime == 0:
if prime in factor_vec:
# it's a square factors! mark it.
factor_square.append(prime)
factor_vec ^= {prime}
value //= prime

if value == 1:
# A Smooth Number
# print(f"A Smooth Number found: {Q(a)}: with factors {factor_vec}")
smooth.append((factor_vec, (factor_square, (A * a + B), q)))
used_prime |= factor_vec
pbar.set_postfix({"total_smooth":len(smooth), "partial": num_partial, "partial_wait_list": len(partial), "used_prime": len(used_prime)})
pbar.update()
elif value in partial:
# combine two partials to make a xor smooth
# if we find the other pair, we can apply a xor to it and make the multipy as a smooth number
# it's ok for one partial pairing with more than one partial
pair_vec, pair_vals = partial[value]
factor_square.extend(list(factor_vec & pair_vec))
factor_square.append(value)
factor_vec ^= pair_vec
# print(f"A partial Smooth Number found: {Q(a)}: with factors {factor_vec}")
smooth.append((factor_vec, (factor_square + pair_vals[0], (A*a + B) * pair_vals[1], q * pair_vals[2])))
used_prime |= factor_vec
num_partial += 1
pbar.set_postfix({"total_smooth":len(smooth), "partial": num_partial, "partial_wait_list": len(partial), "used_prime": len(used_prime)})
pbar.update()
else:
# save partial
partial[value] = (factor_vec, (factor_square, A*a + B, q))

prev_num_smooth = num_smooth
num_smooth = len(smooth)
num_used_prime = len(used_prime)

if num_smooth > num_used_prime and num_smooth > prev_num_smooth:
# set up bit fields for gaussian elimination
masks = []
mask = 1
bit_fields = [0] * num_used_prime
for vec, vals in smooth:
masks += [mask]
i = 0
for p in used_prime:
if p in vec:
bit_fields[i] |= mask
i += 1
mask += mask

# row echelon form
col_offset = 0
null_cols = []
for col in range(num_smooth):
pivot = col - col_offset == num_used_prime or bit_fields[col - col_offset] & masks[col] == 0
for row in range(col + 1 - col_offset, num_used_prime):
if bit_fields[row] & masks[col]:
if pivot:
bit_fields[col - col_offset], bit_fields[row] = bit_fields[row], bit_fields[
col - col_offset]
pivot = False
else:
bit_fields[row] ^= bit_fields[col - col_offset]
if pivot:
null_cols += [col]
col_offset += 1

# reduced row echelon form
for row in range(num_used_prime):
# lowest set bit
mask = bit_fields[row] & -bit_fields[row]
for up_row in range(row):
if bit_fields[up_row] & mask:
bit_fields[up_row] ^= bit_fields[row]

# check for non-trivial congruencies
for col in null_cols:
all_vec, (lh, rh, rA) = smooth[col]
lhs = lh # sieved values (left hand side)
rhs = [rh] # sieved values - n (right hand side)
rAs = [rA] # root_As (cofactor of lhs)
i = 0
for field in bit_fields:
if field & masks[col]:
vec, (lh, rh, rA) = smooth[i]
lhs += list(all_vec & vec) + lh
all_vec ^= vec
rhs += [rh]
rAs += [rA]
i += 1

factor = gcd(list_prod(rAs) * list_prod(lhs) - list_prod(rhs), n)
if 1 < factor < n:
break
else:
print('none found.')
continue
break

print('factors found:')
print(factor, 'x', n // factor)

# cnt
Thresh:  25.42751876372442
factors found:
1231768382920384328173284372769 x 1002029476829174312039487628367

分解出来的截图

结论

经过6536组Q(a)多项式,5321个平滑数(其中2820个是凑大“质数”凑出来的),耗时26分钟,我们找到了n的因子,它是: