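Have a large number of password-protected Excel files that need to be imported into a database? Today I'll introduce a handy Python package (and command-line tool) for exactly that.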
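msoffcrypto-tool[1] (formerly ms-offcrypto-tool) is a Python tool and library for decrypting encrypted MS Office files using a password, an intermediate key, or the private key that generated its escrow key.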
Installation
pip install msoffcrypto-tool
Examples
As a CLI tool (with a password)
$ msoffcrypto-tool encrypted.docx decrypted.docx -p Passw0rd
If you omit the password value after -p, you will be prompted for it:
$ msoffcrypto-tool encrypted.docx decrypted.docx -p
Password:
Test whether a file is encrypted (the exit code is 0 or 1):
$ msoffcrypto-tool document.doc --test -v
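For a quick pre-filter before a batch run, a stdlib-only heuristic can help with OOXML formats: a plain .xlsx/.docx/.pptx is a ZIP archive (it starts with `PK\x03\x04`), while a password-protected one is wrapped in an OLE compound file (it starts with `D0 CF 11 E0 A1 B1 1A E1`). This is a minimal sketch assuming that layout, not part of msoffcrypto-tool, and `looks_encrypted_ooxml` is a hypothetical helper. It says nothing about legacy .xls/.doc/.ppt files, which are OLE files whether or not they are encrypted; use `--test` or `is_encrypted()` for those.

```python
from pathlib import Path

# File signatures (magic bytes) of the two container formats
ZIP_MAGIC = b"PK\x03\x04"
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
OOXML_SUFFIXES = {".docx", ".xlsx", ".pptx"}


def looks_encrypted_ooxml(path):
    """Heuristic: True if an OOXML file is wrapped in an OLE container.

    Returns None for non-OOXML suffixes (e.g. legacy .xls), where the
    container format does not reveal whether the file is encrypted.
    """
    path = Path(path)
    if path.suffix.lower() not in OOXML_SUFFIXES:
        return None
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(OLE_MAGIC):
        return True   # OLE container: typically an encrypted package
    if header.startswith(ZIP_MAGIC):
        return False  # plain ZIP archive: not encrypted
    raise ValueError(f"{path} is neither a ZIP nor an OLE file")
```

In a batch job you might call it on every `*.xlsx` in a folder and send only the `True` results to msoffcrypto.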
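As a library, called from your own code
The library functions accept a password as well as several other key types.
Decrypt a document (works for Word as well as Excel):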
import msoffcrypto

full_path = "encrypted.docx"  # example input path
out_path = "decrypted.docx"   # example output path

with open(full_path, "rb") as office_in:
    # Open the file
    office_file = msoffcrypto.OfficeFile(office_in)
    # Input the password (load_key is a method of the OfficeFile object)
    office_file.load_key(password="mypassword")
    # Open the output
    with open(out_path, "wb") as office_out:
        # Run decrypt. This will write to the output file.
        office_file.decrypt(office_out)
Decrypt an Excel file and read the decrypted workbook with pandas (entirely in memory)
import io

import msoffcrypto
import pandas as pd

decrypted = io.BytesIO()

with open("encrypted.xlsx", "rb") as f:
    file = msoffcrypto.OfficeFile(f)
    file.load_key(password="Passw0rd")  # Use password
    file.decrypt(decrypted)

decrypted.seek(0)  # rewind the in-memory buffer before reading it
df = pd.read_excel(decrypted)
print(df)
The code above is the core of how I batch-import encrypted Excel files.
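Wrapped in a loop, the pandas example could feed a database roughly as sketched below. Assumptions: the target is SQLite via the stdlib `sqlite3` module, every file uses the same password and the same column layout, and `read_encrypted_excel`, `import_files` and `quote_ident` are hypothetical helpers, not part of msoffcrypto-tool. The reader is a parameter so the database part can be tried without real encrypted files; dates and empty cells may need extra conversion, and pandas' `DataFrame.to_sql` is an alternative that handles types for you.

```python
import io
import sqlite3


def read_encrypted_excel(path, password):
    """Decrypt one workbook in memory and return (columns, rows) of its first sheet."""
    import msoffcrypto  # third-party packages, imported only when actually used
    import pandas as pd

    decrypted = io.BytesIO()
    with open(path, "rb") as f:
        office_file = msoffcrypto.OfficeFile(f)
        office_file.load_key(password=password)
        office_file.decrypt(decrypted)
    decrypted.seek(0)
    df = pd.read_excel(decrypted)
    # itertuples yields plain Python scalars for int/float/str columns
    return [str(c) for c in df.columns], list(df.itertuples(index=False, name=None))


def quote_ident(name):
    # Quote an SQL identifier for SQLite (wrap in double quotes, double any inner quotes)
    return '"' + name.replace('"', '""') + '"'


def import_files(paths, password, conn, table, reader=read_encrypted_excel):
    """Append the rows of every file to `table`; return the total number of rows."""
    total = 0
    for path in paths:
        columns, rows = reader(path, password)
        cols = ", ".join(quote_ident(c) for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({cols})")
        conn.executemany(
            f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({placeholders})", rows
        )
        total += len(rows)
    conn.commit()
    return total
```

A typical call would be `import_files(sorted(Path("data").glob("*.xlsx")), "Passw0rd", sqlite3.connect("import.db"), "orders")`, with the folder, password and table name adjusted to your setup.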
Advanced usage:
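import binascii

# `file` is an msoffcrypto.OfficeFile opened as in the examples above.
# The load_key calls below are alternatives: use the one matching your key type.

# Verify the password before decrypting (default: False)
# ECMA-376 Agile/Standard encryption lets you check whether the supplied password
# is correct before actually decrypting the file.
# Currently, verify_password is only meaningful for ECMA-376 Agile/Standard Encryption
file.load_key(password="Passw0rd", verify_password=True)

# Use a private key
file.load_key(private_key=open("priv.pem", "rb"))

# Use an intermediate key (secretKey)
file.load_key(secret_key=binascii.unhexlify("AE8C36E68B4BB9EA46E5544A5FDB6693875B2FDE1507CBC65C8BCF99E25C2562"))

# Check the HMAC of the data payload before decrypting (default: False)
# Currently, verify_integrity is only meaningful for ECMA-376 Agile Encryption
file.decrypt(open("decrypted.docx", "wb"), verify_integrity=True)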
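Since `verify_password=True` lets ECMA-376 files reject a wrong password up front, a batch job could try a short list of known passwords, for example when different teams protected their files differently. This is a minimal sketch assuming that `load_key` raises an exception when verification fails; `find_password` is a hypothetical helper that works with any object exposing `load_key`, such as the `msoffcrypto.OfficeFile` from the examples above.

```python
def find_password(office_file, candidates):
    """Return the first candidate accepted by load_key(verify_password=True), else None.

    Assumes load_key raises when the password check fails, which per the
    notes above is only reliable for ECMA-376 Agile/Standard encryption.
    """
    for password in candidates:
        try:
            office_file.load_key(password=password, verify_password=True)
        except Exception:
            continue  # wrong password (or a verification error): try the next one
        return password  # key is now loaded; office_file.decrypt(...) can follow
    return None


# Usage (assumes msoffcrypto is installed and the file exists):
# with open("encrypted.xlsx", "rb") as f:
#     file = msoffcrypto.OfficeFile(f)
#     password = find_password(file, ["Passw0rd", "another-guess"])
```

Catching `Exception` broadly is deliberate here, because the exact exception type raised for a wrong password may differ between file formats and library versions.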
Supported encryption methods
MS-OFFCRYPTO specs
[x] ECMA-376 (Agile Encryption/Standard Encryption)
  [x] MS-DOCX (OOXML) (Word 2007-2016)
  [x] MS-XLSX (OOXML) (Excel 2007-2016)
  [x] MS-PPTX (OOXML) (PowerPoint 2007-2016)
[x] Office Binary Document RC4 CryptoAPI
  [x] MS-DOC (Word 2002, 2003, 2004)
  [x] MS-XLS (Excel 2002, 2003, 2004) (experimental)
  [x] MS-PPT (PowerPoint 2002, 2003, 2004) (partial, experimental)
[x] Office Binary Document RC4
  [x] MS-DOC (Word 97, 98, 2000)
  [x] MS-XLS (Excel 97, 98, 2000) (experimental)
[ ] ECMA-376 (Extensible Encryption)
[ ] XOR Obfuscation
More robust Office document decryption code
from pathlib import Path
import struct

import msoffcrypto


def decrypt_office_file(full_path, out_path, password):
    # Hypothetical wrapper: returns None on success, or a short status string otherwise
    with open(full_path, 'rb') as office_in:
        try:
            # Load it in to msoffcrypto
            office_file = msoffcrypto.OfficeFile(office_in)
            office_file.load_key(password=password)
        except OSError:
            # OSError will be thrown if you passed in a file that isn't an office file
            return 'not an office file'
        except AssertionError:
            # Office 97~2004 files only:
            # AssertionError will be thrown on load_key if the password is wrong
            return 'wrong password'
        except Exception:
            # xls files only:
            # msoffcrypto will throw a generic Exception on load_key if the file isn't encrypted
            return 'not encrypted'

        if not office_file.is_encrypted():
            # Other than xls files, you can check if a file is encrypted with .is_encrypted()
            return 'not encrypted'

        # Open your desired output as a file
        with open(out_path, 'wb') as office_out:
            try:
                # load_key just inputs a password; you need to call decrypt to actually decrypt it.
                office_file.decrypt(office_out)
            except struct.error:
                # Office 97~2003 only: these files aren't supported yet.
                # If the password is CORRECT, msoffcrypto will throw a generic 'error'.
                # (The original snippet caught an undefined name `error`; struct.error is assumed.)
                return 'encrypted, but decryption not supported'
            except Exception:
                # Finally, msoffcrypto will throw a generic Exception on decrypt if the password is wrong
                return 'wrong password'

    # If you want to overwrite the input, save it separately first and then move it:
    # shutil.move(out_path, full_path)
    return None


full_path = Path('input_file.docx')
out_path = Path('output_file.docx')
status = decrypt_office_file(full_path, out_path, password='mypassword')
print(status or 'decrypted')
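The closing comment of that code suggests saving to a separate file and then moving it over the original. A minimal stdlib sketch of that idea, assuming the output may live in the same directory: write to a temporary file there, then swap it in with `os.replace`, which overwrites the destination and is atomic on POSIX when both paths are on the same filesystem. `replace_with_output` is a hypothetical helper; `write_fn` receives a binary file object, so a loaded `office_file.decrypt` fits directly.

```python
import os
import tempfile
from pathlib import Path


def replace_with_output(path, write_fn):
    """Run write_fn(tmp_file), then move the result over `path`.

    The temporary file is created next to `path` so os.replace does not
    cross filesystems; it is deleted if writing or moving fails, leaving
    the original untouched.
    """
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=path.suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_fn(tmp)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


# Usage sketch (assumes msoffcrypto is installed): read the original into memory
# first, so it is closed before being replaced.
# with open(full_path, "rb") as f:
#     office_file = msoffcrypto.OfficeFile(io.BytesIO(f.read()))
# office_file.load_key(password=password)
# replace_with_output(full_path, office_file.decrypt)
```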
References
[1] msoffcrypto-tool: https://github.com/nolze/msoffcrypto-tool