Hello,
I'm using pypdf to fill out a form and generate a printable PDF. Everything works fine, except when I use Unicode strings. The text appears corrupted in the output PDF, regardless of the PDF viewer I use. I tried Adobe Reader, SumatraPDF and Brave.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
# Windows-10-10.0.26100-SP0
$ python -c "import pypdf;print(pypdf._debug_versions)"
# pypdf==5.7.0, crypt_provider=('cryptography', '45.0.5'), PIL=11.3.0
Code + PDF
This is a minimal, complete example that shows the issue:
from io import BytesIO
import pypdf
from pypdf.generic import NameObject, NumberObject, BooleanObject, IndirectObject
import pypdf.generic
import pypdf.types

data = {
    "subsemnatul": "Σὲ γνωρίζω ἀπὸ τὴν κόψη",
    "cnp_cui": "123456789",
    "localitatea": "Comuna Roșia-Nouă",
    "strada": "Căpitan Nicolae Licăreț",
    "adresa_nr": "12",
    "adresa_bl": "A",
    "adresa_sc": "1",
    "adresa_et": "5",
    "adresa_ap": "123",
    "adresa_judet": "Конференция",
}

# https://stackoverflow.com/a/55302753
def fill_with_pypdf(file, data):
    """
    Used to fill PDF with PyPDF.
    To fill, PDF form must have field name values that match the dictionary keys
    :param file: The PDF being written to
    :param data: The data dictionary being written to the PDF Fields
    :return:
    """
    with open(file, "rb") as input_stream:
        pdf_reader = pypdf.PdfReader(input_stream)
        if "/AcroForm" in pdf_reader.trailer["/Root"]:
            pdf_reader.trailer["/Root"]["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})
        writer = pypdf.PdfWriter(pdf_reader)
        # alter NeedAppearances
        try:
            catalog = writer._root_object
            # get the AcroForm tree and add "/NeedAppearances attribute
            if "/AcroForm" not in catalog:
                writer._root_object.update({
                    NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
            need_appearances = NameObject("/NeedAppearances")
            writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        except Exception as e:
            print('set_need_appearances_writer() catch : ', repr(e))
        if "/AcroForm" in writer._root_object:
            # Acro form is form field, set needs appearances to fix printing issues
            writer._root_object["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})
        # loop over all pages
        for page_num in range(len(pdf_reader.pages)):
            # writer.add_page(pdf_reader.pages[page_num])
            page = writer.pages[page_num]
            # loop over annotations, but ensure they are there first...
            if page.get('/Annots'):
                # update field values
                writer.update_page_form_field_values(page, data, auto_regenerate=False)
                for j in range(0, len(page['/Annots'])):
                    writer_annot = page['/Annots'][j].get_object()
                    # flatten all the fields by setting bit position to 1
                    # use loop below if only specific fields need to be flattened.
                    writer_annot.update({
                        NameObject("/Ff"): NumberObject(1)  # changing bit position to 1 flattens field
                    })
        output_stream = BytesIO()
        #lock fields
        permissions = pypdf.constants.UserAccessPermissions(
            pypdf.constants.UserAccessPermissions.PRINT |
            pypdf.constants.UserAccessPermissions.PRINT_TO_REPRESENTATION |
            pypdf.constants.UserAccessPermissions.EXTRACT_TEXT_AND_GRAPHICS |
            pypdf.constants.UserAccessPermissions.EXTRACT
        )
        writer.encrypt(user_password="", owner_password="my-secret-password", algorithm="AES-256", use_128bit=False, permissions_flag=permissions)
        writer.write(output_stream)
        writer.set_need_appearances_writer(True)
        return output_stream.getvalue()

out = fill_with_pypdf("forms/CERERE INMATRICULARE form.pdf", data)
with open("output_pypdf.pdf", "wb") as f:
    f.write(out)
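As a rough diagnostic, the sketch below lists which characters in the sample values cannot be represented in a single-byte Western encoding. It rests on an assumption that this report does not confirm: that the corruption is tied to characters outside such an encoding. Python's `cp1252` codec is used as a stand-in because PDF's WinAnsiEncoding is based on that code page. The helper `chars_outside` is hypothetical and is not part of pypdf.

```python
# Values copied from the `data` dictionary in the example above.
samples = {
    "subsemnatul": "Σὲ γνωρίζω ἀπὸ τὴν κόψη",
    "cnp_cui": "123456789",
    "localitatea": "Comuna Roșia-Nouă",
    "strada": "Căpitan Nicolae Licăreț",
    "adresa_judet": "Конференция",
}


def chars_outside(text, encoding="cp1252"):
    """Return the distinct characters of `text` that `encoding` cannot encode.

    Hypothetical helper for illustration only; cp1252 is an assumed proxy
    for the encoding used by the form's field font.
    """
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


for field, value in samples.items():
    missing = chars_outside(value)
    status = "all characters encodable" if not missing else "not encodable: " + " ".join(missing)
    print(f"{field}: {status}")
```

If the fields that render incorrectly are exactly the ones flagged here, that would point toward the field font or its encoding rather than the filling logic itself.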
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
output_pypdf.pdf
CERERE INMATRICULARE form.pdf
Traceback
This is the complete traceback I see:
# TODO: Your traceback goes here (if applicable)