Tutorials and examples

This page collects examples that show how to use pdf4py correctly.

PDF objects count

Suppose we want to know how many in-use objects a PDF file contains. The following snippet finds out.

>>> import pdf4py.parser
>>> fp = open('tests/pdfs/0000.pdf', 'rb')
>>> parser = pdf4py.parser.Parser(fp)
>>> all_xref_entries = list(parser.xreftable)
>>> len(all_xref_entries)
119
>>> for x in all_xref_entries[:10]:
...     print(x)
...
XrefInUseEntry(offset=15, object_number=1, generation_number=0)
XrefInUseEntry(offset=525989, object_number=2, generation_number=0)
XrefInUseEntry(offset=63, object_number=3, generation_number=0)
XrefInUseEntry(offset=60167, object_number=4, generation_number=0)
XrefInUseEntry(offset=285, object_number=5, generation_number=0)
XrefInUseEntry(offset=38737, object_number=6, generation_number=0)
XrefInUseEntry(offset=36091, object_number=7, generation_number=0)
XrefInUseEntry(offset=21676, object_number=8, generation_number=0)
XrefInUseEntry(offset=4102, object_number=9, generation_number=0)
XrefInUseEntry(offset=5162, object_number=10, generation_number=0)

The special method __iter__ of xreftable returns a generator over the references to in-use and compressed objects. To know how many there are, one must iterate over the generator until it is exhausted. This is what list does: it collects all the entries.
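If only the count is needed, the entries do not have to be stored in a list. The sketch below shows the idea on a stand-in generator so that it runs without a PDF file. The function fake_xreftable and both namedtuple definitions are hypothetical and exist only for illustration, and the field names of XrefCompressedEntry are an assumption. With a real parser, the same expressions should apply to parser.xreftable, provided it behaves as described above.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for the entry types yielded by a cross-reference
# table; defined here only so the sketch is self-contained.
XrefInUseEntry = namedtuple(
    "XrefInUseEntry", "offset object_number generation_number")
XrefCompressedEntry = namedtuple(
    "XrefCompressedEntry", "object_number objstm_number index")


def fake_xreftable():
    """Yield entries lazily, as a generator-based __iter__ would."""
    yield XrefInUseEntry(15, 1, 0)
    yield XrefInUseEntry(525989, 2, 0)
    yield XrefCompressedEntry(3, 2, 0)


# Count the entries one at a time, without building a list.
total = sum(1 for _ in fake_xreftable())

# Count entries per type. A fresh generator is required because the
# previous one has already been exhausted.
per_type = Counter(type(entry).__name__ for entry in fake_xreftable())

print(total)
print(per_type)
```

Building a list, as in the session above, is still the better choice when the entries are needed again afterwards, since an exhausted generator yields nothing more.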