Download v3.06 (76 MB)

PDF Data Extractor - Extract text data from any pdf to csv, excel, fdf, xml files


What's new?


3.06 28/10/2024

1. fix for crash when no output file is set. now defaults to the desktop in that case.

2. added [FILETITLE] to output filename option. e.g. C:\myfiles\[FILETITLE].csv
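
As an illustration of the [FILETITLE] option, here is a minimal Python sketch of how such a placeholder could be expanded. It assumes that [FILETITLE] stands for the source PDF's file name without its extension; the function name `expand_filetitle` and the example paths are hypothetical and not part of the product.

```python
from pathlib import PureWindowsPath


def expand_filetitle(template: str, source_pdf: str) -> str:
    """Replace [FILETITLE] with the source file name minus its extension.

    Assumption: "file title" means the name without folder or extension.
    """
    title = PureWindowsPath(source_pdf).stem
    return template.replace("[FILETITLE]", title)


# Hypothetical paths, for illustration only.
print(expand_filetitle(r"C:\myfiles\[FILETITLE].csv", r"C:\invoices\march-2024.pdf"))
```

With a template like this, each PDF in a batch would get its own output file rather than all files sharing one name.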

3.05 20/09/2024

1. added rule with a range match of -20 to +20 around the values entered for the exact match.
2. added skip lines output in setup.

3.04 25/01/2023

1. added multi word text match for option "if last exact data match then h>=(n) && h<=(n),v>=(n) && v<=(n) match output (join + add space)"

e.g. match text: "Account Number:" would automatically use the last match and the one before it. it can now have up to 3 words, separated by spaces, to match on.

before it was limited to one word. e.g. "Number:"

2. fix for file menu-->"save as" memory issue in 32bit version.

3.03 09/09/2022

1. updated custom pattern match for 'a' alpha to check a-z, A-Z
2. fixed a settings issue.
3. updated custom match for use in multi output methods.
4. added custom part match e.g. match on nn-nnn would match 12-345 but also anything taken afterwards in the word e.g. 12-345E.1
5. changed limit outputs for multi output from 200 to 800.
6. added XX-XXX to pattern match for alpha or numeric.
7. added column adjustment for (multi) and line feed with extra fields used.
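
The custom pattern symbols described above can be pictured as a small translation into a regular expression. The following Python sketch is only an interpretation of the changelog: it assumes 'a' means a-z or A-Z, 'n' means a digit, 'X' means either, any other character matches itself, and a "part match" allows extra characters after the pattern within the same word. The names `pattern_to_regex` and `match_word` are hypothetical.

```python
import re

# Assumed meaning of the pattern symbols, taken from the changelog entries.
_CLASSES = {"a": "[A-Za-z]", "n": "[0-9]", "X": "[A-Za-z0-9]"}


def pattern_to_regex(pattern: str, part_match: bool = False) -> re.Pattern:
    """Translate a pattern such as 'nn-nnn' into a compiled regex."""
    body = "".join(_CLASSES.get(ch, re.escape(ch)) for ch in pattern)
    # For a part match, accept any further non-space characters in the word.
    tail = r"\S*" if part_match else ""
    return re.compile(body + tail)


def match_word(pattern: str, word: str, part_match: bool = False) -> bool:
    """Return True when the whole word fits the pattern."""
    return pattern_to_regex(pattern, part_match).fullmatch(word) is not None


print(match_word("nn-nnn", "12-345"))                      # full match
print(match_word("nn-nnn", "12-345E.1", part_match=True))  # part match
print(match_word("XX-XXX", "A1-2B3"))                      # alpha or numeric
```

One limitation of this sketch is that a literal 'a', 'n' or 'X' cannot be expressed in a pattern; how the program handles that case is not stated here.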

3.02 22/11/2021

1. fix for log move viewer title.
2. fix for permissions issues and monitor setup.
3. fix for permission log issues.
4. fixed an issue with batch process list from menu.
5. fixed process button status when loading new setup.
6. fixed a read only not closing file issue when permissions redirected to desktop output.
7. added clear all option to batch list menu.
8. fix close buttons in log viewers when scaled.
9. added F1 help to monitor.
10. fix for when adding multiple files to batch.
11. fix for enterprise monitor - report to one option.

3.01 08/10/2021

1. fix for until text match, now case & non-case sensitive.
2. added new output option: if last exact data match, output all after it until the until match (across all pages). for this to work it ignores rules beyond the rule page, and there must be both a start and an end match to output.
3. added drag and drop .pde files to load them automatically in the source file(s) box.
4. added extras until text match now can use multiple words to match on with a pipe delimiter e.g. Account|Statement
5. changed text limit per column to 4096
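
To show how an until text match with a pipe delimiter might behave, here is a minimal Python sketch. It assumes the extracted text is handled word by word and that output stops at the first word equal to any of the pipe-separated stop words; the function name `take_until` and the sample data are hypothetical.

```python
def take_until(words, until_spec, case_sensitive=False):
    """Return the words that come before the first stop word.

    until_spec holds one or more stop words separated by '|',
    e.g. "Account|Statement".
    """
    stops = [s for s in until_spec.split("|") if s]
    if not case_sensitive:
        stops = [s.lower() for s in stops]
    taken = []
    for word in words:
        key = word if case_sensitive else word.lower()
        if key in stops:
            break  # stop word reached, output ends here
        taken.append(word)
    return taken


words = ["Total", "due", "STATEMENT", "page"]
print(take_until(words, "Account|Statement"))
print(take_until(words, "Account|Statement", case_sensitive=True))
```

The second call keeps every word because "STATEMENT" differs in case from the stop word "Statement".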

3.00 20/08/2021

1. added smart setup highlight in adobe for a quick and easy setup.
2. added email search match output e.g.
3. added Upper, Lower & Smart Upper-Lower to extras
4. added Telephone number output numbers detection within a text e.g. tel: +nn nnn nnnnn or tel: +n (nnn) nnnnnn, 20+ different permutations detected.
5. added two pattern matching options e.g. input data is like Account: AA12345-6789 you can take AA12345-6789 with pattern match anywhere aannnnn-nnnn
6. improved offset match, can now full text match, first part match, [a] or [n] or [an] for first alpha or numeric match or either.
7. added stop to process button for stopping at any point during the processing.
8. improved File Save, to show saved when not default file.
9. added [COL(n)] or [F(n)] to file naming, for using the column data extracted into the filename e.g. c:\report[F1].csv or multiple e.g. c:\report[F1]-[F2][F3].csv
10. added lookup file match to extras, so now can substitute comma separated data with other data e.g. 123,Canon EOS 500 will data match code '123' with output 'Canon EOS 500'
11. added font name and font size to list.
12. added font name / size matching, so now you can extract e.g. any bold font match in an area or font name and or size.
13. added yellow color in list to extract data for easier viewing.
14. added smart setup color from pdf to list. e.g. if data highlight note is set as red in adobe pdf then this color is used in the list.
15. added quick header in header setup, tries to guess data titles on data & setup with one button click.
16. added quick example loading from file menu.
17. added presets for popular requests and common bills / invoices / bank statements e.g. BT, EE, MBNA, Barclays, Virgin Money etc.
18. added xls and xlsx output support.
19. added xls / xlsx multi-sheet output support. can have up to 5 sheets.
20. added output (multi) (join + add space) for easier repeated lines of multiple words setup.
21. fix for saveas issue when clicking on a file then changing it, won't save. also pre-populates save to filename as one loaded.
22. fixed issue with view pdf when location of pdf changed / typed not picking up change.
23. added FDF output form file support. e.g. output as fdf to import into another pdf template or batch print.
24. added XML output support.
25. added start at page option: enter 0 to start at the beginning, a page number e.g. 2 to skip the pages before it, or e.g. 999 to process the last page.
26. fixed issues with multiple rules per page for different outputs.
27. added highlight rule exact match, e.g. IFPOSMATCH: 1 for if rule "h,v exact match data: 1" rule on that page h,v analysis.
28. added IFPOSHMATCH: n and IFPOSVMATCH: n to highlight setup rules.
29. added AFTER2 <data> <data> in highlight setup for after two words of data. note: AFTER <data> is for after last word.
30. added excel sums etc to output header e.g. Vat=SUM(E[LINE]/100*20) would output header Vat, then do =SUM(E2/100*20) for line 2 in the output.
31. fixed sort not working.
32. added light red color to rule matches for easy viewing.
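
Two of the 3.00 additions above, column data in file names ([F(n)] or [COL(n)]) and Excel formulas in the output header ([LINE]), are both placeholder substitutions. The Python sketch below shows one way they could work. It assumes columns are numbered from 1, that a missing column expands to nothing, and that a header of the form Title=FORMULA is split at the first '='. All function names and sample values are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def expand_field_placeholders(template: str, row: List[str]) -> str:
    """Replace [F1], [COL2], ... with the matching 1-based column of row."""
    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # Assumption: a column that does not exist expands to an empty string.
        return row[index] if 0 <= index < len(row) else ""
    return re.sub(r"\[(?:F|COL)(\d+)\]", substitute, template)


def split_header(header: str) -> Tuple[str, Optional[str]]:
    """Split 'Vat=SUM(E[LINE]/100*20)' into a title and a formula template."""
    title, sep, formula = header.partition("=")
    return title, ("=" + formula if sep else None)


def formula_for_line(formula_template: str, line_number: int) -> str:
    """Fill in [LINE] with the spreadsheet row being written."""
    return formula_template.replace("[LINE]", str(line_number))


row = ["ACME", "2024", "03"]  # hypothetical extracted columns
print(expand_field_placeholders(r"c:\report[F1]-[F2][F3].csv", row))

title, formula = split_header("Vat=SUM(E[LINE]/100*20)")
print(title)
print(formula_for_line(formula, 2))
```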

2.03 24/06/2021

1. fix for long header, now allows up to 1024 characters rather than 255.

2. added landscape mode option.

3. added add text before match in extras. e.g. match on invoice number, then add PO- text before it.

4. added h>=(%d) & h<=(%d),v>=(%d) & v<=(%d) match output (multi) (join + add space)

5. fix for extra output line feed when no data match.

6. fix for more than 100 lines data.

2.02 15/02/2021

1. fix for scan ocr issue.

2. fix for output all on one line and batch, now adds line feed after each file automatically.

3. some improvements to one line processing per file and positions.

4. fix for potential line feed problem.

2.01 25/02/2020

1. added if last exact data match then output h,v range joined with space.

2. added h>=(n) && h<=(n),v>=(n) && v<=(n) match output (last one) (join + add space)

3. added take until text option in filter. so can stop at certain text to take e.g. 'test 123' take only 'test'
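
The "h>=(n) && h<=(n),v>=(n) && v<=(n) match output (join + add space)" option can be read as a position filter followed by a join. The Python sketch below assumes h and v are the horizontal and vertical positions of each word found during analysis, and that matching words are joined in the order they were found. The `Word` class, the function name and the coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    h: int  # assumed: horizontal position of the word
    v: int  # assumed: vertical position of the word


def range_match_join(words: List[Word], h_min: int, h_max: int,
                     v_min: int, v_max: int) -> str:
    """Join, with single spaces, every word whose h and v fall in range."""
    hits = [w.text for w in words
            if h_min <= w.h <= h_max and v_min <= w.v <= v_max]
    return " ".join(hits)


words = [
    Word("Account", 50, 100),
    Word("Number:", 120, 100),
    Word("12345", 200, 100),
    Word("Total", 50, 300),
]
print(range_match_join(words, 40, 130, 90, 110))
```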

2.00 - 24/02/2020

1. changed c:\default.lst to default.lst

2. when no text on analyze show link to video:

3. added add multiple pdf files to batch list option.

4. added pre OCR first option.

5. fixed no output: when no output matched, it wasn't outputting a line to the csv.

6. fixed number constraints for negatives in input boxes. now -9999 to 9999

7. added alpha/numeric option filter.

8. fixed a money value figure not working correctly.

9. fix for delete key on multiple conditions.

10. verification fix for money filter, should have , or . when above 999. otherwise probably wrong data so blank it.

11. fix for , in money value only.

12. added clear all settings option in setup.

13. fix - when drag and drop file, it clears output path.

14. fix for date formats when . in date e.g. 20.12.19 instead of slash /.

15. added 2000 to date output if below 1900 e.g. 20.12.19 to 20/12/2019

16. fix for joined dates e.g. 20.12.1 then 9 later, now outputs correctly.

17. now checks for a bad date of more than 10 characters and blanks it, so it can pick up the correct one later.

18. added numeric filter fixing option e.g. O to 0, l to 1, S to 5, B to 8 etc. if you need any others then please let us know.

19. now when outputting in spreadsheet mode i.e. multi rows - you need to use the (multi) output option.

20. changed so can add line feed after other output extras option. e.g. filter number fixing.
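
Several of the 2.00 filter changes above (numeric fixing, date clean-up and the money check) are simple text rules. The Python sketch below is one reading of them, not the program's actual code. It assumes dates are in day, month, year order as in the example 20.12.19, that any value which is not three numeric parts is blanked, and that "above 999" is judged on the digits alone. All function names are hypothetical.

```python
import re

# Character fixes named in the changelog: O to 0, l to 1, S to 5, B to 8.
_FIXES = str.maketrans({"O": "0", "l": "1", "S": "5", "B": "8"})


def fix_numeric(text: str) -> str:
    """Swap letters that are commonly misread digits."""
    return text.translate(_FIXES)


def normalise_date(text: str) -> str:
    """Turn 20.12.19 into 20/12/2019; blank anything that looks wrong."""
    if len(text) > 10:
        return ""  # bad date: more than 10 characters
    parts = re.split(r"[./]", text)
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return ""  # assumption: anything else is also treated as bad
    day, month, year = parts
    year_number = int(year)
    if year_number < 1900:
        year_number += 2000
    return f"{day}/{month}/{year_number}"


def verify_money(text: str) -> str:
    """Blank a value above 999 that has no ',' or '.' in it."""
    digits = re.sub(r"\D", "", text)
    if digits and int(digits) > 999 and not re.search(r"[.,]", text):
        return ""
    return text


print(fix_numeric("l2S.B0"))
print(normalise_date("20.12.19"))
print(verify_money("1,234.00"))
```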

1.05 - 22/02/2017

1. added sortation.

2. added customize link email.

1.04 - 17/07/2012

1. added row adjust for floating row sizes, match on h text

2. changed text dump to include all fonts, embedded and subset fonts

1.03 - 15/02/2012

1. fix for dialog resize

2. fixed some joining (join) parameter issues.

3. added column option

4. added rows options and size of rows on same page

1.02 - 16/11/2011

1. added find box and button

2. if last-1 exact data match option to output

3. if last-2 exact data match option to output

4. if last-3 exact data match option to output

5. if last-3 && if last-3 exact data match option to output

6. added rename / copy script options, for batch renaming files on data extraction

7. added batch rename to dos options -b option

8. added page stop option


1.01 - 01/10/2009

1. fix for command line rules missing

1.00 - 03/06/2009

first release