Replies: 1 comment
Hi @catmanjan, yes, developers do use the Open XML SDK for large-scale text extraction. The SDK is accurate because it reads the XML directly: anything it extracts comes straight from the file's XML parts. Performance-wise, if you have millions of Word files to process, I recommend using the SAX approach to extract the text, which avoids out-of-memory issues. Here are some samples of how to use the SAX API: Replace Text in a Word Document Using SAX (Simple API for XML), Copy a Worksheet Using SAX (Simple API for XML).
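To illustrate why a SAX-style pass keeps memory flat, here is a minimal, language-neutral sketch in Python using only the standard library. It is not the Open XML SDK's API. It assumes the main document part lives at `word/document.xml` (the usual location, though the package relationships are what actually define it), and the function name `extract_text` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace; w:t holds run text, w:p is a paragraph.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"
W_P = f"{{{W_NS}}}p"


def extract_text(docx):
    """Stream the main document part and return one string per paragraph.

    `docx` is a path or a binary file-like object. Assumption: the main
    part is stored at word/document.xml.
    """
    paragraphs = []
    runs = []
    with zipfile.ZipFile(docx) as archive:
        with archive.open("word/document.xml") as part:
            # iterparse yields elements as they finish parsing, so the
            # whole document is never held as a fully populated tree.
            for _event, elem in ET.iterparse(part, events=("end",)):
                if elem.tag == W_T:
                    runs.append(elem.text or "")
                elif elem.tag == W_P:
                    paragraphs.append("".join(runs))
                    runs.clear()
                    # Drop the paragraph's children to release memory.
                    elem.clear()
    return paragraphs
```

This is a simplification: it ignores headers, footers, footnotes, and paragraphs nested inside text boxes, and it skips tabs and line breaks. In the SDK itself, the equivalent forward-only reader is the `OpenXmlReader` class, which the linked samples use.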
Does anyone use the Open XML SDK to do large-scale text extraction?
I mean running it over millions of Word files to get the text out of them. How does it perform? Is it accurate?