Hello,
This issue has been reported several times and always "closed as not planned", but it persists.
I've just installed trafilatura (v2.0.0) but when importing it, I get the following error:
ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
Install lxml[html_clean] or lxml_html_clean directly.
I'm using Python 3.10.10 on Windows (did you run your pipeline on this OS?).
Note that this warning shows during the install:
WARNING: lxml 6.0.2 does not provide the extra 'html_clean'
Installing the separate lxml_html_clean package with this command solves the issue:
pip install lxml_html_clean
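For anyone who wants to confirm the state of their environment before importing trafilatura, here is a minimal sketch using only the standard library. The helper name `missing_modules` is my own and not part of trafilatura or lxml; it only checks whether the top-level modules can be found, not whether their versions are compatible.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found.

    Hypothetical helper for illustration; it does not import the modules,
    it only asks the import system whether it can locate them.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level module names only: lxml_html_clean is the separate project
    # named in the ImportError above.
    missing = missing_modules(["lxml", "lxml_html_clean", "trafilatura"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All three modules were found.")
```

If `lxml_html_clean` shows up as not installed, that matches the ImportError above, and the pip command should fix it.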
Thanks,
Mark