ImportError: lxml.html.clean module is now a separate project lxml_html_clean. #835

@Mark531

Hello,

This issue has been reported several times and always "closed as not planned", but it persists.

I've just installed trafilatura (v2.0.0) but when importing it, I get the following error:

ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
Install lxml[html_clean] or lxml_html_clean directly.

I'm using Python 3.10.10 on Windows (did you run your pipeline on this OS?).

Note that this warning shows during the install:

WARNING: lxml 6.0.2 does not provide the extra 'html_clean'

Installing the missing package with this command solves the issue:
pip install lxml_html_clean
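
For anyone who wants to check their environment before importing trafilatura, here is a minimal sketch that tests whether the `lxml_html_clean` module can be found and, if not, prints the install command. The helper names `missing_packages` and `pip_command` are hypothetical, made up for this illustration, and are not part of trafilatura or lxml.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names that cannot be located.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def pip_command(packages):
    """Build a pip install command tied to the running interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    # lxml_html_clean is the module named in the ImportError message.
    missing = missing_packages(["lxml_html_clean"])
    if missing:
        print("Missing:", ", ".join(missing))
        print("Suggested fix:", " ".join(pip_command(missing)))
    else:
        print("lxml_html_clean can be found in this environment.")
```

Running pip through `sys.executable -m pip` is a deliberate choice here: it installs into the same interpreter that performed the check, which helps when several Python installations coexist on the machine.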

Thanks,
Mark
