I’d like to be able to retrieve the current node’s text:
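from bs4 import BeautifulSoup as Soup  # assumed alias; the import was not shown
markup = "<h1>that<span>but not that</span>and that</h1>"
soup = Soup(markup, "html.parser")
assert soup.find_all(string=True, recursive=False) == ["that", "and that"]
# fails: find_all returns [] here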
Other solutions I found only work when you know the node type, for example doing:
assert soup.h1.find_all(string=True, recursive=False) == ["that", "and that"]
# returns ["that", "and that"]
But in this case, I do not know that it’s an h1; I’d like it to be type-agnostic. I looked for something like soup.self, and tried soup[soup.name] in case it was stored as a dictionary, but no luck there.
These solutions either use soup.h1, which is not node-type agnostic, or use soup.find_all(string=True, recursive=False), which in my case returns []. I am not sure whether BeautifulSoup has changed its behavior here.
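For reference, here is a minimal sketch of one type-agnostic approach. It assumes the node of interest is the first tag in the parsed document, and that the empty result comes from the soup object being the document root, whose only direct child is the h1 tag itself. The helper name own_strings is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString


def own_strings(node):
    """Return the strings that are direct children of `node`, skipping nested tags.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    return [str(child) for child in node.children if isinstance(child, NavigableString)]


markup = "<h1>that<span>but not that</span>and that</h1>"
soup = BeautifulSoup(markup, "html.parser")

# `soup` is the document root, so its direct children are tags, not strings.
# find(True) matches the first tag regardless of its name.
root = soup.find(True)

print(root.name)
print(own_strings(root))

# The call from the question can also be made on `root` instead of `soup`.
print(root.find_all(string=True, recursive=False))
```

If the markup is a fragment with several top-level tags, the same helper could be applied to each tag returned by soup.find_all(True, recursive=False) rather than only the first one.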