Only extracting text from this element, not its children

I’d like to retrieve only the current node’s own text, excluding any text inside its child elements:

from bs4 import BeautifulSoup as Soup

markup = "<h1>that<span>but not that</span>and that</h1>"
soup = Soup(markup, "html.parser")
assert soup.find_all(string=True, recursive=False) == ["that", "and that"]
# fails: the call returns []
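A quick way to see why the call returns []: the object built by Soup(...) represents the whole parsed document (its name is "[document]"), not the h1 element. With recursive=False, find_all only inspects that object's direct children, and here the only direct child is the h1 Tag, so there is no string to match. This is a minimal check, assuming bs4 is installed and the same markup is used:

```python
from bs4 import BeautifulSoup as Soup

markup = "<h1>that<span>but not that</span>and that</h1>"
soup = Soup(markup, "html.parser")

# The BeautifulSoup object is the document root, not the <h1>.
print(soup.name)  # [document]

# Its only direct child is the <h1> Tag, not a text node...
print([type(child).__name__ for child in soup.children])  # ['Tag']

# ...so a non-recursive string search at the root finds nothing.
print(soup.find_all(string=True, recursive=False))  # []
```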

Other solutions I found only work when you already know the tag name, for example:

assert soup.h1.find_all(string=True, recursive=False) == ["that", "and that"]
# returns ["that", "and that"]

In my case, though, I don’t know in advance that the element is an h1, so I’d like a tag-agnostic solution. I looked for something like soup.self, and tried soup[soup.name] in case the element was stored dictionary-style, but had no luck.

The existing answers either use soup.h1, which is not tag-agnostic, or soup.find_all(string=True, recursive=False), which returns [] in my case. I’m not sure whether BeautifulSoup has changed this behavior.
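One tag-agnostic sketch, assuming the element of interest is the first tag in the parsed markup: soup.find(True) returns the first Tag whatever its name, and calling find_all(string=True, recursive=False) on that Tag (rather than on the document root) returns only its direct text children. The helper own_text below is hypothetical, named here only for illustration:

```python
from typing import List

from bs4 import BeautifulSoup, Tag


def own_text(tag: Tag) -> List[str]:
    """Hypothetical helper: return only the text nodes that are direct children of `tag`."""
    # recursive=False limits the search to tag's immediate children,
    # so text inside nested elements (e.g. <span>) is skipped.
    return [str(s) for s in tag.find_all(string=True, recursive=False)]


markup = "<h1>that<span>but not that</span>and that</h1>"
soup = BeautifulSoup(markup, "html.parser")

# find(True) matches any tag, so this works without knowing it is an <h1>.
root = soup.find(True)
print(root.name)       # h1
print(own_text(root))  # ['that', 'and that']
```

Because own_text takes any Tag, the same call works on an element reached by any other means (a CSS selector, iteration over children, and so on).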
