python

Only extracting text from this element, not its children

by admin
October 23, 2024

I’d like to be able to retrieve the current node’s text:

from bs4 import BeautifulSoup as Soup

markup = "<h1>that<span>but not that</span>and that</h1>"
soup = Soup(markup, "html.parser")
assert soup.find_all(string=True, recursive=False) == ["that", "and that"]
# fails: find_all returns [] here

Other solutions I found only work when you know the node type, for example doing:

assert soup.h1.find_all(string=True, recursive=False) == ["that", "and that"]
# returns ["that", "and that"]

But in this case, I do not know that it’s an h1; I’d like the lookup to be type-agnostic. I looked for something like soup.self, and tried soup[soup.name] in case it was stored as a dictionary, but no luck there.

The solutions I found either use soup.h1, which is not node-type agnostic, or use soup.find_all(string=True, recursive=False), which in my case returns []. I'm not sure whether BeautifulSoup has changed its behavior here.
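
For reference, a possible type-agnostic approach, offered as a sketch rather than a confirmed answer. The empty list is most likely because soup is the document container, whose only direct child is the h1 tag itself, so a non-recursive search on soup sees no strings. Assuming the node of interest is the first tag in the parsed document, soup.find(True) returns it without naming its type. The helper own_strings is a hypothetical name introduced here, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString


def own_strings(node):
    """Return the strings that are direct children of node, skipping nested tags.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    return [str(child) for child in node.children if isinstance(child, NavigableString)]


markup = "<h1>that<span>but not that</span>and that</h1>"
soup = BeautifulSoup(markup, "html.parser")

# find(True) matches any tag, so this picks the first tag whatever its name.
root = soup.find(True)

direct_text = own_strings(root)
same_via_find_all = root.find_all(string=True, recursive=False)
```

Here root.find_all(string=True, recursive=False) is the same call as in the question, only made on the root tag instead of on soup. Note that comments are also NavigableString instances in BeautifulSoup, so the helper would include them if the markup contained any.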
