October 26, 2024

python

Python Regex matching only last occurrence while using re.findall

by admin

I am seeing strange behavior while parsing text from an HTML file with a Python regex. I would greatly appreciate suggestions on which regex I should use.

import re

# Single-quoted literals so the double quotes inside the HTML need no escaping
string = '<a href="https://academia/course/3743">3743</a>, <a href="https://academia/course/3963">3963</a>,    <a href="https://academia/course/3850">3850</a>,'
# I want to extract 3743, 3963, 3850 from the above text
pattern = r'.*?<a href=".*">([0-9]+)</a>,.*'
result = re.findall(pattern, string)
print(result)

# Output
['3850']

It prints only the last occurrence and leaves out the rest. I also tried the suggestions in this related question, but they didn't help:
python findall finds only the last occurrence

Can anybody please help with the regex I should use to get all the numbers?

# expected output
[3743, 3963, 3850]

PS: I can't use third-party modules such as bs4. I need to stick with Python's standard library.
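For anyone hitting the same symptom, here is a minimal sketch of one likely cause and a possible fix, using only the standard-library re module. It assumes the links always have the exact shape shown in the question (a double-quoted href followed directly by the number); the names greedy, whole, and numbers are introduced here for illustration. The greedy .* inside the href, plus the leading and trailing .*, lets a single match swallow the whole string, so findall has only one group to report. Restricting the href to [^"]* and dropping the surrounding .* keeps each match inside one tag.

```python
import re

# Same sample text as in the question.
string = (
    '<a href="https://academia/course/3743">3743</a>, '
    '<a href="https://academia/course/3963">3963</a>,    '
    '<a href="https://academia/course/3850">3850</a>,'
)

# The original pattern: the greedy ".*" inside href runs to the end of the
# string and backtracks only as far as the LAST '">digits</a>,' it can find.
greedy = r'.*?<a href=".*">([0-9]+)</a>,.*'
whole = re.search(greedy, string)
print(whole.span() == (0, len(string)))  # True: one match covers everything
print(re.findall(greedy, string))        # ['3850']

# Possible fix: [^"]* cannot cross the closing quote of the href, and there
# is no leading/trailing ".*", so findall can match each tag separately.
pattern = r'<a href="[^"]*">([0-9]+)</a>'
numbers = re.findall(pattern, string)
print(numbers)                    # ['3743', '3963', '3850']
print([int(n) for n in numbers])  # [3743, 3963, 3850]
```

Note that findall returns strings, so the int conversion in the last line is only needed if the integers from the expected output are actually required.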
