Python Regex matching only last occurrence while using re.findall

I am seeing strange behavior while parsing text from an HTML file with a Python regex. I would greatly appreciate suggestions on the regex I should use.

import re

# Single-quoted literals so the double quotes inside the HTML do not end the string
string = '<a href="https://academia/course/3743">3743</a>, <a href="https://academia/course/3963">3963</a>,    <a href="https://academia/course/3850">3850</a>,'
# I want to extract 3743, 3963, 3850 from the above text
pattern = r'.*?<a href=".*">([0-9]+)</a>,.*'
result = re.findall(pattern, string)
print(result)

# Output
['3850']

It prints only the last occurrence and leaves out the rest. I also tried following the question below, but it didn’t help:
python findall finds only the last occurrence

Can anybody please help with the regex I should use to get all the numbers?

# expected output
[3743, 3963, 3850]

PS: I can’t use any other python modules like bs4. I need to stick with native python modules.
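
For reference, here is a minimal sketch of one pattern that seems to give the expected output using only the standard re module. The likely cause of the behavior above is that the greedy .* inside the href (together with the leading .*? and trailing .*) lets a single match span the whole string, so re.findall reports one match whose group is the last number. The sketch drops the surrounding wildcards and uses [^"]* so each match stays inside one tag. The names html, link_pattern and course_ids are illustrative only, and it assumes every link looks like the sample above.

```python
import re

html = ('<a href="https://academia/course/3743">3743</a>, '
        '<a href="https://academia/course/3963">3963</a>,    '
        '<a href="https://academia/course/3850">3850</a>,')

# [^"]* cannot cross the closing quote of the href attribute,
# so each match is confined to a single <a> tag.
link_pattern = r'<a href="[^"]*">([0-9]+)</a>'

# With one capture group, findall returns a list of the captured strings.
course_ids = re.findall(link_pattern, html)
print(course_ids)

# findall returns strings; convert them if integers are needed.
print([int(n) for n in course_ids])
```

This is only suitable for simple, regular markup like the sample; it is not a general HTML parser.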
