OiO.lk Community platform!

Python: Get filename from URL with wild cards [closed]

  • Thread starter: megalamehaxxor (Guest)
This code fetches a file from ARCH_URL. However, the same page also has a manifest file that always names the current ARCH_FILE, including the current timestamp in the filename. How can I read that manifest so that I always fetch the current file for ARCH_TYPE?

Code:
import re
import requests
from urllib.request import urlopen
from pathlib import Path

ARCH_TYPE = "stage3-amd64-desktop-systemd"
ARCH_URL = "https://distfiles.gentoo.org/releases/amd64/autobuilds/current-{0}/".format(ARCH_TYPE)
ARCH_FILE = "{0}-20240609T164903Z.tar.xz".format(ARCH_TYPE)
FILEPATH = Path(ARCH_FILE)
SRC_URL = "https://distfiles.gentoo.org/releases/amd64/autobuilds/current-stage3-amd64-desktop-systemd/{0}".format(ARCH_FILE)

# Get current stage3 filename:
with urlopen(ARCH_URL) as response:
    html_response = response.read()
    encoding = response.headers.get_content_charset('utf-8')
    decoded_html = html_response.decode(encoding)
# Escape the literal dots and pin the timestamp shape (e.g. 20240609T164903Z) so the
# match stops at .tar.xz instead of running on through .tar.xz.asc and the HTML around it.
sub_str = r'{0}-\d{{8}}T\d{{6}}Z\.tar\.xz'.format(re.escape(ARCH_TYPE))
temp = re.compile(sub_str)
res = temp.search(decoded_html)
if res is None:
    raise SystemExit("No {0} tarball name found at {1}".format(ARCH_TYPE, ARCH_URL))
print("The substring match is : " + res.group(0))
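
One way to read the manifest rather than the HTML listing is sketched below. It rests on assumptions not stated in the post: that the manifest is a plain-text file named latest-<ARCH_TYPE>.txt in the same directory, and that it lists the tarball on a line of its own next to comment lines and possibly PGP armor lines. MANIFEST_NAME, current_filename and fetch_current_filename are hypothetical names used only for illustration.

```python
import re
from urllib.request import urlopen

ARCH_TYPE = "stage3-amd64-desktop-systemd"
ARCH_URL = "https://distfiles.gentoo.org/releases/amd64/autobuilds/current-{0}/".format(ARCH_TYPE)
# Assumption: the manifest sits next to the tarballs under this name.
MANIFEST_NAME = "latest-{0}.txt".format(ARCH_TYPE)


def current_filename(manifest_text, arch_type):
    """Return the first timestamped tarball name for arch_type in manifest_text, or None."""
    pattern = re.compile(r'{0}-\d{{8}}T\d{{6}}Z\.tar\.xz'.format(re.escape(arch_type)))
    for line in manifest_text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines and PGP armor lines.
        if not line or line.startswith("#") or line.startswith("-----"):
            continue
        match = pattern.search(line)
        if match:
            return match.group(0)
    return None


def fetch_current_filename(arch_url=ARCH_URL, manifest_name=MANIFEST_NAME, arch_type=ARCH_TYPE):
    """Download the manifest and return the tarball name it lists."""
    with urlopen(arch_url + manifest_name) as response:
        encoding = response.headers.get_content_charset('utf-8')
        text = response.read().decode(encoding)
    name = current_filename(text, arch_type)
    if name is None:
        raise ValueError("no {0} entry found in {1}".format(arch_type, manifest_name))
    return name
```

If those assumptions hold, ARCH_FILE could be set from fetch_current_filename() and SRC_URL built as ARCH_URL + ARCH_FILE, so neither needs a hard-coded timestamp. Keeping the parsing in current_filename, separate from the download, means it can be checked against a saved copy of the manifest without touching the network.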