Question or problem about Python programming:
I have the following url:
    url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
I would like to extract the file name in this url: 09-09-201315-47-571378756077.jpg
Once I get this file name, I’m going to save it with this name to the Desktop.
    filename = **extracted file name from the url**
    download_photo = urllib.urlretrieve(url, "/home/ubuntu/Desktop/%s.jpg" % (filename))
After this, I’m going to resize the photo. Once that is done, I’m going to save the resized version and append the word “_small” to the end of the filename.
    downloadedphoto = Image.open("/home/ubuntu/Desktop/%s.jpg" % (filename))
    resize_downloadedphoto = downloadedphoto.resize((300, 300), Image.ANTIALIAS)
    resize_downloadedphoto.save("/home/ubuntu/Desktop/%s.jpg" % (filename + "_small"))
What I am trying to achieve is two files: the original photo with the original name, and the resized photo with the modified name. Like so:
09-09-201315-47-571378756077.jpg
09-09-201315-47-571378756077_small.jpg
How can I go about doing this?
How to solve the problem:
Solution 1:
You can use urllib.parse.urlparse with os.path.basename:
    import os
    from urllib.parse import urlparse

    url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
    a = urlparse(url)
    print(a.path)                    # Output: /kyle/09-09-201315-47-571378756077.jpg
    print(os.path.basename(a.path))  # Output: 09-09-201315-47-571378756077.jpg
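To tie this back to the question, here is a minimal sketch that derives both target names from the URL. The helper name `local_names` is hypothetical, and the sketch assumes the last path segment carries a file extension; the download and resize steps are left out.

```python
import os
from urllib.parse import urlparse


def local_names(url):
    """Return (original_name, small_name) for the last path segment of url."""
    filename = os.path.basename(urlparse(url).path)
    # splitext splits at the last dot only, so the extension is kept intact
    stem, ext = os.path.splitext(filename)
    return filename, stem + "_small" + ext


url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
original, small = local_names(url)
print(original)  # 09-09-201315-47-571378756077.jpg
print(small)     # 09-09-201315-47-571378756077_small.jpg
```

Because the extension is already part of the extracted name, it should not be appended a second time when building the path under /home/ubuntu/Desktop/.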
Solution 2:
os.path.basename(url)
Why try harder?
    In [1]: os.path.basename("https://example.com/file.html")
    Out[1]: 'file.html'

    In [2]: os.path.basename("https://example.com/file")
    Out[2]: 'file'

    In [3]: os.path.basename("https://example.com/")
    Out[3]: ''

    In [4]: os.path.basename("https://example.com")
    Out[4]: 'example.com'
Note 2020-12-20
Nobody has thus far provided a complete solution.
A URL can contain a ?[query-string] and/or a #[fragment identifier] (but only in that order).
    In [1]: from os import path

    In [2]: def get_filename(url):
       ...:     fragment_removed = url.split("#")[0]  # keep to left of first #
       ...:     query_string_removed = fragment_removed.split("?")[0]
       ...:     scheme_removed = query_string_removed.split("://")[-1].split(":")[-1]
       ...:     if scheme_removed.find("/") == -1:
       ...:         return ""
       ...:     return path.basename(scheme_removed)
       ...:

    In [3]: get_filename("a.com/b")
    Out[3]: 'b'

    In [4]: get_filename("a.com/")
    Out[4]: ''

    In [5]: get_filename("https://a.com/")
    Out[5]: ''

    In [6]: get_filename("https://a.com/b")
    Out[6]: 'b'

    In [7]: get_filename("https://a.com/b?c=d#e")
    Out[7]: 'b'
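The query string and fragment can also be stripped by the standard library's URL parser instead of manual splitting. The following is a sketch rather than a drop-in replacement: the name `filename_from_url` is hypothetical, percent-decoding with `unquote` is an addition not present above, and a scheme-less input with no slash such as "a.com" is treated by `urlsplit` as a path, so it yields 'a.com' rather than ''.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the last path segment of url, ignoring ?query and #fragment."""
    path = urlsplit(url).path  # query string and fragment are parsed off here
    # URL paths always use "/", so posixpath is used regardless of platform
    return unquote(posixpath.basename(path))  # decode %20 and similar escapes


print(filename_from_url("https://a.com/b?c=d#e"))         # b
print(filename_from_url("https://a.com/"))                # prints an empty line
print(filename_from_url("https://a.com/my%20photo.jpg"))  # my photo.jpg
```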
Solution 3:
    filename = url[url.rfind("/")+1:]
    filename_small = filename.replace(".", "_small.")
It may be safer to replace ".jpg" rather than "." in the replace call, since a "." can also appear elsewhere in the filename.
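The caveat about dots can be seen with a name that contains more than one. This is a small illustration using a made-up filename; `os.path.splitext` is shown as one alternative that only splits at the last dot.

```python
import os

filename = "my.holiday.photo.jpg"  # hypothetical name with several dots

# str.replace touches every dot, not only the one before the extension
print(filename.replace(".", "_small."))  # my_small.holiday_small.photo_small.jpg

# os.path.splitext splits only at the last dot
stem, ext = os.path.splitext(filename)
filename_small = stem + "_small" + ext
print(filename_small)  # my.holiday.photo_small.jpg
```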
Solution 4:
You could just split the url by “/” and retrieve the last member of the list:
    url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
    filename = url.split("/")[-1]  # 09-09-201315-47-571378756077.jpg
Then use replace to change the ending:
    small_jpg = filename.replace(".jpg", "_small.jpg")  # 09-09-201315-47-571378756077_small.jpg
Solution 5:
Use urllib.parse.urlparse to get just the path part of the URL, and then use pathlib.Path on that path to get the filename:
    from urllib.parse import urlparse
    from pathlib import Path

    url = "http://example.com/some/long/path/a_filename.jpg?some_query_params=true&some_more=true#and-an-anchor"
    a = urlparse(url)
    a.path             # '/some/long/path/a_filename.jpg'
    Path(a.path).name  # 'a_filename.jpg'
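The same approach extends to the "_small" name from the question, since a path object exposes the stem and suffix separately. A minimal sketch, using `PurePosixPath` instead of `Path` on the assumption that URL paths should be treated as forward-slash paths on every platform:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "http://example.com/some/long/path/a_filename.jpg?some_query_params=true&some_more=true#and-an-anchor"
path = PurePosixPath(urlparse(url).path)

# stem is the name without its final suffix; suffix includes the leading dot
small_name = path.stem + "_small" + path.suffix
print(path.name)   # a_filename.jpg
print(small_name)  # a_filename_small.jpg
```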
Solution 6:
Sometimes there is a query string:
    filename = url.split("/")[-1].split("?")[0]
    new_filename = filename.replace(".jpg", "_small.jpg")
Solution 7:
The question "Python split url to find image name and extension" shows how to extract the image name. To append to the name:
    imageName = '09-09-201315-47-571378756077'
    new_name = '{0}_small.jpg'.format(imageName)
Solution 8:
We can extract the filename from a URL by using the ntpath module.
    import ntpath

    url = 'http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg'
    name, ext = ntpath.splitext(ntpath.basename(url))
    # name: 09-09-201315-47-571378756077, ext: .jpg
    print(name + '_small' + ext)
    # 09-09-201315-47-571378756077_small.jpg
Solution 9:
With Python 3 (3.4 and later) you can abuse the pathlib library in the following way:
    from pathlib import Path

    p = Path('http://example.com/somefile.html')
    print(p.name)    # >>> 'somefile.html'
    print(p.stem)    # >>> 'somefile'
    print(p.suffix)  # >>> '.html'
    print(f'{p.stem}-spamspam{p.suffix}')  # >>> 'somefile-spamspam.html'
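One thing to keep in mind with this trick is that pathlib knows nothing about URLs, so a query string ends up inside the name and suffix. The sketch below uses a made-up URL and `PurePosixPath` (an assumption, chosen so the result does not depend on the operating system) to show the difference from parsing the URL first.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

url = "http://example.com/somefile.html?version=2"  # hypothetical URL

# Treating the whole URL as a path keeps the query string in the name
raw = PurePosixPath(url)
print(raw.name)    # somefile.html?version=2
print(raw.suffix)  # .html?version=2

# Parsing the URL first leaves only the path component
parsed = PurePosixPath(urlparse(url).path)
print(parsed.name)    # somefile.html
print(parsed.suffix)  # .html
```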