Question or problem about Python programming:
I have a path which looks like
/First/Second/Third/Fourth/Fifth
and I would like to remove the First from it, thus obtaining
Second/Third/Fourth/Fifth
The only idea I could come up with is to call os.path.split recursively, but this does not seem optimal. Is there a better solution?
How to solve the problem:
Solution 1:
There really is nothing in the os.path module to do this. Every so often, someone suggests creating a splitall function that returns a list (or iterator) of all of the components, but it never gained enough traction.
Partly this is because every time anyone suggested adding new functionality to os.path, it re-ignited the long-standing dissatisfaction with the general design of the library, leading to someone proposing a new, more OO-like API for paths to deprecate the old, clunky API. In 3.4, that finally happened, with pathlib. And it’s already got functionality that wasn’t in os.path. So:
>>> import pathlib
>>> p = pathlib.Path('/First/Second/Third/Fourth/Fifth')
>>> p.parts[2:]
('Second', 'Third', 'Fourth', 'Fifth')
>>> pathlib.Path(*p.parts[2:])
PosixPath('Second/Third/Fourth/Fifth')
Or… are you sure you really want to remove the first component, rather than do this?
>>> p.relative_to(*p.parts[:2])
PosixPath('Second/Third/Fourth/Fifth')
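The parts[2:] slice above only works for absolute paths, because parts[0] is the root ('/') there. Here is a minimal sketch of a helper that handles relative paths too. The name drop_first_component is made up for illustration, and PurePosixPath is used only so the example behaves the same on any platform:

```python
from pathlib import PurePosixPath


def drop_first_component(path):
    """Return `path` without its first named component (hypothetical helper)."""
    p = PurePosixPath(path)
    # p.anchor is "/" for absolute paths and "" for relative ones, so the
    # root entry in p.parts is skipped only when it is actually there.
    skip = 2 if p.anchor else 1
    return PurePosixPath(*p.parts[skip:])


print(drop_first_component("/First/Second/Third/Fourth/Fifth"))
print(drop_first_component("First/Second/Third"))
```

If the path has only one component, nothing is left and pathlib represents the result as '.', which may or may not be what you want.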
If you need to do this in 2.6-2.7 or 3.2-3.3, there’s a backport of pathlib.
Of course, you can use string manipulation, as long as you’re careful to normalize the path and use os.path.sep, and to make sure you handle the fiddly details with non-absolute paths or with systems with drive letters, and…
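To give an idea of what "careful" means here, this is a rough sketch of the string-based route. The function name strip_first_component is hypothetical, and it assumes that discarding any drive letter is acceptable; it is not meant to cover every edge case (UNC paths, for instance):

```python
import os.path


def strip_first_component(path):
    """String-based sketch; `strip_first_component` is a made-up name."""
    # Collapse duplicate separators and "." segments first.
    normalized = os.path.normpath(path)
    # Set any drive letter aside (it is always empty on POSIX systems).
    _drive, rest = os.path.splitdrive(normalized)
    # Drop the leading separator(s), then cut off the first component.
    rest = rest.lstrip(os.path.sep)
    _first, _sep, remainder = rest.partition(os.path.sep)
    return remainder


print(strip_first_component("/First//Second/./Third"))
```

Note that os.path.normpath also collapses 'a/../b' into 'b', which can change the meaning of a path that goes through symlinks.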
Or you can just wrap up your recursive os.path.split. What exactly is “non-optimal” about it, once you wrap it up? It may be a bit slower, but we’re talking nanoseconds here, many orders of magnitude faster than even calling stat on a file. It will have recursion-depth problems if you have a filesystem that’s 1000 directories deep, but have you ever seen one? (If so, you can always turn it into a loop…) It takes a few minutes to wrap it up and write good unit tests, but that’s something you just do once and never worry about again. So, honestly, if you don’t want to use pathlib, that’s what I’d do.
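For reference, here is one way the wrapped-up version might look, written as a loop so recursion depth is not an issue. Both splitall and remove_first_component are illustrative names, not anything in the standard library:

```python
import os.path


def splitall(path):
    """Split `path` into all of its components with repeated os.path.split."""
    parts = []
    while True:
        head, tail = os.path.split(path)
        if head == path:
            # Absolute path: only the root (e.g. "/") is left.
            parts.append(head)
            break
        if tail == path:
            # Relative path: only the first component is left.
            parts.append(tail)
            break
        path = head
        parts.append(tail)
    parts.reverse()
    return parts


def remove_first_component(path):
    """Drop the root (if any) and the first named component."""
    # Filter out the empty entry that a trailing separator produces.
    parts = [part for part in splitall(path) if part]
    if parts and os.path.isabs(parts[0]):
        parts = parts[1:]
    return os.path.join(*parts[1:]) if len(parts) > 1 else ""


print(splitall("/First/Second/Third"))
print(remove_first_component("/First/Second/Third/Fourth/Fifth"))
```

The result is relative and uses the platform's own separator, since it is rebuilt with os.path.join.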
Solution 2:
A bit like another answer, taking advantage of os.path:
os.path.join(*(x.split(os.path.sep)[2:]))
… assuming your string starts with a separator.
Solution 3:
A simple approach
a = '/First/Second/Third/Fourth/Fifth'
"/".join(a.strip("/").split('/')[1:])
output:
Second/Third/Fourth/Fifth
The code above splits the string on "/" and then joins the pieces back together, leaving out the first element.
Using itertools.dropwhile:
>>> import itertools
>>> a = '/First/Second/Third/Fourth/Fifth'
>>> "".join(list(itertools.dropwhile(str.isalnum, a.strip("/")))[1:])
'Second/Third/Fourth/Fifth'
Solution 4:
You can try:
os.path.relpath(your_path, '/First')
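A quick illustration of how this behaves. posixpath (the POSIX implementation behind os.path) is used here only so the output does not depend on the platform; that substitution is my assumption, not part of the answer. Note that you have to know the first component in advance, and that relpath does not complain when the path is not actually under it:

```python
import posixpath

path = "/First/Second/Third/Fourth/Fifth"

# The path is under /First, so the prefix is simply removed.
inside = posixpath.relpath(path, "/First")
# The path is not under /Other, so relpath climbs out with "..".
outside = posixpath.relpath(path, "/Other")

print(inside)   # Second/Third/Fourth/Fifth
print(outside)  # ../First/Second/Third/Fourth/Fifth
```

With relative inputs, relpath resolves both arguments against the current working directory first.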
Solution 5:
I was looking for a native way to do it, but it seems there isn’t one.
I know this topic is old, but this is what I did to get me to the best solution:
There were basically two approaches: using split() and using len(). Both had to use slicing.
1) Using split()
import time

start_time = time.time()
path = "/folder1/folder2/folder3/file.zip"
for i in xrange(500000):  # Python 2; use range() on Python 3
    new_path = "/" + "/".join(path.split("/")[2:])
print("--- %s seconds ---" % (time.time() - start_time))
Result: --- 0.420122861862 seconds ---
*Removing the char “/” in the line new_path = “/” + “/”…. didn’t improve the performance too much.
2) Using len(). This method only works if you provide the folder you want to remove.
import time

start_time = time.time()
path = "/folder1/folder2/folder3/file.zip"
folder = "/folder1"
for i in xrange(500000):  # Python 2; use range() on Python 3
    if path.startswith(folder):
        a = path[len(folder):]
print("--- %s seconds ---" % (time.time() - start_time))
Result: --- 0.199596166611 seconds ---
*Even with that “if” to check whether the path starts with the folder name, it was about twice as fast as the first method.
In summary: each method has its pros and cons. If you are absolutely sure which folder you want to remove, use method 2; otherwise I recommend method 1, which people here have mentioned previously.
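For anyone repeating the comparison on Python 3, here is a sketch using timeit instead of hand-rolled time.time() calls. The function names are made up, and the prefix check adds a trailing "/" (my addition, not in the original test) so that "/folder1" does not also match "/folder10/...":

```python
import timeit

PATH = "/folder1/folder2/folder3/file.zip"
FOLDER = "/folder1"


def with_split(path):
    # Method 1: split on "/" and drop the empty string plus the first folder.
    return "/" + "/".join(path.split("/")[2:])


def with_len(path, folder):
    # Method 2: slice the known folder off the front of the path.
    # Matching folder + "/" avoids false hits such as "/folder10/...".
    if path.startswith(folder + "/"):
        return path[len(folder):]
    return path


if __name__ == "__main__":
    runs = 500000
    t_split = timeit.timeit(lambda: with_split(PATH), number=runs)
    t_len = timeit.timeit(lambda: with_len(PATH, FOLDER), number=runs)
    print("split: %.3f s, len: %.3f s" % (t_split, t_len))
```

The absolute numbers will depend on your machine and Python version, so treat them as a relative comparison only.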