I recently purchased a set of programming books published by No Starch Press on Humble Bundle.
Immediately, I got annoyed. The downloaded files had short names that were not as descriptive (yet still concise) as I would have liked. So I thought, “let’s write a small Python script to rename them”, and I did.
The script:
"""
A simple python script to rename and sort PDF/EPUB files.
These books were purchased and downloaded from Humble Bundle (No Starch Press)
Script written on 2021-March-28
"""
# Import modules
import pandas as pd
import os
import shutil
def rename_sort(main_dir, namelist=None):
    """Rename and sort EPUB/PDF into each respective folder
    """
    # Dump list of books into book list
    for root, dirs, files in os.walk(main_dir, onerror=True):
        _book_list = []
        for _book in files:
            _book_list.append(_book)
    # Read namelist as pandas DataFrame
    _namelist = pd.read_excel(namelist, engine="odf")
    # Main loop to rename and sort files to each respective type
    for _book_ext in _book_list:
        _book, _ext = os.path.splitext(_book_ext)
        _full_name = _namelist.query(" short_name == @_book ")["full_name"].to_list()[0]
        if _ext == ".epub":
            shutil.copy(f"{main_dir}/{_book_ext}", f"epub/{_full_name}{_ext}")
        elif _ext == ".pdf":
            shutil.copy(f"{main_dir}/{_book_ext}", f"pdf/{_full_name}{_ext}")
Quick explanations:
- The pandas package is loaded because I wanted to keep a list of short names (as downloaded) and full names (what I wanted the books renamed to). I could have used a Python dictionary to achieve this, but I was like “nah, let’s use a spreadsheet to make life easier”.
- The os standard library is used for its os.walk(dir) method, with the onerror parameter set to True so that it does not fail silently. (Strictly speaking, onerror expects a function that receives the error; passing True only makes things loud because Python then fails when it tries to call it.) At first, I tried the os.rename(src, dst) method, but it ended up moving my files. I wanted to copy, which I deemed a safer approach in case I did something extremely wrong.
- The shutil standard library is used because I needed its shutil.copy(src, dst) method.
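On the onerror parameter: os.walk calls whatever is passed there with the OSError it ran into. A minimal sketch of a handler that re-raises that error, so a missing or unreadable folder stops the script right away. The names raise_walk_error and list_files are made up for this example and are not part of the script above:

```python
import os


def raise_walk_error(error):
    """Called by os.walk with the OSError it hit; re-raise instead of skipping."""
    raise error


def list_files(main_dir):
    """Return the file names found under main_dir, failing loudly on errors."""
    found = []
    for _root, _dirs, files in os.walk(main_dir, onerror=raise_walk_error):
        found.extend(files)
    return found
```

With this handler, pointing list_files at a folder that does not exist raises the original FileNotFoundError, not an unrelated error about calling True.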
The namelist is an ODF spreadsheet with two columns, short_name and full_name, and one row per book.
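The dictionary route mentioned earlier would cover the same short-to-full lookup. A rough sketch, with placeholder titles instead of the real ones from the bundle; full_name_for is a helper invented for this example:

```python
import os

# Hypothetical mapping: short name (as downloaded) -> full name (as wanted)
NAMES = {
    "book1": "Author One - First Example Title",
    "book2": "Author Two - Second Example Title",
}


def full_name_for(filename, names=NAMES):
    """Return the renamed file name, keeping the original extension."""
    short_name, ext = os.path.splitext(filename)
    return f"{names[short_name]}{ext}"
```

Here full_name_for("book1.pdf") gives "Author One - First Example Title.pdf", and a short name missing from the mapping raises a KeyError.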
I saved that script as resorter.py. Here’s the folder structure:
.
├── epub/
├── pdf/
├── unsorted/
│ ├── book1.epub
│ ├── book1.pdf
│ ├── book2.epub
│ └── book2.pdf
├── names.ods
├── resorter.py
└── renamer.ipynb
And in a Jupyter notebook, here is the code:
# Import module
from resorter import rename_sort
# Run
rename_sort(main_dir="unsorted", namelist="names.ods")
Caveat: the epub and pdf directories must exist before running this code, because they are hardcoded in the script.
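One way around that caveat, as a small sketch: create the target folders before copying anything. The helper name ensure_dirs is made up here; os.makedirs with exist_ok=True leaves a folder alone if it is already there.

```python
import os


def ensure_dirs(*dirs):
    """Create each folder if it is missing; existing folders are left alone."""
    for d in dirs:
        os.makedirs(d, exist_ok=True)
```

Calling ensure_dirs("epub", "pdf") before rename_sort(...) should be enough for the folder structure shown above.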
Otherwise, it worked.