Caspershire Meta

Aizan's Secondary Blog

Organizing files with a small Python script

Posted on 28 Mar 2021

I recently purchased a set of programming books published by No Starch Press on Humble Bundle.

Immediately, I got annoyed. The downloaded files had short names: concise, but not as descriptive as I would have liked. So I thought, "let's write a small Python script to rename them," and I did.

The script:

"""
A simple Python script to rename and sort PDF/EPUB files.
These books were purchased and downloaded from Humble Bundle (No Starch Press).
Script written on 2021-March-28
"""

# Import modules
import os
import shutil

import pandas as pd


def rename_sort(main_dir, namelist):
    """Rename and sort EPUB/PDF files into each respective folder.

    main_dir -- folder holding the downloaded files (short names)
    namelist -- path to the ODF spreadsheet with short_name/full_name columns
    """

    # List the files directly inside main_dir (subfolders are ignored)
    _book_list = [
        _f for _f in sorted(os.listdir(main_dir))
        if os.path.isfile(os.path.join(main_dir, _f))
    ]

    # Read namelist as Pandas DataFrame (the "odf" engine needs odfpy installed)
    _namelist = pd.read_excel(namelist, engine="odf")

    # Main loop to rename and sort files to each respective type
    for _book_ext in _book_list:
        _book, _ext = os.path.splitext(_book_ext)
        _matches = _namelist.query("short_name == @_book")["full_name"].to_list()
        if not _matches:
            # No row for this file in the namelist: skip it instead of crashing
            print(f"No namelist entry for {_book_ext}, skipped")
            continue
        _full_name = _matches[0]
        _src = os.path.join(main_dir, _book_ext)

        if _ext == ".epub":
            shutil.copy(_src, f"epub/{_full_name}{_ext}")
        elif _ext == ".pdf":
            shutil.copy(_src, f"pdf/{_full_name}{_ext}")
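The `query` call above does one DataFrame lookup per file and copies as it goes. A minimal sketch of an alternative, assuming the namelist has already been turned into a plain `{short_name: full_name}` dict (for example with `dict(zip(df["short_name"], df["full_name"]))`), separates the "what goes where" planning from the actual copying, so the result can be checked before any file is touched. The function `plan_copies` and the `TARGET_DIRS` mapping are hypothetical, not part of the original script.

```python
import os

# Hypothetical mapping from file extension to destination folder;
# mirrors the hardcoded "epub/" and "pdf/" folders in rename_sort().
TARGET_DIRS = {".epub": "epub", ".pdf": "pdf"}


def plan_copies(filenames, names, main_dir="unsorted"):
    """Return (source, destination) pairs without copying anything.

    filenames -- file names found in main_dir, e.g. "book1.pdf"
    names     -- dict mapping short_name -> full_name (assumed already loaded)
    Files with an unknown extension or no namelist entry are skipped.
    """
    plan = []
    for filename in sorted(filenames):
        short_name, ext = os.path.splitext(filename)
        # Extension matching is case-insensitive; the original ext is kept
        target_dir = TARGET_DIRS.get(ext.lower())
        full_name = names.get(short_name)
        if target_dir is None or full_name is None:
            continue
        plan.append((
            os.path.join(main_dir, filename),
            os.path.join(target_dir, f"{full_name}{ext}"),
        ))
    return plan
```

Once the printed plan looks right, the copying step is just `for src, dst in plan: shutil.copy(src, dst)`.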

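Quick explanation: the namelist is an ODF spreadsheet with two columns, short_name (the downloaded file name without its extension) and full_name (the title to rename the file to). This is what it looks like: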

[Screenshot: example of the namelist sheet]
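Reading `.ods` files through `pd.read_excel(..., engine="odf")` requires the `odfpy` package. If that isn't available, one option (an assumption, not what I did here) is to export the same sheet to CSV, keeping the `short_name` and `full_name` headers, and load it into a dict with the standard library. The file name `names.csv` and the helper `load_names` are hypothetical.

```python
import csv


def load_names(csv_lines):
    """Build a {short_name: full_name} dict from CSV text lines.

    csv_lines -- an open CSV file or any iterable of lines; assumes a header
    row with "short_name" and "full_name", the same columns the script queries.
    Rows with an empty short_name or full_name are ignored.
    """
    reader = csv.DictReader(csv_lines)
    return {
        row["short_name"].strip(): row["full_name"].strip()
        for row in reader
        if row["short_name"] and row["full_name"]
    }
```

Usage would look like `with open("names.csv", newline="", encoding="utf-8") as f: names = load_names(f)`.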

I saved that script as resorter.py. Here’s the folder structure:

.
├── epub/
├── pdf/
├── unsorted/
│   ├── book1.epub
│   ├── book1.pdf
│   ├── book2.epub
│   └── book2.pdf
├── names.ods
├── resorter.py
└── renamer.ipynb

And in a Jupyter notebook, here is the code:

# Import module
from resorter import rename_sort

# Run
rename_sort(main_dir="unsorted", namelist="names.ods")

Caveat: the epub and pdf directories must exist before running this code, because they are hardcoded in the script. If they are missing, shutil.copy fails with a FileNotFoundError.
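One way to lift that caveat (a sketch, not what the script above does) is to create the output folders before copying. `os.makedirs(path, exist_ok=True)` does nothing when the folder already exists, so it is safe to call on every run. The helper name `ensure_output_dirs` is hypothetical.

```python
import os


def ensure_output_dirs(base_dir=".", subdirs=("epub", "pdf")):
    """Create the output folders if missing; existing ones are left alone.

    Returns the list of folder paths that were checked/created.
    """
    paths = []
    for name in subdirs:
        path = os.path.join(base_dir, name)
        # exist_ok=True: no error if the folder is already there
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_output_dirs()` in the notebook right before `rename_sort(main_dir="unsorted", namelist="names.ods")` would create epub/ and pdf/ next to the notebook if they aren't there yet.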

Other than that, it worked.