Refactoring 14,303 Lines of Code

I'm new to writing blog posts, so please bear with me.

First, you can explore the codebase here.

I started this project on July 3, 2020, and it's still a work in progress. The initial code was far from ideal; calling it an eyesore would be an understatement. I've refactored it from time to time, but it's still not where I want it to be. Starting today, I'm on a mission to make it as clean as I can.

What This Project Is About

The project started as a simple interface for building ML and DL models without writing any code. The intended workflow was:

  1. Choose the desired algorithm.
  2. Import training data.
  3. Pick preprocessing steps (limited options).
  4. Select hyperparameters.
  5. Execute the algorithm.
  6. Store the model.
  7. Evaluate the model.

That workflow hasn't changed, but the code behind it is messy. Worse, I've been copying almost identical code from one algorithm to the next.

My goal is to split the interface into self-contained components and to separate the backend code from the frontend code.
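As a rough illustration of that split, here is a minimal sketch of a backend class for the train-data step, the part that decides which columns of the training file are predictors and which are targets. It has no Tkinter imports at all. `TrainSetModel` and its method names are hypothetical, and reading only the CSV header with the standard `csv` module is an assumption for the sketch, not how the project actually loads data.

```python
import csv
import io
from typing import TextIO


class TrainSetModel:
    """Hypothetical backend for the train-data step; it never touches a widget."""

    def __init__(self) -> None:
        self.columns: list[str] = []     # every column in the loaded file
        self.predictors: list[str] = []  # columns chosen as model inputs
        self.targets: list[str] = []     # columns chosen as model outputs

    def load_columns(self, csv_file: TextIO) -> None:
        # Read only the header row; the frontend just needs names to list.
        self.columns = next(csv.reader(csv_file), [])
        self.predictors.clear()
        self.targets.clear()

    def add_predictor(self, column: str) -> None:
        # Ignore unknown columns and duplicates.
        if column in self.columns and column not in self.predictors:
            self.predictors.append(column)

    def eject_predictor(self, column: str) -> None:
        if column in self.predictors:
            self.predictors.remove(column)

    def add_target(self, column: str) -> None:
        if column in self.columns and column not in self.targets:
            self.targets.append(column)

    def eject_target(self, column: str) -> None:
        if column in self.targets:
            self.targets.remove(column)


# Usage with an in-memory CSV instead of a real file.
model = TrainSetModel()
model.load_columns(io.StringIO("age,income,label\n30,1000,1\n"))
model.add_predictor("age")
model.add_target("label")
print(model.predictors, model.targets)  # ['age'] ['label']
```

Because nothing here depends on the GUI, the frontend would only forward the user's selection to these methods and redraw its lists, and the logic can be tested without opening a window.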

Here's a visual representation of the current project structure:

[Image: current project structure]

Issues with the Codebase

The snippet below is one of the project's self-contained parts: the train input frame. This same code is duplicated in 6 to 8 other files across the project.

It clearly needs to be pulled out into a reusable piece. For that, I've chosen composition over inheritance.

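On the bright side, the snippet is almost self-contained: the frame, its widgets, and its bindings are all defined in one place. The downside is that the backend and frontend are intertwined. The widgets call methods such as self.read_train_data and self.add_predictor on the enclosing class, so those have to be pulled apart as well.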

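```python
import tkinter as tk
from tkinter import ttk

# Excerpt from an algorithm window's class; self.root is the parent window.
# add_predictor, eject_predictor, add_target and eject_target are called both
# by bindings (which pass a Tk event) and by buttons (which pass nothing), so
# they need an optional event argument, e.g. def add_predictor(self, event=None).

# Get Train Set
get_train_set_frame = ttk.Labelframe(self.root, text="Get Train Set")
get_train_set_frame.grid(column=0, row=0)

# Path entry plus a button that loads the file
file_path = tk.StringVar(value="")
ttk.Label(get_train_set_frame, text="Train File Path").grid(column=0, row=0)
ttk.Entry(get_train_set_frame, textvariable=file_path).grid(column=1, row=0)
ttk.Button(
    get_train_set_frame,
    text="Read Data",
    command=lambda: self.read_train_data(file_path),
).grid(column=2, row=0)

# All columns: double left-click adds a predictor,
# double right-click (Button-3) adds a target
self.input_list = tk.Listbox(get_train_set_frame)
self.input_list.grid(column=0, row=1)
self.input_list.bind("<Double-Button-1>", self.add_predictor)
self.input_list.bind("<Double-Button-3>", self.add_target)

# Chosen predictors: double-click ejects one
self.predictor_list = tk.Listbox(get_train_set_frame)
self.predictor_list.grid(column=1, row=1)
self.predictor_list.bind("<Double-Button-1>", self.eject_predictor)

# Chosen targets: double-click ejects one
self.target_list = tk.Listbox(get_train_set_frame)
self.target_list.grid(column=2, row=1)
self.target_list.bind("<Double-Button-1>", self.eject_target)

# Buttons that duplicate the predictor bindings
ttk.Button(
    get_train_set_frame, text="Add Predictor", command=self.add_predictor
).grid(column=1, row=2)
ttk.Button(
    get_train_set_frame, text="Eject Predictor", command=self.eject_predictor
).grid(column=1, row=3)

# Buttons that duplicate the target bindings
ttk.Button(
    get_train_set_frame, text="Add Target", command=self.add_target
).grid(column=2, row=2)
ttk.Button(
    get_train_set_frame, text="Eject Target", command=self.eject_target
).grid(column=2, row=3)
```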
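Instead of pasting the train input frame into every algorithm's file, each algorithm window can own one shared component. Below is a minimal sketch of that composition structure, with the Tkinter widgets left out so it runs without a display. `TrainInput`, `Hyperparameters`, `RidgeWindow`, and `DecisionTreeWindow` are hypothetical names, not classes from the project; in the real GUI, `TrainInput` would presumably also take a parent widget and build the Labelframe from the snippet in its constructor.

```python
class TrainInput:
    """Hypothetical shared component: the predictor and target columns a user picked."""

    def __init__(self) -> None:
        self.predictors: list[str] = []
        self.targets: list[str] = []


class Hyperparameters:
    """Hypothetical shared component: one algorithm's hyperparameter values."""

    def __init__(self, **defaults: object) -> None:
        self.values = dict(defaults)


class RidgeWindow:
    # Composition: the window *has* a TrainInput instead of inheriting it.
    def __init__(self) -> None:
        self.train_input = TrainInput()
        self.hyperparameters = Hyperparameters(alpha=1.0)


class DecisionTreeWindow:
    # A second algorithm reuses the same component without copying its code.
    def __init__(self) -> None:
        self.train_input = TrainInput()
        self.hyperparameters = Hyperparameters(max_depth=5)


ridge = RidgeWindow()
tree = DecisionTreeWindow()
ridge.train_input.predictors.append("age")
print(ridge.train_input.predictors)  # ['age']
print(tree.train_input.predictors)   # []
```

Each window gets its own component instance, so state never leaks between algorithms, and each window wires together only the components it needs rather than overriding pieces of a shared base class.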