Python for Scientists – Directory Structures

Are you the type of person that has files saved in a variety of places? Sure, there was an initial method to the madness. But now? Files are saved in various directories and it looks like a paper-nado ravaged your folder. There is no consistency between different projects/classes. Maybe you even have a single folder to dump all files for each project/class, or use your desktop as a file organizer. A consistent directory structure will not only help you now, but will save you tons of time in the future. You are your own best collaborator.

There are multiple ways to organize projects, and readers are encouraged to determine their own optimal setup. This post will provide a suggestion based on my experience. Frameworks are useful even after school when you’re working in the real world! Scientists and professors can benefit from directory consistency for research, grants, and classes.

Topics covered:

  • Consistent directory structure
  • README files
  • Licenses


Updated: 2019-09-25


Consistency

Eventually we are going to create a template that can be copy/pasted for each new project. Before that, let’s talk about your options for directory structures. I will assume you like to save things in a top-level directory under your user name. For Linux it would be /home/AtmoGuy/projects (on a Mac, /Users/AtmoGuy/projects), or on Windows, C:\Users\AtmoGuy\projects. The /projects directory can be named whatever you prefer. Perhaps /work is better for your situation. If you’re in school it would make sense to have /research and /classes or /courses. Even if you have two separate directories, as in the student example, keep things consistent!

Below is an example of a /research directory for a student:

  • /home/AtmoGuy/research
    • /myFirstProject
      • /code
      • /input
      • /output
      • /plots
      • /writing
      • /presentations
      • README.md
    • /mySecondProject
      • /code
      • /input
      • /output
      • /plots
      • /writing
      • /presentations
      • README.md
    • /template
      • /code
      • /input
      • /output
      • /plots
      • /writing
      • /presentations
      • README.md

Notice the consistency? The /plots directory is optional. Figures can go into /output, but I prefer to save figures separately from data (e.g. netCDF files, csv files, etc.). Note that each project directory has a README.md file. READMEs are important! The README file includes instructions for your future self (and other people) on what was going on when you were actively working on the project. Sure, you remember all the details of the project you just finished. How about a few months from now, when you get major revisions for your paper but have already started working on the next project? Good luck with that. If you had a README file, it would be quick and easy to remind yourself about important project details.
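The layout above can also be generated rather than built by hand. Below is a minimal Python sketch, assuming the subdirectory names from the example. The function name make_project and the one-line starter README are my own placeholders, not part of any library.

```python
from pathlib import Path

# Subdirectory names taken from the example layout above.
SUBDIRS = ["code", "input", "output", "plots", "writing", "presentations"]


def make_project(root, name, subdirs=SUBDIRS):
    """Create root/name with the standard subdirectories and a starter README.md.

    make_project is a hypothetical helper name used for illustration.
    """
    project = Path(root) / name
    for sub in subdirs:
        # exist_ok=True makes the call safe to repeat on an existing project
        (project / sub).mkdir(parents=True, exist_ok=True)
    readme = project / "README.md"
    if not readme.exists():
        # Never overwrite a README that already has notes in it
        readme.write_text(f"# {name}\n")
    return project


if __name__ == "__main__":
    import tempfile

    # Build the skeleton in a throwaway directory and list what was created
    with tempfile.TemporaryDirectory() as tmp:
        project = make_project(tmp, "template")
        print(sorted(p.name for p in project.iterdir()))
```

To use it for real, pass your own top-level directory (for example your /research directory) instead of the temporary one.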

What might a class/course structure look like?

  • /home/AtmoGuy/classes
    • /2016_Fall_Synoptic
      • /Lecture1
      • /Lecture2
      • /Homework1
        • /code
        • /input
        • /output
        • /plots
        • /writing
        • /presentations
        • README.md
      • /Midterm
        • /studyGuide
      • /Final
        • /studyGuide
    • /YYYY_SSSS_CLASS
      • /Lecture1
      • /Homework1
        • /code
        • /input
        • /output
        • /plots
        • /writing
        • /presentations
        • README.md
      • /Midterm
        • /studyGuide
      • /Final
        • /studyGuide

Use the /YYYY_SSSS_CLASS template at the bottom of the list for each new class. Unused directories under the Homework directory can be deleted when it makes sense. Maybe you don’t want a dedicated /studyGuide folder. That’s fine. Remove it from your template. Whatever you decide to do, be consistent!
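The YYYY_SSSS_CLASS naming pattern lends itself to a small helper. Here is a rough sketch, assuming the class layout shown above. The function make_class and its n_lectures and n_homeworks parameters are hypothetical names I picked for this example.

```python
from pathlib import Path

# Same subdirectories as a research project, reused for each homework.
HOMEWORK_SUBDIRS = ["code", "input", "output", "plots", "writing", "presentations"]


def make_class(root, year, season, course, n_lectures=1, n_homeworks=1):
    """Create root/YYYY_SSSS_CLASS following the class layout above."""
    class_dir = Path(root) / f"{year}_{season}_{course}"
    for i in range(1, n_lectures + 1):
        (class_dir / f"Lecture{i}").mkdir(parents=True, exist_ok=True)
    for i in range(1, n_homeworks + 1):
        homework = class_dir / f"Homework{i}"
        for sub in HOMEWORK_SUBDIRS:
            (homework / sub).mkdir(parents=True, exist_ok=True)
        # An empty README.md, ready to be filled in
        (homework / "README.md").touch()
    for exam in ("Midterm", "Final"):
        (class_dir / exam / "studyGuide").mkdir(parents=True, exist_ok=True)
    return class_dir
```

Calling make_class("classes", 2016, "Fall", "Synoptic", n_lectures=2) would create a 2016_Fall_Synoptic directory under classes, matching the first example in the list. If you dropped /studyGuide from your template, drop it from the helper too.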

What if you are a scientist in private industry or a professor at a college/university? Consistency will help you as well! You will undoubtedly have numerous projects and having a consistent structure will make life much easier.

Whether you’re a student, private industry scientist, or college professor, consistent project frameworks will help you be more efficient. Set up a template for all of the directories you might need. If a certain directory doesn’t make sense for a specific project, delete it. Simple as that. If the project does not need code but does require collaboration on writing, copy/paste the template, rename it, and then delete everything but the writing and presentations directories. When you open the directory later you will immediately know that no code or data were associated with the project. It was purely a writing and presentation project. I recommend keeping the README file even if a project does not involve coding. README files are extremely useful for reminding yourself about the project, as mentioned above.
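The copy, rename, and delete routine described above can be scripted. This is only a sketch using the standard library shutil module; new_project_from_template and its keep argument are names I made up for illustration.

```python
import shutil
from pathlib import Path


def new_project_from_template(template, destination, keep=None):
    """Copy `template` to `destination`, optionally keeping only some subdirectories.

    keep: names of subdirectories to retain; None keeps everything.
    """
    template, destination = Path(template), Path(destination)
    # copytree raises FileExistsError if destination already exists,
    # which guards against clobbering an existing project
    shutil.copytree(template, destination)
    if keep is not None:
        for child in destination.iterdir():
            # Files such as README.md and hidden directories such as .git are left alone
            if child.is_dir() and not child.name.startswith(".") and child.name not in keep:
                shutil.rmtree(child)
    return destination
```

For the writing-only project in the example, the call would look like new_project_from_template("template", "myWritingProject", keep={"writing", "presentations"}), where myWritingProject is a made-up project name. The README.md survives because only directories are pruned.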

README

I have mentioned the README file a few times already and stressed its importance. But what goes in a README file? Before we go into contents, remember this one thing. Be consistent! Whatever layout/organization you settle on for README files, keep them consistent and thank yourself later.

Visit my GitLab repo for an example of my template directories and README file. Credit goes to makeareadme.com for most of the README template format.
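One way to keep README files consistent is to generate the same skeleton every time. The sketch below is an assumption on my part: the section names are placeholders, not the contents of the template in the repo, so swap in whatever layout you settle on.

```python
# Hypothetical section names; replace them with your own README layout.
SECTIONS = ["Description", "Installation", "Usage", "Data sources", "Authors", "License"]


def readme_skeleton(title, sections=SECTIONS):
    """Return a Markdown README skeleton with one TODO stub per section."""
    lines = [f"# {title}", ""]
    for section in sections:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the skeleton; redirect it to README.md to start a new project
    print(readme_skeleton("myFirstProject"))
```

Because every project starts from the same headings, you always know where to look for, say, the data sources of an old project.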

License

Think of a license as a way to protect yourself against getting sued. A license can be used to protect your work from theft as well, but I’m assuming most of what you do is open source. The MIT license is a good way to say, hey, you can use this however you want but don’t blame me if it destroys your world. My template git repo has an MIT license on it. Do I think anybody will ever want to sue me due to damages from my template directory? No. Will I add a license anyway? Yes. Add a license to any project shared on the internet.

Sites like GitLab have an option to select a license, which is why I didn’t add a specific license file to the templates above.

BONUS

Whatever template you choose to use, you can push the template to your own git repository instead of having only a local template copy. That way, no matter what computer you are using, you can easily clone the template for a new project!

It does take a few extra steps to ensure the template is renamed and does not point back to the template repo.

$ git clone <path-to-repo>/template.git <project-name>
$ cd <project-name>
$ git remote remove origin
$ git remote add origin <path-to-new-repo>/project.git
$ git push origin master -u
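If you start new projects often, the steps above can be wrapped in a short Python script. This is a sketch under a few assumptions: the function names are my own, the branch is called master as in the commands above, and git -C <directory> stands in for the cd step. By default it only prints the commands (a dry run) so you can check them first.

```python
import subprocess


def template_clone_commands(template_url, project_name, new_remote_url, branch="master"):
    """Return the git commands from the steps above as argument lists."""
    return [
        ["git", "clone", template_url, project_name],
        # `git -C <dir>` runs the command as if started in <dir>, replacing `cd`
        ["git", "-C", project_name, "remote", "remove", "origin"],
        ["git", "-C", project_name, "remote", "add", "origin", new_remote_url],
        ["git", "-C", project_name, "push", "-u", "origin", branch],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing git command
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder URLs; substitute the paths to your own repositories
    run_all(template_clone_commands(
        "<path-to-repo>/template.git", "<project-name>", "<path-to-new-repo>/project.git"
    ))
```

Call run_all(..., dry_run=False) once the printed commands look right. The new remote repository has to exist before the push will succeed.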

Our next post will dive deeper into version control: why you need it, how to use it, useful commands, and repository hosts.

