Originally posted on towardsdatascience.
Start the new year with this inspired list of interesting code repositories: libraries, roadmaps, and projects to bookmark
New tools that improve developer productivity are released every single day in the ever-evolving domain that is Data Science. There is a sea of great projects to learn from, and even better ones get added every day, whichever machine learning, deep learning, or even MLOps topic you look into.
With that in mind, I wanted to start this New Year with a bang by curating a bunch of noteworthy and inspirational code repos that I feel should help you discover new variations and interesting advancements across many topics in Data Science.
For ease of reading, I have split these repos into the topics I believe they fit best.
Let’s get going! 👇
Please note that this is NOT an exhaustive list of my favourite GitHub repos. It is just a list of projects that I feel deserve more awareness.
Modern Classic Libraries
- Streamlit — A fairly new library that has taken the world of frontend by storm by providing an easy-to-use, Pythonic way of building frontends for ML apps. A well-deserved No. 1 spot in my opinion.
- Optuna — Widely used in Kaggle notebooks and competitions, Optuna is an automatic hyperparameter optimization software framework, particularly designed for machine learning tasks.
- Poetry — Frankly, this is one of the best ways to start any project on your machine. It handles dependencies, package building, as well as your virtual environments.
- SHAP — A friendly, game-theoretic way to explain the output of any ML model.
- Weights and Biases — A library to help track and visualize all parts of an ML pipeline, in development and in production.
- SQLAlchemy — It brings all the power and flexibility of SQL to Python code through its ORM (if you aren’t already using Django 😉 ).
Tools you really need to know about today
- Arrow — A friendlier, more convenient way to work with dates and times in your Python code.
- Icecream — You will fall in love with print-style debugging thanks to this simple and addictive tool for writing better, more informative debug output.
- Pyenv — In my humble opinion, this is the best way to install and manage different versions of Python across your machine and projects without royally messing up your global environment. Fun fact: this was the first thing I installed on my new computer.
- Dotenv — Not enough people know about this library, which is quite a travesty. It provides an extremely easy way to use secrets or environment variables in your Python code.
- Knock-Knock — A convenient way to notify yourself or your teammates via Email, Slack, Discord, etc. if and when your code finishes execution.
- Imbalanced-learn — A collection of resampling techniques for imbalanced datasets, ready for you to reuse in any ML/DL project.
- AutoViz — An up-and-coming tool for one-click visualization of any dataset.
- NeuralProphet — A newer and more advanced time series forecasting library, inspired by FB Prophet and built on PyTorch.
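To make the Dotenv idea above a little more concrete, here is a minimal sketch of what such a tool does for you, written with the standard library only so it runs anywhere. It is not the library's implementation: `load_env_text` and the sample keys are hypothetical names made up for illustration, and the parsing is deliberately simplistic. With the real python-dotenv package you would put the same `KEY=VALUE` lines in a `.env` file and call `load_dotenv()` instead.

```python
import os


def load_env_text(text, environ=None, override=False):
    """Copy KEY=VALUE pairs from text into an environment mapping.

    Hypothetical helper for illustration only; real .env parsers
    handle many more cases (escapes, multi-line values, expansion).
    """
    environ = os.environ if environ is None else environ
    loaded = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and simple surrounding quotes.
        value = value.strip().strip("'\"")
        # By default, values that are already set win over the file.
        if override or key not in environ:
            environ[key] = value
        loaded[key] = environ[key]
    return loaded


# Made-up sample content, standing in for a .env file.
sample = """
# secrets for a demo project
API_KEY=abc123
DB_URL="sqlite:///demo.db"
"""

env = {}  # use a plain dict here so the demo leaves os.environ alone
load_env_text(sample, environ=env)
print(env["API_KEY"], env["DB_URL"])
```

The point of the pattern is that secrets stay out of your source code: the code only asks the environment for `API_KEY`, and the file that supplies it can be kept out of version control.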
Data Science Roadmaps
Some great repositories dedicated to taking you from good to great across the different fields of Data Science.
- Machine Learning for Software Engineers — A month-wise study plan for going from a software engineer to an ML engineer.
- 100 Days of ML — A visually appealing way to master ML topics from the ground up in 100 days.
- Data Engineer Roadmap — For all my readers who are data engineering enthusiasts, here is a complete A-to-Z roadmap to help you learn everything you need in a structured manner.
Tech Interview Preparation
- ML Interviews Book — The best open source book to help you prepare for interviews for ML, Data Science, Data Engineer, Data Analyst or MLOps roles.
- Tech Interview Handbook — This is hands down the best free software engineering interview guide I’ve come across.
- Cracking the Data Science Interview — This is a great collection of cheatsheets, books, and questions that you might need.
- Applied ML — A production-focused collection of work that companies have shared on data science & machine learning in production, gathered for easy consumption in this concise repository.
- Netron — A useful tool to help visualize any simple or complex deep learning model architecture, particularly helpful when presenting a model in a paper.
- Cheatsheets AI — A great way to learn important functions of any tool or library is by using conveniently made cheatsheets. Well, this repo has them in abundance. Use them well!
- MLflow — Another fantastic MLOps tool in direct competition with Weights and Biases to help you track and manage your ML project lifecycles.
- ML complete — A comprehensive store of 30+ Jupyter notebooks on most topics of machine learning.
Bonus — No. 27
Data Another Day — My very own Data Science repository containing the code and articles for every project that I make and fresh libraries that I explore and write about!
A few parting words…
Thank you for reading.
I hope this curated list of repositories is helpful in whatever Data Science venture you choose to undertake next. In a couple of months or so, I may come back with a newer, updated list for you. Until then, happy learning! 🙂