Originally posted on Medium
This article is the second in a series on how to scrape Yahoo Finance for stock price history data using the Python programming language. In this lesson, you will learn about open-source repositories, the security risks of external packages, and how to install external Python libraries using the Anaconda Prompt command-line interface. You can read the other articles in the series here.
It is likely that any problem you try to solve already has a solution. You need to find it. We are lucky today in the age of the internet because we can easily search for indexed answers in Google, DuckDuckGo, or your favourite search engine.
Always look for the answer online first before you waste hours trying to invent a solution on your own. Except for learning and skill-building purposes, reinventing the wheel is not the most efficient use of your time. When you are first learning to code, take baby steps and avoid getting caught up in all the details you don’t know.
Take care before downloading code from open-source repositories
Besides GitHub, you can find incredible solutions for almost any task you are attempting to automate on PyPI.org, the Python Package Index, a public repository of Python software. While you can find great code to use in your projects from these sources, you must take caution before downloading code from these sites.
Occasionally, these packages contain malicious code, as was the case for a library named python3-dateutil, a name chosen to resemble the popular dateutil library (published on PyPI as python-dateutil). Because anyone can upload a package, open-source repositories are a hacker’s paradise. You can read more about this here.
To protect yourself, only download reputable code or code you can audit yourself for security risks. When in doubt, don’t download the code if you can’t translate the black box into a set of easily readable steps and instructions.
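One simple habit is to compare a package name against the names you already trust before installing it. The sketch below is only an illustration of that idea, not a real security tool: the TRUSTED_PACKAGES list and the check_package_name function are hypothetical names made up for this example, and the 0.8 similarity cutoff is an arbitrary assumption.

```python
import difflib

# Hypothetical allowlist: the packages this series installs, plus the
# PyPI name of the dateutil library.
TRUSTED_PACKAGES = ["requests", "pandas", "lxml", "python-dateutil"]


def check_package_name(name, trusted=TRUSTED_PACKAGES):
    """Classify a package name as trusted, a possible look-alike, or unknown."""
    if name in trusted:
        return "trusted"
    # get_close_matches returns the trusted names most similar to `name`.
    close = difflib.get_close_matches(name, trusted, n=1, cutoff=0.8)
    if close:
        return f"suspicious: looks like {close[0]}"
    return "unknown: research before installing"


print(check_package_name("pandas"))
print(check_package_name("python3-dateutil"))
```

A name that is almost, but not exactly, the same as a trusted package is a good reason to stop and double-check the spelling before you install anything.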
Understand the purpose of packages before you download
In part one, we went over how to format dates to input as arguments into a URL. To do this, we imported the datetime and time libraries. Both ship with Python, so no installation was needed, and both provide tools for creating and formatting date and time values.
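As a quick refresher, here is a rough sketch of turning a calendar date into a number that can be placed in a URL. Part one is not reproduced here, so treat this as an assumption about the general approach: the to_unix_seconds function, the example.com address, and the start and end parameter names are all placeholders.

```python
from datetime import datetime, timezone
import time


def to_unix_seconds(year, month, day):
    """Return the Unix timestamp (whole seconds) for midnight UTC on a date."""
    moment = datetime(year, month, day, tzinfo=timezone.utc)
    return int(moment.timestamp())


start = to_unix_seconds(2020, 1, 1)
end = int(time.time())  # the current moment, in whole seconds

# Hypothetical URL template: the host, path, and parameter names are placeholders.
url = f"https://example.com/history?start={start}&end={end}"

print(start)  # 1577836800
print(url)
```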
Today, we will go over how to install additional external libraries, which you will use to build a stock scraper to pull stock price history data from Yahoo Finance.
To scrape stock price history data from Yahoo Finance, you will need to install the following external packages:
- requests: An HTTP library for Python, which allows you to make requests to specified URLs and receive a response from the URL. Learn more about HTTP requests here.
- pandas: A data analysis library for Python that can read HTML tables and turn them into a list of data frames. Read more about the HTML parsing functionality here.
- lxml: A library that parses XML and HTML into element trees, which are nested structures that mirror the tags on a webpage. pandas uses lxml behind the scenes to read HTML tables. Find a more in-depth explanation here.
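To see how pandas and lxml fit together, here is a minimal sketch that reads a tiny HTML table from a string. The table and its values are made up for illustration, and first_table is a hypothetical helper name. In a real scraper, the HTML text would come from a response fetched with requests. The code needs pandas and lxml, so the helper returns None if they are not installed yet.

```python
from io import StringIO

# Made-up sample data, standing in for HTML downloaded from a website.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>2020-01-02</td><td>100.5</td></tr>
  <tr><td>2020-01-03</td><td>101.25</td></tr>
</table>
"""


def first_table(html):
    """Return the first HTML table as a DataFrame, or None if pandas or its parser is missing."""
    try:
        import pandas as pd
        # read_html returns a list with one DataFrame per table it finds.
        tables = pd.read_html(StringIO(html))
    except ImportError:
        return None
    return tables[0]


frame = first_table(SAMPLE_HTML)
if frame is None:
    print("pandas and lxml are not installed yet")
else:
    print(frame)
```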
Defining libraries, packages, modules, methods, and functions
It may be helpful to have a basic understanding of the following vocabulary before continuing with the next lessons:
- Libraries: A collection of functions and methods to perform a group of tasks like data analysis or data visualization.
- Packages: Directories of related modules, usually marked by an __init__.py file, that can be distributed via a software repository such as PyPI.
- Modules: Python files, with the extension .py, that comprise Python packages and libraries and contain methods and functions.
- Methods: Functions defined inside a class that are called on objects of that class.
- Functions: A reusable set of commands that performs a defined task and is defined outside of a class. A function definition begins with the def keyword, for example def function_name(*args, **kwargs).
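The terms above can be easier to tell apart in a short example. This is only an illustrative sketch: percent_change and PriceHistory are hypothetical names invented for this example, and math is a module from Python's standard library.

```python
import math  # `math` is a module that ships with Python


def percent_change(old, new):
    """A function: defined with `def` outside of any class."""
    return (new - old) / old * 100


class PriceHistory:
    """A class that bundles data with the methods that act on it."""

    def __init__(self, prices):
        self.prices = list(prices)

    def highest(self):
        """A method: a function that is called on a PriceHistory object."""
        return max(self.prices)


history = PriceHistory([10.0, 12.5, 11.0])

print(math.sqrt(16))                # a function from a module: 4.0
print(percent_change(10.0, 12.5))   # a plain function: 25.0
print(history.highest())            # a method on an object: 12.5
```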
You can learn more about these terms on the official Python documentation website, docs.python.org. If you are not yet familiar with object-oriented programming, you will learn more about object classes in subsequent lessons.
The trick when learning something as complex and vast as programming is to not become overwhelmed with all the information you do not know. Take things slow and understand that everything will start to click over time. You can do this!
Install Python packages with the Anaconda Prompt command-line interface
You can download and install these packages easily with Anaconda Prompt and the pip command. Your terminal window should look like the picture below. Type pip install requests pandas lxml and press Enter on your keyboard. pip will then install the packages one by one on your computer.
You can also install additional command-line interfaces like Git and Bash. With so many options, new programmers can quickly become confused. For today, the vital thing to remember is that the Anaconda Prompt will be your primary tool for installing external Python packages.
pip is the Python package installer. It comes pre-installed with Anaconda and can install packages hosted on PyPI, the repository mentioned earlier. Although it shouldn’t be necessary, if you have problems using pip to install these libraries, you can learn more about installing open-source packages here.
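Once pip finishes, you may want to confirm that the packages actually landed on your computer. The sketch below is one possible way to check, assuming Python 3.8 or newer, where the importlib.metadata module is available. The installed_version function and the REQUIRED list are hypothetical names chosen for this example.

```python
from importlib import metadata

# The three packages this lesson installs.
REQUIRED = ["requests", "pandas", "lxml"]


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


report = {name: installed_version(name) for name in REQUIRED}

for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If any line reports "not installed", run the pip install command again in Anaconda Prompt and read the output for error messages.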
Troubleshooting package install errors in Python
Every once in a while, you will run across an error when installing external packages. If an install or your code throws an error, copy and paste the error message into Google and see what other coders have done to solve the problem. The answer will most likely be on Stack Overflow, a question-and-answer site for programmers.
One of the essential skills you will learn as a programmer is how to solve unfamiliar problems on your own, without calling a friend for a lifeline. That means you need to learn how to search effectively for answers online.
Even more importantly, you need to learn how to process and simplify unfamiliar concepts. Whether you have six months or ten years of experience, being able to teach yourself to solve complex problems with limited support is an essential quality of a successful coder.