Top Python libraries 2020


Python has the following three characteristics:
·  Ease of use and flexibility
·  High industry-wide acceptance: Python is arguably the most popular language for data science in industry
·  A large number of libraries for data science
In fact, there are so many Python libraries that it is hard to keep up with their pace of development. This article therefore introduces 24 Python libraries covering the end-to-end data science life cycle.
It covers libraries for data cleaning, data manipulation, visualization, model building, and even model deployment (among other uses). This is a fairly comprehensive list to help you start your data science journey with Python.
Top Python libraries for different data science tasks 2020
Top Python library for data collection 2020
·  Beautiful Soup
·  Scrapy
·  Selenium
Top Python library for data cleaning and data manipulation 2020
·  Pandas
·  PyOD
·  NumPy
·  Spacy
Top Python library for data visualization 2020
·  Matplotlib
·  Seaborn
·  Bokeh
Top Python libraries for modeling 2020
·  Scikit-learn
·  TensorFlow
·  PyTorch
Top Python library for model interpretation 2020
·  Lime
·  H2O
Top Python library for speech processing 2020
·  Librosa
·  Madmom
·  PyAudioAnalysis
Top Python library for image processing 2020
·  OpenCV-Python
·  Scikit-Image
·  Pillow
Top Python libraries for databases 2020
·  Psycopg
·  SQLAlchemy
Top Python library for model deployment 2020
·  Flask
Top Python library for data collection 2020
Have you ever been unable to solve a problem because you lacked the data? This is a perennial issue in data science, which is why learning to extract and collect data is such an important skill for data scientists. It opens up avenues that were not available before.
Here are three Python libraries for extracting and collecting data:
Beautiful Soup
Portal: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
One of the best ways to collect data is to scrape websites (ethically and legally, of course!). Doing this by hand takes a lot of labor and time, so Beautiful Soup is a real lifesaver.
Beautiful Soup is an HTML and XML parser that creates a parse tree for the parsed page and is used to extract data from web pages. The process of extracting data from a web page is called web scraping.
Use the following code to install BeautifulSoup:
pip install beautifulsoup4
Here is a simple Beautiful Soup snippet that extracts the link target of every anchor tag in an HTML page:
#!/usr/bin/python3
# Anchor extraction from html document
from bs4 import BeautifulSoup
from urllib.request import urlopen

with urlopen('LINK') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        print(anchor.get('href', '/'))
I recommend reading the following article to learn how to use Beautiful Soup in Python:
Beginner's Guide: Using BeautifulSoup for Web Crawling in Python Portal: https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/
Scrapy
Portal: https://docs.scrapy.org/en/latest/intro/tutorial.html
Scrapy is another Python library that can be used effectively for web scraping. It is an open-source, collaborative framework for extracting the data you need from websites. It is quick and easy to use.
Here is the code for installing Scrapy:
pip install scrapy
Scrapy is a framework for large-scale web scraping. It provides all the tools you need to efficiently extract data from websites, process it as needed, and store it in the structure and format you prefer.
Here is a simple spider implemented with Scrapy:
import scrapy


class Spider(scrapy.Spider):
    name = 'NAME'
    start_urls = ['LINK']

    def parse(self, response):
        # Yield the title of every post on the current page
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').get()}

        # Follow the pagination link and parse the next page the same way
        for next_page in response.css('a.next-posts-link'):
            yield response.follow(next_page, self.parse)
Here is a great tutorial to learn Scrapy and implement Scrapy in Python:
"Scrapy web scraping in Python (with multiple examples)" portal: https://www.analyticsvidhya.com/blog/2017/07/web-scraping-in-python-using-scrapy/
Selenium
Portal: https://www.seleniumhq.org/
Selenium is a popular browser automation tool. It is widely used for testing in the industry, but it is also very handy for web scraping.
It is easy to write a Python script that automates a web browser with Selenium. This lets you extract data efficiently and store it in your preferred format for later use.
Here is an article about scraping YouTube video data using Python and Selenium:
Data Science Project: Grab YouTube Data Using Python and Selenium to Classify Videos Portal: https://www.analyticsvidhya.com/blog/2019/05/scraping-classifying-youtube-video-data-python-selenium/
Best Python library for data cleaning and data manipulation 2020
Once the data is collected, it is time to clean up any messy data you may face and learn how to manipulate it so that it is ready for modeling.
Here are four Python libraries that can be used for data cleaning and data manipulation. Keep in mind that in the real world you have to deal with both structured (numeric) data and unstructured (text) data, and this list of libraries covers both.
Pandas
Portal: https://pandas.pydata.org/pandas-docs/stable/
When it comes to data manipulation and data analysis, Pandas is hard to beat. It is one of the most popular Python libraries and is mainly used for data manipulation and data analysis.
The name comes from the term "panel data", an econometric term for a data set that contains observations of the same individuals over multiple time periods.
Pandas comes pre-installed with Anaconda. If you are using a plain Python installation, install it as follows:
pip install pandas
Pandas has the following characteristics:
·  Merging and joining datasets
·  Inserting and deleting columns in data structures
·  Filtering data
·  Reshaping datasets
·  Manipulating data with DataFrame objects, etc.
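The features listed above can be sketched in a few lines. This is a minimal illustration rather than a recipe: the `orders` and `customers` tables, their column names, and the 1.1 tax factor are all made up for the example.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, 40.0, 15.0, 60.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["Pune", "Delhi", "Pune"],
})

# Merge (join) the two datasets on their shared key.
merged = orders.merge(customers, on="customer_id", how="left")

# Insert a derived column, then delete one that is no longer needed.
merged["amount_with_tax"] = merged["amount"] * 1.1
merged = merged.drop(columns=["order_id"])

# Filter rows with a boolean mask.
large = merged[merged["amount"] > 20]

# Reshape: total amount per city, with cities as the index.
per_city = merged.pivot_table(index="city", values="amount", aggfunc="sum")

print(large)
print(per_city)
```

Each step returns a new DataFrame, so the intermediate results can be inspected or chained as needed.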
Here is an article and a great cheat sheet that will help bring your Pandas skills up to the mark:
"12 Useful Pandas Techniques for Data Manipulation in Python" Portal: https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/
CheatSheet: Data Exploration with Pandas in Python Portal: https://www.analyticsvidhya.com/blog/2015/07/11-steps-perform-data-analysis-pandas-python/
PyOD
Portal: https://pyod.readthedocs.io/en/latest/
Struggling to find outliers? You are by no means alone. Don't worry, the PyOD library is here to help.
PyOD is a comprehensive, scalable Python toolkit for detecting outlying objects. Outlier detection basically means identifying rare items or observations that differ significantly from the majority of the data.
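To make the idea of "differing significantly from the majority" concrete, here is a minimal sketch that uses plain NumPy rather than PyOD itself. It flags points by their modified z-score (based on the median and the median absolute deviation); the function name, the sample readings, and the conventional 3.5 threshold are assumptions made for the illustration, and PyOD's own detectors are considerably more sophisticated.

```python
import numpy as np


def mad_outliers(values, threshold=3.5):
    """Flag points whose modified z-score exceeds `threshold` (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = np.median(np.abs(values - median))
    if mad == 0:
        return np.zeros(values.shape, dtype=bool)
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold


# Made-up sensor readings with one obviously unusual value.
readings = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2])
mask = mad_outliers(readings)
print(readings[mask])
```

Using the median instead of the mean keeps the single extreme value from distorting the baseline it is compared against.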
The following command installs PyOD:
pip install pyod
How does PyOD work, and how do you use it? The following guide answers these questions:
"A great tutorial for learning to use PyOD library to detect outliers in Python" portal: https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/
NumPy
Portal: https://www.numpy.org/
Like Pandas, NumPy is a very popular Python library. It adds support for large multi-dimensional arrays and matrices, along with high-level mathematical functions that operate on them.
NumPy is an open-source library with many contributors. It comes pre-installed with Anaconda. If you are using a plain Python installation, install it as follows:
$ pip install numpy
Here are some basic functions that can be performed using NumPy:
Creating arrays
import numpy as np

x = np.array([1, 2, 3])
print(x)
y = np.arange(10)
print(y)
Output:
[1 2 3]
[0 1 2 3 4 5 6 7 8 9]
Basic operations
a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4)
c = a - b
print(c)
print(a**2)
Output:
[1.         1.33333333 1.66666667 4.        ]
[ 1  4  9 36]
And much more!
SpaCy
Portal: https://spacy.io/
So far we have discussed how to clean and work with numerical data. But what if you are dealing with text data? The libraries covered so far cannot handle that.
spaCy is a very useful and flexible natural language processing (NLP) library and framework for cleaning text documents before building models. It is also fast compared with other libraries used for similar purposes.
Install spaCy on Linux:
pip install -U spacy
python -m spacy download en_core_web_sm
To install Spacy on other operating systems, please click: https://spacy.io/usage
Here is a tutorial for learning spaCy:
Simplified Natural Language Processing-Using SpaCy (in Python) portal: https://www.analyticsvidhya.com/blog/2017/04/natural-language-processing-made-easy-using-spacy-%e2%80%8bin-python/
Best Python library for data visualization 2020
What's next? Data visualization! This is where hypotheses are checked and hidden insights and patterns are discovered.
Here are three great Python libraries for data visualization.
Matplotlib
Portal: https://matplotlib.org/
Matplotlib is the most popular data visualization library in Python. It lets you generate and build a wide variety of plots. It is the author's preferred library and can be used together with Seaborn for exploratory visualization.
Here is the command to install Matplotlib:
$ pip install matplotlib
Here are some examples of different types of diagrams built using Matplotlib:
Histogram
# The next line is a Jupyter notebook magic; omit it in a plain script
%matplotlib inline
import matplotlib.pyplot as plt
from numpy.random import normal

x = normal(size=100)
plt.hist(x, bins=20)
plt.show()
3D chart
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')
X = np.arange(-10, 10, 0.1)
Y = np.arange(-10, 10, 0.1)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()
Now that Pandas, NumPy, and Matplotlib have been introduced, check out the following tutorial, which combines all three libraries:
The Ultimate Guide to Data Exploration in Python Using NumPy, Matplotlib, and Pandas Portal: https://www.analyticsvidhya.com/blog/2015/04/comprehensive-guide-data-exploration-sas-using-python-numpy-scipy-matplotlib-pandas/
Seaborn
Portal: https://seaborn.pydata.org/
Seaborn is another plotting library, built on top of Matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Anything Seaborn draws can also be drawn with Matplotlib; Seaborn simply does it in a more visually appealing way.
Some features of Seaborn:
·  A dataset-oriented API for examining relationships between multiple variables
·  Convenient views onto the overall structure of complex datasets
·  Tools for choosing color palettes that reveal patterns in the data
The following line of code can be used to install Seaborn:
pip install seaborn
Browse these cool charts to see what seaborn can do:
import seaborn as sns

sns.set()
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", col="time",
            hue="smoker", style="smoker", size="size",
            data=tips)
Here is another example:
import seaborn as sns

sns.catplot(x="day", y="total_bill", hue="smoker",
            kind="violin", split=True, data=tips)
Bokeh
Portal: https://bokeh.pydata.org/en/latest/
Bokeh is an interactive visualization library for modern web browsers. It provides elegant construction of general-purpose graphics and copes well with large datasets.
Bokeh can be used to create interactive plots, dashboards, and data applications.
Installation:
pip install bokeh
Learn more about Bokeh and its practical applications:
Interactive data visualization using Bokeh (in Python) portal: https://www.analyticsvidhya.com/blog/2015/08/interactive-data-visualization-library-python-bokeh/
Best Python libraries for modeling 2020
Now comes the most anticipated part of this article: modeling! This is what draws most people to data science in the first place.
Let's explore modeling with these three Python libraries.
Scikit-learn
Portal: https://scikit-learn.org/stable/
Just as Pandas is the go-to library for data manipulation and Matplotlib for visualization, scikit-learn is the leader when it comes to building models in Python. Few libraries can match it.
In fact, scikit-learn is built on top of NumPy, SciPy, and Matplotlib. It is open source, accessible to everyone, and reusable in various contexts.
Scikit-learn supports the common operations in machine learning, such as classification, regression, clustering, and model selection. You name it, and scikit-learn has a module for it.
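As a minimal sketch of what a classification workflow looks like, the example below trains a model on the Iris dataset that ships with scikit-learn. The choice of logistic regression, the 25% test split, and the random seed are assumptions made for the illustration, not recommendations from the article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to evaluate the model on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a classifier; every scikit-learn estimator follows the same fit/predict pattern.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Because every estimator exposes the same `fit` and `predict` methods, swapping in a different model usually means changing only the line that constructs it.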
It is recommended to browse the following link to learn more about scikit-learn:
"Scikit-learn in Python-the most important machine learning tool I learned last year!" Portal: https://www.analyticsvidhya.com/blog/2015/01/scikit-learn-python-machine-learning-tool/
TensorFlow
Portal: https://www.tensorflow.org/
Developed by Google, TensorFlow is a popular deep learning library for building and training models. It is an open-source, end-to-end platform that offers easy model building, robust machine learning in production, and powerful tools and libraries for experimentation.
TensorFlow offers multiple levels of abstraction, so you can choose the one that fits your needs. Models are built and trained with the high-level Keras API, which makes getting started with TensorFlow and machine learning easy.
Install portal: https://www.tensorflow.org/install
To get started with TensorFlow, read these articles:
TensorFlow 101: Understanding Tensors and Graphs to Start Deep Learning Portal: https://www.analyticsvidhya.com/blog/2017/03/tensorflow-understanding-tensors-and-graphs/
"Starting deep learning in R with Keras and TensorFlow" portal:
https://www.analyticsvidhya.com/blog/2017/06/getting-started-with-deep-learning-using-keras-in-r/
PyTorch
Portal: https://pytorch.org/
What is PyTorch? It is a Python-based scientific computing package that serves two purposes:
·  A replacement for NumPy that uses the power of GPUs
·  A deep learning research platform that offers maximum flexibility and speed
Installation Guide Portal: https://pytorch.org/get-started/locally/
PyTorch provides the following features:
·  Hybrid front end
·  Tools and libraries: an active community of researchers and developers has built a rich ecosystem of tools and libraries that extend PyTorch and support development in areas ranging from computer vision to reinforcement learning
·  Cloud support: PyTorch is well supported on the major cloud platforms, offering frictionless development and easy scaling through prebuilt images, large-scale training on GPUs, and the ability to run models in a production-scale environment
Here are two very detailed and easy-to-understand articles on PyTorch:
"PyTorch Introduction-A Simple but Powerful Deep Learning Library" Portal: https://www.analyticsvidhya.com/blog/2018/02/pytorch-tutorial/
"Getting Started with PyTorch-Learning How to Build Fast and Accurate Neural Networks (Taking 4 Case Studies as an Example)" Portal: https://www.analyticsvidhya.com/blog/2019/01/guide-pytorch-neural-networks-case-studies/
Best Python library for model interpretation 2020
Do you really understand how your model works? Can you explain why it produces the results it does? These are questions that every data scientist should be able to answer. A black-box model that nobody can explain is of little use in industry.
The following two Python libraries can help explain how a model behaves.
LIME
Portal: https://github.com/marcotcr/lime
LIME is an algorithm (and library) that can explain the predictions of any classifier or regressor. How does LIME do it? It approximates the model locally, around the prediction being explained, with a simple interpretable model, and that local model serves as the explanation.
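The local-approximation idea can be sketched with plain NumPy. This is a simplified illustration of the principle, not the `lime` package's API or its exact algorithm: the `black_box` function, the Gaussian perturbation and kernel, and all parameter values are assumptions chosen for the example.

```python
import numpy as np


def black_box(X):
    """Hypothetical opaque model: a nonlinear function of two features."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2


def explain_locally(predict, instance, n_samples=2000, scale=0.1, seed=0):
    """Fit a proximity-weighted linear model around `instance` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with small Gaussian noise.
    samples = instance + rng.normal(0.0, scale, size=(n_samples, instance.size))
    # 2. Query the black-box model on the perturbed samples.
    targets = predict(samples)
    # 3. Weight each sample by its closeness to the instance (Gaussian kernel).
    distances = np.linalg.norm(samples - instance, axis=1)
    weights = np.exp(-(distances ** 2) / (2 * scale ** 2))
    # 4. Weighted least squares: scale rows by sqrt(weight), then solve.
    design = np.hstack([np.ones((n_samples, 1)), samples - instance])
    root_w = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(design * root_w, targets * root_w[:, 0], rcond=None)
    return coef[1:]  # one local slope per feature


instance = np.array([0.0, 1.0])
slopes = explain_locally(black_box, instance)
print(slopes)
```

The returned slopes say how strongly each feature drives the prediction near this particular instance, which is the kind of per-prediction explanation LIME aims to provide.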
Installing LIME is simple:
pip install lime
The following article will help you build intuition for LIME and for model interpretability in general:
"Building trust in machine learning models (using LIME in Python)" portal: https://www.analyticsvidhya.com/blog/2017/06/building-trust-in-machine-learning-models/
H2O
Portal: https://github.com/h2oai/mli-resources
Many people have probably heard of H2O.ai, a market leader in automated machine learning. But did you know that it also has a model interpretability library in Python?
H2O Driverless AI provides simple data visualization techniques for representing high-degree feature interactions and nonlinear model behavior, and offers machine learning interpretability (MLI) that uses visualizations to explain modeling results and the effect of features in a model.
Read the following to learn more about how H2O Driverless AI performs MLI.
Machine Learning Interpretability Portal: https://www.h2o.ai/wp-content/uploads/2018/01/Machine-Learning-Interpretability-MLI_datasheet_v4-1.pdf
Best Python library for audio processing 2020
Audio processing, or audio analysis, refers to extracting information and meaning from audio signals for analysis, classification, or any other task. It is becoming a popular application area for deep learning, so keep an eye on it.
LibROSA
Portal: https://librosa.github.io/librosa/
LibROSA is a Python library for music and audio analysis. It provides the building blocks needed to create a music information retrieval system.
Installation Guide Portal: https://librosa.github.io/librosa/install.html
This is an in-depth article on audio processing and how it works:
"Using Deep Learning to Start Audio Data Analysis (with Case Study)" Portal: https://www.analyticsvidhya.com/blog/2017/08/audio-voice-processing-deep-learning/
Madmom
Portal: https://github.com/CPJKU/madmom
Madmom is a great Python library for audio data analysis. It is an audio signal processing library written in Python and is mainly used for music information retrieval (MIR) tasks.
The following are the prerequisites for installing Madmom:
·  NumPy
·  SciPy
·  Cython
·  Mido
The following packages are needed to test the installation:
·  PyTest
·  PyAudio
·  PyFFTW
Code to install Madmom:
pip install madmom
The following can be used to understand how Madmom is used for music information retrieval:
Audio Beat Tracking for Learning Music Information Retrieval (using Python code) Portal: https://www.analyticsvidhya.com/blog/2018/02/audio-beat-tracking-for-music-information-retrieval/
pyAudioAnalysis
Portal: https://github.com/tyiannak/pyAudioAnalysis
pyAudioAnalysis is a Python library for audio feature extraction, classification, and segmentation, covering a wide range of audio analysis tasks, such as:
·  Classifying unknown sounds
·  Detecting audio events and removing long periods of silence from recordings
·  Supervised and unsupervised segmentation
·  Extracting audio thumbnails, etc.
You can use the following code to install:
pip install pyAudioAnalysis
Best Python library for image processing 2020
If you want to be successful in the data science industry, you must learn how to use image data. As systems are able to collect more and more data (mainly thanks to advances in computing resources), image processing is becoming more and more ubiquitous.
Therefore, make sure you are familiar with at least one of the following three Python libraries.
OpenCV-Python
Portal: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_setup/py_intro/py_intro.html
When it comes to image processing, OpenCV is the first name that comes to mind. OpenCV-Python is the Python API for OpenCV, combining the best qualities of the OpenCV C++ API with the Python language. It is mainly used to solve computer vision problems.
OpenCV-Python makes use of NumPy, mentioned above. All OpenCV array structures are converted to and from NumPy arrays, which also makes it easier to integrate with other libraries that use NumPy, such as SciPy and Matplotlib.
Install OpenCV-Python on your system:
pip3 install opencv-python
Here are two popular tutorials on how to use OpenCV in Python:
"Deep learning-based video face detection model establishment (Python implementation)" portal: https://www.analyticsvidhya.com/blog/2018/12/introduction-face-detection-video-deep-learning-python/
"16 OpenCV functions start computer vision journey (using Python code)" portal: https://www.analyticsvidhya.com/blog/2019/03/opencv-functions-computer-vision-python/
Scikit-image
Portal: https://scikit-image.org/
Scikit-image is another Python library for image processing. It is a collection of algorithms for performing many different image processing tasks. It can be used for image segmentation, geometric transformation, color space manipulation, analysis, filtering, morphology, feature detection, and more.
Before installing scikit-image, please install the following packages:
·  Python (>= 3.5)
·  NumPy (>= 1.11.0)
·  SciPy (>= 0.17.0)
·  Joblib (>= 0.11)
This is how to install scikit-image on the machine:
pip install -U scikit-image
Pillow
Portal: https://pillow.readthedocs.io/en/stable/
Pillow is a fork of PIL (the Python Imaging Library) that has effectively become its successor. It has replaced the original PIL in some Linux distributions, such as Ubuntu.
Pillow provides several standard procedures for performing image processing:
·  Per-pixel operations
·  Masking and transparency handling
·  Image filtering, such as blurring, contouring, smoothing, or edge detection
·  Image enhancement, such as sharpening or adjusting brightness, contrast, or color
·  Adding text to images, etc.
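The operations listed above can be sketched as follows. The example builds a small image in memory so that it needs no input file; the image size, colors, and parameter values are arbitrary choices made for the illustration.

```python
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter

# Create a small in-memory RGB image so the example needs no input file.
image = Image.new("RGB", (120, 80), color=(200, 60, 60))

# Per-pixel operation: halve every channel value.
darker = image.point(lambda value: value // 2)

# Transparency: add an alpha channel at roughly 50% opacity.
translucent = image.copy()
translucent.putalpha(128)

# Filtering: apply a Gaussian blur.
blurred = image.filter(ImageFilter.GaussianBlur(radius=2))

# Enhancement: increase the contrast by 50 percent.
contrasted = ImageEnhance.Contrast(image).enhance(1.5)

# Drawing: add text using Pillow's built-in default font.
annotated = image.copy()
ImageDraw.Draw(annotated).text((10, 10), "Hello", fill=(255, 255, 255))

print(darker.getpixel((0, 0)))
```

Most of these calls return a new image and leave the original untouched, while `putalpha` and drawing modify the image in place, which is why the example works on copies there.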
Install Pillow:
pip install Pillow  
Check out the following AI comic about using Pillow in computer vision:
AI Comic: ZAIN-Issue 2: Facial Recognition Using Computer Vision Portal:
https://www.analyticsvidhya.com/blog/2019/06/ai-comic-zain-issue-2-facial-recognition-computer-vision/
Python libraries for databases
Learning how to store, access, and retrieve data from a database is an essential skill for data scientists. After all, how can you build a model without first retrieving the data?
Here are two SQL-related Python libraries.
psycopg
Portal: http://initd.org/psycopg/
Psycopg is the most popular adapter for PostgreSQL (an advanced open-source relational database) in the Python programming language. At its core, Psycopg fully implements the Python DB API 2.0 specification.
The current psycopg2 implementation supports:
·  Python version 2.7
·  Python 3 version (3.4 to 3.7)
·  PostgreSQL server version (7.4 to 11)
·  PostgreSQL client library version (9.1 and above)
Here's how to install psycopg2:
pip install psycopg2
SQLAlchemy
Portal: https://www.sqlalchemy.org/
SQL is the most popular database language. SQLAlchemy is a Python SQL toolkit and object-relational mapper that gives application developers the full power and flexibility of SQL.
SQLAlchemy is designed for efficient, high-performance database access. It treats the database as a relational algebra engine, not just a collection of tables.
To install SQLAlchemy, you can use the following line of code:
pip install SQLAlchemy  
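As a minimal sketch of what database access through SQLAlchemy looks like, the example below runs plain SQL against an in-memory SQLite database so that no server is required. The `users` table and its rows are made up for the illustration; a real project would point `create_engine` at its own database URL.

```python
from sqlalchemy import create_engine, text

# An in-memory SQLite database keeps the example self-contained.
engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a transaction and commits it when the block succeeds.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
    # Bound parameters (:name) keep values separate from the SQL text.
    conn.execute(
        text("INSERT INTO users (name) VALUES (:name)"),
        [{"name": "Ada"}, {"name": "Grace"}],
    )
    rows = conn.execute(text("SELECT id, name FROM users ORDER BY id")).fetchall()

names = [row[1] for row in rows]
print(names)
```

The same code works against other databases by changing only the connection URL, which is one of the main reasons to go through SQLAlchemy rather than a driver directly.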
Best Python libraries for deployment 2020
Do you know what model deployment is? Deploying a model means putting the final model into the final application (technically, a production environment).
Flask
Portal: http://flask.pocoo.org/docs/1.0/
Flask is a web framework written in Python that is widely used to deploy data science models. Flask is built on two main components:
·  Werkzeug: a WSGI utility library for Python
·  Jinja: a template engine for Python
Here is an example that returns "Hello World!":
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    return "Hello World!"


if __name__ == "__main__":
    app.run()
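To connect this to model deployment, here is a minimal sketch of a prediction endpoint. The `predict_price` function is a hypothetical stand-in for a trained model, and the `/predict` route and `area_sqm` field are names invented for the example; a real service would load a saved model and validate its input more thoroughly.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(area_sqm):
    """Hypothetical stand-in for a trained model's predict method."""
    return 1000.0 + 50.0 * area_sqm


@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    if "area_sqm" not in payload:
        return jsonify(error="area_sqm is required"), 400
    return jsonify(prediction=predict_price(float(payload["area_sqm"])))


# Exercise the endpoint in-process with Flask's test client (no server needed).
with app.test_client() as client:
    response = client.post("/predict", json={"area_sqm": 20})
    print(response.get_json())
```

In production the app would be served by a WSGI server rather than the test client, but the request and response handling stays the same.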
The following article is a good start to learning Flask:
"Tutorial for deploying machine learning models as APIs in production (using Flask)" portal: https://www.analyticsvidhya.com/blog/2017/09/machine-learning-models-as-apis-using-flask/