Product description: Machine Learning in Python. Recipes. From Data Preparation to Deep Learning, 2nd Edition - Kyle Gallatin, Chris Albon
This second edition contains over 200 proven recipes based on the latest versions of the Python libraries. Each recipe provides ready-to-use code that can be adapted to your needs. The book presents practical examples of working with data in many formats, databases, and data stores, along with many other tips useful across the whole spectrum of problems, from preparing and loading data to training models and using neural networks. The publication is aimed at readers who want to put machine learning algorithms into practice.
A few words about the authors
Kyle Gallatin is a software engineer who works on Etsy’s machine learning platform and has also worked as a data scientist, data analyst, and machine learning engineer.
Dr. Chris Albon has worked as a data scientist and political scientist for many years. He currently works at Devoted Health and was previously the lead data scientist at the Kenyan startup BRCK.
Machine Learning - Recipes
Machine learning is one of the most interesting and dynamically developing areas of information technology. The book covers working with data in many formats, databases, and data stores, and discusses dimensionality reduction techniques as well as methods for evaluating and selecting models. The recipes address topics such as linear and logistic regression, decision trees and random forests, the k-nearest neighbors algorithm, support vector machines (SVM), naive Bayes classification, and clustering.
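By way of illustration only (the snippet below is a generic sketch, not code reproduced from the book, and the dataset and parameters are arbitrary), a recipe in this style typically boils down to a few lines of scikit-learn code: load a sample dataset, train a model, and make a prediction.

```python
# Generic illustration of the recipe style (not taken from the book):
# train a random forest classifier on the built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load a sample dataset
features, target = load_iris(return_X_y=True)

# Train a random forest classifier
classifier = RandomForestClassifier(random_state=0)
model = classifier.fit(features, target)

# Predict the class of a new observation
new_observation = [[5.0, 3.5, 1.3, 0.25]]
print(model.predict(new_observation))
```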
Information about the book
- Original title: Machine Learning with Python Cookbook: Practical Solutions from Preprocessing to Deep Learning, 2nd Edition
- Authors: Kyle Gallatin, Dr. Chris Albon
- Translation: Robert Górczyński
- ISBN: 978-83-289-0811-6
- Release year: 2024
- Size: 165 x 235 mm
- Binding: softcover
- Number of pages: 398
- Publisher: Helion S.A.
Contents
Introduction
1. Vector, matrix and array
- 1.0. Introduction
- 1.1. Creating a Vector
- 1.2. Creating a matrix
- 1.3. Creating a Sparse Matrix
- 1.4. NumPy Array Preallocation
- 1.5. Selecting elements
- 1.6. Describing Matrices
- 1.7. Performing Operations on All Elements
- 1.8. Finding the Maximum and Minimum Values
- 1.9. Calculating Mean, Variance, and Standard Deviation
- 1.10. Reshaping an array
- 1.11. Transposing a Vector or Matrix
- 1.12. Matrix Flattening
- 1.13. Finding the Rank of a Matrix
- 1.14. Getting the diagonal of a matrix
- 1.15. Calculating the trace of a matrix
- 1.16. Calculating the dot product
- 1.17. Adding and Subtracting Matrices
- 1.18. Matrix Multiplication
- 1.19. Inverting a Matrix
- 1.20. Random Number Generation
2. Loading data
- 2.0. Introduction
- 2.1. Loading a Sample Dataset
- 2.2. Creating a Simulated Data Set
- 2.3. Loading a CSV file
- 2.4. Loading an Excel file
- 2.5. Loading a JSON file
- 2.6. Loading a Parquet file
- 2.7. Loading an Avro file
- 2.8. Querying an SQLite Database
- 2.9. Querying a Remote SQL Database
- 2.10. Loading data from Google Sheets
- 2.11. Loading data from S3 bucket
- 2.12. Loading unstructured data
3. Data preparation
- 3.0. Introduction
- 3.1. Creating a Data Frame
- 3.2. Describing Data
- 3.3. Navigating a data frame
- 3.4. Selecting rows based on conditions
- 3.5. Sorting values
- 3.6. Replacing Values
- 3.7. Changing the column name
- 3.8. Finding the Min, Max, Sum, Average, and Count of Items in a Column
- 3.9. Finding Unique Values
- 3.10. Handling Missing Values
- 3.11. Deleting Columns
- 3.12. Deleting a row
- 3.13. Removing duplicate rows
- 3.14. Grouping Rows by Value
- 3.15. Grouping Rows by Time
- 3.16. Aggregating operations and statistics
- 3.17. Iterating Through a Column
- 3.18. Calling a function for all elements of a column
- 3.19. Calling a function for a group
- 3.20. Concatenating DataFrame objects
- 3.21. Combining DataFrame objects
4. Handling numeric data
- 4.0. Introduction
- 4.1. Rescaling a feature
- 4.2. Standardizing a feature
- 4.3. Normalizing observations
- 4.4. Generating Polynomial Features and Interactions
- 4.5. Feature Transformation
- 4.6. Outlier Detection
- 4.7. Handling Outliers
- 4.8. Feature discretization
- 4.9. Grouping observations using clustering
- 4.10. Removing observations with missing values
- 4.11. Filling in missing values
5. Handling categorical data
- 5.0. Introduction
- 5.1. Encoding nominal categorical features
- 5.2. Encoding ordinal categorical features
- 5.3. Encoding dictionaries of features
- 5.4. Filling in missing class values
- 5.5. Handling Unbalanced Classes
6. Handling text
- 6.0. Introduction
- 6.1. Text Cleanup
- 6.2. Processing and cleaning HTML data
- 6.3. Removing punctuation
- 6.4. Text Tokenization
- 6.5. Removing stop words
- 6.6. Word stemming
- 6.7. Part-of-speech tagging
- 6.8. Named Entity Recognition
- 6.9. Text Encoding Using Bag of Words Model
- 6.10. Weighting word importance
- 6.11. Using Text Vectors to Calculate Text Similarity in a Search Query
- 6.12. Using a sentiment analysis classifier
7. Handling dates and times
- 7.0. Introduction
- 7.1. Converting a Text String to a Date
- 7.2. Handling time zones
- 7.3. Getting the date and time
- 7.4. Splitting Date Data into Multiple Features
- 7.5. Calculating the difference between dates
- 7.6. Encoding the day of the week
- 7.7. Creating a lagged feature
- 7.8. Using rolling time windows
- 7.9. Handling missing data in time series
8. Image handling
- 8.0. Introduction
- 8.1. Loading an image
- 8.2. Saving an image
- 8.3. Resizing an image
- 8.4. Cropping an image
- 8.5. Blurring the image
- 8.6. Sharpening an image
- 8.7. Increasing Contrast
- 8.8. Isolating Colors
- 8.9. Image Thresholding
- 8.10. Removing the background of an image
- 8.11. Edge Detection
- 8.12. Detecting corners in an image
- 8.13. Creating features for machine learning
- 8.14. Using a color histogram as a feature
- 8.15. Using pretrained embeddings as features
- 8.16. Object Detection with OpenCV
- 8.17. Classifying Images with PyTorch
9. Dimensionality Reduction Using Feature Extraction
- 9.0. Introduction
- 9.1. Reducing Features Using Principal Components
- 9.2. Reducing Features When Data Are Linearly Inseparable
- 9.3. Reducing features by maximizing class separability
- 9.4. Reducing Features Using Matrix Decomposition
- 9.5. Reducing Features in Sparse Data
10. Dimensionality Reduction Using Feature Selection
- 10.0. Introduction
- 10.1. Thresholding the variance of a numerical feature
- 10.2. Binary Feature Variance Thresholding
- 10.3. Handling Highly Correlated Features
- 10.4. Removing features that are not relevant for classification
- 10.5. Recursive feature elimination
11. Model evaluation
- 11.0. Introduction
- 11.1. Cross-validating models
- 11.2. Creating a Base Regression Model
- 11.3. Creating a Base Classification Model
- 11.4. Evaluating Binary Classifier Predictions
- 11.5. Evaluating binary classifier thresholds
- 11.6. Evaluating the predictions of a multi-class classifier
- 11.7. Classifier Performance Visualization
- 11.8. Evaluating regression models
- 11.9. Evaluating clustering models
- 11.10. Defining custom model evaluation metrics
- 11.11. Visualizing the effect of training set size
- 11.12. Creating a text report of evaluation metrics
- 11.13. Visualizing the effect of hyperparameter values
12. Model selection
- 12.0. Introduction
- 12.1. Selecting the best models using an exhaustive search
- 12.2. Selecting the best models using random search
- 12.3. Selecting the best models from multiple machine learning algorithms
- 12.4. Selecting the best models at the data preparation stage
- 12.5. Accelerating Model Selection with Parallelism
- 12.6. Accelerating Model Selection Using Algorithm-Specific Methods
- 12.7. Performance Evaluation After Model Selection
13. Linear regression
- 13.0. Introduction
- 13.1. Fitting a line
- 13.2. Handling interaction effects
- 13.3. Fitting a nonlinear relationship
- 13.4. Reducing Variance with Regularization
- 13.5. Reducing Features Using LASSO Regression
14. Trees and forests
- 14.0. Introduction
- 14.1. Training a Decision Tree Classifier
- 14.2. Training a Decision Tree Regressor
- 14.3. Decision Tree Model Visualization
- 14.4. Training a Random Forest Classifier
- 14.5. Training a Random Forest Regressor
- 14.6. Evaluating Random Forest with Out-of-Bag Error Estimator
- 14.7. Identifying Important Features in Random Forests
- 14.8. Selecting Important Features in Random Forest
- 14.9. Handling Unbalanced Classes
- 14.10. Controlling the size of the tree
- 14.11. Improving performance with boosting
- 14.12. Training an XGBoost model
- 14.13. Improving real-time performance with LightGBM
15. The k-nearest neighbors algorithm
- 15.0. Introduction
- 15.1. Finding the nearest neighbors of an observation
- 15.2. Creating a k-nearest neighbor classifier
- 15.3. Determining the best neighborhood size
- 15.4. Creating a Radius-Based Nearest Neighbor Classifier
- 15.5. Finding Approximate Nearest Neighbors
- 15.6. Evaluating approximate nearest neighbors
16. Logistic regression
- 16.0. Introduction
- 16.1. Training a Binary Classifier
- 16.2. Training a Multi-Class Classifier
- 16.3. Variance Reduction Through Regularization
- 16.4. Training a Classifier on Very Large Data
- 16.5. Handling Unbalanced Classes
17. Support vector machines
- 17.0. Introduction
- 17.1. Training a Linear Classifier
- 17.2. Handling Linearly Inseparable Classes Using Kernel Functions
- 17.3. Determining predicted probabilities
- 17.4. Identifying support vectors
- 17.5. Handling Unbalanced Classes
18. Naive Bayes Classifier
- 18.0. Introduction
- 18.1. Training a Classifier for Continuous Features
- 18.2. Training a classifier for discrete and count features
- 18.3. Training a Naive Bayes Classifier for Binary Features
- 18.4. Calibrating predicted probabilities
19. Clustering
- 19.0. Introduction
- 19.1. Clustering with k-means
- 19.2. Speeding up k-means clustering
- 19.3. Clustering using the mean shift algorithm
- 19.4. Clustering using the DBSCAN algorithm
- 19.5. Clustering by Hierarchical Linking
20. Tensors in PyTorch
- 20.0. Introduction
- 20.1. Creating a Tensor
- 20.2. Creating a Tensor from NumPy
- 20.3. Creating a sparse tensor
- 20.4. Selecting Tensor Elements
- 20.5. Describing a Tensor
- 20.6. Performing operations on tensor elements
- 20.7. Finding the minimum and maximum values
- 20.8. Changing the shape of a tensor
- 20.9. Tensor Transpose
- 20.10. Tensor Flattening
- 20.11. Calculating the dot product
- 20.12. Tensor Multiplication
21. Neural networks
- 21.0. Introduction
- 21.1. Using the PyTorch Framework's Autograd Engine
- 21.2. Preparing data for a neural network
- 21.3. Designing a Neural Network
- 21.4. Training a Binary Classifier
- 21.5. Training a Multi-Class Classifier
- 21.6. Training a regressor
- 21.7. Making predictions
- 21.8. Visualizing training history
- 21.9. Reducing Overfitting with Weight Regularization
- 21.10. Reducing overfitting with early stopping
- 21.11. Reducing Overfitting Using the Dropout Technique
- 21.12. Saving model training progress
- 21.13. Fine-tuning a neural network
- 21.14. Neural network visualization
22. Neural Networks for Unstructured Data
- 22.0. Introduction
- 22.1. Training a Neural Network for Image Classification
- 22.2. Training a Neural Network for Text Classification
- 22.3. Fine-tuning the Trained Model for Image Classification
- 22.4. Fine-tuning the Trained Model for Text Classification
23. Saving, Loading, and Sharing Trained Models
- 23.0. Introduction
- 23.1. Saving and Loading a Scikit-Learn Model
- 23.2. Saving and Loading a TensorFlow Model
- 23.3. Saving and Loading a PyTorch Model
- 23.4. Sharing scikit-learn models
- 23.5. Sharing TensorFlow Models
- 23.6. Sharing PyTorch models with Seldon