- Type Learning
- Level Intermediate
- Time Days
- Cost Free
bp Data Science Intermediate Bootcamp
Issued by
bp
Earners have successfully completed a hands-on, intensive online training in which they deepened their knowledge of Python and learnt the basics of programmatically extracting data from the web. They learnt how to build machine learning models to deliver data-driven insights that help make better decisions that improve revenue, reduce costs, create opportunities, identify ideas, improve customer experience and more. They have acquired deep knowledge in a variety of data tools and skills.
Earning Criteria
- The earner completed an advanced Python mini-project, scoring at least 90% on every question of the mini-project.
- Earners used Python libraries to parse a text file that contained information about a collection of party photographs. Each non-blank line of the text file represented a photo caption in JSON format, which they needed to parse even though some of the JSON objects were malformed. They used Python string manipulation to process all captions and to determine which people appear in pictures together. They used this information to answer several questions about how these people are connected to each other.
- In the data wrangling mini-project, the earner used Python's pandas library and worked with a US Bureau of Labor Statistics data set to answer questions about employment statistics, which required pandas grouping and aggregation operations. They needed to work across multiple files representing data for various states and industries.
- The earner completed a mini-project that tested web scraping with Python's Requests library, HTML parsing with Beautiful Soup, and the use of a checkpointing library such as Ediblepickle. The earner scraped data from an e-commerce website that contains information about computer and phone products. They investigated the prices of laptops offered on the website as well as the ratings for different laptop brands.
- In the machine learning mini-project that focused on Regression, the earner worked with a housing data set to develop a model that predicts house price from various features of a house. They started with a linear regression model using just one feature, then built a model with two features, and finally used a linear regression model trained on all of the data. They practiced using essential Scikit-Learn tools and assessed feature importance.
- In the classification mini-project, the earner developed a model to predict customer churn from various customer features. They first trained a logistic regression model that used only numerical features, then incorporated categorical features, which required essential Scikit-Learn tools (including transformers and pipelines). They improved the model by using a random forest classifier and tuned it with hyperparameter optimization.
- Earners developed models to solve both Regression and Classification problems. They used demographic and socioeconomic data to build a Regression model to predict household income using a variety of numeric and categorical features. For the Classification part of the project, they worked with airline flights data and used various features to predict the flight carrier. They engineered new features using the polynomial transformer and used Scikit-Learn pipelines.
- Earners gained experience implementing unsupervised learning algorithms on real-world data and drawing conclusions from the results. They used a data set of credit card company customers and first performed principal component analysis. Then they trained a K-Means clustering model and determined the proper number of clusters according to silhouette score. Lastly, they found the cluster centroids, which correspond to average customers, and extracted characteristic customer information.
- The earner also had a chance to work on an optional mini-project in which they used natural language processing techniques to analyze Amazon product reviews and predict their sentiment. Using essential Scikit-Learn tools such as transformers and pipelines, they trained a bag-of-words model to predict whether a review was positive or negative.
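
For readers curious what the photo-caption criterion above might look like in practice, here is a minimal sketch using only the Python standard library. It is not the bootcamp's actual exercise or solution. The `people` field, the sample captions, and the helper names `parse_captions` and `count_pairs` are hypothetical, and the sketch simply skips malformed JSON lines rather than attempting to repair them.

```python
import json
from collections import Counter
from itertools import combinations


def parse_captions(lines):
    """Yield one dict per well-formed JSON line; skip blank and malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines carry no caption
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: malformed objects are dropped, not repaired
        if isinstance(record, dict):
            yield record


def count_pairs(records, key="people"):
    """Count how often each pair of names appears in the same record.

    The "people" key is a hypothetical field holding a list of names.
    """
    pairs = Counter()
    for record in records:
        people = record.get(key, [])
        if not isinstance(people, list):
            continue
        # Sort so that (Ana, Ben) and (Ben, Ana) count as the same pair.
        names = sorted(set(people))
        for a, b in combinations(names, 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up sample data: one blank line and one truncated (malformed) object.
sample_lines = [
    '{"caption": "Gala night", "people": ["Ana", "Ben", "Cy"]}',
    '',
    '{"caption": "Broken line", "people": ["Ana", "Ben"',
    '{"caption": "After party", "people": ["Ben", "Ana"]}',
]

pair_counts = count_pairs(parse_captions(sample_lines))
print(pair_counts.most_common(1))
```

Counting pairs in a `Counter` keeps the co-appearance data in a form that can answer simple connection questions, such as which two people are photographed together most often.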