This badge was issued to Galib Shahriar on 08 Dec 2020.
- Type: Learning
- Level: Intermediate
- Time: Weeks
- Cost: Free
Applied Data Science II: Machine Learning & Statistical Analysis (with honors)
Issued by WorldQuant University
Earners of this badge can build machine learning models that make predictions on real-world data. They understand how to treat, clean, and encode data and how to choose an appropriate machine learning model for the task. They can tune a model so that it generalizes, performing well on both the training set and out-of-sample data. They can build models using text and time series data. Earners are also proficient in using Python’s scikit-learn package.
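As a rough illustration of the workflow described above (clean and encode the data, choose a model, tune it, then check it on out-of-sample data), here is a minimal scikit-learn sketch. The data is synthetic and the column names `beds` and `ownership` are hypothetical; the choice of logistic regression and of the `C` grid is an assumption made for the example, not something the badge description specifies.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data for illustration only; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "beds": rng.integers(20, 200, size=n).astype(float),
    "ownership": rng.choice(["profit", "nonprofit", "government"], size=n),
})
# Binary target that depends on the numeric column plus noise.
y = (df["beds"] + rng.normal(0, 20, size=n) > 110).astype(int)
# Inject missing values so the imputation step has something to do.
df.loc[rng.choice(n, size=20, replace=False), "beds"] = np.nan

# Numeric column: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical column: one-hot encode, ignoring unseen categories.
preprocess = ColumnTransformer([
    ("num", numeric, ["beds"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["ownership"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hold out a test set so tuning never sees the out-of-sample data.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

# Tune the regularization strength with 5-fold cross-validation.
search = GridSearchCV(model, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

train_score = search.score(X_train, y_train)
test_score = search.score(X_test, y_test)
print("best C:", search.best_params_["clf__C"])
print("train accuracy:", round(train_score, 3))
print("test accuracy:", round(test_score, 3))
```

Keeping the preprocessing inside the pipeline means the imputer and scaler are refit on each cross-validation fold, so no information from the validation folds leaks into the tuning.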
Skills
- Anomaly Detection
- Clustering
- Decision Trees
- Dimensionality Reduction
- Gradient Boosting Trees
- Linear Regression
- Logistic Regression
- Machine Learning
- Model Fitting
- Model Tuning
- Natural Language Processing
- Random Forest
- Supervised Learning
- Support Vector Machine
- Time Series Analysis
- Unsupervised Learning
Earning Criteria
- Earners of this badge have previously earned the badge "Applied Data Science I: Scientific Computing & Python." Additionally, they have successfully completed 2 mini projects and maintained a cumulative average score of 90% or above. The descriptions and skills needed to complete these projects are listed below.
- In mini project 1, earners of this badge worked with nursing home inspection data from the United States, predicting which providers may be fined and for how much. They used the scikit-learn Python package to construct progressively more complicated machine learning models. They had to impute missing values, apply feature engineering, and encode categorical data.
- In mini project 2, earners of this badge used natural language processing to train various machine learning models to predict an Amazon review rating based on the text of the review. Further, they used one of the trained models to gain insight into the reviews by identifying highly polar words. These words show which terms most strongly influence the model’s prediction.