bp Data Engineering Fundamentals

Issued by bp

The earner of this badge has successfully completed data engineering training. They have demonstrated the ability to use SQL and Python to ingest, transform, store, and process data using both relational (RDBMS) and NoSQL databases, and have used AWS and Spark to scale data solutions. The earner brings these skills together to create ETL data pipelines and data platforms for both batch and streaming data sources.

Type Learning
Level Foundational
Time Weeks
Cost Free

Additional Details

Skills

Earning Criteria

The earner completed all 9 core mini-projects. A mini-project counts as completed when the earner scores at least 90% on each of its questions.
In the mini-project focused on Python data structures, the earner demonstrated the ability to write Python code that works with fundamental data structures: lists (appending, modifying, sorting, slicing, indexing, etc.), dictionaries (lookup, adding keys, finding the key associated with the largest value, and more), and sets. They also wrote for and while loops and composed list and dictionary comprehensions.
In the two mini-projects focused on Python functions, the earner wrote custom functions that work with various fundamental data structures and process Python strings. They also wrote recursive functions. The tasks required an understanding of function scope, input arguments, the return statement, and function calling.
In the mini-project focused on object-oriented programming (OOP), the earner demonstrated the fundamentals of OOP. They know how to write custom classes, create objects of those classes, and use inheritance to create more complex classes.
In the two mini-projects focused on SQL, the earner used the Structured Query Language to understand a fictional company’s sales patterns. In the first project, they connected to a Postgres database using Python and ran queries against it to answer specific questions. In the second project, they designed and assembled a SQL database of 4 years' worth of NYC restaurant inspection data, then wrote and executed queries against it to understand the variations in scores and violations across the city.
In the two data wrangling mini-projects, the earner used Python’s pandas library. In the first, they worked with the US Bureau of Labor Statistics data set to answer questions about employment statistics, which required pandas grouping and aggregation operations across multiple files representing data for various states and industries. In the second, they answered questions about the US Department of Education College Scorecard data set.
In the Spark mini-project, the earner parsed, cleaned, and processed a 10 GB set of XML files recording user actions on a Q&A website. Using this data, they answered questions about user behavior in order to predict the long-term behavior of new users. They worked with both RDDs and DataFrames.
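
For illustration only, the kind of task named in the Python data structures criterion might look like the short sketch below. It is not taken from the training material; the dictionary `word_counts` and all of its values are made up for this example.

```python
# Hypothetical counts, invented purely for illustration.
word_counts = {"spark": 7, "sql": 12, "pandas": 9}

# Dictionary lookup with a default value, then adding a new key.
word_counts["aws"] = word_counts.get("aws", 0) + 3

# Find the key associated with the largest value.
top_word = max(word_counts, key=word_counts.get)

# List comprehension: keys with a count of at least 9, sorted alphabetically.
frequent = sorted([word for word, count in word_counts.items() if count >= 9])

# Dictionary comprehension: map each key to the length of its name.
lengths = {word: len(word) for word in word_counts}

print(top_word)
print(frequent)
print(lengths)
```

Passing `word_counts.get` as the `key` argument makes `max` compare the dictionary's values while still returning the matching key, which avoids an explicit loop.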