• Data Engineer II - Research

    Job Locations US-IA-Iowa City | US-CO-Lakewood | US-Remote
    Posted Date: 9/5/2018 2:02 PM
    Up to 25% Travel
  • Overview

    The Research Data Engineer II takes the lead from the Senior Data Engineer to establish data pipelines that optimize ACT’s data assets, specifically for exploratory analytics and modeling by research and data scientists and for algorithmic use in products. The position helps the business leverage data as a strategic asset, including building automated processes and metadata in support of data governance. The Research Data Engineer II is a data steward within the data lake and collaborates with enterprise data architecture and enterprise data engineering to define business rules, critical data elements, and data standards, and to codify metadata relevant to data science and research applications.


    Typical work-related activities include:


    • Process structured, unstructured, and semi-structured data into a form suitable for analysis by data scientists, research scientists, and learning scientists
    • Work closely with scientists to understand the questions they are asking of the data
    • Define business rules, critical data elements, and data standards, and codify metadata in the data lake as a data steward, ensuring data is interpretable by those working in the data lake
    • Meet established data quality metric targets such as completeness, currency, accuracy, lineage, accessibility, timeliness, validity, integrity, precision, and representation
    • Implement and support data visualization
    • Build automated processes and metadata in support of data governance
    • Write code that is easy to understand, test, and maintain
    • Assist our engineering team in integrating data pipelines into our production systems
    • Build and document performant data pipelines; work with performance engineering to optimize code as needed
    • Keep pace with ever-changing data storage and wrangling tooling, including data in motion, cloud-distributed, fog, and edge data uses
    • Contribute to a culture of high achievement, industry leadership, innovation, and accountability





    Education:

    • Bachelor’s degree in computer science, engineering, data mining, or a related field required, or an equivalent combination of education and experience from which comparable knowledge and abilities can be acquired


    Experience:

    • Minimum of three years’ demonstrated success building data pipelines for analytic purposes
    • Experience with software development tools such as GitHub and JIRA required
    • Experience in education or assessment industries preferred
    • Expert in data modeling, with advanced knowledge of and experience writing and tuning SQL; Postgres and Amazon RDS preferred
    • Experience integrating data from multiple data sources and processing large amounts of structured and unstructured data; Spark preferred; Apache Atlas/Apache Ranger experience helpful
    • Experience with NoSQL databases preferred
    • Experience with scalable, real-time messaging systems such as Kafka preferred
    • Experience handling datasets exceeding 250GB preferred

    Knowledge, Skills and Abilities:


    • Demonstrated skills preparing data for analytic purposes for users of Python, R, and/or SAS
    • Strong knowledge of data flows, data architecture, ETL and processing of structured, unstructured, and semi-structured data
    • Knowledge of general computer science principles, including distributed computing and object-oriented design and development
    • Good oral and written communication skills
    • Exceptional collaborator and team member
    • Demonstrated eagerness to learn new techniques
    • Agile mindset


