When you are part of the team at Thermo Fisher Scientific, you'll do important work, like helping customers find cures for cancer, protect the environment, and make sure our food is safe.
Your work will have a real-world impact, and you’ll be supported in achieving your career goals.
How will you make an impact?
The Data Engineer, based in Shanghai, China, will join the Corporate ETO Platform Solutions, Data Science and Digital Marketing team.
You will work with business leaders and team members to develop data ingestion pipelines, delta lakes, lakehouses, and data warehouses across a variety of infrastructure (both on-premises and cloud).
The candidate must be able to work effectively in an agile team to design, develop, and maintain data structures for the data warehouse.
This position offers an exciting opportunity to work on processes that interface with multiple systems including AWS, Oracle, Middleware, and ERPs.
What will you do?
Design, develop, test, deploy, support, and enhance data integration solutions that seamlessly connect and integrate Thermo Fisher enterprise systems within our Data Science and Enterprise Data Platform.
Innovate on data integration within our Apache Spark-based platform to ensure technology solutions leverage cutting-edge integration capabilities.
Facilitate requirements gathering and process mapping workshops, review business / functional requirement documents, author technical design documents, testing plans, and scripts.
Assist with implementing standard operating procedures, facilitate review sessions with functional owners and end-user representatives, and leverage technical knowledge and expertise to drive improvements.
Define, design, and document reference architecture, and lead the implementation of BI and analytical solutions.
Follow agile development methodologies and DevOps practices to deliver solutions and product features.
How will you get here?
Master's degree in computer science or engineering from an accredited university (desired).
A 4-year degree with a major in computer science or engineering (or equivalent) from an accredited university (preferred) may substitute for a minimum of 5-7 years of professional IT experience.
Experience with Databricks, Data/Delta Lake, and relational databases such as Oracle, SQL Server, or AWS Redshift.
Experience in ETL (data extraction, data transformation, and data load processes).
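The extract/transform/load stages mentioned above can be sketched in plain Python (a minimal illustration only; the record fields and transformation rules are hypothetical, and a production pipeline would typically use PySpark or a similar engine):

```python
# Minimal ETL sketch. All data and field names are hypothetical,
# chosen only to illustrate the three stages.

def extract():
    # Extract: in practice this would read from a source system
    # (files, a database, or an API) rather than return literals.
    return [
        {"order_id": 1, "amount": "19.99", "region": "emea"},
        {"order_id": 2, "amount": "5.00", "region": "apac"},
    ]

def transform(records):
    # Transform: cast types and normalize values.
    return [
        {"order_id": r["order_id"],
         "amount": float(r["amount"]),
         "region": r["region"].upper()}
        for r in records
    ]

def load(records, target):
    # Load: append cleaned records to the target store
    # (a list here; a warehouse table in a real pipeline).
    target.extend(records)
    return len(records)

warehouse = []
loaded = load(transform(extract()), warehouse)
print(loaded)                   # 2
print(warehouse[0]["region"])   # EMEA
```

In a real pipeline each stage would also handle schema validation, error records, and incremental loads; the structure, however, stays the same.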
Knowledge, Skills, Abilities
5+ years of working experience in data integration and pipeline development.
Extensive experience with Databricks and Apache Spark.
Data lake and Delta Lake experience with AWS Glue and Athena.
2+ years of experience with data integration on AWS using Apache Spark, Glue, Kafka, Elasticsearch, Lambda, S3, Redshift, RDS, and MongoDB/DynamoDB ecosystems.
Strong hands-on experience in Python development, especially PySpark in an AWS Cloud environment.
Ability to design, develop, test, deploy, maintain, and improve data integration pipelines.
Experience in Python and common Python libraries.
Strong analytical experience with databases: writing complex queries, query optimization, debugging, user-defined functions, views, indexes, etc.
Strong experience with source control systems such as Git, and with build and continuous integration tools such as Jenkins.
Highly self-driven and execution-focused, with a willingness to do "what it takes" to deliver results, as you will be expected to rapidly cover a considerable volume of data integration demands.
Understanding of development methodology and actual experience writing functional and technical design specifications.
Excellent verbal and written communication skills, in person, by telephone, and with large teams.
Strong prior technical development background in either data services or engineering.
Demonstrated experience resolving complex data integration problems.
Must be able to work cross-functionally. Above all else, must be equal parts data-driven and results-driven.