Senior Data Engineer
Shework.in
- Location: All Metro Cities (Onsite)
- Employment Type: Full Time
- Experience: 5+ Years
Role Overview
As a Senior Data Engineer, you will be responsible for the architecture, design, implementation, and maintenance of data processing pipelines. You will work with large-scale data systems and collaborate with cross-functional teams to deliver high-quality, scalable, and efficient data solutions.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines to process and analyze large volumes of structured and unstructured data.
- Work with data warehouses, data lakes, and other storage solutions to manage and optimize data flow.
- Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and deliver effective solutions.
- Develop and optimize ETL processes and workflows for data ingestion, transformation, and loading (see the orchestration sketch after this list).
- Work with big data technologies such as Hadoop and Spark for large-scale processing, and Kafka for managing real-time data streams.
- Integrate data from various sources, including internal databases, external APIs, and third-party services.
- Ensure data quality, integrity, and consistency by implementing data validation checks, monitoring tools, and automated testing.
- Create and manage data models, data marts, and reports to support business intelligence and analytics efforts.
- Ensure data security and compliance by applying encryption, access control, and other security best practices.
- Optimize the performance and scalability of data systems to handle large datasets and high-throughput data workloads.
- Document data engineering processes, data flows, and architectures so that systems remain maintainable as they grow.
- Provide mentorship and guidance to junior data engineers, ensuring best practices in coding and data management.
- Stay up to date with the latest trends and advancements in data engineering and big data technologies.
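For illustration, the orchestration work described above might resemble the minimal Apache Airflow sketch below. The DAG name, schedule, and task callables are hypothetical placeholders, not a description of this role's actual stack.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task callables: stand-ins for real extract/transform/load logic.
def extract_orders():
    print("pulling raw orders from the source system")

def transform_orders():
    print("cleaning and enriching the extracted records")

def load_orders():
    print("writing curated records to the warehouse")

with DAG(
    dag_id="orders_etl",          # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # run once per day
    catchup=False,                # skip backfilling historical runs
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_orders)
    transform = PythonOperator(task_id="transform", python_callable=transform_orders)
    load = PythonOperator(task_id="load", python_callable=load_orders)

    # Enforce ETL ordering: extract, then transform, then load.
    extract >> transform >> load
```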
Required Skills & Qualifications
- 5+ years of experience in data engineering, with a strong background in designing and implementing large-scale data systems.
- Proficiency in programming languages such as Python, Java, or Scala for data processing and automation.
- Hands-on experience with big data technologies such as Hadoop, Spark, and Kafka (a minimal batch-processing sketch follows this list).
- Strong experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
- Expertise in data warehousing technologies (e.g., Redshift, Snowflake, Google BigQuery).
- Strong understanding of ETL tools and frameworks (e.g., Apache NiFi, Airflow, Talend).
- Familiarity with cloud platforms (AWS, Google Cloud, Azure) and services such as S3, Lambda, and Databricks.
- Knowledge of data modeling, schema design, and data architecture best practices.
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Understanding of data security, data governance, and compliance standards (GDPR, CCPA).
- Experience with version control (e.g., Git) and CI/CD pipelines.
- Solid problem-solving, troubleshooting, and performance optimization skills.
- Excellent communication and collaboration skills to work with cross-functional teams.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
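As a rough indication of the Spark proficiency listed above, a batch aggregation job might look like the following PySpark sketch; the bucket paths and column names are invented for the example.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_revenue_batch").getOrCreate()

# Hypothetical raw input location; header row read from the files.
orders = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Aggregate order amounts per day.
daily_revenue = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Columnar output keeps downstream analytical scans cheap.
daily_revenue.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_revenue/")

spark.stop()
```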
Preferred Qualifications
- Experience with machine learning models and integration of ML pipelines into data workflows.
- Familiarity with real-time data processing frameworks (e.g., Apache Flink, Spark Streaming); a minimal streaming sketch follows this list.
- Knowledge of business intelligence and analytics tools (e.g., Tableau, Power BI).
- Experience with distributed systems and cloud-native architecture.
- Familiarity with data lake architectures and tools like AWS Glue or Apache Hudi.
- Exposure to advanced data technologies like graph databases or time-series databases.
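To illustrate the real-time processing experience mentioned above, a Spark Structured Streaming consumer might be sketched as follows; the broker address and topic name are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream_consumer").getOrCreate()

# Hypothetical broker address and topic name.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "clickstream")
    .load()
)

# Kafka delivers the payload as binary; cast it to a string for parsing.
decoded = events.select(F.col("value").cast("string").alias("raw_event"))

# A console sink stands in for a real warehouse or lake sink in this sketch.
query = (
    decoded.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```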