I interviewed at Anonymous Content (Poona)
Interview
It went well. The panel was fairly strict about the points they raised, but the overall experience was good. Go into the interview with deeper and crisper knowledge.
I interviewed at Anonymous Content
Interview
It was easy; prepare the advanced topics.
What are accumulators and broadcast variables? Use cases?
What is Tungsten and Catalyst Optimizer in Spark?
Difference between cache and persist.
What is checkpointing? When do you use it?
Interview questions [1]
Question 1
Find duplicate rows in a PySpark DataFrame.
Remove duplicates but keep the latest row (based on timestamp).
Find employees who logged in for 3 consecutive days.
Pivot sales data: rows (month, sales) → columns (Jan, Feb, Mar…).
Explode JSON column (with arrays) into multiple rows.
Read data from Kafka using PySpark Structured Streaming.
Write a PySpark job that loads data incrementally each day, using partition pruning.
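For practice, the logic behind two of the prompts above (keep the latest row per key, and find 3 consecutive login days) can be sketched in plain Python. This is not PySpark and not the interviewer's expected answer; the function names and the `id`/`ts` fields are hypothetical. In PySpark the same ideas are usually written with window functions, for example `row_number()` over a `Window` partitioned by the key.

```python
from datetime import date, timedelta
from itertools import groupby


def latest_per_key(rows):
    """Deduplicate by 'id', keeping the row with the largest 'ts'."""
    latest = {}
    for row in rows:
        key = row["id"]
        if key not in latest or row["ts"] > latest[key]["ts"]:
            latest[key] = row
    return list(latest.values())


def consecutive_login_employees(logins, run_length=3):
    """Return employees with at least `run_length` consecutive login days.

    `logins` is an iterable of (employee, date) pairs (assumed shape).
    """
    days_by_emp = {}
    for emp, day in logins:
        # A set drops repeated logins on the same day.
        days_by_emp.setdefault(emp, set()).add(day)

    result = set()
    for emp, days in days_by_emp.items():
        ordered = sorted(days)
        # Gaps-and-islands: (day - rank) is constant inside a consecutive run.
        islands = groupby(
            enumerate(ordered),
            key=lambda pair: pair[1] - timedelta(days=pair[0]),
        )
        if any(len(list(run)) >= run_length for _, run in islands):
            result.add(emp)
    return result
```

The "day minus rank" trick carries over to Spark: rank the distinct login dates per employee, subtract the rank from the date, and group by the result to count the length of each run.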