Interview questions for Data Mining Analyst
243 Data Mining Analyst interview questions shared by candidates
Top interview questions

Given the sequence a a b b a a c c b b of unknown length, write an algorithm that figures out which element occurs most frequently and with the longest continuous repetition.
2 answers↳
doesn't matter, no answer is right!
↳
Maintain a hash map that stores each element's overall frequency and its longest continuous run.
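
A minimal sketch of the hash map idea in Python, assuming the input arrives as an iterable that is read once (so the length never needs to be known). The function name `summarize_runs` is made up for illustration, and how to break ties between elements is left to the caller.

```python
from collections import Counter
from itertools import groupby


def summarize_runs(stream):
    """Single pass: total count and longest consecutive run per element."""
    counts = Counter()   # element -> overall frequency
    longest = {}         # element -> longest continuous run seen so far
    for item, group in groupby(stream):
        run = sum(1 for _ in group)   # length of this consecutive run
        counts[item] += run
        if run > longest.get(item, 0):
            longest[item] = run
    return counts, longest


counts, longest = summarize_runs("a a b b b a c c b".split())
most_frequent = max(counts, key=counts.get)
longest_run = max(longest, key=longest.get)
print(most_frequent, counts[most_frequent])   # element with the highest count
print(longest_run, longest[longest_run])      # element with the longest run
```

For the sequence in the question itself, a and b tie on both measures, so a real answer would also need a tie-breaking rule.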

List the strings that are anagrams from a set of strings?
2 answers↳
Sorting the strings is not optimal because each sort is O(N log N), where N is the number of characters in the word. A better solution is to write a function that encodes each word as a hash table of character frequencies, which is O(N) per word.
↳
sort the strings and compare
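
The character-frequency approach could look roughly like this in Python. The name `group_anagrams` is hypothetical, and the sketch assumes matching is case-sensitive and that only groups with at least two words count as anagrams.

```python
from collections import Counter, defaultdict


def group_anagrams(words):
    """Group words that share the same character-frequency signature."""
    groups = defaultdict(list)
    for word in words:
        # Counter(word) is built in O(N); freezing its items makes it hashable.
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    # Keep only signatures shared by two or more words.
    return [group for group in groups.values() if len(group) > 1]


print(group_anagrams(["listen", "silent", "enlist", "google", "gogole", "cat"]))
```

Swapping the key for `"".join(sorted(word))` gives the sort-and-compare variant from the other answer.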

How would you design a recommendation system (like amazon)?
2 answers↳
Use collaborative filtering to compare a user's preferences with those of others. If A and B are similar, we can recommend B's preferred items to A.
↳
Why the downvote on the other answer? He/she is right. Collaborative filtering is the most common strategy for recommendation systems. You see that user A bought these things and user B also bought those things, but user B bought this other thing too, so let's show that thing to user A.
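
A toy sketch of the user-based idea described in these answers, in Python. Everything here is an assumption made for illustration: purchases are plain sets per user, similarity is Jaccard overlap, and the function names are invented. It is not how any real retailer's system works.

```python
def jaccard(a, b):
    """Overlap between two purchase sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, top_n=3):
    """Suggest items bought by similar users that the target has not bought."""
    mine = purchases[target]
    scores = {}
    for user, items in purchases.items():
        if user == target:
            continue
        similarity = jaccard(mine, items)
        if similarity == 0:
            continue  # no shared purchases, so no evidence of similar taste
        for item in items - mine:
            # Weight each candidate item by how similar its buyer is.
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; item name breaks ties deterministically.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


purchases = {
    "A": {"book", "lamp"},
    "B": {"book", "lamp", "desk"},
    "C": {"phone"},
}
print(recommend("A", purchases))
```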

there really were none.
2 answers↳
They seemed to want to hear what I had to say about my past assignments and their relevance to the opening. I think they were not impressed.
↳
Intuited

We do pre-screening on the data to remove fraud threats, so how do we find a data sample that we can use to get a real representation of fraud events?
2 answers↳
Remove screen and look at the unbiased data.
↳
Yes, remove the pre-screen and look at the unbiased sample. If the unbiased sample becomes too big, just randomly choose half of it, or a smaller fraction, to represent the fraud events.

why do you think you should be chosen for this position?
2 answers↳
I'm hard working, great team player, reliable, quick learner etc etc
↳
cuz i got a 10inch and great performer in front of the camera - porn industry

Implement a sampling function with nominal distribution.
2 answers↳
I think you mean Normal distribution! If you are using R, use set.seed() for reproducibility. You can then use rnorm() with the sample size, mean and SD, e.g. > set.seed(123) > rnorm(100, 2, 5)
↳
I'm the original poster, sorry for my typo. I actually meant multinomial distribution. And the advanced question was: if the probabilities form a skewed distribution, how would you speed up your algorithm? You can find both answers on Wikipedia. :)
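
One possible sketch of sampling from a multinomial (categorical) distribution in Python, using cumulative weights and binary search. The name `make_sampler` and its parameters are assumptions for illustration, and this is not necessarily the answer the interviewer had in mind. For the skewed case, commonly cited speedups are scanning outcomes in descending probability order or using the alias method; neither is implemented here.

```python
import bisect
import itertools
import random


def make_sampler(outcomes, weights, rng=random):
    """Return a function that draws one outcome in proportion to its weight."""
    cumulative = list(itertools.accumulate(weights))  # running totals
    total = cumulative[-1]

    def draw():
        u = rng.random() * total                  # uniform in [0, total)
        # First index whose cumulative weight exceeds u: O(log k) per draw.
        return outcomes[bisect.bisect_right(cumulative, u)]

    return draw


draw = make_sampler(["a", "b", "c"], [0.7, 0.2, 0.1], rng=random.Random(123))
samples = [draw() for _ in range(10_000)]
print({x: samples.count(x) for x in "abc"})   # counts roughly follow the weights
```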
Tell me about your past experience in engineering.
1 answer↳
Provided examples from my education and work.
