Interview questions for the Data Mining Scientist position...13 October 2011

Given the sequence a a b b a a c c b b of unknown length, write an algorithm that figures out which element occurs most frequently and which has the longest continuous repetition.

2 answers

doesn't matter, no answer is right!

Maintain a hash map that stores each element's overall frequency and its longest continuous run.
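The hash map idea above can be sketched as a single pass over the input. This is only an illustration, not the poster's code: the function name `scan` and the choice to keep two dictionaries are assumptions, and ties are resolved in favor of whichever element was seen first.

```python
def scan(stream):
    """One pass over an iterable of unknown length.

    Returns two dicts: total count per element and longest run per element.
    """
    counts, longest = {}, {}
    prev, run_len = object(), 0  # sentinel that equals no real element
    for item in stream:
        # Extend the current run, or start a new one when the element changes.
        run_len = run_len + 1 if item == prev else 1
        prev = item
        counts[item] = counts.get(item, 0) + 1
        longest[item] = max(longest.get(item, 0), run_len)
    return counts, longest


counts, longest = scan("a a b b b a c".split())
most_frequent = max(counts, key=counts.get)   # first element with the top count
longest_run = max(longest, key=longest.get)   # first element with the top run
```

Because only the previous element and the current run length are kept between iterations, the input can be consumed as a stream; memory grows with the number of distinct elements, not with the length of the input.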


What is the angle between the clock hands at 3:15?

2 answers

7.5 degrees

Zero degrees
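The two answers above disagree, so a quick check may help. The sketch below assumes ideal hands that move continuously: the hour hand advances 30 degrees per hour plus 0.5 degrees per minute, and the minute hand advances 6 degrees per minute. The function name `clock_angle` is made up for this example.

```python
def clock_angle(hour, minute):
    """Smaller angle in degrees between the hour and minute hands."""
    hour_hand = (hour % 12) * 30 + minute * 0.5  # 30 deg per hour + 0.5 deg per minute
    minute_hand = minute * 6                     # 6 deg per minute
    diff = abs(hour_hand - minute_hand) % 360
    return min(diff, 360 - diff)


print(clock_angle(3, 15))  # 7.5
```

At 3:15 the minute hand sits exactly on the 3, but the hour hand has already moved a quarter of the way toward the 4, which is where the 7.5 degrees comes from.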


List the strings that are anagrams of each other from a set of strings.

2 answers

Sorting the strings is not optimal because each sort is O(N log N), where N is the number of characters in the word. A better solution is to write a function that encodes each word as a hash table of character frequencies, which is O(N) per word.

sort the strings and compare
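A minimal sketch of the character-frequency approach described above. The function name `anagram_groups` is an assumption for illustration; the key is a frozen set of (character, count) pairs so that it can be used as a dictionary key.

```python
from collections import Counter, defaultdict


def anagram_groups(words):
    """Group words that share the same character frequencies."""
    groups = defaultdict(list)
    for word in words:
        # Counting characters is linear in the word length; no sort is needed.
        key = frozenset(Counter(word).items())
        groups[key].append(word)
    # Keep only groups that actually contain anagrams of each other.
    return [group for group in groups.values() if len(group) > 1]


print(anagram_groups(["listen", "silent", "enlist", "google", "gogole", "cat"]))
```

Replacing the key with `"".join(sorted(word))` gives the sort-and-compare variant from the second answer; for short words the practical difference between the two is likely small.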


How would you design a recommendation system (like Amazon)?

2 answers

Use collaborative filtering to compare a user's preferences with those of others. If users A and B are similar, we can recommend B's preferred items to A.

Why the downvote on the other answer? He/she is right. Collaborative filtering is the most common strategy for recommendation systems. You see that user A buys these things and user B also bought those things, but user B bought this other thing too, so let's show that thing to user A.
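The idea in both answers can be illustrated with a toy user-based collaborative filter. This is a sketch under simplifying assumptions, not how any production system works: purchases are plain sets, similarity is the Jaccard index, and the names `jaccard`, `recommend` and the sample data are invented for the example.

```python
def jaccard(a, b):
    """Overlap between two sets of purchased items, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, top_n=3):
    """Suggest items bought by users similar to `target` that `target` lacks."""
    mine = purchases[target]
    scores = {}
    for user, items in purchases.items():
        if user == target:
            continue
        sim = jaccard(mine, items)
        if sim == 0:
            continue  # nothing in common, so this user tells us nothing
        for item in items - mine:
            # Weight each candidate item by how similar its buyer is to the target.
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


purchases = {
    "A": {"book", "lamp"},
    "B": {"book", "lamp", "desk"},
    "C": {"phone", "case"},
}
print(recommend("A", purchases))  # ['desk']
```

User B shares both of A's purchases, so B's extra item is suggested; user C has no overlap with A and contributes nothing.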


there really were none.

2 answers

They seemed to want to hear what I had to say about my past assignments and their relevance to the opening. I think they were not impressed.



We pre-screen the data to remove fraud threats, so how do we find a data sample that gives a real representation of fraud events?

2 answers

Remove the pre-screen and look at the unbiased data.

Yes, remove the pre-screen and look at the unbiased sample. If the unbiased sample becomes too big, just randomly choose half of it, or a smaller fraction, to represent the fraud events.
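The random down-sampling step mentioned above might look like the following. It is only a sketch: it assumes the unscreened events are available as an iterable (the hypothetical `records`), and the function name, default fraction and seed are illustrative choices.

```python
import random


def subsample(records, fraction=0.5, seed=0):
    """Keep each record independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return [record for record in records if rng.random() < fraction]


sample = subsample(range(10_000), fraction=0.5)
```

Because every record is kept with the same probability, the share of fraud events in the sample should stay close to the share in the unscreened data.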

Compass Group

Why do you think you should be chosen for this position?

2 answers

I'm hard-working, a great team player, reliable, a quick learner, etc.

cuz i got a 10inch and great performer in front of the camera - porn industry


Implement a sampling function with nominal distribution.

2 answers

I think you mean normal distribution! If you are using R, call set.seed() first so the draws are reproducible. You can then use rnorm() with the sample size, mean and SD, e.g. set.seed(123); rnorm(100, 2, 5)

I'm the original poster, sorry for my typo. I actually meant the multinomial distribution. The advanced question was: if the probabilities are skewed, how would you speed up your algorithm? You can find both answers on Wikipedia. :)
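One possible reading of the clarified question, sketched in Python. It draws category indices by scanning cumulative probabilities, and for skewed distributions it visits the most likely categories first so the scan usually stops early. This is an assumption about what the interviewer wanted, not a confirmed answer; the alias method is another known option for constant-time draws. The names `make_sampler` and `sample_multinomial` are invented for the example.

```python
import random
from itertools import accumulate


def make_sampler(probs, rng=None):
    """Return a function that draws one category index according to `probs`."""
    if rng is None:
        rng = random.Random()
    # Visit likely categories first so a skewed distribution exits the scan early.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    cumulative = list(accumulate(probs[i] for i in order))
    total = cumulative[-1]  # also tolerates weights that do not sum to 1

    def draw():
        u = rng.random() * total
        for index, edge in zip(order, cumulative):
            if u < edge:
                return index
        return order[0]  # guard against floating-point round-off

    return draw


def sample_multinomial(n, probs, rng=None):
    """Counts per category after n independent draws."""
    draw = make_sampler(probs, rng)
    counts = [0] * len(probs)
    for _ in range(n):
        counts[draw()] += 1
    return counts


print(sample_multinomial(1000, [0.05, 0.9, 0.05], random.Random(123)))
```

With the probabilities above, roughly nine draws out of ten stop at the first comparison. For a near-uniform distribution the ordering gives little benefit, and a binary search over the cumulative sums would be the more natural choice.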

Tell me about your past experience in engineering.

1 answer

Provided examples from my education and work.

Bharat Aluminium

What's your favorite subject?

1 answer

Mine Development.

