Question

System design round: implement a file system / tree crawler. You are given a getPath method that can take a long time or fail when called. You are first asked to implement the crawler in a single thread, then to make it faster by implementing it in a distributed manner. Follow-up questions: What happens if getPath fails or takes a long time? How do you know when to stop crawling? What happens when you have many companies to traverse? What database would you use? How would the end user fetch the results of the crawl? How do you handle retryable vs non-retryable errors?
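
For the single-threaded part, here is a minimal sketch of one way to approach it, not a reference answer. The question does not give the signature of getPath, so this assumes a hypothetical `get_path(path)` that returns a list of child paths and signals failure by raising one of two made-up exception types, `RetryableError` (e.g. a timeout or throttling) or `NonRetryableError` (e.g. permission denied). The names `crawl`, `max_attempts`, and `base_delay` are also illustrative.

```python
import time
from collections import deque
from typing import Callable, Dict, List


class RetryableError(Exception):
    """Hypothetical: transient failure (timeout, throttling); worth retrying."""


class NonRetryableError(Exception):
    """Hypothetical: permanent failure (not found, denied); do not retry."""


def crawl(
    root: str,
    get_path: Callable[[str], List[str]],  # assumed signature, not from the question
    max_attempts: int = 3,
    base_delay: float = 0.0,
) -> Dict[str, object]:
    """Breadth-first crawl from root; returns visited paths and failures."""
    frontier = deque([root])
    seen = {root}  # guards against cycles and duplicate links
    visited: List[str] = []
    failed: Dict[str, str] = {}

    # Stop condition: the frontier is empty, so nothing is left to expand.
    while frontier:
        path = frontier.popleft()
        for attempt in range(1, max_attempts + 1):
            try:
                children = get_path(path)
            except NonRetryableError as exc:
                # Permanent error: record it and move on immediately.
                failed[path] = f"permanent: {exc}"
                break
            except RetryableError as exc:
                if attempt == max_attempts:
                    failed[path] = f"gave up after {attempt} attempts: {exc}"
                    break
                # Exponential backoff before the next attempt.
                time.sleep(base_delay * 2 ** (attempt - 1))
            else:
                visited.append(path)
                for child in children:
                    if child not in seen:
                        seen.add(child)
                        frontier.append(child)
                break

    return {"visited": visited, "failed": failed}
```

A slow getPath call could be handled by wrapping it in a timeout and treating the timeout as a `RetryableError`. For the distributed follow-up, the same structure plausibly carries over: the `frontier` becomes a shared work queue, `seen` and the results move into a database keyed by company and path, and the crawl is done when the queue is empty and no worker still has a task in flight.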

Sigma Computing

Sigma Computing interview question
