Page 155 - Career Development Guidebook
SECTION 4: INTERVIEWS
Interview Question: Data
You should know the runtime complexities of the following data structures and be able to
implement each of them in at least one language:
Trees: binary search tree, heap, trie (prefix tree), and suffix tree
Queues, stacks, and priority queues
Linked lists
HashMap and HashTable
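As a rough illustration of two structures from this list, here is a minimal Python sketch of a trie and of a heap used as a priority queue. The names Trie, TrieNode, and k_smallest are hypothetical, chosen only for this example; heapq is Python's standard-library binary heap.

```python
import heapq


class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    """Prefix tree: insert and lookup cost O(L) for a word of length L."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            # Create the child node on first use, then descend into it.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        # Follow the prefix character by character; None if the path breaks.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._walk(prefix) is not None


def k_smallest(items, k):
    """Return the k smallest items using a binary min-heap."""
    heap = list(items)
    heapq.heapify(heap)  # builds the heap in O(n)
    # Each pop costs O(log n); stop early if fewer than k items exist.
    return [heapq.heappop(heap) for _ in range(min(k, len(heap)))]
```

In an interview it helps to state the costs as you write: trie operations are linear in the word length, and each heap push or pop is logarithmic in the number of stored items.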
Tips:
You should be comfortable manipulating popular data formats such as the ubiquitous CSV
format and the web- and serialization-friendly JSON format. Both CSV and JSON are examples
of traditional row-based file formats: data is stored, and often indexed, row by row.
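A small sketch of what manipulating these two formats can look like, using only Python's standard csv and json modules. The sample records and the variable names (csv_text, rows, round_tripped) are made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample data; a real file would be opened with open(path, newline="").
csv_text = "name,age\nAda,36\nLinus,28\n"

# DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV carries no type information, so every value arrives as a string.
for row in rows:
    row["age"] = int(row["age"])

# JSON keeps basic types (numbers, strings, lists, objects) across a round trip.
json_text = json.dumps(rows)
round_tripped = json.loads(json_text)
```

Both formats are read record by record here, which is what makes them row-based: to get every age you still pass over every full record.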
In recent years, column-based formats have become more and more common, because they let big
data applications quickly extract one feature from all the data points by reading only the
column corresponding to that feature.
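The difference between the two layouts can be sketched in plain Python. This is an in-memory analogy only, not how any particular file format is implemented; the sample records and the helper to_columns are hypothetical, and the helper assumes every record has the same keys.

```python
# Row-based layout: one record per data point.
rows = [
    {"id": 1, "height": 1.70, "weight": 65.0},
    {"id": 2, "height": 1.82, "weight": 80.5},
    {"id": 3, "height": 1.65, "weight": 58.2},
]


def to_columns(records):
    """Convert a list of records into a dict of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-based layout: one list per feature.
columns = to_columns(rows)

# Extracting one feature from rows touches every record.
heights_from_rows = [row["height"] for row in rows]

# Extracting the same feature from columns is a single lookup.
heights_from_columns = columns["height"]
```

The trade-off runs the other way for whole records: reassembling one data point from the column layout means reading from every column.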
Popular data frameworks for machine learning, including Pandas and Dask, are optimized for
column-based operations. The two most common column-based file formats are Parquet,
which came out of the Apache Hadoop ecosystem, and ORC, which came out of Apache Hive.