Page 155 - Career Development Guidebook

SECTION 4: INTERVIEWS



          Interview Question: Data

          You should know the runtime complexities of the following data structures and be able to
          implement each of them in at least one language:
                Trees: binary search tree, heap, trie (prefix tree), and suffix tree
                Queues, stacks, and priority queues
                Linked lists
                HashMap and HashTable
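
As one illustration of the list above, here is a minimal sketch of a trie in Python. The class and method names (`TrieNode`, `Trie`, `insert`, `contains`, `starts_with`) are illustrative choices, not taken from the guidebook, and a dict-per-node layout is only one of several reasonable designs.

```python
class TrieNode:
    """One node of the trie: a map from character to child node."""

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Add a word in O(len(word)) time."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        """Follow prefix from the root; return the last node, or None if absent."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        """Exact-match lookup in O(len(word)) time."""
        node = self._walk(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        """True if any stored word begins with prefix."""
        return self._walk(prefix) is not None
```

Both insertion and lookup cost time proportional to the length of the key rather than the number of stored words, which is the property an interviewer is most likely to ask about.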




              Tips:
              You should be comfortable manipulating popular data formats such as the ubiquitous CSV
              format and the web- and serialization-friendly JSON format. Both CSV and JSON are examples
              of traditional row-based file formats: data is stored, and often indexed, row by row.
              In recent years, column-based formats have become increasingly common because they allow
              big data applications to extract one feature from all the data points quickly by reading
              only the column that corresponds to that feature.
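
The difference between the two layouts can be sketched with the Python standard library alone. The sample data and the helper names (`read_rows`, `to_columns`) are made up for illustration. The sketch only mimics a column-based layout in memory and is not a real columnar file format.

```python
import csv
import io
import json

# Hypothetical sample data, invented for this example.
CSV_TEXT = "name,age,city\nAna,31,Lima\nBo,25,Oslo\nCy,40,Rome\n"


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row (row-based layout)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_columns(rows):
    """Pivot row dicts into a dict of lists, one list per feature (column-based layout)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


rows = read_rows(CSV_TEXT)
columns = to_columns(rows)

# Row-based: extracting one feature means visiting every record.
ages_from_rows = [int(row["age"]) for row in rows]

# Column-based: the feature is already grouped under a single key.
ages_from_columns = [int(value) for value in columns["age"]]

# The same records serialized as JSON, still organized row by row.
json_text = json.dumps(rows)
```

In the row-based case, every record has to be read even though only one field is needed. In the column-based case, the values for a feature sit together, which is what lets columnar formats skip the columns they do not need.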

              Popular data frameworks for machine learning, including Pandas and Dask, are optimized for
              column-based operations. Two common column-based file formats are Parquet, championed
              by Apache Hadoop, and ORC, championed by Apache Hive.

























































