David Lu
In computer science, a data structure is a particular way of organizing data in a computer so that it can be used efficiently.
Over the next few weeks, we will look at the some basic data structures: the linear linked list and the binary search tree.
In 1976 Swiss computer scientist Niklaus Wirth (chief designer of the programming language Pascal) wrote an influential book titled Algorithms + Data Structures = Programs.
Algorithms and data structures are two fundamental topics in programming and they are closely related. We cannot learn about one without the other.
Any time we discuss and algorithm, we should also discuss its run time complexity. Generally we want to choose the most efficient algorithm.
Big O is a special notation that allows us to express how quickly an algorithm grows.
Imagine we are writing a search algorithm to help a drone calculate where to land. Suppose we are trying to decide between a simple search and a binary search. The algorithm needs to be both correct and fast. Perhaps we only have 10 seconds to figure out where to land, otherwise the drone will be off course.
On one hand, binary search is faster. We know this. On the other hand, simple search is easier to write and there is less chance of bugs. How might we decide whether it's okay to go with simple search?
We might time both methods and see how they differ. So suppose we time both algorithms with a list of 100 elements. Assume it takes 1 millisecond to check one element. With simple search, we must check 100 elements in the worst case, so the search takes 100ms in the worst case. On the other hand, binary search only has to check 7 elements (log_2 100 is about 7), so the search takes 7ms.
More realistically, the search space might have a billion elements (1,000,000,000). How long will simple search take? And how long will binary search take?
Binary search on a billion elements takes about 32ms. (What's log_2 1,000,000,000?) How long do you think simple search will take?
Elements | Simple cearch | Binary search |
---|---|---|
100 | 100ms | 7ms |
10,000 | 10 seconds | 14ms |
1,000,000,000 | 11 days | 32ms |
As the size of the search space increases, the time it takes simple search increases faster than binary search!
Algorithm efficiency is not measured in terms of seconds. Algorithm efficiency is measured in terms of growth.
In computer science, a list or sequence is an abstract data type that represents a countable number of ordered values, where the same value may occur more than once.
An instance of a list is a computer representation of the mathematical concept of a finite sequence.
A list can also be defined inductively:
A list is either empty or it is an item followed by a list.
In computer science, a linked list is a linear collection of data elements, called nodes, each pointing to the next node by means of a pointer.
It is a data structure consisting of a group of nodes which together represent a sequence.
In a simple linear linked list, each node is composed of a piece of data and a reference (in other words, a link) to the next node in the sequence.
Linked Lists are among the simplest and most common data structures because it allows for efficient insertion or removal of elements from any position in the sequence.
It is important to understand that a linked list is different than an array. Linked lists are not indexed and does not support arbitrary access.
Do you recall what an array is and how arrays work?
What are the benefits of arrays versus linked lists?
What are the benefits of linked lists versus arrays?
Time complexity for Linear Linked List:
Access | Search | Insert | Delete |
---|---|---|---|
Abstract Data Type (ADT)
To implement a linear linked list in C++, we need to understand a few C++ concepts.
We can use a struct or a class to define each node inductively.
struct node { int data; node* next; };
Let's keep things simple and consider linear linked lists of integers.
A basic operation for lists is to put something on the list. If our list is written on paper, we just find a place to write our item down. If our list is in markdown, we can do this:
What is the algorithm for insertion in a linear linked list?
def insert(head, data): if head is empty: head <- data else: insert(head.next, data)
What will this do?
Should we insert at the front, back, or middle of the list?
What are the algorithms to do these?
Another basic operation is to remove something from the list. Remember in C++ we need to free the memory that we allocated.
You should have seen how to traverse the linear linked list by looping. Perhaps you've seen code that looks like this:
node* temp = head; while(temp != NULL) { //Do stuff temp = temp->next; }
Since a list is a recursive data type, it makes sense to make use of recursion in programming operations on lists.
Recursion is a tool a programmer can use to invoke a function call on itself.
Recursive programming is directly related to mathematical induction.
In recursive programming we need two things:
Consider the pseudocode to add an item at the end of a linear linked list.
Iteratively
def addLast(item): if head = None: head <- new node(item) else: node temp <- head while temp.next != None: temp <- temp.next temp.next <- new node(item)
Recursively
def addLast(head, item): if head = None: head <- new node(item) else addLast(head->next, item)
Let's try in C++ together.
An operation we may be interested in doing on lists is to find whether an item is in a list or not. This is called search.
Another operation we may be interested in performing on lists is to sort it. We called this sort.
If you want to learn how to program, you must practice. Programming is a skill, not something that can just be memorized. Like any skill, you must practice to get good.
There are many websites that have environments set up so that you can practice programming.
Here are two examples that you can try:
Codefights
HackerRank
Alternatively, you can program from scratch and practice.
There are some useful functions in the C++ standard library to generate random numbers.
We will need to include two libraries: ctime and cstdlib.
We need the ctime library to "seed" the peudo random number generator. The cstdlib library contains a pseudo random number generator function.
#include <ctime> #include <cstdlib> ... // To seed the PRNG: srand(time(NULL)); // Do this only once! // To get a random number from 0 to 100: int num = rand() % 101;
So we may have a loop to insert some number of random numbers into our list to practice with.
For example:
for(int i = 0; i < 10; ++i) { list.insert(rand() % 101); // Or insert(head, rand() % 101); if we do not have a class. }
Given a linear linked list print in reverse order.
Example: = 3->2->4->5
Output of print(): 5->4->2->3
Given two linear linked lists and , return true if they are the same list. (Order and values matter)
Given a linear linked list return with its items in reverse order
Given a linear linked list , return a sorted list such taht contains the same values as
Example: Given 4->9->2->3->2, return 2->2->3->4->9
Given a sorted linear linked list, delete duplicates from the list.
For example, if given 1->2->2->3->3->3->4, the resulting list should be 1->2->3->4.
Given sorted linear linked lists and , return a sorted linear linked list that contains all elements from and
Example: = 2->3->4->6, = 1->3->7
= 1->2->3->3->4->6->7