Advanced Information Revision Notes
Contents
Recursion
A method or subroutine defined in terms of itself (ie it will call itself from within itself).
Base Case
The base case doesn't call itself, and is the criteria to stop the recursion calling itself again.
Normally it will return a value.
General Case
This is where the code calls itself, ie the bit that is recursive.
Remember
For every call every line of code will be run.
Until the base case is reached the number of calls open will expand.
When the base case is reached that call will run to an end.
Now you will go through the open calls from the most recent down to the oldest, each will run to an end.
Arrays
An Array is a data structure that allows you to store several values of the same data type.
They are static structures.
You must declare the size of the array when it is created.
Memory for entire structure is given when created.
Single Dimension
Each value can be accessed by the element number within the sequence.
Multi Dimension
You need to specify the dimensions when created (ie 3, 3 ).
It can still only store one data type, so it could be a 2 dimensional array of Integers for example.
To access an element you need to give the element number for each dimension.
Abstract Data Types & Dynamic vs Static
Static Data Structure
With a static data structure, the size of the structure is fixed. Static data structures are very good for storing a well-defined number of data items.
Advantage
- The memory allocation is fixed and so there will be no problem with adding and removing data items.
- The memory is allocated in one block and is continuous so execution will be faster.
- Easier to program as there is no need to check on data structure size at any point.
Disadvantage
- Can be very inefficient as the memory for the data structure has been set aside regardless of whether it is needed or not whilst the program is executing.
Dynamic Data Structure
Memory is allocated to the data structure dynamically i.e. as the program executes. This means the data structure is allowed to grow and shrink as the demand for storage arises.
Advantage
- Makes the most efficient use of memory as the data structure only uses as much memory as it needs
Disadvantage
- Because the memory allocation is dynamic, it is possible for the structure to 'overflow' should it exceed its allowed limit. It can also 'underflow' should it become empty.
- Allocated memory is fragmented so will take longer to read
- You might need to code basic functions such as checking its size or the number of items stored.
Abstract Data Structure
An abstract data structure is one that is not defined by how it is implemented, it is instead defined by its functions.
So an abstract data structure could be implemented in many different ways, but each having the same functionality.
Queues
A queue is a data structure with a first-in first-out policy.
Linear
A linear queue is where items are removed from the front of queue and items are added to the rear of the queue.
As items are removed from the front of the queue the front of the queue will move further and further back.
This would require each item in the queue to be moved every time an item was removed from the queue. This would require too much processing especially for large queue sizes.
Circular
Pointers are required to point to the front and rear of the queue.
The rear pointer always points to the last item in the queue and the front pointer points to the next item to be removed from the front of the queue.
Initially rear pointer would be set to 0 and front pointer to 1.
We have a limit to the size of the queue (ie queuemax). To make the queue circular when either pointer reaches queuemax it will next move to position 1.
The queue is full when Number of items = queuemax and the queue is empty when Number of items = 0.
We can only add an item if the queue is not full and we can only remove an item if the queue is not empty.
Priority
A priority queue is essentially two queues. You have a queue for priority items, and a queue for standard items. New priority items are added to the back of the priority queue, and new standard items are added to the back of the standard queue.
Items are removed from the front of the priority queue first. When no priority items are queued, items will be removed from the front of the standard queue.
Stacks
A stack is a last in, first out data structure.
A stack can only have data added or removed from the top.
The Stack Pointer will always point to the value at the top of the stack.
The stack has limited size and this must be defined.
You push an item onto the stack, an pop an item off the stack.
You must check for Stack Empty (Stack Pointer = 0) before a pop.
You must check for Stack Full (Stack Pointer < Max Size) before a push.
Graphs
Graphs can be used to represent complex relationships, ie who knows who or how things are connected.
Vertex / Node
A point or location on the graph.
Edge / Arc
A connection from 1 node to another.
Weighted Graph
Edges will have a cost or value attached to it. This represents the cost or distance of using that edge.
Undirected
Edges are shown by a line. You can travel in both directions along the line.
If a node A is connected to node B, then node B is also connected to node A
Directed
Edges are shown with arrows, you can only travel in the direction of the arrow.
If a node A is connected to node B, then node B might not be connected to node A (it will be a separate arrow)
Adjacency Matrix
Used to store what is and isn't connected.
Essentially a table with the nodes across the top and down the sides.
You can enter a 1 or a weighting if the are connected.
0 if they aren't connected.
Adjacency List
Only stores what is connected.
For each node, simply list which nodes are connected.
Doesn't work for weighted graphs.
Matrix vs List
The advice is to use a matrix when you have a graph with many edges (Dense).
The advice is to use a List when you have few edges (Sparse)
Remember a list will also require additional processing to check the list of connections for a given node.
A matrix requires little processing, you can go direct to the location.
Trees
A tree is a specific type of graph.
- Must be connected
- Must be undirected
- Must have no cycles
Graph Traversal
Breath First Traversal
The purpose of this traversal is to examine the width of the graph as soon as possible, the queue means the oldest discovered edge will be the next node explored. This ensures all of the nodes at a given level are explored before moving to the next level.
Algorithm
For this algorithm we will use a queue to store the nodes we need to visit. We should also maintain a list of the nodes we visit. Every node also has a setting to say if it has been discovered or not, it will also have a setting to say when it has been completely explored. The node will be completely explored when each neighbour has been discovered.
So we:
- Create structure to store Discovered & Completely Explored
- Create structure for Visited Nodes
- Add starting node to the queue
Then while we have an item in the queue:
- Set the Current Node = Front of Queue
- Set Explored of Current Node = true
- Add Current Node to Visited Nodes
- For each neighbour of Current Node
- If Not Discovered
- Add neighbour to queue
- Set Discovered of neighbour to true
- If Not Discovered
- Dequeue Current Node
Applications of Breath First Traversal
- Identifying the shortest path between 2 points (GPS etc)
- Showing relationships between connected nodes (Facebook)
- Web crawlers analyse the sites you can reach by following links on another site
Depth First Traversal
The purpose is to examine the depth of the structure as soon as possible, the stack means it examines the most recently discovered nodes first.
Algorithm
For this algorithm we will use a Stack to store the nodes we need to visit. We should also maintain a list of the nodes we visit. Every node also has a setting to say if it has been discovered or not, it will also have a setting to say when it has been completely explored. The node will be completely explored when each neighbour has been discovered.
So we:
- Create structure to store Discovered & Completely Explored
- Create structure for Visited Nodes
- Add starting node to the Stack
Then while we have an item on the Stack:
- Set the Current Node = Stack.Pop
- Set Explored of Current Node = true
- Add Current Node to Visited Nodes
- For each neighbour of Current Node
- If Not Discovered
- Push neighbour to Stack
- Set Discovered of neighbour to true
- If Not Discovered
Applications of Depth First Traversal
- Solving puzzles with a single solution (eg maze)
- Path finding, the items on the stack when your reach destination should be the path
- Finding cycles in a graph
Searching Algorithms
Linear Search
Going through every entry in list to find what you are looking for.
The complexity is O(n) because the complexity increases at the same rate as the size of the list.
Binary Search
- Find the middle value of the list and compare it to what you are looking for.
- If the value is lower cut the list to Start..(Middle - 1).
- If the value is higher cut the list to (Middle+1)..End.
- Repeat above steps
The list must be sorted for this to work.
The complexity is O(log N) because a bigger change to the size of the list produces a small increase in the complexity.
If you double the size of the list you add just 1 extra comparison.
Sorting Algorithms
Bubble Sort
Merge Sort
Optimisation Algorithms
Dijkstra's Shortest Path
Order of Complexity
O(1) - Constant
If the size of the input increases it still takes exactly the same amount of processing.
O(log N) - Logarithmic
If the size of the input increases the the amount of processing increases by a much smaller amount.
eg: Binary Search.
O(N) - Linear
If the size of the input increases the amount of processing increases at the same rate.
eg: Linear Search.
O(N log N) - Linearithmic
If the size of the input increases the amount of processing increases at a faster rate.
eg: Merge Sort.
O(N2) - Polynomial
If the size of the input increases the amount of processing increases at a rate to the power of 2.
eg: Bubble Sort.
O(kN) - Exponential
If the size of the input increases the amount of processing increases by the power of N.
Problems with O(kn) can only be solved for a small values of the input n.
This is Intractable.
eg: Chess board problem (ie add one grain onto the first square & double the previous for every other square).
O(N!) - Factorial
If the size of the input increases the amount of processing increases by the factorial of N.
This is Intractable.
eg: Travelling Salesman Problem (a route which goes through every node and back to the start for the lowest cost).
Halting Problem
The halting problem asks:
‘Is it possible to write a program that can tell given any program and its inputs and without executing the program, whether it will halt?’
A method to do this for any program has been proven to not exist.