CHAPTER 1: DATA REPRESENTATION

1.1 USER-DEFINED DATA TYPES

1.1.1 Introduction to User-Defined Data Types

Definition: User-defined data types are data types designed by the programmer. When object-oriented programming is not being used, a programmer may choose to utilize user-defined data types for a large program as their use can reduce errors and make the program more understandable.

Why Use User-Defined Data Types:

Reduce errors in the program
Make code more understandable
Less restriction than built-in types
Allows for inevitable user definition

1.1.2 Non-Composite User-Defined Data Types

Enumerated Data Type: A non-composite user-defined data type that is a list of possible data values. The values defined have an implied order of values to allow comparisons.

<PSEUDOCODE>

TYPE Season = (Summer, Winter, Autumn, Spring)

DECLARE ThisSeason : Season
DECLARE NextSeason : Season

ThisSeason ← Autumn
NextSeason ← ThisSeason + 1 // NextSeason is set to Spring

Characteristics:

Values are countable and finite
Allow comparisons (value2 > value1)
NOT string values (don't use quotes)

Pointer Data Type: Used to reference a memory location. It may be used to construct dynamically varying data structures. The pointer definition has to relate to the type of the variable being pointed to.

<PSEUDOCODE>

TYPE IntegerPointer = ^INTEGER

DECLARE IPointer : IntegerPointer
DECLARE MyInt1 : INTEGER
DECLARE MyInt2 : INTEGER

// Store address of MyInt2 in IPointer
IPointer ← @MyInt2

// Access data at address pointed to by IPointer
IPointer^ ← 33

// Store address of MyInt1 in SecondPointer
SecondPointer ← @MyInt1

Pointer Notation:

@identifier - returns the address of identifier
Pointer^ - accesses the data stored at the address (dereferencing)

1.1.3 Composite User-Defined Data Types

Record Data Type: A data type that contains a fixed number of components that can be of different types. Allows the programmer to collect values with other data types together when these form a coherent whole.

<PSEUDOCODE>

TYPE TEmployeeRecord
 DECLARE FirstName : STRING
 DECLARE LastName : STRING
 DECLARE Salary : REAL
 DECLARE Position : STRING
ENDTYPE

DECLARE Employee1 : TEmployeeRecord

Employee1.FirstName ← "John"
Employee1.LastName ← "Doe"
Employee1.Salary ← 2830.80
Employee1.Position ← "Project Manager"

Set Data Type: Allows a program to create sets and to apply mathematical operations defined in set theory. All elements in the set should be unique.

<PSEUDOCODE>

TYPE Days = SET OF STRING
DEFINE Today(Monday, Tuesday, Wednesday, Thursday, Friday) : Days

Operations on Sets:

Union
Difference
Intersection
Include an element
Exclude an element
Check whether an element is in a set

Class (in OOP): In object-oriented programming, a program defines the classes to be used. Then, for each class, the objects must be defined. A Class includes variables and methods (functions or procedures that an object can run).

1.2 FILE ORGANISATION AND ACCESS

1.2.1 Types of Files

Text Files:

Contains data stored according to a defined character code (ASCII or Unicode)
Can be created using a text editor
Data appears as readable characters

Binary Files:

Designed for storing data to be used by a computer program
Stores data in its internal representation
Created using specific programs
Structure: File → Records → Fields → Values

1.2.2 Methods of File Organisation

Serial Files:

Records have no defined order
Stored one after another in the order they were added
New records added at the end of the file
No end of record character (must have defined format)

Advantages:

Simple task
Low cost

Disadvantages:

Difficult to access specific records
Must read all preceding records
Cannot support modern high-speed requirements

File Access: Sequential access only

Uses:

Batch processing
Backing up data on magnetic tape
Bank transactions

Sequential Files:

Records ordered using a key field
Key field values must be unique and ordered
More efficient than serial files due to data integrity
New records must be added in correct position

File Access Methods:

Sequential Access: Read key field values until required value found
Direct Access: Use index to look up address of record location

Random Files:

Records stored randomly
Accessed directly using hashing algorithm
Uses hashing on record's key field to calculate address
Well suited for magnetic and optical disks

Advantages:

Quick retrieval of records
Records may vary in size

1.2.3 Hashing Algorithms

Definition: Takes the key field as input and outputs a value for the record's position relative to the file's start.

Example:

<TEXT>

If key field is numeric: Divide by suitable large number and use remainder
Position = Key MOD FileSize

Handling Collisions: When two different keys produce the same position, use the next available position in the file.

File Access:

Value in key field submitted to hashing algorithm
Algorithm provides position in file
May require short linear search due to collisions

1.3 FLOATING-POINT NUMBERS

1.3.1 Floating-Point Representation

Definition: The approximate representation of a real number using binary digits.

Format:

<TEXT>

Number = ±Mantissa × Base^Exponent

Mantissa: The non-zero part of the number
Exponent: The power to which the base is raised
Base: 2 (in binary floating-point)

1.3.2 Converting Denary to Floating-Point Binary

Steps:

Convert whole number part to binary
Add sign bit (0 for positive, 1 for negative)
Convert fractional part by multiplying by 2 and recording whole parts
Combine parts and adjust exponent
Normalise the number

Example: Converting 8.75 to floating-point (8-bit: 4 for mantissa, 4 for exponent)

<TEXT>

Step 1: Convert whole number
8 = 1000

Step 2: Convert fractional part
0.75 × 2 = 1.5 → 1
0.5 × 2 = 1.0 → 1
0.0 × 2 = 0 → 0
0.75 = 0.11 (binary)

Step 3: Combine
8.75 = 1000.11

Step 4: Normalise
1000.11 = 0.100011 × 2^4

Step 5: Represent
Mantissa: 10001100 (0.100011 with padding)
Exponent: 0100 (4 in 4-bit two's complement)
Sign: 0

Final: 0 10001100 0100
(Sign) (Mantissa) (Exponent)

1.3.3 Normalisation

Purpose:

Maximise precision using all available bits in mantissa
Ensure most significant bits are different (0 1 for positive, 1 0 for negative)
Avoid multiple representations of the same number

Process:

For positive numbers: Shift left until first bit after binary point is 1
For negative numbers: Shift left until first two bits are different (10)
Each shift left decreases exponent by 1

1.3.4 Problems with Floating-Point Numbers

Rounding Errors:
- Binary representation of fractions is often approximate
- Can become significant after multiple calculations
Overflow:
- Result too large to represent
- Occurs when exponent exceeds maximum
Underflow:
- Result too small to represent
- May be rounded to zero
Inability to Store Zero:
- Normalised form requires mantissa to be 0.1 or 1.0
- Zero must be handled as special case

1.3.5 Precision vs Range

Increase	Effect
Mantissa bits	Better precision
Exponent bits	Larger range

Trade-off: Must balance precision and range based on application needs