CHAPTER 1: DATA REPRESENTATION 1.1 USER-DEFINED DATA TYPES 1.1.1 Introduction to User-Defined Data Types Definition: User-defined data types are data types designed by the programmer. When object-oriented programming is not being used, a programmer may choose to utilize user-defined data types for a large program as their use can reduce errors and make the program more understandable. Why Use User-Defined Data Types: Reduce errors in the program Make code more understandable Less restriction than built-in types Allows for inevitable user definition 1.1.2 Non-Composite User-Defined Data Types Enumerated Data Type: A non-composite user-defined data type that is a list of possible data values. The values defined have an implied order of values to allow comparisons. TYPE Season = (Summer, Winter, Autumn, Spring) DECLARE ThisSeason : Season DECLARE NextSeason : Season ThisSeason ← Autumn NextSeason ← ThisSeason + 1 // NextSeason is set to Spring Characteristics: Values are countable and finite Allow comparisons (value2 > value1) NOT string values (don't use quotes) Pointer Data Type: Used to reference a memory location. It may be used to construct dynamically varying data structures. The pointer definition has to relate to the type of the variable being pointed to. TYPE IntegerPointer = ^INTEGER DECLARE IPointer : IntegerPointer DECLARE MyInt1 : INTEGER DECLARE MyInt2 : INTEGER // Store address of MyInt2 in IPointer IPointer ← @MyInt2 // Access data at address pointed to by IPointer IPointer^ ← 33 // Store address of MyInt1 in SecondPointer SecondPointer ← @MyInt1 Pointer Notation: @identifier  - returns the address of identifier Pointer^  - accesses the data stored at the address (dereferencing) 1.1.3 Composite User-Defined Data Types Record Data Type: A data type that contains a fixed number of components that can be of different types. Allows the programmer to collect values with other data types together when these form a coherent whole. TYPE TEmployeeRecord DECLARE FirstName : STRING DECLARE LastName : STRING DECLARE Salary : REAL DECLARE Position : STRING ENDTYPE DECLARE Employee1 : TEmployeeRecord Employee1.FirstName ← "John" Employee1.LastName ← "Doe" Employee1.Salary ← 2830.80 Employee1.Position ← "Project Manager" Set Data Type: Allows a program to create sets and to apply mathematical operations defined in set theory. All elements in the set should be unique. TYPE Days = SET OF STRING DEFINE Today(Monday, Tuesday, Wednesday, Thursday, Friday) : Days Operations on Sets: Union Difference Intersection Include an element Exclude an element Check whether an element is in a set Class (in OOP): In object-oriented programming, a program defines the classes to be used. Then, for each class, the objects must be defined. A Class includes variables and methods (functions or procedures that an object can run). 1.2 FILE ORGANISATION AND ACCESS 1.2.1 Types of Files Text Files: Contains data stored according to a defined character code (ASCII or Unicode) Can be created using a text editor Data appears as readable characters Binary Files: Designed for storing data to be used by a computer program Stores data in its internal representation Created using specific programs Structure: File → Records → Fields → Values 1.2.2 Methods of File Organisation Serial Files: Records have no defined order Stored one after another in the order they were added New records added at the end of the file No end of record character (must have defined format) Advantages: Simple task Low cost Disadvantages: Difficult to access specific records Must read all preceding records Cannot support modern high-speed requirements File Access: Sequential access only Uses: Batch processing Backing up data on magnetic tape Bank transactions Sequential Files: Records ordered using a key field Key field values must be unique and ordered More efficient than serial files due to data integrity New records must be added in correct position File Access Methods: Sequential Access:  Read key field values until required value found Direct Access:  Use index to look up address of record location Random Files: Records stored randomly Accessed directly using hashing algorithm Uses hashing on record's key field to calculate address Well suited for magnetic and optical disks Advantages: Quick retrieval of records Records may vary in size 1.2.3 Hashing Algorithms Definition: Takes the key field as input and outputs a value for the record's position relative to the file's start. Example: If key field is numeric: Divide by suitable large number and use remainder Position = Key MOD FileSize Handling Collisions: When two different keys produce the same position, use the next available position in the file. File Access: Value in key field submitted to hashing algorithm Algorithm provides position in file May require short linear search due to collisions 1.3 FLOATING-POINT NUMBERS 1.3.1 Floating-Point Representation Definition: The approximate representation of a real number using binary digits. Format: Number = ±Mantissa × Base^Exponent Mantissa:  The non-zero part of the number Exponent:  The power to which the base is raised Base:  2 (in binary floating-point) 1.3.2 Converting Denary to Floating-Point Binary Steps: Convert whole number part to binary Add sign bit (0 for positive, 1 for negative) Convert fractional part by multiplying by 2 and recording whole parts Combine parts and adjust exponent Normalise the number Example: Converting 8.75 to floating-point (8-bit: 4 for mantissa, 4 for exponent) Step 1: Convert whole number 8 = 1000 Step 2: Convert fractional part 0.75 × 2 = 1.5 → 1 0.5 × 2 = 1.0 → 1 0.0 × 2 = 0 → 0 0.75 = 0.11 (binary) Step 3: Combine 8.75 = 1000.11 Step 4: Normalise 1000.11 = 0.100011 × 2^4 Step 5: Represent Mantissa: 10001100 (0.100011 with padding) Exponent: 0100 (4 in 4-bit two's complement) Sign: 0 Final: 0 10001100 0100 (Sign) (Mantissa) (Exponent) 1.3.3 Normalisation Purpose: Maximise precision using all available bits in mantissa Ensure most significant bits are different (0 1 for positive, 1 0 for negative) Avoid multiple representations of the same number Process: For positive numbers: Shift left until first bit after binary point is 1 For negative numbers: Shift left until first two bits are different (10) Each shift left decreases exponent by 1 1.3.4 Problems with Floating-Point Numbers Rounding Errors: Binary representation of fractions is often approximate Can become significant after multiple calculations Overflow: Result too large to represent Occurs when exponent exceeds maximum Underflow: Result too small to represent May be rounded to zero Inability to Store Zero: Normalised form requires mantissa to be 0.1 or 1.0 Zero must be handled as special case 1.3.5 Precision vs Range Increase Effect Mantissa bits Better precision Exponent bits Larger range Trade-off: Must balance precision and range based on application needs