CHAPTER 1: DATA REPRESENTATION

1.1 USER-DEFINED DATA TYPES 

 1.1.1 Introduction to User-Defined Data Types 

 Definition: User-defined data types are data types designed by the programmer. When object-oriented programming is not being used, a programmer may choose to utilize user-defined data types for a large program as their use can reduce errors and make the program more understandable. 

 Why Use User-Defined Data Types: 

 

 Reduce errors in the program 

 Make code more understandable 

 Fewer restrictions than relying on the built-in types alone 

 Allow the programmer to define types that fit the data in the problem 

 

 1.1.2 Non-Composite User-Defined Data Types 

 Enumerated Data Type: A non-composite user-defined data type that is defined by a list of possible data values. The listed values have an implied order, which allows them to be compared. 

 

 <PSEUDOCODE>

 TYPE Season = (Summer, Winter, Autumn, Spring)
 DECLARE ThisSeason : Season
 DECLARE NextSeason : Season

 ThisSeason ← Autumn
 NextSeason ← ThisSeason + 1 // NextSeason is set to Spring

 Characteristics: 

 

 Values are countable and finite 

 Allow comparisons (value2 > value1) 

 NOT string values (don't use quotes) 
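
 The ordering and comparison behaviour described above can be sketched in Python. This is only an illustration: Python's IntEnum is used here as a stand-in for the pseudocode enumerated type, and the explicit numbers 0 to 3 are an assumption chosen to mirror the order of the pseudocode declaration.

```python
from enum import IntEnum

# Member order mirrors the pseudocode declaration:
# TYPE Season = (Summer, Winter, Autumn, Spring)
class Season(IntEnum):
    SUMMER = 0
    WINTER = 1
    AUTUMN = 2
    SPRING = 3

this_season = Season.AUTUMN
next_season = Season(this_season + 1)   # successor in the declared order

print(next_season.name)                 # SPRING
print(Season.WINTER > Season.SUMMER)    # True: the values are ordered
print(len(Season))                      # 4: the values are countable and finite
```

 Note that the members are names, not strings, which matches the "no quotes" rule above.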

 

 Pointer Data Type: Used to reference a memory location. It may be used to construct dynamically varying data structures. The pointer definition has to relate to the type of the variable being pointed to. 

 

 <PSEUDOCODE>

 TYPE IntegerPointer = ^INTEGER
 DECLARE IPointer : IntegerPointer
 DECLARE SecondPointer : IntegerPointer
 DECLARE MyInt1 : INTEGER
 DECLARE MyInt2 : INTEGER

 // Store address of MyInt2 in IPointer
 IPointer ← @MyInt2

 // Store 33 at the address pointed to by IPointer (MyInt2 becomes 33)
 IPointer^ ← 33

 // Store address of MyInt1 in SecondPointer
 SecondPointer ← @MyInt1

 Pointer Notation: 

 

 @identifier  - returns the address of identifier 

 Pointer^  - accesses the data stored at the address (dereferencing) 
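
 Python has no pointer syntax of its own, so the sketch below uses the standard ctypes module to imitate the two notations. Treat it as a rough analogy rather than an exact equivalent: ctypes.pointer plays the role of @, and indexing the pointer with [0] plays the role of ^. The variable names simply mirror the pseudocode.

```python
import ctypes

my_int1 = ctypes.c_int(0)
my_int2 = ctypes.c_int(0)

# Roughly IPointer ← @MyInt2: take the address of my_int2
i_pointer = ctypes.pointer(my_int2)

# Roughly IPointer^ ← 33: write to the location the pointer refers to
i_pointer[0] = 33

# Roughly SecondPointer ← @MyInt1
second_pointer = ctypes.pointer(my_int1)

print(my_int2.value)                   # 33: changed through the pointer
print(second_pointer.contents.value)   # 0: my_int1 is untouched
```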

 

 1.1.3 Composite User-Defined Data Types 

 Record Data Type: A data type that contains a fixed number of components (fields), which can be of different types. It allows the programmer to group values of different data types together when they form a coherent whole. 

 

 <PSEUDOCODE>

 TYPE TEmployeeRecord
     DECLARE FirstName : STRING
     DECLARE LastName : STRING
     DECLARE Salary : REAL
     DECLARE Position : STRING
 ENDTYPE

 DECLARE Employee1 : TEmployeeRecord

 Employee1.FirstName ← "John"
 Employee1.LastName ← "Doe"
 Employee1.Salary ← 2830.80
 Employee1.Position ← "Project Manager"
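
 For comparison, a record of this shape could be sketched in Python with a dataclass. The class and field names below are adapted from the pseudocode, and the salary update at the end is an added illustration, not part of the original example.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    # A fixed set of fields, each with its own type
    first_name: str
    last_name: str
    salary: float
    position: str

employee1 = EmployeeRecord("John", "Doe", 2830.80, "Project Manager")

# Fields are accessed with dot notation, as in the pseudocode
employee1.salary = 3000.00
print(employee1.first_name, employee1.last_name, employee1.position)
```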

 

 

 

 

 

 

 

 Set Data Type: Allows a program to create sets and to apply the mathematical operations defined in set theory. Every element in a set is unique, so a value cannot appear in the same set twice. 

 

 <PSEUDOCODE>

 TYPE Days = SET OF STRING
 DEFINE Today ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday") : Days

 Operations on Sets: 

 

 Union 

 Difference 

 Intersection 

 Include an element 

 Exclude an element 

 Check whether an element is in a set 
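
 The operations listed above map directly onto Python's built-in set type. The sketch below is illustrative only; the set busy_days and its contents are invented for the example.

```python
weekdays = {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday"}
busy_days = {"Monday", "Friday", "Saturday"}   # hypothetical second set

union = weekdays | busy_days           # in either set
difference = weekdays - busy_days      # in weekdays but not in busy_days
intersection = weekdays & busy_days    # in both sets

week = set(weekdays)       # work on a copy
week.add("Saturday")       # include an element
week.add("Saturday")       # adding it again has no effect: elements are unique
week.discard("Monday")     # exclude an element

print("Saturday" in week)  # True: membership check
print(len(week))           # 5
```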

 

 Class (in OOP): In object-oriented programming, the program defines the classes to be used and then creates objects (instances) of each class. A class groups together attributes (variables) and methods (the functions or procedures that an object can run). 
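
 A minimal Python sketch of this idea follows. The Employee class, its attributes and the add_to_salary method are all hypothetical names chosen for illustration.

```python
class Employee:
    def __init__(self, name, salary):
        # Variables that belong to each object
        self.name = name
        self.salary = salary

    def add_to_salary(self, amount):
        # A method: a procedure that an object of this class can run
        self.salary += amount

# The class is defined once; objects are then created from it
employee = Employee("John Doe", 2000)
employee.add_to_salary(150)
print(employee.name, employee.salary)   # John Doe 2150
```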

 

 1.2 FILE ORGANISATION AND ACCESS 

 1.2.1 Types of Files 

 Text Files: 

 

 Contains data stored according to a defined character code (ASCII or Unicode) 

 Can be created using a text editor 

 Data appears as readable characters 

 

 Binary Files: 

 

 Designed for storing data to be used by a computer program 

 Stores data in its internal representation 

 Created using specific programs 

 Structure: File → Records → Fields → Values 
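
 The difference between the two file types can be seen by encoding the same value both ways. The sketch below uses Python's standard struct module; the format strings "<i" and "<i10s" (a little-endian 4-byte integer, optionally followed by a 10-byte name field) are assumptions made for the example, not a layout given in these notes.

```python
import struct

value = 1000

as_text = str(value).encode("ascii")   # character codes for '1', '0', '0', '0'
as_binary = struct.pack("<i", value)   # 4-byte little-endian signed integer

print(as_text)      # b'1000'
print(as_binary)    # b'\xe8\x03\x00\x00'

# A fixed-format binary record: an integer field and a 10-byte name field
record = struct.pack("<i10s", 7, b"Doe")
employee_id, raw_name = struct.unpack("<i10s", record)
print(len(record), employee_id, raw_name.rstrip(b"\x00"))   # 14 7 b'Doe'
```

 The text form is readable in an editor, while the binary form is only meaningful to a program that knows the record layout.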

 

 1.2.2 Methods of File Organisation 

 Serial Files: 

 

 Records have no defined order 

 Stored one after another in the order they were added 

 New records added at the end of the file 

 No end of record character (must have defined format) 

 

 Advantages: 

 

 Simple to implement and maintain 

 Low cost 

 

 Disadvantages: 

 

 Difficult to access specific records 

 Must read all preceding records 

 Cannot support modern high-speed requirements 

 

 File Access: Sequential access only 

 Uses: 

 

 Batch processing 

 Backing up data on magnetic tape 

 Bank transactions 
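
 Serial organisation and sequential access can be sketched as follows. For simplicity this sketch assumes one comma-separated record per line of a text file, which differs from the fixed-format records described above; the file name and record contents are invented.

```python
import os
import tempfile

def append_record(path, record):
    """Add a record to the end of the file, keeping arrival order."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")

def find_record(path, wanted_id):
    """Sequential access: read each record in turn until a match is found."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            record_id, name = line.rstrip("\n").split(",")
            if record_id == wanted_id:
                return name
    return None   # reached the end of the file without a match

path = os.path.join(tempfile.mkdtemp(), "serial.txt")
for rec in ["103,Khan", "101,Doe", "102,Lee"]:   # no defined order
    append_record(path, rec)

print(find_record(path, "102"))   # Lee (found only after reading 103 and 101)
```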

 

 Sequential Files: 

 

 Records ordered using a key field 

 Key field values must be unique and ordered 

 More efficient to search than serial files because the records are held in key order 

 New records must be added in correct position 

 

 File Access Methods: 

 

 Sequential Access:  Read key field values until required value found 

 Direct Access:  Use index to look up address of record location 
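
 The sketch below models a sequential file as an in-memory list of (key, data) pairs to show ordered insertion and both access methods. The list, the function names and the dictionary used as an index are all assumptions for illustration; a real file would store positions on disk.

```python
import bisect

records = [(101, "Doe"), (104, "Lee"), (109, "Khan")]   # ordered by key field

def insert_record(records, key, data):
    """Insert a new record at the position that keeps the keys in order."""
    keys = [k for k, _ in records]
    pos = bisect.bisect_left(keys, key)
    if pos < len(records) and records[pos][0] == key:
        raise ValueError("key field values must be unique")
    records.insert(pos, (key, data))

def sequential_search(records, wanted):
    """Read key values in order until the required one is found."""
    for key, data in records:
        if key == wanted:
            return data
        if key > wanted:   # passed the place it would occupy, so stop early
            return None
    return None

insert_record(records, 106, "Patel")

# Direct access: an index maps each key to the record's position
index = {key: pos for pos, (key, _) in enumerate(records)}

print(sequential_search(records, 106))   # Patel
print(records[index[109]][1])            # Khan
```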

 

 Random Files: 

 

 Records stored randomly 

 Accessed directly using hashing algorithm 

 Uses hashing on record's key field to calculate address 

 Well suited for magnetic and optical disks 

 

 Advantages: 

 

 Quick retrieval of records 

 Records may vary in size 

 

 1.2.3 Hashing Algorithms 

 Definition: A hashing algorithm takes the value of the key field as input and outputs the record's position relative to the start of the file. 

 Example: 

 

 <TEXT>

 If key field is numeric: divide by a suitably large number and use the remainder
 Position = Key MOD FileSize

 Handling Collisions: When two different keys produce the same position, use the next available position in the file. 

 File Access: 

 

 Value in key field submitted to hashing algorithm 

 Algorithm provides position in file 

 May require short linear search due to collisions 
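
 The hashing, collision handling and access steps above can be sketched with a list standing in for the file. FILE_SIZE, the function names and the wrap-around to the start of the file when the end is reached are assumptions made for this example.

```python
FILE_SIZE = 11                 # assumed number of record positions in the file
slots = [None] * FILE_SIZE     # None marks an unused position

def hash_position(key):
    return key % FILE_SIZE     # Position = Key MOD FileSize

def store(key, data):
    pos = hash_position(key)
    for _ in range(FILE_SIZE):
        if slots[pos] is None:
            slots[pos] = (key, data)
            return pos
        pos = (pos + 1) % FILE_SIZE   # collision: try the next position
    raise RuntimeError("file is full")

def fetch(key):
    pos = hash_position(key)
    for _ in range(FILE_SIZE):        # short linear search from the hashed position
        if slots[pos] is None:
            return None
        if slots[pos][0] == key:
            return slots[pos][1]
        pos = (pos + 1) % FILE_SIZE
    return None

print(store(25, "Doe"))   # 3
print(store(36, "Lee"))   # 4 (36 MOD 11 is also 3, so the next position is used)
print(fetch(36))          # Lee
```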

 

 

 1.3 FLOATING-POINT NUMBERS 

 1.3.1 Floating-Point Representation 

 Definition: The approximate representation of a real number using binary digits. 

 Format: 

 

 <TEXT>

 Number = ±Mantissa × Base^Exponent

 Mantissa: The significant digits of the number, stored as a binary fraction 

 Exponent:  The power to which the base is raised 

 Base:  2 (in binary floating-point) 
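
 As a quick check of this format, Python's standard math module can split a number into a mantissa and a base-2 exponent and put it back together. The value 8.75 is chosen here only as a convenient example.

```python
import math

# frexp returns (mantissa, exponent) with 0.5 <= |mantissa| < 1
mantissa, exponent = math.frexp(8.75)
print(mantissa, exponent)               # 0.546875 4

# Number = Mantissa × 2^Exponent
print(math.ldexp(mantissa, exponent))   # 8.75
print(mantissa * 2 ** exponent)         # 8.75
```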

 

 1.3.2 Converting Denary to Floating-Point Binary 

 Steps: 

 

 Convert whole number part to binary 

 Add sign bit (0 for positive, 1 for negative) 

 Convert fractional part by multiplying by 2 and recording whole parts 

 Combine parts and adjust exponent 

 Normalise the number 

 

 Example: Converting 8.75 to floating-point (1 sign bit, 8-bit mantissa, 4-bit exponent) 

 <TEXT>

 Step 1: Convert whole number
 8 = 1000

 Step 2: Convert fractional part
 0.75 × 2 = 1.5 → 1
 0.5 × 2 = 1.0 → 1 (fraction is now 0, so stop)
 0.75 = 0.11 (binary)

 Step 3: Combine
 8.75 = 1000.11

 Step 4: Normalise
 1000.11 = 0.100011 × 2^4

 Step 5: Represent
 Sign: 0 (positive)
 Mantissa: 10001100 (the bits after the binary point in 0.100011, padded to 8 bits)
 Exponent: 0100 (4 in 4-bit two's complement)

 Final: 0 10001100 0100
 (Sign) (Mantissa) (Exponent)
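
 The worked example can be reproduced with a short Python function. This is a sketch under assumptions: it uses the layout shown in the example (a separate sign bit, an 8-bit mantissa and a 4-bit two's complement exponent), it truncates rather than rounds any bits that do not fit, and the function name is invented.

```python
def to_floating_point(value, mantissa_bits=8, exponent_bits=4):
    """Return (sign, mantissa, exponent) bit strings for a non-zero number."""
    if value == 0:
        raise ValueError("zero has no normalised form")
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)

    # Normalise: scale the magnitude into [0.5, 1) and track the exponent
    exponent = 0
    while magnitude >= 1:
        magnitude /= 2
        exponent += 1
    while magnitude < 0.5:
        magnitude *= 2
        exponent -= 1

    # Convert the fraction: multiply by 2 and record the whole part each time
    bits = ""
    for _ in range(mantissa_bits):
        magnitude *= 2
        bit = int(magnitude)
        bits += str(bit)
        magnitude -= bit

    # Store the exponent in two's complement
    low, high = -(2 ** (exponent_bits - 1)), 2 ** (exponent_bits - 1) - 1
    if not low <= exponent <= high:
        raise OverflowError("exponent does not fit in the exponent field")
    exponent_field = format(exponent % (2 ** exponent_bits), f"0{exponent_bits}b")
    return sign, bits, exponent_field

print(to_floating_point(8.75))    # ('0', '10001100', '0100')
print(to_floating_point(-0.25))   # ('1', '10000000', '1111')
```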

 

 

 

 

 

 

 

 1.3.3 Normalisation 

 Purpose: 

 

 Maximise precision using all available bits in mantissa 

 Ensure most significant bits are different (0 1 for positive, 1 0 for negative) 

 Avoid multiple representations of the same number 

 

 Process: 

 

 For positive numbers: Shift left until first bit after binary point is 1 

 For negative numbers: Shift left until first two bits are different (10) 

 Each shift left decreases exponent by 1 
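
 The shifting process can be sketched on a mantissa held as a string of bits. The sketch assumes a two's complement mantissa whose first bit is the sign bit with the binary point immediately after it, so a single rule (shift left until the first two bits differ) covers both positive and negative numbers. The function name and sample values are illustrative.

```python
def normalise(mantissa, exponent):
    """Shift a two's complement mantissa left until its first two bits differ."""
    if "1" not in mantissa:
        raise ValueError("zero cannot be normalised")
    while mantissa[0] == mantissa[1]:
        mantissa = mantissa[1:] + "0"   # shift left by one place
        exponent -= 1                   # each shift left decreases the exponent
    return mantissa, exponent

print(normalise("00010110", 5))   # ('01011000', 3)  positive: starts 01
print(normalise("11101000", 2))   # ('10100000', 0)  negative: starts 10
```

 In the second call, 1.1101000 × 2^2 and 1.0100000 × 2^0 both represent -0.75, so the value is unchanged by normalisation.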

 

 1.3.4 Problems with Floating-Point Numbers 

 

 

 Rounding Errors: 

 

 Binary representation of fractions is often approximate 

 Can become significant after multiple calculations 

 

 

 

 Overflow: 

 

 Result too large to represent 

 Occurs when exponent exceeds maximum 

 

 

 

 Underflow: 

 

 Result too small to represent 

 May be rounded to zero 

 

 

 

 Inability to Store Zero: 

 

 A normalised mantissa must begin 0.1 (positive) or 1.0 (negative), and zero fits neither pattern 

 Zero must be handled as special case 
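
 Rounding error, overflow and underflow can all be observed with Python's built-in float type, which is a 64-bit binary floating-point number on typical platforms. The specific values below are chosen only to trigger each effect.

```python
import math

# Rounding error: 0.1 and 0.2 have no exact binary representation
total = 0.1 + 0.2
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True: compare with a tolerance instead

# Overflow: the result is too large for the exponent and becomes infinity
overflowed = 1e308 * 10
print(math.isinf(overflowed))     # True

# Underflow: the result is too small to represent and is rounded to zero
underflowed = 1e-200 * 1e-200
print(underflowed)                # 0.0
```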

 

 

 

 1.3.5 Precision vs Range 

 

 

 

 

 Increase in      Effect
 Mantissa bits    Better precision
 Exponent bits    Larger range

 

 

 

 

 

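 Trade-off: For a fixed total number of bits, giving more bits to the mantissa leaves fewer for the exponent (and vice versa), so precision and range must be balanced according to the needs of the application.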