Session 4: Data Types
A. Introduction
- A data type defines a collection of data objects and a set of predefined operations on those objects.
- A descriptor is the collection of the attributes of a variable.
- An object represents an instance of a user-defined (abstract data) type.
B. Primitive Data Type
- Almost all programming languages provide a set of primitive data types.
- Primitive data types: Those not defined in terms of other data types.
- Some primitive data types are merely reflections of the hardware.
- Others require only a little non-hardware support for their implementation.
- Primitive data types:
- Integer
- Floating Point.
- Double Floating Point.
- Valueless
- Wide Character.
- Boolean
- Character
C. Character String Types
- Values are sequences of characters.
- Typical operations:
- Assignment and copying.
- Comparison (=, >, etc.).
- Catenation
- Substring reference.
- Pattern matching.
D. User-Defined Ordinal Types
- An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers
- Enumeration Types:
- All possible values, which are named constants, are provided in the definition.
- C++ Example:
- enum days {mon, tue, wed, thu, fri, sat, sun};
- Evaluation of Enumerated Type
- Aid to readability, e.g., no need to code a color as a number.
- Aid to reliability, e.g., compiler can check:
- Operations (don’t allow colors to be added).
- No enumeration variable can be assigned a value outside its defined range.
- Ada, C#, and Java 5.0 provide better support for enumeration than C++ because enumeration type variables in these languages are not coerced into integer types.
- Implementation of User-Defined Ordinal Types
- Enumeration types are implemented as integers.
- Subrange types are implemented like the parent types with code inserted (by the compiler) to restrict assignments to subrange variables.
E. Array Types
- An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element.
- Subscript Binding and Array Categories
- C and C++ arrays that include static modifier are static.
- C and C++ arrays without static modifier are fixed stack-dynamic.
- C and C++ provide fixed heap-dynamic arrays.
- Array Initialization:
- C, C++, Java, C# example:
int list [] = {4, 5, 7, 83}
- Character strings in C and C++:
char name [] = ″freddie″;
- Arrays of strings in C and C++:
char *names [] = {″Bob″, ″Jake″, ″Joe″];
- Heterogeneous Arrays
- A heterogeneous array is one in which the elements need not be of the same type
- Array Initialization:
- int list [] = {1, 3, 5, 7};
- char *names [] = {″Mike″, ″Fred″, ″Mary Lou″};
- Slices
- A slice is some substructure of an array; nothing more than a referencing mechanism.
- Slices are only useful in languages that have array operations.
Example: #include <iostream> // std::cout
#include <cstddef> // std::size_t
#include <valarray> // std::valarray, std::slice
int main ()
{
std::valarray<int> foo (12);
for (int i=0; i<12; ++i) foo[i]=i;
std::valarray<int> bar = foo[std::slice(2,3,4)];
std::cout << “slice(2,3,4):”;
for (std::size_t n=0; n<bar.size(); n++)
std::cout << ‘ ‘ << bar[n];
std::cout << ‘\n’;
return 0;
}
- Implementation Of Arrays
- Access function maps subscript expressions to an address in the array.
- Access function for single-dimensioned arrays:
- Access function maps subscript expressions to an address in the array.
address(list[k]) = address (list[lower_bound])
+ ((k-lower_bound) * element_size)
- Accessing Multi-Dimensioned Arrays
- Two common ways:
- Row major order (by rows) – used in most languages
- Column major order (by columns) – used in Fortran
- A compile-time descriptor for a multidimensional array
- Compile-Time Descriptors
Single-Dimensioned array Multidimensional array
F. Associative Arrays
- An associative array is an unordered collection of data elements that are indexed by an equal number of values called
- Example:
#include <iostream>
#include <map>
#include <string>
#include <algorithm>
#include <cstdlib>
#include <iomanip>
#include <vector>
using namespace std;
int main()
{
map<string, map<string, vector<string> > > data;
data[“plants”][“trees”].push_back(“Elm”);
data[“plants”][“trees”].push_back(“Pine”);
data[“plants”][“trees”].push_back(“Ash”);
data[“plants”][“trees”].push_back(“Birch”);
data[“plants”][“trees”].push_back(“Ceder”);
vector<string>::iterator it;
for(
it = data[“plants”][“trees”].begin();
it != data[“plants”][“trees”].end();
it++)
{
cout << “Kind of tree: ” << *it << endl;
}
return 0;
}
G. Record Types
- A record is a possibly heterogeneous aggregate of data elements in which the individual elements are identified by names
- Example:
Student (idnumber, name, status, credits, gpa)
enum StatusType {FULLTIME, PARTTIME, NONDEGREE, GRAD}
struct StudentType
{
int idNumber;
char name[30];
StatusType status;
float credits;
float gpa;
};
StudentType student1;
StudentType student2;
§ Implementation Of Record Type
- Offset address relative to the beginning of the records is associated with each field
H. Tuple Types
§ A tuple is a data type that is similar to a record, except that the elements are not named
I. List Types
§ Lists in LISP and Scheme are delimited by parentheses and use no commas
(A B C D) and (A (B C) D)
§ Data and code have the same form
As data, (A B C) is literally what it is
As code, (A B C) is the function A applied to the parameters B and C
§ The interpreter needs to know which a list is, so if it is data, we quote it with an apostrophe. ′(A B C) is data
§ List Operations in Scheme
- CAR returns the first element of its list parameter (CAR ′(A B C)) returns A
- CDR returns the remainder of its list parameter after the first element has been removed(CDR ′(A B C)) returns (B C)
- CONS puts its first parameter into its second parameter, a list, to make a new list(CONS ′A (B C)) returns (A B C)
- LIST returns a new list of its parameters(LIST ′A ′B ′(C D)) returns (A B (C D))
J. Union Types
§ A union is a type whose variables are allowed to store different type values at different times during execution
§ Implementation of Unions :
type Node (Tag : Boolean) is
record
case Tag is
when True => Count : Integer;
when False => Sum : Float;
end case;
end record;
K. Pointer and Reference Types
§ A pointer type variable has a range of values that consists of memory addresses and a special value, nil.
§ Provide the power of indirect addressing.
§ Provide a way to manage dynamic memory.
§ A pointer can be used to access a location in the area where storage is dynamically created (usually called a heap).
§ Pointer Operations:
- Two fundamental operations: assignment and dereferencing.Ø Assignment is used to set a pointer variable’s value to some useful address.
- Dereferencing yields the value stored at the location represented by the pointer’s valueü
- Dereferencing can be explicit or implicit
- C++ uses an explicit operation via*
j = *ptr sets j to the value located at ptr
The assignment operation j = *ptr
§ Problems with Pointers
- Dangling pointers (dangerous)
- A pointer points to a heap-dynamic variable that has been deallocated
- Lost heap-dynamic variable
- An allocated heap-dynamic variable that is no longer accessible to the user program (often called garbage)
§ Pointers in C and C++
- Extremely flexible but must be used with care
- Pointers can point at any variable regardless of when or where it was allocatedØ Used for dynamic storage management and addressing
- Pointer arithmetic is possible
- Explicit dereferencing and address-of operators
- Domain type need not be fixed (void *)
void * can point to any type and can be type checked (cannot be de-referenced)
§ Pointer Arithmetic in C and C++
float stuff[100];
float *p;
p = stuff;
*(p+5) is equivalent to stuff[5] and p[5]
*(p+i) is equivalent to stuff[i] and p[i]
§ Reference Types
- C++ includes a special kind of pointer type called a reference type that is used primarily for formal parameters
- Java extends C++’s reference variables and allows them to replace pointers entirely
- C# includes both the references of Java and the pointers of C++
§ Representations of Pointers
- Large computers use single values
- Intel microprocessors use segment and offset
§ Dangling Pointer Problem
- Tombstone: extra heap cell that is a pointer to the heap-dynamic variable
- The actual pointer variable points only at tombstones
- When heap-dynamic variable de-allocated, tombstone remains but set to nil
- Costly in time and space
- Locks-and-keys: Pointer values are represented as (key, address) pairs
- Heap-dynamic variables are represented as variable plus cell for integer lock value
- When heap-dynamic variable allocated, lock value is created and placed in lock cell and key cell of pointer
§ Heap Management
- A very complex run-time process
- Single-size cells vs. variable-size cells
- Two approaches to reclaim garbage
- Reference counters (eager approach): reclamation is gradual
- Mark-sweep (lazy approach): reclamation occurs when the list of variable space becomes empty
§ Reference Counter
- Reference counters: maintain a counter in every cell that store the number of pointers currently pointing at the cell
- Disadvantages: space required, execution time required, complications for cells connected circularly
- Advantage: it is intrinsically incremental, so significant delays in the application execution are avoided
L. Type Checking
§ Generalize the concept of operands and operators to include subprograms and assignments
§ Type checking is the activity of ensuring that the operands of an operator are of compatible types
§ A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler- generated code, to a legal type. This automatic conversion is called a coercion.
§ A type error is the application of an operator to an operand of an inappropriate type
§ If all type bindings are static, nearly all type checking can be static
§ If type bindings are dynamic, type checking must be dynamic
§ A programming language is strongly typed if type errors are always detected
§ Advantage of strong typing: allows the detection of the misuses of variables that result in type errors
M. Strong Typing
Language examples:
- C and C++ are not: parameter type checking can be avoided; unions are not type checked
- Ada is, almost (UNCHECKED CONVERSION is loophole)(Java and C# are similar to Ada)
§ Coercion rules strongly affect strong typing–they can weaken it considerably (C++ versus Ada)
§ Although Java has just half the assignment coercions of C++, its strong typing is still far less effective than that of Ada
Source(s) :
http://jcsites.juniata.edu/faculty/rhodes/cs2/ch05a.htmhttp://www.dreamincode.net/forums/topic/67804-c-multidimensional-associative-arrays/http://www.cplusplus.com/reference/valarray/slice/Slide Binus Maya