www.digitalmars.com

D Programming Language 2.0


Last update Fri May 6 14:52:57 2011

Arrays

There are four kinds of arrays:

Kinds of Arrays
Syntax Description
type* Pointers to data
type[integer] Static arrays
type[] Dynamic arrays
type[type] Associative arrays

Pointers

int* p;

These are simple pointers to data, analogous to C pointers. Pointers are provided for interfacing with C and for specialized systems work. There is no length associated with it, and so there is no way for the compiler or runtime to do bounds checking, etc., on it. Most conventional uses for pointers can be replaced with dynamic arrays, out and ref parameters, and reference types.

Static Arrays

int[3] s;

These are analogous to C arrays. Static arrays are distinguished by having a length fixed at compile time.

The total size of a static array cannot exceed 16Mb. A dynamic array should be used instead for such large arrays.

A static array with a dimension of 0 is allowed, but no space is allocated for it. It's useful as the last member of a variable length struct, or as the degenerate case of a template expansion.

Static arrays are value types. Unlike in C and D version 1, static arrays are passed to functions by value. Static arrays can also be returned by functions.

Dynamic Arrays

int[] a;

Dynamic arrays consist of a length and a pointer to the array data. Multiple dynamic arrays can share all or parts of the array data.

Array Declarations

There are two ways to declare arrays, prefix and postfix. The prefix form is the preferred method, especially for non-trivial types.

Prefix Array Declarations

Prefix declarations appear before the identifier being declared and read right to left, so:

int[] a;	// dynamic array of ints
int[4][3] b;	// array of 3 arrays of 4 ints each
int[][5] c;	// array of 5 dynamic arrays of ints.
int*[]*[3] d;	// array of 3 pointers to dynamic arrays of pointers to ints
int[]* e;	// pointer to dynamic array of ints

Postfix Array Declarations

Postfix declarations appear after the identifier being declared and read left to right. Each group lists equivalent declarations:

// dynamic array of ints
int[] a;
int a[];

// array of 3 arrays of 4 ints each
int[4][3] b;
int[4] b[3];
int b[3][4];

// array of 5 dynamic arrays of ints.
int[][5] c;
int[] c[5];
int c[5][];

// array of 3 pointers to dynamic arrays of pointers to ints
int*[]*[3] d;
int*[]* d[3];
int* (*d[3])[];

// pointer to dynamic array of ints
int[]* e;
int (*e)[];

Rationale: The postfix form matches the way arrays are declared in C and C++, and supporting this form provides an easy migration path for programmers used to it.

Usage

There are two broad kinds of operations to do on an array - affecting the handle to the array, and affecting the contents of the array. C only has operators to affect the handle. In D, both are accessible.

The handle to an array is specified by naming the array, as in p, s or a:

int* p;
int[3] s;
int[] a;

int* q;
int[3] t;
int[] b;

p = q;		// p points to the same thing q does.
p = s.ptr;	// p points to the first element of the array s.
p = a.ptr;	// p points to the first element of the array a.

s = ...;	// error, since s is a compiled in static
		// reference to an array.

a = p;		// error, since the length of the array pointed
		// to by p is unknown
a = s;		// a is initialized to point to the s array
a = b;		// a points to the same array as b does

Slicing

Slicing an array means to specify a subarray of it. An array slice does not copy the data, it is only another reference to it. For example:

int[10] a;	// declare array of 10 ints
int[] b;

b = a[1..3];	// a[1..3] is a 2 element array consisting of
		// a[1] and a[2]
foo(b[1]);	// equivalent to foo(0)
a[2] = 3;
foo(b[1]);	// equivalent to foo(3)

The [] is shorthand for a slice of the entire array. For example, the assignments to b:

int[10] a;
int[] b;

b = a;
b = a[];
b = a[0 .. a.length];

are all semantically equivalent.

Slicing is not only handy for referring to parts of other arrays, but for converting pointers into bounds-checked arrays:

int* p;
int[] b = p[0..8];

Array Copying

When the slice operator appears as the lvalue of an assignment expression, it means that the contents of the array are the target of the assignment rather than a reference to the array. Array copying happens when the lvalue is a slice, and the rvalue is an array of or pointer to the same type.

int[3] s;
int[3] t;

s[] = t;		// the 3 elements of t[3] are copied into s[3]
s[] = t[];		// the 3 elements of t[3] are copied into s[3]
s[1..2] = t[0..1];	// same as s[1] = t[0]
s[0..2] = t[1..3];	// same as s[0] = t[1], s[1] = t[2]
s[0..4] = t[0..4];	// error, only 3 elements in s
s[0..2] = t;		// error, different lengths for lvalue and rvalue

Overlapping copies are an error:

s[0..2] = s[1..3];	// error, overlapping copy
s[1..3] = s[0..2];	// error, overlapping copy

Disallowing overlapping makes it possible for more aggressive parallel code optimizations than possible with the serial semantics of C.

Array Setting

If a slice operator appears as the lvalue of an assignment expression, and the type of the rvalue is the same as the element type of the lvalue, then the lvalue's array contents are set to the rvalue.

int[3] s;
int* p;

s[] = 3;		// same as s[0] = 3, s[1] = 3, s[2] = 3
p[0..2] = 3;		// same as p[0] = 3, p[1] = 3

Array Concatenation

The binary operator ~ is the cat operator. It is used to concatenate arrays:

int[] a;
int[] b;
int[] c;

a = b ~ c;	// Create an array from the concatenation of the
		// b and c arrays

Many languages overload the + operator to mean concatenation. This confusingly leads to, does:

"10" + 3 + 4

produce the number 17, the string "1034" or the string "107" as the result? It isn't obvious, and the language designers wind up carefully writing rules to disambiguate it - rules that get incorrectly implemented, overlooked, forgotten, and ignored. It's much better to have + mean addition, and a separate operator to be array concatenation.

Similarly, the ~= operator means append, as in:

a ~= b;		// a becomes the concatenation of a and b

Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array, so:

a = b;			// a refers to b
a = b ~ c[0..0];	// a refers to a copy of b

Appending does not always create a copy, see setting dynamic array length for details.

Array Operations

Many array operations, also known as vector operations, can be expressed at a high level rather than as a loop. For example, the loop:

T[] a, b;
...
for (size_t i = 0; i < a.length; i++)
    a[i] = b[i] + 4;

assigns to the elements of a the elements of b with 4 added to each. This can also be expressed in vector notation as:

T[] a, b;
...
a[] = b[] + 4;

A vector operation is indicated by the slice operator appearing as the lvalue of an =, +=, -=, *=, /=, %=, ^=, &= or |= operator. The rvalue can be an expression consisting either of an array slice of the same length and type as the lvalue or an expression of the element type of the lvalue, in any combination. The operators supported for vector operations are the binary operators +, -, *, /, %, ^, & and |, and the unary operators - and ~.

The lvalue slice and any rvalue slices must not overlap. The vector assignment operators are evaluated right to left, and the other binary operators are evaluated left to right. All operands are evaluated exactly once, even if the array slice has zero elements in it.

The order in which the array elements are computed is implementation defined, and may even occur in parallel. An application must not depend on this order.

Implementation note: many of the more common vector operations are expected to take advantage of any vector math instructions available on the target computer.

Pointer Arithmetic

int[3] abc;			// static array of 3 ints
int[] def = [ 1, 2, 3 ];	// dynamic array of 3 ints

void dibb(int* array)
{
	array[2];		// means same thing as *(array + 2)
	*(array + 2);		// get 3rd element
}

void diss(int[] array)
{
	array[2];		// ok
	*(array + 2);		// error, array is not a pointer
}

void ditt(int[3] array)
{
	array[2];		// ok
	*(array + 2);		// error, array is not a pointer
}

Rectangular Arrays

Experienced FORTRAN numerics programmers know that multidimensional "rectangular" arrays for things like matrix operations are much faster than trying to access them via pointers to pointers resulting from "array of pointers to array" semantics. For example, the D syntax:

double[][] matrix;

declares matrix as an array of pointers to arrays. (Dynamic arrays are implemented as pointers to the array data.) Since the arrays can have varying sizes (being dynamically sized), this is sometimes called "jagged" arrays. Even worse for optimizing the code, the array rows can sometimes point to each other! Fortunately, D static arrays, while using the same syntax, are implemented as a fixed rectangular layout:

double[3][3] matrix;

declares a rectangular matrix with 3 rows and 3 columns, all contiguously in memory. In other languages, this would be called a multidimensional array and be declared as:

double matrix[3,3];

Array Length

Within the [ ] of a static or a dynamic array, the symbol $ represents the length of the array.

int[4] foo;
int[]  bar = foo;
int*   p = &foo[0];

// These expressions are equivalent:
bar[]
bar[0 .. 4]
bar[0 .. $]
bar[0 .. bar.length]


p[0 .. $]		// '$' is not defined, since p is not an array
bar[0]+$		// '$' is not defined, out of scope of [ ]

bar[$-1]	// retrieves last element of the array

Array Properties

Static array properties are:

Static Array Properties
Property Description
.init Returns an array literal with each element of the literal being the .init property of the array element type.
.sizeof Returns the array length multiplied by the number of bytes per array element.
.length Returns the number of elements in the array. This is a fixed quantity for static arrays. It is of type size_t.
.ptr Returns a pointer to the first element of the array.
.dup Create a dynamic array of the same size and copy the contents of the array into it.
.idup Create a dynamic array of the same size and copy the contents of the array into it. The copy is typed as being immutable. D 2.0 only
.reverse Reverses in place the order of the elements in the array. Returns the array.
.sort Sorts in place the order of the elements in the array. Returns the array.

Dynamic array properties are:

Dynamic Array Properties
Property Description
.init Returns null.
.sizeof Returns the size of the dynamic array reference, which is 8 on 32 bit machines.
.length Get/set number of elements in the array. It is of type size_t.
.ptr Returns a pointer to the first element of the array.
.dup Create a dynamic array of the same size and copy the contents of the array into it.
.idup Create a dynamic array of the same size and copy the contents of the array into it. The copy is typed as being immutable. D 2.0 only
.reverse Reverses in place the order of the elements in the array. Returns the array.
.sort Sorts in place the order of the elements in the array. Returns the array.

For the .sort property to work on arrays of class objects, the class definition must define the function: int opCmp(Object). This is used to determine the ordering of the class objects. Note that the parameter is of type Object, not the type of the class.

For the .sort property to work on arrays of structs or unions, the struct or union definition must define the function: int opCmp(S) or int opCmp(S*). The type S is the type of the struct or union. This function will determine the sort ordering.

Examples:

int* p;
int[3] s;
int[] a;

p.length;	// error, length not known for pointer
s.length;	// compile time constant 3
a.length;	// runtime value

p.dup;  	// error, length not known
s.dup;		// creates an array of 3 elements, copies
		// elements s into it
a.dup;		// creates an array of a.length elements, copies
		// elements of a into it

Setting Dynamic Array Length

The .length property of a dynamic array can be set as the lvalue of an = operator:

array.length = 7;

This causes the array to be reallocated in place, and the existing contents copied over to the new array. If the new array length is shorter, the array is not reallocated, and no data is copied. It is equivalent to slicing the array:

array = array[0..7];
If the new array length is longer, the remainder is filled out with the default initializer.

To maximize efficiency, the runtime always tries to resize the array in place to avoid extra copying. It will always do a copy if the new size is larger and the array was not allocated via the new operator or resizing in place would overwrite valid data in the array.

For example:
char[] a = new char[20];
char[] b = a[0..10];
char[] c = a[10..20];
char[] d = a;

b.length = 15;    // always reallocates because extending in place would
                  // overwrite other data in a.
b[11] = 'x';      // a[11] and c[1] are not affected

d.length = 1;
d.length = 20;    // also reallocates, because doing this will overwrite a and
                  // c

c.length = 12;    // may reallocate in place if space allows, because nothing
                  // was allocated after c.
c[5] = 'y';       // may affect contents of a, but not b or d because those
                  // were reallocated. 

a.length = 25;    // This always reallocates because if c extended in place,
                  // then extending a would overwrite c.  If c didn't
                  // reallocate in place, it means there was not enough space,
                  // which will still be true for a.
a[15] = 'z';      // does not affect c, because either a or c has reallocated.

To guarantee copying behavior, use the .dup property to ensure a unique array that can be resized. Also, one may use the phobos .capacity property to determine how many elements can be appended to the array without reallocating.

These issues also apply to appending arrays with the ~= operator. Concatenation using the ~ operator is not affected since it always reallocates.

Resizing a dynamic array is a relatively expensive operation. So, while the following method of filling an array:

int[] array;
while (1)
{   c = getinput();
    if (!c)
       break;
    array.length = array.length + 1;
    array[array.length - 1] = c;
}

will work, it will be inefficient. A more practical approach would be to minimize the number of resizes:

int[] array;
array.length = 100;        // guess
for (i = 0; 1; i++)
{   c = getinput();
     if (!c)
	break;
     if (i == array.length)
	array.length = array.length * 2;
     array[i] = c;
}
array.length = i;

Picking a good initial guess is an art, but you usually can pick a value covering 99% of the cases. For example, when gathering user input from the console - it's unlikely to be longer than 80.

Also, you may wish to utilize the phobos reserve function to pre-allocate array data to use with the append operator.

Functions as Array Properties

If the first parameter to a function is an array, the function can be called as if it were a property of the array:

int[] array;
void foo(int[] a, int x);

foo(array, 3);
array.foo(3);	// means the same thing

Array Bounds Checking

It is an error to index an array with an index that is less than 0 or greater than or equal to the array length. If an index is out of bounds, a RangeError exception is raised if detected at runtime, and an error if detected at compile time. A program may not rely on array bounds checking happening, for example, the following program is incorrect:

try
{
    for (i = 0; ; i++)
    {
	array[i] = 5;
    }
}
catch (ArrayBoundsError)
{
    // terminate loop
}
The loop is correctly written:
for (i = 0; i < array.length; i++)
{
    array[i] = 5;
}

Implementation Note: Compilers should attempt to detect array bounds errors at compile time, for example:

int[3] foo;
int x = foo[3];		// error, out of bounds

Insertion of array bounds checking code at runtime should be turned on and off with a compile time switch.

Array Initialization

Default Initialization

Void Initialization

Void initialization happens when the Initializer for an array is void. What it means is that no initialization is done, i.e. the contents of the array will be undefined. This is most useful as an efficiency optimization. Void initializations are an advanced technique and should only be used when profiling indicates that it matters.

Static Initialization of Static Arrays

Static initalizations are supplied by a list of array element values enclosed in [ ]. The values can be optionally preceded by an index and a :. If an index is not supplied, it is set to the previous index plus 1, or 0 if it is the first value.

int[3] a = [ 1:2, 3 ];		// a[0] = 0, a[1] = 2, a[2] = 3

This is most handy when the array indices are given by enums:

enum Color { red, blue, green };

int value[Color.max + 1] = [ Color.blue:6, Color.green:2, Color.red:5 ];

These arrays are static when they appear in global scope. Otherwise, they need to be marked with const or static storage classes to make them static arrays.

Special Array Types

Strings

A string is an array of characters. String literals are just an easy way to write character arrays. String literals are immutable (read only).

char[] str1 = "abc";                // error, "abc" is not mutable
char[] str2 = "abc".dup;            // ok, make mutable copy
immutable(char)[] str3 = "abc";     // ok
immutable(char)[] str4 = str1;      // error, str4 is not mutable
immutable(char)[] str5 = str1.idup; // ok, make immutable copy

The name string is aliased to immutable(char)[], so the above declarations could be equivalently written as:

char[] str1 = "abc";       // error, "abc" is not mutable
char[] str2 = "abc".dup;   // ok, make mutable copy
string str3 = "abc";       // ok
string str4 = str1;        // error, str4 is not mutable
string str5 = str1.idup;   // ok, make immutable copy

char[] strings are in UTF-8 format. wchar[] strings are in UTF-16 format. dchar[] strings are in UTF-32 format.

Strings can be copied, compared, concatenated, and appended:

str1 = str2;
if (str1 < str3) ...
func(str3 ~ str4);
str4 ~= str1;

with the obvious semantics. Any generated temporaries get cleaned up by the garbage collector (or by using alloca()). Not only that, this works with any array not just a special String array.

A pointer to a char can be generated:

char* p = &str[3];	// pointer to 4th element
char* p = str;		// pointer to 1st element

Since strings, however, are not 0 terminated in D, when transferring a pointer to a string to C, add a terminating 0:

str ~= "\0";

or use the function std.string.toStringz.

The type of a string is determined by the semantic phase of compilation. The type is one of: char[], wchar[], dchar[], and is determined by implicit conversion rules. If there are two equally applicable implicit conversions, the result is an error. To disambiguate these cases, a cast or a postfix of c, w or d can be used:


cast(immutable(wchar) [])"abc"	// this is an array of wchar characters
"abc"w			        // so is this

String literals that do not have a postfix character and that have not been cast can be implicitly converted between char[], wchar[], and dchar[] as necessary.

char c;
wchar w;
dchar d;

c = 'b';		// c is assigned the character 'b'
w = 'b';		// w is assigned the wchar character 'b'
w = 'bc';		// error - only one wchar character at a time
w = "b"[0];		// w is assigned the wchar character 'b'
w = "\r"[0];		// w is assigned the carriage return wchar character
d = 'd';		// d is assigned the character 'd'

C's printf() and Strings

printf() is a C function and is not part of D. printf() will print C strings, which are 0 terminated. There are two ways to use printf() with D strings. The first is to add a terminating 0, and cast the result to a char*:

str ~= "\0";
printf("the string is '%s'\n", cast(char*)str);

or:

import std.string;
printf("the string is '%s'\n", std.string.toStringz(str));

String literals already have a 0 appended to them, so can be used directly:

printf("the string is '%s'\n", cast(char*)"string literal");

So, why does the first string literal to printf not need the cast? The first parameter is prototyped as a char*, and a string literal can be implicitly cast to a char*. The rest of the arguments to printf, however, are variadic (specified by ...), and a string literal is passed as a (length,pointer) combination to variadic parameters.

The second way is to use the precision specifier. The way D arrays are laid out, the length comes first, so the following works:

printf("the string is '%.*s'\n", str);

The best way is to use std.stdio.writefln, which can handle D strings:

import std.stdio;
writefln("the string is '%s'", str);

Implicit Conversions

A pointer T* can be implicitly converted to one of the following:

A static array T[dim] can be implicitly converted to one of the following:

A dynamic array T[] can be implicitly converted to one of the following:

Where U is a base class of T.


Associative Arrays

Associative arrays have an index that is not necessarily an integer, and can be sparsely populated. The index for an associative array is called the key, and its type is called the KeyType.

Associative arrays are declared by placing the KeyType within the [] of an array declaration:

int[char[]] b;		// associative array b of ints that are
			// indexed by an array of characters.
			// The KeyType is char[]
b["hello"] = 3;		// set value associated with key "hello" to 3
func(b["hello"]);	// pass 3 as parameter to func()

Particular keys in an associative array can be removed with the remove function:

b.remove("hello");

The InExpression yields a pointer to the value if the key is in the associative array, or null if not:

int* p;
p = ("hello" in b);
if (p != null)
	...

KeyTypes cannot be functions or voids.

Using Classes as the KeyType

Classes can be used as the KeyType. For this to work, the class definition must override the following member functions of class Object:

hash_t is an alias to an integral type.

Note that the parameter to opCmp and opEquals is of type Object, not the type of the class in which it is defined.

For example:

class Foo
{
    int a, b;

    hash_t toHash() { return a + b; }

    bool opEquals(Object o)
    {	Foo foo = cast(Foo) o;
	return foo && a == foo.a && b == foo.b;
    }

    int opCmp(Object o)
    {	Foo foo = cast(Foo) o;
	if (!foo)
	    return -1;
	if (a == foo.a)
	    return b - foo.b;
	return a - foo.a;
    }
}

The implementation may use either opEquals or opCmp or both. Care should be taken so that the results of opEquals and opCmp are consistent with each other when the class objects are the same or not.

Using Structs or Unions as the KeyType

If the KeyType is a struct or union type, a default mechanism is used to compute the hash and comparisons of it based on the binary data within the struct value. A custom mechanism can be used by providing the following functions as struct members:

const hash_t toHash();
const bool opEquals(ref const KeyType s);
const int opCmp(ref const KeyType s);

For example:

import std.string;

struct MyString
{
    string str;


    const hash_t toHash()
    {   hash_t hash;
	foreach (char c; str)
	    hash = (hash * 9) + c;
	return hash;
    }

    const bool opEquals(ref const MyString s)
    {
	return std.string.cmp(this.str, s.str) == 0;
    }

    const int opCmp(ref const MyString s)
    {
	return std.string.cmp(this.str, s.str);
    }
}

The implementation may use either opEquals or opCmp or both. Care should be taken so that the results of opEquals and opCmp are consistent with each other when the struct/union objects are the same or not.

Properties

Properties for associative arrays are:
Associative Array Properties
Property Description
.sizeof Returns the size of the reference to the associative array; it is typically 8.
.length Returns number of values in the associative array. Unlike for dynamic arrays, it is read-only.
.keys Returns dynamic array, the elements of which are the keys in the associative array.
.values Returns dynamic array, the elements of which are the values in the associative array.
.rehash Reorganizes the associative array in place so that lookups are more efficient. rehash is effective when, for example, the program is done loading up a symbol table and now needs fast lookups in it. Returns a reference to the reorganized array.
.byKey() Returns a delegate suitable for use as an Aggregate to a ForeachStatement which will iterate over the keys of the associative array.
.byValue() Returns a delegate suitable for use as an Aggregate to a ForeachStatement which will iterate over the values of the associative array.
.get(Key key, lazy Value defaultValue) Looks up key; if it exists returns corresponding value else evaluates and returns defaultValue.

Associative Array Example: word count

import std.file;         // D file I/O
import std.stdio;

int main (string[] args)
{
    int word_total;
    int line_total;
    int char_total;
    int[char[]] dictionary;

    writefln("   lines   words   bytes file");
    for (int i = 1; i < args.length; ++i)      // program arguments
    {
	char[] input;            // input buffer
	int w_cnt, l_cnt, c_cnt; // word, line, char counts
	int inword;
	int wstart;

	// read file into input[]
	input = cast(char[])std.file.read(args[i]);

	foreach (j, char c; input)
	{
	    if (c == '\n')
		    ++l_cnt;
	    if (c >= '0' && c <= '9')
	    {
	    }
	    else if (c >= 'a' && c <= 'z' ||
		    c >= 'A' && c <= 'Z')
	    {
		if (!inword)
		{
		    wstart = j;
		    inword = 1;
		    ++w_cnt;
		}
	    }
	    else if (inword)
	    {
		char[] word = input[wstart .. j];
		dictionary[word]++;        // increment count for word
		inword = 0;
	    }
	    ++c_cnt;
	}
	if (inword)
	{
	    char[] word = input[wstart .. input.length];
	    dictionary[word]++;
	}
	writefln("%8d%8d%8d %s", l_cnt, w_cnt, c_cnt, args[i]);
	line_total += l_cnt;
	word_total += w_cnt;
	char_total += c_cnt;
    }

    if (args.length > 2)
    {
	writef("-------------------------------------\n%8d%8d%8d total",
		line_total, word_total, char_total);
    }

    writefln("-------------------------------------");
    foreach (word; dictionary.keys.sort)
    {
	writefln("%3d %s", dictionary[word], word);
    }
    return 0;
}




Forums | Comments |  D  | Search | Downloads | Home