Atomic

Indivisible. An instruction sequence executed by a strand is atomic if it appears at any moment to any other strand as if either no instructions in the sequence have been executed or all instructions in the sequence have been executed.

Chip multiprocessor

A general-purpose multiprocessor implemented as a single multicore chip.


Cilk

An extension to the C and C++ programming languages that allows a programmer to express task parallelism. Important milestones in the history of Cilk include Cilk-1, Cilk-5, Cilk++, Cilk Plus, and OpenCilk.

Cilk Plus

An important milestone in the history of Cilk developed by Intel Corporation, Cilk Plus extended Cilk++ with transparent interoperability with legacy C/C++ binary executables.


Cilk++

An important milestone in the history of Cilk developed by CilkArts, Inc., Cilk++ extended Cilk-5 to C++ and introduced reducer hyperobjects as an efficient means for resolving races on nonlocal variables.


Cilk-1

The first important milestone in the history of Cilk, Cilk-1 was developed at MIT and provided provably efficient work-stealing runtime support but little linguistic support.


Cilk-5

The next important milestone after Cilk-1 in the history of Cilk, Cilk-5 was developed at MIT and provided simple linguistic extensions to ANSI C for multithreading.


Cilksan

The Cilksan race detector is a tool provided in OpenCilk for finding race condition defects in Cilk code.


cilk_for

A keyword in the Cilk language that indicates a for loop whose iterations can be executed independently in parallel.
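
For illustration, a minimal sketch of a cilk_for loop (the function and array names are hypothetical):

    #include <cilk/cilk.h>

    /* Scale every element of an array. The iterations are independent,
       so cilk_for permits them to run in parallel. */
    void scale(double *a, int n, double c) {
        cilk_for (int i = 0; i < n; ++i)
            a[i] *= c;
    }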


cilk_scope

A keyword in the Cilk language that indicates that all spawned children within the scoped region must complete before proceeding.
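
A minimal sketch of cilk_scope (foo and bar are hypothetical functions):

    #include <cilk/cilk.h>

    void foo(void);
    void bar(void);

    void demo(void) {
        cilk_scope {
            cilk_spawn foo();  /* foo() may run in parallel with bar() */
            bar();
        }
        /* Both calls are guaranteed to have completed here. */
    }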


cilk_spawn

A keyword in the Cilk language that indicates that the named subroutine can execute independently and in parallel with the caller.


cilk_sync

A keyword in the Cilk language that indicates that all functions spawned within the current function must complete before statements following the cilk_sync can be executed.
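
The classic Cilk Fibonacci routine is a minimal sketch of cilk_spawn and cilk_sync working together:

    #include <cilk/cilk.h>

    long fib(long n) {
        if (n < 2) return n;
        long x = cilk_spawn fib(n - 1);  /* may run in parallel with the caller */
        long y = fib(n - 2);
        cilk_sync;  /* wait for the spawned fib(n - 1) to complete */
        return x + y;
    }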

Commutative operation

An operation ⊗ over a type T is commutative if a ⊗ b = b ⊗ a for any two objects a and b of type T. Integer addition and set union are commutative, but string concatenation is not.

Concurrent agent

A processor, process, thread, strand, or other entity that executes a program instruction sequence in a computing environment containing other such entities.


Core

A single processor unit of a multicore chip. The terms "processor" and "CPU" are often used in place of "core," although industry usage varies. Archaic: A solid-state memory made of magnetized toroidal memory elements.


CPU

Central Processing Unit. We use this term as a synonym for "core," or a single processor of a multicore chip.

Critical section

The code executed by a strand while holding a lock.

Critical-path length

See span.

Data race

A race condition that occurs when two or more parallel strands, holding no lock in common, access the same memory location and at least one of the strands performs a write. Compare with determinacy race.
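
A minimal sketch of a data race (the names are hypothetical):

    #include <cilk/cilk.h>

    int count = 0;  /* shared variable, protected by no lock */

    void increment(void) { count++; }

    void racy(void) {
        cilk_spawn increment();  /* two parallel strands write count */
        increment();             /* with no lock in common: a data race */
        cilk_sync;
    }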


Deadlock

A situation when two or more strands are each waiting for another to release a resource, and the "waiting-for" relation forms a cycle so that none can ever proceed.

Determinacy race

A race condition that occurs when two logically parallel strands access the same memory location and at least one strand performs a write.


Determinism

The property of a program that behaves identically from run to run when executed on the same inputs. Deterministic programs are usually easier to debug.

Distributed memory

Computer storage that is partitioned among several processors. A distributed-memory multiprocessor is a computer in which processors must send messages to remote processors to access data in remote processor memory. Contrast with shared memory.

Execution time

How long a program takes to execute on a given computer system. Also called running time.

Fake lock

A construct that Cilksan treats as a lock but that behaves like a no-op when the program actually runs. A fake lock can be used to suppress the reporting of an intentional race condition.

False sharing

The situation that occurs when two strands access different memory locations residing on the same cache block, thereby contending for the cache block. For more information, see the Wikipedia entry on false sharing.
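
A minimal sketch of false sharing (the names are hypothetical, and a 64-byte cache block is assumed):

    #include <cilk/cilk.h>

    int slot[2];  /* adjacent ints almost certainly share one cache block */

    void bump(int i) {
        for (long k = 0; k < 1000000; ++k)
            slot[i]++;
    }

    void demo(void) {
        cilk_spawn bump(0);  /* writes only slot[0] */
        bump(1);             /* writes only slot[1]: no race, but both
                                strands contend for the same cache block */
        cilk_sync;
    }

Padding each element out to its own cache block (for example, wrapping the int in a struct with enough padding to fill 64 bytes) removes the contention.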

Global variable

A variable that is bound outside of all local scopes. See also nonlocal variable.


Hyperobject

A linguistic construct supported by the OpenCilk runtime system that allows many strands to coordinate in updating a shared variable or data structure independently by providing different views of the hyperobject to different strands at the same time. The reducer is the only hyperobject currently provided by OpenCilk.


Instruction

A single operation executed by a processor.


Knot

A point at which the end of one strand meets the end of another. If a knot has one incoming strand and one outgoing strand, it is a serial knot. If it has one incoming strand and two outgoing strands, it is a spawn knot. If it has multiple incoming strands and one outgoing strand, it is a sync knot. A Cilk program does not produce serial knots or knots with both multiple incoming and multiple outgoing strands.

Linear speedup

Speedup proportional to the processor count. See also perfect linear speedup.


Lock

A synchronization mechanism for providing atomic operation by limiting concurrent access to a resource. Important operations on locks include acquire (lock) and release (unlock). Many locks are implemented as a mutex, whereby only one strand can hold the lock at any time.

Lock contention

The situation wherein multiple strands vie for the same lock.


Multicore

A semiconductor chip containing more than one processor core.


Multiprocessor

A computer containing multiple general-purpose processors.


A "mutually exclusive" lock that only one strand can acquire at a time, thereby ensuring that only one strand executes the critical section protected by the mutex at a time. For example, Linux* OS supports Pthreads pthread_mutex_t objects.


Nondeterminism

The property of a program that behaves differently from run to run when executed on exactly the same inputs. Nondeterministic programs are usually hard to debug.

Nonlocal variable

A program variable that is bound outside of the scope of the function, method, or class in which it is used. In Cilk programs, we also use this term to refer to variables with a scope outside a cilk_for loop.


OpenCilk

A task-parallel programming platform for multicore computers based on Cilk technology.

Parallel loop

A for loop all of whose iterations can be run independently in parallel. The cilk_for keyword designates a parallel loop.


Parallelism

The ratio of work to span, which is the largest speedup an application could possibly attain when run on an infinite number of processors.
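
For example (with hypothetical numbers), a computation with work T1 = 1000 instructions and span T∞ = 10 instructions has

    parallelism = T1 / T∞ = 1000 / 10 = 100,

so no execution of it can achieve a speedup greater than 100, no matter how many processors are used.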

Perfect linear speedup

Speedup equal to the processor count. See also linear speedup.


Process

A self-contained concurrent agent that by default executes a serial chain of instructions. More than one thread may run within a process, but a process does not usually share memory with other processes. Scheduling of processes is typically managed by the operating system.


Processor

A processor implements the logic to execute program instructions sequentially; we use the term "core" as a synonym. This document does not use the term "processor" to refer to multiple processing units on the same or multiple chips, although other documents may use the term that way.

Race condition

A source of nondeterminism whereby the result of a concurrent computation depends on the timing or relative order of the execution of instructions in each individual strand.


Receiver

A variable to receive the result of a function call.


Reducer

A hyperobject with a defined (usually associative) reduce() binary operator, which the OpenCilk runtime system uses to combine the views of separate strands. A reducer must have two methods:

  • A default constructor, which initializes the reducer to its identity value.
  • A reduce() method, which merges the value of the right reducer into the left (this) reducer.
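
A minimal sketch of a sum reducer, assuming OpenCilk's cilk_reducer syntax (the function and variable names are illustrative):

    #include <cilk/cilk.h>

    void zero(void *v) { *(int *)v = 0; }                   /* identity */
    void add(void *l, void *r) { *(int *)l += *(int *)r; }  /* reduce(): merge right into left */

    int sum(const int *a, int n) {
        int cilk_reducer(zero, add) s = 0;  /* each strand updates its own view */
        cilk_for (int i = 0; i < n; ++i)
            s += a[i];                      /* no data race on s */
        return s;
    }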

Response time

The time it takes to execute a computation from the time a human user provides an input to the time the user gets the result.

Running time

How long a program takes to execute on a given computer system. Also called execution time.

Scale down

The ability of a parallel application to run efficiently on one or a small number of processors.

Scale up

The ability of a parallel application to run efficiently on a large number of processors. See also linear speedup.

Sequential consistency

The memory model for concurrency wherein the effect of concurrent agents is as if their operations on shared memory were interleaved in a global order consistent with the orders in which each agent executed them. This model was advanced in 1979 by Leslie Lamport.

Serial execution

Execution of the serial projection of a Cilk program.

Serial projection

The C or C++ program that results from stubbing out the keywords of a Cilk program, where cilk_spawn, cilk_scope, and cilk_sync are elided and cilk_for is replaced with an ordinary for. The serial projection can be used for debugging and, in the case of a converted C/C++ program, will behave exactly as the original C/C++ program. The terms "serialization" and "serial elision" are used in some of the literature. Also, see "serial semantics".
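
For instance (a hypothetical fragment), stubbing out the keywords turns the Cilk code

    x = cilk_spawn f();
    y = g();
    cilk_sync;
    cilk_for (int i = 0; i < n; ++i) a[i] = x + y;

into the serial projection

    x = f();
    y = g();
    for (int i = 0; i < n; ++i) a[i] = x + y;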

Serial semantics

The behavior of a Cilk program when executed as the serial projection of the program. See the following article: Four Reasons Why Parallel Programs Should Have Serial Semantics.


Serialization

See serial projection.

Shared memory

Computer storage that is shared among several processors. A shared-memory multiprocessor is a computer in which each processor can directly address any memory location. Contrast with distributed memory.


Span

The theoretically fastest execution time for a parallel program when run on an infinite number of processors, discounting overheads for communication and scheduling. Often denoted by T∞ in the literature, and sometimes called critical-path length.


Spawn

To call a function without waiting for it to return, as an ordinary call would. The caller can continue to execute in parallel with the called function. See also cilk_spawn.


Speedup

How many times faster a program is when run in parallel than when run on one processor. Speedup can be computed by dividing the running time T1 of the program on one processor by its running time TP on P processors.
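
For example (with hypothetical times), a program that takes T1 = 60 seconds on one processor and T8 = 10 seconds on 8 processors achieves a speedup of 60 / 10 = 6.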


Strand

A serial chain of executed instructions without any parallel control (such as a spawn, sync, or return from a spawn).


Sync

To wait for a set of spawned functions to return before proceeding. The current function is dependent upon the spawned functions and cannot proceed in parallel with them. See also cilk_sync.


Task parallelism

Task-parallel platforms provide a layer of software on top of threads to coordinate, schedule, and manage the processors of a multicore. Some task-parallel platforms are built as runtime libraries, but others provide full-fledged parallel languages with compiler and runtime support.

Task-parallel programming allows parallelism to be specified in a "processor-oblivious" fashion, where the programmer identifies what computational tasks may run in parallel but does not indicate which thread or processor performs the task. Thus, the programmer is freed from worrying about communication protocols, load balancing, and other vagaries of thread programming. The task-parallel platform contains a scheduler, which automatically load-balances the tasks across the processors, thereby greatly simplifying the programmer’s chore. Task-parallel algorithms provide a natural extension to ordinary serial algorithms, allowing performance to be reasoned about mathematically using "work/span analysis."


Thread

A concurrent agent that executes a serial chain of instructions. Threads within a process share memory. Scheduling of threads is typically managed by the operating system.


Throughput

The number of operations performed per unit time.


View

The state of a hyperobject as seen by a given strand.


Work

The running time of a program when run on one processor, sometimes denoted by T1.

Work stealing

A scheduling strategy where processors post parallel work locally and, when a processor runs out of local work, it steals work from another processor. Work-stealing schedulers are notable for their efficiency, because they incur no communication or synchronization overhead when there is ample parallelism. The OpenCilk runtime system employs a work-stealing scheduler.


Worker

A thread that, together with other workers, implements the OpenCilk runtime system's work-stealing scheduler.