Programming booby-traps

C# lambdas with by-reference capture

Virtually all modern programming languages offer some mechanism that allows programmers to define a function that, at the same time:

  1. is defined inside the body of some other function (for the convenience of defining it where it is used), and
  2. captures variables from the surrounding function (and also variables visible from that surrounding function).

While putting the definition of a function inside the body of another is a simple matter of defining the syntax, capturing the variables from the surrounding scope is a much harder issue.

Essentially, the lambda syntax creates a functional object. This functional object contains either copies of, or references to, the variables it captures from the surrounding scope. How this is done differs between languages, leading to different behaviors.
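To make the "copies or references" distinction concrete, here is a minimal sketch in Python, a language that captures by reference. The names make_reporter, report and count are invented for this illustration. The inner function is created while count is 0, but it reports the value the variable has when it is finally called.

```python
def make_reporter():
    count = 0

    def report():
        # `count` is looked up in the enclosing scope when report() runs;
        # it is not copied when report() is defined.
        return count

    count = 5  # re-assigned after the inner function was created
    return report


report = make_reporter()
print(report())  # prints 5, not 0

# The function object carries a "cell" that refers to the captured variable.
print(report.__closure__[0].cell_contents)  # prints 5
```

A language that copies on capture would have reported 0 here, which is why the choice between the two strategies is visible to the programmer.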

Java is one of the strictest: all captured variables are copied into the functional object (for reference types, the reference to the underlying object is copied, which is identical to the behavior of the assignment operator), and the captured variables cannot be re-assigned, either in the surrounding function or in the lambda. In older versions of Java, the captured variables had to be declared final; newer versions (Java 8 onward) no longer require the explicit declaration, but the variables must still be effectively final, that is, never re-assigned after their initial assignment.

C++ is one of the most flexible: in the lambda's capture list, the programmer states which variables are copied (captured by value) and which are captured by reference.

C#, unlike Java, captures everything by reference, including variables of primitive types like int, and there is no way to do otherwise. This leads to surprising behavior when a lambda is called much later than, or on a different thread from, the point where it was created. Example:

using System;
using System.Collections.Generic;
using System.Threading;

namespace MisleadingLambda
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");

            List<Thread> threads = new List<Thread>();
            for (int i = 0; i < 10; ++i)
            {
                threads.Add(new Thread(() => { Console.WriteLine("i={0}", i); }));
                threads[i].Start();
            }

            for (int i = 0; i < 10; ++i)
            {
                threads[i].Join();
            }
        }
    }
}

The problem is that, since variable i is captured by reference, its value when the lambda executes is probably different from its value when the lambda was created. As a result, the program can print something other than a permutation of the numbers from 0 to 9: some values may repeat, others may be missing, and even i=10 can appear if a thread runs after the loop has finished.

The fix is to re-write the loop as:

            for (int i = 0; i < 10; ++i)
            {
                int j = i;
                threads.Add(new Thread(() => { Console.WriteLine("i={0}", j); }));
                threads[i].Start();
            }

Each iteration of the loop now creates a distinct j variable, assigned only once with the value of i at that time, so each lambda captures its own variable.

As a side note, the lifetime of a captured variable (i in the first example, j in the second) is handled by the garbage collector: the variable stays alive for as long as any lambda that captured it is still reachable.
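The same trap can be reproduced deterministically, without threads, in any language that captures by reference. The following is a rough Python analog, not a translation of the C# program; build_shared, build_separate and make_callback are names made up for this sketch. One difference is worth stating: a Python loop body does not introduce a new scope, so the per-iteration copy is made through a helper function's parameter rather than through a local declaration like int j = i.

```python
def build_shared():
    callbacks = []
    for i in range(3):
        # Every lambda refers to the same variable i.
        callbacks.append(lambda: i)
    return callbacks


def build_separate():
    def make_callback(j):
        # j is a new variable on every call to make_callback,
        # so each lambda captures its own j.
        return lambda: j

    return [make_callback(i) for i in range(3)]


# The callbacks run only after the loops have finished.
shared = [cb() for cb in build_shared()]
separate = [cb() for cb in build_separate()]
print(shared)    # [2, 2, 2]
print(separate)  # [0, 1, 2]
```

In both languages the remedy has the same shape: give every closure a variable of its own that is never re-assigned after the closure is created.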


Python default function argument

Default function arguments are a mechanism that exists in many programming languages. While it seems very simple and straightforward at first sight, it hides deep complications.

In short, the declaration of a function specifies a default value for one or more of that function's parameters. If the caller does not provide an actual argument for such a parameter, the parameter takes the default value.

The semantics are clear if the default value is either a literal (like 42) or a simple expression (like 6*7). In this case, it does not matter when or how many times that expression is evaluated.

Things become problematic if the default value is an expression that has side effects or whose value depends on non-constant things. If the programming language accepts that, it has to define when that expression is evaluated. The main possibilities are:

  1. when the function having the default argument is defined,
  2. the first time the function is called,
  3. each time the function is called.

Python chooses option 1 above. This makes perfect sense from the implementation point of view in an interpreted programming language: storing the default value for the argument fits the language's philosophy, whereas options 2 and 3 would require storing the expression itself (maybe as a lambda function).
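A small experiment makes the evaluation time observable. In the sketch below, make_default, evaluations and f are hypothetical names chosen for the illustration; the default expression has a side effect (it records each evaluation), so counting the records shows how often it ran.

```python
evaluations = []


def make_default():
    # Side effect: record every evaluation of the default expression.
    evaluations.append("evaluated")
    return 42


def f(x=make_default()):
    return x


# The default was already evaluated when the def statement ran,
# before any call to f().
print(len(evaluations))  # 1

f()
f()
print(len(evaluations))  # still 1: calls do not re-evaluate the default
```

Under option 2 the first count would be 0, and under option 3 the second count would be 2.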

However, option 1 combined with a mutable default object may surprise the programmer. Consider the following Python code:

def foo(lst=[]):
    lst.append(len(lst))
    print(f"lst={lst}")
foo(["a", "b", "c"])
foo()
foo()

The programmer may expect that, when called with no arguments, the function receives an empty list. In reality, the interpreter creates a single list, initially empty, when the function is defined, and passes that same list to every no-argument call to foo(). Changes made by foo() are thus carried over from one call to the next.

The code will thus print:

lst=['a', 'b', 'c', 3]
lst=[0]
lst=[0, 1]
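A common way to get the behavior of option 3 in Python is to use None as the default and create the list inside the function body. The version below is a sketch of that idiom rather than the only possible fix; unlike the original foo, it returns the list instead of printing it, purely to make the result easy to inspect.

```python
def foo(lst=None):
    if lst is None:
        # A fresh list is created on every call that omits the argument.
        lst = []
    lst.append(len(lst))
    return lst


print(foo(["a", "b", "c"]))  # ['a', 'b', 'c', 3]
print(foo())                 # [0]
print(foo())                 # [0]
```

The default value is still evaluated only once, at definition time, but since None is immutable, sharing it between calls is harmless.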