為什么數組是從0開始的

2024-06-28 13:22:00

字體：大中小

來源：轉載

供稿：網友

為什么數組是從0開始的

本文通過匯總一些網上搜集到的資料，總結出大部分編程語言中數組下標從0開始的原因

本博客已經遷移至：

http://cenalulu.github.io/

本篇博文已經遷移，閱讀全文請點擊：

http://cenalulu.github.io/linux/why-array-start-from-zero/

背景

我們知道大部分編程語言中的數組都是從0開始編號的，即array[0]是數組的第一個元素。這個和我們平時生活中從1開始編號的習慣相比顯得很反人類。那么究竟是什么樣的原因讓大部分編程語言數組都遵從了這個神奇的習慣呢？本文最初是受stackoverflow上的一個問題的啟發，通過搜集和閱讀了一些資料在這里做個總結。當然，本文摘錄較多的過程結論，如果你想把這篇文章當做快餐享用的話，可以直接跳到文章末尾看結論。

最早的原因

在回答大部分我們無法解釋的詭異問題時，我們最常用的辯詞通常是歷史原因。那么，歷史又是出于什么原因，使用了0標號數組呢？Mike Hoye就是本著這么一種追根刨地的科學精神為我們找到了解答。以下是一些他的重要結論的摘錄翻譯：

據作者的說法，C語言中從0開始標號的做法是沿用了BCPL這門編程語言的做法。而BCPL中如果一個變量是指針的話，那么該指針可以指向一系列連續的相同類型的數值。那么p+0就代表了這一串數值的第一個。在BCPL中數組第5個元素的寫法是p!5，而C語言中把寫法改成了p[5]，也就是現在的數組。具體原文摘錄如下：

If a BCPL variable rePResents a pointer, it points to one or more consecutive Words of memory. These words are the same size as BCPL variables. Just as machine code allows address arithmetic so does BCPL, so if p is a pointer p+1 is a pointer to the next word after the one p points to. Naturally p+0 has the same value as p. The monodic indirection Operator ! takes a pointer as it’s argument and returns the contents of the word pointed to. If v is a pointer !(v+I) will access the word pointed to by v+I.

至于為什么C語言中為什么使用[]方括號來表示數組下標，這個設計也有一定來歷。據C語言作者的說法是方括號是現代鍵盤上唯一較為容易輸入的成對符號（不用shift）不信你對著鍵盤找找？

為什么這個反人類設計在一段時間內一直沒有被改變

根據Mike的說法，BCPL是被設計在IBM硬件環境下編譯運行的。在1960后的很長一段時間內，服務器硬件幾乎被IBM統治。一個城市內也許至于一臺超級計算機，還需要根據時間配額使用。當你當天的配額用完以后，你的程序就被完全清出計算隊列。甚至連計算結果都不給你保留，死無全尸。這個時候寫一段高效的程序，就顯得比什么都重要了。而這時0下標數組又體現了出了它的另一個優勢，就是：相較于1下標數組，它的編譯效率更高。原文摘錄如下：

So: the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.

此外，還有另外一種說法。在C語言中有指針的概念，而指針數組標號實際上是一個偏移量而不是計數作用。例如對于指針p，第N個元素是*(p+N)，指針指向數組的第一個元素就是*(p+0)，

一些現代語言為什么仍然使用這種做法

上文中提到的為了計較分秒的編譯時間而使用0下標數組，在硬件飛速發展的今天顯然是不必要的。那么為什么一些新興語言，如Python依然選擇以0作為數組第一個元素呢？難道也是歷史原因？對于這個問題，Python的作者Guido van Rossum也有自己的答案。這里大致概括一下作者的用意：從0開始的半開放數組寫法在表示子數組（或者子串）的時候格外的便捷。例如：a[0:n]表示了a中前n個元素組成的新數組。如果我們使用1開始的數組寫法，那么就要寫成a[1:n+1]。這樣就顯得不是很優雅。那么問題來了，Python數組為什么使用半開放，即[m,n)左閉合右開發的寫法呢？這個理解起來就比較簡單，讀者可以參考http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF作為擴展閱讀。下面摘錄一段Python作者的原話：

Using 0-based indexing, half-open intervals, and suitable defaults (as Python ended up having), they are beautiful: a[:n] and a[i:i+n]; the former is long for a[0:n].Using 1-based indexing, if you want a[:n] to mean the first n elements, you either have to use closed intervals or you can use a slice notation that uses start and length as the slice parameters. Using half-open intervals just isn't very elegant when combined with 1-based indexing. Using closed intervals, you'd have to write a[i:i+n-1] for the n items starting at i. So perhaps using the slice length would be more elegant with 1-based indexing? Then you could write a[i:n]. And this is in fact what ABC did -- it used a different notation so you could write a@i|n.(See http://homepages.cwi.nl/~steven/abc/qr.html#EXPRESSIONS.)

總結

從0標號的數組傳統，沿用了這么長時間的原因主要列舉如下：

在計算資源缺乏的過去，0標號的寫法可以節省編譯時間
現代語言中0標號可以更優雅的表示數組字串
在支持指針的語言中，標號被視作是偏移量，因此從0開始更符合邏輯

參考文獻

(http://developeronline.blogspot.com/2008/04/why-array-index-should-start-from-0.html)
[http://exple.tive.org/blarg/2013/10/22/citation-needed/]
[https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi]
[http://stackoverflow.com/questions/7320686/why-does-the-indexing-start-with-zero-in-c]

上一篇：yum的一些用法

下一篇：NPM install