.NET并行編程——數據并行

2019-11-14 16:02:45

字體：大中小

來源：轉載

供稿：網友

本文內容

并行編程
數據并行
- 環境
- 計算 PI
- 矩陣相乘
- 把目錄中的全部圖片復制到另一個目錄
- 列出指定目錄中的所有文件，包括其子目錄

最近，對多線程編程，并行編程，異步編程，這三個概念有點暈了，之前我研究了異步編程《VS 2013 C# 異步編程 async await》，現在猛然發覺，自己怎么有點不明白這三者之間有什么聯系和區別了呢？有點說不清、道不明的感覺~

因此，回顧了一下個人經歷，屢屢思路~我剛接觸計算機時，還是學校的 DOS 和 win 3.x，之后，學校換了 Windows 95，再之后，我有自己的臺式機……但無論如何，那時電腦的 CPU 都是單核的，即便采用多線程，程序無論看上多么像“同時”執行，其本質上還是順序的，因為代碼段是獨占 CPU 的；之后，我賣了臺式機，買了筆記本電腦，CPU 是雙核的，如果用多線程，那情況就不同了，能達到正真的“同時”執行，也就是并行。

“并行”是目的，為了實現這個目的，我們采用“多線程編程”這個手段，而我們知道，多線程編程涉及的問題很多，申請超額、競爭條件、死鎖、活鎖、二步舞、優先級翻轉等，為了簡化多線程編程，加之多核 CPU 越來越普遍，于是很多編程框架本身就提供了對多線程的封裝，比如一些類和方法，這些就是并行編程。因此，多線程編程變成了較底層東西，而并行編程則是較高層次，較高抽象，至少能將一段很簡單的代碼從順序的直接編程并行的；而異步編程呢，異步方法旨在成為非阻止操作，異步并不會創建其他線程。異步方法不會在其自身線程上運行，而是在 CLR 提供的線程上，因此它不需要多線程。

總之，在多核和眾核（manycore）時代，想想一下，在未來，具有一百萬個核的 CPU 不是不可能的事。人類中樞神經系統中約含1000億個神經元，僅大腦皮層中就約有140億。如果再讓程序員自己用多線程編程，顯然太低效了，低效也就算了，還容易犯錯，所以才需要并行編程。

2009年Google推出了它的第二個開源語言 Go。對 Go 的評價褒貶不一，中國比國外的熱情高中國比國外的熱情高。Go 天生就是為并發和網絡而生的，除了這點外，在靜態編譯、GC、跨平臺、易學、豐富的標準庫等，其實并不如 C/C++、java、C#、Python。由此可想而知，為什么會出現 Go？以及為什么 Go 存在如此多的問題和爭論？——也許Go 更像是一個“天才的自閉癥患者”，如果看清了這點，對 Go 的褒貶也就能泰然啦~

使用 TPL，除了線程方面的知識，你最好對委托、匿名方法或 Lambda 表達式有所了解。

下載 MyDemo

下載 Samples for Parallel PRogramming with .net framework 完整示例

下載 Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4 完整示例（該書的例子，深入淺出，循序漸進，對理解并行編程幫助很大，針對本文的數據并行，你可以參考 CH2，看作者如何對 ASE 和 md5 的計算進行改進的，評價的標準是 Amdahl 定律）

并行編程

多核 CPU 已經相當普遍，使得多個線程能夠同時執行。將代碼并行化，工作也就分攤到多個 CPU 上。

過去，并行化需要線程和鎖的低級操作。而 Visual Studio 2010 和 .NET Framework 4 開始提供了新的運行時、新的類庫類型以及新的診斷工具，從而增強了對并行編程的支持。這些功能簡化了并行開發，通過固有方法編寫高效、細化且可伸縮的并行代碼，而不必直接處理線程或線程池。

下圖從較高層面上概述了 .NET Framework 4 中的并行編程體系結構。

IC387462

任務并行庫（The Task Parallel Library，TPL）是 System.Threading 和 System.Threading.Tasks 空間中的一組公共類型和 API。TPL 的目的是通過簡化將并行和并發添加到應用程序的過程來提高開發人員的工作效率。TPL 能動態地最有效地使用所有可用的處理器。此外，TPL 還處理工作分區、ThreadPool 上的線程調度、取消支持、狀態管理以及其他低級別的細節操作。通過使用 TPL，你可以將精力集中于程序要完成的工作，同時最大程度地提高代碼的性能。

從 .NET Framework 4 開始，TPL 是編寫多線程代碼和并行代碼的首選方法。但并不是所有代碼都適合并行化，例如，如果某個循環在每次迭代時只執行少量工作，或它在很多次迭代時都不運行，那么并行化的開銷可能導致代碼運行更慢。此外，像任何多線程代碼一樣，并行化會增加程序執行的復雜性。盡管 TPL 簡化了多線程方案，但建議對線程處理概念（例如，鎖、死鎖和爭用條件）進行基本了解，以便能夠有效地使用 TPL。

數據并行

我們可以對數據進行并行，簡單地說，對集合中的每個數據同時執行相同的操作，當然也可以對任務和數據流進行并行。本文主要描述數據并行。

TPL 通過 System.Threading.Tasks.Parallel 類實現數據并行，此類提供了 for 和 foreach 基于并行的實現。為 Parallel.For 或 Parallel.ForEach 編寫循環邏輯與編寫順序循環非常類似。你不必創建線程或隊列工作項。基本循環中不必采用鎖。TPL 將處理所有低級別工作。

System.Threading.Tasks.Parallel類有三個方法：For、ForEach、Invoke，它們有很多重載，沒必要說明這些方法本身，因此，下面用實例說明如何用這些方法進行并行編程，并對比與順序執行的性能。

環境

Windows 7 旗艦版 SP1
Microsoft Visual Studio Ultimate 2013 Update 4

計算 PI

對比順序計算 PI、并行計算 PI 和并行分區計算 PI 的性能。

using System;

using System.Collections.Concurrent;

using System.Diagnostics;

using System.Threading.Tasks;

namespace ComputePi

    class Program

        const int num_steps = 100000000;

        static void Main(string[] args)

            Time(() => SerialPi());

            Time(() => ParallelPi());

            Time(() => ParallelPartitionerPi());

            Console.WriteLine("Press any keys to Exit.");

            Console.ReadLine();

        /// <summary>

        /// Times the execution of a function and outputs both the elapsed time and the function's result.

        /// </summary>

        static void Time<T>(Func<T> work)

            var sw = Stopwatch.StartNew();

            var result = work();

            Console.WriteLine(sw.Elapsed + ": " + result);

        /// <summary>

        /// Estimates the value of PI using a for loop.

        /// </summary>

        static double SerialPi()

            double sum = 0.0;

            double step = 1.0 / (double)num_steps;

            for (int i = 0; i < num_steps; i++)

                double x = (i + 0.5) * step;

                sum = sum + 4.0 / (1.0 + x * x);

            return step * sum;

        /// <summary>

        /// Estimates the value of PI using a Parallel.For.

        /// </summary>

        static double ParallelPi()

            double sum = 0.0;

            double step = 1.0 / (double)num_steps;

            object monitor = new object();

            Parallel.For(0, num_steps, () => 0.0, (i, state, local) =>

                double x = (i + 0.5) * step;

                return local + 4.0 / (1.0 + x * x);

            }, local => { lock (monitor) sum += local; });

            return step * sum;

        /// <summary>

        /// Estimates the value of PI using a Parallel.ForEach and a range partitioner.

        /// </summary>

        static double ParallelPartitionerPi()

            double sum = 0.0;

            double step = 1.0 / (double)num_steps;

            object monitor = new object();

            Parallel.ForEach(Partitioner.Create(0, num_steps), () => 0.0, (range, state, local) =>

                for (int i = range.Item1; i < range.Item2; i++)

                    double x = (i + 0.5) * step;

                    local += 4.0 / (1.0 + x * x);

                return local;

            }, local => { lock (monitor) sum += local; });

            return step * sum;

//RESULT:

//00:00:00.4358850: 3.14159265359043

//00:00:00.4523856: 3.14159265358987

//00:00:00.1435475: 3.14159265358979

//Press any keys to Exit.

當 For 循環的循環體很小時，它的執行速度可能比等效的順序循環更慢。這也就是為什么順序計算 PI 與并行計算 PI 的時間差不多，因為對數據進行分區所涉及的開銷以及調用每個循環迭代上的委托的開銷導致了性能降低。為了解決類似情況，Partitioner 類提供 Partitioner.Create 方法，該方法使您可以為委托體提供順序循環，以便每個分區只調用一次委托，而不是每個迭代調用一次委托。因此，并行分區計算 PI 時，性能有大幅度提升。

矩陣相乘

對比順序與并行計算矩陣乘法的性能。

using System;

using System.Diagnostics;

using System.Threading.Tasks;

namespace DataParallelismDemo

    class Program

        /// <summary>

        /// Sequential_Loop

        /// </summary>

        /// <param name="matA"></param>

        /// <param name="matB"></param>

        /// <param name="result"></param>

        static void MultiplyMatricesSequential(double[,] matA, double[,] matB, double[,] result)

            int matACols = matA.GetLength(1);

            int matBCols = matB.GetLength(1);

            int matARows = matA.GetLength(0);

            for (int i = 0; i < matARows; i++)

                for (int j = 0; j < matBCols; j++)

                    for (int k = 0; k < matACols; k++)

                        result[i, j] += matA[i, k] * matB[k, j];

        /// <summary>

        /// Parallel_Loop

        /// </summary>

        /// <param name="matA"></param>

        /// <param name="matB"></param>

        /// <param name="result"></param>

        static void MultiplyMatricesParallel(double[,] matA, double[,] matB, double[,] result)

            int matACols = matA.GetLength(1);

            int matBCols = matB.GetLength(1);

            int matARows = matA.GetLength(0);

            // A basic matrix multiplication.

            // Parallelize the outer loop to partition the source array by rows.

            Parallel.For(0, matARows, i =>

                for (int j = 0; j < matBCols; j++)

                    // Use a temporary to improve parallel performance.

                    double temp = 0;

                    for (int k = 0; k < matACols; k++)

                        temp += matA[i, k] * matB[k, j];

                    result[i, j] = temp;

            }); // Parallel.For

        static void Main(string[] args)

            // Set up matrices. Use small values to better view

            // result matrix. Increase the counts to see greater

            // speedup in the parallel loop vs. the sequential loop.

            int colCount = 180;

            int rowCount = 2000;

            int colCount2 = 270;

            double[,] m1 = InitializeMatrix(rowCount, colCount);

            double[,] m2 = InitializeMatrix(colCount, colCount2);

            double[,] result = new double[rowCount, colCount2];

            // First do the sequential version.

            Console.WriteLine("Executing sequential loop...");

            Stopwatch stopwatch = new Stopwatch();

            stopwatch.Start();

            MultiplyMatricesSequential(m1, m2, result);

            stopwatch.Stop();

            Console.WriteLine("Sequential loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);

            // For the skeptics.

            OfferToPrint(rowCount, colCount2, result);

            // Reset timer and results matrix.

            stopwatch.Reset();

            result = new double[rowCount, colCount2];

            // Do the parallel loop.

            Console.WriteLine("Executing parallel loop...");

            stopwatch.Start();

            MultiplyMatricesParallel(m1, m2, result);

            stopwatch.Stop();

            Console.WriteLine("Parallel loop time in milliseconds: {0}", stopwatch.ElapsedMilliseconds);

            OfferToPrint(rowCount, colCount2, result);

            // Keep the console window open in debug mode.

            Console.WriteLine("Press any key to exit.");

            Console.ReadKey();

        /// <summary>

        /// 生成矩陣

        /// </summary>

        /// <param name="rows"></param>

        /// <param name="cols"></param>

        /// <returns></returns>

        static double[,] InitializeMatrix(int rows, int cols)

            double[,] matrix = new double[rows, cols];

            Random r = new Random();

            for (int i = 0; i < rows; i++)

                for (int j = 0; j < cols; j++)

                    matrix[i, j] = r.Next(100);

            return matrix;

        private static void OfferToPrint(int rowCount, int colCount, double[,] matrix)

            Console.WriteLine("Computation complete. Print results? y/n");

            char c = Console.ReadKey().KeyChar;

            if (c == 'y' || c == 'Y')

                Console.WindowWidth = 180;

                Console.WriteLine();

                for (int x = 0; x < rowCount; x++)

                    Console.WriteLine("ROW {0}: ", x);

                    for (int y = 0; y < colCount; y++)

                        Console.Write("{0:#.##} ", matrix[x, y]);

                    Console.WriteLine();

//RESULST:

//Executing sequential loop...

//Sequential loop time in milliseconds: 1168

//Computation complete. Print results? y/n

//nExecuting parallel loop...

//Parallel loop time in milliseconds: 360

//Computation complete. Print results? y/n

//nPress any key to exit.

using System;

//using System.Collections.Generic;

//using System.Linq;

//using System.Text;

using System.Threading;

using System.Threading.Tasks;

using System.Configuration;

namespace MovePics

    class Program

        protected static string PIC_PATH = ConfigurationManager.AppSettings["PicPath"].ToString();

        protected static string NEW_PIC_PATH = ConfigurationManager.AppSettings["NewPicPath"].ToString();

        static void Main(string[] args)

            // A simple source for demonstration purposes. Modify this path as necessary.

            string[] files = System.IO.Directory.GetFiles(PIC_PATH, "*.png");

            System.IO.Directory.CreateDirectory(NEW_PIC_PATH);

            //  Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)

            Parallel.ForEach(files, currentFile =>

                // The more computational work you do here, the greater

                // the speedup compared to a sequential foreach loop.

                string filename = System.IO.Path.GetFileName(currentFile);

                System.Drawing.Bitmap bitmap = new System.Drawing.Bitmap(currentFile);

                bitmap.RotateFlip(System.Drawing.RotateFlipType.Rotate180FlipNone);

                bitmap.Save(System.IO.Path.Combine(NEW_PIC_PATH, filename));

                // Peek behind the scenes to see how work is parallelized.

                // But be aware: Thread contention for the Console slows down parallel loops!!!

                Console.WriteLine("Processing {0} on thread {1}", filename,

                                    Thread.CurrentThread.ManagedThreadId);

            } //close lambda expression

                 ); //close method invocation

            // Keep the console window open in debug mode.

            Console.WriteLine("Processing complete. Press any key to exit.");

            Console.ReadKey();

using System;

using System.Collections.Generic;

using System.Diagnostics;

using System.IO;

using System.Linq;

using System.Security;

using System.Text;

using System.Threading;

using System.Threading.Tasks;

namespace TraverseTreeParallelForEach

    class Program

        static void Main(string[] args)

try

                TraverseTreeParallelForEach(@"C:/Program Files", (f) =>

                    // Exceptions are no-ops.

try

                        // Do nothing with the data except read it.

                        byte[] data = File.ReadAllBytes(f);

                    catch (FileNotFoundException) { }

                    catch (IOException) { }

                    catch (UnauthorizedaccessException) { }

                    catch (SecurityException) { }

                    // Display the filename.

                    Console.WriteLine(f);

});

            catch (ArgumentException)

                Console.WriteLine(@"The directory 'C:/Program Files' does not exist.");

            // Keep the console window open.

            Console.WriteLine("Press any key to exit.");

            Console.ReadKey();

        public static void TraverseTreeParallelForEach(string root, Action<string> action)

            //Count of files traversed and timer for diagnostic output

            int fileCount = 0;

            var sw = Stopwatch.StartNew();

            // Determine whether to parallelize file processing on each folder based on processor count.

            int procCount = System.Environment.ProcessorCount;

            // Data structure to hold names of subfolders to be examined for files.

            Stack<string> dirs = new Stack<string>();

            if (!Directory.Exists(root))

                throw new ArgumentException();

            dirs.Push(root);

            while (dirs.Count > 0)

                string currentDir = dirs.Pop();

                string[] subDirs = { };

                string[] files = { };

try

                    subDirs = Directory.GetDirectories(currentDir);

                // Thrown if we do not have discovery permission on the directory.

                catch (UnauthorizedAccessException e)

                    Console.WriteLine(e.Message);

                    continue;

                // Thrown if another process has deleted the directory after we retrieved its name.

                catch (DirectoryNotFoundException e)

                    Console.WriteLine(e.Message);

                    continue;

try

                    files = Directory.GetFiles(currentDir);

                catch (UnauthorizedAccessException e)

                    Console.WriteLine(e.Message);

                    continue;

                catch (DirectoryNotFoundException e)

                    Console.WriteLine(e.Message);

                    continue;

                catch (IOException e)

                    Console.WriteLine(e.Message);

                    continue;

                // Execute in parallel if there are enough files in the directory.

                // Otherwise, execute sequentially.Files are opened and processed

                // synchronously but this could be modified to perform async I/O.

try

                    if (files.Length < procCount)

                        foreach (var file in files)

                            action(file);

                            fileCount++;

                    else

                        Parallel.ForEach(files, () => 0, (file, loopState, localCount) =>

                            action(file);

                            return (int)++localCount;

},

                                         (c) =>

                                             Interlocked.Add(ref fileCount, c);

});

                catch (AggregateException ae)

                    ae.Handle((ex) =>

                        if (ex is UnauthorizedAccessException)

                            // Here we just output a message and go on.

                            Console.WriteLine(ex.Message);

                            return true;

                        // Handle other exceptions here if necessary...

                        return false;

});

                // Push the subdirectories onto the stack for traversal.

                // This could also be done before handing the files.

                foreach (string str in subDirs)

                    dirs.Push(str);

            // For diagnostic purposes.

            Console.WriteLine("Processed {0} files in {1} milleseconds", fileCount, sw.ElapsedMilliseconds);

另外，Parallel.For 和 Parallel.ForEach 方法都有若干重載，利用這些重載可以停止或中斷循環執行、監視其他線程上循環的狀態、維護線程本地狀態、完成線程本地對象、控制并發程度，等等。啟用此功能的幫助器類型包括 ParallelLoopState、ParallelOptions、ParallelLoopResult、 CancellationToken 和 CancellationTokenSource。

參考資料

Microsoft Developer Network 并行編程
Microsft Developer Network 數據并行

下載 MyDemo

下載 Samples for Parallel Programming with .net framework 完整示例

下載 Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4 完整示例（該書的例子，深入淺出，循序漸進，對理解并行編程幫助很大，針對本文的數據并行，你可以參考 CH2，看作者如何對 ASE 和 MD5 的計算進行改進的，評價的標準是 Amdahl 定律）

上一篇：Testfailed.嘗試加載Oracle客戶端庫時引發BadImageFormatException

下一篇：c#、sql數據庫備份還原