PCrawler is a parallel web crawler built with the Task Parallel Library (TPL) in the .NET Framework 4.5. Web crawlers, also called web spiders, are very useful agents for search engines and other applications. Every day, crawlers collect web sites and update databases with fresh data. Modern CPUs bring new capabilities: Intel and AMD now ship multi-core processors. A multi-core processor is a single computing component with two or more independent central processing units (called "cores"), which are the units that read and execute program instructions. The instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP), or onto multiple dies in a single chip package.

The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces, introduced in the .NET Framework 4. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to use all available processors as efficiently as possible. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details. By using the TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish. Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code. However, not all code is suitable for parallelization: if a loop performs only a small amount of work on each iteration, or does not run for many iterations, the overhead of parallelization can make the code run more slowly. Furthermore, parallelization, like any multithreaded code, adds complexity to your program execution. Although the TPL simplifies multithreaded scenarios, we recommend that you have a basic understanding of threading concepts, such as locks, deadlocks, and race conditions, so that you can use the TPL effectively. For more information about basic parallel computing concepts, see the Parallel Computing Developer Center on MSDN.
PCrawler demonstrates a programming style that uses the Task Parallel Library to achieve better performance and better use of hardware resources. I hope it will be useful as a programming pattern for any TPL developer.
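
As a rough illustration of the TPL style described above, the following minimal sketch processes a list of URLs with Parallel.ForEach and collects the results in a thread-safe dictionary. It is not taken from the PCrawler source: the class ParallelCrawlSketch, the methods FetchPage and Crawl, and the example URLs are all hypothetical, and FetchPage only simulates a download with a short sleep so the sketch runs without network access.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCrawlSketch
{
    // Hypothetical stand-in for an HTTP download (assumption, not PCrawler code).
    // A real crawler would request the page over the network here.
    public static string FetchPage(string url)
    {
        Thread.Sleep(50); // simulate network latency
        return "<html>" + url + "</html>";
    }

    // Fetches every URL in parallel and records the length of each page.
    public static IDictionary<string, int> Crawl(IEnumerable<string> urls, int maxParallelism)
    {
        // ConcurrentDictionary allows safe writes from several threads,
        // so no explicit lock is needed around the shared result.
        var pageSizes = new ConcurrentDictionary<string, int>();

        // Cap the number of concurrent iterations; the TPL schedules the work
        // on ThreadPool threads and partitions the input for us.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(urls, options, url =>
        {
            string html = FetchPage(url);
            pageSizes[url] = html.Length;
        });

        return pageSizes;
    }

    public static void Main()
    {
        var urls = new[]
        {
            "http://example.com/a",
            "http://example.com/b",
            "http://example.com/c"
        };

        IDictionary<string, int> sizes = Crawl(urls, 4);
        foreach (var pair in sizes)
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value + " characters");
        }
    }
}
```

Because each iteration here spends most of its time waiting, the per-item work is large enough to outweigh the scheduling overhead mentioned above. For very cheap loop bodies, a plain foreach would likely be faster. The output order of the final loop is not guaranteed, since the dictionary is filled by several threads.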

Last edited Oct 27, 2013 at 1:17 PM by Geronatsios, version 4