• My PC's cores can synchronize to about 1.000 clock cycles accuracy

    From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Fri Sep 13 18:45:41 2024
    From Newsgroup: comp.lang.c

    #include <iostream>
    #include <barrier>
    #include <thread>
    #include <vector>
    #include <atomic>
    #include <cstdint>
    #include <cstdlib>
    #if defined(_WIN32)
        #include <intrin.h>
    #elif defined(__linux__)
        #include <x86intrin.h>
    #endif

    using namespace std;

    int main()
    {
        unsigned hc = thread::hardware_concurrency();
        barrier bar( hc );
        atomic_uint synch( hc );    // threads that still have to reach the spin gate
        atomic_uint64_t zero( 0 );  // TSC of the first thread through the gate
        atomic_int64_t diffs( 0 );  // sum of absolute TSC differences over all threads
        auto thr = [&]()
        {
            int64_t sum = 0;
            for( unsigned t = 1'000; t; --t )
            {
                bar.arrive_and_wait();
                // spin until the last thread has decremented synch to zero
                if( synch.fetch_sub( 1, memory_order_relaxed ) > 1 )
                    while( synch.load( memory_order_relaxed ) );
                uint64_t tsc = __rdtsc(), expected = 0;
                // the first thread publishes its TSC, the others measure their distance to it;
                // strong CAS, because a spurious failure would leave expected == 0
                if( !zero.compare_exchange_strong( expected, tsc, memory_order_relaxed, memory_order_relaxed ) )
                    sum += abs( (int64_t)(expected - tsc) );
                bar.arrive_and_wait();
                // reset the shared state for the next round
                synch.store( hc );
                zero.store( 0, memory_order_relaxed );
            }
            diffs.fetch_add( sum, memory_order_relaxed );
        };
        vector<jthread> threads;
        threads.reserve( hc - 1 );
        for( unsigned t = hc - 1; t; --t )
            threads.emplace_back( thr );
        thr();
        threads.resize( 0 );    // destroys the jthreads, which joins them
        cout << (double)diffs.load( memory_order_relaxed ) / (1'000.0 * hc) << endl;
    }

    My PC is an AMD 7950X 16-core system.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c,comp.lang.c++ on Fri Sep 13 18:49:51 2024
    From Newsgroup: comp.lang.c

    Sorry, wrong newsgroup.


  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Fri Sep 13 10:16:05 2024
    From Newsgroup: comp.lang.c

    Bonita Montero <Bonita.Montero@gmail.com> writes:

    #include <iostream>
    #include <barrier>
    #include <thread>
    #include <vector>
    #if defined(_WIN32)
    #include <intrin.h>
    #elif defined(__linux__)
    #include <x86intrin.h>
    #endif

    using namespace std;

    [...]

    Wrong newsgroup, shit-for-brains.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Sat Sep 14 05:01:16 2024
    From Newsgroup: comp.lang.c

    On 13.09.2024 at 19:16, Tim Rentsch wrote:
    Bonita Montero <Bonita.Montero@gmail.com> writes:

    #include <iostream>
    #include <barrier>
    #include <thread>
    #include <vector>
    #if defined(_WIN32)
    #include <intrin.h>
    #elif defined(__linux__)
    #include <x86intrin.h>
    #endif

    using namespace std;

    [...]

    Wrong newsgroup, shit-for-brains.

    It's more about the general principle: the interconnect between the CPU cores of my PC is so fast that the cores can synchronize to within about
    1'000 clock cycles.

  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Sep 16 06:43:29 2024
    From Newsgroup: comp.lang.c

    This posting was completely baffling to me, until I realized ...
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c on Mon Sep 16 12:56:36 2024
    From Newsgroup: comp.lang.c

    On 16.09.2024 at 08:43, Lawrence D'Oliveiro wrote:

    This posting was completely baffling to me, until I realized ...

    I'm from Europe and I can handle both types of decimal points.

  • From Paul@nospam@needed.invalid to comp.lang.c on Mon Sep 16 15:15:09 2024
    From Newsgroup: comp.lang.c

    On Mon, 9/16/2024 6:56 AM, Bonita Montero wrote:
    On 16.09.2024 at 08:43, Lawrence D'Oliveiro wrote:

    This posting was completely baffling to me, until I realized ...

    I'm from Europe and I can handle both types of decimal points.


    I'm from "somewhere" and 1000 is 1000 here. Punctuation
    is for Excel spreadsheets :-)

    *******

    The hardware declares this property itself, so in principle you
    don't even have to measure anything.

    "Since the family 10h (Barcelona/Phenom), AMD chips feature a constant TSC,
    which can be driven either by the HyperTransport speed or the highest P state.
    A CPUID bit (Fn8000_0007:EDX_8) advertises this;
    Intel-CPUs also report their invariant TSC on that bit."
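As a rough illustration of the CPUID bit mentioned in the quote, the check could be sketched as below. This assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; the names `edx_has_invariant_tsc` and `cpu_has_invariant_tsc` are made up for this example, and MSVC would need `__cpuid` from `<intrin.h>` instead.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   // GCC/Clang helper header (assumption: not building with MSVC)
#endif

// Bit 8 of EDX from CPUID leaf 0x80000007 is the flag the quote
// refers to as "Fn8000_0007:EDX_8".
constexpr std::uint32_t kInvariantTscBit = 1u << 8;

// Hypothetical helper: decode the flag from a raw EDX value.
constexpr bool edx_has_invariant_tsc(std::uint32_t edx)
{
    return (edx & kInvariantTscBit) != 0;
}

// Hypothetical helper: ask the running CPU.
// Returns false if the leaf is not available or the build is not x86.
inline bool cpu_has_invariant_tsc()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx))
        return false;   // extended leaf not supported
    return edx_has_invariant_tsc(edx);
#else
    return false;
#endif
}
```

Calling `cpu_has_invariant_tsc()` once at startup would be enough; the result cannot change while the program runs.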

    In Linux, there is "constant_tsc" in the CPU feature list.
    Both my machines have it (an Intel machine ten years old,
    an AMD machine two years old). Your machine would list it
    in /proc/cpuinfo.
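The same property can be looked up from a program by scanning the feature list. The sketch below assumes the x86 Linux layout of /proc/cpuinfo, where each logical CPU has a line starting with "flags" followed by a colon and space-separated feature names; `flags_line_has` and `cpuinfo_has_flag` are names invented for this example.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: true if `line` is a "flags : ..." line that
// lists `flag` as a whole word (so "tsc" does not match "constant_tsc").
bool flags_line_has(const std::string& line, const std::string& flag)
{
    if (line.rfind("flags", 0) != 0)        // line must start with "flags"
        return false;
    const auto colon = line.find(':');
    if (colon == std::string::npos)
        return false;
    std::istringstream words(line.substr(colon + 1));
    for (std::string w; words >> w; )
        if (w == flag)
            return true;
    return false;
}

// Hypothetical helper: scan a cpuinfo-style file for the flag.
// Returns false if the file cannot be opened.
bool cpuinfo_has_flag(const std::string& flag,
                      const std::string& path = "/proc/cpuinfo")
{
    std::ifstream in(path);
    for (std::string line; std::getline(in, line); )
        if (flags_line_has(line, flag))
            return true;
    return false;
}
```

On a machine like the ones described, `cpuinfo_has_flag("constant_tsc")` would be expected to return true.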

    So you didn't need to write that code for your 7950X, since
    you could just check the CPU feature bit for the property
    "constant_tsc", AKA invariant TSC. Your CPU is not
    "synchronized" -- the hardware just does not vary across
    the face of the CPU. It's like an entirely different
    feature in a sense.

    The way Wiki puts this:

    "The specific processor configuration determines the behavior.
    Constant TSC behavior ensures that the duration of each clock tick
    is uniform and makes it possible to use the TSC as a wall-clock timer
    even if the processor core changes frequency. This is the
    architectural behavior for all later Intel processors."
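To make the "wall-clock timer" remark concrete: with a constant-rate TSC, a tick count can be turned into seconds once the tick rate is known. The sketch below estimates that rate by comparing `__rdtsc()` against `std::chrono::steady_clock` over a short sleep. It assumes an x86 build with the same intrinsic headers as the program at the top of the thread; `estimate_tsc_hz` and `ticks_to_seconds` are hypothetical names, and the result is only a rough estimate.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#if defined(_WIN32)
    #include <intrin.h>
#else
    #include <x86intrin.h>
#endif

// Hypothetical helper: convert a TSC tick count to seconds,
// given a tick rate in Hz.
inline double ticks_to_seconds(std::uint64_t ticks, double tsc_hz)
{
    return static_cast<double>(ticks) / tsc_hz;
}

// Hypothetical helper: estimate the TSC rate by counting ticks
// across a sleep whose real length is measured with steady_clock.
// Assumes the thread is not migrated to a core with a different TSC offset.
inline double estimate_tsc_hz(std::chrono::milliseconds window)
{
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    const std::uint64_t c0 = __rdtsc();
    std::this_thread::sleep_for(window);
    const std::uint64_t c1 = __rdtsc();
    const auto t1 = clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
}
```

A longer window gives a steadier estimate; something like `estimate_tsc_hz(std::chrono::milliseconds(100))` at startup would usually do for coarse timing.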

    Paul