• Embedding a Checksum in an Image File

    From Rick C <gnuarm.deletethisbit@gmail.com> to comp.arch.embedded on Wed Apr 19 19:06:33 2023
    From Newsgroup: comp.arch.embedded

    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.
    I'm not thinking of anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.
    I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, which you then need to anticipate further... ad infinitum.
    I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy.
    I keep thinking there is a different way of looking at this to achieve the result I want...
    Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file so that it changes X to Y. Given the properties of the modulo N checksum, this would appear to be impossible in the general case, unless... Add another data value, called a checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum to give zero, leaving the original checksum intact.
    This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that.
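A minimal Python sketch of the normalizer idea, assuming the checksum is the sum of 16-bit little-endian words modulo 2^16 and that the image reserves two word-aligned slots (at hypothetical offsets `csum_off` and `norm_off`) that are zero while the checksum is first computed. The normalizer is the two's complement of the embedded value, so the two slots together add nothing modulo 2^16:

```python
import struct

MOD = 1 << 16


def word_sum16(image: bytes) -> int:
    """Sum of 16-bit little-endian words, modulo 2^16 (image length must be even)."""
    words = struct.unpack("<%dH" % (len(image) // 2), image)
    return sum(words) % MOD


def embed_checksum(image: bytearray, csum_off: int, norm_off: int) -> int:
    """Fill two zeroed, word-aligned slots so the image sums to the value stored in it."""
    x = word_sum16(image)                                # X: checksum with both slots zero
    struct.pack_into("<H", image, csum_off, x)           # the value the code will report
    struct.pack_into("<H", image, norm_off, (-x) % MOD)  # normalizer: cancels the added X
    return x


# Usage: 6 bytes of "code" followed by the two reserved, zeroed slots.
image = bytearray(b"\x12\x34\x56\x78\x9a\xbc" + bytes(4))
x = embed_checksum(image, 6, 8)
assert word_sum16(image) == x                 # file checksum == embedded checksum
assert struct.unpack_from("<H", image, 6)[0] == x
```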
    --
    Rick C.
    - Get 1,000 miles of free Supercharging
    - Tesla referral code - https://ts.la/richard11209
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Niklas Holsti <niklas.holsti@tidorum.invalid> to comp.arch.embedded on Thu Apr 20 12:14:58 2023

    On 2023-04-20 5:06, Rick C wrote:
    > This is a bit of the chicken and egg thing. If you want to embed a
    > checksum in a code module to report the checksum, is there a way of
    > doing this? It's a bit like being your own grandfather, I think.
    >
    > I'm not thinking of anything too fancy, like a CRC, but rather a simple
    > modulo N addition, maybe N being 2^16.

    Some decades ago I was involved with a project for an 8052-based device,
    which was required to perform a code-check-sum check at boot.

    We decided to use a byte-per-byte xor checksum and make the correct
    check-sum be zero. We had a code module (possibly in assembler, I don't remember) that defined a one-byte "adjustment" constant in code memory.
    For each new version of the code, we first set the adjustment constant
    to zero, then ran the program, and it usually reported an error at boot because the check-sum was not zero. We then changed the adjustment
    constant to the actual reported checksum, C say, and that zeroed the
    check-sum because C xor C = 0. Bingo. You can use this method to make
    the checksum anything you like, for example hex 55.
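A Python sketch of the adjustment-byte procedure, assuming a byte-wise XOR over the whole image and a single adjustment byte at a known offset (`adj_off` and `target` are illustrative names, not from the original project). Because XOR is its own inverse, the zero-then-patch step needs no iteration:

```python
from functools import reduce
from operator import xor


def xor8(data: bytes) -> int:
    """Byte-per-byte XOR checksum."""
    return reduce(xor, data, 0)


def set_adjustment(image: bytearray, adj_off: int, target: int = 0x00) -> None:
    """Choose the one-byte adjustment constant so the image XORs to `target`."""
    image[adj_off] = 0            # 1. build with the adjustment constant at zero
    c = xor8(image)               # 2. the checksum C the boot check would report
    image[adj_off] = c ^ target   # 3. C xor C = 0; xoring in target then yields target


image = bytearray(b"\x8a\x51\x00\x3c\xe7")   # byte 2 is the adjustment constant
set_adjustment(image, 2)
print(hex(xor8(image)))                      # 0x0
set_adjustment(image, 2, 0x55)
print(hex(xor8(image)))                      # 0x55
```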

    With a more advanced order-sensitive check-sum such as a CRC you could
    use the same method if you also ensure (by linker commands) that the adjustment value is always the last value that enters into the computed check-sum (assuming that the linking order of the other code modules is
    not incidentally changed when the value of the adjustment constant is changed).
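Why position matters for a CRC but not for XOR can be seen with a small Python experiment. The module contents are made up, and `binascii.crc_hqx` with a start value of 0 computes the XMODEM flavour of CRC-16-CCITT. Reordering two modules leaves the XOR unchanged but changes the CRC:

```python
import binascii
from functools import reduce
from operator import xor


def xor8(data: bytes) -> int:
    return reduce(xor, data, 0)


def crc16_xmodem(data: bytes) -> int:
    # binascii.crc_hqx with start value 0 = CRC-16/XMODEM (poly 0x1021, MSB first)
    return binascii.crc_hqx(data, 0)


module_a, module_b, adjust = b"\x01\x02", b"\x03", b"\x5a"

link1 = module_a + module_b + adjust
link2 = module_b + module_a + adjust      # same modules, different link order

print(xor8(link1) == xor8(link2))                  # True: XOR ignores byte order
print(crc16_xmodem(link1) == crc16_xmodem(link2))  # False: the CRC depends on it
```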

  • From Peter Heitzer <peter.heitzer@rz.uni-regensburg.de> to comp.arch.embedded on Thu Apr 20 11:30:34 2023

    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    > This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.
    >
    > I'm not thinking of anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.

    What about putting the following structure at a fixed address at the end of ROM?
    <startaddr><len><checksum>
    Your check function then, for example, does a 16-bit sum of the bytes from <startaddr> to <startaddr>+<len>-1 and compares it with <checksum>.
    <startaddr>, <len> and <checksum> can be evaluated at compile time.
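A host-side Python model of this scheme, assuming a little-endian trailer made of a 32-bit start address, a 32-bit length and a 16-bit checksum. The field widths, byte order, 0xFF fill and base address in the usage lines are all assumptions. `check_rom` mirrors what the firmware's check function would do; because the trailer lies outside the summed range, storing the checksum does not disturb it:

```python
import struct

TRAILER = struct.Struct("<IIH")   # <startaddr><len><checksum>, assumed layout


def sum16(data: bytes) -> int:
    """16-bit sum of bytes (carries beyond 16 bits are discarded)."""
    return sum(data) & 0xFFFF


def build_rom(code: bytes, rom_base: int, rom_size: int) -> bytearray:
    """Place code at rom_base, fill the rest with 0xFF, put the trailer at the very end."""
    if len(code) > rom_size - TRAILER.size:
        raise ValueError("code overlaps the trailer")
    rom = bytearray(b"\xff" * rom_size)
    rom[:len(code)] = code
    TRAILER.pack_into(rom, rom_size - TRAILER.size, rom_base, len(code), sum16(code))
    return rom


def check_rom(rom: bytes, rom_base: int) -> bool:
    """Read the trailer from its fixed address and re-sum the range it describes."""
    start, length, csum = TRAILER.unpack_from(rom, len(rom) - TRAILER.size)
    offset = start - rom_base
    return sum16(rom[offset:offset + length]) == csum


rom = build_rom(b"\x01\x02\x03\x04", rom_base=0x08000000, rom_size=64)
print(check_rom(rom, 0x08000000))   # True
rom[1] ^= 0xFF                      # corrupt one code byte
print(check_rom(rom, 0x08000000))   # False
```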
    --
    Dipl.-Inform(FH) Peter Heitzer, peter.heitzer@rz.uni-regensburg.de
  • From dalai lamah <antonio12358@hotmail.com> to comp.arch.embedded on Thu Apr 20 13:47:48 2023

    One fine day Rick C typed:

    > This is a bit of the chicken and egg thing. If you want to embed a
    > checksum in a code module to report the checksum, is there a way of
    > doing this? It's a bit like being your own grandfather, I think.
    >
    > I'm not thinking of anything too fancy, like a CRC, but rather a simple
    > modulo N addition, maybe N being 2^16.
    >
    > I keep thinking of using a placeholder, but that doesn't seem to work
    > out in any useful way. Even if you try to anticipate the impact of
    > adding the checksum, that only gives you a different checksum, which you
    > then need to anticipate further... ad infinitum.

    I'm probably not understanding what you mean, but normally the checksum is stored in a memory section which is not subjected to the checksum
    calculation itself.

    The actual implementation depends on the tools you are using. Many linkers support this directly: you specify the memory section(s) subjected to
    checksum calculation, the type of checksum (CRC16, CRC32 etc) and the
    memory section that will store the checksum.

    Here is a technical note for IAR: https://www.iar.com/knowledge/support/technical-notes/general/checksum-calculation-with-xlink/

    A "poor man's" solution is to do it manually:

    - In the source code, declare your checksum, initializing it to a known, fixed
    value (e.g. 0xDEADBEEF)
    - Run the program with a debugger; set a breakpoint where it calculates the checksum (and fails), and write down the correct checksum
    - Using a binary editor, find the fixed value in the executable binary,
    and replace it with the correct value.
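The binary-editor step can be scripted. This Python sketch assumes the checksum field is a 32-bit little-endian word, that the placeholder 0xDEADBEEF occurs exactly once in the file, and that the firmware computes a zlib-style CRC-32 over the image with those four bytes skipped (the section not subject to the checksum):

```python
import struct
import zlib

PLACEHOLDER = struct.pack("<I", 0xDEADBEEF)   # known value declared in the source


def patch_checksum(image: bytes) -> bytes:
    """Replace the placeholder with a CRC-32 of every other byte of the image."""
    pos = image.find(PLACEHOLDER)
    if pos < 0 or image.find(PLACEHOLDER, pos + 1) >= 0:
        raise ValueError("placeholder must occur exactly once")
    # The checksum field is excluded from the calculation, so writing the
    # result into it does not change the result.
    rest = image[:pos] + image[pos + 4:]
    crc = zlib.crc32(rest) & 0xFFFFFFFF
    return image[:pos] + struct.pack("<I", crc) + image[pos + 4:]


firmware = b"\x10\x20\x30" + PLACEHOLDER + b"\x40\x50"
patched = patch_checksum(firmware)
assert struct.unpack_from("<I", patched, 3)[0] == zlib.crc32(b"\x10\x20\x30\x40\x50")
```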
    --
    I flex my muscles and I am in the void.
  • From Rick C <gnuarm.deletethisbit@gmail.com> to comp.arch.embedded on Thu Apr 20 06:04:21 2023

    On Thursday, April 20, 2023 at 7:47:54 AM UTC-4, dalai lamah wrote:
    > One fine day Rick C typed:
    >> This is a bit of the chicken and egg thing. If you want to embed a
    >> checksum in a code module to report the checksum, is there a way of
    >> doing this? It's a bit like being your own grandfather, I think.
    >>
    >> I'm not thinking of anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.
    >>
    >> I keep thinking of using a placeholder, but that doesn't seem to work
    >> out in any useful way. Even if you try to anticipate the impact of
    >> adding the checksum, that only gives you a different checksum, which you then need to anticipate further... ad infinitum.
    > I'm probably not understanding what you mean, but normally the checksum is stored in a memory section which is not subjected to the checksum calculation itself.
    Yes, I didn't explain it clearly. I am not looking for a way to calculate the checksum from a processor. That would be trivial. I want to embed the checksum in the code, so that it can be provided at run time as an ID, a way to validate the version number.
    > The actual implementation depends on the tools you are using. Many linkers support this directly: you specify the memory section(s) subjected to checksum calculation, the type of checksum (CRC16, CRC32 etc) and the
    > memory section that will store the checksum.
    I wish to perform this checksum on the executable file.
    > Here is a technical note for IAR: https://www.iar.com/knowledge/support/technical-notes/general/checksum-calculation-with-xlink/
    >
    > A "poor man's" solution is to do it manually:
    >
    > - In the source code, declare your checksum, initializing it to a known, fixed value (e.g. 0xDEADBEEF)
    > - Run the program with a debugger; set a breakpoint where it calculates the checksum (and fails), and write down the correct checksum
    > - Using a binary editor, find the fixed value in the executable binary,
    > and replace it with the correct value.
    Yeah, this is not useful, because changing the value stored changes the checksum. It also makes assumptions about the target.
    Maybe this was not the best group to ask the question in. When I started writing the question, I thought this was more of a math problem, and that the embedded community had already dealt with it.
    --
    Rick C.
  • From Rick C <gnuarm.deletethisbit@gmail.com> to comp.arch.embedded on Thu Apr 20 06:18:25 2023

    On Thursday, April 20, 2023 at 5:15:04 AM UTC-4, Niklas Holsti wrote:
    > On 2023-04-20 5:06, Rick C wrote:
    >> This is a bit of the chicken and egg thing. If you want to embed a
    >> checksum in a code module to report the checksum, is there a way of
    >> doing this? It's a bit like being your own grandfather, I think.
    >>
    >> I'm not thinking of anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.
    > Some decades ago I was involved with a project for an 8052-based device, which was required to perform a code-check-sum check at boot.
    >
    > We decided to use a byte-per-byte xor checksum and make the correct check-sum be zero. We had a code module (possibly in assembler, I don't remember) that defined a one-byte "adjustment" constant in code memory.
    > For each new version of the code, we first set the adjustment constant
    > to zero, then ran the program, and it usually reported an error at boot because the check-sum was not zero. We then changed the adjustment
    > constant to the actual reported checksum, C say, and that zeroed the check-sum because C xor C = 0. Bingo. You can use this method to make
    > the checksum anything you like, for example hex 55.
    >
    > With a more advanced order-sensitive check-sum such as a CRC you could
    > use the same method if you also ensure (by linker commands) that the adjustment value is always the last value that enters into the computed check-sum (assuming that the linking order of the other code modules is
    > not incidentally changed when the value of the adjustment constant is changed).
    Yes, it had occurred to me that a simple checksum could be used with adjustment codes. But I don't want the checksum to be set to some value in this way. I would like to embed the checksum generated from the file. The way to do this is to embed the checksum in the spot where it can be read for reporting. Then another value can be embedded elsewhere that complements the checksum, keeping the file checksum constant.
    Your mention of the XOR checksum makes me realize that if I use addition, rather than XOR, a 16-bit checksum only has a complement if the data used in the calculation are 16-bit quantities. If the 16-bit checksum is calculated using 8-bit data, there will be a carry out of the lower 8 bits, changing the final checksum. The XOR checksum is really the equivalent of 8 separate bit-level checksums. This has the shortcoming that, while any single-bit change is detected, two changes in the same bit position of two different bytes are not. But since I'm not trying to protect against changes, this isn't really a problem. I'm using this as a verification of the version number.
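The carry problem can be checked numerically. This small Python sketch assumes little-endian storage and an arbitrary example checksum of 0x1234; it stores the checksum followed by its 16-bit two's complement and sums the pair both ways:

```python
import struct


def word_sum16(data: bytes) -> int:
    """Sum of 16-bit little-endian words, modulo 2^16."""
    return sum(struct.unpack("<%dH" % (len(data) // 2), data)) & 0xFFFF


def byte_sum16(data: bytes) -> int:
    """Sum of individual bytes, kept in a 16-bit accumulator."""
    return sum(data) & 0xFFFF


c = 0x1234                                      # example checksum value
pair = struct.pack("<HH", c, (-c) & 0xFFFF)     # checksum, then its two's complement

print(hex(word_sum16(pair)))   # 0x0   (0x1234 + 0xedcc = 0x10000; the carry falls off)
print(hex(byte_sum16(pair)))   # 0x1ff (0x34 + 0x12 + 0xcc + 0xed: the bytes do not cancel)
```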
    --
    Rick C.
  • From David Brown <david.brown@hesbynett.no> to comp.arch.embedded on Thu Apr 20 16:46:35 2023

    On 20/04/2023 04:06, Rick C wrote:
    > This is a bit of the chicken and egg thing. If you want to embed a
    > checksum in a code module to report the checksum, is there a way of
    > doing this? It's a bit like being your own grandfather, I think.
    > [...]

    I am not sure what your intended use-case is here. But it is very
    common to add a checksum of some sort to binary image files after
    generating them. This is done post-link. You have a struct in your
    read-only data that you link at a known fixed point in the binary. Your post-link patcher can read this struct (for example, to get the program version number that is then used to rename the final image file). It
    can modify the struct (such as inserting the length of the image). Then
    it calculates a CRC and appends it to the end of the image.
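A Python sketch of such a post-link patcher. The header layout (a 4-byte magic `HDR!`, a 32-bit version and a 32-bit length field left at zero by the linker), locating the header by searching for the magic instead of using a fixed address, the zlib CRC-32 and the output file name are all illustrative assumptions:

```python
import struct
import zlib

MAGIC = b"HDR!"
HEADER = struct.Struct("<4sII")   # magic, version, image length (assumed layout)


def post_link_patch(image: bytes):
    """Read the version, fill in the length, append a CRC-32; return (image, name)."""
    buf = bytearray(image)
    off = buf.find(MAGIC)          # stands in for the known, fixed link address
    if off < 0:
        raise ValueError("header struct not found")
    magic, version, _ = HEADER.unpack_from(buf, off)
    HEADER.pack_into(buf, off, magic, version, len(buf))   # insert the image length
    buf += struct.pack("<I", zlib.crc32(buf))               # append CRC over everything
    return bytes(buf), "firmware_v%d.bin" % version         # name derived from version


linked = b"\x00\x01" + HEADER.pack(MAGIC, 7, 0) + b"code..."
final, name = post_link_patch(linked)
print(name)   # firmware_v7.bin
```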
  • From George Neuner <gneuner2@comcast.net> to comp.arch.embedded on Thu Apr 20 11:33:19 2023

    On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C <gnuarm.deletethisbit@gmail.com> wrote:

    > This is a bit of the chicken and egg thing. If you want to embed a
    > checksum in a code module to report the checksum, is there a way of
    > doing this? It's a bit like being your own grandfather, I think.

    Take a look at the old xmodem/ymodem CRC. It was designed such that
    when the CRC was sent immediately following the data, a receiver
    computing CRC over the whole incoming packet (data and CRC both) would
    get a result of zero.

    But AFAIK it doesn't work with CCITT equation(s) - you have to use xmodem/ymodem.
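A bitwise Python sketch of the XMODEM property, assuming the usual CRC-16/XMODEM parameters (polynomial 0x1021, MSB first, initial value 0, no final XOR) and the CRC appended high byte first. The remainder comes out zero because appending the CRC C turns the message M into M·x^16 + C, which the generator polynomial divides exactly:

```python
def crc16_xmodem(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16/XMODEM: poly 0x1021, MSB first, init 0, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


packet = b"123456789"
crc = crc16_xmodem(packet)                 # 0x31c3 for this standard check string
frame = packet + crc.to_bytes(2, "big")    # CRC sent high byte first
print(hex(crc), crc16_xmodem(frame))       # 0x31c3 0
```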


    > I'm not thinking of anything too fancy, like a CRC, but rather a simple
    > modulo N addition, maybe N being 2^16.

    Sorry, I don't know a way to do it with a modular checksum.
    YMMV, but I think 16-bit CRC is pretty simple.

    George
  • From Rick C <gnuarm.deletethisbit@gmail.com> to comp.arch.embedded on Thu Apr 20 09:45:59 2023

    On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner wrote:
    > On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
    > <gnuarm.del...@gmail.com> wrote:
    >
    >> This is a bit of the chicken and egg thing. If you want to embed a
    >> checksum in a code module to report the checksum, is there a way of
    >> doing this? It's a bit like being your own grandfather, I think.
    > Take a look at the old xmodem/ymodem CRC. It was designed such that
    > when the CRC was sent immediately following the data, a receiver
    > computing CRC over the whole incoming packet (data and CRC both) would
    > get a result of zero.
    >
    > But AFAIK it doesn't work with CCITT equation(s) - you have to use xmodem/ymodem.
    >> I'm not thinking of anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.
    > Sorry, I don't know a way to do it with a modular checksum.
    > YMMV, but I think 16-bit CRC is pretty simple.
    >
    > George
    CRC is not complicated, but I would not know how to calculate an inserted value to force the resulting CRC to zero. How do you do that?
    Even so, I'm not trying to validate the file. I'm trying to come up with a substitute for a time stamp or version number. I don't want to have to rely on my consistency in handling the version number correctly. This would be a backup in case more than one differing version was released, even if only within the "lab". A checksum that could be read by the controlling software would do the job.
    I have run into this before, where the version number was not a 100% indication of the uniqueness of an executable. The checksum would be a second indicator.
    I should mention that I'm not looking for a solution that relies on any specific details of the tools.
    --
    Rick C.
  • From Tauno Voipio <tauno.voipio@notused.fi.invalid> to comp.arch.embedded on Thu Apr 20 20:17:07 2023

    On 20.4.2023 18.33, George Neuner wrote:
    > On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    >
    >> This is a bit of the chicken and egg thing. If you want to embed a
    >> checksum in a code module to report the checksum, is there a way of
    >> doing this? It's a bit like being your own grandfather, I think.
    >
    > Take a look at the old xmodem/ymodem CRC. It was designed such that
    > when the CRC was sent immediately following the data, a receiver
    > computing CRC over the whole incoming packet (data and CRC both) would
    > get a result of zero.
    >
    > But AFAIK it doesn't work with CCITT equation(s) - you have to use xmodem/ymodem.
    >
    >> I'm not thinking of anything too fancy, like a CRC, but rather a simple
    >> modulo N addition, maybe N being 2^16.
    >
    > Sorry, I don't know a way to do it with a modular checksum.
    > YMMV, but I think 16-bit CRC is pretty simple.
    >
    > George

    The method to check for a proper constant value after the whole
    block and CRC are received and put through the generator works
    with the CRC-CCITT (actually ITU-T). The proper final value
    depends on the initial CRC and whether the CRC is inverted before
    sending. The limitation is that the CRC has to be sent least
    significant octet first.

    For a reference, see RFC1662, Appendix C.
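The RFC 1662 check can be sketched in Python. This is a bitwise equivalent of the RFC's table-driven `pppfcs16()` (reflected polynomial 0x8408, initial value 0xFFFF). The FCS is complemented before sending and appended least significant octet first; running the generator over data plus FCS then always ends at the constant 0xF0B8 rather than zero:

```python
PPPINITFCS16 = 0xFFFF   # initial FCS value (RFC 1662)
PPPGOODFCS16 = 0xF0B8   # good final FCS value (RFC 1662)


def pppfcs16(fcs: int, data: bytes) -> int:
    """Bitwise equivalent of RFC 1662's pppfcs16(): LSB first, reflected poly 0x8408."""
    for byte in data:
        fcs ^= byte
        for _ in range(8):
            fcs = (fcs >> 1) ^ 0x8408 if fcs & 1 else fcs >> 1
    return fcs


def append_fcs(data: bytes) -> bytes:
    fcs = pppfcs16(PPPINITFCS16, data) ^ 0xFFFF   # complement before sending
    return data + fcs.to_bytes(2, "little")        # least significant octet first


frame = append_fcs(b"123456789")
print(hex(pppfcs16(PPPINITFCS16, frame)))   # 0xf0b8, whatever the payload
```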
    --

    -TV

  • From David Brown <david.brown@hesbynett.no> to comp.arch.embedded on Thu Apr 20 22:26:51 2023

    On 20/04/2023 18:45, Rick C wrote:
    > On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner
    > wrote:
    >> On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
    >> <gnuarm.del...@gmail.com> wrote:
    >>
    >>> This is a bit of the chicken and egg thing. If you want to embed
    >>> a checksum in a code module to report the checksum, is there a
    >>> way of doing this? It's a bit like being your own grandfather, I
    >>> think.
    >> Take a look at the old xmodem/ymodem CRC. It was designed such
    >> that when the CRC was sent immediately following the data, a
    >> receiver computing CRC over the whole incoming packet (data and CRC
    >> both) would get a result of zero.
    >>
    >> But AFAIK it doesn't work with CCITT equation(s) - you have to use
    >> xmodem/ymodem.
    >>> I'm not thinking of anything too fancy, like a CRC, but rather a
    >>> simple modulo N addition, maybe N being 2^16.
    >> Sorry, I don't know a way to do it with a modular checksum. YMMV,
    >> but I think 16-bit CRC is pretty simple.
    >>
    >> George
    >
    > CRC is not complicated, but I would not know how to calculate an
    > inserted value to force the resulting CRC to zero. How do you do
    > that?

    You "insert" the value at the end. Anything else is insane.

    CRCs are quite good hashes, for suitably sized data. There are perhaps
    some special cases, but basically you'd be doing trial-and-error
    searches to find an inserted value that gives you a zero CRC overall.
    2^16 is not an overwhelming search space, but the whole idea is pointless.
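For the record, such a search is easy to sketch in Python. This assumes CRC-16/XMODEM (computed with `binascii.crc_hqx` from a start value of 0) and a two-byte slot somewhere in the middle of the image; the prefix and suffix contents are made up. For this CRC exactly one 16-bit value works, because the final CRC depends on the slot through an invertible affine map over GF(2), so the search always succeeds within 65,536 trials:

```python
import binascii


def find_zeroing_insert(prefix: bytes, suffix: bytes) -> bytes:
    """Trial-and-error search for the 2 bytes that, placed between prefix and
    suffix, make the CRC-16/XMODEM of the whole image zero."""
    state = binascii.crc_hqx(prefix, 0)          # CRC register after the prefix
    for value in range(1 << 16):
        slot = value.to_bytes(2, "big")
        if binascii.crc_hqx(suffix, binascii.crc_hqx(slot, state)) == 0:
            return slot
    raise RuntimeError("no 16-bit value found")  # not expected for this CRC


image_head, image_tail = b"startup code", b"more code and data"
slot = find_zeroing_insert(image_head, image_tail)
print(binascii.crc_hqx(image_head + slot + image_tail, 0))   # 0
```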


    > Even so, I'm not trying to validate the file. I'm trying to come up
    > with a substitute for a time stamp or version number. I don't want
    > to have to rely on my consistency in handling the version number
    > correctly. This would be a backup in case more than one differing
    > version was released, even if only within the "lab". A checksum that
    > could be read by the controlling software would do the job.

    A CRC is fine for that.


    > I have run into this before, where the version number was not a 100%
    > indication of the uniqueness of an executable. The checksum would be
    > a second indicator.
    >
    > I should mention that I'm not looking for a solution that relies on
    > any specific details of the tools.


    A table-based CRC is easy, runs quickly, and can be quickly ported to
    pretty much any language (the C and Python code, for example, is almost
    the same).
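A table-driven CRC along those lines, sketched in Python for CRC-16/XMODEM (polynomial 0x1021, initial value 0; these parameters are an assumption, since no variant is named here). The 256-entry table replaces the inner bit loop, and each statement maps almost one-to-one onto C:

```python
def make_table(poly: int = 0x1021) -> list:
    """Precompute the CRC of every possible leading byte (MSB-first)."""
    table = []
    for i in range(256):
        crc = i << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
        table.append(crc)
    return table


TABLE = make_table()


def crc16_table(data: bytes, crc: int = 0) -> int:
    """One table lookup per byte; `crc` lets a calculation continue across calls."""
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc


print(hex(crc16_table(b"123456789")))   # 0x31c3 (CRC-16/XMODEM check value)
```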
  • From George Neuner <gneuner2@comcast.net> to comp.arch.embedded on Thu Apr 20 16:44:10 2023

    On Thu, 20 Apr 2023 09:45:59 -0700 (PDT), Rick C <gnuarm.deletethisbit@gmail.com> wrote:

    > On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner wrote:
    >> On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
    >> <gnuarm.del...@gmail.com> wrote:
    >>
    >>> This is a bit of the chicken and egg thing. If you want to embed a
    >>> checksum in a code module to report the checksum, is there a way of
    >>> doing this? It's a bit like being your own grandfather, I think.
    >>
    >> Take a look at the old xmodem/ymodem CRC. It was designed such that
    >> when the CRC was sent immediately following the data, a receiver
    >> computing CRC over the whole incoming packet (data and CRC both) would
    >> get a result of zero.
    >>
    >> But AFAIK it doesn't work with CCITT equation(s) - you have to use
    >> xmodem/ymodem.
    >>> I'm not thinking of anything too fancy, like a CRC, but rather a simple
    >>> modulo N addition, maybe N being 2^16.
    >> Sorry, I don't know a way to do it with a modular checksum.
    >> YMMV, but I think 16-bit CRC is pretty simple.
    >>
    >> George
    >
    > CRC is not complicated, but I would not know how to calculate an
    > inserted value to force the resulting CRC to zero. How do you do
    > that?

    It's implicit in the equation they chose. I don't know how it works -
    just that it does.


    You have some block of data |....data....|

    You compute CRC on the data block and then append the resulting value
    to the end of the block. xmodem CRC is 16-bit, so it adds 2 bytes to
    the data.

    So now you have a new extended block |....data....|crc|

    Now if you compute a new CRC on the extended block, the resulting
    value /should/ come out to zero. If it doesn't, either your data or
    the original CRC value appended to it has been changed/corrupted.
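The corruption case can be tried out in Python: build |....data....|crc| with CRC-16/XMODEM (via `binascii.crc_hqx` with start value 0; the payload is arbitrary), then flip each bit of the extended block in turn. Every single-bit error leaves a nonzero remainder:

```python
import binascii


def crc16_xmodem(data: bytes) -> int:
    return binascii.crc_hqx(data, 0)    # CRC-16/XMODEM


data = b"....data...."
block = data + crc16_xmodem(data).to_bytes(2, "big")   # |....data....|crc|
assert crc16_xmodem(block) == 0                          # intact block checks to zero

# Flip every single bit in turn (data or CRC) and recompute.
detected = 0
for i in range(len(block) * 8):
    damaged = bytearray(block)
    damaged[i // 8] ^= 0x80 >> (i % 8)
    if crc16_xmodem(bytes(damaged)) != 0:
        detected += 1
print(detected, "of", len(block) * 8, "single-bit errors detected")   # 112 of 112 ...
```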


    > Even so, I'm not trying to validate the file. I'm trying to come up
    > with a substitute for a time stamp or version number. I don't want
    > to have to rely on my consistency in handling the version number
    > correctly. This would be a backup in case more than one differing
    > version was released, even if only within the "lab". A checksum that
    > could be read by the controlling software would do the job.

    I've actually done this: in the early 90s I designed a system that
    used a CRC based scheme to identify load modules and track
    inter-module code dependencies.

    I computed both 16-bit Xmodem and CCITT CRCs on the modules and
    concatenated the two values into a 32-bit identifier. That identifier
    then was used to sign the module and to demand load (or unload) it
    when needed.

    At the time it worked quite well: the system had quite limited memory,
    so code modules were small enough that even a 16-bit CRC could
    uniquely identify most/all of them. Combining the two different CRCs
    into a 32-bit identifier provided more than enough uniqueness, it was
    fast and easy to compute, and it saved a lot of space vs using
    something with stronger guarantees like a UUID or crypto-strength
    signing hash.
    [A lot of the hashing functions available today either didn't exist or
    just weren't widely known back then. And still most of them that even
    have 32-bit variants are weak in guarantees for those variants.]
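A Python sketch of such a 32-bit identifier. Which "CCITT" variant was used isn't stated; this assumes the reflected one (CRC-16/KERMIT: polynomial 0x1021 processed LSB first as 0x8408, initial value 0) alongside XMODEM. One reason to prefer a reflected variant here: a non-reflected CRC with the same polynomial that differed from XMODEM only in its initial value would differ from it by an amount that depends only on the data length, so for equal-length modules the second half would add no uniqueness:

```python
import binascii


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: poly 0x1021, MSB first, init 0, no final XOR."""
    return binascii.crc_hqx(data, 0)


def crc16_kermit(data: bytes) -> int:
    """CRC-16/KERMIT: same polynomial processed LSB first (reflected 0x8408), init 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8408 if crc & 1 else crc >> 1
    return crc


def module_id(image: bytes) -> int:
    """32-bit identifier: XMODEM CRC in the high half, KERMIT CRC in the low half."""
    return (crc16_xmodem(image) << 16) | crc16_kermit(image)


print("%08x" % module_id(b"123456789"))
```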


    > I have run into this before, where the version number was not a 100%
    > indication of the uniqueness of an executable. The checksum would be
    > a second indicator.

    I made it the basis of dependency checking. Version numbers were
    secondary and for the benefit of the programmer.


    > I should mention that I'm not looking for a solution that relies on
    > any specific details of the tools.

    YMMV.
    George
  • From George Neuner <gneuner2@comcast.net> to comp.arch.embedded on Thu Apr 20 16:49:20 2023

    On Thu, 20 Apr 2023 20:17:07 +0300, Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:

    > The method to check for a proper constant value after the whole
    > block and CRC are received and put through the generator works
    > with the CRC-CCITT (actually ITU-T). The proper final value
    > depends on the initial CRC and whether the CRC is inverted before
    > sending. The limitation is that the CRC has to be sent least
    > significant octet first.
    >
    > For a reference, see RFC1662, Appendix C.

    I remember seeing an explanation of it decades ago, but I never would
    have been able to find it again.

    Thanks,
    George
  • From Richard Damon <Richard@Damon-Family.org> to comp.arch.embedded on Thu Apr 20 22:09:29 2023

    On 4/19/23 10:06 PM, Rick C wrote:
    > This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.
    > [...]

    IF I understand you correctly, what you want is for the file to compute
    to some "checksum" that comes from the basic contents of the file, and
    then you want to add the "checksum" into the file so the program itself
    can print its checksum.

    One fact to remember is that "cryptographic hashes" were invented
    because it was too easy to create a faked file that matches a
    non-cryptographic hash/checksum, so that couldn't be a key to make sure
    you really had the right file in the presence of a determined enemy, but
    the checksums were good enough to catch "random" errors.

    This means that you can add the checksum into the file, and some
    additional bytes (likely at the end), and by knowing the properties of the checksum algorithm, compute a value for those extra bytes such that they
    "undo" the changes caused by adding the checksum bytes to the file.

    I'm not sure exactly how to compute these, but the key is that you add something at the end of the file to get the checksum back to what the
    original file had before you added the checksum into the file.
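For a CRC, the extra bytes can be computed directly instead of searched for. This Python sketch assumes CRC-16/XMODEM (polynomial 0x1021, MSB first, initial value 0, no final XOR, i.e. `binascii.crc_hqx(data, 0)`), a two-byte slot for the reported value and two appended bytes; `unshift` and `restoring_tail` are illustrative names. Appending two bytes X to data whose CRC register holds S leaves the register at (S xor X) pushed through 16 zero-bit steps, so running those steps backwards from the wanted value gives X:

```python
import binascii

POLY = 0x1021


def crc16_xmodem(data: bytes) -> int:
    return binascii.crc_hqx(data, 0)


def unshift(reg: int, nbits: int) -> int:
    """Run the MSB-first CRC register backwards over nbits zero input bits.
    Works because POLY has bit 0 set: a step XORed POLY in exactly when the
    resulting register has its lowest bit set."""
    for _ in range(nbits):
        if reg & 1:
            reg = ((reg ^ POLY) >> 1) | 0x8000
        else:
            reg >>= 1
    return reg


def restoring_tail(data: bytes, target: int) -> bytes:
    """Two bytes to append so that CRC-16/XMODEM(data + tail) == target."""
    state = crc16_xmodem(data)
    return (unshift(target, 16) ^ state).to_bytes(2, "big")


# Usage: compute the CRC with a zeroed 2-byte slot, embed it, then append
# two bytes that bring the CRC of the whole file back to that same value.
image = bytearray(b"hdr" + bytes(2) + b"program body....")
original = crc16_xmodem(bytes(image))
image[3:5] = original.to_bytes(2, "big")          # the value the program reports
image += restoring_tail(bytes(image), original)
assert crc16_xmodem(bytes(image)) == original     # file CRC == embedded CRC
```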
  • From Rick C <gnuarm.deletethisbit@gmail.com> to comp.arch.embedded on Thu Apr 20 19:41:47 2023

    On Thursday, April 20, 2023 at 10:09:35 PM UTC-4, Richard Damon wrote:

    IF I understand you correctly, what you want is for the file to compute
    to some "checksum" that comes from the basic contents of the file, and
    then you want to add the "checksum" into the file so the program itself
    can print its checksum.

    One fact to remember is that "cryptographic hashes" were invented
    because it was too easy to create a faked file that matches a non-cryptographic hash/checksum, so that couldn't be relied on to make sure
    you really had the right file in the presence of a determined enemy, but
    the checksums were good enough to catch "random" errors.

    This means that you can add the checksum into the file, and some
    additional bytes (likely at the end), and by knowing the properties of the checksum algorithm, compute a value for those extra bytes such that they "undo" the changes caused by adding the checksum bytes to the file.

    I'm not sure exactly how to compute these, but the key is that you add something at the end of the file to get the checksum back to what the original file had before you added the checksum into the file.
    Yeah, for a simple checksum, I think that would be easy, at least if "checksum" means a bitwise XOR operation. If the checksum and extra bytes are both 16 bits, this would also work for an arithmetic checksum where each 16-bit word is added into the checksum. All the carries would cascade out of the upper 16 bits from adding the inserted checksum and its 2's complement.
    I don't even want to think about using a CRC to try to do this.
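A minimal C sketch of the "checksum plus normalizer" idea for a 16-bit additive checksum. It assumes the image is handled as an array of 16-bit words and that the build reserves two zeroed words, one for the reported checksum and one for the compensating value; the slot indices are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Additive checksum: sum of 16-bit words, modulo 2^16. */
static uint16_t sum16(const uint16_t *words, size_t n)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint16_t)(sum + words[i]);
    return sum;
}

/* Stores the checksum S of the image in words[csum_idx] and its two's
   complement in words[comp_idx]. Both slots must be zero on entry, so S
   is computed as if neither value were present. Afterwards the image
   sums to S + S + (-S) == S (mod 2^16), i.e. to the value it carries. */
static uint16_t embed_sum16(uint16_t *words, size_t n,
                            size_t csum_idx, size_t comp_idx)
{
    uint16_t s = sum16(words, n);
    words[csum_idx] = s;                   /* value the program reports */
    words[comp_idx] = (uint16_t)(0u - s);  /* cancels the added S */
    return s;
}
```

For an XOR checksum the same trick works with the compensator set to a second copy of S, since S ^ S == 0.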
    --
    Rick C.
    +- Get 1,000 miles of free Supercharging
    +- Tesla referral code - https://ts.la/richard11209
  • From Don Y@blockedofcourse@foo.invalid to comp.arch.embedded on Thu Apr 20 22:37:07 2023

    On 4/20/2023 1:44 PM, George Neuner wrote:
    You have some block of data |....data....|

    You compute CRC on the data block and then append the resulting value
    ---------------------------------------------^^^^^^
    to the end of the block. xmodem CRC is 16-bit, so it adds 2 bytes to
    the data.

    Exactly. You *don't* drag the "extra bits" into the initial
    CRC calculation but *do* into the CRC *verification*. Easy
    peasy (since forever).

    [Think about it: you're performing a division operation
    and the residual is the "remainder".]
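To make the "remainder" point concrete, here is a bitwise C sketch of the XMODEM flavour of CRC-16 (polynomial 0x1021, initial value 0, no bit reflection, no final XOR). Other CRC variants may need the opposite byte order or end up at a nonzero constant instead. Appending the CRC high byte first makes the CRC over the extended block come out to zero:

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16 with XMODEM parameters: poly 0x1021, init 0x0000, MSB first,
   no final XOR. */
static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Appends crc16_xmodem(buf[0..len)) high byte first, matching the
   MSB-first shift direction. buf must have room for len + 2 bytes.
   Returns the new length. */
static size_t append_crc16(uint8_t *buf, size_t len)
{
    uint16_t crc = crc16_xmodem(buf, len);
    buf[len]     = (uint8_t)(crc >> 8);
    buf[len + 1] = (uint8_t)(crc & 0xFF);
    return len + 2;
}
```

Running crc16_xmodem over the extended block then returns 0 whenever neither the data nor the appended CRC has changed.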

    Note that you want to choose a polynomial that doesn't
    give you a "win" result for "obviously" corrupt data.
    E.g., if data is all zeros or all 0xFF (as these sorts of
    conditions can happen with hardware failures) you probably
    wouldn't want a "success" indication!

    You can also "salt" the calculation so that the residual
    is deliberately nonzero. So, for example, "success" is
    indicated by a residual of 0x474E. :>
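One way to get a deliberately nonzero "success" value is to give the CRC a nonzero initial value and final XOR. Whether that is what "salting" means here is an assumption, not necessarily Don's method. With such parameters the check over |data|crc| yields a fixed constant that does not depend on the data. 0x474E is only Don's illustration; the constant below depends on the parameters chosen:

```c
#include <stddef.h>
#include <stdint.h>

/* MSB-first CRC-16 over polynomial 0x1021 with a configurable initial
   register value and final XOR. */
static uint16_t crc16_param(const uint8_t *data, size_t len,
                            uint16_t init, uint16_t xorout)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return (uint16_t)(crc ^ xorout);
}

/* Appends the CRC of buf[0..len) big-endian (buf needs len + 2 bytes),
   then returns what a verifier gets by running the same CRC over the
   whole extended block. */
static uint16_t append_and_check(uint8_t *buf, size_t len,
                                 uint16_t init, uint16_t xorout)
{
    uint16_t crc = crc16_param(buf, len, init, xorout);
    buf[len]     = (uint8_t)(crc >> 8);
    buf[len + 1] = (uint8_t)(crc & 0xFF);
    return crc16_param(buf, len + 2, init, xorout);
}
```

With init and xorout both 0 this reduces to the plain zero-residual case. With xorout nonzero, every intact block checks to the same nonzero constant.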

    So now you have a new extended block |....data....|crc|

    Now if you compute a new CRC on the extended block, the resulting
    value /should/ come out to zero. If it doesn't, either your data or
    the original CRC value appended to it has been changed/corrupted.

    As there is usually a lack of originality in the algorithms
    chosen, you have to consider if you are also hoping to use
    this to safeguard the *integrity* of your image (i.e.,
    against intentional modification).

    I have an old Compaq Portable 386 (lunchbox) that obviously
    wasn't designed to support the disk drives that I would
    *later* install in it. So, I patched the BIOS ROMs to
    add another disk type to the Disk Parameter Table. Then,
    made compensating changes to other parts of the ROM
    (that I knew would not be referenced) to ensure the original
    checksum -- WHEREVER IT MAY HAVE BEEN "STORED" -- would remain
    intact.
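A sketch of that kind of compensating patch. Like Don, it works from a guess about the check: here the assumption is a plain 8-bit sum over the whole ROM. It also assumes the chosen "spare" byte really is never referenced by the firmware. The offsets are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-bit sum of all bytes, modulo 256. */
static uint8_t sum8(const uint8_t *rom, size_t n)
{
    uint8_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint8_t)(s + rom[i]);
    return s;
}

/* Copies len patch bytes to rom[off..off+len), then adjusts rom[spare]
   so the 8-bit sum of the whole ROM is what it was before the patch.
   spare must lie outside the patched range. */
static void patch_preserving_sum8(uint8_t *rom, size_t n, size_t off,
                                  const uint8_t *patch, size_t len,
                                  size_t spare)
{
    uint8_t before = sum8(rom, n);
    memcpy(rom + off, patch, len);
    uint8_t after = sum8(rom, n);
    rom[spare] = (uint8_t)(rom[spare] + (uint8_t)(before - after));
}
```

Whatever byte sum the firmware compares against, wherever it is stored, still matches, because the total never changed.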

    [I could similarly have altered the boot message to show
    a copyright of "Don Y" in place of "Compaq" -- as it would
    be pretty easy to locate the "Compaq" string (in plaintext)
    in the image.]

  • From Brian Cockburn@brian.cockburn.1959@gmail.com to comp.arch.embedded on Fri Apr 21 01:53:15 2023

    On Thursday, April 20, 2023 at 12:06:36 PM UTC+10, Rick C wrote:
    Rick, What is the purpose of this? Is it (1) to be able to externally identify a binary, as one might a ROM image by computing a checksum? Is it (2) for a runnable binary to be able to check itself? This would of course only be able to detect corruption, not tampering. Is it (3) for the loader (whatever that might be) to be able to say 'this binary has the correct checksum' and only jump to it if it does? Again this would only be able to detect corruption, not tampering. Are you hoping for more than corruption detection?
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 21 12:43:39 2023

    On 21/04/2023 07:37, Don Y wrote:
    On 4/20/2023 1:44 PM, George Neuner wrote:
    You have some block of data   |....data....|

    You compute CRC on the data block and then append the resulting value
    ---------------------------------------------^^^^^^
    to the end of the block.  xmodem CRC is 16-bit, so it adds 2 bytes to
    the data.

    Exactly.  You *don't* drag the "extra bits" into the initial
    CRC calculation but *do* into the CRC *verification*.  Easy
    peasy (since forever).

    [Think about it:  you're performing a division operation
    and the residual is the "remainder".]

    George's earlier posts made it look like the algorithm was inserting ("embedding") a value somewhere inside the image, so that the CRC over
    the modified image was zero. This is easy to do for simple checksums
    such as XORs or a sum-of-bytes checksum, but infeasible for CRCs.

    It is a much easier matter when appending the checksum. Depending
    somewhat on the details of the CRC (such as bit/byte reversals,
    inversions, starting values, etc.) it is typically the case that for a
    binary blob A, crc(A ++ crc(A)) = 0. i.e., if you append the CRC of
    your data to the data, the CRC of the whole thing is 0.

    Of course, this is pretty much irrelevant - whether you check the
    integrity of the final image by running CRC over it all and comparing to
    0, or running it over all but the last word and comparing to the last
    word is a minor matter.
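Both checks, sketched in C. The sketch assumes the XMODEM-style CRC-16 (any CRC with the zero-residue property would do) and the CRC stored big-endian in the last two bytes of the image. For this CRC the two tests accept exactly the same images:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16, XMODEM parameters: poly 0x1021, init 0, MSB first, no final XOR. */
static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Check 1: CRC over everything except the last word, compared with it. */
static bool verify_compare(const uint8_t *img, size_t total)
{
    if (total < 2)
        return false;
    size_t len = total - 2;
    uint16_t stored = (uint16_t)((img[len] << 8) | img[len + 1]);
    return crc16_xmodem(img, len) == stored;
}

/* Check 2: CRC over the whole image, stored CRC included, compared with 0. */
static bool verify_residue(const uint8_t *img, size_t total)
{
    return total >= 2 && crc16_xmodem(img, total) == 0;
}
```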


    Note that you want to choose a polynomial that doesn't
    give you a "win" result for "obviously" corrupt data.
    E.g., if data is all zeros or all 0xFF (as these sorts of
    conditions can happen with hardware failures) you probably
    wouldn't want a "success" indication!

    No, that is pointless for something like a code image. It just adds
    needless complexity to your CRC algorithm.

    You should already have checks that would eliminate an all-zero image or
    other "obviously corrupt" data. You'll be checking the image for a key
    or "magic number" that identifies the image as "program image for board
    X, project Y". You'll be checking version numbers. You'll be reading
    the length of the image so you know the range for your CRC function, and
    where to find the appended CRC check. You might not have all of these
    in a given system, but you'll have some kind of check which would fail
    on an all-zero image.
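A sketch of such envelope checks in front of the CRC. The header layout and magic value are hypothetical, and the version field is carried but not checked here. Note that an all-zero buffer would pass the XMODEM-style CRC on its own, since its CRC is 0 and its "stored" CRC is also 0, but it fails the magic-number test first:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, little-endian header:
     bytes 0..3  magic number identifying "board X, project Y"
     bytes 4..5  version (not checked in this sketch)
     bytes 6..9  payload length in bytes
     bytes 10..  payload, then a 2-byte big-endian CRC over header+payload */
#define IMAGE_MAGIC    0x59584221u
#define IMAGE_HDR_SIZE 10u

static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static bool image_ok(const uint8_t *img, size_t avail)
{
    if (avail < IMAGE_HDR_SIZE + 2)
        return false;
    if (rd32le(img) != IMAGE_MAGIC)      /* all-0x00 / all-0xFF fail here */
        return false;
    uint32_t plen = rd32le(img + 6);
    if (plen == 0 || plen > avail - IMAGE_HDR_SIZE - 2)
        return false;                    /* length must fit in the buffer */
    size_t covered = IMAGE_HDR_SIZE + plen;
    uint16_t stored = (uint16_t)((img[covered] << 8) | img[covered + 1]);
    return crc16_xmodem(img, covered) == stored;
}
```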


    You can also "salt" the calculation so that the residual
    is deliberately nonzero.  So, for example, "success" is
    indicated by a residual of 0x474E.  :>


    Again, pointless.

    Salt is important for security-related hashes (like password hashes),
    not for integrity checks.

    So now you have a new extended block   |....data....|crc|

    Now if you compute a new CRC on the extended block, the resulting
    value /should/ come out to zero. If it doesn't, either your data or
    the original CRC value appended to it has been changed/corrupted.

    As there is usually a lack of originality in the algorithms
    chosen, you have to consider if you are also hoping to use
    this to safeguard the *integrity* of your image (i.e.,
    against intentional modification).


    "Integrity" has nothing to do with the motivation for change.
    /Security/ is concerned with intentional modifications that deliberately attempt to defeat /integrity/ checks. Integrity is about detecting any changes.

    If you are concerned about the possibility of intentional malicious
    changes, CRCs alone are useless. All the attacker needs to do after modifying the image is calculate the CRC themselves, and replace the
    original checksum with their own.

    Using non-standard algorithms for security is a simple way to get things completely wrong. "Security by obscurity" is very rarely the right
    answer. In reality, good security algorithms, and good implementations,
    are difficult and specialised tasks, best left to people who know what
    they are doing.

    To make something secure, you have to ensure that the check algorithms
    depend on a key that you know, but that the attacker does not have.
    That's the basis of digital signatures (though you use a secure hash
    algorithm rather than a simple CRC).



  • From Don Y@blockedofcourse@foo.invalid to comp.arch.embedded on Fri Apr 21 04:39:38 2023

    On 4/21/2023 3:43 AM, David Brown wrote:
    Note that you want to choose a polynomial that doesn't
    give you a "win" result for "obviously" corrupt data.
    E.g., if data is all zeros or all 0xFF (as these sorts of
    conditions can happen with hardware failures) you probably
    wouldn't want a "success" indication!

    No, that is pointless for something like a code image.  It just adds needless
    complexity to your CRC algorithm.

    Perhaps you've forgotten that you don't just use CRCs (secure hashes, etc.)
    on "code images"?

    You should already have checks that would eliminate an all-zero image or other
    "obviously corrupt" data.  You'll be checking the image for a key or "magic number" that identifies the image as "program image for board X, project Y".
    You'll be checking version numbers.  You'll be reading the length of the image
    so you know the range for your CRC function, and where to find the appended CRC
    check.  You might not have all of these in a given system, but you'll have some
    kind of check which would fail on an all-zero image.

    See above.

    You can also "salt" the calculation so that the residual
    is deliberately nonzero.  So, for example, "success" is
    indicated by a residual of 0x474E.  :>

    Again, pointless.

    Salt is important for security-related hashes (like password hashes), not for
    integrity checks.

    You've missed the point. The correct "sum" can be anything.
    Why is "0" more special than any other value? As the value is
    typically meaningless to anything other than the code that verifies
    it, you couldn't look at an image (or the output of the verifier)
    and gain anything from seeing that obscure value.

    OTOH, if the CRC yields something familiar -- or useful -- then
    it can tell you something about the image. E.g., salt the algorithm
    with the product code, version number, your initials, 0xDEADBEEF, etc.

    So now you have a new extended block   |....data....|crc|

    Now if you compute a new CRC on the extended block, the resulting
    value /should/ come out to zero. If it doesn't, either your data or
    the original CRC value appended to it has been changed/corrupted.

    As there is usually a lack of originality in the algorithms
    chosen, you have to consider if you are also hoping to use
    this to safeguard the *integrity* of your image (i.e.,
    against intentional modification).

    "Integrity" has nothing to do with the motivation for change. /Security/ is concerned with intentional modifications that deliberately attempt to defeat /integrity/ checks.  Integrity is about detecting any changes.

    If you are concerned about the possibility of intentional malicious changes,

    Changes don't have to be malicious. I altered the test procedure for a
    piece of military gear we were building simply to skip some lengthy tests that I *knew* would pass (I don't want to inject an extra 20 minutes of wait time just to get through a lengthy test I already know works before I can get
    to the test of interest to me, now).

    I failed to undo the change before the official signoff on the device.

    The only evidence of this was the fact that I had also patched the
    startup message to say "Go for coffee..." -- which remained on the
    screen for the duration of the lengthy (even with the long test
    elided) procedure...

    ..which alerted folks to the fact that this *probably* wasn't the
    original image. (The computer running the test suite on the DUT had
    no problem accepting my patched binary)

    CRCs alone are useless.  All the attacker needs to do after modifying the image is calculate the CRC themselves, and replace the original checksum with
    their own.

    That assumes the "alterer" knows how to replace the checksum, how it
    is computed, where it is embedded in the image, etc. I modified the Compaq portable mentioned without ever knowing where the checksum was stored
    or *if* it was explicitly stored. I had no desire to disassemble the
    BIOS ROMs (though could obviously do so as there was no "proprietary
    hardware" limiting access to their contents and the instruction set of
    the processor is well known!).

    Instead, I did this by *guessing* how they would implement such a check
    in a bit of kit from that era (EPROMs aren't easily modified by malware
    so it wasn't likely that they would go to great lengths to "protect" the image). And, if my guess had been incorrect, I could always reinstall
    the original EPROMs -- nothing lost, nothing gained.

    Had much experience with folks counterfeiting your products and making
    "simple" changes to the binaries? Like changing the copyright notice
    or splash screen?

    Then, bringing the (accused) counterfeit of YOUR product into a courtroom
    and revealing the *hidden* checksum that the counterfeiter wasn't aware of?

    "Gee, why does YOUR (alleged) device have *my* name in it -- in addition
    to behaving exactly like mine??"

    [I guess obscurity has its place!]

    Use a non-secret approach and you invite folks to alter it, as well.

    Using non-standard algorithms for security is a simple way to get things completely wrong.  "Security by obscurity" is very rarely the right answer.  In
    reality, good security algorithms, and good implementations, are difficult and
    specialised tasks, best left to people who know what they are doing.

    To make something secure, you have to ensure that the check algorithms depend
    on a key that you know, but that the attacker does not have. That's the basis
    of digital signatures (though you use a secure hash algorithm rather than a simple CRC).

    If you can remove the check, then what value is the key's secrecy? By your criteria, the adversary KNOWS how you are implementing your security
    so he knows exactly what to remove to bypass your checks and allow his
    altered image to operate in its place.

    Ever notice how manufacturers don't PUBLICLY disclose their security
    hooks (without an NDA)? If "security by obscurity" was not important,
    they would publish these details INVITING challenges (instead of
    trying to limit the knowledge to people with whom they've officially contracted).

    [If it was so good and they were trying to rely on trade secret, why
    not just PATENT their approach, also disclosing it in the process?
    Surely, the details will "leak" from one of the NDA signers long
    before patent protection would expire... And, presumably, these
    are "people who know what they are doing"...]

    Sign all the binaries and all I have to do is remove the *test* for
    those signatures and the images can be as corrupted as I choose.

    You need to "secure" the test if you want the image to be securable.
    This is why it is so hard to use "open" security protocols on
    hardware devices (cuz there are almost always ways to subvert the
    verification process/hardware). Having physical access to a device
    usually means it can be compromised -- if worth your effort.

    [The trick is to make the effort great enough to be on a par with
    just copying the *functionality*, from scratch, and not bothering
    trying to alter the executable in a way that is not detectable]

    [[There are companies whose business models are exactly that -- cloning
    other products (e.g., from folks like big blue) at the functional
    level -- yet steering clear of any copyright issues.]]

  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Fri Apr 21 05:12:46 2023

    On Friday, April 21, 2023 at 4:53:18 AM UTC-4, Brian Cockburn wrote:
    Rick, What is the purpose of this? Is it (1) to be able to externally identify a binary, as one might a ROM image by computing a checksum? Is it (2) for a runnable binary to be able to check itself? This would of course only be able to detect corruption, not tampering. Is it (3) for the loader (whatever that might be) to be able to say 'this binary has the correct checksum' and only jump to it if it does? Again this would only be able to detect corruption, not tampering. Are you hoping for more than corruption detection?
    This is simply to be able to say this version is unique, regardless of what the version number says. Version numbers are set manually and not always done correctly. I'm looking for something as a backup so that if the checksums are different, I can be sure the versions are not the same.
    The less work involved, the better.
    --
    Rick C.
    ++ Get 1,000 miles of free Supercharging
    ++ Tesla referral code - https://ts.la/richard11209
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 21 16:50:31 2023

    On 21/04/2023 13:39, Don Y wrote:
    On 4/21/2023 3:43 AM, David Brown wrote:
    Note that you want to choose a polynomial that doesn't
    give you a "win" result for "obviously" corrupt data.
    E.g., if data is all zeros or all 0xFF (as these sorts of
    conditions can happen with hardware failures) you probably
    wouldn't want a "success" indication!

    No, that is pointless for something like a code image.  It just adds
    needless complexity to your CRC algorithm.

    Perhaps you've forgotten that you don't just use CRCs (secure hashes, etc.) on "code images"?

    No - but "code images" is the topic here.

    However, in almost every case where CRC's might be useful, you have
    additional checks of the sanity of the data, and an all-zero or all-one
    data block would be rejected. For example, Ethernet packets use CRC for integrity checking, but an attempt to send a packet type 0 from MAC
    address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0,
    would be rejected anyway.

    I can't think of any use-cases where you would be passing around a block
    of "pure" data that could reasonably take absolutely any value, without
    any type of "envelope" information, and where you would think a CRC
    check is appropriate.


    You should already have checks that would eliminate an all-zero image
    or other "obviously corrupt" data.  You'll be checking the image for a
    key or "magic number" that identifies the image as "program image for
    board X, project Y". You'll be checking version numbers.  You'll be
    reading the length of the image so you know the range for your CRC
    function, and where to find the appended CRC check.  You might not
    have all of these in a given system, but you'll have some kind of
    check which would fail on an all-zero image.

    See above.

    See above.


    You can also "salt" the calculation so that the residual
    is deliberately nonzero.  So, for example, "success" is
    indicated by a residual of 0x474E.  :>

    Again, pointless.

    Salt is important for security-related hashes (like password hashes),
    not for integrity checks.

    You've missed the point.  The correct "sum" can be anything.
    Why is "0" more special than any other value?  As the value is
    typically meaningless to anything other than the code that verifies
    it, you couldn't look at an image (or the output of the verifier)
    and gain anything from seeing that obscure value.

    Do you actually know what is meant by "salt" in the context of hashes,
    and why it is useful in some circumstances? Do you understand that
    "salt" is added (usually prepended, or occasionally mixed in in some
    other way) to the data /before/ the hash is calculated?

    I have not given the slightest indication to suggest that "0" is a
    special value. I fully agree that the value you get from the checking algorithm does not have to be 0 - I already suggested it could be
    compared to the stored value. I.e., you build your image file as "data
    ++ crc(data)", and check it by re-calculating "crc(data)" on the received
    image and comparing the result to the received CRC. There is no
    necessity or benefit in having a CRC calculated over the received
    data plus the received CRC come out to 0.

    "Salt" is used in cases where the original data must be kept secret, and
    only the hashes are transmitted or accessible - by adding salt to the
    original data before hashing it, you avoid a direct correspondence
    between the hash and the original data. The prime use-case is to stop
    people being able to figure out a password by looking up the hash in a
    list of pre-computed hashes of common passwords.


    OTOH, if the CRC yields something familiar -- or useful -- then
    it can tell you something about the image.  E.g., salt the algorithm
    with the product code, version number, your initials, 0xDEADBEEF, etc.


    You are making no sense at all. Are you suggesting that it would be a
    good idea to add some value to the start of the image so that the
    resulting crc calculation gives a nice recognisable product code? This
    "salt" would be different for each program image, and calculated by
    trial and error. If you want a product code, version number, etc., in
    the program image (and it's a good idea), just put these in the program
    image!


    So now you have a new extended block   |....data....|crc|

    Now if you compute a new CRC on the extended block, the resulting
    value /should/ come out to zero. If it doesn't, either your data or
    the original CRC value appended to it has been changed/corrupted.

    As there is usually a lack of originality in the algorithms
    chosen, you have to consider if you are also hoping to use
    this to safeguard the *integrity* of your image (i.e.,
    against intentional modification).

    "Integrity" has nothing to do with the motivation for change.
    /Security/ is concerned with intentional modifications that
    deliberately attempt to defeat /integrity/ checks.  Integrity is about
    detecting any changes.

    If you are concerned about the possibility of intentional malicious
    changes,

    Changes don't have to be malicious.


    Accidental changes (such as human error, noise during data transfer,
    memory cell errors, etc.) do not pass integrity tests unnoticed. To be
    more accurate, the chances of them passing unnoticed are of the order of
    1 in 2^n, for a good n-bit check such as a CRC check. Certain types of
    error are always detectable, such as single and double bit errors. That
    is the point of using a checksum or hash for integrity checking.
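An exhaustive C sketch of the single- and double-bit claim for one small case. It assumes the XMODEM-style CRC-16 and a codeword short enough that every bit pair can be tried. The code flips every single bit and every pair of bits in |data|crc| and counts how many corrupted codewords still pass the zero-residue check. For a given polynomial, the double-bit guarantee only holds up to a maximum codeword length, which for this one is a few kilobytes, well beyond the 18 bytes used here:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Counts single- and double-bit corruptions of the codeword cw[0..n)
   (data followed by its big-endian CRC) that the zero-residue check
   fails to catch. Pairs with j == i are single-bit flips. Returns
   SIZE_MAX if the codeword is too long for the scratch buffer. */
static size_t undetected_1_2_bit(const uint8_t *cw, size_t n)
{
    uint8_t tmp[64];
    size_t bits = n * 8, missed = 0;
    if (n > sizeof tmp)
        return SIZE_MAX;
    for (size_t i = 0; i < bits; i++) {
        for (size_t j = i; j < bits; j++) {
            memcpy(tmp, cw, n);
            tmp[i / 8] ^= (uint8_t)(0x80u >> (i % 8));
            if (j != i)
                tmp[j / 8] ^= (uint8_t)(0x80u >> (j % 8));
            if (crc16_xmodem(tmp, n) == 0)
                missed++;
        }
    }
    return missed;
}
```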

    /Intentional/ changes are a different matter. If a hacker changes the
    program image, they can change the transmitted hash to their own
    calculated hash. Or for a small CRC, they could change a different part
    of the image until the original checksum matched - for a 16-bit CRC,
    that only takes 65,535 attempts in the worst case.
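A C sketch of that brute-force attack against a 16-bit CRC. It assumes the attacker can freely overwrite some contiguous 16-bit field that the firmware never uses; the field offset is hypothetical. Because a CRC is linear, each of the 65,536 values of such a field gives a different CRC, so exactly one of them restores the original checksum:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Tries every value of the 16-bit field img[fix..fix+1] until the CRC of
   the whole (already modified) image equals target. Leaves the matching
   value in place and returns true on success. */
static bool forge_crc16(uint8_t *img, size_t len, size_t fix, uint16_t target)
{
    for (uint32_t v = 0; v <= 0xFFFF; v++) {
        img[fix]     = (uint8_t)(v >> 8);
        img[fix + 1] = (uint8_t)(v & 0xFF);
        if (crc16_xmodem(img, len) == target)
            return true;
    }
    return false;
}
```

This is why a short CRC says nothing about deliberate changes: the search costs well under a second on a desktop machine.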

    That is why you need to distinguish between the two possibilities. If
    you don't have to worry about malicious attacks, a 32-bit CRC takes a
    dozen lines of C code and a 1 KB table, all running extremely
    efficiently. If security is an issue, you need digital signatures - an RSA-based signature system is orders of magnitude more effort in both development time and in run time.
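For scale, a C sketch of the usual table-driven CRC-32: the reflected 0xEDB88320 form used by Ethernet and zlib. The 256-entry uint32_t table is the 1 KB mentioned above. Here it is filled at start-up, though it could equally be a const array generated at build time. The function names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];   /* 256 * 4 bytes = 1 KB */

/* Fills the lookup table for the reflected polynomial 0xEDB88320. */
static void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc32_table[n] = c;
    }
}

/* CRC-32 with init 0xFFFFFFFF and final XOR 0xFFFFFFFF, one table
   lookup per byte. crc32_init() must have been called first. */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        c = crc32_table[(c ^ buf[i]) & 0xFF] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}
```

The standard check value, crc32_ieee("123456789") == 0xCBF43926, is a quick way to confirm an implementation matches.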


    I altered the test procedure for a
    piece of military gear we were building simply to skip some lengthy
    tests that I *knew* would pass (I don't want to inject an extra 20
    minutes of wait time
    just to get through a lengthy test I already know works before I can get
    to the test of interest to me, now).

    I failed to undo the change before the official signoff on the device.

    The only evidence of this was the fact that I had also patched the
    startup message to say "Go for coffee..." -- which remained on the
    screen for the duration of the lengthy (even with the long test
    elided) procedure...

    ..which alerted folks to the fact that this *probably* wasn't the
    original image.  (The computer running the test suite on the DUT had
    no problem accepting my patched binary)

    And what, exactly, do you think that anecdote tells us about CRC checks
    for image files? It reminds us that we are all fallible, but does no
    more than that.



    CRCs alone are useless.  All the attacker needs to do after modifying
    the image is calculate the CRC themselves, and replace the original
    checksum with their own.

    That assumes the "alterer" knows how to replace the checksum, how it
    is computed, where it is embedded in the image, etc.  I modified the Compaq portable mentioned without ever knowing where the checksum was stored
    or *if* it was explicitly stored.  I had no desire to disassemble the
    BIOS ROMs (though could obviously do so as there was no "proprietary hardware" limiting access to their contents and the instruction set of
    the processor is well known!).

    Instead, I did this by *guessing* how they would implement such a check
    in a bit of kit from that era (EPROMs aren't easily modified by malware
    so it wasn't likely that they would go to great lengths to "protect" the image).  And, if my guess had been incorrect, I could always reinstall
    the original EPROMs -- nothing lost, nothing gained.

    Had much experience with folks counterfeiting your products and making "simple" changes to the binaries?  Like changing the copyright notice
    or splash screen?

    Then, bringing the (accused) counterfeit of YOUR product into a courtroom
    and revealing the *hidden* checksum that the counterfeiter wasn't aware of?

    "Gee, why does YOUR (alleged) device have *my* name in it -- in addition
    to behaving exactly like mine??"

    [I guess obscurity has its place!]

    Security by obscurity is not security. Having a hidden signature or
    other mark can be useful for proving ownership (making an intentional
    mistake is another common tactic - such as commercial maps having a few
    subtle spelling errors). But that is not security.


    Use a non-secret approach and you invite folks to alter it, as well.

    Using non-standard algorithms for security is a simple way to get
    things completely wrong.  "Security by obscurity" is very rarely the
    right answer.  In reality, good security algorithms, and good
    implementations, are difficult and specialised tasks, best left to
    people who know what they are doing.

    To make something secure, you have to ensure that the check algorithms
    depend on a key that you know, but that the attacker does not have.
    That's the basis of digital signatures (though you use a secure hash
    algorithm rather than a simple CRC).

    If you can remove the check, then what value the key's secrecy?  By your criteria, the adversary KNOWS how you are implementing your security
    so he knows exactly what to remove to bypass your checks and allow his altered image to operate in its place.

    Ever notice how manufacturers don't PUBLICLY disclose their security
    hooks (without an NDA)?  If "security by obscurity" was not important,
    they would publish these details INVITING challenges (instead of
    trying to limit the knowledge to people with whom they've officially contracted).


    Any serious manufacturer /does/ invite challenges to their security.

    There are multiple reasons why a manufacturer (such as a semiconductor manufacturer) might be guarded about the details of their security
    systems. They can be avoiding giving hints to competitors. Maybe they
    know their systems aren't really very secure, because their keys are too
    short or they can be read out in some way.

    But I think the main reasons are often:

    They want to be able to change the details, and that's far easier if
    there are only a few people who have read the information.

    They don't want endless support questions from amateurs.

    They are limited by idiotic government export restrictions made by
    ignorant politicians who don't understand cryptography.



    Some things benefit from being kept hidden, or under restricted access.
    The details of the CRC algorithm you use to catch accidental errors in
    your image file are /not/ one of them. If you think hiding them has the
    remotest hint of a benefit, you are doing things wrong - you need a
    /security/ check, not a simple /integrity/ check.

    And then once you have switched to a security check - a digital
    signature - there's no need to keep that choice hidden either, because
    it is the /key/ that is important, not the type of lock.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 21 17:02:21 2023
    From Newsgroup: comp.arch.embedded

    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique, regardless
    of what the version number says. Version numbers are set manually
    and not always done correctly. I'm looking for something as a backup
    so that if the checksums are different, I can be sure the versions
    are not the same.

    The less work involved, the better.


    Run a simple 32-bit crc over the image. The result is a hash of the
    image. Any change in the image will show up as a change in the crc.
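A minimal host-side sketch of that suggestion, assuming Python and the standard CRC-32 from its zlib module (the variant also used by Ethernet, gzip and PNG). The function name `image_crc32` and the chunk size are illustrative choices, not anything from the thread; reading in chunks just avoids loading a large image at once.

```python
import zlib


def image_crc32(path: str, chunk_size: int = 64 * 1024) -> int:
    """Return the CRC-32 of a file, reading it in chunks."""
    crc = 0  # zlib.crc32's starting value for an empty input
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # Passing the running value continues the same CRC across chunks.
            crc = zlib.crc32(chunk, crc)
    return crc


if __name__ == "__main__":
    import sys

    # Print one fingerprint per image file named on the command line.
    for name in sys.argv[1:]:
        print(f"{image_crc32(name):08X}  {name}")
```

Differing values prove two images differ; equal values mean they are almost certainly identical for accidental differences, which is what Rick asked for. It offers no protection against someone deliberately forging a match.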

  • From Stefan Reuther@stefan.news@arcor.de to comp.arch.embedded on Fri Apr 21 19:40:11 2023

    On 20.04.2023 at 22:44, George Neuner wrote:
    On Thu, 20 Apr 2023 09:45:59 -0700 (PDT), Rick C
    CRC is not complicated, but I would not know how to calculate an
    inserted value to force the resulting CRC to zero. How do you do
    that?

    It's implicit in the equation they chose. I don't know how it works -
    just that it does.

    It works for any CRC that starts with zero and does not invert.

    CRC is based on polynomial division remainders. Basically, the CRC is a division remainder of the input interpreted as a polynomial, and if you
    append that remainder to the input and run the CRC again, the result is zero.

    https://crccalc.com/?crc=12345678&method=CRC-16/AUG-CCITT&datatype=hex&outtype=0
    result is 0xBA3C

    https://crccalc.com/?crc=12345678BA3C&method=CRC-16/AUG-CCITT&datatype=hex&outtype=0
    result is 0x0000

    (Need to be careful with byte orders; for some CRCs on that page, you
    need to swap the bytes before appending.)
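The residue-zero property can be checked with a bit-at-a-time sketch of the variant crccalc.com labels CRC-16/AUG-CCITT. The parameters (polynomial 0x1021, initial value 0x1D0F, no bit reflection, no final XOR) come from the usual CRC catalogues, not from the post. The non-zero initial value does not spoil the trick; what matters is that there is no final XOR and that the CRC is appended in the order the register shifts, most-significant byte first for this non-reflected variant.

```python
def crc16_aug_ccitt(data: bytes, crc: int = 0x1D0F) -> int:
    """Bitwise CRC-16: poly 0x1021, init 0x1D0F, MSB-first, no final XOR."""
    for byte in data:
        crc ^= byte << 8                  # bring the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:              # top bit set: "subtract" (XOR) the polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc                            # the division remainder


message = bytes.fromhex("12345678")
remainder = crc16_aug_ccitt(message)
# Append the remainder most-significant byte first, matching the shift direction.
codeword = message + remainder.to_bytes(2, "big")
print(f"{remainder:04X} {crc16_aug_ccitt(codeword):04X}")  # → BA3C 0000
```

This reproduces both results from the crccalc.com links above.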


    Stefan
  • From Richard Damon@Richard@Damon-Family.org to comp.arch.embedded on Fri Apr 21 19:30:05 2023

    On 4/20/23 10:41 PM, Rick C wrote:
    On Thursday, April 20, 2023 at 10:09:35 PM UTC-4, Richard Damon wrote:
    On 4/19/23 10:06 PM, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

    I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.

    I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum.

    I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy.

    I keep thinking there is a different way of looking at this to achieve the result I want...

    Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the modulo N checksum, this would appear to be impossible for the general case, unless... Add another data value, called, checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact.

    This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that.

    IF I understand you correctly, what you want is for the file to compute
    to some "checksum" that comes from the basic contents of the file, and
    then you want to add the "checksum" into the file so the program itself
    can print its checksum.

    One fact to remember is that "cryptographic hashes" were invented
    because it was too easy to create a faked file that matches a
    non-cryptographic hash/checksum, so that couldn't be a key to make sure
    you really had the right file in the presence of a determined enemy, but
    the checksums were good enough to catch "random" errors.

    This means that you can add the checksum into the file, and some
    additional bytes (likely at the end) and by knowing the properties of the
    checksum algorithm, compute a value for those extra bytes such that they
    "undo" the changes caused by adding the checksum bytes to the file.

    I'm not sure exactly how to compute these, but the key is that you add
    something at the end of the file to get the checksum back to what the
    original file had before you added the checksum into the file.

    Yeah, for a simple checksum, I think that would be easy, at least if "checksum" means a bitwise XOR operation. If the checksum and extra bytes are both 16 bits, this would also work for an arithmetic checksum where each 16 bit word were added into the checksum. All the carries would cascade out of the upper 16 bits from adding the inserted checksum and its 2's complement.
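Rick's "checksum normalizer" for the 16-bit additive case can be sketched as follows. Assumptions, none of them from the thread: the image is treated as little-endian 16-bit words, the checksum is their sum modulo 2^16, and the embedded checksum and its normalizer are two hypothetical words appended at the end. Because the normalizer is the two's complement of the checksum, the pair adds zero and the total is unchanged.

```python
import struct


def sum16(image: bytes) -> int:
    """Sum of little-endian 16-bit words modulo 2**16 (length must be even)."""
    words = struct.unpack(f"<{len(image) // 2}H", image)
    return sum(words) & 0xFFFF


def embed_sum16(image: bytes) -> bytes:
    """Append the checksum plus a normalizer so the file still sums to that checksum."""
    if len(image) % 2:
        image += b"\x00"                 # pad to a whole number of words
    checksum = sum16(image)
    normalizer = (-checksum) & 0xFFFF    # checksum + normalizer == 0 (mod 2**16)
    return image + struct.pack("<HH", checksum, normalizer)


stamped = embed_sum16(b"example firmware image")
stored = struct.unpack_from("<H", stamped, len(stamped) - 4)[0]
assert sum16(stamped) == stored          # the file's sum equals its embedded checksum
```

For a plain XOR checksum the normalizer is simply a second copy of the checksum, since C XOR C = 0.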

    I don't even want to think about using a CRC to try to do this.


    It is a bit of work, but even a 32-bit CRC will be solvable to find the reverse equation. You can do the work once generically, and get a
    formula that computes the value you need to put into the final bytes to
    get the CRC of the file back to the CRC it was before adding the CRC and
    the extra bytes. It wouldn't surprise me if the formula is published somewhere for the common CRCs.
  • From Brian Cockburn@brian.cockburn.1959@gmail.com to comp.arch.embedded on Fri Apr 21 16:52:23 2023

    On Friday, April 21, 2023 at 10:12:49 PM UTC+10, Rick C wrote:
    On Friday, April 21, 2023 at 4:53:18 AM UTC-4, Brian Cockburn wrote:
    On Thursday, April 20, 2023 at 12:06:36 PM UTC+10, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

    I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.

    I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum.

    I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy.

    I keep thinking there is a different way of looking at this to achieve the result I want...

    Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the modulo N checksum, this would appear to be impossible for the general case, unless... Add another data value, called, checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact.

    This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that.

    Rick, what is the purpose of this? Is it (1) to be able to externally identify a binary, as one might a ROM image by computing a checksum? Is it (2) for a runnable binary to be able to check itself? This would of course only be able to detect corruption, not tampering. Is it (3) for the loader (whatever that might be) to be able to say 'this binary has the correct checksum' and only jump to it if it does? Again this would only be able to detect corruption, not tampering. Are you hoping for more than corruption detection?
    This is simply to be able to say this version is unique, regardless of what the version number says. Version numbers are set manually and not always done correctly. I'm looking for something as a backup so that if the checksums are different, I can be sure the versions are not the same.

    The less work involved, the better.

    Rick, so you want the executable to, as part of its execution, print on the console the 'checksum' of itself? Or do you want to be able to inspect the executable with some other tool to calculate its 'checksum'? For the latter there are lots of tools to do that (your OS or PROM programmer for instance), for the former you need to embed the calculation code into the executable (along with the length over which to calculate) and run this when asked. Neither of these involves embedding the 'checksum' value.
    And just to be sure I understand what you wrote in a somewhat convoluted way. When you have two binary executables that report the same version number you want to be able to distinguish them with a 'checksum', right?
  • From Brian Cockburn@brian.cockburn.1959@gmail.com to comp.arch.embedded on Fri Apr 21 16:56:01 2023

    On Saturday, April 22, 2023 at 1:02:28 AM UTC+10, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique, regardless
    of what the version number says. Version numbers are set manually
    and not always done correctly. I'm looking for something as a backup
    so that if the checksums are different, I can be sure the versions
    are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of the
    image. Any change in the image will show up as a change in the crc.
    David, a hash and a CRC are not the same thing. They both produce a reasonably unique result though. Any change would show in either (unless as a result of intentional tampering).
  • From Don Y@blockedofcourse@foo.invalid to comp.arch.embedded on Fri Apr 21 17:29:51 2023

    On 4/21/2023 7:50 AM, David Brown wrote:
    On 21/04/2023 13:39, Don Y wrote:
    On 4/21/2023 3:43 AM, David Brown wrote:
    Note that you want to choose a polynomial that doesn't
    give you a "win" result for "obviously" corrupt data.
    E.g., if data is all zeros or all 0xFF (as these sorts of
    conditions can happen with hardware failures) you probably
    wouldn't want a "success" indication!

    No, that is pointless for something like a code image.  It just adds
    needless complexity to your CRC algorithm.

    Perhaps you've forgotten that you don't just use CRCs (secure hashes, etc.)
    on "code images"?

    No - but "code images" is the topic here.

    So, anything unrelated to CRCs as applied to code images is off limits...
    "per order of the Internet Police"?

    If *all* you use CRCs for is checking *a* code image at POST, you're
    wasting a valuable resource.

    Do you not think data/parameters need to be safeguarded? Program images? Communication protocols?

    Or, do you develop yet another technique for *each* of those?

    However, in almost every case where CRC's might be useful, you have additional
    checks of the sanity of the data, and an all-zero or all-one data block would
    be rejected.  For example, Ethernet packets use CRC for integrity checking, but
    an attempt to send a packet type 0 from MAC address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0, would be rejected anyway.

    Why look at "data" -- which may be suspect -- and *then* check its CRC?
    Run the CRC first. If it fails, decide how you are going to proceed
    or recover.

    ["Data" can be code or parameters]

    I treat blocks of "data" (carefully arranged) with individual CRCs,
    based on their relative importance to the operation. If the CRC is
    corrupt, I have no idea *where* the error lies -- as it could
    be anything in the checked block. So, one has to (typically)
    restore some defaults (or, invoke a reconfigure operation) which
    recreates *a* valid dataset.

    This is particularly useful when power to a device can be
    removed at arbitrary points in time (or, some other abrupt
    crash). Before altering anything in a block, take deliberate
    steps to invalidate the CRC, make your changes, then "fix"
    the CRC. So, an interrupted process causes the CRC to fail
    and remedial action taken.
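That update discipline can be sketched with a simulated non-volatile block. Everything here is hypothetical: a bytearray stands in for EEPROM/FLASH, the block is a fixed-size payload followed by a CRC-32, and `crash_after` simulates power failing partway through writing the payload.

```python
import zlib
from typing import Optional

BLOCK_SIZE = 16                       # hypothetical payload size in bytes
DEFAULTS = bytes(BLOCK_SIZE)          # hypothetical factory defaults (all zero)


def write_crc(nv: bytearray) -> None:
    """Store the CRC-32 of the payload just after it."""
    nv[BLOCK_SIZE:] = zlib.crc32(nv[:BLOCK_SIZE]).to_bytes(4, "little")


def load(nv: bytearray) -> bytes:
    """Return the payload, or restore defaults if its CRC does not match."""
    payload = bytes(nv[:BLOCK_SIZE])
    if zlib.crc32(payload).to_bytes(4, "little") != nv[BLOCK_SIZE:]:
        # We can't tell which byte is bad, so rebuild *a* valid dataset.
        nv[:BLOCK_SIZE] = DEFAULTS
        write_crc(nv)
        return DEFAULTS
    return payload


def update(nv: bytearray, new_payload: bytes, crash_after: Optional[int] = None) -> None:
    """Invalidate the CRC, change the data, then fix the CRC."""
    # 1. Deliberately break the stored CRC; the inverted value can't match the current data.
    stored = int.from_bytes(nv[BLOCK_SIZE:], "little")
    nv[BLOCK_SIZE:] = (stored ^ 0xFFFFFFFF).to_bytes(4, "little")
    # 2. Write the new data; stopping here leaves a CRC that (barring a
    #    1-in-2**32 coincidence) does not match the partly written payload.
    for i, value in enumerate(new_payload):
        if crash_after is not None and i >= crash_after:
            return                    # simulated power loss
        nv[i] = value
    # 3. Only now make the block valid again.
    write_crc(nv)


nv = bytearray(BLOCK_SIZE + 4)
write_crc(nv)                                  # start from a valid default block
update(nv, b"A" * BLOCK_SIZE)
assert load(nv) == b"A" * BLOCK_SIZE
update(nv, b"B" * BLOCK_SIZE, crash_after=5)   # power lost mid-update
assert load(nv) == DEFAULTS                     # detected; defaults restored
```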

    Note that replacing a FLASH image (mostly code) falls under
    such a mechanism.

    I can't think of any use-cases where you would be passing around a block of "pure" data that could reasonably take absolutely any value, without any type
    of "envelope" information, and where you would think a CRC check is appropriate.

    I append a *version specific* CRC to each packet of marshalled data
    in my RMIs. If the data is corrupted in transit *or* if the
    wrong version API ends up targeted, the operation will abend
    because we know the data "isn't right".

    I *could* put a header saying "this is version 4.2". And, that
    tells me nothing about the integrity of the rest of the data.
    OTOH, ensuring the CRC reflects "4.2" does -- if the recipient
    expects it to be so.
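The post doesn't say how the CRC is made version specific. One plausible way, sketched here under that assumption, is to run the CRC over the interface version string before the marshalled payload, which is equivalent to prepending the version to the data. The JSON marshalling and the names `marshal`, `unmarshal` and `_crc` are hypothetical.

```python
import json
import zlib


def _crc(version: str, payload: bytes) -> int:
    # Running the CRC over the version first folds it into the check value.
    return zlib.crc32(payload, zlib.crc32(version.encode()))


def marshal(version: str, args: dict) -> bytes:
    payload = json.dumps(args, sort_keys=True).encode()
    return payload + _crc(version, payload).to_bytes(4, "big")


def unmarshal(version: str, packet: bytes) -> dict:
    payload, check = packet[:-4], int.from_bytes(packet[-4:], "big")
    if _crc(version, payload) != check:
        raise ValueError("corrupt packet or wrong API version")
    return json.loads(payload)


pkt = marshal("4.2", {"op": "set_speed", "rpm": 1200})
print(unmarshal("4.2", pkt))          # accepted
try:
    unmarshal("4.3", pkt)             # same bytes, different expected version
except ValueError as err:
    print("rejected:", err)
```

Since "4.2" and "4.3" differ in a single bit, the mismatch is guaranteed to be caught, exactly as a one-bit transmission error would be.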

    You can also "salt" the calculation so that the residual
    is deliberately nonzero.  So, for example, "success" is
    indicated by a residual of 0x474E.  :>

    Again, pointless.

    Salt is important for security-related hashes (like password hashes), not for integrity checks.

    You've missed the point.  The correct "sum" can be anything.
    Why is "0" more special than any other value?  As the value is
    typically meaningless to anything other than the code that verifies
    it, you couldn't look at an image (or the output of the verifier)
    and gain anything from seeing that obscure value.

    Do you actually know what is meant by "salt" in the context of hashes, and why
    it is useful in some circumstances?  Do you understand that "salt" is added (usually prepended, or occasionally mixed in in some other way) to the data /before/ the hash is calculated?

    What term would you have me use to indicate a "bias" applied to a CRC algorithm?

    I have not given the slightest indication to suggest that "0" is a special value.  I fully agree that the value you get from the checking algorithm does
    not have to be 0 - I already suggested it could be compared to the stored value.  I.e., you build your image file as "data ++ crc(data)", and check it by
    re-calculating "crc(data)" on the received image and comparing the result to the received crc.  There is no necessity or benefit in having a crc run calculated over the received data plus the received crc being 0.
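That store-and-compare scheme might look like this on the host side. The sketch assumes zlib's CRC-32 stored little-endian after the data; `build_image` and `check_image` are illustrative names.

```python
import zlib


def build_image(data: bytes) -> bytes:
    """Return data ++ crc(data), with the CRC-32 stored little-endian."""
    return data + zlib.crc32(data).to_bytes(4, "little")


def check_image(image: bytes) -> bool:
    """Recompute crc(data) and compare it with the stored value."""
    data, stored = image[:-4], int.from_bytes(image[-4:], "little")
    return zlib.crc32(data) == stored


img = build_image(b"program image")
assert check_image(img)
assert not check_image(b"Program image" + img[-4:])   # one changed byte is caught
```

Because this CRC-32 variant applies a final XOR, running it over data plus stored CRC gives a constant non-zero value rather than zero, so comparing against the stored value is the more direct check here.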

    "Salt" is used in cases where the original data must be kept secret, and only
    the hashes are transmitted or accessible - by adding salt to the original data
    before hashing it, you avoid a direct correspondence between the hash and the
    original data.  The prime use-case is to stop people being able to figure out a
    password by looking up the hash in a list of pre-computed hashes of common passwords.

    See above.

    OTOH, if the CRC yields something familiar -- or useful -- then
    it can tell you something about the image.  E.g., salt the algorithm
    with the product code, version number, your initials, 0xDEADBEEF, etc.

    You are making no sense at all.  Are you suggesting that it would be a good idea to add some value to the start of the image so that the resulting crc calculation gives a nice recognisable product code?  This "salt" would be different for each program image, and calculated by trial and error.  If you
    want a product code, version number, etc., in the program image (and it's a good idea), just put these in the program image!

    Again, that tells you nothing about the rest of the image!
    See the RMI description.

    [Note that the OP is expecting the checksum to help *him*
    identify versions: "Just put these in the program image!" Eh?]

    So now you have a new extended block   |....data....|crc|

    Now if you compute a new CRC on the extended block, the resulting
    value /should/ come out to zero. If it doesn't, either your data or
    the original CRC value appended to it has been changed/corrupted.

    As there is usually a lack of originality in the algorithms
    chosen, you have to consider if you are also hoping to use
    this to safeguard the *integrity* of your image (i.e.,
    against intentional modification).

    "Integrity" has nothing to do with the motivation for change. /Security/ is
    concerned with intentional modifications that deliberately attempt to defeat
    /integrity/ checks.  Integrity is about detecting any changes.

    If you are concerned about the possibility of intentional malicious changes,

    Changes don't have to be malicious.

    Accidental changes (such as human error, noise during data transfer, memory cell errors, etc.) do not pass integrity tests unnoticed.

    That's not true. The role of the *test* is to notice these. If the test
    is blind to the types of errors that are likely to occur, then it CAN'T
    notice them.

    A CRC (hash, etc.) reduces a large block of data to a small bit of
    data. So, by definition, there are multiple DIFFERENT sets of data that
    map to the same CRC/hash/etc. (on average 2^(data_size - CRC_size) of them per value, sizes in bits)

    E.g., simply summing the values in a block of memory will yield "0"
    for ANY condition that results in the block having identical values
    for ALL members, if the block size is a power of 2 (and at least as large as the sum's modulus, e.g. 256 or more bytes for an 8-bit sum). So, a block
    of 0xFF, 0x00, 0xFE, 0x27, 0x88, etc. will all yield the same sum.
    Clearly a bad choice of test!

    OTOH, "salting" the calculation so that it is expected to yield
    a value of 0x13 means *those* situations will be flagged as errors
    (and a different set of situations will sneak by, undetected).
    The trick (engineering) is to figure out which types of
    failures/faults/errors are most common to occur and guard
    against them.
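Both the all-identical-values failure and the "expect 0x13" fix can be checked numerically. This sketch assumes an 8-bit additive checksum over a 1 KiB block whose last byte is a check byte; 0x13 is the example value from the post, and the helper names are hypothetical.

```python
BLOCK = 1024          # a power of two, and a multiple of 256
EXPECTED = 0x13       # the "salted" total a good block must sum to


def sum8(block: bytes) -> int:
    return sum(block) & 0xFF


def seal(payload: bytes, expected: int) -> bytes:
    """Append one check byte so the whole block sums to `expected` (mod 256)."""
    return payload + bytes([(expected - sum8(payload)) & 0xFF])


# Stuck memory: every byte the same value, check byte included.
stuck_blocks = [bytes([v]) * BLOCK for v in range(256)]

# With an expected total of 0, every stuck pattern would pass...
assert all(sum8(b) == 0 for b in stuck_blocks)
# ...with an expected total of 0x13, every one of them is flagged.
assert not any(sum8(b) == EXPECTED for b in stuck_blocks)

good = seal(bytes(range(256)) * 3 + bytes(BLOCK - 769), EXPECTED)
assert len(good) == BLOCK and sum8(good) == EXPECTED
```

As the post says, other error patterns now slip through instead (for example a block filled with 0x13 followed by zeros); the bias only moves the blind spot away from the most likely hardware failures.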

    To be more accurate,
    the chances of them passing unnoticed are of the order of 1 in 2^n, for a good
    n-bit check such as a CRC check.  Certain types of error are always detectable,
    such as single and double bit errors.  That is the point of using a checksum or
    hash for integrity checking.
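The claim that single- and double-bit errors are always detected can be confirmed exhaustively for a short block. The sketch assumes Python's binascii.crc_hqx (the CCITT polynomial 0x1021, started here from 0xFFFF, no final XOR) and a 16-byte payload. It flips every single bit and every pair of bits in the payload-plus-CRC codeword and counts how many corruptions still leave a valid (zero) residue. The double-bit guarantee depends on block length, which is why a short block is used.

```python
import binascii
from itertools import combinations


def residue(codeword: bytes) -> int:
    return binascii.crc_hqx(codeword, 0xFFFF)


def flip(data: bytes, *bits: int) -> bytes:
    """Return a copy of data with the given bit positions inverted."""
    out = bytearray(data)
    for bit in bits:
        out[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(out)


payload = bytes(range(16))
# Append the CRC MSB-first; with no final XOR a good codeword has residue 0.
codeword = payload + binascii.crc_hqx(payload, 0xFFFF).to_bytes(2, "big")
assert residue(codeword) == 0

nbits = len(codeword) * 8
singles_missed = sum(residue(flip(codeword, i)) == 0 for i in range(nbits))
doubles_missed = sum(residue(flip(codeword, i, j)) == 0
                     for i, j in combinations(range(nbits), 2))
print(singles_missed, doubles_missed)   # → 0 0
```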

    /Intentional/ changes are a different matter.  If a hacker changes the program
    image, they can change the transmitted hash to their own calculated hash.  Or
    for a small CRC, they could change a different part of the image until the original checksum matched - for a 16-bit CRC, that only takes 65,535 attempts
    in the worst case.
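The 16-bit brute-force case can be illustrated with an attacker who changes one byte and then searches a two-byte field for a value that restores the original CRC. The sketch assumes the image has two spare bytes at a known offset (a hypothetical layout) and uses binascii.crc_hqx as the CRC-16.

```python
import binascii


def crc16(data: bytes) -> int:
    return binascii.crc_hqx(data, 0xFFFF)


def forge(original: bytes, offset: int, new_byte: int, spare: int) -> tuple[bytes, int]:
    """Patch one byte, then try spare-field values until the CRC matches again."""
    target = crc16(original)
    patched = bytearray(original)
    patched[offset] = new_byte
    for attempt, value in enumerate(range(0x10000), start=1):
        patched[spare:spare + 2] = value.to_bytes(2, "big")
        if crc16(patched) == target:
            return bytes(patched), attempt
    raise RuntimeError("no match found")


image = b"COPYRIGHT ACME 1983 " + b"\x00\x00" + b"game code..."
forged, tries = forge(image, offset=10, new_byte=ord("Z"), spare=20)
assert forged != image and crc16(forged) == crc16(image)
print(f"matched after {tries} attempts")
```

A match is certain within 65,536 tries, because varying 16 consecutive bits of a fixed-length message reaches every CRC-16 value exactly once. A 32-bit CRC raises the worst case to 2^32 attempts, and the trailing bytes can also be computed directly rather than searched for.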

    If the approach used is "typical", then you need far fewer attempts to
    produce a correct image -- without EVER knowing where the CRC is stored.

    That is why you need to distinguish between the two possibilities.  If you don't have to worry about malicious attacks, a 32-bit CRC takes a dozen lines
    of C code and a 1 KB table, all running extremely efficiently.  If security is
    an issue, you need digital signatures - an RSA-based signature system is orders
    of magnitude more effort in both development time and in run time.
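For scale, here is a table-driven CRC-32 of that kind, sketched in Python rather than C. The structure carries over directly: 256 entries of 32 bits is the 1 KB table. It assumes the common reflected CRC-32 (polynomial 0x04C11DB7, written 0xEDB88320 in reflected form, with initial value and final XOR 0xFFFFFFFF), the variant zlib, Ethernet and PNG use.

```python
import zlib


def make_table() -> list[int]:
    """Precompute the CRC of every possible byte value."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()   # 256 x 32-bit entries: the 1 KB table


def crc32(data: bytes, crc: int = 0) -> int:
    """Byte-at-a-time CRC-32; `crc` may be a previous result to continue from."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


assert crc32(b"123456789") == 0xCBF43926 == zlib.crc32(b"123456789")
```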

    It's considerably more expensive AND not fool-proof -- esp if the
    attacker knows you are signing binaries. "OK, now I need to find
    WHERE the signature is verified and just patch that "CALL" out
    of the code".

    I altered the test procedure for a
    piece of military gear we were building simply to skip some lengthy tests
    that I *knew* would pass (I don't want to inject an extra 20 minutes of wait
    time just to get through a lengthy test I already know works before I can get
    to the test of interest to me, now).

    I failed to undo the change before the official signoff on the device.

    The only evidence of this was the fact that I had also patched the
    startup message to say "Go for coffee..." -- which remained on the
    screen for the duration of the lengthy (even with the long test
    elided) procedure...

    ...which alerted folks to the fact that this *probably* wasn't the
    original image.  (The computer running the test suite on the DUT had
    no problem accepting my patched binary.)

    And what, exactly, do you think that anecdote tells us about CRC checks for image files?  It reminds us that we are all fallible, but does no more than that.

    That *was* the point. Because the folks who designed the test computer
    relied on common techniques to safeguard the image.

    The counterfeiting example I cited indicates how "obscurity/secrecy"
    is far more effective (yet you dismiss it out-of-hand).

    CRCs alone are useless.  All the attacker needs to do after modifying the
    image is calculate the CRC themselves, and replace the original checksum
    with their own.

    That assumes the "alterer" knows how to replace the checksum, how it
    is computed, where it is embedded in the image, etc.  I modified the Compaq
    portable mentioned without ever knowing where the checksum was stored
    or *if* it was explicitly stored.  I had no desire to disassemble the
    BIOS ROMs (though I could obviously do so as there was no "proprietary
    hardware" limiting access to their contents and the instruction set of
    the processor is well known!).

    Instead, I did this by *guessing* how they would implement such a check
    in a bit of kit from that era (EPROMs aren't easily modified by malware
    so it wasn't likely that they would go to great lengths to "protect" the
    image).  And, if my guess had been incorrect, I could always reinstall
    the original EPROMs -- nothing lost, nothing gained.

    Had much experience with folks counterfeiting your products and making
    "simple" changes to the binaries?  Like changing the copyright notice
    or splash screen?

    Then, bringing the (accused) counterfeit of YOUR product into a courtroom
    and revealing the *hidden* checksum that the counterfeiter wasn't aware of?
    "Gee, why does YOUR (alleged) device have *my* name in it -- in addition
    to behaving exactly like mine??"

    [I guess obscurity has its place!]

    Security by obscurity is not security.  Having a hidden signature or other mark
    can be useful for proving ownership (making an intentional mistake is another
    common tactic - such as commercial maps having a few subtle spelling errors).
    But that is not security.

    Of course it is! If *you* check the "hidden signature" at runtime
    and then alter "your" operation such that an altered copy fails
    to perform properly, then you have secured it.

    Would you want to use a check-writing program if the account
    balances it maintains were subtly (but not consistently)
    incorrect?

    OTOH, if the (altered) program threw up a splash screen and
    said "Unlicensed copy detected" and refused to operate, the
    "program" is still "secured" -- but, now you've provided an
    easy indicator of whether or not the security has been
    defeated.

    We started doing this in the heyday of video (arcade) gaming;
    a counterfeiter would have a clone of YOUR game on the market
    (at substantially reduced prices) in a matter of *weeks*.
    As Operators have no foreknowledge of which games will be
    moneymakers and which will be "90 day wonders" (literally,
    no longer played after 90 days of exposure!), what incentive
    to pay for a genuine article?

    If all a counterfeiter had to do was alter the copyright
    notice (even if it was stored in some coded form), or alter
    some graphics (name of game, colors/shapes of characters)
    that's *no* impediment -- given how often and quickly
    it could be done.

    Games would not just look at their images during POST
    but, also, verify that routineX() had some particular
    side-effect that could be tested, etc. Counterfeiters
    would go to lengths to ensure even THESE tests would pass.

    Because the game would *complain*, otherwise! (so, keep
    looking for more tests until the game stops throwing an
    alarm).

    OTOH, if you *hide* the checks in the runtime and alter
    the game's performance subtly by folding expected values
    into key calculations such that values derived from
    altered code differ, you can annoy the player: "why did
    my guy just turn blue and run off the edge of the screen?"
    An annoyed player stops putting money into a game.
    A game that doesn't earn money -- regardless of how
    inexpensive it was to purchase -- quickly teaches the
    Owner not to invest in such "buggy" games.

    This is much better than taking the counterfeiter to court and
    proving the code is a copy of yours! (and, "FlyByNight
    Games Counterfeiters" simply closes up shop and opens up,
    next door)

    And, because there is no "drop dead" point in the code or
    the game's behavior, the counterfeiter never knows when
    he's found all the protection mechanisms.

    Checking signatures, CRCs, licensing schemes, etc. all are used
    in a "drop dead" fashion so considerably easier to defeat.
    Witness the number of "products" available as warez...

    Use a non-secret approach and you invite folks to alter it, as well.

    Using non-standard algorithms for security is a simple way to get things
    completely wrong.  "Security by obscurity" is very rarely the right answer.
    In reality, good security algorithms, and good implementations, are
    difficult and specialised tasks, best left to people who know what they are
    doing.

    To make something secure, you have to ensure that the check algorithms
    depend on a key that you know, but that the attacker does not have. That's
    the basis of digital signatures (though you use a secure hash algorithm
    rather than a simple CRC).

    If you can remove the check, then what value the key's secrecy?  By your
    criteria, the adversary KNOWS how you are implementing your security
    so he knows exactly what to remove to bypass your checks and allow his
    altered image to operate in its place.

    Ever notice how manufacturers don't PUBLICLY disclose their security
    hooks (without an NDA)?  If "security by obscurity" was not important,
    they would publish these details INVITING challenges (instead of
    trying to limit the knowledge to people with whom they've officially
    contracted).

    Any serious manufacturer /does/ invite challenges to their security.

    There are multiple reasons why a manufacturer (such as a semiconductor manufacturer) might be guarded about the details of their security systems. They can be avoiding giving hints to competitors.  Maybe they know their systems aren't really very secure, because their keys are too short or they can
    be read out in some way.

    But I think the main reasons are often:

    They want to be able to change the details, and that's far easier if there are
    only a few people who have read the information.

    So, a legitimate customer is subjected to arbitrary changes in
    the product's implementation?

    They don't want endless support questions from amateurs.

    Only answer with a support contract.

    They are limited by idiotic government export restrictions made by ignorant politicians who don't understand cryptography.

    Protections don't always have to be cryptographic. The
    "Fortress" payphone is remarkably well hardened to direct
    physical (brute force) attacks -- money is involved.
    Ditto many slot machines (again, CASH money). Yet, all
    have vulnerabilities. "Expose this portion of the die
    to ultraviolet light to reset the memory protection bits"
    Etc.

    Some things benefit from being kept hidden, or under restricted access. The details of the CRC algorithm you use to catch accidental errors in your image
    file are /not/ one of them.  If you think hiding it has the remotest hint of a
    benefit, you are doing things wrong - you need a /security/ check, not a simple
    /integrity/ check.

    And then once you have switched to a security check - a digital signature - there's no need to keep that choice hidden either, because it is the /key/ that
    is important, not the type of lock.

    Again, meaningless if the attacker can interfere with the *enforcement*
    of that check. Using something "well known" just means he already knows
    what to look for in your code. Or, how to interfere with your
    intended implementation in ways that you may have not anticipated
    (confident that your "security" can't be MATHEMATICALLY broken).

    I had a discussion with a friend who knew just enough about "computers"
    to THINK he understood that world. I mentioned my NOT using ecommerce.
    He laughed at me as "naive": "There's 40 bit encryption on those
    connections! No one is going to eavesdrop on your financial data!"

    [Really, Jerry? You think, as an OLD accountant, you know more
    than I do as a young engineer practicing in that field? Ok...]

    "Yeah, and are you 100% sure something isn't already *on* your computer
    looking at your keystrokes BEFORE they head down that encrypted tunnel?"

    Guess he hadn't really thought out the problem to that level of detail
    as his confidence quickly melted away to one of worry ("I wonder if
    I've already been hacked??")

    People implementing security almost always focus on the wrong
    aspects of the problem and walk away THINKING they can rest easy.
    Vulnerabilities are often so blatantly obvious, after the fact,
    as to be embarrassing: "You're not supposed to do that!"
    "Then, why did your product LET ME?"

    I use *many* layers of security in my current design and STILL
    expect them (at least the ones that are accessible) to all
    be subverted. So, I ultimately rely on controlling *what*
    the devices can do so that, even compromised, they can't
    cause undetectable failures or information leaks.

    "Here's my source code. Here are my schematics. Here's the
    name of the guy who oversees production (bribe him to gain
    access to the keys stored in the TPM). Now, what are you
    gonna *do* with all that?"

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Fri Apr 21 20:14:31 2023
    From Newsgroup: comp.arch.embedded

    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique, regardless
    of what the version number says. Version numbers are set manually
    and not always done correctly. I'm looking for something as a backup
    so that if the checksums are different, I can be sure the versions
    are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of the
    image. Any change in the image will show up as a change in the crc.
    No one is trying to detect changes in the image. I'm trying to label the image in a way that can be read in operation. I'm using the checksum simply because that is easy to generate. I've had problems with version numbering in the past. It will be used, but I want it supplemented with a number that will change every time the design changes, at least with a high probability, such as 1 in 64k.
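    A minimal sketch of the "simple 32-bit crc" suggestion, assuming the common reflected CRC-32 used by zlib and Ethernet (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). It would run on the build host rather than the target, and `crc32_bitwise` and `images_differ` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and
 * Ethernet). No lookup table: slow, but only a few lines. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;               /* standard initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* shift right; if a 1 fell out, fold in the polynomial */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;                 /* standard final inversion */
}

/* Returns 1 if two builds are certainly different, 0 if they are almost
 * certainly identical (roughly a 1 in 2^32 chance of a false match). */
int images_differ(const uint8_t *a, size_t alen,
                  const uint8_t *b, size_t blen)
{
    return alen != blen || crc32_bitwise(a, alen) != crc32_bitwise(b, blen);
}
```

    Unequal fingerprints prove the builds differ. Equal ones only say they are very probably the same, which is the "backup to the version number" property being asked for.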
    --
    Rick C.
    --- Get 1,000 miles of free Supercharging
    --- Tesla referral code - https://ts.la/richard11209
  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Fri Apr 21 20:23:23 2023
    From Newsgroup: comp.arch.embedded

    On Friday, April 21, 2023 at 7:52:27 PM UTC-4, Brian Cockburn wrote:
    On Friday, April 21, 2023 at 10:12:49 PM UTC+10, Rick C wrote:
    On Friday, April 21, 2023 at 4:53:18 AM UTC-4, Brian Cockburn wrote:
    On Thursday, April 20, 2023 at 12:06:36 PM UTC+10, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

    I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.

    I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum.

    I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy.

    I keep thinking there is a different way of looking at this to achieve the result I want...

    Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the modulo N checksum, this would appear to be impossible for the general case, unless... Add another data value, called a checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact.

    This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that.

    --

    Rick C.

    Rick, What is the purpose of this? Is it (1) to be able to externally identify a binary, as one might a ROM image by computing a checksum? Is it (2) for a runnable binary to be able to check itself? This would of course only be able to detect corruption, not tampering. Is it (3) for the loader (whatever that might be) to be able to say 'this binary has the correct checksum' and only jump to it if it does? Again this would only be able to detect corruption, not tampering. Are you hoping for more than corruption detection?
    This is simply to be able to say this version is unique, regardless of what the version number says. Version numbers are set manually and not always done correctly. I'm looking for something as a backup so that if the checksums are different, I can be sure the versions are not the same.

    The less work involved, the better.

    --

    Rick C.

    Rick, so you want the executable to, as part of its execution, print on the console the 'checksum' of itself? Or do you want to be able to inspect the executable with some other tool to calculate its 'checksum'? For the latter there are lots of tools to do that (your OS or PROM programmer for instance), for the former you need to embed the calculation code into the executable (along with the length over which to calculate) and run this when asked. Neither of these involve embedding the 'checksum' value.
    And just to be sure I understand what you wrote in a somewhat convoluted way. When you have two binary executables that report the same version number you want to be able to distinguish them with a 'checksum', right?
    Yes, I want the checksum to be readable while operating. Calculation code??? Not going to happen. That's why I want to embed the checksum.
    Yes, two compiled files which ended up with the same version number by error. We are using an 8 bit version number, so two hex digits. Negative numbers are lab versions, positive numbers are releases, so 64 of each. We don't do a lot of actual work on the hardware. This code usually is 99.9% working by the time it is tested on hardware. So no need for lots of rev numbers. But sometimes, in the lab, the rev number is not bumped when it should be. The checksum will tell us if we are working with different revisions in that case.
    So far, it looks like a simple checksum is the way to go. Include the checksum and the 2's complement of the checksum (in locations that were zeros), and the checksum will not change.
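    A sketch of the "checksum plus its two's complement" trick, assuming the image is treated as an array of 16-bit words summed modulo 2^16, with two words reserved and zeroed by the build. `sum_slot` and `comp_slot` are hypothetical indices. Because S + (-S) = 0 modulo 2^16, adding both words leaves the total at S, so the embedded word equals the checksum of the final image.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple sum of 16-bit words: uint16_t arithmetic wraps, which is exactly
 * "addition modulo 65536". */
uint16_t sum16(const uint16_t *words, size_t n)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint16_t)(sum + words[i]);
    return sum;
}

/* Embed the checksum so that the image still sums to the same value.
 * sum_slot and comp_slot are hypothetical indices of two words that the
 * build reserved and filled with zero. */
uint16_t embed_checksum(uint16_t *words, size_t n,
                        size_t sum_slot, size_t comp_slot)
{
    uint16_t s = sum16(words, n);             /* both slots still zero */
    words[sum_slot]  = s;                     /* value firmware reports */
    words[comp_slot] = (uint16_t)(0u - s);    /* two's complement cancels it */
    return s;                                 /* sum16(words, n) is still s */
}
```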
    --
    Rick C.
  • From Brian Cockburn@brian.cockburn.1959@gmail.com to comp.arch.embedded on Sat Apr 22 07:07:34 2023
    From Newsgroup: comp.arch.embedded

    Rick,
    Rick, so you want the executable to, as part of its execution, print on the console the 'checksum' of itself? Or do you want to be able to inspect the executable with some other tool to calculate its 'checksum'? For the latter there are lots of tools to do that (your OS or PROM programmer for instance), for the former you need to embed the calculation code into the executable (along with the length over which to calculate) and run this when asked. Neither of these involve embedding the 'checksum' value.
    And just to be sure I understand what you wrote in a somewhat convoluted way. When you have two binary executables that report the same version number you want to be able to distinguish them with a 'checksum', right?

    Yes, I want the checksum to be readable while operating. Calculation code??? Not going to happen. That's why I want to embed the checksum.
    Can you expand on what you mean or expect by 'readable while operating' please? Are you planning to use some sort of tool to inspect the executing binary to 'read' this thing, or provoke output to the console in some way like:
    $ run my-binary-thing --checksum
    10FD
    $
    This would be as distinct from:

    $ run my-binary-thing --version
    -52
    $
    Yes, two compiled files which ended up with the same version number by error. We are using an 8 bit version number, so two hex digits. Negative numbers are lab versions, positive numbers are releases, so 64 of each.
    Signed 8-bit numbers range from -128 to +127 (0x80 to 0x7F) so probably a few more than 64.
    ... sometimes, in the lab, the rev number is not bumped when it should be.
    This may be an indicator that better procedures are needed for code review-for-release. And that an independent pair of eyes should be doing the review against an agreed check list.
    So far, it looks like a simple checksum is the way to go. Include the checksum and the 2's complement of the checksum (in locations that were zeros), and the checksum will not change.
    How will the checksum 'not change'? It will be different for every build, won't it?
    Cheers, Brian.
  • From Richard Damon@Richard@Damon-Family.org to comp.arch.embedded on Sat Apr 22 10:31:48 2023
    From Newsgroup: comp.arch.embedded

    On 4/22/23 10:07 AM, Brian Cockburn wrote:
    Rick,

    So far, it looks like a simple checksum is the way to go. Include the checksum and the 2's complement of the checksum (in locations that were zeros), and the checksum will not change.

    How will the checksum 'not change'? It will be different for every build, won't it?

    Cheers, Brian.

    He means the checksum of the file for a given build after the
    modification will be the same as the checksum of the file before the modification.
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Apr 22 16:57:53 2023
    From Newsgroup: comp.arch.embedded

    On 22/04/2023 02:29, Don Y wrote:
    On 4/21/2023 7:50 AM, David Brown wrote:
    On 21/04/2023 13:39, Don Y wrote:
    On 4/21/2023 3:43 AM, David Brown wrote:
    Note that you want to choose a polynomial that doesn't
    give you a "win" result for "obviously" corrupt data.
    E.g., if data is all zeros or all 0xFF (as these sorts of
    conditions can happen with hardware failures) you probably
    wouldn't want a "success" indication!

    No, that is pointless for something like a code image.  It just adds needless complexity to your CRC algorithm.
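    A small illustration of the all-zeros concern, assuming the MSB-first CRC-16 with polynomial 0x1021. With an initial value of 0 (the CRC-16/XMODEM variant), a block of zeros of any length has a CRC of 0, so a zeroed region whose stored CRC also reads 0 would pass. With an initial value of 0xFFFF (CRC-16/CCITT-FALSE) it does not. `zero_region_passes` is a hypothetical helper, and whether this matters for code images is exactly what is being debated here; the code only shows the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* MSB-first CRC-16, polynomial 0x1021, no final XOR.
 * init 0x0000 -> CRC-16/XMODEM, init 0xFFFF -> CRC-16/CCITT-FALSE. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len, uint16_t init)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Does a zeroed region (data and stored CRC all 0x00) pass the check?
 * Returns 1 or 0, or -1 if len exceeds this demo's 256-byte buffer. */
int zero_region_passes(size_t len, uint16_t init)
{
    static const uint8_t zeros[256];          /* zero-initialised */
    if (len > sizeof zeros)
        return -1;
    return crc16_ccitt(zeros, len, init) == 0x0000u;
}
```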

    Perhaps you've forgotten that you don't just use CRCs (secure hashes,
    etc.)
    on "code images"?

    No - but "code images" is the topic here.

    So, anything unrelated to CRC's as applied to code images is off limits... "per order of the Internet Police"?


    No, it's fine to discuss them - threads on Usenet often wander, and
    that's often good. (At least, that's my opinion - some people get their knickers in a twist if people stray from answering their original question.)

    But you have to assume that people are on topic unless it's clear that
    the topic is being expanded. We were discussing CRC's for code images,
    and so it is appropriate to take advantage of the features of code
    images. If you want to expand and talk about other uses of CRC's, I've
    no problem with that - but you need to say so.

    If *all* you use CRCs for is checking *a* code image at POST, you're
    wasting a valuable resource.

    Do you not think data/parameters need to be safeguarded?  Program images? Communication protocols?

    Sure. Many things need integrity checks. And CRC's are flexible enough
    to be useful in many circumstances.


    Or, do you develop yet another technique for *each* of those?

    Sometimes, yes. CRC's are, as I wrote, flexible. But they don't cover everything. Maybe you need a specific type of check to match existing protocols or requirements. Maybe you want forward error correction, not
    just error detection. Maybe you are guarding against malicious
    interference. Maybe you are guarding against different kinds of errors
    - CRC's are great for spotting a few damaged bits, but a poor choice if
    the risk is dropped bytes in transmission.

    But often CRC's will be a first choice, because they are simple and
    effective in a wide range of uses.


    However, in almost every case where CRC's might be useful, you have
    additional checks of the sanity of the data, and an all-zero or
    all-one data block would be rejected.  For example, Ethernet packets
    use CRC for integrity checking, but an attempt to send a packet type 0
    from MAC address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of
    length 0, would be rejected anyway.

    Why look at "data" -- which may be suspect -- and *then* check its CRC?
    Run the CRC first.  If it fails, decide how you are going to proceed
    or recover.


    That is usually the order, yes. Sometimes you want "fail fast", such as dropping a packet that was not addressed to you (it doesn't matter if it
    was received correctly but for someone else, or it was addressed to you
    but the receiver address was corrupted - you are dropping the packet
    either way). But usually you will run the CRC then look at the data.

    But the order doesn't matter - either way, you are still checking for
    valid data, and if the data is invalid, it does not matter if the CRC
    only passed by luck or by all zeros.

    ["Data" can be code or parameters]

    I treat blocks of "data" (carefully arranged) with individual CRCs,
    based on their relative importance to the operation.  If the CRC is
    corrupt, I have no idea *where* the error lies -- as it could
    be anything in the checked block.  So, one has to (typically)
    restore some defaults (or, invoke a reconfigure operation) which
    recreates *a* valid dataset.

    This is particularly useful when power to a device can be
    removed at arbitrary points in time (or, some other abrupt
    crash).  Before altering anything in a block, take deliberate
    steps to invalidate the CRC, make your changes, then "fix"
    the CRC.  So, an interrupted process causes the CRC to fail
    and remedial action taken.

    Note that replacing a FLASH image (mostly code) falls under
    such a mechanism.
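    A minimal sketch of the invalidate / modify / fix sequence, assuming a hypothetical `struct param_block` held in EEPROM or flash and CRC-16/CCITT-FALSE as the check. Plain memory stands in for the non-volatile store; a real driver would also have to make sure each write reaches the medium in order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PARAM_LEN 32

/* CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, MSB first. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical parameter block as stored in EEPROM/flash. */
struct param_block {
    uint8_t  data[PARAM_LEN];
    uint16_t crc;                             /* CRC of data[] */
};

int block_valid(const struct param_block *blk)
{
    return crc16(blk->data, PARAM_LEN) == blk->crc;
}

/* Three deliberate steps.  If power fails between them, the stored CRC
 * (almost certainly) no longer matches, so the next boot takes the
 * "restore defaults" path instead of trusting half-written data. */
void block_update(struct param_block *blk, const uint8_t new_data[PARAM_LEN])
{
    /* 1: invalidate; the complement can never equal the current CRC */
    blk->crc = (uint16_t)~crc16(blk->data, PARAM_LEN);
    /* 2: change the contents */
    memcpy(blk->data, new_data, PARAM_LEN);
    /* 3: "fix" the CRC to match the new contents */
    blk->crc = crc16(blk->data, PARAM_LEN);
}
```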


    That's all standard stuff. (Maybe it's new to some people in this group
    - although most of the regular posters here are experienced embedded developers, it's nice to think there might be some people reading these
    posts and learning!)

    If you have the space in your flash, eeprom, etc., then it is also
    common to have two slots for your configuration data or code. You don't "invalidate" anything - you keep a version counter with your data, and
    write your new data to the slot with the oldest version. When your
    system starts, it checks both slots - and uses the one with the newest
    version for which the CRC check passes.

    I can't think of any use-cases where you would be passing around a
    block of "pure" data that could reasonably take absolutely any value,
    without any type of "envelope" information, and where you would think
    a CRC check is appropriate.

    I append a *version specific* CRC to each packet of marshalled data
    in my RMIs.  If the data is corrupted in transit *or* if the
    wrong version API ends up targeted, the operation will abend
    because we know the data "isn't right".

    Using a version-specific CRC sounds silly. Put the version information
    in the packet.


    I *could* put a header saying "this is version 4.2".  And, that
    tells me nothing about the integrity of the rest of the data.
    OTOH, ensuring the CRC reflects "4.2" does -- if the recipient
    expects it to be so.

    Now you don't know if the data is corrupted, or for the wrong version -
    or occasionally, corrupted /and/ the wrong version but passing the CRC
    anyway.

    Unless you are absolutely desperate to save every bit you can, your
    system will be simpler, clearer, and more reliable if you separate your purposes.
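    One way to "separate your purposes", sketched with a hypothetical frame layout: an explicit version field in the header, plus a CRC-16/CCITT-FALSE over header and payload. The receiver checks integrity first and meaning second, so it can tell a corrupted frame from an intact frame of the wrong version. `API_VERSION`, the frame format and the function names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define API_VERSION 0x0402u                   /* hypothetical "4.2" */

enum rx_status { RX_OK, RX_CORRUPT, RX_WRONG_VERSION, RX_TOO_SHORT };

/* CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, MSB first. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Frame: [ver_hi][ver_lo][payload...][crc_hi][crc_lo].
 * out must have room for len + 4 bytes.  Returns the frame length. */
size_t build_frame(uint8_t *out, const uint8_t *payload, size_t len)
{
    out[0] = (uint8_t)(API_VERSION >> 8);
    out[1] = (uint8_t)API_VERSION;
    memcpy(out + 2, payload, len);
    uint16_t c = crc16(out, 2 + len);
    out[2 + len] = (uint8_t)(c >> 8);
    out[3 + len] = (uint8_t)c;
    return len + 4;
}

enum rx_status check_frame(const uint8_t *f, size_t n)
{
    if (n < 4)
        return RX_TOO_SHORT;
    uint16_t stored = (uint16_t)((f[n - 2] << 8) | f[n - 1]);
    if (crc16(f, n - 2) != stored)
        return RX_CORRUPT;                    /* integrity first */
    uint16_t ver = (uint16_t)((f[0] << 8) | f[1]);
    return ver == API_VERSION ? RX_OK : RX_WRONG_VERSION;
}
```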


    You can also "salt" the calculation so that the residual
    is deliberately nonzero.  So, for example, "success" is
    indicated by a residual of 0x474E.  :>

    Again, pointless.

    Salt is important for security-related hashes (like password
    hashes), not for integrity checks.

    You've missed the point.  The correct "sum" can be anything.
    Why is "0" more special than any other value?  As the value is
    typically meaningless to anything other than the code that verifies
    it, you couldn't look at an image (or the output of the verifier)
    and gain anything from seeing that obscure value.

    Do you actually know what is meant by "salt" in the context of hashes,
    and why it is useful in some circumstances?  Do you understand that
    "salt" is added (usually prepended, or occasionally mixed in in some
    other way) to the data /before/ the hash is calculated?

    What term would you have me use to indicate a "bias" applied to a CRC algorithm?

    Well, first I'd note that any kind of modification to the basic CRC
    algorithm is pointless from the viewpoint of its use as an integrity
    check. (There have been, mostly historically, some justifications in
    terms of implementation efficiency. For example, bit and byte
    re-ordering could be done to suit hardware bit-wise implementations.)

    Otherwise I'd say you are picking a specific initial value if that is
    what you are doing, or modifying the final value (inverting it or
    xor'ing it with a fixed value). There are, AFAIK, no specific terms for
    these - and I don't see any benefit in having one. Misusing the term
    "salt" from cryptography is certainly not helpful.



    I have not given the slightest indication to suggest that "0" is a
    special value.  I fully agree that the value you get from the checking
    algorithm does not have to be 0 - I already suggested it could be
    compared to the stored value.  I.e., you build your image file as
    "data ++ crc(data)", and check it by re-calculating "crc(data)" on the
    received image and comparing the result to the received crc.  There is
    no necessity or benefit in having a crc run calculated over the
    received data plus the received crc being 0.

    "Salt" is used in cases where the original data must be kept secret,
    and only the hashes are transmitted or accessible - by adding salt to
    the original data before hashing it, you avoid a direct correspondence
    between the hash and the original data.  The prime use-case is to stop
    people being able to figure out a password by looking up the hash in a
    list of pre-computed hashes of common passwords.

    See above.

    OTOH, if the CRC yields something familiar -- or useful -- then
    it can tell you something about the image.  E.g., salt the algorithm
    with the product code, version number, your initials, 0xDEADBEEF, etc.

    You are making no sense at all.  Are you suggesting that it would be a
    good idea to add some value to the start of the image so that the
    resulting crc calculation gives a nice recognisable product code?
    This "salt" would be different for each program image, and calculated
    by trial and error.  If you want a product code, version number, etc.,
    in the program image (and it's a good idea), just put these in the
    program image!

    Again, that tells you nothing about the rest of the image!

    Again, you are making no sense - not to me, anyway. If you want
    something in the image to tell you about the image, add such metadata - versions, dates, whatever. If you want an integrity check of the image,
    make one - such as appending a CRC. Trying to combine these two
    orthogonal tasks into one is not going to be good for either purpose.

    See the RMI description.

    I'm sorry, I have no idea what "RMI" is or where it is described.
    You've mentioned that abbreviation twice, but I can't figure it out.


    [Note that the OP is expecting the checksum to help *him*
    identify versions:  "Just put these in the program image!"  Eh?]

    No. The OP is looking for a way to be sure that two program images are
    the same. He wants to be sure that if he (or whoever makes the image)
    forgets to update the version number when making a change to the
    software, the difference between the images is easily detectable or identifiable without doing a byte-for-byte compare of the images. The
    answer to that is a hash of some sort - and a CRC of appropriate size is
    a simple hash that will work well against mistakes (but not necessarily malicious changes). But a hash will not give you a version number. It
    will let you see that two images are different, but it will not tell you
    that one of them is version 1.20.304 and the other is 1.21.308. What he
    will see is that if two files say they are version 1.20.304, but are
    actually different, someone has screwed up - the CRC hash makes such
    checks possible without having to read through the entire images.


    So now you have a new extended block   |....data....|crc|

    Now if you compute a new CRC on the extended block, the resulting
    value /should/ come out to zero. If it doesn't, either your data or the original CRC value appended to it has been changed/corrupted.
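    A sketch of both checking styles, assuming CRC-16/CCITT-FALSE (MSB first, initial value 0xFFFF, no final XOR) with the CRC appended most significant byte first. For a CRC of this form the CRC over `|....data....|crc|` comes out to exactly zero. The zero relies on appending in the matching bit and byte order and on having no final XOR; with a final XOR the residue is a fixed non-zero constant instead, so "recompute and compare" is the more portable check. Function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, MSB first. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Append the CRC of buf[0..len-1], most significant byte first.
 * buf must have room for len + 2 bytes.  Returns the new length. */
size_t append_crc(uint8_t *buf, size_t len)
{
    uint16_t c = crc16(buf, len);
    buf[len]     = (uint8_t)(c >> 8);
    buf[len + 1] = (uint8_t)c;
    return len + 2;
}

/* Method 1: recompute over the data and compare with the stored CRC. */
int check_by_compare(const uint8_t *buf, size_t total)
{
    uint16_t stored = (uint16_t)((buf[total - 2] << 8) | buf[total - 1]);
    return crc16(buf, total - 2) == stored;
}

/* Method 2: run the CRC over data + CRC; an intact block gives 0. */
int check_by_residue(const uint8_t *buf, size_t total)
{
    return crc16(buf, total) == 0;
}
```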

    As there is usually a lack of originality in the algorithms
    chosen, you have to consider if you are also hoping to use
    this to safeguard the *integrity* of your image (i.e.,
    against intentional modification).

    "Integrity" has nothing to do with the motivation for change.
    /Security/ is concerned with intentional modifications that
    deliberately attempt to defeat /integrity/ checks.  Integrity is
    about detecting any changes.

    If you are concerned about the possibility of intentional malicious
    changes,

    Changes don't have to be malicious.

    Accidental changes (such as human error, noise during data transfer,
    memory cell errors, etc.) do not pass integrity tests unnoticed.

    That's not true.  The role of the *test* is to notice these.  If the test is blind to the types of errors that are likely to occur, then it CAN'T notice them.

    I assumed it was unnecessary to say that an integrity test needs to be appropriate for the type of data and transfer in question.


    A CRC (hash, etc.) reduces a large block of data to a small bit of
    data.  So, by definition, there are multiple DIFFERENT sets of data that
    map to the same CRC/hash/etc.  (2^(data_size - CRC_size))

    Correct.

    That's why you need to pick an appropriate size for your CRC. For a
    telegram of a dozen bytes, an 8-bit CRC is probably fine. For a program image, a 32-bit CRC is usually more appropriate - a one in four billion
    chance of an undetected error is reasonable for most uses. If you want
    to be more paranoid, go for 64-bit CRC - you should now be far more
    worried about meteors wiping out humanity than undetected errors. (More commonly, if a 32-bit CRC is not enough, it's because you have security concerns - so switch to a SHA hash.)


    E.g., simply summing the values in a block of memory will yield "0"
    for ANY condition that results in the block having identical values
    for ALL members, if the block size is a multiple of the sum's modulus
    (e.g., 256 bytes or any larger power of 2, for an 8-bit sum).  So, a
    block of all 0xFF, all 0x00, all 0xFE, all 0x27, all 0x88, etc. will
    yield the same sum.  Clearly a bad choice of test!
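    A quick demonstration of that point, assuming an 8-bit additive checksum over a 256-byte block and comparing it with CRC-16/CCITT-FALSE. Every repeated fill value sums to zero, while the CRC tells the blocks apart. `sum8` and `fill_and_check` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-bit additive checksum: the "simple sum" under discussion. */
uint8_t sum8(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--)
        s = (uint8_t)(s + *p++);
    return s;
}

/* CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, MSB first. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Fill a 256-byte block with one repeated value; report both checks. */
void fill_and_check(uint8_t value, uint8_t *sum_out, uint16_t *crc_out)
{
    uint8_t block[256];
    memset(block, value, sizeof block);
    *sum_out = sum8(block, sizeof block);     /* 256 * value mod 256 == 0 */
    *crc_out = crc16(block, sizeof block);
}
```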


    Correct.

    That's why simple sums are not usually considered very good integrity tests.

    A CRC has a spreading effect. Every bit in the data contributes with approximately equal weight to every bit in the CRC. This is a common
    feature for good hash functions.

    OTOH, "salting" the calculation so that it is expected to yield
    a value of 0x13 means *those* situations will be flagged as errors
    (and a different set of situations will sneak by, undetected).

    And that gives you exactly /zero/ benefit.

    You run your hash algorithm, and check for the single value that
    indicates no errors. It does not matter if that number is 0, 0x13, or -
    often more conveniently - the number attached at the end of the image as
    the expected result of the hash of the rest of the data.

    The trick (engineering) is to figure out which types of failures/faults/errors are most common to occur and guard
    against them.

    Yes, that is absolutely the case. And CRC's have the convenience of
    being particularly good at certain kinds of errors that are feasible in
    a lot of data transmissions. But they are not ideal for everything, and
    other kinds of checks can be better when you know more about the
    realistic errors.


    To be more accurate, the chances of them passing unnoticed are of the
    order of 1 in 2^n, for a good n-bit check such as a CRC check.
    Certain types of error are always detectable, such as single and
    double bit errors.  That is the point of using a checksum or hash for
    integrity checking.
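    A brute-force sketch that checks the single- and double-bit claim for one small case, assuming CRC-16/CCITT-FALSE (polynomial 0x1021) and a message of at most 64 bytes. The double-bit guarantee depends on the polynomial and holds only below a polynomial-specific message length; a 32-byte block is far below that limit for 0x1021. `undetected_1_2_bit` is a hypothetical name.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, MSB first. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Count single- and double-bit error patterns in msg[0..len-1] that leave
 * the CRC unchanged.  Returns ULONG_MAX if len exceeds this demo's limit. */
unsigned long undetected_1_2_bit(const uint8_t *msg, size_t len)
{
    uint8_t work[64];
    if (len > sizeof work)
        return ULONG_MAX;

    const size_t nbits = len * 8;
    const uint16_t good = crc16(msg, len);
    unsigned long missed = 0;

    for (size_t i = 0; i < nbits; i++) {
        for (size_t j = i; j < nbits; j++) {  /* j == i: single-bit error */
            memcpy(work, msg, len);
            work[i / 8] ^= (uint8_t)(0x80u >> (i % 8));
            if (j != i)
                work[j / 8] ^= (uint8_t)(0x80u >> (j % 8));
            if (crc16(work, len) == good)
                missed++;
        }
    }
    return missed;
}
```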

    /Intentional/ changes are a different matter.  If a hacker changes the
    program image, they can change the transmitted hash to their own
    calculated hash.  Or for a small CRC, they could change a different
    part of the image until the original checksum matched - for a 16-bit
    CRC, that only takes 65,535 attempts in the worst case.
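    A sketch of the brute-force attack described above, assuming CRC-16/CCITT-FALSE and a hypothetical two-byte field at `patch_at` that the attacker is free to overwrite (an unused constant, padding, etc.). Because a CRC is linear, some value of any 16 consecutive bits restores any target CRC, so the loop always succeeds within 65,536 tries. This is why a signature, not a CRC, is needed against intentional changes.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, MSB first. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical attack: after tampering with img[], try every value of the
 * 2-byte field at patch_at until the CRC equals the original again.
 * Returns the number of attempts used, or 0 if none matched. */
unsigned long forge_crc16(uint8_t *img, size_t len, size_t patch_at,
                          uint16_t original_crc)
{
    for (unsigned long v = 0; v <= 0xFFFFul; v++) {
        img[patch_at]     = (uint8_t)(v >> 8);
        img[patch_at + 1] = (uint8_t)v;
        if (crc16(img, len) == original_crc)
            return v + 1;
    }
    return 0;
}
```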

    If the approach used is "typical", then you need far fewer attempts to produce a correct image -- without EVER knowing where the CRC is stored.


    It is difficult to know what you are trying to say here, but if you
    believe that different initial values in a CRC algorithm makes it harder
    to modify an image to make it pass the integrity test, you are simply wrong.

    That is why you need to distinguish between the two possibilities.  If
    you don't have to worry about malicious attacks, a 32-bit CRC takes a
    dozen lines of C code and a 1 KB table, all running extremely
    efficiently.  If security is an issue, you need digital signatures -
    an RSA-based signature system is orders of magnitude more effort in
    both development time and in run time.
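    A sketch of the "dozen lines of C and a 1 KB table" claim, assuming the same reflected CRC-32 as zlib and Ethernet: 256 table entries of 4 bytes each make up the 1 KB. `crc32_init` and `crc32_fast` are hypothetical names; on a small target the table would more likely be a `const` array generated at build time and placed in ROM.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];               /* 256 x 4 bytes = 1 KB */

/* Fill the table once; entry n is the CRC contribution of byte value n. */
void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

/* Table-driven CRC-32: one lookup per byte instead of eight shifts.
 * crc32_init() must have been called first. */
uint32_t crc32_fast(const uint8_t *p, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    while (len--)
        c = crc_table[(c ^ *p++) & 0xFFu] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}
```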

    It's considerably more expensive AND not fool-proof -- esp if the
    attacker knows you are signing binaries.  "OK, now I need to find
    WHERE the signature is verified and just patch that "CALL" out
    of the code".

    I'm not sure if that is a straw-man argument, or just showing your
    ignorance of the topic. Do you really think security checks are done by
    the program you are trying to send securely? That would be like trying
    to have building security where people entering the building look at
    their own security cards.


    I altered the test procedure for a
    piece of military gear we were building simply to skip some lengthy
    tests that I *knew* would pass (I don't want to inject an extra 20
    minutes of wait time just to get through a lengthy test I already
    know works before I can get to the test of interest to me, now).

    I failed to undo the change before the official signoff on the device.

    The only evidence of this was the fact that I had also patched the
    startup message to say "Go for coffee..." -- which remained on the
    screen for the duration of the lengthy (even with the long test
    elided) procedure...

    ...which alerted folks to the fact that this *probably* wasn't the
    original image.  (The computer running the test suite on the DUT had
    no problem accepting my patched binary)

    And what, exactly, do you think that anecdote tells us about CRC
    checks for image files?  It reminds us that we are all fallible, but
    does no more than that.

    That *was* the point.  Because the folks who designed the test computer relied on common techniques to safeguard the image.

    There was a human error - procedures were not good enough, or were not followed. It happens, and you learn from it and make better procedures.
    The fault was in what people did, not in an automated integrity check.
    It is completely unrelated.


    The counterfeiting example I cited indicates how "obscurity/secrecy"
    is far more effective (yet you dismiss it out-of-hand).

    No, it does nothing of the sort. There is no connection at all.


    CRC's alone are useless.  All the attacker needs to do after
    modifying the image is calculate the CRC themselves, and replace the
    original checksum with their own.

    That assumes the "alterer" knows how to replace the checksum, how it
    is computed, where it is embedded in the image, etc.  I modified the
    Compaq portable mentioned without ever knowing where the checksum was stored
    or *if* it was explicitly stored.  I had no desire to disassemble the
    BIOS ROMs (though could obviously do so as there was no "proprietary
    hardware" limiting access to their contents and the instruction set of
    the processor is well known!).

    Instead, I did this by *guessing* how they would implement such a check
    in a bit of kit from that era (EPROMs aren't easily modified by malware
    so it wasn't likely that they would go to great lengths to "protect" the
    image).  And, if my guess had been incorrect, I could always reinstall
    the original EPROMs -- nothing lost, nothing gained.

    Had much experience with folks counterfeiting your products and making
    "simple" changes to the binaries?  Like changing the copyright notice
    or splash screen?

    Then, bringing the (accused) counterfeit of YOUR product into a
    courtroom
    and revealing the *hidden* checksum that the counterfeiter wasn't
    aware of?

    "Gee, why does YOUR (alleged) device have *my* name in it -- in addition to behaving exactly like mine??"

    [I guess obscurity has its place!]

    Security by obscurity is not security.  Having a hidden signature or
    other mark can be useful for proving ownership (making an intentional
    mistake is another common tactic - such as commercial maps having a
    few subtle spelling errors). But that is not security.

    Of course it is!  If *you* check the "hidden signature" at runtime
    and then alter "your" operation such that an altered copy fails
    to perform properly, then you have secured it.


    That is not security. "Security" means that the program that starts the updated program checks the /entire/ image according to its digital
    signature, and rejects it /entirely/ if it does not match.

    What you are talking about here is the sort of cat-and-mouse nonsense
    computer games producers did with intentional disk errors to stop
    copying. It annoys legitimate users and does almost nothing to hinder
    the bad guys.

    Would you want to use a check-writing program if the account
    balances it maintains were subtly (but not consistently)
    incorrect?

    Again, you make no sense. What has this got to do with integrity checks
    or security?


    OTOH, if the (altered) program threw up a splash screen and
    said "Unlicensed copy detected" and refused to operate, the
    "program" is still "secured" -- but, now you've provided an
    easy indicator of whether or not the security has been
    defeated.

    We started doing this in the heyday of video (arcade) gaming;
    a counterfeiter would have a clone of YOUR game on the market
    (at substantially reduced prices) in a matter of *weeks*.
    As Operators have no foreknowledge of which games will be
    moneymakers and which will be "90 day wonders" (literally,
    no longer played after 90 days of exposure!), what incentive
    to pay for a genuine article?

    If all a counterfeiter had to do was alter the copyright
    notice (even if it was stored in some coded form), or alter
    some graphics (name of game, colors/shapes of characters)
    that's *no* impediment -- given how often and quickly
    it could be done.

    Games would not just look at their images during POST
    but, also, verify that routineX() had some particular
    side-effect that could be tested, etc.  Counterfeiters
    would go to lengths to ensure even THESE tests would pass.

    Because the game would *complain*, otherwise!  (so, keep
    looking for more tests until the game stops throwing an
    alarm).

    OTOH, if you *hide* the checks in the runtime and alter
    the game's performance subtly by folding expected values
    into key calculations such that values derived from
    altered code differ, you can annoy the player:  "why did
    my guy just turn blue and run off the edge of the screen?"
    An annoyed player stops putting money into a game.
    A game that doesn't earn money -- regardless of how
    inexpensive it was to purchase -- quickly teaches the
    Owner not to invest in such "buggy" games.
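A minimal C sketch of the "fold the check into the calculation" idea described above, assuming a simple additive 16-bit ROM sum. `rom_sum`, `player_speed` and the expected-sum parameter are hypothetical names for illustration, not code from any real game. On a genuine image the difference is zero and the result is untouched; on an altered image the result is usually off by a small amount, and there is no single comparison for a cloner to patch out.

```c
#include <stddef.h>
#include <stdint.h>

/* Additive checksum of the ROM image, modulo 2^16. */
static uint16_t rom_sum(const uint8_t *rom, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + rom[i]);
    }
    return sum;
}

/* Hypothetical gameplay calculation. Rather than comparing the ROM sum
 * with the expected value and halting on a mismatch, the difference is
 * folded into the result: zero for a genuine image, and for an altered
 * image usually nonzero in its low two bits, so the speed is subtly off. */
int player_speed(const uint8_t *rom, size_t len, uint16_t expected_sum,
                 int base_speed)
{
    uint16_t delta = (uint16_t)(rom_sum(rom, len) - expected_sum);
    return base_speed + (int)(delta & 0x3u);
}
```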

    This is much better than taking the counterfeiter to court and
    proving the code is a copy of yours!  (and, "FlyByNight
    Games Counterfeiters" simply closes up shop and opens up,
    next door)

    And, because there is no "drop dead" point in the code or
    the games behavior, the counterfeiter never knows when
    he's found all the protection mechanisms.

    Checking signatures, CRCs, licensing schemes, etc. all are used
    in a "drop dead" fashion so considerably easier to defeat.
    Witness the number of "products" available as warez...


    Look, it is all /really/ simple. And the year is 2023, not 1973.

    If you want to check the integrity of a file against accidental changes,
    a CRC is usually fine.

    If you want security, and to protect against malicious changes, use a
    digital signature. This must be checked by the program that /starts/
    the updated code, or that downloaded and stored it - not by the program itself!


    Use a non-secret approach and you invite folks to alter it, as well.

    Using non-standard algorithms for security is a simple way to get
    things completely wrong.  "Security by obscurity" is very rarely the right answer. In reality, good security algorithms, and good
    implementations, are difficult and specialised tasks, best left to
    people who know what they are doing.

    To make something secure, you have to ensure that the check
    algorithms depend on a key that you know, but that the attacker does
    not have. That's the basis of digital signatures (though you use a
    secure hash algorithm rather than a simple CRC).

    If you can remove the check, then what value the key's secrecy?  By your criteria, the adversary KNOWS how you are implementing your security
    so he knows exactly what to remove to bypass your checks and allow his
    altered image to operate in its place.

    Ever notice how manufacturers don't PUBLICLY disclose their security
    hooks (without an NDA)?  If "security by obscurity" was not important,
    they would publish these details INVITING challenges (instead of
    trying to limit the knowledge to people with whom they've officially
    contracted).

    Any serious manufacturer /does/ invite challenges to their security.

    There are multiple reasons why a manufacturer (such as a semiconductor
    manufacturer) might be guarded about the details of their security
    systems. They can be avoiding giving hints to competitors.  Maybe they
    know their systems aren't really very secure, because their keys are
    too short or they can be read out in some way.

    But I think the main reasons are often:

    They want to be able to change the details, and that's far easier if
    there are only a few people who have read the information.

    So, a legitimate customer is subjected to arbitrary changes in
    the product's implementation?


    Yes. It may come as a shock to you, but welcome to the real world.

    They don't want endless support questions from amateurs.

    Only answer with a support contract.

    Oh, sure - the amateurs who have some of the information but not enough details, skill or knowledge to get things working will /never/ fill
    forums with questions, complaints or bad reviews that bother your
    support staff or scare away real sales.


    They are limited by idiotic government export restrictions made by
    ignorant politicians who don't understand cryptography.

    Protections don't always have to be cryptographic.

    Correct, but - as with a lot of what you write - completely irrelevant
    to the subject at hand.

    Why can't companies give out information about the security systems used
    in their microcontrollers (for example) ? Because some geriatric
    ignoramuses think banning "export" of such information to certain
    countries will stop those countries knowing about security and cryptography.

    The
    "Fortress" payphone is remarkably well hardened to direct
    physical (brute force) attacks -- money is involved.
    Ditto many slot machines (again, CASH money).  Yet, all
    have vulnerabilities.  "Expose this portion of the die
    to ultraviolet light to reset the memory protection bits"
    Etc.

    Some things benefit from being kept hidden, or under restricted
    access. The details of the CRC algorithm you use to catch accidental
    errors in your image file is /not/ one of them.  If you think hiding
    it has the remotest hint of a benefit, you are doing things wrong -
    you need a /security/ check, not a simple /integrity/ check.

    And then once you have switched to a security check - a digital
    signature - there's no need to keep that choice hidden either, because
    it is the /key/ that is important, not the type of lock.

    Again, meaningless if the attacker can interfere with the *enforcement*
    of that check.  Using something "well known" just means he already knows what to look for in your code.  Or, how to interfere with your
    intended implementation in ways that you may have not anticipated
    (confident that your "security" can't be MATHEMATICALLY broken).


    If the attacker can interfere with the enforcement of the check, then it doesn't matter what checks you have. Keeping the design of a building's
    locks secret does not help you if the bad guys have bribed the security
    guard /inside/ the building!

    I had a discussion with a friend who knew just enough about "computers"
    to THINK he understood that world.  I mentioned my NOT using ecommerce.
    He laughed at me as "naive":  "There's 40 bit encryption on those connections!  No one is going to eavesdrop on your financial data!"

    [Really, Jerry?  You think, as an OLD accountant, you know more
    than I do as a young engineer practicing in that field?  Ok...]

    "Yeah, and are you 100% sure something isn't already *on* your computer looking at your keystrokes BEFORE they head down that encrypted tunnel?"

    Guess he hadn't really thought out the problem to that level of detail
    as his confidence quickly melted away to one of worry ("I wonder if
    I've already been hacked??")

    People implementing security almost always focus on the wrong
    aspects of the problem and walk away THINKING they can rest easy. Vulnerabilities are often so blatantly obvious, after the fact,
    as to be embarrassing:  "You're not supposed to do that!"
    "Then, why did your product LET ME?"

    I use *many* layers of security in my current design and STILL
    expect them (at least the ones that are accessible) to all
    be subverted.  So, ultimately rely on controlling *what*
    the devices can do so that, even compromised, they can't
    cause undetectable failures or information leaks.

    "Here's my source code.  Here are my schematics.  Here's the
    name of the guy who oversees production (bribe him to gain
    access to the keys stored in the TPM).  Now, what are you
    gonna *do* with all that?"


    The first two should be fine - if people can break your security after
    looking at your source code or schematics, your security is /bad/. As
    for the third one, if they can break your security by going through the production guy, your production procedures are bad.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Apr 22 17:01:10 2023

    On 22/04/2023 01:56, Brian Cockburn wrote:
    On Saturday, April 22, 2023 at 1:02:28 AM UTC+10, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version numbers are
    set manually and not always done correctly. I'm looking for
    something as a backup so that if the checksums are different, I
    can be sure the versions are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of
    the image. Any change in the image will show up as a change in the
    crc.
    David, a hash and a CRC are not the same thing.

    A CRC is a type of hash - but hash is a more generic term.

    They both produce a
    reasonably unique result though. Any change would show in either
    (unless as a result of intentional tampering).

    Exactly. Thus a CRC is a hash.

    It is not a cryptographically secure hash, and is not suitable for
    protecting against intentional tampering. But it /is/ a hash.
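To illustrate the point that a CRC is a (non-cryptographic) hash: a short C sketch that reduces an image of any length to a fixed 32-bit value. It assumes the common reflected CRC-32 used by zlib and Ethernet (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF); the thread does not say which CRC-32 variant is meant. This bit-at-a-time form needs no table, at the cost of speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected form: polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; if the bit shifted out was 1, XOR in the polynomial. */
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For the standard test string "123456789" this variant gives 0xCBF43926, and flipping any single bit of the input always changes the result.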

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Apr 22 17:13:27 2023

    On 22/04/2023 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version numbers are
    set manually and not always done correctly. I'm looking for
    something as a backup so that if the checksums are different, I
    can be sure the versions are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of
    the image. Any change in the image will show up as a change in the
    crc.

    No one is trying to detect changes in the image. I'm trying to label
    the image in a way that can be read in operation. I'm using the
    checksum simply because that is easy to generate. I've had problems
    with version numbering in the past. It will be used, but I want it supplemented with a number that will change every time the design
    changes, at least with a high probability, such as 1 in 64k.


    Again - use a CRC. It will give you what you want.

    You might want to go for 32-bit CRC rather than a 16-bit CRC, depending
    on the kind of program, how often you build it, and what consequences a
    hash collision could have. With a 16-bit CRC, you have a 5% chance of a collision after 82 builds. If collisions only matter for releases, and
    you only release a couple of updates, fine - but if they matter during development builds, you are getting a more significant risk. Since a
    32-bit CRC is quick and easy, it's worth using.
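The "5% after 82 builds" figure is the birthday problem. A small C sketch to check it, assuming each build's hash behaves like an independent, uniformly random value (real builds are not random, so treat the result as an estimate):

```c
/* Probability that at least two of n builds get the same value from a
 * hash of the given width (bits < 64), assuming every build's hash is
 * independent and uniformly distributed: the birthday problem. */
double collision_probability(unsigned n, unsigned bits)
{
    double space = (double)(1ULL << bits);   /* number of possible hash values */
    double p_all_distinct = 1.0;
    for (unsigned k = 1; k < n; k++) {
        /* build k+1 must avoid the k values already taken */
        p_all_distinct *= 1.0 - (double)k / space;
    }
    return 1.0 - p_all_distinct;
}
```

With n = 82 and 16 bits this gives a little under 5% (about 4.9%); with 32 bits the same 82 builds give well under one in a million.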
  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Apr 22 09:54:00 2023

    On Saturday, April 22, 2023 at 10:07:37 AM UTC-4, Brian Cockburn wrote:
    Rick,
    Rick, so you want the executable to, as part of its execution, print on the console the 'checksum' of itself? Or do you want to be able to inspect the executable with some other tool to calculate its 'checksum'? For the latter there are lots of tools to do that (your OS or PROM programmer for instance), for the former you need to embed the calculation code into the executable (along with the length over which to calculate) and run this when asked. Neither of these involve embedding the 'checksum' value.
    And just to be sure I understand what you wrote in a somewhat convoluted way. When you have two binary executables that report the same version number you want to be able to distinguish them with a 'checksum', right?

    Yes, I want the checksum to be readable while operating. Calculation code??? Not going to happen. That's why I want to embed the checksum.
    Can you expand on what you mean or expect by 'readable while operating' please? Are you planning to use some sort of tool to inspect the executing binary to 'read' this thing, or provoke output to the console in some way like:

    $ run my-binary-thing --checksum
    10FD
    $

    This would be as distinct from:

    $ run my-binary-thing --version
    -52
    $
    More like $ run my-binary thing
    Hello, master. Would you like to achieve world domination today?
    No, thank you, can you display the contents of registers 26 and 27 in hex please?
    That would be X0FE38
    Thank you.
    Yes, two compiled files which ended up with the same version number by error. We are using an 8 bit version number, so two hex digits. Negative numbers are lab versions, positive numbers are releases, so 64 of each.
    Signed 8-bit numbers range from -128 to +127 (0x80 to 0x7F) so probably a few more than 64.
    See? This is why I need the checksum. I make mistakes.
    ... sometimes, in the lab, the rev number is not bumped when it should be.

    This may be an indicator that better procedures are needed for code review-for-release. And that an independent pair of eyes should be doing the review against an agreed checklist.
    Or that I need a checksum. This is a lab compile, not a release. Let's try to stay on task.
    So far, it looks like a simple checksum is the way to go. Include the checksum and the 2's complement of the checksum (in locations that were zeros), and the checksum will not change.
    How will the checksum 'not change'? It will be different for every build won't it?
    It won't be changed by including the checksum and the complement because they add up to zero.
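A sketch of the checksum-plus-complement trick in C. It assumes the checksum is the sum of the image taken as 16-bit words, modulo 2^16, and that two word locations are reserved and zero when the sum is first taken; the function names and slot indices are hypothetical. If the sum is taken over bytes instead, storing S and -S as 16-bit values does not in general cancel, so the width of the summed units and of the stored values has to match.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of 16-bit words, modulo 2^16 (the cast back to uint16_t wraps). */
uint16_t sum16(const uint16_t *words, size_t count)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum = (uint16_t)(sum + words[i]);
    }
    return sum;
}

/* Fill two reserved words (zero beforehand) with the checksum S and its
 * two's complement -S. Since S + (-S) == 0 mod 2^16, the sum of the whole
 * image is still S afterwards, so the embedded value is self-consistent. */
void embed_checksum(uint16_t *words, size_t count,
                    size_t slot_sum, size_t slot_neg)
{
    uint16_t s = sum16(words, count);    /* reserved slots are still zero */
    words[slot_sum] = s;
    words[slot_neg] = (uint16_t)(0u - s);
}
```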
    --
    Rick C.
    -+- Get 1,000 miles of free Supercharging
    -+- Tesla referral code - https://ts.la/richard11209
  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sat Apr 22 09:56:03 2023

    On Saturday, April 22, 2023 at 11:13:32 AM UTC-4, David Brown wrote:
    On 22/04/2023 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version numbers are
    set manually and not always done correctly. I'm looking for
    something as a backup so that if the checksums are different, I
    can be sure the versions are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of
    the image. Any change in the image will show up as a change in the
    crc.

    No one is trying to detect changes in the image. I'm trying to label
    the image in a way that can be read in operation. I'm using the
    checksum simply because that is easy to generate. I've had problems
    with version numbering in the past. It will be used, but I want it supplemented with a number that will change every time the design
    changes, at least with a high probability, such as 1 in 64k.

    Again - use a CRC. It will give you what you want.
    Again - as will a simple addition checksum.
    You might want to go for 32-bit CRC rather than a 16-bit CRC, depending
    on the kind of program, how often you build it, and what consequences a
    hash collision could have. With a 16-bit CRC, you have a 5% chance of a collision after 82 builds. If collisions only matter for releases, and
    you only release a couple of updates, fine - but if they matter during development builds, you are getting a more significant risk. Since a
    32-bit CRC is quick and easy, it's worth using.
    Or, I might want to go with a simple checksum.
    Thanks for your comments.
    --
    Rick C.
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sat Apr 22 19:54:54 2023

    On 22/04/2023 18:56, Rick C wrote:
    On Saturday, April 22, 2023 at 11:13:32 AM UTC-4, David Brown wrote:
    On 22/04/2023 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version numbers are
    set manually and not always done correctly. I'm looking for
    something as a backup so that if the checksums are different, I
    can be sure the versions are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of
    the image. Any change in the image will show up as a change in the
    crc.

    No one is trying to detect changes in the image. I'm trying to label
    the image in a way that can be read in operation. I'm using the
    checksum simply because that is easy to generate. I've had problems
    with version numbering in the past. It will be used, but I want it
    supplemented with a number that will change every time the design
    changes, at least with a high probability, such as 1 in 64k.

    Again - use a CRC. It will give you what you want.

    Again - as will a simple addition checksum.

    A simple addition checksum might be okay much of the time, but it
    doesn't have the resolving power of a CRC. If the source code changes
    "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum is likely to
    be exactly the same despite the change in the source. In general, you
    will have much higher chance of collisions, though I think it would be
    very hard to quantify that.

    Maybe it will be good enough for you. Simple checksums were popular
    once, and can still make sense if you are very short on program space.
    But there are good reasons why they fell out of favour in many uses.
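A small C illustration of the swapped-constants point, using a hypothetical four-byte code fragment in which two immediate values (1 and 2, after made-up opcode bytes 0x10 and 0x20) trade places. The byte-wise additive checksum cannot tell the two images apart; a CRC, assumed here to be the common zlib-style CRC-32, can.

```c
#include <stddef.h>
#include <stdint.h>

/* Additive checksum: sum of bytes modulo 2^16. */
uint16_t add_checksum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320, init/final XOR 0xFFFFFFFF). */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Because the additive sum ignores byte order, any permutation of the same bytes collides; the CRC depends on position as well as value.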



    You might want to go for 32-bit CRC rather than a 16-bit CRC, depending
    on the kind of program, how often you build it, and what consequences a
    hash collision could have. With a 16-bit CRC, you have a 5% chance of a
    collision after 82 builds. If collisions only matter for releases, and
    you only release a couple of updates, fine - but if they matter during
    development builds, you are getting a more significant risk. Since a
    32-bit CRC is quick and easy, it's worth using.

    Or, I might want to go with a simple checksum.

    Thanks for your comments.



    It's your choice (obviously). I only point out the weaknesses in case
    anyone else is listening in to the thread.

    If you like, I can post code for a 32-bit CRC. It's a table, and a few
    lines of C code.
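For readers following along: a sketch of the usual table-driven CRC-32 in C. This is not the code David offered to post; it assumes the common zlib-compatible CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). On a small target the 1 KB table would normally be generated once offline and stored as a `const` array in flash; building it on first use, as here, just keeps the example short (and is not thread-safe).

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the 256-entry lookup table: entry n is the CRC register after
 * shifting the byte value n through eight steps of the bitwise algorithm. */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++) {
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        }
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Table-driven CRC-32: one table lookup per byte instead of eight shifts. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    if (!crc_table_ready) {
        crc32_init_table();
    }
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc = crc_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

It gives the same results as the bit-at-a-time form, e.g. 0xCBF43926 for "123456789".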




  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Sat Apr 22 20:05:11 2023

    On 2023-04-22, David Brown <david.brown@hesbynett.no> wrote:

    A simple addition checksum might be okay much of the time, but it
    doesn't have the resolving power of a CRC. If the source code changes
    "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum is likely to
    be exactly the same despite the change in the source. In general, you
    will have much higher chance of collisions, though I think it would be
    very hard to quantify that.

    I remember a long discussion about this a few decades ago. An N bit
    additive checksum maps the source data into the same hash space
    as a N-bit crc.

    Therefore, for two randomly chosen sets of input bits, they both have
    a 1 in 2^N chance of a collision. I think that means that for random
    changes to an input set of unspecified properties, they would both
    have the same chance that the hash is unchanged.

    However... IIRC, somebody (probably at somewhere like Bell labs)
    noticed that errors in data transmitted over media like phone lines
    and microwave links are _not_ random. Errors tend to be "bursty" and
    can be statistically characterized. And it was shown that for the
    common error modes for _those_ media, CRCs were better at detecting
    real-world failures than additive checksum. And (this is also
    important) a CRC is far, far simpler to implement in hardware than an
    additive checksum. For the same reasons, CRCs tend to get used for
    things like Ethernet frames, disc sectors, etc.
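Grant's hardware point is easiest to see in a bit-serial model: a CRC is a shift register with a few XOR gates on the feedback path and no carry chain. A C sketch of that structure, assuming the 16-bit CCITT polynomial 0x1021 (x^16 + x^12 + x^5 + 1) with initial value 0xFFFF, the variant often listed as CRC-16/CCITT-FALSE; the thread does not name a particular polynomial.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-serial CRC-16 modelled on a hardware LFSR: one clock per input bit,
 * MSB first. Polynomial 0x1021, initial value 0xFFFF, no final XOR. */
uint16_t crc16_ccitt_serial(const uint8_t *data, size_t len)
{
    uint16_t reg = 0xFFFF;                          /* the 16 flip-flops */
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            unsigned in = (data[i] >> bit) & 1u;    /* next serial input bit */
            unsigned fb = ((reg >> 15) & 1u) ^ in;  /* feedback: top flop XOR input */
            reg = (uint16_t)(reg << 1);             /* clock the shift register */
            if (fb) {
                reg ^= 0x1021;                      /* taps at x^12, x^5 and x^0 */
            }
        }
    }
    return reg;
}
```

Under those assumptions the check value for "123456789" is 0x29B1. In hardware the loop body amounts to sixteen flip-flops and three two-input XOR gates, clocked once per bit.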

    Later people seem to have adopted CRCs for detecting failures in other
    very dissimilar media (e.g. EPROMs) where implementing a CRC is _more_
    work than an additive checksum. If the failure modes for EPROM are
    similar to those studied at <wherever> when CRCs were chosen, then
    CRCs are probably also a good choice for EPROMs despite the additional overhead. If the failure modes for EPROMs are significantly different,
    then CRCs might be both sub-optimal and unnecessarily expensive.

    I have no hard data either way, but it was never obvious to me that
    the arguments people use in favor of CRCs (better at detecting burst
    errors on transmission media) necessarily applied to EPROMs.

    That said, I do use CRCs rather than additive checksums for things
    like EPROM and flash.

    --
    Grant



  • From boB@boB@K7IQ.com to comp.arch.embedded on Sat Apr 22 13:41:00 2023

    On Sat, 22 Apr 2023 19:54:54 +0200, David Brown
    <david.brown@hesbynett.no> wrote:

    On 22/04/2023 18:56, Rick C wrote:
    On Saturday, April 22, 2023 at 11:13:32 AM UTC-4, David Brown wrote:
    On 22/04/2023 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version numbers are
    set manually and not always done correctly. I'm looking for
    something as a backup so that if the checksums are different, I
    can be sure the versions are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of
    the image. Any change in the image will show up as a change in the
    crc.

    No one is trying to detect changes in the image. I'm trying to label
    the image in a way that can be read in operation. I'm using the
    checksum simply because that is easy to generate. I've had problems
    with version numbering in the past. It will be used, but I want it
    supplemented with a number that will change every time the design
    changes, at least with a high probability, such as 1 in 64k.

    Again - use a CRC. It will give you what you want.

    Again - as will a simple addition checksum.

    A simple addition checksum might be okay much of the time, but it
    doesn't have the resolving power of a CRC. If the source code changes
    "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum is likely to
    be exactly the same despite the change in the source. In general, you
    will have much higher chance of collisions, though I think it would be
    very hard to quantify that.

    Maybe it will be good enough for you. Simple checksums were popular
    once, and can still make sense if you are very short on program space.
    But there are good reasons why they fell out of favour in many uses.



    You might want to go for 32-bit CRC rather than a 16-bit CRC, depending
    on the kind of program, how often you build it, and what consequences a
    hash collision could have. With a 16-bit CRC, you have a 5% chance of a
    collision after 82 builds. If collisions only matter for releases, and
    you only release a couple of updates, fine - but if they matter during
    development builds, you are getting a more significant risk. Since a
    32-bit CRC is quick and easy, it's worth using.

    Totally agree ! I stopped using simple checksums years ago.
    Many processors these days also have a CRC peripheral that makes it
    easy to use. And I can simply chop that off to 16 bits if I don't
    want to transmit all 32 bits. OR even 24 bits.
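A trivial C sketch of the truncation described above, keeping the low-order bits of the 32-bit result (keeping the high-order bits would work as well; which bits to keep is a free choice, not something the thread specifies). A truncated value can only distinguish as many images as its width allows, 1 in 65536 for 16 bits.

```c
#include <stdint.h>

/* Keep only the low `bits` bits of a 32-bit CRC (1 <= bits <= 32). */
uint32_t crc_truncate(uint32_t crc, unsigned bits)
{
    if (bits >= 32) {
        return crc;                 /* a shift by 32 would be undefined */
    }
    return crc & ((UINT32_C(1) << bits) - 1u);
}
```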

    boB






    Or, I might want to go with a simple checksum.

    Thanks for your comments.



    It's your choice (obviously). I only point out the weaknesses in case anyone else is listening in to the thread.

    If you like, I can post code for a 32-bit CRC. It's a table, and a few lines of C code.



  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Apr 23 17:37:41 2023

    On 22/04/2023 22:05, Grant Edwards wrote:
    On 2023-04-22, David Brown <david.brown@hesbynett.no> wrote:

    A simple addition checksum might be okay much of the time, but it
    doesn't have the resolving power of a CRC. If the source code changes
    "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum is likely to
    be exactly the same despite the change in the source. In general, you
    will have much higher chance of collisions, though I think it would be
    very hard to quantify that.

    I remember a long discussion about this a few decades ago. An N bit
    additive checksum maps the source data into the same hash space
    as a N-bit crc.

    Therefore, for two randomly chosen sets of input bits, they both have
    a 1 in 2^N chance of a collision. I think that means that for random
    changes to an input set of unspecified properties, they would both
    have the same chance that the hash is unchanged.

    However... IIRC, somebody (probably at somewhere like Bell labs)
    noticed that errors in data transmitted over media like phone lines
    and microwave links are _not_ random. Errors tend to be "bursty" and
    can be statistically characterized. And it was shown that for the
    common error modes for _those_ media, CRCs were better at detecting real-world failures than additive checksum. And (this is also
    important) a CRC is far, far simpler to implement in hardware than an additive checksum. For the same reasons, CRCs tend to get used for
    things like Ethernet frames, disc sectors, etc.

    Later people seem to have adopted CRCs for detecting failures in other
    very dissimilar media (e.g. EPROMs) where implementing a CRC is _more_
    work than an additive checksum. If the failure modes for EPROM are
    similar to those studied at <wherever> when CRCs were chosen, then
    CRCs are probably also a good choice for EPROMs despite the additional overhead. If the failure modes for EPROMs are significantly different,
    then CRCs might be both sub-optimal and unnecessarily expensive.

    I have no hard data either way, but it was never obvious to me that
    the arguments people use in favor of CRCs (better at detecting burst
    errors on transmission media) necessarily applied to EPROMs.

    That said, I do use CRCs rather than additive checksums for things
    like EPROM and flash.


    That's a lot of good points. You are absolutely correct that CRC's are
    better for the types of errors that are often seen in transmission
    systems. The person at Bell Labs that you are thinking about is
    probably Claude Shannon, famous for his quantitative definition of
    information and work on the information capacity of communication
    channels with noise.

    Another thing you can look at is the distribution of checksum outputs,
    for random inputs. For an additive checksum, you can consider your
    input as N independent 0-255 random values, added together. The result
    will be approximately a normal distribution of the checksum. If you have, say, a 100
    byte data block and a 16-bit checksum, it's clear that you will never
    get a checksum value greater than 25500, and that you are much more
    likely to get a value close to 12750. This kind of clustering means
    that the 16-bit checksum contains a lot less than 16 bits of
    information. Real data - program images, data telegrams, etc., - are
    not fully random and the result is even more clustering and less
    information in the checksum.
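The clustering argument can be checked numerically. The C sketch below computes the exact distribution of the (unwrapped) sum of n independent, uniformly random bytes by repeated convolution; uniform random bytes are an assumption, and, as noted above, real images are less random and cluster more. For n = 100 the mean is 12750 and the standard deviation is about 739, so by the normal approximation a little over 80% of all outcomes land in the roughly 2000 values within 1000 of the mean, out of 65536 possible 16-bit values.

```c
#include <stddef.h>
#include <stdlib.h>

/* Exact probability distribution of the sum of n independent bytes, each
 * uniform over 0..255, with no wrap-around. Returns a malloc'd array of
 * 255*n + 1 probabilities indexed by the sum (caller frees it), or NULL
 * if allocation fails. */
double *byte_sum_distribution(unsigned n)
{
    size_t size = 255u * (size_t)n + 1u;
    double *cur = calloc(size, sizeof *cur);
    double *next = calloc(size, sizeof *next);
    if (cur == NULL || next == NULL) {
        free(cur);
        free(next);
        return NULL;
    }
    cur[0] = 1.0;                          /* the sum of zero bytes is 0 */
    for (unsigned k = 1; k <= n; k++) {
        size_t top = 255u * (size_t)k;     /* largest possible sum after k bytes */
        double window = 0.0;               /* cur[s-255] + ... + cur[s] */
        for (size_t s = 0; s <= top; s++) {
            window += cur[s];
            if (s >= 256) {
                window -= cur[s - 256];
            }
            next[s] = window / 256.0;      /* each byte value has p = 1/256 */
        }
        double *tmp = cur;                 /* swap buffers for the next byte */
        cur = next;
        next = tmp;
    }
    free(next);
    return cur;
}
```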

    Taking the additive checksum over a larger range, then "folding" the distribution back by wrapping the checksum to 8-bit or 16-bit will
    greatly reduce the clustering. That will help a lot if you have a
    program image and use a 16-bit additive checksum, but if you need more
    than "1 in 65536" integrity, it's hard to get.

    A particular weakness of purely additive checksums is that they only
    consider the values of the bytes, not their order - re-arranging the
    order of the same data gives the same additive checksum.

    CRC's are not as good as more advanced hashes like SHA or MD5. But
    their distributions are vastly better than additive checksums, and they provide integrity checks for a wider variety of possible errors.


    Of course, for some uses, an additive checksum might be considered good enough. There's no need to be more complicated than you need to be.
    But since CRC's are usually very simple and efficient to calculate, they
    give an option that is a lot better than an additive checksum for little
    extra cost, while going beyond them to MD5 or SHA involves significantly
    more effort. (SHA is your first choice if you are protecting against malicious changes.)
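    As an illustration of how little code a CRC needs, here is a sketch of the common table-driven CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF, the variant used by Ethernet, zip and PNG). The function names are illustrative. The table is 256 32-bit entries, i.e. 1 KB, and the standard check value for the ASCII string "123456789" is 0xCBF43926.

```c
#include <stddef.h>
#include <stdint.h>

/* 256-entry lookup table: 1 KB of 32-bit values. */
static uint32_t crc32_table[256];

/* Build the table once, one byte value at a time. */
static void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc32_table[n] = c;
    }
}

/* Table-driven CRC-32: one lookup and shift per input byte.
   crc32_init() must have been called first. */
static uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        c = crc32_table[(c ^ data[i]) & 0xFFu] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}
```

    Unlike the additive sum, re-ordering the bytes changes the result: { 1, 2, 3, 4 } and { 4, 3, 2, 1 } give different CRCs. For equal-length messages of up to 32 bits, a CRC-32 can never collide.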






    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sun Apr 23 10:34:00 2023
    From Newsgroup: comp.arch.embedded

    On Saturday, April 22, 2023 at 1:55:01 PM UTC-4, David Brown wrote:
    On 22/04/2023 18:56, Rick C wrote:
    On Saturday, April 22, 2023 at 11:13:32 AM UTC-4, David Brown wrote:
    On 22/04/2023 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version numbers are
    set manually and not always done correctly. I'm looking for
    something as a backup so that if the checksums are different, I
    can be sure the versions are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of
    the image. Any change in the image will show up as a change in the
    crc.

    No one is trying to detect changes in the image. I'm trying to label
    the image in a way that can be read in operation. I'm using the
    checksum simply because that is easy to generate. I've had problems
    with version numbering in the past. It will be used, but I want it
    supplemented with a number that will change every time the design
    changes, at least with a high probability, such as 1 in 64k.

    Again - use a CRC. It will give you what you want.

    Again - as will a simple addition checksum.
    A simple addition checksum might be okay much of the time, but it
    doesn't have the resolving power of a CRC. If the source code changes
    "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum is likely to
    be exactly the same despite the change in the source. In general, you
    will have much higher chance of collisions, though I think it would be
    very hard to quantify that.

    Maybe it will be good enough for you. Simple checksums were popular
    once, and can still make sense if you are very short on program space.
    But there are good reasons why they fell out of favour in many uses.


    You might want to go for 32-bit CRC rather than a 16-bit CRC, depending
    on the kind of program, how often you build it, and what consequences a
    hash collision could have. With a 16-bit CRC, you have a 5% chance of a
    collision after 82 builds. If collisions only matter for releases, and
    you only release a couple of updates, fine - but if they matter during
    development builds, you are getting a more significant risk. Since a
    32-bit CRC is quick and easy, it's worth using.
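    The 5% figure is the birthday problem. A minimal C sketch, assuming each build's hash behaves like an independent, uniformly distributed value (a CRC over changing images only approximates this). It computes the exact product P(no collision) = (1 - 1/2^n)(1 - 2/2^n)...(1 - (k-1)/2^n) for k builds and an n-bit hash. The function name is illustrative.

```c
/* Probability that at least two of `builds` images share the same
   `bits`-bit hash, using the exact birthday-problem product. */
static double collision_probability(unsigned builds, unsigned bits)
{
    double space = 1.0;
    for (unsigned i = 0; i < bits; i++)
        space *= 2.0;                        /* space = 2^bits */

    double p_unique = 1.0;                   /* P(all hashes distinct) */
    for (unsigned i = 1; i < builds; i++)
        p_unique *= 1.0 - (double)i / space;
    return 1.0 - p_unique;
}
```

    collision_probability(82, 16) comes out at about 0.049, in line with the 5% above. The same 82 builds with a 32-bit hash give a probability below one in a million.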

    Or, I might want to go with a simple checksum.

    Thanks for your comments.

    It's your choice (obviously). I only point out the weaknesses in case
    anyone else is listening in to the thread.

    If you like, I can post code for a 32-bit CRC. It's a table, and a few
    lines of C code.
    You know nothing of the project I am working on or those that I typically work on. But thanks for the advice.
    --
    Rick C.
    +-- Get 1,000 miles of free Supercharging
    +-- Tesla referral code - https://ts.la/richard11209
  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Sun Apr 23 17:37:13 2023

    On 2023-04-23, David Brown <david.brown@hesbynett.no> wrote:

    Another thing you can look at is the distribution of checksum outputs,
    for random inputs. For an additive checksum, you can consider your
    input as N independent 0-255 random values, added together. The result
    will be a normal distribution of the checksum. If you have, say, a 100
    byte data block and a 16-bit checksum, it's clear that you will never
    get a checksum value greater than 25500, and that you are much more
    likely to get a value close to 12750.

    It never occurred to me that for an N-bit checksum, you would sum
    something other than N-bit "words" of the input data.

    --
    Grant

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Apr 23 23:45:24 2023

    On 23/04/2023 19:37, Grant Edwards wrote:
    On 2023-04-23, David Brown <david.brown@hesbynett.no> wrote:

    Another thing you can look at is the distribution of checksum outputs,
    for random inputs. For an additive checksum, you can consider your
    input as N independent 0-255 random values, added together. The result
    will be a normal distribution of the checksum. If you have, say, a 100
    byte data block and a 16-bit checksum, it's clear that you will never
    get a checksum value greater than 25500, and that you are much more
    likely to get a value close to 12750.

    It never occurred to me that for an N-bit checksum, you would sum
    something other than N-bit "words" of the input data.


    Usually - in my experience - you sum bytes, using an unsigned integer
    8-bit or 16-bit wide. Simple additive checksums are often used on small
    8-bit microcontrollers where CRCs are seen (rightly or wrongly) as too demanding. Perhaps other people have different experiences.

    You could certainly sum 16-bit words to get your 16-bit additive
    checksum, and that would give a different kind of clustering - maybe
    better, maybe not.


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Apr 23 23:58:45 2023

    On 23/04/2023 19:34, Rick C wrote:
    On Saturday, April 22, 2023 at 1:55:01 PM UTC-4, David Brown wrote:
    On 22/04/2023 18:56, Rick C wrote:
    On Saturday, April 22, 2023 at 11:13:32 AM UTC-4, David Brown
    wrote:
    On 22/04/2023 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown
    wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version
    numbers are set manually and not always done correctly.
    I'm looking for something as a backup so that if the
    checksums are different, I can be sure the versions are
    not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a
    hash of the image. Any change in the image will show up as
    a change in the crc.

    No one is trying to detect changes in the image. I'm trying
    to label the image in a way that can be read in operation.
    I'm using the checksum simply because that is easy to
    generate. I've had problems with version numbering in the
    past. It will be used, but I want it supplemented with a
    number that will change every time the design changes, at
    least with a high probability, such as 1 in 64k.

    Again - use a CRC. It will give you what you want.

    Again - as will a simple addition checksum.
    A simple addition checksum might be okay much of the time, but it
    doesn't have the resolving power of a CRC. If the source code
    changes "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum
    is likely to be exactly the same despite the change in the source.
    In general, you will have much higher chance of collisions, though
    I think it would be very hard to quantify that.

    Maybe it will be good enough for you. Simple checksums were
    popular once, and can still make sense if you are very short on
    program space. But there are good reasons why they fell out of
    favour in many uses.


    You might want to go for 32-bit CRC rather than a 16-bit CRC,
    depending on the kind of program, how often you build it, and
    what consequences a hash collision could have. With a 16-bit
    CRC, you have a 5% chance of a collision after 82 builds. If
    collisions only matter for releases, and you only release a
    couple of updates, fine - but if they matter during development
    builds, you are getting a more significant risk. Since a 32-bit
    CRC is quick and easy, it's worth using.

    Or, I might want to go with a simple checksum.

    Thanks for your comments.

    It's your choice (obviously). I only point out the weaknesses in
    case anyone else is listening in to the thread.

    If you like, I can post code for a 32-bit CRC. It's a table, and a
    few lines of C code.

    You know nothing of the project I am working on or those that I
    typically work on. But thanks for the advice.


    You haven't given much to go on. It is still not really clear (to me,
    at least) if you are asking about checksums or how to manipulate binary
    images as part of a build process, or what you are really asking.

    When someone wants a checksum on an image file, the appropriate choice
    in most cases is a CRC. If security is an issue, then a secure hash is needed. For a very limited system, additive checksums might be the
    only realistic choice.

    But more often, the reason people pick additive checksums rather than
    CRCs is because they don't realise that CRCs are actually very simple
    and efficient to implement. People unfamiliar with them might have read
    a little, and think they need to do calculations for each bit (which is possible but /slow/), or that they would have to understand the theory
    of binary polynomial division rings (they don't). They think CRCs are complicated and advanced, and shy away from them.

    There are a number of people who read this group - maybe some of them
    have learned a little from this thread.



  • From Richard Damon@Richard@Damon-Family.org to comp.arch.embedded on Sun Apr 23 18:16:00 2023

    On 4/23/23 5:45 PM, David Brown wrote:
    On 23/04/2023 19:37, Grant Edwards wrote:
    On 2023-04-23, David Brown <david.brown@hesbynett.no> wrote:

    Another thing you can look at is the distribution of checksum outputs,
    for random inputs.  For an additive checksum, you can consider your
    input as N independent 0-255 random values, added together.  The result
    will be a normal distribution of the checksum.  If you have, say, a 100
    byte data block and a 16-bit checksum, it's clear that you will never
    get a checksum value greater than 25500, and that you are much more
    likely to get a value close to 12750.

    It never occurred to me that for an N-bit checksum, you would sum
    something other than N-bit "words" of the input data.


    Usually - in my experience - you sum bytes, using an unsigned integer
    8-bit or 16-bit wide.  Simple additive checksums are often used on small 8-bit microcontrollers where CRCs are seen (rightly or wrongly) as too demanding.  Perhaps other people have different experiences.

    You could certainly sum 16-bit words to get your 16-bit additive
    checksum, and that would give a different kind of clustering - maybe
    better, maybe not.



    I have seen 16-bit checksums done both ways. Summing 16 bit units does eliminate the issue of clustering, and makes adjacent byte swaps
    detectable.
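    A small C sketch comparing the two ways of building a 16-bit additive checksum. Bytes are paired big-endian into words, and an odd trailing byte becomes the high byte of a final word. Both are arbitrary conventions chosen here, and the function names are illustrative. Swapping two adjacent, different bytes a and b changes the word sum by 255*(a-b), whether or not they sit in the same word. That is never a multiple of 65536, so the word sum always catches such a swap. The byte sum never does.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of individual bytes, modulo 2^16. */
static uint16_t sum_bytes16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint16_t)(sum + data[i]);
    return sum;
}

/* Sum of big-endian 16-bit words, modulo 2^16. An odd trailing byte is
   treated as the high byte of a final word. */
static uint16_t sum_words16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    size_t i = 0;
    for (; i + 1 < len; i += 2)
        sum = (uint16_t)(sum + (uint16_t)((data[i] << 8) | data[i + 1]));
    if (i < len)
        sum = (uint16_t)(sum + (uint16_t)(data[i] << 8));
    return sum;
}
```

    With { 0x12, 0x34, 0x56, 0x78 } and the same data with the middle two bytes swapped, the byte sums match (276), but the word sums are 0x68AC and 0x46CE.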
  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Sun Apr 23 15:24:40 2023

    On Sunday, April 23, 2023 at 5:58:51 PM UTC-4, David Brown wrote:
    On 23/04/2023 19:34, Rick C wrote:
    On Saturday, April 22, 2023 at 1:55:01 PM UTC-4, David Brown wrote:
    On 22/04/2023 18:56, Rick C wrote:
    On Saturday, April 22, 2023 at 11:13:32 AM UTC-4, David Brown
    wrote:
    On 22/04/2023 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown
    wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique,
    regardless of what the version number says. Version
    numbers are set manually and not always done correctly.
    I'm looking for something as a backup so that if the
    checksums are different, I can be sure the versions are
    not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a
    hash of the image. Any change in the image will show up as
    a change in the crc.

    No one is trying to detect changes in the image. I'm trying
    to label the image in a way that can be read in operation.
    I'm using the checksum simply because that is easy to
    generate. I've had problems with version numbering in the
    past. It will be used, but I want it supplemented with a
    number that will change every time the design changes, at
    least with a high probability, such as 1 in 64k.

    Again - use a CRC. It will give you what you want.

    Again - as will a simple addition checksum.
    A simple addition checksum might be okay much of the time, but it
    doesn't have the resolving power of a CRC. If the source code
    changes "a = 1; b = 2;" to "a = 2; b = 1;", the addition checksum
    is likely to be exactly the same despite the change in the source.
    In general, you will have much higher chance of collisions, though
    I think it would be very hard to quantify that.

    Maybe it will be good enough for you. Simple checksums were
    popular once, and can still make sense if you are very short on
    program space. But there are good reasons why they fell out of
    favour in many uses.


    You might want to go for 32-bit CRC rather than a 16-bit CRC,
    depending on the kind of program, how often you build it, and
    what consequences a hash collision could have. With a 16-bit
    CRC, you have a 5% chance of a collision after 82 builds. If
    collisions only matter for releases, and you only release a
    couple of updates, fine - but if they matter during development
    builds, you are getting a more significant risk. Since a 32-bit
    CRC is quick and easy, it's worth using.

    Or, I might want to go with a simple checksum.

    Thanks for your comments.

    It's your choice (obviously). I only point out the weaknesses in
    case anyone else is listening in to the thread.

    If you like, I can post code for a 32-bit CRC. It's a table, and a
    few lines of C code.

    You know nothing of the project I am working on or those that I
    typically work on. But thanks for the advice.

    You haven't given much to go on. It is still not really clear (to me,
    at least) if you are asking about checksums or how to manipulate binary images as part of a build process, or what you are really asking.
    If you don't understand, you are making this far more complicated than it is. I don't know what to tell you. There are no other details that are relevant. Don't read into this what is not there.
    When someone wants a checksum on an image file, the appropriate choice
    in most cases is a CRC.
    Why? What makes a CRC an "appropriate" choice? Normally, when I design something, I establish the requirements. What requirements are you assuming, that would make the CRC more desirable than a simple checksum?
    If security is an issue, then a secure hash is
    needed. For a very limited system, additive checksums might be the
    only realistic choice.
    What have I said that makes you think security is an issue??? I don't recall ever mentioning anything about security. Do you recall what I did say?
    But more often, the reason people pick additive checksums rather than
    CRCs is because they don't realise that CRCs are actually very simple
    and efficient to implement.
    The fact that they are "simple and efficient" is not a reason to use them. I repeat, what are the requirements?
    People unfamiliar with them might have read
    a little, and think they need to do calculations for each bit (which is possible but /slow/), or that they would have to understand the theory
    of binary polynomial division rings (they don't). They think CRCs are complicated and advanced, and shy away from them.

    There are a number of people who read this group - maybe some of them
    have learned a little from this thread.
    I suppose there is that possibility. But when people make claims about something being good or "better", without substantiation, there's not much to learn.
    If you think a discussion of CRC calculations would be useful, why don't you open a thread and discuss them, instead of insisting they are the right solution to my problem, when you don't even know what the problem requirements are? It's all here in the thread. You only need to read, without projecting your opinions on the problem statement.
    --
    Rick C.
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Mon Apr 24 09:13:13 2023

    On 24/04/2023 00:16, Richard Damon wrote:
    On 4/23/23 5:45 PM, David Brown wrote:
    On 23/04/2023 19:37, Grant Edwards wrote:
    On 2023-04-23, David Brown <david.brown@hesbynett.no> wrote:

    Another thing you can look at is the distribution of checksum outputs,
    for random inputs.  For an additive checksum, you can consider your
    input as N independent 0-255 random values, added together.  The result
    will be a normal distribution of the checksum.  If you have, say, a 100
    byte data block and a 16-bit checksum, it's clear that you will never
    get a checksum value greater than 25500, and that you are much more
    likely to get a value close to 12750.

    It never occurred to me that for an N-bit checksum, you would sum
    something other than N-bit "words" of the input data.


    Usually - in my experience - you sum bytes, using an unsigned integer
    8-bit or 16-bit wide.  Simple additive checksums are often used on
    small 8-bit microcontrollers where CRCs are seen (rightly or wrongly)
    as too demanding.  Perhaps other people have different experiences.

    You could certainly sum 16-bit words to get your 16-bit additive
    checksum, and that would give a different kind of clustering - maybe
    better, maybe not.



    I have seen 16-bit checksums done both ways. Summing 16 bit units does eliminate the issue of clustering, and makes adjacent byte swaps
    detectable.

    Long ago, there used to be a definite risk of mixing up endianness when dealing with program images burned to flash or eeprom. Popular "hex"
    formats like Intel Hex and Motorola SRecord could differ in endianness.
    So byte swaps in the entire image were a real possibility, and good to
    guard against. But it's hard to imagine how an individual byte swap
    could occur - I see bigger movements and re-arrangements being more
    likely, and using 16-bit units will not help much there. Still, I think
    there is little doubt that using 16-bit units is better than using 8-bit
    units in many ways (except for efficient implementation on small 8-bit devices).


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Mon Apr 24 09:17:27 2023

    On 24/04/2023 00:24, Rick C wrote:
    On Sunday, April 23, 2023 at 5:58:51 PM UTC-4, David Brown wrote:

    When someone wants a checksum on an image file, the appropriate
    choice in most cases is a CRC.

    Why? What makes a CRC an "appropriate" choice? Normally, when I
    design something, I establish the requirements. What requirements
    are you assuming, that would make the CRC more desirable than a
    simple checksum?


    I've already explained this in quite a lot of detail in this thread (as
    have others). If you don't like my explanation, or didn't read it,
    that's okay. You are under no obligation to learn about CRCs. Or if
    you prefer to look it up in other sources, that's obviously also an option.


    If security is an issue, then a secure hash is needed. For a very
    limited system, additive checksums might be the only realistic
    choice.

    What have I said that makes you think security is an issue??? I
    don't recall ever mentioning anything about security. Do you recall
    what I did say?


    If you think a discussion of CRC calculations would be useful, why
    don't you open a thread and discuss them, instead of insisting they
    are the right solution to my problem, when you don't even know what
    the problem requirements are? It's all here in the thread. You only
    need to read, without projecting your opinions on the problem
    statement.



    I've asked you this before - are you /sure/ you understand how Usenet works?
  • From Don Y@blockedofcourse@foo.invalid to comp.arch.embedded on Mon Apr 24 00:32:42 2023

    On 4/22/2023 7:57 AM, David Brown wrote:
    However, in almost every case where CRC's might be useful, you have
    additional checks of the sanity of the data, and an all-zero or all-one data
    block would be rejected.  For example, Ethernet packets use CRC for
    integrity checking, but an attempt to send a packet type 0 from MAC address
    00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0, would be
    rejected anyway.

    Why look at "data" -- which may be suspect -- and *then* check its CRC?
    Run the CRC first.  If it fails, decide how you are going to proceed
    or recover.

    That is usually the order, yes.  Sometimes you want "fail fast", such as dropping a packet that was not addressed to you (it doesn't matter if it was received correctly but for someone else, or it was addressed to you but the receiver address was corrupted - you are dropping the packet either way).  But
    usually you will run the CRC then look at the data.

    But the order doesn't matter - either way, you are still checking for valid data, and if the data is invalid, it does not matter if the CRC only passed by
    luck or by all zeros.

    You're assuming the CRC is supposed to *vouch* for the data.
    The CRC can be there simply to vouch for the *transport* of a
    datagram.

    I can't think of any use-cases where you would be passing around a block of
    "pure" data that could reasonably take absolutely any value, without any
    type of "envelope" information, and where you would think a CRC check is
    appropriate.

    I append a *version specific* CRC to each packet of marshalled data
    in my RMIs.  If the data is corrupted in transit *or* if the
    wrong version API ends up targeted, the operation will abend
    because we know the data "isn't right".

    Using a version-specific CRC sounds silly.  Put the version information in the
    packet.

    The packet routed to a particular interface is *supposed* to
    conform to "version X" of an interface. There are different stubs
    generated for different versions of EACH interface. The OCL for
    the interface defines (and is used to check) the form of that
    interface to that service/mechanism.

    The parameters are checked on the client side -- why tie up the
    transport medium with data that is inappropriate (redundant)
    to THAT interface? Why tie up the server verifying that data?
    The stub generator can perform all of those checks automatically
    and CONSISTENTLY based on the OCL definition of that version
    of that interface (because developers make mistakes).

    So, at the instant you schedule the marshalled data for transmission,
    you *know* the parameters are "appropriate" and compliant with
    the constraints of THAT version of THAT interface.

    Now, you have to ensure the packet doesn't get corrupted (altered) in transmission. If it remains intact, then there is no need to check
    the parameters on the server side.

    NONE OF THE PARAMETERS... including the (implied) "interface version" field!

    Yet, folks make mistakes. So, you want some additional reassurance
    that this is at least intended for this version of the interface,
    ESPECIALLY IF THAT CAN BE MADE AVAILABLE FOR ZERO COST (i.e., check
    to see if the residual is 0xDEADBEEF instead of 0xB16B00B5).

    Why burden the packet with a "protocol version" parameter?

    So, use a version-specific CRC on the packet. If it fails, then
    either the data in the packet has been corrupted (which could just
    as easily have involved an embedded "interface version" parameter);
    or the packet was formed with the wrong CRC.

    If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then
    why bother looking at a "protocol version" parameter? Would
    you ALSO want to verify all the rest of the parameters?

    I *could* put a header saying "this is version 4.2".  And, that
    tells me nothing about the integrity of the rest of the data.
    OTOH, ensuring the CRC reflects "4.2" does -- if the recipient
    expects it to be so.
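    One possible reading of the version-specific CRC idea, as a hedged C sketch. It runs the same CRC-16 (polynomial 0x1021, MSB-first, an example choice only) but starts the register from a per-interface-version seed. The seed values and function names are hypothetical. A CRC is affine in its initial value, so two different seeds always give different results over the same data. An intact packet built for one version therefore never passes the check for another. A corrupted packet still slips through with roughly 1-in-65536 odds, as with any 16-bit check.

```c
#include <stddef.h>
#include <stdint.h>

/* MSB-first CRC-16, polynomial 0x1021, caller-chosen initial value. */
static uint16_t crc16_ccitt(uint16_t init, const uint8_t *data, size_t len)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical per-version seeds; any distinct values would do. */
enum { SEED_IFACE_V4_1 = 0x4101, SEED_IFACE_V4_2 = 0x4201 };

/* Receiver side: accept only if the CRC matches this version's seed. */
static int packet_ok(uint16_t version_seed, const uint8_t *payload,
                     size_t len, uint16_t received_crc)
{
    return crc16_ccitt(version_seed, payload, len) == received_crc;
}
```

    With a seed of 0xFFFF this is the common "CCITT-FALSE" CRC-16 (check value 0x29B1 for "123456789"), and with a seed of 0 it is the XMODEM variant (check value 0x31C3).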

    Now you don't know if the data is corrupted, or for the wrong version - or occasionally, corrupted /and/ the wrong version but passing the CRC anyway.

    You don't know if the parameters have been corrupted in a manner that
    allows a packet intended for the correct interface to appear as correct.
    What's your point?

    Unless you are absolutely desperate to save every bit you can, your system will
    be simpler, clearer, and more reliable if you separate your purposes.

    Yes. You verify the correct interface at the client side -- where
    it is invoked by the client and enforced in the OCL generated stub.
    Thereafter, the server is concerned with corruption during transport
    and the version specific CRC just gives another reassurance of
    correct version without adding another cost.

    [Imagine EVERY subroutine function call in your system having
    such overhead. Would you want to push an "interface version"
    onto the stack along with all of the arguments for that
    subr/ftn? Or, would you just hope everything was intact?]

    You can also "salt" the calculation so that the residual
    is deliberately nonzero.  So, for example, "success" is
    indicated by a residual of 0x474E.  :>

    Again, pointless.

    Salt is important for security-related hashes (like password hashes), not
    for integrity checks.

    You've missed the point.  The correct "sum" can be anything.
    Why is "0" more special than any other value?  As the value is
    typically meaningless to anything other than the code that verifies
    it, you couldn't look at an image (or the output of the verifier)
    and gain anything from seeing that obscure value.

    Do you actually know what is meant by "salt" in the context of hashes, and
    why it is useful in some circumstances?  Do you understand that "salt" is
    added (usually prepended, or occasionally mixed in in some other way) to the
    data /before/ the hash is calculated?

    What term would you have me use to indicate a "bias" applied to a CRC
    algorithm?

    Well, first I'd note that any kind of modification to the basic CRC algorithm
    is pointless from the viewpoint of its use as an integrity check.  (There have
    been, mostly historically, some justifications in terms of implementation efficiency.  For example, bit and byte re-ordering could be done to suit hardware bit-wise implementations.)

    Otherwise I'd say you are picking a specific initial value if that is what you
    are doing, or modifying the final value (inverting it or xor'ing it with a fixed value).  There are, AFAIK, no specific terms for these - and I don't see
    any benefit in having one.  Misusing the term "salt" from cryptography is certainly not helpful.
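    The difference between expecting zero and expecting some other fixed "residual" can be shown with a small C sketch. It appends a CRC-16 (polynomial 0x1021, MSB-first, an illustrative choice) big-endian to a message, then runs the same CRC over message plus appended CRC. With no final XOR the result is always zero. With a final XOR (0x474E here, an arbitrary example value) the result is a different nonzero constant. That constant is still the same for every message and every initial value. Either way the receiver compares against one fixed number. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* MSB-first CRC-16, polynomial 0x1021, configurable init and final XOR. */
static uint16_t crc16(const uint8_t *data, size_t len,
                      uint16_t init, uint16_t xorout)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return (uint16_t)(crc ^ xorout);
}

/* Append the CRC of buf[0..len) big-endian at buf[len], buf[len + 1]
   (buf must hold len + 2 bytes), then return the CRC recomputed over
   the payload plus the appended CRC: the "residual". */
static uint16_t residue_after_append(uint8_t *buf, size_t len,
                                     uint16_t init, uint16_t xorout)
{
    uint16_t c = crc16(buf, len, init, xorout);
    buf[len] = (uint8_t)(c >> 8);
    buf[len + 1] = (uint8_t)(c & 0xFFu);
    return crc16(buf, len + 2, init, xorout);
}
```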

    Salt just ensures that you can differentiate between functionally identical values. I.e., in a CRC, it differentiates the "0x0000" that CRC-1 generates from the "0x0000" that CRC-2 generates.

    You don't see the parallel to ensuring that *my* use of "Passw0rd" is
    encoded in a different manner than *your* use of "Passw0rd"?

    See the RMI description.

    I'm sorry, I have no idea what "RMI" is or where it is described. You've mentioned that abbreviation twice, but I can't figure it out.

    <https://en.wikipedia.org/wiki/RMI>
    <https://en.wikipedia.org/wiki/OCL>

    Nothing magical with either term.

    OTOH, "salting" the calculation so that it is expected to yield
    a value of 0x13 means *those* situations will be flagged as errors
    (and a different set of situations will sneak by, undetected).

    And that gives you exactly /zero/ benefit.

    See above.

    You run your hash algorithm, and check for the single value that indicates no
    errors.  It does not matter if that number is 0, 0x13, or - often more
    -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    As you've admitted, it doesn't matter. So, why wouldn't I opt to have
    an algorithm for THIS interface give me a result that is EXPECTED
    for this protocol? What value picking "0"?

    conveniently - the number attached at the end of the image as the expected result of the hash of the rest of the data.

    To be more accurate, the chances of them passing unnoticed are of the order
    of 1 in 2^n, for a good n-bit check such as a CRC check. Certain types of
    error are always detectable, such as single and double bit errors.  That is
    the point of using a checksum or hash for integrity checking.
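    The claim about single and double bit errors is easy to check exhaustively on a short block. This C sketch uses a slow, table-free, bit-at-a-time CRC-32 (the same reflected 0xEDB88320 variant as Ethernet). It flips every single bit and every pair of bits in a copy of the data, then counts how many corrupted copies still match the original CRC. The function names and the 64-byte limit are illustrative assumptions. At lengths like this the count for CRC-32 should be zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-32 (reflected poly 0xEDB88320, init and final XOR
   0xFFFFFFFF): slow but needs no table. */
static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Count single- and double-bit corruptions of data (len <= 64) that
   still produce the original CRC, i.e. errors that go undetected. */
static unsigned long undetected_1_and_2_bit_errors(const uint8_t *data,
                                                   size_t len)
{
    uint8_t buf[64];
    unsigned long misses = 0;
    if (len > sizeof buf)
        len = sizeof buf;
    const uint32_t good = crc32_bitwise(data, len);
    const size_t nbits = len * 8;

    for (size_t a = 0; a < nbits; a++) {
        for (size_t b = a; b < nbits; b++) {   /* b == a: single-bit error */
            for (size_t i = 0; i < len; i++)
                buf[i] = data[i];
            buf[a / 8] ^= (uint8_t)(1u << (a % 8));
            if (b != a)
                buf[b / 8] ^= (uint8_t)(1u << (b % 8));
            if (crc32_bitwise(buf, len) == good)
                misses++;
        }
    }
    return misses;
}
```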

    /Intentional/ changes are a different matter.  If a hacker changes the
    program image, they can change the transmitted hash to their own calculated
    hash.  Or for a small CRC, they could change a different part of the image
    until the original checksum matched - for a 16-bit CRC, that only takes
    65,535 attempts in the worst case.

    If the approach used is "typical", then you need far fewer attempts to
    produce a correct image -- without EVER knowing where the CRC is stored.

    It is difficult to know what you are trying to say here, but if you believe that different initial values in a CRC algorithm makes it harder to modify an
    image to make it pass the integrity test, you are simply wrong.

    Of course it does! You don't KNOW what to expect -- unless you've identified where the test is performed in the code and the result stored/checked. If
    you assume the residual will be 0 and make an attempt to generate a new checksum that yields 0 and it doesn't work FIRST TIME, then, by definition,
    it is HARDER (more work is required -- even if not *conceptually* more *difficult*).

    *My* example use of the different salt is for a different purpose.
    And, isn't meant as a deterrent to any developer/attacker but, rather,
    simply to ensure the transmission of the packet is intact AND carries
    some reassurance that it is in the correct format.

    That is why you need to distinguish between the two possibilities.  If you
    don't have to worry about malicious attacks, a 32-bit CRC takes a dozen
    lines of C code and a 1 KB table, all running extremely efficiently.  If
    security is an issue, you need digital signatures - an RSA-based signature
    system is orders of magnitude more effort in both development time and in
    run time.

    It's considerably more expensive AND not fool-proof -- esp if the
    attacker knows you are signing binaries.  "OK, now I need to find
    WHERE the signature is verified and just patch that "CALL" out
    of the code".

    I'm not sure if that is a straw-man argument, or just showing your ignorance of
    the topic.  Do you really think security checks are done by the program you are
    trying to send securely?  That would be like trying to have building security
    where people entering the building look at their own security cards.

    Do YOU really think we all design applications that run in PCs where some CLOSED OS performs these tests in a manner that can't be subverted?
    *WE* (tend to) write ALL the code in the products developed, here.
    So, whether it's the POST WE wrote that is performing the test or
    the loader WE wrote, it's still *our* program.

    Yes, we ARE looking at our own security cards!

    Manufacturers *try* to hide ("obscurity") details of these mechanisms
    in an attempt to improve effective security. But, there's nothing
    that makes these guarantees.

    Give me the sources for Windows (Linux, *BSD, etc.) and I can
    subvert all the state-of-the-art digital signing used to ensure
    binaries aren't altered. Nothing *outside* the box is involved
    so, by definition, everything I need has to reside *in* the box.

    DataI/O was always paranoid about their software/firmware.
    They rely on custom silicon to "protect" their investment.
    But, use a COTS CPU to execute the code!

    So, pull the MC68K out of its socket and plug in an emulator.
    Capture the execution trace and you know exactly what the
    instruction stream is/was -- despite its encoding on the
    distribution media.

    I altered the test procedure for a
    piece of military gear we were building simply to skip some lengthy tests
    that I *knew* would pass (I don't want to inject an extra 20 minutes of
    wait time just to get through a lengthy test I already know works before
    I can get to the test of interest to me, now).

    I failed to undo the change before the official signoff on the device.
    The only evidence of this was the fact that I had also patched the
    startup message to say "Go for coffee..." -- which remained on the
    screen for the duration of the lengthy (even with the long test
    elided) procedure...

    ..which alerted folks to the fact that this *probably* wasn't the
    original image.  (The computer running the test suite on the DUT had
    no problem accepting my patched binary)

    And what, exactly, do you think that anecdote tells us about CRC checks for
    image files?  It reminds us that we are all fallible, but does no more than
    that.

    That *was* the point.  Because the folks who designed the test computer
    relied on common techniques to safeguard the image.

    There was a human error - procedures were not good enough, or were not followed.  It happens, and you learn from it and make better procedures.  The
    fault was in what people did, not in an automated integrity check.  It is completely unrelated.

    It shows that the check was designed without consideration of how
    it might be subverted. This is the most common flaw in all
    security schemes -- failing to consider an attack/fault vector.

    The vendor assumed no one would deliberately alter the test
    procedure. That anyone running it would willingly sit through an
    extra half hour of tests ALREADY KNOWN TO PASS instead of opting
    to find a way to skip that (because the test designer only considered
    "sell off" when designing the test and not *debug* and the test
    platform didn't provide hooks to facilitate that, either!!)

    I unplugged a cable between two pieces of equipment that I had
    never seen before to subvert a security mechanism in a product.
    Because the designers never considered the fact that someone
    might do that!

    Security is no different from any other "solution". You test
    the divisor before a calculation because you reasonably expect to
    encounter "unfortunate" values and don't want the operation to
    fail.

    The counterfeiting example I cited indicates how "obscurity/secrecy"
    is far more effective (yet you dismiss it out-of-hand).

    No, it does nothing of the sort.  There is no connection at all.

    The counterfeiter lost the lawsuit because he was unaware (obscurity)
    of the hidden SECURITY measures in the product design. This is proven
    by his attempts to defeat the OBVIOUS ones!

    CRC's alone are useless.  All the attacker needs to do after modifying the
    image is calculate the CRC themselves, and replace the original checksum
    with their own.

    That assumes the "alterer" knows how to replace the checksum, how it
    is computed, where it is embedded in the image, etc.  I modified the Compaq
    portable mentioned without ever knowing where the checksum was stored
    or *if* it was explicitly stored.  I had no desire to disassemble the
    BIOS ROMs (though I could obviously do so, as there was no "proprietary
    hardware" limiting access to their contents and the instruction set of
    the processor is well known!).

    Instead, I did this by *guessing* how they would implement such a check
    in a bit of kit from that era (EPROMs aren't easily modified by malware,
    so it wasn't likely that they would go to great lengths to "protect" the
    image).  And, if my guess had been incorrect, I could always reinstall
    the original EPROMs -- nothing lost, nothing gained.

    Had much experience with folks counterfeiting your products and making
    "simple" changes to the binaries?  Like changing the copyright notice
    or splash screen?

    Then, bringing the (accused) counterfeit of YOUR product into a courtroom
    and revealing the *hidden* checksum that the counterfeiter wasn't aware of?

    "Gee, why does YOUR (alleged) device have *my* name in it -- in addition
    to behaving exactly like mine??"

    [I guess obscurity has its place!]

    Security by obscurity is not security.  Having a hidden signature or other
    mark can be useful for proving ownership (making an intentional mistake is
    another common tactic - such as commercial maps having a few subtle spelling
    errors). But that is not security.

    Of course it is!  If *you* check the "hidden signature" at runtime
    and then alter "your" operation such that an altered copy fails
    to perform properly, then you have secured it.

    That is not security.  "Security" means that the program that starts the updated program checks the /entire/ image according to its digital signature,
    and rejects it /entirely/ if it does not match.

    No, that's *your* naive assumption of security. It's why such attempts invariably fail; they are "drop dead" implementations that make it
    clear to anyone trying to subvert that security that their
    efforts have not (yet) succeeded.

    The goal is to prevent the program/device from being used without authorization/compensation. If it KILLS the user as a result of some
    hidden feature, it has met its goal -- even if a draconian approach.
    If it *pretends* to be doing what you want --- and then fails to
    complete some later step -- it is similarly preventing unauthorized
    use (and tying up a lot of your time, in the process).

    If you want to ensure the image isn't *corrupt* (which could
    lead to failures that could invite lawsuits, etc.), then you
    are concerned with INTEGRITY.

    What you are talking about here is the sort of cat-and-mouse nonsense computer
    games producers did with intentional disk errors to stop copying.  It annoys
    legitimate users and does almost nothing to hinder the bad guys.

    Because it was a bad solution that was fairly obvious in its presence:
    "I can't copy this disk! Let me buy Copy2PC..."

    The same applies to most licensing schemes and other "tamper
    detection" mechanisms.

    Would you want to use a check-writing program if the account
    balances it maintains were subtly (but not consistently)
    incorrect?

    Again, you make no sense.  What has this got to do with integrity checks or security?

    If you're selling check writing software and want to prevent
    FlyByNight Accounting Software, Inc. from stealing your
    product and reselling it as your own, a great way to prevent that
    is to ensure THEIR copy of the product causes accounting errors
    that are hard to notice. Their customers will (eventually)
    complain that THEIR product is buggy. But, yours isn't!

    If your goal is to track your checks accurately, you're
    likely not going to want to wonder what yet-to-be-discovered
    errors exist in the "books" that THEIR software has been
    maintaining for you.

    The original vendor has secured his product against tampering.

    Checking signatures, CRCs, licensing schemes, etc. all are used
    in a "drop dead" fashion so considerably easier to defeat.
    Witness the number of "products" available as warez...

    Look, it is all /really/ simple.  And the year is 2023, not 1973.

    Yes! And it is considerably easier to subvert naive mechanisms
    AND SHARE YOUR HACKS!

    If you want to check the integrity of a file against accidental changes, a CRC
    is usually fine.

    As is a CRC on a network packet. Without having to double-check the
    contents of that packet after it has been verified on the sending side!

    If you want security, and to protect against malicious changes, use a digital
    signature.  This must be checked by the program that /starts/ the updated code,
    or that downloaded and stored it - not by the program itself!

    And who wrote THAT program? Where is it, physically? Is there some device OUTSIDE of the device that you've built that securely performs these
    checks?

    Only answer with a support contract.

    Oh, sure - the amateurs who have some of the information but not enough details, skill or knowledge to get things working will /never/ fill forums with
    questions, complaints or bad reviews that bother your support staff or scare away real sales.

    A forum doesn't have to be "public". FUD can scare off real sales
    even in the total absence of information (or knowledge).

    Your goal should always be to produce a good product that does
    what it claims to do. And, rely on the happiness of your
    customers to directly (or indirectly) generate additional sales.

    I've never "advertised" my services. I'm actually pretty hard to get
    in touch with! Yet, clients never had a hard time finding me -- through
    other clients who were happy with my work. As they likely weren't
    direct competitors to the original clients, they had nothing to
    fear (lose) from sharing me, as a resource.

    Similarly, a customer making widgets that employ some feature of
    your device likely has little to lose by sharing his (good or bad)
    experiences with another POTENTIAL customer (making wodjets).
    And, likely can benefit from the goodwill he receives from that
    other customer as well as from *you* ("Thanks for recommending
    us to him!"). And, ensures a continued demand for your products
    so you continue to be available for HIS needs!

    They are limited by idiotic government export restrictions made by ignorant
    politicians who don't understand cryptography.

    Protections don't always have to be cryptographic.

    Correct, but - as with a lot of what you write - completely irrelevant to the
    subject at hand.

    Why can't companies give out information about the security systems used in their microcontrollers (for example)?  Because some geriatric ignoramuses think banning "export" of such information to certain countries will stop those
    countries knowing about security and cryptography.

    Do you really think that's the sole reason for all the "secrecy" and NDAs?
    I've had to sit with gummit folks and sort out what parts of our technology could LEGALLY be exported. Even to our partners in the UK! Some of it
    makes sense ("Nothing goes to Libya!"). Some is bogus.

    And, thinking that you can put up a wall that is impermeable is a joke.
    Just like printing PGP in book form and selling books overseas.

    Or, hiring someone who worked for Company X. Or, bribing someone
    to make a photocopy of <whatever>.

    But, this doesn't mean one should ENCOURAGE dissemination of things
    that may have special security/economic value. "Delay" often has
    as much value as "deter".

    A friend who designed arcade pieces recounted how he was contacted by a guy
    who had disassembled ~40KB of (hand-written) code in one of his products.
    He had even uncovered latent bugs (!) in the code.

    But, his efforts were so "late" that the product had long ago lost
    commercial value. So, it may have been flattering that someone
    would invest that much time in such an endeavor. But, little else.

    Nowadays, tools would make that a trivial undertaking. And, the
    possibility of easily enlisting others in the effort (without
    resorting to clandestine channels). OTOH, projects are now
    considerably larger (orders of magnitude). OToOH, much current
    work is done in HLLs (so tools can recognize their code generator
    patterns) and with "standard" libraries; I can recognize a call to
    printf without decompiling any of the code -- folks aren't
    likely going to replace "%d" with "?b" just to obscure functionality!

    Some things benefit from being kept hidden, or under restricted access. The
    details of the CRC algorithm you use to catch accidental errors in your
    image file is /not/ one of them.  If you think hiding it has the remotest
    hint of a benefit, you are doing things wrong - you need a /security/ check,
    not a simple /integrity/ check.

    And then once you have switched to a security check - a digital signature -
    there's no need to keep that choice hidden either, because it is the /key/
    that is important, not the type of lock.

    Again, meaningless if the attacker can interfere with the *enforcement*
    of that check.  Using something "well known" just means he already knows
    what to look for in your code.  Or, how to interfere with your
    intended implementation in ways that you may have not anticipated
    (confident that your "security" can't be MATHEMATICALLY broken).


    If the attacker can interfere with the enforcement of the check, then it doesn't matter what checks you have.  Keeping the design of a building's locks
    secret does not help you if the bad guys have bribed the security guard /inside/ the building!

    But, if that's the only way to subvert the secrets of those locks,
    then you only have to worry about keeping that security guard "happy".

    "Here's my source code.  Here are my schematics.  Here's the
    name of the guy who oversees production (bribe him to gain
    access to the keys stored in the TPM).  Now, what are you
    gonna *do* with all that?"

    The first two should be fine - if people can break your security after looking
    at your source code or schematics, your security is /bad/.  As for the third
    one, if they can break your security by going through the production guy, your
    production procedures are bad.

    You can change your production procedures without having to redesign your product. You don't want to embrace a solution/technology that may soon/later be subverted (e.g., SHA1) and have to redesign portions of your product
    (which may already be deployed) to "fix".

    IMO, this is the downside of modern cryptography -- if you have a product
    with any significant lifespan and "exposure". You never know when the
    next "uncrackable" algorithm will fall. And, when someone might opt to marshal a community's resources to attack a particular implementation.

    Attacks that used to be considered "nation-state scale" are quickly
    becoming "big business scale" and even "network of workstations scale".
    So, any implementation that *shares* a key across a product line
    is vulnerable to the entire product line being compromised when/if
    that key is disclosed/broken.

    [I generate unique keys for each device on the customer's site
    using a dedicated (physically secure) interface so even the manufacturer doesn't know what they are. Crack one (possibly by physically attacking
    the device and microprobing the die) and all you get is that one
    device -- and whatever *its* role in the system may have been.]


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Mon Apr 24 01:07:13 2023
    From Newsgroup: comp.arch.embedded

    On Monday, April 24, 2023 at 3:17:33 AM UTC-4, David Brown wrote:
    On 24/04/2023 00:24, Rick C wrote:
    On Sunday, April 23, 2023 at 5:58:51 PM UTC-4, David Brown wrote:

    When someone wants a checksum on an image file, the appropriate
    choice in most cases is a CRC.

    Why? What makes a CRC an "appropriate" choice. Normally, when I
    design something, I establish the requirements. What requirements
    are you assuming that would make the CRC more desirable than a
    simple checksum?

    I've already explained this in quite a lot of detail in this thread (as
    have others). If you don't like my explanation, or didn't read it,
    that's okay. You are under no obligation to learn about CRCs. Or if
    you prefer to look it up in other sources, that's obviously also an option.
    Hmmm... I ask you a question about why you think CRC is better for my application and you respond oddly. So you can't explain why the CRC would be better for my application? OK, thanks anyway.
    If security is an issue, then a secure hash is needed. For a very
    limited system, additive checksums might be the only realistic
    choice.

    What have I said that makes you think security is an issue??? I
    don't recall ever mentioning anything about security. Do you recall
    what I did say?


    If you think a discussion of CRC calculations would be useful, why
    don't you open a thread and discuss them, instead of insisting they
    are the right solution to my problem, when you don't even know what
    the problem requirements are? It's all here in the thread. You only
    need to read, without projecting your opinions on the problem
    statement.

    I've asked you this before - are you /sure/ you understand how Usenet works?
    I will say this again, rather than burying your comments on CRC in this thread about checksums, why not open a new thread, and allow the world to read what you have to say, instead of commenting as a side topic in a thread where most people have tuned out long ago? You can use an appropriate subject line like, "Why CRC is better than checksums for some applications".
    Or you can continue to muddy up the waters here by discussing something that is of no value in this application.
    --
    Rick C.
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Mon Apr 24 16:37:14 2023

    On 24/04/2023 09:32, Don Y wrote:
    On 4/22/2023 7:57 AM, David Brown wrote:
    However, in almost every case where CRC's might be useful, you have
    additional checks of the sanity of the data, and an all-zero or
    all-one data block would be rejected.  For example, Ethernet packets
    use CRC for integrity checking, but an attempt to send a packet type
    0 from MAC address 00:00:00:00:00:00 to address 00:00:00:00:00:00,
    of length 0, would be rejected anyway.
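    The all-zero case is a real concern for a CRC whose register starts at
    zero. A minimal sketch, assuming the MSB-first polynomial 0x1021 used by
    XMODEM (the helper name is made up): XORing a zero byte into the register
    changes nothing, so a register that starts at 0 stays at 0, and every
    all-zero block gets a CRC of 0 whatever its length. Starting the register
    at a non-zero value such as 0xFFFF gives a non-zero result that also
    depends on the length.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC (MSB first, polynomial 0x1021, no final XOR) of `n` zero bytes,
   with the register starting at `init`. XORing a zero byte into the
   register is a no-op, so only the eight shift steps per byte remain. */
static uint16_t crc16_of_zero_bytes(uint16_t init, size_t n)
{
    uint16_t crc = init;
    for (size_t i = 0; i < n; i++)
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    return crc;
}
```

    crc16_of_zero_bytes(0x0000, n) is 0 for every n, while
    crc16_of_zero_bytes(0xFFFF, 4) and crc16_of_zero_bytes(0xFFFF, 40) are
    both non-zero and differ from each other.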

    Why look at "data" -- which may be suspect -- and *then* check its CRC?
    Run the CRC first.  If it fails, decide how you are going to proceed
    or recover.

    That is usually the order, yes.  Sometimes you want "fail fast", such
    as dropping a packet that was not addressed to you (it doesn't matter
    if it was received correctly but for someone else, or it was addressed
    to you but the receiver address was corrupted - you are dropping the
    packet either way).  But usually you will run the CRC then look at the
    data.

    But the order doesn't matter - either way, you are still checking for
    valid data, and if the data is invalid, it does not matter if the CRC
    only passed by luck or by all zeros.

    You're assuming the CRC is supposed to *vouch* for the data.
    The CRC can be there simply to vouch for the *transport* of a
    datagram.

    I am assuming that the CRC is there to determine the integrity of the
    data in the face of possible unintentional errors. That's what CRC
    checks are for. They have nothing to do with the content of the data,
    or the type of the data package or image.

    As an example of the use of CRC's in messaging, look at Ethernet frames:

    <https://en.wikipedia.org/wiki/Ethernet_frame>

    The CRC does not care about the content of the data it protects.


    So, use a version-specific CRC on the packet.  If it fails, then
    either the data in the packet has been corrupted (which could just
    as easily have involved an embedded "interface version" parameter);
    or the packet was formed with the wrong CRC.

    If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then
    why bother looking at a "protocol version" parameter?  Would
    you ALSO want to verify all the rest of the parameters?


    I'm sorry, I simply cannot see your point. Identifying the version of a protocol, or other protocol type information, is a totally orthogonal
    task to ensuring the integrity of the data. The concepts should be
    handled separately.


    What term would you have me use to indicate a "bias" applied to a CRC
    algorithm?

    Well, first I'd note that any kind of modification to the basic CRC
    algorithm is pointless from the viewpoint of its use as an integrity
    check.  (There have been, mostly historically, some justifications in
    terms of implementation efficiency.  For example, bit and byte
    re-ordering could be done to suit hardware bit-wise implementations.)

    Otherwise I'd say you are picking a specific initial value if that is
    what you are doing, or modifying the final value (inverting it or
    xor'ing it with a fixed value).  There are, AFAIK, no specific terms
    for these - and I don't see any benefit in having one.  Misusing the
    term "salt" from cryptography is certainly not helpful.
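    For reference, both knobs are visible in a bit-wise CRC-16. A minimal
    sketch, assuming the MSB-first polynomial 0x1021; the function name is
    hypothetical, and only `init` and `xorout` differ between the named
    variants listed after the code.

```c
#include <stddef.h>
#include <stdint.h>

/* MSB-first CRC-16 with polynomial 0x1021 and the two "bias" parameters:
   init   - value loaded into the register before the first byte
   xorout - value XORed into the register after the last byte */
static uint16_t crc16_param(const uint8_t *buf, size_t len,
                            uint16_t init, uint16_t xorout)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);        /* next byte into the top */
        for (int k = 0; k < 8; k++)            /* one shift per bit */
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return (uint16_t)(crc ^ xorout);
}
```

    For the ASCII string "123456789" this returns 0x31C3 with init 0x0000
    and xorout 0x0000 (CRC-16/XMODEM), 0x29B1 with init 0xFFFF and xorout
    0x0000 (often called CRC-16/CCITT-FALSE) and 0xD64E with init 0xFFFF and
    xorout 0xFFFF (CRC-16/GENIBUS). For errors within a message of fixed
    length, all three detect exactly the same error patterns; only the
    expected values differ.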

    Salt just ensures that you can differentiate between functionally identical values.  I.e., in a CRC, it differentiates between the "0x0000" that CRC-1 generates from the "0x0000" that CRC-2 generates.

    Can we agree that this is called an "initial value", not "salt" ?


    You don't see the parallel to ensuring that *my* use of "Passw0rd" is
    encoded in a different manner than *your* use of "Passw0rd"?

    No. They are different things.

    An important difference is that adding "salt" to a password hash is an important security feature. Picking a different initial value for a CRC instead of having appropriate protocol versioning in the data (or a surrounding envelope) is a misfeature.

    The second difference is the purpose of the hashing. The CRC here is
    for data integrity - spotting mistakes in the data during transfer or
    storage. The hash in a password is for security, avoiding the password
    ever being transmitted or stored in plain text.

    Any coincidence in the way these might be implemented is just that - coincidence.



    See the RMI description.

    I'm sorry, I have no idea what "RMI" is or where it is described.
    You've mentioned that abbreviation twice, but I can't figure it out.

    <https://en.wikipedia.org/wiki/RMI>
    <https://en.wikipedia.org/wiki/OCL>

    Nothing magical with either term.

    I looked up RMI on Wikipedia before asking, and saw nothing of relevance
    to CRC's or checksums. I noticed no mention of "OCL" in your posts, and looking it up on Wikipedia gives no clues.

    So for now, I'll assume you don't want anyone to know what you meant and
    I can safely ignore anything you write in connection with the terms.


    OTOH, "salting" the calculation so that it is expected to yield
    a value of 0x13 means *those* situations will be flagged as errors
    (and a different set of situations will sneak by, undetected).

    And that gives you exactly /zero/ benefit.

    See above.

    I did. Zero benefit.

    Actually, it is worse than useless - it makes it harder to identify the protocol, and reduces the information content of the CRC check.


    You run your hash algorithm, and check for the single value that
    indicates no errors.  It does not matter if that number is 0, 0x13, or
    - often more
    -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    As you've admitted, it doesn't matter.  So, why wouldn't I opt to have
    an algorithm for THIS interface give me a result that is EXPECTED
    for this protocol?  What value picking "0"?


    A /single/ result does not matter (other than needlessly complicating
    things). Having multiple different valid results /does/ matter.
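    As an illustration of a single, fixed "good" value that is not zero: with
    the standard CRC-32 (reflected polynomial 0xEDB88320, initial value and
    final XOR 0xFFFFFFFF, as used by Ethernet and zlib), running the CRC over a
    message followed by its own CRC, stored least-significant byte first,
    always gives 0x2144DF1C. A minimal sketch; the function names are made up,
    and the caller must supply an output buffer with room for four extra
    bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-wise reflected CRC-32: poly 0xEDB88320, init/final XOR 0xFFFFFFFF. */
static uint32_t crc32_bitwise(const uint8_t *buf, size_t len)
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        c ^= buf[i];
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
    }
    return c ^ 0xFFFFFFFFu;
}

/* Copy msg into out, append its CRC least-significant byte first, and
   return the CRC-32 of the whole frame (len + 4 bytes). */
static uint32_t crc32_append_and_check(const uint8_t *msg, size_t len,
                                       uint8_t *out)
{
    uint32_t crc = crc32_bitwise(msg, len);
    memcpy(out, msg, len);
    for (int i = 0; i < 4; i++)
        out[len + i] = (uint8_t)(crc >> (8 * i));
    return crc32_bitwise(out, len + 4);
}
```

    The constant is a consequence of the final XOR: without it, the same check
    would give zero.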

    That is why you need to distinguish between the two possibilities.
    If you don't have to worry about malicious attacks, a 32-bit CRC
    takes a dozen lines of C code and a 1 KB table, all running
    extremely efficiently.  If security is an issue, you need digital
    signatures - an RSA-based signature system is orders of magnitude
    more effort in both development time and in run time.

    It's considerably more expensive AND not fool-proof -- esp if the
    attacker knows you are signing binaries.  "OK, now I need to find
    WHERE the signature is verified and just patch that "CALL" out
    of the code".

    I'm not sure if that is a straw-man argument, or just showing your
    ignorance of the topic.  Do you really think security checks are done
    by the program you are trying to send securely?  That would be like
    trying to have building security where people entering the building
    look at their own security cards.

    Do YOU really think we all design applications that run in PCs where some CLOSED OS performs these tests in a manner that can't be subverted?

    Do you bother to read my posts at all? Or do you prefer to make up
    things that you imagine I write, so that you can make nonsensical
    attacks on them? Certainly there is no sane reading of my posts
    (written and sent from an /open/ OS) where "do not rely on security by obscurity" could be taken to mean "rely on obscured and closed platforms".

    *WE* (tend to) write ALL the code in the products developed, here.
    So, whether it's the POST WE wrote that is performing the test or
    the loader WE wrote, it's still *our* program.

    Yes, we ARE looking at our own security cards!

    Manufacturers *try* to hide ("obscurity") details of these mechanisms
    in an attempt to improve effective security.  But, there's nothing
    that makes these guarantees.

    Why are you trying to "persuade" me that manufacturer obscurity is a bad thing? You have been promoting obscurity of algorithms as though it
    were helpful for security - I have made clear that it is not. Are you
    getting your own position mixed up with mine?


    Give me the sources for Windows (Linux, *BSD, etc.) and I can
    subvert all the state-of-the-art digital signing used to ensure
    binaries aren't altered.  Nothing *outside* the box is involved
    so, by definition, everything I need has to reside *in* the box.

    No, you can't. The sources for Linux and *BSD /are/ all freely
    available. The private signing keys used by, for example, Red Hat or
    Debian, are /not/ freely available. You cannot make changes to a Red
    Hat or Debian package that will pass the security checks - you are
    unable to sign the packages.

    This is precisely because something /outside/ the box /is/ involved -
    the private half of the public/private key used for signing. The public
    half - and all the details of the algorithms - is easily available to
    let people verify the signature, but the private half is kept secret.


    (Sorry, but I've skipped and snipped the rest. I simply don't have time
    to go through it in detail. If others find it useful or interesting,
    that's great, but there has to be limits somewhere.)



  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Thu Apr 27 18:26:37 2023

    Den 2023-04-20 kl. 04:06, skrev Rick C:
    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is calculated.
    I am not aware of any linker which supports this.

    Two months ago, I added the DIGEST directive to binutils aka the GNU
    linker. It was committed, but then people realized that I had not signed
    an agreement with Free Software Foundation.
    Since part of the code I pushed was from a third party which released
    their code under MIT, the licensing has not been resolved yet;
    the patch is in the binutils git history, but reverted.

    You would write (IIRC):
    DIGEST "CRC64-ECMA", (from, to)
    and the linker would reserve 8 bytes, which are filled with the CRC in the
    final link stage.
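    In the meantime, the same effect can be had with a post-link step. A
    minimal sketch of what that step computes, assuming "CRC64-ECMA" means the
    CRC-64/ECMA-182 parameters (polynomial 0x42F0E1EBA9EA3693, initial value 0,
    no bit reflection, no final XOR) and that the 8-byte slot is written
    big-endian outside the range it covers. Both are assumptions, Ulf's patch
    may differ, and `fill_digest` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC64_ECMA182_POLY 0x42F0E1EBA9EA3693ull

/* Bit-wise CRC-64/ECMA-182: MSB first, initial value 0, no final XOR. */
static uint64_t crc64_ecma182(const uint8_t *buf, size_t len)
{
    uint64_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t)buf[i] << 56;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000000000000000ull)
                      ? (crc << 1) ^ CRC64_ECMA182_POLY
                      : crc << 1;
    }
    return crc;
}

/* Hypothetical post-link step: CRC over image[from, to), stored
   big-endian in the 8 bytes at image[slot]. Returns 0 on success,
   -1 if the ranges are invalid or the slot overlaps the covered bytes. */
static int fill_digest(uint8_t *image, size_t image_len,
                       size_t from, size_t to, size_t slot)
{
    if (from > to || to > image_len || slot > image_len || image_len - slot < 8)
        return -1;
    if (slot < to && slot + 8 > from)
        return -1;
    uint64_t crc = crc64_ecma182(image + from, to - from);
    for (int i = 0; i < 8; i++)
        image[slot + i] = (uint8_t)(crc >> (56 - 8 * i));
    return 0;
}
```

    With these parameters the check value for "123456789" is
    0x6C40DF5F0B497347. Because there is no final XOR and the digest is stored
    most-significant byte first, a slot placed directly after the covered range
    makes the CRC over range plus slot come out as zero.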

    /Ulf


    I'm not thinking anything too fancy, like a CRC, but rather a simple modulo N addition, maybe N being 2^16.

    I keep thinking of using a placeholder, but that doesn't seem to work out in any useful way. Even if you try to anticipate the impact of adding the checksum, that only gives you a different checksum, that you then need to anticipate further... ad infinitum.

    I'm not thinking of any special checksum generator that excludes the checksum data. That would be too messy.

    I keep thinking there is a different way of looking at this to achieve the result I want...

    Maybe I can prove it is impossible. Assume the file checksums to X when the checksum data is zero. The goal would then be to include the checksum data value Y in the file, that would change X to Y. Given the properties of the modulo N checksum, this would appear to be impossible for the general case, unless... Add another data value, called a checksum normalizer. This data value checksums with the original checksum to give the result zero. Then, when the checksum is also added, the resulting checksum is, in fact, the checksum. Another way of looking at this is to add a value that combines with the added checksum, to be zero, leaving the original checksum intact.

    This might be inordinately hard for a CRC, but a simple checksum would not be an issue, I think. At least, this could work in software, where data can be included in an image file as itself. In a device like an FPGA, it might not be included in the bit stream file so directly... but that might depend on where in the device it is inserted. Memory might have data that is stored as itself. I'll need to look into that.
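    The "checksum normalizer" works out neatly for a modulo-2^16 sum. A
    minimal sketch, assuming the image is handled as an array of 16-bit words
    with two reserved, distinct words at hypothetical indices `cks` and `norm`:
    `cks` receives the sum X of everything else and `norm` receives -X, so the
    two cancel and the sum of the whole image equals the value stored in
    `cks`.

```c
#include <stddef.h>
#include <stdint.h>

/* Modulo-2^16 sum of 16-bit words. */
static uint16_t sum16(const uint16_t *w, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s + w[i]);
    return s;
}

/* Fill the two reserved words (cks != norm) so that sum16() over the
   whole image equals img[cks]. */
static void embed_checksum(uint16_t *img, size_t n, size_t cks, size_t norm)
{
    img[cks] = 0;
    img[norm] = 0;
    uint16_t x = sum16(img, n);      /* X: checksum with both slots zero */
    img[cks] = x;                    /* the value the image will report */
    img[norm] = (uint16_t)(0u - x);  /* normalizer: cancels img[cks] */
}
```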


  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Thu Apr 27 18:36:55 2023

    Den 2023-04-20 kl. 22:26, skrev David Brown:
    On 20/04/2023 18:45, Rick C wrote:
    On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner
    wrote:
    On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
    <gnuarm.del...@gmail.com> wrote:

    This is a bit of the chicken and egg thing. If you want to embed
    a checksum in a code module to report the checksum, is there a
    way of doing this? It's a bit like being your own grandfather, I
    think.
    Take a look at the old xmodem/ymodem CRC. It was designed such
    that when the CRC was sent immediately following the data, a
    receiver computing CRC over the whole incoming packet (data and CRC
    both) would get a result of zero.

    But AFAIK it doesn't work with CCITT equation(s) - you have to use
    xmodem/ymodem.
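    A sketch of the property George describes, assuming the CRC-16/XMODEM parameters (polynomial 0x1021, initial value 0, bits processed MSB-first, no final XOR) and that the CRC is appended high byte first. With those parameters, running the same CRC over the data plus its appended CRC gives zero. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/XMODEM: poly 0x1021, init 0x0000, MSB-first, no final XOR. */
uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Append the CRC high byte first, as XMODEM does.  buf must have room
   for len + 2 bytes.  Returns the new length. */
size_t append_crc16_xmodem(uint8_t *buf, size_t len)
{
    uint16_t crc = crc16_xmodem(buf, len);
    buf[len]     = (uint8_t)(crc >> 8);
    buf[len + 1] = (uint8_t)(crc & 0xFF);
    return len + 2;
}
```

    A receiver then only has to check that crc16_xmodem() over the whole packet returns 0, without knowing where the data ends and the CRC begins.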
    I'm not thinking anything too fancy, like a CRC, but rather a
    simple modulo N addition, maybe N being 2^16.
    Sorry, I don't know a way to do it with a modular checksum. YMMV,
    but I think 16-bit CRC is pretty simple.

    George

    CRC is not complicated, but I would not know how to calculate an
    inserted value to force the resulting CRC to zero.  How do you do
    that?

    You "insert" the value at the end.  Anything else is insane.

    In all projects I have been involved with, the application binary starts
    with a header looking like this.


    MAGIC WORD 1
    CRC
    Entry Point
    Size
    other info...
    MAGIC WORD 2
    APPLICATION_START
    ...
    APPLICATION_END (aligned with flash sector)


    The bootloader first checks the two magic words.
    It then computes CRC on the header (from Entry Point) to APPLICATION_END
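    A sketch of how a bootloader might check a header like the one above, in C. The struct layout, magic values, and use of CRC-32 are assumptions for illustration; the "other info" fields are omitted, and the CRC is taken from entry_point for size bytes, following the description of checking from Entry Point to APPLICATION_END.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic values; all fields are 32-bit, so there is no padding. */
#define APP_MAGIC1 0xA5A5F00Du
#define APP_MAGIC2 0x5A5A0FF0u

struct app_header {
    uint32_t magic1;       /* MAGIC WORD 1 */
    uint32_t crc;          /* CRC-32 from entry_point to the application end */
    uint32_t entry_point;
    uint32_t size;         /* bytes covered by the CRC, counted from entry_point */
    uint32_t magic2;       /* MAGIC WORD 2; application code follows */
};

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* avail = number of flash bytes available from the start of the header. */
bool app_header_valid(const struct app_header *h, size_t avail)
{
    const size_t off = offsetof(struct app_header, entry_point);
    if (avail < sizeof *h)
        return false;                   /* not even room for a header */
    if (h->magic1 != APP_MAGIC1 || h->magic2 != APP_MAGIC2)
        return false;                   /* no application installed */
    if (h->size < sizeof *h - off || h->size > avail - off)
        return false;                   /* implausible length */
    return crc32_ieee((const uint8_t *)h + off, h->size) == h->crc;
}
```

    Because the crc field sits before entry_point, it is outside the checked range and can be filled in after the rest of the image is final.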

    I ported the IAR ielftool (open source) to Linux at https://github.com/emagii/ielftool

    This can insert the CRC in the ELF file, but needs tweaks to work
    with an ELF file generated by the GNU tools.

    /Ulf



    CRCs are quite good hashes, for suitably sized data.  There are perhaps some special cases, but basically you'd be doing trial-and-error
    searches to find an inserted value that gives you a zero CRC overall.
    2^16 is not an overwhelming search space, but the whole idea is pointless.
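    To make the search-space point concrete, here is a brute-force sketch that tries every value of a hypothetical 2-byte slot until the CRC-16/XMODEM of the whole buffer is zero. For a 16-bit CRC whose polynomial has a nonzero constant term, the mapping from slot value to CRC is one-to-one, so exactly one value should work. A real image would also want to cache the CRC state up to the slot rather than recompute from the start on every attempt.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/XMODEM: poly 0x1021, init 0x0000, MSB-first, no final XOR. */
static uint16_t crc16_xmodem(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Try all 2^16 values of the big-endian slot at buf[off], buf[off + 1]
   until the CRC over the whole buffer is zero.  Returns the value found,
   or -1 if the slot does not fit or no value works. */
int32_t force_crc16_zero(uint8_t *buf, size_t len, size_t off)
{
    if (len < 2 || off > len - 2)
        return -1;
    for (uint32_t v = 0; v <= 0xFFFFu; v++) {
        buf[off]     = (uint8_t)(v >> 8);
        buf[off + 1] = (uint8_t)(v & 0xFF);
        if (crc16_xmodem(buf, len) == 0)
            return (int32_t)v;
    }
    return -1;
}
```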


    Even so, I'm not trying to validate the file.  I'm trying to come up
    with a substitute for a time stamp or version number.  I don't want
    to have to rely on my consistency in handling the version number
    correctly.  This would be a backup in case there was more than one
    version released, even only within the "lab", that were different.  A
    checksum that could be read by the controlling software would do the
    job.

    A CRC is fine for that.


    I have run into this before, where the version number was not a 100%
    indication of the uniqueness of an executable.  The checksum would be
    a second indicator.

    I should mention that I'm not looking for a solution that relies on
    any specific details of the tools.


    A table-based CRC is easy, runs quickly, and can be quickly ported to
    pretty much any language (the C and Python code, for example, is almost
    the same).
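    A table-driven CRC-32 sketch in C, assuming the common IEEE 802.3 / zlib parameters (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). The update function can be called on successive chunks, starting from 0. With these parameters, Python's zlib.crc32 should produce the same value, which is handy if part of the tooling is a Python script.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];

/* Build the 256-entry lookup table for the reflected polynomial. */
void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc32_table[i] = c;
    }
}

/* Incremental update: pass 0 for the first chunk, then the previous
   result for each following chunk. */
uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
    crc = ~crc;
    while (n--)
        crc = crc32_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}
```

    Processing a byte per table lookup is usually fast enough for a build step or a boot-time check, and the 1 KB table can live in flash.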

  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Thu Apr 27 18:42:34 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-22 at 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique, regardless
    of what the version number says. Version numbers are set manually
    and not always done correctly. I'm looking for something as a backup
    so that if the checksums are different, I can be sure the versions
    are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of the
    image. Any change in the image will show up as a change in the crc.

    No one is trying to detect changes in the image. I'm trying to label the image in a way that can be read in operation. I'm using the checksum simply because that is easy to generate. I've had problems with version numbering in the past. It will be used, but I want it supplemented with a number that will change every time the design changes, at least with a high probability, such as 1 in 64k.


    Another thing I added (which was later removed) was a timestamp directive.
    A 64-bit integer with the number of seconds since 1970-01-01 00:00.

    /Ulf


  • From Rick C@gnuarm.deletethisbit@gmail.com to comp.arch.embedded on Thu Apr 27 10:09:27 2023
    From Newsgroup: comp.arch.embedded

    On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is calculated.
    That assumes there is a linker. How does the application access this information?
    I am not aware of any linker which supports this.

    Two months ago, I added the DIGEST directive to binutils aka the GNU
    linker. It was committed, but then people realized that I had not signed
    an agreement with Free Software Foundation.
    Since part of the code I pushed was from a third party which released
    their code under MIT, the licensing has not been resolved yet.
    The patch is in binutils git, but reverted.

    You would write (IIRC):
    DIGEST "CRC64-ECMA", (from, to)
    and the linker would reserve 8 bytes which is filled with the CRC in the final link stage.
    You are making a lot of assumptions about the tools. I'm pretty sure they don't apply to my case. I'm not at all clear how this is workable, anyway. Adding the checksum to the file, changes the checksum, which is where this conversation started... unless I'm missing something significant.
    --
    Rick C.
    +++ Get 1,000 miles of free Supercharging
    +++ Tesla referral code - https://ts.la/richard11209
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Thu Apr 27 21:29:05 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-27 20:09, Rick C wrote:
    On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.

    That assumes there is a linker.


    Almost all toolchains have a linker.


    How does the application access this information?


    In Ulf's suggestion, it seems the DIGEST directive emits 8 bytes of
    checksum at the current point (usually the linker "." symbol). I assume
    one can give that point in the image a linkage symbol, perhaps like

    _checksum DIGEST "CRC64-ECMA", (from, to)

    or like

    _checksum EQU. .
    DIGEST "CRC64-ECMA", (from, to)


    (This is schematic linker code, not necessarily proper syntax.)

    One can then from the application code access the "checksum" location as
    an externally defined object, say:

    extern uint8[8] checksum;

    The linker will connect that C identifier to the actual address of the
    DIGEST checksum. Here I assumed that the C compiler mangles C
    identifiers into linkage symbols by prefixing an underscore; YMMV.


    You are making a lot of assumptions about the tools. I'm pretty sure
    they don't apply to my case. I'm not at all clear how this is
    workable, anyway. Adding the checksum to the file, changes the
    checksum, which is where this conversation started... unless I'm
    missing something significant.


    But you have insisted that your "checksum" is for the purpose of
    identifying the version of the program, not for checking the integrity
    of the memory image. If so, that checksum does not have to be the
    checksum of the whole memory image, as long as it is the checksum of the
    part of the image that contains the actual code and constant data, and
    so will change according to changes in those parts of the image.

  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Thu Apr 27 21:39:51 2023
    From Newsgroup: comp.arch.embedded

    Um, just noting some typos in my last, with apologies:

    On 2023-04-27 21:29, Niklas Holsti wrote:

      _checksum  EQU.   .

    should be

    _checksum EQU .

    (Thunderbird inserted an extra period out of "friendliness"...)

       extern uint8[8] checksum;

    should be (my C is rusty):

    extern uint8 checksum[8];

  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Thu Apr 27 22:36:46 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-27 at 19:09, Rick C wrote:
    On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a checksum in a code module to report the checksum, is there a way of doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.

    That assumes there is a linker. How does the application access this information?


    Linker command file
    public CRC64; start, stop
    HEADER = .;
    QUAD(MAGIC);
    CRC64 = .;
    DIGEST "CRC64-ECMA", (start, stop)
    start = .;
    # Your data to be protected
    ...
    stop = .;

    C source code.

    extern uint64_t CRC64;
    extern char* start;
    extern char* stop;

    uint64_t crc64;

    crc64 = calc_crc64_ecma(start, stop);
    if (crc64 == CRC64) {
    /* everything is OK */
    }

    I am not aware of any linker which supports this.

    Two months ago, I added the DIGEST directive to binutils aka the GNU
    linker. It was committed, but then people realized that I had not signed
    an agreement with Free Software Foundation.
    Since part of the code I pushed was from a third party which released
    their code under MIT, the licensing has not been resolved yet.
    The patch is in binutils git, but reverted.

    You would write (IIRC):
    DIGEST "CRC64-ECMA", (from, to)
    and the linker would reserve 8 bytes which is filled with the CRC in the
    final link stage.

    You are making a lot of assumptions about the tools. I'm pretty sure they don't apply to my case. I'm not at all clear how this is workable, anyway. Adding the checksum to the file, changes the checksum, which is where this conversation started... unless I'm missing something significant.

    I am assuming that no tool supports this off the shelf; the patches
    are in binutils, but reverted.

    /Ulf
  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Thu Apr 27 22:44:54 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-27 at 20:29, Niklas Holsti wrote:
    On 2023-04-27 20:09, Rick C wrote:
    On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.

    That assumes there is a linker.


    Almost all toolchains have a linker.


    How does the application access this information?


    In Ulf's suggestion, it seems the DIGEST directive emits 8 bytes of
    checksum at the current point (usually the linker "." symbol). I assume
    one can give that point in the image a linkage symbol, perhaps like

      _checksum  DIGEST "CRC64-ECMA", (from, to)

    or like

      _checksum  EQU.   .
                 DIGEST "CRC64-ECMA", (from, to)


    (This is schematic linker code, not necessarily proper syntax.)

    One can then from the application code access the "checksum" location as
    an externally defined object, say:

       extern uint8[8] checksum;

    The linker will connect that C identifier to the actual address of the DIGEST checksum. Here I assumed that the C compiler mangles C
    identifiers into linkage symbols by prefixing an underscore; YMMV.


    Yes, that is more or less it.



    You are making a lot of assumptions about the tools.  I'm pretty sure
    they don't apply to my case.  I'm not at all clear how this is
    workable, anyway.  Adding the checksum to the file, changes the
    checksum, which is where this conversation started... unless I'm
    missing something significant.
    No, you reserve room for the checksum, but that needs to be outside
    the checked area.
    The address of the checksum needs to be known to the application.
    Also the limits of the checked area.
    That is why the application has a header in front in my projects.
    The application is started by the bootloader, which checks
    a number of things before the application is started.
    The application can read the header as well to allow checking
    the code area at runtime.
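    A minimal post-link sketch of reserving room for the checksum outside the checked area, assuming a raw binary image, a 4-byte slot at a known offset, and a CRC-32 stored little-endian. The function names and parameters are hypothetical; the point is the overlap check, since writing the CRC into the range it covers would change the result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Compute CRC-32 over img[from, to) and store it little-endian at
   img[crc_off .. crc_off + 3].  Refuses a slot inside the checked range. */
bool patch_crc32(uint8_t *img, size_t len, size_t crc_off,
                 size_t from, size_t to)
{
    if (len < 4 || crc_off > len - 4 || from > to || to > len)
        return false;                       /* bad offsets */
    if (crc_off < to && from < crc_off + 4)
        return false;                       /* slot overlaps the checked area */
    uint32_t crc = crc32_ieee(img + from, to - from);
    for (int i = 0; i < 4; i++)
        img[crc_off + i] = (uint8_t)(crc >> (8 * i));
    return true;
}
```

    The application or bootloader then recomputes the CRC over the same range and compares it with the stored word; it needs to know the slot address and the range limits, which is what the header provides.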





    But you have insisted that your "checksum" is for the purpose of
    identifying the version of the program, not for checking the integrity
    of the memory image. If so, that checksum does not have to be the
    checksum of the whole memory image, as long as it is the checksum of the part of the image that contains the actual code and constant data, and
    so will change according to changes in those parts of the image.


    /Ulf
  • From Niklas Holsti@niklas.holsti@tidorum.invalid to comp.arch.embedded on Fri Apr 28 01:10:02 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-27 23:36, Ulf Samuelsson wrote:
    Den 2023-04-27 kl. 19:09, skrev Rick C:
    On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.

    That assumes there is a linker.  How does the application access this
    information?


    Linker command file
            public CRC64; start, stop
            HEADER = .;
            QUAD(MAGIC);
            CRC64 = .;
            DIGEST "CRC64-ECMA", (start, stop)
            start = .;
            # Your data to be protected
            ...
            stop = .;

    C source code.

    extern uint64_t CRC64;
    extern char* start;
    extern char* stop;

    uint64_t crc64;

    crc64 = calc_crc64_ecma(start, stop);
    if (crc64 == CRC64) {
       /* everything is OK */
    }


    I'm nit-picking, but that C code does not look right to me. The extern declarations for "start" and "stop" claim them to be names of memory
    locations that contain addresses, but the linker file just places them
    at the starting and one-past-end locations of the block to be protected.
    So the "start" variable contains the first bytes of the "data to be protected", and the contents of the "stop" variable are not defined
    because it is placed after the "data to be protected", where no code or
    data is loaded (it seems).

    It seems to me that the call to calc_crc64_ecma should get the addresses
    of "start" and "stop" as arguments (&start, &stop), instead of their
    values. But perhaps calc_crc64_ecma is not a function, but a macro that
    can itself take the addresses of its parameters.
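    A sketch of the C side with Niklas's correction applied. It assumes "CRC64-ECMA" means the CRC-64/ECMA-182 parameters (polynomial 0x42F0E1EBA9EA3693, initial value 0, MSB-first, no final XOR), which may not match what the reverted DIGEST patch actually uses, and it gives calc_crc64_ecma a two-pointer signature that the original post does not define. Declaring the linker symbols as arrays makes their names evaluate to addresses, which is what the CRC routine needs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CRC-64/ECMA-182 over the bytes in [start, stop). */
uint64_t calc_crc64_ecma(const uint8_t *start, const uint8_t *stop)
{
    uint64_t crc = 0;
    for (const uint8_t *p = start; p < stop; p++) {
        crc ^= (uint64_t)*p << 56;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000000000000000ull)
                      ? (crc << 1) ^ 0x42F0E1EBA9EA3693ull
                      : crc << 1;
    }
    return crc;
}

/* With linker-defined symbols, the declarations would look like:

       extern const uint8_t start[], stop[];
       extern const uint64_t CRC64;

       if (image_crc_ok(start, stop, CRC64)) { ... }

   "start" and "stop" then decay to &start[0] and &stop[0], the addresses
   the linker assigned, rather than the bytes stored there. */
bool image_crc_ok(const uint8_t *start, const uint8_t *stop, uint64_t expected)
{
    return calc_crc64_ecma(start, stop) == expected;
}
```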

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 28 09:12:53 2023
    From Newsgroup: comp.arch.embedded

    On 27/04/2023 18:36, Ulf Samuelsson wrote:
    On 2023-04-20 at 22:26, David Brown wrote:
    On 20/04/2023 18:45, Rick C wrote:
    On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner
    wrote:
    On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
    <gnuarm.del...@gmail.com> wrote:

    This is a bit of the chicken and egg thing. If you want to embed
    a checksum in a code module to report the checksum, is there a
    way of doing this? It's a bit like being your own grandfather, I
    think.
    Take a look at the old xmodem/ymodem CRC. It was designed such
    that when the CRC was sent immediately following the data, a
    receiver computing CRC over the whole incoming packet (data and CRC
    both) would get a result of zero.

    But AFAIK it doesn't work with CCITT equation(s) - you have to use
    xmodem/ymodem.
    I'm not thinking anything too fancy, like a CRC, but rather a
    simple modulo N addition, maybe N being 2^16.
    Sorry, I don't know a way to do it with a modular checksum. YMMV,
    but I think 16-bit CRC is pretty simple.

    George

    CRC is not complicated, but I would not know how to calculate an
    inserted value to force the resulting CRC to zero.  How do you do
    that?

    You "insert" the value at the end.  Anything else is insane.

    In all projects I have been involved with, the application binary starts
    with a header looking like this.


    MAGIC WORD 1
    CRC
    Entry Point
    Size
    other info...
    MAGIC WORD 2
    APPLICATION_START
    ...
    APPLICATION_END (aligned with flash sector)


    The bootloader first checks the two magic words.
    It then computes CRC on the header (from Entry Point) to APPLICATION_END

    I ported the IAR ielftool (open source) to Linux at https://github.com/emagii/ielftool

    This can insert the CRC in the ELF file, but needs tweaks to work
    with an ELF file generated by the GNU tools.

    /Ulf

    That can work for some microcontrollers, but is unsuitable for others -
    it depends on how the flash is organised. For an msp430, for example,
    it would be fine, as the interrupt vectors (including the reset vector)
    are at the end of flash. But for most ARM Cortex M devices, it would
    not be suitable - they expect the reset vector and initial stack pointer
    at the start of the flash image. Some devices have a boot ROM, and then
    you have to match their specifics for the header - or you can have your
    own boot program, and make the header how ever you like.

    I am absolutely a fan of having some kind of header like this (and
    sometimes even a human-readable copyright notice, identifier and version information). And having it as near the beginning as possible is good.
    But for many microcontrollers, having it at the start is not feasible.
    And if you can't put the CRC at the start like you do, you have to put
    it at the end of the image.


    I've never really thought about trying to inject a CRC into an elf file.
    I use elfs (or should that be "elves" ?) for debugging, not flash programming. And usually the main concern for having a CRC at the end
    of the image is when you have an online update of some kind, to check
    that nothing has gone wrong during the transfer or in-field update.


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 28 09:20:33 2023
    From Newsgroup: comp.arch.embedded

    On 27/04/2023 18:42, Ulf Samuelsson wrote:
    On 2023-04-22 at 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique, regardless
    of what the version number says. Version numbers are set manually
    and not always done correctly. I'm looking for something as a backup
    so that if the checksums are different, I can be sure the versions
    are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of the
    image. Any change in the image will show up as a change in the crc.

    No one is trying to detect changes in the image.  I'm trying to label
    the image in a way that can be read in operation.  I'm using the
    checksum simply because that is easy to generate.  I've had problems
    with version numbering in the past.  It will be used, but I want it
    supplemented with a number that will change every time the design
    changes, at least with a high probability, such as 1 in 64k.


    Another thing I added (which was later removed) was a timestamp directive.
    A 64-bit integer with the number of seconds since 1970-01-01 00:00.


    Timestamping a build in some way (as part of the "make", using __DATE__
    or __TIME__ in source code, or some feature of a revision control
    system) is very tempting, and can be helpful for tracking exactly what
    code you have on the system.

    However, IMHO having reproducible builds is much more valuable. I am
    not happy with a project build until I am getting identical binaries
    built on multiple hosts (Windows and Linux). That's how you can be
    absolutely sure of what code went into a particular binary, even years
    or decades later.

    A compromise that can work is to distinguish development builds and
    production builds, and have timestamping in development builds. That
    also reduces the rate at which your minor version number or build number
    goes up, and avoids endless changes to your "version.h" include file.
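    One way the development/production split could look in C, as a sketch: only development builds define a hypothetical BUILD_EPOCH macro (for example from the makefile), so production images carry no timestamp and stay reproducible. The macro and symbol names are illustrative.

```c
#include <stdint.h>

/* Hypothetical scheme: a development build passes something like
   -DBUILD_EPOCH=$(date +%s) on the compiler command line; a production
   build leaves BUILD_EPOCH undefined, so identical sources give
   identical binaries on every host. */
#ifdef BUILD_EPOCH
#define FW_BUILD_EPOCH ((uint64_t)(BUILD_EPOCH))
#else
#define FW_BUILD_EPOCH ((uint64_t)0)   /* 0 = reproducible build, no timestamp */
#endif

/* Seconds since 1970-01-01 00:00 UTC, or 0 for a production build. */
const uint64_t fw_build_epoch = FW_BUILD_EPOCH;

int fw_is_dev_build(void)
{
    return fw_build_epoch != 0;
}
```

    Note that this still stamps the time the defining module was compiled, not the time of the final link.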




  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 28 09:24:17 2023
    From Newsgroup: comp.arch.embedded

    On 27/04/2023 18:26, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing.  If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this?  It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is calculated.
    I am not aware of any linker which supports this.

    Two months ago, I added the DIGEST directive to binutils aka the GNU
    linker. It was committed, but then people realized that I had not signed
    an agreement with Free Software Foundation.
    Since part of the code I pushed was from a third party which released
    their code under MIT, the licensing has not been resolved yet.
    The patch is in binutils git, but reverted.

    You would write (IIRC):
       DIGEST "CRC64-ECMA", (from, to)
    and the linker would reserve 8 bytes which is filled with the CRC in the final link stage.

    /Ulf


    I like that. Thanks for doing that work.

    Is there also a way to get the length of the final link, and insert it
    near the beginning of the image? I suppose that would be another kind
    of DIGEST where the algorithm is simply (to - from). (I assume that
    "to" and "from" may be linker symbols.)


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 28 09:33:49 2023
    From Newsgroup: comp.arch.embedded

    On 27/04/2023 20:29, Niklas Holsti wrote:
    On 2023-04-27 20:09, Rick C wrote:
    On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.

    That assumes there is a linker.


    Almost all toolchains have a linker.


    It is possible that Rick is using Forth, rather than C (or other
    languages traditionally compiled in a similar manner, such as C++ and
    Ada). There are also some commercial C toolchains for brain-dead 8-bit
    CISC devices that are monolithic and offer very little control over the linking.

    Ulf is correct that the ideal place to handle this is part of the
    linking process. I do it with a post-link Python script run during the
    build, because the linkers I use can't handle this at the moment. But
    if Ulf's patch works its way into binutils then I'll be able to do it
    directly during linking, which is neater. (I will still have post-link scripts to handle things like renaming image files according to version, making zips for sending to customers, etc. - linkers can't do
    /everything/ !)


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 28 09:38:17 2023
    From Newsgroup: comp.arch.embedded

    On 27/04/2023 22:44, Ulf Samuelsson wrote:
    On 2023-04-27 at 20:29, Niklas Holsti wrote:
    On 2023-04-27 20:09, Rick C wrote:



    You are making a lot of assumptions about the tools.  I'm pretty sure
    they don't apply to my case.  I'm not at all clear how this is
    workable, anyway.  Adding the checksum to the file, changes the
    checksum, which is where this conversation started... unless I'm
    missing something significant.
    No, you reserve room for the checksum, but that needs to be outside
    the checked area.
    The address of the checksum needs to be known to the application.

    The address here could have a symbol, and then declared "extern" in the
    C code - it would not have to be a known numerical address. But if the
    image is checked or started from another program (such as a boot
    program), you need an absolute address somewhere to chain this all together.

    Also the limits of the checked area.
    That is why the application has a header in front in my projects.
    The application is started by the bootloader, which checks
    a number of things before the application is started.
    The application can read the header as well to allow checking
    the code area at runtime.


    Or for my preferences, the CRC "DIGEST" would be put at the end of the
    image, rather than near the start. Then the "from, to" range would
    cover the entire image except for the final CRC. But I'd have a similar directive for the length of the image at a specific area near the start.
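    A sketch of the check for this layout, assuming a little-endian 32-bit image length at a fixed, hypothetical offset near the start, and a CRC-32 in the last four bytes covering everything before it. The names, offset, and choice of CRC are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEN_OFF 0x20u   /* hypothetical offset of the 32-bit length field */

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t get32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* True if the image length stored near the start is plausible and the
   trailing CRC matches everything before it (length field included). */
bool tail_crc_ok(const uint8_t *flash, size_t flash_size)
{
    if (flash_size < LEN_OFF + 4)
        return false;
    uint32_t len = get32le(flash + LEN_OFF);
    if (len < LEN_OFF + 8 || len > flash_size)  /* room for length field and CRC */
        return false;
    return crc32_ieee(flash, len - 4) == get32le(flash + len - 4);
}
```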

  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Fri Apr 28 10:35:20 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-28 at 09:12, David Brown wrote:
    On 27/04/2023 18:36, Ulf Samuelsson wrote:
    On 2023-04-20 at 22:26, David Brown wrote:
    On 20/04/2023 18:45, Rick C wrote:
    On Thursday, April 20, 2023 at 11:33:28 AM UTC-4, George Neuner
    wrote:
    On Wed, 19 Apr 2023 19:06:33 -0700 (PDT), Rick C
    <gnuarm.del...@gmail.com> wrote:

    This is a bit of the chicken and egg thing. If you want to embed
    a checksum in a code module to report the checksum, is there a
    way of doing this? It's a bit like being your own grandfather, I
    think.
    Take a look at the old xmodem/ymodem CRC. It was designed such
    that when the CRC was sent immediately following the data, a
    receiver computing CRC over the whole incoming packet (data and CRC
    both) would get a result of zero.

    But AFAIK it doesn't work with CCITT equation(s) - you have to use
    xmodem/ymodem.
    I'm not thinking anything too fancy, like a CRC, but rather a
    simple modulo N addition, maybe N being 2^16.
    Sorry, I don't know a way to do it with a modular checksum. YMMV,
    but I think 16-bit CRC is pretty simple.

    George

    CRC is not complicated, but I would not know how to calculate an
    inserted value to force the resulting CRC to zero.  How do you do
    that?

    You "insert" the value at the end.  Anything else is insane.

    In all projects I have been involved with, the application binary starts
    with a header looking like this.


    MAGIC WORD 1
    CRC
    Entry Point
    Size
    other info...
    MAGIC WORD 2
    APPLICATION_START
    ...
    APPLICATION_END (aligned with flash sector)


    The bootloader first checks the two magic words.
    It then computes CRC on the header (from Entry Point) to APPLICATION_END

    I ported the IAR ielftool (open source) to Linux at
    https://github.com/emagii/ielftool

    This can insert the CRC in the ELF file, but needs tweaks to work
    with an ELF file generated by the GNU tools.

    /Ulf

    That can work for some microcontrollers, but is unsuitable for others -
    it depends on how the flash is organised.  For an msp430, for example,
    it would be fine, as the interrupt vectors (including the reset vector)
    are at the end of flash.  But for most ARM Cortex M devices, it would
    not be suitable - they expect the reset vector and initial stack pointer
    at the start of the flash image.  Some devices have a boot ROM, and then you have to match their specifics for the header - or you can have your
    own boot program, and make the header how ever you like.


    All projects I am involved with have a custom bootloader.
    If there is a problem with the reset vector, then the program will fail immediately.
    The CRC is right after the initial vector table.
    The bootloader application contains a copy of the vector table.

    The first thing the bootloader does is check the CRC, computed over the area starting
    right after the CRC word. Then it compares the vector table with the copy.

    The header is only for the application.


    I am absolutely a fan of having some kind of header like this (and
    sometimes even a human-readable copyright notice, identifier and version information).  And having it as near the beginning as possible is good.
    But for many microcontrollers, having it at the start is not feasible.
    And if you can't put the CRC at the start like you do, you have to put
    it at the end of the image.


    I've never really thought about trying to inject a CRC into an elf file.
     I use elfs (or should that be "elves" ?) for debugging, not flash programming.  And usually the main concern for having a CRC at the end
    of the image is when you have an online update of some kind, to check
    that nothing has gone wrong during the transfer or in-field update.



    The last bootloader I wrote downloads using Y-Modem, which has CRC
    checking. Since it had more RAM than internal flash, the whole
    application was downloaded to RAM first, and then when everything is OK,
    the flash can be programmed. Finally, the header is analyzed and the
    flash contents checked. There is absolutely no need to have the CRC at
    the end since the CRC result is stored in a known location.

    /Ulf


  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Fri Apr 28 10:44:00 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-28 at 09:20, David Brown wrote:
    On 27/04/2023 18:42, Ulf Samuelsson wrote:
    On 2023-04-22 at 05:14, Rick C wrote:
    On Friday, April 21, 2023 at 11:02:28 AM UTC-4, David Brown wrote:
    On 21/04/2023 14:12, Rick C wrote:

    This is simply to be able to say this version is unique, regardless
    of what the version number says. Version numbers are set manually
    and not always done correctly. I'm looking for something as a backup
    so that if the checksums are different, I can be sure the versions
    are not the same.

    The less work involved, the better.

    Run a simple 32-bit crc over the image. The result is a hash of the
    image. Any change in the image will show up as a change in the crc.
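Here is a sketch of "a simple 32-bit crc over the image". It assumes the common CRC-32 used by zip and Ethernet: reflected polynomial 0xEDB88320, with initial value and final XOR both 0xFFFFFFFF. The function name is made up. A bitwise loop is slow but tiny, which is usually fine for a one-off check:

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (zip/Ethernet parameters), bitwise, LSB first. */
uint32_t crc32_image(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For two unrelated images, the chance of the 32-bit values matching is roughly 1 in 4 billion, well beyond the 1 in 64k mentioned below. Any single-bit change is always detected.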

    No one is trying to detect changes in the image.  I'm trying to label
    the image in a way that can be read in operation.  I'm using the
    checksum simply because that is easy to generate.  I've had problems
    with version numbering in the past.  It will be used, but I want it
    supplemented with a number that will change every time the design
    changes, at least with a high probability, such as 1 in 64k.


    Another thing I added (which was later removed) was a timestamp directive:
    a 64-bit integer with the number of seconds since 1970-01-01 00:00.


    Timestamping a build in some way (as part of the "make", using __DATE__
    or __TIME__ in source code, or some feature of a revision control
    system) is very tempting, and can be helpful for tracking exactly what
    code you have on the system.

    However, IMHO having reproducible builds is much more valuable.  I am
    not happy with a project build until I am getting identical binaries
    built on multiple hosts (Windows and Linux).  That's how you can be absolutely sure of what code went into a particular binary, even years
    or decades later.

    With the timestamp located in the header, you can simply compare the non-header area.
    Using __DATE__ or __TIME__ in the make will tell you when that module was
    compiled, not when the final program was generated.
    That is why the TIMESTAMP is best generated in the linker.

    /Ulf
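If the header holding the link-time timestamp is a fixed-size block at the start of the image, "compare the non-header area" is just a byte comparison that skips it. This sketch assumes exactly that layout. same_code and header_size are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if two images are identical apart from their first header_size
 * bytes (e.g. a header holding a link-time timestamp), else 0. */
int same_code(const uint8_t *a, size_t a_len,
              const uint8_t *b, size_t b_len, size_t header_size)
{
    if (a_len != b_len || a_len < header_size)
        return 0;
    return memcmp(a + header_size, b + header_size, a_len - header_size) == 0;
}
```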


    A compromise that can work is to distinguish development builds and production builds, and have timestamping in development builds.  That
    also reduces the rate at which your minor version number or build number goes up, and avoids endless changes to your "version.h" include file.





  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Fri Apr 28 10:50:02 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-28 at 09:38, David Brown wrote:
    On 27/04/2023 22:44, Ulf Samuelsson wrote:
    On 2023-04-27 at 20:29, Niklas Holsti wrote:
    On 2023-04-27 20:09, Rick C wrote:



    You are making a lot of assumptions about the tools.  I'm pretty sure
    they don't apply to my case.  I'm not at all clear how this is
    workable, anyway.  Adding the checksum to the file, changes the
    checksum, which is where this conversation started... unless I'm
    missing something significant.
    No, you reserve room for the checksum, but that needs to be outside
    the checked area.
    The address of the checksum needs to be known to the application.

    The address here could have a symbol, and then declared "extern" in the
    C code - it would not have to be a known numerical address.  But if the image is checked or started from another program (such as a boot
    program), you need an absolute address somewhere to chain this all
    together.

    The header is declared as a struct.


    Also the limits of the checked area.
    That is why the application has a header in front in my projects.
    The application is started by the bootloader, which checks
    a number of things before the application is started.
    The application can read the header as well to allow checking
    the code area at runtime.


    Or for my preferences, the CRC "DIGEST" would be put at the end of the image, rather than near the start.  Then the "from, to" range would
    cover the entire image except for the final CRC.  But I'd have a similar directive for the length of the image at a specific area near the start.


    I really do not see a benefit of splitting the meta information about
    the image to two separate locations.

    The bootloader uses the struct for all checks.
    It is a much simpler implementation once the tools support it.

    You might find it easier to write a tool which adds the CRC at the end,
    but that is a different issue.

    Occam's Razor!

    /Ulf
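This sketches the header-in-front arrangement Ulf describes. It assumes a made-up header layout (magic, checked range, CRC-32, timestamp) and a bootloader that sees the image as a byte array. None of the names or the magic value come from Ulf's projects:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout and magic value, for illustration only. */
#define APP_MAGIC 0x41505031u /* arbitrary marker */

typedef struct {
    uint32_t magic;     /* marks a programmed application */
    uint32_t start;     /* offset of the first checked byte (after the header) */
    uint32_t end;       /* offset one past the last checked byte */
    uint32_t crc;       /* CRC-32 over [start, end) */
    uint64_t timestamp; /* link time, seconds since 1970; not covered by the CRC */
} app_header_t;

/* Standard CRC-32 (zip/Ethernet parameters), bitwise. */
static uint32_t crc32_calc(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Bootloader-side check. Returns 0 if the application may be started,
 * or a negative value identifying the check that failed. */
int boot_check(const uint8_t *image, size_t image_size)
{
    app_header_t h;
    if (image_size < sizeof h)
        return -1;
    memcpy(&h, image, sizeof h); /* copy out to avoid unaligned access */
    if (h.magic != APP_MAGIC)
        return -2;
    if (h.start < sizeof h || h.end < h.start || h.end > image_size)
        return -3;
    if (crc32_calc(image + h.start, h.end - h.start) != h.crc)
        return -4;
    return 0;
}
```

The application can run the same check on itself at run time, since it can read the header too.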

  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Fri Apr 28 10:54:08 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-28 at 00:10, Niklas Holsti wrote:
    On 2023-04-27 23:36, Ulf Samuelsson wrote:
    On 2023-04-27 at 19:09, Rick C wrote:
    On Thursday, April 27, 2023 at 12:26:47 PM UTC-4, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing. If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this? It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.

    That assumes there is a linker.  How does the application access this
    information?


    Linker command file
             public CRC64; start, stop
             HEADER = .;
             QUAD(MAGIC);
             CRC64 = .;
             DIGEST "CRC64-ECMA", (start, stop)
             start = .;
             # Your data to be protected
             ...
             stop = .;

    C source code.

    extern uint64_t CRC64;
    extern char* start;
    extern char* stop;

    uint64_t crc64;

    crc64 = calc_crc64_ecma(start, stop);
    if (crc64 == CRC64) {
        /* everything is OK */
    }


    I'm nit-picking, but that C code does not look right to me. The extern declarations for "start" and "stop" claim them to be names of memory locations that contain addresses, but the linker file just places them
    at the starting and one-past-end locations of the block to be protected.
    So the "start" variable contains the first bytes of the "data to be protected", and the contents of the "stop" variable are not defined
    because it is placed after the "data to be protected", where no code or
    data is loaded (it seems).

    It seems to me that the call to calc_crc64_ecma should get the addresses
    of "start" and "stop" as arguments (&start, &stop), instead of their
    values. But perhaps calc_crc64_ecma is not a function, but a macro that
    can itself take the addresses of its parameters.

    Whatever,
    I did not put a lot of thought into that, and certainly did not check
    it. The important thing is that you can declare labels in the linker
    and use them in the code through extern declarations.
    /Ulf
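Here is Niklas's correction in code form. If the linker labels are declared as arrays, their names already are the addresses and can be passed straight to the CRC routine. The sketch rests on two assumptions. First, it takes "CRC64-ECMA" to mean CRC-64/ECMA-182 (polynomial 0x42F0E1EBA9EA3693, initial value 0, no reflection, no final XOR), which the thread does not confirm for the DIGEST directive. Second, it assumes calc_crc64_ecma takes a begin/end pair. A host array stands in for the linked region so the code can run anywhere:

```c
#include <stddef.h>
#include <stdint.h>

/* In firmware, the linker-script labels would be declared as arrays, e.g.
 *     extern const uint8_t start[], stop[];
 *     extern const uint64_t CRC64;
 * so "start" and "stop" already are addresses and no '&' is needed. */

/* CRC-64/ECMA-182: MSB first, init 0, no final XOR (an assumption about
 * what "CRC64-ECMA" means here). */
uint64_t calc_crc64_ecma(const uint8_t *from, const uint8_t *to)
{
    uint64_t crc = 0;
    for (const uint8_t *p = from; p < to; p++) {
        crc ^= (uint64_t)*p << 56;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000000000000000ULL)
                      ? (crc << 1) ^ 0x42F0E1EBA9EA3693ULL
                      : (crc << 1);
    }
    return crc;
}

/* Hypothetical check; in firmware: image_ok(start, stop, CRC64). */
int image_ok(const uint8_t *from, const uint8_t *to, uint64_t stored)
{
    return calc_crc64_ecma(from, to) == stored;
}
```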
  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Fri Apr 28 10:56:01 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-28 at 09:24, David Brown wrote:
    On 27/04/2023 18:26, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing.  If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this?  It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.
    I am not aware of any linker which supports this.

    Two months ago, I added the DIGEST directive to binutils, aka the GNU
    linker. It was committed, but then people realized that I had not signed
    an agreement with the Free Software Foundation.
    Since part of the code I pushed was from a third party which released
    their code under MIT, the licensing has not been resolved yet, so
    the patch is in binutils git, but reverted.

    You would write (IIRC):
        DIGEST "CRC64-ECMA", (from, to)
    and the linker would reserve 8 bytes which is filled with the CRC in
    the final link stage.

    /Ulf


    I like that.  Thanks for doing that work.

    Is there also a way to get the length of the final link, and insert it
    near the beginning of the image?  I suppose that would be another kind
    of DIGEST where the algorithm is simply (to - from).  (I assume that
    "to" and "from" may be linker symbols.)



    app_size = .;
    LONG(to-from);

    should work using the GNU linker.
    /Ulf

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 28 15:04:45 2023
    From Newsgroup: comp.arch.embedded

    On 28/04/2023 10:50, Ulf Samuelsson wrote:
    On 2023-04-28 at 09:38, David Brown wrote:
    On 27/04/2023 22:44, Ulf Samuelsson wrote:
    On 2023-04-27 at 20:29, Niklas Holsti wrote:
    On 2023-04-27 20:09, Rick C wrote:



    You are making a lot of assumptions about the tools.  I'm pretty sure
    they don't apply to my case.  I'm not at all clear how this is
    workable, anyway.  Adding the checksum to the file, changes the
    checksum, which is where this conversation started... unless I'm
    missing something significant.
    No, you reserve room for the checksum, but that needs to be outside
    the checked area.
    The address of the checksum needs to be known to the application.

    The address here could have a symbol, and then declared "extern" in
    the C code - it would not have to be a known numerical address.  But
    if the image is checked or started from another program (such as a
    boot program), you need an absolute address somewhere to chain this
    all together.

    The header is declared as a struct.


    Also the limits of the checked area.
    That is why the application has a header in front in my projects.
    The application is started by the bootloader, which checks
    a number of things before the application is started.
    The application can read the header as well to allow checking
    the code area at runtime.


    Or for my preferences, the CRC "DIGEST" would be put at the end of the
    image, rather than near the start.  Then the "from, to" range would
    cover the entire image except for the final CRC.  But I'd have a
    similar directive for the length of the image at a specific area near
    the start.


    I really do not see a benefit of splitting the meta information about
    the image to two separate locations.

    The bootloader uses the struct for all checks.
    It is a much simpler implementation once the tools support it.

    You might find it easier to write a tool which adds the CRC at the end,
    but that is a different issue.

    Occam's Razor!


    There are different needs for different projects - and more than one way
    to handle them. I find adding a CRC at the end of the image works best
    for me, but I have no problem appreciating that other people have
    different solutions.




  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Fri Apr 28 15:09:00 2023
    From Newsgroup: comp.arch.embedded

    On 28/04/2023 10:56, Ulf Samuelsson wrote:
    On 2023-04-28 at 09:24, David Brown wrote:
    On 27/04/2023 18:26, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing.  If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this?  It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC is
    calculated.
    I am not aware of any linker which supports this.

    Two months ago, I added the DIGEST directive to binutils, aka the GNU
    linker. It was committed, but then people realized that I had not signed
    an agreement with the Free Software Foundation.
    Since part of the code I pushed was from a third party which released
    their code under MIT, the licensing has not been resolved yet, so
    the patch is in binutils git, but reverted.

    You would write (IIRC):
        DIGEST "CRC64-ECMA", (from, to)
    and the linker would reserve 8 bytes which is filled with the CRC in
    the final link stage.

    /Ulf


    I like that.  Thanks for doing that work.

    Is there also a way to get the length of the final link, and insert it
    near the beginning of the image?  I suppose that would be another kind
    of DIGEST where the algorithm is simply (to - from).  (I assume that
    "to" and "from" may be linker symbols.)



       app_size = .;
       LONG(to-from);

    should work using the GNU linker.

    Will that work when placed earlier in the link than the definition of
    "to" ? I had assumed - perhaps completely incorrectly - that the linker
    would have to have established the value of "to" before its use in such
    an expression.


  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Sat Apr 29 23:02:15 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-28 at 15:09, David Brown wrote:
    On 28/04/2023 10:56, Ulf Samuelsson wrote:
    On 2023-04-28 at 09:24, David Brown wrote:
    On 27/04/2023 18:26, Ulf Samuelsson wrote:
    On 2023-04-20 at 04:06, Rick C wrote:
    This is a bit of the chicken and egg thing.  If you want to embed a
    checksum in a code module to report the checksum, is there a way of
    doing this?  It's a bit like being your own grandfather, I think.

    The proper way to do this is to have a directive in the linker.
    This reserves space for the CRC and defines the area where the CRC
    is calculated.
    I am not aware of any linker which supports this.

    Two months ago, I added the DIGEST directive to binutils, aka the GNU
    linker. It was committed, but then people realized that I had not signed
    an agreement with the Free Software Foundation.
    Since part of the code I pushed was from a third party which released
    their code under MIT, the licensing has not been resolved yet, so
    the patch is in binutils git, but reverted.

    You would write (IIRC):
        DIGEST "CRC64-ECMA", (from, to)
    and the linker would reserve 8 bytes which is filled with the CRC in
    the final link stage.

    /Ulf


    I like that.  Thanks for doing that work.

    Is there also a way to get the length of the final link, and insert
    it near the beginning of the image?  I suppose that would be another
    kind of DIGEST where the algorithm is simply (to - from).  (I assume
    that "to" and "from" may be linker symbols.)



        app_size = .;
        LONG(to-from);

    should work using the GNU linker.

    Will that work when placed earlier in the link than the definition of
    "to" ?  I had assumed - perhaps completely incorrectly - that the linker would have to have established the value of "to" before its use in such
    an expression.

    Yes. The GNU linker first builds a list of statements, each of known
    size, updating the location counter for each statement.

    The expressions are evaluated at a later stage,
    so you can add a DIGEST statement, and compute the size with
    "LONG(to-from);", before "to" and "from" are defined.

    /Ulf




  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Sat Apr 29 23:03:16 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-28 at 15:04, David Brown wrote:
    On 28/04/2023 10:50, Ulf Samuelsson wrote:
    On 2023-04-28 at 09:38, David Brown wrote:
    On 27/04/2023 22:44, Ulf Samuelsson wrote:
    On 2023-04-27 at 20:29, Niklas Holsti wrote:
    On 2023-04-27 20:09, Rick C wrote:



    You are making a lot of assumptions about the tools.  I'm pretty sure
    they don't apply to my case.  I'm not at all clear how this is
    workable, anyway.  Adding the checksum to the file, changes the
    checksum, which is where this conversation started... unless I'm
    missing something significant.
    No, you reserve room for the checksum, but that needs to be outside
    the checked area.
    The address of the checksum needs to be known to the application.

    The address here could have a symbol, and then declared "extern" in
    the C code - it would not have to be a known numerical address.  But
    if the image is checked or started from another program (such as a
    boot program), you need an absolute address somewhere to chain this
    all together.

    The header is declared as a struct.


    Also the limits of the checked area.
    That is why the application has a header in front in my projects.
    The application is started by the bootloader, which checks
    a number of things before the application is started.
    The application can read the header as well to allow checking
    the code area at runtime.


    Or for my preferences, the CRC "DIGEST" would be put at the end of
    the image, rather than near the start.  Then the "from, to" range
    would cover the entire image except for the final CRC.  But I'd have
    a similar directive for the length of the image at a specific area
    near the start.


    I really do not see a benefit of splitting the meta information about
    the image to two separate locations.

    The bootloader uses the struct for all checks.
    It is a much simpler implementation once the tools support it.

    You might find it easier to write a tool which adds the CRC at the
    end, but that is a different issue.

    Occam's Razor!


    There are different needs for different projects - and more than one way
    to handle them.  I find adding a CRC at the end of the image works best
    for me, but I have no problem appreciating that other people have
    different solutions.




    I'd be curious to know WHY it works best for you.
    /Ulf
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Sun Apr 30 16:19:41 2023
    From Newsgroup: comp.arch.embedded

    On 29/04/2023 23:03, Ulf Samuelsson wrote:
    On 2023-04-28 at 15:04, David Brown wrote:
    On 28/04/2023 10:50, Ulf Samuelsson wrote:
    On 2023-04-28 at 09:38, David Brown wrote:


    Or for my preferences, the CRC "DIGEST" would be put at the end of
    the image, rather than near the start.  Then the "from, to" range
    would cover the entire image except for the final CRC.  But I'd have
    a similar directive for the length of the image at a specific area
    near the start.


    I really do not see a benefit of splitting the meta information about
    the image to two separate locations.

    The bootloader uses the struct for all checks.
    It is a much simpler implementation once the tools support it.

    You might find it easier to write a tool which adds the CRC at the
    end, but that is a different issue.

    Occam's Razor!


    There are different needs for different projects - and more than one
    way to handle them.  I find adding a CRC at the end of the image works
    best for me, but I have no problem appreciating that other people have
    different solutions.




    I'd be curious to know WHY it works best for you.
    /Ulf

    Often I do not have a bootloader, and then I am not free to put a CRC at
    the start of the image. And if the bootloader itself needs to be updatable,
    it is again impossible to have the CRC (or any other metadata) at the
    start of the image. I want most of the metadata to be at a fixed
    location as close to the start as reasonably practical (such as after
    the vector table, or other microcontroller-specific information that
    might be used for flash security, early chip setup, etc.). If I am to
    have one single checksum for the image, which is what I prefer, then it
    has to be at the end of the image. For example, there might be:

    0x00000000 : vectors
    0x00000400 : external flash configuration block
    0x00000600 : program info metadata
    0x00001000 : main program
    : CRC

    There is no way to have the metadata or CRC at the start of the image,
    so the CRC goes at the end.
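This sketches the CRC-at-the-end scheme for the layout above. It assumes the image length is stored as a little-endian 32-bit value at the start of the metadata block, and that a standard CRC-32 is appended little-endian right after the image. finish_image and image_valid are hypothetical names for a post-link step and a start-up check:

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets from the example layout; the length field's position and
 * encoding inside the metadata block are assumptions. */
#define META_OFFSET 0x0600u /* program info metadata (length stored here) */
#define MAIN_OFFSET 0x1000u /* start of the main program */

/* Standard CRC-32 (zip/Ethernet parameters), bitwise. */
static uint32_t image_crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Post-link step: store the length near the start, then append one CRC
 * covering everything before it (metadata included). img needs room for
 * len + 4 bytes; returns the final image size. */
size_t finish_image(uint8_t *img, size_t len)
{
    put_le32(img + META_OFFSET, (uint32_t)len);
    put_le32(img + len, image_crc32(img, len));
    return len + 4;
}

/* Start-up check: the length in the metadata says where the CRC is. */
int image_valid(const uint8_t *img, size_t max_size)
{
    if (max_size < MAIN_OFFSET + 4)
        return 0;
    uint32_t len = get_le32(img + META_OFFSET);
    if (len < MAIN_OFFSET || len > max_size - 4)
        return 0;
    return image_crc32(img, len) == get_le32(img + len);
}
```

When a standard CRC-32 is appended little-endian like this, the CRC over the image plus its trailer always comes out as the constant 0x2144DF1C. A checker could therefore also run over the whole thing and compare against that value.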

    It would be possible to have two CRCs - one that covers the vectors, configuration information, and metadata and is placed second last in the metadata block. A second CRC placed last in the metadata block would
    cover the main program - everything after the CRCs. That would let me
    have a single metadata block and no CRC at the end of the image.
    However, it would mean splitting the check in two, rather than one check
    for the whole image. I don't see that as a benefit.


    When making images that are started from a bootloader, I certainly
    /could/ put the CRC at the start. But I see no particular reason to do
    so - it makes a lot more sense to keep a similar format.

    (Bootloaders don't often have to check their own CRC - after all, even
    if the CRC fails there is usually little you can do about it, except
    charge on and hope for the best. But if the bootloader is updatable in-system, then you want a CRC during the download procedure to check that
    you have got a good download copy before updating the flash.)







  • From Don Y@blockedofcourse@foo.invalid to comp.arch.embedded on Wed May 3 00:15:27 2023
    From Newsgroup: comp.arch.embedded

    On 4/24/2023 7:37 AM, David Brown wrote:
    On 24/04/2023 09:32, Don Y wrote:
    On 4/22/2023 7:57 AM, David Brown wrote:
    However, in almost every case where CRC's might be useful, you have
    additional checks of the sanity of the data, and an all-zero or all-one
    data block would be rejected.  For example, Ethernet packets use CRC for
    integrity checking, but an attempt to send a packet type 0 from MAC
    address 00:00:00:00:00:00 to address 00:00:00:00:00:00, of length 0, would
    be rejected anyway.

    Why look at "data" -- which may be suspect -- and *then* check its CRC?
    Run the CRC first.  If it fails, decide how you are going to proceed
    or recover.

    That is usually the order, yes.  Sometimes you want "fail fast", such as
    dropping a packet that was not addressed to you (it doesn't matter if it was
    received correctly but for someone else, or it was addressed to you but the
    receiver address was corrupted - you are dropping the packet either way).
    But usually you will run the CRC then look at the data.

    But the order doesn't matter - either way, you are still checking for valid
    data, and if the data is invalid, it does not matter if the CRC only passed
    by luck or by all zeros.
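David's all-zeros remark can be made concrete. With an initial value of 0 and no final XOR, a CRC over any number of zero bytes is 0. A zeroed block with a zero CRC field therefore passes, whatever its length. A non-zero initial value avoids this. The sketch below uses the 16-bit CCITT polynomial 0x1021: init 0x0000 gives the XMODEM variant, and 0xFFFF gives the one often called CCITT-FALSE. The function name is illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16 with the CCITT polynomial 0x1021, MSB first, no final XOR,
 * and a caller-chosen initial value. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len, uint16_t init)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}
```

With init 0x0000, blocks of 10 or 20 zero bytes both give 0. With init 0xFFFF, they give non-zero values that differ from each other.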

    You're assuming the CRC is supposed to *vouch* for the data.
    The CRC can be there simply to vouch for the *transport* of a
    datagram.

    I am assuming that the CRC is there to determine the integrity of the data in
    the face of possible unintentional errors.  That's what CRC checks are for.
    They have nothing to do with the content of the data, or the type of the data
    package or image.

    Exactly. And, a CRC on *a* protocol can use ANY ALGORITHM that the protocol defines. Not some "canned one-size fits all" approach.

    As an example of the use of CRC's in messaging, look at Ethernet frames:

    <https://en.wikipedia.org/wiki/Ethernet_frame>

    The CRC  does not care about the content of the data it protects.

    AND, if the packet yielded an incorrect CRC, you can assume the
    data was corrupt... OR, you are looking at a different protocol
    and MISTAKING it for something that you *think* it might be.

    If I produce a stream of data, can you tell me what the checksum
    for THAT stream *should* be? You have to either be told what
    it is (and have a way of knowing what the checksum SHOULD be)
    *or* have to make some assumptions about it.

    If you have assumed wrong *or* if the data has been corrupted, then
    the CRC should fail. You don't care why it failed -- because you
    can't do anything about it. You just know that you can't use the data
    in the way you THOUGHT it could be used.

    So, use a version-specific CRC on the packet.  If it fails, then
    either the data in the packet has been corrupted (which could just
    as easily have involved an embedded "interface version" parameter);
    or the packet was formed with the wrong CRC.

    If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then
    why bother looking at a "protocol version" parameter?  Would
    you ALSO want to verify all the rest of the parameters?

    I'm sorry, I simply cannot see your point.  Identifying the version of a protocol, or other protocol type information, is a totally orthogonal task to
    ensuring the integrity of the data.  The concepts should be handled separately.

    It is. A packet using protocol XYZ is delivered to port ABC.
    Port ABC *only* handles protocol XYZ. Anything else arriving there,
    with a potentially different checksum, is invalid. Even if, for example,
    byte number 27 happens to have the correct "magic number" for that
    protocol.

    Because the message doesn't obey the rules defined by the protocol
    FOR THAT PORT. What do I gain by insisting that byte number 27 must
    be 0x5A that the CRC doesn't already tell me?

    You are assuming the CRC has to identify the protocol. I didn't say that.
    All I said was the CRC has to be correct for THAT protocol.

    You likely don't use the same algorithm to compute the checksum of
    a boot image as you do to verify the integrity of an Ethernet datagram.
    So, if you were presented with a stream of data, you wouldn't
    arbitrarily decide to try different CRCs to see which yielded correct
    results and, from that, *infer* the nature of the message.

    Why would you think I wouldn't expect *a* particular protocol to use
    a particular CRC?

    What term would you have me use to indicate a "bias" applied to a CRC
    algorithm?

    Well, first I'd note that any kind of modification to the basic CRC
    algorithm is pointless from the viewpoint of its use as an integrity check.
    (There have been, mostly historically, some justifications in terms of
    implementation efficiency.  For example, bit and byte re-ordering could be
    done to suit hardware bit-wise implementations.)

    Otherwise I'd say you are picking a specific initial value if that is what
    you are doing, or modifying the final value (inverting it or xor'ing it with
    a fixed value).  There are, AFAIK, no specific terms for these - and I don't
    see any benefit in having one.  Misusing the term "salt" from cryptography
    is certainly not helpful.

    Salt just ensures that you can differentiate between functionally identical
    values.  I.e., in a CRC, it differentiates the "0x0000" that CRC-1
    generates from the "0x0000" that CRC-2 generates.

    Can we agree that this is called an "initial value", not "salt" ?

    It depends on how you implement it. The point is to produce
    different results for the same polynomial.

    You don't see the parallel to ensuring that *my* use of "Passw0rd" is
    encoded in a different manner than *your* use of "Passw0rd"?

    No.  They are different things.

    An important difference is that adding "salt" to a password hash is an important security feature.  Picking a different initial value for a CRC instead of having appropriate protocol versioning in the data (or a surrounding
    envelope) is a misfeature.

    And you don't see verifying that a packet of data received at port ABC
    (which should only ever see the checksum associated with protocol XYZ)
    actually carries that checksum as being similarly related?

    Why not just assume the lower level protocols are sufficient to
    guarantee reliable delivery and, if something arrives at port ABC
    then, by definition, it must be intact (not corrupt) and, as
    nothing other than protocol XYZ *should* target that port, why
    even bother checking magic numbers in a protocol packet?

    You build these *superfluous* tests into products to ensure their
    integrity -- by catching ANYTHING that "can't happen" (yet
    somehow does).

    The second difference is the purpose of the hashing.  The CRC here is for data
    integrity - spotting mistakes in the data during transfer or storage.  The hash
    in a password is for security, avoiding the password ever being transmitted or
    stored in plain text.

    Any coincidence in the way these might be implemented is just that - coincidence.

    See the RMI description.

    I'm sorry, I have no idea what "RMI" is or where it is described. You've
    mentioned that abbreviation twice, but I can't figure it out.

    <https://en.wikipedia.org/wiki/RMI>
    <https://en.wikipedia.org/wiki/OCL>

    Nothing magical with either term.

    I looked up RMI on Wikipedia before asking, and saw nothing of relevance to CRC's or checksums.

    How do you think the marshalled arguments get from device A to (remote)
    device B? And, the result(s) from device B back to device A?

    Obviously *some* form of communication medium. So, some potential for
    data to be corrupted (or altered!) in transit. Along with other
    data streams to compete for those endpoints.

    Imagine invoking a function and, between the actual construction of the
    stack frame and the first line of code in the targeted function, "something" can interfere with the data you're trying to pass (and results you're
    hoping to eventually receive) as well as the actual function being targeted!

    You don't worry about this because the compiler handles all of the machinery AND it relies on the CPU being well-behaved; nothing can sneak in and
    disturb the address/data busses or alter register contents during this process.

    If, OTOH, such a possibility existed (as is the case with RPC/RMI), then
    you would want the compiler to generate the machinery to ensure the
    arguments get to the correct function and for the function to be able to
    ensure that the arguments are actually intended for it.

    If any of these things failed to happen, you'd panic() -- because there's nothing you can do, at that point. You certainly can't fix any corrupted values and can't deduce where they were intended to go (given that all
    of that information can be just as corrupt).

    With RPC/RMI, you can at least *know* that the "function linkage" failed
    to operate as expected ON THIS INVOCATION. Because the RPC/RMI can
    return a result indicating whether the linkage was intact *and*, if
    so, the result of the actual function invocation.

    If you deliver every packet to a single port, then the process listening
    to that port has to demultiplex incoming messages to determine the server-side stub to invoke for that message instance. You would likely use a standardized protocol because you don't know anything about the incoming message -- except that it is *supposed* to target a "remote procedure" (*local* to this node).

    OTOH, if you target each particular remote function/procedure/method to
    a function/procedure/method-SPECIFIC port, then how you handle "messages"
    for one function need have no bearing on how you handle them for others.
    And, you can exploit this as an added test to ensure the message you
    are receiving at port JKL actually *appears* to be intended for port
    JKL and not an accidental misdirect of a message intended for some
    other port.

    I noticed no mention of "OCL" in your posts, and looking

    You need to read more carefully.

    ---8<---8<---
    I can't think of any use-cases where you would be passing around a block of
    "pure" data that could reasonably take absolutely any value, without any
    type of "envelope" information, and where you would think a CRC check is
    appropriate.

    I append a *version specific* CRC to each packet of marshalled data
    in my RMIs. If the data is corrupted in transit *or* if the
    wrong version API ends up targeted, the operation will abend
    because we know the data "isn't right".

    Using a version-specific CRC sounds silly. Put the version information in
    the packet.

    The packet routed to a particular interface is *supposed* to
    conform to "version X" of an interface. There are different stubs
    generated for different versions of EACH interface. The OCL for
    the interface defines (and is used to check) the form of that
    interface to that service/mechanism.

    The parameters are checked on the client side -- why tie up the
    transport medium with data that is inappropriate (redundant)
    to THAT interface? Why tie up the server verifying that data?
    The stub generator can perform all of those checks automatically
    and CONSISTENTLY based on the OCL definition of that version
    of that interface (because developers make mistakes).

    So, at the instant you schedule the marshalled data for transmission,
    you *know* the parameters are "appropriate" and compliant with
    the constraints of THAT version of THAT interface.

    Now, you have to ensure the packet doesn't get corrupted (altered) in transmission. If it remains intact, then there is no need to check
    the parameters on the server side.

    NONE OF THE PARAMETERS... including the (implied) "interface version" field!

    Yet, folks make mistakes. So, you want some additional reassurance
    that this is at least intended for this version of the interface,
    ESPECIALLY IF THAT CAN BE MADE AVAILABLE FOR ZERO COST (i.e., check
    to see if the residual is 0xDEADBEEF instead of 0xB16B00B5).

    Why burden the packet with a "protocol version" parameter?
    ---8<---8<---

    it up on Wikipedia gives no clues.

    As I said, above:

    "If, OTOH, such a possibility existed (as is the case with RPC/RMI),
    then you would want the compiler to generate the machinery to ensure
    the arguments get to the correct function and for the function to be
    able to ensure that the arguments are actually intended for it."

    You would want the IDL (Interface Definition Language) compiler to
    generate stubs (client- and server-side) that enforced the constraints specified in the IDL and OCL.

    Again, in a perfect world, you'd not need any of these mechanisms.
    Data wouldn't be corrupted on the wire. Hostiles wouldn't try to
    subvert those messages. Developers would always ensure they
    adhered to the contracts laid out for each API. etc.

    "Yet, folks make mistakes."

    So for now, I'll assume you don't want anyone to know what you meant and I can
    safely ignore anything you write in connection with the terms.

    Perhaps other folks were more careful in their reading (of the quoted passage, above).

    OTOH, "salting" the calculation so that it is expected to yield
    a value of 0x13 means *those* situations will be flagged as errors
    (and a different set of situations will sneak by, undetected).

    And that gives you exactly /zero/ benefit.

    See above.

    I did.  Zero benefit.

    Perhaps your reading was as deficient there as you've admitted it to
    be elsewhere?

    Actually, it is worse than useless - it makes it harder to identify the protocol, and reduces the information content of the CRC check.

    You run your hash algorithm, and check for the single value that indicates
    no errors.  It does not matter if that number is 0, 0x13, or - often more
    -----------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    As you've admitted, it doesn't matter.  So, why wouldn't I opt to have
    an algorithm for THIS interface give me a result that is EXPECTED
    for this protocol?  What value picking "0"?

    A /single/ result does not matter (other than needlessly complicating things).
    Having multiple different valid results /does/ matter.

    For any CRC calculation instance, you *know* what the result is expected to be. How many different "check" algorithms do you think are operating in your
    PC as you type/read (i.e., all of the protocols between devices running
    in the box, all of the ROMs in those devices, the media accessed by them, etc.)? Has EVERY developer who needed a CRC settled on the "Holy Grail"
    of CRCs... because it's easiest? Or, have they each chosen schemes that
    they consider appropriate to their needs?

    I compute hashes of individual memory pages during reschedule()s.
    And, verify that they are intact when next accessed (because they
    may have been corrupted by a side-channel attack while not
    actively being accessed -- by the owning task -- despite the
    protections afforded by the MMU). Should I use the same "check"
    algorithm that I do when sending a message to another node?
    Or, that I use on the wire?

    Should I use the same algorithm when checking 4K pages as I would
    when checking 16MB pages? The goal isn't to *correct* errors so
    I'd want one that detects the greatest number of errors LIKELY
    INDUCED BY SUCH AN ATTACK (which can differ from the types of
    *burst* errors that corrupt packets on the wire or lead to
    read/write disturb errors in FLASH...)
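    As a rough illustration of the seal-then-verify pattern described above
    (hash a page when its owner is descheduled, re-check before the page is
    next used), here is a minimal sketch. The struct page_guard, the function
    names and the choice of 64-bit FNV-1a are assumptions made for the
    example; the post does not say which check is used. FNV-1a catches
    accidental changes cheaply but gives no protection against an attacker
    who knows the algorithm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page record kept by the scheduler. */
struct page_guard {
    const uint8_t *base;   /* start of the page */
    size_t         size;   /* e.g. 4 KiB or 16 MiB */
    uint64_t       digest; /* taken when the owner was descheduled */
};

/* 64-bit FNV-1a: cheap and sensitive to any single-byte change, but
 * NOT resistant to an attacker who knows the algorithm. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325);   /* FNV offset basis */
    while (n--) {
        h ^= *p++;
        h *= UINT64_C(0x100000001b3);            /* FNV 64-bit prime */
    }
    return h;
}

/* Call as the owning task is switched out. */
void page_guard_seal(struct page_guard *g)
{
    g->digest = fnv1a64(g->base, g->size);
}

/* Call before the page is next accessed; returns 1 if unchanged. */
int page_guard_intact(const struct page_guard *g)
{
    return fnv1a64(g->base, g->size) == g->digest;
}
```

    Swapping fnv1a64() for a different function per page size or per threat
    model is a local change, which is the kind of per-use choice the post is
    arguing for.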

    As I said, up-thread: "... you don't just use CRCs (secure hashes, etc.)
    on 'code images'"

    That is why you need to distinguish between the two possibilities. If you
    don't have to worry about malicious attacks, a 32-bit CRC takes a dozen
    lines of C code and a 1 KB table, all running extremely efficiently.  If
    security is an issue, you need digital signatures - an RSA-based signature
    system is orders of magnitude more effort in both development time and in
    run time.
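    For scale, a table-driven CRC-32 of the kind described might look like
    the sketch below. It assumes the common reflected CRC-32 parameters
    (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF, as used
    by Ethernet and zlib); the function names are illustrative. The table is
    256 entries of 4 bytes, i.e. the 1 KB mentioned.

```c
#include <stddef.h>
#include <stdint.h>

/* 256 entries x 4 bytes = 1 KB. */
static uint32_t crc32_table[256];

/* Fill the table once at startup (it could equally be a const array
 * generated offline and placed in flash). */
void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[i] = c;
    }
}

/* Reflected CRC-32: init 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = (crc >> 8) ^ crc32_table[(crc ^ *p++) & 0xFFu];
    return crc ^ 0xFFFFFFFFu;
}
```

    Call crc32_init_table() once, then crc32(buf, len). The standard check
    value for the ASCII string "123456789" is 0xCBF43926.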

    It's considerably more expensive AND not fool-proof -- esp if the
    attacker knows you are signing binaries.  "OK, now I need to find
    WHERE the signature is verified and just patch that "CALL" out
    of the code".

    I'm not sure if that is a straw-man argument, or just showing your ignorance
    of the topic.  Do you really think security checks are done by the program
    you are trying to send securely?  That would be like trying to have building
    security where people entering the building look at their own security cards.

    Do YOU really think we all design applications that run in PCs where some
    CLOSED OS performs these tests in a manner that can't be subverted?

    Do you bother to read my posts at all?  Or do you prefer to make up things that
    you imagine I write, so that you can make nonsensical attacks on them? Certainly there is no sane reading of my posts (written and sent from an /open/
    OS) where "do not rely on security by obscurity" could be taken to mean "rely
    on obscured and closed platforms".

    "Do you really think security checks are done by the program you are trying
    to send securely? That would be like trying to have building security where people entering the building look at their own security cards."

    Who *else* is involved in the acceptance/verification of a code image
    in an embedded product? (Not all "run Linux")

    *WE* (tend to) write ALL the code in the products developed, here.
    So, whether it's the POST WE wrote that is performing the test or
    the loader WE wrote, it's still *our* program.

    Yes, we ARE looking at our own security cards!

    Manufacturers *try* to hide ("obscurity") details of these mechanisms
    in an attempt to improve effective security.  But, there's nothing
    that makes these guarantees.

    Why are you trying to "persuade" me that manufacturer obscurity is a bad thing?  You have been promoting obscurity of algorithms as though it were helpful for security - I have made clear that it is not.  Are you getting your
    own position mixed up with mine?

    If the manufacturer saw no benefit to obscurity, then why embrace it?

    Give me the sources for Windows (Linux, *BSD, etc.) and I can
    subvert all the state-of-the-art digital signing used to ensure
    binaries aren't altered.  Nothing *outside* the box is involved
    so, by definition, everything I need has to reside *in* the box.

    No, you can't.  The sources for Linux and *BSD /are/ all freely available.  The
    private signing keys used by, for example, Red Hat or Debian, are /not/ freely
    available.  You cannot make changes to a Red Hat or Debian package that will
    pass the security checks - you are unable to sign the packages.

    Sure I can! If you are just signing a package to verify that it hasn't
    been tampered with BUT THE CONTENTS ARE NOT ENCRYPTED, then all you have
    to do is remove the signature check -- leaving the signature in the
    (unchecked) executable.

    This is different than *encrypting* the package (the OP said nothing
    about encrypting his executable).

    This is precisely because something /outside/ the box /is/ involved - the private half of the public/private key used for signing.  The public half - and
    all the details of the algorithms - is easily available to let people verify the signature, but the private half is kept secret.

    And, if I eliminate the check that verifies the signature, then what
    value signing? "Yes, I assume the risk of running an allegedly signed executable (THAT MAY HAVE BEEN TAMPERED WITH)."

    (Sorry, but I've skipped and snipped the rest.  I simply don't have time to go
    through it in detail.  If others find it useful or interesting, that's great,
    but there has to be limits somewhere.)

    The limits seem to be in your imagination. You believe there's *a* way
    of doing things instead of a multitude of ways, each with different
    tradeoffs. And, think you'll always have <whatever> is needed (resources, time, staff, expertise, etc.) to get exactly those things. The "box" surrounding you limits what you can see.

    Sad in an engineer. But, must be incredibly comforting!

    Bye, David.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed May 3 14:48:52 2023
    From Newsgroup: comp.arch.embedded

    On 03/05/2023 09:15, Don Y wrote:
    On 4/24/2023 7:37 AM, David Brown wrote:
    On 24/04/2023 09:32, Don Y wrote:
    On 4/22/2023 7:57 AM, David Brown wrote:
    However, in almost every case where CRC's might be useful, you
    have additional checks of the sanity of the data, and an all-zero
    or all-one data block would be rejected.  For example, Ethernet
    packets use CRC for integrity checking, but an attempt to send a
    packet type 0 from MAC address 00:00:00:00:00:00 to address
    00:00:00:00:00:00, of length 0, would be rejected anyway.

    Why look at "data" -- which may be suspect -- and *then* check its
    CRC?
    Run the CRC first.  If it fails, decide how you are going to proceed
    or recover.

    That is usually the order, yes.  Sometimes you want "fail fast",
    such as dropping a packet that was not addressed to you (it doesn't
    matter if it was received correctly but for someone else, or it was
    addressed to you but the receiver address was corrupted - you are
    dropping the packet either way). But usually you will run the CRC
    then look at the data.

    But the order doesn't matter - either way, you are still checking
    for valid data, and if the data is invalid, it does not matter if
    the CRC only passed by luck or by all zeros.

    You're assuming the CRC is supposed to *vouch* for the data.
    The CRC can be there simply to vouch for the *transport* of a
    datagram.

    I am assuming that the CRC is there to determine the integrity of the
    data in the face of possible unintentional errors.  That's what CRC
    checks are for. They have nothing to do with the content of the data,
    or the type of the data package or image.

    Exactly.  And, a CRC on *a* protocol can use ANY ALGORITHM that the protocol
    defines.  Not some "canned one-size fits all" approach.

    It makes sense to use an 8-bit CRC on small telegrams, 16-bit CRC on
    bigger things, 32-bit CRC on flash images, and 64-bit CRC when you want
    to use the CRC as an identifying hash (and malicious tampering is non-existent). There can also be benefits of particular choices of CRC
    for particular use-cases, in terms of detection of certain error
    patterns for certain lengths of data.
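    At the small end of that range, a table-free CRC-8 for short telegrams
    is only a few lines. This sketch assumes polynomial 0x07 with a zero
    initial value and no final XOR (the parameter set catalogued as
    CRC-8/SMBUS); the post does not name a polynomial, and other 8-bit
    polynomials may give a better Hamming distance for particular message
    lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), init 0x00,
 * no reflection, no final XOR. No table, so it suits tiny MCUs. */
uint8_t crc8(const uint8_t *p, size_t n)
{
    uint8_t crc = 0x00;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
    }
    return crc;
}
```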


    What I don't see any point in is using variations, such as different
    initial values. I've already said why I think pathological cases such
    as all zero data are normally irrelevant - but I can accept that there
    may be occasions when they could happen, and thus a /single/ non-zero
    initial value would be useful.
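    Why a single non-zero initial value helps with all-zero data can be seen
    with a parameterised CRC-16. This sketch assumes polynomial 0x1021,
    MSB-first, no final XOR (init 0x0000 gives CRC-16/XMODEM, init 0xFFFF
    gives CRC-16/CCITT-FALSE). With a zero initial value, any run of zero
    bytes leaves the register at zero, so blocks of 4 and 8 zero bytes both
    give 0x0000; with 0xFFFF they give different results.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16, polynomial 0x1021, MSB-first, no final XOR; the initial
 * value is a parameter so the effect of changing it can be compared. */
uint16_t crc16_ccitt(uint16_t init, const uint8_t *p, size_t n)
{
    uint16_t crc = init;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```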


    As an example of the use of CRC's in messaging, look at Ethernet frames:

    <https://en.wikipedia.org/wiki/Ethernet_frame>

    The CRC  does not care about the content of the data it protects.

    AND, if the packet yielded an incorrect CRC, you can assume the
    data was corrupt... OR, you are looking at a different protocol
    and MISTAKING it for something that you *think* it might be.

    If the CRC does not match, you reject the packet or data. End of story.
    You don't know or care /why/ - because you cannot be sure of any reason.


    If I produce a stream of data, can you tell me what the checksum
    for THAT stream *should* be?  You have to either be told what
    it is (and have a way of knowing what the checksum SHOULD be)
    *or* have to make some assumptions about it.

    If you are transmitting some data then both sides need to agree on the
    CRC algorithm (size, polynomial, initial value, etc.), and on whether a
    check is "CRC of everything gives 0" or "CRC of everything except the pre-calculated CRC equals the transmitted pre-calculated CRC".
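    The two receiver-side conventions just mentioned can be sketched with
    CRC-16/CCITT-FALSE (polynomial 0x1021, init 0xFFFF, no reflection, no
    final XOR); the parameter set and function names are assumptions for the
    example. Because this set has no final XOR and the CRC is appended most
    significant byte first, running the CRC over payload plus appended CRC
    leaves zero for an intact frame. With a non-zero final XOR, the remainder
    for an intact frame is a fixed non-zero constant instead, one more detail
    both ends have to agree on.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, MSB-first, no final XOR. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFF;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Sender: append the CRC, most significant byte first.
 * buf must have room for len + 2 bytes; returns the new length. */
size_t frame_seal(uint8_t *buf, size_t len)
{
    uint16_t c = crc16(buf, len);
    buf[len]     = (uint8_t)(c >> 8);
    buf[len + 1] = (uint8_t)(c & 0xFF);
    return len + 2;
}

/* Receiver, style 1: CRC of everything except the stored CRC,
 * compared with the stored CRC. */
int frame_ok_compare(const uint8_t *buf, size_t total)
{
    if (total < 2)
        return 0;
    uint16_t sent = (uint16_t)((buf[total - 2] << 8) | buf[total - 1]);
    return crc16(buf, total - 2) == sent;
}

/* Receiver, style 2: CRC of everything, stored CRC included, must be 0. */
int frame_ok_residue(const uint8_t *buf, size_t total)
{
    return total >= 2 && crc16(buf, total) == 0;
}
```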


    If you have assumed wrong *or* if the data has been corrupt, then
    the CRC should fail.  You don't care why it failed -- because you
    can't do anything about it.  You just know that you can't use the data
    in the way you THOUGHT it could be used.


    Well, yes. Obviously.

    If you are making incorrect assumptions here, someone is doing a pretty
    poor job at designing, describing or implementing the communications
    system. It is just like getting the baud rate wrong on a UART link.


    So, use a version-specific CRC on the packet.  If it fails, then
    either the data in the packet has been corrupted (which could just
    as easily have involved an embedded "interface version" parameter);
    or the packet was formed with the wrong CRC.

    If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then
    why bother looking at a "protocol version" parameter?  Would
    you ALSO want to verify all the rest of the parameters?
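    For concreteness, one way the "version-specific CRC" idea could be
    realised is the same CRC-16 polynomial (0x1021) with a different initial
    value per interface version. The initial values, names and frame layout
    below are hypothetical; the thread does not specify them. A frame sealed
    for one version leaves a non-zero remainder when checked as another. A
    failed check does not say whether the frame was corrupted or sealed for a
    different version; both simply fail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-version initial values; any distinct values work. */
enum { IFACE_V1_INIT = 0x1D0F, IFACE_V2_INIT = 0xB2A7 };

/* CRC-16, poly 0x1021, MSB-first, no final XOR, caller-chosen init. */
static uint16_t crc16_init(uint16_t init, const uint8_t *p, size_t n)
{
    uint16_t crc = init;
    while (n--) {
        crc ^= (uint16_t)(*p++ << 8);
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Append the version's CRC, MSB first; buf needs len + 2 bytes. */
size_t seal_for_version(uint16_t init, uint8_t *buf, size_t len)
{
    uint16_t c = crc16_init(init, buf, len);
    buf[len]     = (uint8_t)(c >> 8);
    buf[len + 1] = (uint8_t)(c & 0xFF);
    return len + 2;
}

/* Intact AND sealed for this version <=> remainder over whole frame is 0. */
int valid_for_version(uint16_t init, const uint8_t *buf, size_t total)
{
    return total >= 2 && crc16_init(init, buf, total) == 0;
}
```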

    I'm sorry, I simply cannot see your point.  Identifying the version of
    a protocol, or other protocol type information, is a totally
    orthogonal task to ensuring the integrity of the data.  The concepts
    should be handled separately.

    It is.  A packet using protocol XYZ is delivered to port ABC.
    Port ABC *only* handles protocol XYZ.  Anything else arriving there,
    with a potentially different checksum, is invalid.  Even if, for example, byte number 27 happens to have the correct "magic number" for that
    protocol.

    Because the message doesn't obey the rules defined by the protocol
    FOR THAT PORT.  What do I gain by insisting that byte number 27 must
    be 0x5A that the CRC doesn't already tell me?


    A CRC failure doesn't tell you that the telegram type is wrong. It
    tells you that the data is corrupted.

    If there can be different protocols, or telegram types, or whatever,
    then identify them. Stop playing silly buggers with abuse of different concepts that have different roles in the communication system.


    Salt just ensures that you can differentiate between functionally
    identical values.  I.e., in a CRC, it distinguishes the "0x0000" that
    CRC-1 generates from the "0x0000" that CRC-2 generates.

    Can we agree that this is called an "initial value", not "salt" ?

    It depends on how you implement it.  The point is to produce
    different results for the same polynomial.

    It is called an "initial value" - it is not "salt". It doesn't matter
    if you want to pick different initial values for your CRC, or why you
    want to do that. You are still not talking about salt.

    If you insist on using your own terminology, you will be left talking to yourself.


    You don't see the parallel to ensuring that *my* use of "Passw0rd" is
    encoded in a different manner than *your* use of "Passw0rd"?

    No.  They are different things.

    An important difference is that adding "salt" to a password hash is an
    important security feature.  Picking a different initial value for a
    CRC instead of having appropriate protocol versioning in the data (or
    a surrounding envelope) is a misfeature.

    And you don't see verifying that a packet of data received at port ABC
    (which should only see the checksum associated with protocol XYZ) as
    being similarly related?

    No. They are different things.

    Look, I /do/ understand what you are doing, and I appreciate that you
    think it is a good idea. To me, it is an unpleasant mix of orthogonal concepts that needlessly complicates things. Just because something is /possible/, does not mean it is a good idea.



    See the RMI description.

    I'm sorry, I have no idea what "RMI" is or where it is described.
    You've mentioned that abbreviation twice, but I can't figure it out.

    <https://en.wikipedia.org/wiki/RMI>
    <https://en.wikipedia.org/wiki/OCL>

    Nothing magical with either term.

    I looked up RMI on Wikipedia before asking, and saw nothing of
    relevance to CRC's or checksums.


    I've snipped the ramblings that have nothing to do with the question I
    asked. I assume you don't want to answer me.


    I noticed no mention of "OCL" in your posts, and looking

    You need to read more carefully.

    I've looked. You did not mention "OCL" anywhere before giving the URL
    to the wikipedia page. You only mentioned it /afterwards/ - without any context that suggests what you meant. (Here's a hint for you - if you
    want to refer to a wikipedia page, put a link to the /relevant/ page.)


    Presumably "RMI" and "OCL" have particular meanings that are relevant
    for projects you work on, and are so familiar to you that they are part
    of your language. No one else knows or cares what they are, and they
    are irrelevant in this thread. So let's leave them there.


    Give me the sources for Windows (Linux, *BSD, etc.) and I can
    subvert all the state-of-the-art digital signing used to ensure
    binaries aren't altered.  Nothing *outside* the box is involved
    so, by definition, everything I need has to reside *in* the box.

    No, you can't.  The sources for Linux and *BSD /are/ all freely
    available.  The private signing keys used by, for example, Red Hat or
    Debian, are /not/ freely available.  You cannot make changes to a Red
    Hat or Debian package that will pass the security checks - you are
    unable to sign the packages.

    Sure I can!  If you are just signing a package to verify that it hasn't
    been tampered with BUT THE CONTENTS ARE NOT ENCRYPTED, then all you have
    to do is remove the signature check -- leaving the signature in the (unchecked) executable.

    Woah, you /really/ don't understand this stuff, do you? Here's a clue -
    ask yourself what is being signed, and what is doing the checking.

    Perhaps also ask yourself if /all/ the people involved in security for
    Linux or BSD - all the companies such as Red Hat, IBM, Intel, etc., -
    ask if /all/ of them have got it wrong, and only /you/ realise that
    digital signatures on open source software are useless? /Very/
    occasionally, there is a lone genius that understands something while
    all the other experts are wrong - but in most cases, the loner is the
    one that is wrong.

  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Tue May 9 20:34:42 2023
    From Newsgroup: comp.arch.embedded

    On 2023-04-30 at 16:19, David Brown wrote:
    On 29/04/2023 23:03, Ulf Samuelsson wrote:
    On 2023-04-28 at 15:04, David Brown wrote:
    On 28/04/2023 10:50, Ulf Samuelsson wrote:
    On 2023-04-28 at 09:38, David Brown wrote:


    Or for my preferences, the CRC "DIGEST" would be put at the end of
    the image, rather than near the start.  Then the "from, to" range
    would cover the entire image except for the final CRC.  But I'd
    have a similar directive for the length of the image at a specific
    area near the start.


    I really do not see a benefit of splitting the meta information
    about the image to two separate locations.

    The bootloader uses the struct for all checks.
    It is a much simpler implementation once the tools support it.

    You might find it easier to write a tool which adds the CRC at the
    end, but that is a different issue.

    Occam's Razor!


    There are different needs for different projects - and more than one
    way to handle them.  I find adding a CRC at the end of the image
    works best for me, but I have no problem appreciating that other
    people have different solutions.

    I'd be curious to know WHY it works best for you.
    /Ulf

    I regularly do not have a bootloader - I am not free to put a CRC at the start of the image.  And if the bootloader itself needs to be updatable,
    it is again impossible to have the CRC (or any other metadata) at the
    start of the image.  I want most of the metadata to be at a fixed
    location as close to the start as reasonably practical (such as after
    the vector table, or other microcontroller-specific information that
    might be used for flash security, early chip setup, etc.).  If I am to
    have one single checksum for the image, which is what I prefer, then it
    has to be at the end of the image.  For example, there might be:

    0x00000000 : vectors
    0x00000400 : external flash configuration block
    0x00000600 : program info metadata
    0x00001000 : main program
               : CRC

    There is no way to have the metadata or CRC at the start of the image,
    so the CRC goes at the end.
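    A check for a layout like this, with a single CRC over everything up to a
    trailing 4-byte CRC, might look like the sketch below. It assumes the
    standard reflected CRC-32, a little-endian stored CRC, and that the total
    image length is known from somewhere (for instance a length field in the
    program info metadata); none of that is specified in the post.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected poly 0xEDB88320, init/xorout 0xFFFFFFFF). */
static uint32_t crc32_bitwise(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;
}

/* 'len' is the total image length, including the trailing 4-byte CRC.
 * The CRC covers vectors, config block, metadata and program alike. */
int image_crc_ok(const uint8_t *img, size_t len)
{
    if (len < 4)
        return 0;
    size_t body = len - 4;
    uint32_t stored = (uint32_t)img[body]
                    | (uint32_t)img[body + 1] << 8
                    | (uint32_t)img[body + 2] << 16
                    | (uint32_t)img[body + 3] << 24;
    return crc32_bitwise(img, body) == stored;
}
```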

    For the Bootloader, I keep the CRC right after the vectors.
    I keep a copy of the vectors right after the CRC, and compare
    the two vector tables.
    This is to always know the location of the CRC.
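    A minimal sketch of that layout check, assuming a hypothetical table of
    16 32-bit vectors (the real count depends on the microcontroller) placed
    at the start of the bootloader. What the CRC covers is not stated, so this
    only shows the vector-table comparison used to confirm that the CRC word
    sits where it is expected.

```c
#include <stdint.h>
#include <string.h>

#define N_VECTORS 16   /* hypothetical; set from the device's vector count */

/* Hypothetical bootloader start: vectors, CRC, then a copy of the vectors. */
struct boot_head {
    uint32_t vectors[N_VECTORS];
    uint32_t crc;
    uint32_t vectors_copy[N_VECTORS];
};

/* If both tables agree, the CRC word is taken to be at the expected offset. */
int boot_head_layout_ok(const struct boot_head *h)
{
    return memcmp(h->vectors, h->vectors_copy, sizeof h->vectors) == 0;
}
```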


    It would be possible to have two CRCs - one that covers the vectors, configuration information, and metadata and is placed second last in the metadata block.  A second CRC placed last in the metadata block would
    cover the main program - everything after the CRCs.  That would let me
    have a single metadata block and no CRC at the end of the image.
    However, it would mean splitting the check in two, rather than one check
    for the whole image.  I don't see that as a benefit.


    When making images that are started from a bootloader, I certainly
    /could/ put the CRC at the start.  But I see no particular reason to do
    so - it makes a lot more sense to keep a similar format.

    You want more metadata like entry point and length, as well as text information about the image. Putting things in a header means that
    location is fixed.
    There are a number of checks in my bootloader to ensure that the
    information in the header makes sense.
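    The post does not list which checks are made. The sketch below shows the
    sort of plausibility tests a fixed-location header allows before any CRC
    is computed. The struct layout, field names, magic value and slot
    parameters are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_MAGIC 0x494D4731u   /* hypothetical marker value */

/* Hypothetical image header; fields are illustrative only. */
struct image_header {
    uint32_t magic;      /* marks a header */
    uint32_t load_addr;  /* address the image is linked to run at */
    uint32_t length;     /* total image length in bytes */
    uint32_t entry;      /* entry point address */
    uint32_t crc;        /* checked separately */
    char     text[32];   /* human-readable description/version */
};

/* Returns 1 if the header is plausible for a slot at slot_addr of slot_size bytes. */
int header_sane(const struct image_header *h, uint32_t slot_addr, uint32_t slot_size)
{
    if (h->magic != IMAGE_MAGIC)
        return 0;
    if (h->load_addr != slot_addr)
        return 0;
    if (h->length < sizeof *h || h->length > slot_size)
        return 0;
    /* Entry point must lie inside the image. */
    if (h->entry < h->load_addr || h->entry - h->load_addr >= h->length)
        return 0;
    /* Text must be NUL-terminated within its field. */
    if (memchr(h->text, '\0', sizeof h->text) == NULL)
        return 0;
    return 1;
}
```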

    (Bootloaders don't often have to check their own CRC - after all, even
    if the CRC fails there is usually little you can do about it, except
    charge on and hope for the best.  But if the bootloader is updatable in system, then you want a CRC during the download procedure to check that
    you have got a good download copy before updating the flash.)

    In functional safety applications you regularly check the flash
    contents and refuse to boot if there is a mismatch.
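    One way to do such a check at run time without stalling other work is to
    spread it over many short time slices. The sketch below is an incremental
    CRC-32 that processes a bounded chunk per call, assuming standard CRC-32
    parameters; the names and the chunked scheduling are illustrative. A
    mismatch against the stored value at the end would then be treated as a
    fault (for example, refusing to boot or entering a safe state).

```c
#include <stddef.h>
#include <stdint.h>

/* Incremental CRC-32 (reflected poly 0xEDB88320, init/xorout 0xFFFFFFFF). */
struct flash_scan {
    const uint8_t *base;
    size_t         len;
    size_t         pos;
    uint32_t       crc;   /* running register, final XOR not yet applied */
};

void flash_scan_start(struct flash_scan *s, const uint8_t *base, size_t len)
{
    s->base = base;
    s->len  = len;
    s->pos  = 0;
    s->crc  = 0xFFFFFFFFu;
}

/* Process up to 'chunk' bytes (chunk must be > 0). Returns 1 when the
 * whole range is done, with the finished CRC in *out; otherwise 0. */
int flash_scan_step(struct flash_scan *s, size_t chunk, uint32_t *out)
{
    size_t end = (s->len - s->pos < chunk) ? s->len : s->pos + chunk;
    for (; s->pos < end; s->pos++) {
        s->crc ^= s->base[s->pos];
        for (int k = 0; k < 8; k++)
            s->crc = (s->crc & 1u) ? (s->crc >> 1) ^ 0xEDB88320u : (s->crc >> 1);
    }
    if (s->pos < s->len)
        return 0;
    *out = s->crc ^ 0xFFFFFFFFu;
    return 1;
}
```

    The result is identical to a one-shot CRC-32 over the same range, so the
    same stored value can be used at boot and during periodic checks.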

    /Ulf

  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Tue May 9 20:42:25 2023
    From Newsgroup: comp.arch.embedded

    On 2023-05-03 at 14:48, David Brown wrote:
    On 03/05/2023 09:15, Don Y wrote:
    On 4/24/2023 7:37 AM, David Brown wrote:
    On 24/04/2023 09:32, Don Y wrote:
    On 4/22/2023 7:57 AM, David Brown wrote:
    However, in almost every case where CRC's might be useful, you
    have additional checks of the sanity of the data, and an all-zero
    or all-one data block would be rejected.  For example, Ethernet
    packets use CRC for integrity checking, but an attempt to send a
    packet type 0 from MAC address 00:00:00:00:00:00 to address
    00:00:00:00:00:00, of length 0, would be rejected anyway.

    Why look at "data" -- which may be suspect -- and *then* check its
    CRC?
    Run the CRC first.  If it fails, decide how you are going to proceed
    or recover.

    That is usually the order, yes.  Sometimes you want "fail fast",
    such as dropping a packet that was not addressed to you (it doesn't
    matter if it was received correctly but for someone else, or it was
    addressed to you but the receiver address was corrupted - you are
    dropping the packet either way). But usually you will run the CRC
    then look at the data.

    But the order doesn't matter - either way, you are still checking
    for valid data, and if the data is invalid, it does not matter if
    the CRC only passed by luck or by all zeros.

    You're assuming the CRC is supposed to *vouch* for the data.
    The CRC can be there simply to vouch for the *transport* of a
    datagram.

    I am assuming that the CRC is there to determine the integrity of the
    data in the face of possible unintentional errors.  That's what CRC
    checks are for. They have nothing to do with the content of the data,
    or the type of the data package or image.

    Exactly.  And, a CRC on *a* protocol can use ANY ALGORITHM that the protocol
    defines.  Not some "canned one-size fits all" approach.

    It makes sense to use an 8-bit CRC on small telegrams, 16-bit CRC on
    bigger things, 32-bit CRC on flash images, and 64-bit CRC when you want
    to use the CRC as an identifying hash (and malicious tampering is non-existent).  There can also be benefits of particular choices of CRC
    for particular use-cases, in terms of detection of certain error
    patterns for certain lengths of data.

    Flash images larger than X kB may need a 64-bit CRC.
    I don't remember exactly when to start considering it,
    but something between 64kB-256kB is probably correct.

    It is all to do with Hamming distance, and this is also affected by the polynomial.
    /Ulf

    What I don't see any point in is using variations, such as different
    initial values.  I've already said why I think pathological cases such
    as all zero data are normally irrelevant - but I can accept that there
    may be occasions when they could happen, and thus a /single/ non-zero initial value would be useful.


    As an example of the use of CRC's in messaging, look at Ethernet frames:
    <https://en.wikipedia.org/wiki/Ethernet_frame>

    The CRC  does not care about the content of the data it protects.

    AND, if the packet yielded an incorrect CRC, you can assume the
    data was corrupt... OR, you are looking at a different protocol
    and MISTAKING it for something that you *think* it might be.

    If the CRC does not match, you reject the packet or data.  End of story.
     You don't know or care /why/ - because you cannot be sure of any reason.


    If I produce a stream of data, can you tell me what the checksum
    for THAT stream *should* be?  You have to either be told what
    it is (and have a way of knowing what the checksum SHOULD be)
    *or* have to make some assumptions about it.

    If you are transmitting some data then both sides need to agree on the
    CRC algorithm (size, polynomial, initial value, etc.), and on whether a check is "CRC of everything gives 0" or "CRC of everything except the pre-calculated CRC equals the transmitted pre-calculated CRC".


    If you have assumed wrong *or* if the data has been corrupt, then
    the CRC should fail.  You don't care why it failed -- because you
    can't do anything about it.  You just know that you can't use the data
    in the way you THOUGHT it could be used.


    Well, yes.  Obviously.

    If you are making incorrect assumptions here, someone is doing a pretty
    poor job at designing, describing or implementing the communications system.  It is just like getting the baud rate wrong on a UART link.


    So, use a version-specific CRC on the packet.  If it fails, then
    either the data in the packet has been corrupted (which could just
    as easily have involved an embedded "interface version" parameter);
    or the packet was formed with the wrong CRC.

    If the CRC is correct FOR THAT VERSION OF THE PROTOCOL, then
    why bother looking at a "protocol version" parameter?  Would
    you ALSO want to verify all the rest of the parameters?

    I'm sorry, I simply cannot see your point.  Identifying the version
    of a protocol, or other protocol type information, is a totally
    orthogonal task to ensuring the integrity of the data.  The concepts
    should be handled separately.

    It is.  A packet using protocol XYZ is delivered to port ABC.
    Port ABC *only* handles protocol XYZ.  Anything else arriving there,
    with a potentially different checksum, is invalid.  Even if, for example,
    byte number 27 happens to have the correct "magic number" for that
    protocol.

    Because the message doesn't obey the rules defined by the protocol
    FOR THAT PORT.  What do I gain by insisting that byte number 27 must
    be 0x5A that the CRC doesn't already tell me?


    A CRC failure doesn't tell you that the telegram type is wrong.  It
    tells you that the data is corrupted.

    If there can be different protocols, or telegram types, or whatever,
    then identify them.  Stop playing silly buggers with abuse of different concepts that have different roles in the communication system.


    Salt just ensures that you can differentiate between functionally
    identical values.  I.e., in a CRC, it distinguishes the "0x0000" that
    CRC-1 generates from the "0x0000" that CRC-2 generates.

    Can we agree that this is called an "initial value", not "salt" ?

    It depends on how you implement it.  The point is to produce
    different results for the same polynomial.

    It is called an "initial value" - it is not "salt".  It doesn't matter
    if you want to pick different initial values for your CRC, or why you
    want to do that.  You are still not talking about salt.

    If you insist on using your own terminology, you will be left talking to yourself.


    You don't see the parallel to ensuring that *my* use of "Passw0rd" is
    encoded in a different manner than *your* use of "Passw0rd"?

    No.  They are different things.

    An important difference is that adding "salt" to a password hash is
    an important security feature.  Picking a different initial value for
    a CRC instead of having appropriate protocol versioning in the data
    (or a surrounding envelope) is a misfeature.

    And you don't see verifying that a packet of data received at port ABC
    (which should only see the checksum associated with protocol XYZ) as
    being similarly related?

    No.  They are different things.

    Look, I /do/ understand what you are doing, and I appreciate that you
    think it is a good idea.  To me, it is an unpleasant mix of orthogonal concepts that needlessly complicates things.  Just because something is /possible/, does not mean it is a good idea.



    See the RMI description.

    I'm sorry, I have no idea what "RMI" is or where it is described.
    You've mentioned that abbreviation twice, but I can't figure it out.

    <https://en.wikipedia.org/wiki/RMI>
    <https://en.wikipedia.org/wiki/OCL>

    Nothing magical with either term.

    I looked up RMI on Wikipedia before asking, and saw nothing of
    relevance to CRCs or checksums.


    I've snipped the ramblings that have nothing to do with the question I asked.  I assume you don't want to answer me.


    I noticed no mention of "OCL" in your posts, and looking

    You need to read more carefully.

    I've looked.  You did not mention "OCL" anywhere before giving the URL
    to the wikipedia page.  You only mentioned it /afterwards/ - without any context that suggests what you meant.  (Here's a hint for you - if you
    want to refer to a wikipedia page, put a link to the /relevant/ page.)


    Presumably "RMI" and "OCL" have particular meanings that are relevant
    for projects you work on, and are so familiar to you that they are part
    of your language.  No one else knows or cares what they are, and they
    are irrelevant in this thread.  So let's leave them there.


    Give me the sources for Windows (Linux, *BSD, etc.) and I can
    subvert all the state-of-the-art digital signing used to ensure
    binaries aren't altered.  Nothing *outside* the box is involved
    so, by definition, everything I need has to reside *in* the box.

    No, you can't.  The sources for Linux and *BSD /are/ all freely
    available.  The private signing keys used by, for example, Red Hat or
    Debian, are /not/ freely available.  You cannot make changes to a Red
    Hat or Debian package that will pass the security checks - you are
    unable to sign the packages.

    Sure I can!  If you are just signing a package to verify that it hasn't
    been tampered with BUT THE CONTENTS ARE NOT ENCRYPTED, then all you have
    to do is remove the signature check -- leaving the signature in the
    (unchecked) executable.

    Woah, you /really/ don't understand this stuff, do you?  Here's a clue - ask yourself what is being signed, and what is doing the checking.

    Perhaps also ask yourself if /all/ the people involved in security for
    Linux or BSD - all the companies such as Red Hat, IBM, Intel, etc., -
    ask if /all/ of them have got it wrong, and only /you/ realise that
    digital signatures on open source software are useless?  /Very/ occasionally, there is a lone genius that understands something while
    all the other experts are wrong - but in most cases, the loner is the
    one that is wrong.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed May 10 10:06:34 2023
    From Newsgroup: comp.arch.embedded

    On 09/05/2023 20:42, Ulf Samuelsson wrote:
    Den 2023-05-03 kl. 14:48, skrev David Brown:

    It makes sense to use an 8-bit CRC on small telegrams, 16-bit CRC on
    bigger things, 32-bit CRC on flash images, and 64-bit CRC when you
    want to use the CRC as an identifying hash (and malicious tampering is
    non-existent).  There can also be benefits of particular choices of
    CRC for particular use-cases, in terms of detection of certain error
    patterns for certain lengths of data.

    Flash images larger than X kB may need a 64-bit CRC.
    I don't remember exactly when to start considering it,
    but something between 64kB-256kB is probably correct.

    It is all to do with Hamming Distance, and this is also affected by the polynomial.
    /Ulf


    "Need" is too strong a word here. A CRC will guarantee detection of
    certain kinds of error (such as a single bit error), regardless of the
    length of the data. Some kinds of error are limited by length. If you
    plot a graph with guaranteed Hamming distance on the vertical scale and
    length of data on the horizontal scale, each CRC will drop off in steps.
    For the same CRC size, some will hold a high Hamming distance for
    longer and then drop off sharply, others will hold a lower Hamming
    distance for very large data. And in general, a bigger CRC will be
    better here.
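    The single-bit guarantee is easy to check empirically. This sketch assumes the common reflected CRC-32 (polynomial 0xEDB88320, as used by Ethernet and zlib). It flips every bit of a buffer in turn and counts how many flips leave the CRC unchanged. For any CRC whose generator polynomial has more than one term, that count is zero whatever the buffer length. The function names are hypothetical, and the bitwise form is chosen for clarity rather than speed.

```c
#include <stdint.h>
#include <stddef.h>

/* Reflected CRC-32: init 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Flip each bit of buf once (restoring it afterwards) and count the
 * flips that produce the same CRC as the unmodified buffer. */
static size_t count_undetected_single_bit_errors(uint8_t *buf, size_t len)
{
    const uint32_t good = crc32_ieee(buf, len);
    size_t undetected = 0;
    for (size_t i = 0; i < len; i++) {
        for (int bit = 0; bit < 8; bit++) {
            buf[i] ^= (uint8_t)(1u << bit);   /* inject the error */
            if (crc32_ieee(buf, len) == good)
                undetected++;
            buf[i] ^= (uint8_t)(1u << bit);   /* restore the original */
        }
    }
    return undetected;
}
```

    Exhaustive tests like this quickly become impractical for multi-bit errors on long data. That is where the Hamming-distance tables for particular polynomials come in.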

    But Hamming distance is not everything. It is important in situations
    where there is an approximately independent risk of corruption for each
    bit individually - such as during radio transmission. Programming
    images into flash has a completely different error risk pattern. A
    little Hamming is nice to guarantee that any single cell failure in the
    flash will be found, but the more realistic flash problems involve
    large scale effects - failure to erase a block fully, or software flaws.
    For this kind of thing, pretty much any valid CRC polynomial works the
    same - a 32-bit polynomial gives you a 1 in 2 ^ 32 chance of the error
    going undetected. Yes, a 1 in 2 ^ 64 chance is better, but it's rarely something to get excited about.

    Note that if you are sending the image to a board via a potentially
    flawed mechanism, you'll want appropriate checks during the transfers. Ethernet, Wifi, Bluetooth, USB - they will all have suitable checksums
    for each packet. And for some of those, Hamming distance and particular choice of polynomial /is/ an important consideration.


  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed May 10 10:18:52 2023
    From Newsgroup: comp.arch.embedded

    On 09/05/2023 20:34, Ulf Samuelsson wrote:
    Den 2023-04-30 kl. 16:19, skrev David Brown:
    On 29/04/2023 23:03, Ulf Samuelsson wrote:
    Den 2023-04-28 kl. 15:04, skrev David Brown:
    On 28/04/2023 10:50, Ulf Samuelsson wrote:
    Den 2023-04-28 kl. 09:38, skrev David Brown:


    Or for my preferences, the CRC "DIGEST" would be put at the end of the image, rather than near the start.  Then the "from, to" range would cover the entire image except for the final CRC.  But I'd
    have a similar directive for the length of the image at a specific area near the start.


    I really do not see a benefit of splitting the meta information
    about the image to two separate locations.

    The bootloader uses the struct for all checks.
    It is a much simpler implementation once the tools support it.

    You might find it easier to write a tool which adds the CRC at the
    end, but that is a different issue.

    Occam's Razor!


    There are different needs for different projects - and more than one
    way to handle them.  I find adding a CRC at the end of the image
    works best for me, but I have no problem appreciating that other
    people have different solutions.




    I'd be curious to know WHY it works best for you.
    /Ulf

    I regularly do not have a bootloader - I am not free to put a CRC at
    the start of the image.  And if the bootloader itself needs to be
    updatable, it is again impossible to have the CRC (or any other
    metadata) at the start of the image.  I want most of the metadata to
    be at a fixed location as close to the start as reasonably practical
    (such as after the vector table, or other microcontroller-specific
    information that might be used for flash security, early chip setup,
    etc.).  If I am to have one single checksum for the image, which is
    what I prefer, then it has to be at the end of the image.  For
    example, there might be :

    0x00000000 : vectors
    0x00000400 : external flash configuration block
    0x00000600 : program info metadata
    0x00001000 : main program
                : CRC

    There is no way to have the metadata or CRC at the start of the image,
    so the CRC goes at the end.
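    A minimal sketch of that arrangement. It assumes a little-endian image, the offsets from the example map above, and a hypothetical metadata block at 0x600 that holds a magic word followed by the image length (the number of bytes covered by the CRC). The build tool stores a CRC-32 immediately after the last covered byte. The checker needs only one extra read, the length, to find it. Flash is simulated by a byte array so the code can run on a host. All names and the magic value are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

#define META_OFFSET 0x600u      /* "program info metadata" in the map above */
#define META_MAGIC  0x4F464E49u /* hypothetical marker value */

/* Reflected CRC-32 (polynomial 0xEDB88320), bitwise for clarity. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Length recorded in the metadata, or 0 if it is missing or implausible. */
static uint32_t image_length(const uint8_t *flash, size_t flash_size)
{
    if (flash_size < META_OFFSET + 12u)
        return 0;
    if (load_le32(flash + META_OFFSET) != META_MAGIC)
        return 0;
    uint32_t len = load_le32(flash + META_OFFSET + 4u);
    if (len < META_OFFSET + 8u || len > flash_size - 4u)
        return 0;   /* must cover the metadata and leave room for the CRC */
    return len;
}

/* Build-tool side: append the CRC right after the last covered byte. */
static int seal_image(uint8_t *flash, size_t flash_size)
{
    uint32_t len = image_length(flash, flash_size);
    if (len == 0)
        return -1;
    store_le32(flash + len, crc32_ieee(flash, len));
    return 0;
}

/* Device side: one CRC over everything from address 0 up to the CRC. */
static int image_ok(const uint8_t *flash, size_t flash_size)
{
    uint32_t len = image_length(flash, flash_size);
    return len != 0 && crc32_ieee(flash, len) == load_le32(flash + len);
}
```

    The vectors, the configuration block and the metadata are all inside the single checked range. Nothing has to move at the start of the image to make room for the CRC.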

    For the Bootloader, I keep the CRC right after the vectors.
    I keep a copy of the vectors right after the CRC, and compare
    the two vector tables.
    This is to always know the location of the CRC.
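    The duplicated-vector arrangement can be sketched the same way. The post gives no details, so the following are assumptions: 16 four-byte vectors, a 32-bit CRC in the word directly after them, the vector copy directly after the CRC, and a CRC covering the whole bootloader except its own four bytes, with the length supplied by the caller. Matching vector tables confirm that the layout is the expected one, and therefore that the CRC word is where the checker expects it. The CRC itself must be computed in two pieces around its own location. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define NUM_VECTORS     16u                 /* hypothetical; device-specific */
#define VECTOR_BYTES    (NUM_VECTORS * 4u)
#define CRC_OFFSET      VECTOR_BYTES        /* CRC right after the vectors */
#define VEC_COPY_OFFSET (CRC_OFFSET + 4u)   /* copy right after the CRC */

/* Reflected CRC-32 update; the caller applies the init and final XOR. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc;
}

/* CRC over [0, CRC_OFFSET) then [VEC_COPY_OFFSET, len): all but the CRC. */
static uint32_t boot_crc(const uint8_t *img, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    crc = crc32_update(crc, img, CRC_OFFSET);
    crc = crc32_update(crc, img + VEC_COPY_OFFSET, len - VEC_COPY_OFFSET);
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Build side: duplicate the vectors first, so the CRC covers the copy. */
static int seal_boot(uint8_t *img, size_t len)
{
    if (len < VEC_COPY_OFFSET + VECTOR_BYTES)
        return -1;
    memcpy(img + VEC_COPY_OFFSET, img, VECTOR_BYTES);
    store_le32(img + CRC_OFFSET, boot_crc(img, len));
    return 0;
}

/* Check side: the two vector tables must match before the CRC is trusted. */
static int boot_ok(const uint8_t *img, size_t len)
{
    if (len < VEC_COPY_OFFSET + VECTOR_BYTES)
        return 0;
    if (memcmp(img, img + VEC_COPY_OFFSET, VECTOR_BYTES) != 0)
        return 0;
    return boot_crc(img, len) == load_le32(img + CRC_OFFSET);
}
```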

    Fair enough - that is an entirely reasonable alternative. I have a
    knee-jerk reaction against duplication as a check, having cut my teeth
    on microcontrollers where 16 KB devices were "big", but of course a duplication of the vector table is not going to be a noticeable waste on
    a more modern device.

    It does, however, mean extra steps in checking, compared to a simpler
    CRC run over the entire image.



    It would be possible to have two CRCs - one that covers the vectors,
    configuration information, and metadata and is placed second last in
    the metadata block.  A second CRC placed last in the metadata block
    would cover the main program - everything after the CRCs.  That would
    let me have a single metadata block and no CRC at the end of the
    image. However, it would mean splitting the check in two, rather than
    one check for the whole image.  I don't see that as a benefit.


    When making images that are started from a bootloader, I certainly
    /could/ put the CRC at the start.  But I see no particular reason to
    do so - it makes a lot more sense to keep a similar format.

    You want more metadata like entry point and length, as well as text information about the image. Putting things in a header means that
    location is fixed.
    There are a number of checks in my bootloader to ensure that the
    information in the header makes sense.


    I do have all that kind of thing too. It's only the CRC itself that is
    put at the end, and it is easily found since the length of the image is
    in the metadata. (We are talking about one pointer access more than
    having it at a fixed address - it's not hard to find it!).


    (Bootloaders don't often have to check their own CRC - after all, even
    if the CRC fails there is usually little you can do about it, except
    charge on and hope for the best.  But if the bootloader is updatable
    in system, then you want a CRC during the download procedure to check
    that you have got a good download copy before updating the flash.)

    In functional safety applications you regularly check the flash
    contents and refuse to boot if there is a mismatch.


    Yes, that is a possibility.

    I've worked on safety-certified systems which required things like
    regular checks of flash while running (not just at bootup). A lot of
    the so-called "safety requirements" were directly detrimental. I
    believe many of these kinds of requirements were made by people who
    understood the "Swiss cheese" model of risks and safety, but not the
    more realistic "Hot cheese" model. And they seem more concerned about box-ticking and legal arse-covering than actual risk reduction.
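    A common way to meet a "check flash while running" requirement is to spread the CRC over time, so that each call does a small, bounded amount of work. The sketch below assumes a reflected CRC-32 and an expected value that is already known, for instance one stored by the build tool. The structure and function names are hypothetical. Each call to scrub_step() processes at most one chunk. A completed pass either confirms the CRC or reports a mismatch, and then the next pass starts automatically.

```c
#include <stdint.h>
#include <stddef.h>

/* Reflected CRC-32 update (polynomial 0xEDB88320); no init/final XOR here. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc;
}

struct flash_scrub {
    const uint8_t *base;   /* start of the region being checked */
    size_t len;            /* bytes covered by the expected CRC */
    uint32_t expected;     /* final CRC-32 of the whole region */
    size_t pos;            /* progress through the current pass */
    uint32_t crc;          /* running CRC register for this pass */
};

static void scrub_init(struct flash_scrub *s, const uint8_t *base,
                       size_t len, uint32_t expected)
{
    s->base = base;
    s->len = len;
    s->expected = expected;
    s->pos = 0;
    s->crc = 0xFFFFFFFFu;
}

/* Process at most `chunk` bytes (chunk must be > 0). Returns 0 while a
 * pass is in progress, 1 when a pass ends with a matching CRC, and -1 on
 * a mismatch. Either way the next call begins a fresh pass. */
static int scrub_step(struct flash_scrub *s, size_t chunk)
{
    size_t n = s->len - s->pos;
    if (n > chunk)
        n = chunk;
    s->crc = crc32_update(s->crc, s->base + s->pos, n);
    s->pos += n;
    if (s->pos < s->len)
        return 0;
    int ok = (s->crc ^ 0xFFFFFFFFu) == s->expected;
    s->pos = 0;
    s->crc = 0xFFFFFFFFu;
    return ok ? 1 : -1;
}
```

    The chunk size sets the trade-off between worst-case time per call and how long a full pass takes. The sketch does not say what a system should do on a mismatch, which is the part that safety requirements tend to dictate.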





  • From Ulf Samuelsson@ulf.r.samuelsson@gmail.com to comp.arch.embedded on Wed May 10 12:03:10 2023
    From Newsgroup: comp.arch.embedded

    Den 2023-05-10 kl. 10:06, skrev David Brown:
    On 09/05/2023 20:42, Ulf Samuelsson wrote:
    Den 2023-05-03 kl. 14:48, skrev David Brown:

    It makes sense to use an 8-bit CRC on small telegrams, 16-bit CRC on
    bigger things, 32-bit CRC on flash images, and 64-bit CRC when you
    want to use the CRC as an identifying hash (and malicious tampering
    is non-existent).  There can also be benefits of particular choices
    of CRC for particular use-cases, in terms of detection of certain
    error patterns for certain lengths of data.

    Flash images larger than X kB may need a 64-bit CRC.
    I don't remember exactly when to start considering it,
    but something between 64kB-256kB is probably correct.

    It is all to do with Hamming Distance, and this is also affected by
    the polynomial.
    /Ulf


    "Need" is too strong a word here.  A CRC will guarantee detection of certain kinds of error (such as a single bit error), regardless of the length of the data.  Some kinds of error are limited by length.  If you plot a graph with guaranteed Hamming distance on the vertical scale and length of data on the horizontal scale, each CRC will drop off in steps.
     For the same CRC size, some will hold a high Hamming distance for
    longer and then drop off sharply, others will hold a lower Hamming
    distance for very large data.  And in general, a bigger CRC will be
    better here.

    But Hamming distance is not everything.  It is important in situations where there is an approximately independent risk of corruption for each
    bit individually - such as during radio transmission.  Programming
    images into flash has a completely different error risk pattern.  A
    little Hamming is nice to guarantee that any single cell failure in the flash will be found, but the more realistic flash problems involve
    large scale effects - failure to erase a block fully, or software flaws.
     For this kind of thing, pretty much any valid CRC polynomial works the same - a 32-bit polynomial gives you a 1 in 2 ^ 32 chance of the error
    going undetected.  Yes, a 1 in 2 ^ 64 chance is better, but it's rarely something to get excited about.

    Programming a flash memory can flip bits in parts of the flash memory
    that are not programmed.
    Bit errors can also be introduced by radiation.
    Some applications require better security than others.
    Functional Safety may require CRC size based on code size.
    /Ulf



    Note that if you are sending the image to a board via a potentially
    flawed mechanism, you'll want appropriate checks during the transfers. Ethernet, Wifi, Bluetooth, USB - they will all have suitable checksums
    for each packet.  And for some of those, Hamming distance and particular choice of polynomial /is/ an important consideration.



  • From Don Y@blockedofcourse@foo.invalid to comp.arch.embedded on Sat Aug 5 01:42:58 2023
    From Newsgroup: comp.arch.embedded

    On 5/3/2023 5:48 AM, David Brown wrote:

    Give me the sources for Windows (Linux, *BSD, etc.) and I can
    subvert all the state-of-the-art digital signing used to ensure
    binaries aren't altered.  Nothing *outside* the box is involved
    so, by definition, everything I need has to reside *in* the box.

    No, you can't.  The sources for Linux and *BSD /are/ all freely available.
    The private signing keys used by, for example, Red Hat or Debian, are /not/
    freely available.  You cannot make changes to a Red Hat or Debian package that will pass the security checks - you are unable to sign the packages.
    Sure I can!  If you are just signing a package to verify that it hasn't
    been tampered with BUT THE CONTENTS ARE NOT ENCRYPTED, then all you have
    to do is remove the signature check -- leaving the signature in the
    (unchecked) executable.

    Woah, you /really/ don't understand this stuff, do you?  Here's a clue - ask
    yourself what is being signed, and what is doing the checking.

    Exactly. You don't attack the signature or the keys. You BUILD A NEW KERNEL THAT DOESN'T CHECK THE SIGNATURE. You attack (replace if you have access
    to the sources -- as I stipulated above) the "what is doing the checking".
    This is c.a.e; you likely have physical access and control of the device (unlike trying to attack a remote system).

    The binary is exposed UNENCRYPTED in the signed executable (please note my stipulation to that, too, above). The only thing preventing its execution
    (if tampered -- or unlicensed!) is the signature check. Bypass that in any
    way and the code executes AS IF signed.

    I design "devices". Don't you think if there was a foolproof way (by resorting to "school boy techniques") to protect them from counterfeiting and tampering that I would have already embraced that? That EVERY computer-based product would be INHERENTLY SECURED??

    [There are ways that are far from theoretical yet considerably more
    effective. A signature check is easy to detect in an executing device
    and, thus, elided. Lather, rinse, repeat for each precursor level
    of such protection.]

    Please read what I've written more carefully, lest you look foolish. Or,
    spend a few years playing red-blue games and actually trying to subvert hardware and software protection mechanisms in REAL products. (Hint:
    you will need to think down BELOW the hardware level to do so successfully
    so you can bypass the hardware mechanisms that folks keep trying to embed
    in their products)

    Perhaps also ask yourself if /all/ the people involved in security for Linux or
    BSD - all the companies such as Red Hat, IBM, Intel, etc., - ask if /all/ of them have got it wrong, and only /you/ realise that digital signatures on open
    source software are useless?

    The signature is only of use if the mechanism verifying it is tamperproof. That's not possible on most (all?) devices sold. SOMEONE has physical
    access to the device so all of the mechanisms you put in place can be subverted.

    *Ask* the Linux and BSD crowds if they can GUARANTEE that ALTERED signed code can't be executed on a system where the adversary can build and install their own kernel. Or, probe the inner workings of such a device AT THEIR LEISURE.

    /Very/ occasionally, there is a lone genius that
    understands something while all the other experts are wrong - but in most cases, the loner is the one that is wrong.

    In this case, you have clearly failed to understand what was being said.
    So, don't count yourself in with the "experts".

    If the kernel loading the executable doesn't contain code to validate the signature (and, if I have the sources for said kernel/OS then I can
    easily *make* such a kernel) then the signature is just another unused "section" in the BLOB. Just like debug symbols or copyright information.

  • From Don Y@blockedofcourse@foo.invalid to comp.arch.embedded on Sat Aug 5 01:48:49 2023
    From Newsgroup: comp.arch.embedded

    On 5/10/2023 3:03 AM, Ulf Samuelsson wrote:
    Programming a flash memory can flip bits in parts of the flash memory that are
    not programmed.
    Bit errors can also be introduced by radiation.

    Executing code can also introduce write *and* read disturb events.

    Ask yourself how to protect a design that allows arbitrary code to
    be executed (even if in a sandbox) in the presence of potential
    side-channel exploits.

    Or, as a "simpler" problem: how to detect if such an exploit
    has been invoked (even possibly unintentionally)!

    [Imagine devices that "run forever"...]

    Some applications require better security than others.

    Exactly.

    Functional Safety may require CRC size based on code size.

