• New meet old

    From Arne Vajhøj <arne@vajhoej.dk> to comp.lang.cobol on Sun Mar 31 19:55:30 2024
    From Newsgroup: comp.lang.cobol

    Someone created a framework for evaluating LLMs' ability
    to write Cobol.

    https://bloop.ai/blog/evaluating-llms-on-cobol

    For those who do not want to read the entire article,
    the conclusion at the bottom is:

    <quote>
    GPT-4 - the best-performing model - generates a correct solution for
    10.27% of problems. Compare this to HumanEval, where it solves 67% of
    problems. CodeLlama, one of the best open-source coding models, fares
    even worse, with the 34b variant only clocking 2%. COBOLEval is hard.

    Looking at the failure cases, we can see that state-of-the-art LLMs
    struggle to generate COBOL that even compiles. Only 47.94% of GPT-4
    generated solutions compile with GnuCOBOL.
    </quote>

    Arne


