Seems like he’s been pushed into using LLMs as a way to cope with the deluge of LLM-generated security reports.

  • the_strange@feddit.org
    link
    fedilink
    English
    arrow-up
    0
    ·
    7 hours ago

    I write python code for a living. There is no way to sugarcoat it, the new unittests are slop. There already exists a good writeup of why, which I’m going to quote here:

    So, look. One shot rewriting the whole test suite in another language is probably not great to do, but what happened here is so much worse than you are expecting. https://github.com/RsyncProject/rsync/pull/903/
    This does not “translate tests into pytest” or a unit testing framework, it writes its own testing framework where tests are whole python scripts that redefine basic test functions in every script. Surely there would be a single way to “run rsync and get the results” - nope, well, there is, but then every test file will randomly redefine its own _run_and_capture function. So like now rsync needs a test suite for its test suite.
    If instead of telling an LLM to “rewrite the tests in python” you just searched “python testing” you would find the pytest docs. And then you would find examples. And then you could write fixtures to deduplicate all the prior shell script setup and teardown stuff, and so on. But since it was just “rewrite the tests in python” its now worse than before, and the odds of the rewrite actually being a 100% faithful translation are close to 0.

    https://neuromatch.social/@jonny/116666900898570791

    Yes right - and after reading about a dozen of the test scripts I can definitely see why using pytest would be useful here to consolidate some of the behavior that was repetitive and ad-hoc in the original testing scripts. Like the tests need to do repetitive things like set up test directories with different names and structures, run and capture results, setup and teardown a server, parameterize over a range of values. Done right, a pytest suite would have made perfect sense and improved both the existing tests by making them more systematic and uniform, but also made it easier to add new tests over time. However that is not what happened, and what did happen is much worse because it did the opposite of almost all those desirable qualities.

    https://neuromatch.social/@jonny/116671260017373441

    You should read the whole thread, the author goes into more detail, as to why you cannot trust the software any more after the rewrite of the unittests and why you should avoid any new release of rsync since then.

    • Arthur Besse@lemmy.ml
      link
      fedilink
      English
      arrow-up
      0
      ·
      3 hours ago

      One shot rewriting the whole test suite

      tridge’s blog post makes it clear that this was not “one-shotted” at all.

      You should read the whole thread

      I regret reading it; I’ll assume in good faith that it wasn’t LLM generated but it is ironically as confidently wrong as if it were.

      It almost (and should have) lost me when it started by quote-agreeing with someone else saying “rsync was basically done until the maintainer discovered vibecoding” - no, pay attention, it was not “basically done”, there were/are a mountain of CVEs!

      But then this got my interest:

      This does not “translate tests into pytest” or a unit testing framework, it writes its own testing framework where tests are whole python scripts that redefine basic test functions in every script. Surely there would be a single way to “run rsync and get the results” - nope, well, there is, but then every test file will randomly redefine its own _run_and_capture function.

      tridge says he has used pytest on other projects and had good reasons not to use it here; I’m inclined to believe him.

      But the notion of every test defining its own way to invoke rsync sounded like a valid criticism, and an easy one to verify, so I checked: It turns out that there is in fact a common run_rsync function which is used by the majority of the tests. One test defines its own _run_and_capture function (which differs in that it writes the output to a file, for reasons I didn’t investigate), and it looks like a few others invoke rsync other ways, but the majority of them use the common function.

      So, that rambling thread’s sole concrete criticism of rsync’s new python tests turns out to be false.

    • fruitcantfly@programming.dev
      link
      fedilink
      arrow-up
      0
      ·
      4 hours ago

      I write python code for a living. There is no way to sugarcoat it, the new unittests are slop. There already exists a good writeup of why, which I’m going to quote here:

      They are not unit tests, they are integration tests. Which in my experience makes unit-testing frameworks like pytest a poor fit. I’ve also had to write my own framework, for that reason, despite preferring pytest for unit-testing.

      The author also greatly exaggerates the amount of code duplication: They claim that “tests are whole python scripts that redefine basic test functions in every script”, but in reality it is less than half of the tests that even define their own functions.

      Most basic functions are imported from a shared module (rsyncfns.py), and when they aren’t it’s mostly because the code needs to do something different. From what I can see, there is some code duplication that could be moved to the shared module, and some code that could be refactored, but it’s a modest amount