Dr Cog

Dr Cog@mander.xyz · 8 months ago

They’ll get the picture in the fine letter so make sure you give them your best one-finger salute

Dr Cog@mander.xyz · edited 11 months ago

The site lists 4 torrents and none of them are mine. I assume this is because my ISP assigned a dynamic IP address

Dr Cog@mander.xyz · edited 11 months ago

- Wayne Gretzky

Dr Cog@mander.xyz · 1 year ago

Wearing traditional black clothes is not necessarily racist. Wearing blackface has a long history of being directly racist.

There isn’t an equivalent with wearing Nazi clothes.

Dr Cog@mander.xyz · 1 year ago

Removing an episode for being racist (even though this one wasn’t racist) is not cultural genocide. Wearing blackface (I know, this wasn’t blackface) is not a culture that needs to be preserved.

Dr Cog@mander.xyz · 1 year ago

Many publications on arxiv (or biorxiv or medrxiv, etc) are early drafts, or otherwise not scientifically rigorous and wouldn’t be published in an actual journal due to failing peer review. Take what you find there with a grain of salt.

Although you should also take any single peer-reviewed article with a grain of salt as well.

Dr Cog@mander.xyz · 1 year ago

Specifically, the corresponding author (whose email should be listed in the publication).

Keep in mind you may get an “uncorrected proof” or “author copy”, since many authors don’t want to run afoul of their publisher’s guidelines on giving out copies.

Dr Cog@mander.xyz · 1 year ago

Get rid of that middle sentence.

“I heard about garage sailing. Turns out, they’re not as buoyant as they look”

Dr Cog@mander.xyz · 1 year ago

The work is not reproduced in its entirety. Simply using the work in its entirety is not a violation of copyright law, just as reading a book or watching a movie (even if pirated) is not a violation. The reproduction of that work is the violation, and LLMs simply do not store the works in their entirety nor are they capable of reproducing them.

Dr Cog@mander.xyz · 1 year ago

The argument is less that an LLM is a human and more that it is not a copyright violation to use a material to train the LLM. By current legal definitions, it is fair use unless the material is able to be reproduced in its entirety (or at least, in some meaningful way).

Dr Cog@mander.xyz · 1 year ago

It’s only black box because nobody has the time (likely years to decades) to wade through the layers of a finished model to check every node and weight.

This is exactly correct, except you’re also not accounting for the insane amount of computational power that would be necessary to backtrack a single output of a single model. This is why it is a black box. It simply is not possible on a meaningful level.

So if math and computer science isn’t an exact science, what is?

Things that are reproducible with known inputs and outputs, allowing for all components to be studied and explained. As an example from my field: if you damage the dorsolateral prefrontal cortex in a fully grown adult, they will have the impulse control of a three-year-old. We know this because we have observed damage to this area in multiple individuals, and can measure the effects based on the severity of that damage.

In contrast, if you provide the same billion-parameter neural network identical inputs, you will not receive identical outputs.
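
To make that concrete, here is a minimal sketch of one way the same input can yield different outputs: sampling the next token from a probability distribution instead of always taking the top choice. The logits, the temperature value, and the function names are all made up for illustration; this is a toy, not how any particular model is implemented, and it only covers sampling, not other sources of variation.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Draw one token index from softmax(logits / temperature)."""
    scaled = [value / temperature for value in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def draw_tokens(n, seed=None):
    """Sample n tokens from the same (hypothetical) logits, i.e. the same input."""
    logits = [2.0, 1.5, 0.3]  # assumed scores for three candidate tokens
    rng = random.Random(seed)  # seed=None means a fresh random state each run
    return [sample_token(logits, 1.0, rng) for _ in range(n)]


# Identical input every time, yet the sampled outputs vary from draw to draw.
print(draw_tokens(20))
```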

Dr Cog@mander.xyz · 1 year ago

Look, I understand why you think this. I thought this too when I was first beginning to learn machine learning and data science. But I’ve now been working with machine learning models, including neural networks, for nearly a decade, and the truth is that it is nearly impossible to track the path of an input to a given output in machine learning models other than regression-based models and decision tree-based models.

There is an entire field of data science devoted to explaining how these models arrive at their conclusions. It’s called “explainable AI” or “xAI”, and I’ve published a few papers exploring the utility of these methods. The basic explanation for how they work is that we run hundreds of thousands of different models and then do statistical analysis to estimate why the models arrived at their conclusion. It isn’t an exact science, however.
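
A rough sketch of that idea, assuming a permutation-style perturbation approach (one common xAI technique, not necessarily the one used in my papers): treat the model as something you can only call, scramble one input feature at a time across many repeats, and average how much the output moves. The `black_box` function below is a hypothetical stand-in for a trained model, and the data is random.

```python
import random
import statistics


def black_box(features):
    """Hypothetical opaque model: pretend we can only call it, not inspect it."""
    x0, x1, x2 = features
    return 3.0 * x0 + 0.5 * x1 * x1 + 0.0 * x2


def perturbation_importance(model, rows, n_repeats=200, seed=0):
    """Estimate each feature's influence by shuffling it and measuring output change."""
    rng = random.Random(seed)
    baseline = [model(row) for row in rows]
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        diffs = []
        for _ in range(n_repeats):
            shuffled = column[:]
            rng.shuffle(shuffled)
            for row, new_value, base in zip(rows, shuffled, baseline):
                perturbed = list(row)
                perturbed[j] = new_value  # replace only feature j
                diffs.append(abs(model(perturbed) - base))
        scores.append(statistics.mean(diffs))
    return scores


data_rng = random.Random(1)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
scores = perturbation_importance(black_box, rows)
print(scores)  # a statistical estimate of influence per feature, not an exact trace
```

The scores say which inputs the output is sensitive to on average. They do not show the path any single input took through the model, which is why this stays an estimate.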

Dr Cog@mander.xyz · 1 year ago

You really don’t understand how these models work and you should learn about them before you make statements about them.

Machine learning models are, almost by definition, non-deterministic.

Dr Cog@mander.xyz · edited 1 year ago

Neither citation nor compensation is necessary for fair use, which is what occurs when an original work is used for its concepts but not reproduced.

Dr Cog@mander.xyz · 1 year ago

I agree. But that isn’t what AI is doing, because it doesn’t store the actual book and it isn’t possible to reproduce any part in a format that is recognizable as the original work.

Dr Cog@mander.xyz · 1 year ago

LOL

We understand less about how LLMs generate a single output than we do about the human brain. You clearly have no experience developing models.

Dr Cog@mander.xyz · 1 year ago

I don’t need to negotiate with Sarah Silverman if I’m handed her book by a friend, and neither should an AI.

Dr Cog@mander.xyz · 1 year ago

I never said laws change quickly.

Marijuana use only recently became publicly accepted by the majority of people. As a result, it is now completely legal in many states and will likely soon be legal federally.

This kind of thing would never happen in China.

Dr Cog@mander.xyz · 1 year ago

The difference between democracy and fascism is that the laws are quickly changing due to public opinion in America, but that does not happen in China.

Dr Cog@mander.xyz · 1 year ago

We do not federate with Meta