Add files via upload

Peter Norvig 2022-10-29 15:43:10 -07:00 committed by GitHub
parent d873ed007e
commit f9ef5e1a98


@ -449,23 +449,22 @@
"\n",
"# Conclusions and speculations\n",
"\n",
"Generative models can produce impressive results, both for final answers and for step-by-step reasoning. They are improving rapidly; it seems that every month sees a new improved result. But current models could be improved:\n",
"Generative models can produce impressive results, both for final answers and for step-by-step reasoning. They are improving rapidly; it seems that every month sees a new improved result. But current models could be improved. Here are some thoughts:\n",
"- They are vulnerable to reproducing poor quality training data. (I suspect the `b.pop(0)` stems from this.)\n",
"- They are good locally, but can have trouble keeping the focus all the way through a problem. (Stashing a character on the list `c` seemed like a good idea locally, but contributes nothing globally.)\n",
"- They can hallucinate incorrect statements. This is a big issue in tasks like mathematics, where the small difference between the statements \"*x* < 4\" and \"*x* > 4\" makes a big difference to the outcome. In normal natural language, there is more redundancy and less chance for a single character difference to cause such big problems.\n",
"- They need to be trained to provide trust. The Minerva model generates code, but does not generate documentation or tests that would build trust in the code.\n",
"- The majority voting method is quick and easy, but incomplete. A better architecture would be to force consensus: if different runs produce different final answers, the system should have a way to reconcile the differences, figuring how and why the minority answers were generated, and making sure that mistakes in reasoning are not repeated in the majority answer.\n",
"- The models should learn from interactions. Currently they are trained on a large corpus, then fine-tuned on a specific subject matter, and then run with appropriate prompts. If the prompt asks for step-by-step reasoning, the model can generate that, but then it doesn't learn anything from the process of solving the problem (whether it gets it right or wrong); every new problem posed to it is the same as the first problem. In the article [*Learning by Distilling Context*](https://arxiv.org/abs/2209.15189), the authors suggest an approach where a model is conditioned to predict the final answer and the step-by-step reasoning, given the problem description and the prompting instructions (such as \"show your reasoning step by step\"). The system is then fine-tuned to predict the final answer from the problem desscription, without seeing any prompting instructions or step-by-step reasoning. This approach has an interesting parallel to the [Dreyfus model of skill acquisition](https://www.bumc.bu.edu/facdev-medicine/files/2012/03/Dreyfus-skill-level.pdf), in which novices work by the rote application of rules. This works in routine situations, but the novice does not have a complete understanding of the contexts in which the rules will not apply. An expert uses their situational experience to arrive at a solution without the explicit application of rules. So the fine-tuning in this architecture can be seen as a process of building contextual arrangement and compiling step-by-step rules into immediate action.\n",
"- The enoder-decoder transformer model was designed for dealing with natural language, for which we don't know the true grammar; exceptions are more common than rules; and the acceptability of sentences is subjective and varies from person to person, place to place, and time to time. But none of those things apply to formal languages such as Python. We know exactly what the rules for a valid program are, yet we don't have a good way of incorporating that knowledge into the transformer model. Certainly we still need something like the transformer model, because we need to know that the variable name `i` usually references an integer, while the pair `(x, y)` often references a point in 2D space, and so on. These things are not mentioned in the formal grammar of Python. An approach that could combine the formal grammar rules and the learned transformer model would be welcome.\n",
"- The eminent computer scientist Edsger Dijkstra predicted that machine learning, especially with gradient descent, could never be applied to programming, writing \"*In the discrete world of computing, there is no meaningful metric in which 'small' changes and 'small' effects go hand in hand, and there never will be.*\" Systems like AlphaCode have proven him partially wrong, but further progress would be easier if our programming languages were designed in such a way that the space of programs could be more easily explored by making small changes, and if it was faster to evaluate the quality of a program. Perhaps we'd be better off with functional languages that facilitate caching of intermediate results, so that when a small change is suggested, recomputing the program mostly uses precomputed results.\n",
"- In modern software development many artifacts are produced. There's the code, but also documentation, test suites, design documents, performance timing results, user experience experiments and results, traces of user interactions, and so on. And then there's a machine learning model. We can optimize the machine learning model by feeding it inputs, examining the outputs, and modifying the model to minimize the loss between the expected and observed outputs. This is possible because the model is differentiable. But the machine learning model is just a small part of the overall software development process. If all the other parts could be incorporated into an end-to-end differntiable model, the process of evolving the system would be easier. Consider the scenario where the user experience researchers do an experiment comparring ten different user interfaces, and determine which one is best. The engineers then go implement that UI. Sometime later, the world changes: maybe the blend of users is different, maybe users migrate to devices with a different screen size. What would trigger an update to the UI? today, we rely on institutional memory: someone says, \"Hey, I remember that UX study a few years back; maybe we should look at it again and see if a different UI would be better.\" But if the experiment documents and everything else were all in an end-to-end model, then the model itself could detect when a change is warranted. Building languages that allow for the incorporation of all these different kinds of documents is a challenge for the future.\n"
"- In modern software development many artifacts are produced. There's the code, but also documentation, test suites, design documents, performance timing results, user experience experiments and results, traces of user interactions, and so on. And then there's a machine learning model. We can optimize the machine learning model by feeding it inputs, examining the outputs, and modifying the model to minimize the loss between the expected and observed outputs. This is possible because the model is differentiable. But the machine learning model is just a small part of the overall software development process. If all the other parts could be incorporated into an end-to-end differntiable model, the process of evolving the system would be easier. Consider the scenario where the user experience researchers do an experiment comparring ten different user interfaces, and determine which one is best. The engineers then go implement that UI. Sometime later, the world changes: maybe the blend of users is different, maybe users migrate to devices with a different screen size. What would trigger an update to the UI? today, we rely on institutional memory: someone says, \"Hey, I remember that UX study a few years back; maybe we should look at it again and see if a different UI would be better.\" But if the experiment documents and everything else were all in an end-to-end model, then the model itself could detect when a change is warranted. Building languages that allow for the incorporation of all these different kinds of documents is a challenge for the future.\n",
"- I'd prefer to find a way to train on complete software systems, with all the documentation, etc. Failing that, maybe we could find a better training set of shorter programs. I'm not a big fan of the programming contest training set, because I believe that programming contest problems have some unusual properties that don't hold for real-world problems. [Kevin Wang](https://blog.kevmo314.com/), a programming contest champion, told me a few of his tricks:\n",
" - \"*I save the most time by just observing that a problem is an adaptation of a common problem. For a problem like [2016 day 10](http://adventofcode.com/2016/day/10), it's just topological sort.*\" This suggests that the contest problems have a bias towards retrieving an existing solution (and adapting it) rather than synthesizing a new solution.\n",
" - \"*I think specifically for [AoC](https://adventofcode.com/) it's important to just read the input/output and skip all the instructions first. Especially for the first few days, you can guess what the problem is based on the sample input/output.*\" Kevin is saying that the input/output examples alone are sufficient to solve the easier problems; in the interest of speed he doesn't even read the problem description. I don't want to encourage a programming assistant that learns not to read the problem.\n",
" - \"*I also try to minimize the amount of code I write: each line of code is just another chance for a typo.*\" This is why AlphaCode learned to write code with one-letter variable names, and with no comments or docstrings.\n",
" - \"*Ultimately though it just comes down to a lot of practice and there's not really any secret tricks.*\" Again, this suggests Kevin is doing fast pattern recognition rather than slow case-by-case analysis and proof. That approach works great for programming contests, but maybe not as well for real-world problems.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {