Fixed typo in AlphaCode.ipynb
parent 99d8034d73
commit 7ffd9b6f1b
@@ -453,7 +453,7 @@
"- They are vulnerable to reproducing poor quality training data. (I suspect the `b.pop(0)` stems from this.)\n",
"- They are good locally, but can have trouble keeping the focus all the way through a problem. (Stashing a character on the list `c` seemed like a good idea locally, but contributes nothing globally.)\n",
"- They can hallucinate incorrect statements. This is a big issue in tasks like mathematics, where the small difference between the statements \"*x* < 4\" and \"*x* > 4\" makes a big difference to the outcome. In normal natural language, there is more redundancy and less chance for a single character difference to cause such big problems.\n",
"- They need to be trained to provide trust. The Minerva model generates code, but does not generate documentation or tests that would build trust in the code.\n",
"- They need to be trained to provide trust. The AlphaCode model generates code, but does not generate documentation or tests that would build trust in the code.\n",
"- The majority voting method is quick and easy, but incomplete. A better architecture would be to force consensus: if different runs produce different final answers, the system should have a way to reconcile the differences, figuring how and why the minority answers were generated, and making sure that mistakes in reasoning are not repeated in the majority answer.\n",
"- The models should learn from interactions. Currently they are trained on a large corpus, then fine-tuned on a specific subject matter, and then run with appropriate prompts. If the prompt asks for step-by-step reasoning, the model can generate that, but then it doesn't learn anything from the process of solving the problem (whether it gets it right or wrong); every new problem posed to it is the same as the first problem. In the article [*Learning by Distilling Context*](https://arxiv.org/abs/2209.15189), the authors suggest an approach where a model is conditioned to predict the final answer and the step-by-step reasoning, given the problem description and the prompting instructions (such as \"show your reasoning step by step\"). The system is then fine-tuned to predict the final answer from the problem desscription, without seeing any prompting instructions or step-by-step reasoning. This approach has an interesting parallel to the [Dreyfus model of skill acquisition](https://www.bumc.bu.edu/facdev-medicine/files/2012/03/Dreyfus-skill-level.pdf), in which novices work by the rote application of rules. This works in routine situations, but the novice does not have a complete understanding of the contexts in which the rules will not apply. An expert uses their situational experience to arrive at a solution without the explicit application of rules. So the fine-tuning in this architecture can be seen as a process of building contextual arrangement and compiling step-by-step rules into immediate action.\n",
"- The enoder-decoder transformer model was designed for dealing with natural language, for which we don't know the true grammar; exceptions are more common than rules; and the acceptability of sentences is subjective and varies from person to person, place to place, and time to time. But none of those things apply to formal languages such as Python. We know exactly what the rules for a valid program are, yet we don't have a good way of incorporating that knowledge into the transformer model. Certainly we still need something like the transformer model, because we need to know that the variable name `i` usually references an integer, while the pair `(x, y)` often references a point in 2D space, and so on. These things are not mentioned in the formal grammar of Python. An approach that could combine the formal grammar rules and the learned transformer model would be welcome.\n",
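The "majority voting" bullet above refers to running the model several times and keeping the most common final answer. As a rough illustration only (the helper name `majority_vote` and the sampled answers are hypothetical, not taken from the notebook or the AlphaCode system), plain majority voting can be sketched as:

```python
# Minimal sketch of majority voting over sampled final answers (illustrative only).
from collections import Counter

def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer among several sampled runs."""
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

# Example: five sampled runs, three of which agree. No attempt is made to
# reconcile or learn from the two minority answers, which is the gap the
# bullet points out.
assert majority_vote(["42", "41", "42", "42", "17"]) == "42"
```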