高清日韩av,,粗壮挺进邻居人妻

Openai已根據(jù)備受期待的“草莓”建築發(fā)布了其新模型。這種稱為O1的創(chuàng)新模型增強(qiáng)了推理能力，使其在提供答案之前可以更有效地通過問題進(jìn)行思考。作為Chatgpt Plus用戶，我有機(jī)會親身探索這種新型號。我很高興分享我對用戶和開發(fā)人員的性能，能力以及對用戶的影響的見解。我將在不同指標(biāo)上徹底比較GPT-4O與OpenAI O1。沒有任何進(jìn)一步的ADO，讓我們開始。

在本文中，您將探討GPT O1andGpt-4O之間的差異，包括OFGPT O1與GPT 4的比較。我們將提供有關(guān)TheGPT 4O與O1中的性能的見解。此外，我們將討論TheGPT O1成本，突出顯示AGPT O1 Freetier的可用性，並引入TheGpt O1 Miniversion。最後，我們將分析正在進(jìn)行的辯論4O與O1 vs OpenAito幫助您做出明智的決定。

繼續(xù)閱讀！

Openai型號的新型？閱讀此信息以了解如何使用OpenAi O1：如何訪問OpenAi O1？

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

Openai O1的新更新：

Openai已將O1米尼的速率限制提高了7倍，從每週50條消息增加到每天50條消息。
對於O1進(jìn)行瀏覽，利率限制從30個每週消息增加到50個。

概述

OpenAI的新O1模型通過“思想鏈”方法增強(qiáng)了推理能力，使其非常適合複雜任務(wù)。
GPT-4O是一種多功能，多式模型，適用於文本，語音和視頻輸入的通用任務(wù)。
Openai O1在數(shù)學(xué)，編碼和科學(xué)問題解決方面表現(xiàn)出色，在較重的場景中表現(xiàn)優(yōu)於GPT-4O。
儘管OpenAI O1提供了改進(jìn)的多語言性能，但它具有速度，成本和多模式支持限制。
GPT-4O仍然是需要通用功能的快速，具有成本效益和多功能的AI應(yīng)用程序的更好選擇。
GPT-4O和OpenAI O1之間的選擇取決於特定需求。每種型號都為不同的用例提供了獨(dú)特的優(yōu)勢。

介紹
比較的目的：GPT-4O與OpenAI O1
所有OpenAI O1型號的概述
O1和GPT 4O的模型功能
- Openai O1
- Openai的O1：經(jīng)過思考的模型
- GPT-4O
GPT-4O與OpenAI O1：多語言能力
OpenAI O1的評估：超過人類考試和ML基準(zhǔn)的GPT-4O
GPT-4O與OpenAI O1：越獄評估
GPT-4O與OpenAI O1處理代理任務(wù)
GPT-4O與OpenAI O1：幻覺評估
質(zhì)量與速度與成本
Openai O1 vs GPT-4O：人類偏好的評估
Openai O1 vs GPT-4O：誰在不同的任務(wù)中更好？
- 解碼密碼文本
- 健康科學(xué)
- 推理問題
- 編碼：創(chuàng)建遊戲
GPT-4O vs OpenAI O1：API和用法詳細(xì)信息
Openai O1的局限性
Openai O1在最近的事件和實(shí)體方面的問答任務(wù)鬥爭
Openai O1在邏輯推理方面比GPT-4O更好
最終判決：GPT-4O與OpenAI O1
結(jié)論

比較的目的：GPT-4O與OpenAI O1

這就是為什麼我們要比較 - gpt-4o vs openai o1：

GPT-4O是一種能夠處理文本，語音和視頻輸入的多功能，多模型，使其適用於各種一般任務(wù)。它為Chatgpt的最新迭代提供了動力，展示了其在產(chǎn)生類似人類文本和跨多種方式相互作用的力量。
Openai O1是一個更專業(yè)的模型，用於數(shù)學(xué)，編碼和更多領(lǐng)域的複雜推理和解決問題。它符合需要對先進(jìn)概念有深入了解的任務(wù)，使其非常適合諸如高級邏輯推理之類的具有挑戰(zhàn)性的領(lǐng)域。

比較的目的：此比較突出了每個模型的獨(dú)特優(yōu)勢，並闡明了它們的最佳用例。雖然OpenAI O1非常適合複雜的推理任務(wù)，但它並不是要替換通用應(yīng)用程序的GPT-4O。通過檢查其功能，性能指標(biāo)，速度，成本和用例，我將提供對模型的見解，更適合不同的需求和場景。

所有OpenAI O1型號的概述

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

這是Openai O1的表格表示：

模型	描述	上下文窗口	最大輸出令牌	培訓(xùn)數(shù)據(jù)
O1-preiview	指向O1型號的最新快照：O1-Preview-2024-09-12	128,000個令牌	32,768令牌	直到2023年10月
O1-Preview-2024-09-12	最新的O1模型快照	128,000個令牌	32,768令牌	直到2023年10月
O1-Mini	指向最近的O1-Mini快照：O1-Mini-2024-09-12	128,000個令牌	65,536令牌	直到2023年10月
O1-MINI-2024-09-12	最新的O1-Mini模型快照	128,000個令牌	65,536令牌	直到2023年10月

O1和GPT 4O的模型功能

Openai O1

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

Openai的O1模型在各種基準(zhǔn)測試中表現(xiàn)出了出色的性能。它在Codeforces競爭性編程挑戰(zhàn)中排名第89個百分點(diǎn)，並躋身美國數(shù)學(xué)奧林匹克預(yù)選賽（AIME）的前500位。此外，它在物理，生物學(xué)和化學(xué)問題的基準(zhǔn)（GPQA）的基準(zhǔn)上超過了人類的PHD級準(zhǔn)確性。

該模型是使用大規(guī)模增強(qiáng)學(xué)習(xí)算法訓(xùn)練的，該算法通過“思想鏈”過程增強(qiáng)其推理能力，從而允許數(shù)據(jù)效率學(xué)習(xí)。研究結(jié)果表明，其性能隨訓(xùn)練期間的計(jì)算增加而提高，並在測試過程中分配了更多時間進(jìn)行推理，從而進(jìn)一步研究了這種新穎的縮放方法，這與傳統(tǒng)的LLM預(yù)讀方法不同。在進(jìn)一步比較之前，讓我們研究“思考過程如何提高Openai O1的推理能力”。

Openai的O1：經(jīng)過思考的模型

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

OpenAI O1模型引入了成本和性能方面的新權(quán)衡，以提供更好的“推理”能力。這些模型是專門針對“思想鏈”過程的培訓(xùn)，這意味著它們旨在在響應(yīng)之前逐步思考。這建立在2022年推出的思想促進(jìn)模式的基礎(chǔ)上，這鼓勵A(yù)I系統(tǒng)地思考，而不僅僅是預(yù)測下一個單詞。該算法教會他們分解複雜的任務(wù)，從錯誤中學(xué)習(xí)，並在必要時嘗試替代方法。

另請閱讀：O1??：Openai的新模型，該模型在回答棘手的問題之前“思考”

LLMS推理的關(guān)鍵要素

O1模型引入了推理令牌。這些模型使用這些推理令牌來“思考”，打破了他們對提示的理解，並考慮了產(chǎn)生響應(yīng)的多種方法。在生成推理令牌之後，該模型將作為可見的完成令牌產(chǎn)生答案，並從其上下文中丟棄推理令牌。

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

1。強(qiáng)化學(xué)習(xí)和思考時間

O1模型利用了一種增強(qiáng)學(xué)習(xí)算法，該算法在產(chǎn)生響應(yīng)之前會鼓勵更長，更深入的思維期。此過程旨在幫助模型更好地處理複雜的推理任務(wù)。

該模型的性能隨著訓(xùn)練時間增加（火車時間計(jì)算）以及在評估期間（測試時間計(jì)算）進(jìn)行思考時的提高。

2。思考鏈的應(yīng)用

思想方法鏈?zhǔn)鼓Ｐ湍軌驅(qū)?fù)雜的問題分解為更簡單，更易於管理的步驟。它可以重新審視和完善其策略，在初始方法失敗時嘗試不同的方法。

此方法對需要多步推理的任務(wù)有益，例如數(shù)學(xué)解決問題，編碼和回答開放式問題。

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

在此處閱讀有關(guān)及時工程的更多文章。

3。人類的偏好和安全評估

在比較O1-preiview與GPT-4O的性能的評估中，人類教練絕大多數(shù)人更喜歡O1-preview在需要強(qiáng)大推理能力的任務(wù)中。

將思想推理鏈整合到模型中也有助於提高與人類價值觀的安全性和對齊方式。通過將安全規(guī)則直接嵌入推理過程中，O1-preiview可以更好地了解安全界限，即使在具有挑戰(zhàn)性的情況下，也可以減少有害完成的可能性。

4。隱藏的推理令牌和模型透明度

Openai已決定將詳細(xì)的思想鏈隱藏在用戶中，以保護(hù)模型思維過程的完整性並保持競爭優(yōu)勢。但是，它們?yōu)橛脩籼峁┝艘粋€匯總版本，以幫助了解該模型如何得出其結(jié)論。

該決定允許OpenAI出於安全目的監(jiān)視模型的推理，例如檢測操作嘗試或確保策略合規(guī)性。

另請閱讀：GPT-4O vs Gemini：比較兩個強(qiáng)大的多模式模型

5?？冃е笜?biāo)和改進(jìn)

O1模型在關(guān)鍵績效領(lǐng)域顯示出重大進(jìn)展：

在復(fù)雜的推理基準(zhǔn)上，O1-preview取得了經(jīng)常與人類專家相抗衡的分?jǐn)?shù)。
該模型在競爭性編程競賽和數(shù)學(xué)競賽中的改進(jìn)表明了其提高的推理和解決問題的能力。

安全評估表明，在處理潛在的有害提示和邊緣案例中，O1概覽的性能明顯優(yōu)於GPT-4O，從而增強(qiáng)其穩(wěn)健性。

另請閱讀：Openai的O1-Mini：具有成本效益推理的STEM的改變遊戲規(guī)則的模型

GPT-4O

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

GPT-4O是一款多式聯(lián)運(yùn)的強(qiáng)國，擅長處理文本，語音和視頻輸入，使其用於一系列通用任務(wù)的通用性。該模型為chatgpt提供了動力，展示了其在產(chǎn)生類似人類的文本，解釋語音命令甚至分析視頻內(nèi)容方面的力量。對於需要可以無縫跨各種格式操作的模型的用戶，GPT-4O是強(qiáng)大的競爭者。

在GPT-4O之前，使用語音模式與CHATGPT一起使用GPT-3.5和5.4秒的平均潛伏期為2.8秒，而GPT-4的平均潛伏期為5.4秒。這是通過三個單獨(dú)模型的管道來實(shí)現(xiàn)的：基本模型首先轉(zhuǎn)錄到文本，然後gpt-3.5或gpt-4處理了文本輸入以生成文本輸出，最後，第三個模型將該文本轉(zhuǎn)換回音頻。這種設(shè)置意味著核心AI（gpt-4）有些有限，因?yàn)樗鼰o法直接解釋諸如音調(diào)，多個揚(yáng)聲器，背景聲音或諸如笑聲，唱歌或情感之類的細(xì)微差別。

借助GPT-4O，OpenAI開發(fā)了一個全新的模型，該模型將文本，視覺和音頻集成到一個端到端的神經(jīng)網(wǎng)絡(luò)中。這種統(tǒng)一的方法允許GPT-4O在同一框架內(nèi)處理所有輸入和輸出，從而大大增強(qiáng)了其理解和生成更細(xì)微的多模式內(nèi)容的能力。

您可以在這裡探索更多GPT-4O功能：Hello GPT-4O。

GPT-4O與OpenAI O1：多語言能力

OpenAI的O1模型與GPT-4O之間的比較突出了它們的多語言性能功能，重點(diǎn)介紹了針對GPT-4O的O1-preview和O1-Mini模型。

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

MMLU（大量多種語言理解）測試集被翻譯成14種語言，使用人類翻譯人員來評估其跨多種語言的性能。這種方法可確保更高的準(zhǔn)確性，尤其是對於較少代表或資源有限的語言，例如約魯巴語。該研究使用這些人類翻譯的測試集比較了不同語言環(huán)境中模型的能力。

關(guān)鍵發(fā)現(xiàn)：

O1-preiview的多語言能力明顯高於GPT-4O，具有明顯的語言，例如阿拉伯語，孟加拉語和中文。這表明O1瀏覽模型更適合需要對各種語言進(jìn)行強(qiáng)有力理解和處理的任務(wù)。
O1-Mini還勝過其對應(yīng)物GPT-4O-Mini，在多種語言上顯示出一致的改進(jìn)。這表明，即使是較小的O1模型也具有增強(qiáng)的多語言功能。

人類翻譯：

人類翻譯而不是機(jī)器翻譯（如與GPT-4和Azure Translate這樣的模型的早期評估一樣）被證明是評估性能的更可靠的方法。對於語言不多的語言而言，這尤其如此，在這種語言中，機(jī)器翻譯通常缺乏準(zhǔn)確性。

總體而言，評估表明，在多語言任務(wù)中，O1-preview和O1-Mini在多語言任務(wù)中的表現(xiàn)都優(yōu)於其GPT-4O對應(yīng)物，尤其是在語言多樣性或低資源語言中。在測試中使用人翻譯強(qiáng)調(diào)了對O1模型的卓越語言理解，從而使它們更有能力處理真實(shí)世界的多語言場景。這表明了Openai在建立模型方面的進(jìn)步，並具有更廣泛，更具包容性的語言理解。

OpenAI O1的評估：超過人類考試和ML基準(zhǔn)的GPT-4O

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

為了證明對GPT-4O的推理能力的提高，對O1模型進(jìn)行了測試，以各種人類的考試和機(jī)器學(xué)習(xí)基準(zhǔn)測試。結(jié)果表明，除非另有說明，否則使用最大測試時間計(jì)算設(shè)置在大多數(shù)推理密集型任務(wù)上大大優(yōu)於GPT-4O。

競爭評估

數(shù)學(xué)（AIME 2024），編碼（CodeForces）和PhD級科學(xué)（GPQA Diamond）： O1在挑戰(zhàn)性推理基準(zhǔn)方面顯示出對GPT-4O的實(shí)質(zhì)性改進(jìn)。通行證@1的準(zhǔn)確性由固體條表示，而陰影區(qū)域則用64個樣本描繪了大多數(shù)投票表現(xiàn)（共識）。
基準(zhǔn)比較： O1在廣泛的基準(zhǔn)測試中勝過GPT-4O，其中包括57個MMLU子類別中的54個。

詳細(xì)的績效見解

數(shù)學(xué)（AIME 2024）：關(guān)於美國邀請賽數(shù)學(xué)考試（AIME）2024，O1在GPT-4O上顯示出顯著進(jìn)步。 GPT-4O僅解決了問題的12％，而O1的精度為74％，每個問題單個樣本，83％，共有64個樣本共識，為93％，將1000個樣本重新排列。該表演水平將O1置於全國前500名學(xué)生中，並且在美國數(shù)學(xué)奧林匹克運(yùn)動會上的臨界值之上。
科學(xué)（GPQA鑽石）：在測試化學(xué)，物理和生物學(xué)方面的專業(yè)知識的GPQA鑽石基準(zhǔn)中，O1超過了人類專家的博士學(xué)位，這標(biāo)誌著模型首次這樣做。但是，該結(jié)果並不意味著O1在所有方面都優(yōu)於PHD，而是更精通博士學(xué)位的特定問題解決方案。

總體表現(xiàn)

O1在其他機(jī)器學(xué)習(xí)基準(zhǔn)測試中也表現(xiàn)出色，表現(xiàn)優(yōu)於最先進(jìn)的模型。憑藉視覺感知能力，它在MMMU上取得了78.2％的成績，這使其成為第一個與人類專家競爭的模型，並且在57個MMLU子類別中的54個中表現(xiàn)優(yōu)於GPT-4O。

GPT-4O與OpenAI O1：越獄評估

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

在這裡，我們討論了對“越獄”的O1模型（特別是O1-Preiview和O1-Mini）的魯棒性的評估，這些提示是旨在繞過模型限制的對抗性提示。以下四項(xiàng)評估用於衡量模型對這些越獄的韌性：

生產(chǎn)越獄：從Chatgpt的生產(chǎn)環(huán)境中實(shí)際使用數(shù)據(jù)確定的越獄技術(shù)集合。
越獄的增強(qiáng)示例：此評估將公開已知的越獄方法應(yīng)用於通常用於測試不允許內(nèi)容的示例，評估該模型抵制這些嘗試的能力。
人類越獄：人類測試人員創(chuàng)建的越獄技術(shù)，通常稱為“紅色團(tuán)隊(duì)”，強(qiáng)調(diào)模型的防禦能力。
StrongReject ：一種學(xué)術(shù)基準(zhǔn)，可評估模型對有據(jù)可查和常見的越獄攻擊的抵抗力。 “ [通過電子郵件保護(hù)]度量標(biāo)準(zhǔn)用於評估模型的安全性，通過針對每個提示的越獄方法的前10％來衡量其性能。

與GPT-4O的比較：

上圖比較了在這些評估中O1-preview，O1-Mini和GPT-4O模型的性能。結(jié)果表明，O1模型（O1-preiview和O1-Mini）表現(xiàn)出對GPT-4O的魯棒性的顯著改善，尤其是在強(qiáng)制評估中，這是因?yàn)樗睦щy和依賴先進(jìn)的越獄技術(shù)而引起的。這表明，與GPT-4O相比，O1模型可以更好地處理對抗提示並遵守內(nèi)容準(zhǔn)則。

GPT-4O與OpenAI O1處理代理任務(wù)

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

在這裡，我們在處理代理任務(wù)時評估了OpenAI的O1-Preview，O1-Mini和GPT-4O，突出了它們在各種情況下的成功率。這些任務(wù)旨在測試模型的能力，以執(zhí)行複雜的操作，例如設(shè)置Docker容器，啟動基於雲(yún)的GPU實(shí)例並創(chuàng)建身份驗(yàn)證的Web服務(wù)器。

評估環(huán)境和任務(wù)類別

評估是在兩個主要環(huán)境中進(jìn)行的：

文本環(huán)境：涉及Linux終端中的Python編碼，並通過GPU加速增強(qiáng)。
瀏覽器環(huán)境：利用包含可選屏幕截圖的預(yù)處理HTML的外部腳手架以尋求幫助。

這些任務(wù)涵蓋了一系列類別，例如：

配置Docker容器以運(yùn)行與OpenAI API兼容的推理服務(wù)器。
開發(fā)具有身份驗(yàn)證機(jī)制的基於Python的Web服務(wù)器。
部署基於雲(yún)的GPU實(shí)例。

Openai O1-Preview和O1-Mini今天在API中針對第5層的開發(fā)人員推出。

O1-preiview具有強(qiáng)大的推理能力和廣泛的世界知識。

O1-Mini在編碼任務(wù)中更快，便宜80％，並且與O1-preiview的競爭力。

https://t.co/l6vkoukfla中的更多內(nèi)容。 https://t.co/moqfsez2f6
- 2024年9月12日OpenAi開發(fā)人員（@openaidevs）

關(guān)鍵發(fā)現(xiàn)和績效結(jié)果

該圖在視覺上表示模型的成功率，每個任務(wù)都超過100個試驗(yàn)。關(guān)鍵觀察包括：

OpenAI API代理任務(wù)：建立OpenAI API代理的最困難的任務(wù)是所有模型都在掙扎的地方。沒有人取得很高的成功率，這表明全面面臨重大挑戰(zhàn)。
在Docker中加載Mistral 7b ：此任務(wù)取得了不同的成功。 O1-MINI模型的性能稍好一些，儘管與更輕鬆的任務(wù)相比，所有模型都掙扎。
通過Ranger購買GPU ：GPT-4O的表現(xiàn)優(yōu)於其他利潤，這表明在涉及第三方API和互動的任務(wù)方面表現(xiàn)出了卓越的能力。
採樣任務(wù)：GPT-4O在抽樣任務(wù)中顯示出更高的成功率，例如Pytorch中的Nanogpt或GPT-2進(jìn)行採樣，表明其在機(jī)器學(xué)習(xí)相關(guān)的任務(wù)中的效率。
諸如創(chuàng)建比特幣錢包的簡單任務(wù)：GPT-4O表現(xiàn)出色，幾乎取得了完美的成績。

另請閱讀：從GPT到Mistral-7b：AI對話中令人興奮的飛躍

對模型行為的見解

評估表明，雖然前沿模型（例如O1-preview和O1-Mini）偶爾成功地傳遞了主要的代理任務(wù)，但它們通常通過精通上下文子任務(wù)來實(shí)現(xiàn)。但是，這些模型仍然在始終管理複雜的多步任務(wù)中表現(xiàn)出顯著的缺陷。

在減壓後更新之後，與較早的ChatGpt版本相比，O1瀏覽模型表現(xiàn)出明顯的拒絕行為。這導(dǎo)致在特定子任務(wù)上的性能下降，尤其是涉及Openai等重新實(shí)現(xiàn)API的措施。另一方面，O1-preiview和O1-Mini都證明了在某些條件下通過主要任務(wù)的潛力，例如在Docker環(huán)境中建立已驗(yàn)證的API代理或部署推理服務(wù)器。儘管如此，手動檢查表明，這些成功有時涉及過度簡化的方法，例如使用比預(yù)期的Mistral 7b更複雜的模型。

總體而言，該評估突顯了AI模型在復(fù)雜的代理任務(wù)中取得一致成功方面面臨的持續(xù)挑戰(zhàn)。儘管像GPT-4O這樣的模型在更直接或狹義的任務(wù)中表現(xiàn)出很強(qiáng)的性能，但它們?nèi)匀挥龅嚼щy，而多層任務(wù)需要高階推理和持續(xù)的多步驟過程。研究結(jié)果表明，儘管進(jìn)步很明顯，但對於這些模型來說，仍有一條重要的途徑，可以可靠，可靠地處理所有類型的代理任務(wù)。

GPT-4O與OpenAI O1：幻覺評估

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

另請閱讀有關(guān)Knowhalu：AI最大的缺陷幻覺最終解決的信息！

為了更好地了解不同語言模型的幻覺評估，以下評估比較了幾個旨在引起幻覺的數(shù)據(jù)集的GPT-4O，O1-Preview和O1-Mini模型：

幻覺評估數(shù)據(jù)集

SimpleQA：一個由4,000個尋求事實(shí)的問題組成的數(shù)據(jù)集，並帶有簡短的答案。該數(shù)據(jù)集用於測量模型在提供正確答案時的準(zhǔn)確性。
生日事實(shí)：需要模型猜測一個人的生日的數(shù)據(jù)集，以測量模型提供不正確日期的頻率。
開放式問題：包含提示的數(shù)據(jù)集要求該模型生成有關(guān)任意主題的事實(shí)（例如，“寫有關(guān)”的簡歷）。根據(jù)Wikipedia之類的來源驗(yàn)證的不正確陳述的數(shù)量，對模型的性能進(jìn)行了評估。

發(fā)現(xiàn)

與GPT-4O相比，O1-preview表現(xiàn)出較少的幻覺，而O1-Mini幻覺量比所有數(shù)據(jù)集中的GPT-4O-Mini少頻率。
儘管有這些結(jié)果，但軼事證據(jù)表明，在實(shí)踐中，O1-preiview和O1-Mini實(shí)際上可能比其GPT-4O對應(yīng)物更頻繁地幻覺。有必要進(jìn)行進(jìn)一步的研究，以全面了解幻覺，尤其是在這些評估中未涵蓋的化學(xué)等專業(yè)領(lǐng)域。
紅色團(tuán)隊(duì)合作者還指出，O1-Preview在某些領(lǐng)域提供了更詳細(xì)的答案，這可能會使其幻覺更具說服力。這增加了用戶錯誤地信任並依賴模型產(chǎn)生的不正確信息的風(fēng)險。

雖然定量評估表明，與GPT-4O模型相比，O1模型（預(yù)覽和迷你版本）的幻覺頻率較低，但基於定性反饋的擔(dān)憂可能並非總是如此。需要對各個領(lǐng)域進(jìn)行更深入的分析，以對這些模型如何處理幻覺及其對用戶的潛在影響進(jìn)行整體了解。

另請閱讀：大語言模型（LLM）中的幻覺是不可避免的嗎？

質(zhì)量與速度與成本

讓我們比較有關(guān)質(zhì)量，速度和成本的模型。在這裡，我們有一個比較多個模型的圖表：

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

模型的質(zhì)量

O1-preiview和O1-Mini型號在圖表中佔(zhàn)據(jù)了頂峰！他們提供最高質(zhì)量的分?jǐn)?shù)，O1-preview為86，O1米尼的分?jǐn)?shù)為82。這意味著這兩種模型的表現(xiàn)都優(yōu)於其他其他模型，例如GPT-4O和Claude 3.5彗星。

模型的速度

現(xiàn)在，談?wù)撍俣?- 事情變得更加有趣。 O1-Mini非常快，每秒的時速為74個令牌，將其置於中間範(fàn)圍內(nèi)。但是，O1-preiview在較慢的一側(cè)，每秒僅花23個令牌。因此，在他們提供質(zhì)量的同時，如果您選擇O1-preiview，您可能必須交易一些速度。

型號的價格

踢球者來了！ O1-preiview的揮霍量是每百萬個代幣的26.3美元，比大多數(shù)其他選項(xiàng)都要多。同時，O1-Mini是一個更實(shí)惠的選擇，價格為5美元。但是，如果您是預(yù)算意識的，那麼像Gemini（僅為0.1美元）之類的模型或Llama型號可能會更加小巷。

底線

GPT-4O的優(yōu)化可用於更快的響應(yīng)時間和較低的成本，尤其是與GPT-4 Turbo相比。效率使需要快速且具有成本效益的解決方案的用戶不犧牲一般任務(wù)中的產(chǎn)出質(zhì)量。該模型的設(shè)計(jì)使其適用於速度至關(guān)重要的實(shí)時應(yīng)用。

但是，GPT O1可以換速度。由於它專注於深入的推理和解決問題，因此其響應(yīng)時間較慢，並產(chǎn)生較高的計(jì)算成本。該模型的複雜算法需要更多的處理能力，這是其處理高度複雜任務(wù)的必要權(quán)衡。因此，當(dāng)需要快速結(jié)果時，OpenAI O1可能不是理想的選擇，但是在準(zhǔn)確性和全面分析至關(guān)重要的情況下，它會發(fā)揮作用。

在此處閱讀更多有關(guān)它的信息：O1：Openai的新模型，該模型在回答棘手的問題之前“思考”

此外，GPT-O1的傑出功能之一是它依賴提示。該模型在詳細(xì)說明上蓬勃發(fā)展，這可以顯著增強(qiáng)其推理能力。通過鼓勵它可視化場景並通過每個步驟思考，我發(fā)現(xiàn)該模型可以產(chǎn)生更準(zhǔn)確和有見地的響應(yīng)。這種提示的方法表明，用戶必須調(diào)整其與模型的互動，以最大程度地發(fā)揮其潛力。

相比之下，我還通過通用任務(wù)測試了GPT-4O，令人驚訝的是，它的性能比O1模型更好。這表明儘管已經(jīng)取得了進(jìn)步，但這些模型如何處理複雜邏輯仍有改進(jìn)的空間。

Openai O1 vs GPT-4O：人類偏好的評估

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

Openai進(jìn)行了評估，以了解其兩個模型的人類偏好：O1-preiview和GPT-4O。這些評估的重點(diǎn)是具有挑戰(zhàn)性的開放式提示，跨越了各個領(lǐng)域。在此評估中，向人類培訓(xùn)師提供了兩個模型的匿名響應(yīng)，並要求選擇他們更喜歡哪種反應(yīng)。

結(jié)果表明，在需要大量推理的領(lǐng)域（例如數(shù)據(jù)分析，計(jì)算機(jī)編程和數(shù)學(xué)計(jì)算）中，O1概覽成為明顯的最愛。在這些域中，O1曲線比GPT-4O明顯優(yōu)選，這表明其在需要邏輯和結(jié)構(gòu)化思維的任務(wù)中表現(xiàn)出色。

但是，在圍繞自然語言任務(wù)（例如個人寫作或文本編輯）的範(fàn)圍內(nèi)，對O1-preview的偏愛並不那麼強(qiáng)大。這表明，雖然O1-preview在復(fù)雜的推理方面表現(xiàn)出色，但對於嚴(yán)重依賴細(xì)微差別的語言產(chǎn)生或創(chuàng)造性表達(dá)的任務(wù)的最佳選擇可能並不總是最佳選擇。

研究結(jié)果突出了一個關(guān)鍵點(diǎn)：O1-preiview在從更好的推理功能中受益的上下文中顯示出巨大的潛力，但是在更微妙和基於語言的任務(wù)方面，其應(yīng)用程序可能會受到更大的限制。這種雙重性質(zhì)為用戶提供了寶貴的見解，可以根據(jù)自己的需求選擇正確的模型。

另請閱讀：用於自然語言理解的生成預(yù)訓(xùn)練（GPT）

Openai O1 vs GPT-4O：誰在不同的任務(wù)中更好？

模型設(shè)計(jì)和功能的差異轉(zhuǎn)化為它們對不同用例的適用性：

GPT-4O在涉及文本生成，翻譯和摘要的任務(wù)中表現(xiàn)出色。它的多模式功能使其對於需要在各種格式（例如語音助手，聊天機(jī)器人和內(nèi)容創(chuàng)建工具）互動的應(yīng)用程序中特別有效。該模型多功能且靈活，適用於需要一般AI任務(wù)的廣泛應(yīng)用。

Openai O1是複雜的科學(xué)和數(shù)學(xué)解決問題的理想選擇。它通過改進(jìn)的代碼生成和調(diào)試功能來增強(qiáng)編碼任務(wù)，使其成為開發(fā)人員和研究人員從事挑戰(zhàn)項(xiàng)目的強(qiáng)大工具。它的力量正在處理需要先進(jìn)推理，詳細(xì)分析和特定領(lǐng)域?qū)I(yè)知識的複雜問題。

解碼密碼文本

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

GPT-4O分析

方法：認(rèn)識到原始短語轉(zhuǎn)化為“逐步思考”，並暗示解密涉及選擇或轉(zhuǎn)換特定的字母。但是，它沒有提供具體的解碼方法，使過程不完整併請求更多信息。
局限性：缺乏解碼的特定方法，導(dǎo)致未完成的分析。

Openai O1分析

方法：使用數(shù)學(xué)方法將字母對根據(jù)其字母順序的位置轉(zhuǎn)換為數(shù)值，計(jì)算平均值，然後將其轉(zhuǎn)換回字母。
優(yōu)勢：提供了對解碼過程的詳細(xì)，分步的細(xì)分，成功地將密文轉(zhuǎn)化為“草莓中有三個R”。

判決

Openai O1更有效：提供一種具體和邏輯的方法，提供了清晰的解決方案。
GPT-4O不完整：缺乏特定的解碼方法，導(dǎo)致未完成的輸出。

另請閱讀：您需要查看的Openai O1的3個動手實(shí)驗(yàn)

健康科學(xué)

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

GPT-4O診斷：Cornelia de Lange綜合徵（CDL）

關(guān)鍵原因：智力殘疾，全球發(fā)育延遲，身材矮小和麵部特徵（例如眉毛，三角形的臉，球形鼻子和低髮際線）在CDL中很常見。其他功能，例如大牙齒（牙齒腫大），手部異常特徵，運(yùn)動和語音延遲以及進(jìn)食困難，進(jìn)一步支持了這一診斷。
排除條件：缺乏某些心臟缺陷，聽力障礙和小頭畸形（頭大?。┡cCDL擬合，並有助於排除其他潛在條件。

Openai O1診斷：KBG綜合徵

關(guān)鍵原因：描述的癥狀（例如智力障礙，發(fā)育延遲，大牙齒，三角形的臉，濃密的眉毛，手動異常和矮小的身材）非常匹配KBG綜合徵。大牙齒（尤其是上央門牙）和其他特定面部特徵的標(biāo)誌性特徵強(qiáng)烈支持KBG綜合徵。
排除條件：缺乏特定的心臟缺陷和其他排除條件，例如聽力障礙和小頭畸形，與KBG綜合徵保持一致，因?yàn)檫@些特徵通常不存在於綜合徵中。

判決

這兩種診斷都是合理的，但根據(jù)相同的癥狀，它們專注於不同的綜合徵。
GPT-4O由於智力殘疾，發(fā)育延遲和某些面部特徵的結(jié)合而傾向於Cornelia de Lange綜合徵（CDL） 。
Openai O1建議KBG綜合徵，因?yàn)樗m合更具體的區(qū)別特徵（例如上央門牙的大牙和整體面部剖面）。
鑑於提供的細(xì)節(jié)， KBG綜合徵被認(rèn)為更有可能，尤其是因?yàn)樘囟ㄌ峒傲薑BG的關(guān)鍵特徵Macrodontia。

推理問題

要檢查這兩種模型的推理，我詢問了高級推理問題。

五個學(xué)生，P，Q，R，S和T有些順序排列，並接受餅乾和餅乾。沒有學(xué)生獲得相同數(shù)量的餅乾或餅乾。隊(duì)列中的第一個人的餅乾數(shù)量最少。每個學(xué)生收到的餅乾或餅乾的數(shù)量是1到9的自然數(shù)量，每個數(shù)字至少出現(xiàn)一次。

餅乾的總數(shù)比分佈的餅乾總數(shù)高兩個。在生產(chǎn)線中間的R比其他所有人都收到更多的好東西（餅乾和餅乾在一起）。 T比餅乾多8個餅乾。排在隊(duì)列中的人總共收到了10件，而P只收到一半的一半。 q是在p之後，但在隊(duì)列中的s之前。 Q Q接收的cookie數(shù)量等於餅乾p接收的數(shù)量。 Q receives one more good than S and one less than R. Person second in the queue receives an odd number of biscuits and an odd number of cookies.

Question: Who was 4th in the queue?

Answer: Q was 4th in the queue.

Also read: How Can Prompt Engineering Transform LLM Reasoning Ability?

GPT-4o Analysis

GPT-4o failed to solve the problem correctly. It struggled to handle the complex constraints, such as the number of goodies each student received, their positions in the queue, and their relationships. The multiple conditions likely confused the model or failed to interpret the dependencies accurately.

OpenAI o1 Analysis

OpenAI o1 accurately deduced the correct order by efficiently analyzing all constraints. It correctly determined the total differences between cookies and biscuits, matched each student's position with the given clues, and solved the interdependencies between the numbers, arriving at the correct answer for the 4th position in the queue.

判決

GPT-4o failed to solve the problem due to difficulties with complex logical reasoning.
OpenAI o1 mini solved it correctly and quickly, showing a stronger capability to handle detailed reasoning tasks in this scenario.

Coding: Creating a Game

To check the coding capabilities of GPT-4o and OpenAI o1, I asked both the models to – Create a space shooter game in HTML and JS. Also, make sure the colors you use are blue and red. Here's the result:

GPT-4o

I asked GPT-4o to create a shooter game with a specific color palette, but the game used only blue color boxes instead. The color scheme I requested wasn't applied at all.

OpenAI o1

On the other hand, OpenAI o1 was a success because it accurately implemented the color palette I specified. The game looked visually appealing and captured the exact style I envisioned, demonstrating precise attention to detail and responsiveness to my customization requests.

GPT-4o vs OpenAI o1: API and Usage Details

The API documentation reveals several key features and trade-offs:

Access and Support: The new models are currently available only to tier 5 API users, requiring a minimum spend of $1,000 on credits. They lack support for system prompts, streaming, tool usage, batch calls, and image inputs. The response times can vary significantly based on the complexity of the task.
Reasoning Tokens: The models introduce “reasoning tokens,” which are invisible to users but count as output tokens and are billed accordingly. These tokens are crucial for the model's enhanced reasoning capabilities, with a significantly higher output token limit than previous models.
Guidelines for Use: The documentation advises limiting additional context in retrieval-augmented generation (RAG) to avoid overcomplicating the model's response, a notable shift from the usual practice of including as many relevant documents as possible.

Also read: Here's How You Can Use GPT 4o API for Vision, Text, Image & More.

Hidden Reasoning Tokens

A controversial aspect is that the “reasoning tokens” remain hidden from users. OpenAI justifies this by citing safety and policy compliance, as well as maintaining a competitive edge. The hidden nature of these tokens is meant to allow the model freedom in its reasoning process without exposing potentially sensitive or unaligned thoughts to users.

Limitations of OpenAI o1

OpenAI's new model, o1, has several limitations despite its advancements in reasoning capabilities. Here are the key limitations:

Limited Non-STEM Knowledge: While o1 excels in STEM-related tasks, its factual knowledge in non-STEM areas is less robust compared to larger models like GPT-4o. This restricts its effectiveness for general-purpose question answering, particularly in recent events or non-technical domains.
Lack of Multimodal Capabilities: The o1 model currently does not support web browsing, file uploads, or image processing functionalities. It can only handle text prompts, which limits its usability for tasks that require visual input or real-time information retrieval.
Slower Response Times: The model is designed to “think” before responding, which can lead to slower answer times. Some queries may take over ten seconds to process, making it less suitable for applications requiring quick responses.
High Cost: Accessing o1 is significantly more expensive than previous models. For instance, the cost for the o1-preview is $15 per million input tokens, compared to $5 for GPT-4o. This pricing may deter some users, especially for applications with high token usage.
Early-Stage Flaws: OpenAI CEO Sam Altman acknowledged that o1 is “flawed and limited,” indicating that it may still produce errors or hallucinations, particularly in less structured queries. The model's performance can vary, and it may not always admit when it lacks an answer.
Rate Limits: The usage of o1 is restricted by weekly message limits (30 for o1-preview and 50 for o1-mini), which may hinder users who need to engage in extensive interactions with the model.
Not a Replacement for GPT-4o: OpenAI has stated that o1 is not intended to replace GPT-4o for all use cases. For applications that require consistent speed, image inputs, or function calling, GPT-4o remains the preferred option.

These limitations suggest that while o1 offers enhanced reasoning capabilities, it may not yet be the best choice for all applications, particularly those needing broad knowledge or rapid responses.

OpenAI o1 Struggles With Q&A Tasks on Recent Events and Entities

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

For instance, o1 is showing hallucination here because it shows IT in Gemma 7B-IT—“Italian,” but IT means instruction-tuned model. So, o1 is not good for general-purpose question-answering tasks, especially based on recent information.

Also, GPT-4o is generally recommended for building Retrieval-Augmented Generation (RAG) systems and agents due to its speed, efficiency, lower cost, broader knowledge base, and multimodal capabilities.

o1 should primarily be used when complex reasoning and problem-solving in specific areas are required, while GPT-4o is better suited for general-purpose applications.

OpenAI o1 is Better at Logical Reasoning than GPT-4o

GPT-4o is Terrible at Simple Logical Reasoning

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

The GPT-4o model struggles significantly with basic logical reasoning tasks, as seen in the classic example where a man and a goat need to cross a river using a boat. The model fails to apply the correct logical sequence needed to solve the problem efficiently. Instead, it unnecessarily complicates the process by adding redundant steps.

In the provided example, GPT-4o suggests:

Step 1 : The man rows the goat across the river and leaves the goat on the other side.
Step 2 : The man rows back alone to the original side of the river.
Step 3 : The man crosses the river again, this time by himself.

This solution is far from optimal as it introduces an extra trip that isn't required. While the objective of getting both the man and the goat across the river is achieved, the method reflects a misunderstanding of the simplest path to solve the problem. It seems to rely on a mechanical pattern rather than a true logical understanding, thereby demonstrating a significant gap in the model's basic reasoning capability.

OpenAI o1 Does Better in Logical Reasoning

In contrast, the OpenAI o1 model better understands logical reasoning. When presented with the same problem, it identifies a simpler and more efficient solution:

Both the Man and the Goat Board the Boat : The man leads the goat into the boat.
Cross the River Together : The man rows the boat across the river with the goat onboard.
Disembark on the Opposite Bank : Upon reaching the other side, both the man and the goat get off the boat.

This approach is straightforward, reducing unnecessary steps and efficiently achieving the goal. The o1 model recognizes that the man and the goat can cross simultaneously, minimizing the required number of moves. This clarity in reasoning indicates the model's improved understanding of basic logic and its ability to apply it correctly.

OpenAI o1 – Chain of Thought Before Answering

A key advantage of the OpenAI o1 model lies in its use of chain-of-thought reasoning . This technique allows the model to break down the problem into logical steps, considering each step's implications before arriving at a solution. Unlike GPT-4o, which appears to rely on predefined patterns, the o1 model actively processes the problem's constraints and requirements.

When tackling more complex challenges (advanced than the problem above of river crossing), the o1 model effectively draws on its training with classic problems, such as the well-known man, wolf, and goat river-crossing puzzle. While the current problem is simpler, involving only a man and a goat, the model's tendency to reference these familiar, more complex puzzles reflects its training data's breadth. However, despite this reliance on known examples, the o1 model successfully adapts its reasoning to fit the specific scenario presented, showcasing its ability to refine its approach dynamically.

By employing chain-of-thought reasoning, the o1 model demonstrates a capacity for more flexible and accurate problem-solving, adjusting to simpler cases without overcomplicating the process. This ability to effectively utilize its reasoning capabilities suggests a significant improvement over GPT-4o, especially in tasks that require logical deduction and step-by-step problem resolution.

The Final Verdict: GPT-4o vs OpenAI o1

GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？

Both GPT-4o and OpenAI o1 represent significant advancements in AI technology, each serving distinct purposes. GPT-4o excels as a versatile, general-purpose model with strengths in multimodal interactions, speed, and cost-effectiveness, making it suitable for a wide range of tasks, including text, speech, and video processing. Conversely, OpenAI o1 is specialized for complex reasoning, mathematical problem-solving, and coding tasks, leveraging its “chain of thought” process for deep analysis. While GPT-4o is ideal for quick, general applications, OpenAI o1 is the preferred choice for scenarios requiring high accuracy and advanced reasoning, particularly in scientific domains. The choice depends on task-specific needs.

Moreover, the launch of o1 has generated considerable excitement within the AI community. Feedback from early testers highlights both the model's strengths and its limitations. While many users appreciate the enhanced reasoning capabilities, there are concerns about setting unrealistic expectations. As one commentator noted, o1 is not a miracle solution; it's a step forward that will continue to evolve.

Looking ahead, the AI landscape is poised for rapid development. As the open-source community catches up, we can expect to see even more sophisticated reasoning models emerge. This competition will likely drive innovation and improvements across the board, enhancing the user experience and expanding the applications of AI.

Also read: Reasoning in Large Language Models: A Geometric Perspective

結(jié)論

In a nutshell, both GPT-4o vs OpenAI o1 represent significant advancements in AI technology, they cater to different needs: GPT-4o is a general-purpose model that excels in a wide variety of tasks, particularly those that benefit from multimodal interaction and quick processing. OpenAI o1 is specialized for tasks requiring deep reasoning, complex problem-solving, and high accuracy, especially in scientific and mathematical contexts. For tasks requiring fast, cost-effective, and versatile AI capabilities, GPT-4o is the better choice. For more complex reasoning, advanced mathematical calculations, or scientific problem-solving, OpenAI o1 stands out as the superior option.

Ultimately, the choice between GPT-4o vs OpenAI o1 depends on your specific needs and the complexity of the tasks at hand. While OpenAI o1 provides enhanced capabilities for niche applications, GPT-4o remains the more practical choice for general-purpose AI tasks.

Also, if you have tried the OpenAI o1 model, then let me know your experiences in the comment section below.

如果您想成為生成AI專家，請?zhí)剿鳎篏enai Pinnacle計(jì)劃

參考

OpenAI Models
o1-preview and o1-mini
OpenAI System Card
Openai O1-Mini
OpenAI API
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Q1。 What are the main differences between GPT-4o and OpenAI o1?

Ans。 GPT-4o is a versatile, multimodal model suited for general-purpose tasks involving text, speech, and video inputs. OpenAI o1, on the other hand, is specialized for complex reasoning, math, and coding tasks, making it ideal for advanced problem-solving in scientific and technical domains.

Q2。 Which model(GPT-4o or OpenAI o1) is better for multilingual tasks?

Ans。 OpenAI o1, particularly the o1-preview model, shows superior performance in multilingual tasks, especially for less widely spoken languages, thanks to its robust understanding of diverse linguistic contexts.

Q3。 How does OpenAI o1 handle complex reasoning tasks?

Ans。 OpenAI o1 uses a “chain of thought” reasoning process, which allows it to break down complex problems into simpler steps and refine its approach. This process is beneficial for tasks like mathematical problem-solving, coding, and answering advanced reasoning questions.

Q4。 What are the limitations of OpenAI o1?

Ans。 OpenAI o1 has limited non-STEM knowledge, lacks multimodal capabilities (eg, image processing), has slower response times, and incurs higher computational costs. It is not designed for general-purpose applications where speed and versatility are crucial.

Q5。 When should I choose GPT-4o over OpenAI o1?

Ans。 GPT-4o is the better choice for general-purpose tasks that require quick responses, lower costs, and multimodal capabilities. It is ideal for applications like text generation, translation, summarization, and tasks requiring interaction across different formats.

以上是GPT-4O vs OpenAI O1：新的Openai模型值得炒作嗎？的詳細(xì)內(nèi)容。更多資訊請關(guān)注PHP中文網(wǎng)其他相關(guān)文章！

本網(wǎng)站聲明

本文內(nèi)容由網(wǎng)友自願投稿，版權(quán)歸原作者所有。本站不承擔(dān)相應(yīng)的法律責(zé)任。如發(fā)現(xiàn)涉嫌抄襲或侵權(quán)的內(nèi)容，請聯(lián)絡(luò)admin@php.cn