November 2023

Benchmarks for ChatGPT & Co:

These November benchmarks evaluate GPT-4 Turbo and the latest GPT-3.5, and introduce Mistral 7B OpenChat.

Trustbit Leaderboard November 2023

The Trustbit benchmarks evaluate the models in terms of their suitability for digital product development. The higher the score, the better.

☁️ - Cloud models with proprietary license
✅ - Open source models that can be run locally without restrictions
🦙 - Local models with Llama license

  
    


  
  
  


    
      
| model | code | crm | docs | integrate | marketing | reason | final 🏆 | Cost | Speed |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4 v1/0314 ☁️ | 85 | 88 | 95 | 52 | 88 | 50 | 76 | 7.18 € | 0.77 rps |
| GPT-4 Turbo v3/1106-preview ☁️ | 54 | 75 | 98 | 52 | 88 | 62 | 71 | 2.52 € | 0.66 rps |
| GPT-3.5 v2/0613 ☁️ | 62 | 79 | 76 | 75 | 81 | 48 | 70 | 0.35 € | 0.96 rps |
| GPT-3.5 v3/1106 ☁️ | 56 | 68 | 71 | 63 | 78 | 59 | 66 | 0.24 € | 2.33 rps |
| GPT-3.5-instruct 0914 ☁️ | 51 | 90 | 69 | 60 | 88 | 32 | 65 | 0.36 € | 2.35 rps |
| GPT-3.5 v1/0301 ☁️ | 38 | 75 | 67 | 67 | 82 | 38 | 61 | 0.36 € | 1.76 rps |
| Mistral 7B OpenChat-3.5 f16 ✅ | 53 | 72 | 72 | 49 | 88 | 31 | 61 | 0.59 € | 1.85 rps |
| Llama2 70B Hermes b8 🦙 | 48 | 76 | 46 | 76 | 62 | 36 | 58 | 13.10 € | 0.13 rps |
| Mistral 7B Instruct f16 ✅ | 36 | 68 | 68 | 44 | 74 | 36 | 54 | 0.68 € | 1.60 rps |
| Mistral 7B OpenOrca f16 ✅ | 42 | 57 | 76 | 21 | 78 | 26 | 50 | 0.55 € | 1.98 rps |
| Llama2 13B Hermes b8 🦙 | 39 | 20 | 29 | 61 | 60 | 43 | 42 | 5.71 € | 0.19 rps |
| Llama2 70B chat b4 🦙 | 13 | 51 | 53 | 29 | 64 | 27 | 40 | 4.06 € | 0.27 rps |
| Llama2 13B Hermes f16 🦙 | 32 | 15 | 30 | 51 | 56 | 43 | 38 | 0.57 € | 1.93 rps |
| Llama2 13B Vicuna-1.5 f16 🦙 | 36 | 25 | 27 | 18 | 77 | 43 | 38 | 0.78 € | 1.39 rps |
| Llama2 70B chat b8 🦙 | 1 | 53 | 34 | 27 | 71 | 27 | 36 | 10.24 € | 0.16 rps |
| Llama2 13B Puffin b8 🦙 | 22 | 9 | 34 | 31 | 56 | 39 | 32 | 8.29 € | 0.13 rps |
| Llama2 13B chat f16 🦙 | 0 | 38 | 15 | 30 | 75 | 8 | 27 | 0.64 € | 1.71 rps |
| Mistral 7B Zephyr-β f16 ✅ | 23 | 34 | 27 | 44 | 29 | 4 | 27 | 0.60 € | 1.81 rps |
| Llama2 13B chat b8 🦙 | 0 | 38 | 8 | 30 | 75 | 8 | 26 | 4.01 € | 0.27 rps |
| Llama2 7B chat f16 🦙 | 0 | 33 | 14 | 27 | 50 | 20 | 24 | 0.65 € | 1.67 rps |
| Mistral 7B f16 ✅ | 8 | 4 | 20 | 42 | 52 | 12 | 23 | 1.05 € | 1.04 rps |
| Llama2 13B Puffin f16 🦙 | 14 | 9 | 9 | 5 | 54 | 19 | 18 | 1.71 € | 0.64 rps |
| Llama2 7B f16 🦙 | 0 | 0 | 4 | 2 | 28 | 4 | 6 | 1.13 € | 0.97 rps |
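
The page does not say how the final score is derived from the six category scores. As a rough sanity check, the sketch below assumes an unweighted mean, which is an assumption rather than anything the source states, and measures how far the published final score sits from that mean for a handful of rows copied from the leaderboard. The names `ROWS`, `mean_score`, and `max_deviation` are made up for illustration.

```python
# Six category scores (code, crm, docs, integrate, marketing, reason)
# followed by the published final score, copied from the leaderboard.
ROWS = {
    "GPT-4 v1/0314": ([85, 88, 95, 52, 88, 50], 76),
    "GPT-4 Turbo v3/1106-preview": ([54, 75, 98, 52, 88, 62], 71),
    "GPT-3.5 v2/0613": ([62, 79, 76, 75, 81, 48], 70),
    "Mistral 7B OpenChat-3.5 f16": ([53, 72, 72, 49, 88, 31], 61),
    "Llama2 70B Hermes b8": ([48, 76, 46, 76, 62, 36], 58),
    "Llama2 7B f16": ([0, 0, 4, 2, 28, 4], 6),
}


def mean_score(scores):
    """Unweighted mean of the category scores (ASSUMED aggregation)."""
    return sum(scores) / len(scores)


def max_deviation(rows):
    """Largest absolute gap between the published final score and the mean."""
    return max(abs(mean_score(scores) - final) for scores, final in rows.values())


if __name__ == "__main__":
    for name, (scores, final) in ROWS.items():
        print(f"{name}: mean={mean_score(scores):.2f}, published final={final}")
    print(f"largest gap: {max_deviation(ROWS):.2f} points")
```

For these six rows the gap stays below one point, so the published values are consistent with a plain average up to rounding. The check does not rule out other aggregation schemes.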



      
    
  

Trustbit Leaderboard November 2023

The benchmark categories in detail

New ChatGPT Models Evaluated

Better language support in GPT-4 Turbo

Mistral 7B OpenChat catches up with ChatGPT 3

Beam Search Improves Accuracy of Language Models

Trustbit LLM Benchmarks Archive

Want to learn more about using ChatGPT & Co?