🧠

phind-codellama-34b-v2

by phind Model ID: hf-model--phind--phind-codellama-34b-v2
FNI 6.2
Top 83%

"We've fine-tuned Phind-CodeLlama-34B-v1 on an additional 1.5B tokens high-quality programming-related data, achieving **73.8% pass@1** on HumanEval. It's the current state-of-the-art amongst open-source models. Furthermore, this model is **instruction-tuned......"

🔗 View Source
Audited 6.2 FNI Score
34B Params
4k Context
2.6K Downloads
H100+ ~28GB Est. VRAM

⚡ Quick Commands

đŸĻ™ Ollama Run
ollama run phind-codellama-34b-v2
🤗 HF Download
huggingface-cli download phind/phind-codellama-34b-v2
đŸ“Ļ Install Lib
pip install -U transformers
📊

Engineering Specs

⚡ Hardware

Parameters
34B
Architecture
LlamaForCausalLM
Context Length
4K
Model Size
125.7GB

🧠 Lifecycle

Library
-
Precision
float16
Tokenizer
-

🌐 Identity

Source
HuggingFace
License
Open Access
💾

Est. VRAM Benchmark

~28GB

Analyze Hardware

* Technical estimation for FP16/Q4 weights. Does not include OS overhead or long-context batching. For Technical Reference Only.

đŸ•¸ī¸ Neural Mesh Hub

Interconnecting Research, Data & Ecosystem

📈 Interest Trend

--

* Real-time activity index across HuggingFace, GitHub and Research citations.

No similar models found.

đŸ”ŦTechnical Deep Dive

Full Specifications [+]
---

🚀 What's Next?

⚡ Quick Commands

đŸĻ™ Ollama Run
ollama run phind-codellama-34b-v2
🤗 HF Download
huggingface-cli download phind/phind-codellama-34b-v2
đŸ“Ļ Install Lib
pip install -U transformers
đŸ–Ĩī¸

Hardware Compatibility

Multi-Tier Validation Matrix

Live Sync
🎮 Compatible

RTX 3060 / 4060 Ti

Entry 8GB VRAM
🎮 Compatible

RTX 4070 Super

Mid 12GB VRAM
đŸ’ģ Compatible

RTX 4080 / Mac M3

High 16GB VRAM
🚀 Compatible

RTX 3090 / 4090

Pro 24GB VRAM
đŸ—ī¸ Compatible

RTX 6000 Ada

Workstation 48GB VRAM
🏭 Compatible

A100 / H100

Datacenter 80GB VRAM
â„šī¸

Pro Tip: Compatibility is estimated for 4-bit quantization (Q4). High-precision (FP16) or ultra-long context windows will significantly increase VRAM requirements.

README

**Phind-CodeLlama-34B-v2**

We've fine-tuned Phind-CodeLlama-34B-v1 on an additional 1.5B tokens high-quality programming-related data, achieving 73.8% pass@1 on HumanEval. It's the current state-of-the-art amongst open-source models.

Furthermore, this model is instruction-tuned on the Alpaca/Vicuna format to be steerable and easy-to-use.

More details can be found on our blog post.

Model Details

This model is fine-tuned from Phind-CodeLlama-34B-v1 and achieves 73.8% pass@1 on HumanEval.

Phind-CodeLlama-34B-v2 is multi-lingual and is proficient in Python, C/C++, TypeScript, Java, and more.

Dataset Details

We fined-tuned on a proprietary dataset of 1.5B tokens of high quality programming problems and solutions. This dataset consists of instruction-answer pairs instead of code completion examples, making it structurally different from HumanEval. LoRA was not used -- both models are a native finetune. We used DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in 15 hours on 32 A100-80GB GPUs. We used a sequence length of 4096 tokens.

How to Get Started with the Model

Make sure to install Transformers from the main git branch:

pip install git+https://github.com/huggingface/transformers.git

How to Prompt the Model

This model accepts the Alpaca/Vicuna instruction format.

For example:

### System Prompt
You are an intelligent programming assistant.

### User Message
Implement a linked list in C++

### Assistant
...

How to reproduce HumanEval Results

To reproduce our results:


from transformers import AutoTokenizer, LlamaForCausalLM
from human_eval.data import write_jsonl, read_problems
from tqdm import tqdm

# initialize the model

model_path = "Phind/Phind-CodeLlama-34B-v2"
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# HumanEval helper

def generate_one_completion(prompt: str):
    tokenizer.pad_token = tokenizer.eos_token
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)

    # Generate
    generate_ids = model.generate(inputs.input_ids.to("cuda"), max_new_tokens=384, do_sample=True, top_p=0.75, top_k=40, temperature=0.1)
    completion = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    completion = completion.replace(prompt, "").split("\n\n\n")[0]

    return completion

# perform HumanEval
problems = read_problems()

num_samples_per_task = 1
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in tqdm(problems)
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

# run `evaluate_functional_correctness samples.jsonl` in your HumanEval code sandbox

Bias, Risks, and Limitations

This model has undergone very limited testing. Additional safety testing should be performed before any real-world deployments.

Training details

  • Hardware Type: 32x A100-80GB
  • Hours used: 480 GPU-hours
  • Cloud Provider: AWS
  • Compute Region: us-east-1
ZEN MODE â€ĸ README

**Phind-CodeLlama-34B-v2**

We've fine-tuned Phind-CodeLlama-34B-v1 on an additional 1.5B tokens high-quality programming-related data, achieving 73.8% pass@1 on HumanEval. It's the current state-of-the-art amongst open-source models.

Furthermore, this model is instruction-tuned on the Alpaca/Vicuna format to be steerable and easy-to-use.

More details can be found on our blog post.

Model Details

This model is fine-tuned from Phind-CodeLlama-34B-v1 and achieves 73.8% pass@1 on HumanEval.

Phind-CodeLlama-34B-v2 is multi-lingual and is proficient in Python, C/C++, TypeScript, Java, and more.

Dataset Details

We fined-tuned on a proprietary dataset of 1.5B tokens of high quality programming problems and solutions. This dataset consists of instruction-answer pairs instead of code completion examples, making it structurally different from HumanEval. LoRA was not used -- both models are a native finetune. We used DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in 15 hours on 32 A100-80GB GPUs. We used a sequence length of 4096 tokens.

How to Get Started with the Model

Make sure to install Transformers from the main git branch:

pip install git+https://github.com/huggingface/transformers.git

How to Prompt the Model

This model accepts the Alpaca/Vicuna instruction format.

For example:

### System Prompt
You are an intelligent programming assistant.

### User Message
Implement a linked list in C++

### Assistant
...

How to reproduce HumanEval Results

To reproduce our results:


from transformers import AutoTokenizer, LlamaForCausalLM
from human_eval.data import write_jsonl, read_problems
from tqdm import tqdm

# initialize the model

model_path = "Phind/Phind-CodeLlama-34B-v2"
model = LlamaForCausalLM.from_pretrained(model_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# HumanEval helper

def generate_one_completion(prompt: str):
    tokenizer.pad_token = tokenizer.eos_token
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096)

    # Generate
    generate_ids = model.generate(inputs.input_ids.to("cuda"), max_new_tokens=384, do_sample=True, top_p=0.75, top_k=40, temperature=0.1)
    completion = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    completion = completion.replace(prompt, "").split("\n\n\n")[0]

    return completion

# perform HumanEval
problems = read_problems()

num_samples_per_task = 1
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in tqdm(problems)
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

# run `evaluate_functional_correctness samples.jsonl` in your HumanEval code sandbox

Bias, Risks, and Limitations

This model has undergone very limited testing. Additional safety testing should be performed before any real-world deployments.

Training details

  • Hardware Type: 32x A100-80GB
  • Hours used: 480 GPU-hours
  • Cloud Provider: AWS
  • Compute Region: us-east-1

📝 Limitations & Considerations

  • â€ĸ Benchmark scores may vary based on evaluation methodology and hardware configuration.
  • â€ĸ VRAM requirements are estimates; actual usage depends on quantization and batch size.
  • â€ĸ FNI scores are relative rankings and may change as new models are added.
  • ⚠ License Unknown: Verify licensing terms before commercial use.
  • â€ĸ Source: Unknown
📜

Cite this model

Academic & Research Attribution

BibTeX
@misc{hf_model__phind__phind_codellama_34b_v2,
  author = {phind},
  title = {undefined Model},
  year = {2026},
  howpublished = {\url{https://huggingface.co/phind/phind-codellama-34b-v2}},
  note = {Accessed via Free2AITools Knowledge Fortress}
}
APA Style
phind. (2026). undefined [Model]. Free2AITools. https://huggingface.co/phind/phind-codellama-34b-v2
🔄 Daily sync (03:00 UTC)

AI Summary: Based on Hugging Face metadata. Not a recommendation.

📊 FNI Methodology 📚 Knowledge Baseâ„šī¸ Verify with original source

đŸ›Ąī¸ Model Transparency Report

Verified data manifest for traceability and transparency.

100% Data Disclosure Active

🆔 Identity & Source

id
hf-model--phind--phind-codellama-34b-v2
author
phind
tags
transformerspytorchllamatext-generationcode llamalicense:llama2model-indextext-generation-inferenceendpoints_compatibleregion:us

âš™ī¸ Technical Specs

architecture
LlamaForCausalLM
params billions
34
context length
4,096
vram gb
28
vram is estimated
true
vram formula
VRAM ≈ (params * 0.75) + 2GB (KV) + 0.5GB (OS)

📊 Engagement & Metrics

likes
833
downloads
2,584

Free2AITools Constitutional Data Pipeline: Curated disclosure mode active. (V15.x Standard)