hanzceo/SimpleEmbed-dev1

Sentence Similarity
sentence-transformers
Safetensors
English
feature-extraction
Generated from Trainer
dataset_size:219902
loss:MatryoshkaLoss
loss:MultipleNegativesRankingLoss

Instructions to use hanzceo/SimpleEmbed-dev1 with libraries, inference providers, notebooks, and local apps.

  • Libraries
  • sentence-transformers

    How to use hanzceo/SimpleEmbed-dev1 with sentence-transformers:

    from sentence_transformers import SentenceTransformer
    
    model = SentenceTransformer("hanzceo/SimpleEmbed-dev1")
    
    sentences = [
        "<p dir=\"auto\"><strong>Is your feature request related to a problem? Please describe.</strong><br>\nscipy.cluster.hierarchy.linkage uses double (float64) to store and do its computation for hierarchical clustering. However, I have a very large dataset (292000x292000) that I would like to perform hclust on but my computer is RAM limited. I have 252GB RAM and I think the clustering algorithm should be able to work on my dataset when all values are stored and computed using float16s instead.</p>\n<p dir=\"auto\">For large datasets on machines with insufficient RAM to store and compute on Arrays of float64s, it would be awesome if computation could be done on a different precision float to reduce the memory footprint.</p>\n<p dir=\"auto\">Additionally, adding choices for datatypes could be very useful for almost all scipy functions.</p>\n<p dir=\"auto\"><strong>Describe the solution you'd like</strong><br>\nAllow for an argument to specify what datatype you'd like to use (e.g. np.float64, np.float32, np.float16)</p>\n<p dir=\"auto\">The argument could be like dtype='np.double' by default but changable to whatever datatype is chosen.</p>",
        "<p dir=\"auto\">One representative error:</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"torch/csrc/autograd/functions/init.cpp:220:37: error: address of overloaded function 'getTupleAttr' does not match required type '_object *(_object *, void *)'\n  {(char*)&quot;output_padding&quot;, (getter)getTupleAttr&lt;ConvBackwardBackward, std::vector&lt;int&gt;, ConvParams,\n                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ntorch/csrc/autograd/functions/init.cpp:82:11: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Convert'\nPyObject* getTupleAttr(PyObject* obj, void* _unused)\"><pre class=\"notranslate\"><code class=\"notranslate\">torch/csrc/autograd/functions/init.cpp:220:37: error: address of overloaded function 'getTupleAttr' does not match required type '_object *(_object *, void *)'\n  {(char*)\"output_padding\", (getter)getTupleAttr&lt;ConvBackwardBackward, std::vector&lt;int&gt;, ConvParams,\n                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ntorch/csrc/autograd/functions/init.cpp:82:11: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Convert'\nPyObject* getTupleAttr(PyObject* obj, void* _unused)\n</code></pre></div>\n<p dir=\"auto\">The cause of the problem is <a class=\"commit-link\" data-hovercard-type=\"commit\" data-hovercard-url=\"https://github.com/pytorch/pytorch/commit/aa911939a328eff55c9b28b39ed3c43507ba8a2a/hovercard\" href=\"https://github.com/pytorch/pytorch/commit/aa911939a328eff55c9b28b39ed3c43507ba8a2a\"><tt>aa91193</tt></a>:</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"   {(char*)&quot;output_padding&quot;, (getter)getTupleAttr&lt;ConvForward, std::vector&lt;int&gt;, ConvParams,\n-                                         &amp;ConvParams::output_padding, long, PyInt_FromLong&gt;, NULL, NULL, NULL},\n+                                         &amp;ConvParams::output_padding, int64_t, PyInt_FromLong&gt;, NULL, NULL, NULL},\"><pre class=\"notranslate\"><code class=\"notranslate\">   {(char*)\"output_padding\", (getter)getTupleAttr&lt;ConvForward, std::vector&lt;int&gt;, ConvParams,\n-                                         &amp;ConvParams::output_padding, long, PyInt_FromLong&gt;, NULL, NULL, NULL},\n+                                         &amp;ConvParams::output_padding, int64_t, PyInt_FromLong&gt;, NULL, NULL, NULL},\n</code></pre></div>\n<p dir=\"auto\">It seems that on clang, changing the type parameter here is sufficient to cause template instantiation to fail.</p>\n<p dir=\"auto\">Maybe the easiest way to fix this is to write a more portable version of PyInt_FromLong (and friends) which always returns <code class=\"notranslate\">int64_t</code>.</p>",
        "<p dir=\"auto\">I try the scipy ward clustering, when calculating linkage, it report follow error:</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"ward_h = linkage(X, method='ward', metric='euclidean')\nPython(2557,0x7fff732cc310) malloc: *** mach_vm_map(size=18446744067627675648) failed (error code=3)\n*** error: can't allocate region\n*** set a breakpoint in malloc_error_break to debug\n---------------------------------------------------------------------------\nMemoryError                               Traceback (most recent call last)\n&lt;ipython-input-10-769ae7c53f7c&gt; in &lt;module&gt;()\n----&gt; 1 ward_h = linkage(X, method='ward', metric='euclidean')\n\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/hierarchy.pyc in linkage(y, method, metric)\n    652             Z = np.zeros((n - 1, 4))\n    653             _hierarchy_wrap.linkage_euclid_wrap(dm, Z, X, m, n,\n--&gt; 654                                               int(_cpy_euclid_methods[method]))\n    655     return Z\n    656 \n\nMemoryError: out of memory while computing linkage\"><pre class=\"notranslate\"><code class=\"notranslate\">ward_h = linkage(X, method='ward', metric='euclidean')\nPython(2557,0x7fff732cc310) malloc: *** mach_vm_map(size=18446744067627675648) failed (error code=3)\n*** error: can't allocate region\n*** set a breakpoint in malloc_error_break to debug\n---------------------------------------------------------------------------\nMemoryError                               Traceback (most recent call last)\n&lt;ipython-input-10-769ae7c53f7c&gt; in &lt;module&gt;()\n----&gt; 1 ward_h = linkage(X, method='ward', metric='euclidean')\n\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/hierarchy.pyc in linkage(y, method, metric)\n    652             Z = np.zeros((n - 1, 4))\n    653             _hierarchy_wrap.linkage_euclid_wrap(dm, Z, X, m, n,\n--&gt; 654                                               int(_cpy_euclid_methods[method]))\n    655     return Z\n    656 \n\nMemoryError: out of memory while computing linkage\n</code></pre></div>\n<p dir=\"auto\">How can I solve this?</p>\n<p dir=\"auto\">The data set I use is here: <a href=\"https://dl.dropboxusercontent.com/u/68126956/df.csv\" rel=\"nofollow\">https://dl.dropboxusercontent.com/u/68126956/df.csv</a>.</p>\n<p dir=\"auto\">Thanks.</p>",
        "<p dir=\"auto\">Make sure these boxes are checked before submitting your issue - thank you!</p>\n<ul dir=\"auto\">\n<li>[yes ] I have checked the superset logs for python stacktraces and included it here as text if any</li>\n<li>[yes ] I have reproduced the issue with at least the latest released version of superset</li>\n<li>[yes ] I have checked the issue tracker for the same issue and I haven't found one similar</li>\n</ul>\n<h3 dir=\"auto\">Superset version</h3>\n<p dir=\"auto\">0.19.1</p>\n<h3 dir=\"auto\">Expected results</h3>\n<p dir=\"auto\">I try to draw mapbox in superset. I have dataset with column Latitude and Longitude and use it in respective field.</p>\n<h3 dir=\"auto\">Actual results</h3>\n<p dir=\"auto\">TypeError: &lt;superset.connectors.druid.models.DruidMetric object at 0xefbea90&gt; is not JSON serializable</p>\n<h3 dir=\"auto\">Steps to reproduce</h3>\n<p dir=\"auto\"><a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https://user-images.githubusercontent.com/13684386/31226202-8d89286c-a9d5-11e7-8d8b-eb7e9d6c4d77.png\"><img src=\"https://user-images.githubusercontent.com/13684386/31226202-8d89286c-a9d5-11e7-8d8b-eb7e9d6c4d77.png\" alt=\"togithub\" style=\"max-width: 100%;\"></a></p>\n<p dir=\"auto\">Anyone  already have the same problem?<br>\nThanks</p>"
    ]
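    # Encode each text into a fixed-size embedding vector
    embeddings = model.encode(sentences)
    
    # Pairwise similarity scores between all embeddings
    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # torch.Size([4, 4])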
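
To make the shape of that similarity matrix concrete, here is a minimal pure-Python sketch of a pairwise cosine similarity computation. It assumes `model.similarity` uses cosine similarity, which is the sentence-transformers default when no other function is configured; this page does not state which function the model uses. The helper `cosine_similarity_matrix` and its `dim` parameter are hypothetical names for illustration. The optional `dim` keeps only the first components of each vector, in the spirit of the `loss:MatryoshkaLoss` tag, though the dimensions this model was trained for are not listed here.

```python
import math
from typing import List, Optional, Sequence


def cosine_similarity_matrix(
    embeddings: Sequence[Sequence[float]],
    dim: Optional[int] = None,
) -> List[List[float]]:
    """Pairwise cosine similarity; optionally keep only the first `dim` components."""
    # Hypothetical Matryoshka-style truncation: keep a prefix of each vector.
    vectors = [list(v[:dim]) if dim is not None else list(v) for v in embeddings]
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]

    matrix = []
    for v, nv in zip(vectors, norms):
        row = []
        for w, nw in zip(vectors, norms):
            dot = sum(a * b for a, b in zip(v, w))
            # A zero vector has no direction, so report 0.0 instead of dividing by zero.
            row.append(dot / (nv * nw) if nv and nw else 0.0)
        matrix.append(row)
    return matrix


# Toy vectors standing in for real embeddings (not produced by the model).
toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
sims = cosine_similarity_matrix(toy)
print(len(sims), len(sims[0]))
```

For N input texts the result is an N by N matrix, which matches the shape printed in the snippet above for four sentences.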
SimpleEmbed-dev1
540 MB
  • 1 contributor
History: 2 commits
hanzceo
Add new SentenceTransformer model
664401b verified 1 day ago
  • .gitattributes
    1.52 kB
    initial commit 1 day ago
  • README.md
    52 kB
    Add new SentenceTransformer model 1 day ago
  • config_sentence_transformers.json
    283 Bytes
    Add new SentenceTransformer model 1 day ago
  • model.safetensors
    530 MB
    xet
    Add new SentenceTransformer model 1 day ago
  • modules.json
    156 Bytes
    Add new SentenceTransformer model 1 day ago
  • tokenizer.json
    10.1 MB
    Add new SentenceTransformer model 1 day ago