Sentence Similarity
sentence-transformers
Safetensors
English
feature-extraction
Generated from Trainer
dataset_size:219902
loss:MatryoshkaLoss
loss:MultipleNegativesRankingLoss
Instructions to use hanzceo/SimpleEmbed-dev1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use hanzceo/SimpleEmbed-dev1 with sentence-transformers:
from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer("hanzceo/SimpleEmbed-dev1")

# Example inputs: four raw GitHub issue bodies (HTML), one string per issue
sentences = [
    "<p dir=\"auto\"><strong>Is your feature request related to a problem? Please describe.</strong><br>\nscipy.cluster.hierarchy.linkage uses double (float64) to store and do its computation for hierarchical clustering. However, I have a very large dataset (292000x292000) that I would like to perform hclust on but my computer is RAM limited. I have 252GB RAM and I think the clustering algorithm should be able to work on my dataset when all values are stored and computed using float16s instead.</p>\n<p dir=\"auto\">For large datasets on machines with insufficient RAM to store and compute on Arrays of float64s, it would be awesome if computation could be done on a different precision float to reduce the memory footprint.</p>\n<p dir=\"auto\">Additionally, adding choices for datatypes could be very useful for almost all scipy functions.</p>\n<p dir=\"auto\"><strong>Describe the solution you'd like</strong><br>\nAllow for an argument to specify what datatype you'd like to use (e.g. np.float64, np.float32, np.float16)</p>\n<p dir=\"auto\">The argument could be like dtype='np.double' by default but changable to whatever datatype is chosen.</p>",
    "<p dir=\"auto\">One representative error:</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"torch/csrc/autograd/functions/init.cpp:220:37: error: address of overloaded function 'getTupleAttr' does not match required type '_object *(_object *, void *)'\n {(char*)\"output_padding\", (getter)getTupleAttr<ConvBackwardBackward, std::vector<int>, ConvParams,\n ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ntorch/csrc/autograd/functions/init.cpp:82:11: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Convert'\nPyObject* getTupleAttr(PyObject* obj, void* _unused)\"><pre class=\"notranslate\"><code class=\"notranslate\">torch/csrc/autograd/functions/init.cpp:220:37: error: address of overloaded function 'getTupleAttr' does not match required type '_object *(_object *, void *)'\n {(char*)\"output_padding\", (getter)getTupleAttr<ConvBackwardBackward, std::vector<int>, ConvParams,\n ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\ntorch/csrc/autograd/functions/init.cpp:82:11: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'Convert'\nPyObject* getTupleAttr(PyObject* obj, void* _unused)\n</code></pre></div>\n<p dir=\"auto\">The cause of the problem is <a class=\"commit-link\" data-hovercard-type=\"commit\" data-hovercard-url=\"https://github.com/pytorch/pytorch/commit/aa911939a328eff55c9b28b39ed3c43507ba8a2a/hovercard\" href=\"https://github.com/pytorch/pytorch/commit/aa911939a328eff55c9b28b39ed3c43507ba8a2a\"><tt>aa91193</tt></a>:</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\" {(char*)\"output_padding\", (getter)getTupleAttr<ConvForward, std::vector<int>, ConvParams,\n- &ConvParams::output_padding, long, PyInt_FromLong>, NULL, NULL, NULL},\n+ &ConvParams::output_padding, int64_t, PyInt_FromLong>, NULL, NULL, NULL},\"><pre class=\"notranslate\"><code class=\"notranslate\"> {(char*)\"output_padding\", (getter)getTupleAttr<ConvForward, std::vector<int>, ConvParams,\n- &ConvParams::output_padding, long, PyInt_FromLong>, NULL, NULL, NULL},\n+ &ConvParams::output_padding, int64_t, PyInt_FromLong>, NULL, NULL, NULL},\n</code></pre></div>\n<p dir=\"auto\">It seems that on clang, changing the type parameter here is sufficient to cause template instantiation to fail.</p>\n<p dir=\"auto\">Maybe the easiest way to fix this is to write a more portable version of PyInt_FromLong (and friends) which always returns <code class=\"notranslate\">int64_t</code>.</p>",
    "<p dir=\"auto\">I try the scipy ward clustering, when calculating linkage, it report follow error:</p>\n<div class=\"snippet-clipboard-content notranslate position-relative overflow-auto\" data-snippet-clipboard-copy-content=\"ward_h = linkage(X, method='ward', metric='euclidean')\nPython(2557,0x7fff732cc310) malloc: *** mach_vm_map(size=18446744067627675648) failed (error code=3)\n*** error: can't allocate region\n*** set a breakpoint in malloc_error_break to debug\n---------------------------------------------------------------------------\nMemoryError Traceback (most recent call last)\n<ipython-input-10-769ae7c53f7c> in <module>()\n----> 1 ward_h = linkage(X, method='ward', metric='euclidean')\n\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/hierarchy.pyc in linkage(y, method, metric)\n 652 Z = np.zeros((n - 1, 4))\n 653 _hierarchy_wrap.linkage_euclid_wrap(dm, Z, X, m, n,\n--> 654 int(_cpy_euclid_methods[method]))\n 655 return Z\n 656 \n\nMemoryError: out of memory while computing linkage\"><pre class=\"notranslate\"><code class=\"notranslate\">ward_h = linkage(X, method='ward', metric='euclidean')\nPython(2557,0x7fff732cc310) malloc: *** mach_vm_map(size=18446744067627675648) failed (error code=3)\n*** error: can't allocate region\n*** set a breakpoint in malloc_error_break to debug\n---------------------------------------------------------------------------\nMemoryError Traceback (most recent call last)\n<ipython-input-10-769ae7c53f7c> in <module>()\n----> 1 ward_h = linkage(X, method='ward', metric='euclidean')\n\n/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/cluster/hierarchy.pyc in linkage(y, method, metric)\n 652 Z = np.zeros((n - 1, 4))\n 653 _hierarchy_wrap.linkage_euclid_wrap(dm, Z, X, m, n,\n--> 654 int(_cpy_euclid_methods[method]))\n 655 return Z\n 656 \n\nMemoryError: out of memory while computing linkage\n</code></pre></div>\n<p dir=\"auto\">How can I solve this?</p>\n<p dir=\"auto\">The data set I use is here: <a href=\"https://dl.dropboxusercontent.com/u/68126956/df.csv\" rel=\"nofollow\">https://dl.dropboxusercontent.com/u/68126956/df.csv</a>.</p>\n<p dir=\"auto\">Thanks.</p>",
    "<p dir=\"auto\">Make sure these boxes are checked before submitting your issue - thank you!</p>\n<ul dir=\"auto\">\n<li>[yes ] I have checked the superset logs for python stacktraces and included it here as text if any</li>\n<li>[yes ] I have reproduced the issue with at least the latest released version of superset</li>\n<li>[yes ] I have checked the issue tracker for the same issue and I haven't found one similar</li>\n</ul>\n<h3 dir=\"auto\">Superset version</h3>\n<p dir=\"auto\">0.19.1</p>\n<h3 dir=\"auto\">Expected results</h3>\n<p dir=\"auto\">I try to draw mapbox in superset. I have dataset with column Latitude and Longitude and use it in respective field.</p>\n<h3 dir=\"auto\">Actual results</h3>\n<p dir=\"auto\">TypeError: <superset.connectors.druid.models.DruidMetric object at 0xefbea90> is not JSON serializable</p>\n<h3 dir=\"auto\">Steps to reproduce</h3>\n<p dir=\"auto\"><a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https://user-images.githubusercontent.com/13684386/31226202-8d89286c-a9d5-11e7-8d8b-eb7e9d6c4d77.png\"><img src=\"https://user-images.githubusercontent.com/13684386/31226202-8d89286c-a9d5-11e7-8d8b-eb7e9d6c4d77.png\" alt=\"togithub\" style=\"max-width: 100%;\"></a></p>\n<p dir=\"auto\">Anyone already have the same problem?<br>\nThanks</p>",
]

# Encode the issues and compare every embedding with every other one
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
- Notebooks
- Google Colab
- Kaggle
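The `[4, 4]` shape printed by the sentence-transformers snippet above comes from comparing each of the four embeddings with every other one. As a rough illustration that needs no model download, the sketch below computes a pairwise cosine similarity matrix in plain Python. It assumes cosine similarity, which is the sentence-transformers default unless a model is configured otherwise; the helper `cosine_similarity_matrix` and the toy vectors are hypothetical stand-ins, not part of the library or of this model.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity matrix for a list of vectors.

    Hypothetical helper for illustration only; it is not a
    sentence-transformers API.
    """
    # Precompute the Euclidean norm of each vector.
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    matrix = []
    for a, norm_a in zip(embeddings, norms):
        row = []
        for b, norm_b in zip(embeddings, norms):
            dot = sum(x * y for x, y in zip(a, b))
            row.append(dot / (norm_a * norm_b))
        matrix.append(row)
    return matrix


# Toy 3-dimensional vectors standing in for real model embeddings (assumption).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
]

similarities = cosine_similarity_matrix(toy_embeddings)
# Four inputs give a 4 x 4 matrix, one row and one column per input.
print(len(similarities), len(similarities[0]))
```

Vectors that point in the same direction score 1 regardless of length, so the first and fourth toy vectors are treated as identical, while orthogonal vectors score 0.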