Talk:Splay tree
This article is rated C-class on Wikipedia's content assessment scale.
Zigzag
Um. I'm not sure the zigzig step is correct. I implemented it and the performance sucked. Works better if the result from a zigzig zigzags. —Preceding unsigned comment added by 122.107.243.206 (talk) 02:01, 19 April 2009 (UTC)
- Yes, I noticed there is a problem with the description of the Zig-zag step and edited it. Please someone check the changes. — Preceding unsigned comment added by MazinIssa (talk • contribs) 08:06, 11 July 2013 (UTC)
tree
Someone please do a pseudocode implementation of splay on the main page. (I removed the note suggesting that, since such notes should be confined to the talk pages.) 68.100.96.143
I've been trying to work out a way to balance a splay tree since I saw this article's claim that splay trees are "self-balancing" and support O(log N) "amortized time" insertions and deletions. I have found multiple splay trees that cannot be balanced and also keep the order defined by splaying. I'm changing the "amortized time" for now, but I would like to chat with someone about the idea that a splay tree can self-balance, just in case I'm crazy. Tjdw 14:16, 15 Jan 2004 (UTC)
- I believe splay trees (at least above a certain size) will be (roughly) balanced (meaning their height will be in O(log n)), or at least cannot stay unbalanced once nodes in unbalanced parts of the tree are accessed. I have not checked for myself, but Sleator and Tarjan state in their original article that the height of nodes on the access path to a node being accessed gets approximately halved. If splay trees could be and stay unbalanced, the given worst-case access time could hardly hold. As many internet sources claim this too, I reintroduced it to the article (otherwise the last paragraph would in my eyes not be correct).DrZ 23:11, 14 May 2004 (UTC)
- Actually, splay trees are not necessarily self-balancing, they are only stochastically self-balancing. That is why you get O(log n) amortized time - a single access can take n comparisons, just as with a plain binary search tree. It will, however, also somewhat rebalance the tree. Moreover, the insertions that unbalance the tree are very cheap, so that in sum you get excellent behaviour. --Stephan Schulz 13:57, 26 May 2005 (UTC)
Should note that the complexity approximates O(M) where M is the working set of nodes, and is therefore appropriate to use when the working set is significantly smaller than the full set. Should also note that both reading (find) and writing (insert/delete) are about equally expensive, so it may behave poorly compared to other self-balancing trees (that only modify the tree in update) when updates are rare.
- Complexity should be O(log M), unless I misunderstand what you are saying. --Stephan Schulz 13:57, 26 May 2005 (UTC)
- I believe you are right. It should be O(log M). Vecter (talk) 16:36, 16 June 2010 (UTC)
Although the working set theorem says splay trees have a near O(1) access time for items in the working set (given that the working set is small relative to the total number of nodes, and constant), in practice randomly constructed BSTs have been shown to outperform splay trees even when the access pattern is heavily skewed (see, for example, Bell and Gupta). Also of note is that splay trees have better cache performance than BSTs under certain settings. Finally, we should discuss the different splaying variants: top-down vs. bottom-up. The "simple" top-down splaying variant (explained by Sleator and Tarjan) as implemented by Weiss (Data structures in C) is fast in practice compared to other variants.
I should probably write up pseudocode (or other more detailed explanation of the algorithm), given that I have both implemented it and documented it before. Bovlb 07:56, 2004 Mar 5 (UTC)
Diagrams of Splaying would be helpful.
I would like to know what literature/research supports the claim that "randomly rebalancing the tree can avoid this unbalancing effect and give similar performance to the other self-balancing algorithms"? I didn't manage to verify that, though I tried.
- I agree with you. Randomization for splay trees (as by such authors as Fürer, Albers and Karpinski) only helps to save time on randomly not splaying. And only in some cases. As conjectured, of course, it only improves the working time by a constant factor. —Preceding unsigned comment added by 129.67.117.122 (talk) 19:38, 7 January 2008 (UTC)
Self-balancing?
I have an issue with the fact that this page says splay trees are "self-balancing" as opposed to "self-adjusting". AVL trees, for example, are self-balancing because they maintain a height of O(log n). In contrast, splay trees can have linear height for an indefinite period of time. Of course, splay trees were introduced as "self-adjusting binary search trees" as well. (Jamie King 16:30, 23 February 2006 (UTC))
- See my comment above. They are only stochastically self-balancing. Still, I think the link is useful, so I don't know if we should change it.--Stephan Schulz 20:09, 23 February 2006 (UTC)
- They are NOT self-balancing. They are self-adjusting. They do, as you say, for certain lookup patterns (e.g., random and others (see the "working set theorem" by Sleator and Tarjan)) provide amortized O(lg n) lookups. However, amortized lookups (for some access patterns) is not equivalent to "stochastically self-balancing". A valid splay tree could have essentially a linked-list structure (at times), which is hardly "balanced".
- Concur. Splay trees are NOT self-balancing trees; they tend to be balanced for common lookup sequences (e.g., uniform queries) but they are not self-balancing in the same way that AVL or Red-Black trees are.
- This should be taken care of. I'll do it (i.e. remove the 'self-balancing') if no objection is made during the week --24.37.173.150 (talk) 04:37, 29 November 2009 (UTC)
- Not without a reference, and I doubt you'll find one.- Wolfkeeper 05:10, 29 November 2009 (UTC)
I just wanted to also point out that splay trees are not self-balancing. In fact they are very unbalanced! They are self-adjusting, yes, to the workload and lookup patterns. I can easily create a pattern of accesses to a splay tree which will make it a list! Which is pathologically unbalanced. —Preceding unsigned comment added by 87.239.216.2 (talk) 19:46, 4 August 2010 (UTC)
- If you think about it they have to be actually self-balancing in a stochastic sense. If they weren't they would never be able to achieve O(ln n) average speed. There are cases where they are partially linear and can become unbalanced, but it takes O(1) operations to cause that, and an O(n) type operation to fix it, so the average is better than O(ln n).- Wolfkeeper 03:40, 13 August 2010 (UTC)
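For concreteness, a minimal sketch (mine, not from the discussion) of the access pattern described above: inserting keys in increasing order makes every new key the maximum, a single zig splays it to the root, and the previous tree hangs unchanged off the new root's left child, so the tree degenerates into a left path of depth n − 1.
#include <stdio.h>
#include <stdlib.h>

struct node { int key; struct node *left, *right; };

int main(void) {
    struct node *root = NULL;
    for (int k = 1; k <= 1000; k++) {        /* sorted insertions */
        struct node *x = malloc(sizeof *x);
        x->key = k;
        x->left = root;                      /* old tree becomes the left subtree */
        x->right = NULL;
        root = x;                            /* new maximum ends up at the root */
    }
    int depth = 0;
    for (struct node *p = root; p != NULL; p = p->left) depth++;
    printf("depth of key 1: %d\n", depth - 1);   /* prints 999: a linear path */
    return 0;
}
A subsequent search for key 1 then costs n − 1 rotations, after which depths along the access path roughly halve — the O(1)-to-unbalance, O(n)-to-repair trade-off described in the comment above.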
Uniform sequence
Can anyone explain to me what "uniform sequence" (of operations) means? --194.9.67.130 12:48, 26 November 2006 (UTC)
- Uniform access basically refers to accessing elements of the tree uniformly at random. A uniform sequence of operations refers to a sequence exhibiting uniform access, i.e., artifacts such as long runs accessing a single node or accessing just a small subset of the nodes are uncommon. It is for non-uniform sequences such as these that splay trees show an advantage. Deco 17:06, 26 November 2006 (UTC)
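A small sketch of the two kinds of access sequences being contrasted (the 90/10 split is an arbitrary illustration of a skewed workload, not something from the comment):
#include <stdlib.h>

/* uniform: every key 0..n-1 is equally likely */
int uniform_access(int n) { return rand() % n; }

/* skewed: about 90% of accesses go to the first 10% of the keys */
int skewed_access(int n) {
    if (rand() % 10 != 0)
        return rand() % (n / 10 + 1);
    return rand() % n;
}
Splay trees gain little over a balanced BST on sequences of the first kind, but can beat the O(log n) bound per access on sequences of the second kind.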
Where is Splay?
Why, when you search for splay, are you brought here? A friend was told they had splayed feet; I searched for it on here and found nothing but this on the word splay. I have of course found out what it means. But why do you redirect people to this technical description of something else? Can you add splay (and its two meanings and multiple uses)? I know I can, but I don't know how to make it stop redirecting —The preceding unsigned comment was added by 87.74.86.198 (talk • contribs).
- Just go to this page and edit away. In general, if you come via a redirect, clicking on the little "(Redirected from Splay)" will take you to the page that redirected you. The redirection is part of the page contents, just replace it with whatever you think needs to be said. You probably will create a disambiguation page, so please try to follow the Manual of Style. --Stephan Schulz 22:23, 27 November 2006 (UTC)
symmetric order
The sequential access theorem mentions symmetric order. I don't know what that means. The original paper also states the theorem (using the same term), but does not prove it. Does anybody know what this means? —Preceding unsigned comment added by Raduberinde (talk • contribs) 13:37, 15 September 2007 (UTC)
Performance theorems
The last theorem of this section mentions a concrete constant for the upper bound. It is meaningless without knowing which operations are counted. 83.29.230.229 (talk) 19:37, 15 May 2008 (UTC)
Multi threading
Without being any expert on splay trees, it seems that splay trees make looking up elements a modifying operation, which cannot be done from multiple threads at the same time. This is different from most other binary trees, and it makes sense to mention it as a performance characteristic. —Preceding unsigned comment added by 38.99.50.50 (talk) 20:17, 20 April 2010 (UTC)
- You are absolutely right. I mentioned this in the disadvantages. However, I think concurrent find operations can be made to work without much trouble. First of all, the splay operation needs to be in a critical section. Secondly, we can't splay at a node x if a thread is accessing the parent or grandparent of x. Thirdly, we don't want two threads performing splay operations to be at nodes too close to each other. I think everything can be managed in a fairly straightforward way, but the fact that we need to manage it at all is certainly a drawback, and a constant-factor time increase is inevitable. --Jamie King (talk) 16:58, 8 June 2010 (UTC)
- Modifying during simple get/find operations would lead to cache thrashing when cache lines are owned by other CPUs. That would not be a constant cost, though. Usually modifying shared state is the worst scalability bottleneck. While releasing the mutex there will be extra write buffers to be flushed as well; the performance would vary depending on whether the buffers can be settled during cache misses. Bestsss (talk) 00:52, 5 December 2011 (UTC)
- Top-down splaying can be modified to support concurrency using the method I developed in the paper Concurrent Operations on Priority Queues, CACM 32, 1 (January 1989) -- the algorithm developed there uses skew heaps, but the parallelization method works equally well for top-down splaying. Each thread completes its operation in an amortized O(log n) time, but a new operation may be started by another thread after O(1) time. This does indeed create problems with cache thrashing, but it can lead to significant speedups for small scale multithreading. Douglas W. Jones (talk) 14:58, 10 December 2014 (UTC)
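To illustrate the point Jamie King makes above (not the pipelined scheme just described): even a pure lookup rearranges the tree, so the simplest correct approach serializes every operation behind one lock. A minimal sketch; all names, the Tree layout, and the splay signature are assumptions for illustration, not code from the article.
#include <pthread.h>
#include <stddef.h>

typedef struct Tree { int item; struct Tree *left, *right; } Tree;
Tree *splay(int i, Tree *t);     /* assumed: returns the new root with i (or its neighbour) on top */

static pthread_mutex_t tree_lock = PTHREAD_MUTEX_INITIALIZER;
static Tree *tree_root = NULL;

int locked_contains(int key)
{
    int found;
    pthread_mutex_lock(&tree_lock);          /* readers take the same exclusive lock as writers */
    tree_root = splay(key, tree_root);       /* the lookup restructures the shared tree */
    found = (tree_root != NULL && tree_root->item == key);
    pthread_mutex_unlock(&tree_lock);
    return found;
}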
Persistence
It's mentioned that splay trees can be made into a persistent data structure. Intuitively, I don't see how this is true. Certainly you could make insert and delete operations return new trees, but because accesses in a splay tree edit the tree, is every *access* to the tree going to return the desired element *and a new tree*? That seems prohibitively expensive, even for a persistent structure. —Preceding unsigned comment added by 192.160.124.68 (talk) 20:46, 12 May 2010 (UTC)
Splay trees cannot be made persistent. The issue is not the fact that accesses return new trees or whatnot; that could be dealt with if the number of rotations were O(log N). The real issue is that splay trees require amortized analysis. If you go back to a "bad" part of the history where the potential function is very high and repeatedly perform the worst-case operation, each of those operations can cost O(N). — Preceding unsigned comment added by 47.187.194.19 (talk) 00:42, 23 July 2017 (UTC)
External links
There are too many external links in this article. From WP:EL: "Links in the "External links" section should be kept to a minimum," "Long lists of links are not acceptable." Many of the sites linked to are also of unknown reliability. For example [1]. This seems completely unreliable: [2]. Offliner (talk) 00:46, 8 April 2009 (UTC)
- I agree. I removed all but three of the links. --Jamie King (talk) 17:19, 8 June 2010 (UTC)
Deleted deletion
[edit]I've deleted the alleged C code for deletion because a) it was not C and b) it was badly broken.
If we want to give an example implementation, I propose Sleator's original code, released into the public domain [3]:
Tree * delete(int i, Tree * t) {
/* Deletes i from the tree if it's there. */
/* Return a pointer to the resulting tree. */
Tree * x;
if (t==NULL) return NULL;
t = splay(i,t);
if (i == t->item) { /* found it */
if (t->left == NULL) {
x = t->right;
} else {
x = splay(i, t->left);
x->right = t->right;
}
size--;
free(t);
return x;
}
return t; /* It wasn't there */
}
or, cleaned up a bit:
/* Deletes i from the tree if it's there. */
/* Return a pointer to the resulting tree. */
Tree * delete(int i, Tree * t)
{
Tree * x;
if (t==NULL)
{
return NULL;
}
t = splay(i,t);
if (i == t->item)
{ /* found it */
if (t->left == NULL)
{
x = t->right;
}
else
{
x = splay(i, t->left);
x->right = t->right;
}
size--;
free(t);
return x;
}
return t; /* It wasn't there */
}
Likewise, the original code for splay would be preferable because it needs no "parent" link, unlike the version we present now. --Stephan Schulz (talk) 13:28, 20 April 2011 (UTC)
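For reference, a sketch of what such a parent-free splay looks like, modeled on Sleator's public-domain top-down version (the Tree struct with an int item and left/right pointers is assumed, matching the delete code above; this is an illustration, not the article's current code):
Tree * splay(int i, Tree * t)
{
    Tree N, *l, *r, *y;
    if (t == NULL) return t;
    N.left = N.right = NULL;
    l = r = &N;                       /* N collects the left and right "spare" trees */

    for (;;) {
        if (i < t->item) {
            if (t->left == NULL) break;
            if (i < t->left->item) {  /* zig-zig: rotate right */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL) break;
            }
            r->left = t;              /* link right */
            r = t;
            t = t->left;
        } else if (i > t->item) {
            if (t->right == NULL) break;
            if (i > t->right->item) { /* zig-zig: rotate left */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL) break;
            }
            l->right = t;             /* link left */
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;               /* reassemble the three pieces */
    r->left = t->right;
    t->left = N.right;
    t->right = N.left;
    return t;                         /* i, or the last node visited, is now the root */
}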
Recursion in the implementation of splay method
The recursive call to the splay method passes the same arguments "root" and "x" again. This means that the method need not be recursive; the recursion only makes it less efficient.
Using recursion in all cases does not really clarify the point of splaying, and I was wondering whether it would be best to turn it into a non-recursive method. —Preceding unsigned comment added by 129.132.45.239 (talk) 15:10, 12 May 2011 (UTC)
- What do you mean "inserts"? Recursion is a natural approach for tree operations like this. The function calls itself until x==root (after being changed by left and right rotations). I think you'll find an iterative implementation is harder to follow and only marginally more efficient. Maghnus (talk) 00:09, 13 May 2011 (UTC)
C code, splay problem
Watch out: the splay function modifies only a copy of the root pointer. So if you start with some root, don't expect it to change when running splay(x, root)! — Preceding unsigned comment added by 89.67.175.78 (talk) 22:33, 10 January 2013 (UTC)
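(In other words, the caller has to capture the return value — a one-line sketch assuming the splay signature used in that code:)
root = splay(x, root);    /* splay returns the new root; it cannot update the caller's pointer in place */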
C++ implementation
There are several problems:
- Memory leaks - operator 'new' is used without single 'delete'.
- Splaying should always be performed for complexity statements to be true.
- That means while finding an element, the last visited node should be splayed, regardless of whether the element was found or not.
- This is also true for deletion, min_element and max_element.
Additional issues with the C++ implementation
- Since duplicates are permitted in the right subtree, the subtree property must be left < root <= right. Unfortunately, right rotations are done symmetrically to left rotations, thus breaking the property.
- Running the following code leads to a segmentation fault
int main(int argc, char **argv)
{
splay_tree<int> T;
T.insert(1);
T.insert(1);
T.insert(2);
T.erase(2);
}
The implementation is not correct
In the find procedure the splay method is not called, thus we are not paying for accessing the element by restructuring the tree. The same reasoning applies to the delete method: if both the left and the right subtrees of the node we want to delete are non-empty, then we take the minimal element of the right subtree, again potentially visiting O(n) nodes without calling the splay method. — Preceding unsigned comment added by 78.131.82.48 (talk) 12:27, 7 June 2017 (UTC)
- This issue still exists and it is quite an important problem. A lot of my students get confused by this, and if anyone actually uses this implementation, it may cause serious problems. I suggest a simple fix - the current find method should be made private and possibly renamed, and a new public find should be made. The public find would just call the private find and then do the splay (possibly dealing with some corner cases). This way, the implementation of all other methods such as insert and delete would remain the same. Mitch.ondra (talk) 15:26, 20 October 2021 (UTC)
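A minimal sketch of that fix, written in C against the top-down splay quoted earlier on this page rather than the article's C++ class (the function name and signature are my own, for illustration): the public find always splays, even on an unsuccessful search, so the amortized bound applies.
/* Assumes splay(i, t) moves i, or the last node visited, to the root. */
Tree * find(int i, Tree **root, int *found)
{
    if (*root == NULL) { *found = 0; return NULL; }
    *root = splay(i, *root);          /* always restructure, hit or miss */
    *found = ((*root)->item == i);
    return *found ? *root : NULL;
}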
amortized(?) worst case
As with Scapegoat tree, the infobox shows "amortized O(log n)" for worst-case insertion or deletion. What I don't understand at all in this context is the term "amortized". Isn't "worst case" specific enough? Does "amortized O(log n)" for the worst case in effect mean: possibly worse than O(log n), but on a certain average O(log n)?
If so then "amortized O(log n)" for average insertion or deletion would say what is meant. –Nomen4Omen (talk) 08:30, 4 October 2021 (UTC)
- See Amortized analysis. An amortized time bound is for worst-case sequences of operations, but averaged over all of the operations in the sequence. Some individual operations can be slow as long as the total time for the whole sequence is small. A time bound that is stated as worst case without this qualifier is usually understood to be the largest possible time for a single individual operation. —David Eppstein (talk) 16:58, 4 October 2021 (UTC)
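(A standard worked form of that statement, in textbook notation rather than anything from the thread: with a non-negative potential function \Phi and actual cost t_i for the i-th operation, the amortized cost is a_i = t_i + \Phi_i - \Phi_{i-1}, so for any sequence of m operations
\sum_{i=1}^{m} t_i = \sum_{i=1}^{m} a_i + \Phi_0 - \Phi_m \le \sum_{i=1}^{m} a_i + \Phi_0.
For splay trees each a_i is O(log n), so the whole sequence costs O(m log n) in the worst case, even though a single t_i can be \Theta(n).)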
- OK, I looked at it — and the point I am making has absolutely NOTHING to do with the article Amortized analysis. The column titles are "average" and "worst case" and there is no mention of "amortized worst case". For me, this would mean that –if people are not to be misled– a third column "worst case" must be given in addition to the two columns "amortized average" (or "average") and "amortized worst case", because obviously amortized worst case is NOT worst case (worst case is O(n) and WORSE than the displayed complexity O(log n)). –Nomen4Omen (talk) 17:47, 4 October 2021 (UTC)
- Amortized analysis is a type of worst case analysis. It is the worst case over a whole sequence of operations rather than for a single operation, but worst case is how to classify it. If you're only making the distinction between average-case complexity and worst-case complexity (which is what the infobox does), then amortized is on the worst-case side of that distinction. Average case is about making assumptions about typical inputs and computing expectations based on the assumptions. Worst case is about being ready to handle even the most diabolical input designed to make your algorithm look bad. Amortized analysis is worst case. —David Eppstein (talk) 18:12, 4 October 2021 (UTC)
- It is really very nice what you are saying and I agree somehow: "Amortized analysis is a type of worst case analysis." (It is not so absolutely true, because there is also "amortized average" in the infobox. [Maybe there is also an "amortized average worst case".] As I understand it – but it is not really important for our debate: both, worst case and average, can be analyzed in an amortized way.)
But the title of the column is Worst case and NOT "amortized worst case", so you should at least also bring the true worst case and not only the amortized worst case. At least, if they are different, and they are different in our case.
(If the title is mammals, you bring as the item: squirrels. Although you certainly know that they are breast feeding animals – and there are also apes and humans and pigs.) –Nomen4Omen (talk) 21:33, 4 October 2021 (UTC)
- You are starting to remind me of User:Jnc/Astronomer vs Amateur. I hate to pull the expert card, but this is a topic I have been teaching at the graduate level for years. To put it bluntly: average versus worst case is about what kind of inputs you are looking at. You can do worst case analysis in all sorts of ways: worst case for space, worst case for time for individual operations, worst case for time for sequences of operations. Amortized is merely one of those details. If the infobox is incapable of representing fine distinctions (as most infoboxes are) then we should go with the distinctions that it has on offer. It would be a severe misrepresentation of the splay tree data structure to describe it as having good average case but a bad worst case; it's missing the entire point of this data structure. In fact the average case complexity is also wrongly represented here; it should probably be described as amortized O(H) or amortized O(entropy) because for random inputs drawn from any probability distribution the splay trees are as good as any static tree, in the same amortized sense. Or another way of putting it: there is no sense in which amortized time is not "the true worst case". It is the worst case time, over sequences of operations. Almost always, except in real-time applications it is what you want. And in many situations, even if you think you are implementing a data structure with worst-case-per-operation performance, what you actually end up getting is amortized, because of effects like triggering a slow garbage-collection in the middle of an operation. —David Eppstein (talk) 22:47, 4 October 2021 (UTC)
- Thanx for your detailed response. I had again a look at the article Amortized analysis and although the word "worst" occurs three times in it, I cannot understand that amortized analysis describes THE WORST CASE. The term worst-case is defined precisely enough and does not need to receive additional precision, e.g. by adding "amortized" or the like. On the other hand, amortized analysis is as well defined precisely enough, and as I understand e.g. Mehlhorn & Sanders[1], there is neither an "amortized worst-case" nor an "amortized average", but only "amortized analysis". This would at least mean that you have to decide which column in the InfoBox shall receive the attribute amortized. As pointed out earlier, I would prefer to give it to the average, because as I understand "Amortized analysis", it resembles much more an average than it resembles a worst-case, at least if you consider that it is averaged across a so-called sequence of operations – although, as you may object, it does not take the optimistic results of the single operations. As I understand the literature, "Amortized analysis" has been invented for giving the user of a data structure a more realistic impression of its performance than a worst-case figure or an average over all possibilities (whatever the latter could mean [indeed it looks as if this kind of more precise statement is done by the "Amortized analysis"]). So, if we take the "Splay tree" it is really a pity to have to display O(n) in all 3 lines (search,insert,delete) of the column worst-case. But it is the truth, because it may happen for every single operation. Nevertheless, the knowing user knows that the performance of his application is not dominated by a single operation. –Nomen4Omen (talk) 09:45, 6 October 2021 (UTC)
- ^ Mehlhorn & Sanders 2008, pp. 165, 158
- Both columns are amortized. The splay tree, on random sequences of accesses with some probability distribution, takes an expected amount of time per operation that is proportional to the entropy of the distribution. On arbitrary (i.e. worst case) sequences, it takes an amount of time that is always (i.e. in the worst case) at most proportional to log n per operation. In both cases "amortized" means we are analyzing whole sequences of operations rather than one operation at a time. Your "I cannot understand" should have been a big flashing clue: if you do not understand something, you should not be arguing about it and trying to edit the article to reflect your non-understanding, when there are other editors around who do understand it. —David Eppstein (talk) 16:18, 6 October 2021 (UTC)
- I indeed fully agree (and have never objected to) that "amortized" means we are analyzing whole sequences of operations rather than one operation at a time. But I disagree that "Worst case" means a sequence of many cases. It simply means "Worst case", some case which is worst. And the article admits "Since the height of a tree corresponds to the worst-case access time, this means that the actual cost of a single operation can be high", and points out that there is a difference between worst case and amortized access cost.
But you as teacher of splay trees at the graduate level for years insist on pointing out: there are no single operations, only big sequences of operations exist. How big? Arbitrarily big, asymptotically big. –Nomen4Omen (talk) 16:56, 6 October 2021 (UTC)
- You're still arguing and still reminding me very strongly of astronomer vs amateur. Stop. —David Eppstein (talk) 17:00, 6 October 2021 (UTC)
Strangely enough, the stupid "amateur" was (within the timeframe of the above debate) able to solicit the godlike "astronomer" to change his "celestial laws" (in the Splay-tree-infobox) from (timestamp 16:59, 4 October 2021)
Algorithm | Average | Worst case
Search | amortized O(log n) | amortized O(log n)
Insert | amortized O(log n) | amortized O(log n)
Delete | amortized O(log n) | amortized O(log n)
to the marvellous improvement (timestamp 22:52, 4 October 2021)
Algorithm | Average | Worst case
Search | amortized O(entropy) | amortized O(log n)
Insert | amortized O(entropy) | amortized O(log n)
Delete | amortized O(entropy) | amortized O(log n)
And because Big-O is always a limit, we possibly now have learnt that O(entropy) ≤ O(log n), because Average is always < Worst case.
And this all even in an amortized way !!
Is more possible ?? No, not possible !!
This is really great an achievement !! ?? Thanks to the "astronomer" !! –Nomen4Omen (talk) 13:38, 7 October 2021 (UTC)
- Gibbs' inequality#Corollary. This is standard and well-known. —David Eppstein (talk) 18:07, 7 October 2021 (UTC)
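(Written out for readers following along — notation mine, not from the thread: for an access distribution p_1, \dots, p_n the cited corollary states
H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i \le \log_2 n,
with equality exactly for the uniform distribution, which is why a per-access bound of O(entropy) is never weaker than O(log n).)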
Referring to your „terrific“ post in Talk:Big O Notation
@David Eppstein: Referring to your „terrific“ post in Talk:Big O Notation on 22:46, 25 November 2021 I learn that you do not know very much about the subject. This is really a problem, not so much for us Wikipedians, who are able to fight against you, but severely for the students who you are teaching and who are depending on you.
In the big-O assessments of performance analysis the given variable(s) has/have to describe the problem size, otherwise the limit stated by the big-O is senseless. You introduce O(entropy), but although entropy may finally lead to the same number (e.g. log n) and describes some state of the problem, it is not the problem size which analysis is based upon.
Unfortunately, your knowledge of amortized analysis is defective as well. As Rebecca Fiebrink points out: “Amortized analysis is a useful tool that complements other techniques such as worst-case and average-case analysis.“ So there should be 3 columns in the InfoBox Data structure, namely amortized, average and worst-case. But there are only the latter 2. Since amortized is much more related to average than to worst-case (and kind of alternative assessment to it), I put the big-O there with the additional adjective amortized.
So I am giving back what you inserted as "Edit summary on 16:59, 4 October 2021: «If you don't understand amortized analysis you should not be making this sort of change.» Nevertheless, best regards, –Nomen4Omen (talk) 17:00, 26 November 2021 (UTC)
- This is so wrong that I think it is not even worth explaining in detail exactly where all your mistakes are. For one big one: no, it is incorrect that size is the only parameter of a problem instance relative to which we can measure runtime. Please find a different area of Wikipedia to edit, one that does not so clearly show you to be out of your depth. —David Eppstein (talk) 17:31, 26 November 2021 (UTC)
@David Eppstein: Thank you very much for bringing in the quote of Grinberg "Average depth of access in a splay tree is proportional to the entropy." And we know that the entropy is proportional to log n. And we know that the cost of searching is amortized O(log n), which is a quote of Tarjan p. 315: "The amortized time of an access in an n-node splay tree is O(log n)."
In order to be very explicit: I do NOT have a problem with the Grinberg statement. But, why is this not the same as yours? With the big-O notation there are some tacit assumptions:
- A naked statement "search is amortized O(log n)" tacitly declares the (here) one and only one independent variable in the big-O expression, here the n, to be proportional to the problem size. (O(1) is not an exception to this, because 1 is not a variable.)
- Another tacit assumption is the default to "worst-case", if there is no explicit "average" or "amortized" or the like.
- A further assumption which (in computer science) rarely is made explicit is that the big-O is expressed for infinitely big problems, here n → ∞.
Now, this certainly does not hold for O(entropy), as the problem size is not proportional to the entropy (which is proportional to log n, and neither log n nor entropy is the problem size).
Of course, you are right in saying "it is incorrect that size is the only parameter of a problem instance relative to which we can measure runtime". Of course, there may be other parameters in which we can measure runtime as well. And sometimes we need more than one. But in the given shorthand notation we are well advised to adhere very closely to the established(??) (and already much too loose!!) conventions and not to stress them additionally. And "search is amortized O(log n)" can approximately be found in Tarjan. Why are you insisting on throwing it out?
PS: However, if you succeed to show me a reliable source, almost verbally saying "The amortized time of an access in an n-node splay tree is O(entropy)", then I will give in.
Best regards, –Nomen4Omen (talk) 14:09, 1 December 2021 (UTC)
- Are you really asking why a quote that says "Average depth of access in a splay tree is proportional to the entropy" can be used to justify an infobox entry that states the average time as O(entropy)? Really? You have it fed to you on a spoon with almost the exact wording and still you cannot understand it yourself? As for the dependence on n, first of all you are wrong: the dependent variable inside the O is not assumed to always be problem size; it is whatever is unbound inside the O, with some textual disambiguation needed only in cases where there is more than one unbound thing inside the O. Indeed, for many graph algorithms it is the number of vertices which is not the problem size. In uses of O-notation outside algorithm analysis, there is no "problem size". And second, even if you really really insist on that comfort blanket of problem size as variable, it works to imagine that you are dealing with a parameterized family of input distributions whose entropy depends on n, so that the expression can be read as a shorthand for O(entropy(n)). —David Eppstein (talk) 17:16, 1 December 2021 (UTC)
- "the dependent variable inside the O ...": most probably you mean the independent variable.
- I think I was clear enough that I understand: "Average depth of access in a splay tree is proportional to the entropy."
- "it is whatever is unbound inside the O" In your formulation this would be entropy which is NOT the problem size !!
- Instead of "read as a shorthand for O(entropy(n))" I would prefer a good source.
- And it would be 100% sufficient for me to find deeper in the article "Average depth of access in a splay tree is proportional to the entropy."
- –Nomen4Omen (talk) 17:46, 1 December 2021 (UTC)
- I mean the variable quantified by the O. And if you keep up this ridiculous WP:RANDY tendentiousness I am becoming convinced that you need a topic ban from algorithm analysis related topics, broadly considered. You are an enormous waste of time here and have contributed nothing useful. —David Eppstein (talk) 17:50, 1 December 2021 (UTC)
- @David Eppstein:
I have changed your entries in the InfoBox to entries for which there are precise and reliable sources.
Before you revert, please do not forget to enter sources for:
- The notion of »Worst-case Search/Insert/Delete = amortized O(log n)«. I never saw the combination of Worst-case and amortized. In all literature I have seen so far, there is either Worst-case or amortized, but never a combination of the two (best example Rebecca Fiebrink: „Amortized analysis is a useful tool that complements other techniques such as worst-case and average-case analysis.“)
- I have found a source which denies a guarantee for search in O(log n).
- Sources for »Search/Insert/Delete = amortized O(entropy)« are missing as well. (See above)
- You quote Grinberg in section References: „Average depth of access in a splay tree is proportional to the entropy.“ (OK, no problem!) But in the open text you say: »The expected (average case) amortized cost of each access is proportional to the Shannon entropy of the distribution.« (Problem: amortized conflicts with expected and with average.) I have removed the word amortized.
- –Nomen4Omen (talk) 14:13, 4 December 2021 (UTC)
- I have undone your edits because they are bad. Your ignorance of how amortized analysis works, demonstrated repeatedly throughout this discussion, is a reason for you to improve your understanding, not to disimprove the article to match your lack of understanding.
- Some citation overkill for you (off-topic for this article):
- Lecture notes from U. Washington: "While amortized analysis is about averages, we are averaging cost-per-operation on worst-case input. Contrast: Average-case analysis is about averages across possible inputs."
- UCSD lecture notes: uses exact phrase "worst-case amortized time".
- Cornell lecture notes: "Amortized analysis is a worst-case analysis of a a sequence of operations".
- —David Eppstein (talk) 17:10, 4 December 2021 (UTC)
- @David Eppstein:
- No, my edits are not bad. They are well sourced.
- My big error is that I change too much for your comprehensibility in one edit. The theory is really tricky, already verbally. E.g. you identify upper bound (of a sum) with worst-case. But you are not the only one with this mistake.
- This I can say immediately. –Nomen4Omen (talk) 18:18, 4 December 2021 (UTC)
- "Upper bound" is pretty much the same concept as "worst case": it's an upper bound because it's greater than or equal to the time for the worst case input. If you are going to quibble that it might be a sloppy upper bound that is bigger than anything that can actually achieved by an input, then that same quibble applies to a lot of other worst case analysis of algorithms, but it has nothing to do with whether it is worst case analysis. If you would get out of the mode of micro-analysis of text for minor sloppiness that you can then jump on as somehow being an error, and into the mode of recognizing that you have more to learn, your contributions might be more helpful. As it is, the next time you revert I am taking you to ANI for a user-conduct discussion as indistinguishable from the green cheese amateur. —David Eppstein (talk) 19:45, 4 December 2021 (UTC)
- From seeing through the discussion, David Eppstein clearly has a better part of the argument here. But Nomen4Omen is failing to WP:LISTEN. You should also learn how to rightly cite sources in Wikipedia since there are a lot of MOS mess-ups concerning citations here. WikiLinuz 🍁 (talk) 19:45, 4 December 2021 (UTC)
@David Eppstein:
In the InfoBox there are 2 columns to be filled by data, namely Average and Worst-case. In the article we have the 3 operations (rows) Search+Insert+Delete.
Question 1: Can you explain to a WP-reader what Average and Worst-case mean in the case of the InfoBox Data Structure? What is the difference? (In all other InfoBoxes I've looked at, I had the impression that everybody understood Average as in average case analysis and Worst-case as in worst case analysis.)
Question 2: Do you mean amortized Average in column Average and amortized Worst-case in column Worst-case (when entering amortized O(entropy) and amortized O(log n) respectively)? If not, please explain!
Question 3: Can you name us articles which explain the difference between amortized Average and amortized Worst-case?
Question 4: What shall the difference between amortized O(entropy) and amortized O(log n) mean, in your opinion?
All questions are interrelated. But Question 3 is the most important. –Nomen4Omen (talk) 16:41, 5 December 2021 (UTC)
- I don't think that it is now the right moment to care about MOS concerning citations.
- Pls be careful about who is failing to listen in our case here. (Because at least the very same could be said of David Eppstein. And as you can look up in the history, he was the one who started to revert, and this even though the references are (and have been) top. And they are now replaced by home-fiddled ones.)
–Nomen4Omen (talk) 16:41, 5 December 2021 (UTC)
- Your batches of multiple questions stopped being useful for article improvement long, long ago. Answering them has been demonstrated to lead only to another batch of even more questions, without any evidence that you are taking in anything from the previous answers. I am not going to answer these. Go learn this material properly and answer them for yourself. —David Eppstein (talk) 17:36, 5 December 2021 (UTC)
- OK, if 4 questions are a batch for you, please respond at least to question 3. –Nomen4Omen (talk) 18:01, 5 December 2021 (UTC)
- No. If you ask questions that already demonstrate your bad misunderstanding as part of the question, you should not be surprised to be told to go away until you understand the subject better. And it is off-topic for this talk page and this article. —David Eppstein (talk) 19:08, 5 December 2021 (UTC)
Year of Invention
Given that splay trees appeared in https://dl.acm.org/doi/10.1145/800061.808752 in 1983 and they are used within Tarjan's Data Structures and Network Algorithms book, shouldn't the year be 1983 instead of 1985? 2A09:80C0:192:0:80F0:4292:4AC1:57E0 (talk) 14:31, 7 February 2024 (UTC)